One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion. For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let be a natural number such that …”), which may or may not apply to the particular value of
one may have in mind. Or, if one ever argues by dividing into separate cases (e.g. “Case 1:
is even. … Case 2:
is odd. …”), then for any given
, at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives. But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or reductio ad absurdum) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that
is equal to the ratio
of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.
Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that it this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives. This can lead to confused exchanges such as the following:
Lecturer: “Theorem. Let
be a prime number. Then…”
Student: “But how do you know that
is a prime number? Couldn’t it be composite?”
or
Lecturer: “Now we see what the function
does when we give it the input of
instead. …”
Student: “But didn’t you just say that the input was equal to
just a moment ago?”
This is not to say that counterfactual thinking is not encountered at all outside of mathematics. For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history. Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result. The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing. Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning. For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it. And if the proof of (say) a lemma is analogous to a short story, then the statement of that lemma can be viewed as analogous to the moral of that story.
Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses. Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.
Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics. Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set of real numbers is uncountable. This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps because of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics. (The only other two examples I know of that come close to doing this are the fact that the real number
is equal to 1, and the solution to the blue-eyed islanders puzzle.)
Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument; that any object that was so ridiculously overpowered that it could be used to “defeat” its own existence, could not actually exist. Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, or the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.
I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction. Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.
It was pointed out to me, though (originally with regards to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses. This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter. This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.
For instance, consider the very first no-self-defeating result presented in the previous post:
Proposition 1 (No largest natural number). There does not exist a natural number
that is larger than all the other natural numbers.
This is formulated in the “no self-defeating object” formulation. But it has a logically equivalent “any object can be defeated” form:
Proposition 1′. Given any natural number
, one can find another natural number
which is larger than
.
Proof. Take .
While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take , so that the proof gives
), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number). So there is a sense in which Proposition 1′ is more “non-counterfactual” or “constructive” than the “counterfactual” Proposition 1.
In a similar spirit, Euclid’s theorem (which we give using the numbering from the previous post),
Proposition 3. There are infinitely many primes.
can be recast in “all objects can be defeated” form as
Proposition 3′. Let
be a collection of primes. Then there exists a prime
which is distinct from any of the primes
.
Proof. Take to be any prime factor of
(for instance, one could take the smallest prime factor, if one wished to be completely concrete). Since
is not divisible by any of the primes
,
must be distinct from all of these primes.
One could argue that there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that was equal to one of the
), but the proposition itself is not inherently counterfactual, as it does not make as patently impossible a hypothesis as a finite enumeration of the primes. Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem, than the proof of Proposition 3 that is usually given today. Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say
, and it will actually exhibit a prime not in that list (in this case,
). The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.
[Note: the article below may make more sense if one first reviews the previous blog post on the “no self-defeating object”. For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]
— 1. Set theory —
Now we revisit examples of the no-self-defeating object in set theory. Take, for instance, Cantor’s theorem:
Proposition 6. The reals are uncountable.
One can easily recast this in a “non-counterfactual” or “all objects can be defeated” form:
Proposition 6′. Let
be a countable sequence of real numbers (possibly with repetition). Then there exists a real number
that is not equal to any of the
.
Proof. Set equal to
, where
is the smallest digit in
that is not equal to the first digit past the decimal point of any decimal representation of
;
is the smallest digit in
that is not equal to the second digit past the decimal point of any decimal representation of
;
- etc.
(Here we write “any” decimal representation rather than “the” decimal representation to deal with the annoying issue mentioned earlier. As with the proof of Euclid’s theorem, there is nothing special about taking the smallest digit here; this is just for sake of concreteness.) Note that any real number has at most two decimal representations, and there are ten digits available, so one can always find
with the desired properties. Then, by construction, the real number
cannot equal
(because it differs in the first digit from any of the decimal representations of
), it cannot equal
(because it differs in the second digit), and so forth.
Again, Proposition 6′ is trivially equivalent to Proposition 6, and still briefly uses contradiction in its proof, but in this non-counterfactual form one can actually illustrate it with examples. For instance, one could start with a list of terminating decimals, starting with the single digit terminating decimals, the two-digit terminating decimals, and so forth:
and one then sees that the construction will, in this case, give the number , which indeed does not occur on the above list.
It is instructive to try to “outrun” Proposition 6′ by modifying the list to accommodate to the list. One cannot simply tack on this number “at the end” of this list, as the list is infinite and does not actually have an end. One can insert it at, say, the beginning of the list, and then move all the other numbers down one, but then Proposition 6′ gives a new number not on the list (in this case,
). One can add that number to the list also, bumping everyone else down one, but then Proposition 6′ gives yet another number not on the list (in this case,
). After doing this a few times, one can begin to appreciate how Proposition 6′ always defeats any attempt to outrun it, much as one cannot obtain a largest natural number by continually adding +1 to one’s previous proposed candidate.
It is also remarkable how inoffensive Proposition 6′ and its proof is, when compared against the reaction one sometimes encounters to Proposition 6, which is logically equivalent. A single contraposition can dramatically change one’s impression of a result.
In a similar spirit, the result
Proposition 4 (No universal set) There does not exist a set that contains all sets (including itself).
(which, of course, assumes one is working in something like the Zermelo-Frankel axioms of set theory) becomes
Proposition 4′. Given any set A, there exists a set B which is not an element of A.
Proof. Consider the set
;
the existence of this set is guaranteed by the axiom schema of specification. If was an element of itself, then by construction we would have
, a contradiction. Thus we must have
. From construction, this forces
.
In the usual axiomatic formulation of set theory, the axiom of foundation implies, among other things, that no set is an element of itself. With that axiom, the set B given by Proposition 4′ is nothing other than A itself, which by the axiom of foundation is not an element of A. But since the axiom of foundation was not used in the proof of Proposition 4′, one can also explore (counterfactually!) what happens in set theories in which one does not assume the axiom of foundation. Suppose, for instance, that one managed somehow to produce a set A that contained itself as its only element: . (Informally, one could think of
as an infinite nested chain,
.) Then the only element that
has, namely
, is an element of itself, so the set
produced by Proposition 4′ is the empty set
, which is indeed not in
.
One can try outrunning Proposition 4′ again to see what happens. For instance, let’s add the empty set to the set produced earlier, to give the new set
. The Proposition 4′ construction then gives the set
, which is indeed not in
. If we then try to add that set in to get a new set
, then one gets the set
, which is again not in
. Iterating this, one in fact begins constructing the von Neumann ordinals! (Actually, the original set
plays essentially no role in this construction; one could have started with the empty set and it would have generated the same sequence of ordinals.)
— 2. Logic —
One can also convert the no-self-defeating arguments given in the logic section of the previous post into “every object can be defeated” forms, but it took me a while to find them. (One may wish to pause to try this before moving on.)
We first turn to the result (essentially coming from the liar paradox) that the notion of truth cannot be captured by a predicate. This was discussed in the previous blog post, but not formalised as a theorem. I will do so now, starting with the easier “self-referential case”:
Theorem 8.5 (Impredicativity of truth, self-referential case) (Informal statement) Let
be a formal language that contains the concepts of predicates and allows self-reference, and let
be an interpretation of that language (i.e. a way of consistently assigning values to every constant, ranges to every variable, and truth values to every sentence in that language, obeying all the axioms of that language). Then there does not exist a “truth predicate”
in
that takes a sentence
as input, with the property that for every sentence
in
, that
is true (in
) if and only if
is true (in
).
Here is the non-counterfactual version:
Theorem 8.5′ (Informal statement) Let
be a formal language that contains the concepts of predicates and strings and allows self-reference, and let
be an interpretation of that language. Let
be a predicate in
that takes sentences as input. Then there exists a sentence
such that the truth value of
(in
) is different from the truth value of
(in
).
Proof. We define be the self-referential “liar sentence”
G := “T(G) is false”.
Then, clearly, is true if and only if
is false, and the claim follows.
Using the Quining trick to achieve indirect self-reference, one can remove the need for direct self-reference in the above argument:
Theorem 8.6 (Impredicativity of truth) (Informal statement) Let
be a formal language that contains the concepts of predicates and strings, and let
be an interpretation of that language (i.e. a way of consistently assigning values to every constant, ranges to every variable, and truth values to every sentence in that language) that interprets strings in the standard manner (so in particular, every sentence or predicate in
can also be viewed as a string constant in
). Then there does not exist a “truth predicate”
in
that takes a string x as input, with the property that for every sentence
in
, that
is true (in
) if and only if
is true (in
).
A more formal version of the above theorem is Tarski’s undefinability theorem.
Here is the non-counterfactual version:
Theorem 8.6′. (Informal statement) Let
be a formal language containing the concepts of predicates and strings, and let
be a predicate on strings. Then there exists a sentence
in
with the property that, for any interpretation
of
that interprets strings in the standard manner,that the truth value of
in
is different from the truth value of
in
.
Proof. (Sketch) We use the “quining” trick. Let be the predicate on strings defined by
Q(x) = “x is a predicate on strings, and T(x(x)) is false”
and let be the “Gödel sentence”
. Then, by construction,
is true in
if and only if
is false in
, and the claim follows.
Actually Theorem 8.6′ is marginally stronger than Theorem 8.6 because it makes the sentence independent of the interpretation
, whereas Theorem 8.6 (when viewed in the contrapositive) allows
to depend on
. This slight strengthening will be useful shortly.
An important special case of Theorem 8.6′ is the first incompleteness theorem:
Corollary 8.7 (Gödel’s first incompleteness theorem) (Informal statement) Let
be a formal language containing the concepts of predicates and strings that has at least one interpretation
that gives the standard interpretation of strings (in particular,
must be consistent). Then there exists a sentence
in
that is undecidable in
(or more precisely, in a formal recursive proof system for
).
Proof. (Sketch) A language that is powerful enough to contain predicates and strings will also be able to contain a provability predicate
, so that a sentence
in
is provable in
‘s proof system if and only if
is true in the standard interpretation
. Applying Theorem 8.6′ to this predicate, we obtain a Gödel sentence
such that the truth value of
in
differs from the truth value of
in
. If
is true in
, then
must be true in
also since
is consistent, so the only remaining option is that
is false in
and
is true in
. Thus neither
nor its negation can be provable, and hence
is undecidable.
Now we turn to the second incompleteness theorem:
Theorem 9 (Second Gödel incompleteness theorem). (Informal statement) No consistent logical system which has the notion of a predicate and a string, can provide a proof of its own logical consistency.
Here is the non-counterfactual version:
Theorem 9′. (Informal statement) Let
be consistent logical systems that have the notion of a predicate and a string, such that every sentence in
is also a sentence in
, and such that the consistency of
can be proven in
. Then there exists a sentence
that lies in both
and
that is provable in
but is not provable in
.
Proof. (Sketch) In the common language of and
, let
be the predicate
T(x) := ” x is provable in L’ “.
Applying Theorem 8.6′, we can find a sentence (common to both
and
) with the property that in any interpretation M of either L or L’, the truth value of
and the truth value of
differ.
By Corollary 8.7′ (or more precisely, the proof of that corollary), is not provable in
. Now we show that
is provable in
. Because L can prove the consistency of L’, one can embed the proof Corollary 8.7′ inside the language L, and deduce that the sentence “
is not provable in
” is also provable in
. In other words,
can prove that
is false. On the other hand, embedding the proof of Theorem 8.6′ inside L,
can also prove that the truth value of
and
differ. Thus
can prove that
is true.
The advantage of this formulation of the second incompleteness theorem, as opposed to the usual counterfactual one, is that one can actually trace through the argument with a concrete example. For instance, Zermelo-Frankel-Choice (ZFC) set theory can prove the consistency of Peano arithmetic (a result of Gentzen), and so one can follow the above argument to show that the Gödel sentence of Peano arithmetic is provably true in ZFC, but not provable in Peano arithmetic.
— 3. Computability —
By now, it should not be surprising that the no-self-defeating arguments in computability also have a non-counterfactual form, given how close they are to the analogous arguments in set theory and logic. For sake of completeness, we record this for Turing’s theorem:
Theorem 10 (Turing halting theorem) (Informal statement) There does not exist a program
which takes a string
as input, and determines in finite time whether
is a program (with no input) that halts in finite time.
Here is the non-counterfactual version:
Theorem 10′. (Informal statement) Let
be a program that takes a string
as input, returns a yes-no answer
as output, and which always halts in finite time. Then there exists a string
that is a program with no input, such that if
is given
as input, then
does not determine correctly whether
halts in finite time.
Proof. Define to be the program taking a string
as input which does the following:
- If
is not a program that takes a string as input, it halts.
- Otherwise, it runs
with input
(which is a program with no input).
- If
returns “no”, it halts, while if
returns “yes”, it runs forever.
Now, let be the program
. By construction,
halts if and only if
returns “no”, and the claim follows.
One can apply Theorem 10′ to various naive halting algorithms. For instance, let be the program that simulates
for (say) 1000 CPU cycles, and then returns “yes” if
halted by that time, or “no” otherwise. Then the program
generated by the above proof will take more than 1000 CPU cycles to execute, and so
will determine incorrectly whether
halted or not. (Notice the similarity here with Proposition 1′.)
The same argument also gives a non-counterfactual version of the non-computability of the Busy Beaver function:
Proposition 11′. Let
be a computable function. Then there exists a natural number
and a program
of length
(and taking no input) that halts in finite time, but requires more than
CPU cycles before it halts.
Proof. Let be the program that simulates
for
CPU cycles, where
is the length of
, and returns “yes” if
halted by that time, or “no” otherwise. Then the program
generated by Theorem 10′ is such that
does not correctly determine if
halts. Since
is always correct when it returns “yes”, this means that
does halt, but that
returned “no”, which implies that
takes more than
cycles to execute.
Of course, once one has a program of length that runs for more than
CPU cycles, it is not hard to make a program of length a little bit larger than
that outputs a number greater than
, so that one can conclude as a corollary that the Busy Beaver function outgrows any computable function.
— 4. Miscellaneous —
The strategy stealing argument in game theory is already more or less set up in non-counterfactual form: in any game that admits “harmless moves” (such as noughts and crosses), any strategy of the second player can be stolen to be defeated (or at least held to a draw) by the first player. Similarly for arbitrage strategies in finance (unless there are loopholes due to imperfect information or friction costs).
It is a bit more difficult to recast the no-self-defeating objects in physics in a non-counterfactual form, due to the large number of implicit physical assumptions in these arguments. I will present just one simple example of this, which is the grandfather paradox that asserts that controlled time travel is impossible because you could use such travel to go back in time to kill your grandfather before you were born. One can convert this to a slightly less counterfactual format:
“Theorem”. (Very imprecisely stated!) Suppose that one has a mechanism in universe U to travel back in time and arrive at universe U’. Then there can exist events in U that occurred differently in universe U’.
The “proof” is, of course, the same: starting from U, go back in time and kill your grandfather in universe U’. This version of the “theorem” (though not the precise “proof” given here) is of course invoked often in many science fiction stories involving time travel.
It seems possible to also cast the no-immovable-objects and no-controlled-and-detectable-tachyon-particles arguments from the previous blog post in this form, but one would have to consider multiple universes to do this properly, and I will not attempt to do so here, as it appears to be rather complicated.
The “omnipotence paradox” in philosophy (can an omnipotent being create a stone so heavy that He cannot lift it?) can also be rephrased in a non-counterfactual form that does not require consideration of any omnipotent beings:
“Theorem”: If G is a being, then G will be unable to do at least one of the following two tasks:
- Create a stone so heavy that G cannot lift it.
- Be able to lift any possible stone.
(Of course, most beings will fail at both Task 1 and Task 2.)
67 comments
Comments feed for this article
18 October, 2010 at 11:00 am
pavel zorin
Dear Mr. Tao,
In the proof of Proposition 4′ the word “contradiction” is superfluous since, starting with
, one shows
anyway. Unfortunately I do not remember whether the former formula is intuitionistically valid.
kind regards,
pavel
18 October, 2010 at 12:05 pm
Terence Tao
Fair enough; I guess this just shows how ingrained a habit it is for mathematicians like myself to reach for the convenient tool of proof by contradiction, even when it is not strictly necessary. (But in intuitionistic logic, the law of the excluded middle is not available, and so to deduce
from
still requires a proof by contradiction (or more precisely, a proof by negation).)
18 October, 2010 at 11:35 am
DavidTweed
Perhaps biggest area of counterfactual analysis outside mathematics/philosophy (and maybe still bigger) is in the law, where almost any case involving damages implicitly depends upon constructing what would have happened absent the criminal act in order to figure out how much money has been “lost”, eg, if X libels Y such that Y maintains he can’t get a job, the damages will be based upon how much it is thought he could have been earning “in a world where he wasn’t slandered”.
18 October, 2010 at 12:04 pm
Terence Tao
Ah, law is a great example (and law does share many features in common with mathematics, such as an insistence on reading definitions literally, and formulating them precisely). One could argue that the fictionalised cross-examinations that one often sees in Hollywood-style legal dramas are a rough analogue of proofs by contradiction (or at least attempted proofs by contradiction). But I would imagine the real-world situation is much more mundane, though I don’t have much experience with actual courtrooms.
18 October, 2010 at 1:48 pm
Andrej Bauer
I wish more mathematicians strived to state their results in a non-counterfactual manner. Statements which are phrased positively are usually much easier to understand, for students as well as for experts. I would just like to make a small remark: irrationality of the square root of 2, as well as several other examples you give, are not proofs by contradiction. They are simply proofs of negation, see Proof of negation and proof by contradiction for an explanation. I think it is worthwhile keeping the distinction, especially in teaching.
18 October, 2010 at 2:24 pm
Terence Tao
Yes, I know of this distinction, but it is a bit orthogonal to the one discussed here (being more an issue of intuitionism/constructivism vs. classical logic, as opposed to counterfactual vs. non-counterfactual arguments) so I decided not to emphasise it. Many arguments that are genuinely of proof by contradiction type tend, in fact, to already be non-counterfactual in nature (because they are asserting a positive statement rather than a negative one), but have other undesirable properties (such as ineffectivity) that are not related to the counterfactual/not-counterfactual distinction.
18 October, 2010 at 1:53 pm
Robert Scarth
I haven’t read the whole article yet, and I haven’t even looked at the previous one. However the general idea that people find the “any object can be defeated” form easier to digest than the “no self defeating object” form is quite similar to the idea of loss aversion in behavioural economics which is about how people dislike losses more than they like gains. Or more precisely given two identical offers, one framed as a gain, and the other framed as a loss, almost everyone will prefer the offer framed as a gain.
18 October, 2010 at 1:54 pm
mk
Prof. Tao,
In trying to count the reals, I still have trouble understanding Cantor’s argument. If the method you were using to construct the list were eventually comprehensive, wouldn’t you be able to say that given any real number, the list would eventually contain a number close to it up to an arbitrary precision? So, Cantor’s argument would both fail, and be true, depending on whether you carry out Cantor’s argument further, or generate the countable list further? It always has seemed to me like Cantor’s proof relies on just always being able to jump ahead to a number not yet contained in the list you’re building.
e.g. in the example above, any number such as .2111111… generated by Cantor’s argument may not be included in the list, but would eventually be, if you carried out construction of the list far enough to infinite.
18 October, 2010 at 2:32 pm
Terence Tao
Dear mk,
You can’t use a partially completed “work in progress” list of real numbers, that you keep updating as needed, to enumerate the real line – you have to supply a finished product. And once you do so, Cantor’s argument then gives you a number that your list didn’t supply. Sure, you could then add that to your list somehow, but it’s too late to do that. Actually, the situation is already well described by the “no largest number” argument, which is perhaps the very simplest example of arguments of this type. Consider the following dialogue:
A: “I’ve found the largest natural number! It’s 100!”
B: “Um, isn’t 101 an even larger number?”
A: “Did I say 100? Sorry, I meant 1,000.”
B: “But 1,001 is larger than 1,000.”
A: “Did I say 1,000? Sorry, I meant 10,000”.
B: “…”
I think you will agree that A’s claim that he or she has found the largest natural number is thoroughly unconvincing, even though B can never force A to concede completely. The situation is much the same as someone who claims to have a comprehensive list of real numbers that will give every real number you list “if you let it run long enough”, and who decides to extend the list a little bit more every time someone points out that the list is incomplete:
A: “I’ve found a complete and countable list of all the real numbers! Look, here it is!”
B: (looks) “Um, 0.2111…. is not on the list.”
A: “Oops, I didn’t run the list for long enough.” (scribble, scribble.) “OK, now this is the complete list.”
B: (looks) “But 0.0111… is not on the list either.”
A: “Oops, I needed to run the list a little bit more.” (scribble, scribble.) “Now it’s OK.”
B: “…”
Again, I hope you agree that A does not have a convincing case that he or she possesses a complete list of reals. One should tell A to let the list run for as long as is needed, and then supply the final, finished list, surrendering the right to alter it any further; then Proposition 6′ will supply a number that is not on that list. The fact that the list can then be modified afterwards to accommodate that number does not change the fact that A was unable to supply a complete list in the first place.
To put it another way: any “work in progress” list, that has not yet been fully completed, can still potentially contain any real number one pleases; but in mathematics we do not work with such “potential” lists, as they tend to be slippery “moving targets” that lack the precision and definiteness needed to do maths, but instead only with complete lists that are not subject to further modification. It is those lists that Cantor’s theorem demonstrates to be unable to cover all the real line.
(If you do suppose, as a counterfactual hypothesis, that a countable list exists that does contain all the real numbers, then one is in the bizarre counterfactual and inconsistent universe of ex falso quodlibet, in which statements are both true and false simultaneously – so that a real number can both be on a list and not on it at the same time. The reason for all this bizarreness is, of course, that countable lists that exhaust the reals do not actually exist, as Proposition 6′ shows.)
p.s. One also needs to distinguish between a real number being approximated by a number on a list, and actually being on the list itself. For instance, in the sample list I gave in the blog, the terminating decimals 0.2, 0.21, 0.211, 0.2111, etc. all lie in the list. However, their limiting value, the non-terminating decimal 0.21111…., does not lie on the list, as the list consisted only of terminating decimals. It is the limit of some of the decimals on that list, but being a limit point of a set is a different concept from being in the set itself; there is no requirement that the list has to be necessarily closed with respect to limits.
p.p.s. It is also important to keep in mind that while a countable sequence or list
is infinite in length, each member
of that sequence only occurs at a finite position into that sequence. For instance, such a sequence can have a 37th member
that is 37 places down the list, or a 967th member
that is 967 places down the list, but it does not have members that are “infinitely far down the list”. (One can find members that are arbitrarily far down the list, but this is a different concept.) In particular, there is no final element at the “end’ of the list, and there is a limit to what one can do by running the list “for long enough”.
18 October, 2010 at 6:21 pm
Joshua Zelinsky
Regarding the classic idea of just tacking on the new real numbers produced by Cantor’s diagnolization argument to your list there’s actually some non-trivial math that comes from iterating this. If you iterate this process you get countably many more real numbers added to the list. You can then start again, and keep iterating. And so on. This process works up to any countable ordinal although you will need to reorder so you still have a list. At each stage, you still have only a countable number of reals. But this process will break down if we go out to the first uncountable ordinal and this is connected to the fact that the set of all countable ordinals is not countable.
19 October, 2010 at 12:12 am
mk
thank you prof. tao for your response.
I think my unclear understanding is in the concept of a finished product that is infinite, and differentiating a finished product that is countably infinite from one that is uncountable. When I only consider the method or rule used for generating a listing of a set, if the method carried out to infinite lengths seems to me comprehensive in reaching every element of the set in question, then the method you had above for attempting to generate a list of the reals looks to me to be as acceptable for enumerating the reals as a method for enumerating say the natural numbers by counting by 1 would be. It doesn’t seem like either method would leave an element out, and both methods involve counting. But if I consider the finished product of a list to be a general sequence of n elements, then I’m led to the conclusion that the listing for the reals wouldn’t work a la prop 6′. But why doesn’t this same sort of argument apply to other sets that are countably infinite?
i.e. for any listing of natural numbers, produce a natural number not on the list by adding all the elements of the list together. The way the addition jumps ahead of the listed natural numbers is what I feel the Cantor argument is doing by continuously forming a number not part of the enumerated list but which is eventually reached by the generating method of the list.
Also, I can accept that at the limit to infinite, the cantor argument produces a real number not on any list, so that’s why I was thinking of taking the limit of the generation of the list to infinite, and counting the numbers approximated as belonging to the set of reals. But I now see your point that limits can’t be considered part of the list necessarily.
Could you provide just some further clarification to help me understand the concept of a finished complete product when it is an infinite list?
19 October, 2010 at 9:56 am
Terence Tao
You have hit upon one of the key differences between the natural numbers and the real numbers that is actually at the heart of Cantor’s theorem; the reals have many convergent series and sequences (due to the various completeness and perfectness properties they enjoy), whereas the natural numbers do not. Thus, for instance, you can often sum an infinite series of reals (e.g. 1 + 1/2 + 1/4 + … ) to get another real number (in this case, 2), but when you sum an infinite series of natural numbers (e.g. 1+1+1+…) you almost never converge to another natural number. The fact that reals capture many of their limits, while natural numbers do not, already goes a long way to explaining why one cannot enumerate the former by the latter.
In particular, the argument “for any (countable) listing of natural numbers, produce a natural number not on the list by adding all the elements of the list together” does not work, because the sum of infinitely many natural numbers does not converge to another natural number. In contrast, given an infinite decimal
, the infinite series
does converge to a real number. For instance,
converges to 1/3.
On the other hand, that argument you gave is a perfectly valid proof that the set
of natural numbers is not finite (and this is discussed as Proposition 2 of the previous post on this topic), because the sum of finitely many natural numbers is still a natural number.
As for how to understand the concept of an infinite list (or infinite sequence) of numbers, I think the mathematical notation
may be helpful here (As opposed to the more informal notation
, in which the ellipsis
may lead to confusion if one is not yet comfortable with the mathematical concept of infinity). Thus, an infinite sequence of real numbers, is, formally, a recipe (or more precisely, a function) that assigns to each natural number
, a real number
that represents the n^th element in the sequence. Thus, such a sequence will contain a 37th element
, a 967th element
, and so forth, but there are no elements that are “infinitely far” along the sequence, because there are no infinite natural numbers, only finite natural numbers (even though the set
of natural numbers is infinite – this is a key distinction!).
(In general, I recommend trying to express your thoughts for now in formal mathematical notation whenever possible, rather than in words. At this stage of one’s education, this type of mental discipline is very useful for clearing away misconceptions and fuzzy thinking. Once this is done, though, words are better; I talk about this in another page on this blog.)
25 October, 2010 at 3:52 pm
Koray
Could you outline how the 2nd conversation between A & B would proceed if A were to claim that the complete and countable set of reals was that of “computable” reals?
In my view B would have no rebuttal as any counter example that he/she can name is a computable real.
In fact the proof of 6′ states that one is dealing with a countable set of reals whose digits can be computed to arbitrary length, and follows by “constructing” a new real (essentially another computable real).
How would we prove that the set R minus countable reals is also not countable?
25 October, 2010 at 5:10 pm
Terence Tao
Ah, this is a good question, which highlights the connection between Cantor’s theorem and Turing’s theorem. The conversation would go in one of two ways, depending on whether A’s algorithm for generating the list was itself computable. If it was, it would go like this:
Or, if A uses an uncomputable method to enumerate the reals (such as one that assumes that the halting problem is solvable), the conversation goes instead as something like this:
In short, what is going on here is that the computable reals are indeed a countable set, but any countable enumeration of this set must itself be uncomputable, and can thus be used to generate uncomputable reals (via diagonalisation) also. (See also Skolem’s paradox for a closely related phenomenon.)
In general, diagonalisation arguments work at just about any level of computational resources; if one limits the resources available to build (say) individual real numbers, then the enumeration of the set of all such real numbers can become possible, but will require a level of resources above the cutoff that you have imposed. It’s the same with many of the other results discussed here; for instance, if one limits the axioms of mathematics one is allowed to use, e.g. to Peano arithmetic, then proving consistency of the system becomes possible, but requires axioms that are more powerful than the ones you have limited yourself to. Or if one limits the maximum size N of the natural numbers that one is allowed to consider, then the set {0,…,N} of all natural numbers becomes finite, but its cardinality requires a number N+1 that exceeds the limits that you had set. If one limits one’s ordinals to be countable, then the set of all ordinals becomes a set – but one which is uncountable. The first uninteresting natural number (which, incidentally, could be used in an attempt to outrun Proposition 1) is interesting, but at a level beyond the limits one had previously set for what constitutes “interesting” (see also the Berry paradox). And so forth.
27 October, 2010 at 11:42 am
Koray
Firstly, thanks for the detailed response.
The “complete” in “a countable, complete” list I first understood as to mean one that doesn’t exclude any numbers. However, your first conversation, which doesn’t assume halting is solvable, makes it clear that the list is meant to be recursively enumerable.
In fact at this moment I think I may start treating Cantor’s proof as a proof that the set of computable reals is not recursively enumerable. (i.e. there’s no computable function whose image is the set of computable reals since there’s another computable function whose image includes one more.)
But, I feel his proof was supposed to be about all reals and whether they were countable; not r.e. or anything else.
Your second discussion carries on with the r.e. requirement, and so assumes a solution to the halting problem, admits that computable reals were accounted for, but uncomputable reals were left out.
However, even uncomputable reals like Chaitin’s Omega belong to a countable set (of programs, along with computable reals). I’m afraid I have not yet seen why the reals are uncountable.
I am well beyond my depth. Thanks for your time.
27 October, 2010 at 2:20 pm
Terence Tao
You can adapt Cantor’s argument to many settings to give a variety of useful conclusions. In its original formulation, it shows that the set of _all_ reals cannot be countably enumerated by _any_ method. However, the argument can be localised to the computable reals (as you requested in your previous comment), in which case the argument shows that the set of all _computable_ reals cannot be countably enumerated by any _computable_ method. As such, the argument actually provides a proof of the halting theorem (though this is pretty close to the standard proof).
It can also be used to demonstrate a number of other facts too. For instance, when you say that “even uncomputable reals like Chaitin’s Omega belong to a countable set”, you are implicitly using some language L that describes all the programs that you are willing to use to define such numbers as Omega (e.g. one could take a “second order Turing language” that contains the usual language of Turing machines, as well as the ability to solve the halting problem for such machines). If one then enumerates all the real numbers that this language L, then Cantor’s argument provides a new real number x that is not describable in that language L, but is instead describable in a higher order language L’ that incorporates not only L, but also the truth predicate for L (in order to determine which programs in L actually produce valid real numbers). This is a strictly larger language than L itself because of the impredicativity of truth (Theorem 8.6).
For instance, numbers such as Chaitin’s Omega are definable using a second order Turing language, but one can diagonalise a list of such numbers and create a real number that requires a third order Turing language to define.
28 October, 2010 at 1:28 pm
Koray
I believe from your last reply I’ve isolated our differences. The version of the original proof “localized” to computable reals is actually the only version I’d accept while you consider it valid for _all_ reals. As an example if a certain list included the Omega and you tried to expand its digits for diagonalization I could no longer follow the proof.
I don’t know whether this position is shared by anyone else. It seems I require proofs to ‘materialize fully’ and then allow syntactic deduction step-by-step.
Although I don’t believe this is the position of constructivism, wikipedia tells me that there are forms of constructivism where the definition of reals produces only computable reals.
(On the same page it reads “…In short, one who takes the view that real numbers are effectively computable interprets Cantor’s result as showing that the real numbers are not recursively enumerable.”, which makes my comments above late to the party already.)
Thanks again for your time.
18 October, 2010 at 3:17 pm
Dave Doty
I like proofs by contradiction, except when they have this form:
Theorem. A ==> B.
Proof.
1) Assume for the sake of contradiction that (A and not(B)) is true.
2) Then by some statements that can be derived solely from A, without once using not(B), we arrive at the conclusion that B holds, which is a contradiction.
3) Our original assumption of (A and not(B)) was false; therefore A ==> B.
QED
Of course, then entire proof works with statements (1) and (3) removed, but the superfluous contradiction has me scrambling to find where in (2) the assumption of not(B) was used. These sorts of proofs are sadly common in papers.
18 October, 2010 at 3:35 pm
Terence Tao
Yes, this illustrates a disparity between the style of argument that makes it easy to prove theorems, and the style of argument that makes it easy to understand those proofs. Proof by contradiction, even the “trivial” proofs by contradiction of the type you mention, can be conceptually helpful for finding a valid proof of a theorem; for instance, to show that a function
is injective, it is often helpful to visualise (for sake of contradiction) two distinct inputs
that collide to give the same output
, and at some point realise that the two inputs in fact had to be equal. Even if one never used the initial hypothesis
except to obtain the final contradiction, this type of mental image is very useful for figuring out exactly what obstructs this type of collision from occurring, thus, identifying the mechanism that gives injectivity. Here, contradiction is used as a sort of diagnostic tool to locate the argument, rather than being an essential component of the argument itself.
But yes, when the time comes to explain the argument to somebody else, who is more interested in understanding the argument than in knowing how it was arrived at, then “polishing” the argument to remove the superfluous proof by contradiction can be preferable. (But if taken too far, it can lead to proofs that are extremely “slick” and “elegant” but do not seem to arise organically from any innate intuition. So there is a tradeoff to be had here.)
18 October, 2010 at 3:20 pm
sb
Two subtle points to think about. 1. If the original result is negative (something is uncountable or something does not exist), then this procedure does give a constructive proof of the “there exists” version. But then to go from “there exisits” version to the original result you need contradiction. So its sort of postponing the inevitable. 2. If the original result is positive, does this approach work ? The only one I thought was positive was Godel’s first incompleteness theorem and that avoided contradiction by making if and only if arguments and was a corollary of “impredicativity of truth” which
is a negative result.
18 October, 2010 at 5:57 pm
Sam Alexander
One way to rephrase theorems is to say “If p is prime then…”, rather than “Let p be prime…”
It’s interesting to consider the logic of such a statement, and it naturally leads to the correct truth table for implication (i.e. that F->F=T), something which, itself, is often confusing to freshmen. For example, take the statement: “If x is positive then x+1 is positive.” Nobody will challenge this statement, however skeptical they might be about truth tables. But that implicitly means the statement is true for all values of x, in particular x=-1, in which case it takes the form F->F…
Why do we use “Let p be prime…” instead of “If p is prime…”? It seems to be part tradition, and partly a trick to avoid extremely long run-on sentences. Also a writer might mix and match the two just for the sheer purpose of having variety in their writing (math papers can easily fall prey to very very repetitive writing…)
19 October, 2010 at 11:50 am
Terence Tao
The command “Let” in mathematics is more than a supposition; it also serves the purpose of declaring a new variable that was not previously used or defined (similar to variable declarations in many modern computer languages). A phrase such as “If p is prime, …” or “Suppose that p is prime.” is only valid mathematical grammar if p has already been defined previously in the argument, whereas “Let p be a prime” is used when p is introduced for the first time.
(There is also “Suppose that p is a prime,” which is more agnostic as to whether p has previously been defined or not, and also does not carry as much of connotation that suggests that at least one prime exists.)
29 October, 2010 at 4:26 am
David Speyer
The big problem with “If p is prime then X” is that X has to fit within a single sentence. English has no grammatical structure that allows me to make a hypothetical last until the end of a paragraph. When I edit my work, I constantly find overlong sentences of the form “If P, then A, so B, and then C.” When I break them up, they have to become “Suppose P. Then A. So B, and then C. In summary, P implies C.” I wish there were a better way to be clear about exactly how long the supposition of P lasts.
18 October, 2010 at 7:59 pm
MathStudent
Dear Prof. Tao,
Is it also reasonable to conclude that proofs by “no self-defeating object” argument work only when the statement of the existence of the object (or its negation) is sufficiently “strong” and not otherwise? For instance, would it be correct to say that the negation of the “infinity of primes” is strong enough to lead to a proof of the infinity of primes, but fails to work for twin primes because neither that statement (there are infinitely many twin primes) nor its negation is sufficiently “strong”.
19 October, 2010 at 12:00 pm
Terence Tao
Yes, one can think in these terms. For instance, a strong property that the primes have (as opposed to the twin primes) is the fundamental theorem of arithmetic, which asserts that the primes multiplicatively generate the entire natural numbers. It is this fact that underlies most of the proofs of the infinitude of primes that I know of. Of course, the twin primes do not have anything resembling this property, and so one cannot adapt these arguments for primes directly to the almost primes.
18 October, 2010 at 9:27 pm
Kaizer
Is proof by contradiction the only way to prove any of these famous results, such as that there are infinitely many primes?
19 October, 2010 at 12:02 pm
Terence Tao
It’s hard to expunge proof by contradiction (or proof by negation) from one’s toolbox completely (especially when trying to prove a negative result, as opposed to a positive one), but there are other ways to prove the infinitude of primes that have a somewhat different flavour than Euclid’s proof. For instance, starting with Euler’s formula
for s > 1 (which is, incidentally, a generating function formulation of the fundamental theorem of arithmetic), one can use the divergence of the harmonic series
and a limiting argument to conclude the divergence of the prime harmonic series
, which implies in particular that there must be infinitely many primes (and also gives some stronger information regarding the distribution of the primes). However, proof by contradiction is still probably buried on many of the steps of the above proof, most notably in the “limiting argument” that I glossed over.
19 October, 2010 at 8:00 pm
Joshua Zelinsky
There are a few others that have very different flavor also. For example, using the same sort of argument with s=2 in the above along with the fact that Pi^2 is irrational you get the same result. This proof is in some ways a deliberate obfuscation.
At some point someone should list all the interesting proofs that there are infinitely many primes. Possibly my favorite such proof uses deterministic finite automata’s inability to recognize powers of 2 in a base that isn’t itself a power of 2. Proof sketch: A DFA cannot recognize the language that consists of powers of 2 written out in base 10. But, for any positive integer M, the language “in base 10 n represents an integer divisible by m” is DFA-recognizable. Languages recognizable by DFAs are closed under complements, intersections and unions. So if there are only finite many primes, we can then use the divisibility DFAs to recognize powers of 2. Contradiction and end of proof.
24 October, 2010 at 1:17 pm
David
Hmm. You could also use that \sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6} and note that it is irrational to conclude that the RHS must also be irrational? If finite, the RHS would be a product of finitely many rational numbers and hence rational. Hence it can’t be a finite product. This introduces contradiction (possibly twice, depending on how you show \frac{\pi^2}{6} is irrational), but if all we want is a proof that is ‘essentially different’ from Euclid, this fits the bill too.
(If \pi^2/6 were rational, we’d have integers p, q such that p/q = \pi^2/6 and hence \pi would be a root of qx^2 -6p, hence algebraic. Exercise: \pi is not algebraic!)
24 October, 2010 at 1:18 pm
David
I see someone’s already posted this. Sorry!
19 October, 2010 at 6:16 am
small e
I am not 100% happy with the final argument in the proof of Proposition 6′: Since there are several (in fact, two) ways to write y as a decimal, one needs to show that none of these representations occurs.
I think it is easier to just define decimal numbers by one representation (e.g., the one that does not break off).
19 October, 2010 at 9:19 am
Terence Tao
I addressed this point in the construction of y; the n^th digit of the decimal expansion of y that is given is not equal to the n^th digit of any of the representations of x_n, and so cannot equal x_n. (The fact that y may possibly have another decimal representation is not relevant for this argument; as long as at least one of the decimal representations of y is not a decimal representation of x_n, then y is not equal to x_n.)
19 October, 2010 at 10:09 am
Josh
This may be a bit pedantic: For what it’s worth, I’m not sure “counterfactual” is the best term to use here. What’s really going on is that you’re distinguishing between reasoning with and without suppositions. The particular cases you’re focusing on involve reasoning with suppositions that turn out to be false. But that isn’t exactly the same as reasoning counterfactually. Compare the old chestnut: If Oswald did not shoot Kennedy, someone else did. (True.) If Oswald hadn’t shot Kennedy, someone else would have. (False.) The former is a indicative conditional with a false antecedent. The latter is a genuine subjunctive/counterfactual conditional. We can imagine reasoning with the supposition that Oswald didn’t shoot Kennedy and come to believe someone else did. The various illustrations of “counterfactual thinking” seem rather more like this sort of reasoning than true counterfactual reasoning. More generally, there doesn’t seem to be any need for a genuinely counterfactual/subjunctive conditional in mathematical reasoning.
19 October, 2010 at 11:15 am
Terence Tao
This is a valid distinction, though as you say it is indeed largely erased in mathematics because our existing body of knowledge about a mathematical world consists only of necessary truths and whatever is deducible from one’s hypotheses, whereas in the real world one also has access to contingent truths (e.g. “Kennedy was shot”) that are neither necessarily true in all worlds, nor necessarily deducible from one’s hypotheses, which is the reason that one has to make the distinction in philosophy between indicative implications (adding more hypotheses while retaining the contingent truths) and counterfactual implications (adding more hypotheses, but dropping the contingent truths), whilst in mathematics there is only the material implication (the truth table relation between the truth of the hypotheses and the truth of the conclusions).
Nevertheless, even if there is no formal distinction between different types of implication in mathematics, I think the way that one _thinks_ about an implication in mathematics can be classified, at least approximately, into such categories as indicative and counterfactual. In particular, when reasoning by contradiction, one has to temporarily be in the mental state where one has to suppress any contingent information about the problem at hand; for instance, you may already be aware from textbooks that the reals are known to be uncountable, but when actually proving Cantor’s theorem, one of course cannot utilise this fact, and so one is indeed temporarily operating in a counterfactual universe in which Cantor’s theorem did not actually appear in textbooks (or Kennedy was not shot, etc.).
(One can also see the desire to signal different types of implication in mathematical grammar. “If
is an enumeration of the reals, then…” and “Suppose that
is an enumeration of the reals. Then…” are logically equivalent, but the latter, to me at least, suggests a more counterfactual point of view than the former.)
19 October, 2010 at 1:53 pm
Gil Kalai
The confusions at the beginning at the post remind me of a similar one I posted here:
http://gilkalai.wordpress.com/2009/07/22/a-proof-by-induction-with-a-difficulty/
Sasha Barvinok mentioned a similar, more basic, story (or a profound philosophical joke according to Wittgenstein) from “Littelwood’s Miscellany”:
“Schoolmaster: ‘suppose that x is the number of sheep in the problem.’
Pupil: ‘But, Sir, suppose that x is not the number of sheep.’ “
Another remark is that game theory is a nice context to formalize and study counterfactuals.
25 May, 2016 at 12:44 pm
arch1
Gil, do you think you understand that item in the Miscellany? I thought your blog post might possibly help me do so, but (not your fault of course) no luck.
25 May, 2016 at 1:30 pm
Anonymous
A similar story:
Schoolmaster: ‘solve the equation
.’
Pupil: ‘But, Sir, isn’t it the solution of itself?’
19 October, 2010 at 4:11 pm
Peter
Somewhere I read the interesting (and useful) point that when one is trying to prove things, it often in ones best interest to try to avoid proofs by contradiction, especially long proofs by contradiction. The reasoning is that if a mistake is made in a proof by contradiction the arguments that follow are often useless, but if a theorem is “proved” using many lemmas and one lemma is wrong, the others are often still useful. This often makes me suspiscious of long proofs by contradiction, wherever I see them.
20 October, 2010 at 8:47 am
Terence Tao
Well, the more precise statement, I think, is that it is best to avoid long proofs by contradiction which consist of many parts, each of which is different to convert to a non-counterfactual form. One sees this sometimes in attempts by amateurs to prove, say, the Riemann hypothesis; one starts with assuming a zero off of the critical line, and then derives a large number of random statements both using this fact, and not using this fact. At some point, one makes an error (e.g. division by zero), but one does not notice it until several pages later, when two of the equations derived disagree with each other. At this point, the author triumphantly declares victory.
On the other hand, there are many valid long proofs by contradiction in the literature. For instance, in PDE, a common and powerful way to show that a solution to an evolution equation exists for all time is to assume that it doesn’t, and deduce that it must develop a singularity somewhere. One then applies an increasingly sophisticated sequence of analytical tools to control and understand this singularity, until one eventually is able to show that the singularity has an impossible nature (e.g. some limiting asymptotic profile of this singularity has a positive norm in one function space, and a zero norm in another). In many cases this is the only known way to obtain a global regularity result, but most of the proof is left in the counterfactual world where singularities exist. But, the use of contradiction is often “shallow”, in that large parts of the proof can be converted into non-counterfactual form (and indeed, if one looks at the actual proof, it is usually the case that most of the key lemmas and sub-propositions in the proof are stated non-counterfactually). In fact, there are two closely related arguments in PDE, known as the “no minimal energy blowup solution” argument, and the “induction on energy” argument, which are related to each other in much the same way as the well-ordering principle is to the principle of mathematical induction (or the way that the “no self-defeating object” argument is related to the “every object can be defeated” argument); the former is counterfactual and significantly simpler, the latter is not but requires much lengthier and messier arguments. But it is generally accepted that the two methods are, on some level, equivalent.
Here is an analogy. One can think of a long proof as a long rope connecting point A (the hypotheses) to point B (the conclusion). This rope may be submerged in murky water (the counterfactual world) or held up above it (the non-counterfactual world). A proof by contradiction thus is like a rope that is almost completely submerged underwater, but as long as the rope is only shallowly underwater, one can still see it well enough to conclude that it is unbroken. But if it sinks too far into the depths of the counterfactual world, then it becomes suspicious.
Finding a non-counterfactual formulation of an argument then resembles the action of lifting the rope up so that it is mostly above water (though, if either the hypothesis A or conclusion B are negative in nature (and thus underwater), one must still spend a tiny fraction of the argument at least in the counterfactual world, as Gowers pointed out in another comment; also, the proof of the non-counterfactual statement may still occasionally use contradiction, so the rope may still dip below the water now and then). This can make the argument clearer, but it is also a bit tiring to lift the entire rope up this way; if one’s objective is simply to connect A to B in the quickest way possible, letting the rope slide underwater is often the simplest solution.
20 October, 2010 at 2:00 am
Ken Joffaniel Gonzales
How would you compare proof by contradiction with other methods of proof in terms of rigor? A non-mathematician would, of course, appreciate a proof that really tells what something is, instead of assuming that something isn’t and then proceeding with the conclusion that it actually is. Do we prefer proofs by contradiction when we are stuck in an argument that becomes ugly when proven directly?
20 October, 2010 at 9:55 am
Terence Tao
I think Joel Hamkins’ response to this MathOverflow question,
http://mathoverflow.net/questions/12342/reductio-ad-absurdum-or-the-contrapositive
gives an excellent answer to these questions.
20 October, 2010 at 3:48 am
John Sidles
It seems to me that counterfactual conversion can be applied to many forms of scientific and medical reasoning, not just pure mathematics.
For example, the conclusion of Richard Wilson’s 1989 Nature review, titled “Is quantum mechanics linear?” can be stated in no-self-defeating form as follows:
Proposition (No nonlinearity of quantum dynamics) On any finite-dimensional nonlinear complex state-space, there does not exist a Hamiltonian flow for which cyclic trajectories have an amplitude-independent frequency.
Stated in non-counterfactual form this becomes:
Proposition’ If
is a Hamiltonian flow on a finite-dimensional nonlinear complex state-space, then the cyclic trajectories associated to
have amplitude-dependent frequency.
In the latter form we appreciate that the proposition has counter-examples, for example, state-spaces that are tensor products of Bloch spheres, with a uniform magnetic field as the Hamiltonian.
Thus counterfactual conversion helps us appreciate that although the resonant frequencies observed in atomic beam experiments are observed to be amplitude-independent to within one part in
(or better), it does not follow that the state-space of quantum mechanics is a (linear) Hilbert space.
20 October, 2010 at 5:27 am
gowers
A very interesting post. In particular, it interests me that you found some examples easier to recast than others. Do you have a general explanation for this? With at least some of them it is quite easy to see what to do: you start with a statement of the form
and you prove instead the statement
For instance, if
is the statement “X is countable” then to prove that
is uncountable you prove that every countable set fails to equal
Though actually you prove something more precise, which is that for every countable subset of
you can construct a real number that does not belong to that set.
It’s also interesting to think about what is left to do once one has proved that. How does one actually prove the statement that if
and
then
? One way would be to say that if
then every element of
is an element of
so
must be empty. But normally we would regard this result as so obvious that we would think of the proof as finished once we had got to the stage of showing that
contains an element. So the contradiction, being relegated to a very trivial part of the argument (rather than the usual practice of making it contain the entire argument) hardly feels like a contradiction any more. (I still haven’t embedded the distinction between proof by contradiction and proof by negation into my skull, so apologies to Andrej if this too is not a genuine proof by contradiction.)
20 October, 2010 at 9:13 am
Terence Tao
It’s certainly true that I found some of the counterfactual arguments more difficult to convert than others. Somehow, it seems that some arguments are more deeply embedded in the counterfactual universe than others, by relying on the counterfactual premise in multiple, interconnected, ways. (See also the “rope” analogy in my reply to Peter.) Also, there was often a conceptual block to overcome; one often had to separate a single object X in the counterfactual argument into two objects X and Y in the non-counterfactual one, and one had to remove certain preconceptions about the argument in order to see exactly how to achieve this separation. (This is perhaps most extreme in the case of the grandfather paradox, when one had to separate the entire universe into two separate universes.)
The Godel theorems, in particular, took quite a while for me to convert to non-counterfactual form. Once I saw the trick (replacing the concept of “provability in
” by “provability in
“, or by a more abstract predicate which might or might not have anything to do with provability), it was straightforward, but making that connection took perhaps fifteen minutes of thought.
Actually, I still do not have a fully satisfactory non-counterfactual version of the “no immovable object” or the “no controlled, detectable tachyon particles” arguments in special relativity. The splitting universe trick becomes enormously complicated due to all the possible trajectories in spacetime one would have to split over (unless one uses quantum field theory, but this of course creates its own set of conceptual difficulties).
For instance, here is one formulation of the “no immovable object” argument that I am not fully satisfied with yet:
Proposition. (All objects are movable) Assume the postulates of special relativity. Suppose that one has a mechanism M for generating an object O=O(F,X) at rest in any specified inertial frame F, at any specified location X in that frame. Then at least one of the following facts must be true:
1. Any object generated by this mechanism, can be moved by some external force.
2. If an object O at rest in one frame F in this mechanism encounters another object O’ at rest in frame F’ also created by this mechanism, then the two objects must pass through each other instead of colliding.
But perhaps there is a better formulation. The situation with the tachyon pulses is also complicated. The best I can do currently is to say that any controllable generator of detectable particles that can be set up in every inertial frame, must either generate streams of particles that travel at or below the speed of light, or else generate streams of particles that travel to an alternate universe.
20 October, 2010 at 7:44 am
Jason Gaddis
I apologize if this is just semantics, but you say,
“Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say , and it will actually exhibit a prime not in that list (in this case, 31)”.
It will exhibit a number that is relatively prime to every number in your list, but that number is not necessarily prime. For example,
2*3*5*7*11*13+1 = 30031 = (59)(509)
20 October, 2010 at 7:54 am
Qiaochu Yuan
The number being exhibited is a prime factor of the product, not the product itself.
20 October, 2010 at 2:34 pm
Gerard
Excellent post. It made me think that in undergraduate mathematics the proof by contradiction method is very commonly used (first university courses, IMO’s, etc.), but in more advanced courses (specially in areas where the idea of continuum is important), and except some typical cases (e.g. show that this thing is unique), i have the impression that it isn’t as useful as it used to be. Is it the case in modern investigation?
20 October, 2010 at 5:32 pm
Jason Rute
Here is a relevant example from calculus. “Show that f(x) = 3x – 1 – sin(x) has exactly one root.” To show at least one root exists, just use IVT. But to show there are no more roots, we teach them a proof by contradiction. Namely pick two roots a and b. Since f(a) = f(b), by Rolle’s Theorem (or MVT) there is a c in-between a and b such that f'(c) = 0. But f'(x) = 3 – sin(x) >=2 > 0. Contradiction.
The less counter-factual proof would go something like follows. Since f'(x) > 0, we have that f(x) is strictly increasing, hence it can only cross the x-axis once. QED. Of course this is no longer a cookie cutter application of Rolle’s theorem, which is what I feel the book (Stewart) is going for.
A more MVT-theorem-based-version is as follows. Pick two points a and b. By MVT, there is a point c in-between such that f'(c) (b – a) = f(b) – f(a). Then we have f(b) = f(a) + f'(c) (b – a) >= f(a) + 3(b – a). Not only does this show f(a) doesn’t equal f(b), but it shows how much bigger (or smaller) f(b) is. This is actually related to another problem we assign from the same chapter, namely, “If f is a differentiable function with f'(x)<3, and f(2)=4, what is the largest f(6) can be?".
Teaching calculus aside, this post was really insightful. Thanks for sharing!
24 October, 2010 at 2:02 pm
Isaac S.
From a purely pedagogical perspective, isn’t the learning curve of counterfactual reasoning an important step in developing abstract thinking?
One might argue that a proof of Cantor’s theorem by contradiction is preferable precisely because it offends student’s sensibilities, and thereby develops lateral thinking.
Personally, I have found that the de-institutionalization of mathematics de-mystifies it. When mathematics “just has to make sense,”as opposed to needing to adhere to contrived, small-minded rules, creativity ensues.
If students are to grow in mathematics and critical thinking, they need to struggle with the question of “why,” just as much, if not more, than with the question of “how.”
24 October, 2010 at 10:59 pm
AK
I think the discussion of the last “Theorem” is superficial. Suppose I am such
a being, and by Wednesday I have by some means already convinced you that I fulfill condition 2. (It is an interesting question _how_ I convince you of this,
clearly any finite set of stone-lifting acts can be deemed insufficient, but we leave this to the side now.) Now, having convinced you of C2, on Thursday I proceed to create a stone so heavy that even I can’t lift it. (Again, it is an interesting question on its own right how I convince you of this, how do you know that the created stone is really _that_ heavy, but again this is not what the “Theorem” is about.) It is evident that on Wednesday I fulfilled both C1 and
C2, being able to lift all stones and being able to create unliftable ones.
In general, the notion of omnipotence includes the potential to limit this omnipotence. Once limited, the being will no longer be omnipotent, but as long as she chooses not to the “Theorem” fails.
24 October, 2010 at 11:34 pm
David
Being able to lift “all stones” (your words) is not necessarily the same as being able to lift “any possible stone” (the article’s words). The former could be taken to mean only all currently existing stones; the latter means that even on Wednesday, you (the G in the theorem) are able to lift Thursday’s stone. That stone is a ‘possible stone’ on Wednesday because it’s possible for you to create it.
You’ve implicitly introduced the notion of time into the language we’re using to describe things here, but you haven’t unambiguously rephrased the original theorem to both account for this notion and adhere to the spirit of the theorem. The obvious interpretation of ‘possible stone’ is ‘stone that exists now or could exist in the future.’ But then if you can lift it on Wednesday then to create it on Thursday you have to change your lifting capabilities to explicitly exclude this stone that on Wednesday you could lift. You’re falling prey to the same sort of trap that ‘mv’ did higher up in the thread: you’re not considering the same object on Wed that you are on Thurs.
2 November, 2010 at 1:47 pm
The “no self-defeating object” argument, and the vagueness paradox « What’s new
[…] is the third in a series of posts on the “no self-defeating object” argument in mathematics – a powerful and useful […]
28 November, 2010 at 4:38 pm
Why, when we round n+sqrt(n), do we never get a square? « Republic of Mathematics
[…] Terry Tao has written extensively about these sort of arguments, The general idea is that we set up a world that we believe cannot exist – sort of like in science fiction, but math fiction. Then we carefully examine this world for consequences and see if something truly unacceptable occurs. If it does, we conclude that no such world can exist. […]
29 November, 2010 at 2:24 pm
Michael Hardy
“It was pointed out to me, though (originally with regards to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world,”
I remember pointing this out to you when you visited Minneapolis last year. But what I didn’t say, and possibly should have, but didn’t, mention in my joint article with Catherine Woodgold in the _Mathematical Intelligencer_ about what Euclid did and modern perceptions of it, was that what Euclid did was closer to your Proposition 3′ than to your Proposition 3. In particular, Euclid did not explicitly mention anything being “infinite”. Some people regard this as more than a mere terminology issue: Harold Edwards wrote a letter-to-the-editor that was published in the _Intelligencer_ making an issue of this point.
11 January, 2011 at 7:31 pm
Could God make pi = 3.15? - Quora
[…] grad student studying automorphic forms The following is a nice discussion of a related topic:https://terrytao.wordpress.com/20…Insert a dynamic date hereCannot add comment at this […]
21 January, 2011 at 6:40 am
Avoiding Proof By Contradiction « PseudoRandom
[…] Proof By Contradiction In this post Terence Tao explains how to avoid proof by contradiction. It shows another way to explain […]
19 February, 2011 at 9:53 am
Finite Sets | Poincare's Paradise
[…] links are link1,link2,link3. By going through these links we can understand how this method has been used in mathematics, […]
23 July, 2011 at 4:17 pm
Self-Defeating Sentences « Gödel’s Lost Letter and P=NP
[…] attempts to prove circuit lower bounds are self-defeating. This suggests the need for new ideas. In a fight-fire-with-fire mood, we hope that exploring the arena of self-defeat will stimulate […]
11 October, 2011 at 9:46 am
Anonymous
“Theorem”: If G is a being, then G will be unable to do at least one of the following two tasks:
Create a stone so heavy that G cannot lift it.
Be able to lift any possible stone.
disproof: I can create a stone that weighs 200 pounds. Work out for five weeks. Lift stone.
11 October, 2011 at 9:49 am
Anonymous
perhaps the act of creating a stone involved a lot of heavy lifting, and it also took exactly five weeks to make it.
28 November, 2011 at 11:23 pm
Anonymous
I feel like a simpler method of approaching that theorem would be to say “instead of lifting the rock, drop everything else.” thus the rock was “lifted” per se.
20 May, 2012 at 2:58 pm
Heuristic limitations of the circle method « What’s new
[…] itself (in the spirit of “self-defeating object” arguments, discussed previously several times on this blog). This is actually a fairly common technique in analytic number theory (though not […]
28 June, 2014 at 4:58 am
What books to read to get smarter? | Key to study
[…] Terence Tao’s blog. Tao on sets and logic. […]
27 November, 2014 at 8:34 am
Here Are 97 Books, Articles, And Movies That Will Make You Smarter | TAR News
[…] Terence Tao’s blog. Tao on sets and logic. […]
12 October, 2016 at 4:19 pm
Anonymous
When explaining the transition from “no self-defeating object” into “every object can be defeated”, you write:
“This change is almost trivial to enact (it is often little more than just taking the CONTRAPOSITIVE of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.”
Now I wonder: what does this have to do with the contrapositive? I think that the “no self-defeating object” version says
, and the “any object can be defeated” version asserts
. But the transition from
to
has nothing to do with the contrapositive.
22 October, 2016 at 7:39 am
Anonymous
Terry, could you please answer my question?
12 January, 2020 at 4:09 am
Lukasz
“Theorem 9 (Second Gödel incompleteness theorem). (Informal statement) No consistent logical system which has the notion of a predicate and a string, can provide a proof of its own logical consistency.”
In the paper: T. J. Stępień, Ł. T. Stępień, “On the Consistency of the Arithmetic System”, J. Math. Syst. Sci. 7, No.2, 43-55 (2017); arXiv:1803.11072 , a proof of the consistency of the Arithmetic System has been presented. This proof has been done within this Arithmetic System.
I know that this message can be received with a disbelief, however, please simply read this paper and consider this proof.
This paper does not contain any considerations related to either Second Gödel incompleteness theorem or First Gödel incompleteness theorem.