You are currently browsing the category archive for the ‘update’ category.
Fifteen years ago, I wrote a paper entitled Global regularity of wave maps. II. Small energy in two dimensions, in which I established global regularity of wave maps from two spatial dimensions to the unit sphere, assuming that the initial data had small energy. Recently, Hao Jia (personal communication) discovered a small gap in the argument that requires a slightly non-trivial fix. The issue does not really affect the subsequent literature, because the main result has since been reproven and extended by methods that avoid the gap (see in particular this subsequent paper of Tataru), but I have decided to describe the gap and its fix on this blog.
I will assume familiarity with the notation of my paper. In Section 10, some complicated spaces are constructed for each frequency scale , and then a further space is constructed for a given frequency envelope by the formula
where is the Littlewood-Paley projection of to frequency magnitudes . Then, given a spacetime slab , we define the restrictions
where the infimum is taken over all extensions of to the Minkowski spacetime ; similarly one defines
The gap in the paper is as follows: it was implicitly assumed that one could restrict (1) to the slab to obtain the equality
(This equality is implicitly used to establish the bound (36) in the paper.) Unfortunately, (1) only gives the lower bound, not the upper bound, and it is the upper bound which is needed here. The problem is that the extensions of that are optimal for computing are not necessarily the Littlewood-Paley projections of the extensions of that are optimal for computing .
To remedy the problem, one has to prove an upper bound of the form
the extension will then obey (5) (here we use Lemma 9 from my paper), but unfortunately is not guaranteed to obey (4) (the norm does control the norm, but a key point about frequency envelopes for the small energy regularity problem is that the coefficients , while bounded, are not necessarily summable).
This can be fixed as follows. For each we introduce a time cutoff supported on that equals on and obeys the usual derivative estimates in between (the time derivative of size for each ). Later we will prove the truncation estimate
Assuming this estimate, then if we set , then using Lemma 9 in my paper and (6), (7) (and the local stability of frequency envelopes) we have the required property (5). (There is a technical issue arising from the fact that is not necessarily Schwartz due to slow decay at temporal infinity, but by considering partial sums in the summation and taking limits we can check that is the strong limit of Schwartz functions, which suffices here; we omit the details for sake of exposition.) So the only issue is to establish (4), that is to say that
for all .
For this is immediate from (2). Now suppose that for some integer (the case when is treated similarly). Then we can split
The contribution of the term is acceptable by (6) and estimate (82) from my paper. The term sums to which is acceptable by (2). So it remains to control the norm of . By the triangle inequality and the fundamental theorem of calculus, we can bound
By hypothesis, . Using the first term in (79) of my paper and Bernstein’s inequality followed by (6) we have
and then we are done by summing the geometric series in .
It remains to prove the truncation estimate (7). This estimate is similar in spirit to the algebra estimates already in my paper, but unfortunately does not seem to follow immediately from these estimates as written, and so one has to repeat the somewhat lengthy decompositions and case checkings used to prove these estimates. We do this below the fold.
Two quick updates with regards to polymath projects. Firstly, given the poll on starting the mini-polymath4 project, I will start the project at Thu July 12 2012 UTC 22:00. As usual, the main research thread on this project will be held at the polymath blog, with the discussion thread hosted separately on this blog.
Second, the Polymath7 project, which seeks to establish the “hot spots conjecture” for acute-angled triangles, has made a fair amount of progress so far; for instance, the first part of the conjecture (asserting that the second Neumann eigenfunction of an acute non-equilateral triangle is simple) is now solved, and the second part (asserting that the “hot spots” (i.e. extrema) of that second eigenfunction lie on the boundary of the triangle) has been solved in a number of special cases (such as the isosceles case). It’s been quite an active discussion in the last week or so, with almost 200 comments across two threads (and a third thread freshly opened up just now). While the problem is still not completely solved, I feel optimistic that it should fall within the next few weeks (if nothing else, it seems that the problem is now at least amenable to a brute force numerical attack, though personally I would prefer to see a more conceptual solution).
Just a quick update on my previous post on gamifying the problem-solving process in high school algebra. I had a little time to spare (on an airplane flight, of all things), so I decided to rework the mockup version of the algebra game into something a bit more structured, namely as 12 progressively difficult levels of solving a linear equation in one unknown. (Requires either Java or Flash.) Somewhat to my surprise, I found that one could create fairly challenging puzzles out of this simple algebra problem by carefully restricting the moves available at each level. Here is a screenshot of a typical level:
After completing each level, an icon appears which one can click on to proceed to the next level. (There is no particular rationale, by the way, behind my choice of icons; these are basically selected arbitrarily from the default collection of icons (or more precisely, “costumes”) available in Scratch.)
The restriction of moves made the puzzles significantly more artificial in nature, but I think that this may end up ultimately being a good thing, as to solve some of the harder puzzles one is forced to really start thinking about how the process of solving for an unknown actually works. (One could imagine that if one decided to make a fully fledged game out of this, one could have several modes of play, ranging from a puzzle mode in which one solves some carefully constructed, but artificial, puzzles, to a free-form mode in which one can solve arbitrary equations (including ones that you input yourself) using the full set of available algebraic moves.)
One advantage to gamifying linear algebra, as opposed to other types of algebra, is that there is no need for disjunction (i.e. splitting into cases). In contrast, if one has to solve a problem which involves at least one quadratic equation, then at some point one may be forced to divide the analysis into two disjoint cases, depending on which branch of a square root one is taking. I am not sure how to gamify this sort of branching in a civilised manner, and would be interested to hear of any suggestions in this regard. (A similar problem also arises in proving propositions in Euclidean geometry, which I had thought would be another good test case for gamification, because of the need to branch depending on the order of various points on a line, or rays through a point, or whether two lines are parallel or intersect.)
A few days ago, I released a preprint entitled “Localisation and compactness properties of the Navier-Stokes global regularity problem“, discussed in this previous blog post. As it turns out, I was somewhat impatient to finalise the paper and move on to other things, and the original preprint was still somewhat rough in places (contradicting my own advice on this matter), with a number of typos of minor to moderate severity. But a bit more seriously, I discovered on a further proofreading that there was a subtle error in a component of the argument that I had believed to be routine – namely the persistence of higher regularity for mild solutions. As a consequence, some of the implications stated in the first version were not exactly correct as stated; but they can be repaired by replacing a “bad” notion of global regularity for a certain class of data with a “good” notion. I have completed (and proofread) an updated version of the ms, which should appear at the arXiv link of the paper in a day or two (and which I have also placed at this link). (In the meantime, it is probably best not to read the original ms too carefully, as this could lead to some confusion.) I’ve also added a new section that shows that, due to this technicality, one can exhibit smooth initial data to the Navier-Stokes equation for which there are no smooth solutions, which superficially sounds very close to a negative solution to the global regularity problem, but is actually nothing of the sort.
Let me now describe the issue in more detail (and also to explain why I missed it previously). A standard principle in the theory of evolutionary partial differentiation equations is that regularity in space can be used to imply regularity in time. To illustrate this, consider a solution to the supercritical nonlinear wave equation
for some field . Suppose one already knew that had some regularity in space, and in particular the norm of was bounded (thus and up to two spatial derivatives of were bounded). Then, by (1), we see that two time derivatives of were also bounded, and one then gets the additional regularity of .
In a similar vein, suppose one initially knew that had the regularity . Then (1) soon tells us that also has the regularity ; then, if one differentiates (1) in time to obtain
one can conclude that also has the regularity of . One can continue this process indefinitely; in particular, if one knew that , then these sorts of manipulations show that is infinitely smooth in both space and time.
The issue that caught me by surprise is that for the Navier-Stokes equations
(setting the forcing term equal to zero for simplicity), infinite regularity in space does not automatically imply infinite regularity in time, even if one assumes the initial data lies in a standard function space such as the Sobolev space . The problem lies with the pressure term , which is recovered from the velocity via the elliptic equation
that can be obtained by taking the divergence of (2). This equation is solved by a non-local integral operator:
If, say, lies in , then there is no difficulty establishing a bound on in terms of (for instance, one can use singular integral theory and Sobolev embedding to place in . However, one runs into difficulty when trying to compute time derivatives of . Differentiating (3) once, one gets
At the regularity of , one can still (barely) control this quantity by using (2) to expand out and using some integration by parts. But when one wishes to compute a second time derivative of the pressure, one obtains (after integration by parts) an expansion of the form
and now there is not enough regularity on available to get any control on , even if one assumes that is smooth. Indeed, following this observation, I was able to show that given generic smooth data, the pressure will instantaneously fail to be in time, and thence (by (2)) the velocity will instantaneously fail to be in time. (Switching to the vorticity formulation buys one further degree of time differentiability, but does not fully eliminate the problem; the vorticity will fail to be in time. Switching to material coordinates seems to makes things very slightly better, but I believe there is still a breakdown of time regularity in these coordinates also.)
For later times t>0 (and assuming homogeneous data f=0 for simplicity), this issue no longer arises, because of the instantaneous smoothing effect of the Navier-Stokes flow, which for instance will upgrade regularity to regularity instantaneously. It is only the initial time at which some time irregularity can occur.
This breakdown of regularity does not actually impact the original formulation of the Clay Millennium Prize problem, though, because in that problem the initial velocity is required to be Schwartz class (so all derivatives are rapidly decreasing). In this class, the regularity theory works as expected; if one has a solution which already has some reasonable regularity (e.g. a mild solution) and the data is Schwartz, then the solution will be smooth in spacetime. (Another class where things work as expected is when the vorticity is Schwartz; in such cases, the solution remains smooth in both space and time (for short times, at least), and the Schwartz nature of the vorticity is preserved (because the vorticity is subject to fewer non-local effects than the velocity, as it is not directly affected by the pressure).)
This issue means that one of the implications in the original paper (roughly speaking, that global regularity for Schwartz data implies global regularity for smooth data) is not correct as stated. But this can be fixed by weakening the notion of global regularity in the latter setting, by limiting the amount of time differentiability available at the initial time. More precisely, call a solution and almost smooth if
- and are smooth on the half-open slab ; and
- For every , exist and are continuous on the full slab .
Thus, an almost smooth solution is the same concept as a smooth solution, except that at time zero, the velocity field is only , and the pressure field is only . This is still enough regularity to interpret the Navier-Stokes equation (2) in a classical manner, but falls slightly short of full smoothness.
(I had already introduced this notion of almost smoothness in the more general setting of smooth finite energy solutions in the first draft of this paper, but had failed to realise that it was also necessary in the smooth setting also.)
One can now “fix” the global regularity conjectures for Navier-Stokes in the smooth or smooth finite energy setting by requiring the solutions to merely be almost smooth instead of smooth. Once one does so, the results in my paper then work as before: roughly speaking, if one knows that Schwartz data produces smooth solutions, one can conclude that smooth or smooth finite energy data produces almost smooth solutions (and the paper now contains counterexamples to show that one does not always have smooth solutions in this category).
The diagram of implications between conjectures has been adjusted to reflect this issue, and now reads as follows:
Christian Elsholtz and I have recently finished our joint paper “Counting the number of solutions to the Erdös-Straus equation on unit fractions“, submitted to the Journal of the Australian Mathematical Society. This supercedes my previous paper on the subject, by obtaining stronger and more general results. (The paper is currently in the process of being resubmitted to the arXiv, and should appear at this link within a few days.)
with positive integers. The Erdös-Straus conjecture asserts that for all . Since for all positive integers , it suffices to show that for all primes .
We single out two special types of solutions: Type I solutions, in which is divisible by and are coprime to , and Type II solutions, in which is coprime to and are divisible by . Let denote the number of Type I and Type II solutions respectively. For any , one has
with equality when is an odd primes . Thus, to prove the Erdös-Strauss conjecture, it suffices to show that at least one of , is positive whenever is an odd prime.
Our first main results are the asymptotics
This improves upon the results in the previous paper, which only established
The double logarithmic factor in the upper bound for is artificial (arising from the inefficiency in the Brun-Titchmarsh inequality on very short progressions) but we do not know how to remove it.
The methods are similar to those in the previous paper (which were also independently discovered in unpublished work of Elsholtz and Heath-Brown), but with the additional input of the Erdös divisor bound on expressions of the form for polynomials , discussed in this recent blog post. (Actually, we need to tighten Erdös’ bound somewhat, to obtain some uniformity in the bounds even as the coefficients of become large, but this turns out to be achievable by going through the original arguments of Erdös more carefully.)
We also note an observation of Heath-Brown, that in our notation gives the lower bound
thus, we see that for typical , that most solutions to the Erdös-Straus equation are not of Type I or Type II, in contrast to the case when is prime.
We also have a number other new results. We find a way to systematically unify all the previously known parameterisations of solutions to the Erdös-Straus equation, by lifting the Cayley-type surface to a certain three-dimensional variety in six-dimensional affine space, in such a way that integer points in the former arise from integer points in the latter. Each of the previously known characterisations of solutions then corresponds to a different choice of coordinates on this variety. (This point of view was also adopted in a paper of Heath-Brown, who interprets this lifted variety as the universal torsor of the Cayley surface.) By optimising between these parameterisations and exploiting the divisor bound, we obtain some bounds on the worst-case behaviour of and , namely
which should be compared to a recent previous bound of Browning and Elsholtz. In the other direction, we show that for infinitely many , and for almost all primes . Here, the main tools are some bounds for the representation of a rational as a sum of two unit fractions in the above-mentioned work of Browning and Elsholtz, and also the Turán-Kubilius inequality.
We also completely classify all the congruence classes that can be solved by polynomials, completing the partial list discussed in the previous post. Specifically, the Erdös-Straus conjecture is true for whenever one of the following congruence-type conditions is satisfied:
- , where are such that .
- and , where are such that .
- and , where are such that .
- or , where are such that and .
- , where are such that .
- , where are such that .
In principle, this suggests a way to extend the existing verification of the Erdös-Straus conjecture beyond the current range of by collecting all congruences to small moduli (e.g. up to ), and then using this to sieve out the primes up to a given size.
where are fixed. We can obtain a partial analogue of our main bounds for the case, namely that
were denotes the number of solutions to (2) which are of “Type II” in the sense that are all divisible by . However, we do not believe our bounds to be sharp in the large regime, though it does show that the expected number of solutions to (2) should grow rapidly in .
[This is a (lightly edited) repost of an old blog post of mine, which had attracted over 400 comments, and as such was becoming difficult to load; I request that people wishing to comment on that puzzle use this fresh post instead. -T]
This is one of my favorite logic puzzles, because of the presence of two highly plausible, but contradictory, solutions to the puzzle. Resolving this apparent contradiction requires very clear thinking about the nature of knowledge; but I won’t spoil the resolution here, and will simply describe the logic puzzle and its two putative solutions. (Readers, though, are welcome to discuss solutions in the comments.)
— The logic puzzle —
There is an island upon which a tribe resides. The tribe consists of 1000 people, with various eye colours. Yet, their religion forbids them to know their own eye color, or even to discuss the topic; thus, each resident can (and does) see the eye colors of all other residents, but has no way of discovering his or her own (there are no reflective surfaces). If a tribesperson does discover his or her own eye color, then their religion compels them to commit ritual suicide at noon the following day in the village square for all to witness. All the tribespeople are highly logical and devout, and they all know that each other is also highly logical and devout (and they all know that they all know that each other is highly logical and devout, and so forth).
Of the 1000 islanders, it turns out that 100 of them have blue eyes and 900 of them have brown eyes, although the islanders are not initially aware of these statistics (each of them can of course only see 999 of the 1000 tribespeople).
One day, a blue-eyed foreigner visits to the island and wins the complete trust of the tribe.
One evening, he addresses the entire tribe to thank them for their hospitality.
However, not knowing the customs, the foreigner makes the mistake of mentioning eye color in his address, remarking “how unusual it is to see another blue-eyed person like myself in this region of the world”.
What effect, if anything, does this faux pas have on the tribe?
Note 1: For the purposes of this logic puzzle, “highly logical” means that any conclusion that can logically deduced from the information and observations available to an islander, will automatically be known to that islander.
Note 2: Bear in mind that this is a logic puzzle, rather than a description of a real-world scenario. The puzzle is not to determine whether the scenario is plausible (indeed, it is extremely implausible) or whether one can find a legalistic loophole in the wording of the scenario that allows for some sort of degenerate solution; instead, the puzzle is to determine (holding to the spirit of the puzzle, and not just to the letter) which of the solutions given below (if any) are correct, and if one solution is valid, to correctly explain why the other solution is invalid. (One could also resolve the logic puzzle by showing that the assumptions of the puzzle are logically inconsistent or not well-defined. However, merely demonstrating that the assumptions of the puzzle are highly unlikely, as opposed to logically impossible to satisfy, is not sufficient to resolve the puzzle.)
Note 3: An essentially equivalent version of the logic puzzle is also given at the xkcd web site. Many other versions of this puzzle can be found in many places; I myself heard of the puzzle as a child, though I don’t recall the precise source.
Below the fold are the two putative solutions to the logic puzzle. If you have not seen the puzzle before, I recommend you try to solve it first before reading either solution.
Last year, Emmanuel Breuillard, Ben Green, Bob Guralnick, and I wrote a paper entitled “Strongly dense free subgroups of semisimple Lie groups“. The main theorem in that paper asserted that given any semisimple algebraic group over an uncountable algebraically closed field , there existed a free subgroup which was strongly dense in the sense that any non-abelian subgroup of was Zariski dense in . This type of result is useful for establishing expansion in finite simple groups of Lie type, as we will discuss in a subsequent paper.
An essentially equivalent formulation of the main result is that if are two non-commuting elements of the free group on two generators, and is a generic pair of elements in , then and are not contained in any proper closed algebraic subgroup of . Here, “generic” means “outside of at most countably many proper subvarieties”. In most cases, one expects that if are generically drawn from , then will also be generically drawn from , but this is not always the case, which is a key source of difficulty in the paper. For instance, if is conjugate to in , then and must be conjugate in and so the pair lie in a proper subvariety of . It is currently an open question to determine all the pairs of words for which is not generic for generic (or equivalently, the double word map is not dominant).
The main strategy of proof was as follows. It is not difficult to reduce to the case when is simple. Suppose for contradiction that we could find two non-commuting words such that were generically trapped in a proper closed algebraic subgroup. As it turns out, there are only finitely many conjugacy classes of such groups, and so one can assume that were generically trapped in a conjugate of a fixed proper closed algebraic subgroup . One can show that , , and are generically regular semisimple, which implies that is a maximal rank semisimple subgroup. The key step was then to find another proper semisimple subgroup of which was not a degeneration of , by which we mean that there did not exist a pair in the Zariski closure of the products of conjugates of , such that generated a Zariski-dense subgroup of . This is enough to establish the theorem, because we could use an induction hypothesis to find in (and hence in such that generated a Zariski-dense subgroup of , which contradicts the hypothesis that was trapped in for generic (and hence in for all .
To illustrate the concept of a degeneration, take and let be the stabiliser of a non-degenerate -space in . All other stabilisers of non-degenerate -spaces are conjugate to . However, stabilisers of degenerate -spaces are not conjugate to , but are still degenerations of . For instance, the stabiliser of a totally singular -space (which is isomorphic to the affine group on , extended by ) is a degeneration of .
A significant portion of the paper was then devoted to verifying that for each simple algebraic group , and each maximal rank proper semisimple subgroup of , one could find another proper semisimple subgroup which was not a degeneration of ; roughly speaking, this means that is so “different” from that no conjugate of can come close to covering . This required using the standard classification of algebraic groups via Dynkin diagrams, and knowledge of the various semisimple subgroups of these groups and their representations (as we used the latter as obstructions to degeneration, for instance one can show that a reducible representation cannot degenerate to an irreducible one).
During the refereeing process for this paper, we discovered that there was precisely one family of simple algebraic groups for which this strategy did not actually work, namely the group (or the group that is double-covered by this group) in characteristic . This group (which has Dynkin diagram , as discussed in this previous post) has one maximal rank proper semisimple subgroup up to conjugacy, namely , which is the stabiliser of a line in . To find a proper semisimple group that is not a degeneration of this group, we basically need to find a subgroup that does not stabilise any line in . In characteristic larger than three (or characteristic zero), one can proceed by using the action of on the five-dimensional space of homogeneous degree four polynomials on , which preserves a non-degenerate symmetric form (the four-fold tensor power of the area form on ) and thus embeds into ; as no polynomial is fixed by all of , we see that this copy of is not a degeneration of .
Unfortunately, in characteristics two and three, the symmetric form on degenerates, and this embedding is lost. In the characteristic two case, one can proceed by using the characteristic fact that is isomorphic to (because in characteristic two, the space of null vectors is a hyperplane, and the symmetric form becomes symplectic on this hyperplane), and thus has an additional maximal rank proper semisimple subgroup which is not conjugate to the subgroup. But in characteristic three, it turns out that there are no further semisimple subgroups of that are not already contained in a conjugate of the . (This is not a difficulty for larger groups such as or , where there are plenty of other semisimple groups to utilise; it is only this smallish group that has the misfortune of having exactly one maximal rank proper semisimple group to play with, and not enough other semisimples lying around in characteristic three.)
As a consequence of this issue, our argument does not actually work in the case when the characteristic is three and the semisimple group contains a copy of (or ), and we have had to modify our paper to delete this case from our results. We believe that such groups still do contain strongly dense free subgroups, but this appears to be just out of reach of our current method.
One thing that this experience has taught me is that algebraic groups behave somewhat pathologically in low characteristic; in particular, intuition coming from the characteristic zero case can become unreliable in characteristic two or three.
The first volume of my 2009 blog book, “An epsilon of room“, has now been published by the AMS, as part of the Graduate Studies in Mathematics series. (So I finally have a book whose cover is at least partially in yellow, which for some reason seems to be the traditional colour for mathematics texts.) This volume contains the material from my 245B and 245C classes, and can thus be viewed as a second text in graduate real analysis. (I plan to have one volume of the 2010 blog book to be devoted to the material for the 245A class I just taught, and would thus serve as a first text in graduate real analysis to complement this volume.)
The second volume, which covers a wide range of other topics, should also be published in the near future.
One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion. For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let be a natural number such that …”), which may or may not apply to the particular value of one may have in mind. Or, if one ever argues by dividing into separate cases (e.g. “Case 1: is even. … Case 2: is odd. …”), then for any given , at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives. But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or reductio ad absurdum) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that is equal to the ratio of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.
Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that it this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives. This can lead to confused exchanges such as the following:
Lecturer: “Theorem. Let be a prime number. Then…”
Student: “But how do you know that is a prime number? Couldn’t it be composite?”
Lecturer: “Now we see what the function does when we give it the input of instead. …”
Student: “But didn’t you just say that the input was equal to just a moment ago?”
This is not to say that counterfactual thinking is not encountered at all outside of mathematics. For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history. Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result. The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing. Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning. For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it. And if the proof of (say) a lemma is analogous to a short story, then the statement of that lemma can be viewed as analogous to the moral of that story.
Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses. Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.
Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics. Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set of real numbers is uncountable. This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps because of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics. (The only other two examples I know of that come close to doing this are the fact that the real number is equal to 1, and the solution to the blue-eyed islanders puzzle.)
Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument; that any object that was so ridiculously overpowered that it could be used to “defeat” its own existence, could not actually exist. Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, or the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.
I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction. Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.
It was pointed out to me, though (originally with regards to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses. This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter. This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.
For instance, consider the very first no-self-defeating result presented in the previous post:
Proposition 1 (No largest natural number). There does not exist a natural number that is larger than all the other natural numbers.
This is formulated in the “no self-defeating object” formulation. But it has a logically equivalent “any object can be defeated” form:
Proposition 1′. Given any natural number , one can find another natural number which is larger than .
Proof. Take .
While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take , so that the proof gives ), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number). So there is a sense in which Proposition 1′ is more “non-counterfactual” or “constructive” than the “counterfactual” Proposition 1.
Proposition 3. There are infinitely many primes.
can be recast in “all objects can be defeated” form as
Proposition 3′. Let be a collection of primes. Then there exists a prime which is distinct from any of the primes .
Proof. Take to be any prime factor of (for instance, one could take the smallest prime factor, if one wished to be completely concrete). Since is not divisible by any of the primes , must be distinct from all of these primes.
One could argue that there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that was equal to one of the ), but the proposition itself is not inherently counterfactual, as it does not make as patently impossible a hypothesis as a finite enumeration of the primes. Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem, than the proof of Proposition 3 that is usually given today. Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say , and it will actually exhibit a prime not in that list (in this case, ). The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.
[Note: the article below may make more sense if one first reviews the previous blog post on the “no self-defeating object”. For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]