
One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion.  For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let $n$ be a natural number such that …”), which may or may not apply to the particular value of $n$ one may have in mind.  Or, if one ever argues by dividing into separate cases (e.g. “Case 1: $n$ is even. … Case 2: $n$ is odd.  …”), then for any given $n$, at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives.     But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or reductio ad absurdum) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that $\sqrt{2}$ is equal to the ratio $p/q$ of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.

Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives.  This can lead to confused exchanges such as the following:

Lecturer: “Theorem.  Let $p$ be a prime number.  Then…”

Student: “But how do you know that $p$ is a prime number?  Couldn’t it be composite?”

or

Lecturer: “Now we see what the function $f$ does when we give it the input of $x+dx$ instead.  …”

Student: “But didn’t you just say that the input was equal to $x$ just a moment ago?”

This is not to say that counterfactual thinking is not encountered at all outside of mathematics.  For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history.  Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result.  The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing.  Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning.  For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it.  And if the proof of (say) a lemma is analogous to a short story, then the statement of that lemma can be viewed as analogous to the moral of that story.

Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses.   Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.

Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics.  Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set ${\bf R}$ of real numbers is uncountable.  This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps because of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics.  (The only other two examples I know of that come close to doing this are the fact that the real number $0.999\ldots$ is equal to 1, and the solution to the blue-eyed islanders puzzle.)

Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument; that any object that was so ridiculously overpowered that it could be used to “defeat” its own existence, could not actually exist.  Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, or the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.

I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction.  Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.

It was pointed out to me, though (originally with regard to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses.  This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter.  This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.

For instance, consider the very first no-self-defeating result presented in the previous post:

Proposition 1 (No largest natural number). There does not exist a natural number $N$ that is larger than all the other natural numbers.

This is formulated in the “no self-defeating object” formulation.  But it has a logically equivalent “any object can be defeated” form:

Proposition 1′. Given any natural number $N$, one can find another natural number $N'$ which is larger than $N$.

Proof. Take $N' := N+1$. $\Box$

While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take $N = 100$, so that the proof gives $N'=101$ ), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number).  So there is a sense in which Proposition 1′ is more “non-counterfactual” or  “constructive” than the “counterfactual” Proposition 1.
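For readers who like to see such statements checked by machine, here is a minimal Lean 4 sketch of the two formulations. The theorem names `exists_larger` and `no_largest` are hypothetical labels chosen for this illustration, and “larger than all the other natural numbers” is rendered (as an assumption of this sketch) by the condition $M \leq N$ for all $M$. Proposition 1′ is proved by exhibiting a witness, while Proposition 1 is then obtained from it as a short corollary.

```lean
-- Proposition 1′: every natural number N is exceeded by some N'.
-- The witness is N + 1, exactly as in the proof in the text.
theorem exists_larger (N : Nat) : ∃ N' : Nat, N < N' :=
  ⟨N + 1, Nat.lt_succ_self N⟩

-- Proposition 1: no natural number dominates all natural numbers.
-- The counterfactual hypothesis is only entertained for the last step.
theorem no_largest : ¬ ∃ N : Nat, ∀ M : Nat, M ≤ N := by
  intro ⟨N, hN⟩
  have ⟨N', hN'⟩ := exists_larger N
  exact Nat.lt_irrefl N (Nat.lt_of_lt_of_le hN' (hN N'))
```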

In a similar spirit, Euclid’s theorem (which we give using the numbering from the previous post),

Proposition 3. There are infinitely many primes.

can be recast in “all objects can be defeated” form as

Proposition 3′.  Let $p_1,\ldots,p_n$ be a collection of primes.   Then there exists a prime $q$ which is distinct from any of the primes $p_1,\ldots,p_n$.

Proof. Take $q$ to be any prime factor of $p_1 \ldots p_n + 1$ (for instance, one could take the smallest prime factor, if one wished to be completely concrete).   Since $p_1 \ldots p_n + 1$ is not divisible by any of the primes $p_1,\ldots,p_n$, $q$ must be distinct from all of these primes.  $\Box$

One could argue that  there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that $q$ was equal to one of the $p_1,\ldots,p_n$), but the proposition itself is not inherently counterfactual, as  it does not make as patently impossible a hypothesis as a finite enumeration of the primes.  Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem, than the proof of Proposition 3 that is usually given today.  Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say $2, 3, 5$, and it will actually exhibit a prime not in that list (in this case, $31$).  The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.
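To illustrate the “constructive” nature of Proposition 3′, here is a small Python sketch of the recipe in its proof. The function names `smallest_prime_factor` and `new_prime` are hypothetical, and the choice of the smallest prime factor (found by naive trial division) is just one of the options mentioned in the proof; it is not meant to be efficient.

```python
from math import prod


def smallest_prime_factor(m):
    """Return the smallest prime factor of an integer m >= 2 (trial division)."""
    f = 2
    while f * f <= m:
        if m % f == 0:
            return f
        f += 1
    return m  # no factor up to sqrt(m), so m itself is prime


def new_prime(primes):
    """Given a finite list of primes p_1, ..., p_n, return a prime q not in the list.

    Following the proof of Proposition 3', q is taken to be the smallest
    prime factor of p_1 ... p_n + 1.
    """
    return smallest_prime_factor(prod(primes) + 1)


print(new_prime([2, 3, 5]))
print(new_prime([2, 3, 5, 7, 11, 13]))
```

Note that the product plus one need not itself be prime: for the list $2, 3, 5, 7, 11, 13$ one gets $30031 = 59 \times 509$, and the sketch returns the factor $59$.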

[Note: the article below may make more sense if one first reviews the previous blog post on the “no self-defeating object”.  For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]

Ordinarily, I only mention my research papers on this blog when they are first submitted, or if a major update is required.  With the paper arising from the DHJ Polymath “Low Dimensions” project, though, the situation is a little different as the collaboration to produce the paper took place on this blog.

Anyway, the good news is that the paper has been accepted for the Szemerédi birthday conference proceedings.  The referee was positive and recommended only some minor changes (I include the list of changes below the fold).  I have incorporated these changes, and the new version of the paper can be found here.  Within a few days I need to return the paper to the editor, so this is the last chance to propose any further corrections or changes (though at this stage any major changes are unlikely to be feasible).

The editor asked a good question: should we have a list of participants for this project somewhere?  If so, it seems to make more sense to have this list as a link on the wiki, rather than in the paper itself. But making a list opens the can of worms of deciding what level of participation should be the threshold for inclusion in the list – should someone who only contributed, say, one tangential comment to one of the blog posts be listed alongside a substantially more active participant?

One possibility is that of self-reporting; we could set up a page for participants on the wiki and let anyone who felt like they contributed add their name, and rely on the honour code to keep it accurate.    This might be feasible as long as the page is kept unofficial (so, in particular, it will not be used as the formal list of authors for the paper).

A related question is whether to add an explicit link to the timeline for progress on this project and on the sister “New proof” project.  If so, this should also be kept unofficial (there were no formal guidelines as to what was included in the timeline and what was not).

These decisions do not necessarily have to be taken quickly; one can simply point to the wiki in the paper (as is already done in the current version), and update the wiki later if we decide to add these sorts of acknowledgments on that site.

Incidentally, if we have another successful Polymath project to write up, I would now strongly recommend using version control software (such as Subversion or git) to organise the writing process, both at the informal notes stage and also at the drafting stage.  It is certainly far superior to our improvised solution of putting the raw TeX files on a wiki…

After a hiatus of several months, I’ve made an effort to advance the writing of the second Polymath1 paper, entitled “Density Hales-Jewett and Moser numbers”.  This is in part due to a request from the Szemerédi 70th birthday conference proceedings (which solicited the paper) to move the submission date up from April to February.  (Also, the recent launch of Polymath5 on Tim Gowers’ blog reminds me that I should get this older project out of the way.)

The current draft of the paper is here, with source files here.  I have been trimming the paper, in particular replacing some of the auxiliary or incomplete material in the paper with references to pages on the polymath wiki instead.  Nevertheless this is still a large paper, at 51 pages.  It is now focused primarily on the computation of the Density Hales-Jewett numbers $c_{n,3}$ and the Moser numbers $c'_{n,3}$ for all n up to 6, with the latter requiring a significant amount of computer assistance.

There are a number of minor issues remaining with the paper:

1. A picture of a Fujimura set for the introduction would be nice.
2. In the proof of Theorem 1.3 (asymptotic lower bound for DHJ numbers), it is asserted without proof that the circulant matrix with first row 1,2,…,k-1 is nonsingular.  One can prove this claim by computing the Fourier coefficients $\sum_{j=1}^{k-1} j e^{2\pi i j t / (k-1)}$ for all t, but is there a slicker way to see this (e.g. by citing a reference)?
3. Reference [15] (which is Komlos’s lower bound on the Moser numbers) is missing a volume number.  The reference is currently given as
J. Komlos, solution to problem P.170 by Leo Moser, Canad. Math.. Bull. vol  ???  (1972), 312-313, 1970.

Finally, the text probably needs to be proofread one or two more times before it is ready to go, hopefully by early February.  There is still also one last opportunity to propose non-trivial restructuring of the paper (in particular, if there are other ways to trim the size of the paper, this may be worth looking into).

1. In a previous post, I noted John Baez’s thread discussing his incipient article for the Notices of the AMS, entitled “What do mathematicians need to know about blogging?”.  John has now completed an initial draft of his article and is welcoming comments on it here. [Update, Oct 2: the article has now been submitted, incorporating much of the feedback.]
2. In another previous post, I talked about the forthcoming Google Wave platform being developed currently by Google, and its potential usefulness for online mathematical collaborative projects, such as the polymath projects.  My brother, who is one of the developers for this project, has just informed me that there are now a limited number of invites available to others who would like to develop specific Wave extensions or other projects (see for instance his own blog post, aimed at the GNOME community).  As I understand it, the Wave platform is not yet ready for general use, so these invites would be intended for technical developers (or preferably, a group of developers) who would be working on specific projects.  (For instance, I understand that there is already a preliminary extension for encoding LaTeX in a Wave, but it could be developed further.)  If any readers are interested, one can request an invite directly from the Google Wave page, or I can forward requests to my brother.  [At some point, I may ask for help in trying to build a Wave platform for the next generation of Polymath projects, but this will probably not occur for several months yet, due to a large number of other things on my plate (including existing Polymath projects).]

My two-volume book, “Poincaré’s legacies: pages from year two of a mathematical blog”, which was based on the 2008 posts of this blog, has now been published by the American Mathematical Society.

[Update, Aug 3: Actually, only the first volume has been published so far.  The second volume of this book will be available on Aug 10.]

A few months ago, I announced that I was going to convert a significant fraction of my 2007 blog posts into a book format. For various reasons, this conversion took a little longer than I had anticipated, but I have finally completed a draft copy of this book, which I have uploaded here; note that this is a moderately large file (1.1MB), as the book is 270 pages long. There are still several formatting issues to resolve, but the content has all been converted.

It may be a while before I hear back from the editors at the American Mathematical Society as to the status of the book project, but in the meantime any comments on the book, ranging from typos to suggestions as to the format, are of course welcome.

[Update, April 21: New version uploaded, incorporating contributed corrections. The formatting has been changed for the internet version to significantly reduce the number of pages. As a consequence, note that the page numbering for the internet version of the book will differ substantially from that in the print version.]

[Update, April 21: As some readers may have noticed, I have placed paraphrased versions of some of the blog comments in the book, using the handles given in the blog comments to identify the authors. If any such commenters wish to change their handle (e.g. to their full name) or to otherwise modify or remove any comments I have placed in the book, they are welcome to contact me by email to do so.]

[Update, April 23: Another new version uploaded, incorporating contributed corrections and shrinking the page size a little further.]

[Update, May 8: A few additional corrections to the book.]

Over two years ago, Emmanuel Candès and I submitted the paper “The Dantzig selector: Statistical estimation when $p$ is much larger than $n$” to the Annals of Statistics. This paper, which appeared last year, proposed a new type of selector (which we called the Dantzig selector, due to its reliance on the linear programming methods to which George Dantzig, who had died as we were finishing our paper, had contributed so much) for statistical estimation, in the case when the number $p$ of unknown parameters is much larger than the number $n$ of observations. More precisely, we considered the problem of obtaining a reasonable estimate $\beta^*$ for an unknown vector $\beta \in {\Bbb R}^p$ of parameters given a vector $y = X \beta + z \in {\Bbb R}^n$ of measurements, where $X$ is a known $n \times p$ predictor matrix and $z$ is a (Gaussian) noise error with some variance $\sigma^2$. We assumed that the predictor matrix X obeyed the restricted isometry property (RIP, also known as UUP), which roughly speaking asserts that $X\beta$ has norm comparable to $\beta$ whenever the vector $\beta$ is sparse. This RIP property is known to hold for various ensembles of random matrices of interest; see my earlier blog post on this topic.

Our selection algorithm, inspired by our previous work on compressed sensing, chooses the estimated parameters $\beta^*$ to have minimal $l^1$ norm amongst all vectors which are consistent with the data in the sense that the residual vector $r := y - X \beta^*$ obeys the condition

$\| X^* r \|_\infty \leq \lambda$, where $\lambda := C \sqrt{\log p} \sigma$ (1)

(one can check that such a condition is obeyed with high probability in the case that $\beta^* = \beta$, thus the true vector of parameters is feasible for this selection algorithm). This selector is similar, though not identical, to the more well-studied lasso selector in the literature, which minimises the $l^1$ norm of $\beta^*$ penalised by the $l^2$ norm of the residual.

A simple model case arises when n=p and X is the identity matrix, thus the observations are given by a simple additive noise model $y_i = \beta_i + z_i$. In this case, the Dantzig selector $\beta^*$ is given by the soft thresholding formula

$\beta^*_i = \max(|y_i| - \lambda, 0 ) \hbox{sgn}(y_i).$

The mean square error ${\Bbb E}( \| \beta - \beta^* \|^2 )$ for this selector can be computed to be roughly

$\lambda^2 + \sum_{i=1}^n \min( |\beta_i|^2, \lambda^2)$ (2)

and one can show that this is basically best possible (except for constants and logarithmic factors) amongst all selectors in this model. More generally, the main result of our paper was that under the assumption that the predictor matrix obeys the RIP, the mean square error of the Dantzig selector is essentially equal to (2) and thus close to best possible.
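To make the model case concrete, here is a minimal Python sketch of the soft thresholding formula, assuming that n=p and that X is the identity matrix (so that $X^* r$ is just the residual $y - \beta^*$ itself). The function names `soft_threshold` and `is_feasible` and the sample data are hypothetical; the sketch does not attempt the general Dantzig selector, which requires solving a linear program.

```python
def soft_threshold(y, lam):
    """Apply beta*_i = max(|y_i| - lam, 0) * sgn(y_i) to each entry of y."""
    beta_star = []
    for yi in y:
        magnitude = max(abs(yi) - lam, 0.0)
        sign = (yi > 0) - (yi < 0)  # sgn(y_i), taking values in {-1, 0, 1}
        beta_star.append(magnitude * sign)
    return beta_star


def is_feasible(y, beta_star, lam, tol=1e-12):
    """Check condition (1) in the identity model: max_i |y_i - beta*_i| <= lam."""
    return max(abs(yi - bi) for yi, bi in zip(y, beta_star)) <= lam + tol


y = [3.0, -0.5, 1.2, -4.0]   # illustrative observations
lam = 1.0                    # illustrative threshold
beta_star = soft_threshold(y, lam)
print(beta_star, is_feasible(y, beta_star, lam))
```

Each entry of $y$ is moved towards the origin by $\lambda$ (and set to zero if it was already within $\lambda$ of the origin), which is the most one can reduce the $l^1$ norm while keeping every residual entry bounded by $\lambda$.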

After accepting our paper, the Annals of Statistics took the (somewhat uncommon) step of soliciting responses to the paper from various experts in the field, and then soliciting a rejoinder to these responses from Emmanuel and me. Recently, the Annals posted these responses and rejoinder on the arXiv.

Recently, I had tentatively announced a forthcoming result with Ben Green establishing the “Gowers inverse conjecture” (or more accurately, the “inverse conjecture for the Gowers uniformity norm”) for vector spaces ${\Bbb F}_p^n$ over a finite field ${\Bbb F}_p$, in the special case when p=2 and when the function $f: {\Bbb F}_p^n \to {\Bbb C}$ for which the inverse conjecture is to be applied is assumed to be a polynomial phase of bounded degree (thus $f= e^{2\pi i P/|{\Bbb F}_p|}$, where $P: {\Bbb F}_p^n \to {\Bbb F}_p$ is a polynomial of some degree $d=O(1)$). See my FOCS article for some further discussion of this conjecture, which has applications to both polynomiality testing and to various structural decompositions involving the Gowers norm.

This conjecture can be informally stated as follows. By iterating the obvious fact that the derivative of a polynomial of degree at most d is a polynomial of degree at most d-1, we see that a function $P: {\Bbb F}_p^n \to {\Bbb F}_p$ is a polynomial of degree at most d if and only if

$\sum_{\omega_1,\ldots,\omega_{d+1} \in \{0,1\}} (-1)^{\omega_1+\ldots+\omega_{d+1}} P(x +\omega_1 h_1 + \ldots + \omega_{d+1} h_{d+1}) = 0$

for all $x,h_1,\ldots,h_{d+1} \in {\Bbb F}_p^n$. From this one can deduce that a function $f: {\Bbb F}_p^n \to {\Bbb C}$ bounded in magnitude by 1 is a polynomial phase of degree at most d if and only if the Gowers norm

$\|f\|_{U^{d+1}({\Bbb F}_p^n)} := \bigl( {\Bbb E}_{x,h_1,\ldots,h_{d+1} \in {\Bbb F}_p^n} \prod_{\omega_1,\ldots,\omega_{d+1} \in \{0,1\}} {\mathcal C}^{\omega_1+\ldots+\omega_{d+1}} f(x + \omega_1 h_1 + \ldots + \omega_{d+1} h_{d+1}) \bigr)^{1/2^{d+1}}$

is equal to its maximal value of 1 (here ${\mathcal C}$ denotes complex conjugation). The inverse conjecture for the Gowers norm, in its usual formulation, says that, more generally, if a function $f: {\Bbb F}_p^n \to {\Bbb C}$ bounded in magnitude by 1 has large Gowers norm (e.g. $\|f\|_{U^{d+1}} \geq \varepsilon$) then f has some non-trivial correlation with some polynomial phase g (e.g. $\langle f, g \rangle > c(\varepsilon)$ for some $c(\varepsilon) > 0$). Informally, this conjecture asserts that if a function has biased $(d+1)^{th}$ derivatives, then one should be able to “integrate” this bias and conclude that the function is biased relative to a polynomial of degree d. The conjecture has already been proven for $d \leq 2$. There are analogues of this conjecture for cyclic groups which are of relevance to Szemerédi’s theorem and to counting linear patterns in primes, but I will not discuss those here.
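The definitions above can be checked by brute force in a tiny example. The following Python sketch (all function names, and the choices $p=3$, $n=2$, $d=1$ and the two sample polynomials, are assumptions made purely for illustration) verifies the alternating sum identity for a polynomial of degree 1, and computes the $U^2$ norm of a linear phase and of a quadratic phase; it enumerates all of ${\Bbb F}_p^n$ and so is only usable for very small parameters.

```python
import cmath
from itertools import product


def shift(x, omega, hs, p):
    """Return x + omega_1 h_1 + ... + omega_k h_k in F_p^n (vectors are tuples)."""
    return tuple(
        (xi + sum(w * h[i] for w, h in zip(omega, hs))) % p
        for i, xi in enumerate(x)
    )


def alternating_sum(P, x, hs, p):
    """Sum over omega in {0,1}^k of (-1)^(omega_1+...+omega_k) P(x + omega.h), mod p."""
    total = 0
    for omega in product((0, 1), repeat=len(hs)):
        total += (-1) ** sum(omega) * P(shift(x, omega, hs, p))
    return total % p


def phase(P, p):
    """Turn an F_p-valued function P into the phase e^{2 pi i P / p}."""
    return lambda x: cmath.exp(2j * cmath.pi * P(x) / p)


def gowers_norm(f, k, p, n):
    """Brute-force U^k norm of f on F_p^n, following the displayed definition."""
    space = list(product(range(p), repeat=n))
    acc, count = 0, 0
    for x in space:
        for hs in product(space, repeat=k):
            term = 1
            for omega in product((0, 1), repeat=k):
                value = f(shift(x, omega, hs, p))
                # Apply complex conjugation when omega_1 + ... + omega_k is odd.
                term *= value.conjugate() if sum(omega) % 2 else value
            acc += term
            count += 1
    return abs(acc / count) ** (1 / 2 ** k)


p, n, d = 3, 2, 1
linear = lambda x: (x[0] + 2 * x[1] + 1) % p   # sample polynomial of degree 1
quadratic = lambda x: (x[0] * x[1]) % p        # sample polynomial of degree 2

space = list(product(range(p), repeat=n))
identity_holds = all(
    alternating_sum(linear, x, hs, p) == 0
    for x in space
    for hs in product(space, repeat=d + 1)
)
norm_linear = gowers_norm(phase(linear, p), d + 1, p, n)
norm_quadratic = gowers_norm(phase(quadratic, p), d + 1, p, n)
print(identity_holds, norm_linear, norm_quadratic)
```

In this example the linear phase attains the maximal $U^2$ norm, while the quadratic phase does not, consistent with the characterisation of polynomial phases of degree at most d given above.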

At the time of the announcement, our paper had not quite been fully written up. This turned out to be a little unfortunate, because soon afterwards we discovered that our arguments at one point had to go through a version of Newton’s interpolation formula, which involves a factor of d! in the denominator and so is only valid when the characteristic p of the field exceeds the degree. So our arguments in fact are only valid in the range $p > d$, and in particular are rather trivial in the important case $p=2$; my previous announcement should thus be amended accordingly.