
Fifteen years ago, I wrote a paper entitled Global regularity of wave maps. II. Small energy in two dimensions, in which I established global regularity of wave maps from two spatial dimensions to the unit sphere, assuming that the initial data had small energy. Recently, Hao Jia (personal communication) discovered a small gap in the argument that requires a slightly non-trivial fix. The issue does not really affect the subsequent literature, because the main result has since been reproven and extended by methods that avoid the gap (see in particular this subsequent paper of Tataru), but I have decided to describe the gap and its fix on this blog.

I will assume familiarity with the notation of my paper. In Section 10, some complicated spaces ${S[k] = S[k]({\bf R}^{1+n})}$ are constructed for each frequency scale ${k}$, and then a further space ${S(c) = S(c)({\bf R}^{1+n})}$ is constructed for a given frequency envelope ${c}$ by the formula

$\displaystyle \| \phi \|_{S(c)({\bf R}^{1+n})} := \|\phi \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]({\bf R}^{1+n})} \ \ \ \ \ (1)$

where ${\phi_k := P_k \phi}$ is the Littlewood-Paley projection of ${\phi}$ to frequency magnitudes ${\sim 2^k}$. Then, given a spacetime slab ${[-T,T] \times {\bf R}^n}$, we define the restrictions

$\displaystyle \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} := \inf \{ \| \tilde \phi \|_{S(c)({\bf R}^{1+n})}: \tilde \phi \downharpoonright_{[-T,T] \times {\bf R}^n} = \phi \}$

where the infimum is taken over all extensions ${\tilde \phi}$ of ${\phi}$ to the Minkowski spacetime ${{\bf R}^{1+n}}$; similarly one defines

$\displaystyle \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)} := \inf \{ \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})}: \tilde \phi_k \downharpoonright_{[-T,T] \times {\bf R}^n} = \phi_k \}.$

The gap in the paper is as follows: it was implicitly assumed that one could restrict (1) to the slab ${[-T,T] \times {\bf R}^n}$ to obtain the equality

$\displaystyle \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} = \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}.$

(This equality is implicitly used to establish the bound (36) in the paper.) Unfortunately, (1) only gives the lower bound, not the upper bound, and it is the upper bound which is needed here. The problem is that the extensions ${\tilde \phi_k}$ of ${\phi_k}$ that are optimal for computing ${\| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}}$ are not necessarily the Littlewood-Paley projections of the extensions ${\tilde \phi}$ of ${\phi}$ that are optimal for computing ${\| \phi \|_{S(c)([-T,T] \times {\bf R}^n)}}$.

To remedy the problem, one has to prove an upper bound of the form

$\displaystyle \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} \lesssim \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}$

for all Schwartz ${\phi}$ (actually we need affinely Schwartz ${\phi}$, but one can easily normalise to the Schwartz case). Without loss of generality we may normalise the RHS to be ${1}$. Thus

$\displaystyle \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} \leq 1 \ \ \ \ \ (2)$

and

$\displaystyle \|P_k \phi \|_{S[k]([-T,T] \times {\bf R}^n)} \leq c_k \ \ \ \ \ (3)$

for each ${k}$, and one has to find a single extension ${\tilde \phi}$ of ${\phi}$ such that

$\displaystyle \|\tilde \phi \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} \lesssim 1 \ \ \ \ \ (4)$

and

$\displaystyle \|P_k \tilde \phi \|_{S[k]({\bf R}^{1+n})} \lesssim c_k \ \ \ \ \ (5)$

for each ${k}$. Achieving a ${\tilde \phi}$ that obeys (4) is trivial (just extend ${\phi}$ by zero), but such extensions do not necessarily obey (5). On the other hand, from (3) we can find extensions ${\tilde \phi_k}$ of ${P_k \phi}$ such that

$\displaystyle \|\tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim c_k; \ \ \ \ \ (6)$

the extension ${\tilde \phi := \sum_k \tilde \phi_k}$ will then obey (5) (here we use Lemma 9 from my paper), but unfortunately is not guaranteed to obey (4) (the ${S[k]}$ norm does control the ${L^\infty_t L^\infty_x}$ norm, but a key point about frequency envelopes for the small energy regularity problem is that the coefficients ${c_k}$, while bounded, are not necessarily summable).

This can be fixed as follows. For each ${k}$ we introduce a time cutoff ${\eta_k}$, supported on ${[-T-2^{-k}, T+2^{-k}]}$, that equals ${1}$ on ${[-T-2^{-k-1},T+2^{-k-1}]}$ and obeys the usual derivative estimates in between (the ${j^{th}}$ time derivative being of size ${O_j(2^{jk})}$ for each ${j}$). Later we will prove the truncation estimate

$\displaystyle \| \eta_k \tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})}. \ \ \ \ \ (7)$

Assuming this estimate, if we set ${\tilde \phi := \sum_k \eta_k \tilde \phi_k}$, then using Lemma 9 in my paper and (6), (7) (and the local stability of frequency envelopes) we have the required property (5). (There is a technical issue arising from the fact that ${\tilde \phi}$ is not necessarily Schwartz due to slow decay at temporal infinity, but by considering partial sums in the ${k}$ summation and taking limits we can check that ${\tilde \phi}$ is the strong limit of Schwartz functions, which suffices here; we omit the details for sake of exposition.) So the only issue is to establish (4), that is to say that

$\displaystyle \| \sum_k \eta_k(t) \tilde \phi_k(t) \|_{L^\infty_x({\bf R}^n)} \lesssim 1$

for all ${t \in {\bf R}}$.
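The cutoffs ${\eta_k}$ can be realised concretely by the standard mollified step built from ${e^{-1/x}}$. Here is a minimal numerical sketch of that construction (the function names and the sample parameters are of course just for illustration):

```python
import math

def smooth_step(s):
    """Standard smooth transition: equals 1 for s <= 0 and 0 for s >= 1,
    built from f(x) = exp(-1/x) (extended by 0 for x <= 0)."""
    f = lambda x: math.exp(-1.0 / x) if x > 0 else 0.0
    return f(1 - s) / (f(1 - s) + f(s))

def eta(t, T, k):
    """Time cutoff supported on [-T - 2^{-k}, T + 2^{-k}], equal to 1 on
    [-T - 2^{-k-1}, T + 2^{-k-1}]; since the transition region has width
    2^{-k-1}, the chain rule gives the j-th derivative a size of O_j(2^{jk})."""
    h = 2.0 ** (-k - 1)            # width of the transition region
    s = (abs(t) - (T + h)) / h     # rescaled distance past the inner interval
    return smooth_step(s)
```

For instance, with `T = 1.0` and `k = 3`, `eta` is identically `1` on `[-1.0625, 1.0625]` and vanishes outside `[-1.125, 1.125]`.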

For ${t \in [-T,T]}$ this is immediate from (2). Now suppose that ${t \in [T+2^{-k_0-1}, T+2^{-k_0}]}$ for some integer ${k_0}$ (the case when ${t \in [-T-2^{-k_0}, -T-2^{-k_0-1}]}$ is treated similarly). Then we can split

$\displaystyle \sum_k \eta_k(t) \tilde \phi_k(t) = \Phi_1 + \Phi_2 + \Phi_3$

where

$\displaystyle \Phi_1 := \sum_{k < k_0} \tilde \phi_k(T)$

$\displaystyle \Phi_2 := \sum_{k < k_0} (\tilde \phi_k(t) - \tilde \phi_k(T))$

$\displaystyle \Phi_3 := \eta_{k_0}(t) \tilde \phi_{k_0}(t).$

The contribution of the ${\Phi_3}$ term is acceptable by (6) and estimate (82) from my paper. The term ${\Phi_1}$ sums to ${P_{<k_0} \phi(T)}$, which is acceptable by (2). So it remains to control the ${L^\infty_x}$ norm of ${\Phi_2}$. By the triangle inequality and the fundamental theorem of calculus, we can bound

$\displaystyle \| \Phi_2 \|_{L^\infty_x} \leq (t-T) \sum_{k < k_0} \| \partial_t \tilde \phi_k \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})}.$

By hypothesis, ${t-T \leq 2^{-k_0}}$. Using the first term in (79) of my paper and Bernstein’s inequality followed by (6) we have

$\displaystyle \| \partial_t \tilde \phi_k \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} \lesssim 2^k \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim 2^k;$

and then we are done by summing the geometric series in ${k}$.

It remains to prove the truncation estimate (7). This estimate is similar in spirit to the algebra estimates already in my paper, but unfortunately does not seem to follow immediately from these estimates as written, and so one has to repeat the somewhat lengthy decompositions and case checkings used to prove these estimates. We do this below the fold.

Two quick updates with regards to polymath projects.  Firstly, given the poll on starting the mini-polymath4 project, I will start the project at Thu July 12 2012 UTC 22:00.  As usual, the main research thread on this project will be held at the polymath blog, with the discussion thread hosted separately on this blog.

Second, the Polymath7 project, which seeks to establish the “hot spots conjecture” for acute-angled triangles, has made a fair amount of progress so far; for instance, the first part of the conjecture (asserting that the second Neumann eigenfunction of an acute non-equilateral triangle is simple) is now solved, and the second part (asserting that the “hot spots” (i.e. extrema) of that second eigenfunction lie on the boundary of the triangle) has been solved in a number of special cases (such as the isosceles case).  It’s been quite an active discussion in the last week or so, with almost 200 comments across two threads (and a third thread freshly opened up just now).  While the problem is still not completely solved, I feel optimistic that it should fall within the next few weeks (if nothing else, it seems that the problem is now at least amenable to a brute force numerical attack, though personally I would prefer to see a more conceptual solution).

Just a quick update on my previous post on gamifying the problem-solving process in high school algebra. I had a little time to spare (on an airplane flight, of all things), so I decided to rework the mockup version of the algebra game into something a bit more structured, namely as 12 progressively more difficult levels of solving a linear equation in one unknown.  (Requires either Java or Flash.)   Somewhat to my surprise, I found that one could create fairly challenging puzzles out of this simple algebra problem by carefully restricting the moves available at each level. Here is a screenshot of a typical level:

After completing each level, an icon appears which one can click on to proceed to the next level.  (There is no particular rationale, by the way, behind my choice of icons; these are basically selected arbitrarily from the default collection of icons (or more precisely, “costumes”) available in Scratch.)

The restriction of moves made the puzzles significantly more artificial in nature, but I think that this may end up ultimately being a good thing, as to solve some of the harder puzzles one is forced to really start thinking about how the process of solving for an unknown actually works. (One could imagine that if one decided to make a fully fledged game out of this, one could have several modes of play, ranging from a puzzle mode in which one solves some carefully constructed, but artificial, puzzles, to a free-form mode in which one can solve arbitrary equations (including ones that you input yourself) using the full set of available algebraic moves.)

One advantage to gamifying linear algebra, as opposed to other types of algebra, is that there is no need for disjunction (i.e. splitting into cases). In contrast, if one has to solve a problem which involves at least one quadratic equation, then at some point one may be forced to divide the analysis into two disjoint cases, depending on which branch of a square root one is taking. I am not sure how to gamify this sort of branching in a civilised manner, and would be interested to hear of any suggestions in this regard. (A similar problem also arises in proving propositions in Euclidean geometry, which I had thought would be another good test case for gamification, because of the need to branch depending on the order of various points on a line, or rays through a point, or whether two lines are parallel or intersect.)

My graduate text on measure theory (based on these lecture notes) is now published by the AMS as part of the Graduate Studies in Mathematics series.  (See also my own blog page for this book, which among other things contains a draft copy of the book in PDF format.)

A few days ago, I released a preprint entitled “Localisation and compactness properties of the Navier-Stokes global regularity problem“, discussed in this previous blog post.  As it turns out, I was somewhat impatient to finalise the paper and move on to other things, and the original preprint was still somewhat rough in places (contradicting my own advice on this matter), with a number of typos of minor to moderate severity.  But a bit more seriously, I discovered on a further proofreading that there was a subtle error in a component of the argument that I had believed to be routine – namely the persistence of higher regularity for mild solutions.   As a consequence, some of the implications stated in the first version were not exactly correct as stated; but they can be repaired by replacing a “bad” notion of global regularity for a certain class of data with a “good” notion.   I have completed (and proofread) an updated version of the ms, which should appear at the arXiv link of the paper in a day or two (and which I have also placed at this link).  (In the meantime, it is probably best not to read the original ms too carefully, as this could lead to some confusion.)   I’ve also added a new section that shows that, due to this technicality, one can exhibit smooth $H^1$ initial data to the Navier-Stokes equation for which there are no smooth solutions, which superficially sounds very close to a negative solution to the global regularity problem, but is actually nothing of the sort.

Let me now describe the issue in more detail (and also explain why I missed it previously).  A standard principle in the theory of evolutionary partial differential equations is that regularity in space can be used to imply regularity in time.  To illustrate this, consider a solution $u$ to the supercritical nonlinear wave equation

$-\partial_{tt} u + \Delta u = u^7$  (1)

for some field $u: {\bf R} \times {\bf R}^3 \to {\bf R}$.   Suppose one already knew that $u$ had some regularity in space, and in particular the $C^0_t C^2_x \cap C^1_t C^1_x$ norm of $u$ was bounded (thus $u$ and up to two spatial derivatives of $u$ were bounded).  Then, by (1), we see that two time derivatives of $u$ were also bounded, and one then gets the additional regularity of $C^2_t C^0_x$.

In a similar vein, suppose one initially knew that $u$ had the regularity $C^0_t C^3_x \cap C^1_t C^2_x$.  Then (1) soon tells us that $u$ also has the regularity $C^2_t C^1_x$; then, if one differentiates (1) in time to obtain

$-\partial_{ttt} u + \Delta \partial_t u = 7 u^6 \partial_t u$

one can conclude that $u$ also has the regularity of $C^3_t C^0_x$.  One can continue this process indefinitely; in particular, if one knew that $u \in C^0_t C^\infty_x \cap C^1_t C^\infty_x$, then these sorts of manipulations show that $u$ is infinitely smooth in both space and time.
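This bootstrapping can be recorded as a toy bookkeeping computation. The sketch below is purely illustrative: a pair `(j, s)` stands for membership in $C^j_t C^s_x$, and the rule $(j, s+2) \Rightarrow (j+2, s)$ encodes one application of the equation, differentiated $j$ times in time.

```python
def bootstrap(initial):
    """Close a set of regularity pairs (j, s), meaning u lies in C^j_t C^s_x,
    under two rules: spatial regularity may always be dropped, and the
    equation -u_tt + Delta u = u^7 trades two spatial derivatives for two
    time derivatives."""
    known = set(initial)
    frontier = list(known)
    while frontier:
        j, s = frontier.pop()
        new = set()
        if s >= 1:
            new.add((j, s - 1))      # forget one spatial derivative
        if s >= 2:
            new.add((j + 2, s - 2))  # apply the (differentiated) equation
        for pair in new:
            if pair not in known:
                known.add(pair)
                frontier.append(pair)
    return known

reg = bootstrap({(0, 3), (1, 2)})
# C^0_t C^3_x and C^1_t C^2_x yield C^2_t C^1_x and C^3_t C^0_x, as in the text
```

Starting instead from $(0, s)$ and $(1, s-1)$ for all $s$, the closure contains every pair, matching the assertion that $C^0_t C^\infty_x \cap C^1_t C^\infty_x$ data bootstraps to smoothness in space and time.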

The issue that caught me by surprise is that for the Navier-Stokes equations

$\partial_t u + (u \cdot \nabla) u =\Delta u -\nabla p$  (2)

$\nabla \cdot u = 0$

(setting the forcing term $f$ equal to zero for simplicity), infinite regularity in space does not automatically imply infinite regularity in time, even if one assumes the initial data lies in a standard function space such as the Sobolev space $H^1_x({\bf R}^3)$.  The problem lies with the pressure term $p$, which is recovered from the velocity via the elliptic equation

$\Delta p = -\nabla^2 \cdot (u \otimes u)$ (3)

that can be obtained by taking the divergence of (2).   This equation is solved by a non-local integral operator:

$\displaystyle p(t,x) = \int_{{\bf R}^3} \frac{\nabla^2 \cdot (u \otimes u)(t,y)}{4\pi |x-y|}\ dy.$
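The divergence computation behind (3) can be checked symbolically. The following sympy sketch verifies the underlying calculus identity $\nabla \cdot ((u \cdot \nabla) u) = \sum_{i,j} \partial_i u_j \partial_j u_i + (u \cdot \nabla)(\nabla \cdot u)$, whose right-hand side reduces to $\nabla^2 \cdot (u \otimes u)$ once one imposes $\nabla \cdot u = 0$:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
X = (x, y, z)
# a generic (not necessarily divergence-free) velocity field
u = [sp.Function(f'u{i}')(t, x, y, z) for i in range(3)]

# div((u . grad) u)
lhs = sum(sp.diff(sum(u[j] * sp.diff(u[i], X[j]) for j in range(3)), X[i])
          for i in range(3))

div_u = sum(sp.diff(u[i], X[i]) for i in range(3))
# sum_{i,j} d_i u_j d_j u_i  +  (u . grad)(div u)
rhs = (sum(sp.diff(u[j], X[i]) * sp.diff(u[i], X[j])
           for i in range(3) for j in range(3))
       + sum(u[j] * sp.diff(div_u, X[j]) for j in range(3)))

# the two expressions agree identically
assert sp.simplify(sp.expand(lhs - rhs)) == 0
```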

If, say, $u$ lies in $H^1_x({\bf R}^3)$, then there is no difficulty establishing a bound on $p$ in terms of $u$ (for instance, one can use singular integral theory and Sobolev embedding to place $p$ in $L^3_x({\bf R}^3)$).  However, one runs into difficulty when trying to compute time derivatives of $p$.  Differentiating (3) once, one gets

$\Delta \partial_t p = -2\nabla^2 \cdot (u \otimes \partial_t u)$.

At the regularity of $H^1$, one can still (barely) control this quantity by using (2) to expand out $\partial_t u$ and using some integration by parts.  But when one wishes to compute a second time derivative of the pressure, one obtains (after integration by parts) an expansion of the form

$\Delta \partial_{tt} p = -4\nabla^2 \cdot (\Delta u \otimes \Delta u) + \ldots$

and now there is not enough regularity on $u$ available to get any control on $\partial_{tt} p$, even if one assumes that $u$ is smooth.   Indeed, following this observation, I was able to show that given generic smooth $H^1$ data, the pressure $p$ will instantaneously fail to be $C^2$ in time, and thence (by (2)) the velocity will instantaneously fail to be $C^3$ in time.  (Switching to the vorticity formulation buys one further degree of time differentiability, but does not fully eliminate the problem; the vorticity $\omega$ will fail to be $C^4$ in time.  Switching to material coordinates seems to make things very slightly better, but I believe there is still a breakdown of time regularity in these coordinates also.)

For later times t>0 (and assuming homogeneous data f=0 for simplicity), this issue no longer arises, because of the instantaneous smoothing effect of the Navier-Stokes flow, which for instance will upgrade $H^1_x$ regularity to $H^\infty_x$ regularity instantaneously.  It is only the initial time at which some time irregularity can occur.

This breakdown of regularity does not actually impact the original formulation of the Clay Millennium Prize problem, though, because in that problem the initial velocity is required to be Schwartz class (so all derivatives are rapidly decreasing).  In this class, the regularity theory works as expected; if one has a solution which already has some reasonable regularity (e.g. a mild $H^1$ solution) and the data is Schwartz, then the solution will be smooth in spacetime.   (Another class where things work as expected is when the vorticity is Schwartz; in such cases, the solution remains smooth in both space and time (for short times, at least), and the Schwartz nature of the vorticity is preserved (because the vorticity is subject to fewer non-local effects than the velocity, as it is not directly affected by the pressure).)

This issue means that one of the implications in the original paper (roughly speaking, that global regularity for Schwartz data implies global regularity for smooth $H^1$ data) is not correct as stated.  But this can be fixed by weakening the notion of global regularity in the latter setting, by limiting the amount of time differentiability available at the initial time.  More precisely, call a solution $u: [0,T] \times {\bf R}^3 \to {\bf R}^3$ and $p: [0,T] \times {\bf R}^3 \to {\bf R}$ almost smooth if

• $u$ and $p$ are smooth on the half-open slab $(0,T] \times {\bf R}^3$; and
• For every $k \geq 0$, $\nabla^k_x u, \nabla^k_x p, \nabla^k_x \partial_t u$ exist and are continuous on the full slab $[0,T] \times {\bf R}^3$.

Thus, an almost smooth solution is the same concept as a smooth solution, except that at time zero, the velocity field is only $C^1_t C^\infty_x$, and the pressure field is only $C^0_t C^\infty_x$.  This is still enough regularity to interpret the Navier-Stokes equation (2) in a classical manner, but falls slightly short of full smoothness.

(I had already introduced this notion of almost smoothness in the more general setting of smooth finite energy solutions in the first draft of this paper, but had failed to realise that it was also necessary in the smooth $H^1$ setting.)

One can now “fix” the global regularity conjectures for Navier-Stokes in the smooth $H^1$ or smooth finite energy setting by requiring the solutions to merely be almost smooth instead of smooth.  Once one does so, the results in my paper then work as before: roughly speaking, if one knows that Schwartz data produces smooth solutions, one can conclude that smooth $H^1$ or smooth finite energy data produces almost smooth solutions (and the paper now contains counterexamples to show that one does not always have smooth solutions in this category).

The diagram of implications between conjectures has been adjusted to reflect this issue, and now reads as follows:

Christian Elsholtz and I have recently finished our joint paper “Counting the number of solutions to the Erdös-Straus equation on unit fractions“, submitted to the Journal of the Australian Mathematical Society. This supersedes my previous paper on the subject, by obtaining stronger and more general results. (The paper is currently in the process of being resubmitted to the arXiv, and should appear at this link within a few days.)

As with the previous paper, the main object of study is the number ${f(n)}$ of solutions to the Diophantine equation

$\displaystyle \frac{4}{n} = \frac{1}{x} + \frac{1}{y} + \frac{1}{z} \ \ \ \ \ (1)$

with ${x,y,z}$ positive integers. The Erdös-Straus conjecture asserts that ${f(n)>0}$ for all ${n>1}$. Since ${f(nm) \geq f(n)}$ for all positive integers ${n,m}$, it suffices to show that ${f(p)>0}$ for all primes ${p}$.
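For small ${n}$, these counts can be verified by brute force. Here is a minimal sketch counting solutions normalised so that ${x \leq y \leq z}$ (note the counts in the paper are over ordered triples, so the two normalisations differ by permutation factors):

```python
from fractions import Fraction

def count_esc(n):
    """Count solutions of 4/n = 1/x + 1/y + 1/z in positive
    integers with x <= y <= z, using exact rational arithmetic."""
    target = Fraction(4, n)
    count = 0
    x = 1
    while Fraction(3, x) >= target:          # x smallest, so 1/x >= target/3
        rem1 = target - Fraction(1, x)
        if rem1 > 0:
            y = x
            while Fraction(2, y) >= rem1:    # y <= z, so 1/y >= rem1/2
                rem2 = rem1 - Fraction(1, y)
                if rem2 > 0 and rem2.numerator == 1 and rem2.denominator >= y:
                    count += 1               # z = rem2.denominator
                y += 1
        x += 1
    return count

# e.g. count_esc(3) finds the solutions (1,4,12), (1,6,6) and (2,2,3)
```

In particular, a quick loop confirms the conjecture for every `n` up to any modest bound, consistent with the much deeper numerical verifications in the literature.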

We single out two special types of solutions: Type I solutions, in which ${x}$ is divisible by ${n}$ and ${y,z}$ are coprime to ${n}$, and Type II solutions, in which ${x}$ is coprime to ${n}$ and ${y,z}$ are divisible by ${n}$. Let ${f_I(n), f_{II}(n)}$ denote the number of Type I and Type II solutions respectively. For any ${n}$, one has

$\displaystyle f(n) \geq 3 f_I(n) + 3 f_{II}(n),$

with equality when ${n}$ is an odd prime ${p}$. Thus, to prove the Erdös-Straus conjecture, it suffices to show that at least one of ${f_I(p)}$, ${f_{II}(p)}$ is positive whenever ${p}$ is an odd prime.

Our first main results are the asymptotics

$\displaystyle N \log^3 N \ll \sum_{n \leq N} f_I(n) \ll N \log^3 N$

$\displaystyle N \log^3 N \ll \sum_{n \leq N} f_{II}(n) \ll N \log^3 N$

$\displaystyle N \log^2 N \ll \sum_{p \leq N} f_I(p) \ll N \log^2 N \log\log N$

$\displaystyle N \log^2 N \ll \sum_{p \leq N} f_{II}(p) \ll N \log^2 N.$

This improves upon the results in the previous paper, which only established

$\displaystyle N \log^2 N \ll \sum_{p \leq N} f_I(p) \ll N \exp(O( \frac{\log N}{\log\log N} ))$

and

$\displaystyle N \log^2 N \ll \sum_{p \leq N} f_{II}(p) \ll N \log^2 N \log \log N.$

The double logarithmic factor in the upper bound for ${\sum_{p \leq N} f_I(p)}$ is artificial (arising from the inefficiency in the Brun-Titchmarsh inequality on very short progressions) but we do not know how to remove it.

The methods are similar to those in the previous paper (which were also independently discovered in unpublished work of Elsholtz and Heath-Brown), but with the additional input of the Erdös divisor bound on expressions of the form ${\sum_{n \leq N} \tau(P(n))}$ for polynomials ${P}$, discussed in this recent blog post. (Actually, we need to tighten Erdös’ bound somewhat, to obtain some uniformity in the bounds even as the coefficients of ${P}$ become large, but this turns out to be achievable by going through the original arguments of Erdös more carefully.)

We also note an observation of Heath-Brown, that in our notation gives the lower bound

$\displaystyle N \log^6 N \ll \sum_{n \leq N} f(n);$

thus, we see that for typical ${n}$, most solutions to the Erdös-Straus equation are not of Type I or Type II, in contrast to the case when ${n}$ is prime.

We also have a number of other new results. We find a way to systematically unify all the previously known parameterisations of solutions to the Erdös-Straus equation, by lifting the Cayley-type surface ${\{ (x,y,z): \frac{4}{n} = \frac{1}{x} + \frac{1}{y} + \frac{1}{z} \}}$ to a certain three-dimensional variety in six-dimensional affine space, in such a way that integer points in the former arise from integer points in the latter. Each of the previously known characterisations of solutions then corresponds to a different choice of coordinates on this variety. (This point of view was also adopted in a paper of Heath-Brown, who interprets this lifted variety as the universal torsor of the Cayley surface.) By optimising between these parameterisations and exploiting the divisor bound, we obtain some bounds on the worst-case behaviour of ${f_I(n)}$ and ${f_{II}(n)}$, namely

$\displaystyle f_I(n) \ll n^{3/5 + O(1/\log \log n)}$

and

$\displaystyle f_{II}(n) \ll n^{2/5 + O(1/\log \log n)},$

which should be compared to a recent previous bound ${f(n) \ll n^{2/3 + O(1/\log \log n)}}$ of Browning and Elsholtz. In the other direction, we show that ${f(n) \gg n^{(3+o(1))/\log\log n}}$ for infinitely many ${n}$, and ${f(p) \gg \log^{\frac{\log 3}{2}-o(1)} p}$ for almost all primes ${p}$. Here, the main tools are some bounds for the representation of a rational as a sum of two unit fractions in the above-mentioned work of Browning and Elsholtz, and also the Turán-Kubilius inequality.

We also completely classify all the congruence classes that can be solved by polynomials, completing the partial list discussed in the previous post. Specifically, the Erdös-Straus conjecture is true for ${n}$ whenever one of the following congruence-type conditions is satisfied:

1. ${n = -f \mod 4ad}$, where ${a,d,f \in {\bf N}}$ are such that ${f|4a^2 d+1}$.
2. ${n = -f \mod 4ac}$ and ${n = -\frac{c}{a} \mod f}$, where ${a,c,f \in {\bf N}}$ are such that ${(4ac,f)=1}$.
3. ${n = -f \mod 4cd}$ and ${n^2 = -4c^2d \mod f}$, where ${c,d,f \in {\bf N}}$ are such that ${(4cd,f)=1}$.
4. ${n = -\frac{1}{e} \mod 4ab}$ or ${n = -e \mod 4ab}$, where ${a,b,e \in {\bf N}}$ are such that ${e|a+b}$ and ${(e,4ab)=1}$.
5. ${n = -4a^2d \mod f}$, where ${a,d,f \in {\bf N}}$ are such that ${4ad|f+1}$.
6. ${n = -4a^2d-e \mod 4ade}$, where ${a,d,e \in {\bf N}}$ are such that ${(4ad,e)=1}$.

In principle, this suggests a way to extend the existing verification of the Erdös-Straus conjecture beyond the current range of ${10^{14}}$ by collecting all congruences to small moduli (e.g. up to ${10^6}$), and then using this to sieve out the primes up to a given size.
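To illustrate the sieving idea, here is a small sketch using only the first congruence family above; the parameter ranges and the prime cutoff are arbitrary choices for illustration, and a serious verification would use all six families and much larger moduli:

```python
def type1_congruences(limit):
    """Collect pairs (m, r) with the property (from family 1 above) that
    the conjecture holds whenever n = r (mod m): here m = 4ad and
    r = -f (mod m), where f ranges over the divisors of 4a^2 d + 1."""
    conds = set()
    for a in range(1, limit + 1):
        for d in range(1, limit + 1):
            m = 4 * a * d
            v = 4 * a * a * d + 1
            for f in range(1, v + 1):
                if v % f == 0:
                    conds.add((m, (-f) % m))
    return conds

def primes_up_to(N):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [p for p in range(2, N + 1) if sieve[p]]

conds = type1_congruences(6)
survivors = [p for p in primes_up_to(10 ** 4)
             if not any(p % m == r for (m, r) in conds)]
```

Since the choice ${a=d=f=1}$ already covers the classical residue class ${n \equiv 3 \pmod 4}$, every odd prime surviving this sieve is ${1 \pmod 4}$; the remaining families would thin the survivors further.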

Finally, we begin a study of the more general equation

$\displaystyle \frac{m}{n} = \frac{1}{n_1}+\ldots+\frac{1}{n_k} \ \ \ \ \ (2)$

where ${m > k \geq 3}$ are fixed. We can obtain a partial analogue of our main bounds for the ${m=4,k=3}$ case, namely that

$\displaystyle \sum_{n \leq N} f_{m,k,II}(n) \gg N \log^{2^{k-1}-1} N$

and

$\displaystyle \sum_{p \leq N} f_{m,k,II}(p) \gg N \log^{2^{k-1}-2} N / \log\log N$

where ${f_{m,k,II}(n)}$ denotes the number of solutions to (2) which are of “Type II” in the sense that ${n_2,\ldots,n_k}$ are all divisible by ${n}$. However, we do not believe our bounds to be sharp in the large ${k}$ regime, though they do show that the expected number of solutions to (2) should grow rapidly in ${k}$.

[This is a  (lightly edited) repost of an old blog post of mine, which had attracted over 400 comments, and as such was becoming difficult to load; I request that people wishing to comment on that puzzle use this fresh post instead.  -T]

This  is one of my favorite logic puzzles, because of the presence of two highly plausible, but contradictory, solutions to the puzzle.  Resolving this apparent contradiction requires very clear thinking about the nature of knowledge; but I won’t spoil the resolution here, and will simply describe the logic puzzle and its two putative solutions.  (Readers, though, are welcome to discuss solutions in the comments.)

— The logic puzzle —

There is an island upon which a tribe resides. The tribe consists of 1000 people, with various eye colours. Yet, their religion forbids them to know their own eye color, or even to discuss the topic; thus, each resident can (and does) see the eye colors of all other residents, but has no way of discovering his or her own (there are no reflective surfaces). If a tribesperson does discover his or her own eye color, then their religion compels them to commit ritual suicide at noon the following day in the village square for all to witness. All the tribespeople are highly logical and devout, and they all know that each other is also highly logical and devout (and they all know that they all know that each other is highly logical and devout, and so forth).

Of the 1000 islanders, it turns out that 100 of them have blue eyes and 900 of them have brown eyes, although the islanders are not initially aware of these statistics (each of them can of course only see 999 of the 1000 tribespeople).

One day, a blue-eyed foreigner visits the island and wins the complete trust of the tribe.

One evening, he addresses the entire tribe to thank them for their hospitality.

However, not knowing the customs, the foreigner makes the mistake of mentioning eye color in his address, remarking “how unusual it is to see another blue-eyed person like myself in this region of the world”.

What effect, if anything, does this faux pas have on the tribe?

Note 1:  For the purposes of this logic puzzle, “highly logical” means that any conclusion that can be logically deduced from the information and observations available to an islander will automatically be known to that islander.

Note 2: Bear in mind that this is a logic puzzle, rather than a description of a real-world scenario.  The puzzle is not to determine whether the scenario is plausible (indeed, it is extremely implausible) or whether one can find a legalistic loophole in the wording of the scenario that allows for some sort of degenerate solution; instead, the puzzle is to determine (holding to the spirit of the puzzle, and not just to the letter) which of the solutions given below (if any) are correct, and if one solution is valid, to correctly explain why the other solution is invalid.  (One could also resolve the logic puzzle by showing that the assumptions of the puzzle are logically inconsistent or not well-defined.  However, merely demonstrating that the assumptions of the puzzle are highly unlikely, as opposed to logically impossible to satisfy, is not sufficient to resolve the puzzle.)

Note 3: An essentially equivalent version of the logic puzzle is also given at the xkcd web site.  Many other versions of this puzzle can be found in many places; I myself heard of the puzzle as a child, though I don’t recall the precise source.

Below the fold are the two putative solutions to the logic puzzle.  If you have not seen the puzzle before, I recommend you try to solve it first before reading either solution.

Last year, Emmanuel Breuillard, Ben Green, Bob Guralnick, and I wrote a paper entitled “Strongly dense free subgroups of semisimple Lie groups“. The main theorem in that paper asserted that given any semisimple algebraic group ${G(k)}$ over an uncountable algebraically closed field ${k}$, there existed a free subgroup ${\Gamma}$ which was strongly dense in the sense that any non-abelian subgroup of ${\Gamma}$ was Zariski dense in ${G(k)}$. This type of result is useful for establishing expansion in finite simple groups of Lie type, as we will discuss in a subsequent paper.

An essentially equivalent formulation of the main result is that if ${w_1, w_2 \in F_2}$ are two non-commuting elements of the free group ${F_2}$ on two generators, and ${(a, b)}$ is a generic pair of elements in ${G(k) \times G(k)}$, then ${w_1(a,b)}$ and ${w_2(a,b)}$ are not contained in any proper closed algebraic subgroup ${H}$ of ${G(k)}$. Here, “generic” means “outside of at most countably many proper subvarieties”. In most cases, one expects that if ${(a, b)}$ are generically drawn from ${G(k) \times G(k)}$, then ${(w_1(a,b), w_2(a,b))}$ will also be generically drawn from ${G(k) \times G(k)}$, but this is not always the case, which is a key source of difficulty in the paper. For instance, if ${w_2}$ is conjugate to ${w_1}$ in ${F_2}$, then ${w_1(a,b)}$ and ${w_2(a,b)}$ must be conjugate in ${G(k)}$ and so the pair ${(w_1(a,b), w_2(a,b))}$ lie in a proper subvariety of ${G(k) \times G(k)}$. It is currently an open question to determine all the pairs ${w_1, w_2}$ of words for which ${(w_1(a,b), w_2(a,b))}$ is not generic for generic ${a,b}$ (or equivalently, the double word map ${(a,b) \mapsto (w_1(a,b),w_2(a,b))}$ is not dominant).
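The conjugacy obstruction mentioned above can be seen concretely in a quick numerical sketch (using generic 2×2 real matrices as a stand-in for a pair in ${G(k) \times G(k)}$): taking ${w_1 = ab}$ and the conjugate word ${w_2 = b(ab)b^{-1}}$, the evaluated pair always satisfies the polynomial relation ${\hbox{tr}\, w_1(a,b) = \hbox{tr}\, w_2(a,b)}$, so it is confined to a proper subvariety:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    # a "generic" pair of matrices (invertible with probability 1)
    a = rng.standard_normal((2, 2))
    b = rng.standard_normal((2, 2))
    w1 = a @ b                           # w_1(a,b) = ab
    w2 = b @ (a @ b) @ np.linalg.inv(b)  # w_2(a,b) = b(ab)b^{-1}
    # conjugate matrices share a trace, so the pair (w1, w2) lies in the
    # proper subvariety {tr X = tr Y} of G x G
    assert abs(np.trace(w1) - np.trace(w2)) < 1e-6
```

Of course this only exhibits the obstruction; classifying all word pairs for which the double word map fails to be dominant is the open question stated above.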

The main strategy of proof was as follows. It is not difficult to reduce to the case when ${G}$ is simple. Suppose for contradiction that we could find two non-commuting words ${w_1, w_2}$ such that ${w_1(a,b), w_2(a,b)}$ were generically trapped in a proper closed algebraic subgroup. As it turns out, there are only finitely many conjugacy classes of such groups, and so one can assume that ${w_1(a,b), w_2(a,b)}$ were generically trapped in a conjugate ${H^g}$ of a fixed proper closed algebraic subgroup ${H}$. One can show that ${w_1(a,b)}$, ${w_2(a,b)}$, and ${[w_1(a,b),w_2(a,b)]}$ are generically regular semisimple, which implies that ${H}$ is a maximal rank semisimple subgroup. The key step was then to find another proper semisimple subgroup ${H'}$ of ${G}$ which was not a degeneration of ${H}$, by which we mean that there did not exist a pair ${(x,y)}$ in the Zariski closure ${\overline{\bigcup_{g \in G} H^g \times H^g}}$ of the products of conjugates of ${H}$, such that ${x, y}$ generated a Zariski-dense subgroup of ${H'}$. This is enough to establish the theorem, because we could use an induction hypothesis to find ${a,b}$ in ${H'}$ (and hence in ${G(k)}$) such that ${w_1(a,b), w_2(a,b)}$ generated a Zariski-dense subgroup of ${H'}$, which contradicts the hypothesis that ${(w_1(a,b),w_2(a,b))}$ was trapped in ${\bigcup_{g \in G} H^g \times H^g}$ for generic ${(a,b)}$ (and hence in ${\overline{\bigcup_{g \in G} H^g \times H^g}}$ for all ${(a,b)}$).

To illustrate the concept of a degeneration, take ${G(k) = SO(5)}$ and let ${H = SO(3) \times SO(2)}$ be the stabiliser of a non-degenerate ${2}$-space in ${k^5}$. All other stabilisers of non-degenerate ${2}$-spaces are conjugate to ${H}$. However, stabilisers of degenerate ${2}$-spaces are not conjugate to ${H}$, but are still degenerations of ${H}$. For instance, the stabiliser of a totally singular ${2}$-space (which is isomorphic to the affine group on ${k^2}$, extended by ${k}$) is a degeneration of ${H}$.

A significant portion of the paper was then devoted to verifying that for each simple algebraic group ${G}$, and each maximal rank proper semisimple subgroup ${H}$ of ${G}$, one could find another proper semisimple subgroup ${H'}$ which was not a degeneration of ${H}$; roughly speaking, this means that ${H'}$ is so “different” from ${H}$ that no conjugate of ${H}$ can come close to covering ${H'}$. This required using the standard classification of algebraic groups via Dynkin diagrams, together with knowledge of the various semisimple subgroups of these groups and their representations (we used the latter as obstructions to degeneration; for instance, one can show that a reducible representation cannot degenerate to an irreducible one).

During the refereeing process for this paper, we discovered that there was precisely one family of simple algebraic groups for which this strategy did not actually work, namely the group ${G = Sp(4) = Spin(5)}$ (or the group ${SO(5)}$ that is double-covered by this group) in characteristic ${3}$. This group (which has Dynkin diagram ${B_2=C_2}$, as discussed in this previous post) has one maximal rank proper semisimple subgroup up to conjugacy, namely ${SO(4)}$, which is the stabiliser of a line in ${k^5}$. To find a proper semisimple group ${H'}$ that is not a degeneration of this group, we basically need to find a subgroup ${H'}$ that does not stabilise any line in ${k^5}$. In characteristic larger than three (or characteristic zero), one can proceed by using the action of ${SL_2(k)}$ on the five-dimensional space ${\hbox{Sym}^4(k^2)}$ of homogeneous degree four polynomials on ${k^2}$, which preserves a non-degenerate symmetric form (the four-fold tensor power of the area form on ${k^2}$) and thus embeds into ${SO(5)}$; as no polynomial is fixed by all of ${SL_2(k)}$, we see that this copy of ${SL_2(k)}$ is not a degeneration of ${H}$.

Unfortunately, in characteristics two and three, the symmetric form on ${\hbox{Sym}^4(k^2)}$ degenerates, and this embedding is lost. In the characteristic two case, one can proceed by using the characteristic ${2}$ fact that ${SO(5)}$ is isomorphic to ${Sp(4)}$ (because in characteristic two, the space of null vectors is a hyperplane, and the symmetric form becomes symplectic on this hyperplane), and thus has an additional maximal rank proper semisimple subgroup ${Sp(2) \times Sp(2)}$ which is not conjugate to the ${SO(4)}$ subgroup. But in characteristic three, it turns out that there are no further semisimple subgroups of ${SO(5)}$ that are not already contained in a conjugate of the ${SO(4)}$. (This is not a difficulty for larger groups such as ${SO(6)}$ or ${SO(7)}$, where there are plenty of other semisimple groups to utilise; it is only this smallish group ${SO(5)}$ that has the misfortune of having exactly one maximal rank proper semisimple group to play with, and not enough other semisimples lying around in characteristic three.)
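The arithmetic behind this degeneration is easy to check directly. In a common normalisation (an assumption on my part; conventions for the induced form vary), the Gram matrix of the form on ${\hbox{Sym}^4(k^2)}$ in the monomial basis is antidiagonal with entries ${\pm \binom{4}{a}}$ up to an overall scalar, so the form degenerates in characteristic ${p}$ exactly when some ${\binom{4}{a}}$ vanishes mod ${p}$. The function names below are purely illustrative:

```python
from math import comb

def gram_antidiagonal(n):
    # Antidiagonal Gram entries (up to sign and overall scaling) of the
    # form induced on Sym^n(k^2) by the area form: binom(n, a).
    return [comb(n, a) for a in range(n + 1)]

def degenerate_in_char(n, p):
    # The form degenerates mod p iff some antidiagonal entry vanishes mod p.
    return any(e % p == 0 for e in gram_antidiagonal(n))

print(gram_antidiagonal(4))      # [1, 4, 6, 4, 1]
print(degenerate_in_char(4, 2))  # True  (char 2: the entries 4 and 6 vanish)
print(degenerate_in_char(4, 3))  # True  (char 3: the entry 6 vanishes)
print(degenerate_in_char(4, 5))  # False (char >= 5: the form survives)
```

Since the product ${1 \cdot 4 \cdot 6 \cdot 4 \cdot 1 = 96 = 2^5 \cdot 3}$ has only ${2}$ and ${3}$ as prime factors, characteristics two and three are precisely the exceptional cases, consistent with the discussion above.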

As a consequence of this issue, our argument does not actually work in the case when the characteristic is three and the semisimple group ${G}$ contains a copy of ${SO(5)}$ (or ${Sp(4)}$), and we have had to modify our paper to delete this case from our results. We believe that such groups still do contain strongly dense free subgroups, but this appears to be just out of reach of our current method.

One thing that this experience has taught me is that algebraic groups behave somewhat pathologically in low characteristic; in particular, intuition coming from the characteristic zero case can become unreliable in characteristic two or three.

The first volume of my 2009 blog book, “An epsilon of room”, has now been published by the AMS, as part of the Graduate Studies in Mathematics series.  (So I finally have a book whose cover is at least partially in yellow, which for some reason seems to be the traditional colour for mathematics texts.) This volume contains the material from my 245B and 245C classes, and can thus be viewed as a second text in graduate real analysis.  (I plan to devote one volume of the 2010 blog book to the material from the 245A class I just taught; it would thus serve as a first text in graduate real analysis, complementing this volume.)

The second volume, which covers a wide range of other topics, should also be published in the near future.

One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion.  For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let $n$ be a natural number such that …”), which may or may not apply to the particular value of $n$ one may have in mind.  Or, if one ever argues by dividing into separate cases (e.g. “Case 1: $n$ is even. … Case 2: $n$ is odd.  …”), then for any given $n$, at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives.     But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or reductio ad absurdum) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that $\sqrt{2}$ is equal to the ratio $p/q$ of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.

Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives.  This can lead to confused exchanges such as the following:

Lecturer: “Theorem.  Let $p$ be a prime number.  Then…”

Student: “But how do you know that $p$ is a prime number?  Couldn’t it be composite?”

or

Lecturer: “Now we see what the function $f$ does when we give it the input of $x+dx$ instead.  …”

Student: “But didn’t you just say that the input was equal to $x$ just a moment ago?”

This is not to say that counterfactual thinking is not encountered at all outside of mathematics.  For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history.  Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result.  The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing.  Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning.  For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it.  And if the proof of (say) a lemma is analogous to a short story, then the statement of that lemma can be viewed as analogous to the moral of that story.

Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses.   Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.

Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics.  Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set ${\bf R}$ of real numbers is uncountable.  This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps because of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics.  (The only two other examples I know of that come close to doing this are the fact that the real number $0.999\ldots$ is equal to 1, and the solution to the blue-eyed islanders puzzle.)
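It is worth noting that the diagonal construction at the heart of Cantor’s argument is itself perfectly constructive, and can even be run mechanically. Here is a minimal sketch (the representation of binary sequences as functions, and the particular example listing, are choices of mine for illustration):

```python
# Given any listing of binary sequences (the n-th sequence being a function
# from naturals to {0, 1}), produce a binary sequence that differs from the
# n-th listed sequence at position n -- so it cannot appear in the listing.

def diagonal(seqs):
    # seqs: a function taking n to the n-th binary sequence in the listing.
    return lambda n: 1 - seqs(n)(n)

# Hypothetical example listing: the n-th sequence is 1 at position n, 0 elsewhere.
listing = lambda n: (lambda m: 1 if m == n else 0)
d = diagonal(listing)

# d differs from the n-th listed sequence at position n, for every n we check:
print(all(d(n) != listing(n)(n) for n in range(100)))  # True
```

In this form, no impossible hypothesis is entertained: one simply exhibits, for any given listing, a sequence that the listing misses.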

Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument: that any object so ridiculously overpowered that it could be used to “defeat” its own existence cannot actually exist.  Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but also Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, and the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.

I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction.  Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.

It was pointed out to me, though (originally with regard to Euclid’s theorem, but the same point in fact applies to the other results I presented), that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses.  This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter.  This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.

For instance, consider the very first no-self-defeating result presented in the previous post:

Proposition 1 (No largest natural number). There does not exist a natural number $N$ that is larger than all the other natural numbers.

This is formulated in the “no self-defeating object” formulation.  But it has a logically equivalent “any object can be defeated” form:

Proposition 1′. Given any natural number $N$, one can find another natural number $N'$ which is larger than $N$.

Proof. Take $N' := N+1$. $\Box$

While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take $N = 100$, so that the proof gives $N'=101$), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number).  So there is a sense in which Proposition 1′ is more “non-counterfactual” or “constructive” than the “counterfactual” Proposition 1.

In a similar spirit, Euclid’s theorem (which we give using the numbering from the previous post),

Proposition 3. There are infinitely many primes.

can be recast in “all objects can be defeated” form as

Proposition 3′.  Let $p_1,\ldots,p_n$ be a collection of primes.   Then there exists a prime $q$ which is distinct from any of the primes $p_1,\ldots,p_n$.

Proof. Take $q$ to be any prime factor of $p_1 \ldots p_n + 1$ (for instance, one could take the smallest prime factor, if one wished to be completely concrete).   Since $p_1 \ldots p_n + 1$ is not divisible by any of the primes $p_1,\ldots,p_n$, $q$ must be distinct from all of these primes.  $\Box$

One could argue that there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that $q$ was equal to one of the $p_1,\ldots,p_n$), but the proposition itself is not inherently counterfactual, as it does not make as patently impossible a hypothesis as a finite enumeration of the primes.  Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem than the proof of Proposition 3 that is usually given today.  Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say $2, 3, 5$, and it will actually exhibit a prime not in that list (in this case, $31$).  The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.
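Indeed, the proof of Proposition 3′ is concrete enough to run as code. Here is a minimal sketch (the function name is hypothetical; following the proof, we take the smallest prime factor of the product plus one, to be completely concrete):

```python
def euclid_new_prime(primes):
    # Given a finite list of primes, form their product plus one and return
    # its smallest factor greater than 1; this factor is automatically prime,
    # and cannot be any prime in the list (none of them divide product + 1).
    m = 1
    for p in primes:
        m *= p
    m += 1
    q = 2
    while m % q != 0:
        q += 1
    return q

print(euclid_new_prime([2, 3, 5]))  # 31 (matching the example in the text)
print(euclid_new_prime([2, 7]))     # 3  (since 2*7 + 1 = 15 = 3 * 5)
```

Note that the new prime need not be the product plus one itself, as the second example shows; the proof only guarantees some prime factor outside the original list.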

[Note: the article below may make more sense if one first reviews the previous blog post on the “no self-defeating object”.  For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]