Daniel Kane and I have just uploaded to the arXiv our paper “A bound on partitioning clusters”, submitted to the Electronic Journal of Combinatorics. In this short and elementary paper, we consider a question that arose from biomathematical applications: given a finite family ${X}$ of sets (or “clusters”), how many ways can there be of partitioning a set ${A \in X}$ in this family as the disjoint union ${A = A_1 \uplus A_2}$ of two other sets ${A_1, A_2}$ in this family? That is to say, what is the best upper bound one can place on the quantity

$\displaystyle | \{ (A,A_1,A_2) \in X^3: A = A_1 \uplus A_2 \}|$

in terms of the cardinality ${|X|}$ of ${X}$? A trivial upper bound would be ${|X|^2}$, since this is the number of possible pairs ${(A_1,A_2)}$, and ${A_1,A_2}$ clearly determine ${A}$. In our paper, we establish the improved bound

$\displaystyle | \{ (A,A_1,A_2) \in X^3: A = A_1 \uplus A_2 \}| \leq |X|^{3/p}$

where ${p}$ is the somewhat strange exponent

$\displaystyle p := \log_3 \frac{27}{4} = 1.73814\dots, \ \ \ \ \ (1)$

so that ${3/p = 1.72598\dots}$. Furthermore, this exponent is best possible!

Actually, the latter claim is quite easy to show: one takes ${X}$ to be all the subsets of ${\{1,\dots,n\}}$ of cardinality either ${n/3}$ or ${2n/3}$, for ${n}$ a multiple of ${3}$, and the claim follows readily from Stirling’s formula. So it is perhaps the former claim that is more interesting (since many combinatorial proof techniques, such as those based on inequalities such as the Cauchy-Schwarz inequality, tend to produce exponents that are rational or at least algebraic). We follow the common, though unintuitive, trick of generalising a problem to make it simpler. Firstly, one generalises the bound to the “trilinear” bound

$\displaystyle | \{ (A_1,A_2,A_3) \in X_1 \times X_2 \times X_3: A_3 = A_1 \uplus A_2 \}|$

$\displaystyle \leq |X_1|^{1/p} |X_2|^{1/p} |X_3|^{1/p}$

for arbitrary finite collections ${X_1,X_2,X_3}$ of sets. One can place all the sets in ${X_1,X_2,X_3}$ inside a single finite set such as ${\{1,\dots,n\}}$, and then by replacing every set ${A_3}$ in ${X_3}$ by its complement in ${\{1,\dots,n\}}$, one can phrase the inequality in the equivalent form

$\displaystyle | \{ (A_1,A_2,A_3) \in X_1 \times X_2 \times X_3: \{1,\dots,n\} =A_1 \uplus A_2 \uplus A_3 \}|$

$\displaystyle \leq |X_1|^{1/p} |X_2|^{1/p} |X_3|^{1/p}$

for arbitrary collections ${X_1,X_2,X_3}$ of subsets of ${\{1,\dots,n\}}$. We generalise further by turning sets into functions, replacing the estimate with the slightly stronger convolution estimate

$\displaystyle f_1 * f_2 * f_3 (1,\dots,1) \leq \|f_1\|_{\ell^p(\{0,1\}^n)} \|f_2\|_{\ell^p(\{0,1\}^n)} \|f_3\|_{\ell^p(\{0,1\}^n)}$

for arbitrary functions ${f_1,f_2,f_3}$ on the Hamming cube ${\{0,1\}^n}$, where the convolution is on the integer lattice ${\bf Z}^n$ rather than on the finite field vector space ${\bf F}_2^n$. The advantage of working in this general setting is that it becomes very easy to apply induction on the dimension ${n}$; indeed, to prove this estimate for arbitrary ${n}$ it suffices to do so for ${n=1}$. This reduces matters to establishing the elementary inequality

$\displaystyle (ab(1-c))^{1/p} + (bc(1-a))^{1/p} + (ca(1-b))^{1/p} \leq 1$

for all ${0 \leq a,b,c \leq 1}$, which can be done by a combination of undergraduate multivariable calculus and a little bit of numerical computation. (The left-hand side turns out to have local maxima at ${(1,1,0), (1,0,1), (0,1,1), (2/3,2/3,2/3)}$, with the latter being the cause of the numerology (1).)
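As a rough numerical sanity check of this elementary inequality (not a substitute for the calculus argument in the paper), one can evaluate the left-hand side on a grid. The sketch below uses only the exponent ${p}$ from (1); the helper names `lhs` and `grid_max` and the grid resolution are illustrative choices, not taken from the paper.

```python
import math

P = math.log(27 / 4) / math.log(3)  # the exponent p = log_3(27/4) from (1)

def lhs(a, b, c, p=P):
    """Left-hand side (ab(1-c))^{1/p} + (bc(1-a))^{1/p} + (ca(1-b))^{1/p}."""
    return ((a * b * (1 - c)) ** (1 / p)
            + (b * c * (1 - a)) ** (1 / p)
            + (c * a * (1 - b)) ** (1 / p))

def grid_max(steps=30):
    """Maximum of the left-hand side over a uniform grid in [0,1]^3."""
    pts = [i / steps for i in range(steps + 1)]
    return max(lhs(a, b, c) for a in pts for b in pts for c in pts)

print("p   =", P)
print("3/p =", 3 / P)
print("value at (2/3,2/3,2/3):", lhs(2 / 3, 2 / 3, 2 / 3))
print("grid maximum:", grid_max())
# With a larger exponent the inequality already fails at (2/3,2/3,2/3):
print("same point with p = 1.8:", lhs(2 / 3, 2 / 3, 2 / 3, p=1.8))
```

At ${(2/3,2/3,2/3)}$ each of the three terms equals ${(4/27)^{1/p} = 1/3}$, which is exactly the relation ${3^{-p} = 4/27}$ defining ${p}$.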

The same sort of argument also gives an energy bound

$\displaystyle E(A,A) \leq |A|^{\log_2 6}$

for any subset ${A \subset \{0,1\}^n}$ of the Hamming cube, where

$\displaystyle E(A,A) := |\{(a_1,a_2,a_3,a_4) \in A^4: a_1+a_2 = a_3 + a_4 \}|$

is the additive energy of ${A}$. The example ${A = \{0,1\}^n}$ shows that the exponent ${\log_2 6}$ cannot be improved.
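The two extremal examples mentioned in this post can be checked by direct counting. The following is a minimal sketch with illustrative function names of my own choosing: it counts the partition triples for the family of subsets of ${\{1,\dots,n\}}$ of cardinality ${n/3}$ or ${2n/3}$, and computes the additive energy of subsets of the Hamming cube (with sums taken in ${{\bf Z}^n}$) by brute force.

```python
import itertools
import math
from collections import Counter

def extremal_counts(n):
    """(number of triples, |X|) for X = subsets of {1..n} of size n/3 or 2n/3."""
    assert n % 3 == 0 and n > 0
    m = n // 3
    # A must have size 2n/3 and split as an ordered pair (A_1, A_2) of sets of size n/3.
    triples = math.comb(n, 2 * m) * math.comb(2 * m, m)
    size_X = math.comb(n, m) + math.comb(n, 2 * m)
    return triples, size_X

def exponent_ratio(n):
    """log(#triples) / log|X|, which should approach 3/p from below."""
    triples, size_X = extremal_counts(n)
    return math.log(triples) / math.log(size_X)

def additive_energy(A):
    """E(A,A) = #{(a1,a2,a3,a4) in A^4 : a1 + a2 = a3 + a4}."""
    sums = Counter(tuple(s + t for s, t in zip(a1, a2)) for a1 in A for a2 in A)
    return sum(r * r for r in sums.values())

def cube(n):
    """The Hamming cube {0,1}^n as a list of tuples."""
    return list(itertools.product((0, 1), repeat=n))

p = math.log(27 / 4) / math.log(3)
for n in (30, 300, 3000):
    print(n, exponent_ratio(n), "vs 3/p =", 3 / p)
for n in range(1, 5):
    print(n, additive_energy(cube(n)), "vs 6^n =", 6 ** n)
```

The convergence of the first ratio to ${3/p}$ is slow because of the polynomial factors in Stirling's formula, which is why fairly large ${n}$ is used.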

I’ve just uploaded to the arXiv my paper Finite time blowup for a supercritical defocusing nonlinear Schrödinger system, submitted to Analysis and PDE. This paper is an analogue of a recent paper of mine in which I constructed a supercritical defocusing nonlinear wave (NLW) system ${-\partial_{tt} u + \Delta u = (\nabla F)(u)}$ which exhibited smooth solutions that developed singularities in finite time. Here, we achieve essentially the same conclusion for the (inhomogeneous) supercritical defocusing nonlinear Schrödinger (NLS) equation

$\displaystyle i \partial_t u + \Delta u = (\nabla F)(u) + G \ \ \ \ \ (1)$

where ${u: {\bf R} \times {\bf R}^d \rightarrow {\bf C}^m}$ is now a system of scalar fields, ${F: {\bf C}^m \rightarrow {\bf R}}$ is a potential which is strictly positive and homogeneous of degree ${p+1}$ (and invariant under phase rotations ${u \mapsto e^{i\theta} u}$), and ${G: {\bf R} \times {\bf R}^d \rightarrow {\bf C}^m}$ is a smooth compactly supported forcing term, needed for technical reasons.

To oversimplify somewhat, the equation (1) is known to be globally regular in the energy-subcritical case when ${d \leq 2}$, or when ${d \geq 3}$ and ${p < 1+\frac{4}{d-2}}$; global regularity is also known (but is significantly more difficult to establish) in the energy-critical case when ${d \geq 3}$ and ${p = 1 +\frac{4}{d-2}}$. (This is an oversimplification for a number of reasons, in particular in higher dimensions one only knows global well-posedness instead of global regularity. See this previous post for some exploration of this issue in the context of nonlinear wave equations.) The main result of this paper is to show that global regularity can break down in the remaining energy-supercritical case when ${d \geq 3}$ and ${p > 1 + \frac{4}{d-2}}$, at least when the target dimension ${m}$ is allowed to be sufficiently large depending on the spatial dimension ${d}$ (I did not try to achieve the optimal value of ${m}$ here, but the argument gives a value of ${m}$ that grows quadratically in ${d}$). Unfortunately, this result does not directly impact the most interesting case of the defocusing scalar NLS equation

$\displaystyle i \partial_t u + \Delta u = |u|^{p-1} u \ \ \ \ \ (2)$

in which ${m=1}$; however it does establish a rigorous barrier to any attempt to prove global regularity for the scalar NLS equation, in that such an attempt needs to crucially use some property of the scalar NLS that is not shared by the more general systems in (1). For instance, any approach that is primarily based on the conservation laws of mass, momentum, and energy (which are common to both (1) and (2)) will not be sufficient to establish global regularity of supercritical defocusing scalar NLS.
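The trichotomy of exponents described above is easy to encode. Here is a small sketch (the function name `energy_regime` is hypothetical) that classifies a pair ${(d,p)}$ relative to the threshold ${1 + \frac{4}{d-2}}$, using exact rational arithmetic so that the critical case is detected reliably.

```python
from fractions import Fraction

def energy_regime(d, p):
    """Classify the exponent p in spatial dimension d as energy-subcritical,
    energy-critical or energy-supercritical, following the thresholds in the text."""
    if d <= 2:
        return "subcritical"  # every p is energy-subcritical when d <= 2
    critical = 1 + Fraction(4, d - 2)
    p = Fraction(p)
    if p < critical:
        return "subcritical"
    if p == critical:
        return "critical"
    return "supercritical"

for d, p in [(2, 9), (3, 3), (3, 5), (3, 7), (4, 3), (6, 2)]:
    print(d, p, energy_regime(d, p))
```

For instance, the quintic exponent ${p=5}$ is energy-critical in three dimensions, and the cubic exponent ${p=3}$ is energy-critical in four dimensions.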

The method of proof in this paper is broadly similar to that in the previous paper for NLW, but with a number of additional technical complications. Both proofs begin by reducing matters to constructing a discretely self-similar solution. In the case of NLW, this solution lived on a forward light cone ${\{ (t,x): |x| \leq t \}}$ and obeyed a self-similarity

$\displaystyle u(2t, 2x) = 2^{-\frac{2}{p-1}} u(t,x).$

The ability to restrict to a light cone arose from the finite speed of propagation properties of NLW. For NLS, the solution will instead live on the domain

$\displaystyle H_d := ([0,+\infty) \times {\bf R}^d) \backslash \{(0,0)\}$

and obey a parabolic self-similarity

$\displaystyle u(4t, 2x) = 2^{-\frac{2}{p-1}} u(t,x)$

and solve the homogeneous version ${G=0}$ of (1). (The inhomogeneity ${G}$ emerges when one truncates the self-similar solution so that the initial data is compactly supported in space.) A key technical point is that ${u}$ has to be smooth everywhere in ${H_d}$, including the boundary component ${\{ (0,x): x \in {\bf R}^d \backslash \{0\}\}}$. This unfortunately rules out many of the existing constructions of self-similar solutions, which typically will have some sort of singularity at the spatial origin.
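The exponent ${\frac{2}{p-1}}$ in the parabolic self-similarity comes from the standard scaling symmetry of the equation; the following short computation is a sketch, with ${u^{(\lambda)}}$ being notation introduced here for illustration. For ${\lambda > 0}$, set

$\displaystyle u^{(\lambda)}(t,x) := \lambda^{\frac{2}{p-1}} u(\lambda^2 t, \lambda x).$

Since ${F}$ is homogeneous of degree ${p+1}$, the gradient ${\nabla F}$ is homogeneous of degree ${p}$, and so

$\displaystyle (i \partial_t + \Delta) u^{(\lambda)}(t,x) = \lambda^{\frac{2}{p-1}+2} [(i \partial_t + \Delta) u](\lambda^2 t, \lambda x)$

$\displaystyle (\nabla F)(u^{(\lambda)}(t,x)) = \lambda^{\frac{2p}{p-1}} (\nabla F)(u(\lambda^2 t, \lambda x)).$

As ${\frac{2}{p-1} + 2 = \frac{2p}{p-1}}$, the rescaled field ${u^{(\lambda)}}$ solves the homogeneous equation whenever ${u}$ does, and the parabolic self-similarity above is precisely the statement that ${u^{(2)} = u}$ on ${H_d}$.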

The remaining steps of the argument can broadly be described as quantifier elimination: one systematically eliminates each of the degrees of freedom of the problem in turn by locating the necessary and sufficient conditions required of the remaining degrees of freedom in order for the constraints of a particular degree of freedom to be satisfiable. The first such degree of freedom to eliminate is the potential function ${F}$. The task here is to determine what constraints must exist on a putative solution ${u}$ in order for there to exist a (positive, homogeneous, smooth away from origin) potential ${F}$ obeying the homogeneous NLS equation

$\displaystyle i \partial_t u + \Delta u = (\nabla F)(u).$

Firstly, the requirement that ${F}$ be homogeneous implies the Euler identity

$\displaystyle \langle (\nabla F)(u), u \rangle = (p+1) F(u)$

(where ${\langle,\rangle}$ denotes the standard real inner product on ${{\bf C}^m}$), while the requirement that ${F}$ be phase invariant similarly yields the variant identity

$\displaystyle \langle (\nabla F)(u), iu \rangle = 0,$

so if one defines the potential energy field to be ${V = F(u)}$, we obtain from the chain rule the equations

$\displaystyle \langle i \partial_t u + \Delta u, u \rangle = (p+1) V$

$\displaystyle \langle i \partial_t u + \Delta u, iu \rangle = 0$

$\displaystyle \langle i \partial_t u + \Delta u, \partial_t u \rangle = \partial_t V$

$\displaystyle \langle i \partial_t u + \Delta u, \partial_{x_j} u \rangle = \partial_{x_j} V.$

Conversely, it turns out (roughly speaking) that if one can locate fields ${u}$ and ${V}$ obeying the above equations (as well as some other technical regularity and non-degeneracy conditions), then one can find an ${F}$ with all the required properties. The first of these equations can be thought of as a definition of the potential energy field ${V}$, and the other three equations are basically disguised versions of the conservation laws of mass, energy, and momentum respectively. The construction of ${F}$ relies on a classical extension theorem of Seeley that is a relative of the Whitney extension theorem.

Now that the potential ${F}$ is eliminated, the next degree of freedom to eliminate is the solution field ${u}$. One can observe that the above equations involving ${u}$ and ${V}$ can be expressed instead in terms of ${V}$ and the Gram-type matrix ${G[u,u]}$ of ${u}$, which is a ${(2d+4) \times (2d+4)}$ matrix consisting of the inner products ${\langle D_1 u, D_2 u \rangle}$ where ${D_1,D_2}$ range amongst the ${2d+4}$ differential operators

$\displaystyle D_1,D_2 \in \{ 1, i, \partial_t, i\partial_t, \partial_{x_1},\dots,\partial_{x_d}, i\partial_{x_1}, \dots, i\partial_{x_d}\}.$

To eliminate ${u}$, one thus needs to answer the question of what properties are required of a ${(2d+4) \times (2d+4)}$ matrix ${G}$ for it to be the Gram-type matrix ${G = G[u,u]}$ of a field ${u}$. Amongst some obvious necessary conditions are that ${G}$ needs to be symmetric and positive semi-definite; there are also additional constraints coming from identities such as

$\displaystyle \partial_t \langle u, u \rangle = 2 \langle u, \partial_t u \rangle$

$\displaystyle \langle i u, \partial_t u \rangle = - \langle u, i \partial_t u \rangle$

and

$\displaystyle \partial_{x_j} \langle iu, \partial_{x_k} u \rangle - \partial_{x_k} \langle iu, \partial_{x_j} u \rangle = 2 \langle i \partial_{x_j} u, \partial_{x_k} u \rangle.$

Ideally one would like a theorem that asserts (for ${m}$ large enough) that as long as ${G}$ obeys all of the “obvious” constraints, then there exists a suitably non-degenerate map ${u}$ such that ${G = G[u,u]}$. In the case of NLW, the analogous claim was basically a consequence of the Nash embedding theorem (which can be viewed as a theorem about the solvability of the system of equations ${\langle \partial_{x_j} u, \partial_{x_k} u \rangle = g_{jk}}$ for a given positive definite symmetric set of fields ${g_{jk}}$). However, the presence of the complex structure in the NLS case poses some significant technical challenges (note for instance that the naive complex version of the Nash embedding theorem is false, due to obstructions such as Liouville’s theorem that prevent a compact complex manifold from being embeddable holomorphically in ${{\bf C}^m}$). Nevertheless, by adapting the proof of the Nash embedding theorem (in particular, the simplified proof of Gunther that avoids the need to use the Nash-Moser iteration scheme) we were able to obtain a partial complex analogue of the Nash embedding theorem that sufficed for our application; it required an artificial additional “curl-free” hypothesis on the Gram-type matrix ${G[u,u]}$, but fortunately this hypothesis ends up being automatic in our construction. Also, this version of the Nash embedding theorem is unable to prescribe the component ${\langle \partial_t u, \partial_t u \rangle}$ of the Gram-type matrix ${G[u,u]}$, but fortunately this component is not used in any of the conservation laws and so the loss of this component does not cause any difficulty.

After applying the above-mentioned Nash-embedding theorem, the task is now to locate a matrix ${G}$ obeying all the hypotheses of that theorem, as well as the conservation laws for mass, momentum, and energy (after defining the potential energy field ${V}$ in terms of ${G}$). This is quite a lot of fields and constraints, but one can cut down significantly on the degrees of freedom by requiring that ${G}$ is spherically symmetric (in a tensorial sense) and also continuously self-similar (not just discretely self-similar). Note that this hypothesis is weaker than the assertion that the original field ${u}$ is spherically symmetric and continuously self-similar; indeed we do not know if non-trivial solutions of this type actually exist. These symmetry hypotheses reduce the number of independent components of the ${(2d+4) \times (2d+4)}$ matrix ${G}$ to just six: ${g_{1,1}, g_{1,i\partial_t}, g_{1,i\partial_r}, g_{\partial_r, \partial_r}, g_{\partial_\omega, \partial_\omega}, g_{\partial_r, \partial_t}}$, which now take as their domain the ${1+1}$-dimensional space

$\displaystyle H_1 := ([0,+\infty) \times {\bf R}) \backslash \{(0,0)\}.$

One now has to construct these six fields, together with a potential energy field ${v}$, that obey a number of constraints, notably some positive definiteness constraints as well as the aforementioned conservation laws for mass, momentum, and energy.

The field ${g_{1,i\partial_t}}$ only arises in the equation for the potential ${v}$ (coming from Euler’s identity) and can easily be eliminated. Similarly, the field ${g_{\partial_r,\partial_t}}$ only makes an appearance in the current of the energy conservation law, and so can also be easily eliminated so long as the total energy is conserved. But in the energy-supercritical case, the total energy is infinite, and so it is relatively easy to eliminate the field ${g_{\partial_r, \partial_t}}$ from the problem also. This leaves us with the task of constructing just five fields ${g_{1,1}, g_{1,i\partial_r}, g_{\partial_r,\partial_r}, g_{\partial_\omega,\partial_\omega}, v}$ obeying a number of positivity conditions, symmetry conditions, regularity conditions, and conservation laws for mass and momentum.

The potential field ${v}$ can effectively be absorbed into the angular stress field ${g_{\partial_\omega,\partial_\omega}}$ (after placing an appropriate counterbalancing term in the radial stress field ${g_{\partial_r, \partial_r}}$ so as not to disrupt the conservation laws), so we can also eliminate this field. The angular stress field ${g_{\partial_\omega, \partial_\omega}}$ is then only constrained through the momentum conservation law and a requirement of positivity; one can then eliminate this field by converting the momentum conservation law from an equality to an inequality. Finally, the radial stress field ${g_{\partial_r, \partial_r}}$ is also only constrained through a positive definiteness constraint and the momentum conservation inequality, so it can also be eliminated from the problem after some further modification of the momentum conservation inequality.

The task then reduces to locating just two fields ${g_{1,1}, g_{1,i\partial_r}}$ that obey a mass conservation law

$\displaystyle \partial_t g_{1,1} = 2 \left(\partial_r + \frac{d-1}{r} \right) g_{1,i\partial_r}$

together with an additional inequality that is the remnant of the momentum conservation law. One can solve for the mass conservation law in terms of a single scalar field ${W}$ using the ansatz

$\displaystyle g_{1,1} = 2 r^{1-d} \partial_r (r^d W)$

$\displaystyle g_{1,i\partial_r} = r^{1-d} \partial_t (r^d W)$

so the problem has finally been simplified to the task of locating a single scalar field ${W}$ with some scaling and homogeneity properties that obeys a certain differential inequality relating to momentum conservation. This turns out to be possible by explicitly writing down a specific scalar field ${W}$ using some asymptotic parameters and cutoff functions.
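To see that the ansatz does solve the mass conservation law, here is the direct verification, assuming only that ${W}$ is smooth enough for ${\partial_t}$ and ${\partial_r}$ to commute. On the one hand,

$\displaystyle \partial_t g_{1,1} = 2 r^{1-d} \partial_r \partial_t (r^d W),$

while on the other hand the product rule gives

$\displaystyle 2 \left(\partial_r + \frac{d-1}{r} \right) g_{1,i\partial_r} = 2 r^{1-d} \partial_r \partial_t (r^d W) + 2 \left( (1-d) r^{-d} + (d-1) r^{-d} \right) \partial_t (r^d W),$

and the second term on the right-hand side vanishes, so the two expressions agree.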

I’ve just uploaded to the arXiv my paper “An integration approach to the Toeplitz square peg problem”, submitted to Forum of Mathematics, Sigma. This paper resulted from my attempts recently to solve the Toeplitz square peg problem (also known as the inscribed square problem):

Conjecture 1 (Toeplitz square peg problem) Let ${\gamma}$ be a simple closed curve in the plane. Is it necessarily the case that ${\gamma}$ contains four vertices of a square?

See this recent survey of Matschke in the Notices of the AMS for the latest results on this problem.

The route I took to the results in this paper was somewhat convoluted. I was motivated to look at this problem after lecturing recently on the Jordan curve theorem in my class. The problem is superficially similar to the Jordan curve theorem in that the result is known (and rather easy to prove) if ${\gamma}$ is sufficiently regular (e.g. if it is a polygonal path), but seems to be significantly more difficult when the curve is merely assumed to be continuous. Roughly speaking, all the known positive results on the problem have proceeded using (in some form or another) tools from homology: note for instance that one can view the conjecture as asking whether the four-dimensional subset ${\gamma^4}$ of the eight-dimensional space ${({\bf R}^2)^4}$ necessarily intersects the four-dimensional space ${\mathtt{Squares} \subset ({\bf R}^2)^4}$ consisting of the quadruples ${(v_1,v_2,v_3,v_4)}$ traversing a square in (say) anti-clockwise order; this space is a four-dimensional linear subspace of ${({\bf R}^2)^4}$, with a two-dimensional subspace of “degenerate” squares ${(v,v,v,v)}$ removed. If one ignores this degenerate subspace, one can use intersection theory to conclude (under reasonable “transversality” hypotheses) that ${\gamma^4}$ intersects ${\mathtt{Squares}}$ an odd number of times (up to the cyclic symmetries of the square), which is basically how Conjecture 1 is proven in the regular case. Unfortunately, if one then takes a limit and considers what happens when ${\gamma}$ is just a continuous curve, the odd number of squares created by these homological arguments could conceivably all degenerate to points, thus blocking one from proving the conjecture in the general case.

Inspired by my previous work on finite time blowup for various PDEs, I first tried looking for a counterexample in the category of (locally) self-similar curves that are smooth (or piecewise linear) away from a single origin where it can oscillate infinitely often; this is basically the smoothest type of curve that was not already covered by previous results. By a rescaling and compactness argument, it is not difficult to see that such a counterexample would exist if there was a counterexample to the following periodic version of the conjecture:

Conjecture 2 (Periodic square peg problem) Let ${\gamma_1, \gamma_2}$ be two disjoint simple closed piecewise linear curves in the cylinder ${({\bf R}/{\bf Z}) \times {\bf R}}$ which have a winding number of one, that is to say they are homologous to the loop ${x \mapsto (x,0)}$ from ${{\bf R}/{\bf Z}}$ to ${({\bf R}/{\bf Z}) \times {\bf R}}$. Then the union of ${\gamma_1}$ and ${\gamma_2}$ contains the four vertices of a square.

In contrast to Conjecture 1, which is known for polygonal paths, Conjecture 2 is still open even under the hypothesis of polygonal paths; the homological arguments alluded to previously now show that the number of inscribed squares in the periodic setting is even rather than odd, which is not enough to conclude the conjecture. (This flipping of parity from odd to even due to an infinite amount of oscillation is reminiscent of the “Eilenberg-Mazur swindle”, discussed in this previous post.)

I therefore tried to construct counterexamples to Conjecture 2. I began perturbatively, looking at curves ${\gamma_1, \gamma_2}$ that were small perturbations of constant functions. After some initial Taylor expansion, I was blocked from forming such a counterexample because an inspection of the leading Taylor coefficients required one to construct a continuous periodic function of mean zero that never vanished, which of course was impossible by the intermediate value theorem. I kept expanding to higher and higher order to try to evade this obstruction (this, incidentally, was when I discovered this cute application of Lagrange reversion) but no matter how high an accuracy I went (I think I ended up expanding to sixth order in a perturbative parameter ${\varepsilon}$ before figuring out what was going on!), this obstruction kept resurfacing again and again. I eventually figured out that this obstruction was being caused by a “conserved integral of motion” for both Conjecture 2 and Conjecture 1, which can in fact be used to largely rule out perturbative constructions. This yielded a new positive result for both conjectures:

Theorem 3

• (i) Conjecture 1 holds when ${\gamma}$ is the union ${\{ (t,f(t)): t \in [t_0,t_1]\} \cup \{ (t,g(t)): t \in [t_0,t_1]\}}$ of the graphs of two Lipschitz functions ${f,g: [t_0,t_1] \rightarrow {\bf R}}$ of Lipschitz constant less than one that agree at the endpoints.
• (ii) Conjecture 2 holds when ${\gamma_1, \gamma_2}$ are graphs of Lipschitz functions ${f: {\bf R}/{\bf Z} \rightarrow {\bf R}, g: {\bf R}/{\bf Z} \rightarrow {\bf R}}$ of Lipschitz constant less than one.

We sketch the proof of Theorem 3(i) as follows (the proof of Theorem 3(ii) is very similar). Let ${\gamma_1: [t_0, t_1] \rightarrow {\bf R}^2}$ be the curve ${\gamma_1(t) := (t,f(t))}$, thus ${\gamma_1}$ traverses one of the two graphs that comprise ${\gamma}$. For each time ${t \in [t_0,t_1]}$, there is a unique square with first vertex ${\gamma_1(t)}$ (and the other three vertices, traversed in anticlockwise order, denoted ${\gamma_2(t), \gamma_3(t), \gamma_4(t)}$) such that ${\gamma_2(t)}$ also lies in the graph of ${f}$ and ${\gamma_4(t)}$ also lies in the graph of ${g}$ (actually for technical reasons we have to extend ${f,g}$ by constants to all of ${{\bf R}}$ in order for this claim to be true). To see this, we simply rotate the graph of ${g}$ clockwise by ${\frac{\pi}{2}}$ around ${\gamma_1(t)}$, where (by the Lipschitz hypotheses) it must hit the graph of ${f}$ in a unique point, which is ${\gamma_2(t)}$, and which then determines the other two vertices ${\gamma_3(t), \gamma_4(t)}$ of the square. The curve ${\gamma_3(t)}$ has the same starting and ending point as the graph of ${f}$ or ${g}$; using the Lipschitz hypothesis one can show this curve is simple. If the curve ever hits the graph of ${g}$ other than at the endpoints, we have created an inscribed square, so we may assume for contradiction that ${\gamma_3(t)}$ avoids the graph of ${g}$, and hence by the Jordan curve theorem the two curves enclose some non-empty bounded open region ${\Omega}$.

Now for the conserved integral of motion. If we integrate the ${1}$-form ${y\ dx}$ on each of the four curves ${\gamma_1, \gamma_2, \gamma_3, \gamma_4}$, we obtain the identity

$\displaystyle \int_{\gamma_1} y\ dx - \int_{\gamma_2} y\ dx + \int_{\gamma_3} y\ dx - \int_{\gamma_4} y\ dx = 0.$

This identity can be established by the following calculation: one can parameterise

$\displaystyle \gamma_1(t) = (x(t), y(t))$

$\displaystyle \gamma_2(t) = (x(t)+a(t), y(t)+b(t))$

$\displaystyle \gamma_3(t) = (x(t)+a(t)-b(t), y(t)+a(t)+b(t))$

$\displaystyle \gamma_4(t) = (x(t)-b(t), y(t)+a(t))$

for some Lipschitz functions ${x,y,a,b: [t_0,t_1] \rightarrow {\bf R}}$; thus for instance ${\int_{\gamma_1} y\ dx = \int_{t_0}^{t_1} y(t)\ dx(t)}$. Inserting these parameterisations and doing some canceling, one can write the above integral as

$\displaystyle \int_{t_0}^{t_1} d \frac{a(t)^2-b(t)^2}{2}$

which vanishes because ${a(t), b(t)}$ (the components of the side ${\gamma_2(t) - \gamma_1(t)}$ of the square determined by ${\gamma_1(t), \gamma_2(t), \gamma_3(t), \gamma_4(t)}$) vanish at the endpoints ${t=t_0,t_1}$.
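The cancellation behind this identity can be tested numerically. In the sketch below (all names are illustrative), the functions ${x,y,a,b}$ are replaced by random piecewise linear functions sampled at common breakpoints, so that each ${\gamma_i}$ is a polyline and the trapezoid rule computes ${\int y\ dx}$ exactly; the alternating sum then equals the total change in ${\frac{a^2-b^2}{2}}$, which is zero when ${a,b}$ vanish at both endpoints.

```python
import random

def line_integral_y_dx(points):
    """Integral of y dx along the polyline through the given (x, y) points."""
    return sum((y0 + y1) / 2 * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def square_curves(x, y, a, b):
    """The four vertex curves of the squares based at (x, y) with side vector (a, b)."""
    g1 = list(zip(x, y))
    g2 = [(xi + ai, yi + bi) for xi, yi, ai, bi in zip(x, y, a, b)]
    g3 = [(xi + ai - bi, yi + ai + bi) for xi, yi, ai, bi in zip(x, y, a, b)]
    g4 = [(xi - bi, yi + ai) for xi, yi, ai, bi in zip(x, y, a, b)]
    return g1, g2, g3, g4

def alternating_sum(x, y, a, b):
    """The alternating sum of the integrals of y dx over gamma_1, ..., gamma_4."""
    g1, g2, g3, g4 = square_curves(x, y, a, b)
    return (line_integral_y_dx(g1) - line_integral_y_dx(g2)
            + line_integral_y_dx(g3) - line_integral_y_dx(g4))

rng = random.Random(0)  # arbitrary seed, for reproducibility only
N = 50
x = [rng.uniform(-1, 1) for _ in range(N)]
y = [rng.uniform(-1, 1) for _ in range(N)]
# a and b are forced to vanish at both endpoints, as in the text
a = [0.0] + [rng.uniform(-1, 1) for _ in range(N - 2)] + [0.0]
b = [0.0] + [rng.uniform(-1, 1) for _ in range(N - 2)] + [0.0]

print(alternating_sum(x, y, a, b))  # zero up to rounding error
```

Note that ${x}$ and ${y}$ are completely arbitrary here; only the boundary values of ${a}$ and ${b}$ matter.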

Using this conserved integral of motion, one can show that

$\displaystyle \int_{\gamma_3} y\ dx = \int_{t_0}^{t_1} g(t)\ dt$

which by Stokes’ theorem then implies that the bounded open region ${\Omega}$ mentioned previously has zero area, which is absurd.

This argument hinged on the curve ${\gamma_3}$ being simple, so that the Jordan curve theorem could apply. Once one left the perturbative regime of curves of small Lipschitz constant, it became possible for ${\gamma_3}$ to be self-crossing, but nevertheless there still seemed to be some sort of integral obstruction. I eventually isolated the problem in the form of a strengthened version of Conjecture 2:

Conjecture 4 (Area formulation of square peg problem) Let ${\gamma_1, \gamma_2, \gamma_3, \gamma_4: {\bf R}/{\bf Z} \rightarrow ({\bf R}/{\bf Z}) \times {\bf R}}$ be simple closed piecewise linear curves of winding number ${1}$ obeying the area identity

$\displaystyle \int_{\gamma_1} y\ dx - \int_{\gamma_2} y\ dx + \int_{\gamma_3} y\ dx - \int_{\gamma_4} y\ dx = 0$

(note the ${1}$-form ${y\ dx}$ is still well defined on the cylinder ${({\bf R}/{\bf Z}) \times {\bf R}}$; note also that the curves ${\gamma_1,\gamma_2,\gamma_3,\gamma_4}$ are allowed to cross each other.) Then there exists a (possibly degenerate) square with vertices (traversed in anticlockwise order) lying on ${\gamma_1, \gamma_2, \gamma_3, \gamma_4}$ respectively.

It is not difficult to see that Conjecture 4 implies Conjecture 2. Actually I believe that the converse implication is at least morally true, in that any counterexample to Conjecture 4 can be eventually transformed to a counterexample to Conjecture 2 and Conjecture 1. The conserved integral of motion argument can establish Conjecture 4 in many cases, for instance if ${\gamma_2,\gamma_4}$ are graphs of functions of Lipschitz constant less than one.

Conjecture 4 has a model special case, when one of the ${\gamma_i}$ is assumed to just be a horizontal loop. In this case, the problem collapses to that of producing an intersection between two three-dimensional subsets of a six-dimensional space, rather than two four-dimensional subsets of an eight-dimensional space. More precisely, some elementary transformations reveal that this special case of Conjecture 4 can be formulated in the following fashion in which the geometric notion of a square is replaced by the additive notion of a triple of real numbers summing to zero:

Conjecture 5 (Special case of area formulation) Let ${\gamma_1, \gamma_2, \gamma_3: {\bf R}/{\bf Z} \rightarrow ({\bf R}/{\bf Z}) \times {\bf R}}$ be simple closed piecewise linear curves of winding number ${1}$ obeying the area identity

$\displaystyle \int_{\gamma_1} y\ dx + \int_{\gamma_2} y\ dx + \int_{\gamma_3} y\ dx = 0.$

Then there exist ${x \in {\bf R}/{\bf Z}}$ and ${y_1,y_2,y_3 \in {\bf R}}$ with ${y_1+y_2+y_3=0}$ such that ${(x,y_i) \in \gamma_i}$ for ${i=1,2,3}$.

This conjecture is easy to establish if one of the curves, say ${\gamma_3}$, is the graph ${\{ (t,f(t)): t \in {\bf R}/{\bf Z}\}}$ of some piecewise linear function ${f: {\bf R}/{\bf Z} \rightarrow {\bf R}}$, since in that case the curve ${\gamma_1}$ and the curve ${\tilde \gamma_2 := \{ (x, -y-f(x)): (x,y) \in \gamma_2 \}}$ enclose the same area in the sense that ${\int_{\gamma_1} y\ dx = \int_{\tilde \gamma_2} y\ dx}$, and hence must intersect by the Jordan curve theorem (otherwise they would enclose a non-zero amount of area between them), giving the claim. But when none of the ${\gamma_1,\gamma_2,\gamma_3}$ are graphs, the situation becomes combinatorially more complicated.
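For completeness, here is the short computation behind the equal-area claim in the graph case, using only the definition of ${\tilde \gamma_2}$, the winding number hypothesis, and the area identity of Conjecture 5. Since ${\gamma_2}$ has winding number ${1}$ and the ${1}$-form ${f(x)\ dx}$ is closed on the cylinder, one has ${\int_{\gamma_2} f(x)\ dx = \int_{{\bf R}/{\bf Z}} f(x)\ dx = \int_{\gamma_3} y\ dx}$, and hence

$\displaystyle \int_{\tilde \gamma_2} y\ dx = - \int_{\gamma_2} y\ dx - \int_{\gamma_2} f(x)\ dx = - \int_{\gamma_2} y\ dx - \int_{\gamma_3} y\ dx = \int_{\gamma_1} y\ dx.$

An intersection point ${(x,y_1)}$ of ${\gamma_1}$ and ${\tilde \gamma_2}$ then gives ${y_1 = -y_2 - f(x)}$ for some ${(x,y_2) \in \gamma_2}$, so that ${y_1 + y_2 + y_3 = 0}$ with ${y_3 := f(x)}$.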

Using some elementary homological arguments (e.g. breaking up closed ${1}$-cycles into closed paths) and working with a generic horizontal slice of the curves, I was able to show that Conjecture 5 was equivalent to a one-dimensional problem that was largely combinatorial in nature, revolving around the sign patterns of various triple sums ${y_{1,a} + y_{2,b} + y_{3,c}}$ with ${y_{1,a}, y_{2,b}, y_{3,c}}$ drawn from various finite sets of reals.

Conjecture 6 (Combinatorial form) Let ${k_1,k_2,k_3}$ be odd natural numbers, and for each ${i=1,2,3}$, let ${y_{i,1},\dots,y_{i,k_i}}$ be distinct real numbers; we adopt the convention that ${y_{i,0}=y_{i,k_i+1}=-\infty}$. Assume the following axioms:

• (i) For any ${1 \leq p \leq k_1, 1 \leq q \leq k_2, 1 \leq r \leq k_3}$, the sums ${y_{1,p} + y_{2,q} + y_{3,r}}$ are non-zero.
• (ii) (Non-crossing) For any ${i=1,2,3}$ and ${0 \leq p < q \leq k_i}$ with the same parity, the pairs ${\{ y_{i,p}, y_{i,p+1}\}}$ and ${\{y_{i,q}, y_{i,q+1}\}}$ are non-crossing in the sense that

$\displaystyle \sum_{a \in \{p,p+1\}} \sum_{b \in \{q,q+1\}} (-1)^{a+b} \mathrm{sgn}( y_{i,a} - y_{i,b} ) = 0.$

• (iii) (Non-crossing sums) For any ${0 \leq p \leq k_1}$, ${0 \leq q \leq k_2}$, ${0 \leq r \leq k_3}$ of the same parity, one has

$\displaystyle \sum_{a \in \{p,p+1\}} \sum_{b \in \{q,q+1\}} \sum_{c \in \{r,r+1\}} (-1)^{a+b+c} \mathrm{sgn}( y_{1,a} + y_{2,b} + y_{3,c} ) = 0.$

Then one has

$\displaystyle \sum_{i=1}^3 \sum_{p=1}^{k_i} (-1)^{p-1} y_{i,p} < 0.$

Roughly speaking, Conjecture 6 and Conjecture 5 are connected by constructing curves ${\gamma_i}$ to connect ${(0, y_{i,p})}$ to ${(0,y_{i,p+1})}$ for ${0 \leq p \leq k_i}$ by various paths, which either lie to the right of the ${y}$ axis (when ${p}$ is odd) or to the left of the ${y}$ axis (when ${p}$ is even). The axiom (ii) is asserting that the numbers ${-\infty, y_{i,1},\dots,y_{i,k_i}}$ are ordered according to the permutation of a meander (formed by gluing together two non-crossing perfect matchings).
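Since Conjecture 6 is purely combinatorial, its hypotheses and conclusion can be tested mechanically on any given input. The following is a minimal sketch of such a checker, transcribing axioms (i)-(iii) literally with the convention ${y_{i,0}=y_{i,k_i+1}=-\infty}$; the function names are my own, and no attempt is made at efficiency.

```python
import math
from itertools import product

NEG_INF = -math.inf

def sgn(v):
    """Sign of v; note that sgn(-inf) = -1."""
    return (v > 0) - (v < 0)

def pad(row):
    """Apply the convention y_{i,0} = y_{i,k_i+1} = -infinity."""
    return [NEG_INF] + list(row) + [NEG_INF]

def axiom_i(ys):
    """All triple sums y_{1,p} + y_{2,q} + y_{3,r} are non-zero."""
    return all(u + v + w != 0 for u, v, w in product(*ys))

def axiom_ii(ys):
    """Non-crossing condition within each of the three lists."""
    for row in ys:
        y, k = pad(row), len(row)
        for p in range(k + 1):
            for q in range(p + 2, k + 1, 2):  # q > p with the same parity as p
                total = sum((-1) ** (a + b) * sgn(y[a] - y[b])
                            for a in (p, p + 1) for b in (q, q + 1))
                if total != 0:
                    return False
    return True

def axiom_iii(ys):
    """Non-crossing sums condition."""
    y1, y2, y3 = (pad(row) for row in ys)
    k1, k2, k3 = (len(row) for row in ys)
    for p, q, r in product(range(k1 + 1), range(k2 + 1), range(k3 + 1)):
        if not (p % 2 == q % 2 == r % 2):
            continue
        total = sum((-1) ** (a + b + c) * sgn(y1[a] + y2[b] + y3[c])
                    for a in (p, p + 1) for b in (q, q + 1) for c in (r, r + 1))
        if total != 0:
            return False
    return True

def satisfies_axioms(ys):
    return axiom_i(ys) and axiom_ii(ys) and axiom_iii(ys)

def alternating_total(ys):
    """The sum over i and p of (-1)^{p-1} y_{i,p}; row[j] stores y_{i,j+1}."""
    return sum((-1) ** j * v for row in ys for j, v in enumerate(row))

example = [[-1.0], [-1.0], [-1.0]]  # the simplest case k_1 = k_2 = k_3 = 1
print(satisfies_axioms(example), alternating_total(example))
```

In the simplest case ${k_1=k_2=k_3=1}$, axiom (iii) reduces to the requirement that ${y_{1,1}+y_{2,1}+y_{3,1}}$ be negative, which is exactly the conclusion of the conjecture, so the conjecture is trivially true there; the interesting behaviour only appears for longer lists.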

Using various ad hoc arguments involving “winding numbers”, it is possible to prove this conjecture in many cases (e.g. if one of the ${k_i}$ is at most ${3}$), to the extent that I have now become confident that this conjecture is true (and have now come full circle from trying to disprove Conjecture 1 to now believing that this conjecture holds also). But it seems that there is some non-trivial combinatorial argument to be made if one is to prove this conjecture; purely homological arguments seem to partially resolve the problem, but are not sufficient by themselves.

While I was not able to resolve the square peg problem, I think these results do provide a roadmap to attacking it, first by focusing on the combinatorial conjecture in Conjecture 6 (or its equivalent form in Conjecture 5), then after that is resolved moving on to Conjecture 4, and then finally to Conjecture 1.

Fifteen years ago, I wrote a paper entitled Global regularity of wave maps. II. Small energy in two dimensions, in which I established global regularity of wave maps from two spatial dimensions to the unit sphere, assuming that the initial data had small energy. Recently, Hao Jia (personal communication) discovered a small gap in the argument that requires a slightly non-trivial fix. The issue does not really affect the subsequent literature, because the main result has since been reproven and extended by methods that avoid the gap (see in particular this subsequent paper of Tataru), but I have decided to describe the gap and its fix on this blog.

I will assume familiarity with the notation of my paper. In Section 10, some complicated spaces ${S[k] = S[k]({\bf R}^{1+n})}$ are constructed for each frequency scale ${k}$, and then a further space ${S(c) = S(c)({\bf R}^{1+n})}$ is constructed for a given frequency envelope ${c}$ by the formula

$\displaystyle \| \phi \|_{S(c)({\bf R}^{1+n})} := \|\phi \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]({\bf R}^{1+n})} \ \ \ \ \ (1)$

where ${\phi_k := P_k \phi}$ is the Littlewood-Paley projection of ${\phi}$ to frequency magnitudes ${\sim 2^k}$. Then, given a spacetime slab ${[-T,T] \times {\bf R}^n}$, we define the restrictions

$\displaystyle \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} := \inf \{ \| \tilde \phi \|_{S(c)({\bf R}^{1+n})}: \tilde \phi \downharpoonright_{[-T,T] \times {\bf R}^n} = \phi \}$

where the infimum is taken over all extensions ${\tilde \phi}$ of ${\phi}$ to the Minkowski spacetime ${{\bf R}^{1+n}}$; similarly one defines

$\displaystyle \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)} := \inf \{ \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})}: \tilde \phi_k \downharpoonright_{[-T,T] \times {\bf R}^n} = \phi_k \}.$

The gap in the paper is as follows: it was implicitly assumed that one could restrict (1) to the slab ${[-T,T] \times {\bf R}^n}$ to obtain the equality

$\displaystyle \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} = \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}.$

(This equality is implicitly used to establish the bound (36) in the paper.) Unfortunately, (1) only gives the lower bound, not the upper bound, and it is the upper bound which is needed here. The problem is that the extensions ${\tilde \phi_k}$ of ${\phi_k}$ that are optimal for computing ${\| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}}$ are not necessarily the Littlewood-Paley projections of the extensions ${\tilde \phi}$ of ${\phi}$ that are optimal for computing ${\| \phi \|_{S(c)([-T,T] \times {\bf R}^n)}}$.

To remedy the problem, one has to prove an upper bound of the form

$\displaystyle \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} \lesssim \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}$

for all Schwartz ${\phi}$ (actually we need affinely Schwartz ${\phi}$, but one can easily normalise to the Schwartz case). Without loss of generality we may normalise the RHS to be ${1}$. Thus

$\displaystyle \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} \leq 1 \ \ \ \ \ (2)$

and

$\displaystyle \|P_k \phi \|_{S[k]([-T,T] \times {\bf R}^n)} \leq c_k \ \ \ \ \ (3)$

for each ${k}$, and one has to find a single extension ${\tilde \phi}$ of ${\phi}$ such that

$\displaystyle \|\tilde \phi \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} \lesssim 1 \ \ \ \ \ (4)$

and

$\displaystyle \|P_k \tilde \phi \|_{S[k]({\bf R}^{1+n})} \lesssim c_k \ \ \ \ \ (5)$

for each ${k}$. Achieving a ${\tilde \phi}$ that obeys (4) is trivial (just extend ${\phi}$ by zero), but such extensions do not necessarily obey (5). On the other hand, from (3) we can find extensions ${\tilde \phi_k}$ of ${P_k \phi}$ such that

$\displaystyle \|\tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim c_k; \ \ \ \ \ (6)$

the extension ${\tilde \phi := \sum_k \tilde \phi_k}$ will then obey (5) (here we use Lemma 9 from my paper), but unfortunately is not guaranteed to obey (4) (the ${S[k]}$ norm does control the ${L^\infty_t L^\infty_x}$ norm, but a key point about frequency envelopes for the small energy regularity problem is that the coefficients ${c_k}$, while bounded, are not necessarily summable).

This can be fixed as follows. For each ${k}$ we introduce a time cutoff ${\eta_k}$ supported on ${[-T-2^{-k}, T+2^{-k}]}$ that equals ${1}$ on ${[-T-2^{-k-1},T+2^{-k-1}]}$ and obeys the usual derivative estimates in between (the ${j^{th}}$ time derivative of size ${O_j(2^{jk})}$ for each ${j}$). Later we will prove the truncation estimate

$\displaystyle \| \eta_k \tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})}. \ \ \ \ \ (7)$

Assuming this estimate, if we set ${\tilde \phi := \sum_k \eta_k \tilde \phi_k}$, then Lemma 9 in my paper together with (6), (7) (and the local stability of frequency envelopes) gives the required property (5). (There is a technical issue arising from the fact that ${\tilde \phi}$ is not necessarily Schwartz due to slow decay at temporal infinity, but by considering partial sums in the ${k}$ summation and taking limits we can check that ${\tilde \phi}$ is the strong limit of Schwartz functions, which suffices here; we omit the details for the sake of exposition.) So the only issue is to establish (4), that is to say that

$\displaystyle \| \sum_k \eta_k(t) \tilde \phi_k(t) \|_{L^\infty_x({\bf R}^n)} \lesssim 1$

for all ${t \in {\bf R}}$.

For ${t \in [-T,T]}$ this is immediate from (2). Now suppose that ${t \in [T+2^{-k_0-1}, T+2^{-k_0}]}$ for some integer ${k_0}$ (the case when ${t \in [-T-2^{-k_0}, -T-2^{-k_0-1}]}$ is treated similarly). Then we can split

$\displaystyle \sum_k \eta_k(t) \tilde \phi_k(t) = \Phi_1 + \Phi_2 + \Phi_3$

where

$\displaystyle \Phi_1 := \sum_{k < k_0} \tilde \phi_k(T)$

$\displaystyle \Phi_2 := \sum_{k < k_0} \left( \tilde \phi_k(t) - \tilde \phi_k(T) \right)$

$\displaystyle \Phi_3 := \eta_{k_0}(t) \tilde \phi_{k_0}(t).$

The contribution of the ${\Phi_3}$ term is acceptable by (6) and estimate (82) from my paper. The term ${\Phi_1}$ sums to ${P_{<k_0} \phi(T)}$, which is acceptable by (2). So it remains to control the ${L^\infty_x}$ norm of ${\Phi_2}$. By the triangle inequality and the fundamental theorem of calculus, we can bound

$\displaystyle \| \Phi_2 \|_{L^\infty_x} \leq (t-T) \sum_{k < k_0} \| \partial_t \tilde \phi_k \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})}.$

By hypothesis, ${t-T \leq 2^{-k_0}}$. Using the first term in (79) of my paper and Bernstein’s inequality followed by (6) we have

$\displaystyle \| \partial_t \tilde \phi_k \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} \lesssim 2^k \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim 2^k;$

and then we are done by summing the geometric series in ${k}$.
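To spell out this final summation (a routine step, recorded here only for convenience): since ${t - T \leq 2^{-k_0}}$, the two preceding displays combine to give

$\displaystyle \| \Phi_2 \|_{L^\infty_x} \lesssim 2^{-k_0} \sum_{k < k_0} 2^k = 2^{-k_0} \cdot 2^{k_0} = 1,$

which is the required bound on ${\Phi_2}$.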

It remains to prove the truncation estimate (7). This estimate is similar in spirit to the algebra estimates already in my paper, but unfortunately does not seem to follow immediately from these estimates as written, and so one has to repeat the somewhat lengthy decompositions and case checkings used to prove these estimates. We do this below the fold.

I’ve just posted to the arXiv my paper “Finite time blowup for Lagrangian modifications of the three-dimensional Euler equation“. This paper is loosely in the spirit of other recent papers of mine in which I explore how close one can get to supercritical PDE of physical interest (such as the Euler and Navier-Stokes equations), while still being able to rigorously demonstrate finite time blowup for at least some choices of initial data. Here, the PDE we are trying to get close to is the incompressible inviscid Euler equations

$\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p$

$\displaystyle \nabla \cdot u = 0$

in three spatial dimensions, where ${u}$ is the velocity vector field and ${p}$ is the pressure field. In vorticity form, and viewing the vorticity ${\omega}$ as a ${2}$-form (rather than a vector), we can rewrite this system using the language of differential geometry as

$\displaystyle \partial_t \omega + {\mathcal L}_u \omega = 0$

$\displaystyle u = \delta \tilde \eta^{-1} \Delta^{-1} \omega$

where ${{\mathcal L}_u}$ is the Lie derivative along ${u}$, ${\delta}$ is the codifferential (the adjoint of the differential ${d}$, or equivalently the negative of the divergence operator) that sends ${k+1}$-vector fields to ${k}$-vector fields, ${\Delta}$ is the Hodge Laplacian, and ${\tilde \eta}$ is the identification of ${k}$-vector fields with ${k}$-forms induced by the Euclidean metric ${\tilde \eta}$. The equation ${u = \delta \tilde \eta^{-1} \Delta^{-1} \omega}$ can be viewed as the Biot-Savart law recovering velocity from vorticity, expressed in the language of differential geometry.

One can then generalise this system by replacing the operator ${\tilde \eta^{-1} \Delta^{-1}}$ by a more general operator ${A}$ from ${2}$-forms to ${2}$-vector fields, giving rise to what I call the generalised Euler equations

$\displaystyle \partial_t \omega + {\mathcal L}_u \omega = 0$

$\displaystyle u = \delta A \omega.$

For example, the surface quasi-geostrophic (SQG) equations can be written in this form, as discussed in this previous post. One can view ${A \omega}$ (up to Hodge duality) as a vector potential for the velocity ${u}$, so it is natural to refer to ${A}$ as a vector potential operator.

The generalised Euler equations carry much of the same geometric structure as the true Euler equations. For instance, the transport equation ${\partial_t \omega + {\mathcal L}_u \omega = 0}$ is equivalent to the Kelvin circulation theorem, which in three dimensions also implies the transport of vortex streamlines and the conservation of helicity. If ${A}$ is self-adjoint and positive definite, then the famous Euler-Poincaré interpretation of the true Euler equations as geodesic flow on an infinite dimensional Riemannian manifold of volume preserving diffeomorphisms (as discussed in this previous post) extends to the generalised Euler equations (with the operator ${A}$ determining the new Riemannian metric to place on this manifold). In particular, the generalised Euler equations have a Lagrangian formulation, and so by Noether’s theorem we expect any continuous symmetry of the Lagrangian to lead to conserved quantities. Indeed, we have a conserved Hamiltonian ${\frac{1}{2} \int \langle \omega, A \omega \rangle}$, and any spatial symmetry of ${A}$ leads to a conserved impulse (e.g. translation invariance leads to a conserved momentum, and rotation invariance leads to a conserved angular momentum). If ${A}$ behaves like a pseudodifferential operator of order ${-2}$ (as is the case with the true vector potential operator ${\tilde \eta^{-1} \Delta^{-1}}$), then it turns out that one can use energy methods to recover the same sort of classical local existence theory as for the true Euler equations (up to and including the famous Beale-Kato-Majda criterion for blowup).

The true Euler equations are suspected of admitting smooth localised solutions which blow up in finite time; there is now substantial numerical evidence for this blowup, but it has not been proven rigorously. The main purpose of this paper is to show that such finite time blowup can at least be established for certain generalised Euler equations that are somewhat close to the true Euler equations. This is similar in spirit to my previous paper on finite time blowup on averaged Navier-Stokes equations, with the main new feature here being that the modified equation continues to have a Lagrangian structure and a vorticity formulation, which was not the case with the averaged Navier-Stokes equation. On the other hand, the arguments here are not able to handle the presence of viscosity (basically because they rely crucially on the Kelvin circulation theorem, which is not available in the viscous case).

In fact, three different blowup constructions are presented (for three different choices of vector potential operator ${A}$). The first is a variant of one discussed previously on this blog, in which a “neck pinch” singularity for a vortex tube is created by using a non-self-adjoint vector potential operator, in which the velocity at the neck of the vortex tube is determined by the circulation of the vorticity somewhat further away from that neck, which when combined with conservation of circulation is enough to guarantee finite time blowup. This is a relatively easy construction of finite time blowup, and has the advantage of being rather stable (any initial data flowing through a narrow tube with a large positive circulation will blow up in finite time). On the other hand, it is not so surprising in the non-self-adjoint case that finite time blowup can occur, as there is no conserved energy.

The second blowup construction is based on a connection between the two-dimensional SQG equation and the three-dimensional generalised Euler equations, discussed in this previous post. Namely, any solution to the former can be lifted to a “two and a half-dimensional” solution to the latter, in which the velocity and vorticity are translation-invariant in the vertical direction (but the velocity is still allowed to contain vertical components, so the flow is not completely horizontal). The same embedding also works to lift solutions to generalised SQG equations in two dimensions to solutions to generalised Euler equations in three dimensions. Conveniently, even if the vector potential operator for the generalised SQG equation fails to be self-adjoint, one can ensure that the three-dimensional vector potential operator is self-adjoint. Using this trick, together with a two-dimensional version of the first blowup construction, one can then construct a generalised Euler equation in three dimensions with a vector potential that is both self-adjoint and positive definite, and still admits solutions that blow up in finite time, though the blowup is now a vortex sheet creasing on a line, rather than a vortex tube pinching at a point.

This eliminates the main defect of the first blowup construction, but introduces two others. Firstly, the blowup is less stable, as it relies crucially on the initial data being translation-invariant in the vertical direction. Secondly, the solution is not spatially localised in the vertical direction (though it can be viewed as a compactly supported solution on the manifold ${{\bf R}^2 \times {\bf R}/{\bf Z}}$, rather than ${{\bf R}^3}$). The third and final blowup construction of the paper addresses the final defect, by replacing vertical translation symmetry with axial rotation symmetry around the vertical axis (basically, replacing Cartesian coordinates with cylindrical coordinates). It turns out that there is a more complicated way to embed two-dimensional generalised SQG equations into three-dimensional generalised Euler equations in which the solutions to the latter are now axially symmetric (but are allowed to “swirl” in the sense that the velocity field can have a non-zero angular component), while still keeping the vector potential operator self-adjoint and positive definite; the blowup is now that of a vortex ring creasing on a circle.

As with the previous papers in this series, these blowup constructions do not directly imply finite time blowup for the true Euler equations, but they do at least provide a barrier to establishing global regularity for these latter equations, in that one is forced to use some property of the true Euler equations that is not shared by these generalisations. They also suggest some possible blowup mechanisms for the true Euler equations (although unfortunately these mechanisms do not seem compatible with the addition of viscosity, so they do not seem to suggest a viable Navier-Stokes blowup mechanism).

I’ve just uploaded to the arXiv my paper “Equivalence of the logarithmically averaged Chowla and Sarnak conjectures“, submitted to the Festschrift “Number Theory – Diophantine problems, uniform distribution and applications” in honour of Robert F. Tichy. This paper is a spinoff of my previous paper establishing a logarithmically averaged version of the Chowla (and Elliott) conjectures in the two-point case. In that paper, the estimate

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+h)}{n} = o( \log x )$

as ${x \rightarrow \infty}$ was demonstrated, where ${h}$ was any positive integer and ${\lambda}$ denoted the Liouville function. The proof proceeded using a method I call the “entropy decrement argument”, which ultimately reduced matters to establishing a bound of the form

$\displaystyle \sum_{n \leq x} \frac{|\sum_{h \leq H} \lambda(n+h) e( \alpha h)|}{n} = o( H \log x )$

whenever ${H}$ was a slowly growing function of ${x}$. This was in turn established in a previous paper of Matomaki, Radziwill, and myself, using the recent breakthrough of Matomaki and Radziwill.
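For readers who want to experiment, here is a minimal numerical sketch of the two-point logarithmically averaged sum above. It is only an illustration: the function names `liouville_up_to` and `log_averaged_correlation` and the cutoff ${x = 10^5}$ are my own choices, and a computation at one finite ${x}$ of course says nothing rigorous about the ${o(\log x)}$ asymptotic.

```python
import math

def liouville_up_to(n_max):
    """Return a list lam with lam[n] = (-1)^Omega(n) for 1 <= n <= n_max (lam[0] is unused)."""
    # Smallest-prime-factor sieve.
    spf = list(range(n_max + 1))
    for p in range(2, math.isqrt(n_max) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (n_max + 1)
    if n_max >= 1:
        lam[1] = 1
    for n in range(2, n_max + 1):
        # Liouville is completely multiplicative with lam(p) = -1 at every prime p.
        lam[n] = -lam[n // spf[n]]
    return lam

def log_averaged_correlation(x, h):
    """Compute sum_{n <= x} lam(n) lam(n+h) / n."""
    lam = liouville_up_to(x + h)
    return sum(lam[n] * lam[n + h] / n for n in range(1, x + 1))

if __name__ == "__main__":
    x, h = 10**5, 1  # illustrative parameters only
    s = log_averaged_correlation(x, h)
    print(f"sum = {s:.4f}, log x = {math.log(x):.4f}, ratio = {s / math.log(x):.4f}")
```

The printed ratio is the quantity that the two-point result asserts tends to zero as ${x \rightarrow \infty}$.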

It is natural to see to what extent the arguments can be adapted to attack the higher-point cases of the logarithmically averaged Chowla conjecture (ignoring for this post the more general Elliott conjecture for other bounded multiplicative functions than the Liouville function). That is to say, one would like to prove that

$\displaystyle \sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = o( \log x )$

as ${x \rightarrow \infty}$ for any fixed distinct integers ${h_1,\dots,h_k}$. As it turns out (and as is detailed in the current paper), the entropy decrement argument extends to this setting (after using some known facts about linear equations in primes), and allows one to reduce the above estimate to an estimate of the form

$\displaystyle \sum_{n \leq x} \frac{1}{n} \| \lambda \|_{U^d[n, n+H]} = o( \log x )$

for ${H}$ a slowly growing function of ${x}$ and some fixed ${d}$ (in fact we can take ${d=k-1}$ for ${k \geq 3}$), where ${U^d}$ is the (normalised) local Gowers uniformity norm. (In the case ${k=3}$, ${d=2}$, this becomes the Fourier-uniformity conjecture discussed in this previous post.) If one then applies the (now proven) inverse conjecture for the Gowers norms, this estimate is in turn equivalent to the more complicated looking assertion

$\displaystyle \sum_{n \leq x} \frac{1}{n} \sup |\sum_{h \leq H} \lambda(n+h) F( g^h x )| = o( \log x ) \ \ \ \ \ (1)$

where the supremum is over all possible choices of nilsequences ${h \mapsto F(g^h x)}$ of controlled step and complexity (see the paper for definitions of these terms).

The main novelty in the paper (elaborating upon a previous comment I had made on this blog) is to observe that this latter estimate in turn follows from the logarithmically averaged form of Sarnak’s conjecture (discussed in this previous post), namely that

$\displaystyle \sum_{n \leq x} \frac{1}{n} \lambda(n) F( T^n x )= o( \log x )$

whenever ${n \mapsto F(T^n x)}$ is a zero entropy (i.e. deterministic) sequence. Morally speaking, this follows from the well-known fact that nilsequences have zero entropy, but the presence of the supremum in (1) means that we need a little bit more; roughly speaking, we need the class of nilsequences of a given step and complexity to have “uniformly zero entropy” in some sense.

On the other hand, it was already known (see previous post) that the Chowla conjecture implied the Sarnak conjecture, and similarly for the logarithmically averaged form of the two conjectures. Putting all these implications together, we obtain the pleasant fact that the logarithmically averaged Sarnak and Chowla conjectures are equivalent, which is the main result of the current paper. There have been a large number of special cases of the Sarnak conjecture worked out (when the deterministic sequence involved came from a special dynamical system), so these results can now also be viewed as partial progress towards the Chowla conjecture also (at least with logarithmic averaging). However, my feeling is that the full resolution of these conjectures will not come from these sorts of special cases; instead, conjectures like the Fourier-uniformity conjecture in this previous post look more promising to attack.

It would also be nice to get rid of the pesky logarithmic averaging, but this seems to be an inherent requirement of the entropy decrement argument method, so one would probably have to find a way to avoid that argument if one were to remove the log averaging.

Tamar Ziegler and I have just uploaded to the arXiv two related papers: “Concatenation theorems for anti-Gowers-uniform functions and Host-Kra characteristic factors” and “Polynomial patterns in primes“, with the former developing a “quantitative Bessel inequality” for local Gowers norms that is crucial in the latter.

We use the term “concatenation theorem” to denote results in which structural control of a function in two or more “directions” can be “concatenated” into structural control in a joint direction. A trivial example of such a concatenation theorem is the following: if a function ${f: {\bf Z} \times {\bf Z} \rightarrow {\bf R}}$ is constant in the first variable (thus ${x \mapsto f(x,y)}$ is constant for each ${y}$), and also constant in the second variable (thus ${y \mapsto f(x,y)}$ is constant for each ${x}$), then it is constant in the joint variable ${(x,y)}$. A slightly less trivial example: if a function ${f: {\bf Z} \times {\bf Z} \rightarrow {\bf R}}$ is affine-linear in the first variable (thus, for each ${y}$, there exist ${\alpha(y), \beta(y)}$ such that ${f(x,y) = \alpha(y) x + \beta(y)}$ for all ${x}$) and affine-linear in the second variable (thus, for each ${x}$, there exist ${\gamma(x), \delta(x)}$ such that ${f(x,y) = \gamma(x)y + \delta(x)}$ for all ${y}$) then ${f}$ is a quadratic polynomial in ${x,y}$; in fact it must take the form

$\displaystyle f(x,y) = \epsilon xy + \zeta x + \eta y + \theta \ \ \ \ \ (1)$

for some real numbers ${\epsilon, \zeta, \eta, \theta}$. (This can be seen for instance by using the affine linearity in ${y}$ to show that the coefficients ${\alpha(y), \beta(y)}$ are also affine linear.)

The same phenomenon extends to higher degree polynomials. Given a function ${f: G \rightarrow K}$ from one additive group ${G}$ to another, we say that ${f}$ is of degree less than ${d}$ along a subgroup ${H}$ of ${G}$ if all the ${d}$-fold iterated differences of ${f}$ along directions in ${H}$ vanish, that is to say

$\displaystyle \partial_{h_1} \dots \partial_{h_d} f(x) = 0$

for all ${x \in G}$ and ${h_1,\dots,h_d \in H}$, where ${\partial_h}$ is the difference operator

$\displaystyle \partial_h f(x) := f(x+h) - f(x).$

(We adopt the convention that the only ${f}$ of degree less than ${0}$ is the zero function.)

We then have the following simple proposition:

Proposition 1 (Concatenation of polynomiality) Let ${f: G \rightarrow K}$ be of degree less than ${d_1}$ along one subgroup ${H_1}$ of ${G}$, and of degree less than ${d_2}$ along another subgroup ${H_2}$ of ${G}$, for some ${d_1,d_2 \geq 1}$. Then ${f}$ is of degree less than ${d_1+d_2-1}$ along the subgroup ${H_1+H_2}$ of ${G}$.

Note the previous example was basically the case when ${G = {\bf Z} \times {\bf Z}}$, ${H_1 = {\bf Z} \times \{0\}}$, ${H_2 = \{0\} \times {\bf Z}}$, ${K = {\bf R}}$, and ${d_1=d_2=2}$.

Proof: The claim is trivial for ${d_1=1}$ or ${d_2=1}$ (in which case ${f}$ is constant along ${H_1}$ or ${H_2}$ respectively), so suppose inductively ${d_1,d_2 \geq 2}$ and the claim has already been proven for smaller values of ${d_1+d_2}$.

We take a derivative in a direction ${h_1 \in H_1}$ to obtain

$\displaystyle T^{-h_1} f = f + \partial_{h_1} f$

where ${T^{-h_1} f(x) = f(x+h_1)}$ is the shift of ${f}$ by ${-h_1}$. Then we take a further shift by a direction ${h_2 \in H_2}$ to obtain

$\displaystyle T^{-h_1-h_2} f = T^{-h_2} f + T^{-h_2} \partial_{h_1} f = f + \partial_{h_2} f + T^{-h_2} \partial_{h_1} f$

leading to the cocycle equation

$\displaystyle \partial_{h_1+h_2} f = \partial_{h_2} f + T^{-h_2} \partial_{h_1} f.$

Since ${f}$ has degree less than ${d_1}$ along ${H_1}$ and degree less than ${d_2}$ along ${H_2}$, ${\partial_{h_1} f}$ has degree less than ${d_1-1}$ along ${H_1}$ and less than ${d_2}$ along ${H_2}$, so is of degree less than ${d_1+d_2-2}$ along ${H_1+H_2}$ by the induction hypothesis. Similarly ${\partial_{h_2} f}$ is also of degree less than ${d_1+d_2-2}$ along ${H_1+H_2}$. Combining this with the cocycle equation we see that ${\partial_{h_1+h_2}f}$ is of degree less than ${d_1+d_2-2}$ along ${H_1+H_2}$ for any ${h_1+h_2 \in H_1+H_2}$, and hence ${f}$ is of degree less than ${d_1+d_2-1}$ along ${H_1+H_2}$, as required. $\Box$
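As a sanity check, one can test Proposition 1 and the cocycle equation numerically on the bilinear example (1). The sketch below is only a finite-sample check and not a proof: the coefficients, the sample points, and the helper names (`diff`, `iterated_diff`, `degree_less_than`, `cocycle_defect`) are all illustrative choices of mine, and the subgroups ${H_1, H_2, H_1+H_2}$ are replaced by small finite samples of directions.

```python
from itertools import product

def diff(f, h):
    """Difference operator on Z x Z: (d_h f)(x) = f(x + h) - f(x)."""
    return lambda x: f((x[0] + h[0], x[1] + h[1])) - f(x)

def iterated_diff(f, hs):
    """Apply the difference operators d_h for each direction h in hs."""
    for h in hs:
        f = diff(f, h)
    return f

def degree_less_than(f, d, directions, points):
    """Finite-sample check that all d-fold differences along `directions` vanish."""
    return all(
        iterated_diff(f, hs)(x) == 0
        for hs in product(directions, repeat=d)
        for x in points
    )

# Illustrative integer coefficients for example (1); any values would do.
EPS, ZETA, ETA, THETA = 3, -2, 5, 7

def f(x):
    return EPS * x[0] * x[1] + ZETA * x[0] + ETA * x[1] + THETA

points = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
H1 = [(a, 0) for a in range(-1, 2)]                         # sample of Z x {0}
H2 = [(0, b) for b in range(-1, 2)]                         # sample of {0} x Z
H12 = [(a, b) for a in range(-1, 2) for b in range(-1, 2)]  # sample of H1 + H2

def cocycle_defect(f, h1, h2, x):
    """d_{h1+h2} f(x) - (d_{h2} f(x) + (d_{h1} f)(x + h2)); zero if the cocycle equation holds."""
    h12 = (h1[0] + h2[0], h1[1] + h2[1])
    shifted = (x[0] + h2[0], x[1] + h2[1])
    return diff(f, h12)(x) - (diff(f, h2)(x) + diff(f, h1)(shifted))

if __name__ == "__main__":
    print("degree < 2 along H1:     ", degree_less_than(f, 2, H1, points))
    print("degree < 2 along H2:     ", degree_less_than(f, 2, H2, points))
    print("degree < 3 along H1 + H2:", degree_less_than(f, 3, H12, points))
    print("degree < 2 along H1 + H2:", degree_less_than(f, 2, H12, points))
```

With a non-zero coefficient of ${xy}$, the last check fails, which illustrates that the degree bound ${d_1+d_2-1 = 3}$ in the proposition cannot be lowered for this example.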

While this proposition is simple, it already illustrates some basic principles regarding how one would go about proving a concatenation theorem:

• (i) One should perform induction on the degrees ${d_1,d_2}$ involved, and take advantage of the recursive nature of degree (in this case, the fact that a function is of degree less than ${d}$ along some subgroup ${H}$ of directions iff all of its first derivatives along ${H}$ are of degree less than ${d-1}$).
• (ii) Structure is preserved by operations such as addition, shifting, and taking derivatives. In particular, if a function ${f}$ is of degree less than ${d}$ along some subgroup ${H}$, then any derivative ${\partial_k f}$ of ${f}$ is also of degree less than ${d}$ along ${H}$, even if ${k}$ does not belong to ${H}$.

Here is another simple example of a concatenation theorem. Suppose an at most countable additive group ${G}$ acts by measure-preserving shifts ${T: g \mapsto T^g}$ on some probability space ${(X, {\mathcal X}, \mu)}$; we call the pair ${(X,T)}$ (or more precisely ${(X, {\mathcal X}, \mu, T)}$) a ${G}$-system. We say that a function ${f \in L^\infty(X)}$ is a generalised eigenfunction of degree less than ${d}$ along some subgroup ${H}$ of ${G}$ and some ${d \geq 1}$ if one has

$\displaystyle T^h f = \lambda_h f$

almost everywhere for all ${h \in H}$, and some functions ${\lambda_h \in L^\infty(X)}$ of degree less than ${d-1}$ along ${H}$, with the convention that a function has degree less than ${0}$ if and only if it is equal to ${1}$. Thus for instance, a function ${f}$ is a generalised eigenfunction of degree less than ${1}$ along ${H}$ if it is constant on almost every ${H}$-ergodic component of ${G}$, and is a generalised eigenfunction of degree less than ${2}$ along ${H}$ if it is an eigenfunction of the shift action on almost every ${H}$-ergodic component of ${G}$. A basic example of a higher order eigenfunction is the function ${f(x,y) := e^{2\pi i y}}$ on the skew shift ${({\bf R}/{\bf Z})^2}$ with ${{\bf Z}}$ action given by the generator ${T(x,y) := (x+\alpha,y+x)}$ for some irrational ${\alpha}$. One can check that ${T^h f = \lambda_h f}$ for every integer ${h}$, where ${\lambda_h: x \mapsto e^{2\pi i \binom{h}{2} \alpha} e^{2\pi i h x}}$ is a generalised eigenfunction of degree less than ${2}$ along ${{\bf Z}}$, so ${f}$ is of degree less than ${3}$ along ${{\bf Z}}$.
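The skew shift identity ${T^h f = \lambda_h f}$ is easy to test in floating point. The sketch below makes some assumptions that are mine rather than the text's: it reads ${T^h f}$ as ${f \circ T^h}$, only treats ${h \geq 0}$, takes ${\alpha = \sqrt{2}}$ as a sample irrational, and uses an arbitrary base point.

```python
import cmath
import math

def T(point, alpha):
    """One step of the skew shift (x, y) -> (x + alpha, y + x) on the 2-torus."""
    x, y = point
    return ((x + alpha) % 1.0, (y + x) % 1.0)

def f(point):
    """The function f(x, y) = exp(2 pi i y)."""
    return cmath.exp(2j * math.pi * point[1])

def lam(h, x, alpha):
    """lambda_h(x) = exp(2 pi i binom(h, 2) alpha) * exp(2 pi i h x), for h >= 0."""
    return cmath.exp(2j * math.pi * (math.comb(h, 2) * alpha + h * x))

def defect(point, alpha, h):
    """|f(T^h p) - lambda_h(x) f(p)|; should vanish up to rounding error."""
    q = point
    for _ in range(h):
        q = T(q, alpha)
    return abs(f(q) - lam(h, point[0], alpha) * f(point))

if __name__ == "__main__":
    alpha = math.sqrt(2)  # sample irrational
    p = (0.3, 0.7)        # arbitrary base point
    for h in range(6):
        print(h, defect(p, alpha, h))
```

The binomial coefficient arises because the ${y}$ coordinate accumulates ${x + (x+\alpha) + \dots + (x+(h-1)\alpha) = hx + \binom{h}{2}\alpha}$ after ${h}$ steps.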

We then have

Proposition 2 (Concatenation of higher order eigenfunctions) Let ${(X,T)}$ be a ${G}$-system, and let ${f \in L^\infty(X)}$ be a generalised eigenfunction of degree less than ${d_1}$ along one subgroup ${H_1}$ of ${G}$, and a generalised eigenfunction of degree less than ${d_2}$ along another subgroup ${H_2}$ of ${G}$, for some ${d_1,d_2 \geq 1}$. Then ${f}$ is a generalised eigenfunction of degree less than ${d_1+d_2-1}$ along the subgroup ${H_1+H_2}$ of ${G}$.

The argument is almost identical to that of the previous proposition and is left as an exercise to the reader. The key point is the point (ii) identified earlier: the space of generalised eigenfunctions of degree less than ${d}$ along ${H}$ is preserved by multiplication and shifts, as well as the operation of “taking derivatives” ${f \mapsto \lambda_k}$ even along directions ${k}$ that do not lie in ${H}$. (To prove this latter claim, one should restrict to the region where ${f}$ is non-zero, and then divide ${T^k f}$ by ${f}$ to locate ${\lambda_k}$.)

A typical example of this proposition in action is as follows: consider the ${{\bf Z}^2}$-system given by the ${3}$-torus ${({\bf R}/{\bf Z})^3}$ with generating shifts

$\displaystyle T^{(1,0)}(x,y,z) := (x+\alpha,y,z+y)$

$\displaystyle T^{(0,1)}(x,y,z) := (x,y+\alpha,z+x)$

for some irrational ${\alpha}$, which can be checked to give a ${{\bf Z}^2}$ action

$\displaystyle T^{(n,m)}(x,y,z) := (x+n\alpha, y+m\alpha, z+ny+mx+nm\alpha).$

The function ${f(x,y,z) := e^{2\pi i z}}$ can then be checked to be a generalised eigenfunction of degree less than ${2}$ along ${{\bf Z} \times \{0\}}$, and also less than ${2}$ along ${\{0\} \times {\bf Z}}$, and less than ${3}$ along ${{\bf Z}^2}$. One can view this example as the dynamical systems translation of the example (1) (see this previous post for some more discussion of this sort of correspondence).
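The closed form for ${T^{(n,m)}}$ can be checked mechanically by composing the two generators. In the sketch below I use exact rational arithmetic, with a rational stand-in for ${\alpha}$; this is an assumption made purely for exactness, since the algebraic identity holds for every ${\alpha}$ and irrationality only matters for the dynamics. The function names are hypothetical, and only ${n, m \geq 0}$ are treated.

```python
from fractions import Fraction

def mod1(t):
    """Reduce a rational number modulo 1."""
    return t % 1

def T10(p, a):
    """Generator T^{(1,0)}(x, y, z) = (x + a, y, z + y)."""
    x, y, z = p
    return (mod1(x + a), y, mod1(z + y))

def T01(p, a):
    """Generator T^{(0,1)}(x, y, z) = (x, y + a, z + x)."""
    x, y, z = p
    return (x, mod1(y + a), mod1(z + x))

def closed_form(p, n, m, a):
    """Claimed formula T^{(n,m)}(x, y, z) = (x + n a, y + m a, z + n y + m x + n m a)."""
    x, y, z = p
    return (mod1(x + n * a), mod1(y + m * a), mod1(z + n * y + m * x + n * m * a))

def compose(p, n, m, a):
    """Apply T^{(0,1)} m times and then T^{(1,0)} n times (n, m >= 0)."""
    for _ in range(m):
        p = T01(p, a)
    for _ in range(n):
        p = T10(p, a)
    return p

if __name__ == "__main__":
    a = Fraction(3, 10)  # rational stand-in for alpha
    p = (Fraction(1, 7), Fraction(2, 5), Fraction(1, 3))
    print(T10(T01(p, a), a) == T01(T10(p, a), a))  # the generators commute
    print(all(compose(p, n, m, a) == closed_form(p, n, m, a)
              for n in range(5) for m in range(5)))
```

The cross term ${nm\alpha}$ in the third coordinate is the dynamical counterpart of the ${\epsilon xy}$ term in (1).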

The main results of our concatenation paper are analogues of these propositions concerning a more complicated notion of “polynomial-like” structure that is of importance in additive combinatorics and in ergodic theory. On the ergodic theory side, the notion of structure is captured by the Host-Kra characteristic factors ${Z^{<d}_H(X)}$ of a ${G}$-system ${X}$ along a subgroup ${H}$. These factors can be defined in a number of ways. One is by duality, using the Gowers-Host-Kra uniformity seminorms (defined for instance here) ${\| \|_{U^d_H(X)}}$. Namely, ${Z^{<d}_H(X)}$ is the factor of ${X}$ defined up to equivalence by the requirement that

$\displaystyle \|f\|_{U^d_H(X)} = 0 \iff {\bf E}(f | Z^{<d}_H(X)) = 0.$

An equivalent definition is in terms of the dual functions ${{\mathcal D}^d_H(f)}$ of ${f}$ along ${H}$, which can be defined recursively by setting ${{\mathcal D}^0_H(f) = 1}$ and

$\displaystyle {\mathcal D}^d_H(f) = {\bf E}_h T^h f {\mathcal D}^{d-1}( f \overline{T^h f} )$

where ${{\bf E}_h}$ denotes the ergodic average along a Følner sequence in ${G}$ (in fact one can also define these concepts in non-amenable abelian settings as per this previous post). The factor ${Z^{<d}_H(X)}$ can then be alternately defined as the factor generated by the dual functions ${{\mathcal D}^d_H(f)}$ for ${f \in L^\infty(X)}$.

In the case when ${G=H={\bf Z}}$ and ${X}$ is ${G}$-ergodic, a deep theorem of Host and Kra shows that the factor ${Z^{<d}_H(X)}$ is equivalent to the inverse limit of nilsystems of step less than ${d}$. A similar statement holds with ${{\bf Z}}$ replaced by any finitely generated group by Griesmer, while the case of an infinite vector space over a finite field was treated in this paper of Bergelson, Ziegler, and myself. The situation is more subtle when ${X}$ is not ${G}$-ergodic, or when ${X}$ is ${G}$-ergodic but ${H}$ is a proper subgroup of ${G}$ acting non-ergodically, when one has to start considering measurable families of directional nilsystems; see for instance this paper of Austin for some of the subtleties involved (for instance, higher order group cohomology begins to become relevant!).

One of our main theorems is then

Proposition 3 (Concatenation of characteristic factors) Let ${(X,T)}$ be a ${G}$-system, and let ${f}$ be measurable with respect to the factor ${Z^{<d_1}_{H_1}(X)}$ and with respect to the factor ${Z^{<d_2}_{H_2}(X)}$ for some ${d_1,d_2 \geq 1}$ and some subgroups ${H_1,H_2}$ of ${G}$. Then ${f}$ is also measurable with respect to the factor ${Z^{<d_1+d_2-1}_{H_1+H_2}(X)}$.

We give two proofs of this proposition in the paper; an ergodic-theoretic proof using the Host-Kra theory of “cocycles of type ${<d}$ (along a subgroup ${H}$)”, which can be used to inductively describe the factors ${Z^{<d}_H(X)}$, and a combinatorial proof based on a combinatorial analogue of this proposition which is harder to state (but which roughly speaking asserts that a function which is nearly orthogonal to all bounded functions of small ${U^{d_1}_{H_1}}$ norm, and also to all bounded functions of small ${U^{d_2}_{H_2}}$ norm, is also nearly orthogonal to all bounded functions of small ${U^{d_1+d_2-1}_{H_1+H_2}}$ norm). The combinatorial proof parallels the proof of Proposition 2. A key point is that dual functions ${F := {\mathcal D}^d_H(f)}$ obey a property analogous to being a generalised eigenfunction, namely that

$\displaystyle T^h F = {\bf E}_k \lambda_{h,k} F_k$

where ${F_k := T^k F}$ and ${\lambda_{h,k} := {\mathcal D}^{d-1}( T^h f \overline{T^k f} )}$ is a “structured function of order ${d-1}$” along ${H}$. (In the language of this previous paper of mine, this is an assertion that dual functions are uniformly almost periodic of order ${d}$.) Again, the point (ii) above is crucial, and in particular it is key that any structure that ${F}$ has is inherited by the associated functions ${\lambda_{h,k}}$ and ${F_k}$. This sort of inheritance is quite easy to accomplish in the ergodic setting, as there is a ready-made language of factors to encapsulate the concept of structure, and the shift-invariance and ${\sigma}$-algebra properties of factors make it easy to show that just about any “natural” operation one performs on a function measurable with respect to a given factor, returns a function that is still measurable in that factor. In the finitary combinatorial setting, though, encoding the fact (ii) becomes a remarkably complicated notational nightmare, requiring a huge amount of “epsilon management” and “second-order epsilon management” (in which one manages not only scalar epsilons, but also function-valued epsilons that depend on other parameters). In order to avoid all this we were forced to utilise a nonstandard analysis framework for the combinatorial theorems, which made the arguments greatly resemble the ergodic arguments in many respects (though the two settings are still not equivalent, see this previous blog post for some comparisons between the two settings). Unfortunately the arguments are still rather complicated.

For combinatorial applications, dual formulations of the concatenation theorem are more useful. A direct dualisation of the theorem yields the following decomposition theorem: a bounded function which is small in ${U^{d_1+d_2-1}_{H_1+H_2}}$ norm can be split into a component that is small in ${U^{d_1}_{H_1}}$ norm, and a component that is small in ${U^{d_2}_{H_2}}$ norm. (One may wish to understand this type of result by first proving the following baby version: any function that has mean zero on every coset of ${H_1+H_2}$, can be decomposed as the sum of a function that has mean zero on every ${H_1}$ coset, and a function that has mean zero on every ${H_2}$ coset. This is dual to the assertion that a function that is constant on every ${H_1}$ coset and constant on every ${H_2}$ coset, is constant on every ${H_1+H_2}$ coset.) Combining this with some standard “almost orthogonality” arguments (i.e. Cauchy-Schwarz) gives the following Bessel-type inequality: if one has a lot of subgroups ${H_1,\dots,H_k}$ and a bounded function is small in ${U^{2d-1}_{H_i+H_j}}$ norm for most ${i,j}$, then it is also small in ${U^d_{H_i}}$ norm for most ${i}$. (Here is a baby version one may wish to warm up on: if a function ${f}$ has small mean on ${({\bf Z}/p{\bf Z})^2}$ for some large prime ${p}$, then it has small mean on most of the cosets of most of the one-dimensional subgroups of ${({\bf Z}/p{\bf Z})^2}$.)
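The first baby version above (mean zero on every ${H_1+H_2}$ coset) can be checked by hand in a small example. The following is only an illustrative sketch: the ambient group ${{\bf Z}/12{\bf Z}}$, the two subgroups, and the sample data are arbitrary choices made here, and the decomposition used (take ${f_2}$ to be the average of ${f}$ over ${H_1}$ cosets, and ${f_1 := f - f_2}$) is one natural candidate rather than necessarily the one used in the paper.

```python
from fractions import Fraction

N = 12                      # ambient group Z/12Z (an illustrative choice)
H1 = [0, 6]                 # subgroup of order 2
H2 = [0, 4, 8]              # subgroup of order 3
H12 = sorted({(a + b) % N for a in H1 for b in H2})  # H1 + H2 = even residues

def coset_average(f, H):
    """Replace f(x) by its average over the coset x + H."""
    return [sum(f[(x + h) % N] for h in H) / len(H) for x in range(N)]

# Start from arbitrary data and subtract the (H1 + H2)-coset means,
# so that f has mean zero on every coset of H1 + H2.
raw = [Fraction(v) for v in [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]]
mean12 = coset_average(raw, H12)
f = [raw[x] - mean12[x] for x in range(N)]

f2 = coset_average(f, H1)              # constant on H1 cosets
f1 = [f[x] - f2[x] for x in range(N)]  # mean zero on every H1 coset

mean_f1_on_H1 = coset_average(f1, H1)  # expected to vanish identically
mean_f2_on_H2 = coset_average(f2, H2)  # expected to vanish identically
```

The reason this works is that averaging over ${H_1}$ and then over ${H_2}$ is the same as averaging over ${H_1+H_2}$, so the ${H_2}$-coset means of ${f_2}$ are just the ${H_1+H_2}$-coset means of ${f}$, which vanish by hypothesis.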

There is also a generalisation of the above Bessel inequality (as well as several of the other results mentioned above) in which the subgroups ${H_i}$ are replaced by more general coset progressions ${H_i+P_i}$ (of bounded rank), so that one has a Bessel inequality controlling “local” Gowers uniformity norms such as ${U^d_{P_i}}$ by “global” Gowers uniformity norms such as ${U^{2d-1}_{P_i+P_j}}$. This turns out to be particularly useful when attempting to compute polynomial averages such as

$\displaystyle \sum_{n \leq N} \sum_{r \leq \sqrt{N}} f(n) g(n+r^2) h(n+2r^2) \ \ \ \ \ (2)$

for various functions ${f,g,h}$. After repeated use of the van der Corput lemma, one can control such averages by expressions such as

$\displaystyle \sum_{n \leq N} \sum_{h,m,k \leq \sqrt{N}} f(n) f(n+mh) f(n+mk) f(n+m(h+k))$

(actually one ends up with more complicated expressions than this, but let’s use this example for sake of discussion). This can be viewed as an average of various ${U^2}$ Gowers uniformity norms of ${f}$ along arithmetic progressions of the form ${\{ mh: h \leq \sqrt{N}\}}$ for various ${m \leq \sqrt{N}}$. Using the above Bessel inequality, this can be controlled in turn by an average of various ${U^3}$ Gowers uniformity norms along rank two generalised arithmetic progressions of the form ${\{ m_1 h_1 + m_2 h_2: h_1,h_2 \le \sqrt{N}\}}$ for various ${m_1,m_2 \leq \sqrt{N}}$. But for generic ${m_1,m_2}$, this rank two progression is close in a certain technical sense to the “global” interval ${\{ n: n \leq N \}}$ (this is ultimately due to the basic fact that two randomly chosen large integers are likely to be coprime, or at least have a small gcd). As a consequence, one can use the concatenation theorems from our first paper to control expressions such as (2) in terms of global Gowers uniformity norms. This is important in number theoretic applications, when one is interested in computing sums such as

$\displaystyle \sum_{n \leq N} \sum_{r \leq \sqrt{N}} \mu(n) \mu(n+r^2) \mu(n+2r^2)$

or

$\displaystyle \sum_{n \leq N} \sum_{r \leq \sqrt{N}} \Lambda(n) \Lambda(n+r^2) \Lambda(n+2r^2)$

where ${\mu}$ and ${\Lambda}$ are the Möbius and von Mangoldt functions respectively. This is because we are able to control global Gowers uniformity norms of such functions (thanks to results such as the proof of the inverse conjecture for the Gowers norms, the orthogonality of the Möbius function with nilsequences, and asymptotics for linear equations in primes), but much less control is currently available for local Gowers uniformity norms, even with the assistance of the generalised Riemann hypothesis (see this previous blog post for some further discussion).
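The “basic fact” invoked earlier, that two randomly chosen large integers are likely to be coprime, is easy to test numerically. The sketch below is only a sanity check under assumptions made here (the cutoff ${M}$ and the helper name `coprime_fraction` are illustrative); it compares the proportion of coprime pairs ${m_1,m_2 \leq M}$ with the classical limiting density ${6/\pi^2}$.

```python
import math

def coprime_fraction(M):
    """Fraction of pairs (m1, m2) with 1 <= m1, m2 <= M and gcd(m1, m2) = 1."""
    hits = sum(1 for m1 in range(1, M + 1) for m2 in range(1, M + 1)
               if math.gcd(m1, m2) == 1)
    return hits / (M * M)

limit = 6 / math.pi ** 2   # classical limiting density of coprime pairs
print(coprime_fraction(300), limit)
```

So a generic pair ${m_1,m_2}$ is coprime a positive proportion of the time, which is the heuristic reason why the rank two progressions above tend to behave like the global interval.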

By combining these tools and strategies with the “transference principle” approach from our previous paper (as improved using the recent “densification” technique of Conlon, Fox, and Zhao, discussed in this previous post), we are able in particular to establish the following result:

Theorem 4 (Polynomial patterns in the primes) Let ${P_1,\dots,P_k: {\bf Z} \rightarrow {\bf Z}}$ be polynomials of degree at most ${d}$, whose degree ${d}$ coefficients are all distinct, for some ${d \geq 1}$. Suppose that ${P_1,\dots,P_k}$ is admissible in the sense that for every prime ${p}$, there are ${n,r}$ such that ${n+P_1(r),\dots,n+P_k(r)}$ are all coprime to ${p}$. Then there exist infinitely many pairs ${n,r}$ of natural numbers such that ${n+P_1(r),\dots,n+P_k(r)}$ are prime.

Furthermore, we obtain an asymptotic for the number of such pairs ${n,r}$ in the range ${n \leq N}$, ${r \leq N^{1/d}}$ (actually for minor technical reasons we reduce the range of ${r}$ to be very slightly less than ${N^{1/d}}$). In fact one could in principle obtain asymptotics for smaller values of ${r}$, and replace the requirement that the degree ${d}$ coefficients be distinct with the weaker requirement that no two of the ${P_i}$ differ by a constant, provided one had good enough local uniformity results for the Möbius or von Mangoldt functions. For instance, we can obtain an asymptotic for triplets of the form ${n, n+r,n+r^d}$ unconditionally for ${d \leq 5}$, and conditionally on GRH for all ${d}$, using known results on primes in short intervals on average.

The ${d=1}$ case of this theorem was obtained in a previous paper of myself and Ben Green (using the aforementioned conjectures on the Gowers uniformity norm and the orthogonality of the Möbius function with nilsequences, both of which are now proven). For higher ${d}$, an older result of Tamar and myself was able to tackle the case when ${P_1(0)=\dots=P_k(0)=0}$ (though our results there only give lower bounds on the number of pairs ${(n,r)}$, and no asymptotics). Both of these results generalise my older theorem with Ben Green on the primes containing arbitrarily long arithmetic progressions. The theorem also extends to multidimensional polynomials, in which case there are some additional previous results; see the paper for more details. We also get a technical refinement of our previous result on narrow polynomial progressions in (dense subsets of) the primes by making the progressions just a little bit narrower in the case when the density of the set one is using is small.

The Bessel-type inequality discussed above is particularly useful in combinatorial and number-theoretic applications, as it allows one to convert “global” Gowers uniformity norm control (basically, bounds on norms such as ${U^{2d-1}_{H_i+H_j}}$) to “local” Gowers uniformity norm control.

Van Vu and I just posted to the arXiv our paper “sum-free sets in groups” (submitted to Discrete Analysis), as well as a companion survey article (submitted to J. Comb.). Given a subset ${A}$ of an additive group ${G = (G,+)}$, define the quantity ${\phi(A)}$ to be the cardinality of the largest subset ${B}$ of ${A}$ which is sum-free in ${A}$ in the sense that all the sums ${b_1+b_2}$ with ${b_1,b_2}$ distinct elements of ${B}$ lie outside of ${A}$. For instance, if ${A}$ is itself a group, then ${\phi(A)=1}$, since no two elements of ${A}$ can sum to something outside of ${A}$. More generally, if ${A}$ is the union of ${k}$ groups, then ${\phi(A)}$ is at most ${k}$, thanks to the pigeonhole principle.

If ${G}$ is the integers, then there are no non-trivial subgroups, and one can thus expect ${\phi(A)}$ to start growing with ${A}$. For instance, one has the following easy result:

Proposition 1 Let ${A}$ be a set of ${2^k}$ natural numbers. Then ${\phi(A) > k}$.

Proof: We use an argument of Ruzsa, which is based in turn on an older argument of Choi. Let ${x_1}$ be the largest element of ${A}$, and then recursively, once ${x_1,\dots,x_i}$ have been selected, let ${x_{i+1}}$ be the largest element of ${A}$ not equal to any of the ${x_1,\dots,x_i}$, such that ${x_{i+1}+x_j \not \in A}$ for all ${j=1,\dots,i}$, terminating this construction when no such ${x_{i+1}}$ can be located. This gives a sequence ${x_1 > x_2 > \dots > x_m}$ of elements in ${A}$ which are sum-free in ${A}$, and with the property that for any ${y \in A}$, either ${y}$ is equal to one of the ${x_i}$, or else ${y + x_i \in A}$ for some ${i}$ with ${x_i > y}$. Iterating this, we see that any ${y \in A}$ is of the form ${x_{i_1} - x_{i_2} - \dots - x_{i_j}}$ for some ${j \geq 1}$ and ${1 \leq i_1 < i_2 < \dots < i_j \leq m}$. The number of such expressions ${x_{i_1} - x_{i_2} - \dots - x_{i_j}}$ is at most ${2^{m}-1}$, thus ${2^k \leq 2^m-1}$ which implies ${m \geq k+1}$. Since ${\phi(A) \geq m}$, the claim follows. $\Box$
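The greedy selection in this proof is simple enough to run directly. Here is a minimal sketch (the function names are my own labels for this illustration); since each new ${x_{i+1}}$ is automatically smaller than the previous picks, a single pass through ${A}$ in decreasing order suffices.

```python
from itertools import combinations

def greedy_sum_free(A):
    """Greedy selection from the proof: scan A in decreasing order, keeping
    each element whose sums with all earlier picks fall outside A."""
    A = set(A)
    chosen = []
    for y in sorted(A, reverse=True):
        if all(y + x not in A for x in chosen):
            chosen.append(y)
    return chosen

def is_sum_free_in(B, A):
    """True if b1 + b2 lies outside A for all distinct b1, b2 in B."""
    A = set(A)
    return all(b1 + b2 not in A for b1, b2 in combinations(B, 2))

A = range(1, 9)                 # 2^3 natural numbers, so k = 3
picks = greedy_sum_free(A)
print(picks, is_sum_free_in(picks, A))   # [8, 7, 6, 5, 4] True
```

In this example the greedy sequence has length ${m = 5 \geq k+1 = 4}$, consistent with the proposition.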

In particular, we have ${\phi(A) \gg \log |A|}$ for subsets ${A}$ of the integers. It has been possible to improve upon this easy bound, but only with remarkable effort. The best lower bound currently is

$\displaystyle \phi(A) \geq \log |A| (\log\log|A|)^{1/2 - o(1)},$

a result of Shao (building upon earlier work of Sudakov, Szemeredi, and Vu and of Dousse). In the opposite direction, a construction of Ruzsa gives examples of large sets ${A}$ with ${\phi(A) \leq \exp( O( \sqrt{\log |A|} ) )}$.

Using the standard tool of Freiman homomorphisms, the above results for the integers extend to other torsion-free abelian groups ${G}$. In our paper we study the opposite case where ${G}$ is finite (but still abelian). In this paper of Erdös (in which the quantity ${\phi(A)}$ was first introduced), the following question was posed: if ${A}$ is sufficiently large depending on ${\phi(A)}$, does this imply the existence of two elements ${x,y \in A}$ with ${x+y=0}$? As it turns out, we were able to find some simple counterexamples to this statement. For instance, if ${H}$ is any finite additive group, then the set ${A := \{ 1 \hbox{ mod } 7, 2 \hbox{ mod } 7, 4 \hbox{ mod } 7\} \times H \subset {\bf Z}/7{\bf Z} \times H}$ has ${\phi(A)=3}$ but with no ${x,y \in A}$ summing to zero; this type of example in fact works with ${7}$ replaced by any larger Mersenne prime, and we also have a counterexample in ${{\bf Z}/2^n{\bf Z}}$ for ${n}$ arbitrarily large. However, in the positive direction, we can show that the answer to Erdös’s question is positive if ${|G|}$ is assumed to have no small prime factors. That is to say,

Theorem 2 For every ${k \geq 1}$ there exists ${C \geq 1}$ such that if ${G}$ is a finite abelian group whose order is not divisible by any prime less than or equal to ${C}$, and ${A}$ is a subset of ${G}$ with order at least ${C}$ and ${\phi(A) \leq k}$, then there exist ${x,y \in A}$ with ${x+y=0}$.
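The counterexample ${\{1,2,4 \hbox{ mod } 7\} \times H}$ described before Theorem 2 can be confirmed by brute force for a small choice of ${H}$. The sketch below takes ${H = {\bf Z}/3{\bf Z}}$, which is purely an assumption made here to keep the search tiny; the helper names are likewise illustrative.

```python
from itertools import combinations, product

H_ORDER = 3  # H = Z/3Z, an arbitrary small choice of finite group
A = set(product([1, 2, 4], range(H_ORDER)))  # subset of Z/7Z x Z/3Z

def add(x, y):
    """Group law on Z/7Z x Z/3Z."""
    return ((x[0] + y[0]) % 7, (x[1] + y[1]) % H_ORDER)

def is_sum_free_in(B, A):
    """True if x + y lies outside A for all distinct x, y in B."""
    return all(add(x, y) not in A for x, y in combinations(B, 2))

def phi(A):
    """Size of the largest subset of A that is sum-free in A (brute force)."""
    elems = sorted(A)
    for size in range(len(elems), 0, -1):
        if any(is_sum_free_in(B, A) for B in combinations(elems, size)):
            return size
    return 0

zero = (0, 0)
has_zero_sum_pair = any(add(x, y) == zero for x in A for y in A)
print(phi(A), has_zero_sum_pair)   # 3 False
```

The underlying reason is that ${\{1,2,4\}}$ is closed under doubling mod ${7}$ while the sum of two distinct elements of it lies outside it, so a subset is sum-free in ${A}$ exactly when its ${{\bf Z}/7{\bf Z}}$ components are distinct.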

There are two main tools used to prove this result. One is an “arithmetic removal lemma” proven by Král, Serra, and Vena. Note that the condition ${\phi(A) \leq k}$ means that for any distinct ${x_1,\dots,x_{k+1} \in A}$, at least one of the ${x_i+x_j}$, ${1 \leq i < j \leq k+1}$, must also lie in ${A}$. Roughly speaking, the arithmetic removal lemma allows one to “almost” remove the requirement that ${x_1,\dots,x_{k+1}}$ be distinct, which basically now means that ${x \in A \implies 2x \in A}$ for almost all ${x \in A}$. This near-dilation symmetry, when combined with the hypothesis that ${|G|}$ has no small prime factors, gives a lot of “dispersion” in the Fourier coefficients of ${1_A}$ which can now be exploited to prove the theorem.

The second tool is the following structure theorem, which is the main result of our paper, and goes a fair ways towards classifying sets ${A}$ for which ${\phi(A)}$ is small:

Theorem 3 Let ${A}$ be a finite subset of an arbitrary additive group ${G}$, with ${\phi(A) \leq k}$. Then one can find finite subgroups ${H_1,\dots,H_m}$ with ${m \leq k}$ such that ${|A \cap H_i| \gg_k |H_i|}$ and ${|A \backslash (H_1 \cup \dots \cup H_m)| \ll_k 1}$. Furthermore, if ${m=k}$, then the exceptional set ${A \backslash (H_1 \cup \dots \cup H_m)}$ is empty.

Roughly speaking, this theorem shows that the example of the union of ${k}$ subgroups mentioned earlier is more or less the “only” example of sets ${A}$ with ${\phi(A) \leq k}$, modulo the addition of some small exceptional sets and some refinement of the subgroups to dense subsets.

This theorem has the flavour of other inverse theorems in additive combinatorics, such as Freiman’s theorem, and indeed one can use Freiman’s theorem (and related tools, such as the Balog-Szemeredi theorem) to easily get a weaker version of this theorem. Indeed, if there are no sum-free subsets of ${A}$ of order ${k+1}$, then a fraction ${\gg_k 1}$ of all pairs ${a,b}$ in ${A}$ must have their sum also in ${A}$ (otherwise one could take ${k+1}$ random elements of ${A}$ and they would be sum-free in ${A}$ with positive probability). From this and the Balog-Szemeredi theorem and Freiman’s theorem (in arbitrary abelian groups, as established by Green and Ruzsa), we see that ${A}$ must be “commensurate” with a “coset progression” ${H+P}$ of bounded rank. One can then eliminate the torsion-free component ${P}$ of this coset progression by a number of methods (e.g. by using variants of the argument in Proposition 1), with the upshot being that one can locate a finite group ${H_1}$ that has large intersection with ${A}$.

At this point it is tempting to simply remove ${H_1}$ from ${A}$ and iterate. But one runs into a technical difficulty that removing a set such as ${H_1}$ from ${A}$ can alter the quantity ${\phi(A)}$ in unpredictable ways, so one has to still keep ${H_1}$ around when analysing the residual set ${A \backslash H_1}$. A second difficulty is that the latter set ${A \backslash H_1}$ could be considerably smaller than ${A}$ or ${H_1}$, but still large in absolute terms, so in particular any error term whose size is only bounded by ${\varepsilon |A|}$ for a small ${\varepsilon}$ could be massive compared with the residual set ${A\backslash H_1}$, and so such error terms would be unacceptable. One can get around these difficulties if one first performs some preliminary “normalisation” of the group ${H_1}$, so that the residual set ${A \backslash H_1}$ does not intersect any coset of ${H_1}$ too strongly. The arguments become even more complicated when one starts removing more than one group ${H_1,\dots,H_i}$ from ${A}$ and analyses the residual set ${A \backslash (H_1 \cup \dots \cup H_i)}$; indeed the “epsilon management” involved became so fearsomely intricate that we were forced to use a nonstandard analysis formulation of the problem in order to keep the complexity of the argument at a reasonable level (cf. my previous blog post on this topic). One drawback of doing so is that we have no effective bounds for the implied constants in our main theorem; it would be of interest to obtain a more direct proof of our main theorem that would lead to effective bounds.

I’ve just uploaded to the arXiv my paper Finite time blowup for high dimensional nonlinear wave systems with bounded smooth nonlinearity, submitted to Comm. PDE. This paper is in the same spirit as (though not directly related to) my previous paper on finite time blowup of supercritical NLW systems, and was inspired by a question posed to me some time ago by Jeffrey Rauch. Here, instead of looking at supercritical equations, we look at an extremely subcritical equation, namely a system of the form

$\displaystyle \Box u = f(u) \ \ \ \ \ (1)$

where ${u: {\bf R}^{1+d} \rightarrow {\bf R}^m}$ is the unknown field, and ${f: {\bf R}^m \rightarrow {\bf R}^m}$ is the nonlinearity, which we assume to have all derivatives bounded. A typical example of such an equation is the higher-dimensional sine-Gordon equation

$\displaystyle \Box u = \sin u$

for a scalar field ${u: {\bf R}^{1+d} \rightarrow {\bf R}}$. Here ${\Box = -\partial_t^2 + \Delta}$ is the d’Alembertian operator. We restrict attention here to classical (i.e. smooth) solutions to (1).

We do not assume any Hamiltonian structure, so we do not require ${f}$ to be a gradient ${f = \nabla F}$ of a potential ${F: {\bf R}^m \rightarrow {\bf R}}$. But even without such Hamiltonian structure, the equation (1) is very well behaved, with many a priori bounds available. For instance, if the initial position ${u_0(x) = u(0,x)}$ and initial velocity ${u_1(x) = \partial_t u(0,x)}$ are smooth and compactly supported, then from finite speed of propagation ${u(t)}$ has uniformly bounded compact support for all ${t}$ in a bounded interval. As the nonlinearity ${f}$ is bounded, this immediately places ${f(u)}$ in ${L^\infty_t L^2_x}$ in any bounded time interval, which by the energy inequality gives an a priori ${L^\infty_t H^1_x}$ bound on ${u}$ in this time interval. Next, from the chain rule we have

$\displaystyle \nabla f(u) = (\nabla_{{\bf R}^m} f)(u) \nabla u$

which (from the assumption that ${\nabla_{{\bf R}^m} f}$ is bounded) shows that ${f(u)}$ is in ${L^\infty_t H^1_x}$, which by the energy inequality again now gives an a priori ${L^\infty_t H^2_x}$ bound on ${u}$.

One might expect that one could keep iterating this and obtain a priori bounds on ${u}$ in arbitrarily smooth norms. In low dimensions such as ${d \leq 3}$, this is a fairly easy task, since the above estimates and Sobolev embedding already place one in ${L^\infty_t L^\infty_x}$, and the nonlinear map ${f}$ is easily verified to preserve the space ${L^\infty_t H^k_x \cap L^\infty_t L^\infty_x}$ for any natural number ${k}$, from which one obtains a priori bounds in any Sobolev space; from this and standard energy methods, one can then establish global regularity for this equation (that is to say, any smooth choice of initial data generates a global smooth solution). However, one starts running into trouble in higher dimensions, in which no ${L^\infty_x}$ bound is available. The main problem is that even a really nice nonlinearity such as ${u \mapsto \sin u}$ is unbounded in higher Sobolev norms. The estimates

$\displaystyle |\sin u| \leq |u|$

and

$\displaystyle |\nabla(\sin u)| \leq |\nabla u|$

ensure that the map ${u \mapsto \sin u}$ is bounded in low regularity spaces like ${L^2_x}$ or ${H^1_x}$, but one already runs into trouble with the second derivative

$\displaystyle \nabla^2(\sin u) = (\cos u) \nabla^2 u - (\sin u) \nabla u \nabla u$

where there is a troublesome lower order term of size ${O( |\nabla u|^2 )}$ which becomes difficult to control in higher dimensions, preventing the map ${u \mapsto \sin u}$ from being bounded in ${H^2_x}$. Ultimately, the issue here is that when ${u}$ is not controlled in ${L^\infty}$, the function ${\sin u}$ can oscillate at a much higher frequency than ${u}$; for instance, if ${u}$ is the one-dimensional wave ${u = A \sin(kx)}$ for some ${k > 0}$ and ${A>1}$, then ${u}$ oscillates at frequency ${k}$, but the function ${\sin(u)= \sin(A \sin(kx))}$ more or less oscillates at the larger frequency ${Ak}$.
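One crude way to see this frequency inflation numerically is to count sign changes per period, using the number of zeros as a rough proxy for frequency. The sketch below is only illustrative: the values ${A=10}$, ${k=3}$ and the sampling scheme are assumptions chosen here.

```python
import math

def sign_changes_per_period(func, period, samples=10000):
    """Count sign changes of func around one period, sampling at midpoints."""
    values = [func(period * (i + 0.5) / samples) for i in range(samples)]
    changes = 0
    for i in range(samples):
        # values[i - 1] wraps around at i = 0, closing up the period
        if (values[i - 1] < 0) != (values[i] < 0):
            changes += 1
    return changes

A, k = 10.0, 3.0
period = 2 * math.pi / k
zeros_of_u = sign_changes_per_period(lambda x: A * math.sin(k * x), period)
zeros_of_sin_u = sign_changes_per_period(
    lambda x: math.sin(A * math.sin(k * x)), period)
print(zeros_of_u, zeros_of_sin_u)   # 2 14
```

The zeros of ${\sin(A \sin(kx))}$ occur where ${A \sin(kx)}$ is a multiple of ${\pi}$, so there are roughly ${4A/\pi}$ of them per period of ${u}$ (here ${14}$), against only two for ${u}$ itself.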

In medium dimensions, it is possible to use dispersive estimates for the wave equation (such as the famous Strichartz estimates) to overcome these problems. This line of inquiry was pursued (albeit for slightly different classes of nonlinearity ${f}$ than those considered here) by Heinz-von Wahl, Pecher (in a series of papers), Brenner, and Brenner-von Wahl; to cut a long story short, one of the conclusions of these papers was that one had global regularity for equations such as (1) in dimensions ${d \leq 9}$. (I reprove this result using modern Strichartz estimate and Littlewood-Paley techniques in an appendix to my paper. The references given also allow for some growth in the nonlinearity ${f}$, but we will not detail the precise hypotheses used in these papers here.)

In my paper, I complement these positive results with an almost matching negative result:

Theorem 1 If ${d \geq 11}$ and ${m \geq 2}$, then there exists a nonlinearity ${f: {\bf R}^m \rightarrow {\bf R}^m}$ with all derivatives bounded, and a solution ${u}$ to (1) that is smooth at time zero, but develops a singularity in finite time.

The construction crucially relies on the ability to choose the nonlinearity ${f}$, and also needs some injectivity properties on the solution ${u: {\bf R}^{1+d} \rightarrow {\bf R}^m}$ (after making a symmetry reduction using an assumption of spherical symmetry to view ${u}$ as a function of ${1+1}$ variables rather than ${1+d}$) which restricts our counterexample to the ${m \geq 2}$ case. Thus the model case of the higher-dimensional sine-Gordon equation ${\Box u =\sin u}$ is not covered by our arguments. Nevertheless (as with previous finite-time blowup results discussed on this blog), one can view this result as a barrier to trying to prove regularity for equations such as ${\Box u = \sin u}$ in eleven and higher dimensions, as any such argument must somehow use a property of that equation that is not applicable to the more general system (1).

Let us first give some back-of-the-envelope calculations suggesting why there could be finite time blowup in eleven and higher dimensions. For sake of this discussion let us restrict attention to the sine-Gordon equation ${\Box u = \sin u}$. The blowup ansatz we will use is as follows: for each frequency ${N_j}$ in a sequence ${1 < N_1 < N_2 < N_3 < \dots}$ of large quantities going to infinity, there will be a spacetime “cube” ${Q_j = \{ (t,x): t \sim \frac{1}{N_j}; x = O(\frac{1}{N_j})\}}$ on which the solution ${u}$ oscillates with “amplitude” ${N_j^\alpha}$ and “frequency” ${N_j}$, where ${\alpha>0}$ is an exponent to be chosen later; this ansatz is of course compatible with the uncertainty principle. Since ${N_j^\alpha \rightarrow \infty}$ as ${j \rightarrow \infty}$, this will create a singularity at the spacetime origin ${(0,0)}$. To make this ansatz plausible, we wish to make the oscillation of ${u}$ on ${Q_j}$ driven primarily by the forcing term ${\sin u}$ at ${Q_{j-1}}$. Thus, by Duhamel’s formula, we expect a relation roughly of the form

$\displaystyle u(t,x) \approx \int \frac{\sin((s-t)\sqrt{-\Delta})}{\sqrt{-\Delta}} \sin(1_{Q_{j-1}} u(s)) (x)\ ds$

on ${Q_j}$, where ${\frac{\sin((s-t)\sqrt{-\Delta})}{\sqrt{-\Delta}}}$ is the usual free wave propagator, and ${1_{Q_{j-1}}}$ is the indicator function of ${Q_{j-1}}$.

Since ${u}$ oscillates on ${Q_{j-1}}$ with amplitude ${N_{j-1}^\alpha}$ and frequency ${N_{j-1}}$, we expect the derivative ${\nabla_{t,x} u}$ to be of size about ${N_{j-1}^{\alpha+1}}$, and so from the principle of stationary phase we expect ${\sin(u)}$ to oscillate at frequency about ${N_{j-1}^{\alpha+1}}$. Since the wave propagator ${\frac{\sin((s-t)\sqrt{-\Delta})}{\sqrt{-\Delta}}}$ preserves frequencies, and ${u}$ is supposed to be of frequency ${N_j}$ on ${Q_j}$, we are thus led to the requirement

$\displaystyle N_j \approx N_{j-1}^{\alpha+1}. \ \ \ \ \ (2)$

Next, when restricted to frequencies of order ${N_{j}}$, the propagator ${\frac{\sin((s-t)\sqrt{-\Delta})}{\sqrt{-\Delta}}}$ “behaves like” ${N_{j}^{\frac{d-3}{2}} (s-t)^{\frac{d-1}{2}} A_{s-t}}$, where ${A_{s-t}}$ is the spherical averaging operator

$\displaystyle A_{s-t} f(x) := \frac{1}{\omega_{d-1}} \int_{S^{d-1}} f(x + (s-t)\theta)\ d\theta$

where ${d\theta}$ is surface measure on the unit sphere ${S^{d-1}}$, and ${\omega_{d-1}}$ is the volume of that sphere. In our setting, ${s-t}$ is comparable to ${1/N_{j-1}}$, and so we have the informal approximation

$\displaystyle u(t,x) \approx N_j^{\frac{d-3}{2}} N_{j-1}^{-\frac{d-1}{2}} \int_{s \sim 1/N_{j-1}} A_{s-t} \sin(u(s))(x)\ ds$

on ${Q_j}$.

Since ${\sin(u(s))}$ is bounded, ${A_{s-t} \sin(u(s))}$ is bounded as well. This gives a (non-rigorous) upper bound

$\displaystyle u(t,x) \lessapprox N_j^{\frac{d-3}{2}} N_{j-1}^{-\frac{d-1}{2}} \frac{1}{N_{j-1}}$

which when combined with our ansatz that ${u}$ has amplitude about ${N_j^\alpha}$ on ${Q_j}$, gives the constraint

$\displaystyle N_j^\alpha \lessapprox N_j^{\frac{d-3}{2}} N_{j-1}^{-\frac{d-1}{2}} \frac{1}{N_{j-1}}$

which on applying (2) gives the further constraint

$\displaystyle \alpha(\alpha+1) \leq \frac{d-3}{2} (\alpha+1) - \frac{d-1}{2} - 1$

which can be rearranged as

$\displaystyle \left(\alpha - \frac{d-5}{4}\right)^2 \leq \frac{d^2-10d-7}{16}.$

It is now clear that the optimal choice of ${\alpha}$ is

$\displaystyle \alpha = \frac{d-5}{4},$

and this blowup ansatz is only self-consistent when

$\displaystyle \frac{d^2-10d-7}{16} \geq 0$

or equivalently if ${d \geq 11}$.
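The numerology here is easy to double-check with exact rational arithmetic. The following sketch (function names are my own) tests the constraint ${\alpha(\alpha+1) \leq \frac{d-3}{2}(\alpha+1) - \frac{d-1}{2} - 1}$ at the optimal choice ${\alpha = \frac{d-5}{4}}$ for a range of dimensions.

```python
from fractions import Fraction

def ansatz_is_consistent(d, alpha):
    """Check alpha(alpha+1) <= (d-3)/2 (alpha+1) - (d-1)/2 - 1 exactly."""
    alpha = Fraction(alpha)
    lhs = alpha * (alpha + 1)
    rhs = Fraction(d - 3, 2) * (alpha + 1) - Fraction(d - 1, 2) - 1
    return lhs <= rhs

def discriminant(d):
    """Right-hand side (d^2 - 10d - 7)/16 of the completed-square form."""
    return Fraction(d * d - 10 * d - 7, 16)

for d in range(3, 16):
    best_alpha = Fraction(d - 5, 4)
    print(d, best_alpha, discriminant(d), ansatz_is_consistent(d, best_alpha))
```

The quantity ${d^2-10d-7}$ equals ${-7}$ at ${d=10}$ and ${4}$ at ${d=11}$ (its positive root is ${5+\sqrt{32} \approx 10.66}$), so the constraint first becomes satisfiable at ${d=11}$, where ${\alpha = 3/2}$.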

To turn this ansatz into an actual blowup example, we will construct ${u}$ as the sum of various functions ${u_j}$ that solve the wave equation with forcing term in ${Q_{j+1}}$, and which concentrate in ${Q_j}$ with the amplitude and frequency indicated by the above heuristic analysis. The remaining task is to show that ${\Box u}$ can be written in the form ${f(u)}$ for some ${f}$ with all derivatives bounded. For this one needs some injectivity properties of ${u}$ (after imposing spherical symmetry to impose a dimensional reduction on the domain of ${u}$ from ${d+1}$ dimensions to ${1+1}$). This requires one to construct some solutions to the free wave equation that have some unusual restrictions on the range (for instance, we will need a solution taking values in the plane ${{\bf R}^2}$ that avoid one quadrant of that plane). In order to do this we take advantage of the very explicit nature of the fundamental solution to the wave equation in odd dimensions (such as ${d=11}$), particularly under the assumption of spherical symmetry. Specifically, one can show that in odd dimension ${d}$, any spherically symmetric function ${u(t,x) = u(t,r)}$ of the form

$\displaystyle u(t,r) = \left(\frac{1}{r} \partial_r\right)^{\frac{d-1}{2}} (g(t+r) + g(t-r))$

for an arbitrary smooth function ${g: {\bf R} \rightarrow {\bf R}^m}$, will solve the free wave equation; this is ultimately due to iterating the “ladder operator” identity

$\displaystyle \left( -\partial_{tt} + \partial_{rr} + \frac{d-1}{r} \partial_r \right) \frac{1}{r} \partial_r = \frac{1}{r} \partial_r \left( -\partial_{tt} + \partial_{rr} + \frac{d-3}{r} \partial_r \right).$

This precise and relatively simple formula for ${u}$ allows one to create “bespoke” solutions ${u}$ that obey various unusual properties, without too much difficulty.
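The ladder operator identity can be verified mechanically on monomials ${t^a r^b}$, on which every operator involved acts by a simple shift of exponents. The sketch below is a hypothetical check written for this purpose (the dictionary representation and all function names are my own); it tests the identity with either sign in front of ${\partial_{tt}}$, since ${\partial_{tt}}$ commutes with ${\frac{1}{r}\partial_r}$ and so the sign plays no role in the identity itself.

```python
from collections import defaultdict

# A function is stored as {(a, b): coeff}, meaning the sum of coeff * t^a * r^b.
# Negative powers of r are allowed, so these are Laurent monomials in r.

def clean(f):
    """Drop terms with zero coefficient so that equal functions compare equal."""
    return {key: c for key, c in f.items() if c != 0}

def d_tt(f):
    out = defaultdict(int)
    for (a, b), c in f.items():
        out[(a - 2, b)] += c * a * (a - 1)
    return clean(out)

def d_rr(f):
    out = defaultdict(int)
    for (a, b), c in f.items():
        out[(a, b - 2)] += c * b * (b - 1)
    return clean(out)

def r_inv_d_r(f):
    """The ladder operator (1/r) d/dr."""
    out = defaultdict(int)
    for (a, b), c in f.items():
        out[(a, b - 2)] += c * b
    return clean(out)

def combine(*weighted):
    """Linear combination: combine((w1, f1), (w2, f2), ...)."""
    out = defaultdict(int)
    for w, f in weighted:
        for key, c in f.items():
            out[key] += w * c
    return clean(out)

def radial_op(f, n, sign):
    """Apply sign * d_tt + d_rr + (n / r) d_r to f."""
    return combine((sign, d_tt(f)), (1, d_rr(f)), (n, r_inv_d_r(f)))

def ladder_identity_holds(d, sign, a, b):
    f = {(a, b): 1}
    lhs = radial_op(r_inv_d_r(f), d - 1, sign)
    rhs = r_inv_d_r(radial_op(f, d - 3, sign))
    return lhs == rhs

all_ok = all(ladder_identity_holds(d, sign, a, b)
             for d in (3, 5, 7, 9, 11) for sign in (-1, 1)
             for a in range(5) for b in range(-3, 7))
print(all_ok)
```

Applying the identity ${\frac{d-1}{2}}$ times is what converts the one-dimensional solution ${g(t+r)+g(t-r)}$ into a spherically symmetric solution in ${d}$ dimensions.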

It is not clear to me what to conjecture for ${d=10}$. The blowup ansatz given above is a little inefficient, in that the frequency ${N_{j+1}}$ component of the solution is only generated from a portion of the ${N_j}$ component, namely the portion close to a certain light cone. In particular, the solution does not saturate the Strichartz estimates that are used to establish the positive results for ${d \leq 9}$, which helps explain the slight gap between the positive and negative results. It may be that a more complicated ansatz could work to give a negative result in ten dimensions; conversely, it is also possible that one could use more advanced estimates than the Strichartz estimate (that somehow capture the “thinness” of the fundamental solution, and not just its dispersive properties) to stretch the positive results to ten dimensions. Which side the ${d=10}$ case falls on will all come down to some rather delicate numerology.

I’ve just uploaded to the arXiv my paper Finite time blowup for a supercritical defocusing nonlinear wave system, submitted to Analysis and PDE. This paper was inspired by a question asked of me by Sergiu Klainerman recently, regarding whether there were any analogues of my blowup example for Navier-Stokes type equations in the setting of nonlinear wave equations.

Recall that the defocusing nonlinear wave (NLW) equation reads

$\displaystyle \Box u = |u|^{p-1} u \ \ \ \ \ (1)$

where ${u: {\bf R}^{1+d} \rightarrow {\bf R}}$ is the unknown scalar field, ${\Box = -\partial_t^2 + \Delta}$ is the d’Alembertian operator, and ${p>1}$ is an exponent. We can generalise this equation to the defocusing nonlinear wave system

$\displaystyle \Box u = (\nabla F)(u) \ \ \ \ \ (2)$

where ${u: {\bf R}^{1+d} \rightarrow {\bf R}^m}$ is now a system of scalar fields, and ${F: {\bf R}^m \rightarrow {\bf R}}$ is a potential which is homogeneous of degree ${p+1}$ and strictly positive away from the origin; the scalar equation corresponds to the case where ${m=1}$ and ${F(u) = \frac{1}{p+1} |u|^{p+1}}$. We will be interested in smooth solutions ${u}$ to (2). It is only natural to restrict to the smooth category when the potential ${F}$ is also smooth; unfortunately, if one requires ${F}$ to be homogeneous of order ${p+1}$ all the way down to the origin, then ${F}$ cannot be smooth unless it is identically zero or ${p+1}$ is an even integer. This is too restrictive for us, so we will only require that ${F}$ be homogeneous away from the origin (e.g. outside the unit ball). In any event it is the behaviour of ${F(u)}$ for large ${u}$ which will be decisive in understanding regularity or blowup for the equation (2).

Formally, solutions to the equation (2) enjoy a conserved energy

$\displaystyle E[u] = \int_{{\bf R}^d} \frac{1}{2} \|\partial_t u \|^2 + \frac{1}{2} \| \nabla_x u \|^2 + F(u)\ dx.$

Using this conserved energy, it is possible to establish global regularity for the Cauchy problem (2) in the energy-subcritical case when ${d \leq 2}$, or when ${d \geq 3}$ and ${p < 1+\frac{4}{d-2}}$. This means that for any smooth initial position ${u_0: {\bf R}^d \rightarrow {\bf R}^m}$ and initial velocity ${u_1: {\bf R}^d \rightarrow {\bf R}^m}$, there exists a (unique) smooth global solution ${u: {\bf R}^{1+d} \rightarrow {\bf R}^m}$ to the equation (2) with ${u(0,x) = u_0(x)}$ and ${\partial_t u(0,x) = u_1(x)}$. These classical global regularity results (essentially due to Jörgens) were famously extended to the energy-critical case when ${d \geq 3}$ and ${p = 1 + \frac{4}{d-2}}$ by Grillakis, Struwe, and Shatah-Struwe (though for various technical reasons, the global regularity component of these results was limited to the range ${3 \leq d \leq 7}$). A key tool used in the energy-critical theory is the Morawetz estimate

$\displaystyle \int_0^T \int_{{\bf R}^d} \frac{|u(t,x)|^{p+1}}{|x|}\ dx dt \lesssim E[u]$

which can be proven by manipulating the properties of the stress-energy tensor

$\displaystyle T_{\alpha \beta} = \langle \partial_\alpha u, \partial_\beta u \rangle - \frac{1}{2} \eta_{\alpha \beta} (\langle \partial^\gamma u, \partial_\gamma u \rangle + F(u))$

(with the usual summation conventions involving the Minkowski metric ${\eta_{\alpha \beta} dx^\alpha dx^\beta = -dt^2 + |dx|^2}$) and in particular exploiting the divergence-free nature of this tensor: ${\partial^\beta T_{\alpha \beta} = 0}$. See for instance the text of Shatah-Struwe, or my own PDE book, for more details. The energy-critical regularity results have also been extended to slightly supercritical settings in which the potential grows by a logarithmic factor or so faster than the critical rate; see the results of myself and of Roy.

This leaves the question of global regularity for the energy supercritical case when ${d \geq 3}$ and ${p > 1+\frac{4}{d-2}}$. On the one hand, global smooth solutions are known for small data (if ${F}$ vanishes to sufficiently high order at the origin, see e.g. the work of Lindblad and Sogge), and global weak solutions for large data were constructed long ago by Segal. On the other hand, the solution map, if it exists, is known to be extremely unstable, particularly at high frequencies; see for instance this paper of Lebeau, this paper of Christ, Colliander, and myself, this paper of Brenner and Kumlin, or this paper of Ibrahim, Majdoub, and Masmoudi for various formulations of this instability. In the case of the focusing NLW ${-\partial_{tt} u + \Delta u = - |u|^{p-1} u}$, one can easily create solutions that blow up in finite time by ODE constructions, for instance one can take ${u(t,x) = c (1-t)^{-\frac{2}{p-1}}}$ with ${c = (\frac{2(p+1)}{(p-1)^2})^{\frac{1}{p-1}}}$, which blows up as ${t}$ approaches ${1}$. However the situation in the defocusing supercritical case is less clear. The strongest positive results are of Kenig-Merle and Killip-Visan, which show (under some additional technical hypotheses) that global regularity for such equations holds under the additional assumption that the critical Sobolev norm of the solution stays bounded. Roughly speaking, this shows that “Type II blowup” cannot occur for (2).
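For the reader who wishes to check the constant ${c}$ in the ODE blowup solution of the focusing NLW mentioned above, here is a short verification. Write ${a := \frac{2}{p-1}}$ (a shorthand introduced only for this computation). For a spatially constant, positive solution ${u(t) = c(1-t)^{-a}}$ with ${t < 1}$, the focusing equation reduces to ${\partial_{tt} u = u^p}$, and one computes

$\displaystyle \partial_{tt} u = c\, a(a+1) (1-t)^{-a-2}, \qquad u^p = c^p (1-t)^{-ap}.$

The exponents agree because ${ap = a + a(p-1) = a+2}$, and the coefficients agree precisely when

$\displaystyle c^{p-1} = a(a+1) = \frac{2}{p-1} \cdot \frac{p+1}{p-1} = \frac{2(p+1)}{(p-1)^2},$

which is the stated value of ${c}$.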

Our main result is that finite time blowup can in fact occur, at least for three-dimensional systems where the number ${m}$ of degrees of freedom is sufficiently large:

Theorem 1 Let ${d=3}$, ${p > 5}$, and ${m \geq 76}$. Then there exists a smooth potential ${F: {\bf R}^m \rightarrow {\bf R}}$, positive and homogeneous of degree ${p+1}$ away from the origin, and a solution to (2) with smooth initial data that develops a singularity in finite time.

The rather large lower bound of ${76}$ on ${m}$ here is primarily due to our use of the Nash embedding theorem (which is the first time I have actually had to use this theorem in an application!). It can certainly be lowered, but unfortunately our methods do not seem to be able to bring ${m}$ all the way down to ${1}$, so we do not directly exhibit finite time blowup for the scalar supercritical defocusing NLW. Nevertheless, this result presents a barrier to any attempt to prove global regularity for that equation, in that it must somehow use a property of the scalar equation which is not available for systems. It is likely that the methods can be adapted to higher dimensions than three, but we take advantage of some special structure to the equations in three dimensions (related to the strong Huygens principle) which does not seem to be available in higher dimensions.

The blowup will in fact be of discrete self-similar type in a backwards light cone, thus ${u}$ will obey a relation of the form

$\displaystyle u(e^S t, e^S x) = e^{-\frac{2}{p-1} S} u(t,x)$

for some fixed ${S>0}$ (the exponent ${-\frac{2}{p-1}}$ is mandated by dimensional analysis considerations). It would be natural to consider continuously self-similar solutions (in which the above relation holds for all ${S}$, not just one ${S}$), and indeed rough self-similar solutions have been constructed in the literature by perturbative methods (see this paper of Planchon, or this paper of Ribaud and Youssfi). However, it turns out that continuously self-similar solutions to a defocusing equation have to obey an additional monotonicity formula which causes them to not exist in three spatial dimensions; this argument is given in my paper. So we have to work just with discretely self-similar solutions.
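
The dimensional analysis behind the exponent can be spelled out as follows. This is a standard scaling computation, sketched under the assumption that (2) has the form ${\Box u = (\nabla F)(u)}$ with ${\nabla F}$ homogeneous of degree ${p}$; the notation ${u_\lambda}$ and the exponent ${a}$ are introduced here purely for illustration. If ${u}$ solves (2) and ${\lambda > 0}$, set ${u_\lambda(t,x) := \lambda^{a} u(\lambda t, \lambda x)}$. Then

$\displaystyle \Box u_\lambda(t,x) = \lambda^{a+2} (\Box u)(\lambda t, \lambda x), \qquad (\nabla F)(u_\lambda(t,x)) = \lambda^{ap} (\nabla F)(u(\lambda t, \lambda x)),$

so ${u_\lambda}$ is again a solution when ${a + 2 = ap}$, that is, when ${a = \frac{2}{p-1}}$. The discrete self-similarity relation above is the statement that ${u_\lambda = u}$ for the single scale ${\lambda = e^S}$.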

Because of the discrete self-similarity, the finite time blowup solution will be “locally Type II” in the sense that scale-invariant norms inside the backwards light cone stay bounded as one approaches the singularity. But it will not be “globally Type II” in the sense that scale-invariant norms stay bounded outside the light cone as well; indeed, energy will leak from the light cone at every scale. This is consistent with the results of Kenig-Merle and Killip-Visan which preclude “globally Type II” blowup solutions to these equations in many cases.

We now sketch the arguments used to prove this theorem. Usually when studying the NLW, we think of the potential ${F}$ (and the initial data ${u_0,u_1}$) as being given in advance, and then try to solve for ${u}$ as an unknown field. However, in this problem we have the freedom to select ${F}$. So we can look at this problem from a “backwards” direction: we first choose the field ${u}$, and then fit the potential ${F}$ (and the initial data) to match that field.

Now, one cannot write down a completely arbitrary field ${u}$ and hope to find a potential ${F}$ obeying (2), as there are some constraints coming from the homogeneity of ${F}$. Namely, from the Euler identity

$\displaystyle \langle u, (\nabla F)(u) \rangle = (p+1) F(u)$

we see that ${F(u)}$ can be recovered from (2) by the formula

$\displaystyle F(u) = \frac{1}{p+1} \langle u, \Box u \rangle \ \ \ \ \ (3)$

so the defocusing nature of ${F}$ imposes a constraint

$\displaystyle \langle u, \Box u \rangle > 0.$
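
As a sanity check on (3) and on this positivity constraint, one can consider the scalar model case ${m=1}$ with ${F(u) = \frac{1}{p+1} |u|^{p+1}}$, so that ${(\nabla F)(u) = |u|^{p-1} u}$ (this specific choice of ${F}$ is an illustrative assumption). Then

$\displaystyle \langle u, (\nabla F)(u) \rangle = |u|^{p+1} = (p+1) F(u),$

which is the Euler identity, and for a solution of ${\Box u = |u|^{p-1} u}$ the formula (3) reads ${F(u) = \frac{1}{p+1} u \Box u = \frac{1}{p+1} |u|^{p+1}}$, which is positive wherever ${u \neq 0}$.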

Furthermore, taking a derivative of (3) we obtain another constraining equation

$\displaystyle \langle \partial_\alpha u, \Box u \rangle = \frac{1}{p+1} \partial_\alpha \langle u, \Box u \rangle$

that does not explicitly involve the potential ${F}$. Actually, one can write this equation in the more familiar form

$\displaystyle \partial^\beta T_{\alpha \beta} = 0$

where ${T_{\alpha \beta}}$ is the stress-energy tensor

$\displaystyle T_{\alpha \beta} = \langle \partial_\alpha u, \partial_\beta u \rangle - \frac{1}{2} \eta_{\alpha \beta} (\langle \partial^\gamma u, \partial_\gamma u \rangle + \frac{2}{p+1} \langle u, \Box u \rangle),$

now written in a manner that does not explicitly involve ${F}$.
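
The mechanism behind this reformulation is the following identity, valid for any smooth field ${u}$ regardless of the potential (a routine computation, recorded here as a sketch):

$\displaystyle \partial^\beta \left( \langle \partial_\alpha u, \partial_\beta u \rangle - \frac{1}{2} \eta_{\alpha \beta} \langle \partial^\gamma u, \partial_\gamma u \rangle \right) = \langle \partial^\beta \partial_\alpha u, \partial_\beta u \rangle + \langle \partial_\alpha u, \Box u \rangle - \langle \partial_\alpha \partial^\gamma u, \partial_\gamma u \rangle = \langle \partial_\alpha u, \Box u \rangle.$

Thus the divergence of the derivative part of the stress-energy tensor is ${\langle \partial_\alpha u, \Box u \rangle}$, and the divergence-free condition says that this is cancelled by the ${\partial_\alpha}$ derivative of the remaining scalar term, which is the content of the constraining equation above.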

This reformulation suggests a strategy for locating ${u}$: first one selects a stress-energy tensor ${T_{\alpha \beta}}$ that is divergence-free and obeys suitable positive definiteness and self-similarity properties, and then locates a self-similar map ${u}$ from the backwards light cone to ${{\bf R}^m}$ that has that stress-energy tensor (one also needs the map ${u}$ (or more precisely the direction component ${u/\|u\|}$ of that map) to be injective up to the discrete self-similarity, in order to define ${F(u)}$ consistently). If the stress-energy tensor were replaced by the simpler “energy tensor”

$\displaystyle E_{\alpha \beta} = \langle \partial_\alpha u, \partial_\beta u \rangle$

then the question of constructing an (injective) map ${u}$ with the specified energy tensor is precisely the embedding problem that was famously solved by Nash (viewing ${E_{\alpha \beta}}$ as a Riemannian metric on the domain of ${u}$, which in this case is a backwards light cone quotiented by a discrete self-similarity to make it compact). It turns out that one can adapt the Nash embedding theorem to work with the stress-energy tensor as well (as long as one also specifies the mass density ${M = \|u\|^2}$, and as long as a certain positive definiteness property, related to the positive semi-definiteness of Gram matrices, is obeyed). Here is where the dimension ${76}$ shows up:

Proposition 2 Let ${M}$ be a smooth compact Riemannian ${4}$-manifold, and let ${m \geq 76}$. Then ${M}$ smoothly isometrically embeds into the sphere ${S^{m-1}}$.

Proof: The Nash embedding theorem (in the form given in this ICM lecture of Gunther) shows that ${M}$ can be smoothly isometrically embedded into ${{\bf R}^{19}}$, and thus in ${[-R,R]^{19}}$ for some large ${R}$. Using an irrational slope, the interval ${[-R,R]}$ can be smoothly isometrically embedded into the ${2}$-torus ${\frac{1}{\sqrt{38}} (S^1 \times S^1)}$, and so ${[-R,R]^{19}}$ and hence ${M}$ can be smoothly isometrically embedded in ${\frac{1}{\sqrt{38}} (S^1)^{38}}$. But from Pythagoras’ theorem, ${\frac{1}{\sqrt{38}} (S^1)^{38}}$ can be identified with a subset of ${S^{m-1}}$ for any ${m \geq 76}$, and the claim follows. $\Box$
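
To make the last two steps of this proof concrete, here is one explicit choice. The specific parametrisation and the constants ${a,b}$ below are assumptions made for illustration and need not match the paper. Pick real numbers ${a, b}$ with ${a^2 + b^2 = 38}$ and ${a/b}$ irrational (for instance ${a = 1}$, ${b = \sqrt{37}}$), and identify ${S^1}$ with the unit circle in ${{\bf C}}$. The curve

$\displaystyle \gamma(s) := \frac{1}{\sqrt{38}} ( e^{ias}, e^{ibs} )$

has speed ${|\gamma'(s)| = \sqrt{\frac{a^2+b^2}{38}} = 1}$, and is injective because ${\gamma(s) = \gamma(s')}$ would force both ${a(s-s')}$ and ${b(s-s')}$ to lie in ${2\pi {\bf Z}}$, which is impossible for ${s \neq s'}$ when ${a/b}$ is irrational. Applying ${\gamma}$ to each of the ${19}$ coordinates of ${[-R,R]^{19}}$ gives a point ${\frac{1}{\sqrt{38}}(z_1,\dots,z_{38})}$ with ${|z_1| = \dots = |z_{38}| = 1}$, whose squared norm in ${{\bf C}^{38} \equiv {\bf R}^{76}}$ is

$\displaystyle \frac{1}{38} ( |z_1|^2 + \dots + |z_{38}|^2 ) = 1,$

so the image lies in ${S^{75}}$, which is where the threshold ${76 = 2 \times 2 \times 19}$ comes from.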

One can presumably improve upon the bound ${76}$ by being more efficient with the embeddings (e.g. by modifying the proof of Nash embedding to embed directly into a round sphere), but I did not try to optimise the bound here.

The remaining task is to construct the stress-energy tensor ${T_{\alpha \beta}}$. One can reduce to tensors that are invariant with respect to rotations around the spatial origin, but this still leaves a fair number of degrees of freedom (it turns out that there are four fields that need to be specified, which are denoted ${M, E_{tt}, E_{tr}, E_{rr}}$ in my paper). However, a small miracle occurs in three spatial dimensions, in that the divergence-free condition involves only two of the four degrees of freedom (or three out of four, depending on whether one considers a function that is even or odd in ${r}$ to only be half a degree of freedom). This is easiest to illustrate with the scalar NLW (1). Assuming spherical symmetry, this equation becomes

$\displaystyle - \partial_{tt} u + \partial_{rr} u + \frac{2}{r} \partial_r u = |u|^{p-1} u.$

Making the substitution ${\phi := ru}$, we can eliminate the lower order term ${\frac{2}{r} \partial_r}$ completely to obtain

$\displaystyle - \partial_{tt} \phi + \partial_{rr} \phi= \frac{1}{r^{p-1}} |\phi|^{p-1} \phi.$
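
For the record, the computation behind this substitution is a routine check: with ${\phi = ru}$ one has

$\displaystyle \partial_{rr} \phi = r \partial_{rr} u + 2 \partial_r u = r \left( \partial_{rr} u + \frac{2}{r} \partial_r u \right),$

so multiplying the spherically symmetric equation by ${r}$ gives ${-\partial_{tt} \phi + \partial_{rr} \phi = r |u|^{p-1} u}$, and the right-hand side equals ${r^{1-p} |\phi|^{p-1} \phi}$ for ${r > 0}$ since ${u = \phi/r}$.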

(This can be compared with the situation in higher dimensions, in which the analogous substitution ${\phi := r^{\frac{d-1}{2}} u}$ leaves behind an undesirable zeroth order term ${\frac{(d-1)(d-3)}{4r^2} \phi}$.) In particular, if one introduces the null energy density

$\displaystyle e_+ := \frac{1}{2} |\partial_t \phi + \partial_r \phi|^2$

and the potential energy density

$\displaystyle V := \frac{|\phi|^{p+1}}{(p+1) r^{p-1}}$

then one can verify the equation

$\displaystyle (\partial_t - \partial_r) e_+ + (\partial_t + \partial_r) V = - \frac{p-1}{r} V$

which can be viewed as a transport equation for ${e_+}$ with forcing term depending on ${V}$ (or vice versa), and is thus quite easy to solve explicitly by choosing one of these fields and then solving for the other. As it turns out, once one is in the supercritical regime ${p>5}$, one can solve this equation while giving ${e_+}$ and ${V}$ the right homogeneity (they have to be homogeneous of order ${-\frac{4}{p-1}}$, which is greater than ${-1}$ in the supercritical case) and positivity properties, and from this it is possible to prescribe all the other fields one needs to satisfy the conclusions of the main theorem. (It turns out that ${e_+}$ and ${V}$ will be concentrated near the boundary of the light cone, so this is how the solution ${u}$ will concentrate also.)
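
The verification of the transport equation is short enough to record here (a sketch, using only the equation for ${\phi}$ above). Since ${(\partial_t - \partial_r)(\partial_t + \partial_r) = \partial_{tt} - \partial_{rr}}$, one has

$\displaystyle (\partial_t - \partial_r) e_+ = (\partial_t \phi + \partial_r \phi)(\partial_{tt} \phi - \partial_{rr} \phi) = - \frac{1}{r^{p-1}} |\phi|^{p-1} \phi\, (\partial_t \phi + \partial_r \phi),$

while the product rule gives

$\displaystyle (\partial_t + \partial_r) V = \frac{1}{r^{p-1}} |\phi|^{p-1} \phi\, (\partial_t \phi + \partial_r \phi) - \frac{p-1}{r} V,$

and adding the two identities yields the claim. As for the homogeneity, a heuristic count runs as follows: if ${u}$ scales like a function of degree ${-\frac{2}{p-1}}$, then ${\phi = ru}$ has degree ${1 - \frac{2}{p-1} = \frac{p-3}{p-1}}$; hence ${e_+}$ has degree ${2(\frac{p-3}{p-1} - 1) = -\frac{4}{p-1}}$ and ${V}$ has degree ${(p+1)\frac{p-3}{p-1} - (p-1) = -\frac{4}{p-1}}$, and the inequality ${-\frac{4}{p-1} > -1}$ is equivalent to ${p > 5}$.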