
The AMS and MAA have recently published (and made available online) a collection of essays entitled “Living Proof: Stories of Resilience Along the Mathematical Journey”.  Each author contributes a story of how they encountered some internal or external difficulty in advancing their mathematical career, and how they were able to deal with such difficulties.  I myself have contributed one of these essays; I was initially somewhat surprised when I was approached for a contribution, as my career trajectory has been somewhat of an outlier, and I have been very fortunate not to experience to the same extent many of the obstacles that other contributors write about in this text.  Nevertheless, there was a turning point in my career, which I write about here, during my graduate years, when I found that the improvised and poorly disciplined study habits that had been able to get me into graduate school through an over-reliance on raw mathematical ability were completely inadequate to handle the graduate qualifying exam.  With a combination of an astute advisor and some sheer luck, I was able to pass the exam and finally develop a more sustainable approach to learning and doing mathematics, but it could easily have gone quite differently.  (My 25-year-old writeup of this examination, complete with spelling errors, may be found here.)

 

 

I was recently asked to contribute a short comment to Nature Reviews Physics, as part of a series of articles on fluid dynamics on the occasion of the 200th anniversary (this August) of the birthday of George Stokes.  My contribution is now online as “Searching for singularities in the Navier–Stokes equations“, where I discuss the global regularity problem for Navier-Stokes and my thoughts on how one could try to construct a solution that blows up in finite time via an approximately discretely self-similar “fluid computer”.  (The rest of the series does not currently seem to be available online, but I expect they will become so shortly.)

 

The Polymath15 paper “Effective approximation of heat flow evolution of the Riemann {\xi} function, and a new upper bound for the de Bruijn-Newman constant“, submitted to Research in the Mathematical Sciences, has just been uploaded to the arXiv. This paper records the mix of theoretical and computational work needed to improve the upper bound on the de Bruijn-Newman constant {\Lambda}. This constant can be defined as follows. The function

\displaystyle H_0(z) := \frac{1}{8} \xi\left(\frac{1}{2} + \frac{iz}{2}\right),

where {\xi} is the Riemann {\xi} function

\displaystyle \xi(s) := \frac{s(s-1)}{2} \pi^{-s/2} \Gamma\left(\frac{s}{2}\right) \zeta(s)

has a Fourier representation

\displaystyle H_0(z) = \int_0^\infty \Phi(u) \cos(zu)\ du

where {\Phi} is the super-exponentially decaying function

\displaystyle \Phi(u) := \sum_{n=1}^\infty (2\pi^2 n^4 e^{9u} - 3\pi n^2 e^{5u} ) \exp(-\pi n^2 e^{4u} ).

The Riemann hypothesis is equivalent to the claim that all the zeroes of {H_0} are real. De Bruijn introduced (in different notation) the deformations

\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du

of {H_0}; one can view this as the solution to the backwards heat equation {\partial_t H_t = -\partial_{zz} H_t} starting at {H_0}. From the work of de Bruijn and of Newman, it is known that there exists a real number {\Lambda} – the de Bruijn-Newman constant – such that {H_t} has all zeroes real for {t \geq \Lambda} and has at least one non-real zero for {t < \Lambda}. In particular, the Riemann hypothesis is equivalent to the assertion {\Lambda \leq 0}. Prior to this paper, the best known bounds for this constant were

\displaystyle 0 \leq \Lambda < 1/2

with the lower bound due to Rodgers and myself, and the upper bound due to Ki, Kim, and Lee. One of the main results of the paper is to improve the upper bound to

\displaystyle \Lambda \leq 0.22. \ \ \ \ \ (1)

At a purely numerical level this gets “closer” to proving the Riemann hypothesis, but the methods of proof take as input a finite numerical verification of the Riemann hypothesis up to some given height {T} (in our paper we take {T \sim 3 \times 10^{10}}) and convert this (and some other numerical verification) to an upper bound on {\Lambda} that is of order {O(1/\log T)}. As discussed in the final section of the paper, further improvement of the numerical verification of RH would thus lead to modest improvements in the upper bound on {\Lambda}, although it does not seem likely that our methods could for instance improve the bound to below {0.1} without an infeasible amount of computation.
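
To give a concrete (and purely illustrative) sense of the objects involved, here is a minimal numerical sketch in Python, using the third-party mpmath library, that approximates {\Phi(u)} and {H_t(z)} directly from the definitions above by truncating the sum and the integral. The truncation parameters and working precision are ad hoc choices for illustration only, and are unrelated to the much more carefully engineered (and rigorously error-controlled) computations in the paper.

```python
# Illustrative sketch only: approximate Phi(u) and H_t(z) from their definitions.
# Truncation limits (N terms of the sum, integral cut off at umax) are ad hoc.
from mpmath import mp, mpf, exp, cos, pi, quad

mp.dps = 30  # working precision (arbitrary choice)

def Phi(u, N=20):
    # truncation of the super-exponentially decaying sum defining Phi
    return sum((2*pi**2*n**4*exp(9*u) - 3*pi*n**2*exp(5*u)) * exp(-pi*n**2*exp(4*u))
               for n in range(1, N + 1))

def H(t, z, umax=5):
    # H_t(z) = int_0^infty e^{t u^2} Phi(u) cos(z u) du, truncated at u = umax
    return quad(lambda u: exp(t*u**2) * Phi(u) * cos(z*u), [0, umax])

# Sanity check: H_0(z) = (1/8) xi(1/2 + iz/2), so H_0 should nearly vanish at
# z = 2 * 14.1347..., twice the ordinate of the first nontrivial zeta zero.
print(H(0, 2*mpf("14.134725141734693")))
```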

We now discuss the methods of proof. An existing result of de Bruijn shows that if all the zeroes of {H_{t_0}(z)} lie in the strip {\{ x+iy: |y| \leq y_0\}}, then {\Lambda \leq t_0 + \frac{1}{2} y_0^2}; we will verify this hypothesis with {t_0=y_0=0.2}, thus giving (1). Using the symmetries and the known zero-free regions, it suffices to show that

\displaystyle H_{0.2}(x+iy) \neq 0 \ \ \ \ \ (2)

whenever {x \geq 0} and {0.2 \leq y \leq 1}.

For large {x} (specifically, {x \geq 6 \times 10^{10}}), we use effective numerical approximation to {H_t(x+iy)} to establish (2), as discussed in a bit more detail below. For smaller values of {x}, the existing numerical verification of the Riemann hypothesis (we use the results of Platt) shows that

\displaystyle H_0(x+iy) \neq 0

for {0 \leq x \leq 6 \times 10^{10}} and {0.2 \leq y \leq 1}. The problem though is that this result only controls {H_t} at time {t=0} rather than the desired time {t = 0.2}. To bridge the gap we need to erect a “barrier” that, roughly speaking, verifies that

\displaystyle H_t(x+iy) \neq 0 \ \ \ \ \ (3)

for {0 \leq t \leq 0.2}, {x = 6 \times 10^{10} + O(1)}, and {0.2 \leq y \leq 1}; with a little bit of work this barrier shows that zeroes cannot sneak in from the right of the barrier to the left in order to produce counterexamples to (2) for small {x}.

To enforce this barrier, and to verify (2) for large {x}, we need to approximate {H_t(x+iy)} for positive {t}. Our starting point is the Riemann-Siegel formula, which roughly speaking is of the shape

\displaystyle H_0(x+iy) \approx B_0(x+iy) ( \sum_{n=1}^N \frac{1}{n^{\frac{1+y-ix}{2}}} + \gamma_0(x+iy) \sum_{n=1}^N \frac{n^y}{n^{\frac{1+y+ix}{2}}} )

where {N := \sqrt{x/4\pi}}, {B_0(x+iy)} is an explicit “gamma factor” that decays exponentially in {x}, and {\gamma_0(x+iy)} is a ratio of gamma functions that is roughly of size {(x/4\pi)^{-y/2}}. Deforming this by the heat flow gives rise to an approximation roughly of the form

\displaystyle H_t(x+iy) \approx B_t(x+iy) ( \sum_{n=1}^N \frac{b_n^t}{n^{s_*}} + \gamma_t(x+iy) \sum_{n=1}^N \frac{n^y}{n^{\overline{s_*}}} ) \ \ \ \ \ (4)

where {B_t(x+iy)} and {\gamma_t(x+iy)} are variants of {B_0(x+iy)} and {\gamma_0(x+iy)}, {b_n^t := \exp( \frac{t}{4} \log^2 n )}, and {s_*} is an exponent which is roughly {\frac{1+y-ix}{2} + \frac{t}{4} \log \frac{x}{4\pi}}. In particular, for positive values of {t}, {s_*} increases (logarithmically) as {x} increases, and the two sums in the Riemann-Siegel formula become increasingly convergent (even in the face of the slowly increasing coefficients {b_n^t}). For very large values of {x} (in the range {x \geq \exp(C/t)} for a large absolute constant {C}), the {n=1} terms of both sums dominate, and {H_t(x+iy)} begins to behave in a sinusoidal fashion, with the zeroes “freezing” into an approximate arithmetic progression on the real line much like the zeroes of the sine or cosine functions (we give some asymptotic theorems that formalise this “freezing” effect). This lets one verify (2) for extremely large values of {x} (e.g., {x \geq 10^{12}}). For slightly less large values of {x}, we first multiply the Riemann-Siegel formula by an “Euler product mollifier” to reduce some of the oscillation in the sum and make the series converge better; we also use a technical variant of the triangle inequality to improve the bounds slightly. These are sufficient to establish (2) for moderately large {x} (say {x \geq 6 \times 10^{10}}) with only a modest amount of computational effort (a few seconds after all the optimisations; on my own laptop with very crude code I was able to verify all the computations in a matter of minutes).
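
As an illustration of how the coefficients {b_n^t} and the shifted exponent {s_*} enter, here is a small Python sketch (my own, and not the code used for the project) that evaluates the two Dirichlet-type sums appearing in (4). The explicit factors {B_t} and {\gamma_t} are worked out in the paper and are not reproduced here, and the formula for {s_*} below is only the rough approximation quoted above; note also that double-precision arithmetic loses phase accuracy at these heights, whereas the actual verification uses far more careful arithmetic with rigorous error bounds.

```python
# Illustrative sketch: the two sums in the approximation (4), without the explicit
# factors B_t and gamma_t, and using the rough value of s_* quoted in the text.
import math

def rs_type_sums(t, x, y):
    N = int(math.sqrt(x / (4 * math.pi)))
    # s_* ~ (1 + y - ix)/2 + (t/4) log(x / (4 pi))
    s_star = complex((1 + y) / 2 + (t / 4) * math.log(x / (4 * math.pi)), -x / 2)
    S1 = sum(math.exp((t / 4) * math.log(n) ** 2) / n ** s_star for n in range(1, N + 1))
    S2 = sum(n ** y / n ** s_star.conjugate() for n in range(1, N + 1))
    return S1, S2

# Example evaluation near the barrier location discussed below
print(rs_type_sums(0.2, 6e10, 0.3))
```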

The most difficult computational task is the verification of the barrier (3), particularly when {t} is close to zero, where the series in (4) converge quite slowly. We first use an Euler product heuristic approximation to {H_t(x+iy)} to decide where to place the barrier in order to make our numerical approximation to {H_t(x+iy)} as large in magnitude as possible (so that we can afford to work with a sparser set of mesh points for the numerical verification). In order to efficiently evaluate the sums in (4) for many different values of {x+iy}, we perform a Taylor expansion of the coefficients to factor the sums as combinations of other sums that do not actually depend on {x} and {y} and so can be re-used for multiple choices of {x+iy} after a one-time computation. At the scales we work in, this computation is still quite feasible (a handful of minutes after software and hardware optimisations); if one assumes larger numerical verifications of RH and lowers {t_0} and {y_0} to optimise the value of {\Lambda} accordingly, one could in principle lower the upper bound all the way to {\Lambda \leq 0.1}, but this would require an enormous numerical verification of RH (up to height about {4 \times 10^{21}}) and a very large distributed computing project to perform the other numerical verifications.

This post can serve as the (presumably final) thread for the Polymath15 project (continuing this post), to handle any remaining discussion topics for that project.

Joni Teräväinen and I have just uploaded to the arXiv our paper “Value patterns of multiplicative functions and related sequences“, submitted to Forum of Mathematics, Sigma. This paper explores how to use recent technology on correlations of multiplicative (or nearly multiplicative) functions, such as the “entropy decrement method”, in conjunction with techniques from additive combinatorics, to establish new results on the sign patterns of functions such as the Liouville function {\lambda}. For instance, with regards to length 5 sign patterns

\displaystyle  (\lambda(n+1),\dots,\lambda(n+5)) \in \{-1,+1\}^5

of the Liouville function, we can now show that at least {24} of the {32} possible sign patterns in {\{-1,+1\}^5} occur with positive upper density. (Conjecturally, all of them do so, and this is known for all shorter sign patterns, but unfortunately {24} seems to be the limitation of our methods.)
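
As a quick empirical illustration (which plays no role in the proofs), one can sieve the Liouville function and count which length-5 sign patterns actually show up in an initial segment of the integers; the cutoff below is an arbitrary choice.

```python
# Empirical illustration only: count the length-5 Liouville sign patterns occurring
# for n up to N.  The sieve computes Omega(n), the number of prime factors of n
# counted with multiplicity.
from collections import Counter

N = 200000
M = N + 6
Omega = [0] * M
for p in range(2, M):
    if Omega[p] == 0:            # p is prime: no smaller prime has touched it yet
        pk = p
        while pk < M:
            for m in range(pk, M, pk):
                Omega[m] += 1    # each prime power p^j dividing m contributes once
            pk *= p
lam = [0] + [(-1) ** Omega[n] for n in range(1, M)]

patterns = Counter(tuple(lam[n + j] for j in range(1, 6)) for n in range(1, N + 1))
print(len(patterns), "of the 32 possible patterns occur for n up to", N)
```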

The Liouville function can be written as {\lambda(n) = e^{2\pi i \Omega(n)/2}}, where {\Omega(n)} is the number of prime factors of {n} (counting multiplicity). One can also consider the variant {\lambda_3(n) = e^{2\pi i \Omega(n)/3}}, which is a completely multiplicative function taking values in the cube roots of unity {\{1, \omega, \omega^2\}}. Here we are able to show that all {27} sign patterns in {\{1,\omega,\omega^2\}^3} occur with positive lower density as sign patterns {(\lambda_3(n+1), \lambda_3(n+2), \lambda_3(n+3))} of this function. The analogous result for {\lambda} was already known (see this paper of Matomäki, Radziwiłł, and myself), and in that case it is even known that all sign patterns occur with equal logarithmic density {1/8} (from this paper of myself and Teräväinen), but these techniques barely fail to handle the {\lambda_3} case by themselves (largely because the “parity” arguments used in the case of the Liouville function no longer control three-point correlations in the {\lambda_3} case), and an additional additive combinatorial tool is needed. After applying existing technology (such as entropy decrement methods), the problem roughly speaking reduces to locating patterns {a \in A_1, a+r \in A_2, a+2r \in A_3} for a certain partition {G = A_1 \cup A_2 \cup A_3} of a compact abelian group {G} (think for instance of the unit circle {G={\bf R}/{\bf Z}}, although the general case is a bit more complicated: in particular, if {G} is disconnected then there is a certain “coprimality” constraint on {r}, and we can also allow the {A_1,A_2,A_3} to be replaced by any {A_{c_1}, A_{c_2}, A_{c_3}} with {c_1+c_2+c_3} divisible by {3}), with each of the {A_i} having measure {1/3}. An inequality of Kneser just barely fails to guarantee the existence of such patterns, but by using an inverse theorem for Kneser’s inequality in this previous paper of mine we are able to identify precisely the obstruction for this method to work, and rule it out by an ad hoc method.

The same techniques turn out to also make progress on some conjectures of Erdös-Pomerance and Hildebrand regarding patterns of the largest prime factor {P^+(n)} of a natural number {n}. For instance, we improve results of Erdös-Pomerance and of Balog demonstrating that the inequalities

\displaystyle  P^+(n+1) < P^+(n+2) < P^+(n+3)

and

\displaystyle  P^+(n+1) > P^+(n+2) > P^+(n+3)

each hold for infinitely many {n}, by demonstrating the stronger claims that the inequalities

\displaystyle  P^+(n+1) < P^+(n+2) < P^+(n+3) > P^+(n+4)

and

\displaystyle  P^+(n+1) > P^+(n+2) > P^+(n+3) < P^+(n+4)

each hold for a set of {n} of positive lower density. As a variant, we also show that we can find a positive density set of {n} for which

\displaystyle  P^+(n+1), P^+(n+2), P^+(n+3) > n^\gamma

for any fixed {\gamma < e^{-1/3} = 0.7165\dots} (this improves on a previous result of Hildebrand with {e^{-1/3}} replaced by {e^{-1/2} = 0.6065\dots}). A number of other results of this type are also obtained in this paper.
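
Again purely as an empirical illustration (and not part of the argument), one can compute the largest prime factor {P^+(n)} with a simple sieve and estimate how often a pattern such as {P^+(n+1) < P^+(n+2) < P^+(n+3) > P^+(n+4)} occurs in an initial range; the cutoff is again arbitrary.

```python
# Empirical illustration only: frequency of P+(n+1) < P+(n+2) < P+(n+3) > P+(n+4)
# among n <= N, with P+ computed by a largest-prime-factor sieve.
N = 200000
M = N + 5
gpf = list(range(M))             # gpf[m] will end up as the largest prime factor of m
for p in range(2, M):
    if gpf[p] == p:              # p is prime (no smaller prime divides it)
        for m in range(2 * p, M, p):
            gpf[m] = p           # larger prime divisors overwrite smaller ones later

count = sum(1 for n in range(1, N + 1)
            if gpf[n + 1] < gpf[n + 2] < gpf[n + 3] > gpf[n + 4])
print(count / N)
```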

In order to obtain these sorts of results, one needs to extend the entropy decrement technology from the setting of multiplicative functions to that of what we call “weakly stable sets” – sets {A} which have some multiplicative structure, in the sense that (roughly speaking) there is a set {B} such that for all small primes {p}, the statements {n \in A} and {pn \in B} are roughly equivalent to each other. For instance, if {A} is a level set {A = \{ n: \Omega(n) = 0 \hbox{ mod } 3 \}}, one would take {B = \{ n: \Omega(n) = 1 \hbox{ mod } 3 \}}; if instead {A} is a set of the form {\{ n: P^+(n) \geq n^\gamma\}}, then one can take {B=A}. When one has such a situation, then very roughly speaking, the entropy decrement argument allows one to approximate a one-parameter correlation such as

\displaystyle  {\bf E}_n 1_A(n+1) 1_A(n+2) 1_A(n+3)

with a two-parameter correlation such as

\displaystyle  {\bf E}_n {\bf E}_p 1_B(n+p) 1_B(n+2p) 1_B(n+3p)

(where we will be deliberately vague as to how we are averaging over {n} and {p}), and then the use of the “linear equations in primes” technology of Ben Green, Tamar Ziegler, and myself then allows one to replace this average in turn by something like

\displaystyle  {\bf E}_n {\bf E}_r 1_B(n+r) 1_B(n+2r) 1_B(n+3r)

where {r} is constrained to be not divisible by small primes but is otherwise quite arbitrary. This latter average can then be attacked by tools from additive combinatorics, such as translation to a continuous group model (using for instance the Furstenberg correspondence principle) followed by tools such as Kneser’s inequality (or inverse theorems to that inequality).

I have just uploaded to the arXiv my paper “On the universality of the incompressible Euler equation on compact manifolds, II. Non-rigidity of Euler flows“, submitted to Pure and Applied Functional Analysis. This paper continues my attempts to establish “universality” properties of the Euler equations on Riemannian manifolds {(M,g)}, as I conjecture that the freedom to set the metric {g} ought to allow one to “program” such Euler flows to exhibit a wide range of behaviour, and in particular to achieve finite time blowup (if the dimension is sufficiently large, at least).

In coordinates, the Euler equations read

\displaystyle \partial_t u^k + u^j \nabla_j u^k = - \nabla^k p \ \ \ \ \ (1)

 

\displaystyle \nabla_k u^k = 0

where {p: [0,T] \rightarrow C^\infty(M)} is the pressure field and {u: [0,T] \rightarrow \Gamma(TM)} is the velocity field, and {\nabla} denotes the Levi-Civita connection with the usual Penrose abstract index notation conventions; we restrict attention here to the case where {u,p} are smooth and {M} is compact, smooth, orientable, connected, and without boundary. Let’s call {u} an Euler flow on {M} (for the time interval {[0,T]}) if it solves the above system of equations for some pressure {p}, and an incompressible flow if it just obeys the divergence-free relation {\nabla_k u^k=0}. Thus every Euler flow is an incompressible flow, but the converse is certainly not true; for instance the various conservation laws of the Euler equation, such as conservation of energy, will already block most incompressible flows from being an Euler flow, or even being approximated in a reasonably strong topology by such Euler flows.

However, one can ask if an incompressible flow can be extended to an Euler flow by adding some additional dimensions to {M}. In my paper, I formalise this by considering warped products {\tilde M} of {M} which (as a smooth manifold) are products {\tilde M = M \times ({\bf R}/{\bf Z})^m} of {M} with a torus, with a metric {\tilde g} given by

\displaystyle d \tilde g^2 = g_{ij}(x) dx^i dx^j + \sum_{s=1}^m \tilde g_{ss}(x) (d\theta^s)^2

for {(x,\theta) \in \tilde M}, where {\theta^1,\dots,\theta^m} are the coordinates of the torus {({\bf R}/{\bf Z})^m}, and {\tilde g_{ss}: M \rightarrow {\bf R}^+} are smooth positive coefficients for {s=1,\dots,m}; in order to preserve the incompressibility condition, we also require the volume preservation property

\displaystyle \prod_{s=1}^m \tilde g_{ss}(x) = 1 \ \ \ \ \ (2)

though in practice we can quickly dispose of this condition by adding one further “dummy” dimension to the torus {({\bf R}/{\bf Z})^m}. We say that an incompressible flow {u} is extendible to an Euler flow if there exists a warped product {\tilde M} extending {M}, and an Euler flow {\tilde u} on {\tilde M} of the form

\displaystyle \tilde u(t,(x,\theta)) = u^i(t,x) \frac{d}{dx^i} + \sum_{s=1}^m \tilde u^s(t,x) \frac{d}{d\theta^s}

for some “swirl” fields {\tilde u^s: [0,T] \times M \rightarrow {\bf R}}. The situation here is motivated by the familiar case of axisymmetric Euler flows {\tilde u} on {{\bf R}^3}, which in cylindrical coordinates take the form

\displaystyle \tilde u(t,(r,z,\theta)) = u^r(t,r,z) \frac{d}{dr} + u^z(t,r,z) \frac{d}{dz} + \tilde u^\theta(t,r,z) \frac{d}{d\theta}.

The base component

\displaystyle u^r(t,r,z) \frac{d}{dr} + u^z(t,r,z) \frac{d}{dz}

of this flow is then a flow on the two-dimensional {r,z} plane which is not quite incompressible (due to the failure of the volume preservation condition (2) in this case) but still satisfies a system of equations (coupled with a passive scalar field {\rho} that is basically the square of the swirl {\tilde u^\theta}) that is reminiscent of the Boussinesq equations.

On a fixed {d}-dimensional manifold {(M,g)}, let {{\mathcal F}} denote the space of incompressible flows {u: [0,T] \rightarrow \Gamma(TM)}, equipped with the smooth topology (in spacetime), and let {{\mathcal E} \subset {\mathcal F}} denote the space of such flows that are extendible to Euler flows. Our main theorem is

Theorem 1

  • (i) (Generic inextendibility) Assume {d \geq 3}. Then {{\mathcal E}} is of the first category in {{\mathcal F}} (the countable union of nowhere dense sets in {{\mathcal F}}).
  • (ii) (Non-rigidity) Assume {M = ({\bf R}/{\bf Z})^d} (with an arbitrary metric {g}). Then {{\mathcal E}} is somewhere dense in {{\mathcal F}} (that is, the closure of {{\mathcal E}} has non-empty interior).

More informally, starting with an incompressible flow {u}, one usually cannot extend it to an Euler flow just by extending the manifold, warping the metric, and adding swirl coefficients, even if one is allowed to select the dimension of the extension, as well as the metric and coefficients, arbitrarily. However, many such flows can be perturbed to be extendible in such a manner (though different perturbations will require different extensions; in particular, the dimension of the extension will not be fixed). Among other things, this means that conservation laws such as energy (or momentum, helicity, or circulation) no longer present an obstruction when one is allowed to perform an extension (basically because the swirl components of the extension can exchange energy (or momentum, etc.) with the base components in an essentially arbitrary fashion).

These results fall short of my hopes to use the ability to extend the manifold to create universal behaviour in Euler flows, because each flow requires a different extension in order to achieve the desired dynamics. Still, it does seem to provide a little bit of support to the idea that high-dimensional Euler flows are quite “flexible” in their behaviour, though not completely so due to the generic inextendibility phenomenon. This flexibility reminds me a little bit of the flexibility of weak solutions to equations such as the Euler equations provided by the “{h}-principle” of Gromov and its variants (as discussed in these recent notes), although in this case the flexibility comes from adding additional dimensions, rather than by repeatedly adding high-frequency corrections to the solution.

The proof of part (i) of the theorem basically proceeds by a dimension counting argument (similar to that in the proof of Proposition 9 of these recent lecture notes of mine). Heuristically, the point is that an arbitrary incompressible flow {u} is essentially determined by {d-1} independent functions of space and time, whereas the warping factors {\tilde g_{ss}} are functions of space only, the pressure field is one function of space and time, and the swirl fields {\tilde u^s} are technically functions of both space and time, but have the same number of degrees of freedom as a function just of space, because they solve an evolution equation. When {d>2}, this means that there are fewer unknown functions of space and time than prescribed functions of space and time, which is the source of the generic inextendibility. This simple argument breaks down when {d=2}, but we do not know whether the claim is actually false in this case.

The proof of part (ii) proceeds by direct calculation of the effect of the warping factors and swirl velocities, which effectively create a forcing term (of Boussinesq type) in the first equation of (1) that is a combination of functions of the Eulerian spatial coordinates {x^i} (coming from the warping factors) and the Lagrangian spatial coordinates {a^\beta} (which arise from the swirl velocities, which are passively transported by the flow). In a non-empty open subset of {{\mathcal F}}, the combination of these coordinates becomes a non-degenerate set of coordinates for spacetime, and one can then use the Stone-Weierstrass theorem to conclude. The requirement that {M} be topologically a torus is a technical hypothesis in order to avoid topological obstructions such as the hairy ball theorem, but it may be that the hypothesis can be dropped (and it may in fact be true, in the {M = ({\bf R}/{\bf Z})^d} case at least, that {{\mathcal E}} is dense in all of {{\mathcal F}}, not just in a non-empty open subset).

 

Kaisa Matomäki, Maksym Radziwill, and I just uploaded to the arXiv our paper “Fourier uniformity of bounded multiplicative functions in short intervals on average“. This paper is the outcome of our attempts during the MSRI program in analytic number theory last year to attack the local Fourier uniformity conjecture for the Liouville function {\lambda}. This conjecture generalises a landmark result of Matomäki and Radziwill, who show (among other things) that one has the asymptotic

\displaystyle  \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n)|\ dx = o(HX) \ \ \ \ \ (1)

whenever {X \rightarrow \infty} and {H = H(X)} goes to infinity as {X \rightarrow \infty}. Informally, this says that the Liouville function has small mean for almost all short intervals {[x,x+H]}. The remarkable thing about this theorem is that there is no lower bound on how {H} goes to infinity with {X}; one can take for instance {H = \log\log\log X}. This lack of lower bound was crucial when I applied this result (or more precisely, a generalisation of this result to arbitrary non-pretentious bounded multiplicative functions) a few years ago to solve the Erdös discrepancy problem, as well as a logarithmically averaged two-point Chowla conjecture; for instance, it implies that

\displaystyle  \sum_{n \leq X} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log X).

The local Fourier uniformity conjecture asserts the stronger asymptotic

\displaystyle  \int_X^{2X} \sup_{\alpha \in {\bf R}} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|\ dx = o(HX) \ \ \ \ \ (2)

under the same hypotheses on {H} and {X}. As I worked out in a previous paper, this conjecture would imply a logarithmically averaged three-point Chowla conjecture, implying for instance that

\displaystyle  \sum_{n \leq X} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log X).

This particular bound also follows from some slightly different arguments of Joni Teräväinen and myself, but the implication would also work for other non-pretentious bounded multiplicative functions, whereas the arguments of Joni and myself rely more heavily on the specific properties of the Liouville function (in particular that {\lambda(p)=-1} for all primes {p}).

There is also a higher order version of the local Fourier uniformity conjecture in which the linear phase {e(-\alpha n)} is replaced with a polynomial phase such as {e(-\alpha_d n^d - \dots - \alpha_1 n - \alpha_0)}, or more generally a nilsequence {\overline{F(g(n) \Gamma)}}; as shown in my previous paper, this conjecture implies (and is in fact equivalent to, after logarithmic averaging) a logarithmically averaged version of the full Chowla conjecture (not just the two-point or three-point versions), as well as a logarithmically averaged version of the Sarnak conjecture.

The main result of the current paper is to obtain some cases of the local Fourier uniformity conjecture:

Theorem 1 The asymptotic (2) is true when {H = X^\theta} for a fixed {\theta > 0}.

Previously this was known for {\theta > 5/8} by the work of Zhan (who in fact proved the stronger pointwise assertion {\sup_{\alpha \in {\bf R}} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|= o(H)} for {X \leq x \leq 2X} in this case). In a previous paper with Kaisa and Maksym, we also proved a weak version

\displaystyle  \sup_{\alpha \in {\bf R}} \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|\ dx = o(HX) \ \ \ \ \ (3)

of (2) for any {H} growing arbitrarily slowly with {X}; this is stronger than (1) (and is in fact proven by a variant of the method) but significantly weaker than (2), because in the latter the worst-case {\alpha} is permitted to depend on the {x} parameter, whereas in (3) {\alpha} must remain independent of {x}.

Unfortunately, the restriction {H = X^\theta} is not strong enough to give applications to Chowla-type conjectures (one would need something more like {H = \log^\theta X} for this). However, it can still be used to control some sums that had not previously been manageable. For instance, a quick application of the circle method lets one use the above theorem to derive the asymptotic

\displaystyle  \sum_{h \leq H} \sum_{n \leq X} \lambda(n) \Lambda(n+h) \Lambda(n+2h) = o( H X )

whenever {H = X^\theta} for a fixed {\theta > 0}, where {\Lambda} is the von Mangoldt function. Amusingly, the seemingly simpler question of establishing the expected asymptotic for

\displaystyle  \sum_{h \leq H} \sum_{n \leq X} \Lambda(n+h) \Lambda(n+2h)

is only known in the range {\theta \geq 1/6} (from the work of Zaccagnini). Thus we have a rare example of a number theory sum that becomes easier to control when one inserts a Liouville function!

We now give an informal description of the strategy of proof of the theorem (though for numerous technical reasons, the actual proof deviates in some respects from the description given here). If (2) failed, then for many values of {x \in [X,2X]} we would have the lower bound

\displaystyle  |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha_x n)| \gg 1

for some frequency {\alpha_x \in{\bf R}}. We informally describe this correlation between {\lambda(n)} and {e(\alpha_x n)} by writing

\displaystyle  \lambda(n) \approx e(\alpha_x n) \ \ \ \ \ (4)

for {n \in [x,x+H]} (informally, one should view this as asserting that {\lambda(n)} “behaves like” a constant multiple of {e(\alpha_x n)}). For sake of discussion, suppose we have this relationship for all {x \in [X,2X]}, not just many.

As mentioned before, the main difficulty here is to understand how {\alpha_x} varies with {x}. As it turns out, the multiplicativity properties of the Liouville function place a significant constraint on this dependence. Indeed, if we let {p} be a fairly small prime (e.g. of size {H^\varepsilon} for some {\varepsilon>0}), we can use the identity {\lambda(np) = \lambda(n) \lambda(p) = - \lambda(n)} for the Liouville function to conclude (at least heuristically) from (4) that

\displaystyle  \lambda(n) \approx e(\alpha_x n p)

for {n \in [x/p, x/p + H/p]}. (In practice, we will have this sort of claim for many primes {p} rather than all primes {p}, after using tools such as the Turán-Kubilius inequality, but we ignore this distinction for this informal argument.)

Now let {x, y \in [X,2X]} and {p,q \sim P} be primes comparable to some fixed range {P = H^\varepsilon} such that

\displaystyle  x/p = y/q + O( H/P). \ \ \ \ \ (5)

Then we have both

\displaystyle  \lambda(n) \approx e(\alpha_x n p)

and

\displaystyle  \lambda(n) \approx e(\alpha_y n q)

on essentially the same range of {n} (two nearby intervals of length {\sim H/P}). This suggests that the frequencies {p \alpha_x} and {q \alpha_y} should be close to each other modulo {1}, in particular one should expect the relationship

\displaystyle  p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } 1. \ \ \ \ \ (6)

Comparing this with (5) one is led to the expectation that {\alpha_x} should depend inversely on {x} in some sense (for instance one can check that

\displaystyle  \alpha_x = T/x \ \ \ \ \ (7)

would solve (6) if {T = O( X^2 / H^2 )}; by Taylor expansion, this would correspond to a global approximation of the form {\lambda(n) \approx n^{iT}}). One now has a problem of an additive combinatorial flavour (or of a “local to global” flavour), namely to leverage the relation (6) to obtain global control on {\alpha_x} that resembles (7).
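
As a quick sanity check (an informal computation of my own, consistent with the heuristics above rather than anything taken from the paper), one can verify that the ansatz (7) is compatible with (6): if {\alpha_x = T/x} and {\alpha_y = T/y} with {x, y \in [X,2X]} and {p, q \sim P} obeying (5), then

\displaystyle  p \alpha_x - q \alpha_y = T ( \frac{p}{x} - \frac{q}{y} ) = \frac{Tpq}{xy} ( \frac{y}{q} - \frac{x}{p} ) = O( \frac{T P^2}{X^2} \cdot \frac{H}{P} ) = O( \frac{TPH}{X^2} ),

which is indeed of size {O( \frac{P}{H} )} once {T = O( X^2/H^2 )}.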

A key obstacle in solving (6) efficiently is the fact that one only knows that {p \alpha_x} and {q \alpha_y} are close modulo {1}, rather than close on the real line. One can start resolving this problem by the Chinese remainder theorem, using the fact that we have the freedom to shift (say) {\alpha_y} by an arbitrary integer. After doing so, one can arrange matters so that one in fact has the relationship

\displaystyle  p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } p \ \ \ \ \ (8)

whenever {x,y \in [X,2X]} and {p,q \sim P} obey (5). (This may force {\alpha_y} to become extremely large, on the order of {\prod_{p \sim P} p}, but this will not concern us.)

Now suppose that we have {y,y' \in [X,2X]} and primes {q,q' \sim P} such that

\displaystyle  y/q = y'/q' + O(H/P). \ \ \ \ \ (9)

For every prime {p \sim P}, we can find an {x} such that {x/p} is within {O(H/P)} of both {y/q} and {y'/q'}. Applying (8) twice we obtain

\displaystyle  p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } p

and

\displaystyle  p \alpha_x = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } p

and thus by the triangle inequality we have

\displaystyle  q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } p

for all {p \sim P}; hence by the Chinese remainder theorem

\displaystyle  q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } \prod_{p \sim P} p.

In practice, in the regime {H = X^\theta} that we are considering, the modulus {\prod_{p \sim P} p} is so huge we can effectively ignore it (in the spirit of the Lefschetz principle); so let us pretend that we in fact have

\displaystyle  q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \ \ \ \ \ (10)

whenever {y,y' \in [X,2X]} and {q,q' \sim P} obey (9).

Now let {k} be an integer to be chosen later, and suppose we have primes {p_1,\dots,p_k,q_1,\dots,q_k \sim P} such that the difference

\displaystyle  q = |p_1 \dots p_k - q_1 \dots q_k|

is small but non-zero. If {k} is chosen so that

\displaystyle  P^k \approx \frac{X}{H}

(where one is somewhat loose about what {\approx} means) then one can find real numbers {x_1,\dots,x_k \sim X} such that

\displaystyle  \frac{x_j}{p_j} = \frac{x_{j+1}}{q_j} + O( \frac{H}{P} )

for {j=1,\dots,k}, with the convention that {x_{k+1} = x_1}. We then have

\displaystyle  p_j \alpha_{x_j} = q_j \alpha_{x_{j+1}} + O( \frac{P}{H} )

which telescopes to

\displaystyle  p_1 \dots p_k \alpha_{x_1} = q_1 \dots q_k \alpha_{x_1} + O( \frac{P^k}{H} )

and thus

\displaystyle  q \alpha_{x_1} = O( \frac{P^k}{H} )

and hence

\displaystyle  \alpha_{x_1} = O( \frac{P^k}{H} ) \approx O( \frac{X}{H^2} ).

In particular, for each {x \sim X}, we expect to be able to write

\displaystyle  \alpha_x = \frac{T_x}{x} + O( \frac{1}{H} )

for some {T_x = O( \frac{X^2}{H^2} )}. This quantity {T_x} can vary with {x}; but from (10) and a short calculation we see that

\displaystyle  T_y = T_{y'} + O( \frac{X}{H} )

whenever {y, y' \in [X,2X]} obey (9) for some {q,q' \sim P}.

Now imagine a “graph” in which the vertices are elements {y} of {[X,2X]}, and two elements {y,y'} are joined by an edge if (9) holds for some {q,q' \sim P}. Because of exponential sum estimates on {\sum_{q \sim P} q^{it}}, this graph turns out to essentially be an “expander” in the sense that any two vertices {y,y' \in [X,2X]} can be connected (in multiple ways) by fairly short paths in this graph (if one is allowed to modify one of {y} or {y'} by {O(H)}). As a consequence, we can assume that this quantity {T_y} is essentially constant in {y} (cf. the application of the ergodic theorem in this previous blog post), thus we now have

\displaystyle  \alpha_x = \frac{T}{x} + O(\frac{1}{H} )

for most {x \in [X,2X]} and some {T = O(X^2/H^2)}. By Taylor expansion, this implies that

\displaystyle  \lambda(n) \approx n^{iT}

on {[x,x+H]} for most {x}, thus

\displaystyle  \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) n^{-iT}|\ dx \gg HX.

But this can be shown to contradict the Matomäki-Radziwill theorem (because the multiplicative function {n \mapsto \lambda(n) n^{-iT}} is known to be non-pretentious).

I’ve just uploaded to the arXiv my paper “Embedding the Heisenberg group into a bounded dimensional Euclidean space with optimal distortion“, submitted to Revista Matematica Iberoamericana. This paper concerns the extent to which one can accurately embed the metric structure of the Heisenberg group

\displaystyle H := \begin{pmatrix} 1 & {\bf R} & {\bf R} \\ 0 & 1 & {\bf R} \\ 0 & 0 & 1 \end{pmatrix}

into Euclidean space, which we can write as {\{ [x,y,z]: x,y,z \in {\bf R} \}} with the notation

\displaystyle [x,y,z] := \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}.

Here we give {H} the right-invariant Carnot-Carathéodory metric {d} coming from the right-invariant vector fields

\displaystyle X := \frac{\partial}{\partial x} + y \frac{\partial}{\partial z}; \quad Y := \frac{\partial}{\partial y}

but not from the commutator vector field

\displaystyle Z := [Y,X] = \frac{\partial}{\partial z}.

This gives {H} the geometry of a Carnot group. As observed by Semmes, it follows from the Carnot group differentiation theory of Pansu that there is no bilipschitz map from {(H,d)} to any Euclidean space {{\bf R}^D} or even to {\ell^2}, since such a map must be differentiable almost everywhere in the sense of Carnot groups, which in particular implies that the derivative map annihilates {Z} almost everywhere; this is incompatible with being bilipschitz.

On the other hand, if one snowflakes the Heisenberg group by replacing the metric {d} with {d^{1-\varepsilon}} for some {0 < \varepsilon < 1}, then it follows from the general theory of Assouad on embedding snowflaked metrics of doubling spaces that {(H,d^{1-\varepsilon})} may be embedded in a bilipschitz fashion into {\ell^2}, or even into {{\bf R}^{D_\varepsilon}} for some {D_\varepsilon} depending on {\varepsilon}.

Of course, the distortion of this bilipschitz embedding must degenerate in the limit {\varepsilon \rightarrow 0}. From the work of Austin-Naor-Tessera and Naor-Neiman it follows that {(H,d^{1-\varepsilon})} may be embedded into {\ell^2} with a distortion of {O( \varepsilon^{-1/2} )}, but no better. The Naor-Neiman paper also embeds {(H,d^{1-\varepsilon})} into a finite-dimensional space {{\bf R}^D} with {D} independent of {\varepsilon}, but at the cost of worsening the distortion to {O(\varepsilon^{-1})}. They then posed the question of whether this worsening of the distortion is necessary.

The main result of this paper answers this question in the negative:

Theorem 1 There exists an absolute constant {D} such that {(H,d^{1-\varepsilon})} may be embedded into {{\bf R}^D} in a bilipschitz fashion with distortion {O(\varepsilon^{-1/2})} for any {0 < \varepsilon \leq 1/2}.

To motivate the proof of this theorem, let us first present a bilipschitz map {\Phi: {\bf R} \rightarrow \ell^2} from the snowflaked line {({\bf R},d_{\bf R}^{1-\varepsilon})} (with {d_{\bf R}} being the usual metric on {{\bf R}}) into complex Hilbert space {\ell^2({\bf C})}. The map is given explicitly as a Weierstrass type function

\displaystyle \Phi(x) := \sum_{k \in {\bf Z}} 2^{-\varepsilon k} (\phi_k(x) - \phi_k(0))

where for each {k}, {\phi_k: {\bf R} \rightarrow \ell^2} is the function

\displaystyle \phi_k(x) := 2^k e^{2\pi i x / 2^k} e_k,

and {(e_k)_{k \in {\bf Z}}} is an orthonormal basis for {\ell^2({\bf C})}. The subtraction of the constant {\phi_k(0)} is purely in order to make the sum convergent as {k \rightarrow \infty}. If {x,y \in {\bf R}} are such that {2^{k_0-2} \leq d_{\bf R}(x,y) \leq 2^{k_0-1}} for some integer {k_0}, one can easily check the bounds

\displaystyle 2^{-\varepsilon k} |\phi_k(x) - \phi_k(y)| \lesssim d_{\bf R}(x,y)^{(1-\varepsilon)} \min( 2^{-(1-\varepsilon) (k_0-k)}, 2^{-\varepsilon (k-k_0)} )

with the lower bound

\displaystyle 2^{-\varepsilon k_0} |\phi_{k_0}(x) - \phi_{k_0}(y)| \gtrsim d_{\bf R}(x,y)^{(1-\varepsilon)}

at which point one finds that

\displaystyle d_{\bf R}(x,y)^{1-\varepsilon} \lesssim |\Phi(x) - \Phi(y)| \lesssim \varepsilon^{-1/2} d_{\bf R}(x,y)^{1-\varepsilon}

as desired.
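
As an aside, this one-dimensional construction is simple enough to experiment with numerically. The following Python sketch (of my own, with the sum over {k} truncated to a finite range and all parameters chosen arbitrarily) represents the {e_k} coordinates of {\Phi} as complex numbers and empirically compares the observed distortion with the predicted {\varepsilon^{-1/2}} factor.

```python
# Numerical sketch of the Weierstrass-type embedding of the snowflaked line.
# The sum over k is truncated to [kmin, kmax]; all constants are ad hoc.
import numpy as np

def Phi(x, eps, kmin=-30, kmax=40):
    ks = np.arange(kmin, kmax + 1)
    # coordinate in the direction e_k: 2^{-eps k} (phi_k(x) - phi_k(0))
    return 2.0 ** (-eps * ks) * 2.0 ** ks * (np.exp(2j * np.pi * x / 2.0 ** ks) - 1.0)

eps = 0.1
rng = np.random.default_rng(0)
pairs = rng.uniform(-10.0, 10.0, size=(1000, 2))
ratios = [np.linalg.norm(Phi(x, eps) - Phi(y, eps)) / abs(x - y) ** (1 - eps)
          for x, y in pairs]
print(min(ratios), max(ratios), "versus eps^{-1/2} =", eps ** -0.5)
```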

The key here was that each function {\phi_k} oscillated at a different spatial scale {2^k}, and the functions were all orthogonal to each other (so that the upper bound involved a factor of {\varepsilon^{-1/2}} rather than {\varepsilon^{-1}}). One can replicate this example for the Heisenberg group without much difficulty. Indeed, if we let {\Gamma := \{ [a,b,c]: a,b,c \in {\bf Z} \}} be the discrete Heisenberg group, then the nilmanifold {H/\Gamma} is a three-dimensional smooth compact manifold; thus, by the Whitney embedding theorem, it smoothly embeds into {{\bf R}^6}. This gives a smooth immersion {\phi: H \rightarrow {\bf R}^6} which is {\Gamma}-automorphic in the sense that {\phi(p\gamma) = \phi(p)} for all {p \in H} and {\gamma \in \Gamma}. If one then defines {\phi_k: H \rightarrow \ell^2 \otimes {\bf R}^6} to be the function

\displaystyle \phi_k(p) := 2^k \phi( \delta_{2^{-k}}(p) ) \otimes e_k

where {\delta_\lambda: H \rightarrow H} is the scaling map

\displaystyle \delta_\lambda([x,y,z]) := [\lambda x, \lambda y, \lambda^2 z],

then one can repeat the previous arguments to obtain the required bilipschitz bounds

\displaystyle d(p,q)^{1-\varepsilon} \lesssim |\Phi(p) - \Phi(q)| \lesssim \varepsilon^{-1/2} d(p,q)^{1-\varepsilon}

for the function

\displaystyle \Phi(p) :=\sum_{k \in {\bf Z}} 2^{-\varepsilon k} (\phi_k(p) - \phi_k(0)).

To adapt this construction to bounded dimension, the main obstruction was the requirement that the {\phi_k} took values in orthogonal subspaces. But if one works things out carefully, it is enough to require the weaker orthogonality requirement

\displaystyle B( \phi_{k_0}, \sum_{k>k_0} 2^{-\varepsilon(k-k_0)} \phi_k ) = 0

for all {k_0 \in {\bf Z}}, where {B(\phi, \psi): H \rightarrow {\bf R}^2} is the bilinear form

\displaystyle B(\phi,\psi) := (X \phi \cdot X \psi, Y \phi \cdot Y \psi ).

One can then try to construct the {\phi_k: H \rightarrow {\bf R}^D} for bounded dimension {D} by an iterative argument. After some standard reductions, the problem becomes this (roughly speaking): given a smooth, slowly varying function {\psi: H \rightarrow {\bf R}^{D}} whose derivatives obey certain quantitative upper and lower bounds, construct a smooth oscillating function {\phi: H \rightarrow {\bf R}^{D}}, whose derivatives also obey certain quantitative upper and lower bounds, which obey the equation

\displaystyle B(\phi,\psi) = 0. \ \ \ \ \ (1)

 

We view this as an underdetermined system of differential equations for {\phi} (two equations in {D} unknowns; after some reductions, our {D} can be taken to be the explicit value {36}). The trivial solution {\phi=0} to this equation will be inadmissible for our purposes due to the lower bounds we will require on {\phi} (in order to obtain the quantitative immersion property mentioned previously, as well as for a stronger “freeness” property that is needed to close the iteration). Because this construction will need to be iterated, it will be essential that the regularity control on {\phi} is the same as that on {\psi}; one cannot afford to “lose derivatives” when passing from {\psi} to {\phi}.

This problem has some formal similarities with the isometric embedding problem (discussed for instance in this previous post), which can be viewed as the problem of solving an equation of the form {Q(\phi,\phi) = g}, where {(M,g)} is a Riemannian manifold and {Q} is the bilinear form

\displaystyle Q(\phi,\psi)_{ij} = \partial_i \phi \cdot \partial_j \psi.

The isometric embedding problem also has the key obstacle that naive attempts to solve the equation {Q(\phi,\phi)=g} iteratively can lead to an undesirable “loss of derivatives” that prevents one from iterating indefinitely. This obstacle was famously resolved by the Nash-Moser iteration scheme in which one alternates between perturbatively adjusting an approximate solution to improve the residual error term, and mollifying the resulting perturbation to counteract the loss of derivatives. The current equation (1) differs in some key respects from the isometric embedding equation {Q(\phi,\phi)=g}, in particular being linear in the unknown field {\phi} rather than quadratic; nevertheless the key obstacle is the same, namely that naive attempts to solve either equation lose derivatives. Our approach to solving (1) was inspired by the Nash-Moser scheme; in retrospect, I also found similarities with Uchiyama’s constructive proof of the Fefferman-Stein decomposition theorem, discussed in this previous post (and in this recent one).

To motivate this iteration, we first express {B(\phi,\psi)} using the product rule in a form that does not place derivatives directly on the unknown {\phi}:

\displaystyle B(\phi,\psi) = \left( W(\phi \cdot W \psi) - \phi \cdot WW \psi\right)_{W = X,Y} \ \ \ \ \ (2)

 

This reveals that one can construct solutions {\phi} to (1) by solving the system of equations

\displaystyle \phi \cdot W \psi = \phi \cdot WW \psi = 0 \ \ \ \ \ (3)

 

for {W \in \{X, Y \}}. Because this system is zeroth order in {\phi}, this can easily be done by linear algebra (even in the presence of a forcing term {B(\phi,\psi)=F}) if one imposes a “freeness” condition (analogous to the notion of a free embedding in the isometric embedding problem) that {X \psi(p), Y \psi(p), XX \psi(p), YY \psi(p)} are linearly independent at each point {p}, which (together with some other technical conditions of a similar nature) one then adds to the list of upper and lower bounds required on {\psi} (with a related bound then imposed on {\phi}, in order to close the iteration). However, as mentioned previously, there is a “loss of derivatives” problem with this construction: due to the presence of the differential operators {W} in (3), a solution {\phi} constructed by this method can only be expected to have two degrees less regularity than {\psi} at best, which makes this construction unsuitable for iteration.
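
To illustrate the remark that the system (3) can be solved pointwise by linear algebra under the freeness condition, here is a tiny numerical toy (of my own): the four rows of {V} below are random stand-ins for {X \psi(p), XX \psi(p), Y \psi(p), YY \psi(p)} at a single point {p}, and any vector in the null space of {V} solves the pointwise constraints. Of course this captures none of the smoothness, lower bound, or iteration issues discussed above.

```python
# Toy pointwise illustration: solving the zeroth-order system (3) at one point is
# just linear algebra once the four constraint vectors are linearly independent.
import numpy as np

D = 36
rng = np.random.default_rng(1)
V = rng.standard_normal((4, D))      # rows: stand-ins for the four constraint vectors
_, _, Vt = np.linalg.svd(V)          # full SVD: the rows of Vt form a basis of R^D
null_basis = Vt[4:]                  # the last D-4 right singular vectors span ker V
phi = null_basis[0]                  # one (of many) admissible choices of phi(p)
print(np.abs(V @ phi).max())         # residual of the constraints, ~1e-15
```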

To get around this obstacle (which also prominently appears when solving (linearisations of) the isometric embedding equation {Q(\phi,\phi)=g}), we instead first construct a smooth, low-frequency solution {\phi_{\leq N_0} \colon H \rightarrow {\bf R}^{D}} to a low-frequency equation

\displaystyle B( \phi_{\leq N_0}, P_{\leq N_0} \psi ) = 0 \ \ \ \ \ (4)

 

where {P_{\leq N_0} \psi} is a mollification of {\psi} (of Littlewood-Paley type) applied at a small spatial scale {1/N_0} for some {N_0}, and then gradually relax the frequency cutoff {P_{\leq N_0}} to deform this low frequency solution {\phi_{\leq N_0}} to a solution {\phi} of the actual equation (1).

We will construct the low-frequency solution {\phi_{\leq N_0}} rather explicitly, using the Whitney embedding theorem to construct an initial oscillating map {f} into a very low dimensional space {{\bf R}^6}, composing it with a Veronese type embedding into a slightly larger dimensional space {{\bf R}^{27}} to obtain a required “freeness” property, and then composing further with a slowly varying isometry {U(p) \colon {\bf R}^{27} \rightarrow {\bf R}^{36}} depending on {P_{\leq N_0} \psi} and constructed by a quantitative topological lemma (relying ultimately on the vanishing of the first few homotopy groups of high-dimensional spheres), in order to obtain the required orthogonality (4). (This sort of “quantitative null-homotopy” was first proposed by Gromov, with some recent progress on optimal bounds by Chambers-Manin-Weinberger and by Chambers-Dotterer-Manin-Weinberger, but we will not need these more advanced results here, as one can rely on the classical qualitative vanishing {\pi_k(S^d)=0} for {k < d} together with a compactness argument to obtain (ineffective) quantitative bounds, which suffice for this application.)

To perform the deformation of {\phi_{\leq N_0}} into {\phi}, we must solve what is essentially the linearised equation

\displaystyle B( \dot \phi, \psi ) + B( \phi, \dot \psi ) = 0 \ \ \ \ \ (5)

 

of (1) when {\phi}, {\psi} (viewed as low frequency functions) are both being deformed at some rates {\dot \phi, \dot \psi} (which should be viewed as high frequency functions). To avoid losing derivatives, the magnitude of the deformation {\dot \phi} in {\phi} should not be significantly greater than the magnitude of the deformation {\dot \psi} in {\psi}, when measured in the same function space norms.

As before, if one directly solves the difference equation (5) using a naive application of (2) with {B(\phi,\dot \psi)} treated as a forcing term, one will lose at least one derivative of regularity when passing from {\dot \psi} to {\dot \phi}. However, observe that (2) (and the symmetry {B(\phi, \dot \psi) = B(\dot \psi,\phi)}) can be used to obtain the identity

\displaystyle B( \dot \phi, \psi ) + B( \phi, \dot \psi ) = \left( W(\dot \phi \cdot W \psi + \dot \psi \cdot W \phi) - (\dot \phi \cdot WW \psi + \dot \psi \cdot WW \phi)\right)_{W = X,Y} \ \ \ \ \ (6)

 

and then one can solve (5) by solving the system of equations

\displaystyle \dot \phi \cdot W \psi = - \dot \psi \cdot W \phi

for {W \in \{X,XX,Y,YY\}}. The key point here is that this system is zeroth order in both {\dot \phi} and {\dot \psi}, so one can solve this system without losing any derivatives when passing from {\dot \psi} to {\dot \phi}; compare this situation with that of the superficially similar system

\displaystyle \dot \phi \cdot W \psi = - \phi \cdot W \dot \psi

that one would obtain from naively linearising (3) without exploiting the symmetry of {B}. There is still however one residual “loss of derivatives” problem arising from the presence of a differential operator {W} on the {\phi} term, which prevents one from directly evolving this iteration scheme in time without losing regularity in {\phi}. It is here that we borrow the final key idea of the Nash-Moser scheme, which is to replace {\phi} by a mollified version {P_{\leq N} \phi} of itself (where the projection {P_{\leq N}} depends on the time parameter). This creates an error term in (5), but it turns out that this error term is quite small and smooth (being a “high-high paraproduct” of {\nabla \phi} and {\nabla\psi}, it ends up being far more regular than either {\phi} or {\psi}, even with the presence of the derivatives) and can be iterated away provided that the initial frequency cutoff {N_0} is large and the function {\psi} has a fairly high (but finite) amount of regularity (we will eventually use the Hölder space {C^{20,\alpha}} on the Heisenberg group to measure this).

About six years ago on this blog, I started thinking about trying to make a web-based game based around high-school algebra, and ended up using Scratch to write a short but playable puzzle game in which one solves linear equations for an unknown {x} using a restricted set of moves. (At almost the same time, there were a number of more professionally made games released along similar lines, most notably Dragonbox.)

Since then, I have thought a couple of times about whether there were other parts of mathematics which could be gamified in a similar fashion. Shortly after my first blog posts on this topic, I experimented with a similar gamification of Lewis Carroll’s classic list of logic puzzles, but the results were quite clunky, and I was never satisfied with them.

Over the last few weeks I returned to this topic though, thinking in particular about how to gamify the rules of inference of propositional logic, in a manner that at least vaguely resembles how mathematicians actually go about making logical arguments (e.g., splitting into cases, arguing by contradiction, using previous results as lemmas to help with subsequent ones, and so forth). The rules of inference are a list of a dozen or so deductive rules concerning propositional sentences (things like “({A} AND {B}) OR (NOT {C})”, where {A,B,C} are some formulas). A typical such rule is Modus Ponens: if the sentence {A} is known to be true, and the implication “{A} IMPLIES {B}” is also known to be true, then one can deduce that {B} is also true. Furthermore, in this deductive calculus it is possible to temporarily introduce some unproven statements as an assumption, only to discharge them later. In particular, we have the deduction theorem: if, after making an assumption {A}, one is able to derive the statement {B}, then one can conclude that the implication “{A} IMPLIES {B}” is true without any further assumption.
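
To make the mechanics a little more concrete, here is a toy sketch (written in Python rather than the Javascript used for the actual project, and not reflecting the real QED code in any way) of how a rule such as Modus Ponens can be treated as an operation on a set of known sentences.

```python
# Toy sketch only (not the QED implementation): sentences as immutable data, and
# Modus Ponens as a function from known sentences to newly deducible ones.
from dataclasses import dataclass

@dataclass(frozen=True)
class Implies:
    hypothesis: object
    conclusion: object

def modus_ponens(known):
    """Return every B such that both A and 'A IMPLIES B' are already known."""
    return {s.conclusion for s in known
            if isinstance(s, Implies) and s.hypothesis in known}

known = {"A", Implies("A", "B"), Implies("B", "C")}
print(modus_ponens(known))   # {'B'}  (C is not yet deducible, since B is not known)
```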

It took a while for me to come up with a workable game-like graphical interface for all of this, but I finally managed to set one up, now using Javascript instead of Scratch (which would be hopelessly inadequate for this task); indeed, part of the motivation of this project was to finally learn how to program in Javascript, which turned out to be not as formidable as I had feared (certainly having experience with other C-like languages like C++, Java, or Lua, as well as some prior knowledge of HTML, was very helpful). The main code for this project is available here. Using this code, I have created an interactive textbook in the style of a computer game, which I have titled “QED”. This text contains thirty-odd exercises arranged in twelve sections that function as game “levels”, in which one has to use a given set of rules of inference, together with a given set of hypotheses, to reach a desired conclusion. The set of available rules increases as one advances through the text; in particular, each new section gives one or more rules, and additionally each exercise one solves automatically becomes a new deduction rule one can exploit in later levels, much as lemmas and propositions are used in actual mathematics to prove more difficult theorems. The text automatically tries to match available deduction rules to the sentences one clicks on or drags, in order to minimise the amount of manual input one needs to actually make a deduction.

Most of one’s proof activity takes place in a “root environment” of statements that are known to be true (under the given hypothesis), but for more advanced exercises one has to also work in sub-environments in which additional assumptions are made. I found the graphical metaphor of nested boxes to be useful to depict this tree of sub-environments, and it seems to combine well with the drag-and-drop interface.

The text also logs one’s moves in a more traditional proof format, which shows how the mechanics of the game correspond to a traditional mathematical argument. My hope is that this will give students a way to understand the underlying concept of forming a proof in a manner that is more difficult to achieve using traditional, non-interactive textbooks.

I have tried to organise the exercises in a game-like progression in which one first works with easy levels that train the player on a small number of moves, and then introduce more advanced moves one at a time. As such, the order in which the rules of inference are introduced is a little idiosyncratic. The most powerful rule (the law of the excluded middle, which is what separates classical logic from intuitionistic logic) is saved for the final section of the text.

Anyway, I am now satisfied enough with the state of the code and the interactive text that I am willing to make both available (and open source; I selected a CC-BY licence for both), and would be happy to receive feedback on any aspect of either. In principle one could extend the game mechanics to other mathematical topics than the propositional calculus – the rules of inference for first-order logic being an obvious next candidate – but it seems to make sense to focus just on propositional logic for now.

I have just uploaded to the arXiv my paper “Commutators close to the identity“, submitted to the Journal of Operator Theory. This paper resulted from some progress I made on the problem discussed in this previous post. Recall in that post the following result of Popa: if {D,X \in B(H)} are bounded operators on a Hilbert space {H} whose commutator {[D,X] := DX-XD} is close to the identity in the sense that

\displaystyle  \| [D,X] - I \|_{op} \leq \varepsilon \ \ \ \ \ (1)

for some {\varepsilon > 0}, then one has the lower bound

\displaystyle  \| X \|_{op} \|D \|_{op} \geq \frac{1}{2} \log \frac{1}{\varepsilon}. \ \ \ \ \ (2)

In the other direction, for any {0 < \varepsilon < 1}, there are examples of operators {D,X \in B(H)} obeying (1) such that

\displaystyle  \| X \|_{op} \|D \|_{op} \ll \varepsilon^{-2}. \ \ \ \ \ (3)

In this paper we improve the upper bound to come closer to the lower bound:

Theorem 1 For any {0 < \varepsilon < 1/2}, and any infinite-dimensional {H}, there exist operators {D,X \in B(H)} obeying (1) such that

\displaystyle  \| X \|_{op} \|D \|_{op} \ll \log^{16} \frac{1}{\varepsilon}. \ \ \ \ \ (4)

One can probably improve the exponent {16} somewhat by a modification of the methods, though it does not seem likely that one can lower it all the way to {1} without a substantially new idea. Nevertheless I believe it plausible that the lower bound (2) is close to optimal.

We now sketch the methods of proof. The construction giving (3) proceeded by first identifying {B(H)} with the algebra {M_2(B(H))} of {2 \times 2} matrices that have entries in {B(H)}. It is then possible to find two matrices {D, X \in M_2(B(H))} whose commutator takes the form

\displaystyle  [D,X] = \begin{pmatrix} I & u \\ 0 & I \end{pmatrix}

for some bounded operator {u \in B(H)} (for instance one can take {u} to be an isometry). If one then conjugates {D, X} by the diagonal operator {\mathrm{diag}(\varepsilon,1)}, one can ensure that (1) and (3) both hold.
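
To spell out this last step (an easy computation, included here just for concreteness): writing {\Lambda := \mathrm{diag}(\varepsilon,1)}, conjugation preserves commutators, so

\displaystyle  [\Lambda D \Lambda^{-1}, \Lambda X \Lambda^{-1}] = \Lambda [D,X] \Lambda^{-1} = \begin{pmatrix} I & \varepsilon u \\ 0 & I \end{pmatrix},

which differs from the identity by at most {\varepsilon} in operator norm when {u} is an isometry, while the operator norms of {\Lambda D \Lambda^{-1}} and {\Lambda X \Lambda^{-1}} each increase by at most a factor of {\|\Lambda\|_{op} \|\Lambda^{-1}\|_{op} = \varepsilon^{-1}}, which is the source of the {\varepsilon^{-2}} in (3).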

It is natural to adapt this strategy to {n \times n} matrices {D,X \in M_n(B(H))} rather than {2 \times 2} matrices, where {n} is a parameter at one’s disposal. If one can find matrices {D,X \in M_n(B(H))} that are almost upper triangular (in that only the entries on or above the lower diagonal are non-zero), whose commutator {[D,X]} only differs from the identity in the top right corner, thus

\displaystyle  [D, X] = \begin{pmatrix} I & 0 & 0 & \dots & 0 & S \\ 0 & I & 0 & \dots & 0 & 0 \\ 0 & 0 & I & \dots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & I & 0 \\ 0 & 0 & 0 & \dots & 0 & I \end{pmatrix}

for some {S}, then by conjugating by a diagonal matrix such as {\mathrm{diag}( \mu^{n-1}, \mu^{n-2}, \dots, 1)} for some {\mu} and optimising in {\mu}, one can improve the bound {\varepsilon^{-2}} in (3) to {O_n( \varepsilon^{-\frac{2}{n-1}} )}; if the implied constant in the {O_n(\cdot)} notation is polynomial in {n}, one can then optimise in {n} to obtain a bound of the form (4) (perhaps with the exponent {16} replaced by a different constant).
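
To indicate the numerology (a back-of-the-envelope calculation rather than the precise argument of the paper): conjugating by {E := \mathrm{diag}( \mu^{n-1}, \mu^{n-2}, \dots, 1)} multiplies the {(i,j)} entry of a matrix by {\mu^{j-i}}. Thus, for {0 < \mu < 1}, the corner entry {S} of {[D,X]} picks up a factor of {\mu^{n-1}}, the entries on or above the main diagonal of {D,X} pick up factors of at most {1}, and the lower diagonal entries pick up a factor of {\mu^{-1}}. Choosing {\mu^{n-1} \sim \varepsilon} then gives (1) for the conjugated matrices, with

\displaystyle  \| X \|_{op} \| D \|_{op} \ll_n \mu^{-2} = \varepsilon^{-\frac{2}{n-1}}.

If the implied constant here is {O(n^C)} for some absolute constant {C}, then choosing {n \sim \log \frac{1}{\varepsilon}} makes {\varepsilon^{-\frac{2}{n-1}} = O(1)} and leads to a bound of the shape {\log^C \frac{1}{\varepsilon}}, which is where an exponent such as the {16} in (4) comes from.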

The task is then to find almost upper triangular matrices {D, X} whose commutator takes the required form. The lower diagonals of {D,X} must then commute; it took me a while to realise that one could (usually) conjugate one of the matrices, say {X}, by a suitable diagonal matrix, so that its lower diagonal consisted entirely of the identity operator, which would make the lower diagonal of the other matrix consist of a single operator, say {u}. After a lot of further lengthy experimentation, I eventually realised that one could conjugate {X} further by unipotent upper triangular matrices so that all remaining entries other than those on the far right column vanished. Thus, without too much loss of generality, one can assume that {X} takes the normal form

\displaystyle  X := \begin{pmatrix} 0 & 0 & 0 & \dots & 0 & b_1 \\ I & 0 & 0 & \dots & 0 & b_2 \\ 0 & I & 0 & \dots & 0 & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & 0 & b_{n-1} \\ 0 & 0 & 0 & \dots & I & b_n \end{pmatrix}.

Here {b_1,\dots,b_n} are arbitrary elements of {B(H)}. A calculation then shows that the commutator {[D,X]} takes the required form provided that {D} is given by

\displaystyle  D := \begin{pmatrix} v & I & 0 & \dots & 0 & b_1 u \\ u & v & 2 I & \dots & 0 & b_2 u \\ 0 & u & v & \dots & 0 & b_3 u \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & v & (n-1) I + b_{n-1} u \\ 0 & 0 & 0 & \dots & u & v + b_n u \end{pmatrix}

for some {u,v \in B(H)} solving the system of equations

\displaystyle  [v, b_i] + [u, b_{i-1}] + i b_{i+1} + b_i [u, b_n] = 0 \ \ \ \ \ (5)

for {i=2,\dots,n-1}, and also

\displaystyle  [v, b_n] + [u, b_{n-1}] + b_n [u, b_n] = n \cdot 1_{B(H)}. \ \ \ \ \ (6)

It turns out to be possible to solve this system of equations by a contraction mapping argument if one takes {u,v} to be a “Hilbert’s hotel” pair of isometries as in the previous post, though the contraction is very slight, leading to polynomial losses in {n} in the implied constant.
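
For concreteness (this is one standard realisation of such a pair, not necessarily the exact normalisation used in the paper): on {H = \ell^2({\bf N})} with standard basis {(e_n)_{n \geq 1}}, one can take the isometries

\displaystyle  u e_n := e_{2n}, \qquad v e_n := e_{2n-1},

so that {u^* u = v^* v = I}, {u^* v = v^* u = 0}, and {uu^* + vv^* = I}, the last identity expressing the fact that the even and odd basis vectors together fill up all of the “hotel” {H}.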

There is a further question raised in Popa’s paper which I was unable to resolve. As a special case of one of the main theorems (Theorem 2.1) of that paper, the following result was shown: if {A \in B(H)} obeys the bounds

\displaystyle  \|A \| = O(1)

and

\displaystyle  \| A \| = O( \mathrm{dist}( A, {\bf C} + K(H) )^{2/3} ) \ \ \ \ \ (7)

(where {{\bf C} + K(H)} denotes the space of all operators of the form {\lambda I + T} with {\lambda \in {\bf C}} and {T} compact), then there exist operators {D,X \in B(H)} with {\|D\|, \|X\| = O(1)} such that {A = [D,X]}. (In fact, Popa’s result covers a more general situation in which one is working in a properly infinite {W^*} algebra with non-trivial centre.) We sketch a proof of this result as follows. Suppose that {\mathrm{dist}(A, {\bf C} + K(H)) = \varepsilon} and {\|A\| = O( \varepsilon^{2/3})} for some {0 < \varepsilon \ll 1}. A standard greedy algorithm argument (see this paper of Brown and Pearcy) allows one to find orthonormal vectors {e_n, f_n, g_n} for {n=1,2,\dots} such that for each {n}, one has {A e_n = \varepsilon_n f_n + v_n} for some {\varepsilon_n} comparable to {\varepsilon}, and some {v_n} orthogonal to all of the {e_n,f_n,g_n}. After some conjugation (and a suitable identification of {B(H)} with {M_2(B(H))}), one can thus place {A} in a normal form

\displaystyle  A = \begin{pmatrix} \varepsilon^{2/3} x & \varepsilon v^* \\ \varepsilon^{2/3} y & \varepsilon^{2/3} z \end{pmatrix}

where {v \in B(H)} is an isometry with infinite deficiency, and {x,y,z \in B(H)} have norm {O(1)}. Setting {\varepsilon' := \varepsilon^{1/3}}, it then suffices to solve the commutator equation

\displaystyle  [D,X] = \begin{pmatrix} x & \varepsilon' v^* \\ y & z \end{pmatrix}

with {\|D\|_{op} \|X\|_{op} \ll (\varepsilon')^{-2}}; note the similarity with (3).
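
To spell out why this suffices (a quick check using the normal form above): if {[D,X]} equals the displayed matrix with {\|D\|_{op} \|X\|_{op} \ll (\varepsilon')^{-2}}, then

\displaystyle  A = \varepsilon^{2/3} [D,X] = [\varepsilon^{2/3} D, X]

and {\| \varepsilon^{2/3} D \|_{op} \| X \|_{op} \ll \varepsilon^{2/3} (\varepsilon')^{-2} = 1}; rescaling {D} and {X} by reciprocal scalar factors to balance their norms, one then expresses {A} as a commutator of two operators of norm {O(1)}, as required.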

By the usual Hilbert’s hotel construction, one can complement {v} with another isometry {u} obeying the “Hilbert’s hotel” identity

\displaystyle  uu^* + vv^* = I

and also {u^* u = v^* v = I}, {u^* v = v^* u = 0}. Proceeding as in the previous post, we can try the ansatz

\displaystyle  D = \begin{pmatrix} \frac{1}{2} u^* & 0 \\ a & \frac{1}{2} u^* - v^* \end{pmatrix}, X = \begin{pmatrix} b & \varepsilon' I \\ c & d \end{pmatrix}

for some operators {a,b,c,d \in B(H)}, leading to the system of equations

\displaystyle  [\frac{1}{2} u^*, b] + [\frac{1}{2} u^* - v^*, d] = x+z

\displaystyle  \varepsilon' a = [\frac{1}{2} u^*, b] - x

\displaystyle  (\frac{1}{2} u^* - v^*) c - c \cdot \frac{1}{2} u^* + ab-da = y.

Using the first equation to solve for {b,d}, the second to then solve for {a}, and the third to then solve for {c}, one can obtain matrices {D,X} with the required properties.

Thus far, my attempts to extend this construction to larger matrices with good bounds on {D,X} have been unsuccessful. A model problem would be to express

\displaystyle  \begin{pmatrix} I & 0 & \varepsilon v^* \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}

as a commutator {[D,X]} with {\|D\| \|X\|} significantly smaller than {O(\varepsilon^{-2})}. The construction in my paper achieves something like this, but with {v^*} replaced by a more complicated operator. One would also need variants of this result in which one is allowed to perturb the above operator by an arbitrary finite rank operator of bounded operator norm.

Kevin Ford, Sergei Konyagin, James Maynard, Carl Pomerance, and I have uploaded to the arXiv our paper “Long gaps in sieved sets“, submitted to J. Europ. Math. Soc.

This paper originated from the MSRI program in analytic number theory last year, and was centred around variants of the question of finding large gaps between primes. As discussed for instance in this previous post, it is now known that within the set of primes {{\mathcal P} = \{2,3,5,\dots\}}, one can find infinitely many adjacent elements {a,b} whose gap {b-a} obeys a lower bound of the form

\displaystyle b-a \gg \log a \frac{\log_2 a \log_4 a}{\log_3 a}

where {\log_k} denotes the {k}-fold iterated logarithm. This compares with the trivial bound of {b-a \gg \log a} that one can obtain from the prime number theorem and the pigeonhole principle. Several years ago, Pomerance posed the question of whether analogous improvements to the trivial bound can be obtained for such sets as

\displaystyle {\mathcal P}_2 = \{ n \in {\bf N}: n^2+1 \hbox{ prime} \}.

Here there is the obvious initial issue that this set is not even known to be infinite (this is the fourth Landau problem), but let us assume for the sake of discussion that this set is indeed infinite, so that we have an infinite number of gaps to speak of. Standard sieve theory techniques give upper bounds for the density of {{\mathcal P}_2} that are comparable (up to an absolute constant) to the prime number theorem bounds for {{\mathcal P}}, so again we can obtain a trivial bound of {b-a \gg \log a} for the gaps of {{\mathcal P}_2}. In this paper we improve this to

\displaystyle b-a \gg \log a \log^c_2 a

for an absolute constant {c>0}; this is not as strong as the corresponding bound for {{\mathcal P}}, but still improves over the trivial bound. In fact we can handle more general “sifted sets” than just {{\mathcal P}_2}. Recall from the sieve of Eratosthenes that the elements of {{\mathcal P}} in, say, the interval {[x/2, x]} can be obtained by removing from {[x/2, x]} one residue class modulo {p} for each prime up to {\sqrt{x}}, namely the class {0} mod {p}. In a similar vein, the elements of {{\mathcal P}_2} in {[x/2,x]} can be obtained by removing for each prime {p} up to {x} zero, one, or two residue classes modulo {p}, depending on whether {-1} is a quadratic residue modulo {p}. On the average, one residue class will be removed (this is a very basic case of the Chebotarev density theorem), so this sieving system is “one-dimensional on the average”. Roughly speaking, our arguments apply to any other set of numbers arising from a sieving system that is one-dimensional on average. (One can consider other dimensions also, but unfortunately our methods seem to give results that are worse than a trivial bound when the dimension is less than or greater than one.)
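
To make the “one-dimensional on the average” assertion concrete in this example (a standard computation, not specific to the paper): for an odd prime {p}, the congruence {n^2+1 \equiv 0 \pmod{p}} has two solutions when {p \equiv 1 \pmod 4} and none when {p \equiv 3 \pmod 4}, while for {p=2} there is exactly one. As the primes are asymptotically equidistributed among the residue classes {1, 3 \pmod 4}, the average number of removed classes per prime is

\displaystyle  \frac{1}{2} \cdot 2 + \frac{1}{2} \cdot 0 = 1.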

The standard “Erdős-Rankin” method for constructing long gaps between primes proceeds by trying to line up some residue classes modulo small primes {p} so that they collectively occupy a long interval. Key tools in doing so are the smooth number estimates of de Bruijn and others, which among other things assert that if one removes from an interval such as {[1,x]} all the residue classes {0} mod {p} for {p} between {x^{1/u}} and {x} for some fixed {u>1}, then the set of survivors has exceptionally small density (roughly of the order of {u^{-u}}, with the precise density given by the Dickman function), in marked contrast to the situation in which one randomly removes one residue class for each such prime {p}, in which the density is more like {1/u}. One generally exploits this phenomenon to sieve out almost all the elements of a long interval using some of the primes available, and then uses the remaining primes to cover up the remaining elements that have not already been sifted out. In the more recent work on this problem, advanced combinatorial tools such as hypergraph covering lemmas are used for the latter task.
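
To compare the two densities quantitatively (a heuristic computation via Mertens’ theorem, rather than anything taken from the paper itself): removing one randomly chosen residue class modulo {p} for each prime {x^{1/u} < p \leq x} leaves an expected density of roughly

\displaystyle  \prod_{x^{1/u} < p \leq x} \left(1 - \frac{1}{p}\right) \asymp \frac{\log x^{1/u}}{\log x} = \frac{1}{u},

whereas removing the classes {0} mod {p} leaves precisely the {x^{1/u}}-smooth numbers in {[1,x]}, whose density is given by the Dickman function {\rho(u) = u^{-u+o(u)}}, which is far smaller when {u} is large.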

In the case of {{\mathcal P}_2}, there does not appear to be any analogue of smooth numbers, in the sense that there is no obvious way to arrange the residue classes so that they have significantly fewer survivors than a random arrangement. Instead we adopt the following semi-random strategy to cover an interval {[1,y]} by residue classes. Firstly, we randomly remove residue classes for primes {p} up to some intermediate threshold {z} (smaller than {y} by a logarithmic factor), leaving behind a preliminary sifted set {S_{[2,z]}}. Then, for each prime {p} between {z} and another intermediate threshold {x/2}, we remove a residue class mod {p} that maximises (or nearly maximises) its intersection with {S_{[2,z]}}. This ends up reducing the number of survivors to significantly below what one would achieve by selecting residue classes randomly, particularly if one also uses the hypergraph covering lemma from our previous paper. Finally, we cover each of the remaining survivors by a residue class from a remaining available prime.
