You are currently browsing the category archive for the ‘paper’ category.

Peter Denton, Stephen Parke, Xining Zhang, and I have just uploaded to the arXiv a completely rewritten version of our previous paper, now titled “Eigenvectors from Eigenvalues: a survey of a basic identity in linear algebra“. This paper is now a survey of the various literature surrounding the following basic identity in linear algebra, which we propose to call the eigenvector-eigenvalue identity:

Theorem 1 (Eigenvector-eigenvalue identity) Let ${A}$ be an ${n \times n}$ Hermitian matrix, with eigenvalues ${\lambda_1(A),\dots,\lambda_n(A)}$. Let ${v_i}$ be a unit eigenvector corresponding to the eigenvalue ${\lambda_i(A)}$, and let ${v_{i,j}}$ be the ${j^{th}}$ component of ${v_i}$. Then

$\displaystyle |v_{i,j}|^2 \prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A)) = \prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j))$

where ${M_j}$ is the ${n-1 \times n-1}$ Hermitian matrix formed by deleting the ${j^{th}}$ row and column from ${A}$.

When we posted the first version of this paper, we were unaware of previous appearances of this identity in the literature; a related identity had been used by Erdos-Schlein-Yau and by myself and Van Vu for applications to random matrix theory, but to our knowledge this specific identity appeared to be new. Even two months after our preprint first appeared on the arXiv in August, we had only learned of one other place in the literature where the identity showed up (by Forrester and Zhang, who also cite an earlier paper of Baryshnikov).

The situation changed rather dramatically with the publication of a popular science article in Quanta on this identity in November, which gave this result significantly more exposure. Within a few weeks we became informed (through private communication, online discussion, and exploration of the citation tree around the references we were alerted to) of over three dozen places where the identity, or some other closely related identity, had previously appeared in the literature, in such areas as numerical linear algebra, various aspects of graph theory (graph reconstruction, chemical graph theory, and walks on graphs), inverse eigenvalue problems, random matrix theory, and neutrino physics. As a consequence, we have decided to completely rewrite our article in order to collate this crowdsourced information, and survey the history of this identity, all the known proofs (we collect seven distinct ways to prove the identity (or generalisations thereof)), and all the applications of it that we are currently aware of. The citation graph of the literature that this ad hoc crowdsourcing effort produced is only very weakly connected, which we found surprising:

The earliest explicit appearance of the eigenvector-eigenvalue identity we are now aware of is in a 1966 paper of Thompson, although this paper is only cited (directly or indirectly) by a fraction of the known literature, and also there is a precursor identity of Löwner from 1934 that can be shown to imply the identity as a limiting case. At the end of the paper we speculate on some possible reasons why this identity only achieved a modest amount of recognition and dissemination prior to the November 2019 Quanta article.

Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Moore-Schmidt theorem“. This paper revisits a classical theorem of Moore and Schmidt in measurable cohomology of measure-preserving systems. To state the theorem, let ${X = (X,{\mathcal X},\mu)}$ be a probability space, and ${\mathrm{Aut}(X, {\mathcal X}, \mu)}$ be the group of measure-preserving automorphisms of this space, that is to say the invertible bimeasurable maps ${T: X \rightarrow X}$ that preserve the measure ${\mu}$: ${T_* \mu = \mu}$. To avoid some ambiguity later in this post when we introduce abstract analogues of measure theory, we will refer to measurable maps as concrete measurable maps, and measurable spaces as concrete measurable spaces. (One could also call ${X = (X,{\mathcal X}, \mu)}$ a concrete probability space, but we will not need to do so here as we will not be working explicitly with abstract probability spaces.)

Let ${\Gamma = (\Gamma,\cdot)}$ be a discrete group. A (concrete) measure-preserving action of ${\Gamma}$ on ${X}$ is a group homomorphism ${\gamma \mapsto T^\gamma}$ from ${\Gamma}$ to ${\mathrm{Aut}(X, {\mathcal X}, \mu)}$, thus ${T^1}$ is the identity map and ${T^{\gamma_1} \circ T^{\gamma_2} = T^{\gamma_1 \gamma_2}}$ for all ${\gamma_1,\gamma_2 \in \Gamma}$. A large portion of ergodic theory is concerned with the study of such measure-preserving actions, especially in the classical case when ${\Gamma}$ is the integers (with the additive group law).

Let ${K = (K,+)}$ be a compact Hausdorff abelian group, which we can endow with the Borel ${\sigma}$-algebra ${{\mathcal B}(K)}$. A (concrete measurable) ${K}$cocycle is a collection ${\rho = (\rho_\gamma)_{\gamma \in \Gamma}}$ of concrete measurable maps ${\rho_\gamma: X \rightarrow K}$ obeying the cocycle equation

$\displaystyle \rho_{\gamma_1 \gamma_2}(x) = \rho_{\gamma_1} \circ T^{\gamma_2}(x) + \rho_{\gamma_2}(x) \ \ \ \ \ (1)$

for ${\mu}$-almost every ${x \in X}$. (Here we are glossing over a measure-theoretic subtlety that we will return to later in this post – see if you can spot it before then!) Cocycles arise naturally in the theory of group extensions of dynamical systems; in particular (and ignoring the aforementioned subtlety), each cocycle induces a measure-preserving action ${\gamma \mapsto S^\gamma}$ on ${X \times K}$ (which we endow with the product of ${\mu}$ with Haar probability measure on ${K}$), defined by

$\displaystyle S^\gamma( x, k ) := (T^\gamma x, k + \rho_\gamma(x) ).$

This connection with group extensions was the original motivation for our study of measurable cohomology, but is not the focus of the current paper.

A special case of a ${K}$-valued cocycle is a (concrete measurable) ${K}$-valued coboundary, in which ${\rho_\gamma}$ for each ${\gamma \in \Gamma}$ takes the special form

$\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)$

for ${\mu}$-almost every ${x \in X}$, where ${F: X \rightarrow K}$ is some measurable function; note that (ignoring the aforementioned subtlety), every function of this form is automatically a concrete measurable ${K}$-valued cocycle. One of the first basic questions in measurable cohomology is to try to characterize which ${K}$-valued cocycles are in fact ${K}$-valued coboundaries. This is a difficult question in general. However, there is a general result of Moore and Schmidt that at least allows one to reduce to the model case when ${K}$ is the unit circle ${\mathbf{T} = {\bf R}/{\bf Z}}$, by taking advantage of the Pontryagin dual group ${\hat K}$ of characters ${\hat k: K \rightarrow \mathbf{T}}$, that is to say the collection of continuous homomorphisms ${\hat k: k \mapsto \langle \hat k, k \rangle}$ to the unit circle. More precisely, we have

Theorem 1 (Countable Moore-Schmidt theorem) Let ${\Gamma}$ be a discrete group acting in a concrete measure-preserving fashion on a probability space ${X}$. Let ${K}$ be a compact Hausdorff abelian group. Assume the following additional hypotheses:

• (i) ${\Gamma}$ is at most countable.
• (ii) ${X}$ is a standard Borel space.
• (iii) ${K}$ is metrisable.

Then a ${K}$-valued concrete measurable cocycle ${\rho = (\rho_\gamma)_{\gamma \in \Gamma}}$ is a concrete coboundary if and only if for each character ${\hat k \in \hat K}$, the ${\mathbf{T}}$-valued cocycles ${\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}}$ are concrete coboundaries.

The hypotheses (i), (ii), (iii) are saying in some sense that the data ${\Gamma, X, K}$ are not too “large”; in all three cases they are saying in some sense that the data are only “countably complicated”. For instance, (iii) is equivalent to ${K}$ being second countable, and (ii) is equivalent to ${X}$ being modeled by a complete separable metric space. It is because of this restriction that we refer to this result as a “countable” Moore-Schmidt theorem. This theorem is a useful tool in several other applications, such as the Host-Kra structure theorem for ergodic systems; I hope to return to these subsequent applications in a future post.

Let us very briefly sketch the main ideas of the proof of Theorem 1. Ignore for now issues of measurability, and pretend that something that holds almost everywhere in fact holds everywhere. The hard direction is to show that if each ${\langle \hat k, \rho \rangle}$ is a coboundary, then so is ${\rho}$. By hypothesis, we then have an equation of the form

$\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \alpha_{\hat k} \circ T^\gamma(x) - \alpha_{\hat k}(x) \ \ \ \ \ (2)$

for all ${\hat k, \gamma, x}$ and some functions ${\alpha_{\hat k}: X \rightarrow {\mathbf T}}$, and our task is then to produce a function ${F: X \rightarrow K}$ for which

$\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)$

for all ${\gamma,x}$.

Comparing the two equations, the task would be easy if we could find an ${F: X \rightarrow K}$ for which

$\displaystyle \langle \hat k, F(x) \rangle = \alpha_{\hat k}(x) \ \ \ \ \ (3)$

for all ${\hat k, x}$. However there is an obstruction to this: the left-hand side of (3) is additive in ${\hat k}$, so the right-hand side would have to be also in order to obtain such a representation. In other words, for this strategy to work, one would have to first establish the identity

$\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = 0 \ \ \ \ \ (4)$

for all ${\hat k_1, \hat k_2, x}$. On the other hand, the good news is that if we somehow manage to obtain the equation, then we can obtain a function ${F}$ obeying (3), thanks to Pontryagin duality, which gives a one-to-one correspondence between ${K}$ and the homomorphisms of the (discrete) group ${\hat K}$ to ${\mathbf{T}}$.

Now, it turns out that one cannot derive the equation (4) directly from the given information (2). However, the left-hand side of (2) is additive in ${\hat k}$, so the right-hand side must be also. Manipulating this fact, we eventually arrive at

$\displaystyle (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}) \circ T^\gamma(x) = (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2})(x).$

In other words, we don’t get to show that the left-hand side of (4) vanishes, but we do at least get to show that it is ${\Gamma}$-invariant. Now let us assume for sake of argument that the action of ${\Gamma}$ is ergodic, which (ignoring issues about sets of measure zero) basically asserts that the only ${\Gamma}$-invariant functions are constant. So now we get a weaker version of (4), namely

$\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = c_{\hat k_1, \hat k_2} \ \ \ \ \ (5)$

for some constants ${c_{\hat k_1, \hat k_2} \in \mathbf{T}}$.

Now we need to eliminate the constants. This can be done by the following group-theoretic projection. Let ${L^0({\bf X} \rightarrow {\bf T})}$ denote the space of concrete measurable maps ${\alpha}$ from ${{\bf X}}$ to ${{\bf T}}$, up to almost everywhere equivalence; this is an abelian group where the various terms in (5) naturally live. Inside this group we have the subgroup ${{\bf T}}$ of constant functions (up to almost everywhere equivalence); this is where the right-hand side of (5) lives. Because ${{\bf T}}$ is a divisible group, there is an application of Zorn’s lemma (a good exercise for those who are not acquainted with these things) to show that there exists a retraction ${w: L^0({\bf X} \rightarrow {\bf T}) \rightarrow {\bf T}}$, that is to say a group homomorphism that is the identity on the subgroup ${{\bf T}}$. We can use this retraction, or more precisely the complement ${\alpha \mapsto \alpha - w(\alpha)}$, to eliminate the constant in (5). Indeed, if we set

$\displaystyle \tilde \alpha_{\hat k}(x) := \alpha_{\hat k}(x) - w(\alpha_{\hat k})$

then from (5) we see that

$\displaystyle \tilde \alpha_{\hat k_1 + \hat k_2}(x) - \tilde \alpha_{\hat k_1}(x) - \tilde \alpha_{\hat k_2}(x) = 0$

while from (2) one has

$\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \tilde \alpha_{\hat k} \circ T^\gamma(x) - \tilde \alpha_{\hat k}(x)$

and now the previous strategy works with ${\alpha_{\hat k}}$ replaced by ${\tilde \alpha_{\hat k}}$. This concludes the sketch of proof of Theorem 1.

In making the above argument rigorous, the hypotheses (i)-(iii) are used in several places. For instance, to reduce to the ergodic case one relies on the ergodic decomposition, which requires the hypothesis (ii). Also, most of the above equations only hold outside of a set of measure zero, and the hypothesis (i) and the hypothesis (iii) (which is equivalent to ${\hat K}$ being at most countable) to avoid the problem that an uncountable union of sets of measure zero could have positive measure (or fail to be measurable at all).

My co-author Asgar Jamneshan and I are working on a long-term project to extend many results in ergodic theory (such as the aforementioned Host-Kra structure theorem) to “uncountable” settings in which hypotheses analogous to (i)-(iii) are omitted; thus we wish to consider actions on uncountable groups, on spaces that are not standard Borel, and cocycles taking values in groups that are not metrisable. Such uncountable contexts naturally arise when trying to apply ergodic theory techniques to combinatorial problems (such as the inverse conjecture for the Gowers norms), as one often relies on the ultraproduct construction (or something similar) to generate an ergodic theory translation of these problems, and these constructions usually give “uncountable” objects rather than “countable” ones. (For instance, the ultraproduct of finite groups is a hyperfinite group, which is usually uncountable.). This paper marks the first step in this project by extending the Moore-Schmidt theorem to the uncountable setting.

If one simply drops the hypotheses (i)-(iii) and tries to prove the Moore-Schmidt theorem, several serious difficulties arise. We have already mentioned the loss of the ergodic decomposition and the possibility that one has to control an uncountable union of null sets. But there is in fact a more basic problem when one deletes (iii): the addition operation ${+: K \times K \rightarrow K}$, while still continuous, can fail to be measurable as a map from ${(K \times K, {\mathcal B}(K) \otimes {\mathcal B}(K))}$ to ${(K, {\mathcal B}(K))}$! Thus for instance the sum of two measurable functions ${F: X \rightarrow K}$ need not remain measurable, which makes even the very definition of a measurable cocycle or measurable coboundary problematic (or at least unnatural). This phenomenon is known as the Nedoma pathology. A standard example arises when ${K}$ is the uncountable torus ${{\mathbf T}^{{\bf R}}}$, endowed with the product topology. Crucially, the Borel ${\sigma}$-algebra ${{\mathcal B}(K)}$ generated by this uncountable product is not the product ${{\mathcal B}(\mathbf{T})^{\otimes {\bf R}}}$ of the factor Borel ${\sigma}$-algebras (the discrepancy ultimately arises from the fact that topologies permit uncountable unions, but ${\sigma}$-algebras do not); relating to this, the product ${\sigma}$-algebra ${{\mathcal B}(K) \otimes {\mathcal B}(K)}$ is not the same as the Borel ${\sigma}$-algebra ${{\mathcal B}(K \times K)}$, but is instead a strict sub-algebra. If the group operations on ${K}$ were measurable, then the diagonal set

$\displaystyle K^\Delta := \{ (k,k') \in K \times K: k = k' \} = \{ (k,k') \in K \times K: k - k' = 0 \}$

would be measurable in ${{\mathcal B}(K) \otimes {\mathcal B}(K)}$. But it is an easy exercise in manipulation of ${\sigma}$-algebras to show that if ${(X, {\mathcal X}), (Y, {\mathcal Y})}$ are any two measurable spaces and ${E \subset X \times Y}$ is measurable in ${{\mathcal X} \otimes {\mathcal Y}}$, then the fibres ${E_x := \{ y \in Y: (x,y) \in E \}}$ of ${E}$ are contained in some countably generated subalgebra of ${{\mathcal Y}}$. Thus if ${K^\Delta}$ were ${{\mathcal B}(K) \otimes {\mathcal B}(K)}$-measurable, then all the points of ${K}$ would lie in a single countably generated ${\sigma}$-algebra. But the cardinality of such an algebra is at most ${2^{\alpha_0}}$ while the cardinality of ${K}$ is ${2^{2^{\alpha_0}}}$, and Cantor’s theorem then gives a contradiction.

To resolve this problem, we give ${K}$ a coarser ${\sigma}$-algebra than the Borel ${\sigma}$-algebra, which we call the reduced ${\sigma}$-algebra ${{\mathcal B}^\otimes(K)}$, thus coarsening the measurable space structure on ${K = (K,{\mathcal B}(K))}$ to a new measurable space ${K_\otimes := (K, {\mathcal B}^\otimes(K))}$. In the case of compact Hausdorff abelian groups, ${{\mathcal B}^{\otimes}(K)}$ can be defined as the ${\sigma}$-algebra generated by the characters ${\hat k: K \rightarrow {\mathbf T}}$; for more general compact abelian groups, one can define ${{\mathcal B}^{\otimes}(K)}$ as the ${\sigma}$-algebra generated by all continuous maps into metric spaces. This ${\sigma}$-algebra is equal to ${{\mathcal B}(K)}$ when ${K}$ is metrisable but can be smaller for other ${K}$. With this measurable structure, ${K_\otimes}$ becomes a measurable group; it seems that once one leaves the metrisable world that ${K_\otimes}$ is a superior (or at least equally good) space to work with than ${K}$ for analysis, as it avoids the Nedoma pathology. (For instance, from Plancherel’s theorem, we see that if ${m_K}$ is the Haar probability measure on ${K}$, then ${L^2(K,m_K) = L^2(K_\otimes,m_K)}$ (thus, every ${K}$-measurable set is equivalent modulo ${m_K}$-null sets to a ${K_\otimes}$-measurable set), so there is no damage to Plancherel caused by passing to the reduced ${\sigma}$-algebra.

Passing to the reduced ${\sigma}$-algebra ${K_\otimes}$ fixes the most severe problems with an uncountable Moore-Schmidt theorem, but one is still faced with an issue of having to potentially take an uncountable union of null sets. To avoid this sort of problem, we pass to the framework of abstract measure theory, in which we remove explicit mention of “points” and can easily delete all null sets at a very early stage of the formalism. In this setup, the category of concrete measurable spaces is replaced with the larger category of abstract measurable spaces, which we formally define as the opposite category of the category of ${\sigma}$-algebras (with Boolean algebra homomorphisms). Thus, we define an abstract measurable space to be an object of the form ${{\mathcal X}^{\mathrm{op}}}$, where ${{\mathcal X}}$ is an (abstract) ${\sigma}$-algebra and ${\mathrm{op}}$ is a formal placeholder symbol that signifies use of the opposite category, and an abstract measurable map ${T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}}$ is an object of the form ${(T^*)^{\mathrm{op}}}$, where ${T^*: {\mathcal Y} \rightarrow {\mathcal X}}$ is a Boolean algebra homomorphism and ${\mathrm{op}}$ is again used as a formal placeholder; we call ${T^*}$ the pullback map associated to ${T}$.  [UPDATE: It turns out that this definition of a measurable map led to technical issues.  In a forthcoming revision of the paper we also impose the requirement that the abstract measurable map be $\sigma$-complete (i.e., it respects countable joins).] The composition ${S \circ T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}}$ of two abstract measurable maps ${T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}}$, ${S: {\mathcal Y}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}}$ is defined by the formula ${S \circ T := (T^* \circ S^*)^{\mathrm{op}}}$, or equivalently ${(S \circ T)^* = T^* \circ S^*}$.

Every concrete measurable space ${X = (X,{\mathcal X})}$ can be identified with an abstract counterpart ${{\mathcal X}^{op}}$, and similarly every concrete measurable map ${T: X \rightarrow Y}$ can be identified with an abstract counterpart ${(T^*)^{op}}$, where ${T^*: {\mathcal Y} \rightarrow {\mathcal X}}$ is the pullback map ${T^* E := T^{-1}(E)}$. Thus the category of concrete measurable spaces can be viewed as a subcategory of the category of abstract measurable spaces. The advantage of working in the abstract setting is that it gives us access to more spaces that could not be directly defined in the concrete setting. Most importantly for us, we have a new abstract space, the opposite measure algebra ${X_\mu}$ of ${X}$, defined as ${({\bf X}/{\bf N})^*}$ where ${{\bf N}}$ is the ideal of null sets in ${{\bf X}}$. Informally, ${X_\mu}$ is the space ${X}$ with all the null sets removed; there is a canonical abstract embedding map ${\iota: X_\mu \rightarrow X}$, which allows one to convert any concrete measurable map ${f: X \rightarrow Y}$ into an abstract one ${[f]: X_\mu \rightarrow Y}$. One can then define the notion of an abstract action, abstract cocycle, and abstract coboundary by replacing every occurrence of the category of concrete measurable spaces with their abstract counterparts, and replacing ${X}$ with the opposite measure algebra ${X_\mu}$; see the paper for details. Our main theorem is then

Theorem 2 (Uncountable Moore-Schmidt theorem) Let ${\Gamma}$ be a discrete group acting abstractly on a ${\sigma}$-finite measure space ${X}$. Let ${K}$ be a compact Hausdorff abelian group. Then a ${K_\otimes}$-valued abstract measurable cocycle ${\rho = (\rho_\gamma)_{\gamma \in \Gamma}}$ is an abstract coboundary if and only if for each character ${\hat k \in \hat K}$, the ${\mathbf{T}}$-valued cocycles ${\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}}$ are abstract coboundaries.

With the abstract formalism, the proof of the uncountable Moore-Schmidt theorem is almost identical to the countable one (in fact we were able to make some simplifications, such as avoiding the use of the ergodic decomposition). A key tool is what we call a “conditional Pontryagin duality” theorem, which asserts that if one has an abstract measurable map ${\alpha_{\hat k}: X_\mu \rightarrow {\bf T}}$ for each ${\hat k \in K}$ obeying the identity ${ \alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2} = 0}$ for all ${\hat k_1,\hat k_2 \in \hat K}$, then there is an abstract measurable map ${F: X_\mu \rightarrow K_\otimes}$ such that ${\alpha_{\hat k} = \langle \hat k, F \rangle}$ for all ${\hat k \in \hat K}$. This is derived from the usual Pontryagin duality and some other tools, most notably the completeness of the ${\sigma}$-algebra of ${X_\mu}$, and the Sikorski extension theorem.

We feel that it is natural to stay within the abstract measure theory formalism whenever dealing with uncountable situations. However, it is still an interesting question as to when one can guarantee that the abstract objects constructed in this formalism are representable by concrete analogues. The basic questions in this regard are:

• (i) Suppose one has an abstract measurable map ${f: X_\mu \rightarrow Y}$ into a concrete measurable space. Does there exist a representation of ${f}$ by a concrete measurable map ${\tilde f: X \rightarrow Y}$? Is it unique up to almost everywhere equivalence?
• (ii) Suppose one has a concrete cocycle that is an abstract coboundary. When can it be represented by a concrete coboundary?

For (i) the answer is somewhat interesting (as I learned after posing this MathOverflow question):

• If ${Y}$ does not separate points, or is not compact metrisable or Polish, there can be counterexamples to uniqueness. If ${Y}$ is not compact or Polish, there can be counterexamples to existence.
• If ${Y}$ is a compact metric space or a Polish space, then one always has existence and uniqueness.
• If ${Y}$ is a compact Hausdorff abelian group, one always has existence.
• If ${X}$ is a complete measure space, then one always has existence (from a theorem of Maharam).
• If ${X}$ is the unit interval with the Borel ${\sigma}$-algebra and Lebesgue measure, then one has existence for all compact Hausdorff ${Y}$ assuming the continuum hypothesis (from a theorem of von Neumann) but existence can fail under other extensions of ZFC (from a theorem of Shelah, using the method of forcing).
• For more general ${X}$, existence for all compact Hausdorff ${Y}$ is equivalent to the existence of a lifting from the ${\sigma}$-algebra ${\mathcal{X}/\mathcal{N}}$ to ${\mathcal{X}}$ (or, in the language of abstract measurable spaces, the existence of an abstract retraction from ${X}$ to ${X_\mu}$).
• It is a long-standing open question (posed for instance by Fremlin) whether it is relatively consistent with ZFC that existence holds whenever ${Y}$ is compact Hausdorff.

Our understanding of (ii) is much less complete:

• If ${K}$ is metrisable, the answer is “always” (which among other things establishes the countable Moore-Schmidt theorem as a corollary of the uncountable one).
• If ${\Gamma}$ is at most countable and ${X}$ is a complete measure space, then the answer is again “always”.

In view of the answers to (i), I would not be surprised if the full answer to (ii) was also sensitive to axioms of set theory. However, such set theoretic issues seem to be almost completely avoided if one sticks with the abstract formalism throughout; they only arise when trying to pass back and forth between the abstract and concrete categories.

I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values“, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the ${3x+1}$ conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let ${{\bf N}+1 = \{1,2,\dots\}}$ denote the positive integers (with ${{\bf N} =\{0,1,2,\dots\}}$ the natural numbers), and let ${\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1}$ be the map defined by setting ${\mathrm{Col}(N)}$ equal to ${3N+1}$ when ${N}$ is odd and ${N/2}$ when ${N}$ is even. Let ${\mathrm{Col}_{\min}(N) := \inf_{n \in {\bf N}} \mathrm{Col}^n(N)}$ be the minimal element of the Collatz orbit ${N, \mathrm{Col}(N), \mathrm{Col}^2(N),\dots}$. Then we have

Conjecture 1 (Collatz conjecture) One has ${\mathrm{Col}_{\min}(N)=1}$ for all ${N \in {\bf N}+1}$.

Establishing the conjecture for all ${N}$ remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” ${N}$ in some sense. For instance, it is a result of Krasikov and Lagarias that

$\displaystyle \{ N \leq x: \mathrm{Col}_{\min}(N) = 1 \} \gg x^{0.84}$

for all sufficiently large ${x}$. In another direction, it was shown by Terras that for almost all ${N}$ (in the sense of natural density), one has ${\mathrm{Col}_{\min}(N) < N}$. This was then improved by Allouche to ${\mathrm{Col}_{\min}(N) 0.869}$, and extended later by Korec to cover all ${\theta > \frac{\log 3}{\log 4} \approx 0.7924}$. In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2 Let ${f: {\bf N}+1 \rightarrow {\bf R}}$ be any function with ${\lim_{N \rightarrow \infty} f(N) = +\infty}$. Then we have ${\mathrm{Col}_{\min}(N) < f(N)}$ for almost all ${N}$ (in the sense of logarithmic density).

Thus for instance one has ${\mathrm{Col}_{\min}(N) < \log\log\log\log N}$ for almost all ${N}$ (in the sense of logarithmic density).

The difficulty here is one usually only expects to establish “local-in-time” results that control the evolution ${\mathrm{Col}^n(N)}$ for times ${n}$ that only get as large as a small multiple ${c \log N}$ of ${\log N}$; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get ${\mathrm{Col}^n(N)}$ all the way down to ${f(N)}$ one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state ${N=O(1)}$.

However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an invariant measure for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if ${\theta > \frac{\log 3}{\log 4}}$ one picks at random an integer ${N}$ from a large interval ${[1,x]}$, then in most cases, the orbit of ${N}$ will eventually move into the interval ${[1,x^{\theta}]}$. Similarly, if one picks an integer ${M}$ at random from ${[1,x^\theta]}$, then in most cases, the orbit of ${M}$ will eventually move into ${[1,x^{\theta^2}]}$. It is then tempting to concatenate the two statements and conclude that for most ${N}$ in ${[1,x]}$, the orbit will eventually move ${[1,x^{\theta^2}]}$. Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn ${N \in [1,x]}$ reaches ${[1,x^\theta]}$, the distribution of the final value is unlikely to be close to being uniformly distributed on ${[1,x^\theta]}$, and in particular could potentially concentrate almost entirely in the exceptional set of ${M \in [1,x^\theta]}$ that do not make it into ${[1,x^{\theta^2}]}$. The point here is the uniform measure on ${[1,x]}$ is not transported by Collatz dynamics to anything resembling the uniform measure on ${[1,x^\theta]}$.

So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the Syracuse map ${\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1}$, defined on the odd numbers ${2{\bf N}+1 = \{1,3,5,\dots\}}$ by setting ${\mathrm{Syr}(N) = (3N+1)/2^a}$, where ${2^a}$ is the largest power of ${2}$ that divides ${3N+1}$. (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication of ${3}$ at each iteration step, which makes the map better behaved when performing “${3}$-adic” analysis.)

When viewed ${3}$-adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, ${\mathrm{Syr}(N)}$ is never divisible by ${3}$. A little less obviously, ${\mathrm{Syr}(N)}$ is twice as likely to equal ${2}$ mod ${3}$ as it is to equal ${1}$ mod ${3}$. This is because for a randomly chosen odd ${\mathbf{N}}$, the number of times ${\mathbf{a}}$ that ${2}$ divides ${3\mathbf{N}+1}$ can be seen to have a geometric distribution of mean ${2}$ – it equals any given value ${a \in{\bf N}+1}$ with probability ${2^{-a}}$. Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of ${3}$. For instance, one can compute that for large random odd ${\mathbf{N}}$, ${\mathrm{Syr}^2(\mathbf{N}) \hbox{ mod } 9}$ will take the residue classes ${0,1,2,3,4,5,6,7,8 \hbox{ mod } 9}$ with probabilities

$\displaystyle 0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}$

respectively. More generally, for any ${n}$, ${\mathrm{Syr}^n(N) \hbox{ mod } 3^n}$ will be distributed according to the law of a random variable ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$ on ${{\bf Z}/3^n{\bf Z}}$ that we call a Syracuse random variable, and can be described explicitly as

$\displaystyle \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = 2^{-\mathbf{a}_1} + 3^1 2^{-\mathbf{a}_1-\mathbf{a}_2} + \dots + 3^{n-1} 2^{-\mathbf{a}_1-\dots-\mathbf{a}_n} \hbox{ mod } 3^n, \ \ \ \ \ (1)$

where ${\mathbf{a}_1,\dots,\mathbf{a}_n}$ are iid copies of a geometric random variable of mean ${2}$.

In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this ${3}$-adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$ to construct such a measure, but only if these random variables stabilise in the limit ${n \rightarrow \infty}$ in a certain total variation sense. More precisely, in the paper we establish the estimate

$\displaystyle \sum_{Y \in {\bf Z}/3^n{\bf Z}} | \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z})=Y) - 3^{m-n} \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^m{\bf Z})=Y \hbox{ mod } 3^m)| \ \ \ \ \ (2)$

$\displaystyle \ll_A m^{-A}$

for any ${1 \leq m \leq n}$ and any ${A > 0}$. This type of stabilisation is plausible from entropy heuristics – the tuple ${(\mathbf{a}_1,\dots,\mathbf{a}_n)}$ of geometric random variables that generates ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$ has Shannon entropy ${n \log 4}$, which is significantly larger than the total entropy ${n \log 3}$ of the uniform distribution on ${{\bf Z}/3^n{\bf Z}}$, so we expect a lot of “mixing” and “collision” to occur when converting the tuple ${(\mathbf{a}_1,\dots,\mathbf{a}_n)}$ to ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$; these heuristics can be supported by numerics (which I was able to work out up to about ${n=10}$ before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.

A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

$\displaystyle 2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n}$

are all distinct as ${(a_1,\dots,a_n)}$ vary over tuples in ${({\bf N}+1)^n}$. Unfortunately, the process of reducing mod ${3^n}$ creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

$\displaystyle 2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n \ \ \ \ \ (3)$

are mostly distinct for “typical” ${a_1,\dots,a_m}$ (as drawn using the geometric distribution) as long as ${m}$ is a bit smaller than ${\frac{\log 3}{\log 4} n}$ (basically because the rational number appearing in (3) then typically takes a form like ${M/2^{2m}}$ with ${M}$ an integer between ${0}$ and ${3^n}$). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on ${ \mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$ (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of ${{\bf Z}/3^n{\bf Z}}$ of density less than ${n^{-C}}$ for some large absolute constant ${C>0}$). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$, and more precisely in showing that

$\displaystyle \mathbb{E} e^{-2\pi i \xi \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) / 3^n} \ll_A n^{-A} \ \ \ \ \ (4)$

for any ${A > 0}$ and any ${\xi \in {\bf Z}/3^n{\bf Z}}$ that is not divisible by ${3}$.

If the random variable (1) was the sum of independent terms, one could express this characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic factor does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming ${n}$ is even for sake of discussion) as

$\displaystyle (2^{\mathbf{a}_2} + 3) 2^{-\mathbf{b}_1} + (2^{\mathbf{a}_4}+3) 3^2 2^{-\mathbf{b}_1-\mathbf{b}_2} + \dots$

$\displaystyle + (2^{\mathbf{a}_n}+3) 3^{n-2} 2^{-\mathbf{b}_1-\dots-\mathbf{b}_{n/2}} \hbox{ mod } 3^n$

where ${\mathbf{b}_j := \mathbf{a}_{2j-1} + \mathbf{a}_{2j}}$. The point here is that after conditioning on the ${\mathbf{b}_1,\dots,\mathbf{b}_{n/2}}$ to be fixed, the random variables ${\mathbf{a}_2, \mathbf{a}_4,\dots,\mathbf{a}_n}$ remain independent (though the distribution of each ${\mathbf{a}_{2j}}$ depends on the value that we conditioned ${\mathbf{b}_j}$ to), and so the above expression is a conditional sum of independent random variables. This lets one express the characeteristic function of (1) as an averaged Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

$\displaystyle \frac{\xi 3^{2j-2} (2^{-\mathbf{b}_1-\dots-\mathbf{b}_j+1} \mod 3^n)}{3^n}$

is not close to an integer for a moderately large number (${\gg A \log n}$, to be precise) of indices ${j = 1,\dots,n/2}$. (Actually, for technical reasons we have to also restrict to those ${j}$ for which ${\mathbf{b}_j=3}$, but let us ignore this detail here.) To put it another way, if we let ${B}$ denote the set of pairs ${(j,l)}$ for which

$\displaystyle \frac{\xi 3^{2j-2} (2^{-l+1} \mod 3^n)}{3^n} \in [-\varepsilon,\varepsilon] + {\bf Z},$

we have to show that (with overwhelming probability) the random walk

$\displaystyle (1,\mathbf{b}_1), (2, \mathbf{b}_1 + \mathbf{b}_2), \dots, (n/2, \mathbf{b}_1+\dots+\mathbf{b}_{n/2})$

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of ${B}$.

A little bit of elementary number theory and combinatorics allows one to describe the set ${B}$ as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of ${B}$ after passing through any given such triangle, and it then becomes relatively easy to then show that the renewal process usually has the required number of points outside of ${B}$. The most difficult case is when the renewal process passes through a particularly large triangle in ${B}$. However, it turns out that large triangles enjoy particularly good separation properties, and in particular afer passing through a large triangle one is likely to only encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of ${B}$ that one can finish the proof of (4), and thus Theorem 2.

William Banks, Kevin Ford, and I have just uploaded to the arXiv our paper “Large prime gaps and probabilistic models“. In this paper we introduce a random model to help understand the connection between two well known conjectures regarding the primes ${{\mathcal P} := \{2,3,5,\dots\}}$, the Cramér conjecture and the Hardy-Littlewood conjecture:

Conjecture 1 (Cramér conjecture) If ${x}$ is a large number, then the largest prime gap ${G_{\mathcal P}(x) := \sup_{p_n, p_{n+1} \leq x} p_{n+1}-p_n}$ in ${[1,x]}$ is of size ${\asymp \log^2 x}$. (Granville refines this conjecture to ${\gtrsim \xi \log^2 x}$, where ${\xi := 2e^{-\gamma} = 1.1229\dots}$. Here we use the asymptotic notation ${X \gtrsim Y}$ for ${X \geq (1-o(1)) Y}$, ${X \sim Y}$ for ${X \gtrsim Y \gtrsim X}$, ${X \gg Y}$ for ${X \geq C^{-1} Y}$, and ${X \asymp Y}$ for ${X \gg Y \gg X}$.)

Conjecture 2 (Hardy-Littlewood conjecture) If ${\mathcal{H} := \{h_1,\dots,h_k\}}$ are fixed distinct integers, then the number of numbers ${n \in [1,x]}$ with ${n+h_1,\dots,n+h_k}$ all prime is ${({\mathfrak S}(\mathcal{H}) +o(1)) \int_2^x \frac{dt}{\log^k t}}$ as ${x \rightarrow \infty}$, where the singular series ${{\mathfrak S}(\mathcal{H})}$ is defined by the formula

$\displaystyle {\mathfrak S}(\mathcal{H}) := \prod_p \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p}\right) (1-\frac{1}{p})^{-k}.$

(One can view these conjectures as modern versions of two of the classical Landau problems, namely Legendre’s conjecture and the twin prime conjecture respectively.)

A well known connection between the Hardy-Littlewood conjecture and prime gaps was made by Gallagher. Among other things, Gallagher showed that if the Hardy-Littlewood conjecture was true, then the prime gaps ${p_{n+1}-p_n}$ with ${n \leq x}$ were asymptotically distributed according to an exponential distribution of mean ${\log x}$, in the sense that

$\displaystyle | \{ n: p_n \leq x, p_{n+1}-p_n \geq \lambda \log x \}| = (e^{-\lambda}+o(1)) \frac{x}{\log x} \ \ \ \ \ (1)$

as ${x \rightarrow \infty}$ for any fixed ${\lambda \geq 0}$. Roughly speaking, the way this is established is by using the Hardy-Littlewood conjecture to control the mean values of ${\binom{|{\mathcal P} \cap (p_n, p_n + \lambda \log x)|}{k}}$ for fixed ${k,\lambda}$, where ${p_n}$ ranges over the primes in ${[1,x]}$. The relevance of these quantities arises from the Bonferroni inequalities (or “Brun pure sieve“), which can be formulated as the assertion that

$\displaystyle 1_{N=0} \leq \sum_{k=0}^K (-1)^k \binom{N}{k}$

when ${K}$ is even and

$\displaystyle 1_{N=0} \geq \sum_{k=0}^K (-1)^k \binom{N}{k}$

when ${K}$ is odd, for any natural number ${N}$; setting ${N := |{\mathcal P} \cap (p_n, p_n + \lambda \log x)|}$ and taking means, one then gets upper and lower bounds for the probability that the interval ${(p_n, p_n + \lambda \log x)}$ is free of primes. The most difficult step is to control the mean values of the singular series ${{\mathfrak S}(\mathcal{H})}$ as ${{\mathcal H}}$ ranges over ${k}$-tuples in a fixed interval such as ${[0, \lambda \log x]}$.

Heuristically, if one extrapolates the asymptotic (1) to the regime ${\lambda \asymp \log x}$, one is then led to Cramér’s conjecture, since the right-hand side of (1) falls below ${1}$ when ${\lambda}$ is significantly larger than ${\log x}$. However, this is not a rigorous derivation of Cramér’s conjecture from the Hardy-Littlewood conjecture, since Gallagher’s computations only establish (1) for fixed choices of ${\lambda}$, which is only enough to establish the far weaker bound ${G_{\mathcal P}(x) / \log x \rightarrow \infty}$, which was already known (see this previous paper for a discussion of the best known unconditional lower bounds on ${G_{\mathcal P}(x)}$). An inspection of the argument shows that if one wished to extend (1) to parameter choices ${\lambda}$ that were allowed to grow with ${x}$, then one would need as input a stronger version of the Hardy-Littlewood conjecture in which the length ${k}$ of the tuple ${{\mathcal H} = (h_1,\dots,h_k)}$, as well as the magnitudes of the shifts ${h_1,\dots,h_k}$, were also allowed to grow with ${x}$. Our initial objective in this project was then to quantify exactly what strengthening of the Hardy-Littlewood conjecture would be needed to rigorously imply Cramer’s conjecture. The precise results are technical, but roughly we show results of the following form:

Theorem 3 (Large gaps from Hardy-Littlewood, rough statement)

• If the Hardy-Littlewood conjecture is uniformly true for ${k}$-tuples of length ${k \ll \frac{\log x}{\log\log x}}$, and with shifts ${h_1,\dots,h_k}$ of size ${O( \log^2 x )}$, with a power savings in the error term, then ${G_{\mathcal P}(x) \gg \frac{\log^2 x}{\log\log x}}$.
• If the Hardy-Littlewood conjecture is “true on average” for ${k}$-tuples of length ${k \ll \frac{y}{\log x}}$ and shifts ${h_1,\dots,h_k}$ of size ${y}$ for all ${\log x \leq y \leq \log^2 x \log\log x}$, with a power savings in the error term, then ${G_{\mathcal P}(x) \gg \log^2 x}$.

In particular, we can recover Cramer’s conjecture given a sufficiently powerful version of the Hardy-Littlewood conjecture “on the average”.

Our proof of this theorem proceeds more or less along the same lines as Gallagher’s calculation, but now with ${k}$ allowed to grow slowly with ${x}$. Again, the main difficulty is to accurately estimate average values of the singular series ${{\mathfrak S}({\mathfrak H})}$. Here we found it useful to switch to a probabilistic interpretation of this series. For technical reasons it is convenient to work with a truncated, unnormalised version

$\displaystyle V_{\mathcal H}(z) := \prod_{p \leq z} \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p} \right)$

of the singular series, for a suitable cutoff ${z}$; it turns out that when studying prime tuples of size ${t}$, the most convenient cutoff ${z(t)}$ is the “Pólya magic cutoff“, defined as the largest prime for which

$\displaystyle \prod_{p \leq z(t)}(1-\frac{1}{p}) \geq \frac{1}{\log t} \ \ \ \ \ (2)$

(this is well defined for ${t \geq e^2}$); by Mertens’ theorem, we have ${z(t) \sim t^{1/e^\gamma}}$. One can interpret ${V_{\mathcal Z}(z)}$ probabilistically as

$\displaystyle V_{\mathcal Z}(z) = \mathbf{P}( {\mathcal H} \subset \mathcal{S}_z )$

where ${\mathcal{S}_z \subset {\bf Z}}$ is the randomly sifted set of integers formed by removing one residue class ${a_p \hbox{ mod } p}$ uniformly at random for each prime ${p \leq z}$. The Hardy-Littlewood conjecture can be viewed as an assertion that the primes ${{\mathcal P}}$ behave in some approximate statistical sense like the random sifted set ${\mathcal{S}_z}$, and one can prove the above theorem by using the Bonferroni inequalities both for the primes ${{\mathcal P}}$ and for the random sifted set, and comparing the two (using an even ${K}$ for the sifted set and an odd ${K}$ for the primes in order to be able to combine the two together to get a useful bound).

The proof of Theorem 3 ended up not using any properties of the set of primes ${{\mathcal P}}$ other than that this set obeyed some form of the Hardy-Littlewood conjectures; the theorem remains true (with suitable notational changes) if this set were replaced by any other set. In order to convince ourselves that our theorem was not vacuous due to our version of the Hardy-Littlewood conjecture being too strong to be true, we then started exploring the question of coming up with random models of ${{\mathcal P}}$ which obeyed various versions of the Hardy-Littlewood and Cramér conjectures.

This line of inquiry was started by Cramér, who introduced what we now call the Cramér random model ${{\mathcal C}}$ of the primes, in which each natural number ${n \geq 3}$ is selected for membership in ${{\mathcal C}}$ with an independent probability of ${1/\log n}$. This model matches the primes well in some respects; for instance, it almost surely obeys the “Riemann hypothesis”

$\displaystyle | {\mathcal C} \cap [1,x] | = \int_2^x \frac{dt}{\log t} + O( x^{1/2+o(1)})$

and Cramér also showed that the largest gap ${G_{\mathcal C}(x)}$ was almost surely ${\sim \log^2 x}$. On the other hand, it does not obey the Hardy-Littlewood conjecture; more precisely, it obeys a simplified variant of that conjecture in which the singular series ${{\mathfrak S}({\mathcal H})}$ is absent.

Granville proposed a refinement ${{\mathcal G}}$ to Cramér’s random model ${{\mathcal C}}$ in which one first sieves out (in each dyadic interval ${[x,2x]}$) all residue classes ${0 \hbox{ mod } p}$ for ${p \leq A}$ for a certain threshold ${A = \log^{1-o(1)} x = o(\log x)}$, and then places each surviving natural number ${n}$ in ${{\mathcal G}}$ with an independent probability ${\frac{1}{\log n} \prod_{p \leq A} (1-\frac{1}{p})^{-1}}$. One can verify that this model obeys the Hardy-Littlewood conjectures, and Granville showed that the largest gap ${G_{\mathcal G}(x)}$ in this model was almost surely ${\gtrsim \xi \log^2 x}$, leading to his conjecture that this bound also was true for the primes. (Interestingly, this conjecture is not yet borne out by numerics; calculations of prime gaps up to ${10^{18}}$, for instance, have shown that ${G_{\mathcal P}(x)}$ never exceeds ${0.9206 \log^2 x}$ in this range. This is not necessarily a conflict, however; Granville’s analysis relies on inspecting gaps in an extremely sparse region of natural numbers that are more devoid of primes than average, and this region is not well explored by existing numerics. See this previous blog post for more discussion of Granville’s argument.)

However, Granville’s model does not produce a power savings in the error term of the Hardy-Littlewood conjectures, mostly due to the need to truncate the singular series at the logarithmic cutoff ${A}$. After some experimentation, we were able to produce a tractable random model ${{\mathcal R}}$ for the primes which obeyed the Hardy-Littlewood conjectures with power savings, and which reproduced Granville’s gap prediction of ${\gtrsim \xi \log^2 x}$ (we also get an upper bound of ${\lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x}}$ for both models, though we expect the lower bound to be closer to the truth); to us, this strengthens the case for Granville’s version of Cramér’s conjecture. The model can be described as follows. We select one residue class ${a_p \hbox{ mod } p}$ uniformly at random for each prime ${p}$, and as before we let ${S_z}$ be the sifted set of integers formed by deleting the residue classes ${a_p \hbox{ mod } p}$ with ${p \leq z}$. We then set

$\displaystyle {\mathcal R} := \{ n \geq e^2: n \in S_{z(t)}\}$

with ${z(t)}$ Pólya’s magic cutoff (this is the cutoff that gives ${{\mathcal R}}$ a density consistent with the prime number theorem or the Riemann hypothesis). As stated above, we are able to show that almost surely one has

$\displaystyle \xi \log^2 x \lesssim {\mathcal G}_{\mathcal R}(x) \lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x} \ \ \ \ \ (3)$

and that the Hardy-Littlewood conjectures hold with power savings for ${k}$ up to ${\log^c x}$ for any fixed ${c < 1}$ and for shifts ${h_1,\dots,h_k}$ of size ${O(\log^c x)}$. This is unfortunately a tiny bit weaker than what Theorem 3 requires (which more or less corresponds to the endpoint ${c=1}$), although there is a variant of Theorem 3 that can use this input to produce a lower bound on gaps in the model ${{\mathcal R}}$ (but it is weaker than the one in (3)). In fact we prove a more precise almost sure asymptotic formula for ${{\mathcal G}_{\mathcal R}(x) }$ that involves the optimal bounds for the linear sieve (or interval sieve), in which one deletes one residue class modulo ${p}$ from an interval ${[0,y]}$ for all primes ${p}$ up to a given threshold. The lower bound in (3) relates to the case of deleting the ${0 \hbox{ mod } p}$ residue classes from ${[0,y]}$; the upper bound comes from the delicate analysis of the linear sieve by Iwaniec. Improving on either of the two bounds looks to be quite a difficult problem.

The probabilistic analysis of ${{\mathcal R}}$ is somewhat more complicated than of ${{\mathcal C}}$ or ${{\mathcal G}}$ as there is now non-trivial coupling between the events ${n \in {\mathcal R}}$ as ${n}$ varies, although moment methods such as the second moment method are still viable and allow one to verify the Hardy-Littlewood conjectures by a lengthy but fairly straightforward calculation. To analyse large gaps, one has to understand the statistical behaviour of a random linear sieve in which one starts with an interval ${[0,y]}$ and randomly deletes a residue class ${a_p \hbox{ mod } p}$ for each prime ${p}$ up to a given threshold. For very small ${p}$ this is handled by the deterministic theory of the linear sieve as discussed above. For medium sized ${p}$, it turns out that there is good concentration of measure thanks to tools such as Bennett’s inequality or Azuma’s inequality, as one can view the sieving process as a martingale or (approximately) as a sum of independent random variables. For larger primes ${p}$, in which only a small number of survivors are expected to be sieved out by each residue class, a direct combinatorial calculation of all possible outcomes (involving the random graph that connects interval elements ${n \in [0,y]}$ to primes ${p}$ if ${n}$ falls in the random residue class ${a_p \hbox{ mod } p}$) turns out to give the best results.

I’ve just uploaded to the arXiv my paper “Quantitative bounds for critically bounded solutions to the Navier-Stokes equations“, submitted to the proceedings of the Linde Hall Inaugural Math Symposium. (I unfortunately had to cancel my physical attendance at this symposium for personal reasons, but was still able to contribute to the proceedings.) In recent years I have been interested in working towards establishing the existence of classical solutions for the Navier-Stokes equations

$\displaystyle \partial_t u + (u \cdot \nabla) u = \Delta u - \nabla p$

$\displaystyle \nabla \cdot u = 0$

that blow up in finite time, but this time for a change I took a look at the other side of the theory, namely the conditional regularity results for this equation. There are several such results that assert that if a certain norm of the solution stays bounded (or grows at a controlled rate), then the solution stays regular; taken in the contrapositive, they assert that if a solution blows up at a certain finite time ${T_*}$, then certain norms of the solution must also go to infinity. Here are some examples (not an exhaustive list) of such blowup criteria:

• (Leray blowup criterion, 1934) If ${u}$ blows up at a finite time ${T_*}$, and ${3 < p \leq \infty}$, then ${\liminf_{t \rightarrow T_*} (T_* - t)^{\frac{1}{2}-\frac{3}{2p}} \| u(t) \|_{L^p_x({\bf R}^3)} \geq c}$ for an absolute constant ${c>0}$.
• (ProdiSerrinLadyzhenskaya blowup criterion, 1959-1967) If ${u}$ blows up at a finite time ${T_*}$, and ${3 < p \leq \infty}$, then ${\| u \|_{L^q_t L^p_x([0,T_*) \times {\bf R}^3)} =+\infty}$, where ${\frac{1}{q} := \frac{1}{2} - \frac{3}{2p}}$.
• (Beale-Kato-Majda blowup criterion, 1984) If ${u}$ blows up at a finite time ${T_*}$, then ${\| \omega \|_{L^1_t L^\infty_x([0,T_*) \times {\bf R}^3)} = +\infty}$, where ${\omega := \nabla \times u}$ is the vorticity.
• (Kato blowup criterion, 1984) If ${u}$ blows up at a finite time ${T_*}$, then ${\liminf_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} \geq c}$ for some absolute constant ${c>0}$.
• (Escauriaza-Seregin-Sverak blowup criterion, 2003) If ${u}$ blows up at a finite time ${T_*}$, then ${\limsup_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} = +\infty}$.
• (Seregin blowup criterion, 2012) If ${u}$ blows up at a finite time ${T_*}$, then ${\lim_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} = +\infty}$.
• (Phuc blowup criterion, 2015) If ${u}$ blows up at a finite time ${T_*}$, then ${\limsup_{t \rightarrow T_*} \|u(t) \|_{L^{3,q}_x({\bf R}^3)} = +\infty}$ for any ${q < \infty}$.
• (Gallagher-Koch-Planchon blowup criterion, 2016) If ${u}$ blows up at a finite time ${T_*}$, then ${\limsup_{t \rightarrow T_*} \|u(t) \|_{\dot B_{p,q}^{-1+3/p}({\bf R}^3)} = +\infty}$ for any ${3 < p, q < \infty}$.
• (Albritton blowup criterion, 2016) If ${u}$ blows up at a finite time ${T_*}$, then ${\lim_{t \rightarrow T_*} \|u(t) \|_{\dot B_{p,q}^{-1+3/p}({\bf R}^3)} = +\infty}$ for any ${3 < p, q < \infty}$.

My current paper is most closely related to the Escauriaza-Seregin-Sverak blowup criterion, which was the first to show a critical (i.e., scale-invariant, or dimensionless) spatial norm, namely ${L^3_x({\bf R}^3)}$, had to become large. This result now has many proofs; for instance, many of the subsequent blowup criterion results imply the Escauriaza-Seregin-Sverak one as a special case, and there are also additional proofs by Gallagher-Koch-Planchon (building on ideas of Kenig-Koch), and by Dong-Du. However, all of these proofs rely on some form of a compactness argument: given a finite time blowup, one extracts some suitable family of rescaled solutions that converges in some weak sense to a limiting solution that has some additional good properties (such as almost periodicity modulo symmetries), which one can then rule out using additional qualitative tools, such as unique continuation and backwards uniqueness theorems for parabolic heat equations. In particular, all known proofs use some version of the backwards uniqueness theorem of Escauriaza, Seregin, and Sverak. Because of this reliance on compactness, the existing proofs of the Escauriaza-Seregin-Sverak blowup criterion are qualitative, in that they do not provide any quantitative information on how fast the ${\|u(t)\|_{L^3_x({\bf R}^3)}}$ norm will go to infinity (along a subsequence of times).

On the other hand, it is a general principle that qualitative arguments established using compactness methods ought to have quantitative analogues that replace the use of compactness by more complicated substitutes that give effective bounds; see for instance these previous blog posts for more discussion. I therefore was interested in trying to obtain a quantitative version of this blowup criterion that gave reasonably good effective bounds (in particular, my objective was to avoid truly enormous bounds such as tower-exponential or Ackermann function bounds, which often arise if one “naively” tries to make a compactness argument effective). In particular, I obtained the following triple-exponential quantitative regularity bounds:

Theorem 1 If ${u}$ is a classical solution to Navier-Stokes on ${[0,T) \times {\bf R}^3}$ with

$\displaystyle \| u \|_{L^\infty_t L^3_x([0,T) \times {\bf R}^3)} \leq A \ \ \ \ \ (1)$

for some ${A \geq 2}$, then

$\displaystyle | \nabla^j u(t,x)| \leq \exp\exp\exp(A^{O(1)}) t^{-\frac{j+1}{2}}$

and

$\displaystyle | \nabla^j \omega(t,x)| \leq \exp\exp\exp(A^{O(1)}) t^{-\frac{j+2}{2}}$

for ${(t,x) \in [0,T) \times {\bf R}^3}$ and ${j=0,1}$.

As a corollary, one can now improve the Escauriaza-Seregin-Sverak blowup criterion to

$\displaystyle \limsup_{t \rightarrow T_*} \frac{\|u(t) \|_{L^3_x({\bf R}^3)}}{(\log\log\log \frac{1}{T_*-t})^c} = +\infty$

for some absolute constant ${c>0}$, which to my knowledge is the first (very slightly) supercritical blowup criterion for Navier-Stokes in the literature.

The proof uses many of the same quantitative inputs as previous arguments, most notably the Carleman inequalities used to establish unique continuation and backwards uniqueness theorems for backwards heat equations, but also some additional techniques that make the quantitative bounds more efficient. The proof focuses initially on points of concentration of the solution, which we define as points ${(t_0,x_0)}$ where there is a frequency ${N_0}$ for which one has the bound

$\displaystyle |N_0^{-1} P_{N_0} u(t_0,x_0)| \geq A^{-C_0} \ \ \ \ \ (2)$

for a large absolute constant ${C_0}$, where ${P_{N_0}}$ is a Littlewood-Paley projection to frequencies ${\sim N_0}$. (This can be compared with the upper bound of ${O(A)}$ for the quantity on the left-hand side that follows from (1).) The factor of ${N_0^{-1}}$ normalises the left-hand side of (2) to be dimensionless (i.e., critical). The main task is to show that the dimensionless quantity ${t_0 N_0^2}$ cannot get too large; in particular, we end up establishing a bound of the form

$\displaystyle t_0 N_0^2 \lesssim \exp\exp\exp A^{O(C_0^6)}$

from which the above theorem ends up following from a routine adaptation of the local well-posedness and regularity theory for Navier-Stokes.

The strategy is to show that any concentration such as (2) when ${t_0 N_0^2}$ is large must force a significant component of the ${L^3_x}$ norm of ${u(t_0)}$ to also show up at many other locations than ${x_0}$, which eventually contradicts (1) if one can produce enough such regions of non-trivial ${L^3_x}$ norm. (This can be viewed as a quantitative variant of the “rigidity” theorems in some of the previous proofs of the Escauriaza-Seregin-Sverak theorem that rule out solutions that exhibit too much “compactness” or “almost periodicity” in the ${L^3_x}$ topology.) The chain of causality that leads from a concentration (2) at ${(t_0,x_0)}$ to significant ${L^3_x}$ norm at other regions of the time slice ${t_0 \times {\bf R}^3}$ is somewhat involved (though simpler than the much more convoluted schemes I initially envisaged for this argument):

1. Firstly, by using Duhamel’s formula, one can show that a concentration (2) can only occur (with ${t_0 N_0^2}$ large) if there was also a preceding concentration

$\displaystyle |N_1^{-1} P_{N_1} u(t_1,x_1)| \geq A^{-C_0} \ \ \ \ \ (3)$

at some slightly previous point ${(t_1,x_1)}$ in spacetime, with ${N_1}$ also close to ${N_0}$ (more precisely, we have ${t_1 = t_0 - A^{-O(C_0^3)} N_0^{-2}}$, ${N_1 = A^{O(C_0^2)} N_0}$, and ${x_1 = x_0 + O( A^{O(C_0^4)} N_0^{-1})}$). This can be viewed as a sort of contrapositive of a “local regularity theorem”, such as the ones established by Caffarelli, Kohn, and Nirenberg. A key point here is that the lower bound ${A^{-C_0}}$ in the conclusion (3) is precisely the same as the lower bound in (2), so that this backwards propagation of concentration can be iterated.

2. Iterating the previous step, one can find a sequence of concentration points

$\displaystyle |N_n^{-1} P_{N_n} u(t_n,x_n)| \geq A^{-C_0} \ \ \ \ \ (4)$

with the ${(t_n,x_n)}$ propagating backwards in time; by using estimates ultimately resulting from the dissipative term in the energy identity, one can extract such a sequence in which the ${t_0-t_n}$ increase geometrically with time, the ${N_n}$ are comparable (up to polynomial factors in ${A}$) to the natural frequency scale ${(t_0-t_n)^{-1/2}}$, and one has ${x_n = x_0 + O( |t_0-t_n|^{1/2} )}$. Using the “epochs of regularity” theory that ultimately dates back to Leray, and tweaking the ${t_n}$ slightly, one can also place the times ${t_n}$ in intervals ${I_n}$ (of length comparable to a small multiple of ${|t_0-t_n|}$) in which the solution is quite regular (in particular, ${u, \nabla u, \omega, \nabla \omega}$ enjoy good ${L^\infty_t L^\infty_x}$ bounds on ${I_n \times {\bf R}^3}$).

3. The concentration (4) can be used to establish a lower bound for the ${L^2_t L^2_x}$ norm of the vorticity ${\omega}$ near ${(t_n,x_n)}$. As is well known, the vorticity obeys the vorticity equation

$\displaystyle \partial_t \omega = \Delta \omega - (u \cdot \nabla) \omega + (\omega \cdot \nabla) u.$

In the epoch of regularity ${I_n \times {\bf R}^3}$, the coefficients ${u, \nabla u}$ of this equation obey good ${L^\infty_x}$ bounds, allowing the machinery of Carleman estimates to come into play. Using a Carleman estimate that is used to establish unique continuation results for backwards heat equations, one can propagate this lower bound to also give lower ${L^2}$ bounds on the vorticity (and its first derivative) in annuli of the form ${\{ (t,x) \in I_n \times {\bf R}^3: R \leq |x-x_n| \leq R' \}}$ for various radii ${R,R'}$, although the lower bounds decay at a gaussian rate with ${R}$.

4. Meanwhile, using an energy pigeonholing argument of Bourgain (which, in this Navier-Stokes context, is actually an enstrophy pigeonholing argument), one can locate some annuli ${\{ x \in {\bf R}^3: R \leq |x-x_n| \leq R'\}}$ where (a slightly normalised form of) the entrosphy is small at time ${t=t_n}$; using a version of the localised enstrophy estimates from a previous paper of mine, one can then propagate this sort of control forward in time, obtaining an “annulus of regularity” of the form ${\{ (t,x) \in [t_n,t_0] \times {\bf R}^3: R_n \leq |x-x_n| \leq R'_n\}}$ in which one has good estimates; in particular, one has ${L^\infty_x}$ type bounds on ${u, \nabla u, \omega, \nabla \omega}$ in this cylindrical annulus.
5. By intersecting the previous epoch of regularity ${I_n \times {\bf R}^3}$ with the above annulus of regularity, we have some lower bounds on the ${L^2}$ norm of the vorticity (and its first derivative) in the annulus of regularity. Using a Carleman estimate first introduced by Escauriaza, Seregin, and Sverak, as well as a second application of the Carleman estimate used previously, one can then propagate this lower bound back up to time ${t=t_0}$, establishing a lower bound for the vorticity on the spatial annulus ${\{ (t_0,x): R_n \leq |x-x_n| \leq R'_n \}}$. By some basic Littlewood-Paley theory one can parlay this lower bound to a lower bound on the ${L^3}$ norm of the velocity ${u}$; crucially, this lower bound is uniform in ${n}$.
6. If ${t_0 N_0^2}$ is very large (triple exponential in ${A}$!), one can then find enough scales ${n}$ with disjoint ${\{ (t_0,x): R_n \leq |x-x_n| \leq R'_n \}}$ annuli that the total lower bound on the ${L^3_x}$ norm of ${u(t_0)}$ provided by the above arguments is inconsistent with (1), thus establishing the claim.

The chain of causality is summarised in the following image:

It seems natural to conjecture that similar triply logarithmic improvements can be made to several of the other blowup criteria listed above, but I have not attempted to pursue this question. It seems difficult to improve the triple logarithmic factor using only the techniques here; the Bourgain pigeonholing argument inevitably costs one exponential, the Carleman inequalities cost a second, and the stacking of scales at the end to contradict the ${L^3}$ upper bound costs the third.

Peter Denton, Stephen Parke, Xining Zhang, and I have just uploaded to the arXiv the short unpublished note “Eigenvectors from eigenvalues“. This note gives two proofs of a general eigenvector identity observed recently by Denton, Parke and Zhang in the course of some quantum mechanical calculations. The identity is as follows:

Theorem 1 Let ${A}$ be an ${n \times n}$ Hermitian matrix, with eigenvalues ${\lambda_1(A),\dots,\lambda_n(A)}$. Let ${v_i}$ be a unit eigenvector corresponding to the eigenvalue ${\lambda_i(A)}$, and let ${v_{i,j}}$ be the ${j^{th}}$ component of ${v_i}$. Then

$\displaystyle |v_{i,j}|^2 \prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A)) = \prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j))$

where ${M_j}$ is the ${n-1 \times n-1}$ Hermitian matrix formed by deleting the ${j^{th}}$ row and column from ${A}$.

For instance, if we have

$\displaystyle A = \begin{pmatrix} a & X^* \\ X & M \end{pmatrix}$

for some real number ${a}$, ${n-1}$-dimensional vector ${X}$, and ${n-1 \times n-1}$ Hermitian matrix ${M}$, then we have

$\displaystyle |v_{i,1}|^2 = \frac{\prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M))}{\prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A))} \ \ \ \ \ (1)$

assuming that the denominator is non-zero.

Once one is aware of the identity, it is not so difficult to prove it; we give two proofs, each about half a page long, one of which is based on a variant of the Cauchy-Binet formula, and the other based on properties of the adjugate matrix. But perhaps it is surprising that such a formula exists at all; one does not normally expect to learn much information about eigenvectors purely from knowledge of eigenvalues. In the random matrix theory literature, for instance in this paper of Erdos, Schlein, and Yau, or this later paper of Van Vu and myself, a related identity has been used, namely

$\displaystyle |v_{i,1}|^2 = \frac{1}{1 + \| (M-\lambda_i(A))^{-1} X \|^2}, \ \ \ \ \ (2)$

but it is not immediately obvious that one can derive the former identity from the latter. (I do so below the fold; we ended up not putting this proof in the note as it was longer than the two other proofs we found. I also give two other proofs below the fold, one from a more geometric perspective and one proceeding via Cramer’s rule.) It was certainly something of a surprise to me that there is no explicit appearance of the ${a,X}$ components of ${A}$ in the formula (1) (though they do indirectly appear through their effect on the eigenvalues ${\lambda_k(A)}$; for instance from taking traces one sees that ${a = \sum_{k=1}^n \lambda_k(A) - \sum_{k=1}^{n-1} \lambda_k(M)}$).

One can get some feeling of the identity (1) by considering some special cases. Suppose for instance that ${A}$ is a diagonal matrix with all distinct entries. The upper left entry ${a}$ of ${A}$ is one of the eigenvalues of ${A}$. If it is equal to ${\lambda_i(A)}$, then the eigenvalues of ${M}$ are the other ${n-1}$ eigenvalues of ${A}$, and now the left and right-hand sides of (1) are equal to ${1}$. At the other extreme, if ${a}$ is equal to a different eigenvalue of ${A}$, then ${\lambda_i(A)}$ now appears as an eigenvalue of ${M}$, and both sides of (1) now vanish. More generally, if we order the eigenvalues ${\lambda_1(A) \leq \dots \leq \lambda_n(A)}$ and ${\lambda_1(M) \leq \dots \leq \lambda_{n-1}(M)}$, then the Cauchy interlacing inequalities tell us that

$\displaystyle 0 \leq \lambda_i(A) - \lambda_k(M) \leq \lambda_i(A) - \lambda_k(A)$

for ${1 \leq k < i}$, and

$\displaystyle \lambda_i(A) - \lambda_{k+1}(A) \leq \lambda_i(A) - \lambda_k(M) < 0$

for ${i \leq k \leq n-1}$, so that the right-hand side of (1) lies between ${0}$ and ${1}$, which is of course consistent with (1) as ${v_i}$ is a unit vector. Thus the identity relates the coefficient sizes of an eigenvector with the extent to which the Cauchy interlacing inequalities are sharp.

I have just uploaded to the arXiv my paper “Sharp bounds for multilinear curved Kakeya, restriction and oscillatory integral estimates away from the endpoint“, submitted to Mathematika. In this paper I return (after more than a decade’s absence) to one of my first research interests, namely the Kakeya and restriction family of conjectures. The starting point is the following “multilinear Kakeya estimate” first established in the non-endpoint case by Bennett, Carbery, and myself, and then in the endpoint case by Guth (with further proofs and extensions by Bourgain-Guth and Carbery-Valdimarsson:

Theorem 1 (Multilinear Kakeya estimate) Let ${\delta > 0}$ be a radius. For each ${j = 1,\dots,d}$, let ${\mathbb{T}_j}$ denote a finite family of infinite tubes ${T_j}$ in ${{\bf R}^d}$ of radius ${\delta}$. Assume the following axiom:

• (i) (Transversality) whenever ${T_j \in \mathbb{T}_j}$ is oriented in the direction of a unit vector ${n_j}$ for ${j =1,\dots,d}$, we have

$\displaystyle \left|\bigwedge_{j=1}^d n_j\right| \geq A^{-1}$

for some ${A>0}$, where we use the usual Euclidean norm on the wedge product ${\bigwedge^d {\bf R}^d}$.

Then, for any ${p \geq \frac{1}{d-1}}$, one has

$\displaystyle \left\| \prod_{j=1}^d \sum_{T_j \in \mathbb{T}_j} 1_{T_j} \right\|_{L^p({\bf R}^d)} \lesssim_{A,p} \delta^{\frac{d}{p}} \prod_{j \in [d]} \# \mathbb{T}_j. \ \ \ \ \ (1)$

where ${L^p({\bf R}^d)}$ are the usual Lebesgue norms with respect to Lebesgue measure, ${1_{T_j}}$ denotes the indicator function of ${T_j}$, and ${\# \mathbb{T}_j}$ denotes the cardinality of ${\mathbb{T}_j}$.

The original proof of this proceeded using a heat flow monotonicity method, which in my previous post I reinterpreted using a “virtual integration” concept on a fractional Cartesian product space. It turns out that this machinery is somewhat flexible, and can be used to establish some other estimates of this type. The first result of this paper is to extend the above theorem to the curved setting, in which one localises to a ball of radius ${O(1)}$ (and sets ${\delta}$ to be small), but allows the tubes ${T_j}$ to be curved in a ${C^2}$ fashion. If one runs the heat flow monotonicity argument, one now picks up some additional error terms arising from the curvature, but as the spatial scale approaches zero, the tubes become increasingly linear, and as such the error terms end up being an integrable multiple of the main term, at which point one can conclude by Gronwall’s inequality (actually for technical reasons we use a bootstrap argument instead of Gronwall). A key point in this approach is that one obtains optimal bounds (not losing factors of ${\delta^{-\varepsilon}}$ or ${\log^{O(1)} \frac{1}{\delta}}$), so long as one stays away from the endpoint case ${p=\frac{1}{d-1}}$ (which does not seem to be easily treatable by the heat flow methods). Previously, the paper of Bennett, Carbery, and myself was able to use an induction on scale argument to obtain a curved multilinear Kakeya estimate losing a factor of ${\log^{O(1)} \frac{1}{\delta}}$ (after optimising the argument); later arguments of Bourgain-Guth and Carbery-Valdimarsson, based on algebraic topology methods, could also obtain a curved multilinear Kakeya estimate without such losses, but only in the algebraic case when the tubes were neighbourhoods of algebraic curves of bounded degree.

Perhaps more interestingly, we are also able to extend the heat flow monotonicity method to apply directly to the multilinear restriction problem, giving the following global multilinear restriction estimate:

Theorem 2 (Multilinear restriction theorem) Let ${\frac{1}{d-1} < p \leq \infty}$ be an exponent, and let ${A \geq 2}$ be a parameter. Let ${M}$ be a sufficiently large natural number, depending only on ${d}$. For ${j \in [d]}$, let ${U_j}$ be an open subset of ${B^{d-1}(0,A)}$, and let ${h_j: U_j \rightarrow {\bf R}}$ be a smooth function obeying the following axioms:

• (i) (Regularity) For each ${j \in [d]}$ and ${\xi \in U_j}$, one has

$\displaystyle |\nabla_\xi^{\otimes m} \otimes h_j(\xi)| \leq A \ \ \ \ \ (2)$

for all ${1 \leq m \leq M}$.

• (ii) (Transversality) One has

$\displaystyle \left| \bigwedge_{j \in [d]} (-\nabla_\xi h_j(\xi_j),1) \right| \geq A^{-1}$

whenever ${\xi_j \in U_j}$ for ${j \in [d]}$.

Let ${U_{j,1/A} \subset U_j}$ be the sets

$\displaystyle U_{j,1/A} := \{ \xi \in U_j: B^{d-1}(\xi,1/A) \subset U_j \}. \ \ \ \ \ (3)$

Then one has

$\displaystyle \left\| \prod_{j \in [d]} {\mathcal E}_j f_j \right\|_{L^{2p}({\bf R}^d)} \leq A^{O(1)} \left(d-1-\frac{1}{p}\right)^{-O(1)} \prod_{j \in [d]} \|f_j \|_{L^2(U_{j,1/A})}$

for any ${f_j \in L^2(U_{j,1/A} \rightarrow {\bf C})}$, ${j \in [d]}$, extended by zero outside of ${U_{j,1/A}}$, and ${{\mathcal E}_j}$ denotes the extension operator

$\displaystyle {\mathcal E}_j f_j( x', x_d ) := \int_{U_j} e^{2\pi i (x' \xi^T + x_d h_j(\xi))} f_j(\xi)\ d\xi.$

Local versions of such estimate, in which ${L^{2p}({\bf R}^d)}$ is replaced with ${L^{2p}(B^d(0,R))}$ for some ${R \geq 2}$, and one accepts a loss of the form ${\log^{O(1)} R}$, were already established by Bennett, Carbery, and myself using an induction on scale argument. In a later paper of Bourgain-Guth these losses were removed by “epsilon removal lemmas” to recover Theorme 2, but only in the case when all the hypersurfaces involved had curvatures bounded away from zero.

There are two main new ingredients in the proof of Theorem 2. The first is to replace the usual induction on scales scheme to establish multilinear restriction by a “ball inflation” induction on scales scheme that more closely resembles the proof of decoupling theorems. In particular, we actually prove the more general family of estimates

$\displaystyle \left\| \prod_{j \in [d]} E_{r}[{\mathcal E}_j f_j] \right\|_{L^{p}({\bf R}^d)} \leq A^{O(1)} \left(d-1 - \frac{1}{p}\right)^{O(1)} r^{\frac{d}{p}} \prod_{j \in [d]} \| f_j \|_{L^2(U_{j,1/A})}^2$

where ${E_r}$ denotes the local energies

$\displaystyle E_{r}[f](x',x_d) := \int_{B^{d-1}(x',r)} |f(y',x_d)|^2\ dy'$

(actually for technical reasons it is more convenient to use a smoother weight than the strict cutoff to the disk ${B^{d-1}(x',r)}$). With logarithmic losses, it is not difficult to establish this estimate by an upward induction on ${r}$. To avoid such losses we use the heat flow monotonicity method. Here we run into the issue that the extension operators ${{\mathcal E}_j f_j}$ are complex-valued rather than non-negative, and thus would not be expected to obey many good montonicity properties. However, the local energies ${E_r[{\mathcal E}_j f_j]}$ can be expressed in terms of the magnitude squared of what is essentially the Gabor transform of ${{\mathcal E}_j f_j}$, and these are non-negative; furthermore, the dispersion relation associated to the extension operators ${{\mathcal E}_j f_j}$ implies that these Gabor transforms propagate along tubes, so that the situation becomes quite similar (up to several additional lower order error terms) to that in the multilinear Kakeya problem. (This can be viewed as a continuous version of the usual wave packet decomposition method used to relate restriction and Kakeya problems, which when combined with the heat flow monotonicity method allows for one to use a continuous version of induction on scales methods that do not concede any logarithmic factors.)

Finally, one can combine the curved multilinear Kakeya result with the multilinear restriction result to obtain estimates for multilinear oscillatory integrals away from the endpoint. Again, this sort of implication was already established in the previous paper of Bennett, Carbery, and myself, but the arguments there had some epsilon losses in the exponents; here we were able to run the argument more carefully and avoid these losses.

The AMS and MAA have recently published (and made available online) a collection of essays entitled “Living Proof: Stories of Resilience Along the Mathematical Journey”.  Each author contributes a story of how they encountered some internal or external difficulty in advancing their mathematical career, and how they were able to deal with such difficulties.  I myself have contributed one of these essays; I was initially somewhat surprised when I was approached for a contribution, as my career trajectory has been somewhat of an outlier, and I have been very fortunate to not experience to the same extent many of the obstacles that other contributors write about in this text.    Nevertheless there was a turning point in my career that I write about here during my graduate years, when I found that the improvised and poorly disciplined study habits that were able to get me into graduate school due to an over-reliance on raw mathematical ability were completely inadequate to handle the graduate qualifying exam.  With a combination of an astute advisor and some sheer luck, I was able to pass the exam and finally develop a more sustainable approach to learning and doing mathematics, but it could easily have gone quite differently.  (My 20 25-year old writeup of this examination, complete with spelling errors, may be found here.)

I was recently asked to contribute a short comment to Nature Reviews Physics, as part of a series of articles on fluid dynamics on the occasion of the 200th anniversary (this August) of the birthday of George Stokes.  My contribution is now online as “Searching for singularities in the Navier–Stokes equations“, where I discuss the global regularity problem for Navier-Stokes and my thoughts on how one could try to construct a solution that blows up in finite time via an approximately discretely self-similar “fluid computer”.  (The rest of the series does not currently seem to be available online, but I expect they will become so shortly.)

The Polymath15 paper “Effective approximation of heat flow evolution of the Riemann ${\xi}$ function, and a new upper bound for the de Bruijn-Newman constant“, submitted to Research in the Mathematical Sciences, has just been uploaded to the arXiv. This paper records the mix of theoretical and computational work needed to improve the upper bound on the de Bruijn-Newman constant ${\Lambda}$. This constant can be defined as follows. The function

$\displaystyle H_0(z) := \frac{1}{8} \xi\left(\frac{1}{2} + \frac{iz}{2}\right),$

where ${\xi}$ is the Riemann ${\xi}$ function

$\displaystyle \xi(s) := \frac{s(s-1)}{2} \pi^{-s/2} \Gamma\left(\frac{s}{2}\right) \zeta(s)$

has a Fourier representation

$\displaystyle H_0(z) = \int_0^\infty \Phi(u) \cos(zu)\ du$

where ${\Phi}$ is the super-exponentially decaying function

$\displaystyle \Phi(u) := \sum_{n=1}^\infty (2\pi^2 n^4 e^{9u} - 3\pi n^2 e^{5u} ) \exp(-\pi n^2 e^{4u} ).$

The Riemann hypothesis is equivalent to the claim that all the zeroes of ${H_0}$ are real. De Bruijn introduced (in different notation) the deformations

$\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du$

of ${H_0}$; one can view this as the solution to the backwards heat equation ${\partial_t H_t = -\partial_{zz} H_t}$ starting at ${H_0}$. From the work of de Bruijn and of Newman, it is known that there exists a real number ${\Lambda}$ – the de Bruijn-Newman constant – such that ${H_t}$ has all zeroes real for ${t \geq \Lambda}$ and has at least one non-real zero for ${t < \Lambda}$. In particular, the Riemann hypothesis is equivalent to the assertion ${\Lambda \leq 0}$. Prior to this paper, the best known bounds for this constant were

$\displaystyle 0 \leq \Lambda < 1/2$

with the lower bound due to Rodgers and myself, and the upper bound due to Ki, Kim, and Lee. One of the main results of the paper is to improve the upper bound to

$\displaystyle \Lambda \leq 0.22. \ \ \ \ \ (1)$

At a purely numerical level this gets “closer” to proving the Riemann hypothesis, but the methods of proof take as input a finite numerical verification of the Riemann hypothesis up to some given height ${T}$ (in our paper we take ${T \sim 3 \times 10^{10}}$) and converts this (and some other numerical verification) to an upper bound on ${\Lambda}$ that is of order ${O(1/\log T)}$. As discussed in the final section of the paper, further improvement of the numerical verification of RH would thus lead to modest improvements in the upper bound on ${\Lambda}$, although it does not seem likely that our methods could for instance improve the bound to below ${0.1}$ without an infeasible amount of computation.

We now discuss the methods of proof. An existing result of de Bruijn shows that if all the zeroes of ${H_{t_0}(z)}$ lie in the strip ${\{ x+iy: |y| \leq y_0\}}$, then ${\Lambda \leq t_0 + \frac{1}{2} y_0^2}$; we will verify this hypothesis with ${t_0=y_0=0.2}$, thus giving (1). Using the symmetries and the known zero-free regions, it suffices to show that

$\displaystyle H_{0.2}(x+iy) \neq 0 \ \ \ \ \ (2)$

whenever ${x \geq 0}$ and ${0.2 \leq y \leq 1}$.

For large ${x}$ (specifically, ${x \geq 6 \times 10^{10}}$), we use effective numerical approximation to ${H_t(x+iy)}$ to establish (2), as discussed in a bit more detail below. For smaller values of ${x}$, the existing numerical verification of the Riemann hypothesis (we use the results of Platt) shows that

$\displaystyle H_0(x+iy) \neq 0$

for ${0 \leq x \leq 6 \times 10^{10}}$ and ${0.2 \leq y \leq 1}$. The problem though is that this result only controls ${H_t}$ at time ${t=0}$ rather than the desired time ${t = 0.2}$. To bridge the gap we need to erect a “barrier” that, roughly speaking, verifies that

$\displaystyle H_t(x+iy) \neq 0 \ \ \ \ \ (3)$

for ${0 \leq t \leq 0.2}$, ${x = 6 \times 10^{10} + O(1)}$, and ${0.2 \leq y \leq 1}$; with a little bit of work this barrier shows that zeroes cannot sneak in from the right of the barrier to the left in order to produce counterexamples to (2) for small ${x}$.

To enforce this barrier, and to verify (2) for large ${x}$, we need to approximate ${H_t(x+iy)}$ for positive ${t}$. Our starting point is the Riemann-Siegel formula, which roughly speaking is of the shape

$\displaystyle H_0(x+iy) \approx B_0(x+iy) ( \sum_{n=1}^N \frac{1}{n^{\frac{1+y-ix}{2}}} + \gamma_0(x+iy) \sum_{n=1}^N \frac{n^y}{n^{\frac{1+y+ix}{2}}} )$

where ${N := \sqrt{x/4\pi}}$, ${B_0(x+iy)}$ is an explicit “gamma factor” that decays exponentially in ${x}$, and ${\gamma_0(x+iy)}$ is a ratio of gamma functions that is roughly of size ${(x/4\pi)^{-y/2}}$. Deforming this by the heat flow gives rise to an approximation roughly of the form

$\displaystyle H_t(x+iy) \approx B_t(x+iy) ( \sum_{n=1}^N \frac{b_n^t}{n^{s_*}} + \gamma_t(x+iy) \sum_{n=1}^N \frac{n^y}{n^{\overline{s_*}}} ) \ \ \ \ \ (4)$

where ${B_t(x+iy)}$ and ${\gamma_t(x+iy)}$ are variants of ${B_0(x+iy)}$ and ${\gamma_0(x+iy)}$, ${b_n^t := \exp( \frac{t}{4} \log^2 n )}$, and ${s_*}$ is an exponent which is roughly ${\frac{1+y-ix}{2} + \frac{t}{4} \log \frac{x}{4\pi}}$. In particular, for positive values of ${t}$, ${s_*}$ increases (logarithmically) as ${x}$ increases, and the two sums in the Riemann-Siegel formula become increasingly convergent (even in the face of the slowly increasing coefficients ${b_n^t}$). For very large values of ${x}$ (in the range ${x \geq \exp(C/t)}$ for a large absolute constant ${C}$), the ${n=1}$ terms of both sums dominate, and ${H_t(x+iy)}$ begins to behave in a sinusoidal fashion, with the zeroes “freezing” into an approximate arithmetic progression on the real line much like the zeroes of the sine or cosine functions (we give some asymptotic theorems that formalise this “freezing” effect). This lets one verify (2) for extremely large values of ${x}$ (e.g., ${x \geq 10^{12}}$). For slightly less large values of ${x}$, we first multiply the Riemann-Siegel formula by an “Euler product mollifier” to reduce some of the oscillation in the sum and make the series converge better; we also use a technical variant of the triangle inequality to improve the bounds slightly. These are sufficient to establish (2) for moderately large ${x}$ (say ${x \geq 6 \times 10^{10}}$) with only a modest amount of computational effort (a few seconds after all the optimisations; on my own laptop with very crude code I was able to verify all the computations in a matter of minutes).

The most difficult computational task is the verification of the barrier (3), particularly when ${t}$ is close to zero where the series in (4) converge quite slowly. We first use an Euler product heuristic approximation to ${H_t(x+iy)}$ to decide where to place the barrier in order to make our numerical approximation to ${H_t(x+iy)}$ as large in magnitude as possible (so that we can afford to work with a sparser set of mesh points for the numerical verification). In order to efficiently evaluate the sums in (4) for many different values of ${x+iy}$, we perform a Taylor expansion of the coefficients to factor the sums as combinations of other sums that do not actually depend on ${x}$ and ${y}$ and so can be re-used for multiple choices of ${x+iy}$ after a one-time computation. At the scales we work in, this computation is still quite feasible (a handful of minutes after software and hardware optimisations); if one assumes larger numerical verifications of RH and lowers ${t_0}$ and ${y_0}$ to optimise the value of ${\Lambda}$ accordingly, one could get down to an upper bound of ${\Lambda \leq 0.1}$ assuming an enormous numerical verification of RH (up to height about ${4 \times 10^{21}}$) and a very large distributed computing project to perform the other numerical verifications.

This post can serve as the (presumably final) thread for the Polymath15 project (continuing this post), to handle any remaining discussion topics for that project.