You are currently browsing the category archive for the ‘Mathematics’ category.

Peter Denton, Stephen Parke, Xining Zhang, and I have just uploaded to the arXiv a completely rewritten version of our previous paper, now titled “Eigenvectors from Eigenvalues: a survey of a basic identity in linear algebra“. This paper is now a survey of the various literature surrounding the following basic identity in linear algebra, which we propose to call the eigenvector-eigenvalue identity:

Theorem 1 (Eigenvector-eigenvalue identity) Let ${A}$ be an ${n \times n}$ Hermitian matrix, with eigenvalues ${\lambda_1(A),\dots,\lambda_n(A)}$. Let ${v_i}$ be a unit eigenvector corresponding to the eigenvalue ${\lambda_i(A)}$, and let ${v_{i,j}}$ be the ${j^{th}}$ component of ${v_i}$. Then

$\displaystyle |v_{i,j}|^2 \prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A)) = \prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j))$

where ${M_j}$ is the ${n-1 \times n-1}$ Hermitian matrix formed by deleting the ${j^{th}}$ row and column from ${A}$.

When we posted the first version of this paper, we were unaware of previous appearances of this identity in the literature; a related identity had been used by Erdos-Schlein-Yau and by myself and Van Vu for applications to random matrix theory, but to our knowledge this specific identity appeared to be new. Even two months after our preprint first appeared on the arXiv in August, we had only learned of one other place in the literature where the identity showed up (by Forrester and Zhang, who also cite an earlier paper of Baryshnikov).

The situation changed rather dramatically with the publication of a popular science article in Quanta on this identity in November, which gave this result significantly more exposure. Within a few weeks we became informed (through private communication, online discussion, and exploration of the citation tree around the references we were alerted to) of over three dozen places where the identity, or some other closely related identity, had previously appeared in the literature, in such areas as numerical linear algebra, various aspects of graph theory (graph reconstruction, chemical graph theory, and walks on graphs), inverse eigenvalue problems, random matrix theory, and neutrino physics. As a consequence, we have decided to completely rewrite our article in order to collate this crowdsourced information, and survey the history of this identity, all the known proofs (we collect seven distinct ways to prove the identity (or generalisations thereof)), and all the applications of it that we are currently aware of. The citation graph of the literature that this ad hoc crowdsourcing effort produced is only very weakly connected, which we found surprising:

The earliest explicit appearance of the eigenvector-eigenvalue identity we are now aware of is in a 1966 paper of Thompson, although this paper is only cited (directly or indirectly) by a fraction of the known literature, and also there is a precursor identity of Löwner from 1934 that can be shown to imply the identity as a limiting case. At the end of the paper we speculate on some possible reasons why this identity only achieved a modest amount of recognition and dissemination prior to the November 2019 Quanta article.

Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Moore-Schmidt theorem“. This paper revisits a classical theorem of Moore and Schmidt in measurable cohomology of measure-preserving systems. To state the theorem, let ${X = (X,{\mathcal X},\mu)}$ be a probability space, and ${\mathrm{Aut}(X, {\mathcal X}, \mu)}$ be the group of measure-preserving automorphisms of this space, that is to say the invertible bimeasurable maps ${T: X \rightarrow X}$ that preserve the measure ${\mu}$: ${T_* \mu = \mu}$. To avoid some ambiguity later in this post when we introduce abstract analogues of measure theory, we will refer to measurable maps as concrete measurable maps, and measurable spaces as concrete measurable spaces. (One could also call ${X = (X,{\mathcal X}, \mu)}$ a concrete probability space, but we will not need to do so here as we will not be working explicitly with abstract probability spaces.)

Let ${\Gamma = (\Gamma,\cdot)}$ be a discrete group. A (concrete) measure-preserving action of ${\Gamma}$ on ${X}$ is a group homomorphism ${\gamma \mapsto T^\gamma}$ from ${\Gamma}$ to ${\mathrm{Aut}(X, {\mathcal X}, \mu)}$, thus ${T^1}$ is the identity map and ${T^{\gamma_1} \circ T^{\gamma_2} = T^{\gamma_1 \gamma_2}}$ for all ${\gamma_1,\gamma_2 \in \Gamma}$. A large portion of ergodic theory is concerned with the study of such measure-preserving actions, especially in the classical case when ${\Gamma}$ is the integers (with the additive group law).

Let ${K = (K,+)}$ be a compact Hausdorff abelian group, which we can endow with the Borel ${\sigma}$-algebra ${{\mathcal B}(K)}$. A (concrete measurable) ${K}$cocycle is a collection ${\rho = (\rho_\gamma)_{\gamma \in \Gamma}}$ of concrete measurable maps ${\rho_\gamma: X \rightarrow K}$ obeying the cocycle equation

$\displaystyle \rho_{\gamma_1 \gamma_2}(x) = \rho_{\gamma_1} \circ T^{\gamma_2}(x) + \rho_{\gamma_2}(x) \ \ \ \ \ (1)$

for ${\mu}$-almost every ${x \in X}$. (Here we are glossing over a measure-theoretic subtlety that we will return to later in this post – see if you can spot it before then!) Cocycles arise naturally in the theory of group extensions of dynamical systems; in particular (and ignoring the aforementioned subtlety), each cocycle induces a measure-preserving action ${\gamma \mapsto S^\gamma}$ on ${X \times K}$ (which we endow with the product of ${\mu}$ with Haar probability measure on ${K}$), defined by

$\displaystyle S^\gamma( x, k ) := (T^\gamma x, k + \rho_\gamma(x) ).$

This connection with group extensions was the original motivation for our study of measurable cohomology, but is not the focus of the current paper.

A special case of a ${K}$-valued cocycle is a (concrete measurable) ${K}$-valued coboundary, in which ${\rho_\gamma}$ for each ${\gamma \in \Gamma}$ takes the special form

$\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)$

for ${\mu}$-almost every ${x \in X}$, where ${F: X \rightarrow K}$ is some measurable function; note that (ignoring the aforementioned subtlety), every function of this form is automatically a concrete measurable ${K}$-valued cocycle. One of the first basic questions in measurable cohomology is to try to characterize which ${K}$-valued cocycles are in fact ${K}$-valued coboundaries. This is a difficult question in general. However, there is a general result of Moore and Schmidt that at least allows one to reduce to the model case when ${K}$ is the unit circle ${\mathbf{T} = {\bf R}/{\bf Z}}$, by taking advantage of the Pontryagin dual group ${\hat K}$ of characters ${\hat k: K \rightarrow \mathbf{T}}$, that is to say the collection of continuous homomorphisms ${\hat k: k \mapsto \langle \hat k, k \rangle}$ to the unit circle. More precisely, we have

Theorem 1 (Countable Moore-Schmidt theorem) Let ${\Gamma}$ be a discrete group acting in a concrete measure-preserving fashion on a probability space ${X}$. Let ${K}$ be a compact Hausdorff abelian group. Assume the following additional hypotheses:

• (i) ${\Gamma}$ is at most countable.
• (ii) ${X}$ is a standard Borel space.
• (iii) ${K}$ is metrisable.

Then a ${K}$-valued concrete measurable cocycle ${\rho = (\rho_\gamma)_{\gamma \in \Gamma}}$ is a concrete coboundary if and only if for each character ${\hat k \in \hat K}$, the ${\mathbf{T}}$-valued cocycles ${\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}}$ are concrete coboundaries.

The hypotheses (i), (ii), (iii) are saying in some sense that the data ${\Gamma, X, K}$ are not too “large”; in all three cases they are saying in some sense that the data are only “countably complicated”. For instance, (iii) is equivalent to ${K}$ being second countable, and (ii) is equivalent to ${X}$ being modeled by a complete separable metric space. It is because of this restriction that we refer to this result as a “countable” Moore-Schmidt theorem. This theorem is a useful tool in several other applications, such as the Host-Kra structure theorem for ergodic systems; I hope to return to these subsequent applications in a future post.

Let us very briefly sketch the main ideas of the proof of Theorem 1. Ignore for now issues of measurability, and pretend that something that holds almost everywhere in fact holds everywhere. The hard direction is to show that if each ${\langle \hat k, \rho \rangle}$ is a coboundary, then so is ${\rho}$. By hypothesis, we then have an equation of the form

$\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \alpha_{\hat k} \circ T^\gamma(x) - \alpha_{\hat k}(x) \ \ \ \ \ (2)$

for all ${\hat k, \gamma, x}$ and some functions ${\alpha_{\hat k}: X \rightarrow {\mathbf T}}$, and our task is then to produce a function ${F: X \rightarrow K}$ for which

$\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)$

for all ${\gamma,x}$.

Comparing the two equations, the task would be easy if we could find an ${F: X \rightarrow K}$ for which

$\displaystyle \langle \hat k, F(x) \rangle = \alpha_{\hat k}(x) \ \ \ \ \ (3)$

for all ${\hat k, x}$. However there is an obstruction to this: the left-hand side of (3) is additive in ${\hat k}$, so the right-hand side would have to be also in order to obtain such a representation. In other words, for this strategy to work, one would have to first establish the identity

$\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = 0 \ \ \ \ \ (4)$

for all ${\hat k_1, \hat k_2, x}$. On the other hand, the good news is that if we somehow manage to obtain the equation, then we can obtain a function ${F}$ obeying (3), thanks to Pontryagin duality, which gives a one-to-one correspondence between ${K}$ and the homomorphisms of the (discrete) group ${\hat K}$ to ${\mathbf{T}}$.

Now, it turns out that one cannot derive the equation (4) directly from the given information (2). However, the left-hand side of (2) is additive in ${\hat k}$, so the right-hand side must be also. Manipulating this fact, we eventually arrive at

$\displaystyle (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}) \circ T^\gamma(x) = (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2})(x).$

In other words, we don’t get to show that the left-hand side of (4) vanishes, but we do at least get to show that it is ${\Gamma}$-invariant. Now let us assume for sake of argument that the action of ${\Gamma}$ is ergodic, which (ignoring issues about sets of measure zero) basically asserts that the only ${\Gamma}$-invariant functions are constant. So now we get a weaker version of (4), namely

$\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = c_{\hat k_1, \hat k_2} \ \ \ \ \ (5)$

for some constants ${c_{\hat k_1, \hat k_2} \in \mathbf{T}}$.

Now we need to eliminate the constants. This can be done by the following group-theoretic projection. Let ${L^0({\bf X} \rightarrow {\bf T})}$ denote the space of concrete measurable maps ${\alpha}$ from ${{\bf X}}$ to ${{\bf T}}$, up to almost everywhere equivalence; this is an abelian group where the various terms in (5) naturally live. Inside this group we have the subgroup ${{\bf T}}$ of constant functions (up to almost everywhere equivalence); this is where the right-hand side of (5) lives. Because ${{\bf T}}$ is a divisible group, there is an application of Zorn’s lemma (a good exercise for those who are not acquainted with these things) to show that there exists a retraction ${w: L^0({\bf X} \rightarrow {\bf T}) \rightarrow {\bf T}}$, that is to say a group homomorphism that is the identity on the subgroup ${{\bf T}}$. We can use this retraction, or more precisely the complement ${\alpha \mapsto \alpha - w(\alpha)}$, to eliminate the constant in (5). Indeed, if we set

$\displaystyle \tilde \alpha_{\hat k}(x) := \alpha_{\hat k}(x) - w(\alpha_{\hat k})$

then from (5) we see that

$\displaystyle \tilde \alpha_{\hat k_1 + \hat k_2}(x) - \tilde \alpha_{\hat k_1}(x) - \tilde \alpha_{\hat k_2}(x) = 0$

while from (2) one has

$\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \tilde \alpha_{\hat k} \circ T^\gamma(x) - \tilde \alpha_{\hat k}(x)$

and now the previous strategy works with ${\alpha_{\hat k}}$ replaced by ${\tilde \alpha_{\hat k}}$. This concludes the sketch of proof of Theorem 1.

In making the above argument rigorous, the hypotheses (i)-(iii) are used in several places. For instance, to reduce to the ergodic case one relies on the ergodic decomposition, which requires the hypothesis (ii). Also, most of the above equations only hold outside of a set of measure zero, and the hypothesis (i) and the hypothesis (iii) (which is equivalent to ${\hat K}$ being at most countable) to avoid the problem that an uncountable union of sets of measure zero could have positive measure (or fail to be measurable at all).

My co-author Asgar Jamneshan and I are working on a long-term project to extend many results in ergodic theory (such as the aforementioned Host-Kra structure theorem) to “uncountable” settings in which hypotheses analogous to (i)-(iii) are omitted; thus we wish to consider actions on uncountable groups, on spaces that are not standard Borel, and cocycles taking values in groups that are not metrisable. Such uncountable contexts naturally arise when trying to apply ergodic theory techniques to combinatorial problems (such as the inverse conjecture for the Gowers norms), as one often relies on the ultraproduct construction (or something similar) to generate an ergodic theory translation of these problems, and these constructions usually give “uncountable” objects rather than “countable” ones. (For instance, the ultraproduct of finite groups is a hyperfinite group, which is usually uncountable.). This paper marks the first step in this project by extending the Moore-Schmidt theorem to the uncountable setting.

If one simply drops the hypotheses (i)-(iii) and tries to prove the Moore-Schmidt theorem, several serious difficulties arise. We have already mentioned the loss of the ergodic decomposition and the possibility that one has to control an uncountable union of null sets. But there is in fact a more basic problem when one deletes (iii): the addition operation ${+: K \times K \rightarrow K}$, while still continuous, can fail to be measurable as a map from ${(K \times K, {\mathcal B}(K) \otimes {\mathcal B}(K))}$ to ${(K, {\mathcal B}(K))}$! Thus for instance the sum of two measurable functions ${F: X \rightarrow K}$ need not remain measurable, which makes even the very definition of a measurable cocycle or measurable coboundary problematic (or at least unnatural). This phenomenon is known as the Nedoma pathology. A standard example arises when ${K}$ is the uncountable torus ${{\mathbf T}^{{\bf R}}}$, endowed with the product topology. Crucially, the Borel ${\sigma}$-algebra ${{\mathcal B}(K)}$ generated by this uncountable product is not the product ${{\mathcal B}(\mathbf{T})^{\otimes {\bf R}}}$ of the factor Borel ${\sigma}$-algebras (the discrepancy ultimately arises from the fact that topologies permit uncountable unions, but ${\sigma}$-algebras do not); relating to this, the product ${\sigma}$-algebra ${{\mathcal B}(K) \otimes {\mathcal B}(K)}$ is not the same as the Borel ${\sigma}$-algebra ${{\mathcal B}(K \times K)}$, but is instead a strict sub-algebra. If the group operations on ${K}$ were measurable, then the diagonal set

$\displaystyle K^\Delta := \{ (k,k') \in K \times K: k = k' \} = \{ (k,k') \in K \times K: k - k' = 0 \}$

would be measurable in ${{\mathcal B}(K) \otimes {\mathcal B}(K)}$. But it is an easy exercise in manipulation of ${\sigma}$-algebras to show that if ${(X, {\mathcal X}), (Y, {\mathcal Y})}$ are any two measurable spaces and ${E \subset X \times Y}$ is measurable in ${{\mathcal X} \otimes {\mathcal Y}}$, then the fibres ${E_x := \{ y \in Y: (x,y) \in E \}}$ of ${E}$ are contained in some countably generated subalgebra of ${{\mathcal Y}}$. Thus if ${K^\Delta}$ were ${{\mathcal B}(K) \otimes {\mathcal B}(K)}$-measurable, then all the points of ${K}$ would lie in a single countably generated ${\sigma}$-algebra. But the cardinality of such an algebra is at most ${2^{\alpha_0}}$ while the cardinality of ${K}$ is ${2^{2^{\alpha_0}}}$, and Cantor’s theorem then gives a contradiction.

To resolve this problem, we give ${K}$ a coarser ${\sigma}$-algebra than the Borel ${\sigma}$-algebra, which we call the reduced ${\sigma}$-algebra ${{\mathcal B}^\otimes(K)}$, thus coarsening the measurable space structure on ${K = (K,{\mathcal B}(K))}$ to a new measurable space ${K_\otimes := (K, {\mathcal B}^\otimes(K))}$. In the case of compact Hausdorff abelian groups, ${{\mathcal B}^{\otimes}(K)}$ can be defined as the ${\sigma}$-algebra generated by the characters ${\hat k: K \rightarrow {\mathbf T}}$; for more general compact abelian groups, one can define ${{\mathcal B}^{\otimes}(K)}$ as the ${\sigma}$-algebra generated by all continuous maps into metric spaces. This ${\sigma}$-algebra is equal to ${{\mathcal B}(K)}$ when ${K}$ is metrisable but can be smaller for other ${K}$. With this measurable structure, ${K_\otimes}$ becomes a measurable group; it seems that once one leaves the metrisable world that ${K_\otimes}$ is a superior (or at least equally good) space to work with than ${K}$ for analysis, as it avoids the Nedoma pathology. (For instance, from Plancherel’s theorem, we see that if ${m_K}$ is the Haar probability measure on ${K}$, then ${L^2(K,m_K) = L^2(K_\otimes,m_K)}$ (thus, every ${K}$-measurable set is equivalent modulo ${m_K}$-null sets to a ${K_\otimes}$-measurable set), so there is no damage to Plancherel caused by passing to the reduced ${\sigma}$-algebra.

Passing to the reduced ${\sigma}$-algebra ${K_\otimes}$ fixes the most severe problems with an uncountable Moore-Schmidt theorem, but one is still faced with an issue of having to potentially take an uncountable union of null sets. To avoid this sort of problem, we pass to the framework of abstract measure theory, in which we remove explicit mention of “points” and can easily delete all null sets at a very early stage of the formalism. In this setup, the category of concrete measurable spaces is replaced with the larger category of abstract measurable spaces, which we formally define as the opposite category of the category of ${\sigma}$-algebras (with Boolean algebra homomorphisms). Thus, we define an abstract measurable space to be an object of the form ${{\mathcal X}^{\mathrm{op}}}$, where ${{\mathcal X}}$ is an (abstract) ${\sigma}$-algebra and ${\mathrm{op}}$ is a formal placeholder symbol that signifies use of the opposite category, and an abstract measurable map ${T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}}$ is an object of the form ${(T^*)^{\mathrm{op}}}$, where ${T^*: {\mathcal Y} \rightarrow {\mathcal X}}$ is a Boolean algebra homomorphism and ${\mathrm{op}}$ is again used as a formal placeholder; we call ${T^*}$ the pullback map associated to ${T}$.  [UPDATE: It turns out that this definition of a measurable map led to technical issues.  In a forthcoming revision of the paper we also impose the requirement that the abstract measurable map be $\sigma$-complete (i.e., it respects countable joins).] The composition ${S \circ T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}}$ of two abstract measurable maps ${T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}}$, ${S: {\mathcal Y}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}}$ is defined by the formula ${S \circ T := (T^* \circ S^*)^{\mathrm{op}}}$, or equivalently ${(S \circ T)^* = T^* \circ S^*}$.

Every concrete measurable space ${X = (X,{\mathcal X})}$ can be identified with an abstract counterpart ${{\mathcal X}^{op}}$, and similarly every concrete measurable map ${T: X \rightarrow Y}$ can be identified with an abstract counterpart ${(T^*)^{op}}$, where ${T^*: {\mathcal Y} \rightarrow {\mathcal X}}$ is the pullback map ${T^* E := T^{-1}(E)}$. Thus the category of concrete measurable spaces can be viewed as a subcategory of the category of abstract measurable spaces. The advantage of working in the abstract setting is that it gives us access to more spaces that could not be directly defined in the concrete setting. Most importantly for us, we have a new abstract space, the opposite measure algebra ${X_\mu}$ of ${X}$, defined as ${({\bf X}/{\bf N})^*}$ where ${{\bf N}}$ is the ideal of null sets in ${{\bf X}}$. Informally, ${X_\mu}$ is the space ${X}$ with all the null sets removed; there is a canonical abstract embedding map ${\iota: X_\mu \rightarrow X}$, which allows one to convert any concrete measurable map ${f: X \rightarrow Y}$ into an abstract one ${[f]: X_\mu \rightarrow Y}$. One can then define the notion of an abstract action, abstract cocycle, and abstract coboundary by replacing every occurrence of the category of concrete measurable spaces with their abstract counterparts, and replacing ${X}$ with the opposite measure algebra ${X_\mu}$; see the paper for details. Our main theorem is then

Theorem 2 (Uncountable Moore-Schmidt theorem) Let ${\Gamma}$ be a discrete group acting abstractly on a ${\sigma}$-finite measure space ${X}$. Let ${K}$ be a compact Hausdorff abelian group. Then a ${K_\otimes}$-valued abstract measurable cocycle ${\rho = (\rho_\gamma)_{\gamma \in \Gamma}}$ is an abstract coboundary if and only if for each character ${\hat k \in \hat K}$, the ${\mathbf{T}}$-valued cocycles ${\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}}$ are abstract coboundaries.

With the abstract formalism, the proof of the uncountable Moore-Schmidt theorem is almost identical to the countable one (in fact we were able to make some simplifications, such as avoiding the use of the ergodic decomposition). A key tool is what we call a “conditional Pontryagin duality” theorem, which asserts that if one has an abstract measurable map ${\alpha_{\hat k}: X_\mu \rightarrow {\bf T}}$ for each ${\hat k \in K}$ obeying the identity ${ \alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2} = 0}$ for all ${\hat k_1,\hat k_2 \in \hat K}$, then there is an abstract measurable map ${F: X_\mu \rightarrow K_\otimes}$ such that ${\alpha_{\hat k} = \langle \hat k, F \rangle}$ for all ${\hat k \in \hat K}$. This is derived from the usual Pontryagin duality and some other tools, most notably the completeness of the ${\sigma}$-algebra of ${X_\mu}$, and the Sikorski extension theorem.

We feel that it is natural to stay within the abstract measure theory formalism whenever dealing with uncountable situations. However, it is still an interesting question as to when one can guarantee that the abstract objects constructed in this formalism are representable by concrete analogues. The basic questions in this regard are:

• (i) Suppose one has an abstract measurable map ${f: X_\mu \rightarrow Y}$ into a concrete measurable space. Does there exist a representation of ${f}$ by a concrete measurable map ${\tilde f: X \rightarrow Y}$? Is it unique up to almost everywhere equivalence?
• (ii) Suppose one has a concrete cocycle that is an abstract coboundary. When can it be represented by a concrete coboundary?

For (i) the answer is somewhat interesting (as I learned after posing this MathOverflow question):

• If ${Y}$ does not separate points, or is not compact metrisable or Polish, there can be counterexamples to uniqueness. If ${Y}$ is not compact or Polish, there can be counterexamples to existence.
• If ${Y}$ is a compact metric space or a Polish space, then one always has existence and uniqueness.
• If ${Y}$ is a compact Hausdorff abelian group, one always has existence.
• If ${X}$ is a complete measure space, then one always has existence (from a theorem of Maharam).
• If ${X}$ is the unit interval with the Borel ${\sigma}$-algebra and Lebesgue measure, then one has existence for all compact Hausdorff ${Y}$ assuming the continuum hypothesis (from a theorem of von Neumann) but existence can fail under other extensions of ZFC (from a theorem of Shelah, using the method of forcing).
• For more general ${X}$, existence for all compact Hausdorff ${Y}$ is equivalent to the existence of a lifting from the ${\sigma}$-algebra ${\mathcal{X}/\mathcal{N}}$ to ${\mathcal{X}}$ (or, in the language of abstract measurable spaces, the existence of an abstract retraction from ${X}$ to ${X_\mu}$).
• It is a long-standing open question (posed for instance by Fremlin) whether it is relatively consistent with ZFC that existence holds whenever ${Y}$ is compact Hausdorff.

Our understanding of (ii) is much less complete:

• If ${K}$ is metrisable, the answer is “always” (which among other things establishes the countable Moore-Schmidt theorem as a corollary of the uncountable one).
• If ${\Gamma}$ is at most countable and ${X}$ is a complete measure space, then the answer is again “always”.

In view of the answers to (i), I would not be surprised if the full answer to (ii) was also sensitive to axioms of set theory. However, such set theoretic issues seem to be almost completely avoided if one sticks with the abstract formalism throughout; they only arise when trying to pass back and forth between the abstract and concrete categories.

In these notes we presume familiarity with the basic concepts of probability theory, such as random variables (which could take values in the reals, vectors, or other measurable spaces), probability, and expectation. Much of this theory is in turn based on measure theory, which we will also presume familiarity with. See for instance this previous set of lecture notes for a brief review.

The basic objects of study in analytic number theory are deterministic; there is nothing inherently random about the set of prime numbers, for instance. Despite this, one can still interpret many of the averages encountered in analytic number theory in probabilistic terms, by introducing random variables into the subject. Consider for instance the form

$\displaystyle \sum_{n \leq x} \mu(n) = o(x) \ \ \ \ \ (1)$

of the prime number theorem (where we take the limit ${x \rightarrow \infty}$). One can interpret this estimate probabilistically as

$\displaystyle {\mathbb E} \mu(\mathbf{n}) = o(1) \ \ \ \ \ (2)$

where ${\mathbf{n} = \mathbf{n}_{\leq x}}$ is a random variable drawn uniformly from the natural numbers up to ${x}$, and ${{\mathbb E}}$ denotes the expectation. (In this set of notes we will use boldface symbols to denote random variables, and non-boldface symbols for deterministic objects.) By itself, such an interpretation is little more than a change of notation. However, the power of this interpretation becomes more apparent when one then imports concepts from probability theory (together with all their attendant intuitions and tools), such as independence, conditioning, stationarity, total variation distance, and entropy. For instance, suppose we want to use the prime number theorem (1) to make a prediction for the sum

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1).$

After dividing by ${x}$, this is essentially

$\displaystyle {\mathbb E} \mu(\mathbf{n}) \mu(\mathbf{n}+1).$

With probabilistic intuition, one may expect the random variables ${\mu(\mathbf{n}), \mu(\mathbf{n}+1)}$ to be approximately independent (there is no obvious relationship between the number of prime factors of ${\mathbf{n}}$, and of ${\mathbf{n}+1}$), and so the above average would be expected to be approximately equal to

$\displaystyle ({\mathbb E} \mu(\mathbf{n})) ({\mathbb E} \mu(\mathbf{n}+1))$

which by (2) is equal to ${o(1)}$. Thus we are led to the prediction

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1) = o(x). \ \ \ \ \ (3)$

The asymptotic (3) is widely believed (it is a special case of the Chowla conjecture, which we will discuss in later notes; while there has been recent progress towards establishing it rigorously, it remains open for now.

How would one try to make these probabilistic intuitions more rigorous? The first thing one needs to do is find a more quantitative measurement of what it means for two random variables to be “approximately” independent. There are several candidates for such measurements, but we will focus in these notes on two particularly convenient measures of approximate independence: the “${L^2}$” measure of independence known as covariance, and the “${L \log L}$” measure of independence known as mutual information (actually we will usually need the more general notion of conditional mutual information that measures conditional independence). The use of ${L^2}$ type methods in analytic number theory is well established, though it is usually not described in probabilistic terms, being referred to instead by such names as the “second moment method”, the “large sieve” or the “method of bilinear sums”. The use of ${L \log L}$ methods (or “entropy methods”) is much more recent, and has been able to control certain types of averages in analytic number theory that were out of reach of previous methods such as ${L^2}$ methods. For instance, in later notes we will use entropy methods to establish the logarithmically averaged version

$\displaystyle \sum_{n \leq x} \frac{\mu(n) \mu(n+1)}{n} = o(\log x) \ \ \ \ \ (4)$

of (3), which is implied by (3) but strictly weaker (much as the prime number theorem (1) implies the bound ${\sum_{n \leq x} \frac{\mu(n)}{n} = o(\log x)}$, but the latter bound is much easier to establish than the former).

As with many other situations in analytic number theory, we can exploit the fact that certain assertions (such as approximate independence) can become significantly easier to prove if one only seeks to establish them on average, rather than uniformly. For instance, given two random variables ${\mathbf{X}}$ and ${\mathbf{Y}}$ of number-theoretic origin (such as the random variables ${\mu(\mathbf{n})}$ and ${\mu(\mathbf{n}+1)}$ mentioned previously), it can often be extremely difficult to determine the extent to which ${\mathbf{X},\mathbf{Y}}$ behave “independently” (or “conditionally independently”). However, thanks to second moment tools or entropy based tools, it is often possible to assert results of the following flavour: if ${\mathbf{Y}_1,\dots,\mathbf{Y}_k}$ are a large collection of “independent” random variables, and ${\mathbf{X}}$ is a further random variable that is “not too large” in some sense, then ${\mathbf{X}}$ must necessarily be nearly independent (or conditionally independent) to many of the ${\mathbf{Y}_i}$, even if one cannot pinpoint precisely which of the ${\mathbf{Y}_i}$ the variable ${\mathbf{X}}$ is independent with. In the case of the second moment method, this allows us to compute correlations such as ${{\mathbb E} {\mathbf X} \mathbf{Y}_i}$ for “most” ${i}$. The entropy method gives bounds that are significantly weaker quantitatively than the second moment method (and in particular, in its current incarnation at least it is only able to say non-trivial assertions involving interactions with residue classes at small primes), but can control significantly more general quantities ${{\mathbb E} F( {\mathbf X}, \mathbf{Y}_i )}$ for “most” ${i}$ thanks to tools such as the Pinsker inequality.

I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values“, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the ${3x+1}$ conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let ${{\bf N}+1 = \{1,2,\dots\}}$ denote the positive integers (with ${{\bf N} =\{0,1,2,\dots\}}$ the natural numbers), and let ${\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1}$ be the map defined by setting ${\mathrm{Col}(N)}$ equal to ${3N+1}$ when ${N}$ is odd and ${N/2}$ when ${N}$ is even. Let ${\mathrm{Col}_{\min}(N) := \inf_{n \in {\bf N}} \mathrm{Col}^n(N)}$ be the minimal element of the Collatz orbit ${N, \mathrm{Col}(N), \mathrm{Col}^2(N),\dots}$. Then we have

Conjecture 1 (Collatz conjecture) One has ${\mathrm{Col}_{\min}(N)=1}$ for all ${N \in {\bf N}+1}$.

Establishing the conjecture for all ${N}$ remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” ${N}$ in some sense. For instance, it is a result of Krasikov and Lagarias that

$\displaystyle \{ N \leq x: \mathrm{Col}_{\min}(N) = 1 \} \gg x^{0.84}$

for all sufficiently large ${x}$. In another direction, it was shown by Terras that for almost all ${N}$ (in the sense of natural density), one has ${\mathrm{Col}_{\min}(N) < N}$. This was then improved by Allouche to ${\mathrm{Col}_{\min}(N) 0.869}$, and extended later by Korec to cover all ${\theta > \frac{\log 3}{\log 4} \approx 0.7924}$. In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2 Let ${f: {\bf N}+1 \rightarrow {\bf R}}$ be any function with ${\lim_{N \rightarrow \infty} f(N) = +\infty}$. Then we have ${\mathrm{Col}_{\min}(N) < f(N)}$ for almost all ${N}$ (in the sense of logarithmic density).

Thus for instance one has ${\mathrm{Col}_{\min}(N) < \log\log\log\log N}$ for almost all ${N}$ (in the sense of logarithmic density).

The difficulty here is one usually only expects to establish “local-in-time” results that control the evolution ${\mathrm{Col}^n(N)}$ for times ${n}$ that only get as large as a small multiple ${c \log N}$ of ${\log N}$; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get ${\mathrm{Col}^n(N)}$ all the way down to ${f(N)}$ one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state ${N=O(1)}$.

However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an invariant measure for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if ${\theta > \frac{\log 3}{\log 4}}$ one picks at random an integer ${N}$ from a large interval ${[1,x]}$, then in most cases, the orbit of ${N}$ will eventually move into the interval ${[1,x^{\theta}]}$. Similarly, if one picks an integer ${M}$ at random from ${[1,x^\theta]}$, then in most cases, the orbit of ${M}$ will eventually move into ${[1,x^{\theta^2}]}$. It is then tempting to concatenate the two statements and conclude that for most ${N}$ in ${[1,x]}$, the orbit will eventually move ${[1,x^{\theta^2}]}$. Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn ${N \in [1,x]}$ reaches ${[1,x^\theta]}$, the distribution of the final value is unlikely to be close to being uniformly distributed on ${[1,x^\theta]}$, and in particular could potentially concentrate almost entirely in the exceptional set of ${M \in [1,x^\theta]}$ that do not make it into ${[1,x^{\theta^2}]}$. The point here is the uniform measure on ${[1,x]}$ is not transported by Collatz dynamics to anything resembling the uniform measure on ${[1,x^\theta]}$.

So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the Syracuse map ${\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1}$, defined on the odd numbers ${2{\bf N}+1 = \{1,3,5,\dots\}}$ by setting ${\mathrm{Syr}(N) = (3N+1)/2^a}$, where ${2^a}$ is the largest power of ${2}$ that divides ${3N+1}$. (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication of ${3}$ at each iteration step, which makes the map better behaved when performing “${3}$-adic” analysis.)

When viewed ${3}$-adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, ${\mathrm{Syr}(N)}$ is never divisible by ${3}$. A little less obviously, ${\mathrm{Syr}(N)}$ is twice as likely to equal ${2}$ mod ${3}$ as it is to equal ${1}$ mod ${3}$. This is because for a randomly chosen odd ${\mathbf{N}}$, the number of times ${\mathbf{a}}$ that ${2}$ divides ${3\mathbf{N}+1}$ can be seen to have a geometric distribution of mean ${2}$ – it equals any given value ${a \in{\bf N}+1}$ with probability ${2^{-a}}$. Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of ${3}$. For instance, one can compute that for large random odd ${\mathbf{N}}$, ${\mathrm{Syr}^2(\mathbf{N}) \hbox{ mod } 9}$ will take the residue classes ${0,1,2,3,4,5,6,7,8 \hbox{ mod } 9}$ with probabilities

$\displaystyle 0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}$

respectively. More generally, for any ${n}$, ${\mathrm{Syr}^n(N) \hbox{ mod } 3^n}$ will be distributed according to the law of a random variable ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$ on ${{\bf Z}/3^n{\bf Z}}$ that we call a Syracuse random variable, and can be described explicitly as

$\displaystyle \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = 2^{-\mathbf{a}_1} + 3^1 2^{-\mathbf{a}_1-\mathbf{a}_2} + \dots + 3^{n-1} 2^{-\mathbf{a}_1-\dots-\mathbf{a}_n} \hbox{ mod } 3^n, \ \ \ \ \ (1)$

where ${\mathbf{a}_1,\dots,\mathbf{a}_n}$ are iid copies of a geometric random variable of mean ${2}$.

In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this ${3}$-adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$ to construct such a measure, but only if these random variables stabilise in the limit ${n \rightarrow \infty}$ in a certain total variation sense. More precisely, in the paper we establish the estimate

$\displaystyle \sum_{Y \in {\bf Z}/3^n{\bf Z}} | \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z})=Y) - 3^{m-n} \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^m{\bf Z})=Y \hbox{ mod } 3^m)| \ \ \ \ \ (2)$

$\displaystyle \ll_A m^{-A}$

for any ${1 \leq m \leq n}$ and any ${A > 0}$. This type of stabilisation is plausible from entropy heuristics – the tuple ${(\mathbf{a}_1,\dots,\mathbf{a}_n)}$ of geometric random variables that generates ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$ has Shannon entropy ${n \log 4}$, which is significantly larger than the total entropy ${n \log 3}$ of the uniform distribution on ${{\bf Z}/3^n{\bf Z}}$, so we expect a lot of “mixing” and “collision” to occur when converting the tuple ${(\mathbf{a}_1,\dots,\mathbf{a}_n)}$ to ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$; these heuristics can be supported by numerics (which I was able to work out up to about ${n=10}$ before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.

A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

$\displaystyle 2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n}$

are all distinct as ${(a_1,\dots,a_n)}$ vary over tuples in ${({\bf N}+1)^n}$. Unfortunately, the process of reducing mod ${3^n}$ creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

$\displaystyle 2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n \ \ \ \ \ (3)$

are mostly distinct for “typical” ${a_1,\dots,a_m}$ (as drawn using the geometric distribution) as long as ${m}$ is a bit smaller than ${\frac{\log 3}{\log 4} n}$ (basically because the rational number appearing in (3) then typically takes a form like ${M/2^{2m}}$ with ${M}$ an integer between ${0}$ and ${3^n}$). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on ${ \mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$ (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of ${{\bf Z}/3^n{\bf Z}}$ of density less than ${n^{-C}}$ for some large absolute constant ${C>0}$). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$, and more precisely in showing that

$\displaystyle \mathbb{E} e^{-2\pi i \xi \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) / 3^n} \ll_A n^{-A} \ \ \ \ \ (4)$

for any ${A > 0}$ and any ${\xi \in {\bf Z}/3^n{\bf Z}}$ that is not divisible by ${3}$.

If the random variable (1) was the sum of independent terms, one could express this characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic factor does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming ${n}$ is even for sake of discussion) as

$\displaystyle (2^{\mathbf{a}_2} + 3) 2^{-\mathbf{b}_1} + (2^{\mathbf{a}_4}+3) 3^2 2^{-\mathbf{b}_1-\mathbf{b}_2} + \dots$

$\displaystyle + (2^{\mathbf{a}_n}+3) 3^{n-2} 2^{-\mathbf{b}_1-\dots-\mathbf{b}_{n/2}} \hbox{ mod } 3^n$

where ${\mathbf{b}_j := \mathbf{a}_{2j-1} + \mathbf{a}_{2j}}$. The point here is that after conditioning on the ${\mathbf{b}_1,\dots,\mathbf{b}_{n/2}}$ to be fixed, the random variables ${\mathbf{a}_2, \mathbf{a}_4,\dots,\mathbf{a}_n}$ remain independent (though the distribution of each ${\mathbf{a}_{2j}}$ depends on the value that we conditioned ${\mathbf{b}_j}$ to), and so the above expression is a conditional sum of independent random variables. This lets one express the characeteristic function of (1) as an averaged Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

$\displaystyle \frac{\xi 3^{2j-2} (2^{-\mathbf{b}_1-\dots-\mathbf{b}_j+1} \mod 3^n)}{3^n}$

is not close to an integer for a moderately large number (${\gg A \log n}$, to be precise) of indices ${j = 1,\dots,n/2}$. (Actually, for technical reasons we have to also restrict to those ${j}$ for which ${\mathbf{b}_j=3}$, but let us ignore this detail here.) To put it another way, if we let ${B}$ denote the set of pairs ${(j,l)}$ for which

$\displaystyle \frac{\xi 3^{2j-2} (2^{-l+1} \mod 3^n)}{3^n} \in [-\varepsilon,\varepsilon] + {\bf Z},$

we have to show that (with overwhelming probability) the random walk

$\displaystyle (1,\mathbf{b}_1), (2, \mathbf{b}_1 + \mathbf{b}_2), \dots, (n/2, \mathbf{b}_1+\dots+\mathbf{b}_{n/2})$

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of ${B}$.

A little bit of elementary number theory and combinatorics allows one to describe the set ${B}$ as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of ${B}$ after passing through any given such triangle, and it then becomes relatively easy to then show that the renewal process usually has the required number of points outside of ${B}$. The most difficult case is when the renewal process passes through a particularly large triangle in ${B}$. However, it turns out that large triangles enjoy particularly good separation properties, and in particular afer passing through a large triangle one is likely to only encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of ${B}$ that one can finish the proof of (4), and thus Theorem 2.

In the fall quarter (starting Sep 27) I will be teaching a graduate course on analytic prime number theory.  This will be similar to a graduate course I taught in 2015, and in particular will reuse several of the lecture notes from that course, though it will also incorporate some new material (and omit some material covered in the previous course, to compensate).  I anticipate covering the following topics:

1. Elementary multiplicative number theory
2. Complex-analytic multiplicative number theory
3. The entropy decrement argument
4. Bounds for exponential sums
5. Zero density theorems
6. Halasz’s theorem and the Matomaki-Radziwill theorem
7. The circle method
8. (If time permits) Chowla’s conjecture and the Erdos discrepancy problem

Lecture notes for topics 3, 6, and 8 will be forthcoming.

William Banks, Kevin Ford, and I have just uploaded to the arXiv our paper “Large prime gaps and probabilistic models“. In this paper we introduce a random model to help understand the connection between two well known conjectures regarding the primes ${{\mathcal P} := \{2,3,5,\dots\}}$, the Cramér conjecture and the Hardy-Littlewood conjecture:

Conjecture 1 (Cramér conjecture) If ${x}$ is a large number, then the largest prime gap ${G_{\mathcal P}(x) := \sup_{p_n, p_{n+1} \leq x} p_{n+1}-p_n}$ in ${[1,x]}$ is of size ${\asymp \log^2 x}$. (Granville refines this conjecture to ${\gtrsim \xi \log^2 x}$, where ${\xi := 2e^{-\gamma} = 1.1229\dots}$. Here we use the asymptotic notation ${X \gtrsim Y}$ for ${X \geq (1-o(1)) Y}$, ${X \sim Y}$ for ${X \gtrsim Y \gtrsim X}$, ${X \gg Y}$ for ${X \geq C^{-1} Y}$, and ${X \asymp Y}$ for ${X \gg Y \gg X}$.)

Conjecture 2 (Hardy-Littlewood conjecture) If ${\mathcal{H} := \{h_1,\dots,h_k\}}$ are fixed distinct integers, then the number of numbers ${n \in [1,x]}$ with ${n+h_1,\dots,n+h_k}$ all prime is ${({\mathfrak S}(\mathcal{H}) +o(1)) \int_2^x \frac{dt}{\log^k t}}$ as ${x \rightarrow \infty}$, where the singular series ${{\mathfrak S}(\mathcal{H})}$ is defined by the formula

$\displaystyle {\mathfrak S}(\mathcal{H}) := \prod_p \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p}\right) (1-\frac{1}{p})^{-k}.$

(One can view these conjectures as modern versions of two of the classical Landau problems, namely Legendre’s conjecture and the twin prime conjecture respectively.)

A well known connection between the Hardy-Littlewood conjecture and prime gaps was made by Gallagher. Among other things, Gallagher showed that if the Hardy-Littlewood conjecture was true, then the prime gaps ${p_{n+1}-p_n}$ with ${n \leq x}$ were asymptotically distributed according to an exponential distribution of mean ${\log x}$, in the sense that

$\displaystyle | \{ n: p_n \leq x, p_{n+1}-p_n \geq \lambda \log x \}| = (e^{-\lambda}+o(1)) \frac{x}{\log x} \ \ \ \ \ (1)$

as ${x \rightarrow \infty}$ for any fixed ${\lambda \geq 0}$. Roughly speaking, the way this is established is by using the Hardy-Littlewood conjecture to control the mean values of ${\binom{|{\mathcal P} \cap (p_n, p_n + \lambda \log x)|}{k}}$ for fixed ${k,\lambda}$, where ${p_n}$ ranges over the primes in ${[1,x]}$. The relevance of these quantities arises from the Bonferroni inequalities (or “Brun pure sieve“), which can be formulated as the assertion that

$\displaystyle 1_{N=0} \leq \sum_{k=0}^K (-1)^k \binom{N}{k}$

when ${K}$ is even and

$\displaystyle 1_{N=0} \geq \sum_{k=0}^K (-1)^k \binom{N}{k}$

when ${K}$ is odd, for any natural number ${N}$; setting ${N := |{\mathcal P} \cap (p_n, p_n + \lambda \log x)|}$ and taking means, one then gets upper and lower bounds for the probability that the interval ${(p_n, p_n + \lambda \log x)}$ is free of primes. The most difficult step is to control the mean values of the singular series ${{\mathfrak S}(\mathcal{H})}$ as ${{\mathcal H}}$ ranges over ${k}$-tuples in a fixed interval such as ${[0, \lambda \log x]}$.

Heuristically, if one extrapolates the asymptotic (1) to the regime ${\lambda \asymp \log x}$, one is then led to Cramér’s conjecture, since the right-hand side of (1) falls below ${1}$ when ${\lambda}$ is significantly larger than ${\log x}$. However, this is not a rigorous derivation of Cramér’s conjecture from the Hardy-Littlewood conjecture, since Gallagher’s computations only establish (1) for fixed choices of ${\lambda}$, which is only enough to establish the far weaker bound ${G_{\mathcal P}(x) / \log x \rightarrow \infty}$, which was already known (see this previous paper for a discussion of the best known unconditional lower bounds on ${G_{\mathcal P}(x)}$). An inspection of the argument shows that if one wished to extend (1) to parameter choices ${\lambda}$ that were allowed to grow with ${x}$, then one would need as input a stronger version of the Hardy-Littlewood conjecture in which the length ${k}$ of the tuple ${{\mathcal H} = (h_1,\dots,h_k)}$, as well as the magnitudes of the shifts ${h_1,\dots,h_k}$, were also allowed to grow with ${x}$. Our initial objective in this project was then to quantify exactly what strengthening of the Hardy-Littlewood conjecture would be needed to rigorously imply Cramer’s conjecture. The precise results are technical, but roughly we show results of the following form:

Theorem 3 (Large gaps from Hardy-Littlewood, rough statement)

• If the Hardy-Littlewood conjecture is uniformly true for ${k}$-tuples of length ${k \ll \frac{\log x}{\log\log x}}$, and with shifts ${h_1,\dots,h_k}$ of size ${O( \log^2 x )}$, with a power savings in the error term, then ${G_{\mathcal P}(x) \gg \frac{\log^2 x}{\log\log x}}$.
• If the Hardy-Littlewood conjecture is “true on average” for ${k}$-tuples of length ${k \ll \frac{y}{\log x}}$ and shifts ${h_1,\dots,h_k}$ of size ${y}$ for all ${\log x \leq y \leq \log^2 x \log\log x}$, with a power savings in the error term, then ${G_{\mathcal P}(x) \gg \log^2 x}$.

In particular, we can recover Cramer’s conjecture given a sufficiently powerful version of the Hardy-Littlewood conjecture “on the average”.

Our proof of this theorem proceeds more or less along the same lines as Gallagher’s calculation, but now with ${k}$ allowed to grow slowly with ${x}$. Again, the main difficulty is to accurately estimate average values of the singular series ${{\mathfrak S}({\mathfrak H})}$. Here we found it useful to switch to a probabilistic interpretation of this series. For technical reasons it is convenient to work with a truncated, unnormalised version

$\displaystyle V_{\mathcal H}(z) := \prod_{p \leq z} \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p} \right)$

of the singular series, for a suitable cutoff ${z}$; it turns out that when studying prime tuples of size ${t}$, the most convenient cutoff ${z(t)}$ is the “Pólya magic cutoff“, defined as the largest prime for which

$\displaystyle \prod_{p \leq z(t)}(1-\frac{1}{p}) \geq \frac{1}{\log t} \ \ \ \ \ (2)$

(this is well defined for ${t \geq e^2}$); by Mertens’ theorem, we have ${z(t) \sim t^{1/e^\gamma}}$. One can interpret ${V_{\mathcal Z}(z)}$ probabilistically as

$\displaystyle V_{\mathcal Z}(z) = \mathbf{P}( {\mathcal H} \subset \mathcal{S}_z )$

where ${\mathcal{S}_z \subset {\bf Z}}$ is the randomly sifted set of integers formed by removing one residue class ${a_p \hbox{ mod } p}$ uniformly at random for each prime ${p \leq z}$. The Hardy-Littlewood conjecture can be viewed as an assertion that the primes ${{\mathcal P}}$ behave in some approximate statistical sense like the random sifted set ${\mathcal{S}_z}$, and one can prove the above theorem by using the Bonferroni inequalities both for the primes ${{\mathcal P}}$ and for the random sifted set, and comparing the two (using an even ${K}$ for the sifted set and an odd ${K}$ for the primes in order to be able to combine the two together to get a useful bound).

The proof of Theorem 3 ended up not using any properties of the set of primes ${{\mathcal P}}$ other than that this set obeyed some form of the Hardy-Littlewood conjectures; the theorem remains true (with suitable notational changes) if this set were replaced by any other set. In order to convince ourselves that our theorem was not vacuous due to our version of the Hardy-Littlewood conjecture being too strong to be true, we then started exploring the question of coming up with random models of ${{\mathcal P}}$ which obeyed various versions of the Hardy-Littlewood and Cramér conjectures.

This line of inquiry was started by Cramér, who introduced what we now call the Cramér random model ${{\mathcal C}}$ of the primes, in which each natural number ${n \geq 3}$ is selected for membership in ${{\mathcal C}}$ with an independent probability of ${1/\log n}$. This model matches the primes well in some respects; for instance, it almost surely obeys the “Riemann hypothesis”

$\displaystyle | {\mathcal C} \cap [1,x] | = \int_2^x \frac{dt}{\log t} + O( x^{1/2+o(1)})$

and Cramér also showed that the largest gap ${G_{\mathcal C}(x)}$ was almost surely ${\sim \log^2 x}$. On the other hand, it does not obey the Hardy-Littlewood conjecture; more precisely, it obeys a simplified variant of that conjecture in which the singular series ${{\mathfrak S}({\mathcal H})}$ is absent.

Granville proposed a refinement ${{\mathcal G}}$ to Cramér’s random model ${{\mathcal C}}$ in which one first sieves out (in each dyadic interval ${[x,2x]}$) all residue classes ${0 \hbox{ mod } p}$ for ${p \leq A}$ for a certain threshold ${A = \log^{1-o(1)} x = o(\log x)}$, and then places each surviving natural number ${n}$ in ${{\mathcal G}}$ with an independent probability ${\frac{1}{\log n} \prod_{p \leq A} (1-\frac{1}{p})^{-1}}$. One can verify that this model obeys the Hardy-Littlewood conjectures, and Granville showed that the largest gap ${G_{\mathcal G}(x)}$ in this model was almost surely ${\gtrsim \xi \log^2 x}$, leading to his conjecture that this bound also was true for the primes. (Interestingly, this conjecture is not yet borne out by numerics; calculations of prime gaps up to ${10^{18}}$, for instance, have shown that ${G_{\mathcal P}(x)}$ never exceeds ${0.9206 \log^2 x}$ in this range. This is not necessarily a conflict, however; Granville’s analysis relies on inspecting gaps in an extremely sparse region of natural numbers that are more devoid of primes than average, and this region is not well explored by existing numerics. See this previous blog post for more discussion of Granville’s argument.)

However, Granville’s model does not produce a power savings in the error term of the Hardy-Littlewood conjectures, mostly due to the need to truncate the singular series at the logarithmic cutoff ${A}$. After some experimentation, we were able to produce a tractable random model ${{\mathcal R}}$ for the primes which obeyed the Hardy-Littlewood conjectures with power savings, and which reproduced Granville’s gap prediction of ${\gtrsim \xi \log^2 x}$ (we also get an upper bound of ${\lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x}}$ for both models, though we expect the lower bound to be closer to the truth); to us, this strengthens the case for Granville’s version of Cramér’s conjecture. The model can be described as follows. We select one residue class ${a_p \hbox{ mod } p}$ uniformly at random for each prime ${p}$, and as before we let ${S_z}$ be the sifted set of integers formed by deleting the residue classes ${a_p \hbox{ mod } p}$ with ${p \leq z}$. We then set

$\displaystyle {\mathcal R} := \{ n \geq e^2: n \in S_{z(t)}\}$

with ${z(t)}$ Pólya’s magic cutoff (this is the cutoff that gives ${{\mathcal R}}$ a density consistent with the prime number theorem or the Riemann hypothesis). As stated above, we are able to show that almost surely one has

$\displaystyle \xi \log^2 x \lesssim {\mathcal G}_{\mathcal R}(x) \lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x} \ \ \ \ \ (3)$

and that the Hardy-Littlewood conjectures hold with power savings for ${k}$ up to ${\log^c x}$ for any fixed ${c < 1}$ and for shifts ${h_1,\dots,h_k}$ of size ${O(\log^c x)}$. This is unfortunately a tiny bit weaker than what Theorem 3 requires (which more or less corresponds to the endpoint ${c=1}$), although there is a variant of Theorem 3 that can use this input to produce a lower bound on gaps in the model ${{\mathcal R}}$ (but it is weaker than the one in (3)). In fact we prove a more precise almost sure asymptotic formula for ${{\mathcal G}_{\mathcal R}(x) }$ that involves the optimal bounds for the linear sieve (or interval sieve), in which one deletes one residue class modulo ${p}$ from an interval ${[0,y]}$ for all primes ${p}$ up to a given threshold. The lower bound in (3) relates to the case of deleting the ${0 \hbox{ mod } p}$ residue classes from ${[0,y]}$; the upper bound comes from the delicate analysis of the linear sieve by Iwaniec. Improving on either of the two bounds looks to be quite a difficult problem.

The probabilistic analysis of ${{\mathcal R}}$ is somewhat more complicated than of ${{\mathcal C}}$ or ${{\mathcal G}}$ as there is now non-trivial coupling between the events ${n \in {\mathcal R}}$ as ${n}$ varies, although moment methods such as the second moment method are still viable and allow one to verify the Hardy-Littlewood conjectures by a lengthy but fairly straightforward calculation. To analyse large gaps, one has to understand the statistical behaviour of a random linear sieve in which one starts with an interval ${[0,y]}$ and randomly deletes a residue class ${a_p \hbox{ mod } p}$ for each prime ${p}$ up to a given threshold. For very small ${p}$ this is handled by the deterministic theory of the linear sieve as discussed above. For medium sized ${p}$, it turns out that there is good concentration of measure thanks to tools such as Bennett’s inequality or Azuma’s inequality, as one can view the sieving process as a martingale or (approximately) as a sum of independent random variables. For larger primes ${p}$, in which only a small number of survivors are expected to be sieved out by each residue class, a direct combinatorial calculation of all possible outcomes (involving the random graph that connects interval elements ${n \in [0,y]}$ to primes ${p}$ if ${n}$ falls in the random residue class ${a_p \hbox{ mod } p}$) turns out to give the best results.

I’ve just uploaded to the arXiv my paper “Quantitative bounds for critically bounded solutions to the Navier-Stokes equations“, submitted to the proceedings of the Linde Hall Inaugural Math Symposium. (I unfortunately had to cancel my physical attendance at this symposium for personal reasons, but was still able to contribute to the proceedings.) In recent years I have been interested in working towards establishing the existence of classical solutions for the Navier-Stokes equations

$\displaystyle \partial_t u + (u \cdot \nabla) u = \Delta u - \nabla p$

$\displaystyle \nabla \cdot u = 0$

that blow up in finite time, but this time for a change I took a look at the other side of the theory, namely the conditional regularity results for this equation. There are several such results that assert that if a certain norm of the solution stays bounded (or grows at a controlled rate), then the solution stays regular; taken in the contrapositive, they assert that if a solution blows up at a certain finite time ${T_*}$, then certain norms of the solution must also go to infinity. Here are some examples (not an exhaustive list) of such blowup criteria:

• (Leray blowup criterion, 1934) If ${u}$ blows up at a finite time ${T_*}$, and ${3 < p \leq \infty}$, then ${\liminf_{t \rightarrow T_*} (T_* - t)^{\frac{1}{2}-\frac{3}{2p}} \| u(t) \|_{L^p_x({\bf R}^3)} \geq c}$ for an absolute constant ${c>0}$.
• (ProdiSerrinLadyzhenskaya blowup criterion, 1959-1967) If ${u}$ blows up at a finite time ${T_*}$, and ${3 < p \leq \infty}$, then ${\| u \|_{L^q_t L^p_x([0,T_*) \times {\bf R}^3)} =+\infty}$, where ${\frac{1}{q} := \frac{1}{2} - \frac{3}{2p}}$.
• (Beale-Kato-Majda blowup criterion, 1984) If ${u}$ blows up at a finite time ${T_*}$, then ${\| \omega \|_{L^1_t L^\infty_x([0,T_*) \times {\bf R}^3)} = +\infty}$, where ${\omega := \nabla \times u}$ is the vorticity.
• (Kato blowup criterion, 1984) If ${u}$ blows up at a finite time ${T_*}$, then ${\liminf_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} \geq c}$ for some absolute constant ${c>0}$.
• (Escauriaza-Seregin-Sverak blowup criterion, 2003) If ${u}$ blows up at a finite time ${T_*}$, then ${\limsup_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} = +\infty}$.
• (Seregin blowup criterion, 2012) If ${u}$ blows up at a finite time ${T_*}$, then ${\lim_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} = +\infty}$.
• (Phuc blowup criterion, 2015) If ${u}$ blows up at a finite time ${T_*}$, then ${\limsup_{t \rightarrow T_*} \|u(t) \|_{L^{3,q}_x({\bf R}^3)} = +\infty}$ for any ${q < \infty}$.
• (Gallagher-Koch-Planchon blowup criterion, 2016) If ${u}$ blows up at a finite time ${T_*}$, then ${\limsup_{t \rightarrow T_*} \|u(t) \|_{\dot B_{p,q}^{-1+3/p}({\bf R}^3)} = +\infty}$ for any ${3 < p, q < \infty}$.
• (Albritton blowup criterion, 2016) If ${u}$ blows up at a finite time ${T_*}$, then ${\lim_{t \rightarrow T_*} \|u(t) \|_{\dot B_{p,q}^{-1+3/p}({\bf R}^3)} = +\infty}$ for any ${3 < p, q < \infty}$.

My current paper is most closely related to the Escauriaza-Seregin-Sverak blowup criterion, which was the first to show a critical (i.e., scale-invariant, or dimensionless) spatial norm, namely ${L^3_x({\bf R}^3)}$, had to become large. This result now has many proofs; for instance, many of the subsequent blowup criterion results imply the Escauriaza-Seregin-Sverak one as a special case, and there are also additional proofs by Gallagher-Koch-Planchon (building on ideas of Kenig-Koch), and by Dong-Du. However, all of these proofs rely on some form of a compactness argument: given a finite time blowup, one extracts some suitable family of rescaled solutions that converges in some weak sense to a limiting solution that has some additional good properties (such as almost periodicity modulo symmetries), which one can then rule out using additional qualitative tools, such as unique continuation and backwards uniqueness theorems for parabolic heat equations. In particular, all known proofs use some version of the backwards uniqueness theorem of Escauriaza, Seregin, and Sverak. Because of this reliance on compactness, the existing proofs of the Escauriaza-Seregin-Sverak blowup criterion are qualitative, in that they do not provide any quantitative information on how fast the ${\|u(t)\|_{L^3_x({\bf R}^3)}}$ norm will go to infinity (along a subsequence of times).

On the other hand, it is a general principle that qualitative arguments established using compactness methods ought to have quantitative analogues that replace the use of compactness by more complicated substitutes that give effective bounds; see for instance these previous blog posts for more discussion. I therefore was interested in trying to obtain a quantitative version of this blowup criterion that gave reasonably good effective bounds (in particular, my objective was to avoid truly enormous bounds such as tower-exponential or Ackermann function bounds, which often arise if one “naively” tries to make a compactness argument effective). In particular, I obtained the following triple-exponential quantitative regularity bounds:

Theorem 1 If ${u}$ is a classical solution to Navier-Stokes on ${[0,T) \times {\bf R}^3}$ with

$\displaystyle \| u \|_{L^\infty_t L^3_x([0,T) \times {\bf R}^3)} \leq A \ \ \ \ \ (1)$

for some ${A \geq 2}$, then

$\displaystyle | \nabla^j u(t,x)| \leq \exp\exp\exp(A^{O(1)}) t^{-\frac{j+1}{2}}$

and

$\displaystyle | \nabla^j \omega(t,x)| \leq \exp\exp\exp(A^{O(1)}) t^{-\frac{j+2}{2}}$

for ${(t,x) \in [0,T) \times {\bf R}^3}$ and ${j=0,1}$.

As a corollary, one can now improve the Escauriaza-Seregin-Sverak blowup criterion to

$\displaystyle \limsup_{t \rightarrow T_*} \frac{\|u(t) \|_{L^3_x({\bf R}^3)}}{(\log\log\log \frac{1}{T_*-t})^c} = +\infty$

for some absolute constant ${c>0}$, which to my knowledge is the first (very slightly) supercritical blowup criterion for Navier-Stokes in the literature.

The proof uses many of the same quantitative inputs as previous arguments, most notably the Carleman inequalities used to establish unique continuation and backwards uniqueness theorems for backwards heat equations, but also some additional techniques that make the quantitative bounds more efficient. The proof focuses initially on points of concentration of the solution, which we define as points ${(t_0,x_0)}$ where there is a frequency ${N_0}$ for which one has the bound

$\displaystyle |N_0^{-1} P_{N_0} u(t_0,x_0)| \geq A^{-C_0} \ \ \ \ \ (2)$

for a large absolute constant ${C_0}$, where ${P_{N_0}}$ is a Littlewood-Paley projection to frequencies ${\sim N_0}$. (This can be compared with the upper bound of ${O(A)}$ for the quantity on the left-hand side that follows from (1).) The factor of ${N_0^{-1}}$ normalises the left-hand side of (2) to be dimensionless (i.e., critical). The main task is to show that the dimensionless quantity ${t_0 N_0^2}$ cannot get too large; in particular, we end up establishing a bound of the form

$\displaystyle t_0 N_0^2 \lesssim \exp\exp\exp A^{O(C_0^6)}$

from which the above theorem ends up following from a routine adaptation of the local well-posedness and regularity theory for Navier-Stokes.

The strategy is to show that any concentration such as (2) when ${t_0 N_0^2}$ is large must force a significant component of the ${L^3_x}$ norm of ${u(t_0)}$ to also show up at many other locations than ${x_0}$, which eventually contradicts (1) if one can produce enough such regions of non-trivial ${L^3_x}$ norm. (This can be viewed as a quantitative variant of the “rigidity” theorems in some of the previous proofs of the Escauriaza-Seregin-Sverak theorem that rule out solutions that exhibit too much “compactness” or “almost periodicity” in the ${L^3_x}$ topology.) The chain of causality that leads from a concentration (2) at ${(t_0,x_0)}$ to significant ${L^3_x}$ norm at other regions of the time slice ${t_0 \times {\bf R}^3}$ is somewhat involved (though simpler than the much more convoluted schemes I initially envisaged for this argument):

1. Firstly, by using Duhamel’s formula, one can show that a concentration (2) can only occur (with ${t_0 N_0^2}$ large) if there was also a preceding concentration

$\displaystyle |N_1^{-1} P_{N_1} u(t_1,x_1)| \geq A^{-C_0} \ \ \ \ \ (3)$

at some slightly previous point ${(t_1,x_1)}$ in spacetime, with ${N_1}$ also close to ${N_0}$ (more precisely, we have ${t_1 = t_0 - A^{-O(C_0^3)} N_0^{-2}}$, ${N_1 = A^{O(C_0^2)} N_0}$, and ${x_1 = x_0 + O( A^{O(C_0^4)} N_0^{-1})}$). This can be viewed as a sort of contrapositive of a “local regularity theorem”, such as the ones established by Caffarelli, Kohn, and Nirenberg. A key point here is that the lower bound ${A^{-C_0}}$ in the conclusion (3) is precisely the same as the lower bound in (2), so that this backwards propagation of concentration can be iterated.

2. Iterating the previous step, one can find a sequence of concentration points

$\displaystyle |N_n^{-1} P_{N_n} u(t_n,x_n)| \geq A^{-C_0} \ \ \ \ \ (4)$

with the ${(t_n,x_n)}$ propagating backwards in time; by using estimates ultimately resulting from the dissipative term in the energy identity, one can extract such a sequence in which the ${t_0-t_n}$ increase geometrically with time, the ${N_n}$ are comparable (up to polynomial factors in ${A}$) to the natural frequency scale ${(t_0-t_n)^{-1/2}}$, and one has ${x_n = x_0 + O( |t_0-t_n|^{1/2} )}$. Using the “epochs of regularity” theory that ultimately dates back to Leray, and tweaking the ${t_n}$ slightly, one can also place the times ${t_n}$ in intervals ${I_n}$ (of length comparable to a small multiple of ${|t_0-t_n|}$) in which the solution is quite regular (in particular, ${u, \nabla u, \omega, \nabla \omega}$ enjoy good ${L^\infty_t L^\infty_x}$ bounds on ${I_n \times {\bf R}^3}$).

3. The concentration (4) can be used to establish a lower bound for the ${L^2_t L^2_x}$ norm of the vorticity ${\omega}$ near ${(t_n,x_n)}$. As is well known, the vorticity obeys the vorticity equation

$\displaystyle \partial_t \omega = \Delta \omega - (u \cdot \nabla) \omega + (\omega \cdot \nabla) u.$

In the epoch of regularity ${I_n \times {\bf R}^3}$, the coefficients ${u, \nabla u}$ of this equation obey good ${L^\infty_x}$ bounds, allowing the machinery of Carleman estimates to come into play. Using a Carleman estimate that is used to establish unique continuation results for backwards heat equations, one can propagate this lower bound to also give lower ${L^2}$ bounds on the vorticity (and its first derivative) in annuli of the form ${\{ (t,x) \in I_n \times {\bf R}^3: R \leq |x-x_n| \leq R' \}}$ for various radii ${R,R'}$, although the lower bounds decay at a gaussian rate with ${R}$.

4. Meanwhile, using an energy pigeonholing argument of Bourgain (which, in this Navier-Stokes context, is actually an enstrophy pigeonholing argument), one can locate some annuli ${\{ x \in {\bf R}^3: R \leq |x-x_n| \leq R'\}}$ where (a slightly normalised form of) the entrosphy is small at time ${t=t_n}$; using a version of the localised enstrophy estimates from a previous paper of mine, one can then propagate this sort of control forward in time, obtaining an “annulus of regularity” of the form ${\{ (t,x) \in [t_n,t_0] \times {\bf R}^3: R_n \leq |x-x_n| \leq R'_n\}}$ in which one has good estimates; in particular, one has ${L^\infty_x}$ type bounds on ${u, \nabla u, \omega, \nabla \omega}$ in this cylindrical annulus.
5. By intersecting the previous epoch of regularity ${I_n \times {\bf R}^3}$ with the above annulus of regularity, we have some lower bounds on the ${L^2}$ norm of the vorticity (and its first derivative) in the annulus of regularity. Using a Carleman estimate first introduced by Escauriaza, Seregin, and Sverak, as well as a second application of the Carleman estimate used previously, one can then propagate this lower bound back up to time ${t=t_0}$, establishing a lower bound for the vorticity on the spatial annulus ${\{ (t_0,x): R_n \leq |x-x_n| \leq R'_n \}}$. By some basic Littlewood-Paley theory one can parlay this lower bound to a lower bound on the ${L^3}$ norm of the velocity ${u}$; crucially, this lower bound is uniform in ${n}$.
6. If ${t_0 N_0^2}$ is very large (triple exponential in ${A}$!), one can then find enough scales ${n}$ with disjoint ${\{ (t_0,x): R_n \leq |x-x_n| \leq R'_n \}}$ annuli that the total lower bound on the ${L^3_x}$ norm of ${u(t_0)}$ provided by the above arguments is inconsistent with (1), thus establishing the claim.

The chain of causality is summarised in the following image:

It seems natural to conjecture that similar triply logarithmic improvements can be made to several of the other blowup criteria listed above, but I have not attempted to pursue this question. It seems difficult to improve the triple logarithmic factor using only the techniques here; the Bourgain pigeonholing argument inevitably costs one exponential, the Carleman inequalities cost a second, and the stacking of scales at the end to contradict the ${L^3}$ upper bound costs the third.

Peter Denton, Stephen Parke, Xining Zhang, and I have just uploaded to the arXiv the short unpublished note “Eigenvectors from eigenvalues“. This note gives two proofs of a general eigenvector identity observed recently by Denton, Parke and Zhang in the course of some quantum mechanical calculations. The identity is as follows:

Theorem 1 Let ${A}$ be an ${n \times n}$ Hermitian matrix, with eigenvalues ${\lambda_1(A),\dots,\lambda_n(A)}$. Let ${v_i}$ be a unit eigenvector corresponding to the eigenvalue ${\lambda_i(A)}$, and let ${v_{i,j}}$ be the ${j^{th}}$ component of ${v_i}$. Then

$\displaystyle |v_{i,j}|^2 \prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A)) = \prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j))$

where ${M_j}$ is the ${n-1 \times n-1}$ Hermitian matrix formed by deleting the ${j^{th}}$ row and column from ${A}$.

For instance, if we have

$\displaystyle A = \begin{pmatrix} a & X^* \\ X & M \end{pmatrix}$

for some real number ${a}$, ${n-1}$-dimensional vector ${X}$, and ${n-1 \times n-1}$ Hermitian matrix ${M}$, then we have

$\displaystyle |v_{i,1}|^2 = \frac{\prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M))}{\prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A))} \ \ \ \ \ (1)$

assuming that the denominator is non-zero.

Once one is aware of the identity, it is not so difficult to prove it; we give two proofs, each about half a page long, one of which is based on a variant of the Cauchy-Binet formula, and the other based on properties of the adjugate matrix. But perhaps it is surprising that such a formula exists at all; one does not normally expect to learn much information about eigenvectors purely from knowledge of eigenvalues. In the random matrix theory literature, for instance in this paper of Erdos, Schlein, and Yau, or this later paper of Van Vu and myself, a related identity has been used, namely

$\displaystyle |v_{i,1}|^2 = \frac{1}{1 + \| (M-\lambda_i(A))^{-1} X \|^2}, \ \ \ \ \ (2)$

but it is not immediately obvious that one can derive the former identity from the latter. (I do so below the fold; we ended up not putting this proof in the note as it was longer than the two other proofs we found. I also give two other proofs below the fold, one from a more geometric perspective and one proceeding via Cramer’s rule.) It was certainly something of a surprise to me that there is no explicit appearance of the ${a,X}$ components of ${A}$ in the formula (1) (though they do indirectly appear through their effect on the eigenvalues ${\lambda_k(A)}$; for instance from taking traces one sees that ${a = \sum_{k=1}^n \lambda_k(A) - \sum_{k=1}^{n-1} \lambda_k(M)}$).

One can get some feeling of the identity (1) by considering some special cases. Suppose for instance that ${A}$ is a diagonal matrix with all distinct entries. The upper left entry ${a}$ of ${A}$ is one of the eigenvalues of ${A}$. If it is equal to ${\lambda_i(A)}$, then the eigenvalues of ${M}$ are the other ${n-1}$ eigenvalues of ${A}$, and now the left and right-hand sides of (1) are equal to ${1}$. At the other extreme, if ${a}$ is equal to a different eigenvalue of ${A}$, then ${\lambda_i(A)}$ now appears as an eigenvalue of ${M}$, and both sides of (1) now vanish. More generally, if we order the eigenvalues ${\lambda_1(A) \leq \dots \leq \lambda_n(A)}$ and ${\lambda_1(M) \leq \dots \leq \lambda_{n-1}(M)}$, then the Cauchy interlacing inequalities tell us that

$\displaystyle 0 \leq \lambda_i(A) - \lambda_k(M) \leq \lambda_i(A) - \lambda_k(A)$

for ${1 \leq k < i}$, and

$\displaystyle \lambda_i(A) - \lambda_{k+1}(A) \leq \lambda_i(A) - \lambda_k(M) < 0$

for ${i \leq k \leq n-1}$, so that the right-hand side of (1) lies between ${0}$ and ${1}$, which is of course consistent with (1) as ${v_i}$ is a unit vector. Thus the identity relates the coefficient sizes of an eigenvector with the extent to which the Cauchy interlacing inequalities are sharp.

I have just uploaded to the arXiv my paper “Sharp bounds for multilinear curved Kakeya, restriction and oscillatory integral estimates away from the endpoint“, submitted to Mathematika. In this paper I return (after more than a decade’s absence) to one of my first research interests, namely the Kakeya and restriction family of conjectures. The starting point is the following “multilinear Kakeya estimate” first established in the non-endpoint case by Bennett, Carbery, and myself, and then in the endpoint case by Guth (with further proofs and extensions by Bourgain-Guth and Carbery-Valdimarsson:

Theorem 1 (Multilinear Kakeya estimate) Let ${\delta > 0}$ be a radius. For each ${j = 1,\dots,d}$, let ${\mathbb{T}_j}$ denote a finite family of infinite tubes ${T_j}$ in ${{\bf R}^d}$ of radius ${\delta}$. Assume the following axiom:

• (i) (Transversality) whenever ${T_j \in \mathbb{T}_j}$ is oriented in the direction of a unit vector ${n_j}$ for ${j =1,\dots,d}$, we have

$\displaystyle \left|\bigwedge_{j=1}^d n_j\right| \geq A^{-1}$

for some ${A>0}$, where we use the usual Euclidean norm on the wedge product ${\bigwedge^d {\bf R}^d}$.

Then, for any ${p \geq \frac{1}{d-1}}$, one has

$\displaystyle \left\| \prod_{j=1}^d \sum_{T_j \in \mathbb{T}_j} 1_{T_j} \right\|_{L^p({\bf R}^d)} \lesssim_{A,p} \delta^{\frac{d}{p}} \prod_{j \in [d]} \# \mathbb{T}_j. \ \ \ \ \ (1)$

where ${L^p({\bf R}^d)}$ are the usual Lebesgue norms with respect to Lebesgue measure, ${1_{T_j}}$ denotes the indicator function of ${T_j}$, and ${\# \mathbb{T}_j}$ denotes the cardinality of ${\mathbb{T}_j}$.

The original proof of this proceeded using a heat flow monotonicity method, which in my previous post I reinterpreted using a “virtual integration” concept on a fractional Cartesian product space. It turns out that this machinery is somewhat flexible, and can be used to establish some other estimates of this type. The first result of this paper is to extend the above theorem to the curved setting, in which one localises to a ball of radius ${O(1)}$ (and sets ${\delta}$ to be small), but allows the tubes ${T_j}$ to be curved in a ${C^2}$ fashion. If one runs the heat flow monotonicity argument, one now picks up some additional error terms arising from the curvature, but as the spatial scale approaches zero, the tubes become increasingly linear, and as such the error terms end up being an integrable multiple of the main term, at which point one can conclude by Gronwall’s inequality (actually for technical reasons we use a bootstrap argument instead of Gronwall). A key point in this approach is that one obtains optimal bounds (not losing factors of ${\delta^{-\varepsilon}}$ or ${\log^{O(1)} \frac{1}{\delta}}$), so long as one stays away from the endpoint case ${p=\frac{1}{d-1}}$ (which does not seem to be easily treatable by the heat flow methods). Previously, the paper of Bennett, Carbery, and myself was able to use an induction on scale argument to obtain a curved multilinear Kakeya estimate losing a factor of ${\log^{O(1)} \frac{1}{\delta}}$ (after optimising the argument); later arguments of Bourgain-Guth and Carbery-Valdimarsson, based on algebraic topology methods, could also obtain a curved multilinear Kakeya estimate without such losses, but only in the algebraic case when the tubes were neighbourhoods of algebraic curves of bounded degree.

Perhaps more interestingly, we are also able to extend the heat flow monotonicity method to apply directly to the multilinear restriction problem, giving the following global multilinear restriction estimate:

Theorem 2 (Multilinear restriction theorem) Let ${\frac{1}{d-1} < p \leq \infty}$ be an exponent, and let ${A \geq 2}$ be a parameter. Let ${M}$ be a sufficiently large natural number, depending only on ${d}$. For ${j \in [d]}$, let ${U_j}$ be an open subset of ${B^{d-1}(0,A)}$, and let ${h_j: U_j \rightarrow {\bf R}}$ be a smooth function obeying the following axioms:

• (i) (Regularity) For each ${j \in [d]}$ and ${\xi \in U_j}$, one has

$\displaystyle |\nabla_\xi^{\otimes m} \otimes h_j(\xi)| \leq A \ \ \ \ \ (2)$

for all ${1 \leq m \leq M}$.

• (ii) (Transversality) One has

$\displaystyle \left| \bigwedge_{j \in [d]} (-\nabla_\xi h_j(\xi_j),1) \right| \geq A^{-1}$

whenever ${\xi_j \in U_j}$ for ${j \in [d]}$.

Let ${U_{j,1/A} \subset U_j}$ be the sets

$\displaystyle U_{j,1/A} := \{ \xi \in U_j: B^{d-1}(\xi,1/A) \subset U_j \}. \ \ \ \ \ (3)$

Then one has

$\displaystyle \left\| \prod_{j \in [d]} {\mathcal E}_j f_j \right\|_{L^{2p}({\bf R}^d)} \leq A^{O(1)} \left(d-1-\frac{1}{p}\right)^{-O(1)} \prod_{j \in [d]} \|f_j \|_{L^2(U_{j,1/A})}$

for any ${f_j \in L^2(U_{j,1/A} \rightarrow {\bf C})}$, ${j \in [d]}$, extended by zero outside of ${U_{j,1/A}}$, and ${{\mathcal E}_j}$ denotes the extension operator

$\displaystyle {\mathcal E}_j f_j( x', x_d ) := \int_{U_j} e^{2\pi i (x' \xi^T + x_d h_j(\xi))} f_j(\xi)\ d\xi.$

Local versions of such estimate, in which ${L^{2p}({\bf R}^d)}$ is replaced with ${L^{2p}(B^d(0,R))}$ for some ${R \geq 2}$, and one accepts a loss of the form ${\log^{O(1)} R}$, were already established by Bennett, Carbery, and myself using an induction on scale argument. In a later paper of Bourgain-Guth these losses were removed by “epsilon removal lemmas” to recover Theorme 2, but only in the case when all the hypersurfaces involved had curvatures bounded away from zero.

There are two main new ingredients in the proof of Theorem 2. The first is to replace the usual induction on scales scheme to establish multilinear restriction by a “ball inflation” induction on scales scheme that more closely resembles the proof of decoupling theorems. In particular, we actually prove the more general family of estimates

$\displaystyle \left\| \prod_{j \in [d]} E_{r}[{\mathcal E}_j f_j] \right\|_{L^{p}({\bf R}^d)} \leq A^{O(1)} \left(d-1 - \frac{1}{p}\right)^{O(1)} r^{\frac{d}{p}} \prod_{j \in [d]} \| f_j \|_{L^2(U_{j,1/A})}^2$

where ${E_r}$ denotes the local energies

$\displaystyle E_{r}[f](x',x_d) := \int_{B^{d-1}(x',r)} |f(y',x_d)|^2\ dy'$

(actually for technical reasons it is more convenient to use a smoother weight than the strict cutoff to the disk ${B^{d-1}(x',r)}$). With logarithmic losses, it is not difficult to establish this estimate by an upward induction on ${r}$. To avoid such losses we use the heat flow monotonicity method. Here we run into the issue that the extension operators ${{\mathcal E}_j f_j}$ are complex-valued rather than non-negative, and thus would not be expected to obey many good montonicity properties. However, the local energies ${E_r[{\mathcal E}_j f_j]}$ can be expressed in terms of the magnitude squared of what is essentially the Gabor transform of ${{\mathcal E}_j f_j}$, and these are non-negative; furthermore, the dispersion relation associated to the extension operators ${{\mathcal E}_j f_j}$ implies that these Gabor transforms propagate along tubes, so that the situation becomes quite similar (up to several additional lower order error terms) to that in the multilinear Kakeya problem. (This can be viewed as a continuous version of the usual wave packet decomposition method used to relate restriction and Kakeya problems, which when combined with the heat flow monotonicity method allows for one to use a continuous version of induction on scales methods that do not concede any logarithmic factors.)

Finally, one can combine the curved multilinear Kakeya result with the multilinear restriction result to obtain estimates for multilinear oscillatory integrals away from the endpoint. Again, this sort of implication was already established in the previous paper of Bennett, Carbery, and myself, but the arguments there had some epsilon losses in the exponents; here we were able to run the argument more carefully and avoid these losses.

Earlier this month, Hao Huang (who, incidentally, was a graduate student here at UCLA) gave a remarkably short proof of a long-standing problem in theoretical computer science known as the sensitivity conjecture. See for instance this blog post of Gil Kalai for further discussion and links to many other online discussions of this result. One formulation of the theorem proved is as follows. Define the ${n}$-dimensional hypercube graph ${Q_n}$ to be the graph with vertex set ${({\bf Z}/2{\bf Z})^n}$, and with every vertex ${v \in ({\bf Z}/2{\bf Z})^n}$ joined to the ${n}$ vertices ${v + e_1,\dots,v+e_n}$, where ${e_1,\dots,e_n}$ is the standard basis of ${({\bf Z}/2{\bf Z})^n}$.

Theorem 1 (Lower bound on maximum degree of induced subgraphs of hypercube) Let ${E}$ be a set of at least ${2^{n-1}+1}$ vertices in ${Q_n}$. Then there is a vertex in ${E}$ that is adjacent (in ${Q_n}$) to at least ${\sqrt{n}}$ other vertices in ${E}$.

The bound ${\sqrt{n}}$ (or more precisely, ${\lceil \sqrt{n} \rceil}$) is completely sharp, as shown by Chung, Furedi, Graham, and Seymour; we describe this example below the fold. When combined with earlier reductions of Gotsman-Linial and Nisan-Szegedy; we give these below the fold also.

Let ${A = (a_{vw})_{v,w \in ({\bf Z}/2{\bf Z})^n}}$ be the adjacency matrix of ${Q_n}$ (where we index the rows and columns directly by the vertices in ${({\bf Z}/2{\bf Z})^n}$, rather than selecting some enumeration ${1,\dots,2^n}$), thus ${a_{vw}=1}$ when ${w = v+e_i}$ for some ${i=1,\dots,n}$, and ${a_{vw}=0}$ otherwise. The above theorem then asserts that if ${E}$ is a set of at least ${2^{n-1}+1}$ vertices, then the ${E \times E}$ minor ${(a_{vw})_{v,w \in E}}$ of ${A}$ has a row (or column) that contains at least ${\sqrt{n}}$ non-zero entries.

The key step to prove this theorem is the construction of rather curious variant ${\tilde A}$ of the adjacency matrix ${A}$:

Proposition 2 There exists a ${({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n}$ matrix ${\tilde A = (\tilde a_{vw})_{v,w \in ({\bf Z}/2{\bf Z})^n}}$ which is entrywise dominated by ${A}$ in the sense that

$\displaystyle |\tilde a_{vw}| \leq a_{vw} \hbox{ for all } v,w \in ({\bf Z}/2{\bf Z})^n \ \ \ \ \ (1)$

and such that ${\tilde A}$ has ${\sqrt{n}}$ as an eigenvalue with multiplicity ${2^{n-1}}$.

Assuming this proposition, the proof of Theorem 1 can now be quickly concluded. If we view ${\tilde A}$ as a linear operator on the ${2^n}$-dimensional space ${\ell^2(({\bf Z}/2{\bf Z})^n)}$ of functions of ${({\bf Z}/2{\bf Z})^n}$, then by hypothesis this space has a ${2^{n-1}}$-dimensional subspace ${V}$ on which ${\tilde A}$ acts by multiplication by ${\sqrt{n}}$. If ${E}$ is a set of at least ${2^{n-1}+1}$ vertices in ${Q_n}$, then the space ${\ell^2(E)}$ of functions on ${E}$ has codimension at most ${2^{n-1}-1}$ in ${\ell^2(({\bf Z}/2{\bf Z})^n)}$, and hence intersects ${V}$ non-trivially. Thus the ${E \times E}$ minor ${\tilde A_E}$ of ${\tilde A}$ also has ${\sqrt{n}}$ as an eigenvalue (this can also be derived from the Cauchy interlacing inequalities), and in particular this minor has operator norm at least ${\sqrt{n}}$. By Schur’s test, this implies that one of the rows or columns of this matrix has absolute values summing to at least ${\sqrt{n}}$, giving the claim.

Remark 3 The argument actually gives a strengthening of Theorem 1: there exists a vertex ${v_0}$ of ${E}$ with the property that for every natural number ${k}$, there are at least ${n^{k/2}}$ paths of length ${k}$ in the restriction ${Q_n|_E}$ of ${Q_n}$ to ${E}$ that start from ${v_0}$. Indeed, if we let ${(u_v)_{v \in E}}$ be an eigenfunction of ${\tilde A}$ on ${\ell^2(E)}$, and let ${v_0}$ be a vertex in ${E}$ that maximises the value of ${|u_{v_0}|}$, then for any ${k}$ we have that the ${v_0}$ component of ${\tilde A_E^k (u_v)_{v \in E}}$ is equal to ${n^{k/2} |u_{v_0}|}$; on the other hand, by the triangle inequality, this component is at most ${|u_{v_0}|}$ times the number of length ${k}$ paths in ${Q_n|_E}$ starting from ${v_0}$, giving the claim.

This argument can be viewed as an instance of a more general “interlacing method” to try to control the behaviour of a graph ${G}$ on all large subsets ${E}$ by first generating a matrix ${\tilde A}$ on ${G}$ with very good spectral properties, which are then partially inherited by the ${E \times E}$ minor of ${\tilde A}$ by interlacing inequalities. In previous literature using this method (see e.g., this survey of Haemers, or this paper of Wilson), either the original adjacency matrix ${A}$, or some non-negatively weighted version of that matrix, was used as the controlling matrix ${\tilde A}$; the novelty here is the use of signed controlling matrices. It will be interesting to see what further variants and applications of this method emerge in the near future. (Thanks to Anurag Bishoi in the comments for these references.)

The “magic” step in the above argument is constructing ${\tilde A}$. In Huang’s paper, ${\tilde A}$ is constructed recursively in the dimension ${n}$ in a rather simple but mysterious fashion. Very recently, Roman Karasev gave an interpretation of this matrix in terms of the exterior algebra on ${{\bf R}^n}$. In this post I would like to give an alternate interpretation in terms of the operation of twisted convolution, which originated in the theory of the Heisenberg group in quantum mechanics.

Firstly note that the original adjacency matrix ${A}$, when viewed as a linear operator on ${\ell^2(({\bf Z}/2{\bf Z})^n)}$, is a convolution operator

$\displaystyle A f = f * \mu$

where

$\displaystyle \mu(x) := \sum_{i=1}^n 1_{x=e_i}$

is the counting measure on the standard basis ${e_1,\dots,e_n}$, and ${*}$ denotes the ordinary convolution operation

$\displaystyle f * g(x) := \sum_{y \in ({\bf Z}/2{\bf Z})^n} f(y) g(x-y) = \sum_{y_1+y_2 = x} f(y_1) g(y_2).$

As is well known, this operation is commutative and associative. Thus for instance the square ${A^2}$ of the adjacency operator ${A}$ is also a convolution operator

$\displaystyle A^2 f = f * (\mu * \mu)(x)$

where the convolution kernel ${\mu * \mu}$ is moderately complicated:

$\displaystyle \mu*\mu(x) = n \times 1_{x=0} + \sum_{1 \leq i < j \leq n} 2 \times 1_{x = e_i + e_j}.$

The factor ${2}$ in this expansion comes from combining the two terms ${1_{x=e_i} * 1_{x=e_j}}$ and ${1_{x=e_j} * 1_{x=e_i}}$, which both evaluate to ${1_{x=e_i+e_j}}$.

More generally, given any bilinear form ${B: ({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n \rightarrow {\bf Z}/2{\bf Z}}$, one can define the twisted convolution

$\displaystyle f *_B g(x) := \sum_{y \in ({\bf Z}/2{\bf Z})^n} (-1)^{B(y,x-y)} f(y) g(x-y)$

$\displaystyle = \sum_{y_1+y_2=x} (-1)^{B(y_1,y_2)} f(y_1) g(y_2)$

of two functions ${f,g \in \ell^2(({\bf Z}/2{\bf Z})^n)}$. This operation is no longer commutative (unless ${B}$ is symmetric). However, it remains associative; indeed, one can easily compute that

$\displaystyle (f *_B g) *_B h(x) = f *_B (g *_B h)(x)$

$\displaystyle = \sum_{y_1+y_2+y_3=x} (-1)^{B(y_1,y_2)+B(y_1,y_3)+B(y_2,y_3)} f(y_1) g(y_2) h(y_3).$

In particular, if we define the twisted convolution operator

$\displaystyle A_B f(x) := f *_B \mu(x)$

then the square ${A_B^2}$ is also a twisted convolution operator

$\displaystyle A_B^2 f = f *_B (\mu *_B \mu)$

and the twisted convolution kernel ${\mu *_B \mu}$ can be computed as

$\displaystyle \mu *_B \mu(x) = (\sum_{i=1}^n (-1)^{B(e_i,e_i)}) 1_{x=0}$

$\displaystyle + \sum_{1 \leq i < j \leq n} ((-1)^{B(e_i,e_j)} + (-1)^{B(e_j,e_i)}) 1_{x=e_i+e_j}.$

For general bilinear forms ${B}$, this twisted convolution is just as messy as ${\mu * \mu}$ is. But if we take the specific bilinear form

$\displaystyle B(x,y) := \sum_{1 \leq i < j \leq n} x_i y_j \ \ \ \ \ (2)$

then ${B(e_i,e_i)=0}$ for ${1 \leq i \leq n}$ and ${B(e_i,e_j)=1, B(e_j,e_i)=0}$ for ${1 \leq i < j \leq n}$, and the above twisted convolution simplifies to

$\displaystyle \mu *_B \mu(x) = n 1_{x=0}$

and now ${A_B^2}$ is very simple:

$\displaystyle A_B^2 f = n f.$

Thus the only eigenvalues of ${A_B}$ are ${+\sqrt{n}}$ and ${-\sqrt{n}}$. The matrix ${A_B}$ is entrywise dominated by ${A}$ in the sense of (1), and in particular has trace zero; thus the ${+\sqrt{n}}$ and ${-\sqrt{n}}$ eigenvalues must occur with equal multiplicity, so in particular the ${+\sqrt{n}}$ eigenvalue occurs with multiplicity ${2^{n-1}}$ since the matrix has dimensions ${2^n \times 2^n}$. This establishes Proposition 2.

Remark 4 Twisted convolution ${*_B}$ is actually just a component of ordinary convolution, but not on the original group ${({\bf Z}/2{\bf Z})^n}$; instead it relates to convolution on a Heisenberg group extension of this group. More specifically, define the Heisenberg group ${H}$ to be the set of pairs ${(x, t) \in ({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})}$ with group law

$\displaystyle (x,t) \cdot (y,s) := (x+y, t+s+B(x,y))$

and inverse operation

$\displaystyle (x,t)^{-1} = (-x, -t+B(x,x))$

(one can dispense with the negative signs here if desired, since we are in characteristic two). Convolution on ${H}$ is defined in the usual manner: one has

$\displaystyle F*G( (x,t) ) := \sum_{(y,s) \in H} F(y,s) G( (y,s)^{-1} (x,t) )$

for any ${F,G \in \ell^2(H)}$. Now if ${f \in \ell^2(({\bf Z}/2{\bf Z})^n)}$ is a function on the original group ${({\bf Z}/2{\bf Z})^n}$, we can define the lift ${\tilde f \in \ell^2(H)}$ by the formula

$\displaystyle \tilde f(x,t) := (-1)^t f(x)$

and then by chasing all the definitions one soon verifies that

$\displaystyle \tilde f * \tilde g = 2 \widetilde{f *_B g}$

for any ${f,g \in \ell^2(({\bf Z}/2{\bf Z})^n)}$, thus relating twisted convolution ${*_B}$ to Heisenberg group convolution ${*}$.

Remark 5 With the twisting by the specific bilinear form ${B}$ given by (2), convolution by ${1_{x=e_i}}$ and ${1_{x=e_j}}$ now anticommute rather than commute. This makes the twisted convolution algebra ${(\ell^2(({\bf Z}/2{\bf Z})^n), *_B)}$ isomorphic to a Clifford algebra ${Cl({\bf R}^n,I_n)}$ (the real or complex algebra generated by formal generators ${v_1,\dots,v_n}$ subject to the relations ${(v_iv_j+v_jv_i)/2 = 1_{i=j}}$ for ${i,j=1,\dots,n}$) rather than the commutative algebra more familiar to abelian Fourier analysis. This connection to Clifford algebra (also observed independently by Tom Mrowka and by Daniel Matthews) may be linked to the exterior algebra interpretation of the argument in the recent preprint of Karasev mentioned above.

Remark 6 One could replace the form (2) in this argument by any other bilinear form ${B'}$ that obeyed the relations ${B'(e_i,e_i)=0}$ and ${B'(e_i,e_j) + B'(e_j,e_i)=1}$ for ${i \neq j}$. However, this additional level of generality does not add much; any such ${B'}$ will differ from ${B}$ by an antisymmetric form ${C}$ (so that ${C(x,x) = 0}$ for all ${x}$, which in characteristic two implied that ${C(x,y) = C(y,x)}$ for all ${x,y}$), and such forms can always be decomposed as ${C(x,y) = C'(x,y) + C'(y,x)}$, where ${C'(x,y) := \sum_{i. As such, the matrices ${A_B}$ and ${A_{B'}}$ are conjugate, with the conjugation operator being the diagonal matrix with entries ${(-1)^{C'(x,x)}}$ at each vertex ${x}$.

Remark 7 (Added later) This remark combines the two previous remarks. One can view any of the matrices ${A_{B'}}$ in Remark 6 as components of a single canonical matrix ${A_{Cl}}$ that is still of dimensions ${({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n}$, but takes values in the Clifford algebra ${Cl({\bf R}^n,I_n)}$ from Remark 5; with this “universal algebra” perspective, one no longer needs to make any arbitrary choices of form ${B}$. More precisely, let ${\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))}$ denote the vector space of functions ${f: ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n)}$ from the hypercube to the Clifford algebra; as a real vector space, this is a ${2^{2n}}$ dimensional space, isomorphic to the direct sum of ${2^n}$ copies of ${\ell^2(({\bf Z}/2{\bf Z})^n)}$, as the Clifford algebra is itself ${2^n}$ dimensional. One can then define a canonical Clifford adjacency operator ${A_{Cl}}$ on this space by

$\displaystyle A_{Cl} f(x) := \sum_{i=1}^n f(x+e_i) v_i$

where ${v_1,\dots,v_n}$ are the generators of ${Cl({\bf R}^n,I_n)}$. This operator can either be identified with a Clifford-valued ${2^n \times 2^n}$ matrix or as a real-valued ${2^{2n} \times 2^{2n}}$ matrix. In either case one still has the key algebraic relations ${A_{Cl}^2 = n}$ and ${\mathrm{tr} A_{Cl} = 0}$, ensuring that when viewed as a real ${2^{2n} \times 2^{2n}}$ matrix, half of the eigenvalues are equal to ${+\sqrt{n}}$ and half equal to ${-\sqrt{n}}$. One can then use this matrix in place of any of the ${A_{B'}}$ to establish Theorem 1 (noting that Schur’s test continues to work for Clifford-valued matrices because of the norm structure on ${Cl({\bf R}^n,I_n)}$).

To relate ${A_{Cl}}$ to the real ${2^n \times 2^n}$ matrices ${A_{B'}}$, first observe that each point ${x}$ in the hypercube ${({\bf Z}/2{\bf Z})^n}$ can be associated with a one-dimensional real subspace ${\ell_x}$ (i.e., a line) in the Clifford algebra ${Cl({\bf R}^n,I_n)}$ by the formula

$\displaystyle \ell_{e_{i_1} + \dots + e_{i_k}} := \mathrm{span}_{\bf R}( v_{i_1} \dots v_{i_k} )$

for any ${i_1,\dots,i_k \in \{1,\dots,n\}}$ (note that this definition is well-defined even if the ${i_1,\dots,i_k}$ are out of order or contain repetitions). This can be viewed as a discrete line bundle over the hypercube. Since ${\ell_{x+e_i} = \ell_x e_i}$ for any ${i}$, we see that the ${2^n}$-dimensional real linear subspace ${V}$ of ${\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))}$ of sections of this bundle, that is to say the space of functions ${f: ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n)}$ such that ${f(x) \in \ell_x}$ for all ${x \in ({\bf Z}/2{\bf Z})^n}$, is an invariant subspace of ${A_{Cl}}$. (Indeed, using the left-action of the Clifford algebra on ${\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))}$, which commutes with ${A_{Cl}}$, one can naturally identify ${\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))}$ with ${Cl({\bf R}^n,I_n) \otimes V}$, with the left action of ${Cl({\bf R}^n,I_n)}$ acting purely on the first factor and ${A_{Cl}}$ acting purely on the second factor.) Any trivialisation of this line bundle lets us interpret the restriction ${A_{Cl}|_V}$ of ${A_{Cl}}$ to ${V}$ as a real ${2^n \times 2^n}$ matrix. In particular, given one of the bilinear forms ${B'}$ from Remark 6, we can identify ${V}$ with ${\ell^2(({\bf Z}/2{\bf Z})^n)}$ by identifying any real function ${f \in \ell^2( ({\bf Z}/2{\bf Z})^n)}$ with the lift ${\tilde f \in V}$ defined by

$\displaystyle \tilde f(e_{i_1} + \dots + e_{i_k}) := (-1)^{\sum_{1 \leq j < j' \leq k} B'(e_{i_j}, e_{i_{j'}})}$

$\displaystyle f(e_{i_1} + \dots + e_{i_k}) v_{i_1} \dots v_{i_k}$

whenever ${1 \leq i_1 < \dots < i_k \leq n}$. A somewhat tedious computation using the properties of ${B'}$ then eventually gives the intertwining identity

$\displaystyle A_{Cl} \tilde f = \widetilde{A_{B'} f}$

and so ${A_{B'}}$ is conjugate to ${A_{Cl}|_V}$.