You are currently browsing the category archive for the ‘expository’ category.

The celebrated decomposition theorem of Fefferman and Stein shows that every function ${f \in \mathrm{BMO}({\bf R}^n)}$ of bounded mean oscillation can be decomposed in the form

$\displaystyle f = f_0 + \sum_{i=1}^n R_i f_i \ \ \ \ \ (1)$

modulo constants, for some ${f_0,f_1,\dots,f_n \in L^\infty({\bf R}^n)}$, where ${R_i := |\nabla|^{-1} \partial_i}$ are the Riesz transforms. A technical note here: a function in BMO is defined only up to constants (as well as up to the usual almost everywhere equivalence); related to this, if ${f_i}$ is an ${L^\infty({\bf R}^n)}$ function, then the Riesz transform ${R_i f_i}$ is well defined as an element of ${\mathrm{BMO}({\bf R}^n)}$, but is also only defined up to constants and almost everywhere equivalence.
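(As an aside: the Riesz transform ${R_i}$ acts on the Fourier side as multiplication by ${i\xi_i/|\xi|}$, up to sign convention, so one has the standard identity ${\sum_{i=1}^n R_i^2 = -I}$ on mean-zero functions. Here is a quick numerical sanity check of that identity on a periodic grid; this is a Python/NumPy sketch of my own, not part of the Fefferman-Stein argument, and the discretisation is only a heuristic for the continuum operator.)

```python
import numpy as np

def riesz_transform(f, j):
    """Apply the j-th Riesz transform to a function sampled on a periodic
    grid, via its Fourier multiplier -i*xi_j/|xi| (one sign convention).
    This is only a heuristic discretisation of the continuum operator."""
    F = np.fft.fftn(f)
    freqs = np.meshgrid(*[np.fft.fftfreq(s) for s in f.shape], indexing="ij")
    norm = np.sqrt(sum(xi ** 2 for xi in freqs))
    norm[(0,) * f.ndim] = 1.0          # avoid 0/0 at the zero frequency
    return np.fft.ifftn(-1j * freqs[j] / norm * F).real

# check sum_j R_j^2 = -Id on mean-zero data; an odd grid size is used so
# that every nonzero frequency pairs with its negative (no Nyquist mode)
rng = np.random.default_rng(0)
f = rng.standard_normal((33, 33))
f -= f.mean()
g = sum(riesz_transform(riesz_transform(f, j), j) for j in range(2))
print(np.allclose(g, -f))  # prints True
```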

The original proof of Fefferman and Stein was indirect (relying for instance on the Hahn-Banach theorem). A constructive proof was later given by Uchiyama, and was in fact the topic of the second post on this blog. A notable feature of Uchiyama’s argument is that the construction is quite nonlinear; the vector-valued function ${(f_0,f_1,\dots,f_n)}$ is defined to take values on a sphere, and the iterative construction to build these functions from ${f}$ involves repeatedly projecting a potential approximant to this function to the sphere (also, the high-frequency components of this approximant are constructed in a manner that depends nonlinearly on the low-frequency components, which is a type of technique that has become increasingly common in analysis and PDE in recent years).

It is natural to ask whether the Fefferman-Stein decomposition (1) can be made linear in ${f}$, in the sense that each of the ${f_i, i=0,\dots,n}$ depend linearly on ${f}$. Strictly speaking this is easily accomplished using the axiom of choice: take a Hamel basis of ${\mathrm{BMO}({\bf R}^n)}$, choose a decomposition (1) for each element of this basis, and then extend linearly to all finite linear combinations of these basis functions, which then cover ${\mathrm{BMO}({\bf R}^n)}$ by definition of Hamel basis. But these linear operations have no reason to be continuous as a map from ${\mathrm{BMO}({\bf R}^n)}$ to ${L^\infty({\bf R}^n)}$. So the correct question is whether the decomposition can be made continuously linear (or equivalently, boundedly linear) in ${f}$, that is to say whether there exist continuous linear transformations ${T_i: \mathrm{BMO}({\bf R}^n) \rightarrow L^\infty({\bf R}^n)}$ such that

$\displaystyle f = T_0 f + \sum_{i=1}^n R_i T_i f \ \ \ \ \ (2)$

modulo constants for all ${f \in \mathrm{BMO}({\bf R}^n)}$. Note from the open mapping theorem that one can choose the functions ${f_0,\dots,f_n}$ to depend in a bounded fashion on ${f}$ (thus ${\|f_i\|_{L^\infty} \leq C \|f\|_{BMO}}$ for some constant ${C}$); however, the open mapping theorem does not guarantee linearity. Using a result of Bartle and Graves one can also make the ${f_i}$ depend continuously on ${f}$, but again the dependence is not guaranteed to be linear.

It is generally accepted folklore that continuous linear dependence is known to be impossible, but I had difficulty recently tracking down an explicit proof of this assertion in the literature (if anyone knows of a reference, I would be glad to know of it). The closest I found was a proof of a similar statement in this paper of Bourgain and Brezis, which I was able to adapt to establish the current claim. The basic idea is to average over the symmetries of the decomposition, which in the case of (1) are translation invariance, rotation invariance, and dilation invariance. This effectively makes the operators ${T_0,T_1,\dots,T_n}$ invariant under all these symmetries, which forces them to themselves be linear combinations of the identity and Riesz transform operators; however, no such non-trivial linear combination maps ${\mathrm{BMO}}$ to ${L^\infty}$, and the claim follows. Formal details of this argument (which we phrase in a dual form in order to avoid some technicalities) appear below the fold.

Let ${k}$ be a field, and let ${E}$ be a finite extension of that field; in this post we will denote such a relationship by ${k \hookrightarrow E}$. We say that ${E}$ is a Galois extension of ${k}$ if the cardinality of the automorphism group ${\mathrm{Aut}(E/k)}$ of ${E}$ fixing ${k}$ is as large as it can be, namely the degree ${[E:k]}$ of the extension. In that case, we call ${\mathrm{Aut}(E/k)}$ the Galois group of ${E}$ over ${k}$ and denote it also by ${\mathrm{Gal}(E/k)}$. The fundamental theorem of Galois theory then gives a one-to-one correspondence (also known as the Galois correspondence) between the intermediate extensions between ${E}$ and ${k}$ and the subgroups of ${\mathrm{Gal}(E/k)}$:

Theorem 1 (Fundamental theorem of Galois theory) Let ${E}$ be a Galois extension of ${k}$.

• (i) If ${k \hookrightarrow F \hookrightarrow E}$ is an intermediate field between ${k}$ and ${E}$, then ${E}$ is a Galois extension of ${F}$, and ${\mathrm{Gal}(E/F)}$ is a subgroup of ${\mathrm{Gal}(E/k)}$.
• (ii) Conversely, if ${H}$ is a subgroup of ${\mathrm{Gal}(E/k)}$, then there is a unique intermediate field ${k \hookrightarrow F \hookrightarrow E}$ such that ${\mathrm{Gal}(E/F)=H}$; namely ${F}$ is the set of elements of ${E}$ that are fixed by ${H}$.
• (iii) If ${k \hookrightarrow F_1 \hookrightarrow E}$ and ${k \hookrightarrow F_2 \hookrightarrow E}$, then ${F_1 \hookrightarrow F_2}$ if and only if ${\mathrm{Gal}(E/F_2)}$ is a subgroup of ${\mathrm{Gal}(E/F_1)}$.
• (iv) If ${k \hookrightarrow F \hookrightarrow E}$ is an intermediate field between ${k}$ and ${E}$, then ${F}$ is a Galois extension of ${k}$ if and only if ${\mathrm{Gal}(E/F)}$ is a normal subgroup of ${\mathrm{Gal}(E/k)}$. In that case, ${\mathrm{Gal}(F/k)}$ is isomorphic to the quotient group ${\mathrm{Gal}(E/k) / \mathrm{Gal}(E/F)}$.

Example 2 Let ${k= {\bf Q}}$, and let ${E = {\bf Q}(e^{2\pi i/n})}$ be the degree ${\phi(n)}$ Galois extension formed by adjoining a primitive ${n^{th}}$ root of unity (that is to say, ${E}$ is the cyclotomic field of order ${n}$). Then ${\mathrm{Gal}(E/k)}$ is isomorphic to the multiplicative cyclic group ${({\bf Z}/n{\bf Z})^\times}$ (the invertible elements of the ring ${{\bf Z}/n{\bf Z}}$). Amongst the intermediate fields, one has the cyclotomic fields of the form ${F = {\bf Q}(e^{2\pi i/m})}$ where ${m}$ divides ${n}$; they are also Galois extensions, with ${\mathrm{Gal}(F/k)}$ isomorphic to ${({\bf Z}/m{\bf Z})^\times}$ and ${\mathrm{Gal}(E/F)}$ isomorphic to the elements ${a}$ of ${({\bf Z}/n{\bf Z})^\times}$ such that ${a(n/m) = (n/m)}$ modulo ${n}$. (There can also be other intermediate fields, corresponding to other subgroups of ${({\bf Z}/n{\bf Z})^\times}$.)
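This Galois group and its subgroups are easy to enumerate by computer. The following short Python snippet (an illustration of my own, not from the original discussion) computes ${\mathrm{Gal}(E/k) \cong ({\bf Z}/n{\bf Z})^\times}$ and the subgroup ${\mathrm{Gal}(E/F)}$ for ${n=12}$, ${m=4}$, and checks the index formula ${|\mathrm{Gal}(E/k)| = |\mathrm{Gal}(E/F)| \cdot \phi(m)}$ predicted by Theorem 1(iv):

```python
from math import gcd

def units(n):
    """The multiplicative group (Z/nZ)^x, i.e. Gal(Q(zeta_n)/Q)."""
    return [a for a in range(1, n + 1) if gcd(a, n) == 1]

def gal_E_over_F(n, m):
    """Gal(E/F) for E = Q(zeta_n), F = Q(zeta_m) with m | n: those a in
    (Z/nZ)^x with a*(n/m) = n/m mod n, equivalently a = 1 mod m."""
    assert n % m == 0
    return [a for a in units(n) if (a * (n // m)) % n == (n // m) % n]

n, m = 12, 4
G = units(n)             # Gal(Q(zeta_12)/Q), of order phi(12) = 4
H = gal_E_over_F(n, m)   # Gal(Q(zeta_12)/Q(zeta_4))
print(len(G), len(H), len(units(m)))  # prints 4 2 2: |G| = |H| * phi(m)
```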

Example 3 Let ${k = {\bf C}(z)}$ be the field of rational functions of one indeterminate ${z}$ with complex coefficients, and let ${E = {\bf C}(w)}$ be the field formed by adjoining an ${n^{th}}$ root ${w = z^{1/n}}$ to ${k}$, thus ${k = {\bf C}(w^n)}$. Then ${E}$ is a degree ${n}$ Galois extension of ${k}$ with Galois group isomorphic to ${{\bf Z}/n{\bf Z}}$ (with an element ${a \in {\bf Z}/n{\bf Z}}$ corresponding to the field automorphism of ${E}$ fixing ${k}$ that sends ${w}$ to ${e^{2\pi i a/n} w}$). The intermediate fields are of the form ${F = {\bf C}(w^{n/m})}$ where ${m}$ divides ${n}$; they are also Galois extensions, with ${\mathrm{Gal}(F/k)}$ isomorphic to ${{\bf Z}/m{\bf Z}}$ and ${\mathrm{Gal}(E/F)}$ isomorphic to the multiples of ${m}$ in ${{\bf Z}/n{\bf Z}}$.

There is an analogous Galois correspondence in the covering theory of manifolds. For simplicity we restrict attention to finite covers. If ${L}$ is a connected manifold and ${\pi_{L \leftarrow M}: M \rightarrow L}$ is a finite covering map of ${L}$ by another connected manifold ${M}$, we denote this relationship by ${L \leftarrow M}$. (Later on we will change our function notations slightly and write ${\pi_{L \leftarrow M}: L \leftarrow M}$ in place of the more traditional ${\pi_{L \leftarrow M}: M \rightarrow L}$, and similarly for the deck transformations ${g: M \leftarrow M}$ below; more on this below the fold.) If ${L \leftarrow M}$, we can define ${\mathrm{Aut}(M/L)}$ to be the group of deck transformations: continuous maps ${g: M \rightarrow M}$ which preserve the fibres of ${\pi_{L \leftarrow M}}$. We say that this covering map is a Galois cover if the cardinality of the group ${\mathrm{Aut}(M/L)}$ is as large as it can be. In that case we call ${\mathrm{Aut}(M/L)}$ the Galois group of ${M}$ over ${L}$ and denote it by ${\mathrm{Gal}(M/L)}$.

Suppose ${M}$ is a finite cover of ${L}$. An intermediate cover ${N}$ between ${M}$ and ${L}$ is a cover of ${L}$ by ${N}$ which is in turn covered by ${M}$, thus ${L \leftarrow N \leftarrow M}$, in such a way that the covering maps are compatible, in the sense that ${\pi_{L \leftarrow M}}$ is the composition of ${\pi_{L \leftarrow N}}$ and ${\pi_{N \leftarrow M}}$. This sort of compatibility condition will be implicitly assumed whenever we chain together multiple instances of the ${\leftarrow}$ notation. Two intermediate covers ${N,N'}$ are equivalent if they cover each other, in a fashion compatible with all the other covering maps, thus ${L \leftarrow N \leftarrow N' \leftarrow M}$ and ${L \leftarrow N' \leftarrow N \leftarrow M}$. We then have the analogous Galois correspondence:

Theorem 4 (Fundamental theorem of covering spaces) Let ${L \leftarrow M}$ be a Galois covering.

• (i) If ${L \leftarrow N \leftarrow M}$ is an intermediate cover between ${L}$ and ${M}$, then ${M}$ is a Galois cover of ${N}$, and ${\mathrm{Gal}(M/N)}$ is a subgroup of ${\mathrm{Gal}(M/L)}$.
• (ii) Conversely, if ${H}$ is a subgroup of ${\mathrm{Gal}(M/L)}$, then there is an intermediate cover ${L \leftarrow N \leftarrow M}$, unique up to equivalence, such that ${\mathrm{Gal}(M/N)=H}$.
• (iii) If ${L \leftarrow N_1 \leftarrow M}$ and ${L \leftarrow N_2 \leftarrow M}$, then ${L \leftarrow N_1 \leftarrow N_2 \leftarrow M}$ if and only if ${\mathrm{Gal}(M/N_2)}$ is a subgroup of ${\mathrm{Gal}(M/N_1)}$.
• (iv) If ${L \leftarrow N \leftarrow M}$, then ${N}$ is a Galois cover of ${L}$ if and only if ${\mathrm{Gal}(M/N)}$ is a normal subgroup of ${\mathrm{Gal}(M/L)}$. In that case, ${\mathrm{Gal}(N/L)}$ is isomorphic to the quotient group ${\mathrm{Gal}(M/L) / \mathrm{Gal}(M/N)}$.

Example 5 Let ${L= {\bf C}^\times := {\bf C} \backslash \{0\}}$, and let ${M = {\bf C}^\times}$ be the ${n}$-fold cover of ${L}$ with covering map ${\pi_{L \leftarrow M}(w) := w^n}$. Then ${M}$ is a Galois cover of ${L}$, and ${\mathrm{Gal}(M/L)}$ is isomorphic to the cyclic group ${{\bf Z}/n{\bf Z}}$. The intermediate covers are (up to equivalence) of the form ${N = {\bf C}^\times}$ with covering map ${\pi_{L \leftarrow N}(u) := u^m}$ where ${m}$ divides ${n}$; they are also Galois covers, with ${\mathrm{Gal}(N/L)}$ isomorphic to ${{\bf Z}/m{\bf Z}}$ and ${\mathrm{Gal}(M/N)}$ isomorphic to the multiples of ${m}$ in ${{\bf Z}/n{\bf Z}}$.
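One can verify the covering structure of Example 5 numerically. In the following Python sketch (an illustration of mine, not from the original text), the deck transformations ${w \mapsto e^{2\pi i a/n} w}$ are checked to preserve the fibres of the covering map ${\pi_{L \leftarrow M}(w) = w^n}$:

```python
import cmath

n = 6

def pi_cover(w):
    """The covering map C^x -> C^x of Example 5, w -> w^n."""
    return w ** n

def deck(a):
    """The deck transformation w -> e^{2*pi*i*a/n} w, for a in Z/nZ."""
    zeta = cmath.exp(2j * cmath.pi * a / n)
    return lambda w: zeta * w

w = 0.7 + 1.1j                            # an arbitrary point of M = C^x
fibre = [deck(a)(w) for a in range(n)]    # the orbit of w under the deck group
print(all(abs(pi_cover(p) - pi_cover(w)) < 1e-9 for p in fibre))  # prints True
```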

Given the strong similarity between the two theorems, it is natural to ask if there is some more concrete connection between Galois theory and the theory of finite covers.

In one direction, if the manifolds ${L,M,N}$ have an algebraic structure (or a complex structure), then one can relate covering spaces to field extensions by considering the field of rational functions (or meromorphic functions) on the space. For instance, if ${L = {\bf C}^\times}$ and ${z}$ is the coordinate on ${L}$, one can consider the field ${{\bf C}(z)}$ of rational functions on ${L}$; the ${n}$-fold cover ${M = {\bf C}^\times}$ with coordinate ${w}$ from Example 5 similarly has a field ${{\bf C}(w)}$ of rational functions. The covering ${\pi_{L \leftarrow M}(w) = w^n}$ relates the two coordinates ${z,w}$ by the relation ${z = w^n}$, at which point one sees that the rational functions ${{\bf C}(w)}$ on ${M}$ are a degree ${n}$ extension of the rational functions ${{\bf C}(z)}$ on ${L}$ (formed by adjoining an ${n^{th}}$ root ${w}$ of ${z}$). In this way we see that Example 5 is in fact closely related to Example 3.

Exercise 6 What happens if one uses meromorphic functions in place of rational functions in the above example? (To answer this question, I found it convenient to use a discrete Fourier transform associated to the multiplicative action of the ${n^{th}}$ roots of unity on ${M}$ to decompose the meromorphic functions on ${M}$ as a linear combination of functions invariant under this action, times a power ${w^j}$ of the coordinate ${w}$ for ${j=0,\dots,n-1}$.)

I was curious however about the reverse direction. Starting with some field extensions ${k \hookrightarrow F \hookrightarrow E}$, is it possible to create manifold-like spaces ${M_k \leftarrow M_F \leftarrow M_E}$ associated to these fields in such a fashion that (say) ${M_E}$ behaves like a “covering space” to ${M_k}$ with a group ${\mathrm{Aut}(M_E/M_k)}$ of deck transformations isomorphic to ${\mathrm{Aut}(E/k)}$, so that the Galois correspondences agree? Also, given how the notion of a path (and associated concepts such as loops, monodromy and the fundamental group) play a prominent role in the theory of covering spaces, can spaces such as ${M_k}$ or ${M_E}$ also come with a notion of a path that is somehow compatible with the Galois correspondence?

The standard answer from modern algebraic geometry (as articulated for instance in this nice MathOverflow answer by Minhyong Kim) is to set ${M_E}$ equal to the spectrum ${\mathrm{Spec}(E)}$ of the field ${E}$. As a set, the spectrum ${\mathrm{Spec}(R)}$ of a commutative ring ${R}$ is defined as the set of prime ideals of ${R}$. Generally speaking, the map ${R \mapsto \mathrm{Spec}(R)}$ that maps a commutative ring to its spectrum tends to act like an inverse of the operation that maps a space ${X}$ to a ring of functions on that space. For instance, if one considers the commutative ring ${{\bf C}[z, z^{-1}]}$ of regular functions on ${M = {\bf C}^\times}$, then each point ${z_0}$ in ${M}$ gives rise to the prime ideal ${\{ f \in {\bf C}[z, z^{-1}]: f(z_0)=0\}}$, and one can check that these are the only such prime ideals (other than the zero ideal ${(0)}$), giving an almost one-to-one correspondence between ${\mathrm{Spec}( {\bf C}[z,z^{-1}] )}$ and ${M}$. (The zero ideal corresponds instead to the generic point of ${M}$.)

Of course, the spectrum of a field such as ${E}$ is just a point, as the zero ideal ${(0)}$ is the only prime ideal. Naively, it would then seem that there is not enough space inside such a point to support a rich enough structure of paths to recover the Galois theory of this field. In modern algebraic geometry, one addresses this issue by considering not just the set-theoretic elements of ${E}$, but more general “base points” ${p: \mathrm{Spec}(b) \rightarrow \mathrm{Spec}(E)}$ that map from some other (affine) scheme ${\mathrm{Spec}(b)}$ to ${\mathrm{Spec}(E)}$ (one could also consider non-affine base points of course). One has to rework many of the fundamentals of the subject to accommodate this “relative point of view“, for instance replacing the usual notion of topology with an étale topology, but once one does so one obtains a very satisfactory theory.

As an exercise, I set myself the task of trying to interpret Galois theory as an analogue of covering space theory in a more classical fashion, without explicit reference to more modern concepts such as schemes, spectra, or étale topology. After some experimentation, I found a reasonably satisfactory way to do so as follows. The space ${M_E}$ that one associates with ${E}$ in this classical perspective is not the single point ${\mathrm{Spec}(E)}$, but instead the much larger space consisting of ring homomorphisms ${p: E \rightarrow b}$ from ${E}$ to arbitrary integral domains ${b}$; informally, ${M_E}$ consists of all the “models” or “representations” of ${E}$ (in the spirit of this previous blog post). (There is a technical set-theoretic issue here because the class of integral domains ${R}$ is a proper class, so that ${M_E}$ will also be a proper class; I will completely ignore such technicalities in this post.) We view each such homomorphism ${p: E \rightarrow b}$ as a single point in ${M_E}$. The analogous notion of a path from one point ${p: E \rightarrow b}$ to another ${p': E \rightarrow b'}$ is then a homomorphism ${\gamma: b \rightarrow b'}$ of integral domains, such that ${p'}$ is the composition of ${p}$ with ${\gamma}$. Note that every prime ideal ${I}$ in the spectrum ${\mathrm{Spec}(R)}$ of a commutative ring ${R}$ gives rise to a point ${p_I}$ in the space ${M_R}$ defined here, namely the quotient map ${p_I: R \rightarrow R/I}$ to the ring ${R/I}$, which is an integral domain because ${I}$ is prime. So one can think of ${\mathrm{Spec}(R)}$ as being a distinguished subset of ${M_R}$; alternatively, one can think of ${M_R}$ as a sort of “penumbra” surrounding ${\mathrm{Spec}(R)}$. In particular, when ${E}$ is a field, the sole prime ideal ${(0)}$ of ${\mathrm{Spec}(E)}$ defines a special point in ${M_E}$, namely the identity homomorphism ${p_E: E \rightarrow E}$.
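As a toy illustration of the assertion that each prime ideal ${I}$ of ${R}$ gives a point ${p_I: R \rightarrow R/I}$ with ${R/I}$ an integral domain, here is a small Python check (mine, not from the original text) with ${R = {\bf Z}/6{\bf Z}}$, whose prime ideals are ${(2)}$ and ${(3)}$:

```python
def is_prime(m):
    """Z/mZ is an integral domain precisely when m is prime."""
    return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))

R = 6   # the ring Z/6Z, standing in for a general commutative ring
# the prime ideals of Z/6Z are (2) and (3); each gives a point p_I: R -> R/I
points = [m for m in range(2, R + 1) if R % m == 0 and is_prime(m)]
print(points)  # prints [2, 3]

# each quotient map x -> x mod m is a ring homomorphism onto the
# integral domain Z/mZ, i.e. a point of M_R in the sense above
for m in points:
    for a in range(R):
        for b in range(R):
            assert ((a + b) % R) % m == ((a % m) + (b % m)) % m
            assert ((a * b) % R) % m == ((a % m) * (b % m)) % m
```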

Below the fold I would like to record this interpretation of Galois theory, by first revisiting the theory of covering spaces using paths as the basic building block, and then adapting that theory to the theory of field extensions using the spaces indicated above. This is not too far from the usual scheme-theoretic way of phrasing the connection between the two topics (basically I have replaced étale-type points ${p: \mathrm{Spec}(b) \rightarrow \mathrm{Spec}(E)}$ with more classical points ${p: E \rightarrow b}$), but I had not seen it explicitly articulated before, so I am recording it here for my own benefit and for any other readers who may be interested.

About six years ago on this blog, I started thinking about trying to make a web-based game based around high-school algebra, and ended up using Scratch to write a short but playable puzzle game in which one solves linear equations for an unknown ${x}$ using a restricted set of moves. (At almost the same time, there were a number of more professionally made games released along similar lines, most notably Dragonbox.)

Since then, I have thought a couple of times about whether there were other parts of mathematics which could be gamified in a similar fashion. Shortly after my first blog posts on this topic, I experimented with a similar gamification of Lewis Carroll’s classic list of logic puzzles, but the result was quite clunky, and I was never satisfied with it.

Over the last few weeks I returned to this topic though, thinking in particular about how to gamify the rules of inference of propositional logic, in a manner that at least vaguely resembles how mathematicians actually go about making logical arguments (e.g., splitting into cases, arguing by contradiction, using previous results as lemmas to help with subsequent ones, and so forth). The rules of inference are a list of a dozen or so deductive rules concerning propositional sentences (things like “(${A}$ AND ${B}$) OR (NOT ${C}$)”, where ${A,B,C}$ are some formulas). A typical such rule is Modus Ponens: if the sentence ${A}$ is known to be true, and the implication “${A}$ IMPLIES ${B}$” is also known to be true, then one can deduce that ${B}$ is also true. Furthermore, in this deductive calculus it is possible to temporarily introduce some unproven statements as an assumption, only to discharge them later. In particular, we have the deduction theorem: if, after making an assumption ${A}$, one is able to derive the statement ${B}$, then one can conclude that the implication “${A}$ IMPLIES ${B}$” is true without any further assumption.
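The mechanics of a rule like Modus Ponens are simple enough to sketch in a few lines of code. The following Python fragment is a toy model of mine (not the actual QED implementation): implications are represented as tuples, and rounds of forward deduction are applied to a set of known sentences:

```python
def modus_ponens(known):
    """One round of Modus Ponens: from A and "A IMPLIES B", deduce B.
    Atomic sentences are strings; implications are ("IMPLIES", A, B)."""
    new = set(known)
    for s in known:
        if isinstance(s, tuple) and s[0] == "IMPLIES" and s[1] in known:
            new.add(s[2])
    return new

# hypotheses: A, "A IMPLIES B", "B IMPLIES C"
known = {"A", ("IMPLIES", "A", "B"), ("IMPLIES", "B", "C")}
known = modus_ponens(known)   # deduces B
known = modus_ponens(known)   # deduces C
print("B" in known and "C" in known)  # prints True
```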

It took a while for me to come up with a workable game-like graphical interface for all of this, but I finally managed to set one up, now using Javascript instead of Scratch (which would be hopelessly inadequate for this task); indeed, part of the motivation of this project was to finally learn how to program in Javascript, which turned out to be not as formidable as I had feared (certainly having experience with other C-like languages like C++, Java, or lua, as well as some prior knowledge of HTML, was very helpful). The main code for this project is available here. Using this code, I have created an interactive textbook in the style of a computer game, which I have titled “QED”. This text contains thirty-odd exercises arranged in twelve sections that function as game “levels”, in which one has to use a given set of rules of inference, together with a given set of hypotheses, to reach a desired conclusion. The set of available rules increases as one advances through the text; in particular, each new section gives one or more rules, and additionally each exercise one solves automatically becomes a new deduction rule one can exploit in later levels, much as lemmas and propositions are used in actual mathematics to prove more difficult theorems. The text automatically tries to match available deduction rules to the sentences one clicks on or drags, to try to minimise the amount of manual input one needs to actually make a deduction.

Most of one’s proof activity takes place in a “root environment” of statements that are known to be true (under the given hypothesis), but for more advanced exercises one has to also work in sub-environments in which additional assumptions are made. I found the graphical metaphor of nested boxes to be useful to depict this tree of sub-environments, and it seems to combine well with the drag-and-drop interface.

The text also logs one’s moves in a more traditional proof format, which shows how the mechanics of the game correspond to a traditional mathematical argument. My hope is that this will give students a way to understand the underlying concept of forming a proof in a manner that is more difficult to achieve using traditional, non-interactive textbooks.

I have tried to organise the exercises in a game-like progression, in which easy early levels train the player on a small number of moves, and more advanced moves are then introduced one at a time. As such, the order in which the rules of inference are introduced is a little idiosyncratic. The most powerful rule (the law of the excluded middle, which is what separates classical logic from intuitionistic logic) is saved for the final section of the text.

Anyway, I am now satisfied enough with the state of the code and the interactive text that I am willing to make both available (and open source; I selected a CC-BY licence for both), and would be happy to receive feedback on any aspect of either. In principle one could extend the game mechanics to mathematical topics other than the propositional calculus – the rules of inference for first-order logic being an obvious next candidate – but it seems to make sense to focus just on propositional logic for now.

Let ${(X,T,\mu)}$ be a measure-preserving system – a probability space ${(X,\mu)}$ equipped with a measure-preserving transformation ${T}$ (which for simplicity of discussion we shall assume to be invertible). We will informally think of two points ${x,y}$ in this space as being “close” if ${y = T^n x}$ for some ${n}$ that is not too large; this allows one to distinguish between “local” structure at a point ${x}$ (in which one only looks at nearby points ${T^n x}$ for moderately large ${n}$) and “global” structure (in which one looks at the entire space ${X}$). The local/global distinction is also known as the time-averaged/space-averaged distinction in ergodic theory.

A measure-preserving system is said to be ergodic if all the invariant sets are either zero measure or full measure. An equivalent form of this statement is that any measurable function ${f: X \rightarrow {\bf R}}$ which is locally essentially constant in the sense that ${f(Tx) = f(x)}$ for ${\mu}$-almost every ${x}$, is necessarily globally essentially constant in the sense that there is a constant ${c}$ such that ${f(x) = c}$ for ${\mu}$-almost every ${x}$. A basic consequence of ergodicity is the mean ergodic theorem: if ${f \in L^2(X,\mu)}$, then the averages ${x \mapsto \frac{1}{N} \sum_{n=1}^N f(T^n x)}$ converge in ${L^2}$ norm to the mean ${\int_X f\ d\mu}$. (The mean ergodic theorem also applies to other ${L^p}$ spaces with ${1 < p < \infty}$, though it is usually proven first in the Hilbert space ${L^2}$.) Informally: in ergodic systems, time averages are asymptotically equal to space averages. Specialising to the case of indicator functions, this implies in particular that ${\frac{1}{N} \sum_{n=1}^N \mu( E \cap T^n E)}$ converges to ${\mu(E)^2}$ for any measurable set ${E}$.
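As a concrete numerical illustration (my own Python sketch, not part of the original text): the rotation ${x \mapsto x + \alpha \hbox{ mod } 1}$ on ${[0,1)}$ with irrational ${\alpha}$ is ergodic for Lebesgue measure, and the time averages of ${f(x) = \cos(2\pi x)}$ indeed approach the space average ${\int_0^1 f\ dx = 0}$:

```python
import math

alpha = math.sqrt(2) - 1          # irrational, so the rotation is ergodic
def T(x):                         # measure-preserving rotation of [0,1)
    return (x + alpha) % 1.0
def f(x):                         # space average: int_0^1 f dmu = 0
    return math.cos(2 * math.pi * x)

x0, N = 0.123, 100_000
total, y = 0.0, x0
for _ in range(N):
    y = T(y)
    total += f(y)
time_avg = total / N
print(abs(time_avg) < 1e-3)  # prints True: time average matches space average
```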

In this short note I would like to use the mean ergodic theorem to show that ergodic systems also have the property that “somewhat locally constant” functions are necessarily “somewhat globally constant”; this is not a deep observation, and probably already in the literature, but I found it a cute statement that I had not previously seen. More precisely:

Corollary 1 Let ${(X,T,\mu)}$ be an ergodic measure-preserving system, and let ${f: X \rightarrow {\bf R}}$ be measurable. Suppose that

$\displaystyle \limsup_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \mu( \{ x \in X: f(T^n x) = f(x) \} ) \geq \delta \ \ \ \ \ (1)$

for some ${0 \leq \delta \leq 1}$. Then there exists a constant ${c}$ such that ${f(x)=c}$ for ${x}$ in a set of measure at least ${\delta}$.

Informally: if ${f}$ is locally constant on pairs ${x, T^n x}$ at least ${\delta}$ of the time, then ${f}$ is globally constant at least ${\delta}$ of the time. Of course the claim fails if the ergodicity hypothesis is dropped, as one can simply take ${f}$ to be an invariant function that is not essentially constant, such as the indicator function of an invariant set of intermediate measure. This corollary can be viewed as a manifestation of the general principle that ergodic systems have the same “global” (or “space-averaged”) behaviour as “local” (or “time-averaged”) behaviour, in contrast to non-ergodic systems in which local properties do not automatically transfer over to their global counterparts.

Proof: By composing ${f}$ with (say) the arctangent function, we may assume without loss of generality that ${f}$ is bounded. Let ${k>0}$, and partition ${X}$ as ${\bigcup_{m \in {\bf Z}} E_{m,k}}$, where ${E_{m,k}}$ is the level set

$\displaystyle E_{m,k} := \{ x \in X: m 2^{-k} \leq f(x) < (m+1) 2^{-k} \}.$

For each ${k}$, only finitely many of the ${E_{m,k}}$ are non-empty. By (1), one has

$\displaystyle \limsup_{N \rightarrow \infty} \sum_m \frac{1}{N} \sum_{n=1}^N \mu( E_{m,k} \cap T^n E_{m,k} ) \geq \delta.$

Using the ergodic theorem, we conclude that

$\displaystyle \sum_m \mu( E_{m,k} )^2 \geq \delta.$

On the other hand, ${\sum_m \mu(E_{m,k}) = 1}$. Thus there exists ${m_k}$ such that ${\mu(E_{m_k,k}) \geq \delta}$; that is,

$\displaystyle \mu( \{ x \in X: m_k 2^{-k} \leq f(x) < (m_k+1) 2^{-k} \} ) \geq \delta.$

Since ${f}$ is bounded, the quantities ${m_k 2^{-k}}$ lie in a bounded set, so by the Bolzano-Weierstrass theorem we may pass to a subsequence along which ${m_k 2^{-k}}$ converges to a limit ${c}$; then we have

$\displaystyle \mu( \{ x \in X: c-2^{-k} \leq f(x) \leq c+2^{-k} \}) \geq \delta$

for infinitely many ${k}$, and hence

$\displaystyle \mu( \{ x \in X: f(x) = c \}) \geq \delta.$

The claim follows. $\Box$
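The crucial pigeonhole step above – that ${\sum_m \mu(E_{m,k})^2 \geq \delta}$ together with ${\sum_m \mu(E_{m,k}) = 1}$ forces some ${\mu(E_{m_k,k}) \geq \delta}$, since ${\sum_m \mu(E_{m,k})^2 \leq \max_m \mu(E_{m,k})}$ – can be sanity-checked numerically; here is a tiny Python demonstration of mine, not part of the original proof:

```python
import random

# the key pigeonhole step: if p_m >= 0, sum_m p_m = 1 and sum_m p_m^2 >= delta,
# then max_m p_m >= delta, since sum_m p_m^2 <= (max_m p_m) * sum_m p_m
random.seed(1)
w = [random.random() for _ in range(10)]
total = sum(w)
p = [x / total for x in w]           # a random probability vector
delta = sum(x * x for x in p)        # this choice of delta always qualifies
print(max(p) >= delta)  # prints True
```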

Let ${G = (G,+)}$, ${H = (H,+)}$ be additive groups (i.e., groups with an abelian addition group law). A map ${f: G \rightarrow H}$ is a homomorphism if one has

$\displaystyle f(x+y) - f(x) - f(y) = 0$

for all ${x,y \in G}$. A map ${f: G \rightarrow H}$ is an affine homomorphism if one has

$\displaystyle f(x_1) - f(x_2) + f(x_3) - f(x_4) = 0 \ \ \ \ \ (1)$

for all additive quadruples ${(x_1,x_2,x_3,x_4)}$ in ${G}$, by which we mean that ${x_1,x_2,x_3,x_4 \in G}$ and ${x_1-x_2+x_3-x_4=0}$. The two notions are closely related; it is easy to verify that ${f}$ is an affine homomorphism if and only if ${f}$ is the sum of a homomorphism and a constant.

Now suppose that ${H}$ also has a translation-invariant metric ${d}$. A map ${f: G \rightarrow H}$ is said to be a quasimorphism if one has

$\displaystyle f(x+y) - f(x) - f(y) = O(1) \ \ \ \ \ (2)$

for all ${x,y \in G}$, where ${O(1)}$ denotes a quantity at a bounded distance from the origin. Similarly, ${f: G \rightarrow H}$ is an affine quasimorphism if

$\displaystyle f(x_1) - f(x_2) + f(x_3) - f(x_4) = O(1) \ \ \ \ \ (3)$

for all additive quadruples ${(x_1,x_2,x_3,x_4)}$ in ${G}$. Again, one can check that ${f}$ is an affine quasimorphism if and only if it is the sum of a quasimorphism and a constant (with the implied constant of the quasimorphism controlled by the implied constant of the affine quasimorphism). (Since every constant is itself a quasimorphism, it is in fact the case that affine quasimorphisms are quasimorphisms, but now the implied constant in the latter is not controlled by the implied constant of the former.)

“Trivial” examples of quasimorphisms include the sum of a homomorphism and a bounded function. Are there others? In some cases, the answer is no. For instance, suppose we have a quasimorphism ${f: {\bf Z} \rightarrow {\bf R}}$. Iterating (2), we see that ${f(kx) = kf(x) + O(k)}$ for any integer ${x}$ and natural number ${k}$, which we can rewrite as ${f(kx)/kx = f(x)/x + O(1/|x|)}$ for non-zero ${x}$. Also, ${f}$ is Lipschitz. Sending ${k \rightarrow \infty}$, we can verify that ${f(x)/x}$ is a Cauchy sequence as ${x \rightarrow \infty}$ and thus tends to some limit ${\alpha}$; we have ${\alpha = f(x)/x + O(1/x)}$ for ${x \geq 1}$, hence ${f(x) = \alpha x + O(1)}$ for positive ${x}$, and then one can use (2) one last time to obtain ${f(x) = \alpha x + O(1)}$ for all ${x}$. Thus ${f}$ is the sum of the homomorphism ${x \mapsto \alpha x}$ and a bounded sequence.
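The argument of the preceding paragraph can be watched numerically. In the following Python sketch (an illustration of mine; the particular quasimorphism is an artificial example), ${f}$ is ${\alpha x}$ plus a bounded perturbation, and ${f(x)/x}$ converges to the slope ${\alpha}$ at rate ${O(1/|x|)}$:

```python
import math, random

random.seed(0)
alpha = math.sqrt(3)
noise = {}

def f(x):
    """An artificial quasimorphism on Z: alpha*x plus a cached random
    perturbation bounded by 1, so that f(x+y) - f(x) - f(y) = O(1)."""
    if x not in noise:
        noise[x] = random.uniform(-1, 1)
    return alpha * x + noise[x]

# f(x)/x recovers the slope alpha of the homomorphism part, with O(1/x) error
errors = [abs(f(x) / x - alpha) for x in (10, 100, 1000, 10000)]
print(errors)  # each error is at most 1/x
```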

In general, one can phrase this problem in the language of group cohomology (discussed in this previous post). Call a map ${f: G \rightarrow H}$ a ${0}$-cocycle. A ${1}$-cocycle is a map ${\rho: G \times G \rightarrow H}$ obeying the identity

$\displaystyle \rho(x,y+z) + \rho(y,z) = \rho(x,y) + \rho(x+y,z)$

for all ${x,y,z \in G}$. Given a ${0}$-cocycle ${f: G \rightarrow H}$, one can form its derivative ${\partial f: G \times G \rightarrow H}$ by the formula

$\displaystyle \partial f(x,y) := f(x+y)-f(x)-f(y).$

Such functions are called ${1}$-coboundaries. It is easy to see that the abelian group of ${1}$-coboundaries is a subgroup of the abelian group of ${1}$-cocycles. The quotient of these two groups is the first group cohomology of ${G}$ with coefficients in ${H}$, and is denoted ${H^1(G; H)}$.
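It is instructive to check directly that every ${1}$-coboundary ${\partial f}$ obeys the ${1}$-cocycle identity: expanding, both sides equal ${f(x+y+z)-f(x)-f(y)-f(z)}$. The following Python snippet (my own, not from the original text) verifies this exhaustively for a random ${f}$ on ${G = {\bf Z}/12{\bf Z}}$ with ${H = {\bf Z}}$:

```python
import random

random.seed(2)
n = 12
G = range(n)                              # the group Z/12Z
f = [random.randrange(100) for _ in G]    # an arbitrary 0-cocycle f: G -> Z

def df(x, y):
    """The 1-coboundary df(x,y) = f(x+y) - f(x) - f(y)."""
    return f[(x + y) % n] - f[x] - f[y]

# every 1-coboundary obeys the 1-cocycle identity
ok = all(
    df(x, (y + z) % n) + df(y, z) == df(x, y) + df((x + y) % n, z)
    for x in G for y in G for z in G
)
print(ok)  # prints True
```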

If a ${0}$-cocycle is bounded then its derivative is a bounded ${1}$-coboundary. The quotient of the group of bounded ${1}$-cocycles by the derivatives of bounded ${0}$-cocycles is called the bounded first group cohomology of ${G}$ with coefficients in ${H}$, and is denoted ${H^1_b(G; H)}$. There is an obvious homomorphism ${\phi}$ from ${H^1_b(G; H)}$ to ${H^1(G; H)}$, formed by taking a coset of the space of derivatives of bounded ${0}$-cocycles, and enlarging it to a coset of the space of ${1}$-coboundaries. By chasing all the definitions, we see that every quasimorphism from ${G}$ to ${H}$ is the sum of a homomorphism and a bounded function if and only if this homomorphism ${\phi}$ is injective; in fact the quotient of the space of quasimorphisms by the sum of homomorphisms and bounded functions is isomorphic to the kernel of ${\phi}$.

In additive combinatorics, one is often working with functions which only have additive structure a fraction of the time, thus for instance (1) or (3) might only hold “${1\%}$ of the time”. This makes it somewhat difficult to directly interpret the situation in terms of group cohomology. However, thanks to tools such as the Balog-Szemerédi-Gowers lemma, one can upgrade this sort of ${1\%}$-structure to ${100\%}$-structure – at the cost of restricting the domain to a smaller set. Here I record one such instance of this phenomenon, thus giving a tentative link between additive combinatorics and group cohomology. (I thank Yuval Wigderson for suggesting the problem of locating such a link.)

Theorem 1 Let ${G = (G,+)}$, ${H = (H,+)}$ be additive groups with ${|G|=N}$, let ${S}$ be a subset of ${H}$, let ${E \subset G}$, and let ${f: E \rightarrow H}$ be a function such that

$\displaystyle f(x_1) - f(x_2) + f(x_3) - f(x_4) \in S$

for ${\geq K^{-1} N^3}$ additive quadruples ${(x_1,x_2,x_3,x_4)}$ in ${E}$. Then there exists a subset ${A}$ of ${G}$ containing ${0}$ with ${|A| \gg K^{-O(1)} N}$, a subset ${X}$ of ${H}$ with ${|X| \ll K^{O(1)}}$, and a function ${g: 4A-4A \rightarrow H}$ such that

$\displaystyle g(x+y) - g(x)-g(y) \in X + 496S - 496S \ \ \ \ \ (4)$

for all ${x, y \in 2A-2A}$ (thus, the derivative ${\partial g}$ takes values in ${X + 496 S - 496 S}$ on ${2A - 2A}$), and such that for each ${h \in A}$, one has

$\displaystyle f(x+h) - f(x) - g(h) \in 8S - 8S \ \ \ \ \ (5)$

for ${\gg K^{-O(1)} N}$ values of ${x \in E}$.

Presumably the constants ${8}$ and ${496}$ can be improved further, but we have not attempted to optimise these constants. We chose ${2A-2A}$ as the domain on which one has a bounded derivative, as one can use the Bogolyubov lemma (see e.g. Proposition 4.39 of my book with Van Vu) to find a large Bohr set inside ${2A-2A}$. In applications, the set ${S}$ need not have bounded size, or even bounded doubling; for instance, in the inverse ${U^4}$ theory over a small finite field ${F}$, one would be interested in the situation where ${H}$ is the group of ${n \times n}$ matrices with coefficients in ${F}$ (for some large ${n}$), and ${S}$ is the subset consisting of those matrices of rank bounded by some bound ${C = O(1)}$.
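To get a concrete feel for the hypothesis of Theorem 1, here is a brute-force count in the model case ${G = H = {\bf Z}/N{\bf Z}}$, ${E = G}$, with ${f}$ a homomorphism perturbed at a single point (the parameters here are arbitrary choices of mine, and I take an "additive quadruple" to mean a quadruple with ${x_1 - x_2 + x_3 - x_4 = 0}$):

```python
# Brute-force illustration of the hypothesis of Theorem 1 in a toy case:
# G = H = Z/20Z, E = G, f = a homomorphism with a single defect, and S a
# small set around 0.  Almost all of the N^3 additive quadruples then land in S.
import itertools

N = 20
S = {-1 % N, 0, 1}                                  # small "error" set in H
f = lambda x: (3 * x + (1 if x == 5 else 0)) % N    # homomorphism + one defect

count = total = 0
for x1, x2, x3 in itertools.product(range(N), repeat=3):
    x4 = (x1 - x2 + x3) % N                         # forced by x1-x2+x3-x4 = 0
    total += 1
    if (f(x1) - f(x2) + f(x3) - f(x4)) % N in S:
        count += 1

print(f"{count}/{total} additive quadruples land in S")
```

With a single defect in ${f}$, all but ${O(N^2)}$ of the ${N^3}$ quadruples land in ${S}$, so the hypothesis holds with ${K}$ close to ${1}$.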

Proof: By hypothesis, there are ${\geq K^{-1} N^3}$ triples ${(h,x,y) \in G^3}$ such that ${x,x+h,y,y+h \in E}$ and

$\displaystyle f(x+h) - f(x) \in f(y+h)-f(y) + S. \ \ \ \ \ (6)$

Thus, there is a set ${B \subset G}$ with ${|B| \gg K^{-1} N}$ such that for all ${h \in B}$, one has (6) for ${\gg K^{-1} N^2}$ pairs ${(x,y) \in G^2}$ with ${x,x+h,y,y+h \in E}$; in particular, there exists ${y = y(h) \in E \cap (E-h)}$ such that (6) holds for ${\gg K^{-1} N}$ values of ${x \in E \cap (E-h)}$. Setting ${g_0(h) := f(y(h)+h) - f(y(h))}$, we conclude that for each ${h \in B}$, one has

$\displaystyle f(x+h) - f(x) \in g_0(h) + S \ \ \ \ \ (7)$

for ${\gg K^{-1} N}$ values of ${x \in E \cap (E-h)}$.

Consider the bipartite graph whose vertex sets are two copies of ${E}$, and ${x}$ and ${x+h}$ connected by a (directed) edge if ${h \in B}$ and (7) holds. Then this graph has ${\gg K^{-2} N^2}$ edges. Applying (a slight modification of) the Balog-Szemerédi-Gowers theorem (for instance by modifying the proof of Corollary 5.19 of my book with Van Vu), we can then find a subset ${C}$ of ${E}$ with ${|C| \gg K^{-O(1)} N}$ with the property that for any ${x_1,x_3 \in C}$, there exist ${\gg K^{-O(1)} N^3}$ triples ${(x_2,y_1,y_2) \in E^3}$ such that the edges ${(x_1,y_1), (x_2,y_1), (x_2,y_2), (x_3,y_2)}$ all lie in this bipartite graph. This implies that, for all ${x_1,x_3 \in C}$, there exist ${\gg K^{-O(1)} N^7}$ septuples ${(x_2,y_1,y_2,z_{11},z_{21},z_{22},z_{32}) \in G^7}$ obeying the constraints

$\displaystyle f(y_j) - f(x_i), f(y_j+z_{ij}) - f(x_i+z_{ij}) \in g_0(y_j-x_i) + S$

and ${y_j, x_i, y_j+z_{ij}, x_i+z_{ij} \in E}$ for ${ij = 11, 21, 22, 32}$. These constraints imply in particular that

$\displaystyle f(x_3) - f(x_1) \in f(x_3+z_{32}) - f(y_2+z_{32}) + f(y_2+z_{22}) - f(x_2+z_{22}) + f(x_2+z_{21}) - f(y_1+z_{21}) + f(y_1+z_{11}) - f(x_1+z_{11}) + 4S - 4S.$

Also observe that

$\displaystyle x_3 - x_1 = (x_3+z_{32}) - (y_2+z_{32}) + (y_2+z_{22}) - (x_2+z_{22}) + (x_2+z_{21}) - (y_1+z_{21}) + (y_1+z_{11}) - (x_1+z_{11}).$

Thus, if ${h \in G}$ and ${x_3,x_1 \in C}$ are such that ${x_3-x_1 = h}$, we see that

$\displaystyle f(w_1) - f(w_2) + f(w_3) - f(w_4) + f(w_5) - f(w_6) + f(w_7) - f(w_8) \in f(x_3) - f(x_1) + 4S - 4S$

for ${\gg K^{-O(1)} N^7}$ octuples ${(w_1,w_2,w_3,w_4,w_5,w_6,w_7,w_8) \in E^8}$ in the hyperplane

$\displaystyle h = w_1 - w_2 + w_3 - w_4 + w_5 - w_6 + w_7 - w_8.$

By the pigeonhole principle, this implies that for any fixed ${h \in G}$, there can be at most ${O(K^{O(1)})}$ sets of the form ${f(x_3)-f(x_1) + 4S-4S}$ with ${x_3-x_1=h}$, ${x_1,x_3 \in C}$ that are pairwise disjoint. Using a greedy algorithm, we conclude that there is a set ${W_h}$ of cardinality ${O(K^{O(1)})}$, such that each set ${f(x_3) - f(x_1) + 4S-4S}$ with ${x_3-x_1=h}$, ${x_1,x_3 \in C}$ intersects ${w+4S -4S}$ for some ${w \in W_h}$, or in other words that

$\displaystyle f(x_3) - f(x_1) \in W_{x_3-x_1} + 8S-8S \ \ \ \ \ (8)$

whenever ${x_1,x_3 \in C}$. In particular,

$\displaystyle \sum_{h \in G} \sum_{w \in W_h} | \{ (x_1,x_3) \in C^2: x_3-x_1 = h; f(x_3) - f(x_1) \in w + 8S-8S \}| \geq |C|^2 \gg K^{-O(1)} N^2.$

This implies that there exists a subset ${A}$ of ${G}$ with ${|A| \gg K^{-O(1)} N}$, and an element ${g_1(h) \in W_h}$ for each ${h \in A}$, such that

$\displaystyle | \{ (x_1,x_3) \in C^2: x_3-x_1 = h; f(x_3) - f(x_1) \in g_1(h) + 8S-8S \}| \gg K^{-O(1)} N \ \ \ \ \ (9)$

for all ${h \in A}$. Note we may assume without loss of generality that ${0 \in A}$ and ${g_1(0)=0}$.

Suppose that ${h_1,\dots,h_{16} \in A}$ are such that

$\displaystyle \sum_{i=1}^{16} (-1)^{i-1} h_i = 0. \ \ \ \ \ (10)$

By construction of ${A}$, and permuting labels, we can find ${\gg K^{-O(1)} N^{16}}$ tuples ${(x_1,\dots,x_{16},y_1,\dots,y_{16}) \in C^{32}}$ such that

$\displaystyle y_i - x_i = (-1)^{i-1} h_i$

and

$\displaystyle f(y_i) - f(x_i) \in (-1)^{i-1} g_1(h_i) + 8S - 8S$

for ${i=1,\dots,16}$. We sum this to obtain

$\displaystyle f(y_1) + \sum_{i=1}^{15} (f(y_{i+1})-f(x_i)) - f(x_{16}) \in \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 128 S - 128 S$

and hence by (8)

$\displaystyle f(y_1) - f(x_{16}) + \sum_{i=1}^{15} W_{k_i} \in \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 248 S - 248 S$

where ${k_i := y_{i+1}-x_i}$. Since

$\displaystyle y_1 - x_{16} + \sum_{i=1}^{15} k_i = 0$

we see that there are only ${N^{16}}$ possible values of ${(y_1,x_{16},k_1,\dots,k_{15})}$. By the pigeonhole principle, we conclude that at most ${O(K^{O(1)})}$ of the sets ${\sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 248 S - 248 S}$ can be disjoint. Arguing as before, we conclude that there exists a set ${X}$ of cardinality ${O(K^{O(1)})}$ such that

$\displaystyle \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) \in X + 496 S - 496 S \ \ \ \ \ (11)$

whenever (10) holds.

For any ${h \in 4A-4A}$, write ${h}$ arbitrarily as ${h = \sum_{i=1}^8 (-1)^{i-1} h_i}$ for some ${h_1,\dots,h_8 \in A}$ (with ${h_5=\dots=h_8=0}$ if ${h \in 2A-2A}$, and ${h_2 = \dots = h_8 = 0}$ if ${h \in A}$) and then set

$\displaystyle g(h) := \sum_{i=1}^8 (-1)^{i-1} g_1(h_i).$

Then from (11) we have (4). For ${h \in A}$ we have ${g(h) = g_1(h)}$, and (5) then follows from (9). $\Box$

This is a sequel to this previous blog post, in which we discussed the effect of the heat flow evolution

$\displaystyle \partial_t P(t,z) = \partial_{zz} P(t,z)$

on the zeroes of a time-dependent family of polynomials ${z \mapsto P(t,z)}$, with a particular focus on the case when the polynomials ${z \mapsto P(t,z)}$ had real zeroes. Here (inspired by some discussions I had during a recent conference on the Riemann hypothesis in Bristol) we record the analogous theory in which the polynomials instead have zeroes on a circle ${\{ z: |z| = \sqrt{q} \}}$, with the heat flow slightly adjusted to compensate for this. As we shall discuss shortly, a key example of this situation arises when ${P}$ is the numerator of the zeta function of a curve.

More precisely, let ${g}$ be a natural number. We will say that a polynomial

$\displaystyle P(z) = \sum_{j=0}^{2g} a_j z^j$

of degree ${2g}$ (so that ${a_{2g} \neq 0}$) obeys the functional equation if the ${a_j}$ are all real and

$\displaystyle a_j = q^{g-j} a_{2g-j}$

for all ${j=0,\dots,2g}$, thus

$\displaystyle P(\overline{z}) = \overline{P(z)}$

and

$\displaystyle P(q/z) = q^g z^{-2g} P(z)$

for all non-zero ${z}$. This means that the ${2g}$ zeroes ${\alpha_1,\dots,\alpha_{2g}}$ of ${P(z)}$ (counting multiplicity) lie in ${{\bf C} \backslash \{0\}}$ and are symmetric with respect to complex conjugation ${z \mapsto \overline{z}}$ and inversion ${z \mapsto q/z}$ across the circle ${\{ |z| = \sqrt{q}\}}$. We say that this polynomial obeys the Riemann hypothesis if all of its zeroes actually lie on the circle ${\{ |z| = \sqrt{q}\}}$. For instance, in the ${g=1}$ case, the polynomial ${z^2 - a_1 z + q}$ obeys the Riemann hypothesis if and only if ${|a_1| \leq 2\sqrt{q}}$.
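The ${g=1}$ criterion is easy to confirm numerically; the following stdlib-only sketch (with arbitrarily chosen parameter values) checks it on both sides of the threshold:

```python
# Quick numerical check of the g = 1 criterion: the zeroes of
# z^2 - a1*z + q lie on the circle |z| = sqrt(q) precisely when
# |a1| <= 2*sqrt(q).  (Stdlib only; parameter values are arbitrary.)
import cmath
import math

def zeroes(a1, q):
    disc = cmath.sqrt(a1 * a1 - 4 * q)
    return (a1 + disc) / 2, (a1 - disc) / 2

q = 7.0
for a1 in (0.0, 3.0, 5.0, 6.0):        # the threshold 2*sqrt(7) is ~ 5.29
    on_circle = all(abs(abs(z) - math.sqrt(q)) < 1e-9 for z in zeroes(a1, q))
    assert on_circle == (abs(a1) <= 2 * math.sqrt(q))
print("RH for the quadratic holds exactly when |a1| <= 2 sqrt(q)")
```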

Such polynomials arise in number theory as follows: if ${C}$ is a projective curve of genus ${g}$ over a finite field ${\mathbf{F}_q}$, then, as famously proven by Weil, the associated local zeta function ${\zeta_{C,q}(z)}$ (as defined for instance in this previous blog post) is known to take the form

$\displaystyle \zeta_{C,q}(z) = \frac{P(z)}{(1-z)(1-qz)}$

where ${P}$ is a degree ${2g}$ polynomial obeying both the functional equation and the Riemann hypothesis. In the case that ${C}$ is an elliptic curve, then ${g=1}$ and ${P}$ takes the form ${P(z) = z^2 - a_1 z + q}$, where ${a_1}$ is the number of ${{\bf F}_q}$-points of ${C}$ minus ${q+1}$. The Riemann hypothesis in this case is a famous result of Hasse.

Another key example of such polynomials arise from rescaled characteristic polynomials

$\displaystyle P(z) := \det( z - \sqrt{q} F ) \ \ \ \ \ (1)$

of ${2g \times 2g}$ matrices ${F}$ in the compact symplectic group ${Sp(g)}$. These polynomials obey both the functional equation and the Riemann hypothesis. The Sato-Tate conjecture (in higher genus) asserts, roughly speaking, that “typical” polynomials ${P}$ arising from the number theoretic situation above are distributed like the rescaled characteristic polynomials (1), where ${F}$ is drawn uniformly from ${Sp(g)}$ with Haar measure.

Given a polynomial ${z \mapsto P(0,z)}$ of degree ${2g}$ with coefficients

$\displaystyle P(0,z) = \sum_{j=0}^{2g} a_j(0) z^j,$

we can evolve it in time by the formula

$\displaystyle P(t,z) = \sum_{j=0}^{2g} \exp( t(j-g)^2 ) a_j(0) z^j,$

thus ${a_j(t) = \exp(t(j-g)^2) a_j(0)}$ for ${t \in {\bf R}}$. Informally, as one increases ${t}$, this evolution accentuates the effect of the extreme monomials ${z^0}$ and ${z^{2g}}$ at the expense of the intermediate monomials such as ${z^g}$, and conversely as one decreases ${t}$. This family of polynomials obeys the heat-type equation

$\displaystyle \partial_t P(t,z) = (z \partial_z - g)^2 P(t,z). \ \ \ \ \ (2)$

In view of the results of Marcus, Spielman, and Srivastava, it is also very likely that one can interpret this flow in terms of expected characteristic polynomials involving conjugation over the compact symplectic group ${Sp(g)}$, and should also be tied to some sort of “${\beta=\infty}$” version of Brownian motion on this group, but we have not attempted to work this connection out in detail.
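The evolution ${a_j(t) = \exp(t(j-g)^2) a_j(0)}$ is easy to simulate directly; the following small experiment (numpy assumed available, with random initial data subject to the functional equation) lets one watch the zeroes migrate to the circle ${|z| = \sqrt{q}}$ as ${t}$ increases:

```python
# Numerical experiment (numpy assumed available): evolve the coefficients
# by a_j(t) = exp(t (j-g)^2) a_j(0), with random initial data obeying the
# functional equation a_j = q^{g-j} a_{2g-j}, and track the zero moduli.
import numpy as np

q, g = 3.0, 2
rng = np.random.default_rng(0)
a0 = rng.standard_normal(2 * g + 1)
for j in range(g):                        # enforce the functional equation
    a0[j] = q ** (g - j) * a0[2 * g - j]

def roots_at(t):
    a_t = np.array([np.exp(t * (j - g) ** 2) * a0[j] for j in range(2 * g + 1)])
    return np.roots(a_t[::-1])            # np.roots expects highest degree first

for t in (0.0, 1.0, 4.0):
    print(f"t = {t}: |zeroes| =", np.round(np.abs(roots_at(t)), 4))
# as t increases, the moduli approach sqrt(q) ~ 1.7321
```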

It is clear that if ${z \mapsto P(0,z)}$ obeys the functional equation, then so does ${z \mapsto P(t,z)}$ for any other time ${t}$. Now we investigate the evolution of the zeroes. Suppose that at some time ${t_0}$ the zeroes ${\alpha_1(t_0),\dots,\alpha_{2g}(t_0)}$ of ${z \mapsto P(t_0,z)}$ are distinct; then

$\displaystyle P(t_0,z) = a_{2g}(0) \exp( t_0g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t_0) ).$

From the inverse function theorem we see that for times ${t}$ sufficiently close to ${t_0}$, the zeroes ${\alpha_1(t),\dots,\alpha_{2g}(t)}$ of ${z \mapsto P(t,z)}$ continue to be distinct (and vary smoothly in ${t}$), with

$\displaystyle P(t,z) = a_{2g}(0) \exp( t g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t) ).$

Differentiating this at any ${z}$ not equal to any of the ${\alpha_j(t)}$, we obtain

$\displaystyle \partial_t P(t,z) = P(t,z) ( g^2 - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)})$

and

$\displaystyle \partial_z P(t,z) = P(t,z) ( \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)})$

and

$\displaystyle \partial_{zz} P(t,z) = P(t,z) ( \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}).$

Inserting these formulae into (2) (expanding ${(z \partial_z - g)^2}$ as ${z^2 \partial_{zz} - (2g-1) z \partial_z + g^2}$) and canceling some terms, we conclude that

$\displaystyle - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)} = z^2 \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}$

$\displaystyle - (2g-1) z \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)}$

for ${t}$ sufficiently close to ${t_0}$, and ${z}$ not equal to ${\alpha_1(t),\dots,\alpha_{2g}(t)}$. Extracting the residue at ${z = \alpha_j(t)}$, we conclude that

$\displaystyle - \alpha'_j(t) = 2 \alpha_j(t)^2 \sum_{1 \leq k \leq 2g: k \neq j} \frac{1}{\alpha_j(t) - \alpha_k(t)} - (2g-1) \alpha_j(t)$

which we can rearrange as

$\displaystyle \frac{\alpha'_j(t)}{\alpha_j(t)} = - \sum_{1 \leq k \leq 2g: k \neq j} \frac{\alpha_j(t)+\alpha_k(t)}{\alpha_j(t)-\alpha_k(t)}.$

If we make the change of variables ${\alpha_j(t) = \sqrt{q} e^{i\theta_j(t)}}$ (noting that one can make ${\theta_j}$ depend smoothly on ${t}$ for ${t}$ sufficiently close to ${t_0}$), this becomes

$\displaystyle \partial_t \theta_j(t) = \sum_{1 \leq k \leq 2g: k \neq j} \cot \frac{\theta_j(t) - \theta_k(t)}{2}. \ \ \ \ \ (3)$

Intuitively, this equation asserts that the phases ${\theta_j}$ repel each other if they are real (and attract each other if their difference is imaginary). If ${z \mapsto P(t_0,z)}$ obeys the Riemann hypothesis, so that the ${\theta_j}$ are all real at time ${t_0}$, then the Picard uniqueness theorem (applied to ${\theta_j(t)}$ and its complex conjugate) shows that the ${\theta_j}$ remain real for ${t}$ sufficiently close to ${t_0}$. If we then define the entropy functional

$\displaystyle H(\theta_1,\dots,\theta_{2g}) := \sum_{1 \leq j < k \leq 2g} \log \frac{1}{|\sin \frac{\theta_j-\theta_k}{2}| }$

then the above equation becomes a gradient flow

$\displaystyle \partial_t \theta_j(t) = - 2 \frac{\partial H}{\partial \theta_j}( \theta_1(t),\dots,\theta_{2g}(t) )$

which implies in particular that ${H(\theta_1(t),\dots,\theta_{2g}(t))}$ is non-increasing in time. This shows that as one evolves time forward from ${t_0}$, there is a uniform lower bound on the separation between the phases ${\theta_1(t),\dots,\theta_{2g}(t)}$, and hence the equation can be solved indefinitely; in particular, ${z \mapsto P(t,z)}$ obeys the Riemann hypothesis for all ${t > t_0}$ if it does so at time ${t_0}$. Our argument here assumed that the zeroes of ${z \mapsto P(t_0,z)}$ were simple, but this assumption can be removed by the usual limiting argument.
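One can also observe this gradient-flow structure numerically; a crude forward-Euler discretisation of (3) (with step size and initial phases chosen arbitrarily here) exhibits the monotone decay of the entropy ${H}$:

```python
# Forward-Euler sketch of the phase flow (3) for 2g = 4 phases, checking
# that the entropy H is non-increasing along the flow.  Step size and
# initial phases are arbitrary choices for illustration.
import math

def velocity(theta):
    # right-hand side of (3)
    n = len(theta)
    return [sum(1.0 / math.tan((theta[j] - theta[k]) / 2)
                for k in range(n) if k != j) for j in range(n)]

def entropy(theta):
    n = len(theta)
    return sum(-math.log(abs(math.sin((theta[j] - theta[k]) / 2)))
               for j in range(n) for k in range(j + 1, n))

theta = [0.0, 1.3, 3.1, 4.9]    # arbitrary distinct phases
dt = 1e-3
H = [entropy(theta)]
for _ in range(200):
    v = velocity(theta)
    theta = [t + dt * vj for t, vj in zip(theta, v)]
    H.append(entropy(theta))

assert all(H[i + 1] <= H[i] + 1e-9 for i in range(len(H) - 1))
print("entropy decayed monotonically:", round(H[0], 4), "->", round(H[-1], 4))
```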

For any polynomial ${z \mapsto P(0,z)}$ obeying the functional equation, the rescaled polynomials ${z \mapsto e^{-g^2 t} P(t,z)}$ converge locally uniformly to ${a_{2g}(0) (z^{2g} + q^g)}$ as ${t \rightarrow +\infty}$. By Rouché's theorem, we conclude that the zeroes of ${z \mapsto P(t,z)}$ converge to the equally spaced points ${\{ \sqrt{q} e^{2\pi i(j+1/2)/2g}: j=1,\dots,2g\}}$ on the circle ${\{ |z| = \sqrt{q}\}}$. Together with the symmetry properties of the zeroes, this implies in particular that ${z \mapsto P(t,z)}$ obeys the Riemann hypothesis for all sufficiently large positive ${t}$. In the opposite direction, when ${t \rightarrow -\infty}$, the polynomials ${z \mapsto P(t,z)}$ converge locally uniformly to ${a_g(0) z^g}$, so if ${a_g(0) \neq 0}$, ${g}$ of the zeroes converge to the origin and the other ${g}$ converge to infinity. In particular, ${z \mapsto P(t,z)}$ fails the Riemann hypothesis for sufficiently large negative ${t}$. Thus (if ${a_g(0) \neq 0}$), there must exist a real number ${\Lambda}$, which we call the de Bruijn-Newman constant of the original polynomial ${z \mapsto P(0,z)}$, such that ${z \mapsto P(t,z)}$ obeys the Riemann hypothesis for ${t \geq \Lambda}$ and fails the Riemann hypothesis for ${t < \Lambda}$. The situation is a bit more complicated if ${a_g(0)}$ vanishes; if ${k}$ is the first natural number such that ${a_{g+k}(0)}$ (or equivalently, ${a_{g-k}(0)}$) does not vanish, then by the above arguments one finds in the limit ${t \rightarrow -\infty}$ that ${g-k}$ of the zeroes go to the origin, ${g-k}$ go to infinity, and the remaining ${2k}$ zeroes converge to the equally spaced points ${\{ \sqrt{q} e^{2\pi i(j+1/2)/2k}: j=1,\dots,2k\}}$ on the circle ${\{ |z| = \sqrt{q}\}}$. In this case the de Bruijn-Newman constant remains finite except in the degenerate case ${k=g}$, in which case ${\Lambda = -\infty}$.

For instance, consider the case when ${g=1}$ and ${P(0,z) = z^2 - a_1 z + q}$ for some real ${a_1}$ with ${|a_1| \leq 2\sqrt{q}}$. Then the quadratic polynomial

$\displaystyle P(t,z) = e^t z^2 - a_1 z + e^t q$

has zeroes

$\displaystyle \frac{a_1 \pm \sqrt{a_1^2 - 4 e^{2t} q}}{2e^t}$

and one easily checks that these zeroes lie on the circle ${\{ |z|=\sqrt{q}\}}$ when ${t \geq \log \frac{|a_1|}{2\sqrt{q}}}$, and are on the real axis otherwise. Thus in this case we have ${\Lambda = \log \frac{|a_1|}{2\sqrt{q}}}$ (with ${\Lambda=-\infty}$ if ${a_1=0}$). Note how as ${t}$ increases to ${+\infty}$, the zeroes repel each other and eventually converge to ${\pm i \sqrt{q}}$, while as ${t}$ decreases to ${-\infty}$, the zeroes collide and then separate on the real axis, with one zero going to the origin and the other to infinity.
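This closed-form value of ${\Lambda}$ is straightforward to confirm numerically (stdlib only; the parameter choices below are arbitrary, subject to ${|a_1| \leq 2\sqrt{q}}$):

```python
# Numerical confirmation of Lambda = log(|a1| / (2 sqrt(q))) for the
# quadratic example P(t,z) = e^t z^2 - a1 z + e^t q.
import cmath
import math

q, a1 = 5.0, 3.0
Lam = math.log(abs(a1) / (2 * math.sqrt(q)))   # predicted de Bruijn-Newman constant

def obeys_rh(t, tol=1e-9):
    # zeroes of e^t z^2 - a1 z + e^t q; both have the same modulus
    disc = cmath.sqrt(a1 * a1 - 4 * math.exp(2 * t) * q)
    z = (a1 + disc) / (2 * math.exp(t))
    return abs(abs(z) - math.sqrt(q)) < tol

assert obeys_rh(Lam + 0.1) and not obeys_rh(Lam - 0.1)
print("RH holds just above t =", round(Lam, 4), "and fails just below")
```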

The arguments in my paper with Brad Rodgers (discussed in this previous post) indicate that for a “typical” polynomial ${P}$ of degree ${2g}$ that obeys the Riemann hypothesis, the expected time to relaxation to equilibrium (in which the zeroes are equally spaced) should be comparable to ${1/g}$, basically because the average spacing is ${1/g}$ and hence by (3) the typical velocity of the zeroes should be comparable to ${g}$, and the diameter of the unit circle is comparable to ${1}$, thus requiring time comparable to ${1/g}$ to reach equilibrium. Taking contrapositives, this suggests that the de Bruijn-Newman constant ${\Lambda}$ should typically take on values comparable to ${-1/g}$ (since typically one would not expect the initial configuration of zeroes to be close to evenly spaced). I have not attempted to formalise or prove this claim, but presumably one could do some numerics (perhaps using some of the examples of ${P}$ given previously) to explore this further.

In this post we assume the Riemann hypothesis and the simplicity of zeroes, thus the zeroes of ${\zeta}$ in the critical strip take the form ${\frac{1}{2} \pm i \gamma_j}$ for some real ordinates ${0 < \gamma_1 < \gamma_2 < \dots}$. From the Riemann-von Mangoldt formula, one has the asymptotic

$\displaystyle \gamma_n = (1+o(1)) \frac{2\pi}{\log n} n$

as ${n \rightarrow \infty}$; in particular, the spacing ${\gamma_{n+1} - \gamma_n}$ should behave like ${\frac{2\pi}{\log n}}$ on the average. However, it can happen that some gaps are unusually small compared to other nearby gaps. For the sake of concreteness, let us define a Lehmer pair to be a pair of adjacent ordinates ${\gamma_n, \gamma_{n+1}}$ such that

$\displaystyle \frac{1}{(\gamma_{n+1} - \gamma_n)^2} \geq 1.3 \sum_{m \neq n,n+1} \frac{1}{(\gamma_m - \gamma_n)^2} + \frac{1}{(\gamma_m - \gamma_{n+1})^2}. \ \ \ \ \ (1)$

The specific value of the constant ${1.3}$ is not particularly important here; anything larger than ${\frac{5}{4}}$ would suffice. An example of such a pair would be the classical pair

$\displaystyle \gamma_{6709} = 7005.062866\dots$

$\displaystyle \gamma_{6710} = 7005.100564\dots$

discovered by Lehmer. It follows easily from the main results of Csordas, Smith, and Varga that if there were infinitely many Lehmer pairs (in the above sense), then the de Bruijn-Newman constant ${\Lambda}$ would be non-negative. This implication is now redundant in view of the unconditional results of this recent paper of Rodgers and myself; however, the question of whether an infinite number of Lehmer pairs exist remains open.
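For exploration purposes, the defining condition (1) is straightforward to test on a finite list of ordinates, truncating the sum on the right-hand side to the list; the data in the sketch below is synthetic (not actual zeta ordinates), just to exercise the definition:

```python
# Testing the Lehmer-pair condition (1) on a finite list of ordinates,
# truncating the sum to the list.  Synthetic data, NOT zeta ordinates.
def lehmer_pair(gammas, n, c=1.3):
    lhs = 1.0 / (gammas[n + 1] - gammas[n]) ** 2
    rhs = c * sum(1.0 / (gammas[m] - gammas[n]) ** 2
                  + 1.0 / (gammas[m] - gammas[n + 1]) ** 2
                  for m in range(len(gammas)) if m not in (n, n + 1))
    return lhs >= rhs

gammas = [1.0, 2.0, 3.0, 4.0, 4.01, 5.0, 6.0, 7.0]   # one unusually close pair
assert lehmer_pair(gammas, 3)        # the gap 4.01 - 4.0 qualifies
assert not lehmer_pair(gammas, 1)    # a typical unit gap does not
print("Lehmer pair detected at index 3")
```

(On real data one would of course also have to account for the tail of the sum outside the list, which is what the ${\frac{1}{6\varepsilon^2}}$-type terms below handle.)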

In this post, I sketch an argument, which Brad and I came up with (as initially suggested by Odlyzko), that the GUE hypothesis implies the existence of infinitely many Lehmer pairs. We argue probabilistically: pick a sufficiently large number ${T}$, pick ${n}$ at random from ${T \log T}$ to ${2 T \log T}$ (so that the average gap size is close to ${\frac{2\pi}{\log T}}$), and prove that the Lehmer pair condition (1) occurs with positive probability.

Introduce the renormalised ordinates ${x_n := \frac{\log T}{2\pi} \gamma_n}$ for ${T \log T \leq n \leq 2 T \log T}$, and let ${\varepsilon > 0}$ be a small absolute constant (independent of ${T}$). It will then suffice to show that

$\displaystyle \frac{1}{(x_{n+1} - x_n)^2} \geq$

$\displaystyle 1.3 \sum_{m \in [T \log T, 2T \log T]: m \neq n,n+1} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2}$

$\displaystyle + \frac{1}{6\varepsilon^2}$

(say) with probability ${\gg \varepsilon^4 - o(1)}$, since the contribution of those ${m}$ outside of ${[T \log T, 2T \log T]}$ can be absorbed into the ${\frac{1}{6\varepsilon^2}}$ term outside of an event of probability ${o(1)}$.

As one consequence of the GUE hypothesis, we have ${x_{n+1} - x_n \leq \varepsilon^2}$ with probability ${O(\varepsilon^6)}$. Thus, if ${E := \{ m \in [T \log T, 2T \log T]: x_{m+1} - x_m \leq \varepsilon^2 \}}$, then ${E}$ has density ${O( \varepsilon^6 )}$. Applying the Hardy-Littlewood maximal inequality, we see that outside of an event of probability ${O(\varepsilon^6)}$, we have

$\displaystyle \sup_{h \geq 1} \frac{1}{2h} | E \cap [n-h, n+h] | \leq \frac{1}{10}$

which implies in particular that

$\displaystyle |x_m - x_n|, |x_{m} - x_{n+1}| \gg \varepsilon^2 |m-n|$

for all ${m \in [T \log T, 2 T \log T] \backslash \{ n, n+1\}}$. This implies in particular that

$\displaystyle \sum_{m \in [T \log T, 2T \log T]: |m-n| \geq \varepsilon^{-3}} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2} \ll \varepsilon^{-1}$

and so it will suffice to show that

$\displaystyle \frac{1}{(x_{n+1} - x_n)^2}$

$\displaystyle \geq 1.3 \sum_{m \in [T \log T, 2T \log T]: m \neq n,n+1; |m-n| < \varepsilon^{-3}} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2} + \frac{1}{5\varepsilon^2}$

(say) with probability ${\gg \varepsilon^4 - o(1)}$.

By the GUE hypothesis (and the fact that ${\varepsilon}$ is independent of ${T}$), it suffices to show that a Dyson sine process ${(x_n)_{n \in {\bf Z}}}$, normalised so that ${x_0}$ is the first positive point in the process, obeys the inequality

$\displaystyle \frac{1}{(x_{1} - x_0)^2} \geq 1.3 \sum_{|m| < \varepsilon^{-3}: m \neq 0,1} \frac{1}{(x_m - x_0)^2} + \frac{1}{(x_m - x_1)^2} \ \ \ \ \ (2)$

with probability ${\gg \varepsilon^4}$. However, if we let ${A > 0}$ be a moderately large constant (and assume ${\varepsilon}$ small depending on ${A}$), one can show using ${k}$-point correlation functions for the Dyson sine process (and the fact that the Dyson kernel ${K(x,y) = \sin(\pi(x-y))/\pi(x-y)}$ equals ${1}$ to second order at the origin) that

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} \gg \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} \binom{N_{[0,\varepsilon]}}{2} \ll \varepsilon^7$

$\displaystyle {\bf E} \binom{N_{[-\varepsilon,0]}}{2} N_{[0,\varepsilon]} \ll \varepsilon^7$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[\varepsilon,A^{-1}]} \ll A^{-3} \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[-A^{-1}, -\varepsilon]} \ll A^{-3} \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[-k, k]}^2 \ll k^2 \varepsilon^4 \ \ \ \ \ (3)$

for any natural number ${k}$, where ${N_{I}}$ denotes the number of elements of the process in ${I}$. For instance, the expression ${{\bf E} N_{[-\varepsilon,0]} \binom{N_{[0,\varepsilon]}}{2} }$ can be written in terms of the three-point correlation function ${\rho_3(x_1,x_2,x_3) = \mathrm{det}(K(x_i,x_j))_{1 \leq i,j \leq 3}}$ as

$\displaystyle \int_{-\varepsilon \leq x_1 \leq 0 \leq x_2 \leq x_3 \leq \varepsilon} \rho_3( x_1, x_2, x_3 )\ dx_1 dx_2 dx_3$

which can easily be estimated to be ${O(\varepsilon^7)}$ (since ${\rho_3 = O(\varepsilon^4)}$ in this region), and similarly for the other estimates claimed above.

Since for natural numbers ${a,b}$, the quantity ${ab - 2 a \binom{b}{2} - 2 b \binom{a}{2} = ab (3-a-b)}$ is only positive when ${a=b=1}$, we see from the first three estimates that the event ${E}$ that ${N_{[-\varepsilon,0]} = N_{[0,\varepsilon]} = 1}$ occurs with probability ${\gg \varepsilon^4}$. In particular, by Markov’s inequality we have the conditional probabilities

$\displaystyle {\bf P} ( N_{[\varepsilon,A^{-1}]} \geq 1 | E ) \ll A^{-3}$

$\displaystyle {\bf P} ( N_{[-A^{-1}, -\varepsilon]} \geq 1 | E ) \ll A^{-3}$

$\displaystyle {\bf P} ( N_{[-k, k]} \geq A k^{5/3} | E ) \ll A^{-2} k^{-4/3}$

and thus, if ${A}$ is large enough, and ${\varepsilon}$ small enough, it will be true with probability ${\gg \varepsilon^4}$ that

$\displaystyle N_{[-\varepsilon,0]} = N_{[0,\varepsilon]} = 1$

and

$\displaystyle N_{[-A^{-1}, -\varepsilon]} = N_{[\varepsilon, A^{-1}]} = 0$

and simultaneously that

$\displaystyle N_{[-k,k]} \leq A k^{5/3}$

for all natural numbers ${k}$. This implies in particular that

$\displaystyle x_1 - x_0 \leq 2\varepsilon$

and

$\displaystyle |x_m - x_0|, |x_m - x_1| \gg_A |m|^{3/5}$

for all ${m \neq 0,1}$, which gives (2) for ${\varepsilon}$ small enough.

Remark 1 The above argument needed the GUE hypothesis for correlations up to fourth order (in order to establish (3)). It might be possible to reduce the number of correlations needed, but I do not see how to obtain the claim just using pair correlations only.

The Boussinesq equations for inviscid, incompressible two-dimensional fluid flow in the presence of gravity are given by

$\displaystyle (\partial_t + u_x \partial_x+ u_y \partial_y) u_x = -\partial_x p \ \ \ \ \ (1)$

$\displaystyle (\partial_t + u_x \partial_x+ u_y \partial_y) u_y = \rho - \partial_y p \ \ \ \ \ (2)$

$\displaystyle (\partial_t + u_x \partial_x+ u_y \partial_y) \rho = 0 \ \ \ \ \ (3)$

$\displaystyle \partial_x u_x + \partial_y u_y = 0 \ \ \ \ \ (4)$

where ${u: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}^2}$ is the velocity field, ${p: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}}$ is the pressure field, and ${\rho: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}}$ is the density field (or, in some physical interpretations, the temperature field). In this post we shall restrict ourselves to formal manipulations, assuming implicitly that all fields are regular enough (or sufficiently decaying at spatial infinity) that the manipulations are justified. Using the material derivative ${D_t := \partial_t + u_x \partial_x + u_y \partial_y}$, one can abbreviate these equations as

$\displaystyle D_t u_x = -\partial_x p$

$\displaystyle D_t u_y = \rho - \partial_y p$

$\displaystyle D_t \rho = 0$

$\displaystyle \partial_x u_x + \partial_y u_y = 0.$

One can eliminate the role of the pressure ${p}$ by working with the vorticity ${\omega := \partial_x u_y - \partial_y u_x}$. A standard calculation then leads us to the equivalent “vorticity-stream” formulation

$\displaystyle D_t \omega = \partial_x \rho$

$\displaystyle D_t \rho = 0$

$\displaystyle \omega = \partial_x u_y - \partial_y u_x$

$\displaystyle \partial_x u_x + \partial_y u_y = 0$

of the Boussinesq equations. The latter two equations can be used to recover the velocity field ${u}$ from the vorticity ${\omega}$ by the Biot-Savart law

$\displaystyle u_x := -\partial_y \Delta^{-1} \omega; \quad u_y = \partial_x \Delta^{-1} \omega.$
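The Biot-Savart recovery of ${u}$ from ${\omega}$ can be implemented spectrally; the following sketch (numpy assumed available) works on a ${2\pi}$-periodic box as a stand-in for ${{\bf R}^2}$, and verifies that the recovered field has the right vorticity and is divergence-free:

```python
# Spectral Biot-Savart on a 2*pi-periodic box: psi = Delta^{-1} omega,
# u = (-psi_y, psi_x); then curl u = omega and div u = 0.  (numpy assumed.)
import numpy as np

n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
omega = np.sin(X) * np.cos(2 * Y) + 0.5 * np.cos(3 * X) * np.sin(Y)  # mean zero

k = np.fft.fftfreq(n, d=1.0 / n)             # integer wavenumbers
KX, KY = np.meshgrid(k, k, indexing="ij")
lap = -(KX ** 2 + KY ** 2)
lap[0, 0] = 1.0                               # avoid 0/0; the mean mode is zero

psi_hat = np.fft.fft2(omega) / lap            # psi = Delta^{-1} omega
ux = np.real(np.fft.ifft2(-1j * KY * psi_hat))   # u_x = -psi_y
uy = np.real(np.fft.ifft2(1j * KX * psi_hat))    # u_y = +psi_x

curl = np.real(np.fft.ifft2(1j * KX * np.fft.fft2(uy) - 1j * KY * np.fft.fft2(ux)))
div = np.real(np.fft.ifft2(1j * KX * np.fft.fft2(ux) + 1j * KY * np.fft.fft2(uy)))
assert np.allclose(curl, omega, atol=1e-10) and np.allclose(div, 0, atol=1e-10)
print("curl u = omega and div u = 0 verified spectrally")
```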

It has long been observed (see e.g. Section 5.4.1 of Bertozzi-Majda) that the Boussinesq equations are very similar, though not quite identical, to the three-dimensional inviscid incompressible Euler equations under the hypothesis of axial symmetry (with swirl). The Euler equations are

$\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p$

$\displaystyle \nabla \cdot u = 0$

where now the velocity field ${u: {\bf R} \times {\bf R}^3 \rightarrow {\bf R}^3}$ and pressure field ${p: {\bf R} \times {\bf R}^3 \rightarrow {\bf R}}$ are over the three-dimensional domain ${{\bf R}^3}$. If one expresses ${{\bf R}^3}$ in polar coordinates ${(z,r,\theta)}$ then one can write the velocity vector field ${u}$ in these coordinates as

$\displaystyle u = u^z \frac{d}{dz} + u^r \frac{d}{dr} + u^\theta \frac{d}{d\theta}.$

If we make the axial symmetry assumption that these components, as well as ${p}$, do not depend on the ${\theta}$ variable, thus

$\displaystyle \partial_\theta u^z, \partial_\theta u^r, \partial_\theta u^\theta, \partial_\theta p = 0,$

then after some calculation (which we give below the fold) one can eventually reduce the Euler equations to the system

$\displaystyle \tilde D_t \omega = \frac{1}{r^4} \partial_z \rho \ \ \ \ \ (5)$

$\displaystyle \tilde D_t \rho = 0 \ \ \ \ \ (6)$

$\displaystyle \omega = \frac{1}{r} (\partial_z u^r - \partial_r u^z) \ \ \ \ \ (7)$

$\displaystyle \partial_z(ru^z) + \partial_r(ru^r) = 0 \ \ \ \ \ (8)$

where ${\tilde D_t := \partial_t + u^z \partial_z + u^r \partial_r}$ is the modified material derivative, and ${\rho}$ is the field ${\rho := (r u^\theta)^2}$. This is almost identical with the Boussinesq equations except for some additional powers of ${r}$; thus, the intuition is that the Boussinesq equations are a simplified model for axially symmetric Euler flows when one stays away from the axis ${r=0}$ and also does not wander off to ${r=\infty}$.

However, this heuristic is not rigorous; the above calculations do not actually give an embedding of the Boussinesq equations into Euler. (The equations do match on the cylinder ${r=1}$, but this is a measure zero subset of the domain, and so is not enough to give an embedding on any non-trivial region of space.) Recently, while playing around with trying to embed other equations into the Euler equations, I discovered that it is possible to make such an embedding into a four-dimensional Euler equation, albeit on a slightly curved manifold rather than in Euclidean space. More precisely, we use the Ebin-Marsden generalisation

$\displaystyle \partial_t u + \nabla_u u = - \mathrm{grad}_g p$

$\displaystyle \mathrm{div}_g u = 0$

of the Euler equations to an arbitrary Riemannian manifold ${(M,g)}$ (ignoring any issues of boundary conditions for this discussion), where ${u: {\bf R} \rightarrow \Gamma(TM)}$ is a time-dependent vector field, ${p: {\bf R} \rightarrow C^\infty(M)}$ is a time-dependent scalar field, and ${\nabla_u}$ is the covariant derivative along ${u}$ using the Levi-Civita connection ${\nabla}$. In Penrose abstract index notation (using the Levi-Civita connection ${\nabla}$, and raising and lowering indices using the metric ${g = g_{ij}}$), the equations of motion become

$\displaystyle \partial_t u^i + u^j \nabla_j u^i = - \nabla^i p \ \ \ \ \ (9)$

$\displaystyle \nabla_i u^i = 0;$

in coordinates, this becomes

$\displaystyle \partial_t u^i + u^j (\partial_j u^i + \Gamma^i_{jk} u^k) = - g^{ij} \partial_j p$

$\displaystyle \partial_i u^i + \Gamma^i_{ik} u^k = 0 \ \ \ \ \ (10)$

where the Christoffel symbols ${\Gamma^i_{jk}}$ are given by the formula

$\displaystyle \Gamma^i_{jk} := \frac{1}{2} g^{il} (\partial_j g_{lk} + \partial_k g_{lj} - \partial_l g_{jk}),$

where ${g^{il}}$ is the inverse to the metric tensor ${g_{il}}$. If the coordinates are chosen so that the volume form ${dg}$ is the Euclidean volume form ${dx}$, thus ${\mathrm{det}(g)=1}$, then on differentiating we have ${g^{ij} \partial_k g_{ij} = 0}$, and hence ${\Gamma^i_{ik} = 0}$, and so the divergence-free equation (10) simplifies in this case to ${\partial_i u^i = 0}$. The Ebin-Marsden Euler equations are the natural generalisation of the Euler equations to arbitrary manifolds; for instance, they (formally) conserve the kinetic energy

$\displaystyle \frac{1}{2} \int_M |u|_g^2\ dg = \frac{1}{2} \int_M g_{ij} u^i u^j\ dg$

and can be viewed as the formal geodesic flow equation on the infinite-dimensional manifold of volume-preserving diffeomorphisms on ${M}$ (see this previous post for a discussion of this in the flat space case).

The specific four-dimensional manifold in question is the space ${{\bf R} \times {\bf R}^+ \times {\bf R}/{\bf Z} \times {\bf R}/{\bf Z}}$ with metric

$\displaystyle dx^2 + dy^2 + y^{-1} dz^2 + y dw^2$

and solutions to the Boussinesq equation on ${{\bf R} \times {\bf R}^+}$ can be transformed into solutions to the Euler equations on this manifold. This is part of a more general family of embeddings into the Euler equations in which passive scalar fields (such as the field ${\rho}$ appearing in the Boussinesq equations) can be incorporated into the dynamics via fluctuations in the Riemannian metric ${g}$. I am writing the details below the fold (partly for my own benefit).

Basic objects of study in multiplicative number theory are the arithmetic functions: functions ${f: {\bf N} \rightarrow {\bf C}}$ from the natural numbers to the complex numbers. Some fundamental examples of such functions include

• The constant function ${1: n \mapsto 1}$;
• The Kronecker delta function ${\delta: n \mapsto 1_{n=1}}$;
• The natural logarithm function ${L: n \mapsto \log n}$;
• The divisor function ${d_2: n \mapsto \sum_{d|n} 1}$;
• The von Mangoldt function ${\Lambda}$, with ${\Lambda(n)}$ defined to equal ${\log p}$ when ${n}$ is a power ${p^j}$ of a prime ${p}$ for some ${j \geq 1}$, and defined to equal zero otherwise; and
• The Möbius function ${\mu}$, with ${\mu(n)}$ defined to equal ${(-1)^k}$ when ${n}$ is the product of ${k}$ distinct primes, and defined to equal zero otherwise.
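All of these functions are easy to compute directly. The following Python sketch (trial-division factorization; an illustration only, not an optimised implementation) realises the definitions above:

```python
import math

def factorize(n):
    """Prime factorization of n >= 1 as a dict {prime: exponent}, by trial division."""
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

one = lambda n: 1                                              # the constant function 1
delta = lambda n: 1 if n == 1 else 0                           # Kronecker delta
L = lambda n: math.log(n)                                      # natural logarithm
d2 = lambda n: sum(1 for d in range(1, n + 1) if n % d == 0)   # divisor function

def mangoldt(n):
    """von Mangoldt: log p if n = p^j for a prime p and some j >= 1, else 0."""
    f = factorize(n)
    return math.log(next(iter(f))) if len(f) == 1 else 0.0

def mobius(n):
    """Moebius: (-1)^k if n is a product of k distinct primes, else 0."""
    f = factorize(n)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)
```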

Given an arithmetic function ${f}$, we are often interested in statistics such as the summatory function

$\displaystyle \sum_{n \leq x} f(n), \ \ \ \ \ (1)$

the logarithmically (or harmonically) weighted summatory function

$\displaystyle \sum_{n \leq x} \frac{f(n)}{n}, \ \ \ \ \ (2)$

or the Dirichlet series

$\displaystyle {\mathcal D}[f](s) := \sum_n \frac{f(n)}{n^s}.$

In the latter case, one typically has to first restrict ${s}$ to those complex numbers whose real part is large enough in order to ensure the series on the right converges; but in many important cases, one can then extend the Dirichlet series to almost all of the complex plane by analytic continuation. One is also interested in correlations involving additive shifts, such as ${\sum_{n \leq x} f(n) f(n+h)}$, but these are significantly more difficult to study and cannot be easily estimated by the methods of classical multiplicative number theory.

A key operation on arithmetic functions is that of Dirichlet convolution, which when given two arithmetic functions ${f,g: {\bf N} \rightarrow {\bf C}}$, forms a new arithmetic function ${f*g: {\bf N} \rightarrow {\bf C}}$, defined by the formula

$\displaystyle f*g(n) := \sum_{d|n} f(d) g(\frac{n}{d}).$
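This definition translates directly into code. Here is a self-contained Python sketch (illustration only) that also verifies, exactly and for small ${n}$, some standard convolution identities such as ${1*1 = d_2}$ and ${1*\mu = \delta}$:

```python
import math

def convolve(f, g, n):
    """Dirichlet convolution: (f * g)(n) = sum over divisors d of n of f(d) g(n/d)."""
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

def factorize(n):
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

one = lambda n: 1
delta = lambda n: 1 if n == 1 else 0
d2 = lambda n: sum(1 for d in range(1, n + 1) if n % d == 0)
mangoldt = lambda n: math.log(min(factorize(n))) if len(factorize(n)) == 1 else 0.0
mobius = lambda n: 0 if any(e > 1 for e in factorize(n).values()) else (-1) ** len(factorize(n))

for n in range(1, 300):
    assert convolve(one, one, n) == d2(n)                        # 1 * 1 = d_2
    assert convolve(one, mobius, n) == delta(n)                  # 1 * mu = delta
    assert abs(convolve(one, mangoldt, n) - math.log(n)) < 1e-9  # 1 * Lambda = L
    assert convolve(delta, d2, n) == d2(n)                       # delta * f = f
```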

Thus for instance ${1*1 = d_2}$, ${1 * \Lambda = L}$, ${1 * \mu = \delta}$, and ${\delta * f = f}$ for any arithmetic function ${f}$. Dirichlet convolution and Dirichlet series are related by the fundamental formula

$\displaystyle {\mathcal D}[f * g](s) = {\mathcal D}[f](s) {\mathcal D}[g](s), \ \ \ \ \ (3)$

at least when the real part of ${s}$ is large enough that all sums involved become absolutely convergent (but in practice one can use analytic continuation to extend this identity to most of the complex plane). There is also the identity

$\displaystyle {\mathcal D}[Lf](s) = - \frac{d}{ds} {\mathcal D}[f](s), \ \ \ \ \ (4)$

at least when the real part of ${s}$ is large enough to justify interchange of differentiation and summation. As a consequence, many Dirichlet series can be expressed in terms of the Riemann zeta function ${\zeta = {\mathcal D}[1]}$, thus for instance

$\displaystyle {\mathcal D}[d_2](s) = \zeta^2(s)$

$\displaystyle {\mathcal D}[L](s) = - \zeta'(s)$

$\displaystyle {\mathcal D}[\delta](s) = 1$

$\displaystyle {\mathcal D}[\mu](s) = \frac{1}{\zeta(s)}$

$\displaystyle {\mathcal D}[\Lambda](s) = -\frac{\zeta'(s)}{\zeta(s)}.$
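The last of these identities can be tested numerically: the sketch below compares the truncated Dirichlet series of ${\Lambda}$ at ${s=3}$ against ${-\zeta'(3)/\zeta(3)}$, with all series truncated at the same (arbitrarily chosen) height, so the agreement is only up to a small truncation error. The sieve here is purely illustrative.

```python
import math

def mangoldt_sieve(N):
    """Lambda(n) for n <= N, via a sieve of smallest prime factors."""
    lam = [0.0] * (N + 1)
    spf = list(range(N + 1))  # smallest prime factor of each n
    for p in range(2, int(N ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    for n in range(2, N + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:            # n is a prime power p^j, so Lambda(n) = log p
            lam[n] = math.log(p)
    return lam

N, s = 100000, 3.0
lam = mangoldt_sieve(N)
D_Lambda = sum(lam[n] / n ** s for n in range(2, N + 1))
zeta = sum(1.0 / n ** s for n in range(1, N + 1))
zeta_prime = -sum(math.log(n) / n ** s for n in range(2, N + 1))
# D[Lambda](s) should match -zeta'(s)/zeta(s) up to truncation error
assert abs(D_Lambda - (-zeta_prime / zeta)) < 1e-6
```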

Much of the difficulty of multiplicative number theory can be traced back to the discrete nature of the natural numbers ${{\bf N}}$, which form a rather complicated abelian semigroup with respect to multiplication (in particular the set of generators is the set of prime numbers). One can obtain a simpler analogue of the subject by working instead with the half-infinite interval ${{\bf N}_\infty := [1,+\infty)}$, which is a much simpler abelian semigroup under multiplication (being a one-dimensional Lie semigroup). (I will think of this as a sort of “completion” of ${{\bf N}}$ at the infinite place ${\infty}$, hence the terminology.) Accordingly, let us define a continuous arithmetic function to be a locally integrable function ${f: {\bf N}_\infty \rightarrow {\bf C}}$. The analogue of the summatory function (1) is then an integral

$\displaystyle \int_1^x f(t)\ dt,$

and similarly the analogue of (2) is

$\displaystyle \int_1^x \frac{f(t)}{t}\ dt.$

The analogue of the Dirichlet series is the Mellin-type transform

$\displaystyle {\mathcal D}_\infty[f](s) := \int_1^\infty \frac{f(t)}{t^s}\ dt,$

which will be well-defined at least if the real part of ${s}$ is large enough and if the continuous arithmetic function ${f: {\bf N}_\infty \rightarrow {\bf C}}$ does not grow too quickly, and hopefully will also be defined elsewhere in the complex plane by analytic continuation.

For instance, the continuous analogue of the discrete constant function ${1: {\bf N} \rightarrow {\bf C}}$ is the constant function ${1_\infty: {\bf N}_\infty \rightarrow {\bf C}}$ which maps any ${t \in [1,+\infty)}$ to ${1}$; we give it the separate notation ${1_\infty}$ to keep it distinct from ${1}$. The two functions ${1_\infty}$ and ${1}$ have approximately similar statistics; for instance one has

$\displaystyle \sum_{n \leq x} 1 = \lfloor x \rfloor \approx x-1 = \int_1^x 1\ dt$

and

$\displaystyle \sum_{n \leq x} \frac{1}{n} = H_{\lfloor x \rfloor} \approx \log x = \int_1^x \frac{1}{t}\ dt$

where ${H_n}$ is the ${n^{th}}$ harmonic number, and we are deliberately vague as to what the symbol ${\approx}$ means. Continuing this analogy, we would expect

$\displaystyle {\mathcal D}[1](s) = \zeta(s) \approx \frac{1}{s-1} = {\mathcal D}_\infty[1_\infty](s)$

which reflects the fact that ${\zeta}$ has a simple pole at ${s=1}$ with residue ${1}$, and no other poles. Note that the identity ${{\mathcal D}_\infty[1_\infty](s) = \frac{1}{s-1}}$ is initially only valid in the region ${\mathrm{Re} s > 1}$, but clearly the right-hand side can be continued analytically to the entire complex plane except for the pole at ${1}$, and so one can define ${{\mathcal D}_\infty[1_\infty]}$ in this region also.
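In the region ${\mathrm{Re}\, s > 1}$ the identity ${{\mathcal D}_\infty[1_\infty](s) = \frac{1}{s-1}}$ is easy to confirm numerically. The sketch below (an illustration only; the truncation height ${T}$ and step count are arbitrary) truncates the integral at ${T}$ and applies the trapezoid rule after the substitution ${t = e^u}$:

```python
import cmath
import math

def mellin_numeric(f, s, T=2000.0, steps=20000):
    """Approximate D_inf[f](s) = integral_1^infty f(t) t^{-s} dt, truncated at T,
    via the substitution t = e^u (so t^{-s} dt = e^{(1-s)u} du) and a trapezoid rule."""
    U = math.log(T)
    h = U / steps
    total = 0j
    for i in range(steps + 1):
        u = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * f(math.exp(u)) * cmath.exp((1 - s) * u)
    return total * h

s = 2.5 + 1.0j
approx = mellin_numeric(lambda t: 1.0, s)
assert abs(approx - 1.0 / (s - 1)) < 1e-3   # matches 1/(s-1) up to truncation error
```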

In a similar vein, the logarithm function ${L: {\bf N} \rightarrow {\bf C}}$ is approximately similar to the logarithm function ${L_\infty: {\bf N}_\infty \rightarrow {\bf C}}$, giving for instance the crude form

$\displaystyle \sum_{n \leq x} L(n) = \log \lfloor x \rfloor! \approx x \log x - x = \int_1^x L_\infty(t)\ dt$

of Stirling’s formula, or the Dirichlet series approximation

$\displaystyle {\mathcal D}[L](s) = -\zeta'(s) \approx \frac{1}{(s-1)^2} = {\mathcal D}_\infty[L_\infty](s).$

The continuous analogue of Dirichlet convolution is multiplicative convolution using the multiplicative Haar measure ${\frac{dt}{t}}$: given two continuous arithmetic functions ${f_\infty, g_\infty: {\bf N}_\infty \rightarrow {\bf C}}$, one can define their convolution ${f_\infty *_\infty g_\infty: {\bf N}_\infty \rightarrow {\bf C}}$ by the formula

$\displaystyle f_\infty *_\infty g_\infty(t) := \int_1^t f_\infty(s) g_\infty(\frac{t}{s}) \frac{ds}{s}.$

Thus for instance ${1_\infty *_\infty 1_\infty = L_\infty}$. A short computation using Fubini’s theorem shows the analogue

$\displaystyle {\mathcal D}_\infty[f_\infty *_\infty g_\infty](s) = {\mathcal D}_\infty[f_\infty](s) {\mathcal D}_\infty[g_\infty](s)$

of (3) whenever the real part of ${s}$ is large enough that Fubini’s theorem can be justified; similarly, differentiation under the integral sign shows that

$\displaystyle {\mathcal D}_\infty[L_\infty f_\infty](s) = -\frac{d}{ds} {\mathcal D}_\infty[f_\infty](s) \ \ \ \ \ (5)$

again assuming that the real part of ${s}$ is large enough that differentiation under the integral sign (or some other tool like this, such as the Cauchy integral formula for derivatives) can be justified.
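The identity ${1_\infty *_\infty 1_\infty = L_\infty}$, and the next convolution ${1_\infty *_\infty L_\infty = \frac{1}{2} L_\infty^2}$, can be confirmed with a few lines of numerical quadrature; the following sketch (illustration only, with arbitrary step counts) uses the same substitution ${s = e^u}$ as the Haar measure suggests:

```python
import math

def mult_convolve(f, g, t, steps=20000):
    """(f *_inf g)(t) = integral_1^t f(s) g(t/s) ds/s, computed with the
    substitution s = e^u (so ds/s = du) and the trapezoid rule."""
    U = math.log(t)
    h = U / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * f(math.exp(u)) * g(t / math.exp(u))
    return total * h

one_inf = lambda t: 1.0

for t in (2.0, 10.0, 100.0):
    # 1_inf *_inf 1_inf = L_inf
    assert abs(mult_convolve(one_inf, one_inf, t) - math.log(t)) < 1e-9
    # 1_inf *_inf L_inf = (1/2) L_inf^2
    assert abs(mult_convolve(one_inf, math.log, t) - math.log(t) ** 2 / 2) < 1e-6
```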

Direct calculation shows that for any complex number ${\rho}$, one has

$\displaystyle \frac{1}{s-\rho} = {\mathcal D}_\infty[ t \mapsto t^{\rho-1} ](s)$

(at least for the real part of ${s}$ large enough), and hence by several applications of (5)

$\displaystyle \frac{1}{(s-\rho)^k} = {\mathcal D}_\infty[ t \mapsto \frac{1}{(k-1)!} t^{\rho-1} \log^{k-1} t ](s)$

for any natural number ${k}$. This can lead to the following heuristic: if a Dirichlet series ${{\mathcal D}[f](s)}$ behaves like a linear combination of poles ${\frac{1}{(s-\rho)^k}}$, in that

$\displaystyle {\mathcal D}[f](s) \approx \sum_\rho \frac{c_\rho}{(s-\rho)^{k_\rho}}$

for some set ${\rho}$ of poles and some coefficients ${c_\rho}$ and natural numbers ${k_\rho}$ (where we again are vague as to what ${\approx}$ means, and how to interpret the sum ${\sum_\rho}$ if the set of poles is infinite), then one should expect the arithmetic function ${f}$ to behave like the continuous arithmetic function

$\displaystyle t \mapsto \sum_\rho \frac{c_\rho}{(k_\rho-1)!} t^{\rho-1} \log^{k_\rho-1} t.$

In particular, if we only have simple poles,

$\displaystyle {\mathcal D}[f](s) \approx \sum_\rho \frac{c_\rho}{s-\rho}$

then we expect ${f}$ to behave like the continuous arithmetic function

$\displaystyle t \mapsto \sum_\rho c_\rho t^{\rho-1}.$

Integrating this from ${1}$ to ${x}$, this heuristically suggests an approximation

$\displaystyle \sum_{n \leq x} f(n) \approx \sum_\rho c_\rho \frac{x^\rho-1}{\rho}$

for the summatory function, and similarly

$\displaystyle \sum_{n \leq x} \frac{f(n)}{n} \approx \sum_\rho c_\rho \frac{x^{\rho-1}-1}{\rho-1},$

with the convention that ${\frac{x^\rho-1}{\rho}}$ is ${\log x}$ when ${\rho=0}$, and similarly ${\frac{x^{\rho-1}-1}{\rho-1}}$ is ${\log x}$ when ${\rho=1}$. One can make these sorts of approximations more rigorous by means of Perron’s formula (or one of its variants) combined with the residue theorem, provided that one has good enough control on the relevant Dirichlet series, but we will not pursue these rigorous calculations here. (But see for instance this previous blog post for some examples.)

For instance, using the more refined approximation

$\displaystyle \zeta(s) \approx \frac{1}{s-1} + \gamma$

to the zeta function near ${s=1}$, we have

$\displaystyle {\mathcal D}[d_2](s) = \zeta^2(s) \approx \frac{1}{(s-1)^2} + \frac{2 \gamma}{s-1}$

and so we would expect that

$\displaystyle d_2 \approx L_\infty + 2 \gamma$

and thus for instance

$\displaystyle \sum_{n \leq x} d_2(n) \approx x \log x - x + 2 \gamma x$

which matches what one actually gets from the Dirichlet hyperbola method (see e.g. equation (44) of this previous post).
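This approximation is easy to test in code. The sketch below computes ${\sum_{n \leq x} d_2(n)}$ exactly via the hyperbola identity ${\sum_{n\le x} d_2(n) = 2\sum_{d \le \sqrt{x}} \lfloor x/d\rfloor - \lfloor\sqrt{x}\rfloor^2}$ and compares it against the heuristic ${x\log x - x + 2\gamma x}$; the hard-coded value of ${\gamma}$ and the ${O(\sqrt{x})}$ tolerance are illustrative choices.

```python
import math

EULER_GAMMA = 0.5772156649015329

def divisor_summatory(x):
    """Sum of d_2(n) for n <= x, computed exactly by the hyperbola method:
    sum_{n <= x} d_2(n) = 2 sum_{d <= sqrt(x)} floor(x/d) - floor(sqrt(x))^2."""
    r = math.isqrt(x)
    return 2 * sum(x // d for d in range(1, r + 1)) - r * r

x = 10 ** 6
exact = divisor_summatory(x)
heuristic = x * math.log(x) - x + 2 * EULER_GAMMA * x
# the error in the hyperbola method is O(sqrt(x))
assert abs(exact - heuristic) < 10 * math.sqrt(x)
```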

Or, noting that ${\zeta(s)}$ has a simple pole at ${s=1}$ and assuming simple zeroes elsewhere, the log derivative ${-\zeta'(s)/\zeta(s)}$ will have simple poles of residue ${+1}$ at ${s=1}$ and ${-1}$ at all the zeroes, leading to the heuristic

$\displaystyle {\mathcal D}[\Lambda](s) = -\frac{\zeta'(s)}{\zeta(s)} \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho}$

suggesting that ${\Lambda}$ should behave like the continuous arithmetic function

$\displaystyle t \mapsto 1 - \sum_\rho t^{\rho-1}$

leading for instance to the summatory approximation

$\displaystyle \sum_{n \leq x} \Lambda(n) \approx x - \sum_\rho \frac{x^\rho-1}{\rho}$

which is a heuristic form of the Riemann-von Mangoldt explicit formula (see Exercise 45 of these notes for a rigorous version of this formula).
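Keeping only the main term ${x}$ of this heuristic (that is, dropping the sum over zeroes) already predicts ${\sum_{n \leq x} \Lambda(n) \approx x}$, which is the prime number theorem; a direct sieve computation confirms this at modest heights. The code below is a rough illustration (cutoffs and tolerances are arbitrary):

```python
import math

def psi(x):
    """Chebyshev's function psi(x) = sum of Lambda(n) for n <= x,
    computed via a sieve of Eratosthenes for the primes."""
    is_prime = [True] * (x + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(x ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, x + 1, p):
                is_prime[m] = False
    total = 0.0
    for p in range(2, x + 1):
        if is_prime[p]:
            pk = p
            while pk <= x:       # each prime power p^j <= x contributes log p
                total += math.log(p)
                pk *= p
    return total

x = 10 ** 6
# dropping the sum over zeroes leaves the main term x; by the prime number
# theorem the relative error should already be small at this height
assert abs(psi(x) - x) / x < 0.01
```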

Exercise 1 Go through some of the other explicit formulae listed at this Wikipedia page and give heuristic justifications for them (up to some lower order terms) by similar calculations to those given above.

Given the “adelic” perspective on number theory, I wonder if there are also ${p}$-adic analogues of arithmetic functions to which a similar set of heuristics can be applied, perhaps to study sums such as ${\sum_{n \leq x: n = a \hbox{ mod } p^j} f(n)}$. A key problem here is that there does not seem to be any good interpretation of the expression ${\frac{1}{t^s}}$ when ${s}$ is complex and ${t}$ is a ${p}$-adic number, so it is not clear that one can analyse a Dirichlet series ${p}$-adically. For similar reasons, we don’t have a canonical way to define ${\chi(t)}$ for a Dirichlet character ${\chi}$ (unless its conductor happens to be a power of ${p}$), so there doesn’t seem to be much to say in the ${q}$-aspect either.

Let ${\lambda: {\bf N} \rightarrow \{-1,1\}}$ be the Liouville function, thus ${\lambda(n)}$ is defined to equal ${+1}$ when ${n}$ is the product of an even number of primes, and ${-1}$ when ${n}$ is the product of an odd number of primes. The Chowla conjecture asserts that ${\lambda}$ has the statistics of a random sign pattern, in the sense that

$\displaystyle \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (1)$

for all ${k \geq 1}$ and all distinct natural numbers ${h_1,\dots,h_k}$, where we use the averaging notation

$\displaystyle \mathbb{E}_{n \leq N} f(n) := \frac{1}{N} \sum_{n \leq N} f(n).$

For ${k=1}$, this conjecture is equivalent to the prime number theorem (as discussed in this previous blog post), but the conjecture remains open for any ${k \geq 2}$.
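As a quick numerical illustration of the ${k=1}$ case, one can sieve the Liouville function and observe that its mean is already quite small at modest heights (of course this is consistent with, but very far from proving, the conjecture); the cutoff ${N = 10^5}$ and the tolerance below are arbitrary illustrative choices.

```python
def liouville_sieve(N):
    """lambda(n) for 1 <= n <= N: since lambda(n) = (-1)^Omega(n), dividing out
    the smallest prime factor flips the sign."""
    spf = list(range(N + 1))          # smallest prime factor of each n
    for p in range(2, int(N ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (N + 1)
    lam[1] = 1
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

N = 10 ** 5
lam = liouville_sieve(N)
mean = sum(lam[1:]) / N
assert abs(mean) < 0.05   # the k = 1 average is already small at this height
```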

In recent years, it has been realised that one can make more progress on this conjecture if one works instead with the logarithmically averaged version

$\displaystyle \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (2)$

of the conjecture, where we use the logarithmic averaging notation

$\displaystyle \mathbb{E}_{n \leq N}^{\log} f(n) := \frac{\sum_{n \leq N} \frac{f(n)}{n}}{\sum_{n \leq N} \frac{1}{n}}.$

Using the summation by parts (or telescoping series) identity

$\displaystyle \sum_{n \leq N} \frac{f(n)}{n} = \sum_{M < N} \frac{1}{M(M+1)} (\sum_{n \leq M} f(n)) + \frac{1}{N} \sum_{n \leq N} f(n) \ \ \ \ \ (3)$

it is not difficult to show that the Chowla conjecture (1) for a given ${k,h_1,\dots,h_k}$ implies the logarithmically averaged conjecture (2). However, the converse implication is not at all clear. For instance, for ${k=1}$, we have already mentioned that the Chowla conjecture

$\displaystyle \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n) = 0$

is equivalent to the prime number theorem; but the logarithmically averaged analogue

$\displaystyle \lim_{N \rightarrow \infty} \mathbb{E}^{\log}_{n \leq N} \lambda(n) = 0$

is significantly easier to show (a proof with the Liouville function ${\lambda}$ replaced by the closely related Möbius function ${\mu}$ is given in this previous blog post). And indeed, significantly more is now known for the logarithmically averaged Chowla conjecture; in this paper of mine I had proven (2) for ${k=2}$, and in this recent paper with Joni Teravainen, we proved the conjecture for all odd ${k}$ (with a different proof also given here).
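The summation by parts identity (3) invoked above is an exact identity, and can be checked in exact rational arithmetic for arbitrary integer-valued test functions; here is a short sketch.

```python
from fractions import Fraction

def lhs(f, N):
    # sum_{n <= N} f(n)/n
    return sum(Fraction(f(n), n) for n in range(1, N + 1))

def rhs(f, N):
    # sum_{M < N} S(M)/(M(M+1)) + S(N)/N, where S(M) = sum_{n <= M} f(n)
    S = lambda M: sum(f(n) for n in range(1, M + 1))
    return sum(Fraction(S(M), M * (M + 1)) for M in range(1, N)) + Fraction(S(N), N)

# identity (3) holds exactly for any integer-valued f and any N >= 1
for f in (lambda n: 1, lambda n: (-1) ** n, lambda n: n * n - 3 * n):
    for N in (1, 2, 5, 30):
        assert lhs(f, N) == rhs(f, N)
```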

In view of this emerging consensus that the logarithmically averaged Chowla conjecture was easier than the ordinary Chowla conjecture, it was somewhat of a surprise for me to read a recent paper of Gomilko, Kwietniak, and Lemanczyk, who (among other things) established the following statement:

Theorem 1 Assume that the logarithmically averaged Chowla conjecture (2) is true for all ${k}$. Then there exists a sequence ${N_i}$ going to infinity such that the Chowla conjecture (1) is true for all ${k}$ along that sequence, that is to say

$\displaystyle \lim_{N_i \rightarrow \infty} \mathbb{E}_{n \leq N_i} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$

for all ${k}$ and all distinct ${h_1,\dots,h_k}$.

This implication does not use any special properties of the Liouville function (other than that it is bounded), and in fact proceeds by ergodic theoretic methods, focusing in particular on the ergodic decomposition of invariant measures of a shift into ergodic measures. Ergodic methods have proven remarkably fruitful in understanding these sorts of number theoretic and combinatorial problems, as could already be seen by the ergodic theoretic proof of Szemerédi’s theorem by Furstenberg, and more recently by the work of Frantzikinakis and Host on Sarnak’s conjecture. (My first paper with Teravainen also uses ergodic theory tools.) Indeed, many other results in the subject were first discovered using ergodic theory methods.

On the other hand, many results in this subject that were first proven ergodic theoretically have since been reproven by more combinatorial means; my second paper with Teravainen is an instance of this. As it turns out, one can also prove Theorem 1 by a standard combinatorial (or probabilistic) technique known as the second moment method. In fact, one can prove slightly more:

Theorem 2 Let ${k}$ be a natural number. Assume that the logarithmically averaged Chowla conjecture (2) is true for ${2k}$. Then there exists a set ${{\mathcal N}}$ of natural numbers of logarithmic density ${1}$ (that is, ${\lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} 1_{n \in {\mathcal N}} = 1}$) such that

$\displaystyle \lim_{N \rightarrow \infty: N \in {\mathcal N}} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$

for any distinct ${h_1,\dots,h_k}$.

It is not difficult to deduce Theorem 1 from Theorem 2 using a diagonalisation argument. Unfortunately, the known cases of the logarithmically averaged Chowla conjecture (${k=2}$ and odd ${k}$) are currently insufficient to use Theorem 2 for any purpose other than to reprove what is already known to be true from the prime number theorem. (Indeed, the even cases of Chowla, in either logarithmically averaged or non-logarithmically averaged forms, seem to be far more powerful than the odd cases; see Remark 1.7 of this paper of myself and Teravainen for a related observation in this direction.)

We now sketch the proof of Theorem 2. For any distinct ${h_1,\dots,h_k}$, we take a large number ${H}$ and consider the limiting second moment

$\displaystyle \limsup_{N \rightarrow \infty} \mathop{\bf E}_{n \leq N}^{\log} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2.$

We can expand this as

$\displaystyle \limsup_{N \rightarrow \infty} \mathop{\bf E}_{m,m' \leq H} \mathop{\bf E}_{n \leq N}^{\log} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \lambda(n+m'+h_1) \dots \lambda(n+m'+h_k).$

If all the ${m+h_1,\dots,m+h_k,m'+h_1,\dots,m'+h_k}$ are distinct, the hypothesis (2) tells us that the inner average goes to zero as ${N \rightarrow \infty}$. The remaining averages are ${O(1)}$, and for each ${m}$ there are only ${O(k^2)}$ values of ${m'}$ for which such a coincidence can occur. We conclude that

$\displaystyle \limsup_{N \rightarrow \infty} \mathop{\bf E}_{n \leq N}^{\log} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k^2 / H.$
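This ${O(k^2/H)}$ decay is what one would expect probabilistically: for truly random signs the second moment is exactly ${1/H}$ in expectation in the ${k=1}$ case. The following sketch (iid random signs standing in for ${\lambda}$; all parameters are arbitrary illustrative choices) demonstrates this concentration numerically.

```python
import random

random.seed(0)
N, H = 20000, 50
eps = [random.choice((-1, 1)) for _ in range(N + H + 1)]  # iid stand-in for lambda

# average over n <= N of |E_{m <= H} eps(n+m)|^2
second_moment = sum(
    (sum(eps[n + m] for m in range(1, H + 1)) / H) ** 2
    for n in range(1, N + 1)
) / N

# for iid signs the expectation is exactly 1/H, matching the O(k^2/H)
# bound in the text with k = 1
assert abs(second_moment - 1.0 / H) < 0.02
```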

By Markov’s inequality (and (3)), we conclude that for any fixed ${h_1,\dots,h_k, H}$, there exists a set ${{\mathcal N}_{h_1,\dots,h_k,H}}$ of upper logarithmic density at least ${1-k/H^{1/2}}$, thus

$\displaystyle \limsup_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} 1_{n \in {\mathcal N}_{h_1,\dots,h_k,H}} \geq 1 - k/H^{1/2}$

such that

$\displaystyle \mathop{\bf E}_{n \leq N} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k / H^{1/2}.$

By deleting at most finitely many elements, we may assume that ${{\mathcal N}_{h_1,\dots,h_k,H}}$ consists only of elements of size at least ${H^2}$ (say).

For any ${H_0}$, if we let ${{\mathcal N}_{h_1,\dots,h_k, \geq H_0}}$ be the union of ${{\mathcal N}_{h_1,\dots,h_k, H}}$ for ${H \geq H_0}$, then ${{\mathcal N}_{h_1,\dots,h_k, \geq H_0}}$ has logarithmic density ${1}$. By a diagonalisation argument (using the fact that the set of tuples ${(h_1,\dots,h_k)}$ is countable), we can then find a set ${{\mathcal N}}$ of natural numbers of logarithmic density ${1}$, such that for every ${h_1,\dots,h_k,H_0}$, every sufficiently large element of ${{\mathcal N}}$ lies in ${{\mathcal N}_{h_1,\dots,h_k,\geq H_0}}$. Thus for every sufficiently large ${N}$ in ${{\mathcal N}}$, one has

$\displaystyle \mathop{\bf E}_{n \leq N} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k / H^{1/2}$

for some ${H \geq H_0}$ with ${N \geq H^2}$. By Cauchy-Schwarz, this implies that

$\displaystyle \mathop{\bf E}_{n \leq N} \mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \ll k^{1/2} / H^{1/4};$

interchanging the sums and using ${N \geq H^2}$ and ${H \geq H_0}$, this implies that

$\displaystyle \mathop{\bf E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) \ll k^{1/2} / H^{1/4} \leq k^{1/2} / H_0^{1/4}.$

We conclude on taking ${H_0}$ to infinity that

$\displaystyle \lim_{N \rightarrow \infty; N \in {\mathcal N}} \mathop{\bf E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$

as required.