You are currently browsing the category archive for the ‘Mathematics’ category.

As readers who have followed my previous post will know, I have been spending the last few weeks extending my previous interactive text on propositional logic (entitied “QED”) to also cover first-order logic.  The text has now reached what seems to be a stable form, with a complete set of deductive rules for first-order logic with equality, and no major bugs as far as I can tell (apart from one weird visual bug I can’t eradicate, in that some graphics elements can occasionally temporarily disappear when one clicks on an item).  So it will likely not change much going forward.

I feel though that there could be more that could be done with this sort of framework (e.g., improved GUI, modification to other logics, developing the ability to write one’s own texts and libraries, exploring mathematical theories such as Peano arithmetic, etc.).  But writing this text (particularly the first-order logic sections) has brought me close to the limit of my programming ability, as the number of bugs introduced with each new feature implemented has begun to grow at an alarming rate.  I would like to repackage the code so that it can be re-used by more adept programmers for further possible applications, though I have never done something like this before and would appreciate advice on how to do so.   The code is already available under a Creative Commons licence, but I am not sure how readable and modifiable it will be to others currently.

[One thing I noticed is that I would probably have to make more of a decoupling between the GUI elements, the underlying logical elements, and the interactive text.  For instance, at some point I made the decision (convenient at the time) to use some GUI elements to store some of the state variables of the text, e.g. the exercise buttons are currently storing the status of what exercises are unlocked or not.  This is presumably not an example of good programming practice, though it would be relatively easy to fix.  More seriously, due to my inability to come up with a good general-purpose matching algorithm (or even specification of such an algorithm) for the the laws of first-order logic, many of the laws have to be hard-coded into the matching routine, so one cannot currently remove them from the text.  It may well be that the best thing to do in fact is to rework the entire codebase from scratch using more professional software design methods.]

Every four years at the International Congress of Mathematicians (ICM), the Fields Medal laureates are announced.     Today, at the 2018 ICM in Rio de Janeiro, it was announced that the Fields Medal was awarded to Caucher Birkar, Alessio Figalli, Peter Scholze, and Akshay Venkatesh.

After the two previous congresses in 2010 and 2014, I wrote blog posts describing some of the work of each of the winners.  This time, though, I happened to be a member of the Fields Medal selection committee, and as such had access to a large number of confidential letters and discussions about the candidates with the other committee members; in order to have the opinions and discussion as candid as possible, it was explicitly understood that these communications would not be publicly disclosed.  Because of this, I will unfortunately not be able to express much of a comment or opinion on the candidates or the process as an individual (as opposed to a joint statement of the committee).  I can refer you instead to the formal citations of the laureates (which, as a committee member, I was involved in crafting, and then signing off on), or the profiles of the laureates by Quanta magazine; see also the short biographical videos of the laureates by the Simons Foundation that accompanied the formal announcements of the winners. I am sure, though, that there will be plenty of other mathematicians who will be able to present the work of each of the medalists (for instance, there was a laudatio given at the ICM for each of the winners, which should eventually be made available at this link).

I know that there is a substantial amount of interest in finding out more about the inner workings of the Fields Medal selection process.  For the reasons stated above, I as an individual will unfortunately be unable to answer any questions about this process (e.g., I cannot reveal any information about other nominees, or of any comparisons between any two candidates or nominees).  I think I can safely express the following two personal opinions though.  Firstly, while I have served on many prize committees in the past, the process for the Fields Medal committee was by far the most thorough and deliberate of any I have been part of, and I for one learned an astonishing amount about the mathematical work of all of the shortlisted nominees, which was an absolutely essential component of the deliberations, in particular giving the discussions a context which would have been very difficult to obtain for an individual mathematician not in possession of all the confidential letters, presentations, and other information available to the committee (in particular, some of my preconceived impressions about the nominees going into the process had to be corrected in light of this more complete information).  Secondly, I believe the four medalists are all extremely deserving recipients of the prize, and I fully stand by the decision of the committee to award the Fields medals this year to these four.

I’ll leave the comments to this post open for anyone who wishes to discuss the work of the medalists.  But, for the reasons above, I will not participate in the discussion myself.

[Edit, Aug 1: looks like the ICM site is (barely) up and running now, so links have been added.  At this time of writing, there does not seem to be an online announcement of the composition of the committee, but this should appear in due course. -T.]

[Edit, Aug 9: the composition of the Fields Medal Committee for 2018 (which included myself) can be found here. -T.]

About six years ago on this blog, I started thinking about trying to make a web-based game based around high-school algebra, and ended up using Scratch to write a short but playable puzzle game in which one solves linear equations for an unknown ${x}$ using a restricted set of moves. (At almost the same time, there were a number of more professionally made games released along similar lines, most notably Dragonbox.)

Since then, I have thought a couple times about whether there were other parts of mathematics which could be gamified in a similar fashion. Shortly after my first blog posts on this topic, I experimented with a similar gamification of Lewis Carroll’s classic list of logic puzzles, but the results were quite clunky, and I was never satisfied with the results.

Over the last few weeks I returned to this topic though, thinking in particular about how to gamify the rules of inference of propositional logic, in a manner that at least vaguely resembles how mathematicians actually go about making logical arguments (e.g., splitting into cases, arguing by contradiction, using previous result as lemmas to help with subsequent ones, and so forth). The rules of inference are a list of a dozen or so deductive rules concerning propositional sentences (things like “(${A}$ AND ${B}$) OR (NOT ${C}$)”, where ${A,B,C}$ are some formulas). A typical such rule is Modus Ponens: if the sentence ${A}$ is known to be true, and the implication “${A}$ IMPLIES ${B}$” is also known to be true, then one can deduce that ${B}$ is also true. Furthermore, in this deductive calculus it is possible to temporarily introduce some unproven statements as an assumption, only to discharge them later. In particular, we have the deduction theorem: if, after making an assumption ${A}$, one is able to derive the statement ${B}$, then one can conclude that the implication “${A}$ IMPLIES ${B}$” is true without any further assumption.

It took a while for me to come up with a workable game-like graphical interface for all of this, but I finally managed to set one up, now using Javascript instead of Scratch (which would be hopelessly inadequate for this task); indeed, part of the motivation of this project was to finally learn how to program in Javascript, which turned out to be not as formidable as I had feared (certainly having experience with other C-like languages like C++, Java, or lua, as well as some prior knowledge of HTML, was very helpful). The main code for this project is available here. Using this code, I have created an interactive textbook in the style of a computer game, which I have titled “QED”. This text contains thirty-odd exercises arranged in twelve sections that function as game “levels”, in which one has to use a given set of rules of inference, together with a given set of hypotheses, to reach a desired conclusion. The set of available rules increases as one advances through the text; in particular, each new section gives one or more rules, and additionally each exercise one solves automatically becomes a new deduction rule one can exploit in later levels, much as lemmas and propositions are used in actual mathematics to prove more difficult theorems. The text automatically tries to match available deduction rules to the sentences one clicks on or drags, to try to minimise the amount of manual input one needs to actually make a deduction.

Most of one’s proof activity takes place in a “root environment” of statements that are known to be true (under the given hypothesis), but for more advanced exercises one has to also work in sub-environments in which additional assumptions are made. I found the graphical metaphor of nested boxes to be useful to depict this tree of sub-environments, and it seems to combine well with the drag-and-drop interface.

The text also logs one’s moves in a more traditional proof format, which shows how the mechanics of the game correspond to a traditional mathematical argument. My hope is that this will give students a way to understand the underlying concept of forming a proof in a manner that is more difficult to achieve using traditional, non-interactive textbooks.

I have tried to organise the exercises in a game-like progression in which one first works with easy levels that train the player on a small number of moves, and then introduce more advanced moves one at a time. As such, the order in which the rules of inference are introduced is a little idiosyncratic. The most powerful rule (the law of the excluded middle, which is what separates classical logic from intuitionistic logic) is saved for the final section of the text.

Anyway, I am now satisfied enough with the state of the code and the interactive text that I am willing to make both available (and open source; I selected a CC-BY licence for both), and would be happy to receive feedback on any aspect of the either. In principle one could extend the game mechanics to other mathematical topics than the propositional calculus – the rules of inference for first-order logic being an obvious next candidate – but it seems to make sense to focus just on propositional logic for now.

Let ${(X,T,\mu)}$ be a measure-preserving system – a probability space ${(X,\mu)}$ equipped with a measure-preserving translation ${T}$ (which for simplicity of discussion we shall assume to be invertible). We will informally think of two points ${x,y}$ in this space as being “close” if ${y = T^n x}$ for some ${n}$ that is not too large; this allows one to distinguish between “local” structure at a point ${x}$ (in which one only looks at nearby points ${T^n x}$ for moderately large ${n}$) and “global” structure (in which one looks at the entire space ${X}$). The local/global distinction is also known as the time-averaged/space-averaged distinction in ergodic theory.

A measure-preserving system is said to be ergodic if all the invariant sets are either zero measure or full measure. An equivalent form of this statement is that any measurable function ${f: X \rightarrow {\bf R}}$ which is locally essentially constant in the sense that ${f(Tx) = f(x)}$ for ${\mu}$-almost every ${x}$, is necessarily globally essentially constant in the sense that there is a constant ${c}$ such that ${f(x) = c}$ for ${\mu}$-almost every ${x}$. A basic consequence of ergodicity is the mean ergodic theorem: if ${f \in L^2(X,\mu)}$, then the averages ${x \mapsto \frac{1}{N} \sum_{n=1}^N f(T^n x)}$ converge in ${L^2}$ norm to the mean ${\int_X f\ d\mu}$. (The mean ergodic theorem also applies to other ${L^p}$ spaces with ${1 < p < \infty}$, though it is usually proven first in the Hilbert space ${L^2}$.) Informally: in ergodic systems, time averages are asymptotically equal to space averages. Specialising to the case of indicator functions, this implies in particular that ${\frac{1}{N} \sum_{n=1}^N \mu( E \cap T^n E)}$ converges to ${\mu(E)^2}$ for any measurable set ${E}$.

In this short note I would like to use the mean ergodic theorem to show that ergodic systems also have the property that “somewhat locally constant” functions are necessarily “somewhat globally constant”; this is not a deep observation, and probably already in the literature, but I found it a cute statement that I had not previously seen. More precisely:

Corollary 1 Let ${(X,T,\mu)}$ be an ergodic measure-preserving system, and let ${f: X \rightarrow {\bf R}}$ be measurable. Suppose that

$\displaystyle \limsup_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \mu( \{ x \in X: f(T^n x) = f(x) \} ) \geq \delta \ \ \ \ \ (1)$

for some ${0 \leq \delta \leq 1}$. Then there exists a constant ${c}$ such that ${f(x)=c}$ for ${x}$ in a set of measure at least ${\delta}$.

Informally: if ${f}$ is locally constant on pairs ${x, T^n x}$ at least ${\delta}$ of the time, then ${f}$ is globally constant at least ${\delta}$ of the time. Of course the claim fails if the ergodicity hypothesis is dropped, as one can simply take ${f}$ to be an invariant function that is not essentially constant, such as the indicator function of an invariant set of intermediate measure. This corollary can be viewed as a manifestation of the general principle that ergodic systems have the same “global” (or “space-averaged”) behaviour as “local” (or “time-averaged”) behaviour, in contrast to non-ergodic systems in which local properties do not automatically transfer over to their global counterparts.

Proof: By composing ${f}$ with (say) the tangent function, we may assume without loss of generality that ${f}$ is bounded. Let ${k>0}$, and partition ${X}$ as ${\bigcup_{m \in {\bf Z}} E_{m,k}}$, where ${E_{m,k}}$ is the level set

$\displaystyle E_{m,k} := \{ x \in X: m 2^{-k} \leq f(x) < (m+1) 2^{-k} \}.$

For each ${k}$, only finitely many of the ${E_{m,k}}$ are non-empty. By (1), one has

$\displaystyle \limsup_{N \rightarrow \infty} \sum_m \frac{1}{N} \sum_{n=1}^N \mu( E_{m,k} \cap T^n E_{m,k} ) \geq \delta.$

Using the ergodic theorem, we conclude that

$\displaystyle \sum_m \mu( E_{m,k} )^2 \geq \delta.$

On the other hand, ${\sum_m \mu(E_{m,k}) = 1}$. Thus there exists ${m_k}$ such that ${\mu(E_{m_k,k}) \geq \delta}$, thus

$\displaystyle \mu( \{ x \in X: m_k 2^{-k} \leq f(x) < (m_k+1) 2^{-k} \} ) \geq \delta.$

By the Bolzano-Weierstrass theorem, we may pass to a subsequence where ${m_k 2^{-k}}$ converges to a limit ${c}$, then we have

$\displaystyle \mu( \{ x \in X: c-2^{-k} \leq f(x) \leq c+2^{-k} \}) \geq \delta$

for infinitely many ${k}$, and hence

$\displaystyle \mu( \{ x \in X: f(x) = c \}) \geq \delta.$

The claim follows. $\Box$

Let ${G = (G,+)}$, ${H = (H,+)}$ be additive groups (i.e., groups with an abelian addition group law). A map ${f: G \rightarrow H}$ is a homomorphism if one has

$\displaystyle f(x+y) - f(x) - f(y) = 0$

for all ${x,y \in G}$. A map ${f: G \rightarrow H}$ is an affine homomorphism if one has

$\displaystyle f(x_1) - f(x_2) + f(x_3) - f(x_4) = 0 \ \ \ \ \ (1)$

for all additive quadruples ${(x_1,x_2,x_3,x_4)}$ in ${G}$, by which we mean that ${x_1,x_2,x_3,x_4 \in G}$ and ${x_1-x_2+x_3-x_4=0}$. The two notions are closely related; it is easy to verify that ${f}$ is an affine homomorphism if and only if ${f}$ is the sum of a homomorphism and a constant.

Now suppose that ${H}$ also has a translation-invariant metric ${d}$. A map ${f: G \rightarrow H}$ is said to be a quasimorphism if one has

$\displaystyle f(x+y) - f(x) - f(y) = O(1) \ \ \ \ \ (2)$

for all ${x,y \in G}$, where ${O(1)}$ denotes a quantity at a bounded distance from the origin. Similarly, ${f: G \rightarrow H}$ is an affine quasimorphism if

$\displaystyle f(x_1) - f(x_2) + f(x_3) - f(x_4) = O(1) \ \ \ \ \ (3)$

for all additive quadruples ${(x_1,x_2,x_3,x_4)}$ in ${G}$. Again, one can check that ${f}$ is an affine quasimorphism if and only if it is the sum of a quasimorphism and a constant (with the implied constant of the quasimorphism controlled by the implied constant of the affine quasimorphism). (Since every constant is itself a quasimorphism, it is in fact the case that affine quasimorphisms are quasimorphisms, but now the implied constant in the latter is not controlled by the implied constant of the former.)

“Trivial” examples of quasimorphisms include the sum of a homomorphism and a bounded function. Are there others? In some cases, the answer is no. For instance, suppose we have a quasimorphism ${f: {\bf Z} \rightarrow {\bf R}}$. Iterating (2), we see that ${f(kx) = kf(x) + O(k)}$ for any integer ${x}$ and natural number ${k}$, which we can rewrite as ${f(kx)/kx = f(x)/x + O(1/|x|)}$ for non-zero ${x}$. Also, ${f}$ is Lipschitz. Sending ${k \rightarrow \infty}$, we can verify that ${f(x)/x}$ is a Cauchy sequence as ${x \rightarrow \infty}$ and thus tends to some limit ${\alpha}$; we have ${\alpha = f(x)/x + O(1/x)}$ for ${x \geq 1}$, hence ${f(x) = \alpha x + O(1)}$ for positive ${x}$, and then one can use (2) one last time to obtain ${f(x) = \alpha x + O(1)}$ for all ${x}$. Thus ${f}$ is the sum of the homomorphism ${x \mapsto \alpha x}$ and a bounded sequence.

In general, one can phrase this problem in the language of group cohomology (discussed in this previous post). Call a map ${f: G \rightarrow H}$ a ${0}$-cocycle. A ${1}$-cocycle is a map ${\rho: G \times G \rightarrow H}$ obeying the identity

$\displaystyle \rho(x,y+z) + \rho(y,z) = \rho(x,y) + \rho(x+y,z)$

for all ${x,y,z \in G}$. Given a ${0}$-cocycle ${f: G \rightarrow H}$, one can form its derivative ${\partial f: G \times G \rightarrow H}$ by the formula

$\displaystyle \partial f(x,y) := f(x+y)-f(x)-f(y).$

Such functions are called ${1}$-coboundaries. It is easy to see that the abelian group of ${1}$-coboundaries is a subgroup of the abelian group of ${1}$-cocycles. The quotient of these two groups is the first group cohomology of ${G}$ with coefficients in ${H}$, and is denoted ${H^1(G; H)}$.

If a ${0}$-cocycle is bounded then its derivative is a bounded ${1}$-coboundary. The quotient of the group of bounded ${1}$-cocycles by the derivatives of bounded ${0}$-cocycles is called the bounded first group cohomology of ${G}$ with coefficients in ${H}$, and is denoted ${H^1_b(G; H)}$. There is an obvious homomorphism ${\phi}$ from ${H^1_b(G; H)}$ to ${H^1(G; H)}$, formed by taking a coset of the space of derivatives of bounded ${0}$-cocycles, and enlarging it to a coset of the space of ${1}$-coboundaries. By chasing all the definitions, we see that all quasimorphism from ${G}$ to ${H}$ are the sum of a homomorphism and a bounded function if and only if this homomorphism ${\phi}$ is injective; in fact the quotient of the space of quasimorphisms by the sum of homomorphisms and bounded functions is isomorphic to the kernel of ${\phi}$.

In additive combinatorics, one is often working with functions which only have additive structure a fraction of the time, thus for instance (1) or (3) might only hold “${1\%}$ of the time”. This makes it somewhat difficult to directly interpret the situation in terms of group cohomology. However, thanks to tools such as the Balog-Szemerédi-Gowers lemma, one can upgrade this sort of ${1\%}$-structure to ${100\%}$-structure – at the cost of restricting the domain to a smaller set. Here I record one such instance of this phenomenon, thus giving a tentative link between additive combinatorics and group cohomology. (I thank Yuval Wigderson for suggesting the problem of locating such a link.)

Theorem 1 Let ${G = (G,+)}$, ${H = (H,+)}$ be additive groups with ${|G|=N}$, let ${S}$ be a subset of ${H}$, let ${E \subset G}$, and let ${f: E \rightarrow H}$ be a function such that

$\displaystyle f(x_1) - f(x_2) + f(x_3) - f(x_4) \in S$

for ${\geq K^{-1} N^3}$ additive quadruples ${(x_1,x_2,x_3,x_4)}$ in ${E}$. Then there exists a subset ${A}$ of ${G}$ containing ${0}$ with ${|A| \gg K^{-O(1)} N}$, a subset ${X}$ of ${H}$ with ${|X| \ll K^{O(1)}}$, and a function ${g: 4A-4A \rightarrow H}$ such that

$\displaystyle g(x+y) - g(x)-g(y) \in X + 496S - 496S \ \ \ \ \ (4)$

for all ${x, y \in 2A-2A}$ (thus, the derivative ${\partial g}$ takes values in ${X + 496 S - 496 S}$ on ${2A - 2A}$), and such that for each ${h \in A}$, one has

$\displaystyle f(x+h) - f(x) - g(h) \in 8S - 8S \ \ \ \ \ (5)$

for ${\gg K^{-O(1)} N}$ values of ${x \in E}$.

Presumably the constants ${8}$ and ${496}$ can be improved further, but we have not attempted to optimise these constants. We chose ${2A-2A}$ as the domain on which one has a bounded derivative, as one can use the Bogulybov lemma (see e.g, Proposition 4.39 of my book with Van Vu) to find a large Bohr set inside ${2A-2A}$. In applications, the set ${S}$ need not have bounded size, or even bounded doubling; for instance, in the inverse ${U^4}$ theory over a small finite fields ${F}$, one would be interested in the situation where ${H}$ is the group of ${n \times n}$ matrices with coefficients in ${F}$ (for some large ${n}$, and ${S}$ being the subset consisting of those matrices of rank bounded by some bound ${C = O(1)}$.

Proof: By hypothesis, there are ${\geq K N^3}$ triples ${(h,x,y) \in G^3}$ such that ${x,x+h,y,y+h \in E}$ and

$\displaystyle f(x+h) - f(x) \in f(y+h)-f(y) + S. \ \ \ \ \ (6)$

Thus, there is a set ${B \subset G}$ with ${|B| \gg K^{-1} N}$ such that for all ${h \in B}$, one has (6) for ${\gg K^{-1} N^2}$ pairs ${(x,y) \in G^2}$ with ${x,x+h,y,y+h \in E}$; in particular, there exists ${y = y(h) \in E \cap (E-h)}$ such that (6) holds for ${\gg K^{-1} N}$ values of ${x \in E \cap (E-h)}$. Setting ${g_0(h) := f(y(h)+h) - f(y(h))}$, we conclude that for each ${h \in B}$, one has

$\displaystyle f(x+h) - f(x) \in g_0(h) + S \ \ \ \ \ (7)$

for ${\gg K^{-1} N}$ values of ${x \in E \cap (E-h)}$.

Consider the bipartite graph whose vertex sets are two copies of ${E}$, and ${x}$ and ${x+h}$ connected by a (directed) edge if ${h \in B}$ and (7) holds. Then this graph has ${\gg K^{-2} N^2}$ edges. Applying (a slight modification of) the Balog-Szemerédi-Gowers theorem (for instance by modifying the proof of Corollary 5.19 of my book with Van Vu), we can then find a subset ${C}$ of ${E}$ with ${|C| \gg K^{-O(1)} N}$ with the property that for any ${x_1,x_3 \in C}$, there exist ${\gg K^{-O(1)} N^3}$ triples ${(x_2,y_1,y_2) \in E^3}$ such that the edges ${(x_1,y_1), (x_2,y_1), (x_2,y_2), (x_3,y_2)}$ all lie in this bipartite graph. This implies that, for all ${x_1,x_3 \in C}$, there exist ${\gg K^{-O(1)} N^7}$ septuples ${(x_2,y_1,y_2,z_{11},z_{21},z_{22},z_{32}) \in G^7}$ obeying the constraints

$\displaystyle f(y_j) - f(x_i), f(y_j+z_{ij}) - f(x_i+z_{ij}) \in g_0(y_j-x_i) + S$

and ${y_j, x_i, y_j+z_{ij}, x_i+z_{ij} \in E}$ for ${ij = 11, 21, 22, 32}$. These constraints imply in particular that

$\displaystyle f(x_3) - f(x_1) \in f(x_3+z_{32}) - f(y_2+z_{32}) + f(y_2+z_{22}) - f(x_2+z_{22}) + f(x_2+z_{21}) - f(y_1+z_{21}) + f(y_1+z_{11}) - f(x_1+z_{11}) + 4S - 4S.$

Also observe that

$\displaystyle x_3 - x_1 = (x_3+z_{32}) - (y_2+z_{32}) + (y_2+z_{22}) - (x_2+z_{22}) + (x_2+z_{21}) - (y_1+z_{21}) + (y_1+z_{11}) - (x_1+z_{11}).$

Thus, if ${h \in G}$ and ${x_3,x_1 \in C}$ are such that ${x_3-x_1 = h}$, we see that

$\displaystyle f(w_1) - f(w_2) + f(w_3) - f(w_4) + f(w_5) - f(w_6) + f(w_7) - f(w_8) \in f(x_3) - f(x_1) + 4S - 4S$

for ${\gg K^{-O(1)} N^7}$ octuples ${(w_1,w_2,w_3,w_4,w_5,w_6,w_7,w_8) \in E^8}$ in the hyperplane

$\displaystyle h = w_1 - w_2 + w_3 - w_4 + w_5 - w_6 + w_7 - w_8.$

By the pigeonhole principle, this implies that for any fixed ${h \in G}$, there can be at most ${O(K^{O(1)})}$ sets of the form ${f(x_3)-f(x_1) + 3S-3S}$ with ${x_3-x_1=h}$, ${x_1,x_3 \in C}$ that are pairwise disjoint. Using a greedy algorithm, we conclude that there is a set ${W_h}$ of cardinality ${O(K^{O(1)})}$, such that each set ${f(x_3) - f(x_1) + 3S-3S}$ with ${x_3-x_1=h}$, ${x_1,x_3 \in C}$ intersects ${w+4S -4S}$ for some ${w \in W_h}$, or in other words that

$\displaystyle f(x_3) - f(x_1) \in W_{x_3-x_1} + 8S-8S \ \ \ \ \ (8)$

whenever ${x_1,x_3 \in C}$. In particular,

$\displaystyle \sum_{h \in G} \sum_{w \in W_h} | \{ (x_1,x_3) \in C^2: x_3-x_1 = h; f(x_3) - f(x_1) \in w + 8S-8S \}| \geq |C|^2 \gg K^{-O(1)} N^2.$

This implies that there exists a subset ${A}$ of ${G}$ with ${|A| \gg K^{-O(1)} N}$, and an element ${g_1(h) \in W_h}$ for each ${h \in A}$, such that

$\displaystyle | \{ (x_1,x_3) \in C^2: x_3-x_1 = h; f(x_3) - f(x_1) \in g_1(h) + 8S-8S \}| \gg K^{-O(1)} N \ \ \ \ \ (9)$

for all ${h \in A}$. Note we may assume without loss of generality that ${0 \in A}$ and ${g_1(0)=0}$.

Suppose that ${h_1,\dots,h_{16} \in A}$ are such that

$\displaystyle \sum_{i=1}^{16} (-1)^{i-1} h_i = 0. \ \ \ \ \ (10)$

By construction of ${A}$, and permuting labels, we can find ${\gg K^{-O(1)} N^{16}}$ 16-tuples ${(x_1,\dots,x_{16},y_1,\dots,y_{16}) \in C^{32}}$ such that

$\displaystyle y_i - x_i = (-1)^{i-1} h_i$

and

$\displaystyle f(y_i) - f(x_i) \in (-1)^{i-1} g_i(h) + 8S - 8S$

for ${i=1,\dots,16}$. We sum this to obtain

$\displaystyle f(y_1) + \sum_{i=1}^{15} f(y_{i+1})-f(x_i) - f(x_8) \in \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 128 S - 128 S$

and hence by (8)

$\displaystyle f(y_1) - f(x_{16}) + \sum_{i=1}^{15} W_{k_i} \in \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 248 S - 248 S$

where ${k_i := y_{i+1}-x_i}$. Since

$\displaystyle y_1 - x_{16} + \sum_{i=1}^{15} k_i = 0$

we see that there are only ${N^{16}}$ possible values of ${(y_1,x_{16},k_1,\dots,k_{15})}$. By the pigeonhole principle, we conclude that at most ${O(K^{O(1)})}$ of the sets ${\sum_{i=1}^{16} (-1)^i g_1(h_i) + 248 S - 248 S}$ can be disjoint. Arguing as before, we conclude that there exists a set ${X}$ of cardinality ${O(K^{O(1)})}$ such that

$\displaystyle \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) \in X + 496 S - 496 S \ \ \ \ \ (11)$

whenever (10) holds.

For any ${h \in 4A-4A}$, write ${h}$ arbitrarily as ${h = \sum_{i=1}^8 (-1)^{i-1} h_i}$ for some ${h_1,\dots,h_8 \in A}$ (with ${h_5=\dots=h_8=0}$ if ${h \in 2A-2A}$, and ${h_2 = \dots = h_8 = 0}$ if ${h \in A}$) and then set

$\displaystyle g(h) := \sum_{i=1}^8 (-1)^i g_1(h_i).$

Then from (11) we have (4). For ${h \in A}$ we have ${g(h) = g_1(h)}$, and (5) then follows from (9). $\Box$

This is a sequel to this previous blog post, in which we discussed the effect of the heat flow evolution

$\displaystyle \partial_t P(t,z) = \partial_{zz} P(t,z)$

on the zeroes of a time-dependent family of polynomials ${z \mapsto P(t,z)}$, with a particular focus on the case when the polynomials ${z \mapsto P(t,z)}$ had real zeroes. Here (inspired by some discussions I had during a recent conference on the Riemann hypothesis in Bristol) we record the analogous theory in which the polynomials instead have zeroes on a circle ${\{ z: |z| = \sqrt{q} \}}$, with the heat flow slightly adjusted to compensate for this. As we shall discuss shortly, a key example of this situation arises when ${P}$ is the numerator of the zeta function of a curve.

More precisely, let ${g}$ be a natural number. We will say that a polynomial

$\displaystyle P(z) = \sum_{j=0}^{2g} a_j z^j$

of degree ${2g}$ (so that ${a_{2g} \neq 0}$) obeys the functional equation if the ${a_j}$ are all real and

$\displaystyle a_j = q^{g-j} a_{2g-j}$

for all ${j=0,\dots,2g}$, thus

$\displaystyle P(\overline{z}) = \overline{P(z)}$

and

$\displaystyle P(q/z) = q^g z^{-2g} P(z)$

for all non-zero ${z}$. This means that the ${2g}$ zeroes ${\alpha_1,\dots,\alpha_{2g}}$ of ${P(z)}$ (counting multiplicity) lie in ${{\bf C} \backslash \{0\}}$ and are symmetric with respect to complex conjugation ${z \mapsto \overline{z}}$ and inversion ${z \mapsto q/z}$ across the circle ${\{ |z| = \sqrt{q}\}}$. We say that this polynomial obeys the Riemann hypothesis if all of its zeroes actually lie on the circle ${\{ z = \sqrt{q}\}}$. For instance, in the ${g=1}$ case, the polynomial ${z^2 - a_1 z + q}$ obeys the Riemann hypothesis if and only if ${|a_1| \leq 2\sqrt{q}}$.

Such polynomials arise in number theory as follows: if ${C}$ is a projective curve of genus ${g}$ over a finite field ${\mathbf{F}_q}$, then, as famously proven by Weil, the associated local zeta function ${\zeta_{C,q}(z)}$ (as defined for instance in this previous blog post) is known to take the form

$\displaystyle \zeta_{C,q}(z) = \frac{P(z)}{(1-z)(1-qz)}$

where ${P}$ is a degree ${2g}$ polynomial obeying both the functional equation and the Riemann hypothesis. In the case that ${C}$ is an elliptic curve, then ${g=1}$ and ${P}$ takes the form ${P(z) = z^2 - a_1 z + q}$, where ${a_1}$ is the number of ${{\bf F}_q}$-points of ${C}$ minus ${q+1}$. The Riemann hypothesis in this case is a famous result of Hasse.

Another key example of such polynomials arise from rescaled characteristic polynomials

$\displaystyle P(z) := \det( 1 - \sqrt{q} F ) \ \ \ \ \ (1)$

of ${2g \times 2g}$ matrices ${F}$ in the compact symplectic group ${Sp(g)}$. These polynomials obey both the functional equation and the Riemann hypothesis. The Sato-Tate conjecture (in higher genus) asserts, roughly speaking, that “typical” polyomials ${P}$ arising from the number theoretic situation above are distributed like the rescaled characteristic polynomials (1), where ${F}$ is drawn uniformly from ${Sp(g)}$ with Haar measure.

Given a polynomial ${z \mapsto P(0,z)}$ of degree ${2g}$ with coefficients

$\displaystyle P(0,z) = \sum_{j=0}^{2g} a_j(0) z^j,$

we can evolve it in time by the formula

$\displaystyle P(t,z) = \sum_{j=0}^{2g} \exp( t(j-g)^2 ) a_j(0) z^j,$

thus ${a_j(t) = \exp(t(j-g)) a_j(0)}$ for ${t \in {\bf R}}$. Informally, as one increases ${t}$, this evolution accentuates the effect of the extreme monomials, particularly, ${z^0}$ and ${z^{2g}}$ at the expense of the intermediate monomials such as ${z^g}$, and conversely as one decreases ${t}$. This family of polynomials obeys the heat-type equation

$\displaystyle \partial_t P(t,z) = (z \partial_z - g)^2 P(t,z). \ \ \ \ \ (2)$

In view of the results of Marcus, Spielman, and Srivastava, it is also very likely that one can interpret this flow in terms of expected characteristic polynomials involving conjugation over the compact symplectic group ${Sp(n)}$, and should also be tied to some sort of “${\beta=\infty}$” version of Brownian motion on this group, but we have not attempted to work this connection out in detail.

It is clear that if ${z \mapsto P(0,z)}$ obeys the functional equation, then so does ${z \mapsto P(t,z)}$ for any other time ${t}$. Now we investigate the evolution of the zeroes. Suppose at some time ${t_0}$ that the zeroes ${\alpha_1(t_0),\dots,\alpha_{2g}(t_0)}$ of ${z \mapsto P(t_0,z)}$ are distinct, then

$\displaystyle P(t_0,z) = a_{2g}(0) \exp( t_0g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t_0) ).$

From the inverse function theorem we see that for times ${t}$ sufficiently close to ${t_0}$, the zeroes ${\alpha_1(t),\dots,\alpha_{2g}(t)}$ of ${z \mapsto P(t,z)}$ continue to be distinct (and vary smoothly in ${t}$), with

$\displaystyle P(t,z) = a_{2g}(0) \exp( t g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t) ).$

Differentiating this at any ${z}$ not equal to any of the ${\alpha_j(t)}$, we obtain

$\displaystyle \partial_t P(t,z) = P(t,z) ( g^2 - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)})$

and

$\displaystyle \partial_z P(t,z) = P(t,z) ( \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)})$

and

$\displaystyle \partial_{zz} P(t,z) = P(t,z) ( \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}).$

Inserting these formulae into (2) (expanding ${(z \partial_z - g)^2}$ as ${z^2 \partial_{zz} - (2g-1) z \partial_z + g^2}$) and canceling some terms, we conclude that

$\displaystyle - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)} = z^2 \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}$

$\displaystyle - (2g-1) z \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)}$

for ${t}$ sufficiently close to ${t_0}$, and ${z}$ not equal to ${\alpha_1(t),\dots,\alpha_{2g}(t)}$. Extracting the residue at ${z = \alpha_j(t)}$, we conclude that

$\displaystyle - \alpha'_j(t) = 2 \alpha_j(t)^2 \sum_{1 \leq k \leq 2g: k \neq j} \frac{1}{\alpha_j(t) - \alpha_k(t)} - (2g-1) \alpha_j(t)$

which we can rearrange as

$\displaystyle \frac{\alpha'_j(t)}{\alpha_j(t)} = - \sum_{1 \leq k \leq 2g: k \neq j} \frac{\alpha_j(t)+\alpha_k(t)}{\alpha_j(t)-\alpha_k(t)}.$

If we make the change of variables ${\alpha_j(t) = \sqrt{q} e^{i\theta_j(t)}}$ (noting that one can make ${\theta_j}$ depend smoothly on ${t}$ for ${t}$ sufficiently close to ${t_0}$), this becomes

$\displaystyle \partial_t \theta_j(t) = \sum_{1 \leq k \leq 2g: k \neq j} \cot \frac{\theta_j(t) - \theta_k(t)}{2}. \ \ \ \ \ (3)$

Intuitively, this equation asserts that the phases ${\theta_j}$ repel each other if they are real (and attract each other if their difference is imaginary). If ${z \mapsto P(t_0,z)}$ obeys the Riemann hypothesis, then the ${\theta_j}$ are all real at time ${t_0}$, then the Picard uniqueness theorem (applied to ${\theta_j(t)}$ and its complex conjugate) then shows that the ${\theta_j}$ are also real for ${t}$ sufficiently close to ${t_0}$. If we then define the entropy functional

$\displaystyle H(\theta_1,\dots,\theta_{2g}) := \sum_{1 \leq j < k \leq 2g} \log \frac{1}{|\sin \frac{\theta_j-\theta_k}{2}| }$

then the above equation becomes a gradient flow

$\displaystyle \partial_t \theta_j(t) = - 2 \frac{\partial H}{\partial \theta_j}( \theta_1(t),\dots,\theta_{2g}(t) )$

which implies in particular that ${H(\theta_1(t),\dots,\theta_{2g}(t))}$ is non-increasing in time. This shows that as one evolves time forward from ${t_0}$, there is a uniform lower bound on the separation between the phases ${\theta_1(t),\dots,\theta_{2g}(t)}$, and hence the equation can be solved indefinitely; in particular, ${z \mapsto P(t,z)}$ obeys the Riemann hypothesis for all ${t > t_0}$ if it does so at time ${t_0}$. Our argument here assumed that the zeroes of ${z \mapsto P(t_0,z)}$ were simple, but this assumption can be removed by the usual limiting argument.

For any polynomial ${z \mapsto P(0,z)}$ obeying the functional equation, the rescaled polynomials ${z \mapsto e^{-g^2 t} P(t,z)}$ converge locally uniformly to ${a_{2g}(0) (z^{2g} + q^g)}$ as ${t \rightarrow +\infty}$. By Rouche’s theorem, we conclude that the zeroes of ${z \mapsto P(t,z)}$ converge to the equally spaced points ${\{ e^{2\pi i(j+1/2)/2g}: j=1,\dots,2g\}}$ on the circle ${\{ |z| = \sqrt{q}\}}$. Together with the symmetry properties of the zeroes, this implies in particular that ${z \mapsto P(t,z)}$ obeys the Riemann hypothesis for all sufficiently large positive ${t}$. In the opposite direction, when ${t \rightarrow -\infty}$, the polynomials ${z \mapsto P(t,z)}$ converge locally uniformly to ${a_g(0) z^g}$, so if ${a_g(0) \neq 0}$, ${g}$ of the zeroes converge to the origin and the other ${g}$ converge to infinity. In particular, ${z \mapsto P(t,z)}$ fails the Riemann hypothesis for sufficiently large negative ${t}$. Thus (if ${a_g(0) \neq 0}$), there must exist a real number ${\Lambda}$, which we call the de Bruijn-Newman constant of the original polynomial ${z \mapsto P(0,z)}$, such that ${z \mapsto P(t,z)}$ obeys the Riemann hypothesis for ${t \geq \Lambda}$ and fails the Riemann hypothesis for ${t < \Lambda}$. The situation is a bit more complicated if ${a_g(0)}$ vanishes; if ${k}$ is the first natural number such that ${a_{g+k}(0)}$ (or equivalently, ${a_{g-j}(0)}$) does not vanish, then by the above arguments one finds in the limit ${t \rightarrow -\infty}$ that ${g-k}$ of the zeroes go to the origin, ${g-k}$ go to infinity, and the remaining ${2k}$ zeroes converge to the equally spaced points ${\{ e^{2\pi i(j+1/2)/2k}: j=1,\dots,2k\}}$. In this case the de Bruijn-Newman constant remains finite except in the degenerate case ${k=g}$, in which case ${\Lambda = -\infty}$.

For instance, consider the case when ${g=1}$ and ${P(0,z) = z^2 - a_1 z + q}$ for some real ${a_1}$ with ${|a_1| \leq 2\sqrt{q}}$. Then the quadratic polynomial

$\displaystyle P(t,z) = e^t z^2 - a_1 z + e^t q$

has zeroes

$\displaystyle \frac{a_1 \pm \sqrt{a_1^2 - 4 e^{2t} q}}{2e^t}$

and one easily checks that these zeroes lie on the circle ${\{ |z|=\sqrt{q}\}}$ when ${t \geq \log \frac{|a_1|}{2\sqrt{q}}}$, and are on the real axis otherwise. Thus in this case we have ${\Lambda = \log \frac{|a_1|}{2\sqrt{q}}}$ (with ${\Lambda=-\infty}$ if ${a_1=0}$). Note how as ${t}$ increases to ${+\infty}$, the zeroes repel each other and eventually converge to ${\pm i \sqrt{q}}$, while as ${t}$ decreases to ${-\infty}$, the zeroes collide and then separate on the real axis, with one zero going to the origin and the other to infinity.

The arguments in my paper with Brad Rodgers (discussed in this previous post) indicate that for a “typical” polynomial ${P}$ of degree ${g}$ that obeys the Riemann hypothesis, the expected time to relaxation to equilibrium (in which the zeroes are equally spaced) should be comparable to ${1/g}$, basically because the average spacing is ${1/g}$ and hence by (3) the typical velocity of the zeroes should be comparable to ${g}$, and the diameter of the unit circle is comparable to ${1}$, thus requiring time comparable to ${1/g}$ to reach equilibrium. Taking contrapositives, this suggests that the de Bruijn-Newman constant ${\Lambda}$ should typically take on values comparable to ${-1/g}$ (since typically one would not expect the initial configuration of zeroes to be close to evenly spaced). I have not attempted to formalise or prove this claim, but presumably one could do some numerics (perhaps using some of the examples of ${P}$ given previously) to explore this further.

I have just uploaded to the arXiv my paper “Commutators close to the identity“, submitted to the Journal of Operator Theory. This paper resulted from some progress I made on the problem discussed in this previous post. Recall in that post the following result of Popa: if ${D,X \in B(H)}$ are bounded operators on a Hilbert space ${H}$ whose commutator ${[D,X] := DX-XD}$ is close to the identity in the sense that

$\displaystyle \| [D,X] - I \|_{op} \leq \varepsilon \ \ \ \ \ (1)$

for some ${\varepsilon > 0}$, then one has the lower bound

$\displaystyle \| X \|_{op} \|D \|_{op} \geq \frac{1}{2} \log \frac{1}{\varepsilon}. \ \ \ \ \ (2)$

In the other direction, for any ${0 < \varepsilon < 1}$, there are examples of operators ${D,X \in B(H)}$ obeying (1) such that

$\displaystyle \| X \|_{op} \|D \|_{op} \ll \varepsilon^{-2}. \ \ \ \ \ (3)$

In this paper we improve the upper bound to come closer to the lower bound:

Theorem 1 For any ${0 < \varepsilon < 1/2}$, and any infinite-dimensional ${H}$, there exist operators ${D,X \in B(H)}$ obeying (1) such that

$\displaystyle \| X \|_{op} \|D \|_{op} \ll \log^{16} \frac{1}{\varepsilon}. \ \ \ \ \ (4)$

One can probably improve the exponent ${16}$ somewhat by a modification of the methods, though it does not seem likely that one can lower it all the way to ${1}$ without a substantially new idea. Nevertheless I believe it plausible that the lower bound (2) is close to optimal.

We now sketch the methods of proof. The construction giving (3) proceeded by first identifying ${B(H)}$ with the algebra ${M_2(B(H))}$ of ${2 \times 2}$ matrices that have entries in ${B(H)}$. It is then possible to find two matrices ${D, X \in M_2(B(H))}$ whose commutator takes the form

$\displaystyle [D,X] = \begin{pmatrix} I & u \\ 0 & I \end{pmatrix}$

for some bounded operator ${u \in B(H)}$ (for instance one can take ${u}$ to be an isometry). If one then conjugates ${D, X}$ by the diagonal operator ${\mathrm{diag}(\varepsilon,1)}$, one can eusure that (1) and (3) both hold.

It is natural to adapt this strategy to ${n \times n}$ matrices ${D,X \in M_n(B(H))}$ rather than ${2 \times 2}$ matrices, where ${n}$ is a parameter at one’s disposal. If one can find matrices ${D,X \in M_n(B(H))}$ that are almost upper triangular (in that only the entries on or above the lower diagonal are non-zero), whose commutator ${[D,X]}$ only differs from the identity in the top right corner, thus

$\displaystyle [D, X] = \begin{pmatrix} I & 0 & 0 & \dots & 0 & S \\ 0 & I & 0 & \dots & 0 & 0 \\ 0 & 0 & I & \dots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & I & 0 \\ 0 & 0 & 0 & \dots & 0 & I \end{pmatrix}.$

for some ${S}$, then by conjugating by a diagonal matrix such as ${\mathrm{diag}( \mu^{n-1}, \mu^{n-2}, \dots, 1)}$ for some ${\mu}$ and optimising in ${\mu}$, one can improve the bound ${\varepsilon^{-2}}$ in (3) to ${O_n( \varepsilon^{-\frac{2}{n-1}} )}$; if the bounds in the implied constant in the ${O_n(1)}$ are polynomial in ${n}$, one can then optimise in ${n}$ to obtain a bound of the form (4) (perhaps with the exponent ${16}$ replaced by a different constant).

The task is then to find almost upper triangular matrices ${D, X}$ whose commutator takes the required form. The lower diagonals of ${D,X}$ must then commute; it took me a while to realise then that one could (usually) conjugate one of the matrices, say ${X}$ by a suitable diagonal matrix, so that the lower diagonal consisted entirely of the identity operator, which would make the other lower diagonal consist of a single operator, say ${u}$. After a lot of further lengthy experimentation, I eventually realised that one could conjugate ${X}$ further by unipotent upper triangular matrices so that all remaining entries other than those on the far right column vanished. Thus, without too much loss of generality, one can assume that ${X}$ takes the normal form

$\displaystyle X := \begin{pmatrix} 0 & 0 & 0 & \dots & 0 & b_1 \\ I & 0 & 0 & \dots & 0 & b_2 \\ 0 & I & 0 & \dots & 0 & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & 0 & b_{n-1} \\ 0 & 0 & 0 & \dots & I & b_n \end{pmatrix}.$

$\displaystyle D := \begin{pmatrix} v & I & 0 & \dots & 0 & b_1 u \\ u & v & 2 I & \dots & 0 & b_2 u \\ 0 & u & v & \dots & 0 & b_3 u \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & v & (n-1) I + b_{n-1} u \\ 0 & 0 & 0 & \dots & u & v + b_n u \end{pmatrix}$

for some ${u,v \in B(H)}$, solving the system of equations

$\displaystyle [v, b_i] + [u, b_{i-1}] + i b_{i+1} + b_i [u, b_n] = 0 \ \ \ \ \ (5)$

for ${i=2,\dots,n-1}$, and also

$\displaystyle [v, b_n] + [u, b_{n-1}] + b_n [u, b_n] = n \cdot 1_{B(H)}. \ \ \ \ \ (6)$

It turns out to be possible to solve this system of equations by a contraction mapping argument if one takes ${u,v}$ to be a “Hilbert’s hotel” pair of isometries as in the previous post, though the contraction is very slight, leading to polynomial losses in ${n}$ in the implied constant.

There is a further question raised in Popa’s paper which I was unable to resolve. As a special case of one of the main theorems (Theorem 2.1) of that paper, the following result was shown: if ${A \in B(H)}$ obeys the bounds

$\displaystyle \|A \| = O(1)$

and

$\displaystyle \| A \| = O( \mathrm{dist}( A, {\bf C} + K(H) )^{2/3} ) \ \ \ \ \ (7)$

(where ${{\bf C} + K(H)}$ denotes the space of all operators of the form ${\lambda I + T}$ with ${\lambda \in {\bf C}}$ and ${T}$ compact), then there exist operators ${D,X \in B(H)}$ with ${\|D\|, \|X\| = O(1)}$ such that ${A = [D,X]}$. (In fact, Popa’s result covers a more general situation in which one is working in a properly infinite ${W^*}$ algebra with non-trivial centre.) We sketch a proof of this result as follows. Suppose that ${\mathrm{dist}(A, {\bf C} + K(H)) = \varepsilon}$ and ${\|A\| = O( \varepsilon^{2/3})}$ for some ${0 < \varepsilon \ll 1}$. A standard greedy algorithm argument (see this paper of Brown and Pearcy) allows one to find orthonormal vectors ${e_n, f_n, g_n}$ for ${n=1,2,\dots}$ such that for each ${n}$, one has ${A e_n = \varepsilon_n f_n + v_n}$ for some ${\varepsilon_n}$ comparable to ${\varepsilon}$, and some ${v_n}$ orthogonal to all of the ${e_n,f_n,g_n}$. After some conjugation (and a suitable identification of ${B(H)}$ with ${M_2(B(H))}$, one can thus place ${A}$ in a normal form

$\displaystyle A = \begin{pmatrix} \varepsilon^{2/3} x & \varepsilon v^* \\ \varepsilon^{2/3} y & \varepsilon^{2/3} z \end{pmatrix}$

where ${v \in B(H)}$ is a isometry with infinite deficiency, and ${x,y,z \in B(H)}$ have norm ${O(1)}$. Setting ${\varepsilon' := \varepsilon^{1/3}}$, it then suffices to solve the commutator equation

$\displaystyle [D,X] = \begin{pmatrix} x & \varepsilon' v^* \\ y & z \end{pmatrix}$

with ${\|D\|_{op} \|X\|_{op} \ll (\varepsilon')^{-2}}$; note the similarity with (3).

By the usual Hilbert’s hotel construction, one can complement ${v}$ with another isometry ${u}$ obeying the “Hilbert’s hotel” identity

$\displaystyle uu^* + vv^* = I$

and also ${u^* u = v^* v = I}$, ${u^* v = v^* u = 0}$. Proceeding as in the previous post, we can try the ansatz

$\displaystyle D = \begin{pmatrix} \frac{1}{2} u^* & 0 \\ a & \frac{1}{2} u^* - v^* \end{pmatrix}, X = \begin{pmatrix} b & \varepsilon' I \\ c & d \end{pmatrix}$

for some operators ${a,b,c,d \in B(H)}$, leading to the system of equations

$\displaystyle [\frac{1}{2} u^*, b] + [\frac{1}{2} u^* - v^*, c] = x+z$

$\displaystyle \varepsilon' a = [\frac{1}{2} u^*, b] - x$

$\displaystyle \frac{1}{2} u^* c + c (\frac{1}{2} u^* - v^*) + ab-da = y.$

Using the first equation to solve for ${b,c}$, the second to then solve for ${a}$, and the third to then solve for ${c}$, one can obtain matrices ${D,X}$ with the required properties.

Thus far, my attempts to extend this construction to larger matrices with good bounds on ${D,X}$ have been unsuccessful. A model problem would be to express

$\displaystyle \begin{pmatrix} I & 0 & \varepsilon v^* \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}$

as a commutator ${[D,X]}$ with ${\|D\| \|X\|}$ significantly smaller than ${O(\varepsilon^{-2})}$. The construction in my paper achieves something like this, but with ${v^*}$ replaced by a more complicated operator. One would also need variants of this result in which one is allowed to perturb the above operator by an arbitrary finite rank operator of bounded operator norm.

Important note: As this is not a course in probability, we will try to avoid developing the general theory of stochastic calculus (which includes such concepts as filtrations, martingales, and Ito calculus). This will unfortunately limit what we can actually prove rigorously, and so at some places the arguments will be somewhat informal in nature. A rigorous treatment of many of the topics here can be found for instance in Lawler’s Conformally Invariant Processes in the Plane, from which much of the material here is drawn.

In these notes, random variables will be denoted in boldface.

Definition 1 A real random variable ${\mathbf{X}}$ is said to be normally distributed with mean ${x_0 \in {\bf R}}$ and variance ${\sigma^2 > 0}$ if one has

$\displaystyle \mathop{\bf E} F(\mathbf{X}) = \frac{1}{\sqrt{2\pi} \sigma} \int_{\bf R} e^{-(x-x_0)^2/2\sigma^2} F(x)\ dx$

for all test functions ${F \in C_c({\bf R})}$. Similarly, a complex random variable ${\mathbf{Z}}$ is said to be normally distributed with mean ${z_0 \in {\bf R}}$ and variance ${\sigma^2>0}$ if one has

$\displaystyle \mathop{\bf E} F(\mathbf{Z}) = \frac{1}{\pi \sigma^2} \int_{\bf C} e^{-|z-x_0|^2/\sigma^2} F(z)\ dx dy$

for all test functions ${F \in C_c({\bf C})}$, where ${dx dy}$ is the area element on ${{\bf C}}$.

A real Brownian motion with base point ${x_0 \in {\bf R}}$ is a random, almost surely continuous function ${\mathbf{B}^{x_0}: [0,+\infty) \rightarrow {\bf R}}$ (using the locally uniform topology on continuous functions) with the property that (almost surely) ${\mathbf{B}^{x_0}(0) = x_0}$, and for any sequence of times ${0 \leq t_0 < t_1 < t_2 < \dots < t_n}$, the increments ${\mathbf{B}^{x_0}(t_i) - \mathbf{B}^{x_0}(t_{i-1})}$ for ${i=1,\dots,n}$ are independent real random variables that are normally distributed with mean zero and variance ${t_i - t_{i-1}}$. Similarly, a complex Brownian motion with base point ${z_0 \in {\bf R}}$ is a random, almost surely continuous function ${\mathbf{B}^{z_0}: [0,+\infty) \rightarrow {\bf R}}$ with the property that ${\mathbf{B}^{z_0}(0) = z_0}$ and for any sequence of times ${0 \leq t_0 < t_1 < t_2 < \dots < t_n}$, the increments ${\mathbf{B}^{z_0}(t_i) - \mathbf{B}^{z_0}(t_{i-1})}$ for ${i=1,\dots,n}$ are independent complex random variables that are normally distributed with mean zero and variance ${t_i - t_{i-1}}$.

Remark 2 Thanks to the central limit theorem, the hypothesis that the increments ${\mathbf{B}^{x_0}(t_i) - \mathbf{B}^{x_0}(t_{i-1})}$ be normally distributed can be dropped from the definition of a Brownian motion, so long as one retains the independence and the normalisation of the mean and variance (technically one also needs some uniform integrability on the increments beyond the second moment, but we will not detail this here). A similar statement is also true for the complex Brownian motion (where now we need to normalise the variances and covariances of the real and imaginary parts of the increments).

Real and complex Brownian motions exist from any base point ${x_0}$ or ${z_0}$; see e.g. this previous blog post for a construction. We have the following simple invariances:

Exercise 3

• (i) (Translation invariance) If ${\mathbf{B}^{x_0}}$ is a real Brownian motion with base point ${x_0 \in {\bf R}}$, and ${h \in {\bf R}}$, show that ${\mathbf{B}^{x_0}+h}$ is a real Brownian motion with base point ${x_0+h}$. Similarly, if ${\mathbf{B}^{z_0}}$ is a complex Brownian motion with base point ${z_0 \in {\bf R}}$, and ${h \in {\bf C}}$, show that ${\mathbf{B}^{z_0}+c}$ is a complex Brownian motion with base point ${z_0+h}$.
• (ii) (Dilation invariance) If ${\mathbf{B}^{0}}$ is a real Brownian motion with base point ${0}$, and ${\lambda \in {\bf R}}$ is non-zero, show that ${t \mapsto \lambda \mathbf{B}^0(t / |\lambda|^{1/2})}$ is also a real Brownian motion with base point ${0}$. Similarly, if ${\mathbf{B}^0}$ is a complex Brownian motion with base point ${0}$, and ${\lambda \in {\bf C}}$ is non-zero, show that ${t \mapsto \lambda \mathbf{B}^0(t / |\lambda|^{1/2})}$ is also a complex Brownian motion with base point ${0}$.
• (iii) (Real and imaginary parts) If ${\mathbf{B}^0}$ is a complex Brownian motion with base point ${0}$, show that ${\sqrt{2} \mathrm{Re} \mathbf{B}^0}$ and ${\sqrt{2} \mathrm{Im} \mathbf{B}^0}$ are independent real Brownian motions with base point ${0}$. Conversely, if ${\mathbf{B}^0_1, \mathbf{B}^0_2}$ are independent real Brownian motions of base point ${0}$, show that ${\frac{1}{\sqrt{2}} (\mathbf{B}^0_1 + i \mathbf{B}^0_2)}$ is a complex Brownian motion with base point ${0}$.

The next lemma is a special case of the optional stopping theorem.

Lemma 4 (Optional stopping identities)

• (i) (Real case) Let ${\mathbf{B}^{x_0}}$ be a real Brownian motion with base point ${x_0 \in {\bf R}}$. Let ${\mathbf{t}}$ be a bounded stopping time – a bounded random variable with the property that for any time ${t \geq 0}$, the event that ${\mathbf{t} \leq t}$ is determined by the values of the trajectory ${\mathbf{B}^{x_0}}$ for times up to ${t}$ (or more precisely, this event is measurable with respect to the ${\sigma}$ algebra generated by this proprtion of the trajectory). Then

$\displaystyle \mathop{\bf E} \mathbf{B}^{x_0}(\mathbf{t}) = x_0$

and

$\displaystyle \mathop{\bf E} (\mathbf{B}^{x_0}(\mathbf{t})-x_0)^2 - \mathbf{t} = 0$

and

$\displaystyle \mathop{\bf E} (\mathbf{B}^{x_0}(\mathbf{t})-x_0)^4 = O( \mathop{\bf E} \mathbf{t}^2 ).$

• (ii) (Complex case) Let ${\mathbf{B}^{z_0}}$ be a real Brownian motion with base point ${z_0 \in {\bf R}}$. Let ${\mathbf{t}}$ be a bounded stopping time – a bounded random variable with the property that for any time ${t \geq 0}$, the event that ${\mathbf{t} \leq t}$ is determined by the values of the trajectory ${\mathbf{B}^{x_0}}$ for times up to ${t}$. Then

$\displaystyle \mathop{\bf E} \mathbf{B}^{z_0}(\mathbf{t}) = z_0$

$\displaystyle \mathop{\bf E} (\mathrm{Re}(\mathbf{B}^{z_0}(\mathbf{t})-z_0))^2 - \frac{1}{2} \mathbf{t} = 0$

$\displaystyle \mathop{\bf E} (\mathrm{Im}(\mathbf{B}^{z_0}(\mathbf{t})-z_0))^2 - \frac{1}{2} \mathbf{t} = 0$

$\displaystyle \mathop{\bf E} \mathrm{Re}(\mathbf{B}^{z_0}(\mathbf{t})-z_0) \mathrm{Im}(\mathbf{B}^{z_0}(\mathbf{t})-z_0) = 0$

$\displaystyle \mathop{\bf E} |\mathbf{B}^{x_0}(\mathbf{t})-z_0|^4 = O( \mathop{\bf E} \mathbf{t}^2 ).$

Proof: (Slightly informal) We just prove (i) and leave (ii) as an exercise. By translation invariance we can take ${x_0=0}$. Let ${T}$ be an upper bound for ${\mathbf{t}}$. Since ${\mathbf{B}^0(T)}$ is a real normally distributed variable with mean zero and variance ${T}$, we have

$\displaystyle \mathop{\bf E} \mathbf{B}^0( T ) = 0$

and

$\displaystyle \mathop{\bf E} \mathbf{B}^0( T )^2 = T$

and

$\displaystyle \mathop{\bf E} \mathbf{B}^0( T )^4 = 3T^2.$

By the law of total expectation, we thus have

$\displaystyle \mathop{\bf E} \mathop{\bf E}(\mathbf{B}^0( T ) | \mathbf{t}, \mathbf{B}^{z_0}(\mathbf{t}) ) = 0$

and

$\displaystyle \mathop{\bf E} \mathop{\bf E}((\mathbf{B}^0( T ))^2 | \mathbf{t}, \mathbf{B}^{z_0}(\mathbf{t}) ) = T$

and

$\displaystyle \mathop{\bf E} \mathop{\bf E}((\mathbf{B}^0( T ))^4 | \mathbf{t}, \mathbf{B}^{z_0}(\mathbf{t}) ) = 3T^2$

where the inner conditional expectations are with respect to the event that ${\mathbf{t}, \mathbf{B}^{0}(\mathbf{t})}$ attains a particular point in ${S}$. However, from the independent increment nature of Brownian motion, once one conditions ${(\mathbf{t}, \mathbf{B}^{0}(\mathbf{t}))}$ to a fixed point ${(t, x)}$, the random variable ${\mathbf{B}^0(T)}$ becomes a real normally distributed variable with mean ${x}$ and variance ${T-t}$. Thus we have

$\displaystyle \mathop{\bf E}(\mathbf{B}^0( T ) | \mathbf{t}, \mathbf{B}^{z_0}(\mathbf{t}) ) = \mathbf{B}^{z_0}(\mathbf{t})$

and

$\displaystyle \mathop{\bf E}( (\mathbf{B}^0( T ))^2 | \mathbf{t}, \mathbf{B}^{z_0}(\mathbf{t}) ) = \mathbf{B}^{z_0}(\mathbf{t})^2 + T - \mathbf{t}$

and

$\displaystyle \mathop{\bf E}( (\mathbf{B}^0( T ))^4 | \mathbf{t}, \mathbf{B}^{z_0}(\mathbf{t}) ) = \mathbf{B}^{z_0}(\mathbf{t})^4 + 6(T - \mathbf{t}) \mathbf{B}^{z_0}(\mathbf{t})^2 + 3(T - \mathbf{t})^2$

which give the first two claims, and (after some algebra) the identity

$\displaystyle \mathop{\bf E} \mathbf{B}^{z_0}(\mathbf{t})^4 - 6 \mathbf{t} \mathbf{B}^{z_0}(\mathbf{t})^2 + 3 \mathbf{t}^2 = 0$

which then also gives the third claim. $\Box$

Exercise 5 Prove the second part of Lemma 4.

We now approach conformal maps from yet another perspective. Given an open subset ${U}$ of the complex numbers ${{\bf C}}$, define a univalent function on ${U}$ to be a holomorphic function ${f: U \rightarrow {\bf C}}$ that is also injective. We will primarily be studying this concept in the case when ${U}$ is the unit disk ${D(0,1) := \{ z \in {\bf C}: |z| < 1 \}}$.

Clearly, a univalent function ${f: D(0,1) \rightarrow {\bf C}}$ on the unit disk is a conformal map from ${D(0,1)}$ to the image ${f(D(0,1))}$; in particular, ${f(D(0,1))}$ is simply connected, and not all of ${{\bf C}}$ (since otherwise the inverse map ${f^{-1}: {\bf C} \rightarrow D(0,1)}$ would violate Liouville’s theorem). In the converse direction, the Riemann mapping theorem tells us that every open simply connected proper subset ${V \subsetneq {\bf C}}$ of the complex numbers is the image of a univalent function on ${D(0,1)}$. Furthermore, if ${V}$ contains the origin, then the univalent function ${f: D(0,1) \rightarrow {\bf C}}$ with this image becomes unique once we normalise ${f(0) = 0}$ and ${f'(0) > 0}$. Thus the Riemann mapping theorem provides a one-to-one correspondence between open simply connected proper subsets of the complex plane containing the origin, and univalent functions ${f: D(0,1) \rightarrow {\bf C}}$ with ${f(0)=0}$ and ${f'(0)>0}$. We will focus particular attention on the univalent functions ${f: D(0,1) \rightarrow {\bf C}}$ with the normalisation ${f(0)=0}$ and ${f'(0)=1}$; such functions will be called schlicht functions.

One basic example of a univalent function on ${D(0,1)}$ is the Cayley transform ${z \mapsto \frac{1+z}{1-z}}$, which is a Möbius transformation from ${D(0,1)}$ to the right half-plane ${\{ \mathrm{Re}(z) > 0 \}}$. (The slight variant ${z \mapsto \frac{1-z}{1+z}}$ is also referred to as the Cayley transform, as is the closely related map ${z \mapsto \frac{z-i}{z+i}}$, which maps ${D(0,1)}$ to the upper half-plane.) One can square this map to obtain a further univalent function ${z \mapsto \left( \frac{1+z}{1-z} \right)^2}$, which now maps ${D(0,1)}$ to the complex numbers with the negative real axis ${(-\infty,0]}$ removed. One can normalise this function to be schlicht to obtain the Koebe function

$\displaystyle f(z) := \frac{1}{4}\left( \left( \frac{1+z}{1-z} \right)^2 - 1\right) = \frac{z}{(1-z)^2}, \ \ \ \ \ (1)$

which now maps ${D(0,1)}$ to the complex numbers with the half-line ${(-\infty,-1/4]}$ removed. A little more generally, for any ${\theta \in {\bf R}}$ we have the rotated Koebe function

$\displaystyle f(z) := \frac{z}{(1 - e^{i\theta} z)^2} \ \ \ \ \ (2)$

that is a schlicht function that maps ${D(0,1)}$ to the complex numbers with the half-line ${\{ -re^{-i\theta}: r \geq 1/4\}}$ removed.

Every schlicht function ${f: D(0,1) \rightarrow {\bf C}}$ has a convergent Taylor expansion

$\displaystyle f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots$

for some complex coefficients ${a_1,a_2,\dots}$ with ${a_1=1}$. For instance, the Koebe function has the expansion

$\displaystyle f(z) = z + 2 z^2 + 3 z^3 + \dots = \sum_{n=1}^\infty n z^n$

and similarly the rotated Koebe function has the expansion

$\displaystyle f(z) = z + 2 e^{i\theta} z^2 + 3 e^{2i\theta} z^3 + \dots = \sum_{n=1}^\infty n e^{(n-1)\theta} z^n.$

Intuitively, the Koebe function and its rotations should be the “largest” schlicht functions available. This is formalised by the famous Bieberbach conjecture, which asserts that for any schlicht function, the coefficients ${a_n}$ should obey the bound ${|a_n| \leq n}$ for all ${n}$. After a large number of partial results, this conjecture was eventually solved by de Branges; see for instance this survey of Korevaar or this survey of Koepf for a history.

It turns out that to resolve these sorts of questions, it is convenient to restrict attention to schlicht functions ${g: D(0,1) \rightarrow {\bf C}}$ that are odd, thus ${g(-z)=-g(z)}$ for all ${z}$, and the Taylor expansion now reads

$\displaystyle g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots$

for some complex coefficients ${b_1,b_3,\dots}$ with ${b_1=1}$. One can transform a general schlicht function ${f: D(0,1) \rightarrow {\bf C}}$ to an odd schlicht function ${g: D(0,1) \rightarrow {\bf C}}$ by observing that the function ${f(z^2)/z^2: D(0,1) \rightarrow {\bf C}}$, after removing the singularity at zero, is a non-zero function that equals ${1}$ at the origin, and thus (as ${D(0,1)}$ is simply connected) has a unique holomorphic square root ${(f(z^2)/z^2)^{1/2}}$ that also equals ${1}$ at the origin. If one then sets

$\displaystyle g(z) := z (f(z^2)/z^2)^{1/2} \ \ \ \ \ (3)$

it is not difficult to verify that ${g}$ is an odd schlicht function which additionally obeys the equation

$\displaystyle f(z^2) = g(z)^2. \ \ \ \ \ (4)$

Conversely, given an odd schlicht function ${g}$, the formula (4) uniquely determines a schlicht function ${f}$.

For instance, if ${f}$ is the Koebe function (1), ${g}$ becomes

$\displaystyle g(z) = \frac{z}{1-z^2} = z + z^3 + z^5 + \dots, \ \ \ \ \ (5)$

which maps ${D(0,1)}$ to the complex numbers with two slits ${\{ \pm iy: y > 1/2 \}}$ removed, and if ${f}$ is the rotated Koebe function (2), ${g}$ becomes

$\displaystyle g(z) = \frac{z}{1- e^{i\theta} z^2} = z + e^{i\theta} z^3 + e^{2i\theta} z^5 + \dots. \ \ \ \ \ (6)$

De Branges established the Bieberbach conjecture by first proving an analogous conjecture for odd schlicht functions known as Robertson’s conjecture. More precisely, we have

Theorem 1 (de Branges’ theorem) Let ${n \geq 1}$ be a natural number.

• (i) (Robertson conjecture) If ${g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots}$ is an odd schlicht function, then

$\displaystyle \sum_{k=1}^n |b_{2k-1}|^2 \leq n.$

• (ii) (Bieberbach conjecture) If ${f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots}$ is a schlicht function, then

$\displaystyle |a_n| \leq n.$

It is easy to see that the Robertson conjecture for a given value of ${n}$ implies the Bieberbach conjecture for the same value of ${n}$. Indeed, if ${f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots}$ is schlicht, and ${g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots}$ is the odd schlicht function given by (3), then from extracting the ${z^{2n}}$ coefficient of (4) we obtain a formula

$\displaystyle a_n = \sum_{j=1}^n b_{2j-1} b_{2(n+1-j)-1}$

for the coefficients of ${f}$ in terms of the coefficients of ${g}$. Applying the Cauchy-Schwarz inequality, we derive the Bieberbach conjecture for this value of ${n}$ from the Robertson conjecture for the same value of ${n}$. We remark that Littlewood and Paley had conjectured a stronger form ${|b_{2k-1}| \leq 1}$ of Robertson’s conjecture, but this was disproved for ${k=3}$ by Fekete and Szegö.

To prove the Robertson and Bieberbach conjectures, one first takes a logarithm and deduces both conjectures from a similar conjecture about the Taylor coefficients of ${\log \frac{f(z)}{z}}$, known as the Milin conjecture. Next, one continuously enlarges the image ${f(D(0,1))}$ of the schlicht function to cover all of ${{\bf C}}$; done properly, this places the schlicht function ${f}$ as the initial function ${f = f_0}$ in a sequence ${(f_t)_{t \geq 0}}$ of univalent maps ${f_t: D(0,1) \rightarrow {\bf C}}$ known as a Loewner chain. The functions ${f_t}$ obey a useful differential equation known as the Loewner equation, that involves an unspecified forcing term ${\mu_t}$ (or ${\theta(t)}$, in the case that the image is a slit domain) coming from the boundary; this in turn gives useful differential equations for the Taylor coefficients of ${f(z)}$, ${g(z)}$, or ${\log \frac{f(z)}{z}}$. After some elementary calculus manipulations to “integrate” this equations, the Bieberbach, Robertson, and Milin conjectures are then reduced to establishing the non-negativity of a certain explicit hypergeometric function, which is non-trivial to prove (and will not be done here, except for small values of ${n}$) but for which several proofs exist in the literature.

The theory of Loewner chains subsequently became fundamental to a more recent topic in complex analysis, that of the Schramm-Loewner equation (SLE), which is the focus of the next and final set of notes.

This is the eighth “research” thread of the Polymath15 project to upper bound the de Bruijn-Newman constant ${\Lambda}$, continuing this post. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

Significant progress has been made since the last update; by implementing the “barrier” method to establish zero free regions for $H_t$ by leveraging the extensive existing numerical verification of the Riemann hypothesis (which establishes zero free regions for $H_0$), we have been able to improve our upper bound on $\Lambda$ from 0.48 to 0.28. Furthermore, there appears to be a bit of further room to improve the bounds further by tweaking the parameters $t_0, y_0, X$ used in the argument (we are currently using $t_0=0.2, y_0 = 0.4, X = 5 \times 10^9$); the most recent idea is to try to use exponential sum estimates to improve the bounds on the derivative of the approximation to $H_t$ that is used in the barrier method, which currently is the most computationally intensive step of the argument.