We now begin the rigorous theory of the incompressible Navier-Stokes equations

\displaystyle  \partial_t u + (u \cdot \nabla) u = \nu \Delta u - \nabla p \ \ \ \ \ (1)

\displaystyle  \nabla \cdot u = 0,

where {\nu>0} is a given constant (the kinematic viscosity, or viscosity for short), {u: I \times {\bf R}^d \rightarrow {\bf R}^d} is an unknown vector field (the velocity field), and {p: I \times {\bf R}^d \rightarrow {\bf R}} is an unknown scalar field (the pressure field). Here {I} is a time interval, usually of the form {[0,T]} or {[0,T)}. We will either be interested in spatially decaying situations, in which {u(t,x)} decays to zero as {x \rightarrow \infty}, or {{\bf Z}^d}-periodic (or periodic for short) settings, in which one has {u(t, x+n) = u(t,x)} for all {n \in {\bf Z}^d}. (One can also require the pressure {p} to be periodic as well; this brings up a small subtlety in the uniqueness theory for these equations, which we will address later in this set of notes.) As is usual, we abuse notation by identifying a {{\bf Z}^d}-periodic function on {{\bf R}^d} with a function on the torus {{\bf R}^d/{\bf Z}^d}.

In order for the system (1) to even make sense, one requires some level of regularity on the unknown fields {u,p}; this turns out to be a relatively important technical issue that will require some attention later in this set of notes, and we will end up transforming (1) into other forms that are more suitable for lower regularity candidate solutions. Our focus here will be on local existence of these solutions in a short time interval {[0,T]} or {[0,T)}, for some {T>0}. (One could in principle also consider solutions that extend to negative times, but it turns out that the equations are not time-reversible, and the forward evolution is significantly more natural to study than the backwards one.) The study of the Euler equations, in which {\nu=0}, will be deferred to subsequent lecture notes.

As the unknown fields involve a time parameter {t}, and the first equation of (1) involves time derivatives of {u}, the system (1) should be viewed as describing an evolution for the velocity field {u}. (As we shall see later, the pressure {p} is not really an independent dynamical field, as it can essentially be expressed in terms of the velocity field without requiring any differentiation or integration in time.) As such, the natural question to study for this system is the initial value problem, in which an initial velocity field {u_0: {\bf R}^d \rightarrow {\bf R}^d} is specified, and one wishes to locate a solution {(u,p)} to the system (1) with initial condition

\displaystyle  u(0,x) = u_0(x) \ \ \ \ \ (2)

for {x \in {\bf R}^d}. Of course, in order for this initial condition to be compatible with the second equation in (1), we need the compatibility condition

\displaystyle  \nabla \cdot u_0 = 0 \ \ \ \ \ (3)

and one should also impose regularity, decay, and/or periodicity hypotheses on {u_0} that are compatible with the corresponding hypotheses imposed on the solution {u}.
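
To illustrate the parenthetical remark above that the pressure is not an independent dynamical field, one can (formally, assuming enough regularity and decay to justify the manipulations) take the divergence of the first equation in (1); the terms {\partial_t u} and {\nu \Delta u} are annihilated by the divergence thanks to the second equation, leaving

\displaystyle  \Delta p = - \nabla \cdot ((u \cdot \nabla) u) = - \sum_{i,j=1}^d \partial_i \partial_j (u_i u_j),

so that {p} can be recovered from {u} (up to the normalisation issues alluded to earlier) by inverting the Laplacian, with no differentiation or integration in time required.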

The fundamental questions in the local theory of an evolution equation are those of existence, uniqueness, and continuous dependence. In the context of the Navier-Stokes equations, these questions can be phrased (somewhat broadly) as follows:

  • (a) (Local existence) Given suitable initial data {u_0}, does there exist a solution {(u,p)} to the above initial value problem for some time {T>0}? What can one say about the time {T} of existence? How regular is the solution?
  • (b) (Uniqueness) Is it possible to have two distinct solutions {(u,p), (u',p')} of a certain regularity class to the same initial value problem on a common time interval {[0,T)}? To what extent does the answer to this question depend on the regularity assumed on one or both of the solutions? Does one need to normalise the solutions beforehand in order to obtain uniqueness?
  • (c) (Continuous dependence on data) If one perturbs the initial conditions {u_0} by a small amount, what happens to the solution {(u,p)} and to the time of existence {T}? (This question tends to only be sensible once one has a reasonable uniqueness theory.)

The answers to these questions tend to be more complicated than a simple “Yes” or “No”; for instance, they can depend on the precise regularity hypotheses one wishes to impose on the data and on the solution, and even on exactly how one interprets the concept of a “solution”. However, once one settles on such a set of hypotheses, it generally happens that one either gets a “strong” theory (in which one has existence, uniqueness, and continuous dependence on the data), a “weak” theory (in which one has existence of somewhat low-quality solutions, but with only limited uniqueness results (or even some spectacular failures of uniqueness) and almost no continuous dependence on data), or no satisfactory theory whatsoever. In the first case, we say (roughly speaking) that the initial value problem is locally well-posed, and one can then try to build upon the theory to explore more interesting topics such as global existence and asymptotics, classifying potential blowup, rigorous justification of conservation laws, and so forth. With a weak local theory, it becomes much more difficult to address these latter sorts of questions, and there are serious analytic pitfalls that one could fall into if one tries too strenuously to treat weak solutions as if they were strong. (For instance, conservation laws that are rigorously justified for strong, high-regularity solutions may well fail for weak, low-regularity ones.) Also, even if one is primarily interested in solutions at one level of regularity, the well-posedness theory at another level of regularity can be very helpful; for instance, if one is interested in smooth solutions in {{\bf R}^d}, it turns out that the well-posedness theory at the critical regularity of {\dot H^{\frac{d}{2}-1}({\bf R}^d)} can be used to establish globally smooth solutions from small initial data. As such, it can become quite important to know what kind of local theory one can obtain for a given equation.

This set of notes will focus on the “strong” theory, in which a substantial amount of regularity is assumed in the initial data and solution, giving a satisfactory (albeit largely local-in-time) well-posedness theory. “Weak” solutions will be considered in later notes.

The Navier-Stokes equations are not the simplest of partial differential equations to study, in part because they are an amalgam of three more basic equations, which behave rather differently from each other (for instance the first equation is nonlinear, while the latter two are linear):

  • (a) Transport equations such as {\partial_t u + (u \cdot \nabla) u = 0}.
  • (b) Diffusion equations (or heat equations) such as {\partial_t u = \nu \Delta u}.
  • (c) Systems such as {v = F - \nabla p}, {\nabla \cdot v = 0}, which (for want of a better name) we will call Leray systems.
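
Of these three components, the diffusion equation in (b) has the most explicit solution theory: at least for sufficiently well-behaved initial data, the solution is given (componentwise) by convolution with the heat kernel,

\displaystyle  u(t,x) = e^{\nu t \Delta} u(0,x) = \frac{1}{(4\pi \nu t)^{d/2}} \int_{{\bf R}^d} e^{-|x-y|^2/(4\nu t)} u(0,y)\ dy,

and this formula already exhibits the smoothing effect of the viscosity term that will be exploited later in these notes.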

Accordingly, we will devote some time to getting some preliminary understanding of the linear diffusion and Leray systems before returning to the theory for the Navier-Stokes equation. Transport systems will be discussed further in subsequent notes; in this set of notes, we will instead focus on a more basic example of nonlinear equations, namely the first-order ordinary differential equation

\displaystyle  \partial_t u = F(u) \ \ \ \ \ (4)

where {u: I \rightarrow V} takes values in some finite-dimensional (real or complex) vector space {V} on some time interval {I}, and {F: V \rightarrow V} is a given linear or nonlinear function. (Here, we use “interval” to denote a connected non-empty subset of {{\bf R}}; in particular, we allow intervals to be half-infinite or infinite, or to be open, closed, or half-open.) Fundamental results in this area include the Picard existence and uniqueness theorem, the Duhamel formula, and Grönwall’s inequality; they will serve as motivation for the approach to local well-posedness that we will adopt in this set of notes. (There are other ways to construct strong or weak solutions for Navier-Stokes and Euler equations, which we will discuss in later notes.)
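
As a concrete (and purely illustrative) numerical rendering of the Picard approach, here is a short Python sketch that solves (4) by iterating the associated integral equation; the step counts and the test case {F(u)=u} are arbitrary choices made for the demonstration, not anything required by the rigorous theory.

    import numpy as np

    def picard(F, u0, T, n_steps=200, n_iter=25):
        """Approximate the solution of u' = F(u), u(0) = u0 on [0, T] by
        Picard iteration: u_{k+1}(t) = u0 + int_0^t F(u_k(s)) ds."""
        t = np.linspace(0.0, T, n_steps)
        u = np.full_like(t, u0)  # zeroth iterate: the constant function u0
        for _ in range(n_iter):
            g = F(u)
            # cumulative trapezoidal rule for int_0^t F(u_k(s)) ds
            du = np.concatenate(([0.0],
                np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(t))))
            u = u0 + du
        return t, u

    # Test: u' = u, u(0) = 1 has solution e^t, so u(1) should be close to e.
    t, u = picard(lambda u: u, 1.0, 1.0)
    print(u[-1], np.e)  # both approximately 2.71828

Note that each iterate gains regularity from the integration, which is the same mechanism that powers the contraction-mapping proof of the Picard theorem.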

A key role in our treatment here will be played by the fundamental theorem of calculus (in various forms and variations). Roughly speaking, this theorem, and its variants, allow us to recast differential equations (such as (1) or (4)) as integral equations. Such integral equations are less tractable algebraically than their differential counterparts (for instance, they are not ideal for verifying conservation laws), but are significantly more convenient for well-posedness theory, basically because integration tends to increase the regularity of a function, while differentiation reduces it. (Indeed, the problem of “losing derivatives”, or more precisely “losing regularity”, is a key obstacle that one often has to address when trying to establish well-posedness for PDE, particularly those that are quite nonlinear and with rough initial data, though for nonlinear parabolic equations such as Navier-Stokes the obstacle is not as serious as it is for some other PDE, due to the smoothing effects of the heat equation.)
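
For instance, for the ODE (4) with initial datum {u(0) = u_0}, the fundamental theorem of calculus converts the initial value problem into the equivalent integral equation

\displaystyle  u(t) = u_0 + \int_0^t F(u(s))\ ds

(the equation iterated in the numerical sketch above), which makes sense for merely continuous {u}; solutions may then be constructed as fixed points of the right-hand side, which is the strategy of the Picard existence theorem.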

One weakness of the methods deployed here is that the quantitative bounds produced deteriorate to the point of uselessness in the inviscid limit {\nu \rightarrow 0}, rendering these techniques unsuitable for analysing the Euler equations in which {\nu=0}. However, some of the methods developed in later notes have bounds that remain uniform in the {\nu \rightarrow 0} limit, allowing one to also treat the Euler equations.

In this and subsequent sets of notes, we use the following asymptotic notation (a variant of Vinogradov notation that is commonly used in PDE and harmonic analysis). The statement {X \lesssim Y}, {Y \gtrsim X}, or {X = O(Y)} will be used to denote an estimate of the form {|X| \leq CY} (or equivalently {Y \geq C^{-1} |X|}) for some constant {C}, and {X \sim Y} will be used to denote the estimates {X \lesssim Y \lesssim X}. If the constant {C} depends on other parameters (such as the dimension {d}), this will be indicated by subscripts, thus for instance {X \lesssim_d Y} denotes the estimate {|X| \leq C_d Y} for some {C_d} depending on {d}.

In the last week or so there has been some discussion on the internet about a paper (initially authored by Hill and Tabachnikov) that was accepted for publication in the Mathematical Intelligencer, but with the editor-in-chief of that journal later deciding against publication; the paper, in significantly revised form (and now authored solely by Hill), was then quickly accepted by one of the editors of the New York Journal of Mathematics, but then was removed from publication after objections from several members of the editorial board of NYJM that the paper had not been properly refereed and was not within the scope of the journal; see this statement by Benson Farb, who at the time was on that board, for more details.  Some further discussion of this incident may be found on Tim Gowers’ blog; the most recent version of the paper, as well as a number of prior revisions, are still available on the arXiv here.

For whatever reason, some of the discussion online has focused on the role of Amie Wilkinson, a mathematician from the University of Chicago (and who, incidentally, was a recent speaker here at UCLA in our Distinguished Lecture Series), who wrote an email to the editor-in-chief of the Intelligencer raising some concerns about the content of the paper and suggesting that it be published alongside commentary from other experts in the field.  (This, by the way, is not uncommon practice when dealing with a potentially provocative publication in one field by authors coming from a different field; for instance, when Emmanuel Candès and I published a paper in the Annals of Statistics introducing what we called the “Dantzig selector”, the Annals solicited a number of articles discussing the selector from prominent statisticians, and then invited us to submit a rejoinder.)    It seems that the editors of the Intelligencer decided instead to reject the paper.  The paper then had a complicated interaction with NYJM, but, as stated by Wilkinson in her recent statement on this matter as well as by Farb, this was done without any involvement from Wilkinson.  (It is true that Farb happens to also be Wilkinson’s husband, but I see no reason to doubt their statements on this matter.)

I have not interacted much with the Intelligencer, but I have published a few papers with NYJM over the years; it is an early example of a quality “diamond open access” mathematics journal.  It seems that this incident may have uncovered some issues with their editorial procedure for reviewing and accepting papers, but I am hopeful that they can be addressed to avoid this sort of event occurring again.

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of correlations of multiplicative functions at almost all scales, with applications to the Chowla and Elliott conjectures“. This is a sequel to our previous paper that studied logarithmic correlations of the form

\displaystyle  f(a) := \lim^*_{x \rightarrow \infty} \frac{1}{\log \omega(x)} \sum_{x/\omega(x) \leq n \leq x} \frac{g_1(n+ah_1) \dots g_k(n+ah_k)}{n},

where {g_1,\dots,g_k} were bounded multiplicative functions, {h_1,\dots,h_k} were fixed shifts, {1 \leq \omega(x) \leq x} was a quantity going off to infinity as {x \rightarrow \infty}, and {\lim^*} was a generalised limit functional. Our main technical result asserted that these correlations were necessarily the uniform limit of periodic functions {f_i}. Furthermore, if {g_1 \dots g_k} (weakly) pretended to be a Dirichlet character {\chi}, then the {f_i} could be chosen to be {\chi}-isotypic in the sense that {f_i(ab) = f_i(a) \chi(b)} whenever {a,b} are integers with {b} coprime to the periods of {\chi} and {f_i}; otherwise, if {g_1 \dots g_k} did not weakly pretend to be any Dirichlet character {\chi}, then {f} vanished completely. This was then used to verify several cases of the logarithmically averaged Elliott and Chowla conjectures.

The purpose of this paper was to investigate the extent to which the methods could be extended to non-logarithmically averaged settings. For our main technical result, we now considered the unweighted averages

\displaystyle  f_d(a) := \lim^*_{x \rightarrow \infty} \frac{1}{x/d} \sum_{n \leq x/d} g_1(n+ah_1) \dots g_k(n+ah_k),

where {d>1} is an additional parameter. Our main result was now as follows. If {g_1 \dots g_k} did not weakly pretend to be a twisted Dirichlet character {n \mapsto \chi(n) n^{it}}, then {f_d(a)} converged to zero on (doubly logarithmic) average as {d \rightarrow \infty}. If instead {g_1 \dots g_k} did pretend to be such a twisted Dirichlet character, then {f_d(a) d^{it}} converged on (doubly logarithmic) average to a limit {f(a)}, which is a uniform limit of {\chi}-isotypic functions {f_i}. Thus, roughly speaking, one has the approximation

\displaystyle  \lim^*_{x \rightarrow \infty} \frac{1}{x/d} \sum_{n \leq x/d} g_1(n+ah_1) \dots g_k(n+ah_k) \approx f(a) d^{-it}

for most {d}.

Informally, this says that at almost all scales {x} (where “almost all” means “outside of a set of logarithmic density zero”), the non-logarithmic averages behave much like their logarithmic counterparts except for a possible additional twisting by an Archimedean character {d \mapsto d^{it}} (which interacts with the Archimedean parameter {d} in much the same way that the Dirichlet character {\chi} interacts with the non-Archimedean parameter {a}). One consequence of this is that most of the recent results on the logarithmically averaged Chowla and Elliott conjectures can now be extended to their non-logarithmically averaged counterparts, so long as one excludes a set of exceptional scales {x} of logarithmic density zero. For instance, the Chowla conjecture

\displaystyle  \lim_{x \rightarrow\infty} \frac{1}{x} \sum_{n \leq x} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

is now established for {k} either odd or equal to {2}, so long as one excludes an exceptional set of scales.

In the logarithmically averaged setup, the main idea was to combine two very different pieces of information on {f(a)}. The first, coming from recent results in ergodic theory, was to show that {f(a)} was well approximated in some sense by a nilsequence. The second was to use the “entropy decrement argument” to obtain an approximate isotypy property of the form

\displaystyle  f(a) g_1 \dots g_k(p)\approx f(ap)

for “most” primes {p} and integers {a}. Combining the two facts, one eventually finds that only the almost periodic components of the nilsequence are relevant.

In the current situation, each {a \mapsto f_d(a)} is approximated by a nilsequence, but the nilsequence can vary with {d} (although there is some useful “Lipschitz continuity” of this nilsequence with respect to the {d} parameter). Meanwhile, the entropy decrement argument gives an approximation basically of the form

\displaystyle  f_{dp}(a) g_1 \dots g_k(p)\approx f_d(ap)

for “most” {d,p,a}. The arguments then proceed largely as in the logarithmically averaged case. A key lemma to handle the dependence on the new parameter {d} is the following cohomological statement: if one has a map {\alpha: (0,+\infty) \rightarrow S^1} that is a quasimorphism in the sense that {\alpha(xy) = \alpha(x) \alpha(y) + O(\varepsilon)} for all {x,y \in (0,+\infty)} and some small {\varepsilon}, then there exists a real number {t} such that {\alpha(x) = x^{it} + O(\varepsilon)} for all {x \in (0,+\infty)}. This is achieved by applying a standard “cocycle averaging argument” to the cocycle {(x,y) \mapsto \alpha(xy) \alpha(x)^{-1} \alpha(y)^{-1}}.

It would of course be desirable to not have the set of exceptional scales. We only know of one (implausible) scenario in which we can do this, namely when one has far fewer (in particular, subexponentially many) sign patterns for (say) the Liouville function than predicted by the Chowla conjecture. In this scenario (roughly analogous to the “Siegel zero” scenario in multiplicative number theory), the entropy of the Liouville sign patterns is so small that the entropy decrement argument becomes powerful enough to control all scales rather than almost all scales. On the other hand, this scenario seems to be self-defeating, in that it allows one to establish a large number of cases of the Chowla conjecture, and the full Chowla conjecture is inconsistent with having unusually few sign patterns. Still it hints that future work in this direction may need to split into “low entropy” and “high entropy” cases, in analogy to how many arguments in multiplicative number theory have to split into the “Siegel zero” and “no Siegel zero” cases.

This is the tenth “research” thread of the Polymath15 project to upper bound the de Bruijn-Newman constant {\Lambda}, continuing this post. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

Most of the progress since the last thread has been on the numerical side, in which the various techniques to numerically establish zero-free regions for the equation H_t(x+iy)=0 have been streamlined, made faster, and extended to larger heights than were previously possible.  The best bound for \Lambda now depends on the height to which one is willing to assume the Riemann hypothesis.  Using the conservative verification up to height (slightly larger than) 3 \times 10^{10}, which has been confirmed by independent work of Platt et al. and Gourdon-Demichel, the best bound remains at \Lambda \leq 0.22.  Using the verification up to height 2.5 \times 10^{12} claimed by Gourdon-Demichel, this improves slightly to \Lambda \leq 0.19, and if one assumes the Riemann hypothesis up to height 5 \times 10^{19} the bound improves to \Lambda \leq 0.11, contingent on a numerical computation that is still underway.   (See the table below the fold for more data of this form.)  This is broadly consistent with the expectation that the bound on \Lambda should be inversely proportional to the logarithm of the height at which the Riemann hypothesis is verified.
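
As a crude sanity check of this expectation (back-of-the-envelope arithmetic with natural logarithms, rather than one of the project computations): the products of the above bounds with the logarithms of the corresponding heights are 0.22 \times \log(3 \times 10^{10}) \approx 5.3, 0.19 \times \log(2.5 \times 10^{12}) \approx 5.4, and 0.11 \times \log(5 \times 10^{19}) \approx 5.0, which are indeed roughly constant.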

As progress seems to have stabilised, it may be time to transition to the writing phase of the Polymath15 project.  (There are still some interesting research questions to pursue, such as numerically investigating the zeroes of H_t for negative values of t, but the writeup does not necessarily have to contain every single direction pursued in the project. If enough additional interesting findings are unearthed then one could always consider writing a second paper, for instance.)

Below the fold is the detailed progress report on the numerics by Rudolph Dwars and Kalpesh Muchhal.

This coming fall quarter, I am teaching a class on topics in the mathematical theory of incompressible fluid equations, focusing particularly on the incompressible Euler and Navier-Stokes equations. These two equations are by no means the only equations used to model fluids, but I will focus on them in this course to narrow the scope down to something manageable. I have not fully decided on the choice of topics to cover in this course, but I would probably begin with some core topics such as local well-posedness theory and blowup criteria, conservation laws, and construction of weak solutions, then move on to some topics such as boundary layers and the Prandtl equations, the Euler-Poincare-Arnold interpretation of the Euler equations as an infinite dimensional geodesic flow, and some discussion of the Onsager conjecture. I will probably also continue on to more advanced and recent topics in the winter quarter.

In this initial set of notes, we begin by reviewing the physical derivation of the Euler and Navier-Stokes equations from the first principles of Newtonian mechanics, and specifically from Newton’s famous three laws of motion. Strictly speaking, this derivation is not needed for the mathematical analysis of these equations, which can be viewed if one wishes as an arbitrarily chosen system of partial differential equations without any physical motivation; however, I feel that the derivation sheds some insight and intuition on these equations, and is also worth knowing on purely intellectual grounds regardless of its mathematical consequences. I also find it instructive to actually see the journey from Newton’s law

\displaystyle  F = ma

to the seemingly rather different-looking law

\displaystyle  \partial_t u + (u \cdot \nabla) u = -\nabla p + \nu \Delta u

\displaystyle  \nabla \cdot u = 0

for incompressible Navier-Stokes (or, if one drops the viscosity term {\nu \Delta u}, the Euler equations).

Our discussion in this set of notes is physical rather than mathematical, and so we will not be working at mathematical levels of rigour and precision. In particular we will be fairly casual about interchanging summations, limits, and integrals, we will manipulate approximate identities {X \approx Y} as if they were exact identities (e.g., by differentiating both sides of the approximate identity), and we will not attempt to verify any regularity or convergence hypotheses in the expressions being manipulated. (The same holds for the exercises in this text, which also do not need to be justified at mathematical levels of rigour.) Of course, once we resume the mathematical portion of this course in subsequent notes, such issues will be an important focus of careful attention. This is a basic division of labour in mathematical modeling: non-rigorous heuristic reasoning is used to derive a mathematical model from physical (or other “real-life”) principles, but once a precise model is obtained, the analysis of that model should be completely rigorous if at all possible (even if this requires applying the model to regimes which do not correspond to the original physical motivation of that model). See the discussion by John Ball quoted at the end of these slides of Gero Friesecke for an expansion of these points.

Note: our treatment here will differ slightly from that presented in many fluid mechanics texts, in that it will emphasise first-principles derivations from many-particle systems, rather than relying on bulk laws of physics, such as the laws of thermodynamics, which we will not cover here. (However, the derivations from bulk laws tend to be more robust, in that they are not as reliant on assumptions about the particular interactions between particles. In particular, the physical hypotheses we assume in this post are probably quite a bit stronger than the minimal assumptions needed to justify the Euler or Navier-Stokes equations, which can hold even in situations in which one or more of the hypotheses assumed here break down.)

Let {k} be a field, and let {E} be a finite extension of that field; in this post we will denote such a relationship by {k \hookrightarrow E}. We say that {E} is a Galois extension of {k} if the cardinality of the automorphism group {\mathrm{Aut}(E/k)} of {E} fixing {k} is as large as it can be, namely the degree {[E:k]} of the extension. In that case, we call {\mathrm{Aut}(E/k)} the Galois group of {E} over {k} and denote it also by {\mathrm{Gal}(E/k)}. The fundamental theorem of Galois theory then gives a one-to-one correspondence (also known as the Galois correspondence) between the intermediate extensions between {E} and {k} and the subgroups of {\mathrm{Gal}(E/k)}:

Theorem 1 (Fundamental theorem of Galois theory) Let {E} be a Galois extension of {k}.

  • (i) If {k \hookrightarrow F \hookrightarrow E} is an intermediate field between {k} and {E}, then {E} is a Galois extension of {F}, and {\mathrm{Gal}(E/F)} is a subgroup of {\mathrm{Gal}(E/k)}.
  • (ii) Conversely, if {H} is a subgroup of {\mathrm{Gal}(E/k)}, then there is a unique intermediate field {k \hookrightarrow F \hookrightarrow E} such that {\mathrm{Gal}(E/F)=H}; namely {F} is the set of elements of {E} that are fixed by {H}.
  • (iii) If {k \hookrightarrow F_1 \hookrightarrow E} and {k \hookrightarrow F_2 \hookrightarrow E}, then {F_1 \hookrightarrow F_2} if and only if {\mathrm{Gal}(E/F_2)} is a subgroup of {\mathrm{Gal}(E/F_1)}.
  • (iv) If {k \hookrightarrow F \hookrightarrow E} is an intermediate field between {k} and {E}, then {F} is a Galois extension of {k} if and only if {\mathrm{Gal}(E/F)} is a normal subgroup of {\mathrm{Gal}(E/k)}. In that case, {\mathrm{Gal}(F/k)} is isomorphic to the quotient group {\mathrm{Gal}(E/k) / \mathrm{Gal}(E/F)}.

Example 2 Let {k= {\bf Q}}, and let {E = {\bf Q}(e^{2\pi i/n})} be the degree {\phi(n)} Galois extension formed by adjoining a primitive {n^{th}} root of unity (that is to say, {E} is the cyclotomic field of order {n}). Then {\mathrm{Gal}(E/k)} is isomorphic to the multiplicative cyclic group {({\bf Z}/n{\bf Z})^\times} (the invertible elements of the ring {{\bf Z}/n{\bf Z}}). Amongst the intermediate fields, one has the cyclotomic fields of the form {F = {\bf Q}(e^{2\pi i/m})} where {m} divides {n}; they are also Galois extensions, with {\mathrm{Gal}(F/k)} isomorphic to {({\bf Z}/m{\bf Z})^\times} and {\mathrm{Gal}(E/F)} isomorphic to the elements {a} of {({\bf Z}/n{\bf Z})^\times} such that {a(n/m) = (n/m)} modulo {n}. (There can also be other intermediate fields, corresponding to other subgroups of {({\bf Z}/n{\bf Z})^\times}.)

Example 3 Let {k = {\bf C}(z)} be the field of rational functions of one indeterminate {z} with complex coefficients, and let {E = {\bf C}(w)} be the field formed by adjoining an {n^{th}} root {w = z^{1/n}} to {k}, thus {k = {\bf C}(w^n)}. Then {E} is a degree {n} Galois extension of {k} with Galois group isomorphic to {{\bf Z}/n{\bf Z}} (with an element {a \in {\bf Z}/n{\bf Z}} corresponding to the field automorphism of {E} that sends {w} to {e^{2\pi i a/n} w}). The intermediate fields are of the form {F = {\bf C}(w^{n/m})} where {m} divides {n}; they are also Galois extensions, with {\mathrm{Gal}(F/k)} isomorphic to {{\bf Z}/m{\bf Z}} and {\mathrm{Gal}(E/F)} isomorphic to the multiples of {m} in {{\bf Z}/n{\bf Z}}.

There is an analogous Galois correspondence in the covering theory of manifolds. For simplicity we restrict attention to finite covers. If {L} is a connected manifold and {\pi_{L \leftarrow M}: M \rightarrow L} is a finite covering map of {L} by another connected manifold {M}, we denote this relationship by {L \leftarrow M}. (Later on we will change our function notations slightly and write {\pi_{L \leftarrow M}: L \leftarrow M} in place of the more traditional {\pi_{L \leftarrow M}: M \rightarrow L}, and similarly for the deck transformations {g: M \leftarrow M} below; more on this below the fold.) If {L \leftarrow M}, we can define {\mathrm{Aut}(M/L)} to be the group of deck transformations: continuous maps {g: M \rightarrow M} which preserve the fibres of {\pi_{L \leftarrow M}}. We say that this covering map is a Galois cover if the cardinality of the group {\mathrm{Aut}(M/L)} is as large as it can be, namely the degree of the cover (the number of points in each fibre). In that case we call {\mathrm{Aut}(M/L)} the Galois group of {M} over {L} and denote it by {\mathrm{Gal}(M/L)}.

Suppose {M} is a finite cover of {L}. An intermediate cover {N} between {M} and {L} is a cover of {L} by {N}, such that {L \leftarrow N \leftarrow M}, in such a way that the covering maps are compatible, in the sense that {\pi_{L \leftarrow M}} is the composition of {\pi_{L \leftarrow N}} and {\pi_{N \leftarrow M}}. This sort of compatibility condition will be implicitly assumed whenever we chain together multiple instances of the {\leftarrow} notation. Two intermediate covers {N,N'} are equivalent if they cover each other, in a fashion compatible with all the other covering maps, thus {L \leftarrow N \leftarrow N' \leftarrow M} and {L \leftarrow N' \leftarrow N \leftarrow M}. We then have the analogous Galois correspondence:

Theorem 4 (Fundamental theorem of covering spaces) Let {L \leftarrow M} be a Galois covering.

  • (i) If {L \leftarrow N \leftarrow M} is an intermediate cover between {L} and {M}, then {M} is a Galois cover of {N}, and {\mathrm{Gal}(M/N)} is a subgroup of {\mathrm{Gal}(M/L)}.
  • (ii) Conversely, if {H} is a subgroup of {\mathrm{Gal}(M/L)}, then there is an intermediate cover {L \leftarrow N \leftarrow M}, unique up to equivalence, such that {\mathrm{Gal}(M/N)=H}.
  • (iii) If {L \leftarrow N_1 \leftarrow M} and {L \leftarrow N_2 \leftarrow M}, then {L \leftarrow N_1 \leftarrow N_2 \leftarrow M} if and only if {\mathrm{Gal}(M/N_2)} is a subgroup of {\mathrm{Gal}(M/N_1)}.
  • (iv) If {L \leftarrow N \leftarrow M}, then {N} is a Galois cover of {L} if and only if {\mathrm{Gal}(M/N)} is a normal subgroup of {\mathrm{Gal}(M/L)}. In that case, {\mathrm{Gal}(N/L)} is isomorphic to the quotient group {\mathrm{Gal}(M/L) / \mathrm{Gal}(M/N)}.

Example 5 Let {L= {\bf C}^\times := {\bf C} \backslash \{0\}}, and let {M = {\bf C}^\times} be the {n}-fold cover of {L} with covering map {\pi_{L \leftarrow M}(w) := w^n}. Then {M} is a Galois cover of {L}, and {\mathrm{Gal}(M/L)} is isomorphic to the cyclic group {{\bf Z}/n{\bf Z}}. The intermediate covers are (up to equivalence) of the form {N = {\bf C}^\times} with covering map {\pi_{L \leftarrow N}(u) := u^m} where {m} divides {n}; they are also Galois covers, with {\mathrm{Gal}(N/L)} isomorphic to {{\bf Z}/m{\bf Z}} and {\mathrm{Gal}(M/N)} isomorphic to the multiples of {m} in {{\bf Z}/n{\bf Z}}.

Given the strong similarity between the two theorems, it is natural to ask if there is some more concrete connection between Galois theory and the theory of finite covers.

In one direction, if the manifolds {L,M,N} have an algebraic structure (or a complex structure), then one can relate covering spaces to field extensions by considering the field of rational functions (or meromorphic functions) on the space. For instance, if {L = {\bf C}^\times} and {z} is the coordinate on {L}, one can consider the field {{\bf C}(z)} of rational functions on {L}; the {n}-fold cover {M = {\bf C}^\times} with coordinate {w} from Example 5 similarly has a field {{\bf C}(w)} of rational functions. The covering {\pi_{L \leftarrow M}(w) = w^n} relates the two coordinates {z,w} by the relation {z = w^n}, at which point one sees that the rational functions {{\bf C}(w)} on {M} form a degree {n} extension of the rational functions {{\bf C}(z)} on {L} (formed by adjoining the {n^{th}} root {w} of {z}). In this way we see that Example 5 is in fact closely related to Example 3.

Exercise 6 What happens if one uses meromorphic functions in place of rational functions in the above example? (To answer this question, I found it convenient to use a discrete Fourier transform associated to the multiplicative action of the {n^{th}} roots of unity on {M} to decompose the meromorphic functions on {M} as a linear combination of functions invariant under this action, times a power {w^j} of the coordinate {w} for {j=0,\dots,n-1}.)

I was curious however about the reverse direction. Starting with some field extensions {k \hookrightarrow F \hookrightarrow E}, is it possible to create manifold-like spaces {M_k \leftarrow M_F \leftarrow M_E} associated to these fields in such a fashion that (say) {M_E} behaves like a “covering space” to {M_k} with a group {\mathrm{Aut}(M_E/M_k)} of deck transformations isomorphic to {\mathrm{Aut}(E/k)}, so that the Galois correspondences agree? Also, given how the notion of a path (and associated concepts such as loops, monodromy and the fundamental group) play a prominent role in the theory of covering spaces, can spaces such as {M_k} or {M_E} also come with a notion of a path that is somehow compatible with the Galois correspondence?

The standard answer from modern algebraic geometry (as articulated for instance in this nice MathOverflow answer by Minhyong Kim) is to set {M_E} equal to the spectrum {\mathrm{Spec}(E)} of the field {E}. As a set, the spectrum {\mathrm{Spec}(R)} of a commutative ring {R} is defined as the set of prime ideals of {R}. Generally speaking, the map {R \mapsto \mathrm{Spec}(R)} that maps a commutative ring to its spectrum tends to act like an inverse of the operation that maps a space {X} to a ring of functions on that space. For instance, if one considers the commutative ring {{\bf C}[z, z^{-1}]} of regular functions on {M = {\bf C}^\times}, then each point {z_0} in {M} gives rise to the prime ideal {\{ f \in {\bf C}[z, z^{-1}]: f(z_0)=0\}}, and one can check that these are the only such prime ideals (other than the zero ideal {(0)}), giving an almost one-to-one correspondence between {\mathrm{Spec}( {\bf C}[z,z^{-1}] )} and {M}. (The zero ideal corresponds instead to the generic point of {M}.)
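
(To spell out the verification alluded to above, using standard commutative algebra: {{\bf C}[z,z^{-1}]} is a localisation of the principal ideal domain {{\bf C}[z]}, hence is itself a principal ideal domain, so every nonzero prime ideal is generated by an irreducible element; by the fundamental theorem of algebra the irreducibles are, up to units, the linear factors {z - z_0}, and since {z} is a unit in {{\bf C}[z,z^{-1}]} one must have {z_0 \neq 0}.)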

Of course, the spectrum of a field such as {E} is just a point, as the zero ideal {(0)} is the only prime ideal. Naively, it would then seem that there is not enough space inside such a point to support a rich enough structure of paths to recover the Galois theory of this field. In modern algebraic geometry, one addresses this issue by considering not just the set-theoretic elements of {E}, but more general “base points” {p: \mathrm{Spec}(b) \rightarrow \mathrm{Spec}(E)} that map from some other (affine) scheme {\mathrm{Spec}(b)} to {\mathrm{Spec}(E)} (one could also consider non-affine base points of course). One has to rework many of the fundamentals of the subject to accommodate this “relative point of view“, for instance replacing the usual notion of topology with an étale topology, but once one does so one obtains a very satisfactory theory.

As an exercise, I set myself the task of trying to interpret Galois theory as an analogue of covering space theory in a more classical fashion, without explicit reference to more modern concepts such as schemes, spectra, or étale topology. After some experimentation, I found a reasonably satisfactory way to do so as follows. The space {M_E} that one associates with {E} in this classical perspective is not the single point {\mathrm{Spec}(E)}, but instead the much larger space consisting of ring homomorphisms {p: E \rightarrow b} from {E} to arbitrary integral domains {b}; informally, {M_E} consists of all the “models” or “representations” of {E} (in the spirit of this previous blog post). (There is a technical set-theoretic issue here because the class of integral domains {R} is a proper class, so that {M_E} will also be a proper class; I will completely ignore such technicalities in this post.) We view each such homomorphism {p: E \rightarrow b} as a single point in {M_E}. The analogous notion of a path from one point {p: E \rightarrow b} to another {p': E \rightarrow b'} is then a homomorphism {\gamma: b \rightarrow b'} of integral domains, such that {p'} is the composition of {p} with {\gamma}. Note that every prime ideal {I} in the spectrum {\mathrm{Spec}(R)} of a commutative ring {R} gives rise to a point {p_I} in the space {M_R} defined here, namely the quotient map {p_I: R \rightarrow R/I} to the ring {R/I}, which is an integral domain because {I} is prime. So one can think of {\mathrm{Spec}(R)} as being a distinguished subset of {M_R}; alternatively, one can think of {M_R} as a sort of “penumbra” surrounding {\mathrm{Spec}(R)}. In particular, when {E} is a field, {\mathrm{Spec}(E) = \{(0)\}} defines a special point {p_E} in {M_E}, namely the identity homomorphism {p_E: E \rightarrow E}.

Below the fold I would like to record this interpretation of Galois theory, by first revisiting the theory of covering spaces using paths as the basic building block, and then adapting that theory to the theory of field extensions using the spaces indicated above. This is not too far from the usual scheme-theoretic way of phrasing the connection between the two topics (basically I have replaced étale-type points {p: \mathrm{Spec}(b) \rightarrow \mathrm{Spec}(E)} with more classical points {p: E \rightarrow b}), but I had not seen it explicitly articulated before, so I am recording it here for my own benefit and for any other readers who may be interested.

As readers who have followed my previous post will know, I have been spending the last few weeks extending my previous interactive text on propositional logic (entitled “QED”) to also cover first-order logic.  The text has now reached what seems to be a stable form, with a complete set of deductive rules for first-order logic with equality, and no major bugs as far as I can tell (apart from one weird visual bug I can’t eradicate, in that some graphics elements can occasionally temporarily disappear when one clicks on an item).  So it will likely not change much going forward.

I feel though that there could be more that could be done with this sort of framework (e.g., improved GUI, modification to other logics, developing the ability to write one’s own texts and libraries, exploring mathematical theories such as Peano arithmetic, etc.).  But writing this text (particularly the first-order logic sections) has brought me close to the limit of my programming ability, as the number of bugs introduced with each new feature implemented has begun to grow at an alarming rate.  I would like to repackage the code so that it can be re-used by more adept programmers for further possible applications, though I have never done something like this before and would appreciate advice on how to do so.   The code is already available under a Creative Commons licence, but I am not sure how readable and modifiable it will be to others currently.  [Update: it is now on GitHub.]

[One thing I noticed is that I would probably have to make more of a decoupling between the GUI elements, the underlying logical elements, and the interactive text.  For instance, at some point I made the decision (convenient at the time) to use some GUI elements to store some of the state variables of the text, e.g. the exercise buttons are currently storing the status of what exercises are unlocked or not.  This is presumably not an example of good programming practice, though it would be relatively easy to fix.  More seriously, due to my inability to come up with a good general-purpose matching algorithm (or even specification of such an algorithm) for the laws of first-order logic, many of the laws have to be hard-coded into the matching routine, so one cannot currently remove them from the text.  It may well be that the best thing to do in fact is to rework the entire codebase from scratch using more professional software design methods.]

[Update, Aug 23: links moved to GitHub version.]

Every four years at the International Congress of Mathematicians (ICM), the Fields Medal laureates are announced.     Today, at the 2018 ICM in Rio de Janeiro, it was announced that the Fields Medal was awarded to Caucher Birkar, Alessio Figalli, Peter Scholze, and Akshay Venkatesh.

After the two previous congresses in 2010 and 2014, I wrote blog posts describing some of the work of each of the winners.  This time, though, I happened to be a member of the Fields Medal selection committee, and as such had access to a large number of confidential letters and discussions about the candidates with the other committee members; in order to have the opinions and discussion as candid as possible, it was explicitly understood that these communications would not be publicly disclosed.  Because of this, I will unfortunately not be able to express much of a comment or opinion on the candidates or the process as an individual (as opposed to a joint statement of the committee).  I can refer you instead to the formal citations of the laureates (which, as a committee member, I was involved in crafting, and then signing off on), or the profiles of the laureates by Quanta magazine; see also the short biographical videos of the laureates by the Simons Foundation that accompanied the formal announcements of the winners. I am sure, though, that there will be plenty of other mathematicians who will be able to present the work of each of the medalists (for instance, there was a laudatio given at the ICM for each of the winners, which should eventually be made available at this link).

I know that there is a substantial amount of interest in finding out more about the inner workings of the Fields Medal selection process.  For the reasons stated above, I as an individual will unfortunately be unable to answer any questions about this process (e.g., I cannot reveal any information about other nominees, or of any comparisons between any two candidates or nominees).  I think I can safely express the following two personal opinions though.  Firstly, while I have served on many prize committees in the past, the process for the Fields Medal committee was by far the most thorough and deliberate of any I have been part of, and I for one learned an astonishing amount about the mathematical work of all of the shortlisted nominees, which was an absolutely essential component of the deliberations, in particular giving the discussions a context which would have been very difficult to obtain for an individual mathematician not in possession of all the confidential letters, presentations, and other information available to the committee (in particular, some of my preconceived impressions about the nominees going into the process had to be corrected in light of this more complete information).  Secondly, I believe the four medalists are all extremely deserving recipients of the prize, and I fully stand by the decision of the committee to award the Fields medals this year to these four.

I’ll leave the comments to this post open for anyone who wishes to discuss the work of the medalists.  But, for the reasons above, I will not participate in the discussion myself.

[Edit, Aug 1: looks like the ICM site is (barely) up and running now, so links have been added.  At this time of writing, there does not seem to be an online announcement of the composition of the committee, but this should appear in due course. -T.]

[Edit, Aug 9: the composition of the Fields Medal Committee for 2018 (which included myself) can be found here. -T.]

About six years ago on this blog, I started thinking about trying to make a web-based game based around high-school algebra, and ended up using Scratch to write a short but playable puzzle game in which one solves linear equations for an unknown {x} using a restricted set of moves. (At almost the same time, there were a number of more professionally made games released along similar lines, most notably Dragonbox.)

Since then, I have thought a couple times about whether there were other parts of mathematics which could be gamified in a similar fashion. Shortly after my first blog posts on this topic, I experimented with a similar gamification of Lewis Carroll’s classic list of logic puzzles, but the results were quite clunky, and I was never fully satisfied with them.

Over the last few weeks I returned to this topic though, thinking in particular about how to gamify the rules of inference of propositional logic, in a manner that at least vaguely resembles how mathematicians actually go about making logical arguments (e.g., splitting into cases, arguing by contradiction, using previous result as lemmas to help with subsequent ones, and so forth). The rules of inference are a list of a dozen or so deductive rules concerning propositional sentences (things like “({A} AND {B}) OR (NOT {C})”, where {A,B,C} are some formulas). A typical such rule is Modus Ponens: if the sentence {A} is known to be true, and the implication “{A} IMPLIES {B}” is also known to be true, then one can deduce that {B} is also true. Furthermore, in this deductive calculus it is possible to temporarily introduce some unproven statements as an assumption, only to discharge them later. In particular, we have the deduction theorem: if, after making an assumption {A}, one is able to derive the statement {B}, then one can conclude that the implication “{A} IMPLIES {B}” is true without any further assumption.
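
To give a flavour of what a matching routine has to do, here is a toy sketch of a single application of Modus Ponens to a pool of known sentences (written in Python for brevity rather than the Javascript actually used in this project, and entirely hypothetical in its data representation — this is not the QED codebase):

    # Sentences are modelled as strings or nested tuples such as
    # ("IMPLIES", "A", "B"); `known` is the set of sentences known to be true.
    def modus_ponens(known):
        """Return the sentences deducible from `known` by one Modus Ponens step."""
        new = set()
        for s in known:
            if isinstance(s, tuple) and s[0] == "IMPLIES":
                _, antecedent, consequent = s
                if antecedent in known and consequent not in known:
                    new.add(consequent)
        return new

    known = {"A", ("IMPLIES", "A", "B"), ("IMPLIES", "B", ("AND", "A", "B"))}
    while True:
        step = modus_ponens(known)
        if not step:
            break
        known |= step
    # `known` now also contains "B" and ("AND", "A", "B").

The actual text of course has to handle many more rules than this, as well as the nested assumption environments discussed below.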

It took a while for me to come up with a workable game-like graphical interface for all of this, but I finally managed to set one up, now using Javascript instead of Scratch (which would be hopelessly inadequate for this task); indeed, part of the motivation of this project was to finally learn how to program in Javascript, which turned out to be not as formidable as I had feared (certainly having experience with other C-like languages like C++, Java, or lua, as well as some prior knowledge of HTML, was very helpful). The main code for this project is available here. Using this code, I have created an interactive textbook in the style of a computer game, which I have titled “QED”. This text contains thirty-odd exercises arranged in twelve sections that function as game “levels”, in which one has to use a given set of rules of inference, together with a given set of hypotheses, to reach a desired conclusion. The set of available rules increases as one advances through the text; in particular, each new section gives one or more rules, and additionally each exercise one solves automatically becomes a new deduction rule one can exploit in later levels, much as lemmas and propositions are used in actual mathematics to prove more difficult theorems. The text automatically tries to match available deduction rules to the sentences one clicks on or drags, to try to minimise the amount of manual input one needs to actually make a deduction.

Most of one’s proof activity takes place in a “root environment” of statements that are known to be true (under the given hypothesis), but for more advanced exercises one has to also work in sub-environments in which additional assumptions are made. I found the graphical metaphor of nested boxes to be useful to depict this tree of sub-environments, and it seems to combine well with the drag-and-drop interface.

The text also logs one’s moves in a more traditional proof format, which shows how the mechanics of the game correspond to a traditional mathematical argument. My hope is that this will give students a way to understand the underlying concept of forming a proof in a manner that is more difficult to achieve using traditional, non-interactive textbooks.

I have tried to organise the exercises in a game-like progression in which one first works with easy levels that train the player on a small number of moves, and then introduce more advanced moves one at a time. As such, the order in which the rules of inference are introduced is a little idiosyncratic. The most powerful rule (the law of the excluded middle, which is what separates classical logic from intuitionistic logic) is saved for the final section of the text.

Anyway, I am now satisfied enough with the state of the code and the interactive text that I am willing to make both available (and open source; I selected a CC-BY licence for both), and would be happy to receive feedback on any aspect of either. In principle one could extend the game mechanics to other mathematical topics than the propositional calculus – the rules of inference for first-order logic being an obvious next candidate – but it seems to make sense to focus just on propositional logic for now.

Let {(X,T,\mu)} be a measure-preserving system – a probability space {(X,\mu)} equipped with a measure-preserving transformation {T} (which for simplicity of discussion we shall assume to be invertible). We will informally think of two points {x,y} in this space as being “close” if {y = T^n x} for some {n} that is not too large; this allows one to distinguish between “local” structure at a point {x} (in which one only looks at nearby points {T^n x} for moderately large {n}) and “global” structure (in which one looks at the entire space {X}). The local/global distinction is also known as the time-averaged/space-averaged distinction in ergodic theory.

A measure-preserving system is said to be ergodic if all the invariant sets are either zero measure or full measure. An equivalent form of this statement is that any measurable function {f: X \rightarrow {\bf R}} which is locally essentially constant in the sense that {f(Tx) = f(x)} for {\mu}-almost every {x}, is necessarily globally essentially constant in the sense that there is a constant {c} such that {f(x) = c} for {\mu}-almost every {x}. A basic consequence of ergodicity is the mean ergodic theorem: if {f \in L^2(X,\mu)}, then the averages {x \mapsto \frac{1}{N} \sum_{n=1}^N f(T^n x)} converge in {L^2} norm to the mean {\int_X f\ d\mu}. (The mean ergodic theorem also applies to other {L^p} spaces with {1 < p < \infty}, though it is usually proven first in the Hilbert space {L^2}.) Informally: in ergodic systems, time averages are asymptotically equal to space averages. Specialising to the case of indicator functions, this implies in particular that {\frac{1}{N} \sum_{n=1}^N \mu( E \cap T^n E)} converges to {\mu(E)^2} for any measurable set {E}.
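
As a quick numerical illustration of the mean ergodic theorem (an unrigorous sanity check, with all specific choices made arbitrarily for the demonstration), one can take the ergodic system given by an irrational rotation {x \mapsto x + \alpha \hbox{ mod } 1} on the unit interval with Lebesgue measure:

    import numpy as np

    alpha = np.sqrt(2) - 1                      # irrational rotation number
    f = lambda x: np.cos(2 * np.pi * x) ** 2    # observable with mean 1/2

    x0, N = 0.3, 10**5                          # arbitrary start; many steps
    orbit = (x0 + alpha * np.arange(1, N + 1)) % 1.0
    print(f(orbit).mean())   # time average: approximately 0.5
    print(0.5)               # space average: int_0^1 f = 1/2

The time average along the orbit matches the space average, as the theorem predicts.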

In this short note I would like to use the mean ergodic theorem to show that ergodic systems also have the property that “somewhat locally constant” functions are necessarily “somewhat globally constant”; this is not a deep observation, and probably already in the literature, but I found it a cute statement that I had not previously seen. More precisely:

Corollary 1 Let {(X,T,\mu)} be an ergodic measure-preserving system, and let {f: X \rightarrow {\bf R}} be measurable. Suppose that

\displaystyle \limsup_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \mu( \{ x \in X: f(T^n x) = f(x) \} ) \geq \delta \ \ \ \ \ (1)

for some {0 \leq \delta \leq 1}. Then there exists a constant {c} such that {f(x)=c} for {x} in a set of measure at least {\delta}.

Informally: if {f} is locally constant on pairs {x, T^n x} at least {\delta} of the time, then {f} is globally constant at least {\delta} of the time. Of course the claim fails if the ergodicity hypothesis is dropped, as one can simply take {f} to be an invariant function that is not essentially constant, such as the indicator function of an invariant set of intermediate measure. This corollary can be viewed as a manifestation of the general principle that ergodic systems have the same “global” (or “space-averaged”) behaviour as “local” (or “time-averaged”) behaviour, in contrast to non-ergodic systems in which local properties do not automatically transfer over to their global counterparts.

Proof: By composing {f} with (say) the arctangent function, we may assume without loss of generality that {f} is bounded. Let {k>0}, and partition {X} as {\bigcup_{m \in {\bf Z}} E_{m,k}}, where {E_{m,k}} is the level set

\displaystyle E_{m,k} := \{ x \in X: m 2^{-k} \leq f(x) < (m+1) 2^{-k} \}.

For each {k}, only finitely many of the {E_{m,k}} are non-empty. By (1), one has

\displaystyle \limsup_{N \rightarrow \infty} \sum_m \frac{1}{N} \sum_{n=1}^N \mu( E_{m,k} \cap T^n E_{m,k} ) \geq \delta.

Using the ergodic theorem, we conclude that

\displaystyle \sum_m \mu( E_{m,k} )^2 \geq \delta.

On the other hand, {\sum_m \mu(E_{m,k}) = 1}, so that {\sum_m \mu(E_{m,k})^2 \leq \sup_m \mu(E_{m,k})}. Thus there exists {m_k} such that {\mu(E_{m_k,k}) \geq \delta}; that is,

\displaystyle \mu( \{ x \in X: m_k 2^{-k} \leq f(x) < (m_k+1) 2^{-k} \} ) \geq \delta.

By the Bolzano-Weierstrass theorem (the quantities {m_k 2^{-k}} being bounded, since {f} is bounded), we may pass to a subsequence along which {m_k 2^{-k}} converges to a limit {c}. We then have

\displaystyle \mu( \{ x \in X: c-2^{-k} \leq f(x) \leq c+2^{-k} \}) \geq \delta

for infinitely many {k}, and hence

\displaystyle \mu( \{ x \in X: f(x) = c \}) \geq \delta.

The claim follows. \Box
