We now begin the study of (smooth) solutions $t \mapsto (M(t),g(t))$ to the Ricci flow equation

$\frac{d}{dt} g_{\alpha \beta} = - 2 \hbox{Ric}_{\alpha \beta}$, (1)

particularly for compact manifolds in three dimensions. Our first basic tool will be the maximum principle for parabolic equations, which we will use to bound (sub-)solutions to nonlinear parabolic PDE by (super-)solutions, and vice versa. Because the various curvatures $\hbox{Riem}_{\alpha \beta \gamma}^\delta$, $\hbox{Ric}_{\alpha \beta}$, R of a manifold undergoing Ricci flow do indeed obey nonlinear parabolic PDE (see equations (31) from Lecture 1), we will be able to obtain some important lower bounds on curvature, and in particular establish that the curvature is either bounded, or else that the positive components of the curvature dominate the negative components. This latter phenomenon, known as the Hamilton-Ivey pinching phenomenon, is particularly important when studying singularities of Ricci flow, as it means that the geometry of such singularities is almost completely dominated by regions of non-negative (and often quite high) curvature.

– The maximum principle –

In freshman calculus, one learns that if a smooth function $u: [a,b] \to {\Bbb R}$ has a local minimum at an interior point $x_0$, then the first derivative $u'(x_0)$ vanishes and the second derivative $u''(x_0)$ is non-negative. This implies a higher-dimensional version: if U is an open domain in ${\Bbb R}^d$ and $u: U \to {\Bbb R}$ has a local minimum at some $x_0 \in U$, then $\nabla u(x_0) = 0$ and $\Delta u(x_0) \geq 0$. Geometrically, the Laplacian $\Delta u(x_0)$ measures the extent to which u at $x_0$ dips below the average value of u near $x_0$, which explains why the Laplacian is non-negative at local minima.
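The "dips below the average" interpretation can be checked numerically. The following is only an illustrative sketch: the test function `u`, the radius `r` and the helper `circle_average` are hypothetical choices, and the sketch relies on the standard fact that in ${\Bbb R}^2$ the average of a smooth function u over a small circle of radius r about $x_0$ equals $u(x_0) + \frac{r^2}{4} \Delta u(x_0) + O(r^4)$.

```python
import math

def u(x, y):
    # Hypothetical test function with a strict local minimum at the origin;
    # its Hessian is [[2, 1], [1, 6]], so the Laplacian is 2 + 6 = 8.
    return x * x + 3.0 * y * y + x * y

def circle_average(f, x0, y0, r, n=360):
    # Average of f over n equally spaced points on the circle of radius r.
    total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        total += f(x0 + r * math.cos(theta), y0 + r * math.sin(theta))
    return total / n

r = 1e-2
# How far the nearby average sits above the value at the minimum.
excess = circle_average(u, 0.0, 0.0, r) - u(0.0, 0.0)
# Invert excess ~ (r^2 / 4) * Laplacian to recover the Laplacian.
laplacian_estimate = 4.0 * excess / r**2
```

At a local minimum the nearby average cannot lie below the central value, so `excess` (and hence the recovered Laplacian) is non-negative.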

The same phenomenon occurs for Riemannian manifolds.

Lemma 1. Let $(M,g)$ be a d-dimensional Riemannian manifold, and let $u: M \to {\Bbb R}$ be a $C^2$ function that has a local minimum at a point $x_0 \in M$. Then $\nabla_\alpha u(x_0) = 0$ and $\Delta u(x_0) \geq 0$.

Proof. The vanishing $\nabla_\alpha u(x_0) = 0$ of the first derivative is clear, so we turn to the second derivative estimate. We let $e_1,\ldots,e_d$ be a (local) orthonormal frame of M. Then (by the Leibniz rule)

$\Delta u = e_a^\alpha e_a^\beta \nabla_\alpha \nabla_\beta u = \nabla_{e_a} \nabla_{e_a} u - \nabla_{\nabla_{e_a} e_a} u$. (2)

Since u has vanishing first derivative at $x_0$, we conclude that

$\Delta u(x_0) = \nabla_{e_a} \nabla_{e_a} u(x_0)$. (3)

But as u has a local minimum at $x_0$, it also has a local minimum on the geodesic through $x_0$ with velocity $e_a$. From one-dimensional calculus we conclude that $\nabla_{e_a} \nabla_{e_a} u(x_0)$ is non-negative for each a, and the claim follows. $\Box$

For applications to nonlinear parabolic PDE, we need a time-dependent version of this fact, in which the function u and the metric g also vary with time. It is also convenient to work not with one function u, but with a pair u, v, and to consider relative local minima of u with respect to v (i.e. local minima of u-v).

Lemma 2. (Dichotomy) Let $t \mapsto (M,g(t))$ be a smooth flow of compact Riemannian manifolds on a time interval ${}[0,T]$. Let $u, v: [0,T] \times M \to {\Bbb R}$ be $C^2$ functions such that $u(0,x) \geq v(0,x)$ for all $x \in M$. Let $A \in {\Bbb R}$. Then exactly one of the following is true:

1. $u(t,x) \geq v(t,x)$ for all $(t,x) \in [0,T] \times M$.
2. There exists $(t,x) \in (0,T] \times M$ such that
$u(t,x) < v(t,x)$
$\nabla_\alpha u(t,x) = \nabla_\alpha v(t,x)$
$\Delta_{g(t)} u(t,x) \geq \Delta_{g(t)} v(t,x)$ (1′)
$\frac{d}{dt} u(t,x) \leq \frac{d}{dt} v(t,x) - A (v(t,x)-u(t,x))$

where $\Delta_{g(t)}$ is the Laplacian with respect to the metric $g(t)$.

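Proof. By replacing u, v with u-v,0 respectively we may assume that v=0. If we then replace u(t,x) by $e^{-At} u(t,x)$ we may also assume that A=0 (note that $\frac{d}{dt} (e^{-At} u) = e^{-At} (\frac{d}{dt} u - A u)$, so the last condition in 2 becomes $\frac{d}{dt} (e^{-At} u) \leq 0$).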

Clearly 1 and 2 cannot both hold. If 1 fails, then there exists $\varepsilon > 0$ such that $u(t,x) \leq -\varepsilon$ for some $(t,x) \in [0,T] \times M$. Let t be the first time for which this occurs, and let $x \in M$ be a point such that $u(t,x) = -\varepsilon$. Then $t > 0$. Also x is a local minimum of u(t) and thus $\nabla_\alpha u(t,x) = 0$ and $\Delta_{g(t)} u(t,x) \geq 0$ by Lemma 1. Also, since $u(t',x) > -\varepsilon$ for all $t' < t$ we have $\frac{d}{dt} u(t,x) \leq 0$. The claim follows. $\Box$

This gives us our first version of the parabolic maximum principle.

Corollary 1. (Supersolutions dominate subsolutions) Let the assumptions be as in Lemma 2. Suppose also that we have the supersolution property

$\frac{d}{dt} u(t,x) \geq \Delta_{g(t)} u(t,x) + \nabla_{X(t)} u(t,x) + F(t, u(t,x))$ (2′)

and the subsolution property

$\frac{d}{dt} v(t,x) \leq \Delta_{g(t)} v(t,x) + \nabla_{X(t)} v(t,x) + F(t, v(t,x))$ (3′)

for all $(t,x) \in [0,T] \times M$ where for each time t, $X(t)$ is a vector field, and $F(t): {\Bbb R} \to {\Bbb R}$ is a Lipschitz function of constant less than A. Then $u(t,x) \geq v(t,x)$ for all $0 \leq t \leq T$.

Proof. If we subtract (3′) from (2′) and use the Lipschitz nature of F we obtain

$\frac{d}{dt} (u-v)(t,x) \geq \Delta_{g(t)} (u-v)(t,x) + \nabla_{X(t)} (u-v)(t,x) - A' |u(t,x)-v(t,x)|$ (4)

for some $A' < A$. But this is inconsistent with the set of equations (1′). The claim then follows immediately from Lemma 2. $\Box$

In our applications, the subsolution $v(t,x) = v(t)$ will in fact be independent of x, and so is really an ODE subsolution rather than a PDE subsolution:

$\frac{d}{dt} v(t) \leq F(t,v(t))$ (5)

Thus the parabolic maximum principle allows us to lower bound PDE supersolutions by ODE subsolutions, as long as we have a bound at time zero.
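The comparison between a PDE supersolution and an ODE subsolution can be observed in a toy computation. The sketch below is an illustration under stated assumptions, not a proof: the spatial domain (a circle), the nonlinearity `reaction`, the initial data and the explicit finite-difference scheme are all hypothetical choices. For the small time step used here the scheme obeys its own discrete maximum principle, so the spatial minimum of u should never drop below the forward-Euler solution v of the ODE.

```python
import math

def reaction(v):
    # Hypothetical nonlinearity F(t, v) = v^2 (locally Lipschitz), chosen for illustration.
    return v * v

def simulate(n_points=64, t_final=0.2, cfl=0.25):
    # Explicit finite differences for u_t = u_xx + F(u) on the circle [0, 2*pi),
    # alongside forward Euler for the ODE v' = F(v) with v(0) = min u(0, .).
    dx = 2.0 * math.pi / n_points
    dt = cfl * dx * dx
    u = [-1.0 + 0.5 * math.sin(i * dx) for i in range(n_points)]
    v = min(u)
    gaps = []  # records min_x u - v after every time step
    t = 0.0
    while t < t_final:
        u = [
            u[i]
            + cfl * (u[(i + 1) % n_points] - 2.0 * u[i] + u[i - 1])
            + dt * reaction(u[i])
            for i in range(n_points)
        ]
        v = v + dt * reaction(v)
        t += dt
        gaps.append(min(u) - v)
    return gaps

gaps = simulate()
```

Every entry of `gaps` should be non-negative up to rounding, mirroring the statement that a bound at time zero propagates to later times.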

The above maximum principle is already very useful for scalar solutions (or supersolutions) $u: [0,T] \times M \to {\Bbb R}$ to scalar nonlinear parabolic PDE, but we will in fact need a more general version of this principle for vector-valued solutions $u: [0,T] \to \Gamma( V )$ to nonlinear parabolic PDE, where V is a vector bundle over M, equipped with some connection $\nabla$. (In practice, V will be derived from the tangent bundle, and $\nabla$ will be derived from the Levi-Civita connection.)

We will need some more notation. Let us say that a subset K of a vector bundle V is fibrewise convex if the fibre $K_x := K \cap V_x$ over each point $x \in M$ is a convex subset of the vector space $V_x$. We say that a subset K of a vector bundle V is parallel to the connection $\nabla$ if for any vector field $X$ on M, the induced vector field $\nabla_X$ preserves K (i.e. K is preserved by parallel transport).

To avoid some technical issues we shall refer vaguely to terms such as “tangent vector”, “inward-pointing vector”, and “outward-pointing vector” to a convex body at a boundary point. These terms can be made more precise, but an intuitive understanding of these concepts will have to suffice for now. [I may return to clean up these issues later, when I have a bit more time.]

We have a tensor variant of Lemma 1:

Lemma 3. Let $(M,g)$ be a d-dimensional Riemannian manifold, let V be a vector bundle over M with a connection $\nabla$, and let K be a closed, fibrewise convex subset of V which is parallel with respect to the connection. Let $u \in \Gamma(V)$ be a section such that $u(x) \in \partial K_x$ at some point $x \in M$, and $u(y) \in K_y$ for all y in a neighbourhood of x (thus u in some sense “attains a local maximum” at x with respect to K). Then every directional derivative $\nabla_X u(x) \in V_x$ of u at x is a tangent vector to $K_x$ at u(x), and the Laplacian $\nabla^\alpha \nabla_\alpha u(x) \in V_x$ is an inward or tangential pointing vector to $K_x$ at $u(x)$ (i.e. it lives in the closed convex cone of $K_x - u(x)$). Here the space $T^* M \otimes V$ that $\nabla u$ is a section of is equipped with the tensor product of the Levi-Civita connection and the connection on V; by abuse of notation, we refer to all of these connections as $\nabla$.

Note that Lemma 1 corresponds to the special case when $V = M \times {\Bbb R}$ and $K = M \times [a,+\infty)$ for some a.

Proof. We begin with the claim concerning the first derivatives $\nabla_X u(x)$. One can restrict attention from M to (a local piece of) the one-dimensional geodesic through x with velocity $X(x)$, thus essentially reducing matters to the case d=1. Any one-dimensional connection can be locally trivialised (this is essentially the Picard existence theorem for ODE) and so we may take M to be a small interval $(-\varepsilon,\varepsilon)$ (with x now being identified with 0), take V to be the trivial bundle $M \times V_0$, and take $\nabla$ to be the trivial connection. The set K can then be identified with $M \times K_0$, and u can be viewed as a smooth function from $(-\varepsilon,\varepsilon)$ to $K_0$ that attains the boundary of $K_0$ at 0. It is then clear that the first derivative of u at 0 is tangent to $K_0$ at u(0).

Now we turn to the second derivatives. As in the proof of Lemma 1, we introduce an orthonormal frame $e_a$ and express the Laplacian in terms of this frame via the Leibniz rule as in (2). The first derivative terms are already tangential, so it suffices by convexity to show that $\nabla_{e_a} \nabla_{e_a} u(x)$ is tangential or inward pointing for each $a=1,\ldots,d$ separately. But for fixed a, we can reduce to the one-dimensional setting considered previously by restricting to the geodesic through x with velocity $e_a(x)$ as before, so that once again $u$ is now a smooth function from $(-\varepsilon, \varepsilon)$ to $K_0$ which attains a boundary value of $K_0$ at 0. In particular, if $\{ v \in V_0: \lambda(v) \leq c \}$ is a supporting halfspace for $K_0$ at $u(0)$ for some linear functional $\lambda: V_0 \to {\Bbb R}$, then the scalar function $x \mapsto \lambda(u(x))$ attains a maximum at 0 and thus has non-positive second derivative there, i.e. $\lambda(u''(0)) \leq 0$. Since this holds for every supporting halfspace, the claim follows. $\Box$

As a consequence we can establish a rather general and powerful tensor maximum principle of Hamilton:

Proposition 1 (Hamilton’s maximum principle) Let $t \mapsto (M,g(t))$ be a smooth flow of compact Riemannian manifolds on a time interval ${}[0,T]$. Let V be a vector bundle over M with connection $\nabla$, and let $u: [0,T] \to \Gamma(V)$ be a smoothly varying family of sections that obeys the nonlinear PDE

$\frac{d}{dt} u(t,x) = \nabla^\alpha \nabla_\alpha u + F(t,x,u)$ (6)

where for each $(t,x) \in [0,T] \times M$, $F(t,x): V_x \to V_x$ is a locally Lipschitz function (using the metric on $V_x$ induced by g) which is continuous in t,x with uniformly bounded Lipschitz constant in the 1-neighbourhood (say) of the set $K_x(t)$ introduced below. For each time $t \in [0,T]$, let $K(t) \subset V$ be a closed fibrewise convex parallel set varying continuously in t. We assume that K is preserved by F in the sense that for each $(t,x) \in [0,T) \times M$ and each boundary point $v \in \partial K_x(t) \subset V_x$, the spacetime vector $(1,F(t,x,v)) \in {\Bbb R} \times V_x$ is an inward or tangential vector to the spacetime body $K_x := \{ (t',v'): t' \in [0,T], v' \in K_x(t') \}$ at the boundary point (t,v). Suppose also that $u(0,x) \in K_x(0)$ for all $x \in M$. Then $u(t,x) \in K_x(t)$ for all $(t,x) \in [0,T] \times M$.

Proof. By continuity in time, it suffices to prove the claim in ${}[0,T) \times M$ rather than ${}[0,T] \times M$.

Let us first give an “almost proof” of the claim, and then explain how to modify this to an actual proof. Suppose the claim failed; then $u(t,x)$ must exit $K_x(t)$ for some $(t,x) \in [0,T) \times M$. If we let t be the first time at which this occurs, then $t > 0$ and there exists $x \in M$ such that $u(t,x) \in \partial K_x(t)$, and $u(t,y) \in K_y(t)$ for all other $y \in M$. By Lemma 3, this implies that $\nabla^\alpha \nabla_\alpha u(t,x)$ is a tangential or inward pointing vector to $K_x(t)$ at $u(t,x)$. Also, since $u(t',x) \in K_x(t')$ for all $t' < t$, we see that $(1,\frac{d}{dt} u(t,x))$ is a tangential or outward pointing vector of $K_x$ at $(t, u(t,x))$. From (6) we conclude that $(1, F(t,x,u(t,x)))$ is also a tangential or outward pointing vector. This almost contradicts the hypothesis, except that it is still possible that $(1,F(t,x,u(t,x)))$ is tangential.

To fix this, we enlarge the set K slightly. Let A be a large number (essentially a bound on the local Lipschitz constant of F) and let $\varepsilon > 0$ be small. For each $(t,x) \in [0,T] \times M$, let $K^{(\varepsilon,A)}_x(t)$ be the $\varepsilon e^{At}$-neighbourhood of $K_x(t)$ in $V_x$. If $\varepsilon$ is small enough compared to A, this new set $K^{(\varepsilon,A)}_x(t)$ lives in the 1-neighbourhood of the old set $K_x(t)$. If A is sufficiently large compared to the local Lipschitz constant of F, then (by the growth of the exponential function $e^{At}$, and the hypotheses on F) the vector $(1, F(t,x,u))$ will now always be inward pointing, and not just tangential or inward pointing, to the spacetime body $K^{(\varepsilon,A)}$ whenever $(t,x,u)$ is at a boundary point of this body. This allows us to use the previous arguments with $K_x(t)$ replaced by $K^{(\varepsilon,A)}$ throughout, to show that $u(t,x)$ cannot escape $K^{(\varepsilon,A)}$ if A is large enough. Sending $\varepsilon \to 0$ we obtain the claim. $\Box$

Remark 1. One can easily also add a drift term $\nabla_{X(t)} u(t,x)$ to (6), as in Corollary 1, though we will not need to do so here. With some more effort, one could start defining notions of “tensor supersolutions” and “tensor subsolutions”, which take values as fibrewise convex sets rather than sections, to try to obtain a true tensor generalisation of Corollary 1, but this becomes very technical and we will not need to use such generalisations here. $\diamond$

Remark 2. The above maximum principles are known as weak maximum principles: starting from an assumption of non-negativity (or similar closed bounds) at time zero, they ensure non-negativity (or closed bounds) at later times. Later on we shall also need strong maximum principles, in which one additionally assumes positivity at some initial point at time zero, and that the manifold is connected, and concludes positivity everywhere at later times. (This can be viewed as a substantial generalisation of the fact that the heat kernel on a connected manifold is everywhere strictly positive, or more informally that Brownian motion has a positive probability of hitting any given non-empty open region of the manifold.) Actually, it is the contrapositive of these strong maximum principles which will be of use to us, as they allow one to use vanishing of some key curvature at one point in spacetime to deduce vanishing of curvatures at many other points in spacetime also, which in particular will lead to some very important splitting theorems that will arise in the arguments later. $\diamond$

– Applications of the maximum principle –

We now apply the maximum principle (in both its scalar and tensor forms) to solutions of the Ricci flow (1) on some time interval ${}[0,T]$. The simplest application of these principles arises from exploiting the equation

$\frac{d}{dt} R = \Delta R + 2 |\hbox{Ric}|^2$ (7)

for the scalar curvature (see (31) from Lecture 1).

Remark 3. Intuitively, the two components on the RHS of (7) can be interpreted as follows. The dissipative term $\Delta R$ reflects the fact that a point in M with much higher (resp. lower) curvature than its neighbours (or more precisely, than the average curvature of its neighbours) will tend to revert to the mean, because the Ricci flow (1) will strongly contract the metric at regions of particularly high curvature (resp. strongly expand the metric at regions of particularly low curvature); one may visualise Ricci flow on a very pointed cigar, or a highly curved saddle, to try to see what is going on. The nonlinear term $2 |\hbox{Ric}|^2$ reflects the fact that if one is in a positive curvature region (e.g. a region behaving like a sphere), then the metric will contract under Ricci flow, thus increasing the curvature to be even more positive; conversely, if one is in a negative curvature region (such as a region behaving like a saddle), then the metric will expand, thus weakening the negativity of curvature. Note that in both cases the curvature is trending upwards, which is consistent with the non-negativity of $2 |\hbox{Ric}|^2$. $\diamond$

Remark 4. Another source of intuition can come from Einstein metrics, which are those metrics with the property that $\hbox{Ric}_{\alpha \beta} = k g_{\alpha \beta}$ for some constant k; in particular we have constant scalar curvature $R = kd$, where d is the dimension. It is not hard to show (using the equations for dilation, see (22) from Lecture 1) that the Ricci flow for such metrics is given explicitly by the formulae

$g_{\alpha \beta}(t) = (1 - 2kt) g_{\alpha \beta}(0)$

$g^{\alpha \beta}(t) = \frac{1}{1-2kt} g^{\alpha \beta}(0)$

$\hbox{Ric}_{\alpha \beta}(t) = \hbox{Ric}_{\alpha\beta}(0)$ (8)

$R(t) = \frac{1}{1-2kt} R(0) = \frac{1}{1-2kt} kd$.

Of course, this is completely consistent with (7). Note that if k is positive (which occurs for instance in manifolds of constant positive sectional curvature, such as the sphere and its quotients) then a singularity develops at time $\frac{1}{2k}$, in which the diameter of the manifold has shrunk to zero and the curvature has become infinitely positive. In contrast, if k is negative (which occurs for manifolds of constant negative sectional curvature, such as hyperbolic space) the metric expands, becomes increasingly flat over time and does not develop singularities. $\diamond$
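The consistency of (8) with (7) can be confirmed with a few lines of arithmetic. The sketch below is only a sanity check; the function names, the step size `h` and the sample values of k, d, t are hypothetical. It uses the fact that R is constant in space for these metrics (so $\Delta R = 0$) and that $|\hbox{Ric}|^2 = g^{\alpha \gamma} g^{\beta \delta} \hbox{Ric}_{\alpha \beta} \hbox{Ric}_{\gamma \delta} = \frac{k^2 d}{(1-2kt)^2}$ by (8).

```python
def einstein_scalar_curvature(k, d, t):
    # R(t) = k d / (1 - 2 k t), as in (8); valid while 1 - 2 k t > 0.
    return k * d / (1.0 - 2.0 * k * t)

def einstein_ricci_norm_squared(k, d, t):
    # |Ric|^2 with Ric fixed in time and the inverse metric scaled by 1 / (1 - 2 k t).
    return d * k * k / (1.0 - 2.0 * k * t) ** 2

def check_equation_7(k, d, t, h=1e-6):
    # Central difference for dR/dt; the Laplacian term vanishes since R is constant in space.
    lhs = (einstein_scalar_curvature(k, d, t + h)
           - einstein_scalar_curvature(k, d, t - h)) / (2.0 * h)
    rhs = 2.0 * einstein_ricci_norm_squared(k, d, t)
    return lhs, rhs

lhs_sphere, rhs_sphere = check_equation_7(k=1.0, d=3, t=0.2)
lhs_hyperbolic, rhs_hyperbolic = check_equation_7(k=-1.0, d=3, t=0.2)
```

Both the positively curved case (k = 1) and the negatively curved case (k = -1) should give matching left and right sides up to the finite-difference error.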

Since R is the trace of the self-adjoint tensor $\hbox{Ric}_{\alpha \beta}$, one has the decomposition

$|\hbox{Ric}|^2 = \frac{1}{d} R^2 + |\hbox{Ric}^0|^2$, (9)

where $\hbox{Ric}^0_{\alpha \beta} := \hbox{Ric}_{\alpha \beta} - \frac{1}{d} R g_{\alpha \beta}$ is the traceless component of the Ricci tensor. We conclude that R is a supersolution to a nonlinear parabolic PDE:

$\frac{d}{dt} R \geq \Delta R + \frac{2}{d} R^2$. (10)
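The decomposition (9), and the resulting pointwise inequality $|\hbox{Ric}|^2 \geq \frac{1}{d} R^2$ behind (10), can be verified on a concrete example. In the sketch below the matrix `ric` is a hypothetical symmetric matrix standing in for the Ricci tensor in an orthonormal frame, and the helper names are illustrative.

```python
def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def norm_squared(a):
    # Frobenius norm squared, which is |A|^2 in an orthonormal frame.
    return sum(x * x for row in a for x in row)

def traceless_part(a):
    # Subtract (trace / d) times the identity.
    d = len(a)
    r = trace(a)
    return [[a[i][j] - (r / d if i == j else 0.0) for j in range(d)]
            for i in range(d)]

# Hypothetical symmetric 3x3 matrix standing in for Ric in an orthonormal frame.
ric = [[2.0, 0.5, -1.0],
       [0.5, -1.0, 0.25],
       [-1.0, 0.25, 3.0]]
d = len(ric)
scalar = trace(ric)
lhs = norm_squared(ric)
rhs = scalar ** 2 / d + norm_squared(traceless_part(ric))
```

Here `lhs` and `rhs` agree, and dropping the non-negative traceless term gives the inequality used to pass from (7) to (10).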

For each time t, let $R_{\hbox{min}}(t)$ denote the minimum value of the scalar curvature. We thus conclude

Proposition 2 (Lower bounds on scalar curvature). Let $(M,g(t))$ be a Ricci flow on a compact d-dimensional manifold on some time interval ${}[0,T]$. Then for every $t \in [0,T]$, we have

$R_{\hbox{min}}(t) \geq \frac{R_{\hbox{min}}(0)}{ 1 - \frac{2t}{d} R_{\hbox{min}}(0)}$. (11)

In particular, if $R \geq c$ at time zero for some $c \in {\Bbb R}$, then $R \geq c$ for all subsequent times for which the flow exists; and if furthermore c is positive, then the flow cannot be extended beyond time $\frac{d}{2c}$.
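To get a feel for the numbers in (11), one can evaluate its right-hand side directly. The helpers below are a minimal sketch with hypothetical names; they simply encode the explicit solution of the ODE $\frac{d}{dt} v = \frac{2}{d} v^2$ with $v(0) = R_{\hbox{min}}(0)$, together with the time $\frac{d}{2c}$ from the proposition.

```python
def scalar_curvature_lower_bound(r_min0, d, t):
    # Right-hand side of (11); it solves v' = (2/d) v^2 with v(0) = r_min0.
    denom = 1.0 - (2.0 * t / d) * r_min0
    if denom <= 0.0:
        raise ValueError("the bound (11) has already blown up at this time")
    return r_min0 / denom

def max_existence_time(c, d):
    # For R >= c > 0 at time zero the flow cannot be extended beyond d / (2c).
    if c <= 0.0:
        raise ValueError("only meaningful for c > 0")
    return d / (2.0 * c)

bound_negative = scalar_curvature_lower_bound(-6.0, 3, 1.0)   # -6 / (1 + 4)
bound_positive = scalar_curvature_lower_bound(6.0, 3, 0.2)    # 6 / (1 - 0.8)
t_max = max_existence_time(6.0, 3)                            # 3 / 12
```

Note how a negative initial bound relaxes towards zero over time, while a positive initial bound grows and forces a finite lifespan.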

From Remark 4 we see that for Einstein metrics, (11) is obeyed with equality, so that (11) can be quite sharp.

Exercise 1. Use Corollary 1 to deduce Proposition 2. $\diamond$

Proposition 2 asserts that while the scalar curvature can become extremely large and positive as time increases, it cannot become extremely large and negative. One quick corollary of this is

Corollary 2. (Upper bound on volume growth) Let $(M,g(t))$ be a Ricci flow on a compact d-dimensional manifold on some time interval ${}[0,T]$, such that we have the pointwise lower bound $R \geq c$ at time zero. Then we have

$\hbox{Vol}(M,g(t)) \leq e^{-ct} \hbox{Vol}(M,g(0))$ (12)

for all $0 \leq t \leq T$.

Proof. From the variation formula for the volume measure $d\mu$ (see (33) from Lecture 1), which under (1) reads $\frac{d}{dt} d\mu = \frac{1}{2} g^{\alpha \beta} (\frac{d}{dt} g_{\alpha \beta})\ d\mu = -R\ d\mu$, we have

$\frac{d}{dt} \hbox{Vol}(M,g(t)) = -\int_M R(t,x)\ d\mu_{g(t)}(x)$. (13)

By Proposition 2, R is bounded from below by c, leading to the inequality $\frac{d}{dt} \hbox{Vol}(M,g(t)) \leq - c \hbox{Vol}(M,g(t))$. The claim now follows from Gronwall’s inequality. $\Box$

Exercise 2. Strengthen the bound (12) to

$\hbox{Vol}(M,g(t)) \leq (1-\frac{2ct}{d})^{d/2} \hbox{Vol}(M,g(0))$ (13′)

and show that this inequality is sharp for Einstein metrics. Note that this improved bound demonstrates rather visibly that when $c > 0$, some singularity must develop at or before time $d/2c$. $\diamond$

We now turn to applications of the tensor maximum principle. It is natural to apply this principle to the equation for the Riemann tensor,

$\frac{d}{dt} \hbox{Riem}_{\alpha \beta} = \Delta \hbox{Riem}_{\alpha \beta} + {\mathcal O}(\hbox{Riem}^2)$ (14)

(see equation (31) from Lecture 1). In principle, this expression is of the required form (6), but the nonlinearity ${\mathcal O}(\hbox{Riem}^2)$, while explicit, is rather messy to work with. It is convenient to simplify (14) further by viewing things in a certain evolving orthonormal frame. For ease of notation, let us assume that the compact manifold M=M(0) is parallelisable, so that it enjoys a global orthonormal frame $e_1(0), \ldots, e_d(0) \in \Gamma(TM(0))$ for the metric g(0). (To handle the general case, one could work locally, or pass to a covering space, and/or replace the trivial bundle $M \times {\Bbb R}^d$ appearing below by a non-trivial bundle and eliminate explicit mention of the orthonormal frame altogether; we leave the details to the interested reader. In three dimensions, every orientable manifold is parallelisable, so it is even easier to reduce to the parallelisable case in that setting.) This orthonormal frame induces a linear identification between the tangent bundle $TM(0)$ and the trivial bundle $M \times {\Bbb R}^d$, with $e_1(0), \ldots, e_d(0)$ being identified with the standard basis sections of the trivial bundle. The metric $g_{\alpha \beta}(0)$ is then identified with the Euclidean section $\eta_{\alpha \beta} := e^a_\alpha e^a_\beta \in \hbox{Sym}^2(M \times {\Bbb R}^d)$ (which gives the fibres of $M \times {\Bbb R}^d$ a Euclidean structure). Note that this is NOT directly a metric on M, since $M \times {\Bbb R}^d$ is distinct from the tangent bundle TM, but the orthonormal frame provides an identification between the section $\eta$ and the metric g(0).

Now we start the Ricci flow, creating a family of new metrics $g(t)$ for $t \in [0,T]$. There is no reason why the frame $e_1(0),\ldots, e_d(0)$ should remain orthonormal in these new metrics. However, if we evolve the frame by the equation

$\frac{d}{dt} e_a^\alpha := \hbox{Ric}^{\alpha \beta} (e_a)_\beta$ (15)

(which, by Picard’s existence theorem for ODE, exists for all $t \in [0,T]$) then an easy computation using (1) (and Gronwall’s inequality) reveals that $e_1(t), \ldots, e_d(t)$ remain orthonormal with respect to g(t).

Exercise 3. Prove this. (Hint: differentiate $g_{\alpha \beta} e_a^\alpha e_b^\beta$ in time and use (1), (15).) $\diamond$
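Before attempting the exercise in general, it may help to test the claim in the simplest case. The sketch below treats only the Einstein metrics of Remark 4, where $g(t) = (1-2kt) g(0)$ and hence $\hbox{Ric}^{\alpha \beta} (e_a)_\beta = \frac{k}{1-2kt} e_a^\alpha$, so that (15) reduces to a scalar ODE for a scale factor s(t) with $e_a(t) = s(t) e_a(0)$. The integrator `rk4_scalar` and the sample values of k and the final time are hypothetical choices, and this special case is of course not a substitute for the general proof.

```python
def rk4_scalar(f, y0, t0, t1, steps=1000):
    # Classical fourth-order Runge-Kutta for the scalar ODE y' = f(t, y).
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

k, t_end = 1.0, 0.3

# Einstein special case of (15): s' = k s / (1 - 2 k t), with s(0) = 1.
s_end = rk4_scalar(lambda t, s: k * s / (1.0 - 2.0 * k * t), 1.0, 0.0, t_end)

# g(t)(e_a(t), e_b(t)) = (1 - 2 k t) s(t)^2 delta_ab; orthonormality means this factor is 1.
gram_factor = (1.0 - 2.0 * k * t_end) * s_end ** 2
```

The frame vectors grow exactly fast enough to compensate for the shrinking metric, so `gram_factor` stays equal to 1 up to integration error.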

The frame $e_1(t), \ldots, e_d(t)$ can be used to identify the tangent bundle $TM(t)$ at time t with the trivial bundle $M \times {\Bbb R}^d$, which identifies g(t) with $\eta$. In particular, the Levi-Civita connection $\nabla_{g(t)}$ can be identified with a connection $\nabla(t)$ on $M \times {\Bbb R}^d$ to which $\eta$ is parallel (thus parallel transport by $\nabla(t)$ proceeds by rotations). Similarly, we can identify the Riemann tensor $\hbox{Riem}(t) \in \hbox{Hom}( \bigwedge^2 T^* M, \bigwedge^2 T^* M )$ at that time with a tensor ${\mathcal T}(t) \in M \times \hbox{Hom}( \bigwedge^2 {\Bbb R}^d, \bigwedge^2 {\Bbb R}^d )$. Using the natural identification between $\bigwedge^2 {\Bbb R}^d$ and the Lie algebra $\mathfrak{so}(d)$, one can thus view ${\mathcal T}(t)$ as a section of $M \times \hbox{Hom}( \mathfrak{so}(d), \mathfrak{so}(d) )$. Actually, since the Riemann tensor is self-adjoint, ${\mathcal T}(t,x): \mathfrak{so}(d) \to \mathfrak{so}(d)$ is self-adjoint also (using the Killing form on $\mathfrak{so}(d)$).

After some significant algebraic computation, the equation (14) can be revealed to take the form

$\frac{d}{dt} {\mathcal T} = \nabla^\gamma \nabla_\gamma {\mathcal T} + {\mathcal T}^2 + {\mathcal T}^\#$ (16)

where the connection $\nabla = \nabla(t)$ has been extended from $M \times {\Bbb R}^d$ to $M \times \hbox{Hom}( \mathfrak{so}(d), \mathfrak{so}(d) )$ in the usual manner, ${\mathcal T}^2$ is the usual square of ${\mathcal T}$ (viewed as a linear operator from $\mathfrak{so}(d)$ to itself), and ${\mathcal T}^\#$ is the Lie algebra square of ${\mathcal T}$, defined by the formula

$\langle {\mathcal T}^\# X, Y \rangle := \hbox{tr}( {\mathcal T} (\hbox{ad} X) {\mathcal T} (\hbox{ad} Y))$ (17)

for all $X, Y \in \mathfrak{so}(d)$ where $\hbox{ad} X: Y \mapsto [X,Y]$ is the usual adjoint operator and $\langle X, Y\rangle = \hbox{tr}( \hbox{ad} X \hbox{ad} Y )$ is the Killing form. One easily verifies that if ${\mathcal T}$ is self-adjoint, then so are ${\mathcal T}^2$ and ${\mathcal T}^\#$. (Curiously, in four and higher dimensions the Bianchi identity that ${\mathcal T}$ will satisfy if it comes from the Riemann tensor is not preserved by either ${\mathcal T}^2$ or ${\mathcal T}^\#$, but it is preserved by their sum ${\mathcal T}^2 + {\mathcal T}^\#$.)

Exercise 4. Show that (16) implies (7). $\diamond$

If ${\mathcal T}$ is positive semi-definite (which is equivalent to the Riemann tensor being non-negative), then it is easy to see that ${\mathcal T}^2$ and ${\mathcal T}^\#$ are also, and hence so is their sum. Since the space ${\mathcal P}$ of positive semi-definite self-adjoint elements of $\hbox{Hom}(\mathfrak{so}(d),\mathfrak{so}(d))$ forms a closed convex cone which is invariant under the action of SO(d) (and in particular, $M\times {\mathcal P}$ is parallel with respect to the connections $\nabla_{g(t)}$), one can then apply the tensor maximum principle to conclude

Proposition 3 (Non-negative Riemann curvature is preserved). Let $(M,g(t))$ be a Ricci flow on a compact d-dimensional manifold on some time interval ${}[0,T]$. Suppose that the Riemann curvature is everywhere non-negative at time zero. Then the Riemann curvature is everywhere non-negative for all times $t \in [0,T]$.

Remark 6. Strictly speaking, there is an issue because the nonlinearity ${\mathcal T} \mapsto {\mathcal T}^2 + {\mathcal T}^\#$ is only locally Lipschitz rather than globally Lipschitz. But as we are assuming that the manifold is compact and the metrics vary smoothly, ${\mathcal T}$ is already bounded, and so one can truncate the nonlinearity by brute force outside of these bounds to ensure global Lipschitz bounds. We shall take advantage of this trick again below without further comment. $\diamond$

Now we specialise to three dimensions, in which the situation simplifies substantially, because $\mathfrak{so}(3) \equiv \bigwedge^2 {\Bbb R}^3$ can be identified with ${\Bbb R}^3$ by Hodge duality. If the self-adjoint map ${\mathcal T}: {\Bbb R}^3 \to {\Bbb R}^3$ is diagonalised as $\hbox{diag}(\lambda,\mu,\nu)$ in some orthonormal frame, then we have ${\mathcal T}^2 = \hbox{diag}(\lambda^2,\mu^2,\nu^2)$ and ${\mathcal T}^\# = \hbox{diag}(\mu \nu, \lambda \nu, \lambda \mu)$. Also, if ${\mathcal T}$ was representing the Riemann tensor, then the Ricci curvature in the same frame can be computed to be $\hbox{diag}(\mu+\nu, \lambda+\nu, \lambda+\mu)$, and so the scalar curvature is $2(\lambda+\mu+\nu)$. [Aside: there may be some factors of 2 that are off here; I did not have time to recheck these calculations.]

Heuristically, the tensor maximum principle predicts that the evolution of the equation (16) should be somehow “controlled” by the evolution of the ODE

$\frac{d}{dt} (\lambda,\mu,\nu) = F( \lambda, \mu, \nu )$ (18)

where $F(\lambda,\mu,\nu) := ( \lambda^2+\mu \nu, \mu^2+\lambda \nu, \nu^2 + \lambda \mu)$. It seems difficult to formulate this heuristic rigorously in complete generality (the main problem being that the convexity requirements of the maximum principle ultimately translate to rather significant constraints on what types of properties of the eigenvalues $\lambda,\mu,\nu$ one can study with this principle). However, we can do so in two important special cases. We begin with the simpler one.

Proposition 4. (Non-negative Ricci curvature is preserved in three dimensions) Let $(M,g(t))$ be a Ricci flow on a compact 3-dimensional manifold on some time interval ${}[0,T]$. Suppose that the Ricci curvature is everywhere non-negative at time zero. Then the Ricci curvature is everywhere non-negative for all times $t \in [0,T]$.

Proof. By the previous discussion, having non-negative Ricci curvature is equivalent to having all sums of pairs $\lambda+\mu, \mu+\nu, \nu+\lambda$ of eigenvalues of ${\mathcal T}$ non-negative. Equivalently, this is asserting that the partial trace $\hbox{tr}( {\mathcal T}|_W )$ of ${\mathcal T}$ on any two-dimensional subspace W of ${\Bbb R}^3$ is non-negative. If we let $K = K(t) \subset M \times \hbox{Hom}({\Bbb R}^3, {\Bbb R}^3)$ denote all the pairs $(x, {\mathcal T})$ for which this is true, we see that K is closed, convex, and parallel with respect to the connections $\nabla(t)$, since parallel transport by these connections acts on $\hbox{Hom}({\Bbb R}^3, {\Bbb R}^3)$ by orthogonal conjugation. Elementary algebraic computation also reveals that if the triplet $(\lambda, \mu, \nu)$ has the property that the sum of any two elements is non-negative, then the same is true of $F(\lambda, \mu, \nu)$. From this we see that the hypotheses of Proposition 1 are satisfied, and the claim follows. $\Box$
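The "elementary algebraic computation" in the proof can be spot-checked by random sampling. The sketch below is an illustration rather than a proof: the sampling range, the seed and the helper names are hypothetical, and `F` is the vector field from (18).

```python
import random

def F(lam, mu, nu):
    # The vector field of the ODE (18).
    return (lam * lam + mu * nu, mu * mu + lam * nu, nu * nu + lam * mu)

def pair_sums(triple):
    a, b, c = triple
    return (a + b, b + c, c + a)

def sample_nonnegative_pair_sums(rng, scale=10.0):
    # Rejection sampling of triples whose pairwise sums are all non-negative.
    while True:
        triple = tuple(rng.uniform(-scale, scale) for _ in range(3))
        if min(pair_sums(triple)) >= 0.0:
            return triple

rng = random.Random(0)
# Smallest pairwise sum of F over many admissible triples; should not be negative.
worst = min(
    min(pair_sums(F(*sample_nonnegative_pair_sums(rng))))
    for _ in range(10000)
)
```

A non-negative value of `worst` (up to rounding) is consistent with the claim that F maps the set of admissible triples to itself.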

Remark 7. This claim is special to three (and lower) dimensions; it fails for four and higher dimensions. Similarly, in three dimensions, since non-negative Riemann curvature is equivalent to non-negative sectional curvature, we see from Proposition 3 that the latter is also preserved by three-dimensional Ricci flow. However, this claim also fails in four and higher dimensions. [Aside: I may be slightly confused on these points; I will check on them later, once I have access to some literature.] $\diamond$

Results such as Proposition 3 and Proposition 4 are of course useful if one has an initial assumption of non-negative curvature. But for our applications, we need to understand what is going on for manifolds which may have combinations of both positive and negative curvature at various points and in various directions. The bound on scalar curvature given by Proposition 2 is helpful in this regard, but it only partially controls the situation (in terms of the eigenvalues $\lambda,\mu,\nu$, it offers a lower bound on $\lambda+\mu+\nu$, but not on $\lambda,\mu,\nu$ individually). It turns out that one cannot completely establish a unilateral lower bound on the individual curvatures $\lambda, \mu, \nu$, but one can at least show that if one of these curvatures is large and negative, then one of the others must be extremely large and positive, and so in regions of high curvature, the positive curvature components dominate. This important phenomenon for Ricci flow is known as Hamilton-Ivey pinching, and is formalised as follows:

Theorem 1 (Hamilton-Ivey pinching phenomenon) Let $(M,g(t))$ be a Ricci flow on a compact 3-dimensional manifold on some time interval ${}[0,T]$. Suppose that the least eigenvalue $\nu(t,x)$ of the Riemann curvature tensor is bounded below by -1 at time t=0 and all $x \in M$. Then, at all spacetime points $(t,x) \in [0,T] \times M$, we have the scalar curvature bound

$R \geq \frac{-6}{4t+1}$ (19)

and furthermore whenever one has negative curvature in the sense that $\nu(t,x) < 0$, then one also has the pinching bound

$R \geq 2|\nu| ( \log |\nu| + \log(1+t) - 3)$. (20)
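To see how strong the bound (20) is, one can tabulate its right-hand side. The helper `pinching_lower_bound` below is a hypothetical name for a direct transcription of (20), and the sample values of $\nu$ are arbitrary.

```python
import math

def pinching_lower_bound(nu, t):
    # Right-hand side of (20); only asserted when nu < 0.
    if nu >= 0.0:
        raise ValueError("(20) is only stated for nu < 0")
    a = abs(nu)
    return 2.0 * a * (math.log(a) + math.log(1.0 + t) - 3.0)

at_initial_floor = pinching_lower_bound(-1.0, 0.0)      # 2 * 1 * (0 + 0 - 3)
ratio_small = pinching_lower_bound(-1e3, 0.0) / 1e3      # 2 * (log(10**3) - 3)
ratio_large = pinching_lower_bound(-1e6, 0.0) / 1e6      # 2 * (log(10**6) - 3)
```

At $\nu = -1$ and t = 0 the bound reduces to $R \geq -6$, matching (19) at time zero, while for very negative $\nu$ the forced ratio $R/|\nu|$ grows logarithmically in $|\nu|$.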

Exercise 5. With the assumptions of Theorem 1, use (19) and (20) to establish the lower bound

$(1+t) \nu \geq - C \frac{100 + (1+t) R}{\log( 100 + (1+t) R )}$ (21)

for all $(t,x) \in [0,T] \times M$ and some absolute constant C (note that $100+(1+t) R > 1$, thanks to (19)). Conclude in particular that the scalar curvature controls the Riemann and Ricci tensors in the sense that we have the pointwise bounds

$|\hbox{Ric}|_g, |\hbox{Riem}|_g \leq C(100 + (1+t) R)$ (22)

for another absolute constant C.

Proof. Since $R = 2(\lambda+\mu+\nu)$ and the least eigenvalue $\nu$ is at least -1 at time zero, we have $R \geq -6$ at time zero. The claim (19) then follows immediately from Proposition 2.

The proof of (20) requires more work. Starting with the tensor ${\mathcal T}$ and its eigenvalues $\lambda \geq \mu \geq \nu$, we define the trace $S := \lambda+\mu+\nu = \frac{1}{2} R$ and the quantity $X := \max( - \nu, 0 )$. We write $f_t(x) := x(\log x + \log(1+t)-3)$ and let $\Omega_t$ be the set of all pairs $(x,s)$ such that $s \geq \frac{-3}{1+t}$ and such that $s \geq f_t(x)$ if $x > \frac{1}{1+t}$. (For $x \leq \frac{1}{1+t}$, the only constraint we place on s is that $s \geq \frac{-3}{1+t}$.) Elementary calculus shows that $\Omega_t$ is a convex set, and furthermore is left-monotone in the sense that if $(x,s) \in \Omega_t$ and $x' < x$, then $(x',s) \in \Omega_t$. Because trace is a linear functional and the least eigenvalue $\nu$ is a concave functional (so that X is a convex functional), it is not hard to then see that the set $K(t) := \{ (x,{\mathcal T}): (X,S) \in \Omega_t \} \subset M \times \hbox{Hom}({\Bbb R}^3, {\Bbb R}^3)$ is closed and fibrewise convex. Also, since parallel transport by the connections $\nabla(t)$ acts by orthogonal conjugation, K(t) is also parallel.

The initial conditions easily ensure that ${\mathcal T}$ lies in $K(0)$ at time zero (since $X \leq 1$ and $S \geq -3$ in this case). Similarly, the conclusion (20) follows easily from the claim that ${\mathcal T}$ lies in $K(t)$ at all later times t (note that in the case $X \leq \frac{1}{1+t}$, one can use the trivial bound $S \geq -3X$ to establish the claim, rather than by exploiting the inclusion ${\mathcal T} \in K(t)$). So to finish the proof, it suffices by Proposition 1 to show that K is preserved by the ODE (18). This can be accomplished by a (rather tedious) elementary calculation, the key point being that if $(\lambda,\mu,\nu)$ solve (18) with $\lambda \geq \mu \geq \nu$ and X, S are defined as before, then one has the inequality

$\frac{d}{dt} ( \frac{S}{X} - \log X ) \geq X$ (23)

whenever $X > 0$.

The set K(t) can be viewed as the region in which either $X \leq \frac{1}{1+t}$, or $X > \frac{1}{1+t}$ and $\frac{S}{X} - \log X \geq \log(1+t)-3$, and then (23) implies that this region is preserved by the ODE. $\Box$
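The key inequality (23) lends itself to a quick numerical spot-check. The sketch below is not a substitute for the calculation: the sampling ranges, the seed and the helper `pinching_derivative` are hypothetical, and the derivative is computed by the quotient rule from the components of `F` in (18), using $X = -\nu$ and $S = \lambda+\mu+\nu$ in the regime $\nu < 0$.

```python
import random

def F(lam, mu, nu):
    # The vector field of the ODE (18).
    return (lam * lam + mu * nu, mu * mu + lam * nu, nu * nu + lam * mu)

def pinching_derivative(lam, mu, nu):
    # d/dt (S/X - log X) along (18), assuming lam >= mu >= nu and nu < 0,
    # so that X = -nu and S = lam + mu + nu.
    d_lam, d_mu, d_nu = F(lam, mu, nu)
    x, s = -nu, lam + mu + nu
    dx, ds = -d_nu, d_lam + d_mu + d_nu
    return (ds * x - s * dx) / (x * x) - dx / x

rng = random.Random(1)
worst_margin = float("inf")
for _ in range(10000):
    # Sample ordered triples lam >= mu >= nu with nu negative.
    nu = -rng.uniform(0.1, 10.0)
    mu = nu + rng.uniform(0.0, 20.0)
    lam = mu + rng.uniform(0.0, 20.0)
    margin = pinching_derivative(lam, mu, nu) - (-nu)  # should be >= 0 by (23)
    worst_margin = min(worst_margin, margin)
```

A non-negative `worst_margin` (up to rounding) is what (23) predicts on every sampled triple.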

Remark 8. One can informally see how (18) is forcing some sort of pinching towards positive curvature as follows. In order for pinching not to occur, one needs $\nu$ to be large and negative, and $\lambda$ to be of order $O(|\nu|)$ in magnitude. Given the lower bounds on the scalar curvature, this in fact forces $\lambda$ to be positive and comparable to $|\nu|$ in magnitude. Now if $\mu$ is also positive, then the equation $\frac{d}{dt} \nu = \nu^2 + \lambda \mu$ rapidly causes $\nu$ to be less negative, while the equation $\frac{d}{dt} \lambda = \lambda^2 + \mu \nu$ can cause $\lambda$ to decrease, but not as rapidly as $\nu$ is increasing, thus the geometry does not become more pinched. If instead $\mu$ is negative, then $\nu$ can become more negative, but now $\lambda$ will increase faster than $\nu$ is decreasing, thus increasing the pinching towards positive (consider e.g. the case when $\nu=\mu=-\lambda/2$). $\diamond$
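The example $\nu=\mu=-\lambda/2$ from Remark 8 can be followed numerically. In the sketch below the initial data (2, -1, -1), the step size and the Runge-Kutta integrator `rk4_step` are hypothetical choices; the integration stops at t = 0.1, well before the solution of (18) blows up.

```python
def F(lam, mu, nu):
    # The vector field of the ODE (18).
    return (lam * lam + mu * nu, mu * mu + lam * nu, nu * nu + lam * mu)

def rk4_step(state, h):
    # One classical Runge-Kutta step for the autonomous system (18).
    def shifted(base, direction, scale):
        return tuple(b + scale * d for b, d in zip(base, direction))
    k1 = F(*state)
    k2 = F(*shifted(state, k1, h / 2))
    k3 = F(*shifted(state, k2, h / 2))
    k4 = F(*shifted(state, k3, h))
    return tuple(
        s + h * (a + 2 * b + 2 * c + d) / 6
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

state = (2.0, -1.0, -1.0)          # nu = mu = -lambda / 2
ratio_start = state[0] / abs(state[2])
steps, h = 1000, 1e-4              # integrate up to t = 0.1
for _ in range(steps):
    state = rk4_step(state, h)
ratio_end = state[0] / abs(state[2])
```

Although $\nu$ becomes more negative along this trajectory, $\lambda$ grows faster, so the ratio $\lambda/|\nu|$ increases, which is the pinching towards positive curvature described in the remark.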

Remark 9. There are further applications of the tensor maximum principle to Ricci flow. One notable one is Hamilton’s rounding theorem, which asserts that if the Ricci curvature of a compact 3-manifold is strictly positive at time zero, then not only does a singularity develop in finite time (by Proposition 2), but the geometry becomes increasingly round in the sense that the ratio between the largest and smallest eigenvalues of this curvature goes to 1 as one approaches the singularity. In fact, the rescaled limit of the geometry here has constant positive sectional curvature and is thus either a sphere or a spherical space form. $\diamond$

[Update, Apr 7: Some corrections, including rewording of proof of Proposition 1.]