You are currently browsing the monthly archive for March 2008.

In the first lecture, we introduce flows $t \mapsto (M(t), g(t))$ on Riemannian manifolds $(M,g)$, which are recipes for describing smooth deformations of such manifolds over time, and derive the basic first variation formulae for how various structures on such manifolds (e.g. curvature, length, volume) change by such flows. (One can view these formulae as describing the relationship between two “infinitesimally close” Riemannian manifolds.) We then specialise to the case of Ricci flow (together with some close relatives of this flow, such as renormalised Ricci flow, or Ricci flow composed with a diffeomorphism flow). We also discuss the “de Turck trick” that modifies the Ricci flow into a nonlinear parabolic equation, for the purposes of establishing local existence and uniqueness of that flow.

Next week (starting on Wednesday, to be more precise), I will begin my class on Perelman’s proof of the Poincaré conjecture. As I only have ten weeks in which to give this proof, I will have to move rapidly through some of the more basic aspects of Riemannian geometry which will be needed throughout the course. In particular, in this preliminary lecture, I will quickly review the basic notions of infinitesimal (or microlocal) Riemannian geometry, and in particular defining the Riemann, Ricci, and scalar curvatures of a Riemannian manifold. (The more “global” aspects of Riemannian geometry, for instance concerning the relationship between distance, curvature, injectivity radius, and volume, will be discussed later in this course.) This is a review only, in particular omitting any leisurely discussion of examples or motivation for Riemannian geometry; it is impossible to compress this subject into a single lecture, and I will have to refer you to a textbook on the subject for a more complete treatment (I myself am using the text “Riemannian geometry” by my colleague here at UCLA, Peter Petersen).

One of my favourite unsolved problems in mathematics is the Kakeya conjecture in geometric measure theory. This conjecture is descended from the

Kakeya needle problem. (1917) What is the least area in the plane required to continuously rotate a needle of unit length and zero thickness around completely (i.e. by $360^\circ$)?

For instance, one can rotate a unit needle inside a unit disk, which has area $\pi/4$. By using a deltoid one requires only $\pi/8$ area.

In 1928, Besicovitch showed that that in fact one could rotate a unit needle using an arbitrarily small amount of positive area. This unintuitive fact was a corollary of two observations. The first, which is easy, is that one can translate a needle using arbitrarily small area, by sliding the needle along the direction it points in for a long distance (which costs zero area), turning it slightly (costing a small amount of area), sliding back, and then undoing the turn. The second fact, which is less obvious, can be phrased as follows. Define a Kakeya set in ${\Bbb R}^2$ to be any set which contains a unit line segment in each direction. (See this Java applet of mine, or the wikipedia page, for some pictures of such sets.)

Theorem. (Besicovitch, 1919) There exists Kakeya sets ${\Bbb R}^2$ of arbitrarily small area (or more precisely, Lebesgue measure).

In fact, one can construct such sets with zero Lebesgue measure. On the other hand, it was shown by Davies that even though these sets had zero area, they were still necessarily two-dimensional (in the sense of either Hausdorff dimension or Minkowski dimension). This led to an analogous conjecture in higher dimensions:

Kakeya conjecture. A Besicovitch set in ${\Bbb R}^n$ (i.e. a subset of ${\Bbb R}^n$ that contains a unit line segment in every direction) has Minkowski and Hausdorff dimension equal to n.

This conjecture remains open in dimensions three and higher (and gets more difficult as the dimension increases), although many partial results are known. For instance, when n=3, it is known that Besicovitch sets have Hausdorff dimension at least 5/2 and (upper) Minkowski dimension at least $5/2 + 10^{-10}$. See my Notices article for a general survey of this problem (and its connections with Fourier analysis, additive combinatorics, and PDE), my paper with Katz for a more technical survey, and Wolff’s survey for a systematic treatment of the field (up to about 1998 or so).

In 1999, Wolff proposed a simpler finite field analogue of the Kakeya conjecture as a model problem that avoided all the technical issues involving Minkowski and Hausdorff dimension. If $F^n$ is a vector space over a finite field F, define a Kakeya set to be a subset of $F^n$ which contains a line in every direction.

Finite field Kakeya conjecture. Let $E \subset F^n$ be a Kakeya set. Then E has cardinality at least $c_n |F|^n$, where $c_n > 0$ depends only on n.

This conjecture has had a significant influence in the subject, in particular inspiring work on the sum-product phenomenon in finite fields, which has since proven to have many applications in number theory and computer science. Modulo minor technicalities, the progress on the finite field Kakeya conjecture was, until very recently, essentially the same as that of the original “Euclidean” Kakeya conjecture.

Last week, the finite field Kakeya conjecture was proven using a beautifully simple argument by Zeev Dvir, using the polynomial method in algebraic extremal combinatorics. The proof is so short that I can present it in full here.

Over two years ago, Emmanuel Candés and I submitted the paper “The Dantzig selector: Statistical estimation when $p$ is much
larger than $n$
” to the Annals of Statistics. This paper, which appeared last year, proposed a new type of selector (which we called the Dantzig selector, due to its reliance on the linear programming methods to which George Dantzig, who had died as we were finishing our paper, had contributed so much to) for statistical estimation, in the case when the number $p$ of unknown parameters is much larger than the number $n$ of observations. More precisely, we considered the problem of obtaining a reasonable estimate $\beta^*$ for an unknown vector $\beta \in {\Bbb R}^p$ of parameters given a vector $y = X \beta + z \in {\Bbb R}^n$ of measurements, where $X$ is a known $n \times p$ predictor matrix and $z$ is a (Gaussian) noise error with some variance $\sigma^2$. We assumed that the predictor matrix X obeyed the restricted isometry property (RIP, also known as UUP), which roughly speaking asserts that $X\beta$ has norm comparable to $\beta$ whenever the vector $\beta$ is sparse. This RIP property is known to hold for various ensembles of random matrices of interest; see my earlier blog post on this topic.

Our selection algorithm, inspired by our previous work on compressed sensing, chooses the estimated parameters $\beta^*$ to have minimal $l^1$ norm amongst all vectors which are consistent with the data in the sense that the residual vector $r := y - X \beta^*$ obeys the condition

$\| X^* r \|_\infty \leq \lambda$, where $\lambda := C \sqrt{\log p} \sigma$ (1)

(one can check that such a condition is obeyed with high probability in the case that $\beta^* = \beta$, thus the true vector of parameters is feasible for this selection algorithm). This selector is similar, though not identical, to the more well-studied lasso selector in the literature, which minimises the $l^1$ norm of $\beta^*$ penalised by the $l^2$ norm of the residual.

A simple model case arises when n=p and X is the identity matrix, thus the observations are given by a simple additive noise model $y_i = \beta_i + z_i$. In this case, the Dantzig selector $\beta^*$ is given by the hard soft thresholding formula

$\beta^*_i = \max(|y_i| - \lambda, 0 ) \hbox{sgn}(y_i).$

The mean square error ${\Bbb E}( \| \beta - \beta^* \|^2 )$ for this selector can be computed to be roughly

$\lambda^2 + \sum_{i=1}^n \min( |y_i|^2, \lambda^2)$ (2)

and one can show that this is basically best possible (except for constants and logarithmic factors) amongst all selectors in this model. More generally, the main result of our paper was that under the assumption that the predictor matrix obeys the RIP, the mean square error of the Dantzig selector is essentially equal to (2) and thus close to best possible.

After accepting our paper, the Annals of Statistics took the (somewhat uncommon) step of soliciting responses to the paper from various experts in the field, and then soliciting a rejoinder to these responses from Emmanuel and I. Recently, the Annals posted these responses and rejoinder on the arXiv:

This week I am at Rutgers University, giving the Lewis Memorial Lectures for this year, which are also concurrently part of a workshop in random matrices. I gave four lectures, three of which were on random matrices, and one of which was on the Szemerédi regularity lemma.

The titles, abstracts, and slides of these talks are as follows.

1. Szemerédi’s lemma revisited. In this general-audience talk, I discuss the Szemerédi regularity lemma (which, roughly speaking, shows that an arbitrary large dense graph can always be viewed as the disjoint union of a bounded number of pseudorandom components), and how it has recently been reinterpreted in a more analytical (and infinitary) language using the theory of graph limits or of exchangeable measures. I also discuss arithmetic analogues of this lemma, including one which (implicitly) underlies my result with Ben Green that the primes contain arbitrarily long arithmetic progressions.
2. Singularity and determinant of random matrices. Here, I present recent progress in understanding the question of how likely a random matrix (e.g. one whose entries are all +1 or -1 with equal probability) is to be invertible, as well as the related question of how large the determinant should be. The case of continuous matrix ensembles (such as the Gaussian ensemble) is well understood, but the discrete case contains some combinatorial difficulties and took longer to understand properly. In particular I present the results of Kahn-Komlós-Szemerédi and later authors showing that discrete random matrices are invertible with exponentially high probability, and also give some results for the distribution of the determinant.
3. The least singular value of random matrices. A more quantitative version of the question “when is a matrix invertible?” is “what is the least singular value of that matrix”? I present here the recent results of Litvak-Pajor-Rudelson-Tomczak-Jaegermann, Rudelson, myself and Vu, and RudelsonVershynin on addressing this question in the discrete case. A central role is played by the inverse Littlewood-Offord theorems of additive combinatorics, which give reasonably sharp necessary conditions for a discrete random walk to concentrate in a small ball.
4. The circular law. One interesting application of the above theory is to extend the circular law for the spectrum of random matrices from the continuous case to the discrete case. Previous arguments of Girko and Bai for the continuous case can be transplanted to the discrete case, but the key new ingredient needed is a least singular value bound for shifted matrices $M-\lambda I$ in order to avoid the spectrum being overwhelmed by pseudospectrum. It turns out that the results of the preceding lecture are almost precisely what are needed to accomplish this.

[Update, Mar 31: first lecture slides corrected.  Thanks to Yoshiyasu Ishigami for pointing out a slight inaccuracy in the text.]

In this final lecture, we establish a Ratner-type theorem for actions of the special linear group $SL_2({\Bbb R})$ on homogeneous spaces. More precisely, we show:

Theorem 1. Let G be a Lie group, let $\Gamma < G$ be a discrete subgroup, and let $H \leq G$ be a subgroup isomorphic to $SL_2({\Bbb R})$. Let $\mu$ be an H-invariant probability measure on $G/\Gamma$ which is ergodic with respect to H (i.e. all H-invariant sets either have full measure or zero measure). Then $\mu$ is homogeneous in the sense that there exists a closed connected subgroup $H \leq L \leq G$ and a closed orbit $Lx \subset G/\Gamma$ such that $\mu$ is L-invariant and supported on Lx.

This result is a special case of a more general theorem of Ratner, which addresses the case when H is generated by elements which act unipotently on the Lie algebra ${\mathfrak g}$ by conjugation, and when $G/\Gamma$ has finite volume. To prove this theorem we shall follow an argument of Einsiedler, which uses many of the same ingredients used in Ratner’s arguments but in a simplified setting (in particular, taking advantage of the fact that H is semisimple with no non-trivial compact factors). These arguments have since been extended and made quantitative by Einsiedler, Margulis, and Venkatesh.
Read the rest of this entry »

I’m closing my series of articles for the Princeton Companion to Mathematics with my article on “Ricci flow“. Of course, this flow on Riemannian manifolds is now very well known to mathematicians, due to its fundamental role in Perelman’s celebrated proof of the Poincaré conjecture. In this short article, I do not focus on that proof, but instead on the more basic questions as to what a Riemannian manifold is, what the Ricci curvature tensor is on such a manifold, and how Ricci flow qualitatively changes the geometry (and with surgery, the topology) of such manifolds over time.

I’ve saved this article for last, in part because it ties in well with my upcoming course on Perelman’s proof which will start in a few weeks (details to follow soon).

The last external article for the PCM that I would like to point out here is Brian Osserman‘s article on the Weil conjectures, which include the “Riemann hypothesis over finite fields” that was famously solved by Deligne. These (now solved) conjectures, which among other things gives some quite precise control on the number of points in an algebraic variety over a finite field, were (and continue to be) a major motivating force behind much of modern arithmetic and algebraic geometry.

The last two lectures of this course will be on Ratner’s theorems on equidistribution of orbits on homogeneous spaces. Due to lack of time, I will not be able to cover all the material here that I had originally planned; in particular, for an introduction to this family of results, and its connections with number theory, I will have to refer readers to my previous blog post on these theorems. In this course, I will discuss two special cases of Ratner-type theorems. In this lecture, I will talk about Ratner-type theorems for discrete actions (of the integers ${\Bbb Z})$ on nilmanifolds; this case is much simpler than the general case, because there is a simple criterion in the nilmanifold case to test whether any given orbit is equidistributed or not. Ben Green and I had need recently to develop quantitative versions of such theorems for a number-theoretic application. In the next and final lecture of this course, I will discuss Ratner-type theorems for actions of $SL_2({\Bbb R})$, which is simpler in a different way (due to the semisimplicity of $SL_2({\Bbb R})$, and lack of compact factors).

My penultimate article for my PCM series is a very short one, on “Hamiltonians“. The PCM has a number of short articles to define terms which occur frequently in the longer articles, but are not substantive enough topics by themselves to warrant a full-length treatment. One of these is the term “Hamiltonian”, which is used in all the standard types of physical mechanics (classical or quantum, microscopic or statistical) to describe the total energy of a system. It is a remarkable feature of the laws of physics that this single object (which is a scalar-valued function in classical physics, and a self-adjoint operator in quantum mechanics) suffices to describe the entire dynamics of a system, although from a mathematical perspective it is not always easy to read off all the analytic aspects of this dynamics just from the form of the Hamiltonian.

In mathematics, Hamiltonians of course arise in the equations of mathematical physics (such as Hamilton’s equations of motion, or Schrödinger’s equations of motion), but also show up in symplectic geometry (as a special case of a moment map) and in microlocal analysis.

For this post, I would also like to highlight an article of my good friend Andrew Granville on one of my own favorite topics, “Analytic number theory“, focusing in particular on the classical problem of understanding the distribution of the primes, via such analytic tools as zeta functions and L-functions, sieve theory, and the circle method.

In this lecture – the final one on general measure-preserving dynamics – we put together the results from the past few lectures to establish the Furstenberg-Zimmer structure theorem for measure-preserving systems, and then use this to finish the proof of the Furstenberg recurrence theorem.