You are currently browsing the monthly archive for July 2007.

I’ve just uploaded to the arXiv my lecture notes “Structure and randomness in combinatorics” for my tutorial at the upcoming FOCS 2007 conference in October. This tutorial covers similar ground as my ICM paper (or slides), or my first two Simons lectures, but focuses more on the “nuts-and-bolts” of how structure theorems actually work to separate objects into structured pieces and pseudorandom pieces, for various definitions of “structured” and “pseudorandom”. Given that the target audience consists of computer scientists, I have focused exclusively here on the combinatorial aspects of this dichotomy (applied for instance to functions on the Hamming cube) rather than, say, the ergodic theory aspects (which are covered in Bryna Kra‘s lecture notes from Montreal, or my notes from Montreal for that matter). While most of the known applications of these decompositions are number-theoretic (e.g. my theorem with Ben Green), the number theory aspects are not covered in detail in these notes. (For that, you can read Bernard Host’s Bourbaki article, Ben Green‘s Gauss-Dirichlet article or ICM article, or my Coates article.)

One of the oldest and most fundamental concepts in mathematics is the line. Depending on exactly what mathematical structures we want to study (algebraic, geometric, topological, order-theoretic, etc.), we model lines nowadays by a variety of standard mathematical objects, such as the real line ${\Bbb R}$, the complex line ${\Bbb C}$, the projective line $\Bbb{RP}^1$, the extended real line ${}[-\infty,+\infty]$, the affine line ${\Bbb A}^1$, the continuum $c$, the long line $L$, etc. We also have discrete versions of the line, such as the natural numbers ${\Bbb N}$, the integers ${\Bbb Z}$, and the ordinal $\omega$, as well as compact versions of the line, such as the unit interval ${}[0,1]$ or the unit circle ${\Bbb T} := {\Bbb R}/{\Bbb Z}$. Finally we have discrete and compact versions of the line, such as the cyclic groups ${\Bbb Z}/N{\Bbb Z}$ and the discrete intervals $\{1,\ldots,N\}$ and $\{0,\ldots,N-1\}$. By taking Cartesian products we then obtain higher-dimensional objects such as Euclidean space ${\Bbb R}^n$, the standard lattice ${\Bbb Z}^n$, the standard torus ${\Bbb T}^n = {\Bbb R}^n/{\Bbb Z}^n$, and so forth. These objects of course form the background on which a very large fraction of modern mathematics is set.

Broadly speaking, the line has three major families of structures on it:

1. Geometric structures, such as a metric or a measure, completeness, scales (coarse and fine), rigid motions (translations and reflection), similarities (dilation, affine maps), and differential structures (tangent bundle, etc.);
2. Algebraic structures, such group, ring, or field structures, and everything else that comes from those categories (e.g. subgroups, homomorphisms, involutions, etc.); and
3. One-dimensional structures, such as order, a length space structure (in particular, path-connectedness structure), a singleton generator, the Archimedean property, the ability to use mathematical induction (i.e. well-ordering), convexity, or the ability to disconnect the line by removing a single point.

Of course, these structures are inter-related, and it is an important phenomenon that a mathematical concept which appears to be native to one structure, can often be equivalently defined in terms of other structures. For instance, the absolute value $|n|$ of an integer $n$ can be defined geometrically as the distance from 0 to $n$, algebraically as the index of the subgroup $\langle n \rangle = n \cdot \Bbb Z$ of the integers ${\Bbb Z}$ generated by n, or one-dimensionally as the number of integers between 0 and $n$ (including 0, but excluding $n$). This equivalence of definitions becomes important when one wants to work in more general contexts in which one or more of the above structures is missing or otherwise weakened.

What I want to talk about today is an important toy model for the line (in any of its incarnations), in which the geometric and algebraic structures are enhanced (and become neatly nested and recursive), at the expense of the one-dimensional structure (which is largely destroyed). This model has many different names, depending on what field of mathematics one is working in and which structures one is interested in. In harmonic analysis it is called the dyadic model, the Walsh model, or the Cantor group model; in number theory and arithmetic geometry it is known as the function field model; in topology it is the Cantor space model; in probability it is the martingale model; in metric geometry it is the ultrametric, tree, or non-Archimedean model; in algebraic geometry it is the Puiseux series model; in additive combinatorics it is the bounded torsion or finite field model; in computer science and information theory it is the Hamming cube model; in representation theory it is the Kashiwara crystal model. Let me arbitrarily select one of these terms, and refer to all of these models as dyadic models for the line (or of objects derived from the line). While there is often no direct link between a dyadic model and a non-dyadic model, dyadic models serve as incredibly useful laboratories in which to gain insight and intuition for the “real-world” non-dyadic model, since one has much more powerful and elegant algebraic and geometric structure to play with in this setting (though the loss of one-dimensional structure can be a significant concern). Perhaps the most striking example of this is the three-line proof of the Riemann hypothesis in the function field model of the integers, which I will discuss a little later.

I’ve just uploaded to the arXiv the paper “The cubic nonlinear Schrödinger equation in two dimensions with radial data“, joint with Rowan Killip and Monica Visan, and submitted to the Annals of Mathematics. This is a sequel of sorts to my paper with Monica and Xiaoyi Zhang, in which we established global well-posedness and scattering for the defocusing mass-critical nonlinear Schrödinger equation (NLS) $iu_t + \Delta u = |u|^{4/d} u$ in three and higher dimensions $d \geq 3$ assuming spherically symmetric data. (This is another example of the recently active field of critical dispersive equations, in which both coarse and fine scales are (just barely) nonlinearly active, and propagate at different speeds, leading to significant technical difficulties.)

In this paper we obtain the same result for the defocusing two-dimensional mass-critical NLS $iu_t + \Delta u= |u|^2 u$, as well as in the focusing case $iu_t + \Delta u= -|u|^2 u$ under the additional assumption that the mass of the initial data is strictly less than the mass of the ground state. (When mass equals that of the ground state, there is an explicit example, built using the pseudoconformal transformation, which shows that solutions can blow up in finite time.) In fact we can show a slightly stronger statement: for spherically symmetric focusing solutions with arbitrary mass, we can show that the first singularity that forms concentrates at least as much mass as the ground state.

Last week, as mentioned previously, I attended a very inspiring interdisciplinary meeting at the Royal Society. It would be impossible for me to describe all 47 talks in detail; but I thought that I would take the time to discuss one representative talk, presented by Rosemary Grant FRS, on her research (together with several collaborators, including her husband), on the ongoing evolution (and speciation) of Darwin’s finches in the Galápagos islands; this was a particularly striking talk for me, as I had not been aware that such dramatic microevolutionary changes could be seen in real-time (Grant and her co-authors have tracked multiple generations of these finches for 35 years!). They have several interesting results; the talk that I am reproducing here is largely based on this article in Evolution.

I’ve added some new pages to this blog. Firstly, I’ve moved my old page on my books to a new page on this blog, so that each book now has its own page for publication information, errata, etc., as well as its own comments section for feedback (including the inevitable reports of future errata).

I also added a new page to my advice page on writing and submitting papers, on “taking advantage of the English language“.

Finally, I added some links on the sidebar of this blog to some of the more popular of my older posts and articles, and reorganised the categories slightly. In particular, my expository articles were moved out of the “short story” category (they didn’t fit particularly well with the other wordpress articles in this category) and into the “expository” category. (Now that there are a number of wordpress blogs in research mathematics, it might eventually be worth trying to establish some standards for use of mathematical tags, but this is hardly an urgent concern.)

[Update, July 18: Added the arXiv math categories, as well as a parent Mathematics category.]

This week I was in London, attending the New Fellows Seminar at the Royal Society. This was a fairly low-key event preceding the formal admissions ceremony; for instance, it is not publicised on their web site. The format was very interesting: they had each of the new Fellows of the Society give a brief (15 minute) presentation of their work in quick succession, in a manner which would be accessible to a diverse audience in the physical and life sciences. The result was a wonderful two-day seminar on the state of the art in many areas of physics, chemistry, engineering, biology, medicine, and mathematics. For instance, I learnt

• How the solar neutrino problem was resolved by the discovery that the neutrino had mass, which did not commute with flavour and hence caused neutrino oscillations, which have since been detected experimentally;
• Why modern aircraft (such as the Dreamliner and A380) are now assembled using (incredibly tough and waterproofed) adhesives instead of bolts or welds, and how adhesion has been enhanced by nanoparticles;
• How the bacterium Helicobacter pylori was recently demonstrated (by two Aussies :-) ) to be a major cause of peptic ulcers (though the exact mechanism is not fully understood), but has also been proposed (somewhat paradoxically) to also have a preventative effect against esophageal cancer (cf. the hygiene hypothesis);
• How recent advances in machine learning and image segmentation (including graph cut methods!) now allow computers to identify and track many general classes of objects (e.g. people, cars, animals) simultaneously in real-world images and video, though not quite in real-time yet;
• How large-scale structure maps of the universe (such as the 2dF Galaxy Redshift Survey) combine with measurements of the cosmic background radiation (e.g. from WMAP) to demonstrate the existence of both dark matter and dark energy (they have different impacts on the evolution of the curvature of the universe and on the current distribution of visible matter);
• … and 42 other topics like this. (One strongly recurrent theme in the life science talks was just how much recent genomic technologies, such as the genome projects of various key species, have accelerated (by several orders of magnitude!) the ability to identify the genes, proteins, and mechanisms that underlie any given biological function or disease. To paraphrase one speaker, a modern genomics lab could now produce the equivalent of one 1970s PhD thesis in the subject every minute.)

I’ve just uploaded to the ArXiV my paper “Norm convergence of multiple ergodic averages for commuting transformations“, submitted to
Ergodic Theory and Dynamical Systems. This paper settles in full generality the norm convergence problem for several commuting transformations. Specifically, if $(X, {\mathcal X},\mu)$ is a probability space and $T_1, \ldots, T_l: X \to X$ are commuting measure-preserving transformations, then for any bounded measurable functions $f_1, \ldots, f_l: X \to {\Bbb R}$, the multiple average

$\frac{1}{N} \sum_{n=0}^{N-1} \int_X T_1^n f_1 \ldots T_l^n f_l$ (1)

is convergent in the $L^2(X)$ norm topology (and thus also converges in probability). The corresponding question of pointwise almost everywhere convergence remains open (and quite difficult, in my opinion). My argument also does not readily establish a formula as to what this limit actually is (it really establishes that the sequence is Cauchy in $L^2$ rather than convergent).

The l=1 case of this theorem is the classical mean ergodic theorem (also known as the von Neumann ergodic theorem). The l=2 case was established by Conze and Lesigne. The higher l case was partially resolved by Frantzikinakis and Kra, under the additional hypotheses that all of the transformations $T_i$, as well as the quotients $T_i T_j^{-1}$, are ergodic. The special case $T_j=T^j$ was established by Host-Kra (with another proof given subsequently by Ziegler). Another relevant result is the Furstenberg-Katznelson theorem, which asserts among other things when $f_1=\ldots=f_l$ is non-negative and not identically zero, then the inner product of the expression (1) with f has a strictly positive limit inferior as $N \to \infty$. This latter result also implies Szemerédi’s theorem.

It is also known that the Furstenberg-Katznelson theorem can be proven by hypergraph methods, and in fact my paper also proceeds by a hypergraph-inspired approach, although the language of hypergraphs is not explicitly used in the body of the argument. (In contrast to the work of Host-Kra and Ziegler, no nilsystems appear in the proof.)

A few months ago, I was invited to contribute an article to Scholarpedia – a Wikipedia-like experiment (using essentially the same software, in fact) in which the articles are far fewer in number, but have specialists as the primary authors (and curators) and are peer-reviewed in a manner similar to submissions to a research journal. Specifically, I was invited (with Ben Green) to author the article on Szemerédi’s theorem. The article is now submitted (awaiting review) reviewed and accepted, and can be viewed on the Scholarpedia page for that theorem. Like Wikipedia, the page is open to edits or any other comments by any user (once they register an account and login); but the edits are moderated by the curators and primary authors, who thus remain responsible for the content.

Scholarpedia seems to be an interesting experiment, trying to blend the collaborative and dynamic strengths of the wiki system with the traditional and static strengths of the peer-review system. At any rate, any feedback on my article with Ben, either at the Scholarpedia page or here, would be welcome.

[Update, July 9: the article has been reviewed, modified, and accepted in just three days – a blindingly fast speed as far as peer review goes!]

This problem in compressed sensing is an example of a derandomisation problem: take an object which, currently, can only be constructed efficiently by a probabilistic method, and figure out a deterministic construction of comparable strength and practicality. (For a general comparison of probabilistic and deterministic algorithms, I can point you to these slides by Avi Wigderson).

I will define exactly what UUP matrices (the UUP stands for “uniform uncertainty principle“) are later in this post. For now, let us just say that they are a generalisation of (rectangular) orthogonal matrices, in which the columns are locally almost orthogonal rather than globally perfectly orthogonal. Because of this, it turns out that one can pack significantly more columns into a UUP matrix than an orthogonal matrix, while still capturing many of the desirable features of orthogonal matrices, such as stable and computable invertibility (as long as one restricts attention to sparse or compressible vectors). Thus UUP matrices can “squash” sparse vectors from high-dimensional space into a low-dimensional while still being able to reconstruct those vectors; this property underlies many of the recent results on compressed sensing today.

There are several constructions of UUP matrices known today (e.g. random normalised Gaussian matrices, random normalised Bernoulli matrices, or random normalised minors of a discrete Fourier transform matrix) but (if one wants the sparsity parameter to be large) they are all probabilistic in nature; in particular, these constructions are not 100% guaranteed to actually produce a UUP matrix, although in many cases the failure rate can be proven to be exponentially small in the size of the matrix. Furthermore, there is no fast (e.g. sub-exponential time) algorithm known to test whether any given matrix is UUP or not. The failure rate is small enough that this is not a problem for most applications (especially since many compressed sensing applications are for environments which are already expected to be noisy in many other ways), but is slightly dissatisfying from a theoretical point of view. One is thus interested in finding a deterministic construction which can locate UUP matrices in a reasonably rapid manner. (One could of course simply search through all matrices in a given class and test each one for the UUP property, but this is an exponential-time algorithm, and thus totally impractical for applications.) In analogy with error-correcting codes, it may be that algebraic or number-theoretic constructions may hold the most promise for such deterministic UUP matrices (possibly assuming some unproven conjectures on exponential sums); this has already been accomplished by de Vore for UUP matrices with small sparsity parameter.