You are currently browsing the category archive for the ‘math.MG’ category.

In a previous blog post, I discussed the recent result of Guth and Katz obtaining a near-optimal bound on the Erdos distance problem. One of the tools used in the proof (building upon the earlier work of Elekes and Sharir) was the observation that the incidence geometry of the Euclidean group ${SE(2)}$ of rigid motions of the plane was almost identical to that of lines in the Euclidean space ${{\bf R}^3}$:

Proposition 1 One can identify a (Zariski-)dense portion of ${SE(2)}$ with ${{\bf R}^3}$, in such a way that for any two points ${A, B}$ in the plane ${{\bf R}^2}$, the set ${l_{AB} := \{ R \in SE(2): RA = B \}}$ of rigid motions mapping ${A}$ to ${B}$ forms a line in ${{\bf R}^3}$.

Proof: A rigid motion is either a translation or a rotation, with the latter forming a Zariski-dense subset of ${SE(2)}$. Identify a rotation ${R}$ in ${SE(2)}$ by an angle ${\theta}$ with ${|\theta| < \pi}$ around a point ${P}$ with the element ${(P, \cot \frac{\theta}{2})}$ in ${{\bf R}^3}$. (Note that such rotations also form a Zariski-dense subset of ${SE(2)}$.) Elementary trigonometry then reveals that if ${R}$ maps ${A}$ to ${B}$, then ${P}$ lies on the perpendicular bisector of ${AB}$, and depends in a linear fashion on ${\cot \frac{\theta}{2}}$ (for fixed ${A,B}$). The claim follows. $\Box$
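As a quick numerical sanity check of Proposition 1 (my own illustration, not from the text; the helper `rotation_centre` is a hypothetical name): sample several rotations mapping ${A}$ to ${B}$, apply the parameterisation ${R \mapsto (P, \cot \frac{\theta}{2})}$, and confirm that the resulting points of ${{\bf R}^3}$ are collinear.

```python
import numpy as np

def rotation_centre(A, B, theta):
    """Centre P of the rotation by angle theta mapping A to B,
    found by solving (I - R_theta) P = B - R_theta A."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return np.linalg.solve(np.eye(2) - R, B - R @ A)

A = np.array([1.0, 2.0])
B = np.array([4.0, -1.0])
# parameterise each rotation mapping A to B by the point (P, cot(theta/2)):
pts = np.array([list(rotation_centre(A, B, t)) + [1.0 / np.tan(t / 2)]
                for t in [0.3, 0.9, 1.7, 2.5, 3.6]])
# the resulting points of R^3 should all lie on a single line:
rank = np.linalg.matrix_rank(pts[1:] - pts[0], tol=1e-8)
print(rank)  # 1, i.e. collinear
```

Here the centre is recovered by linear algebra rather than by the trigonometric argument of the proof, but the collinearity it confirms is exactly the content of the proposition.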

As seen from the proof, this proposition is an easy (though ad hoc) application of elementary trigonometry, but it was still puzzling to me why such a simple parameterisation of the incidence structure of ${SE(2)}$ was possible. Certainly it was clear from general algebraic geometry considerations that some bounded-degree algebraic description was available, but why would the ${l_{AB}}$ be expressible as lines and not as, say, quadratic or cubic curves?

In this post I would like to record some observations arising from discussions with Jordan Ellenberg, Jozsef Solymosi, and Josh Zahl which give a more conceptual (but less elementary) derivation of the above proposition that avoids the use of ad hoc coordinate transformations such as ${R \mapsto (P, \cot\frac{\theta}{2})}$. The starting point is to view the Euclidean plane ${{\bf R}^2}$ as the scaling limit of the sphere ${S^2}$ (a fact which is familiar to all of us through the geometry of the Earth), which makes the Euclidean group ${SE(2)}$ a scaling limit of the rotation group ${SO(3)}$. The latter can then be lifted to a double cover, namely the spin group ${Spin(3)}$. This group has a natural interpretation as the unit quaternions, which is isometric to the unit sphere ${S^3}$. The analogue of the lines ${l_{AB}}$ in this setting become great circles on this sphere; applying a projective transformation, one can map ${S^3}$ to ${{\bf R}^3}$ (or more precisely to the projective space ${{\bf P}^3}$), at which point the great circles become lines. This gives a proof of Proposition 1.

Details of the correspondence are provided below the fold. One by-product of this analysis, incidentally, is the observation that the Guth-Katz bound ${g(N) \gg N / \log N}$ for the Erdos distance problem in the plane ${{\bf R}^2}$ immediately extends with almost no modification to the sphere ${S^2}$ (i.e. any ${N}$ points in ${S^2}$ determine ${\gg N/\log N}$ distances), as well as to the hyperbolic plane ${H^2}$.

Assaf Naor and I have just uploaded to the arXiv our joint paper “Scale-oblivious metric fragmentation and the nonlinear Dvoretzky theorem“.

Consider a finite metric space ${(X,d)}$, with ${X}$ being a set of ${n}$ points. It is not always the case that this space can be isometrically embedded into a Hilbert space such as ${\ell^2}$; for instance, in a Hilbert space every pair of points has a unique midpoint, but uniqueness can certainly fail for finite metric spaces (consider for instance the graph metric on a four-point diamond). The situation improves, however, if one allows (a) the embedding to have some distortion, and (b) one is willing to embed just a subset of ${X}$, rather than all of ${X}$. More precisely, for any integer ${n}$ and any distortion ${D \geq 1}$, let ${k(n,D)}$ be the largest integer ${k}$ with the property that given any ${n}$-point metric space ${(X,d)}$, there exists a ${k}$-point subset ${Y}$ of ${X}$, and a map ${f: Y \rightarrow \ell^2}$, which has distortion at most ${D}$ in the sense that

$\displaystyle d(y,y') \leq \| f(y) - f(y') \|_{\ell^2} \leq D d(y,y')$

for all ${y,y' \in Y}$.

Bourgain, Figiel, and Milman established that for any fixed ${D>1}$, one has the lower bound ${k(n,D) \gg_D \log n}$, and also showed that for ${D}$ sufficiently close to ${1}$ there is a matching upper bound ${k(n,D) \ll_D \log n}$. This type of result is called a nonlinear Dvoretzky theorem, in analogy with the linear Dvoretzky theorem that asserts that given any ${n}$-dimensional normed vector space and ${D > 1}$, there exists a ${k}$-dimensional subspace which embeds with distortion ${D}$ into ${\ell^2}$, with ${k \sim_D \log n}$. In other words, one can ensure that about ${\log n}$ of the ${n}$ points in an arbitrary metric space behave in an essentially Euclidean manner, and this is best possible if one wants the distortion to be small.

Bartal, Linial, Mendel, and Naor observed that there is a threshold phenomenon at ${D=2}$. Namely, for ${D < 2}$, the Bourgain-Figiel-Milman bounds are sharp with ${k(n,D) \sim_D \log n}$; but for ${D>2}$ one instead has a power law

$\displaystyle n^{1-B(D)} \ll_D k(n,D) \ll n^{1-b(D)}$

for some ${0 < b(D) \leq B(D) < 1}$. In other words, once one allows distortion by factors greater than ${2}$, one can now embed a polynomial portion of the points, rather than just a logarithmic portion, into a Hilbert space. This has some applications in theoretical computer science to constructing “approximate distance oracles” for high-dimensional sets of data. (The situation at the critical value ${D=2}$ is still unknown.)

In the special case that the metric ${d}$ is an ultrametric, so that the triangle inequality ${d(x,z) \leq d(x,y) + d(y,z)}$ is upgraded to the ultra-triangle inequality ${d(x,z) \leq \max(d(x,y), d(y,z))}$, it is an easy exercise to show that ${X}$ can be embedded isometrically into a Hilbert space, and in fact into a sphere of radius ${\hbox{diam}(X)/\sqrt{2}}$. Indeed, this can be established by an induction on the cardinality of ${X}$, using the ultrametric inequality to partition any finite ultrametric space ${X}$ of two or more points into sets of strictly smaller diameter that are all separated from each other by exactly ${\hbox{diam}(X)}$, and then using the inductive hypothesis and Pythagoras’ theorem.
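The inductive construction can be carried out quite concretely. In the sketch below (my own illustration; the function name `embed_ultrametric` is hypothetical), the partition is given by the equivalence classes of the relation ${d(x,y) < \hbox{diam}(X)}$ (an equivalence relation precisely because of the ultra-triangle inequality); each class is embedded recursively, and the classes are then lifted into mutually orthogonal directions so that Pythagoras produces the cross-class distances.

```python
import numpy as np
from itertools import combinations

def embed_ultrametric(points, d):
    """Isometrically embed a finite ultrametric space into Euclidean space;
    all points land on a sphere of radius diam/sqrt(2) about the origin."""
    if len(points) == 1:
        return {points[0]: np.zeros(0)}
    D = max(d(x, y) for x, y in combinations(points, 2))  # diameter
    # d(x,y) < D is an equivalence relation by the ultra-triangle inequality;
    # its classes have strictly smaller diameter and are mutually at distance D
    parts = []
    for x in points:
        for part in parts:
            if d(x, part[0]) < D:
                part.append(x)
                break
        else:
            parts.append([x])
    subs = [embed_ultrametric(part, d) for part in parts]
    dims = [len(next(iter(s.values()))) for s in subs]
    total = sum(dims) + len(parts)
    emb, offset = {}, 0
    for i, (part, s, m) in enumerate(zip(parts, subs, dims)):
        # by induction the sub-embedding lies on a sphere of squared radius r2
        r2 = max(v @ v for v in s.values())
        a = np.sqrt(D ** 2 / 2 - r2)  # height making cross distances exactly D
        for x in part:
            v = np.zeros(total)
            v[offset:offset + m] = s[x]
            v[sum(dims) + i] = a
            emb[x] = v
        offset += m
    return emb

# four-point ultrametric: two pairs at distance 1, all cross-pairs at distance 2
dist = {frozenset(p): v for p, v in
        [(("a", "b"), 1.0), (("c", "d"), 1.0), (("a", "c"), 2.0),
         (("a", "d"), 2.0), (("b", "c"), 2.0), (("b", "d"), 2.0)]}
d = lambda x, y: dist[frozenset((x, y))]
emb = embed_ultrametric(["a", "b", "c", "d"], d)
```

One can check directly that the embedding is isometric and that every image point has norm ${\hbox{diam}(X)/\sqrt{2}}$.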

One can then replace the concept of embedding into a Hilbert space, with the apparently stronger concept of embedding into an ultrametric space; this is useful for the computer science applications as ultrametrics have a tree structure which allows for some efficient algorithms for computing distances in such spaces. As it turns out, all the preceding constructions carry over without difficulty to this setting; thus, for ${D<2}$, one can embed a logarithmic number of points with distortion ${D}$ into an ultrametric space, and for ${D>2}$ one can embed a polynomial number of points.

One can view the task of locating a subset of a metric space that is equivalent (up to bounded distortion) to an ultrametric as that of fragmenting a metric space into a tree-like structure. For instance, the standard metric on the arithmetic progression ${\{1,\ldots,3^m\}}$ of length ${n=3^m}$ is not an ultrametric (and in fact needs a huge distortion factor of ${3^m}$ in order to embed into an ultrametric space), but if one restricts to the Cantor-like subset of integers in ${\{1,\ldots,3^m\}}$ whose base ${3}$ expansion consists solely of ${0}$s and ${1}$s, then one easily verifies that the resulting set is fragmented enough that the standard metric is equivalent (up to a distortion factor of ${3}$) to an ultrametric, namely the metric ${d(x,y) := 3^{j}/2}$ for ${x \neq y}$, where ${j}$ is the position of the most significant base ${3}$ digit at which ${x}$ and ${y}$ differ (i.e. the largest integer ${j}$ for which ${\lfloor x/3^j \rfloor \neq \lfloor y/3^j \rfloor}$).
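The distortion factor of ${3}$ can be verified by brute force for small ${m}$; in the sketch below (my own check), ${j}$ is taken to be the position of the most significant base ${3}$ digit at which the two integers differ.

```python
from itertools import product

m = 6
# integers in {1,...,3^m} whose base-3 digits are all 0 or 1
cantor = [sum(t * 3 ** k for k, t in enumerate(ts))
          for ts in product([0, 1], repeat=m)]
cantor = [x for x in cantor if x >= 1]

def ultra(x, y):
    """d(x,y) = 3^j / 2, with j the most significant differing base-3 digit"""
    j = max(k for k in range(m) if (x // 3 ** k) % 3 != (y // 3 ** k) % 3)
    return 3 ** j / 2

ratios = [abs(x - y) / ultra(x, y)
          for i, x in enumerate(cantor) for y in cantor[:i]]
print(min(ratios), max(ratios))  # all ratios lie strictly between 1 and 3
```

Thus on this set ${d(x,y) \leq |x-y| \leq 3 d(x,y)}$, which is the claimed equivalence up to distortion ${3}$.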

The above fragmentation constructions were somewhat complicated and deterministic in nature. Mendel and Naor introduced a simpler probabilistic method, based on random partition trees of ${X}$, which reproved the above results in the high-distortion case ${D \gg 1}$ (with ${b(D), B(D) \sim 1/D}$ in the limit ${D \rightarrow \infty}$). However, this method had inherent limitations in the low-distortion case, in particular failing for ${D < 3}$ (basically because the method would establish a stronger embedding result which fails in that regime).

In this paper, we introduce a variant of the random partition tree method, involving a random fragmentation at a randomly chosen set of scales, that works all the way down to ${D>2}$ and gives a clean asymptotic value for the exponent ${B(D)}$, namely ${B(D) \sim 2e/D}$ as ${D \rightarrow \infty}$; in fact we have the more precise result that we can take ${B(D)}$ to be ${1-\theta}$, where ${0 < \theta < 1}$ is the unique solution to the equation

$\displaystyle \frac{2}{D} = (1-\theta) \theta^{\theta/(1-\theta)},$

and that this is the limit of the method if one chooses a “scale-oblivious” approach that applies uniformly to all metric spaces, ignoring the internal structure of each individual metric space.

The construction ends up being relatively simple (taking about three pages). The basic idea is as follows. Pick two scales ${R > r > 0}$, and let ${x_1, x_2, x_3, \ldots}$ be a random sequence of points in ${X}$ (since ${X}$ is finite, every point in ${X}$ will almost surely be visited infinitely often by this sequence). We can then partition ${X}$ into pieces ${Y_1, Y_2, \ldots}$, by defining ${Y_m}$ to be the set of points ${x}$ for which ${x_m}$ is the first point in the sequence to lie in the ball ${B(x,R)}$. We then let ${S_m}$ be the subset of ${Y_m}$ consisting of those points ${x}$ for which ${x_m}$ actually falls in the smaller ball ${B(x,r)}$. As a consequence, we see that each ${S_m}$ is contained in a ball of radius ${r}$ (namely ${B(x_m,r)}$), but that any two ${S_m, S_{m'}}$ are separated by a distance of at least ${R-r}$ (from the triangle inequality). This gives one layer of a quasi-ultrametric tree structure on ${\bigcup_m S_m}$; if one iterates this over many different pairs of scales ${R, r}$, one gets a full quasi-ultrametric tree structure, which one can then adjust with bounded distortion to a genuine ultrametric structure. The game is then to optimise the choice of ${R, r}$ so as to maximise the portion of ${X}$ that remains in the tree; it turns out that a suitably random choice of such scales is the optimal one.
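One layer of this fragmentation is easy to simulate (a sketch under the definitions above; the toy metric space and the particular scales are my own arbitrary choices):

```python
import random

random.seed(0)
# a toy finite metric space: 200 points on a line with the usual metric
X = [random.uniform(0, 100) for _ in range(200)]
d = lambda x, y: abs(x - y)

R, r = 10.0, 4.0                                # two scales R > r > 0
seq = [random.choice(X) for _ in range(5000)]   # random sequence of points

# S[m]: points x whose first sequence point within distance R is x_m,
# and for which that point in fact lies within the smaller distance r
S = {}
for x in X:
    m = next(i for i, p in enumerate(seq) if d(p, x) <= R)
    if d(seq[m], x) <= r:
        S.setdefault(m, []).append(x)

# each piece lies in a ball of radius r, so has diameter at most 2r ...
assert all(max(d(a, b) for a in pc for b in pc) <= 2 * r for pc in S.values())
# ... and distinct pieces are separated by at least R - r
for m in S:
    for m2 in S:
        if m < m2:
            assert min(d(a, b) for a in S[m] for b in S[m2]) >= R - r
```

The assertions are exactly the two properties of the pieces ${S_m}$ noted above; only a fraction of ${X}$ survives into ${\bigcup_m S_m}$, which is the portion one then tries to maximise over the choice of scales.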

Assaf Naor and I have just uploaded to the arXiv our paper “Random Martingales and localization of maximal inequalities“, to be submitted shortly. This paper investigates the best constant ${C_n}$ in generalisations of the classical Hardy-Littlewood maximal inequality

$\displaystyle \| Mf \|_{L^{1,\infty}({\mathbb R}^n)} \leq C_n \|f\|_{L^1({\mathbb R}^n)}$

for any absolutely integrable ${f: {\mathbb R}^n \rightarrow {\mathbb R}}$, where

$\displaystyle Mf(x) := \sup_{r > 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy$

is the Hardy-Littlewood maximal function, ${B(x,r)}$ is the Euclidean ball of radius ${r}$ centred at ${x}$, and ${|E|}$ denotes the Lebesgue measure of a subset ${E}$ of ${{\mathbb R}^n}$. This inequality is fundamental to a large part of real-variable harmonic analysis, and in particular to Calderón-Zygmund theory. A similar inequality in fact holds with the Euclidean norm replaced by any other convex norm on ${{\mathbb R}^n}$.
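As a toy illustration of this weak-type ${(1,1)}$ inequality (my own discretisation, not from the paper): for ${f = 1_{[0,1]}}$ in one dimension, the centred maximal function decays like ${1/(2|x|)}$ at infinity, and for this particular ${f}$ the measures of the level sets ${\{Mf > \lambda\}}$ stay below ${\|f\|_{L^1}/\lambda}$.

```python
import numpy as np

h = 0.01
x = np.arange(-20, 21, h)                  # grid on [-20, 21)
f = ((x >= 0) & (x <= 1)).astype(float)    # indicator of [0,1]

def maximal(f, max_k):
    """Centred maximal function on the grid: sup over radii r = k*h
    of the average of f on the window [x - r, x + r]."""
    F = np.concatenate([[0.0], np.cumsum(f)])
    n, idx = len(f), np.arange(len(f))
    out = np.copy(f)                       # k = 0 window is the point itself
    for k in range(1, max_k + 1):
        lo = np.clip(idx - k, 0, n)
        hi = np.clip(idx + k + 1, 0, n)
        out = np.maximum(out, (F[hi] - F[lo]) / (2 * k + 1))
    return out

Mf = maximal(f, 2000)                      # radii up to r = 20
norm1 = h * f.sum()                        # ||f||_1, approximately 1
for lam in [0.9, 0.5, 0.2, 0.1]:
    measure = h * np.count_nonzero(Mf > lam)
    print(lam, measure)                    # stays below ||f||_1 / lam
```

For instance ${\{Mf > 0.1\}}$ is (up to discretisation) the interval ${(-4,5)}$ of measure ${9}$, compared with the weak-type bound ${10}$.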

The exact value of the constant ${C_n}$ is only known in ${n=1}$, with a remarkable result of Melas establishing that ${C_1 = \frac{11+\sqrt{61}}{12}}$. Classical covering lemma arguments give the exponential upper bound ${C_n \leq 2^n}$ when properly optimised (a direct application of the Vitali covering lemma gives ${C_n \leq 5^n}$, but one can reduce ${5}$ to ${2}$ by being careful). In an important paper of Stein and Strömberg, the improved bound ${C_n = O( n \log n )}$ was obtained for any convex norm by a more intricate covering lemma argument, and the slight improvement ${C_n = O(n)}$ was obtained in the Euclidean case by another argument, more adapted to the Euclidean setting, that relied on heat kernels. In the other direction, a recent result of Aldaz shows that ${C_n \rightarrow \infty}$ in the case of the ${\ell^\infty}$ norm, and in fact in an even more recent preprint of Aubrun, the lower bound ${C_n \gg_\epsilon \log^{1-\epsilon} n}$ for any ${\epsilon > 0}$ has been obtained in this case. However, these lower bounds do not apply in the Euclidean case, and one may still conjecture that ${C_n}$ is in fact uniformly bounded in this case.

Unfortunately, we do not make direct progress on these problems here. However, we do show that the Stein-Strömberg bound ${C_n = O(n \log n)}$ is extremely general, applying to a wide class of metric measure spaces obeying a certain “microdoubling condition at dimension ${n}$“; and conversely, at this level of generality, it is essentially the best estimate possible, even with additional metric measure hypotheses on the space. Thus, if one wants to improve this bound for a specific maximal inequality, one has to use specific properties of the geometry (such as the connections between Euclidean balls and heat kernels). Furthermore, in the general setting of metric measure spaces, one has a general localisation principle, which roughly speaking asserts that in order to prove a maximal inequality over all scales ${r \in (0,+\infty)}$, it suffices to prove such an inequality in a smaller range ${r \in [R, nR]}$ uniformly in ${R>0}$. It is this localisation which ultimately explains the significance of the ${n \log n}$ growth in the Stein-Strömberg result (there are ${O(n \log n)}$ essentially distinct scales in any range ${[R,nR]}$). It also shows that if one restricts the radii ${r}$ to a lacunary range (such as powers of ${2}$), the best constant improves to ${O(\log n)}$; if one restricts the radii to an even sparser range such as powers of ${n}$, the best constant becomes ${O(1)}$.

Yehuda Shalom and I have just uploaded to the arXiv our paper “A finitary version of Gromov’s polynomial growth theorem“, to be submitted to Geom. Funct. Anal.  The purpose of this paper is to establish a quantitative version of Gromov’s polynomial growth theorem which, among other things, is meaningful for finite groups.   Here is a statement of Gromov’s theorem:

Gromov’s theorem. Let $G$ be a group generated by a finite (symmetric) set $S$, and suppose that one has the polynomial growth condition

$|B_S(R)| \leq R^d$ (1)

for all sufficiently large $R$ and some fixed $d$, where $B_S(R)$ is the ball of radius $R$ generated by $S$ (i.e. the set of all words in $S$ of length at most $R$, evaluated in $G$).  Then $G$ is virtually nilpotent, i.e. it has a finite index subgroup $H$ which is nilpotent of some finite step $s$.
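To illustrate the growth condition (1) (a standalone sketch, not part of the original text): for $G = {\bf Z}^2$ with the four standard generators, a breadth-first search gives $|B_S(R)| = 2R^2+2R+1$, which is at most $R^3$ for all $R \geq 3$, consistent with ${\bf Z}^2$ being (virtually) nilpotent; a free group on two generators, by contrast, grows exponentially.

```python
from collections import deque

def ball_sizes(identity, generators, R):
    """Sizes |B_S(r)| for r = 0..R, by breadth-first search; each generator
    is a function acting on group elements."""
    dist = {identity: 0}
    q = deque([identity])
    while q:
        g = q.popleft()
        if dist[g] == R:
            continue
        for s in generators:
            h = s(g)
            if h not in dist:
                dist[h] = dist[g] + 1
                q.append(h)
    sizes = [0] * (R + 1)
    for r in dist.values():
        sizes[r] += 1
    for r in range(1, R + 1):       # cumulate: B(r) = words of length <= r
        sizes[r] += sizes[r - 1]
    return sizes

# Z^2 with generators +-e1, +-e2: polynomial growth
z2_gens = [lambda g, v=v: (g[0] + v[0], g[1] + v[1])
           for v in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
z2 = ball_sizes((0, 0), z2_gens, 10)
print(z2)  # |B(R)| = 2R^2 + 2R + 1: [1, 5, 13, 25, ...]

# free group on a, b (elements stored as reduced words): exponential growth
def mult(letter):
    inv = {"a": "A", "A": "a", "b": "B", "B": "b"}
    def f(w):
        return w[:-1] if w and w[-1] == inv[letter] else w + letter
    return f

free = ball_sizes("", [mult(l) for l in "aAbB"], 10)
print(free[-1])  # |B(10)| = 1 + 2*(3^10 - 1) = 118097
```

The contrast between the quadratic and exponential ball counts is exactly what separates the two cases of the dichotomy underlying Gromov's theorem.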

As currently stated, Gromov’s theorem is qualitative rather than quantitative; it does not specify any relationship between the input data (the growth exponent $d$ and the range of scales $R$ for which one has (1)), and the output parameters (in particular, the index $|G/H|$ of the nilpotent subgroup $H$ of $G$, and the step $s$ of that subgroup).  However, a compactness argument (sketched in this previous blog post) shows that some such relationship must exist; indeed, if one has (1) for all $R_0 \leq R \leq C( R_0, d )$ for some sufficiently large $C(R_0,d)$, then one can ensure $H$ has index at most $C'(R_0,d)$ and step at most $C''(R_0,d)$ for some quantities $C'(R_0,d)$, $C''(R_0,d)$; thus Gromov’s theorem is inherently a “local” result which only requires one to multiply the generator set $S$ a finite number $C(R_0,d)$ of times before one sees the virtual nilpotency of the group.  However, the compactness argument does not give an explicit value to the quantities $C(R_0,d), C'(R_0,d), C''(R_0,d)$, and the nature of Gromov’s proof (using, in particular, the deep Montgomery-Zippin-Yamabe theory on Hilbert’s fifth problem) does not easily allow such an explicit value to be extracted.

Another point is that the original formulation of Gromov’s theorem required the polynomial bound (1) at all sufficiently large scales $R$.  A later proof of this theorem by van den Dries and Wilkie relaxed this hypothesis to requiring (1) just for infinitely many scales $R$; the later proof by Kleiner (which I blogged about here) also has this relaxed hypothesis.

Our main result reduces the hypothesis (1) to a single large scale, and makes most of the qualitative dependencies in the theorem quantitative:

Theorem 1. If (1) holds for some $R > \exp(\exp(C d^C))$ for some sufficiently large absolute constant $C$, then $G$ contains a finite index subgroup $H$ which is nilpotent of step at most $C^d$.

The argument does in principle provide a bound on the index of $H$ in $G$, but it is very poor (of Ackermann type).  If instead one is willing to relax “nilpotent” to “polycyclic“, the bounds on the index are somewhat better (of tower exponential type), though still far from ideal.

There is a related finitary analogue of Gromov’s theorem by Makarychev and Lee, which asserts that any finite group of uniformly polynomial growth has a subgroup with a large abelianisation.  The quantitative bounds in that result are quite strong, but on the other hand the hypothesis is also strong (it requires upper and lower bounds of the form (1) at all scales) and the conclusion is a bit weaker than virtual nilpotency.  The argument is based on a modification of Kleiner’s proof.

Our argument also proceeds by modifying Kleiner’s proof of Gromov’s theorem (a significant fraction of which was already quantitative), and carefully removing all of the steps which require one to take an asymptotic limit.  To ease this task, we look for the most elementary arguments available for each step of the proof (thus consciously avoiding powerful tools such as the Tits alternative).  A key technical issue is that because there is only a single scale $R$ for which one has polynomial growth, one has to work at scales significantly less than $R$ in order to have any chance of keeping control of the various groups and other objects being generated.

Below the fold, I discuss a stripped down version of Kleiner’s argument, and then how we convert it to a fully finitary argument.

I am posting the last two talks in my Clay-Mahler lecture series here:

[Update, Sep 14: Poincaré conjecture slides revised.]

[Update, Sep 18: Prime slides revised also.]

The celebrated Szemerédi-Trotter theorem gives a bound on the number of incidences between a finite set of points ${P}$ and a finite set of lines ${L}$ in the Euclidean plane ${{\bf R}^2}$, where the set of incidences is ${I(P,L) := \{ (p,\ell) \in P \times L: p \in \ell \}}$. Specifically, the bound is

$\displaystyle |I(P,L)| \ll |P|^{2/3} |L|^{2/3} + |P| + |L| \ \ \ \ \ (1)$

where we use the asymptotic notation ${X \ll Y}$ or ${X=O(Y)}$ to denote the statement that ${X \leq CY}$ for some absolute constant ${C}$. In particular, the number of incidences between ${n}$ points and ${n}$ lines is ${O(n^{4/3})}$. This bound is sharp; consider for instance the discrete box ${P := \{ (a,b) \in {\bf Z}^2: 1 \leq a \leq N; 1 \leq b \leq 2N^2 \}}$ with ${L}$ being the collection of lines ${\{ (x,mx+b): m, b \in {\bf Z}, 1 \leq m \leq N, 1 \leq b \leq N^2 \}}$. One easily verifies that ${|P|=2N^3}$, ${|L| = N^3}$, and ${|I(P,L)| = N^4}$, showing that (1) is essentially sharp in the case ${|P| \sim |L|}$; one can concoct similar examples for other regimes of ${|P|}$ and ${|L|}$.
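This sharpness computation is easy to verify by brute force for small ${N}$ (a quick check of the example above):

```python
N = 4
P = {(a, b) for a in range(1, N + 1) for b in range(1, 2 * N * N + 1)}
L = {(m, b) for m in range(1, N + 1) for b in range(1, N * N + 1)}
# count incidences: (a, c) lies on the line y = m x + b iff c = m a + b
I = sum(1 for (a, c) in P for (m, b) in L if c == m * a + b)
print(len(P), len(L), I)  # 2N^3 = 128, N^3 = 64, N^4 = 256
```

Each of the ${N^3}$ lines picks up exactly ${N}$ points of the box (one for each value of ${a}$), giving the ${N^4}$ incidences.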

On the other hand, if one replaces the Euclidean plane ${{\bf R}^2}$ by a finite field geometry ${F^2}$, where ${F}$ is a finite field, then the estimate (1) is false. For instance, if ${P}$ is the entire plane ${F^2}$, and ${L}$ is the set of all lines in ${F^2}$, then ${|P|, |L|}$ are both comparable to ${|F|^2}$, but ${|I(P,L)|}$ is comparable to ${|F|^3}$, thus violating (1) when ${|F|}$ is large. Thus any proof of the Szemerédi-Trotter theorem must use a special property of the Euclidean plane which is not enjoyed by finite field geometries. In particular, this strongly suggests that one cannot rely purely on algebra and combinatorics to prove (1); one must also use some Euclidean geometry or topology as well.

Nowadays, the slickest proof of the Szemerédi-Trotter theorem is via the crossing number inequality (as discussed in this previous post), which ultimately relies on Euler’s famous formula ${|V|-|E|+|F|=2}$; thus in this argument it is topology which is the feature of Euclidean space which one is exploiting, and which is not present in the finite field setting. Today, though, I would like to mention a different proof (closer in spirit to the original proof of Szemerédi-Trotter, and also a later argument of Clarkson et al.), based on the method of cell decomposition, which has proven to be a very flexible method in combinatorial incidence geometry. Here, the distinctive feature of Euclidean geometry one is exploiting is convexity, which again has no finite field analogue.

Roughly speaking, the idea is this. Using nothing more than the axiom that two points determine at most one line, one can obtain the bound

$\displaystyle |I(P,L)| \ll |P| |L|^{1/2} + |L|, \ \ \ \ \ (2)$

which is inferior to (1). (On the other hand, this estimate works in both Euclidean and finite field geometries, and is sharp in the latter case, as shown by the example given earlier.) Dually, the axiom that two lines determine at most one point gives the bound

$\displaystyle |I(P,L)| \ll |L| |P|^{1/2} + |P| \ \ \ \ \ (3)$

(or alternatively, one can use projective duality to interchange points and lines and deduce (3) from (2)).
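Both elementary bounds are easy to test numerically. The standard Cauchy-Schwarz proof of (2) in fact yields the clean form ${|I(P,L)| \leq |L|^{1/2} |P| + |L|}$ (and dually for (3)), so the assertions in the sketch below should hold deterministically on every instance, not just typically (random instances and the constant-${1}$ form are my own choices):

```python
import random

random.seed(1)
for trial in range(100):
    # random distinct points and random distinct (non-vertical) lines y = m x + b
    P = {(random.randrange(50), random.randrange(50)) for _ in range(60)}
    L = {(random.randrange(-5, 6), random.randrange(-50, 51)) for _ in range(60)}
    I = sum(1 for (x, y) in P for (m, b) in L if y == m * x + b)
    assert I <= len(L) ** 0.5 * len(P) + len(L)   # bound (2)
    assert I <= len(P) ** 0.5 * len(L) + len(P)   # bound (3)
```

Of course, passing such a test proves nothing, but it is a useful sanity check that the exponents in (2) and (3) are written down correctly.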

An inspection of the proof of (2) shows that it is only expected to be sharp when the bushes ${L_p := \{ \ell \in L: \ell \ni p \}}$ associated to each point ${p \in P}$ behave like “independent” subsets of ${L}$, so that there is no significant correlation between the bush ${L_p}$ of one point and the bush of another point ${L_q}$.

However, in Euclidean space, we have the phenomenon that the bush of a point ${L_p}$ is influenced by the region of space that ${p}$ lies in. Clearly, if ${p}$ lies in a set ${\Omega}$ (e.g. a convex polygon), then the only lines ${\ell \in L}$ that can contribute to ${L_p}$ are those lines which pass through ${\Omega}$. If ${\Omega}$ is a small convex region of space, one expects only a fraction of the lines in ${L}$ to actually pass through ${\Omega}$. As such, if ${p}$ and ${q}$ both lie in ${\Omega}$, then ${L_p}$ and ${L_q}$ are compressed inside a smaller subset of ${L}$, namely the set of lines passing through ${\Omega}$, and so should be more likely to intersect than if they were independent. This should lead to an improvement to (2) (and indeed, as we shall see below, ultimately leads to (1)).

More formally, the argument proceeds by applying the following lemma:

Lemma 1 (Cell decomposition) Let ${L}$ be a finite collection of lines in ${{\bf R}^2}$, let ${P}$ be a finite set of points, and let ${r \geq 1}$. Then it is possible to find a set ${R}$ of ${O(r)}$ lines in ${L}$, plus some additional open line segments not containing any point in ${P}$, which subdivide ${{\bf R}^2}$ into ${O(r^2)}$ convex regions (or cells), such that the interior of each such cell is incident to at most ${O(|L|/r)}$ lines.

The deduction of (1) from (2), (3) and Lemma 1 is very quick. Firstly we may assume we are in the range

$\displaystyle |L|^{1/2} \ll |P| \ll |L|^2 \ \ \ \ \ (4)$

otherwise the bound (1) follows already from either (2) or (3) and some high-school algebra.

Let ${r \geq 1}$ be a parameter to be optimised later. We apply the cell decomposition to subdivide ${{\bf R}^2}$ into ${O(r^2)}$ open convex regions, plus a family ${R}$ of ${O(r)}$ lines. Each of the ${O(r^2)}$ convex regions ${\Omega}$ has only ${O(|L|/r)}$ lines through it, and so by (2) contributes ${O( |P \cap \Omega| |L|^{1/2}/r^{1/2} + |L| / r )}$ incidences. Meanwhile, on each of the lines ${\ell}$ in ${R}$ used to perform this decomposition, there are at most ${|L|}$ transverse incidences (because each line in ${L}$ distinct from ${\ell}$ can intersect ${\ell}$ at most once), plus all the incidences along ${\ell}$ itself. Putting all this together, one obtains

$\displaystyle |I(P,L)| \leq |I(P,L \cap R)| + O( |P| |L|^{1/2}/r^{1/2} + |L| r).$

We optimise this by selecting ${r \sim |P|^{2/3} / |L|^{1/3}}$; from (4) we can ensure that ${r \leq |L|/2}$, so that ${|L \cap R| \leq |L|/2}$. One then obtains

$\displaystyle |I(P,L)| \leq |I(P,L \cap R)| + O( |P|^{2/3} |L|^{2/3} ).$

We can iterate away the ${L \cap R}$ error (halving the number of lines each time) and sum the resulting geometric series to obtain (1).

It remains to prove Lemma 1. If one subdivides ${{\bf R}^2}$ using ${r}$ arbitrary lines, one creates at most ${O(r^2)}$ cells (because each new line intersects each existing line at most once, and so can create at most ${O(r)}$ additional cells), and for a similar reason, every line in ${L}$ visits at most ${O(r)}$ of these regions, and so by double counting one expects ${O(|L|/r)}$ lines per cell “on the average”. The key difficulty is then to get ${O(|L|/r)}$ lines through every cell, not just on the average. It turns out that a probabilistic argument will almost work, but with a logarithmic loss (thus having ${O( |L| \log |L| / r )}$ lines per cell rather than ${O(|L|/r)}$); but with a little more work one can then iterate away this loss also. The arguments here are loosely based on those of Clarkson et al.; a related (deterministic) decomposition also appears in the original paper of Szemerédi and Trotter. But I wish to focus here on the probabilistic approach.

It is also worth noting that the original (somewhat complicated) argument of Szemerédi-Trotter has been adapted to establish the analogue of (1) in the complex plane ${{\bf C}^2}$ by Toth, while the other known proofs of Szemerédi-Trotter, so far, have not been able to be extended to this setting (the Euler characteristic argument clearly breaks down, as does any proof based on using lines to divide planes into half-spaces). So all three proofs have their advantages and disadvantages.

In the theory of discrete random matrices (e.g. matrices whose entries are random signs ${\pm 1}$), one often encounters the problem of understanding the distribution of the random variable ${\hbox{dist}(X,V)}$, where ${X = (x_1,\ldots,x_n) \in \{-1,+1\}^n}$ is an ${n}$-dimensional random sign vector (so ${X}$ is uniformly distributed in the discrete cube ${\{-1,+1\}^n}$), and ${V}$ is some ${d}$-dimensional subspace of ${{\bf R}^n}$ for some ${0 \leq d \leq n}$.

It is not hard to compute the second moment of this random variable. Indeed, if ${P = (p_{ij})_{1 \leq i,j \leq n}}$ denotes the orthogonal projection matrix from ${{\bf R}^n}$ to the orthogonal complement ${V^\perp}$ of ${V}$, then one observes that

$\displaystyle \hbox{dist}(X,V)^2 = X \cdot P X = \sum_{i=1}^n \sum_{j=1}^n x_i x_j p_{ij}$

and so upon taking expectations we see that

$\displaystyle {\Bbb E} \hbox{dist}(X,V)^2 = \sum_{i=1}^n p_{ii} = \hbox{tr} P = n-d \ \ \ \ \ (1)$

since ${P}$ is a rank ${n-d}$ orthogonal projection. So we expect ${\hbox{dist}(X,V)}$ to be about ${\sqrt{n-d}}$ on the average.
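The identity (1) is easy to confirm by simulation (my own sketch; the subspace here is a random one, chosen for illustration via a QR factorisation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 12
# a random d-dimensional subspace V of R^n, spanned by the columns of Q
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
# P = orthogonal projection onto the orthogonal complement of V
Pmat = np.eye(n) - Q @ Q.T

X = rng.choice([-1.0, 1.0], size=(200000, n))   # random sign vectors
dist2 = np.einsum("ij,jk,ik->i", X, Pmat, X)    # dist(X,V)^2 = X . P X
print(dist2.mean())                             # close to n - d = 18
print(np.sqrt(dist2).std())                     # O(1), independent of n
```

The sample mean of ${\hbox{dist}(X,V)^2}$ matches ${n-d}$, and the empirical standard deviation of ${\hbox{dist}(X,V)}$ is of size ${O(1)}$, which is the concentration phenomenon described next.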

In fact, one has sharp concentration around this value, in the sense that ${\hbox{dist}(X,V) = \sqrt{n-d}+O(1)}$ with high probability. More precisely, we have

Proposition 1 (Large deviation inequality) For any ${t>0}$, one has

$\displaystyle {\Bbb P}( |\hbox{dist}(X,V) - \sqrt{n-d}| \geq t ) \leq C \exp(- c t^2 )$

for some absolute constants ${C, c > 0}$.

In fact the constants ${C, c}$ are very civilised; for large ${t}$ one can basically take ${C=4}$ and ${c=1/16}$, for instance. This type of concentration, particularly for subspaces ${V}$ of moderately large codimension ${n-d}$, is fundamental to much of my work on random matrices with Van Vu, starting with our first paper (in which this proposition first appears). (For subspaces of small codimension (such as hyperplanes) one has to use other tools to get good results, such as inverse Littlewood-Offord theory or the Berry-Esséen central limit theorem, but that is another story.)

Proposition 1 is an easy consequence of the second moment computation and Talagrand’s inequality, which among other things provides a sharp concentration result for convex Lipschitz functions on the cube ${\{-1,+1\}^n}$; since ${\hbox{dist}(x,V)}$ is indeed a convex Lipschitz function, this inequality can be applied immediately. The proof of Talagrand’s inequality is short and can be found in several textbooks (e.g. Alon and Spencer), but I thought I would reproduce the argument here (specialised to the convex case), mostly to force myself to learn the proof properly. Note the concentration of ${O(1)}$ obtained by Talagrand’s inequality is much stronger than what one would get from more elementary tools such as Azuma’s inequality or McDiarmid’s inequality, which would only give concentration of about ${O(\sqrt{n})}$ or so (which is in fact trivial, since the cube ${\{-1,+1\}^n}$ has diameter ${2\sqrt{n}}$); the point is that Talagrand’s inequality is very effective at exploiting the convexity of the problem, as well as the Lipschitz nature of the function in all directions, whereas Azuma’s inequality can only easily take advantage of the Lipschitz nature of the function in coordinate directions. On the other hand, Azuma’s inequality works just as well if the ${\ell^2}$ metric is replaced with the larger ${\ell^1}$ metric, and one can conclude that the ${\ell^1}$ distance between ${X}$ and ${V}$ concentrates around its median to a width ${O(\sqrt{n})}$, which is a more non-trivial fact than the ${\ell^2}$ concentration bound given by that inequality. (The computation of the median of the ${\ell^1}$ distance is more complicated than for the ${\ell^2}$ distance, though, and depends on the orientation of ${V}$.)

Remark 1 If one makes the coordinates of ${X}$ iid Gaussian variables ${x_i \equiv N(0,1)}$ rather than random signs, then Proposition 1 is much easier to prove; the probability distribution of a Gaussian vector is rotation-invariant, so one can rotate ${V}$ to be, say, ${{\bf R}^d}$, at which point ${\hbox{dist}(X,V)^2}$ is clearly the sum of ${n-d}$ independent squares of Gaussians (i.e. a chi-square distribution), and the claim follows from direct computation (or one can use the Chernoff inequality). The Gaussian counterpart of Talagrand’s inequality is more classical, being essentially due to Lévy, and will also be discussed later in this post.

A fundamental characteristic of many mathematical spaces (e.g. vector spaces, metric spaces, topological spaces, etc.) is their dimension, which measures the “complexity” or “degrees of freedom” inherent in the space. There is no single notion of dimension; instead, there are a variety of different versions of this concept, with different versions being suitable for different classes of mathematical spaces. Typically, a single mathematical object may have several subtly different notions of dimension that one can place on it, which will be related to each other, and which will often agree with each other in “non-pathological” cases, but can also deviate from each other in many other situations. For instance:

• One can define the dimension of a space ${X}$ by seeing how it compares to some standard reference spaces, such as ${{\bf R}^n}$ or ${{\bf C}^n}$; one may view a space as having dimension ${n}$ if it can be (locally or globally) identified with a standard ${n}$-dimensional space. The dimension of a vector space or a manifold can be defined in this fashion.
• Another way to define dimension of a space ${X}$ is as the largest number of “independent” objects one can place inside that space; this can be used to give an alternate notion of dimension for a vector space, or of an algebraic variety, as well as the closely related notion of the transcendence degree of a field. The concept of VC dimension in machine learning also broadly falls into this category.
• One can also try to define dimension inductively, for instance declaring a space ${X}$ to be ${n}$-dimensional if it can be “separated” somehow by an ${n-1}$-dimensional object; thus an ${n}$-dimensional object will tend to have “maximal chains” of sub-objects of length ${n}$ (or ${n+1}$, depending on how one initialises the chain and how one defines length). This can give a notion of dimension for a topological space or a commutative ring.

The notions of dimension as defined above tend to necessarily take values in the natural numbers (or the cardinal numbers); there is no such space as ${{\bf R}^{\sqrt{2}}}$, for instance, nor can one talk about a basis consisting of ${\pi}$ linearly independent elements, or a chain of maximal ideals of length ${e}$. There is however a somewhat different approach to the concept of dimension which makes no distinction between integer and non-integer dimensions, and is suitable for studying “rough” sets such as fractals. The starting point is to observe that in the ${d}$-dimensional space ${{\bf R}^d}$, the volume ${V}$ of a ball of radius ${R}$ grows like ${R^d}$, thus giving the following heuristic relationship

$\displaystyle \frac{\log V}{\log R} \approx d \ \ \ \ \ (1)$

between volume, scale, and dimension. Formalising this heuristic leads to a number of useful notions of dimension for subsets of ${{\bf R}^n}$ (or more generally, for metric spaces), including (upper and lower) Minkowski dimension (also known as box-packing dimension or Minkowski-Bouligand dimension), and Hausdorff dimension.
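One can see the heuristic (1) in action numerically. The sketch below (a toy computation, using integer arithmetic to avoid rounding issues in base ${3}$) counts the boxes of side ${\delta = 3^{-k}}$ needed to cover the middle-thirds Cantor set, and recovers its dimension ${\log 2/\log 3 \approx 0.6309}$:

```python
import math

def cantor_codes(level):
    # Left endpoints of the 2^level intervals in the level-th stage of the
    # middle-thirds Cantor construction, scaled by 3^level to keep them integers.
    codes = [0]
    for i in range(level):
        codes = codes + [c + 2 * 3 ** i for c in codes]
    return codes

def box_count(codes, level, k):
    # Number of boxes of side 3^{-k} needed to cover the (scaled) point set.
    return len({c // 3 ** (level - k) for c in codes})

level = 12
codes = cantor_codes(level)
for k in range(2, 9):
    N = box_count(codes, level, k)
    # log N / log(1/delta) with delta = 3^{-k}
    print(k, N, math.log(N) / (k * math.log(3)))  # approaches log 2 / log 3
```

Here ${N = 2^k}$ boxes are needed at scale ${3^{-k}}$, so the ratio ${\log N/\log(1/\delta)}$ equals ${\log 2/\log 3}$ exactly at every scale, reflecting the exact self-similarity of the Cantor set.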

[In ${K}$-theory, it is also convenient to work with “virtual” vector spaces or vector bundles, such as formal differences of such spaces, which may therefore have a negative dimension; but as far as I am aware there is no connection between this notion of dimension and the metric ones given here.]

Minkowski dimension can either be defined externally (relating the external volume of ${\delta}$-neighbourhoods of a set ${E}$ to the scale ${\delta}$) or internally (relating the internal ${\delta}$-entropy of ${E}$ to the scale). Hausdorff dimension is defined internally by first introducing the ${d}$-dimensional Hausdorff measure of a set ${E}$ for any parameter ${0 \leq d < \infty}$, which generalises the familiar notions of length, area, and volume to non-integer dimensions, or to rough sets, and is of interest in its own right. Hausdorff dimension has a lengthier definition than its Minkowski counterpart, but is more robust with respect to operations such as countable unions, and is generally accepted as the “standard” notion of dimension in metric spaces. We will compare these concepts against each other later in these notes.
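Since the two notions are easy to conflate, it may help to record the definitions side by side (in one common normalisation; conventions differ by harmless constant factors, e.g. one can cover by balls or by arbitrary sets of small diameter, without affecting the dimension). Writing ${{\mathcal N}_\delta(E)}$ for the least number of ${\delta}$-balls needed to cover ${E}$, the upper and lower Minkowski dimensions are

$\displaystyle \overline{\dim}_M(E) := \limsup_{\delta \rightarrow 0} \frac{\log {\mathcal N}_\delta(E)}{\log 1/\delta}, \qquad \underline{\dim}_M(E) := \liminf_{\delta \rightarrow 0} \frac{\log {\mathcal N}_\delta(E)}{\log 1/\delta},$

while the ${d}$-dimensional Hausdorff measure and the Hausdorff dimension are

$\displaystyle {\mathcal H}^d(E) := \lim_{\delta \rightarrow 0} \inf \Big\{ \sum_{i=1}^\infty r_i^d : E \subset \bigcup_{i=1}^\infty B(x_i,r_i); \ r_i \leq \delta \Big\}, \qquad \dim_H(E) := \inf \{ d \geq 0 : {\mathcal H}^d(E) = 0 \}.$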

One use of the notion of dimension is to create finer distinctions between various types of “small” subsets of spaces such as ${{\bf R}^n}$, beyond what can be achieved by the usual Lebesgue measure (or Baire category). For instance, a point, line, and plane in ${{\bf R}^3}$ all have zero measure with respect to three-dimensional Lebesgue measure (and are nowhere dense), but of course have different dimensions (${0}$, ${1}$, and ${2}$ respectively). (The Kakeya set conjecture, discussed recently on this blog, offers another good example.) This can be used to clarify the nature of various singularities, such as that arising from non-smooth solutions to PDE; a function which is non-smooth on a set of large Hausdorff dimension can be considered less smooth than one which is non-smooth on a set of small Hausdorff dimension, even if both are smooth almost everywhere. While many properties of the singular set of such a function are worth studying (e.g. their rectifiability), understanding their dimension is often an important starting point. The interplay between these types of concepts is the subject of geometric measure theory.

The notion of what it means for a subset E of a space X to be “small” varies from context to context.  For instance, in measure theory, when $X = (X, {\mathcal X}, \mu)$ is a measure space, one useful notion of a “small” set is that of a null set: a set E of measure zero (or at least contained in a set of measure zero).  By countable additivity, countable unions of null sets are null.  Taking contrapositives, we obtain

Lemma 1. (Pigeonhole principle for measure spaces) Let $E_1, E_2, \ldots$ be an at most countable sequence of measurable subsets of a measure space X.  If $\bigcup_n E_n$ has positive measure, then at least one of the $E_n$ has positive measure.

Now suppose that X is a Euclidean space ${\Bbb R}^d$ with Lebesgue measure m.  The Lebesgue differentiation theorem easily implies that having positive measure is equivalent to being “dense” in certain balls:

Proposition 1. Let $E$ be a measurable subset of ${\Bbb R}^d$.  Then the following are equivalent:

1. E has positive measure.
2. For any $\varepsilon > 0$, there exists a ball B such that $m( E \cap B ) \geq (1-\varepsilon) m(B)$.

Thus one can think of a null set as a set which is “nowhere dense” in some measure-theoretic sense.
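Proposition 1 can be illustrated numerically with a “fat Cantor set” (the Smith-Volterra-Cantor set), which is nowhere dense in the topological sense and yet has measure ${1/2}$; Proposition 1 then guarantees balls in which its density is as close to ${1}$ as one pleases. The sketch below (with an arbitrary truncation depth) approximates the set by a finite union of intervals and measures its density in small balls around the point ${0}$:

```python
def svc_intervals(depth):
    # Stage-n construction of the Smith-Volterra-Cantor ("fat Cantor") set:
    # at stage n, remove an open middle interval of length 4^{-n} from each
    # remaining interval; the limit set has Lebesgue measure 1/2.
    ivs = [(0.0, 1.0)]
    for n in range(1, depth + 1):
        gap = 4.0 ** (-n)
        ivs = [piece for (a, b) in ivs
               for piece in ((a, (a + b - gap) / 2), ((a + b + gap) / 2, b))]
    return ivs

def measure_in(ivs, lo, hi):
    # Lebesgue measure of the interval union intersected with [lo, hi].
    return sum(max(0.0, min(b, hi) - max(a, lo)) for a, b in ivs)

depth = 14
ivs = svc_intervals(depth)            # 2**14 intervals approximating the set
print(measure_in(ivs, 0.0, 1.0))      # total measure, close to 1/2

# Density in the balls [0, r], with r the length of the first interval at
# stage k: this tends to 1 as k grows, even though the set is nowhere dense.
l = 1.0
for k in range(1, depth + 1):
    l = (l - 4.0 ** (-k)) / 2
    if k in (2, 6, 10):
        print(k, measure_in(ivs, 0.0, l) / l)
```

So measure-theoretic “denseness” and topological denseness really are different notions: the same set can be as dense as one likes in the sense of Proposition 1 while containing no interval at all.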

It turns out that there are analogues of these results when the measure space $X = (X, {\mathcal X}, \mu)$  is replaced instead by a complete metric space $X = (X,d)$.  Here, the appropriate notion of a “small” set is not a null set, but rather that of a nowhere dense set: a set E which is not dense in any ball, or equivalently a set whose closure has empty interior.  (A good example of a nowhere dense set would be a proper subspace, or smooth submanifold, of ${\Bbb R}^d$, or a Cantor set; on the other hand, the rationals are a dense subset of ${\Bbb R}$ and thus clearly not nowhere dense.)   We then have the following important result:

Theorem 1. (Baire category theorem). Let $E_1, E_2, \ldots$ be an at most countable sequence of subsets of a complete metric space X.  If $\bigcup_n E_n$ contains a ball B, then at least one of the $E_n$ is dense in a sub-ball B’ of B (and in particular is not nowhere dense).  To put it in the contrapositive: the countable union of nowhere dense sets cannot contain a ball.

Exercise 1. Show that the Baire category theorem is equivalent to the claim that in a complete metric space, the countable intersection of open dense sets remains dense.  $\diamond$

Exercise 2. Using the Baire category theorem, show that any non-empty complete metric space without isolated points is uncountable.  (In particular, this shows that Baire category theorem can fail for incomplete metric spaces such as the rationals ${\Bbb Q}$.)  $\diamond$

To quickly illustrate an application of the Baire category theorem, observe that it implies that one cannot cover a finite-dimensional real or complex vector space ${\Bbb R}^n, {\Bbb C}^n$ by a countable number of proper subspaces.  One can of course also establish this fact by using Lebesgue measure on this space.  However, the advantage of the Baire category approach is that it also works well in infinite dimensional complete normed vector spaces, i.e. Banach spaces, whereas the measure-theoretic approach runs into significant difficulties in infinite dimensions.  This leads to three fundamental equivalences between the qualitative theory of continuous linear operators on Banach spaces (e.g. finiteness, surjectivity, etc.) and the quantitative theory (i.e. estimates):

1. The uniform boundedness principle, that equates the qualitative boundedness (or convergence) of a family of continuous operators with their quantitative boundedness.
2. The open mapping theorem, that equates the qualitative solvability of a linear problem Lu = f with the quantitative solvability.
3. The closed graph theorem, that equates the qualitative regularity of a (weakly continuous) operator T with the quantitative regularity of that operator.

Strictly speaking, these theorems are not used much directly in practice, because one usually works in the reverse direction (i.e. first proving quantitative bounds, and then deriving qualitative corollaries); but the above three theorems help explain why we usually approach qualitative problems in functional analysis via their quantitative counterparts.
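Returning to the assertion that countably many proper subspaces cannot cover ${\Bbb R}^n$ or ${\Bbb C}^n$, the point is that a proper closed subspace ${V}$ (in finite dimensions, every subspace is automatically closed) has empty interior: if ${V}$ contained a ball ${B(x,r)}$, then for any nonzero vector ${y}$ one would have

$\displaystyle x + \frac{r}{2} \frac{y}{\|y\|} \in B(x,r) \subset V \quad \Longrightarrow \quad y = \frac{2\|y\|}{r} \Big( \big(x + \frac{r}{2} \frac{y}{\|y\|}\big) - x \Big) \in V,$

forcing ${V}$ to be all of the space. Thus each proper closed subspace is nowhere dense, and Theorem 1 prevents a countable union of them from containing any ball, let alone covering the whole space; the same argument applies verbatim to proper closed subspaces of a Banach space.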

To progress further in our study of function spaces, we will need to develop the standard theory of metric spaces, and of the closely related theory of topological spaces (i.e. point-set topology).  I will be assuming that students in my class will already have encountered these concepts in an undergraduate topology or real analysis course, but for the sake of completeness I will briefly review the basics of both types of spaces here.