You are currently browsing the monthly archive for September 2010.

Gil Kalai has officially started the Polymath3 project (Polynomial Hirsch conjecture) with a research thread at his blog.

The original aim of this project is to prove the polynomial Hirsch conjecture, which is a conjecture in the combinatorial geometry of polytopes.  However, there is a reduction due to Eisenbrand, Hahnle, Razborov, and Rothvoss that would deduce this conjecture from a purely combinatorial conjecture, which can be stated as follows.

Combinatorial polynomial Hirsch conjecture. Let {\mathcal F}_1,\ldots,{\mathcal F}_t be non-empty collections of subsets of \{1,\ldots,n\} with the following properties:

  1. (Disjointness) {\mathcal F}_i \cap {\mathcal F}_j = \emptyset for every i \neq j.
  2. (Connectedness)  If 1 \leq i < j < k \leq t and A \in {\mathcal F}_i, B \in {\mathcal F}_ k, there exists C \in {\mathcal F}_j such that A \cap B \subset C.

Then t is of polynomial size in n (i.e. t = O(n^{O(1)})).

For instance, when n=3, one can obtain such a family with t=6, e.g.

{\mathcal F}_1 = \{ \emptyset\}, {\mathcal F}_2 = \{ \{1\}\}, {\mathcal F}_3 = \{\{1,2\}\}, {\mathcal F}_4 = \{\{1,2,3\}\},

{\mathcal F}_5 = \{\{2,3\}\}, {\mathcal F}_6 = \{\{3\}\};

one can show that this is the best possible value of t for this choice of n.  The best possible value of t for n=4 is still not worked out; it is between 8 and 11.

One appealing thing about this problem is that there is a simple elementary argument that gives the bound t \leq n^{\log_2 n + 1} for all n \geq 2; and so in some sense one is “only a logarithm away” from proving the conjecture.  Anyway, the project is just starting, and does not require any particularly specialised background, so anyone who may be interested in this problem one may want to take a look at the research thread.

Thus far, we have only focused on measure and integration theory in the context of Euclidean spaces {{\bf R}^d}. Now, we will work in a more abstract and general setting, in which the Euclidean space {{\bf R}^d} is replaced by a more general space {X}.

It turns out that in order to properly define measure and integration on a general space {X}, it is not enough to just specify the set {X}. One also needs to specify two additional pieces of data:

  1. A collection {{\mathcal B}} of subsets of {X} that one is allowed to measure; and
  2. The measure {\mu(E) \in [0,+\infty]} one assigns to each measurable set {E \in {\mathcal B}}.

For instance, Lebesgue measure theory covers the case when {X} is a Euclidean space {{\bf R}^d}, {{\mathcal B}} is the collection {{\mathcal B} = {\mathcal L}[{\bf R}^d]} of all Lebesgue measurable subsets of {{\bf R}^d}, and {\mu(E)} is the Lebesgue measure {\mu(E)=m(E)} of {E}.

The collection {{\mathcal B}} has to obey a number of axioms (e.g. being closed with respect to countable unions) that make it a {\sigma}-algebra, which is a stronger variant of the more well-known concept of a boolean algebra. Similarly, the measure {\mu} has to obey a number of axioms (most notably, a countable additivity axiom) in order to obtain a measure and integration theory comparable to the Lebesgue theory on Euclidean spaces. When all these axioms are satisfied, the triple {(X, {\mathcal B}, \mu)} is known as a measure space. These play much the same role in abstract measure theory that metric spaces or topological spaces play in abstract point-set topology, or that vector spaces play in abstract linear algebra.

On any measure space, one can set up the unsigned and absolutely convergent integrals in almost exactly the same way as was done in the previous notes for the Lebesgue integral on Euclidean spaces, although the approximation theorems are largely unavailable at this level of generality due to the lack of such concepts as “elementary set” or “continuous function” for an abstract measure space. On the other hand, one does have the fundamental convergence theorems for the subject, namely Fatou’s lemma, the monotone convergence theorem and the dominated convergence theorem, and we present these results here.

One question that will not be addressed much in this current set of notes is how one actually constructs interesting examples of measures. We will discuss this issue more in later notes (although one of the most powerful tools for such constructions, namely the Riesz representation theorem, will not be covered until 245B).

Read the rest of this entry »

Ben Green, Tamar Ziegler, and I have just uploaded to the arXiv our paper “An inverse theorem for the Gowers U^{s+1}[N] norm“, which was previously announced on this blog.  We are still planning one final round of reviewing the preprint before submitting the paper, but it has gotten to the stage where we are comfortable with having the paper available on the arXiv.

The main result of the paper is to establish the inverse conjecture for the Gowers norm over the integers, which has a number of applications, in particular to counting solutions to various linear equations in primes.  In spirit, the proof of the paper follows the 21-page announcement that was uploaded previously.  However, for various rather annoying technical reasons, the 117-page paper has to devote a large amount of space to setting up various bits of auxiliary machinery (as well as a dozen or so pages worth of examples and discussion).  For instance, the announcement motivates many of the steps of the argument by heuristically identifying nilsequences n \mapsto F(g(n) \Gamma) with bracket polynomial phases such as n \mapsto e( \{ \alpha n \} \beta n ).  However, a rather significant amount of theory (which was already worked out to a large extent by Leibman) is needed to formalise the “bracket algebra” needed to manipulate such bracket polynomials and to connect them with nilsequences.  Furthermore, the “piecewise smooth” nature of bracket polynomials causes some technical issues with the equidistribution theory for these sequences.  Our original version of the paper (which was even longer than the current version) set out this theory.  But we eventually decided that it was best to eschew almost all use of bracket polynomials (except as motivation and examples), and run the argument almost entirely within the language of nilsequences, to keep the argument a bit more notationally focused (and to make the equidistribution theory easier to establish).  But this was not without a tradeoff; some statements that are almost trivially true for bracket polynomials, required some “nilpotent algebra” to convert to the language of nilsequences.  Here are some examples of this:

  1. It is intuitively clear that a bracket polynomial phase e(P(n)) of degree k in one variable n can be “multilinearised” to a polynomial e(Q(n_1,\ldots,n_k)) of multi-degree (1,\ldots,1) in k variables n_1,\ldots,n_k, such that e(P(n)) and e(Q(n,\ldots,n)) agree modulo lower order terms.  For instance, if e(P(n)) = e(\alpha n \{ \beta n \{ \gamma n \} \}) (so k=3), then one could take e(Q(n_1,n_2,n_3)) = e( \alpha n_1 \{ \beta n_2 \{ \gamma n_3 \} \}).   The analogue of this statement for nilsequences is true, but required a moderately complicated nilpotent algebra construction using the Baker-Campbell-Hausdorff formula.
  2. Suppose one has a bracket polynomial phase e(P_h(n)) of degree k in one variable n that depends on an additional parameter h, in such a way that exactly one of the coefficients in each monomial depends on h.  Furthermore, suppose this dependence is bracket linear in h.  Then it is intuitively clear that this phase can be rewritten (modulo lower order terms) as e( Q(h,n) ) where Q is a bracket polynomial of multidegree (1,k) in (h,n).  For instance, if e(P_h(n)) = e( \{ \alpha_h n \} \beta n ) and \alpha_h = \{\gamma h \} \delta, then we can take e(Q(h,n)) = e(\{ \{\gamma h\} \delta n\} \beta n ).  The nilpotent algebra analogue of this claim is true, but requires another moderately complicated nilpotent algebra construction based on semi-direct products.
  3. A bracket polynomial has a fairly visible concept of a “degree” (analogous to the corresponding notion for true polynomials), as well as a “rank” (which, roughly speaking measures the number of parentheses in the bracket monomials, plus one).  Thus, for instance, the bracket monomial \{\{ \alpha n^4 \} \beta n \} \gamma n^2 has degree 7 and rank 3.  Defining degree and rank for nilsequences requires one to generalise the notion of a (filtered) nilmanifold to one in which the lower central series is replaced by a filtration indexed by both the degree and the rank.

There are various other tradeoffs of this type in this paper.  For instance, nonstandard analysis tools were introduced to eliminate what would otherwise be quite a large number of epsilons and regularity lemmas to manage, at the cost of some notational overhead; and the piecewise discontinuities mentioned earlier were eliminated by the use of vector-valued nilsequences, though this again caused some further notational overhead.    These difficulties may be a sign that we do not yet have the “right” proof of this conjecture, but one will probably have to wait a few years before we get a proper amount of perspective and understanding on this circle of ideas and results.

I’ve uploaded to the arXiv the polymath research paper “Deterministic methods to find primes“, which is the outcome of the Polymath4 collaborative mathematics project, and has been submitted to Mathematics of Computation.

The objective of this paper was to find fast deterministic algorithms to solve the following problem:

Given a (large) integer x, find a prime p larger than x.

Thanks to the AKS algorithm, a number of size O(x) can be deterministically tested for primality in time O( \log^{O(1)} x ) = O( x^{o(1)} ).   By Bertrand’s postulate, there is always at least one prime between x and 2x; by testing each one of these integers in turn for primality, one can thus obtain a deterministic algorithm to find primes in time O( x^{1+o(1)}).

But one should be able to do much better.  For comparison, if probabilistic algorithms are allowed, then by randomly selecting integers between x and 2x to test for primality, it is easy to see from the prime number theorem that one will succeed in obtaining a prime with high probability in time O( \log^{O(1)} x ) = O( x^{o(1)} ).  However, after some effort we were not able to “derandomise” this algorithm to create any reasonable deterministic counterpart.  Nevertheless, we conjecture that a deterministic algorithm with run time O( x^{o(1)}) exists.  Such algorithms can be easily obtained if one assumes some standard conjectures regarding the primes (e.g. Cramer’s conjecture on prime gaps), but we do not know of any deterministic algorithms which can be unconditionally proved to run in time O( x^{o(1)}).

Currently, the best known deterministic algorithm is due to Lagarias and Odlyzko, and has a run time of O( x^{1/2+o(1)}).  Roughly speaking, it is based on the ability to compute the prime counting function \pi(x) in time O( x^{1/2+o(1)} ); once one has this function, one can detect which intervals contain primes or not, and then starting from Bertrand’s postulate and performing a binary search one can then locate a prime.  The Lagarias-Odlyzko argument is based on approximating \pi(x) by a certain integral of the Riemann zeta function, which one then approximates in turn by quadrature.

We conjecture that one should be able to compute \pi(x) in faster time, and in particular in time O(x^{1/2-c+o(1)}) for some c>0.  Unfortunately, we were not able to achieve this; however, we do have a non-trivial method to compute the parity \pi(x) \hbox{ mod } 2 of \pi(x) in such a time; a bit more generally (and oversimplifying a little bit), we can compute various projections \sum_{p \leq x} t^p \hbox{ mod } 2, g(t) of the prime polynomial \sum_{p \leq x} t^p modulo some small polynomials g.  This seems close to being able to achieve the goal of detecting whether primes exist in a given interval, but unfortunately some additional work seems to be needed in that area.  Still, the methods used to get the parity seem to be interesting (involving the Dirichlet hyperbola method, a piecewise approximation of the hyperbola, and the Strassen fast multiplication algorithm), and these techniques might be useful for some other problems.

Roughly speaking, the idea to compute the parity of \pi(x) \hbox{ mod } 2 is as follows.  The first observation is that, for square-free n, the number \tau(n) of divisors of n is equal to 2 when n is a prime, and a multiple of 4 otherwise.  So to compute the parity of \pi(x), it suffices to compute \sum_{n \leq x} \tau(n) modulo 4 (or more precisely, the restriction of this sum to squarefree n).

The restriction to squarefree numbers turns out to be relatively easy to handle, so we ignore it.  Since \tau(n) = \sum_{a,b: ab=n} 1, we can rewrite

\sum_{n \leq x} \tau(n) = \sum_{a,b: ab \leq x} 1.

So it suffices to find an efficient way to count the number of lattice points below the hyperbola \{ (a,b): ab = x \}.

The classic way to do this is via the Dirichlet hyperbola method.  This lets one rewrite the above expression as

2 \sum_{a \leq \sqrt{x}} \lfloor \frac{x}{a} \rfloor - \lfloor \sqrt{x}\rfloor^2

(assuming x is not a perfect square, where \lfloor x \rfloor is the greatest integer function).  One can then compute this quantity in time O(x^{1/2+o(1)}) without difficulty.

To improve this to O(x^{1/2-c+o(1)}), we used some ideas of Vinogradov, based on using Diophantine approximation to find moderately long arithmetic progressions on which the map a \mapsto \lfloor \frac{x}{a} \rfloor is linear, and thus easily summable.

The same method extends to the polynomial setting, though now, instead of summing an arithmetic progression, one has to find an efficient way to sum quadratic sums such as \sum_{n=1}^N t^{an^2+bn+c}.  This turns out to be possible by expressing this sum as a matrix product and then using fast matrix multiplication algorithms.

There is still some scope to improve the results and methods further; Ernie Croot is pursuing this with two undergraduate students, David Hollis and David Lowry,

In the previous notes, we defined the Lebesgue measure {m(E)} of a Lebesgue measurable set {E \subset {\bf R}^d}, and set out the basic properties of this measure. In this set of notes, we use Lebesgue measure to define the Lebesgue integral

\displaystyle \int_{{\bf R}^d} f(x)\ dx

of functions {f: {\bf R}^d \rightarrow {\bf C} \cup \{\infty\}}. Just as not every set can be measured by Lebesgue measure, not every function can be integrated by the Lebesgue integral; the function will need to be Lebesgue measurable. Furthermore, the function will either need to be unsigned (taking values on {[0,+\infty]}), or absolutely integrable.

To motivate the Lebesgue integral, let us first briefly review two simpler integration concepts. The first is that of an infinite summation

\displaystyle \sum_{n=1}^\infty c_n

of a sequence of numbers {c_n}, which can be viewed as a discrete analogue of the Lebesgue integral. Actually, there are two overlapping, but different, notions of summation that we wish to recall here. The first is that of the unsigned infinite sum, when the {c_n} lie in the extended non-negative real axis {[0,+\infty]}. In this case, the infinite sum can be defined as the limit of the partial sums

\displaystyle \sum_{n=1}^\infty c_n = \lim_{N \rightarrow \infty} \sum_{n=1}^N c_n \ \ \ \ \ (1)


or equivalently as a supremum of arbitrary finite partial sums:

\displaystyle \sum_{n=1}^\infty c_n = \sup_{A \subset {\bf N}, A \hbox{ finite}} \sum_{n \in A} c_n. \ \ \ \ \ (2)


The unsigned infinite sum {\sum_{n=1}^\infty c_n} always exists, but its value may be infinite, even when each term is individually finite (consider e.g. {\sum_{n=1}^\infty 1}).

The second notion of a summation is the absolutely summable infinite sum, in which the {c_n} lie in the complex plane {{\bf C}} and obey the absolute summability condition

\displaystyle \sum_{n=1}^\infty |c_n| < \infty,

where the left-hand side is of course an unsigned infinite sum. When this occurs, one can show that the partial sums {\sum_{n=1}^N c_n} converge to a limit, and we can then define the infinite sum by the same formula (1) as in the unsigned case, though now the sum takes values in {{\bf C}} rather than {[0,+\infty]}. The absolute summability condition confers a number of useful properties that are not obeyed by sums that are merely conditionally convergent; most notably, the value of an absolutely convergent sum is unchanged if one rearranges the terms in the series in an arbitrary fashion. Note also that the absolutely summable infinite sums can be defined in terms of the unsigned infinite sums by taking advantage of the formulae

\displaystyle \sum_{n=1}^\infty c_n = (\sum_{n=1}^\infty \hbox{Re}(c_n)) + i (\sum_{n=1}^\infty \hbox{Im}(c_n))

for complex absolutely summable {c_n}, and

\displaystyle \sum_{n=1}^\infty c_n = \sum_{n=1}^\infty c_n^+ - \sum_{n=1}^\infty c_n^-

for real absolutely summable {c_n}, where {c_n^+ := \max(c_n,0)} and {c_n^- := \max(-c_n,0)} are the (magnitudes of the) positive and negative parts of {c_n}.

In an analogous spirit, we will first define an unsigned Lebesgue integral {\int_{{\bf R}^d} f(x)\ dx} of (measurable) unsigned functions {f: {\bf R}^d \rightarrow [0,+\infty]}, and then use that to define the absolutely convergent Lebesgue integral {\int_{{\bf R}^d} f(x)\ dx} of absolutely integrable functions {f: {\bf R}^d \rightarrow {\bf C} \cup \{\infty\}}. (In contrast to absolutely summable series, which cannot have any infinite terms, absolutely integrable functions will be allowed to occasionally become infinite. However, as we will see, this can only happen on a set of Lebesgue measure zero.)

To define the unsigned Lebesgue integral, we now turn to another more basic notion of integration, namely the Riemann integral {\int_a^b f(x)\ dx} of a Riemann integrable function {f: [a,b] \rightarrow {\bf R}}. Recall from the prologue that this integral is equal to the lower Darboux integral

\displaystyle \int_a^b f(x) = \underline{\int_a^b} f(x)\ dx := \sup_{g \leq f; g \hbox{ piecewise constant}} \hbox{p.c.} \int_a^b g(x)\ dx.

(It is also equal to the upper Darboux integral; but much as the theory of Lebesgue measure is easiest to define by relying solely on outer measure and not on inner measure, the theory of the unsigned Lebesgue integral is easiest to define by relying solely on lower integrals rather than upper ones; the upper integral is somewhat problematic when dealing with “improper” integrals of functions that are unbounded or are supported on sets of infinite measure.) Compare this formula also with (2). The integral {\hbox{p.c.} \int_a^b g(x)\ dx} is a piecewise constant integral, formed by breaking up the piecewise constant functions {g, h} into finite linear combinations of indicator functions of intervals, and then measuring the length of each interval.

It turns out that virtually the same definition allows us to define a lower Lebesgue integral {\underline{\int_{{\bf R}^d}} f(x)\ dx} of any unsigned function {f: {\bf R}^d \rightarrow [0,+\infty]}, simply by replacing intervals with the more general class of Lebesgue measurable sets (and thus replacing piecewise constant functions with the more general class of simple functions). If the function is Lebesgue measurable (a concept that we will define presently), then we refer to the lower Lebesgue integral simply as the Lebesgue integral. As we shall see, it obeys all the basic properties one expects of an integral, such as monotonicity and additivity; in subsequent notes we will also see that it behaves quite well with respect to limits, as we shall see by establishing the two basic convergence theorems of the unsigned Lebesgue integral, namely Fatou’s lemma and the monotone convergence theorem.

Once we have the theory of the unsigned Lebesgue integral, we will then be able to define the absolutely convergent Lebesgue integral, similarly to how the absolutely convergent infinite sum can be defined using the unsigned infinite sum. This integral also obeys all the basic properties one expects, such as linearity and compatibility with the more classical Riemann integral; in subsequent notes we will see that it also obeys a fundamentally important convergence theorem, the dominated convergence theorem. This convergence theorem makes the Lebesgue integral (and its abstract generalisations to other measure spaces than {{\bf R}^d}) particularly suitable for analysis, as well as allied fields that rely heavily on limits of functions, such as PDE, probability, and ergodic theory.

Remark 1 This is not the only route to setting up the unsigned and absolutely convergent Lebesgue integrals. Stein-Shakarchi, for instance, proceeds slightly differently, beginning with the unsigned integral but then making an auxiliary stop at integration of functions that are bounded and are supported on a set of finite measure, before going to the absolutely convergent Lebesgue integral. Another approach (which will not be discussed here) is to take the metric completion of the Riemann integral with respect to the {L^1} metric.

The Lebesgue integral and Lebesgue measure can be viewed as completions of the Riemann integral and Jordan measure respectively. This means three things. Firstly, the Lebesgue theory extends the Riemann theory: every Jordan measurable set is Lebesgue measurable, and every Riemann integrable function is Lebesgue measurable, with the measures and integrals from the two theories being compatible. Conversely, the Lebesgue theory can be approximated by the Riemann theory; as we saw in the previous notes, every Lebesgue measurable set can be approximated (in various senses) by simpler sets, such as open sets or elementary sets, and in a similar fashion, Lebesgue measurable functions can be approximated by nicer functions, such as Riemann integrable or continuous functions. Finally, the Lebesgue theory is complete in various ways; we will formalise this properly only in the next quarter when we study {L^p} spaces, but the convergence theorems mentioned above already hint at this completeness. A related fact, known as Egorov’s theorem, asserts that a pointwise converging sequence of functions can be approximated as a (locally) uniformly converging sequence of functions. The facts listed here manifestations of Littlewood’s three principles of real analysis, which capture much of the essence of the Lebesgue theory.

Read the rest of this entry »

I’ve spent the last week or so reworking the first draft of my universality article for Mathematics Awareness Month, in view of the useful comments and feedback received on that draft here on this blog, as well as elsewhere.  In fact, I ended up rewriting the article from scratch, and expanding it substantially, in order to focus on a more engaging and less technical narrative.  I found that I had to use a substantially different mindset than the one I am used to having for technical expository writing; indeed, the exercise reminded me more of my high school English assignments than of my professional work.  (This is perhaps a bad sign: English was not exactly my strongest subject as a student.)

The piece now has title: “E pluribus unum: from complexity, universality”.  This is a somewhat US-centric piece of wordplay, but Mathematics Awareness Month is, after all, a US-based initiative, even though awareness of mathematics certainly transcends national boundaries.   Still, it is a trivial matter to modify the title later if a better proposal arises, and I am sure that if I send this text to be published, that the editors may have some suggestions in this regard.

By coincidence, I moved up and expanded the other US-centric item – the discussion of the 2008 US presidential elections – to the front of the paper to play the role of the hook.  I’ll try to keep the Commonwealth spelling conventions, though. :-)

I decided to cut out the discussion of the N-body problem for various values of N, in part due to the confusion over the notion of a “solution”; there is a nice mathematical story there, but perhaps one that gets in the way of the main story of universality.

I have added a fair number of relevant images, though some of them will have to be changed in the final version for copyright reasons.  The narrow column format of this blog means that the image placement is not optimal, but I am sure that this can be rectified if this article is published professionally.

Read the rest of this entry »

In the prologue for this course, we recalled the classical theory of Jordan measure on Euclidean spaces {{\bf R}^d}. This theory proceeded in the following stages:

  1. First, one defined the notion of a box {B} and its volume {|B|}.
  2. Using this, one defined the notion of an elementary set {E} (a finite union of boxes), and defines the elementary measure {m(E)} of such sets.
  3. From this, one defined the inner and outer Jordan measures {m_{*,(J)}(E), m^{*,(J)}(E)} of an arbitrary bounded set {E \subset {\bf R}^d}. If those measures match, we say that {E} is Jordan measurable, and call {m(E) = m_{*,(J)}(E) = m^{*,(J)}(E)} the Jordan measure of {E}.

As long as one is lucky enough to only have to deal with Jordan measurable sets, the theory of Jordan measure works well enough. However, as noted previously, not all sets are Jordan measurable, even if one restricts attention to bounded sets. In fact, we shall see later in these notes that there even exist bounded open sets, or compact sets, which are not Jordan measurable, so the Jordan theory does not cover many classes of sets of interest. Another class that it fails to cover is countable unions or intersections of sets that are already known to be measurable:

Exercise 1 Show that the countable union {\bigcup_{n=1}^\infty E_n} or countable intersection {\bigcap_{n=1}^\infty E_n} of Jordan measurable sets {E_1, E_2, \ldots \subset {\bf R}} need not be Jordan measurable, even when bounded.

This creates problems with Riemann integrability (which, as we saw in the preceding notes, was closely related to Jordan measure) and pointwise limits:

Exercise 2 Give an example of a sequence of uniformly bounded, Riemann integrable functions {f_n: [0,1] \rightarrow {\bf R}} for {n=1,2,\ldots} that converge pointwise to a bounded function {f: [0,1] \rightarrow {\bf R}} that is not Riemann integrable. What happens if we replace pointwise convergence with uniform convergence?

These issues can be rectified by using a more powerful notion of measure than Jordan measure, namely Lebesgue measure. To define this measure, we first tinker with the notion of the Jordan outer measure

\displaystyle  m^{*,(J)}(E) := \inf_{B \supset E; B \hbox{ elementary}} m(B)

of a set {E \subset {\bf R}^d} (we adopt the convention that {m^{*,(J)}(E) = +\infty} if {E} is unbounded, thus {m^{*,(J)}} now takes values in the extended non-negative reals {[0,+\infty]}, whose properties we will briefly review below). Observe from the finite additivity and subadditivity of elementary measure that we can also write the Jordan outer measure as

\displaystyle  m^{*,(J)}(E) := \inf_{B_1 \cup \ldots \cup B_k \supset E; B_1,\ldots, B_k \hbox{ boxes}} |B_1| + \ldots + |B_k|,

i.e. the Jordan outer measure is the infimal cost required to cover {E} by a finite union of boxes. (The natural number {k} is allowed to vary freely in the above infimum.) We now modify this by replacing the finite union of boxes by a countable union of boxes, leading to the Lebesgue outer measure {m^*(E)} of {E}:

\displaystyle  m^*(E) := \inf_{\bigcup_{n=1}^\infty B_n \supset E; B_1, B_2, \ldots \hbox{ boxes}} \sum_{n=1}^\infty |B_n|,

thus the Lebesgue outer measure is the infimal cost required to cover {E} by a countable union of boxes. Note that the countable sum {\sum_{n=1}^\infty |B_n|} may be infinite, and so the Lebesgue outer measure {m^*(E)} could well equal {+\infty}.

(Caution: the Lebesgue outer measure {m^*(E)} is sometimes denoted {m_*(E)}; this is for instance the case in Stein-Shakarchi.)

Clearly, we always have {m^*(E) \leq m^{*,(J)}(E)} (since we can always pad out a finite union of boxes into an infinite union by adding an infinite number of empty boxes). But {m^*(E)} can be a lot smaller:

Example 1 Let {E = \{ x_1, x_2, x_3, \ldots \} \subset {\bf R}^d} be a countable set. We know that the Jordan outer measure of {E} can be quite large; for instance, in one dimension, {m^{*,(J)}({\bf Q})} is infinite, and {m^{*,(J)}({\bf Q} \cap [-R,R]) = m^{*,(J)}([-R,R]) = 2R} since {{\bf Q} \cap [-R,R]} has {[-R,R]} as its closure (see Exercise 18 of the prologue). On the other hand, all countable sets {E} have Lebesgue outer measure zero. Indeed, one simply covers {E} by the degenerate boxes {\{x_1\}, \{x_2\}, \ldots} of sidelength and volume zero.

Alternatively, if one does not like degenerate boxes, one can cover each {x_n} by a cube {B_n} of sidelength {\epsilon/2^n} (say) for some arbitrary {\epsilon > 0}, leading to a total cost of {\sum_{n=1}^\infty (\epsilon/2^n)^d}, which converges to {C_d \epsilon^d} for some absolute constant {C_d}. As {\epsilon} can be arbitrarily small, we see that the Lebesgue outer measure must be zero. We will refer to this type of trick as the {\epsilon/2^n} trick; it will be used many further times in this course.

From this example we see in particular that a set may be unbounded while still having Lebesgue outer measure zero, in contrast to Jordan outer measure.

As we shall see later in this course, Lebesgue outer measure (also known as Lebesgue exterior measure) is a special case of a more general concept known as an outer measure.

In analogy with the Jordan theory, we would also like to define a concept of “Lebesgue inner measure” to complement that of outer measure. Here, there is an asymmetry (which ultimately arises from the fact that elementary measure is subadditive rather than superadditive): one does not gain any increase in power in the Jordan inner measure by replacing finite unions of boxes with countable ones. But one can get a sort of Lebesgue inner measure by taking complements; see Exercise 18. This leads to one possible definition for Lebesgue measurability, namely the Carathéodory criterion for Lebesgue measurability, see Exercise 17. However, this is not the most intuitive formulation of this concept to work with, and we will instead use a different (but logically equivalent) definition of Lebesgue measurability. The starting point is the observation (see Exercise 5 of the prologue) that Jordan measurable sets can be efficiently contained in elementary sets, with an error that has small Jordan outer measure. In a similar vein, we will define Lebesgue measurable sets to be sets that can be efficiently contained in open sets, with an error that has small Lebesgue outer measure:

Definition 1 (Lebesgue measurability) A set {E \subset {\bf R}^d} is said to be Lebesgue measurable if, for every {\epsilon > 0}, there exists an open set {U \subset {\bf R}^d} containing {E} such that {m^*(U \backslash E) \leq \epsilon}. If {E} is Lebesgue measurable, we refer to {m(E) := m^*(E)} as the Lebesgue measure of {E} (note that this quantity may be equal to {+\infty}). We also write {m(E)} as {m^d(E)} when we wish to emphasise the dimension {d}.

(The intuition that measurable sets are almost open is also known as Littlewood’s first principle, this principle is a triviality with our current choice of definitions, though less so if one uses other, equivalent, definitions of Lebesgue measurability.)

As we shall see later, Lebesgue measure extends Jordan measure, in the sense that every Jordan measurable set is Lebesgue measurable, and the Lebesgue measure and Jordan measure of a Jordan measurable set are always equal. We will also see a few other equivalent descriptions of the concept of Lebesgue measurability.

In the notes below we will establish the basic properties of Lebesgue measure. Broadly speaking, this concept obeys all the intuitive properties one would ask of measure, so long as one restricts attention to countable operations rather than uncountable ones, and as long as one restricts attention to Lebesgue measurable sets. The latter is not a serious restriction in practice, as almost every set one actually encounters in analysis will be measurable (the main exceptions being some pathological sets that are constructed using the axiom of choice). In the next set of notes we will use Lebesgue measure to set up the Lebesgue integral, which extends the Riemann integral in the same way that Lebesgue measure extends Jordan measure; and the many pleasant properties of Lebesgue measure will be reflected in analogous pleasant properties of the Lebesgue integral (most notably the convergence theorems).

We will treat all dimensions {d=1,2,\ldots} equally here, but for the purposes of drawing pictures, we recommend to the reader that one sets {d} equal to {2}. However, for this topic at least, no additional mathematical difficulties will be encountered in the higher-dimensional case (though of course there are significant visual difficulties once {d} exceeds {3}).

The material here is based on Sections 1.1-1.3 of the Stein-Shakarchi text, though it is arranged somewhat differently.

Read the rest of this entry »

The month of April has been designated as Mathematics Awareness Month by the major American mathematics organisations (the AMS, ASA, MAA, and SIAM).  I was approached to write a popular mathematics article for April 2011 (the theme for that month is “Mathematics and Complexity”).  While I have written a fair number of expository articles (including several on this blog) aimed at a mathematical audience, I actually have not had much experience writing articles at the popular mathematics level, and so I found this task to be remarkably difficult.  At this level of exposition, one not only needs to explain the facts, but also to tell a story; I have experience in the former but not in the latter.

I decided to write on the topic of universality – the phenomenon that the macroscopic behaviour of a dynamical system can be largely independent of the precise microscopic structure.   Below the fold is a first draft of the article; I would definitely welcome feedback and corrections.  It does not yet have any pictures, but I plan to rectify that in the final draft.  It also does not have a title, but this will be easy to address later.   But perhaps the biggest thing lacking right now is a narrative “hook”; I don’t yet have any good ideas as to how to make the story of universality compelling to a lay audience.  Any suggestions in this regard would be particularly appreciated.

I have not yet decided where I would try to publish this article; in fact, I might just publish it here on this blog (and eventually, in one of the blog book compilations).

Read the rest of this entry »

One of the most fundamental concepts in Euclidean geometry is that of the measure {m(E)} of a solid body {E} in one or more dimensions. In one, two, and three dimensions, we refer to this measure as the length, area, or volume of {E} respectively. In the classical approach to geometry, the measure of a body was often computed by partitioning that body into finitely many components, moving around each component by a rigid motion (e.g. a translation or rotation), and then reassembling those components to form a simpler body which presumably has the same area. One could also obtain lower and upper bounds on the measure of a body by computing the measure of some inscribed or circumscribed body; this ancient idea goes all the way back to the work of Archimedes at least. Such arguments can be justified by an appeal to geometric intuition, or simply by postulating the existence of a measure {m(E)} that can be assigned to all solid bodies {E}, and which obeys a collection of geometrically reasonable axioms. One can also justify the concept of measure on “physical” or “reductionistic” grounds, viewing the measure of a macroscopic body as the sum of the measures of its microscopic components.

With the advent of analytic geometry, however, Euclidean geometry became reinterpreted as the study of Cartesian products {{\bf R}^d} of the real line {{\bf R}}. Using this analytic foundation rather than the classical geometrical one, it was no longer intuitively obvious how to define the measure {m(E)} of a general subset {E} of {{\bf R}^d}; we will refer to this (somewhat vaguely defined) problem of writing down the “correct” definition of measure as the problem of measure. (One can also pose the problem of measure on other domains than Euclidean space, such as a Riemannian manifold, but we will focus on the Euclidean case here for simplicity.)

To see why this problem exists at all, let us try to formalise some of the intuition for measure discussed earlier. The physical intuition of defining the measure of a body {E} to be the sum of the measure of its component “atoms” runs into an immediate problem: a typical solid body would consist of an infinite (and uncountable) number of points, each of which has a measure of zero; and the product {\infty \cdot 0} is indeterminate. To make matters worse, two bodies that have exactly the same number of points, need not have the same measure. For instance, in one dimension, the intervals {A := [0,1]} and {B := [0,2]} are in one-to-one correspondence (using the bijection {x \mapsto 2x} from {A} to {B}), but of course {B} is twice as long as {A}. So one can disassemble {A} into an uncountable number of points and reassemble them to form a set of twice the length.

Of course, one can point to the infinite (and uncountable) number of components in this disassembly as being the cause of this breakdown of intuition, and restrict attention to just finite partitions. But one still runs into trouble here for a number of reasons, the most striking of which is the Banach-Tarski paradox, which shows that the unit ball {B := \{ (x,y,z) \in {\bf R}^3: x^2+y^2+z^2 \leq 1 \}} in three dimensions can be disassembled into a finite number of pieces (in fact, just five pieces suffice), which can then be reassembled (after translating and rotating each of the pieces) to form two disjoint copies of the ball {B}. (The paradox only works in three dimensions and higher, for reasons having to do with the property of amenability; see this blog post for further discussion of this interesting topic, which is unfortunately too much of a digression from the current subject.)

Here, the problem is that the pieces used in this decomposition are highly pathological in nature; among other things, their construction requires use of the axiom of choice. (This is in fact necessary; there are models of set theory without the axiom of choice in which the Banach-Tarski paradox does not occur, thanks to a famous theorem of Solovay.) Such pathological sets almost never come up in practical applications of mathematics. Because of this, the standard solution to the problem of measure has been to abandon the goal of measuring every subset {E} of {{\bf R}^d}, and instead to settle for only measuring a certain subclass of “non-pathological” subsets of {{\bf R}^d}, which are then referred to as the measurable sets. The problem of measure then divides into several subproblems:

  1. What does it mean for a subset {E} of {{\bf R}^d} to be measurable?
  2. If a set {E} is measurable, how does one define its measure?
  3. What nice properties or axioms does measure (or the concept of measurability) obey?
  4. Are “ordinary” sets such as cubes, balls, polyhedra, etc. measurable?
  5. Does the measure of an “ordinary” set equal the “naive geometric measure” of such sets? (e.g. is the measure of an {a \times b} rectangle equal to {ab}?)

These questions are somewhat open-ended in formulation, and there is no unique answer to them; in particular, one can expand the class of measurable sets at the expense of losing one or more nice properties of measure in the process (e.g. finite or countable additivity, translation invariance, or rotation invariance). However, there are two basic answers which, between them, suffice for most applications. The first is the concept of Jordan measure of a Jordan measurable set, which is a concept closely related to that of the Riemann integral (or Darboux integral). This concept is elementary enough to be systematically studied in an undergraduate analysis course, and suffices for measuring most of the “ordinary” sets (e.g. the area under the graph of a continuous function) in many branches of mathematics. However, when one turns to the type of sets that arise in analysis, and in particular those sets that arise as limits (in various senses) of other sets, it turns out that the Jordan concept of measurability is not quite adequate, and must be extended to the more general notion of Lebesgue measurability, with the corresponding notion of Lebesgue measure that extends Jordan measure. With the Lebesgue theory (which can be viewed as a completion of the Jordan-Darboux-Riemann theory), one keeps almost all of the desirable properties of Jordan measure, but with the crucial additional property that many features of the Lebesgue theory are preserved under limits (as exemplified in the fundamental convergence theorems of the Lebesgue theory, such as the monotone convergence theorem and the dominated convergence theorem, which do not hold in the Jordan-Darboux-Riemann setting). As such, they are particularly well suited for applications in analysis, where limits of functions or sets arise all the time. (There are other ways to extend Jordan measure and the Riemann integral, but the Lebesgue approach handles limits better than the other alternatives, and so has become the standard approach in analysis.)

In the rest of the course, we will formally define Lebesgue measure and the Lebesgue integral, as well as the more general concept of an abstract measure space and the associated integration operation. In the rest of this post, we will discuss the more elementary concepts of Jordan measure and the Riemann integral. This material will eventually be superceded by the more powerful theory to be treated in the main body of the course; but it will serve as motivation for that later material, as well as providing some continuity with the treatment of measure and integration in undergraduate analysis courses.

Read the rest of this entry »