
Today I’d like to discuss (in the Tricks Wiki format) a fundamental trick in “soft” analysis, sometimes known as the “limiting argument” or “epsilon regularisation argument”.

Title: Give yourself an epsilon of room.

Quick description: You want to prove some statement $S_0$ about some object $x_0$ (which could be a number, a point, a function, a set, etc.).  To do so, pick a small $\varepsilon > 0$, and first prove a weaker statement $S_\varepsilon$ (which allows for “losses” which go to zero as $\varepsilon \to 0$) about some perturbed object $x_\varepsilon$.  Then, take limits $\varepsilon \to 0$.  Provided that the dependency and continuity of the weaker conclusion $S_\varepsilon$ on $\varepsilon$ are sufficiently controlled, and $x_\varepsilon$ is converging to $x_0$ in an appropriately strong sense, you will recover the original statement.

One can of course play a similar game when proving a statement $S_\infty$ about some object $X_\infty$, by first proving a weaker statement $S_N$ on some approximation $X_N$ to $X_\infty$ for some large parameter $N$, and then sending $N \to \infty$ at the end.

General discussion: Here are some typical examples of a target statement $S_0$, and the approximating statements $S_\varepsilon$ that would converge to $S_0$:

• $S_0$: $f(x_0) = g(x_0)$.  $S_\varepsilon$: $f(x_\varepsilon) = g(x_\varepsilon) + o(1)$.
• $S_0$: $f(x_0) \leq g(x_0)$.  $S_\varepsilon$: $f(x_\varepsilon) \leq g(x_\varepsilon) + o(1)$.
• $S_0$: $f(x_0) > 0$.  $S_\varepsilon$: $f(x_\varepsilon) \geq c - o(1)$ for some $c>0$ independent of $\varepsilon$.
• $S_0$: $f(x_0)$ is finite.  $S_\varepsilon$: $f(x_\varepsilon)$ is bounded uniformly in $\varepsilon$.
• $S_0$: $f(x_0) \geq f(x)$ for all $x \in X$ (i.e. $x_0$ maximises $f$).  $S_\varepsilon$: $f(x_\varepsilon) \geq f(x)-o(1)$ for all $x \in X$ (i.e. $x_\varepsilon$ nearly maximises $f$).
• $S_0$: $f_n(x_0)$ converges as $n \to \infty$.  $S_\varepsilon$: $f_n(x_\varepsilon)$ fluctuates by at most $o(1)$ for sufficiently large $n$.
• $S_0$: $f_0$ is a measurable function.  $S_\varepsilon$: $f_\varepsilon$ is a measurable function converging pointwise to $f_0$.
• $S_0$: $f_0$ is a continuous function.  $S_\varepsilon$: $f_\varepsilon$ is an equicontinuous family of functions converging pointwise to $f_0$, or $f_\varepsilon$ is continuous and converges (locally) uniformly to $f_0$.
• $S_0$: the event $E_0$ holds almost surely.  $S_\varepsilon$: the event $E_\varepsilon$ holds with probability $1-o(1)$.
• $S_0$: the statement $P_0(x)$ holds for almost every $x$.  $S_\varepsilon$: the statement $P_\varepsilon(x)$ holds for $x$ outside of a set of measure $o(1)$.

Of course, to justify the convergence of $S_\varepsilon$ to $S_0$, it is necessary that $x_\varepsilon$ converge to $x_0$ (or $f_\varepsilon$ converge to $f_0$, etc.) in a suitably strong sense. (But for the purposes of proving just upper bounds, such as $f(x_0) \leq M$, one can often get by with quite weak forms of convergence, thanks to tools such as Fatou’s lemma or the weak closure of the unit ball.)  Similarly, we need some continuity (or at least semi-continuity) hypotheses on the functions f, g appearing above.

It is also necessary in many cases that the control $S_\varepsilon$ on the approximating object $x_\varepsilon$ is somehow “uniform in $\varepsilon$“, although for “$\sigma$-closed” conclusions, such as measurability, this is not required. [It is important to note that it is only the final conclusion $S_\varepsilon$ on $x_\varepsilon$ that needs to have this uniformity in $\varepsilon$; one is permitted to have some intermediate stages in the derivation of $S_\varepsilon$ that depend on $\varepsilon$ in a non-uniform manner, so long as these non-uniformities cancel out or otherwise disappear at the end of the argument.]

By giving oneself an epsilon of room, one can evade a lot of familiar issues in soft analysis.  For instance, by replacing “rough”, “infinite-complexity”, “continuous”,  “global”, or otherwise “infinitary” objects $x_0$ with “smooth”, “finite-complexity”, “discrete”, “local”, or otherwise “finitary” approximants $x_\varepsilon$, one can finesse most issues regarding the justification of various formal operations (e.g. exchanging limits, sums, derivatives, and integrals).  [It is important to be aware, though, that any quantitative measures of smoothness, discreteness, finiteness, etc. of $x_\varepsilon$ should be expected to degrade in the limit $\varepsilon \to 0$, and so one should take extreme caution in using such quantitative measures to derive estimates that are uniform in $\varepsilon$.]  Similarly, issues such as whether the supremum $M := \sup \{ f(x): x \in X \}$ of a function on a set is actually attained by some maximiser $x_0$ become moot if one is willing to settle instead for an almost-maximiser $x_\varepsilon$, e.g. one which comes within an epsilon of that supremum M (or which is larger than $1/\varepsilon$, if M turns out to be infinite).  Last, but not least, one can use the epsilon room to avoid degenerate solutions, for instance by perturbing a non-negative function to be strictly positive, perturbing a non-strictly monotone function to be strictly monotone, and so forth.

To summarise: one can view the epsilon regularisation argument as a “loan” in which one borrows an epsilon here and there in order to be able to ignore soft analysis difficulties, and may temporarily utilise estimates which are non-uniform in epsilon, but at the end of the day one needs to “pay back” the loan by establishing a final “hard analysis” estimate which is uniform in epsilon (or whose error terms decay to zero as epsilon goes to zero).
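As a concrete (if toy) numerical illustration of this loan-and-repayment scheme, consider the improper integral $\int_0^1 x^{-1/2}\ dx = 2$: the integrand is unbounded at the origin, so one first computes on the regularised domain $[\varepsilon,1]$ and then sends $\varepsilon \to 0$.  The sketch below (the function name and discretisation parameters are illustrative choices of mine) exhibits the $o(1)$ loss explicitly:

```python
from math import sqrt

def integral_from_eps(eps, n=100000):
    # Midpoint Riemann sum for x^{-1/2} over the regularised domain [eps, 1];
    # on this domain the integrand is bounded, so everything is "finitary".
    h = (1.0 - eps) / n
    return sum((eps + (i + 0.5) * h) ** -0.5 for i in range(n)) * h

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    # The exact value on [eps, 1] is 2 - 2*sqrt(eps): the full answer 2,
    # minus a "loss" of o(1) that disappears as eps -> 0.
    print(eps, integral_from_eps(eps), 2 - 2 * sqrt(eps))
```

Note that the intermediate quantities are highly non-uniform in $\varepsilon$ (the integrand grows like $\varepsilon^{-1/2}$ near the left endpoint), but the final estimate $2 - 2\sqrt{\varepsilon}$ has an error decaying to zero, which is exactly the repayment of the loan.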

A variant: It may seem that the epsilon regularisation trick is useless if one is already in a “hard analysis” situation, in which all objects are “finitary” and all formal computations are easily justified.  However, there is an important variant of this trick which applies in this case: namely, instead of sending the epsilon parameter to zero, choose epsilon to be a sufficiently small (but not infinitesimally small) quantity, depending on other parameters in the problem, so that one can eventually neglect various error terms and obtain a useful bound at the end of the day.  (For instance, any result proven using the Szemerédi regularity lemma is likely to be of this type.)  Since one is not sending epsilon to zero, not every term in the final bound needs to be uniform in epsilon, though for quantitative applications one still would like the dependencies on such parameters to be as favourable as possible.

Prerequisites: Graduate real analysis.  (Actually, this isn’t so much a prerequisite as it is a corequisite: the limiting argument plays a central role in many fundamental results in real analysis.)  Some examples also require some exposure to PDE.

A normed vector space ${(X, \| \|_X)}$ automatically generates a topology, known as the norm topology or strong topology on ${X}$, generated by the open balls ${B(x,r) := \{ y \in X: \|y-x\|_X < r \}}$. A sequence ${x_n}$ in such a space converges strongly (or converges in norm) to a limit ${x}$ if and only if ${\|x_n-x\|_X \rightarrow 0}$ as ${n \rightarrow \infty}$. This is the topology we have implicitly been using in our previous discussion of normed vector spaces.

However, in some cases it is useful to work in topologies on vector spaces that are weaker than a norm topology. One reason for this is that many important modes of convergence, such as pointwise convergence, convergence in measure, smooth convergence, or convergence on compact subsets, are not captured by a norm topology, and so it is useful to have a more general theory of topological vector spaces that contains these modes. Another reason (of particular importance in PDE) is that the norm topology on infinite-dimensional spaces is so strong that very few sets are compact or pre-compact in these topologies, making it difficult to apply compactness methods in these topologies. Instead, one often first works in a weaker topology, in which compactness is easier to establish, and then somehow upgrades any weakly convergent sequences obtained via compactness to stronger modes of convergence (or alternatively, one abandons strong convergence and exploits the weak convergence directly). Two basic weak topologies for this purpose are the weak topology on a normed vector space ${X}$, and the weak* topology on a dual vector space ${X^*}$. Compactness in the latter topology is usually obtained from the Banach-Alaoglu theorem (and its sequential counterpart), which will be a quick consequence of the Tychonoff theorem (and its sequential counterpart) from the previous lecture.
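To see concretely how the weak topology can differ from the strong one, recall the standard example: the basis vectors $e_n$ of $\ell^2$ converge weakly to zero (since $\langle e_n, y \rangle = y_n \to 0$ for every fixed $y \in \ell^2$), but certainly not strongly, as $\|e_n\| = 1$ for all $n$.  A trivial sketch (the helper names here are my own):

```python
def pairing_with_basis(y, n):
    # <e_n, y> in l^2 simply reads off the n-th coordinate of y.
    return y(n)

y = lambda n: 1.0 / (n + 1)  # a fixed element of l^2 (its squares are summable)

# Weak convergence: the pairings with e_n tend to zero...
print([pairing_with_basis(y, n) for n in (0, 9, 99, 999)])
# ...but ||e_n - 0|| = 1 for every n, so e_n does not converge strongly to 0.
```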

The strong and weak topologies on normed vector spaces also have analogues for the space ${B(X \rightarrow Y)}$ of bounded linear operators from ${X}$ to ${Y}$, thus supplementing the operator norm topology on that space with two weaker topologies, which (somewhat confusingly) are named the strong operator topology and the weak operator topology.

[This post was typeset using a LaTeX to WordPress-HTML converter kindly provided to me by Luca Trevisan.]

Many properties of a (sufficiently nice) function ${f: {\mathbb R} \rightarrow {\mathbb C}}$ are reflected in its Fourier transform ${\hat f: {\mathbb R} \rightarrow {\mathbb C}}$, defined by the formula

$\displaystyle \hat f(\xi) := \int_{-\infty}^\infty f(x) e^{-2\pi i x \xi}\ dx. \ \ \ \ \ (1)$
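As a quick numerical sanity check of (1), one can approximate the transform of the indicator function of $[-1/2,1/2]$ by a Riemann sum; the exact answer is $\sin(\pi \xi)/(\pi \xi)$.  (This sketch, with its function name and discretisation parameters, is my own illustration, not part of the original discussion.)

```python
import cmath
import math

def fourier_hat(xi, n=20000):
    # Midpoint Riemann sum for (1) with f = 1_{[-1/2, 1/2]}, so the
    # integral only runs over [-1/2, 1/2].
    h = 1.0 / n
    s = 0j
    for i in range(n):
        x = -0.5 + (i + 0.5) * h
        s += cmath.exp(-2j * math.pi * x * xi) * h
    return s

for xi in (0.5, 1.0, 2.5):
    exact = math.sin(math.pi * xi) / (math.pi * xi)
    print(xi, fourier_hat(xi).real, exact)  # the two columns agree closely
```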

For instance, decay properties of ${f}$ are reflected in smoothness properties of ${\hat f}$, as the following table shows:

• If ${f}$ is square-integrable, then ${\hat f}$ is square-integrable (Plancherel’s theorem).
• If ${f}$ is absolutely integrable, then ${\hat f}$ is continuous (Riemann-Lebesgue lemma).
• If ${f}$ is rapidly decreasing, then ${\hat f}$ is smooth (theory of Schwartz functions).
• If ${f}$ is exponentially decreasing, then ${\hat f}$ is analytic in a strip.
• If ${f}$ is compactly supported, then ${\hat f}$ is entire and of at most exponential growth (Paley-Wiener theorem).

Another important relationship between a function ${f}$ and its Fourier transform ${\hat f}$ is the uncertainty principle, which roughly asserts that if a function ${f}$ is highly localised in space, then its Fourier transform ${\hat f}$ must be widely dispersed in space, or to put it another way, ${f}$ and ${\hat f}$ cannot both decay too strongly at infinity (except of course in the degenerate case ${f=0}$). There are many ways to make this intuition precise. One of them is the Heisenberg uncertainty principle, which asserts that if we normalise

$\displaystyle \int_{{\mathbb R}} |f(x)|^2\ dx = \int_{\mathbb R} |\hat f(\xi)|^2\ d\xi = 1$

then we must have

$\displaystyle (\int_{\mathbb R} |x|^2 |f(x)|^2\ dx) \cdot (\int_{\mathbb R} |\xi|^2 |\hat f(\xi)|^2\ d\xi)\geq \frac{1}{(4\pi)^2}$

thus forcing at least one of ${f}$ or ${\hat f}$ to not be too concentrated near the origin. This principle can be proven (for sufficiently nice ${f}$, initially) by observing the integration by parts identity

$\displaystyle \mathrm{Re} \langle xf, f' \rangle = \mathrm{Re} \int_{\mathbb R} x f(x) \overline{f'(x)}\ dx = - \frac{1}{2} \int_{\mathbb R} |f(x)|^2\ dx$

and then using Cauchy-Schwarz and the Plancherel identity.
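One can also check numerically that the Gaussian attains equality in the Heisenberg inequality: the $L^2$-normalised Gaussian $f(x) = 2^{1/4} e^{-\pi x^2}$ is its own Fourier transform, and a direct computation gives $\int_{\mathbb R} x^2 |f(x)|^2\ dx = \frac{1}{4\pi}$, so the product of the two second moments is exactly $\frac{1}{(4\pi)^2}$.  A quadrature sketch of mine (discretisation parameters are illustrative):

```python
import math

def second_moment(n=200000, L=10.0):
    # Midpoint Riemann sum for the integral of x^2 |f(x)|^2 with
    # f(x) = 2^{1/4} e^{-pi x^2}, so |f(x)|^2 = sqrt(2) e^{-2 pi x^2},
    # truncated to [-L, L] (the Gaussian tail beyond is negligible).
    h = 2 * L / n
    total = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h
        total += x * x * math.sqrt(2.0) * math.exp(-2 * math.pi * x * x) * h
    return total

m = second_moment()
print(m, 1 / (4 * math.pi))            # both are approximately 0.0796
print(m * m, 1 / (4 * math.pi) ** 2)   # the Heisenberg bound is attained
```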

Another well known manifestation of the uncertainty principle is the fact that it is not possible for ${f}$ and ${\hat f}$ to both be compactly supported (unless of course they vanish entirely). This can in fact be seen from the above table: if ${f}$ is compactly supported, then ${\hat f}$ is an entire function; but the zeroes of a non-zero entire function are isolated, yielding a contradiction unless ${f}$ vanishes. (Indeed, the table also shows that if one of ${f}$ and ${\hat f}$ is compactly supported, then the other cannot have exponential decay.)

On the other hand, we have the example of the Gaussian functions ${f(x) = e^{-\pi a x^2}}$, ${\hat f(\xi) = \frac{1}{\sqrt{a}} e^{-\pi \xi^2/a }}$, which both decay faster than exponentially. The classical Hardy uncertainty principle asserts, roughly speaking, that this is the fastest that ${f}$ and ${\hat f}$ can simultaneously decay:

Theorem 1 (Hardy uncertainty principle) Suppose that ${f}$ is a (measurable) function such that ${|f(x)| \leq C e^{-\pi a x^2 }}$ and ${|\hat f(\xi)| \leq C' e^{-\pi \xi^2/a}}$ for all ${x, \xi}$ and some ${C, C', a > 0}$. Then ${f(x)}$ is a scalar multiple of the gaussian ${e^{-\pi ax^2}}$.

This theorem is proven by complex-analytic methods, in particular the Phragmén-Lindelöf principle; for sake of completeness we give that proof below. But I was curious to see if there was a real-variable proof of the same theorem, avoiding the use of complex analysis. I was able to find the proof of a slightly weaker theorem:

Theorem 2 (Weak Hardy uncertainty principle) Suppose that ${f}$ is a non-zero (measurable) function such that ${|f(x)| \leq C e^{-\pi a x^2 }}$ and ${|\hat f(\xi)| \leq C' e^{-\pi b \xi^2}}$ for all ${x, \xi}$ and some ${C, C', a, b > 0}$. Then ${ab \leq C_0}$ for some absolute constant ${C_0}$.

Note that the correct value of ${C_0}$ should be ${1}$, as is implied by the true Hardy uncertainty principle. Despite the weaker statement, I thought the proof might still be of interest as it is a little less “magical” than the complex-variable one, and so I am giving it below.

Van Vu and I have just uploaded to the arXiv our preprint “A sharp inverse Littlewood-Offord theorem“, which we have submitted to Random Structures and Algorithms.  This paper gives a solution to the (inverse) Littlewood-Offord problem of understanding when random walks are concentrated in the case when the concentration is of polynomial size in the length $n$ of the walk; our description is sharp up to epsilon powers of $n$.  The theory of inverse Littlewood-Offord problems and related topics has been of importance in recent developments in the spectral theory of discrete random matrices (e.g. a “robust” variant of these theorems was crucial in our work on the circular law).

For simplicity I will restrict attention to the Bernoulli random walk.  Given $n$ real numbers $v_1,\ldots,v_n$, one can form the random variable

$S := \epsilon_1 v_1 + \ldots + \epsilon_n v_n$

where $\epsilon_1,\ldots,\epsilon_n \in \{-1,+1\}$ are iid random signs (with either sign +1, -1 chosen with probability 1/2).  This is a discrete random variable which typically takes $2^n$ values.  However, if there are various arithmetic relations between the step sizes $v_1,\ldots,v_n$, then many of the $2^n$ possible sums collide, and certain values may then arise with much higher probability.  To measure this, define the concentration probability $p(v_1,\ldots,v_n)$ by the formula

$p(v_1,\ldots,v_n) = \sup_x {\Bbb P}(S=x)$.

Intuitively, this probability measures the amount of additive structure present between the $v_1,\ldots,v_n$.  There are two (opposing) problems in the subject:

• (Forward Littlewood-Offord problem) Given some structural assumptions on $v_1,\ldots,v_n$, what bounds can one place on $p(v_1,\ldots,v_n)$?
• (Inverse Littlewood-Offord problem) Given some bounds on $p(v_1,\ldots,v_n)$, what structural assumptions can one then conclude about $v_1,\ldots,v_n$?

Ideally one would like answers to both of these problems which come close to inverting each other, and this is the guiding motivation for our paper.
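For very small $n$ one can compute the concentration probability exhaustively, which already illustrates the two extremes: highly structured steps such as $v_i = 1$ give a concentration probability of order $n^{-1/2}$, while “dissociated” steps such as $v_i = 2^i$ give the minimal value $2^{-n}$.  A brute-force sketch (the function name is my own):

```python
from itertools import product
from collections import Counter

def concentration(v):
    # p(v_1,...,v_n) = sup_x P(S = x), computed by enumerating all 2^n
    # sign patterns and counting how often each sum occurs.
    counts = Counter(sum(e * vi for e, vi in zip(signs, v))
                     for signs in product((-1, 1), repeat=len(v)))
    return max(counts.values()) / 2 ** len(v)

n = 10
print(concentration([1] * n))                     # C(10,5)/2^10 = 252/1024
print(concentration([2 ** i for i in range(n)]))  # all sums distinct: 2^-10
```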

This thread is a continuation of the previous thread here on the polymath1 project.  Currently, activity is focusing on the following problems:

1.  Upper and lower bounds for $c_n$ for small n.

Let $c_n$ be the largest size of a set in ${}[3]^n$ without a combinatorial line.  Thanks to efforts from previous threads, we have the first five values of $c_n$:

$c_0=1; c_1=2; c_2=6; c_3=18; c_4=52$.

We also know that $150 \leq c_5 \leq 154$, and are working to narrow this further.  The arguments justifying these bounds can be found here.  The latest bounds for the next few values of $c_n$ can be found here.
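For the very smallest cases, these values can be double-checked by exhaustive search (the search over subsets is doubly exponential, so this is only feasible for $n \leq 2$; the code below is my own sketch):

```python
from itertools import product

def combinatorial_lines(n):
    # A combinatorial line is given by a template over {1,2,3,'x'} with at
    # least one wildcard 'x'; its three points substitute x = 1, 2, 3.
    lines = set()
    for tmpl in product((1, 2, 3, 'x'), repeat=n):
        if 'x' in tmpl:
            lines.add(frozenset(
                tuple(v if t == 'x' else t for t in tmpl) for v in (1, 2, 3)))
    return lines

def line_free_max(n):
    # Largest subset of [3]^n containing no combinatorial line.
    pts = list(product((1, 2, 3), repeat=n))
    lines = combinatorial_lines(n)
    best = 0
    for mask in range(1 << len(pts)):
        S = {pts[i] for i in range(len(pts)) if mask >> i & 1}
        if len(S) > best and not any(line <= S for line in lines):
            best = len(S)
    return best

print([line_free_max(n) for n in range(3)])  # matches c_0 = 1, c_1 = 2, c_2 = 6
```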

2.  A hyper-optimistic conjecture

Consider a variant of the above problem in which each element of ${}[3]^n$ with $a$ 1s, $b$ 2s, and $c$ 3s is weighted by the factor $\frac{a! b! c!}{n!}$; this gives ${}[3]^n$ a total weight of $\frac{(n+1)(n+2)}{2}$.  Let $c^\mu_n$ be the largest weight of a line-free subset of ${}[3]^n$, and let $\overline{c}^\mu_n$ be the largest size of a subset of

$\Delta_n := \{ (a,b,c) \in {\Bbb Z}_+^3: a+b+c=n \}$

which contains no upward-pointing equilateral triangles $(a+r,b,c), (a,b+r,c), (a,b,c+r)$ with $r>0$.  It is known that $c^\mu_n \geq \overline{c}^\mu_n$; the “hyper-optimistic conjecture” is that one in fact has $c^\mu_n = \overline{c}^\mu_n$.  This would imply density Hales-Jewett for k=3.

Currently, we know the following values for these sequences:

$c^\mu_0 = 1; c^\mu_1 = 2; c^\mu_2 = 4$

and

$\overline{c}^\mu_0 = 1; \overline{c}^\mu_1 = 2; \overline{c}^\mu_2 = 4; \overline{c}^\mu_3 = 6; \overline{c}^\mu_4 = 9; \overline{c}^\mu_5 = 12$.

There are also some further upper and lower bounds known; see this spreadsheet for the latest bounds, and this page for proofs of the bounds.  The data so far is consistent with the conjecture, but more work would be needed to obtain a more convincing case for it.
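The small values of $\overline{c}^\mu_n$ listed above can be verified directly by brute force over subsets of $\Delta_n$ (only feasible for small $n$; the following sketch and its function names are mine):

```python
def has_triangle(S, n):
    # Does S contain an upward triangle (a+r,b,c), (a,b+r,c), (a,b,c+r)?
    for r in range(1, n + 1):
        m = n - r
        for a in range(m + 1):
            for b in range(m + 1 - a):
                c = m - a - b
                if ((a + r, b, c) in S and (a, b + r, c) in S
                        and (a, b, c + r) in S):
                    return True
    return False

def max_triangle_free(n):
    # Largest triangle-free subset of Delta_n = {(a,b,c) >= 0 : a+b+c = n}.
    pts = [(a, b, n - a - b) for a in range(n + 1) for b in range(n + 1 - a)]
    best = 0
    for mask in range(1 << len(pts)):
        S = frozenset(pts[i] for i in range(len(pts)) if mask >> i & 1)
        if len(S) > best and not has_triangle(S, n):
            best = len(S)
    return best

print([max_triangle_free(n) for n in range(5)])  # matches the values above
```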

3.  Asymptotics for Moser’s cube problem

Moser’s cube problem asks to compute the largest size $c'_n$ of a subset of the cube ${}[3]^n$ without geometric lines.  The first few values of $c'_n$ are known:

$c'_0=1; c'_1 = 2; c'_2 = 6; c'_3 = 14; c'_4 = 43$.
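These values can be confirmed by brute force for $n \leq 2$.  A geometric line in ${}[3]^n$ can be encoded by a template over $\{1,2,3,x,\bar x\}$ with at least one wildcard, where $x$ runs over $1,2,3$ and $\bar x = 4 - x$; this encoding (and the code below) is my own sketch:

```python
from itertools import product

def geometric_lines(n):
    # 'x' is the wildcard running over 1,2,3 and 'y' stands for 4 - x;
    # a template and its reflection give the same set of three points,
    # which the set of frozensets deduplicates automatically.
    lines = set()
    for tmpl in product((1, 2, 3, 'x', 'y'), repeat=n):
        if any(t in ('x', 'y') for t in tmpl):
            lines.add(frozenset(
                tuple(t if isinstance(t, int) else (v if t == 'x' else 4 - v)
                      for t in tmpl)
                for v in (1, 2, 3)))
    return lines

def moser_max(n):
    # Largest subset of [3]^n with no geometric line.
    pts = list(product((1, 2, 3), repeat=n))
    lines = geometric_lines(n)
    best = 0
    for mask in range(1 << len(pts)):
        S = {pts[i] for i in range(len(pts)) if mask >> i & 1}
        if len(S) > best and not any(line <= S for line in lines):
            best = len(S)
    return best

print([moser_max(n) for n in range(3)])  # matches c'_0 = 1, c'_1 = 2, c'_2 = 6
```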

The best asymptotic lower bound known is $c'_n \gg 3^n/\sqrt{n}$.  Given that we have a significantly superior lower bound of $c_n \geq 3^{n-O(\sqrt{\log n})}$ for $c_n$, one might hope to similarly improve the lower bound for Moser’s problem.  But there seems to be a technical snag, based on the observation that between two slices $\Gamma_{a,b,c}$ and $\Gamma_{a',b',c'}$ with $c-a=c'-a'$, there are in fact quite a few combinatorial lines connecting them, and so it is not obvious how to create a geometric line-free set that involves more than one slice per value of $c-a$.

It is also of interest to work out what asymptotic lower bound for $c_n$ one can get for values of $k$ larger than 3.

Comments on this thread should start at 700.  We will also try to record progress made at the polymath1 wiki and at the polymath1 spreadsheet, as appropriate.

As part of the polymath1 project, I would like to set up a reading seminar on this blog for the following three papers and notes:

1. H. Furstenberg, Y. Katznelson, “A density version of the Hales-Jewett theorem for k=3“, Graph Theory and Combinatorics (Cambridge, 1988). Discrete Math. 75 (1989), no. 1-3, 227–241.
2. R. McCutcheon, “The conclusion of the proof of the density Hales-Jewett theorem for k=3“, unpublished.
3. H. Furstenberg, Y. Katznelson, “A density version of the Hales-Jewett theorem“, J. Anal. Math. 57 (1991), 64–119.

As I understand it, paper #1 begins the proof of DHJ(3) (the k=3 version of density Hales-Jewett), but the proof is not quite complete, and the notes in #2 complete the proof using ideas from both paper #1 and paper #3.  Paper #3, of course, does DHJ(k) for all k.  For the purposes of the polymath1 project, though, I think it would be best if we focus exclusively on k=3.

While this seminar is of course related in content to the main discussion threads in the polymath1 project, I envision this to be a more sedate affair, in which we go slowly through various sections of various papers, asking questions of each other along the way, and presenting various bits and pieces of the proof.  The papers require a certain technical background in ergodic theory to understand, but my hope is that if enough other people (in particular, combinatorialists) ask questions here (and “naive” or “silly” questions are strongly encouraged) then we should be able to make a fair amount of the arguments here accessible.  I also hope that some ergodic theorists who had been intending to read these papers, but didn’t get around to it, will join me in reading them.

This is the first time I am trying something like this, and so we shall be using the carefully thought out protocol known as “making things up as we go along”.  My initial plan is to start understanding the “big picture” (in particular, to outline the general strategy of proof), while also slowly going through the key stages of that proof in something resembling a linear order.  But I imagine that the focus may change as the seminar progresses.

I’ll start the ball rolling with some initial impressions of paper #1 in the comments below.  As with other threads in this project, I would like all comments to come with a number and title, starting with 600 and then incrementing (the numbers 1-599 being reserved by other threads in this project).

By and large, I am quite happy with the LaTeX support provided by wordpress, which hosts my blog (and many other maths blogs out there); it is certainly superior to the maths support of most other off-the-shelf blog formats I have looked at. But it does have some annoying bugs and quirks; a lot of perfectly good LaTeX gets converted into “Formula does not parse”.

It occurred to me that maybe one should start a page that collects all the known bugs so that they can be sent at some point to the wordpress programmers to fix. So, here is my initial list of known bugs: any other wordpress users out there are very welcome to contribute their own.

Convention: I will drop the “latex” when I want to show a fragment of LaTeX source code without making it compile, e.g.

$\int_{-\infty}^\infty e^{-\pi x^2}\ dx = 1$ becomes $\int_{-\infty}^\infty e^{-\pi x^2}\ dx = 1$.

[Update, Feb 14: The bugs have now been fixed; see comments below.]

One of the most useful concepts for analysis that arise from topology and metric spaces is that of compactness; recall that a space ${X}$ is compact if every open cover of ${X}$ has a finite subcover, or equivalently if any collection of closed sets with the finite intersection property (i.e. every finite subcollection of these sets has non-empty intersection) has non-empty intersection. In these notes, we explore how compactness interacts with other key topological concepts: the Hausdorff property, bases and sub-bases, product spaces, and equicontinuity, in particular establishing the useful Tychonoff and Arzelà-Ascoli theorems that give criteria for compactness (or precompactness).

Exercise 1 (Basic properties of compact sets)

• Show that any finite set is compact.
• Show that any finite union of compact subsets of a topological space is still compact.
• Show that any image of a compact space under a continuous map is still compact.

Show that these three statements continue to hold if “compact” is replaced by “sequentially compact”.

For the last ten years or so, I used to maintain a list of conferences in the area of analysis & PDE (see e.g. this page for a partial archive of this conference list).  However, due to many other demands on my time, I eventually ceased to maintain it, instead passing it over to the Harmonic Analysis and Related Problems (HARP) group that was supported by the European Mathematical Society.  Unfortunately, the EMS funding ran out a few years back, the HARP group dissolved, and so the page drifted for a while.

This week, I talked to my good friend Jim Colliander (who maintains the DispersiveWiki) and he agreed to host a new version of the conferences page on the Wiki, where it can be collaboratively updated.  It is rather sparse right now, but I hope people will contribute to it, either by adding new conferences and related content, or by cleaning up the organisation of existing content.  I have also migrated a list of lecture notes and courses in analysis and PDE which is badly in need of updating.

[One can presumably use the Wiki to also host other items of this nature than just a conference list; any suggestions for expanding the page would also be welcome.]

I came across the following news item today which might be of interest to many of the readers of this blog: as part of a bipartisan effort (headed by Sen. Susan Collins (R-ME) and Sen. Ben Nelson (D-NE)) to reduce the size of the U.S. federal government stimulus package (currently at over $900 billion) by about $77 billion, the entire appropriation for the National Science Foundation (NSF) in the package ($1.4 billion – about 0.15% of the entire stimulus package) is being proposed to be eliminated, with significant cuts in related agencies (NASA, DOE office of science, NOAA, etc.).  The $1.4 billion for the NSF in the Senate version of the package is already a reduction from the $2.5 billion allocated in the House version.  For comparison, the entire FY2009 budget for the NSF is $6.9 billion, or about 0.5% of the discretionary federal budget of $1.21 trillion.  [Full disclosure: a majority of my own research funding comes from the NSF.]

[Update, Feb 8: It now appears that $1.2 billion of the funding for the NSF has been restored in a Senate compromise.]

[Update, Feb 12: After the House/Senate reconciliation process, the amount allocated to the NSF has been increased to \$3 billion.  Thanks to Alex Iosevich for the link.  See also this Sciencedebate update.]