
As discussed in previous notes, a function space norm can be viewed as a means to rigorously quantify various statistics of a function {f: X \rightarrow {\bf C}}. For instance, the “height” and “width” can be quantified via the {L^p(X,\mu)} norms (and their relatives, such as the Lorentz norms {\|f\|_{L^{p,q}(X,\mu)}}). Indeed, if {f} is a step function {f = A 1_E}, then the {L^p} norm of {f} is a combination {\|f\|_{L^p(X,\mu)} = |A| \mu(E)^{1/p}} of the height (or amplitude) {A} and the width {\mu(E)}.

However, there are more features of a function {f} of interest than just its width and height. When the domain {X} is a Euclidean space {{\bf R}^d} (or domains related to Euclidean spaces, such as open subsets of {{\bf R}^d}, or manifolds), then another important feature of such functions (especially in PDE) is the regularity of a function, as well as the related concept of the frequency scale of a function. These terms are not rigorously defined; but roughly speaking, regularity measures how smooth a function is (or how many times one can differentiate the function before it ceases to be a function), while the frequency scale of a function measures how quickly the function oscillates (and would be inversely proportional to the wavelength). One can illustrate this informal concept with some examples:

  • Let {\phi \in C^\infty_c({\bf R})} be a test function that equals {1} near the origin, and {N} be a large number. Then the function {f(x) := \phi(x) \sin(Nx)} oscillates at a wavelength of about {1/N}, and a frequency scale of about {N}. While {f} is, strictly speaking, a smooth function, it becomes increasingly less smooth in the limit {N \rightarrow \infty}; for instance, the derivative {f'(x) = \phi'(x) \sin(Nx) + N \phi(x) \cos(Nx)} grows at a roughly linear rate as {N \rightarrow \infty}, and the higher derivatives grow at even faster rates. So this function does not really have any regularity in the limit {N \rightarrow \infty}. Note however that the height and width of this function are bounded uniformly in {N}; so regularity and frequency scale are independent of height and width. (A numerical sketch illustrating this example and the next appears after this list.)
  • Continuing the previous example, now consider the function {g(x) := N^{-s} \phi(x) \sin(Nx)}, where {s \geq 0} is some parameter. This function also has a frequency scale of about {N}. But now it has a certain amount of regularity, even in the limit {N \rightarrow \infty}; indeed, one easily checks that the {k^{th}} derivative of {g} stays bounded in {N} as long as {k \leq s}. So one could view this function as having “{s} degrees of regularity” in the limit {N \rightarrow \infty}.
  • In a similar vein, the function {N^{-s} \phi(Nx)} also has a frequency scale of about {N}, and can be viewed as having {s} degrees of regularity in the limit {N \rightarrow \infty}.
  • The function {\phi(x) |x|^s 1_{x > 0}} also has about {s} degrees of regularity, in the sense that it can be differentiated up to {s} times before becoming unbounded. By performing a dyadic decomposition of the {x} variable, one can also decompose this function into components {\psi(2^n x) |x|^s} for {n \geq 0}, where {\psi(x) := (\phi(x)-\phi(2x)) 1_{x>0}} is a bump function supported away from the origin; each such component has frequency scale about {2^n} and {s} degrees of regularity. Thus we see that the original function {\phi(x) |x|^s 1_{x > 0}} has a range of frequency scales, ranging from about {1} all the way to {+\infty}.
  • One can of course concoct higher-dimensional analogues of these examples. For instance, the localised plane wave {\phi(x) \sin(\xi \cdot x)} in {{\bf R}^d}, where {\phi \in C^\infty_c({\bf R}^d)} is a test function, would have a frequency scale of about {|\xi|}.
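As a quick numerical illustration of the first two examples (a hedged sketch only: sympy does the differentiation, and a Gaussian {e^{-x^2}} stands in for the bump function {\phi}), one can tabulate {\sup_x |g^{(k)}(x)|} for {g(x) := N^{-s} \phi(x) \sin(Nx)} with {s = 2}: the maxima stay bounded in {N} for {k \leq s}, and grow like {N^{k-s}} for {k > s}.

    # Sketch: derivative growth of g(x) = N^{-s} * phi(x) * sin(N x), with a Gaussian
    # standing in for the bump function phi, and s = 2 "degrees of regularity".
    import numpy as np
    import sympy as sp

    x = sp.symbols('x', real=True)
    s = 2
    grid = np.linspace(-4, 4, 20001)
    for N in [10, 20, 40, 80]:
        g = N ** (-s) * sp.exp(-x ** 2) * sp.sin(N * x)
        sups = [float(np.max(np.abs(sp.lambdify(x, sp.diff(g, x, k), 'numpy')(grid))))
                for k in range(5)]
        print(N, ["%.2e" % v for v in sups])   # columns k = 0..4: bounded for k <= 2, growing for k > 2

Setting {s=0} recovers the first example, in which already the first derivative grows linearly in {N}.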

There are a variety of function space norms that can be used to capture frequency scale (or regularity) in addition to height and width. The most common and well-known examples of such norms are the Sobolev norms {\| f\|_{W^{s,p}({\bf R}^d)}}, although there are a number of other norms with similar features (such as Hölder norms, Besov norms, and Triebel-Lizorkin norms). Very roughly speaking, the {W^{s,p}} norm is like the {L^p} norm, but with “{s} additional degrees of regularity”. For instance, in one dimension, the function {A \phi(x/R) \sin(Nx)}, where {\phi} is a fixed test function and {R, N} are large, will have a {W^{s,p}} norm of about {|A| R^{1/p} N^s}, thus combining the “height” {|A|}, the “width” {R}, and the “frequency scale” {N} of this function together. (Compare this with the {L^p} norm of the same function, which is about {|A| R^{1/p}}.)
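In the {p=2} case one can check this scaling numerically, since (as discussed further below) the {W^{s,2}} norm can be computed on the Fourier side by weighting {\hat f(\xi)} with {(1+|\xi|^2)^{s/2}}. The following is a hedged sketch only (a Gaussian again stands in for {\phi}, and the Fourier normalisation is only fixed up to constants): doubling {N} should multiply the norm by about {2^s}, doubling {R} by about {2^{1/2}}, and doubling {A} by {2}.

    # Sketch: H^s = W^{s,2} norm of f(x) = A * phi(x/R) * sin(N x), via the Fourier-side
    # formula ||f||_{H^s}^2 ~ int (1 + |w|^2)^s |fhat(w)|^2 dw, with w the angular frequency.
    import numpy as np

    def hs_norm(A, R, N, s, L=200.0, n=2 ** 17):
        x = np.linspace(-L / 2, L / 2, n, endpoint=False)
        dx = x[1] - x[0]
        f = A * np.exp(-(x / R) ** 2) * np.sin(N * x)     # Gaussian stand-in for phi(x/R)
        fhat = np.fft.fft(f) * dx                         # approximate continuous Fourier transform
        xi = np.fft.fftfreq(n, d=dx)                      # frequency in cycles per unit length
        w = 2 * np.pi * xi                                # angular frequency: sin(Nx) sits near |w| ~ N
        return np.sqrt(np.sum((1 + w ** 2) ** s * np.abs(fhat) ** 2) * (xi[1] - xi[0]))

    s = 1
    base = hs_norm(1.0, 4.0, 16.0, s)
    print(hs_norm(1.0, 4.0, 32.0, s) / base)   # ~ 2^s     (doubling N)
    print(hs_norm(1.0, 8.0, 16.0, s) / base)   # ~ 2^{1/2} (doubling R)
    print(hs_norm(2.0, 4.0, 16.0, s) / base)   # ~ 2       (doubling A)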

To a large extent, the theory of the Sobolev spaces {W^{s,p}({\bf R}^d)} resembles their Lebesgue counterparts {L^p({\bf R}^d)} (which arise as the special case {s=0} of the Sobolev spaces), but with the additional benefit of being able to interact very nicely with (weak) derivatives: a first derivative {\frac{\partial f}{\partial x_j}} of a function in an {L^p} space usually leaves all Lebesgue spaces, but a first derivative of a function in the Sobolev space {W^{s,p}} will end up in another Sobolev space {W^{s-1,p}}. This compatibility with the differentiation operation begins to explain why Sobolev spaces are so useful in the theory of partial differential equations. Furthermore, the regularity parameter {s} in Sobolev spaces is not restricted to be a natural number; it can be any real number, and one can use fractional differentiation or integration operators to move from one regularity to another. Despite the fact that most partial differential equations involve differential operators of integer order, fractional spaces are still of importance; for instance, it often turns out that the Sobolev spaces which are critical (scale-invariant) for a certain PDE are of fractional order.

The uncertainty principle in Fourier analysis places a constraint between the width and frequency scale of a function; roughly speaking (and in one dimension for simplicity), the product of the two quantities has to be bounded away from zero (or to put it another way, a wave is always at least as wide as its wavelength). This constraint can be quantified as the very useful Sobolev embedding theorem, which allows one to trade regularity for integrability: a function in a Sobolev space {W^{s,p}} will automatically lie in a number of other Sobolev spaces {W^{\tilde s,\tilde p}} with {\tilde s < s} and {\tilde p > p}; in particular, one can often embed Sobolev spaces into Lebesgue spaces. The trade is not reversible: one cannot start with a function with a lot of integrability and no regularity, and expect to recover regularity in a space of lower integrability. (One can already see this with the most basic example of Sobolev embedding, coming from the fundamental theorem of calculus. If a (continuously differentiable) function {f: {\bf R} \rightarrow {\bf R}} has {f'} in {L^1({\bf R})}, then we of course have {f \in L^\infty({\bf R})}; but the converse is far from true.)
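To spell the forward implication out: the fundamental theorem of calculus gives {f(x) = f(0) + \int_0^x f'(t)\ dt}, and hence the pointwise bound

    \displaystyle  |f(x)| \leq |f(0)| + \|f'\|_{L^1({\bf R})}

for every {x}, which is the asserted {L^\infty} control. No bound of this type holds in the reverse direction: for instance, the functions {\phi(x) \sin(Nx)} from the earlier examples are bounded uniformly in {N}, but the {L^1} norms of their derivatives grow linearly in {N}.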

Plancherel’s theorem reveals that Fourier-analytic tools are particularly powerful when applied to {L^2} spaces. Because of this, the Fourier transform is very effective at dealing with the {L^2}-based Sobolev spaces {W^{s,2}({\bf R}^d)}, often abbreviated {H^s({\bf R}^d)}. Indeed, using the fact that the Fourier transform converts regularity to decay, we will see that the {H^s({\bf R}^d)} spaces are nothing more than Fourier transforms of weighted {L^2} spaces, and in particular enjoy a Hilbert space structure. These Sobolev spaces, and in particular the energy space {H^1({\bf R}^d)}, are of particular importance in any PDE that involves some sort of energy functional (this includes large classes of elliptic, parabolic, dispersive, and wave equations, and especially those equations connected to physics and/or geometry).
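Concretely, with one common normalisation (conventions differ by harmless constant factors), the norm in question is

    \displaystyle  \|f\|_{H^s({\bf R}^d)} := \left( \int_{{\bf R}^d} (1+|\xi|^2)^s |\hat f(\xi)|^2\ d\xi \right)^{1/2},

so that membership in {H^s({\bf R}^d)} is precisely the assertion that {(1+|\xi|^2)^{s/2} \hat f} is square-integrable; the Hilbert space structure is then inherited directly from this weighted {L^2} space.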

We will not fully develop the theory of Sobolev spaces here, as this would require the theory of singular integrals, which is beyond the scope of this course. There are of course many references for further reading; one is Stein’s “Singular integrals and differentiability properties of functions”.

Read the rest of this entry »

This weekend I was in Washington, D.C., for the annual meeting of the National Academy of Sciences.   Among the various events at this meeting was an address to the Academy by President Obama this morning on several major science and education policy initiatives, including some already announced in the economic stimulus package and draft federal budget, and some carried over from the previous administration.   (I myself missed the address, though, as I had to return to LA to teach.)  Among the initiatives stated were the creation of an Advanced Research Projects Agency for Energy (ARPA-E), modeled on DARPA (and recommended by the NAS); a significant increase in funding to the NSF and related agencies (which was committed to by the Bush administration, but not yet implemented; this is distinct from the one-time funding from the stimulus package discussed in this previous post), leading in particular to a tripling in the number of NSF graduate research fellowships; and a “race to the top” fund administered by the Department of Education to provide incentives for states to improve their quality of maths and science education, among other goals.  Some of these initiatives may not survive the budgetary process, of course, but it does seem that there is both symbolic and substantive support for science and education at the federal level.

Here are the video, audio, and transcript of the talk.

[Update, Apr 28: Another event at the meeting is the announcement of the new membership of the Academy for 2009.  In mathematics, the new members include Alice Chang, Percy Deift, John Morgan, and Gilbert Strang; congratulations to all four, of course.]

I have received some anecdotal evidence that wordpress blogs such as this one have recently been blocked again by the “great firewall of China”.   I was wondering if the readers here could confirm or disconfirm this, and also if they knew of some effective ways to circumvent this firewall, as I have been getting a number of requests on how to do so.

[Of course, by definition, if a reader is directly affected by this blockage, then they would not be able to comment here or to read about any workarounds; but perhaps they would be able to confirm the situation indirectly, and I could still pass on any relevant tips obtained here by other channels.]

[Update, June 3: A partial list of blocked sites can be found here, and a firewall tester can be found here.]

In the theory of dense graphs on {n} vertices, where {n} is large, a fundamental role is played by the Szemerédi regularity lemma:

Lemma 1 (Regularity lemma, standard version) Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0} and {k_0 \geq 0}. Then there exists a partition of the vertices {V = V_1 \cup \ldots \cup V_k}, with {k_0 \leq k \leq C(k_0,\epsilon)} for some quantity {C(k_0,\epsilon)} depending only on {k_0, \epsilon}, obeying the following properties:

  • (Equitable partition) For any {1 \leq i,j \leq k}, the cardinalities {|V_i|, |V_j|} of {V_i} and {V_j} differ by at most {1}.
  • (Regularity) For all but at most {\epsilon k^2} pairs {1 \leq i < j \leq k}, the portion of the graph {G} between {V_i} and {V_j} is {\epsilon}-regular in the sense that one has

    \displaystyle  |d( A, B ) - d( V_i, V_j )| \leq \epsilon

    for any {A \subset V_i} and {B \subset V_j} with {|A| \geq \epsilon |V_i|, |B| \geq \epsilon |V_j|}, where {d(A,B) := |E \cap (A \times B)|/|A| |B|} is the density of edges between {A} and {B}.

This lemma becomes useful in the regime when {n} is very large compared to {k_0} or {1/\epsilon}, because all the conclusions of the lemma are uniform in {n}. Very roughly speaking, it says that “up to errors of size {\epsilon}“, a large graph can be more or less described completely by a bounded number of quantities {d(V_i, V_j)}. This can be interpreted as saying that the space of all graphs is totally bounded (and hence precompact) in a suitable metric space, thus allowing one to take formal limits of sequences (or subsequences) of graphs; see for instance this paper of Lovasz and Szegedy for a discussion.
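To fix the notation of Lemma 1 in computational form, here is a hedged sketch (on a small random adjacency matrix, with hypothetical helper names) of the edge density {d(A,B)} and of a brute-force test of {\epsilon}-regularity for a single pair of cells; the exhaustive search over subsets {A \subset V_i}, {B \subset V_j} is of course exponentially slow, a point we return to below.

    # Sketch: edge density d(A,B) and a brute-force epsilon-regularity test for one pair
    # of cells, following the definitions in Lemma 1.  adj is a symmetric 0/1 matrix.
    from itertools import combinations
    import numpy as np

    def density(adj, A, B):
        return adj[np.ix_(A, B)].sum() / (len(A) * len(B)) if A and B else 0.0

    def is_regular_pair(adj, Vi, Vj, eps):
        d0 = density(adj, Vi, Vj)
        a_min = max(1, int(np.ceil(eps * len(Vi))))
        b_min = max(1, int(np.ceil(eps * len(Vj))))
        for a in range(a_min, len(Vi) + 1):
            for A in combinations(Vi, a):
                for b in range(b_min, len(Vj) + 1):
                    for B in combinations(Vj, b):
                        if abs(density(adj, list(A), list(B)) - d0) > eps:
                            return False              # found a significant density fluctuation
        return True

    rng = np.random.default_rng(0)
    adj = np.triu(rng.integers(0, 2, size=(12, 12)), 1)
    adj = adj + adj.T
    Vi, Vj = list(range(6)), list(range(6, 12))
    print(density(adj, Vi, Vj), is_regular_pair(adj, Vi, Vj, eps=0.4))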

For various technical reasons it is easier to work with a slightly weaker version of the lemma, which allows for the cells {V_1,\ldots,V_k} to have unequal sizes:

Lemma 2 (Regularity lemma, weighted version) Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0}. Then there exists a partition of the vertices {V = V_1 \cup \ldots \cup V_k}, with {1 \leq k \leq C(\epsilon)} for some quantity {C(\epsilon)} depending only on {\epsilon}, obeying the following properties:

  • (Regularity) One has

    \displaystyle  \sum_{(V_i,V_j) \hbox{ not } \epsilon-\hbox{regular}} |V_i| |V_j| = O(\epsilon |V|^2) \ \ \ \ \ (1)

    where the sum is over all pairs {1 \leq i \leq j \leq k} for which {G} is not {\epsilon}-regular between {V_i} and {V_j}.

While Lemma 2 is, strictly speaking, weaker than Lemma 1 in that it does not enforce the equitable size property between the atoms, in practice it seems that the two lemmas are roughly of equal utility; most of the combinatorial consequences of Lemma 1 can also be proven using Lemma 2. The point is that one always has to remember to weight each cell {V_i} by its density {|V_i|/|V|}, rather than by giving each cell an equal weight as in Lemma 1. Lemma 2 also has the advantage that one can easily generalise the result from finite vertex sets {V} to other probability spaces (for instance, one could weight {V} with something other than the uniform distribution). For applications to hypergraph regularity, it turns out to be slightly more convenient to have two partitions (coarse and fine) rather than just one; see for instance my own paper on this topic. In any event, the arguments we give below to prove Lemma 2 can be modified to give a proof of Lemma 1 also.

The proof of the regularity lemma is usually conducted by a greedy algorithm. Very roughly speaking, one starts with the trivial partition of {V}. If this partition already regularises the graph, we are done; if not, this means that there are some sets {A} and {B} between which there is a significant density fluctuation beyond what has already been detected by the original partition. One then adds these sets to the partition and iterates the argument. Every time a new density fluctuation is incorporated into the partition that models the original graph, this increases a certain “index” or “energy” of the partition. On the other hand, this energy remains bounded no matter how complex the partition, so eventually one must reach a long “energy plateau” in which no further refinement is possible, at which point one can find the regular partition.
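For concreteness, here is a hedged sketch of one standard normalisation of this “index” or “energy” (the exact normalisation varies between treatments): the mean square of the pair densities {d(V_i,V_j)}, weighted by the relative sizes of the cells. It takes values in {[0,1]} and cannot decrease when the partition is refined, which is what forces the iteration to terminate at an energy plateau.

    # Sketch: the "energy" (index) of a vertex partition, in one standard normalisation:
    #   energy(P) = sum_{i,j} (|V_i| |V_j| / |V|^2) * d(V_i, V_j)^2,  a quantity in [0, 1]
    # that is non-decreasing under refinement (by Cauchy-Schwarz).
    import numpy as np

    def density(adj, A, B):
        return adj[np.ix_(A, B)].sum() / (len(A) * len(B)) if A and B else 0.0

    def energy(adj, partition):
        n = adj.shape[0]
        return sum((len(Vi) * len(Vj) / n ** 2) * density(adj, Vi, Vj) ** 2
                   for Vi in partition for Vj in partition if Vi and Vj)

    rng = np.random.default_rng(1)
    adj = np.triu(rng.integers(0, 2, size=(20, 20)), 1)
    adj = adj + adj.T
    coarse = [list(range(10)), list(range(10, 20))]
    fine = [list(range(5)), list(range(5, 10)), list(range(10, 20))]   # a refinement of coarse
    print(energy(adj, coarse), "<=", energy(adj, fine))                # refinement never loses energy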

One disadvantage of the greedy algorithm is that it is not efficient in the limit {n \rightarrow \infty}, as it requires one to search over all pairs of subsets {A, B} of a given pair {V_i, V_j} of cells, which is an exponentially long search. There are more algorithmically efficient ways to regularise; for instance, a polynomial time algorithm was given by Alon, Duke, Lefmann, Rödl, and Yuster. However, one can do even better, if one is willing to (a) allow cells of unequal size, (b) allow a small probability of failure, (c) have the ability to sample vertices from {G} at random, and (d) allow for the cells to be defined “implicitly” (via their relationships with a fixed set of reference vertices) rather than “explicitly” (as a list of vertices). In that case, one can regularise a graph in a number of operations bounded uniformly in {n}. Indeed, one has

Lemma 3 (Regularity lemma via random neighbourhoods) Let {\epsilon > 0}. Then there exist integers {M_1,\ldots,M_m} with the following property: whenever {G = (V,E)} is a graph on finitely many vertices, if one selects one of the integers {M_r} at random from {M_1,\ldots,M_m}, then selects {M_r} vertices {v_1,\ldots,v_{M_r} \in V} uniformly from {V} at random, then the {2^{M_r}} vertex cells {V^{M_r}_1,\ldots,V^{M_r}_{2^{M_r}}} (some of which can be empty) generated by the vertex neighbourhoods {A_t := \{ v \in V: (v,v_t) \in E \}} for {1 \leq t \leq M_r}, will obey the conclusions of Lemma 2 with probability at least {1-O(\epsilon)}.

Thus, roughly speaking, one can regularise a graph simply by taking a large number of random vertex neighbourhoods, and using the partition (or Venn diagram) generated by these neighbourhoods as the partition. The intuition is that if there is any non-uniformity in the graph (e.g. if the graph exhibits bipartite behaviour), this will bias the random neighbourhoods to seek out the partitions that would regularise that non-uniformity (e.g. vertex neighbourhoods would begin to fill out the two vertex cells associated to the bipartite property); if one takes sufficiently many such random neighbourhoods, the probability that all detectable non-uniformity is captured by the partition should converge to {1}. (It is more complicated than this, because the finer one makes the partition, the finer the types of non-uniformity one can begin to detect, but this is the basic idea.)
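Here is a hedged sketch of the partitioning step itself (fixing the number {M} of sampled vertices rather than choosing it at random, and using a synthetic two-block graph purely for illustration): each vertex is labelled by its pattern of adjacencies to the sampled vertices {v_1,\ldots,v_M}, and the cells of the partition are the level sets of this label.

    # Sketch: the partition generated by M random vertex neighbourhoods, as in Lemma 3.
    # Each vertex v gets the label (1_{(v,v_1) in E}, ..., 1_{(v,v_M) in E}); vertices with
    # the same label form a cell, giving at most 2^M cells (some possibly empty).
    from collections import defaultdict
    import numpy as np

    def random_neighbourhood_partition(adj, M, rng):
        n = adj.shape[0]
        samples = rng.integers(0, n, size=M)          # the reference vertices v_1, ..., v_M
        cells = defaultdict(list)
        for v in range(n):
            cells[tuple(int(adj[v, vt]) for vt in samples)].append(v)
        return list(cells.values())

    rng = np.random.default_rng(2)
    n = 200
    adj = np.zeros((n, n), dtype=int)                 # two dense blocks with sparse edges between them
    for i in range(n):
        for j in range(i + 1, n):
            p = 0.8 if (i < n // 2) == (j < n // 2) else 0.1
            if rng.random() < p:
                adj[i, j] = adj[j, i] = 1
    print(sorted(len(cell) for cell in random_neighbourhood_partition(adj, M=4, rng=rng)))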

This fact seems to be reasonably well-known folklore, discovered independently by many authors; it is for instance quite close to the graph property testing results of Alon and Shapira, and also appears implicitly in a paper of Ishigami, as well as a paper of Austin (and perhaps even more implicitly in a paper of my own). However, in none of these papers is the above lemma stated explicitly. I was asked about this lemma recently, so I decided to provide a proof here.

Read the rest of this entry »

I’ve just uploaded to the arXiv my paper “An inverse theorem for the bilinear $L^2$ Strichartz estimate for the wave equation”.  This paper is another technical component of my “heatwave project”, which aims to establish the global regularity conjecture for energy-critical wave maps into hyperbolic space.    I have been in the process of writing the final paper of that project, in which I will show that the only way singularities can form is if a special type of solution, known as an “almost periodic blowup solution”, exists.  However, I recently discovered that the existing function space estimates that I was relying on for the large energy perturbation theory were not quite adequate, and in particular I needed a certain “inverse theorem” for a standard bilinear estimate which was not quite in the literature.  The purpose of this paper is to establish that inverse theorem, which may also have some application to other nonlinear wave equations.

Read the rest of this entry »

In set theory, a function {f: X \rightarrow Y} is defined as an object that evaluates every input {x} to exactly one output {f(x)}. However, in various branches of mathematics, it has become convenient to generalise this classical concept of a function to a more abstract one. For instance, in operator algebras, quantum mechanics, or non-commutative geometry, one often replaces commutative algebras of (real or complex-valued) functions on some space {X}, such as {C(X)} or {L^\infty(X)}, with a more general – and possibly non-commutative – algebra (e.g. a {C^*}-algebra or a von Neumann algebra). Elements in this more abstract algebra are no longer definable as functions in the classical sense of assigning a single value {f(x)} to every point {x \in X}, but one can still define other operations on these “generalised functions” (e.g. one can multiply or take inner products between two such objects).

Generalisations of functions are also very useful in analysis. In our study of {L^p} spaces, we have already seen one such generalisation, namely the concept of a function defined up to almost everywhere equivalence. Such a function {f} (or more precisely, an equivalence class of classical functions) cannot be evaluated at any given point {x}, if that point has measure zero. However, it is still possible to perform algebraic operations on such functions (e.g. multiplying or adding two functions together), and one can also integrate such functions on measurable sets (provided, of course, that the function has some suitable integrability condition). We also know that the {L^p} spaces can usually be described via duality, as the dual space of {L^{p'}} (except in some endpoint cases, namely when {p=1}, or when {p=\infty} and the underlying space is not {\sigma}-finite).

We have also seen (via the Lebesgue-Radon-Nikodym theorem) that locally integrable functions {f \in L^1_{\hbox{loc}}({\bf R})} on, say, the real line {{\bf R}}, can be identified with locally finite absolutely continuous measures {m_f} on the line, by multiplying Lebesgue measure {m} by the function {f}. So another way to generalise the concept of a function is to consider arbitrary locally finite Radon measures {\mu} (not necessarily absolutely continuous), such as the Dirac measure {\delta_0}. With this concept of “generalised function”, one can still add and subtract two measures {\mu, \nu}, and integrate any measure {\mu} against a (bounded) measurable set {E} to obtain a number {\mu(E)}, but one cannot evaluate a measure {\mu} (or more precisely, the Radon-Nikodym derivative {d\mu/dm} of that measure) at a single point {x}, and one also cannot multiply two measures together to obtain another measure. From the Riesz representation theorem, we also know that the space of (finite) Radon measures can be described via duality, as linear functionals on {C_c({\bf R})}.

There is an even larger class of generalised functions that is very useful, particularly in linear PDE, namely the space of distributions, say on a Euclidean space {{\bf R}^d}. In contrast to Radon measures {\mu}, which can be defined by how they “pair up” against continuous, compactly supported test functions {f \in C_c({\bf R}^d)} to create numbers {\langle f, \mu \rangle := \int_{{\bf R}^d} f\ d\overline{\mu}}, a distribution {\lambda} is defined by how it pairs up against a smooth compactly supported function {f \in C^\infty_c({\bf R}^d)} to create a number {\langle f, \lambda \rangle}. As the space {C^\infty_c({\bf R}^d)} of smooth compactly supported functions is smaller than (but dense in) the space {C_c({\bf R}^d)} of continuous compactly supported functions (and has a stronger topology), the space of distributions is larger than that of measures. But the space {C^\infty_c({\bf R}^d)} is closed under more operations than {C_c({\bf R}^d)}, and in particular is closed under differential operators (with smooth coefficients). Because of this, the space of distributions is similarly closed under such operations; in particular, one can differentiate a distribution and get another distribution, which is something that is not always possible with measures or {L^p} functions. But as measures or functions can be interpreted as distributions, this leads to the notion of a weak derivative for such objects, which makes sense (but only as a distribution) even for functions that are not classically differentiable. Thus the theory of distributions can allow one to rigorously manipulate rough functions “as if” they were smooth, although one must still be careful as some operations on distributions are not well-defined, most notably the operation of multiplying two distributions together. Nevertheless one can use this theory to justify many formal computations involving derivatives, integrals, etc. (including several computations used routinely in physics) that would be difficult to formalise rigorously in a purely classical framework.
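As a hedged one-dimensional illustration of the weak derivative just mentioned: the weak derivative of {f(x) = |x|} is the sign function {\hbox{sgn}(x)}, in the sense that {\int |x| \phi'(x)\ dx = -\int \hbox{sgn}(x) \phi(x)\ dx} for every test function {\phi} (integration by parts, with no boundary terms). This identity can be checked by numerical quadrature; in the sketch below a shifted Gaussian stands in for a compactly supported test function.

    # Sketch: checking numerically that the weak derivative of |x| is sgn(x), via the
    # integration-by-parts identity  int |x| phi'(x) dx = - int sgn(x) phi(x) dx.
    import numpy as np
    from scipy.integrate import quad

    def phi(t):          # stand-in for a test function (rapidly decaying rather than compactly supported)
        return np.exp(-(t - 0.3) ** 2)

    def phi_prime(t):
        return -2.0 * (t - 0.3) * np.exp(-(t - 0.3) ** 2)

    lhs, _ = quad(lambda t: np.abs(t) * phi_prime(t), -20, 20, points=[0.0])
    rhs, _ = quad(lambda t: -np.sign(t) * phi(t), -20, 20, points=[0.0])
    print(lhs, rhs)      # the two values agree to quadrature accuracy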

If one shrinks the space of distributions slightly, to the space of tempered distributions (which is formed by enlarging the dual class {C^\infty_c({\bf R}^d)} to the Schwartz class {{\mathcal S}({\bf R}^d)}), then one obtains closure under another important operation, namely the Fourier transform. This allows one to define various Fourier-analytic operations (e.g. pseudodifferential operators) on such distributions.

Of course, at the end of the day, one is usually not all that interested in distributions in their own right, but would like to be able to use them as a tool to study more classical objects, such as smooth functions. Fortunately, one can recover facts about smooth functions from facts about the (far rougher) space of distributions in a number of ways. For instance, if one convolves a distribution with a smooth, compactly supported function, one gets back a smooth function. This is a particularly useful fact in the theory of constant-coefficient linear partial differential equations such as {Lu=f}, as it allows one to recover a smooth solution {u} from smooth, compactly supported data {f} by convolving {f} with a specific distribution {G}, known as the fundamental solution of {L}. We will give some examples of this later in these notes.
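Here is a hedged one-dimensional toy example of this procedure. The locally integrable function {G(x) := |x|/2} is a fundamental solution of {L = \frac{d^2}{dx^2}} (its second distributional derivative is {\delta_0}), so for smooth, rapidly decaying {f} the convolution {u = G*f} solves {u'' = f}; the sketch below checks this numerically on a grid.

    # Sketch: solving u'' = f in one dimension by convolving f against the fundamental
    # solution G(x) = |x|/2 of d^2/dx^2, then verifying u'' ~ f by finite differences.
    import numpy as np

    width, n = 40.0, 2001
    x = np.linspace(-width / 2, width / 2, n)
    dx = x[1] - x[0]
    f = np.exp(-x ** 2) * (4 * x ** 2 - 2)        # a smooth, rapidly decaying source term
    G = np.abs(x[:, None] - x[None, :]) / 2       # the kernel G(x - y) sampled on the grid
    u = G @ f * dx                                # u(x) = int G(x - y) f(y) dy, as a Riemann sum

    u_xx = np.gradient(np.gradient(u, dx), dx)    # finite-difference second derivative
    interior = slice(100, n - 100)
    print(np.max(np.abs(u_xx[interior] - f[interior])))   # small: u does solve u'' = f away from the edges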

It is this unusual and useful combination of both being able to pass from classical functions to generalised functions (e.g. by differentiation) and then back from generalised functions to classical functions (e.g. by convolution) that sets the theory of distributions apart from other competing theories of generalised functions, in particular allowing one to justify many formal calculations in PDE and Fourier analysis rigorously with relatively little additional effort. On the other hand, being defined by linear duality, the theory of distributions becomes somewhat less useful when one moves to more nonlinear problems, such as nonlinear PDE. However, they still serve an important supporting role in such problems as an “ambient space” of functions, inside of which one carves out more useful function spaces, such as Sobolev spaces, which we will discuss in the next set of notes.

Read the rest of this entry »

From Tim Gowers’ blog comes the announcement that the Tricki – a wiki for various tricks and strategies for proving mathematical results – is now live.  (My own articles for the Tricki are also on this blog; also Ben Green has written up an article on using finite fields to prove results about infinite fields which is loosely based on my own post on the topic, which is in turn based on an article of Serre.)  It seems to already be growing at a reasonable rate, with many contributors.

Recently, I have been studying the concept of amenability on groups. This concept can be defined in a “combinatorial” or “finitary” fashion, using Følner sequences, and also in a more “functional-analytic” or “infinitary” fashion, using invariant means. I wanted to get some practice passing back and forth between these two definitions, so I wrote down some notes on how to do this, and also how to take some facts about amenability that are usually proven in one setting, and prove them instead in the other. These notes are thus mostly for my own benefit, but I thought I might post them here also, in case anyone else is interested.

Read the rest of this entry »

The famous Gödel completeness theorem in logic (not to be confused with the even more famous Gödel incompleteness theorem) roughly states the following:

Theorem 1 (Gödel completeness theorem, informal statement) Let {\Gamma} be a first-order theory (a formal language {{\mathcal L}}, together with a set of axioms, i.e. sentences assumed to be true), and let {\phi} be a sentence in the formal language. Assume also that the language {{\mathcal L}} has at most countably many symbols. Then the following are equivalent:

  • (i) (Syntactic consequence) {\phi} can be deduced from the axioms in {\Gamma} by a finite number of applications of the laws of deduction in first order logic. (This property is abbreviated as {\Gamma \vdash \phi}.)
  • (ii) (Semantic consequence) Every structure {{\mathfrak U}} which satisfies or models {\Gamma}, also satisfies {\phi}. (This property is abbreviated as {\Gamma \models \phi}.)
  • (iii) (Semantic consequence for at most countable models) Every structure {{\mathfrak U}} which is at most countable, and which models {\Gamma}, also satisfies {\phi}.

One can also formulate versions of the completeness theorem for languages with uncountably many symbols, but I will not do so here. One can also force other cardinalities on the model {{\mathfrak U}} by using the Löwenheim-Skolem theorem.

To state this theorem even more informally, any (first-order) result which is true in all models of a theory, must be logically deducible from that theory, and vice versa. (For instance, any result which is true for all groups, must be deducible from the group axioms; any result which is true for all systems obeying Peano arithmetic, must be deducible from the Peano axioms; and so forth.) In fact, it suffices to check countable and finite models only; for instance, any first-order statement which is true for all finite or countable groups, is in fact true for all groups! Informally, a first-order language with only countably many symbols cannot “detect” whether a given structure is countably or uncountably infinite. Thus for instance even the ZFC axioms of set theory must have some at most countable model, even though one can use ZFC to prove the existence of uncountable sets; this is known as Skolem’s paradox. (To resolve the paradox, one needs to carefully distinguish between an object in a set theory being “externally” countable in the structure that models that theory, and being “internally” countable within that theory.)

Of course, a theory {\Gamma} may contain undecidable statements {\phi} – sentences which are neither provable nor disprovable in the theory. By the completeness theorem, this is equivalent to saying that {\phi} is satisfied by some models of {\Gamma} but not by other models. Thus the completeness theorem is compatible with the incompleteness theorem: recursively enumerable theories such as Peano arithmetic are modeled by the natural numbers {{\mathbb N}}, but are also modeled by other structures, and there are sentences satisfied by {{\mathbb N}} which are not satisfied by other models of Peano arithmetic, and are thus undecidable within that arithmetic.

An important corollary of the completeness theorem is the compactness theorem:

Corollary 2 (Compactness theorem, informal statement) Let {\Gamma} be a first-order theory whose language has at most countably many symbols. Then the following are equivalent:

  • (i) {\Gamma} is consistent, i.e. it is not possible to logically deduce a contradiction from the axioms in {\Gamma}.
  • (ii) {\Gamma} is satisfiable, i.e. there exists a structure {{\mathfrak U}} that models {\Gamma}.
  • (iii) There exists a structure {{\mathfrak U}} which is at most countable, that models {\Gamma}.
  • (iv) Every finite subset {\Gamma'} of {\Gamma} is consistent.
  • (v) Every finite subset {\Gamma'} of {\Gamma} is satisfiable.
  • (vi) Every finite subset {\Gamma'} of {\Gamma} is satisfiable with an at most countable model.

Indeed, the equivalence of (i)-(iii), or (iv)-(vi), follows directly from the completeness theorem, while the equivalence of (i) and (iv) follows from the fact that any logical deduction has finite length and so can involve at most finitely many of the axioms in {\Gamma}. (Again, the theorem can be generalised to uncountable languages, but the models become uncountable also.)

There is a consequence of the compactness theorem which more closely resembles the sequential concept of compactness. Given a sequence {{\mathfrak U}_1, {\mathfrak U}_2, \ldots} of structures for {{\mathcal L}}, and another structure {{\mathfrak U}} for {{\mathcal L}}, let us say that {{\mathfrak U}_n} converges elementarily to {{\mathfrak U}} if every sentence {\phi} which is satisfied by {{\mathfrak U}}, is also satisfied by {{\mathfrak U}_n} for sufficiently large {n}. (Replacing {\phi} by its negation {\neg \phi}, we also see that every sentence that is not satisfied by {{\mathfrak U}}, is not satisfied by {{\mathfrak U}_n} for sufficiently large {n}.) Note that the limit {{\mathfrak U}} is only unique up to elementary equivalence. Clearly, if each of the {{\mathfrak U}_n} models some theory {\Gamma}, then the limit {{\mathfrak U}} will also; thus for instance the elementary limit of a sequence of groups is still a group, the elementary limit of a sequence of rings is still a ring, etc.

Corollary 3 (Sequential compactness theorem) Let {{\mathcal L}} be a language with at most countably many symbols, and let {{\mathfrak U}_1, {\mathfrak U}_2, \ldots} be a sequence of structures for {{\mathcal L}}. Then there exists a subsequence {{\mathfrak U}_{n_j}} which converges elementarily to a limit {{\mathfrak U}} which is at most countable.

Proof: For each structure {{\mathfrak U}_n}, let {\hbox{Th}({\mathfrak U}_n)} be the theory of that structure, i.e. the set of all sentences that are satisfied by that structure. One can view that theory as a point in {\{0,1\}^{{\mathcal S}}}, where {{\mathcal S}} is the set of all sentences in the language {{\mathcal L}}. Since {{\mathcal L}} has at most countably many symbols, {{\mathcal S}} is at most countable, and so (by the sequential Tychonoff theorem) {\{0,1\}^{{\mathcal S}}} is sequentially compact in the product topology. (This can also be seen directly by the usual Arzelá-Ascoli diagonalisation argument.) Thus we can find a subsequence {\hbox{Th}({\mathfrak U}_{n_j})} which converges in the product topology to a limit theory {\Gamma \in \{0,1\}^{{\mathcal S}}}, thus every sentence in {\Gamma} is satisfied by {{\mathfrak U}_{n_j}} for sufficiently large {j} (and every sentence not in {\Gamma} is not satisfied by {{\mathfrak U}_{n_j}} for sufficiently large {j}). In particular, any finite subset of {\Gamma} is satisfiable, hence consistent; by the compactness theorem, {\Gamma} itself is therefore consistent, and has an at most countable model {{\mathfrak U}}. Also, each of the theories {\hbox{Th}({\mathfrak U}_{n_j})} is clearly complete (given any sentence {\phi}, either {\phi} or {\neg \phi} is in the theory), and so {\Gamma} is complete as well. One concludes that {\Gamma} is the theory of {{\mathfrak U}}, and hence {{\mathfrak U}} is the elementary limit of the {{\mathfrak U}_{n_j}} as claimed. \Box

[It is also possible to state the compactness theorem using the topological notion of compactness, as follows: let {X} be the space of all structures of a given language {{\mathcal L}}, quotiented by elementary equivalence. One can define a topology on {X} by taking the sets {\{ {\mathfrak U} \in X: {\mathfrak U} \models \phi \}} as a sub-base, where {\phi} ranges over all sentences. Then the compactness theorem is equivalent to the assertion that {X} is topologically compact.]

One can use the sequential compactness theorem to build a number of interesting “non-standard” models of various theories. For instance, consider the language {{\mathcal L}} used by Peano arithmetic (which contains the operations {+, \times} and the successor operation {S}, the relation {=}, and the constant {0}), and adjoin a new constant {N} to create an expanded language {{\mathcal L} \cup \{N\}}. For each natural number {n \in {\Bbb N}}, let {{\Bbb N}_n} be a structure for {{\mathcal L} \cup \{N\}} which consists of the natural numbers {{\Bbb N}} (with the usual interpretations of {+}, {\times}, etc.) and interprets the symbol {N} as the natural number {n}. By the sequential compactness theorem (Corollary 3), some subsequence of {{\Bbb N}_n} must converge elementarily to a new structure {*{\Bbb N}} for {{\mathcal L} \cup \{N\}}, which still models Peano arithmetic, but now has the additional property that {N>n} for every (standard) natural number {n}; thus we have managed to create a non-standard model of Peano arithmetic which contains a non-standardly large number (one which is larger than every standard natural number).

The sequential compactness theorem also lets us construct infinitary limits of various sequences of finitary objects; for instance, one can construct infinite pseudo-finite fields as the elementary limits of sequences of finite fields. I recently discovered that several other correspondence principles between finitary and infinitary objects, such as the Furstenberg correspondence principle between sets of integers and dynamical systems, or the more recent correspondence principles concerning graph limits, can be viewed as special cases of the sequential compactness theorem; it also seems possible to encode much of the sum-product theory in finite fields in an infinitary setting using this theorem. I hope to discuss these points in more detail in a later post.

In this post, I wish to review (partly for my own benefit) the proof of the completeness (and hence compactness) theorem. The material here is quite standard (I basically follow the usual proof of Henkin, taking advantage of Skolemisation), but perhaps the concept of an elementary limit is not as well-known outside of logic as it might be. (The closely related concept of an ultraproduct is better known, and can be used to prove most of the compactness theorem already, thanks to Los’s theorem, but I do not know how to use ultraproducts to ensure that the limiting model is countable. However, one can think (intuitively, at least) of the limit model {{\mathfrak U}} in the above theorem as being the set of “constructible” elements of an ultraproduct of the {{\mathfrak U}_n}.)

In order to emphasise the main ideas in the proof, I will gloss over some of the more technical details in the proofs, relying instead on informal arguments and examples at various points.

Read the rest of this entry »

In these notes we lay out the basic theory of the Fourier transform, which is of course the most fundamental tool in harmonic analysis and also of major importance in related fields (functional analysis, complex analysis, PDE, number theory, additive combinatorics, representation theory, signal processing, etc.). The Fourier transform, in conjunction with the Fourier inversion formula, allows one to take essentially arbitrary (complex-valued) functions on a group {G} (or more generally, a space {X} that {G} acts on, e.g. a homogeneous space {G/H}), and decompose them as a (discrete or continuous) superposition of much more symmetric functions on the domain, such as characters {\chi: G \rightarrow S^1}; the precise superposition is given by Fourier coefficients {\hat f(\xi)}, which take values in some dual object such as the Pontryagin dual {\hat G} of {G}. Characters behave in a very simple manner with respect to translation (indeed, they are eigenfunctions of the translation action), and so the Fourier transform tends to simplify any mathematical problem which enjoys a translation invariance symmetry (or an approximation to such a symmetry), and is somehow “linear” (i.e. it interacts nicely with superpositions). In particular, Fourier analytic methods are particularly useful for studying operations such as convolution {f, g \mapsto f*g} and set-theoretic addition {A, B \mapsto A+B}, or the closely related problem of counting solutions to additive problems such as {x = a_1 + a_2 + a_3} or {x = a_1 - a_2}, where {a_1, a_2, a_3} are constrained to lie in specific sets {A_1, A_2, A_3}. The Fourier transform is also a particularly powerful tool for solving constant-coefficient linear ODE and PDE (because of the translation invariance), and can also approximately solve some variable-coefficient (or slightly non-linear) equations if the coefficients vary smoothly enough and the nonlinear terms are sufficiently tame.

The Fourier transform {\hat f(\xi)} also provides an important new way of looking at a function {f(x)}, as it highlights the distribution of {f} in frequency space (the domain of the frequency variable {\xi}) rather than physical space (the domain of the physical variable {x}). A given property of {f} in the physical domain may be transformed to a rather different-looking property of {\hat f} in the frequency domain. For instance:

  • Smoothness of {f} in the physical domain corresponds to decay of {\hat f} in the Fourier domain, and conversely. (More generally, fine scale properties of {f} tend to manifest themselves as coarse scale properties of {\hat f}, and conversely.)
  • Convolution in the physical domain corresponds to pointwise multiplication in the Fourier domain, and conversely.
  • Constant coefficient differential operators such as {d/dx} in the physical domain correspond to multiplication by polynomials such as {2\pi i \xi} in the Fourier domain, and conversely.
  • More generally, translation invariant operators in the physical domain correspond to multiplication by symbols in the Fourier domain, and conversely.
  • Rescaling in the physical domain by an invertible linear transformation corresponds to an inverse (adjoint) rescaling in the Fourier domain.
  • Restriction to a subspace (or subgroup) in the physical domain corresponds to projection to the dual quotient space (or quotient group) in the Fourier domain, and conversely.
  • Frequency modulation in the physical domain corresponds to translation in the frequency domain, and conversely.

(We will make these statements more precise below.)

On the other hand, some operations in the physical domain remain essentially unchanged in the Fourier domain. Most importantly, the {L^2} norm (or energy) of a function {f} is the same as that of its Fourier transform, and more generally the inner product {\langle f, g \rangle} of two functions {f}, {g} is the same as that of their Fourier transforms. Indeed, the Fourier transform is a unitary operator on {L^2} (a fact which is variously known as the Plancherel theorem or the Parseval identity). This makes it easier to pass back and forth between the physical domain and frequency domain, so that one can combine techniques that are easy to execute in the physical domain with other techniques that are easy to execute in the frequency domain. (In fact, one can combine the physical and frequency domains together into a product domain known as phase space, and there are entire fields of mathematics (e.g. microlocal analysis, geometric quantisation, time-frequency analysis) devoted to performing analysis on these sorts of spaces directly, but this is beyond the scope of this course.)
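Several of these correspondences can already be seen concretely on the finite cyclic group {{\bf Z}/N{\bf Z}}, where the Fourier transform is just the discrete Fourier transform. The following hedged sketch verifies the convolution identity, a Parseval/Plancherel identity, and the modulation-translation correspondence numerically; the exact constant factors (such as the factor of {N} below) depend on the DFT normalisation used.

    # Sketch: Fourier correspondences on the cyclic group Z/NZ, using numpy's (unnormalised)
    # discrete Fourier transform.  With this convention, Parseval carries a factor of N.
    import numpy as np

    N = 64
    rng = np.random.default_rng(3)
    f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    g = rng.standard_normal(N) + 1j * rng.standard_normal(N)

    # Convolution <-> pointwise multiplication: (f*g)^ = fhat . ghat
    conv = np.array([np.sum(f * g[(k - np.arange(N)) % N]) for k in range(N)])
    print(np.allclose(np.fft.fft(conv), np.fft.fft(f) * np.fft.fft(g)))

    # Plancherel / Parseval: sum_k |fhat(k)|^2 = N * sum_x |f(x)|^2 in this normalisation
    print(np.allclose(np.sum(np.abs(np.fft.fft(f)) ** 2), N * np.sum(np.abs(f) ** 2)))

    # Modulation in the physical domain <-> translation in the frequency domain
    chi = np.exp(2j * np.pi * 5 * np.arange(N) / N)        # a character of Z/NZ
    print(np.allclose(np.fft.fft(chi * f), np.roll(np.fft.fft(f), 5)))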

In these notes, we briefly discuss the general theory of the Fourier transform, but will mainly focus on the two classical domains for Fourier analysis: the torus {{\Bbb T}^d := ({\bf R}/{\bf Z})^d}, and the Euclidean space {{\bf R}^d}. For these domains one has the advantage of being able to perform very explicit algebraic calculations, involving concrete functions such as plane waves {x \mapsto e^{2\pi i x \cdot \xi}} or Gaussians {x \mapsto A^{d/2} e^{-\pi A |x|^2}}.

Read the rest of this entry »
