A fundamental characteristic of many mathematical spaces (e.g. vector spaces, metric spaces, topological spaces, etc.) is their dimension, which measures the “complexity” or “degrees of freedom” inherent in the space. There is no single notion of dimension; instead, there are a variety of different versions of this concept, with different versions being suitable for different classes of mathematical spaces. Typically, a single mathematical object may have several subtly different notions of dimension that one can place on it, which will be related to each other, and which will often agree with each other in “non-pathological” cases, but can also deviate from each other in many other situations. For instance:
- One can define the dimension of a space by seeing how it compares to some standard reference spaces, such as $\mathbf{R}^n$ or $\mathbf{C}^n$; one may view a space as having dimension $n$ if it can be (locally or globally) identified with a standard $n$-dimensional space. The dimension of a vector space or a manifold can be defined in this fashion.
- Another way to define dimension of a space is as the largest number of “independent” objects one can place inside that space; this can be used to give an alternate notion of dimension for a vector space, or of an algebraic variety, as well as the closely related notion of the transcendence degree of a field. The concept of VC dimension in machine learning also broadly falls into this category.
- One can also try to define dimension inductively, for instance declaring a space $X$ to be $n$-dimensional if it can be “separated” somehow by an $(n-1)$-dimensional object; thus an $n$-dimensional object will tend to have “maximal chains” of sub-objects of length $n$ (or $n+1$, depending on how one initialises the chain and how one defines length). This can give a notion of dimension for a topological space or a commutative ring.
The notions of dimension as defined above necessarily take values in the natural numbers (or the cardinal numbers); there is no such space as $\mathbf{R}^{\sqrt{2}}$, for instance, nor can one talk about a basis consisting of $\pi$ linearly independent elements, or a chain of maximal ideals of length $e$. There is however a somewhat different approach to the concept of dimension which makes no distinction between integer and non-integer dimensions, and is suitable for studying “rough” sets such as fractals. The starting point is to observe that in the $d$-dimensional space $\mathbf{R}^d$, the volume $V$ of a ball of radius $R$ grows like $R^d$, thus giving the following heuristic relationship
$$\frac{\log V}{\log R} \approx d$$
between volume, scale, and dimension. Formalising this heuristic leads to a number of useful notions of dimension for subsets of $\mathbf{R}^n$ (or more generally, for metric spaces), including (upper and lower) Minkowski dimension (also known as box-packing dimension or Minkowski-Bouligand dimension), and Hausdorff dimension.
[In $K$-theory, it is also convenient to work with “virtual” vector spaces or vector bundles, such as formal differences of such spaces, which may therefore have a negative dimension; but as far as I am aware there is no connection between this notion of dimension and the metric ones given here.]
Minkowski dimension can either be defined externally (relating the external volume of $\delta$-neighbourhoods of a set $E$ to the scale $\delta$) or internally (relating the internal $\delta$-entropy of $E$ to the scale). Hausdorff dimension is defined internally by first introducing the $d$-dimensional Hausdorff measure of a set $E$ for any parameter $0 \leq d < \infty$, which generalises the familiar notions of length, area, and volume to non-integer dimensions, or to rough sets, and is of interest in its own right. Hausdorff dimension has a lengthier definition than its Minkowski counterpart, but is more robust with respect to operations such as countable unions, and is generally accepted as the “standard” notion of dimension in metric spaces. We will compare these concepts against each other later in these notes.
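To illustrate the internal (covering-number) point of view, here is a minimal sketch that estimates the box-counting dimension of the middle-thirds Cantor set. The choice of example, the function names, and the use of exact integer arithmetic (a point $a/3^{\mathrm{level}}$ is stored as the integer $a$) are our own assumptions for illustration, not part of the notes.

```python
import math

def cantor_points(level):
    """Left endpoints of the 2**level intervals of the level-th Cantor
    approximation, stored as integers a representing a / 3**level."""
    points = [0]
    for _ in range(level):
        # Append a ternary digit 0 or 2 to every existing endpoint.
        points = [3 * a for a in points] + [3 * a + 2 for a in points]
    return points

def box_count(points, level, k):
    """Number of boxes of side 3**-k (aligned to the ternary grid) that
    contain at least one of the given points; requires k <= level."""
    scale = 3 ** (level - k)
    return len({a // scale for a in points})

def box_dimension_estimate(level, k):
    """log(number of boxes) / log(1 / box side) at scale 3**-k."""
    n_boxes = box_count(cantor_points(level), level, k)
    return math.log(n_boxes) / math.log(3 ** k)

if __name__ == "__main__":
    print(box_dimension_estimate(10, 6), math.log(2) / math.log(3))
```

At scale $3^{-k}$ the set meets exactly $2^k$ grid boxes, so the ratio $\log N / \log(1/\delta)$ equals $\log 2 / \log 3$ at every such scale, a non-integer value consistent with the volume-scale heuristic.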
One use of the notion of dimension is to create finer distinctions between various types of “small” subsets of spaces such as $\mathbf{R}^d$, beyond what can be achieved by the usual Lebesgue measure (or Baire category). For instance, a point, line, and plane in $\mathbf{R}^3$ all have zero measure with respect to three-dimensional Lebesgue measure (and are nowhere dense), but of course have different dimensions ($0$, $1$, and $2$ respectively). (The Kakeya set conjecture, discussed recently on this blog, offers another good example.) This can be used to clarify the nature of various singularities, such as that arising from non-smooth solutions to PDE; a function which is non-smooth on a set of large Hausdorff dimension can be considered less smooth than one which is non-smooth on a set of small Hausdorff dimension, even if both are smooth almost everywhere. While many properties of the singular set of such a function are worth studying (e.g. its rectifiability), understanding its dimension is often an important starting point. The interplay between these types of concepts is the subject of geometric measure theory.
In set theory, a function $f: X \to Y$ is defined as an object that evaluates every input $x$ to exactly one output $f(x)$. However, in various branches of mathematics, it has become convenient to generalise this classical concept of a function to a more abstract one. For instance, in operator algebras, quantum mechanics, or non-commutative geometry, one often replaces commutative algebras of (real or complex-valued) functions on some space $X$, such as $C(X)$ or $L^\infty(X)$, with a more general – and possibly non-commutative – algebra (e.g. a $C^*$-algebra or a von Neumann algebra). Elements in this more abstract algebra are no longer definable as functions in the classical sense of assigning a single value $f(x)$ to every point $x \in X$, but one can still define other operations on these “generalised functions” (e.g. one can multiply or take inner products between two such objects).
Generalisations of functions are also very useful in analysis. In our study of $L^p$ spaces, we have already seen one such generalisation, namely the concept of a function defined up to almost everywhere equivalence. Such a function $f$ (or more precisely, an equivalence class of classical functions) cannot be evaluated at any given point $x$, if that point has measure zero. However, it is still possible to perform algebraic operations on such functions (e.g. multiplying or adding two functions together), and one can also integrate such functions on measurable sets (provided, of course, that the function has some suitable integrability condition). We also know that the $L^p$ spaces can usually be described via duality, as the dual space of $L^{p'}$ (except in some endpoint cases, namely when $p = \infty$, or when $p = 1$ and the underlying space is not $\sigma$-finite).
We have also seen (via the Lebesgue-Radon-Nikodym theorem) that locally integrable functions $f \in L^1_{\mathrm{loc}}(\mathbf{R})$ on, say, the real line $\mathbf{R}$, can be identified with locally finite absolutely continuous measures $m_f$ on the line, by multiplying Lebesgue measure $m$ by the function $f$. So another way to generalise the concept of a function is to consider arbitrary locally finite Radon measures $\mu$ (not necessarily absolutely continuous), such as the Dirac measure $\delta_0$. With this concept of “generalised function”, one can still add and subtract two measures $\mu, \nu$, and integrate any measure $\mu$ against a (bounded) measurable set $E$ to obtain a number $\mu(E)$, but one cannot evaluate a measure $\mu$ (or more precisely, the Radon-Nikodym derivative $d\mu/dm$ of that measure) at a single point $x$, and one also cannot multiply two measures together to obtain another measure. From the Riesz representation theorem, we also know that the space of (finite) Radon measures can be described via duality, as linear functionals on $C_c(\mathbf{R})$.
There is an even larger class of generalised functions that is very useful, particularly in linear PDE, namely the space of distributions, say on a Euclidean space $\mathbf{R}^d$. In contrast to Radon measures $\mu$, which can be defined by how they “pair up” against continuous, compactly supported test functions $f \in C_c(\mathbf{R}^d)$ to create numbers $\langle f, \mu \rangle := \int_{\mathbf{R}^d} f\ d\mu$, a distribution $\lambda$ is defined by how it pairs up against a smooth compactly supported function $f \in C^\infty_c(\mathbf{R}^d)$ to create a number $\langle f, \lambda \rangle$. As the space $C^\infty_c(\mathbf{R}^d)$ of smooth compactly supported functions is smaller than (but dense in) the space $C_c(\mathbf{R}^d)$ of continuous compactly supported functions (and has a stronger topology), the space of distributions is larger than that of measures. But the space $C^\infty_c(\mathbf{R}^d)$ is closed under more operations than $C_c(\mathbf{R}^d)$, and in particular is closed under differential operators (with smooth coefficients). Because of this, the space of distributions is similarly closed under such operations; in particular, one can differentiate a distribution and get another distribution, which is something that is not always possible with measures or $L^p$ functions. But as measures or functions can be interpreted as distributions, this leads to the notion of a weak derivative for such objects, which makes sense (but only as a distribution) even for functions that are not classically differentiable. Thus the theory of distributions can allow one to rigorously manipulate rough functions “as if” they were smooth, although one must still be careful as some operations on distributions are not well-defined, most notably the operation of multiplying two distributions together. Nevertheless one can use this theory to justify many formal computations involving derivatives, integrals, etc. (including several computations used routinely in physics) that would be difficult to formalise rigorously in a purely classical framework.
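As a concrete (and purely numerical) illustration of the weak derivative, the following sketch checks that the weak derivative of $x \mapsto |x|$ on the real line is the sign function, in the sense that $-\int |x| \phi'(x)\ dx = \int \mathrm{sgn}(x) \phi(x)\ dx$ for a test function $\phi$. The particular bump function, the shift `CENTRE`, and the midpoint-rule quadrature are assumptions made only for this example.

```python
import math

CENTRE = 0.3  # shift of the test function, chosen so that it is not even

def bump(x):
    """Smooth function supported on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def bump_derivative(x):
    """Classical derivative of bump, by the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def midpoint_integral(func, a, b, cells):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / cells
    return h * sum(func(a + (j + 0.5) * h) for j in range(cells))

def phi(x):
    return bump(x - CENTRE)

def phi_prime(x):
    return bump_derivative(x - CENTRE)

# The support of phi is (-0.7, 1.3); with 20000 cells the kink of |x| at 0
# falls on a cell boundary, so no midpoint is evaluated at the singularity.
a, b, cells = CENTRE - 1.0, CENTRE + 1.0, 20000

# Pairing of the distributional derivative of |x| with phi: -<phi', |x|>.
lhs = -midpoint_integral(lambda x: abs(x) * phi_prime(x), a, b, cells)
# Pairing of the candidate weak derivative sgn(x) with phi.
rhs = midpoint_integral(lambda x: math.copysign(1.0, x) * phi(x), a, b, cells)

if __name__ == "__main__":
    print(lhs, rhs)
```

The two pairings agree up to quadrature error, even though $|x|$ is not classically differentiable at the origin; only integration by parts against the smooth test function is used.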
If one shrinks the space of distributions slightly, to the space of tempered distributions (which is formed by enlarging the dual class $C^\infty_c(\mathbf{R}^d)$ to the Schwartz class $\mathcal{S}(\mathbf{R}^d)$), then one obtains closure under another important operation, namely the Fourier transform. This allows one to define various Fourier-analytic operations (e.g. pseudodifferential operators) on such distributions.
Of course, at the end of the day, one is usually not all that interested in distributions in their own right, but would like to be able to use them as a tool to study more classical objects, such as smooth functions. Fortunately, one can recover facts about smooth functions from facts about the (far rougher) space of distributions in a number of ways. For instance, if one convolves a distribution with a smooth, compactly supported function, one gets back a smooth function. This is a particularly useful fact in the theory of constant-coefficient linear partial differential equations such as $Lu = f$, as it allows one to recover a smooth solution $u$ from smooth, compactly supported data $f$ by convolving $f$ with a specific distribution $G$, known as the fundamental solution of $L$. We will give some examples of this later in these notes.
It is this unusual and useful combination of both being able to pass from classical functions to generalised functions (e.g. by differentiation) and then back from generalised functions to classical functions (e.g. by convolution) that sets the theory of distributions apart from other competing theories of generalised functions, in particular allowing one to justify many formal calculations in PDE and Fourier analysis rigorously with relatively little additional effort. On the other hand, being defined by linear duality, the theory of distributions becomes somewhat less useful when one moves to more nonlinear problems, such as nonlinear PDE. However, distributions still serve an important supporting role in such problems as an “ambient space” of functions, inside of which one carves out more useful function spaces, such as Sobolev spaces, which we will discuss in the next set of notes.
In these notes we lay out the basic theory of the Fourier transform, which is of course the most fundamental tool in harmonic analysis and also of major importance in related fields (functional analysis, complex analysis, PDE, number theory, additive combinatorics, representation theory, signal processing, etc.). The Fourier transform, in conjunction with the Fourier inversion formula, allows one to take essentially arbitrary (complex-valued) functions $f$ on a group $G$ (or more generally, a space $X$ that $G$ acts on, e.g. a homogeneous space $G/H$), and decompose them as a (discrete or continuous) superposition of much more symmetric functions on the domain, such as characters $\chi: G \to S^1$; the precise superposition is given by Fourier coefficients $\hat f(\xi)$, which take values in some dual object such as the Pontryagin dual $\hat G$ of $G$. Characters behave in a very simple manner with respect to translation (indeed, they are eigenfunctions of the translation action), and so the Fourier transform tends to simplify any mathematical problem which enjoys a translation invariance symmetry (or an approximation to such a symmetry), and is somehow “linear” (i.e. it interacts nicely with superpositions). In particular, Fourier analytic methods are particularly useful for studying operations such as convolution $f, g \mapsto f * g$ and set-theoretic addition $A, B \mapsto A + B$, or the closely related problem of counting solutions to additive problems such as $x = a_1 + a_2 + a_3$ or $x = a_1 - a_2$, where $a_1, a_2, a_3$ are constrained to lie in specific sets $A_1, A_2, A_3$. The Fourier transform is also a particularly powerful tool for solving constant-coefficient linear ODE and PDE (because of the translation invariance), and can also approximately solve some variable-coefficient (or slightly non-linear) equations if the coefficients vary smoothly enough and the nonlinear terms are sufficiently tame.
The Fourier transform $\hat f(\xi)$ also provides an important new way of looking at a function $f(x)$, as it highlights the distribution of $f$ in frequency space (the domain of the frequency variable $\xi$) rather than physical space (the domain of the physical variable $x$). A given property of $f$ in the physical domain may be transformed to a rather different-looking property of $\hat f$ in the frequency domain. For instance:
- Smoothness of $f$ in the physical domain corresponds to decay of $\hat f$ in the Fourier domain, and conversely. (More generally, fine scale properties of $f$ tend to manifest themselves as coarse scale properties of $\hat f$, and conversely.)
- Convolution in the physical domain corresponds to pointwise multiplication in the Fourier domain, and conversely.
- Constant coefficient differential operators such as $\frac{d}{dx}$ in the physical domain correspond to multiplication by polynomials such as $2\pi i \xi$ in the Fourier domain, and conversely.
- More generally, translation invariant operators in the physical domain correspond to multiplication by symbols in the Fourier domain, and conversely.
- Rescaling in the physical domain by an invertible linear transformation corresponds to an inverse (adjoint) rescaling in the Fourier domain.
- Restriction to a subspace (or subgroup) in the physical domain corresponds to projection to the dual quotient space (or quotient group) in the Fourier domain, and conversely.
- Frequency modulation in the physical domain corresponds to translation in the frequency domain, and conversely.
(We will make these statements more precise below.)
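Two of the correspondences above can be checked by hand on the simplest possible domain, the cyclic group $\mathbf{Z}/n\mathbf{Z}$. The sketch below is only an illustration under our own assumptions: the normalisation $\hat f(\xi) = \sum_x f(x) e^{-2\pi i x \xi / n}$, the sample data, and the helper names are not taken from the notes.

```python
import cmath

def dft(f):
    """Fourier transform on Z/nZ: fhat(xi) = sum_x f(x) exp(-2 pi i x xi / n)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n))
            for xi in range(n)]

def cyclic_convolution(f, g):
    """(f * g)(x) = sum_y f(y) g(x - y), with indices taken modulo n."""
    n = len(f)
    return [sum(f[y] * g[(x - y) % n] for y in range(n)) for x in range(n)]

def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

f = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0]
g = [0.0, 1.0, 1.0, 0.0, -2.0, 0.25]
n = len(f)
f_hat, g_hat = dft(f), dft(g)

# Convolution in the physical domain vs. pointwise product in frequency.
convolution_error = max_abs_diff(
    dft(cyclic_convolution(f, g)),
    [a * b for a, b in zip(f_hat, g_hat)],
)

# Modulation by the frequency xi0 vs. translation of fhat by xi0.
xi0 = 2
modulated = [cmath.exp(2j * cmath.pi * x * xi0 / n) * f[x] for x in range(n)]
translated = [f_hat[(xi - xi0) % n] for xi in range(n)]
modulation_error = max_abs_diff(dft(modulated), translated)

if __name__ == "__main__":
    print(convolution_error, modulation_error)
```

Both errors are at the level of floating-point rounding. The direct $O(n^2)$ sums are used deliberately, so that each identity is visibly a statement about the definition rather than about a particular fast algorithm.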
On the other hand, some operations in the physical domain remain essentially unchanged in the Fourier domain. Most importantly, the $L^2$ norm (or energy) of a function $f$ is the same as that of its Fourier transform, and more generally the inner product $\langle f, g \rangle$ of two functions $f, g$ is the same as that of their Fourier transforms. Indeed, the Fourier transform is a unitary operator on $L^2$ (a fact which is variously known as the Plancherel theorem or the Parseval identity). This makes it easier to pass back and forth between the physical domain and frequency domain, so that one can combine techniques that are easy to execute in the physical domain with other techniques that are easy to execute in the frequency domain. (In fact, one can combine the physical and frequency domains together into a product domain known as phase space, and there are entire fields of mathematics (e.g. microlocal analysis, geometric quantisation, time-frequency analysis) devoted to performing analysis on these sorts of spaces directly, but this is beyond the scope of this course.)
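The unitarity can again be tested on the finite model $\mathbf{Z}/n\mathbf{Z}$. In this sketch the normalising factor $1/\sqrt{n}$ and the sample vectors are assumptions of ours, chosen so that the discrete transform is exactly unitary.

```python
import cmath
import math

def unitary_dft(f):
    """Fourier transform on Z/nZ, normalised by 1/sqrt(n) so it is unitary."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n))
            / math.sqrt(n)
            for xi in range(n)]

def inner_product(f, g):
    """<f, g> = sum_x f(x) * conj(g(x))."""
    return sum(a * b.conjugate() for a, b in zip(f, g))

f = [1 + 2j, -1j, 0.5 + 0j, 3 - 1j, 0j]
g = [2 + 0j, 1 + 1j, -4j, 0.25 + 0j, -1 + 0.5j]
f_hat, g_hat = unitary_dft(f), unitary_dft(g)

# Energy identity: ||fhat||^2 = ||f||^2.
norm_error = abs(inner_product(f_hat, f_hat) - inner_product(f, f))
# Inner product identity: <fhat, ghat> = <f, g>.
inner_error = abs(inner_product(f_hat, g_hat) - inner_product(f, g))

if __name__ == "__main__":
    print(norm_error, inner_error)
```

With the unnormalised transform one would instead pick up a factor of $n$ in both identities, which is why the choice of normalisation has to be stated before comparing norms.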
In these notes, we briefly discuss the general theory of the Fourier transform, but will mainly focus on the two classical domains for Fourier analysis: the torus $\mathbf{T}^d := (\mathbf{R}/\mathbf{Z})^d$, and the Euclidean space $\mathbf{R}^d$. For these domains one has the advantage of being able to perform very explicit algebraic calculations, involving concrete functions such as plane waves $x \mapsto e^{2\pi i x \cdot \xi}$ or Gaussians $x \mapsto A^{d/2} e^{-\pi A |x|^2}$.
In the previous two quarters, we have been focusing largely on the “soft” side of real analysis, which is primarily concerned with “qualitative” properties such as convergence, compactness, measurability, and so forth. In contrast, we will begin this quarter with more of an emphasis on the “hard” side of real analysis, in which we study estimates and upper and lower bounds of various quantities, such as norms of functions or operators. (Of course, the two sides of analysis are closely connected to each other; an understanding of both sides and their interrelationships is needed in order to get the broadest and most complete perspective for this subject.)
One basic tool in hard analysis is that of interpolation, which allows one to start with a hypothesis of two (or more) “upper bound” estimates, e.g. $A_0 \leq B_0$ and $A_1 \leq B_1$, and conclude a family of intermediate estimates $A_\theta \leq B_\theta$ (or maybe $A_\theta \leq C_\theta B_\theta$, where $C_\theta$ is a constant) for any choice of parameter $0 < \theta < 1$. Of course, interpolation is not a magic wand; one needs various hypotheses (e.g. linearity, sublinearity, convexity, or complexifiability) on $A_i, B_i$ in order for interpolation methods to be applicable. Nevertheless, these techniques are available for many important classes of problems, most notably that of establishing boundedness estimates such as $\|Tf\|_{L^q(Y,\nu)} \leq C \|f\|_{L^p(X,\mu)}$ for linear (or “linear-like”) operators $T$ from one Lebesgue space $L^p(X,\mu)$ to another $L^q(Y,\nu)$. (Interpolation can also be performed for many other normed vector spaces than the Lebesgue spaces, but we will just focus on Lebesgue spaces in these notes to focus the discussion.) Using interpolation, it is possible to reduce the task of proving such estimates to that of proving various “endpoint” versions of these estimates. In some cases, each endpoint only faces a portion of the difficulty that the interpolated estimate did, and so by using interpolation one has split the task of proving the original estimate into two or more simpler subtasks. In other cases, one of the endpoint estimates is very easy, and the other one is significantly more difficult than the original estimate; thus interpolation does not really simplify the task of proving estimates in this case, but at least clarifies the relative difficulty between various estimates in a given family.
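One of the simplest instances of this phenomenon is the log-convexity of Lebesgue norms, $\|f\|_{p_\theta} \leq \|f\|_{p_0}^{1-\theta} \|f\|_{p_1}^{\theta}$ with $1/p_\theta = (1-\theta)/p_0 + \theta/p_1$, which follows from Hölder's inequality. The sketch below checks it numerically for a finite sequence with counting measure; the sample data, the exponents, and the geometric-mean form of the intermediate bound are our own choices for illustration rather than something specified above.

```python
def lp_norm(values, p):
    """The l^p norm of a finite sequence (counting measure), for 0 < p < infinity."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def interpolated_exponent(p0, p1, theta):
    """The exponent p_theta defined by 1/p_theta = (1 - theta)/p0 + theta/p1."""
    return 1.0 / ((1.0 - theta) / p0 + theta / p1)

def log_convexity_gap(values, p0, p1, theta):
    """Right-hand side minus left-hand side of the interpolated inequality;
    this should be non-negative up to rounding."""
    p_theta = interpolated_exponent(p0, p1, theta)
    intermediate = lp_norm(values, p_theta)
    bound = lp_norm(values, p0) ** (1.0 - theta) * lp_norm(values, p1) ** theta
    return bound - intermediate

f = [3.0, -1.0, 0.5, 2.0, 0.0, -4.0]
thetas = [k / 10.0 for k in range(1, 10)]
gaps = [log_convexity_gap(f, 1.0, 4.0, theta) for theta in thetas]

if __name__ == "__main__":
    for theta, gap in zip(thetas, gaps):
        print(theta, interpolated_exponent(1.0, 4.0, theta), gap)
```

Here the two “endpoint” quantities are the norms at $p_0$ and $p_1$, and every intermediate norm is controlled by them with constant $1$.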
As is the case with many other tools in analysis, interpolation is not captured by a single “interpolation theorem”; instead, there are a family of such theorems, which can be broadly divided into two major categories, reflecting the two basic methods that underlie the principle of interpolation. The real interpolation method is based on a divide and conquer strategy: to understand how to obtain control on some expression such as $\|Tf\|_{L^q(Y,\nu)}$ for some operator $T$ and some function $f$, one would divide $f$ into two or more components, e.g. into components where $f$ is large and where $f$ is small, or where $f$ is oscillating with high frequency or only varying with low frequency. Each component would be estimated using a carefully chosen combination of the extreme estimates available; optimising over these choices and summing up (using whatever linearity-type properties on $T$ are available), one would hope to get a good estimate on the original expression. The strengths of the real interpolation method are that the linearity hypotheses on $T$ can be relaxed to weaker hypotheses, such as sublinearity or quasilinearity; also, the endpoint estimates are allowed to be of a weaker “type” than the interpolated estimates. On the other hand, the real interpolation method often concedes a multiplicative constant in the final estimates obtained, and one is usually obligated to keep the operator $T$ fixed throughout the interpolation process. The proofs of real interpolation theorems are also a little bit messy, though in many cases one can simply invoke a standard instance of such theorems (e.g. the Marcinkiewicz interpolation theorem) as a black box in applications.
The complex interpolation method instead proceeds by exploiting the powerful tools of complex analysis, in particular the maximum modulus principle and its relatives (such as the Phragmén-Lindelöf principle). The idea is to rewrite the estimate to be proven (e.g. $A_\theta \leq B_\theta$) in such a way that it can be embedded into a family of such estimates which depend holomorphically on a complex parameter $z$ in some domain (e.g. the strip $\{ z: 0 \leq \mathrm{Re}(z) \leq 1 \}$). One then exploits things like the maximum modulus principle to bound an estimate corresponding to an interior point of this domain by the estimates on the boundary of this domain. The strengths of the complex interpolation method are that it typically gives cleaner constants than the real interpolation method, and also allows the underlying operator $T$ to vary holomorphically with respect to the parameter $z$, which can significantly increase the flexibility of the interpolation technique. The proofs of these methods are also very short (if one takes the maximum modulus principle and its relatives as a black box), which makes the method particularly amenable to generalisation to more intricate settings (e.g. multilinear operators, mixed Lebesgue norms, etc.). On the other hand, the somewhat rigid requirement of holomorphicity makes it much more difficult to apply this method to non-linear operators, such as sublinear or quasilinear operators; also, the interpolated estimate tends to be of the same “type” as the extreme ones, so that one does not enjoy the upgrading of weak type estimates to strong type estimates that the real interpolation method typically produces. Also, the complex method runs into some minor technical problems when the target space $L^q(Y,\nu)$ ceases to be a Banach space (i.e. when $q < 1$), as this makes it more difficult to exploit duality.
Despite these differences, the real and complex methods tend to give broadly similar results in practice, especially if one is willing to ignore constant losses in the estimates or epsilon losses in the exponents.
The theory of both real and complex interpolation can be studied abstractly, in general normed or quasi-normed spaces; see e.g. this book for a detailed treatment. However in these notes we shall focus exclusively on interpolation for Lebesgue spaces $L^p$ (and their cousins, such as the weak Lebesgue spaces $L^{p,\infty}$ and the Lorentz spaces $L^{p,r}$).
The 245B final can be found here. I am not posting solutions, but readers (both students and non-students) are welcome to discuss the final questions in the comments below.
The continuation to this course, 245C, will begin on Monday, March 29. The topics for this course are still somewhat fluid – but I tentatively plan to cover the following topics, roughly in order:
- $L^p$ spaces and interpolation; fractional integration
- The Fourier transform on $\mathbf{R}^n$ (a very quick review; this is of course covered more fully in 247A)
- Schwartz functions, and the theory of distributions
- Hausdorff measure
- The spectral theorem (introduction only; the topic is covered in depth in 255A)
I am open to further suggestions for topics that would build upon the 245AB material, which would be of interest to students, and which would not overlap too substantially with other graduate courses offered at UCLA.