You are currently browsing Terence Tao’s articles.
Consider the free Schrödinger equation in spatial dimensions, which I will normalise as
is the unknown field and
is the spatial Laplacian. To avoid irrelevant technical issues I will restrict attention to smooth (classical) solutions to this equation, and will work locally in spacetime avoiding issues of decay at infinity (or at other singularities); I will also avoid issues involving branch cuts of functions such as
(if one wishes, one can restrict
to be even in order to safely ignore all branch cut issues). The space of solutions to (1) enjoys a number of symmetries. A particularly non-obvious symmetry is the pseudoconformal symmetry: if
solves (1), then the pseudoconformal solution
defined by
can be seen after some computation to also solve (1). (If
has suitable decay at spatial infinity and one chooses a suitable branch cut for
, one can extend
continuously to the
spatial slice, whereupon it becomes essentially the spatial Fourier transform of
, but we will not need this fact for the current discussion.)
An analogous symmetry exists for the free wave equation in spatial dimensions, which I will write as
is the unknown field. In analogy to pseudoconformal symmetry, we have conformal symmetry: if
solves (3), then the function
, defined in the interior
of the light cone by the formula
There are also some direct links between the Schrödinger equation in dimensions and the wave equation in
dimensions. This can be easily seen on the spacetime Fourier side: solutions to (1) have spacetime Fourier transform (formally) supported on a
-dimensional hyperboloid, while solutions to (3) have spacetime Fourier transform formally supported on a
-dimensional cone. To link the two, one then observes that the
-dimensional hyperboloid can be viewed as a conic section (i.e. hyperplane slice) of the
-dimensional cone. In physical space, this link is manifested as follows: if
solves (1), then the function
defined by
solves (3). More generally, for any non-zero scaling parameter , the function
defined by
As an “extra challenge” posed in an exercise in one of my books (Exercise 2.28, to be precise), I asked the reader to use the embeddings (or more generally
) to explicitly connect together the pseudoconformal transformation
and the conformal transformation
. It turns out that this connection is a little bit unusual, with the “obvious” guess (namely, that the embeddings
intertwine
and
) being incorrect, and as such this particular task was perhaps too difficult even for a challenge question. I’ve been asked a couple times to provide the connection more explicitly, so I will do so below the fold.
Let be
Hermitian matrices, with eigenvalues
and
. The Harish-Chandra-Itzykson-Zuber integral formula exactly computes the integral
where is integrated over the Haar probability measure of the unitary group
and
is a non-zero complex parameter, as the expression
when the eigenvalues of are simple, where
denotes the Vandermonde determinant
and is the constant
There are at least two standard ways to prove this formula in the literature. One way is by applying the Duistermaat-Heckman theorem to the pushforward of Liouville measure on the coadjoint orbit (or more precisely, a rotation of such an orbit by
) under the moment map
, and then using a stationary phase expansion. Another way, which I only learned about recently, is to use the formulae for evolution of eigenvalues under Dyson Brownian motion (as well as the closely related formulae for the GUE ensemble), which were derived in this previous blog post. Both of these approaches can be found in several places in the literature (the former being observed in the original paper of Duistermaat and Heckman, and the latter observed in the paper of Itzykson and Zuber as well as in this later paper of Johansson), but I thought I would record both of these here for my own benefit.
The Harish-Chandra-Itzykson-Zuber formula can be extended to other compact Lie groups than . At first glance, this might suggest that these formulae could be of use in the study of the GOE ensemble, but unfortunately the Lie algebra associated to
corresponds to real anti-symmetric matrices rather than real symmetric matrices. This also occurs in the
case, but there one can simply multiply by
to rotate a complex skew-Hermitian matrix into a complex Hermitian matrix. This is consistent, though, with the fact that the (somewhat rarely studied) anti-symmetric GOE ensemble has cleaner formulae (in particular, having a determinantal structure similar to GUE) than the (much more commonly studied) symmetric GOE ensemble.
[These are notes intended mostly for myself, as these topics are useful in random matrix theory, but may be of interest to some readers also. -T.]
One of the most fundamental partial differential equations in mathematics is the heat equation
is a scalar function
of both time and space, and
is the Laplacian
. For the purposes of this post, we will ignore all technical issues of regularity and decay, and always assume that the solutions to equations such as (1) have all the regularity and decay in order to justify all formal operations such as the chain rule, integration by parts, or differentiation under the integral sign. The factor of
in the definition of the heat propagator
is of course an arbitrary normalisation, chosen for some minor technical reasons; one can certainly continue the discussion below with other choices of normalisations if desired.
In probability theory, this equation takes on particular significance when is restricted to be non-negative, and furthermore to be a probability measure at each time, in the sense that
for all . (Actually, it suffices to verify this constraint at time
, as the heat equation (1) will then preserve this constraint.) Indeed, in this case, one can interpret
as the probability distribution of a Brownian motion
is a stochastic process with initial probability distribution
; see for instance this previous blog post for more discussion.
A model example of a solution to the heat equation to keep in mind is that of the fundamental solution
, which represents the distribution of Brownian motion of a particle starting at the origin
at time
. At time
,
represents an
-valued random variable, each coefficient of which is an independent random variable of mean zero and variance
. (As
,
converges in the sense of distributions to a Dirac mass at the origin.)
The heat equation can also be viewed as the gradient flow for the Dirichlet form
, which formally implies that
is (half of) the negative gradient of the Dirichlet energy
with respect to the
inner product. Among other things, this implies that the Dirichlet energy decreases in time:
that
Since is non-negative, the formula (6) implies that
is integrable in time, and in particular we see that
converges to zero as
, in some averaged
sense at least; similarly, (8) suggests that
also converges to zero. This suggests that
converges to a constant function; but as
is also supposed to decay to zero at spatial infinity, we thus expect solutions to the heat equation in
to decay to zero in some sense as
. However, the decay is only expected to be polynomial in nature rather than exponential; for instance, the solution (3) decays in the
norm like
.
Since , we also observe the basic cancellation property
.
There are other quantities relating to that also decrease in time under heat flow, particularly in the important case when
is a probability measure. In this case, it is natural to introduce the entropy
Thus, for instance, if is the uniform distribution on some measurable subset
of
of finite measure
, the entropy would be
. Intuitively, as the entropy decreases, the probability distribution gets wider and flatter. For instance, in the case of the fundamental solution (3), one has
for any
, reflecting the fact that
is approximately uniformly distributed on a ball of radius
(and thus of measure
).
A short formal computation shows (if one assumes for simplicity that is strictly positive, which is not an unreasonable hypothesis, particularly in view of the strong maximum principle) using (9), (5) that
where is the square root of
. For instance, if
is the fundamental solution (3), one can check that
(note that this is a significantly cleaner formula than (7)!).
In particular, the entropy is decreasing, which corresponds well to one’s intuition that the heat equation (or Brownian motion) should serve to spread out a probability distribution over time.
Actually, one can say more: the rate of decrease of the entropy is itself decreasing, or in other words the entropy is convex. I do not have a satisfactorily intuitive reason for this phenomenon, but it can be proved by straightforward application of basic several variable calculus tools (such as the chain rule, product rule, quotient rule, and integration by parts), and completing the square. Namely, by using the chain rule we have
, we see from (1) that
and thus (again assuming that , and hence
, is strictly positive to avoid technicalities)
We thus have
It is now convenient to compute using the Einstein summation convention to hide the summation over indices . We have
and
By integration by parts and interchanging partial derivatives, we may write the first integral as
and from the quotient and product rules, we may write the second integral as
Gathering terms, completing the square, and making the summations explicit again, we see that
and so in particular is always decreasing.
The above identity can also be written as
Exercise 1 Give an alternate proof of the above identity by writing
,
and deriving the equation
for
.
It was observed in a well known paper of Bakry and Emery that the above monotonicity properties hold for a much larger class of heat flow-type equations, and lead to a number of important relations between energy and entropy, such as the log-Sobolev inequality of Gross and of Federbush, and the hypercontractivity inequality of Nelson; we will discuss one such family of generalisations (or more precisely, variants) below the fold.
Emmanuel Breuillard, Ben Green, and I have just uploaded to the arXiv our survey “Small doubling in groups“, for the proceedings of the upcoming Erdos Centennial. This is a short survey of the known results on classifying finite subsets of an (abelian) additive group
or a (not necessarily abelian) multiplicative group
that have small doubling in the sense that the sum set
or product set
is small. Such sets behave approximately like finite subgroups of
(and there is a closely related notion of an approximate group in which the analogy is even tighter) , and so this subject can be viewed as a sort of approximate version of finite group theory. (Unfortunately, thus far the theory does not have much new to say about the classification of actual finite groups; progress has been largely made instead on classifying the (highly restricted) number of ways in which approximate groups can differ from a genuine group.)
In the classical case when is the integers
, these sets were classified (in a qualitative sense, at least) by a celebrated theorem of Freiman, which roughly speaking says that such sets
are necessarily “commensurate” in some sense with a (generalised) arithmetic progression
of bounded rank. There are a number of essentially equivalent ways to define what “commensurate” means here; for instance, in the original formulation of the theorem, one asks that
be a dense subset of
, but in modern formulations it is often more convenient to require instead that
be of comparable size to
and be covered by a bounded number of translates of
, or that
and
have an intersection that is of comparable size to both
and
(cf. the notion of commensurability in group theory).
Freiman’s original theorem was extended to more general abelian groups in a sequence of papers culminating in the paper of Green and Ruzsa that handled arbitrary abelian groups. As such groups now contain non-trivial finite subgroups, the conclusion of the theorem must be modified by allowing for “coset progressions” , which can be viewed as “extensions” of generalized arithmetic progressions
by genuine finite groups
.
The proof methods in these abelian results were Fourier-analytic in nature (except in the cases of sets of very small doubling, in which more combinatorial approaches can be applied, and there were also some geometric or combinatorial methods that gave some weaker structural results). As such, it was a challenge to extend these results to nonabelian groups, although for various important special types of ambient group (such as an linear group over a finite or infinite field) it turns out that one can use tools exploiting the special structure of those groups (e.g. for linear groups one would use tools from Lie theory and algebraic geometry) to obtain quite satisfactory results; see e.g. this survey of Pyber and Szabo for the linear case. When the ambient group
is completely arbitrary, it turns out the problem is closely related to the classical Hilbert’s fifth problem of determining the minimal requirements of a topological group in order for such groups to have Lie structure; this connection was first observed and exploited by Hrushovski, and then used by Breuillard, Green, and myself to obtain the analogue of Freiman’s theorem for an arbitrary nonabelian group.
This survey is too short to discuss in much detail the proof techniques used in these results (although the abelian case is discussed in this book of mine with Vu, and the nonabelian case discussed in this more recent book of mine), but instead focuses on the statements of the various known results, as well as some remaining open questions in the subject (in particular, there is substantial work left to be done in making the estimates more quantitative, particularly in the nonabelian setting).
The determinant of a square matrix
obeys a large number of important identities, the most basic of which is the multiplicativity property
are square matrices of the same dimension. This identity then generates many other important identities. For instance, if
is an
matrix and
is an
matrix, then by applying the previous identity to equate the determinants of
and
(where we will adopt the convention that
denotes an identity matrix of whatever dimension is needed to make sense of the expressions being computed, and similarly for
) we obtain the Sylvester determinant identity
determinant with an
determinant, is very useful in random matrix theory (a point emphasised in particular by Deift), particularly in regimes in which
is much smaller than
.
Another identity generated from (1) arises when trying to compute the determinant of a block matrix
where is an
matrix,
is an
matrix,
is an
matrix, and
is an
matrix. If
is invertible, then we can manipulate this matrix via block Gaussian elimination as
and on taking determinants using (1) we obtain the Schur determinant identity
of the upper left block
. This identity can be viewed as the correct way to generalise the
determinant formula
It is also possible to use determinant identities to deduce other matrix identities that do not involve the determinant, by the technique of matrix differentiation (or equivalently, matrix linearisation). The key observation is that near the identity, the determinant behaves like the trace, or more precisely one has
and infinitesimal
. (If one is uncomfortable with infinitesimals, one can interpret this sort of identity as an asymptotic as
.) Combining this with (1) we see that for square matrices
of the same dimension with
invertible and
invertible, one has
for infinitesimal . To put it another way, if
is a square matrix that depends in a differentiable fashion on a real parameter
, then
whenever is invertible. (Note that if one combines this identity with cofactor expansion, one recovers Cramer’s rule.)
Let us see some examples of this differentiation method. If we take the Sylvester identity (2) and multiply one of the rectangular matrices by an infinitesimal
, we obtain
applying (4) and extracting the linear term in (or equivalently, differentiating at
and then setting
) we conclude the cyclic property of trace:
To manipulate derivatives and inverses, we begin with the Neumann series approximation
for bounded square and infinitesimal
, which then leads to the more general approximation
of the same dimension with
bounded. To put it another way, we have
whenever depends in a differentiable manner on
and
is invertible.
We can then differentiate (or linearise) the Schur identity (3) in a number of ways. For instance, if we replace the lower block by
for some test
matrix
, then by (4), the left-hand side of (3) becomes (assuming the invertibility of the block matrix)
while the right-hand side becomes
extracting the linear term in , we conclude that
As was an arbitrary
matrix, we conclude from duality that the lower right
block of
is given by the inverse
of the Schur complement:
One can also compute the other components of this inverse in terms of the Schur complement by a similar method (although the formulae become more complicated). As a variant of this method, we can perturb the block matrix in (3) by an infinitesimal multiple of the identity matrix giving
From (5), we have
and so from (4) the right-hand side of (6) is
extracting the linear component in , we conclude the identity
As a final example of this method, we can analyse low rank perturbations of a large (
) matrix
, where
is an
matrix and
is an
matrix for some
. (This type of situation is also common in random matrix theory, for instance it arose in this previous paper of mine on outliers to the circular law.) If
is invertible, then from (1) and (2) one has the matrix determinant lemma
if one then perturbs by an infinitesimal matrix
, we have
Extracting the linear component in as before, one soon arrives at
assuming that and
are both invertible; as
is arbitrary, we conclude (after using the cyclic property of trace) the Sherman-Morrison formula
for the inverse of a low rank perturbation of a matrix
. While this identity can be easily verified by direct algebraic computation, it is somewhat difficult to discover this identity by such algebraic manipulation; thus we see that the “determinant first” approach to matrix identities can make it easier to find appropriate matrix identities (particularly those involving traces and/or inverses), even if the identities one is ultimately interested in do not involve determinants. (As differentiation typically makes an identity lengthier, but also more “linear” or “additive”, the determinant identity tends to be shorter (albeit more nonlinear and more multiplicative) than the differentiated identity, and can thus be slightly easier to derive.)
Exercise 1 Use the “determinant first” approach to derive the Woodbury matrix identity (also known as the binomial inverse theorem)
where
is an
matrix,
is an
matrix,
is an
matrix, and
is an
matrix, assuming that
,
and
are all invertible.
Exercise 2 Let
be invertible
matrices. Establish the identity
and differentiate this in
to deduce the identity
(assuming that all inverses exist) and thence
Rotating
by
then gives
which is useful for inverting a matrix
that has been split into a self-adjoint component
and a skew-adjoint component
.
Mathematicians study a variety of different mathematical structures, but perhaps the structures that are most commonly associated with mathematics are the number systems, such as the integers or the real numbers
. Indeed, the use of number systems is so closely identified with the practice of mathematics that one sometimes forgets that it is possible to do mathematics without explicit reference to any concept of number. For instance, the ancient Greeks were able to prove many theorems in Euclidean geometry, well before the development of Cartesian coordinates and analytic geometry in the seventeenth century, or the formal constructions or axiomatisations of the real number system that emerged in the nineteenth century (not to mention precursor concepts such as zero or negative numbers, whose very existence was highly controversial, if entertained at all, to the ancient Greeks). To do this, the Greeks used geometric operations as substitutes for the arithmetic operations that would be more familiar to modern mathematicians. For instance, concatenation of line segments or planar regions serves as a substitute for addition; the operation of forming a rectangle out of two line segments would serve as a substitute for multiplication; the concept of similarity can be used as a substitute for ratios or division; and so forth.
A similar situation exists in modern physics. Physical quantities such as length, mass, momentum, charge, and so forth are routinely measured and manipulated using the real number system (or related systems, such as
if one wishes to measure a vector-valued physical quantity such as velocity). Much as analytic geometry allows one to use the laws of algebra and trigonometry to calculate and prove theorems in geometry, the identification of physical quantities with numbers allows one to express physical laws and relationships (such as Einstein’s famous mass-energy equivalence
) as algebraic (or differential) equations, which can then be solved and otherwise manipulated through the extensive mathematical toolbox that has been developed over the centuries to deal with such equations.
However, as any student of physics is aware, most physical quantities are not represented purely by one or more numbers, but instead by a combination of a number and some sort of unit. For instance, it would be a category error to assert that the length of some object was a number such as ; instead, one has to say something like “the length of this object is
yards”, combining both a number
and a unit (in this case, the yard). Changing the unit leads to a change in the numerical value assigned to this physical quantity, even though no physical change to the object being measured has occurred. For instance, if one decides to use feet as the unit of length instead of yards, then the length of the object is now
feet; if one instead uses metres, the length is now
metres; and so forth. But nothing physical has changed when performing this change of units, and these lengths are considered all equal to each other:
It is then common to declare that while physical quantities and units are not, strictly speaking, numbers, they should be manipulated using the laws of algebra as if they were numerical quantities. For instance, if an object travels metres in
seconds, then its speed should be
where we use the usual abbreviations of and
for metres and seconds respectively. Similarly, if the speed of light
is
and an object has mass
, then Einstein’s mass-energy equivalence
then tells us that the energy-content of this object is
Note that the symbols are being manipulated algebraically as if they were mathematical variables such as
and
. By collecting all these units together, we see that every physical quantity gets assigned a unit of a certain dimension: for instance, we see here that the energy
of an object can be given the unit of
(more commonly known as a Joule), which has the dimension of
where
are the dimensions of mass, length, and time respectively.
There is however one important limitation to the ability to manipulate “dimensionful” quantities as if they were numbers: one is not supposed to add, subtract, or compare two physical quantities if they have different dimensions, although it is acceptable to multiply or divide two such quantities. For instance, if is a mass (having the units
) and
is a speed (having the units
), then it is physically “legitimate” to form an expression such as
, but not an expression such as
or
; in a similar spirit, statements such as
or
are physically meaningless. This combines well with the mathematical distinction between vector, scalar, and matrix quantities, which among other things prohibits one from adding together two such quantities if their vector or matrix type are different (e.g. one cannot add a scalar to a vector, or a vector to a matrix), and also places limitations on when two such quantities can be multiplied together. A related limitation, which is not always made explicit in physics texts, is that transcendental mathematical functions such as
or
should only be applied to arguments that are dimensionless; thus, for instance, if
is a speed, then
is not physically meaningful, but
is (this particular quantity is known as the rapidity associated to this speed).
These limitations may seem like a weakness in the mathematical modeling of physical quantities; one may think that one could get a more “powerful” mathematical framework if one were allowed to perform dimensionally inconsistent operations, such as add together a mass and a velocity, add together a vector and a scalar, exponentiate a length, etc. Certainly there is some precedent for this in mathematics; for instance, the formalism of Clifford algebras does in fact allow one to (among other things) add vectors with scalars, and in differential geometry it is quite common to formally apply transcendental functions (such as the exponential function) to a differential form (for instance, the Liouville measure of a symplectic manifold can be usefully thought of as a component of the exponential
of the symplectic form
).
However, there are several reasons why it is advantageous to retain the limitation to only perform dimensionally consistent operations. One is that of error correction: one can often catch (and correct for) errors in one’s calculations by discovering a dimensional inconsistency, and tracing it back to the first step where it occurs. Also, by performing dimensional analysis, one can often identify the form of a physical law before one has fully derived it. For instance, if one postulates the existence of a mass-energy relationship involving only the mass of an object , the energy content
, and the speed of light
, dimensional analysis is already sufficient to deduce that the relationship must be of the form
for some dimensionless absolute constant
; the only remaining task is then to work out the constant of proportionality
, which requires physical arguments beyond that provided by dimensional analysis. (This is a simple instance of a more general application of dimensional analysis known as the Buckingham
theorem.)
The use of units and dimensional analysis has certainly been proven to be very effective tools in physics. But one can pose the question of whether it has a properly grounded mathematical foundation, in order to settle any lingering unease about using such tools in physics, and also in order to rigorously develop such tools for purely mathematical purposes (such as analysing identities and inequalities in such fields of mathematics as harmonic analysis or partial differential equations).
The example of Euclidean geometry mentioned previously offers one possible approach to formalising the use of dimensions. For instance, one could model the length of a line segment not by a number, but rather by the equivalence class of all line segments congruent to the original line segment (cf. the Frege-Russell definition of a number). Similarly, the area of a planar region can be modeled not by a number, but by the equivalence class of all regions that are equidecomposable with the original region (one can, if one wishes, restrict attention here to measurable sets in order to avoid Banach-Tarski-type paradoxes, though that particular paradox actually only arises in three and higher dimensions). As mentioned before, it is then geometrically natural to multiply two lengths to form an area, by taking a rectangle whose line segments have the stated lengths, and using the area of that rectangle as a product. This geometric picture works well for units such as length and volume that have a spatial geometric interpretation, but it is less clear how to apply it for more general units. For instance, it does not seem geometrically natural (or, for that matter, conceptually helpful) to envision the equation as the assertion that the energy
is the volume of a rectangular box whose height is the mass
and whose length and width is given by the speed of light
.
But there are at least two other ways to formalise dimensionful quantities in mathematics, which I will discuss below the fold. The first is a “parametric” model in which dimensionful objects are modeled as numbers (or vectors, matrices, etc.) depending on some base dimensional parameters (such as units of length, mass, and time, or perhaps a coordinate system for space or spacetime), and transforming according to some representation of a structure group that encodes the range of these parameters; this type of “coordinate-heavy” model is often used (either implicitly or explicitly) by physicists in order to efficiently perform calculations, particularly when manipulating vector or tensor-valued quantities. The second is an “abstract” model in which dimensionful objects now live in an abstract mathematical space (e.g. an abstract vector space), in which only a subset of the operations available to general-purpose number systems such as or
are available, namely those operations which are “dimensionally consistent” or invariant (or more precisely, equivariant) with respect to the action of the underlying structure group. This sort of “coordinate-free” approach tends to be the one which is preferred by pure mathematicians, particularly in the various branches of modern geometry, in part because it can lead to greater conceptual clarity, as well as results of great generality; it is also close to the more informal practice of treating mathematical manipulations that do not preserve dimensional consistency as being physically meaningless.
Things are pretty quiet here during the holiday season, but one small thing I have been working on recently is a set of notes on special relativity that I will be working through in a few weeks with some bright high school students here at our local math circle. I have only two hours to spend with this group, and it is unlikely that we will reach the end of the notes (in which I derive the famous mass-energy equivalence relation E=mc^2, largely following Einstein’s original derivation as discussed in this previous blog post); instead we will probably spend a fair chunk of time on related topics which do not actually require special relativity per se, such as spacetime diagrams, the Doppler shift effect, and an analysis of my airport puzzle. This will be my first time doing something of this sort (in which I will be spending as much time interacting directly with the students as I would lecturing); I’m not sure exactly how it will play out, being a little outside of my usual comfort zone of undergraduate and graduate teaching, but am looking forward to finding out how it goes. (In particular, it may end up that the discussion deviates somewhat from my prepared notes.)
The material covered in my notes is certainly not new, but I ultimately decided that it was worth putting up here in case some readers here had any corrections or other feedback to contribute (which, as always, would be greatly appreciated).
[Dec 24 and then Jan 21: notes updated, in response to comments.]
Perhaps the most important structural result about general large dense graphs is the Szemerédi regularity lemma. Here is a standard formulation of that lemma:
Lemma 1 (Szemerédi regularity lemma) Let
be a graph on
vertices, and let
. Then there exists a partition
for some
with the property that for all but at most
of the pairs
, the pair
is
-regular in the sense that
whenever
are such that
and
, and
is the edge density between
and
. Furthermore, the partition is equitable in the sense that
for all
.
There are many proofs of this lemma, which is actually not that difficult to establish; see for instance these previous blog posts for some examples. In this post I would like to record one further proof, based on the spectral decomposition of the adjacency matrix of , which is essentially due to Frieze and Kannan. (Strictly speaking, Frieze and Kannan used a variant of this argument to establish a weaker form of the regularity lemma, but it is not difficult to modify the Frieze-Kannan argument to obtain the usual form of the regularity lemma instead. Some closely related spectral regularity lemmas were also developed by Szegedy.) I found recently (while speaking at the Abel conference in honour of this year’s laureate, Endre Szemerédi) that this particular argument is not as widely known among graph theory experts as I had thought, so I thought I would record it here.
For reasons of exposition, it is convenient to first establish a slightly weaker form of the lemma, in which one drops the hypothesis of equitability (but then has to weight the cells by their magnitude when counting bad pairs):
Lemma 2 (Szemerédi regularity lemma, weakened variant) . Let
be a graph on
vertices, and let
. Then there exists a partition
for some
with the property that for all pairs
outside of an exceptional set
, one has
whenever
, for some real number
, where
is the number of edges between
and
. Furthermore, we have
Let us now prove Lemma 2. We enumerate (after relabeling) as
. The adjacency matrix
of the graph
is then a self-adjoint
matrix, and thus admits an eigenvalue decomposition
for some orthonormal basis of
and some eigenvalues
, which we arrange in decreasing order of magnitude:
We can compute the trace of as
But we also have , so
.
Let be a function (depending on
) to be chosen later, with
for all
. Applying (3) and the pigeonhole principle (or the finite convergence principle, see this blog post), we can find
such that
(Indeed, the bound on is basically
iterated
times.) We can now split
is the “structured” component
is the “small” component
is the “pseudorandom” component
We now design a vertex partition to make approximately constant on most cells. For each
, we partition
into
cells on which
(viewed as a function from
to
) only fluctuates by
, plus an exceptional cell of size
coming from the values where
is excessively large (larger than
). Combining all these partitions together, we can write
for some
, where
has cardinality at most
, and for all
, the eigenfunctions
all fluctuate by at most
. In particular, if
, then (by (4) and (6)) the entries of
fluctuate by at most
on each block
. If we let
be the mean value of these entries on
, we thus have
and
, where we view the indicator functions
as column vectors of dimension
.
Next, we observe from (3) and (7) that . If we let
be the coefficients of
, we thus have
and hence by Markov’s inequality we have
outside of an exceptional set
with
If avoids
, we thus have
, by (10) and the Cauchy-Schwarz inequality.
Finally, to control we see from (4) and (8) that
has an operator norm of at most
. In particular, we have from the Cauchy-Schwarz inequality that
.
Let be the set of all pairs
where either
,
,
, or
One easily verifies that (2) holds. If is not in
, then by summing (9), (11), (12) and using (5), we see that
. The left-hand side is just
. As
, we have
and so (since )
If we let be a sufficiently rapidly growing function of
that depends on
, the second error term in (13) can be absorbed in the first, and (1) follows. This concludes the proof of Lemma 2.
To prove Lemma 1, one argues similarly (after modifying as necessary), except that the initial partition
of
constructed above needs to be subdivided further into equitable components (of size
), plus some remainder sets which can be aggregated into an exceptional component of size
(and which can then be redistributed amongst the other components to arrive at a truly equitable partition). We omit the details.
Remark 1 It is easy to verify that
needs to be growing exponentially in
in order for the above argument to work, which leads to tower-exponential bounds in the number of cells
in the partition. It was shown by Gowers that a tower-exponential bound is actually necessary here. By varying
, one basically obtains the strong regularity lemma first established by Alon, Fischer, Krivelevich, and Szegedy; in the opposite direction, setting
essentially gives the weak regularity lemma of Frieze and Kannan.
Remark 2 If we specialise to a Cayley graph, in which
is a finite abelian group and
for some (symmetric) subset
of
, then the eigenvectors are characters, and one essentially recovers the arithmetic regularity lemma of Green, in which the vertex partition classes
are given by Bohr sets (and one can then place additional regularity properties on these Bohr sets with some additional arguments). The components
of
, representing high, medium, and low eigenvalues of
, then become a decomposition associated to high, medium, and low Fourier coefficients of
.
Remark 3 The use of spectral theory here is parallel to the use of Fourier analysis to establish results such as Roth’s theorem on arithmetic progressions of length three. In analogy with this, one could view hypergraph regularity as being a sort of “higher order spectral theory”, although this spectral perspective is not as convenient as it is in the graph case.
Lars Hörmander, who made fundamental contributions to all areas of partial differential equations, but particularly in developing the analysis of variable-coefficient linear PDE, died last Sunday, aged 81.
I unfortunately never met Hörmander personally, but of course I encountered his work all the time while working in PDE. One of his major contributions to the subject was to systematically develop the calculus of Fourier integral operators (FIOs), which are a substantial generalisation of pseudodifferential operators and which can be used to (approximately) solve linear partial differential equations, or to transform such equations into a more convenient form. Roughly speaking, Fourier integral operators are to linear PDE as canonical transformations are to Hamiltonian mechanics (and one can in fact view FIOs as a quantisation of a canonical transformation). They are a large class of transformations, for instance the Fourier transform, pseudodifferential operators, and smooth changes of the spatial variable are all examples of FIOs, and (as long as certain singular situations are avoided) the composition of two FIOs is again an FIO.
The full theory of FIOs is quite extensive, occupying the entire final volume of Hormander’s famous four-volume series “The Analysis of Linear Partial Differential Operators”. I am certainly not going to try to attempt to summarise it here, but I thought I would try to motivate how these operators arise when trying to transform functions. For simplicity we will work with functions on a Euclidean domain
(although FIOs can certainly be defined on more general smooth manifolds, and there is an extension of the theory that also works on manifolds with boundary). As this will be a heuristic discussion, we will ignore all the (technical, but important) issues of smoothness or convergence with regards to the functions, integrals and limits that appear below, and be rather vague with terms such as “decaying” or “concentrated”.
A function can be viewed from many different perspectives (reflecting the variety of bases, or approximate bases, that the Hilbert space
offers). Most directly, we have the physical space perspective, viewing
as a function
of the physical variable
. In many cases, this function will be concentrated in some subregion
of physical space. For instance, a gaussian wave packet
,
and
are parameters, would be physically concentrated in the ball
. Then we have the frequency space (or momentum space) perspective, viewing
now as a function
of the frequency variable
. For this discussion, it will be convenient to normalise the Fourier transform using a small constant
(which has the physical interpretation of Planck’s constant if one is doing quantum mechanics), thus
For instance, for the gaussian wave packet (1), one has
and so we see that is concentrated in frequency space in the ball
.
However, there is a third (but less rigorous) way to view a function in
, which is the phase space perspective in which one tries to view
as distributed simultaneously in physical space and in frequency space, thus being something like a measure on the phase space
. Thus, for instance, the function (1) should heuristically be concentrated on the region
in phase space. Unfortunately, due to the uncertainty principle, there is no completely satisfactory way to canonically and rigorously define what the “phase space portrait” of a function
should be. (For instance, the Wigner transform of
can be viewed as an attempt to describe the distribution of the
energy of
in phase space, except that this transform can take negative or even complex values; see Folland’s book for further discussion.) Still, it is a very useful heuristic to think of functions has having a phase space portrait, which is something like a non-negative measure on phase space that captures the distribution of functions in both space and frequency, albeit with some “quantum fuzziness” that shows up whenever one tries to inspect this measure at scales of physical space and frequency space that together violate the uncertainty principle. (The score of a piece of music is a good everyday example of a phase space portrait of a function, in this case a sound wave; here, the physical space is the time axis (the horizontal dimension of the score) and the frequency space is the vertical dimension. Here, the time and frequency scales involved are well above the uncertainty principle limit (a typical note lasts many hundreds of cycles, whereas the uncertainty principle kicks in at
cycles) and so there is no obstruction here to musical notation being unambiguous.) Furthermore, if one takes certain asymptotic limits, one can recover a precise notion of a phase space portrait; for instance if one takes the semiclassical limit
then, under certain circumstances, the phase space portrait converges to a well-defined classical probability measure on phase space; closely related to this is the high frequency limit of a fixed function, which among other things defines the wave front set of that function, which can be viewed as another asymptotic realisation of the phase space portrait concept.
If functions in can be viewed as a sort of distribution in phase space, then linear operators
should be viewed as various transformations on such distributions on phase space. For instance, a pseudodifferential operator
should correspond (as a zeroth approximation) to multiplying a phase space distribution by the symbol
of that operator, as discussed in this previous blog post. Note that such operators only change the amplitude of the phase space distribution, but not the support of that distribution.
Now we turn to operators that alter the support of a phase space distribution, rather than the amplitude; we will focus on unitary operators to emphasise the amplitude preservation aspect. These will eventually be key examples of Fourier integral operators. A physical translation should correspond to pushing forward the distribution by the transformation
, as can be seen by comparing the physical and frequency space supports of
with that of
. Similarly, a frequency modulation
should correspond to the transformation
; a linear change of variables
, where
is an invertible linear transformation, should correspond to
; and finally, the Fourier transform
should correspond to the transformation
.
Based on these examples, one may hope that given any diffeomorphism of phase space, one could associate some sort of unitary (or approximately unitary) operator
, which (heuristically, at least) pushes the phase space portrait of a function forward by
. However, there is an obstruction to doing so, which can be explained as follows. If
pushes phase space portraits by
, and pseudodifferential operators
multiply phase space portraits by
, then this suggests the intertwining relationship
and thus is approximately conjugate to
:
Applying commutators, we conclude the approximate conjugacy relationship
Now, the pseudodifferential calculus (as discussed in this previous post) tells us (heuristically, at least) that
and
where is the Poisson bracket. Comparing this with (2), we are then led to the compatibility condition
thus needs to preserve (approximately, at least) the Poisson bracket, or equivalently
needs to be a symplectomorphism (again, approximately at least).
Now suppose that is a symplectomorphism. This is morally equivalent to the graph
being a Lagrangian submanifold of
(where we give the second copy of phase space the negative
of the usual symplectic form
, thus yielding
as the full symplectic form on
; this is another instantiation of the closed graph theorem, as mentioned in this previous post. This graph is known as the canonical relation for the (putative) FIO that is associated to
. To understand what it means for this graph to be Lagrangian, we coordinatise
as
suppose temporarily that this graph was (locally, at least) a smooth graph in the
and
variables, thus
for some smooth functions . A brief computation shows that the Lagrangian property of
is then equivalent to the compatibility conditions
for , where
denote the components of
. Some Fourier analysis (or Hodge theory) lets us solve these equations as
for some smooth potential function . Thus, we have parameterised our graph
as
maps
to
.
A reasonable candidate for an operator associated to and
in this fashion is the oscillatory integral operator
(note that the Fourier transform is the special case when
and
, which helps explain the genesis of the term “Fourier integral operator”). Indeed, if one computes an inner product
for gaussian wave packets
of the form (1) and localised in phase space near
respectively, then a Taylor expansion of
around
, followed by a stationary phase computation, shows (again heuristically, and assuming
is suitably non-degenerate) that
has (3) as its canonical relation. (Furthermore, a refinement of this stationary phase calculation suggests that if
is normalised to be the half-density
, then
should be approximately unitary.) As such, we view (4) as an example of a Fourier integral operator (assuming various smoothness and non-degeneracy hypotheses on the phase
and amplitude
which we do not detail here).
Of course, it may be the case that is not a graph in the
coordinates (for instance, the key examples of translation, modulation, and dilation are not of this form), but then it is often a graph in some other pair of coordinates, such as
. In that case one can compose the oscillatory integral construction given above with a Fourier transform, giving another class of FIOs of the form
for some time
, which can be written in the form
This corresponds to the phase space transformation , which can be viewed as the classical propagator associated to the “quantum” propagator
. More generally, propagators for linear Hamiltonian partial differential equations can often be expressed (at least approximately) by Fourier integral operators corresponding to the propagator of the associated classical Hamiltonian flow associated to the symbol of the Hamiltonian operator
; this leads to an important mathematical formalisation of the correspondence principle between quantum mechanics and classical mechanics, that is one of the foundations of microlocal analysis and which was extensively developed in Hörmander’s work. (More recently, numerically stable versions of this theory have been developed to allow for rapid and accurate numerical solutions to various linear PDE, for instance through Emmanuel Candés’ theory of curvelets, so the theory that Hörmander built now has some quite significant practical applications in areas such as geology.)
In some cases, the canonical relation may have some singularities (such as fold singularities) which prevent it from being written as graphs in the previous senses, but the theory for defining FIOs even in these cases, and in developing their calculus, is now well established, in large part due to the foundational work of Hörmander.

Recent Comments