You are currently browsing Terence Tao's articles.
In this final set of lecture notes for this course, we leave the realm of self-adjoint matrix ensembles, such as Wigner random matrices, and consider instead the simplest examples of non-self-adjoint ensembles, namely the iid matrix ensembles. (I had also hoped to discuss recent progress in eigenvalue spacing distributions of Wigner matrices, but have run out of time. For readers interested in this topic, I can recommend the recent Bourbaki exposé of Alice Guionnet.)
The basic result in this area is
Theorem 1 (Circular law) Let
be an
iid matrix, whose entries
,
are iid with a fixed (complex) distribution
of mean zero and variance one. Then the spectral measure
converges both in probability and almost surely to the circular law
, where
are the real and imaginary coordinates of the complex plane.
This theorem has a long history; it is analogous to the semi-circular law, but the non-Hermitian nature of the matrices makes the spectrum so unstable that key techniques that are used in the semi-circular case, such as truncation and the moment method, no longer work; significant new ideas are required. In the case of random gaussian matrices, this result was established by Mehta (in the complex case) and by Edelman (in the real case), as was sketched out in Notes. In 1984, Girko laid out a general strategy for establishing the result for non-gaussian matrices, which formed the base of all future work on the subject; however, a key ingredient in the argument, namely a bound on the least singular value of shifts , was not fully justified at the time. A rigorous proof of the circular law was then established by Bai, assuming additional moment and boundedness conditions on the individual entries. These additional conditions were then slowly removed in a sequence of papers by Gotze-Tikhimirov, Girko, Pan-Zhou, and Tao-Vu, with the last moment condition being removed in a paper of myself, Van Vu, and Manjunath Krishnapur.
At present, the known methods used to establish the circular law for general ensembles rely very heavily on the joint independence of all the entries. It is a key challenge to see how to weaken this joint independence assumption.
In harmonic analysis and PDE, one often wants to place a function on some domain (let’s take a Euclidean space
for simplicity) in one or more function spaces in order to quantify its “size” in some sense. Examples include
- The Lebesgue spaces
of functions
whose norm
is finite, as well as their relatives such as the weak
spaces
(and more generally the Lorentz spaces
) and Orlicz spaces such as
and
;
- The classical regularity spaces
, together with their Hölder continuous counterparts
;
- The Sobolev spaces
of functions
of functions whose norm
is finite (other equivalent definitions of this norm exist, and there are technicalities if
is negative or
), as well as relatives such as homogeneous Sobolev spaces
, Besov spaces
, and Triebel-Lizorkin spaces
. (The conventions for the superscripts and subscripts here are highly variable.)
- Hardy spaces
, the space BMO of functions of bounded mean oscillation (and the subspace VMO of functions of vanishing mean oscillation);
- The Wiener algebra
;
- Morrey spaces
;
- The space
of finite measures;
- etc., etc.
As the above partial list indicates, there is an entire zoo of function spaces one could consider, and it can be difficult at first to see how they are organised with respect to each other. However, one can get some clarity in this regard by drawing a type diagram for the function spaces one is trying to study. A type diagram assigns a tuple (usually a pair) of relevant exponents to each function space. For function spaces on Euclidean space, two such exponents are the regularity
of the space, and the integrability
of the space. These two quantities are somewhat fuzzy in nature (and are not easily defined for all possible function spaces), but can basically be described as follows. We test the function space norm
of a modulated rescaled bump function
(1)
where is an amplitude,
is a radius,
is a test function,
is a position, and
is a frequency of some magnitude
. One then studies how the norm
depends on the parameters
. Typically, one has a relationship of the form
(2)
for some exponents , at least in the high-frequency case when
is large (in particular, from the uncertainty principle it is natural to require
, and when dealing with inhomogeneous norms it is also natural to require
). The exponent
measures how sensitive the
norm is to oscillation, and thus controls regularity; if
is large, then oscillating functions will have large
norm, and thus functions in
will tend not to oscillate too much and thus be smooth. Similarly, the exponent
measures how sensitive the
norm is to the function
spreading out to large scales; if
is small, then slowly decaying functions will have large norm, so that functions in
tend to decay quickly; conversely, if
is large, then singular functions will tend to have large norm, so that functions in
will tend to not have high peaks.
Note that the exponent in (2) could be positive, zero, or negative, however the exponent
should be non-negative, since intuitively enlarging
should always lead to a larger (or at least comparable) norm. Finally, the exponent in the
parameter should always be
, since norms are by definition homogeneous. Note also that the position
plays no role in (1); this reflects the fact that most of the popular function spaces in analysis are translation-invariant.
The type diagram below plots the indices of various spaces. The black dots indicate those spaces for which the
indices are fixed; the blue dots are those spaces for which at least one of the
indices are variable (and so, depending on the value chosen for these parameters, these spaces may end up in a different location on the type diagram than the typical location indicated here).
(There are some minor cheats in this diagram, for instance for the Orlicz spaces and
one has to adjust (1) by a logarithmic factor. Also, the norms for the Schwartz space
are not translation-invariant and thus not perfectly describable by this formalism. This picture should be viewed as a visual aid only, and not as a genuinely rigorous mathematical statement.)
The type diagram can be used to clarify some of the relationships between function spaces, such as Sobolev embedding. For instance, when working with inhomogeneous spaces (which basically identifies low frequencies with medium frequencies
, so that one is effectively always in the regime
), then decreasing the
parameter results in decreasing the right-hand side of (1). Thus, one expects the function space norms to get smaller (and the function spaces to get larger) if one decreases
while keeping
fixed. Thus, for instance,
should be contained in
, and so forth. Note however that this inclusion is not available for homogeneous function spaces such as
, in which the frequency parameter
can be either much larger than
or much smaller than
.
Similarly, if one is working in a compact domain rather than in , then one has effectively capped the radius parameter
to be bounded, and so we expect the function space norms to get smaller (and the function spaces to get larger) as one increases
, thus for instance
will be contained in
. Conversely, if one is working in a discrete domain such as
, then the radius parameter
has now effectively been bounded from below, and the reverse should occur: the function spaces should get larger as one decreases
. (If the domain is both compact and discrete, then it is finite, and on a finite-dimensional space all norms are equivalent.)
As mentioned earlier, the uncertainty principle suggests that one has the restriction . From this and (2), we expect to be able to enlarge the function space by trading in the regularity parameter
for the integrability parameter
, keeping the dimensional quantity
fixed. This is indeed how Sobolev embedding works. Note in some cases one runs out of regularity before p goes all the way to infinity (thus ending up at an
space), while in other cases p hits infinity first. In the latter case, one can embed the Sobolev space into a Holder space such as
.
On continuous domains, one can send the frequency off to infinity, keeping the amplitude
and radius
fixed. From this and (1) we see that norms with a lower regularity
can never hope to control norms with a higher regularity
, no matter what one does with the integrability parameter. Note however that in discrete settings this obstruction disappears; when working on, say,
, then in fact one can gain as much regularity as one wishes for free, and there is no distinction between a Lebesgue space
and their Sobolev counterparts
in such a setting.
When interpolating between two spaces (using either the real or complex interpolation method), the interpolated space usually has regularity and integrability exponents on the line segment between the corresponding exponents of the endpoint spaces. (This can be heuristically justified from the formula (2) by thinking about how the real or complex interpolation methods actually work.) Typically, one can control the norm of the interpolated space by the geometric mean of the endpoint norms that is indicated by this line segment; again, this is plausible from looking at (2).
The space is self-dual. More generally, the dual of a function space
will generally have type exponents that are the reflection of the original exponents around the
origin. Consider for instance the dual spaces
or
in the above diagram.
Spaces whose integrability exponent is larger than 1 (i.e. which lie to the left of the dotted line) tend to be Banach spaces, while spaces whose integrability exponent is less than 1 are almost never Banach spaces. (This can be justified by covering a large ball into small balls and considering how (1) would interact with the triangle inequality in this case). The case
is borderline; some spaces at this level of integrability, such as
, are Banach spaces, while other spaces, such as
, are not.
While the regularity and integrability
are usually the most important exponents in a function space (because amplitude, width, and frequency are usually the most important features of a function in analysis), they do not tell the entire story. One major reason for this is that the modulated bump functions (1), while an important class of test examples of functions, are by no means the only functions that one would wish to study. For instance, one could also consider sums of bump functions (1) at different scales. The behaviour of the function space norms on such spaces is often controlled by secondary exponents, such as the second exponent
that arises in Lorentz spaces, Besov spaces, or Triebel-Lizorkin spaces. For instance, consider the function
, (3)
where is a large integer, representing the number of distinct scales present in
. Any function space with regularity
and
should assign each summand
in (3) a norm of O(1), so the norm of
could be as large as
if one assumes the triangle inequality. This is indeed the case for the
norm, but for the weak
norm, i.e. the
norm,
only has size
. More generally, for the Lorentz spaces
,
will have a norm of about
. Thus we see that such secondary exponents can influence the norm of a function by an amount which is polynomial in the number of scales. In many applications, though, the number of scales is a “logarithmic” quantity and thus of lower order interest when compared against the “polynomial” exponents such as
and
. So the fine distinctions between, say, strong
and weak
, are only of interest in “critical” situations in which one cannot afford to lose any logarithmic factors (this is for instance the case in much of Calderon-Zygmund theory).
We have cheated somewhat by only working in the high frequency regime. When dealing with inhomogeneous spaces, one often has a different set of exponents for (1) in the low-frequency regime than in the high-frequency regime. In such cases, one sometimes has to use a more complicated type diagram to genuinely model the situation, e.g. by assigning to each space a convex set of type exponents rather than a single exponent, or perhaps having two separate type diagrams, one for the high frequency regime and one for the low frequency regime. Such diagrams can get quite complicated, and will probably not be much use to a beginner in the subject, though in the hands of an expert who knows what he or she is doing, they can still be an effective visual aid.
Starting on Monday, March 29, I will begin my graduate class for the winter quarter, entitled “Higher order Fourier analysis“. While classical Fourier analysis is concerned with correlations with linear phases such as (where
), quadratic and higher order Fourier analysis is concerned with quadratic and higher order phases such as
,
, etc.
In recent years, it has become clear that certain problems in additive combinatorics are naturally associated with a certain order of Fourier analysis. For instance, problems involving arithmetic progressions of length three are connected with classical Fourier analysis; problems involving progressions of length four are connected with quadratic Fourier analysis; problems involving progressions of length five are connected with cubic Fourier analysis; and so forth. The reasons for this will be discussed later in the course, but we will just give one indication of the connection here: linear phases and arithmetic progressions
of length three are connected by the identity
while quadratic phases and arithmetic progressions
of length four are connected by the identity
and so forth.
It turns out that in order to get a complete theory of higher order Fourier analysis, the simple polynomial phases of the type given above do not suffice. One must also consider more exotic objects such as locally polynomial phases, bracket polynomial phases (such as , and/or nilsequences (sequences arising from an orbit in a nilmanifold
). These (closely related) families of objects will be introduced later in the course.
Classical Fourier analysis revolves around the Fourier transform and the inversion formula. Unfortunately, we have not yet been able to locate similar identities in the higher order setting, but one can establish weaker results, such as higher order structure theorems and arithmetic regularity lemmas, which are sufficient for many purposes, such as proving Szemeredi’s theorem on arithmetic progressions, or my theorem with Ben Green that the primes contain arbitrarily long arithmetic progressions. These results are powered by the inverse conjecture for the Gowers norms, which is now extremely close to being fully resolved.
Our focus here will primarily be on the finitary approach to the subject, but there is also an important infinitary aspect to the theory also, originally coming from ergodic theory but more recently also from nonstandard analysis (or more precisely, ultralimit analysis), and we will touch upon this in the course also. If time permits, we will also present the number-theoretic applications of this machinery to counting arithmetic progressions and other linear patterns in the primes.
Now we turn attention to another important spectral statistic, the least singular value of an
matrix
(or, more generally, the least non-trivial singular value
of a
matrix with
). This quantity controls the invertibility of
. Indeed,
is invertible precisely when
is non-zero, and the operator norm
of
is given by
. This quantity is also related to the condition number
of
, which is of importance in numerical linear algebra. As we shall see in the next set of notes, the least singular value of
(and more generally, of the shifts
for complex
) will be of importance in rigorously establishing the circular law for iid random matrices
, as it plays a key role in computing the Stieltjes transform
of such matrices, which as we have already seen is a powerful tool in understanding the spectra of random matrices.
The least singular value
which sits at the “hard edge” of the spectrum, bears a superficial similarity to the operator norm
at the “soft edge” of the spectrum, that was discussed back in Notes 3, so one may at first think that the methods that were effective in controlling the latter, namely the epsilon-net argument and the moment method, would also work to control the former. The epsilon-net method does indeed have some effectiveness when dealing with rectangular matrices (in which the spectrum stays well away from zero), but the situation becomes more delicate for square matrices; it can control some “low entropy” portions of the infimum that arise from “structured” or “compressible” choices of , but are not able to control the “generic” or “incompressible” choices of
, for which new arguments will be needed. As for the moment method, this can give the coarse order of magnitude (for instance, for rectangular matrices with
for
, it gives an upper bound of
for the singular value with high probability, thanks to the Marchenko-Pastur law), but again this method begins to break down for square matrices, although one can make some partial headway by considering negative moments such as
, though these are more difficult to compute than positive moments
.
So one needs to supplement these existing methods with additional tools. It turns out that the key issue is to understand the distance between one of the rows
of the matrix
, and the hyperplane spanned by the other
rows. The reason for this is as follows. First suppose that
, so that
is non-invertible, and there is a linear dependence between the rows
. Thus, one of the
will lie in the hyperplane spanned by the other rows, and so one of the distances mentioned above will vanish; in fact, one expects many of the
distances to vanish. Conversely, whenever one of these distances vanishes, one has a linear dependence, and so
.
More generally, if the least singular value is small, one generically expects many of these
distances to be small also, and conversely. Thus, control of the least singular value is morally equivalent to control of the distance between a row
and the hyperplane spanned by the other rows. This latter quantity is basically the dot product of
with a unit normal
of this hyperplane.
When working with random matrices with jointly independent coefficients, we have the crucial property that the unit normal (which depends on all the rows other than
) is independent of
, so even after conditioning
to be fixed, the entries of
remain independent. As such, the dot product
is a familiar scalar random walk, and can be controlled by a number of tools, most notably Littlewood-Offord theorems and the Berry-Esséen central limit theorem. As it turns out, this type of control works well except in some rare cases in which the normal
is “compressible” or otherwise highly structured; but epsilon-net arguments can be used to dispose of these cases. (This general strategy was first developed for the technically simpler singularity problem by Komlós, and then extended to the least singular value problem by Rudelson.)
These methods rely quite strongly on the joint independence on all the entries; it remains a challenge to extend them to more general settings. Even for Wigner matrices, the methods run into difficulty because of the non-independence of some of the entries (although it turns out one can understand the least singular value in such cases by rather different methods).
To simplify the exposition, we shall focus primarily on just one specific ensemble of random matrices, the Bernoulli ensemble of random sign matrices, where
are independent Bernoulli signs. However, the results can extend to more general classes of random matrices, with the main requirement being that the coefficients are jointly independent.
Van Vu and I have just uploaded to the arXiv our joint paper “The Littlewood-Offord problem in high dimensions and a conjecture of Frankl and Füredi“. In this short paper we give a different proof of a high-dimensional Littlewood-Offord result of Frankl and Füredi, and in the process also affirmatively answer one of their open problems.
Let be
vectors in
, which we normalise to all have length at least
. For any given radius
, we consider the small ball probability
where are iid Bernoulli signs (i.e. they take values
or
independently with a probability of
of each), and
ranges over all (closed) balls of radius
. The Littlewood-Offord problem is to compute the quantity
where range over all vectors in
of length at least one. Informally, this number measures the extent to which a random walk of length
(with all steps of size at least one) can concentrate into a ball of radius
.
The one-dimensional case of this problem was answered by Erdös. First, one observes that one can normalise all the to be at least
(as opposed to being at most
). In the model case when
, he made the following simple observation: if a random sum
fell into a ball of radius
(which in the one-dimensional case, is an interval of length less than
), and one then changed one or more of the signs
from
to
, then the new sum must necessarily lie outside of the ball. In other words, for any ball
of radius
, the set of signs
for which
forms an antichain. Applying Sperner’s theorem, the maximal size of this antichain is
, and this soon leads to the exact value
when (the bound is attained in the extreme case
).
A similar argument works for higher values of , using Dilworth’s theorem instead of Sperner’s theorem, and gives the exact value
whenever and
for some natural number
, where
are the
largest binomial coefficients of
.
Now consider the higher-dimensional problem. One has the obvious bound
but it is not obvious whether this inequality is strict. In other words, is there some way to exploit the additional freedom given by higher dimensions to make random walks concentrate more than in the one-dimensional case?
For some values of , it turns out that the answer is no, as was first observed by Kleitman (and discussed further by Frankl and Füredi). Suppose for instance that
for some . Then one can consider the example in which
is one unit vector, and
is another unit vector orthogonal to
. The small ball probability in this case can be computed to equal
rather than
, which is slightly larger.
In the positive direction, Frankl and Füredi established the asymptotic
(holding
and
fixed). Furthermore, if
was close to an integer, and more precisely if
(so that the above counterexample can be avoided) they showed that for sufficiently large
(depending on
).
The factor was an artefact of their method, and they conjectured in fact that one should have
for sufficiently large
whenever
by Kleitman and for
by Frankl and Füredi.
In this paper we verify the conjecture of Frankl and Füredi (and give a new proof of their asymptotic (1)). Our main tool is the following high-dimensional Littlewood-Offord inequality:
Theorem 1 Suppose that
which is genuinely
-dimensional in the sense that for any hyperplane
going through the origin, one has
for at least
values of
. Then one has
Theorem 1 can be viewed as a high-dimensional variant of Erdös’s inequality (but without the sharp upper bound). It is proven by the Fourier-analytic method of Halász. (This theorem was announced in my book with Van Vu several years ago, but we did not get around to publishing it until now.)
Using Theorem 1, one can verify the conjecture of Frankl and Füredi fairly quickly (the deduction takes a little over a page). The main point is that if there is excessive concentration, then Theorem 1 quickly places almost all of the vectors to lie very close to a line. If all the vectors are close to a line, then we can project onto this line and rescale, which causes
to worsen a little bit in this reduction to the one-dimensional case, but it turns out that the bounds (2) allow us to tolerate this degradation of
once
(so it is fortunate that the cases
were already done for us!). If instead we have a vector far from the line (as is the case in the key counterexample), then we manually eliminate that vector using the parallelogram law, which effectively drops
below
(half of the time, at least) if
was initially less than
, which gives enough of a saving to conclude the argument.
One moral that one can draw from this argument is that one can use a quasi-sharp estimate (such as Theorem 1), which ostensibly loses constant factors, to then deduce a sharp estimate (such as the Frankl-Furëdi conjecture) that loses no constant factors, as long as one is in an asymptotic regime (in this case, and
large depending on
). The key is to exploit the fine structure in the main term (in this case, the piecewise constant nature of
when
passes over integers) to extract gains that can absorb the losses coming from the quasi-sharp estimate).
Our study of random matrices, to date, has focused on somewhat general ensembles, such as iid random matrices or Wigner random matrices, in which the distribution of the individual entries of the matrices was essentially arbitrary (as long as certain moments, such as the mean and variance, were normalised). In these notes, we now focus on two much more special, and much more symmetric, ensembles:
- The Gaussian Unitary Ensemble (GUE), which is an ensemble of random
Hermitian matrices
in which the upper-triangular entries are iid with distribution
, and the diagonal entries are iid with distribution
, and independent of the upper-triangular ones; and
- Gaussian random matrices, which is an ensemble of random
(non-Hermitian) matrices
whose entries are iid with distribution
.
The symmetric nature of these ensembles will allow us to compute the spectral distribution by exact algebraic means, revealing a surprising connection with orthogonal polynomials and with determinantal processes. This will, for instance, recover the semi-circular law for GUE, but will also reveal fine spacing information, such as the distribution of the gap between adjacent eigenvalues, which is largely out of reach of tools such as the Stieltjes transform method and the moment method (although the moment method, with some effort, is able to control the extreme edges of the spectrum).
Similarly, we will see for the first time the circular law for eigenvalues of non-Hermitian matrices.
There are a number of other highly symmetric ensembles which can also be treated by the same methods, most notably the Gaussian Orthogonal Ensemble (GOE) and the Gaussian Symplectic Ensemble (GSE). However, for simplicity we shall focus just on the above two ensembles. For a systematic treatment of these ensembles, see the text by Deift.
When solving the initial value problem to an ordinary differential equation, such as
is the unknown solution (taking values in some finite-dimensional vector space
),
is the initial datum, and
is some nonlinear function (which we will take to be smooth for sake of argument), then one can construct a solution locally in time via the Picard iteration method. There are two basic ideas. The first is to use the fundamental theorem of calculus to rewrite the initial value problem (1) as the problem of solving an integral equation,
defined by
is a contraction on a suitable complete metric space (e.g. a closed ball in the function space ), and thus has a unique fixed point in this space. This method works as long as one only seeks to construct local solutions (for time
in
for sufficiently small
), but the solutions constructed have a number of very good properties, including
- Existence: A solution
exists in the space
(and even in
) for
sufficiently small.
- Uniqueness: There is at most one solution
to the initial value problem in the space
(or in smoother spaces, such as
). (For solutions in the weaker space
we use the integral formulation (2) to define the solution concept.)
- Lipschitz continuous dependence on the data: If
is a sequence of initial data converging to
, then the associated solutions
converge uniformly to
on
(possibly after shrinking
slightly). In fact we have the Lipschitz bound
for
large enough and
, where
is an absolute constant.
This package of properties is referred to as (Lipschitz) wellposedness.
This method extends to certain partial differential equations, particularly those of a semilinear nature (linear except for lower order nonlinear terms). For instance, if trying to solve an initial value problem of the form
where now takes values in a function space
(e.g. a Sobolev space
),
is an initial datum,
is some (differential) operator (independent of
) that is (densely) defined on
, and
is a nonlinearity which is also (densely) defined on
, then (formally, at least) one can solve this problem by using Duhamel’s formula to convert the problem to that of solving an integral equation
and one can then hope to show that the associated nonlinear integral operator
is a contraction in a subset of a suitably chosen function space.
This method turns out to work surprisingly well for many semilinear partial differential equations, and in particular for semilinear parabolic, semilinear dispersive, and semilinear wave equations. As in the ODE case, when the method works, it usually gives the entire package of Lipschitz well-posedness: existence, uniqueness, and Lipschitz continuous dependence on the initial data, for short times at least.
However, when one moves from semilinear initial value problems to quasilinear initial value problems such as
in which the top order operator now depends on the solution
itself, then the nature of well-posedness changes; one can still hope to obtain (local) existence and uniqueness, and even continuous dependence on the data, but one usually is forced to give up Lipschitz continuous dependence at the highest available regularity (though one can often recover it at lower regularities). As a consequence, the Picard iteration method is not directly suitable for constructing solutions to such equations.
One can already see this phenomenon with a very simple equation, namely the one-dimensional constant-velocity transport equation
as part of the initial data. (If one wishes, one could view this equation as a rather trivial example of a system.
to emphasis this viewpoint, but this would be somewhat idiosyncratic.) One can solve this equation explicitly of course to get the solution
In particular, if we look at the solution just at time for simplicity, we have
Now let us see how this solution depends on the parameter
. One can ask whether this dependence is Lipschitz in
, in some function space
:
for some finite . But using the Newton approximation
we see that we should only expect such a bound when (and its translates) lie in
. Thus, we see a loss of derivatives phenomenon with regard to Lipschitz well-posedness; if the initial data
is in some regularity space, say
, then one only obtains Lipschitz dependence on
in a lower regularity space such as
.
We have just seen that if all one knows about the initial data is that it is bounded in a function space
, then one usually cannot hope to make the dependence of
on the velocity parameter
Lipschitz continuous. Indeed, one cannot even make it continuous uniformly in
. Given two values of
that are close together, e.g.
and
, and a reasonable function space
(e.g. a Sobolev space
, or a classical regularity space
) one can easily cook up a function
that is bounded in
but whose two solutions
and
separate in the
norm at time
, simply by choosing
to be supported on an interval of width
.
(Part of the problem here is that using a subtractive method to determine the distance between two solutions
is not a physically natural operation when transport mechanisms are present that could cause the key features of
(such as singularities) to be situated in slightly different locations. In such cases, the correct notion of distance may need to take transport into account, e.g. by using metrics of Wasserstein type.)
On the other hand, one still has non-uniform continuous dependence on the initial parameters: if lies in some reasonable function space
, then the map
is continuous in the
topology, even if it is not uniformly continuous with respect to
. (More succinctly: translation is a continuous but not uniformly continuous operation in most function spaces.) The reason for this is that we already have established this continuity in the case when
is so smooth that an additional derivative of
lies in
; and such smooth functions tend to be dense in the original space
, so the general case can then be established by a limiting argument, approximating a general function in
by a smoother function. We then see that the non-uniformity ultimately comes from the fact that a given function in
may be arbitrarily rough (or concentrated at an arbitrarily fine scale), and so the ability to approximate such a function by a smooth one can be arbitrarily poor.
In many quasilinear PDE, one often encounters qualitatively similar phenomena. Namely, one often has local well-posedness in sufficiently smooth function spaces (so that if the initial data lies in
, then for short times one has existence, uniqueness, and continuous dependence on the data in the
topology), but Lipschitz or uniform continuity in the
topology is usually false. However, if the data (and solution) is known to be in a high-regularity function space
, one can often recover Lipschitz or uniform continuity in a lower-regularity topology.
Because the continuous dependence on the data in quasilinear equations is necessarily non-uniform, the arguments needed to establish this dependence can be remarkably delicate. As with the simple example of the transport equation, the key is to approximate a rough solution by a smooth solution first, by smoothing out the data (this is the non-uniform step, as it depends on the physical scale (or wavelength) that the data features are located). But for quasilinear equations, keeping the rough and smooth solution together can require a little juggling of function space norms, in particular playing the low-frequency nature of the smooth solution against the high-frequency nature of the residual between the rough and smooth solutions.
Below the fold I will illustrate this phenomenon with one of the simplest quasilinear equations, namely the initial value problem for the inviscid Burgers’ equation
is no longer a parameter, but now depends (and is, in this case, actually equal to) the solution. To avoid technicalities we will work only with the classical function spaces
of
times continuously differentiable functions, though one can certainly work with other spaces (such as Sobolev spaces) by exploiting the Sobolev embedding theorem. To avoid having to distinguish continuity from uniform continuity, we shall work in a compact domain by assuming periodicity in space, thus for instance restricting
to the unit circle
.
This discussion is inspired by this survey article of Nikolay Tzvetkov, which further explores the distinction between well-posedness and ill-posedness in both semilinear and quasilinear settings.
A celebrated theorem of Gromov reads:
Theorem 1 Every finitely generated group of polynomial growth is virtually nilpotent.
The original proof of Gromov’s theorem was quite non-elementary, using an infinitary limit and exploiting the work surrounding the solution to Hilbert’s fifth problem. More recently, Kleiner provided a proof which was more elementary (based in large part on an earlier paper of Colding and Minicozzi), though still not entirely so, relying in part on (a weak form of the) Tits alternative and also on an ultrafilter argument of Korevaar-Schoen and Mok. I discuss Kleiner’s argument more in this previous blog post.
Recently, Yehuda Shalom and I established a quantitative version of Gromov’s theorem by making every component of Kleiner’s argument finitary. Technically, this provides a fully elementary proof of Gromov’s theorem (we do use one infinitary limit to simplify the argument a little bit, but this is not truly necessary); however, because we were trying to quantify as much of the result as possible, the argument became quite lengthy.
In this note I want to record a short version of the argument of Yehuda and myself which is not quantitative, but gives a self-contained and largely elementary proof of Gromov’s theorem. The argument is not too far from the Kleiner argument, but has a number of simplifications at various places. In a number of places, there was a choice to take between a short argument that was “inefficient” in the sense that it did not lead to a good quantitative bound, and a lengthier argument which led to better quantitative bounds. I have opted for the former in all such cases.
Yehuda and I plan to write a short paper containing this argument as well as some additional material, but due to some interest in this particular proof, we are detailing it here on this blog in advance of our paper.
Note: this post will assume familiarity with the basic terminology of group theory, and will move somewhat quickly through the technical details.
Ben Green, and I have just uploaded to the arXiv a short (six-page) paper “Yet another proof of Szemeredi’s theorem“, submitted to the 70th birthday conference proceedings for Endre Szemerédi. In this paper we put in print a folklore observation, namely that the inverse conjecture for the Gowers norm, together with the density increment argument, easily implies Szemerédi’s famous theorem on arithmetic progressions. This is unsurprising, given that Gowers’ proof of Szemerédi’s theorem proceeds through a weaker version of the inverse conjecture and a density increment argument, and also given that it is possible to derive Szemerédi’s theorem from knowledge of the characteristic factor for multiple recurrence (the ergodic theory analogue of the inverse conjecture, first established by Host and Kra), as was done by Bergelson, Leibman, and Lesigne (and also implicitly in the earlier paper of Bergelson, Host, and Kra); but to our knowledge the exact derivation of Szemerédi’s theorem from the inverse conjecture was not in the literature. Ordinarily this type of folklore might be considered too trifling (and too well known among experts in the field) to publish; but we felt that the venue of the Szemerédi birthday conference provided a natural venue for this particular observation.
The key point is that one can show (by an elementary argument relying primarily an induction on dimension argument and the Weyl recurrence theorem, i.e. that given any real and any integer
, that the expression
gets arbitrarily close to an integer) that given a (polynomial) nilsequence
, one can subdivide any long arithmetic progression (such as
) into a number of medium-sized progressions, where the nilsequence is nearly constant on each progression. As a consequence of this and the inverse conjecture for the Gowers norm, if a set has no arithmetic progressions, then it must have an elevated density on a subprogression; iterating this observation as per the usual density-increment argument as introduced long ago by Roth, one obtains the claim. (This is very close to the scheme of Gowers’ proof.)
Technically, one might call this the shortest proof of Szemerédi’s theorem in the literature (and would be something like the sixteenth such genuinely distinct proof, by our count), but that would be cheating quite a bit, primarily due to the fact that it assumes the inverse conjecture for the Gowers norm, our current proof of which is checking in at about 100 pages…
Ben Green, and I have just uploaded to the arXiv our paper “An arithmetic regularity lemma, an associated counting lemma, and applications“, submitted (a little behind schedule) to the 70th birthday conference proceedings for Endre Szemerédi. In this paper we describe the general-degree version of the arithmetic regularity lemma, which can be viewed as the counterpart of the Szemerédi regularity lemma, in which the object being regularised is a function on a discrete interval
rather than a graph, and the type of patterns one wishes to count are additive patterns (such as arithmetic progressions
) rather than subgraphs. Very roughly speaking, this regularity lemma asserts that all such functions can be decomposed as a degree
nilsequence (or more precisely, a variant of a nilsequence that we call an virtual irrational nilsequence), plus a small error, plus a third error which is extremely tiny in the Gowers uniformity norm
. In principle, at least, the latter two errors can be readily discarded in applications, so that the regularity lemma reduces many questions in additive combinatorics to questions concerning (virtual irrational) nilsequences. To work with these nilsequences, we also establish a arithmetic counting lemma that gives an integral formula for counting additive patterns weighted by such nilsequences.
The regularity lemma is a manifestation of the “dichotomy between structure and randomness”, as discussed for instance in my ICM article or FOCS article. In the degree case
, this result is essentially due to Green. It is powered by the inverse conjecture for the Gowers norms, which we and Tamar Ziegler have recently established (paper to be forthcoming shortly; the
case of our argument is discussed here). The counting lemma is established through the quantitative equidistribution theory of nilmanifolds, which Ben and I set out in this paper.
The regularity and counting lemmas are designed to be used together, and in the paper we give three applications of this combination. Firstly, we give a new proof of Szemerédi’s theorem, which proceeds via an energy increment argument rather than a density increment one. Secondly, we establish a conjecture of Bergelson, Host, and Kra, namely that if has density
, and
, then there exist
shifts
for which
contains at least
arithmetic progressions of length
of spacing
. (The
case of this conjecture was established earlier by Green; the
case is false, as was shown by Ruzsa in an appendix to the Bergelson-Host-Kra paper.) Thirdly, we establish a variant of a recent result of Gowers-Wolf, showing that the true complexity of a system of linear forms over
indeed matches the conjectured value predicted in their first paper.
In all three applications, the scheme of proof can be described as follows:
- Apply the arithmetic regularity lemma, and decompose a relevant function
into three pieces,
.
- The uniform part
is so tiny in the Gowers uniformity norm that its contribution can be easily dealt with by an appropriate “generalised von Neumann theorem”.
- The contribution of the (virtual, irrational) nilsequence
can be controlled using the arithmetic counting lemma.
- Finally, one needs to check that the contribution of the small error
does not overwhelm the main term
. This is the trickiest bit; one often needs to use the counting lemma again to show that one can find a set of arithmetic patterns for
that is so sufficiently “equidistributed” that it is not impacted by the small error.
To illustrate the last point, let us give the following example. Suppose we have a set of some positive density (say
) and we have managed to prove that
contains a reasonable number of arithmetic progressions of length
(say), e.g. it contains at least
such progressions. Now we perturb
by deleting a small number, say
, elements from
to create a new set
. Can we still conclude that the new set
contains any arithmetic progressions of length
?
Unfortunately, the answer could be no; conceivably, all of the arithmetic progressions in
could be wiped out by the
elements removed from
, since each such element of
could be associated with up to
(or even
) arithmetic progressions in
.
But suppose we knew that the arithmetic progressions in
were equidistributed, in the sense that each element in
belonged to the same number of such arithmetic progressions, namely
. Then each element deleted from
only removes at most
progressions, and so one can safely remove
elements from
and still retain some arithmetic progressions. The same argument works if the arithmetic progressions are only approximately equidistributed, in the sense that the number of progressions that a given element
belongs to concentrates sharply around its mean (for instance, by having a small variance), provided that the equidistribution is sufficiently strong. Fortunately, the arithmetic regularity and counting lemmas are designed to give precisely such a strong equidistribution result.
A succinct (but slightly inaccurate) summation of the regularity+counting lemma strategy would be that in order to solve a problem in additive combinatorics, it “suffices to check it for nilsequences”. But this should come with a caveat, due to the issue of the small error above; in addition to checking it for nilsequences, the answer in the nilsequence case must be sufficiently “dispersed” in a suitable sense, so that it can survive the addition of a small (but not completely negligible) perturbation.
One last “production note”. Like our previous paper with Emmanuel Breuillard, we used Subversion to write this paper, which turned out to be a significant efficiency boost as we could work on different parts of the paper simultaneously (this was particularly important this time round as the paper was somewhat lengthy and complicated, and there was a submission deadline). When doing so, we found it convenient to split the paper into a dozen or so pieces (one for each section of the paper, basically) in order to avoid conflicts, and to help coordinate the writing process. I’m also looking into git (a more advanced version control system), and am planning to use it for another of my joint projects; I hope to be able to comment on the relative strengths of these systems (and with plain old email) in the future.


Recent Comments