The random expression (2) is somewhat reminiscent of a moment of a random matrix, and one can start computing it analogously. For instance, if one has a decomposition such as (1), then (2) expands out as a sum

The random fluctuations of this sum can be treated by a routine second moment estimate, and the main task is to show that the expected value becomes asymptotically independent of . If all the were distinct then one could use independence to factor the expectation to get

which is a relatively straightforward expression to calculate (particularly in the model (1), where all the expectations here in fact vanish). The main difficulty is that there are a number of configurations in (3) in which various of the collide with each other, preventing one from easily factoring the expression. A typical problematic contribution for instance would be a sum of the form

This is an example of what we call a

In principle all of these limits are computable, but the combinatorics is remarkably complicated, and while there is certainly some algebraic structure to the calculations, it does not seem to be easily describable in terms of an existing framework (e.g., that of free probability).


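If $f: \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ is an absolutely integrable function, its Fourier coefficients $\hat f: \mathbb{Z} \to \mathbb{C}$ are defined by the formula

$$\hat f(n) := \int_{\mathbb{R}/\mathbb{Z}} f(x) e^{-2\pi i n x}\, dx.$$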

If $f$ is smooth, then the Fourier coefficients are absolutely summable, and we have the Fourier inversion formula

$$f(x) = \sum_{n \in \mathbb{Z}} \hat f(n) e^{2\pi i n x}$$

where the series here is uniformly convergent. In particular, if we define the partial summation operators

$$S_N f(x) := \sum_{n=-N}^{N} \hat f(n) e^{2\pi i n x}$$

then $S_N f$ converges uniformly to $f$ when $f$ is smooth.

What if $f$ is not smooth, but merely lies in an $L^p(\mathbb{R}/\mathbb{Z})$ class for some $1 \leq p \leq \infty$? The Fourier coefficients $\hat f$ remain well-defined, as do the partial summation operators $S_N$. The question of convergence in norm is relatively easy to settle:
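
As a numerical illustration of the uniform convergence claim, the following minimal sketch approximates the Fourier coefficients of a smooth periodic function by a Riemann sum and measures the sup-norm error of the partial sums. The test function $e^{\cos(2\pi x)}$, the sample counts, and all function names are illustrative assumptions rather than anything taken from the text; the convention $e^{2\pi i n x}$ on the unit circle is also assumed.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=512):
    """Approximate the n-th Fourier coefficient of a 1-periodic f by a Riemann sum."""
    total = sum(f(k / samples) * cmath.exp(-2j * math.pi * n * k / samples)
                for k in range(samples))
    return total / samples

def partial_sum(f, N, samples=512):
    """Return the function x -> S_N f(x), the sum of the modes with |n| <= N."""
    coeffs = {n: fourier_coefficient(f, n, samples) for n in range(-N, N + 1)}
    def S_N(x):
        return sum(c * cmath.exp(2j * math.pi * n * x) for n, c in coeffs.items())
    return S_N

def sup_error(f, N, grid=200):
    """Largest value of |S_N f - f| over an evenly spaced grid on [0, 1)."""
    S_N = partial_sum(f, N)
    return max(abs(S_N(j / grid) - f(j / grid)) for j in range(grid))

def smooth_example(x):
    # Hypothetical smooth test function, chosen only for illustration.
    return math.exp(math.cos(2 * math.pi * x))

if __name__ == "__main__":
    for N in (1, 2, 4, 8):
        print(N, sup_error(smooth_example, N))
```

For a smooth function the printed errors should decay very quickly in $N$, reflecting the rapid decay of the Fourier coefficients.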

Exercise 1

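- (i) If $1 < p < \infty$ and $f \in L^p(\mathbb{R}/\mathbb{Z})$, show that $S_N f$ converges in $L^p(\mathbb{R}/\mathbb{Z})$ norm to $f$. (Hint: first use the boundedness of the Hilbert transform to show that $S_N$ is bounded in $L^p(\mathbb{R}/\mathbb{Z})$ uniformly in $N$.)
- (ii) If $p=1$ or $p=\infty$, show that there exists $f \in L^p(\mathbb{R}/\mathbb{Z})$ such that the sequence $S_N f$ is unbounded in $L^p(\mathbb{R}/\mathbb{Z})$ (so in particular it certainly does not converge in $L^p(\mathbb{R}/\mathbb{Z})$ norm to $f$). (Hint: first show that $S_N$ is not bounded in $L^p(\mathbb{R}/\mathbb{Z})$ uniformly in $N$, then apply the uniform boundedness principle in the contrapositive.)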
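
The failure of uniform boundedness at $p=1$ in part (ii) is commonly traced to the logarithmic growth of the $L^1$ norm of the Dirichlet kernel $D_N(x) = \sin((2N+1)\pi x)/\sin(\pi x)$, the convolution kernel of $S_N$ (under the convention, assumed here, that $S_N$ sums the modes $|n| \leq N$ on $\mathbb{R}/\mathbb{Z}$). The sketch below estimates that norm with a midpoint rule; the function name and sample count are illustrative choices.

```python
import math

def dirichlet_l1_norm(N, samples=20000):
    """Midpoint-rule estimate of the integral of |D_N| over one period."""
    total = 0.0
    for k in range(samples):
        x = (k + 0.5) / samples  # midpoints avoid the removable singularity at x = 0
        total += abs(math.sin((2 * N + 1) * math.pi * x) / math.sin(math.pi * x))
    return total / samples

if __name__ == "__main__":
    for N in (8, 64, 512):
        print(N, dirichlet_l1_norm(N))
```

The printed values should grow roughly like $\frac{4}{\pi^2} \log N$, which is slow but unbounded; this is the kind of growth that the uniform boundedness principle converts into a divergent example.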

The question of pointwise almost everywhere convergence turned out to be a significantly harder problem:

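Theorem 2 (Pointwise almost everywhere convergence)

- (i) (Kolmogorov) There exists $f \in L^1(\mathbb{R}/\mathbb{Z})$ such that $S_N f(x)$ is unbounded in $N$ for almost every $x$.
- (ii) (Carleson) For every $f \in L^2(\mathbb{R}/\mathbb{Z})$, $S_N f(x)$ converges to $f(x)$ as $N \to \infty$ for almost every $x$.
- (iii) (Hunt) For every $1 < p \leq \infty$ and $f \in L^p(\mathbb{R}/\mathbb{Z})$, $S_N f(x)$ converges to $f(x)$ as $N \to \infty$ for almost every $x$.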

Note from Hölder’s inequality that $L^2(\mathbb{R}/\mathbb{Z})$ contains $L^p(\mathbb{R}/\mathbb{Z})$ for all $p \geq 2$, so Carleson’s theorem covers the $p \geq 2$ case of Hunt’s theorem. We remark that the precise threshold near $L^1$ between Kolmogorov-type divergence results and Carleson-Hunt pointwise convergence results, in the category of Orlicz spaces, is still an active area of research; see this paper of Lie for further discussion.

Carleson’s theorem in particular was a surprisingly difficult result, lying just out of reach of classical methods (as we shall see later, the result is much easier if we smooth either the function or the summation method by a tiny bit). Nowadays we realise that the reason for this is that Carleson’s theorem essentially contains a *frequency modulation symmetry* in addition to the more familiar translation symmetry and dilation symmetry. This basically rules out the possibility of attacking Carleson’s theorem with tools such as Calderón-Zygmund theory or Littlewood-Paley theory, which respect the latter two symmetries but not the former. Instead, tools from “time-frequency analysis” that essentially respect all three symmetries should be employed. We will illustrate this by giving a relatively short proof of Carleson’s theorem due to Lacey and Thiele. (There are other proofs of Carleson’s theorem, including Carleson’s original proof, its modification by Hunt, and a later time-frequency proof by Fefferman; see Remark 18 below.)

** — 1. Equivalent forms of almost everywhere convergence of Fourier series — **

A standard technique to prove almost everywhere convergence results is by first establishing a weak-type estimate of an associated maximal function. For instance, the Lebesgue differentiation theorem is usually established with the assistance of the Hardy-Littlewood maximal inequality; see for instance this previous blog post. A remarkable observation of Stein, known as Stein’s maximal principle, allows one to reverse this implication in certain cases by exploiting a symmetry of the problem. Here is the principle specialised to the application of pointwise convergence of Fourier series, and also combined with a transference principle of Kenig and Tomas:

Proposition 3 (Equivalent forms of almost everywhere convergence) Let . Then the following statements are equivalent:

- (i) For every , one has for almost every .
- (ii) There does not exist such that for almost every .
- (iii) One has the maximal inequality for all smooth , where the weak norm is defined as and denotes the Lebesgue measure of a set (which in this setting is a subset of the unit circle).
- (iv) One has the maximal inequality for all smooth , where denotes the partial Fourier series
- (v) One has the maximal inequality for all , where denotes the Fourier multiplier operator
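
To make the weak norm in (iii) concrete, here is a minimal sketch that evaluates $\sup_{t>0} t\,|\{|f| \geq t\}|^{1/p}$ for a function sampled on equal cells of $[0,1)$. This standard formula for the weak $L^p$ norm is assumed to match the definition intended in (iii); the example $f(x) = 1/x$ and the helper names are illustrative.

```python
def weak_lp_norm(values, p):
    """sup over t of t * |{|f| >= t}|^(1/p) for f constant on n equal cells of [0, 1)."""
    n = len(values)
    ordered = sorted((abs(v) for v in values), reverse=True)
    # The supremum is attained at a threshold t equal to one of the sampled values:
    # the k-th largest value is matched or exceeded on a set of measure at least k / n.
    return max(v * (k / n) ** (1.0 / p) for k, v in enumerate(ordered, start=1))

def l1_norm(values):
    """Ordinary L^1 norm of the same step function."""
    return sum(abs(v) for v in values) / len(values)

def reciprocal_samples(n):
    """Samples of f(x) = 1/x at the right endpoints x = 1/n, 2/n, ..., 1."""
    return [n / j for j in range(1, n + 1)]

if __name__ == "__main__":
    for n in (100, 10000):
        samples = reciprocal_samples(n)
        print(n, weak_lp_norm(samples, 1), l1_norm(samples))
```

The weak $L^1$ norm of the samples stays bounded as the resolution increases while the ordinary $L^1$ norm grows logarithmically, which is the usual way to see that weak-type bounds are strictly weaker than strong-type bounds.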

Among other things, this proposition equates the qualitative property (i) of almost everywhere convergence to the quantitative property (iii) of a maximal inequality. This equivalence (first observed by Calderón) is similar in spirit to the uniform boundedness principle (see e.g. Corollary 1 of this previous blog post). The restriction is needed for just one implication (from (ii) to (iii)) in the arguments below, and arises due to the use of Khintchine’s inequality at one point. The equivalence of (iv) and (v) is part of a more general principle of *transference* that allows one to pass back and forth between periodic domains such as with non-periodic domains such as (or, on the Fourier side, between discrete domains and continuous domains ) if the estimates in question enjoy suitable scaling symmetries. We will use the formulation (v), as it enjoys the most symmetries.
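
Khintchine's inequality, which enters in the step from (ii) to (iii), says that a random sign sum $\sum_i \epsilon_i a_i$ has all of its $L^p$ moments comparable to the $\ell^2$ norm of the coefficients. A minimal sketch, checking the case $p=1$ by exact enumeration over all sign patterns, follows. The coefficient vector is an arbitrary illustrative choice, and the constant $1/\sqrt{2}$ is the known sharp lower constant for $p=1$ (due to Szarek), which the argument in these notes does not need.

```python
import itertools
import math

def sign_sum_moment(coeffs, p):
    """Exact value of E|sum_i eps_i a_i|^p over independent uniform signs eps_i."""
    patterns = itertools.product((-1, 1), repeat=len(coeffs))
    total = sum(abs(sum(e * a for e, a in zip(signs, coeffs))) ** p
                for signs in patterns)
    return total / 2 ** len(coeffs)

def l2_norm(coeffs):
    """Euclidean norm of the coefficient vector."""
    return math.sqrt(sum(a * a for a in coeffs))

if __name__ == "__main__":
    a = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # arbitrary illustrative coefficients
    print(sign_sum_moment(a, 1), l2_norm(a))
```

The first moment should land between $\|a\|_{\ell^2}/\sqrt{2}$ and $\|a\|_{\ell^2}$, while the second moment equals $\|a\|_{\ell^2}^2$ exactly by orthogonality of the signs.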

*Proof:* We first show that (iii) implies (i). If (1) holds for all smooth , then certainly for all finite one has

Clearly (i) implies (ii). Now we assume that (iii) fails and use this to show that (ii) fails as well. From the failure of (iii) and monotone convergence, for any one can find , a measurable subset of , a finite , and such that

and such that In particular, has positive measure. By homogeneity we may normalise . At this stage, nothing prevents the measure of from being much smaller than ; but we can exploit translation invariance to increase the measure of to be comparable to as follows. Let be the integer part of . We claim that there exist translations of whose union has measure comparable to : This is easiest to establish by the probabilistic method (which in this context we might call the

Now consider the randomised linear combination

of translates of , where are random Bernoulli signs. From Khintchine’s inequality and the hypothesis we have hence by construction of and (4)

Now we study the behaviour of when . Since is a convolution operator, it commutes with translations, and hence

for each . On the other hand, from (3) we have and hence there exists such that In particular, the square function is at least . Meanwhile, from Khintchine’s inequality and (7) we have for all . Applying the Paley-Zygmund inequality (setting , for instance) we conclude that (for suitable choices of implied constants), so in particular Integrating in using (5), and applying the Fubini-Tonelli theorem, we conclude that hence by (6) one has In particular, there exists a deterministic choice of signs (and hence of ) for which On the other hand, the left-hand side is at most . We conclude that for every , we can find a smooth function with and a finite , as well as a set of measure , such that for all .

Applying this fact iteratively (each time choosing to be sufficiently large depending on all previous choices), we can construct a sequence of smooth functions , finite , and sets for such that

- (a) for all .
- (b) for all .
- (c) One has for all and (note that the right-hand side is finite since the are smooth for ).
- (d) for all (note that the left-hand side is bounded by ).

Now we assume (iv) and work to establish (v). The idea here is to use a rescaling argument, viewing as the limit as of the large circle (in physical space) or the fine lattice (in frequency space).

By limiting arguments we may assume that is compactly supported on some interval . Let be a large scaling parameter, and consider the periodic function defined by

For large enough, this function is smooth and supported on the interval , with norm The Fourier coefficients of are given as so that Applying (iv), we see that for any , we have Rescaling by , we conclude that We can let range over the reals rather than the integers as this does not affect the constraint . Rescaling by , we see that for any compact intervals , we have By uniform Riemann integrability and the rapid decrease of uniformly for , . We conclude that By monotone convergence we may replace with , and we then obtain (v).

Finally, we assume (v) and establish (iv). By a limiting argument it suffices to establish (iv) for trigonometric polynomials , that is to say periodic functions whose Fourier coefficients are supported in for some natural number . Let be a non-zero Schwartz function with supported in , and for a given scaling parameter let denote the Schwartz function

For sufficiently large one easily checks that The Fourier transform of can be calculated as hence (for large enough) and thus From (v) we conclude that for any we have For large enough, the left-hand side is for some depending on . Dividing by and replacing by , we obtain the claim (iv).

Exercise 4 For , let denote the Fejér summation operators

- (i) For any , establish the pointwise bound where is the Hardy-Littlewood maximal function
- (ii) Show that for , one has for almost all .
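
One reason Fejér summation is so much easier to handle than the partial sums is that its convolution kernel is non-negative with unit mass, which is what makes a bound by the Hardy-Littlewood maximal function plausible. The sketch below assumes the standard normalisation in which the Fejér operator averages the partial sums of orders $0, \dots, N-1$, so that its kernel is $\sum_{|k|<N} (1 - |k|/N) e^{2\pi i k x}$; this normalisation and the function names are assumptions.

```python
import math

def dirichlet_kernel(N, x):
    """D_N(x) = sum over |k| <= N of exp(2 pi i k x), written with cosines."""
    return 1.0 + 2.0 * sum(math.cos(2 * math.pi * k * x) for k in range(1, N + 1))

def fejer_kernel(N, x):
    """Average of D_0, ..., D_{N-1}: the weights (1 - |k|/N) taper the modes."""
    return 1.0 + 2.0 * sum((1 - k / N) * math.cos(2 * math.pi * k * x)
                           for k in range(1, N))

def grid_values(kernel, N, points=400):
    """Values of the kernel on an evenly spaced grid of [0, 1)."""
    return [kernel(N, j / points) for j in range(points)]

if __name__ == "__main__":
    fejer = grid_values(fejer_kernel, 8)
    dirichlet = grid_values(dirichlet_kernel, 8)
    print("min Fejer:", min(fejer), "mean Fejer:", sum(fejer) / len(fejer))
    print("min Dirichlet:", min(dirichlet))
```

The Fejér kernel stays non-negative (up to rounding) with mean one, whereas the Dirichlet kernel takes large negative values; the latter is why the partial sums cannot be controlled by a positive averaging operator.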

Exercise 5 (Pointwise convergence of Fourier integrals) Let be such that the conclusion of Proposition 3(v) holds. Show that for any , one has for almost all , where is defined for Schwartz functions by the formula and then extended to by density.

Exercise 6 Let . Suppose that is such that one has the restriction estimate for all Schwartz functions , where denotes the surface measure on the sphere . Conclude that for all Schwartz functions . (This observation is due to Bourgain.) In particular, by Marcinkiewicz interpolation, implies for all . (Hint: adapt some parts of the argument used to get from (iii) to (i) in Proposition 3, using rotation invariance as a substitute for translation invariance. (But the translational symmetry of the restriction problem – more precisely, the ability to translate a function in physical space without changing the absolute value of its Fourier transform – will also be useful.))

We are now ready to establish Kolmogorov’s theorem (Theorem 2(i)); our arguments are loosely based on the original construction of Kolmogorov (though he was not in possession at the time of the Stein maximal principle). In view of the equivalence between (ii) and (v) in Proposition 3, it suffices to show that the maximal operator

fails to be of weak-type on Schwartz functions. Recalling that the Hilbert transform is also a Fourier multiplier operator some routine calculations then show that for any Schwartz function . By the triangle inequality, it then suffices to show that the maximal operator fails to be of weak type on Schwartz functions.

To motivate the construction, note from a naive application of the triangle inequality that

If the function was absolutely integrable, then by Young’s inequality we would conclude that the maximal operator was strong type , and hence also weak type . Thus any counterexample must somehow exploit the logarithmic divergence of the integral of . However, there are two potential sources of cancellation that could ameliorate this divergence: the sign of the Hilbert kernel , and the phase . But because of the supremum in , we can select the frequency parameter as we please, as long as it depends only on and not on . The idea is then to choose (and the support of ) to remove both sources of cancellation as much as possible.

We turn to the details. Let be a large natural number, and then select widely separated frequency scales

In order to assist with removing cancellation in the phases later, we will require these scales to be integers. The precise choice of scales is not too important as long as they are widely separated and integer valued, but for sake of concreteness one could for instance set . Let be a bump function of total mass supported on , and let be the Schwartz function thus is an approximation (in a weak sense) to the sum of Dirac masses , with the frequency scale of the approximation to increasing rapidly in . We easily compute the norm of :

Now we estimate for in the interval for some natural number ; note the set of all such has measure . In this range we will test the maximal operator at the frequency cutoff :

As is supported in , we see (for large enough) that avoids the support of and we can replace the principal value integral with the ordinary integral. Substituting (9), we conclude that As is an integer, the phase is equal to . We also cancel out the phase as being independent of , thus For , we exploit the oscillatory nature of the phase through an integration by parts, leading to the bound (one could even gain a factor of here if desired, but we will not need it). Summing, we have For , we instead exploit the near-constant nature of the phase by writing and similarly to conclude that Summing and combining with (11), we conclude (from the rapidly increasing nature of the ) that and thus (for large) Comparing this with (10) we contradict the conclusion of Proposition 3(iv), giving the claim.

Remark 7 In 1926, Kolmogorov refined his construction to obtain a function whose Fourier sums diverged everywhere (not just almost everywhere).

Exercise 8 (Rademacher-Menshov theorem)

- (i) Let be some square-integrable functions on a probability space , with a power of two. By performing a suitable Whitney type decomposition (similar to that used in Section 3 of Notes 1), establish the pointwise bound where for each , ranges over dyadic intervals of the form with . If furthermore the are orthogonal to each other, establish the maximal inequality
- (ii) If is a trigonometric polynomial with at most non-zero coefficients for some , use part (i) to establish the bound
- (iii) If lies in the Sobolev space for some , use (ii) to show that for almost every .
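
The decomposition in part (i) can be pictured through binary expansions: every initial segment $\{1,\dots,n\}$ splits into disjoint dyadic blocks, with at most one block per scale. A minimal sketch of that splitting follows; the function name and the convention that blocks have the form $\{j2^k+1,\dots,(j+1)2^k\}$ are assumptions made for illustration and may differ from the indexing intended in the exercise.

```python
def dyadic_pieces(n):
    """Split {1, ..., n} into disjoint dyadic blocks, at most one for each scale 2^k.

    Each block is returned as an inclusive pair (first, last) and has the form
    {j * 2^k + 1, ..., (j + 1) * 2^k} for some integers j, k >= 0.
    """
    pieces = []
    start = 0
    for k in range(n.bit_length() - 1, -1, -1):
        if n & (1 << k):  # the binary digit of n at scale 2^k is set
            pieces.append((start + 1, start + (1 << k)))
            start += 1 << k
    return pieces

if __name__ == "__main__":
    print(dyadic_pieces(13))
```

Since a partial sum up to $n$ is then a sum of at most $\log_2 N + 1$ block sums, one per scale, a maximal function over $n$ can be bounded by a sum over scales of maxima over blocks of a fixed scale, which is the shape of the pointwise bound requested in (i).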

** — 2. Carleson’s theorem — **

We now begin the proof of Carleson’s theorem (Theorem 2(ii)), loosely following the arguments of Lacey and Thiele (we briefly comment on other approaches at the end of these notes). In view of Proposition 3, it suffices to establish the weak-type bound

for Schwartz functions . Because of the supremum, the expression depends sublinearly on rather than linearly; however there is a trick to reduce matters to considering linear estimates. By selecting, for each , to be a frequency which attains (or nearly attains) the supremal value of , it suffices to establish the linearised estimate uniformly for all measurable functions , where is the operator One can think of this operator as the (Kohn-Nirenberg) quantisation of the rough symbol . Unfortunately this symbol is far too rough for us to be able to use pseudodifferential operator tools from the previous set of notes. Nevertheless, the “time-frequency analysis” mindset of trying to efficiently decompose phase space into rectangles consistent with the uncertainty principle will remain very useful.

The next step is to dualise the weak norm to linearise the dependence on even further:

Exercise 9 Let , let be a -finite measure space, let be a measurable function, and let . Show that the following claims are equivalent (up to changes in the implied constants in the asymptotic notation):

- (i) One has .
- (ii) For every subset of of finite measure, the function is absolutely integrable on , and

In view of this exercise, we see that it suffices to obtain the bound

for all Schwartz , all sets of finite measure, and all measurable functions . Actually only the restriction of to is relevant here, so one can view as a function just on if desired. The operator can be viewed as the quantisation of the (very rough) symbol , that is to say the indicator function of the region lying underneath the graph of :
A notable feature of the estimate (12) is that it enjoys *three* different symmetries (or near-symmetries), each of which is “non-compact” in the sense that it is parameterised by a parameter taking values in a non-compact space such as or :

- (i) (Translation symmetry) For any spatial shift , both sides of (12) remain unchanged if we replace by , the set by the translate , and the function by .
- (ii) (Dilation symmetry) For any scaling factor , both sides of (12) become multiplied by the same scaling factor if we replace by , by the dilate , and the function by .
- (iii) (Modulation symmetry) For any frequency shift , both sides of (12) remain (almost) unchanged if we replace by , do not modify the set , and replace the function by . (Technically the left-hand side changes because of an additional factor of , but this factor can be handled for instance by generalising the indicator function cutoff to a subindicator function cutoff that has the pointwise bound ; we will ignore this very minor issue here.)

Each of these symmetries corresponds to a different symmetry of phase space , namely spatial translation , dilation , and frequency translation respectively. As a general rule of thumb, if one wants to prove a delicate estimate such as (12) that is invariant with respect to one or more non-compact symmetries, then one should use tools that are similarly invariant (or approximately invariant) with respect to these symmetries. Thus for instance Littlewood-Paley theory or Calderón-Zygmund theory would not be suitable tools to use here, as they are only invariant with respect to translation and dilation symmetry but absolutely fail to have any modulation symmetry properties (these theories prescribe a privileged role to the frequency origin, or equivalently they isolate functions of mean zero as playing a particularly important role).

Besides the need to respect the symmetries of the problem, one of the main difficulties in establishing (12) is that the expression , couples together the function with the function in a rather complicated way (via the frequency variable ). We would like to try to decouple this interaction by making and instead interact with simpler objects (such as “wave packets”), rather than being coupled directly to each other. To motivate the decomposition to use, we begin with a heuristic discussion. The first main idea is to temporarily work in the (non-invertible) coordinate system of phase space rather than in order to simplify the constraint to the simple geometric region of a half-plane (this coordinate system is of course a terrible choice for most of the other parts of the argument, but is the right system to use for the frequency decompositions we will now employ). In analogy to the Whitney type decompositions used in Notes 1, one can split

for almost all choices of and (at least if have the same sign), where range over pairs of dyadic intervals that are “close” in the sense that and that and are not adjacent, but their parents are adjacent, and with to the left of . (Here it is convenient to work with half-open dyadic intervals , to avoid issues with overlap.) If one ignores the caveats and blindly substitutes in the decomposition (13), the expression in the left of (12) becomes To decouple further, we will try to decompose into “rank one” operators. More precisely, we manipulate where we use the notation . It will be convenient to try to discretise this integral average. From the uncertainty principle, modifying by should only modify approximately by a phase, so the integral here is roughly constant at spatial scales . So we heuristically have If we now define a

We now make the above heuristic decomposition rigorous. For any dyadic interval , let denote the left child interval, and the right child interval. We fix a bump function supported on normalised to have norm ; henceforth we permit all implied constants in the asymptotic notation to depend on . For each interval let denote the rescaled function

noting that this is a bump function supported on . We will establish the estimate where ranges over all dyadic intervals. We assume (15) for now and see why it implies (12). The left-hand side of (15) is not quite dilation or frequency modulation invariant, but we can fix this by an averaging argument as follows. Applying the modulation invariance, we see for any that since we thus have We temporarily truncate to a finite range of scales, and use the triangle inequality, to obtain for any finite . For fixed , the expression is periodic in with period , with average equal to which we can rewrite as which one can rewrite further (using the change of variables ) as where

Hence if we average over all in (say) , we conclude that

and hence on sending to infinity Using dilation symmetry, we also see that for any . Averaging this for with Haar measure , we conclude that But as is a bump function supported in , one has The quantity is a non-zero constant, hence which is (12).

It remains to prove (15). As in the heuristic discussion, we approximately decompose the convolution into a sum over tiles. We have

Motivated by this, we define as before a

It remains to establish (17). It is convenient to introduce the sets

so that the target estimate (17) simplifies slightly to As advertised, we have now decoupled the influences of and the influences of (which determine the sets ), as these quantities now only directly interact with the wave packets , rather than with each other. Moreover, in some sense only interacts with the lower half of the tile (as this is where is concentrated), while and only interact with the upper half of the tile.

One advantage of this “model” formulation of the problem is that one can naturally build up to the full problem by trying to establish estimates of the form

where is some smaller set of tiles. For instance, if we can prove (19) for all finite collections of tiles, then by monotone convergence we recover the required estimate.

The key problem here is that tiles have three degrees of freedom: scale, spatial location, and frequency location, corresponding to the three symmetries of dilation, spatial translation, and frequency modulation of the original estimate (12). But one can warm up by looking at families of tiles that only exhibit two or fewer degrees of freedom, in a way that slowly builds up the various techniques we will need to apply to establish the general case:

**The case of a single tile** We begin with the simplest case of a single tile (so that there are zero degrees of freedom):

**The case of separated tiles of fixed scale** Now we let be a collection of tiles all of a fixed spatial scale (so that we have the two parameters of spatial and frequency location, but not the scale parameter). Among other things, this makes the tiles in essentially disjoint (i.e., disjoint ignoring sets of measure zero). This disjointness manifests itself in two useful ways. Firstly, we claim that we can improve the trivial bound

Now let us see why (24) is true. To motivate the argument, suppose that had no tail outside of , so that one could replace to in (22). Then would have

and as the tiles are all essentially disjoint the claim (24) would then follow from summing in , since each contributes to at most one of the sets . Now we have to deal with the contribution of the tails. We can bound For each , there is at most one dyadic interval of the fixed length such that . Thus in the above sum is fixed, and only can vary; from (22) we then see that , giving (24).

Now we prove (25). The intuition here is that the essential disjointness of the tiles makes the approximately orthogonal, so that (25) should be a variant of Bessel’s inequality. We exploit this approximate orthogonality by a $TT^*$ method, which we perform here explicitly. By duality we have

for some coefficients with , so by Cauchy-Schwarz it suffices to show that The left-hand side expands as From the Fourier support of we see that the inner product vanishes unless the intervals overlap which by the equal sizes of force . In this case we can use (22) to bound the inner product by and then a routine application of Schur’s test gives (26). This establishes (25), giving (19) in the case of tiles of equal dimensions.
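
Schur's test bounds the operator norm of a matrix by the geometric mean of its largest absolute row sum and its largest absolute column sum. A minimal sketch on a hypothetical matrix with off-diagonal decay $(1+|i-j|)^{-2}$ follows; this matrix is a stand-in chosen for illustration and is not the actual matrix of inner products of the wave packets. The norm is estimated by power iteration.

```python
import math

def decay_matrix(n):
    """Hypothetical symmetric matrix with entries (1 + |i - j|)^(-2)."""
    return [[1.0 / (1 + abs(i - j)) ** 2 for j in range(n)] for i in range(n)]

def schur_bound(A):
    """sqrt(max absolute row sum * max absolute column sum)."""
    row = max(sum(abs(a) for a in r) for r in A)
    col = max(sum(abs(A[i][j]) for i in range(len(A))) for j in range(len(A[0])))
    return math.sqrt(row * col)

def operator_norm(A, iterations=200):
    """Estimate the l^2 operator norm of a symmetric matrix with non-negative
    entries by power iteration started from the all-ones vector."""
    v = [1.0] * len(A)
    norm = 0.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(r, v)) for r in A]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalise so that the next step measures |A v|
    return norm

if __name__ == "__main__":
    A = decay_matrix(40)
    print(operator_norm(A), schur_bound(A))
```

The row sums stay bounded uniformly in the size of the matrix because the decay is summable, so the bound does not degrade as more rows are added; this is the feature one needs from the decay of the inner products.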

**The case of a regular -tree**

Now we attack some cases where the tiles can vary in scale. In phase space, a key geometric difficulty now arises from the fact that tiles may start partially overlapping each other, in contrast to the previous case in which the essential disjointness of the tile set was crucial in establishing the key estimates (24), (25). However, because we took care to restrict the intervals of the tiles to be dyadic, there are only a limited number of ways in which two tiles can overlap. Given two rectangles and , we define the relation if and ; this is clearly a partial order on rectangles. The key observation is as follows: if two tiles overlap, then either or . Similarly if are replaced by their upper tiles or by their lower tiles . Note that if are tiles with , then one of or holds (and the only way both inequalities can hold simultaneously is if ).
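
The dichotomy for overlapping tiles can be checked by brute force on a small family. The sketch below assumes that tiles are rectangles $I \times \omega$ of dyadic intervals with $|I||\omega| = 1$, and that $P \leq P'$ means $I_P \subset I_{P'}$ and $\omega_{P'} \subset \omega_P$. Both conventions, the normalisation of the area, and all names are assumptions for illustration; the check only uses that every tile has the same area.

```python
from fractions import Fraction
from itertools import product

def dyadic(k, j):
    """Half-open dyadic interval [j * 2^k, (j + 1) * 2^k) with exact endpoints."""
    length = Fraction(2) ** k
    return (j * length, (j + 1) * length)

def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlap(a, b):
    return max(a[0], b[0]) < min(a[1], b[1])

def tile(k, a, b):
    """Area-one tile I x omega with |I| = 2^k and |omega| = 2^(-k)."""
    return (dyadic(k, a), dyadic(-k, b))

def leq(P, Q):
    """P <= Q when I_P lies inside I_Q and omega_Q lies inside omega_P."""
    return contains(Q[0], P[0]) and contains(P[1], Q[1])

def tiles_in_box(max_scale=2, size=4):
    """All area-one dyadic tiles with |I| = 2^k, |k| <= max_scale, in [0, size)^2."""
    tiles = []
    for k in range(-max_scale, max_scale + 1):
        n_space = int(size / Fraction(2) ** k)
        n_freq = int(size * Fraction(2) ** k)
        tiles.extend(tile(k, a, b) for a, b in product(range(n_space), range(n_freq)))
    return tiles

def overlapping_pairs_are_comparable(tiles):
    """Check that any two tiles that overlap as rectangles are ordered by leq."""
    return all(leq(P, Q) or leq(Q, P)
               for P, Q in product(tiles, repeat=2)
               if overlap(P[0], Q[0]) and overlap(P[1], Q[1]))

if __name__ == "__main__":
    print(overlapping_pairs_are_comparable(tiles_in_box()))
```

The check succeeds because two dyadic intervals are either disjoint or nested: of two overlapping tiles, the one with the shorter spatial interval has the longer frequency interval, so both containments point in compatible directions.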

As was first observed by Fefferman, a key configuration of tiles that needs to be understood for these sorts of problems is that of a *tree*.

Definition 10 Let be a tile. A *tree with top* is a collection of tiles with the property that for all . (For minor technical reasons it is convenient to not require the top to actually lie in the tree , though this is often the case.) We write for the spatial support of the tree, and for the frequency support of the tree top. If we in fact have for all , we say that is a -tree; similarly if for all , we say that is a -tree. (Thus every tree can be partitioned into a -tree and a -tree with the same top as the original tree.)

The tiles in a tree can vary in scale and in spatial location, but once these two parameters are given, the frequency location is fixed, so a tree can again be viewed as a “two-parameter” subfamily of the three-parameter family of tiles.

We now prove (19) in the case when is a -tree , thus for all . Here, the factors will all “collide” with each other and there will be no orthogonality to exploit here; on the other hand, there will be a lot of “disjointness” in the that can be exploited instead.

To illustrate the key ideas (and to help motivate the arguments for the general case) we will also make the following “regularity” hypotheses: there exist two quantities (which we will refer to as the *energy* and *mass* of the tree respectively) for which we have the upper bounds

We also assume that we have the reverse bounds for the tree top:

and It will be through a combination of both these lower and upper bounds that we can obtain a bound (19) that does not involve either or .
We will use (27), (28), (29) to establish the *tree estimate*

Note from (30) and Cauchy-Schwarz that

and from (31) and Cauchy-Schwarz one similarly has and so (32) recovers the desired estimate (19).

It remains to establish the tree estimate (32). It will be convenient to use the tree to partition the real line into dyadic intervals that are naturally “adapted to” the geometry of the tree (or more precisely to the spatial intervals of the tree) in a certain way (in a manner reminiscent of a Whitney decomposition).

Exercise 11 (Whitney-type decomposition associated to a tree) Let be a non-empty tree. Show that there exists a family of dyadic intervals with the following properties:

- (i) The intervals in form a partition of (up to sets of measure zero).
- (ii) For each and any with , we have .
- (iii) For each , there exists with and .

(Hint: one can choose to be the collection of all dyadic intervals whose dilate does not contain any , and which is maximal with respect to set inclusion.)

We can of course assume that the tree is non-empty, since (32) is trivial for empty sets of tiles. We apply the partition from Exercise 11. By the triangle inequality, we can bound the left hand side of (32) by

which by (27), (22) may be bounded by We first dispose of the narrow tiles in which . By Exercise 11(ii) this forces . From (28) we have (say). For each fixed spatial scale , the intervals in the tree are all essentially disjoint, so a routine calculation then shows (say), so that which from Exercise 11(ii) implies that the contribution of the case to (32) is acceptable.

Now we consider the wide tiles in which . From Exercise 11(ii) this case is only possible if and . Thus the are now restricted to an interval of length , and it will suffice to establish the local estimate

for each . Note that for each fixed spatial scale , there is at most one choice of frequency interval with and , thus for fixed the set is independent of . We may then sum in for each such scale to conclude Now we make the crucial observation that in a -tree , the intervals are all essentially disjoint, hence the are disjoint as well. As these sets are also contained in , we conclude that From Exercise 11(iii) and (29) (choosing a tile with spatial scale and within of , and with for the tile provided by Exercise 11(iii)) we have giving the claim.

**The case of a regular -tree**

We now complement the previous case by establishing (19) for (certain types of) -trees . The situation is now reversed: there is a lot of “collision” in the , but on the other hand there is now some “orthogonality” in the that can be exploited.

As before we will assume some regularity on the -tree , namely that there exist for which one has the upper bounds

for all (note this is slightly stronger than (27)), as well as the bound (29) for any tile with for some . We complement this with the matching lower bounds and (31).

As before we will focus on establishing the tree estimate (32). From (31) and Cauchy-Schwarz as before we have

As we now have a -tree, the tiles become disjoint (up to null sets), and we can obtain an almost orthogonality estimate:

Exercise 12 (Almost orthogonality) For any -tree , show that for all complex numbers , and use this to deduce the Bessel-type inequality

From this exercise and (34) we see that

and so the desired bound (19) will follow from the tree estimate (32).

In this case it will be convenient to linearise the sum to remove the absolute value signs; more precisely, to show (32) it suffices to show that

for any complex numbers of magnitude . Again we may assume that the tree is non-empty, and use the partition from Exercise 11, to split the left-hand side as The contribution of the narrow tiles can be disposed of as before without any additional difficulty, so we focus on estimating the contribution of the wide tiles. As before, in order for this sum to be non-empty has to be contained in an neighbourhood of .

The main difficulty here is the dependence of on . We rewrite

so that the above expression can be written as Now for a key geometric observation: the intervals are nested (and decrease when increases), so the condition is equivalent to a condition of the form for some scale depending on . Thus the above sum can be written as One can bound the integrand here by a “maximal Calderón-Zygmund operator” which is basically a sup over truncations of the “(modulated) pseudodifferential operator” The point of this formulation is that the integrand can now be expressed as a sort of “Littlewood-Paley projection” of the function to the region of frequency space corresponding to those intervals with :

Exercise 13 Establish the pointwise estimate for all where ranges over all intervals (not necessarily dyadic) containing .

From (29) and Exercise 11(iii) as before we have

and so we can bound the expression (35) by which one can bound in terms of the Hardy-Littlewood maximal function of , followed by Cauchy-Schwarz and the Hardy-Littlewood inequality, and finally Exercise 12, as On the other hand, from (33) we have for every . By grouping the tiles in according to their maximal elements (which necessarily have essentially disjoint spatial intervals) and applying the above inequality to each such group and summing, we conclude that and the tree estimate (32) follows.
**The general case**

We are now ready to handle the general case of an arbitrary finite collection of tiles. Motivated by the previous discussion, we define two quantities:

Definition 14 (Energy and mass) For any non-empty finite collection of tiles, we define the energy to be the quantity where ranges over all -trees in , and the mass to be the quantity where is the set (thus for instance ). By convention, we declare the empty set of tiles to have energy and mass equal to zero.

Note here that the definition of mass has been modified slightly from previous arguments, in that we now use instead of . However, this turns out to be an acceptable modification, in the sense that we still continue to have the analogue of (32):

Exercise 15 (Tree estimate) If is a tree, show that

Since has an norm of , we also have the trivial bound

for any finite collection of tiles . The strategy is now to try to partition an arbitrary family of tiles into collections of disjoint trees (or “forests”, if you will) whose energy , mass , and spatial scale are all under control, apply Exercise 15 to each tree, and sum. To do this we rely on two key selection results, which are vaguely reminiscent of the Calderón-Zygmund decomposition:

Proposition 16 (Energy selection) Let be a finite collection of tiles with for some . Then one can partition into a collection of disjoint trees with together with a remainder set with

Proposition 17 (Mass selection) Let be a finite collection of tiles with for some . Then one can partition into a collection of disjoint trees with together with a remainder set with

(In these propositions, “disjoint” means that any given tile belongs to at most one of the trees in ; but the tiles in one tree are allowed to overlap the tiles in another tree.)

Let us assume these two propositions for now and see how they (together with Exercise 15) establish the required estimate (19) for an arbitrary collection of tiles. We may assume without loss of generality that and are non-zero. Rearranging the above two propositions slightly, we see that if is a finite collection of tiles such that

for some integer then after applying Proposition 16 followed by Proposition 17, we can partition into a disjoint collection of trees with together with a remainder with Note that any finite collection of tiles will obey (38) for some sufficiently large and negative . Starting with this and then iterating indefinitely, and discarding any empty families, we can therefore partition any finite collection of tiles as where are collections of trees (empty for all but finitely many ) such that and (39) holds, and is a residual collection of tiles with We can then bound the left-hand side of (19) by From Exercise 15 applied to individual tiles and (41) we see that the second term in this expression vanishes. For the first term, we use Exercise 15, (40), (36) to bound this sum by which by (39) is bounded by which sums to as required. It remains to establish the energy and mass selection lemmas. We begin with the mass selection claim, Proposition 17. Let denote the set of all tiles with for some and such that

Let denote the set of tiles in that are maximal with respect to the tile partial order. (Note that the left-hand side of (42) is bounded by , so there is an upper bound to the spatial scales of the tiles involved here.) Then every tile in is either less than or equal to a tile in , or is such that for all . Thus if we let be the collection of tiles of the second form, and let be the collection of trees with tree top associated to each (selected greedily, and in arbitrary order, subject of course to the requirement that no tile belongs to more than one tree), we obtain the required partition with and it remains to establish the bound This will be a (rather heavily disguised) variant of the Hardy-Littlewood maximal inequality. By construction, the tree tops are essentially disjoint, and one has for all such tree tops. To motivate the argument, suppose for sake of discussion that we had the stronger estimate By the essential disjointness of the , the sets are also essentially disjoint subsets of , hence and the claim (43) would then follow. Now we do not quite have (44); but from the pigeonhole principle we see that for each there is a natural number such that (say), where denotes the interval with the same center as but times the length (this is not quite a dyadic interval). We now restrict attention to those associated to a fixed choice of . Let denote the corresponding dilated tiles, then we have for each with . Unfortunately, the are no longer disjoint. However, by the greedy algorithm (repeatedly choosing maximal tiles (in the tile ordering)), we can find a collection such that

- (i) All the dilated tree tops are essentially disjoint.
- (ii) For every with , there is such that intersects and .
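The greedy selection behind properties (i) and (ii) is the same mechanism as in the Vitali covering lemma. The following is a minimal sketch in a simplified one-dimensional setting, with plain intervals standing in for the dilated tree tops and "longest first" standing in for maximality in the tile ordering; the function names and the sample intervals are hypothetical and chosen only for illustration.

```python
def greedy_disjoint(intervals):
    """Vitali-type greedy selection (simplified analogue of the tile argument).

    Process intervals from longest to shortest, keeping an interval only if
    it is essentially disjoint from every interval kept so far.
    """
    chosen = []
    for a, b in sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True):
        # Touching at an endpoint counts as essentially disjoint.
        if all(b <= c or d <= a for c, d in chosen):
            chosen.append((a, b))
    return chosen


def overlaps(iv, jv):
    """True if the two intervals share more than an endpoint."""
    return iv[0] < jv[1] and jv[0] < iv[1]


# Hypothetical sample data.
intervals = [(0, 4), (3, 5), (4.5, 6), (10, 11), (10.5, 13), (2, 2.5)]
chosen = greedy_disjoint(intervals)

# Analogue of property (ii): every interval meets a chosen interval that is
# at least as long, so it lies inside the threefold dilate of that interval.
covered = all(
    any(overlaps(iv, jv) and (jv[1] - jv[0]) >= (iv[1] - iv[0]) for jv in chosen)
    for iv in intervals
)
```

Because every discarded interval meets a chosen interval that is at least as long, summing lengths over the discarded intervals costs only a constant factor relative to the chosen ones, which is the role property (ii) plays in the argument.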

From property (i) and (45) we have

On the other hand, from property (ii) we see that the sum of all the for all with associated to a single is . Putting the two statements together we see that and on summing in we obtain the required claim (43). Finally, we prove the energy selection claim, Proposition 16. The basic idea is to extract all the high-energy trees from in such a way that the -tree components of those trees are sufficiently “disjoint” from each other that a useful Bessel inequality, generalising Exercise 12, may be deployed. Implementing this strategy correctly turns out however to be slightly delicate. We perform the following iterative algorithm to generate a partition

as well as a companion collection of -trees as follows.

- Step 1. Initialise and .
- Step 2. If then STOP. Otherwise, go on to Step 3.
- Step 3. Since we now have , contains a -tree for which
  Among all such , choose one for which the midpoint of the frequency is *minimal*. (The reason for this rather strange choice will be made clearer shortly.)
- Step 4. Add to , add the larger tree (with the same top as ) to , then remove from . We also remove the adjacent trees and from and also place them into . Now return to Step 2.

This procedure terminates in finite time to give a partition (46) with , and with the trees coming in triplets all associated to a -tree in with the same spatial scale as , with all the -trees disjoint and obeying the estimates

(both the upper and lower bounds will be important for this argument). It will then suffice to show that ; by (48), it then suffices to show the Bessel-type inequality . Now we make a crucial observation: not only are the trees in disjoint (in the sense that no tile belongs to two of these trees), but the lower tiles are also essentially disjoint. Indeed we claim an even stronger disjointness property: if , are such that , then is not only disjoint from the larger dyadic interval , but is in fact disjoint from the even larger interval . To see this, suppose for contradiction that and . There are three possibilities to rule out:

- is equal to . This can be ruled out because any two lower frequency intervals associated to a -tree are either equal or disjoint.
- was selected after was. To rule this out, observe that contains the parent of , and hence , , or . Thus, when was selected, should have been placed with one of the three trees associated to and would therefore not have been available for inclusion into , a contradiction.
- was selected before was. If this case held, then the midpoint of would have to be greater than or equal to that of , otherwise would not have a minimal midpoint at the time of its selection. But is contained in , which is contained in , which lies below , which contains , which contains the midpoint of ; thus the midpoint of lies strictly below that of , a contradiction.

If the were perfectly orthogonal to each other, this disjointness would be more than enough to establish (49). Unfortunately we only have imperfect orthogonality, and we have to work a little harder. As usual, we turn to a $TT^*$ type argument. We can write the left-hand side of (49) as

so by Cauchy-Schwarz it suffices to show that By the triangle inequality, the left-hand side may be bounded by As has Fourier support in , we see that vanishes unless and overlap. By symmetry it suffices to consider the cases and . First let us consider the contribution of . Using Young’s inequality and symmetry, we may bound this contribution by

A direct calculation using (22) reveals that so the contribution of this case is at most as desired. Now we deal with the case when , which by the preceding discussion implies that and lies outside of . Here we use (37) to bound

and

and then we can bound this contribution by

Direct calculation using (22) reveals that (say), and also so we obtain a bound of which is acceptable by (48). This finally finishes the proof of Proposition 16, which in turn completes the proof of Carleson’s theorem.

Remark 18 The Lacey-Thiele proof of Carleson’s theorem given above relies on a decomposition of a tileset in a way that controls both energy and mass. The original proof of Carleson dispenses with mass (or with the function ), and focuses on controlling maximal operators that (in our notation) are basically of the form To control such functions, one iterates a decomposition similar to Proposition 16 to partition into trees with good energy control, and establishes pointwise control of the contribution of each tree outside of an exceptional set. See Section 4 of this article of Demeter for an exposition in the simplified setting of Walsh-Fourier analysis. The proof of Fefferman takes the opposite tack, dispensing with energy and focusing on bounding the operator norm of the linearised operator Roughly speaking, the strategy is to iterate a version of Proposition 16 to partition into “forests” of disjoint trees, though in Fefferman’s argument some additional work is invested into obtaining even better disjointness properties on these forests than is given here. See Section 5 of this article of Demeter for an exposition in the simplified setting of Walsh-Fourier analysis.

A modification of the above arguments used to establish the weak estimate can also establish restricted weak-type estimates for any :

Exercise 19 For any sets of finite measure, and any measurable function , show that for any . (Hint: repeat the previous analysis with , but supplement it with an additional energy bound coming from a suitably localised version of Exercise 12.)

The bound (51) is also true for , yielding Hunt’s theorem, but this requires some additional arguments of Calderón-Zygmund type, involving the removal of an exceptional set defined using the Hardy-Littlewood maximal function:

Exercise 20 (Hunt’s theorem) Let be of finite non-zero measure, and let be a measurable function. Let be the exceptional set for a large absolute constant ; note from the Hardy-Littlewood inequality that if is large enough.

- (i) If is a finite collection of tiles with for all , show that (Hint: By using (22) and the disjointness of the when is fixed, first establish the estimate whenever is a natural number and is an interval with and .)
- (ii) If is a finite collection of tiles with for all , show that . (For a given tree , one can introduce the dyadic intervals as in Exercise 11, then perform a Calderón-Zygmund type decomposition to , splitting it into a “good” function bounded pointwise by , plus “bad functions” that are supported on the intervals and have mean zero. See this paper of Grafakos, Terwilleger, and myself for details.)
- (iii) For any finite collection of tiles for all
- (iv) Show that (51) holds for all , and conclude Theorem 2(iii).

Remark 21 The methods of time-frequency analysis given here can handle several other operators that, like the Carleson operator, exhibit scaling, translation, and frequency modulation symmetries. One model example is the bilinear Hilbert transform for . The methods in this set of notes were used by Lacey and Thiele to establish the estimates for with (these estimates have since been strengthened and extended in a number of ways). We only give the briefest of sketches here. Much as how Carleson’s theorem can be reduced to a bound (19), the above estimates can be reduced to the estimation of a model sum where is a certain collection of triples of tiles with common spatial interval and frequency intervals varying along a certain one-parameter family for each fixed choice of spatial interval. One then uses a variant of Proposition 16 to partition into “-trees”, “-trees”, and “-trees”, the contribution of each of which can be controlled by the energies of on such trees, times the length of the spatial support of the tree, in analogy with Exercise 15. See for instance the text of Muscalu and Schlag for more discussion and further results.

Remark 22 The concepts of mass and energy can be abstracted into a framework of spaces associated to outer measures (as opposed to the classical setup of spaces associated to countably additive measures), in which the mass and energy selection propositions can be viewed as consequences of an abstract Carleson embedding theorem, and the calculations establishing estimates such as (19) from such propositions and a tree estimate can be viewed as consequences of an “outer Hölder inequality”. See this paper of Do and Thiele for details.


In previous notes we have often performed various localisations in either physical space or Fourier space , for instance in order to take advantage of the uncertainty principle. One can formalise these operations in terms of the functional calculus of two basic operations on Schwartz functions , the *position operator* defined by

and the *momentum operator* , defined by

(The terminology comes from quantum mechanics, where it is customary to also insert a small constant on the right-hand side of (1) in accordance with de Broglie’s law. Such a normalisation is also used in several branches of mathematics, most notably semiclassical analysis and microlocal analysis, where it becomes profitable to consider the semiclassical limit , but we will not emphasise this perspective here.) The momentum operator can be viewed as the counterpart to the position operator, but in frequency space instead of physical space, since we have the standard identity

for any and . We observe that both operators are formally self-adjoint in the sense that

for all , where we use the Hermitian inner product

Clearly, for any polynomial of one real variable (with complex coefficients), the operator is given by the spatial multiplier operator

and similarly the operator is given by the Fourier multiplier operator

Inspired by this, if is any smooth function that obeys the derivative bounds

for all and (that is to say, all derivatives of grow at most polynomially), then we can define the spatial multiplier operator by the formula

one can easily verify from several applications of the Leibniz rule that maps Schwartz functions to Schwartz functions. We refer to as the *symbol* of this spatial multiplier operator. In a similar fashion, we define the Fourier multiplier operator associated to the symbol by the formula

For instance, any constant coefficient linear differential operator can be written in this notation as

however there are many Fourier multiplier operators that are not of this form, such as fractional derivative operators for non-integer values of , which is a Fourier multiplier operator with symbol . It is also very common to use spatial cutoffs and Fourier cutoffs for various bump functions to localise functions in either space or frequency; we have seen several examples of such cutoffs in action in previous notes (often in the higher dimensional setting ).
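As a numerical illustration, a Fourier multiplier operator can be sketched on a periodic grid with the fast Fourier transform. This is a discrete analogue under stated assumptions, not the construction on the real line used in these notes: functions are taken to be 1-periodic and sampled at the points j/n, and the momentum operator is assumed to be normalised so that it has symbol ξ (so that d/dx has symbol 2πiξ). The helper name `fourier_multiplier` is hypothetical.

```python
import numpy as np


def fourier_multiplier(symbol, f_samples):
    """Discrete periodic analogue of a(D): multiply Fourier coefficients by a(xi)."""
    n = f_samples.size
    xi = np.fft.fftfreq(n, d=1.0 / n)  # integer frequencies in FFT ordering
    return np.fft.ifft(symbol(xi) * np.fft.fft(f_samples))


n = 256
x = np.arange(n) / n
f = np.sin(2 * np.pi * 3 * x)

# With D assumed to have symbol xi, the derivative d/dx = 2*pi*i*D has symbol 2*pi*i*xi.
df = fourier_multiplier(lambda xi: 2j * np.pi * xi, f)
deriv_error = np.max(np.abs(df - 6 * np.pi * np.cos(2 * np.pi * 3 * x)))


# Composition of multipliers corresponds to multiplication of symbols.
def a(xi):
    return 1.0 / (1.0 + xi**2)


def b(xi):
    return xi**2


composed = fourier_multiplier(a, fourier_multiplier(b, f))
product = fourier_multiplier(lambda xi: a(xi) * b(xi), f)
homomorphism_error = np.max(np.abs(composed - product))
```

A non-polynomial symbol such as a fractional power of |ξ| can be passed in exactly the same way, which is the sense in which fractional derivative operators fit into this framework.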

We observe that the maps and are ring homomorphisms, thus for instance

and

for any obeying the derivative bounds (2); also is formally adjoint to in the sense that

for , and similarly for and . One can interpret these facts as part of the functional calculus of the operators , which can be interpreted as densely defined self-adjoint operators on . However, in this set of notes we will not develop the spectral theory necessary in order to fully set out this functional calculus rigorously.

In the field of PDE and ODE, it is also very common to study *variable coefficient* linear differential operators

where the are now functions of the spatial variable obeying the derivative bounds (2). A simple example is the quantum harmonic oscillator Hamiltonian . One can rewrite this operator in our notation as

and so it is natural to interpret this operator as a combination of both the position operator and the momentum operator , where the *symbol* of this operator is the function

Indeed, from the Fourier inversion formula

for any we have

and hence on multiplying by and summing we have

Inspired by this, we can introduce the *Kohn-Nirenberg quantisation* by defining the operator by the formula

whenever and is any smooth function obeying the derivative bounds

for all and (note carefully that the exponent in on the right-hand side is required to be uniform in ). This quantisation clearly generalises both the spatial multiplier operators and the Fourier multiplier operators defined earlier, which correspond to the cases when the symbol is a function of only or only respectively. Thus we have combined the physical space and the frequency space into a single domain, known as phase space . The term “time-frequency analysis” encompasses analysis based on decompositions and other manipulations of phase space, in much the same way that “Fourier analysis” encompasses analysis based on decompositions and other manipulations of frequency space. We remark that the Kohn-Nirenberg quantisation is not the only choice of quantisation one could use; see Remark 19 below.
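To make the quantisation concrete, here is a minimal discrete sketch, assuming the Kohn-Nirenberg formula takes the usual form in which the symbol multiplies the Fourier transform of the function inside the inversion integral. The setting is a periodic grid rather than the real line, and the function name `kohn_nirenberg` and the test symbols are hypothetical choices made for illustration.

```python
import numpy as np


def kohn_nirenberg(symbol, f_samples):
    """Discrete periodic analogue of a(X, D) f.

    Computes sum_k a(x_j, xi_k) * fhat(xi_k) * exp(2*pi*i*x_j*xi_k)
    on the grid x_j = j/n with integer frequencies xi_k.
    """
    n = f_samples.size
    x = np.arange(n) / n
    xi = np.fft.fftfreq(n, d=1.0 / n)            # integer frequencies
    f_hat = np.fft.fft(f_samples) / n            # discrete Fourier coefficients
    phases = np.exp(2j * np.pi * np.outer(x, xi))
    a = symbol(x[:, None], xi[None, :])          # broadcastable to shape (n, n)
    return (a * phases) @ f_hat


n = 64
x = np.arange(n) / n
xi = np.fft.fftfreq(n, d=1.0 / n)
f = np.exp(np.sin(2 * np.pi * x))

# A symbol depending only on x gives a spatial multiplier.
spatial = kohn_nirenberg(lambda x, xi: 2 + np.cos(2 * np.pi * x), f)
spatial_error = np.max(np.abs(spatial - (2 + np.cos(2 * np.pi * x)) * f))

# A symbol depending only on xi gives a Fourier multiplier.
fourier = kohn_nirenberg(lambda x, xi: xi**2, f)
fourier_error = np.max(np.abs(fourier - np.fft.ifft(xi**2 * np.fft.fft(f))))
```

The dense n-by-n matrix is used for transparency rather than speed; it makes visible that a general symbol applies a different Fourier multiplier at each spatial point.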

In principle, the quantisations are potentially very useful for such tasks as inverting variable coefficient linear operators, or localising a function simultaneously in physical and Fourier space. However, a fundamental difficulty arises: the map from symbols to operators is now no longer a ring homomorphism, in particular

in general. Fundamentally, this is due to the fact that pointwise multiplication of symbols is a commutative operation, whereas the composition of operators such as and does not necessarily commute. This lack of commutativity can be measured by introducing the *commutator*

of two operators , and noting from the product rule that

(In the language of Lie groups and Lie algebras, this tells us that are (up to complex constants) the standard Lie algebra generators of the Heisenberg group.) From a quantum mechanical perspective, this lack of commutativity is the root cause of the uncertainty principle that prevents one from simultaneously localizing in both position and momentum past a certain point. Here is one basic way of formalising this principle:

Exercise 2 (Heisenberg uncertainty principle) For any and , show that

(Hint: evaluate the expression in two different ways and apply the Cauchy-Schwarz inequality.) Informally, this exercise asserts that the spatial uncertainty and the frequency uncertainty of a function obey the Heisenberg uncertainty relation .
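The inequality can be sanity-checked numerically. The sketch below assumes the normalisation in which the momentum operator is 1/(2πi) times d/dx, so that the lower bound reads ‖f‖²/(4π) (with both the position and frequency centres taken to be zero); these constants are assumptions, since the displayed formula is not reproduced here. Under that normalisation the Gaussian exp(-πx²) should saturate the bound, while other functions should exceed it.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]


def l2_norm(g):
    """Riemann-sum approximation of the L^2 norm on the grid."""
    return np.sqrt(np.sum(np.abs(g) ** 2) * dx)


def uncertainty_ratio(f):
    """Ratio of ||X f|| ||D f|| to ||f||^2 / (4 pi), centred at the origin."""
    Xf = x * f
    Df = np.gradient(f, dx) / (2j * np.pi)   # assumed normalisation of D
    return l2_norm(Xf) * l2_norm(Df) / (l2_norm(f) ** 2 / (4 * np.pi))


gaussian = np.exp(-np.pi * x**2)
ratio_gaussian = uncertainty_ratio(gaussian)      # expected to be close to 1

odd_function = x * np.exp(-np.pi * x**2)
ratio_odd = uncertainty_ratio(odd_function)       # expected to be well above 1
```

The derivative is approximated by finite differences, so the Gaussian ratio matches 1 only up to a small discretisation error.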

Nevertheless, one still has the correspondence principle, which asserts that in certain regimes (which, with our choice of normalisations, corresponds to the high-frequency regime), quantum mechanics continues to behave like a commutative theory, and one can sometimes proceed as if the operators (and the various operators constructed from them) commute up to “lower order” errors. This can be formalised using the *pseudodifferential calculus*, which we give below the fold, in which we restrict the symbol to certain “symbol classes” of various orders (which then restricts to be pseudodifferential operators of various orders), and obtain approximate identities such as

where the error between the left and right-hand sides is of “lower order” and in fact enjoys a useful asymptotic expansion. As a first approximation to this calculus, one can think of functions as having some sort of “phase space portrait” which somehow combines the physical space representation with its Fourier representation , and pseudodifferential operators behave approximately like “phase space multiplier operators” in this representation in the sense that

Unfortunately the uncertainty principle (or the non-commutativity of and ) prevents us from making these approximations perfectly precise, and it is not always clear how to even define a phase space portrait of a function precisely (although there are certain popular candidates for such a portrait, such as the FBI transform (also known as the Gabor transform in signal processing literature), or the Wigner quasiprobability distribution, each of which has some advantages and disadvantages). Nevertheless even if the concept of a phase space portrait is somewhat fuzzy, it is of great conceptual benefit both within mathematics and outside of it. For instance, the musical score one assigns a piece of music can be viewed as a phase space portrait of the sound waves generated by that music.
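The musical score analogy can be illustrated with a small Gabor-type transform. The sketch below is only one of several possible conventions: the Gaussian window, its width `sigma`, the sampling rate, and the two test tones are all hypothetical choices. It builds a signal that plays a 50 Hz tone followed by a 120 Hz tone, and reads off the dominant frequency near two different times.

```python
import numpy as np

fs = 1000                                   # samples per second (assumed)
t = np.arange(fs) / fs                      # one second of signal
signal = np.where(t < 0.5,
                  np.cos(2 * np.pi * 50 * t),
                  np.cos(2 * np.pi * 120 * t))


def gabor(signal, t, t0, freqs, sigma=0.05):
    """Windowed Fourier transform at time t0, using a Gaussian window of width sigma."""
    window = np.exp(-((t - t0) ** 2) / (2 * sigma**2))
    phases = np.exp(-2j * np.pi * np.outer(freqs, t))
    return phases @ (signal * window) / fs


freqs = np.arange(0, 201)                   # frequencies to probe, in Hz
early = np.abs(gabor(signal, t, 0.25, freqs))
late = np.abs(gabor(signal, t, 0.75, freqs))

peak_early = int(freqs[np.argmax(early)])   # dominant frequency near t = 0.25
peak_late = int(freqs[np.argmax(late)])     # dominant frequency near t = 0.75
```

Narrowing the window sharpens the time resolution but blurs the frequency peaks, and vice versa, which is the uncertainty principle appearing as a practical trade-off.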

To complement the pseudodifferential calculus we have the basic *Calderón-Vaillancourt theorem*, which asserts that pseudodifferential operators of order zero are Calderón-Zygmund operators and thus bounded on $L^p$ for $1 < p < \infty$. The standard proof of this theorem is a classic application of one of the basic techniques in harmonic analysis, namely the exploitation of *almost orthogonality*; the proof we will give here will achieve this through the elegant device of the Cotlar-Stein lemma.

Pseudodifferential operators (especially when generalised to higher dimensions ) are a fundamental tool in the theory of linear PDE, as well as related fields such as semiclassical analysis, microlocal analysis, and geometric quantisation. There is an even wider class of operators that is also of interest, namely the Fourier integral operators, which roughly speaking not only approximately multiply the phase space portrait of a function by some multiplier , but also move the portrait around by a canonical transformation. However, the development of the theory of these operators is beyond the scope of these notes; see for instance the texts of Hörmander or Eskin.

This set of notes is only the briefest introduction to the theory of pseudodifferential operators. Many texts are available that cover the theory in more detail, for instance this text of Taylor.

** — 1. Pseudodifferential operators — **

The Kohn-Nirenberg quantisation was defined above for any symbol obeying the very loose estimates (6). To obtain a clean theory it is convenient to focus attention on more restrictive classes of symbols. There are many such classes one can consider, but we shall only work with the classical symbol classes:

Definition 3 (Classical symbol class) Let . A function is said to be a (classical) symbol of order if it is smooth and one has the derivative bounds for all and . (Informally: “behaves like” , with each derivative in the frequency variable gaining an additional decay factor of , but with each derivative in the spatial variable exhibiting no gain.) The collection of all symbols of order will be denoted . If is a symbol of order , the operator is referred to as a pseudodifferential operator of order .

As a major motivating example, any variable coefficient linear differential operator (3) of order will be a pseudodifferential operator of order , so long as the coefficients obey the bounds

for , , and . (This would then exclude operators with unbounded coefficients, such as the harmonic oscillator, but can handle localised versions of these operators, and in any event there are other symbol classes in the literature that can be used to handle certain types of differential operators with unbounded coefficients.) Also, a fractional differential operator such as will be a pseudodifferential operator of order for any . We refer the reader to Stein’s text for a discussion of more exotic symbol classes than the one given here.
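As a worked check of the fractional differential operator example, the following computation verifies the first two frequency-derivative bounds for the model symbol $\langle \xi \rangle^m$. It assumes the standard conventions $\langle \xi \rangle := (1+|\xi|^2)^{1/2}$ and symbol bounds of the form $|\partial_x^j \partial_\xi^l a(x,\xi)| \lesssim_{j,l} \langle \xi \rangle^{m-l}$; both are assumptions, since the displayed formulas are not reproduced in the text above.

```latex
% Model symbol a(\xi) = \langle \xi \rangle^m = (1+\xi^2)^{m/2}, independent of x,
% so all bounds involving x-derivatives are trivial.
\begin{align*}
\partial_\xi \langle \xi \rangle^m
  &= m\, \xi\, \langle \xi \rangle^{m-2},
  & |\partial_\xi \langle \xi \rangle^m|
  &\le |m|\, \langle \xi \rangle^{m-1}
  \quad \text{since } |\xi| \le \langle \xi \rangle, \\
\partial_\xi^2 \langle \xi \rangle^m
  &= m\, \langle \xi \rangle^{m-2} + m(m-2)\, \xi^2\, \langle \xi \rangle^{m-4},
  & |\partial_\xi^2 \langle \xi \rangle^m|
  &\le \bigl(|m| + |m|\,|m-2|\bigr)\, \langle \xi \rangle^{m-2}.
\end{align*}
```

Each further derivative in $\xi$ follows the same pattern by induction, gaining one more factor of $\langle \xi \rangle^{-1}$, which is the behaviour the symbol class is designed to capture.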

The space of pseudodifferential operators of order forms a vector space that is non-decreasing in : any pseudodifferential operator of order is automatically also of order for any . (Thus, strictly speaking, it would be more appropriate to say that is a pseudodifferential operator of order *at most* if , but we will not adopt this convention for brevity.) The intuition to keep in mind is that a pseudodifferential operator of order behaves like a variable coefficient linear differential operator of order , with the obvious caveat that in the latter case is restricted to be a natural number, whereas in the former can be any real number. This intuition will be supported by the various components of the *pseudodifferential calculus* that we shall develop later, for instance we will show that the composition of a pseudodifferential operator of order and a pseudodifferential operator of order is a pseudodifferential operator of order .

Before we set out this calculus, though, we give a fundamental estimate, which can be viewed as a variable coefficient version of the Hörmander-Mikhlin multiplier theorem:

Theorem 4 (Calderón-Vaillancourt theorem) Let , and let be a pseudodifferential operator of order . Then one has for all . In particular, extends to a bounded linear operator on each space with .

We now begin the proof of this theorem. The first step is a dyadic decomposition of Littlewood-Paley type. Let be a bump function supported on that equals on . Then we can write

where

and

for . From dominated convergence, implies that

pointwise for . Thus by Fatou’s lemma, it will suffice to show that

uniformly in . Observe from Definition 3 and the Leibniz rule that each is supported in the strip and obeys the derivative estimates

From (5) and Fubini’s theorem we can express as an integral operator

for , where the integral kernel is given by the formula

We can obtain several estimates on this kernel. Firstly, from the triangle inequality, (11), and the support property of we have the trivial bound

When , we may integrate by parts repeatedly, gaining factors of at the cost of applying a derivative to for each such factor, and then if one applies the triangle inequality, (11), and support property of as before we conclude that

for any ; by combining the estimates, we conclude that

for all and . Differentiating (13) in or , and repeating the above arguments, we also obtain the estimates

Since the function has an norm of for any , we now see from (12) and Young’s inequality that

Thus each component of is under control (so for instance we may now discard the term); the difficulty is to sum in without losing any -dependent factors. To do this, we first observe from (14), (15) and a routine summation of that the total kernel (which is the integral kernel for ) obeys the pointwise bound

as well as the pointwise derivative bound

These are the usual kernel bounds for one-dimensional Calderón-Zygmund theory. From that theory we conclude that in order to prove the estimate (10), it suffices to establish the case

From (17), we have already established a preliminary bound

for each , but a direct application of the triangle inequality will cost us a -dependent factor, which we cannot afford. To do better, we need some “orthogonality” between the . The intuition here is that each component only interacts with the portion of that corresponds to frequencies of magnitude , and that these regions are somehow “orthogonal” to each other. Informally, this suggests that

where is something like a Littlewood-Paley projection operator to frequencies . If we accepted this heuristic, then we could informally use the Littlewood-Paley inequality (or decoupling theory) to calculate

It is possible to make this approximation (19) more precise and establish (18): see Exercise 7. However, we will take the opportunity to showcase another elegant way to exploit “almost orthogonality”, known as the Cotlar-Stein lemma:

Lemma 5 (Cotlar-Stein lemma) Let be bounded linear maps from one Hilbert space to another . Suppose that the maps obey the operator norm bounds for all and some , and similarly the maps obey the operator norm bounds

Note that if the had pairwise orthogonal ranges then would vanish whenever , and similarly if the had pairwise orthogonal coranges then the would vanish whenever . Thus the hypotheses of the Cotlar-Stein lemma are indeed some quantitative form of “almost orthogonality” of the .

*Proof:* We use the $TT^*$ method (which asserts that for a bounded linear map $T$ between Hilbert spaces, the operator norm of $TT^*$ or $T^*T$ is the square of that of $T$ or $T^*$). Applying this method to a single operator we have

Taking geometric means we have

then by the triangle inequality we have . This loses a factor of over the trivial bound. We can reduce this loss to by a further application of the $TT^*$ method as follows. Writing , we have

and similarly

so on taking geometric means we have .

We now reduce the loss in all the way to by iterating the $TT^*$ method (this is an instance of a neat trick in analysis, namely the tensor power trick). For any integer that is a power of two, we see from iterating the $TT^*$ method that

(In fact, this identity holds for any natural number , not just powers of two, as can be seen from spectral theory, but powers of two will suffice for the argument here.) We expand out the right-hand side and bound using the triangle inequality by

On the one hand, we can bound the norm by

grouping things slightly differently and using (22) twice, we can also bound this norm by

Taking the geometric mean, we can bound the norm by

Summing in using (20), then in using (21) and so forth until the sum (which is just summed with a loss of ), we conclude that

Sending , we obtain the claim.
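The lemma can be sanity-checked on matrices. The sketch below assumes the standard form of the statement, in which the two hypotheses bound the sums over j of the square roots of the operator norms of $T_i T_j^*$ and $T_i^* T_j$ by constants A and B respectively, and the conclusion is that the sum of the $T_i$ has operator norm at most $\sqrt{AB}$. The block structure, sizes, and random seed are hypothetical choices that make neighbouring operators interact while distant ones are exactly orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
count, dim, block, stride = 8, 64, 12, 7    # hypothetical sizes

# Each T_i acts on a diagonal block; consecutive blocks overlap in 5 coordinates.
ops = []
for i in range(count):
    T = np.zeros((dim, dim))
    window = slice(stride * i, stride * i + block)
    T[window, window] = rng.standard_normal((block, block)) / np.sqrt(block)
    ops.append(T)


def op_norm(M):
    """Operator (spectral) norm, i.e. the largest singular value."""
    return np.linalg.norm(M, 2)


# Almost orthogonality constants in the two hypotheses of the lemma.
A = max(sum(np.sqrt(op_norm(Ti @ Tj.T)) for Tj in ops) for Ti in ops)
B = max(sum(np.sqrt(op_norm(Ti.T @ Tj)) for Tj in ops) for Ti in ops)

total_norm = op_norm(sum(ops))               # norm of the sum of all the T_i
cotlar_stein_bound = np.sqrt(A * B)
triangle_bound = sum(op_norm(T) for T in ops)
```

Comparing `cotlar_stein_bound` with `triangle_bound` for larger values of `count` shows the point of the lemma: the former stays bounded as more operators are added, while the latter grows linearly.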

Remark 6 There is a refinement of the Cotlar-Stein lemma for infinite series of operators obeying the hypotheses of the lemma, in which it is shown that the series actually converges in the strong operator topology (though not necessarily in the operator norm topology); this refinement was first observed by Meyer, and can be found for instance in this note of Comech.

We will shortly establish the bounds

for any . The claim (18) then follows from the Cotlar-Stein lemma (using (17) to dispose of the term).

We shall just show that

when ; the case is treated similarly, as is the treatment of (in fact this latter operator vanishes when , though we will not really need this fact). We have

where

A direct application of (15) and the triangle inequality gives the bounds

(say), which when combined with Young’s inequality does not give the desired gain of . To recover this gain we begin integrating by parts. From (13) we have

Note that obeys similar estimates to but with an additional gain of . Thus the contribution of this term to will be acceptable. The contribution of the other term, after an integration by parts, is

The kernel obeys the same bounds as (15) but with an additional gain of ; similarly from (16) the expression obeys the same bounds as (15) but with an additional loss of . The claim follows. This concludes the proof of the Calderón-Vaillancourt theorem.

Exercise 7 With the hypotheses as above, and with a suitable Littlewood-Paley projection to frequencies , establish the operator norm bounds for all and . Use this to provide an alternate proof of (18) that does not require the Cotlar-Stein lemma.

Now we give a preliminary composition estimate:

Theorem 8 (Preliminary composition) Let be a pseudodifferential operator of some order , and let be a pseudodifferential operator of some order . Then the composition is a pseudodifferential operator of order , thus there exists such that (note from Exercise 1 that is uniquely determined).

*Proof:* We begin with some technical reductions in order to justify some later exchanges of integrals. We can express the symbol as a locally uniform limit of truncated symbols as , where is a bump function equal to near the origin; from the product rule we see that the symbol estimates (8) are obeyed by the uniformly in as long as . If is Schwartz, then so is , and can be verified to converge pointwise to . If one can show that for some pseudodifferential operator of order , with all the required symbol estimates (8) on obeyed uniformly in , then the claim will follow by using the Arzelà-Ascoli theorem to extract a locally uniformly convergent subsequence of the and taking a limit. The upshot of this is that we may assume without loss of generality that the symbol is compactly supported in , so long as our estimates do not depend on the size of this compact support, but only on the constants in the symbol bounds (8) for .

Similarly, we may approximate locally uniformly as the limit of symbols that are compactly supported in , which makes converge locally uniformly to ; from the compact support of this also shows that converges pointwise to . From the same limiting argument as before, we may thus assume that is compactly supported in , so long as our estimates do not depend on the size of this support, but only on the constants in the symbol bounds (8) for .

For , we have

hence on taking Fourier transforms

hence

and hence by Fubini’s theorem (and the compact support of and the Schwartz nature of ) we have , where

for all and , where the understanding is that the dependence of constants on is only through the symbol bounds (8) for these symbols.

From differentiation under the integral sign and integration by parts we obtain the Leibniz identities

and

From this and an induction on (varying as necessary, noting that if maps to and maps to ) we see that to prove (25) it suffices to do so in the case, thus we now only need to show that

Applying a smooth partition of unity in the variable to , it suffices to verify the claim in one of two cases:

- is supported in the region (so in particular ).
- is supported in the region .

(One can verify that applying the required cutoffs to does not significantly worsen the symbol estimates (8).) In the former case we write the left-hand side as

where

By repeating the proof of (14) we have

so from this and the symbol bound we obtain the claim in this case.

It remains to handle the latter case. Here we integrate by parts repeatedly in the variable to write the left-hand side of (26) as

for any . Then as before we can rewrite this as

where

By taking large enough we will eventually recover the bound

(in fact one can gain arbitrary powers of if desired), and so by repeating the previous arguments we also obtain the claim in this case.

The above theorem shows that if and then . The following exercise gives some refinements to this fact:

Exercise 9 (Composition of pseudodifferential operators) Let and for some .

- (i) Show that . (Hint: reduce as before to the case where are compactly supported, and use the fundamental theorem of calculus to write , where . Then use the Fourier inversion formula, integration by parts, and arguments similar to those used to prove Theorem 8.)
- (ii) Show that , where and . (Hint: now apply the fundamental theorem of calculus once more to expand .)
- (iii) Check (i) and (ii) directly in the classical case when and for some smooth obeying the bounds (9) and for . Based on this, for any integer , make a prediction for an approximation to as a polynomial combination of the symbols arbitrary and finitely many of their derivatives which is accurate up to an error in . Then verify this prediction.

Remark 10 From Exercise 9 we see that if are pseudodifferential operators of order respectively, then the commutator differs from by a pseudodifferential operator of order , where is the Poisson bracket This approximate correspondence between the Lie bracket (which plays a fundamental role in the dynamics of quantum mechanics) and the Poisson bracket (which plays a fundamental role in the dynamics of classical mechanics) is one of the mathematical foundations of the correspondence principle relating quantum and classical mechanics, but we will not discuss this topic further here.

There is also a companion result regarding adjoints of pseudodifferential operators:

Exercise 11 (Adjoint of pseudodifferential operator) Let .

- (i) If is compactly supported, show that the function defined by
is also a symbol of order , and that is the adjoint of in the sense that

for all .

- (ii) Show that even if is not compactly supported, there is a unique pseudodifferential operator of order which is the adjoint of in the sense that
for all .

- (iii) Show that is a pseudodifferential operator of order .

Now we give some applications of the above pseudodifferential calculus.

Exercise 12 (Pseudodifferential operators and Sobolev spaces) For any and , define the Sobolev space to be the completion of the Schwartz functions with respect to the norm

- (i) If is a non-negative integer, show that
for any , thus in this case the Sobolev spaces agree (up to constants) with the classical Sobolev spaces (as discussed for instance in this set of notes).

- (ii) If is a pseudodifferential operator of some order , show that
for any , thus extends to a bounded linear map from to . (Hint: use Theorem 4 and Theorem 8.)

- (iii) Let be a pseudodifferential operator of some order that obeys the strong ellipticity condition
for all . Establish the Garding inequality

for all and some depending only on . (Hint: use Exercises 9, 11 to express as for some pseudodifferential operators of orders and respectively.) If , deduce also the variant inequality (possibly with slightly different choices of ).

The behaviour of pseudodifferential operators may be clarified by using a type of phase space transform, which we will call a *Gabor-type transform*.

Exercise 13 (Gabor-type transforms and pseudodifferential operators) Given any function with the normalisation , and any , define the Gabor-type transform by the formula thus is the inner product of with the function , which is the “wave packet” formed from function by translating by and then modulating by . (Intuitively, measures the extent to which lives at spatial location and frequency location .) We also define the adjoint map for by the formula

- (i) Show that for any , is a Schwartz function on , thus is a linear map from to . Similarly, show that for , is a Schwartz function on , thus is a linear map from to .
- (ii) Establish the identity for any , and conclude in particular that
for any , thus extends to a linear isometry from into .

- (iii) For any smooth compactly supported and , establish the identity
where is the (Kohn-Nirenberg) Wigner distribution of , defined by the formula

and is the phase space convolution

Remark 14 When is a Gaussian, the transform is essentially the Gabor transform (in signal processing) or the FBI transform (in microlocal analysis), and is also closely related to the Bargmann transform in complex analysis. There are some technical advantages to working with Gaussian choices of , particularly with regard to the treatment of certain lower order terms in the pseudodifferential calculus; see for instance these notes of Tataru.
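The isometry property in Exercise 13(ii) can be sanity-checked in a finite toy model. The following is only a sketch under stated assumptions, not the construction in the notes: the real line is replaced by the cyclic group Z/NZ, the window is assumed to be normalised in the discrete L^2 norm, the phase-space sum is normalised by 1/N, and `gabor_transform` is a hypothetical helper name. In that setting the discrete Plancherel identity makes the phase-space energy equal to the energy of the signal.

```python
import cmath
import random


def gabor_transform(f, phi):
    """Discrete Gabor-type transform on Z/NZ (a toy model, not the continuous transform).

    Entry [x0][k] is sum_x f(x) * conj(phi(x - x0)) * exp(-2 pi i k x / N).
    """
    n = len(f)
    return [
        [
            sum(
                f[x] * phi[(x - x0) % n].conjugate()
                * cmath.exp(-2j * cmath.pi * k * x / n)
                for x in range(n)
            )
            for k in range(n)
        ]
        for x0 in range(n)
    ]


def l2_norm_sq(values):
    """Sum of squared absolute values."""
    return sum(abs(v) ** 2 for v in values)


random.seed(0)
N = 12
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
phi = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

# Assumption: normalise the window so that its discrete L^2 norm is 1.
scale = l2_norm_sq(phi) ** 0.5
phi = [v / scale for v in phi]

T = gabor_transform(f, phi)
# The factor 1/N is the normalisation of the frequency sum on Z/NZ.
phase_space_energy = sum(l2_norm_sq(row) for row in T) / N
signal_energy = l2_norm_sq(f)
```

Here `phase_space_energy` and `signal_energy` agree up to rounding error, for any choice of normalised window; no Gaussian structure is needed for this identity.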

Note that is a Schwartz function on , and by the Fourier inversion formula it has unit mass: . (One also has the marginal distributions and , so would be a strong candidate for a “phase space probability distribution” for , save for the unfortunate fact that has no reason to be non-negative.) But even with oscillation, still behaves like an approximation to the identity, so for slowly varying can be viewed as an approximation to . Thus, Exercise 13(iii) can be intuitively viewed as saying that behaves approximately like a multiplier in phase space:

Another informal way of viewing this assertion is that (for suitable choices of ) the translated and modulated functions can be viewed as approximate eigenfunctions of with eigenvalue . This is for instance consistent with the approximate functional calculus and that one saw in Exercises 9, 11. The exercise below gives another way to view this approximation:

Exercise 15 ( bound) Let be a smooth function obeying the “ bound” for all and . Let and be as in Exercise 13. Show that there is a smooth kernel obeying the bounds

for any , such that

for any . (Hint: work first in the case when is compactly supported, where one can use Fubini’s theorem to derive an explicit integral expression for , which one can then control by various integrations by parts.) Use this to establish the bound for any ; note that this gives an alternate proof of (18). (See also these notes of Tataru for further elaboration of this approach to pseudodifferential operators.)

As a sample application of the Gabor transform formalism we give a variant of the Garding inequality from Exercise 12(iii).

Theorem 16 (Sharp Garding inequality) Let be a pseudodifferential operator of order such that for all . Then one has for all , where depends only on .

*Proof:* From Exercise 11 we see that is a pseudodifferential operator of order , hence by Exercise 12(ii) we have

Thus we may remove the imaginary part from and assume that is real and non-negative. Applying a smooth partition of unity of Littlewood-Paley type, we can write , where each is also non-negative, supported on the region , and obeys essentially the same symbol estimates as uniformly in . It then suffices to show that

uniformly in .

We now use the Gabor-type transforms from Exercise 13, except that we make dependent on . Specifically we pick a single real even with norm , then define for all . We will approximate by

Observe that

so by the triangle inequality it will suffice to establish the bound

However, it is not difficult (see exercise below) to show that is a symbol of order uniformly in , and the claim now follows from Exercise 12(ii).

Exercise 17 Verify the claim that is a symbol of order uniformly in . (Here one will need the fact that is a rescaling by a scaling factor of , which is an even Schwartz function of mean . The even nature of is needed to cancel some linear terms which would otherwise only allow one to obtain symbol bounds of order rather than .)

Remark 18 It is possible to improve the error term in the sharp Garding inequality, particularly if one uses the Weyl quantization rather than the Kohn-Nirenberg one (see Remark 19 below); also the non-negativity hypothesis on can be relaxed in a manner consistent with the uncertainty principle; see this deep paper of Fefferman and Phong.

Remark 19 Throughout this set of notes we have used the Kohn-Nirenberg quantization or equivalently (taking to be compactly supported for the sake of discussion)

However, this is not the only quantization that one could use. For instance, one could also use the adjoint Kohn-Nirenberg quantization

which one can easily relate to the Kohn-Nirenberg quantization by the identity

In particular, from Exercise 11 we see that if is a symbol of order , then and only differ by pseudodifferential operators of order (and that both quantizations produce the same class of pseudodifferential operators of a given order). The operators appearing earlier can also be viewed as a quantization of (known as the anti-Wick quantization of associated to the test function ). But perhaps the most popular quantization used in the literature is the Weyl quantization which in some sense “splits the difference” between the Kohn-Nirenberg and adjoint Kohn-Nirenberg quantizations, being completely symmetric between the input spatial variable and output spatial variable . (Strictly speaking, this formula is only well-defined for say compactly supported symbols ; for more general symbols one can define in the weak sense as the distribution for which

for (it is not difficult to use integration by parts to show that the expression in parentheses is rapidly decreasing in , hence absolutely integrable).) In particular there is now no error term in the analogue of Exercise 11:

All of the preceding theory for the Kohn-Nirenberg quantization can be adapted to the Weyl quantization with minor changes (for instance, the definition of the Wigner transform changes slightly, and the operation defined in (24) is replaced with the Moyal product), and as seen in Exercise 20 below, the two quantizations again produce the same classes of pseudodifferential operators, with symbols agreeing up to lower order terms.

Exercise 20 (Kohn-Nirenberg and Weyl quantizations are equivalent up to lower order) Let be a real number.

- (i) If is a symbol of order , show that there exists a symbol of order such that . Furthermore, show that is a symbol of order .
- (ii) If is a symbol of order , show that there exists a symbol of order such that . Furthermore, show that is a symbol of order .

Exercise 21 (Comparison of quantizations) Let be natural numbers, and let be the monomial .

- (i) Show that .
- (ii) Show that .
- (iii) Show that , where ranges over all tuples of operators consisting of copies of and copies of . For instance, if , then
Informally, the Kohn-Nirenberg quantization always applies position operators to the left of momentum operators; the adjoint Kohn-Nirenberg quantization always applies position operators to the right of momentum operators; and the Weyl quantization averages equally over all possible orderings. (Taking formal generating functions, we also see (formally, at least) that the quantization of a plane wave for real numbers is equal to in the Kohn-Nirenberg quantization, in the adjoint Kohn-Nirenberg quantization, and in the Weyl quantization.)
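The ordering rules just described can be illustrated on the simplest mixed monomial, the product of one position variable and one frequency variable. The sketch below is only illustrative and rests on assumptions not visible in this section: operators act on polynomials stored as coefficient lists, the momentum operator is normalised as D = (1/(2πi)) d/dx, and the helper names `X`, `D`, `combine` are hypothetical. It checks that the two orderings differ by the constant 1/(2πi), and that the Weyl average sits exactly halfway between them.

```python
import cmath

# Assumed normalisation: D = (1 / (2 pi i)) d/dx, and X = multiplication by x.
C = 1 / (2j * cmath.pi)


def X(p):
    """Multiply the polynomial with coefficient list p (lowest degree first) by x."""
    return [0j] + list(p)


def D(p):
    """Apply (1 / (2 pi i)) d/dx to the polynomial p."""
    return [C * n * p[n] for n in range(1, len(p))] or [0j]


def combine(a, p, b, q):
    """Return the coefficient list of a*p + b*q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0j] * (n - len(p))
    q = list(q) + [0j] * (n - len(q))
    return [a * u + b * v for u, v in zip(p, q)]


f = [1 + 0j, 2 + 0j, 3 + 0j]                 # f(x) = 1 + 2x + 3x^2
kohn_nirenberg = X(D(f))                      # position applied to the left of momentum
adjoint_kn = D(X(f))                          # position applied to the right of momentum
weyl = combine(0.5, kohn_nirenberg, 0.5, adjoint_kn)    # average over both orderings

commutator = combine(1, adjoint_kn, -1, kohn_nirenberg)  # (DX - XD) f
weyl_minus_kn = combine(1, weyl, -1, kohn_nirenberg)     # Weyl minus Kohn-Nirenberg
```

In this toy computation `commutator` equals `C` times `f` and `weyl_minus_kn` equals `C/2` times `f`, so the three quantizations of this monomial agree up to a term of lower order.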

Exercise 22 (Gabor-type transforms and symmetries) Let .

- (i) (Physical translation) If and is the function , show that for all .
- (ii) (Frequency modulation) If and is the function , show that for all .
- (iii) (Dilation) If and is the function , show that for all , where .
- (iv) (Fourier transform) If , show that .
- (v) (Quadratic phase modulation) If and is the function , show that for all , where .
We remark that the group generated by the transformations (i)-(v) is the (Weil representation of the) metaplectic group .

Remark 23 Ignoring the changes in the Gabor test function , as well as the various phases appearing on the right-hand side, we conclude from the above exercise that basic transformations on functions seem to correspond to various area-preserving maps of phase space; for instance, the Fourier transform is associated to the rotation , which is consistent in particular with the fact that a fourfold iteration of the Fourier transform yields the identity operator. This is in fact a quite general phenomenon, with something asymptotically resembling such identities available for an important class of operators known as Fourier integral operators (but in higher dimensions one replaces the adjective “area-preserving” with “symplectomorphism” or “canonical transformation”). However, as stated previously, the systematic development of the theory of Fourier integral operators is beyond the scope of this course.


Remark 24 Virtually all of the above theory extends to higher dimensions, and also to general smooth manifolds as domains. In the latter case, the natural analogue of phase space is the cotangent bundle , and the symplectic geometry of this bundle then plays a fundamental role in the theory (as already hinted at by the appearance of the Poisson bracket in Remark 10). See for instance this text of Folland for more discussion.

similarly, if are a collection of functions in a Lebesgue space that oscillate “independently” of each other, then we expect

We have already seen one instance in which this heuristic can be made precise, namely when the phases of are randomised by a random sign, so that Khintchine’s inequality (Lemma 4 from Notes 1) can be applied. There are other contexts in which a *square function estimate*

or a *reverse square function estimate*

(or both) are known or conjectured to hold. For instance, the useful *Littlewood-Paley inequality* implies (among other things) that for any , we have the reverse square function estimate

whenever the Fourier transforms of the are supported on disjoint annuli , and we also have the matching square function estimate

if there is some separation between the annuli (for instance if the are -separated). We recall the proofs of these facts below the fold. In the case, we of course have Pythagoras’ theorem, which tells us that if the are all orthogonal elements of , then

In particular, this identity holds if the have *disjoint Fourier supports* in the sense that their Fourier transforms are supported on disjoint sets. For , the technique of *bi-orthogonality* can also give square function and reverse square function estimates in some cases, as we shall also see below the fold.

In recent years, it has begun to be realised that in the regime , a variant of reverse square function estimates such as (1) is also useful, namely *decoupling estimates* such as

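(actually in practice we often permit small losses such as on the right-hand side). An estimate such as (2) is weaker than (1) when $p > 2$ (or equal when $p = 2$), as can be seen by starting with the triangle inequality

$$\Big\| \sum_j |f_j|^2 \Big\|_{L^{p/2}({\bf R}^d)} \leq \sum_j \big\| |f_j|^2 \big\|_{L^{p/2}({\bf R}^d)} = \sum_j \|f_j\|_{L^p({\bf R}^d)}^2$$

and taking the square root of both sides to conclude that

$$\Big\| \Big( \sum_j |f_j|^2 \Big)^{1/2} \Big\|_{L^p({\bf R}^d)} \leq \Big( \sum_j \|f_j\|_{L^p({\bf R}^d)}^2 \Big)^{1/2}.$$
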
However, the flip side of this weakness is that (2) can be easier to prove. One key reason for this is the ability to *iterate* decoupling estimates such as (2), in a way that does not seem to be possible with reverse square function estimates such as (1). For instance, suppose that one has a decoupling inequality such as (2), and furthermore each can be split further into components for which one has the decoupling inequalities

Then by inserting these bounds back into (2) we see that we have the combined decoupling inequality

This iterative feature of decoupling inequalities means that such inequalities work well with the method of *induction on scales*, that we introduced in the previous set of notes.
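The comparison between the two right-hand sides (the square function norm appearing in a reverse square function estimate, and the square sum of norms appearing in a decoupling estimate) can be checked numerically. The sketch below is a toy model under stated assumptions: functions are finite lists of complex values on a discrete measure space with counting measure, and the helper names are hypothetical. For exponents at least 2 the square function norm should never exceed the decoupling norm, with equality at exponent 2.

```python
import random


def lp_norm(values, p):
    """Discrete L^p norm with counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def square_function_norm(fs, p):
    """|| (sum_j |f_j|^2)^{1/2} ||_p, the quantity in a reverse square function estimate."""
    points = range(len(fs[0]))
    square_fn = [sum(abs(f[x]) ** 2 for f in fs) ** 0.5 for x in points]
    return lp_norm(square_fn, p)


def decoupling_norm(fs, p):
    """(sum_j ||f_j||_p^2)^{1/2}, the quantity in a decoupling estimate."""
    return sum(lp_norm(f, p) ** 2 for f in fs) ** 0.5


random.seed(0)
fs = [
    [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(40)]
    for _ in range(6)
]
# For each exponent p >= 2, record (square function norm, decoupling norm).
comparison = {
    p: (square_function_norm(fs, p), decoupling_norm(fs, p))
    for p in (2.0, 3.0, 4.0, 6.0)
}
```

Each pair in `comparison` has its first entry bounded by its second, which is the triangle inequality in the Lebesgue space of exponent p/2; this is why a decoupling bound follows from a reverse square function bound but not conversely.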

In fact, decoupling estimates share many features in common with restriction theorems; in addition to induction on scales, there are several other techniques that first emerged in the restriction theory literature, such as wave packet decompositions, rescaling, and bilinear or multilinear reductions, that turned out to also be well suited to proving decoupling estimates. As with restriction, the *curvature* or *transversality* of the different Fourier supports of the will be crucial in obtaining non-trivial estimates.

Strikingly, in many important model cases, the optimal decoupling inequalities (except possibly for epsilon losses in the exponents) are now known. These estimates have in turn had a number of important applications, such as establishing certain discrete analogues of the restriction conjecture, or the first proof of the main conjecture for Vinogradov mean value theorems in analytic number theory.

These notes only serve as a brief introduction to decoupling. A systematic exploration of this topic can be found in this recent text of Demeter.

** — 1. Square function and reverse square function estimates — **

We begin with a form of the Littlewood-Paley inequalities. Given a region , we say that a tempered distribution on has *Fourier support in * if its distributional Fourier transform is supported in (the closure of) .

Theorem 1 (Littlewood-Paley inequalities) Let , let be distinct integers, let , and for each let be a function with Fourier support in the annulus .

- (i) (Reverse square function inequality) One has
- (ii) (Square function inequality) If the are -separated (thus for any ) then

*Proof:* We begin with (ii). We use a randomisation argument. Let be a bump function supported on the annulus that equals one on , and for each let be the Fourier multiplier defined by

at least for functions in the Schwartz class. Clearly the operator is given by convolution with a (-dependent) Schwartz function, so this multiplier is bounded on every space. Writing , we see from the separation property of the that we have the reproducing formula

Now let be random signs drawn uniformly and independently at random, thus

The operator is a Fourier multiplier with symbol . This symbol obeys the hypotheses of the Hörmander-Mikhlin multiplier theorem, uniformly in the choice of signs; since we are in the non-endpoint case , we thus have

uniformly in the . Taking power means of this estimate using Khintchine’s inequality (Lemma 4 from Notes 1), we obtain (ii) as desired.

Now we turn to (i). By treating the even and odd cases separately and using the triangle inequality, we may assume without loss of generality that the all have the same parity, so in particular are -separated. (Why are we permitted to use this reduction for part (i) but not for part (ii)?) Now we use the projections from before in a slightly different way, noting that

for any , and hence

Applying the Hörmander-Mikhlin theorem as before, we conclude that

and on taking power means and using Khintchine’s inequality as before we conclude (i).

Exercise 2 (Smooth Littlewood-Paley estimate) Let and , and let be a bump function supported on that equals on . For any integer , let denote the Fourier multiplier, defined on Schwartz functions by and extended to functions by continuity. Show that for any , one has

(in particular, the left-hand side is finite).

We remark that when , the condition that the be -separated can be removed from Theorem 1(ii), by using the Marcinkiewicz multiplier theorem in place of the Hörmander-Mikhlin multiplier theorem. But, perhaps surprisingly, the condition cannot be removed in higher dimensions, as a consequence of Fefferman’s surprising result on the unboundedness of the disk multiplier.

Exercise 3 (Unboundedness of the disc multiplier) Let denote either the disk or the annulus . Let denote the Fourier multiplier defined on Schwartz functions by

- (i) Show that for any collection of half-planes in , and any functions , that
(Hint: first rescale the set by a large scaling factor , apply the Marcinkiewicz-Zygmund theorem (Exercise 7 from Notes 1), exploit the symmetries of the Fourier transform, then take a limit as .)

- (ii) Let be a collection of rectangles for some , and for each , let be a rectangle formed from by translating by a distance in the direction of the long axis of . Use (i) to show that
(Hint: a direct application of (i) will give just one side of this estimate, but then one can use symmetry to obtain the other side.)

- (iii) In Fefferman’s paper, modifying a classic construction of a Besicovitch set, it was shown that for any , there exists a collection of rectangles for some with such that all the rectangles are disjoint, but such that has measure . Assuming this fact, conclude that the multiplier estimate (4) fails unless .
- (iv) Show that Theorem 1(ii) fails when and the requirement that the be -separated is removed.

Exercise 4 Let be a bump function supported on that equals one on . For each integer , let be the Fourier multiplier defined for by and also define

- (i) For any , establish the square function estimate
for . (Hint: interpolate between the cases, and for the latter use Plancherel’s theorem for Fourier series.)

- (ii) For any , establish the square function estimate
for . (Hint: from the boundedness of the Hilbert transform, is bounded in . Combine this with the Marcinkiewicz-Zygmund theorem (Exercise 7 from Notes 1), then use the symmetries of the Fourier transform, part (i), and the identity .)

- (iii) For any , establish the reverse square function estimate
for . (Hint: use duality as in the solution to Exercise 2 in this set of notes, or Exercise 11 in Notes 1, and part (ii).)

- (iv) Show that the estimate (ii) fails for , and similarly the estimate (iii) fails for .

Remark 5 The inequalities in Exercise 4 have been generalised by replacing the partition with an arbitrary partition of the real line into intervals; see this paper of Rubio de Francia.

If are functions with disjoint Fourier supports, then as mentioned in the introduction, we have from Pythagoras’ theorem that

We have the following variants of this claim:

Lemma 6 ( and reverse square function estimates) Let have Fourier transforms supported on the sets respectively.

- (i) (Almost orthogonality) If the sets have overlap at most (i.e., every lies in at most of the ) for some , then
- (ii) (Almost bi-orthogonality) If the sets with have overlap at most for some , then

*Proof:* For (i), observe from Plancherel’s theorem that

By hypothesis, for each frequency at most of the are non-zero, thus by Cauchy-Schwarz we have the pointwise estimate

and hence by Fubini’s theorem

The claim then follows by a further application of Plancherel’s theorem and Fubini’s theorem. For (ii), we observe that

and

Since has Fourier support in , the claim (ii) now follows from (i).
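The almost orthogonality argument in part (i) can be tested in a finite model. The sketch below makes several assumptions that are not part of the notes: the setting is the cyclic group Z/NZ with a hand-written inverse discrete Fourier transform, the Fourier supports are overlapping intervals of frequencies chosen so that each frequency lies in at most two of them, and all helper names are hypothetical. Plancherel plus Cauchy-Schwarz then predicts that the squared norm of the sum is at most the overlap constant times the sum of the squared norms.

```python
import cmath
import random


def inverse_dft(coeffs):
    """Inverse discrete Fourier transform on Z/NZ, computed directly from the definition."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)) / n
        for x in range(n)
    ]


def l2_norm_sq(f):
    """Sum of squared absolute values."""
    return sum(abs(v) ** 2 for v in f)


random.seed(1)
N = 32
# Overlapping frequency intervals [start, start + 8); consecutive ones share 4 frequencies.
supports = [range(start, start + 8) for start in (0, 4, 8, 12, 16, 20)]
# Overlap constant A: the largest number of supports containing a single frequency.
A = max(sum(k in S for S in supports) for k in range(N))

fs = []
for S in supports:
    coeffs = [
        complex(random.gauss(0, 1), random.gauss(0, 1)) if k in S else 0j
        for k in range(N)
    ]
    fs.append(inverse_dft(coeffs))

total = [sum(f[x] for f in fs) for x in range(N)]
lhs = l2_norm_sq(total)                      # || sum_j f_j ||_2^2
rhs = A * sum(l2_norm_sq(f) for f in fs)     # A * sum_j || f_j ||_2^2
```

With these supports one has `A == 2`, and `lhs` does not exceed `rhs`. Replacing the intervals by disjoint ones gives `A == 1` and recovers Pythagoras' theorem.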

Remark 7 By using in place of , one can also establish a variant of Lemma 6(ii) in which the sum set is replaced by the difference set . It is also clear how to extend the lemma to other even exponent Lebesgue spaces such as ; see for instance this recent paper of Gressman, Guo, Pierce, Roos, and Yung. However, we will not use these variants here.

We can use this lemma to establish the following reverse square function estimate for the circle:

Exercise 8 (Square function estimate for circle and parabola) Let , let be a -separated subset of the unit circle , and for each , let have Fourier support in the rectangle

- (i) Use Lemma 6(ii) to establish the reverse square function estimate
- (ii) If the elements of are -separated for a sufficiently large absolute constant , establish the matching square function estimate
- (iii) Obtain analogous claims to (i), (ii) in which for some -separated subset of , where is the graphing function , and to each one uses the parallelogram
in place of .

- (iv) (Optional, as it was added after this exercise was first assigned as homework) Show that in (i) one cannot replace the norms on both sides by for any given . (Hint: use a Knapp type example for each and ensure that there is enough constructive interference in near the origin.) On the other hand, using Exercise 4 show that the norm in (ii) can be replaced by an norm for any .

For a more sophisticated estimate along these lines, using sectors of the plane rather than rectangles near the unit circle, see this paper of Córdoba. An analogous reverse square function estimate is also conjectured in higher dimensions (with replaced by the endpoint restriction exponent ), but this remains open, and in fact is at least as hard as the restriction and Kakeya conjectures; see this paper of Carbery.

** — 2. Decoupling estimates — **

We now turn to decoupling estimates. We begin with a general definition.

Definition 9 (Decoupling constant) Let be a finite collection of non-empty open subsets of for some (we permit repetitions, so may be a multi-set rather than a set), and let . We define the decoupling constant to be the smallest constant for which one has the inequality

We have the trivial upper and lower bounds

with the lower bound arising from restricting to the case when all but one of the vanish, and the upper bound following from the triangle inequality and Cauchy-Schwarz. In the literature, decoupling inequalities are also considered with the summation of the norms replaced by other summations (for instance, the original decoupling inequality of Wolff used norms) but we will focus only on decoupling estimates in this post. In the literature it is common to restrict attention to the case when the sets are disjoint, but for minor technical reasons we will not impose this extra condition in our definition.

Exercise 10 (Elementary properties of decoupling constants) Let and .

- (i) (Monotonicity) Show that
whenever are non-empty open subsets of with for .

- (ii) (Triangle inequality) Show that
for any finite non-empty collections of open non-empty subsets of .

- (iii) (Affine invariance) Show that
whenever are open non-empty and is an invertible affine transformation.

- (iv) (Interpolation) Suppose that for some and , and suppose also that is a non-empty collection of open non-empty subsets of for which one has the projection bounds
for all , , and , where the Fourier multiplier is defined by

Show that

- (v) (Multiplicativity) Suppose that is a family of open non-empty subsets of , with each containing further open non-empty subsets for . Show that
- (vi) (Adding dimensions) Suppose that is a family of disjoint open non-empty subsets of and that . Show that for any , one has
where the right-hand side is a decoupling constant in .

The most useful decoupling inequalities in practice turn out to be those where the decoupling constant is close to the lower bound of , for instance if one has the sub-polynomial bounds

for every . We informally say that the collection of sets *exhibits decoupling in * when this is the case.

For , Lemma 6 (and (3)) already gives some decoupling estimates: one has

if the sets have an overlap of at most , and similarly

when the sets , have an overlap of at most .

For , it is not possible to exhibit decoupling in the limit :

Exercise 11 If is a collection of non-empty open subsets of , show that for any . (Hint: select the to be concentrated in widely separated large balls.)

Henceforth we focus on the regime . By (8), decoupling is easily obtained if the regions are of bounded overlap. For larger than , bounded overlap is insufficient by itself; the arrangement of the regions must also exhibit some “curvature”, as the following example shows.

Exercise 12 If , and , show that (Hint: for the upper bound, use a variant of Exercise 24(ii) from Notes 1, or adapt the interpolation argument used to establish that exercise.)

Now we establish a significantly more non-trivial decoupling theorem:

Theorem 13 (Decoupling for the parabola) Let , let for some -separated subset of , where , and to each let be the parallelogram (5). Then for any .

This result was first established by Bourgain and Demeter; our arguments here will loosely follow an argument of Li, that is based in turn on the efficient congruencing methods of Wooley, as recounted for instance in this exposition of Pierce.

We first explain the significance of the exponent in Theorem 13. Let be a maximal -separated subset for some small , so that has cardinality . For each , choose so that is a non-negative bump function (not identically zero) adapted to the parallelogram , which is comparable to a rectangle. From the Fourier inversion formula, will then have magnitude on a dual rectangle of dimensions comparable to , and is rapidly decreasing away from that rectangle, so we have

for all and . In particular

On the other hand, we have for if is a sufficiently small absolute constant, hence

Comparing this with (6), we conclude that

so Theorem 13 cannot hold if the exponent is replaced by any larger exponent. In the other direction, by using Exercise 10(iv) and the trivial decoupling from (8), we see that we also have decoupling in for any . (Note from the boundedness of the Hilbert transform that a Fourier projection to any polygon of boundedly many sides will be bounded in for any with norm .) Note that the reverse square function estimates in Exercise 8 only give decoupling in the smaller range ; the version of Lemma 6 is not strong enough to extend the decoupling estimates to larger ranges because the triple sums have too much overlap.
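The exponent bookkeeping in the bump function example above can be tabulated mechanically. The sketch below is heuristic and rests on scalings that are assumptions here, since the displayed formulas are not reproduced in this section: there are about 1/δ parallelograms of dimensions about δ by δ², each bump function has size about δ³ on a dual rectangle of area about δ⁻³, and the sum has size about δ² on a ball of bounded radius. The function name is hypothetical. Under these assumptions the example forces the decoupling constant to be at least a constant times δ raised to the power -(1/2 - 3/p), which is consistent with p = 6 being the critical exponent for the parabola.

```python
from fractions import Fraction


def lower_bound_exponent(p):
    """Exponent e for which the bump example gives Dec >= c * delta**(-e).

    All scalings are heuristic assumptions:
      || sum_theta f_theta ||_p  >~  delta**2
      || f_theta ||_p            ~   delta**(3 - 3/p)
      number of caps theta       ~   delta**(-1)
    """
    p = Fraction(p)
    lhs_power = Fraction(2)                              # power of delta on the left-hand side
    single_norm_power = Fraction(3) - Fraction(3) / p     # power of delta in one L^p norm
    rhs_power = single_norm_power - Fraction(1, 2)        # l^2 sum over about 1/delta caps
    # Dec >~ delta**lhs_power / delta**rhs_power = delta**(-(rhs_power - lhs_power)).
    return rhs_power - lhs_power


exponents = {p: lower_bound_exponent(p) for p in (4, 6, 8, 12)}
```

The resulting exponent is 1/2 - 3/p: it is negative below 6 (no obstruction), vanishes at 6, and is positive above 6, where this example rules out a sub-polynomial decoupling constant.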

For any , let denote the supremum of the decoupling constants over all -separated subsets of . From (7) we have the trivial bound

for any .

We first make a minor observation on the stability of that is not absolutely essential for the arguments, but is convenient for cleaning up the notation slightly (otherwise we would have to replace various scales that appear in later arguments by comparable scales ).

*Proof:* Without loss of generality we may assume that . We first show that . We need to show that

whenever is a -separated subset of the . By partitioning into pieces and using Exercise 10(ii) we may assume without loss of generality that is in fact -separated. In particular

The claim now follows from Exercise 10(i) and the inclusion

Conversely, we need to show that

whenever is -separated, or equivalently that

when (we can extend from to all of by a limiting argument). From elementary geometry we see that for each we can find a subset of of cardinality , such that the parallelograms with an integer, , and a sufficiently small absolute constant, cover . In particular, using Fourier projections to polygons with sides, one can split

where each has Fourier support in and

Now the collection can be partitioned into subcollections, each of which is -separated. From this and Exercise 10(ii), (iii) we see that

and thus

Applying (10), (11) we obtain the claim.

More importantly, we can use the symmetries of the parabola to control decoupling constants for parallelograms in a set of diameter in terms of a coarser scale decoupling constant :

Proposition 15 (Parabolic rescaling) Let , and let be a -separated subset of an interval of length . Then

*Proof:* We can assume that for a small absolute constant , since when the claim follows from Lemma 14. Write . Applying the Galilean transform

(which preserves the parabola, and maps parallelograms to ) and Exercise 10(iii), we may normalise , so .

Now let be the parabolic rescaling map

Observe that maps to for any . From Exercise 10(iii) again, we can write the left-hand side of (12) as

since is -separated, the claim then follows.
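The two symmetries used in this proof can be checked numerically. The following is a minimal Python sketch, not the argument itself. The explicit formulas are assumptions, since the displays are not reproduced above: the Galilean transform is taken to be $(\xi,\eta) \mapsto (\xi - \xi_0, \eta - 2\xi_0 \xi + \xi_0^2)$ and the parabolic rescaling to be $(\xi,\eta) \mapsto (\xi/\rho, \eta/\rho^2)$. The function names and the values of `xi0` and `rho` are hypothetical.

```python
def galilean(xi, eta, xi0):
    """Shear sending the point (xi0, xi0**2) of the parabola to the origin (assumed form)."""
    return xi - xi0, eta - 2.0 * xi0 * xi + xi0 ** 2

def parabolic_rescale(xi, eta, rho):
    """Anisotropic dilation (xi, eta) -> (xi / rho, eta / rho**2) (assumed form)."""
    return xi / rho, eta / rho ** 2

def parabola_defect(points):
    """Largest |eta - xi**2| over the points; zero means all points lie on the parabola."""
    return max(abs(eta - xi ** 2) for xi, eta in points)

xi0, rho = 0.3, 0.1
# Frequencies in the interval [xi0 - rho, xi0 + rho], lifted to the parabola.
xis = [xi0 - rho + 2.0 * rho * i / 200 for i in range(201)]
on_parabola = [(xi, xi ** 2) for xi in xis]

shifted = [galilean(xi, eta, xi0) for xi, eta in on_parabola]
rescaled = [parabolic_rescale(xi, eta, rho) for xi, eta in shifted]

print(parabola_defect(shifted), parabola_defect(rescaled))
print(min(xi for xi, _ in rescaled), max(xi for xi, _ in rescaled))
```

Both defects vanish up to rounding error, and the short interval around `xi0` is stretched to (approximately) the interval from -1 to 1. This is the mechanism by which a fine-scale decoupling problem on a short arc is converted into a coarser-scale problem on the whole parabola.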

The multiplicativity property in Exercise 16 suggests that an induction on scales approach could be fruitful to establish (9). Interestingly, it does not seem possible to induct directly on ; all the known proofs of this decoupling estimate proceed by introducing some auxiliary variant of that looks more complicated (in particular, involving scale parameters beyond just the base scale ), but which obeys inequalities somewhat reminiscent of the one in Exercise 16, for which an induction on scales argument can be profitably executed. It is not yet well understood exactly which choices of auxiliary quantity work best, but we will use the following choice of Li, a certain “asymmetric bilinear” variant of the decoupling constant:

Definition 17 (Bilinear decoupling constant) Let . Define to be the best constant for which one has the estimate whenever are -separated subsets of intervals of length respectively with , and for each , has Fourier support in , and similarly for each , has Fourier support in .

The scale is present for technical reasons and the reader may wish to think of it as essentially being comparable to . Rather than inducting in , we shall mostly keep fixed and primarily induct instead on . As we shall see later, the asymmetric splitting of the sixth power exponent as is in order to exploit orthogonality in the first factor.

From Hölder’s inequality, the left-hand side of (13) is bounded by

from which we conclude the bound

When are at their maximal size we can use these bilinear decoupling constants to recover control on the decoupling constants , thanks to parabolic rescaling:

*Proof:* Let be a -separated subset of , and for each let be Fourier supported in . We may normalise . It will then suffice to show that

We partition into disjoint components , each of which is supported in a subinterval of of length , with the family of intervals having bounded overlap, so in particular has cardinality . Then for any , we of course have

From the pigeonhole principle, this implies at least one of the following statements needs to hold for each given :

- (i) (Narrow case) There exists such that
- (ii) (Broad case) There exist distinct intervals with such that

(The reason for this is as follows. Write and , then . Let be the number of intervals , then , hence . If there are only intervals for which , then by the pigeonhole principle we have for one of these and we are in the narrow case (i); otherwise, if there are sufficiently many for which , one can find two such with , and we are in the broad case (ii).) This implies the pointwise bound

(We remark that more advanced versions of this “narrow-broad decomposition” in higher dimensions, taking into account more of the geometry of the various frequencies that arise in such sums, are useful in both restriction and decoupling theory; see this paper of Guth for more discussion.) From (13) we have

while from Proposition 15 we have

and hence

Combining all these estimates, we obtain the claim.

In practice the term here will be negligible as long as is just slightly smaller than (e.g. for some small ). Thus, the above bilinear reduction is asserting that up to powers of (which will be an acceptable loss in practice), the quantity is basically comparable to .

If we immediately insert (14) into the above lemma, we obtain a useless inequality due to the loss of in the main term on the right-hand side. To get an improved estimate, we will need a recursive inequality that allows one to slowly gain additional powers of at the cost of decreasing the size of factors (but as long as is much larger than , we will have enough “room” to iterate this inequality repeatedly). The key tool for doing this (and the main reason why we make the rather odd choice of splitting the exponents as ) is

*Proof:* It will suffice to show that

whenever are -separated subsets of intervals of length respectively with , and have Fourier support on respectively for , and we have the normalisation

We can partition as , where is a collection of intervals of length that have bounded overlap, and is a -separated subset of . We can then rewrite the left-hand side of (16) as

where

and

From (13) we have

so it will suffice to prove the almost orthogonality estimate

By Lemma 6(i), it suffices to show that the Fourier supports of have overlap .

Applying a Galilean transformation, we may normalise the interval to be centered at the origin, thus , and is now at a distance from the origin. (Strictly speaking this may push out to now lie in rather than , but this will not make a significant impact to the arguments.) In particular, all the rectangles , , now lie in a rectangle of the form , and hence and have Fourier support in such a rectangle also (after enlarging the implied constants in the notation appropriately). Meanwhile, if is centered at , then (since the map has Lipschitz constant when and ) the parallelogram is supported in the strip for any , hence will also be supported in such a strip. Since , is supported in a similar strip (with slightly different implied constants in the notation). Thus, if and have overlapping Fourier supports for , then , hence (since ) . Since the intervals have length and bounded overlap, we thus see that each has at most intervals for which and have overlapping Fourier supports, and the claim follows.

The final ingredient needed is a simple application of Hölder’s inequality to allow one to (partially) swap and :

Now we have enough inequalities to establish the claim (9). Let be the least exponent for which we have the bound

for all and ; equivalently, we have

Another equivalent formulation is that is the least exponent for which we have the bound

as where denotes a quantity that goes to zero as . Clearly ; our task is to show that .

Suppose for contradiction that . We will establish the bound

as for some , which will give the desired contradiction.

Let for some small exponent (independent of , but depending on ) to be chosen later. From Proposition 18 and (17) we have

Since , the second term on the right-hand side is already of the desired form; it remains to get a sufficiently good bound on the first term. Note that a direct application of (14), (17) bounds this term by ; we need to improve this bound by a large multiple of to conclude. To obtain this improvement we will repeatedly use Proposition 19 and Exercise 20. Firstly, from Proposition 19 we have

if is small enough. To control the right-hand side, we more generally consider expressions of the form

for various ; this quantity is well-defined if is small enough depending on . From Exercise 20 and (17) we have

and then by Proposition 19

if is small enough depending on . We rearrange this as

The crucial fact here is that we gain a small power of on the right-hand side when is large. Iterating this inequality times, we see that

for any given , if is small enough depending on , and denotes a quantity that goes to zero in the limit holding fixed. Now we can afford to apply (14), (17) and conclude that

which when inserted back into (20), (19) gives

If we then choose large enough depending on , and small enough depending on , we obtain the desired improved bound (18).

Remark 21 An alternate arrangement of the above argument is as follows. For any exponent , let denote the claim that whenever , with sufficiently small depending on , and is sent to zero holding fixed. The bounds (14), (17) give the claim . On the other hand, the bound (21) shows that implies for any given . Thus if , we can establish for arbitrarily large , and for large enough we can insert the bound (22) (with sufficiently large depending on ) into (19), (20) to obtain the required claim (18). See this blog post for a further elaboration of this approach, which allows one to systematically determine the optimal exponents one can conclude from a system of inequalities of the type one sees in Proposition 19 or Exercise 20 (it boils down to computing the Perron-Frobenius eigenvalue of certain matrices).

Exercise 22 By carefully unpacking the above iterative arguments, establish a bound of the form for all sufficiently small . (This bound was first established by Li.)

Exercise 23 (Localised decoupling) Let , and let be a family of boundedly overlapping intervals in of length . For each , let be an integrable function supported on , and let denote the extension operator For any ball of radius , use Theorem 13 to establish the local decoupling inequality

for any , where is the weight function

The decoupling theorem for the parabola has been extended in a number of directions. Bourgain and Demeter obtained the analogous decoupling theorem for the paraboloid:

Theorem 24 (Decoupling for the paraboloid) Let , let , let for some -separated subset of , where is the map , and to each let be the disk Then one has

for any .

Clearly Theorem 13 is the case of Theorem 24.

Exercise 25 Show that the exponent in Theorem 24 cannot be replaced by any larger exponent.

We will not prove Theorem 24 here; the proof in Bourgain-Demeter shares some features in common with the one given above (for instance, it focuses on a -linear formulation of the decoupling problem, though not one that corresponds precisely to the bilinear formulation given above), but also involves some additional ingredients, such as the wave packet decomposition and the multilinear restriction theorem from Notes 1.

Somewhat analogously to how the multilinear Kakeya conjecture could be used in Notes 1 to establish the multilinear restriction conjecture (up to some epsilon losses) by an induction on scales argument, the decoupling theorem for the paraboloid can be used to establish decoupling theorems for other surfaces, such as the sphere:

Exercise 26 (Decoupling for the sphere) Let , let , and let be a -separated subset of the sphere . To each , let be the disk Assuming Theorem 24, establish the bound

for any . (Hint: if one lets denote the supremum over all expressions of the form of the left-hand side of (23), use Exercise 10 and Theorem 24 to establish a bound of the form , taking advantage of the fact that a sphere resembles a paraboloid at small scales. This argument can also be found in the above-mentioned paper of Bourgain and Demeter.)

An induction on scales argument (somewhat similar to the one used to establish the multilinear Kakeya estimate in Notes 1) can similarly be used to establish decoupling theorems for the cone

from the decoupling theorem for the parabola (Theorem 13). It will be convenient to rewrite the equation for the cone as , then perform a linear change of variables to work with the tilted cone

which can be viewed as a projective version of the parabola .

Exercise 27 (Decoupling for the cone) For , let denote the supremum of the decoupling constants where ranges over -separated subsets of , and denotes the sector

More generally, if , let denote the supremum of the decoupling constants

where ranges over -separated subsets of , and denotes the shortened sector

- (i) For any , show that .
- (ii) For any , show that for any . (Hint: use Theorem 13 and various parts of Exercise 10, exploiting the geometric fact that thin slices of the tilted cone resemble the Cartesian product of a parabola and a short interval.)
- (iii) For any , show that . (Hint: adapt the argument used to establish Exercise 16, taking advantage of the invariance of the tilted light cone under projective parabolic rescaling and projective Galilean transformations ; these maps can also be viewed as tilted (conformal) Lorentz transformations .)
- (iv) Show that for any and .
- (v) State and prove a generalisation of (iv) to higher dimensions, using Theorem 24 in place of Theorem 13.

This argument can also be found in the above-mentioned paper of Bourgain and Demeter.

A separate generalization of Theorem 13, to the *moment curve*

was obtained by Bourgain, Demeter, and Guth:

Theorem 28 (Decoupling for the moment curve) Let , let , and let be a -separated subset of . For each , let denote the region where is the map

Then

for any .

Exercise 29 Show that the exponent in Theorem 28 cannot be replaced by any higher exponent.

It is not difficult to use Exercise 10 to deduce Theorem 13 from the case of Theorem 28 (the only issue being that the regions are not quite the same as the parallelograms appearing in Theorem 13).

The original proof of Theorem 28 by Bourgain-Demeter-Guth was rather intricate, using for instance a version of the multilinear Kakeya estimate from Notes 1. A shorter proof, similar to the one used to prove Theorem 13 in these notes, was recently given by Guo, Li, Yung, and Zorin-Kranich, adapting the “nested efficient congruencing” method of Wooley, which we will not discuss here, save to say that this method can be viewed as a p-adic counterpart to decoupling techniques. See also this paper of Wooley for an alternate approach to (a slightly specialised version of) Theorem 28.

Perhaps the most striking application of Theorem 28 is the following conjecture of Vinogradov:

Exercise 30 (Main conjecture for the Vinogradov mean value theorem) Let . For any and any , let denote the quantity where .

- (i) If is a natural number, show that is equal to the number of tuples of natural numbers between and obeying the system of equations for .
- (ii) Using Theorem 28, establish the bound for all . (Hint: set and , and apply the decoupling inequality to functions that are adapted to a small ball around .)
- (iii) More generally, establish the bound for any and . Show that this bound is best possible up to the implied constant and the loss of factors.
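Part (i) can be explored by brute force for small parameters. The sketch below assumes the standard normalisation, in which the quantity counts tuples $(n_1,\dots,n_s,m_1,\dots,m_s)$ of natural numbers between $1$ and $N$ with $n_1^j + \dots + n_s^j = m_1^j + \dots + m_s^j$ for $j = 1,\dots,k$. The symbols $s$, $k$, $N$ and the function name `vinogradov_count` are assumptions introduced for illustration.

```python
from collections import Counter
from itertools import product

def vinogradov_count(s, k, N):
    """Count (n_1..n_s, m_1..m_s) in {1..N}^(2s) whose j-th power sums agree for j = 1..k."""
    power_sums = Counter(
        tuple(sum(n ** j for n in ns) for j in range(1, k + 1))
        for ns in product(range(1, N + 1), repeat=s)
    )
    # Two s-tuples solve the system exactly when they share a power-sum vector,
    # so each vector hit by c tuples contributes c * c solutions.
    return sum(c * c for c in power_sums.values())

for N in (2, 4, 8):
    print(N, vinogradov_count(2, 2, N), N ** 2)
```

The diagonal solutions $m_i = n_i$ already contribute $N^s$ to the count, which is the trivial lower bound. For $s = k = 2$ the two equations force $\{n_1,n_2\} = \{m_1,m_2\}$, so the count is exactly $2N^2 - N$. The enumeration costs about $N^s$ steps, so it is only practical for very small $s$ and $N$.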

Remark 31 Estimates of the form (24) are known as mean value theorems, and were first established by Vinogradov in 1937 in the case when was sufficiently large (and by Hua when was sufficiently small). These estimates in turn had several applications in analytic number theory, most notably the Waring problem and in establishing zero-free regions for the Riemann zeta function; see these previous lecture notes for more discussion. The ranges of for which (24) was established were improved over the years, with much recent progress by Wooley using his method of efficient congruencing; see this survey of Pierce for a detailed history. In particular, these methods can supply an alternate proof of (24); see this paper of Wooley.

Exercise 32 (Discrete restriction) Let and , and let be a -separated subset of either the unit sphere or the paraboloid . Using Theorem 24 and Exercise 26, show that for any radius and any complex numbers , one has the discrete restriction estimate Explain why the exponent here cannot be replaced by any larger exponent, and also explain why the exponent in the condition cannot be lowered.

For further applications of decoupling estimates, such as restriction and Strichartz estimates on tori, and application to combinatorial incidence geometry, see the text of Demeter.

[These exercises will be moved to a more appropriate location at the end of the course, but are placed here for now so as not to affect numbering of existing exercises.]

Exercise 33 Show that the inequality in (8) is actually an equality, if is the maximal overlap of the .


Exercise 34 Show that whenever . (Hint: despite superficial similarity, this is not related to Lemma 14. Instead, adapt the parabolic rescaling argument used to establish Proposition 15.)

My own mathematical areas of expertise are somewhat far from Conway’s; I have played for instance with finite simple groups on occasion, but have not studied his work on moonshine and the monster group. But I have certainly encountered his results every so often in surprising contexts; most recently, when working on the Collatz conjecture, I looked into Conway’s wonderfully preposterous FRACTRAN language, which can encode any Turing machine as an iteration of a Collatz-type map, showing in particular that there are generalisations of the Collatz conjecture that are undecidable in axiomatic frameworks such as ZFC. [EDIT: also, my belief that the Navier-Stokes equations admit solutions that blow up in finite time is also highly influenced by the ability of Conway’s game of life to generate self-replicating “von Neumann machines”.]

I first met John as an incoming graduate student in Princeton in 1992; indeed, a talk he gave, on “Extreme proofs” (proofs that are in some sense “extreme points” in the “convex hull” of all proofs of a given result), may well have been the first research-level talk I ever attended, and one that set a high standard for all the subsequent talks I went to, with Conway’s ability to tease out deep and interesting mathematics from seemingly frivolous questions making a particular impact on me. (Some version of this talk eventually became this paper of Conway and Shipman many years later.)

Conway was fond of hanging out in the Princeton graduate lounge at the time of my studies there, often tinkering with some game or device, and often enlisting any nearby graduate students to assist him with some experiment or other. I have a vague memory of being drafted into holding various lengths of cloth with several other students in order to compute some element of a braid group; on another occasion he challenged me to a board game he recently invented (now known as “Phutball”) with Elwyn Berlekamp and Richard Guy (who, by sad coincidence, both also passed away in the last 12 months). I still remember being repeatedly obliterated in that game, which was a healthy and needed lesson in humility for me (and several of my fellow graduate students) at the time. I also recall Conway spending several weeks trying to construct a strange periscope-type device to try to help him visualize four-dimensional objects by giving his eyes vertical parallax in addition to the usual horizontal parallax, although he later told me that the only thing the device made him experience was a headache.

About ten years ago we ran into each other at some large mathematics conference, and lacking any other plans, we had a pleasant dinner together at the conference hotel. We talked a little bit of math, but mostly the conversation was philosophical. I regrettably do not remember precisely what we discussed, but it was very refreshing and stimulating to have an extremely frank and heartfelt interaction with someone with Conway’s level of insight and intellectual clarity.

Conway was arguably an extreme point in the convex hull of all mathematicians. He will very much be missed.

UPDATE: Here are some other lists of mathematical seminars online:

- Online seminars (curated by Ao Sun and Mingchen Xia at MIT)
- Algebraic Combinatorics Online Seminars (maybe using the same data set as the preceding link?)
- Online mathematics seminars (curated by Dan Isaksen at Wayne State University)
- Math seminars (run by Edgar Costa and David Roe at MIT)

Perhaps further links of this type could be added in the comments. It would perhaps make sense to somehow unify these lists into a single one that can be updated through crowdsourcing.

EDIT: See also IPAM’s advice page on running virtual seminars.

We work in a Euclidean space $\mathbf{R}^d$. Recall that $L^p(\mathbf{R}^d)$ is the space of $p^{\mathrm{th}}$-power integrable functions $f: \mathbf{R}^d \to \mathbf{C}$, quotiented out by almost everywhere equivalence, with the usual modifications when $p = \infty$. If $f \in L^1(\mathbf{R}^d)$ then the Fourier transform $\hat f: \mathbf{R}^d \to \mathbf{C}$ will be defined in this course by the formula

$$\hat f(\xi) := \int_{\mathbf{R}^d} f(x) e^{-2\pi i x \cdot \xi}\, dx. \qquad (1)$$

From the dominated convergence theorem we see that $\hat f$ is a continuous function; from the Riemann-Lebesgue lemma we see that it goes to zero at infinity. Thus $\hat f$ lies in the space $C_0(\mathbf{R}^d)$ of continuous functions that go to zero at infinity, which is a subspace of $L^\infty(\mathbf{R}^d)$. Indeed, from the triangle inequality it is obvious that

$$\|\hat f\|_{L^\infty(\mathbf{R}^d)} \leq \|f\|_{L^1(\mathbf{R}^d)}. \qquad (2)$$

If $f \in L^1(\mathbf{R}^d) \cap L^2(\mathbf{R}^d)$, then Plancherel’s theorem tells us that we have the identity

$$\|\hat f\|_{L^2(\mathbf{R}^d)} = \|f\|_{L^2(\mathbf{R}^d)}. \qquad (3)$$

Because of this, there is a unique way to extend the Fourier transform from $L^1(\mathbf{R}^d) \cap L^2(\mathbf{R}^d)$ to $L^2(\mathbf{R}^d)$, in such a way that it becomes a unitary map from $L^2(\mathbf{R}^d)$ to itself. By abuse of notation we continue to denote this extension of the Fourier transform by $f \mapsto \hat f$. Strictly speaking, this extension is no longer defined in a pointwise sense by the formula (1) (indeed, the integral on the RHS ceases to be absolutely integrable once $f$ leaves $L^1(\mathbf{R}^d)$; we will return to the (surprisingly difficult) question of whether pointwise convergence continues to hold (at least in an almost everywhere sense) later in this course, when we discuss Carleson’s theorem). On the other hand, the formula (1) remains valid in the sense of distributions, and in practice most of the identities and inequalities one can show about the Fourier transform of “nice” functions (e.g., functions in $L^1(\mathbf{R}^d) \cap L^2(\mathbf{R}^d)$, or in the Schwartz class $\mathcal{S}(\mathbf{R}^d)$, or test function class $C_c^\infty(\mathbf{R}^d)$) can be extended to functions in “rough” function spaces such as $L^2(\mathbf{R}^d)$ by standard limiting arguments.

By (2), (3), and the Riesz-Thorin interpolation theorem, we also obtain the Hausdorff-Young inequality

$$\|\hat f\|_{L^{p'}(\mathbf{R}^d)} \leq \|f\|_{L^p(\mathbf{R}^d)} \qquad (4)$$

for all $1 \leq p \leq 2$ and $f \in L^1(\mathbf{R}^d) \cap L^2(\mathbf{R}^d)$, where $2 \leq p' \leq \infty$ is the dual exponent to $p$, defined by the usual formula $\frac{1}{p} + \frac{1}{p'} = 1$. (One can improve this inequality by a constant factor, with the optimal constant worked out by Beckner, but the focus in these notes will not be on optimal constants.) As a consequence, the Fourier transform can also be uniquely extended as a continuous linear map from $L^p(\mathbf{R}^d) \to L^{p'}(\mathbf{R}^d)$. (The situation with $p > 2$ is much worse; see below the fold.)

The *restriction problem* asks, for a given exponent and a subset of , whether it is possible to meaningfully restrict the Fourier transform of a function to the set . If the set has positive Lebesgue measure, then the answer is yes, since lies in and therefore has a meaningful restriction to even though functions in are only defined up to sets of measure zero. But what if has measure zero? If , then is continuous and therefore can be meaningfully restricted to any set . At the other extreme, if and is an arbitrary function in , then by Plancherel’s theorem, is also an arbitrary function in , and thus has no well-defined restriction to any set of measure zero.

It was observed by Stein (as reported in the Ph.D. thesis of Charlie Fefferman) that for certain measure zero subsets of , such as the sphere , one can obtain meaningful restrictions of the Fourier transforms of functions for certain between and , thus demonstrating that the Fourier transform of such functions retains more structure than a typical element of :

Theorem 1 (Preliminary restriction theorem) If and , then one has the estimate for all Schwartz functions , where denotes surface measure on the sphere . In particular, the restriction can be meaningfully defined by continuous linear extension to an element of .

*Proof:* Fix . We expand out

From (1) and Fubini’s theorem, the right-hand side may be expanded as

where the inverse Fourier transform of the measure is defined by the formula

In other words, we have the identity

using the Hermitian inner product . Since the sphere has bounded measure, we have from the triangle inequality that

Also, from the method of stationary phase (as covered in the previous class 247A), or Bessel function asymptotics, we have the decay

for any (note that the bound already follows from (6) unless ). We remark that the exponent here can be seen geometrically from the following considerations. For , the phase on the sphere is stationary at the two antipodal points of the sphere, and constant on the tangent hyperplanes to the sphere at these points. The wavelength of this phase is proportional to , so the phase would be approximately stationary on a cap formed by intersecting the sphere with a neighbourhood of the tangent hyperplane to one of the stationary points. As the sphere is tangent to second order at these points, this cap will have diameter in the directions of the -dimensional tangent space, so the cap will have surface measure , which leads to the prediction (7). We combine (6), (7) into the unified estimate

where the “Japanese bracket” is defined as . Since lies in precisely when , we conclude that

Applying Young’s convolution inequality, we conclude (after some arithmetic) that

whenever , and the claim now follows from (5) and Hölder’s inequality.
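The decay estimate (7) used in this proof can be checked directly in three dimensions, where the transform of surface measure has a closed form. The sketch below assumes the convention $(d\sigma)^\vee(x) = \int_{S^2} e^{2\pi i x \cdot \omega}\, d\sigma(\omega)$ (an assumption, since the display is not reproduced above), for which rotation invariance gives $(d\sigma)^\vee(x) = 2 \sin(2\pi |x|)/|x|$. The function name and quadrature parameters are hypothetical.

```python
import cmath
import math

def sphere_measure_transform(x, n_t=400, n_phi=64):
    """Approximate the integral of exp(2*pi*i*x.w) over the unit sphere S^2 in R^3.

    Coordinates: w = (s cos(phi), s sin(phi), t) with s = sqrt(1 - t^2), in which
    surface measure is dt dphi. Simpson's rule in t, trapezoid rule in phi.
    """
    h = 2.0 / n_t                      # n_t must be even for Simpson's rule
    total = 0.0 + 0.0j
    for i in range(n_t + 1):
        t = -1.0 + i * h
        s = math.sqrt(max(0.0, 1.0 - t * t))
        weight = 1.0 if i in (0, n_t) else (4.0 if i % 2 else 2.0)
        ring = 0.0 + 0.0j
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            dot = x[0] * s * math.cos(phi) + x[1] * s * math.sin(phi) + x[2] * t
            ring += cmath.exp(2j * math.pi * dot)
        total += weight * ring
    return total * (h / 3.0) * (2.0 * math.pi / n_phi)

direction = (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)      # a unit vector
for r in (0.7, 2.3, 4.1):
    x = tuple(r * u for u in direction)
    numeric = sphere_measure_transform(x)
    closed_form = 2.0 * math.sin(2.0 * math.pi * r) / r
    print(r, numeric.real, closed_form)
```

The closed form is bounded by $2/|x|$, which matches the exponent $(d-1)/2 = 1$ when $d = 3$. The quadrature sizes are only adequate for moderate $|x|$; larger arguments oscillate faster and would need finer grids.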

Remark 2 By using the Hardy-Littlewood-Sobolev inequality in place of Young’s convolution inequality, one can also establish this result for .

Motivated by this result, given any Radon measure on and any exponents , we use to denote the claim that the *restriction estimate*

holds for all Schwartz functions ; if is a -dimensional submanifold of (possibly with boundary), we write for where is the -dimensional surface measure on . Thus, for instance, we trivially always have , while Theorem 1 asserts that holds whenever . We will not give a comprehensive survey of restriction theory in these notes, but instead focus on some model results that showcase some of the basic techniques in the field. (I have a more detailed survey on this topic from 2003, but it is somewhat out of date.)

** — 1. Necessary conditions — **

It is relatively easy to find necessary conditions for a restriction estimate to hold, as one simply needs to test the estimate (9) against a suitable family of examples. We begin with the simplest case . The Hausdorff-Young inequality (4) tells us that we have the restriction estimate whenever . These are the only restriction estimates available:

Proposition 3 (Restriction to ) Suppose that are such that holds. Then and .

We first establish the necessity of the duality condition . This is easily shown, but we will demonstrate it in three slightly different ways in order to illustrate different perspectives. The first perspective is from scale invariance. Suppose that the estimate holds, thus one has

for all Schwartz functions . For any scaling factor , we define the scaled version of by the formula

Applying (10) with replaced by , we then have

From change of variables, we have

and from the definition of Fourier transform and further change of variables we have

so that

combining all these estimates and rearranging, we conclude that

If is non-zero, then by sending either to zero or infinity we conclude that for all , which is absurd. Thus we must have the necessary condition , or equivalently that .
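The scaling computation can be illustrated numerically in one dimension. The sketch below assumes the rescaling $f_\lambda(x) := f(x/\lambda)$ (the display defining it is not reproduced above) and takes $f$ to be the Gaussian $e^{-\pi x^2}$, which is its own Fourier transform under the convention of formula (1), so that $\hat f_\lambda(\xi) = \lambda \hat f(\lambda \xi)$ is available in closed form. The function names and quadrature parameters are hypothetical.

```python
import math

def lp_norm(g, p, half_width=40.0, n=20000):
    """Midpoint-rule approximation of the L^p norm of g on [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = sum(abs(g(-half_width + (i + 0.5) * h)) ** p for i in range(n))
    return (total * h) ** (1.0 / p)

def gaussian(x):
    return math.exp(-math.pi * x * x)

def ratio(lam, p, q):
    """||hat f_lam||_q / ||f_lam||_p for f_lam(x) = f(x / lam), f the Gaussian above."""
    f_lam = lambda x: gaussian(x / lam)
    f_lam_hat = lambda xi: lam * gaussian(lam * xi)   # exact transform of f_lam in d = 1
    return lp_norm(f_lam_hat, q) / lp_norm(f_lam, p)

for p, q in ((1.5, 3.0), (1.5, 2.0)):
    exponent = 1.0 - 1.0 / q - 1.0 / p     # d - d/q - d/p with d = 1
    for lam in (0.25, 4.0):
        print(p, q, lam, ratio(lam, p, q) / ratio(1.0, p, q), lam ** exponent)
```

When $q = p'$ (the first pair) the ratio does not depend on $\lambda$. For the second pair the ratio behaves like a non-trivial power of $\lambda$, so no constant can bound it uniformly; this is the contradiction obtained in the text by sending $\lambda$ to zero or infinity.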

We now establish the same necessary condition from the perspective of dimensional analysis, which one can view as an abstraction of scale invariance arguments. We give the spatial variable a unit of length. It is not so important what units we assign to the range of the function (it will cancel out of both sides), but let us make it dimensionless for sake of discussion. Then the norm

will have the units of , because integration against -dimensional Lebesgue measure will have the units of (note this conclusion can also be justified in the limiting case ). For similar reasons, the Fourier transform

will have the units of ; also, the frequency variable must have the units of in order to make the exponent appearing in the exponential dimensionless. As such, the norm

has units . In order for the estimate (10) to be dimensionally consistent, we must therefore have , or equivalently that .

Finally, we establish the necessary condition once again using the example of a rescaled bump function, which is basically the same as the first approach but with replaced by a bump function. We will argue at a slightly heuristic level, but it is not difficult to make the arguments below rigorous and we leave this as an exercise to the reader. Given a length scale , let be a bump function adapted to the ball of radius around the origin, thus where is some fixed test function supported on . We refer to this as a bump function *adapted* to ; more generally, given an ellipsoid (or other convex region, such as a cube, tube, or disk) , we define a bump function adapted to to be a function of the form , where is an affine map from (or other fixed convex region) to and is a bump function with all derivatives uniformly bounded. As long as is non-zero, the norm is comparable to (up to constant factors that can depend on but are independent of ). The uncertainty principle then predicts that the Fourier transform will be concentrated in the dual ball , and within this ball (or perhaps a slightly smaller version of this ball) would be expected to be of size comparable to (the phase does not vary enough to cause significant cancellation). From this we expect to be comparable in size to . If (10) held, we would then have

for all , which is only possible if , or equivalently .

Now we turn to the other necessary condition . Here one does not use scaling considerations; instead, it is more convenient to work with randomised examples. A useful tool in this regard is Khintchine’s inequality, which encodes the *square root cancellation* heuristic that a sum of numbers or functions with randomised signs (or phases) should have magnitude roughly comparable to the *square function* .

Lemma 4 (Khintchine’s inequality) Let , and let be independent random variables that each take the values with an equal probability of .

- (i) For any complex numbers , one has
- (ii) For any functions on a measure space , one has

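*Proof:* We begin with (i). By taking real and imaginary parts we may assume without loss of generality that the $z_j$ are all real; then by normalisation it suffices to show the upper bound

$$ \mathbf{E} \Big| \sum_{j=1}^N \epsilon_j x_j \Big|^p \lesssim_p 1 \qquad (11) $$

and the lower bound

$$ \mathbf{E} \Big| \sum_{j=1}^N \epsilon_j x_j \Big|^p \gtrsim_p 1 \qquad (12) $$

for all $0 < p < \infty$, whenever $x_1, \dots, x_N$ are real numbers with $\sum_{j=1}^N x_j^2 = 1$.

When $p = 2$ the upper and lower bounds follow by direct calculation (in fact we have equality in this case). By Hölder’s inequality, this yields the upper bound for $p \leq 2$ and the lower bound for $p \geq 2$. To handle the remaining cases of (11) it is convenient to use the exponential moment method. Let $\lambda > 0$ be an arbitrary threshold, and consider the upper tail probability

$$ \mathbf{P}\Big( \sum_{j=1}^N \epsilon_j x_j \geq \lambda \Big). $$

For any $t > 0$, we see from Markov’s inequality that this quantity is less than or equal to

$$ e^{-t\lambda}\, \mathbf{E} \exp\Big( t \sum_{j=1}^N \epsilon_j x_j \Big). $$

The expectation here can be computed to equal

$$ \prod_{j=1}^N \cosh(t x_j). $$

By comparing power series we see that $\cosh(x) \leq \exp(x^2/2)$ for any real $x$, hence by the normalisation $\sum_{j=1}^N x_j^2 = 1$ we see that

$$ \mathbf{P}\Big( \sum_{j=1}^N \epsilon_j x_j \geq \lambda \Big) \leq e^{-t\lambda} e^{t^2/2}. $$

If we set $t := \lambda$ we conclude that

$$ \mathbf{P}\Big( \sum_{j=1}^N \epsilon_j x_j \geq \lambda \Big) \leq e^{-\lambda^2/2}; $$

since the random variable $\sum_{j=1}^N \epsilon_j x_j$ is symmetric around the origin, we conclude that

$$ \mathbf{P}\Big( \Big| \sum_{j=1}^N \epsilon_j x_j \Big| \geq \lambda \Big) \leq 2 e^{-\lambda^2/2}. $$

From the Fubini-Tonelli theorem we have

$$ \mathbf{E} |X|^p = \int_0^\infty p \lambda^{p-1}\, \mathbf{P}( |X| \geq \lambda )\, d\lambda $$

for any random variable $X$, and this then gives the upper bound (11) for any $2 < p < \infty$. The claim (12) for $0 < p < 2$ then follows from this, Hölder’s inequality (applied in reverse), and the fact that (12) was already established for $p = 2$.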

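To prove (ii), observe from (i) that for every $x \in X$ one has

$$ \mathbf{E} \Big| \sum_{j=1}^N \epsilon_j f_j(x) \Big|^p \sim_p \Big( \sum_{j=1}^N |f_j(x)|^2 \Big)^{p/2}; $$

integrating in $x$ and applying the Fubini-Tonelli theorem, we obtain the claim.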
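For small $N$ one can test part (i) of Khintchine’s inequality exactly, by averaging over all $2^N$ sign patterns rather than sampling. The sketch below does this for one arbitrarily chosen normalised vector of coefficients (the data `raw` and the helper names `moment` and `tail` are illustrative assumptions). It checks the equality at $p=2$, the elementary fourth moment identity $\mathbf{E} |\sum_j \epsilon_j x_j|^4 = 3 - 2 \sum_j x_j^4$ under the normalisation $\sum_j x_j^2 = 1$, and the subgaussian tail bound obtained from the exponential moment method.

```python
import itertools
import math

def moment(xs, p):
    """Exact value of E|sum_j eps_j x_j|^p, averaging over all 2^N sign patterns."""
    n = len(xs)
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        total += abs(sum(e * x for e, x in zip(signs, xs))) ** p
    return total / 2 ** n

def tail(xs, lam):
    """Exact value of P(|sum_j eps_j x_j| >= lam)."""
    n = len(xs)
    hits = sum(
        1
        for signs in itertools.product((-1, 1), repeat=n)
        if abs(sum(e * x for e, x in zip(signs, xs))) >= lam
    )
    return hits / 2 ** n

# Arbitrary illustrative coefficients, normalised so that sum_j x_j^2 = 1.
raw = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
scale = math.sqrt(sum(x * x for x in raw))
xs = [x / scale for x in raw]

second = moment(xs, 2)   # equality case p = 2
fourth = moment(xs, 4)   # compare with 3 - 2 * sum_j x_j^4
first = moment(xs, 1)    # bounded above by 1 via Hoelder, and bounded away from 0

for lam in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(lam, tail(xs, lam), 2.0 * math.exp(-lam * lam / 2.0))
```

Enumeration costs $2^N$ evaluations, so this is only practical for $N$ up to twenty or so; for larger $N$ one would replace it with Monte Carlo sampling, at the cost of exactness.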

Exercise 5 For any $0 < p < \infty$, let $C_p$ denote the $p^{\mathrm{th}}$ root of the implied constant in (11), that is to say

$$ C_p := \sup \left( \mathbf{E} \Big| \sum_{j=1}^N \epsilon_j x_j \Big|^p \right)^{1/p}, $$

where the supremum is over all $N$ and all reals $x_1, \dots, x_N$ with $\sum_{j=1}^N x_j^2 = 1$.

- (i) If one analyzes the argument used above to prove (11) carefully, one obtains an upper bound for $C_p$. How does this upper bound depend on $p$ asymptotically in the limit $p \to \infty$?
- (ii) Establish (11) for the case of even integers $p$ by direct expansion of the left-hand side and some combinatorial calculation. This gives another upper bound on $C_p$. How does this upper bound compare with that in (i) in the limit $p \to \infty$?
- (iii) Establish a matching lower bound (up to absolute constants) for the quantity $C_p$ in the limit $p \to \infty$.
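Before attempting the exercise, it may help to tabulate some even moments numerically. The following sketch is an exploration rather than a solution: it computes $\mathbf{E} (\sum_j \epsilon_j x_j)^{2k}$ exactly for the equal-weight choice $x_j = N^{-1/2}$ (an illustrative assumption; other normalised choices can be substituted) and prints it next to the gaussian moment $(2k-1)!!$ for comparison.

```python
import itertools
import math

def even_moment(xs, k):
    """Exact E (sum_j eps_j x_j)^(2k), averaging over all sign patterns."""
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=len(xs)):
        total += sum(e * x for e, x in zip(signs, xs)) ** (2 * k)
    return total / 2 ** len(xs)

def odd_double_factorial(k):
    """(2k-1)!! = 1 * 3 * ... * (2k-1), the 2k-th moment of a standard gaussian."""
    result = 1
    for m in range(1, 2 * k, 2):
        result *= m
    return result

n = 10
xs = [1.0 / math.sqrt(n)] * n   # equal weights with sum_j x_j^2 = 1

rows = [(k, even_moment(xs, k), odd_double_factorial(k)) for k in range(1, 6)]
for k, rademacher, gaussian in rows:
    print(2 * k, rademacher, gaussian)
```

Taking $2k^{\mathrm{th}}$ roots of the two columns gives a feel for how the constant in (11) might grow with the exponent.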

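Now we show that the estimate (10) fails in the large $p$ regime $p > 2$, even when $q = p'$. Here, the idea is to have $f$ “spread out” in physical space (in order to keep the $L^p$ norm low), and also to have $\hat f$ somewhat spread out in frequency space (in order to prevent the $L^q$ norm from dropping too much). We use the probabilistic method (constructing a random counterexample rather than a deterministic one) in order to exploit Khintchine’s inequality. Let $\varphi$ be a non-zero bump function supported on (say) the unit ball $B(0,1)$, and consider a (random) function of the form

$$ f(x) := \sum_{j=1}^N \epsilon_j \varphi(x - x_j), $$

where $\epsilon_1, \dots, \epsilon_N$ are the random signs from Lemma 4, and $x_1, \dots, x_N$ are sufficiently separated points in $\mathbb{R}^d$ (all we need for this construction is that $|x_j - x_k| \geq 2$ for all $j \neq k$); thus $f$ is the random sum of bump functions adapted to disjoint balls $B(x_j, 1)$. In particular, the summands here have disjoint supports and

$$ \|f\|_{L^p(\mathbb{R}^d)} = N^{1/p} \|\varphi\|_{L^p(\mathbb{R}^d)} \sim N^{1/p} $$

(note that the signs $\epsilon_j$ have no effect on the magnitude of $f$). If (10) were true, this would give the (deterministic) bound

$$ \|\hat f\|_{L^q(\mathbb{R}^d)} \lesssim N^{1/p}. \qquad (13) $$

On the other hand, the Fourier transform of $f$ is

$$ \hat f(\xi) = \sum_{j=1}^N \epsilon_j e^{-2\pi i x_j \cdot \xi} \hat \varphi(\xi), $$

so by Khintchine’s inequality

$$ \left( \mathbf{E} \|\hat f\|_{L^q(\mathbb{R}^d)}^q \right)^{1/q} \sim_q \Big\| \Big( \sum_{j=1}^N |e^{-2\pi i x_j \cdot \xi} \hat \varphi(\xi)|^2 \Big)^{1/2} \Big\|_{L^q(\mathbb{R}^d)}. $$

The phases $e^{-2\pi i x_j \cdot \xi}$ can be deleted, and $\hat \varphi$ is not identically zero, so one arrives at

$$ \left( \mathbf{E} \|\hat f\|_{L^q(\mathbb{R}^d)}^q \right)^{1/q} \sim N^{1/2}. $$

Comparing this with (13) and sending $N \to \infty$, we obtain a contradiction if $p > 2$. This completes the proof of Proposition 3.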
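The square root cancellation driving this counterexample can be seen in a small exact computation. The sketch below assumes dimension one, centres $x_j = 3j$, and the Fourier convention with phase $e^{-2\pi i x \xi}$ (all illustrative assumptions, as are the function names). It computes the mean square of the phase sum $\sum_j \epsilon_j e^{-2\pi i x_j \xi}$ over all sign patterns, which multiplies $|\hat \varphi(\xi)|^2$ on the frequency side, and then compares the two growth rates $N^{1/2}$ and $N^{1/p}$.

```python
import cmath
import itertools
import math

def mean_square_phase_sum(centers, xi):
    """Exact E|sum_j eps_j exp(-2 pi i x_j xi)|^2 over all sign patterns."""
    n = len(centers)
    phases = [cmath.exp(-2j * math.pi * x * xi) for x in centers]
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        s = sum(e * ph for e, ph in zip(signs, phases))
        total += abs(s) ** 2
    return total / 2 ** n

def growth_gap(n, p):
    """Ratio N^(1/2) / N^(1/p) of frequency-side growth to physical-side growth."""
    return n ** 0.5 / n ** (1.0 / p)

centers = [3.0 * j for j in range(8)]   # separated points, |x_j - x_k| >= 2
values = [mean_square_phase_sum(centers, xi) for xi in (0.0, 0.13, 0.5, 1.7)]
print(values)                            # each entry should be close to N = 8

gaps = [growth_gap(n, 4.0) for n in (10, 100, 1000)]
print(gaps)                              # grows with N when p > 2
```

The mean square equals $N$ at every frequency because the cross terms $\epsilon_j \epsilon_k$ with $j \neq k$ have mean zero; the ratio `growth_gap` is unbounded in $N$ exactly when $p > 2$, which is the source of the contradiction with (13).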

Exercise 6 Find a deterministic construction that explains why the estimate (10) fails when $p > 2$ and $q = p'$.

Exercise 7 (Marcinkiewicz-Zygmund theorem) Let be measure spaces, let , and suppose is a bounded linear operator with operator norm . Show that for any at most countable index set and any functions . Informally, this result asserts that if a linear operator is bounded from scalar-valued functions to scalar-valued functions, then it is automatically bounded from vector-valued functions to vector-valued functions. (By using gaussians instead of random sums, one can even obtain this bound with the implied constant equal to .)

Exercise 8 Let be a bounded open subset of , and let . Show that holds if and only if and . (Note: in order to use either the scale invariance argument or the dimensional analysis argument to get the condition , one should replace with something like a ball of some radius , and allow the estimates to depend on .)

Now we study the restriction problem for two model hypersurfaces:

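- (i) The *paraboloid* $P^{d-1} := \{ (\xi, |\xi|^2) : \xi \in \mathbb{R}^{d-1} \}$, equipped with the measure $d\sigma$ induced from Lebesgue measure $d\xi$ in the horizontal variables $\xi \in \mathbb{R}^{d-1}$, thus
  $$ \int_{P^{d-1}} f\, d\sigma := \int_{\mathbb{R}^{d-1}} f(\xi, |\xi|^2)\, d\xi $$
  (note this is *not* the same as surface measure on $P^{d-1}$, although it is mutually absolutely continuous with this measure).
- (ii) The *sphere* $S^{d-1}$.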

These two hypersurfaces differ from each other in one important respect: the paraboloid is non-compact, while the sphere is compact. Aside from that, though, they behave very similarly; they are both quadric hypersurfaces with everywhere positive curvature. Furthermore, they are also very highly symmetric surfaces. The sphere of course enjoys the rotation symmetry under the orthogonal group $O(d)$. At first glance the paraboloid $P^{d-1}$ only enjoys symmetry under the smaller orthogonal group $O(d-1)$ that rotates the $\xi$ variable (leaving the final coordinate unchanged), but it also has a family of Galilean symmetries

$$ (\xi, \tau) \mapsto (\xi + \xi_0, \tau + 2 \xi_0 \cdot \xi + |\xi_0|^2) $$

for any $\xi_0 \in \mathbb{R}^{d-1}$, which preserves $P^{d-1}$ (and also can be seen to preserve the measure $d\sigma$, since the horizontal variable $\xi$ is simply translated by $\xi_0$). Furthermore, the paraboloid also enjoys a *parabolic scaling symmetry*

$$ (\xi, \tau) \mapsto (\lambda \xi, \lambda^2 \tau) $$

for any $\lambda > 0$, for which the sphere does not have an exact analogue (though morally Taylor expansion suggests that the sphere “behaves like” the paraboloid at small scales, or equivalently that certain parabolically rescaled copies of the sphere behave like the paraboloid in the limit). The following exercise exploits these symmetries:
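The two symmetries are easy to sanity-check numerically. The sketch below assumes the standard forms of the maps, namely the Galilean map $(\xi, \tau) \mapsto (\xi + \xi_0, \tau + 2 \xi_0 \cdot \xi + |\xi_0|^2)$ and the parabolic scaling $(\xi, \tau) \mapsto (\lambda \xi, \lambda^2 \tau)$ acting on the paraboloid $\tau = |\xi|^2$; the dimension $d = 3$, the sample points, and all function names are illustrative assumptions. It verifies that both maps send points of the paraboloid to points of the paraboloid, and that the parabolic scaling moves a point of the unit sphere off the sphere.

```python
import random

def on_paraboloid(point, tol=1e-9):
    """True if (xi, tau) satisfies tau = |xi|^2 up to a relative tolerance."""
    *horiz, tau = point
    sq = sum(x * x for x in horiz)
    return abs(tau - sq) <= tol * (1.0 + abs(sq))

def galilean(point, shift):
    """Assumed Galilean map (xi, tau) -> (xi + xi0, tau + 2 xi0.xi + |xi0|^2)."""
    *horiz, tau = point
    dot = sum(a * b for a, b in zip(shift, horiz))
    new_horiz = [a + b for a, b in zip(horiz, shift)]
    return new_horiz + [tau + 2.0 * dot + sum(a * a for a in shift)]

def parabolic_scale(point, lam):
    """Assumed parabolic scaling (xi, tau) -> (lam * xi, lam^2 * tau)."""
    *horiz, tau = point
    return [lam * x for x in horiz] + [lam * lam * tau]

def random_paraboloid_point(rng, d):
    """Random point (xi, |xi|^2) with xi drawn uniformly from a cube in R^(d-1)."""
    horiz = [rng.uniform(-2.0, 2.0) for _ in range(d - 1)]
    return horiz + [sum(x * x for x in horiz)]

rng = random.Random(0)
d = 3
points = [random_paraboloid_point(rng, d) for _ in range(100)]

all_galilean = all(on_paraboloid(galilean(pt, [0.7, -1.3])) for pt in points)
all_scaled = all(on_paraboloid(parabolic_scale(pt, 2.5)) for pt in points)

# The same scaling does not preserve the unit sphere.
sphere_scaled = parabolic_scale([1.0, 0.0, 0.0], 2.0)
leaves_sphere = abs(sum(x * x for x in sphere_scaled) - 1.0) > 1e-9
print(all_galilean, all_scaled, leaves_sphere)
```

The check concerns the surfaces only; it says nothing about how the measure transforms under these maps.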

Exercise 9

- (i) Let be a non-empty open subset of , and let . Show that holds if and only if holds.
- (ii) Let be bounded non-empty open subsets of (endowed with the restriction of to ), and let . Show that holds if and only if holds.
- (iii) Suppose that are such that holds. Show that . (*Hint:* Any of the three methods of scale invariance, dimensional analysis, or rescaled bump functions will work here.)
- (iv) Suppose that are such that holds. Show that . (*Hint:* The same three methods still work, but some will be easier to pull off than others.)
- (v) Suppose that are such that holds for some bounded non-empty open subset of , and that . Conclude that holds.
- (vi) Suppose that are such that holds, and that . Conclude that holds.

Exercise 10 (No non-trivial restriction estimates for flat hypersurfaces) Let be an open non-empty subset of a hyperplane in , and let . Show that can only hold when .

To obtain a further necessary condition on the restriction estimates or holding, it is convenient to dualise the restriction estimate to an *extension estimate*.

Exercise 11 (Duality) Let be a Radon measure on , let , and let . Show that the following claims are equivalent:

This gives a further necessary condition as follows. Suppose for instance that holds; then by the above exercise, one has

for all . In particular, . However, we have the following stationary phase computation:

for all and some non-zero constants depending only on . Conclude that the estimate can only hold if .

Exercise 13 Show that the estimate can only hold if . (*Hint:* one can explicitly test (15) when is a gaussian; the fact that gaussians are not, strictly speaking, compactly supported can be dealt with by a limiting argument.)

It is conjectured that the necessary conditions claimed above are sufficient. Namely, we have

Conjecture 14 (Restriction conjecture for the sphere) Let . Then we have whenever and .

Conjecture 15 (Restriction conjecture for the paraboloid) Let . Then we have whenever and .

It is also conjectured that Conjecture 14 holds if one replaces the sphere by any bounded open non-empty subset of the paraboloid .

The current status of these conjectures is that they are fully solved in the two-dimensional case (as we will see later in these notes) and partially resolved in higher dimensions. For instance, in one of the strongest results currently is due to Hong Wang, who established for a bounded open non-empty subset of when (conjecturally this should hold for all ); for higher dimensions see this paper of Hickman and Rogers for the most recent results.

We close this section with an important connection between the restriction conjecture and another conjecture known as the *Kakeya maximal function conjecture*. To describe this connection, we first give an alternate derivation of the