This set of notes discusses aspects of one of the oldest questions in Fourier analysis, namely the nature of convergence of Fourier series.

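If $f: {\bf R}/{\bf Z} \to {\bf C}$ is an absolutely integrable function, its Fourier coefficients $\hat f: {\bf Z} \to {\bf C}$ are defined by the formula

$$\hat f(n) := \int_{{\bf R}/{\bf Z}} f(x) e^{-2\pi i nx}\ dx.$$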

If $f$ is smooth, then the Fourier coefficients are absolutely summable, and we have the Fourier inversion formula

$$f(x) = \sum_{n \in {\bf Z}} \hat f(n) e^{2\pi i nx}$$

where the series here is uniformly convergent. In particular, if we define the partial summation operators

$$S_N f(x) := \sum_{n=-N}^N \hat f(n) e^{2\pi i nx}$$

then $S_N f$ converges uniformly to $f$ when $f$ is smooth.

What if $f$ is not smooth, but merely lies in an $L^p({\bf R}/{\bf Z})$ class for some $1 \leq p \leq \infty$? The Fourier coefficients $\hat f$ remain well-defined, as do the partial summation operators $S_N$. The question of convergence in norm is relatively easy to settle:

Exercise 1

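- (i) If $1 < p < \infty$ and $f \in L^p({\bf R}/{\bf Z})$, show that $S_N f$ converges in $L^p({\bf R}/{\bf Z})$ norm to $f$. (Hint: first use the boundedness of the Hilbert transform to show that $S_N$ is bounded in $L^p({\bf R}/{\bf Z})$ uniformly in $N$.)
- (ii) If $p=1$ or $p=\infty$, show that there exists $f \in L^p({\bf R}/{\bf Z})$ such that the sequence $S_N f$ is unbounded in $L^p({\bf R}/{\bf Z})$ (so in particular it certainly does not converge in $L^p({\bf R}/{\bf Z})$ norm to $f$). (Hint: first show that $S_N$ is not bounded in $L^p({\bf R}/{\bf Z})$ uniformly in $N$, then apply the uniform boundedness principle in the contrapositive.)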
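The failure of uniform boundedness at the endpoint $p=1$ can be seen numerically. Writing the partial sum as convolution with the Dirichlet kernel $D_N(x) = \sum_{n=-N}^{N} e^{2\pi i nx}$, the $L^1$ norm of $D_N$ (the Lebesgue constant) grows logarithmically in $N$. The following is a minimal sketch using only the Python standard library; the function names and the midpoint-rule discretisation are illustrative choices, not part of the notes.

```python
import math

def dirichlet_kernel(N, x):
    """D_N(x) = sum_{n=-N}^{N} e^{2 pi i n x}, written as a real cosine sum."""
    return 1.0 + 2.0 * sum(math.cos(2.0 * math.pi * n * x) for n in range(1, N + 1))

def lebesgue_constant(N, samples=20000):
    """Midpoint-rule approximation to the L^1 norm of D_N over one period."""
    h = 1.0 / samples
    return h * sum(abs(dirichlet_kernel(N, (j + 0.5) * h)) for j in range(samples))

if __name__ == "__main__":
    # The ratio to log(2N + 1) should settle down slowly as N grows,
    # reflecting the logarithmic growth of the L^1 norm.
    for N in (1, 2, 4, 8, 16, 32):
        L = lebesgue_constant(N)
        print(N, round(L, 4), round(L / math.log(2 * N + 1), 4))
```

Since the operator norm of $S_N$ on $L^1({\bf R}/{\bf Z})$ equals $\|D_N\|_{L^1({\bf R}/{\bf Z})}$, this growth is the input that the uniform boundedness principle needs in part (ii).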

The question of pointwise almost everywhere convergence turned out to be a significantly harder problem:

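Theorem 2 (Pointwise almost everywhere convergence)

- (i) (Kolmogorov) There exists $f \in L^1({\bf R}/{\bf Z})$ such that $S_N f(x)$ is unbounded in $N$ for almost every $x$.
- (ii) (Carleson; conjectured by Lusin) For every $f \in L^2({\bf R}/{\bf Z})$, $S_N f(x)$ converges to $f(x)$ as $N \to \infty$ for almost every $x$.
- (iii) (Hunt) For every $1 < p \leq \infty$ and $f \in L^p({\bf R}/{\bf Z})$, $S_N f(x)$ converges to $f(x)$ as $N \to \infty$ for almost every $x$.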

Note from Hölder’s inequality that $L^2({\bf R}/{\bf Z})$ contains $L^p({\bf R}/{\bf Z})$ for all $p \geq 2$, so Carleson’s theorem covers the $p \geq 2$ case of Hunt’s theorem. We remark that the precise threshold near $L^1$ between Kolmogorov-type divergence results and Carleson-Hunt pointwise convergence results, in the category of Orlicz spaces, is still an active area of research; see this paper of Lie for further discussion.

Carleson’s theorem in particular was a surprisingly difficult result, lying just out of reach of classical methods (as we shall see later, the result is much easier if we smooth either the function or the summation method by a tiny bit). Nowadays we realise that the reason for this is that Carleson’s theorem essentially contains a *frequency modulation symmetry* in addition to the more familiar translation symmetry and dilation symmetry. This basically rules out the possibility of attacking Carleson’s theorem with tools such as Calderón-Zygmund theory or Littlewood-Paley theory, which respect the latter two symmetries but not the former. Instead, tools from “time-frequency analysis” that essentially respect all three symmetries should be employed. We will illustrate this by giving a relatively short proof of Carleson’s theorem due to Lacey and Thiele. (There are other proofs of Carleson’s theorem, including Carleson’s original proof, its modification by Hunt, and a later time-frequency proof by Fefferman; see Remark 18 below.)

** — 1. Equivalent forms of almost everywhere convergence of Fourier series — **

A standard technique to prove almost everywhere convergence results is by first establishing a weak-type estimate of an associated maximal function. For instance, the Lebesgue differentiation theorem is usually established with the assistance of the Hardy-Littlewood maximal inequality; see for instance this previous blog post. A remarkable observation of Stein, known as Stein’s maximal principle, allows one to reverse this implication in certain cases by exploiting a symmetry of the problem. Here is the principle specialised to the application of pointwise convergence of Fourier series, and also combined with a transference principle of Kenig and Tomas:

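Proposition 3 (Equivalent forms of almost everywhere convergence) Let $1 \leq p \leq 2$. Then the following statements are equivalent:

- (i) For every $f \in L^p({\bf R}/{\bf Z})$, one has $\lim_{N \to \infty} S_N f(x) = f(x)$ for almost every $x$.
- (ii) There does not exist $f \in L^p({\bf R}/{\bf Z})$ such that $\sup_N |S_N f(x)| = \infty$ for almost every $x$.
- (iii) One has the maximal inequality $$\| \sup_N |S_N f| \|_{L^{p,\infty}({\bf R}/{\bf Z})} \lesssim \|f\|_{L^p({\bf R}/{\bf Z})} \qquad (1)$$ for all smooth $f$, where the weak $L^p$ norm is defined as $$\|f\|_{L^{p,\infty}({\bf R}/{\bf Z})} := \sup_{\lambda > 0} \lambda |\{ x: |f(x)| \geq \lambda \}|^{1/p}$$ and $|E|$ denotes the Lebesgue measure of a set $E$ (which in this setting is a subset of the unit circle ${\bf R}/{\bf Z}$).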
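- (iv) One has the maximal inequality $$\| \sup_{N \in {\bf Z}} |S_{(-\infty,N]} f| \|_{L^{p,\infty}({\bf R}/{\bf Z})} \lesssim \|f\|_{L^p({\bf R}/{\bf Z})}$$ for all smooth $f$, where $S_{(-\infty,N]}$ denotes the partial Fourier series $$S_{(-\infty,N]} f(x) := \sum_{n \leq N} \hat f(n) e^{2\pi i nx}.$$
- (v) One has the maximal inequality $$\| \sup_{N \in {\bf R}} |S_{(-\infty,N]} f| \|_{L^{p,\infty}({\bf R})} \lesssim \|f\|_{L^p({\bf R})}$$ for all $f \in {\mathcal S}({\bf R})$, where $S_{(-\infty,N]}$ denotes the Fourier multiplier operator $$\widehat{S_{(-\infty,N]} f}(\xi) := 1_{(-\infty,N]}(\xi) \hat f(\xi).$$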
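To get a feel for the weak $L^p$ quasi-norm $\sup_{\lambda>0} \lambda |\{|f| \geq \lambda\}|^{1/p}$ appearing in these maximal inequalities, here is a small sketch that evaluates it for a function given by equally weighted samples. The helper names and the sampled model of the function are assumptions made for illustration. The example $f(x) = 1/x$ on $(0,1]$ has weak $L^1$ quasi-norm $1$, while its $L^1$ norm diverges logarithmically as the sampling is refined.

```python
def weak_lp_quasinorm(values, p, measure_per_sample):
    """sup over lambda of lambda * |{|f| >= lambda}|^(1/p) for a sampled function.

    Each sample is assumed to represent a cell of measure `measure_per_sample`.
    The supremum is attained at one of the sample magnitudes, so it suffices
    to scan the magnitudes in decreasing order.
    """
    mags = sorted((abs(v) for v in values), reverse=True)
    best = 0.0
    for k, lam in enumerate(mags, start=1):
        # At least k samples have magnitude >= lam.
        best = max(best, lam * (k * measure_per_sample) ** (1.0 / p))
    return best

def lp_norm(values, p, measure_per_sample):
    """Ordinary L^p norm of the same sampled function."""
    return (sum(abs(v) ** p for v in values) * measure_per_sample) ** (1.0 / p)

if __name__ == "__main__":
    for n in (100, 10000):
        samples = [n / j for j in range(1, n + 1)]  # f(x) = 1/x at x = j/n
        print(n, weak_lp_quasinorm(samples, 1, 1.0 / n), lp_norm(samples, 1, 1.0 / n))
```

By Chebyshev's inequality the weak quasi-norm never exceeds the strong norm, which is why a weak-type bound is the easier target.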

Among other things, this proposition equates the qualitative property (i) of almost everywhere convergence to the quantitative property (iii) of a maximal inequality. This equivalence (first observed by Calderón) is similar in spirit to the uniform boundedness principle (see e.g. Corollary 1 of this previous blog post). The restriction $p \leq 2$ is needed for just one implication (from (ii) to (iii)) in the arguments below, and arises due to the use of Khintchine’s inequality at one point. The equivalence of (iv) and (v) is part of a more general principle of *transference* that allows one to pass back and forth between periodic domains such as ${\bf R}/{\bf Z}$ and non-periodic domains such as ${\bf R}$ (or, on the Fourier side, between discrete domains ${\bf Z}$ and continuous domains ${\bf R}$) if the estimates in question enjoy suitable scaling symmetries. We will use the formulation (v), as it enjoys the most symmetries.

*Proof:* We first show that (iii) implies (i). If (1) holds for all smooth , then certainly for all finite one has

Clearly (i) implies (ii). Now we assume that (iii) fails and use this to show that (ii) fails as well. From the failure of (iii) and monotone convergence, for any one can find , a measurable subset of , a finite , and such that

and such that In particular, has positive measure. By homogeneity we may normalise . At this stage, nothing prevents the measure of from being much smaller than ; but we can exploit translation invariance to increase the measure of to be comparable to as follows. Let be the integer part of . We claim that there exist translations of whose union has measure comparable to : This is easiest to establish by the probabilistic method (which in this context we might call the *random translation* method). If we select uniformly and independently at random we see that every point will lie in a given translate (or equivalently, that lies in ) with probability , hence Integrating in and using the Fubini-Tonelli theorem, we conclude that and hence there exist deterministic choices of for which By definition of the right-hand side is comparable to , giving the claim (clearly the left-hand side cannot exceed ).
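The random translation step can be sanity-checked by simulation. The sketch below assumes the simplest model case: the set is an arc $[0,\delta)$ of the circle, the number of translates is the integer part of $1/\delta$, and the circle is discretised to a finite grid. All of these are illustrative assumptions rather than part of the proof. In this model a fixed point is missed by all the translates with probability $(1-\delta)^{\lfloor 1/\delta \rfloor} \approx e^{-1}$, so the union should cover a fraction of the circle close to $1 - e^{-1}$.

```python
import random

def union_measure_of_random_translates(delta, grid=2 ** 14, seed=0):
    """Estimate the measure of the union of K = floor(1/delta) random
    translates of the arc [0, delta) in R/Z, on a grid of `grid` cells."""
    rng = random.Random(seed)
    K = int(1.0 / delta)
    width = max(1, int(round(delta * grid)))  # arc length in grid cells
    covered = [False] * grid
    for _ in range(K):
        start = rng.randrange(grid)           # uniform random translation
        for j in range(width):
            covered[(start + j) % grid] = True
    return sum(covered) / grid

if __name__ == "__main__":
    for seed in range(5):
        print(seed, union_measure_of_random_translates(2.0 ** -7, seed=seed))
```

The point of the argument in the text is only that the union has measure comparable to $1$; the simulation illustrates this but of course proves nothing about general sets.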

Now consider the randomised linear combination

of translates of , where are random Bernoulli signs. From Khintchine’s inequality and the hypothesis $p \leq 2$ we have hence by construction of and (4)

Now we study the behaviour of when . Since is a convolution operator, it commutes with translations, and hence

for each . On the other hand, from (3) we have and hence there exists such that In particular, the square function is at least . Meanwhile, from Khintchine’s inequality and (7) we have for all . Applying the Paley-Zygmund inequality (setting , for instance) we conclude that (for suitable choices of implied constants), so in particular Integrating in using (5), and applying the Fubini-Tonelli theorem, we conclude that hence by (6) one has In particular, there exists a deterministic choice of signs (and hence of ) for which On the other hand, the left-hand side is at most . We conclude that for every , we can find a smooth function with and a finite , as well as a set of measure , such that for all .

Applying this fact iteratively (each time choosing to be sufficiently large depending on all previous choices), we can construct a sequence of smooth functions , finite , and sets for such that

- (a) for all .
- (b) for all .
- (c) One has for all and (note that the right-hand side is finite since the are smooth for ).
- (d) for all (note that the left-hand side is bounded by ).
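Khintchine's inequality, which is used twice in the argument above, states that a random sign sum $\sum_k \epsilon_k a_k$ has $L^p$ moments comparable to the $\ell^2$ norm of the coefficients. The following sketch checks this by exact enumeration of all sign patterns for short coefficient vectors; the function name and the test vectors are illustrative assumptions. For $p=2$ the ratio is exactly $1$ by orthogonality of the signs, and for $p=1$ it lies between $1/\sqrt{2}$ (Szarek's sharp constant) and $1$.

```python
import math
from itertools import product

def khintchine_ratio(coeffs, p):
    """(E |sum_k eps_k a_k|^p)^(1/p) divided by (sum_k |a_k|^2)^(1/2),
    with the expectation computed exactly over all 2^n sign patterns."""
    n = len(coeffs)
    total = 0.0
    for signs in product((-1, 1), repeat=n):
        total += abs(sum(s * a for s, a in zip(signs, coeffs))) ** p
    lp_average = (total / 2 ** n) ** (1.0 / p)
    l2_norm = math.sqrt(sum(abs(a) ** 2 for a in coeffs))
    return lp_average / l2_norm

if __name__ == "__main__":
    for coeffs in ([1, 1], [3, 1, 4, 1, 5], [1] * 10):
        print(coeffs, khintchine_ratio(coeffs, 1), khintchine_ratio(coeffs, 2))
```

Enumeration costs $2^n$ operations, so this is only meant for small $n$; it is a check of the inequality, not a substitute for its proof.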

Now we assume (iv) and work to establish (v). The idea here is to use a rescaling argument, viewing as the limit as of the large circle (in physical space) or the fine lattice (in frequency space).

By limiting arguments we may assume that is compactly supported on some interval . Let be a large scaling parameter, and consider the periodic function defined by

For large enough, this function is smooth and supported on the interval , with norm The Fourier coefficients of are given as so that Applying (iv), we see that for any , we have Rescaling by , we conclude that We can let range over the reals rather than the integers as this does not affect the constraint . Rescaling by , we see that for any compact intervals , we have By uniform Riemann integrability and the rapid decrease of uniformly for , . We conclude that By monotone convergence we may replace with , and we then obtain (v).

Finally, we assume (v) and establish (iv). By a limiting argument it suffices to establish (iv) for trigonometric polynomials , that is to say periodic functions whose Fourier coefficients are supported in for some natural number . Let be a non-zero Schwartz function with supported in , and for a given scaling parameter let denote the Schwartz function

For sufficiently large one easily checks that The Fourier transform of can be calculated as hence (for large enough) and thus From (v) we conclude that for any we have For large enough, the left-hand side is for some depending on . Dividing by and replacing by , we obtain the claim (iv).

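Exercise 4 For $N \geq 1$, let $\sigma_N$ denote the Fejér summation operators

$$\sigma_N f := \frac{1}{N} \sum_{n=0}^{N-1} S_n f.$$

- (i) For any $f \in L^1({\bf R}/{\bf Z})$, establish the pointwise bound $$\sup_N |\sigma_N f(x)| \lesssim Mf(x)$$ where $M$ is the Hardy-Littlewood maximal function $$Mf(x) := \sup_{0 < r \leq 1/2} \frac{1}{2r} \int_{x-r}^{x+r} |f(y)|\ dy.$$
- (ii) Show that for $f \in L^1({\bf R}/{\bf Z})$, one has $\lim_{N \to \infty} \sigma_N f(x) = f(x)$ for almost all $x$.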
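The contrast between Fejér means and raw partial sums can be seen on a square wave. The sketch below takes the Fejér mean to be the average of the first $N$ partial sums $S_0 f, \dots, S_{N-1} f$ (the standard definition, assumed here) and takes $f = +1$ on $[0,1/2)$ and $-1$ on $[1/2,1)$, whose Fourier series is $\sum_{n \text{ odd}} \frac{4}{\pi n} \sin(2\pi n x)$. The function names are illustrative. The partial sums overshoot near the jump (the Gibbs phenomenon), while the Fejér means stay bounded by $\sup |f| = 1$ because the Fejér kernel is non-negative with total mass one.

```python
import math

def partial_sum(N, x):
    """S_N f(x) for the square wave f = +1 on [0, 1/2), -1 on [1/2, 1)."""
    return sum(4.0 / (math.pi * n) * math.sin(2.0 * math.pi * n * x)
               for n in range(1, N + 1, 2))

def fejer_mean(N, x):
    """Average of S_0 f, ..., S_{N-1} f at x, using the equivalent formula
    in which frequency n is weighted by (1 - |n| / N)."""
    return sum((1.0 - n / N) * 4.0 / (math.pi * n) * math.sin(2.0 * math.pi * n * x)
               for n in range(1, N, 2))

if __name__ == "__main__":
    grid = [j / 2000.0 for j in range(1, 1000)]
    print("max of S_63 f      :", max(partial_sum(63, x) for x in grid))
    print("max of Fejer mean 64:", max(fejer_mean(64, x) for x in grid))
```

This positivity is what makes the bound by the Hardy-Littlewood maximal function in part (i) available for Fejér summation but not for the Dirichlet kernel.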

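Exercise 5 (Pointwise convergence of Fourier integrals) Let $1 \leq p \leq 2$ be such that the conclusion of Proposition 3(v) holds. Show that for any $f \in L^p({\bf R})$, one has $$\lim_{N \to \infty} S_{[-N,N]} f(x) = f(x)$$ for almost all $x \in {\bf R}$, where $S_{[-N,N]}$ is defined for Schwartz functions $f$ by the formula $$S_{[-N,N]} f(x) := \int_{-N}^N \hat f(\xi) e^{2\pi i x \xi}\ d\xi$$ and then extended to $L^p({\bf R})$ by density.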

Exercise 6 Let . Suppose that is such that one has the restriction estimate for all Schwartz functions , where denotes the surface measure on the sphere . Conclude that for all Schwartz functions . (This observation is due to Bourgain.) In particular, by Marcinkiewicz interpolation, implies for all . (Hint: adapt some parts of the argument used to get from (iii) to (i) in Proposition 3, using rotation invariance as a substitute for translation invariance. The translational symmetry of the restriction problem, or more precisely the ability to translate a function in physical space without changing the absolute value of its Fourier transform, will also be useful.)

We are now ready to establish Kolmogorov’s theorem (Theorem 2(i)); our arguments are loosely based on the original construction of Kolmogorov (though he was not in possession at the time of the Stein maximal principle). In view of the equivalence between (ii) and (v) in Proposition 3, it suffices to show that the maximal operator

fails to be of weak type $(1,1)$ on Schwartz functions. Recalling that the Hilbert transform is also a Fourier multiplier operator some routine calculations then show that for any Schwartz function . By the triangle inequality, it then suffices to show that the maximal operator fails to be of weak type $(1,1)$ on Schwartz functions.

To motivate the construction, note from a naive application of the triangle inequality that

If the function was absolutely integrable, then by Young’s inequality we would conclude that the maximal operator was of strong type $(1,1)$, and hence also of weak type $(1,1)$. Thus any counterexample must somehow exploit the logarithmic divergence of the integral of . However, there are two potential sources of cancellation that could ameliorate this divergence: the sign of the Hilbert kernel , and the phase . But because of the supremum in , we can select the frequency parameter as we please, as long as it depends only on and not on . The idea is then to choose (and the support of ) to remove both sources of cancellation as much as possible.

We turn to the details. Let be a large natural number, and then select widely separated frequency scales

In order to assist with removing cancellation in the phases later, we will require these scales to be integers. The precise choice of scales is not too important as long as they are widely separated and integer valued, but for the sake of concreteness one could for instance set . Let be a bump function of total mass supported on , and let be the Schwartz function thus is an approximation (in a weak sense) to the sum of Dirac masses , with the frequency scale of the approximation to increasing rapidly in . We easily compute the norm of :

Now we estimate for in the interval for some natural number ; note the set of all such has measure . In this range we will test the maximal operator at the frequency cutoff :

As is supported in , we see (for large enough) that avoids the support of and we can replace the principal value integral with the ordinary integral. Substituting (9), we conclude that As is an integer, the phase is equal to . We also cancel out the phase as being independent of , thus For , we exploit the oscillatory nature of the phase through an integration by parts, leading to the bound (one could even gain a factor of here if desired, but we will not need it). Summing, we have For , we instead exploit the near-constant nature of the phase by writing and similarly to conclude that Summing and combining with (11), we conclude (from the rapidly increasing nature of the ) that and thus (for large) Comparing this with (10) we contradict the conclusion of Proposition 3(iv), giving the claim.

Remark 7 In 1926, Kolmogorov refined his construction to obtain a function whose Fourier sums diverged everywhere (not just almost everywhere).

Exercise 8 (Rademacher-Menshov theorem)

- (i) Let be some square-integrable functions on a probability space , with a power of two. By performing a suitable Whitney type decomposition (similar to that used in Section 3 of Notes 1), establish the pointwise bound where for each , ranges over dyadic intervals of the form with . If furthermore the are orthogonal to each other, establish the maximal inequality
- (ii) If is a trigonometric polynomial with at most non-zero coefficients for some , use part (i) to establish the bound
- (iii) If lies in the Sobolev space for some , use (ii) to show that for almost every .

** — 2. Carleson’s theorem — **

We now begin the proof of Carleson’s theorem (Theorem 2(ii)), loosely following the arguments of Lacey and Thiele (we briefly comment on other approaches at the end of these notes). In view of Proposition 3, it suffices to establish the weak-type bound

for Schwartz functions . Because of the supremum, the expression depends sublinearly on rather than linearly; however there is a trick to reduce matters to considering linear estimates. By selecting, for each , to be a frequency which attains (or nearly attains) the supremal value of , it suffices to establish the linearised estimate uniformly for all measurable functions , where is the operator One can think of this operator as the (Kohn-Nirenberg) quantisation of the rough symbol . Unfortunately this symbol is far too rough for us to be able to use pseudodifferential operator tools from the previous set of notes. Nevertheless, the “time-frequency analysis” mindset of trying to efficiently decompose phase space into rectangles consistent with the uncertainty principle will remain very useful.

The next step is to dualise the weak norm to linearise the dependence on even further:

Exercise 9 Let , let be a $\sigma$-finite measure space, let be a measurable function, and let . Show that the following claims are equivalent (up to changes in the implied constants in the asymptotic notation):

- (i) One has .
- (ii) For every subset of of finite measure, the function is absolutely integrable on , and

In view of this exercise, we see that it suffices to obtain the bound

for all Schwartz , all sets of finite measure, and all measurable functions . Actually only the restriction of to is relevant here, so one can view as a function just on if desired. The operator can be viewed as the quantisation of the (very rough) symbol , that is to say the indicator function of the region lying underneath the graph of :
A notable feature of the estimate (12) is that it enjoys *three* different symmetries (or near-symmetries), each of which is “non-compact” in the sense that it is parameterised by a parameter taking values in a non-compact space such as or :

- (i) (Translation symmetry) For any spatial shift , both sides of (12) remain unchanged if we replace by , the set by the translate , and the function by .
- (ii) (Dilation symmetry) For any scaling factor , both sides of (12) become multiplied by the same scaling factor if we replace by , by the dilate , and the function by .
- (iii) (Modulation symmetry) For any frequency shift , both sides of (12) remain (almost) unchanged if we replace by , do not modify the set , and replace the function by . (Technically the left-hand side changes because of an additional factor of , but this factor can be handled for instance by generalising the indicator function cutoff to a subindicator function cutoff that has the pointwise bound ; we will ignore this very minor issue here.)

Each of these symmetries corresponds to a different symmetry of phase space , namely spatial translation , dilation , and frequency translation respectively. As a general rule of thumb, if one wants to prove a delicate estimate such as (12) that is invariant with respect to one or more non-compact symmetries, then one should use tools that are similarly invariant (or approximately invariant) with respect to these symmetries. Thus for instance Littlewood-Paley theory or Calderón-Zygmund theory would not be suitable tools to use here, as they are only invariant with respect to translation and dilation symmetry but absolutely fail to have any modulation symmetry properties (these theories prescribe a privileged role to the frequency origin, or equivalently they isolate functions of mean zero as playing a particularly important role).

Besides the need to respect the symmetries of the problem, one of the main difficulties in establishing (12) is that the expression , couples together the function with the function in a rather complicated way (via the frequency variable ). We would like to try to decouple this interaction by making and instead interact with simpler objects (such as “wave packets”), rather than being coupled directly to each other. To motivate the decomposition to use, we begin with a heuristic discussion. The first main idea is to temporarily work in the (non-invertible) coordinate system of phase space rather than in order to simplify the constraint to the simple geometric region of a half-plane (this coordinate system is of course a terrible choice for most of the other parts of the argument, but is the right system to use for the frequency decompositions we will now employ). In analogy to the Whitney type decompositions used in Notes 1, one can split

for almost all choices of and (at least if have the same sign), where range over pairs of dyadic intervals that are “close” in the sense that and that and are not adjacent, but their parents are adjacent, and with to the left of . (Here it is convenient to work with half-open dyadic intervals , to avoid issues with overlap.) If one ignores the caveats and blindly substitutes in the decomposition (13), the expression on the left of (12) becomes To decouple further, we will try to decompose into “rank one” operators. More precisely, we manipulate where we use the notation . It will be convenient to try to discretise this integral average. From the uncertainty principle, modifying by should only modify approximately by a phase, so the integral here is roughly constant at spatial scales . So we heuristically have If we now define a *tile* to be a rectangle in phase space of the form where are dyadic intervals and with unit area , we see that every in the above sum is associated to a tile . The interval is then similarly associated to a nearby tile , and we write to indicate the relationship between the two tiles (they share the same spatial interval , but lies just above ). We can then approximately write the left-hand side of (12) as where is an -normalised “wave packet” that is roughly localised to in phase space. This approximate form of (12) has achieved the goal of decoupling the function from the data , as they both now interact with the tile pair rather than through each other. Note also that the set of tiles obeys an approximate version of the three symmetries that (12) does. Firstly, the set of tiles is invariant under dilations if is a power of two; secondly, once one fixes the scales of the tiles, the remaining set of tiles is invariant under spatial translations by integer multiples of the spatial scale , and under frequency translations by integer multiples of . (We will need the discrete and nested nature of the tiles for some subsequent combinatorial arguments, and it turns out to be worthwhile to accept a slightly degraded form of the three basic symmetries of the problem in return for such a discretisation.)

We now make the above heuristic decomposition rigorous. For any dyadic interval , let denote the left child interval, and the right child interval. We fix a bump function supported on normalised to have norm ; henceforth we permit all implied constants in the asymptotic notation to depend on . For each interval let denote the rescaled function

noting that this is a bump function supported on . We will establish the estimate where ranges over all dyadic intervals. We assume (15) for now and see why it implies (12). The left-hand side of (15) is not quite dilation or frequency modulation invariant, but we can fix this by an averaging argument as follows. Applying the modulation invariance, we see for any that since we thus have We temporarily truncate to a finite range of scales, and use the triangle inequality, to obtain for any finite . For fixed , the expression is periodic in with period , with average equal to which we can rewrite as which one can rewrite further (using the change of variables ) as where

Hence if we average over all in (say) , we conclude that

and hence on sending to infinity Using dilation symmetry, we also see that for any . Averaging this for with Haar measure , we conclude that But as is a bump function supported in , one has The quantity is a non-zero constant, hence which is (12).

It remains to prove (15). As in the heuristic discussion, we approximately decompose the convolution into a sum over tiles. We have

Motivated by this, we define as before a *tile* to be a rectangle with dyadic intervals with ; we also split each such tile into an upper half and a lower half . We refer to as the *spatial scale* of the tile, and the reciprocal as the *frequency scale*. For each tile define the wave packet which is a Schwartz function with Fourier support in (in fact it is supported in ) that is normalised to have norm and is localised spatially near , so morally it has “phase space support in “. We will later establish the estimate for all and sets of finite measure (cf. (14)), where ranges over the set of all tiles. For now, we show why this estimate implies (15) and hence (12). Just as (12) was obtained from (15) by averaging over dilation and frequency modulations, we shall recover (15) from (17) by averaging over spatial translations. As before, we first temporarily restrict the size range of and use the triangle inequality to obtain Applying translation symmetry, we conclude that for any . The left-hand side may be rewritten as where we extend the definition of to translated tiles in the obvious fashion. The expression inside the absolute values is periodic in with period , and averages to which by (16) simplifies to and so on averaging in and then sending to infinity we recover (15).

It remains to establish (17). It is convenient to introduce the sets

so that the target estimate (17) simplifies slightly to As advertised, we have now decoupled the influences of and the influences of (which determine the sets ), as these quantities now only directly interact with the wave packets , rather than with each other. Moreover, in some sense only interacts with the lower half of the tile (as this is where is concentrated), while and only interact with the upper half of the tile.

One advantage of this “model” formulation of the problem is that one can naturally build up to the full problem by trying to establish estimates of the form

where is some smaller set of tiles. For instance, if we can prove (19) for all finite collections of tiles, then by monotone convergence we recover the required estimate.

The key problem here is that tiles have three degrees of freedom: scale, spatial location, and frequency location, corresponding to the three symmetries of dilation, spatial translation, and frequency modulation of the original estimate (12). But one can warm up by looking at families of tiles that only exhibit two or fewer degrees of freedom, in a way that slowly builds up the various techniques we will need to apply to establish the general case:

**The case of a single tile** We begin with the simplest case of a single tile (so that there are zero degrees of freedom):

**The case of separated tiles of fixed scale** Now we let be a collection of tiles all of a fixed spatial scale (so that we have the two parameters of spatial and frequency location, but not the scale parameter). Among other things, this makes the tiles in essentially disjoint (i.e., disjoint ignoring sets of measure zero). This disjointness manifests itself in two useful ways. Firstly, we claim that we can improve the trivial bound

Now let us see why (24) is true. To motivate the argument, suppose that had no tail outside of , so that one could replace by in (22). Then one would have

and as the tiles are all essentially disjoint the claim (24) would then follow from summing in , since each contributes to at most one of the sets . Now we have to deal with the contribution of the tails. We can bound For each , there is at most one dyadic interval of the fixed length such that . Thus in the above sum is fixed, and only can vary; from (22) we then see that , giving (24).

Now we prove (25). The intuition here is that the essential disjointness of the tiles makes the wave packets approximately orthogonal, so that (25) should be a variant of Bessel’s inequality. We exploit this approximate orthogonality by a $TT^*$ method, which we perform here explicitly. By duality we have

for some coefficients with , so by Cauchy-Schwarz it suffices to show that The left-hand side expands as From the Fourier support of we see that the inner product vanishes unless the intervals overlap which by the equal sizes of force . In this case we can use (22) to bound the inner product by and then a routine application of Schur’s test gives (26). This establishes (25), giving (19) in the case of tiles of equal dimensions.
**The case of a regular -tree**

Now we attack some cases where the tiles can vary in scale. In phase space, a key geometric difficulty now arises from the fact that tiles may start partially overlapping each other, in contrast to the previous case in which the essential disjointness of the tile set was crucial in establishing the key estimates (24), (25). However, because we took care to restrict the intervals of the tiles to be dyadic, there are only a limited number of ways in which two tiles can overlap. Given two rectangles and , we define the relation if and ; this is clearly a partial order on rectangles. The key observation is as follows: if two tiles overlap, then either or . Similarly if are replaced by their upper tiles or by their lower tiles . Note that if are tiles with , then one of or holds (and the only way both inequalities can hold simultaneously is if ).
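The dichotomy for overlapping tiles can be checked by brute force on a small family. The sketch below assumes the usual conventions: a dyadic interval is $[k 2^j, (k+1) 2^j)$ for integers $j, k$, a tile is a product $I \times \omega$ of dyadic intervals with $|I| |\omega| = 1$, and $P \leq P'$ means that the spatial interval of $P$ is contained in that of $P'$ while the frequency interval of $P'$ is contained in that of $P$. The encoding and function names are illustrative choices.

```python
from fractions import Fraction

def dyadic(j, k):
    """Half-open dyadic interval [k 2^j, (k+1) 2^j) as a pair of exact Fractions."""
    length = Fraction(2) ** j
    return (k * length, (k + 1) * length)

def intersects(a, b):
    """True if two half-open intervals share a point."""
    return max(a[0], b[0]) < min(a[1], b[1])

def contains(outer, inner):
    """True if `inner` is a subset of `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def tile(j, k, m):
    """Tile I x omega with |I| = 2^j and |omega| = 2^(-j), so the area is one."""
    return (dyadic(j, k), dyadic(-j, m))

def overlaps(P, Q):
    return intersects(P[0], Q[0]) and intersects(P[1], Q[1])

def leq(P, Q):
    """P <= Q: spatial interval grows, frequency interval shrinks."""
    return contains(Q[0], P[0]) and contains(P[1], Q[1])

def all_tiles(s):
    """All tiles of scales 2^(-s), ..., 2^s inside the box [0, 2^s) x [0, 2^s)."""
    return [tile(j, k, m)
            for j in range(-s, s + 1)
            for k in range(2 ** (s - j))
            for m in range(2 ** (s + j))]

if __name__ == "__main__":
    tiles = all_tiles(2)
    bad = [(P, Q) for P in tiles for Q in tiles
           if overlaps(P, Q) and not (leq(P, Q) or leq(Q, P))]
    print(len(tiles), "tiles;", len(bad), "overlapping pairs that are incomparable")
```

The check succeeds because two dyadic intervals are either nested or disjoint, and the unit area constraint forces the nesting in frequency to go the opposite way to the nesting in space.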

As was first observed by Fefferman, a key configuration of tiles that needs to be understood for these sorts of problems is that of a *tree*.

Definition 10 Let be a tile. A *tree* with top is a collection of tiles with the property that for all . (For minor technical reasons it is convenient to not require the top to actually lie in the tree , though this is often the case.) We write for the spatial support of the tree, and for the frequency support of the tree top. If we in fact have for all , we say that is a -tree; similarly if for all , we say that is a -tree. (Thus every tree can be partitioned into a -tree and a -tree with the same top as the original tree.)

The tiles in a tree can vary in scale and in spatial location, but once these two parameters are given, the frequency location is fixed, so a tree can again be viewed as a “two-parameter” subfamily of the three-parameter family of tiles.

We now prove (19) in the case when is a -tree , thus for all . Here, the factors will all “collide” with each other and there will be no orthogonality to exploit here; on the other hand, there will be a lot of “disjointness” in the that can be exploited instead.

To illustrate the key ideas (and to help motivate the arguments for the general case) we will also make the following “regularity” hypotheses: there exists two quantities (which we will refer to as the *energy* and *mass* of the tree respectively) for which we have the upper bounds

We also assume that we have the reverse bounds for the tree top:

and It will be through a combination of both these lower and upper bounds that we can obtain a bound (19) that does not involve either or .
We will use (27), (28), (29) to establish the *tree estimate*

Note from (30) and Cauchy-Schwarz that

and from (31) and Cauchy-Schwarz one similarly has and so (32) recovers the desired estimate (19).

It remains to establish the tree estimate (32). It will be convenient to use the tree to partition the real line into dyadic intervals that are naturally “adapted to” the geometry of the tree (or more precisely to the spatial intervals of the tree) in a certain way (in a manner reminiscent of a Whitney decomposition).

Exercise 11 (Whitney-type decomposition associated to a tree) Let be a non-empty tree. Show that there exists a family of dyadic intervals with the following properties:

- (i) The intervals in form a partition of (up to sets of measure zero).
- (ii) For each and any with , we have .
- (iii) For each , there exists with and .

(Hint: one can choose to be the collection of all dyadic intervals whose dilate does not contain any , and which is maximal with respect to set inclusion.)

We can of course assume that the tree is non-empty, since (32) is trivial for empty sets of tiles. We apply the partition from Exercise 11. By the triangle inequality, we can bound the left hand side of (32) by

which by (27), (22) may be bounded by We first dispose of the narrow tiles in which . By Exercise 11(ii) this forces . From (28) we have (say). For each fixed spatial scale , the intervals in the tree are all essentially disjoint, so a routine calculation then shows (say), so that which from Exercise 11(ii) implies that the contribution of the case to (32) is acceptable.

Now we consider the wide tiles in which . From Exercise 11(ii) this case is only possible if and . Thus the are now restricted to an interval of length , and it will suffice to establish the local estimate

for each . Note that for each fixed spatial scale , there is at most one choice of frequency interval with and , thus for fixed the set is independent of . We may then sum in for each such scale to conclude Now we make the crucial observation that in a -tree , the intervals are all essentially disjoint, hence the are disjoint as well. As these sets are also contained in , we conclude that From Exercise 11(iii) and (29) (choosing a tile with spatial scale and within of , and with for the tile provided by Exercise 11(iii)) we have giving the claim.
**The case of a regular -tree**

We now complement the previous case by establishing (19) for (certain types of) -trees . The situation is now reversed: there is a lot of “collision” in the , but on the other hand there is now some “orthogonality” in the that can be exploited.

As before we will assume some regularity on the -tree , namely that there exist for which one has the upper bounds

for all (note this is slightly stronger than (27)), as well as the bound (29) for any tile with for some . We complement this with the matching lower bounds and (31).

As before we will focus on establishing the tree estimate (32). From (31) and Cauchy-Schwarz as before we have

As we now have a -tree, the tiles become disjoint (up to null sets), and we can obtain an almost orthogonality estimate:

Exercise 12 (Almost orthogonality) For any -tree , show that for all complex numbers , and use this to deduce the Bessel-type inequality

From this exercise and (34) we see that

and so the desired bound (19) will follow from the tree estimate (32).

In this case it will be convenient to linearise the sum to remove the absolute value signs; more precisely, to show (32) it suffices to show that

for any complex numbers of magnitude . Again we may assume that the tree is non-empty, and use the partition from Exercise 11, to split the left-hand side as

The contribution of the narrow tiles can be disposed of as before without any additional difficulty, so we focus on estimating the contribution of the wide tiles. As before, in order for this sum to be non-empty has to be contained in an neighbourhood of .

The main difficulty here is the dependence of on . We rewrite

so that the above expression can be written as

Now for a key geometric observation: the intervals are nested (and decrease when increases), so the condition is equivalent to a condition of the form for some scale depending on . Thus the above sum can be written as

One can bound the integrand here by a “maximal Calderón-Zygmund operator” which is basically a sup over truncations of the “(modulated) pseudodifferential operator”

The point of this formulation is that the integrand can now be expressed as a sort of “Littlewood-Paley projection” of the function to the region of frequency space corresponding to those intervals with :

Exercise 13 Establish the pointwise estimate for all where ranges over all intervals (not necessarily dyadic) containing .

From (29) and Exercise 11(iii) as before we have

and so we can bound the expression (35) by which one can bound in terms of the Hardy-Littlewood maximal function of , followed by Cauchy-Schwarz and the Hardy-Littlewood inequality, and finally Exercise 12, as

On the other hand, from (33) we have for every . By grouping the tiles in according to their maximal elements (which necessarily have essentially disjoint spatial intervals) and applying the above inequality to each such group and summing, we conclude that and the tree estimate (32) follows.
**The general case**

We are now ready to handle the general case of an arbitrary finite collection of tiles. Motivated by the previous discussion, we define two quantities:

Definition 14 (Energy and mass) For any non-empty finite collection of tiles, we define the *energy* to be the quantity where ranges over all -trees in , and the *mass* to be the quantity where is the set (thus for instance ). By convention, we declare the empty set of tiles to have energy and mass equal to zero.

Note here that the definition of mass has been modified slightly from previous arguments, in that we now use instead of . However, this turns out to be an acceptable modification, in the sense that we still continue to have the analogue of (32):

Exercise 15 (Tree estimate) If is a tree, show that

Since has an norm of , we also have the trivial bound

for any finite collection of tiles .

The strategy is now to try to partition an arbitrary family of tiles into collections of disjoint trees (or “forests”, if you will) whose energy , mass , and spatial scale are all under control, apply Exercise 15 to each tree, and sum. To do this we rely on two key selection results, which are vaguely reminiscent of the Calderón-Zygmund decomposition:

Proposition 16 (Energy selection) Let be a finite collection of tiles with for some . Then one can partition into a collection of disjoint trees with together with a remainder set with

Proposition 17 (Mass selection) Let be a finite collection of tiles with for some . Then one can partition into a collection of disjoint trees with together with a remainder set with

(In these propositions, “disjoint” means that any given tile belongs to at most one of the trees in ; but the tiles in one tree are allowed to overlap the tiles in another tree.)

Let us assume these two propositions for now and see how these (together with Exercise 15) establish the required estimate (19) for an arbitrary collection of tiles. We may assume without loss of generality that and are non-zero. Rearranging the above two propositions slightly, we see that if is a finite collection of tiles such that

for some integer then after applying Proposition 16 followed by Proposition 17, we can partition into a disjoint collection of trees with together with a remainder with

Note that any finite collection of tiles will obey (38) for some sufficiently large and negative . Starting with this and then iterating indefinitely, and discarding any empty families, we can therefore partition any finite collection of tiles as where are collections of trees (empty for all but finitely many ) such that and (39) holds, and is a residual collection of tiles with

We can then bound the left-hand side of (19) by

From Exercise 15 applied to individual tiles and (41) we see that the second term in this expression vanishes. For the first term, we use Exercise 15, (40), (36) to bound this sum by which by (39) is bounded by which sums to as required.

It remains to establish the energy and mass selection lemmas. We begin with the mass selection claim, Proposition 17. Let denote the set of all tiles with for some and such that

Let denote the set of tiles in that are maximal with respect to the tile partial order. (Note that the left-hand side of (42) is bounded by , so there is an upper bound to the spatial scales of the tiles involved here.) Then every tile in is either less than or equal to a tile in , or is such that for all . Thus if we let be the collection of tiles of the second form, and let be the collection of trees with tree top associated to each (selected greedily, and in arbitrary order, subject of course to the requirement that no tile belongs to more than one tree), we obtain the required partition with and it remains to establish the bound

This will be a (rather heavily disguised) variant of the Hardy-Littlewood maximal inequality. By construction, the tree tops are essentially disjoint, and one has for all such tree tops. To motivate the argument, suppose for the sake of discussion that we had the stronger estimate

By the essential disjointness of the , the sets are also essentially disjoint subsets of , hence and the claim (43) would then follow. Now we do not quite have (44); but from the pigeonhole principle we see that for each there is a natural number such that (say), where denotes the interval with the same center as but times the length (this is not quite a dyadic interval). We now restrict attention to those associated to a fixed choice of . Let denote the corresponding dilated tiles, then we have for each with .

Unfortunately, the are no longer disjoint. However, by the greedy algorithm (repeatedly choosing maximal tiles (in the tile ordering)), we can find a collection such that

- (i) All the dilated tree tops are essentially disjoint.
- (ii) For every with , there is such that intersects and .
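The greedy selection behind properties (i) and (ii) is the same mechanism as in the Vitali covering lemma. As a rough illustration only, here is a minimal sketch under simplifying assumptions that are not taken from the notes: each dilated tile is modelled as an axis-parallel rectangle `(x0, x1, y0, y1)` (spatial interval times frequency interval), "maximal in the tile ordering" is replaced by "largest spatial length first", and the names `overlaps` and `greedy_disjoint` are hypothetical.

```python
def overlaps(r, s):
    """True when the open interiors of rectangles (x0, x1, y0, y1) intersect."""
    return r[0] < s[1] and s[0] < r[1] and r[2] < s[3] and s[2] < r[3]


def greedy_disjoint(rects):
    """Scan rectangles by decreasing spatial length; keep one if it misses all kept so far."""
    chosen = []
    for r in sorted(rects, key=lambda r: r[1] - r[0], reverse=True):
        if not any(overlaps(r, s) for s in chosen):
            chosen.append(r)
    return chosen


# Hypothetical data: five rectangles, some of which overlap.
rects = [(0, 4, 0, 1), (2, 4, 0, 2), (3, 4, 1, 3), (6, 7, 0, 1), (0, 1, 5, 6)]
chosen = greedy_disjoint(rects)
rejected = [r for r in rects if r not in chosen]
print("chosen:", chosen)
print("rejected:", rejected)
```

In this toy model the kept rectangles are essentially disjoint (the analogue of (i)), and a rectangle is rejected only because it meets a previously kept rectangle, which was scanned earlier and hence has spatial length at least as large (the analogue of (ii)).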

From property (i) and (45) we have

On the other hand, from property (ii) we see that the sum of all the for all with associated to a single is . Putting the two statements together we see that and on summing in we obtain the required claim (43).

Finally, we prove the energy selection claim, Proposition 16. The basic idea is to extract all the high-energy trees from in such a way that the -tree components of those trees are sufficiently “disjoint” from each other that a useful Bessel inequality, generalising Exercise 12, may be deployed. Implementing this strategy correctly turns out however to be slightly delicate. We perform the following iterative algorithm to generate a partition

as well as a companion collection of -trees as follows.

- Step 1. Initialise and .
- Step 2. If then STOP. Otherwise, go on to Step 3.
- Step 3. Since we now have , contains a -tree for which
Among all such , choose one for which the midpoint of the frequency is *minimal*. (The reason for this rather strange choice will be made clearer shortly.)
- Step 4. Add to , add the larger tree (with the same top as ) to , then remove from . We also remove the adjacent trees and from and also place them into . Now return to Step 2.

This procedure terminates in finite time to give a partition (46) with , and with the trees coming in triplets all associated to a -tree in with the same spatial scale as , with all the -trees disjoint and obeying the estimates

(both the upper and lower bounds will be important for this argument). It will then suffice to show that by (48), it then suffices to show the Bessel type inequality

Now we make a crucial observation: not only are the trees in disjoint (in the sense that no tile belongs to two of these trees), but the lower tiles are also essentially disjoint. Indeed we claim an even stronger disjointness property: if , are such that , then is not only disjoint from the larger dyadic interval , but is in fact disjoint from the even larger interval . To see this, suppose for contradiction that and . There are three possibilities to rule out:

- is equal to . This can be ruled out because any two lower frequency intervals associated to a -tree are either equal or disjoint.
- was selected after was. To rule this out, observe that contains the parent of , and hence , , or . Thus, when was selected, should have been placed with one of the three trees associated to and would therefore not have been available for inclusion into , a contradiction.
- was selected before was. If this case held, then the midpoint of would have to be greater than or equal to that of , otherwise would not have a minimal midpoint at the time of its selection. But is contained in , which is contained in , which lies below , which contains , which contains the midpoint of ; thus the midpoint of lies strictly below that of , a contradiction.

If the were perfectly orthogonal to each other, this disjointness would be more than enough to establish (49). Unfortunately we only have imperfect orthogonality, and we have to work a little harder. As usual, we turn to a $TT^*$ type argument. We can write the left-hand side of (49) as

so by Cauchy-Schwarz it suffices to show that

By the triangle inequality, the left-hand side may be bounded by

As has Fourier support in , we see that vanishes unless and overlap. By symmetry it suffices to consider the cases and .

First let us consider the contribution of . Using Young’s inequality and symmetry, we may bound this contribution by

A direct calculation using (22) reveals that so the contribution of this case is at most as desired.

Now we deal with the case when , which by the preceding discussion implies that and lies outside of . Here we use (37) to bound

and

and then we can bound this contribution by

Direct calculation using (22) reveals that (say), and also so we obtain a bound of which is acceptable by (48). This finally finishes the proof of Proposition 16, which in turn completes the proof of Carleson’s theorem.

Remark 18 The Lacey-Thiele proof of Carleson’s theorem given above relies on a decomposition of a tileset in a way that controls both energy and mass. The original proof of Carleson dispenses with mass (or with the function ), and focuses on controlling maximal operators that (in our notation) are basically of the form

To control such functions, one iterates a decomposition similar to Proposition 16 to partition into trees with good energy control, and establishes pointwise control of the contribution of each tree outside of an exceptional set. See Section 4 of this article of Demeter for an exposition in the simplified setting of Walsh-Fourier analysis. The proof of Fefferman takes the opposite tack, dispensing with energy and focusing on bounding the operator norm of the linearised operator

Roughly speaking, the strategy is to iterate a version of Proposition 16 to partition into “forests” of disjoint trees, though in Fefferman’s argument some additional work is invested into obtaining even better disjointness properties on these forests than is given here. See Section 5 of this article of Demeter for an exposition in the simplified setting of Walsh-Fourier analysis.

A modification of the above arguments used to establish the weak estimate can also establish restricted weak-type estimates for any :

Exercise 19 For any sets of finite measure, and any measurable function , show that for any . (Hint: repeat the previous analysis with , but supplement it with an additional energy bound coming from a suitably localised version of Exercise 12.)

The bound (51) is also true for , yielding Hunt’s theorem, but this requires some additional arguments of Calderón-Zygmund type, involving the removal of an exceptional set defined using the Hardy-Littlewood maximal function:

Exercise 20 (Hunt’s theorem) Let be of finite non-zero measure, and let be a measurable function. Let be the exceptional set for a large absolute constant ; note from the Hardy-Littlewood inequality that if is large enough.

- (i) If is a finite collection of tiles with for all , show that (Hint: By using (22) and the disjointness of the when is fixed, first establish the estimate whenever is a natural number and is an interval with and .)
- (ii) If is a finite collection of tiles with for all , show that . (For a given tree , one can introduce the dyadic intervals as in Exercise 11, then perform a Calderón-Zygmund type decomposition to , splitting it into a “good” function bounded pointwise by , plus “bad functions” that are supported on the intervals and have mean zero. See this paper of Grafakos, Terwilleger, and myself for details.)
- (iii) For any finite collection of tiles for all
- (iv) Show that (51) holds for all , and conclude Theorem 2(iii).

Remark 21 The methods of time-frequency analysis given here can handle several other operators that, like the Carleson operator, exhibit scaling, translation, and frequency modulation symmetries. One model example is the bilinear Hilbert transform for . The methods in this set of notes were used by Lacey and Thiele to establish the estimates for with (these estimates have since been strengthened and extended in a number of ways). We only give the briefest of sketches here. Much as how Carleson’s theorem can be reduced to a bound (19), the above estimates can be reduced to the estimation of a model sum where is a certain collection of triples of tiles with common spatial interval and frequency intervals varying along a certain one-parameter family for each fixed choice of spatial interval. One then uses a variant of Proposition 16 to partition into “-trees”, “-trees”, and “-trees”, the contribution of each of which can be controlled by the energies of on such trees, times the length of the spatial support of the tree, in analogy with Exercise 15. See for instance the text of Muscalu and Schlag for more discussion and further results.

Remark 22 The concepts of mass and energy can be abstracted into a framework of spaces associated to outer measures (as opposed to the classical setup of spaces associated to countably additive measures), in which the mass and energy selection propositions can be viewed as consequences of an abstract Carleson embedding theorem, and the calculations establishing estimates such as (19) from such propositions and a tree estimate can be viewed as consequences of an “outer Hölder inequality”. See this paper of Do and Thiele for details.

## 104 comments

3 June, 2020 at 4:50 pm

Alan Chang

It seems like there is currently a glitch with this blog post. Instead of seeing the class notes when I load this page, I see a copy of https://terrytao.wordpress.com/2009/03/07/infinite-fields-finite-fields-and-the-ax-grothendieck-theorem/

[Fixed, thanks – T.]

4 June, 2020 at 1:51 pm

Anonymous

LHS(49) = ?

My attempt:

[There was a misprint in the notes regarding a misplaced conjugation sign, which has now been fixed – T.]

5 June, 2020 at 10:45 am

Anonymous

For the tree estimate, you mentioned that the LHS is like controlling the maximal Calderon Zygmund operator and maximal Hilbert transform. Could you elaborate on this? It wasn’t clear from the lecture.

[Added more comments to this effect – T.]

23 June, 2020 at 1:43 am

Xumin

Dear Prof. Tao, thank you for your nice notes on a.e. convergence of Fourier series. Recently, while my collaborators and I were working on the almost uniform convergence (the noncommutative analogue of a.e. convergence) of noncommutative Fourier series, we were surprised to find some new results on the classical a.e. convergence of Fourier series.

We know that the symbols associated to Fejér Fourier multipliers on are given by where . If we generalize this formula by setting where and is an arbitrary convex body in , then Fourier multipliers of this kind may give a kind of generalized Fejér Fourier series for functions in . We find that the Fourier series converges a.e. to as for any with . Moreover, we find that for any sequence of positive (sending positive functions to positive functions) Fourier multipliers on with pointwise, we can always find a subsequence such that converges a.e. to for any with .

I would like to ask whether the above result is new, or whether some mathematicians have already found similar results before. Does such a result on a.e. convergence of Fourier series bring anything new?

Thank you very much!

23 June, 2020 at 12:31 pm

Terence Tao

When is smooth with nonvanishing curvature everywhere then the convolution kernel of these multipliers decays like and one can use standard convergence theorems for approximate identities (e.g. Stein-Weiss page 13). For more general convex bodies the decay is more anisotropic but there are still integrated bounds on these kernels (e.g., https://mathscinet.ams.org/mathscinet-getitem?mr=328471 ) which may give results of this form. The multipliers are also very similar to Bochner-Riesz means adapted to convex bodies, for which there is a certain amount of literature, e.g., https://mathscinet.ams.org/mathscinet-getitem?mr=1827499 .

24 July, 2020 at 11:43 pm

tophythetoaster

A one line solution to the blue-eyed islanders puzzle:

Noam Chomsky and Terence Tao walk into a bar.

-A. JK
