This set of notes discusses aspects of one of the oldest questions in Fourier analysis, namely the nature of convergence of Fourier series.

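If $f: \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ is an absolutely integrable function, its Fourier coefficients $\hat f: \mathbb{Z} \to \mathbb{C}$ are defined by the formula

$$\hat f(n) := \int_{\mathbb{R}/\mathbb{Z}} f(x) e^{-2\pi i n x}\ dx.$$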

If $f$ is smooth, then the Fourier coefficients are absolutely summable, and we have the Fourier inversion formula

$$f(x) = \sum_{n \in \mathbb{Z}} \hat f(n) e^{2\pi i n x}$$

where the series here is uniformly convergent. In particular, if we define the partial summation operators

$$S_N f(x) := \sum_{n=-N}^{N} \hat f(n) e^{2\pi i n x}$$

then $S_N f$ converges uniformly to $f$ when $f$ is smooth.

What if $f$ is not smooth, but merely lies in an $L^p(\mathbb{R}/\mathbb{Z})$ class for some $1 \le p \le \infty$? The Fourier coefficients $\hat f$ remain well-defined, as do the partial summation operators $S_N$. The question of convergence in norm is relatively easy to settle:

Exercise 1

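- (i) If $1 < p < \infty$ and $f \in L^p(\mathbb{R}/\mathbb{Z})$, show that $S_N f$ converges in $L^p(\mathbb{R}/\mathbb{Z})$ norm to $f$. (Hint: first use the boundedness of the Hilbert transform to show that $S_N$ is bounded in $L^p(\mathbb{R}/\mathbb{Z})$ uniformly in $N$.)
- (ii) If $p=1$ or $p=\infty$, show that there exists $f \in L^p(\mathbb{R}/\mathbb{Z})$ such that the sequence $S_N f$ is unbounded in $L^p(\mathbb{R}/\mathbb{Z})$ (so in particular it certainly does not converge in $L^p(\mathbb{R}/\mathbb{Z})$ norm to $f$). (Hint: first show that $S_N$ is not bounded in $L^p(\mathbb{R}/\mathbb{Z})$ uniformly in $N$, then apply the uniform boundedness principle in the contrapositive.)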
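The failure of uniform boundedness behind part (ii) can be seen numerically. The following is a minimal sketch, assuming the standard fact that the symmetric partial summation operator is convolution with the Dirichlet kernel $D_N(x) = \sum_{n=-N}^{N} e^{2\pi i n x}$; it approximates the $L^1(\mathbb{R}/\mathbb{Z})$ norm of $D_N$ by a midpoint rule. The function names and the sample count are illustrative choices, not notation from these notes.

```python
import math

def dirichlet_kernel(N, x):
    """D_N(x) = sum_{n=-N}^{N} e^{2 pi i n x} = sin((2N+1) pi x) / sin(pi x)."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        # Removable singularity at integer x, where every term equals 1.
        return float(2 * N + 1)
    return math.sin((2 * N + 1) * math.pi * x) / s

def lebesgue_constant(N, samples=20000):
    """Midpoint-rule approximation to the L^1 norm of D_N over one period."""
    h = 1.0 / samples
    return h * sum(abs(dirichlet_kernel(N, (j + 0.5) * h)) for j in range(samples))

if __name__ == "__main__":
    for N in (1, 4, 16, 64, 256):
        print(N, round(lebesgue_constant(N), 3))
```

The printed norms grow slowly but without bound (logarithmically in $N$), which is the quantitative input that the uniform boundedness principle converts into a divergent example.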

The question of pointwise almost everywhere convergence turned out to be a significantly harder problem:

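Theorem 2 (Pointwise almost everywhere convergence)

- (i) (Kolmogorov) There exists $f \in L^1(\mathbb{R}/\mathbb{Z})$ such that $S_N f(x)$ is unbounded in $N$ for almost every $x$.
- (ii) (Carleson) For every $f \in L^2(\mathbb{R}/\mathbb{Z})$, $S_N f(x)$ converges to $f(x)$ as $N \to \infty$ for almost every $x$.
- (iii) (Hunt) For every $1 < p \le \infty$ and $f \in L^p(\mathbb{R}/\mathbb{Z})$, $S_N f(x)$ converges to $f(x)$ as $N \to \infty$ for almost every $x$.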

Note from Hölder’s inequality that $L^2(\mathbb{R}/\mathbb{Z})$ contains $L^p(\mathbb{R}/\mathbb{Z})$ for all $p \ge 2$, so Carleson’s theorem covers the $p \ge 2$ case of Hunt’s theorem. We remark that the precise threshold near $L^1$ between Kolmogorov-type divergence results and Carleson-Hunt pointwise convergence results, in the category of Orlicz spaces, is still an active area of research; see this paper of Lie for further discussion.

Carleson’s theorem in particular was a surprisingly difficult result, lying just out of reach of classical methods (as we shall see later, the result is much easier if we smooth either the function or the summation method by a tiny bit). Nowadays we realise that the reason for this is that Carleson’s theorem essentially contains a *frequency modulation symmetry* in addition to the more familiar translation symmetry and dilation symmetry. This basically rules out the possibility of attacking Carleson’s theorem with tools such as Calderón-Zygmund theory or Littlewood-Paley theory, which respect the latter two symmetries but not the former. Instead, tools from “time-frequency analysis” that essentially respect all three symmetries should be employed. We will illustrate this by giving a relatively short proof of Carleson’s theorem due to Lacey and Thiele. (There are other proofs of Carleson’s theorem, including Carleson’s original proof, its modification by Hunt, and a later time-frequency proof by Fefferman; see Remark 18 below.)

** — 1. Equivalent forms of almost everywhere convergence of Fourier series — **

A standard technique to prove almost everywhere convergence results is by first establishing a weak-type estimate of an associated maximal function. For instance, the Lebesgue differentiation theorem is usually established with the assistance of the Hardy-Littlewood maximal inequality; see for instance this previous blog post. A remarkable observation of Stein, known as Stein’s maximal principle, allows one to reverse this implication in certain cases by exploiting a symmetry of the problem. Here is the principle specialised to the application of pointwise convergence of Fourier series, and also combined with a transference principle of Kenig and Tomas:

Proposition 3 (Equivalent forms of almost everywhere convergence) Let $1 \le p \le 2$. Then the following statements are equivalent:

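- (i) For every $f \in L^p(\mathbb{R}/\mathbb{Z})$, one has $\lim_{N \to \infty} S_N f(x) = f(x)$ for almost every $x \in \mathbb{R}/\mathbb{Z}$.
- (ii) There does not exist $f \in L^p(\mathbb{R}/\mathbb{Z})$ such that $\sup_N |S_N f(x)| = +\infty$ for almost every $x \in \mathbb{R}/\mathbb{Z}$.
- (iii) One has the maximal inequality $$\left\| \sup_{N \ge 0} |S_N f| \right\|_{L^{p,\infty}(\mathbb{R}/\mathbb{Z})} \lesssim \|f\|_{L^p(\mathbb{R}/\mathbb{Z})} \qquad (1)$$ for all smooth $f$, where the weak $L^p$ norm is defined as $$\|f\|_{L^{p,\infty}} := \sup_{\lambda > 0} \lambda\, |\{ x : |f(x)| \ge \lambda \}|^{1/p}$$ and $|E|$ denotes the Lebesgue measure of a set $E$ (which in this setting is a subset of the unit circle).
- (iv) One has the maximal inequality $$\left\| \sup_{N > 0} |S_{[0,N]} f| \right\|_{L^{p,\infty}(\mathbb{R}/\mathbb{Z})} \lesssim \|f\|_{L^p(\mathbb{R}/\mathbb{Z})}$$ for all smooth $f$, where $S_{[0,N]}$ denotes the partial Fourier series $$S_{[0,N]} f(x) := \sum_{n=0}^{N} \hat f(n) e^{2\pi i n x}.$$
- (v) One has the maximal inequality $$\left\| \sup_{N \in \mathbb{R}} |S_{(-\infty,N]} f| \right\|_{L^{p,\infty}(\mathbb{R})} \lesssim \|f\|_{L^p(\mathbb{R})}$$ for all Schwartz functions $f$ on $\mathbb{R}$, where $S_{(-\infty,N]}$ denotes the Fourier multiplier operator $$S_{(-\infty,N]} f(x) := \int_{-\infty}^{N} \hat f(\xi) e^{2\pi i x \xi}\ d\xi.$$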

Among other things, this proposition equates the qualitative property (i) of almost everywhere convergence to the quantitative property (iii) of a maximal inequality. This equivalence (first observed by Calderón) is similar in spirit to the uniform boundedness principle (see e.g. Corollary 1 of this previous blog post). The restriction $p \le 2$ is needed for just one implication (from (ii) to (iii)) in the arguments below, and arises due to the use of Khintchine’s inequality at one point. The equivalence of (iv) and (v) is part of a more general principle of *transference* that allows one to pass back and forth between periodic domains such as $\mathbb{R}/\mathbb{Z}$ and non-periodic domains such as $\mathbb{R}$ (or, on the Fourier side, between discrete domains such as $\mathbb{Z}$ and continuous domains such as $\mathbb{R}$) if the estimates in question enjoy suitable scaling symmetries. We will use the formulation (v), as it enjoys the most symmetries.

*Proof:* We first show that (iii) implies (i). If (1) holds for all smooth , then certainly for all finite one has

Clearly (i) implies (ii). Now we assume that (iii) fails and use this to show that (ii) fails as well. From the failure of (iii) and monotone convergence, for any one can find , a measurable subset of , a finite , and such that

and such that In particular, has positive measure. By homogeneity we may normalise . At this stage, nothing prevents the measure of from being much smaller than ; but we can exploit translation invariance to increase the measure of to be comparable to as follows. Let be the integer part of . We claim that there exist translations of whose union has measure comparable to : This is easiest to establish by the probabilistic method (which in this context we might call the *random translation* method). If we select uniformly and independently at random we see that every point will lie in a given translate (or equivalently, that lies in ) with probability , hence Integrating in and using the Fubini-Tonelli theorem, we conclude that and hence there exist deterministic choices of for which By definition of the right-hand side is comparable to , giving the claim (clearly the left-hand side cannot exceed ).
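The random translation argument can be checked by simulation. The sketch below makes several assumptions that are not taken from the text: the set is modelled as an arc of the circle $\mathbb{R}/\mathbb{Z}$ of length `arc_length`, the number of translates is the integer part of `1 / arc_length`, and the circle is sampled on a uniform grid. Independence of the shifts gives the expected covered fraction $1 - (1 - a)^M$ for an arc of length $a$ and $M$ translates, and the code compares that value with an empirical average.

```python
import random

def covered_fraction(arc_length, shifts, grid=1000):
    """Fraction of grid points of R/Z lying in some translate [s, s + arc_length) mod 1."""
    hits = 0
    for j in range(grid):
        x = j / grid
        if any((x - s) % 1.0 < arc_length for s in shifts):
            hits += 1
    return hits / grid

def average_coverage(arc_length, trials=100, seed=0):
    """Return (empirical mean coverage, predicted value 1 - (1 - a)^M)."""
    rng = random.Random(seed)
    m = int(1 / arc_length)  # number of translates (hypothetical name for the integer part)
    total = 0.0
    for _ in range(trials):
        shifts = [rng.random() for _ in range(m)]  # independent uniform shifts
        total += covered_fraction(arc_length, shifts)
    return total / trials, 1 - (1 - arc_length) ** m

if __name__ == "__main__":
    print(average_coverage(0.05))
```

For small arcs the predicted value is close to $1 - 1/e$, so a number of translates comparable to the reciprocal of the measure already covers a fixed proportion of the circle on average.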

Now consider the randomised linear combination

of translates of , where are random Bernoulli signs. From Khintchine’s inequality and the hypothesis $p \le 2$ we have hence by construction of and (4)

Now we study the behaviour of when . Since is a convolution operator, it commutes with translations, and hence

for each . On the other hand, from (3) we have and hence there exists such that In particular, the square function is at least . Meanwhile, from Khintchine’s inequality and (7) we have for all . Applying the Paley-Zygmund inequality (setting , for instance) we conclude that (for suitable choices of implied constants), so in particular Integrating in using (5), and applying the Fubini-Tonelli theorem, we conclude that hence by (6) one has In particular, there exists a deterministic choice of signs (and hence of ) for which On the other hand, the left-hand side is at most . We conclude that for every , we can find a smooth function with and a finite , as well as a set of measure , such that for all .

Applying this fact iteratively (each time choosing to be sufficiently large depending on all previous choices), we can construct a sequence of smooth functions , finite , and sets for such that

- (a) for all .
- (b) for all .
- (c) One has for all and (note that the right-hand side is finite since the are smooth for ).
- (d) for all (note that the left-hand side is bounded by ).

Now we assume (iv) and work to establish (v). The idea here is to use a rescaling argument, viewing as the limit as of the large circle (in physical space) or the fine lattice (in frequency space).

By limiting arguments we may assume that is compactly supported on some interval . Let be a large scaling parameter, and consider the periodic function defined by

For large enough, this function is smooth and supported on the interval , with norm The Fourier coefficients of are given by so that Applying (iv), we see that for any , we have Rescaling by , we conclude that We can let range over the reals rather than the integers as this does not affect the constraint . Rescaling by , we see that for any compact intervals , we have By uniform Riemann integrability and the rapid decrease of uniformly for , . We conclude that By monotone convergence we may replace with , and we then obtain (v).

Finally, we assume (v) and establish (iv). By a limiting argument it suffices to establish (iv) for trigonometric polynomials , that is to say periodic functions whose Fourier coefficients are supported in for some natural number . Let be a non-zero Schwartz function with supported in , and for a given scaling parameter let denote the Schwartz function

For sufficiently large one easily checks that The Fourier transform of can be calculated as hence (for large enough) and thus From (v) we conclude that for any we have For large enough, the left-hand side is for some depending on . Dividing by and replacing by , we obtain the claim (iv).

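Exercise 4 For $N \ge 1$, let $\sigma_N$ denote the Fejér summation operators

$$\sigma_N f := \frac{1}{N} \sum_{n=0}^{N-1} S_n f.$$

- (i) For any $f \in L^1(\mathbb{R}/\mathbb{Z})$, establish the pointwise bound $\sup_{N \ge 1} |\sigma_N f(x)| \lesssim Mf(x)$, where $M$ is the Hardy-Littlewood maximal function $Mf(x) := \sup_{r > 0} \frac{1}{2r} \int_{x-r}^{x+r} |f(y)|\ dy$.
- (ii) Show that for $f \in L^1(\mathbb{R}/\mathbb{Z})$, one has $\lim_{N \to \infty} \sigma_N f(x) = f(x)$ for almost all $x$.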
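To see why Fejér summation is so much better behaved than ordinary partial summation, it helps to look at the convolution kernel. The sketch below assumes the normalisation in which the Fejér operator is the average of the symmetric partial sums of orders $0, \dots, N-1$, so that its kernel is the average of the corresponding Dirichlet kernels; the closed form $\frac{1}{N} \left( \frac{\sin(\pi N x)}{\sin(\pi x)} \right)^2$ is the standard identity for that average. All function names are illustrative.

```python
import math

def dirichlet(n, x):
    """Dirichlet kernel of order n, summed directly (the imaginary parts cancel)."""
    return sum(math.cos(2 * math.pi * k * x) for k in range(-n, n + 1))

def fejer_by_averaging(N, x):
    """Average of the Dirichlet kernels of orders 0, ..., N-1."""
    return sum(dirichlet(n, x) for n in range(N)) / N

def fejer_closed_form(N, x):
    """(1/N) * (sin(pi N x) / sin(pi x))^2, with the value N at integer x."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        return float(N)
    return (math.sin(math.pi * N * x) / s) ** 2 / N

def l1_mass(kernel, N, samples=2000):
    """Midpoint-rule approximation to the L^1 norm of kernel(N, .) over one period."""
    h = 1.0 / samples
    return h * sum(abs(kernel(N, (j + 0.5) * h)) for j in range(samples))

if __name__ == "__main__":
    for N in (2, 8, 32):
        print(N, round(l1_mass(fejer_closed_form, N), 6), round(l1_mass(dirichlet, N), 3))
```

The Fejér kernel is non-negative with total mass one for every $N$, which is what makes a bound by the Hardy-Littlewood maximal function plausible, whereas the $L^1$ mass of the Dirichlet kernel keeps growing.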

Exercise 5 (Pointwise convergence of Fourier integrals) Let be such that the conclusion of Proposition 3(v) holds. Show that for any , one has for almost all , where is defined for Schwartz functions by the formula and then extended to by density.

Exercise 6 Let . Suppose that is such that one has the restriction estimate for all Schwartz functions , where denotes the surface measure on the sphere . Conclude that for all Schwartz functions . (This observation is due to Bourgain.) In particular, by Marcinkiewicz interpolation, implies for all . (Hint: adapt some parts of the argument used to get from (iii) to (i) in Proposition 3, using rotation invariance as a substitute for translation invariance. But the translational symmetry of the restriction problem, or more precisely the ability to translate a function in physical space without changing the absolute value of its Fourier transform, will also be useful.)

We are now ready to establish Kolmogorov’s theorem (Theorem 2(i)); our arguments are loosely based on the original construction of Kolmogorov (though he was not in possession at the time of the Stein maximal principle). In view of the equivalence between (ii) and (v) in Proposition 3, it suffices to show that the maximal operator

fails to be of weak-type $(1,1)$ on Schwartz functions. Recalling that the Hilbert transform is also a Fourier multiplier operator some routine calculations then show that for any Schwartz function . By the triangle inequality, it then suffices to show that the maximal operator fails to be of weak type $(1,1)$ on Schwartz functions.

To motivate the construction, note from a naive application of the triangle inequality that

If the function $\frac{1}{|x|}$ was absolutely integrable, then by Young’s inequality we would conclude that the maximal operator was strong type $(1,1)$, and hence also weak type $(1,1)$. Thus any counterexample must somehow exploit the logarithmic divergence of the integral of $\frac{1}{|x|}$. However, there are two potential sources of cancellation that could ameliorate this divergence: the sign of the Hilbert kernel , and the phase . But because of the supremum in , we can select the frequency parameter as we please, as long as it depends only on and not on . The idea is then to choose (and the support of ) to remove both sources of cancellation as much as possible.

We turn to the details. Let be a large natural number, and then select widely separated frequency scales

In order to assist with removing cancellation in the phases later, we will require these scales to be integers. The precise choice of scales is not too important as long as they are widely separated and integer valued, but for the sake of concreteness one could for instance set . Let be a bump function of total mass supported on , and let be the Schwartz function thus is an approximation (in a weak sense) to the sum of Dirac masses , with the frequency scale of the approximation to increasing rapidly in . We easily compute the norm of :

Now we estimate for in the interval for some natural number ; note the set of all such has measure . In this range we will test the maximal operator at the frequency cutoff :

As is supported in , we see (for large enough) that avoids the support of and we can replace the principal value integral with the ordinary integral. Substituting (9), we conclude that As is an integer, the phase is equal to . We also cancel out the phase as being independent of , thus For , we exploit the oscillatory nature of the phase through an integration by parts, leading to the bound (one could even gain a factor of here if desired, but we will not need it). Summing, we have For , we instead exploit the near-constant nature of the phase by writing and similarly to conclude that Summing and combining with (11), we conclude (from the rapidly increasing nature of the ) that and thus (for large) Comparing this with (10) we contradict the conclusion of Proposition 3(iv), giving the claim.

Remark 7In 1926, Kolmogorov refined his construction to obtain a function whose Fourier sums diverged everywhere (not just almost everywhere).

Exercise 8 (Rademacher-Menshov theorem)

- (i) Let be some square-integrable functions on a probability space , with a power of two. By performing a suitable Whitney type decomposition (similar to that used in Section 3 of Notes 1), establish the pointwise bound where for each , ranges over dyadic intervals of the form with . If furthermore the are orthogonal to each other, establish the maximal inequality
- (ii) If is a trigonometric polynomial with at most non-zero coefficients for some , use part (i) to establish the bound
- (iii) If lies in the Sobolev space for some , use (ii) to show that for almost every .
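The combinatorial input behind part (i) is that every initial segment $\{1, \dots, n\}$ splits into dyadic blocks of distinct scales, so a partial sum up to $n$ is a sum of at most logarithmically many block sums. The sketch below is one way to produce such a splitting from the binary expansion of $n$; it assumes the blocks have the form $\{j 2^k + 1, \dots, (j+1) 2^k\}$, and the function names are hypothetical.

```python
def dyadic_blocks(n):
    """Split {1, ..., n} into blocks {j*2^k + 1, ..., (j+1)*2^k}, at most one per scale k.

    Returns a list of pairs (k, j), largest scale first.
    """
    blocks = []
    start = 0  # number of indices already covered
    for k in reversed(range(n.bit_length())):
        if (n >> k) & 1:
            # start is a multiple of 2^k here, so j is an integer
            j = start >> k
            blocks.append((k, j))
            start += 1 << k
    return blocks

def block_elements(k, j):
    """The indices j*2^k + 1, ..., (j+1)*2^k belonging to the block (k, j)."""
    return range(j * (1 << k) + 1, (j + 1) * (1 << k) + 1)

if __name__ == "__main__":
    values = [((-1) ** i) * i for i in range(1, 14)]  # arbitrary sample data f_1, ..., f_13
    n = 13
    partial_sum = sum(values[:n])
    via_blocks = sum(sum(values[i - 1] for i in block_elements(k, j))
                     for k, j in dyadic_blocks(n))
    print(dyadic_blocks(n), partial_sum == via_blocks)
```

For $n = 13$ the blocks are $\{1,\dots,8\}$, $\{9,\dots,12\}$ and $\{13\}$. Since each scale contributes at most one block, the triangle inequality bounds the partial sum by one block sum per scale, which is then controlled by the square function over all blocks at that scale.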

** — 2. Carleson’s theorem — **

We now begin the proof of Carleson’s theorem (Theorem 2(ii)), loosely following the arguments of Lacey and Thiele (we briefly comment on other approaches at the end of these notes). In view of Proposition 3, it suffices to establish the weak-type bound

for Schwartz functions . Because of the supremum, the expression depends sublinearly on rather than linearly; however there is a trick to reduce matters to considering linear estimates. By selecting, for each , to be a frequency which attains (or nearly attains) the supremal value of , it suffices to establish the linearised estimate uniformly for all measurable functions , where is the operator One can think of this operator as the (Kohn-Nirenberg) quantisation of the rough symbol . Unfortunately this symbol is far too rough for us to be able to use pseudodifferential operator tools from the previous set of notes. Nevertheless, the “time-frequency analysis” mindset of trying to efficiently decompose phase space into rectangles consistent with the uncertainty principle will remain very useful.

The next step is to dualise the weak norm to linearise the dependence on even further:

Exercise 9 Let , let be a $\sigma$-finite measure space, let be a measurable function, and let . Show that the following claims are equivalent (up to changes in the implied constants in the asymptotic notation):

- (i) One has .
- (ii) For every subset of of finite measure, the function is absolutely integrable on , and

In view of this exercise, we see that it suffices to obtain the bound

for all Schwartz , all sets of finite measure, and all measurable functions . Actually only the restriction of to is relevant here, so one can view as a function just on if desired. The operator can be viewed as the quantisation of the (very rough) symbol , that is to say the indicator function of the region lying underneath the graph of :
A notable feature of the estimate (12) is that it enjoys *three* different symmetries (or near-symmetries), each of which is “non-compact” in the sense that it is parameterised by a parameter taking values in a non-compact space such as or :

- (i) (Translation symmetry) For any spatial shift , both sides of (12) remain unchanged if we replace by , the set by the translate , and the function by .
- (ii) (Dilation symmetry) For any scaling factor , both sides of (12) become multiplied by the same scaling factor if we replace by , by the dilate , and the function by .
- (iii) (Modulation symmetry) For any frequency shift , both sides of (12) remain (almost) unchanged if we replace by , do not modify the set , and replace the function by . (Technically the left-hand side changes because of an additional factor of , but this factor can be handled for instance by generalising the indicator function cutoff to a subindicator function cutoff that has the pointwise bound ; we will ignore this very minor issue here.)

Each of these symmetries corresponds to a different symmetry of phase space , namely spatial translation , dilation , and frequency translation respectively. As a general rule of thumb, if one wants to prove a delicate estimate such as (12) that is invariant with respect to one or more non-compact symmetries, then one should use tools that are similarly invariant (or approximately invariant) with respect to these symmetries. Thus for instance Littlewood-Paley theory or Calderón-Zygmund theory would not be suitable tools to use here, as they are only invariant with respect to translation and dilation symmetry but absolutely fail to have any modulation symmetry properties (these theories prescribe a privileged role to the frequency origin, or equivalently they isolate functions of mean zero as playing a particularly important role).

Besides the need to respect the symmetries of the problem, one of the main difficulties in establishing (12) is that the expression , couples together the function with the function in a rather complicated way (via the frequency variable ). We would like to try to decouple this interaction by making and instead interact with simpler objects (such as “wave packets”), rather than being coupled directly to each other. To motivate the decomposition to use, we begin with a heuristic discussion. The first main idea is to temporarily work in the (non-invertible) coordinate system of phase space rather than in order to simplify the constraint to the simple geometric region of a half-plane (this coordinate system is of course a terrible choice for most of the other parts of the argument, but is the right system to use for the frequency decompositions we will now employ). In analogy to the Whitney type decompositions used in Notes 1, one can split

for almost all choices of and (at least if have the same sign), where range over pairs of dyadic intervals that are “close” in the sense that and that and are not adjacent, but their parents are adjacent, and with to the left of . (Here it is convenient to work with half-open dyadic intervals , to avoid issues with overlap.) If one ignores the caveats and blindly substitutes in the decomposition (13), the expression on the left of (12) becomes To decouple further, we will try to decompose into “rank one” operators. More precisely, we manipulate where we use the notation . It will be convenient to try to discretise this integral average. From the uncertainty principle, modifying by should only modify approximately by a phase, so the integral here is roughly constant at spatial scales . So we heuristically have If we now define a *tile* to be a rectangle in phase space of the form where are dyadic intervals and with unit area , we see that every in the above sum is associated to a tile . The interval is then similarly associated to a nearby tile , and we write to indicate the relationship between the two tiles (they share the same spatial interval , but lies just above ). We can then approximately write the left-hand side of (12) as where is an -normalised “wave packet” that is roughly localised to in phase space. This approximate form of (12) has achieved the goal of decoupling the function from the data , as they both now interact with the tile pair rather than through each other. Note also that the set of tiles obeys an approximate version of the three symmetries that (12) does. Firstly, the set of tiles is invariant under dilations if is a power of two; secondly, once one fixes the scales of the tiles, the remaining set of tiles is invariant under spatial translations by integer multiples of the spatial scale , and under frequency translations by integer multiples of . (We will need the discrete and nested nature of the tiles for some subsequent combinatorial arguments, and it turns out to be worthwhile to accept a slightly degraded form of the three basic symmetries of the problem in return for such a discretisation.)

We now make the above heuristic decomposition rigorous. For any dyadic interval , let denote the left child interval, and the right child interval. We fix a bump function supported on normalised to have norm ; henceforth we permit all implied constants in the asymptotic notation to depend on . For each interval let denote the rescaled function

noting that this is a bump function supported on . We will establish the estimate where ranges over all dyadic intervals. We assume (15) for now and see why it implies (12). The left-hand side of (15) is not quite dilation or frequency modulation invariant, but we can fix this by an averaging argument as follows. Applying the modulation invariance, we see for any that since we thus have We temporarily truncate to a finite range of scales, and use the triangle inequality, to obtain for any finite . For fixed , the expression is periodic in with period , with average equal to which we can rewrite as which one can rewrite further (using the change of variables ) as where Hence if we average over all in (say) , we conclude that

and hence on sending to infinity Using dilation symmetry, we also see that for any . Averaging this for with Haar measure , we conclude that But as is a bump function supported in , one has The quantity is a non-zero constant, hence which is (12).

It remains to prove (15). As in the heuristic discussion, we approximately decompose the convolution into a sum over tiles. We have

Motivated by this, we define as before a *tile* to be a rectangle with dyadic intervals with ; we also split each such tile into an upper half and a lower half . We refer to as the *spatial scale* of the tile, and the reciprocal as the *frequency scale*. For each tile define the wave packet which is a Schwartz function with Fourier support in (in fact it is supported in ) that is normalised to have norm and is localised spatially near , so morally it has “phase space support in ”. We will later establish the estimate for all and sets of finite measure (cf. (14)), where ranges over the set of all tiles. For now, we show why this estimate implies (15) and hence (12). Just as (12) was obtained from (15) by averaging over dilation and frequency modulations, we shall recover (15) from (17) by averaging over spatial translations. As before, we first temporarily restrict the size range of and use the triangle inequality to obtain Applying translation symmetry, we conclude that for any . The left-hand side may be rewritten as where we extend the definition of to translated tiles in the obvious fashion. The expression inside the absolute values is periodic in with period , and averages to which by (16) simplifies to and so on averaging in and then sending to infinity we recover (15).

It remains to establish (17). It is convenient to introduce the sets

so that the target estimate (17) simplifies slightly to As advertised, we have now decoupled the influences of and the influences of (which determine the sets ), as these quantities now only directly interact with the wave packets , rather than with each other. Moreover, in some sense only interacts with the lower half of the tile (as this is where is concentrated), while and only interact with the upper half of the tile.

One advantage of this “model” formulation of the problem is that one can naturally build up to the full problem by trying to establish estimates of the form

where is some smaller set of tiles. For instance, if we can prove (19) for all finite collections of tiles, then by monotone convergence we recover the required estimate.

The key problem here is that tiles have three degrees of freedom: scale, spatial location, and frequency location, corresponding to the three symmetries of dilation, spatial translation, and frequency modulation of the original estimate (12). But one can warm up by looking at families of tiles that only exhibit two or fewer degrees of freedom, in a way that slowly builds up the various techniques we will need to apply to establish the general case:

**The case of a single tile** We begin with the simplest case of a single tile (so that there are zero degrees of freedom):

**The case of separated tiles of fixed scale** Now we let be a collection of tiles all of a fixed spatial scale (so that we have the two parameters of spatial and frequency location, but not the scale parameter). Among other things, this makes the tiles in essentially disjoint (i.e., disjoint ignoring sets of measure zero). This disjointness manifests itself in two useful ways. Firstly, we claim that we can improve the trivial bound

Now let us see why (24) is true. To motivate the argument, suppose that had no tail outside of , so that one could replace to in (22). Then would have

and as the tiles are all essentially disjoint the claim (24) would then follow from summing in , since each contributes to at most one of the sets . Now we have to deal with the contribution of the tails. We can bound For each , there is at most one dyadic interval of the fixed length such that . Thus in the above sum is fixed, and only can vary; from (22) we then see that , giving (24).

Now we prove (25). The intuition here is that the essential disjointness of the tiles makes the approximately orthogonal, so that (25) should be a variant of Bessel’s inequality. We exploit this approximate orthogonality by a $TT^*$ method, which we perform here explicitly. By duality we have

for some coefficients with , so by Cauchy-Schwarz it suffices to show that The left-hand side expands as From the Fourier support of we see that the inner product vanishes unless the intervals overlap which by the equal sizes of force . In this case we can use (22) to bound the inner product by and then a routine application of Schur’s test gives (26). This establishes (25), giving (19) in the case of tiles of equal dimensions.

**The case of a regular -tree**

Now we attack some cases where the tiles can vary in scale. In phase space, a key geometric difficulty now arises from the fact that tiles may start partially overlapping each other, in contrast to the previous case in which the essential disjointness of the tile set was crucial in establishing the key estimates (24), (25). However, because we took care to restrict the intervals of the tiles to be dyadic, there are only a limited number of ways in which two tiles can overlap. Given two rectangles and , we define the relation if and ; this is clearly a partial order on rectangles. The key observation is as follows: if two tiles overlap, then either or . Similarly if are replaced by their upper tiles or by their lower tiles . Note that if are tiles with , then one of or holds (and the only way both inequalities can hold simultaneously is if ).
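The dichotomy for overlapping tiles can be verified by brute force on a small window of phase space. The sketch below is written under the following assumptions, none of which are notation from the text: a tile is a product of half-open dyadic intervals $I \times \omega$ with $|I| = 2^k$ and $|\omega| = 2^{-k}$, two tiles overlap when both pairs of intervals intersect, and the order declares one tile below another when its spatial interval is contained in the other's and the frequency containment is reversed. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import combinations

def dyadic_interval(k, j):
    """Half-open dyadic interval [j * 2^k, (j + 1) * 2^k), stored as a pair of endpoints."""
    length = Fraction(2) ** k
    return (j * length, (j + 1) * length)

def tile(k, j, l):
    """Tile I x omega with |I| = 2^k and |omega| = 2^(-k), so the area is 1."""
    return (dyadic_interval(k, j), dyadic_interval(-k, l))

def intersects(a, b):
    return a[0] < b[1] and b[0] < a[1]

def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlap(p, q):
    return intersects(p[0], q[0]) and intersects(p[1], q[1])

def leq(p, q):
    """p <= q when the spatial interval of p lies in that of q and the frequency interval of q lies in that of p."""
    return contains(q[0], p[0]) and contains(p[1], q[1])

def tiles_in_window(m):
    """All tiles with spatial scales 2^-m, ..., 2^m contained in [0, 2^m) x [0, 2^m)."""
    out = []
    for k in range(-m, m + 1):
        for j in range(2 ** (m - k)):
            for l in range(2 ** (m + k)):
                out.append(tile(k, j, l))
    return out

if __name__ == "__main__":
    tiles = tiles_in_window(2)
    pairs = [(p, q) for p, q in combinations(tiles, 2) if overlap(p, q)]
    print(len(tiles), len(pairs), all(leq(p, q) or leq(q, p) for p, q in pairs))
```

The check succeeds because two half-open dyadic intervals that intersect are always nested, and the unit-area constraint forces the nesting in frequency to run opposite to the nesting in space.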

As was first observed by Fefferman, a key configuration of tiles that needs to be understood for these sorts of problems is that of a *tree*.

Definition 10 Let be a tile. A *tree* with top is a collection of tiles with the property that for all . (For minor technical reasons it is convenient to not require the top to actually lie in the tree , though this is often the case.) We write for the spatial support of the tree, and for the frequency support of the tree top. If we in fact have for all , we say that is a -tree; similarly if for all , we say that is a -tree. (Thus every tree can be partitioned into a -tree and a -tree with the same top as the original tree.)

The tiles in a tree can vary in scale and in spatial location, but once these two parameters are given, the frequency location is fixed, so a tree can again be viewed as a “two-parameter” subfamily of the three-parameter family of tiles.

We now prove (19) in the case when is a -tree , thus for all . Here, the factors will all “collide” with each other and there will be no orthogonality to exploit here; on the other hand, there will be a lot of “disjointness” in the that can be exploited instead.

To illustrate the key ideas (and to help motivate the arguments for the general case) we will also make the following “regularity” hypotheses: there exists two quantities (which we will refer to as the *energy* and *mass* of the tree respectively) for which we have the upper bounds

We also assume that we have the reverse bounds for the tree top:

and It will be through a combination of both these lower and upper bounds that we can obtain a bound (19) that does not involve either or .
We will use (27), (28), (29) to establish the *tree estimate*

Note from (30) and Cauchy-Schwarz that

and from (31) and Cauchy-Schwarz one similarly has and so (32) recovers the desired estimate (19).

It remains to establish the tree estimate (32). It will be convenient to use the tree to partition the real line into dyadic intervals that are naturally “adapted to” the geometry of the tree (or more precisely to the spatial intervals of the tree) in a certain way (in a manner reminiscent of a Whitney decomposition).

Exercise 11 (Whitney-type decomposition associated to a tree) Let be a non-empty tree. Show that there exists a family of dyadic intervals with the following properties:

- (i) The intervals in form a partition of (up to sets of measure zero).
- (ii) For each and any with , we have .
- (iii) For each , there exists with and .

(Hint: one can choose to be the collection of all dyadic intervals whose dilate does not contain any , and which is maximal with respect to set inclusion.)

We can of course assume that the tree is non-empty, since (32) is trivial for empty sets of tiles. We apply the partition from Exercise 11. By the triangle inequality, we can bound the left hand side of (32) by

which by (27), (22) may be bounded by We first dispose of the narrow tiles in which . By Exercise 11(ii) this forces . From (28) we have (say). For each fixed spatial scale , the intervals in the tree are all essentially disjoint, so a routine calculation then shows (say), so that which from Exercise 11(ii) implies that the contribution of the case to (32) is acceptable.

Now we consider the wide tiles in which . From Exercise 11(ii) this case is only possible if and . Thus the are now restricted to an interval of length , and it will suffice to establish the local estimate

for each . Note that for each fixed spatial scale , there is at most one choice of frequency interval with and , thus for fixed the set is independent of . We may then sum in for each such scale to conclude Now we make the crucial observation that in a -tree , the intervals are all essentially disjoint, hence the are disjoint as well. As these sets are also contained in , we conclude that From Exercise 11(iii) and (29) (choosing a tile with spatial scale and within of , and with for the tile provided by Exercise 11(iii)) we have giving the claim.

**The case of a regular -tree**

We now complement the previous case by establishing (19) for (certain types of) -trees . The situation is now reversed: there is a lot of “collision” in the , but on the other hand there is now some “orthogonality” in the that can be exploited.

As before we will assume some regularity on the -tree , namely that there exist for which one has the upper bounds

for all (note this is slightly stronger than (27)), as well as the bound (29) for any tile with for some . We complement this with the matching lower bounds and (31).

As before we will focus on establishing the tree estimate (32). From (31) and Cauchy-Schwarz as before we have

As we now have a -tree, the tiles become disjoint (up to null sets), and we can obtain an almost orthogonality estimate:

Exercise 12 (Almost orthogonality) For any -tree , show that for all complex numbers , and use this to deduce the Bessel-type inequality

From this exercise and (34) we see that

and so the desired bound (19) will follow from the tree estimate (32).

In this case it will be convenient to linearise the sum to remove the absolute value signs; more precisely, to show (32) it suffices to show that

for any complex numbers of magnitude . Again we may assume that the tree is non-empty, and use the partition from Exercise 11, to split the left-hand side as The contribution of the narrow tiles can be disposed of as before without any additional difficulty, so we focus on estimating the contribution of the wide tiles. As before, in order for this sum to be non-empty has to be contained in an neighbourhood of .

The main difficulty here is the dependence of on . We rewrite

so that the above expression can be written as Now for a key geometric observation: the intervals are nested (and decrease when increases), so the condition is equivalent to a condition of the form for some scale depending on . Thus the above sum can be written as One can bound the integrand here by a “maximal Calderón-Zygmund operator” which is basically a sup over truncations of the “(modulated) pseudodifferential operator” The point of this formulation is that the integrand can now be expressed as a sort of “Littlewood-Paley projection” of the function to the region of frequency space corresponding to those intervals with :

Exercise 13 Establish the pointwise estimate for all where ranges over all intervals (not necessarily dyadic) containing .

From (29) and Exercise 11(iii) as before we have

and so we can bound the expression (35) by which one can bound in terms of the Hardy-Littlewood maximal function of , followed by Cauchy-Schwarz and the Hardy-Littlewood inequality, and finally Exercise 12, as On the other hand, from (33) we have for every . By grouping the tiles in according to their maximal elements (which necessarily have essentially disjoint spatial intervals) and applying the above inequality to each such group and summing, we conclude that and the tree estimate (32) follows.
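A supremum of averages over all intervals containing a given point, as in Exercise 13, is the uncentred Hardy-Littlewood maximal function. As a rough illustration only, here is a brute-force discrete analogue on a finite sequence; the function name `maximal_function` and the discretisation are assumptions made for this sketch and do not appear in the notes.

```python
from fractions import Fraction

def maximal_function(f):
    """Uncentred discrete maximal function: for each index x, the largest
    average of |f| over a block of consecutive indices a..b containing x."""
    n = len(f)
    # prefix[i] = |f[0]| + ... + |f[i-1]|, kept exact with Fractions.
    prefix = [Fraction(0)]
    for v in f:
        prefix.append(prefix[-1] + abs(Fraction(v)))
    M = []
    for x in range(n):
        best = Fraction(0)
        for a in range(x + 1):
            for b in range(x, n):
                best = max(best, (prefix[b + 1] - prefix[a]) / (b - a + 1))
        M.append(best)
    return M

# A single spike: the maximal function decays like 1/(distance + 1).
print(maximal_function([0, 0, 1, 0, 0]))
```

The cubic running time is deliberate: the point is to mirror the definition (all intervals containing the point, not only dyadic or centred ones), not to be efficient.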
**The general case**

We are now ready to handle the general case of an arbitrary finite collection of tiles. Motivated by the previous discussion, we define two quantities:

Definition 14 (Energy and mass) For any non-empty finite collection of tiles, we define the *energy* to be the quantity where ranges over all -trees in , and the *mass* to be the quantity where is the set (thus for instance ). By convention, we declare the empty set of tiles to have energy and mass equal to zero.

Note here that the definition of mass has been modified slightly from previous arguments, in that we now use instead of . However, this turns out to be an acceptable modification, in the sense that we still continue to have the analogue of (32):

Exercise 15 (Tree estimate) If is a tree, show that

Since has an norm of , we also have the trivial bound

for any finite collection of tiles .

The strategy is now to try to partition an arbitrary family of tiles into collections of disjoint trees (or “forests”, if you will) whose energy , mass , and spatial scale are all under control, apply Exercise 15 to each tree, and sum. To do this we rely on two key selection results, which are vaguely reminiscent of the Calderón-Zygmund decomposition:

Proposition 16 (Energy selection) Let be a finite collection of tiles with for some . Then one can partition into a collection of disjoint trees with together with a remainder set with

Proposition 17 (Mass selection) Let be a finite collection of tiles with for some . Then one can partition into a collection of disjoint trees with together with a remainder set with

(In these propositions, “disjoint” means that any given tile belongs to at most one of the trees in ; but the tiles in one tree are allowed to overlap the tiles in another tree.)

Let us assume these two propositions for now and see how they (together with Exercise 15) establish the required estimate (19) for an arbitrary collection of tiles. We may assume without loss of generality that and are non-zero. Rearranging the above two propositions slightly, we see that if is a finite collection of tiles such that

for some integer then after applying Proposition 16 followed by Proposition 17, we can partition into a disjoint collection of trees with together with a remainder with Note that any finite collection of tiles will obey (38) for some sufficiently large and negative . Starting with this and then iterating indefinitely, and discarding any empty families, we can therefore partition any finite collection of tiles as where are collections of trees (empty for all but finitely many ) such that and (39) holds, and is a residual collection of tiles with We can then bound the left-hand side of (19) by From Exercise 15 applied to individual tiles and (41) we see that the second term in this expression vanishes. For the first term, we use Exercise 15, (40), (36) to bound this sum by which by (39) is bounded by which sums to as required.

It remains to establish the energy and mass selection lemmas. We begin with the mass selection claim, Proposition 17. Let denote the set of all tiles with for some and such that

Let denote the set of tiles in that are maximal with respect to the tile partial order. (Note that the left-hand side of (42) is bounded by , so there is an upper bound to the spatial scales of the tiles involved here.) Then every tile in is either less than or equal to a tile in , or is such that for all . Thus if we let be the collection of tiles of the second form, and let be the collection of trees with tree top associated to each (selected greedily, and in arbitrary order, subject of course to the requirement that no tile belongs to more than one tree), we obtain the required partition with and it remains to establish the bound This will be a (rather heavily disguised) variant of the Hardy-Littlewood maximal inequality. By construction, the tree tops are essentially disjoint, and one has for all such tree tops. To motivate the argument, suppose for sake of discussion that we had the stronger estimate By the essential disjointness of the , the sets are also essentially disjoint subsets of , hence and the claim (43) would then follow. Now we do not quite have (44); but from the pigeonhole principle we see that for each there is a natural number such that (say), where denotes the interval with the same center as but times the length (this is not quite a dyadic interval). We now restrict attention to those associated to a fixed choice of . Let denote the corresponding dilated tiles, then we have for each with .

Unfortunately, the are no longer disjoint. However, by the greedy algorithm (repeatedly choosing maximal tiles (in the tile ordering)), we can find a collection such that

- (i) All the dilated tree tops are essentially disjoint.
- (ii) For every with , there is such that intersects and .
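The greedy selection above is in the spirit of the classical Vitali covering argument. The sketch below is a deliberately simplified model, assuming plain intervals `(a, b)` on the real line in place of dilated tiles (so the frequency component and the tile ordering are ignored); the function name `greedy_disjoint` is hypothetical.

```python
def greedy_disjoint(intervals):
    """Greedily pick pairwise essentially disjoint intervals, longest first.

    Every interval that is rejected overlaps an earlier (hence at least as
    long) selected interval J, and is therefore contained in the triple 3J.
    """
    chosen = []
    for I in sorted(intervals, key=lambda I: I[1] - I[0], reverse=True):
        # Intervals that only share an endpoint count as essentially disjoint.
        if all(I[1] <= J[0] or J[1] <= I[0] for J in chosen):
            chosen.append(I)
    return chosen

def triple(J):
    """Interval with the same centre as J and three times the length."""
    a, b = J
    return (a - (b - a), b + (b - a))

intervals = [(0, 4), (3, 5), (5, 6), (5.5, 9)]
selected = greedy_disjoint(intervals)
print(selected)
```

In this toy model the selected intervals play the role of property (i), and the containment of every input interval in the triple of some selected interval plays the role of property (ii): the total length of the inputs attached to one selected interval is then controlled by the length of that interval, up to the overlap count.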

From property (i) and (45) we have

On the other hand, from property (ii) we see that the sum of all the for all with associated to a single is . Putting the two statements together we see that and on summing in we obtain the required claim (43).

Finally, we prove the energy selection claim, Proposition 16. The basic idea is to extract all the high-energy trees from in such a way that the -tree components of those trees are sufficiently “disjoint” from each other that a useful Bessel inequality, generalising Exercise 12, may be deployed. Implementing this strategy correctly turns out however to be slightly delicate. We perform the following iterative algorithm to generate a partition

as well as a companion collection of -trees as follows.

- Step 1. Initialise and .
- Step 2. If then STOP. Otherwise, go on to Step 3.
- Step 3. Since we now have , contains a -tree for which
Among all such , choose one for which the midpoint of the frequency is *minimal*. (The reason for this rather strange choice will be made clearer shortly.)
- Step 4. Add to , add the larger tree (with the same top as ) to , then remove from . We also remove the adjacent trees and from and also place them into . Now return to Step 2.

This procedure terminates in finite time to give a partition (46) with , and with the trees coming in triplets all associated to a -tree in with the same spatial scale as , with all the -trees disjoint and obeying the estimates

(both the upper and lower bounds will be important for this argument). It will then suffice to show that by (48), it then suffices to show the Bessel type inequality

Now we make a crucial observation: not only are the trees in disjoint (in the sense that no tile belongs to two of these trees), but the lower tiles are also essentially disjoint. Indeed we claim an even stronger disjointness property: if , are such that , then is not only disjoint from the larger dyadic interval , but is in fact disjoint from the even larger interval . To see this, suppose for contradiction that and . There are three possibilities to rule out:

- is equal to . This can be ruled out because any two lower frequency intervals associated to a -tree are either equal or disjoint.
- was selected after was. To rule this out, observe that contains the parent of , and hence , , or . Thus, when was selected, should have been placed with one of the three trees associated to and would therefore not have been available for inclusion into , a contradiction.
- was selected before was. If this case held, then the midpoint of would have to be greater than or equal to that of , otherwise would not have a minimal midpoint at the time of its selection. But is contained in , which is contained in , which lies below , which contains , which contains the midpoint of ; thus the midpoint of lies strictly below that of , a contradiction.

If the were perfectly orthogonal to each other, this disjointness would be more than enough to establish (49). Unfortunately we only have imperfect orthogonality, and we have to work a little harder. As usual, we turn to a type argument. We can write the left-hand side of (49) as

so by Cauchy-Schwarz it suffices to show that By the triangle inequality, the left-hand side may be bounded by As has Fourier support in , we see that vanishes unless and overlap. By symmetry it suffices to consider the cases and .

First let us consider the contribution of . Using Young’s inequality and symmetry, we may bound this contribution by

A direct calculation using (22) reveals that so the contribution of this case is at most as desired.

Now we deal with the case when , which by the preceding discussion implies that and lies outside of . Here we use (37) to bound

and

and then we can bound this contribution by

Direct calculation using (22) reveals that (say), and also so we obtain a bound of which is acceptable by (48). This finally finishes the proof of Proposition 16, which in turn completes the proof of Carleson’s theorem.

Remark 18 The Lacey-Thiele proof of Carleson’s theorem given above relies on a decomposition of a tileset in a way that controls both energy and mass. The original proof of Carleson dispenses with mass (or with the function ), and focuses on controlling maximal operators that (in our notation) are basically of the form To control such functions, one iterates a decomposition similar to Proposition 16 to partition into trees with good energy control, and establishes pointwise control of the contribution of each tree outside of an exceptional set. See Section 4 of this article of Demeter for an exposition in the simplified setting of Walsh-Fourier analysis. The proof of Fefferman takes the opposite tack, dispensing with energy and focusing on bounding the operator norm of the linearised operator Roughly speaking, the strategy is to iterate a version of Proposition 16 to partition into “forests” of disjoint trees, though in Fefferman’s argument some additional work is invested into obtaining even better disjointness properties on these forests than is given here. See Section 5 of this article of Demeter for an exposition in the simplified setting of Walsh-Fourier analysis.

A modification of the above arguments used to establish the weak estimate can also establish restricted weak-type estimates for any :

Exercise 19 For any sets of finite measure, and any measurable function , show that for any . (Hint: repeat the previous analysis with , but supplement it with an additional energy bound coming from a suitably localised version of Exercise 12.)

The bound (51) is also true for , yielding Hunt’s theorem, but this requires some additional arguments of Calderón-Zygmund type, involving the removal of an exceptional set defined using the Hardy-Littlewood maximal function:

Exercise 20 (Hunt’s theorem) Let be of finite non-zero measure, and let be a measurable function. Let be the exceptional set for a large absolute constant ; note from the Hardy-Littlewood inequality that if is large enough.

- (i) If is a finite collection of tiles with for all , show that (Hint: By using (22) and the disjointness of the when is fixed, first establish the estimate whenever is a natural number and is an interval with and .)
- (ii) If is a finite collection of tiles with for all , show that . (For a given tree , one can introduce the dyadic intervals as in Exercise 11, then perform a Calderón-Zygmund type decomposition to , splitting it into a “good” function bounded pointwise by , plus “bad functions” that are supported on the intervals and have mean zero. See this paper of Grafakos, Terwilleger, and myself for details.)
- (iii) For any finite collection of tiles for all
- (iv) Show that (51) holds for all , and conclude Theorem 2(iii).

Remark 21 The methods of time-frequency analysis given here can handle several other operators that, like the Carleson operator, exhibit scaling, translation, and frequency modulation symmetries. One model example is the bilinear Hilbert transform for . The methods in this set of notes were used by Lacey and Thiele to establish the estimates for with (these estimates have since been strengthened and extended in a number of ways). We only give the briefest of sketches here. Much as how Carleson’s theorem can be reduced to a bound (19), the above estimates can be reduced to the estimation of a model sum where is a certain collection of triples of tiles with common spatial interval and frequency intervals varying along a certain one-parameter family for each fixed choice of spatial interval. One then uses a variant of Proposition 16 to partition into “-trees”, “-trees”, and “-trees”, the contribution of each of which can be controlled by the energies of on such trees, times the length of the spatial support of the tree, in analogy with Exercise 15. See for instance the text of Muscalu and Schlag for more discussion and further results.

Remark 22 The concepts of mass and energy can be abstracted into a framework of spaces associated to outer measures (as opposed to the classical setup of spaces associated to countably additive measures), in which the mass and energy selection propositions can be viewed as consequences of an abstract Carleson embedding theorem, and the calculations establishing estimates such as (19) from such propositions and a tree estimate can be viewed as consequences of an “outer Hölder inequality”. See this paper of Do and Thiele for details.

## 97 comments


14 May, 2020 at 8:55 am

Anonymous

It is known that for each continuous periodic function, its Fourier series is (everywhere) Cesàro summable to the function.

Is the Fourier series of any integrable function Cesàro summable almost everywhere to the function?

[Yes; see Exercise 4. -T]

14 May, 2020 at 11:35 am

Johan Aspegren

Is Hunt’s theorem really valid for essentially bounded functions, even though in Exercise 1 one is asked to prove that there isn’t even convergence in norm?

14 May, 2020 at 2:58 pm

Terence Tao

Yes, because for any finite ; thus for instance essentially bounded functions are already square-integrable, so Hunt’s theorem in this case follows from Carleson’s theorem. The sums in this case need not stay uniformly bounded in , but would stay bounded in any finite space and converge in norm as well as pointwise almost everywhere.

For the Stein maximal principle applies and ensures that pointwise almost everywhere convergence implies weak (p,p) type of the maximal function, but there is no such principle for , and in that regime it can happen that pointwise almost everywhere convergence holds even in the absence of other results that are typically considered weaker, such as norm convergence.

15 May, 2020 at 6:00 am

riderriju

Hello respected professor. Can you give an article based on the Atiyah-Singer index theorem?

15 May, 2020 at 7:36 am

James Smith

Years ago I thought I’d write a book on the pointwise convergence of Fourier series, inspired by Carleson’s proof and Dirichlet’s 1823 proof, which appeared in a very early edition of Crelle. I got as far as translating the latter and putting an outline together. I just had a rummage around and I still have it!

The authors of the papers I tracked down are impressive. Looking at my outline, aside from the two aforementioned, I see du Bois-Reymond (1873), Féjér (1889), Kolmogorov (1922), Hunt (1968/78), Stein (1961), Zygmund (1969), etc. The dates are those of the papers I thought were landmarks in the history. I’ve spared the reader the titles. I also stumbled across Lacey and Thiele, they were to make it into the last section of the book…

Unsurprisingly I never got around to finishing this task!

Here is a picture of the outline, I’m sorry about the poor quality:

15 May, 2020 at 9:08 am

dn1214

Thank you for these notes, it is really interesting. During my reading of the notes I did not find any solution to Exercise 6. Can you provide me with a hint? I tried to prove the contraposition but it did not work.

[Adapt the argument used to derive the failure of Proposition 3(ii) from the failure of Proposition 3(iii). The key point is that random rotations can be used to turn a small subset of the sphere into a large one. -T.]

15 May, 2020 at 10:14 am

Rex

Some of the latex in part (v) of the statement of proposition 3 is not parsing for me.

[Corrected, thanks – T.]

15 May, 2020 at 11:00 am

Rex

In the same statement, it might be helpful to add a parenthesis in the definition of the Fourier multiplier operator. Otherwise it is not visibly clear that the widehat on the left includes the f under it. That is, \widehat{(1_{D \leq N} f)}(\xi). This confused me for a moment.

[Corrected, thanks – T.]

15 May, 2020 at 4:18 pm

Rex

Also, the multiplier is notated differently in the centered formula, for some reason.

15 May, 2020 at 1:12 pm

dn1214

I have another question, related to Exercise 8 (i), is it correct that if I take then the required bound is ? (I understood that and that thus there is no sums o). Then if I am not mistaken, it is not possible to bound the term . Is it correct?

15 May, 2020 at 4:46 pm

Terence Tao

This has now been corrected. For , the required bound is .

16 May, 2020 at 6:28 am

Antoine

I started reading your exposition of the proof of Carleson’s theorem and I already have several questions related to this theorem.

(i) You say that Littlewood-Paley analysis or Calderon-Zygmund attempts will not work because these are tools which do not respect all the symmetries of the estimate. Is there however a weaker form of the Carleson theorem that is reachable by such means? In other words, what can the classical tools (Calderon-Zygmund, Littlewood-Paley, pseudodifferential operators, etc.) do on this problem? For example in your preceding set of notes, a “naive” Littlewood-Paley attempt at proving the Calderon-Vaillancourt theorem allows one to prove a much weaker statement (namely the boundedness of the frequency localised operator, eq. (17) of your Notes 3).

(ii) In your proof of Carleson’s theorem, you bound the operator rather than the operator that you used in the construction of an almost everywhere divergent Fourier series. I found the latter very interesting and I am wondering: if one tries to run an analysis of the level sets assuming this time that , is there a way to convince ourselves that the construction of a counterexample in is not possible? Or a way to get the intuition of why the theorem must be true? For me it is not clear where the assumption will help us much more than (at least my poor computations are not helping me). I am partly asking this question because Carleson is known for proving the theorem after years of trying to construct a counterexample.

Thank you for your time, and all the notes that you release!

17 May, 2020 at 6:35 pm

Terence Tao

Roughly speaking, Calderon-Zygmund type techniques can control a restricted version of the Carleson theorem in which the frequency cutoff is restricted to a lacunary set, such as the powers of . In terms of the model operators appearing later in the paper, Calderon-Zygmund theory can basically handle the contribution of a single tree (particularly if one modulates frequency so that the tree is now based at the frequency origin).

The operator is almost the same as ; it has a slightly more convenient description in physical space, whereas has a more convenient description in frequency space, but they are essentially the same operator. By working in one gains access to approximations to Dirac delta masses that one can place at various spatial locations to influence the Carleson maximal operator in various ways. In such approximate Dirac masses are very expensive in terms of the norm and are no longer usable to generate counterexamples. (With regards to the proof of Carleson’s theorem, the bound manifests itself through the energy selection lemma, Proposition 16; Dirac spikes cost too much energy for their spatial extent.)

18 May, 2020 at 1:27 am

Antoine

Thank you a lot for the insights on the Calderon-Zygmund type techniques, it proves to be very helpful!

Let me ask one last question, which I do not know how to address, but might be in fact simple to answer: as you explained, it is true that and are almost identical, but many results seem easier to prove with (at first glance). For example I wonder how the classical result (and very easy with ) that translates with instead? The classical proof only uses that the Dirichlet kernel has logarithmic divergence of its norm. In the first part of his article, Carleson improved this claim and proves that (for say) using a simpler interval selection process. Is it possible to write a simple proof of this bound using the discrete model given by the function?

18 May, 2020 at 12:57 pm

Terence Tao

As I said previously, it’s a matter of whether one wishes to exploit physical space arguments or frequency space arguments. The bound uses the physical space structure of the kernel and the triangle inequality in physical space, so if one starts with one basically should write down the physical space representation which is essentially up to minor factors.

I am not so familiar with this preliminary argument of Carleson, but I would imagine that it can be translated to the discrete model. Certainly the full argument of Carleson can be translated in this fashion, see e.g., https://arxiv.org/abs/1210.0886

18 May, 2020 at 10:19 pm

Antoine

Thank you for the paper of Demeter, I’ll dig into it. I will also try to write a bound using arguments presented in your lecture notes, and if I succeed I will put it here.

16 May, 2020 at 9:45 am

Convergence of Fourier Series in Lp – Zeros and Ones

[…] i.e., is it true that validity of (A) implies (B) itself? The answer is yes for due to Stein. See Tao’s notes on more details about almost everywhere […]

17 May, 2020 at 2:01 pm

Xiao-Chuan Liu

In the proof of Proposition 3, at the end of the third paragraph, it should be item (i) you want to establish.

For the part from (ii) to (iii), the final part of the proof, where you listed sequences of functions satisfying four conditions, in condition (3), do you require x to belong to and also ?

[Corrected, thanks – T.]

19 May, 2020 at 9:13 am

Anonymous

Is it possible that Carleson’s theorem holds for any complete system of continuous functions on any (fixed) real interval?

19 May, 2020 at 9:21 am

Anonymous

I meant any orthonormal (wrt some continuous weight function) complete system of functions on a real interval.

19 May, 2020 at 3:45 pm

Terence Tao

This fails in general; the first construction, for a bounded orthogonal system of sign functions, seems to be by Kolmogorov and Menshov in 1927 https://mathscinet.ams.org/mathscinet-getitem?mr=1544864, there is also a counterexample for the permuted Fourier basis announced by Kolmogorov in 1926, but whose proof first appears in https://mathscinet.ams.org/mathscinet-getitem?mr=147833 . I found these references in this survey https://mathscinet.ams.org/mathscinet-getitem?mr=710115 which discusses several further results of this type.

On the other hand, the Rademacher-Menshov type results (see Exercise 8) that have a logarithmic loss are available for any orthogonal system.

19 May, 2020 at 11:27 am

Rex

In formula (7) in the proof of Prop 3 is written:

“Since is a convolution operator, it commutes with translations, and hence

for each .”

Maybe some parentheses would be helpful here. The mention of translation-invariance suggests that this formula is meant to say

as opposed to

since formula (B) would just follow by linearity and not require any translation-invariance. (Of course, the sentence ultimately asserts that (A) = (B) by translation-invariance.)

But where in the rest of the proof do we rely upon the formula (A)? That is, what breaks if we use (B) throughout and never connect it to (A) by asserting translation-invariance? Is it something with Khinchin?

19 May, 2020 at 3:50 pm

Terence Tao

Strictly speaking, the interpretation (B) is not well-formed; is a number, not a function, so does not apply to it. (One would write something like or instead.)

The translation invariance is needed to ensure that the summand of (7) is large on , which thanks to the random translations are approximately disjoint. Without the translation invariance one cannot push apart the sets of largeness to fill out a set of large measure.

19 May, 2020 at 4:14 pm

Rex

I was sloppy with the notation, but what I meant in (B) is what you suggested: translate by first, and then apply to it.

So what you are saying is that, a priori, one does not know anything about the of a translated , even if we already know that has large values on a reasonably large set? In particular, it is not clear that of the translated thing has large values anywhere, until we observe that it is just a translation of ?

19 May, 2020 at 4:47 pm

Terence Tao

Yes. In general, a translate of will be linearly independent of , so if all that one knows about is that it is linear, then there is no relationship between applied to and applied to the translate of .

20 May, 2020 at 6:04 am

Anonymous

Dear Prof. Tao,

Tomorrow, 21st May, I want you to be named with the big award of math.

20 May, 2020 at 7:00 am

Anonymous

Dear professor,

I am confused about the (ii) of Exercise 11. Indeed, if the spatial intervals in a tree are all dyadic intervals within the spatial support of the top Let’s say and the collections of are of the form for . Then for any partition , an element that intersects the interior of , it contains infinitely many intervals of sizes strictly smaller than . This contradicts to (ii) of Exercise 11.

20 May, 2020 at 10:55 am

Terence Tao

Oops, I forgot to add the requirement that trees are finite (this is one of the reasons why we restrict to a finite set of tiles earlier in the argument).

24 May, 2020 at 9:08 pm

Anonymous

Dear Prof. Tao,

I really concern the Navier – Stokes problem very much. I would like to contribute my little idea (but infact it is very superior) . I think you do not wait a Turing machine – a fiction machine ( a dream machine) to check the time of flow up Navier -Stokes equation. I advice you to have a breakthrough in physics ( with your prestidge , you can visit the physics laboratory room which is the best in the world. If not all your efforts will be fallen down in the sea. Once you succeed, you have double of novelable medals at the same time.They are Nobel prize and Millenium award.

A genuine advice,

Best regard,

20 May, 2020 at 11:06 am

Xiao-Chuan Liu

I guess the four integrals at the end of the proof of Kolmogorov theorem, (around formula (11)) are from -1 to 1, instead of over the whole R?

[It doesn’t matter, thanks to the support of , but I have changed them all to for consistency. -T]

22 May, 2020 at 1:29 am

Anonymous

Since the Chebyshev polynomial expansion of any square integrable function (wrt the weight function ) on is identical(!) with the Fourier expansion of the even function on under the pointwise correspondence and , it follows that Carleson’s theorem holds also for Chebyshev polynomial expansions. So it seems reasonable to assume that Carleson’s theorem holds also for the other classical orthogonal polynomials.

In particular, if Carleson’s theorem is true also for Legendre polynomial expansions, is it possible that Carleson’s theorem holds also for the classical spherical harmonics expansions on the unit sphere ?

22 May, 2020 at 6:13 am

riderriju

Would it be possible to write an article on hypercomplex functions and spinors?

22 May, 2020 at 10:04 am

James Leng

I believe you have a typo in your presentation of K’s counterexample:

“Now we estimate {{\mathcal C}f(x)} for {x} in the interval {[j_0+0.1, j_0+0.9]} for some natural number {n/3 \leq j \leq 2n/3}; note the set of all such {x} has measure {\sim n}. In this range we will test the maximal operator at the frequency cutoff {N = N_{j_0}}:”

Should the j be a j_0? Otherwise, I don’t see how it’s possible that the measure of all such x is \sim n.

[Corrected, thanks – T.]

22 May, 2020 at 12:09 pm

Anonymous

“we can select the frequency parameter N as we please, as long as it depends only on N and not on y” I think it should mean “only on x and not on y”?

[Corrected, thanks – T.]

22 May, 2020 at 2:08 pm

Anonymous

In the beginning of the Carleson’s theorem, is the maximal operator 1_N(x) actually a linear operator? i.e. 1_N(x) (f+g) = 1_N(x)f + 1_N(x)g. This seems to fail since according to the definition of N(x), it seems that N might also depend on f?

26 May, 2020 at 1:25 pm

Terence Tao

For each fixed choice of frequency function , the operator is linear in . For the application to the maximal operator one does need to choose in a manner that depends on , but as long as the estimates for each of the linearized operators is uniform in the choice of frequency function, this is not a problem. (To put it another way, the inherent nonlinearity in the problem has been moved away from the operator, and is now contained solely in the choice of a parameter, in this case the frequency function.)

22 May, 2020 at 7:40 pm

Rex

I think the passage discussing modulation symmetry might be referring to the wrong formula. It refers to (12) and makes the remark

“(Technically the right-hand side changes because of an additional factor of {e^{2\pi i x \xi_0}}, but this factor can be handled for instance by generalising the indicator function cutoff {1_E} to a subindicator function cutoff {\chi_E} that has the pointwise bound {|\chi_E| \leq 1_E}; we will ignore this very minor issue here.)”

But the right-hand side of (12) is an L^2 norm, so it should be invariant under multiplication with a Fourier character.

[Corrected, thanks – T.]

23 May, 2020 at 8:39 am

RexIn the passage

“By selecting, for each {x}, {N(x)} to be a frequency which attains (or nearly attains) the supremal value of {|1_{D \leq N(x)}| f(x)}, it suffices to establish the linearised estimate…”

Is the f(x) here supposed to be inside the absolute value?

[Corrected, thanks – T.]

25 May, 2020 at 5:23 pm

Alan ChangIn the displayed equation following “The Fourier transform of f_R can be calculated as”, the f_R is missing a hat.

[Corrected, thanks – T.]

26 May, 2020 at 1:23 pm

John MangualAre we allowed to say that where ? I think this was called a “good kernel” or “approximation to the identity” with several properties.

We could spread the error or over a bunch of (countably many) real numbers such that or .

There are numerous theorems where the objective is to show in or . We could try to draw the (failure of) pointwise convergence in the appropriate space.

I've never seen these quantitative convergence results.

26 May, 2020 at 2:37 pm

RexIn the passage

“To decouple further, we will try to decompose into “rank one” operators. More precisely, we manipulate

where we use the notation . It will be convenient to try ”

I believe you want (y – z) rather than (z – y) in the convolution here.

[I think it is correct as it stands – T.]

27 May, 2020 at 8:45 pm

RexWhat does the in mean?

[Adjoint – T.]

26 May, 2020 at 2:49 pm

RexCould you say more about the intuition behind this decomposition of into “rank one” operators? I find the explanation even about heuristics to be rather opaque. Maybe just more talking would help.

If I understand correctly, the are supposed to be translates of some model wave packet attuned to the Fourier interval . These translates are somehow supposed to form something like a basis for the subspace of L^2 consisting of f whose Fourier support lies entirely inside ?

Anyway, that is my interpretation of the formula

What does “rank one” refer to anyway?

Right now I don’t see the overarching strategy behind the proof of Carleson’s theorem.

27 May, 2020 at 10:56 am

Terence TaoRank refers to the dimension of the range of an operator: https://en.wikipedia.org/wiki/Finite-rank_operator . In particular, operators of the form are rank one operators.

There is a general heuristic that the “rank” of a pseudodifferential operator should be roughly equal to the area of the region in phase space that the symbol is supported in, although this statement is not literally true for several reasons, particularly the presence of “Schwartz tails” in either space or frequency (the claim is closer to literally true though for the “dyadic model” of Fourier analysis and pseudodifferential calculus provided by the Fourier-Walsh model, see e.g., https://www.math.ucla.edu/~tao/254a.1.01w/notes5.dvi ). In particular, the projection is projecting to a strip of infinite area and would thus be expected to be of infinite rank; partitioning into tiles would then be expected to give something like a rank one decomposition.

For continuous choices of parameter {y}, the functions {\mathrm{Trans}_y 1^\vee_{\omega_-}} are an overcomplete basis; in particular, the uncertainty principle predicts that these functions are essentially unchanged (up to multiplication by scalar phases) if {y} is only adjusted at scales smaller than {1/|\omega_-|}. However, if one sparsifies the range of {y} to a lattice of spacing , one expects to recover a basis (cf. the Shannon sampling theorem https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem ).
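
To make the notion of rank from the start of this reply concrete, here is the standard example of a rank one operator on {L^2({\bf R})}. The functions {\phi, \psi, \phi_j, \psi_j} are generic placeholders and are not meant to reproduce the formula elided above.

```latex
\[
  T f := \langle f, \phi \rangle \, \psi, \qquad \phi, \psi \in L^2({\bf R}) \setminus \{0\}.
\]
% The range of T is the one-dimensional span of \psi, so T has rank one.
% A sum of n such operators has rank at most n:
\[
  T f = \sum_{j=1}^{n} \langle f, \phi_j \rangle \, \psi_j
  \quad \Longrightarrow \quad
  \mathrm{rank}(T) \leq n .
\]
```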

27 May, 2020 at 10:00 pm

Rex“From the uncertainty principle, modifying {y} by {O(1/|\omega_-|)} should only modify {\mathrm{Trans}_y 1^\vee_{\omega_-}(x)} approximately by a phase, so the integral here is roughly constant at spatial scales {O(1/|\omega_-|)}.”

Could you elaborate on this? It is unclear to me how the uncertainty principle comes into play here.

28 May, 2020 at 1:06 pm

Terence TaoSuppose first that is supported near the origin, so is supported in the region . Then the uncertainty principle predicts that the Fourier transform will not vary significantly at scales less than the reciprocal scale , since the phase does not change much under this modification. When is centered at a different frequency then one can perform a similar heuristic after removing a frequency modulation . (See also https://terrytao.wordpress.com/2010/06/25/the-uncertainty-principle/ for a more detailed discussion of the uncertainty principle heuristic.)
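
The heuristic in this reply can be checked with a short estimate. The following is a sketch for a generic integrable function {g} supported in {[-A,A]} (both {g} and {A} are placeholder names), with the Fourier transform normalised as {\hat g(\xi) = \int g(x) e^{-2\pi i x \xi}\ dx}; the roles of space and frequency can be interchanged.

```latex
\[
  \hat g(\xi + h) - \hat g(\xi)
  = \int_{-A}^{A} g(x) e^{-2\pi i x \xi} \bigl( e^{-2\pi i x h} - 1 \bigr)\, dx ,
\]
% and the elementary bound |e^{i\theta} - 1| \leq |\theta| gives
\[
  \bigl| \hat g(\xi + h) - \hat g(\xi) \bigr|
  \leq 2\pi A |h| \, \| g \|_{L^1({\bf R})} ,
\]
% which is small compared with \| g \|_{L^1} once |h| is much less than 1/A.
```

If {g} is instead supported in {[x_0 - A, x_0 + A]}, then {\hat g(\xi) = e^{-2\pi i x_0 \xi} \hat g_0(\xi)} with {g_0(x) := g(x + x_0)}, so the same bound applies after removing the phase {e^{-2\pi i x_0 \xi}}.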

28 May, 2020 at 4:28 pm

RexBy “uncertainty principle” you are just referring to the informal phase heuristic from https://terrytao.wordpress.com/2010/06/25/the-uncertainty-principle/ ?

The uncertainty principle I had in mind was something else, namely a lower bound on the product of the variances of f and \hat{f} (from Stein and Shakarchi vol 1).

I would not have guessed that there was any close connection between these two things.

29 May, 2020 at 12:51 pm

Terence Tao: Yes, I am referring to the general (but informal) uncertainty principle here, rather than the specific Heisenberg uncertainty principle you refer to. The latter is a manifestation of the general principle (namely: if a function has frequency standard deviation , then it is roughly constant up to phase rotations at spatial scales and hence should have spatial standard deviation ), but it is often too narrow to be of much use in applications (other than as support for the more general principle).

Note also that while the Heisenberg uncertainty principle is the most well known of the formalizations of the general uncertainty principle, it is certainly far from the only one available. For instance the Hardy uncertainty principle is also well known.
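
For reference, the Heisenberg inequality discussed here reads as follows in the normalisation {\hat f(\xi) = \int f(x) e^{-2\pi i x \xi}\ dx}. This is the standard statement for Schwartz functions, recalled as a convenience rather than quoted from the notes.

```latex
\[
  \Bigl( \int_{\bf R} x^2 |f(x)|^2\, dx \Bigr)
  \Bigl( \int_{\bf R} \xi^2 |\hat f(\xi)|^2\, d\xi \Bigr)
  \geq \frac{1}{16 \pi^2}
  \qquad \text{whenever } \int_{\bf R} |f(x)|^2\, dx = 1 .
\]
% Equivalently: (spatial spread about 0) times (frequency spread about 0) is at least 1/(4\pi).
```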

27 May, 2020 at 7:57 am

AnonymousThe definition of P_{+} when in the sentence “The interval \omega_{+} is then similarly….” is missing a right bracket.

A \langle is missing in (14).

[Corrected, thanks – T.]

27 May, 2020 at 8:53 am

AnonymousShould we actually instead have ?

[Corrected, thanks – T.]

27 May, 2020 at 11:08 am

RexIn the passage:

“From the uncertainty principle, modifying {y} by {O(1/|\omega_-|)} should only modify {\mathrm{Trans}_y 1^\vee_{\omega_-}(x)} approximately by a phase, so the integral here is roughly constant at spatial scales {O(1/|\omega_-|)}. So we heuristically have

If we now define a tile to be a rectangle {P} in phase space of the form”

I think there shouldn’t be an integral sign in this formula?

27 May, 2020 at 11:27 am

AnonymousFor Exercise 9, I got a slightly different result. Here is my calculation:

(holder),

$\leq \|f\|_{p,\infty} \|\mu(E)\|_{p'} \leq A \|\mu(E)\|_{p'}$ since norm is bigger than norm. But

?

27 May, 2020 at 2:58 pm

Terence TaoYou may have inadvertently used the constant function in place of the indicator function .

27 May, 2020 at 11:18 pm

Rex“We now make the above heuristic decomposition rigorous. For any dyadic interval {\omega = [\inf \omega, \inf \omega + |\omega|]}, let {\omega_- := [\inf \omega, \inf \omega + |\omega|/2]} denote the left child interval, and {\omega_+ := [\inf \omega + |\omega|/2, \inf \omega + |\omega|]} the right parent interval. ”

I assume this should read “right child interval”.

[Corrected, thanks – T.]

28 May, 2020 at 1:27 am

Rex“we thus have

\displaystyle \sum_{\omega} |\int_{\bf R} 1_E(X) 1_{\omega_+-\xi_0}(N(X)) \eta_{\omega-\xi_0}(D) f\ dx| \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}.

We temporarily truncate to a finite range of scales, ”

Shouldn’t be here?

[Corrected, thanks – T.]

28 May, 2020 at 9:32 pm

Rex“For fixed , the expression

is periodic in with period , with average equal to

which we can rewrite as”

The in the subscripts should be .

[Corrected, thanks – T.]

29 May, 2020 at 2:05 pm

Rex(I mean in the second formula, not the first. Unless I’m mistaken)

29 May, 2020 at 6:46 am

AnonymousWhich properties of the orthonormal basis functions on are used in the proof of Carleson theorem – which are sufficient to imply its truth for any other basis of orthonormal functions?

29 May, 2020 at 2:23 pm

Terence TaoBasically in order for the known proofs of Carleson’s theorem to work, there should be a strong enough manifestation of the uncertainty principle that one can recast the Carleson maximal operator in terms of sums over tiles in phase space. Most bounded orthonormal systems will not have such a strong uncertainty principle, and indeed the analog of Carleson’s theorem for such systems fails in general.

29 May, 2020 at 10:34 am

AnonymousIn the heuristic discussion, how do you get the equality in

?

We have products of 2 operators and 1 function. I know for three functions

, shouldn’t be ?

29 May, 2020 at 2:00 pm

Terence TaoThe left-hand side is the inverse Fourier transform of , and .

More generally, it is a good idea to mentally categorize some functions as “living in frequency space” (such as ), and other functions as “living in physical space” (such as ). If one ends up trying to combine (via pointwise product or convolution) a function in physical space (such as ) with a function in Fourier space (such as ), then something has gone wrong somewhere, and by inspecting previous steps in your reasoning and identifying which expressions are in physical space and which ones are in frequency space, one can quickly localize the error.
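
One way to keep track of the two spaces is through the convolution theorem. A sketch, with {m, m_1, m_2} generic bounded functions on frequency space and {f} a Schwartz function on physical space (all placeholder names), and assuming {m^\vee} makes sense as a function or distribution:

```latex
% m lives in frequency space; f lives in physical space.
\[
  \widehat{m(D) f}(\xi) := m(\xi) \hat f(\xi)
  \quad \Longleftrightarrow \quad
  m(D) f = \bigl( m \hat f \bigr)^\vee = m^\vee * f .
\]
% Legitimate combinations: pointwise product of two frequency-space objects (m and \hat f),
% or convolution of two physical-space objects (m^\vee and f).
% Composing two multipliers multiplies their symbols in frequency space:
\[
  m_1(D) \, m_2(D) f = \bigl( m_1 m_2 \hat f \bigr)^\vee .
\]
```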

29 May, 2020 at 2:08 pm

RexIsn’t the same as ? We’re talking about complex conjugates of indicator functions here, aren’t we?

29 May, 2020 at 2:16 pm

Terence TaoYes, they are equal, and similarly and are equal, but I am using different notation for these terms as they are playing slightly different roles (one term determines the range of the rank one operator, and the other determines the corange).

29 May, 2020 at 2:22 pm

Rexand similarly ? (I was confused about this earlier)

29 May, 2020 at 11:49 am

Alan ChangIn “which by (31) simplifies to”, the (31) should be (16).

[Corrected, thanks – T.]

29 May, 2020 at 1:59 pm

RexIn the case of separated tiles of fixed scale, it might be helpful to label the energy and mass estimates as such, to make the connection with later sections clear.

[Suggestion implemented – T.]

29 May, 2020 at 2:11 pm

RexIn fact, it would probably be helpful to introduce and discuss the informal physical analogy with mass and energy in this section.

30 May, 2020 at 3:34 pm

AnonymousYou should take notes and indicate this yourself.

29 May, 2020 at 2:33 pm

RexI think it was mentioned in lecture that pointwise convergence of Fourier series is not known for higher dimensional tori.

What is it about the current proof of Carleson’s theorem that fails to extend to higher dimensions?

30 May, 2020 at 8:00 pm

Terence TaoThe tree estimate (which, very roughly speaking, corresponds to a maximal disk multiplier over radii in a lacunary range such as ) is not currently established. This may not be the only missing piece of the argument but my understanding is that it is the primary one.

29 May, 2020 at 9:03 pm

RexAnother thing that would be helpful to mention explicitly in the notes is the change of variables in order to transform the graph of in the plane to a simple triangle in the plane.

In particular, it would clarify things (for me, anyway) to spell out where this transformation fits into the overall strategy. That is, why trading off the irregular regions in the plane for regular rectangular ones by making an irregular change of variables helps the proof without possibly hurting in other ways.

[Added a sentence to this effect – T.]

29 May, 2020 at 9:05 pm

Rex(That was bad notation. This should read -plane and -plane)

1 June, 2020 at 11:09 am

AnonymousWhen talking about , say the -tree, the lower half are contained, and the upper half are disjoint. You mean the upper half of the tiles in the tree are disjoint from , not disjoint from each other, right? Just want to clarify this.

1 June, 2020 at 5:43 pm

Terence TaoBoth assertions are true (and the latter is more relevant).

1 June, 2020 at 1:56 pm

AnonymousIntuitively, why are the -tree and +tree estimates different? We are trying to prove equation (19) for different cases, but I thought the +tree should be the same as the -tree case, since they are symmetric (by flipping). Can we just prove one case, and say the other case follows by the same argument?

1 June, 2020 at 5:48 pm

Terence TaoThe trees themselves have a symmetry, but the model operator does not; the lower tiles interact with and the upper tiles interact with in somewhat different ways (in particular requiring understanding “energy” in the former case and “mass” in the latter).

1 June, 2020 at 3:17 pm

Anonymous: For the proof of the energy estimate in the fixed-scale case, equation (25), you used the TT* method to exploit approximate orthogonality. I am not familiar with the TT* method, and couldn’t find much information online. Could you explain what it is? In particular, how do you get the equality:

.

Also, is the TT* method the standard way to exploit approximate orthogonality of finite-dimensional vectors?
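
Since the question concerns the TT* method in general, here is a sketch of the basic identity for generic vectors {\phi_P} in a Hilbert space and finitely many coefficients {c_P}. The constant {A} is a hypothetical almost-orthogonality bound, and this is the textbook form of the argument rather than a reconstruction of the specific equality elided above.

```latex
% With T f := ( \langle f, \phi_P \rangle )_P one has T^* c = \sum_P c_P \phi_P
% and \| T \|^2 = \| T T^* \| = \| T^* \|^2. Expanding the square:
\[
  \Bigl\| \sum_P c_P \phi_P \Bigr\|^2
  = \sum_{P, P'} c_P \overline{c_{P'}} \langle \phi_P, \phi_{P'} \rangle .
\]
% If \sup_P \sum_{P'} | \langle \phi_P, \phi_{P'} \rangle | \leq A, then the inequality
% |c_P c_{P'}| \leq ( |c_P|^2 + |c_{P'}|^2 ) / 2 (Schur's test) gives
\[
  \Bigl\| \sum_P c_P \phi_P \Bigr\|^2 \leq A \sum_P |c_P|^2 ,
  \qquad \text{hence} \qquad
  \sum_P | \langle f, \phi_P \rangle |^2 \leq A \, \| f \|^2 .
\]
```

The point is that the bound for {T} is obtained from inner products {\langle \phi_P, \phi_{P'} \rangle} alone, which is where approximate orthogonality enters.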

2 June, 2020 at 3:14 am

Rex” To motivate the argument, suppose that {\phi_P} had no tail outside of {I_P}, so that one could replace {\chi_{I_P}^{100}} to {1_{I_P}} in (22). Then would have

\displaystyle |I_P|^{1/2} \langle \phi_P, 1_{E_P^+} \rangle \lesssim |\{ x \in E: (x,N(x)) \in P_+ \}|,

and as the tiles {P_+} are all essentially disjoint the claim (24) would then follow from summing in {P}, since each {x \in E} contributes to at most one of the sets {\{ x \in E: (x,N(x)) \in P_+ \}}. ”

Is this the place where we need to use the fact that N(x) does not take dyadic values?

Can one also get around the boundary issues by declaring dyadic intervals to be half-open i.e. of the form ? This seems like it would be simpler than having to find an N(x) with non-dyadic values.

2 June, 2020 at 11:44 am

Anonymous: Whether one uses [a,b) or [a,b] is not important, since we don’t care about sets of measure zero. Dyadic values are easy to find and use, and have nice nesting properties, and thus the orthogonality that is used throughout the proof.

2 June, 2020 at 1:25 pm

RexSets of measure zero can have inverse image of non-zero measure.

3 June, 2020 at 3:47 pm

Terence TaoFair enough, this is a simpler fix. I have reworded the text to use half-open dyadic intervals instead.

2 June, 2020 at 4:04 pm

AnonymousFor tiles P>P’, you mentioned the geometric observation that either are disjoint or are disjoint. What about the case where P and P’ cross each other in the middle, so both + and – side overlap? For example, let P=(1/2)*2 (short and fat), and P’=(1/4)*4 (long and thin). Why won’t this happen?

3 June, 2020 at 3:49 pm

Terence TaoFor dyadic intervals it is not possible for a short dyadic interval to overlap both halves of a long dyadic interval. For instance if then the dyadic tiles that intersect are and , one of which meets the upper half of and not the lower, and the other doing the reverse.
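
The underlying fact is the nesting property of dyadic intervals. A sketch, using half-open dyadic intervals {[j 2^k, (j+1) 2^k)} with integers {j, k} (this indexing is an assumption made for illustration):

```latex
% Nesting: two dyadic intervals are either disjoint or one contains the other.
\[
  \omega, \omega' \text{ dyadic}, \quad |\omega'| < |\omega|, \quad \omega' \cap \omega \neq \emptyset
  \quad \Longrightarrow \quad
  \omega' \subset \omega_- \ \text{ or } \ \omega' \subset \omega_+ ,
\]
% because \omega_- and \omega_+ are themselves dyadic, partition \omega,
% and satisfy |\omega_\pm| = |\omega|/2 \geq |\omega'|.
% In particular \omega' cannot meet both halves of \omega.
```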

2 June, 2020 at 4:57 pm

AnonymousOnce we hypothesize the upper bounds of energy and mass of tiles P (eq. 27, 28), can we also assume the same lower bounds for the tree top (eq. 30,31)? The energy lower bounds seems reasonable for a tree top, but I am not sure about the mass lower bound since $\mu$ is a bound of mass density, not the total mass.

3 June, 2020 at 3:50 pm

Terence TaoNot in general, but it turns out that the tree selection process used later in the argument will basically generate trees that obey lower bounds of this form (for at least one of the mass and energy, if not both). The lower bounds are not of essential importance in this argument; they are present mostly for pedagogical purposes to illustrate why the tree estimate is in some sense “stronger” than the estimate for a single tree.

2 June, 2020 at 10:22 pm

RexIn the passage

“We also assume that we have the reverse bounds for the tree top:

\displaystyle |I_{T}|^{-1/2} \langle f, \phi_{P_T} \rangle \gtrsim \varepsilon \ \ \ \ \ (30)

and

\displaystyle |I_{T}|^{-1} \int_{\bf R} 1_{E}(x) \chi^{10}_{I_{T}}(x)\ dx \gtrsim \mu. \ \ \ \ \ (31)

It will be through a combination of both these lower and upper bounds that we can obtain a bound (19) that does not involve either {\varepsilon} or {\mu}.”

What does the notation here mean? That the implicit constants should be uniform over all trees?

Otherwise, there is only one tree top, and it seems this bound would be vacuous if we allow ourselves to insert any constant we like.

3 June, 2020 at 3:51 pm

Terence TaoImplied constants are assumed to be absolute if not subscripted by parameters, so indeed in this case I am claiming a constant independent of the choice of tree.

2 June, 2020 at 11:09 pm

Rex“From (29) and Exercise 11(iii) as before we have

\displaystyle |J \cap E| \lesssim \mu |J|

and so we can bound the expression \eqerf{j-sum} by

\displaystyle \lesssim \sum_{J \in {\mathcal J}: J \subset CI_T} \mu |J| \sup_{I: I \supset J} \frac{1}{|I|} \int_I |F|

which one can bound in terms of the Hardy-Littlewood maximal function”

\eqerf{j-sum} isn’t parsing here

[Corrected, thanks – T.]

3 June, 2020 at 3:56 am

RexIn the defining properties of Whitney decomposition in exercise 11, are the implicit constants in the inequalities supposed to be uniform over all trees (even though the decomposition depends on the tree)?

[Yes – T.]

3 June, 2020 at 9:35 am

AnonymousThe implicit constants should depend on each tree T, not uniform.

[This is incorrect – T.]

3 June, 2020 at 12:45 pm

AnonymousCan you say more about why ?

3 June, 2020 at 4:10 pm

Terence TaoFrom the properties of the Whitney decomposition, every in the sum either is contained in for some , or has size and is at distance from . In particular in the latter case there are only intervals of any given size. This is enough information to bound the sum.

3 June, 2020 at 4:50 pm

Alan ChangIt seems like there is currently a glitch with this blog post. Instead of seeing the class notes when I load this page, I see a copy of https://terrytao.wordpress.com/2009/03/07/infinite-fields-finite-fields-and-the-ax-grothendieck-theorem/

[Fixed, thanks – T.]

4 June, 2020 at 1:51 pm

AnonymousLHS(49) = ?

My attempt:

[There was a misprint in the notes regarding a misplaced conjugation sign, which has now been fixed – T.]

5 June, 2020 at 10:45 am

Anonymous: For the tree estimate, you mentioned that controlling the LHS is like controlling the maximal Calderón-Zygmund operator and the maximal Hilbert transform. Could you elaborate on this? It wasn’t clear from the lecture.

[Added more comments to this effect – T.]