This set of notes discusses aspects of one of the oldest questions in Fourier analysis, namely the nature of convergence of Fourier series.

If {f: {\bf R}/{\bf Z} \rightarrow {\bf C}} is an absolutely integrable function, its Fourier coefficients {\hat f: {\bf Z} \rightarrow {\bf C}} are defined by the formula

\displaystyle  \hat f(n) := \int_{{\bf R}/{\bf Z}} f(x) e^{-2\pi i nx}\ dx.

If {f} is smooth, then the Fourier coefficients {\hat f} are absolutely summable, and we have the Fourier inversion formula

\displaystyle  f(x) = \sum_{n \in {\bf Z}} \hat f(n) e^{2\pi i nx}

where the series here is uniformly convergent. In particular, if we define the partial summation operators

\displaystyle  S_N f(x) := \sum_{|n| \leq N} \hat f(n) e^{2\pi i nx}

then {S_N f} converges uniformly to {f} when {f} is smooth.

What if {f} is not smooth, but merely lies in an {L^p({\bf R}/{\bf Z})} class for some {1 \leq p \leq \infty}? The Fourier coefficients {\hat f} remain well-defined, as do the partial summation operators {S_N}. The question of convergence in norm is relatively easy to settle:

Exercise 1
  • (i) If {1 < p < \infty} and {f \in L^p({\bf R}/{\bf Z})}, show that {S_N f} converges in {L^p({\bf R}/{\bf Z})} norm to {f}. (Hint: first use the boundedness of the Hilbert transform to show that {S_N} is bounded in {L^p({\bf R}/{\bf Z})} uniformly in {N}.)
  • (ii) If {p=1} or {p=\infty}, show that there exists {f \in L^p({\bf R}/{\bf Z})} such that the sequence {S_N f} is unbounded in {L^p({\bf R}/{\bf Z})} (so in particular it certainly does not converge in {L^p({\bf R}/{\bf Z})} norm to {f}. (Hint: first show that {S_N} is not bounded in {L^p({\bf R}/{\bf Z})} uniformly in {N}, then apply the uniform boundedness principle in the contrapositive.)

The question of pointwise almost everywhere convergence turned out to be a significantly harder problem:

Theorem 2 (Pointwise almost everywhere convergence)
  • (i) (Kolmogorov, 1923) There exists {f \in L^1({\bf R}/{\bf Z})} such that {S_N f(x)} is unbounded in {N} for almost every {x}.
  • (ii) (Carleson, 1966; conjectured by Lusin, 1913) For every {f \in L^2({\bf R}/{\bf Z})}, {S_N f(x)} converges to {f(x)} as {N \rightarrow \infty} for almost every {x}.
  • (iii) (Hunt, 1967) For every {1 < p \leq \infty} and {f \in L^p({\bf R}/{\bf Z})}, {S_N f(x)} converges to {f(x)} as {N \rightarrow \infty} for almost every {x}.

Note from Hölder’s inequality that {L^2({\bf R}/{\bf Z})} contains {L^p({\bf R}/{\bf Z})} for all {p\geq 2}, so Carleson’s theorem covers the {p \geq 2} case of Hunt’s theorem. We remark that the precise threshold near {L^1} between Kolmogorov-type divergence results and Carleson-Hunt pointwise convergence results, in the category of Orlicz spaces, is still an active area of research; see this paper of Lie for further discussion.

Carleson’s theorem in particular was a surprisingly difficult result, lying just out of reach of classical methods (as we shall see later, the result is much easier if we smooth either the function {f} or the summation method {S_N} by a tiny bit). Nowadays we realise that the reason for this is that Carleson’s theorem essentially contains a frequency modulation symmetry in addition to the more familiar translation symmetry and dilation symmetry. This basically rules out the possibility of attacking Carleson’s theorem with tools such as Calderón-Zygmund theory or Littlewood-Paley theory, which respect the latter two symmetries but not the former. Instead, tools from “time-frequency analysis” that essentially respect all three symmetries should be employed. We will illustrate this by giving a relatively short proof of Carleson’s theorem due to Lacey and Thiele. (There are other proofs of Carleson’s theorem, including Carleson’s original proof, its modification by Hunt, and a later time-frequency proof by Fefferman; see Remark 18 below.)

— 1. Equivalent forms of almost everywhere convergence of Fourier series —

A standard technique to prove almost everywhere convergence results is by first establishing a weak-type estimate of an associated maximal function. For instance, the Lebesgue differentiation theorem is usually established with the assistance of the Hardy-Littlewood maximal inequality; see for instance this previous blog post. A remarkable observation of Stein, known as Stein’s maximal principle, allows one to reverse this implication in certain cases by exploiting a symmetry of the problem. Here is the principle specialised to the application of pointwise convergence of Fourier series, and also combined with a transference principle of Kenig and Tomas:

Proposition 3 (Equivalent forms of almost everywhere convergence) Let {1 \leq p \leq 2}. Then the following statements are equivalent:
  • (i) For every {f \in L^p({\bf R}/{\bf Z})}, one has {\lim_{N \rightarrow \infty} S_N f(x) = f(x)} for almost every {x \in {\bf R}/{\bf Z}}.
  • (ii) There does not exist {f \in L^p({\bf R}/{\bf Z})} such that {\sup_{N \geq 0} |S_N f(x)| = +\infty} for almost every {x \in {\bf R}/{\bf Z}}.
  • (iii) One has the maximal inequality

    \displaystyle  \| \sup_{N \geq 0} |S_N f| \|_{L^{p,\infty}({\bf R}/{\bf Z})} \lesssim_p \|f\|_{L^p({\bf R}/{\bf Z})} \ \ \ \ \ (1)

    for all smooth {f: {\bf R}/{\bf Z} \rightarrow {\bf C}}, where the weak {L^{p,\infty}} norm is defined as

    \displaystyle  \| F \|_{L^{p,\infty}({\bf R}/{\bf Z})} := \sup_{\lambda > 0} \lambda |\{ x \in {\bf R}/{\bf Z}: |F(x)| \geq \lambda \}|^{1/p}

    and {|E|} denotes the Lebesgue measure of a set {E} (which in this setting is a subset of the unit circle).
  • (iv) One has the maximal inequality

    \displaystyle  \| \sup_{N \in {\bf Z}} |P_N f| \|_{L^{p,\infty}({\bf R}/{\bf Z})} \lesssim_p \|f\|_{L^p({\bf R}/{\bf Z})} \ \ \ \ \ (2)

    for all smooth {f: {\bf R}/{\bf Z} \rightarrow {\bf C}}, where {P_N f} denotes the partial Fourier series

    \displaystyle  P_N f(x) = \sum_{n \leq N} \hat f(n) e^{2\pi i nx}.

  • (v) One has the maximal inequality

    \displaystyle  \| \sup_{N \in {\bf R}} |1_{D \leq N} f| \|_{L^{p,\infty}({\bf R})} \lesssim_p \|f\|_{L^p({\bf R})}

    for all {f \in {\mathcal S}({\bf R})}, where {1_{D \leq N}} denotes the Fourier multiplier operator

    \displaystyle  (1_{D \leq N} f)^\wedge(\xi) := 1_{\xi \leq N} \hat f(\xi).

Among other things, this proposition equates the qualitative property (i) of almost everywhere convergence to the quantitative property (iii) of a maximal inequality. This equivalence (first observed by Calderón) is similar in spirit to the uniform boundedness principle (see e.g. Corollary 1 of this previous blog post). The restriction {p \leq 2} is needed for just one implication (from (ii) to (iii)) in the arguments below, and arises due to the use of Khintchine’s inequality at one point. The equivalence of (iv) and (v) is part of a more general principle of transference that allows one to pass back and forth between periodic domains such as {{\bf R}/{\bf Z}} with non-periodic domains such as {{\bf R}} (or, on the Fourier side, between discrete domains {{\bf Z}} and continuous domains {{\bf R}}) if the estimates in question enjoy suitable scaling symmetries. We will use the formulation (v), as it enjoys the most symmetries.

Proof: We first show that (iii) implies (i). If (1) holds for all smooth {f}, then certainly for all finite {N_0} one has

\displaystyle  \| \sup_{0 \leq N \leq N_0} |S_N f| \|_{L^{p,\infty}({\bf R}/{\bf Z})} \lesssim_p \|f\|_{L^p({\bf R}/{\bf Z})}

for all smooth {f}, and hence for all {f \in L^p({\bf R}/{\bf Z})} by a standard limiting argument. Taking limits as {N_0 \rightarrow \infty} (using variants of Fatou’s lemma) we conclude that (1) holds for all {f \in L^p({\bf R}/{\bf Z})}. This implies in particular from the triangle inequality that

\displaystyle  \| \limsup_{N \rightarrow \infty} |S_N f-f| \|_{L^{p,\infty}({\bf R}/{\bf Z})} \lesssim_p \|f\|_{L^p({\bf R}/{\bf Z})}

for all {f \in L^p({\bf R}/{\bf Z})}. On the other hand, the left-hand side vanishes whenever {f} is smooth, since Fourier series converge uniformly in this case. Since smooth functions are dense in {L^p({\bf R}/{\bf Z})}, we conclude from standard limiting arguments that the left-hand side in fact vanishes for all {f \in L^p({\bf R}/{\bf Z})}, giving the claim (i).

Clearly (i) implies (ii). Now we assume that (iii) fails and use this to show that (ii) fails as well. From the failure of (iii) and monotone convergence, for any {A>0} one can find {f \in C^\infty({\bf R}/{\bf Z})}, a measurable subset {E} of {{\bf R}/{\bf Z}}, a finite {N_0}, and {\lambda > 0} such that

\displaystyle  \sup_{0 \leq N \leq N_0} |S_N f(x)| \geq \lambda \hbox{ for all } x \in E \ \ \ \ \ (3)

and such that

\displaystyle  \lambda |E|^{1/p} > A \|f\|_{L^p({\bf R}/{\bf Z})}. \ \ \ \ \ (4)

In particular, {E} has positive measure. By homogeneity we may normalise {\lambda=1}. At this stage, nothing prevents the measure of {E} from being much smaller than {1}; but we can exploit translation invariance to increase the measure of {E} to be comparable to {1} as follows. Let {n \geq 1} be the integer part of {1/|E|}. We claim that there exist translations {E+x_1,\dots,E+x_n} of {E} whose union has measure comparable to {1}:

\displaystyle  | \bigcup_{i=1}^n E + x_i| \sim 1. \ \ \ \ \ (5)

This is easiest to establish by the probabilistic method (which in this context we might call the random translation method). If we select {x_1,\dots,x_n} uniformly and independently at random we see that every point {x \in {\bf R}/{\bf Z}} will lie in a given translate {E + x_i} (or equivalently, that {x_i} lies in {x-E}) with probability {|E|}, hence

\displaystyle  {\mathbf P}( x \in \bigcup_{i=1}^n E + x_i ) = 1 - (1 - |E|)^n.

Integrating in {x} and using the Fubini-Tonelli theorem, we conclude that

\displaystyle  {\mathbf E}( |\bigcup_{i=1}^n E + x_i| ) = 1 - (1 - |E|)^n

and hence there exists deterministic choices of {x_1,\dots,x_n} for which

\displaystyle  |\bigcup_{i=1}^n E + x_i| \geq 1 - (1 - |E|)^n.

By definition of {n} the RHS is comparable to {1}, giving the claim (clearly the left-hand side cannot exceed {1}).

Now consider the randomised linear combination

\displaystyle  \tilde f(x) := \sum_{i=1}^n \epsilon_i f(x-x_i)

of translates of {f}, where {\epsilon_1,\dots,\epsilon_n} are random Bernoulli signs. From Khintchine’s inequality and the hypothesis {1 \leq p \leq 2} we have

\displaystyle  {\bf E} \| \tilde f \|_{L^p({\bf R}/{\bf Z})}^p \sim_p \| (\sum_{i=1}^n |f(\cdot-x_i)|^2)^{1/2} \|_{L^p({\bf R}/{\bf Z})}^p

\displaystyle  \leq \| (\sum_{i=1}^n |f(\cdot-x_i)|^p)^{1/p} \|_{L^p({\bf R}/{\bf Z})}^p

\displaystyle  = \sum_{i=1}^n \|f(\cdot-x_i)\|_{L^p({\bf R}/{\bf Z})}^p

\displaystyle  = n \|f\|_{L^p({\bf R}/{\bf Z})}^p

hence by construction of {n} and (4)

\displaystyle  {\bf E} \| \tilde f \|_{L^p({\bf R}/{\bf Z})}^p \lesssim_p A^{-p}. \ \ \ \ \ (6)

Now we study the behaviour of {\sup_{0 \leq N \leq N_0} |S_N \tilde f(x)|} when {x \in \bigcup_{i=1}^n E + x_i}. Since {S_N} is a convolution operator, it commutes with translations, and hence

\displaystyle  S_N \tilde f(x) = \sum_{i=1}^n \epsilon_i S_N f(x-x_i) \ \ \ \ \ (7)

for each {i}. On the other hand, from (3) we have

\displaystyle  \sup_{1 \leq i \leq n; 0 \leq N \leq N_0} |S_N f(x-x_i)| \geq 1

and hence there exists {0 \leq N_x \leq N_0} such that

\displaystyle  \sup_{1 \leq i \leq n} |S_{N_x} f(x-x_i)| \geq 1.

In particular, the square function

\displaystyle  Q_x := (\sum_{i=1}^n |S_{N_x} f(x-x_i)|^2)^{1/2}

is at least {1}. Meanwhile, from Khintchine’s inequality and (7) we have

\displaystyle  ({\bf E} |S_{N_x} \tilde f(x)|^q)^{1/q} \sim_q Q_x

for all {0 < q < \infty}. Applying the Paley-Zygmund inequality (setting {q=1,2}, for instance) we conclude that

\displaystyle  {\bf P}( |S_{N_x} \tilde f(x)| \gtrsim Q_x ) \gtrsim 1

(for suitable choices of implied constants), so in particular

\displaystyle  {\bf P}( \sup_{0 \leq N \leq N_0} |S_N \tilde f(x)| \gtrsim 1 ) \gtrsim 1

Integrating in {x} using (5), and applying the Fubini-Tonelli theorem, we conclude that

\displaystyle  {\bf E} |\{ \sup_{0 \leq N \leq N_0} |S_N \tilde f(x)| \gtrsim 1 \}| \gtrsim 1

hence by (6) one has

\displaystyle  {\bf E} |\{ \sup_{0 \leq N \leq N_0} |S_N \tilde f(x)| \gtrsim 1 \}| \gtrsim_p 1 + A^p {\bf E} \| \tilde f \|_{L^p({\bf R}/{\bf Z})}^p.

In particular, there exists a deterministic choice of signs {\epsilon_i} (and hence of {\tilde f}) for which

\displaystyle  |\{ \sup_{0 \leq N \leq N_0} |S_N \tilde f(x)| \gtrsim 1 \}| \gtrsim_p 1 + A^p \| \tilde f \|_{L^p({\bf R}/{\bf Z})}^p.

On the other hand, the left-hand side is at most {1}. We conclude that for every {A>0}, we can find a smooth function {f_A} with

\displaystyle  \|f_A \|_{L^p({\bf R}/{\bf Z})} \lesssim_p \frac{1}{A}

and a finite {N_A}, as well as a set {E_A \subset {\bf R}/{\bf Z}} of measure {|E_A| \sim_p 1}, such that

\displaystyle \sup_{0 \leq N \leq N_A} |S_N f_A(x)| \gtrsim 1

for all {x \in E_A}.

Applying this fact iteratively (each time choosing {A} to be sufficiently large depending on all previous choices), we can construct a sequence of smooth functions {f^{(m)}: {\bf R}/{\bf Z} \rightarrow {\bf C}}, finite {N^{(m)}}, and sets {E^{(m)}} for {m=1,2,\dots} such that

  • (a) {\|f^{(m)}\|_{L^p({\bf R}/{\bf Z})} \leq \frac{1}{2^m}} for all {m \geq 1}.
  • (b) {|E^{(m)}| \gtrsim_p 1} for all {m \geq 1}.
  • (c) One has

    \displaystyle \sup_{0 \leq N \leq N^{(m)}} |S_N f^{(m)}(x)| \geq m + \sum_{m' < m} \sup_{N \geq 0} \|S_N f^{(m')}\|_{L^\infty({\bf R}/{\bf Z})}

    for all {m\geq 1} and {x \in E^{(m)}} (note that the right-hand side is finite since the {f^{(m')}} are smooth for {m' < m}).
  • (d) {\sup_{0 \leq N \leq N^{(m')}} \|S_N f^{(m)}\|_{L^\infty({\bf R}/{\bf Z})} \leq \frac{1}{2^m}} for all {1 \leq m' < m} (note that the left-hand side is bounded by {O_{N^{(m')}}( \|f^{(m)}\|_{L^p({\bf R}/{\bf Z})} )}).
By randomly translating the {E^{(m)}} (and {f^{(m)}}), and using the Borel-Cantelli lemma, we may assume without loss of generality that almost every {x \in {\bf R}/{\bf Z}} lies in infinitely many of the {E^{(m)}}. If one then sets

\displaystyle  f := \sum_{m=1}^\infty f^{(m)}

we see that {f \in L^p({\bf R}/{\bf Z})}, and from the triangle inequality we see that for almost every {x \in {\bf R}/{\bf Z}} we have

\displaystyle  \sup_{0 \leq N \leq N^{(m)}} |S_N f(x)| \geq m - \sum_{\tilde m > m} \frac{1}{2^{\tilde m}}

for infinitely many {m}, which implies in particular that

\displaystyle  \sup_{N \geq 0} |S_N f(x)| = +\infty

for almost all {x}. This shows that (ii) must fail, as required.

From the identity

\displaystyle  S_N = P_N - P_{-N-1} \ \ \ \ \ (8)

and the triangle inequality we see that (iv) implies (iii). Conversely, suppose that (iii) holds. We will take a limit using frequency modulations. For any smooth {f}, we apply (iii) to the modulated function {e_{M} f}, where {M} is a natural number and {e_{M}(x) := e^{2\pi i Mx}}, to get

\displaystyle  \| \sup_{N \geq 0} |P_N(e_{M} f) - P_{-N-1}(e_{M} f)| \|_{L^{p,\infty}({\bf R}/{\bf Z})} \lesssim_p \|f\|_{L^p({\bf R}/{\bf Z})}.

Since {P_N(e_{M} f) = e_{M} P_{N-M} f}, we conclude that

\displaystyle  \| \sup_{N \geq 0} |P_{N-M} f - P_{-N-M-1} f| \|_{L^{p,\infty}({\bf R}/{\bf Z})} \lesssim_p \|f\|_{L^p({\bf R}/{\bf Z})}

restricting to the range {M-N_0 \leq N \leq M+N_0} for some given natural number {N_0}, we conclude for {M > N_0} that

\displaystyle  \| \sup_{-N_0 \leq N \leq N_0} |P_{N} f - P_{-N-2M-1} f| \|_{L^{p,\infty}({\bf R}/{\bf Z})} \lesssim_p \|f\|_{L^p({\bf R}/{\bf Z})}.

Since {f} is smooth, it has rapidly decreasing Fourier coefficients, which implies that {\sup_{-N_0 \leq N \leq N_0} P_{-N-2M-1} f} converges uniformly to zero as {M \rightarrow \infty}. We conclude that

\displaystyle  \| \sup_{-N_0 \leq N \leq N_0} |P_{N} f| \|_{L^{p,\infty}({\bf R}/{\bf Z})} \lesssim_p \|f\|_{L^p({\bf R}/{\bf Z})}

and then we obtain (iv) by monotone convergence.

Now we assume (iv) and work to establish (v). The idea here is to use a rescaling argument, viewing {{\bf R}} as the limit as {K \rightarrow \infty} of the large circle {{\bf R}/K{\bf Z}} (in physical space) or the fine lattice {K^{-1}{\bf Z}} (in frequency space).

By limiting arguments we may assume that {f} is compactly supported on some interval {[-A,A]}. Let {K>0} be a large scaling parameter, and consider the periodic function {f_K: {\bf R}/{\bf Z} \rightarrow {\bf C}} defined by

\displaystyle  f_K(x \hbox{ mod } 1) := \sum_{n \in {\bf Z}} f(K(x-n)).

For {K} large enough, this function is smooth and supported on the interval {[-A/K, A/K] \hbox{ mod } 1}, with {L^p} norm

\displaystyle  \|f_K \|_{L^p({\bf R}/{\bf Z})} = K^{-1/p} \| f\|_{L^p({\bf R})}.

The Fourier coefficients of {f_K} is given as

\displaystyle  \hat{f_K}(n) = K^{-1} \hat f(K^{-1} n)

so that

\displaystyle  P_N f_K(x) = K^{-1} \sum_{n \leq N} \hat f(K^{-1} n) e^{2\pi i n x}.

Applying (iv), we see that for any {\lambda > 0}, we have

\displaystyle  | \{ x \in [-1/2,1/2]: \sup_{N \in {\bf Z}} K^{-1} |\sum_{n \leq N} \hat f(K^{-1} n) e^{2\pi i n x}| \geq \lambda \} | \lesssim_p \lambda^{-p} K^{-1} \|f\|_{L^p({\bf R})}^p.

Rescaling {x} by {K}, we conclude that

\displaystyle  | \{ x \in [-K/2,K/2]: \sup_{N \in {\bf Z}} K^{-1} |\sum_{n \leq N} \hat f(K^{-1} n) e^{2\pi i K^{-1} n x}| \geq \lambda \} | \lesssim_p \lambda^{-p} \|f\|_{L^p({\bf R})}^p.

We can let {N} range over the reals rather than the integers as this does not affect the constraint {n \leq N}. Rescaling {N} by {K}, we see that for any compact intervals {I,J}, we have

\displaystyle  | \{ x \in I: \sup_{N \in J} K^{-1} |\sum_{\xi \in K^{-1} {\bf Z}: \xi \leq N} \hat f(\xi) e^{2\pi i \xi x}| \geq \lambda \} | \lesssim_p \lambda^{-p} \|f\|_{L^p({\bf R})}^p.

By uniform Riemann integrability and the rapid decrease of {\hat f}

\displaystyle  \lim_{K \rightarrow \infty} K^{-1} \sum_{\xi \in K^{-1} {\bf Z}: \xi \leq N} \hat f(\xi) e^{2\pi i \xi x} = \int_{-\infty}^N \hat f(\xi) e^{2\pi i \xi x}\ d\xi = 1_{D \leq N} f(x)

uniformly for {x \in I}, {N \in J}. We conclude that

\displaystyle  | \{ x \in I: \sup_{N \in J} |1_{D \leq N} f(x)| > \lambda \} | \lesssim_p \lambda^{-p} \|f\|_{L^p({\bf R})}^p.

By monotone convergence we may replace {I,J} with {{\bf R}}, and we then obtain (v).

Finally, we assume (v) and establish (iv). By a limiting argument it suffices to establish (iv) for trigonometric polynomials {f}, that is to say periodic functions whose Fourier coefficients are supported in {\{-A,\dots,A\}} for some natural number {A}. Let {\eta \in {\mathcal S}({\bf R})} be a non-zero Schwartz function with {\hat \eta} supported in {[-1,1]}, and for a given scaling parameter {R>0} let {f_R: {\bf R} \rightarrow {\bf C}} denote the Schwartz function

\displaystyle  f_R(x) := f(x \hbox{ mod } 1) \eta(x/R).

For {R} sufficiently large one easily checks that

\displaystyle  \|f_R\|_{L^p({\bf R})} \sim_{p,\eta} R^{1/p} \|f\|_{L^p({\bf R}/{\bf Z})}.

The Fourier transform of {f_R} can be calculated as

\displaystyle  \hat f_R(\xi) = R \sum_{n=-A}^A \hat f(n) \hat \eta( R(\xi-n) )

hence (for {R} large enough)

\displaystyle  1_{D \leq N+1/2}(D) f_R(x) = \sum_{n \leq N} \hat f(n) \eta(x/R) e^{2\pi i nx}

\displaystyle  = \eta(x/R) P_N f( x \hbox{ mod } 1)

and thus

\displaystyle  \eta(x/R) \sup_{N \in {\bf Z}} |P_N f( x \hbox{ mod } 1)| \leq \sup_{N' \in {\bf R}} |1_{D \leq N'}(D) f_R(x)|.

From (v) we conclude that for any {\lambda} we have

\displaystyle  | \{ x \in {\bf R}: \eta(x/R) \sup_{N \in {\bf Z}} |P_N f( x \hbox{ mod } 1)| \geq \lambda \} | \lesssim_{p,\eta} \lambda^{-p} R \|f\|_{L^p({\bf R}/{\bf Z})}^p.

For {R} large enough, the left-hand side is

\displaystyle  \gtrsim_\eta R |\{ x \in {\bf R}/{\bf Z}: \sup_{N \in {\bf Z}} |P_N f( x )| \geq c_\eta \lambda \} |

for some {c_\eta>0} depending on {\eta}. Dividing by {R} and replacing {\lambda} by {\lambda/c_\eta}, we obtain the claim (iv). \Box

Exercise 4 For {N \geq 1}, let {F_N} denote the Fejér summation operators

\displaystyle  F_N := \frac{1}{N+1} \sum_{N'=0}^N S_{N'}.

  • (i) For any {f \in L^1({\bf R}/{\bf Z})}, establish the pointwise bound

    \displaystyle  \sup_{N \geq 1} |F_N f(x)| \lesssim Mf(x)

    where {M} is the Hardy-Littlewood maximal function

    \displaystyle  Mf(x \hbox{ mod } 1) := \sup_{r>0} \frac{1}{2r} \int_{x-r}^{x+r} |f(y \hbox{ mod } 1)|\ dy.

  • (ii) Show that for {f \in L^1({\bf R}/{\bf Z})}, one has {\lim_{N \rightarrow \infty} F_N f(x) = f(x)} for almost all {x \in {\bf R}/{\bf Z}}.

Exercise 5 (Pointwise convergence of Fourier integrals) Let {1 < p < \infty} be such that the conclusion of Theorem 3(v) holds. Show that for any {f \in L^p({\bf R})}, one has {\lim_{N \rightarrow \infty} 1_{|D| \leq N} f(x) = f(x)} for almost all {x \in {\bf R}}, where {1_{|D| \leq N} f} is defined for Schwartz functions {f} by the formula

\displaystyle  1_{|D| \leq N} f(x) := \int_{-N}^N \hat f(\xi) e^{2\pi i x \xi}\ d\xi

and then extended to {f \in L^p({\bf R})} by density.

Exercise 6 Let {d \geq 2}. Suppose that {1 \leq p \leq 2} is such that one has the restriction estimate

\displaystyle  \| \hat f \|_{L^1(S^{d-1}, d\sigma)} \lesssim_{p,d} \|f\|_{L^p({\bf R}^d)}

for all Schwartz functions {f \in {\mathcal S}({\bf R}^d)}, where {d\sigma} denotes the surface measure on the sphere {S^{d-1}}. Conclude that

\displaystyle  \| \hat f \|_{L^{p,\infty}(S^{d-1},d\sigma)} \lesssim_{p,d} \|f\|_{L^p({\bf R}^d)}

for all Schwartz functions {f \in {\mathcal S}({\bf R}^d)}. (This observation is due to Bourgain.) In particular, by Marcinkiewicz interpolation, {R_{S^{d-1}}(p \rightarrow 1)} implies {R_{S^{d-1}}(p \rightarrow q)} for all {1 \leq q < p}. (Hint: adapt some parts of the argument used to get from (iii) to (i) in Proposition 3, using rotation invariance as a substitute for translation invariance. (But the translational symmetry of the restriction problem – more precisely, the ability to translate a function {f} in physical space without changing the absolute value of its Fourier transform – will also be useful.))

We are now ready to establish Kolmogorov’s theorem (Theorem 2(i)); our arguments are loosely based on the original construction of Kolmogorov (though he was not in possession at the time of the Stein maximal principle). In view of the equivalence between (ii) and (v) in Theorem 3, it suffices to show that the maximal operator

\displaystyle  f\mapsto \sup_{N \in {\bf R}} |1_{D \leq N} f|

fails to be of weak-type {(1,1)} on Schwartz functions. Recalling that the Hilbert transform

\displaystyle  Hf(x) := p.v. \int_{\bf R} \frac{f(y)}{x-y}\ dy

is also a Fourier multiplier operator

\displaystyle  H = -\pi i \mathrm{sgn}(D)

some routine calculations then show that

\displaystyle  1_{D \leq N} f(x) = \frac{1}{2} f(x) + \frac{1}{2\pi i} p.v. \int_{\bf R} \frac{f(y)}{x-y} e^{2\pi i N(x-y)}\ dy

for any Schwartz function {f}. By the triangle inequality, it then suffices to show that the maximal operator

\displaystyle  {\mathcal C} f(x) := \sup_{N \in {\bf R}} |p.v. \int_{\bf R} \frac{f(y)}{x-y} e^{2\pi i N(x-y)}\ dy|

fails to be of weak type {(1,1)} on Schwartz functions.

To motivate the construction, note from a naive application of the triangle inequality that

\displaystyle  {\mathcal C} f(x) \leq \int_{\bf R} \frac{|f(y)|}{|x-y|}\ dy.

If the function {x \mapsto \frac{1}{|x|}} was absolutely integrable, then by Young’s inequality we would conclude that the maximal operator {{\mathcal C}} was strong type {(1,1)}, and hence also weak type {(1,1)}. Thus any counterexample must somehow exploit the logarithmic divergence of the integral of {\frac{1}{|x|}}. However, there are two potential sources of cancellation that could ameliorate this divergence: the sign {\mathrm{sgn}(x-y)} of the Hilbert kernel {\frac{1}{x-y}}, and the phase {e^{2\pi i N(x-y)}}. But because of the supremum in {N}, we can select the frequency parameter {N} as we please, as long as it depends only on {x} and not on {y}. The idea is then to choose {N} (and the support of {f}) to remove both sources of cancellation as much as possible.

We turn to the details. Let {n} be a large natural number, and then select {n} widely separated frequency scales

\displaystyle  1 < N_1 < \dots < N_n.

In order to assist with removing cancellation in the phases {e^{2\pi i N(x-y)}} later, we will require these scales {N_1,\dots,N_n} to be integers. The precise choice of scales is not too important as long as they are widely separated and integer valued, but for sake of concreteness one could for instance set {N_j := 100^{j}}. Let {\varphi \in C^\infty_c({\bf R})} be a bump function of total mass {1} supported on {[-1,1]}, and let {f} be the Schwartz function

\displaystyle  f(x) := \sum_{j=1}^n N_j \varphi( N_j(x-j) ), \ \ \ \ \ (9)

thus {f} is an approximation (in a weak sense) to the sum of Dirac masses {\sum_{j=1}^n \delta_j}, with the frequency scale of the approximation to {\delta_j} increasing rapidly in {j}. We easily compute the {L^1} norm of {f}:

\displaystyle  \|f\|_{L^1({\bf R})} = \sum_{j=1}^n 1 = n. \ \ \ \ \ (10)

Now we estimate {{\mathcal C}f(x)} for {x} in the interval {[j_0+0.1, j_0+0.9]} for some natural number {n/3 \leq j_0 \leq 2n/3}; note the set of all such {x} has measure {\sim n}. In this range we will test the maximal operator at the frequency cutoff {N = N_{j_0}}:

\displaystyle  {\mathcal C} f(x) \geq |p.v. \int_{\bf R} \frac{f(y)}{x-y} e^{2\pi i N_{j_0}(x-y)}\ dy|.

As {f} is supported in {\bigcup_{j=1}^n [j-N_j^{-1}, j+N_j^{-1}]}, we see (for {n} large enough) that {x} avoids the support of {f} and we can replace the principal value integral with the ordinary integral. Substituting (9), we conclude that

\displaystyle  {\mathcal C} f(x) \geq |\sum_{j=1}^n \int_{{\bf R}} \frac{\varphi(t)}{x-j-N_j^{-1} t} e^{2\pi i N_{j_0}(x-j-N_j^{-1} t)}\ dt|.

As {N_{j_0}} is an integer, the phase {e^{-2\pi i N_{j_0} j}} is equal to {1}. We also cancel out the phase {e^{2\pi i N_{j_0} x}} as being independent of {j}, thus

\displaystyle  {\mathcal C} f(x) \geq |\sum_{j=1}^n \int_{{\bf R}} \frac{\varphi(t)}{x-j-N_j^{-1} t} e^{2\pi i \frac{N_{j_0}}{N_j} t}\ dt|.

For {j \leq j_0}, we exploit the oscillatory nature of the phase {e^{2\pi i \frac{N_{j_0}}{N_j} t}} through an integration by parts, leading to the bound

\displaystyle  \int_{\bf R} \frac{\varphi(t)}{x-j-N_j^{-1} t} e^{2\pi i \frac{N_{j_0}}{N_j} t}\ dt \lesssim_\varphi \frac{N_j}{N_{j_0}}

(one could even gain a factor of {\frac{1}{\langle j-j_0 \rangle}} here if desired, but we will not need it). Summing, we have

\displaystyle  \sum_{j=1}^{j_0} \int_{\bf R} \frac{\varphi(t)}{x-j-N_j^{-1} t} e^{2\pi i \frac{N_{j_0}}{N_j} t}\ dt \lesssim_\varphi 1. \ \ \ \ \ (11)

For {j > j_0}, we instead exploit the near-constant nature of the phase {e^{2\pi i \frac{N_{j_0}}{N_j} t}} by writing

\displaystyle  e^{2\pi i \frac{N_{j_0}}{N_j} t} = 1 + O( \frac{N_{j_0}}{N_j} )

and similarly

\displaystyle  \frac{1}{x-j-N_j^{-1} t} = \frac{1}{x-j} + O( \frac{1}{(j-j_0)^2})

to conclude that

\displaystyle  \int_{\bf R} \frac{\varphi(t)}{x-j-N_j^{-1} t} e^{2\pi i \frac{N_{j_0}}{N_j} t}\ dt = \frac{1}{j-j_0} + O( \frac{1}{(j-j_0)^2} + \frac{N_{j_0}}{N_j} ).

Summing and combining with (11), we conclude (from the rapidly increasing nature of the {N_j}) that

\displaystyle  \sum_{j=1}^n \int_{\bf R} \frac{\varphi(t)}{x-j-N_j^{-1} t} e^{2\pi i \frac{N_{j_0}}{N_j} t}\ dt = \log n + O_\varphi( 1 ),

and thus (for {n} large)

\displaystyle  |\{ x: {\mathcal C} f(x) \geq \frac{1}{2} \log n \}| \gtrsim n.

Comparing this with (10) we contradict the conclusion of Theorem 3(iv), giving the claim.

Remark 7 In 1926, Kolmogorov refined his construction to obtain a function {f \in L^1({\bf R}/{\bf Z})} whose Fourier sums {S_N f(x)} diverged everywhere (not just almost everywhere).

Exercise 8 (Radamacher-Menshov theorem)
  • (i) Let {f_1,\dots,f_n \in L^2(X,\mu)} be some square-integrable functions on a probability space {(X,\mu)}, with {n=2^K} a power of two. By performing a suitable Whitney type decomposition (similar to that used in Section 3 of Notes 1), establish the pointwise bound

    \displaystyle  \sup_{0 \leq N < n} |\sum_{1 \leq j \leq N} f_j(x)| \lesssim \sum_{0 \leq k < K} (\sum_{J: |J|=2^k} |\sum_{j \in J} f(x)|^2)^{1/2}

    where for each {k=0,\dots,K}, {J} ranges over dyadic intervals of the form {J = [j2^k+1, (j+1)2^k)} with {0 \leq j < 2^{K-k}}. If furthermore the {f_1,\dots,f_n} are orthogonal to each other, establish the maximal inequality

    \displaystyle  \| \sup_{0 \leq N \leq n} |\sum_{1 \leq j \leq N} f_j| \|_{L^2(X,\mu)} \lesssim (\log n) (\sum_{j=1}^n \|f_j\|_{L^2(X,\mu)}^2)^{1/2}.

  • (ii) If {f \in L^2({\bf R}/{\bf Z})} is a trigonometric polynomial with at most {n} non-zero coefficients for some {n \geq 1}, use part (i) to establish the bound

    \displaystyle  \| \sup_{N \in {\bf Z}} |P_N f| \|_{L^2({\bf R}/{\bf Z})} \lesssim (\log n) \|f\|_{L^2({\bf R}/{\bf Z})}.

  • (iii) If {f \in L^2({\bf R}/{\bf Z})} lies in the Sobolev space

    \displaystyle  H^s({\bf R}/{\bf Z}) := \{ f \in L^2({\bf R}/{\bf Z}): \sum_{j \in {\bf Z}} \langle j \rangle^{2s} |\hat f(j)|^2 < \infty \}

    for some {s>0}, use (ii) to show that {\lim_{N \rightarrow \infty} S_N f(x) = f(x)} for almost every {x \in {\bf R}}.

— 2. Carleson’s theorem —

We now begin the proof of Carleson’s theorem (Theorem 2(ii)), loosely following the arguments of Lacey and Thiele (we briefly comment on other approaches at the end of these notes). In view of Proposition 3, it suffices to establish the weak-type {(2,2)} bound

\displaystyle  \| \sup_{N \in {\bf R}} |1_{D \leq N} f| \|_{L^{2,\infty}({\bf R})} \lesssim \|f\|_{L^2({\bf R})}

for Schwartz functions {f}. Because of the supremum, the expression {\sup_{N \in {\bf R}} |1_{D \leq N} f|} depends sublinearly on {f} rather than linearly; however there is a trick to reduce matters to considering linear estimates. By selecting, for each {x}, {N(x)} to be a frequency which attains (or nearly attains) the supremal value of {|1_{D \leq N(x)} f(x)|}, it suffices to establish the linearised estimate

\displaystyle  \| 1_{D \leq N(X)} f \|_{L^{2,\infty}({\bf R})} \lesssim \|f\|_{L^2({\bf R})}

uniformly for all measurable functions {N: {\bf R} \rightarrow {\bf R}}, where {1_{D \leq N(X)}} is the operator

\displaystyle  1_{D \leq N(X)} f(x) = \int_{\bf R} 1_{\xi \leq N(x)} \hat f(\xi) e^{2\pi i x \xi}\ d\xi.

One can think of this operator {1_{D \leq N(X)}} as the (Kohn-Nirenberg) quantisation of the rough symbol {(x,\xi) \mapsto 1_{\xi \leq N(x)}}. Unfortunately this symbol is far too rough for us to be able to use pseudodifferential operator tools from the previous set of notes. Nevertheless, the “time-frequency analysis” mindset of trying to efficiently decompose phase space into rectangles consistent with the uncertainty principle will remain very useful.

The next step is to dualise the weak {L^{2,\infty}} norm to linearise the dependence on {f} even further:

Exercise 9 Let {1 < p < \infty}, let {(X,\mu)} be a {\sigma}-finite measure space, let {f: X \rightarrow {\bf C}} be a measurable function, and let {A>0}. Show that the following claims are equivalent (up to changes in the implied constants in the asymptotic notation):
  • (i) One has {\| f\|_{L^{p,\infty}({\bf R})} \lesssim_p A}.
  • (ii) For every subset {E} of {X} of finite measure, the function {f} is absolutely integrable on {E}, and

    \displaystyle  \int_E f(x)\ d\mu(x) \lesssim_p A \mu(E)^{1/p'}.

In view of this exercise, we see that it suffices to obtain the bound

\displaystyle  \int_{\bf R} 1_E(X) 1_{D \leq N(X)} f\ dx \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2} \ \ \ \ \ (12)

for all Schwartz {f}, all sets {E \subset {\bf R}} of finite measure, and all measurable functions {N: {\bf R} \rightarrow {\bf R}}. Actually only the restriction of {N} to {E} is relevant here, so one can view {N} as a function just on {E} if desired. The operator {1_E(X) 1_{D \leq N(X)}} can be viewed as the quantisation of the (very rough) symbol {(x,\xi) \mapsto 1_E(x) 1_{\xi \leq N(x)}}, that is to say the indicator function of the region lying underneath the graph {\{ (x,N(x)): x \in E \}} of {N}:

\displaystyle  1_E(X) 1_{D \leq N(X)} f(x) = \int_{\bf R} 1_E(x) 1_{\xi \leq N(x)} \hat f(\xi) e^{2\pi i x \xi}\ d\xi.

A notable feature of the estimate (12) is that it enjoys three different symmetries (or near-symmetries), each of which is “non-compact” in the sense that it is parameterised by a parameter taking values in a non-compact space such as {{\bf R}} or {(0,+\infty)}:

  • (i) (Translation symmetry) For any spatial shift {x_0 \in {\bf R}}, both sides of (12) remain unchanged if we replace {f(x)} by {\mathrm{Trans}_{x_0} f(x) := f(x-x_0)}, the set {E} by the translate {E+x_0}, and the function {N} by {x \mapsto N(x-x_0)}.
  • (ii) (Dilation symmetry) For any scaling factor {\lambda \in (0,+\infty)}, both sides of (12) become multiplied by the same scaling factor {\lambda^{1/2}} if we replace {f} by {\mathrm{Dil}_\lambda f(x) := \lambda^{-1/2} f(x/\lambda)}, {E} by the dilate {\lambda \cdot E := \{ \lambda x: x \in E \}}, and the function {N} by {x \mapsto \lambda^{-1} N(x/\lambda)}.
  • (iii) (Modulation symmetry) For any frequency shift {\xi_0 \in {\bf R}}, both sides of (12) remain (almost) unchanged if we replace {f(x)} by {\mathrm{Mod}_{\xi_0} f(x) := e^{2\pi i x \xi_0} f(x)}, do not modify the set {E}, and replace the function {N} by {x \mapsto N(x)+\xi_0}. (Technically the left-hand side changes because of an additional factor of {e^{2\pi i x \xi_0}}, but this factor can be handled for instance by generalising the indicator function cutoff {1_E} to a subindicator function cutoff {\chi_E} that has the pointwise bound {|\chi_E| \leq 1_E}; we will ignore this very minor issue here.)

Each of these symmetries corresponds to a different symmetry of phase space {{\bf R} \times {\bf R} = \{ (x,\xi): x,\xi \in {\bf R}\}}, namely spatial translation {(x,\xi) \mapsto (x+x_0,\xi)}, dilation {(x,\xi) \mapsto (\lambda x, \lambda^{-1} \xi)}, and frequency translation {(x,\xi) \mapsto (x,\xi+\xi_0)} respectively. As a general rule of thumb, if one wants to prove a delicate estimate such as (12) that is invariant with respect to one or more non-compact symmetries, then one should use tools that are similarly invariant (or approximately invariant) with respect to these symmetries. Thus for instance Littlewood-Paley theory or Calderón-Zygmund theory would not be suitable tools to use here, as they are only invariant with respect to translation and dilation symmetry but absolutely fail to have any modulation symmetry properties (these theories prescribe a privileged role to the frequency origin, or equivalently they isolate functions of mean zero as playing a particularly important role).

Besides the need to respect the symmetries of the problem, one of the main difficulties in establishing (12) is that the expression {1_{\xi \leq N(x)} \hat f(\xi)}, couples together the function {N} with the function {f} in a rather complicated way (via the frequency variable {\xi}). We would like to try to decouple this interaction by making {N} and {f} instead interact with simpler objects (such as “wave packets”), rather than being coupled directly to each other. To motivate the decomposition to use, we begin with a heuristic discussion. The first main idea is to temporarily work in the (non-invertible) coordinate system {(N(x),\xi)} of phase space rather than {(x,\xi)} in order to simplify the constraint {1_{\xi \leq N(x)}} to the simple geometric region of a half-plane (this coordinate system is of course a terrible choice for most of the other parts of the argument, but is the right system to use for the frequency decompositions we will now employ). In analogy to the Whitney type decompositions used in Notes 1, one can split

\displaystyle  1_{\xi \leq N(x)} = \sum_{\omega_- \sim \omega_+} 1_{\omega_-}(\xi) 1_{\omega_+}(N(x)) \ \ \ \ \ (13)

for almost all choices of {\xi} and {N(x)} (at least if {\xi, N(x)} have the same sign), where {\omega_-,\omega_+} range over pairs of dyadic intervals that are “close” in the sense that {|\omega_-|=|\omega_+|} and that {\omega_-} and {\omega_+} are not adjacent, but their parents are adjacent, and with {\omega_-} to the left of {\omega_+}. (Here it is convenient to work with half-open dyadic intervals {[j/2^k, (j+1)2^k)}, to avoid issues with overlap.) If one ignores the caveats and blindly substitutes in the decomposition (13), the expression in the left of (12) becomes

\displaystyle  \sum_{\omega_- \sim \omega_+} \int_{\bf R} 1_E(x) 1_{\omega_+}(N(x)) 1_{\omega_-}(D) f(x)\ dx.

To decouple further, we will try to decompose {1_{\omega_-}(D)} into “rank one” operators. More precisely, we manipulate

\displaystyle  1_{\omega_-}(D) f(x) = 1_{\omega_-}(D) 1_{\omega_-}(D)^* f(x)

\displaystyle  = (f * \widetilde{1^\vee_{\omega_-}}) * 1^\vee_{\omega-}(x)

\displaystyle  = \int_{\bf R} (\int_{\bf R} f(z) \overline{1^\vee_{\omega_-}(z-y)}\ dz) 1^\vee_{\omega-}(x-y)\ dy

\displaystyle  = \int_{\bf R} \langle f, \mathrm{Trans}_y 1^\vee_{\omega_-} \rangle \mathrm{Trans}_y 1^\vee_{\omega_-}(x)\ dy

where we use the notation {\tilde g(x) := \overline{g(-x)}}. It will be convenient to try to discretise this integral average. From the uncertainty principle, modifying {y} by {O(1/|\omega_-|)} should only modify {\mathrm{Trans}_y 1^\vee_{\omega_-}(x)} approximately by a phase, so the integral here is roughly constant at spatial scales {O(1/|\omega_-|)}. So we heuristically have

\displaystyle  1_{\omega-}(D) f(x) \approx \frac{1}{|\omega_-|} \sum_{y \in \frac{1}{|\omega_-|} {\bf Z}} \int_{\bf R} \langle f, \mathrm{Trans}_y 1^\vee_{\omega_-} \rangle \mathrm{Trans}_y 1^\vee_{\omega_-}(x).

If we now define a tile to be a rectangle {P} in phase space of the form

\displaystyle  P = I_P \times \omega_P

where {\omega_P, I_P} are dyadic intervals and with unit area {|P| = |I_P| |\omega_P| = 1}, we see that every {y} in the above sum is associated to a tile {P_- = [y,y+\frac{1}{|\omega_-|}] \times \omega_-}. The interval {\omega_+} is then similarly assocated to a nearby tile {P_+ = [y,y+\frac{1}{|\omega_-|}] \times \omega_+}, and we write {P_- \sim P_+} to indicate the relationship between the two tiles (they share the same spatial interval {I_{P_-} = I_{P_+}}, but {P_+} lies just above {P_-}). We can then approximately write the left-hand side of (12) as

\displaystyle  \sum_{P_- \sim P_+} \langle f, \varphi_{P_-} \rangle \langle \varphi_{P_-} 1_{\omega_{P_+}}(N), 1_E \rangle \ \ \ \ \ (14)

where

\displaystyle  \varphi_{P_-}(x) := |I_{P_-}|^{1/2} 1^\vee_{\omega_{P_-}}(x - y)

is an {L^2}-normalised “wave packet” that is roughly localised to {P_-} in phase space. This approximate form of (12) has achieved the goal of decoupling the function {f} from the data {E,N}, as they both now interact with the tile pair {P_-, P_+} rather than through each other. Note also that the set of tiles {P} obeys an approximate version of the three symmetries that (12) does. Firstly, the set of tiles is invariant under dilations {(x,\xi) \mapsto (\lambda x, \lambda^{-1} \xi)} if {\lambda} is a power of two; secondly, once one fixes the scales {|I_P|, |\omega_P|} of the tiles, the remaining set of tiles is invariant under spatial translations by integer multiples of the spatial scale {|I_P|}, and under frequency translations by integer multiples of {|\omega_P|}. (We will need the discrete and nested nature of the tiles {P} for some subsequent combinatorial arguments, and it turns out to be worthwhile to accept a slightly degraded form of the three basic symmetries of the problem in return for such a discretisation.)

We now make the above heuristic decomposition rigorous. For any dyadic interval {\omega = [\inf \omega, \inf \omega + |\omega|)}, let {\omega_- := [\inf \omega, \inf \omega + |\omega|/2)} denote the left child interval, and {\omega_+ := [\inf \omega + |\omega|/2, \inf \omega + |\omega|)} the right child interval. We fix a bump function {\eta} supported on {[0.1, 0.2]} normalised to have {L^2} norm {1}; henceforth we permit all implied constants in the asymptotic notation to depend on {\eta}. For each interval {\omega} let {\eta_\omega} denote the rescaled function

\displaystyle  \eta_\omega(\xi) := \eta( \frac{\xi - \inf \omega}{|\omega|} ),

noting that this is a bump function supported on {\omega_-}. We will establish the estimate

\displaystyle  \sum_{\omega} |\int_{\bf R} 1_E(X) 1_{\omega_+}(N(X)) |\eta_\omega|^2(D) f\ dx| \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2} \ \ \ \ \ (15)

where {\omega} ranges over all dyadic intervals. We assume (15) for now and see why it implies (12). The left-hand side of (15) is not quite dilation or frequency modulation invariant, but we can fix this by an averaging argument as follows. Applying the modulation invariance, we see for any {\xi_0 \in {\bf R}} that

\displaystyle  \sum_{\omega} |\int_{\bf R} 1_E(X) 1_{\omega_+}(N(X)+\xi_0) |\eta_\omega|^2(D) \mathrm{Mod}_{\xi_0} f\ dx| \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2};

since

\displaystyle |\eta_\omega|^2(D) \mathrm{Mod}_{\xi_0} = \mathrm{Mod}_{\xi_0} |\eta_{\omega-\xi_0}|^2(D)

we thus have

\displaystyle  \sum_{\omega} |\int_{\bf R} 1_E(X) 1_{\omega_+-\xi_0}(N(X)) |\eta_{\omega-\xi_0}|^2(D) f\ dx| \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}.

We temporarily truncate to a finite range of scales, and use the triangle inequality, to obtain

\displaystyle  \sum_{k=-K}^K \sum_{\omega: |\omega| = 2^{-k}} \int_{\bf R} 1_E(X) 1_{\omega_+-\xi_0}(N(X)) |\eta_{\omega-\xi_0}|^2(D) f\ dx \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}

for any finite {K}. For fixed {k}, the expression

\displaystyle  \sum_{\omega: |\omega| = 2^{-k}} \int_{\bf R} 1_E(X) 1_{\omega_+-\xi_0}(N(X)) |\eta_{\omega-\xi_0}|^2(D) f\ dx

is periodic in {\xi_0} with period {2^{-k}}, with average equal to

\displaystyle  2^k \int_{\bf R} \int_{\bf R} 1_E(X) 1_{[\zeta+2^{-k-1}, \zeta+2^{-k}]}(N(X)) |\eta_{[\zeta, \zeta+2^{-k}]}|^2(D) f\ dx\ d\zeta

which we can rewrite as

\displaystyle  \int_{\bf R} \int_{\bf R} 1_E(x) (2^k \int_{\bf R} 1_{[\zeta+2^{-k-1}, \zeta+2^{-k}]}(N(x)) |\eta(2^k(\xi-\zeta))|^2\ d\zeta) \hat f(\xi) e^{2\pi i x \xi}\ d\xi dx

which one can rewrite further (using the change of variables {\zeta = N(x) - 2^{-k} t}) as

\displaystyle  \int_{\bf R} \int_{\bf R} 1_E(x) \Psi( 2^k (N(x)-\xi) ) \hat f(\xi) e^{2\pi i x \xi}\ d\xi dx

where

\displaystyle  \Psi(\xi) := \int_{1/2}^1 |\eta( t - \xi )|^2\ dt.

Hence if we average over all {\xi_0} in (say) {[0,2^K]}, we conclude that

\displaystyle  \sum_{k=-K}^K \int_{\bf R} \int_{\bf R} 1_E(x) \Psi( 2^k (N(x)-\xi) ) \hat f(\xi) e^{2\pi i x \xi}\ d\xi dx \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}

and hence on sending {K} to infinity

\displaystyle  \sum_{k \in {\bf Z}} \int_{\bf R} \int_{\bf R} 1_E(x) \Psi( 2^k (N(x)-\xi) ) \hat f(\xi) e^{2\pi i x \xi}\ d\xi dx \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}

Using dilation symmetry, we also see that

\displaystyle  \sum_{k \in {\bf Z}} \int_{\bf R} \int_{\bf R} 1_E(x) \Psi( \lambda 2^k (N(x)-\xi) ) \hat f(\xi) e^{2\pi i x \xi}\ d\xi dx \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}

for any {\lambda > 0}. Averaging this for {\lambda \in [1,2]} with Haar measure {d\lambda/\lambda}, we conclude that

\displaystyle  \int_{\bf R} \int_{\bf R} 1_E(x) \sum_{k \in {\bf Z}} \int_1^2 \Psi( \lambda 2^k (N(x)-\xi) ) \frac{d\lambda}{\lambda} \hat f(\xi) e^{2\pi i x \xi}\ d\xi dx \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}.

But as {\Psi} is a bump function supported in {(0,+\infty)}, one has

\displaystyle \sum_{k \in {\bf Z}} \int_1^2 \Psi( \lambda 2^k (N(x)-\xi) ) \frac{d\lambda}{\lambda} = \sum_{k \in {\bf Z}} \int_{2^k}^{2^{k+1}} \Psi( \lambda (N(x)-\xi) ) \frac{d\lambda}{\lambda}

\displaystyle  = 1_{\xi \leq N(x)} \int_0^\infty \Psi(\lambda) \frac{d\lambda}{\lambda}.

The quantity {\int_0^\infty \Psi(\lambda) \frac{d\lambda}{\lambda}} is a non-zero constant, hence

\displaystyle  \int_{\bf R} \int_{\bf R} 1_E(x) 1_{\xi \leq N(x)} \hat f(\xi) e^{2\pi i x \xi}\ d\xi dx \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}

which is (12).

It remains to prove (15). As in the heuristic discussion, we approximately decompose the convolution {|\eta_\omega|^2(D) f} into a sum over tiles. We have

\displaystyle  |\eta_\omega|^2(D) f(x) = f * \tilde \eta_\omega^\vee * \eta_\omega^\vee(x)

\displaystyle  = \int_{\bf R} (\int_{\bf R} f(z) \overline{\eta_\omega^\vee}(z-y)\ dz) \eta_\omega^\vee(x-y)\ dy

\displaystyle  = \int_{\bf R} \langle f, \mathrm{Trans}_y \eta_\omega^\vee \rangle \mathrm{Trans}_y \eta_\omega^\vee\ dy. \ \ \ \ \ (16)

Motivated by this, we define as before a tile {P} to be a rectangle {I_P \times \omega_P} with {I_P, \omega_P} dyadic intervals with {|I_P| |\omega_P| = 1}; we also split each such tile into an upper half {P_+ := I_P \times (\omega_P)_+} and a lower half {P_- := I_P \times (\omega_P)_-}. We refer to {|I_P|} as the spatial scale of the tile, and the reciprocal {|\omega_P| = |I_P|^{-1}} as the frequency scale. For each tile {P} define the wave packet

\displaystyle  \phi_P := |I_P|^{1/2} \mathrm{Trans}_{\inf I_P} \eta_{\omega_P}^\vee,

which is a Schwartz function with Fourier support in {(\omega_P)_-} (in fact it is supported in {[\inf \omega_P + 0.1 |\omega_P|, \inf \omega_P + 0.2 |\omega_P|]}) that is normalised to have {L^2} norm {1} and is localised spatially near {I_P}, so morally it has “phase space support in {P_-}“. We will later establish the estimate

\displaystyle  \sum_P |\langle f, \phi_P \rangle \langle \phi_P 1_{(\omega_P)_+}(N), 1_E \rangle| \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2} \ \ \ \ \ (17)

for all {f \in {\mathcal S}({\bf R})} and sets {E} of finite measure (cf. (14)), where {P} ranges over the set of all tiles. For now, we show why this estimate implies (15) and hence (12). Just as (12) was obtained from (15) by averaging over dilation and frequency modulations, we shall recover (15) from (17) by averaging over spatial translations. As before, we first temporarily restrict the size range of {\omega_P} and use the triangle inequality to obtain

\displaystyle  \sum_{k=-K}^K \sum_{|\omega| = 2^{-k}} |\sum_{P: \omega_P = \omega} \langle f, \phi_P \rangle \langle \phi_P 1_{\omega_+}(N), 1_E \rangle| \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}.

Applying translation symmetry, we conclude that

\displaystyle  \sum_{k=-K}^K \sum_{|\omega| = 2^{-k}} |\sum_{P: \omega_P = \omega} \langle \mathrm{Trans}_{x_0} f, \phi_P \rangle \langle \phi_P 1_{\omega_+}(N(\cdot - x_0)), 1_{E+x_0} \rangle| \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}

for any {x_0 \in {\bf R}}. The left-hand side may be rewritten as

\displaystyle  \sum_{k=-K}^K \sum_{|\omega| = 2^{-k}} |\sum_{P: \omega_P = \omega} \langle f, \phi_{P+(x_0,0)} \rangle \langle \phi_{P+(x_0,0)} 1_{\omega_+}(N(\cdot)), 1_{E} \rangle|

where we extend the definition of {\phi_P} to translated tiles {P + (x_0,0)} in the obvious fashion. The expression inside the absolute values is periodic in {x_0} with period {2^k}, and averages to

\displaystyle  \int_{\bf R} \langle f, \mathrm{Trans}_y \eta_\omega^\vee \rangle \langle \mathrm{Trans}_y \eta_\omega^\vee 1_{\omega_+}(N(\cdot)), 1_{E} \rangle\ dy

which by (16) simplifies to

\displaystyle  \langle |\eta_\omega|^2(D) f 1_{\omega_+}(N(\cdot)), 1_{E} \rangle

and so on averaging in {x_0} and then sending {K} to infinity we recover (15).

It remains to establish (17). It is convenient to introduce the sets

\displaystyle  E_P^+ := \{ x \in E: N(x) \in (\omega_P)_+ \}

so that the target estimate (17) simplifies slightly to

\displaystyle  \sum_{P} |\langle f, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle| \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}. \ \ \ \ \ (18)

As advertised, we have now decoupled the influences of {f} and the influences of {E,N} (which determine the sets {E_P^+}), as these quantities now only directly interact with the wave packets {\phi_P}, rather than with each other. Moreover, {f} in some sense only interacts with the lower half {P_-} of the tile (as this is where {\phi_P} is concentrated), while {E} and {N} only interact with the upper half {P_+} of the tile.

One advantage of this “model” formulation of the problem is that one can naturally build up to the full problem by trying to establish estimates of the form

\displaystyle  \sum_{P \in {\mathcal P}} |\langle f, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle| \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}. \ \ \ \ \ (19)

where {{\mathcal P}} is some smaller set of tiles. For instance, if we can prove (19) for all finite collections {{\mathcal P}} of tiles, then by monotone convergence we recover the required estimate.

The key problem here is that tiles {P} have three degrees of freedom: scale, spatial location, and frequency location, corresponding to the three symmetries of dilation, spatial translation, and frequency modulation of the original estimate (12). But one can warm up by looking at families of tiles that only exhibit two or fewer degrees of freedom, in a way that slowly builds up the various techniques we will need to apply to establish the general case:

The case of a single tile We begin with the simplest case of a single tile (so that there are zero degrees of freedom):

\displaystyle  \langle f, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle \lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}. \ \ \ \ \ (20)

On the one hand, {\phi_P} is normalised in {L^2}, by Cauchy-Schwarz we have

\displaystyle  \langle f, \phi_P \rangle \lesssim \|f\|_{L^2({\bf R})}. \ \ \ \ \ (21)

From the construction of {\phi_P} we see that we have the pointwise bounds

\displaystyle  \phi_P(x) \lesssim |I_P|^{-1/2} \chi^{100}_{I_P}(x) \ \ \ \ \ (22)

where we use {\chi_I(x)} to denote the following variant of the indicator function {1_I(x)} that has a non-trivial tail:

\displaystyle  \chi_I(x) := \langle \frac{\mathrm{dist}(x,I)}{|I|} \rangle^{-1}.

In particular from Hölder’s inequality we have the bounds

\displaystyle  \langle \phi_P, 1_{E_P^+} \rangle \lesssim \| \phi_P \|_{L^1({\bf R})} \lesssim |I_P|^{1/2} \ \ \ \ \ (23)

and also

\displaystyle  \langle \phi_P, 1_{E_P^+} \rangle \lesssim \| \phi_P \|_{L^\infty({\bf R})} |E_P^+|

\displaystyle  \lesssim |I_P|^{-1/2} |E|

thanks to the trivial bound {|E_P^+| \leq |E|}, so on taking geometric means we have

\displaystyle  \langle \phi_P, 1_{E_P^+} \rangle \lesssim |E|^{1/2}

and the claim (20) follows.

The case of separated tiles of fixed scale Now we let {{\mathcal P}} be a collection of tiles all of a fixed spatial scale {|I_P| = 2^k} (so that {|\omega_P| = 2^{-k}} (so that we have the two parameters of spatial and frequency location, but not the scale parameter). Among other things, this makes the tiles {P} in {{\mathcal P}} essentially disjoint (i.e., disjoint ignoring sets of measure zero). This disjointness manifests itself in two useful ways. Firstly, we claim that we can improve the trivial bound

\displaystyle  \langle \phi_P, 1_{E_P^+} \rangle \lesssim |I_P|^{-1/2} |E|

obtained previously to the “mass estimate”

\displaystyle  \sum_{P \in {\mathcal P}} |I_P|^{1/2} |\langle \phi_P, 1_{E_P^+} \rangle| \lesssim |E|; \ \ \ \ \ (24)

secondly, we claim that we can improve the Cauchy-Schwarz bound (21) to the “energy estimate”

\displaystyle  \sum_{P \in {\mathcal P}} |\langle f, \phi_P \rangle|^2 \lesssim \|f\|_{L^2({\bf R})}^2. \ \ \ \ \ (25)

(Here we informally use “mass” to refer to expressions and estimates of {L^1} type involving the set {E} and graph {N}, and “energy” to refer to expressions and estimates of {L^2} type involving the function {f}.) If we assume (24), (25) for now, then by combining (24) with (23) we have

\displaystyle  \sum_{P \in {\mathcal P}} |\langle \phi_P, 1_{E_P^+} \rangle|^2 \lesssim |E|,

and then from (25) and Cauchy-Schwarz we obtain the required bound (19) in this case.

Now let us see why (24) is true. To motivate the argument, suppose that {\phi_P} had no tail outside of {I_P}, so that one could replace {\chi_{I_P}^{100}} to {1_{I_P}} in (22). Then would have

\displaystyle  |I_P|^{1/2} \langle \phi_P, 1_{E_P^+} \rangle \lesssim |\{ x \in E: (x,N(x)) \in P_+ \}|,

and as the tiles {P_+} are all essentially disjoint the claim (24) would then follow from summing in {P}, since each {x \in E} contributes to at most one of the sets {\{ x \in E: (x,N(x)) \in P_+ \}}. Now we have to deal with the contribution of the tails. We can bound

\displaystyle  \sum_{P \in {\mathcal P}} |I_P|^{1/2} |\langle \phi_P, 1_{E_P^+} \rangle| \leq \int_E |I_P|^{1/2} \sum_{P \in {\mathcal P}: N(x) \in (\omega_P)_+} |\phi_P(x)|\ dx.

For each {x \in E}, there is at most one dyadic interval {\omega} of the fixed length {2^{-k}} such that {N(x) \in \omega_+}. Thus in the above sum {\omega_P} is fixed, and only {I_P} can vary; from (22) we then see that {|I_P|^{1/2} \sum_{P \in {\mathcal P}: N(x) \in (\omega_P)_+} |\phi_P(x)|\ dx \lesssim 1}, giving (24).

Now we prove (25). The intuition here is that the essential disjointness of the tiles {P} make the {\phi_P} approximately orthogonal, so that (25) should be a variant of Bessel’s inequality. We exploit this approximate orthogonality by a {TT^*} method, which we perform here explicitly. By duality we have

\displaystyle  (\sum_{P \in {\mathcal P}} |\langle f, \phi_P \rangle|^2)^{1/2} = \sum_{P \in {\mathcal P}} \overline{c_P} \langle f, \phi_P \rangle

\displaystyle  = \langle f, \sum_{P \in {\mathcal P}} c_P \phi_P \rangle

for some coefficients {c_P} with {\sum_{P \in {\mathcal P}} |c_P|^2 = 1}, so by Cauchy-Schwarz it suffices to show that

\displaystyle  \| \sum_{P \in {\mathcal P}} c_P \phi_P \|_{L^2({\bf R})}^2 \lesssim 1.

The left-hand side expands as

\displaystyle  \sum_{P, P' \in {\mathcal P}} c_P \overline{c_{P'}} \langle \phi_P, \phi_{P'} \rangle. \ \ \ \ \ (26)

From the Fourier support of {\phi_P, \phi_{P'}} we see that the inner product {\langle \phi_P, \phi_{P'} \rangle} vanishes unless the intervals {(\omega_P)_-, (\omega_{P'})_-} overlap which by the equal sizes of {\omega_P, \omega_{P'}} force {\omega_P = \omega_{P'}}. In this case we can use (22) to bound the inner product by

\displaystyle  \langle \phi_P, \phi_{P'} \rangle \lesssim \langle \frac{\mathrm{dist}(I_P, I_{P'})}{|I_P|} \rangle^{-100}

and then a routine application of Schur’s test gives (26). This establishes (25), giving (19) in the case of tiles of equal dimensions.

The case of a regular {-}-tree

Now we attack some cases where the tiles {P} can vary in scale. In phase space, a key geometric difficulty now arises from the fact that tiles may start partially overlapping each other, in contrast to the previous case in which the essential disjointness of the tile set was crucial in establishing the key estimates (24), (25). However, because we took care to restrict the intervals {I_P, \omega_P} of the tiles to be dyadic, there are only a limited number of ways in which two tiles can overlap. Given two rectangles {R = I_R \times \omega_R} and {R' = I_{R'} \times \omega_{R'}}, we define the relation {R \leq R'} if {I_R \subset I_{R'}} and {\omega_R \supset \omega_{R'}}; this is clearly a partial order on rectangles. The key observation is as follows: if two tiles {P, P'} overlap, then either {P \leq P'} or {P \geq P'}. Similarly if {P, P'} are replaced by their upper tiles {P_+, P'_+} or by their lower tiles {P_-, P'_-}. Note that if {P, P'} are tiles with {P \leq P'}, then one of {P_+ \leq P'_+} or {P_- \leq P'_-} holds (and the only way both inequalities can hold simultaneously is if {P=P'}).

As was first observed by Fefferman, a key configuration of tiles that needs to be understood for these sorts of problems is that of a tree.

Definition 10 Let {P_T} be a tile. A tree with top {P_T} is a collection {T} of tiles {P} with the property that {P \leq P_T} for all {P \in T}. (For minor technical reasons it is convenient to not require the top {P_T} to actually lie in the tree {T}, though this is often the case.) We write {I_T = I_{P_T}} for the spatial support of the tree, and {\omega_T = \omega_{P_T}} for the frequency support of the tree top. If we in fact have {P_+ \leq (P_T)_+} for all {P \in T}, we say that {T} is a {+}-tree; similarly if {P_- \leq (P_T)_-} for all {P \in T}, we say that {T} is a {-}-tree. (Thus every tree can be partitioned into a {+}-tree and a {-}-tree with the same top as the original tree.)

The tiles in a tree can vary in scale and in spatial location, but once these two parameters are given, the frequency location is fixed, so a tree can again be viewed as a “two-parameter” subfamily of the three-parameter family of tiles.

We now prove (19) in the case when {{\mathcal P}} is a {-}-tree {T}, thus {P_- \leq (P_T)_-} for all {P \in T}. Here, the {\langle f, \phi_P \rangle} factors will all “collide” with each other and there will be no orthogonality to exploit here; on the other hand, there will be a lot of “disjointness” in the {E_P^+} that can be exploited instead.

To illustrate the key ideas (and to help motivate the arguments for the general case) we will also make the following “regularity” hypotheses: there exists two quantities {\varepsilon, \mu > 0} (which we will refer to as the energy and mass of the tree respectively) for which we have the upper bounds

\displaystyle  |I_P|^{-1/2} \langle f, \phi_P \rangle \lesssim \varepsilon \ \ \ \ \ (27)

and

\displaystyle  |I_P|^{-1} \int_{\bf R} 1_{E}(x) \chi^{10}_{I_P}(x)\ dx \lesssim \mu \ \ \ \ \ (28)

for all {P \in T}; informally, these estimates assert that {f} is size {O(\varepsilon)} “on average” on the tiles in the tree, and similarly that {E} has density {O(\mu)} on all tiles in the tree. (These are slightly oversimplified versions of the energy and mass concept; we will refine these notions later.) For technical reasons we also need to generalise (28) to

\displaystyle  |I_{P'}|^{-1} \int_{\bf R} 1_{E}(x) \chi^{10}_{I_{P'}}(x)\ dx \lesssim \mu \ \ \ \ \ (29)

for any tile {P'} with {P' \geq P} for some {P \in T}. (Informally, (29) asserts that a sort of “Hardy-Littlewood maximal function” of {1_E} is bounded by {O(\mu)} on the tree.)

We also assume that we have the reverse bounds for the tree top:

\displaystyle  |I_{T}|^{-1/2} \langle f, \phi_{P_T} \rangle \gtrsim \varepsilon \ \ \ \ \ (30)

and

\displaystyle  |I_{T}|^{-1} \int_{\bf R} 1_{E}(x) \chi^{10}_{I_{T}}(x)\ dx \gtrsim \mu. \ \ \ \ \ (31)

It will be through a combination of both these lower and upper bounds that we can obtain a bound (19) that does not involve either {\varepsilon} or {\mu}.

We will use (27), (28), (29) to establish the tree estimate

\displaystyle  \sum_{P \in T} |\langle f, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle| \lesssim \varepsilon \mu |I_T|. \ \ \ \ \ (32)

Note from (30) and Cauchy-Schwarz that

\displaystyle  \varepsilon \lesssim |I_T|^{-1/2} \|f\|_{L^2}

and from (31) and Cauchy-Schwarz one similarly has

\displaystyle  \mu \lesssim |I_T|^{-1/2} |E|^{1/2}

and so (32) recovers the desired estimate (19).

It remains to establish the tree estimate (32). It will be convenient to use the tree {T} to partition the real line {{\bf R}} into dyadic intervals {J} that are naturally “adapted to” the geometry of the tree (or more precisely to the spatial intervals {\{ I_P: P \in T\}} of the tree) in a certain way (in a manner reminiscent of a Whitney decomposition).

Exercise 11 (Whitney-type decomposition associated to a tree) Let {T} be a non-empty tree. Show that there exists a family {{\mathcal J}} of dyadic intervals with the following properties:
  • (i) The intervals {J} in {{\mathcal J}} form a partition of {{\bf R}} (up to sets of measure zero).
  • (ii) For each {J \in {\mathcal J}} and any {P \in T} with {|I_P| < |J|}, we have {\mathrm{dist}(I_P,J) \gtrsim |J|}.
  • (iii) For each {J \in {\mathcal J}}, there exists {P \in T} with {|I_P| \lesssim |J|} and {\mathrm{dist}(I_P,J) \lesssim |J|}.
(Hint: one can choose {{\mathcal J}} to be the collection of all dyadic intervals {J} whose dilate {3J := [\inf J - |J|, \inf J + 2|J|)} does not contain any {I_P, P \in T}, and which is maximal with respect to set inclusion.)

We can of course assume that the tree {T} is non-empty, since (32) is trivial for empty sets of tiles. We apply the partition from Exercise 11. By the triangle inequality, we can bound the left hand side of (32) by

\displaystyle  \sum_{J \in {\mathcal J}} \sum_{P \in T} |\langle f, \phi_P \rangle| \int_J |\phi_P| 1_{E_P^+}

which by (27), (22) may be bounded by

\displaystyle  \lesssim \varepsilon \sum_{J \in {\mathcal J}} \sum_{P \in T} \int_J \chi_{I_P}^{100} 1_{E_P^+}.

We first dispose of the narrow tiles {P} in which {|I_P| < |J|}. By Exercise 11(ii) this forces {\mathrm{dist}(I_P, J) \gtrsim |J|}. From (28) we have

\displaystyle  \int_J \chi_{I_P}^{100} 1_{E_P^+} \lesssim \mu |J| (|I_P| / \mathrm{dist}(I_P,J))^{20}

\displaystyle  \lesssim \mu |J| (|I_P|/|J|)^{10} (|I_P| / \mathrm{dist}(I_P,J))^{10}

(say). For each fixed spatial scale {|I_P| = 2^k \leq |I_T|}, the intervals {I_P} in the tree {T} are all essentially disjoint, so a routine calculation then shows

\displaystyle  \sum_{P\in T: |I_P| = 2^k} (|I_P| / \mathrm{dist}(I_P,J))^{10} \lesssim |J|/2^k

(say), so that

\displaystyle  \sum_{P \in T: |I_P| < |J|} \int_J \chi_{I_P}^{100} 1_{E_P^+} \lesssim \mu |J| \sum_{2^k \leq |I_T|, c|J|} (2^k/|J|)^{9}

\displaystyle  \lesssim \mu |J| \min( 1, |I_T|/|J| )^{9}

which from Exercise 11(ii) implies that the contribution of the {|I_P| < |J|} case to (32) is acceptable.

Now we consider the wide tiles in which {|I_P| \geq |J|}. From Exercise 11(ii) this case is only possible if {|J| \leq |I_T|} and {\mathrm{dist}(J, I_T) \lesssim |I_T|}. Thus the {J} are now restricted to an interval of length {O(|I_T|)}, and it will suffice to establish the local estimate

\displaystyle  \sum_{P \in T: |I_P| \geq |J|} \int_J \chi_{I_P}^{100} 1_{E_P^+} \lesssim \mu |J|

for each {J}. Note that for each fixed spatial scale {2^k}, there is at most one choice {\omega_k} of frequency interval {\omega_P} with {P \in T} and {|\omega_P| = 2^{-k}}, thus for fixed {k} the set {E_P^+ = \{ x \in {\bf R}: N(x) \in \omega_P^+\} = \{ x \in {\bf R}: N(x) \in \omega_k^+\}} is independent of {P}. We may then sum in {P} for each such scale to conclude

\displaystyle  \sum_{P \in T: |I_P| \geq |J|} \int_J \chi_{I_P}^{100} 1_{E_P^+} \lesssim \sum_k \int_J 1_{E_k^+}.

Now we make the crucial observation that in a {-}-tree {T}, the intervals {\omega_k^+} are all essentially disjoint, hence the {E_k^+} are disjoint as well. As these sets are also contained in {E}, we conclude that

\displaystyle \sum_k \int_J 1_{E_k^+} \lesssim \int_J 1_E.

From Exercise 11(iii) and (29) (choosing a tile {P'} with spatial scale {\sim |J|} and within {O(|J|)} of {J}, and with {P' \geq P} for the tile {P} provided by Exercise 11(iii)) we have

\displaystyle \int_J 1_E \lesssim \mu |J|

giving the claim.

The case of a regular {+}-tree

We now complement the previous case by establishing (19) for (certain types of) {+}-trees {T}. The situation is now reversed: there is a lot of “collision” in the {E_P^+}, but on the other hand there is now some “orthogonality” in the {\langle f, \phi_P \rangle} that can be exploited.

As before we will assume some regularity on the {+}-tree {T}, namely that there exist {\varepsilon,\mu>0} for which one has the upper bounds

\displaystyle  |I_P|^{-1/2} (\sum_{P' \in T: P' \leq P} |\langle f, \phi_{P'} \rangle|^2)^{1/2} \lesssim \varepsilon \ \ \ \ \ (33)

for all {P \in T} (note this is slightly stronger than (27)), as well as the bound (29) for any tile {P'} with {P' \geq P} for some {P \in T}. We complement this with the matching lower bounds

\displaystyle  |I_{T}|^{-1/2} (\sum_{P \in T} |\langle f, \phi_{P} \rangle|^2)^{1/2} \gtrsim \varepsilon \ \ \ \ \ (34)

and (31).

As before we will focus on establishing the tree estimate (32). From (31) and Cauchy-Schwarz as before we have

\displaystyle  \mu \lesssim |I_T|^{-1/2} |E|^{1/2}.

As we now have a {+}-tree, the tiles {P_-, P \in T} become disjoint (up to null sets), and we can obtain an almost orthogonality estimate:

Exercise 12 (Almost orthogonality) For any {+}-tree {T}, show that

\displaystyle  \|\sum_{P \in T} c_P \phi_P \|_{L^2({\bf R})}^2 \lesssim \sum_{P \in T} |c_P|^2

for all complex numbers {c_P}, and use this to deduce the Bessel-type inequality

\displaystyle  \sum_{P \in T} |\langle f, \phi_{P} \rangle|^2 \lesssim \|f\|_{L^2({\bf R})}^2.

From this exercise and (34) we see that

\displaystyle  \varepsilon \lesssim |I_T|^{-1/2} \|f\|_{L^2({\bf R})}

and so the desired bound (19) will follow from the tree estimate (32).

In this case it will be convenient to linearise the sum to remove the absolute value signs; more precisely, to show (32) it suffices to show that

\displaystyle  \sum_{P \in T} \epsilon_P \langle f, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle \lesssim \varepsilon \mu |I_T|

for any complex numbers {\epsilon_P} of magnitude {1}. Again we may assume that the tree {T} is non-empty, and use the partition {{\mathcal J}} from Exercise 11, to split the left-hand side as

\displaystyle  \sum_{J \in {\mathcal J}} \int_J \sum_{P \in T} \epsilon_P \langle f, \phi_P \rangle \phi_P 1_{E_P^+}.

The contribution of the narrow tiles {|I_P| < |J|} can be disposed of as before without any additional difficulty, so we focus on estimating the contribution

\displaystyle  \sum_{J \in {\mathcal J}} \int_J \sum_{P \in T: |I_P| \geq |J|} \epsilon_P \langle f, \phi_P \rangle \phi_P 1_{E_P^+}

of the wide tiles. As before, in order for this sum to be non-empty {J} has to be contained in an {O(|I_T|)} neighbourhood {CI_T} of {I_T}.

The main difficulty here is the dependence of {E_P^+} on {P}. We rewrite

\displaystyle  1_{E_P^+}(x) = 1_E(x) 1_{N(x) \in \omega_P^+}

so that the above expression can be written as

\displaystyle  \sum_{J \in {\mathcal J}} \int_{J \cap E} \sum_{P \in T: |I_P| \geq |J|; N(x) \in \omega_P^+} \epsilon_P \langle f, \phi_P \rangle \phi_P(x)\ dx.

Now for a key geometric observation: the intervals {\omega_P^+} are nested (and decrease when {|I_P|} increases), so the condition {N(x) \in \omega_P^+} is equivalent to a condition of the form {|I_P| \leq R(x)} for some scale {R(x)} depending on {x}. Thus the above sum can be written as

\displaystyle  \sum_{J \in {\mathcal J}} \int_{J \cap E} \sum_{P \in T: |J| \leq |I_P| \leq R(x)} \epsilon_P \langle f, \phi_P \rangle \phi_P(x)\ dx. \ \ \ \ \ (35)

One can bound the integrand here by a “maximal Calderón-Zygmund operator”

\displaystyle  \sup_R |\sum_{P \in T: |J| \leq |I_P| \leq R(x)} \epsilon_P \langle f, \phi_P \rangle \phi_P(x)|

which is basically a sup over truncations of the “(modulated) pseudodifferential operator”

\displaystyle  f \mapsto \sum_{P \in T: |J| \leq |I_P|} \epsilon_P \langle f, \phi_P \rangle \phi_P.

The point of this formulation is that the integrand can now be expressed as a sort of “Littlewood-Paley projection” of the function

\displaystyle  F(x) := \sum_{P \in T} \epsilon_P \langle f, \phi_P \rangle \langle \phi_P(x)

to the region of frequency space corresponding to those intervals {\omega_P^+} with {|J| \leq |I_P| \leq R}:

Exercise 13 Establish the pointwise estimate

\displaystyle  \sup_R |\sum_{P \in T: |J| \leq |I_P| \leq R} \epsilon_P \langle f, \phi_P \rangle \phi_P(x)| \lesssim \sup_{I: I \supset J} \frac{1}{|I|} \int_I |F|

for all {x \in J} where {I} ranges over all intervals (not necessarily dyadic) containing {J}.

From (29) and Exercise 11(iii) as before we have

\displaystyle  |J \cap E| \lesssim \mu |J|

and so we can bound the expression (35) by

\displaystyle  \lesssim \sum_{J \in {\mathcal J}: J \subset CI_T} \mu |J| \sup_{I: I \supset J} \frac{1}{|I|} \int_I |F|

which one can bound in terms of the Hardy-Littlewood maximal function {MF} of {F}, followed by Cauchy-Schwarz and the Hardy-Littlewood inequality, and finally Exercise 12, as

\displaystyle  \lesssim \sum_{J \in {\mathcal J}: J \subset CI_T} \mu \int_J MF

\displaystyle  \lesssim \mu \int_{CI_T} MF

\displaystyle  \lesssim \mu |I_T|^{1/2} \| MF \|_{L^2({\bf R})}

\displaystyle  \lesssim \mu |I_T|^{1/2} \| F \|_{L^2({\bf R})}

\displaystyle  \lesssim \mu |I_T|^{1/2} (\sum_{P \in T} |\langle f, \phi_P \rangle|^2)^{1/2}.

On the other hand, from (33) we have

\displaystyle  \sum_{P \in T: P \leq P_*} |\langle f, \phi_P \rangle|^2 \leq \varepsilon^2 |I_{P_*}|

for every {P_* \in T}. By grouping the tiles in {T} according to their maximal elements {P_*} (which necessarily have essentially disjoint spatial intervals) and applying the above inequality to each such group and summing, we conclude that

\displaystyle  \sum_{P \in T} |\langle f, \phi_P \rangle|^2 \leq \varepsilon^2 |I_T|

and the tree estimate (32) follows.

The general case

We are now ready to handle the general case of an arbitrary finite collection {{\mathcal P}} of tiles. Motivated by the previous discussion, we define two quantities:

Definition 14 (Energy and mass) For any non-empty finite collection {{\mathcal P}} of tiles, we define the energy {\mathrm{Energy}({\mathcal P})} to be the quantity

\displaystyle  \mathrm{Energy}({\mathcal P}) := \sup_T |I_T|^{-1/2} (\sum_{P \in T} |\langle f, \phi_P \rangle|^2)^{1/2}

where {T} ranges over all {+}-trees in {{\mathcal P}}, and the mass {\mathrm{Mass}({\mathcal P})} to be the quantity

\displaystyle  \mathrm{Mass}({\mathcal P}) := \sup_{P \in {\mathcal P}} \sup_{P' \geq P} |I_{P'}|^{-1} \int_{E_{P'}} \chi_{I_{P'}}^{10}

where {E_{P'}} is the set

\displaystyle  E_{P'} := \{ x \in E: N(x) \in \omega_{P'} \}

(thus for instance {E_{P'}^+ \subset E_{P'} \subset E}). By convention, we declare the empty set of tiles to have energy and mass equal to zero.

Note here that the definition of mass has been modified slightly from previous arguments, in that we now use {E_{P'}} instead of {E}. However, this turns out to be an acceptable modification, in the sense that we still continue to have the analogue of (32):

Exercise 15 (Tree estimate) If {T} is a tree, show that

\displaystyle \sum_{P \in T} |\langle f, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle| \lesssim \mathrm{Energy}(T) \mathrm{Mass}(T) |I_T|.

Since {\chi_{I_{P'}}^{10}} has an {L^1} norm of {O(|I_{P'}|)}, we also have the trivial bound

\displaystyle  \mathrm{Mass}({\mathcal P}) \lesssim 1 \ \ \ \ \ (36)

for any finite collection of tiles {{\mathcal P}}.

The strategy is now to try to partition an arbitrary family {{\mathcal P}} of tiles into collections of disjoint trees {T} (or “forests”, if you will) whose energy {\mathrm{Energy}(T)}, mass {\mathrm{Mass}(T)}, and spatial scale {|I_T|} are all under control, apply Exercise 15 to each tree, and sum. To do this we rely on two key selection results, which are vaguely reminiscent of the Calderón-Zygmund decomposition:

Proposition 16 (Energy selection) Let {{\mathcal P}} be a finite collection of tiles with

\displaystyle  \mathrm{Energy}({\mathcal P}) \leq \varepsilon \ \ \ \ \ (37)

for some {\varepsilon > 0}. Then one can partition {{\mathcal P}} into a collection {{\mathcal T}} of disjoint trees {T} with

\displaystyle  \sum_{T \in {\mathcal T}} |I_T| \lesssim \varepsilon^{-2} \|f\|_{L^2({\bf R})}^2

together with a remainder set {{\mathcal P}'} with

\displaystyle  \mathrm{Energy}({\mathcal P}') \leq \frac{1}{2} \varepsilon.

Proposition 17 (Mass selection) Let {{\mathcal P}} be a finite collection of tiles with

\displaystyle  \mathrm{Mass}({\mathcal P}) \leq \mu

for some {\mu > 0}. Then one can partition {{\mathcal P}} into a collection {{\mathcal T}} of disjoint trees {T} with

\displaystyle  \sum_{T \in {\mathcal T}} |I_T| \lesssim \mu^{-1} |E|

together with a remainder set {{\mathcal P}'} with

\displaystyle  \mathrm{Mass}({\mathcal P}') \leq \frac{1}{2} \mu.

(In these propositions, “disjoint” means that any given tile {P} belongs to at most one of the trees {T} in {{\mathcal T}}; but the tiles in one tree are allowed to overlap the tiles in another tree.)

Let us assume these two propositions for now and see how these (together with Exercise 15) establishes the required estimate (19) for an arbitrary collection {{\mathcal P}} of tiles. We may assume without loss of generality that {\|f\|_{L^2}} and {|E|} are non-zero. Rearranging the above two propositions slightly, we see that if {{\mathcal P}_n} is a finite collection of tiles such that

\displaystyle  \mathrm{Energy}({\mathcal P}_{n}) \leq 2^{-n/2} \|f\|_{L^2({\bf R})}; \quad \mathrm{Mass}({\mathcal P}_n) \leq 2^{-n} |E| \ \ \ \ \ (38)

for some integer {n} then after applying Proposition 16 followed by Proposition 17, we can partition {{\mathcal P}_n} into a disjoint collection {{\mathcal T}_n} of trees {T} with

\displaystyle  \sum_{T \in {\mathcal T}_n} |I_T| \lesssim 2^n \ \ \ \ \ (39)

together with a remainder {{\mathcal P}_{n+1}} with

\displaystyle  \mathrm{Energy}({\mathcal P}_{n+1}) \leq 2^{-(n+1)/2} \|f\|_{L^2({\bf R})}; \quad \mathrm{Mass}({\mathcal P}_{n+1}) \leq 2^{-(n+1)} |E|.

Note that any finite collection of tiles {{\mathcal P}} will obey (38) for some sufficiently large and negative {n}. Starting with this {n} and then iterating indefinitely, and discarding any empty families, we can therefore partition any finite collection of tiles {{\mathcal P}} as

\displaystyle  {\mathcal P} = \bigcup_n \bigcup_{T \in {\mathcal T}_n} T \cup {\mathcal P}_{-\infty}

where {{\mathcal T}_n} are collections of trees (empty for all but finitely many {n}) such that

\displaystyle  \mathrm{Energy}({\mathcal T}_{n}) \leq 2^{-n/2} \|f\|_{L^2({\bf R})}; \quad \mathrm{Mass}({\mathcal T}_n) \leq 2^{-n} |E| \ \ \ \ \ (40)

and (39) holds, and {{\mathcal P}_{-\infty}} is a residual collection of tiles with

\displaystyle  \mathrm{Energy}({\mathcal P}_{-\infty}) = \mathrm{Mass}({\mathcal P}_{-\infty}) = 0. \ \ \ \ \ (41)

We can then bound the left-hand side of (19) by

\displaystyle  \sum_n \sum_{T \in {\mathcal T}_n} \sum_{P \in T} |\langle f, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle|

\displaystyle  + \sum_{P \in {\mathcal P}_{-\infty}} |\langle f, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle|.

From Exercise 15 applied to individual tiles {P \in {\mathcal P}_{-\infty}} and (41) we see that the second term in this expression vanishes. For the first term, we use Exercise 15, (40), (36) to bound this sum by

\displaystyle  \lesssim \sum_n \sum_{T \in {\mathcal T}_n} |I_T| (2^{-n/2} \|f\|_{L^2({\bf R})}) \min( 1, 2^{-n} |E| )

which by (39) is bounded by

\displaystyle  \lesssim \|f\|_{L^2({\bf R})} \sum_n 2^{n/2} \min( 1, 2^{-n} |E| )

which sums to {\lesssim \|f\|_{L^2({\bf R})} |E|^{1/2}} as required.

It remains to establish the energy and mass selection lemmas. We begin with the mass selection claim, Proposition 17. Let {{\mathcal P}_1} denote the set of all tiles {P'} with {P' \geq P} for some {P \in {\mathcal P}} and such that

\displaystyle  |I_{P'}|^{-1} \int_{E_{P'}} \chi_{I_{P'}}^{10} > \frac{1}{2} \mu. \ \ \ \ \ (42)

Let {{\mathcal P}_*} denote the set of tiles in {{\mathcal P}_1} that are maximal with respect to the tile partial order. (Note that the left-hand side of (42) is bounded by {O(|E|/|I_{P'}|)}, so there is an upper bound to the spatial scales of the tiles involved here.) Then every tile {P} in {{\mathcal P}} is either less than or equal to a tile in {{\mathcal P}_*}, or is such that

\displaystyle  |I_{P'}|^{-1} \int_{E_{P'}} \chi_{I_{P'}}^{10} \leq \frac{1}{2} \mu

for all {P' \leq P}. Thus if we let {{\mathcal P}'} be the collection of tiles of the second form, and let {{\mathcal T}} be the collection of trees with tree top {P_*} associated to each {P_* \in {\mathcal P}_*} (selected greedily, and in arbitrary order, subject of course to the requirement that no tile belongs to more than one tree), we obtain the required partition

\displaystyle  {\mathcal P} = \biguplus_{T \in {\mathcal T}} T \uplus {\mathcal P}'

with

\displaystyle  \mathrm{Mass}({\mathcal P}') \leq \frac{1}{2} \mu

and it remains to establish the bound

\displaystyle  \sum_{T \in {\mathcal T}} |I_T| \lesssim \mu^{-1} |E|. \ \ \ \ \ (43)

This will be a (rather heavily disguised) variant of the Hardy-Littlewood maximal inequality. By construction, the tree tops {P_T, T \in {\mathcal T}} are essentially disjoint, and one has

\displaystyle  \int_{E_{P_T}} \chi_{I_T}^{10} > \frac{1}{2} \mu |I_T|

for all such tree tops. To motivate the argument, suppose for sake of discussion that we had the stronger estimate

\displaystyle  \int_{E_{P_T}} 1_{I_T} > \frac{1}{2} \mu |I_T|. \ \ \ \ \ (44)

By the essential disjointness of the {P_T}, the sets

\displaystyle  E_{P_T} \cap I_T = \{ x \in E: (x,N(x)) \in P_T) \}

are also essentially disjoint subsets of {E}, hence

\displaystyle  \sum_{T \in {\mathcal T}} \int_{E_{P_T}} 1_{I_T} \leq |E|

and the claim (43) would then follow. Now we do not quite have (44); but from the pigeonhole principle we see that for each {T \in {\mathcal T}} there is a natural number {m = m_T} such that

\displaystyle  \int_{E_{P_T}} 1_{2^m I_T} \gtrsim 2^{2m} \mu |I_T|

(say), where {2^m I_T} denotes the interval with the same center as {I_T} but {2^m} times the length (this is not quite a dyadic interval). We now restrict attention to those {T} associated to a fixed choice of {m}. Let {2^m P_T := 2^m I_T \times \omega_{T}} denote the corresponding dilated tiles, then we have

\displaystyle  |\{ x \in E: (x,N(x)) \in 2^m P_T \}| \gtrsim 2^{2m} \mu |I_T| \ \ \ \ \ (45)

for each {T} with {m_T = m}.

Unfortunately, the {2^m P_T} are no longer disjoint. However, by the greedy algorithm (repeatedly choosing maximal tiles (in the tile ordering)), we can find a collection {{\mathcal T}_m \subset {\mathcal T}} such that

  • (i) All the dilated tree tops {2^m P_T, P_T \in {\mathcal T}_m} are essentially disjoint.
  • (ii) For every {T \in {\mathcal T}} with {m_T = m}, there is {T' \in {\mathcal T}_m} such that {2^m P_T} intersects {2^m P_{T'}} and {|I_T| \leq |I_{T'}|}.

From property (i) and (45) we have

\displaystyle  \sum_{T' \in {\mathcal T}_m} 2^{2m} \mu |I_{T'}| \lesssim |E|.

On the other hand, from property (ii) we see that the sum of all the {|I_T|} for all {T \in {\mathcal T}} with {m_T = m} associated to a single {T' \in {\mathcal T}_m} is {O(2^m |I_{T'}|)}. Putting the two statements together we see that

\displaystyle  \sum_{T \in {\mathcal T}: m_T = m} |I_T| \lesssim 2^{-m} \mu^{-1} |E|

and on summing in {m} we obtain the required claim (43).

Finally, we prove the energy selection claim, Proposition 16. The basic idea is to extract all the high-energy trees from {{\mathcal P}} in such a way that the {+}-tree component of those trees are sufficiently “disjoint” from each other that a useful Bessel inequality, generalising Exercise 12, may be deployed. Implementing this strategy correctly turns out however to be slightly delicate. We perform the following iterative algorithm to generate a partition

\displaystyle  {\mathcal P} = \biguplus_{T \in {\mathcal T}} T \uplus {\mathcal P}' \ \ \ \ \ (46)

as well as a companion collection {{\mathcal T}_+} of {+}-trees as follows.

  • Step 1. Initialise {{\mathcal T}={\mathcal T}_+ = \emptyset} and {{\mathcal P}' = {\mathcal P}}.
  • Step 2. If {\mathrm{Energy}({\mathcal P}') \leq \frac{1}{2} \varepsilon} then STOP. Otherwise, go on to Step 3.
  • Step 3. Since we now have {\mathrm{Energy}({\mathcal P}') > \frac{1}{2} \varepsilon}, {{\mathcal P}'} contains a {+}-tree {T_+} for which

    \displaystyle  \sum_{P \in T_+} |\langle f, \phi_P \rangle|^2 > \frac{1}{4} \varepsilon^2 |I_{T_+}|. \ \ \ \ \ (47)

    Among all such {T_+}, choose one for which the midpoint {\inf \omega_{T_+} + \frac{1}{2} |\omega_{T_+}|} of the frequency is minimal. (The reason for this rather strange choice will be made clearer shortly.)
  • Step 4. Add {T_+} to {{\mathcal T}_+}, add the larger tree {T := \{ P \in {\mathcal P}': P \leq P_{T_+} \}} (with the same top {P_{T_+}} as {T_+}) to {{\mathcal T}}, then remove {T} from {{\mathcal P}'}. We also remove the adjacent trees {T_{right} = \{ P \in {\mathcal P}': P \leq P_{T_+} + (|I_{T_+}|,0)\}} and {T_{left} = \{ P \in {\mathcal P}': P \leq P_{T_+} - (|I_{T_+}|,0)\}} from {{\mathcal P}'} and also place them into {{\mathcal T}}. Now return to Step 2.

This procedure terminates in finite time to give a partition (46) with {\mathrm{Energy}({\mathcal P}') \leq \frac{1}{2} \varepsilon}, and with the trees {{\mathcal T}} coming in triplets {T, T_{right}, T_{left}} all associated to a {+}-tree {T_+} in {{\mathcal T}_+} with the same spatial scale as {T}, with all the {+}-trees {T_+} disjoint and obeying the estimates

\displaystyle  \sum_{P \in T_+} |\langle f, \phi_P \rangle|^2 \sim \varepsilon^2 |I_{T_+}| \ \ \ \ \ (48)

(both the upper and lower bounds will be important for this argument). It will then suffice to show that

\displaystyle  \sum_{T_+ \in {\mathcal T}_+} |I_{T_+}| \lesssim \varepsilon^{-2} \|f\|_{L^2({\bf R})}^2;

by (48), it then suffices to show the Bessel type inequality

\displaystyle  \sum_{P \in \bigcup_{T_+ \in {\mathcal T}_+} T_+} |\langle f, \phi_P \rangle|^2 \lesssim \|f\|_{L^2({\bf R})}^2. \ \ \ \ \ (49)

Now we make a crucial observation: not only are the trees {T_+} in {{\mathcal T}_+} disjoint (in the sense that no tile {P} belongs to two of these trees), but the lower tiles {\{ P_-: P \in \bigcup_{T_+ \in {\mathcal T}_+}\}} are also essentially disjoint. Indeed we claim an even stronger disjointness property: if {P \in T_+ \in {\mathcal T}_+}, {P' \in T'_+ \in {\mathcal T}_+} are such that {\omega_{P_-} \subsetneq \omega_{P'_-}}, then {I_{P'}} is not only disjoint from the larger dyadic interval {I_P}, but is in fact disjoint from the even larger interval {3I_{T_+}}. To see this, suppose for contradiction that {\omega_{P_-} \subsetneq \omega_{P'_-}} and {I_{P'} \subset 3I_{T_+}}. There are three possibilities to rule out:

  • {T'_+} is equal to {T_+}. This can be ruled out because any two lower frequency intervals {\omega_{Q_-}, Q \in T_+} associated to a {+}-tree are either equal or disjoint.
  • {T'_+} was selected after {T_+} was. To rule this out, observe that {\omega_{P'_-}} contains the parent {\omega_P} of {\omega_{P_-}}, and hence {P' \leq P_{T_+}}, {P' \leq P_{T_+} + (|I_{T_+}|,0)}, or {P' \leq P_{T_+} - (|I_{T_+}|,0)}. Thus, when {T_+} was selected, {P'} should have been placed with one of the three trees {T_{left}, T, T_{right}} associated to {T} and would therefore not have been available for inclusion into {T'_+}, a contradiction.
  • {T'_+} was selected before {T_+} was. If this case held, then the midpoint of {\omega_{T_+}} would have to be greater than or equal to that of {\omega_{T'_+}}, otherwise {T'_+} would not have a minimal midpoint at the time of its selection. But {\omega_{T_+}} is contained in {\omega_P}, which is contained in {\omega_{P'_-}}, which lies below {\omega_{P'_+}}, which contains {(\omega_{T'_+})_+}, which contains the midpoint of {\omega_{T'_+}}; thus the midpoint of {\omega_{T_+}} lies strictly below that of {\omega_{T'_+}}, a contradiction.

If the {\phi_P} were perfectly orthogonal to each other, this disjointness would be more than enough to establish (49). Unfortunately we only have imperfect orthogonality, and we have to work a little harder. As usual, we turn to a {TT^*} type argument. We can write the left-hand side of (49) as

\displaystyle  \langle \sum_{P \in \bigcup_{T_+ \in {\mathcal T}_+} T_+} \langle f, \phi_P \rangle \phi_P, f \rangle

so by Cauchy-Schwarz it suffices to show that

\displaystyle  \| \sum_{P \in \bigcup_{T_+ \in {\mathcal T}_+} T_+} \langle f, \phi_P \rangle \phi_P \|_{L^2({\bf R})}^2 \lesssim \sum_{P \in \bigcup_{T_+ \in {\mathcal T}_+} T_+} |\langle f, \phi_P \rangle|^2. \ \ \ \ \ (50)

By the triangle inequality, the left-hand side may be bounded by

\displaystyle  \sum_{T_+, T'_+ \in {\mathcal T}_+} \sum_{P \in T_+, P' \in T'_+} |\langle f, \phi_P \rangle| |\langle f, \phi_{P'} \rangle| |\langle \phi_P, \phi_{P'} \rangle|.

As {\phi_P} has Fourier support in {\omega_{P_-}}, we see that {\langle \phi_P, \phi_{P'} \rangle} vanishes unless {\omega_{P_-}} and {\omega_{P'_+}} overlap. By symmetry it suffices to consider the cases {\omega_{P_-} = \omega_{P'_-}} and {\omega_{P_-} \subsetneq \omega_{P'_-}}.

First let us consider the contribution of {\omega_{P_-} = \omega_{P'_-}}. Using Young’s inequality {ab \leq \frac{1}{2}(a^2+b^2)} and symmetry, we may bound this contribution by

\displaystyle  \sum_{T_+, T'_+ \in {\mathcal T}_+} \sum_{P \in T_+, P' \in T'_+: \omega_{P_-} = \omega_{P'_-}} |\langle f, \phi_P \rangle|^2 |\langle \phi_P, \phi_{P'} \rangle|.

A direct calculation using (22) reveals that

\displaystyle  \sum_{P': \omega_{P_-} = \omega_{P'_-}} =|\langle \phi_P, \phi_{P'} \rangle| \lesssim 1

so the contribution of this case is at most

\displaystyle  \sum_{T_+} \sum_{P \in T_+} |\langle f, \phi_P \rangle|^2

as desired.

Now we deal with the case when {\omega_{P_-} \subsetneq \omega_{P'_-}}, which by the preceding discussion implies that {|I_{P'}| < |I_P|} and {I_{P'}} lies outside of {3I_T}. Here we use (37) to bound

\displaystyle  |\langle f, \phi_{P} \rangle| \leq \varepsilon |I_{P}|^{1/2}

and

\displaystyle  |\langle f, \phi_{P'} \rangle| \leq \varepsilon |I_{P'}|^{1/2}

and then we can bound this contribution by

\displaystyle  \varepsilon^2 \sum_{T_+ \in {\mathcal T}_+} \sum_{P \in T_+} |I_P|^{1/2} \sum_{P': |I_{P'}| < |I_P|, I_{P'} \cap 3I_T = \emptyset} |I_{P'}|^{1/2} |\langle \phi_P, \phi_{P'}|.

Direct calculation using (22) reveals that

\displaystyle  \sum_{P': |I_{P'}| < |I_P|, I_{P'} \cap 3I_T = \emptyset} |I_{P'}|^{1/2} |\langle \phi_P, \phi_{P'}| \lesssim |I_T|^{1/2} (|I_P|/|I_T|)^{10}

(say), and also

\displaystyle  \sum_{P \in T_+} |I_P|^{1/2} |I_T|^{1/2} (|I_P|/|I_T|)^{10} \lesssim |I_T|

so we obtain a bound of

\displaystyle  \varepsilon^2 \sum_{T_+ \in {\mathcal T}_+} |I_T|

which is acceptable by (48). This finally finishes the proof of Proposition 16, which in turn completes the proof of Carleson’s theorem.

Remark 18 The Lacey-Thiele proof of Carleson’s theorem given above relies on a decomposition of a tileset in a way that controls both energy and mass. The original proof of Carleson dispenses with mass (or with the function {N(x)}), and focuses on controlling maximal operators that (in our notation) are basically of the form

\displaystyle  \sup_N | \sum_{P \in {\mathcal P}: N \in \omega_{P_+}} \langle f, \phi_P \rangle \phi_P(x) |.

To control such functions, one iterates a decomposition similar to Proposition 16 to partition {{\mathcal P}} into trees with good energy control, and establishes pointwise control of the contribution of each tree outside of an exceptional set. See Section 4 of this article of Demeter for an exposition in the simplified setting of Walsh-Fourier analysis. The proof of Fefferman takes the opposite tack, dispensing with energy and focusing on bounding the operator norm of the linearised operator

\displaystyle  f \mapsto \sum_{P \in {\mathcal P}: N(x) \in \omega_{P_+}} \langle f, \phi_P \rangle \phi_P(x).

Roughly speaking, the strategy is to iterate a version of Proposition 16 for partition {{\mathcal P}} into “forests” of disjoint trees, though in Fefferman’s argument some additional work is invested into obtaining even better disjointness properties on these forests than is given here. See Section 5 of this article of Demeter for an exposition in the simplified setting of Walsh-Fourier analysis.

A modification of the above arguments used to establish the weak {(2,2)} estimate can also establish restricted weak-type {(p,p)} estimates for any {p>2}:

Exercise 19 For any sets {E,F \subset {\bf R}} of finite measure, and any measurable function {N: {\bf R} \rightarrow {\bf R}}, show that

\displaystyle  \int_{\bf R} 1_E(X) 1_{D \leq N(X)} 1_F\ dx \lesssim_p |F|^{1/p} |E|^{1/p'} \ \ \ \ \ (51)

for any {2 \leq p < \infty}. (Hint: repeat the previous analysis with {f=1_F}, but supplement it with an additional energy bound {\mathrm{Energy}({\mathcal P}) \lesssim 1} coming from a suitably localised version of Exercise 12.)

The bound (51) is also true for {1 < p < 2}, yielding Hunt’s theorem, but this requires some additional arguments of Calderón-Zygmund type, involving the removal of an exceptional set {\Omega} defined using the Hardy-Littlewood maximal function:

Exercise 20 (Hunt’s theorem) Let {E, F \subset {\bf R}} be of finite non-zero measure, and let {N: {\bf R} \rightarrow {\bf R}} be a measurable function. Let {\Omega} be the exceptional set

\displaystyle  \Omega := \{ x \in {\bf R}: M 1_F \geq C \frac{|F|}{|E|} \}

for a large absolute constant {C}; note from the Hardy-Littlewood inequality that {|\Omega| \leq \frac{1}{2} |E|} if {C} is large enough.
  • (i) If {{\mathcal P}} be a finite collection of tiles with {I_P \subset \Omega} for all {P \in {\mathcal P}}, show that

    \displaystyle  \sum_{P \in {\mathcal P}} |\langle 1_F, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle| \lesssim |F|.

    (Hint: By using (22) and the disjointness of the {E_P^+} when {I_P} is fixed, first establish the estimate

    \displaystyle  \sum_{P \in {\mathcal P}: I_P = I} |\langle 1_F, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle| \lesssim 2^{-k} \frac{|F|}{|E|} |I|

    whenever {k} is a natural number and {I} is an interval with {2^k I \subset \Omega} and {2^{k+1} I \not \subset \Omega}.)
  • (ii) If {{\mathcal P}} be a finite collection of tiles with {I_P \subsetneq \Omega} for all {P \in {\mathcal P}}, show that {\mathrm{Energy}( {\mathcal P} ) \lesssim |F|/|E|}. (For a given tree {T}, one can introduce the dyadic intervals {J} as in Exercise 11, then perform a Calderón-Zygmund type decomposition to {1_F}, splitting it into a “good” function bounded pointwise by {O(|F|/|E|)}, plus “bad functions” that are supported on the intervals {J} and have mean zero. See this paper of Grafakos, Terwilleger, and myself for details.)
  • (iii) For any finite collection of tiles for all {P \in {\mathcal P}}

    \displaystyle  \sum_{P \in {\mathcal P}} |\langle 1_F, \phi_P \rangle \langle \phi_P, 1_{E_P^+} \rangle| \lesssim |F| \log(2 + \frac{|E|}{|F|}).

  • (iv) Show that (51) holds for all {1 < p < \infty}, and conclude Theorem 2(iii).

Remark 21 The methods of time-frequency analysis given here can handle several other operators that, like the Carleson operator, exhibit scaling, translation, and frequency modulation symmetries. One model example is the bilinear Hilbert transform

\displaystyle  B(f,g)(x) := p.v. \int_{\bf R} \frac{f(x-t) g(x-2t)}{t}\ dt

for {f,g \in {\mathcal S}({\bf R})}. The methods in this set of notes were used by Lacey and Thiele to establish the estimates

\displaystyle  \| B(f,g) \|_{L^{r'}({\bf R})} \lesssim_{p,q,r} \|f\|_{L^p({\bf R})} \|g\|_{L^q({\bf R})}

for {2 < p,q,r < \infty} with {1/p + 1/q + 1/r = 1} (these estimates have since been strengthened and extended in a number of ways). We only give the briefest of sketches here. Much as how Carleson’s theorem can be reduced to a bound (19), the above estimates can be reduced to the estimation of a model sum

\displaystyle  \sum_{(P_1,P_2,P_3) \in {\mathcal P}} \langle f, \phi_{P_1} \rangle \langle g, \phi_{P_2} \rangle \langle h, \phi_{P_3} \rangle

where {{\mathcal P}} is a certain collection of triples {(P_1,P_2,P_3)} of tiles {P_1,P_2,P_3} with common spatial interval {I_{P_1} = I_{P_2} = I_{P_3}} and frequency intervals {\omega_{P_1}, \omega_{P_2}, \omega_{P_3}} varying along a certain one-parameter family for each fixed choice of spatial interval. One then uses a variant of Proposition 16 to partition {{\mathcal P}} into “{1}-trees”, “{2}-trees”, and “{3}-trees”, the contribution of each of which can be controlled by the energies of {f, g, h} on such trees, times the length of the spatial support of the tree, in analogy with Exercise 15. See for instance the text of Muscalu and Schlag for more discussion and further results.

Remark 22 The concepts of mass and energy can be abstracted into a framework of {L^p} spaces associated to outer measures (as opposed to the classical setup of {L^p} spaces associated to countably additive measures), in which the mass and energy selection propositions can be viewed as consequences of an abstract Carleson embedding theorem, and the calculations establishing estimates such as (19) from such propositions and a tree estimate can be viewed as consequences of an “outer Hölder inequality”. See this paper of Do and Thiele for details.