Ciprian Demeter, Michael Lacey, Christoph Thiele and I have just uploaded our joint paper, “The Walsh model for $M_2^*$ Carleson” to the arXiv. This paper (which was recently accepted for publication in Revista Iberoamericana) establishes a simplified model for the key estimate (the “ $M_2^*$ Carleson estimate”) in another (much longer) paper of ours on the return times theorem of Bourgain, in which the Fourier transform is replaced by its dyadic analogue, the Walsh-Fourier transform. This model estimate is established by the now-standard techniques of time-frequency analysis: one decomposes the expression to be estimated into a sum over tiles, and then uses combinatorial stopping time arguments to group the tiles into trees, and the trees into forests. One then uses (phase-space localised, and frequency-modulated) versions of classical Calderón-Zygmund theory (or in this particular case, a certain maximal Fourier inequality of Bourgain) to control individual trees and forests, and sums up over the trees and forests using orthogonality methods (excluding an exceptional set if necessary).

Rather than discuss time-frequency analysis in detail here, I thought I would dwell instead on the return times theorem, and sketch how it is connected to the $M_2^*$ Carleson estimate; this is a more complicated version of the “ $M_2$ Carleson estimate”, which is logically equivalent to Carleson’s famous theorem (and its extension by Hunt) on the almost everywhere convergence of Fourier series.

— The Carleson-Hunt theorem —

Let’s begin with the Carleson-Hunt theorem, which asserts that the Fourier inversion formula $f(x) = \lim_{N \to \infty} \int_{-N}^N \hat f(\xi) e^{2\pi i x \xi}\ d\xi$

is valid for almost every $x \in {\Bbb R}$, whenever f lies in an $L^p({\Bbb R})$ space for some $1 < p < \infty$. Here, the Fourier transform is defined for sufficiently rapidly decreasing f by the formula $\hat f(\xi) = \int_{-\infty}^\infty f(x) e^{-2\pi i x \xi}\ dx$

and then extended to $L^p({\Bbb R})$ by density (here one can use a classical theorem of Riesz that asserts that the Dirichlet operators $S_N f(x) := \int_{-N}^N \hat f(\xi) e^{2\pi i x \xi}\ d\xi$ can be continuously extended to $L^p({\Bbb R})$ for every $1 < p < \infty$).

The convergence theorem is easy to show if the function f is smooth and rapidly decreasing; the difficulty is to extend it to the rougher functions in $L^p({\Bbb R})$. It turns out that the key lies in establishing the maximal inequality $\| \sup_N |\int_{-N}^N \hat f(\xi) e^{2\pi i x \xi}\ d\xi| \|_{L^p({\Bbb R})} \ll_p \|f\|_{L^p({\Bbb R})}$ (1)

for all test functions f; once one has (1), the Carleson-Hunt theorem follows by a standard approximation argument. (Conversely, by using Stein’s maximal principle, one can show that the Carleson-Hunt theorem is in fact logically equivalent to (1), at least in the range $1 < p < 2$.)
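The approximation argument can be sketched roughly as follows. The oscillation functional $\Omega$ and the constant $C_p$ are notation introduced here purely for illustration, and the sketch assumes that (1) has already been extended from test functions to all of $L^p({\Bbb R})$ by density.

```latex
% S_N is the Dirichlet operator; \Omega (hypothetical notation) measures the failure of convergence.
\[ \Omega f(x) := \limsup_{N, M \to \infty} |S_N f(x) - S_M f(x)|. \]
% For a test function g, S_N g(x) converges for every x, so \Omega g = 0. By sublinearity,
\[ \Omega f \leq \Omega (f-g) + \Omega g = \Omega (f-g) \leq 2 \sup_N |S_N (f-g)|. \]
% Taking L^p norms and applying (1) to f-g, with C_p the implied constant in (1):
\[ \| \Omega f \|_{L^p({\Bbb R})} \leq 2 C_p \| f - g \|_{L^p({\Bbb R})}. \]
% Test functions are dense in L^p, so the right-hand side can be made arbitrarily small.
% Hence \Omega f = 0 almost everywhere, i.e. S_N f(x) is a Cauchy sequence for almost every x.
```

Since $S_N f$ also converges to f in $L^p$ norm (by the theorem of Riesz and density), some subsequence converges to f almost everywhere, which identifies the pointwise limit as $f(x)$.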

One can recast this estimate in a slightly different manner. Define the Carleson operator ${\mathcal C}f(\theta, x) = \int_{-\infty}^\theta \hat f(\xi) e^{2\pi i x \xi}\ d\xi = p.v. \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{f(x-y)}{y} e^{2\pi i y \theta}\ dy$.

Note that this operator extends the Hilbert transform, which is recovered by restricting $\theta$ to equal zero. It is then not hard to show that the estimate (1) is equivalent to the estimate $\| \| {\mathcal C} f(\theta,x) \|_{L^\infty_\theta({\Bbb R})} \|_{L^p_x({\Bbb R})} \ll_p \|f\|_{L^p({\Bbb R})}.$ (2)
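One way to see this equivalence is sketched below, taking the frequency-cutoff integral $\int_{-\infty}^\theta \hat f(\xi) e^{2\pi i x \xi}\ d\xi$ as the working definition of ${\mathcal C} f(\theta,x)$ and ignoring constants and measure-zero endpoint issues. The functions $f_+$ and $f_-$ are notation introduced here for illustration.

```latex
% (2) implies (1): the Dirichlet operator is a difference of two frequency cutoffs.
\[ S_N f(x) = {\mathcal C} f(N, x) - {\mathcal C} f(-N, x),
   \qquad \hbox{so} \qquad
   \sup_N |S_N f(x)| \leq 2 \sup_\theta |{\mathcal C} f(\theta, x)|. \]
% (1) implies (2): split f into its negative- and positive-frequency parts,
\[ f_-(x) := {\mathcal C} f(0, x), \qquad f_+ := f - f_-, \]
% both of which are bounded on L^p for 1 < p < \infty by the theorem of Riesz. Then
\[ {\mathcal C} f(\theta, x) = f_-(x) + S_\theta f_+(x) \quad (\theta > 0), \qquad
   {\mathcal C} f(\theta, x) = f_-(x) - S_{|\theta|} f_-(x) \quad (\theta \leq 0), \]
% which gives the pointwise bound
\[ \sup_\theta |{\mathcal C} f(\theta, x)| \leq |f_-(x)| + \sup_N |S_N f_+(x)| + \sup_N |S_N f_-(x)|. \]
% Taking L^p norms and applying (1) to f_+ and f_- yields (2).
```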

— The return times theorem —

Now let us leave Fourier analysis and pass to ergodic theory, which studies the long-time dynamics of measure-preserving systems $(X, {\mathcal X}, \mu, T)$, thus $(X,{\mathcal X},\mu)$ is a probability space, and $T: X \to X$ is a measure-preserving bijection. One of the fundamental theorems in ergodic theory is, of course, the pointwise (or Birkhoff) ergodic theorem, which asserts that for any function $f \in L^p(X)$ with $1 \leq p \leq \infty$, the sequence $\frac{1}{N} \sum_{n=1}^N f( T^n x )$

is convergent as $N \to \infty$ for almost every $x \in X$ (as measured with respect to $\mu$, of course). (If the measure-preserving system is ergodic, then the limit is equal to $\int_X f\ d\mu$, but for this discussion we will not be interested in the specific value of the limit.) Just as the Carleson-Hunt theorem (which is a qualitative convergence result) is connected to (2) (which is a quantitative estimate), the pointwise ergodic theorem is connected to an estimate, namely the maximal ergodic theorem $\| \sup_N | \frac{1}{N} \sum_{n=1}^N f( T^n x ) | \|_{L^p(X)} \ll_p \|f\|_{L^p(X)},$

valid for all $1 < p \leq \infty$ and $f \in L^p(X)$. (Actually, the maximal ergodic theorem is a little stronger than this, incorporating a sharp weak (1,1) inequality, but I will not discuss this improvement here.) The maximal ergodic theorem does not directly imply the pointwise ergodic theorem, although it does let one deduce the $1 < p < \infty$ case of the pointwise ergodic theorem from the $p=\infty$ case by the density argument mentioned earlier. The maximal ergodic theorem is also closely related to the Hardy-Littlewood maximal inequality $\| \sup_k | \frac{1}{2^{k}} \int_{\Bbb R} f(x+z) K(z/2^k)\ dz | \|_{L^p_x({\Bbb R})} \ll_p \|f\|_{L^p({\Bbb R})},$

where K is some approximation to the identity whose exact value is not important for this discussion. Indeed, one can deduce the former from the latter by Calderón’s transference principle.
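The transference argument can be sketched as follows for $1 < p < \infty$. This is only an illustration under stated assumptions: it uses a discrete analogue of the Hardy-Littlewood inequality on ${\Bbb Z}$ (with constant $C_p$) rather than the continuous version above, and the truncation parameter L and the sequence $F_x$ are notation introduced here.

```latex
% Write A_N f(x) := \frac{1}{N} \sum_{n=1}^N f(T^n x). Fix L \geq 1 and, for each x, set
\[ F_x(j) := f(T^j x) \ \hbox{ for } 1 \leq j \leq 2L, \qquad F_x(j) := 0 \ \hbox{ otherwise.} \]
% For 1 \leq m \leq L and N \leq L one has 2 \leq m+n \leq 2L for 1 \leq n \leq N, so
\[ A_N f(T^m x) = \frac{1}{N} \sum_{n=1}^N F_x(m+n). \]
% Assumed discrete maximal inequality on Z, applied to F = F_x:
\[ \sum_{m \in {\Bbb Z}} \sup_N \Big| \frac{1}{N} \sum_{n=1}^N F(m+n) \Big|^p \leq C_p^p \sum_{j \in {\Bbb Z}} |F(j)|^p. \]
% Since T is measure-preserving, one can average over m and then integrate in x:
\[ \int_X \sup_{N \leq L} |A_N f|^p\ d\mu
   = \frac{1}{L} \sum_{m=1}^L \int_X \sup_{N \leq L} |A_N f(T^m x)|^p\ d\mu
   \leq \frac{C_p^p}{L} \int_X \sum_{j=1}^{2L} |f(T^j x)|^p\ d\mu
   = 2 C_p^p \|f\|_{L^p(X)}^p. \]
% The bound is uniform in L; letting L \to \infty (monotone convergence) gives the maximal ergodic theorem.
```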

The pointwise ergodic theorem immediately implies the following extension of itself, namely that for any fixed $\theta \in {\Bbb R}$ and every $f \in L^p(X)$ for some $1 \leq p \leq \infty$, the sequence $\frac{1}{N} \sum_{n=1}^N e^{2\pi i n \theta} f( T^n x )$ (3)

converges for almost every $x \in X$. Indeed, to see this one simply applies the pointwise ergodic theorem to the product system $X \times S^1$ with the shift $(x,\omega) \mapsto (Tx,\omega+\theta)$ and to the function $(x,\omega) \mapsto f(x) e^{2\pi i \omega}$. However, note that for each individual real number $\theta$, there is an exceptional set of x for which the sequence (3) fails to converge. Since the set of real numbers is uncountable, it could conceivably happen that there does not exist any x for which (3) converges for all $\theta$. Nevertheless, this does not happen, thanks to the Wiener-Wintner theorem, which asserts that if $f \in L^p(X)$ for some $1 \leq p \leq \infty$, then for almost every x in X, the sequence (3) converges for all $\theta$. This theorem is closely linked to the maximal ergodic theorem-type estimate $\| \| \sup_N | \frac{1}{N} \sum_{n=1}^N e^{2\pi i n \theta} f( T^n x ) | \|_{L^\infty_\theta({\Bbb R})} \|_{L^p(X)} \ll_p \|f\|_{L^p(X)},$

or the Hardy-Littlewood maximal type estimate $\| \| \sup_k | \frac{1}{2^{k}} \int_{{\Bbb R}} f(x+z) e^{2\pi i z \theta} K(z/2^k)\ dz | \|_{L^\infty_\theta({\Bbb R})} \|_{L^p_x({\Bbb R})} \ll_p \|f\|_{L^p({\Bbb R})},$

for $1 < p \leq \infty$, although in this particular case, these apparently stronger estimates follow immediately from their untwisted counterparts (in which $\theta=0$ throughout) by the triangle inequality.
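The triangle inequality step can be written out as follows for the ergodic version; $A_N$ is shorthand introduced here for the untwisted ergodic average, and $C_p$ denotes the constant in the maximal ergodic theorem.

```latex
% The phases e^{2\pi i n \theta} have modulus one, so for every \theta and every N,
\[ \Big| \frac{1}{N} \sum_{n=1}^N e^{2\pi i n \theta} f(T^n x) \Big|
   \leq \frac{1}{N} \sum_{n=1}^N |f|(T^n x) =: A_N |f|(x). \]
% The right-hand side does not depend on \theta, so one may take suprema in \theta and N:
\[ \sup_\theta \sup_N \Big| \frac{1}{N} \sum_{n=1}^N e^{2\pi i n \theta} f(T^n x) \Big|
   \leq \sup_N A_N |f|(x). \]
% Applying the maximal ergodic theorem to |f|, which has the same L^p norm as f:
\[ \Big\| \sup_\theta \sup_N \Big| \frac{1}{N} \sum_{n=1}^N e^{2\pi i n \theta} f(T^n x) \Big| \Big\|_{L^p(X)}
   \leq C_p \| f \|_{L^p(X)}. \]
```

The continuous version is similar, provided the Hardy-Littlewood inequality is available with K replaced by $|K|$ (for instance, if K is non-negative).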

We motivated the Wiener-Wintner theorem by adjoining an arbitrary circle rotation $\omega \mapsto \omega + \theta$ to the original measure-preserving system $x \mapsto Tx$. But one could have adjoined a more general system $y \mapsto Sy$ to the original system, leading to the conclusion that for each $f \in L^p(X)$, $g \in L^q(Y)$, and any measure preserving transformations $T: X \to X$, $S: Y \to Y$, the sequence $\frac{1}{N} \sum_{n=1}^N f( T^n x ) g(S^n y)$ (4)

converges for almost every x,y.
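For the record, here is a sketch of why the product system gives this, and of how the circle rotation is recovered as a special case. The function F and the specific choice of g in the second half are illustrative notation, not taken from the paper.

```latex
% Product system: (T \times S)(x,y) := (Tx, Sy) preserves \mu \times \nu. Set F(x,y) := f(x) g(y).
% On probability spaces L^p and L^q embed into L^1, so F is integrable:
\[ \| F \|_{L^1(X \times Y)} = \|f\|_{L^1(X)} \|g\|_{L^1(Y)} \leq \|f\|_{L^p(X)} \|g\|_{L^q(Y)} < \infty. \]
% The ergodic averages of F are exactly the averages (4):
\[ \frac{1}{N} \sum_{n=1}^N F( (T \times S)^n (x,y) ) = \frac{1}{N} \sum_{n=1}^N f(T^n x) g(S^n y), \]
% so the pointwise ergodic theorem gives convergence for (\mu \times \nu)-almost every (x,y).
% Special case: Y = {\Bbb R}/{\Bbb Z}, Sy := y + \theta, g(y) := e^{2\pi i y}. Then
\[ \frac{1}{N} \sum_{n=1}^N f(T^n x) g(S^n y) = e^{2\pi i y} \cdot \frac{1}{N} \sum_{n=1}^N e^{2\pi i n \theta} f(T^n x). \]
% By Fubini, for almost every x the left-hand side converges for at least one y;
% dividing by e^{2\pi i y} shows that the twisted averages converge for almost every x.
```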

Motivated by the Wiener-Wintner theorem, one can ask whether one can make the exceptional set uniform in the choice of g, S, y. This is indeed the case, and is Bourgain’s return times theorem: if $1 < p,q \leq \infty$ and one has the duality condition $1/p + 1/q \leq 1$, then for every $f \in L^p(X)$ and $T: X \to X$ it is true that for almost every x in X, the sequence (4) converges for all systems $(Y, {\mathcal Y},\nu, S)$, all $g \in L^q(Y)$, and all y in Y. (In our longer paper, we replaced the duality condition with $q \geq 2$; conversely, the claim fails at the double endpoint $p=q=1$, as shown by Assani, Buczolich, and Mauldin.)

The return times theorem is related to the estimate $\| \sup_{g, Y, S: \|g\|_{L^q(Y)} \leq 1} \sup_N | \frac{1}{N} \sum_{n=1}^N f( T^n x ) g(S^n y) | \|_{L^p(X)} \ll_{p,q} \|f\|_{L^p(X)},$ (5)

in much the same way that the pointwise ergodic theorem is related to the maximal ergodic theorem. (Thus, the estimate does not quite imply the convergence result by itself – one would need a “variational” or “upcrossing” version of the estimate for that – but can be used via density arguments to enlarge the range of functions in which convergence holds.)

Just as Calderón’s transference principle allows one to deduce the maximal ergodic theorem from the Hardy-Littlewood maximal inequality (which can be viewed as its continuous counterpart over ${\Bbb R}$), the estimate (5) follows from the estimate $\| \sup_{g: \|g\|_{L^q({\Bbb R})} \leq 1} \| \sup_k | \frac{1}{2^{k}} \int_{\Bbb R} f(x+z) g(y+z) K(z/2^k)\ dz | \|_{L^q_y({\Bbb R})} \|_{L^p_x({\Bbb R})}$ $\ll_{p,q} \|f\|_{L^p({\Bbb R})}.$ (6)

The left-hand side here is rather fearsome looking (two norms and two suprema, one of which is over a class of functions rather than a single real number) but we can simplify it a bit by noting that the inner integral is a convolution operator in g, thus the left-hand side is $\| \sup_g \| \sup_k |g * K_{k,f,x}(y)| \|_{L^q_y} \|_{L^p_x}$

where $K_{k,f,x}$ is the kernel $K_{k,f,x}(z) := f(x-z) 2^{-k} K(-z/2^k)$. The inner supremum is then a maximal Fourier multiplier $\sup_k |m_{k,f,x}(D) g(y)|$, where $m_{k,f,x} = \widehat{K_{k,f,x}}$ is the Fourier transform of $K_{k,f,x}$. Recall that Fourier multipliers m(D) have the multiplier norm $\| m \|_{M_q} := \sup_{g: \|g\|_{L^q({\Bbb R})} \leq 1} \| m(D) g \|_{L^q({\Bbb R})};$

motivated by this, we can define the maximal multiplier norm of a sequence $m = (m_k)_{k \in {\Bbb Z}}$ as $\| m \|_{M_q^*} := \sup_{g: \|g\|_{L^q({\Bbb R})} \leq 1} \| \sup_k |m_k(D) g| \|_{L^q({\Bbb R})}.$
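To check the convolution structure used here, one can make the change of variables $w = -z$ in the inner integral of (6). The computation below is a sketch in which the kernel is written as a function of a dummy variable w, a naming choice made here for clarity.

```latex
% Substituting w = -z in the inner integral of (6):
\[ \frac{1}{2^k} \int_{\Bbb R} f(x+z) g(y+z) K(z/2^k)\ dz
   = \int_{\Bbb R} g(y-w)\, f(x-w) 2^{-k} K(-w/2^k)\ dw
   = (g * K_{k,f,x})(y), \]
% where the kernel, for fixed k, f and x, is
\[ K_{k,f,x}(w) := f(x-w) 2^{-k} K(-w/2^k). \]
% Convolution with K_{k,f,x} is the Fourier multiplier with symbol m_{k,f,x} = \widehat{K_{k,f,x}}:
\[ \widehat{g * K_{k,f,x}}(\xi) = m_{k,f,x}(\xi)\, \hat g(\xi). \]
% Hence, for each fixed x, the inner part of the left-hand side of (6) is a maximal multiplier norm:
\[ \sup_{g: \|g\|_{L^q({\Bbb R})} \leq 1} \Big\| \sup_k |m_{k,f,x}(D) g| \Big\|_{L^q({\Bbb R})}
   = \| m_{\cdot,f,x} \|_{M_q^*}. \]
```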

The return times estimate is then equivalent to the assertion that $\|\|m_{\cdot,f,x}\|_{M_q^*} \|_{L^p_x} \ll_{p,q} \|f\|_{L^p}.$

It turns out that the Fourier multiplier $m_{k,f,x}(\theta)$ is roughly a smoothed out version ${\mathcal C}_k f(\theta,x)$ of the Carleson operator ${\mathcal C} f(\theta,x)$, in which one only considers physical scales of order $2^k$ and coarser (or equivalently, frequency scales of order $2^{-k}$ and finer). This can be made precise using wave packet decompositions, which I will not discuss here. Thus the return times estimate is basically equivalent to $\|\|{\mathcal C}_{\cdot} f(\theta,x)\|_{(M_q^*)_\theta} \|_{L^p_x} \ll_{p,q} \|f\|_{L^p}.$

On the other hand, from the well-known fact (from Plancherel’s theorem) that the $M_2$ norm of a multiplier m is the same as its supremum norm, the Carleson-Hunt inequality (1) is equivalent to $\|\|{\mathcal C} f(\theta,x) \|_{(M_2)_\theta} \|_{L^p_x} \ll_p \|f\|_{L^p}.$
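The Plancherel fact can be verified in a few lines. In the sketch below the set E and the parameter $\varepsilon$ are auxiliary notation introduced for illustration, and the supremum norm is understood as an essential supremum.

```latex
% Upper bound: by Plancherel, for any g in L^2,
\[ \| m(D) g \|_{L^2} = \| m \hat g \|_{L^2} \leq \|m\|_{L^\infty} \| \hat g \|_{L^2} = \|m\|_{L^\infty} \|g\|_{L^2},
   \qquad \hbox{so} \qquad \|m\|_{M_2} \leq \|m\|_{L^\infty}. \]
% Lower bound: for \varepsilon > 0, choose a set E with 0 < |E| < \infty on which
% |m| \geq \|m\|_{L^\infty} - \varepsilon, and take \hat g := 1_E. Then
\[ \| m(D) g \|_{L^2} = \| m 1_E \|_{L^2} \geq ( \|m\|_{L^\infty} - \varepsilon ) \|g\|_{L^2}. \]
% Letting \varepsilon \to 0 gives \|m\|_{M_2} = \|m\|_{L^\infty}. Applied in the \theta variable for fixed x:
\[ \| {\mathcal C} f(\theta,x) \|_{(M_2)_\theta} = \sup_\theta |{\mathcal C} f(\theta,x)|, \]
% so the M_2 estimate is just a restatement of (2), and hence of (1).
```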

Thus we see that the return times theorem for q=2 is a maximal version of Carleson’s theorem. Indeed, we prove the return times theorem in the case $q=2$ by adapting the time-frequency proof of Carleson’s theorem due to Lacey and Thiele.

[Update, Dec 12: A missing exponential in a Hardy-Littlewood type maximal estimate has been reinstated; thanks to Richard Oberlin for the correction.]