I’ve just uploaded to the arXiv my paper “Failure of the {L^1} pointwise and maximal ergodic theorems for the free group”, submitted to Forum of Mathematics, Sigma. This paper concerns a variant of the pointwise ergodic theorem of Birkhoff, which asserts that if one has a measure-preserving shift map {T: X \rightarrow X} on a probability space {X = (X,\mu)}, then for any {f \in L^1(X)}, the averages {\frac{1}{N} \sum_{n=1}^N f \circ T^{-n}} converge pointwise almost everywhere. (In the important case when the shift map {T} is ergodic, the pointwise limit is simply the mean {\int_X f\ d\mu} of the original function {f}.)

The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Følner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group {F_2} on two generators.

Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for {F_2}-actions {(T_g)_{g \in F_2}} on probability spaces {(X,\mu)}. For instance, for the spherical averaging operators

\displaystyle  {\mathcal A}_n f := \frac{1}{4 \times 3^{n-1}} \sum_{g \in F_2: |g| = n} f \circ T_g^{-1}

(where {|g|} denotes the length of the reduced word that forms {g}), they showed that {{\mathcal A}_{2n} f} converged pointwise almost everywhere provided that {f} was in {L^p(X)} for some {p>1}. (The need to restrict to spheres of even radius can be seen by considering the action of {F_2} on the two-element set {\{0,1\}} in which both generators of {F_2} act by interchanging the elements, in which case {{\mathcal A}_n} is determined by the parity of {n}.) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition {f \in L^p(X)} to the weaker condition {f \in L \log L(X)}.
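The parity obstruction above is easy to check by brute force. The following is a minimal sketch, not taken from the paper: the names `reduced_words`, `spherical_average`, `swap` and `indicator_of_one` are illustrative, and the letters `A`, `B` stand for {a^{-1}}, {b^{-1}}. It enumerates the {4 \times 3^{n-1}} reduced words of length {n} and evaluates {{\mathcal A}_n f} for the two-element action in which every generator swaps the two points.

```python
GENS = ["a", "A", "b", "B"]  # A = a^{-1}, B = b^{-1}
INV = {"a": "A", "A": "a", "b": "B", "B": "b"}


def reduced_words(n):
    """Yield every reduced word of length n in F_2 as a tuple of letters."""
    if n == 0:
        yield ()
        return
    for w in reduced_words(n - 1):
        for s in GENS:
            # A word stays reduced as long as no letter follows its own inverse.
            if not w or INV[w[-1]] != s:
                yield w + (s,)


def spherical_average(f, act, x, n):
    """Average f over the images of x under all reduced words of length n.

    The sphere {|g| = n} is closed under inversion and under reversing
    words, so neither the inverse in f o T_g^{-1} nor the order in which
    the letters are applied changes the value of the average.
    """
    words = list(reduced_words(n))
    total = 0
    for w in words:
        y = x
        for s in w:
            y = act(s, y)
        total += f(y)
    return total / len(words)


def swap(s, y):
    """Two-element action on {0, 1}: every generator interchanges the points."""
    return 1 - y


def indicator_of_one(y):
    """The function f = 1_{y = 1} on the two-element set."""
    return y


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, spherical_average(indicator_of_one, swap, 0, n))
```

Here {{\mathcal A}_n f(0)} alternates between {1} (odd {n}) and {0} (even {n}), so the full sequence {{\mathcal A}_n f} cannot converge, while the even-radius subsequence is constant.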

The question remained open as to whether the pointwise ergodic theorem for {F_2}-actions held if one only assumed that {f} was in {L^1(X)}. Nevo and Stein were able to establish this for the Cesàro averages {\frac{1}{N} \sum_{n=1}^N {\mathcal A}_n}, but not for {{\mathcal A}_n} itself. About six years ago, Assaf Naor and I tried our hand at this problem, and were able to show an associated maximal inequality on {\ell^1(F_2)}, but due to the non-amenability of {F_2}, this inequality did not transfer to {L^1(X)} and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.

Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the {L^1} pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our {\ell^1(F_2)} maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an {L^1} ergodic theorem for iterates {P^n} of a self-adjoint Markov operator – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the {F_2} setting, thus settling the problem in the negative:

Theorem 1 (Failure of {L^1} pointwise ergodic theorem) There exists a measure-preserving {F_2}-action on a probability space {X} and a non-negative function {f \in L^1(X)} such that {\sup_n {\mathcal A}_{2n} f(x) = +\infty} for almost every {x}.

To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator {P} on a probability space {X} and a non-negative {f \in L^1(X)} such that {\sup_n P^n f(x) = +\infty} for almost every {x}. By some standard manipulations, it suffices to show that for any given {\alpha > 0} and {\varepsilon>0}, there exists a self-adjoint Markov operator {P} on a probability space {X} and a non-negative {f \in L^1(X)} with {\|f\|_{L^1(X)} \leq \alpha}, such that {\sup_n P^n f \geq 1-\varepsilon} on a set of measure at least {1-\varepsilon}. Actually, it will be convenient to replace the Markov chain {(P^n f)_{n \geq 0}} with an ancient Markov chain {(f_n)_{n \in {\bf Z}}} – that is to say, a sequence of non-negative functions {f_n} for both positive and negative {n}, such that {f_{n+1} = P f_n} for all {n \in {\bf Z}}. The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for the sake of discussion, as it turns out that the {F_2} version of the argument can be run using infinitely ancient chains.)

For any {\alpha>0}, let {P(\alpha)} denote the claim that for any {\varepsilon>0}, there exists an ancient Markov chain {(f_n)_{n \in {\bf Z}}} with {\|f_n\|_{L^1(X)} = \alpha} such that {\sup_{n \in {\bf Z}} f_n \geq 1-\varepsilon} on a set of measure at least {1-\varepsilon}. Clearly {P(1)} holds since we can just take {f_n=1} for all {n}. Our objective is to show that {P(\alpha)} holds for arbitrarily small {\alpha}. The heart of Ornstein’s argument is then the implication

\displaystyle  P(\alpha) \implies P( \alpha (1 - \frac{\alpha}{4}) ) \ \ \ \ \ (1)

for any {0 < \alpha \leq 1}, which upon iteration quickly gives the desired claim.
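To see why iterating (1) reaches arbitrarily small {\alpha}, start from {\alpha_0 = 1} and set {\alpha_{k+1} = \alpha_k (1 - \frac{\alpha_k}{4})}. Since {\frac{1}{1-t} \geq 1+t} for {0 \leq t < 1}, one has {\frac{1}{\alpha_{k+1}} \geq \frac{1}{\alpha_k} + \frac{1}{4}}, and hence {\alpha_k \leq \frac{4}{k+4}}. Here is a small numerical sanity check of this bound (the function name `iterate_alpha` is just for illustration):

```python
def iterate_alpha(steps, alpha=1.0):
    """Return [alpha_0, ..., alpha_steps] for alpha_{k+1} = alpha_k * (1 - alpha_k / 4)."""
    values = [alpha]
    for _ in range(steps):
        alpha = alpha * (1 - alpha / 4)
        values.append(alpha)
    return values


if __name__ == "__main__":
    values = iterate_alpha(1000)
    for k in (0, 1, 10, 100, 1000):
        # Compare each iterate with the elementary upper bound 4 / (k + 4).
        print(k, values[k], 4 / (k + 4))
```

The decay is only of order {1/k}, but that is all that is needed: {P(\alpha)} holds along a sequence of {\alpha} tending to zero.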

Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain {(f_n)_{n \in {\bf Z}}} on some probability space {X} of total mass {\|f_n\|_{L^1(X)} = \alpha}, such that {\sup_n f_n} attains the value of {1} or greater almost everywhere. Assuming that the Markov process is irreducible, the {f_n} will eventually converge as {n \rightarrow \infty} to the constant value of {\|f_n\|_{L^1(X)}}, in particular its final state will essentially stay above {\alpha} (up to small errors).

Now suppose we duplicate the Markov process by replacing {X} with a double copy {X \times \{1,2\}} (giving {\{1,2\}} the uniform probability measure), and using the disjoint sum of the Markov operators on {X \times \{1\}} and {X \times \{2\}} as the propagator, so that there is no interaction between the two components of this new system. Then the functions {f'_n(x,i) := f_n(x) 1_{i=1}} form an ancient Markov chain of mass at most {\alpha/2} that lives solely in the first half {X \times \{1\}} of this copy, and {\sup_n f'_n} attains the value of {1} or greater on almost all of the first half {X \times \{1\}}, but is zero on the second half. The final state of {f'_n} will be to stay above {\alpha} in the first half {X \times \{1\}}, but be zero on the second half.

Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves {X \times \{1\}}, {X \times \{2\}} of the system (I mentally think of {X \times \{1\}} and {X \times \{2\}} as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain {(f'_n)_{n \in {\bf Z}}} in the previous example gets replaced by a slightly different ancient Markov chain {(f''_n)_{n \in {\bf Z}}} which is more or less identical with {f'_n} for negative times {n}, or for bounded positive times {n}, but for very large values of {n} the final state is now constant across the entire state space {X \times \{1,2\}}, and will stay above {\alpha/2} on this space.
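The “tiny tube” picture can be illustrated by a deliberately crude toy model, which is my own simplification and not part of Ornstein’s construction: collapse each box to a single point, give the two points the uniform probability measure, and let a hypothetical parameter `delta` be the probability of passing through the tube at each step. The transition matrix is symmetric, so the operator is self-adjoint, and the chain is irreducible as soon as `delta > 0`. The toy only captures the forward-in-time behaviour; it does not model the ancient part of the chain.

```python
def two_box_chain(alpha, delta, n):
    """Density (f(1), f(2)) after n steps, starting from (alpha, 0).

    Each step keeps a particle in its box with probability 1 - delta
    and moves it through the tube with probability delta.
    """
    f = (alpha, 0.0)
    for _ in range(n):
        f = ((1 - delta) * f[0] + delta * f[1],
             delta * f[0] + (1 - delta) * f[1])
    return f


if __name__ == "__main__":
    alpha, delta = 1.0, 1e-3
    for n in (0, 10, 100, 1000, 20000):
        print(n, two_box_chain(alpha, delta, n))
```

For bounded {n} the density stays close to {(\alpha, 0)}, as for {f'_n}; for {n} much larger than {1/\delta} it flattens out to {(\alpha/2, \alpha/2)}, while the mass {\frac{1}{2}(f(1)+f(2)) = \alpha/2} is conserved throughout.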

Finally, we consider an ancient Markov chain {F_n} which is basically of the form

\displaystyle  F_n(x,i) \approx f''_n(x,i) + (1 - \frac{\alpha}{2}) f_{n-M}(x) 1_{i=2}

for some large parameter {M} and for all {n \leq M} (the approximation becomes increasingly inaccurate for {n} much larger than {M}, but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces {X \times \{1\}, X \times \{2\}}, but with the second copy delayed by a large time delay {M}, and also attenuated in amplitude by a factor of {1-\frac{\alpha}{2}}. The total mass of this process is now {\frac{\alpha}{2} + \frac{\alpha}{2} (1 -\frac{\alpha}{2}) = \alpha (1 - \alpha/4)}. Because of the {f''_n} component of {F_n}, we see that {\sup_n F_n} basically attains the value of {1} or greater on the first half {X \times \{1\}}. On the second half {X \times \{2\}}, we work with times {n} close to {M}. If {M} is large enough, {f''_n} would have averaged out to about {\alpha/2} at such times, but the {(1 - \frac{\alpha}{2}) f_{n-M}(x)} component can get as large as {1-\alpha/2} here. Summing (and continuing to ignore various epsilon losses), we see that {\sup_n F_n} can get as large as {1} on almost all of the second half {X \times \{2\}}. This concludes the rough sketch of how one establishes the implication (1).

It was observed by Bufetov that the spherical averages {{\mathcal A}_n} for a free group action can be lifted up to become powers {P^n} of a Markov operator, basically by randomly assigning a “velocity vector” {s \in \{a,b,a^{-1},b^{-1}\}} to one’s base point {x} and then applying the Markov process that moves {x} along that velocity vector (and then randomly changing the velocity vector at each time step, subject to the “reduced word” condition that the velocity never flips from {s} to {s^{-1}}). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of {F_2} systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with {F_2}-measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case {P(1)}) comes from basically considering the action of {F_2} on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time {n=-\infty}, and only reaches the boundary of this ball at the time {n=0}.
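The lifting of spherical averages to powers of a Markov operator can be tested on a finite toy system. The sketch below is an illustration under my own assumptions rather than Bufetov’s actual construction: the permutations `PERM_A`, `PERM_B` and all function names are hypothetical, `A`, `B` stand for {a^{-1}}, {b^{-1}}, and the operator acts on functions on {X \times \{a,b,a^{-1},b^{-1}\}} by moving along the current velocity {s} and then choosing a new velocity uniformly among the three letters other than {s^{-1}}. Averaging {P^n} of the lifted function over the initial velocity should then reproduce the brute-force spherical average.

```python
from fractions import Fraction

LETTERS = "aAbB"  # A = a^{-1}, B = b^{-1}
INV = {"a": "A", "A": "a", "b": "B", "B": "b"}

# Hypothetical example: two permutations of {0,...,4} define an F_2-action
# that preserves the uniform probability measure.
PERM_A = [1, 2, 0, 4, 3]
PERM_B = [0, 3, 1, 2, 4]


def invert(perm):
    """Return the inverse permutation."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return inv


ACTION = {"a": PERM_A, "A": invert(PERM_A), "b": PERM_B, "B": invert(PERM_B)}


def markov_step(F):
    """Apply the lifted operator: move along s, then pick a new velocity t != s^{-1}."""
    return {
        (x, s): sum(F[(ACTION[s][x], t)] for t in LETTERS if t != INV[s]) / 3
        for (x, s) in F
    }


def spherical_average_via_markov(f, n):
    """Lift f to X x velocities, apply P^n, then average over the initial velocity."""
    F = {(x, s): Fraction(f[x]) for x in range(len(f)) for s in LETTERS}
    for _ in range(n):
        F = markov_step(F)
    return [sum(F[(x, s)] for s in LETTERS) / 4 for x in range(len(f))]


def spherical_average_brute_force(f, n):
    """Average f over the images of each point under all reduced words of length n."""
    words = [()]
    for _ in range(n):
        words = [w + (s,) for w in words for s in LETTERS
                 if not w or s != INV[w[-1]]]
    result = []
    for x in range(len(f)):
        total = Fraction(0)
        for w in words:
            y = x
            for s in w:
                y = ACTION[s][y]
            total += f[y]
        result.append(total / len(words))
    return result


if __name__ == "__main__":
    f = [1, 0, 0, 2, 5]
    for n in range(5):
        print(n, spherical_average_via_markov(f, n) == spherical_average_brute_force(f, n))
```

Exact rational arithmetic is used so that the two computations can be compared with `==` rather than up to rounding. The Markov version costs a number of operations linear in {n}, whereas the brute-force sum ranges over {4 \times 3^{n-1}} words.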