I’ve just uploaded to the arXiv my paper “Mixing for progressions in non-abelian groups“, submitted to Forum of Mathematics, Sigma (which, along with sister publication Forum of Mathematics, Pi, has just opened up its online submission system). This paper is loosely related in subject topic to my two previous papers on polynomial expansion and on recurrence in quasirandom groups (with Vitaly Bergelson), although the methods here are rather different from those in those two papers. The starting motivation for this paper was a question posed in this foundational paper of Tim Gowers on quasirandom groups. In that paper, Gowers showed (among other things) that if {G} was a quasirandom group, patterns such as {(x,xg,xh, xgh)} were mixing in the sense that, for any four sets {A,B,C,D \subset G}, the number of such quadruples {(x,xg,xh, xgh)} in {A \times B \times C \times D} was equal to {(\mu(A) \mu(B) \mu(C) \mu(D) + o(1)) |G|^3}, where {\mu(A) := |A| / |G|}, and {o(1)} denotes a quantity that goes to zero as the quasirandomness of the group goes to infinity. In my recent paper with Vitaly, we also considered mixing properties of some other patterns, namely {(x,xg,gx)} and {(g,x,xg,gx)}. This paper is concerned instead with the pattern {(x,xg,xg^2)}, that is to say a geometric progression of length three. As observed by Gowers, by applying (a suitably quantitative version of) Roth’s theorem in (cosets of) a cyclic group, one can obtain a recurrence theorem for this pattern without much effort: if {G} is an arbitrary finite group, and {A} is a subset of {G} with {\mu(A) \geq \delta}, then there are at least {c(\delta) |G|^2} pairs {(x,g) \in G} such that {x, xg, xg^2 \in A}, where {c(\delta)>0} is a quantity depending only on {\delta}. However, this argument does not settle the question of whether there is a stronger mixing property, in that the number of pairs {(x,g) \in G^2} such that {(x,xg,xg^2) \in A \times B \times C} should be {(\mu(A)\mu(B)\mu(C)+o(1)) |G|^2} for any {A,B,C \subset G}. Informally, this would assert that for {x, g} chosen uniformly at random from {G}, the triplet {(x, xg, xg^2)} should resemble a uniformly selected element of {G^3} in some weak sense.

For non-quasirandom groups, such mixing properties can certainly fail. For instance, if {G} is the cyclic group {G = ({\bf Z}/N{\bf Z},+)} (which is abelian and thus highly non-quasirandom) with the additive group operation, and {A = \{1,\ldots,\lfloor \delta N\rfloor\}} for some small but fixed {\delta > 0}, then {\mu(A) = \delta + o(1)} in the limit {N \rightarrow \infty}, but the number of pairs {(x,g) \in G^2} with {x, x+g, x+2g \in A} is {(\delta^2/2 + o(1)) |G|^2} rather than {(\delta^3+o(1)) |G|^2}. The problem here is that the identity {(x+2g) = 2(x+g) - x} ensures that if {x} and {x+g} both lie in {A}, then {x+2g} has a highly elevated likelihood of also falling in {A}. One can view {A} as the preimage of a small ball under the one-dimensional representation {\rho: G \rightarrow U(1)} defined by {\rho(n) := e^{2\pi i n/N}}; similar obstructions to mixing can also be constructed from other low-dimensional representations.

However, by definition, quasirandom groups do not have low-dimensional representations, and Gowers asked whether mixing for {(x,xg,xg^2)} could hold for quasirandom groups. I do not know if this is the case for arbitrary quasirandom groups, but I was able to settle the question for a specific class of quasirandom groups, namely the special linear groups {G := SL_d(F)} over a finite field {F} in the regime where the dimension {d} is bounded (but is at least two) and {F} is large. Indeed, for such groups I can obtain a count of {(\mu(A) \mu(B) \mu(C) + O( |F|^{-\min(d-1,2)/8} )) |G|^2} for the number of pairs {(x,g) \in G^2} with {(x, xg, xg^2) \in A \times B \times C}. In fact, I have the somewhat stronger statement that there are {(\mu(A) \mu(B) \mu(C) \mu(D) + O( |F|^{-\min(d-1,2)/8} )) |G|^2} pairs {(x,g) \in G^2} with {(x,xg,xg^2,g) \in A \times B \times C \times D} for any {A,B,C,D \subset G}.

I was also able to obtain a partial result for the length four progression {(x,xg,xg^2, xg^3)} in the simpler two-dimensional case {G = SL_2(F)}, but I had to make the unusual restriction that the group element {g \in G} was hyperbolic in the sense that it was diagonalisable over the finite field {F} (as opposed to diagonalisable over the algebraic closure {\overline{F}} of that field); this amounts to the discriminant of the matrix being a quadratic residue, and this holds for approximately half of the elements of {G}. The result is then that for any {A,B,C,D \subset G}, one has {(\frac{1}{2} \mu(A) \mu(B) \mu(C) \mu(D) + o(1)) |G|^2} pairs {(x,g) \in G} with {g} hyperbolic and {(x,xg,xg^2,xg^3) \subset A \times B \times C \times D}. (Again, I actually show a slightly stronger statement in which {g} is restricted to an arbitrary subset {E} of hyperbolic elements.)

For the length three argument, the main tools used are the Cauchy-Schwarz inequality, the quasirandomness of {G}, and some algebraic geometry to ensure that a certain family of probability measures on {G} that are defined algebraically are approximately uniformly distributed. The length four argument is significantly more difficult and relies on a rather ad hoc argument involving, among other things, expander properties related to the work of Bourgain and Gamburd, and also a “twisted” version of an argument of Gowers that is used (among other things) to establish an inverse theorem for the {U^3} norm.

I give some details of these arguments below the fold.

— 1. Length three progressions —

One can view the mixing property of length three progressions as an assertion about the unbiased nature of sums of the form

\displaystyle  \sum_{x,g \in G} f_0(x) f_1(xg) f_2(xg^2) \ \ \ \ \ (1)

for various bounded functions {f_0,f_1,f_2: G \rightarrow {\bf C}}. (To obtain the stronger statement in which {g} is also restricted to some set {D}, one would throw in an additional function {f_3(g)}, but let us ignore that generalisation here for sake of simplicity.) Roughly speaking, mixing means that the sum (1) should be small if at least one of the {f_0,f_1,f_2} have small mean.

One way in which mixing fails would be if there was an unexpected constraint between {x, xg, xg^2}, for instance if there was a constraint of the form

\displaystyle  \phi_0(x) + \phi_1(xg) + \phi_2(xg^2) = 0 \ \ \ \ \ (2)

for all {x, g\in G} and some non-trivial functions {\phi_0,\phi_1,\phi_2:G \rightarrow {\bf R}/{\bf Z}} (not necessarily homomorphisms). Then one could make the sum (1) for {f_j(x) := e^{2\pi i \phi_j(x)}} exhibit no cancellation whatsoever, even though one would expect {f_0,f_1,f_2} to have small mean if the {\phi_i} were sufficiently non-trivial. (This observation is basically what underlies the failure of mixing in the abelian case.) Thus, this suggests the toy problem of ruling out constraints of the form (2) when {G} is a special linear group {G = SL_d(F)}. This toy problem (which can be viewed as ruling out the “{100\%} structured” version of the mixing problem, which is about excluding a more general “{1\%} structured” situation) is significantly weaker than the general result, but it turns out that the proof strategy for the toy problem can be adapted to the general case (basically by replacing many of the algebraic manipulations below with a suitable analogue involving the Cauchy-Schwarz inequality).

Let’s see how this works. Suppose for contradiction that we had a constraint of the form (2). In the abelian case, standard “double differencing” arguments let one conclude that {\phi_0,\phi_1,\phi_2} are affine homomorphisms; see e.g. Section 2 of these lecture notes. It turns out that essentially the same argument can be applied in the nonabelian case, but one acquires a nonabelian “twist” which can be exploited to give additional mixing. Shifting {x} by {g}, we conclude that

\displaystyle  \phi_0(xg^{-1}) + \phi_1(x) + \phi_2(xg) = 0

for all {x,g \in G}. Now we use some algebraic manipulation to eliminate {\phi_1}. If we replace {g} by {ga} for some {a \in G}, we also have

\displaystyle  \phi_0(xa^{-1} g^{-1}) + \phi_1(x) + \phi_2(xga) = 0;

subtracting, we conclude that

\displaystyle  \partial_{ga^{-1}g^{-1}} \phi_0(xg^{-1}) + \partial_a \phi_2(xg) = 0

where {\partial_a \phi(x) := \phi(xa) - \phi(x)} is the “derivative” of {\phi} in the {a} direction. Setting {y := xg}, we conclude that

\displaystyle  \partial_{ga^{-1}g^{-1}} \phi_0(yg^{-2}) + \partial_a \phi_2(y) = 0

for all {y,g,a \in G}.

We can now perform a similar manipulation to eliminate {\phi_2}. Replacing {g} by {hg} for some {h \in G}, we have

\displaystyle  \partial_{hga^{-1}g^{-1}h^{-1}} \phi_0(yg^{-1}h^{-1}g^{-1}h^{-1}) + \partial_a \phi_2(y) = 0.

Subtracting, we conclude that

\displaystyle  \partial_{ga^{-1}g^{-1}} \phi_0(yg^{-2}) = \partial_{hga^{-1}g^{-1}h^{-1}} \phi_0(yg^{-1}h^{-1}g^{-1}h^{-1})

for all {y,g,a,h \in G}. We can clean this up a bit by setting {b := ga^{-1} g} and {z := yg^{-2}}, leading to

\displaystyle  \partial_{b} \phi_0(z) = \partial_{hbh^{-1}} \phi_0(zgh^{-1}g^{-1}h^{-1})

for all {z,g,b,h \in G}.

Next, we exploit the fact that the quantity {hbh^{-1}} appearing on the right-hand side does not change if one replaces {h} by {hc} for any {c} in the centraliser {Z(b) := \{ c\in G: cb=bc\}} of {b}. If we then replace {h} by {hc} in the above equation, we conclude that

\displaystyle  \partial_{b} \phi_0(z) = \partial_{hbh^{-1}} \phi_0(zgc^{-1}h^{-1}g^{-1}h^{-1}c^{-1})

for all {z,g,b,h \in G} and {c \in Z(b)}.

Let us now fix {h,b}, and let {A_{h,b} \subset G} denote the set

\displaystyle  A_{h,b} := \{ gc^{-1}h^{-1}g^{-1}h^{-1} c^{-1}: g \in G; c \in Z(b) \}.

The above identity then tells us that for {z \in G} and {a \in A_{h,b}}, the quantity {\partial_{hbh^{-1}} \phi_0( z a )} is in fact independent of {a}. So if one can show that {A_{h,b}} is “large” (e.g. has positive density in {G}), then this suggests that the function {\partial_{hbh^{-1}} \phi_0} has to be basically constant (and with quasirandomness, one can make this statement precise). Further application of quasirandomness then lets one conclude that {\phi_0} is itself constant, at which point it is not difficult to ensure that {\phi_1} and {\phi_2} are constant as well, rendering the entire constraint (2) trivial.

In the {d=2} case, one can establish this by explicit (but ad hoc) computations (taking advantage of the special role of the trace in the {d=2} case, for instance it is the case that two (non-central) matrices in {SL_2} are conjugate iff they have the same trace, and there is also the nice fact that a matrix in {SL_2} and its inverse have the same trace). For general {d}, this largeness of {A_{h,b}} can be established by algebraic geometry methods; the key is to show that the map {(g,c) \mapsto gc^{-1}h^{-1}g^{-1}h^{-1}c^{-1}} from {G \times Z(b)} to {G} is dominant in the sense that its image is Zariski-dense in {G}. In the case of {SL_d(F)}, this can be accomplished by an inspection of the derivative of this map at the identity. (I expect that similar things can be done in other almost simple algebraic groups, but did not attempt to do so in this paper.)

— 2. Length four progressions —

It is remarkably difficult to extend the Cauchy-Schwarz based length three arguments to length four or higher in the nonabelian setting. In the abelian case, every application of the Cauchy-Schwarz inequality reduces a certain “complexity” of the average being studied; in terms of raw length, the average may look much more fearsome after Cauchy-Schwarz, but after making some changes of variable and collecting terms, one can arrive at an average that is actually simpler in certain key respects than the original average. But it turns out that in the nonabelian setting, the process of making changes of variable and collecting terms introduces additional complexity into the average that counteracts the abelian phenomenon of complexity reduction. This was already apparent in the length three setting, when one started to see messy looking expressions such as {zgc^{-1}h^{-1}g^{-1}h^{-1}c^{-1}} emerge, but the argument was short enough that one could conclude before these expressions spiraled out of control. In the case of length four progressions, the nonabelian complications seem to outrun the simplifying process, and I was not able to end up with a tractable average after a finite number of applications of the Cauchy-Schwarz inequality.

Instead, we leverage the abelian additive combinatorics theory by working primarily with a metabelian subgroup of {SL_2(F)}, namely the Borel subgroup {B} of upper-triangular elements of {SL_2(F)}. Note that every hyperbolic element of {SL_2(F)} can be conjugated into {B}, which explains our restriction to the hyperbolic elements. By using the conjugates of {B} to trace out all the hyperbolic elements of {SL_2(F)} more or less evenly, matters soon reduce to establishing a “relative mixing” property for the pattern {(x,xg,xg^2,xg^3)} on {B}. To explain this relative mixing, first observe that one does not have complete mixing for this pattern in {B}, due to the presence of an abelian quotient {F^\times} of {B}, formed by mapping {\begin{pmatrix} t & a \\ 0 & t^{-1} \end{pmatrix}} to {t}, and one can then pull back the failure of mixing on {F^\times} (e.g. by counting length four progressions inside a single fixed geometric progression) to demonstrate failure of mixing of {B}. However, one can hope to show that this is the only obstruction to mixing, in the sense that we can get sums such as

\displaystyle  \sum_{x,g \in B} f_0(x) f_1(xg) f_2(xg^2) f_3(xg^3) \ \ \ \ \ (3)

to be small if at least one of {f_0,f_1,f_2,f_3: B \rightarrow {\bf C}} pushes down to zero on {F^\times}, or equivalently if it has mean zero on every coset of the kernel of this quotient, which is the group {U} of unipotent matrices in {B}.

In order to upgrade relative mixing on {B} and its conjugates back to full mixing on {G}, we need a certain expansion property of a given conjugacy class {C(a)} of a non-central element {a \in G}. This property asserts that if {f \in \ell^2(G)} has mean zero, then after convolving {f} with the uniform probability measure on such a conjugacy class, the {\ell^2} norm drops by a positive power of {|F|}. This type of expansion is related to the work of Bourgain and Gamburd (in which the conjugacy class is replaced by a set of bounded cardinality, and the drop in {L^2} norm is proportionally smaller as a result), and uses some of the same tools in the proof (in particular the “escape from subvarieties” phenomenon of Eskin, Mozes, and Oh)). (On the other hand, the \href{combinatorial product theory of Helfgott}, which plays a central role in the work of Bourgain and Gamburd, is not needed here, because in this setting one only needs to understand products of algebraic sets, such as conjugacy classes, rather than arbitrary subsets.)

By foliating {B} into cosets of {U} (which is isomorphic to {F}), one can after some straightforward calculations rewrite the sum into a sum which is basically of the form

\displaystyle  \sum_{s,t \in F^\times} \sum_{a,b \in F} f_{0,s}(a) f_{1,st}(a+b) f_{2,st^2}(a+(1+t^2)b) f_{3,st^3}(a+(1+t^2+t^4)b)

for some family {f_{i,s}: F \rightarrow {\bf C}} of bounded functions for {i=0,1,2,3} and {s \in F^\times}. The inner sum resembles a count of four term progressions, a statistic which has been studied by higher order Fourier-analytic methods since the work of Gowers on Szemerédi’s theorem for length four progressions. In principle one could analyse these expressions using the inverse {U^3} theorem of Ben Green and myself, but this would require a large amount of manipulation of two-step nilsequences, which would lead to a number of technical complications. Instead, we take a “softer” approach, in which we set up some of the quadratic Fourier analysis of Gowers that goes into the proof of the inverse {U^3} theorem, but stop well before the nilsequences come in. More precisely, we use a variant of the basic fact in quadratic Fourier analysis (already present in the previously mentioned paper of Gowers) that if a function {f} has large {U^3} norm, then for many shifts {h}, the derivative {\Delta_h f(x) := f(x+h) \overline{f(x)}} correlates with a linear phase {e(\xi(h) x)}, and furthermore that this phase {\xi} is approximately linear in the sense that there are many quadruples {(h_1,h_2,h_3,h_4)} with {h_1+h_2=h_3+h_4} and {\xi(h_1)+\xi(h_2) = \xi(h_3)+\xi(h_4)}. Applying this analysis to the above sum, we see that if that sum is large, then one obtains a number of approximate linearity relationships between the frequencies {\xi} for which {\Delta_h f_{i,s}(x)} correlates with {e(\xi x)}. On the other hand, for each fixed {h,i,s}, Plancherel’s theorem tells us that there can only be a bounded number of frequencies {\xi} for which the correlation between {\Delta_h f_{i,s}(x)} and {e(\xi x)} is large. Varying {s,t} suitably, this eventually creates so many linear constraints between these frequencies (with coefficients that vary in a sufficiently nonlinear fashion to ensure a high rank) that a contradiction can be derived, unless all the frequencies involved vanish. But this case can be handled by a variant of the above arguments, though in this case one needs to vary {s,t} inside a moderately large two-dimensional arithmetic progression before one can finally reduce to a contradiction, which requires invoking the multidimensional Szemereédi theorem in order to ensure that all the pairs {(s,t)} used are “good” in a certain technical sense. It is this last step which makes the error terms in the length four progression results to be qualitative (of order {o(1)}) rather than quantitative (of order {O(|F|^{-c})}). I feel that there should be a better approach than the rather ad hoc approach employed here which should lead to better bounds (and which would more easily extend to other groups than {SL_2(F)}).