We have now seen two ways to construct expander Cayley graphs {Cay(G,S)}. The first, discussed in Notes 2, is to use Cayley graphs that are projections of an infinite Cayley graph on a group with Kazhdan’s property (T). The second, discussed in Notes 3, is to combine a quasirandomness property of the group {G} with a flattening hypothesis for the random walk.

We now pursue the second approach more thoroughly. The main difficulty here is to figure out how to ensure flattening of the random walk, as it is then an easy matter to use quasirandomness to show that the random walk becomes mixing soon after it becomes flat. In the case of Selberg’s theorem, we achieved this through an explicit formula for the heat kernel on the hyperbolic plane (which is a proxy for the random walk). However, in most situations such an explicit formula is not available, and one must develop some other tool for forcing flattening, and specifically an estimate of the form

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \ll |G|^{-1/2+\epsilon} \ \ \ \ \ (1)

for some {n = O(\log |G|)}, where {\mu} is the uniform probability measure on the generating set {S}.

In 2006, Bourgain and Gamburd introduced a general method for achieving this goal. The intuition here is that the main obstruction that prevents a random walk from spreading out to become flat over the entire group {G} is if the random walk gets trapped in some proper subgroup {H} of {G} (or perhaps in some coset {xH} of such a subgroup), so that {\mu^{(n)}(xH)} remains large for some moderately large {n}. Note that

\displaystyle  \mu^{(2n)}(H) \geq \mu^{(n)}(H x^{-1}) \mu^{(n)}(xH) = \mu^{(n)}(xH)^2,

since {\mu^{(2n)} = \mu^{(n)} * \mu^{(n)}}, {H = (H x^{-1}) \cdot (xH)}, and {\mu^{(n)}} is symmetric. By iterating this observation, we seethat if {\mu^{(n)}(xH)} is too large (e.g. of size {|G|^{-o(1)}} for some {n} comparable to {\log |G|}), then it is not possible for the random walk {\mu^{(n)}} to converge to the uniform distribution in time {O(\log |G|)}, and so expansion does not occur.

A potentially more general obstruction of this type would be if the random walk gets trapped in (a coset of) an approximate group {H}. Recall that a {K}-approximate group is a subset {H} of a group {G} which is symmetric, contains the identity, and is such that {H \cdot H} can be covered by at most {K} left-translates (or equivalently, right-translates) of {H}. Such approximate groups were studied extensively in last quarter’s course. A similar argument to the one given previously shows (roughly speaking) that expansion cannot occur if {\mu^{(n)}(xH)} is too large for some coset {xH} of an approximate group.

It turns out that this latter observation has a converse: if a measure does not concentrate in cosets of approximate groups, then some flattening occurs. More precisely, one has the following combinatorial lemma:

Lemma 1 (Weighted Balog-Szemerédi-Gowers lemma) Let {G} be a group, let {\nu} be a finitely supported probability measure on {G} which is symmetric (thus {\nu(g)=\nu(g^{-1})} for all {g \in G}), and let {K \geq 1}. Then one of the following statements hold:

  • (i) (Flattening) One has {\| \nu * \nu \|_{\ell^2(G)} \leq \frac{1}{K} \|\nu\|_{\ell^2(G)}}.
  • (ii) (Concentration in an approximate group) There exists an {O(K^{O(1)})}-approximate group {H} in {G} with {|H| \ll K^{O(1)} / \| \nu \|_{\ell^2(G)}^2} and an element {x \in G} such that {\nu(xH) \gg K^{-O(1)}}.

This lemma is a variant of the more well-known Balog-Szemerédi-Gowers lemma in additive combinatorics due to Gowers (which roughly speaking corresponds to the case when {\mu} is the uniform distribution on some set {A}), which in turn is a polynomially quantitative version of an earlier lemma of Balog and Szemerédi. We will prove it below the fold.

The lemma is particularly useful when the group {G} in question enjoys a product theorem, which roughly speaking says that the only medium-sized approximate subgroups of {G} are trapped inside genuine proper subgroups of {G} (or, contrapositively, medium-sized sets that generate the entire group {G} cannot be approximate groups). The fact that some finite groups (and specifically, the bounded rank finite simple groups of Lie type) enjoy product theorems is a non-trivial fact, and will be discussed in later notes. For now, we simply observe that the presence of the product theorem, together with quasirandomness and a non-concentration hypothesis, can be used to demonstrate expansion:

Theorem 2 (Bourgain-Gamburd expansion machine) Suppose that {G} is a finite group, that {S \subseteq G} is a symmetric set of {k} generators, and that there are constants {0 < \kappa < 1 < \Lambda} with the following properties.

  1. (Quasirandomness). The smallest dimension of a nontrivial representation {\rho: G \rightarrow GL_d({\bf C})} of {G} is at least {|G|^{\kappa}};
  2. (Product theorem). For all {\delta > 0} there is some {\delta' = \delta'(\delta) > 0} such that the following is true. If {H \subseteq G} is a {|G|^{\delta'}}-approximate subgroup with {|G|^{\delta} \leq |H| \leq |G|^{1 - \delta}} then {H} generates a proper subgroup of {G};
  3. (Non-concentration estimate). There is some even number {n \leq \Lambda\log |G|} such that

    \displaystyle  \sup_{H < G}\mu^{(n)}(H) < |G|^{-\kappa},

    where the supremum is over all proper subgroups {H < G}.

Then {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon > 0} depending only on {k,\kappa, \Lambda}, and the function {\delta'(\cdot )} (and this constant {\epsilon} is in principle computable in terms of these constants).

This criterion for expansion is implicitly contained in this paper of Bourgain and Gamburd, who used it to establish the expansion of various Cayley graphs in {SL_2(F_p)} for prime {p}. This criterion has since been applied (or modified) to obtain expansion results in many other groups, as will be discussed in later notes.

— 1. The Balog-Szemerédi-Gowers lemma —

The Balog-Szemerédi-Gowers lemma (Lemma 1) is ostensibly a statement about group structure, but the main tool in its proof is a remarkable graph-theoretic lemma (also known as the Balog-Szemerédi-Gowers lemma) that allows one to upgrade a “statistical” structure (a structure which is only valid a small fraction of the time, say 1% of the time) to a “complete” structure (one which is valid 100% of the time), by shrinking the size of the structure slightly (and in particular, with losses of polynomial type, as opposed to exponential or worse). This is in contrast to other structure-improving results (such as Ramsey’s theorem, Szemerédi’s theorem, or Freiman’s theorem), which are qualitatively similar in spirit, but have much worse quantitative bounds (though there is some hope in the case of Freiman’s theorem to only lose polynomial bounds with some improvement of existing arguments).

As we shall see later, the property of {\|\nu*\nu\|_{\ell^2(G)}} being large is a statistical assertion about {\nu} (it asserts that {\nu*\nu} collides with itself somewhat often), whereas approximate groups {H} represent a more complete sort of structure (all products of {H \cdot H} are trapped in a small set, whereas only many of the products in {\nu * \nu} are so constrained). The graph-theoretic Balog-Szemerédi lemma is the key to moving from the former type of structure to the latter with only polynomial losses.

We need some notation. Define a bipartite graph {G = G(A,B,E)} to be a graph whose vertex set {V := A \cup B} is partitioned into two non-empty sets {A, B}, and the edge set {E} consists only of edges between {A} and {B}. If a finite bipartite graph {G = G(A,B,E)} is dense in the sense that its edge density {|E|/|A||B|} is large, then for many vertices {a \in A} and {b \in B}, {a} and {b} are connected by a path of length one (i.e. an edge). It is thus intuitive that many pairs of vertices {a \in A} and {a' \in A} will be connected by many paths of length two. Perhaps surprisingly, one can upgrade “many pairs” here to “almost all pairs”, provided that one is willing to shrink the set {A} slightly. More precisely, one has

Lemma 3 (Balog-Szemerédi-Gowers lemma: paths of length two) Let {G(A,B,E)} be a finite bipartite graph with {|E| \geq |A| |B| / K}. Let {\epsilon > 0}. Then there exists a subset {A'} of {A} with {|A'| \geq \frac{|A|}{\sqrt{2} K}} such that at least {(1-\epsilon)|A'|^2} of the pairs {(a,a') \in A' \times A'} are such that {a,a'} are connected by at least {\frac{\epsilon}{2K^2} |B|} paths of length two (i.e. there exists at least {\frac{\epsilon}{2K^2} |B|} vertices {b \in B} such that {\{a,b\}, \{a',b\}} both lie in {E}).

Remark 1 It is not possible to remove the {\epsilon} entirely from this lemma; see Exercise 6.4.2 of my book with Van Vu for a counterexample (involving Hamming balls).

Proof: The idea here is to use a probabilistic construction, picking {A'} to be a neighbourhood of a randomly selected element {b} of {B}. The rationale here is that if a pair {a,a'} of vertices in {A} are not connected by many paths of length two, then they are unlikely to lie in the same neighbourhood, and so are unlikely to “wreck” the construction.

We turn to the details. Let {b \in B} be chosen uniformly at random, and let {A' := \{ a \in A: (a,b) \in E \}} be the neighbourhood of {b}. Observe that the expected size of {A'} is

\displaystyle {\bf E} |A'| = \frac{1}{|B|} |E| \geq \frac{|A|}{K}.

By Cauchy-Schwarz, we conclude in particular that

\displaystyle  {\bf E} |A'|^2 \geq \frac{|A|^2}{K^2}. \ \ \ \ \ (2)

Now, call a pair {(a,a')} bad if it is connected by fewer than {\frac{\epsilon |B|}{2K^2}} paths of length two, and let {N} be the number of bad pairs {(a,a')} in {A' \times A'}. We consider the quantity {{\bf E} N}. Observe that if {(a,a')} is a bad pair in {A \times A}, then there are at most {\frac{\epsilon |B|}{2K^2}} values of {b} for which {a} and {a'} will both lie in {A'}, and so this bad pair contributes at most {\frac{\epsilon}{2K^2}} to the expectation. Since there are at most {|A|^2} bad pairs, we conclude that

\displaystyle  {\bf E} N \leq \frac{\epsilon |A|^2}{2K^2}.

Combining this with (2), we see that

\displaystyle  {\bf E} |A'|^2 - \frac{N}{\epsilon} - \frac{|A|^2}{2K^2} \geq 0.

In particular, there exists a choice of {b} for which the expression on the left-hand side is non-negative. This implies that

\displaystyle  N \leq \epsilon |A'|^2

and

\displaystyle  |A'|^2 \geq \frac{|A|^2}{2K^2}

and the claim follows. \Box

Given that almost all pairs {a,a'} in {A'} are joined by many paths of length two, it is then plausible that almost all pairs {a \in A'}, {b \in B'} are joined by many paths of length three, for some large subset {B'} of {B}. Remarkably, one can now upgrade “almost all” pairs here to all pairs:

Lemma 4 (Balog-Szemerédi-Gowers lemma: paths of length three) Let {G(A,B,E)} be a finite bipartite graph with {|E| \geq |A| |B| / K}. Then there exists subsets {A', B'} of {A, B} respectively with {|A'| \gg K^{-O(1)} |A|} and {|B'| \gg K^{-O(1)} |B|}, such that for every {a \in A} and {b \in B}, {a} and {b} are joined by {\gg K^{-O(1)} |A| |B|} paths of length three.

Remark 2 A lemma similar to this was first established by Balog and Szemerédi, as a consequence of the Szemereédi regularity lemma. However, as a consequence of using that lemma, the polynomial bounds {K^{-O(1)}} in the above lemma had to be replaced by much worse bounds (of tower-exponential type in {K}), which turns out to be far too weak for the purposes of establishing expansion.

Proof: The idea is to first prune a few “unpopular” vertices from {A} and {B} and then apply the preceding lemma.

Let {A_1} be the vertices in {A} of degree at least {|B|/2K}, and let {E_1} be the edges connecting {A_1} and {B}. Note that the vertices in {A \backslash A_1} are connected to a total of at most {|A| |B|/2K} edges, and so {|E_1| \geq |A| |B|/2K \geq |A_1| |B|/2K}. Since {|E_1| \leq |A_1| |B|}, we conclude in particular that {|A_1| \ge |A|/2K}.

Let {\epsilon > 0} be a sufficiently small quantity (depending on {K}) to be chosen later. Applying Lemma 3, one can find a subset {A_2} of {A_1} of cardinality {|A_2| \gg |A_1|/K \gg |A|/K^2} such that at most {\epsilon |A_2|^2} of the pairs {(a,a') \in A_2 \times A_2} are bad in the sense that they are connected by {\gg \epsilon/K^2 |B|} paths of length two.

Let {A'} be those vertices {a} in {A_2} for which there are at most {\sqrt{\epsilon} |A_2|} elements {a'} of {A_2} for which {(a,a')} is bad. By Markov’s inequality, {A'} consists of all but at most {\sqrt{\epsilon} |A_2|} elements of {A_2}.

Let {E_2} be the edges connecting {A_2} with {B}. Since each vertex in {A_2} has degree at least {|B|/2K}, one has

\displaystyle  |E_2| \geq |A_2| |B| / 2K \gg |A| |B| / K^3.

We may thus find a subset {B'} of {B} of cardinality {|B'| \gg |B|/K^3} such that each {b \in B'} is adjacent to {\gg |A|/K^3} elements of {A_2}.

Now let {a \in A'} and {b \in B'}. We know that {b} is adjacent to {\gg |A|/K^3} elements {a'} of {A_2}, and that at most {\sqrt{\epsilon} |A_2|} of these elements are such that {(a,a')} is bad. If we choose {\epsilon} to be a sufficiently small multiple of {1/K^6}, we conclude that there are {\gg |A|/K^3} elements {a'} which are adjacent to {b} and for which {(a,a')} is not bad. One thus has {\gg (|A|/K^3) (\epsilon/K^2) |B| \gg |A| |B| / K^{11}} paths of length three connecting {a} to {b}, and the claim follows. \Box

The exponents in {K} here can be improved slightly, but we will not attempt to obtain the optimal numerology here.

Remark 3 The above results are analogous to a phenomenon in additive combinatorics, namely that a “1%-structured” set (such as a small density subset of a group) can often be upgraded to a “99%-structured” set (such as the complement of a small density subset of a group) by applying a single “convolution” or “sumset” operation, and then upgraded further to a “100%-structured” set (such as a genuine group) by applying a further convolution or sumset operation. (This is basically why, for instance, it is known that almost all even natural numbers are the sum of two primes, and all but finitely many odd natural numbers are the sum of three primes; but it is not known whether all but finitely many even natural numbers are the sum of two primes.)

Exercise 1 (Weighted Balog-Szemerédi-Gowers theorem) Let {(X,\mu)} and {(Y,\nu)} be probability spaces, and let {E \subset X \times Y} have measure {\mu \times \nu(E) \geq 1/K} for some {K \geq 1}.

  • (i) Show that for any {\epsilon > 0}, there exists a subset {X'} of {X} of measure {\mu(X') \geq \frac{1}{\sqrt{2}K}} such that

    \displaystyle  \mu \times \mu( \{ (x,x') \in X' \times X':

    \displaystyle  \int_Y 1_E(x,y) 1_E(x',y)\ d\nu(y) < \frac{\epsilon}{2K^2} \} ) \leq \epsilon \mu(X')^2.

  • (ii) Show that there exists subsets {X', Y'} of {X,Y} of measure {\mu(X') \gg K^{-O(1)}} and {\nu(Y') \gg K^{-O(1)}} such that

    \displaystyle  \int_X \int_Y 1_E(x,y') 1_E(x',y') 1_E(x',y)\ d\mu(x') d\nu(y') \gg K^{-O(1)}

    for all {x \in X'} and {y \in Y'}.

Exercise 2 (99% Balog-Szemerédi theorem) Let {G(A,B,E)} be a finite bipartite graph with {|E| \geq (1-\epsilon) |A| |B|}.

  • (i) Show that there exists a subset {A'} of {A} of size {|A'| \geq (1-O(\sqrt{\epsilon})) |A|} such that for every {a,a' \in A'}, {a} and {a'} are connected by at least {(1-O(\sqrt{\epsilon})) |B|} paths of length {2}. (Hint: select {A'} to be those vertices in {A'} that are connected to “almost all” the vertices in {B}.)
  • (ii) Show that there also exists a subset {B'} of {B} of size {|B'| \geq (1-O(\sqrt{\epsilon})) |B|} such that for every {a \in A'} and {b \in B'}, {a} and {b} are connected by at least {(1-O(\sqrt{\epsilon})) |A| |B|} paths of length {3}.

We now apply the graph-theoretic lemma to the group context. The main idea here is to show that various sets (e.g. product sets {A \cdot B}) are small by showing that they are in the high-multiplicity region of some convolution (e.g. {1_{A_1} * \ldots * 1_{A_k}}), or equivalently that elements {g} of such sets have many representations as a product {g = a_1 \ldots a_k} with {a_1 \in A_1, \ldots, a_k \in A_k}. One can then use Markov’s inequality and the trivial identity {\| 1_{A_1} * \ldots * 1_{A_k} \|_{\ell^1(G)} = |A_1| \ldots |A_k|} to get usable size bounds on such sets.

Corollary 5 (Balog-Szemerédi lemma, product set form) let {A, B} be finite non-empty subsets of a group {G = (G,\cdot)}, and suppose that

\displaystyle  \|1_A * 1_B \|_{\ell^2(G)} \geq |A|^{3/4} |B|^{3/4}/K

for some {K \geq 1}. (This hypothesis should be compared with the upper bound

\displaystyle  \|1_A * 1_B \|_{\ell^2(G)} \leq \|1_A\|_{\ell^{4/3}(G)} \|1_B\|_{\ell^{4/3}(G)} = |A|^{3/4} |B|^{3/4}

arising from Young’s inequality.) Then there exists subsets {A', B'} of {A, B} respectively with {|A'| \gg K^{-O(1)} |A|} and {|B'| \gg K^{-O(1)} |B|} with {|A' \cdot B'| \ll K^{O(1)} |A|^{1/2} |B|^{1/2}} and {|A' \cdot (A')^{-1}| \ll K^{O(1)} |A|}.

The quantity {\|1_A *1_B\|_{\ell^2(G)}^2} (or equivalently, the number of solutions to the equation {ab=a'b'} with {a,a' \in A} and {b,b' \in B}) is also known as the multiplicative energy of {A} and {B}, and is sometimes denoted {E(A,B)} in the literature.

Proof: By hypothesis, we have

\displaystyle  \sum_{(a,b) \in A \times B} 1_A * 1_B(ab) = \|1_A * 1_B \|_{\ell^2(G)}^2 \geq |A|^{3/2} |B|^{3/2} / K^2.

Since

\displaystyle  \sum_{(a,b) \in A \times B: 1_A * 1_B(ab) \leq |A|^{1/2} |B|^{1/2}/2K^2} 1_A * 1_B(ab) \leq |A|^{3/2} |B|^{3/2} / 2K^2,

we conclude that

\displaystyle  \sum_{(a,b) \in A \times B: 1_A * 1_B(ab) > |A|^{1/2} |B|^{1/2}/2K^2} 1_A * 1_B(ab) \geq |A|^{3/2} |B|^{3/2} / 2K^2.

Since, by Cauchy-Schwarz (or Young’s inequality), we have {1_A*1_B(ab) \leq |A|^{1/2} |B|^{1/2}}, we conclude that there is a set {E \subset A \times B} with {|E| \geq |A| |B|/2K^2} such that

\displaystyle  1_A * 1_B(ab) > |A|^{1/2} |B|^{1/2}/2K^2

for all {(a,b) \in E}.

By slight abuse of notation (arising from the fact that {A}, {B} are not necessarily disjoint, and that {E} is a set of ordered pairs rather than unordered pairs), we can view the triplet {(A,B,E)} as a bipartite graph. Applying Lemma 4, we can find subsets {A', B'} of {A, B} respectively with {|A'| \gg K^{-O(1)} |A|} and {|B'| \gg K^{-O(1)} |B|} such that for all {a \in A'} and {b \in B'}, one can find {\gg K^{-O(1)} |A| |B|} elements {a' \in A, b' \in B} such that {(a,b'), (a',b'), (a',b) \in E}. In particular, we see that

\displaystyle  \sum_{a' \in G} \sum_{b' \in G} 1_A * 1_B(ab') 1_A * 1_B(a'b') 1_A*1_B(a'b) \gg K^{-O(1)} |A|^{5/2} |B|^{5/2}. \ \ \ \ \ (3)

Observe that {1_A * 1_B(a'b') = 1_{B^{-1}}*1_{A^{-1}} ((b')^{-1} (a')^{-1} )}. Using the identity {(ab') ((b')^{-1} (a')^{-1}) (a' b) = ab}, we note that triples {(ab', (b')^{-1} (a')^{-1}, a'b)} for {a',b' \in G} are precisely those triples {(g_1,g_2,g_3) \in G \times G} with {g_1g_2g_3 = ab}. Thus the left-hand side of (3) is equal to {F(ab)}, where

\displaystyle  F := 1_A * 1_B * 1_{B^{-1}} * 1_{A^{-1}} * 1_A * 1_B.

But since

\displaystyle  \|F\|_{\ell^1} = |A| |B| |B^{-1}| |A^{-1}| |A| |B| = |A|^3 |B|^3,

we see from Markov’s inequality that there are at most {O(K^{O(1)} |A|^{1/2} |B|^{1/2})} possible values for {ab}, which gives the bound {|A' \cdot B'| \ll K^{O(1)} |A|^{1/2} |B|^{1/2}}.

The second bound {|A' \cdot (A')^{-1}| \ll K^{O(1)} |A|} can be proven similarly to the first (noting that any {a,a' \in A'} are connected by {\gg K^{-O(1)} |A|^2 |B|^2} paths of length six), but can also from the former bound as follows. Observe that any element {a (a')^{-1} \in A' \cdot (A')^{-1}} has at least {|B'|} representations of the form {a(a')^{-1} = (ab) (a'b)^{-1}} with {b \in B'}, and hence {ab,a'b \in A' \cdot B'}, thus

\displaystyle  1_{A'B'} * 1_{(A'B')^{-1}} \geq |B'| \gg K^{-O(1)} |B|

on {A' (A')^{-1}}. On the other hand, the left-hand side has an {\ell^1(G)} norm of {|A'B'| |(A'B')^{-1}| \ll K^{O(1)} |A| |B|}, and the bound {|A' \cdot (A')^{-1}| \ll K^{O(1)} |A|} then follows from Markov’s inequality. \Box

Exercise 3 In the converse direction, show that if {A, B} are non-empty finite subsets of {G} with {|AB| \leq K |A|^{1/2} |B|^{1/2}}, then {\|1_A * 1_B \|_{\ell^2(G)} \geq |A|^{3/4} |B|^{3/4} / K^{1/2}}.

Exercise 4 If {A, B, C} are three non-empty finite subsets of {G}, establish the Ruzsa triangle inequality {|A \cdot C^{-1}| \leq \frac{|A \cdot B^{-1}| |B \cdot C^{-1}|}{|B|}}. (Hint: mimic the final part of the proof of Corollary 5.)

We now give a variant of this corollary involving approximate groups.

Lemma 6 (Balog-Szemerédi lemma, approximate group form) Let {A} be a finite symmetric subset of a group {G = (G,\cdot)}, and suppose that

\displaystyle  \|1_A * 1_A \|_{\ell^2(G)} \geq |A|^{3/2}/K

for some {K \geq 1}. Then there exists a {K^{O(1)}}-approximate group {H} with {|H| \ll K^{O(1)} |A|} such that {|A \cap gH| \gg K^{-O(1)} |A|} for some {g \in H}.

Proof: By Corollary 5, we may find a subset {A' \subset A} with {|A'| \gg K^{-O(1)} |A|} such that

\displaystyle  |A' (A')^{-1}| \ll K^{O(1)} |A|. \ \ \ \ \ (4)

By Exercise 3, this implies that

\displaystyle  \| 1_{A'} * 1_{(A')^{-1}} \|_{\ell^2(G)}^2 \gg K^{-O(1)} |A|^3.

Observe that the left-hand side is equal to

\displaystyle  1_{A'} * 1_{(A')^{-1}} * 1_{A'} * 1_{(A')^{-1}} (1)

\displaystyle  = 1_{(A')^{-1}} * 1_{A'} * 1_{(A')^{-1}} * 1_{A'}(1)

\displaystyle  = \| 1_{(A')^{-1}} * 1_{A'} \|_{\ell^2(G)}^2.

We conclude that

\displaystyle  \sum_{s \in G} (1_{(A')^{-1}} * 1_{A'}(s))^2 \gg K^{-O(1)} |A|^3.

On the other hand, we have

\displaystyle  \sum_{s \in G} 1_{(A')^{-1}} * 1_{A'}(s) = |A'| |A'| \leq |A|^2.

As a consequence, we see that if we set

\displaystyle  S := \{ s \in G: 1_{(A')^{-1}} * 1_{A'}(s) \geq C^{-1} K^{-C} |A| \}

for some sufficiently large absolute constant {C}, then

\displaystyle  \sum_{s \in G \backslash S} (1_{(A')^{-1}} * 1_{A'}(s))^2 \leq C^{-1} K^{-C} |A|^3,

and thus (for {C} large enough)

\displaystyle  \sum_{s \in S} (1_{(A')^{-1}} * 1_{A'}(s))^2 \gg K^{-O(1)} |A|^3.

Since {1_{(A')^{-1}} * 1_{A'}(s) \leq|A'| \leq |A|}, we conclude that

\displaystyle  |S| \gg K^{-O(1)} |A|.

Also, {S} is clearly symmetric and contains the origin.

Now let us consider an element {g = a_0 s_1 \ldots s_5 b_6^{-1}} of the product {(A') S^5 (A')^{-1}}. By construction of {S}, we can write each {s_i} as a product {b_i^{-1} a_i} with {a_i,b_i \in A'} in at least {C^{-1} K^{-C} |A|} ways. Doing so for each {i=1,\ldots,5} gives rise to a factorisation

\displaystyle  g = g_1 \ldots g_6

where {g_i := a_{i-1} b_i^{-1} \in A' (A')^{-1}}; as the {g_1,\ldots,g_6} uniquely determine the {a_i,b_i} (for fixed {a_0,s_1,\ldots,s_5,b_6}), we conclude that each element {g} of {(A') S^5 (A')^{-1}} has at least {\gg K^{-O(1)} |A|^5} such factorisations. But by (4), there are at most {O(K^{O(1)}|A|^6)} such tuples {g_1,\ldots,g_6}, and so there are at most {O(K^{O(1)} |A|)} possible values for {g}, thus

\displaystyle  |(A') S^5 (A')^{-1}| \ll K^{O(1)} |A|. \ \ \ \ \ (5)

In particular,

\displaystyle  |S^5| \ll K^{O(1)} |S|.

By the Ruzsa covering lemma (see the exercise below), this implies that {S^4} is covered by {O(K^{O(1)})} left-translates of {S^2}, and so {H := S^2} is a {K^{O(1)}}-approximate group. Finally, from (5) one has

\displaystyle  |A' H| \ll K^{O(1)} |A|

and thus by Exercise 3

\displaystyle  \| 1_{A'} * 1_H \|_{\ell^2(G)} \gg K^{-O(1)} |A|^{3/2}.

In particular, since the support of {1_{A'} * 1_H} has size {O(K^{O(1)} |A|)}, one has

\displaystyle  1_{A'} * 1_H(g) \gg K^{-O(1)} |A|

for some {g \in G}, or equivalently that

\displaystyle  |A' \cap Hg| \gg K^{-O(1)} |A|.

Increasing {A'} to {A} and taking inverses, we conclude that {|gH \cap A| \ll K^{-O(1)} |A|}, and the claim follows. \Box

Exercise 5 (Ruzsa covering lemma) Let {A, B} be finite non-empty subsets of a group {G}. Show that {A} can be covered by at most {\frac{|AB|}{|B|}} left-translates of {BB^{-1}}. (Hint: consider a maximal disjoint collection of translates {aB} of {B} with {a \in A}.)

Exercise 6 (Converse to Balog-Szemerédi-Gowers) Let {A} be a finite symmetric subset of a group {G = (G,\cdot)}, and suppose there exists a {K}-approximate group {H} with {|H| \leq K |A|} such that {|A \cap gH| \geq |A|/K} for some {g \in H}. Show that

\displaystyle  \|1_A * 1_A\|_{\ell^2(G)} \geq K^{-3} |A|^{3/2}.

Exercise 7 Let {A, B} be finite non-empty subsets of a group {G}, and suppose that {\|1_A * 1_B \|_{\ell^2(G)} \geq |A|^{3/2} |B|^{3/2} / K}. Show that there exists a {O(K^{O(1)})}-approximate group {H} with {K^{-O(1)} |A|^{1/2} |B|^{1/2} \leq |H| \leq K^{O(1)} |A|^{1/2} |B|^{1/2}} and elements {g, h \in G} such that {|A \cap gH| \gg K^{-O(1)} |H|} and {|B \cap Hh| \gg K^{-O(1)} |H|}.

Finally, we can prove Lemma 1. Fix {G, \nu, K}. We may assume that

\displaystyle  \| \nu * \nu \|_{\ell^2(G)} > \frac{1}{K} \|\nu\|_{\ell^2(G)} \ \ \ \ \ (6)

and we need to use this to locate an {O(K^{O(1)})}-approximate group {H} in {G} with {|H| \ll K^{O(1)} / \| \nu \|_{\ell^2(G)}^2} and an element {x \in G} such that {\nu(xH) \gg K^{-O(1)}}.

Let us write {M := 1/\|\nu\|_{\ell^2(G)}^2}. Intuitively, {M} represents the “width” of the probability meaure {\nu}, as can be seen by considering the model example {\nu = \frac{1}{M} 1_A} where {A} is a symmetric set of cardinality {M} (i.e. {\nu} is the uniform probability measure on {A}). If we were actually in this model case, we could apply Lemma 6 immediately and be done. Of course, in general, {\nu} need not be a uniform measure on a set of size {M}. However, it turns out that one can use (6) to conclude that the “bulk” of {\nu} is basically of this form.

More precisely, let us split {\nu = \nu_{<}+\nu_{>}+\nu_=}, where

\displaystyle  \nu_{<} := \nu 1_{\nu \leq \frac{1}{100K^2M}}

\displaystyle  \nu_{>} := \nu 1_{\nu \geq \frac{10K}{M}}

\displaystyle  \nu_= := \nu - \nu_{<} - \nu_{>}.

Observe that

\displaystyle  \| \nu_{<}\|_{\ell^2(G)}^2 \leq \frac{1}{100K^2M} \| \nu \|_{\ell^1(G)} = \frac{1}{100K^2M}

and so by Young’s inequality

\displaystyle  \| \nu_{<} * \nu \|_{\ell^2(G)} = \| \nu * \nu_{>} \|_{\ell^2(G)} \leq \frac{1}{10KM^{1/2}}.

In a similar vein, we have

\displaystyle  \| \nu_{>} \|_{\ell^1(G)} \leq \frac{M}{10K} \| \nu \|_{\ell^2(G)}^2 = \frac{1}{10K}

and thus by Young’s inequality (and the normalisation {\|\nu\|_{\ell^2(G)} = 1/M^{1/2}})

\displaystyle  \| \nu_{>} * \nu \|_{\ell^2(G)} = \| \nu * \nu_{<} \|_{\ell^2(G)} \leq \frac{1}{10KM^{1/2}}.

Finally, from (6) one has

\displaystyle  \| \nu * \nu \|_{\ell^2(G)} \geq \frac{1}{K M^{1/2}}.

Subtracting using the triangle inequality (ignoring some slight double-counting), we conclude that

\displaystyle  \| \nu_= * \nu_= \|_{\ell^2(G)} \gg \frac{1}{KM^{1/2}}.

If we then set {A := \{ g \in G: \nu(g) > \frac{1}{100K^2 M} \}}, we conclude in particular that

\displaystyle  \| 1_A * 1_A \|_{\ell^2(G)} \gg K^{-O(1)} M^{3/2}.

On the other hand, from Markov’s inequality one has {|A| \ll K^2 M}. Applying Lemma 6, we conclude the existence of a {O(K^{O(1)})}-approximate group {H} with {|H| \ll K^{O(1)} M} such that {|A \cap gH| \gg K^{-O(1)} M} for some {g \in G}, which by definition of {A} implies that {\nu(gH) \gg K^{-O(1)}}, and the claim follows.

— 2. The Bourgain-Gamburd expansion machine —

We can now prove Theorem 2. We can assume that {|G|} is sufficiently large depending on the parameters {k,\kappa,\Lambda,\delta'}, since the claim is trivial for bounded {G} (note that as {S} generates {G}, the Cayley graph {Cay(G,S)} will be an {\epsilon}-expander for some {\epsilon>0}). Henceforth we allow all implied constants in the asymptotic notation to depend on {k,\kappa,\Lambda,\delta'}.

To show expansion, it suffices from the quasirandomness hypothesis (and Proposition 4 from the preceding notes), it will suffice to show that

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \leq |G|^{-1/2+\kappa/2} \ \ \ \ \ (7)

for some {n = O(\log |G|)}.

From Young’s inequality, {\| \mu^{(n)}\|_{\ell^2(G)}} is decreasing in {n}, and is initially equal to {1} when {n=0}. We need to “flatten” the {\ell^2(G)} norm of {\mu^{(n)}} as {n} increases. We first use the non-concentration hypothesis to obtain an initial amount of flattening:

Proposition 7 For any {n \geq \frac{1}{2} \Lambda \log |G|}, one has

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \leq |G|^{-\kappa/4}. \ \ \ \ \ (8)

Furthermore, we have

\displaystyle  \mu^{(n)}(gH) \leq |G|^{-\kappa/2} \ \ \ \ \ (9)

for all proper subgroups {H} of {G} and all {g \in G}.

Proof: By the non-concentration hypothesis, we can find {n_0 \leq \frac{1}{2} \Lambda \log |G|} such that

\displaystyle  \mu^{(2n_0)}(H) \leq |G|^{-\kappa}

for all proper subgroups {H} of {G}. If we write {\mu^{(2n_0)}(H)} as {\mu^{(n_0)}*\mu^{(n_0)}( Hg g^{-1} H)}, we see that

\displaystyle  \mu^{(2n_0)}(H) \geq \mu^{(n_0)}(Hg) \mu^{(n_0)}(g^{-1} H)

for all {g \in G}. By symmetry, {\mu^{(n_0)}(g^{-1} H) = \mu^{(n_0)}(Hg)}, and thus

\displaystyle  \sup_{g \in G} \mu^{(n_0)}(gH) \leq |G|^{-\kappa/2}.

If {n \geq \frac{1}{2} \Lambda \log |G|}, then we may write {\mu^{(n)}} as the convolution of a probability measure {\mu^{(n-n_0)}} and {\mu^{(n_0)}}. From this, we see that

\displaystyle  \mu^{(n)}(g' H) \leq \sup_{g \in G} \mu^{(n_0)}(gH) \leq |G|^{-\kappa/2}

for all {g' \in G}, giving the claim (9). Specialising this to the case when {H} is the trivial group, one has

\displaystyle  \| \mu^{(n)} \|_{\ell^\infty(G)} \leq |G|^{-\kappa/2}.

Since we also have

\displaystyle  \| \mu^{(n)} \|_{\ell^1(G)} = 1,

the claim (8) then follows from Hölder’s inequality. \Box

Now we obtain additional flattening using the product theorem hypothesis:

Lemma 8 (Flattening lemma) Suppose {n \geq \frac{1}{2} \Lambda \log |G|} is such that

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \geq |G|^{-1/2+\kappa/2}. \ \ \ \ \ (10)

Then one has

\displaystyle  \| \mu^{(n)} * \mu^{(n)} \|_{\ell^2(G)} \leq |G|^{-\epsilon} \| \mu^{(n)} \|_{\ell^2(G)}

for some {\epsilon>0} depending only on {\kappa} and {\delta'}.

Proof: Suppose the claim fails for some {\epsilon} to be chosen later, thus

\displaystyle  \| \mu^{(n)} * \mu^{(n)} \|_{\ell^2(G)} > |G|^{-\epsilon} \| \mu^{(n)} \|_{\ell^2(G)}.

Applying Lemma 1, we may thus find a {O(|G|^{O(\epsilon)})}-approximate group {H} with

\displaystyle  |H| \ll |G|^{O(\epsilon)} / \| \mu^{(n)} \|_{\ell^2(G)}^2

and {g \in G} such that

\displaystyle  \mu^{(n)}(gH) \gg |G|^{-O(\epsilon)}.

Since {\mu^{(n)}\|_{\ell^\infty(G)} \leq |G|^{-\kappa/2}} by (9), we see that

\displaystyle  |H| \gg |G|^{\kappa/2-O(\epsilon)}.

Meanwhile, from (10) one has

\displaystyle  |H| \ll |G|^{1-\kappa + O(\epsilon)}.

Applying the product hypothesis (assuming {\epsilon} sufficiently small depending on {\kappa} and {\delta}), we conclude that {H} generates a proper subgroup {K} of {G}, and thus

\displaystyle  \mu^{(n)}(gK) \gg |G|^{-O(\epsilon)}.

But this contradicts (9) (again if {\epsilon} is sufficiently small). \Box

Iterating the above lemma {O(1)} times we obtain (7) for some {n = O(\log |G|)}, as desired.

Remark 4 Roughly speaking, the three hypotheses in Theorem 2 govern three separate stages of the life cycle of the random walk and its distributions {\mu^{(n)}}. In the early stage {n = o(\log |G|)}, the non-concentration hypotheses creates some initial spreading of this random walk, in particular ensuring that the walk “escapes” from cosets of proper subgroups. In the middle stage {n \sim \log |G|}, the product theorem steadily flattens the distribution of the random walk, until it is very roughly comparable to the uniform distribution. Finally, in the late stage {n \gg \log |G|}, the quasirandomness property can smooth out the random walk almost completely to obtain the mixing necessary for expansion.