
Let {G = (G,+)}, {H = (H,+)} be additive groups (i.e., groups with an abelian addition group law). A map {f: G \rightarrow H} is a homomorphism if one has

\displaystyle  f(x+y) - f(x) - f(y) = 0

for all {x,y \in G}. A map {f: G \rightarrow H} is an affine homomorphism if one has

\displaystyle  f(x_1) - f(x_2) + f(x_3) - f(x_4) = 0 \ \ \ \ \ (1)

for all additive quadruples {(x_1,x_2,x_3,x_4)} in {G}, by which we mean that {x_1,x_2,x_3,x_4 \in G} and {x_1-x_2+x_3-x_4=0}. The two notions are closely related; it is easy to verify that {f} is an affine homomorphism if and only if {f} is the sum of a homomorphism and a constant.
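The equivalence between affine homomorphisms and "homomorphism plus constant" can be checked by brute force on a small example. The following is a minimal sketch, assuming {G = H = {\bf Z}/7{\bf Z}} and the illustrative map {f(x) = 3x+5}; the helper names are hypothetical and not part of the discussion above.

```python
from itertools import product

def is_affine_homomorphism(f, n):
    """Check (1) on every additive quadruple of Z/n; f is a list of values mod n."""
    for x1, x2, x3 in product(range(n), repeat=3):
        x4 = (x1 - x2 + x3) % n  # forces x1 - x2 + x3 - x4 = 0 in Z/n
        if (f[x1] - f[x2] + f[x3] - f[x4]) % n != 0:
            return False
    return True

def is_homomorphism(f, n):
    """Check f(x+y) - f(x) - f(y) = 0 for all x, y in Z/n."""
    return all((f[(x + y) % n] - f[x] - f[y]) % n == 0
               for x, y in product(range(n), repeat=2))

n = 7
f = [(3 * x + 5) % n for x in range(n)]    # homomorphism x -> 3x plus the constant 5
shifted = [(v - f[0]) % n for v in f]      # subtract the constant f(0)

print(is_affine_homomorphism(f, n))        # affine homomorphism
print(is_homomorphism(f, n))               # not a homomorphism, since f(0) != 0
print(is_homomorphism(shifted, n))         # f - f(0) is a genuine homomorphism
```

Subtracting {f(0)} is the natural normalisation here, since applying (1) to the quadruple {(x+y, x, 0, y)} gives {f(x+y) - f(x) + f(0) - f(y) = 0}.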

Now suppose that {H} also has a translation-invariant metric {d}. A map {f: G \rightarrow H} is said to be a quasimorphism if one has

\displaystyle  f(x+y) - f(x) - f(y) = O(1) \ \ \ \ \ (2)

for all {x,y \in G}, where {O(1)} denotes a quantity at a bounded distance from the origin. Similarly, {f: G \rightarrow H} is an affine quasimorphism if

\displaystyle  f(x_1) - f(x_2) + f(x_3) - f(x_4) = O(1) \ \ \ \ \ (3)

for all additive quadruples {(x_1,x_2,x_3,x_4)} in {G}. Again, one can check that {f} is an affine quasimorphism if and only if it is the sum of a quasimorphism and a constant (with the implied constant of the quasimorphism controlled by the implied constant of the affine quasimorphism). (Since every constant is itself a quasimorphism, it is in fact the case that affine quasimorphisms are quasimorphisms, but now the implied constant in the latter is not controlled by the implied constant of the former.)

“Trivial” examples of quasimorphisms include the sum of a homomorphism and a bounded function. Are there others? In some cases, the answer is no. For instance, suppose we have a quasimorphism {f: {\bf Z} \rightarrow {\bf R}}. Iterating (2), we see that {f(kx) = kf(x) + O(k)} for any integer {x} and natural number {k}, which we can rewrite as {f(kx)/kx = f(x)/x + O(1/|x|)} for non-zero {x}. Also, {f} is Lipschitz. Sending {k \rightarrow \infty}, we can verify that {f(x)/x} is a Cauchy sequence as {x \rightarrow \infty} and thus tends to some limit {\alpha}; we have {\alpha = f(x)/x + O(1/x)} for {x \geq 1}, hence {f(x) = \alpha x + O(1)} for positive {x}, and then one can use (2) one last time to obtain {f(x) = \alpha x + O(1)} for all {x}. Thus {f} is the sum of the homomorphism {x \mapsto \alpha x} and a bounded sequence.
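As a numerical illustration of this argument, here is a minimal sketch assuming the hypothetical quasimorphism {f(x) = \lfloor \sqrt{2} x \rfloor + (x \hbox{ mod } 3)} on {{\bf Z}}, that is, a homomorphism plus a bounded perturbation. It checks (2) on a finite window and watches {f(x)/x} approach the slope {\alpha = \sqrt{2}}; the window size and sample points are arbitrary choices.

```python
import math

ALPHA = math.sqrt(2)  # hypothetical slope, chosen only for illustration

def f(x: int) -> int:
    # the homomorphism x -> ALPHA * x, rounded down, plus a bounded perturbation
    return math.floor(ALPHA * x) + (x % 3)

def defect(x: int, y: int) -> int:
    # the quantity f(x+y) - f(x) - f(y) appearing in (2)
    return f(x + y) - f(x) - f(y)

R = 200  # check (2) on the window [-R, R]^2
max_defect = max(abs(defect(x, y))
                 for x in range(-R, R + 1) for y in range(-R, R + 1))

# f(x)/x should agree with ALPHA up to an error of size O(1/x)
slope_errors = {x: abs(f(x) / x - ALPHA) for x in (10, 1000, 100000)}

print(max_defect)
for x, err in slope_errors.items():
    print(x, err, err * x)  # err * x stays bounded, matching f(x) = ALPHA x + O(1)
```

For this particular {f}, the rounding contributes a defect in {\{0,1\}} and the perturbation a defect in {\{0,-3\}}, so the implied constant in (2) can be taken to be {3}.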

In general, one can phrase this problem in the language of group cohomology (discussed in this previous post). Call a map {f: G \rightarrow H} a {0}-cocycle. A {1}-cocycle is a map {\rho: G \times G \rightarrow H} obeying the identity

\displaystyle  \rho(x,y+z) + \rho(y,z) = \rho(x,y) + \rho(x+y,z)

for all {x,y,z \in G}. Given a {0}-cocycle {f: G \rightarrow H}, one can form its derivative {\partial f: G \times G \rightarrow H} by the formula

\displaystyle  \partial f(x,y) := f(x+y)-f(x)-f(y).

Such functions are called {1}-coboundaries. It is easy to see that the abelian group of {1}-coboundaries is a subgroup of the abelian group of {1}-cocycles. The quotient of these two groups is the first group cohomology of {G} with coefficients in {H}, and is denoted {H^1(G; H)}.
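The claim that every {1}-coboundary is a {1}-cocycle follows by expanding both sides of the cocycle identity, and it can also be checked mechanically. Below is a minimal sketch, assuming {G = {\bf Z}/6{\bf Z}}, {H = {\bf Z}}, and a randomly chosen {0}-cocycle; all names are illustrative.

```python
import random
from itertools import product

n = 6
random.seed(0)
f = [random.randint(-10, 10) for _ in range(n)]  # an arbitrary 0-cocycle Z/6 -> Z

def df(x, y):
    """The derivative of f: f(x+y) - f(x) - f(y)."""
    return f[(x + y) % n] - f[x] - f[y]

def is_cocycle(rho):
    """Check rho(x,y+z) + rho(y,z) = rho(x,y) + rho(x+y,z) for all x, y, z in Z/n."""
    return all(
        rho(x, (y + z) % n) + rho(y, z) == rho(x, y) + rho((x + y) % n, z)
        for x, y, z in product(range(n), repeat=3)
    )

print(is_cocycle(df))                # coboundaries satisfy the identity
print(is_cocycle(lambda x, y: x))    # a generic map G x G -> H need not
```

Both sides of the identity reduce to {f(x+y+z) - f(x) - f(y) - f(z)} when {\rho = \partial f}, which is why the first check succeeds for any choice of {f}.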

If a {0}-cocycle is bounded then its derivative is a bounded {1}-coboundary. The quotient of the group of bounded {1}-cocycles by the derivatives of bounded {0}-cocycles is called the bounded first group cohomology of {G} with coefficients in {H}, and is denoted {H^1_b(G; H)}. There is an obvious homomorphism {\phi} from {H^1_b(G; H)} to {H^1(G; H)}, formed by taking a coset of the space of derivatives of bounded {0}-cocycles, and enlarging it to a coset of the space of {1}-coboundaries. By chasing all the definitions, we see that all quasimorphisms from {G} to {H} are the sum of a homomorphism and a bounded function if and only if this homomorphism {\phi} is injective; in fact the quotient of the space of quasimorphisms by the sum of homomorphisms and bounded functions is isomorphic to the kernel of {\phi}.

In additive combinatorics, one is often working with functions which only have additive structure a fraction of the time, thus for instance (1) or (3) might only hold “{1\%} of the time”. This makes it somewhat difficult to directly interpret the situation in terms of group cohomology. However, thanks to tools such as the Balog-Szemerédi-Gowers lemma, one can upgrade this sort of {1\%}-structure to {100\%}-structure – at the cost of restricting the domain to a smaller set. Here I record one such instance of this phenomenon, thus giving a tentative link between additive combinatorics and group cohomology. (I thank Yuval Wigderson for suggesting the problem of locating such a link.)

Theorem 1 Let {G = (G,+)}, {H = (H,+)} be additive groups with {|G|=N}, let {S} be a subset of {H}, let {E \subset G}, and let {f: E \rightarrow H} be a function such that

\displaystyle  f(x_1) - f(x_2) + f(x_3) - f(x_4) \in S

for {\geq K^{-1} N^3} additive quadruples {(x_1,x_2,x_3,x_4)} in {E}. Then there exists a subset {A} of {G} containing {0} with {|A| \gg K^{-O(1)} N}, a subset {X} of {H} with {|X| \ll K^{O(1)}}, and a function {g: 4A-4A \rightarrow H} such that

\displaystyle  g(x+y) - g(x)-g(y) \in X + 496S - 496S \ \ \ \ \ (4)

for all {x, y \in 2A-2A} (thus, the derivative {\partial g} takes values in {X + 496 S - 496 S} on {2A - 2A}), and such that for each {h \in A}, one has

\displaystyle  f(x+h) - f(x) - g(h) \in 8S - 8S \ \ \ \ \ (5)

for {\gg K^{-O(1)} N} values of {x \in E}.

Presumably the constants {8} and {496} can be improved further, but we have not attempted to optimise these constants. We chose {2A-2A} as the domain on which one has a bounded derivative, as one can use the Bogolyubov lemma (see e.g., Proposition 4.39 of my book with Van Vu) to find a large Bohr set inside {2A-2A}. In applications, the set {S} need not have bounded size, or even bounded doubling; for instance, in the inverse {U^4} theory over a small finite field {F}, one would be interested in the situation where {H} is the group of {n \times n} matrices with coefficients in {F} (for some large {n}), and {S} is the subset consisting of those matrices of rank bounded by some bound {C = O(1)}.
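To make the hypothesis of Theorem 1 concrete, the following is a minimal sketch that counts the additive quadruples on which the alternating sum of {f} lands in {S}, and reads off the corresponding {K}. Everything about the example is an assumption made for illustration: {G = H = {\bf Z}/31{\bf Z}}, {E = G}, {S = \{-2,\dots,2\}}, and an {f} that equals {5x} plus a small error on a "good" set of {20} points and is random elsewhere.

```python
import random
from itertools import product

N = 31
random.seed(1)
good = set(random.sample(range(N), 20))       # points where f is structured
S = {-2 % N, -1 % N, 0, 1, 2}                 # small error set in H = Z/N

# f is 5x plus an error in {0, 1} on the good set, and arbitrary elsewhere
f = {x: (5 * x + random.choice((0, 1))) % N if x in good else random.randrange(N)
     for x in range(N)}

def count_structured_quadruples(f, E, S, N):
    """Number of additive quadruples in E with f(x1) - f(x2) + f(x3) - f(x4) in S."""
    E = set(E)
    count = 0
    for x1, x2, x3 in product(E, repeat=3):
        x4 = (x1 - x2 + x3) % N               # forces x1 - x2 + x3 - x4 = 0
        if x4 in E and (f[x1] - f[x2] + f[x3] - f[x4]) % N in S:
            count += 1
    return count

count = count_structured_quadruples(f, range(N), S, N)
K = N**3 / count                              # the hypothesis holds with this K
print(count, K)
```

Every additive quadruple inside the good set is counted, and by Cauchy-Schwarz there are at least {|{\rm good}|^4 / N} of those, so {K} stays bounded even though {f} is unstructured on a third of the group.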

Proof: By hypothesis, there are {\geq K^{-1} N^3} triples {(h,x,y) \in G^3} such that {x,x+h,y,y+h \in E} and

\displaystyle  f(x+h) - f(x) \in f(y+h)-f(y) + S. \ \ \ \ \ (6)

Thus, there is a set {B \subset G} with {|B| \gg K^{-1} N} such that for all {h \in B}, one has (6) for {\gg K^{-1} N^2} pairs {(x,y) \in G^2} with {x,x+h,y,y+h \in E}; in particular, there exists {y = y(h) \in E \cap (E-h)} such that (6) holds for {\gg K^{-1} N} values of {x \in E \cap (E-h)}. Setting {g_0(h) := f(y(h)+h) - f(y(h))}, we conclude that for each {h \in B}, one has

\displaystyle  f(x+h) - f(x) \in g_0(h) + S \ \ \ \ \ (7)

for {\gg K^{-1} N} values of {x \in E \cap (E-h)}.

Consider the bipartite graph whose vertex sets are two copies of {E}, with {x} and {x+h} connected by a (directed) edge if {h \in B} and (7) holds. Then this graph has {\gg K^{-2} N^2} edges. Applying (a slight modification of) the Balog-Szemerédi-Gowers theorem (for instance by modifying the proof of Corollary 5.19 of my book with Van Vu), we can then find a subset {C} of {E} with {|C| \gg K^{-O(1)} N} with the property that for any {x_1,x_3 \in C}, there exist {\gg K^{-O(1)} N^3} triples {(x_2,y_1,y_2) \in E^3} such that the edges {(x_1,y_1), (x_2,y_1), (x_2,y_2), (x_3,y_2)} all lie in this bipartite graph. This implies that, for all {x_1,x_3 \in C}, there exist {\gg K^{-O(1)} N^7} septuples {(x_2,y_1,y_2,z_{11},z_{21},z_{22},z_{32}) \in G^7} obeying the constraints

\displaystyle  f(y_j) - f(x_i), f(y_j+z_{ij}) - f(x_i+z_{ij}) \in g_0(y_j-x_i) + S

and {y_j, x_i, y_j+z_{ij}, x_i+z_{ij} \in E} for {ij = 11, 21, 22, 32}. These constraints imply in particular that

\displaystyle  f(x_3) - f(x_1) \in f(x_3+z_{32}) - f(y_2+z_{32}) + f(y_2+z_{22}) - f(x_2+z_{22}) + f(x_2+z_{21}) - f(y_1+z_{21}) + f(y_1+z_{11}) - f(x_1+z_{11}) + 4S - 4S.

Also observe that

\displaystyle  x_3 - x_1 = (x_3+z_{32}) - (y_2+z_{32}) + (y_2+z_{22}) - (x_2+z_{22}) + (x_2+z_{21}) - (y_1+z_{21}) + (y_1+z_{11}) - (x_1+z_{11}).

Thus, if {h \in G} and {x_3,x_1 \in C} are such that {x_3-x_1 = h}, we see that

\displaystyle  f(w_1) - f(w_2) + f(w_3) - f(w_4) + f(w_5) - f(w_6) + f(w_7) - f(w_8) \in f(x_3) - f(x_1) + 4S - 4S

for {\gg K^{-O(1)} N^7} octuples {(w_1,w_2,w_3,w_4,w_5,w_6,w_7,w_8) \in E^8} in the hyperplane

\displaystyle  h = w_1 - w_2 + w_3 - w_4 + w_5 - w_6 + w_7 - w_8.

By the pigeonhole principle, this implies that for any fixed {h \in G}, there can be at most {O(K^{O(1)})} sets of the form {f(x_3)-f(x_1) + 4S-4S} with {x_3-x_1=h}, {x_1,x_3 \in C} that are pairwise disjoint. Using a greedy algorithm, we conclude that there is a set {W_h} of cardinality {O(K^{O(1)})}, such that each set {f(x_3) - f(x_1) + 4S-4S} with {x_3-x_1=h}, {x_1,x_3 \in C} intersects {w+4S -4S} for some {w \in W_h}, or in other words that

\displaystyle  f(x_3) - f(x_1) \in W_{x_3-x_1} + 8S-8S \ \ \ \ \ (8)

whenever {x_1,x_3 \in C}. In particular,

\displaystyle  \sum_{h \in G} \sum_{w \in W_h} | \{ (x_1,x_3) \in C^2: x_3-x_1 = h; f(x_3) - f(x_1) \in w + 8S-8S \}| \geq |C|^2 \gg K^{-O(1)} N^2.

This implies that there exists a subset {A} of {G} with {|A| \gg K^{-O(1)} N}, and an element {g_1(h) \in W_h} for each {h \in A}, such that

\displaystyle  | \{ (x_1,x_3) \in C^2: x_3-x_1 = h; f(x_3) - f(x_1) \in g_1(h) + 8S-8S \}| \gg K^{-O(1)} N \ \ \ \ \ (9)

for all {h \in A}. Note we may assume without loss of generality that {0 \in A} and {g_1(0)=0}.

Suppose that {h_1,\dots,h_{16} \in A} are such that

\displaystyle  \sum_{i=1}^{16} (-1)^{i-1} h_i = 0. \ \ \ \ \ (10)

By construction of {A}, and permuting labels, we can find {\gg K^{-O(1)} N^{16}} tuples {(x_1,\dots,x_{16},y_1,\dots,y_{16}) \in C^{32}} such that

\displaystyle  y_i - x_i = (-1)^{i-1} h_i


\displaystyle  f(y_i) - f(x_i) \in (-1)^{i-1} g_1(h_i) + 8S - 8S

for {i=1,\dots,16}. We sum this to obtain

\displaystyle  f(y_1) + \sum_{i=1}^{15} (f(y_{i+1})-f(x_i)) - f(x_{16}) \in \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 128 S - 128 S

and hence by (8)

\displaystyle  f(y_1) - f(x_{16}) + \sum_{i=1}^{15} W_{k_i} \in \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 248 S - 248 S

where {k_i := y_{i+1}-x_i}. Since

\displaystyle  y_1 - x_{16} + \sum_{i=1}^{15} k_i = 0

we see that there are only {N^{16}} possible values of {(y_1,x_{16},k_1,\dots,k_{15})}. By the pigeonhole principle, we conclude that at most {O(K^{O(1)})} of the sets {\sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 248 S - 248 S} can be disjoint. Arguing as before, we conclude that there exists a set {X} of cardinality {O(K^{O(1)})} such that

\displaystyle  \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) \in X + 496 S - 496 S \ \ \ \ \ (11)

whenever (10) holds.

For any {h \in 4A-4A}, write {h} arbitrarily as {h = \sum_{i=1}^8 (-1)^{i-1} h_i} for some {h_1,\dots,h_8 \in A} (with {h_5=\dots=h_8=0} if {h \in 2A-2A}, and {h_2 = \dots = h_8 = 0} if {h \in A}) and then set

\displaystyle  g(h) := \sum_{i=1}^8 (-1)^{i-1} g_1(h_i).

Then from (11) we have (4). For {h \in A} we have {g(h) = g_1(h)}, and (5) then follows from (9). \Box

We have now seen two ways to construct expander Cayley graphs {Cay(G,S)}. The first, discussed in Notes 2, is to use Cayley graphs that are projections of an infinite Cayley graph on a group with Kazhdan’s property (T). The second, discussed in Notes 3, is to combine a quasirandomness property of the group {G} with a flattening hypothesis for the random walk.

We now pursue the second approach more thoroughly. The main difficulty here is to figure out how to ensure flattening of the random walk, as it is then an easy matter to use quasirandomness to show that the random walk becomes mixing soon after it becomes flat. In the case of Selberg’s theorem, we achieved this through an explicit formula for the heat kernel on the hyperbolic plane (which is a proxy for the random walk). However, in most situations such an explicit formula is not available, and one must develop some other tool for forcing flattening, and specifically an estimate of the form

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \ll |G|^{-1/2+\epsilon} \ \ \ \ \ (1)

for some {n = O(\log |G|)}, where {\mu} is the uniform probability measure on the generating set {S}.

In 2006, Bourgain and Gamburd introduced a general method for achieving this goal. The intuition here is that the main obstruction that prevents a random walk from spreading out to become flat over the entire group {G} is if the random walk gets trapped in some proper subgroup {H} of {G} (or perhaps in some coset {xH} of such a subgroup), so that {\mu^{(n)}(xH)} remains large for some moderately large {n}. Note that

\displaystyle  \mu^{(2n)}(H) \geq \mu^{(n)}(H x^{-1}) \mu^{(n)}(xH) = \mu^{(n)}(xH)^2,

since {\mu^{(2n)} = \mu^{(n)} * \mu^{(n)}}, {H = (H x^{-1}) \cdot (xH)}, and {\mu^{(n)}} is symmetric. By iterating this observation, we see that if {\mu^{(n)}(xH)} is too large (e.g. of size {|G|^{-o(1)}} for some {n} comparable to {\log |G|}), then it is not possible for the random walk {\mu^{(n)}} to converge to the uniform distribution in time {O(\log |G|)}, and so expansion does not occur.
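The inequality {\mu^{(2n)}(H) \geq \mu^{(n)}(xH)^2} can be checked exactly on a toy example. The following is a minimal sketch, assuming the abelian group {G = {\bf Z}/30{\bf Z}} (written additively, so cosets are {x+H}), the illustrative generating set {S = \{\pm 1, \pm 6\}}, and the subgroup {H = 6{\bf Z}/30{\bf Z}}; exact rational arithmetic avoids any rounding issues in the comparison. Cyclic groups are of course not the setting of interest for expansion, so this only illustrates the inequality itself.

```python
from fractions import Fraction

N = 30
S = [1, -1, 6, -6]                        # symmetric generating set (illustrative)
H = [h for h in range(N) if h % 6 == 0]   # proper subgroup 6Z/30Z

# mu is the uniform probability measure on S
mu = [Fraction(0)] * N
for s in S:
    mu[s % N] += Fraction(1, len(S))

def convolve(p, q):
    """Convolution of two measures on Z/N."""
    out = [Fraction(0)] * N
    for a, pa in enumerate(p):
        if pa:
            for b, qb in enumerate(q):
                out[(a + b) % N] += pa * qb
    return out

def power(p, n):
    """The n-fold convolution power of p, for n >= 1."""
    out = p
    for _ in range(n - 1):
        out = convolve(out, p)
    return out

def mass(p, x):
    """p(x + H): the mass that p assigns to the coset x + H."""
    return sum(p[(x + h) % N] for h in H)

n = 3
mu_n, mu_2n = power(mu, n), power(mu, 2 * n)
trapping_ok = all(mass(mu_2n, 0) >= mass(mu_n, x) ** 2 for x in range(N))
print(trapping_ok, mass(mu_2n, 0), max(mass(mu_n, x) for x in range(N)))
```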

A potentially more general obstruction of this type would be if the random walk gets trapped in (a coset of) an approximate group {H}. Recall that a {K}-approximate group is a subset {H} of a group {G} which is symmetric, contains the identity, and is such that {H \cdot H} can be covered by at most {K} left-translates (or equivalently, right-translates) of {H}. Such approximate groups were studied extensively in last quarter’s course. A similar argument to the one given previously shows (roughly speaking) that expansion cannot occur if {\mu^{(n)}(xH)} is too large for some coset {xH} of an approximate group.
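A standard example of an approximate group that is not a genuine group is a symmetric interval in the integers. The sketch below, with hypothetical helper names and the additive group {{\bf Z}} standing in for {G}, checks that {H = \{-n,\dots,n\}} is a {2}-approximate group: {H+H} is covered by the two translates {H-n} and {H+n}, but not by any single translate.

```python
def sum_set(A, B):
    """The set A + B (the additive analogue of the product set A . B)."""
    return {a + b for a in A for b in B}

def covered_by_translates(H, shifts):
    """True if H + H lies in the union of the translates x + H for x in shifts."""
    cover = {x + h for x in shifts for h in H}
    return sum_set(H, H) <= cover

n = 10
H = set(range(-n, n + 1))                      # symmetric and contains the identity 0

is_symmetric = H == {-h for h in H}
two_translates = covered_by_translates(H, [-n, n])
one_translate = any(covered_by_translates(H, [x]) for x in range(-3 * n, 3 * n + 1))
print(is_symmetric, two_translates, one_translate)
```

A single translate can never suffice here, since {|H+H| = 4n+1} exceeds {|H| = 2n+1}.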

It turns out that this latter observation has a converse: if a measure does not concentrate in cosets of approximate groups, then some flattening occurs. More precisely, one has the following combinatorial lemma:

Lemma 1 (Weighted Balog-Szemerédi-Gowers lemma) Let {G} be a group, let {\nu} be a finitely supported probability measure on {G} which is symmetric (thus {\nu(g)=\nu(g^{-1})} for all {g \in G}), and let {K \geq 1}. Then one of the following statements holds:

  • (i) (Flattening) One has {\| \nu * \nu \|_{\ell^2(G)} \leq \frac{1}{K} \|\nu\|_{\ell^2(G)}}.
  • (ii) (Concentration in an approximate group) There exists an {O(K^{O(1)})}-approximate group {H} in {G} with {|H| \ll K^{O(1)} / \| \nu \|_{\ell^2(G)}^2} and an element {x \in G} such that {\nu(xH) \gg K^{-O(1)}}.
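The two alternatives of the lemma can be seen at their extremes on a small abelian example. The following is a minimal sketch, assuming {G = {\bf Z}/120{\bf Z}} and two illustrative symmetric measures: the uniform measure on a subgroup (which concentrates completely, with no flattening at all) and the uniform measure on the set {\{\pm 2^j: 0 \leq j \leq 4\}} (for which the {\ell^2} norm strictly drops after one convolution). It computes the squared ratio {\| \nu * \nu \|_{\ell^2(G)}^2 / \|\nu\|_{\ell^2(G)}^2} exactly; it does not attempt to find the approximate group in (ii).

```python
from fractions import Fraction

def uniform(A, N):
    """Uniform probability measure on a set A of residues mod N."""
    nu = [Fraction(0)] * N
    for a in A:
        nu[a % N] = Fraction(1, len(A))
    return nu

def l2_sq(p):
    """Squared l^2 norm of a measure."""
    return sum(v * v for v in p)

def flattening_ratio_sq(nu, N):
    """The squared ratio ||nu * nu||^2 / ||nu||^2 on Z/N."""
    conv = [Fraction(0)] * N
    for a, pa in enumerate(nu):
        if pa:
            for b, pb in enumerate(nu):
                conv[(a + b) % N] += pa * pb
    return l2_sq(conv) / l2_sq(nu)

N = 120
subgroup = {x for x in range(N) if x % 8 == 0}                 # 8Z/120Z
spread = {(s * 2**j) % N for j in range(5) for s in (1, -1)}   # {+-1, +-2, ..., +-16}

ratio_subgroup = flattening_ratio_sq(uniform(subgroup, N), N)
ratio_spread = flattening_ratio_sq(uniform(spread, N), N)
print(ratio_subgroup, ratio_spread)
```

For the subgroup one has {\nu * \nu = \nu}, so the ratio is exactly {1}; by Young's inequality the ratio never exceeds {1}, and it is strictly smaller for the second measure because its support is not a coset of a subgroup.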

This lemma is a variant of the more well-known Balog-Szemerédi-Gowers lemma in additive combinatorics due to Gowers (which roughly speaking corresponds to the case when {\nu} is the uniform distribution on some set {A}), which in turn is a polynomially quantitative version of an earlier lemma of Balog and Szemerédi. We will prove it below the fold.

The lemma is particularly useful when the group {G} in question enjoys a product theorem, which roughly speaking says that the only medium-sized approximate subgroups of {G} are trapped inside genuine proper subgroups of {G} (or, contrapositively, medium-sized sets that generate the entire group {G} cannot be approximate groups). The fact that some finite groups (and specifically, the bounded rank finite simple groups of Lie type) enjoy product theorems is a non-trivial fact, and will be discussed in later notes. For now, we simply observe that the presence of the product theorem, together with quasirandomness and a non-concentration hypothesis, can be used to demonstrate expansion:

Theorem 2 (Bourgain-Gamburd expansion machine) Suppose that {G} is a finite group, that {S \subseteq G} is a symmetric set of {k} generators, and that there are constants {0 < \kappa < 1 < \Lambda} with the following properties.

  1. (Quasirandomness). The smallest dimension of a nontrivial representation {\rho: G \rightarrow GL_d({\bf C})} of {G} is at least {|G|^{\kappa}};
  2. (Product theorem). For all {\delta > 0} there is some {\delta' = \delta'(\delta) > 0} such that the following is true. If {H \subseteq G} is a {|G|^{\delta'}}-approximate subgroup with {|G|^{\delta} \leq |H| \leq |G|^{1 - \delta}} then {H} generates a proper subgroup of {G};
  3. (Non-concentration estimate). There is some even number {n \leq \Lambda\log |G|} such that

    \displaystyle  \sup_{H < G}\mu^{(n)}(H) < |G|^{-\kappa},

    where the supremum is over all proper subgroups {H < G}.

Then {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon > 0} depending only on {k,\kappa, \Lambda}, and the function {\delta'(\cdot )} (and this constant {\epsilon} is in principle computable in terms of these constants).

This criterion for expansion is implicitly contained in this paper of Bourgain and Gamburd, who used it to establish the expansion of various Cayley graphs in {SL_2(F_p)} for prime {p}. This criterion has since been applied (or modified) to obtain expansion results in many other groups, as will be discussed in later notes.
