You are currently browsing the tag archive for the ‘additive combinatorics’ tag.

In graph theory, the recently developed theory of graph limits has proven to be a useful tool for analysing large dense graphs, being a convenient reformulation of the Szemerédi regularity lemma. Roughly speaking, the theory asserts that given any sequence {G_n = (V_n, E_n)} of finite graphs, one can extract a subsequence {G_{n_j} = (V_{n_j}, E_{n_j})} which converges (in a specific sense) to a continuous object known as a “graphon” – a symmetric measurable function {p\colon [0,1] \times [0,1] \rightarrow [0,1]}. What “converges” means in this context is that subgraph densities converge to the associated integrals of the graphon {p}. For instance, the edge density

\displaystyle  \frac{1}{|V_{n_j}|^2} |E_{n_j}|

converge to the integral

\displaystyle  \int_0^1 \int_0^1 p(x,y)\ dx dy,

the triangle density

\displaystyle  \frac{1}{|V_{n_j}|^3} \lvert \{ (v_1,v_2,v_3) \in V_{n_j}^3: \{v_1,v_2\}, \{v_2,v_3\}, \{v_3,v_1\} \in E_{n_j} \} \rvert

converges to the integral

\displaystyle  \int_0^1 \int_0^1 \int_0^1 p(x_1,x_2) p(x_2,x_3) p(x_3,x_1)\ dx_1 dx_2 dx_3,

the four-cycle density

\displaystyle  \frac{1}{|V_{n_j}|^4} \lvert \{ (v_1,v_2,v_3,v_4) \in V_{n_j}^4: \{v_1,v_2\}, \{v_2,v_3\}, \{v_3,v_4\}, \{v_4,v_1\} \in E_{n_j} \} \rvert

converges to the integral

\displaystyle  \int_0^1 \int_0^1 \int_0^1 \int_0^1 p(x_1,x_2) p(x_2,x_3) p(x_3,x_4) p(x_4,x_1)\ dx_1 dx_2 dx_3 dx_4,

and so forth. One can use graph limits to prove many results in graph theory that were traditionally proven using the regularity lemma, such as the triangle removal lemma, and can also reduce many asymptotic graph theory problems to continuous problems involving multilinear integrals (although the latter problems are not necessarily easy to solve!). See this text of Lovasz for a detailed study of graph limits and their applications.

One can also express graph limits (and more generally hypergraph limits) in the language of nonstandard analysis (or of ultraproducts); see for instance this paper of Elek and Szegedy, Section 6 of this previous blog post, or this paper of Towsner. (In this post we assume some familiarity with nonstandard analysis, as reviewed for instance in the previous blog post.) Here, one starts as before with a sequence {G_n = (V_n,E_n)} of finite graphs, and then takes an ultraproduct (with respect to some arbitrarily chosen non-principal ultrafilter {\alpha \in\beta {\bf N} \backslash {\bf N}}) to obtain a nonstandard graph {G_\alpha = (V_\alpha,E_\alpha)}, where {V_\alpha = \prod_{n\rightarrow \alpha} V_n} is the ultraproduct of the {V_n}, and similarly for the {E_\alpha}. The set {E_\alpha} can then be viewed as a symmetric subset of {V_\alpha \times V_\alpha} which is measurable with respect to the Loeb {\sigma}-algebra {{\mathcal L}_{V_\alpha \times V_\alpha}} of the product {V_\alpha \times V_\alpha} (see this previous blog post for the construction of Loeb measure). A crucial point is that this {\sigma}-algebra is larger than the product {{\mathcal L}_{V_\alpha} \times {\mathcal L}_{V_\alpha}} of the Loeb {\sigma}-algebra of the individual vertex set {V_\alpha}. This leads to a decomposition

\displaystyle  1_{E_\alpha} = p + e

where the “graphon” {p} is the orthogonal projection of {1_{E_\alpha}} onto {L^2( {\mathcal L}_{V_\alpha} \times {\mathcal L}_{V_\alpha} )}, and the “regular error” {e} is orthogonal to all product sets {A \times B} for {A, B \in {\mathcal L}_{V_\alpha}}. The graphon {p\colon V_\alpha \times V_\alpha \rightarrow [0,1]} then captures the statistics of the nonstandard graph {G_\alpha}, in exact analogy with the more traditional graph limits: for instance, the edge density

\displaystyle  \hbox{st} \frac{1}{|V_\alpha|^2} |E_\alpha|

(or equivalently, the limit of the {\frac{1}{|V_n|^2} |E_n|} along the ultrafilter {\alpha}) is equal to the integral

\displaystyle  \int_{V_\alpha} \int_{V_\alpha} p(x,y)\ d\mu_{V_\alpha}(x) d\mu_{V_\alpha}(y)

where {d\mu_V} denotes Loeb measure on a nonstandard finite set {V}; the triangle density

\displaystyle  \hbox{st} \frac{1}{|V_\alpha|^3} \lvert \{ (v_1,v_2,v_3) \in V_\alpha^3: \{v_1,v_2\}, \{v_2,v_3\}, \{v_3,v_1\} \in E_\alpha \} \rvert

(or equivalently, the limit along {\alpha} of the triangle densities of {E_n}) is equal to the integral

\displaystyle  \int_{V_\alpha} \int_{V_\alpha} \int_{V_\alpha} p(x_1,x_2) p(x_2,x_3) p(x_3,x_1)\ d\mu_{V_\alpha}(x_1) d\mu_{V_\alpha}(x_2) d\mu_{V_\alpha}(x_3),

and so forth. Note that with this construction, the graphon {p} is living on the Cartesian square of an abstract probability space {V_\alpha}, which is likely to be inseparable; but it is possible to cut down the Loeb {\sigma}-algebra on {V_\alpha} to minimal countable {\sigma}-algebra for which {p} remains measurable (up to null sets), and then one can identify {V_\alpha} with {[0,1]}, bringing this construction of a graphon in line with the traditional notion of a graphon. (See Remark 5 of this previous blog post for more discussion of this point.)

Additive combinatorics, which studies things like the additive structure of finite subsets {A} of an abelian group {G = (G,+)}, has many analogies and connections with asymptotic graph theory; in particular, there is the arithmetic regularity lemma of Green which is analogous to the graph regularity lemma of Szemerédi. (There is also a higher order arithmetic regularity lemma analogous to hypergraph regularity lemmas, but this is not the focus of the discussion here.) Given this, it is natural to suspect that there is a theory of “additive limits” for large additive sets of bounded doubling, analogous to the theory of graph limits for large dense graphs. The purpose of this post is to record a candidate for such an additive limit. This limit can be used as a substitute for the arithmetic regularity lemma in certain results in additive combinatorics, at least if one is willing to settle for qualitative results rather than quantitative ones; I give a few examples of this below the fold.

It seems that to allow for the most flexible and powerful manifestation of this theory, it is convenient to use the nonstandard formulation (among other things, it allows for full use of the transfer principle, whereas a more traditional limit formulation would only allow for a transfer of those quantities continuous with respect to the notion of convergence). Here, the analogue of a nonstandard graph is an ultra approximate group {A_\alpha} in a nonstandard group {G_\alpha = \prod_{n \rightarrow \alpha} G_n}, defined as the ultraproduct of finite {K}-approximate groups {A_n \subset G_n} for some standard {K}. (A {K}-approximate group {A_n} is a symmetric set containing the origin such that {A_n+A_n} can be covered by {K} or fewer translates of {A_n}.) We then let {O(A_\alpha)} be the external subgroup of {G_\alpha} generated by {A_\alpha}; equivalently, {A_\alpha} is the union of {A_\alpha^m} over all standard {m}. This space has a Loeb measure {\mu_{O(A_\alpha)}}, defined by setting

\displaystyle \mu_{O(A_\alpha)}(E_\alpha) := \hbox{st} \frac{|E_\alpha|}{|A_\alpha|}

whenever {E_\alpha} is an internal subset of {A_\alpha^m} for any standard {m}, and extended to a countably additive measure; the arguments in Section 6 of this previous blog post can be easily modified to give a construction of this measure.

The Loeb measure {\mu_{O(A_\alpha)}} is a translation invariant measure on {O(A_{\alpha})}, normalised so that {A_\alpha} has Loeb measure one. As such, one should think of {O(A_\alpha)} as being analogous to a locally compact abelian group equipped with a Haar measure. It should be noted though that {O(A_\alpha)} is not actually a locally compact group with Haar measure, for two reasons:

  • There is not an obvious topology on {O(A_\alpha)} that makes it simultaneously locally compact, Hausdorff, and {\sigma}-compact. (One can get one or two out of three without difficulty, though.)
  • The addition operation {+\colon O(A_\alpha) \times O(A_\alpha) \rightarrow O(A_\alpha)} is not measurable from the product Loeb algebra {{\mathcal L}_{O(A_\alpha)} \times {\mathcal L}_{O(A_\alpha)}} to {{\mathcal L}_{O(\alpha)}}. Instead, it is measurable from the coarser Loeb algebra {{\mathcal L}_{O(A_\alpha) \times O(A_\alpha)}} to {{\mathcal L}_{O(\alpha)}} (compare with the analogous situation for nonstandard graphs).

Nevertheless, the analogy is a useful guide for the arguments that follow.

Let {L(O(A_\alpha))} denote the space of bounded Loeb measurable functions {f\colon O(A_\alpha) \rightarrow {\bf C}} (modulo almost everywhere equivalence) that are supported on {A_\alpha^m} for some standard {m}; this is a complex algebra with respect to pointwise multiplication. There is also a convolution operation {\star\colon L(O(A_\alpha)) \times L(O(A_\alpha)) \rightarrow L(O(A_\alpha))}, defined by setting

\displaystyle  \hbox{st} f \star \hbox{st} g(x) := \hbox{st} \frac{1}{|A_\alpha|} \sum_{y \in A_\alpha^m} f(y) g(x-y)

whenever {f\colon A_\alpha^m \rightarrow {}^* {\bf C}}, {g\colon A_\alpha^l \rightarrow {}^* {\bf C}} are bounded nonstandard functions (extended by zero to all of {O(A_\alpha)}), and then extending to arbitrary elements of {L(O(A_\alpha))} by density. Equivalently, {f \star g} is the pushforward of the {{\mathcal L}_{O(A_\alpha) \times O(A_\alpha)}}-measurable function {(x,y) \mapsto f(x) g(y)} under the map {(x,y) \mapsto x+y}.

The basic structural theorem is then as follows.

Theorem 1 (Kronecker factor) Let {A_\alpha} be an ultra approximate group. Then there exists a (standard) locally compact abelian group {G} of the form

\displaystyle  G = {\bf R}^d \times {\bf Z}^m \times T

for some standard {d,m} and some compact abelian group {T}, equipped with a Haar measure {\mu_G} and a measurable homomorphism {\pi\colon O(A_\alpha) \rightarrow G} (using the Loeb {\sigma}-algebra on {O(A_\alpha)} and the Borel {\sigma}-algebra on {G}), with the following properties:

  • (i) {\pi} has dense image, and {\mu_G} is the pushforward of Loeb measure {\mu_{O(A_\alpha)}} by {\pi}.
  • (ii) There exists sets {\{0\} \subset U_0 \subset K_0 \subset G} with {U_0} open and {K_0} compact, such that

    \displaystyle  \pi^{-1}(U_0) \subset 4A_\alpha \subset \pi^{-1}(K_0). \ \ \ \ \ (1)

  • (iii) Whenever {K \subset U \subset G} with {K} compact and {U} open, there exists a nonstandard finite set {B} such that

    \displaystyle  \pi^{-1}(K) \subset B \subset \pi^{-1}(U). \ \ \ \ \ (2)

  • (iv) If {f, g \in L}, then we have the convolution formula

    \displaystyle  f \star g = \pi^*( (\pi_* f) \star (\pi_* g) ) \ \ \ \ \ (3)

    where {\pi_* f,\pi_* g} are the pushforwards of {f,g} to {L^2(G, \mu_G)}, the convolution {\star} on the right-hand side is convolution using {\mu_G}, and {\pi^*} is the pullback map from {L^2(G,\mu_G)} to {L^2(O(A_\alpha), \mu_{O(A_\alpha)})}. In particular, if {\pi_* f = 0}, then {f*g=0} for all {g \in L}.

One can view the locally compact abelian group {G} as a “model “or “Kronecker factor” for the ultra approximate group {A_\alpha} (in close analogy with the Kronecker factor from ergodic theory). In the case that {A_\alpha} is a genuine nonstandard finite group rather than an ultra approximate group, the non-compact components {{\bf R}^d \times {\bf Z}^m} of the Kronecker group {G} are trivial, and this theorem was implicitly established by Szegedy. The compact group {T} is quite large, and in particular is likely to be inseparable; but as with the case of graphons, when one is only studying at most countably many functions {f}, one can cut down the size of this group to be separable (or equivalently, second countable or metrisable) if desired, so one often works with a “reduced Kronecker factor” which is a quotient of the full Kronecker factor {G}.

Given any sequence of uniformly bounded functions {f_n\colon A_n^m \rightarrow {\bf C}} for some fixed {m}, we can view the function {f \in L} defined by

\displaystyle  f := \pi_* \hbox{st} \lim_{n \rightarrow \alpha} f_n \ \ \ \ \ (4)

as an “additive limit” of the {f_n}, in much the same way that graphons {p\colon V_\alpha \times V_\alpha \rightarrow [0,1]} are limits of the indicator functions {1_{E_n}\colon V_n \times V_n \rightarrow \{0,1\}}. The additive limits capture some of the statistics of the {f_n}, for instance the normalised means

\displaystyle  \frac{1}{|A_n|} \sum_{x \in A_n^m} f_n(x)

converge (along the ultrafilter {\alpha}) to the mean

\displaystyle  \int_G f(x)\ d\mu_G(x),

and for three sequences {f_n,g_n,h_n\colon A_n^m \rightarrow {\bf C}} of functions, the normalised correlation

\displaystyle  \frac{1}{|A_n|^2} \sum_{x,y \in A_n^m} f_n(x) g_n(y) h_n(x+y)

converges along {\alpha} to the correlation

\displaystyle  \int_G \int_G f(x) g(y) h(x+y)\ d\mu_G(x) d\mu_G(y),

the normalised {U^2} Gowers norm

\displaystyle  ( \frac{1}{|A_n|^3} \sum_{x,y,z,w \in A_n^m: x+w=y+z} f_n(x) \overline{f_n(y)} \overline{f_n(z)} f_n(w))^{1/4}

converges along {\alpha} to the {U^2} Gowers norm

\displaystyle  ( \int_{G \times G \times G} f(x) \overline{f(y)} \overline{f(z)} f_n(x+y-z)\ d\mu_G(x) d\mu_G(y) d\mu_G(z))^{1/4}

and so forth. We caution however that some correlations that involve evaluating more than one function at the same point will not necessarily be preserved in the additive limit; for instance the normalised {\ell^2} norm

\displaystyle  (\frac{1}{|A_n|} \sum_{x \in A_n^m} |f_n(x)|^2)^{1/2}

does not necessarily converge to the {L^2} norm

\displaystyle  (\int_G |f(x)|^2\ d\mu_G(x))^{1/2},

but can converge instead to a larger quantity, due to the presence of the orthogonal projection {\pi_*} in the definition (4) of {f}.

An important special case of an additive limit occurs when the functions {f_n\colon A_n^m \rightarrow {\bf C}} involved are indicator functions {f_n = 1_{E_n}} of some subsets {E_n} of {A_n^m}. The additive limit {f \in L} does not necessarily remain an indicator function, but instead takes values in {[0,1]} (much as a graphon {p} takes values in {[0,1]} even though the original indicators {1_{E_n}} take values in {\{0,1\}}). The convolution {f \star f\colon G \rightarrow [0,1]} is then the ultralimit of the normalised convolutions {\frac{1}{|A_n|} 1_{E_n} \star 1_{E_n}}; in particular, the measure of the support of {f \star f} provides a lower bound on the limiting normalised cardinality {\frac{1}{|A_n|} |E_n + E_n|} of a sumset. In many situations this lower bound is an equality, but this is not necessarily the case, because the sumset {2E_n = E_n + E_n} could contain a large number of elements which have very few ({o(|A_n|)}) representations as the sum of two elements of {E_n}, and in the limit these portions of the sumset fall outside of the support of {f \star f}. (One can think of the support of {f \star f} as describing the “essential” sumset of {2E_n = E_n + E_n}, discarding those elements that have only very few representations.) Similarly for higher convolutions of {f}. Thus one can use additive limits to partially control the growth {k E_n} of iterated sumsets of subsets {E_n} of approximate groups {A_n}, in the regime where {k} stays bounded and {n} goes to infinity.

Theorem 1 can be proven by Fourier-analytic means (combined with Freiman’s theorem from additive combinatorics), and we will do so below the fold. For now, we give some illustrative examples of additive limits.

Example 1 (Bohr sets) We take {A_n} to be the intervals {A_n := \{ x \in {\bf Z}: |x| \leq N_n \}}, where {N_n} is a sequence going to infinity; these are {2}-approximate groups for all {n}. Let {\theta} be an irrational real number, let {I} be an interval in {{\bf R}/{\bf Z}}, and for each natural number {n} let {B_n} be the Bohr set

\displaystyle  B_n := \{ x \in A^{(n)}: \theta x \hbox{ mod } 1 \in I \}.

In this case, the (reduced) Kronecker factor {G} can be taken to be the infinite cylinder {{\bf R} \times {\bf R}/{\bf Z}} with the usual Lebesgue measure {\mu_G}. The additive limits of {1_{A_n}} and {1_{B_n}} end up being {1_A} and {1_B}, where {A} is the finite cylinder

\displaystyle  A := \{ (x,t) \in {\bf R} \times {\bf R}/{\bf Z}: x \in [-1,1]\}

and {B} is the rectangle

\displaystyle  B := \{ (x,t) \in {\bf R} \times {\bf R}/{\bf Z}: x \in [-1,1]; t \in I \}.

Geometrically, one should think of {A_n} and {B_n} as being wrapped around the cylinder {{\bf R} \times {\bf R}/{\bf Z}} via the homomorphism {x \mapsto (\frac{x}{N_n}, \theta x \hbox{ mod } 1)}, and then one sees that {B_n} is converging in some normalised weak sense to {B}, and similarly for {A_n} and {A}. In particular, the additive limit predicts the growth rate of the iterated sumsets {kB_n} to be quadratic in {k} until {k|I|} becomes comparable to {1}, at which point the growth transitions to linear growth, in the regime where {k} is bounded and {n} is large.

If {\theta = \frac{p}{q}} were rational instead of irrational, then one would need to replace {{\bf R}/{\bf Z}} by the finite subgroup {\frac{1}{q}{\bf Z}/{\bf Z}} here.

Example 2 (Structured subsets of progressions) We take {A_n} be the rank two progression

\displaystyle  A_n := \{ a + b N_n^2: a,b \in {\bf Z}; |a|, |b| \leq N_n \},

where {N_n} is a sequence going to infinity; these are {4}-approximate groups for all {n}. Let {B_n} be the subset

\displaystyle  B_n := \{ a + b N_n^2: a,b \in {\bf Z}; |a|^2 + |b|^2 \leq N_n^2 \}.

Then the (reduced) Kronecker factor can be taken to be {G = {\bf R}^2} with Lebesgue measure {\mu_G}, and the additive limits of the {1_{A_n}} and {1_{B_n}} are then {1_A} and {1_B}, where {A} is the square

\displaystyle  A := \{ (a,b) \in {\bf R}^2: |a|, |b| \leq 1 \}

and {B} is the circle

\displaystyle  B := \{ (a,b) \in {\bf R}^2: a^2+b^2 \leq 1 \}.

Geometrically, the picture is similar to the Bohr set one, except now one uses a Freiman homomorphism {a + b N_n^2 \mapsto (\frac{a}{N_n}, \frac{b}{N_n})} for {a,b = O( N_n )} to embed the original sets {A_n, B_n} into the plane {{\bf R}^2}. In particular, one now expects the growth rate of the iterated sumsets {k A_n} and {k B_n} to be quadratic in {k}, in the regime where {k} is bounded and {n} is large.

Example 3 (Dissociated sets) Let {d} be a fixed natural number, and take

\displaystyle  A_n = \{0, v_1,\dots,v_d,-v_1,\dots,-v_d \}

where {v_1,\dots,v_d} are randomly chosen elements of a large cyclic group {{\bf Z}/p_n{\bf Z}}, where {p_n} is a sequence of primes going to infinity. These are {O(d)}-approximate groups. The (reduced) Kronecker factor {G} can (almost surely) then be taken to be {{\bf Z}^d} with counting measure, and the additive limit of {1_{A_n}} is {1_A}, where {A = \{ 0, e_1,\dots,e_d,-e_1,\dots,-e_d\}} and {e_1,\dots,e_d} is the standard basis of {{\bf Z}^d}. In particular, the growth rates of {k A_n} should grow approximately like {k^d} for {k} bounded and {n} large.

Example 4 (Random subsets of groups) Let {A_n = G_n} be a sequence of finite additive groups whose order is going to infinity. Let {B_n} be a random subset of {G_n} of some fixed density {0 \leq \lambda \leq 1}. Then (almost surely) the Kronecker factor here can be reduced all the way to the trivial group {\{0\}}, and the additive limit of the {1_{B_n}} is the constant function {\lambda}. The convolutions {\frac{1}{|G_n|} 1_{B_n} * 1_{B_n}} then converge in the ultralimit (modulo almost everywhere equivalence) to the pullback of {\lambda^2}; this reflects the fact that {(1-o(1))|G_n|} of the elements of {G_n} can be represented as the sum of two elements of {B_n} in {(\lambda^2 + o(1)) |G_n|} ways. In particular, {B_n+B_n} occupies a proportion {1-o(1)} of {G_n}.

Example 5 (Trigonometric series) Take {A_n = G_n = {\bf Z}/p_n {\bf C}} for a sequence {p_n} of primes going to infinity, and for each {n} let {\xi_{n,1},\xi_{n,2},\dots} be an infinite sequence of frequencies chosen uniformly and independently from {{\bf Z}/p_n{\bf Z}}. Let {f_n\colon {\bf Z}/p_n{\bf Z} \rightarrow {\bf C}} denote the random trigonometric series

\displaystyle  f_n(x) := \sum_{j=1}^\infty 2^{-j} e^{2\pi i \xi_{n,j} x / p_n }.

Then (almost surely) we can take the reduced Kronecker factor {G} to be the infinite torus {({\bf R}/{\bf Z})^{\bf N}} (with the Haar probability measure {\mu_G}), and the additive limit of the {f_n} then becomes the function {f\colon ({\bf R}/{\bf Z})^{\bf N} \rightarrow {\bf R}} defined by the formula

\displaystyle  f( (x_j)_{j=1}^\infty ) := \sum_{j=1}^\infty e^{2\pi i x_j}.

In fact, the pullback {\pi^* f} is the ultralimit of the {f_n}. As such, for any standard exponent {1 \leq q < \infty}, the normalised {l^q} norm

\displaystyle  (\frac{1}{p_n} \sum_{x \in {\bf Z}/p_n{\bf Z}} |f_n(x)|^q)^{1/q}

can be seen to converge to the limit

\displaystyle  (\int_{({\bf R}/{\bf Z})^{\bf N}} |f(x)|^q\ d\mu_G(x))^{1/q}.

The reader is invited to consider combinations of the above examples, e.g. random subsets of Bohr sets, to get a sense of the general case of Theorem 1.

It is likely that this theorem can be extended to the noncommutative setting, using the noncommutative Freiman theorem of Emmanuel Breuillard, Ben Green, and myself, but I have not attempted to do so here (see though this recent preprint of Anush Tserunyan for some related explorations); in a separate direction, there should be extensions that can control higher Gowers norms, in the spirit of the work of Szegedy.

Note: the arguments below will presume some familiarity with additive combinatorics and with nonstandard analysis, and will be a little sketchy in places.

Read the rest of this entry »

Roth’s theorem on arithmetic progressions asserts that every subset of the integers {{\bf Z}} of positive upper density contains infinitely many arithmetic progressions of length three. There are many versions and variants of this theorem. Here is one of them:

Theorem 1 (Roth’s theorem) Let {G = (G,+)} be a compact abelian group, with Haar probability measure {\mu}, which is {2}-divisible (i.e. the map {x \mapsto 2x} is surjective) and let {A} be a measurable subset of {G} with {\mu(A) \geq \alpha} for some {0 < \alpha < 1}. Then we have

\displaystyle  \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r)\ d\mu(x) d\mu(r) \gg_\alpha 1,

where {X \gg_\alpha Y} denotes the bound {X \geq c_\alpha Y} for some {c_\alpha > 0} depending only on {\alpha}.

This theorem is usually formulated in the case that {G} is a finite abelian group of odd order (in which case the result is essentially due to Meshulam) or more specifically a cyclic group {G = {\bf Z}/N{\bf Z}} of odd order (in which case it is essentially due to Varnavides), but is also valid for the more general setting of {2}-divisible compact abelian groups, as we shall shortly see. One can be more precise about the dependence of the implied constant {c_\alpha} on {\alpha}, but to keep the exposition simple we will work at the qualitative level here, without trying at all to get good quantitative bounds. The theorem is also true without the {2}-divisibility hypothesis, but the proof we will discuss runs into some technical issues due to the degeneracy of the {2r} shift in that case.

We can deduce Theorem 1 from the following more general Khintchine-type statement. Let {\hat G} denote the Pontryagin dual of a compact abelian group {G}, that is to say the set of all continuous homomorphisms {\xi: x \mapsto \xi \cdot x} from {G} to the (additive) unit circle {{\bf R}/{\bf Z}}. Thus {\hat G} is a discrete abelian group, and functions {f \in L^2(G)} have a Fourier transform {\hat f \in \ell^2(\hat G)} defined by

\displaystyle  \hat f(\xi) := \int_G f(x) e^{-2\pi i \xi \cdot x}\ d\mu(x).

If {G} is {2}-divisible, then {\hat G} is {2}-torsion-free in the sense that the map {\xi \mapsto 2 \xi} is injective. For any finite set {S \subset \hat G} and any radius {\rho>0}, define the Bohr set

\displaystyle  B(S,\rho) := \{ x \in G: \sup_{\xi \in S} \| \xi \cdot x \|_{{\bf R}/{\bf Z}} < \rho \}

where {\|\theta\|_{{\bf R}/{\bf Z}}} denotes the distance of {\theta} to the nearest integer. We refer to the cardinality {|S|} of {S} as the rank of the Bohr set. We record a simple volume bound on Bohr sets:

Lemma 2 (Volume packing bound) Let {G} be a compact abelian group with Haar probability measure {\mu}. For any Bohr set {B(S,\rho)}, we have

\displaystyle  \mu( B( S, \rho ) ) \gg_{|S|, \rho} 1.

Proof: We can cover the torus {({\bf R}/{\bf Z})^S} by {O_{|S|,\rho}(1)} translates {\theta+Q} of the cube {Q := \{ (\theta_\xi)_{\xi \in S} \in ({\bf R}/{\bf Z})^S: \sup_{\xi \in S} \|\theta_\xi\|_{{\bf R}/{\bf Z}} < \rho/2 \}}. Then the sets {\{ x \in G: (\xi \cdot x)_{\xi \in S} \in \theta + Q \}} form an cover of {G}. But all of these sets lie in a translate of {B(S,\rho)}, and the claim then follows from the translation invariance of {\mu}. \Box

Given any Bohr set {B(S,\rho)}, we define a normalised “Lipschitz” cutoff function {\nu_{B(S,\rho)}: G \rightarrow {\bf R}} by the formula

\displaystyle  \nu_{B(S,\rho)}(x) = c_{B(S,\rho)} (1 - \frac{1}{\rho} \sup_{\xi \in S} \|\xi \cdot x\|_{{\bf R}/{\bf Z}})_+ \ \ \ \ \ (1)

where {c_{B(S,\rho)}} is the constant such that

\displaystyle  \int_G \nu_{B(S,\rho)}\ d\mu = 1,

thus

\displaystyle c_{B(S,\rho)} = \left( \int_{B(S,\rho)} (1 - \frac{1}{\rho} \sup_{\xi \in S} \|\xi \cdot x\|_{{\bf R}/{\bf Z}})\ d\mu(x) \right)^{-1}.

The function {\nu_{B(S,\rho)}} should be viewed as an {L^1}-normalised “tent function” cutoff to {B(S,\rho)}. Note from Lemma 2 that

\displaystyle  1 \ll_{|S|,\rho} c_{B(S,\rho)} \ll_{|S|,\rho} 1. \ \ \ \ \ (2)

We then have the following sharper version of Theorem 1:

Theorem 3 (Roth-Khintchine theorem) Let {G = (G,+)} be a {2}-divisible compact abelian group, with Haar probability measure {\mu}, and let {\epsilon>0}. Then for any measurable function {f: G \rightarrow [0,1]}, there exists a Bohr set {B(S,\rho)} with {|S| \ll_\epsilon 1} and {\rho \gg_\epsilon 1} such that

\displaystyle  \int_G \int_G f(x) f(x+r) f(x+2r) \nu_{B(S,\rho)}*\nu_{B(S,\rho)}(r)\ d\mu(x) d\mu(r) \ \ \ \ \ (3)

\displaystyle  \geq (\int_G f\ d\mu)^3 - O(\epsilon)

where {*} denotes the convolution operation

\displaystyle  f*g(x) := \int_G f(y) g(x-y)\ d\mu(y).

A variant of this result (expressed in the language of ergodic theory) appears in this paper of Bergelson, Host, and Kra; a combinatorial version of the Bergelson-Host-Kra result that is closer to Theorem 3 subsequently appeared in this paper of Ben Green and myself, but this theorem arguably appears implicitly in a much older paper of Bourgain. To see why Theorem 3 implies Theorem 1, we apply the theorem with {f := 1_A} and {\epsilon} equal to a small multiple of {\alpha^3} to conclude that there is a Bohr set {B(S,\rho)} with {|S| \ll_\alpha 1} and {\rho \gg_\alpha 1} such that

\displaystyle  \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r) \nu_{B(S,\rho)}*\nu_{B(S,\rho)}(r)\ d\mu(x) d\mu(r) \gg \alpha^3.

But from (2) we have the pointwise bound {\nu_{B(S,\rho)}*\nu_{B(S,\rho)} \ll_\alpha 1}, and Theorem 1 follows.

Below the fold, we give a short proof of Theorem 3, using an “energy pigeonholing” argument that essentially dates back to the 1986 paper of Bourgain mentioned previously (not to be confused with a later 1999 paper of Bourgain on Roth’s theorem that was highly influential, for instance in emphasising the importance of Bohr sets). The idea is to use the pigeonhole principle to choose the Bohr set {B(S,\rho)} to capture all the “large Fourier coefficients” of {f}, but such that a certain “dilate” of {B(S,\rho)} does not capture much more Fourier energy of {f} than {B(S,\rho)} itself. The bound (3) may then be obtained through elementary Fourier analysis, without much need to explicitly compute things like the Fourier transform of an indicator function of a Bohr set. (However, the bound obtained by this argument is going to be quite poor – of tower-exponential type.) To do this we perform a structural decomposition of {f} into “structured”, “small”, and “highly pseudorandom” components, as is common in the subject (e.g. in this previous blog post), but even though we crucially need to retain non-negativity of one of the components in this decomposition, we can avoid recourse to conditional expectation with respect to a partition (or “factor”) of the space, using instead convolution with one of the {\nu_{B(S,\rho)}} considered above to achieve a similar effect.

Read the rest of this entry »

A core foundation of the subject now known as arithmetic combinatorics (and particularly the subfield of additive combinatorics) are the elementary sum set estimates (sometimes known as “Ruzsa calculus”) that relate the cardinality of various sum sets

\displaystyle  A+B := \{ a+b: a \in A, b \in B \}

and difference sets

\displaystyle  A-B := \{ a-b: a \in A, b \in B \},

as well as iterated sumsets such as {3A=A+A+A}, {2A-2A=A+A-A-A}, and so forth. Here, {A, B} are finite non-empty subsets of some additive group {G = (G,+)} (classically one took {G={\bf Z}} or {G={\bf R}}, but nowadays one usually considers more general additive groups). Some basic estimates in this vein are the following:

Lemma 1 (Ruzsa covering lemma) Let {A, B} be finite non-empty subsets of {G}. Then {A} may be covered by at most {\frac{|A+B|}{|B|}} translates of {B-B}.

Proof: Consider a maximal set of disjoint translates {a+B} of {B} by elements {a \in A}. These translates have cardinality {|B|}, are disjoint, and lie in {A+B}, so there are at most {\frac{|A+B|}{|B|}} of them. By maximality, for any {a' \in A}, {a'+B} must intersect at least one of the selected {a+B}, thus {a' \in a+B-B}, and the claim follows. \Box

Lemma 2 (Ruzsa triangle inequality) Let {A,B,C} be finite non-empty subsets of {G}. Then {|A-C| \leq \frac{|A-B| |B-C|}{|B|}}.

Proof: Consider the addition map {+: (x,y) \mapsto x+y} from {(A-B) \times (B-C)} to {G}. Every element {a-c} of {A - C} has a preimage {\{ (x,y) \in (A-B) \times (B-C)\}} of this map of cardinality at least {|B|}, thanks to the obvious identity {a-c = (a-b) + (b-c)} for each {b \in B}. Since {(A-B) \times (B-C)} has cardinality {|A-B| |B-C|}, the claim follows. \Box

Such estimates (which are covered, incidentally, in Section 2 of my book with Van Vu) are particularly useful for controlling finite sets {A} of small doubling, in the sense that {|A+A| \leq K|A|} for some bounded {K}. (There are deeper theorems, most notably Freiman’s theorem, which give more control than what elementary Ruzsa calculus does, however the known bounds in the latter theorem are worse than polynomial in {K} (although it is conjectured otherwise), whereas the elementary estimates are almost all polynomial in {K}.)

However, there are some settings in which the standard sum set estimates are not quite applicable. One such setting is the continuous setting, where one is dealing with bounded open sets in an additive Lie group (e.g. {{\bf R}^n} or a torus {({\bf R}/{\bf Z})^n}) rather than a finite setting. Here, one can largely replicate the discrete sum set estimates by working with a Haar measure in place of cardinality; this is the approach taken for instance in this paper of mine. However, there is another setting, which one might dub the “discretised” setting (as opposed to the “discrete” setting or “continuous” setting), in which the sets {A} remain finite (or at least discretisable to be finite), but for which there is a certain amount of “roundoff error” coming from the discretisation. As a typical example (working now in a non-commutative multiplicative setting rather than an additive one), consider the orthogonal group {O_n({\bf R})} of orthogonal {n \times n} matrices, and let {A} be the matrices obtained by starting with all of the orthogonal matrice in {O_n({\bf R})} and rounding each coefficient of each matrix in this set to the nearest multiple of {\epsilon}, for some small {\epsilon>0}. This forms a finite set (whose cardinality grows as {\epsilon\rightarrow 0} like a certain negative power of {\epsilon}). In the limit {\epsilon \rightarrow 0}, the set {A} is not a set of small doubling in the discrete sense. However, {A \cdot A} is still close to {A} in a metric sense, being contained in the {O_n(\epsilon)}-neighbourhood of {A}. Another key example comes from graphs {\Gamma := \{ (x, f(x)): x \in G \}} of maps {f: A \rightarrow H} from a subset {A} of one additive group {G = (G,+)} to another {H = (H,+)}. If {f} is “approximately additive” in the sense that for all {x,y \in G}, {f(x+y)} is close to {f(x)+f(y)} in some metric, then {\Gamma} might not have small doubling in the discrete sense (because {f(x+y)-f(x)-f(y)} could take a large number of values), but could be considered a set of small doubling in a discretised sense.

One would like to have a sum set (or product set) theory that can handle these cases, particularly in “high-dimensional” settings in which the standard methods of passing back and forth between continuous, discrete, or discretised settings behave poorly from a quantitative point of view due to the exponentially large doubling constant of balls. One way to do this is to impose a translation invariant metric {d} on the underlying group {G = (G,+)} (reverting back to additive notation), and replace the notion of cardinality by that of metric entropy. There are a number of almost equivalent ways to define this concept:

Definition 3 Let {(X,d)} be a metric space, let {E} be a subset of {X}, and let {r>0} be a radius.

  • The packing number {N^{pack}_r(E)} is the largest number of points {x_1,\dots,x_n} one can pack inside {E} such that the balls {B(x_1,r),\dots,B(x_n,r)} are disjoint.
  • The internal covering number {N^{int}_r(E)} is the fewest number of points {x_1,\dots,x_n \in E} such that the balls {B(x_1,r),\dots,B(x_n,r)} cover {E}.
  • The external covering number {N^{ext}_r(E)} is the fewest number of points {x_1,\dots,x_n \in X} such that the balls {B(x_1,r),\dots,B(x_n,r)} cover {E}.
  • The metric entropy {N^{ent}_r(E)} is the largest number of points {x_1,\dots,x_n} one can find in {E} that are {r}-separated, thus {d(x_i,x_j) \geq r} for all {i \neq j}.

It is an easy exercise to verify the inequalities

\displaystyle  N^{ent}_{2r}(E) \leq N^{pack}_r(E) \leq N^{ext}_r(E) \leq N^{int}_r(E) \leq N^{ent}_r(E)

for any {r>0}, and that {N^*_r(E)} is non-increasing in {r} and non-decreasing in {E} for the three choices {* = pack,ext,ent} (but monotonicity in {E} can fail for {*=int}!). It turns out that the external covering number {N^{ent}_r(E)} is slightly more convenient than the other notions of metric entropy, so we will abbreviate {N_r(E) = N^{ent}_r(E)}. The cardinality {|E|} can be viewed as the limit of the entropies {N^*_r(E)} as {r \rightarrow 0}.

If we have the bounded doubling property that {B(0,2r)} is covered by {O(1)} translates of {B(0,r)} for each {r>0}, and one has a Haar measure {m} on {G} which assigns a positive finite mass to each ball, then any of the above entropies {N^*_r(E)} is comparable to {m( E + B(0,r) ) / m(B(0,r))}, as can be seen by simple volume packing arguments. Thus in the bounded doubling setting one can usually use the measure-theoretic sum set theory to derive entropy-theoretic sumset bounds (see e.g. this paper of mine for an example of this). However, it turns out that even in the absence of bounded doubling, one still has an entropy analogue of most of the elementary sum set theory, except that one has to accept some degradation in the radius parameter {r} by some absolute constant. Such losses can be acceptable in applications in which the underlying sets {A} are largely “transverse” to the balls {B(0,r)}, so that the {N_r}-entropy of {A} is largely independent of {A}; this is a situation which arises in particular in the case of graphs {\Gamma = \{ (x,f(x)): x \in G \}} discussed above, if one works with “vertical” metrics whose balls extend primarily in the vertical direction. (I hope to present a specific application of this type here in the near future.)

Henceforth we work in an additive group {G} equipped with a translation-invariant metric {d}. (One can also generalise things slightly by allowing the metric to attain the values {0} or {+\infty}, without changing much of the analysis below.) By the Heine-Borel theorem, any precompact set {E} will have finite entropy {N_r(E)} for any {r>0}. We now have analogues of the two basic Ruzsa lemmas above:

Lemma 4 (Ruzsa covering lemma) Let {A, B} be precompact non-empty subsets of {G}, and let {r>0}. Then {A} may be covered by at most {\frac{N_r(A+B)}{N_r(B)}} translates of {B-B+B(0,2r)}.

Proof: Let {a_1,\dots,a_n \in A} be a maximal set of points such that the sets {a_i + B + B(0,r)} are all disjoint. Then the sets {a_i+B} are disjoint in {A+B} and have entropy {N_r(a_i+B)=N_r(B)}, and furthermore any ball of radius {r} can intersect at most one of the {a_i+B}. We conclude that {N_r(A+B) \geq n N_r(B)}, so {n \leq \frac{N_r(A+B)}{N_r(B)}}. If {a \in A}, then {a+B+B(0,r)} must intersect one of the {a_i + B + B(0,r)}, so {a \in a_i + B-B + B(0,2r)}, and the claim follows. \Box

Lemma 5 (Ruzsa triangle inequality) Let {A,B,C} be precompact non-empty subsets of {G}, and let {r>0}. Then {N_{4r}(A-C) \leq \frac{N_r(A-B) N_r(B-C)}{N_r(B)}}.

Proof: Consider the addition map {+: (x,y) \mapsto x+y} from {(A-B) \times (B-C)} to {G}. The domain {(A-B) \times (B-C)} may be covered by {N_r(A-B) N_r(B-C)} product balls {B(x,r) \times B(y,r)}. Every element {a-c} of {A - C} has a preimage {\{ (x,y) \in (A-B) \times (B-C)\}} of this map which projects to a translate of {B}, and thus must meet at least {N_r(B)} of these product balls. However, if two elements of {A-C} are separated by a distance of at least {4r}, then no product ball can intersect both preimages. We thus see that {N_{4r}^{ent}(A-C) \leq \frac{N_r(A-B) N_r(B-C)}{N_r(A-C)}}, and the claim follows. \Box

Below the fold we will record some further metric entropy analogues of sum set estimates (basically redoing much of Chapter 2 of my book with Van Vu). Unfortunately there does not seem to be a direct way to abstractly deduce metric entropy results from their sum set analogues (basically due to the failure of a certain strong version of Freiman’s theorem, as discussed in this previous post); nevertheless, the proofs of the discrete arguments are elementary enough that they can be modified with a small amount of effort to handle the entropy case. (In fact, there should be a very general model-theoretic framework in which both the discrete and entropy arguments can be processed in a unified manner; see this paper of Hrushovski for one such framework.)

It is also likely that many of the arguments here extend to the non-commutative setting, but for simplicity we will not pursue such generalisations here.

Read the rest of this entry »

Emmanuel Breuillard, Ben Green, and I have just uploaded to the arXiv our survey “Small doubling in groups“, for the proceedings of the upcoming Erdos Centennial.  This is a short survey of the known results on classifying finite subsets A of an (abelian) additive group G = (G,+) or a (not necessarily abelian) multiplicative group G = (G,\cdot) that have small doubling in the sense that the sum set A+A or product set A \cdot A is small.  Such sets behave approximately like finite subgroups of G (and there is a closely related notion of an approximate group in which the analogy is even tighter) , and so this subject can be viewed as a sort of approximate version of finite group theory.  (Unfortunately, thus far the theory does not have much new to say about the classification of actual finite groups; progress has been largely made instead on classifying the (highly restricted) number of ways in which approximate groups can differ from a genuine group.)

In the classical case when G is the integers {\mathbb Z}, these sets were classified (in a qualitative sense, at least) by a celebrated theorem of Freiman, which roughly speaking says that such sets A are necessarily “commensurate” in some sense with a (generalised) arithmetic progression P of bounded rank.   There are a number of essentially equivalent ways to define what “commensurate” means here; for instance, in the original formulation of the theorem, one asks that A be a dense subset of P, but in modern formulations it is often more convenient to require instead that A be of comparable size to P and be covered by a bounded number of translates of P, or that A and P have an intersection that is of comparable size to both A and P (cf. the notion of commensurability in group theory).

Freiman’s original theorem was extended to more general abelian groups in a sequence of papers culminating in the paper of Green and Ruzsa that handled arbitrary abelian groups.   As such groups now contain non-trivial finite subgroups, the conclusion of the theorem must be  modified by allowing for “coset progressions” P+H, which can be viewed as “extensions”  of generalized arithmetic progressions P by genuine finite groups H.

The proof methods in these abelian results were Fourier-analytic in nature (except in the cases of sets of very small doubling, in which more combinatorial approaches can be applied, and there were also some geometric or combinatorial methods that gave some weaker structural results).  As such, it was a challenge to extend these results to nonabelian groups, although for various important special types of ambient group G (such as an linear group over a finite or infinite field) it turns out that one can use tools exploiting the special structure of those groups (e.g. for linear groups one would use tools from Lie theory and algebraic geometry) to obtain quite satisfactory results; see e.g. this survey of  Pyber and Szabo for the linear case.   When the ambient group G is completely arbitrary, it turns out the problem is closely related to the classical Hilbert’s fifth problem of determining the minimal requirements of a topological group in order for such groups to have Lie structure; this connection was first observed and exploited by Hrushovski, and then used by Breuillard, Green, and myself to obtain the analogue of Freiman’s theorem for an arbitrary nonabelian group.

This survey is too short to discuss in much detail the proof techniques used in these results (although the abelian case is discussed in this book of mine with Vu, and the nonabelian case discussed in this more recent book of mine), but instead focuses on the statements of the various known results, as well as some remaining open questions in the subject (in particular, there is substantial work left to be done in making the estimates more quantitative, particularly in the nonabelian setting).

We have now seen two ways to construct expander Cayley graphs {Cay(G,S)}. The first, discussed in Notes 2, is to use Cayley graphs that are projections of an infinite Cayley graph on a group with Kazhdan’s property (T). The second, discussed in Notes 3, is to combine a quasirandomness property of the group {G} with a flattening hypothesis for the random walk.

We now pursue the second approach more thoroughly. The main difficulty here is to figure out how to ensure flattening of the random walk, as it is then an easy matter to use quasirandomness to show that the random walk becomes mixing soon after it becomes flat. In the case of Selberg’s theorem, we achieved this through an explicit formula for the heat kernel on the hyperbolic plane (which is a proxy for the random walk). However, in most situations such an explicit formula is not available, and one must develop some other tool for forcing flattening, and specifically an estimate of the form

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \ll |G|^{-1/2+\epsilon} \ \ \ \ \ (1)

for some {n = O(\log |G|)}, where {\mu} is the uniform probability measure on the generating set {S}.

In 2006, Bourgain and Gamburd introduced a general method for achieving this goal. The intuition here is that the main obstruction that prevents a random walk from spreading out to become flat over the entire group {G} is if the random walk gets trapped in some proper subgroup {H} of {G} (or perhaps in some coset {xH} of such a subgroup), so that {\mu^{(n)}(xH)} remains large for some moderately large {n}. Note that

\displaystyle  \mu^{(2n)}(H) \geq \mu^{(n)}(H x^{-1}) \mu^{(n)}(xH) = \mu^{(n)}(xH)^2,

since {\mu^{(2n)} = \mu^{(n)} * \mu^{(n)}}, {H = (H x^{-1}) \cdot (xH)}, and {\mu^{(n)}} is symmetric. By iterating this observation, we seethat if {\mu^{(n)}(xH)} is too large (e.g. of size {|G|^{-o(1)}} for some {n} comparable to {\log |G|}), then it is not possible for the random walk {\mu^{(n)}} to converge to the uniform distribution in time {O(\log |G|)}, and so expansion does not occur.

A potentially more general obstruction of this type would be if the random walk gets trapped in (a coset of) an approximate group {H}. Recall that a {K}-approximate group is a subset {H} of a group {G} which is symmetric, contains the identity, and is such that {H \cdot H} can be covered by at most {K} left-translates (or equivalently, right-translates) of {H}. Such approximate groups were studied extensively in last quarter’s course. A similar argument to the one given previously shows (roughly speaking) that expansion cannot occur if {\mu^{(n)}(xH)} is too large for some coset {xH} of an approximate group.

It turns out that this latter observation has a converse: if a measure does not concentrate in cosets of approximate groups, then some flattening occurs. More precisely, one has the following combinatorial lemma:

Lemma 1 (Weighted Balog-Szemerédi-Gowers lemma) Let {G} be a group, let {\nu} be a finitely supported probability measure on {G} which is symmetric (thus {\nu(g)=\nu(g^{-1})} for all {g \in G}), and let {K \geq 1}. Then one of the following statements hold:

  • (i) (Flattening) One has {\| \nu * \nu \|_{\ell^2(G)} \leq \frac{1}{K} \|\nu\|_{\ell^2(G)}}.
  • (ii) (Concentration in an approximate group) There exists an {O(K^{O(1)})}-approximate group {H} in {G} with {|H| \ll K^{O(1)} / \| \nu \|_{\ell^2(G)}^2} and an element {x \in G} such that {\nu(xH) \gg K^{-O(1)}}.

This lemma is a variant of the more well-known Balog-Szemerédi-Gowers lemma in additive combinatorics due to Gowers (which roughly speaking corresponds to the case when {\mu} is the uniform distribution on some set {A}), which in turn is a polynomially quantitative version of an earlier lemma of Balog and Szemerédi. We will prove it below the fold.

The lemma is particularly useful when the group {G} in question enjoys a product theorem, which roughly speaking says that the only medium-sized approximate subgroups of {G} are trapped inside genuine proper subgroups of {G} (or, contrapositively, medium-sized sets that generate the entire group {G} cannot be approximate groups). The fact that some finite groups (and specifically, the bounded rank finite simple groups of Lie type) enjoy product theorems is a non-trivial fact, and will be discussed in later notes. For now, we simply observe that the presence of the product theorem, together with quasirandomness and a non-concentration hypothesis, can be used to demonstrate expansion:

Theorem 2 (Bourgain-Gamburd expansion machine) Suppose that {G} is a finite group, that {S \subseteq G} is a symmetric set of {k} generators, and that there are constants {0 < \kappa < 1 < \Lambda} with the following properties.

  1. (Quasirandomness). The smallest dimension of a nontrivial representation {\rho: G \rightarrow GL_d({\bf C})} of {G} is at least {|G|^{\kappa}};
  2. (Product theorem). For all {\delta > 0} there is some {\delta' = \delta'(\delta) > 0} such that the following is true. If {H \subseteq G} is a {|G|^{\delta'}}-approximate subgroup with {|G|^{\delta} \leq |H| \leq |G|^{1 - \delta}} then {H} generates a proper subgroup of {G};
  3. (Non-concentration estimate). There is some even number {n \leq \Lambda\log |G|} such that

    \displaystyle  \sup_{H < G}\mu^{(n)}(H) < |G|^{-\kappa},

    where the supremum is over all proper subgroups {H < G}.

Then {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon > 0} depending only on {k,\kappa, \Lambda}, and the function {\delta'(\cdot )} (and this constant {\epsilon} is in principle computable in terms of these constants).

This criterion for expansion is implicitly contained in this paper of Bourgain and Gamburd, who used it to establish the expansion of various Cayley graphs in {SL_2(F_p)} for prime {p}. This criterion has since been applied (or modified) to obtain expansion results in many other groups, as will be discussed in later notes.

Read the rest of this entry »

Let {\alpha \in {\bf R}/{\bf Z}} be an element of the unit circle, let {N \geq 1}, and let {\rho > 0}. We define the (rank one) Bohr set {B_N(\alpha;\rho)} to be the set

\displaystyle B_N(\alpha;\rho) := \{ n \in {\bf Z}: -N \leq n \leq N; \|n\alpha\|_{{\bf R}/{\bf Z}} \leq \rho \}

where {\|x\|_{{\bf R}/{\bf Z}}} is the distance to the origin in the unit circle (or equivalently, the distance to the nearest integer, after lifting up to {{\bf R}}). These sets play an important role in additive combinatorics and in additive number theory. For instance, they arise naturally when applying the circle method, because Bohr sets describe the oscillation of exponential phases such as {n \mapsto e^{2\pi i n \alpha}}.

Observe that Bohr sets enjoy the doubling property

\displaystyle B_N(\alpha;\rho) + B_N(\alpha;\rho) \subset B_{2N}(\alpha;2\rho),

thus doubling the Bohr set doubles both the length parameter {N} and the radius parameter {\rho}. As such, these Bohr sets resemble two-dimensional balls (or boxes). Indeed, one can view {B_N(\alpha;\rho)} as the preimage of the two-dimensional box {[-1,1] \times [-\rho,\rho] \subset {\bf R} \times {\bf R}/{\bf Z}} under the homomorphism {n \mapsto (n/N, \alpha n \hbox{ mod } 1)}.

Another class of finite set with two-dimensional behaviour is the class of (rank two) generalised arithmetic progressions

\displaystyle P( a_1,a_2; N_1,N_2 ) := \{ n_1 a_1 + n_2 a_2: n_1,n_2 \in {\bf Z}; |n_1| \leq N_1, |n_2| \leq N_2 \}

with {a_1,a_2 \in {\bf Z}} and {N_1,N_2 > 0} Indeed, we have

\displaystyle P( a_1,a_2; N_1,N_2 ) + P( a_1,a_2; N_1,N_2 ) \subset P( a_1,a_2; 2N_1, 2N_2 )

and so we see, as with the Bohr set, that doubling the generalised arithmetic progressions doubles the two defining parameters of that progression.

More generally, there is an analogy between rank {r} Bohr sets

\displaystyle B_N(\alpha_1,\ldots,\alpha_r; \rho_1,\ldots,\rho_r) := \{ n \in {\bf Z}: -N \leq n \leq N; \|n\alpha_i\|_{{\bf R}/{\bf Z}} \leq \rho_i

\displaystyle \hbox{ for all } 1 \leq i \leq r \}

and the rank {r+1} generalised arithmetic progressions

\displaystyle P( a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1} ) := \{ n_1 a_1 + \ldots + n_{r+1} a_{r+1}:

\displaystyle n_1,\ldots,n_{r+1} \in {\bf Z}; |n_i| \leq N_i \hbox{ for all } 1 \leq i \leq r+1 \}.

One of the aims of additive combinatorics is to formalise analogies such as the one given above. By using some arguments from the geometry of numbers, for instance, one can show that for any rank {r} Bohr set {B_N(\alpha_1,\ldots,\alpha_r;\rho_1,\ldots,\rho_r)}, there is a rank {r+1} generalised arithmetic progression {P(a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1})} for which one has the containments

\displaystyle B_N(\alpha_1,\ldots,\alpha_r;\epsilon \rho_1,\ldots,\epsilon \rho_r) \subset P(a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1})

\displaystyle \subset B_N(\alpha_1,\ldots,\alpha_r;\rho_1,\ldots,\rho_r)

for some explicit {\epsilon>0} depending only on {r} (in fact one can take {\epsilon = (r+1)^{-2(r+1)}}); this is (a slight modification of) Lemma 4.22 of my book with Van Vu.

In the special case when {r=1}, one can make a significantly more detailed description of the link between rank one Bohr sets and rank two generalised arithmetic progressions, by using the classical theory of continued fractions, which among other things gives a fairly precise formula for the generators {a_1,a_2} and lengths {N_1,N_2} of the generalised arithmetic progression associated to a rank one Bohr set {B_N(\alpha;\rho)}. While this connection is already implicit in the continued fraction literature (for instance, in the classic text of Hardy and Wright), I thought it would be a good exercise to work it out explicitly and write it up, which I will do below the fold.

It is unfortunate that the theory of continued fractions is restricted to the rank one setting (it relies very heavily on the total ordering of one-dimensional sets such as {{\bf Z}} or {{\bf R}}). A higher rank version of the theory could potentially help with questions such as the Littlewood conjecture, which remains open despite a substantial amount of effort and partial progress on the problem. At the end of this post I discuss how one can use the rank one theory to rephrase the Littlewood conjecture as a conjecture about a doubly indexed family of rank four progressions, which can be used to heuristically justify why this conjecture should be true, but does not otherwise seem to shed much light on the problem.

Read the rest of this entry »

In 1964, Kemperman established the following result:

Theorem 1 Let {G} be a compact connected group, with a Haar probability measure {\mu}. Let {A, B} be compact subsets of {G}. Then

\displaystyle \mu(AB) \geq \min( \mu(A) + \mu(B), 1 ).

Remark 1 The estimate is sharp, as can be seen by considering the case when {G} is a unit circle, and {A, B} are arcs; similarly if {G} is any compact connected group that projects onto the circle. The connectedness hypothesis is essential, as can be seen by considering what happens if {A} and {B} are a non-trivial open subgroup of {G}. For locally compact connected groups which are unimodular but not compact, there is an analogous statement, but with {\mu} now a Haar measure instead of a Haar probability measure, and the right-hand side {\min(\mu(A)+\mu(B),1)} replaced simply by {\mu(A)+\mu(B)}. The case when {G} is a torus is due to Macbeath, and the case when {G} is a circle is due to Raikov. The theorem is closely related to the Cauchy-Davenport inequality; indeed, it is not difficult to use that inequality to establish the circle case, and the circle case can be used to deduce the torus case by considering increasingly dense circle subgroups of the torus (alternatively, one can also use Kneser’s theorem).

By inner regularity, the hypothesis that {A,B} are compact can be replaced with Borel measurability, so long as one then adds the additional hypothesis that {A+B} is also Borel measurable.

A short proof of Kemperman’s theorem was given by Ruzsa. In this post I wanted to record how this argument can be used to establish the following more “robust” version of Kemperman’s theorem, which not only lower bounds {AB}, but gives many elements of {AB} some multiplicity:

Theorem 2 Let {G} be a compact connected group, with a Haar probability measure {\mu}. Let {A, B} be compact subsets of {G}. Then for any {0 \leq t \leq \min(\mu(A),\mu(B))}, one has

\displaystyle \int_G \min(1_A*1_B, t)\ d\mu \geq t \min(\mu(A)+\mu(B) - t,1). \ \ \ \ \ (1)

 

Indeed, Theorem 1 can be deduced from Theorem 2 by dividing (1) by {t} and then taking limits as {t \rightarrow 0}. The bound in (1) is sharp, as can again be seen by considering the case when {A,B} are arcs in a circle. The analogous claim for cyclic groups for prime order was established by Pollard, and for general abelian groups by Green and Ruzsa.

Let us now prove Theorem 2. It uses a submodularity argument related to one discussed in this previous post. We fix {B} and {t} with {0 \leq t \leq \mu(B)}, and define the quantity

\displaystyle c(A) := \int_G \min(1_A*1_B, t)\ d\mu - t (\mu(A)+\mu(B)-t).

for any compact set {A}. Our task is to establish that {c(A) \geq 0} whenever {t \leq \mu(A) \leq 1-\mu(B)+t}.

We first verify the extreme cases. If {\mu(A) = t}, then {1_A*1_B \leq t}, and so {c(A)=0} in this case (since {\int_G 1_A*1_B = \mu(A)\mu(B) = t \mu(B)}). At the other extreme, if {\mu(A) = 1-\mu(B)+t}, then from the inclusion-exclusion principle we see that {1_A * 1_B \geq t}, and so again {c(A)=0} in this case.

To handle the intermediate regime when {\mu(A)} lies between {t} and {1-\mu(B)+t}, we rely on the submodularity inequality

\displaystyle c(A_1) + c(A_2) \geq c(A_1 \cap A_2) + c(A_1 \cup A_2) \ \ \ \ \ (2)

 

for arbitrary compact {A_1,A_2}. This inequality comes from the obvious pointwise identity

\displaystyle 1_{A_1} + 1_{A_2} = 1_{A_1 \cap A_2} + 1_{A_1 \cup A_2}

whence

\displaystyle 1_{A_1}*1_B + 1_{A_2}*1_B = 1_{A_1 \cap A_2}*1_B + 1_{A_1 \cup A_2}*1_B

and thus (noting that the quantities on the left are closer to each other than the quantities on the right)

\displaystyle \min(1_{A_1}*1_B,t) + \min(1_{A_2}*1_B,t)

\displaystyle \geq \min(1_{A_1 \cap A_2}*1_B,t) + \min(1_{A_1 \cup A_2}*1_B,t)

at which point (2) follows by integrating over {G} and then using the inclusion-exclusion principle.

Now introduce the function

\displaystyle f(a) := \inf \{ c(A) : \mu(A) = a \}

for {t \leq a \leq 1-\mu(B)+t}. From the preceding discussion {f(a)} vanishes at the endpoints {a = t, 1-\mu(B)+t}; our task is to show that {f(a)} is non-negative in the interior region {t < a < 1-\mu(B)+t}. Suppose for contradiction that this was not the case. It is easy to see that {f} is continuous (indeed, it is even Lipschitz continuous), so there must be {t < a < 1-\mu(B)+t} at which {f} is a local minimum and not locally constant. In particular, {0 <a <1}. But for any {A} with {\mu(A) = a}, we have the translation-invariance

\displaystyle c(gA) = c(A) \ \ \ \ \ (3)

 

for any {g \in G}, and hence by (2)

\displaystyle c(A) \geq \frac{1}{2} c(A \cap gA) + \frac{1}{2} c(A \cup gA ).

Note that {\mu(A \cap gA)} depends continuously on {g}, equals {a} when {g} is the identity, and has an average value of {a^2}. As {G} is connected, we thus see from the intermediate value theorem that for any {0 < \epsilon < a-a^2}, we can find {g} such that {\mu(A \cap gA) = a-\epsilon}, and thus by inclusion-exclusion {\mu(A \cup gA) = a+\epsilon}. By definition of {f}, we thus have

\displaystyle c(A) \geq \frac{1}{2} f(a-\epsilon) + \frac{1}{2} f(a+\epsilon).

Taking infima in {A} (and noting that the hypotheses on {\epsilon} are independent of {A}) we conclude that

\displaystyle f(a) \geq \frac{1}{2} f(a-\epsilon) + \frac{1}{2} f(a+\epsilon)

for all {0 < \epsilon < a-a^2}. As {f} is a local minimum and {\epsilon} is arbitrarily small, this implies that {f} is locally constant, a contradiction. This establishes Theorem 2.

We observe the following corollary:

Corollary 3 Let {G} be a compact connected group, with a Haar probability measure {\mu}. Let {A, B, C} be compact subsets of {G}, and let {\delta := \min(\mu(A),\mu(B),\mu(C))}. Then one has the pointwise estimate

\displaystyle 1_A * 1_B * 1_C \geq \frac{1}{4} (\mu(A)+\mu(B)+\mu(C)-1)_+^2

if {\mu(A)+\mu(B)+\mu(C)-1 \leq 2 \delta}, and

\displaystyle 1_A * 1_B * 1_C \geq \delta (\mu(A)+\mu(B)+\mu(C)-1-\delta)

if {\mu(A)+\mu(B)+\mu(C)-1 \geq 2 \delta}.

Once again, the bounds are completely sharp, as can be seen by computing {1_A*1_B*1_C} when {A,B,C} are arcs of a circle. For quasirandom {G}, one can do much better than these bounds, as discussed in this recent blog post; thus, the abelian case is morally the worst case here, although it seems difficult to convert this intuition into a rigorous reduction.

Proof: By cyclic permutation we may take {\delta = \mu(C)}. For any

\displaystyle (\mu(A)+\mu(B)-1)_+ \leq t \leq \min(\mu(A),\mu(B)),

we can bound

\displaystyle 1_A*1_B*1_C \geq \min(1_A*1_B,t)*1_C

\displaystyle \geq \int_G \min(1_A*1_B,t)\ d\mu - t (1-\mu(C))

\displaystyle \geq t (\mu(A)+\mu(B)-t) - t (1-\mu(C))

\displaystyle = t \min( \mu(A)+\mu(B)+\mu(C)-1-t )

where we used Theorem 2 to obtain the third line. Optimising in {t}, we obtain the claim. \Box

A few days ago, I received the sad news that Yahya Ould Hamidoune had recently died. Hamidoune worked in additive combinatorics, and had recently solved a question on noncommutative Freiman-Kneser theorems posed by myself on this blog last year. Namely, Hamidoune showed

Theorem 1 (Noncommutative Freiman-Kneser theorem for small doubling) Let {0 < \epsilon \leq 1}, and let {S \subset G} be a finite non-empty subset of a multiplicative group {G} such that {|A \cdot S| \leq (2-\epsilon) |S|} for some finite set {A} of cardinality {|A|} at least {|S|}, where {A \cdot S := \{ as: a \in A, s \in S \}} is the product set of {A} and {S}. Then there exists a finite subgroup {H} of {G} with cardinality {|H| \leq C(\epsilon) |S|}, such that {S} is covered by at most {C'(\epsilon)} right-cosets {H \cdot x} of {H}, where {c(\epsilon), C(\epsilon) > 0} depend only on {\epsilon}.

One can of course specialise here to the case {A=S}, and view this theorem as a classification of those sets {S} of doubling constant at most {2-\epsilon}.

In fact Hamidoune’s argument, which is completely elementary, gives the very nice explicit constants {C(\epsilon) := \frac{2}{\epsilon}} and {C'(\epsilon) := \frac{2}{\epsilon} - 1}, which are essentially optimal except for factors of {2} (as can be seen by considering an arithmetic progression in an additive group). This result was also independently established (in the {A=S} case) by Tom Sanders (unpublished) by a more Fourier-analytic method, in particular drawing on Sanders’ deep results on the Wiener algebra {A(G)} on arbitrary non-commutative groups {G}.

This type of result had previously been known when {2-\epsilon} was less than the golden ratio {\frac{1+\sqrt{5}}{2}}, as first observed by Freiman; see my previous blog post for more discussion.

Theorem 1 is not, strictly speaking, contained in Hamidoune’s paper, but can be extracted from his arguments, which share some similarity with the recent simple proof of the Ruzsa-Plünnecke inequality by Petridis (as discussed by Tim Gowers here), and this is what I would like to do below the fold. I also include (with permission) Sanders’ unpublished argument, which proceeds instead by Fourier-analytic methods. Read the rest of this entry »

In Notes 3, we saw that the number of additive patterns in a given set was (in principle, at least) controlled by the Gowers uniformity norms of functions associated to that set.

Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces {V} over a finite field {{\bf F}}.

In this case, the Gowers norms {U^{d+1}(V)} are closely tied to the space {\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})} of polynomials of degree at most {d}. Indeed, as noted in Exercise 20 of Notes 4, a function {f: V \rightarrow {\bf C}} of {L^\infty(V)} norm {1} has {U^{d+1}(V)} norm equal to {1} if and only if {f = e(\phi)} for some {\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}; thus polynomials solve the “{100\%} inverse problem” for the trivial inequality {\|f\|_{U^{d+1}(V)} \leq \|f\|_{L^\infty(V)}}. They are also a crucial component of the solution to the “{99\%} inverse problem” and “{1\%} inverse problem”. For the former, we will soon show:

Proposition 1 ({99\%} inverse theorem for {U^{d+1}(V)}) Let {f: V \rightarrow {\bf C}} be such that {\|f\|_{L^\infty(V)}} and {\|f\|_{U^{d+1}(V)} \geq 1-\epsilon} for some {\epsilon > 0}. Then there exists {\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})} such that {\| f - e(\phi)\|_{L^1(V)} = O_{d, {\bf F}}( \epsilon^c )}, where {c = c_d > 0} is a constant depending only on {d}.

Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:

Exercise 1 (Converse to {99\%} inverse theorem for {U^{d+1}(V)}) If {\|f\|_{L^\infty(V)} \leq 1} and {\|f-e(\phi)\|_{L^1(V)} \leq \epsilon} for some {\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}, then {\|F\|_{U^{d+1}(V)} \geq 1 - O_{d,{\bf F}}( \epsilon^c )}, where {c = c_d > 0} is a constant depending only on {d}.

In the {1\%} world, one no longer expects to be close to a polynomial. Instead, one expects to correlate with a polynomial. Indeed, one has

Lemma 2 (Converse to the {1\%} inverse theorem for {U^{d+1}(V)}) If {f: V \rightarrow {\bf C}} and {\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})} are such that {|\langle f, e(\phi) \rangle_{L^2(V)}| \geq \epsilon}, where {\langle f, g \rangle_{L^2(V)} := {\bf E}_{x \in G} f(x) \overline{g(x)}}, then {\|f\|_{U^{d+1}(V)} \geq \epsilon}.

Proof: From the definition of the {U^1} norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has

\displaystyle  |\langle f, e(\phi) \rangle| = \| f e(-\phi) \|_{U^1(V)}

\displaystyle  \leq \|f e(-\phi) \|_{U^{d+1}(V)}

\displaystyle  = \|f\|_{U^{d+1}(V)}

and the claim follows. \Box

In the high characteristic case {\hbox{char}({\bf F}) > d} at least, this can be reversed:

Theorem 3 ({1\%} inverse theorem for {U^{d+1}(V)}) Suppose that {\hbox{char}({\bf F}) > d \geq 0}. If {f: V \rightarrow {\bf C}} is such that {\|f\|_{L^\infty(V)} \leq 1} and {\|f\|_{U^{d+1}(V)} \geq \epsilon}, then there exists {\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})} such that {|\langle f, e(\phi) \rangle_{L^2(V)}| \gg_{\epsilon,d,{\bf F}} 1}.

This result is sometimes referred to as the inverse conjecture for the Gowers norm (in high, but bounded, characteristic). For small {d}, the claim is easy:

Exercise 2 Verify the cases {d=0,1} of this theorem. (Hint: to verify the {d=1} case, use the Fourier-analytic identities {\|f\|_{U^2(V)} = (\sum_{\xi \in \hat V} |\hat f(\xi)|^4)^{1/4}} and {\|f\|_{L^2(V)} = (\sum_{\xi \in \hat V} |\hat f(\xi)|^2)^{1/2}}, where {\hat V} is the space of all homomorphisms {\xi: x \mapsto \xi \cdot x} from {V} to {{\bf R}/{\bf Z}}, and {\hat f(\xi) := \mathop{\bf E}_{x \in V} f(x) e(-\xi \cdot x)} are the Fourier coefficients of {f}.)

This conjecture for larger values of {d} are more difficult to establish. The {d=2} case of the theorem was established by Ben Green and myself in the high characteristic case {\hbox{char}({\bf F}) > 2}; the low characteristic case {\hbox{char}({\bf F}) = d = 2} was independently and simultaneously established by Samorodnitsky. The cases {d>2} in the high characteristic case was established in two stages, firstly using a modification of the Furstenberg correspondence principle, due to Ziegler and myself. to convert the problem to an ergodic theory counterpart, and then using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.

The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials {\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})} in the above conjecture with the essentially equivalent space of classical polynomials {\hbox{Poly}_{\leq d}(V \rightarrow {\bf F})}. However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if {\hbox{char}({\bf F}) = 2} and {d \geq 3}; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.

The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:

Theorem 4 (Szemerédi’s theorem for finite fields) Let {{\bf F} = {\bf F}_p} be a finite field, let {\delta > 0}, and let {A \subset {\bf F}^n} be such that {|A| \geq \delta |{\bf F}^n|}. If {n} is sufficiently large depending on {p,\delta}, then {A} contains an (affine) line {\{ x, x+r, \ldots, x+(p-1)r\}} for some {x,r \in {\bf F}^n} with { r\neq 0}.

Exercise 3 Use Theorem 4 to establish the following generalisation: with the notation as above, if {k \geq 1} and {n} is sufficiently large depending on {p,\delta}, then {A} contains an affine {k}-dimensional subspace.

We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.

Read the rest of this entry »

A handy inequality in additive combinatorics is the Plünnecke-Ruzsa inequality:

Theorem 1 (Plünnecke-Ruzsa inequality) Let {A, B_1, \ldots, B_m} be finite non-empty subsets of an additive group {G}, such that {|A+B_i| \leq K_i |A|} for all {1 \leq i \leq m} and some scalars {K_1,\ldots,K_m \geq 1}. Then there exists a subset {A'} of {A} such that {|A' + B_1 + \ldots + B_m| \leq K_1 \ldots K_m |A'|}.

The proof uses graph-theoretic techniques. Setting {A=B_1=\ldots=B_m}, we obtain a useful corollary: if {A} has small doubling in the sense that {|A+A| \leq K|A|}, then we have {|mA| \leq K^m |A|} for all {m \geq 1}, where {mA = A + \ldots + A} is the sum of {m} copies of {A}.

In a recent paper, I adapted a number of sum set estimates to the entropy setting, in which finite sets such as {A} in {G} are replaced with discrete random variables {X} taking values in {G}, and (the logarithm of) cardinality {|A|} of a set {A} is replaced by Shannon entropy {{\Bbb H}(X)} of a random variable {X}. (Throughout this note I assume all entropies to be finite.) However, at the time, I was unable to find an entropy analogue of the Plünnecke-Ruzsa inequality, because I did not know how to adapt the graph theory argument to the entropy setting.

I recently discovered, however, that buried in a classic paper of Kaimonovich and Vershik (implicitly in Proposition 1.3, to be precise) there was the following analogue of Theorem 1:

Theorem 2 (Entropy Plünnecke-Ruzsa inequality) Let {X, Y_1, \ldots, Y_m} be independent random variables of finite entropy taking values in an additive group {G}, such that {{\Bbb H}(X+Y_i) \leq {\Bbb H}(X) + \log K_i} for all {1 \leq i \leq m} and some scalars {K_1,\ldots,K_m \geq 1}. Then {{\Bbb H}(X+Y_1+\ldots+Y_m) \leq {\Bbb H}(X) + \log K_1 \ldots K_m}.

In fact Theorem 2 is a bit “better” than Theorem 1 in the sense that Theorem 1 needed to refine the original set {A} to a subset {A'}, but no such refinement is needed in Theorem 2. One corollary of Theorem 2 is that if {{\Bbb H}(X_1+X_2) \leq {\Bbb H}(X) + \log K}, then {{\Bbb H}(X_1+\ldots+X_m) \leq {\Bbb H}(X) + (m-1) \log K} for all {m \geq 1}, where {X_1,\ldots,X_m} are independent copies of {X}; this improves slightly over the analogous combinatorial inequality. Indeed, the function {m \mapsto {\Bbb H}(X_1+\ldots+X_m)} is concave (this can be seen by using the {m=2} version of Theorem 2 (or (2) below) to show that the quantity {{\Bbb H}(X_1+\ldots+X_{m+1})-{\Bbb H}(X_1+\ldots+X_m)} is decreasing in {m}).

Theorem 2 is actually a quick consequence of the submodularity inequality

\displaystyle  {\Bbb H}(W) + {\Bbb H}(X) \leq {\Bbb H}(Y) + {\Bbb H}(Z) \ \ \ \ \ (1)

in information theory, which is valid whenever {X,Y,Z,W} are discrete random variables such that {Y} and {Z} each determine {X} (i.e. {X} is a function of {Y}, and also a function of {Z}), and {Y} and {Z} jointly determine {W} (i.e {W} is a function of {Y} and {Z}). To apply this, let {X, Y, Z} be independent discrete random variables taking values in {G}. Observe that the pairs {(X,Y+Z)} and {(X+Y,Z)} each determine {X+Y+Z}, and jointly determine {(X,Y,Z)}. Applying (1) we conclude that

\displaystyle  {\Bbb H}(X,Y,Z) + {\Bbb H}(X+Y+Z) \leq {\Bbb H}(X,Y+Z) + {\Bbb H}(X+Y,Z)

which after using the independence of {X,Y,Z} simplifies to the sumset submodularity inequality

\displaystyle  {\Bbb H}(X+Y+Z) + {\Bbb H}(Y) \leq {\Bbb H}(X+Y) + {\Bbb H}(Y+Z) \ \ \ \ \ (2)

(this inequality was also recently observed by Madiman; it is the {m=2} case of Theorem 2). As a corollary of this inequality, we see that if {{\Bbb H}(X+Y_i) \leq {\Bbb H}(X) + \log K_i}, then

\displaystyle  {\Bbb H}(X+Y_1+\ldots+Y_i) \leq {\Bbb H}(X+Y_1+\ldots+Y_{i-1}) + \log K_i,

and Theorem 2 follows by telescoping series.

The proof of Theorem 2 seems to be genuinely different from the graph-theoretic proof of Theorem 1. It would be interesting to see if the above argument can be somehow adapted to give a stronger version of Theorem 1. Note also that both Theorem 1 and Theorem 2 have extensions to more general combinations of {X,Y_1,\ldots,Y_m} than {X+Y_i}; see this paper and this paper respectively.

Read the rest of this entry »

Archives

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.
Follow

Get every new post delivered to your Inbox.

Join 4,047 other followers