This fall (starting Monday, September 26), I will be teaching a graduate topics course which I have entitled “Hilbert’s fifth problem and related topics.” The course is going to focus on three related topics:

  • Hilbert’s fifth problem on the topological description of Lie groups, as well as the closely related (local) classification of locally compact groups (the Gleason-Yamabe theorem).
  • Approximate groups in nonabelian groups, and their classification via the Gleason-Yamabe theorem (this is very recent work of Emmanuel Breuillard, Ben Green, Tom Sanders, and myself, building upon earlier work of Hrushovski);
  • Gromov’s theorem on groups of polynomial growth, as proven via the classification of approximate groups (as well as some consequences to fundamental groups of Riemannian manifolds).

I have already blogged about these topics repeatedly in the past (particularly with regard to Hilbert’s fifth problem), and I intend to recycle some of that material in the lecture notes for this course.

The above three families of results exemplify two broad principles (part of what I like to call “the dichotomy between structure and randomness”):

  • (Rigidity) If a group-like object exhibits a weak amount of regularity, then it (or a large portion thereof) often automatically exhibits a strong amount of regularity as well;
  • (Structure) This strong regularity manifests itself either as Lie type structure (in continuous settings) or nilpotent type structure (in discrete settings). (In some cases, “nilpotent” should be replaced by sister properties such as “abelian”, “solvable”, or “polycyclic”.)

Let me illustrate what I mean by these two principles with two simple examples, one in the continuous setting and one in the discrete setting. We begin with a continuous example. Given an {n \times n} complex matrix {A \in M_n({\bf C})}, define the matrix exponential {\exp(A)} of {A} by the formula

\displaystyle  \exp(A) := \sum_{k=0}^\infty \frac{A^k}{k!} = 1 + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \ldots

which can easily be verified to be an absolutely convergent series.

Exercise 1 Show that the map {A \mapsto \exp(A)} is a real analytic (and even complex analytic) map from {M_n({\bf C})} to {M_n({\bf C})}, and obeys the restricted homomorphism property

\displaystyle  \exp(sA) \exp(tA) = \exp((s+t)A) \ \ \ \ \ (1)

for all {A \in M_n({\bf C})} and {s,t \in {\bf C}}.
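As a quick numerical sanity check of (1) (not a proof, and not a substitute for the exercise), here is a minimal sketch that sums the first few terms of the series defining {\exp(A)} and compares both sides of (1). The helper names (`mat_mul`, `mat_scale`, `mat_exp`, `max_entry_diff`), the truncation length, and the particular test matrix are illustrative choices only.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_scale(c, a):
    """Multiply every entry of the matrix a by the scalar c."""
    return [[c * x for x in row] for row in a]


def mat_exp(a, terms=60):
    """Approximate exp(a) by the partial sum of a^k / k! over k < terms."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term a^k / k!, starting at k = 0
    for k in range(1, terms):
        term = mat_scale(1.0 / k, mat_mul(term, a))
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


def max_entry_diff(a, b):
    """Largest absolute difference between corresponding entries."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# An arbitrary (hypothetical) complex test matrix and complex scalars s, t.
A = [[0.0, 1.0], [-2.0, 0.3j]]
s, t = 0.7, -1.2 + 0.5j
lhs = mat_mul(mat_exp(mat_scale(s, A)), mat_exp(mat_scale(t, A)))
rhs = mat_exp(mat_scale(s + t, A))
homomorphism_error = max_entry_diff(lhs, rhs)
print(homomorphism_error)  # should be at the level of rounding error
```

The truncated series is only reasonable here because the matrices involved have small norm; for large matrices one would want a more careful method.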

Proposition 1 (Rigidity and structure of matrix homomorphisms) Let {n} be a natural number. Let {GL_n({\bf C})} be the group of invertible {n \times n} complex matrices. Let {\Phi: {\bf R} \rightarrow GL_n({\bf C})} be a map obeying two properties:

  • (Group-like object) {\Phi} is a homomorphism, thus {\Phi(s) \Phi(t) = \Phi(s+t)} for all {s,t \in {\bf R}}.
  • (Weak regularity) The map {t \mapsto \Phi(t)} is continuous.

Then:

  • (Strong regularity) The map {t \mapsto \Phi(t)} is smooth (i.e. infinitely differentiable). In fact it is even real analytic.
  • (Lie-type structure) There exists a (unique) complex {n \times n} matrix {A} such that {\Phi(t) = \exp(tA)} for all {t \in {\bf R}}.

Proof: Let {\Phi} be as above. Let {\epsilon > 0} be a small number (depending only on {n}). By the homomorphism property, {\Phi(0) = 1} (where we use {1} here to denote the identity element of {GL_n({\bf C})}), and so by continuity we may find a small {t_0>0} such that {\Phi(t) = 1 + O(\epsilon)} for all {t \in [-t_0,t_0]} (we use some arbitrary norm here on the space of {n \times n} matrices, and allow implied constants in the {O()} notation to depend on {n}).

The map {A \mapsto \exp(A)} is real analytic, and its derivative at {0} is the identity, so by the inverse function theorem it is a diffeomorphism near {0}. Thus we can (if {\epsilon} is small enough) find a matrix {B} of size {B = O(\epsilon)} such that {\Phi(t_0) = \exp(B)}. By the homomorphism property and (1), we thus have

\displaystyle  \Phi(t_0/2)^2 = \Phi(t_0) = \exp(B) = \exp(B/2)^2.

On the other hand, by another application of the inverse function theorem we see that the squaring map {A \mapsto A^2} is a diffeomorphism near {1} in {GL_n({\bf C})}, and thus (if {\epsilon} is small enough)

\displaystyle  \Phi(t_0/2) = \exp(B/2).

We may iterate this argument (for a fixed, but small, value of {\epsilon}) and conclude that

\displaystyle  \Phi(t_0/2^k) = \exp(B/2^k)

for all {k = 0,1,2,\ldots}. By the homomorphism property and (1) we thus have

\displaystyle  \Phi(qt_0) = \exp(qB)

whenever {q} is a dyadic rational, i.e. a rational of the form {a/2^k} for some integer {a} and natural number {k}. By continuity we thus have

\displaystyle  \Phi(st_0) = \exp(sB)

for all real {s}. Setting {A := B/t_0} we conclude that

\displaystyle  \Phi(t) = \exp(tA)

for all real {t}, which gives existence of the representation and also real analyticity and smoothness. Finally, uniqueness of the representation {\Phi(t) = \exp(tA)} follows from the identity

\displaystyle  A = \frac{d}{dt} \exp(tA)|_{t=0}.
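To see the last identity in action, here is a small sketch for one specific continuous homomorphism, namely the rotation map {\Phi(t)} in {GL_2({\bf C})} (my choice of example, not taken from the proof). The difference quotients {(\Phi(h) - 1)/h} along the dyadic scales {h = t_0/2^k} should approach the generator {A}, which in this example is the matrix with rows {(0,-1)} and {(1,0)}. The names `phi` and `generator_estimate` are hypothetical.

```python
import math


def phi(t):
    """A continuous homomorphism R -> GL_2(C): rotation by the angle t."""
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]


def generator_estimate(hom, h):
    """Difference quotient (hom(h) - 1)/h, approximating d/dt hom(t) at t = 0."""
    p = hom(h)
    return [[(p[i][j] - (1.0 if i == j else 0.0)) / h for j in range(2)]
            for i in range(2)]


# Shrink the step along dyadic scales t0 / 2^k, as in the proof above.
t0 = 1.0
scales = list(range(0, 21, 5))
estimates = [generator_estimate(phi, t0 / 2 ** k) for k in scales]
for k, estimate in zip(scales, estimates):
    print(k, estimate)
```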


Exercise 2 Generalise Proposition 1 by replacing the hypothesis that {\Phi} is continuous with the hypothesis that {\Phi} is Lebesgue measurable (Hint: use the Steinhaus theorem.). Show that the proposition fails (assuming the axiom of choice) if this hypothesis is omitted entirely.

Note how one needs both the group-like structure and the weak regularity in combination in order to ensure the strong regularity; neither is sufficient on its own. We will see variants of the above basic argument throughout the course. Here, the task of obtaining smooth (or real analytic) structure was relatively easy, because we could borrow the smooth (or real analytic) structure of the domain {{\bf R}} and range {M_n({\bf C})}; but, somewhat remarkably, we shall see that one can still build such smooth or analytic structures even when none of the original objects have any such structure to begin with.

Now we turn to a second illustration of the above principles, namely Jordan’s theorem, which uses a discreteness hypothesis to upgrade Lie type structure to nilpotent (and in this case, abelian) structure. We shall formulate Jordan’s theorem in a slightly stilted fashion in order to emphasise the adherence to the above-mentioned principles.

Theorem 2 (Jordan’s theorem) Let {G} be an object with the following properties:

  • (Group-like object) {G} is a group.
  • (Discreteness) {G} is finite.
  • (Lie-type structure) {G} is contained in {U_n({\bf C})} (the group of unitary {n \times n} matrices) for some {n}.

Then there is a subgroup {G'} of {G} such that

  • ({G'} is close to {G}) The index {|G/G'|} of {G'} in {G} is {O_n(1)} (i.e. bounded by {C_n} for some quantity {C_n} depending only on {n}).
  • (Nilpotent-type structure) {G'} is abelian.

A key observation in the proof of Jordan’s theorem is that if two unitary elements {g, h \in U_n({\bf C})} are close to the identity, then their commutator {[g,h] = g^{-1}h^{-1}gh} is even closer to the identity (in, say, the operator norm {\| \|_{op}}). Indeed, since multiplication on the left or right by unitary elements does not affect the operator norm, we have

\displaystyle  \| [g,h] - 1 \|_{op} = \| gh - hg \|_{op}

\displaystyle  = \| (g-1)(h-1) - (h-1)(g-1) \|_{op}

and so by the triangle inequality

\displaystyle  \| [g,h] - 1 \|_{op} \leq 2 \|g-1\|_{op} \|h-1\|_{op}. \ \ \ \ \ (2)
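The estimate (2) is easy to test numerically. The sketch below does so in the special case {n=2}, sampling random elements of {SU_2({\bf C})} near the identity and recording the largest observed ratio between the two sides of (2). The sampling scheme and all helper names are my own illustrative choices; the operator norm is computed from the explicit formula for the top eigenvalue of a {2 \times 2} Hermitian matrix.

```python
import cmath
import math
import random


def mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def adjoint(a):
    """Conjugate transpose; this is the inverse when a is unitary."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def minus_identity(a):
    """The matrix a - 1."""
    return [[a[i][j] - (1 if i == j else 0) for j in range(2)] for i in range(2)]


def op_norm(m):
    """Operator norm of a 2x2 matrix: square root of the top eigenvalue of m* m."""
    h = mul(adjoint(m), m)
    p, r, q = h[0][0].real, h[1][1].real, h[0][1]
    top = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + abs(q) ** 2)
    return math.sqrt(max(top, 0.0))


def random_unitary_near_identity(rng):
    """A random element [[a, -conj(b)], [b, conj(a)]] of SU(2) close to 1."""
    b = cmath.rect(rng.uniform(0.05, 0.3), rng.uniform(0, 2 * math.pi))
    a = cmath.rect(math.sqrt(1 - abs(b) ** 2), rng.uniform(-0.3, 0.3))
    return [[a, -b.conjugate()], [b, a.conjugate()]]


rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    g, h = random_unitary_near_identity(rng), random_unitary_near_identity(rng)
    commutator = mul(mul(adjoint(g), adjoint(h)), mul(g, h))  # g^-1 h^-1 g h
    lhs = op_norm(minus_identity(commutator))
    rhs = 2 * op_norm(minus_identity(g)) * op_norm(minus_identity(h))
    worst_ratio = max(worst_ratio, lhs / rhs)
print(worst_ratio)  # (2) predicts a value of at most 1
```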

Now we can prove Jordan’s theorem.

Proof: We induct on {n}, the case {n=1} being trivial. Suppose first that {G} contains a central element {g} which is not a multiple of the identity. Then, by definition, {G} is contained in the centraliser {Z(g)} of {g}, which by the spectral theorem is isomorphic to a product {U_{n_1}({\bf C}) \times \ldots \times U_{n_k}({\bf C})} of smaller unitary groups. Projecting {G} to each of these factor groups and applying the induction hypothesis, we obtain the claim.

Thus we may assume that {G} contains no central elements other than multiples of the identity. Now pick a small {\epsilon > 0} (one could take {\epsilon=\frac{1}{10n}} in fact) and consider the subgroup {G'} of {G} generated by those elements of {G} that are within {\epsilon} of the identity (in the operator norm). By considering a maximal {\epsilon}-net of {G} we see that {G'} has index at most {O_{n,\epsilon}(1)} in {G}. By arguing as before, we may assume that {G'} has no central elements other than multiples of the identity.

If {G'} consists only of multiples of the identity, then we are done. If not, take an element {g} of {G'} that is not a multiple of the identity, and which is as close as possible to the identity (here is where we crucially use that {G} is finite). By (2), we see that if {\epsilon} is sufficiently small depending on {n}, and if {h} is one of the generators of {G'}, then {[g,h]} lies in {G'} and is closer to the identity than {g}, and is thus a multiple of the identity. On the other hand, {[g,h]} has determinant {1}. Given that it is so close to the identity, it must therefore be the identity (if {\epsilon} is small enough). In other words, {g} is central in {G'}, and is thus a multiple of the identity. But this contradicts the hypothesis that there are no central elements other than multiples of the identity, and we are done. \Box

Commutator estimates such as (2) will play a fundamental role in many of the arguments we will see in this course; as we saw above, such estimates combine very well with a discreteness hypothesis, but will also be very useful in the continuous setting.

Exercise 3 Generalise Jordan’s theorem to the case when {G} is a finite subgroup of {GL_n({\bf C})} rather than of {U_n({\bf C})}. (Hint: The elements of {G} are not necessarily unitary, and thus do not necessarily preserve the standard Hilbert inner product of {{\bf C}^n}. However, if one averages that inner product by the finite group {G}, one obtains a new inner product on {{\bf C}^n} that is preserved by {G}, which allows one to conjugate {G} to a subgroup of {U_n({\bf C})}. This averaging trick is (a small) part of Weyl’s unitary trick in representation theory.)
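The averaging step in the hint can be illustrated on a tiny example. The sketch below takes a cyclic group of order {4} inside {GL_2({\bf R})} (a non-orthogonal conjugate of the rotations by multiples of {90} degrees, chosen by me for illustration), averages the standard inner product over the group, and checks in exact rational arithmetic that the averaged form is preserved by every group element. Since the matrices are real, the adjoint is just the transpose. The remaining steps of the exercise (conjugating into the unitary group and invoking Jordan's theorem) are not carried out here.

```python
from fractions import Fraction


def mat(rows):
    """Build a hashable exact matrix (tuple of tuples of Fractions)."""
    return tuple(tuple(Fraction(x) for x in row) for row in rows)


def mul(a, b):
    n = len(a)
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(n))
                       for j in range(n)) for i in range(n))


def transpose(a):
    return tuple(zip(*a))


def cyclic_group(generator, limit=1000):
    """List the powers of a matrix that is assumed to have finite order."""
    identity = mat([[1, 0], [0, 1]])
    elements, power = [identity], generator
    while power != identity:
        if len(elements) > limit:
            raise ValueError("generator does not seem to have finite order")
        elements.append(power)
        power = mul(power, generator)
    return elements


def averaged_form(group):
    """Gram matrix M = (1/|G|) * sum of h^T h, i.e. the averaged inner product."""
    n = len(group[0])
    total = [[Fraction(0)] * n for _ in range(n)]
    for h in group:
        hth = mul(transpose(h), h)
        for i in range(n):
            for j in range(n):
                total[i][j] += hth[i][j]
    return tuple(tuple(x / len(group) for x in row) for row in total)


g = mat([[1, -2], [1, -1]])  # order 4, but not orthogonal
G = cyclic_group(g)
M = averaged_form(G)
# G preserves the averaged form exactly when h^T M h = M for every h in G.
invariant = all(mul(mul(transpose(h), M), h) == M for h in G)
print(len(G), M, invariant)
```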

Exercise 4 (Inability to discretise nonabelian Lie groups) Show that if {n \geq 3}, then the orthogonal group {O_n({\bf R})} cannot contain arbitrarily dense finite subgroups, in the sense that there exists an {\epsilon = \epsilon_n > 0} depending only on {n} such that for every finite subgroup {G} of {O_n({\bf R})}, there exists a ball of radius {\epsilon} in {O_n({\bf R})} (with, say, the operator norm metric) that is disjoint from {G}. What happens in the {n=2} case?

Remark 1 More precise classifications of the finite subgroups of {U_n({\bf C})} are known, particularly in low dimensions. For instance, one can show that the only finite subgroups of {SO_3({\bf R})} (which {SU_2({\bf C})} is a double cover of) are isomorphic to either a cyclic group, a dihedral group, or the symmetry group of one of the Platonic solids.

A few days ago, I received the sad news that Yahya Ould Hamidoune had recently died. Hamidoune worked in additive combinatorics, and had recently solved a question on noncommutative Freiman-Kneser theorems posed by myself on this blog last year. Namely, Hamidoune showed

Theorem 1 (Noncommutative Freiman-Kneser theorem for small doubling) Let {0 < \epsilon \leq 1}, and let {S \subset G} be a finite non-empty subset of a multiplicative group {G} such that {|A \cdot S| \leq (2-\epsilon) |S|} for some finite set {A} of cardinality {|A|} at least {|S|}, where {A \cdot S := \{ as: a \in A, s \in S \}} is the product set of {A} and {S}. Then there exists a finite subgroup {H} of {G} with cardinality {|H| \leq C(\epsilon) |S|}, such that {S} is covered by at most {C'(\epsilon)} right-cosets {H \cdot x} of {H}, where {C(\epsilon), C'(\epsilon) > 0} depend only on {\epsilon}.

One can of course specialise here to the case {A=S}, and view this theorem as a classification of those sets {S} of doubling constant at most {2-\epsilon}.

In fact Hamidoune’s argument, which is completely elementary, gives the very nice explicit constants {C(\epsilon) := \frac{2}{\epsilon}} and {C'(\epsilon) := \frac{2}{\epsilon} - 1}, which are essentially optimal except for factors of {2} (as can be seen by considering an arithmetic progression in an additive group). This result was also independently established (in the {A=S} case) by Tom Sanders (unpublished) by a more Fourier-analytic method, in particular drawing on Sanders’ deep results on the Wiener algebra {A(G)} on arbitrary non-commutative groups {G}.

This type of result had previously been known when {2-\epsilon} was less than the golden ratio {\frac{1+\sqrt{5}}{2}}, as first observed by Freiman; see my previous blog post for more discussion.

Theorem 1 is not, strictly speaking, contained in Hamidoune’s paper, but can be extracted from his arguments, which share some similarity with the recent simple proof of the Ruzsa-Plünnecke inequality by Petridis (as discussed by Tim Gowers here), and this is what I would like to do below the fold. I also include (with permission) Sanders’ unpublished argument, which proceeds instead by Fourier-analytic methods.

Let {X} be a finite subset of a non-commutative group {G}. As mentioned previously on this blog (as well as in the current logic reading seminar), there is some interest in classifying those {X} which obey small doubling conditions such as {|X \cdot X| = O(|X|)} or {|X \cdot X^{-1}| = O(|X|)}. A full classification here has still not been established. However, I wanted to record here an elementary argument (based on Exercise 2.6.5 of my book with Van Vu, which in turn is based on this paper of Izabella Laba) that handles the case when {|X \cdot X|} is very close to {|X|}:

Proposition 1 If {|X^{-1} \cdot X| < \frac{3}{2} |X|}, then {X \cdot X^{-1}} and {X^{-1} \cdot X} are both finite groups, which are conjugate to each other. In particular, {X} is contained in the right-coset (or left-coset) of a group of order less than {\frac{3}{2} |X|}.

Remark 1 The constant {\frac{3}{2}} is completely sharp; consider the case when {X = \{e, x\}} where {e} is the identity and {x} is an element of order larger than {2}. This is a small example, but one can make it as large as one pleases by taking the direct product of {X} and {G} with any finite group. In the converse direction, we see that whenever {X} is contained in the right-coset {S \cdot x} (resp. left-coset {x \cdot S}) of a group of order less than {2|X|}, then {X \cdot X^{-1}} (resp. {X^{-1} \cdot X}) is necessarily equal to all of {S}, by the inclusion-exclusion principle (see the proof below for a related argument).

Proof: We begin by showing that {S := X \cdot X^{-1}} is a group. As {S} is symmetric and contains the identity, it suffices to show that this set is closed under multiplication.

Let {a, b \in S}. Then we can write {a=xy^{-1}} and {b=zw^{-1}} for {x,y,z,w \in X}. If {y} were equal to {z}, then {ab = xw^{-1} \in X \cdot X^{-1}} and we would be done. Of course, there is no reason why {y} should equal {z}; but we can use the hypothesis {|X^{-1} \cdot X| < \frac{3}{2}|X|} to boost this as follows. Observe that {x^{-1} \cdot X} and {y^{-1} \cdot X} both have cardinality {|X|} and lie inside {X^{-1} \cdot X}, which has cardinality strictly less than {\frac{3}{2} |X|}. By the inclusion-exclusion principle, this forces {x^{-1} \cdot X \cap y^{-1} \cdot X} to have cardinality greater than {\frac{1}{2}|X|}. In other words, there exist more than {\frac{1}{2}|X|} pairs {x',y' \in X} such that {x^{-1} x' = y^{-1} y'}, which implies that {a = x' (y')^{-1}}. Thus there are more than {\frac{1}{2}|X|} elements {y' \in X} such that {a = x' (y')^{-1}} for some {x'\in X} (since {x'} is uniquely determined by {y'}); similarly, there exist more than {\frac{1}{2}|X|} elements {z' \in X} such that {b = z' (w')^{-1}} for some {w' \in X}. Again by inclusion-exclusion, we can thus find {y'=z'} in {X} for which one has simultaneous representations {a = x' (y')^{-1}} and {b = y' (w')^{-1}}, and so {ab = x'(w')^{-1} \in X \cdot X^{-1}}, and the claim follows.

In the course of the above argument we showed that every element of the group {S} has more than {\frac{1}{2}|X|} representations of the form {xy^{-1}} for {x,y \in X}. But there are only {|X|^2} pairs {(x,y)} available, and thus {|S| < 2|X|}.

Now let {x} be any element of {X}. Since {X \cdot x^{-1} \subset S}, we have {X \subset S \cdot x}, and so {X^{-1} \cdot X \subset x^{-1} \cdot S \cdot x}. Conversely, every element of {x^{-1} \cdot S \cdot x} has exactly {|S|} representations of the form {z^{-1} w} where {z, w \in S \cdot x}. Since {X} occupies more than half of {S \cdot x}, we see from the inclusion-exclusion principle that there is at least one such representation {z^{-1} w} for which {z, w} both lie in {X}. In other words, {x^{-1} \cdot S \cdot x = X^{-1} \cdot X}, and the claim follows. \Box
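Proposition 1 can be checked by brute force on a small example. In the sketch below (my own choice of example), the ambient group is the symmetric group on four points, {H} is the stabiliser of one point (a non-normal copy of the symmetric group on three points), and {X} consists of five of the six elements of a right coset of {H}. Permutations are stored as tuples of images, and all helper names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Product p*q of permutations stored as tuples (apply q first, then p)."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def product_set(a, b):
    return {compose(u, v) for u in a for v in b}


def is_group(s):
    """A finite non-empty set closed under multiplication is a subgroup."""
    return bool(s) and all(compose(u, v) in s for u in s for v in s)


# H = stabiliser of the point 3 in S_4; it is not normal in S_4.
H = [p for p in permutations(range(4)) if p[3] == 3]
x = (3, 1, 2, 0)                           # the transposition swapping 0 and 3
coset = sorted(compose(h, x) for h in H)   # the right coset H.x
X = coset[:5]                              # drop one element, so |X| = 5 > |H|/2

X_inv = [inverse(p) for p in X]
left = product_set(X, X_inv)    # X . X^{-1}
right = product_set(X_inv, X)   # X^{-1} . X

print(2 * len(right) < 3 * len(X))         # hypothesis |X^{-1} X| < (3/2)|X|
print(is_group(left), is_group(right))
conjugate = {compose(compose(inverse(x), s), x) for s in left}
print(conjugate == right)                  # x^{-1} S x = X^{-1} X
```

Here {X \cdot X^{-1}} comes out as {H} itself, while {X^{-1} \cdot X} is the (different) stabiliser of the point {0}, so the two groups are conjugate without being equal.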

To relate this to the classical doubling constants {|X \cdot X|/|X|}, we first make an easy observation:

Lemma 2 If {|X \cdot X| < 2|X|}, then {X \cdot X^{-1} = X^{-1} \cdot X}.

Again, this is sharp; consider {X} equal to {\{x,y\}} where {x,y} generate a free group.

Proof: Suppose that {xy^{-1}} is an element of {X \cdot X^{-1}} for some {x,y \in X}. Then the sets {X \cdot x} and {X \cdot y} have cardinality {|X|} and lie in {X \cdot X}, so by the inclusion-exclusion principle, the two sets intersect. Thus there exist {z,w \in X} such that {zx=wy}, thus {xy^{-1}=z^{-1}w \in X^{-1} \cdot X}. This shows that {X \cdot X^{-1}} is contained in {X^{-1} \cdot X}. The converse inclusion is proven similarly. \Box

Proposition 3 If {|X \cdot X| < \frac{3}{2} |X|}, then {S := X \cdot X^{-1}} is a finite group of order {|X \cdot X|}, and {X \subset S \cdot x = x \cdot S} for some {x} in the normaliser of {S}.

The factor {\frac{3}{2}} is sharp, by the same example used to show sharpness of Proposition 1. However, there seems to be some room for further improvement if one weakens the conclusion a bit; see below the fold.

Proof: Let {S = X^{-1} \cdot X = X \cdot X^{-1}} (the two sets being equal by Lemma 2). By the argument used to prove Lemma 2, every element of {S} has more than {\frac{1}{2}|X|} representations of the form {xy^{-1}} for {x,y \in X}. By the argument used to prove Proposition 1, this shows that {S} is a group; also, since there are only {|X|^2} pairs {(x,y)}, we also see that {|S| < 2|X|}.

Pick any {x \in X}; then {x^{-1} \cdot X, X \cdot x^{-1} \subset S}, and so {X \subset x\cdot S, S \cdot x}. Because every element of {x \cdot S \cdot x} has {|S|} representations of the form {yz} with {y \in x \cdot S}, {z \in S \cdot x}, and {X} occupies more than half of {x \cdot S} and of {S \cdot x}, we conclude that each element of {x \cdot S \cdot x} lies in {X \cdot X}, and so {X \cdot X = x \cdot S \cdot x} and {|S| = |X \cdot X|}.

The intersection of the groups {S} and {x \cdot S \cdot x^{-1}} contains {X \cdot x^{-1}}, which is more than half the size of {S}, and so we must have {S = x \cdot S \cdot x^{-1}}, i.e. {x} normalises {S}, and the proposition follows. \Box

Because the arguments here are so elementary, they extend easily to the infinitary setting in which {X} is now an infinite set, but has finite measure with respect to some translation-invariant Keisler measure {\mu}. We omit the details. (I am hoping that this observation may help simplify some of the theory in that setting.)

One of my favorite open problems, which I have blogged about in the past, is that of establishing (or even correctly formulating) a non-commutative analogue of Freiman’s theorem. Roughly speaking, the question is this: given a finite set {X} in a non-commutative group {G} which is of small doubling in the sense that the product set {X \cdot X := \{ xy: x, y \in X \}} is not much larger than {X} (e.g. {|X \cdot X| \leq K|X|} for some {K = O(1)}), what does this say about the structure of {X}? (For various technical reasons one may wish to replace small doubling by, say, small tripling (i.e. {|X \cdot X \cdot X| = O( |X| )}), and one may also wish to assume that {X} contains the identity and is symmetric, {X^{-1} = X}, but these are relatively minor details.)

Sets of small doubling (or tripling), etc. can be thought of as “approximate groups”, since groups themselves have a doubling constant {K := |X \cdot X|/|X|} equal to one. Another obvious example of an approximate group is that of an arithmetic progression in an additive group, and more generally of a ball (in the word metric) in a nilpotent group of bounded rank and step. It is tentatively conjectured that in fact all examples can somehow be “generated” out of these basic examples, although it is not fully clear at present what “generated” should mean.
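To make the doubling constant concrete, here is a small sketch in the additive (abelian) setting, where the product set becomes the sumset {X+X}. It compares a subgroup of {{\bf Z}/12{\bf Z}}, an arithmetic progression in {{\bf Z}}, and a set of powers of two whose pairwise sums do not coincide. The three sets and the function names are illustrative assumptions, not examples taken from the literature.

```python
def sumset(a, b, modulus=None):
    """The set {x + y : x in a, y in b}, reduced mod `modulus` if one is given."""
    sums = {x + y for x in a for y in b}
    return sums if modulus is None else {s % modulus for s in sums}


def doubling_constant(x, modulus=None):
    """The ratio |X + X| / |X|."""
    return len(sumset(x, x, modulus)) / len(x)


subgroup = {0, 3, 6, 9}                  # a subgroup of Z/12
progression = set(range(10))             # an arithmetic progression in Z
powers_of_two = {2 ** i for i in range(10)}

print(doubling_constant(subgroup, modulus=12))
print(doubling_constant(progression))
print(doubling_constant(powers_of_two))
```

The three printed values are 1.0, 1.9 and 5.5 respectively: the subgroup has doubling constant exactly one, the progression has doubling constant just under two, and the set of powers of two has {|X|(|X|+1)/2} distinct sums, so its doubling constant grows linearly with {|X|}.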

A weaker conjecture along the same lines is that if {X} is a set of small doubling, then there should be some sort of “pseudo-metric” {\rho} on {G} which is left-invariant, and for which {X} is controlled (in some suitable sense) by the unit ball in this metric. (For instance, if {X} was a subgroup of {G}, one would take the metric which identified all the left cosets of {X} to a point, but was otherwise a discrete metric; if {X} were a ball in a nilpotent group, one would use some rescaled version of the word metric, and so forth.) Actually for technical reasons one would like to work with a slightly weaker notion than a pseudo-metric, namely a Bourgain system, but let us again ignore this technicality here.

Recently, using some powerful tools from model theory combined with the theory of topological groups, Ehud Hrushovski has apparently achieved some breakthroughs on this problem, obtaining new structural control on sets of small doubling in arbitrary groups that was not previously accessible to the known combinatorial methods. The precise results are technical to state, but here are informal versions of two typical theorems. The first applies to sets of small tripling in an arbitrary group:

Theorem 1 (Rough version of Hrushovski Theorem 1.1) Let {X} be a set of small tripling. Then one can find a long sequence of nested symmetric sets {X_1 \supset X_2 \supset X_3 \supset \ldots}, all of size comparable to {X} and contained in {(X^{-1} X)^2}, which are somewhat closed under multiplication in the sense that {X_i \cdot X_i \subset X_{i-1}} for all {i > 1}, and which are fairly well closed under commutation in the sense that {[X_i, X_j] \subset X_{i+j-1}}. (There are also some additional statements to the effect that the {X_n} efficiently cover each other, and also cover {X}, but I will omit those here.)

This nested sequence is somewhat analogous to a Bourgain system, though it is not quite the same notion.

If one assumes that {X} is “perfect” in a certain sense, which roughly means that there is no non-trivial abelian quotient, then one can do significantly better:

Theorem 2 (Rough version of Hrushovski Corollary 1.2) Let {X_0} be a set of small tripling, let {X := X_0^{-1} X_0}, and suppose that for almost all {l}-tuples {a_1, \ldots, a_l \in X} (where {l=O(1)}), the conjugacy classes {a_i^X := \{ x^{-1} a_i x: x \in X \}} generate most of {X} in the sense that {|a_1^X \cdot \ldots \cdot a_l^X| \gg |X|}. Then a large part of {X} is contained in a subgroup of size comparable to {X}.

Note that if one quotiented out by the commutator {[X,X]}, then all of the conjugacy classes {a_i^X} would collapse to points. So the hypothesis here is basically a strong quantitative assertion to the effect that the commutator {[X,X]} is extremely large, and rapidly fills out most of {X} itself.

Here at UCLA, a group of logicians and I (consisting of Matthias Aschenbrenner, Isaac Goldbring, Greg Hjorth, Henry Towsner, Anush Tserunyan, and possibly others) have just started a weekly reading seminar to come to grips with the various combinatorial, logical, and group-theoretic notions in Hrushovski’s paper, of which we only have a partial understanding at present. The seminar is a physical one, rather than an online one, but I am going to try to put some notes on the seminar on this blog as it progresses, as I know that there are a couple of other mathematicians who are interested in these developments.

So far there have been two meetings of the seminar. In the first, I surveyed the state of knowledge of the noncommutative Freiman theorem, covering broadly the material in my previous blog post. In the second meeting, Isaac reviewed some key notions of model theory used in Hrushovski’s paper, in particular the notions of definability and type, which I will review below. It is not yet clear how these are going to be connected with the combinatorial side of things, but this is something which we will hopefully develop in future seminars. The near-term objective is to understand the statement of the main theorem on the model-theoretic side (Theorem 3.4 of Hrushovski), and then understand some of its easier combinatorial consequences, before going back and trying to understand the proof of that theorem.

[Update, Oct 19: Given the level of interest in this paper, readers are encouraged to discuss any aspect of that paper in the comments below, even if they are not currently being covered by the UCLA seminar.]

It turns out to be a favourable week or two for me to finally finish a number of papers that had been at a nearly completed stage for a while.  I have just uploaded to the arXiv my article “Sumset and inverse sumset theorems for Shannon entropy”, submitted to Combinatorics, Probability, and Computing.  This paper evolved from a “deleted scene” in my book with Van Vu entitled “Entropy sumset estimates”.  In those notes, we developed analogues of the standard Plünnecke-Ruzsa sumset estimates (which relate quantities such as the cardinalities |A+B|, |A-B| of the sum and difference sets of two finite sets A, B in an additive group G to each other), to the entropy setting, in which the finite sets A \subset G are replaced instead with discrete random variables X taking values in that group G, and the (logarithm of the) cardinality |A| is replaced with the Shannon entropy

{\textbf H}(X) := \sum_{x \in G} {\Bbb P}(X = x) \log \frac{1}{{\Bbb P}(X = x)}.

This quantity measures the information content of X; for instance, if {\textbf H}(X) = k \log 2, then it will take k bits on the average to store the value of X (thus a string of n independent copies of X will require about nk bits of storage in the asymptotic limit n \to \infty).  The relationship between entropy and cardinality is that if X is the uniform distribution on a finite non-empty set A, then {\textbf H}(X) = \log |A|.  If instead X is non-uniformly distributed on A, one has 0 < {\textbf H}(X) < \log |A|, thanks to Jensen’s inequality.
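For concreteness, here is a minimal sketch of this definition, with a distribution represented as a dictionary from values to probabilities (a representation chosen purely for illustration). It compares the uniform distribution on an eight-element set with a skewed distribution on the same set.

```python
import math


def shannon_entropy(dist):
    """H(X) = sum of p * log(1/p) over the values taken with probability p > 0."""
    return sum(p * math.log(1.0 / p) for p in dist.values() if p > 0)


# Uniform distribution on an 8-element set A, and a skewed distribution on A.
uniform = {x: 1 / 8 for x in range(8)}
skewed = {x: (0.5 if x == 0 else 0.5 / 7) for x in range(8)}

entropy_uniform = shannon_entropy(uniform)
entropy_skewed = shannon_entropy(skewed)
bits_uniform = entropy_uniform / math.log(2)

print(entropy_uniform, math.log(8))   # these agree up to rounding
print(bits_uniform)                   # about 3 bits to store one sample
print(entropy_skewed)                 # strictly between 0 and log 8
```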

It turns out that many estimates on sumsets have entropy analogues, which resemble the “logarithm” of the sumset estimates.  For instance, the trivial bounds

|A|, |B| \leq |A+B| \leq |A| |B|

have the entropy analogue

{\textbf H}(X), {\textbf H}(Y) \leq {\textbf H}(X+Y) \leq {\textbf H}(X) + {\textbf H}(Y)

whenever X, Y are independent discrete random variables in an additive group; this is not difficult to deduce from standard entropy inequalities.  Slightly more non-trivially, the sum set estimate

|A+B| \leq \frac{|A-B|^3}{|A| |B|}

established by Ruzsa, has an entropy analogue

{\textbf H}(X+Y) \leq 3 {\textbf H}(X-Y) - {\textbf H}(X) - {\textbf H}(Y),

and similarly for a number of other standard sumset inequalities in the literature (e.g. the Ruzsa triangle inequality, the Plünnecke-Ruzsa inequality, and the Balog-Szemerédi-Gowers theorem, though the entropy analogue of the latter requires a little bit of care to state).  These inequalities can actually be deduced fairly easily from elementary arithmetic identities, together with standard entropy inequalities, most notably the submodularity inequality

{\textbf H}(Z) + {\textbf H}(W) \leq {\textbf H}(X) + {\textbf H}(Y)

whenever X,Y,Z,W are discrete random variables such that X and Y each determine W separately (thus W = f(X) = g(Y) for some deterministic functions f, g) and X and Y determine Z jointly (thus Z = h(X,Y) for some deterministic function h).  For instance, if X,Y,Z are independent discrete random variables in an additive group G, then (X-Y,Y-Z) and (X,Z) each determine X-Z separately, and determine X,Y,Z jointly, leading to the inequality

{\textbf H}(X,Y,Z) + {\textbf H}(X-Z) \leq {\textbf H}(X-Y,Y-Z) + {\textbf H}(X,Z)

which soon leads to the entropy Ruzsa triangle inequality

{\textbf H}(X-Z) \leq {\textbf H}(X-Y) + {\textbf H}(Y-Z) - {\textbf H}(Y)

which is an analogue of the combinatorial Ruzsa triangle inequality

|A-C| \leq \frac{|A-B| |B-C|}{|B|}.
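The entropy inequalities displayed above can be tested numerically on random examples. The sketch below draws independent integer-valued random variables X, Y, Z with small random supports, computes the distributions of their sums and differences by direct convolution, and records the smallest slack observed in each inequality. The sampling scheme and the helper names are assumptions made for illustration; a non-negative slack on random samples is of course only a sanity check, not a proof.

```python
import math
import random


def entropy(dist):
    """Shannon entropy of a distribution given as {value: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def combine(dx, dy, op):
    """Distribution of op(X, Y) when X ~ dx and Y ~ dy are independent."""
    out = {}
    for x, p in dx.items():
        for y, q in dy.items():
            v = op(x, y)
            out[v] = out.get(v, 0.0) + p * q
    return out


def random_dist(rng, support_size=5, value_range=12):
    """A random probability distribution on a random subset of the integers."""
    values = rng.sample(range(value_range), support_size)
    weights = [rng.random() + 1e-3 for _ in values]
    total = sum(weights)
    return {v: w / total for v, w in zip(values, weights)}


def add(x, y):
    return x + y


def sub(x, y):
    return x - y


rng = random.Random(0)
min_slack = {"lower": math.inf, "upper": math.inf,
             "sum_difference": math.inf, "triangle": math.inf}
for _ in range(200):
    dx, dy, dz = (random_dist(rng) for _ in range(3))
    hx, hy = entropy(dx), entropy(dy)
    h_sum = entropy(combine(dx, dy, add))
    h_xy = entropy(combine(dx, dy, sub))
    h_yz = entropy(combine(dy, dz, sub))
    h_xz = entropy(combine(dx, dz, sub))
    slacks = {
        "lower": h_sum - max(hx, hy),                  # H(X), H(Y) <= H(X+Y)
        "upper": hx + hy - h_sum,                      # H(X+Y) <= H(X) + H(Y)
        "sum_difference": 3 * h_xy - hx - hy - h_sum,  # H(X+Y) <= 3H(X-Y) - H(X) - H(Y)
        "triangle": h_xy + h_yz - hy - h_xz,           # H(X-Z) <= H(X-Y) + H(Y-Z) - H(Y)
    }
    for name, value in slacks.items():
        min_slack[name] = min(min_slack[name], value)
print(min_slack)
```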

All of this was already in the unpublished notes with Van, though I include it in this paper in order to place it in the literature.  The main novelty of the paper, though, is to consider the entropy analogue of Freiman’s theorem, which classifies those sets A for which |A+A| = O(|A|).  Here, the analogous problem is to classify the random variables X such that {\textbf H}(X_1+X_2) = {\textbf H}(X) + O(1), where X_1,X_2 are independent copies of X.  Let us say that X has small doubling if this is the case.

For instance, the uniform distribution U on a finite subgroup H of G has small doubling (in fact {\textbf H}(U_1+U_2)={\textbf H}(U) = \log |H| in this case). In a similar spirit, the uniform distribution on a (generalised) arithmetic progression P also has small doubling, as does the uniform distribution on a coset progression H+P.  Also, if X has small doubling, and Y has bounded entropy, then X+Y also has small doubling, even if Y and X are not independent.  The main theorem is that these are the only cases:

Theorem 1. (Informal statement) X has small doubling if and only if X = U + Y for some uniform distribution U on a coset progression (of bounded rank), and Y has bounded entropy.

For instance, suppose that X was the uniform distribution on a dense subset A of a finite group G.  Then Theorem 1 asserts that X is close in a “transport metric” sense to the uniform distribution U on G, in the sense that it is possible to rearrange or transport the probability distribution of X to the probability distribution of U (or vice versa) by shifting each component of the mass of X by an amount Y which has bounded entropy (which basically means that it primarily ranges inside a set of bounded cardinality).  The way one shows this is by randomly translating the mass of X around by a few random shifts to approximately uniformise the distribution, and then dealing with the residual fluctuation in the distribution by hand.  Theorem 1 as a whole is established by using the Freiman theorem in the combinatorial setting combined with various elementary convexity and entropy inequality arguments to reduce matters to the above model case when X is supported inside a finite group G and has near-maximal entropy.

I also show a variant of the above statement: if X, Y are independent and {\textbf H}(X+Y) = {\textbf H}(X)+O(1) = {\textbf H}(Y)+O(1), then we have X \equiv Y+Z (i.e. X has the same distribution as Y+Z) for some Z of bounded entropy (not necessarily independent of X or Y).  Thus if two random variables are additively related to each other, then they can be additively transported to each other by using a bounded amount of entropy.

In the last part of the paper I relate these discrete entropies to their continuous counterparts

{\textbf H}_{\Bbb R}(X) := \int_{{\Bbb R}} p(x) \log \frac{1}{p(x)}\ dx,

where X is now a continuous random variable on the real line with density function p(x)\ dx.  There are a number of sum set inequalities known in this setting, for instance

{\textbf H}_{\Bbb R}(X_1 + X_2) \geq {\textbf H}_{\Bbb R}(X) + \frac{1}{2} \log 2,

for independent copies X_1,X_2 of a finite entropy random variable X, with equality if and only if X is a Gaussian.  Using this inequality and Theorem 1, I show a discrete version, namely that

{\textbf H}(X_1 + X_2) \geq {\textbf H}(X) + \frac{1}{2} \log 2 - \varepsilon,

whenever \varepsilon> 0 and X_1,X_2 are independent copies of a random variable in {\Bbb Z} (or any other torsion-free abelian group) whose entropy is sufficiently large depending on \varepsilon.  This is somewhat analogous to the classical sumset inequality

|A+A| \geq 2 |A| - 1

though notice that we have a gain of just \frac{1}{2} \log 2 rather than \log 2 here, the point being that there is a Gaussian counterexample in the entropy setting which does not have a combinatorial analogue (except perhaps in the high-dimensional limit).  The main idea is to use Theorem 1 to trap most of X inside a coset progression, at which point one can use Fourier-analytic additive combinatorial tools to show that the distribution X_1+X_2 is “smooth” in some non-trivial direction r, which can then be used to approximate the discrete distribution by a continuous one.
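As a quick numerical sanity check of the discrete inequality above (a sketch that is not taken from the paper), one can compute the entropy gain {\textbf H}(X_1+X_2) - {\textbf H}(X) for a few explicit integer-valued distributions. The helper names `entropy`, `sum_distribution`, `discretised_gaussian` and `entropy_gain` are illustrative choices, entropies are measured in nats, and the Gaussian is truncated at an assumed cutoff of eight standard deviations.

```python
import math
from collections import defaultdict

def entropy(p):
    """Shannon entropy (in nats) of a distribution given as {value: probability}."""
    return sum(q * math.log(1.0 / q) for q in p.values() if q > 0)

def sum_distribution(p):
    """Distribution of X_1 + X_2 for independent copies X_1, X_2 of X ~ p."""
    out = defaultdict(float)
    for a, pa in p.items():
        for b, pb in p.items():
            out[a + b] += pa * pb
    return dict(out)

def uniform_on_interval(n):
    """Uniform distribution on {0, ..., n-1}."""
    return {k: 1.0 / n for k in range(n)}

def discretised_gaussian(sigma, cutoff=8):
    """Gaussian weights on the integers, truncated at cutoff*sigma and renormalised."""
    r = int(cutoff * sigma)
    w = {k: math.exp(-k * k / (2.0 * sigma * sigma)) for k in range(-r, r + 1)}
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}

def entropy_gain(p):
    """H(X_1 + X_2) - H(X) for independent copies of X ~ p."""
    return entropy(sum_distribution(p)) - entropy(p)

if __name__ == "__main__":
    print("(1/2) log 2          :", 0.5 * math.log(2))
    print("discretised Gaussian :", entropy_gain(discretised_gaussian(20.0)))
    print("uniform on interval  :", entropy_gain(uniform_on_interval(200)))
    # A low-entropy variable: the gain is at most H(X), which is tiny here.
    print("sparse Bernoulli     :", entropy_gain({0: 0.99, 1: 0.01}))
```

The wide discretised Gaussian comes close to the extremal gain of \frac{1}{2} \log 2, the uniform distribution on a long interval gains noticeably more, and the sparse Bernoulli variable shows why the entropy of X has to be large for the inequality to hold.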

I also conjecture more generally that the entropy monotonicity inequalities established by Artstein, Barthe, Ball, and Naor in the continuous case also hold in the above sense in the discrete case, though my method of proof breaks down because I no longer can assume small doubling.

I’m continuing the stream of uploaded papers this week with my paper “Freiman’s theorem for solvable groups“, submitted to Contrib. Disc. Math.  This paper concerns the problem (discussed in this earlier blog post) of determining the correct analogue of Freiman’s theorem in a general non-abelian group G = (G,\cdot).  Specifically, if A \subset G is a finite set that obeys the doubling condition |A \cdot A| \leq K|A| for some bounded K, what does this tell us about A?  Heuristically, we expect A to behave like a finite subgroup of G (or perhaps a coset of such a subgroup).

When G is the integers (with the additive group operation), Freiman’s theorem then tells us that A is controlled by a generalised arithmetic progression P, where I say that one set A is controlled by another P if they have comparable size, and the former can be covered by a finite number of translates of the latter.  (One can view generalised arithmetic progressions as an approximate version of a subgroup, in which one only uses the generators of the progression for a finite amount of time before stopping, as opposed to groups which allow words of unbounded length in the generators.) For more general abelian groups, the Freiman theorem of Green and Ruzsa tells us that a set of bounded doubling is controlled by a generalised coset progression P+H, i.e. the sum of a generalised arithmetic progression P and a finite subgroup H of G.  (Of course, if G is torsion-free, the finite subgroup H must be trivial.)
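To make the notions of doubling and control concrete in the integer case, here is a small illustrative sketch (the function names `sumset`, `doubling` and `gap` are hypothetical helpers, not anything from the paper). It compares a rank two generalised arithmetic progression, a dense random subset of it, and a random set of integers with no additive structure.

```python
import random

def sumset(A, B):
    """The sumset A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}

def doubling(A):
    """The doubling constant |A + A| / |A|."""
    return len(sumset(A, A)) / len(A)

def gap(base, steps, lengths):
    """Generalised arithmetic progression
    {base + n_1*steps[0] + ... + n_r*steps[r-1] : 0 <= n_i < lengths[i]}."""
    P = {base}
    for v, N in zip(steps, lengths):
        P = {x + n * v for x in P for n in range(N)}
    return P

if __name__ == "__main__":
    rng = random.Random(0)
    P = gap(0, (1, 1000), (20, 20))          # rank 2, 400 elements
    A = set(rng.sample(sorted(P), 200))      # dense subset of P
    R = set(rng.sample(range(10**12), 400))  # random set, no structure
    print("GAP           :", doubling(P))
    print("dense subset  :", doubling(A))
    print("random set    :", doubling(R))
```

The progression and its dense subset have doubling bounded independently of their size, whereas the random set has doubling comparable to its cardinality; Freiman's theorem says that, up to constants, the first kind of example is the only way to achieve bounded doubling in the integers.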

In this paper we address the case when G is a solvable group of bounded derived length.  The main result is that if a subset of G has small doubling, then it is controlled by an object which I call a “coset nilprogression”, which is a certain technical generalisation of a coset progression, in which the generators do not quite commute, but have commutator expressible in terms of “higher order” generators.  This is essentially a sharp characterisation of such sets, except for the fact that one would like a more explicit description of these coset nilprogressions.   In the torsion-free case, a more explicit description (analogous to the Mal’cev basis description of nilpotent groups) has appeared in a very recent paper of Breuillard and Green; in the case of monomial groups (a class of groups that overlaps to a large extent with solvable groups), and assuming a polynomial growth condition rather than a doubling condition, a related result controlling A by balls in a suitable type of metric has appeared in very recent work of Sanders.  In the nilpotent case there is also a nice recent argument of Fisher, Peng, and Katz which shows that sets of small doubling remain of small doubling with respect to the Lie algebra operations of addition and Lie bracket, and thus are amenable to the abelian Freiman theorems.

The conclusion of my paper is easiest to state (and easiest to prove) in the model case of the lamplighter group G = {\Bbb Z} \ltimes {\Bbb F}_2^\omega, where {\Bbb F}_2^\omega = \lim_{n \to \infty} {\Bbb F}_2^n is the additive group of doubly infinite sequences in the finite field {\Bbb F}_2 with only finitely many non-zero entries, and {\Bbb Z} acts on this space by translations.  This is a solvable group of derived length two.  The main result here is

Theorem 1. (Freiman’s theorem for the lamplighter group) If A \subset {\Bbb Z} \ltimes {\Bbb F}_2^\omega has bounded doubling, then A is controlled either by a finite subspace of the “vertical” group \{0\} \times {\Bbb F}_2^\omega, or else by a set of the form \{ (n,\phi(n)): n \in P \}, where P \subset {\Bbb Z} is a generalised arithmetic progression, and \phi: P \to {\Bbb F}_2^{\omega} obeys the Freiman isomorphism property (n_1,\phi(n_1)) \cdot (n_2, \phi(n_2)) = (n_3,\phi(n_3)) \cdot (n_4,\phi(n_4)) whenever n_1,n_2,n_3,n_4 \in P and n_1+n_2=n_3+n_4.

This result, incidentally, recovers an earlier result of Lindenstrauss that the lamplighter group does not contain a Følner sequence of sets of uniformly bounded doubling.  It is a good exercise to establish the “exact” version of this theorem, in which one classifies subgroups of the lamplighter group rather than sets of small doubling; indeed, the proof of the above theorem follows fairly closely the natural proof of the exact version.
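The two cases of the lamplighter theorem can be explored by brute force. The sketch below is only an illustration under an assumed convention: an element is a pair (n, x) with x a finite set of lit lamp positions, and the group law is taken to be (n_1,x_1) \cdot (n_2,x_2) = (n_1+n_2, x_1 + T^{n_1} x_2), where T is the shift and addition in {\Bbb F}_2^\omega is symmetric difference. The names `mul`, `doubling`, `subsets` and `is_freiman` are hypothetical helpers.

```python
from itertools import combinations, product

def shift(x, n):
    """Translate a finite lamp configuration x (a frozenset of positions) by n."""
    return frozenset(i + n for i in x)

def mul(g, h):
    """Assumed group law: (n1, x1) * (n2, x2) = (n1 + n2, x1 + T^{n1} x2)."""
    (n1, x1), (n2, x2) = g, h
    return (n1 + n2, x1 ^ shift(x2, n1))  # ^ is symmetric difference

def doubling(A):
    """The doubling constant |A . A| / |A|."""
    return len({mul(a, b) for a in A for b in A}) / len(A)

def subsets(positions):
    """All subsets of a finite set of positions, i.e. a finite subspace of F_2^omega."""
    positions = list(positions)
    return [frozenset(c) for r in range(len(positions) + 1)
            for c in combinations(positions, r)]

def is_freiman(phi, P):
    """Check the Freiman isomorphism property of the graph n -> (n, phi(n)) on P."""
    P = list(P)
    for n1, n2, n3, n4 in product(P, repeat=4):
        if n1 + n2 == n3 + n4:
            if mul((n1, phi(n1)), (n2, phi(n2))) != mul((n3, phi(n3)), (n4, phi(n4))):
                return False
    return True

if __name__ == "__main__":
    vertical = [(0, x) for x in subsets(range(4))]           # finite "vertical" subspace
    staircase = lambda n: frozenset(range(n))                # phi(n) = lamps 0..n-1 lit
    graph = [(n, staircase(n)) for n in range(10)]           # {(n, phi(n)) : n in P}
    box = [(n, x) for n in range(6) for x in subsets(range(3))]
    print("vertical subspace :", doubling(vertical))
    print("graph of phi      :", doubling(graph), is_freiman(staircase, range(8)))
    print("box               :", doubling(box))
```

The vertical subspace and the graph of the “staircase” map \phi (which is just a piece of the cyclic subgroup generated by (1,\{0\})) both have small doubling, while the box that mixes a horizontal interval with a vertical subspace does not, consistent with the dichotomy in the theorem.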

One application of the solvable Freiman theorem is the following quantitative version of a classical result of Milnor and of Wolf, which asserts that any solvable group of polynomial growth is virtually nilpotent:

Theorem 2. (Quantitative Milnor-Wolf theorem) Let G be a solvable group of derived length O(1), let S be a set of generators for G, and suppose one has the polynomial growth condition |B_S(R)| \leq R^d for some d = O(1), where B_S(R) is the set of all words generated by S of length at most R.  If R is sufficiently large, then this implies that G is virtually nilpotent; more precisely, G contains a nilpotent subgroup of step O(1) and index O(R^{O(1)}).

The key points here are that one only needs polynomial growth at a single scale R, rather than on many scales, and that the index of the nilpotent subgroup has polynomial size.

The proofs are based on an induction on the derived length.  After some standard manipulations (basically, splitting A by an approximate version of a short exact sequence), the problem boils down to that of understanding the action \rho of some finite set A on a set E in an additive group.  If one assumes that E has small doubling and that the action of A leaves E approximately invariant, then one can show that E is a coset progression, and the action of A can be described efficiently using the generators of that progression (after refining the set A a bit).

In the course of the proof we need two new additive combinatorial results which may be of independent interest.  The first is a variant of a well-known theorem of Sárközy, which asserts that if A is a large subset of an arithmetic progression P, then an iterated sumset kA of A for some k=O(1) itself contains a long arithmetic progression. Here, we need the related fact that if A is a large subset of a coset progression, then an iterated sumset kA for k=O(1) contains a large coset progression Q, and furthermore this inclusion is “robust” in the sense that all the elements of Q have a large number of representations as sums of elements of A.  We also need a new (non-commutative) variant of the Balog-Szemerédi(-Gowers) lemma, which asserts that if A has small doubling, then A (or more precisely A \cdot A^{-1}) contains a large “core” subset D such that almost all of a large iterated product set kD of D still lies inside A \cdot A^{-1}.  (This may not look like the usual Balog-Szemerédi lemma, but the proof of the lemma is almost identical to the original proof of Balog and Szemerédi, in particular relying on the Szemerédi regularity lemma.)

For a number of reasons, including the start of the summer break for me and my coauthors, a number of papers that we have been working on are being released this week.  For instance, Ben Green and I have just uploaded to the arXiv our paper “An equivalence between inverse sumset theorems and inverse conjectures for the U^3 norm“, submitted to Math. Proc. Camb. Phil. Soc.  The main result of this short paper (which was briefly announced in this earlier post) is a connection between two types of inverse theorems in additive combinatorics, namely the inverse sumset theorems of Freiman type, and inverse theorems for the Gowers uniformity norm, and more specifically, for the U^3 norm

\|f\|_{U^3(G)}^8 := {\Bbb E}_{x,a,b,c \in G} f(x) \overline{f(x+a)} \overline{f(x+b)} \overline{f(x+c)} f(x+a+b) f(x+a+c) f(x+b+c) \overline{f(x+a+b+c)}

on a finite additive group G, where f: G \to {\Bbb C} is a complex-valued function.

As usual, the connection is easiest to state in a finite field model such as G = {\Bbb F}_2^n.  In this case, we have the following inverse sumset theorem of Ruzsa:

Theorem 1. If A \subset {\Bbb F}_2^n is such that |A+A| \leq K|A|, then A can be covered by a translate of a subspace of {\Bbb F}_2^n of cardinality at most K^2 2^{K^4} |A|.

The constant K^2 2^{K^4} has been improved for large K in a sequence of papers, from K 2^{\lfloor K^3 \rfloor-1} by Ruzsa, K^2 2^{K^2-2} by Green-Ruzsa, 2^{O(K^{3/2} \log(1+K))} by Sanders, 2^{2K+O(\sqrt{K} \log K)} by Green and myself, and finally 2^{2K+O(\log K)} by Konyagin (private communication) which is sharp except for the precise value of the O() implied constant (as can be seen by considering the example when A consists of about 2K independent elements).  However, it is conjectured that the exponential loss can be removed entirely if one modifies the conclusion slightly:

Conjecture 1. (Polynomial Freiman-Ruzsa conjecture for {\Bbb F}_2^n.) If A \subset {\Bbb F}_2^n is such that |A+A| \leq K|A|, then A can be covered by O(K^{O(1)}) translates of subspaces of {\Bbb F}_2^n of cardinality at most |A|.

This conjecture was verified for downsets by Green and myself, but is open in general.   This conjecture has a number of equivalent formulations; see this paper of Green for more discussion.  In this previous post we show that a stronger version of this conjecture fails.

Meanwhile, for the Gowers norm, we have the following inverse theorem, due to Samorodnitsky:

Theorem 2. Let f: {\Bbb F}_2^n \to [-1,1] be a function whose U^3 norm is at least 1/K.  Then there exists a quadratic polynomial Q: {\Bbb F}_2^n \to {\Bbb F}_2 such that |{\Bbb E}_{x \in {\Bbb F}_2^n} f(x) (-1)^{Q(x)}| \geq \exp( - O(K)^{O(1)} ).

Note that the quadratic phases (-1)^{Q(x)} are the only functions taking values in [-1,1] whose U^3 norm attains its maximal value of 1.
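For very small n the U^3 norm can be evaluated directly from its definition, which gives a way to see the extremal role of quadratic phases numerically. The following brute-force sketch makes some assumptions for illustration: points of {\Bbb F}_2^n are encoded as integers with addition given by XOR, f is real-valued (so the complex conjugations are dropped), and the particular polynomials in `quadratic_phase` and `cubic_phase` are arbitrary examples.

```python
def u3_norm_pow8(f, n):
    """||f||_{U^3}^8 for a real-valued f on F_2^n (points are ints, addition is XOR)."""
    N = 1 << n
    vals = [f(x) for x in range(N)]
    total = 0.0
    for x in range(N):
        for a in range(N):
            for b in range(N):
                for c in range(N):
                    total += (vals[x] * vals[x ^ a] * vals[x ^ b] * vals[x ^ c]
                              * vals[x ^ a ^ b] * vals[x ^ a ^ c]
                              * vals[x ^ b ^ c] * vals[x ^ a ^ b ^ c])
    return total / N ** 4

def bit(x, i):
    """The i-th coordinate of x, viewed as a vector in F_2^n."""
    return (x >> i) & 1

def quadratic_phase(x):
    """(-1)^{Q(x)} for the example quadratic Q(x) = x_0 x_1 + x_2 x_3 + x_0."""
    q = (bit(x, 0) * bit(x, 1) + bit(x, 2) * bit(x, 3) + bit(x, 0)) % 2
    return (-1) ** q

def cubic_phase(x):
    """(-1)^{C(x)} for the example cubic C(x) = x_0 x_1 x_2."""
    return (-1) ** (bit(x, 0) * bit(x, 1) * bit(x, 2))

if __name__ == "__main__":
    print("quadratic phase:", u3_norm_pow8(quadratic_phase, 4))
    print("cubic phase    :", u3_norm_pow8(cubic_phase, 4))
```

The quadratic phase attains the maximal value 1 because the third “derivative” of a quadratic polynomial vanishes identically, while the cubic phase falls strictly short of it. The quadruple loop costs 2^{4n} steps, so this is only feasible for toy values of n.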

It is conjectured that the exponentially weak correlation here can be strengthened to a polynomial one:

Conjecture 2. (Polynomial inverse conjecture for the U^3({\Bbb F}_2^n) norm). Let f: {\Bbb F}_2^n \to [-1,1] be a function whose U^3 norm is at least 1/K.  Then there exists a quadratic polynomial Q: {\Bbb F}_2^n \to {\Bbb F}_2 such that |{\Bbb E}_{x \in {\Bbb F}_2^n} f(x) (-1)^{Q(x)}| \geq K^{-O(1)}.

The first main result of this paper is

Theorem 3. Conjecture 1 and Conjecture 2 are equivalent.

This result was also independently observed by Shachar Lovett (private communication).  We also establish an analogous result for the cyclic group {\Bbb Z}/N{\Bbb Z}, in which the notion of polynomial is replaced by that of a subexponential \exp(K^{o(1)}), and in which the notion of a quadratic polynomial is replaced by a 2-step nilsequence; the precise statement is a bit technical and will not be given here.  We also observe a partial analogue of the correspondence between inverse sumset theorems and Gowers norms in the higher order case, in particular observing that U^4 inverse theorems imply a certain rigidity result for “Freiman-quadratic polynomials” (a quadratic version of Conjecture 3 below).

Below the fold, we sketch the proof of Theorem 3.


[This post is authored by Ben Green, who has kindly “guest blogged” this week’s “open problem of the week”. – T.]

In an earlier blog post Terry discussed Freiman’s theorem. The name of Freiman is attached to a growing body of theorems which take some rather “combinatorial” hypothesis, such as that the sumset A+A of some set A is small, and deduce from it rather “algebraic” information (such as that A is contained in a subspace or a grid).

The easiest place to talk about Freiman’s theorem is in the finite field model {\Bbb F}_2^n (see my survey article on this subject for a full discussion). Here it was shown by Ruzsa that if |A+A| is at most K |A| then A is contained in a subspace of size no more than about 2^{K^4}|A|. The exponent has been improved a few times since Ruzsa’s paper, the best result currently in print being due to Sanders, who obtains an upper bound of 2^{K^{3/2}\log K}|A|. Terry and I are in the process of writing a paper which obtains 2^{2K + o(K)}|A|, which is best possible in view of the example A := \{e_1,...,e_m\} where m := 2K + O(1); this set has doubling roughly K but is not contained in a subspace of dimension smaller than 2K.

This result has an air of finality (except for the true nature of the o(K) term, which represents an interesting open problem). This is something of an illusion, however. Even using this theorem, one loses an exponential every time one tries to transition between “combinatorial” structure and “algebraic” structure and back again. Indeed if one knows that A is contained in a subspace of size 2^{2K}|A| then the strongest assertion one can make about the doubling of A is that it is at most 2^{2K}.
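The extremal example A = \{e_1,...,e_m\} and the exponential loss just described are easy to reproduce by machine. In the sketch below (an illustration only), vectors in {\Bbb F}_2^n are encoded as integers with addition given by XOR, and `sumset_f2` and `span_f2` are hypothetical helper names.

```python
def sumset_f2(A, B):
    """Sumset in F_2^n, with vectors encoded as ints and addition as XOR."""
    return {a ^ b for a in A for b in B}

def span_f2(A):
    """The subspace of F_2^n spanned by A, built one generator at a time."""
    S = {0}
    for v in A:
        S |= {s ^ v for s in S}
    return S

if __name__ == "__main__":
    m = 12
    A = [1 << i for i in range(m)]           # the basis vectors e_1, ..., e_m
    K = len(sumset_f2(A, A)) / len(A)        # doubling constant, roughly m/2
    H = span_f2(A)                           # smallest subspace containing A
    print("m =", m, " doubling K =", K)
    print("|span(A)| / |A| =", len(H) / len(A))
    # Going back from the subspace to the doubling only gives the weak bound
    # |A + A| / |A| <= |H| / |A|, which is exponentially larger than K.
    print("bound on doubling recovered from H:", len(H) / len(A))
```

Here the doubling constant is about m/2, while the smallest subspace containing A has 2^m elements, so the bound on the doubling recovered from the containing subspace is exponentially worse than the true value.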

The Polynomial Freiman-Ruzsa conjecture (PFR), in {\Bbb F}_2^n, hypothesises a more precise structure theorem for sets with small doubling. Using this conjecture, one may flit back and forth between combinatorial and algebraic structure with only polynomial losses. Ruzsa attributes the conjecture to Marton: it states that if A has doubling at most K then A is contained in the union of K^{O(1)} translates of some subspace H of size at most |A|.


This is another one of my favourite open problems, falling under the heading of inverse theorems in arithmetic combinatorics. “Direct” theorems in arithmetic combinatorics take a finite set A in a group or ring and study things like the size of its sum set A+A := \{ a+b: a,b \in A \} or product set A \cdot A := \{ ab: a,b \in A \}. For example, a typical result in this area is the sum-product theorem, which asserts that whenever A \subset {\Bbb F}_p is a subset of a finite field of prime order with 1 \leq |A| \leq p^{1-\delta}, then

\max( |A+A|, |A \cdot A| ) \geq |A|^{1+\epsilon}

for some \epsilon = \epsilon(\delta) > 0. (This particular theorem was first proven here, with an earlier partial result here; more recent and elementary proofs with civilised bounds can be found here, here or here. It has a number of applications.)
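The sum-product phenomenon is easy to observe experimentally. The following sketch (illustrative only; the prime p = 1009, the set sizes, and the helper name `sum_and_product_sizes` are arbitrary choices) compares an arithmetic progression, which has a small sum set, with a geometric progression, which has a small product set.

```python
def sum_and_product_sizes(A, p):
    """Return (|A + A|, |A . A|) for a subset A of the field F_p."""
    sums = {(a + b) % p for a in A for b in A}
    prods = {(a * b) % p for a in A for b in A}
    return len(sums), len(prods)

if __name__ == "__main__":
    p = 1009                                   # a prime
    ap = list(range(1, 31))                    # arithmetic progression 1, ..., 30
    gp = [pow(2, i, p) for i in range(30)]     # geometric progression 2^0, ..., 2^29
    print("arithmetic progression (|A+A|, |A.A|):", sum_and_product_sizes(ap, p))
    print("geometric progression  (|A+A|, |A.A|):", sum_and_product_sizes(gp, p))
```

In each case exactly one of the two sets is as small as it can be for a progression of that length, and the other is considerably larger; the sum-product theorem asserts that this trade-off is unavoidable for every set A in the stated size range, not just for these examples.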

In contrast, inverse theorems in this subject start with a hypothesis that, say, the sum set A+A of an unknown set A is small, and try to deduce structural information about A. A typical goal is to completely classify all sets A for which A+A has comparable size with A. In the case of finite subsets of integers, this is Freiman’s theorem, which roughly speaking asserts that |A+A| = O(|A|) if and only if A is a dense subset of a generalised arithmetic progression P of rank O(1), where we say that A is a dense subset of B if A \subset B and |B| = O(|A|). (The “if and only if” has to be interpreted properly; in either the “if” or the “only if” direction, the implicit constants in the conclusion depend on the implicit constants in the hypothesis, but these dependencies are not inverses of each other.) In the case of finite subsets A of an arbitrary abelian group, we have the Freiman-Green-Ruzsa theorem, which asserts that |A+A| = O(|A|) if and only if A is a dense subset of a sum P+H of a finite subgroup H and a generalised arithmetic progression P of rank O(1).
