You are currently browsing the tag archive for the ‘sum set estimates’ tag.

I have just uploaded to the arXiv the paper “An inverse theorem for an inequality of Kneser“, submitted to a special issue of the Proceedings of the Steklov Institute of Mathematics in honour of Sergei Konyagin. It concerns an inequality of Kneser discussed previously in this blog, namely that

\displaystyle \mu(A+B) \geq \min(\mu(A)+\mu(B), 1) \ \ \ \ \ (1)

whenever {A,B} are compact non-empty subsets of a compact connected additive group {G} with probability Haar measure {\mu}.  (A later result of Kemperman extended this inequality to the nonabelian case.) This inequality is non-trivial in the regime

\displaystyle \mu(A), \mu(B), 1- \mu(A)-\mu(B) > 0. \ \ \ \ \ (2)

The connectedness of {G} is essential, otherwise one could form counterexamples involving proper subgroups of {G} of positive measure. In the blog post, I indicated how this inequality (together with a more “robust” strengthening of it) could be deduced from submodularity inequalities such as

\displaystyle \mu( (A_1 \cup A_2) + B) + \mu( (A_1 \cap A_2) + B) \leq \mu(A_1+B) + \mu(A_2+B) \ \ \ \ \ (3)

which in turn easily follows from the identity {(A_1 \cup A_2) + B = (A_1+B) \cup (A_2+B)} and the inclusion {(A_1 \cap A_2) + B \subset (A_1 +B) \cap (A_2+B)}, combined with the inclusion-exclusion formula.

In the non-trivial regime (2), equality can be attained in (1), for instance by taking {G} to be the unit circle {G = {\bf R}/{\bf Z}} and {A,B} to be arcs in that circle (obeying (2)). A bit more generally, if {G} is an arbitrary connected compact abelian group and {\xi: G \rightarrow {\bf R}/{\bf Z}} is a non-trivial character (i.e., a continuous homomorphism), then {\xi} must be surjective (as {{\bf R}/{\bf Z}} has no non-trivial connected subgroups), and one can take {A = \xi^{-1}(I)} and {B = \xi^{-1}(J)} for some arcs {I,J} in that circle (again choosing the measures of these arcs to obey (2)). The main result of this paper is an inverse theorem that asserts that this is the only way in which equality can occur in (1) (assuming (2)); furthermore, if (1) is close to being satisfied with equality and (2) holds, then {A,B} must be close (in measure) to an example of the above form {A = \xi^{-1}(I), B = \xi^{-1}(J)}. Actually, for technical reasons (and for the applications we have in mind), it is important to establish an inverse theorem not just for (1), but for the more robust version mentioned earlier (in which the sumset {A+B} is replaced by the partial sumset {A +_\varepsilon B} consisting of “popular” sums).

Roughly speaking, the idea is as follows. Let us informally call {(A,B)} a critical pair if (2) holds and the inequality (1) (or more precisely, a robust version of this inequality) is almost obeyed with equality. The notion of a critical pair obeys some useful closure properties. Firstly, it is symmetric in {A,B}, and invariant with respect to translation of either {A} or {B}. Furthermore, from the submodularity inequality (3), one can show that if {(A_1,B)} and {(A_2,B)} are critical pairs (with {\mu(A_1 \cap A_2)} and {1 - \mu(A_1 \cup A_2) - \mu(B)} positive), then {(A_1 \cap A_2,B)} and {(A_1 \cup A_2, B)} are also critical pairs. (Note that this is consistent with the claim that critical pairs only occur when {A,B} come from arcs of a circle.) Similarly, from associativity {(A+B)+C = A+(B+C)}, one can show that if {(A,B)} and {(A+B,C)} are critical pairs, then so are {(B,C)} and {(A,B+C)}.

One can combine these closure properties to obtain further ones. For instance, suppose {A,B} is such that {\mu(A+B) < 2 \mu(A)}, so that all translates {A+b}, {b \in B} intersect each other in a set of positive measure. Suppose also that {(A,C)} is a critical pair and {1-\mu(A+B) - \mu(C) > 0}. Then (cheating a little bit), one can show that {(A+B,C)} is also a critical pair, basically because {A+B} is the union of the {A+b}, {b \in B}, the {(A+b,C)} are all critical pairs, and the {A+b} all intersect each other. This argument doesn’t quite work as stated because one has to apply the closure property under union an uncountable number of times, but it turns out that if one works with the robust version of sumsets and uses a random sampling argument to approximate {A+B} by the union of finitely many of the {A+b}, then the argument can be made to work.

Using all of these closure properties, it turns out that one can start with an arbitrary critical pair {(A,B)} and end up with a small set {C} such that {(A,C)} and {(kC,C)} are also critical pairs for all {1 \leq k \leq 10^4} (say), where {kC} is the {k}-fold sumset of {C}. (Intuitively, if {A,B} are thought of as secretly coming from the pullback of arcs {I,J} by some character {\xi}, then {C} should be the pullback of a much shorter arc by the same character.) In particular, {C} exhibits linear growth, in that {\mu(kC) = k\mu(C)} for all {1 \leq k \leq 10^4}. One can now use standard technology from inverse sumset theory to show first that {C} has a very large Fourier coefficient (and thus is biased with respect to some character {\xi}), and secondly that {C} is in fact almost of the form {C = \xi^{-1}(K)} for some arc {K}, from which it is not difficult to conclude similar statements for {A} and {B} and thus finish the proof of the inverse theorem.

In order to make the above argument rigorous, one has to be more precise about what the modifier “almost” means in the definition of a critical pair. I chose to do this in the language of “cheap” nonstandard analysis (aka asymptotic analysis), as discussed in this previous blog post; one could also have used the full-strength version of nonstandard analysis, but this does not seem to convey any substantial advantages. (One can also work in a more traditional “non-asymptotic” framework, but this requires one to keep much more careful account of various small error terms and leads to a messier argument.)


[Update, Nov 15: Corrected the attribution of the inequality (1) to Kneser instead of Kemperman.  Thanks to John Griesmer for pointing out the error.]

A core foundation of the subject now known as arithmetic combinatorics (and particularly the subfield of additive combinatorics) are the elementary sum set estimates (sometimes known as “Ruzsa calculus”) that relate the cardinality of various sum sets

\displaystyle  A+B := \{ a+b: a \in A, b \in B \}

and difference sets

\displaystyle  A-B := \{ a-b: a \in A, b \in B \},

as well as iterated sumsets such as {3A=A+A+A}, {2A-2A=A+A-A-A}, and so forth. Here, {A, B} are finite non-empty subsets of some additive group {G = (G,+)} (classically one took {G={\bf Z}} or {G={\bf R}}, but nowadays one usually considers more general additive groups). Some basic estimates in this vein are the following:

Lemma 1 (Ruzsa covering lemma) Let {A, B} be finite non-empty subsets of {G}. Then {A} may be covered by at most {\frac{|A+B|}{|B|}} translates of {B-B}.

Proof: Consider a maximal set of disjoint translates {a+B} of {B} by elements {a \in A}. These translates have cardinality {|B|}, are disjoint, and lie in {A+B}, so there are at most {\frac{|A+B|}{|B|}} of them. By maximality, for any {a' \in A}, {a'+B} must intersect at least one of the selected {a+B}, thus {a' \in a+B-B}, and the claim follows. \Box

Lemma 2 (Ruzsa triangle inequality) Let {A,B,C} be finite non-empty subsets of {G}. Then {|A-C| \leq \frac{|A-B| |B-C|}{|B|}}.

Proof: Consider the addition map {+: (x,y) \mapsto x+y} from {(A-B) \times (B-C)} to {G}. Every element {a-c} of {A - C} has a preimage {\{ (x,y) \in (A-B) \times (B-C)\}} of this map of cardinality at least {|B|}, thanks to the obvious identity {a-c = (a-b) + (b-c)} for each {b \in B}. Since {(A-B) \times (B-C)} has cardinality {|A-B| |B-C|}, the claim follows. \Box

Such estimates (which are covered, incidentally, in Section 2 of my book with Van Vu) are particularly useful for controlling finite sets {A} of small doubling, in the sense that {|A+A| \leq K|A|} for some bounded {K}. (There are deeper theorems, most notably Freiman’s theorem, which give more control than what elementary Ruzsa calculus does, however the known bounds in the latter theorem are worse than polynomial in {K} (although it is conjectured otherwise), whereas the elementary estimates are almost all polynomial in {K}.)

However, there are some settings in which the standard sum set estimates are not quite applicable. One such setting is the continuous setting, where one is dealing with bounded open sets in an additive Lie group (e.g. {{\bf R}^n} or a torus {({\bf R}/{\bf Z})^n}) rather than a finite setting. Here, one can largely replicate the discrete sum set estimates by working with a Haar measure in place of cardinality; this is the approach taken for instance in this paper of mine. However, there is another setting, which one might dub the “discretised” setting (as opposed to the “discrete” setting or “continuous” setting), in which the sets {A} remain finite (or at least discretisable to be finite), but for which there is a certain amount of “roundoff error” coming from the discretisation. As a typical example (working now in a non-commutative multiplicative setting rather than an additive one), consider the orthogonal group {O_n({\bf R})} of orthogonal {n \times n} matrices, and let {A} be the matrices obtained by starting with all of the orthogonal matrice in {O_n({\bf R})} and rounding each coefficient of each matrix in this set to the nearest multiple of {\epsilon}, for some small {\epsilon>0}. This forms a finite set (whose cardinality grows as {\epsilon\rightarrow 0} like a certain negative power of {\epsilon}). In the limit {\epsilon \rightarrow 0}, the set {A} is not a set of small doubling in the discrete sense. However, {A \cdot A} is still close to {A} in a metric sense, being contained in the {O_n(\epsilon)}-neighbourhood of {A}. Another key example comes from graphs {\Gamma := \{ (x, f(x)): x \in G \}} of maps {f: A \rightarrow H} from a subset {A} of one additive group {G = (G,+)} to another {H = (H,+)}. If {f} is “approximately additive” in the sense that for all {x,y \in G}, {f(x+y)} is close to {f(x)+f(y)} in some metric, then {\Gamma} might not have small doubling in the discrete sense (because {f(x+y)-f(x)-f(y)} could take a large number of values), but could be considered a set of small doubling in a discretised sense.

One would like to have a sum set (or product set) theory that can handle these cases, particularly in “high-dimensional” settings in which the standard methods of passing back and forth between continuous, discrete, or discretised settings behave poorly from a quantitative point of view due to the exponentially large doubling constant of balls. One way to do this is to impose a translation invariant metric {d} on the underlying group {G = (G,+)} (reverting back to additive notation), and replace the notion of cardinality by that of metric entropy. There are a number of almost equivalent ways to define this concept:

Definition 3 Let {(X,d)} be a metric space, let {E} be a subset of {X}, and let {r>0} be a radius.

  • The packing number {N^{pack}_r(E)} is the largest number of points {x_1,\dots,x_n} one can pack inside {E} such that the balls {B(x_1,r),\dots,B(x_n,r)} are disjoint.
  • The internal covering number {N^{int}_r(E)} is the fewest number of points {x_1,\dots,x_n \in E} such that the balls {B(x_1,r),\dots,B(x_n,r)} cover {E}.
  • The external covering number {N^{ext}_r(E)} is the fewest number of points {x_1,\dots,x_n \in X} such that the balls {B(x_1,r),\dots,B(x_n,r)} cover {E}.
  • The metric entropy {N^{ent}_r(E)} is the largest number of points {x_1,\dots,x_n} one can find in {E} that are {r}-separated, thus {d(x_i,x_j) \geq r} for all {i \neq j}.

It is an easy exercise to verify the inequalities

\displaystyle  N^{ent}_{2r}(E) \leq N^{pack}_r(E) \leq N^{ext}_r(E) \leq N^{int}_r(E) \leq N^{ent}_r(E)

for any {r>0}, and that {N^*_r(E)} is non-increasing in {r} and non-decreasing in {E} for the three choices {* = pack,ext,ent} (but monotonicity in {E} can fail for {*=int}!). It turns out that the external covering number {N^{ent}_r(E)} is slightly more convenient than the other notions of metric entropy, so we will abbreviate {N_r(E) = N^{ent}_r(E)}. The cardinality {|E|} can be viewed as the limit of the entropies {N^*_r(E)} as {r \rightarrow 0}.

If we have the bounded doubling property that {B(0,2r)} is covered by {O(1)} translates of {B(0,r)} for each {r>0}, and one has a Haar measure {m} on {G} which assigns a positive finite mass to each ball, then any of the above entropies {N^*_r(E)} is comparable to {m( E + B(0,r) ) / m(B(0,r))}, as can be seen by simple volume packing arguments. Thus in the bounded doubling setting one can usually use the measure-theoretic sum set theory to derive entropy-theoretic sumset bounds (see e.g. this paper of mine for an example of this). However, it turns out that even in the absence of bounded doubling, one still has an entropy analogue of most of the elementary sum set theory, except that one has to accept some degradation in the radius parameter {r} by some absolute constant. Such losses can be acceptable in applications in which the underlying sets {A} are largely “transverse” to the balls {B(0,r)}, so that the {N_r}-entropy of {A} is largely independent of {A}; this is a situation which arises in particular in the case of graphs {\Gamma = \{ (x,f(x)): x \in G \}} discussed above, if one works with “vertical” metrics whose balls extend primarily in the vertical direction. (I hope to present a specific application of this type here in the near future.)

Henceforth we work in an additive group {G} equipped with a translation-invariant metric {d}. (One can also generalise things slightly by allowing the metric to attain the values {0} or {+\infty}, without changing much of the analysis below.) By the Heine-Borel theorem, any precompact set {E} will have finite entropy {N_r(E)} for any {r>0}. We now have analogues of the two basic Ruzsa lemmas above:

Lemma 4 (Ruzsa covering lemma) Let {A, B} be precompact non-empty subsets of {G}, and let {r>0}. Then {A} may be covered by at most {\frac{N_r(A+B)}{N_r(B)}} translates of {B-B+B(0,2r)}.

Proof: Let {a_1,\dots,a_n \in A} be a maximal set of points such that the sets {a_i + B + B(0,r)} are all disjoint. Then the sets {a_i+B} are disjoint in {A+B} and have entropy {N_r(a_i+B)=N_r(B)}, and furthermore any ball of radius {r} can intersect at most one of the {a_i+B}. We conclude that {N_r(A+B) \geq n N_r(B)}, so {n \leq \frac{N_r(A+B)}{N_r(B)}}. If {a \in A}, then {a+B+B(0,r)} must intersect one of the {a_i + B + B(0,r)}, so {a \in a_i + B-B + B(0,2r)}, and the claim follows. \Box

Lemma 5 (Ruzsa triangle inequality) Let {A,B,C} be precompact non-empty subsets of {G}, and let {r>0}. Then {N_{4r}(A-C) \leq \frac{N_r(A-B) N_r(B-C)}{N_r(B)}}.

Proof: Consider the addition map {+: (x,y) \mapsto x+y} from {(A-B) \times (B-C)} to {G}. The domain {(A-B) \times (B-C)} may be covered by {N_r(A-B) N_r(B-C)} product balls {B(x,r) \times B(y,r)}. Every element {a-c} of {A - C} has a preimage {\{ (x,y) \in (A-B) \times (B-C)\}} of this map which projects to a translate of {B}, and thus must meet at least {N_r(B)} of these product balls. However, if two elements of {A-C} are separated by a distance of at least {4r}, then no product ball can intersect both preimages. We thus see that {N_{4r}^{ent}(A-C) \leq \frac{N_r(A-B) N_r(B-C)}{N_r(A-C)}}, and the claim follows. \Box

Below the fold we will record some further metric entropy analogues of sum set estimates (basically redoing much of Chapter 2 of my book with Van Vu). Unfortunately there does not seem to be a direct way to abstractly deduce metric entropy results from their sum set analogues (basically due to the failure of a certain strong version of Freiman’s theorem, as discussed in this previous post); nevertheless, the proofs of the discrete arguments are elementary enough that they can be modified with a small amount of effort to handle the entropy case. (In fact, there should be a very general model-theoretic framework in which both the discrete and entropy arguments can be processed in a unified manner; see this paper of Hrushovski for one such framework.)

It is also likely that many of the arguments here extend to the non-commutative setting, but for simplicity we will not pursue such generalisations here.

Read the rest of this entry »

It turns out to be a favourable week or two for me to finally finish a number of papers that had been at a nearly completed stage for a while.  I have just uploaded to the arXiv my article “Sumset and inverse sumset theorems for Shannon entropy“, submitted to Combinatorics, Probability, and Computing.  This paper evolved from a “deleted scene” in my book with Van Vu entitled “Entropy sumset estimates“.  In those notes, we developed analogues of the standard Plünnecke-Ruzsa sumset estimates (which relate quantities such as the cardinalities |A+B|, |A-B| of the sum and difference sets of two finite sets A, B in an additive group G to each other), to the entropy setting, in which the finite sets A \subset G are replaced instead with discrete random variables X taking values in that group G, and the (logarithm of the) cardinality |A| is replaced with the Shannon entropy

{\textbf H}(X) := \sum_{x \in G} {\Bbb P}(x \in X) \log \frac{1}{{\Bbb P}(x \in X)}.

This quantity measures the information content of X; for instance, if {\textbf H}(X) = k \log 2, then it will take k bits on the average to store the value of X (thus a string of n independent copies of X will require about nk bits of storage in the asymptotic limit n \to \infty).  The relationship between entropy and cardinality is that if X is the uniform distribution on a finite non-empty set A, then {\textbf H}(X) = \log |A|.  If instead X is non-uniformly distributed on A, one has 0 < {\textbf H}(X) < \log |A|, thanks to Jensen’s inequality.

It turns out that many estimates on sumsets have entropy analogues, which resemble the “logarithm” of the sumset estimates.  For instance, the trivial bounds

|A|, |B| \leq |A+B| \leq |A| |B|

have the entropy analogue

{\textbf H}(X), {\textbf H}(Y) \leq {\textbf H}(X+Y) \leq {\textbf H}(X) + {\textbf H}(Y)

whenever X, Y are independent discrete random variables in an additive group; this is not difficult to deduce from standard entropy inequalities.  Slightly more non-trivially, the sum set estimate

|A+B| \leq \frac{|A-B|^3}{|A| |B|}

established by Ruzsa, has an entropy analogue

{\textbf H}(X+Y) \leq 3 {\textbf H}(X-Y) - {\textbf H}(X) - {\textbf H}(Y),

and similarly for a number of other standard sumset inequalities in the literature (e.g. the Rusza triangle inequality, the Plünnecke-Rusza inequality, and the Balog-Szemeredi-Gowers theorem, though the entropy analogue of the latter requires a little bit of care to state).  These inequalities can actually be deduced fairly easily from elementary arithmetic identities, together with standard entropy inequalities, most notably the submodularity inequality

{\textbf H}(Z) + {\textbf H}(W) \leq {\textbf H}(X) + {\textbf H}(Y)

whenever X,Y,Z,W are discrete random variables such that X and Y each determine W separately (thus W = f(X) = g(Y) for some deterministic functions f, g) and X and Y determine Z jointly (thus Z = h(X,Y) for some deterministic function f).  For instance, if X,Y,Z are independent discrete random variables in an additive group G, then (X-Y,Y-Z) and (X,Z) each determine X-Z separately, and determine X,Y,Z jointly, leading to the inequality

{\textbf H}(X,Y,Z) + {\textbf H}(X-Z) \leq {\textbf H}(X-Y,Y-Z) + {\textbf H}(X,Z)

which soon leads to the entropy Rusza triangle inequality

{\textbf H}(X-Z) \leq {\textbf H}(X-Y) + {\textbf H}(Y-Z) - {\textbf H}(Y)

which is an analogue of the combinatorial Ruzsa triangle inequality

|A-C| \leq \frac{|A-B| |B-C|}{|B|}.

All of this was already in the unpublished notes with Van, though I include it in this paper in order to place it in the literature.  The main novelty of the paper, though, is to consider the entropy analogue of Freiman’s theorem, which classifies those sets A for which |A+A| = O(|A|).  Here, the analogous problem is to classify the random variables X such that {\textbf H}(X_1+X_2) = {\textbf H}(X) + O(1), where X_1,X_2 are independent copies of X.  Let us say that X has small doubling if this is the case.

For instance, the uniform distribution U on a finite subgroup H of G has small doubling (in fact {\textbf H}(U_1+U_2)={\textbf H}(U) = \log |H| in this case). In a similar spirit, the uniform distribution on a (generalised) arithmetic progression P also has small doubling, as does the uniform distribution on a coset progression H+P.  Also, if X has small doubling, and Y has bounded entropy, then X+Y also has small doubling, even if Y and X are not independent.  The main theorem is that these are the only cases:

Theorem 1. (Informal statement) X has small doubling if and only if X = U + Y for some uniform distribution U on a coset progression (of bounded rank), and Y has bounded entropy.

For instance, suppose that X was the uniform distribution on a dense subset A of a finite group G.  Then Theorem 1 asserts that X is close in a “transport metric” sense to the uniform distribution U on G, in the sense that it is possible to rearrange or transport the probability distribution of X to the probability distribution of U (or vice versa) by shifting each component of the mass of X by an amount Y which has bounded entropy (which basically means that it primarily ranges inside a set of bounded cardinality).  The way one shows this is by randomly translating the mass of X around by a few random shifts to approximately uniformise the distribution, and then deal with the residual fluctuation in the distribution by hand.  Theorem 1 as a whole is established by using the Freiman theorem in the combinatorial setting combined with various elementary convexity and entropy inequality arguments to reduce matters to the above model case when X is supported inside a finite group G and has near-maximal entropy.

I also show a variant of the above statement: if X, Y are independent and {\textbf H}(X+Y) = {\textbf H}(X)+O(1) = {\textbf H}(Y)+O(1), then we have X \equiv Y+Z (i.e. X has the same distribution as Y+Z for some Z of bounded entropy (not necessarily independent of X or Y).  Thus if two random variables are additively related to each other, then they can be additively transported to each other by using a bounded amount of entropy.

In the last part of the paper I relate these discrete entropies to their continuous counterparts

{\textbf H}_{\Bbb R}(X) := \int_{{\Bbb R}} p(x) \log \frac{1}{p(x)}\ dx,

where X is now a continuous random variable on the real line with density function p(x)\ dx.  There are a number of sum set inequalities known in this setting, for instance

{\textbf H}_{\Bbb R}(X_1 + X_2) \geq {\textbf H}_{\Bbb R}(X) + \frac{1}{2} \log 2,

for independent copies X_1,X_2 of a finite entropy random variable X, with equality if and only if X is a Gaussian.  Using this inequality and Theorem 1, I show a discrete version, namely that

{\textbf H}(X_1 + X_2) \geq {\textbf H}(X) + \frac{1}{2} \log 2 - \varepsilon,

whenever \varepsilon> 0 and X_1,X_2 are independent copies of a random variable in {\Bbb Z} (or any other torsion-free abelian group) whose entropy is sufficiently large depending on \varepsilon.  This is somewhat analogous to the classical sumset inequality

|A+A| \geq 2 |A| - 1

though notice that we have a gain of just \frac{1}{2} \log 2 rather than \log 2 here, the point being that there is a Gaussian counterexample in the entropy setting which does not have a combinatorial analogue (except perhaps in the high-dimensional limit).  The main idea is to use Theorem 1 to trap most of X inside a coset progression, at which point one can use Fourier-analytic additive combinatorial tools to show that the distribution X_1+X_2 is “smooth” in some non-trivial direction r, which can then be used to approximate the discrete distribution by a continuous one.

I also conjecture more generally that the entropy monotonicity inequalities established by Artstein, Barthe, Ball, and Naor in the continuous case also hold in the above sense in the discrete case, though my method of proof breaks down because I no longer can assume small doubling.