
A handy inequality in additive combinatorics is the Plünnecke-Ruzsa inequality:

Theorem 1 (Plünnecke-Ruzsa inequality) Let ${A, B_1, \ldots, B_m}$ be finite non-empty subsets of an additive group ${G}$, such that ${|A+B_i| \leq K_i |A|}$ for all ${1 \leq i \leq m}$ and some scalars ${K_1,\ldots,K_m \geq 1}$. Then there exists a subset ${A'}$ of ${A}$ such that ${|A' + B_1 + \ldots + B_m| \leq K_1 \ldots K_m |A'|}$.

The proof uses graph-theoretic techniques. Setting ${A=B_1=\ldots=B_m}$, we obtain a useful corollary: if ${A}$ has small doubling in the sense that ${|A+A| \leq K|A|}$, then we have ${|mA| \leq K^m |A|}$ for all ${m \geq 1}$, where ${mA = A + \ldots + A}$ is the sum of ${m}$ copies of ${A}$.

In a recent paper, I adapted a number of sum set estimates to the entropy setting, in which finite sets such as ${A}$ in ${G}$ are replaced with discrete random variables ${X}$ taking values in ${G}$, and (the logarithm of) cardinality ${|A|}$ of a set ${A}$ is replaced by Shannon entropy ${{\Bbb H}(X)}$ of a random variable ${X}$. (Throughout this note I assume all entropies to be finite.) However, at the time, I was unable to find an entropy analogue of the Plünnecke-Ruzsa inequality, because I did not know how to adapt the graph theory argument to the entropy setting.

I recently discovered, however, that buried in a classic paper of Kaimanovich and Vershik (implicitly in Proposition 1.3, to be precise) there was the following analogue of Theorem 1:

Theorem 2 (Entropy Plünnecke-Ruzsa inequality) Let ${X, Y_1, \ldots, Y_m}$ be independent random variables of finite entropy taking values in an additive group ${G}$, such that ${{\Bbb H}(X+Y_i) \leq {\Bbb H}(X) + \log K_i}$ for all ${1 \leq i \leq m}$ and some scalars ${K_1,\ldots,K_m \geq 1}$. Then ${{\Bbb H}(X+Y_1+\ldots+Y_m) \leq {\Bbb H}(X) + \log K_1 \ldots K_m}$.

In fact Theorem 2 is a bit “better” than Theorem 1 in the sense that Theorem 1 needed to refine the original set ${A}$ to a subset ${A'}$, but no such refinement is needed in Theorem 2. One corollary of Theorem 2 is that if ${{\Bbb H}(X_1+X_2) \leq {\Bbb H}(X) + \log K}$, then ${{\Bbb H}(X_1+\ldots+X_m) \leq {\Bbb H}(X) + (m-1) \log K}$ for all ${m \geq 1}$, where ${X_1,\ldots,X_m}$ are independent copies of ${X}$; this improves slightly over the analogous combinatorial inequality. Indeed, the function ${m \mapsto {\Bbb H}(X_1+\ldots+X_m)}$ is concave (this can be seen by using the ${m=2}$ version of Theorem 2 (or (2) below) to show that the quantity ${{\Bbb H}(X_1+\ldots+X_{m+1})-{\Bbb H}(X_1+\ldots+X_m)}$ is decreasing in ${m}$).
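To spell out the concavity claim: writing ${S_m := X_1 + \ldots + X_m}$ and applying inequality (2) below with the triple ${(X_m, S_{m-1}, X_{m+1})}$ in place of ${(X,Y,Z)}$ (all independent), one obtains

$\displaystyle {\Bbb H}(S_{m+1}) + {\Bbb H}(S_{m-1}) \leq {\Bbb H}(S_m) + {\Bbb H}(S_{m-1} + X_{m+1}) = 2 {\Bbb H}(S_m),$

since ${S_{m-1} + X_{m+1}}$ has the same distribution as ${S_m}$; thus the increments ${{\Bbb H}(S_{m+1}) - {\Bbb H}(S_m)}$ are non-increasing in ${m}$.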

Theorem 2 is actually a quick consequence of the submodularity inequality

$\displaystyle {\Bbb H}(W) + {\Bbb H}(X) \leq {\Bbb H}(Y) + {\Bbb H}(Z) \ \ \ \ \ (1)$

in information theory, which is valid whenever ${X,Y,Z,W}$ are discrete random variables such that ${Y}$ and ${Z}$ each determine ${X}$ (i.e. ${X}$ is a function of ${Y}$, and also a function of ${Z}$), and ${Y}$ and ${Z}$ jointly determine ${W}$ (i.e. ${W}$ is a function of ${Y}$ and ${Z}$). To apply this, let ${X, Y, Z}$ be independent discrete random variables taking values in ${G}$. Observe that the pairs ${(X,Y+Z)}$ and ${(X+Y,Z)}$ each determine ${X+Y+Z}$, and jointly determine ${(X,Y,Z)}$. Applying (1) we conclude that

$\displaystyle {\Bbb H}(X,Y,Z) + {\Bbb H}(X+Y+Z) \leq {\Bbb H}(X,Y+Z) + {\Bbb H}(X+Y,Z)$

which after using the independence of ${X,Y,Z}$ simplifies to the sumset submodularity inequality

$\displaystyle {\Bbb H}(X+Y+Z) + {\Bbb H}(Y) \leq {\Bbb H}(X+Y) + {\Bbb H}(Y+Z) \ \ \ \ \ (2)$

(this inequality was also recently observed by Madiman; it is the ${m=2}$ case of Theorem 2). As a corollary of this inequality, we see that if ${{\Bbb H}(X+Y_i) \leq {\Bbb H}(X) + \log K_i}$, then

$\displaystyle {\Bbb H}(X+Y_1+\ldots+Y_i) \leq {\Bbb H}(X+Y_1+\ldots+Y_{i-1}) + \log K_i,$

and Theorem 2 follows by telescoping.
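As a numerical sanity check (illustrative only, not part of the proof), one can sample random finitely supported distributions, take ${\log K_i := {\Bbb H}(X+Y_i) - {\Bbb H}(X)}$ (the smallest admissible choice), and verify the conclusion of Theorem 2; a short Python sketch:

```python
# Numerical check of Theorem 2 on random examples: with
# log K_i := H(X + Y_i) - H(X), verify that
#   H(X + Y_1 + ... + Y_m) <= H(X) + log K_1 + ... + log K_m.
import math
import random

def entropy(p):
    """Shannon entropy (in nats) of a distribution {value: probability}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def add(p, q):
    """Distribution of the sum of two independent variables."""
    r = {}
    for a, pa in p.items():
        for b, qb in q.items():
            r[a + b] = r.get(a + b, 0.0) + pa * qb
    return r

def random_dist(n):
    """A random distribution on {0, ..., n-1}."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return {i: wi / s for i, wi in enumerate(w)}

random.seed(0)
for _ in range(300):
    X = random_dist(5)
    Ys = [random_dist(5) for _ in range(3)]
    hX = entropy(X)
    log_Ks = [entropy(add(X, Y)) - hX for Y in Ys]  # smallest admissible log K_i
    total = X
    for Y in Ys:
        total = add(total, Y)
    assert entropy(total) <= hX + sum(log_Ks) + 1e-9
print("Theorem 2 holds on all sampled examples")
```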

The proof of Theorem 2 seems to be genuinely different from the graph-theoretic proof of Theorem 1. It would be interesting to see if the above argument can be somehow adapted to give a stronger version of Theorem 1. Note also that both Theorem 1 and Theorem 2 have extensions to more general combinations of ${X,Y_1,\ldots,Y_m}$ than ${X+Y_i}$; see this paper and this paper respectively.

It turns out to be a favourable week or two for me to finally finish a number of papers that had been at a nearly completed stage for a while.  I have just uploaded to the arXiv my article “Sumset and inverse sumset theorems for Shannon entropy“, submitted to Combinatorics, Probability and Computing.  This paper evolved from a “deleted scene” from my book with Van Vu: a set of unpublished notes entitled “Entropy sumset estimates”.  In those notes, we developed analogues of the standard Plünnecke-Ruzsa sumset estimates (which relate quantities such as the cardinalities $|A+B|, |A-B|$ of the sum and difference sets of two finite sets $A, B$ in an additive group $G$ to each other) in the entropy setting, in which the finite sets $A \subset G$ are replaced with discrete random variables $X$ taking values in that group $G$, and the (logarithm of the) cardinality $|A|$ is replaced with the Shannon entropy

${\textbf H}(X) := \sum_{x \in G} {\Bbb P}(X = x) \log \frac{1}{{\Bbb P}(X = x)}.$

This quantity measures the information content of X; for instance, if ${\textbf H}(X) = k \log 2$, then it will take k bits on the average to store the value of X (thus a string of n independent copies of X will require about nk bits of storage in the asymptotic limit $n \to \infty$).  The relationship between entropy and cardinality is that if X is the uniform distribution on a finite non-empty set A, then ${\textbf H}(X) = \log |A|$.  If instead X is non-uniformly distributed on A, one has $0 < {\textbf H}(X) < \log |A|$, thanks to Jensen’s inequality.
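As a concrete illustration of this dictionary between entropy and cardinality, here is a small Python sketch (the set and the non-uniform weights are made up for illustration):

```python
# Entropy of the uniform distribution on A equals log |A|; any
# non-uniform distribution supported on A has strictly smaller entropy.
import math

def entropy(p):
    """Shannon entropy (in nats) of a distribution {value: probability}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

A = range(8)                               # |A| = 8
uniform = {a: 1 / 8 for a in A}
assert abs(entropy(uniform) - math.log(8)) < 1e-12

# In bits: log |A| / log 2 = 3, so storing X takes 3 bits on average.
print(entropy(uniform) / math.log(2))

skewed = {a: (a + 1) / 36 for a in A}      # non-uniform on the same A
assert 0 < entropy(skewed) < math.log(8)   # Jensen's inequality
```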

It turns out that many estimates on sumsets have entropy analogues, which resemble the “logarithm” of the sumset estimates.  For instance, the trivial bounds

$|A|, |B| \leq |A+B| \leq |A| |B|$

have the entropy analogue

${\textbf H}(X), {\textbf H}(Y) \leq {\textbf H}(X+Y) \leq {\textbf H}(X) + {\textbf H}(Y)$

whenever X, Y are independent discrete random variables in an additive group; this is not difficult to deduce from standard entropy inequalities.  Slightly more non-trivially, the sum set estimate

$|A+B| \leq \frac{|A-B|^3}{|A| |B|}$

established by Ruzsa, has an entropy analogue

${\textbf H}(X+Y) \leq 3 {\textbf H}(X-Y) - {\textbf H}(X) - {\textbf H}(Y)$,

and similarly for a number of other standard sumset inequalities in the literature (e.g. the Ruzsa triangle inequality, the Plünnecke-Ruzsa inequality, and the Balog-Szemerédi-Gowers theorem, though the entropy analogue of the latter requires a little bit of care to state).  These inequalities can actually be deduced fairly easily from elementary arithmetic identities, together with standard entropy inequalities, most notably the submodularity inequality

${\textbf H}(Z) + {\textbf H}(W) \leq {\textbf H}(X) + {\textbf H}(Y)$

whenever X,Y,Z,W are discrete random variables such that X and Y each determine W separately (thus $W = f(X) = g(Y)$ for some deterministic functions f, g) and X and Y determine Z jointly (thus $Z = h(X,Y)$ for some deterministic function h).  For instance, if X,Y,Z are independent discrete random variables in an additive group G, then $(X-Y,Y-Z)$ and $(X,Z)$ each determine $X-Z$ separately, and determine $(X,Y,Z)$ jointly, leading to the inequality

${\textbf H}(X,Y,Z) + {\textbf H}(X-Z) \leq {\textbf H}(X-Y,Y-Z) + {\textbf H}(X,Z)$

which soon leads to the entropy Ruzsa triangle inequality

${\textbf H}(X-Z) \leq {\textbf H}(X-Y) + {\textbf H}(Y-Z) - {\textbf H}(Y)$

which is an analogue of the combinatorial Ruzsa triangle inequality

$|A-C| \leq \frac{|A-B| |B-C|}{|B|}.$
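For what it is worth, these entropy inequalities are easy to test numerically. Below is an illustrative Python sketch (not part of any proof; the supports and distributions are arbitrary) that draws random finitely supported distributions on ${\Bbb Z}$ and checks the trivial bounds, the entropy analogue of Ruzsa's estimate, and the entropy Ruzsa triangle inequality on each sample:

```python
# Numerical sanity check of three entropy sumset inequalities for
# independent Z-valued random variables:
#   max(H(X), H(Y)) <= H(X+Y) <= H(X) + H(Y)
#   H(X+Y) <= 3 H(X-Y) - H(X) - H(Y)
#   H(X-Z) <= H(X-Y) + H(Y-Z) - H(Y)
import math
import random

def entropy(p):
    """Shannon entropy (in nats) of a distribution {value: probability}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def add(p, q):
    """Distribution of the sum of two independent variables."""
    r = {}
    for a, pa in p.items():
        for b, qb in q.items():
            r[a + b] = r.get(a + b, 0.0) + pa * qb
    return r

def neg(p):
    """Distribution of -X."""
    return {-a: pa for a, pa in p.items()}

def random_dist(n):
    """A random distribution on {0, ..., n-1}."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return {i: wi / s for i, wi in enumerate(w)}

random.seed(0)
for _ in range(500):
    X, Y, Z = (random_dist(5) for _ in range(3))
    hX, hY = entropy(X), entropy(Y)
    assert max(hX, hY) <= entropy(add(X, Y)) + 1e-9
    assert entropy(add(X, Y)) <= hX + hY + 1e-9
    assert entropy(add(X, Y)) <= 3 * entropy(add(X, neg(Y))) - hX - hY + 1e-9
    assert entropy(add(X, neg(Z))) <= (entropy(add(X, neg(Y)))
                                       + entropy(add(Y, neg(Z))) - hY + 1e-9)
print("all sampled inequalities hold")
```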

All of this was already in the unpublished notes with Van, though I include it in this paper in order to place it in the literature.  The main novelty of the paper, though, is to consider the entropy analogue of Freiman’s theorem, which classifies those sets A for which $|A+A| = O(|A|)$.  Here, the analogous problem is to classify the random variables $X$ such that ${\textbf H}(X_1+X_2) = {\textbf H}(X) + O(1)$, where $X_1,X_2$ are independent copies of X.  Let us say that X has small doubling if this is the case.

For instance, the uniform distribution U on a finite subgroup H of G has small doubling (in fact ${\textbf H}(U_1+U_2)={\textbf H}(U) = \log |H|$ in this case). In a similar spirit, the uniform distribution on a (generalised) arithmetic progression P also has small doubling, as does the uniform distribution on a coset progression H+P.  Also, if X has small doubling, and Y has bounded entropy, then X+Y also has small doubling, even if Y and X are not independent.  The main theorem is that these are the only cases:

Theorem 1. (Informal statement) X has small doubling if and only if $X = U + Y$ for some uniform distribution U on a coset progression (of bounded rank), and Y has bounded entropy.

For instance, suppose that X is the uniform distribution on a dense subset A of a finite group G.  Then Theorem 1 asserts that X is close in a “transport metric” sense to the uniform distribution U on G: it is possible to rearrange or transport the probability distribution of X to the probability distribution of U (or vice versa) by shifting each component of the mass of X by an amount Y of bounded entropy (which basically means that Y primarily ranges inside a set of bounded cardinality).  One shows this by translating the mass of X by a few random shifts to approximately uniformise the distribution, and then dealing with the residual fluctuation in the distribution by hand.  Theorem 1 as a whole is established by using Freiman's theorem in the combinatorial setting, combined with various elementary convexity and entropy inequality arguments, to reduce matters to the above model case in which X is supported inside a finite group G and has near-maximal entropy.

I also show a variant of the above statement: if X, Y are independent and ${\textbf H}(X+Y) = {\textbf H}(X)+O(1) = {\textbf H}(Y)+O(1)$, then $X \equiv Y+Z$ (i.e. X has the same distribution as Y+Z) for some Z of bounded entropy (not necessarily independent of X or Y).  Thus if two random variables are additively related to each other, then they can be additively transported to each other by using a bounded amount of entropy.

In the last part of the paper I relate these discrete entropies to their continuous counterparts

${\textbf H}_{\Bbb R}(X) := \int_{{\Bbb R}} p(x) \log \frac{1}{p(x)}\ dx,$

where X is now a continuous random variable on the real line with probability density function $p(x)$.  There are a number of sum set inequalities known in this setting, for instance

${\textbf H}_{\Bbb R}(X_1 + X_2) \geq {\textbf H}_{\Bbb R}(X) + \frac{1}{2} \log 2$,

for independent copies $X_1,X_2$ of a finite entropy random variable X, with equality if and only if X is a Gaussian.  Using this inequality and Theorem 1, I show a discrete version, namely that

${\textbf H}(X_1 + X_2) \geq {\textbf H}(X) + \frac{1}{2} \log 2 - \varepsilon$,

whenever $\varepsilon > 0$ and $X_1,X_2$ are independent copies of a random variable in ${\Bbb Z}$ (or any other torsion-free abelian group) whose entropy is sufficiently large depending on $\varepsilon$.  This is somewhat analogous to the classical sumset inequality

$|A+A| \geq 2 |A| - 1$

though notice that we have a gain of just $\frac{1}{2} \log 2$ rather than $\log 2$ here, the point being that there is a Gaussian counterexample in the entropy setting which does not have a combinatorial analogue (except perhaps in the high-dimensional limit).  The main idea is to use Theorem 1 to trap most of X inside a coset progression, at which point one can use Fourier-analytic additive combinatorial tools to show that the distribution of $X_1+X_2$ is “smooth” in some non-trivial direction r, which can then be used to approximate the discrete distribution by a continuous one.
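The $\frac{1}{2} \log 2$ gain can also be observed numerically: if X is uniform on a long progression, then $X_1 + X_2$ is approximately triangular, and the entropy gain ${\textbf H}(X_1+X_2) - {\textbf H}(X)$ approaches $\frac{1}{2}$ nats, comfortably above $\frac{1}{2} \log 2 \approx 0.347$. A quick Python sketch (the parameter N is arbitrary):

```python
# Entropy gain of a self-convolution: for X uniform on {0,...,N-1},
# H(X1 + X2) - H(X) tends to 1/2 nat as N grows (the triangular
# distribution heuristic), which exceeds (1/2) log 2 ~ 0.3466.
import math

def entropy(p):
    """Shannon entropy (in nats) of a distribution {value: probability}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

N = 1000
X = {k: 1 / N for k in range(N)}           # uniform on {0, ..., N-1}

# X1 + X2 takes value s with N - |s - (N-1)| representations out of N^2.
S = {s: (N - abs(s - (N - 1))) / N ** 2 for s in range(2 * N - 1)}

gain = entropy(S) - entropy(X)
print(gain)                                # close to 0.5 nats
assert gain > 0.5 * math.log(2)
```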

I also conjecture more generally that the entropy monotonicity inequalities established by Artstein, Ball, Barthe, and Naor in the continuous case also hold in the above sense in the discrete case, though my method of proof breaks down because I can no longer assume small doubling.