A handy inequality in additive combinatorics is the Plünnecke-Ruzsa inequality:

Theorem 1 (Plünnecke-Ruzsa inequality) Let ${A, B_1, \ldots, B_m}$ be finite non-empty subsets of an additive group ${G}$, such that ${|A+B_i| \leq K_i |A|}$ for all ${1 \leq i \leq m}$ and some scalars ${K_1,\ldots,K_m \geq 1}$. Then there exists a subset ${A'}$ of ${A}$ such that ${|A' + B_1 + \ldots + B_m| \leq K_1 \ldots K_m |A'|}$.

The proof uses graph-theoretic techniques. Setting ${A=B_1=\ldots=B_m}$, we obtain a useful corollary: if ${A}$ has small doubling in the sense that ${|A+A| \leq K|A|}$, then we have ${|mA| \leq K^m |A|}$ for all ${m \geq 1}$, where ${mA = A + \ldots + A}$ is the sum of ${m}$ copies of ${A}$.
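As a quick sanity check on this corollary (not part of the argument itself), one can compute iterated sumsets directly for a small set of integers and verify ${|mA| \leq K^m |A|}$; the set used below is an arbitrary illustrative choice:

```python
def sumset(A, B):
    """The sumset A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}

A = {0, 1, 4}                     # an arbitrary small set of integers
K = len(sumset(A, A)) / len(A)    # doubling constant: |A+A| <= K|A| with equality

S = set(A)                        # S will run through A, 2A, 3A, ...
for m in range(1, 8):
    # the corollary of Theorem 1 predicts |mA| <= K^m |A|
    assert len(S) <= K**m * len(A) + 1e-9
    S = sumset(S, A)
```

Here ${K = 2}$ (since ${|A+A| = 6}$), and the bound holds with plenty of room, as the iterated sumsets of a fixed finite set of integers eventually grow only linearly in ${m}$.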

In a recent paper, I adapted a number of sum set estimates to the entropy setting, in which finite sets such as ${A}$ in ${G}$ are replaced with discrete random variables ${X}$ taking values in ${G}$, and (the logarithm of) the cardinality ${|A|}$ of a set ${A}$ is replaced by the Shannon entropy ${{\Bbb H}(X)}$ of a random variable ${X}$. (Throughout this note I assume all entropies to be finite.) However, at the time, I was unable to find an entropy analogue of the Plünnecke-Ruzsa inequality, because I did not know how to adapt the graph theory argument to the entropy setting.

I recently discovered, however, that buried in a classic paper of Kaimanovich and Vershik (implicitly in Proposition 1.3, to be precise) there was the following analogue of Theorem 1:

Theorem 2 (Entropy Plünnecke-Ruzsa inequality) Let ${X, Y_1, \ldots, Y_m}$ be independent random variables of finite entropy taking values in an additive group ${G}$, such that ${{\Bbb H}(X+Y_i) \leq {\Bbb H}(X) + \log K_i}$ for all ${1 \leq i \leq m}$ and some scalars ${K_1,\ldots,K_m \geq 1}$. Then ${{\Bbb H}(X+Y_1+\ldots+Y_m) \leq {\Bbb H}(X) + \log(K_1 \ldots K_m)}$.

In fact Theorem 2 is a bit “better” than Theorem 1, in that Theorem 1 needed to refine the original set ${A}$ to a subset ${A'}$, while no such refinement is needed in Theorem 2. One corollary of Theorem 2 is that if ${{\Bbb H}(X_1+X_2) \leq {\Bbb H}(X) + \log K}$, then ${{\Bbb H}(X_1+\ldots+X_m) \leq {\Bbb H}(X) + (m-1) \log K}$ for all ${m \geq 1}$, where ${X_1,\ldots,X_m}$ are independent copies of ${X}$; this improves slightly over the analogous combinatorial inequality. Indeed, the function ${m \mapsto {\Bbb H}(X_1+\ldots+X_m)}$ is concave: the ${m=2}$ version of Theorem 2 (or (2) below) shows that the successive differences ${{\Bbb H}(X_1+\ldots+X_{m+1})-{\Bbb H}(X_1+\ldots+X_m)}$ are non-increasing in ${m}$.
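A small numerical experiment (illustrative only; the distribution of ${X}$ below is an arbitrary choice) confirms this concavity for sums of iid integer-valued variables, computing each entropy exactly by convolving probability distributions:

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a distribution given as {value: prob}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def convolve(p, q):
    """Distribution of the sum of independent variables with laws p and q."""
    r = {}
    for a, pa in p.items():
        for b, qb in q.items():
            r[a + b] = r.get(a + b, 0.0) + pa * qb
    return r

# X uniform on {0, 1, 3}: an arbitrary test distribution on the integers
X = {0: 1/3, 1: 1/3, 3: 1/3}

sums, dist = [], X
for m in range(1, 7):
    sums.append(entropy(dist))   # H(X_1 + ... + X_m)
    dist = convolve(dist, X)

# concavity: successive differences H(S_{m+1}) - H(S_m) are non-increasing
diffs = [b - a for a, b in zip(sums, sums[1:])]
assert all(d2 <= d1 + 1e-12 for d1, d2 in zip(diffs, diffs[1:]))
```

The differences are also non-negative here, since adding an independent summand cannot decrease the entropy of the sum.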

Theorem 2 is actually a quick consequence of the submodularity inequality

$\displaystyle {\Bbb H}(W) + {\Bbb H}(X) \leq {\Bbb H}(Y) + {\Bbb H}(Z) \ \ \ \ \ (1)$

in information theory, which is valid whenever ${X,Y,Z,W}$ are discrete random variables such that ${Y}$ and ${Z}$ each determine ${X}$ (i.e. ${X}$ is a function of ${Y}$, and also a function of ${Z}$), and ${Y}$ and ${Z}$ jointly determine ${W}$ (i.e. ${W}$ is a function of ${Y}$ and ${Z}$). To apply this, let ${X, Y, Z}$ be independent discrete random variables taking values in ${G}$. Observe that the pairs ${(X,Y+Z)}$ and ${(X+Y,Z)}$ each determine ${X+Y+Z}$, and jointly determine ${(X,Y,Z)}$. Applying (1) we conclude that

$\displaystyle {\Bbb H}(X,Y,Z) + {\Bbb H}(X+Y+Z) \leq {\Bbb H}(X,Y+Z) + {\Bbb H}(X+Y,Z)$

which after using the independence of ${X,Y,Z}$ simplifies to the sumset submodularity inequality

$\displaystyle {\Bbb H}(X+Y+Z) + {\Bbb H}(Y) \leq {\Bbb H}(X+Y) + {\Bbb H}(Y+Z) \ \ \ \ \ (2)$

(this inequality was also recently observed by Madiman; it is the ${m=2}$ case of Theorem 2). As a corollary of this inequality, we see that if ${{\Bbb H}(X+Y_i) \leq {\Bbb H}(X) + \log K_i}$, then
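Inequality (2) is easy to test numerically. The sketch below (an illustrative check only, with distributions drawn at random on the cyclic group of order 7) verifies it over many random trials:

```python
import math, random

def entropy(p):
    """Shannon entropy (in nats) of a distribution given as a list of probs."""
    return -sum(x * math.log(x) for x in p if x > 0)

def rand_dist(n, rng):
    """A random probability distribution on {0, ..., n-1}."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def add_mod(p, q, n):
    """Law of U + V mod n for independent U ~ p, V ~ q on Z/nZ."""
    r = [0.0] * n
    for a in range(n):
        for b in range(n):
            r[(a + b) % n] += p[a] * q[b]
    return r

rng = random.Random(0)
n = 7
for _ in range(200):
    X, Y, Z = (rand_dist(n, rng) for _ in range(3))
    # H(X+Y+Z) + H(Y) <= H(X+Y) + H(Y+Z), i.e. inequality (2)
    lhs = entropy(add_mod(add_mod(X, Y, n), Z, n)) + entropy(Y)
    rhs = entropy(add_mod(X, Y, n)) + entropy(add_mod(Y, Z, n))
    assert lhs <= rhs + 1e-9
```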

$\displaystyle {\Bbb H}(X+Y_1+\ldots+Y_i) \leq {\Bbb H}(X+Y_1+\ldots+Y_{i-1}) + \log K_i,$

and Theorem 2 follows by telescoping over ${1 \leq i \leq m}$.

The proof of Theorem 2 seems to be genuinely different from the graph-theoretic proof of Theorem 1. It would be interesting to see if the above argument can be somehow adapted to give a stronger version of Theorem 1. Note also that both Theorem 1 and Theorem 2 have extensions to more general combinations of ${X,Y_1,\ldots,Y_m}$ than ${X+Y_i}$; see this paper and this paper respectively.

It is also worth remarking that the above inequalities largely carry over to the non-abelian setting. For instance, if ${X_1, X_2,\ldots}$ are iid copies of a discrete random variable in a multiplicative group ${G}$, the above arguments show that the function ${m \mapsto {\Bbb H}(X_1 \ldots X_m)}$ is concave. In particular, the expression ${\frac{1}{m} {\Bbb H}(X_1 \ldots X_m)}$ decreases monotonically to a limit, the asymptotic entropy ${{\Bbb H}(G,X)}$. This quantity plays an important role in the theory of bounded harmonic functions on ${G}$, as observed by Kaimanovich and Vershik:
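For a finite group the asymptotic entropy is necessarily zero (since ${{\Bbb H}(X_1 \ldots X_m) \leq \log |G|}$), but the monotone decrease of ${\frac{1}{m} {\Bbb H}(X_1 \ldots X_m)}$ can still be observed directly. The sketch below runs this computation on the symmetric group ${S_3}$, with a step distribution chosen arbitrarily for illustration:

```python
import math

def compose(s, t):
    """Composition (s o t)(i) = s(t(i)); permutations stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))

def entropy(p):
    """Shannon entropy (in nats) of a distribution {group element: prob}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def convolve(p, q):
    """Law of the product of independent group-valued variables with laws p, q."""
    r = {}
    for g, pg in p.items():
        for h, qh in q.items():
            gh = compose(g, h)
            r[gh] = r.get(gh, 0.0) + pg * qh
    return r

# step distribution: uniform on two transpositions, which generate S_3
mu = {(1, 0, 2): 0.5, (0, 2, 1): 0.5}

vals, dist = [], mu
for m in range(1, 9):
    vals.append(entropy(dist) / m)   # (1/m) H(X_1 ... X_m)
    dist = convolve(dist, mu)

# the normalized entropies decrease monotonically (toward 0, as G is finite)
assert all(b <= a + 1e-12 for a, b in zip(vals, vals[1:]))
```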

Proposition 3 Let ${G}$ be a discrete group, and let ${X}$ be a discrete random variable in ${G}$ with finite entropy, whose support generates ${G}$. Then there exists a non-constant bounded function ${f: G \rightarrow {\Bbb R}}$ which is harmonic with respect to ${X}$ (which means that ${{\Bbb E} f(X x) = f(x)}$ for all ${x \in G}$) if and only if ${{\Bbb H}(G,X) \neq 0}$.

Proof: (Sketch) Suppose first that ${{\Bbb H}(G,X) = 0}$. Then we see from concavity that the successive differences ${{\Bbb H}(X_1 \ldots X_m) - {\Bbb H}(X_1 \ldots X_{m-1})}$ are non-increasing; as they are non-negative and average to ${\frac{1}{m} {\Bbb H}(X_1 \ldots X_m) \rightarrow 0}$, they converge to zero. From this it is not hard to see that the mutual information

$\displaystyle {\Bbb I}(X_m, X_1 \ldots X_m) := {\Bbb H}(X_m) + {\Bbb H}(X_1 \ldots X_m) - {\Bbb H}(X_m, X_1 \ldots X_m)$

goes to zero as ${m \rightarrow \infty}$. Informally, knowing the value of ${X_m}$ reveals very little about the value of ${X_1 \ldots X_m}$ when ${m}$ is large.
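For a concrete illustration, note that by independence ${{\Bbb H}(X_1 \ldots X_m | X_m) = {\Bbb H}(X_1 \ldots X_{m-1})}$, so this mutual information equals the successive difference ${{\Bbb H}(X_1 \ldots X_m) - {\Bbb H}(X_1 \ldots X_{m-1})}$. The sketch below computes it for simple random walk on the integers (an abelian example with zero asymptotic entropy, chosen purely for illustration):

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a distribution given as {value: prob}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def convolve(p, q):
    """Distribution of the sum of independent variables with laws p and q."""
    r = {}
    for a, pa in p.items():
        for b, qb in q.items():
            r[a + b] = r.get(a + b, 0.0) + pa * qb
    return r

step = {-1: 0.5, 1: 0.5}   # simple random walk on Z

H = [0.0]                  # H(S_0) = 0 for the empty product
dist = {0: 1.0}
for m in range(1, 40):
    dist = convolve(dist, step)
    H.append(entropy(dist))

# I(X_m, S_m) = H(S_m) - H(S_m | X_m) = H(S_m) - H(S_{m-1})
mi = [H[m] - H[m - 1] for m in range(1, 40)]
assert all(x >= -1e-12 for x in mi)   # mutual information is non-negative
assert mi[-1] < 0.05 < mi[0]          # and decays toward zero
```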

Now let ${f: G \rightarrow {\Bbb R}}$ be a bounded harmonic function, and let ${m}$ be large. For any ${x \in G}$ and any value ${s}$ in the support of ${X_m}$, we observe from harmonicity that

$\displaystyle f( s x ) = {\Bbb E}( f( X_1 \ldots X_m x ) | X_m = s ).$

But from the asymptotic vanishing of mutual information and the boundedness of ${f}$, one can show that the right-hand side will converge to ${{\Bbb E}( f(X_1 \ldots X_m x) )}$, which by harmonicity is equal to ${f(x)}$. Thus ${f}$ is invariant with respect to the support of ${X}$, and is thus constant since this support generates ${G}$.

Conversely, if ${{\Bbb H}(G,X)}$ is non-zero, then the above arguments show that ${{\Bbb I}(X_m, X_1 \ldots X_m)}$ stays bounded away from zero as ${m \rightarrow \infty}$, thus ${X_1 \ldots X_m}$ reveals a non-trivial amount of information about ${X_m}$. This turns out to be true even if ${m}$ is not deterministic, but is itself random, varying over some medium-sized range. From this, one can find a bounded function ${F}$ such that the conditional expectation ${{\Bbb E}(F(X_1 \ldots X_m) | X_m = s)}$ varies non-trivially with ${s}$. On the other hand, the bounded function ${x \mapsto {\Bbb E} F(X_1 \ldots X_{m-1} x)}$ is approximately harmonic (because we are varying ${m}$), and has some non-trivial fluctuation near the identity (by the preceding sentence). Taking a limit as ${m \rightarrow \infty}$ (using Arzelà-Ascoli) we obtain a non-constant bounded harmonic function as desired. $\Box$