Ben Green, Freddie Manners and I have just uploaded to the arXiv our preprint “Sumsets and entropy revisited”. This paper uses entropy methods to attack the Polynomial Freiman-Ruzsa (PFR) conjecture, which we study in the following two forms:

Conjecture 1 (Weak PFR over {{\bf Z}}) Let {A \subset {\bf Z}^D} be a finite non-empty set whose doubling constant {\sigma[A] := |A+A|/|A|} is at most {K}. Then there is a subset {A'} of {A} of density {\gg K^{-O(1)}} that has affine dimension {O(\log K)} (i.e., it is contained in an affine space of dimension {O(\log K)}).

Conjecture 2 (PFR over {{\bf F}_2}) Let {A \subset {\bf F}_2^D} be a non-empty set whose doubling constant {\sigma[A]} is at most {K}. Then {A} can be covered by {O(K^{O(1)})} cosets of a subspace of cardinality at most {|A|}.
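To fix ideas, the doubling constant in these conjectures can be computed directly for small examples. The following Python sketch (a toy illustration of mine, not code from the paper) computes {\sigma[A]} over {{\bf F}_2^D}, encoding elements as integer bitmasks with bitwise XOR as the group addition; a subspace has doubling constant exactly {1}.

```python
def doubling_constant_f2(A):
    """Combinatorial doubling constant sigma[A] = |A+A|/|A| over F_2^D,
    with elements encoded as integer bitmasks and addition given by XOR."""
    A = set(A)
    sumset = {a ^ b for a in A for b in A}
    return len(sumset) / len(A)

# A subspace of F_2^3 (the span of 001 and 010) is closed under addition,
# so its sumset is itself and its doubling constant is exactly 1.
subspace = {0b000, 0b001, 0b010, 0b011}

# Four "independent" points have a larger sumset (seven elements here).
generic = {0b000, 0b001, 0b010, 0b100}
```

For the subspace one gets {\sigma[A] = 1}, while the generic four-point set has sumset of size seven and hence {\sigma[A] = 7/4}.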

Our main results are then as follows.

Theorem 3 If {A \subset {\bf Z}^D} with {\sigma[A] \leq K}, then
  • (i) There is a subset {A'} of {A} of density {\gg K^{-O(1)}} of “skew-dimension” (or “query complexity”) {O(\log K)}.
  • (ii) There is a subset {A'} of {A} of density {\gg \exp( -O(\log^{5/3+o(1)} K) )} of affine dimension {O(\log K)} (where {o(1)} goes to zero as {K \rightarrow \infty}).
  • (iii) If Conjecture 2 holds, then there is a subset {A'} of {A} of density {\gg K^{-O(1)}} of affine dimension {O(\log K)}. In other words, Conjecture 2 implies Conjecture 1.

The skew-dimension of a set is a quantity smaller than the affine dimension which is defined recursively; the precise definition is given in the paper, but suffice it to say that singleton sets have skew-dimension {0}, and a set {A \subset {\bf Z}^{D_1} \times {\bf Z}^{D_2}} whose projection to {{\bf Z}^{D_1}} has skew-dimension at most {d_1}, and whose fibers in {\{x\} \times {\bf Z}^{D_2}} have skew-dimension at most {d_2} for each {x}, will have skew-dimension at most {d_1+d_2}. (In fact, the skew-dimension is basically the largest quantity which obeys all of these properties.)
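To make the recursion concrete, here is a toy Python sketch (an illustration based only on the two properties just stated, not code from the paper) that computes an upper bound on the skew-dimension of a finite subset of {{\bf Z}^D} by always splitting off the first coordinate. The actual definition in the paper is more flexible about how coordinates are grouped, so this should only be trusted as an upper bound.

```python
def skew_dim_upper_bound(A):
    """Upper bound on the skew-dimension of a finite set A of integer tuples
    (all of the same length D >= 1), obtained by splitting off the first
    coordinate: the bound for A is the bound for the projection plus the
    worst bound over the fibers."""
    A = set(A)
    if len(A) == 1:
        return 0  # singleton sets have skew-dimension 0
    D = len(next(iter(A)))
    if D == 1:
        # a non-singleton subset of Z has affine (hence skew) dimension 1
        return 1
    projection = {a[0] for a in A}
    fibers = {}
    for a in A:
        fibers.setdefault(a[0], set()).add(a[1:])
    d1 = 0 if len(projection) == 1 else 1
    d2 = max(skew_dim_upper_bound(f) for f in fibers.values())
    return d1 + d2
```

For instance the diagonal {\{(0,0),(1,1),(2,2)\}} receives the bound {1} (the fibers are singletons), while the grid {\{0,1\}^2} receives the bound {2}.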

Part (i) of this theorem was implicitly proven by Pálvölgyi and Zhelezov by a different method. Part (ii) with {5/3+o(1)} replaced by {2} was established by Manners. To our knowledge, part (iii) is completely new.

Our proof strategy is to establish these results of combinatorial additive combinatorics by using entropic additive combinatorics, in which we replace sets {A} with random variables {X}, and cardinality with (the exponential of) Shannon entropy. This allows us to take advantage of some superior features of entropic additive combinatorics, most notably its good behavior with respect to homomorphisms.

For instance, the analogue of the combinatorial doubling constant {\sigma[A] := |A+A|/|A|} of a finite non-empty subset {A} of an abelian group {G}, is the entropy doubling constant

\displaystyle  \sigma_{\mathrm{ent}}[X] := \exp( {\bf H}(X_1+X_2) - {\bf H}(X) )

of a finitely-valued random variable {X} in {G}, where {X_1,X_2} are independent copies of {X} and {{\bf H}} denotes Shannon entropy. There is also an analogue of the Ruzsa distance

\displaystyle  d(A,B) := \log \frac{|A-B|}{|A|^{1/2} |B|^{1/2}}

between two finite non-empty subsets {A,B} of {G}, namely the entropic Ruzsa distance

\displaystyle  d_{\mathrm{ent}}(X,Y) := {\bf H}(X'-Y') - \frac{1}{2} {\bf H}(X) - \frac{1}{2} {\bf H}(Y)

where {X',Y'} are independent copies of {X,Y} respectively. (Actually, one thing we show in our paper is that the independence hypothesis can be dropped, and this only affects the entropic Ruzsa distance by a factor of three at worst.) Many of the results about sumsets and Ruzsa distance have entropic analogues, but the entropic versions are slightly better behaved; for instance, we have a contraction property

\displaystyle  d_{\mathrm{ent}}(\phi(X),\phi(Y)) \leq d_{\mathrm{ent}}(X,Y)

whenever {\phi: G \rightarrow H} is a homomorphism. In fact we have a refinement of this inequality in which the gap between these two quantities can be used to control the entropic distance between “fibers” of {X,Y} (in which one conditions {\phi(X)} and {\phi(Y)} to be fixed). On the other hand, there are direct connections between the combinatorial and entropic sumset quantities. For instance, if {U_A} is a random variable drawn uniformly from {A}, then

\displaystyle  \sigma_{\mathrm{ent}}[U_A] \leq \sigma[A].

Thus if {A} has small doubling, then {U_A} has small entropic doubling. In the converse direction, if {X} has small entropic doubling, then {X} is close (in entropic Ruzsa distance) to a uniform random variable {U_S} drawn from a set {S} of small doubling. A version of this statement was proven in an old paper of mine, but here we establish a quantitatively efficient version, by rewriting the entropic Ruzsa distance in terms of certain Kullback-Leibler divergences.
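All of these entropic quantities are easy to compute for explicitly given finitely-supported random variables. The following Python sketch (an illustration of the definitions above, not code from the paper) computes the entropic doubling constant and the entropic Ruzsa distance, taking the distance with differences to match the combinatorial {|A-B|} (over {{\bf F}_2} addition and subtraction coincide), and spot-checks both the contraction property for the homomorphism {x \mapsto x \hbox{ mod } 2} and the inequality {\sigma_{\mathrm{ent}}[U_A] \leq \sigma[A]} on random subsets of {{\bf Z}}.

```python
import math
import random
from collections import Counter

def entropy(dist):
    """Shannon entropy (in nats) of a distribution {value: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def op_dist(dx, dy, op):
    """Distribution of op(X, Y) for independent X ~ dx, Y ~ dy."""
    out = Counter()
    for x, p in dx.items():
        for y, q in dy.items():
            out[op(x, y)] += p * q
    return out

def push_forward(dist, f):
    """Distribution of f(X) for X ~ dist."""
    out = Counter()
    for x, p in dist.items():
        out[f(x)] += p
    return out

def sigma_ent(dx, add=lambda a, b: a + b):
    """Entropic doubling constant sigma_ent[X] = exp(H(X1 + X2) - H(X))."""
    return math.exp(entropy(op_dist(dx, dx, add)) - entropy(dx))

def d_ent(dx, dy, sub=lambda a, b: a - b):
    """Entropic Ruzsa distance H(X' - Y') - H(X)/2 - H(Y)/2, with X', Y'
    independent copies of X, Y; pass `sub` to work in other groups."""
    return (entropy(op_dist(dx, dy, sub))
            - 0.5 * entropy(dx) - 0.5 * entropy(dy))

def uniform_on(A):
    return {a: 1.0 / len(A) for a in A}

# Spot-check sigma_ent[U_A] <= sigma[A] and the contraction property
# d_ent(phi(X), phi(Y)) <= d_ent(X, Y) for phi(x) = x mod 2.
rng = random.Random(0)
for _ in range(20):
    A = rng.sample(range(100), rng.randint(2, 10))
    B = rng.sample(range(100), rng.randint(2, 10))
    UA, UB = uniform_on(A), uniform_on(B)
    assert sigma_ent(UA) <= len({a + b for a in A for b in A}) / len(A) + 1e-9
    phiA = push_forward(UA, lambda x: x % 2)
    phiB = push_forward(UB, lambda x: x % 2)
    assert d_ent(phiA, phiB, sub=lambda a, b: (a - b) % 2) <= d_ent(UA, UB) + 1e-9
```

For example, for {U_A} uniform on the progression {\{0,1,2,3\}} one finds {\sigma_{\mathrm{ent}}[U_A] \approx 1.58}, comfortably below the combinatorial doubling {\sigma[A] = 7/4}.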

Our first main result is a “99% inverse theorem” for entropic Ruzsa distance: if {d_{\mathrm{ent}}(X,Y)} is sufficiently small, then there exists a finite subgroup {H} of {G} such that

\displaystyle  d_{\mathrm{ent}}(X,U_H), d_{\mathrm{ent}}(Y,U_H) \leq 12 d_{\mathrm{ent}}(X,Y). \ \ \ \ \ (1)

This result uses the results just mentioned to relate {X,Y} to a set {S} of small doubling, which can then be related to a subgroup {H} by standard inverse theorems; this gives a weak version of (1) (roughly speaking losing a square root in the bound), and some additional analysis is needed to bootstrap this initial estimate back to (1).

We now sketch how these tools are used to prove our main theorem. For (i), we reduce matters to establishing the following bilinear entropic analogue: given two non-empty finite subsets {A,B} of {{\bf Z}^D}, one can find subsets {A' \subset A}, {B' \subset B} with

\displaystyle  |A'| |B'| \geq e^{-C d_{\mathrm{ent}}(U_A, U_B)} |A| |B|

such that {A', B'} have skew-dimension at most {C d_{\mathrm{ent}}(U_A, U_B)}, for some absolute constant {C}. This can be shown by an induction on {|A||B|} (say). One applies a non-trivial coordinate projection {\pi: {\bf Z}^D \rightarrow {\bf Z}} to {A,B}. If {\pi(U_A)} and {\pi(U_B)} are very close in entropic Ruzsa distance, then the 99% inverse theorem shows that these random variables must each concentrate at a point (because {{\bf Z}} has no non-trivial finite subgroups), and one can then pass to the fibers over these points and apply the induction hypothesis. If instead {\pi(U_A)} and {\pi(U_B)} are far apart, then by the behavior of entropy under projections one can show that the fibers of {A} and {B} under {\pi} are on average closer to each other in entropic Ruzsa distance than {A} and {B} themselves are, and one can again proceed using the induction hypothesis.

For parts (ii) and (iii), we first use an entropic version of an observation of Manners that sets of small doubling in {{\bf Z}^D} must be irregularly distributed modulo {2}. A clean formulation of this in entropic language is the inequality

\displaystyle  d_{\mathrm{ent}}(X, 2Y) \leq 5 d_{\mathrm{ent}}(X,Y)

whenever {X,Y} take values in a torsion-free abelian group such as {{\bf Z}^D}; this turns out to follow from two applications of the entropy submodularity inequality. One corollary of this (and the behavior of entropy under projections) is that

\displaystyle  {\bf H}( X \hbox{ mod } 2 ), {\bf H}( Y \hbox{ mod } 2 ) \leq 10 d_{\mathrm{ent}}(X,Y).

This is the key link between the {{\bf Z}^D} and {{\bf F}_2^D} worlds that is used to prove (ii), (iii): while (iii) relies on the still unproven PFR conjecture over {{\bf F}_2}, (ii) uses the unconditional progress on PFR by Konyagin, as detailed in this survey of Sanders. The argument has a similar inductive structure to that used to establish (i) (and if one is willing to replace {5/3+o(1)} by {2} then the argument is in fact relatively straightforward and does not need any deep partial results on the PFR).
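Since both displayed inequalities are unconditional, they can be spot-checked numerically. The sketch below (again just an illustration, not code from the paper) draws random finitely-supported distributions on {{\bf Z}} and tests {d_{\mathrm{ent}}(X, 2Y) \leq 5 d_{\mathrm{ent}}(X,Y)} and {{\bf H}(X \hbox{ mod } 2) \leq 10 d_{\mathrm{ent}}(X,Y)} directly.

```python
import math
import random
from collections import Counter

def entropy(dist):
    """Shannon entropy (in nats) of a distribution {value: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def op_dist(dx, dy, op):
    """Distribution of op(X, Y) for independent X ~ dx, Y ~ dy."""
    out = Counter()
    for x, p in dx.items():
        for y, q in dy.items():
            out[op(x, y)] += p * q
    return out

def d_ent(dx, dy):
    """Entropic Ruzsa distance H(X' - Y') - H(X)/2 - H(Y)/2 over Z."""
    return (entropy(op_dist(dx, dy, lambda a, b: a - b))
            - 0.5 * entropy(dx) - 0.5 * entropy(dy))

def random_dist(rng, size=6, span=20):
    """A random distribution supported on `size` points of {0, ..., span-1}."""
    support = rng.sample(range(span), size)
    weights = [rng.random() for _ in support]
    total = sum(weights)
    return {s: w / total for s, w in zip(support, weights)}

rng = random.Random(1)
for _ in range(50):
    dx, dy = random_dist(rng), random_dist(rng)
    dy_doubled = {2 * y: p for y, p in dy.items()}   # distribution of 2Y
    dx_mod2 = Counter()
    for x, p in dx.items():
        dx_mod2[x % 2] += p                          # distribution of X mod 2
    assert d_ent(dx, dy_doubled) <= 5 * d_ent(dx, dy) + 1e-9
    assert entropy(dx_mod2) <= 10 * d_ent(dx, dy) + 1e-9
```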

As one byproduct of our analysis we also obtain an appealing entropic reformulation of Conjecture 2, namely that if {X} is an {{\bf F}_2^D}-valued random variable then there exists a subspace {H} of {{\bf F}_2^D} such that

\displaystyle  d_{\mathrm{ent}}(X, U_H) \ll \sigma_{\mathrm{ent}}[X].

Right now the best result in this direction is

\displaystyle  d_{\mathrm{ent}}(X, U_H) \ll_\varepsilon \sigma_{\mathrm{ent}}[X] + \sigma_{\mathrm{ent}}^{3+\varepsilon}[X]

for any {\varepsilon > 0}, by using Konyagin’s partial result towards the PFR.
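For orientation, the degenerate case of this reformulation can be checked directly (an illustration only, not code from the paper): if {X} is exactly uniform on a subspace {H} of {{\bf F}_2^D}, then {X_1+X_2} is again uniform on {H}, so {\sigma_{\mathrm{ent}}[X] = 1} and {d_{\mathrm{ent}}(X, U_H) = 0}.

```python
import math
from collections import Counter

def entropy(dist):
    """Shannon entropy (in nats) of a distribution {value: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def xor_dist(dx, dy):
    """Distribution of X + Y in F_2^D (bitwise XOR) for independent X, Y."""
    out = Counter()
    for x, p in dx.items():
        for y, q in dy.items():
            out[x ^ y] += p * q
    return out

def sigma_ent(dx):
    """Entropic doubling constant over F_2^D."""
    return math.exp(entropy(xor_dist(dx, dx)) - entropy(dx))

def d_ent(dx, dy):
    """Entropic Ruzsa distance over F_2^D (addition equals subtraction)."""
    return entropy(xor_dist(dx, dy)) - 0.5 * entropy(dx) - 0.5 * entropy(dy)

# H = span{001, 010} inside F_2^3, with U_H uniform on H.
H = [0b000, 0b001, 0b010, 0b011]
U_H = {h: 1.0 / len(H) for h in H}
```

Here the sum of two independent copies of {U_H} is again uniform on {H}, so both the entropic doubling and the entropic distance to {U_H} collapse to their minimal values.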