Ben Green, Freddie Manners and I have just uploaded to the arXiv our preprint “Sumsets and entropy revisited”. This paper uses entropy methods to attack the Polynomial Freiman-Ruzsa (PFR) conjecture, which we study in the following two forms:

Conjecture 1 (Weak PFR over {{\bf Z}}) Let {A \subset {\bf Z}^D} be a finite non-empty set whose doubling constant {\sigma[A] := |A+A|/|A|} is at most {K}. Then there is a subset {A'} of {A} of density {\gg K^{-O(1)}} that has affine dimension {O(\log K)} (i.e., it is contained in an affine space of dimension {O(\log K)}).

Conjecture 2 (PFR over {{\bf F}_2}) Let {A \subset {\bf F}_2^D} be a non-empty set whose doubling constant {\sigma[A]} is at most {K}. Then {A} can be covered by {O(K^{O(1)})} cosets of a subspace of cardinality at most {|A|}.
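To fix ideas, the doubling constant in these conjectures can be computed directly for small examples. The following Python sketch (a toy illustration of mine, not code from the paper) computes {\sigma[A]} over {{\bf F}_2^D}, encoding elements as integer bitmasks with bitwise XOR as the group addition; a subspace has doubling constant exactly {1}.

```python
def doubling_constant_f2(A):
    """Combinatorial doubling constant sigma[A] = |A+A|/|A| over F_2^D,
    with elements encoded as integer bitmasks and addition given by XOR."""
    A = set(A)
    sumset = {a ^ b for a in A for b in A}
    return len(sumset) / len(A)

# A subspace of F_2^3 (the span of 001 and 010) is closed under addition,
# so its sumset is itself and its doubling constant is exactly 1.
subspace = {0b000, 0b001, 0b010, 0b011}

# Four "independent" points have a larger sumset (seven elements here).
generic = {0b000, 0b001, 0b010, 0b100}
```

For the subspace one gets {\sigma[A] = 1}, while the generic four-point set has sumset of size seven and hence {\sigma[A] = 7/4}.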

Our main results are then as follows.

Theorem 3 If {A \subset {\bf Z}^D} with {\sigma[A] \leq K}, then
  • (i) There is a subset {A'} of {A} of density {\gg K^{-O(1)}} of “skew-dimension” (or “query complexity”) {O(\log K)}.
  • (ii) There is a subset {A'} of {A} of density {\gg \exp( -O(\log^{5/3+o(1)} K) )} of affine dimension {O(\log K)} (where {o(1)} goes to zero as {K \rightarrow \infty}).
  • (iii) If Conjecture 2 holds, then there is a subset {A'} of {A} of density {\gg K^{-O(1)}} of affine dimension {O(\log K)}. In other words, Conjecture 2 implies Conjecture 1.

The skew-dimension of a set is a quantity smaller than the affine dimension which is defined recursively; the precise definition is given in the paper, but suffice it to say that singleton sets have skew-dimension {0}, and a set {A \subset {\bf Z}^{D_1} \times {\bf Z}^{D_2}} whose projection to {{\bf Z}^{D_1}} has skew-dimension at most {d_1}, and whose fibers in {\{x\} \times {\bf Z}^{D_2}} have skew-dimension at most {d_2} for each {x}, will have skew-dimension at most {d_1+d_2}. (In fact, the skew-dimension is basically the largest quantity which obeys all of these properties.)
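To make the recursion concrete, here is a toy Python sketch (an illustration based only on the two properties just stated, not code from the paper) that computes an upper bound on the skew-dimension of a finite subset of {{\bf Z}^D} by always splitting off the first coordinate. The actual definition in the paper is more flexible about how coordinates are grouped, so this should only be trusted as an upper bound.

```python
def skew_dim_upper_bound(A):
    """Upper bound on the skew-dimension of a finite set A of integer tuples
    (all of the same length D >= 1), obtained by splitting off the first
    coordinate: the bound for A is the bound for the projection plus the
    worst bound over the fibers."""
    A = set(A)
    if len(A) == 1:
        return 0  # singleton sets have skew-dimension 0
    D = len(next(iter(A)))
    if D == 1:
        # a non-singleton subset of Z has affine (hence skew) dimension 1
        return 1
    projection = {a[0] for a in A}
    fibers = {}
    for a in A:
        fibers.setdefault(a[0], set()).add(a[1:])
    d1 = 0 if len(projection) == 1 else 1
    d2 = max(skew_dim_upper_bound(f) for f in fibers.values())
    return d1 + d2
```

For instance the diagonal {\{(0,0),(1,1),(2,2)\}} receives the bound {1} (the fibers are singletons), while the grid {\{0,1\}^2} receives the bound {2}.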

Part (i) of this theorem was implicitly proven by Pálvölgyi and Zhelezov by a different method. Part (ii) with {5/3+o(1)} replaced by {2} was established by Manners. To our knowledge, part (iii) is completely new.

Our proof strategy is to establish these results of combinatorial additive combinatorics by using entropic additive combinatorics, in which we replace sets {A} with random variables {X}, and cardinality with (the exponential of) Shannon entropy. This allows us to take advantage of some superior features of entropic additive combinatorics, most notably its good behavior with respect to homomorphisms.

For instance, the analogue of the combinatorial doubling constant {\sigma[A] := |A+A|/|A|} of a finite non-empty subset {A} of an abelian group {G}, is the entropy doubling constant

\displaystyle  \sigma_{\mathrm{ent}}[X] := \exp( {\bf H}(X_1+X_2) - {\bf H}(X) )

of a finitely-valued random variable {X} in {G}, where {X_1,X_2} are independent copies of {X} and {{\bf H}} denotes Shannon entropy. There is also an analogue of the Ruzsa distance

\displaystyle  d(A,B) := \log \frac{|A-B|}{|A|^{1/2} |B|^{1/2}}

between two finite non-empty subsets {A,B} of {G}, namely the entropic Ruzsa distance

\displaystyle  d_{\mathrm{ent}}(X,Y) := {\bf H}(X'-Y') - \frac{1}{2} {\bf H}(X) - \frac{1}{2} {\bf H}(Y)

where {X',Y'} are independent copies of {X,Y} respectively. (Actually, one thing we show in our paper is that the independence hypothesis can be dropped, and this only affects the entropic Ruzsa distance by a factor of three at worst.) Many of the results about sumsets and Ruzsa distance have entropic analogues, but the entropic versions are slightly better behaved; for instance, we have a contraction property

\displaystyle  d_{\mathrm{ent}}(\phi(X),\phi(Y)) \leq d_{\mathrm{ent}}(X,Y)

whenever {\phi: G \rightarrow H} is a homomorphism. In fact we have a refinement of this inequality in which the gap between these two quantities can be used to control the entropic distance between “fibers” of {X,Y} (in which one conditions {\phi(X)} and {\phi(Y)} to be fixed). On the other hand, there are direct connections between the combinatorial and entropic sumset quantities. For instance, if {U_A} is a random variable drawn uniformly from {A}, then

\displaystyle  \sigma_{\mathrm{ent}}[U_A] \leq \sigma[A].

Thus if {A} has small doubling, then {U_A} has small entropic doubling. In the converse direction, if {X} has small entropic doubling, then {X} is close (in entropic Ruzsa distance) to a uniform random variable {U_S} drawn from a set {S} of small doubling. A version of this statement was proven in an old paper of mine, but here we establish a quantitatively efficient version, by rewriting the entropic Ruzsa distance in terms of certain Kullback-Leibler divergences.
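All of these entropic quantities are easy to compute for explicitly given finitely-supported random variables. The following Python sketch (an illustration of the definitions above, not code from the paper) computes the entropic doubling constant and the entropic Ruzsa distance, taking the distance with differences to match the combinatorial {|A-B|} (over {{\bf F}_2} addition and subtraction coincide), and spot-checks both the contraction property for the homomorphism {x \mapsto x \hbox{ mod } 2} and the inequality {\sigma_{\mathrm{ent}}[U_A] \leq \sigma[A]} on random subsets of {{\bf Z}}.

```python
import math
import random
from collections import Counter

def entropy(dist):
    """Shannon entropy (in nats) of a distribution {value: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def op_dist(dx, dy, op):
    """Distribution of op(X, Y) for independent X ~ dx, Y ~ dy."""
    out = Counter()
    for x, p in dx.items():
        for y, q in dy.items():
            out[op(x, y)] += p * q
    return out

def push_forward(dist, f):
    """Distribution of f(X) for X ~ dist."""
    out = Counter()
    for x, p in dist.items():
        out[f(x)] += p
    return out

def sigma_ent(dx, add=lambda a, b: a + b):
    """Entropic doubling constant sigma_ent[X] = exp(H(X1 + X2) - H(X))."""
    return math.exp(entropy(op_dist(dx, dx, add)) - entropy(dx))

def d_ent(dx, dy, sub=lambda a, b: a - b):
    """Entropic Ruzsa distance H(X' - Y') - H(X)/2 - H(Y)/2, with X', Y'
    independent copies of X, Y; pass `sub` to work in other groups."""
    return (entropy(op_dist(dx, dy, sub))
            - 0.5 * entropy(dx) - 0.5 * entropy(dy))

def uniform_on(A):
    return {a: 1.0 / len(A) for a in A}

# Spot-check sigma_ent[U_A] <= sigma[A] and the contraction property
# d_ent(phi(X), phi(Y)) <= d_ent(X, Y) for phi(x) = x mod 2.
rng = random.Random(0)
for _ in range(20):
    A = rng.sample(range(100), rng.randint(2, 10))
    B = rng.sample(range(100), rng.randint(2, 10))
    UA, UB = uniform_on(A), uniform_on(B)
    assert sigma_ent(UA) <= len({a + b for a in A for b in A}) / len(A) + 1e-9
    phiA = push_forward(UA, lambda x: x % 2)
    phiB = push_forward(UB, lambda x: x % 2)
    assert d_ent(phiA, phiB, sub=lambda a, b: (a - b) % 2) <= d_ent(UA, UB) + 1e-9
```

For example, for {U_A} uniform on the progression {\{0,1,2,3\}} one finds {\sigma_{\mathrm{ent}}[U_A] \approx 1.58}, comfortably below the combinatorial doubling {\sigma[A] = 7/4}.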

Our first main result is a “99% inverse theorem” for entropic Ruzsa distance: if {d_{\mathrm{ent}}(X,Y)} is sufficiently small, then there exists a finite subgroup {H} of {G} such that

\displaystyle  d_{\mathrm{ent}}(X,U_H), d_{\mathrm{ent}}(Y,U_H) \leq 12 d_{\mathrm{ent}}(X,Y). \ \ \ \ \ (1)

This result uses the results just mentioned to relate {X,Y} to a set {S} of small doubling, which can then be related to a subgroup {H} by standard inverse theorems; this gives a weak version of (1) (roughly speaking losing a square root in the bound), and some additional analysis is needed to bootstrap this initial estimate back to (1).

We now sketch how these tools are used to prove our main theorem. For (i), we reduce matters to establishing the following bilinear entropic analogue: given two non-empty finite subsets {A,B} of {{\bf Z}^D}, one can find subsets {A' \subset A}, {B' \subset B} with

\displaystyle  |A'| |B'| \geq e^{-C d_{\mathrm{ent}}(U_A, U_B)} |A| |B|

such that {A', B'} have skew-dimension at most {C d_{\mathrm{ent}}(U_A, U_B)}, for some absolute constant {C}. This can be shown by an induction on {|A||B|} (say). One applies a non-trivial coordinate projection {\pi: {\bf Z}^D \rightarrow {\bf Z}} to {A,B}. If {\pi(U_A)} and {\pi(U_B)} are very close in entropic Ruzsa distance, then the 99% inverse theorem shows that these random variables must each concentrate at a point (because {{\bf Z}} has no non-trivial finite subgroups), and one can then pass to the fibers over these points and apply the induction hypothesis. If instead {\pi(U_A)} and {\pi(U_B)} are far apart, then by the behavior of entropy under projections one can show that the fibers of {A} and {B} under {\pi} are on average closer to each other in entropic Ruzsa distance than {A} and {B} themselves are, and one can again proceed using the induction hypothesis.

For parts (ii) and (iii), we first use an entropic version of an observation of Manners that sets of small doubling in {{\bf Z}^D} must be irregularly distributed modulo {2}. A clean formulation of this in entropic language is the inequality

\displaystyle  d_{\mathrm{ent}}(X, 2Y) \leq 5 d_{\mathrm{ent}}(X,Y)

whenever {X,Y} take values in a torsion-free abelian group such as {{\bf Z}^D}; this turns out to follow from two applications of the entropy submodularity inequality. One corollary of this (and the behavior of entropy under projections) is that

\displaystyle  {\bf H}( X \hbox{ mod } 2 ), {\bf H}( Y \hbox{ mod } 2 ) \leq 10 d_{\mathrm{ent}}(X,Y).

This is the key link between the {{\bf Z}^D} and {{\bf F}_2^D} worlds that is used to prove (ii), (iii): while (iii) relies on the still unproven PFR conjecture over {{\bf F}_2}, (ii) uses the unconditional progress on PFR by Konyagin, as detailed in this survey of Sanders. The argument has a similar inductive structure to that used to establish (i) (and if one is willing to replace {5/3+o(1)} by {2} then the argument is in fact relatively straightforward and does not need any deep partial results on the PFR).
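Since both displayed inequalities are unconditional, they can be spot-checked numerically. The sketch below (again just an illustration, not code from the paper) draws random finitely-supported distributions on {{\bf Z}} and tests {d_{\mathrm{ent}}(X, 2Y) \leq 5 d_{\mathrm{ent}}(X,Y)} and {{\bf H}(X \hbox{ mod } 2) \leq 10 d_{\mathrm{ent}}(X,Y)} directly.

```python
import math
import random
from collections import Counter

def entropy(dist):
    """Shannon entropy (in nats) of a distribution {value: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def op_dist(dx, dy, op):
    """Distribution of op(X, Y) for independent X ~ dx, Y ~ dy."""
    out = Counter()
    for x, p in dx.items():
        for y, q in dy.items():
            out[op(x, y)] += p * q
    return out

def d_ent(dx, dy):
    """Entropic Ruzsa distance H(X' - Y') - H(X)/2 - H(Y)/2 over Z."""
    return (entropy(op_dist(dx, dy, lambda a, b: a - b))
            - 0.5 * entropy(dx) - 0.5 * entropy(dy))

def random_dist(rng, size=6, span=20):
    """A random distribution supported on `size` points of {0, ..., span-1}."""
    support = rng.sample(range(span), size)
    weights = [rng.random() for _ in support]
    total = sum(weights)
    return {s: w / total for s, w in zip(support, weights)}

rng = random.Random(1)
for _ in range(50):
    dx, dy = random_dist(rng), random_dist(rng)
    dy_doubled = {2 * y: p for y, p in dy.items()}   # distribution of 2Y
    dx_mod2 = Counter()
    for x, p in dx.items():
        dx_mod2[x % 2] += p                          # distribution of X mod 2
    assert d_ent(dx, dy_doubled) <= 5 * d_ent(dx, dy) + 1e-9
    assert entropy(dx_mod2) <= 10 * d_ent(dx, dy) + 1e-9
```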

As one byproduct of our analysis we also obtain an appealing entropic reformulation of Conjecture 2, namely that if {X} is an {{\bf F}_2^D}-valued random variable then there exists a subspace {H} of {{\bf F}_2^D} such that

\displaystyle  d_{\mathrm{ent}}(X, U_H) \ll \sigma_{\mathrm{ent}}[X].

Right now the best result in this direction is

\displaystyle  d_{\mathrm{ent}}(X, U_H) \ll_\varepsilon \sigma_{\mathrm{ent}}[X] + \sigma_{\mathrm{ent}}^{3+\varepsilon}[X]

for any {\varepsilon > 0}, by using Konyagin’s partial result towards the PFR.
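For orientation, the degenerate case of this reformulation can be checked directly (an illustration only, not code from the paper): if {X} is exactly uniform on a subspace {H} of {{\bf F}_2^D}, then {X_1+X_2} is again uniform on {H}, so {\sigma_{\mathrm{ent}}[X] = 1} and {d_{\mathrm{ent}}(X, U_H) = 0}.

```python
import math
from collections import Counter

def entropy(dist):
    """Shannon entropy (in nats) of a distribution {value: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def xor_dist(dx, dy):
    """Distribution of X + Y in F_2^D (bitwise XOR) for independent X, Y."""
    out = Counter()
    for x, p in dx.items():
        for y, q in dy.items():
            out[x ^ y] += p * q
    return out

def sigma_ent(dx):
    """Entropic doubling constant over F_2^D."""
    return math.exp(entropy(xor_dist(dx, dx)) - entropy(dx))

def d_ent(dx, dy):
    """Entropic Ruzsa distance over F_2^D (addition equals subtraction)."""
    return entropy(xor_dist(dx, dy)) - 0.5 * entropy(dx) - 0.5 * entropy(dy)

# H = span{001, 010} inside F_2^3, with U_H uniform on H.
H = [0b000, 0b001, 0b010, 0b011]
U_H = {h: 1.0 / len(H) for h in H}
```

Here the sum of two independent copies of {U_H} is again uniform on {H}, so both the entropic doubling and the entropic distance to {U_H} collapse to their minimal values.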