Marton’s conjecture in abelian groups with bounded torsion

4 April, 2024 in math.CO, paper | Tags: additive combinatorics, Ben Green, Freddie Manners, Polynomial Freiman-Ruzsa conjecture, Shannon entropy, Timothy Gowers | by Terence Tao

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “Marton’s conjecture in abelian groups with bounded torsion“. This paper fully resolves a conjecture of Katalin Marton (the bounded torsion case of the Polynomial Freiman–Ruzsa conjecture (first proposed by Katalin Marton):

Theorem 1 (Marton’s conjecture) Let ${G = (G,+)}$ be an abelian ${m}$ -torsion group (thus, ${mx=0}$ for all ${x \in G}$ ), and let ${A \subset G}$ be such that ${|A+A| \leq K|A|}$ . Then ${A}$ can be covered by at most ${(2K)^{O(m^3)}}$ translates of a subgroup ${H}$ of ${G}$ of cardinality at most ${|A|}$ . Moreover, ${H}$ is contained in ${\ell A - \ell A}$ for some ${\ell \ll (2 + m \log K)^{O(m^3 \log m)}}$ .

We had previously established the ${m=2}$ case of this result, with the number of translates bounded by ${(2K)^{12}}$ (which was subsequently improved to ${(2K)^{11}}$ by Jyun-Jie Liao), but without the additional containment ${H \subset \ell A - \ell A}$ . It remains a challenge to replace ${\ell}$ by a bounded constant (such as ${2}$ ); this is essentially the “polynomial Bogolyubov conjecture”, which is still open. The ${m=2}$ result has been formalized in the proof assistant language Lean, as discussed in this previous blog post. As a consequence of this result, many of the applications of the previous theorem may now be extended from characteristic ${2}$ to higher characteristic.
Our proof techniques are a modification of those in our previous paper, and in particular continue to be based on the theory of Shannon entropy. For inductive purposes, it turns out to be convenient to work with the following version of the conjecture (which, up to ${m}$ -dependent constants, is actually equivalent to the above theorem):

Theorem 2 (Marton’s conjecture, entropy form) Let ${G}$ be an abelian ${m}$ -torsion group, and let ${X_1,\dots,X_m}$ be independent finitely supported random variables on ${G}$ , such that

$\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i] \leq \log K,$

where ${{\bf H}}$ denotes Shannon entropy. Then there is a uniform random variable ${U_H}$ on a subgroup ${H}$ of ${G}$ such that

$\displaystyle \frac{1}{m} \sum_{i=1}^m d[X_i; U_H] \ll m^3 \log K,$

where ${d}$ denotes the entropic Ruzsa distance (see previous blog post for a definition); furthermore, if all the ${X_i}$ take values in some symmetric set ${S}$ , then ${H}$ lies in ${\ell S}$ for some ${\ell \ll (2 + \log K)^{O(m^3 \log m)}}$ .

As a first approximation, one should think of all the ${X_i}$ as identically distributed, and having the uniform distribution on ${A}$ , as this is the case that is actually relevant for implying Theorem 1; however, the recursive nature of the proof of Theorem 2 requires one to manipulate the ${X_i}$ separately. It also is technically convenient to work with ${m}$ independent variables, rather than just a pair of variables as we did in the ${m=2}$ case; this is perhaps the biggest additional technical complication needed to handle higher characteristics.
The strategy, as with the previous paper, is to attempt an entropy decrement argument: to try to locate modifications ${X'_1,\dots,X'_m}$ of ${X_1,\dots,X_m}$ that are reasonably close (in Ruzsa distance) to the original random variables, while decrementing the “multidistance”

$\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i]$

which turns out to be a convenient metric for progress (for instance, this quantity is non-negative, and vanishes if and only if the ${X_i}$ are all translates of a uniform random variable ${U_H}$ on a subgroup ${H}$ ). In the previous paper we modified the corresponding functional to minimize by some additional terms in order to improve the exponent ${12}$ , but as we are not attempting to completely optimize the constants, we did not do so in the current paper (and as such, our arguments here give a slightly different way of establishing the ${m=2}$ case, albeit with somewhat worse exponents).
As before, we search for such improved random variables ${X'_1,\dots,X'_m}$ by introducing more independent random variables – we end up taking an array of ${m^2}$ random variables ${Y_{i,j}}$ for ${i,j=1,\dots,m}$ , with each ${Y_{i,j}}$ a copy of ${X_i}$ , and forming various sums of these variables and conditioning them against other sums. Thanks to the magic of Shannon entropy inequalities, it turns out that it is guaranteed that at least one of these modifications will decrease the multidistance, except in an “endgame” situation in which certain random variables are nearly (conditionally) independent of each other, in the sense that certain conditional mutual informations are small. In particular, in the endgame scenario, the row sums ${\sum_j Y_{i,j}}$ of our array will end up being close to independent of the column sums ${\sum_i Y_{i,j}}$ , subject to conditioning on the total sum ${\sum_{i,j} Y_{i,j}}$ . Not coincidentally, this type of conditional independence phenomenon also shows up when considering row and column sums of iid independent gaussian random variables, as a specific feature of the gaussian distribution. It is related to the more familiar observation that if ${X,Y}$ are two independent copies of a Gaussian random variable, then ${X+Y}$ and ${X-Y}$ are also independent of each other.
Up until now, the argument does not use the ${m}$ -torsion hypothesis, nor the fact that we work with an ${m \times m}$ array of random variables as opposed to some other shape of array. But now the torsion enters in a key role, via the obvious identity

$\displaystyle \sum_{i,j} i Y_{i,j} + \sum_{i,j} j Y_{i,j} + \sum_{i,j} (-i-j) Y_{i,j} = 0.$

In the endgame, the any pair of these three random variables are close to independent (after conditioning on the total sum ${\sum_{i,j} Y_{i,j}}$ ). Applying some “entropic Ruzsa calculus” (and in particular an entropic version of the Balog–Szeméredi–Gowers inequality), one can then arrive at a new random variable ${U}$ of small entropic doubling that is reasonably close to all of the ${X_i}$ in Ruzsa distance, which provides the final way to reduce the multidistance.
Besides the polynomial Bogolyubov conjecture mentioned above (which we do not know how to address by entropy methods), the other natural question is to try to develop a characteristic zero version of this theory in order to establish the polynomial Freiman–Ruzsa conjecture over torsion-free groups, which in our language asserts (roughly speaking) that random variables of small entropic doubling are close (in Ruzsa distance) to a discrete Gaussian random variable, with good bounds. The above machinery is consistent with this conjecture, in that it produces lots of independent variables related to the original variable, various linear combinations of which obey the same sort of entropy estimates that gaussian random variables would exhibit, but what we are missing is a way to get back from these entropy estimates to an assertion that the random variables really are close to Gaussian in some sense. In continuous settings, Gaussians are known to extremize the entropy for a given variance, and of course we have the central limit theorem that shows that averages of random variables typically converge to a Gaussian, but it is not clear how to adapt these phenomena to the discrete Gaussian setting (without the circular reasoning of assuming the polynoimal Freiman–Ruzsa conjecture to begin with).

11 comments

Comments feed for this article

4 April, 2024 at 2:12 pm

Anonymous

first again!

4 April, 2024 at 4:26 pm

Anonymous

Trivial typo: “Bogulyubov” should be “Bogolyubov”, I think.

[Corrected, thanks – T.]

4 April, 2024 at 8:35 pm

shacharlovett

In theorem 2, should the bound in the conclusion depend on K? I believe it should be m^3 (log K) instead of m^3

[Corrected, thanks – T.]

4 April, 2024 at 10:05 pm

The URL in the footnote on page 2 is overfull.

[Thanks, this will be fixed in the next revision of the ms. -T]

8 April, 2024 at 11:12 am

Anonymous

I found this explanation of theorem 2 by GPT4 hilarious:

Alright, let’s break down this theorem into more digestible pieces. We’re venturing into a realm of mathematics that deals with groups, randomness, and information theory, but I’ll keep it as simple as possible.

The Setting: Imagine you have a group of objects that follow a specific mathematical structure called an “abelian group.” This group has a special property: if you combine (technically, perform the group operation on) any two elements, you can do it in any order without changing the outcome (like adding numbers). The “m-torsion” part means that if you combine any element with itself (m) times, you’ll end up back where you started, kind of like walking in a circle and ending up at the starting point after (m) steps.
The Characters: In our story, we have (m) random variables, let’s call them “mystery boxes,” (X_1) to (X_m), each containing some element from our group, chosen according to some probability distribution. These boxes are independent of each other, meaning what you find in one box doesn’t affect what you find in another.
The Plot Twist – Shannon Entropy: Shannon entropy (({bf H})) is a way to measure how “surprising” the contents of a mystery box are. If a box always contains the same item, there’s no surprise, and the entropy is low. If you never know what you’ll find, the entropy is high. The equation given compares the surprise of finding the sum of all (m) boxes’ contents to the average surprise of individual boxes, stating it shouldn’t exceed a certain limit ((log K)).
The Hero – Uniform Random Variable: The theorem claims there exists a hero in our story, a uniform random variable, which is like a perfectly fair mystery box ((U_H)) that equally likely contains any element from a subgroup (H) of our original group. This subgroup is a smaller circle you can walk in without stepping outside the larger circle.
The Conclusion: It tells us that, on average, the distance between what we find in our original (m) mystery boxes and what we’d find in our perfectly fair box is not too far, under a specific measure called the entropic Ruzsa distance. This distance measures how different two probability distributions are in a specific entropy-related sense.
The Bonus: If all our original mystery boxes only ever contained items from a specific set (S), then our hero, the subgroup (H), doesn’t stray far from home. It’s contained within a certain number of steps ((ell)) from (S), with (ell) being not too large, depending on (K), (m), and some mathematics magic.

In simpler terms, this theorem tells us about the relationship between groups, randomness, and information, showing under certain conditions, we can predict the average “surprise” of combining random elements from a group with a fair degree of certainty.

9 April, 2024 at 11:01 am

Anonymous

Where can I learn these topics, professor? Are there any particular books?

12 April, 2024 at 1:47 pm

Anonymous

discrete Gaussians also have maximum entropy (of random variables supported on Z) for a given variance.

Is there a succinct way to state what kind of CLT you need?

19 April, 2024 at 1:07 pm

Terence Tao

Roughly speaking, what seems to be needed is an assertion to the effect that if a discrete random variable $X$ behaves like a discrete Gaussian in the sense that all the entropy statistics such as ${\bf H}(X)$ , ${\bf H}(X_1+X_2)$ , ${\bf H}(X_1-X_2)$ , ${\bf H}(X_1+X_2,X_1-X_2)$ etc. (with $X_1,X_2,\dots$ being independent copies of $X$ ) are almost equal to what they would be if $X$ were indeed a discrete Gaussian, then $X$ is in fact close to such a discrete Gaussian in a Ruzsa distance sense.

For instance, one feature of continuous Gaussian variables $X$ is that $X_1+X_2$ and $X_1-X_2$ are independent; discrete Gaussian variables (that are sufficiently “nondegenerate”) approximately behave this way also. Is there an inverse theorem to this effect? I vaguely recall something of this nature in the continuous setting under enough hypotheses, but it is not clear what the discrete analogue is. (One problem is that the notion of “discrete Gaussian” needs to be invariant under Freiman homomorphisms; in particular, one is not necessarily discretizing with respect to a standard lattice such as ${\bf Z}^d$ . Among other things, this makes it unlikely that the classical notion of variance will be of much relevance, at least until one has somehow normalized away the Freiman homomorphism invariance.)

19 April, 2024 at 12:47 pm

Two announcements: AI for Math resources, and erdosproblems.com | What's new

[…] I also spoke at the same conference that Thomas spoke at, on my recent work with Gowers, Green, and Manners; here is the video of my talk, and here are my […]

28 April, 2024 at 11:00 am

Anonymous

In appendix C of the preprint, there is a passage where I’m not sure if it’s just a bunch of typos that are confusing me or whether there is some less trivial issue: page 31, line 12, first of all I guess it should say “subspace of $0 \times H_1$ ” instead of “ $0 \times H_0$ “. However, if “ $0 \times H_1$ ” is indeed what it should say, then letting $W$ be such a complement of $0 \times H_1$ in $H$ , we have that for each $(x,y)$ in $H$ there are unique $w=(w_1,w_2)\in W$ and $z \in H_1$ such that $(x,y)=(0,z)+w$ , so in particular $w_1=x$ and $z=y-w_2$ . Now the main thing I was wondering about is how the linear map $M_0$ can be retrieved from this as is claimed. First, let me mention by the way that on lines 13 and 14 of page 31 there are typos with this linear map that make the passage not easy to parse: on line 13, the $Mx$ should be $M_0x$ right? And then on line 14 the domain of $M_0$ should be $H_0$ , not $H_1$ , right? But then the main question: as mentioned above, we have a unique expression $(x,y)=(0,z)+w$ , but this is unique for $(x,y)$ in $H$ , not for $x \in H_0$ . So how do we define a linear map $M_0$ on $H_0$ when, given $x \in H_0$ , there are actually more than one $y$ such that $(x,y) \in H$ ? Sorry in advance if I missed something and this is all trivial.

28 April, 2024 at 12:30 pm

Anonymous

Oh I see, the fact that $W$ is a complement of the “vertical” subspace $0 \times H_1$ already implies that $W$ is a graph of a function, and this function must then be linear.

	Anonymous on Erratum for “An inverse…
	Anonymous on Erratum for “An inverse…
	Anonymous on 275A, Notes 3: The weak and st…
	Anonymous on It ought to be common knowledg…
	Ring Theory Intervie… on Reading seminar: “Stable…
	Anonymous on Work hard
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Erratum for “An inverse…
	Anonymous on Erratum for “An inverse…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…

Marton’s conjecture in abelian groups with bounded torsion

Recent Comments

Articles by others

Diversions

Mathematics

Selected articles

Software

The sciences

Top Posts

Archives

Categories

The Polymath Blog

11 comments

Leave a comment Cancel reply

For commenters

Marton’s conjecture in abelian groups with bounded torsion

Share this:

Recent Comments

Articles by others

Diversions

Mathematics

Selected articles

Software

The sciences

Top Posts

Archives

Categories

The Polymath Blog

11 comments

Leave a comment Cancel reply

For commenters