Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “Marton’s conjecture in abelian groups with bounded torsion“. This paper fully resolves the bounded torsion case of the Polynomial Freiman–Ruzsa conjecture, first proposed by Katalin Marton:
Theorem 1 (Marton’s conjecture) Let $G = (G,+)$ be an abelian $m$-torsion group (thus, $mx = 0$ for all $x \in G$), and let $A \subset G$ be non-empty with $|A+A| \leq K|A|$. Then $A$ can be covered by at most $(2K)^{O(m^3)}$ translates of a subgroup $H$ of $G$ of cardinality at most $|A|$. Moreover, $H$ is contained in $\ell A - \ell A$ for some $\ell \ll (2 + m \log K)^{O(m^3)}$.
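As a quick numerical illustration of the covering statement (purely illustrative, and of course not part of the proof), one can check a tiny $m=2$ example by brute force; the group $(\mathbb{Z}/2)^4$, the set $A$, and the subgroup $H$ in the following sketch are arbitrary choices:

```python
from itertools import product

# Toy illustration of the covering statement in the m = 2 case
# G = (Z/2)^4; the set A and the subgroup H below are arbitrary choices.
n, m = 4, 2

def add(x, y):
    return tuple((a + b) % m for a, b in zip(x, y))

# H = span(e1, e2); A = H ∪ (H + e3) has small doubling.
H = {tuple(v) + (0, 0) for v in product(range(m), repeat=2)}
e3 = (0, 0, 1, 0)
A = H | {add(h, e3) for h in H}

sumset = {add(x, y) for x in A for y in A}
K = len(sumset) / len(A)  # doubling constant |A+A| / |A|

# Greedily cover A by translates of H.
translates = []
uncovered = set(A)
while uncovered:
    x = next(iter(uncovered))
    translates.append(x)
    uncovered -= {add(x, h) for h in H}

print(K, len(translates), len(H) <= len(A))  # 1.0 2 True
```

Here two translates of a subgroup of size $4 \le |A| = 8$ suffice, comfortably within the $(2K)^{O(m^3)}$ bound.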
We had previously established the $m=2$ case of this result, with the number of translates bounded by $2K^{12}$ (which was subsequently improved to $2K^{11}$ by Jyun-Jie Liao), but without the additional containment $H \subset \ell A - \ell A$. It remains a challenge to replace $\ell$ by a bounded constant (such as $2$); this is essentially the “polynomial Bogolyubov conjecture”, which is still open. The result has been formalized in the proof assistant language Lean, as discussed in this previous blog post. As a consequence of this result, many of the applications of the previous theorem may now be extended from characteristic $2$ to higher characteristic.
Our proof techniques are a modification of those in our previous paper, and in particular continue to be based on the theory of Shannon entropy. For inductive purposes, it turns out to be convenient to work with the following version of the conjecture (which, up to $m$-dependent constants, is actually equivalent to the above theorem):
Theorem 2 (Marton’s conjecture, entropy form) Let $G$ be an abelian $m$-torsion group, and let $X_1,\dots,X_m$ be independent finitely supported random variables on $G$, such that
$$\mathbf{H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m \mathbf{H}[X_i] \leq \log K,$$
where $\mathbf{H}$ denotes Shannon entropy. Then there is a uniform random variable $U_H$ on a subgroup $H$ of $G$ such that
$$\frac{1}{m} \sum_{i=1}^m d[X_i; U_H] \ll m^3 \log K,$$
where $d$ denotes the entropic Ruzsa distance (see previous blog post for a definition); furthermore, if all the $X_i$ take values in some symmetric set $S$, then $H$ lies in $\ell S$ for some $\ell \ll (2 + m \log K)^{O(m^3)}$.
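The entropic Ruzsa distance $d[X;Y] = \mathbf{H}[X'-Y'] - \frac{1}{2}\mathbf{H}[X] - \frac{1}{2}\mathbf{H}[Y]$ (with $X', Y'$ independent copies of $X, Y$) is easy to compute exactly for small cyclic groups; the following sketch (group $\mathbb{Z}/6$ and subgroup $\{0,3\}$ chosen arbitrarily) verifies the basic fact that a uniform variable on a subgroup is at distance zero from itself:

```python
import math

def shannon(p):
    """Shannon entropy (in bits) of a distribution {value: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def diff_dist(p, q, m):
    """Law of X' - Y' (mod m) for independent X' ~ p and Y' ~ q on Z/m."""
    r = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x - y) % m
            r[z] = r.get(z, 0.0) + px * qy
    return r

def ruzsa_dist(p, q, m):
    """Entropic Ruzsa distance d[X;Y] = H[X'-Y'] - H[X]/2 - H[Y]/2."""
    return shannon(diff_dist(p, q, m)) - 0.5 * shannon(p) - 0.5 * shannon(q)

# U_H: uniform on the subgroup H = {0, 3} of Z/6.
U_H = {0: 0.5, 3: 0.5}
d_self = ruzsa_dist(U_H, U_H, 6)
print(d_self)  # 0.0 — uniform on a subgroup has zero self-distance
```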
As a first approximation, one should think of all the $X_i$ as identically distributed, and having the uniform distribution on $A$, as this is the case that is actually relevant for implying Theorem 1; however, the recursive nature of the proof of Theorem 2 requires one to manipulate the $X_i$ separately. It also is technically convenient to work with $m$ independent variables, rather than just a pair of variables as we did in the $m=2$ case; this is perhaps the biggest additional technical complication needed to handle higher characteristics.
The strategy, as with the previous paper, is to attempt an entropy decrement argument: to try to locate modifications $X'_1,\dots,X'_m$ of $X_1,\dots,X_m$ that are reasonably close (in Ruzsa distance) to the original random variables, while decrementing the “multidistance”
$$D[X_1;\dots;X_m] := \mathbf{H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m \mathbf{H}[X_i],$$
which turns out to be a convenient metric for progress (for instance, this quantity is non-negative, and vanishes if and only if the $X_i$ are all translates of a uniform random variable $U_H$ on a subgroup $H$). In the previous paper we modified the corresponding functional to minimize by some additional terms in order to improve the exponent $12$, but as we are not attempting to completely optimize the constants, we did not do so in the current paper (and as such, our arguments here give a slightly different way of establishing the $m=2$ case, albeit with somewhat worse exponents).
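Both claimed features of the multidistance $D[X_1;\dots;X_m] = \mathbf{H}[X_1+\dots+X_m] - \frac{1}{m}\sum_i \mathbf{H}[X_i]$ (non-negativity, and vanishing exactly for translates of a uniform variable on a subgroup) can be checked numerically on a small cyclic group; the group $\mathbb{Z}/4$ and the particular shifts below are arbitrary choices for the sketch:

```python
import math, random

def shannon(p):
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def convolve(p, q, m):
    """Law of X + Y (mod m) for independent X ~ p, Y ~ q on Z/m."""
    r = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x + y) % m
            r[z] = r.get(z, 0.0) + px * qy
    return r

def multidist(dists, m):
    """D[X_1;...;X_k] = H[X_1+...+X_k] - (1/k) sum_i H[X_i]."""
    s = {0: 1.0}
    for p in dists:
        s = convolve(s, p, m)
    return shannon(s) - sum(shannon(p) for p in dists) / len(dists)

m = 4
sub = [0, 2]  # the subgroup {0, 2} of Z/4
# Translates of the uniform distribution on the subgroup: multidistance 0.
shifted = [{(h + c) % m: 0.5 for h in sub} for c in (0, 1, 3, 2)]
D0 = multidist(shifted, m)

# A generic tuple of distributions has strictly positive multidistance.
random.seed(0)
def rand_dist():
    w = [random.random() for _ in range(m)]
    return {i: wi / sum(w) for i, wi in enumerate(w)}
D1 = multidist([rand_dist() for _ in range(m)], m)
print(round(D0, 12), D1 > 0)  # 0.0 True
```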
As before, we search for such improved random variables by introducing more independent random variables – we end up taking an $m \times m$ array of random variables $Y_{i,j}$ for $i,j=1,\dots,m$, with each $Y_{i,j}$ a copy of $X_j$, and forming various sums of these variables and conditioning them against other sums. Thanks to the magic of Shannon entropy inequalities, it turns out that it is guaranteed that at least one of these modifications will decrease the multidistance, except in an “endgame” situation in which certain random variables are nearly (conditionally) independent of each other, in the sense that certain conditional mutual informations are small. In particular, in the endgame scenario, the row sums $\sum_j Y_{i,j}$ of our array will end up being close to independent of the column sums $\sum_i Y_{i,j}$, subject to conditioning on the total sum $\sum_{i,j} Y_{i,j}$. Not coincidentally, this type of conditional independence phenomenon also shows up when considering row and column sums of arrays of iid Gaussian random variables, as a specific feature of the Gaussian distribution. It is related to the more familiar observation that if $X, Y$ are two independent copies of a Gaussian random variable, then $X+Y$ and $X-Y$ are also independent of each other.
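That familiar Gaussian fact is easy to probe with a quick Monte Carlo sketch: for iid Gaussians $X, Y$ the pair $(X+Y, X-Y)$ is jointly Gaussian with covariance $\mathrm{Var}(X) - \mathrm{Var}(Y) = 0$, and for jointly Gaussian variables zero covariance implies independence. The sample size and seed below are arbitrary:

```python
import random

# Empirical check that X+Y and X-Y are uncorrelated for iid Gaussians X, Y;
# since (X+Y, X-Y) is jointly Gaussian, this already gives independence.
random.seed(1)
N = 200_000
xs = [random.gauss(0, 1) for _ in range(N)]
ys = [random.gauss(0, 1) for _ in range(N)]
s = [x + y for x, y in zip(xs, ys)]
d = [x - y for x, y in zip(xs, ys)]

mean_s, mean_d = sum(s) / N, sum(d) / N
cov = sum(si * di for si, di in zip(s, d)) / N - mean_s * mean_d
print(abs(cov) < 0.05)  # True: empirical covariance is near zero
```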
Up until now, the argument does not use the $m$-torsion hypothesis, nor the fact that we work with an $m \times m$ array of random variables as opposed to some other shape of array. But now the torsion enters in a key role, via the obvious identity
$$\sum_{i,j} i Y_{i,j} + \sum_{i,j} j Y_{i,j} + \sum_{i,j} (-i-j) Y_{i,j} = 0.$$
In the endgame, any pair of these three random variables are close to independent (after conditioning on the total sum $\sum_{i,j} Y_{i,j}$). Applying some “entropic Ruzsa calculus” (and in particular an entropic version of the Balog–Szemerédi–Gowers inequality), one can then arrive at a new random variable of small entropic doubling that is reasonably close to all of the $X_i$ in Ruzsa distance, which provides the final way to reduce the multidistance.
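The cancellation driving this step is just the coefficient identity $i + j + (-i-j) = 0 \pmod m$: every entry of the array appears in the three weighted sums with total coefficient zero. A throwaway numerical check (with $m = 5$ and a random array over $\mathbb{Z}/m$, both arbitrary choices):

```python
import random

# Sanity check of the identity i + j + (-i - j) = 0 (mod m): for any m x m
# array Y[i][j] with entries in Z/m, the three weighted sums
# sum_{i,j} i*Y, sum_{i,j} j*Y and sum_{i,j} (-i-j)*Y cancel.
random.seed(2)
m = 5
Y = [[random.randrange(m) for _ in range(m)] for _ in range(m)]

def weighted_sum(coeff):
    return sum(coeff(i, j) * Y[i - 1][j - 1]
               for i in range(1, m + 1) for j in range(1, m + 1)) % m

A = weighted_sum(lambda i, j: i)
B = weighted_sum(lambda i, j: j)
C = weighted_sum(lambda i, j: (-i - j) % m)
total = (A + B + C) % m
print(total)  # 0
```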
Besides the polynomial Bogolyubov conjecture mentioned above (which we do not know how to address by entropy methods), the other natural question is to try to develop a characteristic zero version of this theory in order to establish the polynomial Freiman–Ruzsa conjecture over torsion-free groups, which in our language asserts (roughly speaking) that random variables of small entropic doubling are close (in Ruzsa distance) to a discrete Gaussian random variable, with good bounds. The above machinery is consistent with this conjecture, in that it produces lots of independent variables related to the original variable, various linear combinations of which obey the same sort of entropy estimates that Gaussian random variables would exhibit, but what we are missing is a way to get back from these entropy estimates to an assertion that the random variables really are close to Gaussian in some sense. In continuous settings, Gaussians are known to extremize the entropy for a given variance, and of course we have the central limit theorem that shows that averages of random variables typically converge to a Gaussian, but it is not clear how to adapt these phenomena to the discrete Gaussian setting (without the circular reasoning of assuming the polynomial Freiman–Ruzsa conjecture to begin with).
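A toy version of the maximum-entropy phenomenon on a window of $\mathbb{Z}$ can at least be observed numerically: among distributions of (approximately) the same variance, a discrete Gaussian $p(x) \propto e^{-\beta x^2}$ beats simple competitors. This is illustrative only; the window $[-30,30]$, the target variance $4$, and the competitor below are arbitrary choices, and the “discrete Gaussian” relevant to the conjecture is a more flexible, Freiman-homomorphism-invariant notion:

```python
import math

# Among distributions on {-R,...,R} with variance ~ 4, a discrete Gaussian
# p(x) ∝ exp(-beta x^2) has larger entropy than a two-point competitor.
R = 30
support = list(range(-R, R + 1))

def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

def variance(p):
    mean = sum(x * q for x, q in zip(support, p))
    return sum((x - mean) ** 2 * q for x, q in zip(support, p))

def discrete_gaussian(beta):
    w = [math.exp(-beta * x * x) for x in support]
    total = sum(w)
    return [wi / total for wi in w]

# Bisect for beta so that the discrete Gaussian has variance ~ 4.0
# (variance is decreasing in beta).
lo, hi = 1e-4, 10.0
for _ in range(60):
    mid = (lo + hi) / 2
    if variance(discrete_gaussian(mid)) > 4.0:
        lo = mid
    else:
        hi = mid
g = discrete_gaussian(lo)

# Competitor with variance exactly 4: uniform on {-2, +2}.
comp = [0.5 if abs(x) == 2 else 0.0 for x in support]
print(entropy(g) > entropy(comp))  # True
```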
11 comments
4 April, 2024 at 2:12 pm
Anonymous
first again!
4 April, 2024 at 4:26 pm
Anonymous
Trivial typo: “Bogulyubov” should be “Bogolyubov”, I think.
[Corrected, thanks – T.]
4 April, 2024 at 8:35 pm
shacharlovett
In theorem 2, should the bound in the conclusion depend on K? I believe it should be m^3 (log K) instead of m^3
[Corrected, thanks – T.]
4 April, 2024 at 10:05 pm
YI
The URL in the footnote on page 2 is overfull.
[Thanks, this will be fixed in the next revision of the ms. -T]
8 April, 2024 at 11:12 am
Anonymous
I found this explanation of theorem 2 by GPT4 hilarious:
Alright, let’s break down this theorem into more digestible pieces. We’re venturing into a realm of mathematics that deals with groups, randomness, and information theory, but I’ll keep it as simple as possible.
In simpler terms, this theorem tells us about the relationship between groups, randomness, and information, showing under certain conditions, we can predict the average “surprise” of combining random elements from a group with a fair degree of certainty.
9 April, 2024 at 11:01 am
Anonymous
Where can I learn these topics, professor? Are there any particular books?
12 April, 2024 at 1:47 pm
Anonymous
discrete Gaussians also have maximum entropy (of random variables supported on Z) for a given variance.
Is there a succinct way to state what kind of CLT you need?
19 April, 2024 at 1:07 pm
Terence Tao
Roughly speaking, what seems to be needed is an assertion to the effect that if a discrete random variable $X$ behaves like a discrete Gaussian in the sense that all the entropy statistics such as $\mathbf{H}[X_1+X_2]$, $\mathbf{H}[X_1-X_2]$, $\mathbf{H}[X_1+X_2+X_3]$, etc. (with $X_1, X_2, X_3, \dots$ being independent copies of $X$) are almost equal to what they would be if $X$ were indeed a discrete Gaussian, then $X$ is in fact close to such a discrete Gaussian in a Ruzsa distance sense.
For instance, one feature of continuous Gaussian variables is that $X+Y$ and $X-Y$ are independent when $X, Y$ are iid Gaussian; discrete Gaussian variables (that are sufficiently “nondegenerate”) approximately behave this way also. Is there an inverse theorem to this effect? I vaguely recall something of this nature in the continuous setting under enough hypotheses, but it is not clear what the discrete analogue is. (One problem is that the notion of “discrete Gaussian” needs to be invariant under Freiman homomorphisms; in particular, one is not necessarily discretizing with respect to a standard lattice such as $\mathbf{Z}^d$. Among other things, this makes it unlikely that the classical notion of variance will be of much relevance, at least until one has somehow normalized away the Freiman homomorphism invariance.)
19 April, 2024 at 12:47 pm
Two announcements: AI for Math resources, and erdosproblems.com | What's new
[…] I also spoke at the same conference that Thomas spoke at, on my recent work with Gowers, Green, and Manners; here is the video of my talk, and here are my […]
28 April, 2024 at 11:00 am
Anonymous
In appendix C of the preprint, there is a passage where I’m not sure if it’s just a bunch of typos that are confusing me or whether there is some less trivial issue: page 31, line 12, first of all I guess it should say “subspace of ” instead of ““. However, if “” is indeed what it should say, then letting be such a complement of in , we have that for each in there are unique and such that , so in particular and . Now the main thing I was wondering about is how the linear map can be retrieved from this as is claimed. First, let me mention by the way that on lines 13 and 14 of page 31 there are typos with this linear map that make the passage not easy to parse: on line 13, the should be right? And then on line 14 the domain of should be , not , right? But then the main question: as mentioned above, we have a unique expression , but this is unique for in , not for . So how do we define a linear map on when, given , there are actually more than one such that ? Sorry in advance if I missed something and this is all trivial.
28 April, 2024 at 12:30 pm
Anonymous
Oh I see, the fact that is a complement of the “vertical” subspace already implies that is a graph of a function, and this function must then be linear.