Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “On a conjecture of Marton”. This paper establishes a version of the notorious Polynomial Freiman–Ruzsa conjecture (first proposed by Katalin Marton):

Theorem 1 (Polynomial Freiman–Ruzsa conjecture) Let $A \subset {\bf F}_2^n$ be such that $|A+A| \leq K|A|$. Then $A$ can be covered by at most $2K^{12}$ translates of a subspace $H$ of ${\bf F}_2^n$ of cardinality at most $|A|$.
The previous best known result towards this conjecture was by Konyagin (as communicated in this paper of Sanders), who obtained a similar result but with $2K^{12}$ replaced by $\exp(O_\varepsilon(\log^{3+\varepsilon} K))$ for any $\varepsilon > 0$ (assuming, say, that $K$ is bounded away from $1$, to avoid some degeneracies as $K$ approaches $1$, which is not the difficult case of the conjecture). The conjecture (with $12$ replaced by an unspecified constant $C$) has a number of equivalent forms; see this survey of Green, and these papers of Lovett and of Green and myself for some examples; in particular, as discussed in the latter two references, the constants in the inverse $U^3({\bf F}_2^n)$ theorem are now polynomial in nature (although we did not try to optimize the constant).
The exponent $12$ here was the product of a large number of optimizations to the argument (our original exponent was substantially larger), but it can be improved even further with additional effort (our current argument, for instance, allows one to replace it with a slightly smaller non-integer exponent, but we decided to state our result using integer exponents instead).
In this paper we will focus exclusively on the characteristic $2$ case (so we will be cavalier in identifying addition and subtraction), but in a followup paper we will establish similar results in other finite characteristics.
Much of the prior progress on this sort of result has proceeded via Fourier analysis. Perhaps surprisingly, our approach uses no Fourier analysis whatsoever, being conducted instead entirely in “physical space”. Broadly speaking, it follows a natural strategy, which is to induct on the doubling constant $K := |A+A|/|A|$. Indeed, suppose for instance that one could show that every set $A$ of doubling constant $K$ was “commensurate” in some sense to a set $A'$ of doubling constant at most $K^{0.99}$. One measure of commensurability, for instance, might be the Ruzsa distance $d(A,A') := \log \frac{|A+A'|}{|A|^{1/2}|A'|^{1/2}}$, which one might hope to control by $O(\log K)$. Then one could iterate this procedure until the doubling constant dropped below, say, $\frac{3}{2}$, at which point the conjecture is known to hold (there is an elementary argument that if $A$ has doubling constant less than $\frac{3}{2}$, then $A+A$ is in fact a subspace of ${\bf F}_2^n$). One can then use several applications of the Ruzsa triangle inequality

$\displaystyle d(A,C) \leq d(A,B) + d(B,C)$

to relate the original set $A$ to the final, essentially-subspace set, and conclude a covering statement of the type required by Theorem 1.
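Under this hypothetical scheme, the bookkeeping would go as follows. Writing $A = A_0, A_1, A_2, \ldots$ for the sets produced by iterating the commensurability step (so that $A_j$ would have doubling constant at most $K^{0.99^j}$, and $d(A_j, A_{j+1}) = O(\log K^{0.99^j}) = O(0.99^j \log K)$), the triangle inequality would give

$\displaystyle d(A_0, A_m) \leq \sum_{j=0}^{m-1} d(A_j, A_{j+1}) = \sum_{j=0}^{m-1} O(0.99^j \log K) = O(\log K),$

so the original set $A$ would stay within Ruzsa distance $O(\log K)$ of the final, essentially-subspace set $A_m$; distances of this size are exactly what translate (via covering arguments) into polynomial bounds of the type asserted in Theorem 1.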
There are a number of possible ways to try to “improve” a set of not too large doubling by replacing it with a commensurate set of better doubling. We note two particular potential improvements:
- (i) Replacing $A$ with $A+A$. For instance, if $A$ was a random subset (of density $1/K$) of a large subspace $H$ of ${\bf F}_2^n$, then replacing $A$ with $A+A$ usually drops the doubling constant from $K$ down to nearly $1$ (under reasonable choices of parameters).
- (ii) Replacing $A$ with $A \cap (A+h)$ for a “typical” $h \in A+A$. For instance, if $A$ was the union of $K$ random cosets of a subspace $H$ of large codimension, then replacing $A$ with $A \cap (A+h)$ again usually drops the doubling constant from $K$ down to nearly $1$. (Some back-of-the-envelope arithmetic for both of these model examples is sketched just after this list.)
Unfortunately, there are sets where neither of the above two operations (i), (ii) significantly improves the doubling constant. For instance, if $A$ is a random density $1/\sqrt{K}$ subset of $\sqrt{K}$ random translates of a medium-sized subspace $H$, one can check that the doubling constant stays close to $K$ if one applies either operation (i) or operation (ii). But in this case these operations don’t actually worsen the doubling constant much either, and by applying some combination of (i) and (ii) (either intersecting $A+A$ with a translate, or taking a sumset of $A \cap (A+h)$ with itself) one can start lowering the doubling constant again.
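To flesh this out a little (again purely heuristically): write $A = A_1 \cup \dots \cup A_{\sqrt{K}}$, where each $A_i$ is a random density-$1/\sqrt{K}$ subset of a translate $c_i + H$. Then

$\displaystyle |A| \approx |H|, \qquad |A+A| \approx K|H|, \qquad |A \cap (A+h)| \approx \frac{|H|}{K} \quad \hbox{for typical } h \in A+A,$

so both $A+A$ (roughly $K$ essentially full cosets of $H$) and $A \cap (A+h)$ (roughly a density-$1/K$ subset of one or two cosets of $H$) still have doubling constant comparable to $K$; but intersecting $A+A$ with a suitable translate of itself produces an essentially full coset of $H$, while adding $A \cap (A+h)$ to itself essentially fills out cosets of $H$, and either way the doubling constant drops back down to nearly $1$.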
This begins to suggest a potential strategy: show that at least one of the operations (i) or (ii) will improve the doubling constant, or at least not worsen it too much; and in the latter case, perform some more complicated operation to locate the desired doubling constant improvement.
A sign that this strategy might have a chance of working is provided by the following heuristic argument. If $A$ has doubling constant $K$, then the Cartesian product $A \times A$ has doubling constant $K^2$. On the other hand, by using the projection map $s: {\bf F}_2^n \times {\bf F}_2^n \rightarrow {\bf F}_2^n$ defined by $s(x,y) := x+y$, we see that $A \times A$ projects to $A+A$, with fibres being essentially a copy of $A \cap (A+h)$ for the various $h \in A+A$. So, morally, $A \times A$ also behaves like a “skew product” of $A+A$ and the fibres $A \cap (A+h)$, which suggests (non-rigorously) that the doubling constant $K^2$ of $A \times A$ is also something like the doubling constant of $A+A$, times the doubling constant of a typical fibre $A \cap (A+h)$. This would imply that at least one of $A+A$ and $A \cap (A+h)$ would have doubling constant at most $K$, and thus that at least one of operations (i), (ii) would not worsen the doubling constant.
Unfortunately, this argument does not seem to be easily made rigorous using the traditional doubling constant; even the significantly weaker statement that $A+A$ has doubling constant at most $K^2$ is false (see comments for more discussion). However, it turns out (as discussed in this recent paper of myself with Green and Manners) that things are much better in the entropic setting. Here, the analogue of a subset $A$ of ${\bf F}_2^n$ is a random variable $X$ taking values in ${\bf F}_2^n$, and the analogue of the (logarithmic) doubling constant $\log \frac{|A+A|}{|A|}$ is the entropic doubling constant $d[X;X] := {\bf H}[X_1+X_2] - {\bf H}[X]$, where $X_1, X_2$ are independent copies of $X$. (More generally, $d[X;Y] := {\bf H}[X'+Y'] - \frac{1}{2}{\bf H}[X] - \frac{1}{2}{\bf H}[Y]$, with $X', Y'$ independent copies of $X, Y$, is the entropic Ruzsa distance, and there is also a conditional version of this distance.) If $X$ is a random variable taking values in some additive group $G$, and $\pi: G \rightarrow H$ is a homomorphism into another additive group $H$, one then has what we call the fibring inequality

$\displaystyle d[X;X] \geq d[\pi(X);\pi(X)] + d[X|\pi(X);\, X|\pi(X)]$

(the difference between the two sides can in fact be expressed as a certain non-negative conditional mutual information).
Applying this inequality with $X$ replaced by a pair $(X_1,X_2)$ of two independent copies of itself, and using the addition map $(x,y) \mapsto x+y$ for $\pi$, we obtain in particular that

$\displaystyle 2 d[X;X] \geq d[X_1+X_2;\, X_1+X_2] + d[X_1 | X_1+X_2;\, X_1 | X_1+X_2].$
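In particular, since a minimum never exceeds an average, this gives

$\displaystyle \min\Big( d[X_1+X_2;\, X_1+X_2],\; d[X_1 | X_1+X_2;\, X_1 | X_1+X_2] \Big) \leq d[X;X]:$

at least one of the entropic analogues of operation (i) (passing from $X$ to the sum $X_1+X_2$) and operation (ii) (conditioning $X_1$ on the value of $X_1+X_2$) does not increase the entropic doubling constant, which is the rigorous entropic counterpart of the heuristic skew-product argument above.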
A version of this endgame conclusion (roughly speaking, that random variables obeying enough approximate conditional independence relations of this type must be close to uniform distributions on cosets of subgroups) is in fact valid in any characteristic. But in characteristic $2$, we can take advantage of the identity

$\displaystyle (X_1+X_2) + (X_2+X_3) + (X_3+X_1) = 0$

(valid in characteristic two, since each $X_i$ appears twice on the left-hand side).
To deal with the situation where the conditional mutual information is small but not completely zero, we have to use an entropic version of the Balog-Szemeredi-Gowers lemma, but fortunately this was already worked out in an old paper of mine (although in order to optimise the final constant, we ended up using a slight variant of that lemma).
I am planning to formalize this paper in the Lean4 language. Further discussion of this project will take place on this Zulip stream, and the project itself will be held at this Github repository.
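For readers curious what the formalization might look like, here is a rough illustrative sketch of how the statement of Theorem 1 could be rendered in Lean 4 with Mathlib. To be clear, this is only a guess made for exposition and need not match the statement actually used in the repository; in particular the hypothesis `hG` encoding “$G$ is a vector space over ${\bf F}_2$” and the precise way the covering is expressed are choices made for this sketch, and `sorry` stands in for the entire proof.

```lean
import Mathlib

open scoped Pointwise

/-- An illustrative (hypothetical) rendering of Theorem 1 (PFR).  The names and the
exact formulation are guesses for expository purposes, not the project's statement. -/
theorem PFR_sketch {G : Type*} [AddCommGroup G] [Fintype G]
    (hG : ∀ x : G, x + x = 0)                -- G is an elementary abelian 2-group
    {A : Set G} {K : ℝ} (hA : A.Nonempty)
    (hdoub : (Nat.card (A + A) : ℝ) ≤ K * Nat.card A) :  -- doubling hypothesis |A+A| ≤ K|A|
    ∃ (H : AddSubgroup G) (c : Set G),
      (Nat.card c : ℝ) ≤ 2 * K ^ 12 ∧        -- at most 2K^12 translates ...
      Nat.card H ≤ Nat.card A ∧              -- ... of a subgroup of size at most |A| ...
      A ⊆ c + (H : Set G) := by              -- ... whose translates cover A
  sorry
```

One design choice in this sketch is to encode the ${\bf F}_2$ structure by the hypothesis that every element is its own inverse, rather than by a dedicated typeclass; the actual project may well package this differently.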

34 comments
13 November, 2023 at 9:48 am
Anonymous
Congrats on the result. The argument was a nice read.
13 November, 2023 at 10:32 am
Ruzsa distance and Ruzsa entropy warmup
[…] That earlier post would make a good warmup for reading Terence Tao’s latest post On a conjecture of Marton. […]
18 November, 2023 at 3:59 pm
mmshvartsman
Why 3 negative ratings??
13 November, 2023 at 10:57 am
Anonymous
Great read! Very excited to see that this will be formalised in Lean.
13 November, 2023 at 11:25 am
Marton’s “Polynomial Freiman-Ruzsa” Conjecture was Settled by the A-Team. | Combinatorics and more
[…] A subsequent paper of the same team will contain a proof for odd characteristics. (As far as I understand the conjecture over the integers remains a challenge.) Update: For more background and for some ideas of the proof see this post in Terry Tao’s blog. […]
13 November, 2023 at 2:04 pm
Mark Lewko
Congratulations on the remarkable result!
I’m curious what is known about lower bounds on the exponent in Theorem 1.1? In the other direction, given that you say the argument reflects a large number of optimizations, can the argument be substantially simplified if one doesn’t care about the exponent?
13 November, 2023 at 5:42 pm
Anonymous
I actually know very little about lower bounds; maybe someone else knows if there is literature on this. For the entropic version, the “standard” examples of random dense subsets or random dissociated sets show that the constant C=11 cannot be replaced by anything less than 1. By taking n=1 and optimizing the distributions you can do a bit better — by my arithmetic you cannot improve C=11 to anything better than something like 2.34, using P[X_1=0] = P[X_2=0] = 0.89 or so. Presumably a computer search for n>1 could do better, though I have no reason to think the truth is close to our current constant.
Re the optimizations: for the most part they don’t make the argument significantly harder — it’s often a question of just not using wasteful bounds. For instance, when we see terms such as H[X_1]-H[X_2], we can bound this immediately by 2 d[X_1,X_2] — or more efficiently, we can try to cancel it with an H[X_2]-H[X_1] elsewhere.
It is probably a bit easier to see why the argument *deserves to work* by first trying to show something much cruder: for instance, show that given X_1,X_2 with d[X_1,X_2]=k, you can find X_1′, X_2′ such that d[X_1′, X_2′] <= 0.99 k and d[X_1,X_1'] + d[X_2, X_2'] <= 10000 k. The key steps are essentially the same, but you might get a clearer impression of which bounds are truly important, and which only matter for optimization. In particular, many quantities can be put in a bucket called "can be shown to be O(k) by some sequence of 'Ruzsa calculus' inequalities", and if you are willing to take such inequalities on trust, many estimates can be skipped.
14 November, 2023 at 5:07 pm
Terence Tao
I could imagine that the optimal constant for the 99% version of Theorem 1.2 could be worked out exactly. That is, if
for small $\varepsilon$, find the best constant
for which we have
for some subgroup
. We get
(in our previous paper we had
), but in this perturbative situation where
is so close to
already it should be possible to get very precise information, for instance by using the fibring identity to quotient out
and using various perturbative analyses to discard anything that contributes
to the Ruzsa distance. (One obstacle perhaps is that an efficient 99% version of the entropic Balog-Szemeredi-Gowers lemma may be needed; we still lose a constant of 5 or so over the best lower bounds.) A tentative conjecture could then be that the asymptotic constant is actually the global constant for this inequality.
14 November, 2023 at 5:00 pm
Terence Tao
Actually we lucked out somewhat in that the optimizations mostly either kept the length of proof about the same, or increased it very slightly, or in some cases actually decreased the length of the argument (though perhaps at the expense of conceptual clarity). For instance we found it shorter and more quantitatively optimal to run a “show that the minimizer of this functional is well-behaved” type argument, rather than the contrapositive “either there is some way to decrease the functional, or good things happen” style of argument that is more common in additive combinatorics. There are some quantities that we take three or four lines to bound that we could bound in one line with a worse constant (mostly by appealing to off-the-shelf “Ruzsa calculus” inequalities such as the Ruzsa triangle inequality) but they don’t really simplify the global flow of the proof (all the versions of the argument we have involve starting with the fibring identity, analyzing when that doesn’t lead to an immediate win, and then performing an endgame).
15 November, 2023 at 2:00 pm
Mark Lewko
Thanks Terry (and the anonymous co-author) for the thoughtful reply.
The question in your heuristic summary above is a bit curious. I think you’re asking if |A+A| <= k|A| implies |4A| <= k^2 |2A|. If true, this would imply |4A| <= k^3 |A|, which would be an improvement on Plünnecke–Ruzsa, which gives that |4A| <= k^4 |A|. I have always been under the impression that these sorts of Plünnecke–Ruzsa inequalities are sharp, however maybe that isn’t so? In fact it isn’t even immediately clear to me that |4A| <= k^{1.99} |2A| is false. This seems tangential to the entropic approach, but curious if you have sorted any of this out?
15 November, 2023 at 4:18 pm
Terence Tao
As far as I know, the optimal constants for pretty much all of the “Ruzsa calculus” inequalities remain undetermined; the best upper bounds and the best counterexamples usually do not have matching exponents. There were several papers by Ruzsa and others devoted to this topic over the years, but I do not know what the current state of the art is. Plunnecke’s inequality is sharp in some formulations, but not in the one regarding bounding the quadrupling constant by the doubling constant.
16 November, 2023 at 6:27 pm
Anonymous
such inequalities are sharp, up to multiplicative constants. this is just a minor generalization of Theorem 9.5 from some notes of Ruzsa (which Terry mentioned here a while ago: https://mathoverflow.net/questions/284398/optimality-of-the-plünnecke-ruzsa-inequality).
the construction (for fourth sums) is: take
to be the union of
lines each of length
(aligned along different axes) and a box with side lengths
. we have that
,
and
.
for higher iterated sums you essentially do the same. -ZH
18 November, 2023 at 2:05 pm
Terence Tao
Ha, I had forgotten that I had already looked into this question when writing the book with Van in 2006 (and when answering that MathOverflow question from 2017)! I’ll update the discussion in the blog post accordingly.
13 November, 2023 at 4:07 pm
Anonymous
Very very nice!
Do you think that this could have some links to the Katznelson conjecture?
This states that if (F_2)^n is partitioned into K sets A_1,…,A_K, then the union of all of the A_i+A_i contains a codimension O_K(1) subspace. So, if you have a Cayley graph over the Boolean hypercube with chromatic number at most K, then it has a codimension O_K(1) subspace which is independent (one can take the complement of the union of A_i+A_i to be the generating set).
I was wondering if some kind of techniques like this could be applied here. Probably the assumption that the A_i cover the cube could be weakened. If the A_i are all the same Niveau set then it’s false, but I suspect that this is the only obstacle in some sense. So you would move them around, and instead ask that the A_i cover 99% of a subspace or something. The A ∩ (A+h) operation is fine; translating an A_i doesn’t affect anything. But the A+A operation might be bad.
Still, I think it’s worth asking if you can do some set of operations to get some sort of reduction the way you do here.
14 November, 2023 at 12:03 pm
Anonymous
I meant “the A_i cover 99% of ANY subspace”, of course if the A_i cover 99% of a single subspace they could be Niveau set translates.
14 November, 2023 at 5:02 pm
Terence Tao
We’ve had trouble trying to adapt our methods to create subspaces inside sumsets (i.e., Bogolyubov type theorems). It might be possible to create such spaces inside iterated sumsets (involving sums of
or so copies of the sets involved) but that is weaker than what is needed for most applications including this one. My feeling is that we lucked out to some extent that a pure entropy method was able to handle PFR; for these other problems I imagine one would have to augment this method with some other method or idea.
14 November, 2023 at 12:17 am
Anonymous
This is awesome. One of the top questions that I had wanted to see an answer to.
The paper states that there is a future paper that will resolve the case of F_p with odd characteristic? Does it also extend to other finite groups such as the integers modulo a composite integer n?
14 November, 2023 at 3:26 am
Zach Hunter
I am fairly sure that it should also work for groups of bounded exponent. the reason is that there is a reasonable version of “the Endgame” which should be reachable (presumably, as I haven’t seen their exact definition of partite entropy). gun to my head, I bet one could bash out a proof of the bounded exponent case based off this paper within the next month. but I look forward to their presentation of the general argument.
14 November, 2023 at 5:10 pm
Terence Tao
Good question! Our current argument does rely at one key point on the exponent being prime (and in fact on being odd – our more general argument doesn’t directly cover the exponent $2$
case, though it only requires a slight modification here), but there are a lot of different things to try, and it is possible that by being a little more sophisticated one can handle composite exponents as well.
14 November, 2023 at 12:29 am
Zach Hunter
I will mention: an important implicit corollary of your argument is that if
, then
contains a subspace
with
where
with
(actually, using PFR as a blackbox, you can take
or a bit better, but you give a fully combinatorial argument). by a simple (but in my opinion, non-obvious) bootstrapping argument from my Master’s thesis, results of this form are closely related to finding subspaces inside
(i.e., the usual Bogolyubov-Ruzsa-type results).
for example, if you could take
(with the two implicit constants in
potentially changing), then this should imply some exponent improvement to the result of Sanders, which has lasted ~15 years now. and presumably, for each
, we should be able to take
and get such a result (again letting the constants in
change wrt
). this would imply that
contains subspaces
with
for all
.
as you mentioned, the argument presented on arXiv is optimized to minimize the exponent for PFR. if you are aware of older arguments, which allow you to reduce the number of steps you take where
is a “non-fibre step” (meaning that their supports are not contained in the supports of
, but rather their sumsets), do let me know! basically you want to say that
for all of the non-fibre steps from your paper, then there is some reweighting of the support where
and
.
14 November, 2023 at 5:19 pm
Terence Tao
I can certainly believe that our arguments can be combined with other combinatorial bootstrap arguments to give results like the ones you mentioned. Nice observation!
All of the previous (working) iterations of our argument use the same basic strategy: fibre, grab quick wins if possible, if not proceed to endgame. If one doesn’t write the argument in terms of optimizing our functional
but instead proceeds by a more traditional “if this happens, then we get this win; otherwise we do this to get another type of win” type of “distance decrement” argument, then I can certainly imagine that, by tweaking the weights to favor one sort of win over another, one can do better, though we found that our constants tended to degrade when writing the argument in such a form.
18 November, 2023 at 3:46 pm
Formalizing the proof of PFR in Lean4 using Blueprint: a short tour | What's new
[…] the release of my preprint with Tim, Ben, and Freddie proving the Polynomial Freiman-Ruzsa (PFR) conjecture over , I have started a collaborative project […]
26 November, 2023 at 7:08 pm
Jyun-Jie Liao
Congratulations on the amazing result! The proof is so elegant!
By the way, I tried the calculation a bit and I think the exponent can be improved to exactly 11 (if I didn’t do anything wrong). In the endgame, the choice of
is
conditioned on
, for
being a permutation of
, and I think the bound for
can be improved a little bit by the following argument.
First of all, Lemma 5.2 shows that
so by taking
and
we get that
On the other hand, by Lemma 5.1 we have
And another formula in Lemma 5.2 shows that
so by taking
we get
Then apply similar arguments for all 6 choices of ordered pairs in {U,V,W}, and take the average of all the inequalities. The LHS is exactly the average of
. For the RHS, first observe that the terms without
and
is exactly
by Corollary 4.2. For the terms involving
and
, it has been shown in Section 5 and Section 6 that their sum is bounded by
(Here we use the fact that
to simplify the notation a bit.)
is at most
. The maximum of the upper bound for
again occurs at
and is
, so taking any
suffices.
Therefore the average of
3 December, 2023 at 8:09 am
Terence Tao
Very nice! It is great that one can use the fibring identity one last time to squeeze an additional gain in the constants.
I talked with my coauthors and we would like to include this improvement as an appendix to the paper, with acknowledgment to you of course for providing the argument. Would you be agreeable to that?
On a separate note, we have wondered whether it may be feasible to use computer-assisted methods to try to optimize the constant here; the idea would be to write down a large number of entropic quantities and use the Shannon entropy inequalities (and all of their consequences) to generate a huge list of linear inequalities between them, at which point the problem is “just” a linear programming optimization. However, some thought would need to go into avoiding the combinatorial explosion (and in focusing on “useful” inequalities rather than “true, but too weak for applications” inequalities). Making some symmetry assumptions during the search (e.g., that
) may cut down on the complexity somewhat. A related task would be to try to optimize the constants in the entropic Balog-Szemeredi-Gowers inequality.
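(To illustrate, schematically, the shape of the linear programs involved — this is just the standard “Shannon-type inequality” setup, not anything specific to our argument: for jointly distributed random variables $X_1,\dots,X_m$, one introduces a variable $h_S$ for each joint entropy ${\bf H}[(X_i)_{i \in S}]$, $S \subset \{1,\dots,m\}$, imposes the elemental Shannon inequalities

$\displaystyle h_{S \cup \{i\}} + h_{S \cup \{j\}} \geq h_{S \cup \{i,j\}} + h_S, \qquad h_{\{1,\dots,m\}} \geq h_{\{1,\dots,m\} \setminus \{i\}}, \qquad h_{\emptyset} = 0,$

together with whatever further linear constraints the problem supplies (independence of certain variables, one variable being a function of others, symmetry assumptions, and so on), and then asks an LP solver to optimize the relevant linear combination of the $h_S$. Such an LP certifies exactly the bounds obtainable by “Shannon-type” manipulations.)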
3 December, 2023 at 6:14 pm
Jyun-Jie Liao
Of course, it’s my pleasure!
6 December, 2023 at 9:46 am
Anonymous
Does the proof indicate a way to think about constructing H (the covering subgroup)?
15 December, 2023 at 9:57 am
Anonymous
“this paper of Sanders” has no link.
[Corrected, thanks – T.]
18 December, 2023 at 2:44 am
Anonymous
Great result! Does this proof immediately generalize to Z^d so that one can get a bunch of nice applications outlined by Chang in http://www.numdam.org/item/10.1016/j.crma.2009.04.006.pdf?
20 December, 2023 at 10:54 am
Terence Tao
Somewhat weirdly, only half of the proof seems to generalize to the torsion-free setting; one can set up the same entropy decrement strategy, and for random variables that extremize (or almost extremize) a suitable distance-like functional, the “first estimate” and “second estimate” arguments go through and establish various non-trivial approximate conditional independence relations between various linear combinations of copies of the random variables. However the “endgame” step relies very heavily on finite characteristic, and we do not know what to replace it with. One indication that something very different has to be done in this step is that the conclusion is different: in bounded characteristic, the extremizers are expected to be uniformly distributed on cosets of subgroups, but in the torsion-free setting
the extremizer should instead be (discrete) gaussians. There are intriguing hints that gaussian behavior might be detectable via entropic means (for instance, a real (continuous) gaussian $X$ is more or less characterized by the property that $X_1+X_2$ and $X_1-X_2$ are independent when $X_1, X_2$ are independent copies of $X$), but we have not been able to convert this intuition into an actual argument.
8 February, 2024 at 8:57 pm
Marcel Goh
In the appendix, it is stated that Lemma A.2 is related to Lemma 3.3 of your older paper. Does it directly imply (possibly a stronger version of) Lemma 3.3, or perhaps Theorem 3.1 directly? I am not quite sure how to relate the two statements, since the conditioning is slightly different (one is on
and the other is on
). Perhaps I’m missing something obvious.
[The two are not directly comparable, since as you say the conditioning is slightly different, but both are useful in basically the same way in applications, although the version we have in the new paper gives slightly better constants in practice. -T]
9 February, 2024 at 7:34 am
Marcel Goh
Elaborating on my earlier comment, letting
and
be conditionally independent trials of
, I think I can show that
assuming Lemma A.2 as well as (24) and (25) from Theorem 3.1 in the older paper. This is not exactly (28), and I’m not sure how to transform the
into a
, since
gives us a lot more information than
from the coupling $X_1+Y_1 = X_2+Y_2$. (At least I think it does.)
4 April, 2024 at 2:06 pm
Marton’s conjecture in abelian groups with bounded torsion | What's new
[…] had previously established the case of this result, with the number of translates bounded by (which was subsequently […]
22 June, 2024 at 6:05 pm
An abridged proof of Marton’s conjecture | What's new
[…] Timothy Gowers, Ben Green, Freddie Manners, and I were able to establish the following […]