You are currently browsing the category archive for the ‘paper’ category.

The purpose of this post is to report an erratum to the 2012 paper “An inverse theorem for the Gowers {U^{s+1}[N]}-norm” of Ben Green, myself, and Tamar Ziegler (previously discussed in this blog post). The main results of this paper have been superseded by stronger quantitative results, first in work of Manners (using somewhat different methods), and more recently in a remarkable paper of Leng, Sah, and Sawhney which combined the methods of our paper with several new innovations to obtain quite strong bounds (of quasipolynomial type); see also an alternate proof of our main results (again by quite different methods) by Candela and Szegedy. In the course of their work, Leng, Sah, and Sawhney discovered some fixable but nontrivial errors in our paper. These (rather technical) issues were already implicitly corrected in their followup work, which supersedes our own paper, but for the sake of completeness we are also providing a formal erratum for our original paper, which can be found here. We thank Leng, Sah, and Sawhney for bringing these issues to our attention.

Excluding some minor (mostly typographical) issues which we have also reported in this erratum, the main issues stemmed from a conflation of two notions of a degree {s} filtration

\displaystyle  G = G_0 \geq G_1 \geq \dots \geq G_s \geq G_{s+1} = \{1\}

of a group {G}, which is a nested sequence of subgroups that obey the relation {[G_i,G_j] \leq G_{i+j}} for all {i,j}. The weaker notion (sometimes known as a prefiltration) permits the group {G_1} to be strictly smaller than {G_0}, while the stronger notion requires {G_0} and {G_1} to be equal. In practice, one can often move between the two concepts, as {G_1} is always normal in {G_0}, and a prefiltration behaves like a filtration on every coset of {G_1} (after applying a translation and perhaps also a conjugation). However, we did not clarify this issue sufficiently in the paper, and there are some places in the text where results that were only proven for filtrations were applied to prefiltrations. The erratum fixes these issues, mostly by clarifying that we work with filtrations throughout (which requires some decomposition into cosets in places where prefiltrations are generated). Similar adjustments need to be made for multidegree filtrations and degree-rank filtrations, which we also use heavily in our paper.

In most cases, fixing this issue only required minor changes to the text, but there is one place (Section 8) where there was a non-trivial problem: we used the claim that the final group {G_s} was central (i.e., contained in the centre of {G}), which is true for filtrations, but not necessarily for prefiltrations. This fact (or more precisely, a multidegree variant of it) was used to claim a factorization for a certain product of nilcharacters, which is in fact not true as stated. In the erratum, a substitute factorization for a slightly different product of nilcharacters is provided, which is still sufficient to conclude the main result of this part of the paper (namely, a statistical linearization of a certain family of nilcharacters in the shift parameter {h}).

Again, we stress that these issues do not impact the paper of Leng, Sah, and Sawhney, as they adapted the methods in our paper in a fashion that avoids these errors.

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “Marton’s conjecture in abelian groups with bounded torsion“. This paper fully resolves a conjecture of Katalin Marton, namely the bounded torsion case of the Polynomial Freiman–Ruzsa conjecture:

Theorem 1 (Marton’s conjecture) Let {G = (G,+)} be an abelian {m}-torsion group (thus, {mx=0} for all {x \in G}), and let {A \subset G} be such that {|A+A| \leq K|A|}. Then {A} can be covered by at most {(2K)^{O(m^3)}} translates of a subgroup {H} of {G} of cardinality at most {|A|}. Moreover, {H} is contained in {\ell A - \ell A} for some {\ell \ll (2 + m \log K)^{O(m^3 \log m)}}.

We had previously established the {m=2} case of this result, with the number of translates bounded by {(2K)^{12}} (which was subsequently improved to {(2K)^{11}} by Jyun-Jie Liao), but without the additional containment {H \subset \ell A - \ell A}. It remains a challenge to replace {\ell} by a bounded constant (such as {2}); this is essentially the “polynomial Bogolyubov conjecture”, which is still open. The {m=2} result has been formalized in the proof assistant language Lean, as discussed in this previous blog post. As a consequence of this result, many of the applications of the previous theorem may now be extended from characteristic {2} to higher characteristic.
Our proof techniques are a modification of those in our previous paper, and in particular continue to be based on the theory of Shannon entropy. For inductive purposes, it turns out to be convenient to work with the following version of the conjecture (which, up to {m}-dependent constants, is actually equivalent to the above theorem):

Theorem 2 (Marton’s conjecture, entropy form) Let {G} be an abelian {m}-torsion group, and let {X_1,\dots,X_m} be independent finitely supported random variables on {G}, such that

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i] \leq \log K,

where {{\bf H}} denotes Shannon entropy. Then there is a uniform random variable {U_H} on a subgroup {H} of {G} such that

\displaystyle \frac{1}{m} \sum_{i=1}^m d[X_i; U_H] \ll m^3 \log K,

where {d} denotes the entropic Ruzsa distance (see previous blog post for a definition); furthermore, if all the {X_i} take values in some symmetric set {S}, then {H} lies in {\ell S} for some {\ell \ll (2 + \log K)^{O(m^3 \log m)}}.

As a first approximation, one should think of all the {X_i} as identically distributed, and having the uniform distribution on {A}, as this is the case that is actually relevant for implying Theorem 1; however, the recursive nature of the proof of Theorem 2 requires one to manipulate the {X_i} separately. It also is technically convenient to work with {m} independent variables, rather than just a pair of variables as we did in the {m=2} case; this is perhaps the biggest additional technical complication needed to handle higher characteristics.
The strategy, as with the previous paper, is to attempt an entropy decrement argument: to try to locate modifications {X'_1,\dots,X'_m} of {X_1,\dots,X_m} that are reasonably close (in Ruzsa distance) to the original random variables, while decrementing the “multidistance”

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i]

which turns out to be a convenient metric for progress (for instance, this quantity is non-negative, and vanishes if and only if the {X_i} are all translates of a uniform random variable {U_H} on a subgroup {H}). In the previous paper we modified the functional to be minimized by adding some additional terms, in order to improve the exponent {12}; as we are not attempting to completely optimize the constants here, we did not do so in the current paper (and as such, our arguments here give a slightly different way of establishing the {m=2} case, albeit with somewhat worse exponents).
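
As an aside, the multidistance is easy to experiment with numerically. The following minimal sketch (not from the paper; the group {{\bf Z}/4{\bf Z}}, the test distributions, and the helper names are arbitrary choices of ours) checks the two properties just mentioned: non-negativity, and vanishing when the {X_i} are translates of a uniform random variable on a subgroup.

```python
import numpy as np

def convolve(p, q, m):
    # distribution of X+Y on Z/mZ for independent X ~ p, Y ~ q
    r = np.zeros(m)
    for a in range(m):
        for b in range(m):
            r[(a + b) % m] += p[a] * q[b]
    return r

def entropy(p):
    p = p[p > 1e-12]
    return -np.sum(p * np.log(p))

def multidistance(dists, m):
    # H[X_1 + ... + X_m] - (1/m) * sum_i H[X_i]
    s = dists[0]
    for p in dists[1:]:
        s = convolve(s, p, m)
    return entropy(s) - np.mean([entropy(p) for p in dists])

m = 4
rng = np.random.default_rng(0)

# generic finitely supported distributions: the multidistance is non-negative (and typically positive)
random_dists = [rng.dirichlet(np.ones(m)) for _ in range(m)]
print(multidistance(random_dists, m) >= 0)  # True

# translates of the uniform variable on the subgroup H = {0, 2} of Z/4Z: the multidistance vanishes
unif_H = np.array([0.5, 0.0, 0.5, 0.0])
translates = [np.roll(unif_H, i) for i in range(m)]
print(abs(multidistance(translates, m)) < 1e-12)  # True
```
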
As before, we search for such improved random variables {X'_1,\dots,X'_m} by introducing more independent random variables – we end up taking an array of {m^2} random variables {Y_{i,j}} for {i,j=1,\dots,m}, with each {Y_{i,j}} a copy of {X_i}, and forming various sums of these variables and conditioning them against other sums. Thanks to the magic of Shannon entropy inequalities, it turns out that at least one of these modifications is guaranteed to decrease the multidistance, except in an “endgame” situation in which certain random variables are nearly (conditionally) independent of each other, in the sense that certain conditional mutual informations are small. In particular, in the endgame scenario, the row sums {\sum_j Y_{i,j}} of our array will end up being close to independent of the column sums {\sum_i Y_{i,j}}, subject to conditioning on the total sum {\sum_{i,j} Y_{i,j}}. Not coincidentally, this type of conditional independence phenomenon also shows up when considering row and column sums of iid gaussian random variables, as a specific feature of the gaussian distribution. It is related to the more familiar observation that if {X,Y} are two independent copies of a Gaussian random variable, then {X+Y} and {X-Y} are also independent of each other.
Up until now, the argument does not use the {m}-torsion hypothesis, nor the fact that we work with an {m \times m} array of random variables as opposed to some other shape of array. But now the torsion enters in a key role, via the obvious identity

\displaystyle \sum_{i,j} i Y_{i,j} + \sum_{i,j} j Y_{i,j} + \sum_{i,j} (-i-j) Y_{i,j} = 0.

In the endgame, any pair of these three random variables is close to being independent (after conditioning on the total sum {\sum_{i,j} Y_{i,j}}). Applying some “entropic Ruzsa calculus” (and in particular an entropic version of the Balog–Szemerédi–Gowers inequality), one can then arrive at a new random variable {U} of small entropic doubling that is reasonably close to all of the {X_i} in Ruzsa distance, which provides the final way to reduce the multidistance.
Besides the polynomial Bogolyubov conjecture mentioned above (which we do not know how to address by entropy methods), the other natural question is to try to develop a characteristic zero version of this theory in order to establish the polynomial Freiman–Ruzsa conjecture over torsion-free groups, which in our language asserts (roughly speaking) that random variables of small entropic doubling are close (in Ruzsa distance) to a discrete Gaussian random variable, with good bounds. The above machinery is consistent with this conjecture, in that it produces lots of independent variables related to the original variable, various linear combinations of which obey the same sort of entropy estimates that gaussian random variables would exhibit, but what we are missing is a way to get back from these entropy estimates to an assertion that the random variables really are close to Gaussian in some sense. In continuous settings, Gaussians are known to extremize the entropy for a given variance, and of course we have the central limit theorem that shows that averages of random variables typically converge to a Gaussian, but it is not clear how to adapt these phenomena to the discrete Gaussian setting (without the circular reasoning of assuming the polynomial Freiman–Ruzsa conjecture to begin with).

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “On a conjecture of Marton“. This paper establishes a version of the notorious Polynomial Freiman–Ruzsa conjecture (first proposed by Katalin Marton):

Theorem 1 (Polynomial Freiman–Ruzsa conjecture) Let {A \subset {\bf F}_2^n} be such that {|A+A| \leq K|A|}. Then {A} can be covered by at most {2K^{12}} translates of a subspace {H} of {{\bf F}_2^n} of cardinality at most {|A|}.

The previous best known result towards this conjecture was by Konyagin (as communicated in this paper of Sanders), who obtained a similar result but with {2K^{12}} replaced by {\exp(O_\varepsilon(\log^{3+\varepsilon} K))} for any {\varepsilon>0} (assuming that say {K \geq 3/2} to avoid some degeneracies as {K} approaches {1}, which is not the difficult case of the conjecture). The conjecture (with {12} replaced by an unspecified constant {C}) has a number of equivalent forms; see this survey of Green, and these papers of Lovett and of Green and myself for some examples; in particular, as discussed in the latter two references, the constants in the inverse {U^3({\bf F}_2^n)} theorem are now polynomial in nature (although we did not try to optimize the constant).

The exponent {12} here was the product of a large number of optimizations to the argument (our original exponent here was closer to {1000}), but can be improved even further with additional effort (our current argument, for instance, allows one to replace it with {7+\sqrt{17} = 11.123\dots}, but we decided to state our result using integer exponents instead).

In this paper we will focus exclusively on the characteristic {2} case (so we will be cavalier in identifying addition and subtraction), but in a followup paper we will establish similar results in other finite characteristics.

Much of the prior progress on this sort of result has proceeded via Fourier analysis. Perhaps surprisingly, our approach uses no Fourier analysis whatsoever, being conducted instead entirely in “physical space”. Broadly speaking, it follows a natural strategy, which is to induct on the doubling constant {|A+A|/|A|}. Indeed, suppose for instance that one could show that every set {A} of doubling constant {K} was “commensurate” in some sense to a set {A'} of doubling constant at most {K^{0.99}}. One measure of commensurability, for instance, might be the Ruzsa distance {\log \frac{|A+A'|}{|A|^{1/2} |A'|^{1/2}}}, which one might hope to control by {O(\log K)}. Then one could iterate this procedure until the doubling constant drops below, say, {3/2}, at which point the conjecture is known to hold (there is an elementary argument that if {A} has doubling constant less than {3/2}, then {A+A} is in fact a subspace of {{\bf F}_2^n}). One can then use several applications of the Ruzsa triangle inequality

\displaystyle  \log \frac{|A+C|}{|A|^{1/2} |C|^{1/2}} \leq \log \frac{|A+B|}{|A|^{1/2} |B|^{1/2}} + \log \frac{|B+C|}{|B|^{1/2} |C|^{1/2}}

to conclude (the fact that we reduce {K} to {K^{0.99}} means that the various Ruzsa distances that need to be summed are controlled by a convergent geometric series).
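
As a quick illustration (not from the paper; the choices of {n}, the set sizes, and the helper names are arbitrary), the Ruzsa distance and its triangle inequality can be checked numerically for random subsets of {{\bf F}_2^n}, encoding group elements as bitmasks.

```python
import math, random

def sumset(A, B):
    # addition in F_2^n is bitwise XOR (so A+B and A-B coincide)
    return {a ^ b for a in A for b in B}

def ruzsa_dist(A, B):
    return math.log(len(sumset(A, B)) / math.sqrt(len(A) * len(B)))

random.seed(0)
n = 10
A, B, C = (set(random.sample(range(2 ** n), 40)) for _ in range(3))

# triangle inequality: d(A,C) <= d(A,B) + d(B,C)
print(ruzsa_dist(A, C) <= ruzsa_dist(A, B) + ruzsa_dist(B, C) + 1e-9)  # True
```
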

There are a number of possible ways to try to “improve” a set {A} of not too large doubling by replacing it with a commensurate set of better doubling. We note two particular potential improvements:

  • (i) Replacing {A} with {A+A}. For instance, if {A} was a random subset (of density {1/K}) of a large subspace {H} of {{\bf F}_2^n}, then replacing {A} with {A+A} usually drops the doubling constant from {K} down to nearly {1} (under reasonable choices of parameters).
  • (ii) Replacing {A} with {A \cap (A+h)} for a “typical” {h \in A+A}. For instance, if {A} was the union of {K} random cosets of a subspace {H} of large codimension, then replacing {A} with {A \cap (A+h)} again usually drops the doubling constant from {K} down to nearly {1}.

Unfortunately, there are sets {A} where neither of the above two operations (i), (ii) significantly improves the doubling constant. For instance, if {A} is a random density {1/\sqrt{K}} subset of {\sqrt{K}} random translates of a medium-sized subspace {H}, one can check that the doubling constant stays close to {K} if one applies either operation (i) or operation (ii). But in this case these operations don’t actually worsen the doubling constant much either, and by applying some combination of (i) and (ii) (either intersecting {A+A} with a translate, or taking a sumset of {A \cap (A+h)} with itself) one can start lowering the doubling constant again.
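
The following rough numerical sketch (illustrative only; the dimensions, densities, seeds, and helper names are arbitrary choices of ours) compares the doubling constants before and after applying operations (i) and (ii) in the two model situations described above, again working in {{\bf F}_2^n} with elements encoded as bitmasks.

```python
import random

def sumset(A, B):
    return {a ^ b for a in A for b in B}  # addition in F_2^n is XOR

def doubling(A):
    return len(sumset(A, A)) / len(A)

random.seed(1)

# (i): a random density-1/8 subset A of a subspace H (here H = the 256 bitmasks on the low 8 bits)
A = set(random.sample(range(2 ** 8), 32))
print(doubling(A), doubling(sumset(A, A)))  # roughly 8, then close to 1

# (ii): a union A of 8 random cosets of a small subspace V (the 16 bitmasks on the low 4 bits)
cosets = random.sample(range(2 ** 12), 8)
A = {c ^ v for c in cosets for v in range(2 ** 4)}
h = cosets[0] ^ cosets[1]          # a typical element of A+A (sum of two of the defining cosets)
A_h = A & {a ^ h for a in A}
print(doubling(A), doubling(A_h))  # a doubling constant comparable to 8, versus close to 1
```
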

This begins to suggest a potential strategy: show that at least one of the operations (i) or (ii) will improve the doubling constant, or at least not worsen it too much; and in the latter case, perform some more complicated operation to locate the desired doubling constant improvement.

A sign that this strategy might have a chance of working is provided by the following heuristic argument. If {A} has doubling constant {K}, then the Cartesian product {A \times A} has doubling constant {K^2}. On the other hand, by using the projection map {\pi: {\bf F}_2^n \times {\bf F}_2^n \rightarrow {\bf F}_2^n} defined by {\pi(x,y) := x+y}, we see that {A \times A} projects to {\pi(A \times A) = A+A}, with fibres {\pi^{-1}(\{h\})} being essentially a copy of {A \cap (A+h)}. So, morally, {A \times A} also behaves like a “skew product” of {A+A} and the fibres {A \cap (A+h)}, which suggests (non-rigorously) that the doubling constant {K^2} of {A \times A} is also something like the doubling constant of {A + A}, times the doubling constant of a typical fibre {A \cap (A+h)}. This would imply that at least one of {A +A} and {A \cap (A+h)} would have doubling constant at most {K}, and thus that at least one of operations (i), (ii) would not worsen the doubling constant.

Unfortunately, this argument does not seem to be easily made rigorous using the traditional doubling constant; even the significantly weaker statement that {A+A} has doubling constant at most {K^2} is false (see comments for more discussion). However, it turns out (as discussed in this recent paper of myself with Green and Manners) that things are much better. Here, the analogue of a subset {A} in {{\bf F}_2^n} is a random variable {X} taking values in {{\bf F}_2^n}, and the analogue of the (logarithmic) doubling constant {\log \frac{|A+A|}{|A|}} is the entropic doubling constant {d(X;X) := {\bf H}(X_1+X_2)-{\bf H}(X)}, where {X_1,X_2} are independent copies of {X}. If {X} is a random variable in some additive group {G} and {\pi: G \rightarrow H} is a homomorphism, one then has what we call the fibring inequality

\displaystyle  d(X;X) \geq d(\pi(X);\pi(X)) + d(X|\pi(X); X|\pi(X)),

where the conditional doubling constant {d(X|\pi(X); X|\pi(X))} is defined as

\displaystyle  d(X|\pi(X); X|\pi(X)) = {\bf H}(X_1 + X_2 | \pi(X_1), \pi(X_2)) - {\bf H}( X | \pi(X) ).

Indeed, from the chain rule for Shannon entropy one has

\displaystyle  {\bf H}(X) = {\bf H}(\pi(X)) + {\bf H}(X|\pi(X))

and

\displaystyle  {\bf H}(X_1+X_2) = {\bf H}(\pi(X_1) + \pi(X_2)) + {\bf H}(X_1 + X_2|\pi(X_1) + \pi(X_2))

while from the non-negativity of conditional mutual information one has

\displaystyle  {\bf H}(X_1 + X_2|\pi(X_1) + \pi(X_2)) \geq {\bf H}(X_1 + X_2|\pi(X_1), \pi(X_2))

and it is an easy matter to combine all these identities and inequalities to obtain the claim.
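
Spelling this step out: subtracting the first chain rule identity from the second and then inserting the conditional mutual information bound gives

\displaystyle  d(X;X) = {\bf H}(X_1+X_2) - {\bf H}(X)

\displaystyle  = [{\bf H}(\pi(X_1) + \pi(X_2)) - {\bf H}(\pi(X))] + [{\bf H}(X_1 + X_2|\pi(X_1) + \pi(X_2)) - {\bf H}(X|\pi(X))]

\displaystyle  \geq [{\bf H}(\pi(X_1) + \pi(X_2)) - {\bf H}(\pi(X))] + [{\bf H}(X_1 + X_2|\pi(X_1), \pi(X_2)) - {\bf H}(X|\pi(X))]

\displaystyle  = d(\pi(X);\pi(X)) + d(X|\pi(X); X|\pi(X)),

which is exactly the fibring inequality.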

Applying this inequality with {X} replaced by two independent copies {(X_1,X_2)} of itself, and using the addition map {(x,y) \mapsto x+y} for {\pi}, we obtain in particular that

\displaystyle  2 d(X;X) \geq d(X_1+X_2; X_1+X_2) + d(X_1,X_2|X_1+X_2; X_1,X_2|X_1+X_2)

or (since {X_2} is determined by {X_1} once one fixes {X_1+X_2})

\displaystyle  2 d(X;X) \geq d(X_1+X_2; X_1+X_2) + d(X_1|X_1+X_2; X_1|X_1+X_2).

So if {d(X;X) = \log K}, then at least one of {d(X_1+X_2; X_1+X_2)} or {d(X_1|X_1+X_2; X_1|X_1+X_2)} will be less than or equal to {\log K}. This is the entropy analogue of at least one of (i) or (ii) improving, or at least not degrading the doubling constant, although there are some minor technicalities involving how one deals with the conditioning to {X_1+X_2} in the second term {d(X_1|X_1+X_2; X_1|X_1+X_2)} that we will gloss over here (one can pigeonhole the instances of {X_1} to different events {X_1+X_2=x}, {X_1+X_2=x'}, and “depolarise” the induction hypothesis to deal with distances {d(X;Y)} between pairs of random variables {X,Y} that do not necessarily have the same distribution). Furthermore, we can even calculate the defect in the above inequality: a careful inspection of the above argument eventually reveals that

\displaystyle  2 d(X;X) = d(X_1+X_2; X_1+X_2) + d(X_1|X_1+X_2; X_1|X_1+X_2)

\displaystyle  + {\bf I}( X_1 + X_2 : X_1 + X_3 | X_1 + X_2 + X_3 + X_4)

where we now take four independent copies {X_1,X_2,X_3,X_4}. This leads (modulo some technicalities) to the following interesting conclusion: if neither (i) nor (ii) leads to an improvement in the entropic doubling constant, then {X_1+X_2} and {X_2+X_3} are conditionally independent relative to {X_1+X_2+X_3+X_4}. This situation (or an approximation to this situation) is what we refer to in the paper as the “endgame”.
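
This identity can also be confirmed by brute force on a small group; the following sketch (not from the paper; the group {{\bf Z}/5{\bf Z}}, the distribution, and the helper names are arbitrary choices of ours) computes the relevant entropies by enumerating {G^4} and checks that the two sides agree.

```python
import itertools, math
from collections import defaultdict

m = 5                               # work in the cyclic group Z/5Z
p = [0.4, 0.3, 0.1, 0.1, 0.1]       # an arbitrary distribution for X

def joint_entropy(features):
    # entropy of a tuple of functions of (x1, x2, x3, x4), the x_i being iid copies of X
    dist = defaultdict(float)
    for x in itertools.product(range(m), repeat=4):
        w = p[x[0]] * p[x[1]] * p[x[2]] * p[x[3]]
        dist[tuple(f(x) % m for f in features)] += w
    return -sum(w * math.log(w) for w in dist.values() if w > 0)

s12 = lambda x: x[0] + x[1]         # X_1 + X_2
s13 = lambda x: x[0] + x[2]         # X_1 + X_3
s34 = lambda x: x[2] + x[3]         # X_3 + X_4
tot = lambda x: sum(x)              # X_1 + X_2 + X_3 + X_4
x1 = lambda x: x[0]                 # X_1

d_XX = joint_entropy([s12]) - joint_entropy([x1])
term1 = joint_entropy([tot]) - joint_entropy([s12])                       # d(X_1+X_2; X_1+X_2)
term2 = (joint_entropy([s13, s12, s34]) - joint_entropy([s12, s34])) \
      - (joint_entropy([x1, s12]) - joint_entropy([s12]))                 # d(X_1|X_1+X_2; X_1|X_1+X_2)
term3 = joint_entropy([s12, tot]) + joint_entropy([s13, tot]) \
      - joint_entropy([s12, s13, tot]) - joint_entropy([tot])             # I(X_1+X_2 : X_1+X_3 | total sum)
print(abs(2 * d_XX - (term1 + term2 + term3)) < 1e-9)  # True
```
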

A version of this endgame conclusion is in fact valid in any characteristic. But in characteristic {2}, we can take advantage of the identity

\displaystyle  (X_1+X_2) + (X_2+X_3) = X_1 + X_3.

Conditioning on {X_1+X_2+X_3+X_4}, and using symmetry, we now conclude that if we are in the endgame exactly (so that the mutual information is zero), then the independent sum of two copies of {(X_1+X_2|X_1+X_2+X_3+X_4)} has exactly the same distribution as a single such copy; in particular, the entropic doubling constant here is zero, which is certainly a reduction in the doubling constant.

To deal with the situation where the conditional mutual information is small but not completely zero, we have to use an entropic version of the Balog–Szemerédi–Gowers lemma, but fortunately this was already worked out in an old paper of mine (although in order to optimise the final constant, we ended up using a slight variant of that lemma).

I am planning to formalize this paper in the Lean4 language. Further discussion of this project will take place on this Zulip stream, and the project itself will be hosted at this GitHub repository.

I have just uploaded to the arXiv my paper “A Maclaurin type inequality“. This paper concerns a variant of the Maclaurin inequality for the elementary symmetric means

\displaystyle  s_k(y) := \frac{1}{\binom{n}{k}} \sum_{1 \leq i_1 < \dots < i_k \leq n} y_{i_1} \dots y_{i_k}

of {n} real numbers {y_1,\dots,y_n}. This inequality asserts that

\displaystyle  s_\ell(y)^{1/\ell} \leq s_k(y)^{1/k}

whenever {1 \leq k \leq \ell \leq n} and {y_1,\dots,y_n} are non-negative. It can be proven as a consequence of the Newton inequality

\displaystyle  s_{k-1}(y) s_{k+1}(y) \leq s_k(y)^2

valid for all {1 \leq k < n} and arbitrary real {y_1,\dots,y_n} (in particular, here the {y_i} are allowed to be negative). Note that the {k=1, n=2} case of this inequality is just the arithmetic mean-geometric mean inequality

\displaystyle  y_1 y_2 \leq (\frac{y_1+y_2}{2})^2;

the general case of this inequality can be deduced from this special case by a number of standard manipulations (the most non-obvious of which is the operation of differentiating the real-rooted polynomial {\prod_{i=1}^n (z-y_i)} to obtain another real-rooted polynomial, thanks to Rolle’s theorem; the key point is that this operation preserves all the elementary symmetric means up to {s_{n-1}}). One can think of Maclaurin’s inequality as providing a refined version of the arithmetic mean-geometric mean inequality on {n} variables (which corresponds to the case {k=1}, {\ell=n}).

Whereas Newton’s inequality works for arbitrary real {y_i}, the Maclaurin inequality breaks down once one or more of the {y_i} are permitted to be negative. A key example occurs when {n} is even, half of the {y_i} are equal to {+1}, and half are equal to {-1}. Here, one can verify that the elementary symmetric means {s_k(y)} vanish for odd {k} and are equal to { (-1)^{k/2} \frac{\binom{n/2}{k/2}}{\binom{n}{k}}} for even {k}. In particular, some routine estimation then gives the order of magnitude bound

\displaystyle  |s_k(y)|^{\frac{1}{k}} \asymp \frac{k^{1/2}}{n^{1/2}} \ \ \ \ \ (1)

for {0 < k \leq n} even, thus giving a significant violation of the Maclaurin inequality even after putting absolute values around the {s_k(y)}. In particular, vanishing of one {s_k(y)} does not imply vanishing of all subsequent {s_\ell(y)}.
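
Both the Maclaurin inequality and the computation in this example are easy to reproduce numerically; the following sketch (not from the paper; the test vectors and helper names are arbitrary) reads off the {s_k(y)} from the coefficients of {\prod_i (1 + y_i t)}.

```python
import math
import numpy as np

def sym_means(y):
    # s_k(y) = e_k(y) / binom(n,k), read off from the coefficients of prod_i (1 + y_i t)
    n = len(y)
    coeffs = np.array([1.0])
    for yi in y:
        coeffs = np.convolve(coeffs, [1.0, yi])
    return [coeffs[k] / math.comb(n, k) for k in range(n + 1)]

# Maclaurin: s_l^{1/l} <= s_k^{1/k} for non-negative inputs and k <= l
y = [0.3, 1.7, 2.0, 0.8, 1.1, 4.2]
s = sym_means(y)
print(all(s[l] ** (1 / l) <= s[k] ** (1 / k) + 1e-12
          for k in range(1, len(y) + 1) for l in range(k, len(y) + 1)))  # True

# the half +1, half -1 example: s_k vanishes for odd k, equals (-1)^{k/2} binom(n/2,k/2)/binom(n,k) for even k
n = 8
s = sym_means([1] * (n // 2) + [-1] * (n // 2))
print([round(sk, 6) for sk in s])
print([(-1) ** (k // 2) * math.comb(n // 2, k // 2) / math.comb(n, k) if k % 2 == 0 else 0.0
       for k in range(n + 1)])
```
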

On the other hand, it was observed by Gopalan and Yehudayoff that if two consecutive values {s_k(y), s_{k+1}(y)} are small, then this makes all subsequent values {s_\ell(y)} small as well. More precise versions of this statement were subsequently observed by Meka-Reingold-Tal and Doron-Hatami-Hoza, who obtained estimates of the shape

\displaystyle  |s_\ell(y)|^{\frac{1}{\ell}} \ll \ell^{1/2} \max (|s_k(y)|^{\frac{1}{k}}, |s_{k+1}(y)|^{\frac{1}{k+1}}) \ \ \ \ \ (2)

whenever {1 \leq k \leq \ell \leq n} and {y_1,\dots,y_n} are real (but possibly negative). For instance, setting {k=1, \ell=n} we obtain the inequality

\displaystyle  (y_1 \dots y_n)^{1/n} \ll n^{1/2} \max( |s_1(y)|, |s_2(y)|^{1/2})

which can be established by combining the arithmetic mean-geometric mean inequality

\displaystyle  (y_1 \dots y_n)^{2/n} \leq \frac{y_1^2 + \dots + y_n^2}{n}

with the Newton identity

\displaystyle  y_1^2 + \dots + y_n^2 = n^2 s_1(y)^2 - n(n-1) s_2(y).

As with the proof of the Newton inequalities, the general case of (2) can be obtained from this special case after some standard manipulations (including the differentiation operation mentioned previously).

However, if one inspects the bound (2) against the bounds (1) given by the key example, we see a mismatch – the right-hand side of (2) is larger than the left-hand side by a factor of about {k^{1/2}}. The main result of the paper rectifies this by establishing the optimal (up to constants) improvement

\displaystyle  |s_\ell(y)|^{\frac{1}{\ell}} \ll \frac{\ell^{1/2}}{k^{1/2}} \max (|s_k(y)|^{\frac{1}{k}}, |s_{k+1}(y)|^{\frac{1}{k+1}}) \ \ \ \ \ (3)

of (2). This answers a question posed on MathOverflow.

Unlike the previous arguments, we do not rely primarily on the arithmetic mean-geometric mean inequality. Instead, our primary tool is a new inequality

\displaystyle  \sum_{m=0}^\ell \binom{\ell}{m} |s_m(y)| r^m \geq (1+ |s_\ell(y)|^{2/\ell} r^2)^{\ell/2}, \ \ \ \ \ (4)

valid for all {1 \leq \ell \leq n} and {r>0}. Roughly speaking, the bound (3) would follow from (4) by setting {r \asymp (k/\ell)^{1/2} |s_\ell(y)|^{-1/\ell}}, provided that we can show that the {m=k,k+1} terms of the left-hand side dominate the sum in this regime. This can be done, after a technical step of passing to tuples {y} which nearly optimize the required inequality (3).

We sketch the proof of the inequality (4) as follows. One can use some standard manipulations to reduce to the case where {\ell=n} and {|s_n(y)|=1}, and after replacing {r} with {1/r} one is now left with establishing the inequality

\displaystyle  \sum_{m=0}^n \binom{n}{m} |s_m(y)| r^{n-m} \geq (1+r^2)^{n/2}.

Note that equality is attained in the previously discussed example with half of the {y_i} equal to {+1} and the other half equal to {-1}, thanks to the binomial theorem.
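
One can probe the reduced inequality numerically (a rough illustration only, not from the paper; the sample vector and the values of {r} are arbitrary), normalizing the input so that {|s_n(y)|=1}; the equality in the {\pm 1} example is visible as well.

```python
import math, random
import numpy as np

def abs_sym_means(y):
    # |s_m(y)| read off from the coefficients of prod_i (1 + y_i t)
    coeffs = np.array([1.0])
    for yi in y:
        coeffs = np.convolve(coeffs, [1.0, yi])
    return [abs(c) / math.comb(len(y), k) for k, c in enumerate(coeffs)]

def lhs(y, r):
    n = len(y)
    s = abs_sym_means(y)
    return sum(math.comb(n, m) * s[m] * r ** (n - m) for m in range(n + 1))

random.seed(2)
n = 8
y = [random.choice([-1, 1]) * random.uniform(0.2, 2.0) for _ in range(n)]
scale = abs(np.prod(y)) ** (1 / n)
y = [yi / scale for yi in y]  # normalize so that |s_n(y)| = 1
print(all(lhs(y, r) >= (1 + r ** 2) ** (n / 2) - 1e-9 for r in (0.1, 0.5, 1.0, 2.0, 5.0)))  # True

# equality in the half +1, half -1 example, via the binomial theorem
print(lhs([1] * (n // 2) + [-1] * (n // 2), 1.3), (1 + 1.3 ** 2) ** (n / 2))
```
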

To prove this inequality, we consider the polynomial

\displaystyle  \prod_{j=1}^n (z - y_j) = \sum_{m=0}^n (-1)^m \binom{n}{m} s_m(y) z^{n-m}.

Evaluating this polynomial at {ir}, taking absolute values, using the triangle inequality, and then taking logarithms, we conclude that

\displaystyle  \frac{1}{2} \sum_{j=1}^n \log(y_j^2 + r^2) \leq \log(\sum_{m=0}^n \binom{n}{m} |s_m(y)| r^{n-m}).

A convexity argument gives the lower bound

\displaystyle  \log(y_j^2 + r^2) \geq \log(1+r^2) + \frac{2}{1+r^2} \log |y_j|

while the normalization {|s_n(y)|=1} gives

\displaystyle  \sum_{j=1}^n \log |y_j| = 0

and the claim follows.

Rachel Greenfeld and I have just uploaded to the arXiv our paper “Undecidability of translational monotilings“. This is a sequel to our previous paper in which we constructed a translational monotiling {A \oplus F = {\bf Z}^d} of a high-dimensional lattice {{\bf Z}^d} (thus the monotile {F} is a finite set and the translates {a+F}, {a \in A} of {F} partition {{\bf Z}^d}) which was aperiodic (there is no way to “repair” this tiling into a periodic tiling {A' \oplus F = {\bf Z}^d}, in which {A'} is now periodic with respect to a finite index subgroup of {{\bf Z}^d}). This disproved the periodic tiling conjecture of Stein, Grünbaum-Shephard and Lagarias-Wang, which asserted that such aperiodic translational monotilings do not exist. (Compare with the “hat monotile“, which is a recently discovered aperiodic isometric monotile of {{\bf R}^2}, where one is now allowed to use rotations and reflections as well as translations, or the even more recent “spectre monotile“, which is similar except that no reflections are needed.)

One of the motivations of this conjecture was the observation of Hao Wang that if the periodic tiling conjecture were true, then the translational monotiling problem is (algorithmically) decidable: there is a Turing machine which, when given a dimension {d} and a finite subset {F} of {{\bf Z}^d}, can determine in finite time whether {F} can tile {{\bf Z}^d}. This is because if a periodic tiling exists, it can be found by computer search; and if no tiling exists at all, then (by the compactness theorem) there exists some finite subset of {{\bf Z}^d} that cannot be covered by disjoint translates of {F}, and this can also be discovered by computer search. The periodic tiling conjecture asserts that these are the only two possible scenarios, thus giving the decidability.

On the other hand, Wang’s argument is not known to be reversible: the failure of the periodic tiling conjecture does not automatically imply the undecidability of the translational monotiling problem, as it does not rule out the existence of some other algorithm to determine tiling that does not rely on the existence of a periodic tiling. (For instance, even with the newly discovered hat and spectre tiles, it remains an open question whether the isometric monotiling problem for (say) polygons with rational coefficients in {{\bf R}^2} is decidable, with or without reflections.)

The main result of this paper settles this question (with one caveat):

Theorem 1 There does not exist any algorithm which, given a dimension {d}, a periodic subset {E} of {{\bf Z}^d}, and a finite subset {F} of {{\bf Z}^d}, determines in finite time whether there is a translational tiling {A \oplus F = E} of {E} by {F}.

The caveat is that we have to work with periodic subsets {E} of {{\bf Z}^d}, rather than all of {{\bf Z}^d}; we believe this is largely a technical restriction of our method, and it is likely that it can be removed with additional effort and creativity. We also remark that when {d=2}, the periodic tiling conjecture was established by Bhattacharya, and so the problem is decidable in the {d=2} case. It remains open whether the tiling problem is decidable for any fixed value of {d>2} (note in the above result that the dimension {d} is not fixed, but is part of the input).

Because of a well known link between algorithmic undecidability and logical undecidability (also known as logical independence), the main theorem also implies the existence of an (in principle explicitly describable) dimension {d}, periodic subset {E} of {{\bf Z}^d}, and a finite subset {F} of {{\bf Z}^d}, such that the assertion that {F} tiles {E} by translation cannot be proven or disproven in ZFC set theory (assuming of course that this theory is consistent).

As a consequence of our method, we can also replace {{\bf Z}^d} here by “virtually two-dimensional” groups {{\bf Z}^2 \times G_0}, with {G_0} a finite abelian group (which now becomes part of the input, in place of the dimension {d}).

We now describe some of the main ideas of the proof. It is a common technique to show that a given problem is undecidable by demonstrating that some other problem that was already known to be undecidable can be “encoded” within the original problem, so that any algorithm for deciding the original problem would also decide the embedded problem. Accordingly, we will encode the Wang tiling problem as a monotiling problem in {{\bf Z}^d}:

Problem 2 (Wang tiling problem) Given a finite collection {{\mathcal W}} of Wang tiles (unit squares with each side assigned some color from a finite palette), is it possible to tile the plane with translates of these tiles along the standard lattice {{\bf Z}^2}, such that adjacent tiles have matching colors along their common edge?

It is a famous result of Berger that this problem is undecidable. The embedding of this problem into the higher-dimensional translational monotiling problem proceeds through some intermediate problems. Firstly, it is an easy matter to embed the Wang tiling problem into a similar problem which we call the domino problem:

Problem 3 (Domino problem) Given a finite collection {{\mathcal R}_1} (resp. {{\mathcal R}_2}) of horizontal (resp. vertical) dominoes – pairs of adjacent unit squares, each of which is decorated with an element of a finite set {{\mathcal W}} of “pips”, is it possible to assign a pip to each unit square in the standard lattice tiling of {{\bf Z}^2}, such that every horizontal (resp. vertical) pair of squares in this tiling is decorated using a domino from {{\mathcal R}_1} (resp. {{\mathcal R}_2})?

Indeed, one just has to interpret each Wang tile as a separate “pip”, and define the domino sets {{\mathcal R}_1}, {{\mathcal R}_2} to be the pairs of horizontally or vertically adjacent Wang tiles with matching colors along their common edge.

Next, we embed the domino problem into a Sudoku problem:

Problem 4 (Sudoku problem) Given a column width {N}, a digit set {\Sigma}, a collection {{\mathcal S}} of functions {g: \{0,\dots,N-1\} \rightarrow \Sigma}, and an “initial condition” {{\mathcal C}} (which we will not detail here, as it is a little technical), is it possible to assign a digit {F(n,m)} to each cell {(n,m)} in the “Sudoku board” {\{0,1,\dots,N-1\} \times {\bf Z}} such that for any slope {j \in {\bf Z}} and intercept {i \in {\bf Z}}, the digits {n \mapsto F(n,jn+i)} along the line {\{(n,jn+i): 0 \leq n \leq N-1\}} lie in {{\mathcal S}} (and also that {F} obeys the initial condition {{\mathcal C}})?

The most novel part of the paper is the demonstration that the domino problem can indeed be embedded into the Sudoku problem. The embedding of the Sudoku problem into the monotiling problem follows from a modification of the methods in our previous papers, which had also introduced versions of the Sudoku problem, and created a “tiling language” which could be used to “program” various problems, including the Sudoku problem, as monotiling problems.

To encode the domino problem into the Sudoku problem, we need to take a domino function {{\mathcal T}: {\bf Z}^2 \rightarrow {\mathcal W}} (obeying the domino constraints associated to some domino sets {{\mathcal R}_1, {\mathcal R}_2}) and use it to build a Sudoku function {F: \{0,\dots,N-1\} \times {\bf Z} \rightarrow \Sigma} (obeying some Sudoku constraints relating to the domino sets); conversely, every Sudoku function obeying the rules of our Sudoku puzzle has to arise somehow from a domino function. The route to doing so was not immediately obvious, but after a helpful tip from Emmanuel Jeandel, we were able to adapt some ideas of Aanderaa and Lewis, in which certain hierarchical structures were used to encode one problem in another. Here, we interpret hierarchical structure {p}-adically (using two different primes due to the two-dimensionality of the domino problem). The Sudoku function {F} that will exemplify our embedding is then built from {{\mathcal T}} by the formula

\displaystyle  F(n,m) := ( f_{p_1}(m), f_{p_2}(m), {\mathcal T}(\nu_{p_1}(m), \nu_{p_2}(m)) ) \ \ \ \ \ (1)

where {p_1,p_2} are two large distinct primes (for instance one can take {p_1=53}, {p_2=59} for concreteness), {\nu_p(m)} denotes the number of times {p} divides {m}, and {f_p(m) \in {\bf Z}/p{\bf Z} \backslash \{0\}} is the last non-zero digit in the base {p} expansion of {m}:

\displaystyle  f_p(m) := \frac{m}{p^{\nu_p(m)}} \hbox{ mod } p

(with the conventions {\nu_p(0)=+\infty} and {f_p(0)=1}). In the case {p_1=3, p_2=5}, the first component of (1) looks like this:

and a typical instance of the final component {{\mathcal T}(\nu_{p_1}(m), \nu_{p_2}(m))} looks like this:

Amusingly, the decoration here is essentially following the rules of the children’s game “Fizz buzz“.
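
For concreteness, here is a tiny computation (not from the paper; the helper names are ours) of the quantities {\nu_p(m)} and {f_p(m)} appearing in (1), in the illustrative case {p_1=3}, {p_2=5} mentioned above.

```python
def nu(p, m):
    # number of times p divides m (with nu(p, 0) treated as infinity)
    if m == 0:
        return float('inf')
    k = 0
    while m % p == 0:
        m //= p
        k += 1
    return k

def f(p, m):
    # last non-zero digit of m in base p (with f(p, 0) = 1 by convention)
    if m == 0:
        return 1
    return (m // p ** nu(p, m)) % p

p1, p2 = 3, 5
for m in range(1, 16):
    # the pair of valuations is what the domino function T is evaluated at in (1)
    print(m, f(p1, m), f(p2, m), (nu(p1, m), nu(p2, m)))
```
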

To demonstrate the embedding, we thus need to produce a specific Sudoku rule {{\mathcal S}} (as well as a more technical initial condition {{\mathcal C}}, which is basically required to exclude degenerate Sudoku solutions such as a constant solution) that can “capture” the target function (1), in the sense that the only solutions to this specific Sudoku puzzle are given by variants of {F} (e.g., {F} composed with various linear transformations). In our previous paper we were able to build a Sudoku puzzle that could similarly capture either of the first two components {f_{p_1}(m)}, {f_{p_2}(m)} of our target function (1) (up to linear transformations), by a procedure very akin to solving an actual Sudoku puzzle (combined with iterative use of a “Tetris” move in which we eliminate rows of the puzzle that we have fully solved, to focus on the remaining unsolved rows). Our previous paper treated the case when {p} was replaced with a power of {2}, as this was the only case that we know how to embed in a monotiling problem of the entirety of {{\bf Z}^d} (as opposed to a periodic subset {E} of {{\bf Z}^d}), but the analysis is in fact easier when {p} is a large odd prime, instead of a power of {2}. Once the first two components {f_{p_1}(m), f_{p_2}(m)} have been solved for, it is a relatively routine matter to design an additional constraint in the Sudoku rule that then constrains the third component to be of the desired form {{\mathcal T}(\nu_{p_1}(m), \nu_{p_2}(m))}, with {{\mathcal T}} obeying the domino constraints.

I have just uploaded to the arXiv my paper “Monotone non-decreasing sequences of the Euler totient function“. This paper concerns the quantity {M(x)}, defined as the length of the longest subsequence of the numbers from {1} to {x} for which the Euler totient function {\varphi} is non-decreasing. The first few values of {M} are

\displaystyle  1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 12, 12, \dots

(OEIS A365339). For instance, {M(6)=5} because the totient function is non-decreasing on the set {\{1,2,3,4,5\}} or {\{1,2,3,4,6\}}, but not on the set {\{1,2,3,4,5,6\}}.
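
The initial values above are easy to reproduce by brute force (a sketch, not from the paper; the helper names are ours): {M(x)} is the length of the longest non-decreasing subsequence of {\varphi(1),\dots,\varphi(x)}, which can be computed by the standard tails-array method for longest non-decreasing subsequences.

```python
from bisect import bisect_right

def totient(n):
    # Euler totient by trial division
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def M(x):
    # length of the longest non-decreasing subsequence of phi(1), ..., phi(x)
    tails = []
    for n in range(1, x + 1):
        v = totient(n)
        i = bisect_right(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

print([M(x) for x in range(1, 19)])
# [1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 12, 12]
```
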

Since {\varphi(p)=p-1} for any prime {p}, we have {M(x) \geq \pi(x)}, where {\pi(x)} is the prime counting function. Empirically, the primes come quite close to achieving the maximum length {M(x)}; indeed it was conjectured by Pollack, Pomerance, and Treviño, based on numerical evidence, that one had

\displaystyle  M(x) = \pi(x)+64 \ \ \ \ \ (1)

for all {x \geq 31957}; this conjecture is verified up to {x=10^7}. The previous best known upper bound was basically of the form

\displaystyle  M(x) \leq \exp( (C+o(1)) (\log\log\log x)^2 ) \frac{x}{\log x} \ \ \ \ \ (2)

as {x \rightarrow \infty} for an explicit constant {C = 0.81781\dots}, from combining results from the above paper with that of Ford or of Maier-Pomerance. In this paper we obtain the asymptotic

\displaystyle  M(x) = \left( 1 + O \left(\frac{(\log\log x)^5}{\log x}\right) \right) \frac{x}{\log x}

so in particular {M(x) = (1+o(1))\pi(x)}. This answers a question of Erdős, as well as a closely related question of Pollack, Pomerance, and Treviño.

The methods of proof turn out to be mostly elementary (the most advanced result from analytic number theory we need is the prime number theorem with classical error term). The basic idea is to isolate one key prime factor {p} of a given number {1 \leq n \leq x} which has a sizeable influence on the totient function {\varphi(n)}. For instance, for “typical” numbers {n}, one has a factorization

\displaystyle  n = d p_2 p_1

where {p_2} is a medium sized prime, {p_1} is a significantly larger prime, and {d} is a number with all prime factors less than {p_2}. This leads to an approximation

\displaystyle  \varphi(n) \approx \frac{\varphi(d)}{d} (1-\frac{1}{p_2}) n.

As a consequence, if we temporarily hold {d} fixed, and also localize {n} to a relatively short interval, then {\varphi} can only be non-decreasing in {n} if {p_2} is also non-decreasing at the same time. This turns out to significantly cut down on the possible length of a non-decreasing sequence in this regime, particularly if {p_2} is large; this can be formalized by partitioning the range of {p_2} into various subintervals and inspecting how this (and the monotonicity hypothesis on {\varphi}) constrains the values of {n} associated to each subinterval. When {p_2} is small, we instead use a factorization

\displaystyle  n = d p \ \ \ \ \ (3)

where {d} is very smooth (i.e., has no large prime factors), and {p} is a large prime. Now we have the approximation

\displaystyle  \varphi(n) \approx \frac{\varphi(d)}{d} n \ \ \ \ \ (4)

and we can conclude that {\frac{\varphi(d)}{d}} will have to basically be piecewise constant in order for {\varphi} to be non-decreasing. Pursuing this analysis more carefully (in particular controlling the size of various exceptional sets in which the above analysis breaks down), we end up achieving the main theorem so long as we can prove the preliminary inequality

\displaystyle  \sum_{\frac{\varphi(d)}{d}=q} \frac{1}{d} \leq 1 \ \ \ \ \ (5)

for all positive rational numbers {q}. This is in fact also a necessary condition; any failure of this inequality can be easily converted to a counterexample to our main asymptotic for {M(x)}, by considering numbers of the form (3) with {\frac{\varphi(d)}{d}} equal to a fixed constant {q} (and omitting a few rare values of {n} where the approximation (4) is bad enough that {\varphi} is temporarily decreasing). Fortunately, there is a minor miracle, relating to the fact that the largest prime factor of the denominator of {\frac{\varphi(d)}{d}} in lowest terms necessarily equals the largest prime factor of {d}, which allows one to evaluate the left-hand side of (5) almost exactly (this expression either vanishes, or is a product of {\frac{1}{p-1}} over some primes {p} ranging up to the largest prime factor of {q}), and hence to easily establish (5). If one were to try to prove an analogue of our main result for the sum-of-divisors function {\sigma(n)}, one would need the analogue

\displaystyle  \sum_{\frac{\sigma(d)}{d}=q} \frac{1}{d} \leq 1 \ \ \ \ \ (6)

of (5), which looks within reach of current methods (and was even claimed without proof by Erdős), but does not have a full proof in the literature at present.

In the final section of the paper we discuss some near counterexamples to the strong conjecture (1) that indicate that it is likely going to be difficult to get close to proving this conjecture without assuming some rather strong hypotheses. Firstly, we show that failure of Legendre’s conjecture on the existence of a prime between any two consecutive squares can lead to a counterexample to (1). Secondly, we show that failure of the Dickson-Hardy-Littlewood conjecture can lead to a separate (and more dramatic) failure of (1), in which the primes are no longer the dominant sequence on which the totient function is non-decreasing, but rather the numbers which are a power of two times a prime become the dominant sequence. This suggests that any significant improvement of our main asymptotic in the direction of (1) would require assuming something comparable to the prime tuples conjecture, and perhaps also some unproven hypotheses on prime gaps.

Kevin Ford, Dimitris Koukoulopoulos and I have just uploaded to the arXiv our paper “A lower bound on the mean value of the Erdős-Hooley delta function“. This paper complements the recent paper of Dimitris and myself obtaining the upper bound

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll (\log\log x)^{11/4}

on the mean value of the Erdős-Hooley delta function

\displaystyle  \Delta(n) := \sup_u \# \{ d|n: e^u < d \leq e^{u+1} \}
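
For small {n}, the delta function can be computed directly (a brute-force sketch, not from the paper; the sample inputs are arbitrary): since the supremum is attained when {e^u} sits just below a divisor, it suffices to count, for each divisor {d} of {n}, the divisors lying in {[d, ed)}.

```python
import math

def delta(n):
    # Erdos-Hooley delta function: max number of divisors of n in a window (e^u, e^{u+1}]
    divs = [d for d in range(1, n + 1) if n % d == 0]
    return max(sum(1 for dd in divs if d <= dd < math.e * d) for d in divs)

print([(n, delta(n)) for n in (12, 60, 720, 5040, 360360)])
```
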

In this paper we obtain a lower bound

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \gg (\log\log x)^{1+\eta-o(1)}

where {\eta = 0.3533227\dots} is an exponent that arose in previous work of Ford, Green, and Koukoulopoulos, who showed that

\displaystyle  \Delta(n) \gg (\log\log n)^{\eta-o(1)} \ \ \ \ \ (1)

for all {n} outside of a set of density zero. The previous best known lower bound for the mean value was

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \gg \log\log x,

due to Hall and Tenenbaum.

The point is that the main contributions to the mean value of {\Delta(n)} are driven not by “typical” numbers {n} of size about {x}, but rather by numbers that have a splitting

\displaystyle  n = n' n''

where {n''} is the product of the primes between some intermediate threshold {1 \leq y \leq x} and {x} and behaves “typically” (so in particular, it has about {\log\log x - \log\log y + O(\sqrt{\log\log x})} prime factors, as per the Hardy-Ramanujan and Erdős-Kac laws), while {n'} is the product of the primes up to {y} and has double the typical number of prime factors – {2 \log\log y + O(\sqrt{\log\log x})}, rather than {\log\log y + O(\sqrt{\log\log x})} – thus {n'} is the type of number that would make a significant contribution to the mean value of the divisor function {\tau(n')}. Here {y} is such that {\log\log y} is an integer in the range

\displaystyle  \varepsilon\log\log x \leq \log \log y \leq (1-\varepsilon) \log\log x

for some small constant {\varepsilon>0}; there are basically {\log\log x} different values of {y} here, which give essentially disjoint contributions. From the easy inequalities

\displaystyle  \Delta(n) \gg \Delta(n') \Delta(n'') \geq \frac{\tau(n')}{\log n'} \Delta(n'') \ \ \ \ \ (2)

(the latter coming from the pigeonhole principle) and the fact that {\frac{\tau(n')}{\log n'}} has mean about one, one would expect to get the above result provided that one could get a lower bound of the form

\displaystyle  \Delta(n'') \gg (\log \log n'')^{\eta-o(1)} \ \ \ \ \ (3)

for most typical {n''} with prime factors between {y} and {x}. Unfortunately, due to the lack of small prime factors in {n''}, the arguments of Ford, Green, Koukoulopoulos that give (1) for typical {n} do not quite work for the rougher numbers {n''}. However, it turns out that one can get around this problem by replacing (2) by the more efficient inequality

\displaystyle  \Delta(n) \gg \frac{\tau(n')}{\log n'} \Delta^{(\log n')}(n'')

where

\displaystyle  \Delta^{(v)}(n) := \sup_u \# \{ d|n: e^u < d \leq e^{u+v} \}

is an enlarged version of {\Delta(n)} when {v \geq 1}. This inequality is easily proven by applying the pigeonhole principle to the factors of {n} of the form {d' d''}, where {d'} is one of the {\tau(n')} factors of {n'}, and {d''} is one of the {\Delta^{(\log n')}(n'')} factors of {n''} in the optimal interval {[e^u, e^{u+\log n'}]}. The extra room provided by the enlargement of the range {[e^u, e^{u+1}]} to {[e^u, e^{u+\log n'}]} turns out to be sufficient to adapt the Ford-Green-Koukoulopoulos argument to the rough setting. In fact we are able to use the main technical estimate from that paper as a “black box”, namely that if one considers a random subset {A} of {[D^c, D]} for some small {c>0} and sufficiently large {D} with each {n \in [D^c, D]} lying in {A} with an independent probability {1/n}, then with high probability there should be {\gg c^{-1/\eta+o(1)}} subset sums of {A} that attain the same value. (Initially, what “high probability” means is just “close to {1}”, but one can reduce the failure probability significantly as {c \rightarrow 0} by a “tensor power trick” taking advantage of Bennett’s inequality.)

I have just uploaded to the arXiv my paper “The convergence of an alternating series of Erdős, assuming the Hardy–Littlewood prime tuples conjecture“. This paper concerns an old problem of Erdős concerning whether the alternating series {\sum_{n=1}^\infty \frac{(-1)^n n}{p_n}} converges, where {p_n} denotes the {n^{th}} prime. The main result of this paper is that the answer to this question is affirmative assuming a sufficiently strong version of the Hardy–Littlewood prime tuples conjecture.

The alternating series test does not apply here because the ratios {\frac{n}{p_n}} are not monotonically decreasing. The deviations from monotonicity arise from fluctuations in the prime gaps {p_{n+1}-p_n}, so the enemy arises from biases in the prime gaps for odd and even {n}. By changing variables from {n} to {p_n} (or more precisely, to integers in the range between {p_n} and {p_{n+1}}), this is basically equivalent to biases in the parity {(-1)^{\pi(n)}} of the prime counting function. Indeed, it is an unpublished observation of Said that the convergence of {\sum_{n=1}^\infty \frac{(-1)^n n}{p_n}} is equivalent to the convergence of {\sum_{n=10}^\infty \frac{(-1)^{\pi(n)}}{n \log n}}. So this question is really about trying to get a sufficiently strong amount of equidistribution for the parity of {\pi(n)}.
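
To get a feel for how slowly and erratically the partial sums move, here is a quick computation (not from the paper; the cutoff and the reporting interval are arbitrary choices):

```python
def primes_up_to(limit):
    # simple sieve of Eratosthenes
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b'\x00\x00'
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(sieve[i * i::i]))
    return [i for i, flag in enumerate(sieve) if flag]

primes = primes_up_to(10 ** 6)  # p_n for n up to pi(10^6) = 78498
partial = 0.0
for n, p_n in enumerate(primes, start=1):
    partial += (-1) ** n * n / p_n   # terms are of size about 1/log n, so any convergence is very slow
    if n % 10000 == 0:
        print(n, partial)
```
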

The prime tuples conjecture does not directly say much about the value of {\pi(n)}; however, it can be used to control differences {\pi(n+\lambda \log x) - \pi(n)} for {n \sim x} and {\lambda>0} not too large. Indeed, it is a famous calculation of Gallagher that for fixed {\lambda}, and {n} chosen randomly from {1} to {x}, the quantity {\pi(n+\lambda \log x) - \pi(n)} is distributed according to the Poisson distribution of mean {\lambda} asymptotically if the prime tuples conjecture holds. In particular, the parity {(-1)^{\pi(n+\lambda \log x)-\pi(n)}} of this quantity should have mean asymptotic to {e^{-2\lambda}}. An application of the van der Corput {A}-process then gives some decay on the mean of {(-1)^{\pi(n)}} as well. Unfortunately, this decay is a bit too weak for this problem; even if one uses the most quantitative version of Gallagher’s calculation, worked out in a recent paper of (Vivian) Kuperberg, the best bound on the mean {|\frac{1}{x} \sum_{n \leq x} (-1)^{\pi(n)}|} is something like {(\log\log x)^{-1/4+o(1)}}, which is not quite strong enough to overcome the doubly logarithmic divergence of {\sum_{n=2}^\infty \frac{1}{n \log n}}.

To get around this obstacle, we take advantage of the random sifted model {{\mathcal S}_z} of the primes that was introduced in a paper of Banks, Ford, and myself. To model the primes in an interval such as {[n, n+\lambda \log x]} with {n} drawn randomly from say {[x,2x]}, we remove one random residue class {a_p \hbox{ mod } p} from this interval for all primes {p} up to Pólya’s “magic cutoff” {z \approx x^{1/e^\gamma}}. The prime tuples conjecture can then be interpreted as the assertion that the random set {{\mathcal S}_z} produced by this sieving process is statistically a good model for the primes in {[n, n+\lambda \log x]}. After some standard manipulations (using a version of the Bonferroni inequalities, as well as some upper bounds of Kuperberg), the problem then boils down to getting sufficiently strong estimates for the expected parity {{\bf E} (-1)^{|{\mathcal S}_z|}} of the random sifted set {{\mathcal S}_z}.

For this problem, the main advantage of working with the random sifted model, rather than with the primes or the singular series arising from the prime tuples conjecture, is that the sifted model can be studied iteratively from the partially sifted sets {{\mathcal S}_w} arising from sifting primes {p} up to some intermediate threshold {w<z}, and that the expected parity of the {{\mathcal S}_w} experiences some decay in {w}. Indeed, once {w} exceeds the length {\lambda \log x} of the interval {[n,n+\lambda \log x]}, sifting {{\mathcal S}_w} by an additional prime {p} will cause {{\mathcal S}_w} to lose one element with probability {|{\mathcal S}_w|/p}, and remain unchanged with probability {1 - |{\mathcal S}_w|/p}. If {|{\mathcal S}_w|} concentrates around some value {\overline{S}_w}, this suggests that the expected parity {{\bf E} (-1)^{|{\mathcal S}_w|}} will decay by a factor of about {|1 - 2 \overline{S}_w/p|} as one increases {w} to {p}, and iterating this should give good bounds on the final expected parity {{\bf E} (-1)^{|{\mathcal S}_z|}}. It turns out that existing second moment calculations of Montgomery and Soundararajan suffice to obtain enough concentration to make this strategy work.
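
Here is a rough Monte Carlo sketch of the sifting process just described (illustrative only; the interval length, sifting cutoff, sample count, and helper names are arbitrary toy choices, far smaller than anything occurring in the actual argument):

```python
import random

def small_primes(z):
    # simple trial-division prime list, adequate for the tiny cutoffs used here
    return [p for p in range(2, z + 1) if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def sifted_parity_sample(interval_len, primes, rng):
    # sift {0, ..., interval_len - 1} by one random residue class mod p for each prime p,
    # and return the parity (-1)^{|S_z|} of the surviving set
    survivors = set(range(interval_len))
    for p in primes:
        a = rng.randrange(p)
        survivors = {n for n in survivors if n % p != a}
    return (-1) ** len(survivors)

rng = random.Random(0)
primes = small_primes(1000)
samples = [sifted_parity_sample(50, primes, rng) for _ in range(2000)]
print(sum(samples) / len(samples))  # a crude estimate of the expected parity E(-1)^{|S_z|}
```
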

Jon Bennett and I have just uploaded to the arXiv our paper “Adjoint Brascamp-Lieb inequalities“. In this paper, we observe that the family of multilinear inequalities known as the Brascamp-Lieb inequalities (or Holder-Brascamp-Lieb inequalities) admit an adjoint formulation, and explore the theory of these adjoint inequalities and some of their consequences.

To motivate matters let us review the classical theory of adjoints for linear operators. If one has a bounded linear operator {T: L^p(X) \rightarrow L^q(Y)} for some measure spaces {X,Y} and exponents {1 < p, q < \infty}, then one can define an adjoint linear operator {T^*: L^{q'}(Y) \rightarrow L^{p'}(X)} involving the dual exponents {\frac{1}{p}+\frac{1}{p'} = \frac{1}{q}+\frac{1}{q'} = 1}, obeying (formally at least) the duality relation

\displaystyle  \langle Tf, g \rangle = \langle f, T^* g \rangle \ \ \ \ \ (1)

for suitable test functions {f, g} on {X, Y} respectively. Using the dual characterization

\displaystyle  \|f\|_{L^{p'}(X)} = \sup_{g: \|g\|_{L^p(X)} \leq 1} |\langle f, g \rangle|

of {L^{p'}(X)} (and similarly for {L^{q'}(Y)}), one can show that {T^*} has the same operator norm as {T}.

There is a slightly different way to proceed using Hölder’s inequality. For sake of exposition let us make the simplifying assumption that {T} (and hence also {T^*}) maps non-negative functions to non-negative functions, and ignore issues of convergence or division by zero in the formal calculations below. Then for any reasonable function {g} on {Y}, we have

\displaystyle  \| T^* g \|_{L^{p'}(X)}^{p'} = \langle (T^* g)^{p'-1}, T^* g \rangle = \langle T (T^* g)^{p'-1}, g \rangle

\displaystyle  \leq \|T\|_{op} \|(T^* g)^{p'-1}\|_{L^p(X)} \|g\|_{L^{q'}(Y)}

\displaystyle  = \|T\|_{op} \|T^* g \|_{L^{p'}(X)}^{p'-1} \|g\|_{L^{q'}(Y)};

here we have used (1) and Hölder’s inequality; dividing out by {\|T^* g \|_{L^{p'}(X)}^{p'-1}} we obtain {\|T^*\|_{op} \leq \|T\|_{op}}, and a similar argument also recovers the reverse inequality.

The first argument also extends to some extent to multilinear operators. For instance if one has a bounded bilinear operator {B: L^p(X) \times L^q(Y) \rightarrow L^r(Z)} for {1 < p,q,r < \infty} then one can then define adjoint bilinear operators {B^{*1}: L^q(Y) \times L^{r'}(Z) \rightarrow L^{p'}(X)} and {B^{*2}: L^p(X) \times L^{r'}(Z) \rightarrow L^{q'}(Y)} obeying the relations

\displaystyle  \langle B(f, g),h \rangle = \langle B^{*1}(g,h), f \rangle = \langle B^{*2}(f,h), g \rangle

and with exactly the same operator norm as {B}. It is also possible, formally at least, to adapt the Hölder inequality argument to reach the same conclusion.

In this paper we observe that the Hölder inequality argument can be modified in the case of Brascamp-Lieb inequalities to obtain a different type of adjoint inequality. (Continuous) Brascamp-Lieb inequalities take the form

\displaystyle  \int_{{\bf R}^d} \prod_{i=1}^k f_i^{c_i} \circ B_i \leq \mathrm{BL}(\mathbf{B},\mathbf{c}) \prod_{i=1}^k (\int_{{\bf R}^{d_i}} f_i)^{c_i}

for various exponents {c_1,\dots,c_k} and surjective linear maps {B_i: {\bf R}^d \rightarrow {\bf R}^{d_i}}, where {f_i: {\bf R}^{d_i} \rightarrow {\bf R}} are arbitrary non-negative measurable functions and {\mathrm{BL}(\mathbf{B},\mathbf{c})} is the best constant for which this inequality holds for all such {f_i}. [There is also another inequality involving variances with respect to log-concave distributions that is also due to Brascamp and Lieb, but it is not related to the inequalities discussed here.] Well known examples of such inequalities include Hölder’s inequality and the sharp Young convolution inequality; another is the Loomis-Whitney inequality, the first non-trivial example of which is

\displaystyle  \int_{{\bf R}^3} f(y,z)^{1/2} g(x,z)^{1/2} h(x,y)^{1/2}

\displaystyle  \leq (\int_{{\bf R}^2} f)^{1/2} (\int_{{\bf R}^2} g)^{1/2} (\int_{{\bf R}^2} h)^{1/2} \ \ \ \ \ (2)

for all non-negative measurable {f,g,h: {\bf R}^2 \rightarrow {\bf R}}. There are also discrete analogues of these inequalities, in which the Euclidean spaces {{\bf R}^d, {\bf R}^{d_i}} are replaced by discrete abelian groups, and the surjective linear maps {B_i} are replaced by discrete homomorphisms.
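
In the discrete setting, the Loomis-Whitney case (2), with sums in place of integrals, is easy to test numerically; the following sketch (not from the paper; the grid size and seed are arbitrary) checks it on random non-negative arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
f = rng.random((N, N))  # f(y, z)
g = rng.random((N, N))  # g(x, z)
h = rng.random((N, N))  # h(x, y)

# discrete analogue of (2): sum over (x,y,z) of f(y,z)^{1/2} g(x,z)^{1/2} h(x,y)^{1/2}
lhs = sum(np.sqrt(f[y, z] * g[x, z] * h[x, y])
          for x in range(N) for y in range(N) for z in range(N))
rhs = np.sqrt(f.sum() * g.sum() * h.sum())
print(lhs <= rhs + 1e-9)  # True
```
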

The operation {f \mapsto f \circ B_i} of pulling back a function on {{\bf R}^{d_i}} by a linear map {B_i: {\bf R}^d \rightarrow {\bf R}^{d_i}} to create a function on {{\bf R}^d} has an adjoint pushforward map {(B_i)_*}, which takes a function on {{\bf R}^d} and basically integrates it on the fibers of {B_i} to obtain a “marginal distribution” on {{\bf R}^{d_i}} (possibly multiplied by a normalizing determinant factor). The adjoint Brascamp-Lieb inequalities that we obtain take the form

\displaystyle  \|f\|_{L^p({\bf R}^d)} \leq \mathrm{ABL}( \mathbf{B}, \mathbf{c}, \theta, p) \prod_{i=1}^k \|(B_i)_* f \|_{L^{p_i}({\bf R}^{d_i})}^{\theta_i}

for non-negative {f: {\bf R}^d \rightarrow {\bf R}} and various exponents {p, p_i, \theta_i}, where {\mathrm{ABL}( \mathbf{B}, \mathbf{c}, \theta, p)} is the optimal constant for which the above inequality holds for all such {f}; informally, such inequalities control the {L^p} norm of a non-negative function in terms of its marginals. It turns out that every Brascamp-Lieb inequality generates a family of adjoint Brascamp-Lieb inequalities (with the exponent {p} being less than or equal to {1}). For instance, the adjoints of the Loomis-Whitney inequality (2) are the inequalities

\displaystyle  \| f \|_{L^p({\bf R}^3)} \leq \| (B_1)_* f \|_{L^{p_1}({\bf R}^2)}^{\theta_1} \| (B_2)_* f \|_{L^{p_2}({\bf R}^2)}^{\theta_2} \| (B_3)_* f \|_{L^{p_3}({\bf R}^2)}^{\theta_3}

for all non-negative measurable {f: {\bf R}^3 \rightarrow {\bf R}}, all {\theta_1, \theta_2, \theta_3>0} summing to {1}, and all {0 < p \leq 1}, where the {p_i} exponents are defined by the formula

\displaystyle  \frac{1}{2} (1-\frac{1}{p}) = \theta_i (1-\frac{1}{p_i})

and the {(B_i)_* f:{\bf R}^2 \rightarrow {\bf R}} are the marginals of {f}:

\displaystyle  (B_1)_* f(y,z) := \int_{\bf R} f(x,y,z)\ dx

\displaystyle  (B_2)_* f(x,z) := \int_{\bf R} f(x,y,z)\ dy

\displaystyle  (B_3)_* f(x,y) := \int_{\bf R} f(x,y,z)\ dz.
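
To make the adjoint inequality concrete, here is a small numerical illustration (again not from the paper): we take {f} to be piecewise constant on a grid of cubes, so that the Riemann sums below evaluate all of the integrals exactly, choose some {p} and {\theta_i}, and derive the {p_i} from the displayed relation; the grid size and the particular values of {p}, {\theta_i} are arbitrary choices.

```python
# Numerical illustration (not from the paper) of the adjoint Loomis-Whitney
# inequality: f is piecewise constant on a grid of cubes of side dx, so the
# Riemann sums below compute all the integrals exactly.
import numpy as np

rng = np.random.default_rng(2)
n, dx = 12, 0.1
f = rng.random((n, n, n))            # f(x, y, z) >= 0

p = 0.5                              # any 0 < p <= 1
theta = np.array([0.2, 0.3, 0.5])    # positive, summing to 1
# exponents p_i from (1/2)(1 - 1/p) = theta_i (1 - 1/p_i)
p_i = 1.0 / (1.0 - 0.5 * (1.0 - 1.0 / p) / theta)

def lp(arr, q, vol):                 # (integral of arr^q)^{1/q}, cell volume vol
    return (np.sum(arr ** q) * vol) ** (1.0 / q)

marginals = [f.sum(axis=0) * dx,     # (B_1)_* f (y, z): integrate out x
             f.sum(axis=1) * dx,     # (B_2)_* f (x, z): integrate out y
             f.sum(axis=2) * dx]     # (B_3)_* f (x, y): integrate out z

lhs = lp(f, p, dx ** 3)
rhs = np.prod([lp(m, q, dx ** 2) ** t for m, q, t in zip(marginals, p_i, theta)])
print(lhs, rhs, lhs <= rhs * (1 + 1e-9))   # the ratio lhs/rhs should be at most 1
```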

One can derive these adjoint Brascamp-Lieb inequalities from their forward counterparts by a version of the Hölder inequality argument mentioned previously, in conjunction with the observation that the pushforward maps {(B_i)_*} are mass-preserving (i.e., they preserve the {L^1} norm on non-negative functions). Conversely, it turns out that the adjoint Brascamp-Lieb inequalities are only available when the forward Brascamp-Lieb inequalities are. In the discrete case the forward and adjoint Brascamp-Lieb constants are essentially identical, but in the continuous case they can (and often do) differ by up to a constant factor. Furthermore, whereas in the forward case there is a famous theorem of Lieb that asserts that the Brascamp-Lieb constants can be computed by optimizing over gaussian inputs, the same statement is only true up to constants in the adjoint case, and in fact in most cases the gaussians will fail to optimize the adjoint inequality. The situation appears to be complicated; roughly speaking, the adjoint inequalities only use a portion of the range of possible inputs of the forward Brascamp-Lieb inequality, and this portion often misses the gaussian inputs that would otherwise optimize the inequality.

We have located a modest number of applications of the adjoint Brascamp-Lieb inequality (but hope that there will be more in the future):

  • The inequalities become equalities at {p=1}; taking a derivative at this value (in the spirit of the replica trick in physics) we recover the entropic Brascamp-Lieb inequalities of Carlen and Cordero-Erausquin. For instance, the derivative of the adjoint Loomis-Whitney inequalities at {p=1} yields Shearer’s inequality (a numerical check of which is sketched after this list).
  • The adjoint Loomis-Whitney inequalities, together with a few more applications of Hölder’s inequality, imply the log-concavity of the Gowers uniformity norms on non-negative functions, which was previously observed by Shkredov and by Manners.
  • Averaging the adjoint Loomis-Whitney inequalities over coordinate systems gives reverse {L^p} inequalities for the X-ray transform and other tomographic transforms that appear to be new in the literature. In particular, we obtain some monotonicity of the {L^{p_k}} norms or entropies of the {k}-plane transform in {k} (if the exponents {p_k} are chosen in a dimensionally consistent fashion).
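
Here is a minimal numerical check (not from the paper) of the Loomis-Whitney case of Shearer’s inequality, {{\bf H}(X,Y,Z) \leq \frac{1}{2}({\bf H}(X,Y)+{\bf H}(Y,Z)+{\bf H}(X,Z))}, mentioned in the first item above; the joint distribution used is an arbitrary random one.

```python
# Minimal numerical check (not from the paper) of Shearer's inequality in its
# Loomis-Whitney form: H(X,Y,Z) <= (1/2)[H(X,Y) + H(X,Z) + H(Y,Z)].
import numpy as np

rng = np.random.default_rng(3)
p = rng.random((4, 4, 4))
p /= p.sum()                          # a random joint pmf of (X, Y, Z)

def H(q):                             # Shannon entropy of a pmf (in nats)
    q = q[q > 0]
    return -np.sum(q * np.log(q))

lhs = H(p)
rhs = 0.5 * (H(p.sum(axis=2)) + H(p.sum(axis=1)) + H(p.sum(axis=0)))
print(lhs, rhs, lhs <= rhs + 1e-12)   # the inequality should hold
```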

We also record a number of variants of the adjoint Brascamp-Lieb inequalities, including discrete variants, and a reverse inequality involving {L^p} norms with {p>1} rather than {p<1}.

Ben Green, Freddie Manners and I have just uploaded to the arXiv our preprint “Sumsets and entropy revisited“. This paper uses entropy methods to attack the Polynomial Freiman-Ruzsa (PFR) conjecture, which we study in the following two forms:

Conjecture 1 (Weak PFR over {{\bf Z}}) Let {A \subset {\bf Z}^D} be a finite non-empty set whose doubling constant {\sigma[A] := |A+A|/|A|} is at most {K}. Then there is a subset {A'} of {A} of density {\gg K^{-O(1)}} that has affine dimension {O(\log K)} (i.e., it is contained in an affine space of dimension {O(\log K)}).

Conjecture 2 (PFR over {{\bf F}_2}) Let {A \subset {\bf F}_2^D} be a non-empty set whose doubling constant {\sigma[A]} is at most {K}. Then {A} can be covered by {O(K^{O(1)})} cosets of a subspace of cardinality at most {|A|}.

Our main results are then as follows.

Theorem 3 If {A \subset {\bf Z}^D} with {\sigma[A] \leq K}, then
  • (i) There is a subset {A'} of {A} of density {\gg K^{-O(1)}} of “skew-dimension” (or “query complexity”) {O(\log K)}.
  • (ii) There is a subset {A'} of {A} of density {\gg \exp( -O(\log^{5/3+o(1)} K) )} of affine dimension {O(\log K)} (where {o(1)} goes to zero as {K \rightarrow \infty}).
  • (iii) If Conjecture 2 holds, then there is a subset {A'} of {A} of density {\gg K^{-O(1)}} of affine dimension {O(\log K)}. In other words, Conjecture 2 implies Conjecture 1.

The skew-dimension of a set is a quantity, smaller than the affine dimension, that is defined recursively; the precise definition is given in the paper, but suffice it to say that singleton sets have dimension {0}, and a set {A \subset {\bf Z}^{D_1} \times {\bf Z}^{D_2}} whose projection to {{\bf Z}^{D_1}} has skew-dimension at most {d_1}, and whose fibers in {\{x\} \times {\bf Z}^{D_2}} have skew-dimension at most {d_2} for any {x}, will have skew-dimension at most {d_1+d_2}. (In fact, the skew-dimension is basically the largest quantity which obeys all of these properties.)

Part (i) of this theorem was implicitly proven by Pálvölgyi and Zhelezov by a different method. Part (ii) with {5/3+o(1)} replaced by {2} was established by Manners. To our knowledge, part (iii) is completely new.

Our proof strategy is to establish these results of combinatorial additive combinatorics by using entropic additive combinatorics, in which we replace sets {A} with random variables {X}, and cardinality with (the exponential of) Shannon entropy. This is in order to take advantage of some superior features of entropic additive combinatorics, most notably good behavior with respect to homomorphisms.

For instance, the analogue of the combinatorial doubling constant {\sigma[A] := |A+A|/|A|} of a finite non-empty subset {A} of an abelian group {G}, is the entropy doubling constant

\displaystyle  \sigma_{\mathrm{ent}}[X] := \exp( {\bf H}(X_1+X_2) - {\bf H}(X) )

of a finitely-valued random variable {X} in {G}, where {X_1,X_2} are independent copies of {X} and {{\bf H}} denotes Shannon entropy. There is also an analogue of the Ruzsa distance

\displaystyle  d(A,B) := \log \frac{|A-B|}{|A|^{1/2} |B|^{1/2}}

between two finite non-empty subsets {A,B} of {G}, namely the entropic Ruzsa distance

\displaystyle  d_{\mathrm{ent}}(X,Y) := {\bf H}(X'+Y') - \frac{1}{2} {\bf H}(X) - \frac{1}{2} {\bf H}(Y)

where {X',Y'} are independent copies of {X,Y} respectively. (Actually, one thing we show in our paper is that the independence hypothesis can be dropped, and this only affects the entropic Ruzsa distance by a factor of three at worst.) Many of the results about sumsets and Ruzsa distance have entropic analogues, but the entropic versions are slightly better behaved; for instance, we have a contraction property

\displaystyle  d_{\mathrm{ent}}(\phi(X),\phi(Y)) \leq d_{\mathrm{ent}}(X,Y)

whenever {\phi: G \rightarrow H} is a homomorphism. In fact we have a refinement of this inequality in which the gap between these two quantities can be used to control the entropic distance between “fibers” of {X,Y} (in which one conditions {\phi(X)} and {\phi(Y)} to be fixed). On the other hand, there are direct connections between the combinatorial and entropic sumset quantities. For instance, if {U_A} is a random variable drawn uniformly from {A}, then

\displaystyle  \sigma_{\mathrm{ent}}[U_A] \leq \sigma[A].

Thus if {A} has small doubling, then {U_A} has small entropic doubling. In the converse direction, if {X} has small entropic doubling, then {X} is close (in entropic Ruzsa distance) to a uniform random variable {U_S} drawn from a set {S} of small doubling; a version of this statement was proven in an old paper of mine, but we establish here a quantitatively efficient version, obtained by rewriting the entropic Ruzsa distance in terms of certain Kullback-Leibler divergences.
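
To make these definitions concrete, here is a small numerical sketch (not from the paper): it computes {\sigma[A]}, {\sigma_{\mathrm{ent}}[U_A]} and the entropic Ruzsa distance for explicit finite subsets of {{\bf Z}}, and also tests the contraction property for the homomorphism {\phi(x) = x \hbox{ mod } 2}. The helper functions and the particular sets {A}, {B} are ad hoc choices for illustration.

```python
# Small numerical sketch (not from the paper): combinatorial vs. entropic
# doubling of a set A in Z, the entropic Ruzsa distance, and the contraction
# property under the homomorphism phi(x) = x mod 2.
import numpy as np
from itertools import product

def H(probs):                                    # Shannon entropy (in nats)
    q = np.array([p for p in probs if p > 0])
    return -np.sum(q * np.log(q))

def sum_pmf(pX, pY, add=lambda x, y: x + y):     # pmf of X'+Y', X', Y' independent
    out = {}
    for (x, px), (y, py) in product(pX.items(), pY.items()):
        out[add(x, y)] = out.get(add(x, y), 0.0) + px * py
    return out

def d_ent(pX, pY, add=lambda x, y: x + y):       # entropic Ruzsa distance
    return H(sum_pmf(pX, pY, add).values()) - 0.5 * H(pX.values()) - 0.5 * H(pY.values())

def sigma_ent(pX):                               # entropic doubling constant
    return np.exp(H(sum_pmf(pX, pX).values()) - H(pX.values()))

A = [0, 1, 2, 3, 5, 8, 13]
B = [0, 2, 4, 6, 7]
U_A = {a: 1 / len(A) for a in A}                 # uniform random variable on A
U_B = {b: 1 / len(B) for b in B}

sigma_A = len({a + b for a in A for b in A}) / len(A)   # |A+A| / |A|
print(sigma_ent(U_A), sigma_A)                   # expect sigma_ent[U_A] <= sigma[A]

# pushforward of a pmf on Z under x -> x mod 2, i.e. the law of phi(X)
mod2 = lambda pX: sum_pmf(pX, {0: 1.0}, add=lambda x, y: (x + y) % 2)
print(d_ent(mod2(U_A), mod2(U_B), add=lambda x, y: (x + y) % 2),
      d_ent(U_A, U_B))                           # expect left <= right (contraction)
```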

Our first main result is a “99% inverse theorem” for entropic Ruzsa distance: if {d_{\mathrm{ent}}(X,Y)} is sufficiently small, then there exists a finite subgroup {H} of {G} such that

\displaystyle  d_{\mathrm{ent}}(X,U_H), d_{\mathrm{ent}}(Y,U_H) \leq 12 d_{\mathrm{ent}}(X,Y). \ \ \ \ \ (1)

This result uses the results just mentioned to relate {X,Y} to a set {S} of small doubling, which can then be related to a subgroup {H} by standard inverse theorems; this gives a weak version of (1) (roughly speaking losing a square root in the bound), and some additional analysis is needed to bootstrap this initial estimate back to (1).

We now sketch how these tools are used to prove our main theorem. For (i), we reduce matters to establishing the following bilinear entropic analogue: given two non-empty finite subsets {A,B} of {{\bf Z}^D}, one can find subsets {A' \subset A}, {B' \subset B} with

\displaystyle  |A'| |B'| \geq e^{-C d_{\mathrm{ent}}(U_A, U_B)} |A| |B|

such that {A', B'} have skew-dimension at most {C d_{\mathrm{ent}}(U_A, U_B)}, for some absolute constant {C}. This can be shown by an induction on {|A||B|} (say). One applies a non-trivial coordinate projection {\pi: {\bf Z}^D \rightarrow {\bf Z}} to {A,B}. If {\pi(U_A)} and {\pi(U_B)} are very close in entropic Ruzsa distance, then the 99% inverse theorem shows that these random variables must each concentrate at a point (because {{\bf Z}} has no non-trivial finite subgroups), and one can pass to the fibers over these points and use the induction hypothesis. If instead {\pi(U_A)} and {\pi(U_B)} are far apart, then by the behavior of entropy under projections one can show that the fibers of {A} and {B} under {\pi} are on average closer to each other in entropic Ruzsa distance than {A} and {B} themselves are, and one can again proceed using the induction hypothesis.

For parts (ii) and (iii), we first use an entropic version of an observation of Manners that sets of small doubling in {{\bf Z}^D} must be irregularly distributed modulo {2}. A clean formulation of this in entropic language is the inequality

\displaystyle  d_{\mathrm{ent}}(X, 2Y) \leq 5 d_{\mathrm{ent}}(X,Y)

whenever {X,Y} take values in a torsion-free abelian group such as {{\bf Z}^D}; this turns out to follow from two applications of the entropy submodularity inequality. One corollary of this (and the behavior of entropy under projections) is that

\displaystyle  {\bf H}( X \hbox{ mod } 2 ), {\bf H}( Y \hbox{ mod } 2 ) \leq 10 d_{\mathrm{ent}}(X,Y).

This is the key link between the {{\bf Z}^D} and {{\bf F}_2^D} worlds that is used to prove (ii), (iii): while (iii) relies on the still unproven PFR conjecture over {{\bf F}_2}, (ii) uses the unconditional progress on PFR by Konyagin, as detailed in this survey of Sanders. The argument has a similar inductive structure to that used to establish (i) (and if one is willing to replace {5/3+o(1)} by {2} then the argument is in fact relatively straightforward and does not need any deep partial results on the PFR).
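
Both of these inequalities are easy to probe numerically; the following sketch (not from the paper's proofs, just the two displayed statements above) checks them for {{\bf Z}}-valued random variables given by explicit probability mass functions. The particular distributions and helper functions are arbitrary illustrative choices.

```python
# Compact numerical check (of the two displayed inequalities above, not of
# their proofs): d_ent(X, 2Y) <= 5 d_ent(X, Y) and H(X mod 2) <= 10 d_ent(X, Y)
# for Z-valued random variables X, Y given by their pmfs.
import numpy as np
from itertools import product

def H(probs):                                     # Shannon entropy (in nats)
    q = np.array([p for p in probs if p > 0])
    return -np.sum(q * np.log(q))

def sum_pmf(pX, pY):                              # pmf of X'+Y', independent copies
    out = {}
    for (x, px), (y, py) in product(pX.items(), pY.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out

def d_ent(pX, pY):                                # entropic Ruzsa distance
    return H(sum_pmf(pX, pY).values()) - 0.5 * H(pX.values()) - 0.5 * H(pY.values())

rng = np.random.default_rng(4)
def random_pmf(support):
    w = rng.random(len(support)); w /= w.sum()
    return dict(zip(support, w))

pX = random_pmf(range(-5, 6))
pY = random_pmf(range(0, 8))
p2Y = {2 * y: p for y, p in pY.items()}           # pmf of 2Y (dilation is injective on Z)

print(d_ent(pX, p2Y), 5 * d_ent(pX, pY))          # expect left <= right
H_X_mod2 = H([sum(p for x, p in pX.items() if x % 2 == r) for r in (0, 1)])
print(H_X_mod2, 10 * d_ent(pX, pY))               # expect left <= right
```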

As one byproduct of our analysis we also obtain an appealing entropic reformulation of Conjecture 2, namely that if {X} is an {{\bf F}_2^D}-valued random variable then there exists a subspace {H} of {{\bf F}_2^D} such that

\displaystyle  d_{\mathrm{ent}}(X, U_H) \ll \sigma_{\mathrm{ent}}[X].

Right now the best result in this direction is

\displaystyle  d_{\mathrm{ent}}(X, U_H) \ll_\varepsilon \sigma_{\mathrm{ent}}[X] + \sigma_{\mathrm{ent}}^{3+\varepsilon}[X]

for any {\varepsilon > 0}, by using Konyagin’s partial result towards the PFR.
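
To make the quantities in this entropic formulation concrete, here is a small numerical sketch (not from the paper): for an {{\bf F}_2^6}-valued random variable {X} that is approximately uniform on a three-dimensional subspace {H}, it computes {\sigma_{\mathrm{ent}}[X]} and {d_{\mathrm{ent}}(X, U_H)} using XOR-convolutions of probability mass functions. This only exhibits the two quantities for one particular (hypothetical) choice of {X} and {H}; it does not, of course, verify the conjecture.

```python
# Illustration (not from the paper) of the quantities in the entropic
# formulation above: sigma_ent[X] and d_ent(X, U_H) for an F_2^6-valued X,
# computed via XOR-convolution of pmfs indexed by integers 0..63.
import numpy as np

def H(p):                               # Shannon entropy (in nats)
    q = p[p > 0]
    return -np.sum(q * np.log(q))

def xor_conv(p, q):                     # pmf of X' + Y' with addition in F_2^n
    out = np.zeros(p.size)
    for x in range(p.size):
        for y in range(q.size):
            out[x ^ y] += p[x] * q[y]
    return out

def d_ent(p, q):                        # entropic Ruzsa distance
    return H(xor_conv(p, q)) - 0.5 * H(p) - 0.5 * H(q)

group_size = 2 ** 6                     # the ambient group F_2^6
rng = np.random.default_rng(5)

# H = the 3-dimensional subspace of vectors whose three high bits vanish
in_H = np.array([(x & 0b111000) == 0 for x in range(group_size)])

# X: uniform on H, plus a small amount of mass leaking outside H
pX = in_H.astype(float)
pX[rng.integers(0, group_size, size=4)] += 0.3
pX /= pX.sum()

pU_H = in_H / in_H.sum()                # uniform distribution on H

sigma_ent_X = np.exp(H(xor_conv(pX, pX)) - H(pX))   # entropic doubling of X
print(sigma_ent_X, d_ent(pX, pU_H))     # the two quantities appearing above
```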
