Let be a non-empty finite set. If is a random variable taking values in , the Shannon entropy of is defined as
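In symbols (writing $X$ for the finite set and $Y$ for the random variable, a choice of letters made here for concreteness),
$$ \mathbf{H}(Y) := \sum_{x \in X} \mathbf{P}(Y = x) \log \frac{1}{\mathbf{P}(Y = x)}, $$
with the convention that terms with $\mathbf{P}(Y = x) = 0$ are omitted.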
There is a nice variational formula that lets one compute logs of sums of exponentials in terms of this entropy:
Lemma 1 (Gibbs variational formula) Let be a function. Then
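With the conventions above (so $f \colon X \to \mathbb{R}$), the formula in question takes the shape
$$ \log \sum_{x \in X} \exp(f(x)) = \sup_{Y} \Bigl( \mathbf{E} f(Y) + \mathbf{H}(Y) \Bigr), \qquad (1) $$
where the supremum ranges over all random variables $Y$ taking values in $X$.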
Proof: Note that shifting by a constant affects both sides of (1) the same way, so we may normalize . Then is now the probability distribution of some random variable , and the inequality can be rewritten as
In this note I would like to use this variational formula (which is also known as the Donsker-Varadhan variational formula) to give another proof of the following inequality of Carbery.
Theorem 2 (Generalized Cauchy-Schwarz inequality) Let , let be finite non-empty sets, and let be functions for each . Let and be positive functions for each . Then
where is the quantity
where is the set of all tuples such that for .
Thus for instance, the identity is trivial for . When , the inequality reads
which is easily proven by Cauchy-Schwarz, while for the inequality reads
which can also be proven by elementary means. However even for , the existing proofs require the “tensor power trick” in order to reduce to the case when the are step functions (in which case the inequality can be proven elementarily, as discussed in the above paper of Carbery).
We now prove this inequality. We write and for some functions and . If we take logarithms in the inequality to be proven and apply Lemma 1, the inequality becomes
where ranges over random variables taking values in , range over tuples of random variables taking values in , and range over random variables taking values in . Comparing the suprema, the claim now reduces to
Lemma 3 (Conditional expectation computation) Let be an -valued random variable. Then there exists a -valued random variable , where each has the same distribution as , and
Proof: We induct on . When we just take . Now suppose that , and the claim has already been proven for , thus one has already obtained a tuple with each having the same distribution as , and
By hypothesis, has the same distribution as . For each value attained by , we can take conditionally independent copies of and conditioned on the events and respectively, and then concatenate them to form a tuple in , with a further copy of that is conditionally independent of relative to . One can then use the entropy chain rule to compute
and the claim now follows from the induction hypothesis.
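(For reference, the entropy chain rule invoked here is the standard identity
$$ \mathbf{H}(U, V) = \mathbf{H}(U) + \mathbf{H}(V \mid U), $$
together with its conditional variants.)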
With a little more effort, one can replace by a more general measure space (and use differential entropy in place of Shannon entropy), to recover Carbery’s inequality in full generality; we leave the details to the interested reader.
In my previous post, I walked through the task of formally deducing one lemma from another in Lean 4. The deduction was deliberately chosen to be short and only showcased a small number of Lean tactics. Here I would like to walk through the process I used for a slightly longer proof I worked out recently, after seeing the following challenge from Damek Davis: to formalize (in a civilized fashion) the proof of the following lemma:
Lemma. Let and be sequences of real numbers indexed by natural numbers , with non-increasing and non-negative. Suppose also that for all . Then for all .
Here I tried to draw upon the lessons I had learned from the PFR formalization project, and to first set up a human readable proof of the lemma before starting the Lean formalization – a lower-case “blueprint” rather than the fancier Blueprint used in the PFR project. The main idea of the proof here is to use the telescoping series identity
Since is non-negative, and by hypothesis, we have
but by the monotone hypothesis on the left-hand side is at least , giving the claim.
This is already a human-readable proof, but in order to formalize it more easily in Lean, I decided to rewrite it as a chain of inequalities, starting at and ending at . With a little bit of pen and paper effort, I obtained
(by field identities)
(by the formula for summing a constant)
(by the monotone hypothesis)
(by the hypothesis )
(by telescoping series)
(by the non-negativity of ).
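Writing the two sequences as $a_k$ and $D_k$ (a labelling adopted here for readability), the assembled chain is
$$ a_k = \frac{(k+1)\,a_k}{k+1} = \frac{\sum_{i=0}^{k} a_k}{k+1} \le \frac{\sum_{i=0}^{k} a_i}{k+1} \le \frac{\sum_{i=0}^{k} (D_i - D_{i+1})}{k+1} = \frac{D_0 - D_{k+1}}{k+1} \le \frac{D_0}{k+1}, $$
with the six steps justified as listed above.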
I decided that this was a good enough blueprint for me to work with. The next step is to formalize the statement of the lemma in Lean. For this quick project, it was convenient to use the online Lean playground, rather than my local IDE, so the screenshots will look a little different from those in the previous post. (If you like, you can follow this tour in that playground, by clicking on the screenshots of the Lean code.) I start by importing Lean’s math library, and starting an example of a statement to state and prove:
Now we have to declare the hypotheses and variables. The main variables here are the sequences and , which in Lean are best modeled by functions a, D from the natural numbers ℕ to the reals ℝ. (One can choose to “hardwire” the non-negativity hypothesis into the by making D take values in the nonnegative reals (denoted NNReal in Lean), but this turns out to be inconvenient, because the laws of algebra and summation that we will need are clunkier on the non-negative reals (which are not even a group) than on the reals (which are a field).) So we add in the variables:
Now we add in the hypotheses, which in Lean convention are usually given names starting with h. This is fairly straightforward; the one thing to note is that the property of being monotone decreasing already has a name in Lean’s Mathlib, namely Antitone, and it is generally a good idea to use the Mathlib-provided terminology (because that library contains a lot of useful lemmas about such terms).
One thing to note here is that Lean is quite good at filling in implied ranges of variables. Because a and D have the natural numbers ℕ as their domain, the dummy variable k in these hypotheses is automatically being quantified over ℕ. We could have made this quantification explicit if we so chose, for instance using ∀ k : ℕ, 0 ≤ D k instead of ∀ k, 0 ≤ D k, but it is not necessary to do so. Also note that Lean does not require parentheses when applying functions: we write D k here rather than D(k) (which in fact does not compile in Lean unless one puts a space between the D and the parentheses). This is slightly different from standard mathematical notation, but is not too difficult to get used to.
This looks like the end of the hypotheses, so we could now add a colon to move to the conclusion, and then add that conclusion:
This is a perfectly fine Lean statement. But it turns out that when proving a universally quantified statement such as ∀ k, a k ≤ D 0 / (k + 1), the first step is almost always to open up the quantifier to introduce the variable k (using the Lean command intro k). Because of this, it is slightly more efficient to hide the universal quantifier by placing the variable k in the hypotheses, rather than in the quantifier (in which case we have to now specify that it is a natural number, as Lean can no longer deduce this from context):
At this point Lean is complaining of an unexpected end of input: the example has been stated, but not proved. We will temporarily mollify Lean by adding a sorry as the purported proof:
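In text form (approximating the screenshot, so hypothesis names and ordering may differ slightly from the original file), the example at this stage reads:

```lean
import Mathlib

example (a D : ℕ → ℝ) (ha : Antitone a) (hpos : ∀ k, 0 ≤ D k)
    (hD : ∀ k, a k ≤ D k - D (k + 1)) (k : ℕ) :
    a k ≤ D 0 / (k + 1) := by
  sorry
```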
Now Lean is content, other than giving a warning (as indicated by the yellow squiggle under the example) that the proof contains a sorry.
It is now time to follow the blueprint. The Lean tactic for proving an inequality via chains of other inequalities is known as calc. We use the blueprint to fill in the calc that we want, leaving the justifications of each step as “sorry”s for now:
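In text form, the skeleton reads roughly as follows (coercion arrows are suppressed, and in more recent Mathlib versions one writes ∑ i ∈ range (k+1) rather than ∑ i in range (k+1), with the ∑ notation available without opening BigOperators):

```lean
import Mathlib
open Finset

example (a D : ℕ → ℝ) (ha : Antitone a) (hpos : ∀ k, 0 ≤ D k)
    (hD : ∀ k, a k ≤ D k - D (k + 1)) (k : ℕ) :
    a k ≤ D 0 / (k + 1) := by
  calc a k = ((k + 1) * a k) / (k + 1) := by sorry
    _ = (∑ i in range (k + 1), a k) / (k + 1) := by sorry
    _ ≤ (∑ i in range (k + 1), a i) / (k + 1) := by sorry
    _ ≤ (∑ i in range (k + 1), (D i - D (i + 1))) / (k + 1) := by sorry
    _ = (D 0 - D (k + 1)) / (k + 1) := by sorry
    _ ≤ D 0 / (k + 1) := by sorry
```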
Here, we “open”ed the Finset namespace in order to easily access Finset’s range function, with range k basically being the finite set of natural numbers {0, 1, …, k-1}, and also “open”ed the BigOperators namespace to access the familiar ∑ notation for (finite) summation, in order to make the steps in the Lean code resemble the blueprint as much as possible. One could avoid opening these namespaces, but then expressions such as ∑ i in range (k+1), a i would instead have to be written as something like Finset.sum (Finset.range (k+1)) (fun i ↦ a i), which looks a lot less like standard mathematical writing. The proof structure here may remind some readers of the “two column proofs” that are somewhat popular in American high school geometry classes.
Now we have six sorries to fill. Navigating to the first sorry, Lean tells us the ambient hypotheses, and the goal that we need to prove to fill that sorry:
The ⊢ symbol here is Lean’s marker for the goal. The uparrows ↑ are coercion symbols, indicating that the natural number k has to be converted to a real number in order to interact via arithmetic operations with other real numbers such as a k, but we can ignore these coercions for this tour (for this proof, it turns out Lean will basically manage them automatically without need for any explicit intervention by a human).
The goal here is a self-evident algebraic identity; it involves division, so one has to check that the denominator is non-zero, but this is self-evident. In Lean, a convenient way to establish algebraic identities is to use the tactic field_simp to clear denominators, and then ring to verify any identity that is valid for commutative rings. This works, and clears the first sorry:
field_simp, by the way, is smart enough to deduce on its own that the denominator k+1 here is manifestly non-zero (and in fact positive); no human intervention is required to point this out. Similarly for other “clearing denominator” steps that we will encounter in the other parts of the proof.
Now we navigate to the next “sorry”. Lean tells us the hypotheses and goals:
We can reduce the goal by canceling out the common denominator ↑k+1. Here we can use the handy Lean tactic congr, which tries to match two sides of an equality goal as much as possible, and leave any remaining discrepancies between the two sides as further goals to be proven. Applying congr, the goal reduces to
Here one might imagine that this is something that one can prove by induction. But this particular sort of identity – summing a constant over a finite set – is already covered by Mathlib. Indeed, searching for Finset, sum, and const soon leads us to the Finset.sum_const lemma here. But there is an even more convenient path to take here, which is to apply the powerful tactic simp, which tries to simplify the goal as much as possible using all the “simp lemmas” Mathlib has to offer (of which Finset.sum_const is an example, but there are thousands of others). As it turns out, simp completely kills off this identity, without any further human intervention:
Now we move on to the next sorry, and look at our goal:
congr doesn’t work here because we have an inequality instead of an equality, but there is a powerful relative gcongr of congr that is perfectly suited for inequalities. It can also open up sums, products, and integrals, reducing global inequalities between such quantities into pointwise inequalities. If we invoke gcongr with i hi (where we tell gcongr to use i for the variable opened up, and hi for the constraint this variable will satisfy), we arrive at a greatly simplified goal (and a new ambient variable and hypothesis):
Now we need to use the monotonicity hypothesis on a, which we have named ha here. Looking at the documentation for Antitone, one finds a lemma that looks applicable here:
One can apply this lemma in this case by writing apply Antitone.imp ha, but because ha is already of type Antitone, we can abbreviate this to apply ha.imp. (Actually, as indicated in the documentation, due to the way Antitone is defined, we can even just use apply ha here.) This reduces the goal nicely:
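To see the shape of this deduction in isolation, here is a tiny self-contained example (with throwaway variable and hypothesis names of my own) that should compile as is:

```lean
import Mathlib

-- From antitonicity, a larger index gives a smaller (or equal) value.
example (a : ℕ → ℝ) (ha : Antitone a) (i k : ℕ) (hik : i ≤ k) : a k ≤ a i :=
  ha hik
```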
The goal is now very close to the hypothesis hi. One could now look up the documentation for Finset.range to see how to unpack hi, but as before simp can do this for us. Invoking simp at hi, we obtain
Now the goal and hypothesis are very close indeed. Here we can just close the goal using the linarith tactic used in the previous tour:
The next sorry can be resolved by similar methods, using the hypothesis hD applied at the variable i:
Now for the penultimate sorry. As in a previous step, we can use congr to remove the denominator, leaving us in this state:
This is a telescoping series identity. One could try to prove it by induction, or one could try to see if this identity is already in Mathlib. Searching for Finset, sum, and sub will locate the right tool (as the fifth hit), but a simpler way to proceed here is to use the exact? tactic we saw in the previous tour:
A brief check of the documentation for sum_range_sub' confirms that this is what we want. Actually we can just use apply sum_range_sub' here, as the apply tactic is smart enough to fill in the missing arguments:
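In isolation, the telescoping step looks something like this (assuming I have the Mathlib name and signature right; the same caveat about in versus ∈ as before applies):

```lean
import Mathlib

-- Telescoping: the sum of consecutive differences collapses to the endpoints.
example (D : ℕ → ℝ) (k : ℕ) :
    ∑ i in Finset.range (k + 1), (D i - D (i + 1)) = D 0 - D (k + 1) := by
  apply Finset.sum_range_sub'
```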
One last sorry to go. As before, we use gcongr to cancel denominators, leaving us with
This looks easy, because the hypothesis hpos will tell us that D (k+1) is nonnegative; specifically, the instance hpos (k+1) of that hypothesis will state exactly this. The linarith tactic will then resolve this goal once it is told about this particular instance:
We now have a complete proof – no more yellow squiggly line in the example. There are two warnings though – there are two variables i and hi introduced in the proof that Lean’s “linter” has noticed are not actually used in the proof. So we can rename them with underscores to tell Lean that we are okay with them not being used:
This is a perfectly fine proof, but upon noticing that many of the steps are similar to each other, one can do a bit of “code golf” as in the previous tour to compactify the proof a bit:
With enough familiarity with the Lean language, this proof actually tracks quite closely with (an optimized version of) the human blueprint.
This concludes the tour of a lengthier Lean proving exercise. I am finding that the pre-planning step of the proof (using an informal “blueprint” to break the proof down into extremely granular pieces) makes the formalization process significantly easier than in the past (when I often adopted a sequential process of writing one line of code at a time without first sketching out a skeleton of the argument). (The proof here took only about 15 minutes to create initially, although for this blog post I had to recreate it with screenshots and supporting links, which took significantly more time.) I believe that a realistic near-term goal for AI is to be able to fill in automatically a significant fraction of the sorts of atomic “sorry”s of the size one saw in this proof, allowing one to convert a blueprint to a formal Lean proof even more rapidly.
One final remark: in this tour I filled in the “sorry”s in the order in which they appeared, but there is actually no requirement that one does this, and once one has used a blueprint to atomize a proof into self-contained smaller pieces, one can fill them in in any order. Importantly for a group project, these micro-tasks can be parallelized, with different contributors claiming whichever “sorry” they feel they are qualified to solve, and working independently of each other. (And, because Lean can automatically verify if their proof is correct, there is no need to have a pre-existing bond of trust with these contributors in order to accept their contributions.) Furthermore, because the specification of a “sorry” is self-contained, someone can make a meaningful contribution to the proof by working on an extremely localized component of it without needing the mathematical expertise to understand the global argument. This is not particularly important in this simple case, where the entire lemma is not too hard to understand for a trained mathematician, but can become quite relevant for complex formalization projects.
Since the release of my preprint with Tim, Ben, and Freddie proving the Polynomial Freiman-Ruzsa (PFR) conjecture over , I (together with Yael Dillies and Bhavik Mehta) have started a collaborative project to formalize this argument in the proof assistant language Lean4. It has been less than a week since the project was launched, but it is proceeding quite well, with a significant fraction of the paper already either fully or partially formalized. The project has been greatly assisted by the Blueprint tool of Patrick Massot, which allows one to write a human-readable “blueprint” of the proof that is linked to the Lean formalization; similar blueprints have been used for other projects, such as Scholze’s liquid tensor experiment. For the PFR project, the blueprint can be found here. One feature of the blueprint that I find particularly appealing is the dependency graph that is automatically generated from the blueprint, and can provide a rough snapshot of how far along the formalization has advanced. For PFR, the latest state of the dependency graph can be found here. At the current time of writing, the graph looks like this:
The color coding of the various bubbles (for lemmas) and rectangles (for definitions) is explained in the legend to the dependency graph, but roughly speaking the green bubbles/rectangles represent lemmas or definitions that have been fully formalized, and the blue ones represent lemmas or definitions which are ready to be formalized (their statements, but not proofs, have already been formalized, as well as those of all prerequisite lemmas and proofs). The goal is to get all the bubbles leading up to and including the “pfr” bubble at the bottom colored in green.
In this post I would like to give a quick “tour” of the project, to give a sense of how it operates. If one clicks on the “pfr” bubble at the bottom of the dependency graph, we get the following:
Here, Blueprint is displaying a human-readable form of the PFR statement. This is coming from the corresponding portion of the blueprint, which also comes with a human-readable proof of this statement that relies on other statements in the project:
(I have cropped out the second half of the proof here, as it is not relevant to the discussion.)
Observe that the “pfr” bubble is white, but has a green border. This means that the statement of PFR has been formalized in Lean, but not the proof; and the proof itself is not ready to be formalized, because some of the prerequisites (in particular, “entropy-pfr” (Theorem 6.16)) do not even have their statements formalized yet. If we click on the “Lean” link below the description of PFR in the dependency graph, we are led to the (auto-generated) Lean documentation for this assertion:
This is what a typical theorem in Lean looks like (after a procedure known as “pretty printing”). There are a number of hypotheses stated before the colon, for instance that is a finite elementary abelian group of order (this is how we have chosen to formalize the finite field vector spaces ), that is a non-empty subset of (the hypothesis that is non-empty was not stated in the LaTeX version of the conjecture, but we realized it was necessary in the formalization, and will update the LaTeX blueprint shortly to reflect this) with the cardinality of less than times the cardinality of , and the statement after the colon is the conclusion: that can be contained in the sum of a subgroup of and a set of cardinality at most .
The astute reader may notice that the above theorem seems to be missing one or two details, for instance it does not explicitly assert that is a subgroup. This is because the “pretty printing” suppresses some of the information in the actual statement of the theorem, which can be seen by clicking on the “Source” link:
Here we see that is required to have the “type” of an additive subgroup of . (Lean’s language revolves very strongly around types, but for this tour we will not go into detail about what a type is exactly.) The prominent “sorry” at the bottom of this theorem asserts that a proof is not yet provided for this theorem, but the intention of course is to replace this “sorry” with an actual proof eventually.
Filling in this “sorry” is too hard to do right now, so let’s look for a simpler task to accomplish instead. Here is a simple intermediate lemma “ruzsa-nonneg” that shows up in the proof:
The expression refers to something called the entropic Ruzsa distance between and , which is something that is defined elsewhere in the project, but for the current discussion it is not important to know its precise definition, other than that it is a real number. The bubble is blue with a green border, which means that the statement has been formalized, and the proof is ready to be formalized also. The blueprint dependency graph indicates that this lemma can be deduced from just one preceding lemma, called “ruzsa-diff”:
“ruzsa-diff” is also blue and bordered in green, so it has the same current status as “ruzsa-nonneg”: the statement is formalized, and the proof is ready to be formalized also, but the proof has not been written in Lean yet. The quantity , by the way, refers to the Shannon entropy of , defined elsewhere in the project, but for this discussion we do not need to know its definition, other than to know that it is a real number.
Looking at Lemma 3.11 and Lemma 3.13 it is clear how the former will imply the latter: the quantity is clearly non-negative! (There is a factor of present in Lemma 3.11, but it can be easily canceled out.) So it should be an easy task to fill in the proof of Lemma 3.13 assuming Lemma 3.11, even if we still don’t know how to prove Lemma 3.11 yet. Let’s first look at the Lean code for each lemma. Lemma 3.11 is formalized as follows:
Again we have a “sorry” to indicate that this lemma does not currently have a proof. The Lean notation (as well as the name of the lemma) differs a little from the LaTeX version for technical reasons that we will not go into here. (Also, the variables are introduced at an earlier stage in the Lean file; again, we will ignore this point for the ensuing discussion.) Meanwhile, Lemma 3.13 is currently formalized as
OK, I’m now going to try to fill in the latter “sorry”. In my local copy of the PFR github repository, I open up the relevant Lean file in my editor (Visual Studio Code, with the lean4 extension) and navigate to the “sorry” of “rdist_nonneg”. The accompanying “Lean infoview” then shows the current state of the Lean proof:
Here we see a number of ambient hypotheses (e.g., that is an additive commutative group, that is a map from to , and so forth; many of these hypotheses are not actually relevant for this particular lemma), and at the bottom we see the goal we wish to prove.
OK, so now I’ll try to prove the claim. This is accomplished by applying a series of “tactics” to transform the goal and/or hypotheses. The first step I’ll do is to put in the factor of that is needed to apply Lemma 3.11. This I will do with the “suffices” tactic, writing in the proof
I now have two goals (and two “sorries”): one to show that implies , and the other to show that . (The yellow squiggly underline indicates that this lemma has not been fully proven yet due to the presence of “sorry”s. The dot “.” is a syntactic marker that is useful to separate the two goals from each other, but you can ignore it for this tour.) The Lean tactic “suffices” corresponds, roughly speaking, to the phrase “It suffices to show that …” (or more precisely, “It suffices to show that … . To see this, … . It remains to verify the claim …”) in Mathematical English. For my own education, I wrote a “Lean phrasebook” of further correspondences between lines of Lean code and sentences or phrases in Mathematical English, which can be found here.
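As a toy illustration of suffices (in the same spirit of inserting a factor of 2 and then discharging it with linarith; the example itself is unrelated to the PFR project):

```lean
import Mathlib

example (x : ℝ) (h : 0 ≤ 2 * x) : 0 ≤ x := by
  -- It suffices to prove the doubled inequality; linarith removes the factor of 2.
  suffices h2 : 0 ≤ 2 * x by linarith
  exact h
```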
Let’s fill in the first “sorry”. The tactic state now looks like this (cropping out some irrelevant hypotheses):
Here I can use a handy tactic “linarith”, which solves any goal that can be derived by linear arithmetic from existing hypotheses:
This works, and now the tactic state reports no goals left to prove on this branch, so we move on to the remaining sorry, in which the goal is now to prove :
Here we will try to invoke Lemma 3.11. I add the following lines of code:
The Lean tactic “have” roughly corresponds to the Mathematical English phrase “We have the statement…” or “We claim the statement…”; like “suffices”, it splits a goal into two subgoals, though in the reversed order to “suffices”.
I again have two subgoals, one to prove the bound (which I will call “h”), and then to deduce the previous goal from . For the first, I know I should invoke the lemma “diff_ent_le_rdist” that is encoding Lemma 3.11. One way to do this is to try the tactic “exact?”, which will automatically search to see if the goal can already be deduced immediately from an existing lemma. It reports:
So I try this (by clicking on the suggested code, which automatically pastes it into the right location), and it works, leaving me with the final “sorry”:
The Lean tactic “exact” corresponds, roughly speaking, to the Mathematical English phrase “But this is exactly …”.
At this point I should mention that I also have the Github Copilot extension to Visual Studio Code installed. This is an AI which acts as an advanced autocomplete that can suggest possible lines of code as one types. In this case, it offered a suggestion which was almost correct (the second line is what we need, whereas the first is not necessary, and in fact does not even compile in Lean):
In any event, “exact?” worked in this case, so I can ignore the suggestion of Copilot this time (it has been very useful in other cases though). I apply the “exact?” tactic a second time and follow its suggestion to establish the matching bound :
(One can find documentation for the “abs_nonneg” method here. Copilot, by the way, was also able to resolve this step, albeit with a slightly different syntax; there are also several other search engines available to locate this method, such as Moogle. One of the main purposes of the Lean naming conventions for lemmas, by the way, is to facilitate the location of methods such as “abs_nonneg”, which is easier to figure out how to search for than a method named (say) “Lemma 1.2.1”.) To fill in the final “sorry”, I try “exact?” one last time, to figure out how to combine and to give the desired goal, and it works!
Note that all the squiggly underlines have disappeared, indicating that Lean has accepted this as a valid proof. The documentation for “ge_trans” may be found here. The reader may observe that this method uses the relation rather than the relation, but in Lean the assertions and are “definitionally equal”, allowing tactics such as “exact” to use them interchangeably. “exact le_trans h’ h” would also have worked in this instance.
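Abstracting away all the entropy-specific definitions, the shape of the whole argument is captured by the following toy example, in which the real numbers d and e stand in for the Ruzsa distance and the entropy difference respectively (so this is only an analogy, not code from the project):

```lean
import Mathlib

example (d e : ℝ) (h : |e| ≤ 2 * d) : 0 ≤ d := by
  -- It suffices to bound 2 * d from below, as in the `suffices` step above.
  suffices h2 : 0 ≤ 2 * d by linarith
  -- The analogue of the `abs_nonneg` step, followed by linear arithmetic.
  have h' : 0 ≤ |e| := abs_nonneg e
  linarith
```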
It is possible to compactify this proof quite a bit by cutting out several intermediate steps (a procedure sometimes known as “code golf”):
And now the proof is done! In the end, it was literally a “one-line proof”, which makes sense given how close Lemma 3.11 and Lemma 3.13 were to each other.
The current version of Blueprint does not automatically verify the proof (even though it does compile in Lean), so we have to manually update the blueprint as well. The LaTeX for Lemma 3.13 currently looks like this:
I add the “\leanok” macro to the proof, to flag that the proof has now been formalized:
I then push everything back up to the master Github repository. The blueprint will take quite some time (about half an hour) to rebuild, but eventually it does, and the dependency graph (which Blueprint has for some reason decided to rearrange a bit) now shows “ruzsa-nonneg” in green:
And so the formalization of PFR moves a little bit closer to completion. (Of course, this was a particularly easy lemma to formalize, which I chose to illustrate the process; one can imagine that most other lemmas will take a bit more work.) Note that while “ruzsa-nonneg” is now colored in green, we don’t yet have a full proof of this result, because the lemma “ruzsa-diff” that it relies on is not green. Nevertheless, the proof is locally complete at this point; hopefully at some point in the future, the predecessor results will also be locally proven, at which point this result will be completely proven. Note how this blueprint structure allows one to work on different parts of the proof asynchronously; it is not necessary to wait for earlier stages of the argument to be fully formalized to start working on later stages, although I anticipate a small amount of interaction between different components as we iron out any bugs or slight inaccuracies in the blueprint. (For instance, I suspect that we may need to add some measurability hypotheses on the random variables in the above two lemmas to make them completely true, but this is something that should emerge organically as the formalization process continues.)
That concludes the brief tour! If you are interested in learning more about the project, you can follow the Zulip chat stream; you can also download Lean and work on the PFR project yourself, using a local copy of the Github repository and sending pull requests to the master copy if you have managed to fill in one or more of the “sorry”s in the current version (but if you plan to work on anything more large scale than filling in a small lemma, it is good to announce your intention on the Zulip chat to avoid duplication of effort). (One key advantage of working with a project based around a proof assistant language such as Lean is that it makes large-scale mathematical collaboration possible without necessarily having a pre-established level of trust amongst the collaborators; my fellow repository maintainers and I have already approved several pull requests from contributors we had not previously met, as the code was verified to be correct and we could see that it advanced the project. Conversely, as the above example should hopefully demonstrate, it is possible for a contributor to work on one small corner of the project without necessarily needing to understand all the mathematics that goes into the project as a whole.)
If one just wants to experiment with Lean without going to the effort of downloading it, you can try playing the “Natural Number Game” for a gentle introduction to the language, or the Lean4 playground for an online Lean server. Further resources to learn Lean4 may be found here.
Theorem 1 (Polynomial Freiman–Ruzsa conjecture) Let be such that . Then can be covered by at most translates of a subspace of of cardinality at most .
The previous best known result towards this conjecture was by Konyagin (as communicated in this paper of Sanders), who obtained a similar result but with replaced by for any (assuming that say to avoid some degeneracies as approaches , which is not the difficult case of the conjecture). The conjecture (with replaced by an unspecified constant ) has a number of equivalent forms; see this survey of Green, and these papers of Lovett and of Green and myself for some examples; in particular, as discussed in the latter two references, the constants in the inverse theorem are now polynomial in nature (although we did not try to optimize the constant).
The exponent here was the product of a large number of optimizations to the argument (our original exponent here was closer to ), but can be improved even further with additional effort (our current argument, for instance, allows one to replace it with , but we decided to state our result using integer exponents instead).
In this paper we will focus exclusively on the characteristic case (so we will be cavalier in identifying addition and subtraction), but in a followup paper we will establish similar results in other finite characteristics.
Much of the prior progress on this sort of result has proceeded via Fourier analysis. Perhaps surprisingly, our approach uses no Fourier analysis whatsoever, being conducted instead entirely in “physical space”. Broadly speaking, it follows a natural strategy, which is to induct on the doubling constant . Indeed, suppose for instance that one could show that every set of doubling constant was “commensurate” in some sense to a set of doubling constant at most . One measure of commensurability, for instance, might be the Ruzsa distance, which one might hope to control by . Then one could iterate this procedure until doubling constant dropped below say , at which point the conjecture is known to hold (there is an elementary argument that if has doubling constant less than , then is in fact a subspace of ). One can then use several applications of the Ruzsa triangle inequality
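Here, for finite non-empty subsets $A, B$ of an additive group, $d(A,B) := \log\bigl(|A-B|/(|A|^{1/2}|B|^{1/2})\bigr)$ denotes the Ruzsa distance (the standard normalization is recalled here for convenience); the triangle inequality in question is
$$ d(A,C) \le d(A,B) + d(B,C), $$
which one applies repeatedly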
to conclude (the fact that we reduce to means that the various Ruzsa distances that need to be summed are controlled by a convergent geometric series).
There are a number of possible ways to try to “improve” a set of not too large doubling by replacing it with a commensurate set of better doubling. We note two particular potential improvements:
(i) Replacing with . For instance, if was a random subset (of density ) of a large subspace of , then replacing with usually drops the doubling constant from down to nearly (under reasonable choices of parameters).
(ii) Replacing with for a “typical” . For instance, if was the union of random cosets of a subspace of large codimension, then replacing with again usually drops the doubling constant from down to nearly .
Unfortunately, there are sets where neither of the above two operations (i), (ii) significantly improves the doubling constant. For instance, if is a random density subset of random translates of a medium-sized subspace , one can check that the doubling constant stays close to if one applies either operation (i) or operation (ii). But in this case these operations don’t actually worsen the doubling constant much either, and by applying some combination of (i) and (ii) (either intersecting with a translate, or taking a sumset of with itself) one can start lowering the doubling constant again.
This begins to suggest a potential strategy: show that at least one of the operations (i) or (ii) will improve the doubling constant, or at least not worsen it too much; and in the latter case, perform some more complicated operation to locate the desired doubling constant improvement.
A sign that this strategy might have a chance of working is provided by the following heuristic argument. If has doubling constant , then the Cartesian product has doubling constant . On the other hand, by using the projection map defined by , we see that projects to , with fibres being essentially a copy of . So, morally, also behaves like a “skew product” of and the fibres , which suggests (non-rigorously) that the doubling constant of is also something like the doubling constant of , times the doubling constant of a typical fibre . This would imply that at least one of and would have doubling constant at most , and thus that at least one of operations (i), (ii) would not worsen the doubling constant.
Unfortunately, this argument does not seem to be easily made rigorous using the traditional doubling constant; even the significantly weaker statement that has doubling constant at most is false (see comments for more discussion). However, it turns out (as discussed in this recent paper of myself with Green and Manners) that things are much better. Here, the analogue of a subset in is a random variable taking values in , and the analogue of the (logarithmic) doubling constant is the entropic doubling constant , where are independent copies of . If is a random variable in some additive group and is a homomorphism, one then has what we call the fibring inequality
where the conditional doubling constant is defined as
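In schematic notation (writing $\sigma[X] := \mathbf{H}[X_1 + X_2] - \mathbf{H}[X]$ for the entropic doubling constant, with $X_1, X_2$ independent copies of $X$; this labelling is adopted here just for exposition), the conditional doubling constant may be written as
$$ \sigma[X \mid \pi(X)] := \mathbf{H}[X_1 + X_2 \mid \pi(X_1), \pi(X_2)] - \mathbf{H}[X \mid \pi(X)], $$
and the fibring inequality then takes the form
$$ \sigma[X] \ge \sigma[\pi(X)] + \sigma[X \mid \pi(X)]. $$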
Indeed, from the chain rule for Shannon entropy one has
and
while from the non-negativity of conditional mutual information one has
and it is an easy matter to combine all these identities and inequalities to obtain the claim.
Applying this inequality with replaced by two independent copies of itself, and using the addition map for , we obtain in particular that
or (since is determined by once one fixes )
So if , then at least one of or will be less than or equal to . This is the entropy analogue of at least one of (i) or (ii) improving, or at least not degrading the doubling constant, although there are some minor technicalities involving how one deals with the conditioning to in the second term that we will gloss over here (one can pigeonhole the instances of to different events , , and “depolarise” the induction hypothesis to deal with distances between pairs of random variables that do not necessarily have the same distribution). Furthermore, we can even calculate the defect in the above inequality: a careful inspection of the above argument eventually reveals that
where we now take four independent copies . This leads (modulo some technicalities) to the following interesting conclusion: if neither (i) nor (ii) leads to an improvement in the entropic doubling constant, then and are conditionally independent relative to . This situation (or an approximation to this situation) is what we refer to in the paper as the “endgame”.
A version of this endgame conclusion is in fact valid in any characteristic. But in characteristic , we can take advantage of the identity
Conditioning on , and using symmetry we now conclude that if we are in the endgame exactly (so that the mutual information is zero), then the independent sum of two copies of has exactly the same distribution; in particular, the entropic doubling constant here is zero, which is certainly a reduction in the doubling constant.
To deal with the situation where the conditional mutual information is small but not completely zero, we have to use an entropic version of the Balog-Szemerédi-Gowers lemma, but fortunately this was already worked out in an old paper of mine (although in order to optimize the final constant, we ended up using a slight variant of that lemma).
whenever and are non-negative. It can be proven as a consequence of the Newton inequality
valid for all and arbitrary real (in particular, here the are allowed to be negative). Note that the case of this inequality is just the arithmetic mean-geometric mean inequality
the general case of this inequality can be deduced from this special case by a number of standard manipulations (the most non-obvious of which is the operation of differentiating the real-rooted polynomial to obtain another real-rooted polynomial, thanks to Rolle’s theorem; the key point is that this operation preserves all the elementary symmetric means up to ). One can think of Maclaurin’s inequality as providing a refined version of the arithmetic mean-geometric mean inequality on variables (which corresponds to the case , ).
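For reference, in standard notation (recalled here; the paper's conventions may differ in minor ways), the elementary symmetric means of reals $y_1,\dots,y_n$ are
$$ s_k := \binom{n}{k}^{-1} \sum_{1 \le i_1 < \dots < i_k \le n} y_{i_1} \cdots y_{i_k}, $$
the Maclaurin inequality asserts that $s_\ell^{1/\ell} \ge s_m^{1/m}$ whenever $1 \le \ell \le m \le n$ and the $y_i$ are non-negative, and the Newton inequality asserts that
$$ s_{k-1} s_{k+1} \le s_k^2 $$
for $1 \le k \le n-1$ and arbitrary real $y_i$.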
Whereas Newton’s inequality works for arbitrary real , the Maclaurin inequality breaks down once one or more of the are permitted to be negative. A key example occurs when is even, half of the are equal to , and half are equal to . Here, one can verify that the elementary symmetric means vanish for odd and are equal to for even . In particular, some routine estimation then gives the order of magnitude bound
for even, thus giving a significant violation of the Maclaurin inequality even after putting absolute values around the . In particular, vanishing of one does not imply vanishing of all subsequent .
On the other hand, it was observed by Gopalan and Yehudayoff that if two consecutive values are small, then this makes all subsequent values small as well. More precise versions of this statement were subsequently observed by Meka-Reingold-Tal and Doron-Hatami-Hoza, who obtained estimates of the shape
whenever and are real (but possibly negative). For instance, setting we obtain the inequality
which can be established by combining the arithmetic mean-geometric mean inequality
As with the proof of the Newton inequalities, the general case of (2) can be obtained from this special case after some standard manipulations (including the differentiation operation mentioned previously).
However, if one inspects the bound (2) against the bounds (1) given by the key example, we see a mismatch – the right-hand side of (2) is larger than the left-hand side by a factor of about . The main result of the paper rectifies this by establishing the optimal (up to constants) improvement
Unlike the previous arguments, we do not rely primarily on the arithmetic mean-geometric mean inequality. Instead, our primary tool is a new inequality
valid for all and . Roughly speaking, the bound (3) would follow from (4) by setting , provided that we can show that the terms of the left-hand side dominate the sum in this regime. This can be done, after a technical step of passing to tuples which nearly optimize the required inequality (3).
We sketch the proof of the inequality (4) as follows. One can use some standard manipulations to reduce to the case where and , and after replacing with one is now left with establishing the inequality
Note that equality is attained in the previously discussed example with half of the equal to and the other half equal to , thanks to the binomial theorem.
To prove this inequality, we consider the polynomial
Evaluating this polynomial at , taking absolute values, using the triangle inequality, and then taking logarithms, we conclude that
A common task in analysis is to obtain bounds on sums
or integrals
where is some simple region (such as an interval) in one or more dimensions, and is an explicit (and elementary) non-negative expression involving one or more variables (such as or , and possibly also some additional parameters). Often, one would be content with an order of magnitude upper bound such as
or
where we use (or or ) to denote the bound for some constant ; sometimes one wishes to also obtain the matching lower bound, thus obtaining
or
where is synonymous with . Finally, one may wish to obtain a more precise bound, such as
where is a quantity that goes to zero as the parameters of the problem go to infinity (or some other limit). (For a deeper dive into asymptotic notation in general, see this previous blog post.)
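Schematically (the precise choice of symbols being a matter of convention), these notations mean
$$ X \lesssim Y, \quad X = O(Y), \quad X \ll Y \qquad\Longleftrightarrow\qquad |X| \le C\,Y \text{ for some constant } C, $$
$$ X \asymp Y \iff X \lesssim Y \lesssim X, \qquad X = (1 + o(1))\,Y \text{ with } o(1) \to 0 \text{ in the indicated limit.} $$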
Here are some typical examples of such estimation problems, drawn from recent questions on MathOverflow:
as for an explicit constant , and what is this constant?
Compared to other estimation tasks, such as that of controlling oscillatory integrals, exponential sums, singular integrals, or expressions involving one or more unknown functions (that are only known to lie in some function spaces, such as an space), high-dimensional geometry (or alternatively, large numbers of random variables), or number-theoretic structures (such as the primes), estimation of sums or integrals of non-negative elementary expressions is a relatively straightforward task, and can be accomplished by a variety of methods. The art of obtaining such estimates is typically not explicitly taught in textbooks, other than through some examples and exercises; it is typically picked up by analysts (or those working in adjacent areas, such as PDE, combinatorics, or theoretical computer science) as graduate students, while they work through their thesis or their first few papers in the subject.
Somewhat in the spirit of this previous post on analysis problem solving strategies, I am going to try here to collect some general principles and techniques that I have found useful for these sorts of problems. As with the previous post, I hope this will be something of a living document, and encourage others to add their own tips or suggestions in the comments.
Rachel Greenfeld and I have just uploaded to the arXiv our paper “Undecidability of translational monotilings”. This is a sequel to our previous paper in which we constructed a translational monotiling of a high-dimensional lattice (thus the monotile is a finite set and the translates , of partition ) which was aperiodic (there is no way to “repair” this tiling into a periodic tiling , in which is now periodic with respect to a finite index subgroup of ). This disproved the periodic tiling conjecture of Stein, Grünbaum-Shephard and Lagarias-Wang, which asserted that such aperiodic translational monotilings do not exist. (Compare with the “hat monotile”, which is a recently discovered aperiodic isometric monotile of the plane, where one is now allowed to use rotations and reflections as well as translations, or the even more recent “spectre monotile”, which is similar except that no reflections are needed.)
One of the motivations of this conjecture was the observation of Hao Wang that if the periodic tiling conjecture were true, then the translational monotiling problem is (algorithmically) decidable: there is a Turing machine which, when given a dimension and a finite subset of , can determine in finite time whether can tile . This is because if a periodic tiling exists, it can be found by computer search; and if no tiling exists at all, then (by the compactness theorem) there exists some finite subset of that cannot be covered by disjoint translates of , and this can also be discovered by computer search. The periodic tiling conjecture asserts that these are the only two possible scenarios, thus giving the decidability.
On the other hand, Wang’s argument is not known to be reversible: the failure of the periodic tiling conjecture does not automatically imply the undecidability of the translational monotiling problem, as it does not rule out the existence of some other algorithm to determine tiling that does not rely on the existence of a periodic tiling. (For instance, even with the newly discovered hat and spectre tiles, it remains an open question whether the isometric monotiling problem for (say) polygons with rational coefficients in is decidable, with or without reflections.)
The main result of this paper settles this question (with one caveat):
Theorem 1 There does not exist any algorithm which, given a dimension , a periodic subset of , and a finite subset of , determines in finite time whether there is a translational tiling of by .
The caveat is that we have to work with periodic subsets of , rather than all of ; we believe this is largely a technical restriction of our method, and it is likely that this restriction can be removed with additional effort and creativity. We also remark that when , the periodic tiling conjecture was established by Bhattacharya, and so the problem is decidable in the case. It remains open whether the tiling problem is decidable for any fixed value of (note in the above result that the dimension is not fixed, but is part of the input).
Because of a well known link between algorithmic undecidability and logical undecidability (also known as logical independence), the main theorem also implies the existence of an (in principle explicitly describable) dimension , periodic subset of , and a finite subset of , such that the assertion that tiles by translation cannot be proven or disproven in ZFC set theory (assuming of course that this theory is consistent).
As a consequence of our method, we can also replace here by “virtually two-dimensional” groups , with a finite abelian group (which now becomes part of the input, in place of the dimension ).
We now describe some of the main ideas of the proof. It is a common technique to show that a given problem is undecidable by demonstrating that some other problem that was already known to be undecidable can be “encoded” within the original problem, so that any algorithm for deciding the original problem would also decide the embedded problem. Accordingly, we will encode the Wang tiling problem as a monotiling problem in :
Problem 2 (Wang tiling problem) Given a finite collection of Wang tiles (unit squares with each side assigned some color from a finite palette), is it possible to tile the plane with translates of these tiles along the standard lattice , such that adjacent tiles have matching colors along their common edge?
It is a famous result of Berger that this problem is undecidable. The embedding of this problem into the higher-dimensional translational monotiling problem proceeds through some intermediate problems. Firstly, it is an easy matter to embed the Wang tiling problem into a similar problem which we call the domino problem:
Problem 3 (Domino problem) Given a finite collection (resp. ) of horizontal (resp. vertical) dominoes – pairs of adjacent unit squares, each of which is decorated with an element of a finite set of “pips” – is it possible to assign a pip to each unit square in the standard lattice tiling of , such that every horizontal (resp. vertical) pair of squares in this tiling is decorated using a domino from (resp. )?
Indeed, one just has to interpret each Wang tile as a separate “pip”, and define the domino sets , to be the pairs of horizontally or vertically adjacent Wang tiles with matching colors along their edge.
Next, we embed the domino problem into a Sudoku problem:
Problem 4 (Sudoku problem) Given a column width , a digit set , a collection of functions , and an “initial condition” (which we will not detail here, as it is a little technical), is it possible to assign a digit to each cell in the “Sudoku board” such that for any slope and intercept , the digits along the line lie in (and also that obeys the initial condition )?
The most novel part of the paper is the demonstration that the domino problem can indeed be embedded into the Sudoku problem. The embedding of the Sudoku problem into the monotiling problem follows from a modification of the methods in our previous papers, which had also introduced versions of the Sudoku problem, and created a “tiling language” which could be used to “program” various problems, including the Sudoku problem, as monotiling problems.
To encode the domino problem into the Sudoku problem, we need to take a domino function (obeying the domino constraints associated to some domino sets ) and use it to build a Sudoku function (obeying some Sudoku constraints relating to the domino sets); conversely, every Sudoku function obeying the rules of our Sudoku puzzle has to arise somehow from a domino function. The route to doing so was not immediately obvious, but after a helpful tip from Emmanuel Jeandel, we were able to adapt some ideas of Aanderaa and Lewis, in which certain hierarchical structures were used to encode one problem in another. Here, we interpret hierarchical structure -adically (using two different primes due to the two-dimensionality of the domino problem). The Sudoku function that will exemplify our embedding is then built from by the formula
where are two large distinct primes (for instance one can take , for concreteness), denotes the number of times divides , and is the last non-zero digit in the base expansion of :
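Written out (for definiteness; the paper's exact notation may differ), the number of times a prime $p$ divides $n$ is the $p$-adic valuation
$$ \nu_p(n) := \sup\{ j \ge 0 : p^j \mid n \}, $$
and the last non-zero digit of the base $p$ expansion of a non-zero $n$ is
$$ \bigl( n / p^{\nu_p(n)} \bigr) \bmod p. $$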
(with the conventions and ). In the case , the first component of (1) looks like this:
and a typical instance of the final component looks like this:
Amusingly, the decoration here is essentially following the rules of the children’s game “Fizz buzz”.
To demonstrate the embedding, we thus need to produce a specific Sudoku rule (as well as a more technical initial condition , which is basically required to exclude degenerate Sudoku solutions such as a constant solution) that can “capture” the target function (1), in the sense that the only solutions to this specific Sudoku puzzle are given by variants of (e.g., composed with various linear transformations). In our previous paper we were able to build a Sudoku puzzle that could similarly capture either of the first two components , of our target function (1) (up to linear transformations), by a procedure very akin to solving an actual Sudoku puzzle (combined with iterative use of a “Tetris” move in which we eliminate rows of the puzzle that we have fully solved, to focus on the remaining unsolved rows). Our previous paper treated the case when was replaced with a power of , as this was the only case that we know how to embed in a monotiling problem of the entirety of (as opposed to a periodic subset of ), but the analysis is in fact easier when is a large odd prime, instead of a power of . Once the first two components have been solved for, it is a relatively routine matter to design an additional constraint in the Sudoku rule that then constrains the third component to be of the desired form , with obeying the domino constraints.
(OEIS A365339). For instance, because the totient function is non-decreasing on the set or , but not on the set .
Since for any prime , we have , where is the prime counting function. Empirically, the primes come quite close to achieving the maximum length ; indeed it was conjectured by Pollack, Pomerance, and Treviño, based on numerical evidence, that one had
for all ; this conjecture is verified up to . The previous best known upper bound was basically of the form
as for an explicit constant , from combining results from the above paper with that of Ford or of Maier-Pomerance. In this paper we obtain the asymptotic
The methods of proof turn out to be mostly elementary (the most advanced result from analytic number theory we need is the prime number theorem with classical error term). The basic idea is to isolate one key prime factor of a given number which has a sizeable influence on the totient function . For instance, for “typical” numbers , one has a factorization
where is a medium sized prime, is a significantly larger prime, and is a number with all prime factors less than . This leads to an approximation
As a consequence, if we temporarily hold fixed, and also localize to a relatively short interval, then can only be non-decreasing in if is also non-decreasing at the same time. This turns out to significantly cut down on the possible length of a non-decreasing sequence in this regime, particularly if is large; this can be formalized by partitioning the range of into various subintervals and inspecting how this (and the monotonicity hypothesis on ) constrains the values of associated to each subinterval. When is small, we instead use a factorization
where is very smooth (i.e., has no large prime factors), and is a large prime. Now we have the approximation
and we can conclude that will have to basically be piecewise constant in order for to be non-decreasing. Pursuing this analysis more carefully (in particular controlling the size of various exceptional sets in which the above analysis breaks down), we end up achieving the main theorem so long as we can prove the preliminary inequality
for all positive rational numbers . This is in fact also a necessary condition; any failure of this inequality can be easily converted to a counterexample to the bound (2), by considering numbers of the form (3) with equal to a fixed constant (and omitting a few rare values of where the approximation (4) is bad enough that is temporarily decreasing). Fortunately, there is a minor miracle, relating to the fact that the largest prime factor of the denominator of in lowest terms necessarily equals the largest prime factor of , which allows one to evaluate the left-hand side of (5) almost exactly (this expression either vanishes, or is the product of for some primes ranging up to the largest prime factor of ), and hence to easily establish (5). If one were to try to prove an analogue of our main result for the sum-of-divisors function, one would need the analogue
of (5), which looks within reach of current methods (and was even claimed without proof by Erdős), but does not have a full proof in the literature at present.
In the final section of the paper we discuss some near counterexamples to the strong conjecture (1) that indicate that it is likely going to be difficult to get close to proving this conjecture without assuming some rather strong hypotheses. Firstly, we show that failure of Legendre’s conjecture on the existence of a prime between any two consecutive squares can lead to a counterexample to (1). Secondly, we show that failure of the Dickson-Hardy-Littlewood conjecture can lead to a separate (and more dramatic) failure of (1), in which the primes are no longer the dominant sequence on which the totient function is non-decreasing, but rather the numbers which are a power of two times a prime become the dominant sequence. This suggests that any significant improvement to (2) would require assuming something comparable to the prime tuples conjecture, and perhaps also some unproven hypotheses on prime gaps.
As someone who had a relatively light graduate education in algebra, the import of Yoneda’s lemma in category theory has always eluded me somewhat; the statement and proof are simple enough, but definitely have the “abstract nonsense” flavor that one often ascribes to this part of mathematics, and I struggled to connect it to the more grounded forms of intuition, such as those based on concrete examples, that I was more comfortable with. There is a popular MathOverflow post devoted to this question, with many answers that were helpful to me, but I still felt vaguely dissatisfied. However, recently when pondering the very concrete concept of a polynomial, I managed to accidentally stumble upon a special case of Yoneda’s lemma in action, which clarified this lemma conceptually for me. In the end it was a very simple observation (and would be extremely pedestrian to anyone who works in an algebraic field of mathematics), but as I found this helpful to a non-algebraist such as myself, I thought I would share it here in case others similarly find it helpful.
In algebra we see a distinction between a polynomial form (also known as a formal polynomial), and a polynomial function, although this distinction is often elided in more concrete applications. A polynomial form in, say, one variable with integer coefficients, is a formal expression of the form
where are coefficients in the integers, and is an indeterminate: a symbol that is often intended to be interpreted as an integer, real number, complex number, or element of some more general ring , but is for now a purely formal object. The collection of such polynomial forms is denoted , and is a commutative ring.
A polynomial form can be interpreted in any ring (even non-commutative ones) to create a polynomial function, defined by the formula
for any . This definition (2) looks so similar to the definition (1) that we usually abuse notation and conflate with . This conflation is supported by the identity theorem for polynomials, which asserts that if two polynomial forms agree at infinitely many (say) complex numbers, thus for infinitely many , then they agree as polynomial forms (i.e., their coefficients match). But this conflation is sometimes dangerous, particularly when working in finite characteristic. For instance:
(i) The linear forms and are distinct as polynomial forms, but agree when interpreted in the ring , since for all .
(ii) Similarly, if is a prime, then the degree one form and the degree form are distinct as polynomial forms (and in particular have distinct degrees), but agree when interpreted in the ring , thanks to Fermat’s little theorem; a short computational check of this example is sketched just after this list.
(iii) The polynomial form has no roots when interpreted in the reals , but has roots when interpreted in the complex numbers . Similarly, the linear form has no roots when interpreted in the integers , but has roots when interpreted in the rationals .
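To make the form/function distinction concrete, here is a short Python sketch of example (ii), using an ad hoc coefficient-list representation of polynomial forms (the representation and the helper `evaluate` are purely illustrative, not from any standard library): the forms $x$ and $x^p$ have different coefficient lists, yet define the same function on $\mathbb{Z}/p\mathbb{Z}$.

```python
# Illustrative sketch: a polynomial form is modelled as its coefficient list
# [a_0, a_1, ..., a_n]; a polynomial function is obtained by evaluating that
# form in a chosen ring (here, the integers or Z/nZ).

def evaluate(coeffs, x, modulus=None):
    """Interpret the form `coeffs` at the point x, working in Z/modulus if given."""
    value = 0
    for a in reversed(coeffs):  # Horner's scheme
        value = value * x + a
        if modulus is not None:
            value %= modulus
    return value

p = 7
form_x = [0, 1]            # the form x
form_xp = [0] * p + [1]    # the form x^p

# Distinct as polynomial forms: their coefficient lists differ.
assert form_x != form_xp

# Equal as polynomial functions on Z/pZ, by Fermat's little theorem.
assert all(evaluate(form_x, a, p) == evaluate(form_xp, a, p) for a in range(p))
```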
The above examples show that if one only interprets polynomial forms in a specific ring , then some information about the polynomial could be lost (and some features of the polynomial, such as roots, may be “invisible” to that interpretation). But this turns out not to be the case if one considers interpretations in all rings simultaneously, as we shall now discuss.
If are two different rings, then the polynomial functions and arising from interpreting a polynomial form in these two rings are, strictly speaking, different functions. However, they are often closely related to each other. For instance, if is a subring of , then agrees with the restriction of to . More generally, if there is a ring homomorphism from to , then and are intertwined by the relation
which basically asserts that ring homomorphisms respect polynomial operations. Note that the previous observation corresponded to the case when was an inclusion homomorphism. Another example comes from the complex conjugation automorphism on the complex numbers, in which case (3) asserts the identity
for any polynomial function on the complex numbers, and any complex number .
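As another quick sanity check of the intertwining relation (3), one can take the reduction homomorphism from the integers to $\mathbb{Z}/n\mathbb{Z}$. In the same ad hoc coefficient-list representation as in the sketch above, the check looks as follows (again an illustration of mine, with an arbitrarily chosen form and modulus):

```python
# Checking the intertwining relation (3) for the reduction homomorphism Z -> Z/nZ:
# reducing mod n after evaluating the form over Z agrees with evaluating the
# form in Z/nZ at the reduced input.

def evaluate(coeffs, x, modulus=None):
    value = 0
    for a in reversed(coeffs):  # Horner's scheme
        value = value * x + a
        if modulus is not None:
            value %= modulus
    return value

n = 12
form = [5, -3, 0, 2]  # the form 5 - 3x + 2x^3
assert all(evaluate(form, a) % n == evaluate(form, a % n, n) for a in range(-50, 50))
```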
What was surprising to me (as someone who had not internalized the Yoneda lemma) was that the converse statement was true: if one had a function associated to every ring that obeyed the intertwining relation
for every ring homomorphism , then there was a unique polynomial form such that for all rings . This seemed surprising to me because the functions were a priori arbitrary functions, and as an analyst I would not expect them to have polynomial structure. But the fact that (4) holds for all rings and all homomorphisms is rather powerful. As an analyst, I am tempted to proceed by first working with the ring of complex numbers and taking advantage of the aforementioned identity theorem, but this turns out to be tricky because does not “talk” to all the other rings enough, in the sense that there are not always as many ring homomorphisms from to as one would like. But there is in fact a more elementary argument that takes advantage of a ring that is particularly relevant (and “talkative”) to the theory of polynomials, namely the ring of polynomials themselves. Given any other ring , and any element of that ring, there is a unique ring homomorphism from to that maps to , namely the evaluation map
that sends a polynomial form to its evaluation at . Applying (4) to this ring homomorphism, and specializing to the element of , we conclude that
for any ring and any . If we then define to be the formal polynomial
then this identity can be rewritten as
and so we have indeed shown that the family arises from a polynomial form . Conversely, from the identity
valid for any polynomial form , we see that two polynomial forms can only generate the same polynomial functions for all rings if they are identical as polynomial forms. So the polynomial form associated to the family is unique.
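The extraction step can be rendered in a very concrete (if pedestrian) computational form: if a family of functions is defined uniformly by ring operations, and hence automatically obeys (4), then feeding it the indeterminate of the polynomial ring recovers the generating polynomial form. The following Python sketch illustrates this with a minimal, ad hoc model of $\mathbb{Z}[x]$ (the `Poly` class and the sample family `F` are my own illustrative choices, not part of any library or of the argument above):

```python
# A concrete rendering of the extraction argument: a family defined uniformly by
# ring operations, when fed the indeterminate x of Z[x], outputs the polynomial
# form that generates the whole family.

class Poly:
    """An element of Z[x], stored as a coefficient list [a_0, a_1, ...]."""
    def __init__(self, coeffs):
        while coeffs and coeffs[-1] == 0:   # strip trailing zeros, so that
            coeffs = coeffs[:-1]            # equality of forms is list equality
        self.coeffs = list(coeffs)

    def __add__(self, other):
        other = other if isinstance(other, Poly) else Poly([other])
        n = max(len(self.coeffs), len(other.coeffs))
        a = self.coeffs + [0] * (n - len(self.coeffs))
        b = other.coeffs + [0] * (n - len(other.coeffs))
        return Poly([x + y for x, y in zip(a, b)])

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Poly) else Poly([other])
        out = [0] * max(len(self.coeffs) + len(other.coeffs) - 1, 1)
        for i, a in enumerate(self.coeffs):
            for j, b in enumerate(other.coeffs):
                out[i + j] += a * b
        return Poly(out)

    __rmul__ = __mul__

    def __eq__(self, other):
        return self.coeffs == other.coeffs

    def __repr__(self):
        return f"Poly({self.coeffs})"

# A "natural" family: the same ring-operation recipe applied in every ring.
def F(x):
    return 3 * x * x + x + 7      # works verbatim in Z, Z[x], Z/nZ, ...

# Feed F the indeterminate of Z[x]: the output is the generating polynomial form.
X = Poly([0, 1])
form = F(X)
assert form == Poly([7, 1, 3])    # i.e. the form 7 + x + 3x^2

# The same recipe, applied to ordinary integers, agrees with evaluating the
# extracted form -- i.e. F really is "evaluation of the form" in each ring.
def evaluate(coeffs, x):
    value = 0
    for a in reversed(coeffs):
        value = value * x + a
    return value

assert all(F(a) == evaluate(form.coeffs, a) for a in range(-10, 10))
```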
We have thus created an identification of form and function: polynomial forms are in one-to-one correspondence with families of functions obeying the intertwining relation (4). But this identification can be interpreted as a special case of the Yoneda lemma, as follows. There are two categories in play here: the category of rings (where the morphisms are ring homomorphisms), and the category of sets (where the morphisms are arbitrary functions). There is an obvious forgetful functor between these two categories that takes a ring and removes all of the algebraic structure, leaving behind just the underlying set. A collection of functions (i.e., -morphisms) for each in that obeys the intertwining relation (4) is precisely the same thing as a natural transformation from the forgetful functor to itself. So we have identified formal polynomials in as a set with natural endomorphisms of the forgetful functor:
Informally: polynomial forms are precisely those operations on rings that are respected by ring homomorphisms.
What does this have to do with Yoneda’s lemma? Well, remember that every element of a ring comes with an evaluation homomorphism . Conversely, every homomorphism from to will be of the form for a unique – indeed, will just be the image of under this homomorphism. So the evaluation homomorphism provides a one-to-one correspondence between elements of and ring homomorphisms in . This correspondence is at the level of sets, so this gives the identification
Thus our identification can be written as
which is now clearly a special case of the Yoneda lemma
that applies to any functor from a (locally small) category and any object in . And indeed if one inspects the standard proof of this lemma, it is essentially the same as the argument we used above to establish the identification (5). More generally, it seems to me that the Yoneda lemma is often used to identify “formal” objects with their “functional” interpretations, as long as one simultaneously considers interpretations across an entire category (such as the category of rings), as opposed to just a single interpretation in a single object of the category, in which case there may be some loss of information due to the peculiarities of that specific object. Grothendieck’s “functor of points” interpretation of a scheme, discussed in this previous blog post, is one typical example of this.
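To record the specialization explicitly in standard notation (as a recap of the discussion above): the Yoneda lemma asserts a bijection

$$ \mathrm{Nat}\big(\mathrm{Hom}_{\mathcal C}(A, -),\, F\big) \;\cong\; F(A) $$

for any locally small category $\mathcal C$, object $A$ of $\mathcal C$, and functor $F \colon \mathcal C \to \mathbf{Set}$. The discussion above corresponds to taking $\mathcal C = \mathbf{Ring}$, $A = \mathbb Z[x]$, and $F = U$ the forgetful functor, together with the identification $U \cong \mathrm{Hom}_{\mathbf{Ring}}(\mathbb Z[x], -)$ furnished by the evaluation homomorphisms, so that

$$ \mathrm{Nat}(U, U) \;\cong\; \mathrm{Nat}\big(\mathrm{Hom}_{\mathbf{Ring}}(\mathbb Z[x], -),\, U\big) \;\cong\; U(\mathbb Z[x]) \;=\; \mathbb Z[x]. $$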
The point is that the main contributions to the mean value of are driven not by “typical” numbers of some size , but rather by numbers that have a splitting
where is the product of primes between some intermediate threshold and , and behaves “typically” (so in particular, it has about prime factors, as per the Hardy-Ramanujan law and the Erdős-Kac law), but is the product of primes up to and has double the number of typical prime factors – , rather than – and is thus the type of number that would make a significant contribution to the mean value of the divisor function . Here is such that is an integer in the range
for some small constant . There are basically different values of , and these give essentially disjoint contributions. From the easy inequalities
(the latter coming from the pigeonhole principle) and the fact that has mean about one, one would expect to get the above result provided that one could get a lower bound of the form
for most typical with prime factors between and . Unfortunately, due to the lack of small prime factors in , the arguments of Ford, Green, and Koukoulopoulos that give (1) for typical do not quite work for the rougher numbers . However, it turns out that one can get around this problem by replacing (2) by the more efficient inequality
where
is an enlarged version of when . This inequality is easily proven by applying the pigeonhole principle to the factors of of the form , where is one of the factors of , and is one of the factors of in the optimal interval . The extra room provided by the enlargement of the range to turns out to be sufficient to adapt the Ford-Green-Koukoulopoulos argument to the rough setting. In fact we are able to use the main technical estimate from that paper as a “black box”, namely that if one considers a random subset of for some small and sufficiently large with each lying in with an independent probability , then with high probability there should be subset sums of that attain the same value. (Initially, what “high probability” means is just “close to “, but one can reduce the failure probability significantly as by a “tensor power trick” taking advantage of Bennett’s inequality.)
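For the reader's convenience, one standard formulation of Bennett's inequality (the concentration inequality invoked here; the notation below is not that of the paper) reads as follows: if $X_1,\dots,X_n$ are independent mean-zero random variables with $|X_i| \le b$ almost surely and $\sigma^2 := \sum_{i=1}^n \mathbf{E} X_i^2$, then for all $t \ge 0$

$$ \mathbf{P}\Big( \sum_{i=1}^n X_i \ge t \Big) \;\le\; \exp\Big( -\frac{\sigma^2}{b^2}\, h\Big(\frac{bt}{\sigma^2}\Big) \Big), \qquad h(u) := (1+u)\log(1+u) - u. $$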