Traditionally, mathematics research projects are conducted by a small number (typically one to five) of expert mathematicians, each of whom is familiar enough with all aspects of the project to verify each other’s contributions. It has been challenging to organize mathematical projects at larger scales, particularly those that involve contributions from the general public, due to the need to verify all of the contributions; a single error in one component of a mathematical argument could invalidate the entire project. Furthermore, the sophistication of a typical math project is such that it would not be realistic to expect a member of the public, with say an undergraduate level of mathematics education, to contribute in a meaningful way to many such projects.

For related reasons, it is also challenging to incorporate assistance from modern AI tools into a research project, as these tools can “hallucinate” plausible-looking, but nonsensical arguments, which therefore need additional verification before they could be added into the project.

Proof assistant languages, such as Lean, provide a potential way to overcome these obstacles, and allow for large-scale collaborations involving professional mathematicians, the broader public, and/or AI tools to all contribute to a complex project, provided that it can be broken up in a modular fashion into smaller pieces that can be attacked without necessarily understanding all aspects of the project as a whole. Projects to formalize an existing mathematical result (such as the formalization of the recent proof of the PFR conjecture of Marton, discussed in this previous blog post) are currently the main examples of such large-scale collaborations that are enabled via proof assistants. At present, these formalizations are mostly crowdsourced by human contributors (which include both professional mathematicians and interested members of the general public), but there are also some nascent efforts to incorporate more automated tools (either “good old-fashioned” automated theorem provers, or more modern AI-based tools) to assist with the (still quite tedious) task of formalization.

However, I believe that this sort of paradigm can also be used to explore new mathematics, as opposed to formalizing existing mathematics. The online collaborative “Polymath” projects that several people including myself organized in the past are one example of this; but as they did not incorporate proof assistants into the workflow, the contributions had to be managed and verified by the human moderators of the project, which was quite a time-consuming responsibility, and one which limited the ability to scale these projects up further. But I am hoping that the addition of proof assistants will remove this bottleneck.

I am particularly interested in the possibility of using these modern tools to explore a class of many mathematical problems at once, as opposed to the current approach of focusing on only one or two problems at a time. This seems like an inherently modularizable and repetitive task, which could particularly benefit from both crowdsourcing and automated tools, if given the right platform to rigorously coordinate all the contributions; and it is a type of mathematics that previous methods usually could not scale up to (except perhaps over a period of many years, as individual papers slowly explore the class one data point at a time until a reasonable intuition about the class is attained). Among other things, having a large data set of problems to work on could be helpful for benchmarking various automated tools and comparing the efficacy of different workflows.

One recent example of such a project was the Busy Beaver Challenge, which showed this July that the fifth Busy Beaver number {BB(5)} was equal to {47176870}. Some older crowdsourced computational projects, such as the Great Internet Mersenne Prime Search (GIMPS), are also somewhat similar in spirit to this type of project (though using more traditional proof of work certificates instead of proof assistants). I would be interested in hearing of any other extant examples of crowdsourced projects exploring a mathematical space, and whether there are lessons from those examples that could be relevant for the project I propose here.

More specifically I would like to propose the following (admittedly artificial) project as a pilot to further test out this paradigm, which was inspired by a MathOverflow question from last year, and discussed somewhat further on my Mastodon account shortly afterwards.

The problem is in the field of universal algebra, and concerns the (medium-scale) exploration of simple equational theories for magmas. A magma is nothing more than a set {G} equipped with a binary operation {\circ: G \times G \rightarrow G}. Initially, no additional axioms on this operation {\circ} are imposed, and as such magmas by themselves are somewhat boring objects. Of course, with additional axioms, such as the identity axiom or the associative axiom, one can get more familiar mathematical objects such as groups, semigroups, or monoids. Here we will be interested in (constant-free) equational axioms, which are axioms of equality involving expressions built from the operation {\circ} and one or more indeterminate variables in {G}. Two familiar examples of such axioms are the commutative axiom

\displaystyle  x \circ y = y \circ x

and the associative axiom

\displaystyle  (x \circ y) \circ z = x \circ (y \circ z),

where {x,y,z} are indeterminate variables in the magma {G}. On the other hand the (left) identity axiom {e \circ x = x} would not be considered an equational axiom here, as it involves a constant {e \in G} (the identity element); we exclude axioms involving such constants from consideration.

To illustrate the project I have in mind, let me first introduce eleven examples of equational axioms for magmas:

  • Equation1: {x=y}
  • Equation2: {x \circ y = z \circ w}
  • Equation3: {x \circ y = x}
  • Equation4: {(x \circ x) \circ y = y \circ x}
  • Equation5: {x \circ (y \circ z) = (w \circ u) \circ v}
  • Equation6: {x \circ y = x \circ z}
  • Equation7: {x \circ y = y \circ x}
  • Equation8: {x \circ (y \circ z) = (x \circ w) \circ u}
  • Equation9: {x \circ (y \circ z) = (x \circ y) \circ w}
  • Equation10: {x \circ (y \circ z) = (x \circ y) \circ z}
  • Equation11: {x = x}
Thus, for instance, Equation7 is the commutative axiom, and Equation10 is the associative axiom. The constant axiom Equation1 is the strongest, as it forces the magma {G} to have at most one element; at the opposite extreme, the reflexive axiom Equation11 is the weakest, being satisfied by every single magma.

One can then ask which axioms imply which others. For instance, Equation1 implies all the other axioms in this list, which in turn imply Equation11. Equation8 implies Equation9 as a special case, which in turn implies Equation10 as a special case. The full poset of implications can be depicted by the following Hasse diagram:

This in particular answers the MathOverflow question of whether there were equational axioms intermediate between the constant axiom Equation1 and the associative axiom Equation10.

Most of the implications here are quite easy to prove, but there is one non-trivial one, obtained in this answer to a MathOverflow post closely related to the preceding one:

Proposition 1 Equation4 implies Equation7.

Proof: Suppose that {G} obeys Equation4, thus

\displaystyle  (x \circ x) \circ y = y \circ x \ \ \ \ \ (1)

for all {x,y \in G}. Specializing to {y=x \circ x}, we conclude

\displaystyle (x \circ x) \circ (x \circ x) = (x \circ x) \circ x

and hence by another application of (1) we see that {x \circ x} is idempotent:

\displaystyle  (x \circ x) \circ (x \circ x) = x \circ x. \ \ \ \ \ (2)

Now, replacing {x} by {x \circ x} in (1) and then using (2), we see that

\displaystyle  (x \circ x) \circ y = y \circ (x \circ x),

so in particular {x \circ x} commutes with {y \circ y}:

\displaystyle  (x \circ x) \circ (y \circ y) = (y \circ y) \circ (x \circ x). \ \ \ \ \ (3)

Also, from two applications of (1) one has

\displaystyle  (x \circ x) \circ (y \circ y) = (y \circ y) \circ x = x \circ y.

Thus (3) simplifies to {x \circ y = y \circ x}, which is Equation7. \Box

A formalization of the above argument in Lean can be found here.
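
For readers curious what such a formalization looks like in miniature, here is a self-contained Lean 4 sketch of Proposition 1, included purely for illustration; it is not the linked formalization (which may be organized quite differently), just a direct transcription of the proof above, with a magma modelled as an arbitrary type equipped with a binary operation.

```lean
-- Illustrative sketch only; the repository's actual formalization may differ.
-- Equation4 is the hypothesis `h`; the conclusion is Equation7 (commutativity).
theorem Equation4_implies_Equation7 {G : Type} (op : G → G → G)
    (h : ∀ x y : G, op (op x x) y = op y x) :
    ∀ x y : G, op x y = op y x := by
  intro x y
  -- (2): squares are idempotent
  have idem : ∀ z : G, op (op z z) (op z z) = op z z :=
    fun z => (h z (op z z)).trans (h z z)
  -- squares commute with every element
  have comm_sq : ∀ z u : G, op (op z z) u = op u (op z z) := by
    intro z u
    have hz := h (op z z) u
    rwa [idem z] at hz
  -- two applications of (1): (x ∘ x) ∘ (y ∘ y) equals x ∘ y, and also y ∘ x
  have hxy : op (op x x) (op y y) = op x y := (h x (op y y)).trans (h y x)
  have hyx : op (op y y) (op x x) = op y x := (h y (op x x)).trans (h x y)
  rw [← hxy, ← hyx, comm_sq x (op y y)]
```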

I will remark that the general question of determining whether one set of equational axioms implies another is undecidable; see Theorem 14 of this paper of Perkins. (This is similar in spirit to the more well known undecidability of various word problems.) So, the situation here is somewhat similar to the Busy Beaver Challenge, in that past a certain point of complexity, we would necessarily encounter unsolvable problems; but hopefully there would be interesting problems and phenomena to discover before we reach that threshold.

The above Hasse diagram does not just assert implications between the listed equational axioms; it also asserts non-implications between the axioms. For instance, as seen in the diagram, the commutative axiom Equation7 does not imply the Equation4 axiom

\displaystyle  (x \circ x) \circ y = y \circ x.

To see this, one simply has to produce an example of a magma that obeys the commutative axiom Equation7, but not the Equation4 axiom; but in this case one can simply choose (for instance) the natural numbers {{\bf N}} with the addition operation {x \circ y := x+y}. More generally, the diagram asserts the following non-implications, which (together with the indicated implications) completely describes the poset of implications between the eleven axioms:
  • Equation2 does not imply Equation3.
  • Equation3 does not imply Equation5.
  • Equation3 does not imply Equation7.
  • Equation5 does not imply Equation6.
  • Equation5 does not imply Equation7.
  • Equation6 does not imply Equation7.
  • Equation6 does not imply Equation10.
  • Equation7 does not imply Equation6.
  • Equation7 does not imply Equation10.
  • Equation9 does not imply Equation8.
  • Equation10 does not imply Equation9.
  • Equation10 does not imply Equation6.
The reader is invited to come up with counterexamples that demonstrate some of these non-implications. The hardest type of counterexample to find is one showing that Equation9 does not imply Equation8: a solution (in Lean) can be found here. Proofs in Lean of all the above implications and anti-implications can be found in this github repository file.

As one can see, it is already somewhat tedious to compute the Hasse diagram of just eleven equations. The project I propose is to try to expand this Hasse diagram by a couple orders of magnitude, covering a significantly larger set of equations. The set I propose is the set {{\mathcal E}} of equations that use the magma operation {\circ} at most four times, up to relabeling and the reflexive and symmetric axioms of equality; this includes the eleven equations above, but also many more. How many more? Recall that the Catalan number {C_n} is the number of ways one can form an expression out of {n} applications of a binary operation {\circ} (applied to {n+1} placeholder variables); and, given a string of {m} placeholder variables, the Bell number {B_m} is the number of ways (up to relabeling) to assign names to each of these variables, where some of the placeholders are allowed to be assigned the same name. As a consequence, ignoring symmetry, the number of equations that involve at most four operations is

\displaystyle  \sum_{n,m \geq 0: n+m \leq 4} C_n C_m B_{n+m+2} = 9131.

The number of equations in which the left-hand side and right-hand side are identical is

\displaystyle  \sum_{n=0}^2 C_n B_{n+1} = 1 \cdot 1 + 1 \cdot 2 + 2 \cdot 5 = 13;

these are all equivalent to the reflexive axiom (Equation11). The remaining {9118} equations come in pairs by the symmetry of equality, so the total size of {{\mathcal E}} is

\displaystyle  1 + \frac{9118}{2} = 4560.

I have not yet generated the full list of such identities, but presumably this will be straightforward to do in a standard computer language such as Python (I have not tried this, but I imagine some back-and-forth with a modern AI would let one generate most of the required code). [UPDATE, Sep 26: Amir Livne Bar-on has kindly enumerated all the equations, of which there are actually 4694.]
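
To give a sense of what such an enumeration might look like, here is a rough Python sketch of the sort of code I have in mind (I have not checked it against the list mentioned in the update, so the exact count it produces should be treated with caution, in particular with regard to how the trivially true equations with identical sides are handled):

```python
# Sketch: enumerate equations using the magma operation at most four times,
# up to relabelling of variables and the symmetry of equality.
from itertools import product

def shapes(n):
    """All expression shapes with exactly n applications of the operation;
    a leaf is None, an internal node is a pair (left, right)."""
    if n == 0:
        return [None]
    return [(l, r) for i in range(n) for l in shapes(i) for r in shapes(n - 1 - i)]

def leaves(shape):
    return 1 if shape is None else leaves(shape[0]) + leaves(shape[1])

def fill(shape, names):
    """Attach variable names to the leaves of a shape, left to right."""
    it = iter(names)
    def go(s):
        return next(it) if s is None else (go(s[0]), go(s[1]))
    return go(shape)

def canonical(lhs, rhs):
    """Relabel variables in order of first appearance, and identify an equation
    with its mirror image under the symmetry of equality."""
    def relabel(expr):
        seen = {}
        def go(s):
            if isinstance(s, tuple):
                return (go(s[0]), go(s[1]))
            return seen.setdefault(s, len(seen))
        return go(expr)
    return min(repr(relabel((lhs, rhs))), repr(relabel((rhs, lhs))))

equations = set()
for n in range(5):               # operations on the left-hand side
    for m in range(5 - n):       # operations on the right-hand side
        for ls in shapes(n):
            for rs in shapes(m):
                k = leaves(ls) + leaves(rs)
                for names in product(range(k), repeat=k):
                    lhs = fill(ls, names[:leaves(ls)])
                    rhs = fill(rs, names[leaves(ls):])
                    equations.add(canonical(lhs, rhs))
print(len(equations))
# Note: equations whose two sides are syntactically identical are all counted
# separately here, even though they are all equivalent to Equation11.
```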

It is not clear to me at all what the geometry of {{\mathcal E}} will look like. Will most equations be incomparable with each other? Will it stratify into layers of “strong” and “weak” axioms? Will there be a lot of equivalent axioms? It might be interesting to record now any speculations as to what the structure of this poset will be, and compare these predictions with the outcome of the project afterwards.

A brute force computation of the poset {{\mathcal E}} would then require {4560 \times (4560-1) = 20789040} comparisons, which looks rather daunting; but of course due to the axioms of a partial order, one could presumably identify the poset by a much smaller number of comparisons. I am thinking that it should be possible to crowdsource the exploration of this poset in the form of submissions to a central repository (such as the github repository I just created) of proofs in Lean of implications or non-implications between various equations, which could be validated in Lean, and also checked against some file recording the current status (true, false, or open) of all the {20789040} comparisons, to avoid redundant effort. Most submissions could be handled automatically, with relatively little human moderation required; and the status of the poset could be updated after each such submission.
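
As a concrete (and purely illustrative) sketch of the bookkeeping such a repository would need, the following Python fragment takes the comparisons settled so far and deduces everything that already follows from the axioms of a partial order, so that contributors are not asked to prove redundant statements. The data format is made up for the example; the toy data at the end encodes some of the implications and non-implications among the eleven equations above.

```python
def close(n, implies, not_implies):
    """implies / not_implies: sets of ordered pairs (i, j), read as
    'Equation i implies (resp. does not imply) Equation j'."""
    imp = set(implies) | {(i, i) for i in range(1, n + 1)}   # every law implies itself
    changed = True
    while changed:                                           # transitive closure
        changed = False
        for (i, j) in list(imp):
            for (j2, k) in list(imp):
                if j == j2 and (i, k) not in imp:
                    imp.add((i, k))
                    changed = True
    # If i => j, m => l and i =/=> l are all known, then j =/=> m
    # (otherwise i => j => m => l would contradict i =/=> l).
    non = set(not_implies)
    for (i, l) in not_implies:
        for (i2, j) in imp:
            if i2 != i:
                continue
            for (m, l2) in imp:
                if l2 == l:
                    non.add((j, m))
    assert not (imp & non), "inconsistent data: a pair is both proved and refuted"
    return imp, non

# toy data taken from the discussion above (1-indexed equation labels)
known_imp = {(1, k) for k in range(2, 12)} | {(8, 9), (9, 10), (4, 7)}
known_non = {(7, 4), (10, 9), (6, 7), (7, 6)}
imp, non = close(11, known_imp, known_non)
print(len(imp), "known implications,", len(non), "known non-implications")
```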

I would imagine that there is some “low-hanging fruit” that could establish a large number of implications (or anti-implications) quite easily. For instance, laws such as Equation2 or Equation3 more or less completely describe the binary operation {\circ}, and it should be quite easy to check which of the {4560} laws are implied by either of these two laws. The poset {{\mathcal E}} has a reflection symmetry associated to replacing the binary operator {\circ} by its reflection {\circ^{\mathrm{op}}: (x,y) \mapsto y \circ x}, which in principle cuts down the total work by a factor of about two. Specific examples of magmas, such as the natural numbers with the addition operation, obey some set of equations in {{\mathcal E}} but not others, and so could be used to generate a large number of anti-implications. Some existing automated proving tools for equational logic, such as Prover9 and Mace4 (for obtaining implications and anti-implications respectively), could then be used to handle most of the remaining “easy” cases (though some work may be needed to convert the outputs of such tools into Lean). The remaining “hard” cases could then be targeted by some combination of human contributors and more advanced AI tools.
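
To illustrate concretely how a single specific magma generates many anti-implications at once, here is a small Python sketch; the encoding of a handful of the eleven laws above is ad hoc, and the full project would of course need all the laws of {{\mathcal E}} in a uniform format. Any law that holds in the chosen magma, paired with any law that fails in it, yields an anti-implication.

```python
from itertools import product

# A few of the eleven laws above, encoded as predicates in the operation o
# and six variable values (unused variables are simply ignored).
LAWS = {
    1:  lambda o, x, y, z, w, u, v: x == y,
    3:  lambda o, x, y, z, w, u, v: o(x, y) == x,
    4:  lambda o, x, y, z, w, u, v: o(o(x, x), y) == o(y, x),
    7:  lambda o, x, y, z, w, u, v: o(x, y) == o(y, x),
    10: lambda o, x, y, z, w, u, v: o(x, o(y, z)) == o(o(x, y), z),
    11: lambda o, x, y, z, w, u, v: x == x,
}

def satisfied_laws(op, elements):
    """Labels of the laws in LAWS that hold for all assignments of variables."""
    return {k for k, law in LAWS.items()
            if all(law(op, *vals) for vals in product(elements, repeat=6))}

# Example magma: addition on Z/5Z, a finite stand-in for (N, +).
elements = range(5)
op = lambda a, b: (a + b) % 5

sat = satisfied_laws(op, elements)
anti = {(i, j) for i in sat for j in set(LAWS) - sat}
print("laws satisfied:", sorted(sat))
print("anti-implications certified:", sorted(anti))
```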

Perhaps, in analogy with formalization projects, we could have a semi-formal “blueprint” evolving in parallel with the formal Lean component of the project. This way, the project could accept human-written proofs by contributors who do not necessarily have any proficiency in Lean, as well as contributions from automated tools (such as the aforementioned Prover9 and Mace4), whose output is in some other format than Lean. The task of converting these semi-formal proofs into Lean could then be done by other humans or automated tools; in particular I imagine modern AI tools could be particularly valuable for this portion of the workflow. I am not quite sure though if existing blueprint software can scale to handle the large number of individual proofs that would be generated by this project; and as this portion would not be formally verified, a significant amount of human moderation might also be needed here, and this also might not scale properly. Perhaps the semi-formal portion of the project could instead be coordinated on a forum such as this blog, in a similar spirit to past Polymath projects.

It would be nice to be able to integrate such a project with some sort of graph visualization software that can take an incomplete determination of the poset {{\mathcal E}} as input (in which each potential comparison {E \implies E'} in {{\mathcal E}} is marked as either true, false, or open), completes the graph as much as possible using the axioms of partial order, and then presents the partially known poset in a visually appealing way. If anyone knows of such a software package, I would be happy to hear of it in the comments.
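
I do not know of a dedicated package for this, but as a minimal sketch of one lightweight approach (using the networkx library; the API calls here reflect my understanding and should be double-checked), one could feed the currently known implications into a directed graph, take its transitive reduction (which, for a poset, is exactly the Hasse diagram), and emit Graphviz DOT for rendering; the handling of open comparisons and of clusters of equivalent axioms is omitted here.

```python
import networkx as nx

# known implications among the eleven equations above (illustrative subset)
known_implications = [(1, k) for k in range(2, 12)] + [(8, 9), (9, 10), (4, 7)]
G = nx.DiGraph(known_implications)

# transitive_reduction requires a DAG, so equivalent equations would first
# have to be merged into a single node (not done in this sketch)
H = nx.transitive_reduction(G)

lines = ["digraph implications {", "  rankdir=BT;"]
lines += [f'  "Eq{i}" -> "Eq{j}";' for i, j in H.edges()]
lines.append("}")
print("\n".join(lines))   # pipe the output into `dot -Tpng` to render
```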

Anyway, I would be happy to receive any feedback on this project; in addition to the previous requests, I would be interested in any suggestions for improving the project, as well as gauging whether there is sufficient interest in participating to actually launch it. (I am imagining running it vaguely along the lines of a Polymath project, though perhaps not formally labeled as such.)

Ben Krause, Hamed Mousavi, Joni Teräväinen, and I have just uploaded to the arXiv the paper “Pointwise convergence of bilinear polynomial averages over the primes”. This paper builds upon a previous result of Krause, Mirek, and myself, in which we demonstrated the pointwise almost everywhere convergence of the ergodic averages

\displaystyle  \frac{1}{N} \sum_{n=1}^N f(T^n x) g(T^{P(n)} x) \ \ \ \ \ (1)

as {N \rightarrow \infty}, for almost all {x \in X}, whenever {(X,T,\mu)} is a measure-preserving system (not necessarily of finite measure), and {f \in L^{p_1}(X,\mu)}, {g \in L^{p_2}(X,\mu)} for some {1 < p_1,p_2 < \infty} with {1/p_1 + 1/p_2 \leq 1}, where {P} is a polynomial with integer coefficients and degree at least two. Here we establish the prime version of this theorem, that is to say we establish the pointwise almost everywhere convergence of the averages

\displaystyle  \frac{1}{\pi(N)} \sum_{p \leq N} f(T^p x) g(T^{P(p)} x)

under the same hypotheses on {(X,T,\mu)}, {f, g}. By standard arguments this is equivalent to the pointwise almost everywhere convergence of the weighted averages

\displaystyle  \frac{1}{N} \sum_{n \leq N} \Lambda(n) f(T^n x) g(T^{P(n)} x) \ \ \ \ \ (2)

where {\Lambda} is the von Mangoldt function. Our argument also borrows from results in a recent paper of Teräväinen, who showed (among other things) that the similar averages

\displaystyle  \frac{1}{N} \sum_{n \leq N} \mu(n) f(T^n x) g(T^{P(n)} x)

converge almost everywhere (quite fast) to zero, at least if {X} is assumed to be of finite measure. Here of course {\mu} denotes the Möbius function.

The basic strategy is to try to insert the weight {\Lambda} everywhere in the proof of the convergence of (1) and adapt as needed. The weighted averages are bilinear averages associated to the bilinear symbol

\displaystyle  (\xi_1,\xi_2) \mapsto \frac{1}{N} \sum_{n \leq N} \Lambda(n) e(n \xi_1 + P(n) \xi_2).

In the unweighted case, results from the additive combinatorics theory of Peluse and Prendiville were used to essentially reduce matters to the contribution where {\xi_1,\xi_2} were “major arc”, at which point this symbol could be approximated by a more tractable symbol. Setting aside the Peluse-Prendiville step for now, the first obstacle is that the natural approximation to the symbol does not have sufficiently accurate error bounds if a Siegel zero exists. While one could in principle fix this by adding a Siegel correction term to the approximation, we found it simpler to use the arguments of Teräväinen to essentially replace the von Mangoldt weight {\Lambda} by a “Cramér approximant”

\displaystyle  \Lambda_{\mathrm{Cramer}, w}(n) := \frac{W}{\phi(W)} 1_{(n,W)=1}

where {W = \prod_{p \leq w} p} and {w} is a parameter (we make the quasipolynomial choice {w = \exp(\log^{1/C_0} N)} for a suitable absolute constant {C_0}). This approximant is then used for most of the argument, with relatively routine changes; for instance, an {L^p} improving estimate needs to be replaced by a weighted analogue that is relatively easy to establish from the unweighted version due to an {L^2} smoothing effect, and a sharp {p}-adic bilinear averaging estimate for large {p} can also be adapted to handle a suitable {p}-adic weight by a minor variant of the arguments. The most tricky step is to obtain a weighted version of the Peluse-Prendiville inverse theorem. Here we encounter the technical problem that the Cramér approximant, despite having many good properties (in particular, it is non-negative and has well-controlled correlations thanks to the fundamental lemma of sieve theory), is not of “Type I”; Type I structure turns out to be quite handy when establishing inverse theorems. So for this portion of the argument, we switch from the Cramér approximant to the Heath-Brown approximant

\displaystyle  \Lambda_{\mathrm{HB},Q}(n) := \sum_{q<Q} \frac{\mu(q)}{\phi(q)} c_q(n)

where {c_q(n)} is the Ramanujan sum

\displaystyle  c_q(n) := \sum_{r \in ({\bf Z}/q{\bf Z})^\times} e(-rn/q).

While this approximant is no longer non-negative, it is of Type I, and thus well suited for inverse theory. In our paper we set up some basic comparison theorems between {\Lambda}, {\Lambda_{\mathrm{Cramer},w}}, and {\Lambda_{\mathrm{HB},Q}} in various Gowers uniformity-type norms, which allows us to switch relatively easily between the different weights in practice; hopefully these comparison theorems will be useful in other applications as well.
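
As a purely numerical illustration of these two approximants (mine, not from the paper; the parameter choices below are arbitrary small values chosen only so that the output fits on a screen), one can tabulate {\Lambda}, {\Lambda_{\mathrm{Cramer},w}} and {\Lambda_{\mathrm{HB},Q}} side by side for small {n}:

```python
from math import gcd, log

def prime_factors(n):
    """Set of distinct prime factors of n."""
    fs, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            fs.add(p)
            n //= p
        p += 1
    if n > 1:
        fs.add(n)
    return fs

def is_prime(n):
    return n > 1 and prime_factors(n) == {n}

def von_mangoldt(n):
    fs = prime_factors(n)
    return log(min(fs)) if len(fs) == 1 else 0.0

def mobius(n):
    fs = prime_factors(n)
    if any(n % (p * p) == 0 for p in fs):
        return 0
    return (-1) ** len(fs)

def totient(n):
    phi = n
    for p in prime_factors(n):
        phi -= phi // p
    return phi

def cramer(n, w):
    """Lambda_{Cramer,w}(n): W/phi(W) if gcd(n,W)=1, else 0, with W the product of primes <= w."""
    W = 1
    for p in range(2, w + 1):
        if is_prime(p):
            W *= p
    return W / totient(W) if gcd(n, W) == 1 else 0.0

def ramanujan_sum(q, n):
    """c_q(n), via the classical evaluation: sum over d | gcd(q,n) of mu(q/d) d."""
    g = gcd(q, n)
    return sum(mobius(q // d) * d for d in range(1, g + 1) if g % d == 0)

def heath_brown(n, Q):
    """Lambda_{HB,Q}(n) = sum_{q < Q} mu(q)/phi(q) c_q(n)."""
    return sum(mobius(q) / totient(q) * ramanujan_sum(q, n) for q in range(1, Q))

print(" n   Lambda   Cramer(w=5)   HB(Q=20)")
for n in range(2, 31):
    print(f"{n:2d}  {von_mangoldt(n):6.3f}  {cramer(n, 5):11.3f}  {heath_brown(n, 20):9.3f}")
```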

Given a smooth compact Riemannian manifold {M = (M,g)}, the incompressible Euler equations can be written in abstract index notation as

\displaystyle  \partial_t u^\alpha + u^\beta \nabla_\beta u^\alpha = - \nabla^\alpha p

\displaystyle  \nabla_\alpha u^\alpha = 0

where {p: [0,T) \rightarrow C^\infty(M)} is a time-dependent scalar field (representing pressure), and {u: [0,T) \rightarrow \Gamma(TM)} is a time-dependent vector field (representing fluid velocity). Here {\nabla} is the Levi-Civita connection associated to {g}. One can recover {p} from {u} (up to constants), so I will abuse notation and refer to the solution to this system as {u} rather than {(u,p)}. Over the last few years I have been interested in the following conjecture:

Conjecture 1 (Finite time blowup) There exists a manifold {M} and a smooth solution {u} to the Euler equations that blows up at some finite time {T}.

This remains open; however, there has been progress on rougher versions of this problem. For instance, there is the well-known result of Elgindi (discussed in this previous post) that when {M = {\bf R}^3} and {\alpha>0} is sufficiently small, there exists a {C^{1,\alpha}} solution {u} to the Euler equations on {{\bf R}^3} that blows up in finite time. There has also been progress in establishing various “universality” properties of the Euler flow on manifolds (which informally state that “fluid computers” are possible); see for instance this recent survey of Cardona, Miranda, and Peralta-Salas. Unfortunately, these “fluid computers” do not combine well with scaling symmetries, and so thus far have not been able to produce (finite energy) blowups.

I have been playing with one approach to this conjecture, which reduces to solving a certain underdetermined system of partial differential equations, and then establishing some stability result for the resulting solution. However, I have not been able to make headway on solving this latter system despite its underdetermined nature; so I thought I would record my partial attempt here in case anyone is interested in pursuing it further (and also to contribute to the practice of sharing unsuccessful attempts to solve a problem, which is still quite infrequently done in our community).

To avoid technicalities let us simplify the problem by adding a forcing term {f: [0,T) \rightarrow \Gamma(TM)}:

\displaystyle  \partial_t u^\alpha + u^\beta \nabla_\beta u^\alpha = - \nabla^\alpha p + f^\alpha

\displaystyle  \nabla_\alpha u^\alpha = 0.

Standard local well-posedness theory (using the vorticity-stream formulation of the Euler equations) tells us that this problem is well-posed if the initial data {u_0} is divergence free and in {C^{1,\alpha}(M)} for some {\alpha>0}, and the forcing term {f} is in {L^1_t C^{1,\alpha}(M)}. We have the following recent result of Córdoba and Martínez-Zoroa, solving a version of the above conjecture in the presence of a reasonably regular forcing term:

Theorem 2 (Finite time blowup for the forced equation) There exists a smooth solution to the forced Euler equations on {{\bf R}^3} that exhibits finite time blowup, in which the forcing term {f} stays uniformly bounded in {C^{1,\alpha}({\bf R}^3)} for any {\alpha < 1/2}.

Roughly speaking, their argument proceeds by a multiscale construction, in which the solution is set up to eventually have some presence at a spatial scale {1/N}, which is conducive to generating an exponential “stretching” of a small forcing term at a much higher spatial scale {1/M}, which one then introduces to then set up the solution for the next scale.

As a model problem, I tried to reproduce this type of result from a more geometric perspective, trying to aim for a more “self-similar” blowup than a “multi-scale” one, in the hope that this latter type of blowup might be more tractable to analyze and eventually resolve Conjecture 1. I didn’t fully succeed; but I think the approach I outline below is in principle feasible.

The manifold I will work on is a cylinder {M = {\bf R} \times N}, where {N = (N,h)} is a smooth compact manifold, and the metric on {M} is just the sum of the standard metric {dx_1^2} on the first coordinate and {h}:

\displaystyle  dg^2 = dx_1^2 + dh^2.

(I have experimented with working with more warped metrics to gain more flexibility, but this seems to only influence lower order terms. However, such tricks may be useful to improve the regularity of the forcing term, or perhaps even to eliminate it entirely.) The idea is to try to create a solution that blows up on a slice {\{0\} \times N} on this cylinder, but stays smooth until the blowup time, and also uniformly smooth away from this slice. As such, one can localize the solution away from this slice, and replace the unbounded component {{\bf R}} of {M} by a compact circle {{\bf R}/L{\bf Z}} for some large {L} if desired. However, I prefer to work with the unbounded component {{\bf R}} here in order to scale in this direction.

If we now use Greek indices to denote only the coordinates on the “vertical” factor {N}, the velocity field {u} now becomes {(u^1, u^\alpha)}, and the Euler equations now split as

\displaystyle  \partial_t u^1 + u^1 \partial_1 u^1 + u^\beta \nabla_\beta u^1 = - \partial_1 p + f^1

\displaystyle  \partial_t u^\alpha + u^1 \partial_1 u^\alpha + u^\beta \nabla_\beta u^\alpha = - \nabla^\alpha p + f^\alpha

\displaystyle  \partial_1 u^1 + \nabla_\alpha u^\alpha = 0.

If the solution is concentrating in a narrow neighborhood of the slice {\{0\} \times N}, we expect the terms involving {\partial_1} to be quite large, and the terms involving {u^1} to be rather small. This suggests that the {\partial_1 p} pressure term is going to be more significant than the {\nabla^\alpha p} term. We therefore select the forcing term to cancel the latter term by choosing

\displaystyle  f^1 = 0; f^\alpha = \nabla^\alpha p

leaving us with the simplified equation

\displaystyle  \partial_t u^1 + u^1 \partial_1 u^1 + u^\beta \nabla_\beta u^1 = - \partial_1 p

\displaystyle  \partial_t u^\alpha + u^1 \partial_1 u^\alpha + u^\beta \nabla_\beta u^\alpha = 0

\displaystyle  \partial_1 u^1 + \nabla_\alpha u^\alpha = 0.

The nice thing about this latter equation is that the first equation is basically just a formula for {p} and so can be dropped, leaving us with

\displaystyle  \partial_t u^\alpha + u^1 \partial_1 u^\alpha + u^\beta \nabla_\beta u^\alpha = 0 \ \ \ \ \ (1)

\displaystyle  \partial_1 u^1 + \nabla_\alpha u^\alpha = 0. \ \ \ \ \ (2)

This equation now admits a two-parameter family of scale invariances

\displaystyle  u^1(t,x^1,y) \mapsto \frac{\lambda}{\mu} u^1(t/\mu, x^1/\lambda,y); u^\alpha(t,x^1,y) \mapsto \frac{1}{\mu} u^\alpha(t/\mu, x^1/\lambda,y)

for {\lambda, \mu > 0}, where we use {y} to denote the {N} coordinate. It also inherits an energy conservation law from the original Euler equations; the conserved energy is

\displaystyle  \frac{1}{2} \int_{\bf R} \int_N u^\alpha(t,x^1,y) u_\alpha(t,x^1,y)\ dy dx^1

using the metric on {N} to raise and lower the Greek indices.

It is now tempting to try to set up an approximately scale-invariant blowup solution. It seems that the first step in this is to construct a “soliton” type localized steady state solution, that is, a solution {u^1 \in C^\infty({\bf R} \times N)}, {u^\alpha : {\bf R} \rightarrow \Gamma(TN)} to the equation

\displaystyle  u^1 \partial_1 u^\alpha + u^\beta \nabla_\beta u^\alpha = 0

\displaystyle  \partial_1 u^1 + \nabla_\alpha u^\alpha = 0.

that decays in the {x^1} variable; one can then hope to do a suitable stability (or instability) analysis of this soliton to perturb it to a blowup solution, as there are many results of this type for other equations that one could use as a model. The energy conservation law does constrain to some extent the nature of the blowup (basically, the two scaling parameters {\mu,\lambda} above become linked by the relation {\mu \sim \lambda^{1/2}}), but does not seem to otherwise prevent such a blowup from occurring.

Analytically, this is not a particularly pleasant equation to try to solve; one can substitute the second equation into the first to obtain a single equation

\displaystyle  -(\partial_1^{-1} \nabla_\beta u^\beta) \partial_1 u^\alpha + u^\beta \nabla_\beta u^\alpha = 0

but the inverse derivative {\partial_1^{-1}} is difficult to work with and seems to create ill-posedness (somewhat reminiscent of the ill-posedness of the Prandtl boundary layer equation).

Nevertheless, one can still attempt to solve this equation by separation of variables. If one makes the ansatz

\displaystyle  u^1(x^1,y) = \frac{1}{1+(x^1)^2} \varphi(y)

\displaystyle  u^\alpha(x^1,y) = \frac{v^\alpha(y) + x^1 w^\alpha(y)}{(1+(x^1)^2)^2}

for some smooth {\varphi \in C^\infty(N)} and {v,w \in \Gamma(TN)}, some calculation shows that the system now reduces to a system purely on {N}:

\displaystyle  \varphi w^\alpha + v^\beta \nabla_\beta v^\alpha = 0

\displaystyle  -2 \varphi v^\alpha + v^\beta \nabla_\beta w^\alpha + w^\beta \nabla_\beta v^\alpha = 0

\displaystyle  -3 \varphi w^\alpha + w^\beta \nabla_\beta w^\alpha = 0

\displaystyle  \nabla_\alpha v^\alpha = 0.

\displaystyle  \nabla_\alpha w^\alpha = 2\varphi.

The metric {h} is hidden in this system through the covariant derivative {\nabla}. To eliminate the metric, we can lower indices to write

\displaystyle  \varphi w_\alpha + v^\beta \nabla_\beta v_\alpha = 0

\displaystyle  -2 \varphi v_\alpha + v^\beta \nabla_\beta w_\alpha + w^\beta \nabla_\beta v_\alpha = 0

\displaystyle  -3 \varphi w_\alpha + w^\beta \nabla_\beta w_\alpha = 0

\displaystyle  \mathrm{div}_h v = 0.

\displaystyle  \mathrm{div}_h w = 2\varphi.

Here the divergence is relative to the volume form induced by {h}, but by a dimension lifting trick (see Section 3 of this paper of mine) one can replace this divergence with a divergence {\mathrm{div}_{vol}} with respect to any other volume form on {N}. We have the identity

\displaystyle  v^\beta \nabla_\beta v_\alpha = v^\beta (\nabla_\beta v_\alpha - \nabla_\alpha v_\beta) + \frac{1}{2} \partial_\alpha (v^\beta v_\beta)

\displaystyle  = v \neg d\theta + \frac{1}{2} d (\theta(v))

where we have switched to coordinate-free notation, and {\theta = h \cdot v} is the {1}-form associated to {v}. If we similarly let {\lambda} be the {1}-form associated to {w}, and eliminate {\varphi}, the system now becomes

\displaystyle  \frac{1}{2} \lambda \mathrm{div}_{vol} w + v \neg d\theta + \frac{1}{2} d (\theta(v)) = 0 \ \ \ \ \ (3)

\displaystyle  -\theta \mathrm{div}_{vol} w + v \neg d\lambda + w \neg d\theta + \frac{1}{2} d (\theta(w) + \lambda(v)) = 0 \ \ \ \ \ (4)

\displaystyle  -\frac{3}{2}\lambda \mathrm{div}_{vol} w + w \neg d\lambda + \frac{1}{2} d (\lambda(w)) = 0 \ \ \ \ \ (5)

\displaystyle  \mathrm{div}_{vol} v = 0. \ \ \ \ \ (6)

Also, the fact that {\lambda,\theta} are dual to {v,w} with respect to some unspecified Riemannian metric {h} turns out to essentially be equivalent to the assumption that the Gram matrix is positive definite,

\displaystyle  \begin{pmatrix} \theta(v) & \theta(w) \\ \lambda(v) & \lambda(w) \end{pmatrix} \succ 0; \ \ \ \ \ (7)

see Section 4 of the aforementioned paper. This looks like a rather strange system; but it is three vector equations (3), (4), (5) in four vector (or one-form) unknowns {v, w, \theta, \lambda}, together with a divergence-free condition (6) and a positive definiteness condition (7), which I view as basically being scalar conditions. Thus, this system becomes underdetermined when the dimension of {N} is large enough (in fact a naive count of degrees of freedom suggests that dimension at least three should be sufficient). It should thus be possible to locate an abundant number of solutions to this system; but to my frustration, the system is just barely complicated enough to prevent me from simplifying it to the point where it becomes evident how to construct solutions; in particular, I have not been able to find a viable further ansatz to transform the system to a more tractable one. Part of the problem is that while the system is technically underdetermined, there are not that many degrees of freedom to spare to ensure this underdetermined nature, so one cannot afford to make an ansatz that sacrifices too many of these degrees of freedom. In my previous paper I was able to use a very symmetric construction (taking {N} to basically be the product of an orthogonal group and a torus) to solve a similar system that was in fact quite overdetermined, but I have not been able to exploit such symmetries here. I have also experimented with other separation of variable ansatzes, but they tend to give similar systems to the ones here (up to some changes in the coefficients).

Remark 3 One can also try to directly create a self-similar blowup to (1), (2), for instance by making the ansatz

\displaystyle  u^1(t,x^1,y) = (t^6+(x^1)^2)^{1/3} \varphi(y)

\displaystyle  u^\alpha(t,x^1,y) = \frac{v^\alpha(y)}{(t^6+(x^1)^2)^{1/3}} + \frac{w^\alpha(y) + x^1 q^\alpha(y)}{(t^6+(x^1)^2)^{2/3}} + \frac{r^\alpha(y) + x^1 s^\alpha(y)}{(t^6+(x^1)^2)}

for {t<0} and some fields {\varphi \in C^\infty(N)} and {v,w,q,r,s \in \Gamma(TN)}. This particular ansatz seems consistent with all known conservation laws; however it works out to basically be ten vector equations (plus some additional scalar constraints) on ten vector field unknowns, so is just barely overdetermined. I have not been able to locate a self-similar blowup ansatz that is underdetermined.

I’ve just uploaded to the arXiv my paper “Planar point sets with forbidden {4}-point patterns and few distinct distances”. This (very) short paper was a byproduct of my explorations of the Erdös problem website in recent months, with a vague emerging plan to locate a problem that might be suitable for some combination of a crowdsourced “Polymath” style project and/or a test case for emerging AI tools. The question below was one potential candidate; however, upon reviewing the literature on the problem, I noticed that the existing techniques only needed one additional tweak to fully resolve the problem. So I ended up writing this note instead to close off the problem.

I’ve arranged this post so that this additional trick is postponed to below the fold, so that the reader can, if desired, try to guess for themselves what the final missing ingredient needed to solve the problem was. Here is the problem (Erdös problem #135), which Erdös asked multiple times over more than two decades (and for which he even offered a small prize on one of these occasions):

Problem 1 (Erdös #135) Let {A \subset {\bf R}^2} be a set of {n} points such that any four points in the set determine at least five distinct distances. Must {A} determine {\gg n^2} many distances?

This is a cousin of the significantly more famous Erdös distinct distances problem (Erdös problem #89), which asks what is the minimum number of distances determined by a set {A \subset {\bf R}^2} of {n} points in the plane, without the restriction on four-point configurations. The example of a square grid {\{0,\dots,\sqrt{n}-1\}^2} (assuming for sake of argument that {n} is a perfect square), together with some standard analytic number theory calculations, shows that {A} can determine {\asymp n/\sqrt{\log n}} distances, and it is conjectured that this is best possible up to constants. A celebrated result of Guth and Katz, discussed in this previous blog post, shows that {A} will determine at least {\gg n/\log n} distances. Note that the lower bound {\gg n^2} here is far larger, and in fact comparable to the total number {\binom{n}{2}} of distances available, thus expressing the belief that the “local” condition that every four points determine at least five distances forces the global collection of distances to be almost completely distinct. In fact, in one of the papers posing the problem, Erdös made the even stronger conjecture that the set {A} must contain a subset {A'} of cardinality {\gg n} for which all the {\binom{|A'|}{2}} distances generated by {A'} are distinct.

A paper of Dumitrescu came close to resolving this problem. Firstly, that paper classified the ways in which four points could fail to determine five distinct distances: the four-point configuration must be one of the following eight patterns:

  • {\pi_1}: An equilateral triangle plus an arbitrary vertex.
  • {\pi_2}: A parallelogram.
  • {\pi_3}: An isosceles trapezoid (four points on a line, {P_1,P_2,P_3,P_4}, with {|P_1P_2| = |P_3P_4|}, form a degenerate isosceles trapezoid).
  • {\pi_4}: A star with three edges of the same length.
  • {\pi_5}: A path with three edges of the same length.
  • {\pi_6}: A kite.
  • {\pi_7}: An isosceles triangle plus an edge incident to a base endpoint, and whose length equals the length of the base.
  • {\pi_8}: An isosceles triangle plus an edge incident to the apex, and whose length equals the length of the base.
(See Figure 1 and Lemma 1 of Dumitrescu’s paper.) So the question is asking whether, if an {n} point set {A} avoids all of these patterns {\pi_1,\dots,\pi_8}, it must generate {\gg n^2} distances.

Given that the grid {\{0,\dots,n-1\}^2} determines only {\asymp n^2 / \sqrt{\log n}} distances, one could seek a counterexample to this by finding a set of {\asymp n} points in the grid {\{0,\dots,n-1\}^2} that avoided all of the eight patterns {\pi_1,\dots,\pi_8}.

Dumitrescu then counted how often each of the patterns {\pi_1,\dots,\pi_8} occurred inside the grid {\{0,\dots,n-1\}^2}. The answer is:

  • {\pi_1} does not occur at all. (This is related to the irrationality of {\sin \pi/3 = \sqrt{3}/2}.)
  • {\pi_2} occurs {\asymp n^6} times.
  • {\pi_3} occurs {\asymp n^5} times.
  • {\pi_4} occurs {O(n^{14/3} \log n)} times.
  • {\pi_5} occurs {O(n^{14/3} \log n)} times.
  • {\pi_6} occurs {\asymp n^5} times.
  • {\pi_7} occurs {O(n^{14/3} \log n)} times.
  • {\pi_8} occurs {O(n^{14/3} \log n)} times.
(The bounds involving {O(n^{14/3} \log n)} were obtained using the Szemerédi-Trotter theorem, and might not be optimal for this problem.) In particular, with the exception of the parallelogram pattern {\pi_2}, the other seven forbidden {4}-point patterns {\pi_1,\pi_3,\dots,\pi_8} occur at most {O(n^5)} times.

Using this and a standard probabilistic argument, Dumitrescu then established the following “near miss” to a negative answer to the above problem:

Theorem 2 (First near miss) If {n} is sufficiently large, then there exists a subset of {\{0,\dots,n-1\}^2} of cardinality {\asymp n} which avoids all of the patterns {\pi_1, \pi_3,\dots,\pi_8}.

In particular, this generates a set of {\asymp n} points with {O(n^2/\sqrt{\log n})} distances that avoids seven out of the eight required forbidden patterns; it is only the parallelograms {\pi_2} that are not avoided, and are the only remaining obstacle to a negative answer to the problem.

Proof: Let {\varepsilon>0} be a small constant, and let {A} be a random subset of {\{0,\dots,n-1\}^2}, formed by placing each element of {\{0,\dots,n-1\}^2} with an independent probability of {\varepsilon/n}. A standard application of Hoeffding’s inequality (or even the second moment method) shows that this set {A} will have cardinality {\asymp \varepsilon n} with high probability if {n} is large enough. On the other hand, each of the {O(n^5)} patterns {\pi_1,\pi_3,\dots,\pi_8} has a probability {\varepsilon^4/n^4} of lying inside {A}, so by linearity of expectation, the total number of such patterns inside {A} is {O( n^5 \varepsilon^4 / n^4 ) = O(\varepsilon^4 n)} on the average. In particular, by Markov’s inequality, we can find a set {A} of cardinality {\asymp \varepsilon n} with only {O(\varepsilon^4 n)} such patterns. Deleting all of these patterns from {A}, we obtain a set {A'} of cardinality {\asymp \varepsilon n - O(\varepsilon^4 n)}, which is {\asymp n} if {\varepsilon} is a sufficiently small constant. This establishes the claim. \Box

Unfortunately, this random set contains far too many parallelograms {\pi_2} ({\asymp n^2} such parallelograms, in fact) for this deletion argument to work. On the other hand, in earlier work of Thiele and of Dumitrescu, a separate construction of a set of {\asymp n} points in {\{0,\dots,n-1\}^2} that avoids all of the parallelograms {\pi_2} was given:

Theorem 3 (Second near miss) For {n} large, there exists a subset {S} of {\{0,\dots,n-1\}^2} of cardinality {\asymp n} which contains no parallelograms {\pi_2}. Furthermore, this set is in general position: no three points in {S} are collinear, and no four are concyclic. As a consequence, this set {S} in fact avoids the three patterns {\pi_1, \pi_2, \pi_3} (the pattern in {\pi_3} is concyclic, and the pattern {\pi_1} does not occur at all in the grid).

Proof: One uses an explicit algebraic construction, going back to an old paper of Erdös and Turán involving constructions of Sidon sets. Namely, one considers the set

\displaystyle  S := \{ (x,y) \in \{0,\dots,n-1\}^2: y = x^2 \hbox{ mod } p \} \ \ \ \ \ (1)

where {p} is a prime between {4n} and {8n} (the existence of which is guaranteed by Bertrand’s postulate). Standard Gauss sum estimates can be used to show that {S} has cardinality {\asymp n}. If {S} contained four points that were in a parallelogram or on a circle, or three points in a line, then one could lift up from {\{0,\dots,n-1\}^2} to the finite field plane {{\mathbf F}_p^2} and conclude that the finite field parabola {\{ (x,x^2): x \in {\bf F}_p \}} also contained four points in a parallelogram or a circle, or three points on a line. But straightforward algebraic calculations can be performed to show that none of these scenarios can occur. For instance, if {P, P+H, P+K, P+H+K} were four points on a parallelogram that were contained in a parabola, this would imply that an alternating sum of the form

\displaystyle  (x, x^2) - (x+h, (x+h)^2) - (x+k, (x+k)^2) + (x+h+k, (x+h+k)^2)

would vanish for some non-zero {h,k}; but this expression simplifies to {(0, 2hk)}, which cannot vanish for non-zero {h,k} as {p} is odd. (For the concyclic claim, the parabola in {{\mathbf F}_p^2} can in fact contain four points on a circle, but only if their {x} coordinates sum to zero, and this cannot happen in {S}.) \Box
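
As a small sanity check of the two computations just described (this is my own verification script, not taken from any of the cited papers), one can confirm the alternating-sum identity symbolically and brute-force the absence of parallelograms in the set {S} of (1) for one small choice of parameters; the parallelogram count below should come out to zero if the argument is correct.

```python
from sympy import symbols, expand

# the alternating sum of second coordinates of four parabola points in
# parallelogram position should simplify to 2*h*k
x, h, k = symbols('x h k')
print(expand(x**2 - (x + h)**2 - (x + k)**2 + (x + h + k)**2))

# brute-force check of parallelogram-freeness for a small instance of (1)
n, p = 20, 83                     # 83 is a prime between 4n = 80 and 8n = 160
S = [(a, (a * a) % p) for a in range(n) if (a * a) % p < n]
points = set(S)
count = 0
for i, P in enumerate(S):
    for Q in S[i + 1:]:
        for R in S:
            if R in (P, Q):
                continue
            # P and Q are opposite vertices of a parallelogram iff the fourth
            # vertex P + Q - R also lies in S (and the four points are distinct)
            fourth = (P[0] + Q[0] - R[0], P[1] + Q[1] - R[1])
            if fourth in points and fourth not in (P, Q, R):
                count += 1
print("parallelogram configurations found:", count)
```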

Given that we have one “near-miss” in the literature that avoids {\pi_1, \pi_3, \dots, \pi_8}, and another “near-miss” that avoids {\pi_1, \pi_2, \pi_3}, it is natural to try to combine these two constructions to obtain a set that avoids all eight patterns {\pi_1,\dots,\pi_8}. This inspired the following problem of Dumitrescu (see Problem 2 of this paper):

Problem 4 Does the set {S} in (1) contain a subset of cardinality {\gg n} that avoids all eight of the patterns {\pi_1, \dots, \pi_8}?

Unfortunately, this problem looked difficult, as the number-theoretic task of counting the patterns {\pi_4,\dots,\pi_8} in {S} looked quite daunting.

This ends the survey of the prior literature on this problem. Can you guess the missing ingredient needed to resolve the problem? I will place the answer below the fold.


The Erdös problem site was created last year, and announced earlier this year on this blog. Every so often, I have taken a look at a random problem from the site for fun. A few times, I was able to make progress on one of the problems, leading to a couple papers; but the more common outcome is that I play around with the problem for a while, see why the problem is difficult, and then eventually give up and do something else. But, as is common in this field, I don’t make public the observations that I made, and the next person who looks at the same problem would likely have to go through the same process of trial and error to work out what the main obstructions are.

So, as an experiment, I thought I would record here my preliminary observations on one such problem – Erdös problem #385 – to discuss why it looks difficult to solve with our current understanding of the primes. Here is the problem:

Problem 1 (Erdös Problem #385) Let

\displaystyle F(n) =\max_{\stackrel{m<n}{m\hbox{\ composite}}} m+p(m),

where {p(m)} is the least prime divisor of {m}. Is it true that {F(n)>n} for all sufficiently large {n}? Does {F(n)-n \rightarrow \infty} as {n \rightarrow \infty}?

This problem is mentioned on page 73 of this 1979 paper of Erdös (where he attributes the problem to an unpublished work of Eggleton, Erdös, and Selfridge that, to my knowledge, has never actually appeared), as well as briefly on page 92 of this 1980 paper of Erdös and Graham.

At first glance, this looks like a somewhat arbitrary problem (as many of Erdös’s problems initially do), as the function {F} is not obviously related to any other well-known function or problem. However, it turns out that this problem is closely related to the parity barrier in sieve theory (as discussed in this previous post), with the possibility of Siegel zeroes presenting a particular obstruction. I suspect that Erdös was well aware of this connection; certainly he mentions the relation with questions on gaps between primes (or almost primes), which is in turn connected to the parity problem and Siegel zeroes (as is discussed recently in my paper with Banks and Ford, and in more depth in these papers of Ford and of Granville).

Let us now explore the problem further. Let us call a natural number {n} bad if {F(n) \leq n}, so the first part of the problem is asking whether any sufficiently large {n} can be bad. We unpack the definitions: {n} is bad if and only if {m+p(m) \leq n} for every composite {m < n}, so placing {m} in intervals of the form {[n-h,n]}, the badness of {n} is equivalent to the assertion that

\displaystyle  p(m) \leq h \hbox{ for all composite } m \in [n-h, n]

for each {1 \leq h \leq n}. To put it another way, the badness of {n} asserts that for each {1 \leq h \leq n}, the residue classes {0 \hbox{ mod } p} for {p \leq h} cover all the natural numbers in the interval {[n-h,n]} except for the primes.

It is now natural to try to understand this problem for a specific choice of the interval length {h} as a function of {n}. If {h} is large in the sense that {h > \sqrt{n}}, then the claimed covering property is automatic, since every composite number less than or equal to {n} has a prime factor less than or equal to {\sqrt{n}}. On the other hand, for {h} very small, in particular {h = o(\log n)}, it is also possible to find {n} with this property. Indeed, if one takes {n} to lie in the residue class {0 \hbox{ mod } \prod_{p \leq h} p}, then we see that the residue classes cover all of {[n-h,n]} except for {n-1}, and from Linnik’s theorem we can ensure that {n-1} is prime. Thus, to rule out bad numbers, we need to understand the covering problem at intermediate scales {\log n \ll h \ll \sqrt{n}}.
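
For the first part of the problem, at least, it is easy to experiment numerically (a toy script of mine just to get a feel for the quantities involved; it of course says nothing about the behaviour for very large {n}):

```python
def least_prime_factor(m):
    p = 2
    while p * p <= m:
        if m % p == 0:
            return p
        p += 1
    return m

def F(n):
    """max of m + p(m) over composite m < n."""
    best = 0
    for m in range(4, n):
        p = least_prime_factor(m)
        if p < m:                       # m is composite
            best = max(best, m + p)
    return best

# tabulate F(n) - n for small n; bad n are exactly those where this is <= 0
for n in range(10, 201, 10):
    print(n, F(n) - n)
```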

A key case is when {h = n^{1/u}} for some {2 < u < 3}. Here, the residue classes {0 \hbox{ mod } p} for {p \leq h} sieve out everything in {[n-h,n]} except for primes and semiprimes, and specifically the semiprimes that are products of two primes between {n^{1/u}} and {n^{1-1/u}}. If one can show for some {2 < u < 3} that the largest gap between semiprimes in say {[x,2x]} with prime factors in {[x^{1/u}, x^{1-1/u}]} is {o( x^{1/u} )}, then this would affirmatively answer the first part of this problem (and also the second). This is certainly very plausible – it would follow from a semiprime version of the Cramér conjecture (and this would also make the more precise prediction {F(n) = n + n^{1/2-o(1)}}) – but remains well out of reach for now. Even assuming the Riemann hypothesis, the best upper bound on prime gaps in {[x,2x]} is {O( \sqrt{x} \log x )}, and the best upper bound on semiprime gaps is not significantly better than this – in particular, one cannot reach {x^{1/u}} for any {2 < u < 3}. (There is a remote possibility that an extremely delicate analysis near {u=2}, together with additional strong conjectures on the zeta function, such as a sufficiently quantitative version of the GUE hypothesis, may barely be able to resolve this problem, but I am skeptical of this, absent some further major breakthrough in analytic number theory.)

Given that multiplicative number theory does not seem powerful enough (even on RH) to resolve these problems, the other main approach would be to use sieve theory. In this theory, we do not really know how to exploit the specific location of the interval {[n-h,n]} or the specific congruence classes used, so one can study the more general problem of trying to cover an interval {I} of length {h} by one residue class mod {p} for each {p \leq h}, and only leaving a small number of survivors which could potentially be classified as “primes”. The discussion of the small {h} case already reveals a problem with this level of generality: one can sieve out the interval {[-h, -1]} by the residue classes {0 \hbox{ mod } p} for {p \leq h}, and leave only one survivor, {-1}. Indeed, thanks to known bounds on Jacobsthal’s function, one can be more efficient than this; for instance, using equation (1.2) from this paper of Ford, Green, Konyagin, Maynard, and myself, it is possible to completely sieve out any interval of sufficiently large length {h} using only those primes {p} up to {O(\frac{h \log\log h}{\log h \log\log\log h})}. On the other hand, from the work of Iwaniec, we know that sieving up to {o(h^{1/2})} is insufficient to completely sieve out such an interval; related to this, if one only sieves up to {h^{1/u}} for some {2 < u < 3}, the linear sieve (see e.g., Theorem 2 of this previous blog post) shows that one must have at least {(f(u)+o(1)) \frac{h}{\log h}} survivors, where {f(u)} can be given explicitly in the regime {2 < u < 3} by the formula

\displaystyle  f(u) := \frac{2e^\gamma}{u} \log(u-1).
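
For concreteness, the constant {f(u)} in this linear sieve lower bound is easy to tabulate (a small script of mine, just to put numbers to the formula above):

```python
from math import exp, log

EULER_GAMMA = 0.5772156649015329   # the Euler-Mascheroni constant

def f(u):
    """Linear sieve lower bound density, in the regime 2 < u < 3."""
    return 2 * exp(EULER_GAMMA) / u * log(u - 1)

for u in [2.1, 2.25, 2.5, 2.75, 2.99]:
    print(u, round(f(u), 4))
```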

These lower bounds are not believed to be best possible. For instance, the Maier–Pomerance conjecture on Jacobsthal’s function would indicate that one needs to sieve out primes up to {\gg h/\log^2 h} in order to completely sieve out an interval of length {h}, and it is also believed that sieving up to {h^{1-\varepsilon}} should leave {\gg_\varepsilon h/\log h} survivors, although even these strong conjectures are not enough to positively resolve this problem, since we are permitted to sieve all the way up to {h} (and we are allowed to leave every prime number as a survivor, which in view of the Brun–Titchmarsh theorem could permit as many as {O(h/\log n)} survivors).

Unfortunately, as discussed in this previous blog post, the parity problem blocks such improvements from being obtained by most standard analytic number theory methods, in particular sieve theory. A particularly dangerous enemy arises from Siegel zeroes. This is discussed in detail in the papers of Ford and of Granville mentioned previously, but an informal discussion is as follows. If there is a Siegel zero associated to the quadratic character of some conductor {q}, this roughly speaking means that almost all primes {p} (in certain ranges) will be quadratic non-residues mod {q}. In particular, if one restricts attention to numbers {n} in a residue class {a \hbox{ mod } q} that is a quadratic residue, we then expect most numbers in this class to have an even number of prime factors, rather than an odd number.

This alters the effect of sieving in such residue classes. Consider for instance the classical sieve of Eratosthenes. If one sieves out {0 \hbox{ mod } p} for each prime {p \leq \sqrt{h}}, the sieve of Eratosthenes tells us that the surviving elements of {[1,h]} are simply the primes between {\sqrt{h}} and {h}, of which there are about {h/\log h} many. However, if one restricts attention to {[1,h] \cap a \hbox{ mod } q} for a quadratic residue class {a \hbox{ mod } q} (and taking {h} to be somewhat large compared to {q}), then by the preceding discussion, this eliminates most primes, and so now sieving out {0 \hbox{ mod } p} should leave almost no survivors. Shifting this example by {a} and then dividing by {q}, one can end up with an example of an interval {I} of length {h} that can be sieved by residue classes {b_p \hbox{ mod } p} for each {p \leq \sqrt{h}} in such a manner as to leave almost no survivors (in particular, {o(h/\log h)} many). In the presence of a Siegel zero, it seems quite difficult to prevent this scenario from “infecting” the above problem, creating a bad scenario in which for all {\log n \ll h \ll \sqrt{n}}, the residue classes {0 \hbox{ mod } p} for {p \leq \sqrt{h}} already eliminate almost all elements of {[n-h,n]}, leaving it mathematically possible for the remaining survivors to either be prime, or eliminated by the remaining residue classes {0 \hbox{ mod } p} for {\sqrt{h} < p \leq h}.

Because of this, I suspect that it will not be possible to resolve this Erdös problem without a major breakthrough on the parity problem that (at a bare minimum) is enough to exclude the possibility of Siegel zeroes existing. (But it is not clear at all that Siegel zeroes are the only “enemy” here, so absent a major advance in “inverse sieve theory”, one cannot simply assume GRH to run away from this problem).

— 0.1. Addendum: heuristics for Siegel zero scenarios —

This post also provides a good opportunity to refine some heuristics I had previously proposed regarding Siegel zeroes and their impact on various problems in analytic number theory. In this previous blog post, I wrote

“The parity problem can also be sometimes be overcome when there is an exceptional Siegel zero … [this] suggests that to break the parity barrier, we may assume without loss of generality that there are no Siegel zeroes.”

On the other hand, it was pointed out in a more recent article of Granville that (as with the current situation), Siegel zeroes can sometimes serve to enforce the parity barrier, rather than overcome it, and responds to my previous statement with the comment “this claim needs to be treated with caution, since its truth depends on the context”.

I actually agree with Granville here, and I propose here a synthesis of the two situations. In the absence of a Siegel zero, standard heuristic models in analytic number theory (such as the ones discussed in this post) typically suggest that a given quantity {X} of interest in number theory (e.g., the number of primes in a certain set) obey an asymptotic law of the form

\displaystyle  X = \mathrm{Main\ term} + \mathrm{Error\ term}

where {\mathrm{Main\ term}} is generally fairly well understood, while {\mathrm{Error\ term}} is expected to fluctuate “randomly” (or more precisely, pseudorandomly) and thus be smaller than the main term. However, a major difficulty in analytic number theory is that we often cannot prevent a “conspiracy” from occurring in which the error term becomes as large as, or even larger than the main term: the fluctuations present in that term are often too poorly understood to be under good control. The parity barrier manifests by providing examples of analogous situations in which the error term is indeed as large as the main term (with an unfavorable sign).

However, the presence of a Siegel zero tends to “magnetize” the error term by pulling most of the fluctuations in a particular direction. In many situations, what this means is that one can obtain a refined asymptotic of the form

\displaystyle  X = \mathrm{Main\ term} + \mathrm{Siegel\ correction} + \mathrm{Better\ error\ term}

where {\mathrm{Better\ error\ term}} now fluctuates less than the original {\mathrm{Error\ term}}, and in particular can (in some cases) be shown to be lower order than {\mathrm{Main\ term}}, while {\mathrm{Siegel\ correction}} is a new term that is often explicitly describable in terms of the exceptional character {\chi} associated to the Siegel zero, as well as the location {\beta} of the Siegel zero {L(\beta,\chi)=0}. A typical example is the problem of estimating the sum {\sum_{n \leq x: n = a\ (q)} \Lambda(n)} of primes in an arithmetic progression. The Siegel–Walfisz theorem gives a bound of the form

\displaystyle  \sum_{n \leq x: n = a\ (q)} \Lambda(n) = \frac{x}{\varphi(q)} + O_A( x \log^{-A} x )

for any {A>0} (with an ineffective constant); in the regime {q = O(\log^{O(1)} x)} one can improve the error term to {O( x \exp(-c \log^{1/2} x) )}, but for large {q} one cannot do better than the Brun–Titchmarsh bound of {O(x / \varphi(q))}. However, when there is a Siegel zero {L(\beta,\chi)} in an appropriate range, we can obtain the refined bound

\displaystyle  \sum_{n \leq x: n = a\ (q)} \Lambda(n) = \frac{x}{\varphi(q)} - \chi(a) 1_{q_0|q} 1_{(a,q)=1} \frac{x^{\beta}}{\beta \varphi(q)} + O( x \exp( - \log^c x ) )

for some {c>0}, where {q_0} is the conductor of {\chi}; see e.g., Theorem 5.27 of Iwaniec–Kowalski. Thus we see the error term is much improved (and in fact can even be made effective), at the cost of introducing a Siegel correction term which (for {\beta} close to {1} and {x} not too large) is of comparable size to the main term, and can either be aligned with or against the main term depending on the sign of {\chi(a)}.
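To spell out this comparison: when the indicator functions do not vanish, the ratio of the Siegel correction term to the main term is

\displaystyle  \left|\frac{\chi(a) x^{\beta}/(\beta \varphi(q))}{x/\varphi(q)}\right| = \frac{x^{\beta-1}}{\beta} = \frac{1}{\beta} \exp( - (1-\beta) \log x ),

which is comparable to {1} precisely when {1-\beta \ll 1/\log x}, that is to say when {x} is not too large compared with {\exp(1/(1-\beta))}.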

The implications of this refined asymptotic then depend rather crucially on how the Siegel correction term is aligned with the main term, and also whether it is of comparable order or of lower order. In many situations (particularly those concerning “average case” problems, in which one wants to understand the behavior for typical choices of parameters), the Siegel correction term ends up being of lower order, and one recovers the situation described in my initial blog post, where we are able to get the predicted asymptotic {X \approx \mathrm{Main\ term}} in the Siegel zero case. However, as pointed out by Granville, there are other situations (particularly those involving “worst case” problems, in which some key parameter can be chosen adversarially) in which the Siegel correction term can align to completely cancel (or to highly reinforce) the main term. In such cases, the Siegel zero becomes a very concrete manifestation of the parity barrier, rather than a means to avoid it. (There is a tiny chance that there may be some sort of “repulsion” phenomenon in which having no semiprimes in {[n-h,n]} for one value of {h} somehow generates semiprimes in {[n-h',n]} for another value of {h'}, which would allow one to solve the problem without having to directly address the Siegel issue, but I don’t see how two such intervals could “communicate” in order to achieve such a repulsion effect.)

The following problem was posed by Erdös and Graham (and is listed as problem #437 on the Erdös problems website):

Problem 1 Let {1 \leq a_1 < \dots < a_k \leq x} be integers. How many of the partial products {a_1}, {a_1 a_2}, {\dots}, {a_1 \dots a_k} can be squares? Is it true that, for any {\varepsilon>0}, there can be more than {x^{1-\varepsilon}} squares?

If one lets {L(x)} denote the maximal number of squares amongst such partial products, it was observed in the paper of Erdös and Graham that the bound {L(x) = o(x)} is “trivial” (no proof was provided, but one can for instance argue using the fact that the set of integer solutions to hyperelliptic equations of the form {(n+h_1) \dots (n+h_d) = m^2} for fixed {h_1 < \dots < h_d} is quite sparse, and in fact finite for {d>2} thanks to Siegel’s theorem), and the problem then asks if {L(x) = x^{1-o(1)}}.

It turns out that this problem was essentially solved (though not explicitly) by a recently published paper of Bui, Pratt, and Zaharescu, who studied a closely related quantity {t_n} introduced by Erdös, Graham, and Selfridge (see also Problem B30 of Guy’s book), defined for any natural number {n} as the least natural number {t_n} such that some subset of {n+1,\dots,n+t_{n}}, when multiplied together with {n}, produces a square. Among the several results proven about {t_n} in that paper was the following:

Theorem 2 (Bui–Pratt–Zaharescu, Theorem 1.2) For {x} sufficiently large, there exist {\gg x \exp(-(3\sqrt{2}/2+o(1)) \sqrt{\log x} \sqrt{\log\log x})} integers {1 \leq n \leq x} such that {t_n \leq \exp((\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x})}.

The arguments were in fact quite elementary, with the main tool being the theory of smooth numbers (the theory of hyperelliptic equations is used elsewhere in the paper, but not for this particular result).

If one uses this result as a “black box”, then an easy greedy algorithm argument gives the lower bound

\displaystyle  L(x) \geq x\exp(-(5\sqrt{2}/2+o(1)) \sqrt{\log x} \sqrt{\log\log x}),

but with a small amount of additional work, one can modify the proof of the theorem to give a slightly better bound:

Theorem 3 (Bounds for {L}) As {x \rightarrow \infty}, we have the lower bound

\displaystyle L(x) \geq x\exp(-(\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x})

and the upper bound

\displaystyle  L(x) \leq x\exp(-(1/\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x}).

In particular, for any {\varepsilon>0}, one has {L(x) \geq x^{1-\varepsilon}} for sufficiently large {x}.

The purpose of this blog post is to record this modification of the argument, which is short enough to present immediately. For a large {x}, let {u} denote the quantity

\displaystyle  u := \sqrt{2} \sqrt{\log x} / \sqrt{\log\log x}.

We call a natural number {x^{1/u}}-smooth if all of its prime factors are at most {x^{1/u}}. From a result of Hildebrand (or the older results of de Bruijn), we know that the number {\pi(x, x^{1/u})} of {x^{1/u}}-smooth numbers less than or equal to {x} is

\displaystyle  \pi(x, x^{1/u}) = x \exp( - (1+o(1)) u \log u ) \ \ \ \ \ (1)

\displaystyle  = x \exp( - (1/\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x} ).
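As an aside, one can get a rough numerical feel for (1) at small scales, bearing in mind that the asymptotic only kicks in slowly, so that agreement in order of magnitude is all one should expect; the choice of {u} in the sketch below is purely illustrative and is not the {u} used in this argument.

```python
# Crude sanity check of the smooth-number count (1) at a small scale; expect agreement
# in order of magnitude only.  The choice u = 3 here is illustrative.
from math import exp, log
from sympy import primerange

x, u = 10**5, 3.0
y = x ** (1 / u)
primes = list(primerange(2, int(y) + 1))

def is_smooth(n):
    # n is y-smooth iff dividing out all primes <= y leaves 1
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

count = sum(1 for n in range(2, x + 1) if is_smooth(n))
print(count, x * exp(-u * log(u)))  # expect the same order of magnitude
```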

Let {\pi(x^{1/u})} be the number of primes up to {x^{1/u}}. From the prime number theorem we have

\displaystyle  \pi(x^{1/u}) = (1+o(1)) x^{1/u} / \log x^{1/u} \ \ \ \ \ (2)

\displaystyle  = \exp( (1/\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x} ).

We begin with the lower bound on {L(x)}, which is established by a variant of the argument used to prove Theorem 2. The key observation is that given any {\pi(x^{1/u})+1} {x^{1/u}}-smooth numbers {b_1,\dots,b_{\pi(x^{1/u})+1}}, some non-trivial subcollection of them will multiply to a square. This is essentially Lemma 4.2 of Bui–Pratt–Zaharescu, but for the convenience of the reader we give a full proof here. Consider the multiplicative homomorphism {f: {\bf N} \rightarrow ({\bf Z}/2{\bf Z})^{\pi(x^{1/u})}} defined by

\displaystyle  f(n) := (\nu_{p_i}(n) \mod 2)_{i=1}^{\pi(x^{1/u})},

where {p_i} is the {i^{\mathrm{th}}} prime and {\nu_{p_i}(n)} is the number of times {p_i} divides {n}. The vectors {f(b_1),\dots,f(b_{\pi(x^{1/u})+1})} lie in a {\pi(x^{1/u})}-dimensional vector space over {{\bf Z}/2{\bf Z}}, and thus are linearly dependent. Thus there exists a non-trivial collection of these vectors that sums to zero, which implies that the corresponding elements of the sequence {b_1,\dots,b_{\pi(x^{1/u})+1}} multiply to a square.
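As an aside, this observation is completely constructive and easy to experiment with on a computer. The following is a minimal sketch (not taken from the paper; the function name and the small example are ad hoc) that locates such a square-product subcollection by Gaussian elimination over {{\bf Z}/2{\bf Z}}:

```python
# Given a list of y-smooth numbers, find a non-trivial subset whose product is a perfect
# square, by Gaussian elimination over GF(2) on the parity vectors of prime exponents.
from sympy import primerange, factorint

def square_product_subset(numbers, y):
    """Return indices of a non-trivial subset of `numbers` whose product is a square.
    Assumes every entry is y-smooth and len(numbers) > pi(y), so such a subset exists."""
    primes = list(primerange(2, y + 1))
    index = {p: i for i, p in enumerate(primes)}
    basis = {}  # pivot bit -> (reduced parity vector, subset of original indices), as bitmasks
    for i, n in enumerate(numbers):
        mask = 0
        for p, e in factorint(n).items():
            if e % 2:
                mask ^= 1 << index[p]
        subset = 1 << i
        while mask:
            pivot = mask.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (mask, subset)
                break
            bmask, bsubset = basis[pivot]
            mask ^= bmask
            subset ^= bsubset
        else:
            # parity vector reduced to zero: the recorded subset multiplies to a square
            return [j for j in range(len(numbers)) if subset >> j & 1]
    return None  # unreachable when len(numbers) > pi(y)

# Tiny example with y = 7 (so pi(y) = 4) and five 7-smooth numbers:
print(square_product_subset([2, 3, 6, 10, 15], 7))  # [0, 1, 2], since 2 * 3 * 6 = 36
```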

From (1), (2) we can find {x \exp( - (\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x} )} sequences of {x^{1/u}}-smooth numbers {b_1 < \dots < b_{\pi(x^{1/u})+1}} in {\{1,\dots,x\}}, with each sequence being to the right of the previous sequence. By the above observation, each sequence contains some non-trivial subcollection that multiplies to a square. Concatenating all these subsequences together, we obtain a single sequence {1 \leq a_1 < \dots < a_k \leq x} with at least {x \exp( - (\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x} )} partial products multiplying to a square, giving the desired lower bound on {L(x)}.

Next, we prove the upper bound on {L(x)}. Suppose that a sequence {1 \leq a_1 < \dots < a_k \leq x} has {L(x)} partial products {a_1 \dots a_{i_l}} that are squares for some {1 \leq i_1 < \dots < i_{L(x)} \leq k}. Then we have {a_{i_l+1} \dots a_{i_{l+1}}} a square for all {0 \leq l < L(x)} (with the convention {i_0=0}). The key observation (essentially Lemma 3.4 of Bui–Pratt–Zaharescu) is that, for each {0 \leq l < L(x)}, one of the following must hold:

  • (i) At least one of the {a_{i_l+1},\dots,a_{i_{l+1}}} is {x^{1/u}}-smooth.
  • (ii) At least one of the {a_{i_l+1},\dots,a_{i_{l+1}}} is divisible by {p^2} for some prime {p>x^{1/u}}.
  • (iii) {a_{i_{l+1}} - a_{i_l+1} > x^{1/u}}.
Indeed, suppose that neither (i) nor (ii) holds. Then one of the terms in the sequence {a_{i_l+1},\dots,a_{i_{l+1}}} is divisible by exactly one copy of {p} for some prime {p > x^{1/u}}. In order for the product {a_{i_l+1} \dots a_{i_{l+1}}} to be a square, another element of the sequence must also be divisible by {p}; since two distinct multiples of {p} differ by at least {p > x^{1/u}}, this implies (iii).

From (1) we see that the number of {l} for which (i) occurs is at most {x \exp( - (1/\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x})}. From the union bound we see that the number of {l} for which (ii) occurs is at most

\displaystyle  \ll \sum_{p > x^{1/u}} x/p^2 \ll x^{1-1/u} = x \exp( - (1/\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x}).

Finally, from the pigeonhole principle we see that the number of {l} for which (iii) occurs is also at most

\displaystyle  x^{1-1/u} = x \exp( - (1/\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x}).

Thus one has {L(x) \ll x \exp( - (1/\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x})}, as desired. This completes the proof.

The upper bound arguments seem cruder to me than the lower bound arguments, so I conjecture that the lower bound is in fact the truth: {L(x) = x\exp(-(\sqrt{2}+o(1)) \sqrt{\log x} \sqrt{\log\log x})}.

In a previous blog post, I discussed how, from a Bayesian perspective, learning about some new information {E} can update one’s perceived odds {{\mathbb P}(H_1) / {\mathbb P}(H_0)} about how likely an “alternative hypothesis” {H_1} is, compared to a “null hypothesis” {H_0}. The mathematical formula here is

\displaystyle  \frac{{\mathbb P}(H_1|E)}{{\mathbb P}(H_0|E)} = \frac{{\mathbb P}(E|H_1)}{{\mathbb P}(E|H_0)} \frac{{\mathbb P}(H_1)}{{\mathbb P}(H_0)}. \ \ \ \ \ (1)

Thus, provided one has
  • (i) A precise formulation of the null hypothesis {H_0} and the alternative hypothesis {H_1}, and the new information {E};
  • (ii) A reasonable estimate of the prior odds {{\mathbb P}(H_1) / {\mathbb P}(H_0)} of the alternative hypothesis {H_1} being true (compared to the null hypothesis {H_0});
  • (iii) A reasonable estimate of the probability {{\mathbb P}(E|H_0)} that the event {E} would occur under the null hypothesis {H_0}; and
  • (iv) A reasonable estimate of the probability {{\mathbb P}(E|H_1)} that the event {E} would occur under the alternative hypothesis {H_1},
one can then obtain an estimate of the posterior odds {{\mathbb P}(H_1|E) / {\mathbb P}(H_0|E)} of the alternative hypothesis being true after observing the event {E}. Ingredient (iii) is usually the easiest to compute, but is only one of the key inputs required; ingredients (ii) and (iv) are far trickier, involve subjective non-mathematical considerations, and as discussed in the previous post, depend rather crucially on ingredient (i). For instance, selectively reporting some relevant information for {E}, and withholding other key information from {E}, can make the final mathematical calculation {{\mathbb P}(H_1|E) / {\mathbb P}(H_0|E)} misleading with regard to the “true” odds of {H_1} being true compared to {H_0} based on all available information, even if the calculation was technically accurate with regard to the partial information {E} provided. Also, the computation of the relative odds of two competing hypotheses {H_0} and {H_1} can become somewhat moot if there is a third hypothesis {H_2} that ends up being more plausible than either of these two hypotheses. Nevertheless, the formula (1) can still lead to useful conclusions, albeit ones that are qualified by the particular assumptions and estimates made in the analysis.

At a qualitative level, the Bayesian identity (1) is telling us the following: if an alternative hypothesis {H_1} was already somewhat plausible (so that the prior odds {{\mathbb P}(H_1) / {\mathbb P}(H_0)} was not vanishingly small), and the observed event {E} was significantly more likely to occur under hypothesis {H_1} than under {H_0}, then the hypothesis {H_1} becomes significantly more plausible (in that the posterior odds {{\mathbb P}(H_1|E) / {\mathbb P}(H_0|E)} become quite elevated). This is quite intuitive, but as discussed in the previous post, a lot hinges on how one is defining the alternative hypothesis {H_1}.

In the previous blog post, this calculation was initially illustrated with the following choices of {H_0}, {H_1}, and {E} (thus fulfilling ingredient (i)):

  • {E} was the event that the October 1, 2022 PCSO Grand Lotto in the Philippines drew the numbers {9, 18, 27, 36, 45, 54} (that is to say, consecutive multiples of {9}), though not necessarily in that order;
  • {H_0} was the null hypothesis that the lottery was fair and the numbers were drawn uniformly at random (without replacement) from the set {\{1,\dots,55\}}; and
  • {H_1} was the alternative hypothesis that the lottery was rigged by some corrupt officials for their personal gain.
In this case, ingredient (iii) can be computed mathematically and precisely:

\displaystyle  {\mathbb P}(E|H_0) = \frac{1}{\binom{55}{6}} = \frac{1}{28,989,675}. \ \ \ \ \ (2)

And, with a not inconceivable level of cynicism about the integrity of the lottery, the prior odds (ingredient (ii)) can be argued to be non-negligible. However, ingredient (iv) was nearly impossible to estimate: indeed, as argued in that post, there is no reason to suspect that {{\mathbb P}(E|H_1)} is much larger than the tiny probability (2), and in fact it could well be smaller (since it would likely be in the interest of corrupt lottery officials to not draw attention to their activities). So, despite the very small numerical value of the probability (2), this did not lead to any significant increase in the odds of the alternative hypothesis. In the previous blog post, several other variants {H'_1}, {H''_1}, {H'''_1} of the alternative hypothesis {H_1} were also discussed; the conclusion was that while some choices of alternative hypothesis could lead to elevated probabilities for ingredient (iv), they came at the cost of significantly reducing the prior odds in ingredient (ii), and so no alternative hypothesis was located which ended up being significantly more plausible than the null hypothesis {H_0} after observing the event {E}.
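As a minimal illustration of how the update (1) is applied in this example, here is a short sketch; the prior odds used below are an arbitrary placeholder rather than a value endorsed in this discussion.

```python
# Sketch of the Bayesian update (1).  The prior odds are an arbitrary placeholder.
from math import comb

def posterior_odds(prior_odds, p_E_given_H1, p_E_given_H0):
    """Posterior odds of H1 versus H0 after observing E, via the Bayes factor in (1)."""
    return (p_E_given_H1 / p_E_given_H0) * prior_odds

p_E_given_H0 = 1 / comb(55, 6)  # the exact probability (2) = 1/28,989,675
# If P(E|H1) is believed to be no larger than P(E|H0), the odds do not move at all,
# no matter how tiny P(E|H0) is:
print(posterior_odds(prior_odds=1e-3, p_E_given_H1=p_E_given_H0, p_E_given_H0=p_E_given_H0))
```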

In this post, I would like to run the same analysis on a numerical anomaly in the recent Venezuelan presidential election of July 28, 2024. Here are the officially reported vote totals for the two main candidates, incumbent president Nicolás Maduro and opposition candidate Edmundo González, in the election:

  • Maduro: 5,150,092 votes
  • González: 4,445,978 votes
  • Other: 462,704 votes
  • Total: 10,058,774 votes.
The numerical anomaly is that if one multiplies the total number of voters {10,058,774} by the round percentages {51.2\%}, {44.2\%}, {4.6\%}, one recovers exactly the above vote counts after rounding to the nearest integer:

\displaystyle  51.2\% \times 10,058,774 = 5,150,092.288

\displaystyle  44.2\% \times 10,058,774 = 4,445,978.108

\displaystyle  4.6\% \times 10,058,774 = 462,703.604.
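For concreteness, here is a direct machine check of this rounding coincidence; it is just a transcription of the three displays above, with no new data.

```python
# Verify that each reported total equals the corresponding round percentage of the total
# vote count, rounded to the nearest integer.
N = 10_058_774
reported = {"Maduro": 5_150_092, "Gonzalez": 4_445_978, "Other": 462_704}
percent = {"Maduro": 51.2, "Gonzalez": 44.2, "Other": 4.6}

for name, votes in reported.items():
    exact = percent[name] / 100 * N
    print(f"{name}: {exact:.3f} -> matches reported total: {round(exact) == votes}")
```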

Let us try to apply the above Bayesian framework to this situation, bearing in mind the caveat that this analysis is only as strong as the inputs supplied and the assumptions made (for instance, to simplify the discussion, we will not discuss the additional information coming from exit polling, which in this case gave significantly different predictions from the percentages above).

The first step (ingredient (i)) is to formulate the null hypothesis {H_0}, the alternative hypothesis {H_1}, and the event {E}. Here is one possible choice:

  • {E} is the event that the reported vote totals for Maduro, González, and Other are each equal to the total number of voters multiplied by a round percentage with one decimal point (i.e., an integer multiple of {0.1\%}), rounded to the nearest integer.
  • {H_0} is the null hypothesis that the vote totals were reported accurately (or with only inconsequential inaccuracies).
  • {H_1} is the alternative hypothesis that the vote totals were manipulated by officials from the incumbent administration.

Ingredient (ii) – the prior odds that {H_1} is true over {H_0} – is highly subjective, and an individual’s estimation of (ii) would likely depend on, or at least be correlated with, their opinion of the current Venezuelan administration. Discussion of this ingredient is therefore more political than mathematical, and I will not attempt to quantify it further here.

Now we turn to (iii), the estimation of the probability {{\mathbb P}(E|H_0)} that {E} occurs given the hypothesis {H_0}. This cannot be computed exactly without a precise probabilistic model of the voting electorate, but let us make a rough order of magnitude calculation as follows. One can focus on the anomaly just for the number of votes received by Maduro and González: if both of these counts were the nearest integer to a round percentage, then by simple subtraction the number of votes for “other” would also be forced to be the nearest integer to a round percentage, possibly plus or minus one due to carries, so up to a factor of two or so we can ignore the latter anomaly. As a simple model, suppose that the voting percentages for Maduro and González were distributed more or less uniformly in some square {[p-\varepsilon,p+\varepsilon] \times [q-\varepsilon,q+\varepsilon]}, where {p, q} are some proportions not too close to either {0} or {1}, and {\varepsilon} is some reasonably large margin of error (the exact values of these parameters will end up not being too important, nor will the specific shape of the distribution; indeed, the shape and size of the square here only impact the analysis through the area {(2\varepsilon)^2} of the square, and even this quantity cancels itself out in the end). Thus, the number of votes for Maduro is distributed in an interval of length about {2\varepsilon N}, where {N = 10,058,774} is the number of voters, and similarly for González, so the total number of different outcomes here is {(2\varepsilon N)^2}, and by our model we have a uniform distribution amongst all these outcomes. On the other hand, the total number of attainable round percentages for Maduro is about {(2\varepsilon) / 0.1\% = 1000 \times 2\varepsilon}, and similarly for González, so our estimate for {{\mathbb P}(E|H_0)} is

\displaystyle {\mathbb P}(E|H_0) \approx \frac{(1000 \times 2\varepsilon)^2}{(2\varepsilon N)^2} = (1000/N)^2 \approx 10^{-8}.

This looks quite unlikely! But we are not done yet, because we also need to estimate {{\mathbb P}(E|H_1)}, the probability that the event {E} would occur under the alternative hypothesis {H_1}. Here one has to be careful, because while it could happen under hypothesis {H_1} that the vote counts were manipulated to be exactly the nearest integer to a round percentage, this is not the only outcome under this hypothesis, and indeed one could argue that it would not be in the interest of an administration to generate such a striking numerical anomaly. But one can create a reasonable chain of events with which to estimate (from below) this probability by a kind of “Drake equation”. Consider the following variants of {H_1}:
  • {H'_1}: {H_1} is true, and the administration directs election officials to report vote outcomes with some explicitly preferred (round) percentages, regardless of the actual election results.
  • {H''_1}: {H'_1} is true, and the election officials dutifully generate a report by multiplying these preferred percentages by the total number {N} of voters, and rounding to the nearest integer, without any attempt to disguise their actions.
By the chain rule for conditional probability, one has a lower bound

\displaystyle  {\mathbb P}(E|H_1) \geq {\mathbb P}(E, H''_1, H'_1|H_1) = {\mathbb P}(E|H''_1) {\mathbb P}(H''_1|H'_1) {\mathbb P}(H'_1|H_1).

Inserting this into (1), we obtain our final lower bound:

\displaystyle  \frac{{\mathbb P}(H_1|E)}{{\mathbb P}(H_0|E)} \gtrapprox 10^8 \times {\mathbb P}(E|H''_1) {\mathbb P}(H''_1|H'_1) {\mathbb P}(H'_1|H_1) \frac{{\mathbb P}(H_1)}{{\mathbb P}(H_0)}.

This is about as far as one can get purely with mathematical analysis. Beyond this, one has to make some largely subjective estimations for each of the remaining probabilities and odds in this formula. As mentioned, the prior odds {{\mathbb P}(H_1)/{\mathbb P}(H_0)} will likely depend on the individual making this calculation, and will not be discussed further here. The remaining question then is how large the probabilities {{\mathbb P}(H'_1|H_1)}, {{\mathbb P}(H''_1|H'_1)}, and {{\mathbb P}(E|H''_1)} are. In other words:
  • If one assumes that the administration wishes to manipulate the vote totals, how likely is it a priori (i.e., without being aware of the anomaly {E}) that they would do so by explicitly selecting preferred round percentages and then requesting that election officials report these percentages?
  • If one assumes that election officials are being ordered to report vote totals to reflect a preferred round percentage, how likely is it a priori that they would follow the orders without question, and perform simple rounding instead of any more sophisticated numerical manipulation?
  • If one assumes that election officials did indeed follow the orders as above, how likely is it a priori that the report would be published as is without any concerns raised by other officials or observers?
If one’s estimate of the product of these three probabilities is significantly greater than {10^{-8}}, then we can conclude that the event {E} has indeed significantly raised the odds of some sort of voting manipulation being present. The scenario described above is somewhat plausible, especially in light of the anomaly {E}, and so certainly the posterior probabilities {{\mathbb P}(H'_1|H_1,E)}, {{\mathbb P}(H''_1|H'_1,E)} seem quite large (and the posterior probability {{\mathbb P}(E|H''_1,E)} is of course equal to {1}). But it is important here to avoid confirmation bias and work only with a priori probabilities – roughly speaking, the probabilities that one would assign to such events on July 27, 2024, before knowledge of the anomaly {E} came to light. Nevertheless, even after accounting for confirmation bias, I think it is plausible that the above product of a priori probabilities is indeed significantly larger than {10^{-8}} (for instance, assigning probabilities somewhat arbitrarily, this would be the case if each of the conditional probabilities {{\mathbb P}(H'_1|H_1)}, {{\mathbb P}(H''_1|H'_1)} and {{\mathbb P}(E|H''_1)} exceeds {1\%}), giving credence to the theory that the election report was manipulated (though it is also possible that the anomaly could instead be explained by some third hypothesis {H_2} not covered by the original two, such as a software glitch). If one adds in additional information beyond the purely numerical anomaly {E}, such as the fact that the reported totals were not broken down further by voting district (which would be less likely under hypothesis {H_0} than hypothesis {H_1}), and that exit polls gave significantly different results from the reported totals (which is again less likely under hypothesis {H_0} than hypothesis {H_1}), the evidence for voting irregularities becomes quite significant.

One can contrast this analysis with that of the Philippine lottery in the original post. In both cases the probability {{\mathbb P}(E|H_0)} of the observed event under the null hypothesis was extremely small. However, in the case of the Venezuelan election, there is a plausible causal chain {H_1 \implies H'_1 \implies H''_1 \implies E} that leads to an elevated probability {{\mathbb P}(E|H_1)} of the observed event under the alternative hypothesis, whereas in the case of the lottery, only extremely implausible chains could be constructed that would lead to the specific outcome of a multiples-of-9 lottery draw for that specific lottery on that specific date.

I’ve just uploaded to the arXiv my paper “Dense sets of natural numbers with unusually large least common multiples“. This short paper answers (in the negative) a somewhat obscure question of Erdős and Graham:

Problem 1 Is it true that if {A} is a set of natural numbers for which

\displaystyle  \frac{1}{\log\log x} \sum_{n \in A: n \leq x} \frac{1}{n} \ \ \ \ \ (1)

goes to infinity as {x \rightarrow \infty}, then the quantity

\displaystyle  \frac{1}{(\sum_{n \in A: n \leq x} \frac{1}{n})^2} \sum_{n,m \in A: n < m \leq x} \frac{1}{\mathrm{lcm}(n,m)} \ \ \ \ \ (2)

also goes to infinity as {x \rightarrow \infty}?

At first glance, this problem may seem rather arbitrary, but it can be motivated as follows. The hypothesis that (1) goes to infinity is a largeness condition on {A}; in view of Mertens’ theorem, it can be viewed as an assertion that {A} is denser than the set of primes. On the other hand, the conclusion that (2) grows is an assertion that {\frac{1}{\mathrm{lcm}(n,m)}} becomes significantly larger than {\frac{1}{nm}} on the average for large {n,m \in A}; that is to say, that many pairs of numbers in {A} share a common factor. Intuitively, the problem is then asking whether sets that are significantly denser than the primes must start having lots of common factors on average.

For sake of comparison, it is easy to see that if (1) goes to infinity, then at least one pair {(n,m)} of distinct elements in {A} must have a non-trivial common factor. For if this were not the case, then the elements of {A} would be pairwise coprime, so each prime {p} would have at most one multiple in {A} and so could contribute at most {1/p} to the sum in (1); by Mertens’ theorem, and the fact that every natural number greater than one is divisible by at least one prime {p}, the quantity (1) would then stay bounded, a contradiction.
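In symbols: assigning to each {n \in A} with {n > 1} a prime factor {p_n \leq n}, the primes {p_n} are distinct by pairwise coprimality, and {1/n \leq 1/p_n}, so that

\displaystyle  \sum_{n \in A: n \leq x} \frac{1}{n} \leq 1 + \sum_{p \leq x} \frac{1}{p} = \log\log x + O(1)

by Mertens’ theorem, and the ratio (1) stays bounded.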

It turns out, though, that the answer to the above problem is negative; one can find sets {A} that are denser than the primes, but for which (2) stays bounded, so that the least common multiples in the set are unusually large. It was a bit surprising to me that this question had not been resolved long ago (in fact, I was not able to find any prior literature on the problem beyond the original reference of Erdős and Graham); in contrast, another problem of Erdős and Graham concerning sets with unusually small least common multiples was extensively studied (and essentially solved) about twenty years ago, while the study of sets with unusually large greatest common divisor for many pairs in the set has recently become somewhat popular, due to their role in the proof of the Duffin-Schaeffer conjecture by Koukoulopoulos and Maynard.

To search for counterexamples, it is natural to look for numbers with relatively few prime factors, in order to reduce their common factors and increase their least common multiple. A particularly simple example, whose verification is on the level of an exercise in a graduate analytic number theory course, is the set of semiprimes (products of two primes), for which one can readily verify that (1) grows like {\log\log x} but (2) stays bounded. With a bit more effort, I was able to optimize the construction and uncover the true threshold for boundedness of (2), which was a little unexpected:

Theorem 2
  • (i) For any {C>0}, there exists a set of natural numbers {A} with

    \displaystyle  \sum_{n \in A: n \leq x} \frac{1}{n} = \exp( (C+o(1)) (\log\log x)^{1/2} \log\log\log x )

    for all large {x}, for which (2) stays bounded.
  • (ii) Conversely, if (2) stays bounded, then

    \displaystyle  \sum_{n \in A: n \leq x} \frac{1}{n} \ll \exp( O( (\log\log x)^{1/2} \log\log\log x ) )

    for all large {x}.

The proofs are not particularly long or deep, but I thought I would record here some of the process towards finding them. My first step was to try to simplify the condition that (2) stays bounded. In order to use probabilistic intuition, I first expressed this condition in probabilistic terms as

\displaystyle  \mathbb{E} \frac{\mathbf{n} \mathbf{m}}{\mathrm{lcm}(\mathbf{n}, \mathbf{m})} \ll 1

for large {x}, where {\mathbf{n}, \mathbf{m}} are independent random variables drawn from {\{ n \in A: n \leq x \}} with probability density function

\displaystyle  \mathbb{P} (\mathbf{n} = n) = \frac{1}{\sum_{m \in A: m \leq x} \frac{1}{m}} \frac{1}{n}.

The presence of the least common multiple in the denominator is annoying, but one can easily flip the expression to the greatest common divisor:

\displaystyle  \mathbb{E} \mathrm{gcd}(\mathbf{n}, \mathbf{m}) \ll 1.

If the expression {\mathrm{gcd}(\mathbf{n}, \mathbf{m})} was a product of a function of {\mathbf{n}} and a function of {\mathbf{m}}, then by independence this expectation would decouple into simpler averages involving just one random variable instead of two. Of course, the greatest common divisor is not of this form, but there is a standard trick in analytic number theory to decouple the greatest common divisor, namely to use the classic Gauss identity {n = \sum_{d|n} \varphi(d)}, with {\varphi} the Euler totient function, to write

\displaystyle  \mathrm{gcd}(\mathbf{n}, \mathbf{m}) = \sum_{d | \mathbf{n}, \mathbf{m}} \varphi(d).

Inserting this formula and interchanging the sum and expectation, we can now express the condition as bounding a sum of squares:

\displaystyle  \sum_d \varphi(d) \mathbb{P}(d|\mathbf{n})^2 \ll 1.

Thus, the condition (2) is really an assertion to the effect that typical elements of {A} do not have many divisors. From experience in sieve theory, the probabilities {\mathbb{P}(d|\mathbf{n})} tend to behave multiplicatively in {d}, so the expression here heuristically behaves like an Euler product that looks something like

\displaystyle  \prod_p (1 + \varphi(p) \mathbb{P}(p|\mathbf{n})^2)

and so the condition (2) is morally something like

\displaystyle  \sum_p p \mathbb{P}(p|\mathbf{n})^2 \ll 1. \ \ \ \ \ (3)

Comparing this with Mertens’ theorems, this leads to the heuristic prediction that {\mathbb{P}(p|\mathbf{n})} (for a typical prime {p} much smaller than {x}) should decay somewhat like {\frac{1}{p (\log\log p)^{1/2}}} (ignoring for now factors of {\log\log\log p}). This can be compared to the example of the set of primes or semiprimes on one hand, where the probability is like {\frac{1}{p \log\log p}}, and the set of all natural numbers on the other hand, where the probability is like {\frac{1}{p}}. So the critical behavior should come from sets that are in some sense “halfway” between the primes and the natural numbers.
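As an informal numerical aside (not from the paper), one can compute the statistic {\mathbb{E} \mathrm{gcd}(\mathbf{n}, \mathbf{m})} for small truncations of the set of all natural numbers and of the set of semiprimes, both directly and via the decoupled sum {\sum_d \varphi(d) \mathbb{P}(d|\mathbf{n})^2}; the two computations agree, which serves as a check on the Gauss identity manipulation above. At such small cutoffs the contrast between the two sets is modest, but the heuristics above predict slow growth in the first case and boundedness in the second.

```python
# E gcd(n, m) for n, m drawn independently from A with probability proportional to 1/n,
# computed both directly and via the decoupled sum over d (the two must agree).
from math import gcd
from sympy import totient, factorint

def gcd_statistic(A):
    w = {n: 1.0 / n for n in A}
    Z = sum(w.values())
    direct = sum(w[n] * w[m] * gcd(n, m) for n in A for m in A) / Z**2
    P = lambda d: sum(w[n] for n in A if n % d == 0) / Z  # P(d | n)
    decoupled = sum(totient(d) * P(d) ** 2 for d in range(1, max(A) + 1))
    return direct, decoupled  # equal up to floating-point error

x = 1000
naturals = list(range(2, x + 1))
semiprimes = [n for n in naturals if sum(factorint(n).values()) == 2]
print("all naturals:", gcd_statistic(naturals))    # heuristically grows like log x
print("semiprimes:  ", gcd_statistic(semiprimes))  # expected to stay bounded in x
```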

It is then natural to try a random construction, in which one sieves out the natural numbers by permitting each natural number {n} to survive with a probability resembling {\prod_{p|n} \frac{1}{(\log\log p)^{1/2}}}, in order to get the predicted behavior for {\mathbb{P}(p|\mathbf{n})}. Performing some standard calculations, this construction could ensure that (2) stays bounded, but only with a density a little bit less than the one stated in the main theorem; after optimizing the parameters, I could only get something like

\displaystyle  \sum_{n \in A: n \leq x} \frac{1}{n} = \exp( (\log\log x)^{1/2} (\log\log\log x)^{-1/2-o(1)} ).

I was stuck on optimizing the construction further, so I turned my attention to a positive result in the spirit of (ii) of the main theorem. On playing around with (3), I observed that one could use Cauchy-Schwarz and Mertens’ theorem to obtain the bound

\displaystyle  \sum_{p \leq x} \mathbb{P}(p|\mathbf{n}) \ll (\log\log x)^{1/2}

which was in line with the previous heuristic that {\mathbb{P}(p|\mathbf{n})} should behave like {\frac{1}{p (\log\log p)^{1/2}}}. The left-hand side had a simple interpretation: by linearity of expectation, it was the expected number {\mathbb{E} \omega(\mathbf{n})} of prime factors of {\mathbf{n}}. So the boundedness of (2) implied that a typical element of {A} only had about {(\log\log x)^{1/2}} prime factors, in contrast to the {\log\log x} predicted by the Hardy-Ramanujan law. Standard methods from the anatomy of integers can then be used to see how dense a set with that many prime factors could be, and this soon led to a short proof of part (ii) of the main theorem (I eventually found for instance that Jensen’s inequality could be used to create a particularly slick argument).
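For the record, the Cauchy–Schwarz step behind the displayed bound is simply

\displaystyle  \sum_{p \leq x} \mathbb{P}(p|\mathbf{n}) \leq \Big( \sum_{p \leq x} p\, \mathbb{P}(p|\mathbf{n})^2 \Big)^{1/2} \Big( \sum_{p \leq x} \frac{1}{p} \Big)^{1/2} \ll (\log\log x)^{1/2},

using (3) to bound the first factor and Mertens’ estimate {\sum_{p \leq x} \frac{1}{p} = \log\log x + O(1)} for the second.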

It then remained to improve the lower bound construction to eliminate the {\log\log\log x} losses in the exponents. By deconstructing the proof of the upper bound, it became natural to consider something like the set of natural numbers {n} that had at most {(\log\log n)^{1/2}} prime factors. This construction actually worked for some scales {x} – namely those {x} for which {(\log\log x)^{1/2}} was a natural number – but there were some strange “discontinuities” in the analysis that prevented me from establishing the boundedness of (2) for arbitrary scales {x}. The basic problem was that increasing the number of permitted prime factors from one natural number threshold {k} to another {k+1} ended up increasing the density of the set by an unbounded factor (of the order of {k}, in practice), which heavily disrupted the task of trying to keep the ratio (2) bounded. Usually the resolution to these sorts of discontinuities is to use some sort of random “average” of two or more deterministic constructions – for instance, by taking some random union of some numbers with {k} prime factors and some numbers with {k+1} prime factors – but the numerology turned out to be somewhat unfavorable, allowing for some improvement in the lower bounds over my previous construction, but not enough to close the gap entirely. It was only after substantial trial and error that I was able to find a working deterministic construction, where at a given scale one collected either numbers with at most {k} prime factors, or numbers with {k+1} prime factors but with the largest prime factor in a specific range, in which I could finally get the numerator and denominator in (2) to be in balance for every {x}. But once the construction was written down, the verification of the required properties ended up being quite routine.

Many modern mathematical proofs are a combination of conceptual arguments and technical calculations. There is something of a tradeoff between the two: one can add more conceptual arguments to try to reduce the technical computations, or vice versa. (Among other things, this leads to a Berkson paradox-like phenomenon in which a negative correlation can be observed between the two aspects of a proof; see this recent Mastodon post of mine for more discussion.)

In a recent article, Heather Macbeth argues that the preferred balance between conceptual and computational arguments is quite different for a computer-assisted proof than it is for a purely human-readable proof. In the latter, there is a strong incentive to minimize the amount of calculation to the point where it can be checked by hand, even if this requires a certain amount of ad hoc rearrangement of cases, unmotivated parameter selection, or otherwise non-conceptual additions to the arguments in order to reduce the calculation. But in the former, once one is willing to outsource any tedious verification or optimization task to a computer, the incentives are reversed: freed from the need to arrange the argument to reduce the amount of calculation, one can now describe an argument by listing the main ingredients and then letting the computer figure out a suitable way to combine them to give the stated result. The two approaches can thus be viewed as complementary ways to describe a result, with neither necessarily being superior to the other.

In this post, I would like to illustrate this computation-outsourced approach with the topic of zero-density theorems for the Riemann zeta function, in which all computer verifiable calculations (as well as other routine but tedious arguments) are performed “off-stage”, with the intent of focusing only on the conceptual inputs to these theorems.

Zero-density theorems concern upper bounds for the quantity {N(\sigma,T)} for a given {1/2 \leq \sigma \leq 1} and large {T}, which is defined as the number of zeroes of the Riemann zeta function in the rectangle {\{ \beta+i\gamma: \sigma \leq \beta \leq 1; 0 \leq \gamma \leq T \}}. (There is also an important generalization of this quantity to {L}-functions, but for simplicity we will focus on the classical zeta function case here). Such quantities are important in analytic number theory for many reasons, one of which is through explicit formulae such as the Riemann-von Mangoldt explicit formula

\displaystyle  \sum_{n \leq x}^{\prime} \Lambda(n) = x - \sum_{\rho:\zeta(\rho)=0} \frac{x^\rho}{\rho} - \log(2\pi) - \frac{1}{2} \log(1-x^{-2}) \ \ \ \ \ (1)

relating the prime numbers to the zeroes of the zeta function (the “music of the primes”). The better bounds one has on {N(\sigma,T)}, the more control one has on the complicated term {\sum_{\rho:\zeta(\rho)=0} \frac{x^\rho}{\rho}} on the right-hand side.

Clearly {N(\sigma,T)} is non-increasing in {\sigma}. The Riemann-von Mangoldt formula, together with the functional equation, gives us the asymptotic

\displaystyle  N(1/2,T) \asymp T \log T

in the {\sigma=1/2} case, while the prime number theorem tells us that

\displaystyle  N(1,T) = 0. \ \ \ \ \ (2)

The various zero free regions for the zeta function can be viewed as slight improvements to (2); for instance, the classical zero-free region is equivalent to the assertion that {N(\sigma,T)} vanishes if {\sigma > 1 - c/\log T} for some small absolute constant {c>0}, and the Riemann hypothesis is equivalent to the assertion that {N(\sigma,T)=0} for all {\sigma>1/2}.

Experience has shown that the most important quantity to control here is the exponent {A(\sigma)}, defined as the least constant for which one has an asymptotic

\displaystyle  N(\sigma,T) = T^{A(\sigma)(1-\sigma)+o(1)}

as {T \rightarrow \infty}. Thus, for instance,

\displaystyle  A(1/2) = 2, \ \ \ \ \ (3)

{A(1) = 0}, and {A(\sigma)(1-\sigma)} is a non-increasing function of {\sigma}, so we obtain the trivial “von Mangoldt” zero density theorem

\displaystyle  A(\sigma) \leq \frac{1}{1-\sigma}.
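(Indeed, by this monotonicity and (3) one has {A(\sigma)(1-\sigma) \leq A(1/2) (1 - 1/2) = 1} for all {1/2 \leq \sigma \leq 1}.)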

Of particular interest is the supremal value {\|A\|_\infty := \sup_{1/2 \leq \sigma \leq 1} A(\sigma)} of {A}, which has to be at least {2} thanks to (3). The density hypothesis asserts that the maximum is in fact exactly {2}, or equivalently that

\displaystyle  A(\sigma) \leq 2, \ \ \ \ \ (4)

for all {1/2 \leq \sigma \leq 1}. This is of course implied by the Riemann hypothesis (which clearly implies that {A(\sigma)=0} for all {\sigma>1/2}), but is a more tractable hypothesis to work with; for instance, the hypothesis is already known to hold for {\sigma \geq 25/32 = 0.78125} by the work of Bourgain (building upon many previous authors). The quantity {\|A\|_\infty} directly impacts our understanding of the prime number theorem in short intervals; indeed, it is not difficult using (1) (as well as the Vinogradov-Korobov zero-free region) to establish a short interval prime number theorem

\displaystyle  \sum_{x \leq n \leq x + x^\theta} \Lambda(n) = (1+o(1)) x^\theta

for all {x \rightarrow \infty} if {1 - \frac{1}{\|A\|_\infty} < \theta < 1} is a fixed exponent, or for almost all {x \rightarrow \infty} if {1 - \frac{2}{\|A\|_\infty} < \theta < 1} is a fixed exponent. Until recently, the best upper bound for {\|A\|_\infty} was {12/5 = 2.4}, thanks to a 1972 result of Huxley; but this was recently lowered to {30/13=2.307\ldots} in a breakthrough work of Guth and Maynard.
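To make this numerology concrete, here is the trivial arithmetic converting the two quoted bounds on {\|A\|_\infty} into short interval exponents (ignoring the {o(1)} factors and the other technical caveats glossed over above):

```python
# Exponent thresholds 1 - 1/||A||_inf (all x) and 1 - 2/||A||_inf (almost all x)
# for the two values of ||A||_inf quoted above.
for name, A_inf in [("Huxley (12/5)", 12 / 5), ("Guth-Maynard (30/13)", 30 / 13)]:
    print(f"{name}: all x for theta > {1 - 1 / A_inf:.4f}, "
          f"almost all x for theta > {1 - 2 / A_inf:.4f}")
# Huxley:       theta > 7/12  = 0.5833 (all x),  theta > 1/6  = 0.1667 (almost all x)
# Guth-Maynard: theta > 17/30 = 0.5667 (all x),  theta > 2/15 = 0.1333 (almost all x)
```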

In between the papers of Huxley and Guth-Maynard are dozens of additional improvements on {A(\sigma)}, though it is only the Guth-Maynard paper that actually lowered the supremum norm {\|A\|_\infty}. A summary of most of the state of the art before Guth-Maynard may be found in Table 2 of this recent paper of Trudgian and Yang; it is complicated, but it is easy enough to get a computer to illustrate it with a plot:

(For an explanation of what is going on under the assumption of the Lindelöf hypothesis, see below the fold.) This plot represents the combined effort of nearly a dozen papers, each one of which claims one or more components of the depicted piecewise smooth curve, and is written in the “human-readable” style mentioned above, where the argument is arranged to reduce the amount of tedious computation to human-verifiable levels, even if this comes at the cost of obscuring the conceptual ideas. (For an animation of how this bound improved over time, see here.) Below the fold, I will try to describe (in sketch form) some of the standard ingredients that go into these papers, in particular the routine reduction of deriving zero density estimates from large value theorems for Dirichlet series. We will not attempt to rewrite the entire literature of zero-density estimates in this fashion, but focus on some illustrative special cases.
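For readers who want to experiment, here is a minimal sketch of how one might plot a few of the bounds quoted in this post (just the trivial bound, the density hypothesis level, and the two supremum norm bounds, rather than the full Trudgian–Yang compilation depicted above):

```python
# Sketch: plot a few of the bounds on A(sigma) quoted in this post (not the full
# piecewise curve from Table 2 of Trudgian-Yang).
import numpy as np
import matplotlib.pyplot as plt

sigma = np.linspace(0.5, 0.99, 400)
plt.plot(sigma, 1 / (1 - sigma), label="trivial bound 1/(1-sigma)")
plt.axhline(2, linestyle="--", label="density hypothesis: A = 2")
plt.axhline(12 / 5, linestyle=":", label="Huxley sup-norm bound 12/5")
plt.axhline(30 / 13, linestyle="-.", label="Guth-Maynard sup-norm bound 30/13")
plt.xlabel("sigma")
plt.ylabel("upper bound on A(sigma)")
plt.ylim(0, 6)
plt.legend()
plt.show()
```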


The Salem prize was established in 1968 and named in honor of Raphaël Salem (1898-1963), a mathematician famous notably for his deep study of the links between Fourier series and number theory and for pioneering applications of probabilistic methods to these fields. It was not awarded from 2019-2022, due to both the COVID pandemic and the death of Jean Bourgain, who had been almost single-handedly administering the prize, but is now active again, being administered by Akshay Venkatesh and the IAS. I chair the scientific committee for this prize, whose other members are Guy David and Mikhail Sodin. Last year, the prize was awarded to Sarah Peluse and Julian Sahasrabudhe.

Nominations for the 2024 Salem Prize are now open until September 1st. Nominations should include a CV of the nominee and a nomination letter explaining the significance of the nominee’s work. Supplementary documentation, such as supporting letters of recommendation or key publications, can additionally be provided, but are not required.

Nominees may be individuals from any country or institution. Preference will be given to nominees who have received their PhD in the last ten years, although this rule may be relaxed if there are mitigating personal circumstances, or if there have been few Salem prize winners in recent years.  Self-nominations will not be considered, nor are past Prize winners or Scientific Committee members eligible.

The prize does not come with a direct monetary award, but winners will be invited to visit the IAS and to give a lecture associated with the award of the prize.

See also the previous year’s announcement of the Salem prize nomination period.
