
Earlier this year, I gave a series of lectures at the Joint Mathematics Meetings in San Francisco. I am uploading here the slides for these talks:

I also have written a text version of the first talk, which has been submitted to the Notices of the American Mathematical Society.

I have just uploaded to the arXiv my paper “Monotone non-decreasing sequences of the Euler totient function“. This paper concerns the quantity {M(x)}, defined as the length of the longest subsequence of the numbers from {1} to {x} for which the Euler totient function {\varphi} is non-decreasing. The first few values of {M} are

\displaystyle  1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 12, 12, \dots

(OEIS A365339). For instance, {M(6)=5} because the totient function is non-decreasing on the set {\{1,2,3,4,5\}} or {\{1,2,3,4,6\}}, but not on the set {\{1,2,3,4,5,6\}}.
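
As a quick computational illustration (a rough sketch of my own, not code from the paper), {M(x)} can be computed by running a longest non-decreasing subsequence algorithm on the values {\varphi(1),\dots,\varphi(x)}:

```python
from bisect import bisect_right

def totient_sieve(x):
    """phi[n] for 0 <= n <= x, via a standard sieve."""
    phi = list(range(x + 1))
    for p in range(2, x + 1):
        if phi[p] == p:                  # p is prime
            for m in range(p, x + 1, p):
                phi[m] -= phi[m] // p
    return phi

def M(x, phi):
    """Length of the longest subsequence of 1,...,x on which phi is non-decreasing."""
    tails = []                           # tails[k] = least possible last value of a
    for n in range(1, x + 1):            # non-decreasing subsequence of length k+1
        i = bisect_right(tails, phi[n])
        if i == len(tails):
            tails.append(phi[n])
        else:
            tails[i] = phi[n]
    return len(tails)

phi = totient_sieve(10**4)
print([M(t, phi) for t in range(1, 19)])             # should reproduce A365339
pi_x = sum(1 for p in range(2, 10**4 + 1) if phi[p] == p - 1)
print(M(10**4, phi), pi_x)               # M(x) vs. pi(x) (cf. the discussion below)
```

Even at this small scale one should see {M(x)} exceeding the prime count only slightly, in line with the conjecture discussed next.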

Since {\varphi(p)=p-1} for any prime {p}, we have {M(x) \geq \pi(x)}, where {\pi(x)} is the prime counting function. Empirically, the primes come quite close to achieving the maximum length {M(x)}; indeed it was conjectured by Pollack, Pomerance, and Treviño, based on numerical evidence, that one had

\displaystyle  M(x) = \pi(x)+64 \ \ \ \ \ (1)

for all {x \geq 31957}; this conjecture is verified up to {x=10^7}. The previous best known upper bound was basically of the form

\displaystyle  M(x) \leq \exp( (C+o(1)) (\log\log\log x)^2 ) \frac{x}{\log x}

as {x \rightarrow \infty} for an explicit constant {C = 0.81781\dots}, from combining results from the above paper with that of Ford or of Maier-Pomerance. In this paper we obtain the asymptotic

\displaystyle  M(x) = \left( 1 + O \left(\frac{(\log\log x)^5}{\log x}\right) \right) \frac{x}{\log x} \ \ \ \ \ (2)

so in particular {M(x) = (1+o(1))\pi(x)}. This answers a question of Erdős, as well as a closely related question of Pollack, Pomerance, and Treviño.

The methods of proof turn out to be mostly elementary (the most advanced result from analytic number theory we need is the prime number theorem with classical error term). The basic idea is to isolate one key prime factor {p} of a given number {1 \leq n \leq x} which has a sizeable influence on the totient function {\varphi(n)}. For instance, for “typical” numbers {n}, one has a factorization

\displaystyle  n = d p_2 p_1

where {p_2} is a medium sized prime, {p_1} is a significantly larger prime, and {d} is a number with all prime factors less than {p_2}. This leads to an approximation

\displaystyle  \varphi(n) \approx \frac{\varphi(d)}{d} (1-\frac{1}{p_2}) n.

As a consequence, if we temporarily hold {d} fixed, and also localize {n} to a relatively short interval, then {\varphi} can only be non-decreasing in {n} if {p_2} is also non-decreasing at the same time. This turns out to significantly cut down on the possible length of a non-decreasing sequence in this regime, particularly if {p_2} is large; this can be formalized by partitioning the range of {p_2} into various subintervals and inspecting how this (and the monotonicity hypothesis on {\varphi}) constrains the values of {n} associated to each subinterval. When {p_2} is small, we instead use a factorization

\displaystyle  n = d p \ \ \ \ \ (3)

where {d} is very smooth (i.e., has no large prime factors), and {p} is a large prime. Now we have the approximation

\displaystyle  \varphi(n) \approx \frac{\varphi(d)}{d} n \ \ \ \ \ (4)

and we can conclude that {\frac{\varphi(d)}{d}} will have to basically be piecewise constant in order for {\varphi} to be non-decreasing. Pursuing this analysis more carefully (in particular controlling the size of various exceptional sets in which the above analysis breaks down), we end up achieving the main theorem so long as we can prove the preliminary inequality

\displaystyle  \sum_{\frac{\varphi(d)}{d}=q} \frac{1}{d} \leq 1 \ \ \ \ \ (5)

for all positive rational numbers {q}. This is in fact also a necessary condition; any failure of this inequality can be easily converted to a counterexample to the bound (2), by considering numbers of the form (3) with {\frac{\varphi(d)}{d}} equal to a fixed constant {q} (and omitting a few rare values of {n} where the approximation (4) is bad enough that {\varphi} is temporarily decreasing). Fortunately, there is a minor miracle, relating to the fact that the largest prime factor of the denominator of {\frac{\varphi(d)}{d}} in lowest terms necessarily equals the largest prime factor of {d}, that allows one to evaluate the left-hand side of (5) almost exactly (this expression either vanishes, or is a product of {\frac{1}{p-1}} over some primes {p} ranging up to the largest prime factor of {q}), which in turn makes it easy to establish (5). If one were to try to prove an analogue of our main result for the sum-of-divisors function {\sigma(n)}, one would need the analogue

\displaystyle  \sum_{\frac{\sigma(d)}{d}=q} \frac{1}{d} \leq 1 \ \ \ \ \ (6)

of (5), which looks within reach of current methods (and was even claimed without proof by Erdős), but does not have a full proof in the literature at present.
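
For what it is worth, truncated versions of (5) and (6) are easy to explore numerically; the following is only a finite sanity check of my own, not a proof of anything:

```python
from fractions import Fraction
from collections import defaultdict

N = 10**5
phi = list(range(N + 1))
sigma = [0] * (N + 1)
for p in range(2, N + 1):
    if phi[p] == p:                      # p is prime
        for m in range(p, N + 1, p):
            phi[m] -= phi[m] // p
for d in range(1, N + 1):                # sigma(m) as a divisor sum
    for m in range(d, N + 1, d):
        sigma[m] += d

def grouped_sums(values):
    """Truncated sums  sum_{d <= N, values[d]/d = q} 1/d,  grouped by the ratio q."""
    sums = defaultdict(Fraction)
    for d in range(1, N + 1):
        sums[Fraction(values[d], d)] += Fraction(1, d)
    return sums

for name, values in (("phi", phi), ("sigma", sigma)):
    sums = grouped_sums(values)
    print(name, "largest truncated class sum:", float(max(sums.values())))
# For phi, the class phi(d)/d = 1/2 (d a power of two) has truncated sum 1 - 2^{-16},
# approaching the extremal value 1 in (5); the trivial class d = 1 contributes exactly 1.
```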

In the final section of the paper we discuss some near counterexamples to the strong conjecture (1) that indicate that it is likely going to be difficult to get close to proving this conjecture without assuming some rather strong hypotheses. Firstly, we show that failure of Legendre’s conjecture on the existence of a prime between any two consecutive squares can lead to a counterexample to (1). Secondly, we show that failure of the Dickson-Hardy-Littlewood conjecture can lead to a separate (and more dramatic) failure of (1), in which the primes are no longer the dominant sequence on which the totient function is non-decreasing, but rather the numbers which are a power of two times a prime become the dominant sequence. This suggests that any significant improvement to (2) would require assuming something comparable to the prime tuples conjecture, and perhaps also some unproven hypotheses on prime gaps.

Kevin Ford, Dimitris Koukoulopoulos and I have just uploaded to the arXiv our paper “A lower bound on the mean value of the Erdős-Hooley delta function“. This paper complements the recent paper of Dimitris and myself obtaining the upper bound

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll (\log\log x)^{11/4}

on the mean value of the Erdős-Hooley delta function

\displaystyle  \Delta(n) := \sup_u \# \{ d|n: e^u < d \leq e^{u+1} \}
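
For concreteness, here is a naive way to compute {\Delta(n)} and to approximate the mean value at very small scales (an illustrative sketch of mine; the asymptotic regime in these theorems is of course far beyond direct computation):

```python
from math import log

def divisors(n):
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d * d != n:
                large.append(n // d)
        d += 1
    return small + large[::-1]           # sorted increasingly

def delta(n):
    """Delta(n) = sup_u #{d | n : e^u < d <= e^{u+1}}."""
    L = [log(d) for d in divisors(n)]
    best, j = 0, 0
    for i in range(len(L)):              # window of log-length 1 ending at L[i]
        while L[i] - L[j] >= 1:
            j += 1
        best = max(best, i - j + 1)
    return best

print(delta(2 * 3 * 5 * 7 * 11 * 13), delta(2**12), delta(15015**2))
x = 10**4
print(sum(delta(n) for n in range(1, x + 1)) / x)    # mean value at a tiny scale
```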

In this paper we obtain a lower bound

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \gg (\log\log x)^{1+\eta-o(1)}

where {\eta = 0.3533227\dots} is an exponent that arose in previous work of Ford, Green, and Koukoulopoulos, who showed that

\displaystyle  \Delta(n) \gg (\log\log n)^{\eta-o(1)} \ \ \ \ \ (1)

for all {n} outside of a set of density zero. The previous best known lower bound for the mean value was

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \gg \log\log x,

due to Hall and Tenenbaum.

The point is that the main contributions to the mean value of {\Delta(n)} are driven not by “typical” numbers {n} of some size {x}, but rather by numbers that have a splitting

\displaystyle  n = n' n''

where {n''} is the product of primes between some intermediate threshold {1 \leq y \leq x} and {x} and behaves “typically” (so in particular, it has about {\log\log x - \log\log y + O(\sqrt{\log\log x})} prime factors, as per the Hardy-Ramanujan law and the Erdős-Kac law), while {n'} is the product of primes up to {y} and has double the number of typical prime factors ({2 \log\log y + O(\sqrt{\log\log x})}, rather than {\log\log y + O(\sqrt{\log\log x})}); thus {n'} is the type of number that would make a significant contribution to the mean value of the divisor function {\tau(n')}. Here {y} is such that {\log\log y} is an integer in the range

\displaystyle  \varepsilon\log\log x \leq \log \log y \leq (1-\varepsilon) \log\log x

for some small constant {\varepsilon>0}; there are basically {\log\log x} different such values of {y}, and they give essentially disjoint contributions. From the easy inequalities

\displaystyle  \Delta(n) \gg \Delta(n') \Delta(n'') \geq \frac{\tau(n')}{\log n'} \Delta(n'') \ \ \ \ \ (2)

(the latter coming from the pigeonhole principle) and the fact that {\frac{\tau(n')}{\log n'}} has mean about one, one would expect to get the above result provided that one could get a lower bound of the form

\displaystyle  \Delta(n'') \gg (\log \log n'')^{\eta-o(1)} \ \ \ \ \ (3)

for most typical {n''} with prime factors between {y} and {x}. Unfortunately, due to the lack of small prime factors in {n''}, the arguments of Ford, Green, Koukoulopoulos that give (1) for typical {n} do not quite work for the rougher numbers {n''}. However, it turns out that one can get around this problem by replacing (2) by the more efficient inequality

\displaystyle  \Delta(n) \gg \frac{\tau(n')}{\log n'} \Delta^{(\log n')}(n'')

where

\displaystyle  \Delta^{(v)}(n) := \sup_u \# \{ d|n: e^u < d \leq e^{u+v} \}

is an enlarged version of {\Delta(n)} when {v \geq 1}. This inequality is easily proven by applying the pigeonhole principle to the factors of {n} of the form {d' d''}, where {d'} is one of the {\tau(n')} factors of {n'}, and {d''} is one of the {\Delta^{(\log n')}(n'')} factors of {n''} in the optimal interval {[e^u, e^{u+\log n'}]}. The extra room provided by the enlargement of the range {[e^u, e^{u+1}]} to {[e^u, e^{u+\log n'}]} turns out to be sufficient to adapt the Ford-Green-Koukoulopoulos argument to the rough setting. In fact we are able to use the main technical estimate from that paper as a “black box”, namely that if one considers a random subset {A} of {[D^c, D]} for some small {c>0} and sufficiently large {D} with each {n \in [D^c, D]} lying in {A} with an independent probability {1/n}, then with high probability there should be {\gg c^{-1/\eta+o(1)}} subset sums of {A} that attain the same value. (Initially, what “high probability” means is just “close to {1}“, but one can reduce the failure probability significantly as {c \rightarrow 0} by a “tensor power trick” taking advantage of Bennett’s inequality.)
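
As a toy illustration of the enlarged quantity {\Delta^{(v)}} (again just a sketch of mine, with artificial choices standing in for {n'} and {n''}):

```python
from math import ceil, log

def divisors(n):
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d * d != n:
                large.append(n // d)
        d += 1
    return small + large[::-1]

def delta_v(n, v=1.0):
    """Delta^{(v)}(n) = sup_u #{d | n : e^u < d <= e^{u+v}}; v = 1 recovers Delta(n)."""
    L = [log(d) for d in divisors(n)]
    best, j = 0, 0
    for i in range(len(L)):              # slide a window of log-length v
        while L[i] - L[j] >= v:
            j += 1
        best = max(best, i - j + 1)
    return best

n_smooth = 3 * 5 * 7 * 11                # toy stand-in for n' (small prime factors)
n_rough = 101 * 103 * 107 * 109          # toy stand-in for n'' (large prime factors)
v = ceil(log(n_smooth))
# A window of length v is covered by ceil(v) unit windows, so
# Delta(m) >= Delta^{(v)}(m) / ceil(v); the pigeonhole step described above instead
# multiplies the tau(n') divisors of n' against divisors of n'' packed into a
# single window of length log n'.
print(delta_v(n_rough, 1), delta_v(n_rough, v), delta_v(n_smooth * n_rough, 1))
```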

I have just uploaded to the arXiv my paper “The convergence of an alternating series of Erdős, assuming the Hardy–Littlewood prime tuples conjecture“. This paper concerns an old problem of Erdős concerning whether the alternating series {\sum_{n=1}^\infty \frac{(-1)^n n}{p_n}} converges, where {p_n} denotes the {n^{th}} prime. The main result of this paper is that the answer to this question is affirmative assuming a sufficiently strong version of the Hardy–Littlewood prime tuples conjecture.

The alternating series test does not apply here because the ratios {\frac{n}{p_n}} are not monotonically decreasing. The deviations from monotonicity arise from fluctuations in the prime gaps {p_{n+1}-p_n}, so the enemy arises from biases in the prime gaps for odd and even {n}. By changing variables from {n} to {p_n} (or more precisely, to integers in the range between {p_n} and {p_{n+1}}), this is basically equivalent to biases in the parity {(-1)^{\pi(n)}} of the prime counting function. Indeed, it is an unpublished observation of Said that the convergence of {\sum_{n=1}^\infty \frac{(-1)^n n}{p_n}} is equivalent to the convergence of {\sum_{n=10}^\infty \frac{(-1)^{\pi(n)}}{n \log n}}. So this question is really about trying to get a sufficiently strong amount of equidistribution for the parity of {\pi(n)}.
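
Numerically, the slow oscillation is easy to see; here is a quick (and purely illustrative) computation of the partial sums:

```python
def primes_up_to(limit):
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    return [p for p in range(2, limit + 1) if sieve[p]]

primes = primes_up_to(2 * 10**7)         # roughly 1.27 million primes
S, sign = 0.0, -1                        # sign = (-1)^n
checkpoints = {10**3, 10**4, 10**5, 10**6}
for n, p in enumerate(primes, start=1):
    S += sign * n / p
    sign = -sign
    if n in checkpoints:
        print(n, S)
# consecutive partial sums still differ by about n/p_n, which is roughly 1/log n,
# so any convergence is extremely slow
```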

The prime tuples conjecture does not directly say much about the value of {\pi(n)}; however, it can be used to control differences {\pi(n+\lambda \log x) - \pi(n)} for {n \sim x} and {\lambda>0} not too large. Indeed, it is a famous calculation of Gallagher that for fixed {\lambda}, and {n} chosen randomly from {1} to {x}, the quantity {\pi(n+\lambda \log x) - \pi(n)} is distributed according to the Poisson distribution of mean {\lambda} asymptotically if the prime tuples conjecture holds. In particular, the parity {(-1)^{\pi(n+\lambda \log x)-\pi(n)}} of this quantity should have mean asymptotic to {e^{-2\lambda}}. An application of the van der Corput {A}-process then gives some decay on the mean of {(-1)^{\pi(n)}} as well. Unfortunately, this decay is a bit too weak for this problem; even if one uses the most quantitative version of Gallagher’s calculation, worked out in a recent paper of (Vivian) Kuperberg, the best bound on the mean {|\frac{1}{x} \sum_{n \leq x} (-1)^{\pi(n)}|} is something like {(\log\log x)^{-1/4+o(1)}}, which is not quite strong enough to overcome the doubly logarithmic divergence of {\sum_{n=1}^\infty \frac{1}{n \log n}}.
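
One can watch Gallagher's prediction emerge (slowly) in small-scale experiments; the sketch below compares the empirical statistics of {\pi(n+\lambda \log x) - \pi(n)} with the Poisson predictions. The parameters are my own toy choices, and at such small {x} the agreement is only rough:

```python
import math, random

x, lam = 10**6, 1.0
H = int(lam * math.log(x))               # window of length ~ lambda log x

limit = x + H + 1
sieve = bytearray([1]) * (limit + 1)
sieve[0:2] = b"\x00\x00"
for p in range(2, int(limit ** 0.5) + 1):
    if sieve[p]:
        sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
pi = [0] * (limit + 1)                   # pi[m] = number of primes <= m
for m in range(1, limit + 1):
    pi[m] = pi[m - 1] + sieve[m]

random.seed(0)
counts = [pi[n + H] - pi[n] for n in (random.randrange(1, x) for _ in range(10**5))]
mean = sum(counts) / len(counts)
parity = sum((-1) ** k for k in counts) / len(counts)
print("mean count:", mean, "   Poisson prediction:", lam)
print("mean of (-1)^count:", parity, "   prediction exp(-2 lambda):", math.exp(-2 * lam))
```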

To get around this obstacle, we take advantage of the random sifted model {{\mathcal S}_z} of the primes that was introduced in a paper of Banks, Ford, and myself. To model the primes in an interval such as {[n, n+\lambda \log x]} with {n} drawn randomly from say {[x,2x]}, we remove one random residue class {a_p \hbox{ mod } p} from this interval for all primes {p} up to Pólya’s “magic cutoff” {z \approx x^{1/e^\gamma}}. The prime tuples conjecture can then be interpreted as the assertion that the random set {{\mathcal S}_z} produced by this sieving process is statistically a good model for the primes in {[n, n+\lambda \log x]}. After some standard manipulations (using a version of the Bonferroni inequalities, as well as some upper bounds of Kuperberg), the problem then boils down to getting sufficiently strong estimates for the expected parity {{\bf E} (-1)^{|{\mathcal S}_z|}} of the random sifted set {{\mathcal S}_z}.

For this problem, the main advantage of working with the random sifted model, rather than with the primes or the singular series arising from the prime tuples conjecture, is that the sifted model can be studied iteratively from the partially sifted sets {{\mathcal S}_w} arising from sifting primes {p} up to some intermediate threshold {w<z}, and that the expected parity of the {{\mathcal S}_w} experiences some decay in {w}. Indeed, once {w} exceeds the length {\lambda \log x} of the interval {[n,n+\lambda \log x]}, sifting {{\mathcal S}_w} by an additional prime {p} will cause {{\mathcal S}_w} to lose one element with probability {|{\mathcal S}_w|/p}, and remain unchanged with probability {1 - |{\mathcal S}_w|/p}. If {|{\mathcal S}_w|} concentrates around some value {\overline{S}_w}, this suggests that the expected parity {{\bf E} (-1)^{|{\mathcal S}_w|}} will decay by a factor of about {|1 - 2 \overline{S}_w/p|} as one increases {w} to {p}, and iterating this should give good bounds on the final expected parity {{\bf E} (-1)^{|{\mathcal S}_z|}}. It turns out that existing second moment calculations of Montgomery and Soundararajan suffice to obtain enough concentration to make this strategy work.
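
The sifted model itself is also easy to simulate; the following Monte Carlo sketch (toy parameters, my own code) removes one random residue class mod {p} for each prime {p} up to the magic cutoff and records the parity of the number of survivors:

```python
import math, random

x, lam = 10**6, 1.0
H = int(lam * math.log(x))                   # interval length ~ lambda log x
gamma = 0.5772156649015329                   # Euler-Mascheroni constant
z = int(x ** (1 / math.exp(gamma)))          # "magic cutoff" z ~ x^{1/e^gamma}

sieve = bytearray([1]) * (z + 1)
sieve[0:2] = b"\x00\x00"
for p in range(2, int(z ** 0.5) + 1):
    if sieve[p]:
        sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
primes = [p for p in range(2, z + 1) if sieve[p]]

random.seed(1)
trials, parity_sum, size_sum = 10**4, 0, 0
for _ in range(trials):
    survivors = set(range(H))
    for p in primes:
        a = random.randrange(p)              # delete the residue class a mod p
        if p >= H:
            survivors.discard(a)             # at most one element of the class in [0,H)
        else:
            survivors -= set(range(a, H, p))
    parity_sum += (-1) ** len(survivors)
    size_sum += len(survivors)
print("z =", z, " mean |S_z| =", size_sum / trials,
      " estimated E(-1)^{|S_z|} =", parity_sum / trials)
```

The mean size should come out roughly equal to {\lambda}, which is the point of the magic cutoff; the expected parity is exactly the quantity that the paper needs to control rigorously.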

Dimitris Koukoulopoulos and I have just uploaded to the arXiv our paper “An upper bound on the mean value of the Erdős-Hooley delta function“. This paper concerns a (still somewhat poorly understood) basic arithmetic function in multiplicative number theory, namely the Erdős-Hooley delta function

\displaystyle  \Delta(n) := \sup_u \Delta(n;u)

where

\displaystyle  \Delta(n;u) := \# \{ d|n: e^u < d \leq e^{u+1} \}.

The function {\Delta} measures the extent to which the divisors of a natural number can be concentrated in a dyadic (or more precisely, {e}-dyadic) interval {(e^u, e^{u+1}]}. From the pigeonhole principle, we have the bounds

\displaystyle  \frac{\tau(n)}{\log n} \ll \Delta(n) \leq \tau(n),

where {\tau(n) := \# \{ d: d|n\}} is the usual divisor function. The statistical behavior of the divisor function is well understood; for instance, if {n} is drawn at random from {1} to {x}, then the mean value of {\tau(n)} is roughly {\log x}, the median is roughly {\log^{\log 2} x}, and (by the Erdős-Kac theorem) {\tau(n)} asymptotically has a log-normal distribution. In particular, there are a small proportion of highly divisible numbers that skew the mean to be significantly higher than the median.

On the other hand, the statistical behavior of the Erdős-Hooley delta function is significantly less well understood, even conjecturally. Again drawing {n} at random from {1} to {x} for large {x}, the median is known to be somewhere between {(\log\log x)^{0.3533\dots}} and {(\log\log x)^{0.6102\dots}} – a (difficult) recent result of Ford, Green, and Koukoulopoulos (for the lower bound) and La Bretèche and Tenenbaum (for the upper bound). And the mean {\frac{1}{x} \sum_{n \leq x} \Delta(n)} was even less well controlled; the best previous bounds were

\displaystyle  \log \log x \ll \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll \exp( c \sqrt{\log\log x} )

for any {c > \sqrt{2} \log 2}, with the lower bound due to Hall and Tenenbaum, and the upper bound a recent result of La Bretèche and Tenenbaum.

The main result of this paper is an improvement of the upper bound to

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll (\log \log x)^{11/4}.

It is still unclear to us exactly what to conjecture regarding the actual order of the mean value.

The reason we looked into this problem was that it was connected to forthcoming work of David Conlon, Jacob Fox, and Huy Pham on the following problem of Erdős: what is the size of the largest subset {A} of {\{1,\dots,N\}} with the property that no non-empty subset of {A} sums to a perfect square? Erdős observed that one can obtain sets of size {\gg N^{1/3}} (basically by considering certain homogeneous arithmetic progressions), and Nguyen and Vu showed an upper bound of {\ll N^{1/3} (\log N)^{O(1)}}. With our mean value bound as input, together with several new arguments, Conlon, Fox, and Pham have been able to improve the upper bound to {\ll N^{1/3} (\log\log N)^{O(1)}}.

Let me now discuss some of the ingredients of the proof. The first few steps are standard. Firstly we may restrict attention to square-free numbers without much difficulty (the point being that if a number {n} factors as {n = d^2 m} with {m} squarefree, then {\Delta(n) \leq \tau(d^2) \Delta(m)}). Next, because a square-free number {n>1} can be uniquely factored as {n = pm} where {p} is a prime and {m} lies in the finite set {{\mathcal S}_{<p}} of squarefree numbers whose prime factors are less than {p}, and {\Delta(n) \leq \tau(p) \Delta(m) = 2 \Delta(m)}, it is not difficult to establish the bound

\displaystyle  \frac{1}{x} \sum_{n \in {\mathcal S}_{<x}} \Delta(n) \ll \sup_{2 \leq y\leq x} \frac{1}{\log y} \sum_{n \in {\mathcal S}_{<y}} \frac{\Delta(n)}{n}.

The upshot of this is that one can replace an ordinary average with a logarithmic average; thus it suffices to show

\displaystyle  \frac{1}{\log x} \sum_{n \in {\mathcal S}_{<x}} \frac{\Delta(n)}{n} \ll (\log \log x)^{11/4}. \ \ \ \ \ (1)

We actually prove a slightly more refined distributional estimate: for any {A \geq 2}, we have a bound

\displaystyle  \Delta(n) \ll A \log^{3/4} A \ \ \ \ \ (2)

outside of an exceptional set {E} which is small in the sense that

\displaystyle  \frac{1}{\log x} \sum_{n \in {\mathcal S}_{<x}: n \in E} \frac{1}{n} \ll \frac{1}{A}. \ \ \ \ \ (3)

It is not difficult to get from this distributional estimate to the logarithmic average estimate (1) (worsening the exponent {3/4} to {3/4+2 = 11/4}).

To get some intuition on the size of {\Delta(n)}, we observe that if {y > 0} and {n_{<y}} is the factor of {n} coming from the prime factors less than {y}, then

\displaystyle  \Delta(n) \geq \Delta(n_{<y}) \gg \frac{\tau(n_{<y})}{\log n_{<y}}. \ \ \ \ \ (4)

On the other hand, standard estimates let one establish that

\displaystyle  \tau(n_{<y}) \ll A \log n_{<y} \ \ \ \ \ (5)

for all {y}, and all {n} outside of an exceptional set that is small in the sense (3); in fact it turns out that one can also get an additional gain in this estimate unless {\log y} is close to {A^{\log 4}}, which turns out to be useful when optimizing the bounds. So we would like to approximately reverse the inequalities in (4) and get from (5) to (2), possibly after throwing away further exceptional sets of size (3).

At this point we perform another standard technique, namely the moment method of controlling the supremum {\Delta(n) = \sup_u \Delta(n;u)} by the moments

\displaystyle  M_q(n) := \int_{{\bf R}} \Delta(n;u)^q\ du

for natural numbers {q}; it is not difficult to establish the bound

\displaystyle  \Delta(n) \ll M_q(n)^{1/q}

and one expects this bound to become essentially sharp once {q \sim \log\log x}. We will be able to show a moment bound

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_q(n) / \tau(n)}{n} \leq O(q)^q A^{q-2} \log^{3q/4} A

for any {q \geq 2} for some exceptional set {E_q} obeying the smallness condition (3) (actually, for technical reasons we need to improve the right-hand side slightly to close an induction on {q}); this will imply the distributional bound (2) from a standard Markov inequality argument (setting {q \sim \log\log x}).
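
To see the moment method in action on a single (toy) integer, one can compute {M_q(n)} exactly, since {u \mapsto \Delta(n;u)} is a step function with breakpoints at {\log d} and {\log d - 1} for divisors {d}. The following sketch is mine, purely for illustration:

```python
from math import log

def divisors(n):
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d * d != n:
                large.append(n // d)
        d += 1
    return small + large[::-1]

def moment(n, q):
    """M_q(n) = integral of Delta(n;u)^q du, evaluated exactly piece by piece."""
    L = [log(d) for d in divisors(n)]
    pts = sorted(set(L + [t - 1 for t in L]))
    total = 0.0
    for a, b in zip(pts, pts[1:]):
        u = (a + b) / 2                      # Delta(n;.) is constant on (a, b)
        total += (b - a) * sum(1 for t in L if u < t <= u + 1) ** q
    return total

def delta(n):
    L = [log(d) for d in divisors(n)]
    best, j = 0, 0
    for i in range(len(L)):
        while L[i] - L[j] >= 1:
            j += 1
        best = max(best, i - j + 1)
    return best

n = 2 * 3 * 5 * 7 * 11 * 13 * 17
print("Delta(n) =", delta(n))
for q in (2, 4, 8, 16):
    print(q, moment(n, q) ** (1 / q))
# M_q(n)^{1/q} settles down to within a bounded factor of Delta(n) as q grows,
# which is why moments with q of size about loglog x suffice to control the sup.
```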

The strategy is then to obtain a good recursive inequality for (averages of) {M_q(n)}. As in the reduction to (1), we factor {n=pm} where {p} is a prime and {m \in {\mathcal S}_{<p}}. One observes the identity

\displaystyle  \Delta(n;u) = \Delta(m;u) + \Delta(m;u-\log p)

for any {u}; taking moments, one obtains the identity

\displaystyle  M_q(n) = \sum_{a+b=q; 0 \leq b \leq q} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

As in previous literature, one can try to average in {p} here and apply Hölder’s inequality. But it is convenient to first use the symmetry of the summand in {a,b} to reduce to the case of relatively small values of {b}:

\displaystyle  M_q(n) \leq 2 \sum_{a+b=q; 0 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

One can extract out the {b=0} term as

\displaystyle  M_q(n) \leq 2 M_q(m)

\displaystyle + 2 \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

It is convenient to eliminate the factor of {2} by dividing out by the divisor function:

\displaystyle  \frac{M_q(n)}{\tau(n)} \leq \frac{M_q(m)}{\tau(m)}

\displaystyle + \frac{1}{\tau(m)} \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

This inequality is suitable for iterating and also averaging in {p} and {m}. After some standard manipulations (using the Brun–Titchmarsh and Hölder inequalities), one is able to estimate sums such as

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_q(n)/\tau(n)}{n} \ \ \ \ \ (6)

in terms of sums such as

\displaystyle  \int_2^{x^2} \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_a(n) M_b(n)}{\tau(n) n} \frac{dy}{\log^2 y}

(assuming a certain monotonicity property of the exceptional set {E_q} that turns out to hold in our application). By an induction hypothesis and a Markov inequality argument, one can get a reasonable pointwise upper bound on {M_b} (after removing another exceptional set), and the net result is that one can basically control the sum (6) in terms of expressions such as

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_a} \frac{M_a(n)/\tau(n)}{n}

for various {a < q}. This allows one to estimate these expressions efficiently by induction.

Tamar Ziegler and I have just uploaded to the arXiv our paper “Infinite partial sumsets in the primes“. This is a short paper inspired by a recent result of Kra, Moreira, Richter, and Robertson (discussed for instance in this Quanta article from last December) showing that for any set {A} of natural numbers of positive upper density, there exists a sequence {b_1 < b_2 < b_3 < \dots} of natural numbers and a shift {t} such that {b_i + b_j + t \in A} for all {i<j} (this answers a question of Erdős). In view of the “transference principle“, it is then plausible to ask whether the same result holds if {A} is replaced by the primes. We can show the following results:

Theorem 1
  • (i) If the Hardy-Littlewood prime tuples conjecture (or the weaker conjecture of Dickson) is true, then there exists an increasing sequence {b_1 < b_2 < b_3 < \dots} of primes such that {b_i + b_j + 1} is prime for all {i < j}.
  • (ii) Unconditionally, there exist increasing sequences {a_1 < a_2 < \dots} and {b_1 < b_2 < \dots} of natural numbers such that {a_i + b_j} is prime for all {i<j}.
  • (iii) These conclusions fail if “prime” is replaced by “positive (relative) density subset of the primes” (even if the density is equal to 1).

We remark that it was shown by Balog that there (unconditionally) exist arbitrarily long but finite sequences {b_1 < \dots < b_k} of primes such that {b_i + b_j + 1} is prime for all {i < j \leq k}. (This result can also be recovered from the later results of Ben Green, myself, and Tamar Ziegler.) Also, it had previously been shown by Granville that on the Hardy-Littlewood prime tuples conjecture, there existed increasing sequences {a_1 < a_2 < \dots} and {b_1 < b_2 < \dots} of natural numbers such that {a_i+b_j} is prime for all {i,j}.
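
As a small concrete illustration of these finite statements (a naive greedy search of my own, not the recursive construction or the sieve arguments from the paper), one can look for short chains of primes with all pairwise sums plus one prime:

```python
LIMIT = 200000
sieve = bytearray([1]) * (2 * LIMIT + 2)
sieve[0:2] = b"\x00\x00"
for p in range(2, int((2 * LIMIT + 1) ** 0.5) + 1):
    if sieve[p]:
        sieve[p * p::p] = bytearray(len(sieve[p * p::p]))

def greedy_chain(start):
    """Greedily extend primes b_1 < b_2 < ... keeping b_i + b_j + 1 prime for i < j."""
    chain = [start]
    for b in range(start + 1, LIMIT):
        if sieve[b] and all(sieve[b + c + 1] for c in chain):
            chain.append(b)
    return chain

best = max((greedy_chain(p) for p in (3, 5, 7, 11, 13, 17, 19, 23, 29)), key=len)
print(len(best), best)
```

Such naive searches produce chains of moderate length quickly, but of course say nothing about the infinite sequences in Theorem 1, which is where the prime tuples conjecture (for (i)) or the Maynard sieve (for (ii)) comes in.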

The conclusion of (i) is stronger than that of (ii) (which is of course consistent with the former being conditional and the latter unconditional). The conclusion (ii) also implies the well-known theorem of Maynard that for any given {k}, there exist infinitely many {k}-tuples of primes of bounded diameter, and indeed our proof of (ii) uses the same “Maynard sieve” that powers the proof of that theorem (though we use a formulation of that sieve closer to that in this blog post of mine). Indeed, the failure of (iii) basically arises from the failure of Maynard’s theorem for dense subsets of primes, simply by removing those clusters of primes that are unusually closely spaced.

Our proof of (i) was initially inspired by the topological dynamics methods used by Kra, Moreira, Richter, and Robertson, but we managed to condense it to a purely elementary argument (taking up only half a page) that makes no reference to topological dynamics and builds up the sequence {b_1 < b_2 < \dots} recursively by repeated application of the prime tuples conjecture.

The proof of (ii) takes up the majority of the paper. It is easiest to phrase the argument in terms of “prime-producing tuples” – tuples {(h_1,\dots,h_k)} for which there are infinitely many {n} with {n+h_1,\dots,n+h_k} all prime. Maynard’s theorem is equivalent to the existence of arbitrarily long prime-producing tuples; our theorem is equivalent to the stronger assertion that there exists an infinite sequence {h_1 < h_2 < \dots} such that every initial segment {(h_1,\dots,h_k)} is prime-producing. The main new tool for achieving this is the following cute measure-theoretic lemma of Bergelson:

Lemma 2 (Bergelson intersectivity lemma) Let {E_1,E_2,\dots} be subsets of a probability space {(X,\mu)} of measure uniformly bounded away from zero, thus {\inf_i \mu(E_i) > 0}. Then there exists a subsequence {E_{i_1}, E_{i_2}, \dots} such that

\displaystyle  \mu(E_{i_1} \cap \dots \cap E_{i_k} ) > 0

for all {k}.

This lemma has a short proof, though not an entirely obvious one. Firstly, by deleting a null set from {X}, one can assume that all finite intersections {E_{i_1} \cap \dots \cap E_{i_k}} are either positive measure or empty. Secondly, a routine application of Fatou’s lemma shows that the maximal function {\limsup_N \frac{1}{N} \sum_{i=1}^N 1_{E_i}} has a positive integral, hence must be positive at some point {x_0}. Thus there is a subsequence {E_{i_1}, E_{i_2}, \dots} whose finite intersections all contain {x_0}, thus have positive measure as desired by the previous reduction.

It turns out that one cannot quite combine the standard Maynard sieve with the intersectivity lemma because the events {E_i} that show up (which roughly correspond to the event that {n + h_i} is prime for some random number {n} (with a well-chosen probability distribution) and some shift {h_i}) have their probability going to zero, rather than being uniformly bounded from below. To get around this, we borrow an idea from a paper of Banks, Freiberg, and Maynard, and group the shifts {h_i} into various clusters {h_{i,1},\dots,h_{i,J_i}}, chosen in such a way that the probability that at least one of {n+h_{i,1},\dots,n+h_{i,J_i}} is prime is bounded uniformly from below. One then applies the Bergelson intersectivity lemma to those events and uses many applications of the pigeonhole principle to conclude.

Kaisa Matomäki, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Higher uniformity of arithmetic functions in short intervals I. All intervals“. This paper investigates the higher order (Gowers) uniformity of standard arithmetic functions in analytic number theory (and specifically, the Möbius function {\mu}, the von Mangoldt function {\Lambda}, and the generalised divisor functions {d_k}) in short intervals {(X,X+H]}, where {X} is large and {H} lies in the range {X^{\theta+\varepsilon} \leq H \leq X^{1-\varepsilon}} for a fixed constant {0 < \theta < 1} (that one would like to be as small as possible). If we let {f} denote one of the functions {\mu, \Lambda, d_k}, then there is extensive literature on the estimation of short sums

\displaystyle  \sum_{X < n \leq X+H} f(n)

and some literature also on the estimation of exponential sums such as

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-\alpha n)

for a real frequency {\alpha}, where {e(\theta) := e^{2\pi i \theta}}. For applications in the additive combinatorics of such functions {f}, it is also necessary to consider more general correlations, such as polynomial correlations

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-P(n))

where {P: {\bf Z} \rightarrow {\bf R}} is a polynomial of some fixed degree, or more generally

\displaystyle  \sum_{X < n \leq X+H} f(n) \overline{F}(g(n) \Gamma)

where {G/\Gamma} is a nilmanifold of fixed degree and dimension (and with some control on structure constants), {g: {\bf Z} \rightarrow G} is a polynomial map, and {F: G/\Gamma \rightarrow {\bf C}} is a Lipschitz function (with some bound on the Lipschitz constant). Indeed, thanks to the inverse theorem for the Gowers uniformity norm, such correlations let one control the Gowers uniformity norm of {f} (possibly after subtracting off some renormalising factor) on such short intervals {(X,X+H]}, which can in turn be used to control other multilinear correlations involving such functions.
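
To fix ideas, the simplest of these objects, the short exponential sum against {\mu}, is easy to compute directly at small scales; the parameters and the segmented sieve below are my own illustrative choices:

```python
import cmath, math

def mobius_on_interval(X, H):
    """mu(n) for X < n <= X + H, via a segmented sieve with primes up to sqrt(X+H)."""
    root = int((X + H) ** 0.5) + 1
    sieve = bytearray([1]) * (root + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(root ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    primes = [p for p in range(2, root + 1) if sieve[p]]

    mu = [1] * H                          # index i corresponds to n = X + 1 + i
    rem = list(range(X + 1, X + H + 1))
    for p in primes:
        for i in range((-(X + 1)) % p, H, p):
            e = 0
            while rem[i] % p == 0:
                rem[i] //= p
                e += 1
            mu[i] = 0 if e >= 2 else -mu[i]
    for i in range(H):
        if rem[i] > 1:                    # one remaining prime factor > sqrt(X+H)
            mu[i] = -mu[i]
    return mu

X, H = 10**7, 30000                       # H is roughly X^{0.64}
mu = mobius_on_interval(X, H)
for alpha in (math.sqrt(2), 1 / 3):       # a "generic" and a rational frequency
    S = sum(m * cmath.exp(-2j * math.pi * alpha * n)
            for m, n in zip(mu, range(X + 1, X + H + 1)))
    print(alpha, abs(S), H ** 0.5)        # trivial bound is H; random-model size ~ sqrt(H)
```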

Traditionally, asymptotics for such sums are expressed in terms of a “main term” of some arithmetic nature, plus an error term that is estimated in magnitude. For instance, a sum such as {\sum_{X < n \leq X+H} \Lambda(n) e(-\alpha n)} would be approximated in terms of a main term that vanishes (or is negligible) if {\alpha} is “minor arc”, but is expressible in terms of something like a Ramanujan sum if {\alpha} is “major arc”, together with an error term. We found it convenient to cancel off such main terms by subtracting an approximant {f^\sharp} from each of the arithmetic functions {f} and then getting upper bounds on remainder correlations such as

\displaystyle  |\sum_{X < n \leq X+H} (f(n)-f^\sharp(n)) \overline{F}(g(n) \Gamma)| \ \ \ \ \ (1)

(actually for technical reasons we also allow the {n} variable to be restricted further to a subprogression of {(X,X+H]}, but let us ignore this minor extension for this discussion). There is some flexibility in how to choose these approximants, but we eventually found it convenient to use the following choices.

  • For the Möbius function {\mu}, we simply set {\mu^\sharp = 0}, as per the Möbius pseudorandomness conjecture. (One could choose a more sophisticated approximant in the presence of a Siegel zero, as I did with Joni in this recent paper, but we do not do so here.)
  • For the von Mangoldt function {\Lambda}, we eventually went with the Cramér-Granville approximant {\Lambda^\sharp(n) = \frac{W}{\phi(W)} 1_{(n,W)=1}}, where {W = \prod_{p < R} p} and {R = \exp(\log^{1/10} X)}.
  • For the divisor functions {d_k}, we used a somewhat complicated-looking approximant {d_k^\sharp(n) = \sum_{m \leq X^{\frac{k-1}{5k}}} P_m(\log n)} for some explicit polynomials {P_m}, chosen so that {d_k^\sharp} and {d_k} have almost exactly the same sums along arithmetic progressions (see the paper for details).

The objective is then to obtain bounds on sums such as (1) that improve upon the “trivial bound” that one can get with the triangle inequality and standard number theory bounds such as the Brun-Titchmarsh inequality. For {\mu} and {\Lambda}, the Siegel-Walfisz theorem suggests that it is reasonable to expect error terms that have “strongly logarithmic savings” in the sense that they gain a factor of {O_A(\log^{-A} X)} over the trivial bound for any {A>0}; for {d_k}, the Dirichlet hyperbola method suggests instead that one has “power savings” in that one should gain a factor of {X^{-c_k}} over the trivial bound for some {c_k>0}. In the case of the Möbius function {\mu}, there is an additional trick (introduced by Matomäki and Teräväinen) that allows one to lower the exponent {\theta} somewhat at the cost of only obtaining “weakly logarithmic savings” of shape {\log^{-c} X} for some small {c>0}.

Our main estimates on sums of the form (1) work in the following ranges:

  • For {\theta=5/8}, one can obtain strongly logarithmic savings on (1) for {f=\mu,\Lambda}, and power savings for {f=d_k}.
  • For {\theta=3/5}, one can obtain weakly logarithmic savings for {f = \mu, d_k}.
  • For {\theta=5/9}, one can obtain power savings for {f=d_3}.
  • For {\theta=1/3}, one can obtain power savings for {f=d_2}.

Conjecturally, one should be able to obtain power savings in all cases, and lower {\theta} down to zero, but the ranges of exponents and savings given here seem to be the limit of current methods unless one assumes additional hypotheses, such as GRH. The {\theta=5/8} result for correlation against Fourier phases {e(\alpha n)} was established previously by Zhan, and the {\theta=3/5} result for such phases and {f=\mu} was established previously by Matomäki and Teräväinen.

By combining these results with tools from additive combinatorics, one can obtain a number of applications:

  • Direct insertion of our bounds in the recent work of Kanigowski, Lemańczyk, and Radziwiłł on the prime number theorem on dynamical systems that are analytic skew products gives some improvements in the exponents there.
  • We can obtain a “short interval” version of a multiple ergodic theorem along primes established by Frantzikinakis-Host-Kra and Wooley-Ziegler, in which we average over intervals of the form {(X,X+H]} rather than {[1,X]}.
  • We can obtain a “short interval” version of the “linear equations in primes” asymptotics obtained by Ben Green, Tamar Ziegler, and myself in this sequence of papers, where the variables in these equations lie in short intervals {(X,X+H]} rather than long intervals such as {[1,X]}.

We now briefly discuss some of the ingredients of proof of our main results. The first step is standard, using combinatorial decompositions (based on the Heath-Brown identity and (for the {\theta=3/5} result) the Ramaré identity) to decompose {\mu(n), \Lambda(n), d_k(n)} into more tractable sums of the following types:

  • Type {I} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large;
  • Type {II} sums, which are basically of the form {\sum_{A_- \leq m \leq A_+:m|n} \alpha(m)\beta(n/m)} for some weights {\alpha(m)}, {\beta(n)} of controlled size and some cutoffs {A_-, A_+} that are not too close to {1} or to {X};
  • Type {I_2} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m) d_2(n/m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large.

The precise ranges of the cutoffs {A, A_-, A_+} depend on the choice of {\theta}; our methods fail once these cutoffs pass a certain threshold, and this is the reason for the exponents {\theta} being what they are in our main results.

The Type {I} sums involving nilsequences can be treated by methods similar to those in this previous paper of Ben Green and myself; the main innovations are in the treatment of the Type {II} and Type {I_2} sums.

For the Type {II} sums, one can split into the “abelian” case in which (after some Fourier decomposition) the nilsequence {F(g(n)\Gamma)} is basically of the form {e(P(n))}, and the “non-abelian” case in which {G} is non-abelian and {F} exhibits non-trivial oscillation in a central direction. In the abelian case we can adapt arguments of Matomäki and Shao, which use Cauchy-Schwarz and the equidistribution properties of polynomials to obtain good bounds unless {e(P(n))} is “major arc” in the sense that it resembles (or “pretends to be”) {\chi(n) n^{it}} for some Dirichlet character {\chi} and some frequency {t}, but in this case one can use classical multiplicative methods to control the correlation. It turns out that the non-abelian case can be treated similarly. After applying Cauchy-Schwarz, one ends up analyzing the equidistribution of the four-variable polynomial sequence

\displaystyle  (n,m,n',m') \mapsto (g(nm)\Gamma, g(n'm)\Gamma, g(nm') \Gamma, g(n'm') \Gamma)

as {n,m,n',m'} range in various dyadic intervals. Using the known multidimensional equidistribution theory of polynomial maps in nilmanifolds, one can eventually show in the non-abelian case that this sequence either has enough equidistribution to give cancellation, or else the nilsequence involved can be replaced with one from a lower dimensional nilmanifold, in which case one can apply an induction hypothesis.

For the type {I_2} sum, a model sum to study is

\displaystyle  \sum_{X < n \leq X+H} d_2(n) e(\alpha n)

which one can expand as

\displaystyle  \sum_{n,m: X < nm \leq X+H} e(\alpha nm).

We experimented with a number of ways to treat this type of sum (including automorphic form methods, or methods based on the Voronoi formula or van der Corput’s inequality), but somewhat to our surprise, the most efficient approach was an elementary one, in which one uses the Dirichlet approximation theorem to decompose the hyperbolic region {\{ (n,m) \in {\bf N}^2: X < nm \leq X+H \}} into a number of arithmetic progressions, and then uses equidistribution theory to establish cancellation of sequences such as {e(\alpha nm)} on the majority of these progressions. As it turns out, this strategy works well in the regime {H > X^{1/3+\varepsilon}} unless the nilsequence involved is “major arc”, but the latter case is treatable by existing methods as discussed previously; this is why the {\theta} exponent for our {d_2} result can be as low as {1/3}.

In a sequel to this paper (currently in preparation), we will obtain analogous results for almost all intervals {(x,x+H]} with {x} in the range {[X,2X]}, in which we will be able to lower {\theta} all the way to {0}.

Joni Teräväinen and I have just uploaded to the arXiv our preprint “The Hardy–Littlewood–Chowla conjecture in the presence of a Siegel zero“. This paper is a development of the theme that certain conjectures in analytic number theory become easier if one makes the hypothesis that Siegel zeroes exist; this places one in a presumably “illusory” universe, since the widely believed Generalised Riemann Hypothesis (GRH) precludes the existence of such zeroes, yet this illusory universe seems remarkably self-consistent and notoriously impossible to eliminate from one’s analysis.

For the purposes of this paper, a Siegel zero is a zero {\beta} of a Dirichlet {L}-function {L(\cdot,\chi)} corresponding to a primitive quadratic character {\chi} of some conductor {q_\chi}, which is close to {1} in the sense that

\displaystyle  \beta = 1 - \frac{1}{\eta \log q_\chi}

for some large {\eta \gg 1}, which we will call the quality of the Siegel zero. The significance of these zeroes is that they force the Möbius function {\mu} and the Liouville function {\lambda} to “pretend” to be like the exceptional character {\chi} for primes of magnitude comparable to {q_\chi}. Indeed, if we define an exceptional prime to be a prime {p^*} for which {\chi(p^*) \neq -1}, then very few primes near {q_\chi} will be exceptional; in our paper we use some elementary arguments to establish the bounds

\displaystyle  \sum_{q_\chi^{1/2+\varepsilon} < p^* \leq x} \frac{1}{p^*} \ll_\varepsilon \frac{\log x}{\eta \log q_\chi} \ \ \ \ \ (1)

for any {x \geq q_\chi^{1/2+\varepsilon}} and {\varepsilon>0}, where the sum is over exceptional primes in the indicated range {q_\chi^{1/2+\varepsilon} < p^* \leq x}; this bound is non-trivial for {x} as large as {q_\chi^{\eta^{1-\varepsilon}}}. (See Section 1 of this blog post for some variants of this argument, which were inspired by work of Heath-Brown.) There is also a companion bound (somewhat weaker) that covers a range of {p^*} a little bit below {q_\chi^{1/2}}.

One of the early influential results in this area was the following result of Heath-Brown, which I previously blogged about here:

Theorem 1 (Hardy-Littlewood assuming Siegel zero) Let {h} be a fixed natural number. Suppose one has a Siegel zero {\beta} associated to some conductor {q_\chi}. Then we have

\displaystyle  \sum_{n \leq x} \Lambda(n) \Lambda(n+h) = ({\mathfrak S} + O( \frac{1}{\log\log \eta} )) x

for all {q_\chi^{250} \leq x \leq q_\chi^{300}}, where {\Lambda} is the von Mangoldt function and {{\mathfrak S}} is the singular series

\displaystyle  {\mathfrak S} = \prod_{p|h} \frac{p}{p-1} \prod_{p \nmid h} (1 - \frac{1}{(p-1)^2})

In particular, Heath-Brown showed that if there are infinitely many Siegel zeroes, then there are also infinitely many twin primes, with the correct asymptotic predicted by the Hardy-Littlewood prime tuple conjecture at infinitely many scales.

Very recently, Chinis established an analogous result for the Chowla conjecture (building upon earlier work of Germán and Katai):

Theorem 2 (Chowla assuming Siegel zero) Let {h_1,\dots,h_\ell} be distinct fixed natural numbers. Suppose one has a Siegel zero {\beta} associated to some conductor {q_\chi}. Then one has

\displaystyle  \sum_{n \leq x} \lambda(n+h_1) \dots \lambda(n+h_\ell) \ll \frac{x}{(\log\log \eta)^{1/2} (\log \eta)^{1/12}}

in the range {q_\chi^{10} \leq x \leq q_\chi^{\log\log \eta / 3}}, where {\lambda} is the Liouville function.

In our paper we unify these results and also improve the quantitative estimates and range of {x}:

Theorem 3 (Hardy-Littlewood-Chowla assuming Siegel zero) Let {h_1,\dots,h_k,h'_1,\dots,h'_\ell} be distinct fixed natural numbers with {k \leq 2}. Suppose one has a Siegel zero {\beta} associated to some conductor {q_\chi}. Then one has

\displaystyle  \sum_{n \leq x} \Lambda(n+h_1) \dots \Lambda(n+h_k) \lambda(n+h'_1) \dots \lambda(n+h'_\ell)

\displaystyle = ({\mathfrak S} + O_\varepsilon( \frac{1}{\log^{1/10\max(1,k)} \eta} )) x

for

\displaystyle  q_\chi^{10k+\frac{1}{2}+\varepsilon} \leq x \leq q_\chi^{\eta^{1/2}}

for any fixed {\varepsilon>0}.

Our argument proceeds by a series of steps in which we replace {\Lambda} and {\lambda} by more complicated looking, but also more tractable, approximations, until the correlation is one that can be computed in a tedious but straightforward fashion by known techniques. More precisely, the steps are as follows:

  • (i) Replace the Liouville function {\lambda} with an approximant {\lambda_{\mathrm{Siegel}}}, which is a completely multiplicative function that agrees with {\lambda} at small primes and agrees with {\chi} at large primes.
  • (ii) Replace the von Mangoldt function {\Lambda} with an approximant {\Lambda_{\mathrm{Siegel}}}, which is the Dirichlet convolution {\chi * \log} multiplied by a Selberg sieve weight {\nu} to essentially restrict that convolution to almost primes.
  • (iii) Replace {\lambda_{\mathrm{Siegel}}} with a more complicated truncation {\lambda_{\mathrm{Siegel}}^\sharp} which has the structure of a “Type I sum”, and which agrees with {\lambda_{\mathrm{Siegel}}} on numbers that have a “typical” factorization.
  • (iv) Replace the approximant {\Lambda_{\mathrm{Siegel}}} with a more complicated approximant {\Lambda_{\mathrm{Siegel}}^\sharp} which has the structure of a “Type I sum”.
  • (v) Now that all terms in the correlation have been replaced with tractable Type I sums, use standard Euler product calculations and Fourier analysis, similar in spirit to the proof of the pseudorandomness of the Selberg sieve majorant for the primes in this paper of Ben Green and myself, to evaluate the correlation to high accuracy.

Steps (i), (ii) proceed mainly through estimates such as (1) and standard sieve theory bounds. Step (iii) is based primarily on estimates on the number of smooth numbers of a certain size.

The restriction {k \leq 2} in our main theorem is needed only to execute step (iv) of this procedure. Roughly speaking, the Siegel approximant {\Lambda_{\mathrm{Siegel}}} to {\Lambda} is a twisted, sieved version of the divisor function {\tau}, and the types of correlation one is faced with at the start of step (iv) are a more complicated version of the divisor correlation sum

\displaystyle  \sum_{n \leq x} \tau(n+h_1) \dots \tau(n+h_k).

For {k=1} this sum can be easily controlled by the Dirichlet hyperbola method. For {k=2} one needs the fact that {\tau} has a level of distribution greater than {1/2}; in fact Kloosterman sum bounds give a level of distribution of {2/3}, a folklore fact that seems to have first been observed by Linnik and Selberg. We use a (notationally more complicated) version of this argument to treat the sums arising in (iv) for {k \leq 2}. Unfortunately for {k > 2} there are no known techniques to unconditionally obtain asymptotics, even for the model sum

\displaystyle  \sum_{n \leq x} \tau(n) \tau(n+1) \tau(n+2),

although we do at least have fairly convincing conjectures as to what the asymptotics should be. Because of this, it seems unlikely that one will be able to relax the {k \leq 2} hypothesis in our main theorem at our current level of understanding of analytic number theory.

Step (v) is a tedious but straightforward sieve theoretic computation, similar in many ways to the correlation estimates of Goldston and Yildirim used in their work on small gaps between primes (as discussed for instance here), and then also used by Ben Green and myself to locate arithmetic progressions in primes.

Joni Teräväinen and myself have just uploaded to the arXiv our preprint “Quantitative bounds for Gowers uniformity of the Möbius and von Mangoldt functions“. This paper makes quantitative the Gowers uniformity estimates on the Möbius function {\mu} and the von Mangoldt function {\Lambda}.

To discuss the results we first discuss the situation of the Möbius function, which is technically simpler in some (though not all) ways. We assume familiarity with Gowers norms and standard notations around these norms, such as the averaging notation {\mathop{\bf E}_{n \in [N]}} and the exponential notation {e(\theta) = e^{2\pi i \theta}}. The prime number theorem in qualitative form asserts that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) = o(1)

as {N \rightarrow \infty}. With Vinogradov-Korobov error term, the prime number theorem is strengthened to

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} );

we refer to such decay bounds (with {\exp(-c\log^c N)} type factors) as pseudopolynomial decay. Equivalently, we obtain pseudopolynomial decay of the Gowers {U^1} seminorm of {\mu}:

\displaystyle  \| \mu \|_{U^1([N])} \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} ).

As is well known, the Riemann hypothesis would be equivalent to an upgrade of this estimate to polynomial decay of the form

\displaystyle  \| \mu \|_{U^1([N])} \ll_\varepsilon N^{-1/2+\varepsilon}

for any {\varepsilon>0}.

Once one restricts to arithmetic progressions, the situation gets worse: the Siegel-Walfisz theorem gives the bound

\displaystyle  \| \mu 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll_A \log^{-A} N \ \ \ \ \ (1)

for any residue class {a \hbox{ mod } q} and any {A>0}, but with the catch that the implied constant is ineffective in {A}. This ineffectivity cannot be removed without further progress on the notorious Siegel zero problem.

In 1937, Davenport was able to show the discorrelation estimate

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) e(-\alpha n) \ll_A \log^{-A} N

for any {A>0} uniformly in {\alpha \in {\bf R}}, which leads (by standard Fourier arguments) to the Fourier uniformity estimate

\displaystyle  \| \mu \|_{U^2([N])} \ll_A \log^{-A} N.

Again, the implied constant is ineffective. If one insists on effective constants, the best bound currently available is

\displaystyle  \| \mu \|_{U^2([N])} \ll \log^{-c} N \ \ \ \ \ (2)

for some small effective constant {c>0}.
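
For the {U^2} norm there is a clean way to experiment numerically, since over a cyclic group {{\bf Z}/N{\bf Z}} (with the usual normalisations) the {U^2} norm of a function equals the {\ell^4} norm of its Fourier transform. The following sketch is my own, uses the cyclic group only as a rough proxy for the interval norm {U^2([N])}, and assumes numpy; it compares {\mu} with a random shuffle of its values:

```python
import numpy as np

def mobius_sieve(N):
    mu = np.ones(N + 1, dtype=np.int64)
    is_comp = np.zeros(N + 1, dtype=bool)
    for p in range(2, N + 1):
        if not is_comp[p]:               # p is prime
            is_comp[p::p] = True
            mu[p::p] *= -1
            mu[p * p::p * p] = 0
    return mu

N = 2**16
f = mobius_sieve(N)[1:].astype(float)    # mu(1), ..., mu(N), viewed on Z/NZ
fhat = np.fft.fft(f) / N                 # \hat f(xi) = E_n f(n) e(-n xi / N)
g = np.random.default_rng(0).permutation(f)   # same values, arithmetic structure destroyed
ghat = np.fft.fft(g) / N
print("U^2 norm of mu on Z/NZ:      ", (np.abs(fhat) ** 4).sum() ** 0.25)
print("U^2 norm of shuffled values: ", (np.abs(ghat) ** 4).sum() ** 0.25)
```

Both numbers come out small compared to the trivial bound of {1}; the theorems discussed here concern how the first quantity decays as {N \rightarrow \infty}, which of course cannot be read off from a single {N}.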

For the situation with the {U^3} norm the previously known results were much weaker. Ben Green and I showed that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \ll_{A,F,G/\Gamma} \log^{-A} N \ \ \ \ \ (3)

uniformly for any {A>0}, any degree two (filtered) nilmanifold {G/\Gamma}, any polynomial sequence {g: {\bf Z} \rightarrow G}, and any Lipschitz function {F}; again, the implied constants are ineffective. On the other hand, in a separate paper of Ben Green and myself, we established the following inverse theorem: if for instance we knew that

\displaystyle  \| \mu \|_{U^3([N])} \geq \delta

for some {0 < \delta < 1/2}, then there exists a degree two nilmanifold {G/\Gamma} of dimension {O( \delta^{-O(1)} )}, complexity {O( \delta^{-O(1)} )}, a polynomial sequence {g: {\bf Z} \rightarrow G}, and Lipschitz function {F} of Lipschitz constant {O(\delta^{-O(1)})} such that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \gg \exp(-\delta^{-O(1)}).

Putting the two assertions together and comparing all the dependencies on parameters, one can establish the qualitative decay bound

\displaystyle  \| \mu \|_{U^3([N])} = o(1).

However the decay rate {o(1)} produced by this argument is completely ineffective: obtaining a bound on when this {o(1)} quantity dips below a given threshold {\delta} depends on the implied constant in (3) for some {G/\Gamma} whose dimension depends on {\delta}, and the dependence on {\delta} obtained in this fashion is ineffective in the face of a Siegel zero.

For higher norms {U^k, k \geq 3}, the situation is even worse, because the quantitative inverse theory for these norms is poorer, and indeed it was only with the recent work of Manners that any such bound is available at all (at least for {k>4}). Basically, Manners establishes that if

\displaystyle  \| \mu \|_{U^k([N])} \geq \delta

then there exists a degree {k-1} nilmanifold {G/\Gamma} of dimension {O( \delta^{-O(1)} )}, complexity {O( \exp\exp(\delta^{-O(1)}) )}, a polynomial sequence {g: {\bf Z} \rightarrow G}, and Lipschitz function {F} of Lipschitz constant {O(\exp\exp(\delta^{-O(1)}))} such that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \gg \exp\exp(-\delta^{-O(1)}).

(We allow all implied constants to depend on {k}.) Meanwhile, the bound (3) was extended to arbitrary nilmanifolds by Ben and myself. Again, the two results when concatenated give the qualitative decay

\displaystyle  \| \mu \|_{U^k([N])} = o(1)

but the decay rate is completely ineffective.

Our first result gives an effective decay bound:

Theorem 1 For any {k \geq 2}, we have {\| \mu \|_{U^k([N])} \ll (\log\log N)^{-c_k}} for some {c_k>0}. The implied constants are effective.

This is off by a logarithm from the best effective bound (2) in the {k=2} case. In the {k=3} case there is some hope to remove this logarithm based on the improved quantitative inverse theory currently available in this case, but there is a technical obstruction to doing so which we will discuss later in this post. For {k>3} the above bound is the best one could hope to achieve purely using the quantitative inverse theory of Manners.

We have analogues of all the above results for the von Mangoldt function {\Lambda}. Here a complication arises that {\Lambda} does not have mean close to zero, and one has to subtract off some suitable approximant {\Lambda^\sharp} to {\Lambda} before one would expect good Gowers norms bounds. For the prime number theorem one can just use the approximant {1}, giving

\displaystyle  \| \Lambda - 1 \|_{U^1([N])} \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} )

but even for the prime number theorem in arithmetic progressions one needs a more accurate approximant. In our paper it is convenient to use the “Cramér approximant”

\displaystyle  \Lambda_{\hbox{Cram\'er}}(n) := \frac{W}{\phi(W)} 1_{(n,W)=1}

where

\displaystyle  W := \prod_{p<Q} p

and {Q} is the quasipolynomial quantity

\displaystyle  Q = \exp(\log^{1/10} N). \ \ \ \ \ (4)
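
For illustration, here is a toy computation with the Cramér approximant, using a tiny cutoff {Q} rather than the quasipolynomial one above; this is only a sketch of my own:

```python
from math import gcd, log

N, Q = 500000, 30                        # toy parameters (the actual Q is far larger)
primes_Q = [p for p in range(2, Q) if all(p % q for q in range(2, int(p**0.5) + 1))]
W = 1
for p in primes_Q:
    W *= p
W_over_phiW = 1.0
for p in primes_Q:
    W_over_phiW *= p / (p - 1)

spf = list(range(N + 1))                 # smallest prime factor sieve
for p in range(2, int(N ** 0.5) + 1):
    if spf[p] == p:
        for m in range(p * p, N + 1, p):
            if spf[m] == m:
                spf[m] = p

def von_mangoldt(n):
    if n < 2:
        return 0.0
    p = spf[n]
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0     # log p on prime powers p^k, zero otherwise

lam = [von_mangoldt(n) for n in range(N + 1)]
cramer = [W_over_phiW if gcd(n, W) == 1 else 0.0 for n in range(N + 1)]

print("mean of Lambda - Lambda_Cramer on [1,N]:",
      sum(lam[n] - cramer[n] for n in range(1, N + 1)) / N)
X, H = 300000, 20000
print("sums over (X, X+H]:",
      sum(lam[X + 1:X + H + 1]), sum(cramer[X + 1:X + H + 1]))
```

At this scale the two functions already have very similar averages over both long and short ranges; the theorems here measure the much finer cancellation captured by the Gowers norms.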

Then one can show from the Siegel-Walfisz theorem and standard bilinear sum methods that

\displaystyle  \mathop{\bf E}_{n \in [N]} (\Lambda - \Lambda_{\hbox{Cram\'er}}(n)) e(-\alpha n) \ll_A \log^{-A} N

and

\displaystyle  \| \Lambda - \Lambda_{\hbox{Cram\'er}}\|_{U^2([N])} \ll_A \log^{-A} N

for all {A>0} and {\alpha \in {\bf R}} (with an ineffective dependence on {A}), again regaining effectivity if {A} is replaced by a sufficiently small constant {c>0}. All the previously stated discorrelation and Gowers uniformity results for {\mu} then have analogues for {\Lambda}, and our main result is similarly analogous:

Theorem 2 For any {k \geq 2}, we have {\| \Lambda - \Lambda_{\hbox{Cram\'er}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}} for some {c_k>0}. The implied constants are effective.

By standard methods, this result also gives quantitative asymptotics for counting solutions to various systems of linear equations in primes, with error terms that gain a factor of {O((\log\log N)^{-c})} with respect to the main term.

We now discuss the methods of proof, focusing first on the case of the Möbius function. Suppose first that there is no “Siegel zero”, by which we mean a quadratic character {\chi} of some conductor {q \leq Q} with a zero {L(\beta,\chi)} with {1 - \beta \leq \frac{c}{\log Q}} for some small absolute constant {c>0}. In this case the Siegel-Walfisz bound (1) improves to a quasipolynomial bound

\displaystyle  \| \mu 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N). \ \ \ \ \ (5)

To establish Theorem 1 in this case, it suffices by Manners’ inverse theorem to establish the quasipolynomial bound

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N) \ \ \ \ \ (6)

for all degree {k-1} nilmanifolds {G/\Gamma} of dimension {O((\log\log N)^c)} and complexity {O( \exp(\log^c N))}, all polynomial sequences {g}, and all Lipschitz functions {F} of norm {O( \exp(\log^c N))}. If the nilmanifold {G/\Gamma} had bounded dimension, then one could repeat the arguments of Ben and myself more or less verbatim to establish this claim from (5), which relied on the quantitative equidistribution theory on nilmanifolds developed in a separate paper of Ben and myself. Unfortunately, in the latter paper the dependence of the quantitative bounds on the dimension {d} was not explicitly given. In an appendix to the current paper, we go through that paper to account for this dependence, showing that all exponents depend at most doubly exponentially in the dimension {d}, which is barely sufficient to handle the dimension of {O((\log\log N)^c)} that arises here.

Now suppose we have a Siegel zero {L(\beta,\chi)}. In this case the bound (5) will not hold in general, and hence also (6) will not hold either. Here, the usual way out (while still maintaining effective estimates) is to approximate {\mu} not by {0}, but rather by a more complicated approximant {\mu_{\hbox{Siegel}}} that takes the Siegel zero into account, and in particular is such that one has the (effective) pseudopolynomial bound

\displaystyle  \| (\mu - \mu_{\hbox{Siegel}}) 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N) \ \ \ \ \ (7)

for all residue classes {a \hbox{ mod } q}. The Siegel approximant to {\mu} is actually a little bit complicated, and to our knowledge this sort of approximant first appears as late as this 2010 paper of Germán and Katai. Our version of this approximant is defined as the multiplicative function such that

\displaystyle \mu_{\hbox{Siegel}}(p^j) = \mu(p^j)

when {p < Q}, and

\displaystyle  \mu_{\hbox{Siegel}}(n) = \alpha n^{\beta-1} \chi(n)

when {n} is coprime to all primes {p<Q}, and {\alpha} is a normalising constant given by the formula

\displaystyle  \alpha := \frac{1}{L'(\beta,\chi)} \prod_{p<Q} (1-\frac{1}{p})^{-1} (1 - \frac{\chi(p)}{p^\beta})^{-1}

(this constant ends up being of size {O(1)} and plays only a minor role in the analysis). This is a rather complicated formula, but it seems to be virtually the only choice of approximant that allows for bounds such as (7) to hold. (This is the one aspect of the problem where the von Mangoldt theory is simpler than the Möbius theory: in the former case one only needs to work with very rough numbers, for which no special accommodation at small primes is needed when introducing the Siegel correction term.) With this starting point it is then possible to repeat the analysis of my previous papers with Ben and obtain the quasipolynomial discorrelation bound

\displaystyle  \mathop{\bf E}_{n \in [N]} (\mu - \mu_{\hbox{Siegel}})(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N)

for {F(g(n)\Gamma)} as before, which when combined with Manners’ inverse theorem gives the doubly logarithmic bound

\displaystyle \| \mu - \mu_{\hbox{Siegel}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}.

Meanwhile, a direct sieve-theoretic computation ends up giving the singly logarithmic bound

\displaystyle \| \mu_{\hbox{Siegel}} \|_{U^k([N])} \ll \log^{-c_k} N

(indeed, there is a good chance that one could improve the bounds even further, though doing so would not help the current argument). Theorem 1 then follows from the triangle inequality for the Gowers norm. It is interesting that the Siegel approximant {\mu_{\hbox{Siegel}}} seems to play a rather essential role in the proof, even though it is absent from the final statement. We note that this approximant seems to be a useful tool for exploring the “illusory world” of the Siegel zero further; see for instance the recent paper of Chinis for some work in this direction.
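
To make the shape of the approximant concrete, here is a small Python sketch of {\mu_{\hbox{Siegel}}}, following one natural reading of the definition above: split {n} into the part {d} built from primes below {Q} and the remaining {Q}-rough part {r}, and multiply {\mu(d)} by {\alpha r^{\beta-1} \chi(r)} (with the convention that a trivial rough part contributes {1}). The quantities {\beta}, {\chi}, {\alpha}, {Q} are external inputs coming from the hypothetical Siegel zero; the toy values at the end (the quadratic character mod {5}, {\beta = 0.999}, {\alpha = 1}) are made up purely for illustration and do not correspond to an actual Siegel zero.

```python
from sympy import factorint

def mu_siegel(n, Q, beta, chi, alpha):
    # Split n = d * r with d built from primes < Q and r coprime to all
    # primes < Q, and (following the definition above) return
    #   mu(d) * alpha * r^(beta - 1) * chi(r),
    # with the convention that the rough factor is 1 when r = 1.
    mu_d, r = 1, 1
    for p, e in factorint(n).items():
        if p < Q:
            mu_d = 0 if e >= 2 else -mu_d   # accumulates mu(d)
        else:
            r *= p ** e
    return mu_d if r == 1 else mu_d * alpha * r ** (beta - 1) * chi(r)

# Toy parameters only (not a genuine Siegel zero): beta close to 1,
# chi the quadratic character mod 5, alpha normalised to 1.
chi5 = lambda n: 0 if n % 5 == 0 else (1 if n % 5 in (1, 4) else -1)
print([round(mu_siegel(n, Q=10, beta=0.999, chi=chi5, alpha=1.0), 3) for n in range(1, 25)])
```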

For the analogous problem with the von Mangoldt function (assuming a Siegel zero for the sake of discussion), the approximant {\Lambda_{\hbox{Siegel}}} is simpler; we ended up using

\displaystyle \Lambda_{\hbox{Siegel}}(n) = \Lambda_{\hbox{Cram\'er}}(n) (1 - n^{\beta-1} \chi(n))

which allows one to state the standard prime number theorem in arithmetic progressions with classical error term and Siegel zero term compactly as

\displaystyle  \| (\Lambda - \Lambda_{\hbox{Siegel}}) 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N).
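
In the same illustrative spirit, here is a sketch of {\Lambda_{\hbox{Siegel}}}. The factor {(1 - n^{\beta-1}\chi(n))} is exactly the formula above; for {\Lambda_{\hbox{Cram\'er}}} we substitute a simple stand-in of the shape {\frac{W}{\varphi(W)} 1_{(n,W)=1}} with {W} the product of the primes up to a small cutoff {w}, which should be treated as an assumption made for the sketch rather than the precise approximant used in the paper.

```python
from math import prod
from sympy import primerange

def lambda_cramer(n, w=30):
    # Stand-in for the Cramer approximant: (W/phi(W)) 1_{(n,W)=1}, where W is
    # the product of the primes up to the cutoff w.  The choice of w and the
    # exact normalisation are assumptions made for this sketch.
    primes = list(primerange(2, w + 1))
    weight = prod(p / (p - 1) for p in primes)    # = W / phi(W)
    return weight if all(n % p for p in primes) else 0.0

def lambda_siegel(n, beta, chi, w=30):
    # Lambda_Siegel(n) = Lambda_Cramer(n) * (1 - n^(beta - 1) chi(n)),
    # as in the display above.
    return lambda_cramer(n, w) * (1 - n ** (beta - 1) * chi(n))

# Toy parameters as before: beta near 1 and the quadratic character mod 5.
chi5 = lambda n: 0 if n % 5 == 0 else (1 if n % 5 in (1, 4) else -1)
print([round(lambda_siegel(n, beta=0.999, chi=chi5), 3) for n in (31, 37, 41, 43, 47)])
```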

Routine modifications of previous arguments also give

\displaystyle  \mathop{\bf E}_{n \in [N]} (\Lambda - \Lambda_{\hbox{Siegel}})(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N) \ \ \ \ \ (8)

and

\displaystyle \| \Lambda_{\hbox{Siegel}} \|_{U^k([N])} \ll \log^{-c_k} N.

The one tricky new step is getting from the discorrelation estimate (8) to the Gowers uniformity estimate

\displaystyle \| \Lambda - \Lambda_{\hbox{Siegel}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}.

One cannot directly apply Manners’ inverse theorem here because {\Lambda} and {\Lambda_{\hbox{Siegel}}} are unbounded. There is a standard tool for getting around this issue, now known as the dense model theorem, which is the standard engine powering the transference principle from theorems about bounded functions to theorems about certain types of unbounded functions. However, the quantitative versions of the dense model theorem in the literature are expensive and would basically weaken the doubly logarithmic gain here to a triply logarithmic one. Instead, we bypass the dense model theorem and directly transfer the inverse theorem for bounded functions to an inverse theorem for unbounded functions by using the densification approach to transference introduced by Conlon, Fox, and Zhao. This technique turns out to be quantitatively quite efficient (the dependencies of the main parameters in the transference are polynomial in nature), and it also has the technical advantage of avoiding the somewhat tricky “correlation condition” present in earlier transference results, a condition which is in any case unhelpful for quantitative bounds.

In principle, the above results can be improved for {k=3} due to the stronger quantitative inverse theorems in the {U^3} setting. However, there is a bottleneck that prevents us from achieving this, namely that the equidistribution theory of two-step nilmanifolds has exponents which are exponential rather than polynomial in the dimension, and as a consequence we were unable to improve upon the doubly logarithmic results. Specifically, if one is given a sequence of bracket quadratics such as {\lfloor \alpha_1 n \rfloor \beta_1 n, \dots, \lfloor \alpha_d n \rfloor \beta_d n} that fails to be {\delta}-equidistributed, one would need to establish a nontrivial linear relationship modulo 1 between the {\alpha_1,\beta_1,\dots,\alpha_d,\beta_d} (up to errors of {O(1/N)}), where the coefficients are of size {O(\delta^{-d^{O(1)}})}; current methods only give coefficient bounds of the form {O(\delta^{-\exp(d^{O(1)})})}. An old result of Schmidt gives a proof of concept that this sort of polynomial dependence of the exponents on the dimension is possible in principle, but actually implementing Schmidt’s methods here seems to be quite a non-trivial task. There is also another possible route to removing a logarithm, which is to strengthen the inverse {U^3} theorem so that the dimension of the nilmanifold is logarithmic in the uniformity parameter {\delta} rather than polynomial. Again, the Freiman-Bilu theorem (see for instance this paper of Ben and myself) gives a proof of concept that such an improvement in dimension is possible, but some work would be needed to implement it.

Kaisa Matomäki, Maksym Radziwill, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Singmaster’s conjecture in the interior of Pascal’s triangle“. This paper leverages the theory of exponential sums over primes to make progress on a well known conjecture of Singmaster which asserts that any natural number larger than {1} appears at most a bounded number of times in Pascal’s triangle. That is to say, for any integer {t \geq 2}, there are at most {O(1)} solutions to the equation

\displaystyle  \binom{n}{m} = t \ \ \ \ \ (1)

with {1 \leq m < n}. Currently, the largest number of solutions that is known to be attainable is eight, with {t} equal to

\displaystyle  3003 = \binom{3003}{1} = \binom{78}{2} = \binom{15}{5} = \binom{14}{6} = \binom{14}{8} = \binom{15}{10}

\displaystyle = \binom{78}{76} = \binom{3003}{3002}.

Because of the symmetry {\binom{n}{m} = \binom{n}{n-m}} of Pascal’s triangle it is natural to restrict attention to the left half {1 \leq m \leq n/2} of the triangle.
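
Both the record example and the left-half reduction are easy to check by brute force: since {\binom{n}{1} = n}, any solution to (1) has {n \leq t}, so counting solutions for a given {t} is a finite computation. A short (unoptimised) Python sketch:

```python
from math import comb

def singmaster_count(t):
    # Count the solutions (n, m) with 1 <= m < n of C(n, m) = t, scanning only
    # the left half 1 <= m <= n/2 of each row and using the symmetry
    # C(n, m) = C(n, n - m) to count mirror solutions.  Since C(n, 1) = n,
    # any solution has n <= t, so the search is finite.
    count = 0
    for n in range(2, t + 1):
        for m in range(1, n // 2 + 1):
            c = comb(n, m)
            if c == t:
                count += 1 if 2 * m == n else 2   # a central entry is its own mirror
            if c >= t:
                break   # entries strictly increase towards the centre of the row
    return count

print(singmaster_count(3003))   # prints 8, matching the record example above
```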

Our main result settles this conjecture in the “interior” region of the triangle:

Theorem 1 (Singmaster’s conjecture in the interior of the triangle) If {0 < \varepsilon < 1} and {t} is sufficiently large depending on {\varepsilon}, there are at most two solutions to (1) in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/2 \ \ \ \ \ (2)

and hence at most four in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n - \exp( \log^{2/3+\varepsilon} n ).

Also, there is at most one solution in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/\exp(\log^{1-\varepsilon} n ).

To verify Singmaster’s conjecture in full, it thus suffices in view of this result to verify the conjecture in the boundary region

\displaystyle  2 \leq m < \exp(\log^{2/3+\varepsilon} n) \ \ \ \ \ (3)

(or equivalently {n - \exp(\log^{2/3+\varepsilon} n) < m \leq n}); we have excluded the {m=1} case as it of course automatically supplies exactly one solution to (1). It is in fact possible that for {t} sufficiently large there are no further collisions {\binom{n}{m} = \binom{n'}{m'}=t} for {(n,m), (n',m')} in the region (3), in which case there would never be more than eight solutions to (1) for sufficiently large {t}. This latter claim is known for bounded values of {m,m'} by work of Beukers, Shorey, and Tijdeman, the main tool being Siegel’s theorem on integral points.

The upper bound of two here for the number of solutions in the region (2) is best possible, due to the infinite family of solutions to the equation

\displaystyle  \binom{n+1}{m+1} = \binom{n}{m+2} \ \ \ \ \ (4)

coming from {n = F_{2j+2} F_{2j+3}-1}, {m = F_{2j} F_{2j+3}-1}, where {F_j} denotes the {j^{th}} Fibonacci number.
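
This family is easy to verify numerically; the script below checks the first few members (the case {j=1} recovers the collision {\binom{15}{5} = \binom{14}{6} = 3003} from the record example above).

```python
from math import comb

def fib(k):
    # Fibonacci numbers with F_0 = 0, F_1 = 1.
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

# The infinite family behind (4): with n = F_{2j+2} F_{2j+3} - 1 and
# m = F_{2j} F_{2j+3} - 1 one has C(n+1, m+1) = C(n, m+2).
for j in range(1, 5):
    n = fib(2 * j + 2) * fib(2 * j + 3) - 1
    m = fib(2 * j) * fib(2 * j + 3) - 1
    assert comb(n + 1, m + 1) == comb(n, m + 2)
    print(j, n, m)
```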

The appearance of the quantity {\exp( \log^{2/3+\varepsilon} n )} in Theorem 1 may be familiar to readers who are acquainted with Vinogradov’s bounds on exponential sums, which end up being the main new ingredient in our arguments. In principle this threshold could be lowered if we had stronger bounds on exponential sums.

To try to control solutions to (1) we use a combination of “Archimedean” and “non-Archimedean” approaches. In the “Archimedean” approach (following earlier work of Kane on this problem) we view {n,m} primarily as real numbers rather than integers, and express (1) in terms of the Gamma function as

\displaystyle  \frac{\Gamma(n+1)}{\Gamma(m+1) \Gamma(n-m+1)} = t.

One can use this equation to solve for {n} in terms of {m,t} as

\displaystyle  n = f_t(m)

for a certain real analytic function {f_t} whose asymptotics are easily computable (for instance one has the asymptotic {f_t(m) \asymp m t^{1/m}}). One can then view the problem as one of trying to control the number of lattice points on the graph {\{ (m,f_t(m)): m \in {\bf R} \}}. Here we can take advantage of the fact that in the regime {m \leq f_t(m)/2} (which corresponds to working in the left half {m \leq n/2} of Pascal’s triangle), the function {f_t} can be shown to be convex, but not too convex, in the sense that one has both upper and lower bounds on the second derivative of {f_t} (in fact one can show that {f''_t(m) \asymp f_t(m) (\log t/m^2)^2}). This can be used to preclude the possibility of having a cluster of three or more nearby lattice points on the graph {\{ (m,f_t(m)): m \in {\bf R} \}}, basically because the area subtended by the triangle connecting three of these points would lie strictly between {0} and {1/2}, contradicting Pick’s theorem. Developing these ideas, we were able to show

Proposition 2 Let {\varepsilon>0}, and suppose {t} is sufficiently large depending on {\varepsilon}. If {(m,n)} is a solution to (1) in the left half {m \leq n/2} of Pascal’s triangle, then there is at most one other solution {(m',n')} to this equation in the left half with

\displaystyle  |m-m'| + |n-n'| \ll \exp( (\log\log t)^{1-\varepsilon} ).

Again, the example of (4) shows that a cluster of two solutions is certainly possible; the convexity argument only kicks in once one has a cluster of three or more solutions.
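
Both Archimedean ingredients, the function {f_t} and the lattice-triangle area bound, can be sampled numerically. The sketch below computes {f_t(m)} by bisection using the log-Gamma function, crudely checks the convexity claim via a second difference near the collision {\binom{15}{5} = \binom{14}{6} = 3003}, and verifies over a small box of lattice points that a non-degenerate lattice triangle always has area at least {1/2} (the Pick-type fact used above); the particular sample points are arbitrary.

```python
from math import lgamma, log
from itertools import combinations

def f_t(m, t):
    # Solve Gamma(n+1) / (Gamma(m+1) Gamma(n-m+1)) = t for n by bisection,
    # working with log-Gamma; log C(n, m) is increasing in n >= m, and
    # C(m, m) = 1 <= t while C(m t, m) >= t, so the root lies in [m, m t].
    logC = lambda n: lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)
    lo, hi = float(m), float(m) * t
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if logC(mid) < log(t) else (lo, mid)
    return (lo + hi) / 2

t = 3003.0
print(round(f_t(5, t), 4), round(f_t(6, t), 4))          # about 15 and 14
print(f_t(4.9, t) - 2 * f_t(5.0, t) + f_t(5.1, t) > 0)   # crude check that f_t'' > 0 here

def doubled_area(p, q, r):
    # Twice the area of the triangle pqr (shoelace determinant); this is an
    # integer for lattice points, so a non-degenerate lattice triangle has
    # area at least 1/2.
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

pts = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
print(min(a for a in (doubled_area(*T) for T in combinations(pts, 3)) if a > 0))   # 1, i.e. area 1/2
```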

To finish the proof of Theorem 1, one has to show that any two solutions {(m,n), (m',n')} to (1) in the region of interest must be close enough for the above proposition to apply. Here we switch to the “non-Archimedean” approach, in which we look at the {p}-adic valuations {\nu_p( \binom{n}{m} )} of the binomial coefficients, defined as the number of times a prime {p} divides {\binom{n}{m}}. From the fundamental theorem of arithmetic, a collision

\displaystyle  \binom{n}{m} = \binom{n'}{m'}

between binomial coefficients occurs if and only if one has agreement of valuations

\displaystyle  \nu_p( \binom{n}{m} ) = \nu_p( \binom{n'}{m'} ). \ \ \ \ \ (5)

From the Legendre formula

\displaystyle  \nu_p(n!) = \sum_{j=1}^\infty \lfloor \frac{n}{p^j} \rfloor

we can rewrite this latter identity (5) as

\displaystyle  \sum_{j=1}^\infty \{ \frac{m}{p^j} \} + \{ \frac{n-m}{p^j} \} - \{ \frac{n}{p^j} \} = \sum_{j=1}^\infty \{ \frac{m'}{p^j} \} + \{ \frac{n'-m'}{p^j} \} - \{ \frac{n'}{p^j} \}, \ \ \ \ \ (6)

where {\{x\} := x - \lfloor x\rfloor} denotes the fractional part of {x}. (These sums are not truly infinite, because the summands vanish once {p^j} is larger than {\max(n,n')}.)
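
For instance, for the collision {\binom{15}{5} = \binom{14}{6} = 3003} the agreement of valuations (5) can be confirmed directly from Legendre's formula (only the primes {p \leq 15} contribute):

```python
def nu_p_binom(n, m, p):
    # p-adic valuation of C(n, m) via Legendre's formula
    #   nu_p(x!) = sum_{j >= 1} floor(x / p^j).
    def nu_factorial(x):
        s, q = 0, p
        while q <= x:
            s += x // q
            q *= p
        return s
    return nu_factorial(n) - nu_factorial(m) - nu_factorial(n - m)

# Agreement of valuations (5) for the collision C(15, 5) = C(14, 6) = 3003.
for p in (2, 3, 5, 7, 11, 13):
    assert nu_p_binom(15, 5, p) == nu_p_binom(14, 6, p)
    print(p, nu_p_binom(15, 5, p))
```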

A key idea in our approach is to view this condition (6) statistically, for instance by viewing {p} as a prime drawn randomly from an interval such as {[P, P + P \log^{-100} P]} for some suitably chosen scale parameter {P}, so that the two sides of (6) now become random variables. It then becomes advantageous to compare correlations between these two random variables and some additional test random variable. For instance, if {n} and {n'} are far apart from each other, then one would expect the left-hand side of (6) to have a higher correlation with the fractional part {\{ \frac{n}{p}\}}, since this term shows up in the summation on the left-hand side but not the right. Similarly if {m} and {m'} are far apart from each other (although there are some annoying cases one has to treat separately when there is some “unexpected commensurability”, for instance if {n'-m'} is a rational multiple of {m} where the rational has bounded numerator and denominator). In order to execute this strategy, it turns out (after some standard Fourier expansion) that one needs to get good control on exponential sums such as

\displaystyle  \sum_{P \leq p \leq P + P\log^{-100} P} e( \frac{N}{p} + \frac{M}{p^j} )

for various choices of parameters {P, N, M, j}, where {e(\theta) := e^{2\pi i \theta}}. Fortunately, the methods of Vinogradov (which more generally can handle sums such as {\sum_{n \in I} e(f(n))} and {\sum_{p \in I} e(f(p))} for various analytic functions {f}) can give useful bounds on such sums as long as {N} and {M} are not too large compared to {P}; more specifically, Vinogradov’s estimates are non-trivial in the regime {N,M \ll \exp( \log^{3/2-\varepsilon} P )}, and this ultimately leads to a distance bound

\displaystyle  m' - m \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )

between any colliding pair {(n,m), (n',m')} in the left half of Pascal’s triangle, as well as the variant bound

\displaystyle  n' - n \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )

under the additional assumption

\displaystyle  m', m \geq \exp( \log^{2/3 +\varepsilon}(n+n') ).

Comparing these bounds with Proposition 2 and using some basic estimates about the function {f_t}, we can conclude Theorem 1.
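
For the curious, the exponential sums in question are easy to sample numerically, although such experiments of course say nothing about the actual bounds. The interval length {P \log^{-100} P} from the argument is far too short to contain any primes at computationally accessible scales, so the sketch below substitutes the much longer length {P/\log^2 P}; the parameters {P, N, M, j} are otherwise arbitrary (the exponent {j} is renamed k in the code to avoid clashing with Python’s complex literal), and the output is the magnitude of the sum together with the trivial bound (the number of primes in the interval).

```python
from cmath import exp, pi
from math import log
from sympy import primerange

def e(x):
    # e(x) = exp(2 pi i x)
    return exp(2 * pi * 1j * x)

def prime_exp_sum(P, H, N, M, k):
    # sum_{P <= p <= P + H} e(N/p + M/p^k) over primes p, together with the
    # number of primes in the interval (the trivial bound for the sum).
    primes = list(primerange(P, P + H + 1))
    return sum(e(N / p + M / p ** k) for p in primes), len(primes)

P = 10 ** 6
H = int(P / log(P) ** 2)          # much longer than P / log^100 P, for visibility
S, trivial = prime_exp_sum(P, H, N=10 ** 10, M=10 ** 7, k=2)
print(abs(S), trivial)
```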

A modification of the arguments also gives similar results for the equation

\displaystyle  (n)_m = t \ \ \ \ \ (7)

where {(n)_m := n (n-1) \dots (n-m+1)} is the falling factorial:

Theorem 3 If {0 < \varepsilon < 1} and {t} is sufficiently large depending on {\varepsilon}, there are at most two solutions to (7) in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m < n. \ \ \ \ \ (8)

Again the upper bound of two is best possible, thanks to identities such as

\displaystyle  (a^2-a)_{a^2-2a} = (a^2-a-1)_{a^2-2a+1}.
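
This identity is elementary to verify directly (both falling factorials equal {(a^2-a-1)!/(a-2)!}, using {a^2 - a = a(a-1)}), and is also easy to confirm numerically for small {a}:

```python
def falling(n, m):
    # Falling factorial (n)_m = n (n-1) ... (n - m + 1).
    out = 1
    for i in range(m):
        out *= n - i
    return out

# The identity behind the two-solution examples for (7):
#   (a^2 - a)_{a^2 - 2a} = (a^2 - a - 1)_{a^2 - 2a + 1}.
for a in range(3, 9):
    lhs = falling(a * a - a, a * a - 2 * a)
    rhs = falling(a * a - a - 1, a * a - 2 * a + 1)
    assert lhs == rhs
    print(a, lhs == rhs)
```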
