
Tamar Ziegler and I have just uploaded to the arXiv our paper “Infinite partial sumsets in the primes“. This is a short paper inspired by a recent result of Kra, Moreira, Richter, and Robertson (discussed for instance in this Quanta article from last December) showing that for any set {A} of natural numbers of positive upper density, there exists a sequence {b_1 < b_2 < b_3 < \dots} of natural numbers and a shift {t} such that {b_i + b_j + t \in A} for all {i<j} (this answers a question of Erdős). In view of the “transference principle“, it is then plausible to ask whether the same result holds if {A} is replaced by the primes. We can show the following results:

Theorem 1
  • (i) If the Hardy-Littlewood prime tuples conjecture (or the weaker conjecture of Dickson) is true, then there exists an increasing sequence {b_1 < b_2 < b_3 < \dots} of primes such that {b_i + b_j + 1} is prime for all {i < j}.
  • (ii) Unconditionally, there exist increasing sequences {a_1 < a_2 < \dots} and {b_1 < b_2 < \dots} of natural numbers such that {a_i + b_j} is prime for all {i<j}.
  • (iii) These conclusions fail if “prime” is replaced by “positive (relative) density subset of the primes” (even if the density is equal to 1).

We remark that it was shown by Balog that there (unconditionally) exist arbitrarily long but finite sequences {b_1 < \dots < b_k} of primes such that {b_i + b_j + 1} is prime for all {i < j \leq k}. (This result can also be recovered from the later results of Ben Green, myself, and Tamar Ziegler.) Also, it had previously been shown by Granville that on the Hardy-Littlewood prime tuples conjecture, there existed increasing sequences {a_1 < a_2 < \dots} and {b_1 < b_2 < \dots} of natural numbers such that {a_i+b_j} is prime for all {i,j}.

The conclusion of (i) is stronger than that of (ii) (which is of course consistent with the former being conditional and the latter unconditional). The conclusion (ii) also implies the well-known theorem of Maynard that for any given {k}, there exist infinitely many {k}-tuples of primes of bounded diameter, and indeed our proof of (ii) uses the same “Maynard sieve” that powers the proof of that theorem (though we use a formulation of that sieve closer to that in this blog post of mine). Indeed, the failure of (iii) basically arises from the failure of Maynard’s theorem for dense subsets of primes, simply by removing those clusters of primes that are unusually closely spaced.

Our proof of (i) was initially inspired by the topological dynamics methods used by Kra, Moreira, Richter, and Robertson, but we managed to condense it to a purely elementary argument (taking up only half a page) that makes no reference to topological dynamics and builds up the sequence {b_1 < b_2 < \dots} recursively by repeated application of the prime tuples conjecture.

The proof of (ii) takes up the majority of the paper. It is easiest to phrase the argument in terms of “prime-producing tuples” – tuples {(h_1,\dots,h_k)} for which there are infinitely many {n} with {n+h_1,\dots,n+h_k} all prime. Maynard’s theorem is equivalent to the existence of arbitrarily long prime-producing tuples; our theorem is equivalent to the stronger assertion that there exists an infinite sequence {h_1 < h_2 < \dots} such that every initial segment {(h_1,\dots,h_k)} is prime-producing. The main new tool for achieving this is the following cute measure-theoretic lemma of Bergelson:

Lemma 2 (Bergelson intersectivity lemma) Let {E_1,E_2,\dots} be subsets of a probability space {(X,\mu)} of measure uniformly bounded away from zero, thus {\inf_i \mu(E_i) > 0}. Then there exists a subsequence {E_{i_1}, E_{i_2}, \dots} such that

\displaystyle  \mu(E_{i_1} \cap \dots \cap E_{i_k} ) > 0

for all {k}.

This lemma has a short proof, though not an entirely obvious one. Firstly, by deleting a null set from {X}, one can assume that all finite intersections {E_{i_1} \cap \dots \cap E_{i_k}} are either positive measure or empty. Secondly, a routine application of Fatou’s lemma shows that the maximal function {\limsup_N \frac{1}{N} \sum_{i=1}^N 1_{E_i}} has a positive integral, hence must be positive at some point {x_0}. Thus there is a subsequence {E_{i_1}, E_{i_2}, \dots} whose finite intersections all contain {x_0}, thus have positive measure as desired by the previous reduction.
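The lemma is also fun to see in action numerically. The following toy simulation (purely illustrative, with artificial parameters; it is not from the paper) mimics the proof on a finite probability space: it generates random events of measure {1/2}, finds a point where the average occupancy is maximal, and extracts the subsequence of events containing that point.

```python
import random

# Toy simulation of the Bergelson intersectivity lemma on the finite
# probability space X = {0, ..., N-1} with uniform measure.
random.seed(0)
N, num_events, density = 1000, 200, 0.5

# Events E_1, ..., E_num_events, each of measure exactly `density`,
# so that inf_i mu(E_i) > 0.
events = [set(random.sample(range(N), int(density * N)))
          for _ in range(num_events)]

# Proxy for the maximal function limsup (1/N) sum_i 1_{E_i}: for each
# point x, count how many events contain x, and pick a maximiser x0.
occupancy = [sum(x in E for E in events) for x in range(N)]
x0 = max(range(N), key=lambda x: occupancy[x])

# The subsequence of events containing x0: every finite intersection of
# these events contains x0, hence has measure >= 1/N > 0.
subseq = [i for i, E in enumerate(events) if x0 in E]
common = set.intersection(*(events[i] for i in subseq))
print(f"x0 = {x0} lies in {len(subseq)} of {num_events} events;")
print(f"their full intersection has measure {len(common) / N} > 0")
```

(Of course, in the finite setting the conclusion is trivial; the content of the lemma lies in the infinite setting, where the Fatou argument is needed to guarantee that a suitable {x_0} exists.)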

It turns out that one cannot quite combine the standard Maynard sieve with the intersectivity lemma because the events {E_i} that show up (which roughly correspond to the event that {n + h_i} is prime for some random number {n} (with a well-chosen probability distribution) and some shift {h_i}) have their probability going to zero, rather than being uniformly bounded from below. To get around this, we borrow an idea from a paper of Banks, Freiberg, and Maynard, and group the shifts {h_i} into various clusters {h_{i,1},\dots,h_{i,J_i}}, chosen in such a way that the probability that at least one of {n+h_{i,1},\dots,n+h_{i,J_i}} is prime is bounded uniformly from below. One then applies the Bergelson intersectivity lemma to those events and uses many applications of the pigeonhole principle to conclude.

Kaisa Matomäki, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Higher uniformity of arithmetic functions in short intervals I. All intervals“. This paper investigates the higher order (Gowers) uniformity of standard arithmetic functions in analytic number theory (and specifically, the Möbius function {\mu}, the von Mangoldt function {\Lambda}, and the generalised divisor functions {d_k}) in short intervals {(X,X+H]}, where {X} is large and {H} lies in the range {X^{\theta+\varepsilon} \leq H \leq X^{1-\varepsilon}} for a fixed constant {0 < \theta < 1} (that one would like to be as small as possible). If we let {f} denote one of the functions {\mu, \Lambda, d_k}, then there is extensive literature on the estimation of short sums

\displaystyle  \sum_{X < n \leq X+H} f(n)

and some literature also on the estimation of exponential sums such as

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-\alpha n)

for a real frequency {\alpha}, where {e(\theta) := e^{2\pi i \theta}}. For applications in the additive combinatorics of such functions {f}, it is also necessary to consider more general correlations, such as polynomial correlations

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-P(n))

where {P: {\bf Z} \rightarrow {\bf R}} is a polynomial of some fixed degree, or more generally

\displaystyle  \sum_{X < n \leq X+H} f(n) \overline{F}(g(n) \Gamma)

where {G/\Gamma} is a nilmanifold of fixed degree and dimension (and with some control on structure constants), {g: {\bf Z} \rightarrow G} is a polynomial map, and {F: G/\Gamma \rightarrow {\bf C}} is a Lipschitz function (with some bound on the Lipschitz constant). Indeed, thanks to the inverse theorem for the Gowers uniformity norm, such correlations let one control the Gowers uniformity norm of {f} (possibly after subtracting off some renormalising factor) on such short intervals {(X,X+H]}, which can in turn be used to control other multilinear correlations involving such functions.
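To make the simplest of these correlations concrete, here is a small illustrative computation (with arbitrarily chosen parameters {X, H, \alpha}, not taken from the paper) of the degree one sum {\sum_{X < n \leq X+H} \mu(n) e(-\alpha n)}, using a standard sieve for the Möbius function; the point to observe is the substantial cancellation over the trivial bound {H}.

```python
import cmath

def mobius_sieve(limit):
    """mu(n) for 0 <= n <= limit: flip the sign once for each prime
    factor, and zero out multiples of squares of primes."""
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = False
            for m in range(p, limit + 1, p):
                mu[m] *= -1
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu

X, H, alpha = 10**6, 10**4, 0.1234567  # illustrative parameters only
mu = mobius_sieve(X + H)

S = sum(mu[n] * cmath.exp(-2j * cmath.pi * alpha * n)
        for n in range(X + 1, X + H + 1))
print(f"|S| = {abs(S):.1f}, trivial bound H = {H}")
```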

Traditionally, asymptotics for such sums are expressed in terms of a “main term” of some arithmetic nature, plus an error term that is estimated in magnitude. For instance, a sum such as {\sum_{X < n \leq X+H} \Lambda(n) e(-\alpha n)} would be approximated in terms of a main term that vanishes (or is negligible) if {\alpha} is “minor arc”, but would be expressible in terms of something like a Ramanujan sum if {\alpha} is “major arc”, together with an error term. We found it convenient to cancel off such main terms by subtracting an approximant {f^\sharp} from each of the arithmetic functions {f} and then getting upper bounds on remainder correlations such as

\displaystyle  |\sum_{X < n \leq X+H} (f(n)-f^\sharp(n)) \overline{F}(g(n) \Gamma)| \ \ \ \ \ (1)

(actually for technical reasons we also allow the {n} variable to be restricted further to a subprogression of {(X,X+H]}, but let us ignore this minor extension for this discussion). There is some flexibility in how to choose these approximants, but we eventually found it convenient to use the following choices.

  • For the Möbius function {\mu}, we simply set {\mu^\sharp = 0}, as per the Möbius pseudorandomness conjecture. (One could choose a more sophisticated approximant in the presence of a Siegel zero, as I did with Joni in this recent paper, but we do not do so here.)
  • For the von Mangoldt function {\Lambda}, we eventually went with the Cramér-Granville approximant {\Lambda^\sharp(n) = \frac{W}{\phi(W)} 1_{(n,W)=1}}, where {W = \prod_{p < R} p} and {R = \exp(\log^{1/10} X)}.
  • For the divisor functions {d_k}, we used a somewhat complicated-looking approximant {d_k^\sharp(n) = \sum_{m \leq X^{\frac{k-1}{5k}}} P_m(\log n)} for some explicit polynomials {P_m}, chosen so that {d_k^\sharp} and {d_k} have almost exactly the same sums along arithmetic progressions (see the paper for details).

The objective is then to obtain bounds on sums such as (1) that improve upon the “trivial bound” that one can get with the triangle inequality and standard number theory bounds such as the Brun-Titchmarsh inequality. For {\mu} and {\Lambda}, the Siegel-Walfisz theorem suggests that it is reasonable to expect error terms that have “strongly logarithmic savings” in the sense that they gain a factor of {O_A(\log^{-A} X)} over the trivial bound for any {A>0}; for {d_k}, the Dirichlet hyperbola method suggests instead that one has “power savings” in that one should gain a factor of {X^{-c_k}} over the trivial bound for some {c_k>0}. In the case of the Möbius function {\mu}, there is an additional trick (introduced by Matomäki and Teräväinen) that allows one to lower the exponent {\theta} somewhat at the cost of only obtaining “weakly logarithmic savings” of shape {\log^{-c} X} for some small {c>0}.

Our main estimates on sums of the form (1) work in the following ranges:

  • For {\theta=5/8}, one can obtain strongly logarithmic savings on (1) for {f=\mu,\Lambda}, and power savings for {f=d_k}.
  • For {\theta=3/5}, one can obtain weakly logarithmic savings for {f = \mu, d_k}.
  • For {\theta=5/9}, one can obtain power savings for {f=d_3}.
  • For {\theta=1/3}, one can obtain power savings for {f=d_2}.

Conjecturally, one should be able to obtain power savings in all cases, and lower {\theta} down to zero, but the ranges of exponents and savings given here seem to be the limit of current methods unless one assumes additional hypotheses, such as GRH. The {\theta=5/8} result for correlation against Fourier phases {e(\alpha n)} was established previously by Zhan, and the {\theta=3/5} result for such phases and {f=\mu} was established previously by Matomäki and Teräväinen.

By combining these results with tools from additive combinatorics, one can obtain a number of applications:

  • Direct insertion of our bounds in the recent work of Kanigowski, Lemanczyk, and Radziwill on the prime number theorem for dynamical systems that are analytic skew products gives some improvements in the exponents there.
  • We can obtain a “short interval” version of a multiple ergodic theorem along primes established by Frantzikinakis-Host-Kra and Wooley-Ziegler, in which we average over intervals of the form {(X,X+H]} rather than {[1,X]}.
  • We can obtain a “short interval” version of the “linear equations in primes” asymptotics obtained by Ben Green, Tamar Ziegler, and myself in this sequence of papers, where the variables in these equations lie in short intervals {(X,X+H]} rather than long intervals such as {[1,X]}.

We now briefly discuss some of the ingredients of the proof of our main results. The first step is standard, using combinatorial decompositions (based on the Heath-Brown identity and (for the {\theta=3/5} result) the Ramaré identity) to decompose {\mu(n), \Lambda(n), d_k(n)} into more tractable sums of the following types:

  • Type {I} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large;
  • Type {II} sums, which are basically of the form {\sum_{A_- \leq m \leq A_+:m|n} \alpha(m)\beta(n/m)} for some weights {\alpha(m)}, {\beta(n)} of controlled size and some cutoffs {A_-, A_+} that are not too close to {1} or to {X};
  • Type {I_2} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m) d_2(n/m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large.

The precise ranges of the cutoffs {A, A_-, A_+} depend on the choice of {\theta}; our methods fail once these cutoffs pass a certain threshold, and this is the reason for the exponents {\theta} being what they are in our main results.

The Type {I} sums involving nilsequences can be treated by methods similar to those in this previous paper of Ben Green and myself; the main innovations are in the treatment of the Type {II} and Type {I_2} sums.

For the Type {II} sums, one can split into the “abelian” case in which (after some Fourier decomposition) the nilsequence {F(g(n)\Gamma)} is basically of the form {e(P(n))}, and the “non-abelian” case in which {G} is non-abelian and {F} exhibits non-trivial oscillation in a central direction. In the abelian case we can adapt arguments of Matomäki and Shao, which use Cauchy-Schwarz and the equidistribution properties of polynomials to obtain good bounds unless {e(P(n))} is “major arc” in the sense that it resembles (or “pretends to be”) {\chi(n) n^{it}} for some Dirichlet character {\chi} and some frequency {t}, but in this case one can use classical multiplicative methods to control the correlation. It turns out that the non-abelian case can be treated similarly. After applying Cauchy-Schwarz, one ends up analyzing the equidistribution of the four-variable polynomial sequence

\displaystyle  (n,m,n',m') \mapsto (g(nm)\Gamma, g(n'm)\Gamma, g(nm') \Gamma, g(n'm')\Gamma)

as {n,m,n',m'} range in various dyadic intervals. Using the known multidimensional equidistribution theory of polynomial maps in nilmanifolds, one can eventually show in the non-abelian case that this sequence either has enough equidistribution to give cancellation, or else the nilsequence involved can be replaced with one from a lower dimensional nilmanifold, in which case one can apply an induction hypothesis.

For the Type {I_2} sums, a model sum to study is

\displaystyle  \sum_{X < n \leq X+H} d_2(n) e(\alpha n)

which one can expand as

\displaystyle  \sum_{n,m: X < nm \leq X+H} e(\alpha nm).

We experimented with a number of ways to treat this type of sum (including automorphic form methods, or methods based on the Voronoi formula or van der Corput’s inequality), but somewhat to our surprise, the most efficient approach was an elementary one, in which one uses the Dirichlet approximation theorem to decompose the hyperbolic region {\{ (n,m) \in {\bf N}^2: X < nm \leq X+H \}} into a number of arithmetic progressions, and then uses equidistribution theory to establish cancellation of sequences such as {e(\alpha nm)} on the majority of these progressions. As it turns out, this strategy works well in the regime {H > X^{1/3+\varepsilon}} unless the nilsequence involved is “major arc”, but the latter case is treatable by existing methods as discussed previously; this is why the {\theta} exponent for our {d_2} result can be as low as {1/3}.
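As a sanity check on this expansion (a toy verification with small artificial parameters, not the actual ranges used in the paper), one can confirm numerically that the model sum and its expansion over the hyperbolic region agree exactly:

```python
import cmath

def e(theta):
    return cmath.exp(2j * cmath.pi * theta)

def d2(n):
    """The divisor function d_2(n) by trial division."""
    count, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            count += 1 if d * d == n else 2
        d += 1
    return count

X, H, alpha = 10**4, 500, 0.3141  # illustrative parameters only

# Model sum over the short interval.
lhs = sum(d2(n) * e(alpha * n) for n in range(X + 1, X + H + 1))

# Expansion over the hyperbolic region {(n, m): X < nm <= X + H}.
rhs = 0
for n in range(1, X + H + 1):
    for m in range(X // n + 1, (X + H) // n + 1):
        rhs += e(alpha * n * m)

print(abs(lhs - rhs))  # zero up to floating-point error
```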

In a sequel to this paper (currently in preparation), we will obtain analogous results for almost all intervals {(x,x+H]} with {x} in the range {[X,2X]}, in which we will be able to lower {\theta} all the way to {0}.

Joni Teräväinen and I have just uploaded to the arXiv our preprint “The Hardy–Littlewood–Chowla conjecture in the presence of a Siegel zero“. This paper is a development of the theme that certain conjectures in analytic number theory become easier if one makes the hypothesis that Siegel zeroes exist; this places one in a presumably “illusory” universe, since the widely believed Generalised Riemann Hypothesis (GRH) precludes the existence of such zeroes, yet this illusory universe seems remarkably self-consistent and notoriously impossible to eliminate from one’s analysis.

For the purposes of this paper, a Siegel zero is a zero {\beta} of a Dirichlet {L}-function {L(\cdot,\chi)} corresponding to a primitive quadratic character {\chi} of some conductor {q_\chi}, which is close to {1} in the sense that

\displaystyle  \beta = 1 - \frac{1}{\eta \log q_\chi}

for some large {\eta \gg 1} (which we will call the quality of the Siegel zero). The significance of these zeroes is that they force the Möbius function {\mu} and the Liouville function {\lambda} to “pretend” to be like the exceptional character {\chi} for primes of magnitude comparable to {q_\chi}. Indeed, if we define an exceptional prime to be a prime {p^*} for which {\chi(p^*) \neq -1}, then very few primes near {q_\chi} will be exceptional; in our paper we use some elementary arguments to establish the bounds

\displaystyle  \sum_{q_\chi^{1/2+\varepsilon} < p^* \leq x} \frac{1}{p^*} \ll_\varepsilon \frac{\log x}{\eta \log q_\chi} \ \ \ \ \ (1)

for any {x \geq q_\chi^{1/2+\varepsilon}} and {\varepsilon>0}, where the sum is over exceptional primes in the indicated range {q_\chi^{1/2+\varepsilon} < p^* \leq x}; this bound is non-trivial for {x} as large as {q_\chi^{\eta^{1-\varepsilon}}}. (See Section 1 of this blog post for some variants of this argument, which were inspired by work of Heath-Brown.) There is also a companion bound (somewhat weaker) that covers a range of {p^*} a little bit below {q_\chi^{1/2}}.

One of the early influential results in this area was the following result of Heath-Brown, which I previously blogged about here:

Theorem 1 (Hardy-Littlewood assuming Siegel zero) Let {h} be a fixed natural number. Suppose one has a Siegel zero {\beta} associated to some conductor {q_\chi}. Then we have

\displaystyle  \sum_{n \leq x} \Lambda(n) \Lambda(n+h) = ({\mathfrak S} + O( \frac{1}{\log\log \eta} )) x

for all {q_\chi^{250} \leq x \leq q_\chi^{300}}, where {\Lambda} is the von Mangoldt function and {{\mathfrak S}} is the singular series

\displaystyle  {\mathfrak S} = \prod_{p|h} \frac{p}{p-1} \prod_{p \nmid h} (1 - \frac{1}{(p-1)^2})

In particular, Heath-Brown showed that if there are infinitely many Siegel zeroes, then there are also infinitely many twin primes, with the correct asymptotic predicted by the Hardy-Littlewood prime tuple conjecture at infinitely many scales.
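As a rough numerical consistency check of this singular series (purely illustrative; the convergence is slow, and the contribution of prime powers to {\Lambda} is neglected below), one can compare a truncation of the Euler product for {{\mathfrak S}} at {h=2} with the empirical average {\frac{1}{x} \sum_{n \leq x} \Lambda(n) \Lambda(n+2)}:

```python
from math import log

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p in range(2, n + 1) if sieve[p]]

h = 2
# Truncation of S = prod_{p|h} p/(p-1) * prod_{p nmid h} (1 - 1/(p-1)^2).
S = 1.0
for p in primes_up_to(10**6):
    S *= p / (p - 1) if h % p == 0 else 1 - 1 / (p - 1) ** 2
print(f"singular series (truncated): {S:.5f}")  # about 1.32032 for h = 2

# Empirical average of Lambda(n) Lambda(n+2) over n <= x, keeping only
# the dominant contribution where both n and n+2 are prime.
x = 10**6
ps = set(primes_up_to(x + h))
total = sum(log(p) * log(p + h) for p in ps if p <= x and p + h in ps)
print(f"empirical average:           {total / x:.5f}")
```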

Very recently, Chinis established an analogous result for the Chowla conjecture (building upon earlier work of Germán and Katai):

Theorem 2 (Chowla assuming Siegel zero) Let {h_1,\dots,h_\ell} be distinct fixed natural numbers. Suppose one has a Siegel zero {\beta} associated to some conductor {q_\chi}. Then one has

\displaystyle  \sum_{n \leq x} \lambda(n+h_1) \dots \lambda(n+h_\ell) \ll \frac{x}{(\log\log \eta)^{1/2} (\log \eta)^{1/12}}

in the range {q_\chi^{10} \leq x \leq q_\chi^{\log\log \eta / 3}}, where {\lambda} is the Liouville function.

In our paper we unify these results and also improve the quantitative estimates and range of {x}:

Theorem 3 (Hardy-Littlewood-Chowla assuming Siegel zero) Let {h_1,\dots,h_k,h'_1,\dots,h'_\ell} be distinct fixed natural numbers with {k \leq 2}. Suppose one has a Siegel zero {\beta} associated to some conductor {q_\chi}. Then one has

\displaystyle  \sum_{n \leq x} \Lambda(n+h_1) \dots \Lambda(n+h_k) \lambda(n+h'_1) \dots \lambda(n+h'_\ell)

\displaystyle = ({\mathfrak S} + O_\varepsilon( \frac{1}{\log^{1/(10\max(1,k))} \eta} )) x

for

\displaystyle  q_\chi^{10k+\frac{1}{2}+\varepsilon} \leq x \leq q_\chi^{\eta^{1/2}}

for any fixed {\varepsilon>0}.

Our argument proceeds by a series of steps in which we replace {\Lambda} and {\lambda} by more complicated looking, but also more tractable, approximations, until the correlation is one that can be computed in a tedious but straightforward fashion by known techniques. More precisely, the steps are as follows:

  • (i) Replace the Liouville function {\lambda} with an approximant {\lambda_{\mathrm{Siegel}}}, which is a completely multiplicative function that agrees with {\lambda} at small primes and agrees with {\chi} at large primes.
  • (ii) Replace the von Mangoldt function {\Lambda} with an approximant {\Lambda_{\mathrm{Siegel}}}, which is the Dirichlet convolution {\chi * \log} multiplied by a Selberg sieve weight {\nu} to essentially restrict that convolution to almost primes.
  • (iii) Replace {\lambda_{\mathrm{Siegel}}} with a more complicated truncation {\lambda_{\mathrm{Siegel}}^\sharp} which has the structure of a “Type I sum”, and which agrees with {\lambda_{\mathrm{Siegel}}} on numbers that have a “typical” factorization.
  • (iv) Replace the approximant {\Lambda_{\mathrm{Siegel}}} with a more complicated approximant {\Lambda_{\mathrm{Siegel}}^\sharp} which has the structure of a “Type I sum”.
  • (v) Now that all terms in the correlation have been replaced with tractable Type I sums, use standard Euler product calculations and Fourier analysis, similar in spirit to the proof of the pseudorandomness of the Selberg sieve majorant for the primes in this paper of Ben Green and myself, to evaluate the correlation to high accuracy.

Steps (i), (ii) proceed mainly through estimates such as (1) and standard sieve theory bounds. Step (iii) is based primarily on estimates on the number of smooth numbers of a certain size.

The restriction {k \leq 2} in our main theorem is needed only to execute step (iv) of this strategy. Roughly speaking, the Siegel approximant {\Lambda_{\mathrm{Siegel}}} to {\Lambda} is a twisted, sieved version of the divisor function {\tau}, and the types of correlation one is faced with at the start of step (iv) are a more complicated version of the divisor correlation sum

\displaystyle  \sum_{n \leq x} \tau(n+h_1) \dots \tau(n+h_k).

For {k=1} this sum can be easily controlled by the Dirichlet hyperbola method. For {k=2} one needs the fact that {\tau} has a level of distribution greater than {1/2}; in fact Kloosterman sum bounds give a level of distribution of {2/3}, a folklore fact that seems to have first been observed by Linnik and Selberg. We use a (notationally more complicated) version of this argument to treat the sums arising in (iv) for {k \leq 2}. Unfortunately for {k > 2} there are no known techniques to unconditionally obtain asymptotics, even for the model sum

\displaystyle  \sum_{n \leq x} \tau(n) \tau(n+1) \tau(n+2),

although we do at least have fairly convincing conjectures as to what the asymptotics should be. Because of this, it seems unlikely that one will be able to relax the {k \leq 2} hypothesis in our main theorem at our current level of understanding of analytic number theory.
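To illustrate the {k=1} case just mentioned, here is a minimal sketch of the Dirichlet hyperbola method for the divisor summatory function {D(x) = \sum_{n \leq x} \tau(n)}, which evaluates this sum exactly in {O(\sqrt{x})} operations via the identity {D(x) = 2\sum_{d \leq \sqrt{x}} \lfloor x/d \rfloor - \lfloor \sqrt{x} \rfloor^2}, and which matches Dirichlet's classical asymptotic {D(x) = x \log x + (2\gamma - 1) x + O(\sqrt{x})}:

```python
from math import isqrt, log

def divisor_summatory(x):
    """D(x) = sum_{n <= x} tau(n) via the Dirichlet hyperbola method."""
    r = isqrt(x)
    return 2 * sum(x // d for d in range(1, r + 1)) - r * r

# Check against the direct O(x) count of lattice points under the hyperbola.
x_small = 10**4
assert divisor_summatory(x_small) == sum(x_small // d
                                         for d in range(1, x_small + 1))

x = 10**8
gamma = 0.5772156649015329  # Euler-Mascheroni constant
print(divisor_summatory(x))              # exact value
print(x * log(x) + (2 * gamma - 1) * x)  # Dirichlet's main terms
```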

Step (v) is a tedious but straightforward sieve theoretic computation, similar in many ways to the correlation estimates of Goldston and Yildirim used in their work on small gaps between primes (as discussed for instance here), and then also used by Ben Green and myself to locate arithmetic progressions in primes.

Joni Teräväinen and myself have just uploaded to the arXiv our preprint “Quantitative bounds for Gowers uniformity of the Möbius and von Mangoldt functions“. This paper makes quantitative the Gowers uniformity estimates on the Möbius function {\mu} and the von Mangoldt function {\Lambda}.

To discuss the results we first discuss the situation of the Möbius function, which is technically simpler in some (though not all) ways. We assume familiarity with Gowers norms and standard notations around these norms, such as the averaging notation {\mathop{\bf E}_{n \in [N]}} and the exponential notation {e(\theta) = e^{2\pi i \theta}}. The prime number theorem in qualitative form asserts that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) = o(1)

as {N \rightarrow \infty}. With the Vinogradov-Korobov error term, the prime number theorem is strengthened to

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} );

we refer to such decay bounds (with {\exp(-c\log^c N)} type factors) as pseudopolynomial decay. Equivalently, we obtain pseudopolynomial decay of the Gowers {U^1} seminorm of {\mu}:

\displaystyle  \| \mu \|_{U^1([N])} \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} ).

As is well known, the Riemann hypothesis would be equivalent to an upgrade of this estimate to polynomial decay of the form

\displaystyle  \| \mu \|_{U^1([N])} \ll_\varepsilon N^{-1/2+\varepsilon}

for any {\varepsilon>0}.

Once one restricts to arithmetic progressions, the situation gets worse: the Siegel-Walfisz theorem gives the bound

\displaystyle  \| \mu 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll_A \log^{-A} N \ \ \ \ \ (1)

for any residue class {a \hbox{ mod } q} and any {A>0}, but with the catch that the implied constant is ineffective in {A}. This ineffectivity cannot be removed without further progress on the notorious Siegel zero problem.

In 1937, Davenport was able to show the discorrelation estimate

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) e(-\alpha n) \ll_A \log^{-A} N

for any {A>0} uniformly in {\alpha \in {\bf R}}, which leads (by standard Fourier arguments) to the Fourier uniformity estimate

\displaystyle  \| \mu \|_{U^2([N])} \ll_A \log^{-A} N.

Again, the implied constant is ineffective. If one insists on effective constants, the best bound currently available is

\displaystyle  \| \mu \|_{U^2([N])} \ll \log^{-c} N \ \ \ \ \ (2)

for some small effective constant {c>0}.
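For orientation, the {U^2} norm is easy to compute in a finite model: for {f: {\bf Z}/N{\bf Z} \rightarrow {\bf C}} one has the Fourier identity {\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4}. The following toy computation (on the cyclic group, which is only a rough proxy for the interval norm {U^2([N])}) evaluates this quantity for the Möbius function:

```python
import numpy as np

def mobius_sieve(limit):
    mu = np.ones(limit + 1, dtype=np.int64)
    is_prime = np.ones(limit + 1, dtype=bool)
    for p in range(2, limit + 1):
        if is_prime[p]:
            is_prime[2 * p::p] = False
            mu[p::p] *= -1        # one sign flip per prime factor
            mu[p * p::p * p] = 0  # kill non-squarefree n
    return mu

N = 2**16
f = mobius_sieve(N)[1:].astype(complex)  # mu(1), ..., mu(N) as f on Z/NZ

# Fourier coefficients f_hat(xi) = E_n f(n) e(-xi n / N), via the FFT.
f_hat = np.fft.fft(f) / N
u2 = (np.abs(f_hat) ** 4).sum() ** 0.25
print(f"U^2 norm of mu on Z/{N}Z: {u2:.4f}")
```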

For the situation with the {U^3} norm the previously known results were much weaker. Ben Green and I showed that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \ll_{A,F,G/\Gamma} \log^{-A} N \ \ \ \ \ (3)

uniformly for any {A>0}, any degree two (filtered) nilmanifold {G/\Gamma}, any polynomial sequence {g: {\bf Z} \rightarrow G}, and any Lipschitz function {F}; again, the implied constants are ineffective. On the other hand, in a separate paper of Ben Green and myself, we established the following inverse theorem: if for instance we knew that

\displaystyle  \| \mu \|_{U^3([N])} \geq \delta

for some {0 < \delta < 1/2}, then there exists a degree two nilmanifold {G/\Gamma} of dimension {O( \delta^{-O(1)} )}, complexity {O( \delta^{-O(1)} )}, a polynomial sequence {g: {\bf Z} \rightarrow G}, and a Lipschitz function {F} of Lipschitz constant {O(\delta^{-O(1)})} such that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \gg \exp(-\delta^{-O(1)}).

Putting the two assertions together and comparing all the dependencies on parameters, one can establish the qualitative decay bound

\displaystyle  \| \mu \|_{U^3([N])} = o(1).

However the decay rate {o(1)} produced by this argument is completely ineffective: obtaining a bound on when this {o(1)} quantity dips below a given threshold {\delta} depends on the implied constant in (3) for some {G/\Gamma} whose dimension depends on {\delta}, and the dependence on {\delta} obtained in this fashion is ineffective in the face of a Siegel zero.

For higher norms {U^k, k \geq 3}, the situation is even worse, because the quantitative inverse theory for these norms is poorer, and indeed it was only with the recent work of Manners that any such bound became available at all (at least for {k>4}). Basically, Manners establishes that if

\displaystyle  \| \mu \|_{U^k([N])} \geq \delta

then there exists a degree {k-1} nilmanifold {G/\Gamma} of dimension {O( \delta^{-O(1)} )}, complexity {O( \exp\exp(\delta^{-O(1)}) )}, a polynomial sequence {g: {\bf Z} \rightarrow G}, and a Lipschitz function {F} of Lipschitz constant {O(\exp\exp(\delta^{-O(1)}))} such that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \gg \exp\exp(-\delta^{-O(1)}).

(We allow all implied constants to depend on {k}.) Meanwhile, the bound (3) was extended to arbitrary nilmanifolds by Ben and myself. Again, the two results when concatenated give the qualitative decay

\displaystyle  \| \mu \|_{U^k([N])} = o(1)

but the decay rate is completely ineffective.

Our first result gives an effective decay bound:

Theorem 1 For any {k \geq 2}, we have {\| \mu \|_{U^k([N])} \ll (\log\log N)^{-c_k}} for some {c_k>0}. The implied constants are effective.

This is off by a logarithm from the best effective bound (2) in the {k=2} case. In the {k=3} case there is some hope to remove this logarithm based on the improved quantitative inverse theory currently available in this case, but there is a technical obstruction to doing so which we will discuss later in this post. For {k>3} the above bound is the best one could hope to achieve purely using the quantitative inverse theory of Manners.

We have analogues of all the above results for the von Mangoldt function {\Lambda}. Here a complication arises that {\Lambda} does not have mean close to zero, and one has to subtract off some suitable approximant {\Lambda^\sharp} to {\Lambda} before one would expect good Gowers norms bounds. For the prime number theorem one can just use the approximant {1}, giving

\displaystyle  \| \Lambda - 1 \|_{U^1([N])} \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} )

but even for the prime number theorem in arithmetic progressions one needs a more accurate approximant. In our paper it is convenient to use the “Cramér approximant”

\displaystyle  \Lambda_{\hbox{Cram\'er}}(n) := \frac{W}{\phi(W)} 1_{(n,W)=1}

where

\displaystyle  W := \prod_{p<Q} p

and {Q} is the quasipolynomial quantity

\displaystyle  Q = \exp(\log^{1/10} N). \ \ \ \ \ (4)

Then one can show from the Siegel-Walfisz theorem and standard bilinear sum methods that

\displaystyle  \mathop{\bf E}_{n \in [N]} (\Lambda - \Lambda_{\hbox{Cram\'er}})(n) e(-\alpha n) \ll_A \log^{-A} N

and

\displaystyle  \| \Lambda - \Lambda_{\hbox{Cram\'er}}\|_{U^2([N])} \ll_A \log^{-A} N

for all {A>0} and {\alpha \in {\bf R}} (with an ineffective dependence on {A}), again regaining effectivity if {A} is replaced by a sufficiently small constant {c>0}. All the previously stated discorrelation and Gowers uniformity results for {\mu} then have analogues for {\Lambda}, and our main result is similarly analogous:

Theorem 2 For any {k \geq 2}, we have {\| \Lambda - \Lambda_{\hbox{Cram\'er}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}} for some {c_k>0}. The implied constants are effective.

By standard methods, this result also gives quantitative asymptotics for counting solutions to various systems of linear equations in primes, with error terms that gain a factor of {O((\log\log N)^{-c})} with respect to the main term.
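As a quick sanity check on the Cramér approximant (a toy computation, using a tiny cutoff {Q} rather than the quasipolynomial choice (4)), one can verify numerically that {\Lambda_{\hbox{Cram\'er}}} has mean value close to {1}, as an approximant to {\Lambda} should:

```python
from math import gcd

Q = 20
small_primes = [p for p in range(2, Q) if all(p % q for q in range(2, p))]
W = 1
for p in small_primes:
    W *= p
weight = 1.0
for p in small_primes:
    weight *= p / (p - 1)  # W/phi(W) = prod_{p<Q} p/(p-1)

# Lambda_Cramer(n) = (W/phi(W)) 1_{(n,W)=1}; its average over [1,N] is
# (W/phi(W)) * (density of n coprime to W) = 1 + o(1).
N = 10**6
avg = weight * sum(1 for n in range(1, N + 1) if gcd(n, W) == 1) / N
print(f"mean of Lambda_Cramer over [1, {N}]: {avg:.6f}")
```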

We now discuss the methods of proof, focusing first on the case of the Möbius function. Suppose first that there is no “Siegel zero”, by which we mean a zero {\beta} of a Dirichlet {L}-function {L(\cdot,\chi)}, with {\chi} a quadratic character of some conductor {q \leq Q}, obeying {1 - \beta \leq \frac{c}{\log Q}} for some small absolute constant {c>0}. In this case the Siegel-Walfisz bound (1) improves to a quasipolynomial bound

\displaystyle  \| \mu 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N). \ \ \ \ \ (5)

To establish Theorem 1 in this case, it suffices by Manners’ inverse theorem to establish the quasipolynomial bound

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N) \ \ \ \ \ (6)

for all degree {k-1} nilmanifolds {G/\Gamma} of dimension {O((\log\log N)^c)} and complexity {O( \exp(\log^c N))}, all polynomial sequences {g}, and all Lipschitz functions {F} of norm {O( \exp(\log^c N))}. If the nilmanifold {G/\Gamma} had bounded dimension, then one could repeat the arguments of Ben and myself more or less verbatim to establish this claim from (5), which relied on the quantitative equidistribution theory on nilmanifolds developed in a separate paper of Ben and myself. Unfortunately, in the latter paper the dependence of the quantitative bounds on the dimension {d} was not explicitly given. In an appendix to the current paper, we go through that paper to account for this dependence, showing that all exponents depend at most doubly exponentially in the dimension {d}, which is barely sufficient to handle the dimension of {O((\log\log N)^c)} that arises here.

Now suppose we have a Siegel zero {\beta} of some {L(\cdot,\chi)}. In this case the bound (5) will not hold in general, and hence (6) will not hold either. Here, the usual way out (while still maintaining effective estimates) is to approximate {\mu} not by {0}, but rather by a more complicated approximant {\mu_{\hbox{Siegel}}} that takes the Siegel zero into account, and in particular is such that one has the (effective) pseudopolynomial bound

\displaystyle  \| (\mu - \mu_{\hbox{Siegel}}) 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N) \ \ \ \ \ (7)

for all residue classes {a \hbox{ mod } q}. The Siegel approximant to {\mu} is actually a little bit complicated, and to our knowledge this sort of approximant first appears in the literature as late as this 2010 paper of Germán and Katai. Our version of this approximant is defined as the multiplicative function such that

\displaystyle \mu_{\hbox{Siegel}}(p^j) = \mu(p^j)

when {p < Q}, and

\displaystyle  \mu_{\hbox{Siegel}}(n) = \alpha n^{\beta-1} \chi(n)

when {n} is coprime to all primes {p<Q}, and {\alpha} is a normalising constant given by the formula

\displaystyle  \alpha := \frac{1}{L'(\beta,\chi)} \prod_{p<Q} (1-\frac{1}{p})^{-1} (1 - \frac{\chi(p)}{p^\beta})^{-1}

(this constant ends up being of size {O(1)} and plays only a minor role in the analysis). This is a rather complicated formula, but it seems to be virtually the only choice of approximant that allows for bounds such as (7) to hold. (This is the one aspect of the problem where the von Mangoldt theory is simpler than the Möbius theory, as in the former one only needs to work with very rough numbers for which one does not need to make any special accommodations for the behavior at small primes when introducing the Siegel correction term.) With this starting point it is then possible to repeat the analysis of my previous papers with Ben and obtain the pseudopolynomial discorrelation bound

\displaystyle  \mathop{\bf E}_{n \in [N]} (\mu - \mu_{\hbox{Siegel}})(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N)

for {F(g(n)\Gamma)} as before, which when combined with Manners’ inverse theorem gives the doubly logarithmic bound

\displaystyle \| \mu - \mu_{\hbox{Siegel}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}.

Meanwhile, a direct sieve-theoretic computation ends up giving the singly logarithmic bound

\displaystyle \| \mu_{\hbox{Siegel}} \|_{U^k([N])} \ll \log^{-c_k} N

(indeed, there is a good chance that one could improve the bounds even further, though it is not helpful for this current argument to do so). Theorem 1 then follows from the triangle inequality for the Gowers norm. It is interesting that the Siegel approximant {\mu_{\hbox{Siegel}}} seems to play a rather essential role in the proof, even though it is absent from the final statement. We note that this approximant seems to be a useful tool to explore the “illusory world” of the Siegel zero further; see for instance the recent paper of Chinis for some work in this direction.

For the analogous problem with the von Mangoldt function (assuming a Siegel zero for sake of discussion), the approximant {\Lambda_{\hbox{Siegel}}} is simpler; we ended up using

\displaystyle \Lambda_{\hbox{Siegel}}(n) = \Lambda_{\hbox{Cram\'er}}(n) (1 - n^{\beta-1} \chi(n))

which allows one to state the standard prime number theorem in arithmetic progressions with classical error term and Siegel zero term compactly as

\displaystyle  \| (\Lambda - \Lambda_{\hbox{Siegel}}) 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N).

Routine modifications of previous arguments also give

\displaystyle  \mathop{\bf E}_{n \in [N]} (\Lambda - \Lambda_{\hbox{Siegel}})(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N) \ \ \ \ \ (8)

and

\displaystyle \| \Lambda_{\hbox{Siegel}} \|_{U^k([N])} \ll \log^{-c_k} N.

The one tricky new step is getting from the discorrelation estimate (8) to the Gowers uniformity estimate

\displaystyle \| \Lambda - \Lambda_{\hbox{Siegel}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}.

One cannot directly apply Manners’ inverse theorem here because {\Lambda} and {\Lambda_{\hbox{Siegel}}} are unbounded. There is a standard tool for getting around this issue, now known as the dense model theorem, which is the standard engine powering the transference principle from theorems about bounded functions to theorems about certain types of unbounded functions. However the quantitative versions of the dense model theorem in the literature are expensive and would basically weaken the doubly logarithmic gain here to a triply logarithmic one. Instead, we bypass the dense model theorem and directly transfer the inverse theorem for bounded functions to an inverse theorem for unbounded functions by using the densification approach to transference introduced by Conlon, Fox, and Zhao. This technique turns out to be quantitatively quite efficient (the dependencies of the main parameters in the transference are polynomial in nature), and also has the technical advantage of avoiding the somewhat tricky “correlation condition” present in early transference results, which is also not beneficial for quantitative bounds.

In principle, the above results can be improved for {k=3} due to the stronger quantitative inverse theorems in the {U^3} setting. However, there is a bottleneck that prevents us from achieving this, namely that the equidistribution theory of two-step nilmanifolds has exponents which are exponential in the dimension rather than polynomial in the dimension, and as a consequence we were unable to improve upon the doubly logarithmic results. Specifically, if one is given a sequence of bracket quadratics such as {\lfloor \alpha_1 n \rfloor \beta_1 n, \dots, \lfloor \alpha_d n \rfloor \beta_d n} that fails to be {\delta}-equidistributed, one would need to establish a nontrivial linear relationship modulo 1 between the {\alpha_1,\beta_1,\dots,\alpha_d,\beta_d} (up to errors of {O(1/N)}), where the coefficients are of size {O(\delta^{-d^{O(1)}})}; current methods only give coefficient bounds of the form {O(\delta^{-\exp(d^{O(1)})})}. An old result of Schmidt demonstrates proof of concept that these sorts of polynomial dependencies on exponents are possible in principle, but actually implementing Schmidt’s methods here seems to be a quite non-trivial task. There is also another possible route to removing a logarithm, which is to strengthen the inverse {U^3} theorem to make the dimension of the nilmanifold logarithmic in the uniformity parameter {\delta} rather than polynomial. Again, the Freiman-Bilu theorem (see for instance this paper of Ben and myself) demonstrates proof of concept that such an improvement in dimension is possible, but some work would be needed to implement it.

Kaisa Matomäki, Maksym Radziwill, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Singmaster’s conjecture in the interior of Pascal’s triangle“. This paper leverages the theory of exponential sums over primes to make progress on a well-known conjecture of Singmaster which asserts that any natural number larger than {1} appears at most a bounded number of times in Pascal’s triangle. That is to say, for any integer {t \geq 2}, there are at most {O(1)} solutions to the equation

\displaystyle  \binom{n}{m} = t \ \ \ \ \ (1)

with {1 \leq m < n}. Currently, the largest number of solutions that is known to be attainable is eight, with {t} equal to

\displaystyle  3003 = \binom{3003}{1} = \binom{78}{2} = \binom{15}{5} = \binom{14}{6} = \binom{14}{8} = \binom{15}{10}

\displaystyle = \binom{78}{76} = \binom{3003}{3002}.

Because of the symmetry {\binom{n}{m} = \binom{n}{n-m}} of Pascal’s triangle it is natural to restrict attention to the left half {1 \leq m \leq n/2} of the triangle.
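(The count of eight here is easy to verify by brute force; the following short search, which is purely illustrative, exploits the monotonicity of {\binom{n}{m}} in {m} on the left half of the triangle.)

```python
from math import comb

t = 3003
hits = []
for n in range(2, t + 1):
    for m in range(1, n // 2 + 1):
        c = comb(n, m)
        if c == t:
            hits.append((n, m))
            if 2 * m != n:
                hits.append((n, n - m))  # mirror solution on the right half
        if c >= t:
            break  # binom(n, m) is increasing in m for m <= n/2
print(sorted(hits))
# [(14, 6), (14, 8), (15, 5), (15, 10), (78, 2), (78, 76), (3003, 1), (3003, 3002)]
```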

Our main result settles this conjecture in the “interior” region of the triangle:

Theorem 1 (Singmaster’s conjecture in the interior of the triangle) If {0 < \varepsilon < 1} and {t} is sufficiently large depending on {\varepsilon}, there are at most two solutions to (1) in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/2 \ \ \ \ \ (2)

and hence at most four in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n - \exp( \log^{2/3+\varepsilon} n ).

Also, there is at most one solution in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/\exp(\log^{1-\varepsilon} n ).

To verify Singmaster’s conjecture in full, it thus suffices in view of this result to verify the conjecture in the boundary region

\displaystyle  2 \leq m < \exp(\log^{2/3+\varepsilon} n) \ \ \ \ \ (3)

(or equivalently {n - \exp(\log^{2/3+\varepsilon} n) < m \leq n}); we have deleted the {m=1} case as it of course automatically supplies exactly one solution to (1). It is in fact possible that for {t} sufficiently large there are no further collisions {\binom{n}{m} = \binom{n'}{m'}=t} for {(n,m), (n',m')} in the region (3), in which case there would never be more than eight solutions to (1) for sufficiently large {t}. This latter claim is known for bounded values of {m,m'} by work of Beukers, Shorey, and Tijdeman, with the main tool used being Siegel’s theorem on integral points.

The upper bound of two here for the number of solutions in the region (2) is best possible, due to the infinite family of solutions to the equation

\displaystyle  \binom{n+1}{m+1} = \binom{n}{m+2} \ \ \ \ \ (4)

coming from {n = F_{2j+2} F_{2j+3}-1}, {m = F_{2j} F_{2j+3}-1}, where {F_j} is the {j^{th}} Fibonacci number.
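One can check the first few members of this family directly (for {j=1} it recovers the pair {\binom{15}{5} = \binom{14}{6} = 3003} from the example above):

```python
from math import comb

def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a  # F_0 = 0, F_1 = 1, F_2 = 1, ...

for j in range(1, 6):
    n = fib(2 * j + 2) * fib(2 * j + 3) - 1
    m = fib(2 * j) * fib(2 * j + 3) - 1
    assert comb(n + 1, m + 1) == comb(n, m + 2)  # the identity (4)
    print(j, (n, m))
```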

The appearance of the quantity {\exp( \log^{2/3+\varepsilon} n )} in Theorem 1 may be familiar to readers who are acquainted with Vinogradov’s bounds on exponential sums, which end up being the main new ingredient in our arguments. In principle this threshold could be lowered if we had stronger bounds on exponential sums.

To try to control solutions to (1) we use a combination of “Archimedean” and “non-Archimedean” approaches. In the “Archimedean” approach (following earlier work of Kane on this problem) we view {n,m} primarily as real numbers rather than integers, and express (1) in terms of the Gamma function as

\displaystyle  \frac{\Gamma(n+1)}{\Gamma(m+1) \Gamma(n-m+1)} = t.

One can use this equation to solve for {n} in terms of {m,t} as

\displaystyle  n = f_t(m)

for a certain real analytic function {f_t} whose asymptotics are easily computable (for instance one has the asymptotic {f_t(m) \asymp m t^{1/m}}). One can then view the problem as one of trying to control the number of lattice points on the graph {\{ (m,f_t(m)): m \in {\bf R} \}}. Here we can take advantage of the fact that in the regime {m \leq f_t(m)/2} (which corresponds to working in the left half {m \leq n/2} of Pascal’s triangle), the function {f_t} can be shown to be convex, but not too convex, in the sense that one has both upper and lower bounds on the second derivative of {f_t} (in fact one can show that {f''_t(m) \asymp f_t(m) (\log t/m^2)^2}). This can be used to preclude the possibility of having a cluster of three or more nearby lattice points on the graph {\{ (m,f_t(m)): m \in {\bf R} \}}, basically because the area subtended by the triangle connecting three of these points would lie between {0} and {1/2}, contradicting Pick’s theorem. Developing these ideas, we were able to show

Proposition 2 Let {\varepsilon>0}, and suppose {t} is sufficiently large depending on {\varepsilon}. If {(m,n)} is a solution to (1) in the left half {m \leq n/2} of Pascal’s triangle, then there is at most one other solution {(m',n')} to this equation in the left half with

\displaystyle  |m-m'| + |n-n'| \ll \exp( (\log\log t)^{1-\varepsilon} ).

Again, the example of (4) shows that a cluster of two solutions is certainly possible; the convexity argument only kicks in once one has a cluster of three or more solutions.
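To illustrate the function {f_t} concretely, here is a minimal numerical sketch (assuming {\binom{2m}{m} \leq t}, so that a solution exists in the left half of the triangle) that solves (1) for real {n} via the log-Gamma function and bisection; near-integer outputs flag genuine solutions:

```python
from math import lgamma, log

def f_t(m, t):
    """Solve Gamma(n+1) / (Gamma(m+1) Gamma(n-m+1)) = t for real n >= 2m."""
    g = lambda n: lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1) - log(t)
    lo, hi = 2.0 * m, float(t)  # g(lo) <= 0 <= g(hi), and g increases in n
    for _ in range(80):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t = 3003
for m in [2, 3, 5, 6]:
    print(m, round(f_t(m, t), 4))  # m = 2, 5, 6 give 78, 15, 14
```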

To finish the proof of Theorem 1, one has to show that any two solutions {(m,n), (m',n')} to (1) in the region of interest must be close enough for the above proposition to apply. Here we switch to the “non-Archimedean” approach, in which we look at the {p}-adic valuations {\nu_p( \binom{n}{m} )} of the binomial coefficients, defined as the number of times a prime {p} divides {\binom{n}{m}}. From the fundamental theorem of arithmetic, a collision

\displaystyle  \binom{n}{m} = \binom{n'}{m'}

between binomial coefficients occurs if and only if one has agreement of valuations

\displaystyle  \nu_p( \binom{n}{m} ) = \nu_p( \binom{n'}{m'} ). \ \ \ \ \ (5)

From the Legendre formula

\displaystyle  \nu_p(n!) = \sum_{j=1}^\infty \lfloor \frac{n}{p^j} \rfloor

we can rewrite this latter identity (5) as

\displaystyle  \sum_{j=1}^\infty \{ \frac{m}{p^j} \} + \{ \frac{n-m}{p^j} \} - \{ \frac{n}{p^j} \} = \sum_{j=1}^\infty \{ \frac{m'}{p^j} \} + \{ \frac{n'-m'}{p^j} \} - \{ \frac{n'}{p^j} \}, \ \ \ \ \ (6)

where {\{x\} := x - \lfloor x\rfloor} denotes the fractional part of {x}. (These sums are not truly infinite, because the summands vanish once {p^j} is larger than {\max(n,n')}.)
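As a quick illustration (a toy verification, not part of the argument), one can check via the Legendre formula that the collision {\binom{15}{5} = \binom{14}{6} = 3003} does indeed force agreement of all the {p}-adic valuations, as in (5):

```python
def nu_p_factorial(n, p):
    """Legendre's formula: nu_p(n!) = sum_{j>=1} floor(n / p^j)."""
    s, q = 0, p
    while q <= n:
        s += n // q
        q *= p
    return s

def nu_p_binom(n, m, p):
    return (nu_p_factorial(n, p) - nu_p_factorial(m, p)
            - nu_p_factorial(n - m, p))

for p in [2, 3, 5, 7, 11, 13]:
    v1, v2 = nu_p_binom(15, 5, p), nu_p_binom(14, 6, p)
    assert v1 == v2
    print(f"nu_{p} = {v1}")  # 3003 = 3 * 7 * 11 * 13
```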

A key idea in our approach is to view this condition (6) statistically, for instance by viewing {p} as a prime drawn randomly from an interval such as {[P, P + P \log^{-100} P]} for some suitably chosen scale parameter {P}, so that the two sides of (6) now become random variables. It then becomes advantageous to compare correlations between these two random variables and some additional test random variable. For instance, if {n} and {n'} are far apart from each other, then one would expect the left-hand side of (6) to have a higher correlation with the fractional part {\{ \frac{n}{p}\}}, since this term shows up in the summation on the left-hand side but not the right. Similarly if {m} and {m'} are far apart from each other (although there are some annoying cases one has to treat separately when there is some “unexpected commensurability”, for instance if {n'-m'} is a rational multiple of {m} where the rational has bounded numerator and denominator). In order to execute this strategy, it turns out (after some standard Fourier expansion) that one needs to get good control on exponential sums such as

\displaystyle  \sum_{P \leq p \leq P + P\log^{-100} P} e( \frac{N}{p} + \frac{M}{p^j} )

for various choices of parameters {P, N, M, j}, where {e(\theta) := e^{2\pi i \theta}}. Fortunately, the methods of Vinogradov (which more generally can handle sums such as {\sum_{n \in I} e(f(n))} and {\sum_{p \in I} e(f(p))} for various analytic functions {f}) can give useful bounds on such sums as long as {N} and {M} are not too large compared to {P}; more specifically, Vinogradov’s estimates are non-trivial in the regime {N,M \ll \exp( \log^{3/2-\varepsilon} P )}, and this ultimately leads to a distance bound

\displaystyle  m' - m \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )

between any colliding pair {(n,m), (n',m')} in the left half of Pascal’s triangle, as well as the variant bound

\displaystyle  n' - n \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )

under the additional assumption

\displaystyle  m', m \geq \exp( \log^{2/3 +\varepsilon}(n+n') ).

Comparing these bounds with Proposition 2 and using some basic estimates about the function {f_t}, we can conclude Theorem 1.

A modification of the arguments also gives similar results for the equation

\displaystyle  (n)_m = t \ \ \ \ \ (7)

where {(n)_m := n (n-1) \dots (n-m+1)} is the falling factorial:

Theorem 3 If {0 < \varepsilon < 1} and {t} is sufficiently large depending on {\varepsilon}, there are at most two solutions to (7) in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m < n. \ \ \ \ \ (8)

Again the upper bound of two is best possible, thanks to identities such as

\displaystyle  (a^2-a)_{a^2-2a} = (a^2-a-1)_{a^2-2a+1}.


One of the great classical triumphs of complex analysis was in providing the first complete proof (by Hadamard and de la Vallée Poussin in 1896) of arguably the most important theorem in analytic number theory, the prime number theorem:

Theorem 1 (Prime number theorem) Let {\pi(x)} denote the number of primes less than a given real number {x}. Then

\displaystyle  \lim_{x \rightarrow \infty} \frac{\pi(x)}{x/\ln x} = 1

(or in asymptotic notation, {\pi(x) = (1+o(1)) \frac{x}{\ln x}} as {x \rightarrow \infty}).

(Actually, it turns out to be slightly more natural to replace the approximation {\frac{x}{\ln x}} in the prime number theorem by the logarithmic integral {\int_2^x \frac{dt}{\ln t}}, which happens to be a more precise approximation, but we will not stress this point here.)

The complex-analytic proof of this theorem hinges on the study of a key meromorphic function related to the prime numbers, the Riemann zeta function {\zeta}. Initially, it is only defined on the half-plane {\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}:

Definition 2 (Riemann zeta function, preliminary definition) Let {s \in {\bf C}} be such that {\mathrm{Re} s > 1}. Then we define

\displaystyle  \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}. \ \ \ \ \ (1)

Note that the series is locally uniformly convergent in the half-plane {\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}, so in particular {\zeta} is holomorphic on this region. In previous notes we have already evaluated some special values of this function:

\displaystyle  \zeta(2) = \frac{\pi^2}{6}; \quad \zeta(4) = \frac{\pi^4}{90}; \quad \zeta(6) = \frac{\pi^6}{945}. \ \ \ \ \ (2)

However, it turns out that the zeroes (and pole) of this function are of far greater importance to analytic number theory, particularly with regards to the study of the prime numbers.

The Riemann zeta function has several remarkable properties, some of which we summarise here:

Theorem 3 (Basic properties of the Riemann zeta function)
  • (i) (Euler product formula) For any {s \in {\bf C}} with {\mathrm{Re} s > 1}, we have

    \displaystyle  \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1} \ \ \ \ \ (3)

    where the product is absolutely convergent (and locally uniform in {s}) and is over the prime numbers {p = 2, 3, 5, \dots}.
  • (ii) (Trivial zero-free region) {\zeta(s)} has no zeroes in the region {\{s: \mathrm{Re}(s) > 1 \}}.
  • (iii) (Meromorphic continuation) {\zeta} has a unique meromorphic continuation to the complex plane (which by abuse of notation we also call {\zeta}), with a simple pole at {s=1} and no other poles. Furthermore, the Riemann xi function

    \displaystyle  \xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(s/2) \zeta(s) \ \ \ \ \ (4)

    is an entire function of order {1} (after removing all singularities). The function {(s-1) \zeta(s)} is an entire function of order one after removing the singularity at {s=1}.
  • (iv) (Functional equation) After applying the meromorphic continuation from (iii), we have

    \displaystyle  \zeta(s) = 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s) \zeta(1-s) \ \ \ \ \ (5)

    for all {s \in {\bf C}} (excluding poles). Equivalently, we have

    \displaystyle  \xi(s) = \xi(1-s) \ \ \ \ \ (6)

    for all {s \in {\bf C}}. (The equivalence between (5) and (6) is a routine consequence of the Euler reflection formula and the Legendre duplication formula, see Exercises 26 and 31 of Notes 1.)

Proof: We just prove (i) and (ii) for now, leaving (iii) and (iv) for later sections.

The claim (i) is an encoding of the fundamental theorem of arithmetic, which asserts that every natural number {n} is uniquely representable as a product {n = \prod_p p^{a_p}} over primes, where the {a_p} are natural numbers, all but finitely many of which are zero. Writing this representation as {\frac{1}{n^s} = \prod_p \frac{1}{p^{a_p s}}}, we see that

\displaystyle  \sum_{n \in S_{x,m}} \frac{1}{n^s} = \prod_{p \leq x} \sum_{a=0}^m \frac{1}{p^{as}}

whenever {x \geq 1}, {m \geq 0}, and {S_{x,m}} consists of all the natural numbers of the form {n = \prod_{p \leq x} p^{a_p}} for some {a_p \leq m}. Sending {m} and {x} to infinity, we conclude from monotone convergence and the geometric series formula that

\displaystyle  \sum_{n=1}^\infty \frac{1}{n^s} = \prod_{p} \sum_{a=0}^\infty \frac{1}{p^{as}} = \prod_p (1 - \frac{1}{p^s})^{-1}

whenever {s>1} is real, and then from dominated convergence we see that the same formula holds for complex {s} with {\mathrm{Re} s > 1} as well. Local uniform convergence then follows from the product form of the Weierstrass {M}-test (Exercise 19 of Notes 1).

The claim (ii) is immediate from (i) since the Euler product {\prod_p (1-\frac{1}{p^s})^{-1}} is absolutely convergent and all terms are non-zero. \Box
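As a numerical illustration of the Euler product (with arbitrary truncation points), one can compare a truncation of the series (1) at {s=2} with a truncation of the product (3); both approach {\zeta(2) = \pi^2/6 = 1.6449\dots}:

```python
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p in range(2, n + 1) if sieve[p]]

s = 2.0
partial_sum = sum(n ** -s for n in range(1, 10**6 + 1))
partial_product = 1.0
for p in primes_up_to(10**4):
    partial_product *= 1 / (1 - p ** -s)

print(partial_sum, partial_product)  # both close to pi^2/6 = 1.6449340...
```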

We remark that by sending {s} to {1} in Theorem 3(i) we conclude that

\displaystyle  \sum_{n=1}^\infty \frac{1}{n} = \prod_p (1-\frac{1}{p})^{-1}

and from the divergence of the harmonic series we then conclude Euler’s theorem {\sum_p \frac{1}{p} = \infty}. This can be viewed as a weak version of the prime number theorem, and already illustrates the potential applicability of the Riemann zeta function to control the distribution of the prime numbers.

The meromorphic continuation (iii) of the zeta function is initially surprising, but can be interpreted either as a manifestation of the extremely regular spacing of the natural numbers {n} occurring in the sum (1), or as a consequence of various integral representations of {\zeta} (or slight modifications thereof). We will focus in this set of notes on a particular representation of {\zeta} as essentially the Mellin transform of the theta function {\theta} that briefly appeared in previous notes, and the functional equation (iv) can then be viewed as a consequence of the modularity of that theta function. This in turn was established using the Poisson summation formula, so one can view the functional equation as ultimately being a manifestation of Poisson summation. (For a direct proof of the functional equation via Poisson summation, see these notes.)

Henceforth we work with the meromorphic continuation of {\zeta}. The functional equation (iv), when combined with special values of {\zeta} such as (2), gives some additional values of {\zeta} outside of its initial domain {\{s: \mathrm{Re} s > 1\}}, most famously

\displaystyle  \zeta(-1) = -\frac{1}{12}.

If one formally compares this formula with (1), one arrives at the infamous identity

\displaystyle  1 + 2 + 3 + \dots = -\frac{1}{12}

although this identity has to be interpreted in a suitable non-classical sense in order for it to be rigorous (see this previous blog post for further discussion).

From Theorem 3 and the non-vanishing nature of {\Gamma}, we see that {\zeta} has simple zeroes (known as trivial zeroes) at the negative even integers {-2, -4, \dots}, and all other zeroes (the non-trivial zeroes) inside the critical strip {\{ s \in {\bf C}: 0 \leq \mathrm{Re} s \leq 1 \}}. (The non-trivial zeroes are conjectured to all be simple, but this is hopelessly far from being proven at present.) As we shall see shortly, these latter zeroes turn out to be closely related to the distribution of the primes. The functional equation tells us that if {\rho} is a non-trivial zero then so is {1-\rho}; also, we have the identity

\displaystyle  \zeta(s) = \overline{\zeta(\overline{s})} \ \ \ \ \ (7)

for all {s>1} by (1), hence for all {s} (except the pole at {s=1}) by meromorphic continuation. Thus if {\rho} is a non-trivial zero then so is {\overline{\rho}}. We conclude that the set of non-trivial zeroes is symmetric under reflection about both the real axis and the critical line {\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}. We have the following infamous conjecture:

Conjecture 4 (Riemann hypothesis) All the non-trivial zeroes of {\zeta} lie on the critical line {\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}.

This conjecture would have many implications in analytic number theory, particularly with regard to the distribution of the primes. Of course, it is far from proven at present, but the partial results we have towards this conjecture are still sufficient to establish results such as the prime number theorem.

Return now to the original region where {\mathrm{Re} s > 1}. To take more advantage of the Euler product formula (3), we take complex logarithms to conclude that

\displaystyle  -\log \zeta(s) = \sum_p \log(1 - \frac{1}{p^s})

for suitable branches of the complex logarithm, and then on taking derivatives (using for instance the generalised Cauchy integral formula and Fubini’s theorem to justify the interchange of summation and derivative) we see that

\displaystyle  -\frac{\zeta'(s)}{\zeta(s)} = \sum_p \frac{\ln p/p^s}{1 - \frac{1}{p^s}}.

From the geometric series formula we have

\displaystyle  \frac{\ln p/p^s}{1 - \frac{1}{p^s}} = \sum_{j=1}^\infty \frac{\ln p}{p^{js}}

and so (by another application of Fubini’s theorem) we have the identity

\displaystyle  -\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}, \ \ \ \ \ (8)

for {\mathrm{Re} s > 1}, where the von Mangoldt function {\Lambda(n)} is defined to equal {\ln p} whenever {n = p^j} is a power of a prime {p} for some {j=1,2,\dots}, and {\Lambda(n)=0} otherwise. The contribution of the higher prime powers {p^2, p^3, \dots} is negligible in practice, and as a first approximation one can think of the von Mangoldt function as the indicator function of the primes, weighted by the logarithm function.

The series {\sum_{n=1}^\infty \frac{1}{n^s}} and {\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}} that show up in the above formulae are examples of Dirichlet series, which are a convenient device to transform various sequences of arithmetic interest into holomorphic or meromorphic functions. Here are some more examples:

Exercise 5 (Standard Dirichlet series) Let {s} be a complex number with {\mathrm{Re} s > 1}.
  • (i) Show that {-\zeta'(s) = \sum_{n=1}^\infty \frac{\ln n}{n^s}}.
  • (ii) Show that {\zeta^2(s) = \sum_{n=1}^\infty \frac{\tau(n)}{n^s}}, where {\tau(n) := \sum_{d|n} 1} is the divisor function of {n} (the number of divisors of {n}).
  • (iii) Show that {\frac{1}{\zeta(s)} = \sum_{n=1}^\infty \frac{\mu(n)}{n^s}}, where {\mu(n)} is the Möbius function, defined to equal {(-1)^k} when {n} is the product of {k} distinct primes for some {k \geq 0}, and {0} otherwise.
  • (iv) Show that {\frac{\zeta(2s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\lambda(n)}{n^s}}, where {\lambda(n)} is the Liouville function, defined to equal {(-1)^k} when {n} is the product of {k} (not necessarily distinct) primes for some {k \geq 0}.
  • (v) Show that {\log \zeta(s) = \sum_{n=1}^\infty \frac{\Lambda(n)/\ln n}{n^s}}, where {\log \zeta} is the holomorphic branch of the logarithm that is real for {s>1}, and with the convention that {\Lambda(n)/\ln n} vanishes for {n=1}.
  • (vi) Use the fundamental theorem of arithmetic to show that the von Mangoldt function is the unique function {\Lambda: {\bf N} \rightarrow {\bf R}} such that

    \displaystyle  \ln n = \sum_{d|n} \Lambda(d)

    for every positive integer {n}. Use this and (i) to provide an alternate proof of the identity (8). Thus we see that (8) is really just another encoding of the fundamental theorem of arithmetic. (A quick numerical check of this identity is sketched below.)
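
As a quick sanity check on (vi) (the snippet below is my own illustration, not part of the notes, and assumes the availability of the sympy library for factorisation), one can verify the identity {\ln n = \sum_{d|n} \Lambda(d)} numerically for a few values of {n}:

```python
import math
from sympy import divisors, factorint

def von_mangoldt(n):
    # Lambda(n) = ln p when n = p^j is a prime power, and 0 otherwise
    f = factorint(n)
    return math.log(next(iter(f))) if len(f) == 1 else 0.0

for n in (2, 12, 30, 64, 97, 360):
    print(n, math.log(n), sum(von_mangoldt(d) for d in divisors(n)))
    # the last two columns agree, reflecting the fundamental theorem of arithmetic
```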

Given the appearance of the von Mangoldt function {\Lambda}, it is natural to reformulate the prime number theorem in terms of this function:

Theorem 6 (Prime number theorem, von Mangoldt form) One has

\displaystyle  \lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} \Lambda(n) = 1

(or in asymptotic notation, {\sum_{n\leq x} \Lambda(n) = x + o(x)} as {x \rightarrow \infty}).

Let us see how Theorem 6 implies Theorem 1. Firstly, for any {x \geq 2}, we can write

\displaystyle  \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + \sum_{j=2}^\infty \sum_{p \leq x^{1/j}} \ln p.

The sum {\sum_{p \leq x^{1/j}} \ln p} is non-zero for only {O(\ln x)} values of {j}, and is of size {O( x^{1/2} \ln x )}, thus

\displaystyle  \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + O( x^{1/2} \ln^2 x ).

Since {x^{1/2} \ln^2 x = o(x)}, we conclude from Theorem 6 that

\displaystyle  \sum_{p \leq x} \ln p = x + o(x)

as {x \rightarrow \infty}. Next, observe from the fundamental theorem of calculus that

\displaystyle  \frac{1}{\ln p} - \frac{1}{\ln x} = \int_p^x \frac{1}{\ln^2 y} \frac{dy}{y}.

Multiplying by {\ln p} and summing over all primes {p \leq x}, we conclude that

\displaystyle  \pi(x) - \frac{\sum_{p \leq x} \ln p}{\ln x} = \int_2^x \sum_{p \leq y} \ln p \frac{1}{\ln^2 y} \frac{dy}{y}.

From Theorem 6 we certainly have {\sum_{p \leq y} \ln p = O(y)}, thus

\displaystyle  \pi(x) - \frac{x + o(x)}{\ln x} = O( \int_2^x \frac{dy}{\ln^2 y} ).

By splitting the integral into the ranges {2 \leq y \leq \sqrt{x}} and {\sqrt{x} < y \leq x} we see that the right-hand side is {o(x/\ln x)}, and Theorem 1 follows.
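
Both forms of the prime number theorem can be illustrated numerically. The following short computation (mine, not from the notes; it assumes the sympy library for prime generation) tabulates the ratios {\frac{1}{x} \sum_{n \leq x} \Lambda(n)}, {\frac{1}{x} \sum_{p \leq x} \ln p}, and {\pi(x) \ln x / x}:

```python
import math
from sympy import primerange

# Chebyshev-type sums: psi(x) = sum_{n<=x} Lambda(n), theta(x) = sum_{p<=x} ln p
for x in (10**4, 10**5, 10**6):
    theta = psi = 0.0
    pi_x = 0
    for p in primerange(2, x + 1):
        pi_x += 1
        theta += math.log(p)
        pk = p
        while pk <= x:      # the prime powers p, p^2, ... <= x each contribute ln p
            psi += math.log(p)
            pk *= p
    print(x, psi / x, theta / x, pi_x * math.log(x) / x)
```

Note how close the first two ratios are to each other (their difference is the {O(x^{1/2} \ln^2 x)} term above), while {\pi(x) \ln x / x} drifts towards {1} noticeably more slowly.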

Exercise 7 Show that Theorem 1 conversely implies Theorem 6.

The alternate form (8) of the Euler product identity connects the primes (represented here via proxy by the von Mangoldt function) with the logarithmic derivative of the zeta function, and can be used as a starting point for describing further relationships between {\zeta} and the primes. Most famously, we shall see later in these notes that it leads to the remarkably precise Riemann-von Mangoldt explicit formula:

Theorem 8 (Riemann-von Mangoldt explicit formula) For any non-integer {x > 1}, we have

\displaystyle  \sum_{n \leq x} \Lambda(n) = x - \lim_{T \rightarrow \infty} \sum_{\rho: |\hbox{Im}(\rho)| \leq T} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2} \ln( 1 - x^{-2} )

where {\rho} ranges over the non-trivial zeroes of {\zeta} with imaginary part in {[-T,T]}. Furthermore, the convergence of the limit is locally uniform in {x}.

Actually, it turns out that this formula is in some sense too precise; in applications it is often more convenient to work with smoothed variants of this formula in which the sum on the left-hand side is smoothed out, but the contribution of zeroes with large imaginary part is damped; see Exercise 22. Nevertheless, this formula clearly illustrates how the non-trivial zeroes {\rho} of the zeta function influence the primes. Indeed, if one formally differentiates the above formula in {x}, one is led to the (quite nonrigorous) approximation

\displaystyle  \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (9)

or (writing {\rho = \sigma+i\gamma})

\displaystyle  \Lambda(n) \approx 1 - \sum_{\rho = \sigma+i\gamma} \frac{n^{i\gamma}}{n^{1-\sigma}}.

Thus we see that each zero {\rho = \sigma + i\gamma} induces an oscillation in the von Mangoldt function, with {\gamma} controlling the frequency of the oscillation and {\sigma} the rate to which the oscillation dies out as {n \rightarrow \infty}. This relationship is sometimes known informally as “the music of the primes”.
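
One can watch the explicit formula in action numerically. The snippet below (my own illustration, not from the notes; it assumes the mpmath library, whose zetazero routine returns the {k^{th}} non-trivial zero {\rho_k = \frac{1}{2} + i\gamma_k} in the upper half-plane, together with sympy for prime generation) compares {\sum_{n \leq x} \Lambda(n)} with the right-hand side of Theorem 8 truncated to the first {50} conjugate pairs of zeroes:

```python
import math
from sympy import primerange
from mpmath import zetazero

x = 100.5                                # a non-integer, as required by Theorem 8

def psi(x):                              # sum_{n <= x} Lambda(n)
    total = 0.0
    for p in primerange(2, int(x) + 1):
        pk = p
        while pk <= x:
            total += math.log(p)
            pk *= p
    return total

K = 50
zero_sum = 0.0
for k in range(1, K + 1):
    rho = zetazero(k)                    # k-th zero in the upper half-plane
    # the zeroes come in conjugate pairs, and x^conj(rho)/conj(rho) = conj(x^rho/rho)
    zero_sum += 2 * (x ** rho / rho).real

rhs = x - zero_sum - math.log(2 * math.pi) - 0.5 * math.log(1 - x ** -2)
print(psi(x), rhs)                       # close; the error comes from the omitted zeroes
```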

Comparing Theorem 8 with Theorem 6, it is natural to suspect that the key step in the proof of the latter is to establish the following slight but important extension of Theorem 3(ii), which can be viewed as a very small step towards the Riemann hypothesis:

Theorem 9 (Slight enlargement of zero-free region) There are no zeroes of {\zeta} on the line {\{ 1+it: t \in {\bf R} \}}.

It is not quite immediate to see how Theorem 6 follows from Theorem 8 and Theorem 9, but we will demonstrate it below the fold.

Although Theorem 9 only seems like a slight improvement of Theorem 3(ii), proving it is surprisingly non-trivial. The basic idea is the following: if there was a zero at {1+it}, then there would also be a different zero at {1-it} (note {t} cannot vanish due to the pole at {s=1}), and then the approximation (9) becomes

\displaystyle  \Lambda(n) \approx 1 - n^{it} - n^{-it} + \dots = 1 - 2 \cos(t \ln n) + \dots.

But the expression {1 - 2 \cos(t \ln n)} can be negative for large regions of the variable {n}, whereas {\Lambda(n)} is always non-negative. This conflict eventually leads to a contradiction, but it is not immediately obvious how to make this argument rigorous. We will present here the classical approach to doing so using a trigonometric identity of Mertens.
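
The identity in question is the elementary inequality {3 + 4\cos\theta + \cos 2\theta = 2(1+\cos\theta)^2 \geq 0}, which when combined with (8) yields the lower bound {\zeta(\sigma)^3 |\zeta(\sigma+it)|^4 |\zeta(\sigma+2it)| \geq 1} for real {\sigma > 1}; a zero at {1+it} would force this product to vanish as {\sigma \rightarrow 1^+} (the triple pole of {\zeta(\sigma)^3} being overwhelmed by the fourth-order zero of {|\zeta(\sigma+it)|^4}), giving the desired contradiction. A quick numerical check (mine, not from the notes; it assumes the mpmath library):

```python
import numpy as np
from mpmath import zeta

# the elementary trigonometric inequality behind Mertens' argument
theta = np.linspace(0, 2 * np.pi, 1000)
assert (3 + 4 * np.cos(theta) + np.cos(2 * theta)).min() >= -1e-12

# its consequence: zeta(s)^3 |zeta(s+it)|^4 |zeta(s+2it)| >= 1 for real s > 1
for sigma, t in ((1.1, 5.0), (1.01, 14.0), (1.001, 21.0)):
    prod = abs(zeta(sigma) ** 3 * zeta(sigma + 1j * t) ** 4 * zeta(sigma + 2j * t))
    print(sigma, t, prod)   # at least 1, even very close to the line sigma = 1
```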

In fact, Theorem 9 is basically equivalent to the prime number theorem:

Exercise 10 For the purposes of this exercise, assume Theorem 6, but do not assume Theorem 9. For any non-zero real {t}, show that

\displaystyle  -\frac{\zeta'(\sigma+it)}{\zeta(\sigma+it)} = o( \frac{1}{\sigma-1})

as {\sigma \rightarrow 1^+}, where {o( \frac{1}{\sigma-1})} denotes a quantity that goes to zero as {\sigma \rightarrow 1^+} after being multiplied by {\sigma-1}. Use this to derive Theorem 9.

This equivalence can help explain why the prime number theorem is remarkably non-trivial to prove, and why the Riemann zeta function has to be either explicitly or implicitly involved in the proof.

This post is only intended as the briefest of introductions to complex-analytic methods in analytic number theory; also, we have not chosen the shortest route to the prime number theorem, electing instead to travel in directions that particularly showcase the complex-analytic results introduced in this course. For some further discussion see this previous set of lecture notes, particularly Notes 2 and Supplement 3 (with much of the material in this post drawn from the latter).

Read the rest of this entry »

Previous set of notes: Notes 2. Next set of notes: Notes 4.

On the real line, the quintessential examples of a periodic function are the (normalised) sine and cosine functions {\sin(2\pi x)}, {\cos(2\pi x)}, which are {1}-periodic in the sense that

\displaystyle  \sin(2\pi(x+1)) = \sin(2\pi x); \quad \cos(2\pi (x+1)) = \cos(2\pi x).

By taking various polynomial combinations of {\sin(2\pi x)} and {\cos(2\pi x)} we obtain more general trigonometric polynomials that are {1}-periodic; and the theory of Fourier series tells us that all other {1}-periodic functions (with reasonable integrability conditions) can be approximated in various senses by such polynomial combinations. Using Euler’s identity, one can use {e^{2\pi ix}} and {e^{-2\pi ix}} in place of {\sin(2\pi x)} and {\cos(2\pi x)} as the basic generating functions here, provided of course one is willing to use complex coefficients instead of real ones. Of course, by rescaling one can also make similar statements for other periods than {1}.

{1}-periodic functions {f: {\bf R} \rightarrow {\bf C}} can also be identified (by abuse of notation) with functions {f: {\bf R}/{\bf Z} \rightarrow {\bf C}} on the quotient space {{\bf R}/{\bf Z}} (known as the additive {1}-torus or additive unit circle), or with functions {f: [0,1] \rightarrow {\bf C}} on the fundamental domain (up to boundary) {[0,1]} of that quotient space with the periodic boundary condition {f(0)=f(1)}.

The map {x \mapsto (\cos(2\pi x), \sin(2\pi x))} also identifies the additive unit circle {{\bf R}/{\bf Z}} with the geometric unit circle {S^1 = \{ (x,y) \in {\bf R}^2: x^2+y^2=1\} \subset {\bf R}^2}, thanks in large part to the fundamental trigonometric identity {\cos^2 x + \sin^2 x = 1}; this can also be identified with the multiplicative unit circle {S^1 = \{ z \in {\bf C}: |z|=1 \}}. (Usually by abuse of notation we refer to all of these three sets simultaneously as the “unit circle”.) Trigonometric polynomials on the additive unit circle then correspond to ordinary polynomials of the real coefficients {x,y} of the geometric unit circle, or Laurent polynomials of the complex variable {z}.

What about periodic functions on the complex plane? We can start with singly periodic functions {f: {\bf C} \rightarrow {\bf C}} which obey a periodicity relationship {f(z+\omega)=f(z)} for all {z} in the domain and some period {\omega \in {\bf C} \backslash \{0\}}; such functions can also be viewed as functions on the “additive cylinder” {\omega {\bf Z} \backslash {\bf C}} (or equivalently {{\bf C} / \omega {\bf Z}}). We can rescale {\omega=1} as before. For holomorphic functions, we have the following characterisations:

Proposition 1 (Description of singly periodic holomorphic functions)
  • (i) Every {1}-periodic entire function {f: {\bf C} \rightarrow {\bf C}} has an absolutely convergent expansion

    \displaystyle  f(z) = \sum_{n=-\infty}^\infty a_n e^{2\pi i nz} = \sum_{n=-\infty}^\infty a_n q^n \ \ \ \ \ (1)

    where {q} is the nome {q := e^{2\pi i z}}, and the {a_n} are complex coefficients such that

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} = \limsup_{n \rightarrow +\infty} |a_{-n}|^{1/n} = 0. \ \ \ \ \ (2)

    Conversely, every doubly infinite sequence {(a_n)_{n \in {\bf Z}}} of coefficients obeying (2) gives rise to a {1}-periodic entire function {f: {\bf C} \rightarrow {\bf C}} via the formula (1).
  • (ii) Every bounded {1}-periodic holomorphic function {f: {\bf H} \rightarrow {\bf C}} on the upper half-plane {\{ z: \mathrm{Im}(z) > 0\}} has an expansion

    \displaystyle  f(z) = \sum_{n=0}^\infty a_n e^{2\pi i nz} = \sum_{n=0}^\infty a_n q^n \ \ \ \ \ (3)

    where the {a_n} are complex coefficients such that

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} \leq 1. \ \ \ \ \ (4)

    Conversely, every infinite sequence {(a_n)_{n \in {\bf N}}} obeying (4) gives rise to a {1}-periodic holomorphic function {f: {\bf H} \rightarrow {\bf C}} which is bounded away from the real axis (i.e., bounded on {\{ z: \mathrm{Im}(z) \geq \varepsilon\}} for every {\varepsilon > 0}).
In both cases, the coefficients {a_n} can be recovered from {f} by the Fourier inversion formula

\displaystyle  a_n = \int_{\gamma_{z_0 \rightarrow z_0+1}} f(z) e^{-2\pi i nz}\ dz \ \ \ \ \ (5)

for any {z_0} in {{\bf C}} (in case (i)) or {{\bf H}} (in case (ii)).

Proof: If {f: {\bf C} \rightarrow {\bf C}} is {1}-periodic, then it can be expressed as {f(z) = F(q) = F(e^{2\pi i z})} for some function {F: {\bf C} \backslash \{0\} \rightarrow {\bf C}} on the “multiplicative cylinder” {{\bf C} \backslash \{0\}}, since the fibres of the map {z \mapsto e^{2\pi i z}} are cosets of the integers {{\bf Z}}, on which {f} is constant by hypothesis. As the map {z \mapsto e^{2\pi i z}} is a covering map from {{\bf C}} to {{\bf C} \backslash \{0\}}, we see that {F} will be holomorphic if and only if {f} is. Thus {F} must have a Laurent series expansion {F(q) = \sum_{n=-\infty}^\infty a_n q^n} with coefficients {a_n} obeying (2), which gives (1), and the inversion formula (5) follows from the usual contour integration formula for Laurent series coefficients. The converse direction to (i) also follows by reversing the above arguments.

For part (ii), we observe that the map {z \mapsto e^{2\pi i z}} is also a covering map from {{\bf H}} to the punctured disk {D(0,1) \backslash \{0\}}, so we can argue as before except that now {F} is a bounded holomorphic function on the punctured disk. By the Riemann singularity removal theorem (Exercise 35 of 246A Notes 3) {F} extends to be holomorphic on all of {D(0,1)}, and thus has a Taylor expansion {F(q) = \sum_{n=0}^\infty a_n q^n} for some coefficients {a_n} obeying (4). The argument now proceeds as with part (i). \Box
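
As a quick illustration of the inversion formula (5) (the snippet below is mine, not from the notes, and assumes numpy; since the integrand is {1}-periodic in the parameter {t}, an equally spaced Riemann sum along the contour {\gamma_{z_0 \rightarrow z_0+1}} recovers trigonometric polynomial coefficients to machine precision):

```python
import numpy as np

# a 1-periodic entire test function with coefficients a_{-1} = 0.5, a_1 = 1, a_2 = 2
f = lambda z: (0.5 * np.exp(-2j * np.pi * z) + np.exp(2j * np.pi * z)
               + 2 * np.exp(4j * np.pi * z))

z0 = 0.7 + 0.3j                    # the base point z_0 can be anywhere in C
t = np.arange(4096) / 4096.0       # equally spaced points on the segment z_0 -> z_0 + 1
for n in (-1, 0, 1, 2, 3):
    a_n = np.mean(f(z0 + t) * np.exp(-2j * np.pi * n * (z0 + t)))
    print(n, np.round(a_n, 12))    # recovers 0.5, 0, 1, 2, 0, independently of z_0
```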

The additive cylinder {{\bf Z} \backslash {\bf C}} and the multiplicative cylinder {{\bf C} \backslash \{0\}} can both be identified (on the level of smooth manifolds, at least) with the geometric cylinder {\{ (x,y,z) \in {\bf R}^3: x^2+y^2=1\}}, but we will not use this identification here.

Now let us turn attention to doubly periodic functions of a complex variable {z}, that is to say functions {f} that obey two periodicity relations

\displaystyle  f(z+\omega_1) = f(z); \quad f(z+\omega_2) = f(z)

for all {z \in {\bf C}} and some periods {\omega_1,\omega_2 \in {\bf C}}, which to avoid degeneracies we will assume to be linearly independent over the reals (thus {\omega_1,\omega_2} are non-zero and the ratio {\omega_2/\omega_1} is not real). One can rescale {\omega_1,\omega_2} by a common scaling factor {\lambda \in {\bf C} \backslash \{0\}} to normalise either {\omega_1=1} or {\omega_2=1}, but one of course cannot simultaneously normalise both parameters in this fashion. As in the singly periodic case, such functions can also be identified with functions on the additive {2}-torus {\Lambda \backslash {\bf C}}, where {\Lambda} is the lattice {\Lambda := \omega_1 {\bf Z} + \omega_2 {\bf Z}}, or with functions {f} on the solid parallelogram bounded by the contour {\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}} (a fundamental domain up to boundary for that torus), obeying the boundary periodicity conditions

\displaystyle  f(z+\omega_1) = f(z)

for {z} in the edge {\gamma_{\omega_2 \rightarrow 0}}, and

\displaystyle  f(z+\omega_2) = f(z)

for {z} in the edge {\gamma_{0 \rightarrow \omega_1}}.

Within the world of holomorphic functions, the collection of doubly periodic functions is boring:

Proposition 2 Let {f: {\bf C} \rightarrow {\bf C}} be an entire doubly periodic function (with periods {\omega_1,\omega_2} linearly independent over {{\bf R}}). Then {f} is constant.

In the language of Riemann surfaces, this proposition asserts that the torus {\Lambda \backslash {\bf C}} is a non-hyperbolic Riemann surface; it cannot be holomorphically mapped non-trivially into a bounded subset of the complex plane.

Proof: The fundamental domain (up to boundary) enclosed by {\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}} is compact, hence {f} is bounded on this domain, hence bounded on all of {{\bf C}} by double periodicity. The claim now follows from Liouville’s theorem. (One could alternatively have argued here using the compactness of the torus {(\omega_1 {\bf Z} + \omega_2 {\bf Z}) \backslash {\bf C}}.) \Box

To obtain more interesting examples of doubly periodic functions, one must therefore turn to the world of meromorphic functions – or equivalently, holomorphic functions into the Riemann sphere {{\bf C} \cup \{\infty\}}. As it turns out, a particularly fundamental example of such a function is the Weierstrass elliptic function

\displaystyle  \wp(z) := \frac{1}{z^2} + \sum_{z_0 \in \Lambda \backslash \{0\}} \left( \frac{1}{(z-z_0)^2} - \frac{1}{z_0^2} \right) \ \ \ \ \ (6)

which plays a role in doubly periodic functions analogous to the role of {x \mapsto \cos(2\pi x)} for {1}-periodic real functions. This function will have a double pole at the origin {0}, and more generally at all other points on the lattice {\Lambda}, but no other poles. The derivative

\displaystyle  \wp'(z) = -2 \sum_{z_0 \in \Lambda} \frac{1}{(z-z_0)^3} \ \ \ \ \ (7)

of the Weierstrass function is another doubly periodic meromorphic function, now with a triple pole at every point of {\Lambda}, and plays a role analogous to {x \mapsto \sin(2\pi x)}. Remarkably, all the other doubly periodic meromorphic functions with these periods will turn out to be rational combinations of {\wp} and {\wp'}; furthermore, in analogy with the identity {\cos^2 x+ \sin^2 x = 1}, one has an identity of the form

\displaystyle  \wp'(z)^2 = 4 \wp(z)^3 - g_2 \wp(z) - g_3 \ \ \ \ \ (8)

for all {z \in {\bf C}} (avoiding poles) and some complex numbers {g_2,g_3} that depend on the lattice {\Lambda}. Indeed, much as the map {x \mapsto (\cos 2\pi x, \sin 2\pi x)} creates a diffeomorphism from the additive unit circle {{\bf R}/{\bf Z}} to the geometric unit circle {\{ (x,y) \in{\bf R}^2: x^2+y^2=1\}}, the map {z \mapsto (\wp(z), \wp'(z))} turns out to be a complex diffeomorphism between the torus {(\omega_1 {\bf Z} + \omega_2 {\bf Z}) \backslash {\bf C}} and the elliptic curve

\displaystyle  \{ (z, w) \in {\bf C}^2: w^2 = 4z^3 - g_2 z - g_3 \} \cup \{\infty\}

with the convention that {(\wp,\wp')} maps the origin {\omega_1 {\bf Z} + \omega_2 {\bf Z}} of the torus to the point {\infty} at infinity. (Indeed, one can view elliptic curves as “multiplicative tori”, and both the additive and multiplicative tori can be identified as smooth manifolds with the more familiar geometric torus, but we will not use such an identification here.) This fundamental identification with elliptic curves and tori motivates many of the further remarkable properties of elliptic curves; for instance, the fact that tori are obviously an abelian group gives rise to an abelian group law on elliptic curves (and this law can be interpreted as an analogue of the trigonometric sum identities for {\wp, \wp'}). The description of the various meromorphic functions on the torus also helps motivate the more general Riemann-Roch theorem that is a fundamental law governing meromorphic functions on other compact Riemann surfaces (and is discussed further in these 246C notes).

So far we have focused on studying a single torus {\Lambda \backslash {\bf C}}. However, another important mathematical object of study is the space of all such tori, modulo isomorphism; this is a basic example of a moduli space, known as the (classical, level one) modular curve {X_0(1)}. This curve can be described in a number of ways. On the one hand, it can be viewed as the upper half-plane {{\bf H} = \{ z: \mathrm{Im}(z) > 0 \}} quotiented out by the discrete group {SL_2({\bf Z})}; on the other hand, by using the {j}-invariant, it can be identified with the complex plane {{\bf C}}; alternatively, one can compactify the modular curve and identify this compactification with the Riemann sphere {{\bf C} \cup \{\infty\}}. (This identification, by the way, produces a very short proof of the little and great Picard theorems, which we proved in 246A Notes 4.)

Functions on the modular curve (such as the {j}-invariant) can be viewed as {SL_2({\bf Z})}-invariant functions on {{\bf H}}, and include the important class of modular functions; they naturally generalise to the larger class of (weakly) modular forms, which are functions on {{\bf H}} which transform in a very specific way under the {SL_2({\bf Z})}-action, and which are ubiquitous throughout mathematics, and particularly in number theory. Basic examples of modular forms include the Eisenstein series, which are also the Laurent coefficients of the Weierstrass elliptic functions {\wp}. More number theoretic examples of modular forms include (suitable powers of) theta functions {\theta}, and the modular discriminant {\Delta}. Modular forms are {1}-periodic functions on the half-plane, and hence by Proposition 1 come with Fourier coefficients {a_n}; these coefficients often turn out to encode a surprising amount of number-theoretic information; a dramatic example of this is the famous modularity theorem, a special case of which was used amongst other things to establish Fermat’s last theorem. Modular forms can be generalised to other discrete groups than {SL_2({\bf Z})} (such as congruence groups) and to other domains than the half-plane {{\bf H}}, leading to the important larger class of automorphic forms, which are of major importance in number theory and representation theory, but which are well outside the scope of this course to discuss.
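
The identity (8), together with the classical formulae {g_2 = 60 \sum_{z_0 \in \Lambda \backslash \{0\}} z_0^{-4}} and {g_3 = 140 \sum_{z_0 \in \Lambda \backslash \{0\}} z_0^{-6}}, can be tested numerically by brute force. The sketch below (mine, not from the notes; it assumes numpy, works with the square lattice {\Lambda = {\bf Z} + i{\bf Z}}, and truncates the slowly convergent sum (6), so only a few digits of agreement should be expected):

```python
import numpy as np

R = 400                                    # truncation radius for the lattice sums
m, n = np.meshgrid(np.arange(-R, R + 1), np.arange(-R, R + 1))
w = (m + 1j * n).ravel()
w = w[w != 0]                              # the nonzero lattice points of Z + iZ

g2 = 60 * np.sum(w ** (-4))                # Eisenstein series; real and nonzero here
g3 = 140 * np.sum(w ** (-6))               # vanishes for the square lattice, by symmetry

def wp(z):                                 # the regularised sum (6)
    return 1 / z ** 2 + np.sum(1 / (z - w) ** 2 - 1 / w ** 2)

def wp_prime(z):                           # the absolutely convergent sum (7)
    return -2 / z ** 3 - 2 * np.sum(1 / (z - w) ** 3)

z = 0.3 + 0.2j
print(wp_prime(z) ** 2)                    # the two lines agree up to truncation error
print(4 * wp(z) ** 3 - g2 * wp(z) - g3)
```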

Read the rest of this entry »

I’ve just uploaded to the arXiv my paper “The Ionescu-Wainger multiplier theorem and the adeles“. This paper revisits a useful multiplier theorem of Ionescu and Wainger on “major arc” Fourier multiplier operators on the integers {{\bf Z}} (or lattices {{\bf Z}^d}), and strengthens the bounds while also interpreting it from the viewpoint of the adelic integers {{\bf A}_{\bf Z}} (which were also used in my recent paper with Krause and Mirek).

For simplicity let us just work in one dimension. Any smooth function {m: {\bf R}/{\bf Z} \rightarrow {\bf C}} then defines a discrete Fourier multiplier operator {T_m: \ell^p({\bf Z}) \rightarrow \ell^p({\bf Z})} for any {1 \leq p \leq \infty} by the formula

\displaystyle  {\mathcal F}_{\bf Z} T_m f(\xi) := m(\xi) {\mathcal F}_{\bf Z} f(\xi)

where {{\mathcal F}_{\bf Z} f(\xi) := \sum_{n \in {\bf Z}} f(n) e(n \xi)} is the Fourier transform on {{\bf Z}}; similarly, any test function {m: {\bf R} \rightarrow {\bf C}} defines a continuous Fourier multiplier operator {T_m: L^p({\bf R}) \rightarrow L^p({\bf R})} by the formula

\displaystyle  {\mathcal F}_{\bf R} T_m f(\xi) := m(\xi) {\mathcal F}_{\bf R} f(\xi)

where {{\mathcal F}_{\bf R} f(\xi) := \int_{\bf R} f(x) e(x \xi)\ dx}. In both cases we refer to {m} as the symbol of the multiplier operator {T_m}.

We will be interested in discrete Fourier multiplier operators whose symbols are supported on a finite union of arcs. One way to construct such operators is by “folding” continuous Fourier multiplier operators into various target frequencies. To make this folding operation precise, given any continuous Fourier multiplier operator {T_m: L^p({\bf R}) \rightarrow L^p({\bf R})} and any frequency shift {\alpha \in {\bf R}/{\bf Z}}, we define the discrete Fourier multiplier operator {T_{m;\alpha}: \ell^p({\bf Z}) \rightarrow \ell^p({\bf Z})} by the formula

\displaystyle  {\mathcal F}_{\bf Z} T_{m;\alpha} f(\xi) := \sum_{\theta \in {\bf R}: \xi = \alpha + \theta} m(\theta) {\mathcal F}_{\bf Z} f(\xi)

or equivalently

\displaystyle  T_{m;\alpha} f(n) = \int_{\bf R} m(\theta) {\mathcal F}_{\bf Z} f(\alpha+\theta) e( n(\alpha+\theta) )\ d\theta.

More generally, given any finite set {\Sigma \subset {\bf R}/{\bf Z}}, we can form a multifrequency projection operator {T_{m;\Sigma}} on {\ell^p({\bf Z})} by the formula

\displaystyle  T_{m;\Sigma} := \sum_{\alpha \in \Sigma} T_{m;\alpha}

thus

\displaystyle  T_{m;\Sigma} f(n) = \sum_{\alpha \in \Sigma} \int_{\bf R} m(\theta) {\mathcal F}_{\bf Z} f(\alpha+\theta) e( n(\alpha+\theta) )\ d\theta.

This construction gives discrete Fourier multiplier operators whose symbol can be localised to a finite union of arcs. For instance, if {m: {\bf R} \rightarrow {\bf C}} is supported on {[-\varepsilon,\varepsilon]}, then {T_{m;\Sigma}} is a Fourier multiplier whose symbol is supported on the set {\bigcup_{\alpha \in \Sigma} \alpha + [-\varepsilon,\varepsilon]}.
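
To get a concrete feel for these multifrequency projections, one can caricature them on a finite cyclic group, replacing {{\bf Z}} by {{\bf Z}/N{\bf Z}} and the symbol by an indicator of the arcs. The toy model below is entirely my own, not from the paper; it assumes numpy, and all parameter choices are arbitrary. It builds the symbol {\sum_{\alpha \in \Sigma} 1_{[-\varepsilon,\varepsilon]}(\xi - \alpha)} with {\Sigma = \frac{1}{3}{\bf Z}/{\bf Z}} and checks that the resulting operator is a genuine frequency projection:

```python
import numpy as np

Nlen = 3 ** 7                          # model Z by the cyclic group Z/(3^7)Z
xi = np.fft.fftfreq(Nlen)              # frequencies, viewed as elements of R/Z
eps = 0.01
Sigma = (0.0, 1 / 3, 2 / 3)            # the frequency set (1/3)Z/Z

# symbol of T_{m;Sigma} for m = 1_{[-eps,eps]}: an indicator of three disjoint arcs
dist = lambda a: np.abs((xi - a + 0.5) % 1.0 - 0.5)   # distance on the circle R/Z
symbol = sum((dist(a) <= eps) for a in Sigma).astype(float)

T = lambda g: np.fft.ifft(symbol * np.fft.fft(g))     # the multiplier operator
f = np.random.default_rng(0).standard_normal(Nlen)
print(np.allclose(T(T(f)), T(f)))      # True: the symbol is {0,1}-valued
print(symbol.mean())                   # small fraction of frequency space retained
```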

There is a body of results relating the {\ell^p({\bf Z})} theory of discrete Fourier multiplier operators such as {T_{m;\alpha}} or {T_{m;\Sigma}} with the {L^p({\bf R})} theory of their continuous counterparts. For instance we have the basic result of Magyar, Stein, and Wainger:

Proposition 1 (Magyar-Stein-Wainger sampling principle) Let {1 \leq p \leq \infty} and {\alpha \in {\bf R}/{\bf Z}}.
  • (i) If {m: {\bf R} \rightarrow {\bf C}} is a smooth function supported in {[-1/2,1/2]}, then {\|T_{m;\alpha}\|_{B(\ell^p({\bf Z}))} \lesssim \|T_m\|_{B(L^p({\bf R}))}}, where {B(V)} denotes the operator norm of an operator {T: V \rightarrow V}.
  • (ii) More generally, if {m: {\bf R} \rightarrow {\bf C}} is a smooth function supported in {[-1/2Q,1/2Q]} for some natural number {Q}, then {\|T_{m;\alpha + \frac{1}{Q}{\bf Z}/{\bf Z}}\|_{B(\ell^p({\bf Z}))} \lesssim \|T_m\|_{B(L^p({\bf R}))}}.

When {p=2} the implied constant in these bounds can be set to equal {1}. In the paper of Magyar, Stein, and Wainger it was posed as an open problem whether this is the case for other {p}; in an appendix to this paper I show that the answer is negative if {p} is sufficiently close to {1} or {\infty}, but I do not know the full answer to this question.

This proposition allows one to get a good multiplier theory for symbols supported near cyclic groups {\frac{1}{Q}{\bf Z}/{\bf Z}}; for instance it shows that a discrete Fourier multiplier with symbol {\sum_{\alpha \in \frac{1}{Q}{\bf Z}/{\bf Z}} \phi(Q(\xi-\alpha))} for a fixed test function {\phi} is bounded on {\ell^p({\bf Z})}, uniformly in {Q}. For many applications in discrete harmonic analysis, one would similarly like a good multiplier theory for symbols supported in “major arc” sets such as

\displaystyle  \bigcup_{q=1}^N \bigcup_{\alpha \in \frac{1}{q}{\bf Z}/{\bf Z}} \alpha + [-\varepsilon,\varepsilon] \ \ \ \ \ (1)

and in particular to get a good Littlewood-Paley theory adapted to major arcs. (This is particularly the case when trying to control “true complexity zero” expressions for which the minor arc contributions can be shown to be negligible; my recent paper with Krause and Mirek is focused on expressions of this type.)

At present we do not have a good multiplier theory that is directly adapted to the classical major arc set (1) (though I do not know of rigorous negative results that show that such a theory is not possible); however, Ionescu and Wainger were able to obtain a useful substitute theory in which (1) was replaced by a somewhat larger set that had better multiplier behaviour. Starting with a finite collection {S} of pairwise coprime natural numbers, and a natural number {k}, one can form the major arc type set

\displaystyle  \bigcup_{\alpha \in \Sigma_{\leq k}} \alpha + [-\varepsilon,\varepsilon] \ \ \ \ \ (2)

where {\Sigma_{\leq k} \subset {\bf R}/{\bf Z}} consists of all rational points in the unit circle of the form {\frac{a}{Q} \mod 1} where {Q} is the product of at most {k} elements from {S} and {a} is an integer. For suitable choices of {S} and {k} not too large, one can make this set (2) contain the set (1) while still having a somewhat controlled size (very roughly speaking, one chooses {S} to consist of (small powers of) large primes between {N^\rho} and {N} for some small constant {\rho>0}, together with something like the product of all the primes up to {N^\rho} (raised to suitable powers)).

In the regime where {k} is fixed and {\varepsilon} is small, there is a good theory:

Theorem 2 (Ionescu-Wainger theorem, rough version) If {p} is an even integer or the dual of an even integer, and {m: {\bf R} \rightarrow {\bf C}} is supported on {[-\varepsilon,\varepsilon]} for a sufficiently small {\varepsilon > 0}, then

\displaystyle  \|T_{m;\Sigma_{\leq k}}\|_{B(\ell^p({\bf Z}))} \lesssim_{p, k} (\log(1+|S|))^{O_k(1)} \|T_m\|_{B(L^p({\bf R}))}.

There is a more explicit description of how small {\varepsilon} needs to be for this theorem to work (roughly speaking, it is not much more than what is needed for all the arcs {\alpha + [-\varepsilon,\varepsilon]} in (2) to be disjoint), but we will not give it here. The logarithmic loss of {(\log(1+|S|))^{O_k(1)}} was reduced to {\log(1+|S|)} by Mirek. In this paper we refine the bound further to

\displaystyle  \|T_{m;\Sigma_{\leq k}}\|_{B(\ell^p({\bf Z}))} \leq O(r \log(2+kr))^k \|T_m\|_{B(L^p({\bf R}))}. \ \ \ \ \ (3)

when {p = 2r} or {p = (2r)'} for some integer {r}. In particular there is no longer any logarithmic loss in the cardinality of the set {S}.

The proof of (3) follows a strategy similar to previous proofs of Ionescu-Wainger type. By duality we may assume {p=2r}. We use the following standard sequence of steps:

  • (i) (Denominator orthogonality) First one splits {T_{m;\Sigma_{\leq k}} f} into various pieces depending on the denominator {Q} appearing in the element of {\Sigma_{\leq k}}, and exploits “superorthogonality” in {Q} to estimate the {\ell^p} norm by the {\ell^p} norm of an appropriate square function.
  • (ii) (Nonconcentration) One expands out the {p^{th}} power of the square function and estimates it by a “nonconcentrated” version in which various factors that arise in the expansion are “disjoint”.
  • (iii) (Numerator orthogonality) We now decompose based on the numerators {a} appearing in the relevant elements of {\Sigma_{\leq k}}, and exploit some residual orthogonality in this parameter to reduce to estimating a square-function type expression involving sums over various cosets {\alpha + \frac{1}{Q}{\bf Z}/{\bf Z}}.
  • (iv) (Marcinkiewicz-Zygmund) One uses the Marcinkiewicz-Zygmund theorem relating scalar and vector valued operator norms to eliminate the role of the multiplier {m}.
  • (v) (Rubio de Francia) Use a reverse square function estimate of Rubio de Francia type to conclude.

The main innovations are the use of the probabilistic decoupling method to remove some logarithmic losses in (i), and of recent progress on the Erdős-Rado sunflower conjecture (as discussed in this recent post) to improve the bounds in (ii). For (i), the key point is that one can express a sum such as

\displaystyle  \sum_{A \in \binom{S}{k}} f_A,

where {\binom{S}{k}} is the set of {k}-element subsets of an index set {S}, and {f_A} are various complex numbers, as an average

\displaystyle  \sum_{A \in \binom{S}{k}} f_A = \frac{k^k}{k!} {\bf E} \sum_{s_1 \in {\bf S}_1,\dots,s_k \in {\bf S}_k} f_{\{s_1,\dots,s_k\}}

where {S = {\bf S}_1 \cup \dots \cup {\bf S}_k} is a random partition of {S} into {k} subclasses (chosen uniformly over all such partitions), basically because every {k}-element subset {A} of {S} has a probability exactly {\frac{k!}{k^k}} of being completely shattered by such a random partition. This “decouples” the index set {\binom{S}{k}} into a Cartesian product {{\bf S}_1 \times \dots \times {\bf S}_k} which is more convenient for application of the superorthogonality theory. For (ii), the point is to efficiently obtain estimates of the form

\displaystyle  (\sum_{A \in \binom{S}{k}} F_A)^r \lesssim_{k,r} \sum_{A_1,\dots,A_r \in \binom{S}{k} \hbox{ sunflower}} F_{A_1} \dots F_{A_r}

where {F_A} are various non-negative quantities, and a sunflower is a collection of sets {A_1,\dots,A_r} that consist of a common “core” {A_0} and disjoint “petals” {A_1 \backslash A_0,\dots,A_r \backslash A_0}. The other parts of the argument are relatively routine; see for instance this survey of Pierce for a discussion of them in the simple case {k=1}.
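
The random partition identity above can be verified exactly in small cases by enumerating all {k^{|S|}} assignments of {S} into {k} labelled classes. The following check is mine, not from the paper (plain Python, with numpy used only for random test data):

```python
import itertools, math
import numpy as np

S, k = range(5), 3
rng = np.random.default_rng(0)
f = {A: rng.standard_normal() for A in itertools.combinations(S, k)}
lhs = sum(f.values())

# average over all k^|S| assignments of the elements of S to k labelled classes
total = 0.0
for assign in itertools.product(range(k), repeat=len(S)):
    classes = [[s for s in S if assign[s] == c] for c in range(k)]
    # sum over tuples (s_1,...,s_k) with s_j drawn from class j; such a tuple
    # exists for a given k-set A exactly when A is completely shattered
    total += sum(f[tuple(sorted(A))] for A in itertools.product(*classes))
avg = total / k ** len(S)

print(lhs, (k ** k / math.factorial(k)) * avg)   # equal, up to rounding
```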

In this paper we interpret the Ionescu-Wainger multiplier theorem as being essentially a consequence of various quantitative versions of the Shannon sampling theorem. Recall that this theorem asserts that if a (Schwartz) function {f: {\bf R} \rightarrow {\bf C}} has its Fourier transform supported on {[-1/2,1/2]}, then {f} can be recovered uniquely from its restriction {f|_{\bf Z}: {\bf Z} \rightarrow {\bf C}}. In fact, as can be shown from a little bit of routine Fourier analysis, if we narrow the support of the Fourier transform slightly to {[-c,c]} for some {0 < c < 1/2}, then the restriction {f|_{\bf Z}} has the same {L^p} behaviour as the original function, in the sense that

\displaystyle  \| f|_{\bf Z} \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf R})} \ \ \ \ \ (4)

for all {0 < p \leq \infty}; see Theorem 4.18 of this paper of myself with Krause and Mirek. This is consistent with the uncertainty principle, which suggests that such functions {f} should behave like a constant at scales {\sim 1/c}.
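
The equivalence (4) is easy to test numerically. In the sketch below (mine, not from the paper; it assumes numpy) the test function {f(x) = c \,\mathrm{sinc}^2(cx)} has Fourier transform equal to a triangular bump supported in {[-c,c]} with {c = 0.4 < 1/2}; for {p=2} the two norms in (4) agree exactly (by Parseval and periodisation), while for other {p} they are merely comparable:

```python
import numpy as np

c = 0.4                                   # Fourier support in [-c, c], c < 1/2
f = lambda x: c * np.sinc(c * x) ** 2     # np.sinc(t) = sin(pi t)/(pi t)

n = np.arange(-10**5, 10**5 + 1)          # samples f|_Z (the tails are negligible)
x = np.linspace(-500.0, 500.0, 2 * 10**6 + 1)
dx = x[1] - x[0]
for p in (2.0, 4.0):
    lp = np.sum(np.abs(f(n)) ** p) ** (1 / p)
    Lp = (np.sum(np.abs(f(x)) ** p) * dx) ** (1 / p)
    print(p, lp, Lp)   # essentially equal when p = 2; comparable when p = 4
```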

The quantitative sampling theorem (4) can be used to give an alternate proof of Proposition 1(i), basically thanks to the identity

\displaystyle  T_{m;0} (f|_{\bf Z}) = (T_m f)|_{\bf Z}

whenever {f: {\bf R} \rightarrow {\bf C}} is Schwartz and has Fourier transform supported in {[-1/2,1/2]}, and {m} is also supported on {[-1/2,1/2]}; this identity can be easily verified from the Poisson summation formula. A variant of this argument also yields an alternate proof of Proposition 1(ii), where the role of {{\bf R}} is now played by {{\bf R} \times {\bf Z}/Q{\bf Z}}, and the standard embedding of {{\bf Z}} into {{\bf R}} is now replaced by the embedding {\iota_Q: n \mapsto (n, n \hbox{ mod } Q)} of {{\bf Z}} into {{\bf R} \times {\bf Z}/Q{\bf Z}}; the analogue of (4) is now

\displaystyle  \| f \circ \iota_Q \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf R} \times {\bf Z}/Q{\bf Z})} \ \ \ \ \ (5)

whenever {f: {\bf R} \times {\bf Z}/Q{\bf Z} \rightarrow {\bf C}} is Schwartz and has Fourier transform {{\mathcal F}_{{\bf R} \times {\bf Z}/Q{\bf Z}} f\colon {\bf R} \times \frac{1}{Q}{\bf Z}/{\bf Z} \rightarrow {\bf C}} supported in {[-c/Q,c/Q] \times \frac{1}{Q}{\bf Z}/{\bf Z}}, and {{\bf Z}/Q{\bf Z}} is endowed with probability Haar measure.

The locally compact abelian groups {{\bf R}} and {{\bf R} \times {\bf Z}/Q{\bf Z}} can all be viewed as projections of the adelic integers {{\bf A}_{\bf Z} := {\bf R} \times \hat {\bf Z}} (the product of the reals and the profinite integers {\hat {\bf Z}}). By using the Ionescu-Wainger multiplier theorem, we are able to obtain an adelic version of the quantitative sampling estimate (5), namely

\displaystyle  \| f \circ \iota \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf A}_{\bf Z})}

whenever {1 < p < \infty}, {f: {\bf A}_{\bf Z} \rightarrow {\bf C}} is Schwartz-Bruhat and has Fourier transform {{\mathcal F}_{{\bf A}_{\bf Z}} f: {\bf R} \times {\bf Q}/{\bf Z} \rightarrow {\bf C}} supported on {[-\varepsilon,\varepsilon] \times \Sigma_{\leq k}} for some sufficiently small {\varepsilon} (the precise bound on {\varepsilon} depends on {S, p, c} in a fashion not detailed here). This allows one obtain an “adelic” extension of the Ionescu-Wainger multiplier theorem, in which the {\ell^p({\bf Z})} operator norm of any discrete multiplier operator whose symbol is supported on major arcs can be shown to be comparable to the {L^p({\bf A}_{\bf Z})} operator norm of an adelic counterpart to that multiplier operator; in principle this reduces “major arc” harmonic analysis on the integers {{\bf Z}} to “low frequency” harmonic analysis on the adelic integers {{\bf A}_{\bf Z}}, which is a simpler setting in many ways (mostly because the set of major arcs (2) is now replaced with a product set {[-\varepsilon,\varepsilon] \times \Sigma_{\leq k}}).

Kaisa Matomäki, Maksym Radziwill, Joni Teräväinen, Tamar Ziegler and I have uploaded to the arXiv our paper “Higher uniformity of bounded multiplicative functions in short intervals on average“. This paper (which originated from a working group at an AIM workshop on Sarnak’s conjecture) focuses on the local Fourier uniformity conjecture for bounded multiplicative functions such as the Liouville function {\lambda}. One form of this conjecture is the assertion that

\displaystyle  \int_0^X \| \lambda \|_{U^k([x,x+H])}\ dx = o(X) \ \ \ \ \ (1)

as {X \rightarrow \infty} for any fixed {k \geq 0} and any {H = H(X) \leq X} that goes to infinity as {X \rightarrow \infty}, where {U^k([x,x+H])} is the (normalized) Gowers uniformity norm. Among other things this conjecture implies (the logarithmically averaged versions of) the Chowla and Sarnak conjectures for the Liouville function (or the Möbius function); see this previous blog post.

The conjecture gets more difficult as {k} increases, and also becomes more difficult the more slowly {H} grows with {X}. The {k=0} conjecture is equivalent to the assertion

\displaystyle  \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n)| \ dx = o(HX)

which was proven (for arbitrarily slowly growing {H}) in a landmark paper of Matomäki and Radziwill, discussed for instance in this blog post.

For {k=1}, the conjecture is equivalent to the assertion

\displaystyle  \int_0^X \sup_\alpha |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)| \ dx = o(HX). \ \ \ \ \ (2)

This remains open for sufficiently slowly growing {H} (and it would be a major breakthrough in particular if one could obtain this bound for {H} as small as {\log^\varepsilon X} for any fixed {\varepsilon>0}, particularly if applicable to more general bounded multiplicative functions than {\lambda}, as this would have new implications for a generalization of the Chowla conjecture known as the Elliott conjecture). Recently, Kaisa, Maks and myself were able to establish this conjecture in the range {H \geq X^\varepsilon} (in fact we have since worked out in the current paper that we can get {H} as small as {\exp(\log^{5/8+\varepsilon} X)}). In our current paper we establish the local Fourier uniformity conjecture for higher {k} in the same range of {H}. This in particular implies local orthogonality to polynomial phases,

\displaystyle  \int_0^X \sup_{P \in \mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})} |\sum_{x \leq n \leq x+H} \lambda(n) e(-P(n))| \ dx = o(HX) \ \ \ \ \ (3)

where {\mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})} denotes the polynomials of degree at most {k-1}, but the full conjecture is a bit stronger than this, establishing the more general statement

\displaystyle  \int_0^X \sup_{g \in \mathrm{Poly}({\bf R} \rightarrow G)} |\sum_{x \leq n \leq x+H} \lambda(n) \overline{F}(g(n) \Gamma)| \ dx = o(HX) \ \ \ \ \ (4)

for any degree {k} filtered nilmanifold {G/\Gamma} and Lipschitz function {F: G/\Gamma \rightarrow {\bf C}}, where {g} now ranges over polynomial maps from {{\bf R}} to {G}. The method of proof follows the same general strategy as in the previous paper with Kaisa and Maks. (The equivalence of (4) and (1) follows from the inverse conjecture for the Gowers norms, proven in this paper.)

We first quickly sketch the proof of (3), using very informal language to avoid many technicalities regarding the precise quantitative form of various estimates. If the estimate (3) fails, then we have the correlation estimate

\displaystyle  |\sum_{x \leq n \leq x+H} \lambda(n) e(-P_x(n))| \gg H

for many {x \sim X} and some polynomial {P_x} depending on {x}. The difficulty here is to understand how {P_x} can depend on {x}. We write the above correlation estimate more suggestively as

\displaystyle  \lambda(n) \sim_{[x,x+H]} e(P_x(n)).

Because of the multiplicativity {\lambda(np) = -\lambda(n)} at small primes {p}, one expects to have a relation of the form

\displaystyle  e(P_{x'}(p'n)) \sim_{[x/p,x/p+H/p]} e(P_x(pn)) \ \ \ \ \ (5)

for many {x,x'} for which {x/p \approx x'/p'} for some small primes {p,p'}. (This can be formalised using an inequality of Elliott related to the Turan-Kubilius theorem.) This gives a relationship between {P_x} and {P_{x'}} for “edges” {x,x'} in a rather sparse “graph” connecting the elements of say {[X/2,X]}. Using some graph theory one can locate some non-trivial “cycles” in this graph that eventually lead (in conjunction with a certain technical but important “Chinese remainder theorem” step to modify the {P_x}, which eliminates a rather serious “aliasing” issue that was already discussed in this previous post) to functional equations of the form

\displaystyle  P_x(a_x \cdot) \approx P_x(b_x \cdot)

for some large and close (but not identical) integers {a_x,b_x}, where {\approx} should be viewed, as a first approximation (ignoring a certain “profinite” or “major arc” term for simplicity), as “differing by a slowly varying polynomial”, and the polynomials {P_x} should now be viewed as taking values in the reals rather than the integers. This functional equation can be solved to obtain a relation of the form

\displaystyle  P_x(t) \approx T_x \log t

for some real number {T_x} of polynomial size, and with further analysis of the relation (5) one can make {T_x} basically independent of {x}. This simplifies (3) to something like

\displaystyle  \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n) n^{-iT}| \ dx = o(HX)

and this is now of a form that can be treated by the theorem of Matomäki and Radziwill (because {n \mapsto \lambda(n) n^{-iT}} is a bounded multiplicative function). (Actually because of the profinite term mentioned previously, one also has to insert a Dirichlet character of bounded conductor into this latter conclusion, but we will ignore this technicality.)

Now we apply the same strategy to (4). For abelian {G} the claim follows easily from (3), so we focus on the non-abelian case. One now has a polynomial sequence {g_x \in \mathrm{Poly}({\bf R} \rightarrow G)} attached to many {x \sim X}, and after a somewhat complicated adaptation of the above arguments one again ends up with an approximate functional equation

\displaystyle  g_x(a_x \cdot) \Gamma \approx g_x(b_x \cdot) \Gamma \ \ \ \ \ (6)

where the relation {\approx} is rather technical and will not be detailed here. A new difficulty arises in that there are some unwanted solutions to this equation, such as

\displaystyle  g_x(t) = \gamma^{\frac{\log(a_x t)}{\log(a_x/b_x)}}

for some {\gamma \in \Gamma}, which do not necessarily lead to multiplicative characters like {n^{-iT}} as in the polynomial case, but instead to some unfriendly looking “generalized multiplicative characters” (think of {e(\lfloor \alpha \log n \rfloor \beta \log n)} as a rough caricature). To avoid this problem, we rework the graph theory portion of the argument to produce not just one functional equation of the form (6) for each {x}, but many, leading to dilation invariances

\displaystyle  g_x((1+\theta) t) \Gamma \approx g_x(t) \Gamma

for a “dense” set of {\theta}. From a certain amount of Lie algebra theory (ultimately arising from an understanding of the behaviour of the exponential map on nilpotent matrices, and exploiting the hypothesis that {G} is non-abelian) one can conclude that (after some initial preparations to avoid degenerate cases) {g_x(t)} must behave like {\gamma_x^{\log t}} for some central element {\gamma_x} of {G}. This eventually brings one back to the multiplicative characters {n^{-iT}} that arose in the polynomial case, and the arguments now proceed as before.

We give two applications of this higher order Fourier uniformity. One regards the growth of the number

\displaystyle  s(k) := |\{ (\lambda(n+1),\dots,\lambda(n+k)): n \in {\bf N} \}|

of length {k} sign patterns in the Liouville function. The Chowla conjecture implies that {s(k) = 2^k}, but even the weaker conjecture of Sarnak that {s(k) \gg (1+\varepsilon)^k} for some {\varepsilon>0} remains open. Until recently, the best asymptotic lower bound on {s(k)} was {s(k) \gg k^2}, due to McNamara; with our result, we can now show {s(k) \gg_A k^A} for any {A} (in fact we can get {s(k) \gg_\varepsilon \exp(\log^{8/5-\varepsilon} k)} for any {\varepsilon>0}). The idea is to repeat the now-standard argument to exploit multiplicativity at small primes to deduce Chowla-type conjectures from Fourier uniformity conjectures, noting that the Chowla conjecture would give all the sign patterns one could hope for. The usual argument here uses the “entropy decrement argument” to eliminate a certain error term (involving the large but mean zero factor {p 1_{p|n}-1}). However the observation is that if there are extremely few sign patterns of length {k}, then the entropy decrement argument is unnecessary (there isn’t much entropy to begin with), and a more low-tech moment method argument (similar to the derivation of Chowla’s conjecture from Sarnak’s conjecture, as discussed for instance in this post) gives enough of Chowla’s conjecture to produce plenty of length {k} sign patterns. If there are not extremely few sign patterns of length {k} then we are done anyway. One quirk of this argument is that the sign patterns it produces may only appear exactly once; in contrast with preceding arguments, we were not able to produce a large number of sign patterns that each occur infinitely often.

The second application is to obtain cancellation for various polynomial averages involving the Liouville function {\lambda} or von Mangoldt function {\Lambda}, such as

\displaystyle  {\bf E}_{n \leq X} {\bf E}_{m \leq X^{1/d}} \lambda(n+P_1(m)) \lambda(n+P_2(m)) \dots \lambda(n+P_k(m))

or

\displaystyle  {\bf E}_{n \leq X} {\bf E}_{m \leq X^{1/d}} \lambda(n+P_1(m)) \Lambda(n+P_2(m)) \dots \Lambda(n+P_k(m))

where {P_1,\dots,P_k} are polynomials of degree at most {d}, no two of which differ by a constant (the latter is essential to avoid having to establish the Chowla or Hardy-Littlewood conjectures, which of course remain open). Results of this type were previously obtained by Tamar Ziegler and myself in the “true complexity zero” case when the polynomials {P_i} had distinct degrees, in which one could use the {k=0} theory of Matomäki and Radziwill; now that higher {k} is available at the scale {H=X^{1/d}} we can remove this restriction.

Define the Collatz map {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} on the natural numbers {{\bf N}+1 = \{1,2,\dots\}} by setting {\mathrm{Col}(N)} to equal {3N+1} when {N} is odd and {N/2} when {N} is even, and let {\mathrm{Col}^{\bf N}(N) := \{ N, \mathrm{Col}(N), \mathrm{Col}^2(N), \dots \}} denote the forward Collatz orbit of {N}. The notorious Collatz conjecture asserts that {1 \in \mathrm{Col}^{\bf N}(N)} for all {N \in {\bf N}+1}. Equivalently, if we define the backwards Collatz orbit {(\mathrm{Col}^{\bf N})^*(N) := \{ M \in {\bf N}+1: N \in \mathrm{Col}^{\bf N}(M) \}} to be all the natural numbers {M} that encounter {N} in their forward Collatz orbit, then the Collatz conjecture asserts that {(\mathrm{Col}^{\bf N})^*(1) = {\bf N}+1}. As a partial result towards this latter statement, Krasikov and Lagarias in 2003 established the bound

\displaystyle \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(1) \} \gg x^\gamma \ \ \ \ \ (1)

for all {x \geq 1} and {\gamma = 0.84}. (This improved upon previous values of {\gamma = 0.81} obtained by Applegate and Lagarias in 1995, {\gamma = 0.65} by Applegate and Lagarias in 1995 by a different method, {\gamma=0.48} by Wirsching in 1993, {\gamma=0.43} by Krasikov in 1989, {\gamma=0.3} by Sander in 1990, and some {\gamma>0} by Crandall in 1978.) This is still the largest value of {\gamma} for which (1) has been established. Of course, the Collatz conjecture would imply that we can take {\gamma} equal to {1}, which is the assertion that a positive density set of natural numbers obeys the Collatz conjecture. This is not yet established, although the results in my previous paper do at least imply that a positive density set of natural numbers iterates to an (explicitly computable) bounded set, so in principle the {\gamma=1} case of (1) could now be verified by an (enormous) finite computation in which one verifies that every number in this explicit bounded set iterates to {1}.

In this post I would like to record a possible alternate route to this problem that depends on the distribution of a certain family of random variables that appeared in my previous paper, that I called Syracuse random variables.
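
For small {x}, the left-hand side of (1) can be explored directly by breadth-first search through Collatz preimages. The crude sketch below is mine, not from the post; the search is capped at {64x}, so it only produces a lower bound on the count, and the cap and the ranges are arbitrary choices:

```python
import math
from collections import deque

def backward_count(x, cap_factor=64):
    # lower bound for #{N <= x : N in (Col^N)^*(1)}, exploring preimages up to 64x
    cap = cap_factor * x
    seen, queue = {1}, deque([1])
    while queue:
        N = queue.popleft()
        preimages = [2 * N]                # Col(2N) = N always
        if N % 6 == 4 and N > 4:           # then (N-1)/3 is an odd number > 1
            preimages.append((N - 1) // 3)
        for M in preimages:
            if M <= cap and M not in seen:
                seen.add(M)
                queue.append(M)
    return sum(1 for M in seen if M <= x)

for x in (10**2, 10**3, 10**4):
    cnt = backward_count(x)
    print(x, cnt, math.log(cnt, x))        # the exponent gamma suggested by the data
```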

Definition 1 (Syracuse random variables) For any natural number {n}, a Syracuse random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on the cyclic group {{\bf Z}/3^n{\bf Z}} is defined as a random variable of the form

\displaystyle \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = \sum_{m=1}^n 3^{n-m} 2^{-{\mathbf a}_m-\dots-{\mathbf a}_n} \ \ \ \ \ (2)

where {\mathbf{a}_1,\dots,\mathbf{a}_n} are independent copies of a geometric random variable {\mathbf{Geom}(2)} on the natural numbers with mean {2}, thus

\displaystyle \mathop{\bf P}( \mathbf{a}_1=a_1,\dots,\mathbf{a}_n=a_n) = 2^{-a_1-\dots-a_n}

for {a_1,\dots,a_n \in {\bf N}+1}. In (2) the arithmetic is performed in the ring {{\bf Z}/3^n{\bf Z}}.

Thus for instance

\displaystyle \mathbf{Syrac}({\bf Z}/3{\bf Z}) = 2^{-\mathbf{a}_1} \hbox{ mod } 3

\displaystyle \mathbf{Syrac}({\bf Z}/3^2{\bf Z}) = 2^{-\mathbf{a}_1-\mathbf{a}_2} + 3 \times 2^{-\mathbf{a}_2} \hbox{ mod } 3^2

\displaystyle \mathbf{Syrac}({\bf Z}/3^3{\bf Z}) = 2^{-\mathbf{a}_1-\mathbf{a}_2-\mathbf{a}_3} + 3 \times 2^{-\mathbf{a}_2-\mathbf{a}_3} + 3^2 \times 2^{-\mathbf{a}_3} \hbox{ mod } 3^3

and so forth. After reversing the labeling of the {\mathbf{a}_1,\dots,\mathbf{a}_n}, one could also view {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} as the mod {3^n} reduction of a {3}-adic random variable

\displaystyle \mathbf{Syrac}({\bf Z}_3) = \sum_{m=1}^\infty 3^{m-1} 2^{-{\mathbf a}_1-\dots-{\mathbf a}_m}.

The probability density function {b \mapsto \mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b )} of the Syracuse random variable can be explicitly computed by a recursive formula (see Lemma 1.12 of my previous paper). For instance, when {n=1}, {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3{\bf Z}) = b )} is equal to {0,1/3,2/3} for {b=0,1,2 \hbox{ mod } 3} respectively, while when {n=2}, {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3^2{\bf Z}) = b )} is equal to

\displaystyle 0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}

when {b=0,\dots,8 \hbox{ mod } 9} respectively.
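
These probabilities can be computed exactly. The snippet below (mine, not from the post; plain Python) uses the recursion {\mathbf{Syrac}({\bf Z}/3^k{\bf Z}) = 2^{-\mathbf{a}_k}(3\,\mathbf{Syrac}({\bf Z}/3^{k-1}{\bf Z}) + 1) \hbox{ mod } 3^k}, which follows directly from (2), together with the fact that {2^{-a} \hbox{ mod } 3^k} depends only on {a} modulo the multiplicative order {2 \cdot 3^{k-1}} of {2}, so the geometric weights can be summed exactly within each residue class:

```python
from fractions import Fraction

def syracuse_dist(n):
    # exact law of Syrac(Z/3^n Z), via the recursion noted above
    mod, dist = 1, {0: Fraction(1)}
    for k in range(1, n + 1):
        mod *= 3
        ord2 = 2 * 3 ** (k - 1)            # multiplicative order of 2 mod 3^k
        inv2 = pow(2, -1, mod)
        norm = 1 - Fraction(1, 2 ** ord2)  # normalises the summed geometric weights
        new = {}
        for b, p in dist.items():
            x = (3 * b + 1) % mod
            for a in range(1, ord2 + 1):
                x = (x * inv2) % mod       # now x = 2^{-a} (3b+1) mod 3^k
                new[x] = new.get(x, 0) + p * Fraction(1, 2 ** a) / norm
        dist = new
    return dist

d = syracuse_dist(2)
print([d.get(b, Fraction(0)) for b in range(9)])
# [0, 8/63, 16/63, 0, 11/63, 4/63, 0, 2/63, 22/63], matching the table above
```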

The relationship of these random variables to the Collatz problem can be explained as follows. Let {2{\bf N}+1 = \{1,3,5,\dots\}} denote the odd natural numbers, and define the Syracuse map {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1} by

\displaystyle \mathrm{Syr}(N) := \frac{3N+1}{2^{\nu_2(3N+1)}}

where the {2}-valuation {\nu_2(3N+1) \in {\bf N}} is the number of times {2} divides {3N+1}. We can define the forward orbit {\mathrm{Syr}^{\bf N}(N)} and backward orbit {(\mathrm{Syr}^{\bf N})^*(N)} of the Syracuse map as before. It is not difficult to then see that the Collatz conjecture is equivalent to the assertion {(\mathrm{Syr}^{\bf N})^*(1) = 2{\bf N}+1}, and that the assertion (1) for a given {\gamma} is equivalent to the assertion

\displaystyle \# \{ N \leq x: N \in (\mathrm{Syr}^{\bf N})^*(1) \} \gg x^\gamma \ \ \ \ \ (3)

for all {x \geq 1}, where {N} is now understood to range over odd natural numbers. A brief calculation then shows that for any odd natural number {N} and natural number {n}, one has

\displaystyle \mathrm{Syr}^n(N) = 3^n 2^{-a_1-\dots-a_n} N + \sum_{m=1}^n 3^{n-m} 2^{-a_m-\dots-a_n}

where the natural numbers {a_1,\dots,a_n} are defined by the formula

\displaystyle a_i := \nu_2( 3 \mathrm{Syr}^{i-1}(N) + 1 ),

so in particular

\displaystyle \mathrm{Syr}^n(N) = \sum_{m=1}^n 3^{n-m} 2^{-a_m-\dots-a_n} \hbox{ mod } 3^n.

Heuristically, one expects the {2}-valuation {\nu_2(3N+1)} of {3N+1} for a typical odd number {N} to be approximately distributed according to the geometric distribution {\mathbf{Geom}(2)}, so one therefore expects the residue class {\mathrm{Syr}^n(N) \hbox{ mod } 3^n} to be distributed approximately according to the random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}.

The Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} will always avoid multiples of three (this reflects the fact that {\mathrm{Syr}(N)} is never a multiple of three), but attains any non-multiple of three in {{\bf Z}/3^n{\bf Z}} with positive probability. For any natural number {n}, set

\displaystyle c_n := \inf_{b \in {\bf Z}/3^n{\bf Z}: 3 \nmid b} \mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b ).

Equivalently, {c_n} is the greatest quantity for which we have the inequality

\displaystyle \sum_{(a_1,\dots,a_n) \in S_{n,N}} 2^{-a_1-\dots-a_n} \geq c_n \ \ \ \ \ (4)

for all integers {N} not divisible by three, where {S_{n,N} \subset ({\bf N}+1)^n} is the set of all tuples {(a_1,\dots,a_n)} for which

\displaystyle N = \sum_{m=1}^n 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n.

Thus for instance {c_0=1}, {c_1 = 1/3}, and {c_2 = 2/63}. On the other hand, since all the probabilities {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b)} sum to {1} as {b \in {\bf Z}/3^n{\bf Z}} ranges over the non-multiples of {3}, we have the trivial upper bound

\displaystyle c_n \leq \frac{3}{2} 3^{-n}.

There is also an easy submultiplicativity result:

Lemma 2 For any natural numbers {n_1,n_2}, we have

\displaystyle c_{n_1+n_2-1} \geq c_{n_1} c_{n_2}.

Proof: Let {N} be an integer not divisible by {3}, then by (4) we have

\displaystyle \sum_{(a_1,\dots,a_{n_1}) \in S_{n_1,N}} 2^{-a_1-\dots-a_{n_1}} \geq c_{n_1}.

If we let {S'_{n_1,N}} denote the set of tuples {(a_1,\dots,a_{n_1-1})} that can be formed from the tuples in {S_{n_1,N}} by deleting the final component {a_{n_1}} from each tuple, then we have

\displaystyle \sum_{(a_1,\dots,a_{n_1-1}) \in S'_{n_1,N}} 2^{-a_1-\dots-a_{n_1-1}} \geq c_{n_1}. \ \ \ \ \ (5)

Next, observe that if {(a_1,\dots,a_{n_1-1}) \in S'_{n_1,N}}, then

\displaystyle N = \sum_{m=1}^{n_1-1} 3^{m-1} 2^{-a_1-\dots-a_m} + 3^{n_1-1} 2^{-a_1-\dots-a_{n_1-1}} M

with {M = M_{N,n_1,a_1,\dots,a_{n_1-1}}} an integer not divisible by three. By definition of {S_{n_2,M}} and a relabeling, we then have

\displaystyle M = \sum_{m=1}^{n_2} 3^{m-1} 2^{-a_{n_1}-\dots-a_{m+n_1-1}} \hbox{ mod } 3^{n_2}

for all {(a_{n_1},\dots,a_{n_1+n_2-1}) \in S_{n_2,M}}. For such tuples we then have

\displaystyle N = \sum_{m=1}^{n_1+n_2-1} 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^{n_1+n_2-1}

so that {(a_1,\dots,a_{n_1+n_2-1}) \in S_{n_1+n_2-1,N}}. Since

\displaystyle \sum_{(a_{n_1},\dots,a_{n_1+n_2-1}) \in S_{n_2,M}} 2^{-a_{n_1}-\dots-a_{n_1+n_2-1}} \geq c_{n_2}

for each {M}, the claim follows. \Box

From this lemma we see that {c_n = 3^{-\beta n + o(n)}} for some absolute constant {\beta \geq 1}. Heuristically, we expect the Syracuse random variables to be somewhat approximately equidistributed amongst the non-multiples of three in {{\bf Z}/3^n{\bf Z}} (in Proposition 1.4 of my previous paper I prove a fine scale mixing result that supports this heuristic). As a consequence it is natural to conjecture that {\beta=1}. I cannot prove this, but I can show that this conjecture would imply that we can take the exponent {\gamma} in (1), (3) arbitrarily close to one:

Proposition 3 Suppose that {\beta=1} (that is to say, {c_n = 3^{-n+o(n)}} as {n \rightarrow \infty}). Then

\displaystyle \# \{ N \leq x: N \in (\mathrm{Syr}^{\bf N})^*(1) \} \gg x^{1-o(1)}

as {x \rightarrow \infty}, or equivalently

\displaystyle \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(1) \} \gg x^{1-o(1)}

as {x \rightarrow \infty}. In other words, (1), (3) hold for all {\gamma < 1}.

I prove this proposition below the fold. A variant of the argument shows that for any value of {\beta}, (1), (3) holds whenever {\gamma < f(\beta)}, where {f: [0,1] \rightarrow [0,1]} is an explicitly computable function with {f(\beta) \rightarrow 1} as {\beta \rightarrow 1}. In principle, one could then improve the Krasikov-Lagarias result {\gamma = 0.84} by getting a sufficiently good upper bound on {\beta}, which is in principle achievable numerically (note for instance that Lemma 2 implies the bound {c_n \leq 3^{-\beta(n-1)}} for any {n}, since {c_{kn-k+1} \geq c_n^k} for any {k}).
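
Using the exact distribution computed in the earlier snippet, one can evaluate {c_n} for small {n} and extract the resulting upper bound {\beta \leq \log_3(1/c_n)/(n-1)} coming from the bound {c_n \leq 3^{-\beta(n-1)}} noted above (the code is mine, not from the post; the convergence in {n} is slow, so these small cases say little about whether {\beta = 1}):

```python
import math
from fractions import Fraction

def syracuse_dist(n):
    # exact law of Syrac(Z/3^n Z); same routine as in the earlier snippet
    mod, dist = 1, {0: Fraction(1)}
    for k in range(1, n + 1):
        mod *= 3
        ord2 = 2 * 3 ** (k - 1)
        inv2 = pow(2, -1, mod)
        norm = 1 - Fraction(1, 2 ** ord2)
        new = {}
        for b, p in dist.items():
            x = (3 * b + 1) % mod
            for a in range(1, ord2 + 1):
                x = (x * inv2) % mod
                new[x] = new.get(x, 0) + p * Fraction(1, 2 ** a) / norm
        dist = new
    return dist

for n in range(2, 6):
    c_n = min(p for b, p in syracuse_dist(n).items() if b % 3 != 0)
    print(n, c_n, math.log(1 / c_n, 3) / (n - 1))   # last column bounds beta above
```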

Read the rest of this entry »
