You are currently browsing the category archive for the ‘math.NT’ category.

A basic estimate in multiplicative number theory (particularly if one is using the Granville-Soundararajan “pretentious” approach to this subject) is the following inequality of Halasz (formulated here in a quantitative form introduced by Montgomery and Tenenbaum).

Theorem 1 (Halasz inequality) Let {f: {\bf N} \rightarrow {\bf C}} be a multiplicative function bounded in magnitude by {1}, and suppose that {x \geq 3}, {T \geq 1}, and { M \geq 0} are such that

\displaystyle  \sum_{p \leq x} \frac{1 - \hbox{Re}(f(p) p^{-it})}{p} \geq M \ \ \ \ \ (1)

for all real numbers {t} with {|t| \leq T}. Then

\displaystyle  \frac{1}{x} \sum_{n \leq x} f(n) \ll (1+M) e^{-M} + \frac{1}{\sqrt{T}}.

As a qualitative corollary, we conclude (by standard compactness arguments) that if

\displaystyle  \sum_{p} \frac{1 - \hbox{Re}(f(p) p^{-it})}{p} = +\infty

for all real {t}, then

\displaystyle  \frac{1}{x} \sum_{n \leq x} f(n) = o(1) \ \ \ \ \ (2)

as {x \rightarrow \infty}. In the more recent work of this paper of Granville and Soundararajan, the sharper bound

\displaystyle  \frac{1}{x} \sum_{n \leq x} f(n) \ll (1+M) e^{-M} + \frac{1}{T} + \frac{\log\log x}{\log x}

is obtained (with a more precise description of the {(1+M) e^{-M}} term).

The usual proofs of Halasz’s theorem are somewhat lengthy (though there has been a recent simplification, in forthcoming work of Granville, Harper, and Soundarajan). Below the fold I would like to give a relatively short proof of the following “cheap” version of the inequality, which has slightly weaker quantitative bounds, but still suffices to give qualitative conclusions such as (2).

Theorem 2 (Cheap Halasz inequality) Let {f: {\bf N} \rightarrow {\bf C}} be a multiplicative function bounded in magnitude by {1}. Let {T \geq 1} and {M \geq 0}, and suppose that {x} is sufficiently large depending on {T,M}. If (1) holds for all {|t| \leq T}, then

\displaystyle  \frac{1}{x} \sum_{n \leq x} f(n) \ll (1+M) e^{-M/2} + \frac{1}{T}.

The non-optimal exponent {1/2} can probably be improved a bit by being more careful with the exponents, but I did not try to optimise it here. A similar bound appears in the first paper of Halasz on this topic.

The idea of the argument is to split {f} as a Dirichlet convolution {f = f_1 * f_2 * f_3} where {f_1,f_2,f_3} is the portion of {f} coming from “small”, “medium”, and “large” primes respectively (with the dividing line between the three types of primes being given by various powers of {x}). Using a Perron-type formula, one can express this convolution in terms of the product of the Dirichlet series of {f_1,f_2,f_3} respectively at various complex numbers {1+it} with {|t| \leq T}. One can use {L^2} based estimates to control the Dirichlet series of {f_2,f_3}, while using the hypothesis (1) one can get {L^\infty} estimates on the Dirichlet series of {f_1}. (This is similar to the Fourier-analytic approach to ternary additive problems, such as Vinogradov’s theorem on representing large odd numbers as the sum of three primes.) This idea was inspired by a similar device used in the work of Granville, Harper, and Soundarajan. A variant of this argument also appears in unpublished work of Adam Harper.

I thank Andrew Granville for helpful comments which led to significant simplifications of the argument.

Read the rest of this entry »

Kevin Ford, James Maynard, and I have uploaded to the arXiv our preprint “Chains of large gaps between primes“. This paper was announced in our previous paper with Konyagin and Green, which was concerned with the largest gap

\displaystyle  G_1(X) := \max_{p_n, p_{n+1} \leq X} (p_{n+1} - p_n)

between consecutive primes up to {X}, in which we improved the Rankin bound of

\displaystyle  G_1(X) \gg \log X \frac{\log_2 X \log_4 X}{(\log_3 X)^2}


\displaystyle  G_1(X) \gg \log X \frac{\log_2 X \log_4 X}{\log_3 X}

for large {X} (where we use the abbreviations {\log_2 X := \log\log X}, {\log_3 X := \log\log\log X}, and {\log_4 X := \log\log\log\log X}). Here, we obtain an analogous result for the quantity

\displaystyle  G_k(X) := \max_{p_n, \dots, p_{n+k} \leq X} \min( p_{n+1} - p_n, p_{n+2}-p_{n+1}, \dots, p_{n+k} - p_{n+k-1} )

which measures how far apart the gaps between chains of {k} consecutive primes can be. Our main result is

\displaystyle  G_k(X) \gg \frac{1}{k^2} \log X \frac{\log_2 X \log_4 X}{\log_3 X}

whenever {X} is sufficiently large depending on {k}, with the implied constant here absolute (and effective). The factor of {1/k^2} is inherent to the method, and related to the basic probabilistic fact that if one selects {k} numbers at random from the unit interval {[0,1]}, then one expects the minimum gap between adjacent numbers to be about {1/k^2} (i.e. smaller than the mean spacing of {1/k} by an additional factor of {1/k}).

Our arguments combine those from the previous paper with the matrix method of Maier, who (in our notation) showed that

\displaystyle  G_k(X) \gg_k  \log X \frac{\log_2 X \log_4 X}{(\log_3 X)^2}

for an infinite sequence of {X} going to infinity. (Maier needed to restrict to an infinite sequence to avoid Siegel zeroes, but we are able to resolve this issue by the now standard technique of simply eliminating a prime factor of an exceptional conductor from the sieve-theoretic portion of the argument. As a byproduct, this also makes all of the estimates in our paper effective.)

As its name suggests, the Maier matrix method is usually presented by imagining a matrix of numbers, and using information about the distribution of primes in the columns of this matrix to deduce information about the primes in at least one of the rows of the matrix. We found it convenient to interpret this method in an equivalent probabilistic form as follows. Suppose one wants to find an interval {n+1,\dots,n+y} which contained a block of at least {k} primes, each separated from each other by at least {g} (ultimately, {y} will be something like {\log X \frac{\log_2 X \log_4 X}{\log_3 X}} and {g} something like {y/k^2}). One can do this by the probabilistic method: pick {n} to be a random large natural number {{\mathbf n}} (with the precise distribution to be chosen later), and try to lower bound the probability that the interval {{\mathbf n}+1,\dots,{\mathbf n}+y} contains at least {k} primes, no two of which are within {g} of each other.

By carefully choosing the residue class of {{\mathbf n}} with respect to small primes, one can eliminate several of the {{\mathbf n}+j} from consideration of being prime immediately. For instance, if {{\mathbf n}} is chosen to be large and even, then the {{\mathbf n}+j} with {j} even have no chance of being prime and can thus be eliminated; similarly if {{\mathbf n}} is large and odd, then {{\mathbf n}+j} cannot be prime for any odd {j}. Using the methods of our previous paper, we can find a residue class {m \hbox{ mod } P} (where {P} is a product of a large number of primes) such that, if one chooses {{\mathbf n}} to be a large random element of {m \hbox{ mod } P} (that is, {{\mathbf n} = {\mathbf z} P + m} for some large random integer {{\mathbf z}}), then the set {{\mathcal T}} of shifts {j \in \{1,\dots,y\}} for which {{\mathbf n}+j} still has a chance of being prime has size comparable to something like {k \log X / \log_2 X}; furthermore this set {{\mathcal T}} is fairly well distributed in {\{1,\dots,y\}} in the sense that it does not concentrate too strongly in any short subinterval of {\{1,\dots,y\}}. The main new difficulty, not present in the previous paper, is to get lower bounds on the size of {{\mathcal T}} in addition to upper bounds, but this turns out to be achievable by a suitable modification of the arguments.

Using a version of the prime number theorem in arithmetic progressions due to Gallagher, one can show that for each remaining shift {j \in {\mathcal T}}, {{\mathbf n}+j} is going to be prime with probability comparable to {\log_2 X / \log X}, so one expects about {k} primes in the set {\{{\mathbf n} + j: j \in {\mathcal T}\}}. An upper bound sieve (e.g. the Selberg sieve) also shows that for any distinct {j,j' \in {\mathcal T}}, the probability that {{\mathbf n}+j} and {{\mathbf n}+j'} are both prime is {O( (\log_2 X / \log X)^2 )}. Using this and some routine second moment calculations, one can then show that with large probability, the set {\{{\mathbf n} + j: j \in {\mathcal T}\}} will indeed contain about {k} primes, no two of which are closer than {g} to each other; with no other numbers in this interval being prime, this gives a lower bound on {G_k(X)}.

Klaus Roth, who made fundamental contributions to analytic number theory, died this Tuesday, aged 90.

I never met or communicated with Roth personally, but was certainly influenced by his work; he wrote relatively few papers, but they tended to have outsized impact. For instance, he was one of the key people (together with Bombieri) to work on simplifying and generalising the large sieve, taking it from the technically formidable original formulation of Linnik and Rényi to the clean and general almost orthogonality principle that we have today (discussed for instance in these lecture notes of mine). The paper of Roth that had the most impact on my own personal work was his three-page paper proving what is now known as Roth’s theorem on arithmetic progressions:

Theorem 1 (Roth’s theorem on arithmetic progressions) Let {A} be a set of natural numbers of positive upper density (thus {\limsup_{N \rightarrow\infty} |A \cap \{1,\dots,N\}|/N > 0}). Then {A} contains infinitely many arithmetic progressions {a,a+r,a+2r} of length three (with {r} non-zero of course).

At the heart of Roth’s elegant argument was the following (surprising at the time) dichotomy: if {A} had some moderately large density within some arithmetic progression {P}, either one could use Fourier-analytic methods to detect the presence of an arithmetic progression of length three inside {A \cap P}, or else one could locate a long subprogression {P'} of {P} on which {A} had increased density. Iterating this dichotomy by an argument now known as the density increment argument, one eventually obtains Roth’s theorem, no matter which side of the dichotomy actually holds. This argument (and the many descendants of it), based on various “dichotomies between structure and randomness”, became essential in many other results of this type, most famously perhaps in Szemerédi’s proof of his celebrated theorem on arithmetic progressions that generalised Roth’s theorem to progressions of arbitrary length. More recently, my recent work on the Chowla and Elliott conjectures that was a crucial component of the solution of the Erdös discrepancy problem, relies on an entropy decrement argument which was directly inspired by the density increment argument of Roth.

The Erdös discrepancy problem also is connected with another well known theorem of Roth:

Theorem 2 (Roth’s discrepancy theorem for arithmetic progressions) Let {f(1),\dots,f(n)} be a sequence in {\{-1,+1\}}. Then there exists an arithmetic progression {a+r, a+2r, \dots, a+kr} in {\{1,\dots,n\}} with {r} positive such that

\displaystyle  |\sum_{j=1}^k f(a+jr)| \geq c n^{1/4}

for an absolute constant {c>0}.

In fact, Roth proved a stronger estimate regarding mean square discrepancy, which I am not writing down here; as with the Roth theorem in arithmetic progressions, his proof was short and Fourier-analytic in nature (although non-Fourier-analytic proofs have since been found, for instance the semidefinite programming proof of Lovasz). The exponent {1/4} is known to be sharp (a result of Matousek and Spencer).

As a particular corollary of the above theorem, for an infinite sequence {f(1), f(2), \dots} of signs, the sums {|\sum_{j=1}^k f(a+jr)|} are unbounded in {a,r,k}. The Erdös discrepancy problem asks whether the same statement holds when {a} is restricted to be zero. (Roth also established discrepancy theorems for other sets, such as rectangles, which will not be discussed here.)

Finally, one has to mention Roth’s most famous result, cited for instance in his Fields medal citation:

Theorem 3 (Roth’s theorem on Diophantine approximation) Let {\alpha} be an irrational algebraic number. Then for any {\varepsilon > 0} there is a quantity {c_{\alpha,\varepsilon}} such that

\displaystyle  |\alpha - \frac{a}{q}| > \frac{c_{\alpha,\varepsilon}}{q^{2+\varepsilon}}.

From the Dirichlet approximation theorem (or from the theory of continued fractions) we know that the exponent {2+\varepsilon} in the denominator cannot be reduced to {2} or below. A classical and easy theorem of Liouville gives the claim with the exponent {2+\varepsilon} replaced by the degree of the algebraic number {\alpha}; work of Thue and Siegel reduced this exponent, but Roth was the one who obtained the near-optimal result. An important point is that the constant {c_{\alpha,\varepsilon}} is ineffective – it is a major open problem in Diophantine approximation to produce any bound significantly stronger than Liouville’s theorem with effective constants. This is because the proof of Roth’s theorem does not exclude any single rational {a/q} from being close to {\alpha}, but instead very ingeniously shows that one cannot have two different rationals {a/q}, {a'/q'} that are unusually close to {\alpha}, even when the denominators {q,q'} are very different in size. (I refer to this sort of argument as a “dueling conspiracies” argument; they are strangely prevalent throughout analytic number theory.)

Chantal David, Andrew Granville, Emmanuel Kowalski, Phillipe Michel, Kannan Soundararajan, and I are running a program at MSRI in the Spring of 2017 (more precisely, from Jan 17, 2017 to May 26, 2017) in the area of analytic number theory, with the intention to bringing together many of the leading experts in all aspects of the subject and to present recent work on the many active areas of the subject (e.g. the distribution of the prime numbers, refinements of the circle method, a deeper understanding of the asymptotics of bounded multiplicative functions (and applications to Erdos discrepancy type problems!) and of the “pretentious” approach to analytic number theory, more “analysis-friendly” formulations of the theorems of Deligne and others involving trace functions over fields, and new subconvexity theorems for automorphic forms, to name a few).  Like any other semester MSRI program, there will be a number of workshops, seminars, and similar activities taking place while the members are in residence.  I’m personally looking forward to the program, which should be occurring in the midst of a particularly productive time for the subject.  Needless to say, I (and the rest of the organising committee) plan to be present for most of the program.

Applications for Postdoctoral Fellowships and Research Memberships for this program (and for other MSRI programs in this time period, namely the companion program in Harmonic Analysis and the Fall program in Geometric Group Theory, as well as the complementary program in all other areas of mathematics) remain open until Dec 1.  Applications are open to everyone, but require supporting documentation, such as a CV, statement of purpose, and letters of recommendation from other mathematicians; see the application page for more details.

The Chowla conjecture asserts, among other things, that one has the asymptotic

\displaystyle \frac{1}{X} \sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k) = o(1)

as {X \rightarrow \infty} for any distinct integers {h_1,\dots,h_k}, where {\lambda} is the Liouville function. (The usual formulation of the conjecture also allows one to consider more general linear forms {a_i n + b_i} than the shifts {n+h_i}, but for sake of discussion let us focus on the shift case.) This conjecture remains open for {k \geq 2}, though there are now some partial results when one averages either in {x} or in the {h_1,\dots,h_k}, as discussed in this recent post.

A natural generalisation of the Chowla conjecture is the Elliott conjecture. Its original formulation was basically as follows: one had

\displaystyle \frac{1}{X} \sum_{n \leq X} g_1(n+h_1) \dots g_k(n+h_k) = o(1) \ \ \ \ \ (1)

whenever {g_1,\dots,g_k} were bounded completely multiplicative functions and {h_1,\dots,h_k} were distinct integers, and one of the {g_i} was “non-pretentious” in the sense that

\displaystyle \sum_p \frac{1 - \hbox{Re}( g_i(p) \overline{\chi(p)} p^{-it})}{p} = +\infty \ \ \ \ \ (2)

for all Dirichlet characters {\chi} and real numbers {t}. It is easy to see that some condition like (2) is necessary; for instance if {g(n) := \chi(n) n^{it}} and {\chi} has period {q} then {\frac{1}{X} \sum_{n \leq X} g(n+q) \overline{g(n)}} can be verified to be bounded away from zero as {X \rightarrow \infty}.

In a previous paper with Matomaki and Radziwill, we provided a counterexample to the original formulation of the Elliott conjecture, and proposed that (2) be replaced with the stronger condition

\displaystyle \inf_{|t| \leq X} \sum_{p \leq X} \frac{1 - \hbox{Re}( g_i(p) \overline{\chi(p)} p^{-it})}{p} \rightarrow +\infty \ \ \ \ \ (3)

as {X \rightarrow \infty} for any Dirichlet character {\chi}. To support this conjecture, we proved an averaged and non-asymptotic version of this conjecture which roughly speaking showed a bound of the form

\displaystyle \frac{1}{H^k} \sum_{h_1,\dots,h_k \leq H} |\frac{1}{X} \sum_{n \leq X} g_1(n+h_1) \dots g_k(n+h_k)| \leq \varepsilon

whenever {H} was an arbitrarily slowly growing function of {X}, {X} was sufficiently large (depending on {\varepsilon,k} and the rate at which {H} grows), and one of the {g_i} obeyed the condition

\displaystyle \inf_{|t| \leq AX} \sum_{p \leq X} \frac{1 - \hbox{Re}( g_i(p) \overline{\chi(p)} p^{-it})}{p} \geq A \ \ \ \ \ (4)

for some {A} that was sufficiently large depending on {k,\varepsilon}, and all Dirichlet characters {\chi} of period at most {A}. As further support of this conjecture, I recently established the bound

\displaystyle \frac{1}{\log \omega} |\sum_{X/\omega \leq n \leq X} \frac{g_1(n+h_1) g_2(n+h_2)}{n}| \leq \varepsilon

under the same hypotheses, where {\omega} is an arbitrarily slowly growing function of {X}.

In view of these results, it is tempting to conjecture that the condition (4) for one of the {g_i} should be sufficient to obtain the bound

\displaystyle |\frac{1}{X} \sum_{n \leq X} g_1(n+h_1) \dots g_k(n+h_k)| \leq \varepsilon

when {A} is large enough depending on {k,\varepsilon}. This may well be the case for {k=2}. However, the purpose of this blog post is to record a simple counterexample for {k>2}. Let’s take {k=3} for simplicity. Let {t_0} be a quantity much larger than {X} but much smaller than {X^2} (e.g. {t = X^{3/2}}), and set

\displaystyle g_1(n) := n^{it_0}; \quad g_2(n) := n^{-2it_0}; \quad g_3(n) := n^{it_0}.

For {X/2 \leq n \leq X}, Taylor expansion gives

\displaystyle (n+1)^{it} = n^{it_0} \exp( i t_0 / n ) + o(1)


\displaystyle (n+2)^{it} = n^{it_0} \exp( 2 i t_0 / n ) + o(1)

and hence

\displaystyle g_1(n) g_2(n+1) g_3(n+2) = 1 + o(1)

and hence

\displaystyle |\frac{1}{X} \sum_{X/2 \leq n \leq X} g_1(n) g_2(n+1) g_3(n+2)| \gg 1.

On the other hand one can easily verify that all of the {g_1,g_2,g_3} obey (4) (the restriction {|t| \leq AX} there prevents {t} from getting anywhere close to {t_0}). So it seems the correct non-asymptotic version of the Elliott conjecture is the following:

Conjecture 1 (Non-asymptotic Elliott conjecture) Let {k} be a natural number, and let {h_1,\dots,h_k} be integers. Let {\varepsilon > 0}, let {A} be sufficiently large depending on {k,\varepsilon,h_1,\dots,h_k}, and let {X} be sufficiently large depending on {k,\varepsilon,h_1,\dots,h_k,A}. Let {g_1,\dots,g_k} be bounded multiplicative functions such that for some {1 \leq i \leq k}, one has

\displaystyle \inf_{|t| \leq AX^{k-1}} \sum_{p \leq X} \frac{1 - \hbox{Re}( g_i(p) \overline{\chi(p)} p^{-it})}{p} \geq A

for all Dirichlet characters {\chi} of conductor at most {A}. Then

\displaystyle |\frac{1}{X} \sum_{n \leq X} g_1(n+h_1) \dots g_k(n+h_k)| \leq \varepsilon.

The {k=1} case of this conjecture follows from the work of Halasz; in my recent paper a logarithmically averaged version of the {k=2} case of this conjecture is established. The requirement to take {t} to be as large as {A X^{k-1}} does not emerge in the averaged Elliott conjecture in my previous paper with Matomaki and Radziwill; it thus seems that this averaging has concealed some of the subtler features of the Elliott conjecture. (However, this subtlety does not seem to affect the asymptotic version of the conjecture formulated in that paper, in which the hypothesis is of the form (3), and the conclusion is of the form (1).)

A similar subtlety arises when trying to control the maximal integral

\displaystyle \frac{1}{X} \int_X^{2X} \sup_\alpha \frac{1}{H} |\sum_{x \leq n \leq x+H} g(n) e(\alpha n)|\ dx. \ \ \ \ \ (5)

In my previous paper with Matomaki and Radziwill, we could show that easier expression

\displaystyle \frac{1}{X} \sup_\alpha \int_X^{2X} \frac{1}{H} |\sum_{x \leq n \leq x+H} g(n) e(\alpha n)|\ dx. \ \ \ \ \ (6)

was small (for {H} a slowly growing function of {X}) if {g} was bounded and completely multiplicative, and one had a condition of the form

\displaystyle \inf_{|t| \leq AX} \sum_{p \leq X} \frac{1 - \hbox{Re}( g(p) \overline{\chi(p)} p^{-it})}{p} \geq A \ \ \ \ \ (7)

for some large {A}. However, to obtain an analogous bound for (5) it now appears that one needs to strengthen the above condition to

\displaystyle \inf_{|t| \leq AX^2} \sum_{p \leq X} \frac{1 - \hbox{Re}( g(p) \overline{\chi(p)} p^{-it})}{p} \geq A

in order to address the counterexample in which {g(n) = n^{it_0}} for some {t_0} between {X} and {X^2}. This seems to suggest that proving (5) (which is closely related to the {k=3} case of the Chowla conjecture) could in fact be rather difficult; the estimation of (6) relied primarily of prior work of Matomaki and Radziwill which used the hypothesis (7), but as this hypothesis is not sufficient to conclude (5), some additional input must also be used.

Let {X} and {Y} be two random variables taking values in the same (discrete) range {R}, and let {E} be some subset of {R}, which we think of as the set of “bad” outcomes for either {X} or {Y}. If {X} and {Y} have the same probability distribution, then clearly

\displaystyle  {\bf P}( X \in E ) = {\bf P}( Y \in E ).

In particular, if it is rare for {Y} to lie in {E}, then it is also rare for {X} to lie in {E}.

If {X} and {Y} do not have exactly the same probability distribution, but their probability distributions are close to each other in some sense, then we can expect to have an approximate version of the above statement. For instance, from the definition of the total variation distance {\delta(X,Y)} between two random variables (or more precisely, the total variation distance between the probability distributions of two random variables), we see that

\displaystyle  {\bf P}(Y \in E) - \delta(X,Y) \leq {\bf P}(X \in E) \leq {\bf P}(Y \in E) + \delta(X,Y) \ \ \ \ \ (1)

for any {E \subset R}. In particular, if it is rare for {Y} to lie in {E}, and {X,Y} are close in total variation, then it is also rare for {X} to lie in {E}.

A basic inequality in information theory is Pinsker’s inequality

\displaystyle  \delta(X,Y) \leq \sqrt{\frac{1}{2} D_{KL}(X||Y)}

where the Kullback-Leibler divergence {D_{KL}(X||Y)} is defined by the formula

\displaystyle  D_{KL}(X||Y) = \sum_{x \in R} {\bf P}( X=x ) \log \frac{{\bf P}(X=x)}{{\bf P}(Y=x)}.

(See this previous blog post for a proof of this inequality.) A standard application of Jensen’s inequality reveals that {D_{KL}(X||Y)} is non-negative (Gibbs’ inequality), and vanishes if and only if {X}, {Y} have the same distribution; thus one can think of {D_{KL}(X||Y)} as a measure of how close the distributions of {X} and {Y} are to each other, although one should caution that this is not a symmetric notion of distance, as {D_{KL}(X||Y) \neq D_{KL}(Y||X)} in general. Inserting Pinsker’s inequality into (1), we see for instance that

\displaystyle  {\bf P}(X \in E) \leq {\bf P}(Y \in E) + \sqrt{\frac{1}{2} D_{KL}(X||Y)}.

Thus, if {X} is close to {Y} in the Kullback-Leibler sense, and it is rare for {Y} to lie in {E}, then it is rare for {X} to lie in {E} as well.

We can specialise this inequality to the case when {Y} a uniform random variable {U} on a finite range {R} of some cardinality {N}, in which case the Kullback-Leibler divergence {D_{KL}(X||U)} simplifies to

\displaystyle  D_{KL}(X||U) = \log N - {\bf H}(X)


\displaystyle  {\bf H}(X) := \sum_{x \in R} {\bf P}(X=x) \log \frac{1}{{\bf P}(X=x)}

is the Shannon entropy of {X}. Again, a routine application of Jensen’s inequality shows that {{\bf H}(X) \leq \log N}, with equality if and only if {X} is uniformly distributed on {R}. The above inequality then becomes

\displaystyle  {\bf P}(X \in E) \leq {\bf P}(U \in E) + \sqrt{\frac{1}{2}(\log N - {\bf H}(X))}. \ \ \ \ \ (2)

Thus, if {E} is a small fraction of {R} (so that it is rare for {U} to lie in {E}), and the entropy of {X} is very close to the maximum possible value of {\log N}, then it is rare for {X} to lie in {E} also.

The inequality (2) is only useful when the entropy {{\bf H}(X)} is close to {\log N} in the sense that {{\bf H}(X) = \log N - O(1)}, otherwise the bound is worse than the trivial bound of {{\bf P}(X \in E) \leq 1}. In my recent paper on the Chowla and Elliott conjectures, I ended up using a variant of (2) which was still non-trivial when the entropy {{\bf H}(X)} was allowed to be smaller than {\log N - O(1)}. More precisely, I used the following simple inequality, which is implicit in the arguments of that paper but which I would like to make more explicit in this post:

Lemma 1 (Pinsker-type inequality) Let {X} be a random variable taking values in a finite range {R} of cardinality {N}, let {U} be a uniformly distributed random variable in {R}, and let {E} be a subset of {R}. Then

\displaystyle  {\bf P}(X \in E) \leq \frac{(\log N - {\bf H}(X)) + \log 2}{\log 1/{\bf P}(U \in E)}.

Proof: Consider the conditional entropy {{\bf H}(X | 1_{X \in E} )}. On the one hand, we have

\displaystyle  {\bf H}(X | 1_{X \in E} ) = {\bf H}(X, 1_{X \in E}) - {\bf H}(1_{X \in E} )

\displaystyle  = {\bf H}(X) - {\bf H}(1_{X \in E})

\displaystyle  \geq {\bf H}(X) - \log 2

by Jensen’s inequality. On the other hand, one has

\displaystyle  {\bf H}(X | 1_{X \in E} ) = {\bf P}(X \in E) {\bf H}(X | X \in E )

\displaystyle  + (1-{\bf P}(X \in E)) {\bf H}(X | X \not \in E)

\displaystyle  \leq {\bf P}(X \in E) \log |E| + (1-{\bf P}(X \in E)) \log N

\displaystyle  = \log N - {\bf P}(X \in E) \log \frac{N}{|E|}

\displaystyle  = \log N - {\bf P}(X \in E) \log \frac{1}{{\bf P}(U \in E)},

where we have again used Jensen’s inequality. Putting the two inequalities together, we obtain the claim. \Box

Remark 2 As noted in comments, this inequality can be viewed as a special case of the more general inequality

\displaystyle  {\bf P}(X \in E) \leq \frac{D(X||Y) + \log 2}{\log 1/{\bf P}(Y \in E)}

for arbitrary random variables {X,Y} taking values in the same discrete range {R}, which follows from the data processing inequality

\displaystyle  D( f(X)||f(Y)) \leq D(X|| Y)

for arbitrary functions {f}, applied to the indicator function {f = 1_E}. Indeed one has

\displaystyle  D( 1_E(X) || 1_E(Y) ) = {\bf P}(X \in E) \log \frac{{\bf P}(X \in E)}{{\bf P}(Y \in E)}

\displaystyle + {\bf P}(X \not \in E) \log \frac{{\bf P}(X \not \in E)}{{\bf P}(Y \not \in E)}

\displaystyle  \geq {\bf P}(X \in E) \log \frac{1}{{\bf P}(Y \in E)} - h( {\bf P}(X \in E) )

\displaystyle  \geq {\bf P}(X \in E) \log \frac{1}{{\bf P}(Y \in E)} - \log 2

where {h(u) := u \log \frac{1}{u} + (1-u) \log \frac{1}{1-u}} is the entropy function.

Thus, for instance, if one has

\displaystyle  {\bf H}(X) \geq \log N - o(K)


\displaystyle  {\bf P}(U \in E) \leq \exp( - K )

for some {K} much larger than {1} (so that {1/K = o(1)}), then

\displaystyle  {\bf P}(X \in E) = o(1).

More informally: if the entropy of {X} is somewhat close to the maximum possible value of {\log N}, and it is exponentially rare for a uniform variable to lie in {E}, then it is still somewhat rare for {X} to lie in {E}. The estimate given is close to sharp in this regime, as can be seen by calculating the entropy of a random variable {X} which is uniformly distributed inside a small set {E} with some probability {p} and uniformly distributed outside of {E} with probability {1-p}, for some parameter {0 \leq p \leq 1}.

It turns out that the above lemma combines well with concentration of measure estimates; in my paper, I used one of the simplest such estimates, namely Hoeffding’s inequality, but there are of course many other estimates of this type (see e.g. this previous blog post for some others). Roughly speaking, concentration of measure inequalities allow one to make approximations such as

\displaystyle  F(U) \approx {\bf E} F(U)

with exponentially high probability, where {U} is a uniform distribution and {F} is some reasonable function of {U}. Combining this with the above lemma, we can then obtain approximations of the form

\displaystyle  F(X) \approx {\bf E} F(U) \ \ \ \ \ (3)

with somewhat high probability, if the entropy of {X} is somewhat close to maximum. This observation, combined with an “entropy decrement argument” that allowed one to arrive at a situation in which the relevant random variable {X} did have a near-maximum entropy, is the key new idea in my recent paper; for instance, one can use the approximation (3) to obtain an approximation of the form

\displaystyle  \sum_{j=1}^H \sum_{p \in {\mathcal P}} \lambda(n+j) \lambda(n+j+p) 1_{p|n+j}

\displaystyle  \approx \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+j+p)}{p}

for “most” choices of {n} and a suitable choice of {H} (with the latter being provided by the entropy decrement argument). The left-hand side is tied to Chowla-type sums such as {\sum_{n \leq x} \frac{\lambda(n)\lambda(n+1)}{n}} through the multiplicativity of {\lambda}, while the right-hand side, being a linear correlation involving two parameters {j,p} rather than just one, has “finite complexity” and can be treated by existing techniques such as the Hardy-Littlewood circle method. One could hope that one could similarly use approximations such as (3) in other problems in analytic number theory or combinatorics.

I’ve just uploaded two related papers to the arXiv:

This pair of papers is an outgrowth of these two recent blog posts and the ensuing discussion. In the first paper, we establish the following logarithmically averaged version of the Chowla conjecture (in the case {k=2} of two-point correlations (or “pair correlations”)):

Theorem 1 (Logarithmically averaged Chowla conjecture) Let {a_1,a_2} be natural numbers, and let {b_1,b_2} be integers such that {a_1 b_2 - a_2 b_1 \neq 0}. Let {1 \leq \omega(x) \leq x} be a quantity depending on {x} that goes to infinity as {x \rightarrow \infty}. Let {\lambda} denote the Liouville function. Then one has

\displaystyle  \sum_{x/\omega(x) < n \leq x} \frac{\lambda(a_1 n + b_1) \lambda(a_2 n+b_2)}{n} = o( \log \omega(x) ) \ \ \ \ \ (1)

as {x \rightarrow \infty}.

Thus for instance one has

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x). \ \ \ \ \ (2)

For comparison, the non-averaged Chowla conjecture would imply that

\displaystyle  \sum_{n \leq x} \lambda(n) \lambda(n+1) = o(x) \ \ \ \ \ (3)

which is a strictly stronger estimate than (2), and remains open.

The arguments also extend to other completely multiplicative functions than the Liouville function. In particular, one obtains a slightly averaged version of the non-asymptotic Elliott conjecture that was shown in the previous blog post to imply a positive solution to the Erdos discrepancy problem. The averaged version of the conjecture established in this paper is slightly weaker than the one assumed in the previous blog post, but it turns out that the arguments there can be modified without much difficulty to accept this averaged Elliott conjecture as input. In particular, we obtain an unconditional solution to the Erdos discrepancy problem as a consequence; this is detailed in the second paper listed above. In fact we can also handle the vector-valued version of the Erdos discrepancy problem, in which the sequence {f(1), f(2), \dots} takes values in the unit sphere of an arbitrary Hilbert space, rather than in {\{-1,+1\}}.

Estimates such as (2) or (3) are known to be subject to the “parity problem” (discussed numerous times previously on this blog), which roughly speaking means that they cannot be proven solely using “linear” estimates on functions such as the von Mangoldt function. However, it is known that the parity problem can be circumvented using “bilinear” estimates, and this is basically what is done here.

We now describe in informal terms the proof of Theorem 1, focusing on the model case (2) for simplicity. Suppose for contradiction that the left-hand side of (2) was large and (say) positive. Using the multiplicativity {\lambda(pn) = -\lambda(n)}, we conclude that

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+p) 1_{p|n}}{n}

is also large and positive for all primes {p} that are not too large; note here how the logarithmic averaging allows us to leave the constraint {n \leq x} unchanged. Summing in {p}, we conclude that

\displaystyle  \sum_{n \leq x} \frac{ \sum_{p \in {\mathcal P}} \lambda(n) \lambda(n+p) 1_{p|n}}{n}

is large and positive for any given set {{\mathcal P}} of medium-sized primes. By a standard averaging argument, this implies that

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \lambda(n+j) \lambda(n+p+j) 1_{p|n+j} \ \ \ \ \ (4)

is large for many choices of {n}, where {H} is a medium-sized parameter at our disposal to choose, and we take {{\mathcal P}} to be some set of primes that are somewhat smaller than {H}. (A similar approach was taken in this recent paper of Matomaki, Radziwill, and myself to study sign patterns of the Möbius function.) To obtain the required contradiction, one thus wants to demonstrate significant cancellation in the expression (4). As in that paper, we view {n} as a random variable, in which case (4) is essentially a bilinear sum of the random sequence {(\lambda(n+1),\dots,\lambda(n+H))} along a random graph {G_{n,H}} on {\{1,\dots,H\}}, in which two vertices {j, j+p} are connected if they differ by a prime {p} in {{\mathcal P}} that divides {n+j}. A key difficulty in controlling this sum is that for randomly chosen {n}, the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}} need not be independent. To get around this obstacle we introduce a new argument which we call the “entropy decrement argument” (in analogy with the “density increment argument” and “energy increment argument” that appear in the literature surrounding Szemerédi’s theorem on arithmetic progressions, and also reminiscent of the “entropy compression argument” of Moser and Tardos, discussed in this previous post). This argument, which is a simple consequence of the Shannon entropy inequalities, can be viewed as a quantitative version of the standard subadditivity argument that establishes the existence of Kolmogorov-Sinai entropy in topological dynamical systems; it allows one to select a scale parameter {H} (in some suitable range {[H_-,H_+]}) for which the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}} exhibit some weak independence properties (or more precisely, the mutual information between the two random variables is small).

Informally, the entropy decrement argument goes like this: if the sequence {(\lambda(n+1),\dots,\lambda(n+H))} has significant mutual information with {G_{n,H}}, then the entropy of the sequence {(\lambda(n+1),\dots,\lambda(n+H'))} for {H' > H} will grow a little slower than linearly, due to the fact that the graph {G_{n,H}} has zero entropy (knowledge of {G_{n,H}} more or less completely determines the shifts {G_{n+kH,H}} of the graph); this can be formalised using the classical Shannon inequalities for entropy (and specifically, the non-negativity of conditional mutual information). But the entropy cannot drop below zero, so by increasing {H} as necessary, at some point one must reach a metastable region (cf. the finite convergence principle discussed in this previous blog post), within which very little mutual information can be shared between the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}}. Curiously, for the application it is not enough to have a purely quantitative version of this argument; one needs a quantitative bound (which gains a factor of a bit more than {\log H} on the trivial bound for mutual information), and this is surprisingly delicate (it ultimately comes down to the fact that the series {\sum_{j \geq 2} \frac{1}{j \log j \log\log j}} diverges, which is only barely true).

Once one locates a scale {H} with the low mutual information property, one can use standard concentration of measure results such as the Hoeffding inequality to approximate (4) by the significantly simpler expression

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j)}{p}. \ \ \ \ \ (5)

The important thing here is that Hoeffding’s inequality gives exponentially strong bounds on the failure probability, which is needed to counteract the logarithms that are inevitably present whenever trying to use entropy inequalities. The expression (5) can then be controlled in turn by an application of the Hardy-Littlewood circle method and a non-trivial estimate

\displaystyle  \sup_\alpha \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (6)

for averaged short sums of a modulated Liouville function established in another recent paper by Matomäki, Radziwill and myself.

When one uses this method to study more general sums such as

\displaystyle  \sum_{n \leq x} \frac{g_1(n) g_2(n+1)}{n},

one ends up having to consider expressions such as

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} c_p \frac{g_1(n+j) g_2(n+p+j)}{p}.

where {c_p} is the coefficient {c_p := \overline{g_1}(p) \overline{g_2}(p)}. When attacking this sum with the circle method, one soon finds oneself in the situation of wanting to locate the large Fourier coefficients of the exponential sum

\displaystyle  S(\alpha) := \sum_{p \in {\mathcal P}} \frac{c_p}{p} e^{2\pi i \alpha p}.

In many cases (such as in the application to the Erdös discrepancy problem), the coefficient {c_p} is identically {1}, and one can understand this sum satisfactorily using the classical results of Vinogradov: basically, {S(\alpha)} is large when {\alpha} lies in a “major arc” and is small when it lies in a “minor arc”. For more general functions {g_1,g_2}, the coefficients {c_p} are more or less arbitrary; the large values of {S(\alpha)} are no longer confined to the major arc case. Fortunately, even in this general situation one can use a restriction theorem for the primes established some time ago by Ben Green and myself to show that there are still only a bounded number of possible locations {\alpha} (up to the uncertainty mandated by the Heisenberg uncertainty principle) where {S(\alpha)} is large, and we can still conclude by using (6). (Actually, as recently pointed out to me by Ben, one does not need the full strength of our result; one only needs the {L^4} restriction theorem for the primes, which can be proven fairly directly using Plancherel’s theorem and some sieve theory.)

It is tempting to also use the method to attack higher order cases of the (logarithmically) averaged Chowla conjecture, for instance one could try to prove the estimate

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log x).

The above arguments reduce matters to obtaining some non-trivial cancellation for sums of the form

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j) \lambda(n+p+2j)}{p}.

A little bit of “higher order Fourier analysis” (as was done for very similar sums in the ergodic theory context by Frantzikinakis-Host-Kra and Wooley-Ziegler) lets one control this sort of sum if one can establish a bound of the form

\displaystyle  \frac{1}{X} \int_X^{2X} \sup_\alpha |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (7)

where {X} goes to infinity and {H} is a very slowly growing function of {X}. This looks very similar to (6), but the fact that the supremum is now inside the integral makes the problem much more difficult. However it looks worth attacking (7) further, as this estimate looks like it should have many nice applications (beyond just the {k=3} case of the logarithmically averaged Chowla or Elliott conjectures, which is already interesting).

For higher {k} than {k=3}, the same line of analysis requires one to replace the linear phase {e(\alpha n)} by more complicated phases, such as quadratic phases {e(\alpha n^2 + \beta n)} or even {k-2}-step nilsequences. Given that (7) is already beyond the reach of current literature, these even more complicated expressions are also unavailable at present, but one can imagine that they will eventually become tractable, in which case we would obtain an averaged form of the Chowla conjecture for all {k}, which would have a number of consequences (such as a logarithmically averaged version of Sarnak’s conjecture, as per this blog post).

It would of course be very nice to remove the logarithmic averaging, and be able to establish bounds such as (3). I did attempt to do so, but I do not see a way to use the entropy decrement argument in a manner that does not require some sort of averaging of logarithmic type, as it requires one to pick a scale {H} that one cannot specify in advance, which is not a problem for logarithmic averages (which are quite stable with respect to dilations) but is problematic for ordinary averages. But perhaps the problem can be circumvented by some clever modification of the argument. One possible approach would be to start exploiting multiplicativity at products of primes, and not just individual primes, to try to keep the scale fixed, but this makes the concentration of measure part of the argument much more complicated as one loses some independence properties (coming from the Chinese remainder theorem) which allowed one to conclude just from the Hoeffding inequality.

The Chowla conjecture asserts that all non-trivial correlations of the Liouville function are asymptotically negligible; for instance, it asserts that

\displaystyle  \sum_{n \leq X} \lambda(n) \lambda(n+h) = o(X)

as {X \rightarrow \infty} for any fixed natural number {h}. This conjecture remains open, though there are a number of partial results (e.g. these two previous results of Matomaki, Radziwill, and myself).

A natural generalisation of Chowla’s conjecture was proposed by Elliott. For simplicity we will only consider Elliott’s conjecture for the pair correlations

\displaystyle  \sum_{n \leq X} g(n) \overline{g}(n+h).

For such correlations, the conjecture was that one had

\displaystyle  \sum_{n \leq X} g(n) \overline{g}(n+h) = o(X) \ \ \ \ \ (1)

as {X \rightarrow \infty} for any natural number {h}, as long as {g} was a completely multiplicative function with magnitude bounded by {1}, and such that

\displaystyle  \sum_p \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p} = +\infty \ \ \ \ \ (2)

for any Dirichlet character {\chi} and any real number {t}. In the language of “pretentious number theory”, as developed by Granville and Soundararajan, the hypothesis (2) asserts that the completely multiplicative function {g} does not “pretend” to be like the completely multiplicative function {n \mapsto \chi(n) n^{it}} for any character {\chi} and real number {t}. A condition of this form is necessary; for instance, if {g(n)} is precisely equal to {\chi(n) n^{it}} and {\chi} has period {q}, then {g(n) \overline{g}(n+q)} is equal to {1_{(n,q)=1} + o(1)} as {n \rightarrow \infty} and (1) clearly fails. The prime number theorem in arithmetic progressions implies that the Liouville function obeys (2), and so the Elliott conjecture contains the Chowla conjecture as a special case.

As it turns out, Elliott’s conjecture is false as stated, with the counterexample {g} having the property that {g} “pretends” locally to be the function {n \mapsto n^{it_j}} for {n} in various intervals {[1, X_j]}, where {X_j} and {t_j} go to infinity in a certain prescribed sense. See this paper of Matomaki, Radziwill, and myself for details. However, we view this as a technicality, and continue to believe that certain “repaired” versions of Elliott’s conjecture still hold. For instance, our counterexample does not apply when {g} is restricted to be real-valued rather than complex, and we believe that Elliott’s conjecture is valid in this setting. Returning to the complex-valued case, we still expect the asymptotic (1) provided that the condition (2) is replaced by the stronger condition

\displaystyle  \sup_{|t| \leq X} |\sum_{p \leq X} \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p}| \rightarrow +\infty

as {X \rightarrow +\infty} for all fixed Dirichlet characters {\chi}. In our paper we supported this claim by establishing a certain “averaged” version of this conjecture; see that paper for further details. (See also this recent paper of Frantzikinakis and Host which establishes a different averaged version of this conjecture.)

One can make a stronger “non-asymptotic” version of this corrected Elliott conjecture, in which the {X} parameter does not go to infinity, or equivalently that the function {g} is permitted to depend on {X}:

Conjecture 1 (Non-asymptotic Elliott conjecture) Let {\varepsilon > 0}, let {A \geq 1} be sufficiently large depending on {\varepsilon}, and let {X} be sufficiently large depending on {A,\varepsilon}. Suppose that {g} is a completely multiplicative function with magnitude bounded by {1}, such that

\displaystyle  \inf_{|t| \leq AX} |\sum_{p \leq X} \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p}| \geq A

for all Dirichlet characters {\chi} of period at most {A}. Then one has

\displaystyle  |\sum_{n \leq X} g(n) \overline{g(n+h)}| \leq \varepsilon X

for all natural numbers {1 \leq h \leq 1/\varepsilon}.

The {\varepsilon}-dependent factor {A} in the constraint {|t| \leq AX} is necessary, as can be seen by considering the completely multiplicative function {g(n) := n^{2iX}} (for instance). Again, the results in my previous paper with Matomaki and Radziwill can be viewed as establishing an averaged version of this conjecture.

Meanwhile, we have the following conjecture that is the focus of the Polymath5 project:

Conjecture 2 (Erdös discrepancy conjecture) For any function {f: {\bf N} \rightarrow \{-1,+1\}}, the discrepancy

\displaystyle  \sup_{n,d \in {\bf N}} |\sum_{j=1}^n f(jd)|

is infinite.

It is instructive to compute some near-counterexamples to Conjecture 2 that illustrate the difficulty of the Erdös discrepancy problem. The first near-counterexample is that of a non-principal Dirichlet character {f(n) = \chi(n)} that takes values in {\{-1,0,+1\}} rather than {\{-1,+1\}}. For this function, one has from the complete multiplicativity of {\chi} that

\displaystyle  |\sum_{j=1}^n f(jd)| = |\sum_{j=1}^n \chi(j) \chi(d)|

\displaystyle  \leq |\sum_{j=1}^n \chi(j)|.

If {q} denotes the period of {\chi}, then {\chi} has mean zero on every interval of length {q}, and thus

\displaystyle  |\sum_{j=1}^n f(jd)| \leq |\sum_{j=1}^n \chi(j)| \leq q.

Thus {\chi} has bounded discrepancy.

Of course, this is not a true counterexample to Conjecture 2 because {\chi} can take the value {0}. Let us now consider the following variant example, which is the simplest member of a family of examples studied by Borwein, Choi, and Coons. Let {\chi = \chi_3} be the non-principal Dirichlet character of period {3} (thus {\chi(n)} equals {+1} when {n=1 \hbox{ mod } 3}, {-1} when {n = 2 \hbox{ mod } 3}, and {0} when {n = 0 \hbox{ mod } 3}), and define the completely multiplicative function {f = \tilde \chi: {\bf N} \rightarrow \{-1,+1\}} by setting {\tilde \chi(p) := \chi(p)} when {p \neq 3} and {\tilde \chi(3) = +1}. This is about the simplest modification one can make to the previous near-counterexample to eliminate the zeroes. Now consider the sum

\displaystyle  \sum_{j=1}^n \tilde \chi(j)

with {n := 1 + 3 + 3^2 + \dots + 3^k} for some large {k}. Writing {j = 3^a m} with {m} coprime to {3} and {a} at most {k}, we can write this sum as

\displaystyle  \sum_{a=0}^k \sum_{1 \leq m \leq n/3^j} \tilde \chi(3^a m).

Now observe that {\tilde \chi(3^a m) = \tilde \chi(3)^a \tilde \chi(m) = \chi(m)}. The function {\chi} has mean zero on every interval of length three, and {\lfloor n/3^j\rfloor} is equal to {1} mod {3}, and thus

\displaystyle  \sum_{1 \leq m \leq n/3^j} \tilde \chi(3^a m) = 1

for every {a=0,\dots,k}, and thus

\displaystyle  \sum_{j=1}^n \tilde \chi(j) = k+1 \gg \log n.

Thus {\tilde \chi} also has unbounded discrepancy, but only barely so (it grows logarithmically in {n}). These examples suggest that the main “enemy” to proving Conjecture 2 comes from completely multiplicative functions {f} that somehow “pretend” to be like a Dirichlet character but do not vanish at the zeroes of that character. (Indeed, the special case of Conjecture 2 when {f} is completely multiplicative is already open, appears to be an important subcase.)

All of these conjectures remain open. However, I would like to record in this blog post the following striking connection, illustrating the power of the Elliott conjecture (particularly in its nonasymptotic formulation):

Theorem 3 (Elliott conjecture implies unbounded discrepancy) Conjecture 1 implies Conjecture 2.

The argument relies heavily on two observations that were previously made in connection with the Polymath5 project. The first is a Fourier-analytic reduction that replaces the Erdos Discrepancy Problem with an averaged version for completely multiplicative functions {g}. An application of Cauchy-Schwarz then shows that any counterexample to that version will violate the conclusion of Conjecture 1, so if one assumes that conjecture then {g} must pretend to be like a function of the form {n \mapsto \chi(n) n^{it}}. One then uses (a generalisation) of a second argument from Polymath5 to rule out this case, basically by reducing matters to a more complicated version of the Borwein-Choi-Coons analysis. Details are provided below the fold.

There is some hope that the Chowla and Elliott conjectures can be attacked, as the parity barrier which is so impervious to attack for the twin prime conjecture seems to be more permeable in this setting. (For instance, in my previous post I raised a possible approach, based on establishing expander properties of a certain random graph, which seems to get around the parity problem, in principle at least.)

(Update, Sep 25: fixed some treatment of error terms, following a suggestion of Andrew Granville.)

Read the rest of this entry »

The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the Möbius pseudorandomness principle that asserts that the Möbius function {\mu} is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).

However, there is an intriguing “alternate universe” in which the Möbius function is strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero“. In this scenario, the parity problem obstruction disappears, and it becomes possible, in principle, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements are true:

  • (Twin prime conjecture) There are infinitely many primes {p} such that {p+2} is also prime.
  • (No Siegel zeroes) There exists a constant {c>0} such that for every real Dirichlet character {\chi} of conductor {q > 1}, the associated Dirichlet {L}-function {s \mapsto L(s,\chi)} has no zeroes in the interval {[1-\frac{c}{\log q}, 1]}.

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory that should not actually exist, but is surprisingly self-consistent and to date proven to be impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

\displaystyle  \sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2) \ \ \ \ \ (1)

for some large value of {x}, where {\Lambda} is the von Mangoldt function. Actually, in this post we will work with the slight variant

\displaystyle  \sum_{x \leq n \leq 2x} \Lambda_2(n(n+2)) \nu(n(n+2))


\displaystyle  \Lambda_2(n) = (\mu * L^2)(n) = \sum_{d|n} \mu(d) \log^2 \frac{n}{d}

is the second von Mangoldt function, and {*} denotes Dirichlet convolution, and {\nu} is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval {x \leq n \leq 2x} and remove very small primes from {n}, but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve {\nu} is essentially replaced by the more combinatorial restriction {1_{(n(n+2),q^{1/C}\#)=1}} for some large {C}, where {q^{1/C}\#} is the primorial of {q^{1/C}}, but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)

If there is a Siegel zero {L(\beta,\chi)=0} with {\beta} close to {1} and {\chi} a Dirichlet character of conductor {q}, then multiplicative number theory methods can be used to show that the Möbius function {\mu} “pretends” to be like the character {\chi} in the sense that {\mu(p) \approx \chi(p)} for “most” primes {p} near {q} (e.g. in the range {q^\varepsilon \leq p \leq q^C} for some small {\varepsilon>0} and large {C>0}). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that {\mu} pretends to be like {\chi} can be used to construct a tractable approximation (after inserting the sieve weight {\nu}) in the range {[x,2x]} (where {x = q^C} for some large {C}) for the second von Mangoldt function {\Lambda_2}, namely the function

\displaystyle  \tilde \Lambda_2(n) := (\chi * L)(n) = \sum_{d|n} \chi(d) \log^2 \frac{n}{d}.

Roughly speaking, we think of the periodic function {\chi} and the slowly varying function {\log^2} as being of about the same “complexity” as the constant function {1}, so that {\tilde \Lambda_2} is roughly of the same “complexity” as the divisor function

\displaystyle  \tau(n) := (1*1)(n) = \sum_{d|n} 1,

which is considerably simpler to obtain asymptotics for than the von Mangoldt function as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate {\sum_{x \leq n \leq 2x} \tau(n)} to accuracy {O(\sqrt{x})} with little difficulty, whereas to obtain a comparable level of accuracy for {\sum_{x \leq n \leq 2x} \Lambda(n)} or {\sum_{x \leq n \leq 2x} \Lambda_2(n)} is essentially the Riemann hypothesis.)

One expects {\tilde \Lambda_2(n)} to be a good approximant to {\Lambda_2(n)} if {n} is of size {O(x)} and has no prime factors less than {q^{1/C}} for some large constant {C}. The Selberg sieve {\nu} will be mostly supported on numbers with no prime factor less than {q^{1/C}}. As such, one can hope to approximate (1) by the expression

\displaystyle  \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)) \nu(n(n+2)); \ \ \ \ \ (2)

as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

\displaystyle  \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)).

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

\displaystyle \sum_{x \leq n \leq 2x} \tau(n(n+2)). \ \ \ \ \ (3)

Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation {ab+2=cd} with {a,b,c,d} in various ranges; this is clearly related to understanding the equidistribution of the hyperbola {\{ (a,b) \in {\bf Z}/d{\bf Z}: ab + 2 = 0 \hbox{ mod } d \}} in {({\bf Z}/d{\bf Z})^2}. Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{a_1 m + a_2 \overline{m}}{r} )

where {\overline{m}} denotes the inverse of {m} in {({\bf Z}/r{\bf Z})^\times}. One can then use the Weil bound

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{1/2 + o(1)} (a,b,r)^{1/2} \ \ \ \ \ (4)

where {(a,b,r)} is the greatest common divisor of {a,b,r} (with the convention that this is equal to {r} if {a,b} vanish), and the {o(1)} decays to zero as {r \rightarrow \infty}. The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that {x=q^C} with {C} large enough).

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of {r} will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{3/4 + o(1)} (a,b,r)^{1/4} \ \ \ \ \ (5)

whenever {r \geq 1} and {a,b} are coprime to {r}, where the {o(1)} is with respect to the limit {r \rightarrow \infty} (and is uniform in {a,b}).

Proof: Observe from change of variables that the Kloosterman sum {\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} )} is unchanged if one replaces {(a,b)} with {(\lambda a, \lambda^{-1} b)} for {\lambda \in ({\bf Z}/d{\bf Z})^\times}. For fixed {a,b}, the number of such pairs {(\lambda a, \lambda^{-1} b)} is at least {r^{1-o(1)} / (a,b,r)}, thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

\displaystyle  \sum_{a,b \in {\bf Z}/r{\bf Z}} |\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e\left( \frac{am+b\overline{m}}{r} \right)|^4 \ll d^{4+o(1)}.

The left-hand side can be rearranged as

\displaystyle  \sum_{m_1,m_2,m_3,m_4 \in ({\bf Z}/r{\bf Z})^\times} \sum_{a,b \in {\bf Z}/d{\bf Z}}

\displaystyle  e\left( \frac{a(m_1+m_2-m_3-m_4) + b(\overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4})}{r} \right)

which by Fourier summation is equal to

\displaystyle  d^2 \# \{ (m_1,m_2,m_3,m_4) \in (({\bf Z}/r{\bf Z})^\times)^4:

\displaystyle  m_1+m_2-m_3-m_4 = \frac{1}{m_1} + \frac{1}{m_2} - \frac{1}{m_3} - \frac{1}{m_4} = 0 \hbox{ mod } r \}.

Observe from the quadratic formula and the divisor bound that each pair {(x,y)\in ({\bf Z}/r{\bf Z})^2} has at most {O(r^{o(1)})} solutions {(m_1,m_2)} to the system of equations {m_1+m_2=x; \frac{1}{m_1} + \frac{1}{m_2} = y}. Hence the number of quadruples {(m_1,m_2,m_3,m_4)} of the desired form is {r^{2+o(1)}}, and the claim follows. \Box

We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound) Let {\chi} be a primitive real Dirichlet character of conductor {q}, and let {a,b,c,d \in{\bf Z}/q{\bf Z}}. Then

\displaystyle  \sum_{n \in {\bf Z}/q{\bf Z}} \chi(an+b) \chi(cn+d) \ll q^{o(1)} (ad-bc, q).

Proof: As {q} is the conductor of a primitive real Dirichlet character, {q} is equal to {2^j} times a squarefree odd number for some {j \leq 3}. By the Chinese remainder theorem, it thus suffices to establish the claim when {q} is an odd prime. We may assume that {ad-bc} is not divisible by this prime {q}, as the claim is trivial otherwise. If {a} vanishes then {c} does not vanish, and the claim follows from the mean zero nature of {\chi}; similarly if {c} vanishes. Hence we may assume that {a,c} do not vanish, and then we can normalise them to equal {1}. By completing the square it now suffices to show that

\displaystyle  \sum_{n \in {\bf Z}/p{\bf Z}} \chi( n^2 - b ) \ll 1

whenever {b \neq 0 \hbox{ mod } p}. As {\chi} is {+1} on the quadratic residues and {-1} on the non-residues, it now suffices to show that

\displaystyle  \# \{ (m,n) \in ({\bf Z}/p{\bf Z})^2: n^2 - b = m^2 \} = p + O(1).

But by making the change of variables {(x,y) = (n+m,n-m)}, the left-hand side becomes {\# \{ (x,y) \in ({\bf Z}/p{\bf Z})^2: xy=b\}}, and the claim follows. \Box

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which was not needed to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function {\Lambda_2(n(n+2))} in place of {\Lambda(n) \Lambda(n+2)}. These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.

Read the rest of this entry »

The Poincaré upper half-plane {{\mathbf H} := \{ z: \hbox{Im}(z) > 0 \}} (with a boundary consisting of the real line {{\bf R}} together with the point at infinity {\infty}) carries an action of the projective special linear group

\displaystyle  \hbox{PSL}_2({\bf R}) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in {\bf R}: ad-bc = 1 \} / \{\pm 1\}

via fractional linear transformations:

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} z := \frac{az+b}{cz+d}. \ \ \ \ \ (1)

Here and in the rest of the post we will abuse notation by identifying elements {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} of the special linear group {\hbox{SL}_2({\bf R})} with their equivalence class {\{ \pm \begin{pmatrix} a & b \\ c & d \end{pmatrix} \}} in {\hbox{PSL}_2({\bf R})}; this will occasionally create or remove a factor of two in our formulae, but otherwise has very little effect, though one has to check that various definitions and expressions (such as (1)) are unaffected if one replaces a matrix {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} by its negation {\begin{pmatrix} -a & -b \\ -c & -d \end{pmatrix}}. In particular, we recommend that the reader ignore the signs {\pm} that appear from time to time in the discussion below.

As the action of {\hbox{PSL}_2({\bf R})} on {{\mathbf H}} is transitive, and any given point in {{\mathbf H}} (e.g. {i}) has a stabiliser isomorphic to the projective rotation group {\hbox{PSO}_2({\bf R})}, we can view the Poincaré upper half-plane {{\mathbf H}} as a homogeneous space for {\hbox{PSL}_2({\bf R})}, and more specifically the quotient space of {\hbox{PSL}_2({\bf R})} of a maximal compact subgroup {\hbox{PSO}_2({\bf R})}. In fact, we can make the half-plane a symmetric space for {\hbox{PSL}_2({\bf R})}, by endowing {{\mathbf H}} with the Riemannian metric

\displaystyle  dg^2 := \frac{dx^2 + dy^2}{y^2}

(using Cartesian coordinates {z=x+iy}), which is invariant with respect to the {\hbox{PSL}_2({\bf R})} action. Like any other Riemannian metric, the metric on {{\mathbf H}} generates a number of other important geometric objects on {{\mathbf H}}, such as the distance function {d(z,w)} which can be computed to be given by the formula

\displaystyle  2(\cosh(d(z_1,z_2))-1) = \frac{|z_1-z_2|^2}{\hbox{Im}(z_1) \hbox{Im}(z_2)}, \ \ \ \ \ (2)

the volume measure {\mu = \mu_{\mathbf H}}, which can be computed to be

\displaystyle  d\mu = \frac{dx dy}{y^2},

and the Laplace-Beltrami operator, which can be computed to be {\Delta = y^2 (\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2})} (here we use the negative definite sign convention for {\Delta}). As the metric {dg} was {\hbox{PSL}_2({\bf R})}-invariant, all of these quantities arising from the metric are similarly {\hbox{PSL}_2({\bf R})}-invariant in the appropriate sense.

The Gauss curvature of the Poincaré half-plane can be computed to be the constant {-1}, thus {{\mathbf H}} is a model for two-dimensional hyperbolic geometry, in much the same way that the unit sphere {S^2} in {{\bf R}^3} is a model for two-dimensional spherical geometry (or {{\bf R}^2} is a model for two-dimensional Euclidean geometry). (Indeed, {{\mathbf H}} is isomorphic (via projection to a null hyperplane) to the upper unit hyperboloid {\{ (x,t) \in {\bf R}^{2+1}: t = \sqrt{1+|x|^2}\}} in the Minkowski spacetime {{\bf R}^{2+1}}, which is the direct analogue of the unit sphere in Euclidean spacetime {{\bf R}^3} or the plane {{\bf R}^2} in Galilean spacetime {{\bf R}^2 \times {\bf R}}.)

One can inject arithmetic into this geometric structure by passing from the Lie group {\hbox{PSL}_2({\bf R})} to the full modular group

\displaystyle  \hbox{PSL}_2({\bf Z}) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in {\bf Z}: ad-bc = 1 \} / \{\pm 1\}

or congruence subgroups such as

\displaystyle  \Gamma_0(q) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \hbox{PSL}_2({\bf Z}): c = 0\ (q) \} / \{ \pm 1 \} \ \ \ \ \ (3)

for natural number {q}, or to the discrete stabiliser {\Gamma_\infty} of the point at infinity:

\displaystyle  \Gamma_\infty := \{ \pm \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}: b \in {\bf Z} \} / \{\pm 1\}. \ \ \ \ \ (4)

These are discrete subgroups of {\hbox{PSL}_2({\bf R})}, nested by the subgroup inclusions

\displaystyle  \Gamma_\infty \leq \Gamma_0(q) \leq \Gamma_0(1)=\hbox{PSL}_2({\bf Z}) \leq \hbox{PSL}_2({\bf R}).

There are many further discrete subgroups of {\hbox{PSL}_2({\bf R})} (known collectively as Fuchsian groups) that one could consider, but we will focus attention on these three groups in this post.

Any discrete subgroup {\Gamma} of {\hbox{PSL}_2({\bf R})} generates a quotient space {\Gamma \backslash {\mathbf H}}, which in general will be a non-compact two-dimensional orbifold. One can understand such a quotient space by working with a fundamental domain {\hbox{Fund}( \Gamma \backslash {\mathbf H})} – a set consisting of a single representative of each of the orbits {\Gamma z} of {\Gamma} in {{\mathbf H}}. This fundamental domain is by no means uniquely defined, but if the fundamental domain is chosen with some reasonable amount of regularity, one can view {\Gamma \backslash {\mathbf H}} as the fundamental domain with the boundaries glued together in an appropriate sense. Among other things, fundamental domains can be used to induce a volume measure {\mu = \mu_{\Gamma \backslash {\mathbf H}}} on {\Gamma \backslash {\mathbf H}} from the volume measure {\mu = \mu_{\mathbf H}} on {{\mathbf H}} (restricted to a fundamental domain). By abuse of notation we will refer to both measures simply as {\mu} when there is no chance of confusion.

For instance, a fundamental domain for {\Gamma_\infty \backslash {\mathbf H}} is given (up to null sets) by the strip {\{ z \in {\mathbf H}: |\hbox{Re}(z)| < \frac{1}{2} \}}, with {\Gamma_\infty \backslash {\mathbf H}} identifiable with the cylinder formed by gluing together the two sides of the strip. A fundamental domain for {\hbox{PSL}_2({\bf Z}) \backslash {\mathbf H}} is famously given (again up to null sets) by an upper portion {\{ z \in {\mathbf H}: |\hbox{Re}(z)| < \frac{1}{2}; |z| > 1 \}}, with the left and right sides again glued to each other, and the left and right halves of the circular boundary glued to itself. A fundamental domain for {\Gamma_0(q) \backslash {\mathbf H}} can be formed by gluing together

\displaystyle  [\hbox{PSL}_2({\bf Z}) : \Gamma_0(q)] = q \prod_{p|q} (1 + \frac{1}{p}) = q^{1+o(1)}

copies of a fundamental domain for {\hbox{PSL}_2({\bf Z}) \backslash {\mathbf H}} in a rather complicated but interesting fashion.

While fundamental domains can be a convenient choice of coordinates to work with for some computations (as well as for drawing appropriate pictures), it is geometrically more natural to avoid working explicitly on such domains, and instead work directly on the quotient spaces {\Gamma \backslash {\mathbf H}}. In order to analyse functions {f: \Gamma \backslash {\mathbf H} \rightarrow {\bf C}} on such orbifolds, it is convenient to lift such functions back up to {{\mathbf H}} and identify them with functions {f: {\mathbf H} \rightarrow {\bf C}} which are {\Gamma}-automorphic in the sense that {f( \gamma z ) = f(z)} for all {z \in {\mathbf H}} and {\gamma \in \Gamma}. Such functions will be referred to as {\Gamma}-automorphic forms, or automorphic forms for short (we always implicitly assume all such functions to be measurable). (Strictly speaking, these are the automorphic forms with trivial factor of automorphy; one can certainly consider other factors of automorphy, particularly when working with holomorphic modular forms, which corresponds to sections of a more non-trivial line bundle over {\Gamma \backslash {\mathbf H}} than the trivial bundle {(\Gamma \backslash {\mathbf H}) \times {\bf C}} that is implicitly present when analysing scalar functions {f: {\mathbf H} \rightarrow {\bf C}}. However, we will not discuss this (important) more general situation here.)

An important way to create a {\Gamma}-automorphic form is to start with a non-automorphic function {f: {\mathbf H} \rightarrow {\bf C}} obeying suitable decay conditions (e.g. bounded with compact support will suffice) and form the Poincaré series {P_\Gamma[f]: {\mathbf H} \rightarrow {\bf C}} defined by

\displaystyle  P_{\Gamma}[f](z) = \sum_{\gamma \in \Gamma} f(\gamma z),

which is clearly {\Gamma}-automorphic. (One could equivalently write {f(\gamma^{-1} z)} in place of {f(\gamma z)} here; there are good argument for both conventions, but I have ultimately decided to use the {f(\gamma z)} convention, which makes explicit computations a little neater at the cost of making the group actions work in the opposite order.) Thus we naturally see sums over {\Gamma} associated with {\Gamma}-automorphic forms. A little more generally, given a subgroup {\Gamma_\infty} of {\Gamma} and a {\Gamma_\infty}-automorphic function {f: {\mathbf H} \rightarrow {\bf C}} of suitable decay, we can form a relative Poincaré series {P_{\Gamma_\infty \backslash \Gamma}[f]: {\mathbf H} \rightarrow {\bf C}} by

\displaystyle  P_{\Gamma_\infty \backslash \Gamma}[f](z) = \sum_{\gamma \in \hbox{Fund}(\Gamma_\infty \backslash \Gamma)} f(\gamma z)

where {\hbox{Fund}(\Gamma_\infty \backslash \Gamma)} is any fundamental domain for {\Gamma_\infty \backslash \Gamma}, that is to say a subset of {\Gamma} consisting of exactly one representative for each right coset of {\Gamma_\infty}. As {f} is {\Gamma_\infty}-automorphic, we see (if {f} has suitable decay) that {P_{\Gamma_\infty \backslash \Gamma}[f]} does not depend on the precise choice of fundamental domain, and is {\Gamma}-automorphic. These operations are all compatible with each other, for instance {P_\Gamma = P_{\Gamma_\infty \backslash \Gamma} \circ P_{\Gamma_\infty}}. A key example of Poincaré series are the Eisenstein series, although there are of course many other Poincaré series one can consider by varying the test function {f}.

For future reference we record the basic but fundamental unfolding identities

\displaystyle  \int_{\Gamma \backslash {\mathbf H}} P_\Gamma[f] g\ d\mu_{\Gamma \backslash {\mathbf H}} = \int_{\mathbf H} f g\ d\mu_{\mathbf H} \ \ \ \ \ (5)

for any function {f: {\mathbf H} \rightarrow {\bf C}} with sufficient decay, and any {\Gamma}-automorphic function {g} of reasonable growth (e.g. {f} bounded and compact support, and {g} bounded, will suffice). Note that {g} is viewed as a function on {\Gamma \backslash {\mathbf H}} on the left-hand side, and as a {\Gamma}-automorphic function on {{\mathbf H}} on the right-hand side. More generally, one has

\displaystyle  \int_{\Gamma \backslash {\mathbf H}} P_{\Gamma_\infty \backslash \Gamma}[f] g\ d\mu_{\Gamma \backslash {\mathbf H}} = \int_{\Gamma_\infty \backslash {\mathbf H}} f g\ d\mu_{\Gamma_\infty \backslash {\mathbf H}} \ \ \ \ \ (6)

whenever {\Gamma_\infty \leq \Gamma} are discrete subgroups of {\hbox{PSL}_2({\bf R})}, {f} is a {\Gamma_\infty}-automorphic function with sufficient decay on {\Gamma_\infty \backslash {\mathbf H}}, and {g} is a {\Gamma}-automorphic (and thus also {\Gamma_\infty}-automorphic) function of reasonable growth. These identities will allow us to move fairly freely between the three domains {{\mathbf H}}, {\Gamma_\infty \backslash {\mathbf H}}, and {\Gamma \backslash {\mathbf H}} in our analysis.

When computing various statistics of a Poincaré series {P_\Gamma[f]}, such as its values {P_\Gamma[f](z)} at special points {z}, or the {L^2} quantity {\int_{\Gamma \backslash {\mathbf H}} |P_\Gamma[f]|^2\ d\mu}, expressions of interest to analytic number theory naturally emerge. We list three basic examples of this below, discussed somewhat informally in order to highlight the main ideas rather than the technical details.

The first example we will give concerns the problem of estimating the sum

\displaystyle  \sum_{n \leq x} \tau(n) \tau(n+1), \ \ \ \ \ (7)

where {\tau(n) := \sum_{d|n} 1} is the divisor function. This can be rewritten (by factoring {n=bc} and {n+1=ad}) as

\displaystyle  \sum_{ a,b,c,d \in {\bf N}: ad-bc = 1} 1_{bc \leq x} \ \ \ \ \ (8)

which is basically a sum over the full modular group {\hbox{PSL}_2({\bf Z})}. At this point we will “cheat” a little by moving to the related, but different, sum

\displaystyle  \sum_{a,b,c,d \in {\bf Z}: ad-bc = 1} 1_{a^2+b^2+c^2+d^2 \leq x}. \ \ \ \ \ (9)

This sum is not exactly the same as (8), but will be a little easier to handle, and it is plausible that the methods used to handle this sum can be modified to handle (8). Observe from (2) and some calculation that the distance between {i} and {\begin{pmatrix} a & b \\ c & d \end{pmatrix} i = \frac{ai+b}{ci+d}} is given by the formula

\displaystyle  2(\cosh(d(i,\begin{pmatrix} a & b \\ c & d \end{pmatrix} i))-1) = a^2+b^2+c^2+d^2 - 2

and so one can express the above sum as

\displaystyle  2 \sum_{\gamma \in \hbox{PSL}_2({\bf Z})} 1_{d(i,\gamma i) \leq \hbox{cosh}^{-1}(x/2)}

(the factor of {2} coming from the quotient by {\{\pm 1\}} in the projective special linear group); one can express this as {P_\Gamma[f](i)}, where {\Gamma = \hbox{PSL}_2({\bf Z})} and {f} is the indicator function of the ball {B(i, \hbox{cosh}^{-1}(x/2))}. Thus we see that expressions such as (7) are related to evaluations of Poincaré series. (In practice, it is much better to use smoothed out versions of indicator functions in order to obtain good control on sums such as (7) or (9), but we gloss over this technical detail here.)

The second example concerns the relative

\displaystyle  \sum_{n \leq x} \tau(n^2+1) \ \ \ \ \ (10)

of the sum (7). Note from multiplicativity that (7) can be written as {\sum_{n \leq x} \tau(n^2+n)}, which is superficially very similar to (10), but with the key difference that the polynomial {n^2+1} is irreducible over the integers.

As with (7), we may expand (10) as

\displaystyle  \sum_{A,B,C \in {\bf N}: B^2 - AC = -1} 1_{B \leq x}.

At first glance this does not look like a sum over a modular group, but one can manipulate this expression into such a form in one of two (closely related) ways. First, observe that any factorisation {B + i = (a-bi) (c+di)} of {B+i} into Gaussian integers {a-bi, c+di} gives rise (upon taking norms) to an identity of the form {B^2 - AC = -1}, where {A = a^2+b^2} and {C = c^2+d^2}. Conversely, by using the unique factorisation of the Gaussian integers, every identity of the form {B^2-AC=-1} gives rise to a factorisation of the form {B+i = (a-bi) (c+di)}, essentially uniquely up to units. Now note that {(a-bi)(c+di)} is of the form {B+i} if and only if {ad-bc=1}, in which case {B = ac+bd}. Thus we can essentially write the above sum as something like

\displaystyle  \sum_{a,b,c,d: ad-bc = 1} 1_{|ac+bd| \leq x} \ \ \ \ \ (11)

and one the modular group {\hbox{PSL}_2({\bf Z})} is now manifest. An equivalent way to see these manipulations is as follows. A triple {A,B,C} of natural numbers with {B^2-AC=1} gives rise to a positive quadratic form {Ax^2+2Bxy+Cy^2} of normalised discriminant {B^2-AC} equal to {-1} with integer coefficients (it is natural here to allow {B} to take integer values rather than just natural number values by essentially doubling the sum). The group {\hbox{PSL}_2({\bf Z})} acts on the space of such quadratic forms in a natural fashion (by composing the quadratic form with the inverse {\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}} of an element {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} of {\hbox{SL}_2({\bf Z})}). Because the discriminant {-1} has class number one (this fact is equivalent to the unique factorisation of the gaussian integers, as discussed in this previous post), every form {Ax^2 + 2Bxy + Cy^2} in this space is equivalent (under the action of some element of {\hbox{PSL}_2({\bf Z})}) with the standard quadratic form {x^2+y^2}. In other words, one has

\displaystyle  Ax^2 + 2Bxy + Cy^2 = (dx-by)^2 + (-cx+ay)^2

which (up to a harmless sign) is exactly the representation {B = ac+bd}, {A = c^2+d^2}, {C = a^2+b^2} introduced earlier, and leads to the same reformulation of the sum (10) in terms of expressions like (11). Similar considerations also apply if the quadratic polynomial {n^2+1} is replaced by another quadratic, although one has to account for the fact that the class number may now exceed one (so that unique factorisation in the associated quadratic ring of integers breaks down), and in the positive discriminant case the fact that the group of units might be infinite presents another significant technical problem.

Note that {\begin{pmatrix} a & b \\ c & d \end{pmatrix} i = \frac{ai+b}{ci+d}} has real part {\frac{ac+bd}{c^2+d^2}} and imaginary part {\frac{1}{c^2+d^2}}. Thus (11) is (up to a factor of two) the Poincaré series {P_\Gamma[f](i)} as in the preceding example, except that {f} is now the indicator of the sector {\{ z: |\hbox{Re} z| \leq x |\hbox{Im} z| \}}.

Sums involving subgroups of the full modular group, such as {\Gamma_0(q)}, often arise when imposing congruence conditions on sums such as (10), for instance when trying to estimate the expression {\sum_{n \leq x: q|n} \tau(n^2+1)} when {q} and {x} are large. As before, one then soon arrives at the problem of evaluating a Poincaré series at one or more special points, where the series is now over {\Gamma_0(q)} rather than {\hbox{PSL}_2({\bf Z})}.

The third and final example concerns averages of Kloosterman sums

\displaystyle  S(m,n;c) := \sum_{x \in ({\bf Z}/c{\bf Z})^\times} e( \frac{mx + n\overline{x}}{c} ) \ \ \ \ \ (12)

where {e(\theta) := e^{2p\i i\theta}} and {\overline{x}} is the inverse of {x} in the multiplicative group {({\bf Z}/c{\bf Z})^\times}. It turns out that the {L^2} norms of Poincaré series {P_\Gamma[f]} or {P_{\Gamma_\infty \backslash \Gamma}[f]} are closely tied to such averages. Consider for instance the quantity

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} |P_{\Gamma_\infty \backslash \Gamma_0(q)}[f]|^2\ d\mu_{\Gamma \backslash {\mathbf H}} \ \ \ \ \ (13)

where {q} is a natural number and {f} is a {\Gamma_\infty}-automorphic form that is of the form

\displaystyle  f(x+iy) = F(my) e(m x)

for some integer {m} and some test function {f: (0,+\infty) \rightarrow {\bf C}}, which for sake of discussion we will take to be smooth and compactly supported. Using the unfolding formula (6), we may rewrite (13) as

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} \overline{f} P_{\Gamma_\infty \backslash \Gamma_0(q)}[f]\ d\mu_{\Gamma_\infty \backslash {\mathbf H}}.

To compute this, we use the double coset decomposition

\displaystyle  \Gamma_0(q) = \Gamma_\infty \cup \bigcup_{c \in {\mathbf N}: q|c} \bigcup_{1 \leq d \leq c: (d,c)=1} \Gamma_\infty \begin{pmatrix} a & b \\ c & d \end{pmatrix} \Gamma_\infty,

where for each {c,d}, {a,b} are arbitrarily chosen integers such that {ad-bc=1}. To see this decomposition, observe that every element {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} in {\Gamma_0(q)} outside of {\Gamma_\infty} can be assumed to have {c>0} by applying a sign {\pm}, and then using the row and column operations coming from left and right multiplication by {\Gamma_\infty} (that is, shifting the top row by an integer multiple of the bottom row, and shifting the right column by an integer multiple of the left column) one can place {d} in the interval {[1,c]} and {(a,b)} to be any specified integer pair with {ad-bc=1}. From this we see that

\displaystyle  P_{\Gamma_\infty \backslash \Gamma_0(q)}[f] = f + \sum_{c \in {\mathbf N}: q|c} \sum_{1 \leq d \leq c: (d,c)=1} P_{\Gamma_\infty}[ f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot ) ]

and so from further use of the unfolding formula (5) we may expand (13) as

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} |f|^2\ d\mu_{\Gamma_\infty \backslash {\mathbf H}}

\displaystyle  + \sum_{c \in {\mathbf N}} \sum_{1 \leq d \leq c: (d,c)=1} \int_{\mathbf H} \overline{f}(z) f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} z)\ d\mu_{\mathbf H}.

The first integral is just {m \int_0^\infty |F(y)|^2 \frac{dy}{y^2}}. The second expression is more interesting. We have

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} z = \frac{az+b}{cz+d} = \frac{a}{c} - \frac{1}{c(cz+d)}

\displaystyle  = \frac{a}{c} - \frac{cx+d}{c((cx+d)^2+c^2y^2)} + \frac{iy}{(cx+d)^2 + c^2y^2}

so we can write

\displaystyle  \int_{\mathbf H} \overline{f}(z) f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} z)\ d\mu_{\mathbf H}


\displaystyle  \int_0^\infty \int_{\bf R} \overline{F}(my) F(\frac{imy}{(cx+d)^2 + c^2y^2}) e( -mx + \frac{ma}{c} - m \frac{cx+d}{c((cx+d)^2+c^2y^2)} )

\displaystyle \frac{dx dy}{y^2}

which on shifting {x} by {d/c} simplifies a little to

\displaystyle  e( \frac{ma}{c} + \frac{md}{c} ) \int_0^\infty \int_{\bf R} F(my) \bar{F}(\frac{imy}{c^2(x^2 + y^2)}) e(- mx - m \frac{x}{c^2(x^2+y^2)} )

\displaystyle  \frac{dx dy}{y^2}

and then on scaling {x,y} by {m} simplifies a little further to

\displaystyle  e( \frac{ma}{c} + \frac{md}{c} ) \int_0^\infty \int_{\bf R} F(y) \bar{F}(\frac{m^2}{c^2} \frac{iy}{x^2 + y^2}) e(- x - \frac{m^2}{c^2} \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}.

Note that as {ad-bc=1}, we have {a = \overline{d}} modulo {c}. Comparing the above calculations with (12), we can thus write (13) as

\displaystyle  m (\int_0^\infty |F(y)|^2 \frac{dy}{y^2} + \sum_{q|c} \frac{S(m,m;c)}{c} V(\frac{m}{c})) \ \ \ \ \ (14)


\displaystyle  V(u) := \frac{1}{u} \int_0^\infty \int_{\bf R} F(y) \bar{F}(u^2 \frac{y}{x^2 + y^2}) e(- x - u^2 \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}

is a certain integral involving {F} and a parameter {u}, but which does not depend explicitly on parameters such as {m,c,d}. Thus we have indeed expressed the {L^2} expression (13) in terms of Kloosterman sums. It is possible to invert this analysis and express varius weighted sums of Kloosterman sums in terms of {L^2} expressions (possibly involving inner products instead of norms) of Poincaré series, but we will not do so here; see Chapter 16 of Iwaniec and Kowalski for further details.

Traditionally, automorphic forms have been analysed using the spectral theory of the Laplace-Beltrami operator {-\Delta} on spaces such as {\Gamma\backslash {\mathbf H}} or {\Gamma_\infty \backslash {\mathbf H}}, so that a Poincaré series such as {P_\Gamma[f]} might be expanded out using inner products of {P_\Gamma[f]} (or, by the unfolding identities, {f}) with various generalised eigenfunctions of {-\Delta} (such as cuspidal eigenforms, or Eisenstein series). With this approach, special functions, and specifically the modified Bessel functions {K_{it}} of the second kind, play a prominent role, basically because the {\Gamma_\infty}-automorphic functions

\displaystyle  x+iy \mapsto y^{1/2} K_{it}(2\pi |m| y) e(mx)

for {t \in {\bf R}} and {m \in {\bf Z}} non-zero are generalised eigenfunctions of {-\Delta} (with eigenvalue {\frac{1}{4}+t^2}), and are almost square-integrable on {\Gamma_\infty \backslash {\mathbf H}} (the {L^2} norm diverges only logarithmically at one end {y \rightarrow 0^+} of the cylinder {\Gamma_\infty \backslash {\mathbf H}}, while decaying exponentially fast at the other end {y \rightarrow +\infty}).

However, as discussed in this previous post, the spectral theory of an essentially self-adjoint operator such as {-\Delta} is basically equivalent to the theory of various solution operators associated to partial differential equations involving that operator, such as the Helmholtz equation {(-\Delta + k^2) u = f}, the heat equation {\partial_t u = \Delta u}, the Schrödinger equation {i\partial_t u + \Delta u = 0}, or the wave equation {\partial_{tt} u = \Delta u}. Thus, one can hope to rephrase many arguments that involve spectral data of {-\Delta} into arguments that instead involve resolvents {(-\Delta + k^2)^{-1}}, heat kernels {e^{t\Delta}}, Schrödinger propagators {e^{it\Delta}}, or wave propagators {e^{\pm it\sqrt{-\Delta}}}, or involve the PDE more directly (e.g. applying integration by parts and energy methods to solutions of such PDE). This is certainly done to some extent in the existing literature; resolvents and heat kernels, for instance, are often utilised. In this post, I would like to explore the possibility of reformulating spectral arguments instead using the inhomogeneous wave equation

\displaystyle  \partial_{tt} u - \Delta u = F.

Actually it will be a bit more convenient to normalise the Laplacian by {\frac{1}{4}}, and look instead at the automorphic wave equation

\displaystyle  \partial_{tt} u + (-\Delta - \frac{1}{4}) u = F. \ \ \ \ \ (15)

This equation somewhat resembles a “Klein-Gordon” type equation, except that the mass is imaginary! This would lead to pathological behaviour were it not for the negative curvature, which in principle creates a spectral gap of {\frac{1}{4}} that cancels out this factor.

The point is that the wave equation approach gives access to some nice PDE techniques, such as energy methods, Sobolev inequalities and finite speed of propagation, which are somewhat submerged in the spectral framework. The wave equation also interacts well with Poincaré series; if for instance {u} and {F} are {\Gamma_\infty}-automorphic solutions to (15) obeying suitable decay conditions, then their Poincaré series {P_{\Gamma_\infty \backslash \Gamma}[u]} and {P_{\Gamma_\infty \backslash \Gamma}[F]} will be {\Gamma}-automorphic solutions to the same equation (15), basically because the Laplace-Beltrami operator commutes with translations. Because of these facts, it is possible to replicate several standard spectral theory arguments in the wave equation framework, without having to deal directly with things like the asymptotics of modified Bessel functions. The wave equation approach to automorphic theory was introduced by Faddeev and Pavlov (using the Lax-Phillips scattering theory), and developed further by by Lax and Phillips, to recover many spectral facts about the Laplacian on modular curves, such as the Weyl law and the Selberg trace formula. Here, I will illustrate this by deriving three basic applications of automorphic methods in a wave equation framework, namely

  • Using the Weil bound on Kloosterman sums to derive Selberg’s 3/16 theorem on the least non-trivial eigenvalue for {-\Delta} on {\Gamma_0(q) \backslash {\mathbf H}} (discussed previously here);
  • Conversely, showing that Selberg’s eigenvalue conjecture (improving Selberg’s {3/16} bound to the optimal {1/4}) implies an optimal bound on (smoothed) sums of Kloosterman sums; and
  • Using the same bound to obtain pointwise bounds on Poincaré series similar to the ones discussed above. (Actually, the argument here does not use the wave equation, instead it just uses the Sobolev inequality.)

This post originated from an attempt to finally learn this part of analytic number theory properly, and to see if I could use a PDE-based perspective to understand it better. Ultimately, this is not that dramatic a depature from the standard approach to this subject, but I found it useful to think of things in this fashion, probably due to my existing background in PDE.

I thank Bill Duke and Ben Green for helpful discussions. My primary reference for this theory was Chapters 15, 16, and 21 of Iwaniec and Kowalski.

Read the rest of this entry »


RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.

Get every new post delivered to your Inbox.

Join 5,412 other followers