
In this blog post, I would like to specialise the arguments of Bourgain, Demeter, and Guth from the previous post to the two-dimensional case of the Vinogradov main conjecture, namely

Theorem 1 (Two-dimensional Vinogradov main conjecture) One has

\displaystyle \int_{[0,1]^2} |\sum_{j=0}^N e( j x + j^2 y)|^6\ dx dy \ll N^{3+o(1)}

as {N \rightarrow \infty}.

This particular case of the main conjecture has a classical proof using some elementary number theory. Indeed, the left-hand side can be viewed as the number of solutions to the system of equations

\displaystyle j_1 + j_2 + j_3 = k_1 + k_2 + k_3

\displaystyle j_1^2 + j_2^2 + j_3^2 = k_1^2 + k_2^2 + k_3^2

with {j_1,j_2,j_3,k_1,k_2,k_3 \in \{0,\dots,N\}}. These two equations can be combined (using the algebraic identity {(a+b-c)^2 - (a^2+b^2-c^2) = 2 (a-c)(b-c)} applied to {(a,b,c) = (j_1,j_2,k_3), (k_1,k_2,j_3)}) to imply the further equation

\displaystyle (j_1 - k_3) (j_2 - k_3) = (k_1 - j_3) (k_2 - j_3)

which, when combined with the divisor bound, shows that each {k_1,k_2,j_3} is associated to {O(N^{o(1)})} choices of {j_1,j_2,k_3} excluding diagonal cases when two of the {j_1,j_2,j_3,k_1,k_2,k_3} collide, and this easily yields Theorem 1. However, the Bourgain-Demeter-Guth argument (which, in the two dimensional case, is essentially contained in a previous paper of Bourgain and Demeter) does not require the divisor bound, and extends for instance to the more general case where {j} ranges in a {1}-separated set of reals between {0} and {N}.
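
As a quick sanity check of the counting interpretation and of the factored equation above, the following Python sketch (a brute-force illustration of my own, not part of the Bourgain-Demeter-Guth argument; the helper names are hypothetical) counts the solutions to the system for small {N} by grouping triples according to {(j_1+j_2+j_3, j_1^2+j_2^2+j_3^2)}, and verifies the factored equation on every solution for a tiny {N}. At these sizes it of course proves nothing about the {N^{3+o(1)}} growth.

```python
from collections import Counter
from itertools import product


def count_solutions(N):
    """Count 6-tuples (j1, j2, j3, k1, k2, k3) in {0, ..., N}^6 with
    j1 + j2 + j3 = k1 + k2 + k3 and j1^2 + j2^2 + j3^2 = k1^2 + k2^2 + k3^2."""
    buckets = Counter()
    for triple in product(range(N + 1), repeat=3):
        buckets[(sum(triple), sum(t * t for t in triple))] += 1
    # Any ordered pair of triples in the same bucket gives exactly one solution.
    return sum(c * c for c in buckets.values())


def identity_holds_on_all_solutions(N):
    """Brute-force check that every solution satisfies (j1-k3)(j2-k3) = (k1-j3)(k2-j3)."""
    for j1, j2, j3, k1, k2, k3 in product(range(N + 1), repeat=6):
        if j1 + j2 + j3 != k1 + k2 + k3:
            continue
        if j1**2 + j2**2 + j3**2 != k1**2 + k2**2 + k3**2:
            continue
        if (j1 - k3) * (j2 - k3) != (k1 - j3) * (k2 - j3):
            return False
    return True


for N in (10, 20, 40):
    # Theorem 1 says this ratio grows more slowly than any power of N.
    print(N, count_solutions(N) / (N + 1) ** 3)
```

Grouping by the pair of power sums turns a sixfold search into a threefold one, which is the same observation that underlies the Plancherel interpretation of the integral.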

In this special case, the Bourgain-Demeter argument simplifies, as the lower dimensional inductive hypothesis becomes a simple {L^2} almost orthogonality claim, and the multilinear Kakeya estimate needed is also easy (collapsing to just Fubini’s theorem). Also one can work entirely in the context of the Vinogradov main conjecture, and not turn to the increased generality of decoupling inequalities (though this additional generality is convenient in higher dimensions). As such, I am presenting this special case as an introduction to the Bourgain-Demeter-Guth machinery.

We now give the specialisation of the Bourgain-Demeter argument to Theorem 1. It will suffice to establish the bound

\displaystyle \int_{[0,1]^2} |\sum_{j=0}^N e( j x + j^2 y)|^p\ dx dy \ll N^{p/2+o(1)}

for all {4<p<6} (where we keep {p} fixed and send {N} to infinity), as the {L^6} bound then follows by combining the above bound with the trivial bound {|\sum_{j=0}^N e( j x + j^2 y)| \ll N} and letting {p} approach {6}. Accordingly, for any {\eta > 0} and {4<p<6}, we let {P(p,\eta)} denote the claim that

\displaystyle \int_{[0,1]^2} |\sum_{j=0}^N e( j x + j^2 y)|^p\ dx dy \ll N^{p/2+\eta+o(1)}

as {N \rightarrow \infty}. Clearly, for any fixed {p}, {P(p,\eta)} holds for some large {\eta}, and it will suffice to establish

Proposition 2 Let {4<p<6}, and let {\eta>0} be such that {P(p,\eta)} holds. Then there exists {0 < \eta' < \eta} (depending continuously on {\eta}) such that {P(p,\eta')} holds.

Indeed, this proposition shows that for {4<p<6}, the infimum of the {\eta} for which {P(p,\eta)} holds is zero.
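
For completeness, here is the routine deduction of Theorem 1 from these bounds. If {P(p,\eta)} holds for every {\eta>0}, then the trivial bound {|\sum_{j=0}^N e( j x + j^2 y)| \leq N+1} gives

\displaystyle \int_{[0,1]^2} |\sum_{j=0}^N e( j x + j^2 y)|^6\ dx dy \leq (N+1)^{6-p} \int_{[0,1]^2} |\sum_{j=0}^N e( j x + j^2 y)|^p\ dx dy \ll N^{3 + \frac{6-p}{2} + \eta + o(1)},

and since {p<6} can be taken arbitrarily close to {6} and {\eta>0} arbitrarily small, the bound {N^{3+o(1)}} of Theorem 1 follows.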

We prove the proposition below the fold, using a simplified form of the methods discussed in the previous blog post. To simplify the exposition we will be a bit cavalier with the uncertainty principle, for instance by essentially ignoring the tails of rapidly decreasing functions.


Given any finite collection of elements {(f_i)_{i \in I}} in some Banach space {X}, the triangle inequality tells us that

\displaystyle \| \sum_{i \in I} f_i \|_X \leq \sum_{i \in I} \|f_i\|_X.

However, when the {f_i} all “oscillate in different ways”, one expects to improve substantially upon the triangle inequality. For instance, if {X} is a Hilbert space and the {f_i} are mutually orthogonal, we have the Pythagorean theorem

\displaystyle \| \sum_{i \in I} f_i \|_X = (\sum_{i \in I} \|f_i\|_X^2)^{1/2}.

For sake of comparison, from the triangle inequality and Cauchy-Schwarz one has the general inequality

\displaystyle \| \sum_{i \in I} f_i \|_X \leq (\# I)^{1/2} (\sum_{i \in I} \|f_i\|_X^2)^{1/2} \ \ \ \ \ (1)


for any finite collection {(f_i)_{i \in I}} in any Banach space {X}, where {\# I} denotes the cardinality of {I}. Thus orthogonality in a Hilbert space yields “square root cancellation”, saving a factor of {(\# I)^{1/2}} or so over the trivial bound coming from the triangle inequality.

More generally, let us somewhat informally say that a collection {(f_i)_{i \in I}} exhibits decoupling in {X} if one has the Pythagorean-like inequality

\displaystyle \| \sum_{i \in I} f_i \|_X \ll_\varepsilon (\# I)^\varepsilon (\sum_{i \in I} \|f_i\|_X^2)^{1/2}

for any {\varepsilon>0}, thus one obtains almost the full square root cancellation in the {X} norm. The theory of almost orthogonality can then be viewed as the theory of decoupling in Hilbert spaces such as {L^2({\bf R}^n)}. In {L^p} spaces for {p < 2} one usually does not expect this sort of decoupling; for instance, if the {f_i} are disjointly supported one has

\displaystyle \| \sum_{i \in I} f_i \|_{L^p} = (\sum_{i \in I} \|f_i\|_{L^p}^p)^{1/p}

and the right-hand side can be much larger than {(\sum_{i \in I} \|f_i\|_{L^p}^2)^{1/2}} when {p < 2}. At the opposite extreme, one usually does not expect to get decoupling in {L^\infty}, since one could conceivably align the {f_i} to all attain a maximum magnitude at the same location with the same phase, at which point the triangle inequality in {L^\infty} becomes sharp.

However, in some cases one can get decoupling for certain {2 < p < \infty}. For instance, suppose we are in {L^4}, and that {f_1,\dots,f_N} are bi-orthogonal in the sense that the products {f_i f_j} for {1 \leq i < j \leq N} are pairwise orthogonal in {L^2}. Then we have

\displaystyle \| \sum_{i = 1}^N f_i \|_{L^4}^2 = \| (\sum_{i=1}^N f_i)^2 \|_{L^2}

\displaystyle = \| \sum_{1 \leq i,j \leq N} f_i f_j \|_{L^2}

\displaystyle \ll (\sum_{1 \leq i,j \leq N} \|f_i f_j \|_{L^2}^2)^{1/2}

\displaystyle = \| (\sum_{1 \leq i,j \leq N} |f_i f_j|^2)^{1/2} \|_{L^2}

\displaystyle = \| \sum_{i=1}^N |f_i|^2 \|_{L^2}

\displaystyle \leq \sum_{i=1}^N \| |f_i|^2 \|_{L^2}

\displaystyle = \sum_{i=1}^N \|f_i\|_{L^4}^2

giving decoupling in {L^4}. (Similarly if each of the {f_i f_j} is orthogonal to all but {O_\varepsilon( N^\varepsilon )} of the other {f_{i'} f_{j'}}.) A similar argument also gives {L^6} decoupling when one has tri-orthogonality (with the {f_i f_j f_k} mostly orthogonal to each other), and so forth. As a slight variant, Khintchine’s inequality also indicates that decoupling should occur for any fixed {2 < p < \infty} if one multiplies each of the {f_i} by an independent random sign {\epsilon_i \in \{-1,+1\}}.
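
A toy example (chosen here purely for illustration; it does not come from the works discussed below) makes the role of bi-orthogonality concrete. On {[0,1]} take {f_i(x) := e(\xi_i x)} with distinct integer frequencies, so that {\|f_i\|_{L^4} = 1} and the decoupling right-hand side is {N^{1/2}}, while {\| \sum_i f_i \|_{L^4}^4} equals the additive energy {\#\{(a,b,c,d) \in F^4: a+b=c+d\}} of the frequency set {F = \{\xi_1,\dots,\xi_N\}}. For lacunary frequencies {\xi_i = 2^i} the products {f_i f_j}, {i \leq j}, have distinct frequencies and are therefore orthogonal; for consecutive frequencies they are not. The Python sketch below computes the energy exactly in both cases.

```python
from collections import Counter


def additive_energy(freqs):
    """#{(a, b, c, d) in F^4 : a + b = c + d}; for distinct integer frequencies this
    equals the fourth power of the L^4([0,1]) norm of sum_{xi in F} e(xi x)."""
    pair_sums = Counter(a + b for a in freqs for b in freqs)
    return sum(c * c for c in pair_sums.values())


N = 20
lacunary = [2**i for i in range(N)]  # products f_i f_j (i <= j) have distinct frequencies
consecutive = list(range(N))         # many coincidences a + b = c + d
for name, F in (("lacunary", lacunary), ("consecutive", consecutive)):
    # Each ||f_i||_{L^4} = 1, so the decoupling right-hand side is N^{1/2}.
    print(name, additive_energy(F) ** 0.25 / N ** 0.5)
```

For lacunary frequencies the energy is exactly {2N^2-N}, so the printed ratio stays below {2^{1/4}} for every {N}; for consecutive integers it is {(2N^3+N)/3}, so the ratio grows like {N^{1/4}} and decoupling fails.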

In recent years, Bourgain and Demeter have been establishing decoupling theorems in {L^p({\bf R}^n)} spaces for various key exponents of {2 < p < \infty}, in the “restriction theory” setting in which the {f_i} are Fourier transforms of measures supported on different portions of a given surface or curve; this builds upon the earlier decoupling theorems of Wolff. In a recent paper with Guth, they established the following decoupling theorem for the curve {\gamma({\bf R}) \subset {\bf R}^n} parameterised by the polynomial curve

\displaystyle \gamma: t \mapsto (t, t^2, \dots, t^n).

For any ball {B = B(x_0,r)} in {{\bf R}^n}, let {w_B: {\bf R}^n \rightarrow {\bf R}^+} denote the weight

\displaystyle w_B(x) := \frac{1}{(1 + \frac{|x-x_0|}{r})^{100n}},

which should be viewed as a smoothed out version of the indicator function {1_B} of {B}. In particular, the space {L^p(w_B) = L^p({\bf R}^n, w_B(x)\ dx)} can be viewed as a smoothed out version of the space {L^p(B)}. For future reference we observe a fundamental self-similarity of the curve {\gamma({\bf R})}: any arc {\gamma(I)} in this curve, with {I} a compact interval, is affinely equivalent to the standard arc {\gamma([0,1])}.

Theorem 1 (Decoupling theorem) Let {n \geq 1}. Subdivide the unit interval {[0,1]} into {N} equal subintervals {I_i} of length {1/N}, and for each such {I_i}, let {f_i: {\bf R}^n \rightarrow {\bf C}} be the Fourier transform

\displaystyle f_i(x) = \int_{\gamma(I_i)} e(x \cdot \xi)\ d\mu_i(\xi)

of a finite Borel measure {\mu_i} on the arc {\gamma(I_i)}, where {e(\theta) := e^{2\pi i \theta}}. Then the {f_i} exhibit decoupling in {L^{n(n+1)}(w_B)} for any ball {B} of radius {N^n}.

Orthogonality gives the {n=1} case of this theorem. The bi-orthogonality type arguments sketched earlier only give decoupling in {L^p} up to the range {2 \leq p \leq 2n}; the point here is that we can now get a much larger value of {p}. The {n=2} case of this theorem was previously established by Bourgain and Demeter (who obtained in fact an analogous theorem for any curved hypersurface). The exponent {n(n+1)} (and the radius {N^n}) is best possible, as can be seen by the following basic example. If

\displaystyle f_i(x) := \int_{I_i} e(x \cdot \gamma(\xi)) g_i(\xi)\ d\xi

where {g_i} is a bump function adapted to {I_i}, then standard Fourier-analytic computations show that {f_i} will be comparable to {1/N} on a rectangular box of dimensions {N \times N^2 \times \dots \times N^n} (and thus volume {N^{n(n+1)/2}}) centred at the origin, and exhibit decay away from this box, with {\|f_i\|_{L^{n(n+1)}(w_B)}} comparable to

\displaystyle 1/N \times (N^{n(n+1)/2})^{1/(n(n+1))} = 1/\sqrt{N}.

On the other hand, {\sum_{i=1}^N f_i} is comparable to {1} on a ball of radius comparable to {1} centred at the origin, so {\|\sum_{i=1}^N f_i\|_{L^{n(n+1)}(w_B)}} is {\gg 1}, which is just barely consistent with decoupling. This calculation shows that decoupling will fail if {n(n+1)} is replaced by any larger exponent, and also if the radius of the ball {B} is reduced to be significantly smaller than {N^n}.
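
Spelling out the exponent bookkeeping in this example for a general exponent {p} (a routine computation, assuming as above that the box essentially lies inside {B}): one has {\|f_i\|_{L^p(w_B)} \approx N^{-1} N^{n(n+1)/(2p)}}, so

\displaystyle (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2} \approx (N \cdot N^{-2 + \frac{n(n+1)}{p}})^{1/2} = N^{\frac{1}{2}(\frac{n(n+1)}{p} - 1)}.

For {p = n(n+1)} this is {N^0}, matching {\|\sum_{i=1}^N f_i\|_{L^p(w_B)} \gg 1} up to the {N^\varepsilon} loss permitted in decoupling, whereas for {p > n(n+1)} it is a negative power of {N}, which is incompatible with decoupling.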

This theorem has the following consequence of importance in analytic number theory:

Corollary 2 (Vinogradov main conjecture) Let {s, n, N \geq 1} be integers, and let {\varepsilon > 0}. Then

\displaystyle \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{2s}\ dx_1 \dots dx_n

\displaystyle \ll_{\varepsilon,s,n} N^{s+\varepsilon} + N^{2s - \frac{n(n+1)}{2}+\varepsilon}.

Proof: By the Hölder inequality (and the trivial bound of {N} for the exponential sum), it suffices to treat the critical case {s = n(n+1)/2}, that is to say to show that

\displaystyle \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{n(n+1)}\ dx_1 \dots dx_n \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+\varepsilon}.

We can rescale this as

\displaystyle \int_{[0,N] \times [0,N^2] \times \dots \times [0,N^n]} |\sum_{j=1}^N e( x \cdot \gamma(j/N) )|^{n(n+1)}\ dx \ll_{\varepsilon,n} N^{n(n+1)+\varepsilon}.

As the integrand is periodic along the lattice {N{\bf Z} \times N^2 {\bf Z} \times \dots \times N^n {\bf Z}}, this is equivalent to

\displaystyle \int_{[0,N^n]^n} |\sum_{j=1}^N e( x \cdot \gamma(j/N) )|^{n(n+1)}\ dx \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+n^2+\varepsilon}.

The left-hand side may be bounded by {\ll \| \sum_{j=1}^N f_j \|_{L^{n(n+1)}(w_B)}^{n(n+1)}}, where {B := B(0,N^n)} and {f_j(x) := e(x \cdot \gamma(j/N))}. Since

\displaystyle \| f_j \|_{L^{n(n+1)}(w_B)} \ll (N^{n^2})^{\frac{1}{n(n+1)}},

the claim now follows from the decoupling theorem and a brief calculation. \Box
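
For the record, the brief calculation runs as follows. Each {f_j} is the Fourier transform of a Dirac mass at {\gamma(j/N)}, a point of the arc {\gamma(I_j)} with {I_j := [(j-1)/N, j/N]}, so the decoupling theorem applies and gives

\displaystyle \| \sum_{j=1}^N f_j \|_{L^{n(n+1)}(w_B)} \ll_{\varepsilon,n} N^{\varepsilon} ( N \cdot (N^{n^2})^{\frac{2}{n(n+1)}} )^{1/2} = N^{\frac{1}{2} + \frac{n}{n+1} + \varepsilon}.

Raising to the power {n(n+1)} and using {(\frac{1}{2} + \frac{n}{n+1}) n(n+1) = \frac{n(n+1)}{2} + n^2} yields the bound {N^{\frac{n(n+1)}{2} + n^2 + n(n+1)\varepsilon}}, which is the claim after renaming {\varepsilon}.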

Using the Plancherel formula, one may equivalently (when {s} is an integer) write the Vinogradov main conjecture in terms of solutions {j_1,\dots,j_s,k_1,\dots,k_s \in \{1,\dots,N\}} to the system of equations

\displaystyle j_1^i + \dots + j_s^i = k_1^i + \dots + k_s^i \forall i=1,\dots,n,

but we will not use this formulation here.
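
Although this formulation is not used below, the equivalence is easy to test numerically on tiny cases. In the following Python sketch (an illustration of my own; the function names are hypothetical), the integral is computed exactly by averaging over a finite grid, which is legitimate because {|\sum_{j=1}^N e( j x_1 + \dots + j^n x_n)|^{2s}} is a trigonometric polynomial whose frequency in the variable {x_i} has absolute value less than {sN^i}; the result is compared with a direct count of solutions.

```python
import cmath
import math
from collections import Counter
from itertools import product


def count_solutions(N, n, s):
    """Number of (j_1..j_s, k_1..k_s) in {1..N}^{2s} with equal i-th power sums for i = 1..n."""
    buckets = Counter()
    for js in product(range(1, N + 1), repeat=s):
        buckets[tuple(sum(j**i for j in js) for i in range(1, n + 1))] += 1
    return sum(c * c for c in buckets.values())


def vinogradov_integral(N, n, s):
    """Integral over [0,1]^n of |sum_{j=1}^N e(j x_1 + ... + j^n x_n)|^{2s}.

    The integrand has |frequency| < s N^i in x_i, so averaging over a grid with
    s N^i + 1 equally spaced points in x_i reproduces the integral exactly
    (up to floating-point rounding).
    """
    sizes = [s * N**i + 1 for i in range(1, n + 1)]
    total = 0.0
    for point in product(*(range(M) for M in sizes)):
        x = [a / M for a, M in zip(point, sizes)]
        S = sum(
            cmath.exp(2j * math.pi * sum(j**i * x[i - 1] for i in range(1, n + 1)))
            for j in range(1, N + 1)
        )
        total += abs(S) ** (2 * s)
    return total / math.prod(sizes)


print(count_solutions(4, 2, 3), round(vinogradov_integral(4, 2, 3), 6))
```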

A history of the Vinogradov main conjecture may be found in this survey of Wooley; prior to the Bourgain-Demeter-Guth theorem, the conjecture was solved completely for {n \leq 3}, or for {n > 3} and {s} either below {n(n+1)/2 - n/3 + O(n^{2/3})} or above {n(n-1)}, with the bulk of recent progress coming from the efficient congruencing technique of Wooley. It has numerous applications to exponential sums, Waring’s problem, and the zeta function; to give just one application, the main conjecture implies the predicted asymptotic for the number of ways to express a large number as the sum of {23} fifth powers (the previous best result required {28} fifth powers). The Bourgain-Demeter-Guth approach to the Vinogradov main conjecture, based on decoupling, is ostensibly very different from the efficient congruencing technique, which relies heavily on the arithmetic structure of the problem, but it appears (as I have heard from second-hand sources) that the two methods are actually closely related, with the former being a sort of “Archimedean” version of the latter (with the intervals {I_i} in the decoupling theorem being analogous to congruence classes in the efficient congruencing method); hopefully there will be some future work making this connection more precise. One advantage of the decoupling approach is that it generalises to non-arithmetic settings in which the set {\{1,\dots,N\}} that {j} is drawn from is replaced by some other similarly separated set of real numbers. (A random thought – could this allow the Vinogradov-Korobov bounds on the zeta function to extend to Beurling zeta functions?)

Below the fold we sketch the Bourgain-Demeter-Guth argument proving Theorem 1.

I thank Jean Bourgain and Andrew Granville for helpful discussions.


Let {\lambda} denote the Liouville function. The prime number theorem is equivalent to the estimate

\displaystyle \sum_{n \leq x} \lambda(n) = o(x)

as {x \rightarrow \infty}, that is to say that {\lambda} exhibits cancellation on large intervals such as {[1,x]}. This result can be improved to give cancellation on shorter intervals. For instance, using the known zero density estimates for the Riemann zeta function, one can establish that

\displaystyle \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n)|\ dx = o( HX ) \ \ \ \ \ (1)


as {X \rightarrow \infty} if {X^{1/6+\varepsilon} \leq H \leq X} for some fixed {\varepsilon>0}; I believe this result is due to Ramachandra (see also Exercise 21 of this previous blog post), and in fact one could obtain a better error term on the right-hand side that for instance gained an arbitrary power of {\log X}. On the Riemann hypothesis (or the weaker density hypothesis), it was known that the {X^{1/6+\varepsilon}} could be lowered to {X^\varepsilon}.

Early this year, there was a major breakthrough by Matomaki and Radziwill, who (among other things) showed that the asymptotic (1) was in fact valid for any {H = H(X)} with {H \leq X} that went to infinity as {X \rightarrow \infty}, thus yielding cancellation on extremely short intervals. This has many further applications; for instance, this estimate, or more precisely its extension to other “non-pretentious” bounded multiplicative functions, was a key ingredient in my recent solution of the Erdös discrepancy problem, as well as in obtaining logarithmically averaged cases of Chowla’s conjecture, such as

\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x). \ \ \ \ \ (2)
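
For readers who want to look at these quantities, here is a small Python sketch (my own illustration; the function names are hypothetical) that tabulates {\lambda} with a smallest-prime-factor sieve and evaluates a discretised version of the left-hand side of (1), normalised by {HX} and with {x} restricted to integers, together with the left-hand side of (2) normalised by {\log x}. Numerics at such small scales are of course no evidence for the asymptotic statements.

```python
import math


def liouville_table(limit):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= limit (lam[0] is unused)."""
    spf = list(range(limit + 1))  # smallest prime factor of each n
    for p in range(2, math.isqrt(limit) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]  # Omega(n) = Omega(n / p) + 1
    return lam


def short_interval_average(lam, X, H):
    """(1/(HX)) * sum over integers X <= x <= 2X of |sum_{x <= n <= x+H} lam(n)|."""
    prefix = [0] * len(lam)
    for n in range(1, len(lam)):
        prefix[n] = prefix[n - 1] + lam[n]
    total = sum(abs(prefix[x + H] - prefix[x - 1]) for x in range(X, 2 * X + 1))
    return total / (H * X)


def log_chowla(lam, x):
    """(1/log x) * sum_{n < x} lam(n) lam(n+1) / n."""
    return sum(lam[n] * lam[n + 1] / n for n in range(1, x)) / math.log(x)


lam = liouville_table(300_000)
for H in (10, 100, 1000):
    print(H, short_interval_average(lam, 100_000, H))
print(log_chowla(lam, 299_999))
```

The trivial bound makes the normalised short-interval average at most about {1}; (1) asserts that it tends to zero once {H} grows with {X}.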


It is of interest to twist the above estimates by phases such as the linear phase {n \mapsto e(\alpha n) := e^{2\pi i \alpha n}}. In 1937, Davenport showed that

\displaystyle \sup_\alpha |\sum_{n \leq x} \lambda(n) e(\alpha n)| \ll_A x \log^{-A} x

which of course improves the prime number theorem. Recently with Matomaki and Radziwill, we obtained a common generalisation of this estimate with (1), showing that

\displaystyle \sup_\alpha \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(HX) \ \ \ \ \ (3)


as {X \rightarrow \infty}, for any {H = H(X) \leq X} that went to infinity as {X \rightarrow \infty}. We were able to use this estimate to obtain an averaged form of Chowla’s conjecture.

In that paper, we asked whether one could improve this estimate further by moving the supremum inside the integral, that is to say to establish the bound

\displaystyle \int_X^{2X} \sup_\alpha |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(HX) \ \ \ \ \ (4)


as {X \rightarrow \infty}, for any {H = H(X) \leq X} that went to infinity as {X \rightarrow \infty}. This bound is asserting that {\lambda} is locally Fourier-uniform on most short intervals; it can be written equivalently in terms of the “local Gowers {U^2} norm” as

\displaystyle \int_X^{2X} \sum_{1 \leq a \leq H} |\sum_{x \leq n \leq x+H} \lambda(n) \lambda(n+a)|^2\ dx = o( H^3 X )

from which one can see that this is another averaged form of Chowla’s conjecture (stronger than the one I was able to prove with Matomaki and Radziwill, but a consequence of the unaveraged Chowla conjecture). If one inserted such a bound into the machinery I used to solve the Erdös discrepancy problem, it should lead to further averaged cases of Chowla’s conjecture, such as

\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log x), \ \ \ \ \ (5)


though I have not fully checked the details of this implication. It should also have a number of new implications for sign patterns of the Liouville function, though we have not explored these in detail yet.

One can write (4) equivalently in the form

\displaystyle \int_X^{2X} \sum_{x \leq n \leq x+H} \lambda(n) e( \alpha(x) n + \beta(x) )\ dx = o(HX) \ \ \ \ \ (6)


uniformly for all {x}-dependent phases {\alpha(x), \beta(x)}. In contrast, (3) is equivalent to the subcase of (6) when the linear phase coefficient {\alpha(x)} is independent of {x}. This dependency of {\alpha(x)} on {x} seems to necessitate some highly nontrivial additive combinatorial analysis of the function {x \mapsto \alpha(x)} in order to establish (4) when {H} is small. To date, this analysis has proven to be elusive, but I would like to record what one can do with more classical methods like Vaughan’s identity, namely:

Proposition 1 The estimate (4) (or equivalently (6)) holds in the range {X^{2/3+\varepsilon} \leq H \leq X} for any fixed {\varepsilon>0}. (In fact one can improve the right-hand side by an arbitrary power of {\log X} in this case.)

The values of {H} in this range are far too large to yield implications such as new cases of the Chowla conjecture, but it appears that the {2/3} exponent is the limit of “classical” methods (at least as far as I was able to apply them), in the sense that one does not do any combinatorial analysis on the function {x \mapsto \alpha(x)}, nor does one use modern equidistribution results on “Type III sums” that require deep estimates on Kloosterman-type sums. The latter may shave a little bit off of the {2/3} exponent, but I don’t see how one would ever hope to go below {1/2} without doing some non-trivial combinatorics on the function {x \mapsto \alpha(x)}. UPDATE: I have come across this paper of Zhan which uses mean-value theorems for L-functions to lower the {2/3} exponent to {5/8}.

Let me now sketch the proof of the proposition, omitting many of the technical details. We first remark that known estimates on sums of the Liouville function (or similar functions such as the von Mangoldt function) in short arithmetic progressions, based on zero-density estimates for Dirichlet {L}-functions, can handle the “major arc” case of (4) (or (6)) where {\alpha} is restricted to be of the form {\alpha = \frac{a}{q} + O( X^{-1/6-\varepsilon} )} for {q = O(\log^{O(1)} X)} (the exponent here being of the same numerology as the {X^{1/6+\varepsilon}} exponent in the classical result of Ramachandra, tied to the best zero density estimates currently available); for instance a modification of the arguments in this recent paper of Koukoulopoulos would suffice. Thus we can restrict attention to “minor arc” values of {\alpha} (or {\alpha(x)}, using the interpretation of (6)).

Next, one breaks up {\lambda} (or the closely related Möbius function) into Dirichlet convolutions using one of the standard identities (e.g. Vaughan’s identity or Heath-Brown’s identity), as discussed for instance in this previous post (which is focused more on the von Mangoldt function, but analogous identities exist for the Liouville and Möbius functions). The exact choice of identity is not terribly important, but the upshot is that {\lambda(n)} can be decomposed into {\log^{O(1)} X} terms, each of which is either of the “Type I” form

\displaystyle \sum_{d \sim D; m \sim M: dm=n} a_d

for some coefficients {a_d} that are roughly of logarithmic size on the average, and scales {D, M} with {D \ll X^{2/3}} and {DM \sim X}, or else of the “Type II” form

\displaystyle \sum_{d \sim D; m \sim M: dm=n} a_d b_m

for some coefficients {a_d, b_m} that are roughly of logarithmic size on the average, and scales {D,M} with {X^{1/3} \ll D,M \ll X^{2/3}} and {DM \sim X}. As discussed in the previous post, the {2/3} exponent is a natural barrier in these identities if one is unwilling to also consider “Type III” type terms which are roughly of the shape of the third divisor function {\tau_3(n) := \sum_{d_1d_2d_3=n} 1}.

A Type I sum makes a contribution to { \sum_{x \leq n \leq x+H} \lambda(n) e( \alpha(x) n + \beta(x) )} that can be bounded (via Cauchy-Schwarz) in terms of an expression such as

\displaystyle \sum_{d \sim D} | \sum_{x/d \leq m \leq x/d+H/d} e(\alpha(x) dm )|^2.

The inner sum exhibits a lot of cancellation unless {\alpha(x) d} is within {O(D/H)} of an integer. (Here, “a lot” should be loosely interpreted as “gaining many powers of {\log X} over the trivial bound”.) Since {H} is significantly larger than {D}, standard Vinogradov-type manipulations (see e.g. Lemma 13 of these previous notes) show that this bad case occurs for many {d} only when {\alpha} is “major arc”, which is the case we have specifically excluded. This lets us dispose of the Type I contributions.
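
The cancellation claim for the inner sum rests on the standard geometric series bound {|\sum_{m=0}^{L-1} e(\theta m)| \leq \min(L, \frac{1}{2\|\theta\|_{{\bf R}/{\bf Z}}})}, where {\|\theta\|_{{\bf R}/{\bf Z}}} is the distance from {\theta} to the nearest integer; it follows from {|\sum_{m=0}^{L-1} e(\theta m)| = |\sin(\pi L\theta)|/|\sin(\pi\theta)|} and {|\sin(\pi\theta)| \geq 2\|\theta\|_{{\bf R}/{\bf Z}}}. With {L \approx H/D} and {\theta = \alpha(x) d}, this beats the trivial bound {L} only when {\|\theta\|_{{\bf R}/{\bf Z}}} is somewhat larger than {D/H}. The Python sketch below merely checks the inequality at random sample points, as an illustration.

```python
import cmath
import math
import random


def exp_sum(theta, L):
    """|sum_{m=0}^{L-1} e(theta m)|, computed term by term."""
    return abs(sum(cmath.exp(2j * math.pi * theta * m) for m in range(L)))


def geometric_bound(theta, L):
    """min(L, 1 / (2 ||theta||)), where ||theta|| is the distance to the nearest integer."""
    dist = abs(theta - round(theta))
    return L if dist == 0 else min(L, 1 / (2 * dist))


random.seed(0)
samples = [(random.uniform(-5, 5), random.randint(1, 300)) for _ in range(1000)]
worst = max(exp_sum(theta, L) / geometric_bound(theta, L) for theta, L in samples)
print(worst)  # at most 1, up to floating-point rounding
```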

A Type II sum makes a contribution to { \sum_{x \leq n \leq x+H} \lambda(n) e( \alpha(x) n + \beta(x) )} roughly of the form

\displaystyle \sum_{d \sim D} | \sum_{x/d \leq m \leq x/d+H/d} b_m e(\alpha(x) dm)|.

We can break this up into a number of sums roughly of the form

\displaystyle \sum_{d = d_0 + O( H / M )} | \sum_{x/d_0 \leq m \leq x/d_0 + H/D} b_m e(\alpha(x) dm)|

for {d_0 \sim D}; note that the {d} range is non-trivial because {H} is much larger than {M}. Applying the usual bilinear sum Cauchy-Schwarz methods (e.g. Theorem 14 of these notes) we conclude that there is a lot of cancellation unless one has {\alpha(x) = a/q + O( \frac{X \log^{O(1)} X}{H^2} )} for some {q = O(\log^{O(1)} X)}. But with {H \geq X^{2/3+\varepsilon}}, {X \log^{O(1)} X/H^2} is well below the threshold {X^{-1/6-\varepsilon}} for the definition of major arc, so we can exclude this case and obtain the required cancellation.
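
To spell out the exponent arithmetic: with {H \geq X^{2/3+\varepsilon}} one has {H/D, H/M \gg X^{\varepsilon}} (as {D, M \ll X^{2/3}}), which is what makes the Type I inner sums long and the {d} ranges in the Type II sums non-trivial, and

\displaystyle \frac{X \log^{O(1)} X}{H^2} \leq X^{1 - \frac{4}{3} - 2\varepsilon} \log^{O(1)} X = X^{-\frac{1}{3} - 2\varepsilon} \log^{O(1)} X,

which is smaller than the major arc threshold {X^{-1/6-\varepsilon}} by a factor of {X^{-1/6-\varepsilon+o(1)}}.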

A basic estimate in multiplicative number theory (particularly if one is using the Granville-Soundararajan “pretentious” approach to this subject) is the following inequality of Halasz (formulated here in a quantitative form introduced by Montgomery and Tenenbaum).

Theorem 1 (Halasz inequality) Let {f: {\bf N} \rightarrow {\bf C}} be a multiplicative function bounded in magnitude by {1}, and suppose that {x \geq 3}, {T \geq 1}, and { M \geq 0} are such that

\displaystyle \sum_{p \leq x} \frac{1 - \hbox{Re}(f(p) p^{-it})}{p} \geq M \ \ \ \ \ (1)


for all real numbers {t} with {|t| \leq T}. Then

\displaystyle \frac{1}{x} \sum_{n \leq x} f(n) \ll (1+M) e^{-M} + \frac{1}{\sqrt{T}}.

As a qualitative corollary, we conclude (by standard compactness arguments) that if

\displaystyle \sum_{p} \frac{1 - \hbox{Re}(f(p) p^{-it})}{p} = +\infty

for all real {t}, then

\displaystyle \frac{1}{x} \sum_{n \leq x} f(n) = o(1) \ \ \ \ \ (2)


as {x \rightarrow \infty}. In this more recent paper of Granville and Soundararajan, the sharper bound

\displaystyle \frac{1}{x} \sum_{n \leq x} f(n) \ll (1+M) e^{-M} + \frac{1}{T} + \frac{\log\log x}{\log x}

is obtained (with a more precise description of the {(1+M) e^{-M}} term).
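
To see why a hypothesis such as (1), imposed for every {|t| \leq T}, cannot be dropped, consider the pretentious example {f(n) := n^{it_0}}: the sum in (1) vanishes at {t = t_0}, and comparison with {\int_1^x u^{it_0}\ du} gives {\frac{1}{x} \sum_{n \leq x} n^{it_0} = \frac{x^{it_0}}{1+it_0} + O_{t_0}(1/x)}, which does not decay. The following Python sketch (an illustration only, with {t_0 = 1} chosen arbitrarily) compares the two sides numerically.

```python
import cmath
import math


def mean_of_n_it(x, t0):
    """(1/x) * sum_{n <= x} n^{i t0}."""
    return sum(cmath.exp(1j * t0 * math.log(n)) for n in range(1, x + 1)) / x


x, t0 = 100_000, 1.0
predicted = cmath.exp(1j * t0 * math.log(x)) / (1 + 1j * t0)
print(abs(mean_of_n_it(x, t0)), abs(predicted))
```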

The usual proofs of Halasz’s theorem are somewhat lengthy (though there has been a recent simplification in forthcoming work of Granville, Harper, and Soundararajan). Below the fold I would like to give a relatively short proof of the following “cheap” version of the inequality, which has slightly weaker quantitative bounds, but still suffices to give qualitative conclusions such as (2).

Theorem 2 (Cheap Halasz inequality) Let {f: {\bf N} \rightarrow {\bf C}} be a multiplicative function bounded in magnitude by {1}. Let {T \geq 1} and {M \geq 0}, and suppose that {x} is sufficiently large depending on {T,M}. If (1) holds for all {|t| \leq T}, then

\displaystyle \frac{1}{x} \sum_{n \leq x} f(n) \ll (1+M) e^{-M/2} + \frac{1}{T}.

The non-optimal exponent {1/2} can probably be improved a bit by being more careful with the exponents, but I did not try to optimise it here. A similar bound appears in the first paper of Halasz on this topic.

The idea of the argument is to split {f} as a Dirichlet convolution {f = f_1 * f_2 * f_3}, where {f_1,f_2,f_3} are the portions of {f} coming from “small”, “medium”, and “large” primes respectively (with the dividing lines between the three types of primes being given by various powers of {x}). Using a Perron-type formula, one can express this convolution in terms of the product of the Dirichlet series of {f_1,f_2,f_3} respectively at various complex numbers {1+it} with {|t| \leq T}. One can use {L^2} based estimates to control the Dirichlet series of {f_2,f_3}, while using the hypothesis (1) one can get {L^\infty} estimates on the Dirichlet series of {f_1}. (This is similar to the Fourier-analytic approach to ternary additive problems, such as Vinogradov’s theorem on representing large odd numbers as the sum of three primes.) This idea was inspired by a similar device used in the work of Granville, Harper, and Soundararajan. A variant of this argument also appears in unpublished work of Adam Harper.
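
The splitting {f = f_1 * f_2 * f_3} is purely combinatorial: every {n} factors uniquely as {n = n_1 n_2 n_3} with {n_1, n_2, n_3} composed of small, medium, and large primes respectively, these factors are pairwise coprime, and so multiplicativity gives {f(n) = f_1(n_1) f_2(n_2) f_3(n_3)}. The Python sketch below checks this identity for {f = \lambda}; the cutoffs {10} and {100} are arbitrary stand-ins for the powers of {x} used in the actual argument, and all function names are hypothetical.

```python
import math


def smallest_prime_factors(limit):
    spf = list(range(limit + 1))
    for p in range(2, math.isqrt(limit) + 1):
        if spf[p] == p:
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf


def prime_divisors(n, spf):
    primes = set()
    while n > 1:
        primes.add(spf[n])
        n //= spf[n]
    return primes


def dirichlet_convolve(a, b, limit):
    """c[n] = sum over d m = n of a[d] b[m], for 1 <= n <= limit."""
    c = [0] * (limit + 1)
    for d in range(1, limit + 1):
        if a[d]:
            for m in range(1, limit // d + 1):
                c[d * m] += a[d] * b[m]
    return c


def split_by_prime_size(f, cut1, cut2, limit, spf):
    """f restricted to n whose prime factors are all < cut1, all in [cut1, cut2),
    or all >= cut2 respectively (each part equals f(1) = 1 at n = 1)."""
    parts = [[0] * (limit + 1) for _ in range(3)]
    for n in range(1, limit + 1):
        ps = prime_divisors(n, spf)
        if all(p < cut1 for p in ps):
            parts[0][n] = f[n]
        if all(cut1 <= p < cut2 for p in ps):
            parts[1][n] = f[n]
        if all(p >= cut2 for p in ps):
            parts[2][n] = f[n]
    return parts


limit = 5000
spf = smallest_prime_factors(limit)
lam = [0, 1] + [0] * (limit - 1)  # Liouville function, lam[0] unused
for n in range(2, limit + 1):
    lam[n] = -lam[n // spf[n]]
f1, f2, f3 = split_by_prime_size(lam, 10, 100, limit, spf)
recombined = dirichlet_convolve(dirichlet_convolve(f1, f2, limit), f3, limit)
print(recombined[1:] == lam[1:])
```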

I thank Andrew Granville for helpful comments which led to significant simplifications of the argument.


The Chowla conjecture asserts, among other things, that one has the asymptotic

\displaystyle \frac{1}{X} \sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k) = o(1)

as {X \rightarrow \infty} for any distinct integers {h_1,\dots,h_k}, where {\lambda} is the Liouville function. (The usual formulation of the conjecture also allows one to consider more general linear forms {a_i n + b_i} than the shifts {n+h_i}, but for sake of discussion let us focus on the shift case.) This conjecture remains open for {k \geq 2}, though there are now some partial results when one averages either in {x} or in the {h_1,\dots,h_k}, as discussed in this recent post.

A natural generalisation of the Chowla conjecture is the Elliott conjecture. Its original formulation was basically as follows: one had

\displaystyle \frac{1}{X} \sum_{n \leq X} g_1(n+h_1) \dots g_k(n+h_k) = o(1) \ \ \ \ \ (1)

whenever {g_1,\dots,g_k} were bounded completely multiplicative functions and {h_1,\dots,h_k} were distinct integers, and one of the {g_i} was “non-pretentious” in the sense that

\displaystyle \sum_p \frac{1 - \hbox{Re}( g_i(p) \overline{\chi(p)} p^{-it})}{p} = +\infty \ \ \ \ \ (2)

for all Dirichlet characters {\chi} and real numbers {t}. It is easy to see that some condition like (2) is necessary; for instance if {g(n) := \chi(n) n^{it}} and {\chi} has period {q} then {\frac{1}{X} \sum_{n \leq X} g(n+q) \overline{g(n)}} can be verified to be bounded away from zero as {X \rightarrow \infty}.
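
A quick numerical illustration of this necessity claim, with an example of my own choosing: take {\chi} to be the non-principal character modulo {q=4} and {t=1}. Then {g(n+4)\overline{g(n)} = |\chi(n)|^2 (1+4/n)^{it}}, so the average should be close to {\varphi(4)/4 = 1/2}. The Python sketch below computes it directly.

```python
import cmath
import math


def chi4(n):
    """The non-principal Dirichlet character modulo 4."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def g(n, t):
    """g(n) = chi4(n) n^{it}."""
    return chi4(n) * cmath.exp(1j * t * math.log(n))


def shifted_correlation(X, t, q=4):
    """(1/X) * sum_{n <= X} g(n+q) conj(g(n))."""
    return sum(g(n + q, t) * g(n, t).conjugate() for n in range(1, X + 1)) / X


print(shifted_correlation(10**5, 1.0))  # approximately 1/2 = phi(4)/4
```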

In a previous paper with Matomaki and Radziwill, we provided a counterexample to the original formulation of the Elliott conjecture, and proposed that (2) be replaced with the stronger condition

\displaystyle \inf_{|t| \leq X} \sum_{p \leq X} \frac{1 - \hbox{Re}( g_i(p) \overline{\chi(p)} p^{-it})}{p} \rightarrow +\infty \ \ \ \ \ (3)

as {X \rightarrow \infty} for any Dirichlet character {\chi}. To support this conjecture, we proved an averaged and non-asymptotic version of this conjecture which roughly speaking showed a bound of the form

\displaystyle \frac{1}{H^k} \sum_{h_1,\dots,h_k \leq H} |\frac{1}{X} \sum_{n \leq X} g_1(n+h_1) \dots g_k(n+h_k)| \leq \varepsilon

whenever {H} was an arbitrarily slowly growing function of {X}, {X} was sufficiently large (depending on {\varepsilon,k} and the rate at which {H} grows), and one of the {g_i} obeyed the condition

\displaystyle \inf_{|t| \leq AX} \sum_{p \leq X} \frac{1 - \hbox{Re}( g_i(p) \overline{\chi(p)} p^{-it})}{p} \geq A \ \ \ \ \ (4)

for some {A} that was sufficiently large depending on {k,\varepsilon}, and all Dirichlet characters {\chi} of period at most {A}. As further support of this conjecture, I recently established the bound

\displaystyle \frac{1}{\log \omega} |\sum_{X/\omega \leq n \leq X} \frac{g_1(n+h_1) g_2(n+h_2)}{n}| \leq \varepsilon

under the same hypotheses, where {\omega} is an arbitrarily slowly growing function of {X}.

In view of these results, it is tempting to conjecture that the condition (4) for one of the {g_i} should be sufficient to obtain the bound

\displaystyle |\frac{1}{X} \sum_{n \leq X} g_1(n+h_1) \dots g_k(n+h_k)| \leq \varepsilon

when {A} is large enough depending on {k,\varepsilon}. This may well be the case for {k=2}. However, the purpose of this blog post is to record a simple counterexample for {k>2}. Let’s take {k=3} for simplicity. Let {t_0} be a quantity much larger than {X} but much smaller than {X^2} (e.g. {t_0 = X^{3/2}}), and set

\displaystyle g_1(n) := n^{it_0}; \quad g_2(n) := n^{-2it_0}; \quad g_3(n) := n^{it_0}.

For {X/2 \leq n \leq X}, Taylor expansion gives

\displaystyle (n+1)^{it_0} = n^{it_0} \exp( i t_0 / n ) + o(1)

\displaystyle (n+2)^{it_0} = n^{it_0} \exp( 2 i t_0 / n ) + o(1)

and hence

\displaystyle g_1(n) g_2(n+1) g_3(n+2) = 1 + o(1)

and therefore

\displaystyle |\frac{1}{X} \sum_{X/2 \leq n \leq X} g_1(n) g_2(n+1) g_3(n+2)| \gg 1.
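
The Taylor expansion can also be sanity-checked numerically. Since {n(n+2)/(n+1)^2 = 1 - 1/(n+1)^2} exactly, one has {g_1(n) g_2(n+1) g_3(n+2) = \exp( i t_0 \log(1 - (n+1)^{-2}) )}, whose phase is {O(t_0/X^2)} on {X/2 \leq n \leq X}. The Python sketch below evaluates the normalised sum with {t_0 = X^{3/2}} and a modest {X} (an arbitrary choice for illustration).

```python
import cmath
import math


def triple_product(n, t0):
    """g_1(n) g_2(n+1) g_3(n+2) for g_1 = g_3 = n^{i t0} and g_2 = n^{-2 i t0}."""
    # log(n) - 2 log(n+1) + log(n+2) = log(1 - 1/(n+1)^2); log1p avoids cancellation.
    return cmath.exp(1j * t0 * math.log1p(-1.0 / (n + 1) ** 2))


X = 10**5
t0 = X**1.5
total = sum(triple_product(n, t0) for n in range(X // 2, X + 1))
print(abs(total) / X)  # close to 1/2, since the range of n has length about X/2
```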

On the other hand one can easily verify that all of the {g_1,g_2,g_3} obey (4) (the restriction {|t| \leq AX} there prevents {t} from getting anywhere close to {t_0}). So it seems the correct non-asymptotic version of the Elliott conjecture is the following:

Conjecture 1 (Non-asymptotic Elliott conjecture) Let {k} be a natural number, and let {h_1,\dots,h_k} be integers. Let {\varepsilon > 0}, let {A} be sufficiently large depending on {k,\varepsilon,h_1,\dots,h_k}, and let {X} be sufficiently large depending on {k,\varepsilon,h_1,\dots,h_k,A}. Let {g_1,\dots,g_k} be bounded multiplicative functions such that for some {1 \leq i \leq k}, one has

\displaystyle \inf_{|t| \leq AX^{k-1}} \sum_{p \leq X} \frac{1 - \hbox{Re}( g_i(p) \overline{\chi(p)} p^{-it})}{p} \geq A

for all Dirichlet characters {\chi} of conductor at most {A}. Then

\displaystyle |\frac{1}{X} \sum_{n \leq X} g_1(n+h_1) \dots g_k(n+h_k)| \leq \varepsilon.

The {k=1} case of this conjecture follows from the work of Halasz; in my recent paper a logarithmically averaged version of the {k=2} case of this conjecture is established. The requirement to take {t} to be as large as {A X^{k-1}} does not emerge in the averaged Elliott conjecture in my previous paper with Matomaki and Radziwill; it thus seems that this averaging has concealed some of the subtler features of the Elliott conjecture. (However, this subtlety does not seem to affect the asymptotic version of the conjecture formulated in that paper, in which the hypothesis is of the form (3), and the conclusion is of the form (1).)

A similar subtlety arises when trying to control the maximal integral

\displaystyle \frac{1}{X} \int_X^{2X} \sup_\alpha \frac{1}{H} |\sum_{x \leq n \leq x+H} g(n) e(\alpha n)|\ dx. \ \ \ \ \ (5)

In my previous paper with Matomaki and Radziwill, we could show that the easier expression

\displaystyle \frac{1}{X} \sup_\alpha \int_X^{2X} \frac{1}{H} |\sum_{x \leq n \leq x+H} g(n) e(\alpha n)|\ dx. \ \ \ \ \ (6)

was small (for {H} a slowly growing function of {X}) if {g} was bounded and completely multiplicative, and one had a condition of the form

\displaystyle \inf_{|t| \leq AX} \sum_{p \leq X} \frac{1 - \hbox{Re}( g(p) \overline{\chi(p)} p^{-it})}{p} \geq A \ \ \ \ \ (7)

for some large {A}. However, to obtain an analogous bound for (5) it now appears that one needs to strengthen the above condition to

\displaystyle \inf_{|t| \leq AX^2} \sum_{p \leq X} \frac{1 - \hbox{Re}( g(p) \overline{\chi(p)} p^{-it})}{p} \geq A

in order to address the counterexample in which {g(n) = n^{it_0}} for some {t_0} between {X} and {X^2}. This seems to suggest that proving (5) (which is closely related to the {k=3} case of the Chowla conjecture) could in fact be rather difficult; the estimation of (6) relied primarily on prior work of Matomaki and Radziwill which used the hypothesis (7), but as this hypothesis is not sufficient to conclude (5), some additional input must also be used.

I recently learned about a curious operation on square matrices known as sweeping, which is used in numerical linear algebra (particularly in applications to statistics), as a useful and more robust variant of the usual Gaussian elimination operations seen in undergraduate linear algebra courses. Given an {n \times n} matrix {A := (a_{ij})_{1 \leq i,j \leq n}} (with, say, complex entries) and an index {1 \leq k \leq n}, with the entry {a_{kk}} non-zero, the sweep {\hbox{Sweep}_k[A] = (\hat a_{ij})_{1 \leq i,j \leq n}} of {A} at {k} is the matrix given by the formulae

\displaystyle  \hat a_{ij} := a_{ij} - \frac{a_{ik} a_{kj}}{a_{kk}}

\displaystyle  \hat a_{ik} := \frac{a_{ik}}{a_{kk}}

\displaystyle  \hat a_{kj} := \frac{a_{kj}}{a_{kk}}

\displaystyle  \hat a_{kk} := \frac{-1}{a_{kk}}

for all {i,j \in \{1,\dots,n\} \backslash \{k\}}. Thus for instance if {k=1}, and {A} is written in block form as

\displaystyle  A = \begin{pmatrix} a_{11} & X \\ Y & B \end{pmatrix} \ \ \ \ \ (1)

for some {1 \times n-1} row vector {X}, {n-1 \times 1} column vector {Y}, and {n-1 \times n-1} minor {B}, one has

\displaystyle  \hbox{Sweep}_1[A] = \begin{pmatrix} -1/a_{11} & X / a_{11} \\ Y/a_{11} & B - a_{11}^{-1} YX \end{pmatrix}. \ \ \ \ \ (2)

The inverse sweep operation {\hbox{Sweep}_k^{-1}[A] = (\check a_{ij})_{1 \leq i,j \leq n}} is given by a nearly identical set of formulae:

\displaystyle  \check a_{ij} := a_{ij} - \frac{a_{ik} a_{kj}}{a_{kk}}

\displaystyle  \check a_{ik} := -\frac{a_{ik}}{a_{kk}}

\displaystyle  \check a_{kj} := -\frac{a_{kj}}{a_{kk}}

\displaystyle  \check a_{kk} := \frac{-1}{a_{kk}}

for all {i,j \in \{1,\dots,n\} \backslash \{k\}}. One can check that these operations invert each other. Actually, each sweep turns out to have order {4}, so that {\hbox{Sweep}_k^{-1} = \hbox{Sweep}_k^3}: an inverse sweep performs the same operation as three forward sweeps. Sweeps also preserve the space of symmetric matrices (allowing one to cut down computational run time in that case by a factor of two), and behave well with respect to principal minors; a sweep of a principal minor is a principal minor of a sweep, after adjusting indices appropriately.
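The following is a minimal Python sketch of the sweep and inverse sweep formulae above; it is not taken from any statistics or linear algebra package. The function names sweep and inverse_sweep, the 0-based indexing and the sample matrix are illustrative assumptions. Exact rational arithmetic via fractions.Fraction is used so that the identities can be compared exactly rather than up to rounding.

```python
from fractions import Fraction


def sweep(A, k):
    """Forward sweep Sweep_k[A] of a square matrix A (list of lists), 0-based index k."""
    n, p = len(A), A[k][k]
    if p == 0:
        raise ZeroDivisionError("sweeping requires a non-zero pivot a_kk")
    S = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                S[i][j] = -1 / p                            # -1/a_kk on the pivot
            elif i == k or j == k:
                S[i][j] = A[i][j] / p                       # pivot row/column divided by a_kk
            else:
                S[i][j] = A[i][j] - A[i][k] * A[k][j] / p   # rank-one update elsewhere
    return S


def inverse_sweep(A, k):
    """Inverse sweep: same as sweep, except the pivot row and column change sign."""
    n, p = len(A), A[k][k]
    if p == 0:
        raise ZeroDivisionError("sweeping requires a non-zero pivot a_kk")
    S = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                S[i][j] = -1 / p
            elif i == k or j == k:
                S[i][j] = -A[i][j] / p
            else:
                S[i][j] = A[i][j] - A[i][k] * A[k][j] / p
    return S


A = [[Fraction(x) for x in row] for row in [[2, 1, 3], [0, 4, 5], [6, 5, 7]]]
for k in range(3):
    assert inverse_sweep(sweep(A, k), k) == A                        # they invert each other
    S = A
    for _ in range(4):
        S = sweep(S, k)
    assert S == A                                                    # Sweep_k has order 4
    assert sweep(sweep(sweep(A, k), k), k) == inverse_sweep(A, k)    # Sweep_k^3 = Sweep_k^{-1}
```

For this sample matrix the checks confirm that the two operations invert each other and that four forward sweeps at the same index return the original matrix.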

Remarkably, the sweep operators all commute with each other: {\hbox{Sweep}_k \hbox{Sweep}_l = \hbox{Sweep}_l \hbox{Sweep}_k}. If {1 \leq k \leq n} and we perform the first {k} sweeps (in any order) to a matrix

\displaystyle  A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}

with {A_{11}} a {k \times k} minor, {A_{12}} a {k \times n-k} matrix, {A_{21}} a {n-k \times k} matrix, and {A_{22}} a {n-k \times n-k} matrix, one obtains the new matrix

\displaystyle  \hbox{Sweep}_1 \dots \hbox{Sweep}_k[A] = \begin{pmatrix} -A_{11}^{-1} & A_{11}^{-1} A_{12} \\ A_{21} A_{11}^{-1} & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{pmatrix}.

Note the appearance of the Schur complement in the bottom right block. Thus, for instance, one can essentially invert a matrix {A} by performing all {n} sweeps:

\displaystyle  \hbox{Sweep}_1 \dots \hbox{Sweep}_n[A] = -A^{-1}.
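Commutativity, the Schur complement formula and the identity {\hbox{Sweep}_1 \dots \hbox{Sweep}_n[A] = -A^{-1}} can be tested in the same way. In this sketch (hypothetical helper names again, exact arithmetic), the sample matrix is assumed symmetric positive definite, which guarantees that every pivot met along any order of sweeps is non-zero.

```python
from fractions import Fraction
from itertools import permutations


def sweep(A, k):
    """Sweep_k[A] for a square matrix A given as a list of lists (0-based k)."""
    n, p = len(A), A[k][k]
    return [[-1 / p if i == k and j == k
             else A[i][j] / p if i == k or j == k
             else A[i][j] - A[i][k] * A[k][j] / p
             for j in range(n)] for i in range(n)]


def sweep_in_order(A, order):
    """Apply the sweeps Sweep_k for k in the given order."""
    for k in order:
        A = sweep(A, k)
    return A


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Symmetric positive definite, so every pivot met along any sweep order is positive.
A = [[Fraction(x) for x in row] for row in [[4, 2, 1], [2, 5, 3], [1, 3, 6]]]
results = [sweep_in_order(A, order) for order in permutations(range(3))]
identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
minus_full_sweep = [[-x for x in row] for row in results[0]]
print(matmul(A, minus_full_sweep) == identity)
```

After sweeping indices 0 and 1 only, the bottom right entry should be the Schur complement {a_{33} - A_{21} A_{11}^{-1} A_{12} = \det A / \det A_{11}}, which for this matrix is 67/16.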

If a matrix has the form

\displaystyle  A = \begin{pmatrix} B & X \\ Y & a \end{pmatrix}

for a {n-1 \times n-1} minor {B}, {n-1 \times 1} column vector {X}, {1 \times n-1} row vector {Y}, and scalar {a}, then performing the first {n-1} sweeps gives

\displaystyle  \hbox{Sweep}_1 \dots \hbox{Sweep}_{n-1}[A] = \begin{pmatrix} -B^{-1} & B^{-1} X \\ Y B^{-1} & a - Y B^{-1} X \end{pmatrix}

and all the components of this matrix are usable for various numerical linear algebra applications in statistics (e.g. in least squares regression). Given that sweeps behave well with inverses, it is perhaps not surprising that sweeps also behave well under determinants: the determinant of {A} can be factored as the product of the entry {a_{kk}} and the determinant of the {n-1 \times n-1} matrix formed from {\hbox{Sweep}_k[A]} by removing the {k^{th}} row and column. As a consequence, one can compute the determinant of {A} fairly efficiently (so long as the sweep operations don’t come close to dividing by zero) by sweeping the matrix for {k=1,\dots,n} in turn, and multiplying together the {kk^{th}} entry of the matrix just before the {k^{th}} sweep for {k=1,\dots,n} to obtain the determinant.
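A sketch of the determinant computation just described, assuming no pivot vanishes along the way: multiply together the pivots {a_{kk}} seen just before each sweep, and compare with a naive cofactor expansion. Both helpers are illustrative, not library routines.

```python
from fractions import Fraction


def sweep(A, k):
    """Sweep_k[A] for a square matrix A given as a list of lists (0-based k)."""
    n, p = len(A), A[k][k]
    return [[-1 / p if i == k and j == k
             else A[i][j] / p if i == k or j == k
             else A[i][j] - A[i][k] * A[k][j] / p
             for j in range(n)] for i in range(n)]


def sweep_determinant(A):
    """det A as the product of the pivots a_kk met just before each sweep k = 1, ..., n."""
    det = Fraction(1)
    for k in range(len(A)):
        pivot = A[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not reorder the sweeps")
        det *= pivot
        A = sweep(A, k)
    return det


def laplace_determinant(A):
    """Reference determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * laplace_determinant([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


A = [[Fraction(x) for x in row] for row in [[2, 1, 3], [0, 4, 5], [6, 5, 7]]]
print(sweep_determinant(A), laplace_determinant(A))   # -36 -36
```

The pivot met at step {k} is the ratio of consecutive leading principal minors, so the product telescopes to {\det A}.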

It turns out that there is a simple geometric explanation for these seemingly magical properties of the sweep operation. Any {n \times n} matrix {A} creates a graph {\hbox{Graph}[A] := \{ (X, AX): X \in {\bf R}^n \}} (where we think of {{\bf R}^n} as the space of column vectors). This graph is an {n}-dimensional subspace of {{\bf R}^n \times {\bf R}^n}. Conversely, most {n}-dimensional subspaces of {{\bf R}^n \times {\bf R}^n} arise as graphs; there are some that fail the vertical line test, but these are a positive codimension set of counterexamples.

We use {e_1,\dots,e_n,f_1,\dots,f_n} to denote the standard basis of {{\bf R}^n \times {\bf R}^n}, with {e_1,\dots,e_n} the standard basis for the first factor of {{\bf R}^n} and {f_1,\dots,f_n} the standard basis for the second factor. The operation of sweeping the {k^{th}} entry then corresponds to a ninety degree rotation {\hbox{Rot}_k: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}^n \times {\bf R}^n} in the {e_k,f_k} plane, that sends {f_k} to {e_k} (and {e_k} to {-f_k}), keeping all other basis vectors fixed: thus we have

\displaystyle  \hbox{Graph}[ \hbox{Sweep}_k[A] ] = \hbox{Rot}_k \hbox{Graph}[A]

for generic {n \times n} {A} (more precisely, those {A} with non-vanishing entry {a_{kk}}). For instance, if {k=1} and {A} is of the form (1), then {\hbox{Graph}[A]} is the set of tuples {(r,R,s,S) \in {\bf R} \times {\bf R}^{n-1} \times {\bf R} \times {\bf R}^{n-1}} obeying the equations

\displaystyle  a_{11} r + X R = s

\displaystyle  Y r + B R = S.

The image of {(r,R,s,S)} under {\hbox{Rot}_1} is {(s, R, -r, S)}. Since we can write the above system of equations (for {a_{11} \neq 0}) as

\displaystyle  \frac{-1}{a_{11}} s + \frac{X}{a_{11}} R = -r

\displaystyle  \frac{Y}{a_{11}} s + (B - a_{11}^{-1} YX) R = S

we see from (2) that {\hbox{Rot}_1 \hbox{Graph}[A]} is the graph of {\hbox{Sweep}_1[A]}. Thus the sweep operation is a multidimensional generalisation of the high school geometry fact that the line {y = mx} in the plane becomes {y = \frac{-1}{m} x} after applying a ninety degree rotation.
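The rotation picture can also be checked directly. The sketch below (illustrative names, exact arithmetic) applies the map {\hbox{Rot}_k}, which sends {f_k} to {e_k} and {e_k} to {-f_k}, to the points {(e_i, A e_i)} spanning {\hbox{Graph}[A]}, and tests whether each image satisfies the linear relation defined by {\hbox{Sweep}_k[A]}. Checking a spanning set suffices, since both graphs are {n}-dimensional subspaces.

```python
from fractions import Fraction


def sweep(A, k):
    """Sweep_k[A] for a square matrix A given as a list of lists (0-based k)."""
    n, p = len(A), A[k][k]
    return [[-1 / p if i == k and j == k
             else A[i][j] / p if i == k or j == k
             else A[i][j] - A[i][k] * A[k][j] / p
             for j in range(n)] for i in range(n)]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rot(x, y, k):
    """Rot_k applied to the point (x, y) of R^n x R^n: f_k -> e_k and e_k -> -f_k."""
    x2, y2 = list(x), list(y)
    x2[k], y2[k] = y[k], -x[k]
    return x2, y2


def rotated_graph_is_graph_of_sweep(A, k):
    """Check that Rot_k maps a basis of Graph[A] into Graph[Sweep_k[A]]."""
    n, S = len(A), sweep(A, k)
    for i in range(n):
        x = [Fraction(int(t == i)) for t in range(n)]   # basis vector e_i
        x2, y2 = rot(x, matvec(A, x), k)               # image of the point (e_i, A e_i)
        if matvec(S, x2) != y2:
            return False
    return True


A = [[Fraction(v) for v in row] for row in [[2, 1, 3], [0, 4, 5], [6, 5, 7]]]
print(all(rotated_graph_is_graph_of_sweep(A, k) for k in range(3)))
```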

It is then an instructive exercise to use this geometric interpretation of the sweep operator to recover all the remarkable properties about these operations listed above. It is also useful to compare the geometric interpretation of sweeping as rotation of the graph to that of Gaussian elimination, which instead shears and reflects the graph by various elementary transformations (this is what is going on geometrically when one performs Gaussian elimination on an augmented matrix). Rotations are less distorting than shears, so one can see geometrically why sweeping can produce fewer numerical artefacts than Gaussian elimination.

Let {X} and {Y} be two random variables taking values in the same (discrete) range {R}, and let {E} be some subset of {R}, which we think of as the set of “bad” outcomes for either {X} or {Y}. If {X} and {Y} have the same probability distribution, then clearly

\displaystyle  {\bf P}( X \in E ) = {\bf P}( Y \in E ).

In particular, if it is rare for {Y} to lie in {E}, then it is also rare for {X} to lie in {E}.

If {X} and {Y} do not have exactly the same probability distribution, but their probability distributions are close to each other in some sense, then we can expect to have an approximate version of the above statement. For instance, from the definition of the total variation distance {\delta(X,Y)} between two random variables (or more precisely, the total variation distance between the probability distributions of two random variables), we see that

\displaystyle  {\bf P}(Y \in E) - \delta(X,Y) \leq {\bf P}(X \in E) \leq {\bf P}(Y \in E) + \delta(X,Y) \ \ \ \ \ (1)

for any {E \subset R}. In particular, if it is rare for {Y} to lie in {E}, and {X,Y} are close in total variation, then it is also rare for {X} to lie in {E}.

A basic inequality in information theory is Pinsker’s inequality

\displaystyle  \delta(X,Y) \leq \sqrt{\frac{1}{2} D_{KL}(X||Y)}

where the Kullback-Leibler divergence {D_{KL}(X||Y)} is defined by the formula

\displaystyle  D_{KL}(X||Y) = \sum_{x \in R} {\bf P}( X=x ) \log \frac{{\bf P}(X=x)}{{\bf P}(Y=x)}.

(See this previous blog post for a proof of this inequality.) A standard application of Jensen’s inequality reveals that {D_{KL}(X||Y)} is non-negative (Gibbs’ inequality), and vanishes if and only if {X}, {Y} have the same distribution; thus one can think of {D_{KL}(X||Y)} as a measure of how close the distributions of {X} and {Y} are to each other, although one should caution that this is not a symmetric notion of distance, as {D_{KL}(X||Y) \neq D_{KL}(Y||X)} in general. Inserting Pinsker’s inequality into (1), we see for instance that

\displaystyle  {\bf P}(X \in E) \leq {\bf P}(Y \in E) + \sqrt{\frac{1}{2} D_{KL}(X||Y)}.

Thus, if {X} is close to {Y} in the Kullback-Leibler sense, and it is rare for {Y} to lie in {E}, then it is rare for {X} to lie in {E} as well.
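To see inequality (1) and Pinsker’s inequality in action, here is a small numerical sketch on a three-point range. The distributions p and q are arbitrary illustrative choices; logarithms are natural (so divergences are in nats, matching the constant {1/2} in Pinsker’s inequality), and the total variation distance is computed both as half the {\ell^1} distance and by brute force over all subsets {E}.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """D_KL(X||Y) in nats for probability vectors p, q on the same finite range."""
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                return math.inf
            total += px * math.log(px / qx)
    return total


def total_variation(p, q):
    """delta(X,Y) = sup_E |P(X in E) - P(Y in E)|, which equals half the l^1 distance."""
    return 0.5 * sum(abs(px - qx) for px, qx in zip(p, q))


def tv_by_subsets(p, q):
    """Brute-force supremum over all subsets E of the range (exponential; tiny ranges only)."""
    idx = range(len(p))
    return max(abs(sum(p[i] - q[i] for i in E))
               for r in range(len(p) + 1) for E in combinations(idx, r))


p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
print(total_variation(p, q), math.sqrt(kl_divergence(p, q) / 2))
print(kl_divergence(p, q), kl_divergence(q, p))   # not symmetric
```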

We can specialise this inequality to the case when {Y} is a uniform random variable {U} on a finite range {R} of some cardinality {N}, in which case the Kullback-Leibler divergence {D_{KL}(X||U)} simplifies to

\displaystyle  D_{KL}(X||U) = \log N - {\bf H}(X)

where

\displaystyle  {\bf H}(X) := \sum_{x \in R} {\bf P}(X=x) \log \frac{1}{{\bf P}(X=x)}

is the Shannon entropy of {X}. Again, a routine application of Jensen’s inequality shows that {{\bf H}(X) \leq \log N}, with equality if and only if {X} is uniformly distributed on {R}. The above inequality then becomes

\displaystyle  {\bf P}(X \in E) \leq {\bf P}(U \in E) + \sqrt{\frac{1}{2}(\log N - {\bf H}(X))}. \ \ \ \ \ (2)

Thus, if {E} is a small fraction of {R} (so that it is rare for {U} to lie in {E}), and the entropy of {X} is very close to the maximum possible value of {\log N}, then it is rare for {X} to lie in {E} also.
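The identity {D_{KL}(X||U) = \log N - {\bf H}(X)} is a one-line computation, since {\log \frac{{\bf P}(X=x)}{1/N} = \log N + \log {\bf P}(X=x)}. The sketch below simply confirms it numerically for an arbitrary, hypothetical distribution on five points, one of which carries zero mass.

```python
import math


def entropy(p):
    """Shannon entropy H(X) in nats."""
    return sum(px * math.log(1 / px) for px in p if px > 0)


def kl_to_uniform(p):
    """D_KL(X||U) with U uniform on the N points of the range."""
    N = len(p)
    return sum(px * math.log(px * N) for px in p if px > 0)


p = [0.4, 0.3, 0.2, 0.1, 0.0]
N = len(p)
print(kl_to_uniform(p), math.log(N) - entropy(p))
```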

The inequality (2) is only useful when the entropy {{\bf H}(X)} is close to {\log N} in the sense that {{\bf H}(X) = \log N - O(1)}, otherwise the bound is worse than the trivial bound of {{\bf P}(X \in E) \leq 1}. In my recent paper on the Chowla and Elliott conjectures, I ended up using a variant of (2) which was still non-trivial when the entropy {{\bf H}(X)} was allowed to be smaller than {\log N - O(1)}. More precisely, I used the following simple inequality, which is implicit in the arguments of that paper but which I would like to make more explicit in this post:

Lemma 1 (Pinsker-type inequality) Let {X} be a random variable taking values in a finite range {R} of cardinality {N}, let {U} be a uniformly distributed random variable in {R}, and let {E} be a subset of {R}. Then

\displaystyle  {\bf P}(X \in E) \leq \frac{(\log N - {\bf H}(X)) + \log 2}{\log 1/{\bf P}(U \in E)}.

Proof: Consider the conditional entropy {{\bf H}(X | 1_{X \in E} )}. On the one hand, we have

\displaystyle  {\bf H}(X | 1_{X \in E} ) = {\bf H}(X, 1_{X \in E}) - {\bf H}(1_{X \in E} )

\displaystyle  = {\bf H}(X) - {\bf H}(1_{X \in E})

\displaystyle  \geq {\bf H}(X) - \log 2

by Jensen’s inequality. On the other hand, one has

\displaystyle  {\bf H}(X | 1_{X \in E} ) = {\bf P}(X \in E) {\bf H}(X | X \in E )

\displaystyle  + (1-{\bf P}(X \in E)) {\bf H}(X | X \not \in E)

\displaystyle  \leq {\bf P}(X \in E) \log |E| + (1-{\bf P}(X \in E)) \log N

\displaystyle  = \log N - {\bf P}(X \in E) \log \frac{N}{|E|}

\displaystyle  = \log N - {\bf P}(X \in E) \log \frac{1}{{\bf P}(U \in E)},

where we have again used Jensen’s inequality. Putting the two inequalities together, we obtain the claim. \Box
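Lemma 1 can be stress-tested numerically. The sketch below is an illustration rather than part of the argument: it draws random distributions (ranging from fairly flat to sharply peaked) and random sets {E} with {0 < |E| < N}, and checks that {{\bf P}(X \in E)} never exceeds the right-hand side of the lemma. The trial count and seed are arbitrary choices.

```python
import math
import random


def entropy(p):
    """Shannon entropy H(X) in nats."""
    return sum(px * math.log(1 / px) for px in p if px > 0)


def lemma1_bound(p, E):
    """Right-hand side of Lemma 1; assumes 0 < |E| < N, so that P(U in E) = |E|/N < 1."""
    N = len(p)
    return (math.log(N) - entropy(p) + math.log(2)) / math.log(N / len(E))


def check_lemma1(trials=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        N = rng.randint(2, 40)
        # raising uniforms to varying powers gives weights of very different sizes
        w = [rng.random() ** rng.choice([1, 4, 16]) for _ in range(N)]
        total = sum(w)
        p = [x / total for x in w]
        E = rng.sample(range(N), rng.randint(1, N - 1))
        if sum(p[i] for i in E) > lemma1_bound(p, E) + 1e-12:
            return False
    return True


print(check_lemma1())
```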

Remark 2 As noted in comments, this inequality can be viewed as a special case of the more general inequality

\displaystyle  {\bf P}(X \in E) \leq \frac{D(X||Y) + \log 2}{\log 1/{\bf P}(Y \in E)}

for arbitrary random variables {X,Y} taking values in the same discrete range {R}, which follows from the data processing inequality

\displaystyle  D( f(X)||f(Y)) \leq D(X|| Y)

for arbitrary functions {f}, applied to the indicator function {f = 1_E}. Indeed one has

\displaystyle  D( 1_E(X) || 1_E(Y) ) = {\bf P}(X \in E) \log \frac{{\bf P}(X \in E)}{{\bf P}(Y \in E)}

\displaystyle + {\bf P}(X \not \in E) \log \frac{{\bf P}(X \not \in E)}{{\bf P}(Y \not \in E)}

\displaystyle  \geq {\bf P}(X \in E) \log \frac{1}{{\bf P}(Y \in E)} - h( {\bf P}(X \in E) )

\displaystyle  \geq {\bf P}(X \in E) \log \frac{1}{{\bf P}(Y \in E)} - \log 2

where {h(u) := u \log \frac{1}{u} + (1-u) \log \frac{1}{1-u}} is the entropy function.
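In the same spirit, the following sketch checks the three ingredients of Remark 2 on random pairs of distributions with full support (so that {D(X||Y)} is finite): the data processing inequality for {f = 1_E}, the lower bound {{\bf P}(X \in E) \log \frac{1}{{\bf P}(Y \in E)} - h({\bf P}(X \in E))}, and the resulting general inequality. The parameters are again arbitrary.

```python
import math
import random


def kl(p, q):
    """D(X||Y) in nats; assumes q is strictly positive wherever p is."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)


def binary_entropy(u):
    """h(u) = u log(1/u) + (1-u) log(1/(1-u))."""
    return sum(t * math.log(1 / t) for t in (u, 1 - u) if t > 0)


def check_remark2(trials=2000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        N = rng.randint(2, 30)
        p = [rng.random() + 1e-9 for _ in range(N)]
        q = [rng.random() + 1e-9 for _ in range(N)]   # strictly positive weights
        sp, sq = sum(p), sum(q)
        p, q = [x / sp for x in p], [x / sq for x in q]
        E = set(rng.sample(range(N), rng.randint(1, N - 1)))
        P = sum(p[i] for i in E)
        Q = sum(q[i] for i in E)
        d_push = kl([P, 1 - P], [Q, 1 - Q])           # D(1_E(X) || 1_E(Y))
        ok = (d_push <= kl(p, q) + 1e-12                                    # data processing
              and d_push >= P * math.log(1 / Q) - binary_entropy(P) - 1e-12
              and P <= (kl(p, q) + math.log(2)) / math.log(1 / Q) + 1e-12)
        if not ok:
            return False
    return True


print(check_remark2())
```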

Thus, for instance, if one has

\displaystyle  {\bf H}(X) \geq \log N - o(K)

and

\displaystyle  {\bf P}(U \in E) \leq \exp( - K )

for some {K} much larger than {1} (so that {1/K = o(1)}), then

\displaystyle  {\bf P}(X \in E) = o(1).

More informally: if the entropy of {X} is somewhat close to the maximum possible value of {\log N}, and it is exponentially rare for a uniform variable to lie in {E}, then it is still somewhat rare for {X} to lie in {E}. The estimate given is close to sharp in this regime, as can be seen by calculating the entropy of a random variable {X} which is uniformly distributed inside a small set {E} with some probability {p} and uniformly distributed outside of {E} with probability {1-p}, for some parameter {0 \leq p \leq 1}.
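The sharpness example can be made concrete. For {X} uniform on {E} with probability {p} and uniform on the complement with probability {1-p}, one has {{\bf H}(X) = h(p) + p \log |E| + (1-p) \log (N-|E|)}, and with {K := \log (N/|E|)} the bound of Lemma 1 works out to {p + [(1-p) \log \frac{1}{1-e^{-K}} + \log 2 - h(p)]/K}, so it exceeds the true value {p} by only {O(1/K)}. The sketch below evaluates this for one hypothetical choice of {N} and {|E|}.

```python
import math


def lemma1_bound_for_mixture(N, size_E, p):
    """X uniform on E w.p. p and uniform outside E w.p. 1-p, with |R| = N and 0 < |E| < N."""
    h = sum(t * math.log(1 / t) for t in (p, 1 - p) if t > 0)     # binary entropy h(p)
    H = h + p * math.log(size_E) + (1 - p) * math.log(N - size_E)   # entropy of X
    K = math.log(N / size_E)                                        # P(U in E) = exp(-K)
    return (math.log(N) - H + math.log(2)) / K, K


for p in (0.01, 0.1, 0.5):
    bound, K = lemma1_bound_for_mixture(2 ** 40, 2 ** 10, p)
    print(p, bound, K)
```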

It turns out that the above lemma combines well with concentration of measure estimates; in my paper, I used one of the simplest such estimates, namely Hoeffding’s inequality, but there are of course many other estimates of this type (see e.g. this previous blog post for some others). Roughly speaking, concentration of measure inequalities allow one to make approximations such as

\displaystyle  F(U) \approx {\bf E} F(U)

with exponentially high probability, where {U} is a uniform distribution and {F} is some reasonable function of {U}. Combining this with the above lemma, we can then obtain approximations of the form

\displaystyle  F(X) \approx {\bf E} F(U) \ \ \ \ \ (3)

with somewhat high probability, if the entropy of {X} is somewhat close to maximum. This observation, combined with an “entropy decrement argument” that allowed one to arrive at a situation in which the relevant random variable {X} did have a near-maximum entropy, is the key new idea in my recent paper; for instance, one can use the approximation (3) to obtain an approximation of the form

\displaystyle  \sum_{j=1}^H \sum_{p \in {\mathcal P}} \lambda(n+j) \lambda(n+j+p) 1_{p|n+j}

\displaystyle  \approx \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+j+p)}{p}

for “most” choices of {n} and a suitable choice of {H} (with the latter being provided by the entropy decrement argument). The left-hand side is tied to Chowla-type sums such as {\sum_{n \leq x} \frac{\lambda(n)\lambda(n+1)}{n}} through the multiplicativity of {\lambda}, while the right-hand side, being a linear correlation involving two parameters {j,p} rather than just one, has “finite complexity” and can be treated by existing techniques such as the Hardy-Littlewood circle method. One could hope that one could similarly use approximations such as (3) in other problems in analytic number theory or combinatorics.

The Chowla conjecture asserts that all non-trivial correlations of the Liouville function are asymptotically negligible; for instance, it asserts that

\displaystyle  \sum_{n \leq X} \lambda(n) \lambda(n+h) = o(X)

as {X \rightarrow \infty} for any fixed natural number {h}. This conjecture remains open, though there are a number of partial results (e.g. these two previous results of Matomaki, Radziwill, and myself).

A natural generalisation of Chowla’s conjecture was proposed by Elliott. For simplicity we will only consider Elliott’s conjecture for the pair correlations

\displaystyle  \sum_{n \leq X} g(n) \overline{g}(n+h).

For such correlations, the conjecture was that one had

\displaystyle  \sum_{n \leq X} g(n) \overline{g}(n+h) = o(X) \ \ \ \ \ (1)

as {X \rightarrow \infty} for any natural number {h}, as long as {g} was a completely multiplicative function with magnitude bounded by {1}, and such that

\displaystyle  \sum_p \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p} = +\infty \ \ \ \ \ (2)

for any Dirichlet character {\chi} and any real number {t}. In the language of “pretentious number theory”, as developed by Granville and Soundararajan, the hypothesis (2) asserts that the completely multiplicative function {g} does not “pretend” to be like the completely multiplicative function {n \mapsto \chi(n) n^{it}} for any character {\chi} and real number {t}. A condition of this form is necessary; for instance, if {g(n)} is precisely equal to {\chi(n) n^{it}} and {\chi} has period {q}, then {g(n) \overline{g}(n+q)} is equal to {1_{(n,q)=1} + o(1)} as {n \rightarrow \infty} and (1) clearly fails. The prime number theorem in arithmetic progressions implies that the Liouville function obeys (2), and so the Elliott conjecture contains the Chowla conjecture as a special case.

As it turns out, Elliott’s conjecture is false as stated, with the counterexample {g} having the property that {g} “pretends” locally to be the function {n \mapsto n^{it_j}} for {n} in various intervals {[1, X_j]}, where {X_j} and {t_j} go to infinity in a certain prescribed sense. See this paper of Matomaki, Radziwill, and myself for details. However, we view this as a technicality, and continue to believe that certain “repaired” versions of Elliott’s conjecture still hold. For instance, our counterexample does not apply when {g} is restricted to be real-valued rather than complex, and we believe that Elliott’s conjecture is valid in this setting. Returning to the complex-valued case, we still expect the asymptotic (1) provided that the condition (2) is replaced by the stronger condition

\displaystyle  \inf_{|t| \leq X} |\sum_{p \leq X} \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p}| \rightarrow +\infty

as {X \rightarrow +\infty} for all fixed Dirichlet characters {\chi}. In our paper we supported this claim by establishing a certain “averaged” version of this conjecture; see that paper for further details. (See also this recent paper of Frantzikinakis and Host which establishes a different averaged version of this conjecture.)

One can make a stronger “non-asymptotic” version of this corrected Elliott conjecture, in which the {X} parameter does not go to infinity, or equivalently that the function {g} is permitted to depend on {X}:

Conjecture 1 (Non-asymptotic Elliott conjecture) Let {\varepsilon > 0}, let {A \geq 1} be sufficiently large depending on {\varepsilon}, and let {X} be sufficiently large depending on {A,\varepsilon}. Suppose that {g} is a completely multiplicative function with magnitude bounded by {1}, such that

\displaystyle  \inf_{|t| \leq AX} |\sum_{p \leq X} \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p}| \geq A

for all Dirichlet characters {\chi} of period at most {A}. Then one has

\displaystyle  |\sum_{n \leq X} g(n) \overline{g(n+h)}| \leq \varepsilon X

for all natural numbers {1 \leq h \leq 1/\varepsilon}.

The {\varepsilon}-dependent factor {A} in the constraint {|t| \leq AX} is necessary, as can be seen by considering the completely multiplicative function {g(n) := n^{2iX}} (for instance). Again, the results in my previous paper with Matomaki and Radziwill can be viewed as establishing an averaged version of this conjecture.

Meanwhile, we have the following conjecture that is the focus of the Polymath5 project:

Conjecture 2 (Erdös discrepancy conjecture) For any function {f: {\bf N} \rightarrow \{-1,+1\}}, the discrepancy

\displaystyle  \sup_{n,d \in {\bf N}} |\sum_{j=1}^n f(jd)|

is infinite.

It is instructive to compute some near-counterexamples to Conjecture 2 that illustrate the difficulty of the Erdös discrepancy problem. The first near-counterexample is that of a non-principal Dirichlet character {f(n) = \chi(n)} that takes values in {\{-1,0,+1\}} rather than {\{-1,+1\}}. For this function, one has from the complete multiplicativity of {\chi} that

\displaystyle  |\sum_{j=1}^n f(jd)| = |\sum_{j=1}^n \chi(j) \chi(d)|

\displaystyle  \leq |\sum_{j=1}^n \chi(j)|.

If {q} denotes the period of {\chi}, then {\chi} has mean zero on every interval of length {q}, and thus

\displaystyle  |\sum_{j=1}^n f(jd)| \leq |\sum_{j=1}^n \chi(j)| \leq q.

Thus {\chi} has bounded discrepancy.
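A brute-force sketch of this bounded discrepancy for the character of period {3} (the search ranges for {n} and {d} are arbitrary finite cutoffs, so this is an illustration, not a proof): the partial sums of {\chi_3} only take the values {0} and {1}, so the largest value found is {1}, below the general bound {q = 3}.

```python
def chi3(n):
    """The non-principal Dirichlet character of period 3."""
    return (0, 1, -1)[n % 3]


def max_discrepancy(f, n_max, d_max):
    """max over 1 <= n <= n_max, 1 <= d <= d_max of |f(d) + f(2d) + ... + f(nd)|."""
    best = 0
    for d in range(1, d_max + 1):
        s = 0
        for j in range(1, n_max + 1):
            s += f(j * d)
            best = max(best, abs(s))
    return best


print(max_discrepancy(chi3, 500, 50))
```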

Of course, this is not a true counterexample to Conjecture 2 because {\chi} can take the value {0}. Let us now consider the following variant example, which is the simplest member of a family of examples studied by Borwein, Choi, and Coons. Let {\chi = \chi_3} be the non-principal Dirichlet character of period {3} (thus {\chi(n)} equals {+1} when {n=1 \hbox{ mod } 3}, {-1} when {n = 2 \hbox{ mod } 3}, and {0} when {n = 0 \hbox{ mod } 3}), and define the completely multiplicative function {f = \tilde \chi: {\bf N} \rightarrow \{-1,+1\}} by setting {\tilde \chi(p) := \chi(p)} when {p \neq 3} and {\tilde \chi(3) = +1}. This is about the simplest modification one can make to the previous near-counterexample to eliminate the zeroes. Now consider the sum

\displaystyle  \sum_{j=1}^n \tilde \chi(j)

with {n := 1 + 3 + 3^2 + \dots + 3^k} for some large {k}. Writing {j = 3^a m} with {m} coprime to {3} and {a} at most {k}, we can write this sum as

\displaystyle  \sum_{a=0}^k \sum_{1 \leq m \leq n/3^a: (m,3)=1} \tilde \chi(3^a m).

Now observe that {\tilde \chi(3^a m) = \tilde \chi(3)^a \tilde \chi(m) = \chi(m)}. The function {\chi} vanishes on multiples of {3} and has mean zero on every interval of length three, while {\lfloor n/3^a\rfloor = 1 + 3 + \dots + 3^{k-a}} is equal to {1} mod {3}, and thus

\displaystyle  \sum_{1 \leq m \leq n/3^a: (m,3)=1} \tilde \chi(3^a m) = 1

for every {a=0,\dots,k}, and thus

\displaystyle  \sum_{j=1}^n \tilde \chi(j) = k+1 \gg \log n.

Thus {\tilde \chi} has unbounded discrepancy, but only barely so (it grows logarithmically in {n}). These examples suggest that the main “enemy” to proving Conjecture 2 comes from completely multiplicative functions {f} that somehow “pretend” to be like a Dirichlet character but do not vanish at the zeroes of that character. (Indeed, the special case of Conjecture 2 when {f} is completely multiplicative is already open, and appears to be an important subcase.)
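The Borwein-Choi-Coons computation can be reproduced directly. In this sketch, tilde_chi (an illustrative name) strips off all factors of {3} and then evaluates {\chi_3} on what remains, which is exactly the completely multiplicative function {\tilde \chi} defined above; the partial sums at {n = 1 + 3 + \dots + 3^k} come out equal to {k+1}.

```python
def tilde_chi(n):
    """Completely multiplicative; equals chi_3(p) at primes p != 3, and tilde_chi(3) = +1."""
    while n % 3 == 0:
        n //= 3
    return 1 if n % 3 == 1 else -1


def partial_sum(n):
    return sum(tilde_chi(j) for j in range(1, n + 1))


for k in range(8):
    n = (3 ** (k + 1) - 1) // 2        # n = 1 + 3 + ... + 3^k
    print(k, n, partial_sum(n))
```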

All of these conjectures remain open. However, I would like to record in this blog post the following striking connection, illustrating the power of the Elliott conjecture (particularly in its nonasymptotic formulation):

Theorem 3 (Elliott conjecture implies unbounded discrepancy) Conjecture 1 implies Conjecture 2.

The argument relies heavily on two observations that were previously made in connection with the Polymath5 project. The first is a Fourier-analytic reduction that replaces the Erdos Discrepancy Problem with an averaged version for completely multiplicative functions {g}. An application of Cauchy-Schwarz then shows that any counterexample to that version will violate the conclusion of Conjecture 1, so if one assumes that conjecture then {g} must pretend to be like a function of the form {n \mapsto \chi(n) n^{it}}. One then uses (a generalisation of) a second argument from Polymath5 to rule out this case, basically by reducing matters to a more complicated version of the Borwein-Choi-Coons analysis. Details are provided below the fold.

There is some hope that the Chowla and Elliott conjectures can be attacked, as the parity barrier which is so impervious to attack for the twin prime conjecture seems to be more permeable in this setting. (For instance, in my previous post I raised a possible approach, based on establishing expander properties of a certain random graph, which seems to get around the parity problem, in principle at least.)

(Update, Sep 25: fixed some treatment of error terms, following a suggestion of Andrew Granville.)


The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the Möbius pseudorandomness principle that asserts that the Möbius function {\mu} is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).

However, there is an intriguing “alternate universe” in which the Möbius function is strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero“. In this scenario, the parity problem obstruction disappears, and it becomes possible, in principle, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements is true:

  • (Twin prime conjecture) There are infinitely many primes {p} such that {p+2} is also prime.
  • (No Siegel zeroes) There exists a constant {c>0} such that for every real Dirichlet character {\chi} of conductor {q > 1}, the associated Dirichlet {L}-function {s \mapsto L(s,\chi)} has no zeroes in the interval {[1-\frac{c}{\log q}, 1]}.

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory that should not actually exist, but is surprisingly self-consistent and to date proven to be impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

\displaystyle \sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2) \ \ \ \ \ (1)


for some large value of {x}, where {\Lambda} is the von Mangoldt function. Actually, in this post we will work with the slight variant

\displaystyle \sum_{x \leq n \leq 2x} \Lambda_2(n(n+2)) \nu(n(n+2))

where

\displaystyle \Lambda_2(n) = (\mu * L^2)(n) = \sum_{d|n} \mu(d) \log^2 \frac{n}{d}

is the second von Mangoldt function, and {*} denotes Dirichlet convolution, and {\nu} is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval {x \leq n \leq 2x} and remove very small primes from {n}, but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve {\nu} is essentially replaced by the more combinatorial restriction {1_{(n(n+2),q^{1/C}\#)=1}} for some large {C}, where {q^{1/C}\#} is the primorial of {q^{1/C}}, but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)
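As an illustration of the second von Mangoldt function (reading {L} as the logarithm, as the formula indicates), the sketch below computes {\Lambda_2 = \mu * L^2} by trial division and checks it against the classical identity {\Lambda_2 = \Lambda L + \Lambda * \Lambda}. In particular {\Lambda_2} equals {\log^2 p} at primes, {2 \log p \log q} at products of two distinct primes, and vanishes at integers with three or more distinct prime factors. The helper names are hypothetical and the code is only meant for small {n}.

```python
import math


def factorize(n):
    """Prime factorisation of n as {prime: exponent}, by trial division."""
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f


def mobius(n):
    f = factorize(n)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)


def von_mangoldt(n):
    f = factorize(n)
    return math.log(next(iter(f))) if len(f) == 1 else 0.0


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def second_von_mangoldt(n):
    """Lambda_2(n) = sum_{d | n} mu(d) log^2(n/d)."""
    return sum(mobius(d) * math.log(n / d) ** 2 for d in divisors(n))


print(second_von_mangoldt(7), math.log(7) ** 2)
print(second_von_mangoldt(30))   # three distinct primes: zero up to rounding
```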

If there is a Siegel zero {L(\beta,\chi)=0} with {\beta} close to {1} and {\chi} a Dirichlet character of conductor {q}, then multiplicative number theory methods can be used to show that the Möbius function {\mu} “pretends” to be like the character {\chi} in the sense that {\mu(p) \approx \chi(p)} for “most” primes {p} near {q} (e.g. in the range {q^\varepsilon \leq p \leq q^C} for some small {\varepsilon>0} and large {C>0}). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that {\mu} pretends to be like {\chi} can be used to construct a tractable approximation (after inserting the sieve weight {\nu}) in the range {[x,2x]} (where {x = q^C} for some large {C}) for the second von Mangoldt function {\Lambda_2}, namely the function

\displaystyle \tilde \Lambda_2(n) := (\chi * L^2)(n) = \sum_{d|n} \chi(d) \log^2 \frac{n}{d}.

Roughly speaking, we think of the periodic function {\chi} and the slowly varying function {\log^2} as being of about the same “complexity” as the constant function {1}, so that {\tilde \Lambda_2} is roughly of the same “complexity” as the divisor function

\displaystyle \tau(n) := (1*1)(n) = \sum_{d|n} 1,

which is considerably simpler to obtain asymptotics for than the von Mangoldt function as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate {\sum_{x \leq n \leq 2x} \tau(n)} to accuracy {O(\sqrt{x})} with little difficulty, whereas to obtain a comparable level of accuracy for {\sum_{x \leq n \leq 2x} \Lambda(n)} or {\sum_{x \leq n \leq 2x} \Lambda_2(n)} is essentially the Riemann hypothesis.)
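The hyperbola method mentioned in the parenthetical can be sketched in a few lines: counting lattice points {(a,b)} with {ab \leq x} by splitting at {\sqrt{x}} gives the exact identity {\sum_{n \leq x} \tau(n) = 2 \sum_{d \leq \sqrt{x}} \lfloor x/d \rfloor - \lfloor \sqrt{x} \rfloor^2}, which needs only {O(\sqrt{x})} operations. The code below (illustrative names) compares this with brute force.

```python
import math


def divisor_summatory(x):
    """D(x) = sum_{n <= x} tau(n), via the Dirichlet hyperbola method."""
    if x < 1:
        return 0
    r = math.isqrt(x)
    return 2 * sum(x // d for d in range(1, r + 1)) - r * r


def tau(n):
    """Number of divisors of n (brute force)."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def tau_sum_dyadic(x):
    """sum_{x <= n <= 2x} tau(n) = D(2x) - D(x - 1)."""
    return divisor_summatory(2 * x) - divisor_summatory(x - 1)


print(divisor_summatory(10 ** 6), tau_sum_dyadic(1000))
```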

One expects {\tilde \Lambda_2(n)} to be a good approximant to {\Lambda_2(n)} if {n} is of size {O(x)} and has no prime factors less than {q^{1/C}} for some large constant {C}. The Selberg sieve {\nu} will be mostly supported on numbers with no prime factor less than {q^{1/C}}. As such, one can hope to approximate (1) by the expression

\displaystyle \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)) \nu(n(n+2)); \ \ \ \ \ (2)


as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

\displaystyle \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)).

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

\displaystyle \sum_{x \leq n \leq 2x} \tau(n(n+2)). \ \ \ \ \ (3)


Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation {ab+2=cd} with {a,b,c,d} in various ranges; this is clearly related to understanding the equidistribution of the hyperbola {\{ (a,b) \in ({\bf Z}/d{\bf Z})^2: ab + 2 = 0 \hbox{ mod } d \}} in {({\bf Z}/d{\bf Z})^2}. Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

\displaystyle \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{a_1 m + a_2 \overline{m}}{r} )

where {\overline{m}} denotes the inverse of {m} in {({\bf Z}/r{\bf Z})^\times}. One can then use the Weil bound

\displaystyle \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{1/2 + o(1)} (a,b,r)^{1/2} \ \ \ \ \ (4)


where {(a,b,r)} is the greatest common divisor of {a,b,r} (with the convention that this is equal to {r} if {a,b} vanish), and the {o(1)} decays to zero as {r \rightarrow \infty}. The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that {x=q^C} with {C} large enough).
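For a prime modulus {p} with {p \nmid ab}, the sharp form of the Weil bound is {|\sum_{m} e(\frac{am+b\overline{m}}{p})| \leq 2\sqrt{p}}. The Python sketch below is only an illustrative numerical check, and the helper names are my own. It evaluates Kloosterman sums by brute force and compares them against this bound for small primes. It also confirms that the sums are real, because the substitution {m \mapsto -m} conjugates every term.

```python
import cmath
import math


def kloosterman(a: int, b: int, r: int) -> complex:
    """Brute-force sum over m in (Z/rZ)^x of e((a*m + b*m^{-1}) / r), for r >= 2."""
    total = 0j
    for m in range(1, r):
        if math.gcd(m, r) == 1:
            m_inv = pow(m, -1, r)  # modular inverse (Python 3.8+)
            phase = ((a * m + b * m_inv) % r) / r
            total += cmath.exp(2j * math.pi * phase)
    return total


def worst_weil_ratio(p: int) -> float:
    """Max of |S(a,b;p)| / (2 sqrt(p)) over a, b nonzero mod the prime p."""
    return max(
        abs(kloosterman(a, b, p)) / (2 * math.sqrt(p))
        for a in range(1, p)
        for b in range(1, p)
    )
```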

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of {r} will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has

\displaystyle \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{3/4 + o(1)} (a,b,r)^{1/4} \ \ \ \ \ (5)


whenever {r \geq 1} and {a,b} are coprime to {r}, where the {o(1)} is with respect to the limit {r \rightarrow \infty} (and is uniform in {a,b}).

Proof: Observe from change of variables that the Kloosterman sum {\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} )} is unchanged if one replaces {(a,b)} with {(\lambda a, \lambda^{-1} b)} for {\lambda \in ({\bf Z}/r{\bf Z})^\times}. For fixed {a,b}, the number of such pairs {(\lambda a, \lambda^{-1} b)} is at least {r^{1-o(1)} / (a,b,r)}, thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

\displaystyle \sum_{a,b \in {\bf Z}/r{\bf Z}} |\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e\left( \frac{am+b\overline{m}}{r} \right)|^4 \ll r^{4+o(1)}.

The left-hand side can be rearranged as

\displaystyle \sum_{m_1,m_2,m_3,m_4 \in ({\bf Z}/r{\bf Z})^\times} \sum_{a,b \in {\bf Z}/r{\bf Z}}

\displaystyle e\left( \frac{a(m_1+m_2-m_3-m_4) + b(\overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4})}{r} \right)

which by Fourier summation is equal to

\displaystyle r^2 \# \{ (m_1,m_2,m_3,m_4) \in (({\bf Z}/r{\bf Z})^\times)^4:

\displaystyle m_1+m_2-m_3-m_4 = \frac{1}{m_1} + \frac{1}{m_2} - \frac{1}{m_3} - \frac{1}{m_4} = 0 \hbox{ mod } r \}.

Observe from the quadratic formula and the divisor bound that each pair {(x,y)\in ({\bf Z}/r{\bf Z})^2} other than {(0,0)} has at most {O(r^{o(1)})} solutions {(m_1,m_2)} to the system of equations {m_1+m_2=x; \frac{1}{m_1} + \frac{1}{m_2} = y}, while the pair {(0,0)} has the {\phi(r) \leq r} solutions {m_2 = -m_1}. Since {(m_3,m_4)} determines {(x,y)}, the number of quadruples {(m_1,m_2,m_3,m_4)} of the desired form is at most {r^{2+o(1)} + r^2 = r^{2+o(1)}}, and the claim follows. \Box
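The Fourier summation step in the proof is an exact identity, {\sum_{a,b} |S(a,b;r)|^4 = r^2 N(r)}, where {N(r)} denotes the number of quadruples counted above. The Python sketch below checks this identity by brute force for small {r}. It is an illustration only, and the helper names are my own. Grouping the pairs {(m_1,m_2)} by the value of {(m_1+m_2, \overline{m_1}+\overline{m_2})}, one sees that {N(r)} is the sum of the squares of the class sizes, which mirrors the structure of the argument. The class of {(0,0)} always contains the {\phi(r)} pairs {(m,-m)}.

```python
import cmath
import math
from collections import Counter


def kloosterman(a: int, b: int, r: int) -> complex:
    """S(a,b;r) = sum over units m mod r of e((a m + b m^{-1}) / r), for r >= 2."""
    return sum(
        cmath.exp(2j * math.pi * ((a * m + b * pow(m, -1, r)) % r) / r)
        for m in range(1, r)
        if math.gcd(m, r) == 1
    )


def fourth_moment(r: int) -> float:
    """Sum over all a, b in Z/rZ of |S(a,b;r)|^4."""
    return sum(abs(kloosterman(a, b, r)) ** 4 for a in range(r) for b in range(r))


def pair_classes(r: int) -> Counter:
    """Number of unit pairs (m1, m2) giving each value of (m1 + m2, 1/m1 + 1/m2) mod r."""
    units = [m for m in range(1, r) if math.gcd(m, r) == 1]
    return Counter(
        ((m1 + m2) % r, (pow(m1, -1, r) + pow(m2, -1, r)) % r)
        for m1 in units
        for m2 in units
    )


def quadruple_count(r: int) -> int:
    """#{(m1,..,m4): m1+m2 = m3+m4 and 1/m1+1/m2 = 1/m3+1/m4 mod r}."""
    return sum(n * n for n in pair_classes(r).values())
```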

We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound) Let {\chi} be a primitive real Dirichlet character of conductor {q}, and let {a,b,c,d \in{\bf Z}/q{\bf Z}}. Then

\displaystyle \sum_{n \in {\bf Z}/q{\bf Z}} \chi(an+b) \chi(cn+d) \ll q^{o(1)} (ad-bc, q).

Proof: As {q} is the conductor of a primitive real Dirichlet character, {q} is equal to {2^j} times a squarefree odd number for some {j \leq 3}. By the Chinese remainder theorem (the bounded power of two contributing only an {O(1)} factor), it thus suffices to establish the claim when {q=p} is an odd prime. We may assume that {ad-bc} is not divisible by {p}, as the claim is trivial otherwise. If {a} vanishes then {c} does not vanish, and the claim follows from the mean zero nature of {\chi}; similarly if {c} vanishes. Hence we may assume that {a,c} do not vanish, and then we can normalise them to equal {1}. By completing the square it now suffices to show that

\displaystyle \sum_{n \in {\bf Z}/p{\bf Z}} \chi( n^2 - b ) \ll 1

whenever {b \neq 0 \hbox{ mod } p}. As {\chi} is {+1} on the quadratic residues and {-1} on the non-residues, it now suffices to show that

\displaystyle \# \{ (m,n) \in ({\bf Z}/p{\bf Z})^2: n^2 - b = m^2 \} = p + O(1).

But by making the change of variables {(x,y) = (n+m,n-m)}, the left-hand side becomes {\# \{ (x,y) \in ({\bf Z}/p{\bf Z})^2: xy=b\}}, and the claim follows. \Box
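For an odd prime {p} the sums in Lemma 3 can in fact be evaluated exactly. When {ad-bc \neq 0 \hbox{ mod } p}, the sum equals {-\chi(ac)}, so it has magnitude at most {1}. Also {\sum_n \chi(n^2-b) = -1} for {b \neq 0 \hbox{ mod } p}, consistent with the hyperbola {xy=b} having exactly {p-1} points. The Python sketch below checks these facts by brute force for small primes. It is a sanity check, not a proof. It computes the Legendre symbol, which is the only primitive real character of odd prime conductor, via Euler's criterion.

```python
from collections import Counter


def legendre(n: int, p: int) -> int:
    """Legendre symbol (n / p) for an odd prime p, via Euler's criterion."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1


def character_pair_sum(a: int, b: int, c: int, d: int, p: int) -> int:
    """sum_{n mod p} chi(a n + b) chi(c n + d), with chi the Legendre symbol."""
    return sum(legendre(a * n + b, p) * legendre(c * n + d, p) for n in range(p))


def difference_of_squares_count(b: int, p: int) -> int:
    """#{(m, n) mod p : n^2 - b = m^2}, counted via a table of squares."""
    squares = Counter(x * x % p for x in range(p))
    return sum(squares[(n * n - b) % p] for n in range(p))
```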

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which did not need to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function {\Lambda_2(n(n+2))} in place of {\Lambda(n) \Lambda(n+2)}. These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.


The Poincaré upper half-plane {{\mathbf H} := \{ z: \hbox{Im}(z) > 0 \}} (with a boundary consisting of the real line {{\bf R}} together with the point at infinity {\infty}) carries an action of the projective special linear group

\displaystyle  \hbox{PSL}_2({\bf R}) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in {\bf R}: ad-bc = 1 \} / \{\pm 1\}

via fractional linear transformations:

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} z := \frac{az+b}{cz+d}. \ \ \ \ \ (1)

Here and in the rest of the post we will abuse notation by identifying elements {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} of the special linear group {\hbox{SL}_2({\bf R})} with their equivalence class {\{ \pm \begin{pmatrix} a & b \\ c & d \end{pmatrix} \}} in {\hbox{PSL}_2({\bf R})}; this will occasionally create or remove a factor of two in our formulae, but otherwise has very little effect, though one has to check that various definitions and expressions (such as (1)) are unaffected if one replaces a matrix {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} by its negation {\begin{pmatrix} -a & -b \\ -c & -d \end{pmatrix}}. In particular, we recommend that the reader ignore the signs {\pm} that appear from time to time in the discussion below.

As the action of {\hbox{PSL}_2({\bf R})} on {{\mathbf H}} is transitive, and any given point in {{\mathbf H}} (e.g. {i}) has a stabiliser isomorphic to the projective rotation group {\hbox{PSO}_2({\bf R})}, we can view the Poincaré upper half-plane {{\mathbf H}} as a homogeneous space for {\hbox{PSL}_2({\bf R})}, and more specifically the quotient space of {\hbox{PSL}_2({\bf R})} by a maximal compact subgroup {\hbox{PSO}_2({\bf R})}. In fact, we can make the half-plane a symmetric space for {\hbox{PSL}_2({\bf R})}, by endowing {{\mathbf H}} with the Riemannian metric

\displaystyle  dg^2 := \frac{dx^2 + dy^2}{y^2}

(using Cartesian coordinates {z=x+iy}), which is invariant with respect to the {\hbox{PSL}_2({\bf R})} action. Like any other Riemannian metric, the metric on {{\mathbf H}} generates a number of other important geometric objects on {{\mathbf H}}, such as the distance function {d(z_1,z_2)} which can be computed to be given by the formula

\displaystyle  2(\cosh(d(z_1,z_2))-1) = \frac{|z_1-z_2|^2}{\hbox{Im}(z_1) \hbox{Im}(z_2)}, \ \ \ \ \ (2)

the volume measure {\mu = \mu_{\mathbf H}}, which can be computed to be

\displaystyle  d\mu = \frac{dx dy}{y^2},

and the Laplace-Beltrami operator, which can be computed to be {\Delta = y^2 (\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2})} (here we use the negative definite sign convention for {\Delta}). As the metric {dg} was {\hbox{PSL}_2({\bf R})}-invariant, all of these quantities arising from the metric are similarly {\hbox{PSL}_2({\bf R})}-invariant in the appropriate sense.
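To make formula (2) concrete, here is a short Python sketch. It is a numerical illustration with helper names of my own choosing. It computes {d(z_1,z_2)} from (2) and checks that the distance is unchanged when both points are moved by the same fractional linear transformation (1) with real entries and determinant one. It also checks the value {d(i,2i) = \log 2}.

```python
import math
import random


def hyperbolic_distance(z1: complex, z2: complex) -> float:
    """d(z1, z2) on H, solved from 2(cosh d - 1) = |z1 - z2|^2 / (Im z1 Im z2)."""
    return math.acosh(1 + abs(z1 - z2) ** 2 / (2 * z1.imag * z2.imag))


def act(g: tuple, z: complex) -> complex:
    """Fractional linear action (a z + b) / (c z + d) of g = (a, b, c, d)."""
    a, b, c, d = g
    return (a * z + b) / (c * z + d)


def random_sl2r(rng: random.Random) -> tuple:
    """A random real matrix (a, b, c, d) with ad - bc = 1 (a kept away from 0)."""
    a = rng.uniform(0.5, 2.0)
    b = rng.uniform(-2.0, 2.0)
    c = rng.uniform(-2.0, 2.0)
    return (a, b, c, (1 + b * c) / a)


def random_point(rng: random.Random) -> complex:
    """A random point of the upper half-plane in a bounded window."""
    return complex(rng.uniform(-3.0, 3.0), rng.uniform(0.1, 3.0))
```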

The Gauss curvature of the Poincaré half-plane can be computed to be the constant {-1}, thus {{\mathbf H}} is a model for two-dimensional hyperbolic geometry, in much the same way that the unit sphere {S^2} in {{\bf R}^3} is a model for two-dimensional spherical geometry (or {{\bf R}^2} is a model for two-dimensional Euclidean geometry). (Indeed, {{\mathbf H}} is isomorphic (via projection to a null hyperplane) to the upper unit hyperboloid {\{ (x,t) \in {\bf R}^{2+1}: t = \sqrt{1+|x|^2}\}} in the Minkowski spacetime {{\bf R}^{2+1}}, which is the direct analogue of the unit sphere in Euclidean spacetime {{\bf R}^3} or the plane {{\bf R}^2} in Galilean spacetime {{\bf R}^2 \times {\bf R}}.)

One can inject arithmetic into this geometric structure by passing from the Lie group {\hbox{PSL}_2({\bf R})} to the full modular group

\displaystyle  \hbox{PSL}_2({\bf Z}) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in {\bf Z}: ad-bc = 1 \} / \{\pm 1\}

or congruence subgroups such as

\displaystyle  \Gamma_0(q) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \hbox{PSL}_2({\bf Z}): c = 0\ (q) \} / \{ \pm 1 \} \ \ \ \ \ (3)

for natural number {q}, or to the discrete stabiliser {\Gamma_\infty} of the point at infinity:

\displaystyle  \Gamma_\infty := \{ \pm \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}: b \in {\bf Z} \} / \{\pm 1\}. \ \ \ \ \ (4)

These are discrete subgroups of {\hbox{PSL}_2({\bf R})}, nested by the subgroup inclusions

\displaystyle  \Gamma_\infty \leq \Gamma_0(q) \leq \Gamma_0(1)=\hbox{PSL}_2({\bf Z}) \leq \hbox{PSL}_2({\bf R}).

There are many further discrete subgroups of {\hbox{PSL}_2({\bf R})} (known collectively as Fuchsian groups) that one could consider, but we will focus attention on these three groups in this post.

Any discrete subgroup {\Gamma} of {\hbox{PSL}_2({\bf R})} generates a quotient space {\Gamma \backslash {\mathbf H}}, which in general will be a non-compact two-dimensional orbifold. One can understand such a quotient space by working with a fundamental domain {\hbox{Fund}( \Gamma \backslash {\mathbf H})} – a set consisting of a single representative of each of the orbits {\Gamma z} of {\Gamma} in {{\mathbf H}}. This fundamental domain is by no means uniquely defined, but if the fundamental domain is chosen with some reasonable amount of regularity, one can view {\Gamma \backslash {\mathbf H}} as the fundamental domain with the boundaries glued together in an appropriate sense. Among other things, fundamental domains can be used to induce a volume measure {\mu = \mu_{\Gamma \backslash {\mathbf H}}} on {\Gamma \backslash {\mathbf H}} from the volume measure {\mu = \mu_{\mathbf H}} on {{\mathbf H}} (restricted to a fundamental domain). By abuse of notation we will refer to both measures simply as {\mu} when there is no chance of confusion.

For instance, a fundamental domain for {\Gamma_\infty \backslash {\mathbf H}} is given (up to null sets) by the strip {\{ z \in {\mathbf H}: |\hbox{Re}(z)| < \frac{1}{2} \}}, with {\Gamma_\infty \backslash {\mathbf H}} identifiable with the cylinder formed by gluing together the two sides of the strip. A fundamental domain for {\hbox{PSL}_2({\bf Z}) \backslash {\mathbf H}} is famously given (again up to null sets) by an upper portion {\{ z \in {\mathbf H}: |\hbox{Re}(z)| < \frac{1}{2}; |z| > 1 \}}, with the left and right sides again glued to each other, and the left and right halves of the circular boundary glued to itself. A fundamental domain for {\Gamma_0(q) \backslash {\mathbf H}} can be formed by gluing together

\displaystyle  [\hbox{PSL}_2({\bf Z}) : \Gamma_0(q)] = q \prod_{p|q} (1 + \frac{1}{p}) = q^{1+o(1)}

copies of a fundamental domain for {\hbox{PSL}_2({\bf Z}) \backslash {\mathbf H}} in a rather complicated but interesting fashion.
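Two facts from this paragraph can be checked by a quick computation. The Python sketch below is illustrative, and the names are my own. First, it uses the standard reduction algorithm to move a point of {{\mathbf H}} into the closure of the fundamental domain {\{ |\hbox{Re}(z)| \leq \frac{1}{2}, |z| \geq 1 \}}. The algorithm alternates the translations {z \mapsto z+n} from {\Gamma_\infty} with the inversion {z \mapsto -1/z}, and it records the element of {\hbox{SL}_2({\bf Z})} that it uses. Second, it computes the index {q \prod_{p|q}(1+\frac{1}{p})} and compares it with the number of points of the projective line over {{\bf Z}/q{\bf Z}}. This comparison is a standard way to count the cosets of {\Gamma_0(q)}, since a coset {\Gamma_0(q)\gamma} is determined by the bottom row {(c:d)} of {\gamma} modulo {q}.

```python
import math


def mat_mul(g, h):
    """Product of 2x2 integer matrices stored as (a, b, c, d)."""
    a, b, c, d = g
    e, f, k, l = h
    return (a * e + b * k, a * f + b * l, c * e + d * k, c * f + d * l)


def act(g, z: complex) -> complex:
    a, b, c, d = g
    return (a * z + b) / (c * z + d)


def reduce_to_fundamental_domain(z: complex, max_steps: int = 10_000):
    """Return (w, g) with w = g z in {|Re w| <= 1/2, |w| >= 1} and g in SL_2(Z)."""
    g = (1, 0, 0, 1)
    for _ in range(max_steps):
        n = round(z.real)
        z = z - n  # translate by T^{-n}, an element of Gamma_infinity
        g = mat_mul((1, -n, 0, 1), g)
        if abs(z) >= 1:
            return z, g
        z = -1 / z  # S: z -> -1/z strictly raises Im z when |z| < 1
        g = mat_mul((0, -1, 1, 0), g)
    raise RuntimeError("reduction did not terminate")


def gamma0_index(q: int) -> int:
    """q * prod_{p | q} (1 + 1/p), computed exactly by trial division."""
    result, n, p = q, q, 2
    while p * p <= n:
        if n % p == 0:
            result = result // p * (p + 1)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        result = result // n * (n + 1)
    return result


def projective_line_size(q: int) -> int:
    """#P^1(Z/qZ): pairs (c, d) mod q with gcd(c, d, q) = 1, up to units."""
    pairs = sum(1 for c in range(q) for d in range(q)
                if math.gcd(math.gcd(c, d), q) == 1)
    units = sum(1 for u in range(q) if math.gcd(u, q) == 1)
    return pairs // units
```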

While fundamental domains can be a convenient choice of coordinates to work with for some computations (as well as for drawing appropriate pictures), it is geometrically more natural to avoid working explicitly on such domains, and instead work directly on the quotient spaces {\Gamma \backslash {\mathbf H}}. In order to analyse functions {f: \Gamma \backslash {\mathbf H} \rightarrow {\bf C}} on such orbifolds, it is convenient to lift such functions back up to {{\mathbf H}} and identify them with functions {f: {\mathbf H} \rightarrow {\bf C}} which are {\Gamma}-automorphic in the sense that {f( \gamma z ) = f(z)} for all {z \in {\mathbf H}} and {\gamma \in \Gamma}. Such functions will be referred to as {\Gamma}-automorphic forms, or automorphic forms for short (we always implicitly assume all such functions to be measurable). (Strictly speaking, these are the automorphic forms with trivial factor of automorphy; one can certainly consider other factors of automorphy, particularly when working with holomorphic modular forms, which corresponds to sections of a more non-trivial line bundle over {\Gamma \backslash {\mathbf H}} than the trivial bundle {(\Gamma \backslash {\mathbf H}) \times {\bf C}} that is implicitly present when analysing scalar functions {f: {\mathbf H} \rightarrow {\bf C}}. However, we will not discuss this (important) more general situation here.)

An important way to create a {\Gamma}-automorphic form is to start with a non-automorphic function {f: {\mathbf H} \rightarrow {\bf C}} obeying suitable decay conditions (e.g. bounded with compact support will suffice) and form the Poincaré series {P_\Gamma[f]: {\mathbf H} \rightarrow {\bf C}} defined by

\displaystyle  P_{\Gamma}[f](z) = \sum_{\gamma \in \Gamma} f(\gamma z),

which is clearly {\Gamma}-automorphic. (One could equivalently write {f(\gamma^{-1} z)} in place of {f(\gamma z)} here; there are good arguments for both conventions, but I have ultimately decided to use the {f(\gamma z)} convention, which makes explicit computations a little neater at the cost of making the group actions work in the opposite order.) Thus we naturally see sums over {\Gamma} associated with {\Gamma}-automorphic forms. A little more generally, given a subgroup {\Gamma_\infty} of {\Gamma} and a {\Gamma_\infty}-automorphic function {f: {\mathbf H} \rightarrow {\bf C}} of suitable decay, we can form a relative Poincaré series {P_{\Gamma_\infty \backslash \Gamma}[f]: {\mathbf H} \rightarrow {\bf C}} by

\displaystyle  P_{\Gamma_\infty \backslash \Gamma}[f](z) = \sum_{\gamma \in \hbox{Fund}(\Gamma_\infty \backslash \Gamma)} f(\gamma z)

where {\hbox{Fund}(\Gamma_\infty \backslash \Gamma)} is any fundamental domain for {\Gamma_\infty \backslash \Gamma}, that is to say a subset of {\Gamma} consisting of exactly one representative for each right coset of {\Gamma_\infty}. As {f} is {\Gamma_\infty}-automorphic, we see (if {f} has suitable decay) that {P_{\Gamma_\infty \backslash \Gamma}[f]} does not depend on the precise choice of fundamental domain, and is {\Gamma}-automorphic. These operations are all compatible with each other, for instance {P_\Gamma = P_{\Gamma_\infty \backslash \Gamma} \circ P_{\Gamma_\infty}}. A key example of Poincaré series are the Eisenstein series, although there are of course many other Poincaré series one can consider by varying the test function {f}.
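As an illustration of a relative Poincaré series, take {\Gamma = \hbox{PSL}_2({\bf Z})} and the {\Gamma_\infty}-automorphic function {f(z) = \hbox{Im}(z)^s}. This function does not have compact support. Even so, the resulting series {E(z,s) = \sum_{\gamma \in \hbox{Fund}(\Gamma_\infty \backslash \Gamma)} \hbox{Im}(\gamma z)^s}, the non-holomorphic Eisenstein series, converges absolutely for {\hbox{Re}(s) > 1}. The coset {\Gamma_\infty \gamma} is determined by the bottom row {\pm(c,d)} of {\gamma}, and {\hbox{Im}(\gamma z) = \hbox{Im}(z)/|cz+d|^2}, so one can approximate {E(z,s)} by a truncated lattice sum. The Python sketch below is an illustration only. It uses an arbitrary truncation parameter {N} and the real value {s=3}, and it checks invariance under {z \mapsto z+1} and {z \mapsto -1/z}, which generate {\hbox{PSL}_2({\bf Z})}. The truncated sum is exactly invariant under {z \mapsto -1/z}, since that map corresponds to {(c,d) \mapsto (d,-c)}, which preserves the box.

```python
import math


def eisenstein(z: complex, s: float = 3.0, N: int = 80) -> float:
    """Truncated E(z, s): sum of Im(gamma z)^s over cosets of Gamma_infinity in PSL_2(Z).

    Each coset is labelled by a coprime bottom row (c, d) up to sign, and
    Im(gamma z) = Im(z) / |c z + d|^2.  We sum over all coprime (c, d) with
    |c|, |d| <= N and halve, since (c, d) and (-c, -d) give the same coset.
    """
    y = z.imag
    total = 0.0
    for c in range(-N, N + 1):
        for d in range(-N, N + 1):
            if math.gcd(c, d) == 1:
                total += (y / abs(c * z + d) ** 2) ** s
    return total / 2
```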

For future reference we record the basic but fundamental unfolding identities

\displaystyle  \int_{\Gamma \backslash {\mathbf H}} P_\Gamma[f] g\ d\mu_{\Gamma \backslash {\mathbf H}} = \int_{\mathbf H} f g\ d\mu_{\mathbf H} \ \ \ \ \ (5)

for any function {f: {\mathbf H} \rightarrow {\bf C}} with sufficient decay, and any {\Gamma}-automorphic function {g} of reasonable growth (e.g. {f} bounded and compact support, and {g} bounded, will suffice). Note that {g} is viewed as a function on {\Gamma \backslash {\mathbf H}} on the left-hand side, and as a {\Gamma}-automorphic function on {{\mathbf H}} on the right-hand side. More generally, one has

\displaystyle  \int_{\Gamma \backslash {\mathbf H}} P_{\Gamma_\infty \backslash \Gamma}[f] g\ d\mu_{\Gamma \backslash {\mathbf H}} = \int_{\Gamma_\infty \backslash {\mathbf H}} f g\ d\mu_{\Gamma_\infty \backslash {\mathbf H}} \ \ \ \ \ (6)

whenever {\Gamma_\infty \leq \Gamma} are discrete subgroups of {\hbox{PSL}_2({\bf R})}, {f} is a {\Gamma_\infty}-automorphic function with sufficient decay on {\Gamma_\infty \backslash {\mathbf H}}, and {g} is a {\Gamma}-automorphic (and thus also {\Gamma_\infty}-automorphic) function of reasonable growth. These identities will allow us to move fairly freely between the three domains {{\mathbf H}}, {\Gamma_\infty \backslash {\mathbf H}}, and {\Gamma \backslash {\mathbf H}} in our analysis.

When computing various statistics of a Poincaré series {P_\Gamma[f]}, such as its values {P_\Gamma[f](z)} at special points {z}, or the {L^2} quantity {\int_{\Gamma \backslash {\mathbf H}} |P_\Gamma[f]|^2\ d\mu}, expressions of interest to analytic number theory naturally emerge. We list three basic examples of this below, discussed somewhat informally in order to highlight the main ideas rather than the technical details.

The first example we will give concerns the problem of estimating the sum

\displaystyle  \sum_{n \leq x} \tau(n) \tau(n+1), \ \ \ \ \ (7)

where {\tau(n) := \sum_{d|n} 1} is the divisor function. This can be rewritten (by factoring {n=bc} and {n+1=ad}) as

\displaystyle  \sum_{ a,b,c,d \in {\bf N}: ad-bc = 1} 1_{bc \leq x} \ \ \ \ \ (8)

which is basically a sum over the full modular group {\hbox{PSL}_2({\bf Z})}. At this point we will “cheat” a little by moving to the related, but different, sum

\displaystyle  \sum_{a,b,c,d \in {\bf Z}: ad-bc = 1} 1_{a^2+b^2+c^2+d^2 \leq x}. \ \ \ \ \ (9)

This sum is not exactly the same as (8), but will be a little easier to handle, and it is plausible that the methods used to handle this sum can be modified to handle (8). Observe from (2) and some calculation that the distance between {i} and {\begin{pmatrix} a & b \\ c & d \end{pmatrix} i = \frac{ai+b}{ci+d}} is given by the formula

\displaystyle  2(\cosh(d(i,\begin{pmatrix} a & b \\ c & d \end{pmatrix} i))-1) = a^2+b^2+c^2+d^2 - 2

and so one can express the above sum as

\displaystyle  2 \sum_{\gamma \in \hbox{PSL}_2({\bf Z})} 1_{d(i,\gamma i) \leq \hbox{cosh}^{-1}(x/2)}

(the factor of {2} coming from the quotient by {\{\pm 1\}} in the projective special linear group); one can express this as {P_\Gamma[f](i)}, where {\Gamma = \hbox{PSL}_2({\bf Z})} and {f} is the indicator function of the ball {B(i, \hbox{cosh}^{-1}(x/2))}. Thus we see that expressions such as (7) are related to evaluations of Poincaré series. (In practice, it is much better to use smoothed out versions of indicator functions in order to obtain good control on sums such as (7) or (9), but we gloss over this technical detail here.)
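The identity {2(\cosh(d(i,\gamma i))-1) = a^2+b^2+c^2+d^2-2} can be checked by brute force, and so can the resulting description of (9) as a count of group elements in a hyperbolic ball. The Python sketch below is illustrative only. It enumerates the integer matrices of determinant one in a box that contains every solution of {a^2+b^2+c^2+d^2 \leq x}. It then counts them twice, once by the algebraic condition in (9) and once by the distance condition {d(i,\gamma i) \leq \cosh^{-1}(x/2)} computed from (2). Each element of {\hbox{PSL}_2({\bf Z})} is counted twice, once as {\gamma} and once as {-\gamma}.

```python
import math
from itertools import product


def det_one_matrices(x: int):
    """All integer (a, b, c, d) with ad - bc = 1 and every entry at most sqrt(x) in size."""
    R = math.isqrt(x)
    return [g for g in product(range(-R, R + 1), repeat=4)
            if g[0] * g[3] - g[1] * g[2] == 1]


def moved_point(g) -> complex:
    """g i = (a i + b) / (c i + d)."""
    a, b, c, d = g
    return (a * 1j + b) / (c * 1j + d)


def count_by_entries(x: int) -> int:
    """The sum (9): #{ad - bc = 1, a^2 + b^2 + c^2 + d^2 <= x}."""
    return sum(1 for g in det_one_matrices(x) if sum(t * t for t in g) <= x)


def count_by_distance(x: int) -> int:
    """#{g : d(i, g i) <= arccosh(x / 2)}, with the distance taken from formula (2)."""
    radius = math.acosh(x / 2)
    count = 0
    for g in det_one_matrices(x):
        w = moved_point(g)
        dist = math.acosh(1 + abs(w - 1j) ** 2 / (2 * w.imag))
        if dist <= radius + 1e-9:
            count += 1
    return count
```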

The second example concerns the relative

\displaystyle  \sum_{n \leq x} \tau(n^2+1) \ \ \ \ \ (10)

of the sum (7). Note from multiplicativity that (7) can be written as {\sum_{n \leq x} \tau(n^2+n)}, which is superficially very similar to (10), but with the key difference that the polynomial {n^2+1} is irreducible over the integers.

As with (7), we may expand (10) as

\displaystyle  \sum_{A,B,C \in {\bf N}: B^2 - AC = -1} 1_{B \leq x}.

At first glance this does not look like a sum over a modular group, but one can manipulate this expression into such a form in one of two (closely related) ways. First, observe that any factorisation {B + i = (a-bi) (c+di)} of {B+i} into Gaussian integers {a-bi, c+di} gives rise (upon taking norms) to an identity of the form {B^2 - AC = -1}, where {A = a^2+b^2} and {C = c^2+d^2}. Conversely, by using the unique factorisation of the Gaussian integers, every identity of the form {B^2-AC=-1} gives rise to a factorisation of the form {B+i = (a-bi) (c+di)}, essentially uniquely up to units. Now note that {(a-bi)(c+di)} is of the form {B+i} if and only if {ad-bc=1}, in which case {B = ac+bd}. Thus we can essentially write the above sum as something like

\displaystyle  \sum_{a,b,c,d: ad-bc = 1} 1_{|ac+bd| \leq x} \ \ \ \ \ (11)

and now the role of the modular group {\hbox{PSL}_2({\bf Z})} is manifest. An equivalent way to see these manipulations is as follows. A triple {A,B,C} of natural numbers with {B^2-AC=-1} gives rise to a positive quadratic form {Ax^2+2Bxy+Cy^2} of normalised discriminant {B^2-AC} equal to {-1} with integer coefficients (it is natural here to allow {B} to take integer values rather than just natural number values by essentially doubling the sum). The group {\hbox{PSL}_2({\bf Z})} acts on the space of such quadratic forms in a natural fashion (by composing the quadratic form with the inverse {\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}} of an element {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} of {\hbox{SL}_2({\bf Z})}). Because the discriminant {-1} has class number one (this fact is equivalent to the unique factorisation of the Gaussian integers, as discussed in this previous post), every form {Ax^2 + 2Bxy + Cy^2} in this space is equivalent (under the action of some element of {\hbox{PSL}_2({\bf Z})}) with the standard quadratic form {x^2+y^2}. In other words, one has

\displaystyle  Ax^2 + 2Bxy + Cy^2 = (dx-by)^2 + (-cx+ay)^2

which (up to a harmless sign in {B}) is exactly the representation {B = ac+bd}, {A = c^2+d^2}, {C = a^2+b^2}, that is to say the representation introduced earlier with the roles of {A} and {C} interchanged, and leads to the same reformulation of the sum (10) in terms of expressions like (11). Similar considerations also apply if the quadratic polynomial {n^2+1} is replaced by another quadratic, although one has to account for the fact that the class number may now exceed one (so that unique factorisation in the associated quadratic ring of integers breaks down), and in the positive discriminant case the fact that the group of units might be infinite presents another significant technical problem.
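Expanding the identity {Ax^2 + 2Bxy + Cy^2 = (dx-by)^2 + (-cx+ay)^2} for {\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \hbox{SL}_2({\bf Z})} gives {A = c^2+d^2}, {B = -(ac+bd)} and {C = a^2+b^2}. Lagrange's identity then gives {B^2 - AC = -(ad-bc)^2 = -1}, so {B^2+1 = AC} exhibits a factorisation of {n^2+1} with {n = |B|}. Here is a short Python check of these relations (illustrative only; the helper names are my own):

```python
from itertools import product


def form_from_matrix(a: int, b: int, c: int, d: int):
    """(A, B, C) with A x^2 + 2 B x y + C y^2 = (d x - b y)^2 + (-c x + a y)^2."""
    return c * c + d * d, -(a * c + b * d), a * a + b * b


def check_matrix(a: int, b: int, c: int, d: int) -> bool:
    """Verify the quadratic form identity on a grid and the discriminant -1."""
    A, B, C = form_from_matrix(a, b, c, d)
    identity_ok = all(
        A * x * x + 2 * B * x * y + C * y * y
        == (d * x - b * y) ** 2 + (-c * x + a * y) ** 2
        for x, y in product(range(-3, 4), repeat=2)
    )
    return identity_ok and B * B - A * C == -1
```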

Note that {\begin{pmatrix} a & b \\ c & d \end{pmatrix} i = \frac{ai+b}{ci+d}} has real part {\frac{ac+bd}{c^2+d^2}} and imaginary part {\frac{1}{c^2+d^2}}. Thus (11) is (up to a factor of two) the Poincaré series {P_\Gamma[f](i)} as in the preceding example, except that {f} is now the indicator of the sector {\{ z: |\hbox{Re} z| \leq x |\hbox{Im} z| \}}.

Sums involving subgroups of the full modular group, such as {\Gamma_0(q)}, often arise when imposing congruence conditions on sums such as (10), for instance when trying to estimate the expression {\sum_{n \leq x: q|n} \tau(n^2+1)} when {q} and {x} are large. As before, one then soon arrives at the problem of evaluating a Poincaré series at one or more special points, where the series is now over {\Gamma_0(q)} rather than {\hbox{PSL}_2({\bf Z})}.

The third and final example concerns averages of Kloosterman sums

\displaystyle  S(m,n;c) := \sum_{x \in ({\bf Z}/c{\bf Z})^\times} e( \frac{mx + n\overline{x}}{c} ) \ \ \ \ \ (12)

where {e(\theta) := e^{2\pi i\theta}} and {\overline{x}} is the inverse of {x} in the multiplicative group {({\bf Z}/c{\bf Z})^\times}. It turns out that the {L^2} norms of Poincaré series {P_\Gamma[f]} or {P_{\Gamma_\infty \backslash \Gamma}[f]} are closely tied to such averages. Consider for instance the quantity

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} |P_{\Gamma_\infty \backslash \Gamma_0(q)}[f]|^2\ d\mu_{\Gamma_0(q) \backslash {\mathbf H}} \ \ \ \ \ (13)

where {q} is a natural number and {f} is a {\Gamma_\infty}-automorphic form that is of the form

\displaystyle  f(x+iy) = F(my) e(m x)

for some natural number {m} and some test function {F: (0,+\infty) \rightarrow {\bf C}}, which for sake of discussion we will take to be smooth and compactly supported. Using the unfolding formula (6), we may rewrite (13) as

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} \overline{f} P_{\Gamma_\infty \backslash \Gamma_0(q)}[f]\ d\mu_{\Gamma_\infty \backslash {\mathbf H}}.

To compute this, we use the double coset decomposition

\displaystyle  \Gamma_0(q) = \Gamma_\infty \cup \bigcup_{c \in {\mathbf N}: q|c} \bigcup_{1 \leq d \leq c: (d,c)=1} \Gamma_\infty \begin{pmatrix} a & b \\ c & d \end{pmatrix} \Gamma_\infty,

where for each {c,d}, {a,b} are arbitrarily chosen integers such that {ad-bc=1}. To see this decomposition, observe that every element {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} in {\Gamma_0(q)} outside of {\Gamma_\infty} can be assumed to have {c>0} by applying a sign {\pm}, and then using the row and column operations coming from left and right multiplication by {\Gamma_\infty} (that is, shifting the top row by an integer multiple of the bottom row, and shifting the right column by an integer multiple of the left column) one can place {d} in the interval {[1,c]} and {(a,b)} to be any specified integer pair with {ad-bc=1}. From this we see that

\displaystyle  P_{\Gamma_\infty \backslash \Gamma_0(q)}[f] = f + \sum_{c \in {\mathbf N}: q|c} \sum_{1 \leq d \leq c: (d,c)=1} P_{\Gamma_\infty}[ f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot ) ]

and so from further use of the unfolding formula (5) we may expand (13) as

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} |f|^2\ d\mu_{\Gamma_\infty \backslash {\mathbf H}}

\displaystyle  + \sum_{c \in {\mathbf N}: q|c} \sum_{1 \leq d \leq c: (d,c)=1} \int_{\mathbf H} \overline{f}(z) f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} z)\ d\mu_{\mathbf H}.

The first integral is just {m \int_0^\infty |F(y)|^2 \frac{dy}{y^2}}. The second expression is more interesting. We have

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} z = \frac{az+b}{cz+d} = \frac{a}{c} - \frac{1}{c(cz+d)}

\displaystyle  = \frac{a}{c} - \frac{cx+d}{c((cx+d)^2+c^2y^2)} + \frac{iy}{(cx+d)^2 + c^2y^2}

so we can write

\displaystyle  \int_{\mathbf H} \overline{f}(z) f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} z)\ d\mu_{\mathbf H}

as

\displaystyle  \int_0^\infty \int_{\bf R} \overline{F}(my) F(\frac{my}{(cx+d)^2 + c^2y^2}) e( -mx + \frac{ma}{c} - m \frac{cx+d}{c((cx+d)^2+c^2y^2)} )

\displaystyle \frac{dx dy}{y^2}

which on shifting {x} by {d/c} simplifies a little to

\displaystyle  e( \frac{ma}{c} + \frac{md}{c} ) \int_0^\infty \int_{\bf R} \overline{F}(my) F(\frac{my}{c^2(x^2 + y^2)}) e(- mx - m \frac{x}{c^2(x^2+y^2)} )

\displaystyle  \frac{dx dy}{y^2}

and then on scaling {x,y} by {m} simplifies a little further to

\displaystyle  e( \frac{ma}{c} + \frac{md}{c} ) \int_0^\infty \int_{\bf R} \overline{F}(y) F(\frac{m^2}{c^2} \frac{y}{x^2 + y^2}) e(- x - \frac{m^2}{c^2} \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}.

Note that as {ad-bc=1}, we have {a = \overline{d}} modulo {c}. Comparing the above calculations with (12), we can thus write (13) as

\displaystyle  m (\int_0^\infty |F(y)|^2 \frac{dy}{y^2} + \sum_{q|c} \frac{S(m,m;c)}{c} V(\frac{m}{c})) \ \ \ \ \ (14)

where

\displaystyle  V(u) := \frac{1}{u} \int_0^\infty \int_{\bf R} \overline{F}(y) F(u^2 \frac{y}{x^2 + y^2}) e(- x - u^2 \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}

is a certain integral involving {F} and a parameter {u}, but which does not depend explicitly on parameters such as {m,c,d}. Thus we have indeed expressed the {L^2} expression (13) in terms of Kloosterman sums. It is possible to invert this analysis and express various weighted sums of Kloosterman sums in terms of {L^2} expressions (possibly involving inner products instead of norms) of Poincaré series, but we will not do so here; see Chapter 16 of Iwaniec and Kowalski for further details.
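The double coset bookkeeping and the congruence {a = \overline{d} \hbox{ mod } c} can be checked numerically. The Python sketch below is illustrative. The helper names are my own, and it assumes {c \geq 2}. For each {d} it produces an integer pair {(a,b)} with {ad-bc=1}, and it can shift that pair by a multiple of {(c,d)} to show that the choice does not matter. This freedom is exactly left multiplication by {\Gamma_\infty}. The sketch then verifies that {\sum_{d} e(\frac{m(a+d)}{c})} equals the Kloosterman sum {S(m,m;c)} from (12).

```python
import cmath
import math


def e(theta: float) -> complex:
    """e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * math.pi * theta)


def top_row(c: int, d: int, shift: int = 0):
    """Integers (a, b) with a d - b c = 1, for c >= 2 and gcd(c, d) = 1.

    Replacing (a, b) by (a + k c, b + k d) preserves a d - b c; this is the
    effect of left multiplication by an element of Gamma_infinity.
    """
    a = pow(d, -1, c)  # a = d^{-1} mod c (Python 3.8+)
    b = (a * d - 1) // c  # exact division, since a d = 1 mod c
    return a + shift * c, b + shift * d


def kloosterman(m: int, n: int, c: int) -> complex:
    """S(m, n; c) from (12), for c >= 2."""
    return sum(
        e(((m * x + n * pow(x, -1, c)) % c) / c)
        for x in range(1, c)
        if math.gcd(x, c) == 1
    )


def double_coset_phase_sum(m: int, c: int, shift: int = 0) -> complex:
    """Sum over 1 <= d <= c with (d, c) = 1 of e(m (a + d) / c)."""
    total = 0j
    for d in range(1, c + 1):
        if math.gcd(d, c) == 1:
            a, b = top_row(c, d, shift)
            assert a * d - b * c == 1
            total += e(((m * (a + d)) % c) / c)
    return total
```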

Traditionally, automorphic forms have been analysed using the spectral theory of the Laplace-Beltrami operator {-\Delta} on spaces such as {\Gamma\backslash {\mathbf H}} or {\Gamma_\infty \backslash {\mathbf H}}, so that a Poincaré series such as {P_\Gamma[f]} might be expanded out using inner products of {P_\Gamma[f]} (or, by the unfolding identities, {f}) with various generalised eigenfunctions of {-\Delta} (such as cuspidal eigenforms, or Eisenstein series). With this approach, special functions, and specifically the modified Bessel functions {K_{it}} of the second kind, play a prominent role, basically because the {\Gamma_\infty}-automorphic functions

\displaystyle  x+iy \mapsto y^{1/2} K_{it}(2\pi |m| y) e(mx)

for {t \in {\bf R}} and {m \in {\bf Z}} non-zero are generalised eigenfunctions of {-\Delta} (with eigenvalue {\frac{1}{4}+t^2}), and are almost square-integrable on {\Gamma_\infty \backslash {\mathbf H}} (the {L^2} norm diverges only logarithmically at one end {y \rightarrow 0^+} of the cylinder {\Gamma_\infty \backslash {\mathbf H}}, while decaying exponentially fast at the other end {y \rightarrow +\infty}).

However, as discussed in this previous post, the spectral theory of an essentially self-adjoint operator such as {-\Delta} is basically equivalent to the theory of various solution operators associated to partial differential equations involving that operator, such as the Helmholtz equation {(-\Delta + k^2) u = f}, the heat equation {\partial_t u = \Delta u}, the Schrödinger equation {i\partial_t u + \Delta u = 0}, or the wave equation {\partial_{tt} u = \Delta u}. Thus, one can hope to rephrase many arguments that involve spectral data of {-\Delta} into arguments that instead involve resolvents {(-\Delta + k^2)^{-1}}, heat kernels {e^{t\Delta}}, Schrödinger propagators {e^{it\Delta}}, or wave propagators {e^{\pm it\sqrt{-\Delta}}}, or involve the PDE more directly (e.g. applying integration by parts and energy methods to solutions of such PDE). This is certainly done to some extent in the existing literature; resolvents and heat kernels, for instance, are often utilised. In this post, I would like to explore the possibility of reformulating spectral arguments instead using the inhomogeneous wave equation

\displaystyle  \partial_{tt} u - \Delta u = F.

Actually it will be a bit more convenient to shift the Laplacian by {\frac{1}{4}}, and look instead at the automorphic wave equation

\displaystyle  \partial_{tt} u + (-\Delta - \frac{1}{4}) u = F. \ \ \ \ \ (15)

This equation somewhat resembles a “Klein-Gordon” type equation, except that the mass is imaginary! This would lead to pathological behaviour were it not for the negative curvature, which in principle creates a spectral gap of {\frac{1}{4}} that cancels out this factor.

The point is that the wave equation approach gives access to some nice PDE techniques, such as energy methods, Sobolev inequalities and finite speed of propagation, which are somewhat submerged in the spectral framework. The wave equation also interacts well with Poincaré series; if for instance {u} and {F} are {\Gamma_\infty}-automorphic solutions to (15) obeying suitable decay conditions, then their Poincaré series {P_{\Gamma_\infty \backslash \Gamma}[u]} and {P_{\Gamma_\infty \backslash \Gamma}[F]} will be {\Gamma}-automorphic solutions to the same equation (15), basically because the Laplace-Beltrami operator commutes with the action of {\hbox{PSL}_2({\bf R})}. Because of these facts, it is possible to replicate several standard spectral theory arguments in the wave equation framework, without having to deal directly with things like the asymptotics of modified Bessel functions. The wave equation approach to automorphic theory was introduced by Faddeev and Pavlov (using the Lax-Phillips scattering theory), and developed further by Lax and Phillips, to recover many spectral facts about the Laplacian on modular curves, such as the Weyl law and the Selberg trace formula. Here, I will illustrate this by deriving three basic applications of automorphic methods in a wave equation framework, namely

  • Using the Weil bound on Kloosterman sums to derive Selberg’s 3/16 theorem on the least non-trivial eigenvalue for {-\Delta} on {\Gamma_0(q) \backslash {\mathbf H}} (discussed previously here);
  • Conversely, showing that Selberg’s eigenvalue conjecture (improving Selberg’s {3/16} bound to the optimal {1/4}) implies an optimal bound on (smoothed) sums of Kloosterman sums; and
  • Using the same bound to obtain pointwise bounds on Poincaré series similar to the ones discussed above. (Actually, the argument here does not use the wave equation, instead it just uses the Sobolev inequality.)

This post originated from an attempt to finally learn this part of analytic number theory properly, and to see if I could use a PDE-based perspective to understand it better. Ultimately, this is not that dramatic a departure from the standard approach to this subject, but I found it useful to think of things in this fashion, probably due to my existing background in PDE.

I thank Bill Duke and Ben Green for helpful discussions. My primary reference for this theory was Chapters 15, 16, and 21 of Iwaniec and Kowalski.


