
In Notes 0, we introduced the notion of a measure space {\Omega = (\Omega, {\mathcal F}, \mu)}, which includes as a special case the notion of a probability space. By selecting one such probability space {(\Omega,{\mathcal F},\mu)} as a sample space, one obtains a model for random events and random variables, with random events {E} being modeled by measurable sets {E_\Omega} in {{\mathcal F}}, and random variables {X} taking values in a measurable space {R} being modeled by measurable functions {X_\Omega: \Omega \rightarrow R}. We then defined some basic operations on these random events and variables:

  • Given events {E,F}, we defined the conjunction {E \wedge F}, the disjunction {E \vee F}, and the complement {\overline{E}}. For countable families {E_1,E_2,\dots} of events, we similarly defined {\bigwedge_{n=1}^\infty E_n} and {\bigvee_{n=1}^\infty E_n}. We also defined the empty event {\emptyset} and the sure event {\overline{\emptyset}}, and what it meant for two events to be equal.
  • Given random variables {X_1,\dots,X_n} in ranges {R_1,\dots,R_n} respectively, and a measurable function {F: R_1 \times \dots \times R_n \rightarrow S}, we defined the random variable {F(X_1,\dots,X_n)} in range {S}. (As the special case {n=0} of this, every deterministic element {s} of {S} was also a random variable taking values in {S}.) Given a relation {P: R_1 \times \dots \times R_n \rightarrow \{\hbox{true}, \hbox{false}\}}, we similarly defined the event {P(X_1,\dots,X_n)}. Conversely, given an event {E}, we defined the indicator random variable {1_E}. Finally, we defined what it meant for two random variables to be equal.
  • Given an event {E}, we defined its probability {{\bf P}(E)}.

These operations obey various axioms; for instance, the boolean operations on events obey the axioms of a Boolean algebra, and the probability function {E \mapsto {\bf P}(E)} obeys the Kolmogorov axioms. However, we will not focus on the axiomatic approach to probability theory here, instead basing the foundations of probability theory on the sample space models as discussed in Notes 0. (But see this previous post for a treatment of one such axiomatic approach.)

It turns out that almost all of the other operations on random events and variables we need can be constructed in terms of the above basic operations. In particular, this allows one to safely extend the sample space in probability theory whenever needed, provided one uses an extension that respects the above basic operations. We gave a simple example of such an extension in the previous notes, but now we give a more formal definition:

Definition 1 Suppose that we are using a probability space {\Omega = (\Omega, {\mathcal F}, \mu)} as the model for a collection of events and random variables. An extension of this probability space is a probability space {\Omega' = (\Omega', {\mathcal F}', \mu')}, together with a measurable map {\pi: \Omega' \rightarrow \Omega} (sometimes called the factor map) which is probability-preserving in the sense that

\displaystyle  \mu'( \pi^{-1}(E) ) = \mu(E) \ \ \ \ \ (1)

for all {E \in {\mathcal F}}. (Caution: this does not imply that {\mu(\pi(F)) = \mu'(F)} for all {F \in {\mathcal F}'} – why not?)

An event {E} which is modeled by a measurable subset {E_\Omega} in the sample space {\Omega}, will be modeled by the measurable set {E_{\Omega'} := \pi^{-1}(E_\Omega)} in the extended sample space {\Omega'}. Similarly, a random variable {X} taking values in some range {R} that is modeled by a measurable function {X_\Omega: \Omega \rightarrow R} in {\Omega}, will be modeled instead by the measurable function {X_{\Omega'} := X_\Omega \circ \pi} in {\Omega'}. We also allow the extension {\Omega'} to model additional events and random variables that were not modeled by the original sample space {\Omega} (indeed, this is one of the main reasons why we perform extensions in probability in the first place).

Thus, for instance, the sample space {\Omega'} in Example 3 of the previous post is an extension of the sample space {\Omega} in that example, with the factor map {\pi: \Omega' \rightarrow \Omega} given by the first coordinate projection {\pi(i,j) := i}. One can verify that all of the basic operations on events and random variables listed above are unaffected by the above extension (with one caveat, see remark below). For instance, the conjunction {E \wedge F} of two events can be defined via the original model {\Omega} by the formula

\displaystyle  (E \wedge F)_\Omega := E_\Omega \cap F_\Omega

or via the extension {\Omega'} via the formula

\displaystyle  (E \wedge F)_{\Omega'} := E_{\Omega'} \cap F_{\Omega'}.

The two definitions are consistent with each other, thanks to the obvious set-theoretic identity

\displaystyle  \pi^{-1}( E_\Omega \cap F_\Omega ) = \pi^{-1}(E_\Omega) \cap \pi^{-1}(F_\Omega).

Similarly, the assumption (1) is precisely what is needed to ensure that the probability {\mathop{\bf P}(E)} of an event remains unchanged when one replaces a sample space model with an extension. We leave the verification of preservation of the other basic operations described above under extension as exercises to the reader.

Remark 2 There is one minor exception to this general rule if we do not impose the additional requirement that the factor map {\pi} is surjective. Namely, for non-surjective {\pi}, it can become possible that two events {E, F} are unequal in the original sample space model, but become equal in the extension (and similarly for random variables), although the converse never happens (events that are equal in the original sample space always remain equal in the extension). For instance, let {\Omega} be the discrete probability space {\{a,b\}} with {p_a=1} and {p_b=0}, and let {\Omega'} be the discrete probability space {\{ a'\}} with {p'_{a'}=1}, and non-surjective factor map {\pi: \Omega' \rightarrow \Omega} defined by {\pi(a') := a}. Then the event modeled by {\{b\}} in {\Omega} is distinct from the empty event when viewed in {\Omega}, but becomes equal to that event when viewed in {\Omega'}. Thus we see that extending the sample space by a non-surjective factor map can identify previously distinct events together (though of course, being probability preserving, this can only happen if those two events were already almost surely equal anyway). This turns out to be fairly harmless though; while it is nice to know if two given events are equal, or if they differ by a non-null event, it is almost never useful to know that two events are unequal if they are already almost surely equal. Alternatively, one can add the additional requirement of surjectivity in the definition of an extension, which is also a fairly harmless constraint to impose (this is what I chose to do in this previous set of notes).
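The example in this remark is small enough to check by hand, but here is a minimal Python sketch of it anyway. The dictionaries `omega`, `omega_ext`, `pi` and the helpers `pullback`, `prob` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

# Omega = {a, b} with p_a = 1, p_b = 0; Omega' = {a'} with p'_{a'} = 1.
omega = {"a": Fraction(1), "b": Fraction(0)}
omega_ext = {"a'": Fraction(1)}
pi = {"a'": "a"}  # the non-surjective factor map pi(a') := a

def pullback(event):
    """Model of an event in Omega': the preimage of its model in Omega under pi."""
    return frozenset(w for w in omega_ext if pi[w] in event)

def prob(space, event):
    """Probability of an event modeled by a subset of the given sample space."""
    return sum((space[w] for w in event), Fraction(0))

event_b = frozenset({"b"})
empty = frozenset()

# Distinct as subsets of Omega, but identified after pulling back to Omega'.
distinct_in_omega = event_b != empty
equal_in_extension = pullback(event_b) == pullback(empty)
# The two events were already almost surely equal: {b} is a null event.
null_difference = prob(omega, event_b) == 0
```

All three flags come out true, matching the discussion: the identification only affects events that already differed by a null event.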

Roughly speaking, one can define probability theory as the study of those properties of random events and random variables that are model-independent in the sense that they are preserved by extensions. For instance, the cardinality {|E_\Omega|} of the model {E_\Omega} of an event {E} is not a concept within the scope of probability theory, as it is not preserved by extensions: continuing Example 3 from Notes 0, the event {E} that a die roll {X} is even is modeled by a set {E_\Omega = \{2,4,6\}} of cardinality {3} in the original sample space model {\Omega}, but by a set {E_{\Omega'} = \{2,4,6\} \times \{1,2,3,4,5,6\}} of cardinality {18} in the extension. Thus it does not make sense in the context of probability theory to refer to the “cardinality of an event {E}”.
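To make the die example concrete, here is a small Python sketch of an extension by the first coordinate projection. It assumes (this is not stated above) that both sample spaces carry the uniform measure, and the names `omega`, `omega_ext`, `pi`, `prob`, `pullback` are hypothetical. It checks the probability-preserving property (1), the consistency of conjunction, and the failure of cardinality to be preserved.

```python
from fractions import Fraction
from itertools import product

# Original model: one fair die, Omega = {1,...,6} (uniform measure assumed).
omega = {i: Fraction(1, 6) for i in range(1, 7)}
# Extension: Omega' = {1,...,6}^2 with the uniform measure (also assumed).
omega_ext = {(i, j): Fraction(1, 36) for i, j in product(range(1, 7), repeat=2)}

def pi(point):
    """Factor map: the first coordinate projection pi(i, j) := i."""
    return point[0]

def prob(space, event):
    """Probability of an event modeled by a subset of the given sample space."""
    return sum(space[w] for w in event)

def pullback(event):
    """Model of an event in the extension: the preimage under pi."""
    return {w for w in omega_ext if pi(w) in event}

even = {2, 4, 6}   # the die roll is even
big = {4, 5, 6}    # the die roll is at least 4

# Property (1): probabilities are unchanged by the extension.
assert prob(omega_ext, pullback(even)) == prob(omega, even)
# Conjunction: the preimage of an intersection is the intersection of preimages.
assert pullback(even & big) == pullback(even) & pullback(big)
# Cardinality is model-dependent.
print(len(even), len(pullback(even)))  # 3 18
```

Exact rational arithmetic is used so that the equalities are tested exactly rather than up to rounding.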

On the other hand, the supremum {\sup_n X_n} of a collection of random variables {X_n} in the extended real line {[-\infty,+\infty]} is a valid probabilistic concept. This can be seen by manually verifying that this operation is preserved under extension of the sample space, but one can also see this by defining the supremum in terms of existing basic operations. Indeed, note from Exercise 24 of Notes 0 that a random variable {X} in the extended real line is completely specified by the threshold events {(X \leq t)} for {t \in {\bf R}}; in particular, two such random variables {X,Y} are equal if and only if the events {(X \leq t)} and {(Y \leq t)} are surely equal for all {t}. From the identity

\displaystyle  (\sup_n X_n \leq t) = \bigwedge_{n=1}^\infty (X_n \leq t)

we thus see that one can completely specify {\sup_n X_n} in terms of {X_n} using only the basic operations provided in the above list (and in particular using the countable conjunction {\bigwedge_{n=1}^\infty}). Of course, the same considerations hold if one replaces the supremum by the infimum, limit superior, limit inferior, or (if it exists) the limit.
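The threshold identity can be tested on a toy model. The sketch below is only an illustration under simplifying assumptions: a finite sample space, a finite family of five integer-valued random variables in place of a countable family, and hypothetical names throughout.

```python
import random

random.seed(0)
omega = range(20)  # a finite sample space, chosen arbitrarily for illustration
# X[n][w] is the value of the random variable X_n at the sample point w.
X = [[random.randint(0, 9) for _ in omega] for _ in range(5)]

def threshold_event(Y, t):
    """Model of the event (Y <= t) as a subset of omega."""
    return {w for w in omega if Y[w] <= t}

# Pointwise supremum (here a maximum, since the family is finite).
sup_X = [max(Xn[w] for Xn in X) for w in omega]

def conjunction_of_thresholds(t):
    """Model of the conjunction of the events (X_n <= t) over all n."""
    return set.intersection(*(threshold_event(Xn, t) for Xn in X))

# Compare (sup_n X_n <= t) with the conjunction for every relevant threshold.
all_match = all(
    threshold_event(sup_X, t) == conjunction_of_thresholds(t)
    for t in range(-1, 11)
)
```

Since the variables here are integer-valued, testing integer thresholds is enough to pin down the threshold events for all real {t}.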

In this set of notes, we will define some further important operations on scalar random variables, in particular the expectation of these variables. In the sample space models, expectation corresponds to the notion of integration on a measure space. As we will need to use both expectation and integration in this course, we will thus begin by quickly reviewing the basics of integration on a measure space, although we will then translate the key results of this theory into probabilistic language.

As the finer details of the Lebesgue integral construction are not the core focus of this probability course, some of the details of this construction will be left to exercises. See also Chapter 1 of Durrett, or these previous blog notes, for a more detailed treatment.


Starting this week, I will be teaching an introductory graduate course (Math 275A) on probability theory here at UCLA. While I find myself using probabilistic methods routinely nowadays in my research (for instance, the probabilistic concept of Shannon entropy played a crucial role in my recent paper on the Chowla and Elliott conjectures, and random multiplicative functions similarly played a central role in the paper on the Erdos discrepancy problem), this will actually be the first time I will be teaching a course on probability itself (although I did give a course on random matrix theory some years ago that presumed familiarity with graduate-level probability theory). As such, I will be relying primarily on an existing textbook, in this case Durrett’s Probability: Theory and Examples. I still need to prepare lecture notes, though, and so I thought I would continue my practice of putting my notes online, although in this particular case they will be less detailed or complete than with other courses, as they will mostly be focusing on those topics that are not already comprehensively covered in the text of Durrett. Below the fold are my first such set of notes, concerning the classical measure-theoretic foundations of probability. (I wrote on these foundations also in this previous blog post, but in that post I already assumed that the reader was familiar with measure theory and basic probability, whereas in this course not every student will have a strong background in these areas.)

Note: as this set of notes is primarily concerned with foundational issues, it will contain a large number of pedantic (and nearly trivial) formalities and philosophical points. We dwell on these technicalities in this set of notes primarily so that they are out of the way in later notes, when we work with the actual mathematics of probability, rather than on the supporting foundations of that mathematics. In particular, the excessively formal and philosophical language in this set of notes will not be replicated in later notes.


Let {X} and {Y} be two random variables taking values in the same (discrete) range {R}, and let {E} be some subset of {R}, which we think of as the set of “bad” outcomes for either {X} or {Y}. If {X} and {Y} have the same probability distribution, then clearly

\displaystyle  {\bf P}( X \in E ) = {\bf P}( Y \in E ).

In particular, if it is rare for {Y} to lie in {E}, then it is also rare for {X} to lie in {E}.

If {X} and {Y} do not have exactly the same probability distribution, but their probability distributions are close to each other in some sense, then we can expect to have an approximate version of the above statement. For instance, from the definition of the total variation distance {\delta(X,Y)} between two random variables (or more precisely, the total variation distance between the probability distributions of two random variables), we see that

\displaystyle  {\bf P}(Y \in E) - \delta(X,Y) \leq {\bf P}(X \in E) \leq {\bf P}(Y \in E) + \delta(X,Y) \ \ \ \ \ (1)

for any {E \subset R}. In particular, if it is rare for {Y} to lie in {E}, and {X,Y} are close in total variation, then it is also rare for {X} to lie in {E}.

A basic inequality in information theory is Pinsker’s inequality

\displaystyle  \delta(X,Y) \leq \sqrt{\frac{1}{2} D_{KL}(X||Y)}

where the Kullback-Leibler divergence {D_{KL}(X||Y)} is defined by the formula

\displaystyle  D_{KL}(X||Y) = \sum_{x \in R} {\bf P}( X=x ) \log \frac{{\bf P}(X=x)}{{\bf P}(Y=x)}.

(See this previous blog post for a proof of this inequality.) A standard application of Jensen’s inequality reveals that {D_{KL}(X||Y)} is non-negative (Gibbs’ inequality), and vanishes if and only if {X}, {Y} have the same distribution; thus one can think of {D_{KL}(X||Y)} as a measure of how close the distributions of {X} and {Y} are to each other, although one should caution that this is not a symmetric notion of distance, as {D_{KL}(X||Y) \neq D_{KL}(Y||X)} in general. Inserting Pinsker’s inequality into (1), we see for instance that

\displaystyle  {\bf P}(X \in E) \leq {\bf P}(Y \in E) + \sqrt{\frac{1}{2} D_{KL}(X||Y)}.

Thus, if {X} is close to {Y} in the Kullback-Leibler sense, and it is rare for {Y} to lie in {E}, then it is rare for {X} to lie in {E} as well.
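As a numerical sanity check (not a proof), the following Python sketch tests inequality (1) and Pinsker's inequality on randomly generated distributions. It assumes the normalisation in which the total variation distance is half the {\ell^1} distance between the probability vectors, and uses natural logarithms; the function names are hypothetical.

```python
import math
import random

def total_variation(p, q):
    """Total variation distance, normalised as half the l^1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p||q) in nats; q must have full support."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def random_distribution(n, rng):
    """A random probability vector of length n with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(1)
checks = []
for _ in range(1000):
    p = random_distribution(6, rng)   # law of X
    q = random_distribution(6, rng)   # law of Y
    E = [i for i in range(6) if rng.random() < 0.5]
    pE = sum(p[i] for i in E)
    qE = sum(q[i] for i in E)
    tv = total_variation(p, q)
    kl = kl_divergence(p, q)
    # Inequality (1), then Pinsker, each with a small floating-point tolerance.
    checks.append(abs(pE - qE) <= tv + 1e-12 and tv <= math.sqrt(kl / 2) + 1e-12)

all_hold = all(checks)
```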

We can specialise this inequality to the case when {Y} is a uniform random variable {U} on a finite range {R} of some cardinality {N}, in which case the Kullback-Leibler divergence {D_{KL}(X||U)} simplifies to

\displaystyle  D_{KL}(X||U) = \log N - {\bf H}(X)

where

\displaystyle  {\bf H}(X) := \sum_{x \in R} {\bf P}(X=x) \log \frac{1}{{\bf P}(X=x)}

is the Shannon entropy of {X}. Again, a routine application of Jensen’s inequality shows that {{\bf H}(X) \leq \log N}, with equality if and only if {X} is uniformly distributed on {R}. The above inequality then becomes

\displaystyle  {\bf P}(X \in E) \leq {\bf P}(U \in E) + \sqrt{\frac{1}{2}(\log N - {\bf H}(X))}. \ \ \ \ \ (2)

Thus, if {E} is a small fraction of {R} (so that it is rare for {U} to lie in {E}), and the entropy of {X} is very close to the maximum possible value of {\log N}, then it is rare for {X} to lie in {E} also.
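Here is a short Python sketch, on one hand-picked distribution, of the identity {D_{KL}(X||U) = \log N - {\bf H}(X)} and of the resulting bound (2). The specific probability vector `p` and the set `E` are arbitrary choices made for illustration.

```python
import math

def entropy(p):
    """Shannon entropy in nats of a probability vector."""
    return sum(a * math.log(1 / a) for a in p if a > 0)

# An arbitrary law for X on a range of cardinality N = 8.
p = [0.3, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.05]
N = len(p)

# D_KL(X||U) computed directly from the definition, with P(U = x) = 1/N.
kl_to_uniform = sum(a * math.log(a / (1 / N)) for a in p if a > 0)
# The same quantity via the entropy formula.
deficit = math.log(N) - entropy(p)

# Inequality (2) for the set E consisting of the first two outcomes.
E = [0, 1]
lhs = sum(p[i] for i in E)                      # P(X in E)
rhs = len(E) / N + math.sqrt(0.5 * deficit)     # P(U in E) + sqrt(deficit / 2)
```

In this instance {E} was chosen to contain the two most likely outcomes of {X}, which is the least favourable choice of a two-element set for the bound.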

The inequality (2) is only useful when the entropy {{\bf H}(X)} is close to {\log N} in the sense that {{\bf H}(X) = \log N - O(1)}; otherwise the bound is worse than the trivial bound of {{\bf P}(X \in E) \leq 1}. In my recent paper on the Chowla and Elliott conjectures, I ended up using a variant of (2) which was still non-trivial when the entropy {{\bf H}(X)} was allowed to be smaller than {\log N - O(1)}. More precisely, I used the following simple inequality, which is implicit in the arguments of that paper but which I would like to make more explicit in this post:

Lemma 1 (Pinsker-type inequality) Let {X} be a random variable taking values in a finite range {R} of cardinality {N}, let {U} be a uniformly distributed random variable in {R}, and let {E} be a subset of {R}. Then

\displaystyle  {\bf P}(X \in E) \leq \frac{(\log N - {\bf H}(X)) + \log 2}{\log 1/{\bf P}(U \in E)}.

Proof: Consider the conditional entropy {{\bf H}(X | 1_{X \in E} )}. On the one hand, we have

\displaystyle  {\bf H}(X | 1_{X \in E} ) = {\bf H}(X, 1_{X \in E}) - {\bf H}(1_{X \in E} )

\displaystyle  = {\bf H}(X) - {\bf H}(1_{X \in E})

\displaystyle  \geq {\bf H}(X) - \log 2

by Jensen’s inequality. On the other hand, one has

\displaystyle  {\bf H}(X | 1_{X \in E} ) = {\bf P}(X \in E) {\bf H}(X | X \in E )

\displaystyle  + (1-{\bf P}(X \in E)) {\bf H}(X | X \not \in E)

\displaystyle  \leq {\bf P}(X \in E) \log |E| + (1-{\bf P}(X \in E)) \log N

\displaystyle  = \log N - {\bf P}(X \in E) \log \frac{N}{|E|}

\displaystyle  = \log N - {\bf P}(X \in E) \log \frac{1}{{\bf P}(U \in E)},

where we have again used Jensen’s inequality. Putting the two inequalities together, we obtain the claim. \Box
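The lemma can also be spot-checked numerically. The Python sketch below samples random laws for {X} on a range of cardinality {N = 16} and random nonempty proper subsets {E} (so that {0 < {\bf P}(U \in E) < 1} and the denominator is positive), and records the smallest gap between the two sides. The helper names are hypothetical, and this is a sanity check rather than a substitute for the proof.

```python
import math
import random

def entropy(p):
    """Shannon entropy in nats of a probability vector."""
    return sum(a * math.log(1 / a) for a in p if a > 0)

def lemma_bound(p, E):
    """Right-hand side of Lemma 1, using P(U in E) = |E| / N."""
    N = len(p)
    return (math.log(N) - entropy(p) + math.log(2)) / math.log(N / len(E))

rng = random.Random(2)
N = 16
worst_slack = math.inf
for _ in range(2000):
    # Skewed random weights, so that low-entropy laws get sampled as well.
    w = [rng.random() ** 4 for _ in range(N)]
    total = sum(w)
    p = [x / total for x in w]
    size = rng.randint(1, N - 1)        # nonempty proper subset of the range
    E = rng.sample(range(N), size)
    lhs = sum(p[i] for i in E)          # P(X in E)
    worst_slack = min(worst_slack, lemma_bound(p, E) - lhs)
```

A non-negative `worst_slack` (up to rounding) is what the lemma predicts.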

Remark 2 As noted in comments, this inequality can be viewed as a special case of the more general inequality

\displaystyle  {\bf P}(X \in E) \leq \frac{D_{KL}(X||Y) + \log 2}{\log 1/{\bf P}(Y \in E)}

for arbitrary random variables {X,Y} taking values in the same discrete range {R}, which follows from the data processing inequality

\displaystyle  D_{KL}( f(X)||f(Y)) \leq D_{KL}(X|| Y)

for arbitrary functions {f}, applied to the indicator function {f = 1_E}. Indeed one has

\displaystyle  D_{KL}( 1_E(X) || 1_E(Y) ) = {\bf P}(X \in E) \log \frac{{\bf P}(X \in E)}{{\bf P}(Y \in E)}

\displaystyle + {\bf P}(X \not \in E) \log \frac{{\bf P}(X \not \in E)}{{\bf P}(Y \not \in E)}

\displaystyle  \geq {\bf P}(X \in E) \log \frac{1}{{\bf P}(Y \in E)} - h( {\bf P}(X \in E) )

\displaystyle  \geq {\bf P}(X \in E) \log \frac{1}{{\bf P}(Y \in E)} - \log 2

where {h(u) := u \log \frac{1}{u} + (1-u) \log \frac{1}{1-u}} is the entropy function.
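The two ingredients of this remark, the data processing inequality for the indicator {1_E} and the resulting bound on {{\bf P}(X \in E)}, can be tested on random examples. This Python sketch assumes {Y} has full support and {E} is a nonempty proper subset, so that every logarithm is finite; the names are hypothetical.

```python
import math
import random

def kl(p, q):
    """Kullback-Leibler divergence in nats; q must be positive where p is."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def random_distribution(n, rng):
    """A random probability vector of length n with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(3)
n = 10
data_processing_ok = True
general_bound_ok = True
for _ in range(2000):
    p = random_distribution(n, rng)     # law of X
    q = random_distribution(n, rng)     # law of Y
    E = rng.sample(range(n), rng.randint(1, n - 1))
    pE = sum(p[i] for i in E)
    qE = sum(q[i] for i in E)
    d_full = kl(p, q)
    # Divergence between the laws of the indicators 1_E(X) and 1_E(Y).
    d_indicator = kl([pE, 1 - pE], [qE, 1 - qE])
    data_processing_ok &= d_indicator <= d_full + 1e-12
    general_bound_ok &= pE <= (d_full + math.log(2)) / math.log(1 / qE) + 1e-12
```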

Thus, for instance, if one has

\displaystyle  {\bf H}(X) \geq \log N - o(K)

and

\displaystyle  {\bf P}(U \in E) \leq \exp( - K )

for some {K} much larger than {1} (so that {1/K = o(1)}), then

\displaystyle  {\bf P}(X \in E) = o(1).

More informally: if the entropy of {X} is somewhat close to the maximum possible value of {\log N}, and it is exponentially rare for a uniform variable to lie in {E}, then it is still somewhat rare for {X} to lie in {E}. The estimate given is close to sharp in this regime, as can be seen by calculating the entropy of a random variable {X} which is uniformly distributed inside a small set {E} with some probability {p} and uniformly distributed outside of {E} with probability {1-p}, for some parameter {0 \leq p \leq 1}.
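The sharpness calculation can be carried out numerically. For the mixture {X} described above with {{\bf P}(U \in E) = e^{-K}}, a direct computation gives {\log N - {\bf H}(X) = pK + (1-p) \log \frac{1}{1-e^{-K}} - h(p)}, where {h} is the entropy function from Remark 2. The Python sketch below (with hypothetical helper names) plugs this into Lemma 1 and compares the resulting bound with the true value {{\bf P}(X \in E) = p}.

```python
import math

def h(u):
    """Binary entropy function in nats, for 0 < u < 1."""
    return u * math.log(1 / u) + (1 - u) * math.log(1 / (1 - u))

def entropy_deficit(p, K):
    """log N - H(X) for X uniform on E with probability p and uniform off E
    with probability 1 - p, where P(U in E) = |E| / N = exp(-K)."""
    return p * K - (1 - p) * math.log1p(-math.exp(-K)) - h(p)

def lemma_bound(p, K):
    """Right-hand side of Lemma 1 for this particular X."""
    return (entropy_deficit(p, K) + math.log(2)) / K

p = 0.3
# Gap between the bound from Lemma 1 and the true probability p.
gaps = {K: lemma_bound(p, K) - p for K in (10, 100, 1000)}
for K, gap in gaps.items():
    print(K, gap)
```

The gap is at most roughly {(\log 2)/K}, so for this family the lemma loses only {O(1/K)}.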

It turns out that the above lemma combines well with concentration of measure estimates; in my paper, I used one of the simplest such estimates, namely Hoeffding’s inequality, but there are of course many other estimates of this type (see e.g. this previous blog post for some others). Roughly speaking, concentration of measure inequalities allow one to make approximations such as

\displaystyle  F(U) \approx {\bf E} F(U)

with exponentially high probability, where {U} is a uniform distribution and {F} is some reasonable function of {U}. Combining this with the above lemma, we can then obtain approximations of the form

\displaystyle  F(X) \approx {\bf E} F(U) \ \ \ \ \ (3)

with somewhat high probability, if the entropy of {X} is somewhat close to maximum. This observation, combined with an “entropy decrement argument” that allowed one to arrive at a situation in which the relevant random variable {X} did have a near-maximum entropy, is the key new idea in my recent paper; for instance, one can use the approximation (3) to obtain an approximation of the form

\displaystyle  \sum_{j=1}^H \sum_{p \in {\mathcal P}} \lambda(n+j) \lambda(n+j+p) 1_{p|n+j}

\displaystyle  \approx \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+j+p)}{p}

for “most” choices of {n} and a suitable choice of {H} (with the latter being provided by the entropy decrement argument). The left-hand side is tied to Chowla-type sums such as {\sum_{n \leq x} \frac{\lambda(n)\lambda(n+1)}{n}} through the multiplicativity of {\lambda}, while the right-hand side, being a linear correlation involving two parameters {j,p} rather than just one, has “finite complexity” and can be treated by existing techniques such as the Hardy-Littlewood circle method. One could hope that one could similarly use approximations such as (3) in other problems in analytic number theory or combinatorics.

I’ve just uploaded two related papers to the arXiv:

This pair of papers is an outgrowth of these two recent blog posts and the ensuing discussion. In the first paper, we establish the following logarithmically averaged version of the Chowla conjecture (in the case {k=2} of two-point correlations (or “pair correlations”)):

Theorem 1 (Logarithmically averaged Chowla conjecture) Let {a_1,a_2} be natural numbers, and let {b_1,b_2} be integers such that {a_1 b_2 - a_2 b_1 \neq 0}. Let {1 \leq \omega(x) \leq x} be a quantity depending on {x} that goes to infinity as {x \rightarrow \infty}. Let {\lambda} denote the Liouville function. Then one has

\displaystyle  \sum_{x/\omega(x) < n \leq x} \frac{\lambda(a_1 n + b_1) \lambda(a_2 n+b_2)}{n} = o( \log \omega(x) ) \ \ \ \ \ (1)

as {x \rightarrow \infty}.

Thus for instance one has

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x). \ \ \ \ \ (2)

For comparison, the non-averaged Chowla conjecture would imply that

\displaystyle  \sum_{n \leq x} \lambda(n) \lambda(n+1) = o(x) \ \ \ \ \ (3)

which is a strictly stronger estimate than (2), and remains open.
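For readers who wish to experiment, here is a Python sketch that computes the Liouville function by a sieve and evaluates the normalised sums appearing in (2) and (3) at a single modest value of {x}. The cutoff {x = 10^5} and the helper name `liouville_up_to` are arbitrary choices, and numerics at one value of {x} of course say nothing rigorous about the asymptotic statements.

```python
import math

def liouville_up_to(n):
    """Return a list lam with lam[m] = lambda(m) for 1 <= m <= n (lam[0] unused)."""
    spf = list(range(n + 1))                 # smallest prime factor table
    for i in range(2, int(n ** 0.5) + 1):
        if spf[i] == i:                      # i is prime
            for j in range(i * i, n + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [0] * (n + 1)
    lam[1] = 1
    for m in range(2, n + 1):
        # Complete multiplicativity: lambda(m) = -lambda(m / p) for a prime p | m.
        lam[m] = -lam[m // spf[m]]
    return lam

x = 10 ** 5
lam = liouville_up_to(x + 1)

# Logarithmically averaged sum from (2), normalised by log x.
log_averaged = sum(lam[n] * lam[n + 1] / n for n in range(1, x + 1)) / math.log(x)
# Ordinary sum from (3), normalised by x.
plain = sum(lam[n] * lam[n + 1] for n in range(1, x + 1)) / x

print(log_averaged, plain)
```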

The arguments also extend to other completely multiplicative functions than the Liouville function. In particular, one obtains a slightly averaged version of the non-asymptotic Elliott conjecture that was shown in the previous blog post to imply a positive solution to the Erdos discrepancy problem. The averaged version of the conjecture established in this paper is slightly weaker than the one assumed in the previous blog post, but it turns out that the arguments there can be modified without much difficulty to accept this averaged Elliott conjecture as input. In particular, we obtain an unconditional solution to the Erdos discrepancy problem as a consequence; this is detailed in the second paper listed above. In fact we can also handle the vector-valued version of the Erdos discrepancy problem, in which the sequence {f(1), f(2), \dots} takes values in the unit sphere of an arbitrary Hilbert space, rather than in {\{-1,+1\}}.

Estimates such as (2) or (3) are known to be subject to the “parity problem” (discussed numerous times previously on this blog), which roughly speaking means that they cannot be proven solely using “linear” estimates on functions such as the von Mangoldt function. However, it is known that the parity problem can be circumvented using “bilinear” estimates, and this is basically what is done here.

We now describe in informal terms the proof of Theorem 1, focusing on the model case (2) for simplicity. Suppose for contradiction that the left-hand side of (2) was large and (say) positive. Using the multiplicativity {\lambda(pn) = -\lambda(n)}, we conclude that

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+p) 1_{p|n}}{n}

is also large and positive for all primes {p} that are not too large; note here how the logarithmic averaging allows us to leave the constraint {n \leq x} unchanged. Summing in {p}, we conclude that
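The dilation step rests on an exact identity: substituting {n = pm} and using {\lambda(pm) \lambda(pm+p) = \lambda(m) \lambda(m+1)} gives {\sum_{n \leq x} \frac{p \lambda(n) \lambda(n+p) 1_{p|n}}{n} = \sum_{m \leq x/p} \frac{\lambda(m) \lambda(m+1)}{m}}, and the right-hand side differs from the sum in (2) only in the range {x/p < m \leq x}, which contributes at most about {\log p}. The following Python sketch verifies the identity in exact arithmetic; the values {x = 2000}, {p = 7} and the helper names are arbitrary illustrative choices.

```python
from fractions import Fraction

def liouville_up_to(n):
    """Return a list lam with lam[m] = lambda(m) for 1 <= m <= n (lam[0] unused)."""
    spf = list(range(n + 1))                 # smallest prime factor table
    for i in range(2, int(n ** 0.5) + 1):
        if spf[i] == i:
            for j in range(i * i, n + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [0] * (n + 1)
    lam[1] = 1
    for m in range(2, n + 1):
        lam[m] = -lam[m // spf[m]]           # lambda(p m) = -lambda(m)
    return lam

x, p = 2000, 7
lam = liouville_up_to(x + p)

# Sum over multiples n of p up to x, weighted by p / n.
dilated = sum(Fraction(lam[n] * lam[n + p] * p, n) for n in range(p, x + 1, p))
# Sum over m up to x / p, weighted by 1 / m.
original = sum(Fraction(lam[m] * lam[m + 1], m) for m in range(1, x // p + 1))

identity_holds = dilated == original
```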

\displaystyle  \sum_{n \leq x} \frac{ \sum_{p \in {\mathcal P}} \lambda(n) \lambda(n+p) 1_{p|n}}{n}

is large and positive for any given set {{\mathcal P}} of medium-sized primes. By a standard averaging argument, this implies that

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \lambda(n+j) \lambda(n+p+j) 1_{p|n+j} \ \ \ \ \ (4)

is large for many choices of {n}, where {H} is a medium-sized parameter at our disposal to choose, and we take {{\mathcal P}} to be some set of primes that are somewhat smaller than {H}. (A similar approach was taken in this recent paper of Matomaki, Radziwill, and myself to study sign patterns of the Möbius function.) To obtain the required contradiction, one thus wants to demonstrate significant cancellation in the expression (4). As in that paper, we view {n} as a random variable, in which case (4) is essentially a bilinear sum of the random sequence {(\lambda(n+1),\dots,\lambda(n+H))} along a random graph {G_{n,H}} on {\{1,\dots,H\}}, in which two vertices {j, j+p} are connected if they differ by a prime {p} in {{\mathcal P}} that divides {n+j}. A key difficulty in controlling this sum is that for randomly chosen {n}, the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}} need not be independent. To get around this obstacle we introduce a new argument which we call the “entropy decrement argument” (in analogy with the “density increment argument” and “energy increment argument” that appear in the literature surrounding Szemerédi’s theorem on arithmetic progressions, and also reminiscent of the “entropy compression argument” of Moser and Tardos, discussed in this previous post). This argument, which is a simple consequence of the Shannon entropy inequalities, can be viewed as a quantitative version of the standard subadditivity argument that establishes the existence of Kolmogorov-Sinai entropy in topological dynamical systems; it allows one to select a scale parameter {H} (in some suitable range {[H_-,H_+]}) for which the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}} exhibit some weak independence properties (or more precisely, the mutual information between the two random variables is small).

Informally, the entropy decrement argument goes like this: if the sequence {(\lambda(n+1),\dots,\lambda(n+H))} has significant mutual information with {G_{n,H}}, then the entropy of the sequence {(\lambda(n+1),\dots,\lambda(n+H'))} for {H' > H} will grow a little slower than linearly, due to the fact that the graph {G_{n,H}} has zero entropy (knowledge of {G_{n,H}} more or less completely determines the shifts {G_{n+kH,H}} of the graph); this can be formalised using the classical Shannon inequalities for entropy (and specifically, the non-negativity of conditional mutual information). But the entropy cannot drop below zero, so by increasing {H} as necessary, at some point one must reach a metastable region (cf. the finite convergence principle discussed in this previous blog post), within which very little mutual information can be shared between the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}}. Curiously, for the application it is not enough to have a purely qualitative version of this argument; one needs a quantitative bound (which gains a factor of a bit more than {\log H} on the trivial bound for mutual information), and this is surprisingly delicate (it ultimately comes down to the fact that the series {\sum_{j \geq 2} \frac{1}{j \log j \log\log j}} diverges, which is only barely true).

Once one locates a scale {H} with the low mutual information property, one can use standard concentration of measure results such as the Hoeffding inequality to approximate (4) by the significantly simpler expression

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j)}{p}. \ \ \ \ \ (5)

The important thing here is that Hoeffding’s inequality gives exponentially strong bounds on the failure probability, which is needed to counteract the logarithms that are inevitably present whenever trying to use entropy inequalities. The expression (5) can then be controlled in turn by an application of the Hardy-Littlewood circle method and a non-trivial estimate

\displaystyle  \sup_\alpha \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (6)

for averaged short sums of a modulated Liouville function established in another recent paper by Matomäki, Radziwill and myself.

When one uses this method to study more general sums such as

\displaystyle  \sum_{n \leq x} \frac{g_1(n) g_2(n+1)}{n},

one ends up having to consider expressions such as

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} c_p \frac{g_1(n+j) g_2(n+p+j)}{p},

where {c_p} is the coefficient {c_p := \overline{g_1}(p) \overline{g_2}(p)}. When attacking this sum with the circle method, one soon finds oneself in the situation of wanting to locate the large Fourier coefficients of the exponential sum

\displaystyle  S(\alpha) := \sum_{p \in {\mathcal P}} \frac{c_p}{p} e^{2\pi i \alpha p}.

In many cases (such as in the application to the Erdős discrepancy problem), the coefficient {c_p} is identically {1}, and one can understand this sum satisfactorily using the classical results of Vinogradov: basically, {S(\alpha)} is large when {\alpha} lies in a “major arc” and is small when it lies in a “minor arc”. For more general functions {g_1,g_2}, the coefficients {c_p} are more or less arbitrary; the large values of {S(\alpha)} are no longer confined to the major arc case. Fortunately, even in this general situation one can use a restriction theorem for the primes established some time ago by Ben Green and myself to show that there are still only a bounded number of possible locations {\alpha} (up to the uncertainty mandated by the Heisenberg uncertainty principle) where {S(\alpha)} is large, and we can still conclude by using (6). (Actually, as recently pointed out to me by Ben, one does not need the full strength of our result; one only needs the {L^4} restriction theorem for the primes, which can be proven fairly directly using Plancherel’s theorem and some sieve theory.)

It is tempting to also use the method to attack higher order cases of the (logarithmically) averaged Chowla conjecture; for instance, one could try to prove the estimate

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log x).

The above arguments reduce matters to obtaining some non-trivial cancellation for sums of the form

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j) \lambda(n+p+2j)}{p}.

A little bit of “higher order Fourier analysis” (as was done for very similar sums in the ergodic theory context by Frantzikinakis-Host-Kra and Wooley-Ziegler) lets one control this sort of sum if one can establish a bound of the form

\displaystyle  \frac{1}{X} \int_X^{2X} \sup_\alpha |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (7)

where {X} goes to infinity and {H} is a very slowly growing function of {X}. This looks very similar to (6), but the fact that the supremum is now inside the integral makes the problem much more difficult. However it looks worth attacking (7) further, as this estimate looks like it should have many nice applications (beyond just the {k=3} case of the logarithmically averaged Chowla or Elliott conjectures, which is already interesting).

For higher {k} than {k=3}, the same line of analysis requires one to replace the linear phase {e(\alpha n)} by more complicated phases, such as quadratic phases {e(\alpha n^2 + \beta n)} or even {k-2}-step nilsequences. Given that (7) is already beyond the reach of current literature, these even more complicated expressions are also unavailable at present, but one can imagine that they will eventually become tractable, in which case we would obtain an averaged form of the Chowla conjecture for all {k}, which would have a number of consequences (such as a logarithmically averaged version of Sarnak’s conjecture, as per this blog post).

It would of course be very nice to remove the logarithmic averaging, and be able to establish bounds such as (3). I did attempt to do so, but I do not see a way to use the entropy decrement argument in a manner that does not require some sort of averaging of logarithmic type, as it requires one to pick a scale {H} that one cannot specify in advance, which is not a problem for logarithmic averages (which are quite stable with respect to dilations) but is problematic for ordinary averages. But perhaps the problem can be circumvented by some clever modification of the argument. One possible approach would be to start exploiting multiplicativity at products of primes, and not just individual primes, to try to keep the scale fixed, but this makes the concentration of measure part of the argument much more complicated as one loses some independence properties (coming from the Chinese remainder theorem) which allowed one to conclude just from the Hoeffding inequality.

The Chowla conjecture asserts that all non-trivial correlations of the Liouville function are asymptotically negligible; for instance, it asserts that

\displaystyle  \sum_{n \leq X} \lambda(n) \lambda(n+h) = o(X)

as {X \rightarrow \infty} for any fixed natural number {h}. This conjecture remains open, though there are a number of partial results (e.g. these two previous results of Matomaki, Radziwill, and myself).
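For readers who wish to see the conjectured cancellation numerically, here is a minimal Python sketch that tabulates the pair correlation {\sum_{n \leq X} \lambda(n) \lambda(n+h)}. The helper names `liouville_up_to` and `chowla_pair_sum` are hypothetical and introduced only for this illustration; a finite computation of course gives no evidence about the asymptotic statement.

```python
def liouville_up_to(limit):
    """Return a list lam with lam[n] = lambda(n) = (-1)^Omega(n) for 1 <= n <= limit."""
    # Smallest-prime-factor sieve.
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    if limit >= 1:
        lam[1] = 1
    for n in range(2, limit + 1):
        # Complete multiplicativity: lambda(n) = -lambda(n / p) for any prime p | n.
        lam[n] = -lam[n // spf[n]]
    return lam


def chowla_pair_sum(X, h):
    """Compute sum_{n <= X} lambda(n) * lambda(n + h)."""
    lam = liouville_up_to(X + h)
    return sum(lam[n] * lam[n + h] for n in range(1, X + 1))


if __name__ == "__main__":
    for X in (10**3, 10**4, 10**5):
        s = chowla_pair_sum(X, 1)
        print(X, s, s / X)
```

The printed ratio {s/X} is the quantity that the conjecture predicts should tend to zero.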

A natural generalisation of Chowla’s conjecture was proposed by Elliott. For simplicity we will only consider Elliott’s conjecture for the pair correlations

\displaystyle  \sum_{n \leq X} g(n) \overline{g}(n+h).

For such correlations, the conjecture was that one had

\displaystyle  \sum_{n \leq X} g(n) \overline{g}(n+h) = o(X) \ \ \ \ \ (1)

as {X \rightarrow \infty} for any natural number {h}, as long as {g} was a completely multiplicative function with magnitude bounded by {1}, and such that

\displaystyle  \sum_p \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p} = +\infty \ \ \ \ \ (2)

for any Dirichlet character {\chi} and any real number {t}. In the language of “pretentious number theory”, as developed by Granville and Soundararajan, the hypothesis (2) asserts that the completely multiplicative function {g} does not “pretend” to be like the completely multiplicative function {n \mapsto \chi(n) n^{it}} for any character {\chi} and real number {t}. A condition of this form is necessary; for instance, if {g(n)} is precisely equal to {\chi(n) n^{it}} and {\chi} has period {q}, then {g(n) \overline{g}(n+q)} is equal to {1_{(n,q)=1} + o(1)} as {n \rightarrow \infty} and (1) clearly fails. The prime number theorem in arithmetic progressions implies that the Liouville function obeys (2), and so the Elliott conjecture contains the Chowla conjecture as a special case.

As it turns out, Elliott’s conjecture is false as stated, with the counterexample {g} having the property that {g} “pretends” locally to be the function {n \mapsto n^{it_j}} for {n} in various intervals {[1, X_j]}, where {X_j} and {t_j} go to infinity in a certain prescribed sense. See this paper of Matomaki, Radziwill, and myself for details. However, we view this as a technicality, and continue to believe that certain “repaired” versions of Elliott’s conjecture still hold. For instance, our counterexample does not apply when {g} is restricted to be real-valued rather than complex, and we believe that Elliott’s conjecture is valid in this setting. Returning to the complex-valued case, we still expect the asymptotic (1) provided that the condition (2) is replaced by the stronger condition

\displaystyle  \inf_{|t| \leq X} |\sum_{p \leq X} \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p}| \rightarrow +\infty

as {X \rightarrow +\infty} for all fixed Dirichlet characters {\chi}. In our paper we supported this claim by establishing a certain “averaged” version of this conjecture; see that paper for further details. (See also this recent paper of Frantzikinakis and Host which establishes a different averaged version of this conjecture.)

One can make a stronger “non-asymptotic” version of this corrected Elliott conjecture, in which the {X} parameter does not go to infinity, or equivalently that the function {g} is permitted to depend on {X}:

Conjecture 1 (Non-asymptotic Elliott conjecture) Let {\varepsilon > 0}, let {A \geq 1} be sufficiently large depending on {\varepsilon}, and let {X} be sufficiently large depending on {A,\varepsilon}. Suppose that {g} is a completely multiplicative function with magnitude bounded by {1}, such that

\displaystyle  \inf_{|t| \leq AX} |\sum_{p \leq X} \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p}| \geq A

for all Dirichlet characters {\chi} of period at most {A}. Then one has

\displaystyle  |\sum_{n \leq X} g(n) \overline{g(n+h)}| \leq \varepsilon X

for all natural numbers {1 \leq h \leq 1/\varepsilon}.

The {\varepsilon}-dependent factor {A} in the constraint {|t| \leq AX} is necessary, as can be seen by considering the completely multiplicative function {g(n) := n^{2iX}} (for instance). Again, the results in my previous paper with Matomaki and Radziwill can be viewed as establishing an averaged version of this conjecture.

Meanwhile, we have the following conjecture that is the focus of the Polymath5 project:

Conjecture 2 (Erdös discrepancy conjecture) For any function {f: {\bf N} \rightarrow \{-1,+1\}}, the discrepancy

\displaystyle  \sup_{n,d \in {\bf N}} |\sum_{j=1}^n f(jd)|

is infinite.

It is instructive to compute some near-counterexamples to Conjecture 2 that illustrate the difficulty of the Erdös discrepancy problem. The first near-counterexample is that of a non-principal Dirichlet character {f(n) = \chi(n)} that takes values in {\{-1,0,+1\}} rather than {\{-1,+1\}}. For this function, one has from the complete multiplicativity of {\chi} that

\displaystyle  |\sum_{j=1}^n f(jd)| = |\sum_{j=1}^n \chi(j) \chi(d)|

\displaystyle  \leq |\sum_{j=1}^n \chi(j)|.

If {q} denotes the period of {\chi}, then {\chi} has mean zero on every interval of length {q}, and thus

\displaystyle  |\sum_{j=1}^n f(jd)| \leq |\sum_{j=1}^n \chi(j)| \leq q.

Thus {\chi} has bounded discrepancy.
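The bounded discrepancy of a character can be checked by brute force. The following Python sketch does this for the quadratic character modulo {7} (the Legendre symbol), which is a choice made purely for illustration; the names `legendre_symbol` and `discrepancy_up_to` are hypothetical helpers.

```python
def legendre_symbol(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1 or p - 1


def discrepancy_up_to(f, N):
    """Max of |f(d) + f(2d) + ... + f(nd)| over all n, d with nd <= N."""
    best = 0
    for d in range(1, N + 1):
        partial = 0
        for j in range(1, N // d + 1):
            partial += f(j * d)
            best = max(best, abs(partial))
    return best


if __name__ == "__main__":
    chi7 = lambda n: legendre_symbol(n, 7)
    print(discrepancy_up_to(chi7, 500))
```

For this particular character the partial sums {\sum_{j \leq n} \chi(j)} cycle through {1,2,1,2,1,0,0}, so the observed discrepancy is {2}, comfortably below the bound {q = 7}.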

Of course, this is not a true counterexample to Conjecture 2 because {\chi} can take the value {0}. Let us now consider the following variant example, which is the simplest member of a family of examples studied by Borwein, Choi, and Coons. Let {\chi = \chi_3} be the non-principal Dirichlet character of period {3} (thus {\chi(n)} equals {+1} when {n=1 \hbox{ mod } 3}, {-1} when {n = 2 \hbox{ mod } 3}, and {0} when {n = 0 \hbox{ mod } 3}), and define the completely multiplicative function {f = \tilde \chi: {\bf N} \rightarrow \{-1,+1\}} by setting {\tilde \chi(p) := \chi(p)} when {p \neq 3} and {\tilde \chi(3) = +1}. This is about the simplest modification one can make to the previous near-counterexample to eliminate the zeroes. Now consider the sum

\displaystyle  \sum_{j=1}^n \tilde \chi(j)

with {n := 1 + 3 + 3^2 + \dots + 3^k} for some large {k}. Writing {j = 3^a m} with {m} coprime to {3} and {a} at most {k}, we can write this sum as

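\displaystyle  \sum_{a=0}^k \sum_{1 \leq m \leq n/3^a: (m,3)=1} \tilde \chi(3^a m).

Now observe that {\tilde \chi(3^a m) = \tilde \chi(3)^a \tilde \chi(m) = \chi(m)} when {m} is coprime to {3}. Since {\chi} vanishes at the multiples of {3}, the constraint {(m,3)=1} can then be dropped. The function {\chi} has mean zero on every interval of length three, and {\lfloor n/3^a\rfloor} is equal to {1} mod {3}, and thus

\displaystyle  \sum_{1 \leq m \leq n/3^a: (m,3)=1} \tilde \chi(3^a m) = \sum_{1 \leq m \leq n/3^a} \chi(m) = 1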

for every {a=0,\dots,k}, and thus

\displaystyle  \sum_{j=1}^n \tilde \chi(j) = k+1 \gg \log n.

Thus {\tilde \chi} has unbounded discrepancy, but only barely so (it grows logarithmically in {n}). These examples suggest that the main “enemy” to proving Conjecture 2 comes from completely multiplicative functions {f} that somehow “pretend” to be like a Dirichlet character but do not vanish at the zeroes of that character. (Indeed, the special case of Conjecture 2 when {f} is completely multiplicative is already open, and appears to be an important subcase.)
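The logarithmic growth in the Borwein-Choi-Coons example is easy to confirm by direct computation. The Python sketch below (with hypothetical helper names `chi3`, `chi3_tilde` and `partial_sum`) evaluates {\sum_{j=1}^n \tilde \chi(j)} at {n = 1 + 3 + \dots + 3^k = (3^{k+1}-1)/2} and compares it with {k+1}.

```python
def chi3(n):
    """Non-principal Dirichlet character of period 3."""
    return (0, 1, -1)[n % 3]


def chi3_tilde(n):
    """Completely multiplicative modification of chi3 with chi3_tilde(3) = +1."""
    while n % 3 == 0:
        n //= 3  # each factor of 3 contributes +1
    return chi3(n)


def partial_sum(n):
    """Compute sum_{j=1}^n chi3_tilde(j)."""
    return sum(chi3_tilde(j) for j in range(1, n + 1))


if __name__ == "__main__":
    for k in range(8):
        n = (3 ** (k + 1) - 1) // 2  # n = 1 + 3 + ... + 3^k
        print(k, n, partial_sum(n))
```

Each row should show the partial sum equal to {k+1}, in agreement with the calculation above.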

All of these conjectures remain open. However, I would like to record in this blog post the following striking connection, illustrating the power of the Elliott conjecture (particularly in its nonasymptotic formulation):

Theorem 3 (Elliott conjecture implies unbounded discrepancy) Conjecture 1 implies Conjecture 2.

The argument relies heavily on two observations that were previously made in connection with the Polymath5 project. The first is a Fourier-analytic reduction that replaces the Erdös discrepancy problem with an averaged version for completely multiplicative functions {g}. An application of Cauchy-Schwarz then shows that any counterexample to that version will violate the conclusion of Conjecture 1, so if one assumes that conjecture then {g} must pretend to be like a function of the form {n \mapsto \chi(n) n^{it}}. One then uses (a generalisation of) a second argument from Polymath5 to rule out this case, basically by reducing matters to a more complicated version of the Borwein-Choi-Coons analysis. Details are provided below the fold.

There is some hope that the Chowla and Elliott conjectures can be attacked, as the parity barrier which is so impervious to attack for the twin prime conjecture seems to be more permeable in this setting. (For instance, in my previous post I raised a possible approach, based on establishing expander properties of a certain random graph, which seems to get around the parity problem, in principle at least.)

(Update, Sep 25: fixed some treatment of error terms, following a suggestion of Andrew Granville.)


The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the Möbius pseudorandomness principle that asserts that the Möbius function {\mu} is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).

However, there is an intriguing “alternate universe” in which the Möbius function is strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero“. In this scenario, the parity problem obstruction disappears, and it becomes possible, in principle, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements is true:

  • (Twin prime conjecture) There are infinitely many primes {p} such that {p+2} is also prime.
  • (No Siegel zeroes) There exists a constant {c>0} such that for every real Dirichlet character {\chi} of conductor {q > 1}, the associated Dirichlet {L}-function {s \mapsto L(s,\chi)} has no zeroes in the interval {[1-\frac{c}{\log q}, 1]}.

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory that should not actually exist, but is surprisingly self-consistent and has to date proven impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

\displaystyle  \sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2) \ \ \ \ \ (1)

for some large value of {x}, where {\Lambda} is the von Mangoldt function. Actually, in this post we will work with the slight variant

\displaystyle  \sum_{x \leq n \leq 2x} \Lambda_2(n(n+2)) \nu(n(n+2))

where

\displaystyle  \Lambda_2(n) = (\mu * L^2)(n) = \sum_{d|n} \mu(d) \log^2 \frac{n}{d}

is the second von Mangoldt function, and {*} denotes Dirichlet convolution, and {\nu} is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval {x \leq n \leq 2x} and remove very small primes from {n}, but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve {\nu} is essentially replaced by the more combinatorial restriction {1_{(n(n+2),q^{1/C}\#)=1}} for some large {C}, where {q^{1/C}\#} is the primorial of {q^{1/C}}, but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)
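To get a feel for the second von Mangoldt function, one can evaluate the defining sum directly. The Python sketch below is a naive illustration only (the names `mobius` and `lambda2` are hypothetical), and it ignores the sieve weight {\nu} and the smoothing entirely. It relies on the standard facts that {\Lambda_2(p) = \log^2 p}, that {\Lambda_2(pq) = 2 \log p \log q} for distinct primes {p,q}, and that {\Lambda_2(n)} vanishes when {n} has three or more distinct prime factors.

```python
from math import log


def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def lambda2(n):
    """Second von Mangoldt function: sum over d | n of mu(d) * log(n/d)^2."""
    return sum(mobius(d) * log(n / d) ** 2
               for d in range(1, n + 1) if n % d == 0)


if __name__ == "__main__":
    for n in (3, 5, 11, 13, 19):
        print(n, n + 2, lambda2(n * (n + 2)))
```

Note that {\Lambda_2(n(n+2))} equals {2 \log n \log(n+2)} when {n, n+2} are both prime, which is how the sum detects twin primes.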

If there is a Siegel zero {L(\beta,\chi)=0} with {\beta} close to {1} and {\chi} a Dirichlet character of conductor {q}, then multiplicative number theory methods can be used to show that the Möbius function {\mu} “pretends” to be like the character {\chi} in the sense that {\mu(p) \approx \chi(p)} for “most” primes {p} near {q} (e.g. in the range {q^\varepsilon \leq p \leq q^C} for some small {\varepsilon>0} and large {C>0}). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that {\mu} pretends to be like {\chi} can be used to construct a tractable approximation (after inserting the sieve weight {\nu}) in the range {[x,2x]} (where {x = q^C} for some large {C}) for the second von Mangoldt function {\Lambda_2}, namely the function

\displaystyle  \tilde \Lambda_2(n) := (\chi * L^2)(n) = \sum_{d|n} \chi(d) \log^2 \frac{n}{d}.

Roughly speaking, we think of the periodic function {\chi} and the slowly varying function {\log^2} as being of about the same “complexity” as the constant function {1}, so that {\tilde \Lambda_2} is roughly of the same “complexity” as the divisor function

\displaystyle  \tau(n) := (1*1)(n) = \sum_{d|n} 1,

which is considerably simpler to obtain asymptotics for than the von Mangoldt function as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate {\sum_{x \leq n \leq 2x} \tau(n)} to accuracy {O(\sqrt{x})} with little difficulty, whereas to obtain a comparable level of accuracy for {\sum_{x \leq n \leq 2x} \Lambda(n)} or {\sum_{x \leq n \leq 2x} \Lambda_2(n)} is essentially the Riemann hypothesis.)
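The Dirichlet hyperbola method mentioned here rests on the exact identity {\sum_{n \leq x} \tau(n) = 2 \sum_{d \leq \sqrt{x}} \lfloor x/d \rfloor - \lfloor \sqrt{x} \rfloor^2}, which comes from counting lattice points {(a,b)} with {ab \leq x} symmetrically about the diagonal. A short Python sketch, with hypothetical function names, compares it against the naive count:

```python
from math import isqrt


def divisor_sum_naive(x):
    """sum_{n <= x} tau(n), counted as the number of pairs (d, m) with d*m <= x."""
    return sum(x // d for d in range(1, x + 1))


def divisor_sum_hyperbola(x):
    """Same sum via the hyperbola identity, using only about sqrt(x) terms."""
    s = isqrt(x)
    return 2 * sum(x // d for d in range(1, s + 1)) - s * s


if __name__ == "__main__":
    for x in (10, 100, 10**4, 10**6):
        print(x, divisor_sum_hyperbola(x))
```

Replacing each floor {\lfloor x/d \rfloor} by {x/d + O(1)} in the hyperbola form is what produces the {O(\sqrt{x})} error term.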

One expects {\tilde \Lambda_2(n)} to be a good approximant to {\Lambda_2(n)} if {n} is of size {O(x)} and has no prime factors less than {q^{1/C}} for some large constant {C}. The Selberg sieve {\nu} will be mostly supported on numbers with no prime factor less than {q^{1/C}}. As such, one can hope to approximate (1) by the expression

\displaystyle  \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)) \nu(n(n+2)); \ \ \ \ \ (2)

as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

\displaystyle  \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)).

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

\displaystyle \sum_{x \leq n \leq 2x} \tau(n(n+2)). \ \ \ \ \ (3)

Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation {ab+2=cd} with {a,b,c,d} in various ranges; this is clearly related to understanding the equidistribution of the hyperbola {\{ (a,b) \in ({\bf Z}/d{\bf Z})^2: ab + 2 = 0 \hbox{ mod } d \}} in {({\bf Z}/d{\bf Z})^2}. Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{a_1 m + a_2 \overline{m}}{r} )

where {\overline{m}} denotes the inverse of {m} in {({\bf Z}/r{\bf Z})^\times}. One can then use the Weil bound

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{1/2 + o(1)} (a,b,r)^{1/2} \ \ \ \ \ (4)

where {(a,b,r)} is the greatest common divisor of {a,b,r} (with the convention that this is equal to {r} if {a,b} vanish), and the {o(1)} decays to zero as {r \rightarrow \infty}. The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that {x=q^C} with {C} large enough).

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of {r} will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{3/4 + o(1)} (a,b,r)^{1/4} \ \ \ \ \ (5)

whenever {r \geq 1} and {a,b} are coprime to {r}, where the {o(1)} is with respect to the limit {r \rightarrow \infty} (and is uniform in {a,b}).

Proof: Observe from change of variables that the Kloosterman sum {\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} )} is unchanged if one replaces {(a,b)} with {(\lambda a, \lambda^{-1} b)} for {\lambda \in ({\bf Z}/r{\bf Z})^\times}. For fixed {a,b}, the number of such pairs {(\lambda a, \lambda^{-1} b)} is at least {r^{1-o(1)} / (a,b,r)}, thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

\displaystyle  \sum_{a,b \in {\bf Z}/r{\bf Z}} |\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e\left( \frac{am+b\overline{m}}{r} \right)|^4 \ll r^{4+o(1)}.

The left-hand side can be rearranged as

\displaystyle  \sum_{m_1,m_2,m_3,m_4 \in ({\bf Z}/r{\bf Z})^\times} \sum_{a,b \in {\bf Z}/r{\bf Z}}

\displaystyle  e\left( \frac{a(m_1+m_2-m_3-m_4) + b(\overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4})}{r} \right)

which by Fourier summation is equal to

\displaystyle  r^2 \# \{ (m_1,m_2,m_3,m_4) \in (({\bf Z}/r{\bf Z})^\times)^4:

\displaystyle  m_1+m_2-m_3-m_4 = \frac{1}{m_1} + \frac{1}{m_2} - \frac{1}{m_3} - \frac{1}{m_4} = 0 \hbox{ mod } r \}.

Observe from the quadratic formula and the divisor bound that each pair {(x,y)\in ({\bf Z}/r{\bf Z})^2} has at most {O(r^{o(1)})} solutions {(m_1,m_2)} to the system of equations {m_1+m_2=x; \frac{1}{m_1} + \frac{1}{m_2} = y}. Hence the number of quadruples {(m_1,m_2,m_3,m_4)} of the desired form is {r^{2+o(1)}}, and the claim follows. \Box
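The Fourier identity used in this proof is exact, and can be tested numerically for small moduli. The following Python sketch (all function names are hypothetical; `pow(m, -1, r)` requires Python 3.8 or later) compares the fourth moment of the Kloosterman sums with {r^2} times the number of quadruples, and also checks one instance of the classical Weil bound {2\sqrt{p}} for a prime modulus {p}.

```python
import cmath
from math import gcd, pi, sqrt


def units(r):
    """Representatives of the multiplicative group (Z/rZ)^x, for r >= 2."""
    return [m for m in range(1, r + 1) if gcd(m, r) == 1]


def kloosterman(a, b, r):
    """Sum over units m of e((a*m + b*m^{-1}) / r), with e(t) = exp(2*pi*i*t)."""
    return sum(cmath.exp(2j * pi * (a * m + b * pow(m, -1, r)) / r)
               for m in units(r))


def fourth_moment(r):
    """sum over a, b in Z/rZ of |kloosterman(a, b, r)|^4."""
    return sum(abs(kloosterman(a, b, r)) ** 4
               for a in range(r) for b in range(r))


def count_quadruples(r):
    """Quadruples of units with m1+m2 = m3+m4 and 1/m1+1/m2 = 1/m3+1/m4 mod r."""
    inv = {m: pow(m, -1, r) for m in units(r)}
    count = 0
    for m1 in inv:
        for m2 in inv:
            for m3 in inv:
                for m4 in inv:
                    if ((m1 + m2 - m3 - m4) % r == 0
                            and (inv[m1] + inv[m2] - inv[m3] - inv[m4]) % r == 0):
                        count += 1
    return count


if __name__ == "__main__":
    for r in (5, 7, 8, 9):
        print(r, round(fourth_moment(r)), r * r * count_quadruples(r))
    print(abs(kloosterman(1, 1, 101)), 2 * sqrt(101))
```

The two columns printed for each {r} should agree up to floating-point rounding.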

We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound) Let {\chi} be a primitive real Dirichlet character of conductor {q}, and let {a,b,c,d \in{\bf Z}/q{\bf Z}}. Then

\displaystyle  \sum_{n \in {\bf Z}/q{\bf Z}} \chi(an+b) \chi(cn+d) \ll q^{o(1)} (ad-bc, q).

Proof: As {q} is the conductor of a primitive real Dirichlet character, {q} is equal to {2^j} times a squarefree odd number for some {j \leq 3}. By the Chinese remainder theorem, it thus suffices to establish the claim when {q = p} is an odd prime. We may assume that {ad-bc} is not divisible by this prime {p}, as the claim is trivial otherwise. If {a} vanishes then {c} does not vanish, and the claim follows from the mean zero nature of {\chi}; similarly if {c} vanishes. Hence we may assume that {a,c} do not vanish, and then we can normalise them to equal {1}. By completing the square it now suffices to show that

\displaystyle  \sum_{n \in {\bf Z}/p{\bf Z}} \chi( n^2 - b ) \ll 1

whenever {b \neq 0 \hbox{ mod } p}. As {\chi} is {+1} on the quadratic residues and {-1} on the non-residues, it now suffices to show that

\displaystyle  \# \{ (m,n) \in ({\bf Z}/p{\bf Z})^2: n^2 - b = m^2 \} = p + O(1).

But by making the change of variables {(x,y) = (n+m,n-m)}, the left-hand side becomes {\# \{ (x,y) \in ({\bf Z}/p{\bf Z})^2: xy=b\}}, and the claim follows. \Box
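The last two steps of this proof can be verified exactly for small primes. Since the hyperbola {xy=b} has exactly {p-1} points when {b \neq 0 \hbox{ mod } p}, the character sum {\sum_n \chi(n^2-b)} must equal {-1}. A Python sketch with hypothetical helper names:

```python
def legendre_symbol(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def shifted_square_sum(b, p):
    """sum over n in Z/pZ of chi(n^2 - b), chi the quadratic character mod p."""
    return sum(legendre_symbol(n * n - b, p) for n in range(p))


def count_points(b, p):
    """Number of (m, n) in (Z/pZ)^2 with n^2 - b = m^2."""
    return sum(1 for m in range(p) for n in range(p)
               if (n * n - b - m * m) % p == 0)


if __name__ == "__main__":
    for p in (7, 11, 13):
        print(p, [shifted_square_sum(b, p) for b in range(1, p)],
              [count_points(b, p) for b in range(1, p)])
```

The point count is {p} plus the character sum, so the two lists printed for each prime differ by exactly {p} entrywise.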

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which was not needed to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function {\Lambda_2(n(n+2))} in place of {\Lambda(n) \Lambda(n+2)}. These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.


The Poincaré upper half-plane {{\mathbf H} := \{ z: \hbox{Im}(z) > 0 \}} (with a boundary consisting of the real line {{\bf R}} together with the point at infinity {\infty}) carries an action of the projective special linear group

\displaystyle  \hbox{PSL}_2({\bf R}) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in {\bf R}: ad-bc = 1 \} / \{\pm 1\}

via fractional linear transformations:

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} z := \frac{az+b}{cz+d}. \ \ \ \ \ (1)

Here and in the rest of the post we will abuse notation by identifying elements {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} of the special linear group {\hbox{SL}_2({\bf R})} with their equivalence class {\{ \pm \begin{pmatrix} a & b \\ c & d \end{pmatrix} \}} in {\hbox{PSL}_2({\bf R})}; this will occasionally create or remove a factor of two in our formulae, but otherwise has very little effect, though one has to check that various definitions and expressions (such as (1)) are unaffected if one replaces a matrix {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} by its negation {\begin{pmatrix} -a & -b \\ -c & -d \end{pmatrix}}. In particular, we recommend that the reader ignore the signs {\pm} that appear from time to time in the discussion below.

As the action of {\hbox{PSL}_2({\bf R})} on {{\mathbf H}} is transitive, and any given point in {{\mathbf H}} (e.g. {i}) has a stabiliser isomorphic to the projective rotation group {\hbox{PSO}_2({\bf R})}, we can view the Poincaré upper half-plane {{\mathbf H}} as a homogeneous space for {\hbox{PSL}_2({\bf R})}, and more specifically the quotient space of {\hbox{PSL}_2({\bf R})} by a maximal compact subgroup {\hbox{PSO}_2({\bf R})}. In fact, we can make the half-plane a symmetric space for {\hbox{PSL}_2({\bf R})}, by endowing {{\mathbf H}} with the Riemannian metric

\displaystyle  dg^2 := \frac{dx^2 + dy^2}{y^2}

(using Cartesian coordinates {z=x+iy}), which is invariant with respect to the {\hbox{PSL}_2({\bf R})} action. Like any other Riemannian metric, the metric on {{\mathbf H}} generates a number of other important geometric objects on {{\mathbf H}}, such as the distance function {d(z_1,z_2)}, which is given by the formula

\displaystyle  2(\cosh(d(z_1,z_2))-1) = \frac{|z_1-z_2|^2}{\hbox{Im}(z_1) \hbox{Im}(z_2)}, \ \ \ \ \ (2)

the volume measure {\mu = \mu_{\mathbf H}}, which can be computed to be

\displaystyle  d\mu = \frac{dx dy}{y^2},

and the Laplace-Beltrami operator, which can be computed to be {\Delta = y^2 (\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2})} (here we use the negative definite sign convention for {\Delta}). As the metric {dg} was {\hbox{PSL}_2({\bf R})}-invariant, all of these quantities arising from the metric are similarly {\hbox{PSL}_2({\bf R})}-invariant in the appropriate sense.
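The invariance of the distance function under fractional linear transformations is easy to test numerically using formula (2). The following Python sketch uses hypothetical names `fractional_linear` and `hyperbolic_distance`, and represents a matrix {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} by the tuple `(a, b, c, d)`.

```python
from math import acosh


def fractional_linear(g, z):
    """Apply the matrix g = (a, b, c, d) to z via z -> (a*z + b) / (c*z + d)."""
    a, b, c, d = g
    return (a * z + b) / (c * z + d)


def hyperbolic_distance(z1, z2):
    """Distance in the upper half-plane, solving formula (2) for d(z1, z2)."""
    return acosh(1 + abs(z1 - z2) ** 2 / (2 * z1.imag * z2.imag))


if __name__ == "__main__":
    z1, z2 = 0.3 + 1.2j, -2.0 + 0.5j
    g = (2.0, 1.0, 3.0, 2.0)  # determinant 2*2 - 1*3 = 1
    print(hyperbolic_distance(z1, z2))
    print(hyperbolic_distance(fractional_linear(g, z1), fractional_linear(g, z2)))
```

The two printed distances should agree up to rounding error. As a further sanity check, the distance from {i} to {ei} along the imaginary axis is {\int_1^e dy/y = 1}.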

The Gauss curvature of the Poincaré half-plane can be computed to be the constant {-1}, thus {{\mathbf H}} is a model for two-dimensional hyperbolic geometry, in much the same way that the unit sphere {S^2} in {{\bf R}^3} is a model for two-dimensional spherical geometry (or {{\bf R}^2} is a model for two-dimensional Euclidean geometry). (Indeed, {{\mathbf H}} is isomorphic (via projection to a null hyperplane) to the upper unit hyperboloid {\{ (x,t) \in {\bf R}^{2+1}: t = \sqrt{1+|x|^2}\}} in the Minkowski spacetime {{\bf R}^{2+1}}, which is the direct analogue of the unit sphere in Euclidean spacetime {{\bf R}^3} or the plane {{\bf R}^2} in Galilean spacetime {{\bf R}^2 \times {\bf R}}.)

One can inject arithmetic into this geometric structure by passing from the Lie group {\hbox{PSL}_2({\bf R})} to the full modular group

\displaystyle  \hbox{PSL}_2({\bf Z}) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in {\bf Z}: ad-bc = 1 \} / \{\pm 1\}

or congruence subgroups such as

\displaystyle  \Gamma_0(q) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \hbox{PSL}_2({\bf Z}): c = 0\ (q) \} / \{ \pm 1 \} \ \ \ \ \ (3)

for natural number {q}, or to the discrete stabiliser {\Gamma_\infty} of the point at infinity:

\displaystyle  \Gamma_\infty := \{ \pm \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}: b \in {\bf Z} \} / \{\pm 1\}. \ \ \ \ \ (4)

These are discrete subgroups of {\hbox{PSL}_2({\bf R})}, nested by the subgroup inclusions

\displaystyle  \Gamma_\infty \leq \Gamma_0(q) \leq \Gamma_0(1)=\hbox{PSL}_2({\bf Z}) \leq \hbox{PSL}_2({\bf R}).

There are many further discrete subgroups of {\hbox{PSL}_2({\bf R})} (known collectively as Fuchsian groups) that one could consider, but we will focus attention on these three groups in this post.

Any discrete subgroup {\Gamma} of {\hbox{PSL}_2({\bf R})} generates a quotient space {\Gamma \backslash {\mathbf H}}, which in general will be a non-compact two-dimensional orbifold. One can understand such a quotient space by working with a fundamental domain {\hbox{Fund}( \Gamma \backslash {\mathbf H})} – a set consisting of a single representative of each of the orbits {\Gamma z} of {\Gamma} in {{\mathbf H}}. This fundamental domain is by no means uniquely defined, but if the fundamental domain is chosen with some reasonable amount of regularity, one can view {\Gamma \backslash {\mathbf H}} as the fundamental domain with the boundaries glued together in an appropriate sense. Among other things, fundamental domains can be used to induce a volume measure {\mu = \mu_{\Gamma \backslash {\mathbf H}}} on {\Gamma \backslash {\mathbf H}} from the volume measure {\mu = \mu_{\mathbf H}} on {{\mathbf H}} (restricted to a fundamental domain). By abuse of notation we will refer to both measures simply as {\mu} when there is no chance of confusion.

For instance, a fundamental domain for {\Gamma_\infty \backslash {\mathbf H}} is given (up to null sets) by the strip {\{ z \in {\mathbf H}: |\hbox{Re}(z)| < \frac{1}{2} \}}, with {\Gamma_\infty \backslash {\mathbf H}} identifiable with the cylinder formed by gluing together the two sides of the strip. A fundamental domain for {\hbox{PSL}_2({\bf Z}) \backslash {\mathbf H}} is famously given (again up to null sets) by an upper portion {\{ z \in {\mathbf H}: |\hbox{Re}(z)| < \frac{1}{2}; |z| > 1 \}}, with the left and right sides again glued to each other, and the left and right halves of the circular boundary glued to each other. A fundamental domain for {\Gamma_0(q) \backslash {\mathbf H}} can be formed by gluing together

\displaystyle  [\hbox{PSL}_2({\bf Z}) : \Gamma_0(q)] = q \prod_{p|q} (1 + \frac{1}{p}) = q^{1+o(1)}

copies of a fundamental domain for {\hbox{PSL}_2({\bf Z}) \backslash {\mathbf H}} in a rather complicated but interesting fashion.
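The index formula can be checked against an independent count. A standard fact, not derived in this post, is that the cosets of {\Gamma_0(q)} in {\hbox{PSL}_2({\bf Z})} are in bijection with the projective line over {{\bf Z}/q{\bf Z}}, that is to say with pairs {(c,d)} modulo {q} satisfying {(c,d,q)=1}, taken up to multiplication by units. The Python sketch below (hypothetical function names) compares the two counts.

```python
from math import gcd


def prime_factors(q):
    """Distinct prime factors of q, by trial division."""
    factors = []
    p = 2
    while p * p <= q:
        if q % p == 0:
            factors.append(p)
            while q % p == 0:
                q //= p
        p += 1
    if q > 1:
        factors.append(q)
    return factors


def index_gamma0(q):
    """Index formula q * prod_{p | q} (1 + 1/p), in exact integer arithmetic."""
    idx = q
    for p in prime_factors(q):
        idx = idx // p * (p + 1)
    return idx


def projective_line_size(q):
    """Number of primitive pairs (c, d) mod q, divided by the number of units."""
    pairs = sum(1 for c in range(q) for d in range(q)
                if gcd(gcd(c, d), q) == 1)
    unit_count = sum(1 for u in range(1, q + 1) if gcd(u, q) == 1)
    return pairs // unit_count


if __name__ == "__main__":
    for q in (1, 2, 4, 6, 12, 30):
        print(q, index_gamma0(q), projective_line_size(q))
```

For example, {q=12} gives index {12 \cdot \frac{3}{2} \cdot \frac{4}{3} = 24}.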

While fundamental domains can be a convenient choice of coordinates to work with for some computations (as well as for drawing appropriate pictures), it is geometrically more natural to avoid working explicitly on such domains, and instead work directly on the quotient spaces {\Gamma \backslash {\mathbf H}}. In order to analyse functions {f: \Gamma \backslash {\mathbf H} \rightarrow {\bf C}} on such orbifolds, it is convenient to lift such functions back up to {{\mathbf H}} and identify them with functions {f: {\mathbf H} \rightarrow {\bf C}} which are {\Gamma}-automorphic in the sense that {f( \gamma z ) = f(z)} for all {z \in {\mathbf H}} and {\gamma \in \Gamma}. Such functions will be referred to as {\Gamma}-automorphic forms, or automorphic forms for short (we always implicitly assume all such functions to be measurable). (Strictly speaking, these are the automorphic forms with trivial factor of automorphy; one can certainly consider other factors of automorphy, particularly when working with holomorphic modular forms, which corresponds to sections of a more non-trivial line bundle over {\Gamma \backslash {\mathbf H}} than the trivial bundle {(\Gamma \backslash {\mathbf H}) \times {\bf C}} that is implicitly present when analysing scalar functions {f: {\mathbf H} \rightarrow {\bf C}}. However, we will not discuss this (important) more general situation here.)

An important way to create a {\Gamma}-automorphic form is to start with a non-automorphic function {f: {\mathbf H} \rightarrow {\bf C}} obeying suitable decay conditions (e.g. bounded with compact support will suffice) and form the Poincaré series {P_\Gamma[f]: {\mathbf H} \rightarrow {\bf C}} defined by

\displaystyle  P_{\Gamma}[f](z) = \sum_{\gamma \in \Gamma} f(\gamma z),

which is clearly {\Gamma}-automorphic. (One could equivalently write {f(\gamma^{-1} z)} in place of {f(\gamma z)} here; there are good arguments for both conventions, but I have ultimately decided to use the {f(\gamma z)} convention, which makes explicit computations a little neater at the cost of making the group actions work in the opposite order.) Thus we naturally see sums over {\Gamma} associated with {\Gamma}-automorphic forms. A little more generally, given a subgroup {\Gamma_\infty} of {\Gamma} and a {\Gamma_\infty}-automorphic function {f: {\mathbf H} \rightarrow {\bf C}} of suitable decay, we can form a relative Poincaré series {P_{\Gamma_\infty \backslash \Gamma}[f]: {\mathbf H} \rightarrow {\bf C}} by

\displaystyle  P_{\Gamma_\infty \backslash \Gamma}[f](z) = \sum_{\gamma \in \hbox{Fund}(\Gamma_\infty \backslash \Gamma)} f(\gamma z)

where {\hbox{Fund}(\Gamma_\infty \backslash \Gamma)} is any fundamental domain for {\Gamma_\infty \backslash \Gamma}, that is to say a subset of {\Gamma} consisting of exactly one representative for each right coset of {\Gamma_\infty}. As {f} is {\Gamma_\infty}-automorphic, we see (if {f} has suitable decay) that {P_{\Gamma_\infty \backslash \Gamma}[f]} does not depend on the precise choice of fundamental domain, and is {\Gamma}-automorphic. These operations are all compatible with each other, for instance {P_\Gamma = P_{\Gamma_\infty \backslash \Gamma} \circ P_{\Gamma_\infty}}. A key example of Poincaré series are the Eisenstein series, although there are of course many other Poincaré series one can consider by varying the test function {f}.

For future reference we record the basic but fundamental unfolding identities

\displaystyle  \int_{\Gamma \backslash {\mathbf H}} P_\Gamma[f] g\ d\mu_{\Gamma \backslash {\mathbf H}} = \int_{\mathbf H} f g\ d\mu_{\mathbf H} \ \ \ \ \ (5)

for any function {f: {\mathbf H} \rightarrow {\bf C}} with sufficient decay, and any {\Gamma}-automorphic function {g} of reasonable growth (e.g. {f} bounded and compact support, and {g} bounded, will suffice). Note that {g} is viewed as a function on {\Gamma \backslash {\mathbf H}} on the left-hand side, and as a {\Gamma}-automorphic function on {{\mathbf H}} on the right-hand side. More generally, one has

\displaystyle  \int_{\Gamma \backslash {\mathbf H}} P_{\Gamma_\infty \backslash \Gamma}[f] g\ d\mu_{\Gamma \backslash {\mathbf H}} = \int_{\Gamma_\infty \backslash {\mathbf H}} f g\ d\mu_{\Gamma_\infty \backslash {\mathbf H}} \ \ \ \ \ (6)

whenever {\Gamma_\infty \leq \Gamma} are discrete subgroups of {\hbox{PSL}_2({\bf R})}, {f} is a {\Gamma_\infty}-automorphic function with sufficient decay on {\Gamma_\infty \backslash {\mathbf H}}, and {g} is a {\Gamma}-automorphic (and thus also {\Gamma_\infty}-automorphic) function of reasonable growth. These identities will allow us to move fairly freely between the three domains {{\mathbf H}}, {\Gamma_\infty \backslash {\mathbf H}}, and {\Gamma \backslash {\mathbf H}} in our analysis.

When computing various statistics of a Poincaré series {P_\Gamma[f]}, such as its values {P_\Gamma[f](z)} at special points {z}, or the {L^2} quantity {\int_{\Gamma \backslash {\mathbf H}} |P_\Gamma[f]|^2\ d\mu}, expressions of interest to analytic number theory naturally emerge. We list three basic examples of this below, discussed somewhat informally in order to highlight the main ideas rather than the technical details.

The first example we will give concerns the problem of estimating the sum

\displaystyle  \sum_{n \leq x} \tau(n) \tau(n+1), \ \ \ \ \ (7)

where {\tau(n) := \sum_{d|n} 1} is the divisor function. This can be rewritten (by factoring {n=bc} and {n+1=ad}) as

\displaystyle  \sum_{ a,b,c,d \in {\bf N}: ad-bc = 1} 1_{bc \leq x} \ \ \ \ \ (8)

which is basically a sum over the full modular group {\hbox{PSL}_2({\bf Z})}. At this point we will “cheat” a little by moving to the related, but different, sum

\displaystyle  \sum_{a,b,c,d \in {\bf Z}: ad-bc = 1} 1_{a^2+b^2+c^2+d^2 \leq x}. \ \ \ \ \ (9)

This sum is not exactly the same as (8), but will be a little easier to handle, and it is plausible that the methods used to handle this sum can be modified to handle (8). Observe from (2) and some calculation that the distance between {i} and {\begin{pmatrix} a & b \\ c & d \end{pmatrix} i = \frac{ai+b}{ci+d}} is given by the formula

\displaystyle  2(\cosh(d(i,\begin{pmatrix} a & b \\ c & d \end{pmatrix} i))-1) = a^2+b^2+c^2+d^2 - 2

and so one can express the above sum as

\displaystyle  2 \sum_{\gamma \in \hbox{PSL}_2({\bf Z})} 1_{d(i,\gamma i) \leq \hbox{cosh}^{-1}(x/2)}

(the factor of {2} coming from the quotient by {\{\pm 1\}} in the projective special linear group); one can express this as {P_\Gamma[f](i)}, where {\Gamma = \hbox{PSL}_2({\bf Z})} and {f} is the indicator function of the ball {B(i, \hbox{cosh}^{-1}(x/2))}. Thus we see that expressions such as (7) are related to evaluations of Poincaré series. (In practice, it is much better to use smoothed out versions of indicator functions in order to obtain good control on sums such as (7) or (9), but we gloss over this technical detail here.)
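As a sanity check on the passage from (7) to (8), and on the distance identity above, here is a small brute-force sketch in Python. The function names are hypothetical, and the distance formula used, {\cosh d(z,w) = 1 + \frac{|z-w|^2}{2 \hbox{Im}(z) \hbox{Im}(w)}}, is the standard one for the upper half-plane, which I am assuming agrees with (2).

```python
from math import acosh, cosh

def tau(n):
    """Divisor function: the number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def divisor_correlation(x):
    """The sum (7): tau(n) * tau(n+1) summed over 1 <= n <= x."""
    return sum(tau(n) * tau(n + 1) for n in range(1, x + 1))

def count_quadruples(x):
    """The sum (8): natural numbers a, b, c, d with ad - bc = 1 and bc <= x."""
    count = 0
    for b in range(1, x + 1):
        for c in range(1, x // b + 1):   # enforces bc <= x
            n = b * c
            for a in range(1, n + 2):    # a must divide ad = n + 1
                if (n + 1) % a == 0:
                    count += 1           # d = (n + 1) // a is then forced
    return count

def hyperbolic_distance(z, w):
    """Upper half-plane distance (standard formula, assumed to match (2))."""
    return acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))

def distance_identity_gap(a, b, c, d):
    """Left side minus right side of the distance identity, for ad - bc = 1."""
    w = (a * 1j + b) / (c * 1j + d)
    lhs = 2 * (cosh(hyperbolic_distance(1j, w)) - 1)
    rhs = a * a + b * b + c * c + d * d - 2
    return lhs - rhs
```

For instance, `divisor_correlation(10)` and `count_quadruples(10)` should agree, and `distance_identity_gap(2, 1, 1, 1)` should vanish up to rounding error.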

The second example concerns the relative

\displaystyle  \sum_{n \leq x} \tau(n^2+1) \ \ \ \ \ (10)

of the sum (7). Note from multiplicativity that (7) can be written as {\sum_{n \leq x} \tau(n^2+n)}, which is superficially very similar to (10), but with the key difference that the polynomial {n^2+1} is irreducible over the integers.

As with (7), we may expand (10) as

\displaystyle  \sum_{A,B,C \in {\bf N}: B^2 - AC = -1} 1_{B \leq x}.

At first glance this does not look like a sum over a modular group, but one can manipulate this expression into such a form in one of two (closely related) ways. First, observe that any factorisation {B + i = (a-bi) (c+di)} of {B+i} into Gaussian integers {a-bi, c+di} gives rise (upon taking norms) to an identity of the form {B^2 - AC = -1}, where {A = a^2+b^2} and {C = c^2+d^2}. Conversely, by using the unique factorisation of the Gaussian integers, every identity of the form {B^2-AC=-1} gives rise to a factorisation of the form {B+i = (a-bi) (c+di)}, essentially uniquely up to units. Now note that {(a-bi)(c+di)} is of the form {B+i} if and only if {ad-bc=1}, in which case {B = ac+bd}. Thus we can essentially write the above sum as something like

\displaystyle  \sum_{a,b,c,d: ad-bc = 1} 1_{|ac+bd| \leq x} \ \ \ \ \ (11)

and the modular group {\hbox{PSL}_2({\bf Z})} is now manifest. An equivalent way to see these manipulations is as follows. A triple {A,B,C} of natural numbers with {B^2-AC=-1} gives rise to a positive quadratic form {Ax^2+2Bxy+Cy^2} of normalised discriminant {B^2-AC} equal to {-1} with integer coefficients (it is natural here to allow {B} to take integer values rather than just natural number values by essentially doubling the sum). The group {\hbox{PSL}_2({\bf Z})} acts on the space of such quadratic forms in a natural fashion (by composing the quadratic form with the inverse {\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}} of an element {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} of {\hbox{SL}_2({\bf Z})}). Because the discriminant {-1} has class number one (this fact is equivalent to the unique factorisation of the Gaussian integers, as discussed in this previous post), every form {Ax^2 + 2Bxy + Cy^2} in this space is equivalent (under the action of some element of {\hbox{PSL}_2({\bf Z})}) to the standard quadratic form {x^2+y^2}. In other words, one has

\displaystyle  Ax^2 + 2Bxy + Cy^2 = (dx-by)^2 + (-cx+ay)^2

which (up to a harmless sign) is exactly the representation {B = ac+bd}, {A = c^2+d^2}, {C = a^2+b^2} introduced earlier, and leads to the same reformulation of the sum (10) in terms of expressions like (11). Similar considerations also apply if the quadratic polynomial {n^2+1} is replaced by another quadratic, although one has to account for the fact that the class number may now exceed one (so that unique factorisation in the associated quadratic ring of integers breaks down), and in the positive discriminant case the fact that the group of units might be infinite presents another significant technical problem.

Note that {\begin{pmatrix} a & b \\ c & d \end{pmatrix} i = \frac{ai+b}{ci+d}} has real part {\frac{ac+bd}{c^2+d^2}} and imaginary part {\frac{1}{c^2+d^2}}. Thus (11) is (up to a factor of two) the Poincaré series {P_\Gamma[f](i)} as in the preceding example, except that {f} is now the indicator of the sector {\{ z: |\hbox{Re} z| \leq x |\hbox{Im} z| \}}.
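The correspondence between solutions of {B^2-AC=-1} and factorisations {B+i = (a-bi)(c+di)} can be tested by brute force for small {B}. In the sketch below (hypothetical function names), I am assuming that "essentially uniquely up to units" means that each pair {(A,C)} with {AC = B^2+1} corresponds to exactly four quadruples {(a,b,c,d)}, one for each Gaussian unit; this count is my own reading rather than something stated above.

```python
from math import isqrt

def tau(n):
    """Divisor function: the number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def count_factorisations(B):
    """Count integers (a, b, c, d) with ad - bc = 1 and ac + bd = B,
    that is, factorisations B + i = (a - bi)(c + di)."""
    # a^2 + b^2 and c^2 + d^2 multiply to B^2 + 1, so each entry is bounded.
    R = isqrt(B * B + 1)
    span = range(-R, R + 1)
    count = 0
    for a in span:
        for b in span:
            for c in span:
                for d in span:
                    if a * d - b * c == 1 and a * c + b * d == B:
                        count += 1
    return count

def tau_sum_via_factorisations(x):
    """The sum (10), recovered from the factorisation counts (assumed factor 4)."""
    return sum(count_factorisations(B) for B in range(1, x + 1)) // 4
```

Under this assumption, `tau_sum_via_factorisations(x)` should equal {\sum_{n \leq x} \tau(n^2+1)}; note that the sum (11) also includes the terms with {ac+bd \leq 0}, which is one reason the identification is only "essentially" exact.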

Sums involving subgroups of the full modular group, such as {\Gamma_0(q)}, often arise when imposing congruence conditions on sums such as (10), for instance when trying to estimate the expression {\sum_{n \leq x: q|n} \tau(n^2+1)} when {q} and {x} are large. As before, one then soon arrives at the problem of evaluating a Poincaré series at one or more special points, where the series is now over {\Gamma_0(q)} rather than {\hbox{PSL}_2({\bf Z})}.

The third and final example concerns averages of Kloosterman sums

\displaystyle  S(m,n;c) := \sum_{x \in ({\bf Z}/c{\bf Z})^\times} e( \frac{mx + n\overline{x}}{c} ) \ \ \ \ \ (12)

where {e(\theta) := e^{2\pi i\theta}} and {\overline{x}} is the inverse of {x} in the multiplicative group {({\bf Z}/c{\bf Z})^\times}. It turns out that the {L^2} norms of Poincaré series {P_\Gamma[f]} or {P_{\Gamma_\infty \backslash \Gamma}[f]} are closely tied to such averages. Consider for instance the quantity

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} |P_{\Gamma_\infty \backslash \Gamma_0(q)}[f]|^2\ d\mu_{\Gamma \backslash {\mathbf H}} \ \ \ \ \ (13)

where {q} is a natural number and {f} is a {\Gamma_\infty}-automorphic form that is of the form

\displaystyle  f(x+iy) = F(my) e(m x)

for some integer {m} and some test function {F: (0,+\infty) \rightarrow {\bf C}}, which for the sake of discussion we will take to be smooth and compactly supported. Using the unfolding formula (6), we may rewrite (13) as

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} \overline{f} P_{\Gamma_\infty \backslash \Gamma_0(q)}[f]\ d\mu_{\Gamma_\infty \backslash {\mathbf H}}.

To compute this, we use the double coset decomposition

\displaystyle  \Gamma_0(q) = \Gamma_\infty \cup \bigcup_{c \in {\mathbf N}: q|c} \bigcup_{1 \leq d \leq c: (d,c)=1} \Gamma_\infty \begin{pmatrix} a & b \\ c & d \end{pmatrix} \Gamma_\infty,

where for each {c,d}, {a,b} are arbitrarily chosen integers such that {ad-bc=1}. To see this decomposition, observe that every element {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} in {\Gamma_0(q)} outside of {\Gamma_\infty} can be assumed to have {c>0} by applying a sign {\pm}, and then using the row and column operations coming from left and right multiplication by {\Gamma_\infty} (that is, shifting the top row by an integer multiple of the bottom row, and shifting the right column by an integer multiple of the left column) one can place {d} in the interval {[1,c]} and {(a,b)} to be any specified integer pair with {ad-bc=1}. From this we see that

\displaystyle  P_{\Gamma_\infty \backslash \Gamma_0(q)}[f] = f + \sum_{c \in {\mathbf N}: q|c} \sum_{1 \leq d \leq c: (d,c)=1} P_{\Gamma_\infty}[ f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot ) ]

and so from further use of the unfolding formula (5) we may expand (13) as

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} |f|^2\ d\mu_{\Gamma_\infty \backslash {\mathbf H}}

\displaystyle  + \sum_{c \in {\mathbf N}: q|c} \sum_{1 \leq d \leq c: (d,c)=1} \int_{\mathbf H} \overline{f}(z) f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} z)\ d\mu_{\mathbf H}.

The first integral is just {m \int_0^\infty |F(y)|^2 \frac{dy}{y^2}}. The second expression is more interesting. We have

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} z = \frac{az+b}{cz+d} = \frac{a}{c} - \frac{1}{c(cz+d)}

\displaystyle  = \frac{a}{c} - \frac{cx+d}{c((cx+d)^2+c^2y^2)} + \frac{iy}{(cx+d)^2 + c^2y^2}

so we can write

\displaystyle  \int_{\mathbf H} \overline{f}(z) f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} z)\ d\mu_{\mathbf H}

as

\displaystyle  \int_0^\infty \int_{\bf R} \overline{F}(my) F(\frac{my}{(cx+d)^2 + c^2y^2}) e( -mx + \frac{ma}{c} - m \frac{cx+d}{c((cx+d)^2+c^2y^2)} )

\displaystyle \frac{dx dy}{y^2}

which on shifting {x} by {d/c} simplifies a little to

\displaystyle  e( \frac{ma}{c} + \frac{md}{c} ) \int_0^\infty \int_{\bf R} F(my) \bar{F}(\frac{my}{c^2(x^2 + y^2)}) e(- mx - m \frac{x}{c^2(x^2+y^2)} )

\displaystyle  \frac{dx dy}{y^2}

and then on scaling {x,y} by {m} simplifies a little further to

\displaystyle  e( \frac{ma}{c} + \frac{md}{c} ) \int_0^\infty \int_{\bf R} F(y) \bar{F}(\frac{m^2}{c^2} \frac{y}{x^2 + y^2}) e(- x - \frac{m^2}{c^2} \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}.

Note that as {ad-bc=1}, we have {a = \overline{d}} modulo {c}. Comparing the above calculations with (12), we can thus write (13) as

\displaystyle  m (\int_0^\infty |F(y)|^2 \frac{dy}{y^2} + \sum_{q|c} \frac{S(m,m;c)}{c} V(\frac{m}{c})) \ \ \ \ \ (14)

where

\displaystyle  V(u) := \frac{1}{u} \int_0^\infty \int_{\bf R} F(y) \bar{F}(u^2 \frac{y}{x^2 + y^2}) e(- x - u^2 \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}

is a certain integral involving {F} and a parameter {u}, but which does not depend explicitly on parameters such as {m,c,d}. Thus we have indeed expressed the {L^2} expression (13) in terms of Kloosterman sums. It is possible to invert this analysis and express various weighted sums of Kloosterman sums in terms of {L^2} expressions (possibly involving inner products instead of norms) of Poincaré series, but we will not do so here; see Chapter 16 of Iwaniec and Kowalski for further details.
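To make the Kloosterman sums (12) concrete, here is a short numerical sketch (hypothetical function names). It computes {S(m,n;c)} directly from the definition, and also recovers {S(m,m;c)} from the phases {e(\frac{ma}{c} + \frac{md}{c})} by completing each bottom row {(c,d)} to a matrix of determinant one. The comparison with the Weil bound uses the form {|S(m,n;c)| \leq \tau(c) (m,n,c)^{1/2} c^{1/2}}, which is the standard statement as I recall it and is an assumption here rather than something stated above.

```python
import cmath
from math import gcd, sqrt

def kloosterman(m, n, c):
    """S(m, n; c) as in (12), summed over invertible residues x mod c."""
    total = 0j
    for x in range(c):
        if gcd(x, c) == 1:
            x_inv = pow(x, -1, c) if c > 1 else 0   # inverse of x mod c
            total += cmath.exp(2j * cmath.pi * (m * x + n * x_inv) / c)
    return total

def complete_matrix(c, d):
    """Integers (a, b) with ad - bc = 1, given coprime c >= 1 and d."""
    a = pow(d, -1, c) if c > 1 else 1
    b = (a * d - 1) // c
    return a, b

def bottom_row_phase_sum(m, c):
    """Sum of e(ma/c + md/c) over 1 <= d <= c coprime to c."""
    total = 0j
    for d in range(1, c + 1):
        if gcd(d, c) == 1:
            a, _ = complete_matrix(c, d)
            total += cmath.exp(2j * cmath.pi * m * (a + d) / c)
    return total

def weil_bound(m, n, c):
    """tau(c) * sqrt(gcd(m, n, c)) * sqrt(c) (assumed form of the Weil bound)."""
    tau_c = sum(1 for k in range(1, c + 1) if c % k == 0)
    return tau_c * sqrt(gcd(gcd(m, n), c) * c)
```

For example, `kloosterman(1, 1, 3)` should be close to {-1}, and `bottom_row_phase_sum(m, c)` should agree with `kloosterman(m, m, c)` because {a = \overline{d}} modulo {c}.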

Traditionally, automorphic forms have been analysed using the spectral theory of the Laplace-Beltrami operator {-\Delta} on spaces such as {\Gamma\backslash {\mathbf H}} or {\Gamma_\infty \backslash {\mathbf H}}, so that a Poincaré series such as {P_\Gamma[f]} might be expanded out using inner products of {P_\Gamma[f]} (or, by the unfolding identities, {f}) with various generalised eigenfunctions of {-\Delta} (such as cuspidal eigenforms, or Eisenstein series). With this approach, special functions, and specifically the modified Bessel functions {K_{it}} of the second kind, play a prominent role, basically because the {\Gamma_\infty}-automorphic functions

\displaystyle  x+iy \mapsto y^{1/2} K_{it}(2\pi |m| y) e(mx)

for {t \in {\bf R}} and {m \in {\bf Z}} non-zero are generalised eigenfunctions of {-\Delta} (with eigenvalue {\frac{1}{4}+t^2}), and are almost square-integrable on {\Gamma_\infty \backslash {\mathbf H}} (the {L^2} norm diverges only logarithmically at one end {y \rightarrow 0^+} of the cylinder {\Gamma_\infty \backslash {\mathbf H}}, while decaying exponentially fast at the other end {y \rightarrow +\infty}).

However, as discussed in this previous post, the spectral theory of an essentially self-adjoint operator such as {-\Delta} is basically equivalent to the theory of various solution operators associated to partial differential equations involving that operator, such as the Helmholtz equation {(-\Delta + k^2) u = f}, the heat equation {\partial_t u = \Delta u}, the Schrödinger equation {i\partial_t u + \Delta u = 0}, or the wave equation {\partial_{tt} u = \Delta u}. Thus, one can hope to rephrase many arguments that involve spectral data of {-\Delta} into arguments that instead involve resolvents {(-\Delta + k^2)^{-1}}, heat kernels {e^{t\Delta}}, Schrödinger propagators {e^{it\Delta}}, or wave propagators {e^{\pm it\sqrt{-\Delta}}}, or involve the PDE more directly (e.g. applying integration by parts and energy methods to solutions of such PDE). This is certainly done to some extent in the existing literature; resolvents and heat kernels, for instance, are often utilised. In this post, I would like to explore the possibility of reformulating spectral arguments instead using the inhomogeneous wave equation

\displaystyle  \partial_{tt} u - \Delta u = F.

Actually it will be a bit more convenient to normalise the Laplacian by {\frac{1}{4}}, and look instead at the automorphic wave equation

\displaystyle  \partial_{tt} u + (-\Delta - \frac{1}{4}) u = F. \ \ \ \ \ (15)

This equation somewhat resembles a “Klein-Gordon” type equation, except that the mass is imaginary! This would lead to pathological behaviour were it not for the negative curvature, which in principle creates a spectral gap of {\frac{1}{4}} that cancels out this factor.

The point is that the wave equation approach gives access to some nice PDE techniques, such as energy methods, Sobolev inequalities and finite speed of propagation, which are somewhat submerged in the spectral framework. The wave equation also interacts well with Poincaré series; if for instance {u} and {F} are {\Gamma_\infty}-automorphic solutions to (15) obeying suitable decay conditions, then their Poincaré series {P_{\Gamma_\infty \backslash \Gamma}[u]} and {P_{\Gamma_\infty \backslash \Gamma}[F]} will be {\Gamma}-automorphic solutions to the same equation (15), basically because the Laplace-Beltrami operator commutes with translations. Because of these facts, it is possible to replicate several standard spectral theory arguments in the wave equation framework, without having to deal directly with things like the asymptotics of modified Bessel functions. The wave equation approach to automorphic theory was introduced by Faddeev and Pavlov (using the Lax-Phillips scattering theory), and developed further by Lax and Phillips, to recover many spectral facts about the Laplacian on modular curves, such as the Weyl law and the Selberg trace formula. Here, I will illustrate this by deriving three basic applications of automorphic methods in a wave equation framework, namely

  • Using the Weil bound on Kloosterman sums to derive Selberg’s 3/16 theorem on the least non-trivial eigenvalue for {-\Delta} on {\Gamma_0(q) \backslash {\mathbf H}} (discussed previously here);
  • Conversely, showing that Selberg’s eigenvalue conjecture (improving Selberg’s {3/16} bound to the optimal {1/4}) implies an optimal bound on (smoothed) sums of Kloosterman sums; and
  • Using the same bound to obtain pointwise bounds on Poincaré series similar to the ones discussed above. (Actually, the argument here does not use the wave equation, instead it just uses the Sobolev inequality.)

This post originated from an attempt to finally learn this part of analytic number theory properly, and to see if I could use a PDE-based perspective to understand it better. Ultimately, this is not that dramatic a departure from the standard approach to this subject, but I found it useful to think of things in this fashion, probably due to my existing background in PDE.

I thank Bill Duke and Ben Green for helpful discussions. My primary reference for this theory was Chapters 15, 16, and 21 of Iwaniec and Kowalski.

The equidistribution theorem asserts that if {\alpha \in {\bf R}/{\bf Z}} is an irrational phase, then the sequence {(n\alpha)_{n=1}^\infty} is equidistributed on the unit circle, or equivalently that

\displaystyle \frac{1}{N} \sum_{n=1}^N F(n\alpha) \rightarrow \int_{{\bf R}/{\bf Z}} F(x)\ dx

for any continuous (or equivalently, for any smooth) function {F: {\bf R}/{\bf Z} \rightarrow {\bf C}}. By approximating {F} uniformly by a Fourier series, this claim is equivalent to that of showing that

\displaystyle \frac{1}{N} \sum_{n=1}^N e(hn\alpha) \rightarrow 0

for any non-zero integer {h} (where {e(x) := e^{2\pi i x}}), which is easily verified from the irrationality of {\alpha} and the geometric series formula. Conversely, if {\alpha} is rational, then clearly {\frac{1}{N} \sum_{n=1}^N e(hn\alpha)} fails to go to zero when {h} is a multiple of the denominator of {\alpha}.
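The decay of these averages, and the geometric series bound behind it, are easy to observe numerically. The following is a minimal sketch with hypothetical function names; the bound being checked is {|\sum_{n=1}^N e(hn\alpha)| \leq \frac{2}{|e(h\alpha)-1|}}, which follows from summing the geometric series.

```python
import cmath
from math import pi

def e(x):
    """The standard character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * pi * x)

def averaged_sum(h, alpha, N):
    """(1/N) * sum of e(h n alpha) over 1 <= n <= N."""
    return sum(e(h * n * alpha) for n in range(1, N + 1)) / N

def geometric_bound(h, alpha, N):
    """The bound 2 / (N |e(h alpha) - 1|); requires h * alpha not an integer."""
    return 2 / (N * abs(e(h * alpha) - 1))
```

With {\alpha = \sqrt{2}} the averages should sit below `geometric_bound` and shrink like {1/N}, whereas with {\alpha = 1/4} and {h = 4} every term equals {1} and the average does not decay at all.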

One can then ask for more quantitative information about the decay of exponential sums such as {\frac{1}{N} \sum_{n=1}^N e(n \alpha)}, or more generally on exponential sums of the form {\frac{1}{|Q|} \sum_{n \in Q} e(P(n))} for an arithmetic progression {Q} (in this post all progressions are understood to be finite) and a polynomial {P: Q \rightarrow {\bf R}/{\bf Z}}. It will be convenient to phrase such information in the form of an inverse theorem, describing those phases for which the exponential sum is large. Indeed, we have

Lemma 1 (Geometric series formula, inverse form) Let {Q \subset {\bf Z}} be an arithmetic progression of length at most {N} for some {N \geq 1}, and let {P(n) = n \alpha + \beta} be a linear polynomial for some {\alpha,\beta \in {\bf R}/{\bf Z}}. If

\displaystyle \frac{1}{N} |\sum_{n \in Q} e(P(n))| \geq \delta

for some {\delta > 0}, then there exists a subprogression {Q'} of {Q} of size {|Q'| \gg \delta^2 N} such that {P(n)} varies by at most {\delta} on {Q'} (that is to say, {P(n)} lies in a subinterval of {{\bf R}/{\bf Z}} of length at most {\delta}).

Proof: By a linear change of variable we may assume that {Q} is of the form {\{0,\dots,N'-1\}} for some {N' \geq 1}. We may of course assume that {\alpha} is non-zero in {{\bf R}/{\bf Z}}, so that {\|\alpha\|_{{\bf R}/{\bf Z}} > 0} ({\|x\|_{{\bf R}/{\bf Z}}} denotes the distance from {x} to the nearest integer). From the geometric series formula we see that

\displaystyle |\sum_{n \in Q} e(P(n))| \leq \frac{2}{|e(\alpha) - 1|} \ll \frac{1}{\|\alpha\|_{{\bf R}/{\bf Z}}},

and so {\|\alpha\|_{{\bf R}/{\bf Z}} \ll \frac{1}{\delta N}}. Setting {Q' := \{ n \in Q: n \leq c \delta^2 N \}} for some sufficiently small absolute constant {c}, we obtain the claim. \Box

Thus, in order for a linear phase {P(n)} to fail to be equidistributed on some long progression {Q}, {P} must in fact be almost constant on a large piece of {Q}.
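Lemma 1 can be tested on a grid of frequencies. In the sketch below (hypothetical names), {Q = \{0,\dots,N-1\}}, and the explicit constants are my own choices: the geometric series bound gives {\|\alpha\|_{{\bf R}/{\bf Z}} \leq \frac{1}{2\delta N}} whenever the normalised sum is at least {\delta}, and I take {c=1} in the definition of {Q'}, which happens to suffice for this check.

```python
import cmath
from math import floor, pi

def dist_to_nearest_integer(x):
    """The norm ||x|| on R/Z."""
    return abs(x - round(x))

def normalised_linear_sum(alpha, beta, N):
    """(1/N) |sum of e(n alpha + beta)| over Q = {0, ..., N-1}."""
    total = sum(cmath.exp(2j * pi * (n * alpha + beta)) for n in range(N))
    return abs(total) / N

def variation_on_initial_piece(alpha, delta, N, c=1.0):
    """Bound on the variation of n * alpha + beta over
    Q' = {n in Q : n <= c delta^2 N} (c is a hypothetical choice)."""
    largest = min(N - 1, floor(c * delta ** 2 * N))
    return largest * dist_to_nearest_integer(alpha)
```

Scanning {\alpha} over multiples of {1/200} with {N=50} and {\delta=0.2}, the frequencies with a large sum should all lie within {1/(2\delta N) = 0.05} of an integer, and on the corresponding {Q'} the phase should vary by at most {\delta}.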

As is well known, this phenomenon generalises to higher order polynomials. To achieve this, we need two elementary additional lemmas. The first relates the exponential sums of {P} to the exponential sums of its “first derivatives” {n \mapsto P(n+h)-P(n)}.

Lemma 2 (Van der Corput lemma, inverse form) Let {Q \subset {\bf Z}} be an arithmetic progression of length at most {N}, and let {P: Q \rightarrow {\bf R}/{\bf Z}} be an arbitrary function such that

\displaystyle \frac{1}{N} |\sum_{n \in Q} e(P(n))| \geq \delta \ \ \ \ \ (1)


for some {\delta > 0}. Then, for {\gg \delta^2 N} integers {h \in Q-Q}, there exists a subprogression {Q_h} of {Q}, of the same spacing as {Q}, such that

\displaystyle \frac{1}{N} |\sum_{n \in Q_h} e(P(n+h)-P(n))| \gg \delta^2. \ \ \ \ \ (2)


Proof: Squaring (1), we see that

\displaystyle \sum_{n,n' \in Q} e(P(n') - P(n)) \geq \delta^2 N^2.

We write {n' = n+h} and conclude that

\displaystyle \sum_{h \in Q-Q} \sum_{n \in Q_h} e( P(n+h)-P(n) ) \geq \delta^2 N^2

where {Q_h := Q \cap (Q-h)} is a subprogression of {Q} of the same spacing. Since {\sum_{n \in Q_h} e( P(n+h)-P(n) ) = O(N)}, we conclude that

\displaystyle |\sum_{n \in Q_h} e( P(n+h)-P(n) )| \gg \delta^2 N

for {\gg \delta^2 N} values of {h} (this can be seen, much like the pigeonhole principle, by arguing via contradiction for a suitable choice of implied constants). The claim follows. \Box
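The identity underlying this proof, namely that {|\sum_{n \in Q} e(P(n))|^2} equals the sum over {h} of the differenced sums over {Q_h = Q \cap (Q-h)}, can be checked numerically. This is a minimal sketch for {Q = \{0,\dots,N-1\}} with hypothetical function names; `P` is any real-valued function, interpreted modulo {1}.

```python
import cmath
from math import pi

def e(x):
    """The standard character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * pi * x)

def squared_sum(P, N):
    """|sum of e(P(n)) over n in Q|^2 for Q = {0, ..., N-1}."""
    return abs(sum(e(P(n)) for n in range(N))) ** 2

def differenced_sums(P, N):
    """Map each h in Q - Q to the sum of e(P(n+h) - P(n)) over n in Q_h."""
    sums = {}
    for h in range(-(N - 1), N):
        # Q_h = Q ∩ (Q - h): those n with both n and n + h in {0, ..., N-1}
        Q_h = range(max(0, -h), min(N, N - h))
        sums[h] = sum(e(P(n + h) - P(n)) for n in Q_h)
    return sums
```

Summing the values of `differenced_sums(P, N)` should reproduce `squared_sum(P, N)`, and the {h=0} term is always equal to {N}; the pigeonholing step in the proof then selects those {h} whose term is at least a constant multiple of {\delta^2 N} in magnitude.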

The second lemma (which we recycle from this previous blog post) is a variant of the equidistribution theorem.

Lemma 3 (Vinogradov lemma) Let {I \subset [-N,N] \cap {\bf Z}} be an interval for some {N \geq 1}, and let {\theta \in{\bf R}/{\bf Z}} be such that {\|n\theta\|_{{\bf R}/{\bf Z}} \leq \varepsilon} for at least {\delta N} values of {n \in I}, for some {0 < \varepsilon, \delta < 1}. Then either

\displaystyle N < \frac{2}{\delta}

or

\displaystyle \varepsilon > 10^{-2} \delta

or else there is a natural number {q \leq 2/\delta} such that

\displaystyle \| q \theta \|_{{\bf R}/{\bf Z}} \ll \frac{\varepsilon}{\delta N}.

Proof: We may assume that {N \geq \frac{2}{\delta}} and {\varepsilon \leq 10^{-2} \delta}, since we are done otherwise. Then there are at least two {n \in I} with {\|n \theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon}, and by the pigeonhole principle we can find {n_1 < n_2} in {I} with {\|n_1 \theta \|_{{\bf R}/{\bf Z}}, \|n_2 \theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon} and {n_2-n_1 \leq \frac{2}{\delta}}. By the triangle inequality, we conclude that there exists at least one natural number {q \leq \frac{2}{\delta}} for which

\displaystyle \| q \theta \|_{{\bf R}/{\bf Z}} \leq 2\varepsilon.

We take {q} to be minimal amongst all such natural numbers, then we see that there exists {a} coprime to {q} and {|\kappa| \leq 2\varepsilon} such that

\displaystyle \theta = \frac{a}{q} + \frac{\kappa}{q}. \ \ \ \ \ (3)


If {\kappa=0} then we are done, so suppose that {\kappa \neq 0}. Suppose that {n < m} are elements of {I} such that {\|n\theta \|_{{\bf R}/{\bf Z}}, \|m\theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon} and {m-n \leq \frac{1}{10 |\kappa|}}. Writing {m-n = qk + r} for some {0 \leq r < q}, we have

\displaystyle \| (m-n) \theta \|_{{\bf R}/{\bf Z}} = \| \frac{ra}{q} + (m-n) \frac{\kappa}{q} \|_{{\bf R}/{\bf Z}} \leq 2\varepsilon.

By hypothesis, {|(m-n) \frac{\kappa}{q}| \leq \frac{1}{10 q}}; note that as {q \leq 2/\delta} and {\varepsilon \leq 10^{-2} \delta} we also have {\varepsilon \leq \frac{1}{10q}}. This implies that {\| \frac{ra}{q} \|_{{\bf R}/{\bf Z}} < \frac{1}{q}} and thus {r=0}. We then have

\displaystyle |k \kappa| \leq 2 \varepsilon.

We conclude that for fixed {n \in I} with {\|n\theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon}, there are at most {\frac{2\varepsilon}{|\kappa|}} elements {m} of {[n, n + \frac{1}{10 |\kappa|}]} such that {\|m\theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon}. Iterating this with a greedy algorithm, we see that the number of {n \in I} with {\|n\theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon} is at most {(\frac{N}{1/10|\kappa|} + 1) 2\varepsilon/|\kappa|}; since {\varepsilon < 10^{-2} \delta}, this implies that

\displaystyle \delta N \ll 2 \varepsilon / |\kappa|

and the claim follows. \Box
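Here is a small numerical illustration of Lemma 3, with hypothetical function names and an example of my own choosing: {\theta = \frac{1}{3} + 10^{-6}}, {I = \{1,\dots,1000\}} and {\varepsilon = 0.002}. The multiples of {3} are exactly the {n \in I} with {\|n\theta\|_{{\bf R}/{\bf Z}} \leq \varepsilon}, so {\delta} is about {1/3}, and the lemma predicts a denominator {q \leq 2/\delta} with {\|q\theta\|_{{\bf R}/{\bf Z}}} of size about {\varepsilon/(\delta N)}.

```python
def dist_to_nearest_integer(x):
    """The norm ||x|| on R/Z."""
    return abs(x - round(x))

def count_small_multiples(theta, N, eps):
    """Number of n in I = {1, ..., N} with ||n theta|| <= eps."""
    return sum(1 for n in range(1, N + 1)
               if dist_to_nearest_integer(n * theta) <= eps)

def best_denominator(theta, q_max):
    """The q in {1, ..., q_max} minimising ||q theta|| (smallest q on ties)."""
    return min(range(1, q_max + 1),
               key=lambda q: dist_to_nearest_integer(q * theta))
```

In this example one can take `delta = count_small_multiples(theta, 1000, 0.002) / 1000` and then call `best_denominator(theta, int(2 / delta))`, which should return {q=3}. The lemma only asserts the final bound up to an implied constant, so this checks a single instance rather than the statement itself.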

Now we can quickly obtain a higher degree version of Lemma 1:

Proposition 4 (Weyl exponential sum estimate, inverse form) Let {Q \subset {\bf Z}} be an arithmetic progression of length at most {N} for some {N \geq 1}, and let {P: {\bf Z} \rightarrow {\bf R}/{\bf Z}} be a polynomial of some degree at most {d \geq 0}. If

\displaystyle \frac{1}{N} |\sum_{n \in Q} e(P(n))| \geq \delta

for some {\delta > 0}, then there exists a subprogression {Q'} of {Q} with {|Q'| \gg_d \delta^{O_d(1)} N} such that {P} varies by at most {\delta} on {Q'}.

Proof: We induct on {d}. The cases {d=0,1} are immediate from Lemma 1. Now suppose that {d \geq 2}, and that the claim had already been proven for {d-1}. To simplify the notation we allow implied constants to depend on {d}. Let the hypotheses be as in the proposition. Clearly {\delta} cannot exceed {1}. By shrinking {\delta} as necessary we may assume that {\delta \leq c} for some sufficiently small constant {c} depending on {d}.

By rescaling we may assume {Q \subset [0,N] \cap {\bf Z}}. By Lemma 2, we see that for {\gg \delta^2 N} choices of {h \in [-N,N] \cap {\bf Z}}, one has

\displaystyle \frac{1}{N} |\sum_{n \in I_h} e(P(n+h) - P(n))| \gg \delta^2

for some interval {I_h \subset [0,N] \cap {\bf Z}}. We write {P(n) = \sum_{i \leq d} \alpha_i n^i}, then {P(n+h)-P(n)} is a polynomial of degree at most {d-1} with leading term {d h \alpha_d n^{d-1}}. We conclude from the induction hypothesis that for each such {h}, there exists a natural number {q_h \ll \delta^{-O(1)}} (into which we absorb the factor of {d}) such that {\|q_h h \alpha_d \|_{{\bf R}/{\bf Z}} \ll \delta^{-O(1)} / N^{d-1}}. By double-counting, this implies that there are {\gg \delta^{O(1)} N} integers {n} in the interval {[-\delta^{-O(1)} N, \delta^{-O(1)} N] \cap {\bf Z}} such that {\|n \alpha_d \|_{{\bf R}/{\bf Z}} \ll \delta^{-O(1)} / N^{d-1}}. Applying Lemma 3, we conclude that either {N \ll \delta^{-O(1)}}, or that there is a natural number {q \ll \delta^{-O(1)}} such that

\displaystyle \| q \alpha_d \|_{{\bf R}/{\bf Z}} \ll \delta^{-O(1)} / N^d. \ \ \ \ \ (4)


In the former case the claim is trivial (just take {Q'} to be a point), so we may assume that we are in the latter case.

We partition {Q} into arithmetic progressions {Q'} of spacing {q} and length comparable to {\delta^{C} N} for some large {C} depending on {d} to be chosen later. By hypothesis, we have

\displaystyle \frac{1}{|Q|} |\sum_{n \in Q} e(P(n))| \geq \delta

so by the pigeonhole principle, we have

\displaystyle \frac{1}{|Q'|} |\sum_{n \in Q'} e(P(n))| \geq \delta

for at least one such progression {Q'}. On this progression, we may use the binomial theorem and (4) to write {\alpha_d n^d} as a polynomial in {n} of degree at most {d-1}, plus an error of size {O(\delta^{C - O(1)})}. We thus can write {P(n) = P'(n) + O(\delta^{C-O(1)})} for {n \in Q'} for some polynomial {P'} of degree at most {d-1}. By the triangle inequality, we thus have (for {C} large enough) that

\displaystyle \frac{1}{|Q'|} |\sum_{n \in Q'} e(P'(n))| \gg \delta

and hence by the induction hypothesis we may find a subprogression {Q''} of {Q'} of size {|Q''| \gg \delta^{O(1)} N} such that {P'} varies by at most {\delta/2} on {Q''}, and thus (for {C} large enough again) that {P} varies by at most {\delta} on {Q''}, and the claim follows. \Box

This gives the following corollary (also given as Exercise 16 in this previous blog post):

Corollary 5 (Weyl exponential sum estimate, inverse form II) Let {I \subset [-N,N] \cap {\bf Z}} be a discrete interval for some {N \geq 1}, and let {P(n) = \sum_{i \leq d} \alpha_i n^i} be a polynomial of some degree at most {d \geq 0} for some {\alpha_0,\dots,\alpha_d \in {\bf R}/{\bf Z}}. If

\displaystyle \frac{1}{N} |\sum_{n \in I} e(P(n))| \geq \delta

for some {\delta > 0}, then there is a natural number {q \ll_d \delta^{-O_d(1)}} such that {\| q\alpha_i \|_{{\bf R}/{\bf Z}} \ll_d \delta^{-O_d(1)} N^{-i}} for all {i=0,\dots,d}.

One can obtain much better exponents here using Vinogradov’s mean value theorem; see Theorem 1.6 of this paper of Wooley. (Thanks to Mariusz Mirek for this reference.) However, this weaker result already suffices for many applications, and does not need any result as deep as the mean value theorem.

Proof: To simplify notation we allow implied constants to depend on {d}. As before, we may assume that {\delta \leq c} for some small constant {c>0} depending only on {d}. We may also assume that {N \geq \delta^{-C}} for some large {C}, as the claim is trivial otherwise (set {q=1}).

Applying Proposition 4, we can find a natural number {q \ll \delta^{-O(1)}} and an arithmetic subprogression {Q} of {I} such that {|Q| \gg \delta^{O(1)} N} and such that {P} varies by at most {\delta} on {Q}. Writing {Q = \{ qn+r: n \in I'\}} for some interval {I' \subset [0,N] \cap {\bf Z}} of length {\gg \delta^{O(1)} N} and some {0 \leq r < q}, we conclude that the polynomial {n \mapsto P(qn+r)} varies by at most {\delta} on {I'}. Taking {d^{th}} order differences, we conclude that the {d^{th}} coefficient of this polynomial is {O(\delta^{-O(1)} / N^d)}; by the binomial theorem, this implies that {n \mapsto P(qn+r)} differs by at most {O(\delta)} on {I'} from a polynomial of degree at most {d-1}. Iterating this, we conclude that the {i^{th}} coefficient of {n \mapsto P(qn+r)} is {O(\delta N^{-i})} for {i=0,\dots,d}, and the claim then follows by inverting the change of variables {n \mapsto qn+r} (and replacing {q} with a larger quantity such as {q^d} as necessary). \Box
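To see Corollary 5 in action, one can compare a quadratic phase with rational leading coefficient against a generic one. The sketch below uses hypothetical names; the rational example relies on the classical fact that the quadratic Gauss sum {\sum_{n=0}^{6} e(3n^2/7)} has magnitude {\sqrt{7}}, which is not discussed above.

```python
import cmath
from math import pi

def normalised_weyl_sum(phase, N):
    """(1/N) |sum of e(phase(n)) over 0 <= n < N|, with phase(n) a real
    number interpreted modulo 1."""
    total = sum(cmath.exp(2j * pi * phase(n)) for n in range(N))
    return abs(total) / N

def rational_quadratic(a, q):
    """The phase n -> a n^2 / q mod 1, reduced exactly to avoid rounding."""
    return lambda n: (a * n * n % q) / q

def real_quadratic(alpha):
    """The phase n -> alpha n^2 for a real number alpha."""
    return lambda n: alpha * n * n
```

For instance, `normalised_weyl_sum(rational_quadratic(3, 7), 700)` should be close to {1/\sqrt{7}}, consistent with the corollary since {q=7} annihilates the leading coefficient; by contrast, `normalised_weyl_sum(real_quadratic(2 ** 0.5), 700)` should come out noticeably smaller.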

For future reference we also record a higher degree version of the Vinogradov lemma.

Lemma 6 (Polynomial Vinogradov lemma) Let {I \subset [-N,N] \cap {\bf Z}} be a discrete interval for some {N \geq 1}, and let {P: {\bf Z} \rightarrow {\bf R}/{\bf Z}} be a polynomial {P(n) = \sum_{i \leq d} \alpha_i n^i} of degree at most {d} for some {d \geq 1} such that {\|P(n)\|_{{\bf R}/{\bf Z}} \leq \varepsilon} for at least {\delta N} values of {n \in I}, for some {0 < \varepsilon, \delta < 1}. Then either

\displaystyle N \ll_d \delta^{-O_d(1)} \ \ \ \ \ (5)

or

\displaystyle \varepsilon \gg_d \delta^{O_d(1)} \ \ \ \ \ (6)


or else there is a natural number {q \ll_d \delta^{-O_d(1)}} such that

\displaystyle \| q \alpha_i \|_{{\bf R}/{\bf Z}} \ll \frac{\delta^{-O(1)} \varepsilon}{N^i}

for all {i=0,\dots,d}.

Proof: We induct on {d}. For {d=1} this follows from Lemma 3 (noting that if {\|P(n)\|_{{\bf R}/{\bf Z}}, \|P(n_0)\|_{{\bf R}/{\bf Z}} \leq \varepsilon} then {\|P(n)-P(n_0)\|_{{\bf R}/{\bf Z}} \leq 2\varepsilon}), so suppose that {d \geq 2} and that the claim is already proven for {d-1}. We now allow all implied constants to depend on {d}.

For each {h \in [-2N,2N] \cap {\bf Z}}, let {N_h} denote the number of {n \in [-N,N] \cap {\bf Z}} such that {\| P(n+h)\|_{{\bf R}/{\bf Z}}, \|P(n)\|_{{\bf R}/{\bf Z}} \leq \varepsilon}. By hypothesis, {\sum_{h \in [-2N,2N] \cap {\bf Z}} N_h \gg \delta^2 N^2}, and clearly {N_h = O(N)}, so we must have {N_h \gg \delta^2 N} for {\gg \delta^2 N} choices of {h}. For each such {h}, we then have {\|P(n+h)-P(n)\|_{{\bf R}/{\bf Z}} \leq 2\varepsilon} for {\gg \delta^2 N} choices of {n \in [-N,N] \cap {\bf Z}}, so by induction hypothesis, either (5) or (6) holds, or else for {\gg \delta^{O(1)} N} choices of {h \in [-2N,2N] \cap {\bf Z}}, there is a natural number {q_h \ll \delta^{-O(1)}} such that

\displaystyle \| q_h \alpha_{i,h} \|_{{\bf R}/{\bf Z}} \ll \frac{\delta^{-O(1)} \varepsilon}{N^i}

for {i=1,\dots,d-1}, where {\alpha_{i,h}} are the coefficients of the degree {d-1} polynomial {n \mapsto P(n+h)-P(n)}. We may of course assume it is the latter which holds. By the pigeonhole principle we may take {q_h= q} to be independent of {h}.

Since {\alpha_{d-1,h} = dh \alpha_d}, we have

\displaystyle \| qd h \alpha_d \|_{{\bf R}/{\bf Z}} \ll \frac{\delta^{-O(1)} \varepsilon}{N^{d-1}}

for {\gg \delta^{O(1)} N} choices of {h}, so by Lemma 3, either (5) or (6) holds, or else (after increasing {q} as necessary) we have

\displaystyle \| q \alpha_d \|_{{\bf R}/{\bf Z}} \ll \frac{\delta^{-O(1)} \varepsilon}{N^d}.

We can again assume it is the latter that holds. This implies that {q \alpha_{d-2,h} = q (d-1) h \alpha_{d-1} + O( \delta^{-O(1)} \varepsilon / N^{d-2} )} modulo {1}, so that

\displaystyle \| q(d-1) h \alpha_{d-1} \|_{{\bf R}/{\bf Z}} \ll \frac{\delta^{-O(1)} \varepsilon}{N^{d-2}}

for {\gg \delta^{O(1)} N} choices of {h}. Arguing as before and iterating, we obtain the claim. \Box
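The coefficient identities used in this induction, such as {\alpha_{d-1,h} = dh \alpha_d}, come from the binomial theorem and can be checked mechanically. The following sketch (hypothetical function name) works with integer coefficients purely for exactness; in the lemma the coefficients live in {{\bf R}/{\bf Z}}, but the algebra is the same.

```python
from math import comb

def shift_difference(coeffs, h):
    """Coefficients [alpha_{0,h}, ..., alpha_{d-1,h}] of n -> P(n+h) - P(n),
    where P(n) = sum of coeffs[i] * n**i and d = len(coeffs) - 1."""
    d = len(coeffs) - 1
    out = [0] * d
    for i in range(1, d + 1):
        for j in range(i):
            # (n + h)^i - n^i contributes comb(i, j) * h^(i-j) to n^j for j < i
            out[j] += coeffs[i] * comb(i, j) * h ** (i - j)
    return out
```

With `coeffs = [5, -2, 7, 3]` (so {d=3}) and {h=4}, the top entry of the result is {dh\alpha_d = 36}, and the next one is {(d-1) h \alpha_{d-1} + \binom{d}{2} h^2 \alpha_d}; the second term there is the error that becomes negligible once {\|q \alpha_d\|_{{\bf R}/{\bf Z}}} is known to be small.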

The above results also extend to higher dimensions. Here is the higher dimensional version of Proposition 4:

Proposition 7 (Multidimensional Weyl exponential sum estimate, inverse form) Let {k \geq 1} and {N_1,\dots,N_k \geq 1}, and let {Q_i \subset {\bf Z}} be arithmetic progressions of length at most {N_i} for each {i=1,\dots,k}. Let {P: {\bf Z}^k \rightarrow {\bf R}/{\bf Z}} be a polynomial of degrees at most {d_1,\dots,d_k} in each of the {k} variables {n_1,\dots,n_k} separately. If

\displaystyle \frac{1}{N_1 \dots N_k} |\sum_{n \in Q_1 \times \dots \times Q_k} e(P(n))| \geq \delta

for some {\delta > 0}, then there exists a subprogression {Q'_i} of {Q_i} with {|Q'_i| \gg_{k,d_1,\dots,d_k} \delta^{O_{k,d_1,\dots,d_k}(1)} N_i} for each {i=1,\dots,k} such that {P} varies by at most {\delta} on {Q'_1 \times \dots \times Q'_k}.

A much more general statement, in which the polynomial phase {n \mapsto e(P(n))} is replaced by a nilsequence, and in which one does not necessarily assume the exponential sum is small, is given in Theorem 8.6 of this paper of Ben Green and myself, but it involves far more notation to even state properly.

Proof: We induct on {k}. The case {k=1} was established in Proposition 4, so we assume that {k \geq 2} and that the claim has already been proven for {k-1}. To simplify notation we allow all implied constants to depend on {k,d_1,\dots,d_k}. We may assume that {\delta \leq c} for some small {c>0} depending only on {k,d_1,\dots,d_k}.

By a linear change of variables, we may assume that {Q_i \subset [0,N_i] \cap {\bf Z}} for all {i=1,\dots,k}.

We write {n' := (n_1,\dots,n_{k-1})}. First suppose that {N_k = O(\delta^{-O(1)})}. Then by the pigeonhole principle we can find {n_k \in Q_k} such that

\displaystyle \frac{1}{N_1 \dots N_{k-1}} |\sum_{n' \in Q_1 \times \dots \times Q_{k-1}} e(P(n',n_k))| \geq \delta

and the claim then follows from the induction hypothesis. Thus we may assume that {N_k \geq \delta^{-C}} for some large {C} depending only on {k,d_1,\dots,d_k}. Similarly we may assume that {N_i \geq \delta^{-C}} for all {i=1,\dots,k}.

By the triangle inequality, we have

\displaystyle \frac{1}{N_1 \dots N_k} \sum_{n' \in Q_1 \times \dots \times Q_{k-1}} |\sum_{n_k \in Q_k} e(P(n',n_k))| \geq \delta.

The inner sum is {O(N_k)}, and the outer sum has {O(N_1 \dots N_{k-1})} terms. Thus, for {\gg \delta N_1 \dots N_{k-1}} choices of {n' \in Q_1 \times \dots \times Q_{k-1}}, one has

\displaystyle \frac{1}{N_k} |\sum_{n_k \in Q_k} e(P(n',n_k))| \gg \delta. \ \ \ \ \ (7)


We write

\displaystyle P(n',n_k) = \sum_{i_k \leq d_k} P_{i_k}(n') n_k^{i_k}

for some polynomials {P_{i_k}: {\bf Z}^{k-1} \rightarrow {\bf R}/{\bf Z}} of degrees at most {d_1,\dots,d_{k-1}} in the variables {n_1,\dots,n_{k-1}}. For each {n'} obeying (7), we apply Corollary 5 to conclude that there exists a natural number {q_{n'} \ll \delta^{-O(1)}} such that

\displaystyle \| q_{n'} P_{i_k}(n') \|_{{\bf R}/{\bf Z}} \ll \delta^{-O(1)} / N_k^{i_k}

for {i_k=1,\dots,d_k} (the claim also holds for {i_k=0} but we discard it as being trivial). By the pigeonhole principle, there thus exists a natural number {q \ll \delta^{-O(1)}} such that

\displaystyle \| q P_{i_k}(n') \|_{{\bf R}/{\bf Z}} \ll \delta^{-O(1)} / N_k^{i_k}

for all {i_k=1,\dots,d_k} and for {\gg \delta^{O(1)} N_1 \dots N_{k-1}} choices of {n' \in Q_1 \times \dots \times Q_{k-1}}. If we write

\displaystyle P_{i_k}(n') = \sum_{i_{k-1} \leq d_{k-1}} P_{i_{k-1},i_k}(n_1,\dots,n_{k-2}) n_{k-1}^{i_{k-1}},

where {P_{i_{k-1},i_k}: {\bf Z}^{k-2} \rightarrow {\bf R}/{\bf Z}} is a polynomial of degrees at most {d_1,\dots,d_{k-2}}, then for {\gg \delta^{O(1)} N_1 \dots N_{k-2}} choices of {(n_1,\dots,n_{k-2}) \in Q_1 \times \dots \times Q_{k-2}} we then have

\displaystyle \| \sum_{i_{k-1} \leq d_{k-1}} q P_{i_{k-1},i_k}(n_1,\dots,n_{k-2}) n_{k-1}^{i_{k-1}} \|_{{\bf R}/{\bf Z}} \ll \delta^{-O(1)} / N_k^{i_k}.

Applying Lemma 6 in the {n_{k-1}} variable, together with the largeness hypotheses on the {N_i} (and also the assumption that {i_k \geq 1}), we conclude (after enlarging {q} as necessary, and pigeonholing to keep {q} independent of {n_1,\dots,n_{k-2}}) that

\displaystyle \| q P_{i_{k-1},i_k}(n_1,\dots,n_{k-2}) \|_{{\bf R}/{\bf Z}} \ll \frac{\delta^{-O(1)}}{N_{k-1}^{i_{k-1}} N_k^{i_k}}

for all {i_{k-1}=0,\dots,d_{k-1}} (note that we now include the {i_{k-1}=0} case, which is no longer trivial) and for {\gg \delta^{O(1)} N_1 \dots N_{k-2}} choices of {(n_1,\dots,n_{k-2}) \in Q_1 \times \dots \times Q_{k-2}}. Iterating this, we eventually conclude (after enlarging {q} as necessary) that

\displaystyle \| q \alpha_{i_1,\dots,i_k} \|_{{\bf R}/{\bf Z}} \ll \frac{\delta^{-O(1)}}{N_1^{i_1} \dots N_k^{i_k}} \ \ \ \ \ (8)


whenever {i_j \in \{0,\dots,d_j\}} for {j=1,\dots,k}, with {i_k} nonzero. Permuting the indices, and observing that the claim is trivial for {(i_1,\dots,i_k) = (0,\dots,0)}, we in fact obtain (8) for all {(i_1,\dots,i_k) \in \{0,\dots,d_1\} \times \dots \times \{0,\dots,d_k\}}, at which point the claim easily follows by taking {Q'_j := \{ qn_j: n_j \leq \delta^C N_j\}} for each {j=1,\dots,k}. \Box
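The averaging step that produces (7) in the proof above is elementary but easy to misread, so here is a small numerical illustration for {k=2}. It is a sketch under stated assumptions: the toy phase {P(n_1,n_2) = n_1 n_2/4}, the box {[0,40)^2}, and the helper names are all hypothetical choices, and the threshold {\delta/2} is one concrete instance of the {\gg \delta} in (7). If the normalised double sum has size {\delta}, then at least {(\delta/2) N_1} values of {n_1} must have a normalised inner sum of size at least {\delta/2}, since the remaining rows contribute less than {\delta/2} to the average and no row contributes more than {1}.

```python
import cmath

def e(x):
    """The standard character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def inner_sizes(P, N1, N2):
    """Normalised sizes |sum_{n2} e(P(n1, n2))| / N2 for each n1 in [0, N1)."""
    return [abs(sum(e(P(n1, n2)) for n2 in range(N2))) / N2
            for n1 in range(N1)]

def popular_rows(P, N1, N2):
    """Return (delta, rows): delta is the normalised size of the full double
    sum, and rows lists the n1 whose inner sum has normalised size >= delta/2."""
    sizes = inner_sizes(P, N1, N2)
    total = abs(sum(e(P(n1, n2)) for n1 in range(N1) for n2 in range(N2)))
    delta = total / (N1 * N2)
    rows = [n1 for n1, size in enumerate(sizes) if size >= delta / 2]
    return delta, rows

# Toy phase (an assumption for illustration only): P(n1, n2) = n1 * n2 / 4.
delta, rows = popular_rows(lambda n1, n2: n1 * n2 / 4, 40, 40)
# The averaging argument guarantees at least (delta/2) * N1 popular rows.
assert len(rows) >= (delta / 2) * 40
```

In this toy case the popular rows are exactly the multiples of {4}, which is consistent with the conclusion of the proof that the coefficient of {n_1 n_2} is close to a rational with small denominator.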

An inspection of the proof of the above result (or alternatively, a combination of the above result with many further applications of Lemma 6) reveals the following general form of Proposition 4. This was posed as Exercise 17 in this previous blog post, but the statement there had a slight misprint (it did not properly treat the possibility that some of the {N_j} could be small), and it was a bit trickier to prove than anticipated (in fact, the reason for this post was that I was asked to supply a more detailed solution for this exercise):

Proposition 8 (Multidimensional Weyl exponential sum estimate, inverse form, II) Let {k \geq 1} be a natural number, and for each {j=1,\dots,k}, let {I_j \subset [0,N_j]_{\bf Z}} be a discrete interval for some {N_j \geq 1}. Let

\displaystyle P(n_1,\dots,n_k) = \sum_{i_1 \leq d_1, \dots, i_k \leq d_k} \alpha_{i_1,\dots,i_k} n_1^{i_1} \dots n_k^{i_k}

be a polynomial in {k} variables of multidegrees {d_1,\dots,d_k \geq 0} for some {\alpha_{i_1,\dots,i_k} \in {\bf R}/{\bf Z}}. If

\displaystyle \frac{1}{N_1 \dots N_k} |\sum_{n \in I_1 \times \dots \times I_k} e(P(n))| \geq \delta \ \ \ \ \ (9)


for some {\delta > 0}, then either

\displaystyle N_j \ll_{k,d_1,\dots,d_k} \delta^{-O_{k,d_1,\dots,d_k}(1)} \ \ \ \ \ (10)


for some {1 \leq j \leq k}, or else there is a natural number {q \ll_{k,d_1,\dots,d_k} \delta^{-O_{k,d_1,\dots,d_k}(1)}} such that

\displaystyle \| q\alpha_{i_1,\dots,i_k} \|_{{\bf R}/{\bf Z}} \ll_{k,d_1,\dots,d_k} \delta^{-O_{k,d_1,\dots,d_k}(1)} N_1^{-i_1} \dots N_k^{-i_k} \ \ \ \ \ (11)


whenever {i_j \leq d_j} for {j=1,\dots,k}.

Again, the factor of {N_1^{-i_1} \dots N_k^{-i_k}} is natural in this bound. In the {k=1} case, the option (10) may be deleted since (11) trivially holds in this case, but this simplification is no longer available for {k>1} since one needs (10) to hold for all {j} (not just one {j}) to make (11) completely trivial. Indeed, the above proposition fails for {k \geq 2} if one removes (10) completely, as can be seen for instance by inspecting the exponential sum {\sum_{n_1 \in \{0,1\}} \sum_{n_2 \in [1,N] \cap {\bf Z}} e( \alpha n_1 n_2)}, which has size comparable to {N} regardless of how irrational {\alpha} is.
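The counterexample in the preceding paragraph is easy to check numerically. The sketch below is illustrative only: the choice {\alpha = \sqrt{2}} and the function name `counterexample_sum` are assumptions made for the demonstration. The {n_1 = 0} row contributes exactly {N}, while the {n_1 = 1} row is a geometric series of size at most {1/|\sin(\pi \alpha)|}, so the full sum stays within a bounded distance of {N}.

```python
import cmath
import math

def e(x):
    """The standard character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def counterexample_sum(alpha, N):
    """Sum of e(alpha * n1 * n2) over n1 in {0, 1} and n2 in {1, ..., N}."""
    return sum(e(alpha * n1 * n2) for n1 in (0, 1) for n2 in range(1, N + 1))

alpha = math.sqrt(2)  # an arbitrary irrational frequency, chosen for illustration
geometric_bound = 1 / abs(math.sin(math.pi * alpha))
for N in (10, 100, 1000):
    S = counterexample_sum(alpha, N)
    # The n1 = 0 row gives N; the n1 = 1 row is bounded independently of N.
    assert abs(abs(S) - N) <= geometric_bound + 1e-6
    print(N, abs(S) / N)
```

Here {N_1 = 1}, so the alternative (10) holds for {j=1}, which is why the large sum does not force {\alpha} to be close to a rational with small denominator.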

Chantal David, Andrew Granville, Emmanuel Kowalski, Philippe Michel, Kannan Soundararajan, and I are running a program at MSRI in the Spring of 2017 (more precisely, from Jan 17, 2017 to May 26, 2017) in the area of analytic number theory, with the intention of bringing together many of the leading experts in all aspects of the subject and presenting recent work on the many active areas of the subject (the discussion on previous blog posts here has mostly focused on advances in the study of the distribution of the prime numbers, but there have been many other notable recent developments too, such as refinements of the circle method, a deeper understanding of the asymptotics of bounded multiplicative functions and of the “pretentious” approach to analytic number theory, more “analysis-friendly” formulations of the theorems of Deligne and others involving trace functions over fields, and new subconvexity theorems for automorphic forms, to name a few). Like any other semester MSRI program, there will be a number of workshops, seminars, and similar activities taking place while the members are in residence. I’m personally looking forward to the program, which should be occurring in the midst of a particularly productive time for the subject. Needless to say, I (and the rest of the organising committee) plan to be present for most of the program.

Applications for Postdoctoral Fellowships, Research Memberships, and Research Professorships for this program (and for other MSRI programs in this time period, namely the companion program in Harmonic Analysis and the Fall program in Geometric Group Theory, as well as the complementary program in all other areas of mathematics) have just opened up today.  Applications are open to everyone (until they close on Dec 1), but require supporting documentation, such as a CV, statement of purpose, and letters of recommendation from other mathematicians; see the application page for more details.

This week I have been at a Banff workshop “Combinatorics meets Ergodic theory”, focused on the combinatorics surrounding Szemerédi’s theorem and the Gowers uniformity norms on one hand, and the ergodic theory surrounding Furstenberg’s multiple recurrence theorem and the Host-Kra structure theory on the other. This was quite a fruitful workshop, and directly inspired the various posts this week on this blog. Incidentally, BIRS being as efficient as it is, videos for this week’s talks are already online.

As mentioned in the previous two posts, Ben Green, Tamar Ziegler, and myself proved the following inverse theorem for the Gowers norms:

Theorem 1 (Inverse theorem for Gowers norms) Let {N \geq 1} and {s \geq 1} be integers, and let {\delta > 0}. Suppose that {f: {\bf Z} \rightarrow [-1,1]} is a function supported on {[N] := \{1,\dots,N\}} such that

\displaystyle \frac{1}{N^{s+2}} \sum_{n,h_1,\dots,h_{s+1} \in {\bf Z}} \prod_{\omega \in \{0,1\}^{s+1}} f(n+\omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}) \geq \delta.

Then there exists a filtered nilmanifold {G/\Gamma} of degree {\leq s} and complexity {O_{s,\delta}(1)}, a polynomial sequence {g: {\bf Z} \rightarrow G}, and a Lipschitz function {F: G/\Gamma \rightarrow {\bf R}} of Lipschitz constant {O_{s,\delta}(1)} such that

\displaystyle \frac{1}{N} \sum_n f(n) F(g(n) \Gamma) \gg_{s,\delta} 1.
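To make the hypothesis of Theorem 1 concrete, the normalised sum can be evaluated by brute force for very small {N}. This is a minimal sketch, with the function name `gowers_sum` and the test functions chosen purely for illustration; it is only practical for tiny parameters since the cost grows like {N^{s+2}}. For {s=1} it also checks the standard identity that the sum equals {\sum_h (\sum_n f(n) f(n+h))^2}.

```python
from itertools import product

def gowers_sum(f, N, s):
    """Brute-force value of
    N^-(s+2) * sum_{n,h_1..h_{s+1}} prod_omega f(n + omega . h)
    for f supported on {1, ..., N}."""
    def val(m):
        return f(m) if 1 <= m <= N else 0.0
    cube = list(product((0, 1), repeat=s + 1))
    shifts = range(-(N - 1), N)  # larger |h_i| push n + h_i out of the support
    total = 0.0
    for n in range(1, N + 1):
        for hs in product(shifts, repeat=s + 1):
            term = 1.0
            for omega in cube:
                term *= val(n + sum(w * h for w, h in zip(omega, hs)))
                if term == 0.0:
                    break
            total += term
    return total / N ** (s + 2)

N = 4
constant = lambda m: 1.0
alternating = lambda m: (-1.0) ** m  # a linear phase e(m/2)

# For s = 1 the sum equals sum_h g(h)^2 with g(h) = sum_n f(n) f(n+h).
g = [sum(1.0 for n in range(1, N + 1) if 1 <= n + h <= N)
     for h in range(-(N - 1), N)]
assert abs(gowers_sum(constant, N, 1) - sum(x * x for x in g) / N ** 3) < 1e-12
# A linear phase has the same s = 1 value as the constant function.
assert abs(gowers_sum(alternating, N, 1) - gowers_sum(constant, N, 1)) < 1e-12
```

The second assertion reflects the fact that linear phases are invisible to the {s=1} sum, which is the simplest instance of the structured functions that the inverse theorem is designed to detect.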

There is a higher dimensional generalisation, which first appeared explicitly (in a more general form) in this preprint of Szegedy (which used a slightly different argument than the one of Ben, Tammy, and myself; see also this previous preprint of Szegedy with related results):

Theorem 2 (Inverse theorem for multidimensional Gowers norms) Let {N \geq 1} and {s,d \geq 1} be integers, and let {\delta > 0}. Suppose that {f: {\bf Z}^d \rightarrow [-1,1]} is a function supported on {[N]^d} such that

\displaystyle \frac{1}{N^{d(s+2)}} \sum_{n,h_1,\dots,h_{s+1} \in {\bf Z}^d} \prod_{\omega \in \{0,1\}^{s+1}} f(n+\omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}) \geq \delta. \ \ \ \ \ (1)

Then there exists a filtered nilmanifold {G/\Gamma} of degree {\leq s} and complexity {O_{s,\delta,d}(1)}, a polynomial sequence {g: {\bf Z}^d \rightarrow G}, and a Lipschitz function {F: G/\Gamma \rightarrow {\bf R}} of Lipschitz constant {O_{s,\delta,d}(1)} such that

\displaystyle \frac{1}{N^d} \sum_{n \in {\bf Z}^d} f(n) F(g(n) \Gamma) \gg_{s,\delta,d} 1.

The {d=2} case of this theorem was recently used by Wenbo Sun. One can replace the polynomial sequence with a linear sequence if desired by using a lifting trick (essentially due to Furstenberg, but which appears explicitly in Appendix C of my paper with Ben and Tammy).

In this post I would like to record a very neat and simple observation of Ben Green and Nikos Frantzikinakis, that uses the tool of Freiman isomorphisms to derive Theorem 2 as a corollary of the one-dimensional theorem. Namely, consider the linear map {\phi: {\bf Z}^d \rightarrow {\bf Z}} defined by

\displaystyle \phi( n_1,\dots,n_d ) := \sum_{i=1}^d (10 N)^{i-1} n_i,

that is to say, {\phi(n_1,\dots,n_d)} is the integer whose digit string in base {10N} is {n_d \dots n_1}. This map is linear and sends {[N]^d} to a subset of {[d 10^d N^d]} of density {1/(d10^d)}. Furthermore it has the following “Freiman isomorphism” property: if {n, h_1,\dots,h_{s+1}} lie in {{\bf Z}} with {n + \omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}} in the image set {\phi( [N]^d )} of {[N]^d} for all {\omega}, then there exist (unique) lifts {\tilde n, \tilde h_1,\dots,\tilde h_{s+1} \in {\bf Z}^d} such that

\displaystyle \tilde n + \omega_1 \tilde h_1 + \dots + \omega_{s+1} \tilde h_{s+1} \in [N]^d

and

\displaystyle \phi( \tilde n + \omega_1 \tilde h_1 + \dots + \omega_{s+1} \tilde h_{s+1} ) = n + \omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}

for all {\omega}. Indeed, the injectivity of {\phi} on {[N]^d} uniquely determines the sum {\tilde n + \omega_1 \tilde h_1 + \dots + \omega_{s+1} \tilde h_{s+1}} for each {\omega}, and one can use base {10N} arithmetic to verify that the alternating sum of these sums on any {2}-facet of the cube {\{0,1\}^{s+1}} vanishes, which gives the claim. (In the language of additive combinatorics, the point is that {\phi} is a Freiman isomorphism of order (say) {8} on {[N]^d}.)
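The base {10N} encoding can be experimented with directly. The sketch below is a hedged illustration: the helper names `phi` and `phi_inverse` and the small parameters {N=3}, {d=2} are assumptions for the demonstration. It checks injectivity on {[N]^d}, the containment of the image in {[d 10^d N^d]}, and the simplest (order {2}) instance of the Freiman property, namely that additive quadruples in the image lift to additive quadruples in {[N]^d}. The reason is that sums of two digits are at most {2N < 10N}, so no carries occur.

```python
from itertools import product

def phi(n, N):
    """phi(n_1, ..., n_d) = sum_i (10N)^(i-1) * n_i."""
    base = 10 * N
    return sum(base ** i * n_i for i, n_i in enumerate(n))

def phi_inverse(m, N, d):
    """Return the tuple in [N]^d mapped to m by phi, or None if m is not
    in the image phi([N]^d)."""
    base = 10 * N
    digits = []
    for _ in range(d):
        m, r = divmod(m, base)
        digits.append(r)
    if m != 0 or any(not 1 <= r <= N for r in digits):
        return None
    return tuple(digits)

N, d = 3, 2
grid = list(product(range(1, N + 1), repeat=d))
images = [phi(n, N) for n in grid]

assert len(set(images)) == len(grid)                  # injective on [N]^d
assert all(1 <= m <= d * 10 ** d * N ** d for m in images)
assert all(phi_inverse(phi(n, N), N, d) == n for n in grid)

# Order-2 Freiman property: a + b = c + e in the image forces the same
# relation between the lifts, coordinate by coordinate.
for a, b, c, e_ in product(grid, repeat=4):
    if phi(a, N) + phi(b, N) == phi(c, N) + phi(e_, N):
        assert all(x + y == z + w for x, y, z, w in zip(a, b, c, e_))
```

The same carry-free reasoning extends to sums of up to {8} elements, since {8N < 10N}, which matches the order mentioned in the text.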

Now let {\tilde f: {\bf Z} \rightarrow [-1,1]} be the function defined by setting {\tilde f( \phi(n) ) := f(n)} whenever {n \in [N]^d}, with {\tilde f} vanishing outside of {\phi([N]^d)}. If {f} obeys (1), then from the above Freiman isomorphism property we have

\displaystyle \frac{1}{N^{d(s+2)}} \sum_{n, h_1,\dots,h_{s+1} \in {\bf Z}} \prod_{\omega \in \{0,1\}^{s+1}} \tilde f(n+\omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}) \geq \delta.

Applying the one-dimensional inverse theorem (Theorem 1), with {\delta} reduced by a factor of {d 10^d} and {N} replaced by {d 10^d N^d}, this implies the existence of a filtered nilmanifold {G/\Gamma} of degree {\leq s} and complexity {O_{s,\delta,d}(1)}, a polynomial sequence {g: {\bf Z} \rightarrow G}, and a Lipschitz function {F: G/\Gamma \rightarrow {\bf R}} of Lipschitz constant {O_{s,\delta,d}(1)} such that

\displaystyle \frac{1}{N^d} \sum_{n \in {\bf Z}} \tilde f(n) F(g(n) \Gamma) \gg_{s,\delta,d} 1

which by the Freiman isomorphism property again implies that

\displaystyle \frac{1}{N^d} \sum_{n \in {\bf Z}^d} f(n) F(g(\phi(n)) \Gamma) \gg_{s,\delta,d} 1.

But the map {n \mapsto g(\phi(n))} is clearly a polynomial map from {{\bf Z}^d} to {G} (the composition of two polynomial maps is polynomial, see e.g. Appendix B of my paper with Ben and Tammy), and the claim follows.

Remark 3 This trick appears to be largely restricted to the case of boundedly generated groups such as {{\bf Z}^d}; I do not see any easy way to deduce an inverse theorem for, say, {\bigcup_{n=1}^\infty {\mathbb F}_p^n} from the {{\bf Z}}-inverse theorem by this method.

Remark 4 By combining this argument with the one in the previous post, one can obtain a weak ergodic inverse theorem for {{\bf Z}^d}-actions. Interestingly, the Freiman isomorphism argument appears to be difficult to implement directly in the ergodic category; in particular, there does not appear to be an obvious direct way to derive the Host-Kra inverse theorem for {{\bf Z}^d} actions (a result first obtained in the PhD thesis of Griesmer) from the counterpart for {{\bf Z}} actions.

