A basic estimate in multiplicative number theory (particularly if one is using the Granville-Soundararajan “pretentious” approach to this subject) is the following inequality of Halasz (formulated here in a quantitative form introduced by Montgomery and Tenenbaum).

Theorem 1 (Halasz inequality) Let be a multiplicative function bounded in magnitude by , and suppose that , , and are such that

As a qualitative corollary, we conclude (by standard compactness arguments) that if

as . In the more recent work of this paper of Granville and Soundararajan, the sharper bound

is obtained (with a more precise description of the term).

The usual proofs of Halasz’s theorem are somewhat lengthy (though there has been a recent simplification, in forthcoming work of Granville, Harper, and Soundararajan). Below the fold I would like to give a relatively short proof of the following “cheap” version of the inequality, which has slightly weaker quantitative bounds, but still suffices to give qualitative conclusions such as (2).

Theorem 2 (Cheap Halasz inequality) Let be a multiplicative function bounded in magnitude by . Let and , and suppose that is sufficiently large depending on . If (1) holds for all , then

The non-optimal exponent can probably be improved a bit by being more careful with the exponents, but I did not try to optimise it here. A similar bound appears in the first paper of Halasz on this topic.

The idea of the argument is to split as a Dirichlet convolution where is the portion of coming from “small”, “medium”, and “large” primes respectively (with the dividing line between the three types of primes being given by various powers of ). Using a Perron-type formula, one can express this convolution in terms of the product of the Dirichlet series of respectively at various complex numbers with . One can use based estimates to control the Dirichlet series of , while using the hypothesis (1) one can get estimates on the Dirichlet series of . (This is similar to the Fourier-analytic approach to ternary additive problems, such as Vinogradov’s theorem on representing large odd numbers as the sum of three primes.) This idea was inspired by a similar device used in the work of Granville, Harper, and Soundararajan. A variant of this argument also appears in unpublished work of Adam Harper.
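The splitting step can be checked numerically on a toy example. The sketch below is only an illustration: the cutoffs `small` and `medium`, the helper names, and the use of the Möbius function as the test function are all assumptions made here, not choices from the argument. It restricts a multiplicative function to integers whose prime factors all lie in one range, and confirms that the Dirichlet convolution of the three restrictions recovers the original function.

```python
def smallest_prime_factors(limit):
    """spf[n] is the smallest prime factor of n, for 2 <= n <= limit."""
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf


def mobius_table(limit, spf):
    """Moebius function on 1..limit (index 0 is unused)."""
    mu = [0] * (limit + 1)
    mu[1] = 1
    for n in range(2, limit + 1):
        p = spf[n]
        m = n // p
        mu[n] = 0 if m % p == 0 else -mu[m]
    return mu


def restrict_to_primes(f, spf, lo, hi):
    """Keep f(n) when every prime factor p of n has lo <= p < hi, else 0."""
    g = [0] * len(f)
    for n in range(1, len(f)):
        m, ok = n, True
        while m > 1:
            p = spf[m]
            if not (lo <= p < hi):
                ok = False
                break
            m //= p
        if ok:
            g[n] = f[n]
    return g


def dirichlet_convolve(f, g):
    """(f * g)(n) = sum over ab = n of f(a) g(b), for n up to len(f) - 1."""
    limit = len(f) - 1
    h = [0] * (limit + 1)
    for a in range(1, limit + 1):
        if f[a] == 0:
            continue
        for b in range(1, limit // a + 1):
            h[a * b] += f[a] * g[b]
    return h


def split_and_recombine(limit=500, small=5, medium=20):
    """Return (f, f1 * f2 * f3) for the Moebius function f (hypothetical cutoffs)."""
    spf = smallest_prime_factors(limit)
    f = mobius_table(limit, spf)
    f1 = restrict_to_primes(f, spf, 2, small)
    f2 = restrict_to_primes(f, spf, small, medium)
    f3 = restrict_to_primes(f, spf, medium, limit + 1)
    return f, dirichlet_convolve(dirichlet_convolve(f1, f2), f3)
```

The identity holds because every integer factors uniquely into a small-prime part, a medium-prime part, and a large-prime part, and these parts are pairwise coprime.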

I thank Andrew Granville for helpful comments which led to significant simplifications of the argument.

Kevin Ford, James Maynard, and I have uploaded to the arXiv our preprint “Chains of large gaps between primes”. This paper was announced in our previous paper with Konyagin and Green, which was concerned with the largest gap

between consecutive primes up to , in which we improved the Rankin bound of

to

for large (where we use the abbreviations , , and ). Here, we obtain an analogous result for the quantity

which measures how far apart the gaps between chains of consecutive primes can be. Our main result is

whenever is sufficiently large depending on , with the implied constant here absolute (and effective). The factor of is inherent to the method, and related to the basic probabilistic fact that if one selects numbers at random from the unit interval , then one expects the minimum gap between adjacent numbers to be about (i.e. smaller than the mean spacing of by an additional factor of ).
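The probabilistic fact mentioned here is easy to test by simulation. The following is a minimal sketch (the function name, the trial count, and the seed are choices made for this illustration): it draws `k` uniform points in the unit interval and averages the smallest gap between adjacent points. A short calculation with the joint distribution of spacings gives the exact mean $1/(k^2-1)$, which is smaller than the mean spacing of about $1/k$ by roughly a further factor of $k$.

```python
import random


def mean_min_gap(k, trials=20000, seed=0):
    """Monte Carlo estimate of the expected minimum gap between
    adjacent points when k points are drawn uniformly from [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pts = sorted(rng.random() for _ in range(k))
        total += min(b - a for a, b in zip(pts, pts[1:]))
    return total / trials
```

For instance, `mean_min_gap(10)` should come out close to `1 / 99`, well below the mean spacing of about `1 / 10`.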

Our arguments combine those from the previous paper with the matrix method of Maier, who (in our notation) showed that

for an infinite sequence of going to infinity. (Maier needed to restrict to an infinite sequence to avoid Siegel zeroes, but we are able to resolve this issue by the now standard technique of simply eliminating a prime factor of an exceptional conductor from the sieve-theoretic portion of the argument. As a byproduct, this also makes all of the estimates in our paper effective.)

As its name suggests, the Maier matrix method is usually presented by imagining a matrix of numbers, and using information about the distribution of primes in the columns of this matrix to deduce information about the primes in at least one of the rows of the matrix. We found it convenient to interpret this method in an equivalent probabilistic form as follows. Suppose one wants to find an interval which contains a block of at least primes, each separated from each other by at least (ultimately, will be something like and something like ). One can do this by the probabilistic method: pick to be a random large natural number (with the precise distribution to be chosen later), and try to lower bound the probability that the interval contains at least primes, no two of which are within of each other.

By carefully choosing the residue class of with respect to small primes, one can eliminate several of the from consideration of being prime immediately. For instance, if is chosen to be large and even, then the with even have no chance of being prime and can thus be eliminated; similarly if is large and odd, then cannot be prime for any odd . Using the methods of our previous paper, we can find a residue class (where is a product of a large number of primes) such that, if one chooses to be a large random element of (that is, for some large random integer ), then the set of shifts for which still has a chance of being prime has size comparable to something like ; furthermore this set is fairly well distributed in in the sense that it does not concentrate too strongly in any short subinterval of . The main new difficulty, not present in the previous paper, is to get *lower* bounds on the size of in addition to upper bounds, but this turns out to be achievable by a suitable modification of the arguments.
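The elimination of shifts by a choice of residue classes can be illustrated with a toy computation. In the sketch below, the names `surviving_shifts`, `y`, and `residues` are hypothetical; the dictionary `residues` records an assumed residue of the random number modulo each small prime, and the function lists the shifts in `[1, y]` that are not immediately ruled out. It does not attempt the careful choice of residue class from the previous paper.

```python
def surviving_shifts(y, residues):
    """Shifts h in [1, y] such that n + h is not divisible by any prime p
    in `residues`, where residues[p] is the residue of n modulo p."""
    return [
        h
        for h in range(1, y + 1)
        if all((a + h) % p != 0 for p, a in residues.items())
    ]
```

With `residues = {2: 1}` (an odd base point) only even shifts survive, matching the parity example in the text; with `residues = {2: 0, 3: 0, 5: 0}` the survivors up to 30 are exactly the shifts coprime to 30.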

Using a version of the prime number theorem in arithmetic progressions due to Gallagher, one can show that for each remaining shift , is going to be prime with probability comparable to , so one expects about primes in the set . An upper bound sieve (e.g. the Selberg sieve) also shows that for any distinct , the probability that and are both prime is . Using this and some routine second moment calculations, one can then show that with large probability, the set will indeed contain about primes, no two of which are closer than to each other; with no other numbers in this interval being prime, this gives a lower bound on .
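To get a feel for the quantities being bounded, one can compute them directly for small ranges. This sketch assumes the chain quantity is the largest value, over runs of `k` consecutive prime gaps below `x`, of the smallest gap in the run (so that `k = 1` gives the largest gap); the function names are illustrative, and brute force of this kind says nothing about the asymptotic bounds.

```python
def primes_up_to(x):
    """Sieve of Eratosthenes: all primes p <= x."""
    if x < 2:
        return []
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(x + 1) if sieve[n]]


def chain_gap(x, k=1):
    """Largest m such that some k consecutive prime gaps below x are all >= m."""
    ps = primes_up_to(x)
    gaps = [q - p for p, q in zip(ps, ps[1:])]
    if len(gaps) < k:
        return 0
    return max(min(gaps[i : i + k]) for i in range(len(gaps) - k + 1))
```

For example, `chain_gap(1000)` picks up the gap of 20 between 887 and 907, while `chain_gap(100, 2)` is 6, coming from the consecutive primes 47, 53, 59.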

Klaus Roth, who made fundamental contributions to analytic number theory, died this Tuesday, aged 90.

I never met or communicated with Roth personally, but was certainly influenced by his work; he wrote relatively few papers, but they tended to have outsized impact. For instance, he was one of the key people (together with Bombieri) to work on simplifying and generalising the large sieve, taking it from the technically formidable original formulation of Linnik and Rényi to the clean and general almost orthogonality principle that we have today (discussed for instance in these lecture notes of mine). The paper of Roth that had the most impact on my own personal work was his three-page paper proving what is now known as Roth’s theorem on arithmetic progressions:

Theorem 1 (Roth’s theorem on arithmetic progressions) Let be a set of natural numbers of positive upper density (thus ). Then contains infinitely many arithmetic progressions of length three (with non-zero of course).

At the heart of Roth’s elegant argument was the following (surprising at the time) dichotomy: if had some moderately large density within some arithmetic progression , either one could use Fourier-analytic methods to detect the presence of an arithmetic progression of length three inside , or else one could locate a long subprogression of on which had increased density. Iterating this dichotomy by an argument now known as the *density increment argument*, one eventually obtains Roth’s theorem, no matter which side of the dichotomy actually holds. This argument (and the many descendants of it), based on various “dichotomies between structure and randomness”, became essential in many other results of this type, most famously perhaps in Szemerédi’s proof of his celebrated theorem on arithmetic progressions that generalised Roth’s theorem to progressions of arbitrary length. More recently, my work on the Chowla and Elliott conjectures, which was a crucial component of the solution of the Erdös discrepancy problem, relies on an *entropy decrement argument* which was directly inspired by the density increment argument of Roth.
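For small finite sets, the objects in Roth’s theorem can simply be enumerated. The sketch below (the function name is hypothetical, and this brute-force search is unrelated to the Fourier-analytic argument) lists the non-trivial three-term progressions in a finite set, and can be used to check that sparse sets such as the integers with no digit 2 in base 3 contain none.

```python
def three_term_progressions(values):
    """All triples (a, b, c) in the set with a < b < c and b - a == c - b."""
    members = set(values)
    ordered = sorted(members)
    found = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1 :]:
            c = 2 * b - a
            if c in members:
                found.append((a, b, c))
    return found
```

The set `{1, 2, 4, 5, 10, 11, 13, 14}` (one more than the integers below 27 with base-3 digits 0 and 1 only) has no such progression, while `{1, 3, 5, 6}` contains exactly one.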

The Erdös discrepancy problem also is connected with another well known theorem of Roth:

Theorem 2 (Roth’s discrepancy theorem for arithmetic progressions) Let be a sequence in . Then there exists an arithmetic progression in with positive such that for an absolute constant .

In fact, Roth proved a stronger estimate regarding mean square discrepancy, which I am not writing down here; as with the Roth theorem in arithmetic progressions, his proof was short and Fourier-analytic in nature (although non-Fourier-analytic proofs have since been found, for instance the semidefinite programming proof of Lovasz). The exponent is known to be sharp (a result of Matousek and Spencer).

As a particular corollary of the above theorem, for an infinite sequence of signs, the sums are unbounded in . The Erdös discrepancy problem asks whether the same statement holds when is restricted to be zero. (Roth also established discrepancy theorems for other sets, such as rectangles, which will not be discussed here.)
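The difference between the two notions of discrepancy is easy to see on small examples. The sketch below uses hypothetical function names and assumes the sign sequence is given as a list whose entry `signs[m - 1]` is the value at `m`. It computes by brute force the largest absolute sum over all arithmetic progressions in `{1, ..., N}`, and separately over the homogeneous progressions `d, 2d, 3d, ...` relevant to the Erdös discrepancy problem.

```python
def ap_discrepancy(signs):
    """Max |sum of signs| over all arithmetic progressions inside {1..N}."""
    n = len(signs)
    best = 0
    for d in range(1, n + 1):
        for start in range(1, n + 1):
            total = 0
            for m in range(start, n + 1, d):
                total += signs[m - 1]
                best = max(best, abs(total))
    return best


def homogeneous_discrepancy(signs):
    """Max |sum of signs| over progressions d, 2d, ..., kd inside {1..N}."""
    n = len(signs)
    best = 0
    for d in range(1, n + 1):
        total = 0
        for m in range(d, n + 1, d):
            total += signs[m - 1]
            best = max(best, abs(total))
    return best
```

For the alternating sequence of length 10, both quantities equal 5, coming from the progressions of step 2; the homogeneous discrepancy can never exceed the general one, since it maximises over a subfamily.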

Finally, one has to mention Roth’s most famous result, cited for instance in his Fields medal citation:

Theorem 3 (Roth’s theorem on Diophantine approximation) Let be an irrational algebraic number. Then for any there is a quantity such that

From the Dirichlet approximation theorem (or from the theory of continued fractions) we know that the exponent in the denominator cannot be reduced to or below. A classical and easy theorem of Liouville gives the claim with the exponent replaced by the degree of the algebraic number ; work of Thue and Siegel reduced this exponent, but Roth was the one who obtained the near-optimal result. An important point is that the constant is *ineffective* – it is a major open problem in Diophantine approximation to produce any bound significantly stronger than Liouville’s theorem with effective constants. This is because the proof of Roth’s theorem does not exclude any *single* rational from being close to , but instead very ingeniously shows that one cannot have *two* different rationals , that are unusually close to , even when the denominators are very different in size. (I refer to this sort of argument as a “dueling conspiracies” argument; they are strangely prevalent throughout analytic number theory.)

Chantal David, Andrew Granville, Emmanuel Kowalski, Philippe Michel, Kannan Soundararajan, and I are running a program at MSRI in the Spring of 2017 (more precisely, from Jan 17, 2017 to May 26, 2017) in the area of analytic number theory, with the intention of bringing together many of the leading experts in all aspects of the subject and to present recent work on the many active areas of the subject (e.g. the distribution of the prime numbers, refinements of the circle method, a deeper understanding of the asymptotics of bounded multiplicative functions (and applications to Erdos discrepancy type problems!) and of the “pretentious” approach to analytic number theory, more “analysis-friendly” formulations of the theorems of Deligne and others involving trace functions over fields, and new subconvexity theorems for automorphic forms, to name a few). Like any other semester MSRI program, there will be a number of workshops, seminars, and similar activities taking place while the members are in residence. I’m personally looking forward to the program, which should be occurring in the midst of a particularly productive time for the subject. Needless to say, I (and the rest of the organising committee) plan to be present for most of the program.

Applications for Postdoctoral Fellowships and Research Memberships for this program (and for other MSRI programs in this time period, namely the companion program in Harmonic Analysis and the Fall program in Geometric Group Theory, as well as the complementary program in all other areas of mathematics) remain open until Dec 1. Applications are open to everyone, but require supporting documentation, such as a CV, statement of purpose, and letters of recommendation from other mathematicians; see the application page for more details.

The Chowla conjecture asserts, among other things, that one has the asymptotic

as for any distinct integers , where is the Liouville function. (The usual formulation of the conjecture also allows one to consider more general linear forms than the shifts , but for sake of discussion let us focus on the shift case.) This conjecture remains open for , though there are now some partial results when one averages either in or in the , as discussed in this recent post.
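The conjecture is straightforward to explore numerically. The sketch below assumes the standard formulation, namely that the sum of $\lambda(n+h_1)\cdots\lambda(n+h_k)$ over $n \le x$ is $o(x)$; the function names are hypothetical and the shifts are taken to be non-negative for simplicity. It tabulates the Liouville function with a smallest-prime-factor sieve and returns the normalised correlation. Small values at a finite `x` are of course only suggestive.

```python
def liouville_table(limit):
    """lam[n] = (-1)^Omega(n) for 1 <= n <= limit (index 0 is unused)."""
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        # Removing one prime factor flips the sign.
        lam[n] = -lam[n // spf[n]]
    return lam


def chowla_average(x, shifts):
    """(1/x) * sum over n <= x of the product of lam(n + h) for h in shifts."""
    lam = liouville_table(x + max(shifts))
    total = 0
    for n in range(1, x + 1):
        product = 1
        for h in shifts:
            product *= lam[n + h]
        total += product
    return total / x
```

A call such as `chowla_average(100000, (0, 1))` gives the two-point correlation with shifts 0 and 1.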

A natural generalisation of the Chowla conjecture is the Elliott conjecture. Its original formulation was basically as follows: one had

whenever were bounded completely multiplicative functions and were distinct integers, and one of the was “non-pretentious” in the sense that

for all Dirichlet characters and real numbers . It is easy to see that some condition like (2) is necessary; for instance if and has period then can be verified to be bounded away from zero as .

In a previous paper with Matomaki and Radziwill, we provided a counterexample to the original formulation of the Elliott conjecture, and proposed that (2) be replaced with the stronger condition

as for any Dirichlet character . To support this conjecture, we proved an averaged and non-asymptotic version of this conjecture which roughly speaking showed a bound of the form

whenever was an arbitrarily slowly growing function of , was sufficiently large (depending on and the rate at which grows), and one of the obeyed the condition

for some that was sufficiently large depending on , and all Dirichlet characters of period at most . As further support of this conjecture, I recently established the bound

under the same hypotheses, where is an arbitrarily slowly growing function of .

In view of these results, it is tempting to conjecture that the condition (4) for one of the should be sufficient to obtain the bound

when is large enough depending on . This may well be the case for . However, the purpose of this blog post is to record a simple counterexample for . Let’s take for simplicity. Let be a quantity much larger than but much smaller than (e.g. ), and set

For , Taylor expansion gives

and

and hence

and hence

On the other hand one can easily verify that all of the obey (4) (the restriction there prevents from getting anywhere close to ). So it seems the correct non-asymptotic version of the Elliott conjecture is the following:

Conjecture 1 (Non-asymptotic Elliott conjecture) Let be a natural number, and let be integers. Let , let be sufficiently large depending on , and let be sufficiently large depending on . Let be bounded multiplicative functions such that for some , one has for all Dirichlet characters of conductor at most . Then

The case of this conjecture follows from the work of Halasz; in my recent paper a logarithmically averaged version of the case of this conjecture is established. The requirement to take to be as large as does not emerge in the averaged Elliott conjecture in my previous paper with Matomaki and Radziwill; it thus seems that this averaging has concealed some of the subtler features of the Elliott conjecture. (However, this subtlety does not seem to affect the asymptotic version of the conjecture formulated in that paper, in which the hypothesis is of the form (3), and the conclusion is of the form (1).)

A similar subtlety arises when trying to control the maximal integral

In my previous paper with Matomaki and Radziwill, we could show that the easier expression

was small (for a slowly growing function of ) if was bounded and completely multiplicative, and one had a condition of the form

for some large . However, to obtain an analogous bound for (5) it now appears that one needs to strengthen the above condition to

in order to address the counterexample in which for some between and . This seems to suggest that proving (5) (which is closely related to the case of the Chowla conjecture) could in fact be rather difficult; the estimation of (6) relied primarily on prior work of Matomaki and Radziwill which used the hypothesis (7), but as this hypothesis is not sufficient to conclude (5), some additional input must also be used.

Let and be two random variables taking values in the same (discrete) range , and let be some subset of , which we think of as the set of “bad” outcomes for either or . If and have the same probability distribution, then clearly

In particular, if it is rare for to lie in , then it is also rare for to lie in .

If and do not have exactly the same probability distribution, but their probability distributions are *close* to each other in some sense, then we can expect to have an approximate version of the above statement. For instance, from the definition of the total variation distance between two random variables (or more precisely, the total variation distance between the probability distributions of two random variables), we see that

for any . In particular, if it is rare for to lie in , and are close in total variation, then it is also rare for to lie in .

A basic inequality in information theory is Pinsker’s inequality

where the Kullback-Leibler divergence is defined by the formula

(See this previous blog post for a proof of this inequality.) A standard application of Jensen’s inequality reveals that is non-negative (Gibbs’ inequality), and vanishes if and only if , have the same distribution; thus one can think of as a measure of how close the distributions of and are to each other, although one should caution that this is not a symmetric notion of distance, as in general. Inserting Pinsker’s inequality into (1), we see for instance that

Thus, if is close to in the Kullback-Leibler sense, and it is rare for to lie in , then it is rare for to lie in as well.
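These inequalities can be sanity-checked on small distributions. The sketch below assumes the standard normalisations: total variation is half the $\ell^1$ distance between the two probability vectors, the Kullback-Leibler divergence is the sum of $p_i \log(p_i/q_i)$, and Pinsker’s inequality reads $d_{TV} \le \sqrt{D_{KL}/2}$. The function names and the sample vectors are illustrative only.

```python
import math


def total_variation(p, q):
    """Half the l^1 distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q):
    """Sum of p_i * log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def event_probability(p, event):
    """Probability that a variable with distribution p lands in `event`."""
    return sum(p[i] for i in event)
```

With `p = [0.5, 0.3, 0.2]` and `q = [0.25, 0.25, 0.5]`, the total variation distance is 0.3, the Pinsker bound `math.sqrt(kl_divergence(p, q) / 2)` is slightly larger, and `kl_divergence(q, p)` differs from `kl_divergence(p, q)`, illustrating the lack of symmetry.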

We can specialise this inequality to the case when is a uniform random variable on a finite range of some cardinality , in which case the Kullback-Leibler divergence simplifies to

where

is the Shannon entropy of . Again, a routine application of Jensen’s inequality shows that , with equality if and only if is uniformly distributed on . The above inequality then becomes

Thus, if is a small fraction of (so that it is rare for to lie in ), and the entropy of is very close to the maximum possible value of , then it is rare for to lie in also.

The inequality (2) is only useful when the entropy is close to in the sense that , otherwise the bound is worse than the trivial bound of . In my recent paper on the Chowla and Elliott conjectures, I ended up using a variant of (2) which was still non-trivial when the entropy was allowed to be smaller than . More precisely, I used the following simple inequality, which is implicit in the arguments of that paper but which I would like to make more explicit in this post:

Lemma 1 (Pinsker-type inequality) Let be a random variable taking values in a finite range of cardinality , let be a uniformly distributed random variable in , and let be a subset of . Then

*Proof:* Consider the conditional entropy . On the one hand, we have

by Jensen’s inequality. On the other hand, one has

where we have again used Jensen’s inequality. Putting the two inequalities together, we obtain the claim.
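The lemma can be tested on random instances. The sketch below assumes the conclusion takes the form $P(X \in E) \le (\log N - H(X) + \log 2) / \log(1/P(Y \in E))$, which is what the two Jensen steps in the proof combine to give when $E$ is a non-empty proper subset; the function names and the way random distributions are generated are choices made for this illustration.

```python
import math
import random


def shannon_entropy(p):
    """H(X) = sum of p_i * log(1 / p_i), in natural units."""
    return sum(a * math.log(1 / a) for a in p if a > 0)


def pinsker_type_bound(p, event):
    """Right-hand side of the assumed form of the lemma, for a non-empty
    proper subset `event` of range(len(p))."""
    n = len(p)
    prob_uniform = len(event) / n
    deficit = math.log(n) - shannon_entropy(p)
    return (deficit + math.log(2)) / math.log(1 / prob_uniform)


def check_random_instances(trials=1000, seed=0):
    """Return True if the bound holds on every randomly generated instance."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 30)
        weights = [rng.random() ** 3 + 1e-9 for _ in range(n)]
        total = sum(weights)
        p = [w / total for w in weights]
        event = set(rng.sample(range(n), rng.randint(1, n - 1)))
        if sum(p[i] for i in event) > pinsker_type_bound(p, event) + 1e-12:
            return False
    return True
```

When the distribution is itself uniform on four points and the event is a single point, the bound evaluates to `log 2 / log 4 = 0.5`, against a true probability of 0.25, which shows the loss coming from the `log 2` term.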

Remark 2 As noted in comments, this inequality can be viewed as a special case of the more general inequality for arbitrary random variables taking values in the same discrete range , which follows from the data processing inequality

for arbitrary functions , applied to the indicator function . Indeed one has

where is the entropy function.

Thus, for instance, if one has

and

for some much larger than (so that ), then

More informally: if the entropy of is *somewhat* close to the maximum possible value of , and it is *exponentially* rare for a uniform variable to lie in , then it is still *somewhat* rare for to lie in . The estimate given is close to sharp in this regime, as can be seen by calculating the entropy of a random variable which is uniformly distributed inside a small set with some probability and uniformly distributed outside of with probability , for some parameter .

It turns out that the above lemma combines well with concentration of measure estimates; in my paper, I used one of the simplest such estimates, namely Hoeffding’s inequality, but there are of course many other estimates of this type (see e.g. this previous blog post for some others). Roughly speaking, concentration of measure inequalities allow one to make approximations such as

with exponentially high probability, where is a uniform distribution and is some reasonable function of . Combining this with the above lemma, we can then obtain approximations of the form

with somewhat high probability, if the entropy of is somewhat close to maximum. This observation, combined with an “entropy decrement argument” that allowed one to arrive at a situation in which the relevant random variable did have a near-maximum entropy, is the key new idea in my recent paper; for instance, one can use the approximation (3) to obtain an approximation of the form

for “most” choices of and a suitable choice of (with the latter being provided by the entropy decrement argument). The left-hand side is tied to Chowla-type sums such as through the multiplicativity of , while the right-hand side, being a linear correlation involving two parameters rather than just one, has “finite complexity” and can be treated by existing techniques such as the Hardy-Littlewood circle method. One could hope that one could similarly use approximations such as (3) in other problems in analytic number theory or combinatorics.
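The exponential decay supplied by Hoeffding’s inequality, which is what makes it combine well with the entropy lemma, can be seen in a quick simulation. The sketch below assumes the standard form of the inequality for independent variables in $[0,1]$, in which the probability that the sample mean deviates from its expectation by at least $t$ is at most $2\exp(-2nt^2)$; the function names, the fair-coin model, and the seed are illustrative choices.

```python
import math
import random


def hoeffding_bound(n, t):
    """Upper bound 2 * exp(-2 n t^2) for the two-sided deviation probability."""
    return 2 * math.exp(-2 * n * t * t)


def empirical_tail(n, t, trials=5000, seed=0):
    """Fraction of trials in which the mean of n fair coin flips
    deviates from 1/2 by at least t (up to rounding)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(n)) / n
        if abs(mean - 0.5) >= t - 1e-12:
            bad += 1
    return bad / trials
```

Comparing `empirical_tail(100, 0.1)` with `hoeffding_bound(100, 0.1)` shows the bound is not tight at this scale, but it decays exponentially in `n`, which is the feature used in the text.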

I’ve just uploaded two related papers to the arXiv:

- The logarithmically averaged Chowla and Elliott conjectures for two-point correlations, submitted to Forum of Mathematics, Pi; and
- The Erdos discrepancy problem, submitted to the new arXiv overlay journal, Discrete Analysis (see this recent announcement on Tim Gowers’ blog).

This pair of papers is an outgrowth of these two recent blog posts and the ensuing discussion. In the first paper, we establish the following logarithmically averaged version of the Chowla conjecture (in the case of two-point correlations (or “pair correlations”)):

Theorem 1 (Logarithmically averaged Chowla conjecture) Let be natural numbers, and let be integers such that . Let be a quantity depending on that goes to infinity as . Let denote the Liouville function. Then one has

For comparison, the non-averaged Chowla conjecture would imply that

which is a strictly stronger estimate than (2), and remains open.

The arguments also extend to other completely multiplicative functions than the Liouville function. In particular, one obtains a slightly averaged version of the non-asymptotic Elliott conjecture that was shown in the previous blog post to imply a positive solution to the Erdos discrepancy problem. The averaged version of the conjecture established in this paper is slightly weaker than the one assumed in the previous blog post, but it turns out that the arguments there can be modified without much difficulty to accept this averaged Elliott conjecture as input. In particular, we obtain an unconditional solution to the Erdos discrepancy problem as a consequence; this is detailed in the second paper listed above. In fact we can also handle the vector-valued version of the Erdos discrepancy problem, in which the sequence takes values in the unit sphere of an arbitrary Hilbert space, rather than in .

Estimates such as (2) or (3) are known to be subject to the “parity problem” (discussed numerous times previously on this blog), which roughly speaking means that they cannot be proven solely using “linear” estimates on functions such as the von Mangoldt function. However, it is known that the parity problem can be circumvented using “bilinear” estimates, and this is basically what is done here.

We now describe in informal terms the proof of Theorem 1, focusing on the model case (2) for simplicity. Suppose for contradiction that the left-hand side of (2) was large and (say) positive. Using the multiplicativity , we conclude that

is also large and positive for all primes that are not too large; note here how the logarithmic averaging allows us to leave the constraint unchanged. Summing in , we conclude that

is large and positive for any given set of medium-sized primes. By a standard averaging argument, this implies that

is large for many choices of , where is a medium-sized parameter at our disposal to choose, and we take to be some set of primes that are somewhat smaller than . (A similar approach was taken in this recent paper of Matomaki, Radziwill, and myself to study sign patterns of the Möbius function.) To obtain the required contradiction, one thus wants to demonstrate significant cancellation in the expression (4). As in that paper, we view as a random variable, in which case (4) is essentially a bilinear sum of the random sequence along a random graph on , in which two vertices are connected if they differ by a prime in that divides . A key difficulty in controlling this sum is that for randomly chosen , the sequence and the graph need not be independent. To get around this obstacle we introduce a new argument which we call the “entropy decrement argument” (in analogy with the “density increment argument” and “energy increment argument” that appear in the literature surrounding Szemerédi’s theorem on arithmetic progressions, and also reminiscent of the “entropy compression argument” of Moser and Tardos, discussed in this previous post). This argument, which is a simple consequence of the Shannon entropy inequalities, can be viewed as a quantitative version of the standard subadditivity argument that establishes the existence of Kolmogorov-Sinai entropy in topological dynamical systems; it allows one to select a scale parameter (in some suitable range ) for which the sequence and the graph exhibit some weak independence properties (or more precisely, the mutual information between the two random variables is small).

Informally, the entropy decrement argument goes like this: if the sequence has significant mutual information with , then the entropy of the sequence for will grow a little slower than linearly, due to the fact that the graph has zero entropy (knowledge of more or less completely determines the shifts of the graph); this can be formalised using the classical Shannon inequalities for entropy (and specifically, the non-negativity of conditional mutual information). But the entropy cannot drop below zero, so by increasing as necessary, at some point one must reach a metastable region (cf. the finite convergence principle discussed in this previous blog post), within which very little mutual information can be shared between the sequence and the graph . Curiously, for the application it is not enough to have a purely qualitative version of this argument; one needs a quantitative bound (which gains a factor of a bit more than on the trivial bound for mutual information), and this is surprisingly delicate (it ultimately comes down to the fact that the series diverges, which is only barely true).

Once one locates a scale with the low mutual information property, one can use standard concentration of measure results such as the Hoeffding inequality to approximate (4) by the significantly simpler expression

The important thing here is that Hoeffding’s inequality gives exponentially strong bounds on the failure probability, which is needed to counteract the logarithms that are inevitably present whenever trying to use entropy inequalities. The expression (5) can then be controlled in turn by an application of the Hardy-Littlewood circle method and a non-trivial estimate

for averaged short sums of a modulated Liouville function established in another recent paper by Matomäki, Radziwill and myself.

When one uses this method to study more general sums such as

one ends up having to consider expressions such as

where is the coefficient . When attacking this sum with the circle method, one soon finds oneself in the situation of wanting to locate the large Fourier coefficients of the exponential sum

In many cases (such as in the application to the Erdös discrepancy problem), the coefficient is identically , and one can understand this sum satisfactorily using the classical results of Vinogradov: basically, is large when lies in a “major arc” and is small when it lies in a “minor arc”. For more general functions , the coefficients are more or less arbitrary; the large values of are no longer confined to the major arc case. Fortunately, even in this general situation one can use a restriction theorem for the primes established some time ago by Ben Green and myself to show that there are still only a bounded number of possible locations (up to the uncertainty mandated by the Heisenberg uncertainty principle) where is large, and we can still conclude by using (6). (Actually, as recently pointed out to me by Ben, one does not need the full strength of our result; one only needs the restriction theorem for the primes, which can be proven fairly directly using Plancherel’s theorem and some sieve theory.)
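The major arc and minor arc behaviour in the unweighted case is visible even at modest scales. The sketch below takes the coefficient to be identically 1 and considers the sum of $e(\alpha p) = e^{2\pi i \alpha p}$ over primes $p$ up to a cutoff; the function names and the cutoff are assumptions of this illustration. It returns the absolute value of the sum divided by the number of primes.

```python
import cmath
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]


def normalised_prime_sum(limit, alpha):
    """|sum over primes p <= limit of e(alpha * p)| / (number of such primes)."""
    primes = primes_up_to(limit)
    total = sum(cmath.exp(2j * math.pi * alpha * p) for p in primes)
    return abs(total) / len(primes)
```

At `alpha = 1/3` the primes split almost evenly between the residue classes 1 and 2 modulo 3, so the normalised sum is close to 1/2, whereas at `alpha = math.sqrt(2)` there is heavy cancellation.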

It is tempting to also use the method to attack higher order cases of the (logarithmically) averaged Chowla conjecture, for instance one could try to prove the estimate

The above arguments reduce matters to obtaining some non-trivial cancellation for sums of the form

A little bit of “higher order Fourier analysis” (as was done for very similar sums in the ergodic theory context by Frantzikinakis-Host-Kra and Wooley-Ziegler) lets one control this sort of sum if one can establish a bound of the form

where goes to infinity and is a very slowly growing function of . This looks very similar to (6), but the fact that the supremum is now inside the integral makes the problem much more difficult. However it looks worth attacking (7) further, as this estimate looks like it should have many nice applications (beyond just the case of the logarithmically averaged Chowla or Elliott conjectures, which is already interesting).

For higher than , the same line of analysis requires one to replace the linear phase by more complicated phases, such as quadratic phases or even -step nilsequences. Given that (7) is already beyond the reach of current literature, these even more complicated expressions are also unavailable at present, but one can imagine that they will eventually become tractable, in which case we would obtain an averaged form of the Chowla conjecture for all , which would have a number of consequences (such as a logarithmically averaged version of Sarnak’s conjecture, as per this blog post).

It would of course be very nice to remove the logarithmic averaging, and be able to establish bounds such as (3). I did attempt to do so, but I do not see a way to use the entropy decrement argument in a manner that does not require some sort of averaging of logarithmic type, as it requires one to pick a scale that one cannot specify in advance, which is not a problem for logarithmic averages (which are quite stable with respect to dilations) but is problematic for ordinary averages. But perhaps the problem can be circumvented by some clever modification of the argument. One possible approach would be to start exploiting multiplicativity at products of primes, and not just individual primes, to try to keep the scale fixed, but this makes the concentration of measure part of the argument much more complicated as one loses some independence properties (coming from the Chinese remainder theorem) which allowed one to conclude just from the Hoeffding inequality.

The Chowla conjecture asserts that all non-trivial correlations of the Liouville function $\lambda$ are asymptotically negligible; for instance, it asserts that

$$\sum_{n \leq X} \lambda(n) \lambda(n+h) = o(X)$$

as $X \to \infty$ for any fixed natural number $h$. This conjecture remains open, though there are a number of partial results (e.g. these two previous results of Matomaki, Radziwill, and myself).
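As a purely numerical illustration (not part of any argument), one can tabulate the Liouville function with a sieve and look at the size of the pair correlation. The sketch below assumes the standard definition $\lambda(n) = (-1)^{\Omega(n)}$, where $\Omega(n)$ counts prime factors with multiplicity; the function names `liouville_sieve` and `chowla_sum` are hypothetical names chosen for this example.

```python
from math import isqrt


def liouville_sieve(n_max):
    """Return a list lam with lam[n] = (-1)**Omega(n) for 1 <= n <= n_max."""
    # spf[n] will hold the smallest prime factor of n.
    spf = list(range(n_max + 1))
    for p in range(2, isqrt(n_max) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0, 1] + [0] * (n_max - 1)
    for n in range(2, n_max + 1):
        # Removing one prime factor flips the sign (complete multiplicativity).
        lam[n] = -lam[n // spf[n]]
    return lam


def chowla_sum(x, h):
    """Compute the sum of lambda(n) * lambda(n + h) over 1 <= n <= x."""
    lam = liouville_sieve(x + h)
    return sum(lam[n] * lam[n + h] for n in range(1, x + 1))


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        s = chowla_sum(x, 1)
        print(x, s, s / x)
```

The printed ratio is the normalised correlation, which the conjecture predicts should tend to zero; of course, a finite computation of this type says nothing rigorous about the limit.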

A natural generalisation of Chowla’s conjecture was proposed by Elliott. For simplicity we will only consider Elliott’s conjecture for the pair correlations

$$\sum_{n \leq X} g(n) \overline{g(n+h)}.$$

For such correlations, the conjecture was that one had

$$\sum_{n \leq X} g(n) \overline{g(n+h)} = o(X) \ \ \ \ \ (1)$$

as $X \to \infty$ for any natural number $h$, as long as $g$ was a completely multiplicative function with magnitude bounded by $1$, and such that

$$\sum_p \mathrm{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p} = +\infty \ \ \ \ \ (2)$$

for any Dirichlet character $\chi$ and any real number $t$. In the language of “pretentious number theory”, as developed by Granville and Soundararajan, the hypothesis (2) asserts that the completely multiplicative function $g$ does not “pretend” to be like the completely multiplicative function $n \mapsto \chi(n) n^{it}$ for any character $\chi$ and real number $t$. A condition of this form is necessary; for instance, if $g(n)$ is precisely equal to $\chi(n) n^{it}$ and $\chi$ has period $q$, then $g(n) \overline{g(n+q)}$ is equal to $1_{(n,q)=1} + o(1)$ as $n \to \infty$ and (1) clearly fails. The prime number theorem in arithmetic progressions implies that the Liouville function obeys (2), and so the Elliott conjecture contains the Chowla conjecture as a special case.

As it turns out, Elliott’s conjecture is false as stated, with the counterexample having the property that “pretends” *locally* to be the function for in various intervals , where and go to infinity in a certain prescribed sense. See this paper of Matomaki, Radziwill, and myself for details. However, we view this as a technicality, and continue to believe that certain “repaired” versions of Elliott’s conjecture still hold. For instance, our counterexample does not apply when is restricted to be real-valued rather than complex, and we believe that Elliott’s conjecture is valid in this setting. Returning to the complex-valued case, we still expect the asymptotic (1) provided that the condition (2) is replaced by the stronger condition

as for all fixed Dirichlet characters . In our paper we supported this claim by establishing a certain “averaged” version of this conjecture; see that paper for further details. (See also this recent paper of Frantzikinakis and Host which establishes a different averaged version of this conjecture.)

One can make a stronger “non-asymptotic” version of this corrected Elliott conjecture, in which the parameter does not go to infinity, or equivalently that the function is permitted to depend on :

Conjecture 1 (Non-asymptotic Elliott conjecture)Let , let be sufficiently large depending on , and let be sufficiently large depending on . Suppose that is a completely multiplicative function with magnitude bounded by , such thatfor all Dirichlet characters of period at most . Then one has

for all natural numbers .

The -dependent factor in the constraint is necessary, as can be seen by considering the completely multiplicative function (for instance). Again, the results in my previous paper with Matomaki and Radziwill can be viewed as establishing an averaged version of this conjecture.

Meanwhile, we have the following conjecture that is the focus of the Polymath5 project:

Conjecture 2 (Erdös discrepancy conjecture) For any function $f: \mathbf{N} \to \{-1,+1\}$, the discrepancy

$$\sup_{n,d \in \mathbf{N}} \left| \sum_{j=1}^n f(jd) \right|$$

is infinite.

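It is instructive to compute some near-counterexamples to Conjecture 2 that illustrate the difficulty of the Erdös discrepancy problem. The first near-counterexample is that of a non-principal Dirichlet character $\chi$ that takes values in $\{-1,0,+1\}$ rather than $\{-1,+1\}$. For this function, one has from the complete multiplicativity of $\chi$ that

$$\left| \sum_{j=1}^n \chi(jd) \right| = |\chi(d)| \left| \sum_{j=1}^n \chi(j) \right| \leq \left| \sum_{j=1}^n \chi(j) \right|.$$

If $q$ denotes the period of $\chi$, then $\chi$ has mean zero on every interval of length $q$, and thus

$$\sup_{n,d \in \mathbf{N}} \left| \sum_{j=1}^n \chi(jd) \right| \leq \sup_{n \in \mathbf{N}} \left| \sum_{j=1}^n \chi(j) \right| \leq q.$$

Thus $\chi$ has bounded discrepancy.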
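Here is a small numerical sanity check of this near-counterexample. It is a sketch only: it uses the non-principal character of period $4$ as a concrete choice (an assumption made for the example, not something specified above), and the helper names `chi4` and `hap_discrepancy` are hypothetical.

```python
def chi4(n):
    """Non-principal Dirichlet character of period 4, with values in {-1, 0, +1}."""
    r = n % 4
    return 1 if r == 1 else (-1 if r == 3 else 0)


def hap_discrepancy(f, max_n, max_d):
    """Largest |f(d) + f(2d) + ... + f(nd)| over 1 <= n <= max_n, 1 <= d <= max_d."""
    worst = 0
    for d in range(1, max_d + 1):
        running = 0
        for j in range(1, max_n + 1):
            running += f(j * d)
            worst = max(worst, abs(running))
    return worst


if __name__ == "__main__":
    # The partial sums never exceed the period, no matter how far one searches.
    print(hap_discrepancy(chi4, 200, 50))
```

The truncated discrepancy stays bounded by the period as the search ranges grow, but the function is disqualified as a counterexample because it vanishes on the even numbers.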

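Of course, this is not a true counterexample to Conjecture 2 because $\chi$ can take the value $0$. Let us now consider the following variant example, which is the simplest member of a family of examples studied by Borwein, Choi, and Coons. Let $\chi_3$ be the non-principal Dirichlet character of period $3$ (thus $\chi_3(n)$ equals $+1$ when $n = 1 \hbox{ mod } 3$, $-1$ when $n = 2 \hbox{ mod } 3$, and $0$ when $n = 0 \hbox{ mod } 3$), and define the completely multiplicative function $\tilde \chi_3: \mathbf{N} \to \{-1,+1\}$ by setting $\tilde \chi_3(p) := \chi_3(p)$ when $p \neq 3$ and $\tilde \chi_3(3) := +1$. This is about the simplest modification one can make to the previous near-counterexample to eliminate the zeroes. Now consider the sum

$$\sum_{j=1}^n \tilde \chi_3(j)$$

with $n := 1 + 3 + 3^2 + \dots + 3^k$ for some large $k$. Writing $j = 3^a m$ with $m$ coprime to $3$ and $a$ at most $k$, we can write this sum as

$$\sum_{a=0}^k \sum_{1 \leq m \leq n/3^a: (m,3)=1} \tilde \chi_3(3^a m).$$

Now observe that $\tilde \chi_3(3^a m) = \tilde \chi_3(3)^a \tilde \chi_3(m) = \chi_3(m)$. The function $\chi_3$ has mean zero on every interval of length three, and $\lfloor n/3^a \rfloor$ is equal to $1$ mod $3$, and thus

$$\sum_{1 \leq m \leq n/3^a: (m,3)=1} \tilde \chi_3(3^a m) = 1$$

for every $a = 0,\dots,k$, and thus

$$\sum_{j=1}^n \tilde \chi_3(j) = k+1 \gg \log n.$$

Thus $\tilde \chi_3$ also has unbounded discrepancy, but only barely so (it grows logarithmically in $n$). These examples suggest that the main “enemy” to proving Conjecture 2 comes from completely multiplicative functions $f$ that somehow “pretend” to be like a Dirichlet character but do not vanish at the zeroes of that character. (Indeed, the special case of Conjecture 2 when $f$ is completely multiplicative is already open, and appears to be an important subcase.)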
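The logarithmic growth in the Borwein-Choi-Coons example is easy to observe numerically. The following sketch evaluates the partial sums of the modified character at the special points $1 + 3 + \dots + 3^k$; the names `chi3`, `chi3_tilde` and `partial_sum` are hypothetical helper names introduced for this illustration.

```python
def chi3(n):
    """Non-principal Dirichlet character of period 3."""
    r = n % 3
    return 0 if r == 0 else (1 if r == 1 else -1)


def chi3_tilde(n):
    """Completely multiplicative modification of chi3 that sends 3 to +1."""
    # Strip all factors of 3 (each contributes +1), then apply chi3.
    while n % 3 == 0:
        n //= 3
    return chi3(n)


def partial_sum(f, n):
    """Sum of f(j) over 1 <= j <= n."""
    return sum(f(j) for j in range(1, n + 1))


if __name__ == "__main__":
    for k in range(8):
        n = (3 ** (k + 1) - 1) // 2  # equals 1 + 3 + ... + 3**k
        print(k, n, partial_sum(chi3_tilde, n))
```

At these points the partial sum comes out to exactly $k+1$, in agreement with the computation above, so the discrepancy is unbounded but grows only like the logarithm of the endpoint.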

All of these conjectures remain open. However, I would like to record in this blog post the following striking connection, illustrating the power of the Elliott conjecture (particularly in its nonasymptotic formulation):

Theorem 3 (Elliott conjecture implies unbounded discrepancy)Conjecture 1 implies Conjecture 2.

The argument relies heavily on two observations that were previously made in connection with the Polymath5 project. The first is a Fourier-analytic reduction that replaces the Erdös discrepancy problem with an averaged version for completely multiplicative functions $g$. An application of Cauchy-Schwarz then shows that any counterexample to that version will violate the conclusion of Conjecture 1, so if one assumes that conjecture then $g$ must pretend to be like a function of the form $n \mapsto \chi(n) n^{it}$. One then uses (a generalisation of) a second argument from Polymath5 to rule out this case, basically by reducing matters to a more complicated version of the Borwein-Choi-Coons analysis. Details are provided below the fold.

There is some hope that the Chowla and Elliott conjectures can be attacked, as the parity barrier which is so impervious to attack for the twin prime conjecture seems to be more permeable in this setting. (For instance, in my previous post I raised a possible approach, based on establishing expander properties of a certain random graph, which seems to get around the parity problem, in principle at least.)

(Update, Sep 25: fixed some treatment of error terms, following a suggestion of Andrew Granville.)

The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the *Möbius pseudorandomness principle* that asserts that the Möbius function is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).

However, there is an intriguing “alternate universe” in which the Möbius function *is* strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero“. In this scenario, the parity problem obstruction disappears, and it becomes possible, *in principle*, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements is true:

- (Twin prime conjecture) There are infinitely many primes $p$ such that $p+2$ is also prime.
- (No Siegel zeroes) There exists a constant $c > 0$ such that for every real Dirichlet character $\chi$ of conductor $q > 1$, the associated Dirichlet $L$-function $s \mapsto L(s,\chi)$ has no zeroes in the interval $[1 - \frac{c}{\log q}, 1]$.

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory that should not actually exist, but is surprisingly self-consistent and has to date proven impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

for some large value of , where is the von Mangoldt function. Actually, in this post we will work with the slight variant

where

is the second von Mangoldt function, and denotes Dirichlet convolution, and is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval and remove very small primes from , but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve is essentially replaced by the more combinatorial restriction for some large , where is the primorial of , but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)

If there is a Siegel zero with close to and a Dirichlet character of conductor , then multiplicative number theory methods can be used to show that the Möbius function “pretends” to be like the character in the sense that for “most” primes near (e.g. in the range for some small and large ). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that pretends to be like can be used to construct a tractable approximation (after inserting the sieve weight ) in the range (where for some large ) for the second von Mangoldt function , namely the function

Roughly speaking, we think of the periodic function and the slowly varying function as being of about the same “complexity” as the constant function , so that is roughly of the same “complexity” as the divisor function

which is considerably simpler to obtain asymptotics for than the von Mangoldt function as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate to accuracy with little difficulty, whereas to obtain a comparable level of accuracy for or is essentially the Riemann hypothesis.)
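To illustrate the remark about the Dirichlet hyperbola method, here is a minimal sketch that computes the summatory function of the divisor function in about $\sqrt{x}$ steps and compares it with the classical main term $x \log x + (2\gamma - 1) x$. The function names are hypothetical, and the main term is quoted from the standard Dirichlet divisor problem rather than from the discussion above.

```python
from math import isqrt, log, sqrt

EULER_GAMMA = 0.5772156649015329


def divisor_summatory(x):
    """Sum of tau(n) for n <= x, computed by the Dirichlet hyperbola method."""
    r = isqrt(x)
    # Count lattice points (d, m) with d * m <= x: twice the points with
    # d <= sqrt(x), minus the square that was counted twice.
    return 2 * sum(x // d for d in range(1, r + 1)) - r * r


def divisor_summatory_naive(x):
    """Same sum, computed directly as the sum of floor(x / d) over d <= x."""
    return sum(x // d for d in range(1, x + 1))


if __name__ == "__main__":
    for x in (10**2, 10**4, 10**6):
        exact = divisor_summatory(x)
        main_term = x * (log(x) + 2 * EULER_GAMMA - 1)
        print(x, exact, exact - main_term, sqrt(x))
```

The difference between the exact count and the main term stays within a small multiple of $\sqrt{x}$, which is the level of accuracy referred to above; no such elementary device is available once the Möbius function is present.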

One expects to be a good approximant to if is of size and has no prime factors less than for some large constant . The Selberg sieve will be mostly supported on numbers with no prime factor less than . As such, one can hope to approximate (1) by the expression

as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation with in various ranges; this is clearly related to understanding the equidistribution of the hyperbola in . Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

where denotes the inverse of in . One can then use the Weil bound

where is the greatest common divisor of (with the convention that this is equal to if vanish), and the decays to zero as . The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that with large enough).
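For readers who wish to experiment, the following sketch evaluates Kloosterman sums directly from the definition and compares them with the square root of the modulus. It assumes the standard normalisation $S(a,b;q) = \sum_{x \in (\mathbf{Z}/q\mathbf{Z})^\times} e((ax + b\overline{x})/q)$, and for prime modulus $p$ it tests the classical explicit form $|S(a,b;p)| \leq 2\sqrt{p}$ of the Weil bound; the function name `kloosterman` is a hypothetical choice.

```python
import cmath
import math


def kloosterman(a, b, q):
    """Kloosterman sum S(a, b; q), summed over the invertible residues mod q."""
    total = 0
    for x in range(1, q):
        if math.gcd(x, q) == 1:
            x_inv = pow(x, -1, q)  # modular inverse (requires Python 3.8+)
            total += cmath.exp(2j * cmath.pi * (a * x + b * x_inv) / q)
    return total


if __name__ == "__main__":
    p = 101
    largest = max(abs(kloosterman(a, b, p)) for a in range(1, 11) for b in range(1, 11))
    print(largest, 2 * math.sqrt(p), p - 1)
```

The last number printed is the trivial bound, which shows the extent of the square root cancellation; as noted in the text, any power saving over the trivial bound would suffice for the application.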

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound)One haswhenever and are coprime to , where the is with respect to the limit (and is uniform in ).

*Proof:* Observe from change of variables that the Kloosterman sum is unchanged if one replaces with for . For fixed , the number of such pairs is at least , thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

The left-hand side can be rearranged as

which by Fourier summation is equal to

Observe from the quadratic formula and the divisor bound that each pair has at most solutions to the system of equations . Hence the number of quadruples of the desired form is , and the claim follows.

We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound)Let be a primitive real Dirichlet character of conductor , and let . Then

*Proof:* As is the conductor of a primitive real Dirichlet character, is equal to times a squarefree odd number for some . By the Chinese remainder theorem, it thus suffices to establish the claim when is an odd prime. We may assume that is not divisible by this prime , as the claim is trivial otherwise. If vanishes then does not vanish, and the claim follows from the mean zero nature of ; similarly if vanishes. Hence we may assume that do not vanish, and then we can normalise them to equal . By completing the square it now suffices to show that

whenever . As is on the quadratic residues and on the non-residues, it now suffices to show that

But by making the change of variables , the left-hand side becomes , and the claim follows.

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which did not need to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function in place of . These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.

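The Poincaré upper half-plane $\mathbf{H} := \{ z \in \mathbf{C}: \mathrm{Im}(z) > 0 \}$ (with a boundary consisting of the real line $\mathbf{R}$ together with the point at infinity $\infty$) carries an action of the projective special linear group

$$\mathrm{PSL}_2(\mathbf{R}) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in \mathbf{R}: ad-bc = 1 \right\} / \{\pm 1\}$$

via fractional linear transformations:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} z := \frac{az+b}{cz+d}. \ \ \ \ \ (1)$$

Here and in the rest of the post we will abuse notation by identifying elements $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ of the special linear group $\mathrm{SL}_2(\mathbf{R})$ with their equivalence class $\{ \pm \begin{pmatrix} a & b \\ c & d \end{pmatrix} \}$ in $\mathrm{PSL}_2(\mathbf{R})$; this will occasionally create or remove a factor of two in our formulae, but otherwise has very little effect, though one has to check that various definitions and expressions (such as (1)) are unaffected if one replaces a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ by its negation $\begin{pmatrix} -a & -b \\ -c & -d \end{pmatrix}$. In particular, we recommend that the reader ignore the signs $\pm$ that appear from time to time in the discussion below.

As the action of $\mathrm{PSL}_2(\mathbf{R})$ on $\mathbf{H}$ is transitive, and any given point in $\mathbf{H}$ (e.g. $i$) has a stabiliser isomorphic to the projective rotation group $\mathrm{PSO}_2(\mathbf{R})$, we can view the Poincaré upper half-plane $\mathbf{H}$ as a homogeneous space for $\mathrm{PSL}_2(\mathbf{R})$, and more specifically the quotient space of $\mathrm{PSL}_2(\mathbf{R})$ by a maximal compact subgroup $\mathrm{PSO}_2(\mathbf{R})$. In fact, we can make the half-plane $\mathbf{H}$ a symmetric space for $\mathrm{PSL}_2(\mathbf{R})$, by endowing $\mathbf{H}$ with the Riemannian metric

$$ds^2 := \frac{dx^2 + dy^2}{y^2}$$

(using Cartesian coordinates $z = x+iy$), which is invariant with respect to the $\mathrm{PSL}_2(\mathbf{R})$ action. Like any other Riemannian metric, the metric on $\mathbf{H}$ generates a number of other important geometric objects on $\mathbf{H}$, such as the distance function $d(z,w)$ which can be computed to be given by the formula

$$2(\cosh d(z,w) - 1) = \frac{|z-w|^2}{\mathrm{Im}(z) \mathrm{Im}(w)}, \ \ \ \ \ (2)$$

the volume measure $\mu = \mu_{\mathbf{H}}$, which can be computed to be

$$d\mu = \frac{dx\, dy}{y^2},$$

and the Laplace-Beltrami operator, which can be computed to be $\Delta = y^2 ( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} )$ (here we use the negative definite sign convention for $\Delta$). As the metric was $\mathrm{PSL}_2(\mathbf{R})$-invariant, all of these quantities arising from the metric are similarly $\mathrm{PSL}_2(\mathbf{R})$-invariant in the appropriate sense.

The Gauss curvature of the Poincaré half-plane can be computed to be the constant $-1$, thus $\mathbf{H}$ is a model for two-dimensional hyperbolic geometry, in much the same way that the unit sphere $S^2$ in $\mathbf{R}^3$ is a model for two-dimensional spherical geometry (or $\mathbf{R}^2$ is a model for two-dimensional Euclidean geometry). (Indeed, $\mathbf{H}$ is isomorphic (via projection to a null hyperplane) to the upper unit hyperboloid $\{ (x,t): t = \sqrt{1+|x|^2} \}$ in the Minkowski spacetime $\mathbf{R}^{2+1}$, which is the direct analogue of the unit sphere in Euclidean spacetime $\mathbf{R}^3$ or the plane $\mathbf{R}^2$ in Galilean spacetime $\mathbf{R}^2 \times \mathbf{R}$.)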
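The invariance of the distance function under fractional linear transformations can be checked numerically. The sketch below uses the standard closed form $\cosh d(z,w) = 1 + \frac{|z-w|^2}{2 \mathrm{Im}(z) \mathrm{Im}(w)}$ for the hyperbolic distance on the upper half-plane; the helper names `mobius` and `hyp_dist`, and the particular matrix and points, are arbitrary choices made for illustration.

```python
import math


def mobius(g, z):
    """Apply the matrix g = (a, b, c, d) to z by z -> (a z + b) / (c z + d)."""
    a, b, c, d = g
    return (a * z + b) / (c * z + d)


def hyp_dist(z, w):
    """Hyperbolic distance between two points of the upper half-plane."""
    return math.acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))


if __name__ == "__main__":
    g = (2.0, 1.0, 3.0, 2.0)  # determinant 2*2 - 1*3 = 1
    z, w = 0.3 + 1.2j, -1.1 + 0.4j
    print(hyp_dist(z, w), hyp_dist(mobius(g, z), mobius(g, w)))
    # Along the imaginary axis the distance is a logarithm: d(i, 2i) = log 2.
    print(hyp_dist(1j, 2j), math.log(2))
```

Both pairs of printed numbers agree up to rounding error, as they must by the invariance of the metric.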

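One can inject arithmetic into this geometric structure by passing from the Lie group $\mathrm{PSL}_2(\mathbf{R})$ to the full modular group

$$\mathrm{PSL}_2(\mathbf{Z}) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in \mathbf{Z}: ad-bc = 1 \right\} / \{\pm 1\}$$

or congruence subgroups such as

$$\Gamma_0(q) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{PSL}_2(\mathbf{Z}): c = 0 \hbox{ mod } q \right\}$$

for natural number $q$, or to the discrete stabiliser $\Gamma_\infty$ of the point at infinity:

$$\Gamma_\infty := \left\{ \pm \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}: b \in \mathbf{Z} \right\}.$$

These are discrete subgroups of $\mathrm{PSL}_2(\mathbf{R})$, nested by the subgroup inclusions

$$\Gamma_\infty \leq \Gamma_0(q) \leq \Gamma_0(1) = \mathrm{PSL}_2(\mathbf{Z}) \leq \mathrm{PSL}_2(\mathbf{R}).$$

There are many further discrete subgroups of $\mathrm{PSL}_2(\mathbf{R})$ (known collectively as Fuchsian groups) that one could consider, but we will focus attention on these three groups in this post.

Any discrete subgroup $\Gamma$ of $\mathrm{PSL}_2(\mathbf{R})$ generates a quotient space $\Gamma \backslash \mathbf{H}$, which in general will be a non-compact two-dimensional orbifold. One can understand such a quotient space by working with a fundamental domain, that is, a set $F \subset \mathbf{H}$ consisting of a single representative of each of the orbits $\Gamma z$ of $\Gamma$ in $\mathbf{H}$. This fundamental domain is by no means uniquely defined, but if the fundamental domain is chosen with some reasonable amount of regularity, one can view $\Gamma \backslash \mathbf{H}$ as the fundamental domain with the boundaries glued together in an appropriate sense. Among other things, fundamental domains can be used to induce a volume measure $\mu = \mu_{\Gamma \backslash \mathbf{H}}$ on $\Gamma \backslash \mathbf{H}$ from the volume measure $\mu = \mu_{\mathbf{H}}$ on $\mathbf{H}$ (restricted to a fundamental domain). By abuse of notation we will refer to both measures simply as $\mu$ when there is no chance of confusion.

For instance, a fundamental domain for $\Gamma_\infty \backslash \mathbf{H}$ is given (up to null sets) by the strip $\{ z \in \mathbf{H}: |\mathrm{Re}(z)| < \frac{1}{2} \}$, with $\Gamma_\infty \backslash \mathbf{H}$ identifiable with the cylinder formed by gluing together the two sides of the strip. A fundamental domain for $\mathrm{PSL}_2(\mathbf{Z}) \backslash \mathbf{H}$ is famously given (again up to null sets) by an upper portion $\{ z \in \mathbf{H}: |\mathrm{Re}(z)| < \frac{1}{2}; |z| > 1 \}$, with the left and right sides again glued to each other, and the left and right halves of the circular boundary glued to itself. A fundamental domain for $\Gamma_0(q) \backslash \mathbf{H}$ can be formed by gluing together

$$[\mathrm{PSL}_2(\mathbf{Z}) : \Gamma_0(q)] = q \prod_{p | q} \left(1 + \frac{1}{p}\right)$$

copies of a fundamental domain for $\mathrm{PSL}_2(\mathbf{Z}) \backslash \mathbf{H}$ in a rather complicated but interesting fashion.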

While fundamental domains can be a convenient choice of coordinates to work with for some computations (as well as for drawing appropriate pictures), it is geometrically more natural to avoid working explicitly on such domains, and instead work directly on the quotient spaces . In order to analyse functions on such orbifolds, it is convenient to lift such functions back up to and identify them with functions which are *-automorphic* in the sense that for all and . Such functions will be referred to as -automorphic forms, or *automorphic forms* for short (we always implicitly assume all such functions to be measurable). (Strictly speaking, these are the automorphic forms with trivial factor of automorphy; one can certainly consider other factors of automorphy, particularly when working with holomorphic modular forms, which correspond to sections of a more non-trivial line bundle over than the trivial bundle that is implicitly present when analysing scalar functions . However, we will not discuss this (important) more general situation here.)

An important way to create a -automorphic form is to start with a non-automorphic function obeying suitable decay conditions (e.g. bounded with compact support will suffice) and form the Poincaré series defined by

which is clearly -automorphic. (One could equivalently write in place of here; there are good arguments for both conventions, but I have ultimately decided to use the convention, which makes explicit computations a little neater at the cost of making the group actions work in the opposite order.) Thus we naturally see sums over associated with -automorphic forms. A little more generally, given a subgroup of and a -automorphic function of suitable decay, we can form a relative Poincaré series by

where is any fundamental domain for , that is to say a subset of consisting of exactly one representative for each right coset of . As is -automorphic, we see (if has suitable decay) that does not depend on the precise choice of fundamental domain, and is -automorphic. These operations are all compatible with each other, for instance . A key example of Poincaré series are the Eisenstein series, although there are of course many other Poincaré series one can consider by varying the test function .

For future reference we record the basic but fundamental *unfolding identities*

for any function with sufficient decay, and any -automorphic function of reasonable growth (e.g. bounded and compact support, and bounded, will suffice). Note that is viewed as a function on on the left-hand side, and as a -automorphic function on on the right-hand side. More generally, one has

whenever are discrete subgroups of , is a -automorphic function with sufficient decay on , and is a -automorphic (and thus also -automorphic) function of reasonable growth. These identities will allow us to move fairly freely between the three domains , , and in our analysis.

When computing various statistics of a Poincaré series , such as its values at special points , or the quantity , expressions of interest to analytic number theory naturally emerge. We list three basic examples of this below, discussed somewhat informally in order to highlight the main ideas rather than the technical details.

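The first example we will give concerns the problem of estimating the sum

$$\sum_{n \leq x} \tau(n) \tau(n+1), \ \ \ \ \ (7)$$

where $\tau(n) := \sum_{d|n} 1$ is the divisor function. This can be rewritten (by factoring $n = bc$ and $n+1 = ad$) as

$$\sum_{a,b,c,d \in \mathbf{N}: ad-bc = 1} 1_{bc \leq x}, \ \ \ \ \ (8)$$

which is basically a sum over the full modular group $\mathrm{SL}_2(\mathbf{Z})$. At this point we will “cheat” a little by moving to the related, but different, sum

$$\sum_{a,b,c,d \in \mathbf{Z}: ad-bc = 1} 1_{a^2+b^2+c^2+d^2 \leq x}. \ \ \ \ \ (9)$$

This sum is not exactly the same as (8), but will be a little easier to handle, and it is plausible that the methods used to handle this sum can be modified to handle (8). Observe from (2) and some calculation that the distance between $i$ and $\gamma i$, where $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, is given by the formula

$$2 \cosh d(i, \gamma i) = a^2+b^2+c^2+d^2$$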
and so one can express the above sum as

(the factor of coming from the quotient by in the projective special linear group); one can express this as , where and is the indicator function of the ball . Thus we see that expressions such as (7) are related to evaluations of Poincaré series. (In practice, it is much better to use smoothed out versions of indicator functions in order to obtain good control on sums such as (7) or (9), but we gloss over this technical detail here.)
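The link between the hyperbolic ball around $i$ and the sum of squares of the matrix entries can be tested by brute force. The sketch below enumerates integer matrices of determinant one with small entries and checks the identity $2 \cosh d(i, \gamma i) = a^2+b^2+c^2+d^2$, which follows from the standard distance formula on the half-plane by a short computation; the function names are hypothetical, and the brute-force enumeration is only meant for very small parameters.

```python
import math
from itertools import product


def sl2z_in_ball(x):
    """All integer matrices (a, b, c, d) with ad - bc = 1 and a^2+b^2+c^2+d^2 <= x."""
    bound = math.isqrt(x)
    entries = range(-bound, bound + 1)
    return [
        (a, b, c, d)
        for a, b, c, d in product(entries, repeat=4)
        if a * d - b * c == 1 and a * a + b * b + c * c + d * d <= x
    ]


def two_cosh_dist_from_i(g):
    """Return 2 cosh d(i, g i), computed from the half-plane distance formula."""
    a, b, c, d = g
    w = (a * 1j + b) / (c * 1j + d)
    # cosh d(i, w) = 1 + |w - i|^2 / (2 Im(w)), since Im(i) = 1.
    return 2 + abs(w - 1j) ** 2 / w.imag


if __name__ == "__main__":
    for x in (2, 3, 10, 30):
        print(x, len(sl2z_in_ball(x)))
```

Each matrix and its negation are counted separately here, which is the source of the factor of two when passing to the projective group.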

The second example concerns the relative

$$\sum_{n \leq x} \tau(n^2+1) \ \ \ \ \ (10)$$

of the sum (7). Note from multiplicativity that (7) can be written as $\sum_{n \leq x} \tau(n^2+n)$, which is superficially very similar to (10), but with the key difference that the polynomial $n^2+1$ is irreducible over the integers.
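The multiplicativity remark rests on the fact that consecutive integers are coprime, so that the divisor function of their product factors. A quick numerical confirmation, using a naive trial-division divisor count (the name `tau` is chosen here to match the usual notation for the divisor function):

```python
from math import isqrt


def tau(n):
    """Number of positive divisors of n, by trial division up to sqrt(n)."""
    count = 0
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            # d and n // d are both divisors; count them once if they coincide.
            count += 1 if d * d == n else 2
    return count


if __name__ == "__main__":
    x = 1000
    lhs = sum(tau(n) * tau(n + 1) for n in range(1, x + 1))
    rhs = sum(tau(n * n + n) for n in range(1, x + 1))
    print(lhs, rhs)
```

The two sums agree term by term. No such factorisation is available for the divisor function evaluated at an irreducible quadratic, which is why the second example requires a different treatment.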

As with (7), we may expand (10) as

At first glance this does not look like a sum over a modular group, but one can manipulate this expression into such a form in one of two (closely related) ways. First, observe that any factorisation of into Gaussian integers gives rise (upon taking norms) to an identity of the form , where and . Conversely, by using the unique factorisation of the Gaussian integers, every identity of the form gives rise to a factorisation of the form , essentially uniquely up to units. Now note that is of the form if and only if , in which case . Thus we can essentially write the above sum as something like

and the modular group is now manifest. An equivalent way to see these manipulations is as follows. A triple of natural numbers with gives rise to a positive quadratic form of normalised discriminant equal to with integer coefficients (it is natural here to allow to take integer values rather than just natural number values by essentially doubling the sum). The group acts on the space of such quadratic forms in a natural fashion (by composing the quadratic form with the inverse of an element of ). Because the discriminant has class number one (this fact is equivalent to the unique factorisation of the Gaussian integers, as discussed in this previous post), every form in this space is equivalent (under the action of some element of ) with the standard quadratic form . In other words, one has

which (up to a harmless sign) is exactly the representation , , introduced earlier, and leads to the same reformulation of the sum (10) in terms of expressions like (11). Similar considerations also apply if the quadratic polynomial is replaced by another quadratic, although one has to account for the fact that the class number may now exceed one (so that unique factorisation in the associated quadratic ring of integers breaks down), and in the positive discriminant case the fact that the group of units might be infinite presents another significant technical problem.

Note that has real part and imaginary part . Thus (11) is (up to a factor of two) the Poincaré series as in the preceding example, except that is now the indicator of the sector .

Sums involving subgroups of the full modular group, such as , often arise when imposing congruence conditions on sums such as (10), for instance when trying to estimate the expression when and are large. As before, one then soon arrives at the problem of evaluating a Poincaré series at one or more special points, where the series is now over rather than .

The third and final example concerns averages of Kloosterman sums

where and is the inverse of in the multiplicative group . It turns out that the norms of Poincaré series or are closely tied to such averages. Consider for instance the quantity

where is a natural number and is a -automorphic form that is of the form

for some integer and some test function , which for sake of discussion we will take to be smooth and compactly supported. Using the unfolding formula (6), we may rewrite (13) as

To compute this, we use the double coset decomposition

where for each , are arbitrarily chosen integers such that . To see this decomposition, observe that every element in outside of can be assumed to have by applying a sign , and then using the row and column operations coming from left and right multiplication by (that is, shifting the top row by an integer multiple of the bottom row, and shifting the right column by an integer multiple of the left column) one can place in the interval and to be any specified integer pair with . From this we see that

and so from further use of the unfolding formula (5) we may expand (13) as

The first integral is just . The second expression is more interesting. We have

so we can write

as

which on shifting by simplifies a little to

and then on scaling by simplifies a little further to

Note that as , we have modulo . Comparing the above calculations with (12), we can thus write (13) as

is a certain integral involving and a parameter , but which does not depend explicitly on parameters such as . Thus we have indeed expressed the expression (13) in terms of Kloosterman sums. It is possible to invert this analysis and express various weighted sums of Kloosterman sums in terms of expressions (possibly involving inner products instead of norms) of Poincaré series, but we will not do so here; see Chapter 16 of Iwaniec and Kowalski for further details.

Traditionally, automorphic forms have been analysed using the spectral theory of the Laplace-Beltrami operator on spaces such as or , so that a Poincaré series such as might be expanded out using inner products of (or, by the unfolding identities, ) with various generalised eigenfunctions of (such as cuspidal eigenforms, or Eisenstein series). With this approach, special functions, and specifically the modified Bessel functions of the second kind, play a prominent role, basically because the -automorphic functions

for and non-zero are generalised eigenfunctions of (with eigenvalue ), and are almost square-integrable on (the norm diverges only logarithmically at one end of the cylinder , while decaying exponentially fast at the other end ).

However, as discussed in this previous post, the spectral theory of an essentially self-adjoint operator such as is basically equivalent to the theory of various solution operators associated to partial differential equations involving that operator, such as the Helmholtz equation , the heat equation , the Schrödinger equation , or the wave equation . Thus, one can hope to rephrase many arguments that involve spectral data of into arguments that instead involve resolvents , heat kernels , Schrödinger propagators , or wave propagators , or involve the PDE more directly (e.g. applying integration by parts and energy methods to solutions of such PDE). This is certainly done to some extent in the existing literature; resolvents and heat kernels, for instance, are often utilised. In this post, I would like to explore the possibility of reformulating spectral arguments instead using the inhomogeneous wave equation

Actually it will be a bit more convenient to normalise the Laplacian by , and look instead at the *automorphic wave equation*

This equation somewhat resembles a “Klein-Gordon” type equation, except that the mass is imaginary! This would lead to pathological behaviour were it not for the negative curvature, which in principle creates a spectral gap of that cancels out this factor.
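To see where a spectral gap of this type comes from, one can test the power functions $y^s$ against the Laplace-Beltrami operator $\Delta = y^2(\partial_x^2 + \partial_y^2)$ of the half-plane: they are generalised eigenfunctions with eigenvalue $s(s-1)$, which equals $-(\frac{1}{4} + t^2)$ on the critical line $s = \frac{1}{2} + it$. The sketch below checks this with a finite difference approximation. It assumes that the shift in the normalisation is the standard value $\frac{1}{4}$ (the bottom of the spectrum of $-\Delta$ on the half-plane), and the function name `laplacian_of_power` is hypothetical.

```python
def laplacian_of_power(s, y, h=1e-4):
    """Approximate (Delta f)(x + iy) for f(x + iy) = y**s via central differences.

    Here Delta = y^2 (d^2/dx^2 + d^2/dy^2); f does not depend on x, so only
    the second derivative in y contributes.
    """
    def f(v):
        return v ** s

    second_derivative = (f(y + h) - 2 * f(y) + f(y - h)) / h ** 2
    return y ** 2 * second_derivative


if __name__ == "__main__":
    t, y = 2.0, 1.7
    s = 0.5 + 1j * t
    ratio = laplacian_of_power(s, y) / y ** s
    # Numerical eigenvalue, exact eigenvalue s(s-1), and -(1/4 + t^2).
    print(ratio, s * (s - 1), -(0.25 + t * t))
```

With this normalisation, the shifted operator $\Delta + \frac{1}{4}$ acts on $y^{1/2+it}$ by multiplication by $-t^2$, so the imaginary mass is exactly compensated at the bottom of the continuous spectrum.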

The point is that the wave equation approach gives access to some nice PDE techniques, such as energy methods, Sobolev inequalities and finite speed of propagation, which are somewhat submerged in the spectral framework. The wave equation also interacts well with Poincaré series; if for instance and are -automorphic solutions to (15) obeying suitable decay conditions, then their Poincaré series and will be -automorphic solutions to the same equation (15), basically because the Laplace-Beltrami operator commutes with translations. Because of these facts, it is possible to replicate several standard spectral theory arguments in the wave equation framework, without having to deal directly with things like the asymptotics of modified Bessel functions. The wave equation approach to automorphic theory was introduced by Faddeev and Pavlov (using the Lax-Phillips scattering theory), and developed further by Lax and Phillips, to recover many spectral facts about the Laplacian on modular curves, such as the Weyl law and the Selberg trace formula. Here, I will illustrate this by deriving three basic applications of automorphic methods in a wave equation framework, namely

- Using the Weil bound on Kloosterman sums to derive Selberg’s 3/16 theorem on the least non-trivial eigenvalue for on (discussed previously here);
- Conversely, showing that Selberg’s eigenvalue conjecture (improving Selberg’s bound to the optimal ) implies an optimal bound on (smoothed) sums of Kloosterman sums; and
- Using the same bound to obtain pointwise bounds on Poincaré series similar to the ones discussed above. (Actually, the argument here does not use the wave equation, instead it just uses the Sobolev inequality.)

This post originated from an attempt to finally learn this part of analytic number theory properly, and to see if I could use a PDE-based perspective to understand it better. Ultimately, this is not that dramatic a departure from the standard approach to this subject, but I found it useful to think of things in this fashion, probably due to my existing background in PDE.

I thank Bill Duke and Ben Green for helpful discussions. My primary reference for this theory was Chapters 15, 16, and 21 of Iwaniec and Kowalski.
