You are currently browsing the tag archive for the ‘Burgess inequality’ tag.
As in previous posts, we use the following asymptotic notation: $x$ is a parameter going off to infinity, and all quantities may depend on $x$ unless explicitly declared to be "fixed". The asymptotic notation $O()$, $o()$, $\ll$ is then defined relative to this parameter. A quantity $q$ is said to be of polynomial size if one has $q = O(x^{O(1)})$, and said to be bounded if $q = O(1)$. Another convenient notation: we write $X \lessapprox Y$ for $X \ll x^{o(1)} Y$. Thus for instance the divisor bound asserts that if $q$ has polynomial size, then the number of divisors of $q$ is $\lessapprox 1$.
This post is intended to highlight a phenomenon unearthed in the ongoing Polymath8 project (and is in fact a key component of Zhang's proof that there are bounded gaps between primes infinitely often), namely that one can get quite good bounds on relatively short exponential sums when the modulus $q$ is smooth, through the basic technique of Weyl differencing (ultimately based on the Cauchy-Schwarz inequality, and also related to the van der Corput lemma in equidistribution theory). Improvements in the case of smooth moduli have appeared before in the literature (e.g. in this paper of Heath-Brown, this paper of Graham and Ringrose, this later paper of Heath-Brown, this paper of Chang, or this paper of Goldmakher); the arguments here are particularly close to that of the first paper of Heath-Brown. It now also appears that further optimisation of this Weyl differencing trick could lead to noticeable improvements in the numerology for the Polymath8 project, so I am devoting this post to explaining this trick further.
To illustrate the method, let us begin with the classical problem in analytic number theory of estimating an incomplete character sum

$$\sum_{M+1 \le n \le M+N} \chi(n)$$
where $\chi$ is a primitive Dirichlet character of some conductor $q$, $M$ is an integer, and $N$ is some quantity between $1$ and $q$. Clearly we have the trivial bound

$$\left|\sum_{M+1 \le n \le M+N} \chi(n)\right| \le N; \qquad (1)$$
we also have the classical Pólya-Vinogradov inequality

$$\left|\sum_{M+1 \le n \le M+N} \chi(n)\right| \ll q^{1/2} \log q. \qquad (2)$$
This latter inequality gives improvements over the trivial bound when $N$ is much larger than $q^{1/2} \log q$, but not for $N$ much smaller than $q^{1/2}$. The Pólya-Vinogradov inequality can be deduced via a little Fourier analysis from the completed exponential sum bound

$$\left|\sum_{n \in {\bf Z}/q{\bf Z}} \chi(n) e_q(an)\right| \ll q^{1/2}$$
for any $a \in {\bf Z}/q{\bf Z}$, where $e_q(n) := e^{2\pi i n/q}$. (In fact, from the classical theory of Gauss sums, this exponential sum is equal to $\epsilon \sqrt{q}\, \overline{\chi(a)}$ for some complex number $\epsilon$ of norm $1$.)
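As a quick numerical illustration of this norm identity (not needed for the argument), one can test the simplest case of the Legendre symbol modulo a small prime; the helper functions below are just for this sketch.

```python
# Check |sum_{n mod q} chi(n) e(an/q)| = sqrt(q) for the Legendre symbol
# chi mod a prime q and gcd(a, q) = 1 (illustrative sketch only).
import cmath
import math

def legendre(n, q):
    """Legendre symbol (n|q) for an odd prime q, via Euler's criterion."""
    r = pow(n % q, (q - 1) // 2, q)
    return -1 if r == q - 1 else r

def complete_sum(a, q):
    """The completed exponential sum sum_{n mod q} chi(n) e(an/q)."""
    return sum(legendre(n, q) * cmath.exp(2j * math.pi * a * n / q)
               for n in range(q))

q = 101  # a sample prime
for a in (1, 2, 57):
    assert abs(abs(complete_sum(a, q)) - math.sqrt(q)) < 1e-6
```

The norm comes out to exactly $\sqrt{q}$ here because the twisted sum equals $\overline{\chi(a)}$ times the Gauss sum of $\chi$.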
In the case when $q$ is a prime, improving upon the above two inequalities is an important but difficult problem, with only partially satisfactory results so far. To give just one indication of the difficulty, the seemingly modest improvement

$$\left|\sum_{M+1 \le n \le M+N} \chi(n)\right| \ll q^{1/2} \log \log q$$
to the Pólya-Vinogradov inequality when $q$ was a prime required a 14-page paper in Inventiones by Montgomery and Vaughan to prove, and even then it was only conditional on the generalised Riemann hypothesis! See also this more recent paper of Granville and Soundararajan for an unconditional variant of this result in the case that $\chi$ has odd order.
Another important improvement is the Burgess bound, which in our notation asserts that

$$\left|\sum_{M+1 \le n \le M+N} \chi(n)\right| \lessapprox N^{1-1/k} q^{\frac{k+1}{4k^2}} \qquad (3)$$
for any fixed integer $k \ge 2$, assuming that $q$ is square-free (for simplicity) and of polynomial size; see this previous post for a discussion of the Burgess argument. This is non-trivial for $N$ as small as $q^{1/4+\varepsilon}$ for any fixed $\varepsilon > 0$.
In the case when $q$ is prime, there has been very little improvement to the Burgess bound (or its Fourier dual, which can give bounds in a complementary range of $N$) in the last fifty years; an improvement to the exponents in (3) in this case (particularly anything that gave a power saving for $N$ below $q^{1/4}$) would in fact be rather significant news in analytic number theory.
However, in the opposite case when $q$ is smooth – that is to say, all of its factors are much smaller than $q$ – then one can do better than the Burgess bound in some regimes. This fact has been observed in several places in the literature (in particular, in the papers of Heath-Brown, Graham-Ringrose, Chang, and Goldmakher mentioned previously), but also turns out to (implicitly) be a key insight in Zhang's paper on bounded prime gaps. In the case of character sums, one such improved estimate (closely related to Theorem 2 of the Heath-Brown paper) is as follows:

Proposition 1 Let $\chi$ be a primitive character of conductor $q$, where $q = rs$ is squarefree and of polynomial size. Then for any integer $M$ and any $1 \le N \le q$, one has

$$\left|\sum_{M+1 \le n \le M+N} \chi(n)\right| \lessapprox N^{1/2} r^{1/2} + N^{1/2} s^{1/4} + N s^{-1/4}.$$
This proposition is particularly powerful when $q$ is smooth, as this gives many factorisations $q = rs$ with the ability to specify $r, s$ with a fair amount of accuracy. For instance, if $q$ is $y$-smooth (i.e. all prime factors are at most $y$), then by the greedy algorithm one can find a divisor $r$ of $q$ with $q^{1/3}/y \le r \le q^{1/3}$; if we set $s := q/r$, then $q^{2/3} \le s \le y q^{2/3}$, and the above proposition then gives

$$\left|\sum_{M+1 \le n \le M+N} \chi(n)\right| \lessapprox y^{1/4} N^{1/2} q^{1/6} + N q^{-1/6}$$
which can improve upon the Burgess bound when $y$ is small. For instance, if $N = q^{1/2}$, then this bound becomes $\lessapprox y^{1/4} q^{5/12}$; in contrast the Burgess bound only gives $\lessapprox q^{4/9}$ for this value of $N$ (using the optimal choice $k = 3$ for $k$), which is inferior for $y \le q^{1/9}$.
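The greedy selection of the divisor $r$ can be sketched in code as follows (the function name and sample modulus are purely illustrative): one multiplies prime factors into $r$ for as long as the target size is not exceeded, and since each rejected factor is at most $y$, the result can undershoot the target by a factor of at most $y$.

```python
# Greedy selection of a divisor r of a y-smooth squarefree q close to a
# target size (here q^(1/3)).  If r ended at or below target/y, every
# remaining prime factor (each at most y) would still fit, so in fact
# target/y < r <= target.
def greedy_divisor(prime_factors, target):
    r = 1
    for p in sorted(prime_factors):
        if r * p <= target:
            r *= p
    return r

# example: the squarefree 19-smooth modulus q = 2 * 3 * ... * 19
factors = [2, 3, 5, 7, 11, 13, 17, 19]
q = 1
for p in factors:
    q *= p                    # q = 9699690
target = round(q ** (1 / 3))  # about q^(1/3), i.e. 213
r = greedy_divisor(factors, target)
s = q // r
assert r == 210 and s == 46189
assert target / max(factors) < r <= target
```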
The hypothesis that $q$ be squarefree may be relaxed, but for applications to the Polymath8 project, it is only the squarefree moduli that are relevant.
We use the method of Weyl differencing, the key point being to difference in multiples of $r$.
Let $K := \lfloor N/r \rfloor$; we may assume that $r \le N$, since the claim follows from the trivial bound otherwise, and thus $K \ge 1$. For any $1 \le k \le K$, we have

$$\sum_{M+1 \le n \le M+N} \chi(n) = \sum_n 1_{[M+1,M+N]}(n+kr)\, \chi(n+kr)$$

and hence on averaging in $k$

$$\sum_{M+1 \le n \le M+N} \chi(n) = \frac{1}{K} \sum_n \sum_{k=1}^K 1_{[M+1,M+N]}(n+kr)\, \chi(n+kr). \qquad (4)$$
By the Chinese remainder theorem, we may factor

$$\chi = \chi_r \chi_s$$

where $\chi_r, \chi_s$ are primitive characters of conductor $r$ and $s$ respectively. As $\chi_r$ is periodic of period $r$, we thus have

$$\chi(n+kr) = \chi_r(n)\, \chi_s(n+kr)$$
and so we can take $\chi_r(n)$ out of the inner summation of the right-hand side of (4) to obtain

$$\sum_{M+1 \le n \le M+N} \chi(n) = \frac{1}{K} \sum_n \chi_r(n) \sum_{k=1}^K 1_{[M+1,M+N]}(n+kr)\, \chi_s(n+kr)$$
and hence by the triangle inequality

$$\left|\sum_{M+1 \le n \le M+N} \chi(n)\right| \le \frac{1}{K} \sum_n \left|\sum_{k=1}^K 1_{[M+1,M+N]}(n+kr)\, \chi_s(n+kr)\right|.$$
Note how the characters $\chi_s$ on the right-hand side only have period $s$ rather than $q$. This reduction in the period is ultimately the source of the saving over the Pólya-Vinogradov inequality.
Note that the inner sum vanishes unless $M+1-Kr \le n \le M+N-r$, which is an interval of length $O(N)$ by choice of $K$. Thus by Cauchy-Schwarz one has

$$\left|\sum_{M+1 \le n \le M+N} \chi(n)\right| \ll \frac{N^{1/2}}{K} \left( \sum_n \left|\sum_{k=1}^K 1_{[M+1,M+N]}(n+kr)\, \chi_s(n+kr)\right|^2 \right)^{1/2}.$$
We expand the right-hand side as

$$\frac{N^{1/2}}{K} \left( \sum_{k=1}^K \sum_{k'=1}^K \sum_n 1_{[M+1,M+N]}(n+kr)\, 1_{[M+1,M+N]}(n+k'r)\, \chi_s(n+kr)\, \overline{\chi_s(n+k'r)} \right)^{1/2}.$$
We first consider the diagonal contribution $k = k'$. In this case we use the trivial bound $O(N)$ for the inner summation, and we soon see that the total contribution here is $O(N/K^{1/2}) = O(N^{1/2} r^{1/2})$.
Now we consider the off-diagonal case; by symmetry we can take $k < k'$. Then the indicator functions $1_{[M+1,M+N]}(n+kr)\, 1_{[M+1,M+N]}(n+k'r)$ restrict $n$ to an interval of length $O(N)$. On the other hand, as a consequence of the Weil conjectures for curves one can show that

$$\left|\sum_{n \in {\bf Z}/s{\bf Z}} \chi_s(n+kr)\, \overline{\chi_s(n+k'r)}\, e_s(an)\right| \lessapprox (k-k', s)^{1/2} s^{1/2}$$
for any $a \in {\bf Z}/s{\bf Z}$; indeed one can use the Chinese remainder theorem and the square-free nature of $s$ to reduce to the case when $s$ is prime, in which case one can apply (for instance) the original paper of Weil to establish this bound, noting also that $r$ and $s$ are coprime since $q$ is squarefree. Applying the method of completion of sums (or the Parseval formula), this shows that

$$\left|\sum_n 1_{[M+1,M+N]}(n+kr)\, 1_{[M+1,M+N]}(n+k'r)\, \chi_s(n+kr)\, \overline{\chi_s(n+k'r)}\right| \lessapprox \left(1 + \frac{N}{s}\right) (k-k', s)^{1/2} s^{1/2}.$$
Summing in $k, k'$ (using Lemma 5 from this previous post) we see that the total contribution to the off-diagonal case is

$$\lessapprox \frac{N^{1/2}}{K} \left( K^2 \left(1 + \frac{N}{s}\right) s^{1/2} \right)^{1/2}$$
which simplifies to $\lessapprox N^{1/2} s^{1/4} + N s^{-1/4}$. The claim follows.
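As a sanity check on the Weil-type input used in the off-diagonal terms, one can verify the simplest prime instance numerically. For the quadratic (Legendre) character the correlation sum can even be evaluated exactly; the general case of the bound, of course, genuinely requires Weil's results. This is an illustrative sketch only:

```python
# Quadratic-character special case of the complete-sum bound: for the
# Legendre symbol chi mod an odd prime p and h not divisible by p,
# sum_{n mod p} chi(n + h) * chi(n) = sum_{n mod p} chi(n * (n + h)) = -1,
# well within the Weil bound O(sqrt(p)).
def legendre(n, p):
    """Legendre symbol (n|p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def correlation(h, p):
    return sum(legendre(n * (n + h), p) for n in range(p))

p = 103  # a sample prime
assert all(correlation(h, p) == -1 for h in range(1, p))
```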
A modification of the above argument (using more complicated versions of the Weil conjectures) allows one to replace the summand $\chi(n)$ by more complicated summands such as $\chi(f(n))\, e_q(g(n))$ for some polynomials or rational functions $f, g$ of bounded degree and obeying a suitable non-degeneracy condition (after restricting of course to those $n$ for which the arguments are well-defined). We will not detail this here, but instead turn to the question of estimating slightly longer exponential sums, such as

$$\sum_{M+1 \le n \le M+N} \chi(f(n))\, e_q(g(n))$$

where $N$ should be thought of as a little bit larger than $q^{1/2}$.
A large portion of analytic number theory is concerned with the distribution of number-theoretic sets such as the primes, or quadratic residues in a certain modulus. At a local level (e.g. on a short interval $[x, x+y]$), the behaviour of these sets may be quite irregular. However, in many cases one can understand the global behaviour of such sets on very large intervals (e.g. $[0, x]$) with reasonable accuracy (particularly if one assumes powerful additional conjectures, such as the Riemann hypothesis and its generalisations). For instance, in the case of the primes, we have the prime number theorem, which asserts that the number of primes in a large interval $[0, x]$ is asymptotically equal to $x/\log x$; in the case of quadratic residues modulo a prime $p$, it is clear that there are exactly $(p-1)/2$ such residues in $[1, p-1]$. With elementary arguments, one can also count statistics such as the number of pairs of consecutive quadratic residues; and with the aid of deeper tools such as the Weil sum estimates, one can count more complex patterns in these residues also (e.g. $k$-point correlations).
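To illustrate the sort of elementary count mentioned above (a sketch, with illustrative parameters): among $n = 1, \dots, p-2$, the number of $n$ with both $n$ and $n+1$ quadratic residues works out to $p/4 + O(1)$, which is easy to confirm numerically.

```python
# Count n in [1, p-2] with both n and n+1 quadratic residues mod p; an
# elementary argument gives p/4 + O(1), which we confirm numerically.
def is_qr(n, p):
    """True if n is a nonzero quadratic residue mod the odd prime p."""
    return pow(n, (p - 1) // 2, p) == 1

def consecutive_qr_pairs(p):
    return sum(1 for n in range(1, p - 1) if is_qr(n, p) and is_qr(n + 1, p))

for p in (101, 103, 997):
    assert abs(consecutive_qr_pairs(p) - p / 4) < 3
```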
One is often interested in converting this sort of "global" information on long intervals into "local" information on short intervals. If one is interested in the behaviour on a generic or average short interval, then the question is still essentially a global one, basically because one can view a long interval as an average of a long sequence of short intervals. (This does not mean that the problem is automatically easy, because not every global statistic about, say, the primes is understood. For instance, we do not know how to rigorously establish the conjectured asymptotic for the number of twin primes in a long interval $[0, x]$, and so we do not fully understand the local distribution of the primes in a typical short interval $[x, x+y]$.)
However, suppose that instead of understanding the average-case behaviour of short intervals, one wants to control the worst-case behaviour of such intervals (i.e. to establish bounds that hold for all short intervals, rather than most short intervals). Then it becomes substantially harder to convert global information to local information. In many cases one encounters a "square root barrier", in which global information at scale $x$ (e.g. statistics on the interval $[0, x]$) cannot be used to say anything non-trivial about a fixed (and possibly worst-case) short interval at scales $\sqrt{x}$ or below. (Here we ignore factors of $\log x$ for simplicity.) The basic reason for this is that even randomly distributed sets in $[0, x]$ (which are basically the most uniform type of global distribution one could hope for) exhibit random fluctuations of size $\sqrt{x}$ or so in their global statistics (as can be seen for instance from the central limit theorem). Because of this, one could take a random (or pseudorandom) subset of $[0, x]$ and delete all the elements in a short interval of length $\sqrt{x}$, without anything suspicious showing up on the global statistics level; the edited set still has essentially the same global statistics as the original set. On the other hand, the worst-case behaviour of this set on a short interval has been drastically altered.
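This deletion heuristic is easy to simulate (an illustrative sketch with arbitrary parameters): a density-$1/2$ random subset of an interval of length $N$ fluctuates by about $\sqrt{N}$ in its global count, so removing every element from a subinterval of length $\sqrt{N}$ produces an edited set whose global count is statistically indistinguishable from the original.

```python
# Simulate the square root barrier heuristic: a density-1/2 random subset of
# [0, N) has global count N/2 + O(sqrt(N)), so deleting all elements in an
# interval of length sqrt(N) stays within the global noise level.
import math
import random

random.seed(0)  # deterministic for reproducibility
N = 10 ** 6
S = {n for n in range(N) if random.random() < 0.5}

# global fluctuation is on the scale sqrt(N) (central limit theorem)
assert abs(len(S) - N / 2) < 10 * math.sqrt(N)

# delete a short interval of length sqrt(N): the global count barely moves
L = int(math.sqrt(N))
edited = S - set(range(L))
assert abs(len(edited) - N / 2) < 11 * math.sqrt(N)
```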
One stark example of this arises when trying to control the largest gap between consecutive prime numbers in a large interval $[x, 2x]$. There are convincing heuristics that suggest that this largest gap is of size $O(\log^2 x)$ (Cramér's conjecture). But even assuming the Riemann hypothesis, the best upper bound on this gap is only of size $O(\sqrt{x} \log x)$, basically because of this square root barrier. This particular instance of the square root barrier is a significant obstruction to the current polymath project "Finding primes".
On the other hand, in some cases one can use additional tricks to get past the square root barrier. The key point is that many number-theoretic sequences have special structure that distinguishes them from being exactly like random sets. For instance, quadratic residues have the basic but fundamental property that the product of two quadratic residues is again a quadratic residue. One way to use this sort of structure is to amplify bad behaviour in a single short interval into bad behaviour across many short intervals. Because of this amplification, one can sometimes get new worst-case bounds by tapping the average-case bounds.
In this post I would like to indicate a classical example of this type of amplification trick, namely Burgess’s bound on short character sums. To narrow the discussion, I would like to focus primarily on the following classical problem:
Problem 1 What are the best bounds one can place on the first quadratic non-residue $n_p$ in the interval $[1, p-1]$ for a large prime $p$?
(The first quadratic residue is, of course, $1$; the more interesting problem is the first quadratic non-residue.)
Probabilistic heuristics (presuming that each non-square integer has a 50-50 chance of being a quadratic residue) suggest that $n_p$ should have size $O(\log p)$, and indeed Vinogradov conjectured that $n_p = O(p^\varepsilon)$ for any fixed $\varepsilon > 0$. Using the Pólya-Vinogradov inequality, one can get the bound $n_p = O(\sqrt{p} \log p)$ (and can improve it to $n_p = O(\sqrt{p})$ using smoothed sums); combining this with a sieve theory argument (exploiting the multiplicative nature of quadratic residues) one can boost this to $n_p = O(p^{1/(2\sqrt{e})} \log^2 p)$. Inserting Burgess's amplification trick one can boost this to $n_p = O(p^{1/(4\sqrt{e})+\varepsilon})$ for any fixed $\varepsilon > 0$. Apart from refinements to the $\varepsilon$ factor, this bound has stood for five decades as the "world record" for this problem, which is a testament to the difficulty in breaching the square root barrier.
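For comparison, the least quadratic non-residue is trivial to compute directly for any given prime via Euler's criterion (an illustrative sketch); the observed values are tiny, consistent with the probabilistic heuristic.

```python
# Compute the least quadratic non-residue directly via Euler's criterion;
# the observed values are tiny compared even to p^(1/4).
def least_nonresidue(p):
    """Smallest n >= 2 that is a quadratic non-residue mod the odd prime p."""
    for n in range(2, p):
        if pow(n, (p - 1) // 2, p) == p - 1:
            return n

for p in (10007, 100003, 1000003):
    assert least_nonresidue(p) < p ** 0.25
```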
Note: in order not to obscure the presentation with technical details, I will be using asymptotic notation in a somewhat informal manner.