A large portion of analytic number theory is concerned with the distribution of number-theoretic sets such as the primes, or quadratic residues in a certain modulus. At a local level (e.g. on a short interval $[x, x+o(x)]$), the behaviour of these sets may be quite irregular. However, in many cases one can understand the global behaviour of such sets on very large intervals (e.g. $[1,x]$) with reasonable accuracy (particularly if one assumes powerful additional conjectures, such as the Riemann hypothesis and its generalisations). For instance, in the case of the primes, we have the prime number theorem, which asserts that the number of primes in a large interval $[1,x]$ is asymptotically equal to $x/\log x$; in the case of quadratic residues modulo a prime $p$, it is clear that there are exactly $(p-1)/2$ such residues in $[1,p-1]$. With elementary arguments, one can also count statistics such as the number of pairs of consecutive quadratic residues; and with the aid of deeper tools such as the Weil sum estimates, one can count more complex patterns in these residues also (e.g. $k$-point correlations).
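These global statistics are easy to explore numerically. The following snippet (my illustration, not from the original post) counts pairs of consecutive quadratic residues modulo a prime, using Euler's criterion to test residuosity; elementary arguments show the answer is within $O(1)$ of the random-model prediction $p/4$:

```python
# Illustrative check (not from the original post): count pairs (n, n+1)
# with both n and n+1 invertible quadratic residues mod a prime p.
# A "random" model predicts about p/4 such pairs.

def is_residue(n, p):
    """Euler's criterion: n is an invertible residue iff n^((p-1)/2) = 1 mod p."""
    return n % p != 0 and pow(n, (p - 1) // 2, p) == 1

p = 10007  # a sample prime
pairs = sum(1 for n in range(1, p - 1)
            if is_residue(n, p) and is_residue(n + 1, p))
print(pairs, "consecutive-residue pairs; random model predicts", p / 4)
```
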
One is often interested in converting this sort of “global” information on long intervals into “local” information on short intervals. If one is interested in the behaviour on a generic or average short interval, then the question is still essentially a global one, basically because one can view a long interval as an average of a long sequence of short intervals. (This does not mean that the problem is automatically easy, because not every global statistic about, say, the primes is understood. For instance, we do not know how to rigorously establish the conjectured asymptotic for the number of twin primes in a long interval $[1,x]$, and so we do not fully understand the local distribution of the primes in a typical short interval $[n, n+o(n)]$.)
However, suppose that instead of understanding the average-case behaviour of short intervals, one wants to control the worst-case behaviour of such intervals (i.e. to establish bounds that hold for all short intervals, rather than most short intervals). Then it becomes substantially harder to convert global information to local information. In many cases one encounters a “square root barrier”, in which global information at scale $x$ (e.g. statistics on $[1,x]$) cannot be used to say anything non-trivial about a fixed (and possibly worst-case) short interval at scales $\sqrt{x}$ or below. (Here we ignore factors of $\log x$ for simplicity.) The basic reason for this is that even randomly distributed sets in $[1,x]$ (which are basically the most uniform type of global distribution one could hope for) exhibit random fluctuations of size $\sqrt{x}$ or so in their global statistics (as can be seen for instance from the central limit theorem). Because of this, one could take a random (or pseudorandom) subset of $[1,x]$ and delete all the elements in a short interval of length $\sqrt{x}$, without anything suspicious showing up on the global statistics level; the edited set still has essentially the same global statistics as the original set. On the other hand, the worst-case behaviour of this set on a short interval has been drastically altered.
One stark example of this arises when trying to control the largest gap between consecutive prime numbers in a large interval $[x, 2x]$. There are convincing heuristics that suggest that this largest gap is of size $O(\log^2 x)$ (Cramér’s conjecture). But even assuming the Riemann hypothesis, the best upper bound on this gap is only of size $O(\sqrt{x} \log x)$, basically because of this square root barrier. This particular instance of the square root barrier is a significant obstruction to the current polymath project “Finding primes“.
On the other hand, in some cases one can use additional tricks to get past the square root barrier. The key point is that many number-theoretic sequences have special structure that distinguish them from being exactly like random sets. For instance, quadratic residues have the basic but fundamental property that the product of two quadratic residues is again a quadratic residue. One way to use this sort of structure is to amplify bad behaviour in a single short interval into bad behaviour across many short intervals. Because of this amplification, one can sometimes get new worst-case bounds by tapping into the average-case bounds.
In this post I would like to indicate a classical example of this type of amplification trick, namely Burgess’s bound on short character sums. To narrow the discussion, I would like to focus primarily on the following classical problem:
Problem 1 What are the best bounds one can place on the first quadratic non-residue $n_p$ in the interval $[1, p-1]$ for a large prime $p$?
(The first quadratic residue is, of course, $1$; the more interesting problem is the first quadratic non-residue.)
Probabilistic heuristics (presuming that each non-square integer has a 50-50 chance of being a quadratic residue) suggest that $n_p$ should have size $O(\log p)$, and indeed Vinogradov conjectured that $n_p = O(p^{\varepsilon})$ for any $\varepsilon > 0$. Using the Pólya-Vinogradov inequality, one can get the bound $n_p = O(\sqrt{p} \log p)$ (and can improve it to $n_p = O(\sqrt{p})$ using smoothed sums); combining this with a sieve theory argument (exploiting the multiplicative nature of quadratic residues) one can boost this to $n_p = O(p^{\frac{1}{2\sqrt{e}}+\varepsilon})$ for any $\varepsilon > 0$. Inserting Burgess’s amplification trick one can boost this to $n_p = O(p^{\frac{1}{4\sqrt{e}}+\varepsilon})$ for any $\varepsilon > 0$. Apart from refinements to the $\varepsilon$ factor, this bound has stood for five decades as the “world record” for this problem, which is a testament to the difficulty in breaching the square root barrier.
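For concreteness, here is a brute-force computation (illustrative code, not part of the original post) of the least non-residue $n_p$ for a few sample primes, alongside the Burgess-type scale $p^{1/4}$:

```python
# A brute-force look at Problem 1 (illustrative, not from the post):
# find the least quadratic non-residue n_p via Euler's criterion and
# compare it with p^(1/4), the shape of the Burgess-type bound.

def is_residue(n, p):
    return pow(n, (p - 1) // 2, p) == 1

def least_nonresidue(p):
    """Least n >= 2 that is a quadratic non-residue mod the odd prime p."""
    n = 2
    while is_residue(n, p):
        n += 1
    return n

for p in [10007, 100003, 1000003]:
    print(p, least_nonresidue(p), round(p ** 0.25, 1))
```

In practice $n_p$ is usually far smaller than even the conjectured worst-case $O(p^\varepsilon)$ bound; the hard problem is ruling out rare primes with an unusually long initial run of residues.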
Note: in order not to obscure the presentation with technical details, I will be using asymptotic notation in a somewhat informal manner.
— 1. Character sums —
To approach the problem, we begin by fixing the large prime $p$ and introducing the Legendre symbol $\chi(n) := \left(\frac{n}{p}\right)$, defined to equal $0$ when $n$ is divisible by $p$, $+1$ when $n$ is an invertible quadratic residue modulo $p$, and $-1$ when $n$ is an invertible quadratic non-residue modulo $p$. Thus, for instance, $\chi(n^2) = 1$ for all $n$ not divisible by $p$. One of the main reasons one wants to work with the function $\chi$ is that it enjoys two easily verified properties:
- $\chi$ is periodic with period $p$.
- One has the total multiplicativity property $\chi(nm) = \chi(n) \chi(m)$ for all integers $n, m$.
In the jargon of number theory, $\chi$ is a Dirichlet character with conductor $p$. Another important property of this character is of course the law of quadratic reciprocity, but this law is more important for the average-case behaviour of $\chi$ (as $p$ varies), whereas we are concerned here with the worst-case behaviour (for a fixed $p$), and so we will not actually use this law here.
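Both properties are easy to verify numerically. The following sketch (mine, not from the post) implements $\chi$ via Euler's criterion $\chi(n) \equiv n^{(p-1)/2} \pmod p$ and checks periodicity and total multiplicativity for a sample prime:

```python
# The Legendre symbol via Euler's criterion (a standard implementation,
# supplied here for illustration): chi(n) = n^((p-1)/2) mod p, with the
# residue p-1 interpreted as -1.

def chi(n, p):
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is always 0, 1, or p-1

p = 101
# periodicity with period p
assert all(chi(n, p) == chi(n + p, p) for n in range(2 * p))
# total multiplicativity
assert all(chi(m * n, p) == chi(m, p) * chi(n, p)
           for m in range(25) for n in range(25))
print("chi is periodic and totally multiplicative for p =", p)
```
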
Observe that a non-trivial bound on the character sum

$\displaystyle \sum_{1 \le n \le x} \chi(n) \ \ \ \ (1)$

of the form

$\displaystyle \sum_{1 \le n \le x} \chi(n) = o(x) \ \ \ \ (2)$

for some $x$ forces the existence of a quadratic nonresidue less than or equal to $x$ (for $p$ large enough), thus $n_p \le x$. So one approach to the problem is to bound the character sum (1).
As there are just as many residues as non-residues, the sum (1) is periodic with period $p$ in $x$, and we obtain a trivial bound of $\min(x, p)$ for the magnitude of the sum. One can achieve a non-trivial bound by Fourier analysis. One can expand

$\displaystyle \sum_{1 \le n \le x} \chi(n) = \sum_{\xi \in \mathbb{Z}/p\mathbb{Z}} \hat \chi(\xi) \sum_{1 \le n \le x} e^{2\pi i n \xi / p},$
where $\hat \chi(\xi)$ are the Fourier coefficients of $\chi$:

$\displaystyle \hat \chi(\xi) := \frac{1}{p} \sum_{n \in \mathbb{Z}/p\mathbb{Z}} \chi(n) e^{-2\pi i n \xi / p}.$
From the geometric series formula one has the bound

$\displaystyle \left| \sum_{1 \le n \le x} e^{2\pi i n \xi / p} \right| = O\left( \min\left( x, \frac{1}{\| \xi/p \|} \right) \right), \ \ \ \ (3)$

where $\|\theta\|$ is the distance from $\theta$ to the nearest integer ($0$ or $1$); inserting these bounds into (1) and summing what is essentially a harmonic series in $\xi$ we obtain

$\displaystyle \left| \sum_{1 \le n \le x} \chi(n) \right| = O\left( \left( \sup_{\xi \ne 0} |\hat \chi(\xi)| \right) p \log p \right).$
Now, how big is $\hat \chi(\xi)$? Taking absolute values of each term, we get a bound of $|\hat \chi(\xi)| \le 1$, but this gives us something worse than the trivial bound. To do better, we use the Plancherel identity

$\displaystyle \sum_{\xi \in \mathbb{Z}/p\mathbb{Z}} |\hat \chi(\xi)|^2 = \frac{1}{p} \sum_{n \in \mathbb{Z}/p\mathbb{Z}} |\chi(n)|^2,$
which tells us that

$\displaystyle \sum_{\xi \in \mathbb{Z}/p\mathbb{Z}} |\hat \chi(\xi)|^2 = \frac{p-1}{p} \le 1.$
This tells us that $\hat \chi(\xi)$ has size about $1/\sqrt{p}$ on the average, but does not immediately tell us anything new about the worst-case behaviour of $\hat \chi(\xi)$, which is what we need here. But now we use the multiplicative structure of $\chi$ to relate average-case and worst-case behaviour. Note that if $m$ is coprime to $p$, then the function $n \mapsto \chi(mn)$ is a scalar multiple of $\chi(n)$ by a quantity $\chi(m)$ of magnitude $1$; taking Fourier transforms, this implies that $\hat \chi(m^{-1} \xi)$ and $\hat \chi(\xi)$ also differ by this factor. In particular, $|\hat \chi(m^{-1}\xi)| = |\hat \chi(\xi)|$. As $m$ was arbitrary, we thus see that $|\hat \chi(\xi)|$ is constant for all $\xi$ coprime to $p$; in other words, the worst case is the same as the average case. Combining this with the Plancherel bound one obtains $|\hat \chi(\xi)| \le \frac{1}{\sqrt{p}}$, leading to the Pólya-Vinogradov inequality

$\displaystyle \left| \sum_{1 \le n \le x} \chi(n) \right| = O( \sqrt{p} \log p ).$
(In fact, a more careful computation reveals the slightly sharper bound $\left| \sum_{1 \le n \le x} \chi(n) \right| \le \sqrt{p} \log p$; this is non-trivial for $x$ larger than $\sqrt{p} \log p$.)
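One can probe the Pólya-Vinogradov inequality numerically; the sketch below (my illustration, not from the post) computes the largest partial sum of $\chi$ for a sample prime and compares it with $\sqrt{p}\log p$:

```python
import math

# Numerical check of the Pólya-Vinogradov inequality (illustration):
# the maximal partial sum of the Legendre symbol stays below sqrt(p)*log(p).

def chi(n, p):
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

p = 10007
s, worst = 0, 0
for n in range(1, p + 1):
    s += chi(n, p)
    worst = max(worst, abs(s))
print("max partial sum:", worst, "  sqrt(p)*log(p):", math.sqrt(p) * math.log(p))
```

In practice the maximal partial sum is considerably smaller than $\sqrt{p}\log p$, consistent with the conjectural $\sqrt{p}\log\log p$ behaviour discussed below.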
Remark 1 Up to logarithmic factors, this is consistent with what one would expect if $\chi$ fluctuated like a random sign pattern (at least for $x$ comparable to $p$; for smaller values of $x$, one expects instead a bound of the form $O(\sqrt{x})$, up to logarithmic factors). It is conjectured that the $\log p$ factor can be replaced with a $O(\log \log p)$ factor, which would be consistent with the random fluctuation model and is best possible; this is known assuming the generalised Riemann hypothesis, but unconditionally the Pólya-Vinogradov inequality is still the best known. (See however this paper of Granville and Soundararajan for an improvement for non-quadratic characters $\chi$.)
A direct application of the Pólya-Vinogradov inequality gives the bound $n_p = O(\sqrt{p} \log p)$. One can get rid of the logarithmic factor (which comes from the harmonic series arising from (3)) by replacing the sharp cutoff $1_{1 \le n \le x}$ by a smoother sum, which has a better behaved Fourier transform. But one can do better still by exploiting the multiplicativity of $\chi$ again, by the following trick of Vinogradov. Observe that not only does one have $\chi(n) = 1$ for all $n < n_p$, but also $\chi(n) = 1$ for any $n$ which is $n_p$-smooth, i.e. is the product of integers less than $n_p$. So even if $n_p$ is significantly less than $x$, one can show that the sum (1) is large if the majority of integers less than $x$ are $n_p$-smooth.
Since every integer less than $x$ is either $n_p$-smooth (in which case $\chi(n) = 1$), or divisible by a prime $q$ between $n_p$ and $x$ (in which case $\chi(n)$ is at least $-1$), we obtain the lower bound

$\displaystyle \sum_{1 \le n \le x} \chi(n) \ge x - 2 \sum_{n_p \le q \le x} \left\lfloor \frac{x}{q} \right\rfloor,$

where $q$ ranges over primes.
Clearly, $\lfloor x/q \rfloor \le x/q$ and $\lfloor x/q \rfloor \ge x/q - 1$. The total number of primes less than $x$ is $O(x / \log x)$ by the prime number theorem, thus

$\displaystyle \sum_{n_p \le q \le x} \left\lfloor \frac{x}{q} \right\rfloor = x \sum_{n_p \le q \le x} \frac{1}{q} + O\left( \frac{x}{\log x} \right).$
Using the classical asymptotic

$\displaystyle \sum_{q \le x} \frac{1}{q} = \log \log x + C + o(1)$

for some absolute constant $C$ (which basically follows from the prime number theorem, but also has an elementary proof), we conclude that

$\displaystyle \sum_{1 \le n \le x} \chi(n) \ge x \left( 1 - 2 \log \frac{\log x}{\log n_p} \right) - o(x).$
If $x \le n_p^{\sqrt{e} - \varepsilon}$ for some fixed $\varepsilon > 0$, then the expression in brackets is bounded away from zero for $p$ large; in particular, this is incompatible with (2) for $p$ large enough. As a consequence, we see that if we have a bound of the form (2), then we can conclude that $n_p \le x^{\frac{1}{\sqrt{e}} + \varepsilon}$ for all $\varepsilon > 0$; in particular, from the Pólya-Vinogradov inequality one has
$\displaystyle n_p \le p^{\frac{1}{2\sqrt{e}} + \varepsilon}$

for all $\varepsilon > 0$, or equivalently that $n_p \le p^{\frac{1}{2\sqrt{e}} + o(1)}$. (By being a bit more careful, one can refine this to $n_p = O( p^{\frac{1}{2\sqrt{e}}} \log^2 p )$.)
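The Mertens-type asymptotic used above is easy to test numerically; the following sketch (mine, not from the post; the cutoffs $x = 10^6$, $y = 10^3$ are arbitrary choices) compares the prime harmonic sum over $(y, x]$ with $\log \frac{\log x}{\log y}$:

```python
import math

# Check of the Mertens-type asymptotic (illustration): the sum of 1/q
# over primes y < q <= x is close to log(log x / log y).

def primes_upto(n):
    """Simple sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i in range(n + 1) if sieve[i]]

x, y = 10**6, 10**3
s = sum(1 / q for q in primes_upto(x) if q > y)
print(s, "vs", math.log(math.log(x) / math.log(y)))
```
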
Remark 2 The estimates on the Gauss-type sums $\hat \chi(\xi)$ are sharp; nevertheless, they fail to penetrate the square root barrier in the sense that no non-trivial estimates are provided below the scale $x = \sqrt{p}$. One can also see this barrier using the Poisson summation formula, which basically gives a formula that (very roughly) takes the form

$\displaystyle \sum_{1 \le n \le x} \chi(n) \approx \frac{x}{\sqrt{p}} \sum_{1 \le n \le p/x} \chi(n)$
for any $1 \le x \le p$, and is basically the limit of what one can say about character sums using Fourier analysis alone. In particular, we see that the Pólya-Vinogradov bound is basically the Poisson dual of the trivial bound. The scale $x = \sqrt{p}$ is the crossing point where Poisson summation does not achieve any non-trivial modification of the scale parameter.
— 2. Average-case bounds —
We now consider the shifted character sums

$\displaystyle \sum_{x \le n \le x+y} \chi(n), \ \ \ \ (4)$

where $y$ is a parameter. The analogue of (2) for such intervals would be

$\displaystyle \sum_{x \le n \le x+y} \chi(n) = o(y). \ \ \ \ (5)$
For very small $y$ (e.g. $y = p^\delta$ for some small fixed $\delta > 0$), we do not know how to establish (5) for all $x$; but we can at least establish (5) for almost all $x$, with only about $p^{1/2+o(1)}$ exceptions (here we see the square root barrier again!).
The key tool is the moment bound

$\displaystyle \sum_{x \in \mathbb{Z}/p\mathbb{Z}} \left| \sum_{x \le n \le x+y} \chi(n) \right|^k = O_k\left( p y^{k/2} + \sqrt{p}\, y^k \right) \ \ \ \ (6)$

for any positive even integer $k$. If $y$ is not too tiny, say $y \ge p^\delta$ for some $\delta > 0$, then by applying (6) for a sufficiently large $k$ (depending on $\delta$) and using Chebyshev’s inequality (or Markov’s inequality), we see (for any given $\varepsilon > 0$) that one has the non-trivial bound

$\displaystyle \left| \sum_{x \le n \le x+y} \chi(n) \right| \le \varepsilon y$
for all but at most $O_{k,\varepsilon}(\sqrt{p})$ values of $x \in \mathbb{Z}/p\mathbb{Z}$.
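The almost-everywhere cancellation can be observed empirically; in the sketch below (my illustration, not from the post; the parameters $y = 300$ and $\varepsilon = 1/2$ are arbitrary choices) virtually no starting point $x$ produces a short sum as large as $\varepsilon y$:

```python
# Empirical look at the average-case bound (illustrative, not from the
# post): for intervals [x, x+y], the short character sums exhibit
# square-root-size fluctuations for the vast majority of starting
# points x in Z/pZ.

def chi(n, p):
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

p, y, eps = 10007, 300, 0.5
vals = [chi(n, p) for n in range(p)]

def short_sum(x):
    """Sum of chi over the interval [x, x+y], wrapping around mod p."""
    return sum(vals[(x + h) % p] for h in range(y + 1))

bad = sum(1 for x in range(p) if abs(short_sum(x)) > eps * y)
print(bad, "exceptional intervals out of", p)
```

Typical short sums here have size around $\sqrt{y} \approx 17$, far below the threshold $\varepsilon y = 150$, so exceptional $x$ are (empirically) extremely rare.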
We now sketch why (6) holds. We can write the left-hand side of (6) as

$\displaystyle p\, \mathbb{E}_{x \in \mathbb{Z}/p\mathbb{Z}} \left| \sum_{x \le n \le x+y} \chi(n) \right|^k,$

where $\mathbb{E}$ denotes averaging in $x$. Writing $n = x + h$ with $0 \le h \le y$, and using the periodicity of $\chi$, we can rewrite the left-hand side as

$\displaystyle p \sum_{0 \le h_1, \ldots, h_k \le y} \mathbb{E}_{x \in \mathbb{F}_p}\, \chi(x + h_1) \cdots \chi(x + h_k),$
where we have abused notation and identified the finite field $\mathbb{F}_p$ with $\mathbb{Z}/p\mathbb{Z}$.
By multiplicativity, $\chi(x+h_1) \cdots \chi(x+h_k) = \chi( (x+h_1) \cdots (x+h_k) )$. For the $k=2$ case, the required bound then follows from the estimate

$\displaystyle \left| \mathbb{E}_{x \in \mathbb{F}_p}\, \chi\big( (x+h_1)(x+h_2) \big) \right| = O\left( \frac{1}{p} \right) \hbox{ for } h_1 \ne h_2, \ \ \ \ (7)$

which is consistent with (and is in fact slightly stronger than) what one would get if $\chi$ was a random sign pattern; assuming this bound gives (6) for $k=2$ as required.
The bound (7) can be established by quite elementary means (as it comes down to counting points on the hyperbola $\{ (x,t) : t^2 = (x+h_1)(x+h_2) \}$, which can be done by transforming the hyperbola to be rectangular), but for larger values of $k$ we will need the more general estimate

$\displaystyle \left| \mathbb{E}_{x \in \mathbb{F}_p}\, \chi( f(x) ) \right| = O_k\left( \frac{1}{\sqrt{p}} \right) \ \ \ \ (8)$
whenever $f$ is a polynomial over $\mathbb{F}_p$ of degree $k$ which is not a constant multiple of a perfect square; this can be easily seen to give (6) for general $k$.
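One can test the Weil-type bound numerically. The sketch below (mine, not from the post) uses random monic cubics, which have odd degree and so are never constant multiples of perfect squares; every monic cubic obeys $|\sum_x \chi(f(x))| \le 2\sqrt{p} + O(1)$:

```python
import math
import random

# Numerical check of the Weil-type estimate (8) (illustrative, not from
# the post): for monic cubics f, the complete character sum of chi(f(x))
# over F_p has size O(sqrt(p)).

def chi(n, p):
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

p = 10007
random.seed(0)
sums = []
for _ in range(3):
    b, c, d = (random.randrange(p) for _ in range(3))
    s = sum(chi(x**3 + b * x * x + c * x + d, p) for x in range(p))
    sums.append(s)
print(sums, "vs Weil-type bound", 2 * math.isqrt(p) + 3)
```

(The $+3$ slack covers the degenerate non-squarefree cubics, whose sums are in fact $O(1)$.)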
The estimate (8) is equivalent to the assertion that the curve

$\displaystyle C := \{ (x,t) \in \mathbb{F}_p^2 : t^2 = f(x) \} \ \ \ \ (9)$

contains $p + O_k(\sqrt{p})$ points. This fact follows from a general theorem of Weil establishing the Riemann hypothesis for curves over function fields, but can also be deduced by a more elementary argument of Stepanov, using the polynomial method, which we now give here. (This arrangement of the argument is based on the exposition in the book by Iwaniec and Kowalski.)
By translating the $x$ variable we may assume that $f(0)$ is non-zero. The key lemma is the following. Assume $p$ large, and take $m$ to be an integer comparable to $\sqrt{p}$ (other values of this parameter are possible, but this is the optimal choice). All polynomials are understood to be over the field $\mathbb{F}_p$ (i.e. they lie in the polynomial ring $\mathbb{F}_p[x]$ or $\mathbb{F}_p[x,y]$), although indeterminate variables need not lie in this field.
Lemma 2 There exists a non-zero polynomial $P(x)$ of one indeterminate variable $x$ over $\mathbb{F}_p$ of degree at most $\frac{1}{2} mp + O_k(p)$ which vanishes to order at least $m$ at every point $x \in \mathbb{F}_p$ for which $f(x)$ is a quadratic residue.
Note from the factor theorem that $P$ can vanish to order at least $m$ at at most $\deg(P)/m \le \frac{1}{2} p + O_k(\sqrt{p})$ points, and so we see that $f(x)$ is an invertible quadratic residue for at most $\frac{1}{2} p + O_k(\sqrt{p})$ values of $x$. Multiplying $f$ by a quadratic non-residue and running the same argument, we also see that $f(x)$ is an invertible quadratic non-residue for at most $\frac{1}{2} p + O_k(\sqrt{p})$ values of $x$, and (8) (or the asymptotic $p + O_k(\sqrt{p})$ for the number of points in (9)) follows.
We now prove the lemma. The polynomial $P$ will be chosen to be of the form

$\displaystyle P(x) = f(x)^m \left( Q(x, x^p) + f(x)^{\frac{p-1}{2}} R(x, x^p) \right),$
where $Q(x,y), R(x,y)$ are polynomials of degree at most $p/4$ in the $x$ variable, and degree at most $\frac{m}{2} + A$ in the $y$ variable, where $A$ is a large constant (depending on $k$) to be chosen later (these parameters have been optimised for the argument that follows). Since $f$ has degree at most $k$, such a $P$ will have degree at most

$\displaystyle km + \frac{k(p-1)}{2} + \frac{p}{4} + \left( \frac{m}{2} + A \right) p = \frac{1}{2} mp + O_k(p)$
as required. We claim (for suitable choices of $Q, R$) that
- (a) The degrees are small enough that $P$ is a non-zero polynomial whenever $Q, R$ are not both zero; and
- (b) The degrees are large enough that there exists a non-trivial choice of $Q$ and $R$ such that $P$ vanishes to order at least $m$ whenever $x \in \mathbb{F}_p$ is such that $f(x)$ is a quadratic residue.
Claims (a) and (b) together establish the lemma.
We first verify (a). We can cancel off the initial factor $f(x)^m$, so that we need to show that $Q(x, x^p) + f(x)^{\frac{p-1}{2}} R(x, x^p)$ does not vanish when at least one of $Q, R$ is not vanishing. We may assume that $Q, R$ are not both divisible by a common non-constant polynomial, since we could cancel out such a common factor otherwise.
Suppose for contradiction that the polynomial $Q(x, x^p) + f(x)^{\frac{p-1}{2}} R(x, x^p)$ vanished, which implies that $Q(x,y) \equiv - f(x)^{\frac{p-1}{2}} R(x,y)$ modulo $y - x^p$. Squaring and multiplying by $f(x)$, we see that

$\displaystyle f(x)\, Q(x,y)^2 \equiv f(x)^p\, R(x,y)^2 \hbox{ modulo } y - x^p.$
But $f(x)^p = f(x^p)$ over $\mathbb{F}_p$, and $f(x^p) \equiv f(y)$ modulo $y - x^p$, by Fermat’s little theorem (applied to the coefficients of $f$). Observe that $f(x) Q(x,y)^2$ and $f(y) R(x,y)^2$ both have degree at most $p - 1$ in the $x$ variable, and so we can remove the modulus and conclude that $f(x) Q(x,y)^2 = f(y) R(x,y)^2$ over $\mathbb{F}_p[x,y]$. But this implies (by the fundamental theorem of arithmetic for $\mathbb{F}_p[x,y]$) that $f$ is a constant multiple of a square, a contradiction. (Recall that $f$ is not a constant multiple of a perfect square, and that $Q$ and $R$ are not both zero.)
We now verify (b). The required vanishing will follow if we can choose $Q, R$, not both zero, such that

$\displaystyle \left( \frac{d}{dx} \right)^j P(x_0) = 0 \hbox{ whenever } f(x_0) \hbox{ is an invertible quadratic residue} \ \ \ \ (10)$

for all $0 \le j < m$. (Of course, we cannot define derivatives using limits and Newton quotients in this finite characteristic setting, but we can still define derivatives of polynomials formally, thus for instance $\frac{d}{dx} x^n = n x^{n-1}$, and enjoy all the usual rules of calculus, such as the product rule and chain rule.)
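Formal differentiation is straightforward to implement; the toy snippet below (my illustration, not from the post) represents polynomials as coefficient lists over $\mathbb{F}_p$ and confirms that $x^p$ has vanishing derivative:

```python
# Formal derivatives over F_p (illustrative): represent a polynomial by
# its coefficient list [c_0, c_1, ...] and differentiate term by term,
# with all arithmetic mod p.  No limits are involved, yet facts like
# (x^p)' = 0 can be checked directly.

def deriv(coeffs, p):
    """Formal derivative of sum(c_i * x^i) over F_p."""
    return [(i * c) % p for i, c in enumerate(coeffs)][1:]

p = 7
x_to_p = [0] * p + [1]  # the monomial x^p
assert deriv(x_to_p, p) == [0] * p  # d/dx x^p = p x^(p-1) = 0 over F_p
print("d/dx x^p vanishes over F_p for p =", p)
```
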
Over $\mathbb{F}_p$, the polynomial $x^p$ has derivative $p x^{p-1} = 0$. If we then compute the derivative in (10) using many applications of the product and chain rule, we see that the left-hand side of (10) can be expressed in the form

$\displaystyle f(x_0)^{m-j} \left( Q_j(x_0, x_0^p) + f(x_0)^{\frac{p-1}{2}} R_j(x_0, x_0^p) \right),$
where $Q_j, R_j$ are polynomials of degree at most $p/4 + jk$ in the $x$ variable and at most $\frac{m}{2} + A$ in the $y$ variable, whose coefficients depend in some linear fashion on the coefficients of $Q$ and $R$. (The exact nature of this linear relationship will depend on $f$ and $j$, but this will not concern us.) Since we only need to evaluate this expression when $x_0^p = x_0$ and $f(x_0)^{\frac{p-1}{2}} = 1$ (by Fermat’s little theorem and Euler’s criterion), we thus see that we can verify (10) provided that the polynomial

$\displaystyle Q_j(x, x) + R_j(x, x)$
vanishes identically. This is a polynomial of degree at most

$\displaystyle \frac{p}{4} + km + \frac{m}{2} + A = \frac{p}{4} + O_{k,A}(\sqrt{p}),$
and there are $m$ possible values of $j$, so this leads to

$\displaystyle m \left( \frac{p}{4} + O_{k,A}(\sqrt{p}) \right) = \frac{mp}{4} + O_k(p) + O_A(\sqrt{p})$
linear constraints on the coefficients of $Q$ and $R$ to satisfy. On the other hand, the total number of these coefficients is

$\displaystyle 2 \left( \frac{p}{4} + 1 \right) \left( \frac{m}{2} + A + 1 \right) = \frac{mp}{4} + \frac{(A+1) p}{2} + O_A(\sqrt{p}).$
For $A$ large enough (depending on $k$), there are more coefficients than constraints, and so one can find a non-trivial choice of coefficients obeying the constraints (10), and (b) follows.
Remark 3 If one optimises all the constants here, one gets an upper bound of basically $O(k \sqrt{p})$ for the deviation in the number of points in (9). This is only a little worse than the sharp bound of $2g\sqrt{p}$ given from Weil’s theorem, where $g = O(k)$ is the genus; however, it is possible to boost the former bound to the latter by using a version of the tensor power trick (generalising from $\mathbb{F}_p$ to $\mathbb{F}_{p^n}$ and then letting $n \to \infty$) combined with the theory of Artin L-functions and the Riemann-Roch theorem. This is (very briefly!) sketched in the Tricki article on the tensor power trick.
Remark 4 Once again, the global estimate (8) is very sharp, but cannot penetrate below the square root barrier, in that one is allowed to have about $\sqrt{p}$ exceptional values of $x$ for which no cancellation exists. One expects that these exceptional values of $x$ in fact do not exist, but we do not know how to show this unless $y$ is larger than $p^{1/4}$ (so that the Burgess bounds apply).
— 3. The Burgess bound —
The average case bounds in the previous section give an alternate demonstration of a non-trivial estimate on (1) for $x \ge p^{1/2+\varepsilon}$ (with $\varepsilon > 0$ fixed), which is just a bit weaker than what the Pólya-Vinogradov inequality gives. Indeed, if such an estimate failed for such an $x$, thus

$\displaystyle \left| \sum_{1 \le n \le x} \chi(n) \right| \gg x,$
then by taking a small $\delta$ (e.g. $\delta := \varepsilon / 2$) and covering $[1,x]$ by intervals of length $y = p^\delta$, we see (from a first moment method argument) that

$\displaystyle \left| \sum_{x' \le n \le x'+y} \chi(n) \right| \gg y$
for a positive fraction of the $x'$ in $[1,x]$. But as $x \ge p^{1/2+\varepsilon}$, this gives far more than $O(\sqrt{p})$ exceptional values of $x'$, which contradicts the results of the previous section.
Burgess observed that by exploiting the multiplicativity of $\chi$ one last time to amplify the above argument, one can extend the range for which a non-trivial estimate on (1) can be proved from $x \ge p^{1/2+\varepsilon}$ to also cover the range $p^{1/4+\varepsilon} \le x \le p^{1/2}$. The idea is not to cover $[1,x]$ by intervals of length $y$, but rather by arithmetic progressions $\{a + r, a + 2r, \ldots, a + yr\}$ of length $y$, where $a = O(x)$ and $r = O(x/y)$. Another application of the first moment method then shows that

$\displaystyle \left| \sum_{1 \le j \le y} \chi(a + jr) \right| \gg y \ \ \ \ (11)$
for a positive fraction of the $a$ in $[1,x]$ and $r$ in $[1, x/y]$ (i.e. $\gg x^2/y$ such pairs $(a,r)$).
For technical reasons, it will be inconvenient if $a$ and $r$ have too large of a common factor, so we pause to remove this possibility. Observe that for any $d \ge 1$, the number of pairs $(a,r)$ which have $d$ as a common factor is $O( \frac{x}{d} \cdot \frac{x}{dy} ) = O( \frac{1}{d^2} \cdot \frac{x^2}{y} )$. As $\sum_d \frac{1}{d^2}$ is convergent, we may thus remove those pairs which have too large of a common factor, and assume that all pairs $(a,r)$ have common factor at most $O(1)$ (so $a, r$ are “almost coprime”).
Thus, after this removal, (11) still holds for $\gg x^2/y$ pairs $(a,r)$.
Now for a key observation: by multiplicativity, $\left| \sum_{1 \le j \le y} \chi(a+jr) \right| = \left| \chi(r) \sum_{1 \le j \le y} \chi(a r^{-1} + j) \right| = \left| \sum_{1 \le j \le y} \chi(x' + j) \right|$, where $x' := a r^{-1} \bmod p$; and the values of $x'$ arising from the pairs $(a,r)$ are mostly distinct. Indeed, suppose that two pairs $(a,r), (a',r')$ generated the same value of $x'$, thus $a r^{-1} = a' (r')^{-1} \bmod p$. This implies that $a r' = a' r \bmod p$. Since $a, a' = O(x)$ and $r, r' = O(x/y)$, we see that $ar', a'r$ do not exceed $O(x^2/y) = o(p)$, so we may remove the modulus and conclude that $a r' = a' r$. But since we are assuming that $a$ and $r$ are almost coprime (and, after a further first moment argument, we may also assume that $a \gg x$, say), we see that for each $(a,r)$ there are at most $O(1)$ pairs $(a',r')$ for which $a r' = a' r$. So the $x'$‘s here only overlap with multiplicity $O(1)$, and we conclude that (11) holds for $\gg x^2/y$ values of $x'$. But comparing this with the previous section, we obtain a contradiction unless $x^2 / y = O(\sqrt{p})$. Setting $y$ to be a sufficiently small power of $p$, we obtain Burgess’s result that (1) obeys a non-trivial estimate for $x \ge p^{1/4+\varepsilon}$ for any fixed $\varepsilon > 0$.
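The multiplicativity identity that converts progressions into intervals can be checked directly; in the sketch below (mine, not from the post; the values of $a, r, y$ are arbitrary choices) the two sums agree up to the unit factor $\chi(r)$:

```python
# The identity behind the amplification step (illustration): a character
# sum over the arithmetic progression a + r, a + 2r, ..., a + yr equals,
# up to the unit factor chi(r), a sum over a shifted interval starting
# at x' = a * r^{-1} mod p.

def chi(n, p):
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

p = 10007
a, r, y = 1234, 56, 100
x_shift = a * pow(r, -1, p) % p  # a * r^{-1} mod p (Python 3.8+)
ap_sum = sum(chi(a + j * r, p) for j in range(1, y + 1))
interval_sum = sum(chi(x_shift + j, p) for j in range(1, y + 1))
assert ap_sum == chi(r, p) * interval_sum
print("AP sum", ap_sum, "= chi(r) *", interval_sum)
```
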
Combining Burgess’s estimate with Vinogradov’s sieving trick we conclude the bound $n_p = O( p^{\frac{1}{4\sqrt{e}} + \varepsilon} )$ for all $\varepsilon > 0$, which is the best bound known today on the least quadratic non-residue except for refinements of the $\varepsilon$ error term.
Remark 5 There are many generalisations of this result, for instance to more general characters (with possibly composite conductor), or to the shifted sums (4). However, the $\frac{1}{4\sqrt{e}}$ type exponent has not been improved except with the assistance of powerful conjectures such as the generalised Riemann hypothesis.