A large portion of analytic number theory is concerned with the distribution of number-theoretic sets such as the primes, or quadratic residues in a certain modulus. At a local level (e.g. on a short interval $[x, x+o(x)]$), the behaviour of these sets may be quite irregular. However, in many cases one can understand the *global* behaviour of such sets on very large intervals (e.g. $[1,x]$) with reasonable accuracy (particularly if one assumes powerful additional conjectures, such as the Riemann hypothesis and its generalisations). For instance, in the case of the primes, we have the prime number theorem, which asserts that the number of primes in a large interval $[1,x]$ is asymptotically equal to $x/\log x$; in the case of quadratic residues modulo a prime $p$, it is clear that there are exactly $(p-1)/2$ such residues in $[1,p]$. With elementary arguments, one can also count statistics such as the number of pairs of consecutive quadratic residues; and with the aid of deeper tools such as the Weil sum estimates, one can count more complex patterns in these residues also (e.g. $k$-point correlations).

One is often interested in converting this sort of “global” information on long intervals into “local” information on short intervals. If one is interested in the behaviour on a *generic* or *average* short interval, then the question is still essentially a global one, basically because one can view a long interval as an average of a long sequence of short intervals. (This does not mean that the problem is automatically easy, because not every global statistic about, say, the primes is understood. For instance, we do not know how to rigorously establish the conjectured asymptotic for the number of twin primes in a long interval $[1,x]$, and so we do not fully understand the local distribution of the primes in a typical short interval.)

However, suppose that instead of understanding the *average-case* behaviour of short intervals, one wants to control the *worst-case* behaviour of such intervals (i.e. to establish bounds that hold for *all* short intervals, rather than *most* short intervals). Then it becomes substantially harder to convert global information to local information. In many cases one encounters a “square root barrier”, in which global information at scale $x$ (e.g. statistics on the interval $[1,x]$) cannot be used to say anything non-trivial about a fixed (and possibly worst-case) short interval at scales $\sqrt{x}$ or below. (Here we ignore factors of $\log x$ for simplicity.) The basic reason for this is that even randomly distributed sets in $[1,x]$ (which are basically the most uniform type of global distribution one could hope for) exhibit random fluctuations of size $\sqrt{x}$ or so in their global statistics (as can be seen for instance from the central limit theorem). Because of this, one could take a random (or pseudorandom) subset of $[1,x]$ and delete all the elements in a short interval of length $\sqrt{x}$, without anything suspicious showing up on the global statistics level; the edited set still has essentially the same global statistics as the original set. On the other hand, the worst-case behaviour of this set on a short interval has been drastically altered.

One stark example of this arises when trying to control the largest gap between consecutive prime numbers in a large interval $[1,x]$. There are convincing heuristics that suggest that this largest gap is of size $O(\log^2 x)$ (Cramér’s conjecture). But even assuming the Riemann hypothesis, the best upper bound on this gap is only of size $O(\sqrt{x} \log x)$, basically because of this square root barrier. This particular instance of the square root barrier is a significant obstruction to the current polymath project “Finding primes”.

On the other hand, in some cases one can use additional tricks to get past the square root barrier. The key point is that many number-theoretic sequences have special structure that distinguishes them from being exactly like random sets. For instance, quadratic residues have the basic but fundamental property that the product of two quadratic residues is again a quadratic residue. One way to use this sort of structure is to amplify bad behaviour in a single short interval into bad behaviour across many short intervals. Because of this amplification, one can sometimes get new worst-case bounds by tapping the average-case bounds.

In this post I would like to indicate a classical example of this type of amplification trick, namely Burgess’s bound on short character sums. To narrow the discussion, I would like to focus primarily on the following classical problem:

Problem 1. What are the best bounds one can place on the first quadratic non-residue $n_p$ in the interval $[1, p-1]$ for a large prime $p$?

(The first quadratic residue is, of course, $1$; the more interesting problem is the first quadratic non-residue.)

Probabilistic heuristics (presuming that each non-square integer has a 50-50 chance of being a quadratic residue) suggest that $n_p$ should have size $O(\log p)$, and indeed Vinogradov conjectured that $n_p \ll_\varepsilon p^\varepsilon$ for any $\varepsilon > 0$. Using the Pólya-Vinogradov inequality, one can get the bound $n_p \ll \sqrt{p} \log p$ (and can improve it to $n_p \ll \sqrt{p}$ using smoothed sums); combining this with a sieve theory argument (exploiting the multiplicative nature of quadratic residues) one can boost this to $n_p \ll p^{\frac{1}{2\sqrt{e}}+o(1)}$. Inserting Burgess’s amplification trick one can boost this to $n_p \ll_\varepsilon p^{\frac{1}{4\sqrt{e}}+\varepsilon}$ for any $\varepsilon > 0$. Apart from refinements to the $\varepsilon$ factor, this bound has stood for five decades as the “world record” for this problem, which is a testament to the difficulty in breaching the square root barrier.
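As a quick sanity check on these heuristics, one can compute $n_p$ directly for a few primes. The following is a minimal Python sketch (the helper name `least_nonresidue` is ours), evaluating the Legendre symbol via Euler's criterion $n^{(p-1)/2} \hbox{ mod } p$; it also checks the classical elementary bound $n_p < \sqrt{p}+1$:

```python
# Empirical look at the least quadratic non-residue n_p (illustrative sketch;
# `least_nonresidue` is our own helper name, not from the post).

def least_nonresidue(p):
    """Least n >= 2 with Legendre symbol (n|p) = -1, via Euler's criterion."""
    n = 2
    while pow(n, (p - 1) // 2, p) == 1:
        n += 1
    return n

for p in [10007, 100003, 1000003]:
    n_p = least_nonresidue(p)
    # classical elementary bound: n_p < sqrt(p) + 1; in practice n_p is tiny
    assert n_p < p ** 0.5 + 1
    print(p, n_p)
```

In practice $n_p$ is almost always a very small prime, in line with the probabilistic heuristic; the difficulty is entirely in the worst case.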

Note: in order not to obscure the presentation with technical details, I will be using asymptotic notation in a somewhat informal manner.

** — 1. Character sums — **

To approach the problem, we begin by fixing the large prime $p$ and introducing the Legendre symbol $\chi(n) := \left(\frac{n}{p}\right)$, defined to equal $0$ when $n$ is divisible by $p$, $+1$ when $n$ is an invertible quadratic residue modulo $p$, and $-1$ when $n$ is an invertible quadratic non-residue modulo $p$. Thus, for instance, $\chi(m^2) = +1$ for all $m$ coprime to $p$. One of the main reasons one wants to work with the function $\chi$ is that it enjoys two easily verified properties:

- $\chi$ is periodic with period $p$.
- One has the total multiplicativity property $\chi(nm) = \chi(n) \chi(m)$ for all integers $n, m$.

In the jargon of number theory, $\chi$ is a Dirichlet character with conductor $p$. Another important property of this character is of course the law of quadratic reciprocity, but this law is more important for the *average-case* behaviour of $n_p$ in $p$, whereas we are concerned here with the *worst-case* behaviour in $p$, and so we will not actually use this law here.
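Both properties are easy to check numerically; the sketch below (the helper `chi` is our own, computing the symbol via Euler's criterion) verifies periodicity and total multiplicativity for a small test prime:

```python
p = 101  # a small test prime

def chi(n):
    """Legendre symbol (n|p): returns 0, +1, or -1, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return r - p if r == p - 1 else r

# periodicity with period p
assert all(chi(n + p) == chi(n) for n in range(300))

# total multiplicativity chi(nm) = chi(n) chi(m)
assert all(chi(n * m) == chi(n) * chi(m) for n in range(40) for m in range(40))
```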

An obvious way to control $n_p$ is via the character sum

$\displaystyle \sum_{1 \le n \le x} \chi(n). \ \ \ \ \ (1)$

From the triangle inequality, we see that this sum has magnitude at most $x$. If we can then obtain a non-trivial bound of the form

$\displaystyle \sum_{1 \le n \le x} \chi(n) = o(x) \ \ \ \ \ (2)$

for some $x$, this forces the existence of a quadratic non-residue less than or equal to $x$, thus $n_p \le x$. So one approach to the problem is to bound the character sum (1).

As there are just as many residues as non-residues, the sum (1) is periodic with period $p$, and we obtain a trivial bound of $O(p)$ for the magnitude of the sum. One can achieve a non-trivial bound by Fourier analysis. One can expand

$\displaystyle \sum_{1 \le n \le x} \chi(n) = \sum_{\xi \in {\bf Z}/p{\bf Z}} \hat \chi(\xi) \sum_{1 \le n \le x} e^{2\pi i n \xi / p}$

where $\hat \chi(\xi)$ are the Fourier coefficients of $\chi$:

$\displaystyle \hat \chi(\xi) := \frac{1}{p} \sum_{n \in {\bf Z}/p{\bf Z}} \chi(n) e^{-2\pi i n \xi / p}.$

As there are just as many quadratic residues as non-residues, $\hat \chi(0) = 0$, so we may drop the $\xi = 0$ term. From summing the geometric series we see that

$\displaystyle \Big| \sum_{1 \le n \le x} e^{2\pi i n \xi / p} \Big| \ll \frac{1}{\| \xi / p \|}$

where $\|\theta\|$ is the distance from $\theta$ to the nearest integer ($0$ or $1$); inserting these bounds into (1) and summing what is essentially a harmonic series in $\xi$ we obtain

$\displaystyle \Big| \sum_{1 \le n \le x} \chi(n) \Big| \ll \Big( \sup_{\xi \neq 0} |\hat \chi(\xi)| \Big) p \log p. \ \ \ \ \ (3)$
Now, how big is $\hat \chi(\xi)$? Taking absolute values, we get a bound of $O(1)$, but this gives us something worse than the trivial bound. To do better, we use the Plancherel identity

$\displaystyle \sum_{\xi \in {\bf Z}/p{\bf Z}} |\hat \chi(\xi)|^2 = \frac{1}{p} \sum_{n \in {\bf Z}/p{\bf Z}} |\chi(n)|^2$

which tells us that

$\displaystyle \sum_{\xi \in {\bf Z}/p{\bf Z}} |\hat \chi(\xi)|^2 = \frac{p-1}{p} \le 1.$

This tells us that $\hat \chi(\xi)$ is of size $O(1/\sqrt{p})$ *on the average*, but does not immediately tell us anything new about the *worst-case* behaviour of $\hat \chi(\xi)$, which is what we need here. But now we use the multiplicative structure of $\chi$ to relate average-case and worst-case behaviour. Note that if $m$ is coprime to $p$, then the function $n \mapsto \chi(mn)$ is a scalar multiple of $n \mapsto \chi(n)$ by a quantity $\chi(m)$ of magnitude $1$; taking Fourier transforms, this implies that $\hat \chi(\xi/m)$ and $\hat \chi(\xi)$ also differ by this factor. In particular, $|\hat \chi(\xi/m)| = |\hat \chi(\xi)|$. As $m$ was arbitrary, we thus see that $|\hat \chi(\xi)|$ is constant for all $\xi$ coprime to $p$; in other words, the worst case is the *same* as the average case. Combining this with the Plancherel bound one obtains $\sup_{\xi \neq 0} |\hat \chi(\xi)| \le \frac{1}{\sqrt{p}}$, leading to the *Pólya-Vinogradov inequality*

$\displaystyle \Big| \sum_{1 \le n \le x} \chi(n) \Big| \ll \sqrt{p} \log p.$
(In fact, a more careful computation reveals the slightly sharper bound $|\sum_{1 \le n \le x} \chi(n)| \le \sqrt{p} \log p$; this is non-trivial for $x \gg \sqrt{p} \log p$.)
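The “worst case equals average case” phenomenon for $\hat\chi$ can be observed numerically for a small prime: $\hat\chi(0)$ vanishes, and $|\hat\chi(\xi)|$ equals exactly $1/\sqrt{p}$ for every non-zero frequency. An illustrative Python sketch (not part of the proof; `chi` and `chi_hat` are our own helpers):

```python
import cmath
import math

p = 103  # a small test prime

def chi(n):
    """Legendre symbol (n|p) via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return r - p if r == p - 1 else r

def chi_hat(xi):
    """Fourier coefficient (1/p) * sum_n chi(n) e(-n xi / p)."""
    return sum(chi(n) * cmath.exp(-2j * math.pi * n * xi / p)
               for n in range(p)) / p

assert abs(chi_hat(0)) < 1e-9  # mean zero: as many residues as non-residues
# |chi_hat(xi)| = 1/sqrt(p) for *every* non-zero xi, not just on average
assert all(abs(abs(chi_hat(xi)) - 1 / math.sqrt(p)) < 1e-9
           for xi in range(1, p))
```

This is of course just the classical evaluation of the magnitude of the Gauss sum in disguise.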

Remark 1. Up to logarithmic factors, this is consistent with what one would expect if $\chi$ fluctuated like a random sign pattern (at least for $x$ comparable to $p$; for smaller values of $x$, one expects instead a bound of the form $O(\sqrt{x})$, up to logarithmic factors). It is conjectured that the $\log p$ factor can be replaced with a $\log \log p$ factor, which would be consistent with the random fluctuation model and is best possible; this is known assuming the generalised Riemann hypothesis, but unconditionally the Pólya-Vinogradov inequality is still the best known. (See however this paper of Granville and Soundararajan for an improvement for non-quadratic characters $\chi$.)

A direct application of the Pólya-Vinogradov inequality gives the bound $n_p \ll \sqrt{p} \log p$. One can get rid of the logarithmic factor (which comes from the harmonic series arising from (3)) by replacing the sharp cutoff $1_{1 \le n \le x}$ by a smoother sum, which has a better behaved Fourier transform. But one can do better still by exploiting the multiplicativity of $\chi$ again, by the following trick of Vinogradov. Observe that not only does one have $\chi(n) = +1$ for all $n < n_p$, but also $\chi(n) = +1$ for any $n$ which is $(n_p-1)$-smooth, i.e. is the product of integers less than $n_p$. So even if $n_p$ is significantly less than $x$, one can show that the sum (1) is large if the majority of integers less than $x$ are $(n_p-1)$-smooth.

Since every integer less than $x$ is either $(n_p-1)$-smooth (in which case $\chi(n) = +1$), or divisible by a prime $q$ between $n_p$ and $x$ (in which case $\chi(n)$ is at least $-1$), we obtain the lower bound

$\displaystyle \sum_{1 \le n \le x} \chi(n) \ge \sum_{1 \le n \le x} 1 - 2 \sum_{n_p \le q \le x} \sum_{1 \le n \le x: q | n} 1$

where $q$ ranges over primes. Clearly, $\sum_{1 \le n \le x} 1 = x + O(1)$ and $\sum_{1 \le n \le x: q|n} 1 = \frac{x}{q} + O(1)$. The total number of primes less than $x$ is $O(x/\log x)$ by the prime number theorem, thus

$\displaystyle \sum_{1 \le n \le x} \chi(n) \ge x \Big( 1 - 2 \sum_{n_p \le q \le x} \frac{1}{q} \Big) - O\Big( \frac{x}{\log x} \Big).$

Using the classical asymptotic $\sum_{q \le x} \frac{1}{q} = \log \log x + C + o(1)$ for some absolute constant $C$ (which basically follows from the prime number theorem, but also has an elementary proof), we conclude that

$\displaystyle \sum_{1 \le n \le x} \chi(n) \ge x \Big( 1 - 2 \log \frac{\log x}{\log n_p} \Big) - o(x).$

If $x \le n_p^{\sqrt{e}-\varepsilon}$ for some fixed $\varepsilon > 0$, then the expression in brackets is bounded away from zero for $p$ large; in particular, this is incompatible with (2) for $p$ large enough. As a consequence, we see that if we have a bound of the form (2), then we can conclude $n_p \ll_\varepsilon x^{\frac{1}{\sqrt{e}}+\varepsilon}$; in particular, from the Pólya-Vinogradov inequality one has

$\displaystyle n_p \ll_\varepsilon p^{\frac{1}{2\sqrt{e}}+\varepsilon}$

for all $\varepsilon > 0$, or equivalently that $n_p \le p^{\frac{1}{2\sqrt{e}}+o(1)}$. (By being a bit more careful, one can refine the $p^{o(1)}$ loss to a power of $\log p$.)
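The smooth-number density entering this argument can be checked numerically. The sketch below uses the same observation as the argument above: for $1 \le u \le 2$, a non-$x^{1/u}$-smooth number up to $x$ has exactly one prime factor exceeding $x^{1/u}$, so a single sum over primes suffices. The limiting density is $1 - \log u$ (the Dickman function in this range); at finite $x$ the count is somewhat higher, so we only assert a loose window:

```python
import math

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    s = bytearray([1]) * (n + 1)
    s[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if s[i]:
            s[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if s[i]]

def smooth_density(x, u):
    """Fraction of n <= x that are x^(1/u)-smooth, for 1 <= u <= 2.
    Each non-smooth n <= x has exactly one prime factor q > x^(1/u),
    so it suffices to subtract sum_q floor(x/q) over such primes q."""
    y = x ** (1.0 / u)
    nonsmooth = sum(x // q for q in primes_up_to(x) if q > y)
    return (x - nonsmooth) / x

d = smooth_density(10 ** 5, 1.5)
# limiting density is 1 - log(1.5) ~ 0.595; finite-x value is a bit higher
assert 0.5 < d < 0.75
print(d, 1 - math.log(1.5))
```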

Remark 2. The estimates on the Gauss-type sums $\hat \chi(\xi)$ are sharp; nevertheless, they fail to penetrate the square root barrier in the sense that no non-trivial estimates are provided below the scale $\sqrt{p}$. One can also see this barrier using the Poisson summation formula, which basically gives a formula that (very roughly) takes the form

$\displaystyle \sum_n \chi(n) \psi\Big(\frac{n}{N}\Big) \approx \frac{N}{\sqrt{p}} \sum_m \chi(m) \hat \psi\Big(\frac{m}{p/N}\Big)$

for any nice cutoff $\psi$ and any scale $N$, and is basically the limit of what one can say about character sums using Fourier analysis alone. In particular, we see that the Pólya-Vinogradov bound is basically the Poisson dual of the trivial bound. The scale $N = \sqrt{p}$ is the crossing point where Poisson summation does not achieve any non-trivial modification of the scale parameter.

** — 2. Average-case bounds — **

The Pólya-Vinogradov bound establishes a non-trivial estimate (1) for $x$ significantly larger than $\sqrt{p}$. We are interested in extending (1) to shorter intervals.

Before we address this issue for a fixed interval $[1,x]$, we first study the *average-case* bound on short character sums. Fix a short length $H$, and consider the shifted sum

$\displaystyle \sum_{a+1 \le n \le a+H} \chi(n) \ \ \ \ \ (4)$

where $a$ is a parameter. The analogue of (1) for such intervals would be

$\displaystyle \sum_{a+1 \le n \le a+H} \chi(n) = o(H). \ \ \ \ \ (5)$

For very small $H$ (e.g. $H = p^\delta$ for some small $\delta > 0$), we do not know how to establish (5) for *all* $a$; but we can at least establish (5) for *almost all* $a \in \{1, \ldots, p\}$, with only about $O(\sqrt{p})$ exceptions (here we see the square root barrier again!).

More precisely, we will establish the moment estimates

$\displaystyle \frac{1}{p} \sum_{a=1}^{p} \Big| \sum_{a+1 \le n \le a+H} \chi(n) \Big|^{2k} \ll_k H^k + \frac{H^{2k}}{\sqrt{p}} \ \ \ \ \ (6)$

for any positive even integer $2k$. If $H$ is not too tiny, say $H \ge p^\delta$ for some $\delta > 0$, then by applying (6) for a sufficiently large $k$ and using Chebyshev’s inequality (or Markov’s inequality), we see (for any given $\varepsilon > 0$) that one has the non-trivial bound

$\displaystyle \Big| \sum_{a+1 \le n \le a+H} \chi(n) \Big| \le \varepsilon H$

for all but at most $O_{\delta,\varepsilon}(\sqrt{p})$ values of $a$.

To see why (6) is true, let us just consider the easiest case $k = 1$. Squaring, we expand (6) as

$\displaystyle \frac{1}{p} \sum_{a=1}^{p} \sum_{1 \le h, h' \le H} \chi(a+h) \chi(a+h') \ll H + \frac{H^2}{\sqrt{p}}.$

We can write $|\sum_{a+1 \le n \le a+H} \chi(n)|^2$ as $\sum_{1 \le h, h' \le H} \chi(a+h) \chi(a+h')$ (note that $\chi$ is real-valued). Interchanging the summations, and using the periodicity of $\chi$, we can rewrite the left-hand side as

$\displaystyle \sum_{1 \le h, h' \le H} \frac{1}{p} \sum_{a \in {\bf F}_p} \chi(a+h) \chi(a+h')$

where we have abused notation and identified the finite field ${\bf F}_p$ with $\{1, \ldots, p\}$.

For $h = h'$, the inner average is $\frac{p-1}{p} = O(1)$. For $h - h'$ non-zero, we claim the bound

$\displaystyle \Big| \frac{1}{p} \sum_{a \in {\bf F}_p} \chi(a+h) \chi(a+h') \Big| \ll \frac{1}{\sqrt{p}} \ \ \ \ \ (7)$

which is consistent with (and is in fact slightly stronger than) what one would get if $\chi$ was a random sign pattern; assuming this bound gives (6) for $k = 1$ as required.
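In fact, for the Legendre symbol the correlation sum in (7) can be evaluated exactly: by multiplicativity $\chi(a+h)\chi(a+h') = \chi((a+h)(a+h'))$, and a short computation shows that the unnormalised sum equals exactly $-1$ for every non-zero shift. A quick numerical check (Python sketch, with `chi` again computed via Euler's criterion):

```python
p = 97  # a small test prime

def chi(n):
    """Legendre symbol (n|p) via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return r - p if r == p - 1 else r

# pair correlation: sum over a full period is exactly -1 for any shift h != 0,
# far below the sqrt(p) size that a random sign pattern would exhibit
for h in range(1, p):
    assert sum(chi(a) * chi(a + h) for a in range(p)) == -1
```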

The bound (7) can be established by quite elementary means (as it comes down to counting points on the hyperbola $\{ (a,b): b^2 = (a+h)(a+h') \}$, which can be done by transforming the hyperbola to be rectangular), but for larger values of $k$ we will need the more general estimate

$\displaystyle \Big| \sum_{a \in {\bf F}_p} \chi(P(a)) \Big| \ll_k \sqrt{p} \ \ \ \ \ (8)$

whenever $P$ is a polynomial over ${\bf F}_p$ of degree $k$ which is not a constant multiple of a perfect square; this can be easily seen to give (6) for general $k$.

An equivalent form of (8) is that the hyperelliptic curve

$\displaystyle \{ (a, b) \in {\bf F}_p \times {\bf F}_p: b^2 = P(a) \} \ \ \ \ \ (9)$

contains $p + O_k(\sqrt{p})$ points. This fact follows from a general theorem of Weil establishing the Riemann hypothesis for curves over function fields, but can also be deduced by a more elementary argument of Stepanov, using the polynomial method, which we now give here. (This arrangement of the argument is based on the exposition in the book by Iwaniec and Kowalski.)

By translating the $a$ variable we may assume that $P(0)$ is non-zero. The key lemma is the following. Assume $p$ large, and take $m$ to be an integer comparable to $\sqrt{p}$ (other values of this parameter are possible, but this is the optimal choice). All polynomials are understood to be over the field ${\bf F}_p$ (i.e. they lie in the polynomial ring ${\bf F}_p[x]$), although indeterminate variables need not lie in this field.

Lemma 2. There exists a non-zero polynomial $Q(x)$ of one indeterminate variable $x$ over ${\bf F}_p$ of degree at most $\frac{1}{2} m p + O_k(m \sqrt{p})$ which vanishes to order at least $m$ at every point $x = a$ for which $P(a)$ is an invertible quadratic residue.

Note from the factor theorem that $Q$ can vanish to order at least $m$ at at most $\deg(Q)/m = \frac{1}{2} p + O_k(\sqrt{p})$ points, and so we see that $P(a)$ is an invertible quadratic residue for at most $\frac{1}{2} p + O_k(\sqrt{p})$ values of $a$. Multiplying $P$ by a quadratic non-residue and running the same argument, we also see that $P(a)$ is an invertible quadratic non-residue for at most $\frac{1}{2} p + O_k(\sqrt{p})$ values of $a$, and (8) (or the asymptotic $p + O_k(\sqrt{p})$ for the number of points in (9)) follows.
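Before turning to the proof, it may help to see (8)-(9) numerically. For a cubic $P$, the curve (9) is elliptic and the deviation from $p$ is at most $2\sqrt{p}$ (Hasse's bound). An illustrative Python check (the particular cubic is an arbitrary choice of ours with non-zero discriminant, hence squarefree):

```python
import math

p = 1009  # a test prime

def chi(n):
    """Legendre symbol (n|p) via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return r - p if r == p - 1 else r

# P(a) = a^3 + 2a + 1; discriminant -4*2^3 - 27*1^2 = -59, non-zero mod p
S = sum(chi(a ** 3 + 2 * a + 1) for a in range(p))

# each fibre b^2 = P(a) has 1 + chi(P(a)) points, so the affine point
# count of the curve is exactly p + S
points = p + S
assert abs(S) <= 2 * math.sqrt(p)  # Hasse / Weil bound for this cubic
print(points, p, S)
```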

We now prove the lemma. The polynomial $Q$ will be chosen to be of the form

$\displaystyle Q(x) = P(x)^m \big( R(x, x^p) + P(x)^{(p-1)/2} S(x, x^p) \big)$

where $R, S$ are polynomials of degree at most $\frac{m^2}{2}$ in the first variable, and degree at most $\frac{m}{2} + C$ in the second variable, where $C$ is a large constant (depending on $k$) to be chosen later (these parameters have been optimised for the argument that follows). Since $P(x)^{(p-1)/2}$ has degree at most $k(p-1)/2$, such a $Q$ will have degree at most

$\displaystyle \frac{1}{2} m p + O_k(m \sqrt{p})$

as required. We claim (for a suitable choice of the above parameters) that:

- (a) The degrees are small enough that $Q$ is a non-zero polynomial whenever $R, S$ are non-zero polynomials; and
- (b) The degrees are large enough that there exists a non-trivial choice of $R$ and $S$ such that $Q$ vanishes to order at least $m$ at every $a$ for which $P(a)$ is an invertible quadratic residue.

Claims (a) and (b) together establish the lemma.

We first verify (a). We can cancel off the initial factor $P(x)^m$, so that we need to show that $R(x, x^p) + P(x)^{(p-1)/2} S(x, x^p)$ does not vanish when at least one of $R, S$ is not vanishing. We may assume that $R, S$ are not both divisible by $P$, since we could cancel out a common factor of $P$ otherwise.

Suppose for contradiction that the polynomial vanished, which implies that $R(x, y) = -P(x)^{(p-1)/2} S(x, y)$ modulo $y - x^p$. Squaring and multiplying by $P(x)$, we see that

$\displaystyle P(x) R(x, y)^2 = P(x)^p S(x, y)^2 \hbox{ modulo } y - x^p.$

But $P(x)^p = P(x^p)$ over ${\bf F}_p$, and $P(x^p) = P(y)$ modulo $y - x^p$, by Fermat’s little theorem. Observe that $P(x) R(x, y)^2$ and $P(y) S(x, y)^2$ both have degree less than $p$ in the $x$ variable, and so we can remove the modulus and conclude that $P(x) R(x, y)^2 = P(y) S(x, y)^2$ over ${\bf F}_p$. But this implies (by the fundamental theorem of arithmetic for ${\bf F}_p[x, y]$) that $P$ is a constant multiple of a square, a contradiction. (Recall that $P(0)$ is non-zero, and that $R$ and $S$ are not both divisible by $P$.)
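The Frobenius identity $P(x)^p = P(x^p)$ over ${\bf F}_p$ (a consequence of the freshman's-dream binomial identity together with Fermat's little theorem on the coefficients) can be checked directly with naive polynomial arithmetic; a small Python sketch, with polynomials stored as coefficient lists indexed by degree:

```python
p = 7  # a small test prime

def poly_mul(A, B, p):
    """Multiply two coefficient lists modulo p."""
    C = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            C[i + j] = (C[i + j] + a * b) % p
    return C

def poly_pow(A, e, p):
    """Naive e-th power of a coefficient list modulo p."""
    R = [1]
    for _ in range(e):
        R = poly_mul(R, A, p)
    return R

P = [3, 1, 0, 2]  # 3 + x + 2x^3 over F_7 (an arbitrary test polynomial)
lhs = poly_pow(P, p, p)  # P(x)^7 over F_7
rhs = [0] * (p * (len(P) - 1) + 1)
for i, c in enumerate(P):  # P(x^7): coefficient c moves to degree 7*i
    rhs[p * i] = c % p
assert lhs == rhs
```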

Now we prove (b). Let $a$ be such that $P(a)$ is an invertible quadratic residue, thus $P(a)^{(p-1)/2} = 1$ by Fermat’s little theorem. To get vanishing to order $m$, we need

$\displaystyle \frac{d^j}{dx^j} Q(a) = 0 \ \ \ \ \ (10)$

for all $0 \le j < m$. (Of course, we cannot define derivatives using limits and Newton quotients in this finite characteristic setting, but we can still define derivatives of polynomials formally, thus for instance $\frac{d}{dx} x^n = n x^{n-1}$, and enjoy all the usual rules of calculus, such as the product rule and chain rule.)

Over ${\bf F}_p$, the polynomial $x^p$ has derivative $p x^{p-1} = 0$. If we then compute the derivative in (10) using many applications of the product and chain rule, we see that the left-hand side of (10) can be expressed in the form

$\displaystyle P(a)^{m-j} \big( R_j(a, a^p) + P(a)^{(p-1)/2} S_j(a, a^p) \big)$

where $R_j, S_j$ are polynomials of slightly larger degrees than $R, S$, whose coefficients depend in some linear fashion on the coefficients of $R$ and $S$. (The exact nature of this linear relationship will depend on $P$, but this will not concern us.) Since we only need to evaluate this expression when $a^p = a$ and $P(a)^{(p-1)/2} = 1$ (by Fermat’s little theorem), we thus see that we can verify (10) provided that the polynomial

$\displaystyle R_j(x, x) + S_j(x, x)$
vanishes identically. This is a polynomial of degree at most

$\displaystyle \frac{m^2}{2} + O_k(m)$

and there are $m$ possible values of $j$, so this leads to

$\displaystyle \frac{m^3}{2} + O_k(m^2)$

linear constraints on the coefficients of $R$ and $S$ to satisfy. On the other hand, the total number of these coefficients is

$\displaystyle 2 \Big( \frac{m^2}{2} + 1 \Big) \Big( \frac{m}{2} + C + 1 \Big) = \frac{m^3}{2} + (C + O(1)) m^2.$

For $C$ large enough, there are more coefficients than constraints, and so one can find a non-trivial choice of coefficients obeying the constraints (10), and (b) follows.

Remark 3. If one optimises all the constants here, one gets an upper bound of basically $O_k(\sqrt{p})$ for the deviation in the number of points in (9). This is only a little worse than the sharp bound of $2g\sqrt{p}$ given from Weil’s theorem, where $g$ is the genus of the curve; however, it is possible to boost the former bound to the latter by using a version of the tensor power trick (generalising ${\bf F}_p$ to ${\bf F}_{p^n}$ and then letting $n \rightarrow \infty$) combined with the theory of Artin L-functions and the Riemann-Roch theorem. This is (very briefly!) sketched in the Tricki article on the tensor power trick.

Remark 4. Once again, the global estimate (8) is very sharp, but cannot penetrate below the square root barrier, in that one is allowed to have about $\sqrt{p}$ exceptional values of $a$ for which no cancellation exists. One expects that these exceptional values of $a$ in fact do not exist, but we do not know how to show this unless $H$ is larger than $p^{1/4}$ (so that the Burgess bounds apply).

** — 3. The Burgess bound — **

The average case bounds in the previous section give an alternate demonstration of a non-trivial estimate (1) for $x \ge p^{1/2+\varepsilon}$, which is just a bit weaker than what the Pólya-Vinogradov inequality gives. Indeed, if (1) failed for such an $x$, thus

$\displaystyle \Big| \sum_{1 \le n \le x} \chi(n) \Big| \gg x,$

then by taking a small $H$ (e.g. $H = p^\delta$) and covering $[1,x]$ by intervals of length $H$, we see (from a first moment method argument) that

$\displaystyle \Big| \sum_{a+1 \le n \le a+H} \chi(n) \Big| \gg H$

for a positive fraction of the $a$ in $[1,x]$. But this contradicts the results of the previous section.

Burgess observed that by exploiting the multiplicativity of $\chi$ one last time to amplify the above argument, one can extend the range for which (1) can be proved from $x \ge p^{1/2+\varepsilon}$ to also cover the range $p^{1/4+\varepsilon} \le x \le p^{1/2+\varepsilon}$. The idea is not to cover $[1,x]$ by intervals of length $H$, but rather by *arithmetic progressions* $\{ b+a, b+2a, \ldots, b+Ha \}$ of length $H$, where $b = O(x)$ and $a = O(x/H)$. Another application of the first moment method then shows that

$\displaystyle \Big| \sum_{t=1}^{H} \chi(b + ta) \Big| \gg H$

for a positive fraction of the $b$ in $[1,x]$ and $a$ in $[1,x/H]$ (i.e. $\gg x^2/H$ such pairs $(a,b)$).

For technical reasons, it will be inconvenient if $a$ and $b$ have too large of a common factor, so we pause to remove this possibility. Observe that for any $q \ge 1$, the number of pairs $(a,b)$ which have $q$ as a common factor is $O(\frac{x^2}{H q^2})$. As $\sum_q \frac{1}{q^2}$ is convergent, we may thus remove those pairs which have too large of a common factor, and assume that all pairs $(a,b)$ have common factor at most $O(1)$ (so $a, b$ are “almost coprime”).

Now we exploit the multiplicativity of $\chi$ to write $\chi(b + ta)$ as $\chi(a) \chi(t + s)$, where $s$ is a residue which is equal to $b a^{-1}$ mod $p$. Dividing out by $\chi(a)$, we conclude that

$\displaystyle \Big| \sum_{t=1}^{H} \chi(t + s) \Big| \gg H. \ \ \ \ \ (11)$

Now for a key observation: the values of $s$ arising from the pairs $(a,b)$ are mostly disjoint. Indeed, suppose that two pairs $(a,b), (a',b')$ generated the same value of $s$, thus $b a^{-1} = b' (a')^{-1}$ mod $p$. This implies that $b a' = b' a$ mod $p$. Since $b a'$ and $b' a$ do not exceed $x^2/H < p$, we may remove the modulus and conclude that $b a' = b' a$. But since we are assuming that $a, b$ and $a', b'$ are almost coprime, we see that for each $(a,b)$ there are at most $O(1)$ values of $(a',b')$ for which $b a' = b' a$. So the $s$‘s here only overlap with multiplicity $O(1)$, and we conclude that (11) holds for $\gg x^2/H$ values of $s$. But comparing this with the previous section, we obtain a contradiction unless $x^2/H \ll \sqrt{p}$. Setting $H$ to be a sufficiently small power of $p$, we obtain Burgess’s result that (1) holds for $x \ge p^{1/4+\varepsilon}$ for any fixed $\varepsilon > 0$.
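The dilation identity underlying this step, $\chi(b + ta) = \chi(a)\chi(t + ba^{-1})$, is immediate from multiplicativity and periodicity, and easy to verify numerically (Python sketch; the specific values of $a$, $b$ are arbitrary test choices):

```python
p = 101  # a small test prime

def chi(n):
    """Legendre symbol (n|p) via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return r - p if r == p - 1 else r

a, b = 7, 30
s = b * pow(a, p - 2, p) % p  # s = b * a^{-1} mod p, via Fermat's little theorem
for t in range(1, p):
    # chi(b + ta) = chi(a (t + a^{-1} b)) = chi(a) chi(t + s)
    assert chi(b + t * a) == chi(a) * chi(t + s)
```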

Combining Burgess’s estimate with Vinogradov’s sieving trick we conclude the bound $n_p \ll_\varepsilon p^{\frac{1}{4\sqrt{e}}+\varepsilon}$ for all $\varepsilon > 0$, which is the best bound known today on the least quadratic non-residue except for refinements of the $p^\varepsilon$ error term.

Remark 5. There are many generalisations of this result, for instance to more general characters $\chi$ (with possibly composite conductor), or to shifted sums (4). However, the $p^{1/4}$ type exponent has not been improved except with the assistance of powerful conjectures, such as the generalised Riemann hypothesis.
