One of the biggest deficiencies with my previous result is the fact that the averaged Navier-Stokes equation does not enjoy any good equation for the vorticity , in contrast to the true Navier-Stokes equations which, when written in vorticity-stream formulation, become

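For reference, the vorticity-stream form of the three-dimensional Navier–Stokes equations can be written (normalisations mine, and possibly differing slightly from the original display) as

```latex
\partial_t \omega + (u \cdot \nabla)\omega
  = (\omega \cdot \nabla) u + \nu \Delta \omega,
\qquad
u = (-\Delta)^{-1} \nabla \times \omega,
```

where \omega = \nabla \times u is the vorticity, \nu > 0 is the viscosity, and the second relation (the Biot–Savart law) recovers the divergence-free velocity field from its vorticity; the vortex-stretching term (\omega \cdot \nabla) u is the distinctive feature of this formulation.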
(Throughout this post we will be working in three spatial dimensions .) So one of my main near-term goals in this area is to exhibit an equation resembling Navier-Stokes as much as possible which enjoys a vorticity equation, and for which there is finite time blowup.

Heuristically, this task should be easier for the Euler equations (i.e. the zero viscosity case of Navier-Stokes) than the viscous Navier-Stokes equation, as one expects the viscosity to only make it easier for the solution to stay regular. Indeed, morally speaking, the assertion that finite time blowup solutions of Navier-Stokes exist should be roughly equivalent to the assertion that finite time blowup solutions of Euler exist which are “Type I” in the sense that all Navier-Stokes-critical and Navier-Stokes-subcritical norms of this solution go to infinity (which, as explained in the above slides, heuristically means that the effects of viscosity are negligible when compared against the nonlinear components of the equation). In vorticity-stream formulation, the Euler equations can be written as

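One standard way to write this (my notation) is as the inviscid case of the vorticity-stream formulation:

```latex
\partial_t \omega + (u \cdot \nabla)\omega = (\omega \cdot \nabla) u,
\qquad
u = (-\Delta)^{-1} \nabla \times \omega .
```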
As discussed in this previous blog post, a natural generalisation of this system of equations is the system

where is a linear operator on divergence-free vector fields that is “zeroth order” in some sense; ideally it should also be invertible, self-adjoint, and positive definite (in order to have a Hamiltonian that is comparable to the kinetic energy ). (In the previous blog post, it was observed that the surface quasi-geostrophic (SQG) equation could be embedded in a system of the form (1).) The system (1) has many features in common with the Euler equations; for instance vortex lines are transported by the velocity field , and Kelvin’s circulation theorem is still valid.
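For orientation, recall Kelvin’s circulation theorem in the Euler case (stated in my notation): if C(t) is any closed loop transported by the velocity field u, then the circulation around it is conserved,

```latex
\frac{d}{dt} \oint_{C(t)} u \cdot d\ell = 0 .
```

The point being made in the text is that this conservation law survives for the generalised system (1).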

So far, I have not been able to fully achieve this goal. However, I have the following partial result, stated somewhat informally:

Theorem 1 There is a “zeroth order” linear operator (which, unfortunately, is not invertible, self-adjoint, or positive definite) for which the system (1) exhibits smooth solutions that blow up in finite time.

The operator constructed is not quite a zeroth-order pseudodifferential operator; it is instead merely in the “forbidden” symbol class , and more precisely it takes the form

for some compactly supported divergence-free of mean zero with

being rescalings of . This operator is still bounded on all spaces , and so is arguably still a zeroth order operator, though not as convincingly as I would like. Another, less significant, issue with the result is that the solution constructed does not have good spatial decay properties, but this is mostly for convenience and it is likely that the construction can be localised to give solutions that have reasonable decay in space. But the biggest drawback of this theorem is the fact that is not invertible, self-adjoint, or positive definite, so in particular there is no non-negative Hamiltonian for this equation. It may be that some modification of the arguments below can fix these issues, but I have so far been unable to do so. Still, the construction does show that the circulation theorem is insufficient by itself to prevent blowup.

We sketch the proof of the above theorem as follows. We use the barrier method, introducing the time-varying hyperboloid domains

for (expressed in cylindrical coordinates ). We will select initial data to be for some non-negative even bump function supported on , normalised so that

in particular is divergence-free supported in , with vortex lines connecting to . Suppose for contradiction that we have a smooth solution to (1) with this initial data; to simplify the discussion we assume that the solution behaves well at spatial infinity (this can be justified with the choice (2) of vorticity-stream operator, but we will not do so here). Since the domains disconnect from at time , there must exist a time which is the first time where the support of touches the boundary of , with supported in .

From (1) we see that the support of is transported by the velocity field . Thus, at the point of contact of the support of with the boundary of , the inward component of the velocity field cannot exceed the inward velocity of . We will construct the functions so that this is not the case, leading to the desired contradiction. (Geometrically, what is going on here is that the operator is pinching the flow to pass through the narrow cylinder , leading to a singularity by time at the latest.)

First we observe from conservation of circulation, and from the fact that is supported in , that the integrals

are constant in both space and time for . From the choice of initial data we thus have

for all and all . On the other hand, if is of the form (2) with for some bump function that only has -components, then is divergence-free with mean zero, and

where . We choose to be supported in the slab for some large constant , and to equal a function depending only on on the cylinder , normalised so that . If , then passes through this cylinder, and we conclude that

Inserting this into (2), (1) we conclude that

for some coefficients . We will not be able to control these coefficients , but fortunately we only need to understand on the boundary , for which . So, if happens to be supported on an annulus , then vanishes on if is large enough. We then have

on the boundary of .

Let be a function of the form

where is a bump function supported on that equals on . We can perform a dyadic decomposition where

where is a bump function supported on with . If we then set

then one can check that for a function that is divergence-free and mean zero, and supported on the annulus , and

so on (where ) we have

One can manually check that the inward velocity of this vector on exceeds the inward velocity of if is large enough, and the claim follows.

Remark 2 The type of blowup suggested by this construction, where a unit amount of circulation is squeezed into a narrow cylinder, is of “Type II” with respect to the Navier-Stokes scaling, because Navier-Stokes-critical norms such as (or at least ) look like they stay bounded during this squeezing procedure (the velocity field is of size about in cylinders of radius and length about ). So even if the various issues with are repaired, it does not seem likely that this construction can be directly adapted to obtain a corresponding blowup for a Navier-Stokes type equation. To get a “Type I” blowup that is consistent with Kelvin’s circulation theorem, it seems that one needs to coil the vortex lines around a loop multiple times in order to get increased circulation in a small space. This seems to me to be possible to pull off – there don’t appear to be any unavoidable obstructions coming from topology, scaling, or conservation laws – but it would require a more complicated construction than the one given above.

Filed under: expository, math.AP Tagged: incompressible Euler equations

Filed under: advertising Tagged: ipam

(expanded in terms of Taylor series in ). Comments on the problem should be placed in the polymath blog post; if there is enough interest, we can start a formal polymath project on it.

Filed under: advertising, polymath Tagged: Dinesh Thakur, irreducible polynomials

Theorem 1 (Two-dimensional Vinogradov main conjecture) One has as .

This particular case of the main conjecture has a classical proof using some elementary number theory. Indeed, the left-hand side can be viewed as the number of solutions to the system of equations

with . These two equations can combine (using the algebraic identity applied to ) to imply the further equation

which, when combined with the divisor bound, shows that each is associated to choices of excluding diagonal cases when two of the collide, and this easily yields Theorem 1. However, the Bourgain-Demeter-Guth argument (which, in the two dimensional case, is essentially contained in a previous paper of Bourgain and Demeter) does not require the divisor bound, and extends for instance to the more general case where ranges in a -separated set of reals between and .
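This elementary argument is easy to sanity-check by brute force (an illustration of mine, not from the original text; the function name is hypothetical). Equal sums together with equal sums of squares give equal products, so the two equations force the multiset {x_1, x_2} to equal {y_1, y_2}; hence the number of solutions with 1 ≤ x_i, y_i ≤ N is exactly 2N² − N:

```python
from itertools import product

def vinogradov_count_2d(N: int) -> int:
    """Count solutions of x1+x2 = y1+y2 and x1^2+x2^2 = y1^2+y2^2
    with 1 <= x1, x2, y1, y2 <= N, by brute force."""
    rng = range(1, N + 1)
    return sum(
        1
        for x1, x2, y1, y2 in product(rng, repeat=4)
        if x1 + x2 == y1 + y2 and x1**2 + x2**2 == y1**2 + y2**2
    )

# (y1, y2) must be a permutation of (x1, x2): two choices off the
# diagonal x1 != x2, one choice on the diagonal, giving 2N^2 - N.
for N in (2, 5, 10):
    assert vinogradov_count_2d(N) == 2 * N * N - N
```

In particular the count is O(N²), reflecting the near-diagonal structure of the solutions.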

In this special case, the Bourgain-Demeter argument simplifies, as the lower dimensional inductive hypothesis becomes a simple almost orthogonality claim, and the multilinear Kakeya estimate needed is also easy (collapsing to just Fubini’s theorem). Also one can work entirely in the context of the Vinogradov main conjecture, and not turn to the increased generality of decoupling inequalities (though this additional generality is convenient in higher dimensions). As such, I am presenting this special case as an introduction to the Bourgain-Demeter-Guth machinery.

We now give the specialisation of the Bourgain-Demeter argument to Theorem 1. It will suffice to establish the bound

for all , (where we keep fixed and send to infinity), as the bound then follows by combining the above bound with the trivial bound . Accordingly, for any and , we let denote the claim that

as . Clearly, for any fixed , holds for some large , and it will suffice to establish

Proposition 2 Let , and let be such that holds. Then there exists such that holds.

Indeed, this proposition shows that for , the infimum of the for which holds is zero.

We prove the proposition below the fold, using a simplified form of the methods discussed in the previous blog post. To simplify the exposition we will be a bit cavalier with the uncertainty principle, for instance by essentially ignoring the tails of rapidly decreasing functions.

Henceforth we fix and , and assume that holds. For any interval , let denote the exponential sum

this function is periodic with respect to the lattice and can thus also be thought of as a function on the torus . The hypothesis , is then asserting that

A Galilean rescaling argument (noting that the Galilean transform used lies in ) then shows that

for any interval of length going to infinity as .

for some . We first observe that it will suffice to show the apparently weaker *bilinear estimate*

whenever are disjoint intervals in that are separated by . Indeed, suppose the bilinear estimate (4) held for all . If we define the quantity

then by decomposing into intervals of length about , with a moderately large natural number, we can use the triangle inequality to bound

By (4), the contribution of those with is . On the other hand, by Hölder’s inequality and affine rescaling, the contribution of the near-diagonal with is . This gives the inequality

and by taking to be a sufficiently large constant (depending on ) and using a trivial bound for small , one can obtain the bound , which gives (3). Thus it suffices to show (4).

Let be as in (4). For any fixed and , we let denote the best constant for which one has the bound

as , where for , ranges over a partition of into intervals of length , and

is the local norm of near , where is the rectangle

(Actually, to make the argument below work rigorously we have to replace the indicator by a smoothed out variant , but to simplify the exposition we shall simply ignore this technical issue.) The function has Fourier support in the rectangle , and so by uncertainty principle heuristics one morally has (ignoring the technical issue alluded to above) a pointwise bound of the form

for any . We will shortly establish the inequality

for any and for any that is sufficiently small depending on ; inserting this bound into (5) for a suitably large and sufficiently small gives the desired bound (4).

It remains to establish (6). This will follow from the following claims.

Proposition 3 For sufficiently small , we have

- (i) (Hölder) The functions and are convex non-increasing in .
- (ii) (Rescaled induction hypothesis) We have .
- (iii) ( decoupling) We have .
- (iv) (Bilinear Kakeya) We have .

Let us now see why this proposition implies (6) for all . From the proposition we have

which gives the claim for . To increase , assume that (6) already holds for some value of , then by Proposition 3(iii) we have

for sufficiently small . On the other hand, from (ii) we have . Interpolating using (i) and the hypothesis , we have

for sufficiently small and for some depending only on . Applying (iv) followed by (i) we conclude that (6) holds with replaced by . Iterating this, we can obtain (6) for arbitrarily large , as required.

The claim (i) is an easy application of Hölder’s inequality; we now turn to the more interesting claims (ii), (iii), (iv).

** — 1. Rescaled induction hypothesis — **

To prove (ii), we need to show

where ranges over a partition of into intervals of length , and similarly for . By Hölder’s inequality it suffices to show that

for . Since , we can use Minkowski’s inequality to conclude that

and the claim then follows from (2) (since there are intervals to sum over).

** — 2. decoupling — **

To prove (iii), it will suffice to show that

where the and are partitions of into intervals of length and respectively. This will follow from the pointwise estimates

for any , any and any interval of length (assuming the intervals are nicely nested in some dyadic fashion for simplicity). This expands as

where is a rectangle of dimensions roughly with sides parallel to the coordinate axes. Without the localisation to , this would be immediate from the orthogonality of the . Morally, the localisation to introduces a Fourier uncertainty by a rectangle of dimensions roughly . But the frequencies that the are Fourier supported in are essentially disjoint in even up to this uncertainty, so the global orthogonality of the should localise to the scale of the rectangle . (This can be made rigorous using suitable smoothed approximants to the indicator of , but we omit this technical detail here.)

** — 3. Bilinear Kakeya — **

To prove (iv), it will suffice to show that

as , where ranges over a partition of into intervals of length . By averaging, it suffices to show that

whenever is a rectangle of dimensions essentially with sides parallel to the axes. If we set , then we morally have

on , and so the estimate will follow if we can show that

(As before, to be rigorous we need to replace the localisation with a smoother weight , but we ignore this technicality here.) We now apply a logarithmic pigeonholing (conceding a factor of ) to restrict to a set in which all the means are comparable to each other, and similarly to restrict to a set where the means are comparable to each other. We can then normalise so that

for all surviving , so it now suffices to show that

Since , we have

for , so it suffices to show that

By the triangle inequality, it suffices to show that

Recall that is a rectangle of dimensions about . As each is an interval of length about , we see from the uncertainty principle that the are essentially constant along rectangles of length that fit inside the rectangle in a certain orientation (depending on the location of , and oriented within of the horizontal). Thus the functions also exhibit similar behaviour, and can be essentially written within as

for some non-negative coefficients and some rectangles in . The estimate (7) then takes the form

so it would suffice (since ) to show that

for any rectangles associated to intervals from respectively. But the transversality of ensures that these rectangles make an angle of with each other, and the claim follows from simple geometry ( behaves like a rectangle of dimensions ).

Filed under: expository, math.CA, math.NT Tagged: induction on scales, Vinogradov main conjecture

However, when the all “oscillate in different ways”, one expects to improve substantially upon the triangle inequality. For instance, if is a Hilbert space and the are mutually orthogonal, we have the Pythagorean theorem

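In symbols (writing H for the Hilbert space; this is a reconstruction of the standard statement):

```latex
\Big\| \sum_{i \in I} f_i \Big\|_{H}
  = \Big( \sum_{i \in I} \| f_i \|_{H}^2 \Big)^{1/2}.
```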
For sake of comparison, from the triangle inequality and Cauchy-Schwarz one has the general inequality

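That is (writing X for the Banach space and I for the index set):

```latex
\Big\| \sum_{i \in I} f_i \Big\|_{X}
\;\le\; \sum_{i \in I} \| f_i \|_{X}
\;\le\; |I|^{1/2} \Big( \sum_{i \in I} \| f_i \|_{X}^2 \Big)^{1/2},
```

the first step being the triangle inequality and the second Cauchy–Schwarz.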
for any finite collection in any Banach space , where denotes the cardinality of . Thus orthogonality in a Hilbert space yields “square root cancellation”, saving a factor of or so over the trivial bound coming from the triangle inequality.

More generally, let us somewhat informally say that a collection exhibits *decoupling in * if one has the Pythagorean-like inequality

for any , thus one obtains almost the full square root cancellation in the norm. The theory of *almost orthogonality* can then be viewed as the theory of decoupling in Hilbert spaces such as . In spaces for one usually does not expect this sort of decoupling; for instance, if the are disjointly supported one has

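For disjointly supported f_i, the p-th powers of the L^p norms simply add (notation mine):

```latex
\Big\| \sum_{i \in I} f_i \Big\|_{L^p}
  = \Big( \sum_{i \in I} \| f_i \|_{L^p}^p \Big)^{1/p},
```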
and the right-hand side can be much larger than when . At the opposite extreme, one usually does not expect to get decoupling in , since one could conceivably align the to all attain a maximum magnitude at the same location with the same phase, at which point the triangle inequality in becomes sharp.

However, in some cases one can get decoupling for certain . For instance, suppose we are in , and that are *bi-orthogonal* in the sense that the products for are pairwise orthogonal in . Then we have

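One way to carry out this computation (a sketch in my notation, assuming the products f_i f_j, including the diagonal ones, are pairwise orthogonal in L^2):

```latex
\Big\| \sum_i f_i \Big\|_{L^4}^4
= \Big\| \Big( \sum_i f_i \Big)^2 \Big\|_{L^2}^2
= \sum_{i,j} \| f_i f_j \|_{L^2}^2
\le \sum_{i,j} \| f_i \|_{L^4}^2 \, \| f_j \|_{L^4}^2
= \Big( \sum_i \| f_i \|_{L^4}^2 \Big)^2,
```

the middle equality using the orthogonality of the products and the inequality using Cauchy–Schwarz; taking fourth roots yields the square-root cancellation in L^4.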
giving decoupling in . (Similarly if each of the is orthogonal to all but of the other .) A similar argument also gives decoupling when one has tri-orthogonality (with the mostly orthogonal to each other), and so forth. As a slight variant, Khintchine’s inequality also indicates that decoupling should occur for any fixed if one multiplies each of the by an independent random sign .

In recent years, Bourgain and Demeter have been establishing *decoupling theorems* in spaces for various key exponents of , in the “restriction theory” setting in which the are Fourier transforms of measures supported on different portions of a given surface or curve; this builds upon the earlier decoupling theorems of Wolff. In a recent paper with Guth, they established the following decoupling theorem for the curve parameterised by the polynomial curve

For any ball in , let denote the weight

which should be viewed as a smoothed out version of the indicator function of . In particular, the space can be viewed as a smoothed out version of the space . For future reference we observe a fundamental self-similarity of the curve : any arc in this curve, with a compact interval, is affinely equivalent to the standard arc .
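Concretely, with n denoting the ambient dimension (notation mine, following the standard setup rather than the stripped display), the curve and weight take the form

```latex
\gamma(t) := (t, t^2, \ldots, t^n) \quad (0 \le t \le 1),
\qquad
w_B(x) := \Big( 1 + \frac{|x - c_B|}{R} \Big)^{-C},
```

where c_B and R are the centre and radius of the ball B, and C is a fixed, sufficiently large exponent.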

Theorem 1 (Decoupling theorem) Let . Subdivide the unit interval into equal subintervals of length , and for each such , let be the Fourier transform of a finite Borel measure on the arc , where . Then the exhibit decoupling in for any ball of radius .

Orthogonality gives the case of this theorem. The bi-orthogonality type arguments sketched earlier only give decoupling in up to the range ; the point here is that we can now get a much larger value of . The case of this theorem was previously established by Bourgain and Demeter (who obtained in fact an analogous theorem for any curved hypersurface). The exponent (and the radius ) is best possible, as can be seen by the following basic example. If

where is a bump function adapted to , then standard Fourier-analytic computations show that will be comparable to on a rectangular box of dimensions (and thus volume ) centred at the origin, and exhibit decay away from this box, with comparable to

On the other hand, is comparable to on a ball of radius comparable to centred at the origin, so is , which is just barely consistent with decoupling. This calculation shows that decoupling will fail if is replaced by any larger exponent, and also if the radius of the ball is reduced to be significantly smaller than .

This theorem has the following consequence of importance in analytic number theory:

Corollary 2 (Vinogradov main conjecture) Let be integers, and let . Then

*Proof:* By the Hölder inequality (and the trivial bound of for the exponential sum), it suffices to treat the critical case , that is to say to show that

We can rescale this as

As the integrand is periodic along the lattice , this is equivalent to

The left-hand side may be bounded by , where and . Since

the claim now follows from the decoupling theorem and a brief calculation.

Using the Plancherel formula, one may equivalently (when is an integer) write the Vinogradov main conjecture in terms of solutions to the system of equations

but we will not use this formulation here.
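In standard notation (my normalisation), the count in question is

```latex
J_{s,n}(N) := \#\Big\{ 1 \le x_i, y_i \le N :
  x_1^j + \cdots + x_s^j = y_1^j + \cdots + y_s^j
  \ \text{for } j = 1, \ldots, n \Big\},
```

and the main conjecture predicts J_{s,n}(N) \ll_\varepsilon N^{s+\varepsilon} + N^{2s - \frac{n(n+1)}{2} + \varepsilon}, the two terms reflecting the diagonal solutions and the probabilistic heuristic respectively.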

A history of the Vinogradov main conjecture may be found in this survey of Wooley; prior to the Bourgain-Demeter-Guth theorem, the conjecture was solved completely for , or for and either below or above , with the bulk of recent progress coming from the *efficient congruencing* technique of Wooley. It has numerous applications to exponential sums, Waring’s problem, and the zeta function; to give just one application, the main conjecture implies the predicted asymptotic for the number of ways to express a large number as the sum of fifth powers (the previous best result required fifth powers). The Bourgain-Demeter-Guth approach to the Vinogradov main conjecture, based on decoupling, is ostensibly very different from the efficient congruencing technique, which relies heavily on the arithmetic structure of the problem, but it appears (as I have been told from second-hand sources) that the two methods are actually closely related, with the former being a sort of “Archimedean” version of the latter (with the intervals in the decoupling theorem being analogous to congruence classes in the efficient congruencing method); hopefully there will be some future work making this connection more precise. One advantage of the decoupling approach is that it generalises to non-arithmetic settings in which the set that is drawn from is replaced by some other similarly separated set of real numbers. (A random thought – could this allow the Vinogradov-Korobov bounds on the zeta function to extend to Beurling zeta functions?)

Below the fold we sketch the Bourgain-Demeter-Guth argument proving Theorem 1.

I thank Jean Bourgain and Andrew Granville for helpful discussions.

** — 1. Initial reductions — **

The claim will proceed by an induction on dimension, thus we assume henceforth that (the case being immediate from the Pythagorean theorem) and that Theorem 1 has already been proven for smaller values of . This has the following nice consequence:

Proposition 3 (Lower dimensional decoupling) Let the notation be as in Theorem 1. Suppose also that , and that Theorem 1 has already been proven for all smaller values of . Then for any , the exhibits decoupling in for any ball of radius .

*Proof:* (Sketch) We slice the ball into -dimensional slices parallel to the first coordinate directions. On each slice, the can be interpreted as functions on whose Fourier transforms lie on the curve , where . Applying Theorem 1 with replaced by , and then integrating over all slices using Fubini’s theorem and Minkowski’s inequality (to interchange the norm and the square function), we obtain the claim.

The first step, needed for technical inductive purposes, is to work at an exponent slightly below . More precisely, given any and , let denote the assertion that

whenever , , and are as in Theorem 1. Theorem 1 is then clearly equivalent to the claim holding for all . This turns out to be equivalent to the following variant:

Proposition 4 Let , and assume Theorem 1 has been established for all smaller values of . If is sufficiently close to , then holds for all .

The reason for this is that the functions and all have Fourier transform supported on a ball of radius , and so there is a Bernstein-type inequality that lets one replace the norm of either function by the norm, losing a power of that goes to zero as goes to . (See Corollary 6.2 and Lemma 8.2 of the Bourgain-Demeter-Guth paper for more details of this.)

Using the trivial bound (1) we see that holds for large (e.g. ). To reduce , it suffices to prove the following inductive claim.

Proposition 5 (Inductive claim) Let , and assume Theorem 1 has been established for all smaller values of . If is sufficiently close to , and holds for some , then holds for some .

Since the set of for which holds is clearly a closed half-infinite interval, Proposition 5 implies Proposition 4 and hence Theorem 1.

Henceforth we fix as in Proposition 5. We fix and use to denote any quantity that goes to zero as , keeping fixed. Then the hypothesis reads

The next step is to reduce matters to a “multilinear” version of the above estimate, in order to exploit a multilinear Kakeya estimate at a later stage of the argument. Let be a large integer depending only on (actually Bourgain, Demeter, and Guth choose ). It turns out that it will suffice to prove the multilinear version

whenever are families of disjoint subintervals on of length that are separated from each other by a distance of , and where denotes the geometric mean

We have the following nice equivalence (essentially due to Bourgain and Guth, building upon an earlier “bilinear equivalence” result of Vargas, Vega, and myself, and discussed in this previous blog post):

Proposition 6 (Multilinear equivalence) For any , the estimates (2) and (3) are equivalent.

*Proof:* The derivation of (3) from (2) is immediate from Hölder’s inequality. To obtain the converse implication, let denote the best constant in (2), thus is the smallest constant such that

The idea is to prove an inequality of the form

for any fixed integer (with the implied constant in the notation independent of ); by choosing large enough one can then prove by an inductive argument.

We partition the intervals in (2) into classes of consecutive intervals, so that can be expressed as where . Observe that for any , one either has

for some (i.e. one of the dominates the sum), or else one has

for some with the transversality condition . This leads to the pointwise inequality

Bounding the supremum by and then taking norms and using (3), we conclude that

On the other hand, applying an affine rescaling to (4) one sees that

and the claim follows. (A more detailed version of this argument may be found in Theorem 4.1 of this paper of Bourgain and Demeter.)

It thus suffices to show (3).

The next step is to set up some intermediate scales between and , in order to run an “induction on scales” argument. For any scale , any exponent , and any function , let denote the local average

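Presumably the local average in question is the standard one (my reconstruction, for a ball B(x,r) of radius r centred at x):

```latex
\Big( \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|^p \, dy \Big)^{1/p},
```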
where denotes the volume of (one could also use the equivalent quantity here if desired). For any exponents , , and (independent of ), let denote the least exponent for which one has the local decoupling inequality

for as in (3), where the -length intervals in have been covered by a family of finitely overlapping intervals of length , and . It is then not difficult to see that the estimate (3) is equivalent to the inequality

(basically because when , there is essentially only one for each , and is basically ; also, the averaging is essentially the identity when since all the and here have Fourier support on a ball of radius ). To put it another way, our task is now to show that

On the other hand, one can establish the following inequalities concerning the quantities , arranged roughly in increasing order of difficulty to prove.

Proposition 7 (Inequalities on ) Throughout this proposition it is understood that , , and .

- (i) (Hölder) The quantity is convex in , and monotone nondecreasing in .
- (ii) (Minkowski) If , then is monotone non-decreasing in .
- (iii) (Stability) One has . (In fact, is Lipschitz in uniformly in , but we will not need this.)
- (iv) (Rescaled decoupling hypothesis) If and , then one has .
- (v) (Lower dimensional decoupling) If and , then .
- (vi) (Multilinear Kakeya) If and , then .

We sketch the proof of the various parts of this proposition in later sections. For now, let us show how these properties imply the claim (6). In the paper of Bourgain, Demeter, and Guth, the above properties were iterated along a certain “tree” of parameters , relying on (v) to increase the parameter (which measures the amount of decoupling) and (vi) to “inflate” or increase the parameter (which measures the spatial scale at which decoupling has been obtained), and (i) to reconcile the different choices of appearing in (v) and (vi), with the remaining properties (ii), (iii), (iv) used to control various “boundary terms” arising from this tree iteration. Here, we will present an essentially equivalent “Bellman function” formulation of the argument which replaces this iteration by a carefully (but rather unmotivatedly) chosen inductive claim. More precisely, let be a small quantity (depending only on and ) to be chosen later. For any , let denote the claim that for every , and for all sufficiently small , one has the inequality

From Proposition 7 (i), (ii), (iv), we see that holds for some small . We will shortly establish the implication

for some independent of ; this implies upon iteration that holds for arbitrarily large values of . Applying (9) with for a sufficiently large and a sufficiently small , and combining with Proposition 7(iii), we obtain the claim (6).

We now prove the implication (10). Thus we assume (7) holds for , sufficiently small , and obeying (8), and also (9) for and we wish to improve this to

for the same range of and for sufficiently small , and also

By Proposition 7(i) it suffices to show this for the extreme values of , thus we wish to show that

We begin with (13). The case of this estimate is

But since , we see that if is small enough, so the right-hand side of (16) is greater than and the claim follows from Proposition 7(iv) (with a little bit of room to spare). Now we look at the cases of (13). By Proposition 7(vi), we have

For close to , lies between and , so from (7) one has

Since , one has

for small enough depending on , and (13) follows (if is small enough depending on but not on ).

The same argument applied with gives

Since , we thus have

if are sufficiently small depending on (but not on ). This, together with Proposition 7(i), gives (15).

Finally, we establish (14). From Proposition 7(v) (with replaced by ) we have

In the case, this gives

and the claim (14) follows from (15) in this case. Now suppose . Since is close to , lies between and , and so we may apply (7) to conclude that

and hence (after simplifying)

which gives (14) for small enough (depending on , but not on ).

** — 2. Rescaled decoupling — **

The claims (i), (ii), (iii) of Proposition 7 are routine applications of the Hölder and Minkowski inequalities (and also the Bernstein inequality, in the case of (iii)); we will focus on the more interesting claims (iv), (v), (vi).

Here we establish (iv). The main geometric point exploited here is that any segment of the curve is affinely equivalent to itself, with the key factor of in the bound coming from this affine rescaling.

Using the definition (5) of , we see that we need to show that

for balls of radius . By Hölder’s inequality, it suffices to show that

for each . By Minkowski’s inequality (and the fact that ), the left-hand side is at most

so it suffices to show that

for each . From Fubini’s theorem one has

so we reduce to showing that

But this follows by applying an affine rescaling to map to , and then using the hypothesis with replaced by . (The ball gets distorted into an ellipsoid, but one can check that this ellipsoid can be covered efficiently by finitely overlapping balls of radius , and so one can close the argument using the triangle inequality.)
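The affine rescaling in this step can be made explicit (a sketch in my notation): for an interval J = [a, a+\rho] \subset [0,1], substitute t = a + \rho s and expand each coordinate by the binomial theorem to obtain

```latex
\gamma(a + \rho s) = \gamma(a) + L_{a,\rho}\, \gamma(s),
\qquad
(L_{a,\rho})_{kj} = \binom{k}{j} a^{k-j} \rho^{j}
\quad (1 \le j \le k \le n),
```

where \gamma(t) = (t, t^2, \ldots, t^n). The matrix L_{a,\rho} is lower triangular with diagonal entries \rho, \rho^2, \ldots, \rho^n, hence invertible, so the arc over J is an invertible affine image of the full arc; this is the self-similarity driving the induction.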

** — 3. Lower dimensional decoupling — **

Now we establish (v). Here, the geometric point is the one implicitly used in Proposition 3, namely that the -dimensional curve projects down to the -dimensional curve for any .

Let be as in Proposition 7(v). From (5), it suffices to show that

for balls of radius . It will suffice to show the pointwise estimate

for any , or equivalently that

where . Clearly this will follow if we have

for each . Covering the intervals in by those in , it suffices to show that

for each . But this follows from Proposition 3.

** — 4. Multidimensional Kakeya — **

Finally, we establish (vi), which is the most substantial component of Proposition 7, and the only component which truly takes advantage of the reduction to the multilinear setting. Let and be such that . From (5), it suffices to show that

for balls of radius . By averaging, it suffices to establish the bound

for balls of radius . If we write , the right-hand side simplifies to

so it suffices to show that

At this point it is convenient to perform a dyadic pigeonholing (giving up a factor of ) to normalise, for each , all of the quantities to be of comparable size, after reducing the sets to some appropriate subset . (The contribution of those for which this quantity is less than, say, of the maximal value, can be safely discarded by trivial estimates.) By homogeneity we may then normalise

for all surviving , so the estimate now becomes

Since is close to , is less than , so we can estimate

and so it suffices to show that

or, on raising to the power ,

Localising to balls of radius , it suffices to show that

The arc is contained in a box of dimensions roughly , so by the uncertainty principle is essentially constant along boxes of dimensions (this can be made precise by standard methods, see e.g. the discussion in the proof of Theorem 5.6 of Bourgain-Demeter-Guth, or my general discussion on the uncertainty principle in this previous blog post). This implies that , when restricted to , is essentially constant on “plates”, defined as the intersection of with slabs that have dimensions of length and the remaining dimensions infinite (and thus restricted to be of length about after restriction to ). Furthermore, as varies (and is constrained to be in ), the orientation of these slabs varies in a suitably “transverse” fashion (the precise definition of this is a little technical, but can be verified for ; see the BDG paper for details). After rescaling, the claim then follows from the following proposition:

Proposition 8 (Multilinear Kakeya) For , let be a collection of “plates” that have dimensions of length , and dimensions that are infinite, and for each let be a non-negative number. Assume that the families of plates obey a suitable transversality condition. Then for any ball of radius .

The exponent here is natural, as can be seen by considering the example where each consists of about parallel disjoint plates passing through , with for all such plates.

For (where the plates now become tubes), this result was first obtained by Bennett, Carbery, and myself using heat kernel methods, with a rather different proof (also capturing the endpoint case) later given using algebraic topological methods by Guth (as discussed in this previous post). More recently, a very short and elementary proof of this theorem was given by Guth, which was initially given for but extends to general . The scheme of the proof can be described as follows.

- When all the plates in each family are parallel, the claim follows from the Loomis-Whitney inequality (when ) or a more general Brascamp-Lieb inequality of Bennett, Carbery, Christ, and myself (for general ). These inequalities can be proven by repeated applications of the Hölder inequality and Fubini’s theorem.
- Perturbing this, we can obtain the proposition with a loss of for any and , provided that the plates in each are within of being parallel, and is sufficiently small depending on and . (For the case of general , this requires some uniformity in the result of Bennett, Carbery, Christ, and myself, which can be obtained by hand in the specific case of interest here, but was recently established in general by Bennett, Bez, Flock, and Lee.)
- A standard “induction on scales” argument shows that if the proposition is true at scale with some loss , then it is also true at scale with loss . Iterating this, we see that we can obtain the proposition with a loss of uniformly for *all* , provided that the plates are within of being parallel and is sufficiently small depending now only on (and not on ).
- A finite partition of unity then suffices to remove the restriction of the plates being within of each other, and then sending to zero we obtain the claim.
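As a concrete (and entirely elementary) toy model of the first step above, here is a discrete analogue of the Loomis-Whitney inequality, checked numerically. This is my own illustration, not part of the original argument: for a finite set E in Z^3 with coordinate-plane projections E_xy, E_yz, E_xz, one has |E| <= (|E_xy| |E_yz| |E_xz|)^(1/2), which is the combinatorial shadow of the parallel-tube case of multilinear Kakeya.

```python
# Discrete toy version (my own) of the Loomis-Whitney inequality:
# |E| <= sqrt(|E_xy| * |E_yz| * |E_xz|) for finite E in Z^3.

from itertools import product

def loomis_whitney_check(E):
    """Return (|E|, Loomis-Whitney bound) for a finite set E of integer triples."""
    E = set(E)
    pxy = {(x, y) for x, y, z in E}   # shadow on the xy-plane
    pyz = {(y, z) for x, y, z in E}   # shadow on the yz-plane
    pxz = {(x, z) for x, y, z in E}   # shadow on the xz-plane
    return len(E), (len(pxy) * len(pyz) * len(pxz)) ** 0.5

if __name__ == "__main__":
    cube = list(product(range(5), repeat=3))   # a 5x5x5 cube: each shadow has 25 points
    print(loomis_whitney_check(cube))          # (125, 125.0): equality for boxes
```

Equality for boxes reflects the fact that the Loomis-Whitney inequality is saturated exactly when the set is a product in each coordinate.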

The proof of the decoupling theorem (and thus the Vinogradov main conjecture) is now complete.

Remark 9 The above arguments extend to give decoupling for the curve in for every . As it turns out (Bourgain, private communication), a variant of the argument also handles the range , and the range can be recovered from an induction on dimension (using the argument used to establish Proposition 3).

Filed under: expository, math.CA, math.NT Tagged: Ciprian Demeter, decoupling, induction on scales, Jean Bourgain, Larry Guth, multilinear Kakeya conjecture, restriction theorems, Vinogradov main conjecture

as , that is to say that exhibits cancellation on large intervals such as . This result can be improved to give cancellation on shorter intervals. For instance, using the known zero density estimates for the Riemann zeta function, one can establish that

as if for some fixed ; I believe this result is due to Ramachandra (see also Exercise 21 of this previous blog post), and in fact one could obtain a better error term on the right-hand side which, for instance, gained an arbitrary power of . On the Riemann hypothesis (or the weaker density hypothesis), it was known that the could be lowered to .

Early this year, there was a major breakthrough by Matomaki and Radziwill, who (among other things) showed that the asymptotic (1) was in fact valid for *any* with that went to infinity as , thus yielding cancellation on extremely short intervals. This has many further applications; for instance, this estimate, or more precisely its extension to other “non-pretentious” bounded multiplicative functions, was a key ingredient in my recent solution of the Erdös discrepancy problem, as well as in obtaining logarithmically averaged cases of Chowla’s conjecture, such as

It is of interest to twist the above estimates by phases such as the linear phase . In 1937, Davenport showed that

which of course improves the prime number theorem. Recently with Matomaki and Radziwill, we obtained a common generalisation of this estimate with (1), showing that

as , for any that went to infinity as . We were able to use this estimate to obtain an averaged form of Chowla’s conjecture.
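The cancellation phenomenon at the heart of the Matomaki-Radziwill theorem is easy to observe numerically. The following sketch is my own illustration (not from the paper under discussion): it computes the Liouville function lambda(n) = (-1)^Omega(n) by trial division and averages it over a short interval, where one already sees averages far smaller than the trivial bound of 1.

```python
# Numerical illustration (my own) of cancellation of the Liouville function
# lambda(n) = (-1)^Omega(n) on a short interval (x, x+H].

def liouville(n):
    """Compute lambda(n) = (-1)^Omega(n), with Omega counting prime factors
    with multiplicity, via trial division."""
    omega = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            omega += 1
        d += 1
    if n > 1:
        omega += 1
    return -1 if omega % 2 else 1

def short_interval_average(x, H):
    """Average of lambda(n) over the interval (x, x+H]."""
    return sum(liouville(n) for n in range(x + 1, x + H + 1)) / H

if __name__ == "__main__":
    print(short_interval_average(10**6, 1000))   # small compared to the trivial bound 1
```

The observed averages are of roughly square-root size in H, consistent with lambda behaving like a random sign pattern on most short intervals.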

In that paper, we asked whether one could improve this estimate further by moving the supremum inside the integral, that is to say to establish the bound

as , for any that went to infinity as . This bound is asserting that is locally Fourier-uniform on most short intervals; it can be written equivalently in terms of the “local Gowers norm” as

from which one can see that this is another averaged form of Chowla’s conjecture (stronger than the one I was able to prove with Matomaki and Radziwill, but a consequence of the unaveraged Chowla conjecture). If one inserted such a bound into the machinery I used to solve the Erdös discrepancy problem, it should lead to further averaged cases of Chowla’s conjecture, such as

though I have not fully checked the details of this implication. It should also have a number of new implications for sign patterns of the Liouville function, though we have not explored these in detail yet.

One can write (4) equivalently in the form

uniformly for all -dependent phases . In contrast, (3) is equivalent to the subcase of (6) when the linear phase coefficient is independent of . This dependency of on seems to necessitate some highly nontrivial additive combinatorial analysis of the function in order to establish (4) when is small. To date, this analysis has proven to be elusive, but I would like to record what one can do with more classical methods like Vaughan’s identity, namely:

Proposition 1 The estimate (4) (or equivalently (6)) holds in the range for any fixed . (In fact one can improve the right-hand side by an arbitrary power of in this case.)

The values of in this range are far too large to yield implications such as new cases of the Chowla conjecture, but it appears that the exponent is the limit of “classical” methods (at least as far as I was able to apply them), in the sense that one does not do any combinatorial analysis on the function , nor does one use modern equidistribution results on “Type III sums” that require deep estimates on Kloosterman-type sums. The latter may shave a little bit off of the exponent, but I don’t see how one would ever hope to go below without doing some non-trivial combinatorics on the function . UPDATE: I have come across this paper of Zhan which uses mean-value theorems for L-functions to lower the exponent to .

Let me now sketch the proof of the proposition, omitting many of the technical details. We first remark that known estimates on sums of the Liouville function (or similar functions such as the von Mangoldt function) in short arithmetic progressions, based on zero-density estimates for Dirichlet -functions, can handle the “major arc” case of (4) (or (6)) where is restricted to be of the form for (the exponent here being of the same numerology as the exponent in the classical result of Ramachandra, tied to the best zero density estimates currently available); for instance a modification of the arguments in this recent paper of Koukoulopoulos would suffice. Thus we can restrict attention to “minor arc” values of (or , using the interpretation of (6)).

Next, one breaks up (or the closely related Möbius function) into Dirichlet convolutions using one of the standard identities (e.g. Vaughan’s identity or Heath-Brown’s identity), as discussed for instance in this previous post (which is focused more on the von Mangoldt function, but analogous identities exist for the Liouville and Möbius functions). The exact choice of identity is not terribly important, but the upshot is that can be decomposed into terms, each of which is either of the “Type I” form

for some coefficients that are roughly of logarithmic size on the average, and scales with and , or else of the “Type II” form

for some coefficients that are roughly of logarithmic size on the average, and scales with and . As discussed in the previous post, the exponent is a natural barrier in these identities if one is unwilling to also consider “Type III” type terms which are roughly of the shape of the third divisor function .

A Type I sum makes a contribution to that can be bounded (via Cauchy-Schwarz) in terms of an expression such as

The inner sum exhibits a lot of cancellation unless is within of an integer. (Here, “a lot” should be loosely interpreted as “gaining many powers of over the trivial bound”.) Since is significantly larger than , standard Vinogradov-type manipulations (see e.g. Lemma 13 of these previous notes) show that this bad case occurs for many only when is “major arc”, which is the case we have specifically excluded. This lets us dispose of the Type I contributions.
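The cancellation in the inner sum is the classical linear exponential sum estimate: writing e(x) = exp(2*pi*i*x), the sum of e(n*alpha) over n up to N has size N when alpha is an integer, but is bounded by 1/(2*||alpha||) otherwise, where ||alpha|| denotes the distance from alpha to the nearest integer. Here is a quick numerical check of this dichotomy (my own toy illustration, with an arbitrarily chosen alpha):

```python
# Toy check (my own) of the linear exponential sum bound
# |sum_{n<=N} e(n*alpha)| <= min(N, 1/(2*||alpha||)).

import cmath

def linear_exp_sum(N, alpha):
    """sum_{n=1}^{N} exp(2*pi*i*n*alpha)."""
    return sum(cmath.exp(2j * cmath.pi * n * alpha) for n in range(1, N + 1))

def dist_to_int(alpha):
    """Distance from alpha to the nearest integer."""
    return abs(alpha - round(alpha))

if __name__ == "__main__":
    N = 10**4
    print(abs(linear_exp_sum(N, 0.0)))      # = N: no cancellation at integer alpha
    print(abs(linear_exp_sum(N, 0.2345)))   # bounded by 1/(2*||alpha||), uniformly in N
```

This is exactly the mechanism that forces the bad Type I case into the major arcs: the sum is only large when alpha is very close to a rational with small denominator.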

A Type II sum makes a contribution to roughly of the form

We can break this up into a number of sums roughly of the form

for ; note that the range is non-trivial because is much larger than . Applying the usual bilinear sum Cauchy-Schwarz methods (e.g. Theorem 14 of these notes) we conclude that there is a lot of cancellation unless one has for some . But with , is well below the threshold for the definition of major arc, so we can exclude this case and obtain the required cancellation.

Filed under: expository, math.NT, question Tagged: Chowla conjecture, Gowers uniformity norm, Liouville function

Theorem 1 (Halasz inequality) Let be a multiplicative function bounded in magnitude by , and suppose that , , and are such that

As a qualitative corollary, we conclude (by standard compactness arguments) that if

as . In the more recent work of this paper of Granville and Soundararajan, the sharper bound

is obtained (with a more precise description of the term).

The usual proofs of Halasz’s theorem are somewhat lengthy (though there has been a recent simplification, in forthcoming work of Granville, Harper, and Soundararajan). Below the fold I would like to give a relatively short proof of the following “cheap” version of the inequality, which has slightly weaker quantitative bounds, but still suffices to give qualitative conclusions such as (2).

Theorem 2 (Cheap Halasz inequality) Let be a multiplicative function bounded in magnitude by . Let and , and suppose that is sufficiently large depending on . If (1) holds for all , then

The non-optimal exponent can probably be improved a bit by being more careful with the exponents, but I did not try to optimise it here. A similar bound appears in the first paper of Halasz on this topic.

The idea of the argument is to split as a Dirichlet convolution where is the portion of coming from “small”, “medium”, and “large” primes respectively (with the dividing line between the three types of primes being given by various powers of ). Using a Perron-type formula, one can express this convolution in terms of the product of the Dirichlet series of respectively at various complex numbers with . One can use based estimates to control the Dirichlet series of , while using the hypothesis (1) one can get estimates on the Dirichlet series of . (This is similar to the Fourier-analytic approach to ternary additive problems, such as Vinogradov’s theorem on representing large odd numbers as the sum of three primes.) This idea was inspired by a similar device used in the work of Granville, Harper, and Soundararajan. A variant of this argument also appears in unpublished work of Adam Harper.

I thank Andrew Granville for helpful comments which led to significant simplifications of the argument.

** — 1. Basic estimates — **

We need the following basic tools from analytic number theory. We begin with a variant of the classical Perron formula.

Proposition 3 (Perron type formula) Let be an arithmetic function bounded in magnitude by , and let . Assume that the Dirichlet series is absolutely convergent for . Then

*Proof:* By telescoping series (and treating the contribution of trivially), it suffices to show that

whenever .

The left-hand side can be written as

where . We now introduce the mollified version

of , where

and is a fixed smooth function supported on that equals at the origin. Basic Fourier analysis then tells us that is a Schwartz function with total mass one. This gives the crude bound

for any . For or , we use the bound (say) to arrive at the bound

for we again use and write

and use the Lipschitz bound for to obtain

for such . Putting all these bounds together, we see that

for all . In particular, we can write (3) as

The expression is bounded by when , is bounded by when , is bounded by when or , and is bounded by otherwise. From these bounds, a routine calculation (using the hypothesis ) shows that

and so it remains to show that

Writing

where

we see from the triangle inequality and the support of that

But from integration by parts we see that , and the claim follows.

Next, we recall a standard mean value estimate for Dirichlet series:

Proposition 4 ( mean value estimate) Let be an arithmetic function, and let . Assume that the Dirichlet series is absolutely convergent for . Then

*Proof:* This follows from Lemma 7.1 of Iwaniec-Kowalski; for the convenience of the reader we reproduce the short proof here. Introducing the normalised sinc function , we have

But a standard Fourier-analytic computation shows that vanishes unless , in which case the integral is , and the claim follows.

Now we recall a basic sieve estimate:

Proposition 5 (Sieve bound) Let , let be an interval of length , and let be a set of primes up to . If we remove one residue class mod from for every , the number of remaining natural numbers in is at most .

*Proof:* This follows for instance from the fundamental lemma of sieve theory (see e.g. Corollary 19 of this blog post). (One can also use the Selberg sieve or the large sieve.)
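To make the sieving setup of Proposition 5 concrete, here is a brute-force toy computation (my own, with an arbitrarily chosen small interval and prime set): delete one residue class mod p for each prime p in a set, count the survivors, and compare with the density prediction that a proportion of roughly the product of (1 - 1/p) survives.

```python
# Toy illustration (my own) of the sieve in Proposition 5: remove one residue
# class mod p per prime p and count survivors in an interval.

def sieve_count(interval, removed):
    """interval: iterable of integers; removed: dict mapping prime p to the
    residue class mod p being deleted. Returns the number of survivors."""
    return sum(1 for n in interval
               if all(n % p != r for p, r in removed.items()))

if __name__ == "__main__":
    interval = range(1, 101)
    removed = {2: 0, 3: 0, 5: 0}              # delete the multiples of 2, 3 and 5
    count = sieve_count(interval, removed)
    prediction = 100 * (1/2) * (2/3) * (4/5)  # density heuristic
    print(count, prediction)                  # 26 vs ~26.7
```

The fundamental lemma of sieve theory upgrades this density heuristic to a rigorous upper bound of the same shape, uniformly in the choice of removed residue classes.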

Finally, we record a standard estimate on the number of smooth numbers:

Proposition 6 Let and , and suppose that is sufficiently large depending on . Then the number of natural numbers in which have no prime factor larger than is at most .

*Proof:* See Corollary 1.3 of this paper of Hildebrand and Tenenbaum. (The result also follows from the more classical work of Dickman.) We sketch a short proof here due to Kevin Ford. Let denote the set of numbers that are “smooth” in the sense that they have no prime factor larger than . It then suffices to prove the bound

since the contribution of those less than (say) is negligible, and for the other values of , is comparable to . Writing , we can rearrange the left-hand side as

By the prime number theorem, the contribution to of those is , and the contribution of those with consists only of prime powers, which contribute . Combining these estimates, we can get a bound of the form

where is a quantity to be chosen later. Thus we can bound the left-hand side of (4) by

which by Euler products can be bounded by

By the mean value theorem applied to the function , we can bound by for . By Mertens’ theorem, we thus get a bound of

If we make the choice , we obtain the required bound (4).
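The quantity bounded by Proposition 6 is easy to compute by brute force for small parameters. The following sketch is my own illustration: an integer is called y-smooth if it has no prime factor exceeding y, and we simply count such integers up to x.

```python
# Brute-force count (my own illustration) of the smooth numbers bounded in
# Proposition 6: an integer is y-smooth if its largest prime factor is <= y.

def largest_prime_factor(n):
    """Largest prime factor of n (returns 1 for n = 1)."""
    lpf, d = 1, 2
    while d * d <= n:
        while n % d == 0:
            lpf, n = d, n // d
        d += 1
    return max(lpf, n) if n > 1 else lpf

def smooth_count(x, y):
    """Number of y-smooth integers in [1, x]."""
    return sum(1 for n in range(1, x + 1) if largest_prime_factor(n) <= y)

if __name__ == "__main__":
    print(smooth_count(100, 5))   # -> 34: the 5-smooth (Hamming) numbers up to 100
```

For x a large power of y, this count is a tiny fraction of x, which is the mechanism exploited when discarding the contribution of smooth numbers in the proof of Theorem 2.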

** — 2. Proof of theorem — **

By increasing as necessary we may assume that (say). Let be small parameters (depending on ) to be optimised later; we assume to be sufficiently large depending on . Call a prime *small* if , *medium* if , and *large* if . Observe that for any we can factorise as a Dirichlet convolution

where

- is the restriction of to those natural numbers whose prime factors are all small;
- is the restriction of to those natural numbers whose prime factors are all medium;
- is the restriction of to those natural numbers whose prime factors are all large.

It is convenient to remove the Dirac function from , so we write

and split

Note that is the restriction of to those numbers whose prime factors are all small or medium. By Proposition 6, the number of such can certainly be bounded by if is sufficiently large. Thus the contribution of this term to (5) is .

Similarly, is the restriction of to those numbers which contain at least one large prime factor, but no medium prime factors. By Proposition 5 the number of such is bounded by if is sufficiently large. Thus the contribution of this term to (5) is , and hence

Note that is only supported on numbers whose prime factors do not exceed , so the Dirichlet series of is absolutely convergent for and is equal to , where are the Dirichlet series of respectively. Since is bounded in magnitude by (being a restriction of ), we may apply Proposition 3 and conclude (for large enough, and discarding the denominator) that

We now record some estimates:

Lemma 7 For sufficiently large , we have and

*Proof:* We just prove the former inequality, as the latter is similar. By Proposition 4, we have

The term vanishes unless , and we have , so we can bound the right-hand side by

The inner summand is bounded by and supported on those that are not divisible by any small primes. From Proposition 5 and Mertens’ theorem we conclude that

and thus

as desired.

We also have an estimate:

Lemma 8 For sufficiently large , we have for all .

*Proof:* From Euler products, Mertens’ theorem, and (1) we have

as desired.
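Since the proof of Lemma 8 leans on Mertens' theorem, here is a quick numerical look at that theorem (my own aside, not part of the argument): the product of (1 - 1/p) over primes p up to x is asymptotic to e^{-gamma}/log x, with gamma the Euler-Mascheroni constant.

```python
# Numerical check (my own aside) of Mertens' third theorem:
# prod_{p <= x} (1 - 1/p)  ~  e^{-gamma} / log(x).

import math

def primes_up_to(x):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = b"\x00" * len(range(p * p, x + 1, p))
    return [p for p in range(2, x + 1) if sieve[p]]

def mertens_product(x):
    prod = 1.0
    for p in primes_up_to(x):
        prod *= 1.0 - 1.0 / p
    return prod

if __name__ == "__main__":
    x = 10**5
    gamma = 0.5772156649015329
    print(mertens_product(x), math.exp(-gamma) / math.log(x))   # close already at x = 10^5
```

The relative error decays like 1/log x, so the agreement is already at the percent level for x of moderate size.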

Applying Hölder’s inequality, we conclude that

Setting and we obtain the claim.

Filed under: expository, math.NT Tagged: Halasz's theorem, pretentious multiplicative functions

Theorem 1 (Central limit theorem) Let be iid copies of a real random variable of mean and variance , and write . Then, for any fixed , we have

This is however not the end of the matter; there are many variants, refinements, and generalisations of the central limit theorem, and the purpose of this set of notes is to present a small sample of these variants.

First of all, the above theorem does not quantify the *rate* of convergence in (1). We have already addressed this issue to some extent with the Berry-Esséen theorem, which roughly speaking gives a convergence rate of uniformly in if we assume that has finite third moment. However there are still some quantitative versions of (1) which are not addressed by the Berry-Esséen theorem. For instance one may be interested in bounding the *large deviation probabilities*

in the setting where grows with . Chebyshev’s inequality gives an upper bound of for this quantity, but one can often do much better than this in practice. For instance, the central limit theorem (1) suggests that this probability should be bounded by something like ; however, this theorem only kicks in when is very large compared with . For instance, if one uses the Berry-Esséen theorem, one would need as large as or so to reach the desired bound of , even under the assumption of finite third moment. Basically, the issue is that convergence-in-distribution results, such as the central limit theorem, only really control the *typical* behaviour of statistics in ; they are much less effective at controlling the very rare *outlier* events in which the statistic strays far from its typical behaviour. Fortunately, there are large deviation inequalities (or *concentration of measure inequalities*) that do provide exponential type bounds for quantities such as (2), which are valid for both small and large values of . A basic example of this is the Chernoff bound that made an appearance in Exercise 47 of Notes 4; here we give some further basic inequalities of this type, including versions of the Bennett and Hoeffding inequalities.

In the other direction, we can also look at the fine scale behaviour of the sums by trying to control probabilities such as

where is now bounded (but can grow with ). The central limit theorem predicts that this quantity should be roughly , but even if one is able to invoke the Berry-Esséen theorem, one cannot quite see this main term because it is dominated by the error term in Berry-Esséen. There is good reason for this: if for instance takes integer values, then also takes integer values, and can vanish when is less than and is slightly larger than an integer. However, this turns out to essentially be the only obstruction; if does not lie in a lattice such as , then we can establish a *local limit theorem* controlling (3), and when does take values in a lattice like , there is a discrete local limit theorem that controls probabilities such as . Both of these limit theorems will be proven by the Fourier-analytic method used in the previous set of notes.

We also discuss other limit theorems in which the limiting distribution is something other than the normal distribution. Perhaps the most common example of these theorems is the Poisson limit theorems, in which one sums a large number of indicator variables (or approximate indicator variables), each of which is rarely non-zero, but which collectively add up to a random variable of medium-sized mean. In this case, it turns out that the limiting distribution should be a Poisson random variable; this again is an easy application of the Fourier method. Finally, we briefly discuss limit theorems for other stable laws than the normal distribution, which are suitable for summing random variables of infinite variance, such as the Cauchy distribution.

Finally, we mention a very important class of generalisations to the CLT (and to the variants of the CLT discussed in this post), in which the hypothesis of joint independence between the variables is relaxed, for instance one could assume only that the form a martingale. Many (though not all) of the proofs of the CLT extend to these more general settings, and this turns out to be important for many applications in which one does not expect joint independence. However, we will not discuss these generalisations in this course, as they are better suited for subsequent courses in this series when the theory of martingales, conditional expectation, and related tools are developed.

** — 1. Large deviation inequalities — **

We now look at some upper bounds for the large deviation probability (2). To get some intuition as to what kinds of bounds one can expect, we first consider some examples. First suppose that has the standard normal distribution; then , , and has the distribution of , so that has the distribution of . We thus have

which on using the inequality leads to the bound
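This Gaussian tail bound is easy to verify numerically. The following check is my own (not from the notes): the exact tail P(Z >= lam) = erfc(lam/sqrt(2))/2 of a standard normal Z never exceeds exp(-lam^2/2) for lam >= 0.

```python
# Numerical check (my own) of the Gaussian tail bound
# P(Z >= lam) <= exp(-lam^2 / 2) for a standard normal Z and lam >= 0.

import math

def gaussian_tail(lam):
    """Exact tail probability P(Z >= lam) for Z standard normal."""
    return 0.5 * math.erfc(lam / math.sqrt(2.0))

if __name__ == "__main__":
    for lam in [0.5, 1.0, 2.0, 4.0]:
        print(lam, gaussian_tail(lam), math.exp(-lam * lam / 2.0))
```

For large lam the two quantities differ only by the polynomial prefactor 1/(lam * sqrt(2*pi)), so the exponent exp(-lam^2/2) captures the true decay rate.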

Next, we consider the example when is a Bernoulli random variable drawn uniformly at random from . Then , , and has the standard binomial distribution on , thus . By symmetry, we then have

We recall Stirling’s formula, which we write crudely as

as , where denotes a quantity with as (and similarly for other uses of the notation in the sequel). If and is bounded away from zero and one, we then have the asymptotic

where is the entropy function

(compare with Exercise 17 of Notes 3). One can check that is decreasing for , and so one can compute that

as for any fixed . To compare this with (4), observe from Taylor expansion that

as .

Finally, consider the example where takes values in with and for some small , thus and . We have , and hence

with

Here, we see that the large deviation probability is somewhat larger than the gaussian prediction of . Instead, the exponent is approximately related to and by the formula

We now give a general large deviations inequality that is consistent with the above examples.

Proposition 2 (Cheap Bennett inequality) Let , and let be independent random variables, each of which takes values in an interval of length at most . Write , and write for the mean of . Let be such that has variance at most . Then for any , we have

There is a more precise form of this inequality known as Bennett’s inequality, but we will not prove it here.

The first term in the minimum dominates when , and the second term dominates when . Sometimes it is convenient to weaken the estimate by discarding the logarithmic factor, leading to

(possibly with a slightly different choice of ); thus we have Gaussian type large deviation estimates for as large as , and (slightly better than) exponential decay after that.

In the case when are iid copies of a random variable of mean and variance taking values in an interval of length , we have and , and the above inequality simplifies slightly to

*Proof:* We first begin with some quick reductions. Firstly, by dividing the (and , , and ) by , we may normalise ; by subtracting the mean from each of the , we may assume that the have mean zero, so that as well. We also write for the variance of each , so that . Our task is to show that

for all . We will just prove the upper tail bound

the lower tail bound then follows by replacing all with their negations , and the claim then follows by summing the two estimates.

We use the “exponential moment method”, previously seen in proving the Chernoff bound (Exercise 47 of Notes 4), in which one uses the exponential moment generating function of . On the one hand, from Markov’s inequality one has

for any real parameter . On the other hand, from the joint independence of the one has

Since the take values in an interval of length at most and have mean zero, we have and so . This leads to the Taylor expansion

so on taking expectations we have

Putting all this together, we conclude that

If (say), one can then set to be a small multiple of to obtain a bound of the form

If instead , one can set to be a small multiple of to obtain a bound of the form

In either case, the claim follows.
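The exponential moment method used in this proof can be tested exactly on a toy case. The following check is my own illustration (not from the notes): for S a sum of n independent uniform signs, Markov's inequality applied to exp(t*S) gives P(S >= lam) <= exp(-t*lam) * E[exp(t*S)] = exp(-t*lam) * cosh(t)^n for every t >= 0, and we compare this with the exact binomial tail.

```python
# Exact check (my own) of the exponential moment method for a sum of n
# independent uniform +-1 signs:
#   P(S >= lam) <= exp(-t * lam) * cosh(t)**n   for every t >= 0.

import math

def tail_exact(n, lam):
    """P(S >= lam) for S a sum of n iid uniform +-1, via the binomial law."""
    total = 0
    for k in range(n + 1):          # k = number of +1's, so S = 2k - n
        if 2 * k - n >= lam:
            total += math.comb(n, k)
    return total / 2 ** n

def chernoff_bound(n, lam, t):
    """The exponential moment bound exp(-t*lam) * E[exp(t*S)]."""
    return math.exp(-t * lam) * math.cosh(t) ** n

if __name__ == "__main__":
    n, lam = 100, 20
    t = lam / n                      # near-optimal choice of t in this regime
    print(tail_exact(n, lam), chernoff_bound(n, lam, t))
```

The choice t = lam/n mirrors the "small multiple of lambda" choice in the proof above; optimising t exactly recovers the entropy-function exponent from the Bernoulli example.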

The following variant of the above proposition is also useful, in which we get a simpler bound at the expense of worsening the quantity slightly:

Proposition 3 (Cheap Hoeffding inequality) Let be independent random variables, with each taking values in an interval with . Write , and write for the mean of , and write

In fact one can take , a fact known as Hoeffding’s inequality; see Exercise 6 below for a special case of this.

*Proof:* We again normalise the to have mean zero, so that . We then have for each , so by Taylor expansion we have for any real that

and thus

Multiplying in , we then have

and one can now repeat the previous arguments (but without the factor to deal with).

Remark 4 In the above examples, the underlying random variable was assumed either to be restricted to an interval, or to be subgaussian. This type of hypothesis is necessary if one wishes to have estimates on (2) that are similarly subgaussian. For instance, suppose has a zeta distribution for some and all natural numbers , where . One can check that this distribution has finite mean and variance for . On the other hand, since we trivially have , we have the crude lower bound

which shows that in this case the expression (2) only decays at a polynomial rate in rather than an exponential or subgaussian rate.

Exercise 5 (Khintchine inequality) Let be iid copies of a Bernoulli random variable drawn uniformly from .

- (i) For any non-negative reals and any , show that
for some constant depending only on . When , show that one can take and equality holds.

- (ii) With the hypotheses in (i), obtain the matching lower bound
for some depending only on . (Hint: use (i) and Hölder’s inequality.)

- (iii) For any and any functions on a measure space , show that
and

with the same constants as in (i), (ii). When , show that one can take and equality holds.

- (iv) (Marcinkiewicz-Zygmund theorem) The Khintchine inequality is very useful in real analysis; we give one example here. Let be measure spaces, let , and suppose is a linear operator obeying the bound
for all and some finite . Show that for any finite sequence , one has the bound

for some constant depending only on . (Hint: test against a random sum .)

- (v) By using gaussian sums in place of random signs, show that one can take the constant in (iv) to be one. (For simplicity, let us take the functions in to be real valued.)
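The p = 2 equality case of the Khintchine inequality can be verified exactly by enumerating all sign patterns: the cross terms E[eps_i eps_j] vanish for i != j, so E|sum_i eps_i a_i|^2 = sum_i a_i^2. The following exact computation is my own check (with arbitrary coefficients):

```python
# Exact check (my own) of the p = 2 case of Khintchine's inequality:
# E |sum_i eps_i a_i|^2 = sum_i a_i^2 for independent uniform signs eps_i.

from itertools import product

def second_moment(a):
    """E |sum_i eps_i a_i|^2, averaging over all 2^n sign patterns."""
    n = len(a)
    total = 0.0
    for signs in product((-1, 1), repeat=n):
        total += sum(s * x for s, x in zip(signs, a)) ** 2
    return total / 2 ** n

if __name__ == "__main__":
    a = [3.0, 1.5, -2.0, 0.5]
    print(second_moment(a), sum(x * x for x in a))   # equal
```

For other exponents p the two sides agree only up to the constants of parts (i) and (ii), which is precisely what makes Khintchine an inequality rather than an identity.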

In this set of notes we have not focused on getting explicit constants in the large deviation inequalities, but it is not too difficult to do so with a little extra work. We give just one example here:

Exercise 6 Let be iid copies of a Bernoulli random variable drawn uniformly from .

- (i) Show that for any real . (Hint: expand both sides as an infinite Taylor series in .)
- (ii) Show that for any real numbers and any , we have
(Note that this is consistent with (6) with .)

There are many further large deviation inequalities than the ones presented here. For instance, the Azuma-Hoeffding inequality gives Hoeffding-type bounds when the random variables are not assumed to be jointly independent, but are instead required to form a martingale. Concentration of measure inequalities such as McDiarmid’s inequality handle the situation in which the sum is replaced by a more nonlinear function of the input variables . There are also a number of refinements of the Chernoff estimate from the previous notes, that are collectively referred to as “Chernoff bounds”. The Bernstein inequalities handle situations in which the underlying random variable is not bounded, but enjoys good moment bounds. See this previous blog post for these inequalities and some further discussion. Last, but certainly not least, there is an extensively developed theory of large deviations which is focused on the precise exponent in the exponential decay rate for tail probabilities such as (2) when is very large (of the order of ); there is also a complementary theory of *moderate deviations* that gives precise estimates in the regime where is much larger than one, but much less than , for which we generally expect gaussian behaviour rather than exponentially decaying bounds. These topics are beyond the scope of this course.

** — 2. Local limit theorems — **

Let be iid copies of a random variable of mean and variance , and write . On the one hand, the central limit theorem tells us that should behave like the normal distribution , which has probability density function . On the other hand, if is discrete, then must also be discrete. For instance, if takes values in the integers , then takes values in the integers as well. In this case, we would expect (much as we expect a Riemann sum to approximate an integral) the probability distribution of to behave like the probability density function predicted by the central limit theorem, thus we expect

for integer . This is not a direct consequence of the central limit theorem (which does not distinguish between continuous and discrete random variables ), and indeed it fails in some cases: if is restricted to an infinite subprogression of for some and integer , then is similarly restricted to the infinite subprogression , so that (7) fails entirely when is outside of (and when does lie in , one would now expect the left-hand side of (7) to be about times larger than the right-hand side, to keep the total probability close to ). However, this turns out to be the only obstruction:

Theorem 7 (Discrete local limit theorem) Let be iid copies of an integer-valued random variable of mean and variance . Suppose furthermore that there is no infinite subprogression of with for which takes values almost surely in . Then one has for all and all integers , where the error term is uniform in . In other words, we have

as .

Note for comparison that the Berry-Esséen theorem (writing as, say, ) would give (assuming finite third moment) an error term of instead of , which would overwhelm the main term, itself of size .

*Proof:* Unlike previous arguments, we do not have the luxury here of using an affine change of variables to normalise to mean zero and variance one, as this would disrupt the hypothesis that takes values in .

Fix and . Since and are integers, we have the Fourier identity

which upon taking expectations and using Fubini’s theorem gives

where is the characteristic function of . Expanding and noting that are iid copies of , we have

It will be convenient to make the change of variables , to obtain

As in the Fourier-analytic proof of the central limit theorem, we have

as , so by Taylor expansion we have

as for any fixed . This suggests (but does not yet prove) that

A standard Fourier-analytic calculation gives

so it will now suffice to establish that

uniformly in . From dominated convergence we have

so by the triangle inequality, it suffices to show that

This will follow from (9) and the dominated convergence theorem, as soon as we can dominate the integrands by an absolutely integrable function.

From (8), there is an such that

for all , and hence

for . This gives the required domination in the region , so it remains to handle the region .

From the triangle inequality we have for all . Actually we have the stronger bound for . Indeed, if for some such , this would imply that is a deterministic constant, which means that takes values in for some real , which implies that takes values in ; since also takes values in , this would place either in a singleton set or in an infinite subprogression of (depending on whether and are rational), a contradiction. As is continuous and the region is compact, there exists such that for all . This allows us to dominate by for , which is in turn bounded by for some independent of , giving the required domination.

Of course, if the random variable in Theorem 7 did take values almost surely in some subprogression , then either is almost surely constant, or there is a minimal progression with this property (since must divide the difference of any two integers that attains with positive probability). One can then make the affine change of variables (modifying and appropriately) and apply the above theorem to obtain a similar local limit theorem, which we will not write here. For instance, if is the uniform distribution on , then this argument gives

when is an integer of the same parity as (of course, will vanish otherwise). A further affine change of variables handles the case when is not integer valued, but takes values in some other lattice , where and are now real-valued.
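For the random-sign example just mentioned, the local limit prediction — twice the $N(0,n)$ density, the factor 2 accounting for the parity restriction — can be compared with the exact binomial probabilities. A quick sketch, with $n$ chosen arbitrarily:

```python
import math

def exact_prob(n, k):
    """P(S_n = k) for S_n a sum of n independent uniform random signs."""
    if (n + k) % 2:                     # S_n always has the parity of n
        return 0.0
    return math.comb(n, (n + k) // 2) / 2 ** n

def local_limit(n, k):
    """Prediction: twice the N(0, n) density, by the parity restriction."""
    return 2 / math.sqrt(2 * math.pi * n) * math.exp(-k * k / (2 * n))

n = 1000
for k in (0, 10, 30):
    print(k, exact_prob(n, k), local_limit(n, k))
```

Already at $n = 1000$ the two agree to about three significant figures, consistent with an error term that is small compared with the main term of size $1/\sqrt{n}$.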

We can complement these local limit theorems with the following result that handles the non-lattice case:

Theorem 8 (Continuous local limit theorem) Let be iid copies of a real-valued random variable of mean and variance . Suppose furthermore that there is no infinite progression with real for which takes values almost surely in . Then for any , one has for all and all , where the error term is uniform in (but may depend on ).

Equivalently, if we let be a normal random variable with mean and variance , then

uniformly in . Again, this can be compared with the Berry-Esséen theorem, which (assuming finite third moment) has an error term of which is uniform in both and .

*Proof:* Unlike the discrete case, we have the luxury here of normalising and , and we shall now do so.

We first observe that it will suffice to show that

whenever is a Schwartz function whose Fourier transform

is compactly supported, and where the error can depend on but is uniform in . Indeed, if this bound (10) holds, then (after replacing by to make it positive on some interval, and then rescaling) we obtain a bound of the form

for any and , where the term can depend on but not on . Then, by convolving by an approximation to the identity of some width much smaller than with compactly supported Fourier transform, applying (10) to the resulting function, and using (11) to control the error between that function and , we see that

uniformly in , where the term can depend on and but is uniform in . Letting tend slowly to zero, we obtain the claim.

It remains to establish (10). We adapt the argument from the discrete case. By the Fourier inversion formula we may write

By Fubini’s theorem as before, we thus have

and similarly

so it suffices by the triangle inequality and the boundedness and compact support of to show that

for any fixed (where the term can depend on ). We have and , so by making the change of variables , we now need to show that

as . But this follows from the argument used to handle the discrete case.

** — 3. The Poisson central limit theorem — **

The central limit theorem (after normalising the random variables to have mean zero) studies the fluctuations of sums where each individual term is quite small (typically of size ). Now we consider a variant situation, in which one considers a sum of random variables which are *usually* zero, but occasionally equal to a larger value such as . (This situation arises in many real-life situations when compiling aggregate statistics on rare events, e.g. the number of car crashes in a short period of time.) In these cases, one can get a different distribution than the gaussian distribution, namely a Poisson distribution with some intensity – that is to say, a random variable taking values in the non-negative integers with probability distribution

One can check that this distribution has mean and variance .
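These moment computations are easy to confirm numerically. The sketch below truncates the series at $k = 60$, an arbitrary cutoff that is harmless here since the Poisson tail decays faster than exponentially:

```python
import math

def poisson_pmf(lam, k):
    """P(X = k) = e^(-lam) * lam^k / k! for a Poisson variable of intensity lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.0
ks = range(60)  # truncation point; the neglected tail is negligible
total = sum(poisson_pmf(lam, k) for k in ks)
mean = sum(k * poisson_pmf(lam, k) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(lam, k) for k in ks)
print(total, mean, var)  # essentially 1, lam, lam respectively
```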

Theorem 9 (Poisson central limit theorem) Let be a triangular array of real random variables, where for each , the variables are jointly independent. Assume furthermore that

- (i) ( mostly ) One has as .
- (ii) ( rarely ) One has as .
- (iii) (Convergent expectation) One has as for some .
Then the random variables converge in distribution to a Poisson random variable of intensity .

*Proof:* From hypothesis (i) and the union bound we see that for each , we have that all of the lie in with probability as . Thus, if we replace each by the restriction , the random variable is only modified on an event of probability , which does not affect distributional limits (Slutsky’s theorem). Thus, we may assume without loss of generality that the take values in .

By Exercise 20 of Notes 4, a Poisson random variable of intensity has characteristic function . Applying the Lévy convergence theorem (Theorem 27 of Notes 4), we conclude that it suffices to show that

as for any fixed .

Fix . By the independence of the , we may write

Since only takes on the values and , we can write

where . By hypothesis (ii), we have , so by using a branch of the complex logarithm that is analytic near , we can write

By Taylor expansion we have

and hence by (ii), (iii)

as , and the claim follows.

Exercise 10 Establish the conclusion of Theorem 9 directly from explicit computation of the probabilities in the case when each takes values in with for some fixed .
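The setting of Exercise 10 is the classical "law of rare events": the Binomial$(n, \lambda/n)$ distribution converges to Poisson$(\lambda)$. A numerical sketch (parameters chosen arbitrarily) comparing the two in total variation:

```python
import math

def binom_pmf(n, p, k):
    """P(B = k) for B ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, n = 2.0, 10000
# Total variation distance, truncated at k = 50 (the tails beyond are tiny)
tv = 0.5 * sum(abs(binom_pmf(n, lam / n, k) - poisson_pmf(lam, k))
               for k in range(50))
print(tv)  # small; Le Cam's inequality bounds the full distance by lam^2 / n
```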

The Poisson central limit theorem can be viewed as a degenerate limit of the central limit theorem, as seen by the next two exercises.

Exercise 11 Suppose we replace the hypothesis (iii) in Theorem 9 with the alternative hypothesis that the quantities go to infinity as , while leaving hypotheses (i) and (ii) unchanged. Show that converges in distribution to the normal distribution .

Exercise 12 For each , let be a Poisson random variable with intensity . Show that as , the random variables converge in distribution to the normal distribution . Discuss how this is consistent with Theorem 9 and the previous exercise.

** — 4. Stable laws — **

Let be a real random variable. We say that has a *stable law* or a *stable distribution* if for any positive reals , there exists a positive real and a real such that whenever are iid copies of . In terms of the characteristic function of , we see that has a stable law if for any positive reals , there exist a positive real and a real for which we have the functional equation

for all real .

For instance, a normally distributed variable is stable thanks to Lemma 12 of Notes 4; one can also see this from the characteristic function . A Cauchy distribution , with probability density can also be seen to be stable, as is most easily seen from the characteristic function . As a more degenerate example, any deterministic random variable is stable. It is possible (though somewhat tedious) to completely classify all the stable distributions; see for instance the Wikipedia entry on these laws for the full classification.

If is stable, and are iid copies of , then by iterating the stable law hypothesis we see that the sums are all equal in distribution to some affine rescaling of . For instance, we have for some , and a routine induction then shows that

for all natural numbers (with the understanding that when ). In particular, the random variables all have the same distribution as .
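For the standard Cauchy law, whose characteristic function is $e^{-|t|}$, this rescaling can be verified directly on the Fourier side: $e^{-|at|} e^{-|bt|} = e^{-|(a+b)t|}$ for positive $a, b$, so sums of $n$ iid Cauchy variables rescale with $c_n = n$ (and no recentering). A quick numerical check at a few sample points:

```python
import math

def cauchy_cf(t):
    """Characteristic function e^(-|t|) of the standard Cauchy distribution."""
    return math.exp(-abs(t))

# Stability on the Fourier side: a*X_1 + b*X_2 has the law of (a+b)*X
for t in (-2.0, 0.3, 1.7):
    for a, b in ((1.0, 1.0), (2.0, 3.0), (0.5, 1.25)):
        assert abs(cauchy_cf(a * t) * cauchy_cf(b * t)
                   - cauchy_cf((a + b) * t)) < 1e-12
print("Cauchy stability identity verified at sample points")
```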

More generally, given two real random variables and , we say that is in the *basin of attraction* for if, whenever are iid copies of and , there exist constants and such that converges in distribution to . Thus, any stable law is in its own basin of attraction, while the central limit theorem asserts that any random variable of finite variance is in the basin of attraction of a normal distribution. One can check that every random variable lies in the basin of attraction of a deterministic random variable such as , simply by letting go to infinity rapidly enough. To avoid this degenerate case, we now restrict to laws that are *non-degenerate*, in the sense that they are not almost surely constant. Then we have the following useful technical lemma:

Proposition 13 (Convergence of types) Let be a sequence of real random variables converging in distribution to a non-degenerate limit . Let and be real numbers such that converges in distribution to a non-degenerate limit . Then and converge to some finite limits respectively, and .

*Proof:* Suppose first that goes to zero. The sequence converges in distribution, hence is tight, hence converges in probability to zero. In particular, if is an independent copy of , then converges in probability to zero; but also converges in distribution to where is an independent copy of , and is not almost surely zero since is non-degenerate. This is a contradiction. Similarly if has any subsequence that goes to zero. We conclude that is bounded away from zero. Rewriting as and reversing the roles of and , we conclude also that is bounded away from zero, thus is bounded.

Since is tight and is bounded, is tight; since is also tight, this implies that is tight, that is to say is bounded.

Let be a limit point of the . By Slutsky’s theorem, a subsequence of the then converges in distribution to , thus . If the limit point is unique then we are done, so suppose there are two limit points , . Thus , which on rearranging gives for some and real with .

If then on iteration we have for any natural number , which clearly leads to a contradiction as since . If then iteration gives for any natural number , which on passing to the limit in distribution as gives , again a contradiction. If then we rewrite as and again obtain a contradiction, and the claim follows.

One can use this proposition to verify that basins of attraction of genuinely distinct laws are disjoint:

Exercise 14 Let and be non-degenerate real random variables. Suppose that a random variable lies in the basin of attraction of both and . Then there exist and real such that .

If lies in the basin of attraction for a non-degenerate law , then converges in distribution to ; since is equal in distribution to the sum of two iid copies of , we see that converges in distribution to the sum of two iid copies of . On the other hand, converges in distribution to . Using Proposition 13 we conclude that for some and real . One can go further and conclude that in fact has a stable law; see the following exercise. Thus stable laws are the only laws that have a non-empty basin of attraction.

Exercise 15 Let lie in the basin of attraction for a non-degenerate law .

- (i) Show that for any iid copies of , there exists a unique and such that . Also show that for all natural numbers .
- (ii) Show that the are strictly increasing, with for all natural numbers . (Hint: study the absolute value of the characteristic function, using the non-degeneracy of to ensure that this absolute value is usually strictly less than one.) Also show that for all natural numbers .
- (iii) Show that there exists such that for all . (Hint: first show that is a Cauchy sequence in .)
- (iv) If , and are iid copies of , show that for all natural numbers and some bounded real . Then show that has a stable law in this case.
- (v) If , show that for some real and all . Then show that has a stable law in this case.

Exercise 16 (Classification of stable laws) Let be a non-degenerate stable law; then lies in its own basin of attraction and one can then define as in the preceding exercise.

- (i) If , and is as in part (v) of the preceding exercise, show that for all and . Then show that
for some real . (One can use the identity to restrict attention to the case of positive .)

- (ii) Now suppose . Show that for all (where the implied constant in the notation is allowed to depend on ). Conclude that for all .
- (iii) We continue to assume . Show that for some real number . (Hint: first show this when is a power of a fixed natural number (with possibly depending on ). Then use the estimates from part (ii) to show that does not actually depend on . One may need to invoke the Dirichlet approximation theorem to show that for any given , one can find a power of that is somewhat close to a power of .)
- (iv) We continue to assume . Show that for all and . Then show that
for all and some real .

It is also possible to determine which choices of parameters are actually achievable by some random variable , but we will not do so here.

It is possible to associate a central limit theorem to each stable law, which precisely determines their basin of attraction. We will not do this in full generality, but just illustrate the situation for the Cauchy distribution.

Exercise 17 Let be a real random variable which is symmetric (that is, has the same distribution as ) and obeys the distribution identity for all , where is a function which is *slowly varying* in the sense that as for all .

- (i) Show that
as , where denotes a quantity such that as . (You may need to establish the identity , which can be done by contour integration.)

- (ii) Let be iid copies of . Show that converges in distribution to a copy of the standard Cauchy distribution (i.e., to a random variable with probability density function ).

Filed under: 275A - probability theory, math.PR, Uncategorized Tagged: central limit theorem, large deviation inequality, local limit theorems, stable laws

between consecutive primes up to , in which we improved the Rankin bound of

to

for large (where we use the abbreviations , , and ). Here, we obtain an analogous result for the quantity

which measures how far apart the gaps between chains of consecutive primes can be. Our main result is

whenever is sufficiently large depending on , with the implied constant here absolute (and effective). The factor of is inherent to the method, and related to the basic probabilistic fact that if one selects numbers at random from the unit interval , then one expects the minimum gap between adjacent numbers to be about (i.e. smaller than the mean spacing of by an additional factor of ).
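The probabilistic fact quoted here is easy to simulate. In the sketch below (sample sizes chosen arbitrarily), the average minimum spacing among $n$ uniform points in $[0,1]$, multiplied by $n^2$, stays of order 1:

```python
import random

random.seed(1)

def min_gap(n):
    """Minimum spacing between adjacent points among n uniform points in [0, 1]."""
    pts = sorted(random.random() for _ in range(n))
    return min(b - a for a, b in zip(pts, pts[1:]))

n, trials = 1000, 200
avg = sum(min_gap(n) for _ in range(trials)) / trials
print(avg * n * n)  # of order 1: the minimum gap averages about 1/n^2
```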

Our arguments combine those from the previous paper with the matrix method of Maier, who (in our notation) showed that

for an infinite sequence of going to infinity. (Maier needed to restrict to an infinite sequence to avoid Siegel zeroes, but we are able to resolve this issue by the now standard technique of simply eliminating a prime factor of an exceptional conductor from the sieve-theoretic portion of the argument. As a byproduct, this also makes all of the estimates in our paper effective.)

As its name suggests, the Maier matrix method is usually presented by imagining a matrix of numbers, and using information about the distribution of primes in the columns of this matrix to deduce information about the primes in at least one of the rows of the matrix. We found it convenient to interpret this method in an equivalent probabilistic form as follows. Suppose one wants to find an interval which contained a block of at least primes, each separated from each other by at least (ultimately, will be something like and something like ). One can do this by the probabilistic method: pick to be a random large natural number (with the precise distribution to be chosen later), and try to lower bound the probability that the interval contains at least primes, no two of which are within of each other.

By carefully choosing the residue class of with respect to small primes, one can immediately eliminate several of the from consideration as primes. For instance, if is chosen to be large and even, then the with even have no chance of being prime and can thus be eliminated; similarly if is large and odd, then cannot be prime for any odd . Using the methods of our previous paper, we can find a residue class (where is a product of a large number of primes) such that, if one chooses to be a large random element of (that is, for some large random integer ), then the set of shifts for which still has a chance of being prime has size comparable to something like ; furthermore this set is fairly well distributed in in the sense that it does not concentrate too strongly in any short subinterval of . The main new difficulty, not present in the previous paper, is to get *lower* bounds on the size of in addition to upper bounds, but this turns out to be achievable by a suitable modification of the arguments.

Using a version of the prime number theorem in arithmetic progressions due to Gallagher, one can show that for each remaining shift , is going to be prime with probability comparable to , so one expects about primes in the set . An upper bound sieve (e.g. the Selberg sieve) also shows that for any distinct , the probability that and are both prime is . Using this and some routine second moment calculations, one can then show that with large probability, the set will indeed contain about primes, no two of which are closer than to each other; with no other numbers in this interval being prime, this gives a lower bound on .

Filed under: math.NT, paper Tagged: James Maynard, Kevin Ford, Maier matrix method, prime gaps

I never met or communicated with Roth personally, but was certainly influenced by his work; he wrote relatively few papers, but they tended to have outsized impact. For instance, he was one of the key people (together with Bombieri) to work on simplifying and generalising the large sieve, taking it from the technically formidable original formulation of Linnik and Rényi to the clean and general almost orthogonality principle that we have today (discussed for instance in these lecture notes of mine). The paper of Roth that had the most impact on my own personal work was his three-page paper proving what is now known as Roth’s theorem on arithmetic progressions:

Theorem 1 (Roth's theorem on arithmetic progressions) Let be a set of natural numbers of positive upper density (thus ). Then contains infinitely many arithmetic progressions of length three (with non-zero of course).

At the heart of Roth’s elegant argument was the following (surprising at the time) dichotomy: if had some moderately large density within some arithmetic progression , either one could use Fourier-analytic methods to detect the presence of an arithmetic progression of length three inside , or else one could locate a long subprogression of on which had increased density. Iterating this dichotomy by an argument now known as the *density increment argument*, one eventually obtains Roth’s theorem, no matter which side of the dichotomy actually holds. This argument (and the many descendants of it), based on various “dichotomies between structure and randomness”, became essential in many other results of this type, most famously perhaps in Szemerédi’s proof of his celebrated theorem on arithmetic progressions that generalised Roth’s theorem to progressions of arbitrary length. More recently, my recent work on the Chowla and Elliott conjectures that was a crucial component of the solution of the Erdös discrepancy problem, relies on an *entropy decrement argument* which was directly inspired by the density increment argument of Roth.

The Erdös discrepancy problem also is connected with another well known theorem of Roth:

Theorem 2 (Roth's discrepancy theorem for arithmetic progressions) Let be a sequence in . Then there exists an arithmetic progression in with positive such that for an absolute constant .

In fact, Roth proved a stronger estimate regarding mean square discrepancy, which I am not writing down here; as with the Roth theorem in arithmetic progressions, his proof was short and Fourier-analytic in nature (although non-Fourier-analytic proofs have since been found, for instance the semidefinite programming proof of Lovasz). The exponent is known to be sharp (a result of Matousek and Spencer).

As a particular corollary of the above theorem, for an infinite sequence of signs, the sums are unbounded in . The Erdös discrepancy problem asks whether the same statement holds when is restricted to be zero. (Roth also established discrepancy theorems for other sets, such as rectangles, which will not be discussed here.)

Finally, one has to mention Roth’s most famous result, cited for instance in his Fields medal citation:

Theorem 3 (Roth's theorem on Diophantine approximation) Let be an irrational algebraic number. Then for any there is a quantity such that

From the Dirichlet approximation theorem (or from the theory of continued fractions) we know that the exponent in the denominator cannot be reduced to or below. A classical and easy theorem of Liouville gives the claim with the exponent replaced by the degree of the algebraic number ; work of Thue and Siegel reduced this exponent, but Roth was the one who obtained the near-optimal result. An important point is that the constant is *ineffective* – it is a major open problem in Diophantine approximation to produce any bound significantly stronger than Liouville’s theorem with effective constants. This is because the proof of Roth’s theorem does not exclude any *single* rational from being close to , but instead very ingeniously shows that one cannot have *two* different rationals , that are unusually close to , even when the denominators are very different in size. (I refer to this sort of argument as a “dueling conspiracies” argument; they are strangely prevalent throughout analytic number theory.)
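For a quadratic irrational such as $\sqrt{2}$, the Liouville argument alluded to above gives an explicit lower bound: since $|2q^2 - p^2| \geq 1$ for any rational $p/q \neq \sqrt{2}$, one has $|\sqrt{2} - p/q| \geq 1/(q^2(\sqrt{2} + p/q))$, which is comparable to $1/q^2$. The brute-force sketch below (search range arbitrary) confirms that $q^2 \, |\sqrt{2} - p/q|$ stays bounded away from zero, consistent with the exponent 2 being sharp:

```python
import math

sqrt2 = math.sqrt(2)

# For each denominator q, take the nearest integer numerator p and record
# q^2 * |sqrt(2) - p/q|; Liouville's argument keeps this above a fixed constant.
worst = min(abs(sqrt2 - round(sqrt2 * q) / q) * q * q
            for q in range(1, 20000))
print(worst)  # bounded away from 0
```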

Filed under: math.NT, obituary Tagged: Diophantine approximation, Klaus Roth, large sieve, randomness, structure