Given any finite collection of elements $(f_i)_{i \in I}$ in some Banach space $X$, the triangle inequality tells us that

$$\Big\|\sum_{i \in I} f_i\Big\|_X \leq \sum_{i \in I} \|f_i\|_X.$$
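However, when the $f_i$ all “oscillate in different ways”, one expects to improve substantially upon the triangle inequality. For instance, if $X$ is a Hilbert space and the $f_i$ are mutually orthogonal, we have the Pythagorean theorem

$$\Big\|\sum_{i \in I} f_i\Big\|_X = \Big(\sum_{i \in I} \|f_i\|_X^2\Big)^{1/2}.$$

For the sake of comparison, from the triangle inequality and Cauchy-Schwarz one has the general inequality

$$\Big\|\sum_{i \in I} f_i\Big\|_X \leq (\# I)^{1/2} \Big(\sum_{i \in I} \|f_i\|_X^2\Big)^{1/2} \qquad (1)$$

for any finite collection $(f_i)_{i \in I}$ in any Banach space $X$, where $\# I$ denotes the cardinality of $I$. Thus orthogonality in a Hilbert space yields “square root cancellation”, saving a factor of $(\# I)^{1/2}$ or so over the trivial bound coming from the triangle inequality.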
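As a quick numerical sanity check (a minimal sketch, not part of the argument), one can compare the norm of a sum of mutually orthogonal vectors with the triangle-inequality bound and the Cauchy-Schwarz bound. The helper names `norm` and `compare_bounds`, and the choice of the standard basis of $\mathbb{R}^{100}$, are assumptions made purely for illustration.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def compare_bounds(vectors):
    """Return (norm of the sum, triangle bound, Cauchy-Schwarz bound, l^2 sum of norms)."""
    dim = len(vectors[0])
    total = [sum(v[k] for v in vectors) for k in range(dim)]
    l2_sum = math.sqrt(sum(norm(v) ** 2 for v in vectors))
    triangle = sum(norm(v) for v in vectors)
    cauchy_schwarz = math.sqrt(len(vectors)) * l2_sum
    return norm(total), triangle, cauchy_schwarz, l2_sum

# Hypothetical example: N mutually orthogonal unit vectors (standard basis of R^N).
N = 100
basis = [[1.0 if k == i else 0.0 for k in range(N)] for i in range(N)]
lhs, triangle, cs, l2_sum = compare_bounds(basis)
print(lhs, triangle, cs, l2_sum)
```

In this orthogonal example the norm of the sum agrees with the $\ell^2$ sum of the norms, which is smaller than both trivial bounds by a factor of $N^{1/2}$.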
More generally, let us somewhat informally say that a collection $(f_i)_{i \in I}$ exhibits decoupling in $X$ if one has the Pythagorean-like inequality

$$\Big\|\sum_{i \in I} f_i\Big\|_X \ll_\varepsilon (\# I)^{\varepsilon} \Big(\sum_{i \in I} \|f_i\|_X^2\Big)^{1/2}$$

for any $\varepsilon > 0$, thus one obtains almost the full square root cancellation in the $X$ norm. The theory of almost orthogonality can then be viewed as the theory of decoupling in Hilbert spaces such as $L^2(\mathbb{R}^n)$. In $L^p$ spaces for $p < 2$ one usually does not expect this sort of decoupling; for instance, if the $f_i$ are disjointly supported one has

$$\Big\|\sum_{i \in I} f_i\Big\|_{L^p} = \Big(\sum_{i \in I} \|f_i\|_{L^p}^p\Big)^{1/p}$$

and the right-hand side can be much larger than $(\sum_{i \in I} \|f_i\|_{L^p}^2)^{1/2}$ when $p < 2$. At the opposite extreme, one usually does not expect to get decoupling in $L^\infty$, since one could conceivably align the $f_i$ to all attain a maximum magnitude at the same location with the same phase, at which point the triangle inequality in $L^\infty$ becomes sharp.
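The failure of decoupling below the exponent $2$ can be seen in the simplest discrete model. The following is a sketch under stated assumptions: the "functions" are indicator functions of single points in a sequence space $\ell^p$, and the names `lp_norm` and `disjoint_example` are hypothetical.

```python
def lp_norm(values, p):
    """l^p norm of a finite sequence."""
    return sum(abs(x) ** p for x in values) ** (1.0 / p)

def disjoint_example(N, p):
    """Compare ||sum f_i||_p with (sum ||f_i||_p^2)^{1/2} for N point masses."""
    # f_i is the indicator of the single point {i}; only its nonzero value is stored.
    pieces = [[1.0] for _ in range(N)]
    # The sum of the f_i takes the value 1 at each of the N points.
    total = [1.0] * N
    lhs = lp_norm(total, p)
    l2_of_norms = sum(lp_norm(f, p) ** 2 for f in pieces) ** 0.5
    return lhs, l2_of_norms

lhs_p1, rhs_p1 = disjoint_example(10_000, 1.0)
lhs_p2, rhs_p2 = disjoint_example(10_000, 2.0)
print(lhs_p1, rhs_p1)  # left side exceeds right side by a factor N^{1/p - 1/2}
print(lhs_p2, rhs_p2)  # the two sides agree at p = 2
```

For $p = 1$ the left-hand side is larger by a factor of $N^{1/2}$, which no $N^{\varepsilon}$ loss can absorb.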
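However, in some cases one can get decoupling for certain $2 < p < \infty$. For instance, suppose we are in $L^4$, and that $f_1,\dots,f_N$ are bi-orthogonal in the sense that the products $f_i f_j$ for $1 \leq i \leq j \leq N$ are pairwise orthogonal in $L^2$. Then we have

$$\Big\|\sum_{i=1}^N f_i\Big\|_{L^4}^2 = \Big\|\Big(\sum_{i=1}^N f_i\Big)^2\Big\|_{L^2} = \Big\|\sum_{1 \leq i,j \leq N} f_i f_j\Big\|_{L^2}$$

$$\ll \Big(\sum_{1 \leq i,j \leq N} \|f_i f_j\|_{L^2}^2\Big)^{1/2} = \Big\|\Big(\sum_{1 \leq i,j \leq N} |f_i f_j|^2\Big)^{1/2}\Big\|_{L^2} = \Big\|\sum_{i=1}^N |f_i|^2\Big\|_{L^2}$$

$$\leq \sum_{i=1}^N \big\||f_i|^2\big\|_{L^2} = \sum_{i=1}^N \|f_i\|_{L^4}^2,$$

giving decoupling in $L^4$. (Similarly if each of the $f_i f_j$ is orthogonal to all but $O_\varepsilon(N^\varepsilon)$ of the other $f_{i'} f_{j'}$.) A similar argument also gives $L^6$ decoupling when one has tri-orthogonality (with the $f_i f_j f_k$ mostly orthogonal to each other), and so forth. As a slight variant, Khintchine’s inequality also indicates that decoupling should occur for any fixed $2 \leq p < \infty$ if one multiplies each of the $f_i$ by an independent random sign $\epsilon_i \in \{-1,+1\}$.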
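To illustrate bi-orthogonality concretely, here is a minimal sketch (my own example, not taken from the papers discussed) using exponentials $f_i(x) = e^{2\pi i n_i x}$ on $[0,1]$. For such functions the fourth moment $\int_0^1 |\sum_i f_i|^4$ equals the number of quadruples with $n_a + n_b = n_c + n_d$, so it can be computed exactly. Taking the frequencies to be powers of two (a Sidon set) is an assumption chosen to make the products $f_i f_j$ pairwise orthogonal; the interval $\{1,\dots,N\}$ is included as a non-example.

```python
from collections import Counter
from itertools import product

def fourth_moment(frequencies):
    """Exact value of the integral over [0,1] of |sum_i e(n_i x)|^4.

    Computed as the additive energy: the number of quadruples (a, b, c, d)
    of frequencies with a + b = c + d.
    """
    pair_sums = Counter(a + b for a, b in product(frequencies, repeat=2))
    return sum(count * count for count in pair_sums.values())

N = 12
sidon = [2 ** i for i in range(N)]   # all sums n_a + n_b with a <= b are distinct
interval = list(range(1, N + 1))     # many coincidences among the pair sums

energy_sidon = fourth_moment(sidon)
energy_interval = fourth_moment(interval)

# Each f_i has modulus one, so sum_i ||f_i||_{L^4}^2 = N.
print(energy_sidon ** 0.5, energy_interval ** 0.5, N)
```

In the Sidon case the squared $L^4$ norm of the sum stays within a constant multiple of $N$, as the bi-orthogonality computation predicts, whereas for the interval it grows like $N^{3/2}$.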
In recent years, Bourgain and Demeter have been establishing decoupling theorems in $L^p(\mathbb{R}^n)$ spaces for various key exponents $2 < p < \infty$, in the “restriction theory” setting in which the $f_i$ are Fourier transforms of measures supported on different portions of a given surface or curve; this builds upon the earlier decoupling theorems of Wolff. In a recent paper with Guth, they established the following decoupling theorem for the curve $\gamma(\mathbb{R}) \subset \mathbb{R}^n$ parameterised by the polynomial curve

$$\gamma: t \mapsto (t, t^2, \dots, t^n).$$
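For any ball $B = B(x_0,r)$ in $\mathbb{R}^n$, let $w_B: \mathbb{R}^n \to \mathbb{R}^+$ denote the weight

$$w_B(x) := \Big(1 + \frac{|x-x_0|}{r}\Big)^{-100n},$$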
which should be viewed as a smoothed out version of the indicator function $1_B$ of $B$. In particular, the space $L^p(w_B) = L^p(\mathbb{R}^n, w_B(x)\,dx)$ can be viewed as a smoothed out version of the space $L^p(B)$. For future reference we observe a fundamental self-similarity of the curve $\gamma(\mathbb{R})$: any arc $\gamma(I)$ in this curve, with $I$ a compact interval, is affinely equivalent to the standard arc $\gamma([0,1])$.
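The self-similarity can be made explicit for the curve $t \mapsto (t, t^2, \dots, t^n)$ via the binomial theorem: writing a point of the arc over $[a, a+h]$ as $a + hs$ with $s \in [0,1]$, each coordinate $(a+hs)^k$ is an affine combination of $s, s^2, \dots, s^k$. The sketch below checks this in exact rational arithmetic; the function names and the sample values of $a$ and $h$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def gamma(t, n):
    """The point (t, t^2, ..., t^n) on the polynomial curve."""
    return [t ** k for k in range(1, n + 1)]

def rescaling_matrix(a, h, n):
    """Lower-triangular matrix M with gamma(a + h*s) = gamma(a) + M * gamma(s).

    Entry (k, j) is binom(k, j) * a^(k-j) * h^j for 1 <= j <= k, and 0 otherwise.
    The diagonal entries h^k are nonzero when h != 0, so M is invertible.
    """
    return [[comb(k, j) * a ** (k - j) * h ** j if j <= k else 0
             for j in range(1, n + 1)]
            for k in range(1, n + 1)]

def affine_image(a, h, s, n):
    """Apply the affine map x -> gamma(a) + M x to the point gamma(s)."""
    M = rescaling_matrix(a, h, n)
    g = gamma(s, n)
    base = gamma(a, n)
    return [base[r] + sum(M[r][c] * g[c] for c in range(n)) for r in range(n)]

# Hypothetical sample arc: the interval [1/3, 1/3 + 1/4] in dimension n = 4.
a, h, n = Fraction(1, 3), Fraction(1, 4), 4
samples = [Fraction(0), Fraction(1, 2), Fraction(1)]
checks = [affine_image(a, h, s, n) == gamma(a + h * s, n) for s in samples]
print(checks)
```

The determinant of the matrix is $h^{n(n+1)/2}$, which is the source of the volume factors that appear when one rescales a short arc to the standard one.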
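Theorem 1 (Decoupling theorem) Let $n \geq 1$. Subdivide the unit interval $[0,1]$ into $N$ equal subintervals $I_i$ of length $1/N$, and for each such $I_i$, let $f_i: \mathbb{R}^n \to \mathbb{C}$ be the Fourier transform

$$f_i(x) := \int_{\gamma(I_i)} e(x \cdot \xi)\, d\mu_i(\xi)$$

of a finite Borel measure $\mu_i$ on the arc $\gamma(I_i)$, where $e(\theta) := e^{2\pi i \theta}$. Then the $f_i$ exhibit decoupling in $L^{n(n+1)}(w_B)$ for any ball $B$ of radius $N^n$.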
Orthogonality gives the $n=1$ case of this theorem. The bi-orthogonality type arguments sketched earlier only give decoupling in $L^p$ up to the range $2 \leq p \leq 2n$; the point here is that we can now get a much larger value of $p$. The $n=2$ case of this theorem was previously established by Bourgain and Demeter (who obtained in fact an analogous theorem for any curved hypersurface). The exponent $n(n+1)$ (and the radius $N^n$) is best possible, as can be seen by the following basic example. If

$$f_i(x) := \int_{I_i} e(x \cdot \gamma(\xi))\, g_i(\xi)\, d\xi$$

where $g_i$ is a bump function adapted to $I_i$, then standard Fourier-analytic computations show that $f_i$ will be comparable to $1/N$ on a rectangular box of dimensions $N \times N^2 \times \dots \times N^n$ (and thus volume $N^{n(n+1)/2}$) centred at the origin, and exhibit decay away from this box, with $\|f_i\|_{L^{n(n+1)}(w_B)}$ comparable to

$$\frac{1}{N} \times \big(N^{n(n+1)/2}\big)^{1/(n(n+1))} = \frac{1}{\sqrt{N}}.$$

On the other hand, $\sum_i f_i$ is comparable to $1$ on a ball of radius comparable to $1$ centred at the origin, so $\|\sum_i f_i\|_{L^{n(n+1)}(w_B)}$ is $\gg 1$, which is just barely consistent with decoupling. This calculation shows that decoupling will fail if $n(n+1)$ is replaced by any larger exponent, and also if the radius of the ball $B$ is reduced to be significantly smaller than $N^n$.
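The exponent bookkeeping in this example is easy to mechanise. The sketch below is heuristic and rests on the assumptions stated in the example above: each piece has size about $1/N$ on a box of volume $N^{n(n+1)/2}$ that fits inside the ball, and there are $N$ pieces. The function names are hypothetical.

```python
from fractions import Fraction

def single_piece_exponent(n, p):
    """Exponent a such that ||f_i||_{L^p} is about N^a.

    Assumes f_i has size 1/N on a box of volume N^{n(n+1)/2} and is
    negligible elsewhere, so the norm is N^{-1} * (volume)^{1/p}.
    """
    return Fraction(-1) + Fraction(n * (n + 1), 2) / p

def decoupled_rhs_exponent(n, p):
    """Exponent b such that (sum over N pieces of ||f_i||_{L^p}^2)^{1/2} is about N^b."""
    return Fraction(1, 2) + single_piece_exponent(n, p)

for n in range(1, 6):
    critical = n * (n + 1)
    # At the critical exponent the right-hand side is about N^0, matching the
    # lower bound of size 1 for the left-hand side; above it the exponent is negative.
    print(n, decoupled_rhs_exponent(n, critical), decoupled_rhs_exponent(n, critical + 1))
```

A negative exponent on the right-hand side means that side tends to zero as $N$ grows while the left-hand side stays bounded below, so the inequality cannot hold with only an $N^{\varepsilon}$ loss.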
This theorem has the following consequence of importance in analytic number theory:
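Corollary 2 (Vinogradov main conjecture) Let $s, n, N \geq 1$ be integers, and let $\varepsilon > 0$. Then

$$\int_{[0,1]^n} \Big|\sum_{j=1}^N e(j x_1 + j^2 x_2 + \dots + j^n x_n)\Big|^{2s}\, dx_1 \dots dx_n \ll_{\varepsilon,s,n} N^{s+\varepsilon} + N^{2s - \frac{n(n+1)}{2}+\varepsilon}.$$

Proof: By the Hölder inequality (and the trivial bound of $N$ for the exponential sum), it suffices to treat the critical case $s = n(n+1)/2$, that is to say to show that

$$\int_{[0,1]^n} \Big|\sum_{j=1}^N e(j x_1 + j^2 x_2 + \dots + j^n x_n)\Big|^{n(n+1)}\, dx_1 \dots dx_n \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+\varepsilon}.$$

We can rescale this as

$$\int_{[0,N] \times [0,N^2] \times \dots \times [0,N^n]} \Big|\sum_{j=1}^N e(x \cdot \gamma(j/N))\Big|^{n(n+1)}\, dx \ll_{\varepsilon,n} N^{n(n+1)+\varepsilon}.$$

As the integrand is periodic along the lattice $N\mathbb{Z} \times N^2 \mathbb{Z} \times \dots \times N^n \mathbb{Z}$, this is equivalent to

$$\int_{[0,N^n]^n} \Big|\sum_{j=1}^N e(x \cdot \gamma(j/N))\Big|^{n(n+1)}\, dx \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+n^2+\varepsilon}.$$

The left-hand side may be bounded by $\ll \|\sum_{j=1}^N f_j\|_{L^{n(n+1)}(w_B)}^{n(n+1)}$, where $B := B(0,N^n)$ and $f_j(x) := e(x \cdot \gamma(j/N))$. Since

$$\|f_j\|_{L^{n(n+1)}(w_B)} \ll \big(N^{n^2}\big)^{\frac{1}{n(n+1)}},$$

the claim now follows from the decoupling theorem and a brief calculation.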
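The "brief calculation" at the end of the proof is a matter of tracking powers of $N$, and can be sketched as follows. This is an illustrative check under the assumptions that each $f_j$ has modulus one (so its weighted norm is governed by the volume $N^{n^2}$ of a ball of radius $N^n$), that there are $N$ pieces, and that the curve has degree $n$ so that the critical exponent is $n(n+1)$; losses of size $N^{\varepsilon}$ are ignored and the function names are hypothetical.

```python
from fractions import Fraction

def decoupling_exponent(n):
    """Power of N obtained by raising the decoupled bound to the power n(n+1)."""
    p = n * (n + 1)
    piece = Fraction(n * n, p)             # ||f_j|| is about N^{n^2 / p}
    decoupled = Fraction(1, 2) + piece     # square root of a sum of N squares
    return p * decoupled

def target_exponent(n):
    """Power of N required for the integral over the cube of side N^n."""
    return Fraction(n * (n + 1), 2) + n * n

def unit_cube_exponent(n):
    """Undo periodicity and rescaling to return to the integral over [0,1]^n."""
    half = Fraction(n * (n + 1), 2)
    periods = n * n - half                 # number of period cells is N^{n^2 - n(n+1)/2}
    jacobian = half                        # rescaling x_k -> x_k / N^k
    return decoupling_exponent(n) - periods - jacobian

for n in range(1, 6):
    print(n, decoupling_exponent(n), target_exponent(n), unit_cube_exponent(n))
```

The last column equals $n(n+1)/2$, which is the exponent claimed in the critical case of the corollary.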
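Using the Plancherel formula, one may equivalently (when $s$ is an integer) write the Vinogradov main conjecture in terms of solutions $j_1,\dots,j_s,k_1,\dots,k_s \in \{1,\dots,N\}$ to the system of equations

$$j_1^i + \dots + j_s^i = k_1^i + \dots + k_s^i \quad \forall i = 1,\dots,n,$$

but we will not use this formulation here.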
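For small parameters the solution count can be computed by brute force, which gives a feel for the two regimes in the conjectured bound. The sketch below assumes the system takes the standard form in which the power sums of degrees $1,\dots,n$ of $j_1,\dots,j_s$ and of $k_1,\dots,k_s$ agree, with all variables in $\{1,\dots,N\}$; the names `vinogradov_count` and `main_conjecture_scale` are hypothetical, and the enumeration costs about $N^s$ steps, so it is only practical for tiny inputs.

```python
from collections import Counter
from itertools import product

def vinogradov_count(s, n, N):
    """Count tuples (j_1..j_s, k_1..k_s) in {1..N}^{2s} with equal power sums.

    Two s-tuples are paired when their power sums of degrees 1, ..., n agree,
    so the count is the sum of squares of the multiplicities of each
    power-sum vector.
    """
    power_sums = Counter(
        tuple(sum(j ** i for j in js) for i in range(1, n + 1))
        for js in product(range(1, N + 1), repeat=s)
    )
    return sum(count * count for count in power_sums.values())

def main_conjecture_scale(s, n, N):
    """The conjectured order of magnitude N^s + N^{2s - n(n+1)/2}, without the N^eps loss."""
    return N ** s + N ** (2 * s - n * (n + 1) / 2)

for s in (1, 2, 3):
    print(s, vinogradov_count(s, 2, 6), main_conjecture_scale(s, 2, 6))
```

When $s \leq n$ only the "diagonal" solutions occur, in which the $k$'s are a permutation of the $j$'s; this is the regime governed by the $N^{s}$ term.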
A history of the Vinogradov main conjecture may be found in this survey of Wooley; prior to the Bourgain-Demeter-Guth theorem, the conjecture was solved completely for , or for and either below or above , with the bulk of recent progress coming from the efficient congruencing technique of Wooley. It has numerous applications to exponential sums, Waring’s problem, and the zeta function; to give just one application, the main conjecture implies the predicted asymptotic for the number of ways to express a large number as the sum of fifth powers (the previous best result required fifth powers). The Bourgain-Demeter-Guth approach to the Vinogradov main conjecture, based on decoupling, is ostensibly very different from the efficient congruencing technique, which relies heavily on the arithmetic structure of the program, but it appears (as I have been told from second-hand sources) that the two methods are actually closely related, with the former being a sort of “Archimedean” version of the latter (with the intervals in the decoupling theorem being analogous to congruence classes in the efficient congruencing method); hopefully there will be some future work making this connection more precise. One advantage of the decoupling approach is that it generalises to non-arithmetic settings in which the set that is drawn from is replaced by some other similarly separated set of real numbers. (A random thought – could this allow the Vinogradov-Korobov bounds on the zeta function to extend to Beurling zeta functions?)
Below the fold we sketch the Bourgain-Demeter-Guth argument proving Theorem 1.
I thank Jean Bourgain and Andrew Granville for helpful discussions.
— 1. Initial reductions —
The proof will proceed by an induction on dimension, thus we assume henceforth that $n \geq 2$ (the case $n=1$ being immediate from the Pythagorean theorem) and that Theorem 1 has already been proven for smaller values of $n$. This has the following nice consequence:
Proposition 3 (Lower dimensional decoupling) Let the notation be as in Theorem 1. Suppose also that $n \geq 2$, and that Theorem 1 has already been proven for all smaller values of $n$. Then for any $1 \leq k < n$, the $f_i$ exhibit decoupling in $L^{k(k+1)}(w_B)$ for any ball $B$ of radius $N^k$.
Proof: (Sketch) We slice the ball $B$ into $k$-dimensional slices parallel to the first $k$ coordinate directions. On each slice, the $f_i$ can be interpreted as functions on $\mathbb{R}^k$ whose Fourier transforms lie on the curve $\gamma_k(\mathbb{R})$, where $\gamma_k(t) := (t, t^2, \dots, t^k)$. Applying Theorem 1 with $n$ replaced by $k$, and then integrating over all slices using Fubini’s theorem and Minkowski’s inequality (to interchange the norm and the square function), we obtain the claim.
The first step, needed for technical inductive purposes, is to work at an exponent slightly below . More precisely, given any and , let denote the assertion that
Proposition 4 Let , and assume Theorem 1 has been established for all smaller values of . If is sufficiently close to , then holds for all .
The reason for this is that the functions and all have Fourier transform supported on a ball of radius , and so there is a Bernstein-type inequality that lets one replace the norm of either function by the norm, losing a power of that goes to zero as goes to . (See Corollary 6.2 and Lemma 8.2 of the Bourgain-Demeter-Guth paper for more details of this.)
Using the trivial bound (1) we see that holds for large (e.g. ). To reduce , it suffices to prove the following inductive claim.
Proposition 5 (Inductive claim) Let , and assume Theorem 1 has been established for all smaller values of . If is sufficiently close to , and holds for some , then holds for some .
Henceforth we fix as in Proposition 5. We fix and use to denote any quantity that goes to zero as , keeping fixed. Then the hypothesis reads
The next step is to reduce matters to a “multilinear” version of the above estimate, in order to exploit a multilinear Kakeya estimate at a later stage of the argument. Let be a large integer depending only on (actually Bourgain, Demeter, and Guth choose ). It turns out that it will suffice to prove the multilinear version
We have the following nice equivalence (essentially due to Bourgain and Guth, building upon an earlier “bilinear equivalence” result of Vargas, Vega, and myself, and discussed in this previous blog post):
for any fixed integer (with the implied constant in the notation independent of ); by choosing large enough one can then prove by an inductive argument.
We partition the intervals in (2) into classes of consecutive intervals, so that can be expressed as where . Observe that for any , one either has
for some (i.e. one of the dominates the sum), or else one has
for some with the transversality condition . This leads to the pointwise inequality
Bounding the supremum by and then taking norms and using (3), we conclude that
On the other hand, applying an affine rescaling to (4) one sees that
and the claim follows. (A more detailed version of this argument may be found in Theorem 4.1 of this paper of Bourgain and Demeter.)
It thus suffices to show (3).
The next step is to set up some intermediate scales between and , in order to run an “induction on scales” argument. For any scale , any exponent , and any function , let denote the local average
where denotes the volume of (one could also use the equivalent quantity here if desired). For any exponents , , and (independent of ), let denote the least exponent for which one has the local decoupling inequality
for as in (3), where the -length intervals in have been covered by a family of finitely overlapping intervals of length , and . It is then not difficult to see that the estimate (3) is equivalent to the inequality
(basically because when , there is essentially only one for each , and is basically ; also, the averaging is essentially the identity when since all the and here have Fourier support on a ball of radius ). To put it another way, our task is now to show that
On the other hand, one can establish the following inequalities concerning the quantities , arranged roughly in increasing order of difficulty to prove.
- (i) (Hölder) The quantity is convex in , and monotone nondecreasing in .
- (ii) (Minkowski) If , then is monotone non-decreasing in .
- (iii) (Stability) One has . (In fact, is Lipschitz in uniformly in , but we will not need this.)
- (iv) (Rescaled decoupling hypothesis) If and , then one has .
- (v) (Lower dimensional decoupling) If and , then .
- (vi) (Multilinear Kakeya) If and , then .
We sketch the proof of the various parts of this proposition in later sections. For now, let us show how these properties imply the claim (6). In the paper of Bourgain, Demeter, and Guth, the above properties were iterated along a certain “tree” of parameters , relying on (v) to increase the parameter (which measures the amount of decoupling) and (vi) to “inflate” or increase the parameter (which measures the spatial scale at which decoupling has been obtained), and (i) to reconcile the different choices of appearing in (v) and (vi), with the remaining properties (ii), (iii), (iv) used to control various “boundary terms” arising from this tree iteration. Here, we will present an essentially equivalent “Bellman function” formulation of the argument which replaces this iteration by a carefully (but rather unmotivatedly) chosen inductive claim. More precisely, let be a small quantity (depending only on and ) to be chosen later. For any , let denote the claim that for every , and for all sufficiently small , one has the inequality
From Proposition 7 (i), (ii), (iv), we see that holds for some small . We will shortly establish the implication
for some independent of ; this implies upon iteration that holds for arbitrarily large values of . Applying (9) with for a sufficiently large and a sufficiently small , and combining with Proposition 7(iii), we obtain the claim (6).
By Proposition 7(i) it suffices to show this for the extreme values of , thus we wish to show that
We begin with (13). The case of this estimate is
But since , we see that if is small enough, so the right-hand side of (16) is greater than and the claim follows from Proposition 7(iv) (with a little bit of room to spare). Now we look at the cases of (13). By Proposition 7(vi), we have
For close to , lies between and , so from (7) one has
Since , one has
for small enough depending on , and (13) follows (if is small enough depending on but not on ).
The same argument applied with gives
Since , we thus have
In the case, this gives
and hence (after simplifying)
which gives (14) for small enough (depending on , but not on ).
— 2. Rescaled decoupling —
The claims (i), (ii), (iii) of Proposition 7 are routine applications of the Hölder and Minkowski inequalities (and also the Bernstein inequality, in the case of (iii)); we will focus on the more interesting claims (iv), (v), (vi).
Here we establish (iv). The main geometric point exploited here is that any segment of the curve is affinely equivalent to itself, with the key factor of in the bound coming from this affine rescaling.
Using the definition (5) of , we see that we need to show that
for balls of radius . By Hölder’s inequality, it suffices to show that
for each . By Minkowski’s inequality (and the fact that ), the left-hand side is at most
so it suffices to show that
for each . From Fubini’s theorem one has
so we reduce to showing that
But this follows by applying an affine rescaling to map to , and then using the hypothesis with replaced by . (The ball gets distorted into an ellipsoid, but one can check that this ellipsoid can be covered efficiently by finitely overlapping balls of radius , and so one can close the argument using the triangle inequality.)
— 3. Lower dimensional decoupling —
Now we establish (v). Here, the geometric point is the one implicitly used in Proposition 3, namely that the -dimensional curve projects down to the -dimensional curve for any .
for balls of radius . It will suffice to show the pointwise estimate
for any , or equivalently that
where . Clearly this will follow if we have
for each . Covering the intervals in by those in , it suffices to show that
for each . But this follows from Proposition 3.
— 4. Multilinear Kakeya —
Finally, we establish (vi), which is the most substantial component of Proposition 7, and the only component which truly takes advantage of the reduction to the multilinear setting. Let and be such that . From (5), it suffices to show that
for balls of radius . By averaging, it suffices to establish the bound
for balls of radius . If we write , the right-hand side simplifies to
so it suffices to show that
At this point it is convenient to perform a dyadic pigeonholing (giving up a factor of ) to normalise, for each , all of the quantities to be of comparable size, after reducing the sets to some appropriate subset . (The contribution of those for which this quantity is less than, say, of the maximal value, can be safely discarded by trivial estimates.) By homogeneity we may then normalise
for all surviving , so the estimate now becomes
Since is close to , is less than , so we can estimate
and so it suffices to show that
or, on raising to the power ,
Localising to balls of radius , it suffices to show that
The arc is contained in a box of dimensions roughly , so by the uncertainty principle is essentially constant along boxes of dimensions (this can be made precise by standard methods, see e.g. the discussion in the proof of Theorem 5.6 of Bourgain-Demeter-Guth, or my general discussion on the uncertainty principle in this previous blog post). This implies that , when restricted to , is essentially constant on “plates”, defined as the intersection of with slabs that have dimensions of length and the remaining dimensions infinite (and thus restricted to be of length about after restriction to ). Furthermore, as varies (and is constrained to be in ), the orientation of these slabs varies in a suitably “transverse” fashion (the precise definition of this is a little technical, but can be verified for ; see the BDG paper for details). After rescaling, the claim then follows from the following proposition:
Proposition 8 (Multilinear Kakeya) For , let be a collection of “plates” that have dimensions of length , and dimensions that are infinite, and for each let be a non-negative number. Assume that the families of plates obey a suitable transversality condition. Then
for any ball of radius .
The exponent here is natural, as can be seen by considering the example where each consists of about parallel disjoint plates passing through , with for all such plates.
For (where the plates now become tubes), this result was first obtained by Bennett, Carbery, and myself using heat kernel methods, with a rather different proof (also capturing the endpoint case) later given using algebraic topological methods by Guth (as discussed in this previous post). More recently, a very short and elementary proof of this theorem was given by Guth, which was initially given for but extends to general . The scheme of the proof can be described as follows.
- When all the plates in each family are parallel, the claim follows from the Loomis-Whitney inequality (when ) or a more general Brascamp-Lieb inequality of Bennett, Carbery, Christ, and myself (for general ). These inequalities can be proven by repeated applications of the Hölder inequality and Fubini’s theorem.
- Perturbing this, we can obtain the proposition with a loss of for any and , provided that the plates in each are within of being parallel, and is sufficiently small depending on and . (For the case of general , this requires some uniformity in the result of Bennett, Carbery, Christ, and myself, which can be obtained by hand in the specific case of interest here, but was recently established in general by Bennett, Bez, Flock, and Lee.)
- A standard “induction on scales” argument shows that if the proposition is true at scale with some loss , then it is also true at scale with loss . Iterating this, we see that we can obtain the proposition with a loss of uniformly for all , provided that the plates are within of being parallel and is sufficiently small depending now only on (and not on ).
- A finite partition of unity then suffices to remove the restriction of the plates being within of each other, and then sending to zero we obtain the claim.
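The parallel case in the first step of this scheme rests on the Loomis-Whitney inequality. As a small illustration, here is a sketch of its classical three-dimensional discrete (counting) form, which states that a finite set $S \subset \mathbb{Z}^3$ satisfies $|S|^2 \leq |\pi_{xy}(S)|\,|\pi_{yz}(S)|\,|\pi_{xz}(S)|$, where the $\pi$'s are the three coordinate projections. This is a simpler relative of the plate version used in the proof, not the statement itself; the function name and the random test data are assumptions for illustration.

```python
import random

def loomis_whitney_sides(points):
    """Return (|S|^2, product of the sizes of the three coordinate projections)."""
    S = set(points)
    xy = {(x, y) for x, y, z in S}
    yz = {(y, z) for x, y, z in S}
    xz = {(x, z) for x, y, z in S}
    return len(S) ** 2, len(xy) * len(yz) * len(xz)

# Equality case: a full cube of side m has m^3 points and three m^2-point shadows.
m = 4
cube = [(x, y, z) for x in range(m) for y in range(m) for z in range(m)]
cube_lhs, cube_rhs = loomis_whitney_sides(cube)

# Random subsets of a small grid, with a fixed seed for reproducibility.
rng = random.Random(0)
random_results = []
for _ in range(200):
    pts = [(rng.randrange(6), rng.randrange(6), rng.randrange(6)) for _ in range(40)]
    random_results.append(loomis_whitney_sides(pts))

print(cube_lhs, cube_rhs, all(lhs <= rhs for lhs, rhs in random_results))
```

The cube shows that the inequality is sharp, which mirrors the role of families of parallel plates through a common point as the extremal example for the multilinear Kakeya estimate.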
The proof of the decoupling theorem (and thus the Vinogradov main conjecture) is now complete.
Remark 9 The above arguments extend to give decoupling for the curve in for every . As it turns out (Bourgain, private communication), a variant of the argument also handles the range , and the range can be covered from an induction on dimension (using the argument used to establish Proposition 3).