Given any finite collection of elements {(f_i)_{i \in I}} in some Banach space {X}, the triangle inequality tells us that

\displaystyle \| \sum_{i \in I} f_i \|_X \leq \sum_{i \in I} \|f_i\|_X.

However, when the {f_i} all “oscillate in different ways”, one expects to improve substantially upon the triangle inequality. For instance, if {X} is a Hilbert space and the {f_i} are mutually orthogonal, we have the Pythagorean theorem

\displaystyle \| \sum_{i \in I} f_i \|_X = (\sum_{i \in I} \|f_i\|_X^2)^{1/2}.

For the sake of comparison, from the triangle inequality and Cauchy-Schwarz one has the general inequality

\displaystyle \| \sum_{i \in I} f_i \|_X \leq (\# I)^{1/2} (\sum_{i \in I} \|f_i\|_X^2)^{1/2} \ \ \ \ \ (1)


for any finite collection {(f_i)_{i \in I}} in any Banach space {X}, where {\# I} denotes the cardinality of {I}. Thus orthogonality in a Hilbert space yields “square root cancellation”, saving a factor of {(\# I)^{1/2}} or so over the trivial bound coming from the triangle inequality.
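As a quick numerical illustration of this square root cancellation (a toy example of our own choosing, not taken from the literature), take {X = \ell^2({\bf Z}/M{\bf Z})} and {f_i(x) = e(ix/M)} for {0 \leq i < \# I \leq M}; distinct characters are mutually orthogonal. The helper names `e`, `l2_norm` and `compare_bounds` are hypothetical.

```python
import cmath
import math


def e(theta):
    """e(theta) = exp(2*pi*i*theta), as in the text."""
    return cmath.exp(2j * math.pi * theta)


def l2_norm(values):
    """Euclidean norm of a finite sequence of complex numbers."""
    return math.sqrt(sum(abs(v) ** 2 for v in values))


def compare_bounds(num_functions, modulus):
    """Return (||sum f_i||, Pythagorean square function, triangle bound) for f_i(x) = e(i x / M).

    Distinct characters mod M are orthogonal, so this assumes num_functions <= modulus.
    """
    # reduce i*x mod M before dividing, so each phase is computed exactly
    fs = [[e((i * x % modulus) / modulus) for x in range(modulus)]
          for i in range(num_functions)]
    total = [sum(f[x] for f in fs) for x in range(modulus)]
    square_function = math.sqrt(sum(l2_norm(f) ** 2 for f in fs))
    triangle = sum(l2_norm(f) for f in fs)
    return l2_norm(total), square_function, triangle


lhs, square_function, triangle = compare_bounds(16, 64)
print(lhs, square_function, triangle)  # triangle / lhs is sqrt(16) = 4 up to rounding
```

Since all the {f_i} here have the same norm, the triangle bound coincides with the right-hand side of (1), and the saving over it is exactly {(\# I)^{1/2}}.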

More generally, let us somewhat informally say that a collection {(f_i)_{i \in I}} exhibits decoupling in {X} if one has the Pythagorean-like inequality

\displaystyle \| \sum_{i \in I} f_i \|_X \ll_\varepsilon (\# I)^\varepsilon (\sum_{i \in I} \|f_i\|_X^2)^{1/2}

for any {\varepsilon>0}, thus one obtains almost the full square root cancellation in the {X} norm. The theory of almost orthogonality can then be viewed as the theory of decoupling in Hilbert spaces such as {L^2({\bf R}^n)}. In {L^p} spaces for {p < 2} one usually does not expect this sort of decoupling; for instance, if the {f_i} are disjointly supported one has

\displaystyle \| \sum_{i \in I} f_i \|_{L^p} = (\sum_{i \in I} \|f_i\|_{L^p}^p)^{1/p}

and the right-hand side can be much larger than {(\sum_{i \in I} \|f_i\|_{L^p}^2)^{1/2}} when {p < 2}. At the opposite extreme, one usually does not expect to get decoupling in {L^\infty}, since one could conceivably align the {f_i} to all attain a maximum magnitude at the same location with the same phase, at which point the triangle inequality in {L^\infty} becomes sharp.
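Both obstructions are easy to see in a discrete model (counting measure on a finite set, an assumption made here purely for simplicity): indicators of distinct points are disjointly supported, while identical copies of one bump are perfectly aligned. The function names below are hypothetical.

```python
import math


def lp_norm(values, p):
    """l^p norm with respect to counting measure; p may be math.inf."""
    if p == math.inf:
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1 / p)


def decoupling_sides(fs, p):
    """Return (||sum_i f_i||_p, (sum_i ||f_i||_p^2)^{1/2})."""
    total = [sum(column) for column in zip(*fs)]
    return lp_norm(total, p), math.sqrt(sum(lp_norm(f, p) ** 2 for f in fs))


n = 100
# Disjointly supported: f_i is the indicator of the point i.
disjoint = [[1.0 if x == i else 0.0 for x in range(n)] for i in range(n)]
# Perfectly aligned: every f_i is the same bump, peaking at x = 0 with the same phase.
aligned = [[1.0, 0.5, 0.25] for _ in range(n)]

print(decoupling_sides(disjoint, 1.5))      # about (21.54, 10.0): ratio n^{1/p - 1/2}
print(decoupling_sides(aligned, math.inf))  # (100.0, 10.0): ratio n^{1/2}
```

In both cases the loss is a fixed positive power of {\# I}, which no {(\# I)^\varepsilon} factor can absorb.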

However, in some cases one can get decoupling for certain {2 < p < \infty}. For instance, suppose we are in {L^4}, and that {f_1,\dots,f_N} are bi-orthogonal in the sense that the products {f_i f_j} for {1 \leq i < j \leq N} are pairwise orthogonal in {L^2}. Then we have

\displaystyle \| \sum_{i = 1}^N f_i \|_{L^4}^2 = \| (\sum_{i=1}^N f_i)^2 \|_{L^2}

\displaystyle = \| \sum_{1 \leq i,j \leq N} f_i f_j \|_{L^2}

\displaystyle \ll (\sum_{1 \leq i,j \leq N} \|f_i f_j \|_{L^2}^2)^{1/2}

\displaystyle = \| (\sum_{1 \leq i,j \leq N} |f_i f_j|^2)^{1/2} \|_{L^2}

\displaystyle = \| \sum_{i=1}^N |f_i|^2 \|_{L^2}

\displaystyle \leq \sum_{i=1}^N \| |f_i|^2 \|_{L^2}

\displaystyle = \sum_{i=1}^N \|f_i\|_{L^4}^2

giving decoupling in {L^4}. (Similarly if each of the {f_i f_j} is orthogonal to all but {O_\varepsilon( N^\varepsilon )} of the other {f_{i'} f_{j'}}.) A similar argument also gives {L^6} decoupling when one has tri-orthogonality (with the {f_i f_j f_k} mostly orthogonal to each other), and so forth. As a slight variant, Khintchine’s inequality also indicates that decoupling should occur for any fixed {2 < p < \infty} if one multiplies each of the {f_i} by an independent random sign {\epsilon_i \in \{-1,+1\}}.
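For a concrete bi-orthogonal family (an example of our own choosing), take the lacunary exponentials {f_j(x) = e(2^j x)} on {[0,1]}, {0 \leq j < N}. The products {f_i f_j} have frequencies {2^i + 2^j}, which are distinct for distinct pairs because binary expansions are unique. Counting solutions to {2^a + 2^b = 2^c + 2^d} gives {\int_0^1 |\sum_j f_j|^4 = 2N^2 - N}, hence {\| \sum_j f_j \|_{L^4} \leq 2^{1/4} (\sum_j \|f_j\|_{L^4}^2)^{1/2}}, compared with the trivial bound {N}. The sketch below checks this numerically; the helper names are hypothetical.

```python
import cmath
import math


def e(theta):
    """e(theta) = exp(2*pi*i*theta)."""
    return cmath.exp(2j * math.pi * theta)


def l4_fourth_power(num_terms):
    """Integral over [0,1] of |sum_{j < num_terms} e(2^j x)|^4.

    |S|^4 is a trigonometric polynomial whose frequencies 2^a + 2^b - 2^c - 2^d have
    absolute value < 2^num_terms, so averaging over m = 2^(num_terms + 1) equally spaced
    points computes the integral exactly (up to floating-point rounding).
    """
    m = 2 ** (num_terms + 1)
    total = 0.0
    for t in range(m):
        # reduce the phase 2^j * t / m modulo 1 exactly before converting to a float
        s = sum(e((2 ** j * t % m) / m) for j in range(num_terms))
        total += abs(s) ** 4
    return total / m


for n in (4, 8, 10):
    print(n, round(l4_fourth_power(n)), 2 * n * n - n)  # the two integers agree
```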

In recent years, Bourgain and Demeter have been establishing decoupling theorems in {L^p({\bf R}^n)} spaces for various key exponents of {2 < p < \infty}, in the “restriction theory” setting in which the {f_i} are Fourier transforms of measures supported on different portions of a given surface or curve; this builds upon the earlier decoupling theorems of Wolff. In a recent paper with Guth, they established the following decoupling theorem for the curve {\gamma({\bf R}) \subset {\bf R}^n} parameterised by the polynomial curve

\displaystyle \gamma: t \mapsto (t, t^2, \dots, t^n).

For any ball {B = B(x_0,r)} in {{\bf R}^n}, let {w_B: {\bf R}^n \rightarrow {\bf R}^+} denote the weight

\displaystyle w_B(x) := \frac{1}{(1 + \frac{|x-x_0|}{r})^{100n}},

which should be viewed as a smoothed out version of the indicator function {1_B} of {B}. In particular, the space {L^p(w_B) = L^p({\bf R}^n, w_B(x)\ dx)} can be viewed as a smoothed out version of the space {L^p(B)}. For future reference we observe a fundamental self-similarity of the curve {\gamma({\bf R})}: any arc {\gamma(I)} in this curve, with {I} a compact interval, is affinely equivalent to the standard arc {\gamma([0,1])}.

Theorem 1 (Decoupling theorem) Let {n \geq 1}. Subdivide the unit interval {[0,1]} into {N} equal subintervals {I_i} of length {1/N}, and for each such {I_i}, let {f_i: {\bf R}^n \rightarrow {\bf C}} be the Fourier transform

\displaystyle f_i(x) = \int_{\gamma(I_i)} e(x \cdot \xi)\ d\mu_i(\xi)

of a finite Borel measure {\mu_i} on the arc {\gamma(I_i)}, where {e(\theta) := e^{2\pi i \theta}}. Then the {f_i} exhibit decoupling in {L^{n(n+1)}(w_B)} for any ball {B} of radius {N^n}.

Orthogonality gives the {n=1} case of this theorem. The bi-orthogonality type arguments sketched earlier only give decoupling in {L^p} up to the range {2 \leq p \leq 2n}; the point here is that we can now get a much larger value of {p}. The {n=2} case of this theorem was previously established by Bourgain and Demeter (who obtained in fact an analogous theorem for any curved hypersurface). The exponent {n(n+1)} (and the radius {N^n}) is best possible, as can be seen by the following basic example. If

\displaystyle f_i(x) := \int_{I_i} e(x \cdot \gamma(\xi)) g_i(\xi)\ d\xi

where {g_i} is a bump function adapted to {I_i}, then standard Fourier-analytic computations show that {f_i} will be comparable to {1/N} on a rectangular box of dimensions {N \times N^2 \times \dots \times N^n} (and thus volume {N^{n(n+1)/2}}) centred at the origin, and exhibit decay away from this box, with {\|f_i\|_{L^{n(n+1)}(w_B)}} comparable to

\displaystyle 1/N \times (N^{n(n+1)/2})^{1/(n(n+1))} = 1/\sqrt{N}.

On the other hand, {\sum_{i=1}^N f_i} is comparable to {1} on a ball of radius comparable to {1} centred at the origin, so {\|\sum_{i=1}^N f_i\|_{L^{n(n+1)}(w_B)}} is {\gg 1}, which is just barely consistent with decoupling. This calculation shows that decoupling will fail if {n(n+1)} is replaced by any larger exponent, and also if the radius of the ball {B} is reduced to be significantly smaller than {N^n}.
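The bookkeeping in this example only involves powers of {N}, so it can be checked with exact rational arithmetic. In the sketch below (function name hypothetical; implied constants ignored) we allow a general exponent {p} in place of {n(n+1)} to see where decoupling breaks.

```python
from fractions import Fraction


def example_exponents(n, p=None):
    """Powers of N in the basic example, for dimension n and exponent p (default n(n+1)).

    Returns (a, b) with ||f_i||_{L^p(w_B)} ~ N^a and (sum_i ||f_i||^2)^{1/2} ~ N^b.
    """
    if p is None:
        p = n * (n + 1)
    p = Fraction(p)
    box_volume = Fraction(n * (n + 1), 2)  # N x N^2 x ... x N^n has volume N^{n(n+1)/2}
    a = -1 + box_volume / p                # |f_i| ~ 1/N on the box: N^{-1} * vol^{1/p}
    b = (1 + 2 * a) / 2                    # N terms of size N^{2a}, then a square root
    return a, b


for n in range(1, 6):
    print(n, example_exponents(n), example_exponents(n, n * (n + 1) + 1))
```

At {p = n(n+1)} the square function side is {N^0}, matching the lower bound {\gg 1}; for any larger exponent it becomes a negative power of {N}, which no {N^\varepsilon} loss can compensate.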

This theorem has the following consequence of importance in analytic number theory:

Corollary 2 (Vinogradov main conjecture) Let {s, n, N \geq 1} be integers, and let {\varepsilon > 0}. Then

\displaystyle \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{2s}\ dx_1 \dots dx_n

\displaystyle \ll_{\varepsilon,s,n} N^{s+\varepsilon} + N^{2s - \frac{n(n+1)}{2}+\varepsilon}.

Proof: By the Hölder inequality (and the trivial bound of {N} for the exponential sum), it suffices to treat the critical case {s = n(n+1)/2}, that is to say to show that

\displaystyle \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{n(n+1)}\ dx_1 \dots dx_n \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+\varepsilon}.

We can rescale this as

\displaystyle \int_{[0,N] \times [0,N^2] \times \dots \times [0,N^n]} |\sum_{j=1}^N e( x \cdot \gamma(j/N) )|^{n(n+1)}\ dx \ll_{\varepsilon,n} N^{n(n+1)+\varepsilon}.

As the integrand is periodic along the lattice {N{\bf Z} \times N^2 {\bf Z} \times \dots \times N^n {\bf Z}}, this is equivalent to

\displaystyle \int_{[0,N^n]^n} |\sum_{j=1}^N e( x \cdot \gamma(j/N) )|^{n(n+1)}\ dx \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+n^2+\varepsilon}.

The left-hand side may be bounded by {\ll \| \sum_{j=1}^N f_j \|_{L^{n(n+1)}(w_B)}^{n(n+1)}}, where {B := B(0,N^n)} and {f_j(x) := e(x \cdot \gamma(j/N))}. Since

\displaystyle \| f_j \|_{L^{n(n+1)}(w_B)} \ll (N^{n^2})^{\frac{1}{n(n+1)}},

the claim now follows from the decoupling theorem and a brief calculation. \Box
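The “brief calculation” at the end of this proof, together with the rescaling steps before it, can be checked with exact exponent arithmetic. In this sketch (function name hypothetical) every quantity is an exponent of {N}, with {\varepsilon}-losses dropped.

```python
from fractions import Fraction


def corollary_exponents(n):
    """Exponents of N in the proof of Corollary 2 (epsilon losses dropped)."""
    p = n * (n + 1)
    half = Fraction(p, 2)
    unit_cube = half  # target bound for the integral over [0,1]^n
    # substituting y_i = N^i x_i multiplies the integral by N^{1+2+...+n} = N^{n(n+1)/2}
    box = unit_cube + half
    # [0,N^n]^n (volume N^{n^2}) contains N^{n^2 - n(n+1)/2} period cells of the box
    big_cube = box + n * n - half
    # decoupling: (sum_{j<=N} ||f_j||^2)^{p/2} with ||f_j||_{L^p(w_B)} << N^{n^2/p}
    decoupled = half * (1 + 2 * Fraction(n * n, p))
    return box, big_cube, decoupled


for n in range(1, 6):
    print(n, *corollary_exponents(n))  # the last two entries coincide
```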

Using the Plancherel formula, one may equivalently (when {s} is an integer) write the Vinogradov main conjecture in terms of solutions {j_1,\dots,j_s,k_1,\dots,k_s \in \{1,\dots,N\}} to the system of equations

\displaystyle j_1^i + \dots + j_s^i = k_1^i + \dots + k_s^i \forall i=1,\dots,n,

but we will not use this formulation here.
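To see the equivalence concretely: expanding {|\sum_j e(\cdots)|^{2s}} as a product of {s} sums and {s} conjugate sums and integrating over {[0,1]^n}, each term integrates to {1} exactly when the above system holds and to {0} otherwise, so the integral in Corollary 2 counts solutions. For very small parameters this count can be brute-forced; the sketch below is a naive enumeration of our own (function name hypothetical) that groups {s}-tuples by their vector of power sums.

```python
from collections import Counter
from itertools import product


def vinogradov_count(s, n, big_n):
    """Number of (j_1..j_s, k_1..k_s) in {1..N}^{2s} with equal i-th power sums, i = 1..n.

    Tuples with the same vector of power sums pair up, so the count is the sum of
    squared multiplicities. Runs in time about N^s, so only tiny parameters are feasible.
    """
    power_sums = Counter(
        tuple(sum(j ** i for j in js) for i in range(1, n + 1))
        for js in product(range(1, big_n + 1), repeat=s)
    )
    return sum(mult * mult for mult in power_sums.values())


print(vinogradov_count(2, 2, 10))  # 190 = 2N^2 - N: only the trivial solutions
print(vinogradov_count(3, 2, 10))  # the critical case s = n(n+1)/2 = 3
```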

A history of the Vinogradov main conjecture may be found in this survey of Wooley; prior to the Bourgain-Demeter-Guth theorem, the conjecture was solved completely for {n \leq 3}, or for {n > 3} and {s} either below {n(n+1)/2 - n/3 + O(n^{2/3})} or above {n(n-1)}, with the bulk of recent progress coming from the efficient congruencing technique of Wooley. It has numerous applications to exponential sums, Waring’s problem, and the zeta function; to give just one application, the main conjecture implies the predicted asymptotic for the number of ways to express a large number as the sum of {23} fifth powers (the previous best result required {28} fifth powers). The Bourgain-Demeter-Guth approach to the Vinogradov main conjecture, based on decoupling, is ostensibly very different from the efficient congruencing technique, which relies heavily on the arithmetic structure of the problem, but it appears (as I have been told from second-hand sources) that the two methods are actually closely related, with the former being a sort of “Archimedean” version of the latter (with the intervals {I_i} in the decoupling theorem being analogous to congruence classes in the efficient congruencing method); hopefully there will be some future work making this connection more precise. One advantage of the decoupling approach is that it generalises to non-arithmetic settings in which the set {\{1,\dots,N\}} that {j} is drawn from is replaced by some other similarly separated set of real numbers. (A random thought – could this allow the Vinogradov-Korobov bounds on the zeta function to extend to Beurling zeta functions?)

Below the fold we sketch the Bourgain-Demeter-Guth argument proving Theorem 1.

I thank Jean Bourgain and Andrew Granville for helpful discussions.

— 1. Initial reductions —

The proof will proceed by induction on dimension; thus we assume henceforth that {n \geq 2} (the {n=1} case being immediate from the Pythagorean theorem) and that Theorem 1 has already been proven for smaller values of {n}. This has the following nice consequence:

Proposition 3 (Lower dimensional decoupling) Let the notation be as in Theorem 1. Suppose also that {n \geq 2}, and that Theorem 1 has already been proven for all smaller values of {n}. Then for any {1 \leq k < n}, the {f_i} exhibit decoupling in {L^{k(k+1)}(w_B)} for any ball {B} of radius {N^k}.

Proof: (Sketch) We slice the ball {B} into {k}-dimensional slices parallel to the first {k} coordinate directions. On each slice, the {f_i} can be interpreted as functions on {{\bf R}^k} whose Fourier transforms lie on the curve {\gamma_k([0,1]) \subset {\bf R}^k}, where {\gamma_k(t) := (t,\dots,t^k)}. Applying Theorem 1 with {n} replaced by {k}, and then integrating over all slices using Fubini’s theorem and Minkowski’s inequality (to interchange the {L^{k(k+1)}} norm and the square function), we obtain the claim. \Box

The first step, needed for technical inductive purposes, is to work at an exponent slightly below {n(n+1)}. More precisely, given any {2 < p < \infty} and {\eta > 0}, let {P(p,\eta)} denote the assertion that

\displaystyle \| \sum_{i=1}^N f_i \|_{L^p(w_B)} \ll_{\varepsilon,p,n} N^{\eta+\varepsilon} (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2}

whenever {\varepsilon>0}, {N \geq 1}, {B} and {f_1,\dots,f_N} are as in Theorem 1. Theorem 1 is then clearly equivalent to the claim {P(n(n+1),\eta)} holding for all {\eta>0}. This turns out to be equivalent to the following variant:

Proposition 4 Let {n \geq 2}, and assume Theorem 1 has been established for all smaller values of {n}. If {p < n(n+1)} is sufficiently close to {n(n+1)}, then {P(p,\eta)} holds for all {\eta > 0}.

The reason for this is that the functions {f_i} and {\sum_{i=1}^N f_i} all have Fourier transform supported on a ball of radius {O(1)}, and so there is a Bernstein-type inequality that lets one replace the {L^p(w_B)} norm of either function by the {L^{n(n+1)}(w_B)} norm, losing a power of {N} that goes to zero as {p} goes to {n(n+1)}. (See Corollary 6.2 and Lemma 8.2 of the Bourgain-Demeter-Guth paper for more details of this.)

Using the trivial bound (1) we see that {P(p,\eta)} holds for large {\eta} (e.g. {\eta \geq 1/2}). To reduce {\eta}, it suffices to prove the following inductive claim.

Proposition 5 (Inductive claim) Let {n \geq 2}, and assume Theorem 1 has been established for all smaller values of {n}. If {p < n(n+1)} is sufficiently close to {n(n+1)}, and {P(p,\eta)} holds for some {\eta > 0}, then {P(p,\eta')} holds for some {0 < \eta' < \eta}.

Since the set of {\eta \geq 0} for which {P(p,\eta)} holds is clearly a closed half-infinite interval, Proposition 5 implies Proposition 4 and hence Theorem 1.

Henceforth we fix {n,p,\eta} as in Proposition 5, and use {o(1)} to denote any quantity that goes to zero as {N \rightarrow \infty} (keeping {n,p,\eta} fixed). Then the {P(p,\eta)} hypothesis reads

\displaystyle \| \sum_{i=1}^N f_i \|_{L^p(w_B)} \ll N^{\eta+o(1)} (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2}

and our task is to show that

\displaystyle \| \sum_{i=1}^N f_i \|_{L^p(w_B)} \ll N^{\eta'+o(1)} (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2} \ \ \ \ \ (2)


for some {0 < \eta' < \eta}.

The next step is to reduce matters to a “multilinear” version of the above estimate, in order to exploit a multilinear Kakeya estimate at a later stage of the argument. Let {M} be a large integer depending only on {n} (actually Bourgain, Demeter, and Guth choose {M := n!}). It turns out that it will suffice to prove the multilinear version

\displaystyle \| \hbox{geom} |\sum_{i \in {\mathcal I}_j} f_i|\|_{L^p(w_B)} \ll N^{\eta'+o(1)} \hbox{geom} (\sum_{i \in {\mathcal I}_j} \|f_i\|_{L^p(w_B)}^2)^{1/2} \ \ \ \ \ (3)


whenever {{\mathcal I}_1,\dots,{\mathcal I}_M} are families of disjoint subintervals on {[0,1]} of length {1/N} that are separated from each other by a distance of {\gg 1}, and where {\hbox{geom} = \hbox{geom}_{j=1,\dots,M}} denotes the geometric mean

\displaystyle \hbox{geom} x_j := (x_1 \dots x_M)^{1/M}.

We have the following nice equivalence (essentially due to Bourgain and Guth, building upon an earlier “bilinear equivalence” result of Vargas, Vega, and myself, and discussed in this previous blog post):

Proposition 6 (Multilinear equivalence) For any {\eta'>0}, the estimates (2) and (3) are equivalent.

Proof: The derivation of (3) from (2) is immediate from Hölder’s inequality. To obtain the converse implication, let {A(N)} denote the best constant in (2), thus {A(N)} is the smallest constant such that

\displaystyle \| \sum_{i=1}^N f_i \|_{L^p(w_B)} \leq A(N) (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2}. \ \ \ \ \ (4)


The idea is to prove an inequality of the form

\displaystyle A(N) \ll A(N/K) + O_K( N^{\eta' + o(1)} )

for any fixed integer {K>2M} (with the implied constant in the {\ll} notation independent of {K}); by choosing {K} large enough one can then prove {A(N) \ll N^{\eta'+o(1)}} by an inductive argument.

We partition the {N} intervals in (2) into {K} classes {{\mathcal I}_1,\dots, {\mathcal I}_K} of {\sim N/K} consecutive intervals, so that {\sum_{i=1}^N f_i} can be expressed as {\sum_{k=1}^K F_k} where {F_k := \sum_{i \in {\mathcal I}_k} f_i}. Observe that for any {x}, one either has

\displaystyle |F_k(x)| \gg \sum_{k'=1}^K |F_{k'}(x)|

for some {k=1,\dots,K} (i.e. one of the {|F_k(x)|} dominates the sum), or else one has

\displaystyle \sum_{k=1}^K |F_k(x)| \ll_K \hbox{geom} |F_{k_j}(x)|

for some {k_1 < \dots < k_M} with the transversality condition {k_{j+1} - k_j > 1}. This leads to the pointwise inequality

\displaystyle \sum_{k=1}^K |F_k(x)| \ll \sup_k |F_k(x)| + O_K( \sum_{k_1 < \dots < k_M, \hbox{ transverse}} \hbox{geom} |F_{k_j}(x)| ).

Bounding the supremum {\sup_k |F_k(x)|} by {(\sum_{k=1}^K |F_k(x)|^p)^{1/p}} and then taking {L^p} norms and using (3), we conclude that

\displaystyle \| \sum_{i=1}^N f_i\|_{L^p(w_B)} \ll (\sum_{k=1}^K \| \sum_{i \in {\mathcal I}_k} f_i \|_{L^p(w_B)}^p)^{1/p}

\displaystyle + O_K( N^{\eta'+o(1)} (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2} ).

On the other hand, applying an affine rescaling to (4) one sees that

\displaystyle \| \sum_{i \in {\mathcal I}_k} f_i \|_{L^p(w_B)} \leq A(N/K) (\sum_{i \in {\mathcal I}_k} \|f_i\|_{L^p(w_B)}^2)^{1/2},

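and thus, since {p \geq 2} implies {(\sum_{k=1}^K (\sum_{i \in {\mathcal I}_k} \|f_i\|_{L^p(w_B)}^2)^{p/2})^{1/p} \leq (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2}}, we conclude that {A(N) \ll A(N/K) + O_K( N^{\eta'+o(1)} )}, as claimed. (A more detailed version of this argument may be found in Theorem 4.1 of this paper of Bourgain and Demeter.) \Box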

It thus suffices to show (3).
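To see why it matters that the implied constant in {A(N) \ll A(N/K) + O_K( N^{\eta'+o(1)} )} is independent of {K}, here is a toy version of the induction (our own model, with made-up constants): replace the estimate by the recursion {A(K^m) = C A(K^{m-1}) + D K^{m\eta'}} with {A(1) = 1}. The normalised quantity {A(K^m)/K^{m\eta'}} stays bounded precisely when {K^{\eta'} > C}, which is why one takes {K} large. The function name and all constants are hypothetical.

```python
def normalised_constants(levels, k, c, d, eta):
    """Iterate A(K^m) = c * A(K^(m-1)) + d * (K^m)^eta starting from A(1) = 1.

    Returns [A(K^m) / (K^m)^eta for m = 0..levels]. Here c models the K-independent
    implied constant and d the O_K(.) term; both are illustrative numbers only.
    """
    a = 1.0
    ratios = [a]
    for m in range(1, levels + 1):
        scale = float(k) ** (m * eta)  # (K^m)^eta
        a = c * a + d * scale
        ratios.append(a / scale)
    return ratios


good = normalised_constants(20, k=400, c=10.0, d=5.0, eta=0.5)  # K^eta = 20 > c
bad = normalised_constants(20, k=4, c=10.0, d=5.0, eta=0.5)     # K^eta = 2 < c
print(max(good))  # stays below d / (1 - c / K^eta) = 10
print(bad[-1])    # grows roughly like (c / K^eta)^m = 5^m
```

The normalised ratios obey {r_m = (C/K^{\eta'}) r_{m-1} + D}, a contraction exactly when {K^{\eta'} > C}.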

The next step is to set up some intermediate scales between {1} and {N^n}, in order to run an “induction on scales” argument. For any scale {r > 0}, any exponent {1 < t < \infty}, and any function {f \in L^t({\bf R}^n)}, let {\hbox{Avg}_{t,r} f \in L^t({\bf R}^n)} denote the local {L^t} average

\displaystyle \hbox{Avg}_{t,r} f(x) := ( \frac{1}{|B(x,r)|} \int_{{\bf R}^n} |f(y)|^t w_{B(x,r)}(y)\ dy)^{1/t}

where {|B(x,r)|} denotes the volume of {B(x,r)} (one could also use the equivalent quantity {\int_{{\bf R}^n} w_{B(x,r)}(y)\ dy} here if desired). For any exponents {2 \leq t < \infty}, {0 \leq q \leq 1}, and {q \leq s \leq n} (independent of {N}), let {a_t(q,s)} denote the least exponent for which one has the local decoupling inequality

\displaystyle \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{t,N^s} f_{I}|^2)^{1/2}\|_{L^p(w_B)} \ \ \ \ \ (5)


\displaystyle \ll N^{a_t(q,s)+o(1)} \hbox{geom} (\sum_{i \in {\mathcal I}_j} \|f_i\|_{L^p(w_B)}^2)^{1/2}

for {f_i, B, {\mathcal I}_j} as in (3), where the {1/N}-length intervals in {{\mathcal I}_j} have been covered by a family {{\mathcal I}_{j,q}} of finitely overlapping intervals of length {1/N^q}, and {f_I := \sum_{i: I_i \subset I} f_i}. It is then not difficult to see that the estimate (3) is equivalent to the inequality

\displaystyle a_2(0,0) \leq \eta'

(basically because when {q=0}, there is essentially only one {I} for each {j}, and {f_I} is basically {\sum_{i \in {\mathcal I}_j} f_i}; also, the averaging {\hbox{Avg}_{t,N^s}} is essentially the identity when {s=0} since all the {f_i} and {f_I} here have Fourier support on a ball of radius {O(1)}). To put it another way, our task is now to show that

\displaystyle a_2(0,0) < \eta \ \ \ \ \ (6)


On the other hand, one can establish the following inequalities concerning the quantities {a_t(q,s)}, arranged roughly in increasing order of difficulty to prove.

Proposition 7 (Inequalities on {a_t(q,s)}) Throughout this proposition it is understood that {2 \leq t < \infty}, {0 \leq q \leq 1}, and {q \leq s \leq n}.

  • (i) (Hölder) The quantity {a_t(q,s)} is convex in {1/t}, and monotone nondecreasing in {t}.
  • (ii) (Minkowski) If {t=p}, then {a_t(q,s)} is monotone non-decreasing in {s}.
  • (iii) (Stability) One has {a_2(q,s) = a_2(0,0) + O( q + s )}. (In fact, {a_t(q,s)} is Lipschitz in {q,s} uniformly in {t}, but we will not need this.)
  • (iv) (Rescaled decoupling hypothesis) If {t=p} and {s=n}, then one has {a_t(q,s) \leq (1-q) \eta}.
  • (v) (Lower dimensional decoupling) If {1 \leq k \leq n-1} and {q \leq \frac{s}{k}}, then {a_{k(k+1)}(q,s) \leq a_{k(k+1)}(\frac{s}{k},s)}.
  • (vi) (Multilinear Kakeya) If {1 \leq k \leq n-1} and {(k+1)q \leq n}, then {a_{kp/n}(q,kq) \leq a_{kp/n}(q,(k+1)q)}.

We sketch the proof of the various parts of this proposition in later sections. For now, let us show how these properties imply the claim (6). In the paper of Bourgain, Demeter, and Guth, the above properties were iterated along a certain “tree” of parameters {(t,q,s)}, relying on (v) to increase the {q} parameter (which measures the amount of decoupling) and (vi) to “inflate” or increase the {s} parameter (which measures the spatial scale at which decoupling has been obtained), and (i) to reconcile the different choices of {t} appearing in (v) and (vi), with the remaining properties (ii), (iii), (iv) used to control various “boundary terms” arising from this tree iteration. Here, we will present an essentially equivalent “Bellman function” formulation of the argument which replaces this iteration by a carefully (but rather unmotivatedly) chosen inductive claim. More precisely, let {\varepsilon > 0} be a small quantity (depending only on {p} and {n}) to be chosen later. For any {W>0}, let {Q(W) = Q_\eta(W)} denote the claim that for every {k=2,\dots,n}, and for all sufficiently small {u > 0}, one has the inequality

\displaystyle a_t(u, ku) \leq (1 - 2 Wu (n-k+1) \ \ \ \ \ (7)


\displaystyle + (1-\varepsilon) W u (k-1) k (n+1) (\frac{1}{(k-1)k} - \frac{1}{t} ) ) \eta

for all

\displaystyle (k-1) k \leq t \leq \frac{kp}{n}, \ \ \ \ \ (8)


and also

\displaystyle a_t(u,u) \leq (1 - Wu(n-1)) \eta \ \ \ \ \ (9)


for {2 \leq t \leq p/n}.

From Proposition 7 (i), (ii), (iv), we see that {Q(W)} holds for some small {W>0}. We will shortly establish the implication

\displaystyle Q(W) \implies Q( (1+\delta) W ) \ \ \ \ \ (10)


for some {\delta>0} independent of {W}; this implies upon iteration that {Q(W)} holds for arbitrarily large values of {W}. Applying (9) with {t=2} for a sufficiently large {W} and a sufficiently small {u}, and combining with Proposition 7(iii), we obtain the claim (6).

We now prove the implication (10). Thus we assume (7) holds for {2 \leq k \leq n}, sufficiently small {u > 0}, and {t} obeying (8), and also (9) for {2 \leq t \leq p/n} and we wish to improve this to

\displaystyle a_t(u, ku) \leq (1 - 2 (1+\delta)Wu (n-k+1) \ \ \ \ \ (11)


\displaystyle + (1-\varepsilon) (1+\delta) W u (k-1) k (n+1) (\frac{1}{(k-1)k} - \frac{1}{t} ) ) \eta

for the same range of {k, t} and for sufficiently small {u}, and also

\displaystyle a_t(u,u) \leq (1 - (1+\delta)Wu(n-1)) \eta \ \ \ \ \ (12)


for {2 \leq t \leq p/n}.

By Proposition 7(i) it suffices to show this for the extreme values of {t}, thus we wish to show that

\displaystyle a_{kp/n}(u, ku) \leq (1 - 2 (1+\delta)Wu (n-k+1) \ \ \ \ \ (13)


\displaystyle + (1-\varepsilon) (1+\delta) W u (k-1) k (n+1) (\frac{1}{(k-1)k} - \frac{n}{pk} ) ) \eta

for {k=2,\dots,n},

\displaystyle a_{(k-1)k}(u, ku) \leq (1 - 2 (1+\delta)Wu (n-k+1) ) \eta \ \ \ \ \ (14)


for {k=2,\dots,n}, and

\displaystyle a_2(u, u), a_{p/n}(u,u) \leq (1 - (1+\delta)Wu (n-1) ) \eta. \ \ \ \ \ (15)


We begin with (13). The {k=n} case of this estimate is

\displaystyle a_p(u,nu) \leq (1 - 2 (1+\delta) Wu \ \ \ \ \ (16)


\displaystyle + (1-\varepsilon) (1+\delta) W u (n-1) n (n+1) (\frac{1}{(n-1)n} - \frac{1}{p}) ) \eta.

But since {p < n(n+1)}, we see that {(1-\varepsilon) (n-1) n (n+1) (\frac{1}{(n-1)n} - \frac{1}{p}) > 2} if {\varepsilon} is small enough, so the right-hand side of (16) is greater than {\eta} and the claim follows from Proposition 7(iv) (with a little bit of room to spare). Now we look at the {k=2,\dots,n-1} cases of (13). By Proposition 7(vi), we have

\displaystyle a_{kp/n}(u, ku) \leq a_{kp/n}(u,(k+1)u).

For {p} close to {n(n+1)}, {kp/n} lies between {\max(2,k(k+1))} and {\frac{(k+1)p}{n}}, so from (7) (with {k} replaced by {k+1}) one has

\displaystyle a_{kp/n}(u,(k+1)u) \leq (1 - 2 Wu (n-k)

\displaystyle + (1-\varepsilon) W u k (k+1) (n+1) (\frac{1}{k(k+1)} - \frac{n}{pk} ) ) \eta.

Since {p < n(n+1)}, one has

\displaystyle -2(n-k) + (1-\varepsilon) k(k+1) (n+1) (\frac{1}{k(k+1)} - \frac{n}{pk})

\displaystyle < -2(n-k+1) + (1-\varepsilon) (k-1) k (n+1) (\frac{1}{(k-1)k} - \frac{n}{pk} )

for {\varepsilon} small enough depending on {p}, and (13) follows (if {\delta} is small enough depending on {\varepsilon,p} but not on {W}).
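The inequality just used is linear in {\varepsilon} and in {1/p}, and the difference of its two sides simplifies to {-2 + 2(1-\varepsilon) n(n+1)/p}, independently of {k}. The following exact-arithmetic check (our own sketch; the function name and the sample values {p = n(n+1) - 1/10}, {\varepsilon = 10^{-4}} are hypothetical choices) confirms this, and shows that the gap is positive for {p < n(n+1)} and small {\varepsilon} but equals {-2\varepsilon < 0} at the endpoint {p = n(n+1)}.

```python
from fractions import Fraction


def gap(n, k, p, eps):
    """Right-hand side minus left-hand side of the inequality preceding (13), 2 <= k <= n-1."""
    c = 1 - eps
    lhs = -2 * (n - k) + c * k * (k + 1) * (n + 1) * (Fraction(1, k * (k + 1)) - n / (p * k))
    rhs = (-2 * (n - k + 1)
           + c * (k - 1) * k * (n + 1) * (Fraction(1, (k - 1) * k) - n / (p * k)))
    return rhs - lhs


eps = Fraction(1, 10_000)
for n in range(3, 8):
    p = n * (n + 1) - Fraction(1, 10)
    for k in range(2, n):
        assert gap(n, k, p, eps) == -2 + 2 * (1 - eps) * n * (n + 1) / p
        assert gap(n, k, p, eps) > 0                              # p < n(n+1): room to spare
        assert gap(n, k, Fraction(n * (n + 1)), eps) == -2 * eps  # endpoint: fails
print("ok")
```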

The same argument applied with {k=1} gives

\displaystyle a_{p/n}(u,u) \leq (1 - 2Wu(n-1)

\displaystyle + (1-\varepsilon) 2W u (n+1) (\frac{1}{2} - \frac{n}{p} ) ) \eta.

Since {p < n(n+1)}, we thus have

\displaystyle a_{p/n}(u,u) \leq (1 - (1+\delta) Wu(n-1) ) \eta

if {\varepsilon,\delta} are sufficiently small depending on {p} (but not on {W}). This, together with Proposition 7(i), gives (15).

Finally, we establish (14). From Proposition 7(v) (with {k} replaced by {k-1}) we have

\displaystyle a_{(k-1)k}(u, ku) \leq a_{(k-1)k}( \frac{k}{k-1} u, ku ).

In the {k=2} case, this gives

\displaystyle a_{(k-1)k}(u,ku) \leq a_2( 2u, 2u )

and the claim (14) follows from (15) in this case. Now suppose {2 < k \leq n}. Since {p} is close to {n(n+1)}, {(k-1)k} lies between {(k-2)(k-1)} and {\frac{(k-1)p}{n}}, and so we may apply (7) (with {k} replaced by {k-1} and {u} replaced by {\frac{k}{k-1} u}) to conclude that

\displaystyle a_{(k-1)k}( \frac{k}{k-1} u, ku ) \leq (1 - 2 W \frac{k}{k-1} u (n-k+2)

\displaystyle + (1-\varepsilon) W \frac{k}{k-1} u(k-1) (k-2) (n+1) (\frac{1}{(k-2)(k-1)} - \frac{1}{(k-1)k} ) ) \eta

and hence (after simplifying)

\displaystyle a_{(k-1)k}(u, ku) \leq (1 - 2W u (n-k+1) (1 + \frac{\varepsilon (n+1)}{(n-k+1)(k-1)} )) \eta,

which gives (14) for {\delta} small enough (depending on {\varepsilon,k,n}, but not on {W}).
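The simplification in the last step rests on the identity {k(n-k+2) - (n+1) = (k-1)(n-k+1)}. The sketch below (hypothetical function names; exact rational arithmetic) compares the correction term {X} in the bound {(1 + X)\eta} produced by (7), with {k} replaced by {k-1}, {u} by {\frac{k}{k-1}u} and {t = (k-1)k}, against the simplified expression, over a range of parameters.

```python
from fractions import Fraction
from itertools import product


def from_rule_7(n, k, w, u, eps):
    """Correction term of (7) at level k-1, evaluated at t = (k-1)k and u' = k u / (k-1)."""
    c = 1 - eps
    u1 = Fraction(k, k - 1) * u
    return (-2 * w * u1 * (n - k + 2)
            + c * w * u1 * (k - 1) * (k - 2) * (n + 1)
            * (Fraction(1, (k - 2) * (k - 1)) - Fraction(1, (k - 1) * k)))


def simplified(n, k, w, u, eps):
    """Correction term in the bound obtained after simplifying."""
    return -2 * w * u * (n - k + 1) * (1 + eps * (n + 1) / ((n - k + 1) * (k - 1)))


values = [Fraction(1, 7), Fraction(3, 2), Fraction(5)]
for n in range(3, 9):
    for k in range(3, n + 1):
        for w, u, eps in product(values, repeat=3):
            assert from_rule_7(n, k, w, u, eps) == simplified(n, k, w, u, eps)
print("identity verified")
```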

— 2. Rescaled decoupling —

The claims (i), (ii), (iii) of Proposition 7 are routine applications of the Hölder and Minkowski inequalities (and also the Bernstein inequality, in the case of (iii)); we will focus on the more interesting claims (iv), (v), (vi).

Here we establish (iv). The main geometric point exploited here is that any segment of the curve {\gamma([0,1])} is affinely equivalent to {\gamma([0,1])} itself, with the key factor of {1-q} in the bound {a_t(q,s) \leq (1-q) \eta} coming from this affine rescaling.

Using the definition (5) of {a_t(q,s)}, we see that we need to show that

\displaystyle \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{p,N^n} f_{I}|^2)^{1/2}\|_{L^p(w_B)}

\displaystyle \ll N^{(1-q)\eta+o(1)} \hbox{geom} (\sum_{i \in {\mathcal I}_j} \|f_i\|_{L^p(w_B)}^2)^{1/2}

for balls {B} of radius {N^n}. By Hölder’s inequality, it suffices to show that

\displaystyle \| (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{p,N^n} f_{I}|^2)^{1/2}\|_{L^p(w_B)} \ll N^{(1-q)\eta+o(1)} (\sum_{i \in {\mathcal I}_j} \|f_i\|_{L^p(w_B)}^2)^{1/2}

for each {j}. By Minkowski’s inequality (and the fact that {p>2}), the left-hand side is at most

\displaystyle ( \sum_{I \in {\mathcal I}_{j,q}} \| \hbox{Avg}_{p,N^n} f_{I}\|_{L^p(w_B)}^2)^{1/2}

so it suffices to show that

\displaystyle \| \hbox{Avg}_{p,N^n} f_{I} \|_{L^p(w_B)} \ll N^{(1-q)\eta+o(1)} (\sum_{i: I_i \subset I} \|f_i\|_{L^p(w_B)}^2)^{1/2}

for each {I \in {\mathcal I}_{j,q}}. From Fubini’s theorem one has

\displaystyle \| \hbox{Avg}_{p,N^n} f_{I} \|_{L^p(w_B)} \ll_{n,p} \| f_{I} \|_{L^p(w_B)}

so we reduce to showing that

\displaystyle \| f_{I} \|_{L^p(w_B)} \ll N^{(1-q)\eta+o(1)} (\sum_{i: I_i \subset I} \|f_i\|_{L^p(w_B)}^2)^{1/2}.

But this follows by applying an affine rescaling to map {\gamma(I)} to {\gamma([0,1])}, and then using the hypothesis {P(p,\eta)} with {N} replaced by {N^{1-q}}. (The ball {B} gets distorted into an ellipsoid, but one can check that this ellipsoid can be covered efficiently by finitely overlapping balls of radius {N^{n(1-q)}}, and so one can close the argument using the triangle inequality.)

— 3. Lower dimensional decoupling —

Now we establish (v). Here, the geometric point is the one implicitly used in Proposition 3, namely that the {n}-dimensional curve {\gamma([0,1])} projects down to the {k}-dimensional curve {\gamma_k([0,1])} for any {1 \leq k < n}.

Let {k,q,s} be as in Proposition 7(v). From (5), it suffices to show that

\displaystyle \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{k(k+1),N^s} f_{I}|^2)^{1/2}\|_{L^p(w_B)}

\displaystyle \ll N^{o(1)} \| \hbox{geom} (\sum_{J \in {\mathcal I}_{j,s/k}} |\hbox{Avg}_{k(k+1),N^s} f_{J}|^2)^{1/2}\|_{L^p(w_B)}

for balls {B} of radius {N^n}. It will suffice to show the pointwise estimate

\displaystyle \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{k(k+1),N^s} f_{I}(x_0)|^2)^{1/2}

\displaystyle \ll N^{o(1)} \hbox{geom} (\sum_{J \in {\mathcal I}_{j,s/k}} |\hbox{Avg}_{k(k+1),N^s} f_{J}(x_0)|^2)^{1/2}

for any {x_0 \in {\bf R}^n}, or equivalently that

\displaystyle \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} \| f_I \|_{L^{k(k+1)}(w_{B'})}^2)^{1/2}

\displaystyle \ll N^{o(1)} \hbox{geom} (\sum_{J \in {\mathcal I}_{j,s/k}} \| f_J \|_{L^{k(k+1)}(w_{B'})}^2)^{1/2}

where {B' := B(x_0,N^s)}. Clearly this will follow if we have

\displaystyle (\sum_{I \in {\mathcal I}_{j,q}} \| f_I \|_{L^{k(k+1)}(w_{B'})}^2)^{1/2} \ll N^{o(1)} (\sum_{J \in {\mathcal I}_{j,s/k}} \| f_J \|_{L^{k(k+1)}(w_{B'})}^2)^{1/2}

for each {j}. Covering the intervals in {{\mathcal I}_{j,s/k}} by those in {{\mathcal I}_{j,q}}, it suffices to show that

\displaystyle \| f_I \|_{L^{k(k+1)}(w_{B'})} \ll N^{o(1)} (\sum_{J: J \subset I} \| f_J \|_{L^{k(k+1)}(w_{B'})}^2)^{1/2}

for each {I \in {\mathcal I}_{j,q}}. But this follows from Proposition 3.

— 4. Multilinear Kakeya —

Finally, we establish (vi), which is the most substantial component of Proposition 7, and the only component which truly takes advantage of the reduction to the multilinear setting. Let {1 \leq k \leq n-1} and {q} be such that {(k+1)q \leq n}. From (5), it suffices to show that

\displaystyle \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{kp/n, N^{kq}} f_{I}|^2)^{1/2}\|_{L^p(w_B)}

\displaystyle \ll N^{o(1)} \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{kp/n, N^{(k+1)q}} f_{I}|^2)^{1/2}\|_{L^p(w_B)}

for balls {B} of radius {N^n}. By averaging, it suffices to establish the bound

\displaystyle \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{kp/n, N^{kq}} f_{I}|^2)^{1/2}\|_{L^p(w_{B'})}

\displaystyle \ll N^{o(1)} \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{kp/n, N^{(k+1)q}} f_{I}|^2)^{1/2}\|_{L^p(w_{B'})}

for balls {B'} of radius {N^{(k+1)q}}. If we write {F_I := (\hbox{Avg}_{kp/n, N^{kq}} f_{I})^{kp/n}}, the right-hand side simplifies to

\displaystyle N^{o(1)} |B'|^{1/p} \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} (\frac{1}{|B'|} \int F_I w_{B'})^{2n/kp})^{1/2}

so it suffices to show that

\displaystyle |B'|^{-1/p} \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} F_I^{2n/kp})^{1/2}\|_{L^p(w_{B'})}

\displaystyle \ll N^{o(1)} \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} (\frac{1}{|B'|} \int F_I w_{B'})^{2n/kp})^{1/2}.

At this point it is convenient to perform a dyadic pigeonholing (giving up a factor of {N^{o(1)}}) to normalise, for each {j}, all of the quantities {\frac{1}{|B'|} \int F_I w_{B'}} to be of comparable size, after reducing the sets {{\mathcal I}_{j,q}} to some appropriate subset {{\mathcal I}'_{j,q}}. (The contribution of those {I} for which this quantity is less than, say, {N^{-100n^2}} times the maximal value can be safely discarded by trivial estimates.) By homogeneity we may then normalise

\displaystyle \frac{1}{|B'|} \int F_I w_{B'} \sim 1

for all surviving {I}, so the estimate now becomes

\displaystyle |B'|^{-1/p} \| \hbox{geom} (\sum_{I \in {\mathcal I}'_{j,q}} F_I^{2n/kp})^{1/2}\|_{L^p(w_{B'})} \ll N^{o(1)} \hbox{geom} (\# {\mathcal I}'_{j,q})^{1/2}.

Since {p} is close to {n(n+1)}, {2n/kp} is less than {1}, so we can estimate

\displaystyle (\sum_{I \in {\mathcal I}'_{j,q}} F_I^{2n/kp})^{1/2} \leq (\# {\mathcal I}'_{j,q})^{1/2 - n/kp} (\sum_{I \in {\mathcal I}'_{j,q}} F_I)^{n/kp}

and so it suffices to show that

\displaystyle |B'|^{-1/p} \| \hbox{geom} (\sum_{I \in {\mathcal I}'_{j,q}} F_I)^{n/kp}\|_{L^p(w_{B'})} \ll N^{o(1)} \hbox{geom} (\# {\mathcal I}'_{j,q})^{n/kp},

or, on raising to the power {pk/n},

\displaystyle |B'|^{-k/n} \| \hbox{geom} \sum_{I \in {\mathcal I}'_{j,q}} F_I\|_{L^{n/k}(w_{B'})} \ll N^{o(1)} \hbox{geom} (\# {\mathcal I}'_{j,q}).
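The two elementary steps used here can be sanity-checked in a discrete setting (counting measure with nonnegative weights standing in for {w_{B'}}; this discretisation and all names below are our own assumptions): Hölder’s inequality in the form {\sum_{i=1}^m F_i^\alpha \leq m^{1-\alpha} (\sum_{i=1}^m F_i)^\alpha} for {0 < \alpha \leq 1}, used with {\alpha = 2n/kp}, and the identity {\|G^\beta\|_{L^p}^{1/\beta} = \|G\|_{L^{\beta p}}}, used with {\beta = n/kp}.

```python
import math
import random


def holder_sides(values, alpha):
    """Return (sum F_i^alpha, m^(1-alpha) (sum F_i)^alpha) for nonnegative F_i, 0 < alpha <= 1."""
    m = len(values)
    return sum(v ** alpha for v in values), m ** (1 - alpha) * sum(values) ** alpha


def weighted_lp(values, weights, p):
    """(sum_x w(x) |G(x)|^p)^(1/p), a discrete stand-in for the L^p(w_{B'}) norm."""
    return sum(w * abs(g) ** p for g, w in zip(values, weights)) ** (1 / p)


rng = random.Random(0)
n, k, p = 3, 1, 11.9  # p slightly below n(n+1) = 12
alpha, beta = 2 * n / (k * p), n / (k * p)

fs = [rng.uniform(0, 5) for _ in range(50)]
small, large = holder_sides(fs, alpha)
print(small <= large)  # True

gs = [rng.uniform(0, 5) for _ in range(50)]
ws = [rng.uniform(0, 1) for _ in range(50)]
left = weighted_lp([g ** beta for g in gs], ws, p) ** (1 / beta)
right = weighted_lp(gs, ws, beta * p)
print(math.isclose(left, right, rel_tol=1e-9))  # True
```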

Localising to balls {B'} of radius {N^{(k+1)q}}, it suffices to show that

\displaystyle |B'|^{-k/n} \| \hbox{geom} \sum_{I \in {\mathcal I}'_{j,q}} F_I\|_{L^{n/k}(B')} \ll N^{o(1)} \hbox{geom} \sum_{I \in {\mathcal I}'_{j,q}} |B'|^{-1} \| F_I \|_{L^1(B')}.

The arc {\gamma(I)} is contained in a box of dimensions roughly {N^{-q} \times N^{-2q} \times \dots \times N^{-nq}}, so by the uncertainty principle {f_I} is essentially constant along boxes of dimensions {N^q \times N^{2q} \times \dots \times N^{nq}} (this can be made precise by standard methods, see e.g. the discussion in the proof of Theorem 5.6 of Bourgain-Demeter-Guth, or my general discussion on the uncertainty principle in this previous blog post). This implies that {F_I}, when restricted to {B'}, is essentially constant on “plates”, defined as the intersection of {B'} with slabs that have {k} dimensions of length {N^{kq}} and the remaining {n-k} dimensions infinite (and thus of length about {N^{(k+1)q}} after restriction to {B'}). Furthermore, as {j} varies (and {I} is constrained to be in {{\mathcal I}'_{j,q}}), the orientation of these slabs varies in a suitably “transverse” fashion (the precise definition of this is a little technical, but can be verified for {M=n!}; see the BDG paper for details). After rescaling, the claim then follows from the following proposition:

Proposition 8 (Multilinear Kakeya) For {j=1,\dots,M}, let {{\mathcal P}_j} be a collection of “plates” that have {k} dimensions of length {1}, and {n-k} dimensions that are infinite, and for each {P \in {\mathcal P}_j} let {c_P} be a non-negative number. Assume that the families of plates {{\mathcal P}_j} obey a suitable transversality condition. Then

\displaystyle \| \hbox{geom} \sum_{P \in {\mathcal P}_j} c_P 1_P \|_{L^{n/k}(B)} \ll N^{o(1)} \hbox{geom} \sum_{P \in {\mathcal P}_j} c_P

for any ball {B} of radius {N}.

The exponent {n/k} here is natural, as can be seen by considering the example where each {{\mathcal P}_j} consists of about {N^k} parallel disjoint plates passing through {B}, with {c_P=1} for all such plates.
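To see the role of the exponent concretely, one can compute this example on a discretised model. The sketch below assumes unit lattice cells in the cube {[0,N)^n} as a stand-in for the ball {B}, takes one family of plates for each choice of the {k} thin coordinate directions (so {M = \binom{n}{k}}), and replaces {L^{n/k}} by a general {L^q}. Each family sum is identically {1} on the cube, so the ratio of the two sides is {N^{n/q-k}}, which stays bounded as {N \to \infty} only when {q \geq n/k}. The function name is hypothetical.

```python
import numpy as np
from itertools import combinations, product

def plate_example_ratio(n, k, N, q):
    """LHS/RHS of Proposition 8 with L^{n/k} replaced by L^q, for N^k parallel
    disjoint unit plates per family filling the cube [0, N)^n (unit cells)."""
    grid = np.indices((N,) * n)                  # grid[d] = d-th coordinate of each cell
    families = list(combinations(range(n), k))   # each family is thin in these k coordinates
    sums, totals = [], []
    for thin in families:
        S = np.zeros((N,) * n)
        for offsets in product(range(N), repeat=k):  # one plate per choice of offsets
            plate = np.ones((N,) * n, dtype=bool)
            for d, c in zip(thin, offsets):
                plate &= grid[d] == c
            S += plate                           # c_P = 1 for every plate
        sums.append(S)
        totals.append(N ** k)                    # sum of c_P over the family
    M = len(families)
    geom = np.prod(sums, axis=0) ** (1 / M)      # pointwise geometric mean of family sums
    lhs = np.sum(geom ** q) ** (1 / q)           # L^q norm for counting measure on cells
    rhs = np.prod(totals) ** (1 / M)
    return lhs / rhs

n, k = 3, 1
for q in (2.0, n / k, 4.0):
    print(q, [round(plate_example_ratio(n, k, N, q), 3) for N in (4, 8, 12)])
```

For {q = n/k} the ratio is {1} for every {N}, while for {q < n/k} it grows like a positive power of {N}.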

For {k = n-1} (where the plates now become tubes), this result was first obtained by Bennett, Carbery, and myself using heat kernel methods, with a rather different proof (also capturing the endpoint case) later given using algebraic topological methods by Guth (as discussed in this previous post). More recently, a very short and elementary proof of this theorem was given by Guth; it was initially stated for {k=n-1} but extends to general {k}. The scheme of the proof can be described as follows.

  • When all the plates {P} in each family {{\mathcal P}_j} are parallel, the claim follows from the Loomis-Whitney inequality (when {k=n-1}) or a more general Brascamp-Lieb inequality of Bennett, Carbery, Christ, and myself (for general {k}). These inequalities can be proven by repeated applications of the Hölder inequality and Fubini’s theorem.
  • Perturbing this, we can obtain the proposition with a loss of {N^\varepsilon} for any {N>0} and {\varepsilon>0}, provided that the plates in each {{\mathcal P}_j} are within {\delta} of being parallel, and {\delta} is sufficiently small depending on {N} and {\varepsilon}. (For the case of general {k}, this requires some uniformity in the result of Bennett, Carbery, Christ, and myself, which can be obtained by hand in the specific case of interest here, but was recently established in general by Bennett, Bez, Flock, and Lee.)
  • A standard “induction on scales” argument shows that if the proposition is true at scale {N} with some loss {K(N)}, then it is also true at scale {N^2} with loss {O( K(N)^2 )}. Iterating this, we see that we can obtain the proposition with a loss of {O_\varepsilon(N^\varepsilon)} uniformly for all {N>0}, provided that the plates are within {\delta} of being parallel and {\delta} is sufficiently small depending now only on {\varepsilon} (and not on {N}).
  • A finite partition of unity then suffices to remove the restriction that the plates in each family be within {\delta} of being parallel, at the cost of a constant depending on {\delta} (and hence only on {\varepsilon}); sending {\varepsilon} to zero, we then obtain the claim.
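Two steps of this scheme can be illustrated numerically. For {n=3} and {k=n-1=2} the plates are tubes, and when each family consists of axis-parallel tubes, Proposition 8 (with constant {1}) is exactly the discrete Loomis-Whitney inequality {\sum_{x,y,z} (f_1(y,z) f_2(x,z) f_3(x,y))^{1/2} \leq (\sum_{y,z} f_1)^{1/2} (\sum_{x,z} f_2)^{1/2} (\sum_{x,y} f_3)^{1/2}}. For the induction on scales, the sketch models the {O(K(N)^2)} bound by {K(N^2) \leq C K(N)^2} with a hypothetical constant {C}; starting from {K(N_0) = N_0^\varepsilon}, the exponent {\log K(N) / \log N} then never exceeds {\varepsilon + \log C / \log N_0}, which is at most {2\varepsilon} once {N_0} is large depending on {\varepsilon}. The discretisation into unit cells and all function names are assumptions of this sketch.

```python
import numpy as np
from math import log

def parallel_tube_check(c1, c2, c3):
    """Proposition 8 for n = 3, k = 2 when every family of unit tubes is axis-parallel.

    c1[y, z], c2[x, z], c3[x, y] are the weights c_P of the tubes parallel to the
    x-, y- and z-axes; the cube [0, N)^3 is discretised into unit cells.
    Returns (LHS, RHS) = (||geom||_{L^{3/2}}, geometric mean of the weight totals).
    """
    f1 = c1[None, :, :]  # family sum for tubes parallel to the x-axis, at cell (x, y, z)
    f2 = c2[:, None, :]  # tubes parallel to the y-axis
    f3 = c3[:, :, None]  # tubes parallel to the z-axis
    geom = (f1 * f2 * f3) ** (1 / 3)
    lhs = np.sum(geom ** 1.5) ** (1 / 1.5)
    rhs = (c1.sum() * c2.sum() * c3.sum()) ** (1 / 3)
    return lhs, rhs

def iterate_induction_on_scales(eps, C, log_N0, steps):
    """Iterate K(N^2) = C K(N)^2 from K(N0) = N0^eps, working with logarithms
    to avoid overflow; return the exponents log K(N) / log N along N0, N0^2, N0^4, ..."""
    logN, logK = log_N0, eps * log_N0
    exponents = [logK / logN]
    for _ in range(steps):
        logN, logK = 2 * logN, log(C) + 2 * logK
        exponents.append(logK / logN)
    return exponents

rng = np.random.default_rng(0)
N = 6
lhs, rhs = parallel_tube_check(*(rng.exponential(size=(N, N)) for _ in range(3)))
print(lhs <= rhs)

eps, C = 0.01, 10.0
log_N0 = log(C) / eps  # choose N0 so that log C / log N0 = eps
exps = iterate_induction_on_scales(eps, C, log_N0, steps=30)
print(max(exps) <= 2 * eps + 1e-12)
```

Because {N_0}, and hence {\delta}, depends only on {\varepsilon}, the resulting {O_\varepsilon(N^{2\varepsilon})} bound is uniform in {N}, in line with the third step above.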

The proof of the decoupling theorem (and thus of the Vinogradov main conjecture) is now complete.

Remark 9 The above arguments extend to give decoupling for the curve {\gamma([0,1])} in {L^p} for every {n^2 \leq p \leq n(n+1)}. As it turns out (Bourgain, private communication), a variant of the argument also handles the range {n(n-1) \leq p \leq n^2}, and the range {2 \leq p \leq n(n-1)} can be covered by an induction on dimension (via the argument that established Proposition 3).