You are currently browsing the monthly archive for December 2015.

Over on the polymath blog, I’ve posted (on behalf of Dinesh Thakur) a new polymath proposal, which is to explain some numerically observed identities involving the irreducible polynomials $P$ in the polynomial ring ${\bf F}_2[t]$ over the finite field of characteristic two, the simplest of which is

$\displaystyle \sum_P \frac{1}{1+P} = 0$

(expanded in terms of Taylor series in $u = 1/t$).  Comments on the problem should be placed in the polymath blog post; if there is enough interest, we can start a formal polymath project on it.

In this blog post, I would like to specialise the arguments of Bourgain, Demeter, and Guth from the previous post to the two-dimensional case of the Vinogradov main conjecture, namely

Theorem 1 (Two-dimensional Vinogradov main conjecture) One has

$\displaystyle \int_{[0,1]^2} |\sum_{j=0}^N e( j x + j^2 y)|^6\ dx dy \ll N^{3+o(1)}$

as ${N \rightarrow \infty}$.

This particular case of the main conjecture has a classical proof using some elementary number theory. Indeed, the left-hand side can be viewed as the number of solutions to the system of equations

$\displaystyle j_1 + j_2 + j_3 = k_1 + k_2 + k_3$

$\displaystyle j_1^2 + j_2^2 + j_3^2 = k_1^2 + k_2^2 + k_3^2$

with ${j_1,j_2,j_3,k_1,k_2,k_3 \in \{0,\dots,N\}}$. These two equations can combine (using the algebraic identity ${(a+b-c)^2 - (a^2+b^2-c^2) = 2 (a-c)(b-c)}$ applied to ${(a,b,c) = (j_1,j_2,k_3), (k_1,k_2,j_3)}$) to imply the further equation

$\displaystyle (j_1 - k_3) (j_2 - k_3) = (k_1 - j_3) (k_2 - j_3)$

which, when combined with the divisor bound, shows that each ${k_1,k_2,j_3}$ is associated to ${O(N^{o(1)})}$ choices of ${j_1,j_2,k_3}$ excluding diagonal cases when two of the ${j_1,j_2,j_3,k_1,k_2,k_3}$ collide, and this easily yields Theorem 1. However, the Bourgain-Demeter-Guth argument (which, in the two dimensional case, is essentially contained in a previous paper of Bourgain and Demeter) does not require the divisor bound, and extends for instance to the the more general case where ${j}$ ranges in a ${1}$-separated set of reals between ${0}$ to ${N}$.

In this special case, the Bourgain-Demeter argument simplifies, as the lower dimensional inductive hypothesis becomes a simple ${L^2}$ almost orthogonality claim, and the multilinear Kakeya estimate needed is also easy (collapsing to just Fubini’s theorem). Also one can work entirely in the context of the Vinogradov main conjecture, and not turn to the increased generality of decoupling inequalities (though this additional generality is convenient in higher dimensions). As such, I am presenting this special case as an introduction to the Bourgain-Demeter-Guth machinery.

We now give the specialisation of the Bourgain-Demeter argument to Theorem 1. It will suffice to establish the bound

$\displaystyle \int_{[0,1]^2} |\sum_{j=0}^N e( j x + j^2 y)|^p\ dx dy \ll N^{p/2+o(1)}$

for all ${4, (where we keep ${p}$ fixed and send ${N}$ to infinity), as the ${L^6}$ bound then follows by combining the above bound with the trivial bound ${|\sum_{j=0}^N e( j x + j^2 x^2)| \ll N}$. Accordingly, for any ${\eta > 0}$ and ${4, we let ${P(p,\eta)}$ denote the claim that

$\displaystyle \int_{[0,1]^2} |\sum_{j=0}^N e( j x + j^2 y)|^p\ dx dy \ll N^{p/2+\eta+o(1)}$

as ${N \rightarrow \infty}$. Clearly, for any fixed ${p}$, ${P(p,\eta)}$ holds for some large ${\eta}$, and it will suffice to establish

Proposition 2 Let ${4, and let ${\eta>0}$ be such that ${P(p,\eta)}$ holds. Then there exists ${0 < \eta' < \eta}$ (depending continuously on $\eta$) such that ${P(p,\eta')}$ holds.

Indeed, this proposition shows that for ${4, the infimum of the ${\eta}$ for which ${P(p,\eta)}$ holds is zero.

We prove the proposition below the fold, using a simplified form of the methods discussed in the previous blog post. To simplify the exposition we will be a bit cavalier with the uncertainty principle, for instance by essentially ignoring the tails of rapidly decreasing functions.

Given any finite collection of elements ${(f_i)_{i \in I}}$ in some Banach space ${X}$, the triangle inequality tells us that

$\displaystyle \| \sum_{i \in I} f_i \|_X \leq \sum_{i \in I} \|f_i\|_X.$

However, when the ${f_i}$ all “oscillate in different ways”, one expects to improve substantially upon the triangle inequality. For instance, if ${X}$ is a Hilbert space and the ${f_i}$ are mutually orthogonal, we have the Pythagorean theorem

$\displaystyle \| \sum_{i \in I} f_i \|_X = (\sum_{i \in I} \|f_i\|_X^2)^{1/2}.$

For sake of comparison, from the triangle inequality and Cauchy-Schwarz one has the general inequality

$\displaystyle \| \sum_{i \in I} f_i \|_X \leq (\# I)^{1/2} (\sum_{i \in I} \|f_i\|_X^2)^{1/2} \ \ \ \ \ (1)$

for any finite collection ${(f_i)_{i \in I}}$ in any Banach space ${X}$, where ${\# I}$ denotes the cardinality of ${I}$. Thus orthogonality in a Hilbert space yields “square root cancellation”, saving a factor of ${(\# I)^{1/2}}$ or so over the trivial bound coming from the triangle inequality.

More generally, let us somewhat informally say that a collection ${(f_i)_{i \in I}}$ exhibits decoupling in ${X}$ if one has the Pythagorean-like inequality

$\displaystyle \| \sum_{i \in I} f_i \|_X \ll_\varepsilon (\# I)^\varepsilon (\sum_{i \in I} \|f_i\|_X^2)^{1/2}$

for any ${\varepsilon>0}$, thus one obtains almost the full square root cancellation in the ${X}$ norm. The theory of almost orthogonality can then be viewed as the theory of decoupling in Hilbert spaces such as ${L^2({\bf R}^n)}$. In ${L^p}$ spaces for ${p < 2}$ one usually does not expect this sort of decoupling; for instance, if the ${f_i}$ are disjointly supported one has

$\displaystyle \| \sum_{i \in I} f_i \|_{L^p} = (\sum_{i \in I} \|f_i\|_{L^p}^p)^{1/p}$

and the right-hand side can be much larger than ${(\sum_{i \in I} \|f_i\|_{L^p}^2)^{1/2}}$ when ${p < 2}$. At the opposite extreme, one usually does not expect to get decoupling in ${L^\infty}$, since one could conceivably align the ${f_i}$ to all attain a maximum magnitude at the same location with the same phase, at which point the triangle inequality in ${L^\infty}$ becomes sharp.

However, in some cases one can get decoupling for certain ${2 < p < \infty}$. For instance, suppose we are in ${L^4}$, and that ${f_1,\dots,f_N}$ are bi-orthogonal in the sense that the products ${f_i f_j}$ for ${1 \leq i < j \leq N}$ are pairwise orthogonal in ${L^2}$. Then we have

$\displaystyle \| \sum_{i = 1}^N f_i \|_{L^4}^2 = \| (\sum_{i=1}^N f_i)^2 \|_{L^2}$

$\displaystyle = \| \sum_{1 \leq i,j \leq N} f_i f_j \|_{L^2}$

$\displaystyle \ll (\sum_{1 \leq i,j \leq N} \|f_i f_j \|_{L^2}^2)^{1/2}$

$\displaystyle = \| (\sum_{1 \leq i,j \leq N} |f_i f_j|^2)^{1/2} \|_{L^2}$

$\displaystyle = \| \sum_{i=1}^N |f_i|^2 \|_{L^2}$

$\displaystyle \leq \sum_{i=1}^N \| |f_i|^2 \|_{L^2}$

$\displaystyle = \sum_{i=1}^N \|f_i\|_{L^4}^2$

giving decoupling in ${L^4}$. (Similarly if each of the ${f_i f_j}$ is orthogonal to all but ${O_\varepsilon( N^\varepsilon )}$ of the other ${f_{i'} f_{j'}}$.) A similar argument also gives ${L^6}$ decoupling when one has tri-orthogonality (with the ${f_i f_j f_k}$ mostly orthogonal to each other), and so forth. As a slight variant, Khintchine’s inequality also indicates that decoupling should occur for any fixed ${2 < p < \infty}$ if one multiplies each of the ${f_i}$ by an independent random sign ${\epsilon_i \in \{-1,+1\}}$.

In recent years, Bourgain and Demeter have been establishing decoupling theorems in ${L^p({\bf R}^n)}$ spaces for various key exponents of ${2 < p < \infty}$, in the “restriction theory” setting in which the ${f_i}$ are Fourier transforms of measures supported on different portions of a given surface or curve; this builds upon the earlier decoupling theorems of Wolff. In a recent paper with Guth, they established the following decoupling theorem for the curve ${\gamma({\bf R}) \subset {\bf R}^n}$ parameterised by the polynomial curve

$\displaystyle \gamma: t \mapsto (t, t^2, \dots, t^n).$

For any ball ${B = B(x_0,r)}$ in ${{\bf R}^n}$, let ${w_B: {\bf R}^n \rightarrow {\bf R}^+}$ denote the weight

$\displaystyle w_B(x) := \frac{1}{(1 + \frac{|x-x_0|}{r})^{100n}},$

which should be viewed as a smoothed out version of the indicator function ${1_B}$ of ${B}$. In particular, the space ${L^p(w_B) = L^p({\bf R}^n, w_B(x)\ dx)}$ can be viewed as a smoothed out version of the space ${L^p(B)}$. For future reference we observe a fundamental self-similarity of the curve ${\gamma({\bf R})}$: any arc ${\gamma(I)}$ in this curve, with ${I}$ a compact interval, is affinely equivalent to the standard arc ${\gamma([0,1])}$.

Theorem 1 (Decoupling theorem) Let ${n \geq 1}$. Subdivide the unit interval ${[0,1]}$ into ${N}$ equal subintervals ${I_i}$ of length ${1/N}$, and for each such ${I_i}$, let ${f_i: {\bf R}^n \rightarrow {\bf R}}$ be the Fourier transform

$\displaystyle f_i(x) = \int_{\gamma(I_i)} e(x \cdot \xi)\ d\mu_i(\xi)$

of a finite Borel measure ${\mu_i}$ on the arc ${\gamma(I_i)}$, where ${e(\theta) := e^{2\pi i \theta}}$. Then the ${f_i}$ exhibit decoupling in ${L^{n(n+1)}(w_B)}$ for any ball ${B}$ of radius ${N^n}$.

Orthogonality gives the ${n=1}$ case of this theorem. The bi-orthogonality type arguments sketched earlier only give decoupling in ${L^p}$ up to the range ${2 \leq p \leq 2n}$; the point here is that we can now get a much larger value of ${n}$. The ${n=2}$ case of this theorem was previously established by Bourgain and Demeter (who obtained in fact an analogous theorem for any curved hypersurface). The exponent ${n(n+1)}$ (and the radius ${N^n}$) is best possible, as can be seen by the following basic example. If

$\displaystyle f_i(x) := \int_{I_i} e(x \cdot \gamma(\xi)) g_i(\xi)\ d\xi$

where ${g_i}$ is a bump function adapted to ${I_i}$, then standard Fourier-analytic computations show that ${f_i}$ will be comparable to ${1/N}$ on a rectangular box of dimensions ${N \times N^2 \times \dots \times N^n}$ (and thus volume ${N^{n(n+1)/2}}$) centred at the origin, and exhibit decay away from this box, with ${\|f_i\|_{L^{n(n+1)}(w_B)}}$ comparable to

$\displaystyle 1/N \times (N^{n(n+1)/2})^{1/(n(n+1))} = 1/\sqrt{N}.$

On the other hand, ${\sum_{i=1}^N f_i}$ is comparable to ${1}$ on a ball of radius comparable to ${1}$ centred at the origin, so ${\|\sum_{i=1}^N f_i\|_{L^{n(n+1)}(w_B)}}$ is ${\gg 1}$, which is just barely consistent with decoupling. This calculation shows that decoupling will fail if ${n(n+1)}$ is replaced by any larger exponent, and also if the radius of the ball ${B}$ is reduced to be significantly smaller than ${N^n}$.

This theorem has the following consequence of importance in analytic number theory:

Corollary 2 (Vinogradov main conjecture) Let ${s, n, N \geq 1}$ be integers, and let ${\varepsilon > 0}$. Then

$\displaystyle \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{2s}\ dx_1 \dots dx_n$

$\displaystyle \ll_{\varepsilon,s,n} N^{s+\varepsilon} + N^{2s - \frac{n(n+1)}{2}+\varepsilon}.$

Proof: By the Hölder inequality (and the trivial bound of ${N}$ for the exponential sum), it suffices to treat the critical case ${s = n(n+1)/2}$, that is to say to show that

$\displaystyle \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{n(n+1)}\ dx_1 \dots dx_n \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+\varepsilon}.$

We can rescale this as

$\displaystyle \int_{[0,N] \times [0,N^2] \times \dots \times [0,N^n]} |\sum_{j=1}^N e( x \cdot \gamma(j/N) )|^{n(n+1)}\ dx \ll_{\varepsilon,n} N^{n(n+1)+\varepsilon}.$

As the integrand is periodic along the lattice ${N{\bf Z} \times N^2 {\bf Z} \times \dots \times N^n {\bf Z}}$, this is equivalent to

$\displaystyle \int_{[0,N^n]^n} |\sum_{j=1}^N e( x \cdot \gamma(j/N) )|^{n(n+1)}\ dx \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+n^2+\varepsilon}.$

The left-hand side may be bounded by ${\ll \| \sum_{j=1}^N f_j \|_{L^{n(n+1)}(w_B)}^{n(n+1)}}$, where ${B := B(0,N^n)}$ and ${f_j(x) := e(x \cdot \gamma(j/N))}$. Since

$\displaystyle \| f_j \|_{L^{n(n+1)}(w_B)} \ll (N^{n^2})^{\frac{1}{n(n+1)}},$

the claim now follows from the decoupling theorem and a brief calculation. $\Box$

Using the Plancherel formula, one may equivalently (when ${s}$ is an integer) write the Vinogradov main conjecture in terms of solutions ${j_1,\dots,j_s,k_1,\dots,k_s \in \{1,\dots,N\}}$ to the system of equations

$\displaystyle j_1^i + \dots + j_s^i = k_1^i + \dots + k_s^i \forall i=1,\dots,n,$

but we will not use this formulation here.

A history of the Vinogradov main conjecture may be found in this survey of Wooley; prior to the Bourgain-Demeter-Guth theorem, the conjecture was solved completely for ${n \leq 3}$, or for ${n > 3}$ and ${s}$ either below ${n(n+1)/2 - n/3 + O(n^{2/3})}$ or above ${n(n-1)}$, with the bulk of recent progress coming from the efficient congruencing technique of Wooley. It has numerous applications to exponential sums, Waring’s problem, and the zeta function; to give just one application, the main conjecture implies the predicted asymptotic for the number of ways to express a large number as the sum of ${23}$ fifth powers (the previous best result required ${28}$ fifth powers). The Bourgain-Demeter-Guth approach to the Vinogradov main conjecture, based on decoupling, is ostensibly very different from the efficient congruencing technique, which relies heavily on the arithmetic structure of the program, but it appears (as I have been told from second-hand sources) that the two methods are actually closely related, with the former being a sort of “Archimedean” version of the latter (with the intervals ${I_i}$ in the decoupling theorem being analogous to congruence classes in the efficient congruencing method); hopefully there will be some future work making this connection more precise. One advantage of the decoupling approach is that it generalises to non-arithmetic settings in which the set ${\{1,\dots,N\}}$ that ${j}$ is drawn from is replaced by some other similarly separated set of real numbers. (A random thought – could this allow the Vinogradov-Korobov bounds on the zeta function to extend to Beurling zeta functions?)

Below the fold we sketch the Bourgain-Demeter-Guth argument proving Theorem 1.

I thank Jean Bourgain and Andrew Granville for helpful discussions.

Let ${\lambda}$ denote the Liouville function. The prime number theorem is equivalent to the estimate

$\displaystyle \sum_{n \leq x} \lambda(n) = o(x)$

as ${x \rightarrow \infty}$, that is to say that ${\lambda}$ exhibits cancellation on large intervals such as ${[1,x]}$. This result can be improved to give cancellation on shorter intervals. For instance, using the known zero density estimates for the Riemann zeta function, one can establish that

$\displaystyle \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n)|\ dx = o( HX ) \ \ \ \ \ (1)$

as ${X \rightarrow \infty}$ if ${X^{1/6+\varepsilon} \leq H \leq X}$ for some fixed ${\varepsilon>0}$; I believe this result is due to Ramachandra (see also Exercise 21 of this previous blog post), and in fact one could obtain a better error term on the right-hand side that for instance gained an arbitrary power of ${\log X}$. On the Riemann hypothesis (or the weaker density hypothesis), it was known that the ${X^{1/6+\varepsilon}}$ could be lowered to ${X^\varepsilon}$.

Early this year, there was a major breakthrough by Matomaki and Radziwill, who (among other things) showed that the asymptotic (1) was in fact valid for any ${H = H(X)}$ with ${H \leq X}$ that went to infinity as ${X \rightarrow \infty}$, thus yielding cancellation on extremely short intervals. This has many further applications; for instance, this estimate, or more precisely its extension to other “non-pretentious” bounded multiplicative functions, was a key ingredient in my recent solution of the Erdös discrepancy problem, as well as in obtaining logarithmically averaged cases of Chowla’s conjecture, such as

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x). \ \ \ \ \ (2)$

It is of interest to twist the above estimates by phases such as the linear phase ${n \mapsto e(\alpha n) := e^{2\pi i \alpha n}}$. In 1937, Davenport showed that

$\displaystyle \sup_\alpha |\sum_{n \leq x} \lambda(n) e(\alpha n)| \ll_A x \log^{-A} x$

which of course improves the prime number theorem. Recently with Matomaki and Radziwill, we obtained a common generalisation of this estimate with (1), showing that

$\displaystyle \sup_\alpha \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(HX) \ \ \ \ \ (3)$

as ${X \rightarrow \infty}$, for any ${H = H(X) \leq X}$ that went to infinity as ${X \rightarrow \infty}$. We were able to use this estimate to obtain an averaged form of Chowla’s conjecture.

In that paper, we asked whether one could improve this estimate further by moving the supremum inside the integral, that is to say to establish the bound

$\displaystyle \int_X^{2X} \sup_\alpha |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(HX) \ \ \ \ \ (4)$

as ${X \rightarrow \infty}$, for any ${H = H(X) \leq X}$ that went to infinity as ${X \rightarrow \infty}$. This bound is asserting that ${\lambda}$ is locally Fourier-uniform on most short intervals; it can be written equivalently in terms of the “local Gowers ${U^2}$ norm” as

$\displaystyle \int_X^{2X} \sum_{1 \leq a \leq H} |\sum_{x \leq n \leq x+H} \lambda(n) \lambda(n+a)|^2\ dx = o( H^3 X )$

from which one can see that this is another averaged form of Chowla’s conjecture (stronger than the one I was able to prove with Matomaki and Radziwill, but a consequence of the unaveraged Chowla conjecture). If one inserted such a bound into the machinery I used to solve the Erdös discrepancy problem, it should lead to further averaged cases of Chowla’s conjecture, such as

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log x), \ \ \ \ \ (5)$

though I have not fully checked the details of this implication. It should also have a number of new implications for sign patterns of the Liouville function, though we have not explored these in detail yet.

One can write (4) equivalently in the form

$\displaystyle \int_X^{2X} \sum_{x \leq n \leq x+H} \lambda(n) e( \alpha(x) n + \beta(x) )\ dx = o(HX) \ \ \ \ \ (6)$

uniformly for all ${x}$-dependent phases ${\alpha(x), \beta(x)}$. In contrast, (3) is equivalent to the subcase of (6) when the linear phase coefficient ${\alpha(x)}$ is independent of ${x}$. This dependency of ${\alpha(x)}$ on ${x}$ seems to necessitate some highly nontrivial additive combinatorial analysis of the function ${x \mapsto \alpha(x)}$ in order to establish (4) when ${H}$ is small. To date, this analysis has proven to be elusive, but I would like to record what one can do with more classical methods like Vaughan’s identity, namely:

Proposition 1 The estimate (4) (or equivalently (6)) holds in the range ${X^{2/3+\varepsilon} \leq H \leq X}$ for any fixed ${\varepsilon>0}$. (In fact one can improve the right-hand side by an arbitrary power of ${\log X}$ in this case.)

The values of ${H}$ in this range are far too large to yield implications such as new cases of the Chowla conjecture, but it appears that the ${2/3}$ exponent is the limit of “classical” methods (at least as far as I was able to apply them), in the sense that one does not do any combinatorial analysis on the function ${x \mapsto \alpha(x)}$, nor does one use modern equidistribution results on “Type III sums” that require deep estimates on Kloosterman-type sums. The latter may shave a little bit off of the ${2/3}$ exponent, but I don’t see how one would ever hope to go below ${1/2}$ without doing some non-trivial combinatorics on the function ${x \mapsto \alpha(x)}$. UPDATE: I have come across this paper of Zhan which uses mean-value theorems for L-functions to lower the ${2/3}$ exponent to ${5/8}$.

Let me now sketch the proof of the proposition, omitting many of the technical details. We first remark that known estimates on sums of the Liouville function (or similar functions such as the von Mangoldt function) in short arithmetic progressions, based on zero-density estimates for Dirichlet ${L}$-functions, can handle the “major arc” case of (4) (or (6)) where ${\alpha}$ is restricted to be of the form ${\alpha = \frac{a}{q} + O( X^{-1/6-\varepsilon} )}$ for ${q = O(\log^{O(1)} X)}$ (the exponent here being of the same numerology as the ${X^{1/6+\varepsilon}}$ exponent in the classical result of Ramachandra, tied to the best zero density estimates currently available); for instance a modification of the arguments in this recent paper of Koukoulopoulos would suffice. Thus we can restrict attention to “minor arc” values of ${\alpha}$ (or ${\alpha(x)}$, using the interpretation of (6)).

Next, one breaks up ${\lambda}$ (or the closely related Möbius function) into Dirichlet convolutions using one of the standard identities (e.g. Vaughan’s identity or Heath-Brown’s identity), as discussed for instance in this previous post (which is focused more on the von Mangoldt function, but analogous identities exist for the Liouville and Möbius functions). The exact choice of identity is not terribly important, but the upshot is that ${\lambda(n)}$ can be decomposed into ${\log^{O(1)} X}$ terms, each of which is either of the “Type I” form

$\displaystyle \sum_{d \sim D; m \sim M: dm=n} a_d$

for some coefficients ${a_d}$ that are roughly of logarithmic size on the average, and scales ${D, M}$ with ${D \ll X^{2/3}}$ and ${DM \sim X}$, or else of the “Type II” form

$\displaystyle \sum_{d \sim D; m \sim M: dm=n} a_d b_m$

for some coefficients ${a_d, b_m}$ that are roughly of logarithmic size on the average, and scales ${D,M}$ with ${X^{1/3} \ll D,M \ll X^{2/3}}$ and ${DM \sim X}$. As discussed in the previous post, the ${2/3}$ exponent is a natural barrier in these identities if one is unwilling to also consider “Type III” type terms which are roughly of the shape of the third divisor function ${\tau_3(n) := \sum_{d_1d_2d_3=1} 1}$.

A Type I sum makes a contribution to ${ \sum_{x \leq n \leq x+H} \lambda(n) e( \alpha(x) n + \beta(x) )}$ that can be bounded (via Cauchy-Schwarz) in terms of an expression such as

$\displaystyle \sum_{d \sim D} | \sum_{x/d \leq m \leq x/d+H/d} e(\alpha(x) dm )|^2.$

The inner sum exhibits a lot of cancellation unless ${\alpha(x) d}$ is within ${O(D/H)}$ of an integer. (Here, “a lot” should be loosely interpreted as “gaining many powers of ${\log X}$ over the trivial bound”.) Since ${H}$ is significantly larger than ${D}$, standard Vinogradov-type manipulations (see e.g. Lemma 13 of these previous notes) show that this bad case occurs for many ${d}$ only when ${\alpha}$ is “major arc”, which is the case we have specifically excluded. This lets us dispose of the Type I contributions.

A Type II sum makes a contribution to ${ \sum_{x \leq n \leq x+H} \lambda(n) e( \alpha(x) n + \beta(x) )}$ roughly of the form

$\displaystyle \sum_{d \sim D} | \sum_{x/d \leq m \leq x/d+H/d} b_m e(\alpha(x) dm)|.$

We can break this up into a number of sums roughly of the form

$\displaystyle \sum_{d = d_0 + O( H / M )} | \sum_{x/d_0 \leq m \leq x/d_0 + H/D} b_m e(\alpha(x) dm)|$

for ${d_0 \sim D}$; note that the ${d}$ range is non-trivial because ${H}$ is much larger than ${M}$. Applying the usual bilinear sum Cauchy-Schwarz methods (e.g. Theorem 14 of these notes) we conclude that there is a lot of cancellation unless one has ${\alpha(x) = a/q + O( \frac{X \log^{O(1)} X}{H^2} )}$ for some ${q = O(\log^{O(1)} X)}$. But with ${H \geq X^{2/3+\varepsilon}}$, ${X \log^{O(1)} X/H^2}$ is well below the threshold ${X^{-1/6-\varepsilon}}$ for the definition of major arc, so we can exclude this case and obtain the required cancellation.