In this, the final lecture notes of this course, we discuss one of the motivating applications of the theory developed thus far, namely to count solutions to linear equations in primes (or in dense subsets of primes ). Unfortunately, the most famous linear equations in primes: the twin prime equation and the even Goldbach equation – remain out of reach of this technology (because the relevant affine linear forms involved are commensurate, and thus have infinite complexity with respect to the Gowers norms), but most other systems of equations, in particular that of arithmetic progressions for (or equivalently, for ) , as well as the odd Goldbach equation , are tractable.
To illustrate the main ideas, we will focus on the following result of Green:
Theorem 1 (Roth’s theorem in the primes) Let be a subset of primes whose upper density is positive. Then contains infinitely many arithmetic progressions of length three.
This should be compared with Roth’s theorem in the integers (Notes 2), which is the same statement but with the primes replaced by the integers (or natural numbers ). Indeed, Roth’s theorem for the primes is proven by transferring Roth’s theorem for the integers to the prime setting; the latter theorem is used as a “black box”. The key difficulty here in performing this transference is that the primes have zero density inside the integers; indeed, from the prime number theorem we have .
There are a number of generalisations of this transference technique. In a paper of Green and myself, we extended the above theorem to progressions of longer length (thus transferring Szemerédi’s theorem to the primes). In a series of papers (culminating in a paper to appear shortly) of Green, myself, and also Ziegler, related methods are also used to obtain an asymptotic for the number of solutions in the primes to any system of linear equations of bounded complexity. This latter result uses the full power of higher order Fourier analysis, in particular relying heavily on the inverse conjecture for the Gowers norms; in contrast, Roth’s theorem and Szemerédi’s theorem in the primes are “softer” results that do not need this conjecture.
To transfer results from the integers to the primes, there are three basic steps:
- A general transference principle, that transfers certain types of additive combinatorial results from dense subsets of the integers to dense subsets of a suitably “pseudorandom set” of integers (or more precisely, to the integers weighted by a suitably “pseudorandom measure”);
- An application of sieve theory to show that the primes (or more precisely, an affine modification of the primes) lie inside a suitably pseudorandom set of integers (or more precisely, have significant mass with respect to a suitably pseudorandom measure).
- If one is seeking asymptotics for patterns in the primes, and not simply lower bounds, one also needs to control correlations between the primes (or proxies for the primes, such as the Möbius function) with various objects that arise from higher order Fourier analysis, such as nilsequences.
The former step can be accomplished in a number of ways. For progressions of length three (and more generally, for controlling linear patterns of complexity at most one), transference can be accomplished by Fourier-analytic methods. For more complicated patterns, one can use techniques inspired by ergodic theory; more recently, simplified and more efficient methods based on duality (the Hahn-Banach theorem) have also been used. No number theory is used in this step. (In the case of transference to genuinely random sets, rather than pseudorandom sets, similar ideas appeared earlier in the graph theory setting, see this paper of Kohayakawa, Luczak, and Rodl.
The second step is accomplished by fairly standard sieve theory methods (e.g. the Selberg sieve, or the slight variants of this sieve used by Goldston and Yildirim). Remarkably, very little of the formidable apparatus of modern analytic number theory is needed for this step; for instance, the only fact about the Riemann zeta function that is truly needed is that it has a simple pole at , and no knowledge of L-functions is needed.
The third step does draw more significantly on analytic number theory techniques and results (most notably, the method of Vinogradov to compute oscillatory sums over the primes, and also the Siegel-Walfisz theorem that gives a good error term on the prime number theorem in arithemtic progressions). As these techniques are somewhat orthogonal to the main topic of this course, we shall only touch briefly on this aspect of the transference strategy.
— 1. Transference —
The transference principle is not a single theorem, but is instead a family of related results with a common purpose, namely to show that a sufficiently pseudorandom set, measure, or probability distribution will be “indistinguishable” from the whole set (or the uniform measure or probability distribution) in certain statistical senses. A key tool in this regard is a dense model theorem that allows one to approximate or model any set or function that is dense with respect to a pseudorandom measure, by a set or function which is dense with respect to the uniform measure. It turns out that one can do this as long as the approximation is made with respect to a sufficiently weak topology; for the applications to counting arithmetic patterns, it turns out that the topology given by the Gowers norms is the right one to use. The somewhat complicated nature of these norms, though, does make the verification of the required pseudorandomness properties to be slightly tricky.
We illustrate these themes with Roth’s theorem, though the general strategy applies to several other results in additive combinatorics. We begin with Roth’s theorem in a cyclic group , which we phrase as follows:
Theorem 2 (Roth’s theorem in ) Let be odd. If is a function obeying the pointwise bound and the lower bound , then one has for some , where .
We assume this theorem as a “black box”, in that we will not care as to how this theorem is proven. As noted in previous notes, this theorem easily implies the existence of non-trivial arithmetic progressions of length three in any subset of (say) with , as long as is sufficiently large depending on , as it provides a non-trivial lower bound on .
Now we generalise the above theorem. We view as an (odd) parameter going off to infinity, and use to denote any quantity that goes to zero as . We define a measure (or more precisely, a weight function) to be a non-negative function depending on , such that , thus is basically the density function of a probability distribution on . We say that is Roth-pseudorandom if for every (independent of ) there exists such that one has the lower bound
whenever is a function obeying the pointwise bound and the lower bound , and goes to zero as for any fixed . Thus, Roth’s theorem asserts that the uniform measure is Roth-pseudorandom. Observe that if is Roth-pseudorandom, then any subset of whose weighted density is at least will contain a non-trivial arithmetic progression of length three, if is sufficiently large depending on , as we once again obtain a non-trivial lower bound on in this case. Thus it is of interest to establish Roth-pseudorandomness for a wide class of measures.
Exercise 1 Show that if is Roth-pseudorandom, and is another measure which is “uniformly absolutely continuous” with respect to in the sense that one has the bound all and some function with as , then is also Roth-pseudorandom.
In view of the above exercise, the case of measures that are absolutely continuous with respect to the uniform distribution is uninteresting: the important case is instead when is “singular” with respect to the uniform measure, in the sense that it is concentrated on a set of density with respect to uniform measure, as this will allow us to detect progressions of length three in sparse sets.
A model example to keep in mind of a candidate for a Roth-pseudorandom measure is a random sparse measure of some small density , in which each is an independent random variable that equals with probability and otherwise. The case can be thought of as a crude model for the primes (cf. Cramér’s random model for the primes).
Recall that the form is controlled by the norm in the sense that one has the inequality
whenever are bounded in magnitude by , and similarly for permutations. Actually one has the slightly more precise inequality
Hölder’s inequality, and the Plancherel identity.
This suggests a strategy to establish the Roth-pseudorandomness of a measure by showing that functions dominated that measure can be approximated in norm by functions dominated instead by the uniform measure . Indeed, we have
- (Control by ) For any with the pointwise bound , one has , where is a function with as , and similarly for permutations.
- (Approximation in ) For any with the pointwise bound , and any , there exists with the pointwise bound such that .
Then is Roth-pseudorandom.
Proof: Let be such that and . Let be a small number to be chosen later. We then use the decomposition to split with the above stated properties. Since
we have from the triangle inequality that
and in particular
for large enough. Similarly we have (say) for large enough. From Roth’s theorem we conclude that
for large enough. On the other hand, by the first hypothesis, the other seven terms in
are for large enough. If is sufficiently small depending on , we obtain the claim.
Note that this argument in fact gives a value of that is essentially the same as . Also, we see that the norm here could be replaced by the norm, or indeed by any other quantity which is strong enough for the control hypothesis to hold, and also weak enough for the approximation property to hold.
So now we need to find some conditions on that will allow us to obtain both the control and approximation properties. We begin with the control property. One way to accomplish this is via a restriction estimate:
whenever obeys the pointwise bound , where is independent of . Then enjoys the control in property from Lemma 5.
Proof: From Plancherel’s theorem, we see that (2) already holds if we have , so by the triangle inequality it also holds (with a slightly different value of ) if .
Now suppose that . From (1) and Hölder’s inequality one has
and thus by (2)
and the claim follows.
Exercise 2 Show that the estimate (2) for can only hold when is bounded uniformly in ; this explains the presence of the hypothesis in the above condition.
Exercise 3 Show that the estimate (2) is equivalent to the estimate
for all , where is the dual exponent to . Informally, this asserts that a Fourier series with coefficients can be “restricted” to the support of in an uniformly absolutely integrable manner (relative to ). Historically, this is the origin of the term “restriction theorem” (in the context where is replaced with a Euclidean space such as , and is surface measure on a manifold such as the sphere ). See for instance my notes on this topic.
We will briefly discuss the standard method for establishing restriction estimates, namely the Tomas-Stein argument or large sieve inequality, later in these notes.
Now we turn to the approximation property. The approximation to needs to be close in norm, i.e. the Fourier coefficients need to be uniformly close. One attempt to accomplish this is hard thresholding: one simply discards all Fourier coefficients in the Fourier expansion
of that are too small, thus setting equal to something like
The main problem with this choice is that there is no guarantee that the non-negativity of will transfer over to the non-negativity of ; also, there is no particular reason why would be bounded.
But a small modification of this idea does work, as follows. Let denote the large Fourier coefficients of . The function proposed above can be viewed as a convolution , where and . The inability to get good pointwise bounds on can be traced back to the oscillatory nature of the convolution kernel (which can be viewed as a generalised Dirichlet kernel).
But experience with Fourier analysis tells us that the behaviour of such convolutions improves if one replaces the Dirichlet-type kernels with something more like a Fejér type kernel instead. With that in mind, we try
where is the Bohr set
Clearly, if is non-negative, then is also. Now we look at upper bounds on . Clearly
so by Fourier expansion
Evaluating the term on the RHS separately, we conclude
By Plancherel’s theorem we have
From the Kronecker approximation theorem we have
(say). Finally, if we assume (2) we have . Putting this all together we obtain the pointwise bound
Finally, we see how approximates . From Fourier analysis one has
The frequencies that lie outside give a contribution of at most by the definition of , so we look now at the terms where . From the definition of and the triangle inequality we have
in such cases, while from the measure nature of we have
Putting this all together, we obtain
To summarise, we have the following result, which essentially appears in this paper of Green and myself:
This turns out to be a fairly tractable criterion for establishing the Roth-pseudorandomness of various measures, which in turn can be used to detect progressions of length three (and related patterns) on various sparse sets, such as the primes; see the next section.
The above arguments to establish Roth-pseudorandomness relied heavily on linear Fourier analysis. Now we give an alternate approach that avoids Fourier analysis entirely; it is less efficient and a bit messier, but will extend in a fairly straightforward (but notationally intensive) manner to higher order patterns. To do this, we replace the norm in Lemma 5 with the norm, so we now have to verify a control by hypothesis and an approximation by hypothesis.
We begin with the control by hypothesis. Instead of Fourier analysis, we will rely solely on the Cauchy-Schwarz inequality, using a weighted version of the arguments from Notes 3 that first appeared in this paper of Green and myself. We wish to control the expression
where are bounded in magnitude by . For simplicity we will just assume that are bounded in magnitude by ; the more general case is similar but a little bit messier. For brevity we will also omit the domain in the averages, and also abbreviate as . We make the change of variables to write this expression as
the point being that each term involves only two of the three variables .
We can pointwise bound by and estimate the above expression in magnitude by
Since , we can use Cauchy-Schwarz and bound this by
which we rewrite as
We now bound by , to obtain
(which is a variant of (3), as can be seen by expanding out using Fourier analysis), followed by Cauchy-Schwarz, we can bound this by
We expand this out as
If the factor could be replaced by , then the expression inside the absolute values would just be , which is what we wanted. Applying the triangle inequality and bounding by , we can thus bound the previous expression by
for , then another application of Cauchy-Schwarz gives
and so we have obtained the control in hypothesis (at least for , and assuming boundedness by and assuming the conditions (4), (5)). We refer to such conditions (involving the product of evaluated at distinct linear forms on the left-hand side, and a on the right-hand side) as linear forms conditions. Generalising to the case of functions bounded by , and permuting , we can soon obtain the following result (stated somewhat informally):
Lemma 6 (Generalised von Neumann theorem) If obeys a certain finite list of linear forms conditions, then the control by hypothesis in Lemma 5 holds.
Now we turn to the approximation in property. It is possible to establish this approximation property by an energy increment method, analogous to the energy increment proof of Roth’s theorem in Notes 2; see my paper with Green for details. However, this argument turns out to be rather complicated. We give here a simpler approach based on duality (and more precisely, the Hahn-Banach theorem) that yields the same result, due independently to Gowers and to Reingold-Trevisan-Tulsiani-Vadhan. This approach also has the benefit of giving somewhat sharper quantitative refinements.
The first task is to represent the norm in a dual formulation. The starting point is that the expression
whenever , can be rewritten as
where the dual function is defined by
Define a basic anti-uniform function to be any function of the form , where obeys the pointwise bound . To obtain the approximation property, it thus suffices to show that for every , for sufficiently large depending on , and any with , one can decompose where and for all basic anti-uniform functions . Indeed, if one sets , the latter bound gives , and the desired decomposition follows.
In order to apply the Hahn-Banach theorem properly, it is convenient to symmetrise and convexify the space of basic anti-uniform functions. Define an averaged anti-uniform function to be any convex combination of basic anti-uniform functions and their negations, and denote the space of all such averaged anti-uniform functions as . Thus is a compact convex symmetric subset of the finite-dimensional real vector space that contains a neighbourhood of the origin; equivalently, it defines a norm on . Our task is then to show (for fixed and large ) that for any with , the sets
have non-empty intersection.
The point of phrasing things this way is that and are both closed convex subsets of the finite-dimensional vector space , and so the Hahn-Banach theorem is applicable. (One could also use closely related results, such as the Farkas lemma: see this earlier blog post for more discussion.) Indeed, suppose that there was some for which and were disjoint. Then, by the Hahn-Banach theorem, there must exist some linear functional
which separates the two sets, in the sense that
for all , and
for all , where is a real number.
From the form of , we see that we must have . In particular, we may normalise to be on the boundary of . As all finite-dimensional spaces are reflexive, we see in that case that can be as large as on , and independently can be as large as . We conclude that
As , we see that , and thus
We remark that the linear forms condition (which we have not specified explicitly) will give this bound with .
Since is a convex combination of functions of the form and , this implies that is bounded uniformly as well: . Applying the Weierstrass approximation theorem to the function for (and noting that the norm of is ) we conclude that there exists a polynomial (depending only on and ) such that
(say). Breaking into monomials, and using the pigeonhole principle, we conclude that there exists a non-negative integer such that
since was a convex combination of functions of the form , we thus conclude that there exist with such that
for all and all bounded in magnitude by .
We summarise this discussion as follows:
There is nothing too special about the norm here; one could work with higher Gowers norms, or indeed with any other norm for which one has a reasonably explicit description of the dual.
The abstract version of theorem was first (implicitly) proven , and made more explicit in this paper of Ziegler and myself. The methods there were different (and somewhat more complicated). To prove approximation, the basic idea was to write for some carefully chosen -algebra (built out of dual functions that correlated with things like the residual ). This automatically gave the non-negativity of ; the upper bound on came from the bound , with the latter expression then being bounded by the Weierstrass approximation theorem and (7).
To summarise, in order to establish the Roth-pseudorandomness of a measure , we have at least two options. The first (which relies on Fourier analysis, and is thus largely restricted to complexity problems) is to establish the Fourier pseudorandomness bound (3) and the restriction estimate (2). The other (which does not require Fourier analysis) is to establish a finite number of linear forms conditions, as well as the estimate (7).
Next, we informally sketch how one can deduce (7) from a finite number of linear forms conditions, as well as a crude estimate
whenever are distinct, thus the -point correlation function of is bounded for each . For the number-theoretic applications, one needs to replace the on the right-hand side by a more complicated expression, but we will defer this technicality to the exercises. We remark that for each fixed , the correlation condition would be implied by the linear forms condition, but it is important that we can make arbitrarily large.
For simplicity of notation we assume that the are bounded in magnitude by rather than by . We begin by expanding out (7) as
Shifting by for some and re-averaging, we can rewrite this as
where for and
The inner expectation is the Gowers inner product of , , , and . Using the linear forms condition we may assume that
and so it will suffice by the Cauchy-Schwarz-Gowers inequality, followed by the Hölder inequality, to show that
and similarly for and .
We just prove the claim for , as the other two cases are similar. We expand the left-hand side as
which we can upper bound by
We can factorise this as
Exercise 4 Show that (7) also follows from a finite number of linear forms conditions and (9), if the are only assumed to be bounded in magnitude by rather than , and the right-hand side of (9) is weakened to , where is a function obeying the moment bounds for each .
The above machinery was geared to getting Roth-type lower bounds on ; but it also can be used to give more precise asymptotics:
Exercise 5 Suppose that obeys the hypotheses of Lemma 5 (with the norm). Let obey the pointwise bound and has mean ; suppose also that one has the pseudorandomness bound . Show that .
Exercise 6 Repeat the previous exercise, but with the norm replaced by the norm.
Informally, the above exercises show that if one wants to obtain asymptotics for three-term progressions in a set which has positive relative density with respect to a Roth-pseudorandom measure, then it suffices to obtain a non-trivial bound on the exponential sums for non-zero frequencies .
For longer progressions, one uses higher-order Gowers norms, and a similar argument (using the inverse conjecture for the Gowers norms) shows (roughly speaking) that to obtain asymptotics for -term progressions (or more generally, linear patterns of complexity ) in a -pseudorandom measure (by which we mean that the analogue of Lemma 5 for the norm holds) then it suffices to obtain a non-trivial bound on sums of the form for -step nilsequences . See this paper of Green and myself for further discussion.
— 2. A brief discussion of sieve theory —
In order to apply the above theory to find patterns in the primes, we need to build a measure with respect to which the primes have a positive density, and for which one can verify conditions such as the Fourier pseudorandomness condition (3), the restriction estimate (2), linear forms conditions, and the correlation condition (9).
There is an initial problem with this, namely that the primes themselves are not uniformly distributed with respect to small moduli. For instance, all primes are coprime to two (with one exception). In contrast, any measure obeying the Fourier pseudorandomness condition (3) (which is implied by the condition , which would follow in turn from the linear forms condition), must be evenly distributed in both odd and even residue classes up to errors; this forces the density of the primes in to be at most . A similar argument using all the prime moduli less than some parameter shows in fact that the density of primes in is at most . Since diverges to , diverges to zero, and so we see that the primes cannot in fact have a positive density with respect to any pseudorandom measure.
This difficulty can be overcome by a simple affine change of variables known as the -trick, where we replace the primes by the modified set , where is the product of all the primes less than , and is a residue class coprime to . In practice, (and ) are slowly growing functions of , e.g. one could take . By the pigeonhole principle, for any given and there will exist a for which is large (of cardinality , where is the number of residue classes coprime to ); indeed, thanks to the prime number theorem in arithmetic progressions, any such would work (e.g. one can take ). Note that every arithmetic progression in is associated to a corresponding arithmetic progression in . Thus, for the task of locating arithmetic progressions at least, we may as well work with ; a similar claim also holds for more complicated tasks, such as counting the number of linear patterns in , though one now has to work with several residue classes at once. The point of passing from to is that the latter set no longer has any particular bias to favorr or disfavour any residue class with modulus less than ; there are still biases at higher moduli, but as long as goes to infinity with , the effect of such biases will end up being negligible (ultimately contributing terms to things like the linear forms condition).
To simplify the exposition a bit, though, let us ignore the -trick and pretend that we are working with the primes themselves rather than the affine-shifted primes. We will also ignore the technical distinctions between the interval and the cyclic group .
The most natural candidate for the measure is the von Mangoldt function , defined by setting when is a prime or a power of a prime, and otherwise. One hint as to the significance of this function is provided by the identity
for all natural numbers , which can be viewed as a generating function of the fundamental theorem of arithmetic.
The prime number theorem tells us that is indeed a measure: . And the primes have full density with respect to this function: . Furthermore, the von Mangoldt function has good Fourier pseudorandomness properties (after applying the -trick), thanks to the classical techniques of Hardy-Littlewood and Vinogradov. Indeed, to control exponential sums such as for some , one can use tools such as the Siegel-Walfisz theorem (a quantitative version of the prime number theorem in arithmetic progressions) to control such sums in the “major arc” case when is close to a rational of small height, while in the “minor arc” case when behaves irrationally, one can use the standard identity
where is the Möbius function, to re-express such a sum in terms of expressions roughly of the form
where we are intentionally vague as to what range the parameters are being summed over. The idea is then to eliminate the factor by tools such as the triangle inequality or the Cauchy-Schwarz inequality, leading to expressions such as
the point is that the inner sum does not contain any number-theoretic factors such as or , but is still oscillatory (at least if is sufficiently irrational), and so one can extract useful cancellation from here. Actually, the situation is more complicated than this, because there are regions of the range of for which this method provides insufficient cancellation, in which case one has to rearrange the sum further using more arithmetic identities such as (10) (for instance, using a truncated version of (10) known as Vaughan’s identity). We will not discuss this further here, but any advanced analytic number theory text (e.g. Iwaniec-Kowalski) will cover this material.
Unfortunately, while the Fourier-pseudorandomness of is well-understood, the linear forms and correlation conditions are essentially equivalent to (and in fact slightly harder than) the original problem of obtaining asymptotics for linear patterns in primes, and so using for the pseudorandom measure would result in a circular argument. Furthermore, correlations such as
(which essentially counts the number of twin primes up to ) are notoriously difficult to compute. For instance, if one tries to expand the above sum using (10), one ends up with expressions such as
By the Chinese remainder theorem, the two residue conditions can be combined to a single residue condition for modulo the least common multiple of and . If and are both small, e.g. , then this least common multiple is much less than , and in such a case one can compute the inner sum very precisely; as it turns out, the main term in this estimate is multiplicative in , which allows the outer sum to be estimated using the techniques of multiplicative number theory (and in particular, using the theory of the Riemann zeta function). Unfortunately, for the bulk of the above sum, and are instead comparable to , and the least common multiple is typically of size , and then it becomes extraordinarily difficult to estimate the inner sum (and hence the entire sum).
However, if we truncate the divisor sum (10) to restrict to a range such as , then the situation improves substantially. This leads to expressions such as
for some cutoff function , where is a small power of (e.g. ); the expression (11) corresponds to the case . The presence of the square is to ensure that is non-negative, and the presence of the is a normalisation factor to ensure that has mean close to . Such expressions were essentially introduced to Selberg (as part of what is now known as the Selberg sieve), although the sieve weight factors are usually modified slightly for the Selberg sieve (see this paper of Green and myself for further discussion). The correlation properties of the particular expression (11) were studied intensively by Goldston and Yildirim, and have particularly sharp estimates, although for applications discussed here, one can work instead with a smoother choice of cutoff , which makes the required correlation estimates on easier to prove (but with slightly worse bounds). Indeed, the required linear forms and correlation conditions can be verified for (12) (or more precisely, a variant of in which the -trick is applied) by a moderately lengthy, but elementary and straightforward calculation, based ultimately on the Chinese remainder theorem, an analysis of the local problem (working mod for small ), and the fundamental fact that the Riemann zeta function is approximately equal to for close to . See for instance this unpublished note of mine for more discussion. (The exact power of that one sets equal to will depend on the complexity of the linear forms and co
If one uses (11), then we see that is equal to when is any prime larger than ; if is comparable to , we thus see (from the prime number theorem) that the primes in do indeed have positive density relative to . This is then enough to be able to invoke the transference principle and extend results such as Szemerédi’s theorem to the primes, establishing in particular that the primes contain arbitrarily long arithmetic progressions; see this paper of Green and myself for details.
To use the Fourier-analytic approach, it turns out to be convenient to replace the above measures by a slight variant which looks more complicated in the spatial domain, but is easier to manipulate in the frequency domain. More specifically, the expression (11) or (12) is replaced with a variant such as
where is the Euler totient function (the number of integers from to that are coprime to ). Some standard multiplicative number theory shows that the weights are approximately equal to in some sense. With such a representation, it turns out that the Fourier coefficients of can be computed more or less explicitly, and is essentially supported on those frequencies of the form with . This makes it easy to verify the required Fourier-pseudorandomness hypothesis (3) (once one applies the -trick). As for the restriction estimate (2), the first step is to use Exercise (3) and the Cauchy-Schwarz inequality to reduce matters to showing an estimate of the shape
The right-hand side can be rearranged to be of the shape
It is then possible to use the good pointwise control on the Fourier transform of (in particular, the fact that it “decays” quite rapidly away from the major arcs) to get a good restriction estimate. See this paper of Green and myself for further discussion.
As discussed in the previous section, to get asymptotics for patterns in the primes we also need to control exponential sums such as
and more generally (for higher complexity patterns)
for various nilsequences . Again, it is convenient to use the von Mangoldt function as a proxy for the primes, thus leading to expressions such as
Actually, for technical reasons it is convenient to use identities such as (10) to replace this type of expression with expressions such as
because the Möbius function enjoys better boundedness and equidistribution properties than . (For instance, strongly favours odd numbers over even numbers, whereas the Möbius function has no preference.) It turns out that these expressions can be controlled by a generalisation of the method of Vinogradov used to compute exponential sums over primes, using the equidistribution theory of nilsequences as a substitute for the classical theory of exponential sums over integers. See this paper of Green and myself for details.