In this, the final set of lecture notes for this course, we discuss one of the motivating applications of the theory developed thus far, namely to count solutions to linear equations in primes (or in dense subsets of the primes). Unfortunately, the most famous linear equations in primes, namely the twin prime equation $p_2 - p_1 = 2$ and the even Goldbach equation $p_1 + p_2 = N$, remain out of reach of this technology (because the relevant affine linear forms involved are commensurate, and thus have infinite complexity with respect to the Gowers norms), but most other systems of equations, in particular that of arithmetic progressions $p_i = n + ir$ for $i = 0, \ldots, k-1$ (or equivalently, $p_{i-1} + p_{i+1} = 2 p_i$ for $i = 1, \ldots, k-2$), as well as the odd Goldbach equation $p_1 + p_2 + p_3 = N$, are tractable.
To illustrate the main ideas, we will focus on the following result of Green:
Theorem 1 (Roth’s theorem in the primes) Let $A$ be a subset of the primes $\mathcal{P}$ whose upper density $\limsup_{N \to \infty} |A \cap [N]| / |\mathcal{P} \cap [N]|$ is positive. Then $A$ contains infinitely many arithmetic progressions of length three.
This should be compared with Roth’s theorem in the integers (Notes 2), which is the same statement but with the primes replaced by the integers $\mathbb{Z}$ (or the natural numbers $\mathbb{N}$). Indeed, Roth’s theorem for the primes is proven by transferring Roth’s theorem for the integers to the prime setting; the latter theorem is used as a “black box”. The key difficulty here in performing this transference is that the primes have zero density inside the integers; indeed, from the prime number theorem we have $|\mathcal{P} \cap [N]| = (1 + o(1)) \frac{N}{\log N} = o(N)$.
There are a number of generalisations of this transference technique. In a paper of Green and myself, we extended the above theorem to progressions of longer length (thus transferring Szemerédi’s theorem to the primes). In a series of papers (culminating in a paper to appear shortly) of Green, myself, and also Ziegler, related methods are also used to obtain an asymptotic for the number of solutions in the primes to any system of linear equations of bounded complexity. This latter result uses the full power of higher order Fourier analysis, in particular relying heavily on the inverse conjecture for the Gowers norms; in contrast, Roth’s theorem and Szemerédi’s theorem in the primes are “softer” results that do not need this conjecture.
To transfer results from the integers to the primes, there are three basic steps:
- A general transference principle, that transfers certain types of additive combinatorial results from dense subsets of the integers to dense subsets of a suitably “pseudorandom set” of integers (or more precisely, to the integers weighted by a suitably “pseudorandom measure”);
- An application of sieve theory to show that the primes (or more precisely, an affine modification of the primes) lie inside a suitably pseudorandom set of integers (or more precisely, have significant mass with respect to a suitably pseudorandom measure).
- If one is seeking asymptotics for patterns in the primes, and not simply lower bounds, one also needs to control correlations between the primes (or proxies for the primes, such as the Möbius function) with various objects that arise from higher order Fourier analysis, such as nilsequences.
The first of these steps can be accomplished in a number of ways. For progressions of length three (and more generally, for controlling linear patterns of complexity at most one), transference can be accomplished by Fourier-analytic methods. For more complicated patterns, one can use techniques inspired by ergodic theory; more recently, simplified and more efficient methods based on duality (the Hahn-Banach theorem) have also been used. No number theory is used in this step. (In the case of transference to genuinely random sets, rather than pseudorandom sets, similar ideas appeared earlier in the graph theory setting; see this paper of Kohayakawa, Luczak, and Rodl.)
The second step is accomplished by fairly standard sieve theory methods (e.g. the Selberg sieve, or the slight variants of this sieve used by Goldston and Yildirim). Remarkably, very little of the formidable apparatus of modern analytic number theory is needed for this step; for instance, the only fact about the Riemann zeta function that is truly needed is that it has a simple pole at $s = 1$, and no knowledge of L-functions is needed.
The third step does draw more significantly on analytic number theory techniques and results (most notably, the method of Vinogradov to compute oscillatory sums over the primes, and also the Siegel-Walfisz theorem that gives a good error term on the prime number theorem in arithmetic progressions). As these techniques are somewhat orthogonal to the main topic of this course, we shall only touch briefly on this aspect of the transference strategy.
— 1. Transference —
The transference principle is not a single theorem, but is instead a family of related results with a common purpose, namely to show that a sufficiently pseudorandom set, measure, or probability distribution will be “indistinguishable” from the whole set (or the uniform measure or probability distribution) in certain statistical senses. A key tool in this regard is a dense model theorem that allows one to approximate or model any set or function that is dense with respect to a pseudorandom measure, by a set or function which is dense with respect to the uniform measure. It turns out that one can do this as long as the approximation is made with respect to a sufficiently weak topology; for the applications to counting arithmetic patterns, the topology given by the Gowers norms turns out to be the right one to use. The somewhat complicated nature of these norms, though, does make the verification of the required pseudorandomness properties slightly tricky.
We illustrate these themes with Roth’s theorem, though the general strategy applies to several other results in additive combinatorics. We begin with Roth’s theorem in a cyclic group $\mathbb{Z}/N\mathbb{Z}$, which we phrase as follows:
Theorem 2 (Roth’s theorem in $\mathbb{Z}/N\mathbb{Z}$) Let $N$ be odd. If $f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}$ is a function obeying the pointwise bound $0 \leq f \leq 1$ and the lower bound $\mathbb{E}_{n \in \mathbb{Z}/N\mathbb{Z}} f(n) \geq \delta$ for some $\delta > 0$, then one has $\Lambda(f,f,f) \geq c(\delta)$ for some $c(\delta) > 0$ depending only on $\delta$, where $\Lambda(f,g,h) := \mathbb{E}_{n,r \in \mathbb{Z}/N\mathbb{Z}} f(n)\, g(n+r)\, h(n+2r)$.
We assume this theorem as a “black box”, in that we will not care as to how this theorem is proven. As noted in previous notes, this theorem easily implies the existence of non-trivial arithmetic progressions of length three in any subset $A$ of $[N]$ (say) with $|A| \geq \delta N$, as long as $N$ is sufficiently large depending on $\delta$, as it provides a non-trivial lower bound on $\Lambda(1_A, 1_A, 1_A)$.
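To spell out the counting step (a standard deduction, sketched here for orientation): a lower bound on the trilinear form translates into many progressions, since

$$\#\{(n,r) \in (\mathbb{Z}/N\mathbb{Z})^2 : n,\ n+r,\ n+2r \in A\} = N^2\, \Lambda(1_A, 1_A, 1_A) \geq c(\delta) N^2,$$

while the degenerate progressions with $r = 0$ contribute at most $N$ pairs; as in previous notes, the surviving progressions modulo $N$ can then be converted into genuine progressions in the integers.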
Now we generalise the above theorem. We view $N$ as an (odd) parameter going off to infinity, and use $o(1)$ to denote any quantity that goes to zero as $N \to \infty$. We define a measure (or more precisely, a weight function) to be a non-negative function $\nu: \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}^+$ depending on $N$, such that $\mathbb{E}_{n \in \mathbb{Z}/N\mathbb{Z}} \nu(n) = 1 + o(1)$, thus $\nu$ is basically the density function of a probability distribution on $\mathbb{Z}/N\mathbb{Z}$. We say that $\nu$
is Roth-pseudorandom if for every
(independent of
) there exists
such that one has the lower bound
whenever is a function obeying the pointwise bound
and the lower bound
, and
goes to zero as
for any fixed
. Thus, Roth’s theorem asserts that the uniform measure
is Roth-pseudorandom. Observe that if
is Roth-pseudorandom, then any subset
of
whose weighted density
is at least
will contain a non-trivial arithmetic progression of length three, if
is sufficiently large depending on
, as we once again obtain a non-trivial lower bound on
in this case. Thus it is of interest to establish Roth-pseudorandomness for a wide class of measures.
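In symbols, and in one common formulation (a sketch of the conditions being paraphrased above, using the notation of Theorem 2), Roth-pseudorandomness of $\nu$ asks that for every $\delta > 0$ there exists $c(\delta) > 0$ such that

$$\Lambda(f,f,f) \geq c(\delta) - o_\delta(1) \qquad \text{whenever } 0 \leq f(n) \leq \nu(n) \text{ for all } n \text{ and } \mathbb{E}_{n \in \mathbb{Z}/N\mathbb{Z}} f(n) \geq \delta,$$

where $o_\delta(1)$ denotes a quantity that goes to zero as $N \to \infty$ for each fixed $\delta$.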
Exercise 1 Show that if
is Roth-pseudorandom, and
is another measure which is “uniformly absolutely continuous” with respect to
in the sense that one has the bound
all
and some function
with
as
, then
is also Roth-pseudorandom.
In view of the above exercise, the case of measures that are absolutely continuous with respect to the uniform distribution is uninteresting: the important case is instead when $\nu$ is “singular” with respect to the uniform measure, in the sense that it is concentrated on a set of density $o(1)$ with respect to the uniform measure, as this will allow us to detect progressions of length three in sparse sets.
A model example to keep in mind of a candidate for a Roth-pseudorandom measure is a random sparse measure of some small density $0 < \delta < 1$, in which each $\nu(n)$ is an independent random variable that equals $1/\delta$ with probability $\delta$ and $0$ otherwise. The case $\delta = 1/\log N$ can be thought of as a crude model for the primes (cf. Cramér’s random model for the primes).
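As a quick illustration of why such a random model behaves well, here is a short numerical sketch (not from the original notes; the parameter choices are arbitrary) checking that a random sparse measure of this type has mean close to $1$ and uniformly small Fourier coefficients at non-zero frequencies:

    import numpy as np

    # Random sparse measure: nu(n) = 1/delta with probability delta, and 0 otherwise.
    N, delta = 100003, 0.01          # N odd; delta is the sparsity parameter
    rng = np.random.default_rng(0)
    nu = (rng.random(N) < delta) / delta

    print("mean of nu:", nu.mean())                      # close to 1
    nu_hat = np.fft.fft(nu) / N                          # normalised Fourier coefficients
    print("largest non-zero-frequency coefficient:", np.abs(nu_hat[1:]).max())
    # the latter is of size roughly 1/sqrt(delta*N), i.e. o(1) in this regime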
Recall that the form is controlled by the
norm in the sense that one has the inequality
whenever are bounded in magnitude by
, and similarly for permutations. Actually one has the slightly more precise inequality
where
as can easily be seen from the identity
Hölder’s inequality, and the Plancherel identity.
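For reference, with the normalisation $\hat f(\xi) := \mathbb{E}_{n \in \mathbb{Z}/N\mathbb{Z}} f(n) e(-n\xi/N)$, where $e(x) := e^{2\pi i x}$ (one standard convention; the notes may use a slightly different one), the identity in question reads

$$\Lambda(f,g,h) = \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} \hat f(\xi)\, \hat g(-2\xi)\, \hat h(\xi),$$

which follows by expanding $f, g, h$ as Fourier series and observing that the phases average out unless the three frequencies obey the indicated linear relations.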
This suggests a strategy to establish the Roth-pseudorandomness of a measure by showing that functions dominated by that measure can be approximated in
norm by functions dominated instead by the uniform measure
. Indeed, we have
Lemma 3 (Criterion for Roth-pseudorandomness) Suppose we have a measure
with the following properties:
- (Control by
) For any
with the pointwise bound
, one has
, where
is a function with
as
, and similarly for permutations.
- (Approximation in
) For any
with the pointwise bound
, and any
, there exists
with the pointwise bound
such that
.
Then
is Roth-pseudorandom.
Proof: Let be such that
and
. Let
be a small number to be chosen later. We then use the decomposition to split
with the above stated properties. Since
we have from the triangle inequality that
and in particular
for large enough. Similarly we have
(say) for
large enough. From Roth’s theorem we conclude that
for large enough. On the other hand, by the first hypothesis, the other seven terms in
are for
large enough. If
is sufficiently small depending on
, we obtain the claim.
Note that this argument in fact gives a value of that is essentially the same as
. Also, we see that the
norm here could be replaced by the
norm, or indeed by any other quantity which is strong enough for the control hypothesis to hold, and also weak enough for the approximation property to hold.
So now we need to find some conditions on that will allow us to obtain both the control and approximation properties. We begin with the control property. One way to accomplish this is via a restriction estimate:
Lemma 4 (Restriction estimate implies control) Let
be a measure. Suppose there exists an exponent
such that one has the restriction estimate
whenever
obeys the pointwise bound
, where
is independent of
. Then
enjoys the control in
property from Lemma 3.
Proof: From Plancherel’s theorem, we see that (2) already holds if we have , so by the triangle inequality it also holds (with a slightly different value of
) if
.
Now suppose that . From (1) and Hölder’s inequality one has
and thus by (2)
and the claim follows.
Exercise 2 Show that the estimate (2) for
can only hold when
is bounded uniformly in
; this explains the presence of the hypothesis
in the above condition.
Exercise 3 Show that the estimate (2) is equivalent to the estimate
for all
, where
is the dual exponent to
. Informally, this asserts that a Fourier series with
coefficients can be “restricted” to the support of
in a uniformly absolutely integrable manner (relative to
). Historically, this is the origin of the term “restriction theorem” (in the context where
is replaced with a Euclidean space such as
, and
is surface measure on a manifold such as the sphere
). See for instance my notes on this topic.
We will briefly discuss the standard method for establishing restriction estimates, namely the Tomas-Stein argument or large sieve inequality, later in these notes.
Now we turn to the approximation property. The approximation to
needs to be close in
norm, i.e. the Fourier coefficients need to be uniformly close. One attempt to accomplish this is hard thresholding: one simply discards all Fourier coefficients in the Fourier expansion
of that are too small, thus setting
equal to something like
The main problem with this choice is that there is no guarantee that the non-negativity of will transfer over to the non-negativity of
; also, there is no particular reason why
would be bounded.
But a small modification of this idea does work, as follows. Let denote the large Fourier coefficients of
. The function
proposed above can be viewed as a convolution
, where
and
. The inability to get good pointwise bounds on
can be traced back to the oscillatory nature of the convolution kernel
(which can be viewed as a generalised Dirichlet kernel).
But experience with Fourier analysis tells us that the behaviour of such convolutions improves if one replaces the Dirichlet-type kernels with something more like a Fejér type kernel instead. With that in mind, we try
where is the Bohr set
Clearly, if is non-negative, then
is also. Now we look at upper bounds on
. Clearly
so by Fourier expansion
Let us make the Fourier-pseudorandomness assumption
Evaluating the term on the RHS separately, we conclude
By Plancherel’s theorem we have
From the Kronecker approximation theorem we have
(say). Finally, if we assume (2) we have . Putting this all together we obtain the pointwise bound
Finally, we see how approximates
. From Fourier analysis one has
and so
The frequencies that lie outside
give a contribution of at most
by the definition of
, so we look now at the terms where
. From the definition of
and the triangle inequality we have
in such cases, while from the measure nature of we have
Putting this all together, we obtain
To summarise, we have the following result, which essentially appears in this paper of Green and myself:
Theorem 5 (Criterion for Roth-pseudorandomness) Let
be a measure obeying the Fourier-pseudorandomness assumption (3) and the restriction estimate (2) for some
. Then
is Roth-pseudorandom.
This turns out to be a fairly tractable criterion for establishing the Roth-pseudorandomness of various measures, which in turn can be used to detect progressions of length three (and related patterns) on various sparse sets, such as the primes; see the next section.
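For orientation, the two hypotheses invoked in Theorem 5 have the following rough shape (a sketch of the standard formulations; normalisations vary): the Fourier-pseudorandomness condition (3) asserts that

$$\sup_{\xi \neq 0} |\hat\nu(\xi)| = o(1),$$

while the restriction estimate (2) asserts that $\big(\sum_{\xi} |\hat f(\xi)|^q\big)^{1/q} \leq C$ uniformly in $N$ whenever $0 \leq f \leq \nu$ pointwise, for some fixed exponent $q$ (in this argument one takes $2 < q < 3$).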
The above arguments to establish Roth-pseudorandomness relied heavily on linear Fourier analysis. Now we give an alternate approach that avoids Fourier analysis entirely; it is less efficient and a bit messier, but will extend in a fairly straightforward (but notationally intensive) manner to higher order patterns. To do this, we replace the norm in Lemma 3 with the
norm, so we now have to verify a control by
hypothesis and an approximation by
hypothesis.
We begin with the control by hypothesis. Instead of Fourier analysis, we will rely solely on the Cauchy-Schwarz inequality, using a weighted version of the arguments from Notes 3 that first appeared in this paper of Green and myself. We wish to control the expression
where are bounded in magnitude by
. For simplicity we will just assume that
are bounded in magnitude by
; the more general case is similar but a little bit messier. For brevity we will also omit the domain
in the averages, and also abbreviate
as
. We make the change of variables
to write this expression as
the point being that each term involves only two of the three variables .
We can pointwise bound by
and estimate the above expression in magnitude by
Since , we can use Cauchy-Schwarz and bound this by
which we rewrite as
We now bound by
, to obtain
(which is a variant of (3), as can be seen by expanding out using Fourier analysis), followed by Cauchy-Schwarz, we can bound this by
We expand this out as
where
If the factor could be replaced by
, then the expression inside the absolute values would just be
, which is what we wanted. Applying the triangle inequality and bounding
by
, we can thus bound the previous expression by
for , then another application of Cauchy-Schwarz gives
and so we have obtained the control in hypothesis (at least for
, and assuming boundedness by
and
assuming the conditions (4), (5)). We refer to such conditions (involving the product of
evaluated at distinct linear forms on the left-hand side, and a
on the right-hand side) as linear forms conditions. Generalising to the case of functions bounded by
, and permuting
, we can soon obtain the following result (stated somewhat informally):
Lemma 6 (Generalised von Neumann theorem) If
obeys a certain finite list of linear forms conditions, then the control by
hypothesis in Lemma 3 holds.
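To give the flavour of these conditions, a representative special case (not the full list required by Lemma 6) is the assertion that

$$\mathbb{E}_{n, r \in \mathbb{Z}/N\mathbb{Z}}\ \nu(n)\, \nu(n+r)\, \nu(n+2r) = 1 + o(1),$$

together with the analogous statements in which some of the factors $\nu$ are replaced by $1$, and in which the three forms $n, n+r, n+2r$ are replaced by the various systems of forms generated by the Cauchy-Schwarz steps above.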
Now we turn to the approximation in property. It is possible to establish this approximation property by an energy increment method, analogous to the energy increment proof of Roth’s theorem in Notes 2; see my paper with Green for details. However, this argument turns out to be rather complicated. We give here a simpler approach based on duality (and more precisely, the Hahn-Banach theorem) that yields the same result, due independently to Gowers and to Reingold-Trevisan-Tulsiani-Vadhan. This approach also has the benefit of giving somewhat sharper quantitative refinements.
The first task is to represent the norm in a dual formulation. The starting point is that the expression
whenever , can be rewritten as
where the dual function is defined by
Define a basic anti-uniform function to be any function of the form , where
obeys the pointwise bound
. To obtain the approximation property, it thus suffices to show that for every
, for
sufficiently large depending on
, and any
with
, one can decompose
where
and
for all basic anti-uniform functions
. Indeed, if one sets
, the latter bound gives
, and the desired decomposition follows.
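For concreteness, in the $U^2$ case the dual function has the standard explicit form (in one common normalisation)

$$\mathcal{D}f(n) := \mathbb{E}_{a, b \in \mathbb{Z}/N\mathbb{Z}} f(n+a)\, f(n+b)\, f(n+a+b),$$

so that $\|f\|_{U^2}^4 = \mathbb{E}_{n \in \mathbb{Z}/N\mathbb{Z}} f(n)\, \mathcal{D}f(n)$ for real-valued $f$.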
In order to apply the Hahn-Banach theorem properly, it is convenient to symmetrise and convexify the space of basic anti-uniform functions. Define an averaged anti-uniform function to be any convex combination of basic anti-uniform functions and their negations, and denote the space of all such averaged anti-uniform functions as . Thus
is a compact convex symmetric subset of the finite-dimensional real vector space
that contains a neighbourhood of the origin; equivalently, it defines a norm on
. Our task is then to show (for fixed
and large
) that for any
with
, the sets
and
have non-empty intersection.
The point of phrasing things this way is that and
are both closed convex subsets of the finite-dimensional vector space
, and so the Hahn-Banach theorem is applicable. (One could also use closely related results, such as the Farkas lemma: see this earlier blog post for more discussion.) Indeed, suppose that there was some
for which
and
were disjoint. Then, by the Hahn-Banach theorem, there must exist some linear functional
which separates the two sets, in the sense that
for all , and
for all , where
is a real number.
From the form of , we see that we must have
. In particular, we may normalise
to be on the boundary of
. As all finite-dimensional spaces are reflexive, we see in that case that
can be as large as
on
, and independently
can be as large as
. We conclude that
As , we see that
, and thus
We now make the hypothesis that the dual function of
is uniformly bounded:
We remark that the linear forms condition (which we have not specified explicitly) will give this bound with .
Since is a convex combination of functions of the form
and
, this implies that
is bounded uniformly as well:
. Applying the Weierstrass approximation theorem to the function
for
(and noting that the
norm of
is
) we conclude that there exists a polynomial
(depending only on
and
) such that
(say). Breaking into monomials, and using the pigeonhole principle, we conclude that there exists a non-negative integer
such that
since was a convex combination of functions of the form
, we thus conclude that there exist
with
such that
We shall contradict this by fiat, making the hypothesis that
for all and all
bounded in magnitude by
.
We summarise this discussion as follows:
Theorem 7 (Dense model theorem) If (6), (7) hold, then the approximation in
hypothesis in Lemma 3 holds.
There is nothing too special about the norm here; one could work with higher Gowers norms, or indeed with any other norm for which one has a reasonably explicit description of the dual.
The abstract version of this theorem was first (implicitly) proven in a paper of Green and myself, and made more explicit in this paper of Ziegler and myself. The methods there were different (and somewhat more complicated). To prove approximation, the basic idea was to write for some carefully chosen
-algebra
(built out of dual functions that correlated with things like the residual
). This automatically gave the non-negativity of
; the upper bound on
came from the bound
, with the latter expression then being bounded by the Weierstrass approximation theorem and (7).
To summarise, in order to establish the Roth-pseudorandomness of a measure , we have at least two options. The first (which relies on Fourier analysis, and is thus largely restricted to complexity
problems) is to establish the Fourier pseudorandomness bound (3) and the restriction estimate (2). The other (which does not require Fourier analysis) is to establish a finite number of linear forms conditions, as well as the estimate (7).
Next, we informally sketch how one can deduce (7) from a finite number of linear forms conditions, as well as a crude estimate
and a condition known as the correlation condition. At the cost of oversimplifying slightly, we express this condition as the assertion that
whenever are distinct, thus the
-point correlation function of
is bounded for each
. For the number-theoretic applications, one needs to replace the
on the right-hand side by a more complicated expression, but we will defer this technicality to the exercises. We remark that for each fixed
, the correlation condition would be implied by the linear forms condition, but it is important that we can make
arbitrarily large.
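In symbols, the simplified correlation condition described here has the shape (a sketch; as just noted, the condition actually used in the number-theoretic setting carries a weight on the right-hand side)

$$\mathbb{E}_{n \in \mathbb{Z}/N\mathbb{Z}}\ \nu(n+h_1)\, \nu(n+h_2) \cdots \nu(n+h_k) \leq C_k \qquad \text{whenever } h_1, \ldots, h_k \text{ are distinct},$$

with a constant $C_k$ that may grow with $k$ but is uniform in $N$ and in the shifts $h_1, \ldots, h_k$.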
For simplicity of notation we assume that the are bounded in magnitude by
rather than by
. We begin by expanding out (7) as
Shifting by
for some
and re-averaging, we can rewrite this as
where for
and
The inner expectation is the Gowers inner product of ,
,
, and
. Using the linear forms condition we may assume that
and so it will suffice by the Cauchy-Schwarz-Gowers inequality, followed by the Hölder inequality, to show that
and similarly for and
.
We just prove the claim for , as the other two cases are similar. We expand the left-hand side as
which we can upper bound by
We can factorise this as
Using (9), we see that the inner expectation is as long as
are distinct; in all other cases they are
, by (8). Combining these two cases we obtain the claim.
Exercise 4 Show that (7) also follows from a finite number of linear forms conditions and (9), if the
are only assumed to be bounded in magnitude by
rather than
, and the right-hand side of (9) is weakened to
, where
is a function obeying the moment bounds
for each
.
The above machinery was geared to getting Roth-type lower bounds on ; but it also can be used to give more precise asymptotics:
Exercise 5 Suppose that
obeys the hypotheses of Lemma 3 (with the
norm). Let
obey the pointwise bound
and has mean
; suppose also that one has the pseudorandomness bound
. Show that
.
Exercise 6 Repeat the previous exercise, but with the
norm replaced by the
norm.
Informally, the above exercises show that if one wants to obtain asymptotics for three-term progressions in a set which has positive relative density with respect to a Roth-pseudorandom measure, then it suffices to obtain a non-trivial bound on the exponential sums
for non-zero frequencies
.
For longer progressions, one uses higher-order Gowers norms, and a similar argument (using the inverse conjecture for the Gowers norms) shows (roughly speaking) that to obtain asymptotics for -term progressions (or more generally, linear patterns of complexity
) in a
-pseudorandom measure (by which we mean that the analogue of Lemma 3 for the
norm holds) then it suffices to obtain a non-trivial bound on sums of the form
for
-step nilsequences
. See this paper of Green and myself for further discussion.
— 2. A brief discussion of sieve theory —
In order to apply the above theory to find patterns in the primes, we need to build a measure with respect to which the primes have a positive density, and for which one can verify conditions such as the Fourier pseudorandomness condition (3), the restriction estimate (2), linear forms conditions, and the correlation condition (9).
There is an initial problem with this, namely that the primes themselves are not uniformly distributed with respect to small moduli. For instance, all primes are coprime to two (with one exception). In contrast, any measure obeying the Fourier pseudorandomness condition (3) (which is implied by the condition
, which would follow in turn from the linear forms condition), must be evenly distributed in both odd and even residue classes up to
errors; this forces the density of the primes in
to be at most
. A similar argument using all the prime moduli less than some parameter
shows in fact that the density of primes in
is at most
. Since
diverges to
,
diverges to zero, and so we see that the primes cannot in fact have a positive density with respect to any pseudorandom measure.
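One can make the last step quantitative via Mertens’ third theorem (a standard fact, recorded here for convenience):

$$\prod_{p < w} \Big(1 - \frac{1}{p}\Big) = \frac{e^{-\gamma} + o(1)}{\log w} \longrightarrow 0 \qquad \text{as } w \to \infty,$$

so no fixed pseudorandom measure can assign the primes positive density.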
This difficulty can be overcome by a simple affine change of variables known as the W-trick, where we replace the primes
by the modified set
, where
is the product of all the primes less than
, and
is a residue class coprime to
. In practice,
(and
) are slowly growing functions of
, e.g. one could take
. By the pigeonhole principle, for any given
and
there will exist a
for which
is large (of cardinality
, where
is the number of residue classes coprime to
); indeed, thanks to the prime number theorem in arithmetic progressions, any such
would work (e.g. one can take
). Note that every arithmetic progression in
is associated to a corresponding arithmetic progression in
. Thus, for the task of locating arithmetic progressions at least, we may as well work with
; a similar claim also holds for more complicated tasks, such as counting the number of linear patterns in
, though one now has to work with several residue classes at once. The point of passing from
to
is that the latter set no longer has any particular bias to favour or disfavour any residue class with modulus less than
; there are still biases at higher moduli, but as long as
goes to infinity with
, the effect of such biases will end up being negligible (ultimately contributing
terms to things like the linear forms condition).
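To fix notation for this trick (an illustrative set-up; the precise normalisations in the literature differ slightly in inessential ways): one takes

$$W := \prod_{p < w} p, \qquad \{ n \leq N/W : Wn + b \in \mathcal{P} \} \quad \text{for some residue } b \text{ coprime to } W,$$

and by the prime number theorem in arithmetic progressions each admissible $b$ captures about a $1/\phi(W)$ proportion of the primes up to $N$, so the displayed set has density roughly $\frac{W}{\phi(W) \log N}$ in $\{1, \ldots, N/W\}$, with no bias modulo any prime below $w$.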
To simplify the exposition a bit, though, let us ignore the W-trick and pretend that we are working with the primes themselves rather than the affine-shifted primes. We will also ignore the technical distinctions between the interval
and the cyclic group
.
The most natural candidate for the measure $\nu$ is the von Mangoldt function $\Lambda$, defined by setting $\Lambda(n) := \log p$ when $n$ is a prime $p$ or a power of a prime $p$, and $\Lambda(n) := 0$ otherwise. One hint as to the significance of this function is provided by the identity $\sum_{d | n} \Lambda(d) = \log n$ for all natural numbers $n$, which can be viewed as a generating function of the fundamental theorem of arithmetic.
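As a quick sanity check of this identity, here is a small illustrative script (not part of the original notes):

    import math

    def von_mangoldt(n):
        # Lambda(n) = log p if n = p^k for a prime p and some k >= 1, and 0 otherwise.
        if n < 2:
            return 0.0
        for p in range(2, n + 1):
            if n % p == 0:              # p is the least prime factor of n
                while n % p == 0:
                    n //= p
                return math.log(p) if n == 1 else 0.0

    # Check that sum_{d | n} Lambda(d) = log n for a few values of n.
    for n in [12, 30, 97, 360]:
        lhs = sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)
        print(n, round(lhs, 6), round(math.log(n), 6))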
The prime number theorem tells us that $\Lambda$ is indeed a measure: $\mathbb{E}_{n \in [N]} \Lambda(n) = 1 + o(1)$. And the primes have full density with respect to this function: $\mathbb{E}_{n \in [N]} \Lambda(n) 1_{n \in \mathcal{P}} = 1 - o(1)$
. Furthermore, the von Mangoldt function has good Fourier pseudorandomness properties (after applying the
W-trick), thanks to the classical techniques of Hardy-Littlewood and Vinogradov. Indeed, to control exponential sums such as
for some
, one can use tools such as the Siegel-Walfisz theorem (a quantitative version of the prime number theorem in arithmetic progressions) to control such sums in the “major arc” case when
is close to a rational of small height, while in the “minor arc” case when
behaves irrationally, one can use the standard identity
$$\Lambda(n) = \sum_{d | n} \mu(d) \log \frac{n}{d} \qquad (10)$$
where $\mu$ is the Möbius function, to re-express such a sum in terms of expressions roughly of the form
where we are intentionally vague as to what range the parameters are being summed over. The idea is then to eliminate the
factor by tools such as the triangle inequality or the Cauchy-Schwarz inequality, leading to expressions such as
the point is that the inner sum does not contain any number-theoretic factors such as or
, but is still oscillatory (at least if
is sufficiently irrational), and so one can extract useful cancellation from here. Actually, the situation is more complicated than this, because there are regions of the range of
for which this method provides insufficient cancellation, in which case one has to rearrange the sum further using more arithmetic identities such as (10) (for instance, using a truncated version of (10) known as Vaughan’s identity). We will not discuss this further here, but any advanced analytic number theory text (e.g. Iwaniec-Kowalski) will cover this material.
Unfortunately, while the Fourier-pseudorandomness of is well-understood, the linear forms and correlation conditions are essentially equivalent to (and in fact slightly harder than) the original problem of obtaining asymptotics for linear patterns in primes, and so using
for the pseudorandom measure would result in a circular argument. Furthermore, correlations such as
(which essentially counts the number of twin primes up to ) are notoriously difficult to compute. For instance, if one tries to expand the above sum using (10), one ends up with expressions such as
By the Chinese remainder theorem, the two residue conditions can be combined to a single residue condition for
modulo the least common multiple
of
and
. If
and
are both small, e.g.
, then this least common multiple is much less than
, and in such a case one can compute the inner sum very precisely; as it turns out, the main term in this estimate is multiplicative in
, which allows the outer sum to be estimated using the techniques of multiplicative number theory (and in particular, using the theory of the Riemann zeta function). Unfortunately, for the bulk of the above sum,
and
are instead comparable to
, and the least common multiple is typically of size
, and then it becomes extraordinarily difficult to estimate the inner sum (and hence the entire sum).
However, if we truncate the divisor sum (10) to restrict to a range such as
, then the situation improves substantially. This leads to expressions such as
for some cutoff function , where
is a small power of
(e.g.
); the expression (11) corresponds to the case
. The presence of the square is to ensure that
is non-negative, and the presence of the
is a normalisation factor to ensure that
has mean close to
. Such expressions were essentially introduced by Selberg (as part of what is now known as the Selberg sieve), although the sieve weight factors
are usually modified slightly for the Selberg sieve (see this paper of Green and myself for further discussion). The correlation properties of the particular expression (11) were studied intensively by Goldston and Yildirim, and have particularly sharp estimates, although for applications discussed here, one can work instead with a smoother choice of cutoff
, which makes the required correlation estimates on
easier to prove (but with slightly worse bounds). Indeed, the required linear forms and correlation conditions can be verified for (12) (or more precisely, a variant of
in which the
W-trick is applied) by a moderately lengthy, but elementary and straightforward calculation, based ultimately on the Chinese remainder theorem, an analysis of the local problem (working mod
for small
), and the fundamental fact that the Riemann zeta function $\zeta(s)$ is approximately equal to $\frac{1}{s-1}$ for $s$ close to $1$. See for instance this unpublished note of mine for more discussion. (The exact power of $N$ that one sets $R$ equal to will depend on the complexity of the linear forms and correlation conditions that one needs to verify.)
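For orientation, the Goldston-Yildirim-type weight being alluded to is usually written in a form such as the following (a sketch; normalisations and the precise choice of cutoff vary between papers):

$$\nu(n) := \frac{1}{\log R} \Big( \sum_{d \mid n,\ d \leq R} \mu(d) \log \frac{R}{d} \Big)^2,$$

which equals $\log R$ whenever $n$ has no divisor in the range $(1, R]$, and in particular at every prime larger than $R$.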
If one uses (11), then we see that is equal to
when
is any prime larger than
; if
is comparable to
, we thus see (from the prime number theorem) that the primes in
do indeed have positive density relative to
. This is then enough to be able to invoke the transference principle and extend results such as Szemerédi’s theorem to the primes, establishing in particular that the primes contain arbitrarily long arithmetic progressions; see this paper of Green and myself for details.
To use the Fourier-analytic approach, it turns out to be convenient to replace the above measures by a slight variant which looks more complicated in the spatial domain, but is easier to manipulate in the frequency domain. More specifically, the expression (11) or (12) is replaced with a variant such as
where is the Euler totient function (the number of integers from
to
that are coprime to
). Some standard multiplicative number theory shows that the weights
are approximately equal to
in some sense. With such a representation, it turns out that the Fourier coefficients of
can be computed more or less explicitly, and is essentially supported on those frequencies of the form
with
. This makes it easy to verify the required Fourier-pseudorandomness hypothesis (3) (once one applies the
W-trick). As for the restriction estimate (2), the first step is to use Exercise 3 and the Cauchy-Schwarz inequality to reduce matters to showing an estimate of the shape
The right-hand side can be rearranged to be of the shape
It is then possible to use the good pointwise control on the Fourier transform of
(in particular, the fact that it “decays” quite rapidly away from the major arcs) to get a good restriction estimate. See this paper of Green and myself for further discussion.
As discussed in the previous section, to get asymptotics for patterns in the primes we also need to control exponential sums such as
and more generally (for higher complexity patterns)
for various nilsequences . Again, it is convenient to use the von Mangoldt function
as a proxy for the primes, thus leading to expressions such as
Actually, for technical reasons it is convenient to use identities such as (10) to replace this type of expression with expressions such as
because the Möbius function enjoys better boundedness and equidistribution properties than
. (For instance,
strongly favours odd numbers over even numbers, whereas the Möbius function has no preference.) It turns out that these expressions can be controlled by a generalisation of the method of Vinogradov used to compute exponential sums over primes, using the equidistribution theory of nilsequences as a substitute for the classical theory of exponential sums over integers. See this paper of Green and myself for details.
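The key input alluded to here has the following rough shape (a schematic statement of the “Möbius and nilsequences” results of Green and myself; the precise hypotheses on the nilsequence are suppressed):

$$\mathbb{E}_{n \in [N]}\ \mu(n)\, F(g(n)\Gamma) = O_{A, G/\Gamma, F}\big( \log^{-A} N \big) \qquad \text{for every fixed } A > 0,$$

where $n \mapsto F(g(n)\Gamma)$ is a bounded Lipschitz nilsequence on a fixed nilmanifold $G/\Gamma$.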
4 comments
5 June, 2010 at 4:56 pm
Anonymous
typo in the first sentence: “we give discuss”
[Corrected, thanks – T.]
5 September, 2010 at 6:55 am
Sylvain JULIEN
As you write “the even Goldbach equation {p_1+p_2=N} – remain out of reach of this technology”, must I conclude that Bruckman’s attempt of proof of this conjecture in “International Journal of Mathematical Education in Science and Technology, Volume 39, Issue 8 December 2008 , pages 1102 – 1109” is false ?
5 September, 2010 at 9:39 am
Terence Tao
I took a quick look at the article. Unfortunately it seems that the author has not handled asymptotic notation properly. (For instance, the conclusion of Theorem 2 does not even make sense, because it involves an asymptotic quantity o(1) but no asymptotic parameter N.) More seriously, it seems that there is an error in the previous paper [1] of the author which is being used here, due to another misuse of asymptotic notation; the author at one point implicitly uses the claim that if
for all k, and
are coefficients, then
if both sides converge. This claim is correct if all quantities are positive (and the convergence is sufficiently strong), but is not true in the presence of cancellation. For instance, if
, then
and
, but
.
Asymptotic notation is a convenient and time-saving tool when used correctly, but can be dangerous to use if one does not know what one is doing. A good rule of thumb when using any tool of this sort is to see whether one’s arguments could also be rephrased without resorting to this tool (possibly at the cost of making the argument longer and messier). If one does not see how one could do this, then that is a sign that one might not be properly understanding how that tool works, and in particular may not be aware of the limitations of that tool.
5 September, 2010 at 11:42 am
Sylvain JULIEN
Ok, thank you very much for your quick reply. I had sent an e-mail to you a few weeks ago to ask you this question, but I know that you do not have much time to answer. Thank you again.