Let $\tau(n) := \sum_{d|n} 1$ be the divisor function. A classical application of the Dirichlet hyperbola method gives the asymptotic
$$\sum_{n \leq x} \tau(n) = x \log x + (2\gamma - 1) x + O(\sqrt{x}),$$
where $X = O(Y)$ denotes the estimate $|X| \leq CY$ for an absolute constant $C$, as $x \to \infty$. Much better error estimates are possible here, but we will not focus on the lower order terms in this discussion. For somewhat idiosyncratic reasons I will interpret this estimate (and the other analytic number theory estimates discussed here) through the probabilistic lens. Namely, if $\mathbf{n} = \mathbf{n}_x$ is a random number selected uniformly between $1$ and $x$, then the above estimate can be written as
$$\mathbf{E} \tau(\mathbf{n}) = \log x + O(1); \qquad (1)$$
that is to say, the random variable $\tau(\mathbf{n})$ has mean approximately $\log x$. (But, somewhat paradoxically, this is not the median or mode behaviour of this random variable, which instead concentrates near $\log^{\log 2} x$, basically thanks to the Hardy-Ramanujan theorem.)
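As a quick numerical sanity check of (1) (a rough Python sketch; the cutoff $x = 10^6$ and the brute-force divisor sieve below are my own choices rather than anything from the literature), one can compare the empirical mean of $\tau(\mathbf{n})$ against $\log x + 2\gamma - 1$, and the much smaller median against $\log^{\log 2} x$:

```python
import math

# Tabulate the divisor function tau(n) for n <= x by a sieve:
# each d <= x contributes 1 to all of its multiples.
x = 10**6
tau = [0] * (x + 1)
for d in range(1, x + 1):
    for n in range(d, x + 1, d):
        tau[n] += 1

values = sorted(tau[1:])
mean = sum(values) / x
median = values[x // 2]
gamma = 0.57721566490153286  # Euler-Mascheroni constant

print(f"mean of tau(n), n <= {x}: {mean:.3f}")
print(f"log x + 2*gamma - 1     : {math.log(x) + 2 * gamma - 1:.3f}")
print(f"median of tau(n)        : {median}")
print(f"(log x)^(log 2)         : {math.log(x) ** math.log(2):.3f}")
```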
Now we turn to the pair correlations for a fixed positive integer $h$. There is a classical computation of Ingham that shows that
$$\sum_{n \leq x} \tau(n) \tau(n+h) = \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x + O( x \log x ), \qquad (2)$$
where $\sigma_{-1}(h) := \sum_{d|h} \frac{1}{d}$. The error term in (2) has been refined by many subsequent authors, as has the uniformity of the estimates in the $h$ aspect, as these topics are related to other questions in analytic number theory, such as fourth moment estimates for the Riemann zeta function; but we will not consider these more subtle features of the estimate here. However, we will look at the next term in the asymptotic expansion for (2) below the fold.
Using our probabilistic lens, the estimate (2) can be written as
$$\mathbf{E} \tau(\mathbf{n}) \tau(\mathbf{n}+h) = \frac{6}{\pi^2} \sigma_{-1}(h) \log^2 x + O(\log x). \qquad (3)$$
From (1) (and the asymptotic negligibility of the shift by $h$) we see that the random variables $\tau(\mathbf{n})$ and $\tau(\mathbf{n}+h)$ both have a mean of $\log x + O(1)$, so the additional factor of $\frac{6}{\pi^2} \sigma_{-1}(h)$ represents some arithmetic coupling between the two random variables.
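Here is a crude numerical check of (2) and (3) in the same spirit (again only a sketch; the choices $x = 10^6$ and $h = 12$ are arbitrary, and since the neglected lower order terms are only a single factor of $\log x$ smaller than the main term, one should expect only rough agreement at this height):

```python
import math

x, h = 10**6, 12

# Divisor function up to x + h via a sieve.
tau = [0] * (x + h + 1)
for d in range(1, x + h + 1):
    for n in range(d, x + h + 1, d):
        tau[n] += 1

empirical = sum(tau[n] * tau[n + h] for n in range(1, x + 1)) / x

sigma = sum(1.0 / d for d in range(1, h + 1) if h % d == 0)  # sigma_{-1}(h)
prediction = (6 / math.pi**2) * sigma * math.log(x)**2

print(f"(1/x) * sum of tau(n)*tau(n+h)     : {empirical:.1f}")
print(f"(6/pi^2) * sigma_(-1)(h) * log^2 x : {prediction:.1f}")
```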
Ingham's formula can be established in a number of ways. Firstly, one can expand out $\tau(n) = \sum_{d|n} 1$ and use the hyperbola method (splitting into the cases $d \leq \sqrt{n}$ and $n/d \leq \sqrt{n}$ and removing the overlap). If one does so, one soon arrives at the task of having to estimate sums of the form
$$\sum_{n \leq x: d | n+h} \tau(n)$$
for various $d$. For $d$ much less than $\sqrt{x}$ this can be achieved using a further application of the hyperbola method, but for $d$ comparable to $\sqrt{x}$ things get a bit more complicated, necessitating the use of non-trivial estimates on Kloosterman sums in order to obtain satisfactory control on error terms. A more modern approach proceeds using automorphic form methods, as discussed in this previous post. A third approach, which unfortunately is only heuristic at the current level of technology, is to apply the Hardy-Littlewood circle method (discussed in this previous post) to express (2) in terms of exponential sums
$$\sum_{n \leq x} \tau(n) e(\alpha n)$$
for various frequencies $\alpha$. The contribution of "major arc" frequencies $\alpha$ can be computed after a moderately lengthy calculation which yields the right-hand side of (2) (as well as the correct lower order terms that are currently being suppressed), but there does not appear to be an easy way to show directly that the "minor arc" contributions are of lower order, although the methods discussed previously do indirectly show that this is ultimately the case.
Each of the methods outlined above requires a fair amount of calculation, and it is not obvious while performing them that the factor $\frac{6}{\pi^2} \sigma_{-1}(h)$ will emerge at the end. One can at least explain the $\frac{6}{\pi^2}$ as a normalisation constant needed to balance the $\sigma_{-1}(h)$ factor (at a heuristic level, at least). To see this through our probabilistic lens, introduce an independent copy $\mathbf{n}'$ of $\mathbf{n}$, then
$$\mathbf{E} \tau(\mathbf{n}) \tau(\mathbf{n}') = (\mathbf{E} \tau(\mathbf{n}))^2 = \log^2 x + O(\log x); \qquad (4)$$
using symmetry to order $\mathbf{n} < \mathbf{n}'$ (discarding the diagonal case $\mathbf{n} = \mathbf{n}'$) and making the change of variables $\mathbf{n}' = \mathbf{n} + h$, we see that (4) is heuristically consistent with (3) as long as the asymptotic mean of $\sigma_{-1}(h)$ in $h$ is equal to $\frac{\pi^2}{6}$. (This argument is not rigorous because there was an implicit interchange of limits present, but still gives a good heuristic "sanity check" of Ingham's formula.) Indeed, if
$$\langle f(h) \rangle_h := \lim_{H \to \infty} \frac{1}{H} \sum_{h=1}^H f(h)$$
denotes the asymptotic mean in $h$, then we have (heuristically at least)
$$\langle \sigma_{-1}(h) \rangle_h = \sum_{d=1}^\infty \frac{1}{d^2} = \zeta(2) = \frac{\pi^2}{6},$$
and we obtain the desired consistency after multiplying by $\frac{6}{\pi^2}$.
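This heuristic mean is easy to confirm numerically (a small sketch; the cutoff $H = 10^6$ is arbitrary), using the identity $\frac{1}{H} \sum_{h \leq H} \sigma_{-1}(h) = \frac{1}{H} \sum_{d \leq H} \frac{1}{d} \lfloor H/d \rfloor$:

```python
import math

# Average of sigma_{-1}(h) over h <= H, computed by summing the contribution
# of each divisor d (which divides exactly floor(H/d) of the h's).
H = 10**6
avg = sum((H // d) / d for d in range(1, H + 1)) / H

print(f"average of sigma_(-1)(h) for h <= {H}: {avg:.5f}")
print(f"zeta(2) = pi^2/6                     : {math.pi**2 / 6:.5f}")
```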
This still however does not explain the presence of the $\sigma_{-1}(h)$ factor. Intuitively it is reasonable that if $\mathbf{n}$ has many prime factors, and $h$ has a lot of factors, then $\mathbf{n}+h$ will have slightly more factors than average, because any common factor to $\mathbf{n}$ and $h$ will automatically be acquired by $\mathbf{n}+h$. But how to quantify this effect?
One heuristic way to proceed is through analysis of local factors. Observe from the fundamental theorem of arithmetic that we can factor
$$\tau(n) = \prod_p \tau_p(n)$$
where the product is over all primes $p$, and $\tau_p(n)$ is the local version of $\tau(n)$ at $p$ (which in this case, is just one plus the $p$-valuation $\nu_p(n)$ of $n$: $\tau_p(n) = 1 + \nu_p(n)$). Note that all but finitely many of the terms in this product will equal $1$, so the infinite product is well-defined. In a similar fashion, we can factor
$$\sigma_{-1}(h) = \prod_p \sigma_{-1,p}(h),$$
where
$$\sigma_{-1,p}(h) := \sum_{p^j | h} \frac{1}{p^j}$$
(or in terms of valuations, $\sigma_{-1,p}(h) = 1 + \frac{1}{p} + \dots + \frac{1}{p^{\nu_p(h)}}$). Heuristically, the Chinese remainder theorem suggests that the various factors $\tau_p(\mathbf{n})$ behave like independent random variables, and so the correlation between $\tau(\mathbf{n})$ and $\tau(\mathbf{n}+h)$ should approximately decouple into the product of correlations between the local factors $\tau_p(\mathbf{n})$ and $\tau_p(\mathbf{n}+h)$. And indeed we do have the following local version of Ingham's asymptotics:
Proposition 1 (Local Ingham asymptotics) For fixed prime $p$ and positive integer $h$, we have
$$\mathbf{E} \tau_p(\mathbf{n}) = \frac{p}{p-1} + o(1)$$
and
$$\mathbf{E} \tau_p(\mathbf{n}) \tau_p(\mathbf{n}+h) = \left(\frac{p}{p-1}\right)^2 \left(1 - \frac{1}{p^2}\right) \sigma_{-1,p}(h) + o(1),$$
where $o(1)$ denotes a quantity that goes to zero as $x \to \infty$.
From the Euler product formula
$$\prod_p \left(1 - \frac{1}{p^2}\right) = \frac{1}{\zeta(2)} = \frac{6}{\pi^2}$$
we see that
$$\frac{6}{\pi^2} \sigma_{-1}(h) = \prod_p \left(1 - \frac{1}{p^2}\right) \sigma_{-1,p}(h),$$
and so one can "explain" the arithmetic factor $\frac{6}{\pi^2} \sigma_{-1}(h)$ in Ingham's asymptotic as the product of the arithmetic factors $(1 - \frac{1}{p^2}) \sigma_{-1,p}(h)$ in the (much easier) local Ingham asymptotics. Unfortunately we have the usual "local-global" problem in that we do not know how to rigorously derive the global asymptotic from the local ones; this problem is essentially the same issue as the problem of controlling the minor arc contributions in the circle method, but phrased in "physical space" language rather than "frequency space".
Remark 2 The relation between the local means $\frac{p}{p-1}$ and the global mean $\log x$ can also be seen heuristically through the application
$$\prod_{p \leq x^{1/e^\gamma}} \frac{p}{p-1} \approx \log x$$
of Mertens' theorem, where $1/e^\gamma$ is Pólya's magic exponent, which serves as a useful heuristic limiting threshold in situations where the product of local factors is divergent.
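One can see this heuristic in action numerically (a sketch; the particular values of $x$ below are arbitrary, and agreement is only up to relative errors of size roughly $1/\log x$, reflecting the lower order terms in Mertens' theorem):

```python
import math

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

gamma = 0.57721566490153286
for x in [1e4, 1e6, 1e8, 1e10]:
    cutoff = int(x ** (1 / math.exp(gamma)))  # Polya's magic threshold x^(1/e^gamma)
    product = 1.0
    for p in primes_up_to(cutoff):
        product *= p / (p - 1)
    print(f"x = {x:.0e}: product of p/(p-1) over p <= x^(1/e^gamma) = {product:.3f}, log x = {math.log(x):.3f}")
```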
Let us now prove this proposition. One could brute-force the computations by observing that for any fixed $j \geq 0$, the valuation $\nu_p(\mathbf{n})$ is equal to $j$ with probability $\frac{1}{p^j} - \frac{1}{p^{j+1}} + o(1)$, and with a little more effort one can also compute the joint distribution of $\nu_p(\mathbf{n})$ and $\nu_p(\mathbf{n}+h)$, at which point the proposition reduces to the calculation of various variants of the geometric series. I however find it cleaner to proceed in a more recursive fashion (similar to how one can prove the geometric series formula by induction); this will also make visible the vague intuition mentioned previously about how common factors of $\mathbf{n}$ and $h$ force $\mathbf{n}+h$ to have a factor also.
It is first convenient to get rid of error terms by observing that in the limit $x \to \infty$, the random variable $\mathbf{n}$ converges vaguely to a uniform random variable $\mathbf{z}$ on the profinite integers $\hat{\mathbb{Z}}$, or more precisely that the pair $(\tau_p(\mathbf{n}), \tau_p(\mathbf{n}+h))$ converges vaguely (in distribution) to $(\tau_p(\mathbf{z}), \tau_p(\mathbf{z}+h))$. Because of this (and because of the easily verified uniform integrability properties of the $\tau_p(\mathbf{n})$ and their powers), it suffices to establish the exact formulae
$$\mathbf{E} \tau_p(\mathbf{z}) = \frac{p}{p-1} \qquad (5)$$
and
$$\mathbf{E} \tau_p(\mathbf{z}) \tau_p(\mathbf{z}+h) = \left(\frac{p}{p-1}\right)^2 \left(1 - \frac{1}{p^2}\right) \sigma_{-1,p}(h) \qquad (6)$$
in the profinite setting (this setting will make it easier to set up the recursion).
We begin with (5). Observe that $\mathbf{z}$ is coprime to $p$ with probability $1 - \frac{1}{p}$, in which case $\tau_p(\mathbf{z})$ is equal to $1$. Conditioning to the complementary probability $\frac{1}{p}$ event that $\mathbf{z}$ is divisible by $p$, we can factor $\mathbf{z} = p \mathbf{z}'$ where $\mathbf{z}'$ is also uniformly distributed over the profinite integers, in which event we have $\tau_p(\mathbf{z}) = 1 + \tau_p(\mathbf{z}')$. We arrive at the identity
$$\mathbf{E} \tau_p(\mathbf{z}) = \left(1 - \frac{1}{p}\right) + \frac{1}{p}\left(1 + \mathbf{E} \tau_p(\mathbf{z}')\right).$$
As $\mathbf{z}$ and $\mathbf{z}'$ have the same distribution, the quantities $\mathbf{E} \tau_p(\mathbf{z})$ and $\mathbf{E} \tau_p(\mathbf{z}')$ are equal, and (5) follows by a brief amount of high-school algebra.
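As a quick empirical check of (5) (a sketch, with averages over $n \leq 10^6$ standing in for the profinite average), one can average $\tau_p(n) = 1 + \nu_p(n)$ directly:

```python
def nu(p, n):
    """The p-adic valuation of n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

N = 10**6
for p in [2, 3, 5, 7]:
    avg = sum(1 + nu(p, n) for n in range(1, N + 1)) / N
    print(f"p = {p}: empirical mean of tau_p = {avg:.5f},  p/(p-1) = {p / (p - 1):.5f}")
```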
We use a similar method to treat (6). First treat the case when $h$ is coprime to $p$. Then we see that with probability $1 - \frac{2}{p}$, $\mathbf{z}$ and $\mathbf{z}+h$ are simultaneously coprime to $p$, in which case $\tau_p(\mathbf{z}) \tau_p(\mathbf{z}+h) = 1$. Furthermore, with probability $\frac{1}{p}$, $\mathbf{z}$ is divisible by $p$ and $\mathbf{z}+h$ is not; in which case we can write $\mathbf{z} = p\mathbf{z}'$ as before, with $\tau_p(\mathbf{z}) = 1 + \tau_p(\mathbf{z}')$ and $\tau_p(\mathbf{z}+h) = 1$. Finally, in the remaining event with probability $\frac{1}{p}$, $\mathbf{z}+h$ is divisible by $p$ and $\mathbf{z}$ is not; we can then write $\mathbf{z}+h = p\mathbf{z}'$, so that $\tau_p(\mathbf{z}+h) = 1 + \tau_p(\mathbf{z}')$ and $\tau_p(\mathbf{z}) = 1$. Putting all this together, we obtain
$$\mathbf{E} \tau_p(\mathbf{z}) \tau_p(\mathbf{z}+h) = \left(1 - \frac{2}{p}\right) + \frac{2}{p}\left(1 + \mathbf{E} \tau_p(\mathbf{z}')\right),$$
and the claim (6) in this case follows from (5) and a brief computation (noting that $\sigma_{-1,p}(h) = 1$ in this case).
Now suppose that $h$ is divisible by $p$, thus $h = p h'$ for some integer $h'$. Then with probability $1 - \frac{1}{p}$, $\mathbf{z}$ and $\mathbf{z}+h$ are simultaneously coprime to $p$, in which case $\tau_p(\mathbf{z}) \tau_p(\mathbf{z}+h) = 1$. In the remaining probability $\frac{1}{p}$ event, we can write $\mathbf{z} = p\mathbf{z}'$, and then $\tau_p(\mathbf{z}) = 1 + \tau_p(\mathbf{z}')$ and $\tau_p(\mathbf{z}+h) = 1 + \tau_p(\mathbf{z}'+h')$. Putting all this together we have
$$\mathbf{E} \tau_p(\mathbf{z}) \tau_p(\mathbf{z}+h) = \left(1 - \frac{1}{p}\right) + \frac{1}{p} \mathbf{E} \left(1 + \tau_p(\mathbf{z}')\right)\left(1 + \tau_p(\mathbf{z}'+h')\right),$$
which by (5) (and replacing $h$ by $h'$) leads to the recursive relation
$$\mathbf{E} \tau_p(\mathbf{z}) \tau_p(\mathbf{z}+h) = 1 + \frac{2}{p-1} + \frac{1}{p} \mathbf{E} \tau_p(\mathbf{z}') \tau_p(\mathbf{z}'+h'),$$
and (6) then follows by induction on the number of powers of $p$ dividing $h$.
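The pair correlation formula (6) can also be checked empirically (again only a sketch, with averages over $n \leq 10^6$ standing in for the profinite average, and a few arbitrary choices of $p$ and $h$):

```python
def nu(p, n):
    """The p-adic valuation of n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def sigma_p(p, h):
    """sigma_{-1,p}(h) = 1 + 1/p + ... + 1/p^(nu_p(h))."""
    return sum(p ** (-j) for j in range(nu(p, h) + 1))

N = 10**6
for p, h in [(3, 1), (3, 3), (3, 9), (5, 10)]:
    emp = sum((1 + nu(p, n)) * (1 + nu(p, n + h)) for n in range(1, N + 1)) / N
    pred = (p / (p - 1)) ** 2 * (1 - p ** -2) * sigma_p(p, h)
    print(f"p = {p}, h = {h:>2}: empirical = {emp:.4f},  (p/(p-1))^2 (1-1/p^2) sigma = {pred:.4f}")
```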
The estimate (2) of Ingham was refined by Estermann, who obtained the more accurate expansion
$$\sum_{n \leq x} \tau(n) \tau(n+h) = \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x + a_1(h) x \log x + a_2(h) x + O( x^{11/12+o(1)} )$$
for certain complicated but explicit coefficients $a_1(h), a_2(h)$. For instance, $a_1(h)$ is given by an explicit formula involving the Euler-Mascheroni constant $\gamma$ and divisor-type sums of $h$; the formula for $a_2(h)$ is similar but even more complicated. The error term $O(x^{11/12+o(1)})$ was improved by Heath-Brown to $O(x^{5/6+o(1)})$; it is conjectured (for instance by Conrey and Gonek) that one in fact has square root cancellation $O(x^{1/2+o(1)})$ here, but this is well out of reach of current methods.
These lower order terms are traditionally computed either from a Dirichlet series approach (using Perron's formula) or a circle method approach. It turns out that a refinement of the above heuristics can also predict these lower order terms, thus keeping the calculation purely in physical space as opposed to the "multiplicative frequency space" of the Dirichlet series approach, or the "additive frequency space" of the circle method, although the computations are arguably as messy as those of the circle method when it comes to working out the lower order terms. We illustrate this just for the $x \log x$ term below the fold.
There is a very nice recent paper by Lemke Oliver and Soundararajan (complete with a popular science article about it by the consistently excellent Erica Klarreich for Quanta) about a surprising (but now satisfactorily explained) bias in the distribution of pairs of consecutive primes when reduced to a small modulus $q$.
This phenomenon is superficially similar to the more well known Chebyshev bias concerning the reduction of a single prime to a small modulus $q$, but is in fact a rather different (and much stronger) bias than the Chebyshev bias, and seems to arise from a completely different source. The Chebyshev bias asserts, roughly speaking, that a randomly selected prime $p$ of a large magnitude $x$ will typically (though not always) be slightly more likely to be a quadratic non-residue modulo $q$ than a quadratic residue, but the bias is small (the difference in probabilities is only about $O(1/\sqrt{x})$ for typical choices of $x$), and certainly consistent with known or conjectured positive results such as Dirichlet's theorem or the generalised Riemann hypothesis. The reason for the Chebyshev bias can be traced back to the von Mangoldt explicit formula which relates the distribution of the von Mangoldt function $\Lambda$ modulo $q$ with the zeroes of the $L$-functions with period $q$. This formula predicts (assuming some standard conjectures like GRH) that the von Mangoldt function $\Lambda$ is quite unbiased modulo $q$. The von Mangoldt function is mostly concentrated in the primes, but it also has a medium-sized contribution coming from squares of primes, which are of course all located in the quadratic residues modulo $q$. (Cubes and higher powers of primes also make a small contribution, but these are quite negligible asymptotically.) To balance everything out, the contribution of the primes must then exhibit a small preference towards quadratic non-residues, and this is the Chebyshev bias. (See this article of Rubinstein and Sarnak for a more technical discussion of the Chebyshev bias, and this survey of Granville and Martin for an accessible introduction. The story of the Chebyshev bias is also related to Skewes' number, once considered the largest explicit constant to naturally appear in a mathematical argument.)
The paper of Lemke Oliver and Soundararajan considers instead the distribution of the pairs $(p_n \bmod q, p_{n+1} \bmod q)$ for small $q$ and for large consecutive primes $p_n, p_{n+1}$, say drawn at random from the primes comparable to some large $x$. For sake of discussion let us just take $q = 3$. Then all primes $p_n$ larger than $3$ are either $1 \bmod 3$ or $2 \bmod 3$; Chebyshev's bias gives a very slight preference to the latter (of order $O(1/\sqrt{x})$, as discussed above), but apart from this, we expect the primes to be more or less equally distributed in both classes. For instance, assuming GRH, the probability that $p_n$ lands in $1 \bmod 3$ would be $\frac{1}{2} + O(x^{-1/2+o(1)})$, and similarly for $2 \bmod 3$.
In view of this, one would expect that up to errors of $O(x^{-1/2+o(1)})$ or so, the pair $(p_n \bmod 3, p_{n+1} \bmod 3)$ should be equally distributed amongst the four options $(1,1)$, $(1,2)$, $(2,1)$, $(2,2)$, thus for instance the probability that this pair is $(1,1)$ would naively be expected to be $\frac{1}{4} + O(x^{-1/2+o(1)})$, and similarly for the other three tuples. These assertions are not yet proven (although some non-trivial upper and lower bounds for such probabilities can be obtained from recent work of Maynard).
However, Lemke Oliver and Soundararajan argue (backed by both plausible heuristic arguments (based ultimately on the Hardy-Littlewood prime tuples conjecture), as well as substantial numerical evidence) that there is a significant bias away from the tuples $(1,1)$ and $(2,2)$; informally, adjacent primes don't like being in the same residue class! For instance, they predict that the probability of attaining $(1,1)$ is in fact
$$\frac{1}{4} - c \frac{\log\log x}{\log x} + o\left( \frac{\log\log x}{\log x} \right)$$
for an explicit positive constant $c$, with similar predictions for the other three pairs (in fact they give a somewhat more precise prediction than this). The magnitude of this bias, being comparable to $\frac{\log\log x}{\log x}$, is significantly stronger than the Chebyshev bias of $O(1/\sqrt{x})$.
One consequence of this prediction is that the prime gaps $p_{n+1} - p_n$ are slightly less likely to be divisible by $3$ than naive random models of the primes would predict. Indeed, if the four options $(1,1)$, $(1,2)$, $(2,1)$, $(2,2)$ all occurred with equal probability $\frac{1}{4}$, then $p_{n+1} - p_n$ should equal $0 \bmod 3$ with probability $\frac{1}{2}$, and $1 \bmod 3$ and $2 \bmod 3$ with probability $\frac{1}{4}$ each (as would be the case when taking the difference of two random numbers drawn from those integers not divisible by $3$); but the Lemke Oliver-Soundararajan bias predicts that the probability of $p_{n+1} - p_n$ being divisible by three should be slightly lower, being approximately $\frac{1}{2} - c' \frac{\log\log x}{\log x}$ for another explicit positive constant $c'$.
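The bias is already clearly visible in modest numerical experiments. Here is a small sketch (the cutoff is arbitrary) that tabulates the frequencies of the four patterns $(p_n \bmod 3, p_{n+1} \bmod 3)$ for consecutive primes; the repeated-residue patterns $(1,1)$ and $(2,2)$ come out noticeably below $\frac{1}{4}$:

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

N = 2 * 10**6
ps = [p for p in primes_up_to(N) if p > 3]  # discard 2 and 3
counts = {}
for p, q in zip(ps, ps[1:]):
    key = (p % 3, q % 3)
    counts[key] = counts.get(key, 0) + 1

total = sum(counts.values())
for key in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    print(f"pattern {key}: frequency {counts[key] / total:.4f}")
```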
Below the fold we will give a somewhat informal justification of (a simplified version of) this phenomenon, based on the Lemke Oliver-Soundararajan calculation using the prime tuples conjecture.
We now move away from the world of multiplicative prime number theory covered in Notes 1 and Notes 2, and enter the wider, and complementary, world of non-multiplicative prime number theory, in which one studies statistics related to non-multiplicative patterns, such as twins $n, n+2$. This creates a major jump in difficulty; for instance, even the analogue for twin primes of the most basic multiplicative result about the primes, namely Euclid's theorem that there are infinitely many of them, remains unproven. Of course, the situation is even worse for stronger results, such as Euler's theorem, Dirichlet's theorem, or the prime number theorem. Finally, even many multiplicative questions about the primes remain open. The most famous of these is the Riemann hypothesis, which gives the asymptotic
$$\sum_{n \leq x} \Lambda(n) = x + O( x^{1/2} \log^2 x )$$
(see Proposition 24 from Notes 2). But even if one assumes the Riemann hypothesis, the precise distribution of the error term $\sum_{n \leq x} \Lambda(n) - x$ in the above asymptotic (or in related asymptotics, such as for the sum $\sum_{x \leq n \leq x+y} \Lambda(n)$ that measures the distribution of primes in short intervals) is not entirely clear.
Despite this, we do have a number of extremely convincing and well supported models for the primes (and related objects) that let us predict what the answer to many prime number theory questions (both multiplicative and non-multiplicative) should be, particularly in asymptotic regimes where one can work with aggregate statistics about the primes, rather than with a small number of individual primes. These models are based on taking some statistical distribution related to the primes (e.g. the primality properties of a randomly selected $k$-tuple), and replacing that distribution by a model distribution that is easy to compute with (e.g. a distribution with strong joint independence properties). One can then predict the asymptotic value of various (normalised) statistics about the primes by replacing the relevant statistical distributions of the primes with their simplified models. In this non-rigorous setting, many difficult conjectures on the primes reduce to relatively simple calculations; for instance, all four of the (still unsolved) Landau problems may now be justified in the affirmative by one or more of these models. Indeed, the models are so effective at this task that analytic number theory is in the curious position of being able to confidently predict the answer to a large proportion of the open problems in the subject, whilst not possessing a clear way forward to rigorously confirm these answers!
As it turns out, the models for primes that have turned out to be the most accurate in practice are random models, which involve (either explicitly or implicitly) one or more random variables. This is despite the prime numbers being obviously deterministic in nature; no coins are flipped or dice rolled to create the set of primes. The point is that while the primes have a lot of obvious multiplicative structure (for instance, the product of two primes is never another prime), they do not appear to exhibit much discernible non-multiplicative structure asymptotically, in the sense that they rarely exhibit statistical anomalies in the asymptotic limit that cannot be easily explained in terms of the multiplicative properties of the primes. As such, when considering non-multiplicative statistics of the primes, the primes appear to behave pseudorandomly, and can thus be modeled with reasonable accuracy by a random model. And even for multiplicative problems, which are in principle controlled by the zeroes of the Riemann zeta function, one can obtain good predictions by positing various pseudorandomness properties of these zeroes, so that the distribution of these zeroes can be modeled by a random model.
Of course, one cannot expect perfect accuracy when replicating a deterministic set such as the primes by a probabilistic model of that set, and each of the heuristic models we discuss below has some limitations to the range of statistics about the primes that it can expect to track with reasonable accuracy. For instance, many of the models about the primes do not fully take into account the multiplicative structure of primes, such as the connection with a zeta function with a meromorphic continuation to the entire complex plane; at the opposite extreme, we have the GUE hypothesis which appears to accurately model the zeta function, but does not capture such basic properties of the primes as the fact that the primes are all natural numbers. Nevertheless, each of the models described below, when deployed within their sphere of reasonable application, has (possibly after some fine-tuning) given predictions that are in remarkable agreement with numerical computation and with known rigorous theoretical results, as well as with other models in overlapping spheres of application; they are also broadly compatible with the general heuristic (discussed in this previous post) that in the absence of any exploitable structure, asymptotic statistics should default to the most “uniform”, “pseudorandom”, or “independent” distribution allowable.
As hinted at above, we do not have a single unified model for the prime numbers (other than the primes themselves, of course), but instead have an overlapping family of useful models that each appear to accurately describe some, but not all, aspects of the prime numbers. In this set of notes, we will discuss four such models:
- The Cramér random model and its refinements, which model the set $\mathcal{P} = \{2, 3, 5, 7, \dots\}$ of prime numbers by a random set (a toy simulation along these lines is sketched just after this list).
- The Möbius pseudorandomness principle, which predicts that the Möbius function $\mu$ does not correlate with any genuinely different arithmetic sequence of reasonable "complexity".
- The equidistribution of residues principle, which predicts that the residue classes of a large number $n$ modulo a small or medium-sized prime $p$ behave as if they are independently and uniformly distributed as $p$ varies.
- The GUE hypothesis, which asserts that the zeroes of the Riemann zeta function are distributed (at microscopic and mesoscopic scales) like the zeroes of a GUE random matrix, and which generalises the pair correlation conjecture regarding pairs of such zeroes.
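To make the first of these models concrete, here is a toy simulation of the (naive) Cramér model, in which each integer $n \geq 3$ is declared "prime" independently with probability $1/\log n$; the cutoff and the comparison values quoted below are my own choices, and the point is only that the naive model gets the global count about right while underestimating twin pairs, since it ignores the local arithmetic corrections that the refined models supply.

```python
import math
import random

# Naive Cramer model: each n >= 3 is "prime" independently with probability 1/log n.
random.seed(0)
N = 10**6
cramer = [False, False, True] + [random.random() < 1 / math.log(n) for n in range(3, N + 1)]

count = sum(cramer)
twins = sum(1 for n in range(3, N - 1) if cramer[n] and cramer[n + 2])

print(f"Cramer 'primes' up to {N} : {count}")
print(f"actual value of pi({N})   : 78498")
print(f"Cramer 'twin pairs'       : {twins}")
print(f"actual twin pairs <= {N}  : 8169")
# The naive model misses the singular-series factor 2*Pi_2 ~ 1.32 for twins,
# which is exactly the sort of local correction discussed later in these notes.
```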
This is not an exhaustive list of models for the primes and related objects; for instance, there is also the model in which the major arc contribution in the Hardy-Littlewood circle method is predicted to always dominate, and with regards to various finite groups of number-theoretic importance, such as the class groups discussed in Supplement 1, there are also heuristics of Cohen-Lenstra type. Historically, the first heuristic discussion of the primes along these lines was by Sylvester, who worked informally with a model somewhat related to the equidistribution of residues principle. However, we will not discuss these additional models here.
A word of warning: the discussion of the above four models will inevitably be largely informal, and “fuzzy” in nature. While one can certainly make precise formalisations of at least some aspects of these models, one should not be inflexibly wedded to a specific such formalisation as being “the” correct way to pin down the model rigorously. (To quote the statistician George Box: “all models are wrong, but some are useful”.) Indeed, we will see some examples below the fold in which some finer structure in the prime numbers leads to a correction term being added to a “naive” implementation of one of the above models to make it more accurate, and it is perfectly conceivable that some further such fine-tuning will be applied to one or more of these models in the future. These sorts of mathematical models are in some ways closer in nature to the scientific theories used to model the physical world, than they are to the axiomatic theories one is accustomed to in rigorous mathematics, and one should approach the discussion below accordingly. In particular, and in contrast to the other notes in this course, the material here is not directly used for proving further theorems, which is why we have marked it as “optional” material. Nevertheless, the heuristics and models here are still used indirectly for such purposes, for instance by
- giving a clearer indication of what results one expects to be true, thus guiding one to fruitful conjectures;
- providing a quick way to scan for possible errors in a mathematical claim (e.g. by finding that the main term is off from what a model predicts, or an error term is too small);
- gauging the relative strength of various assertions (e.g. classifying some results as “unsurprising”, others as “potential breakthroughs” or “powerful new estimates”, others as “unexpected new phenomena”, and yet others as “way too good to be true”); or
- setting up heuristic barriers (such as the parity barrier) that one has to overcome before resolving certain key problems (e.g. the twin prime conjecture).
See also my previous essay on the distinction between “rigorous” and “post-rigorous” mathematics, or Thurston’s essay discussing, among other things, the “definition-theorem-proof” model of mathematics and its limitations.
Remark 1 The material in this set of notes presumes some prior exposure to probability theory. See for instance this previous post for a quick review of the relevant concepts.
One of the basic general problems in analytic number theory is to understand as much as possible the fluctuations of the Möbius function $\mu(n)$, defined as $(-1)^k$ when $n$ is the product of $k$ distinct primes, and zero otherwise. For instance, as $\mu$ takes values in $\{-1, 0, +1\}$, we have the trivial bound
$$\left| \sum_{n \leq x} \mu(n) \right| \leq x,$$
and the seemingly slight improvement
$$\sum_{n \leq x} \mu(n) = o(x) \qquad (1)$$
is already equivalent to the prime number theorem, as observed by Landau (see e.g. this previous blog post for a proof), while the much stronger (and still open) improvement
$$\sum_{n \leq x} \mu(n) = O( x^{1/2+o(1)} )$$
is equivalent to the notorious Riemann hypothesis.
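For concreteness, here is a short sketch (with an arbitrary cutoff of $10^6$) that computes the partial sums $M(x) = \sum_{n \leq x} \mu(n)$ by sieving the Möbius function, and compares them with the square-root scale relevant to the Riemann hypothesis:

```python
def mobius_up_to(n):
    """Sieve the Mobius function mu(k) for k <= n."""
    mu = [1] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] *= -1
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0
    return mu

N = 10**6
mu = mobius_up_to(N)
M, checkpoints = 0, {10**3, 10**4, 10**5, 10**6}
for n in range(1, N + 1):
    M += mu[n]
    if n in checkpoints:
        print(f"x = {n:>7}: M(x) = {M:>6},  M(x)/sqrt(x) = {M / n**0.5:+.3f}")
```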
There is a general Möbius pseudorandomness heuristic that suggests that the sign pattern $n \mapsto \mu(n)$ behaves so randomly (or pseudorandomly) that one should expect a substantial amount of cancellation in sums that involve the sign fluctuation of the Möbius function in a nontrivial fashion, with the amount of cancellation present comparable to the amount that an analogous random sum would provide; cf. the probabilistic heuristic discussed in this recent blog post. There are a number of ways to make this heuristic precise. As already mentioned, the Riemann hypothesis can be considered one such manifestation of the heuristic. Another manifestation is the following old conjecture of Chowla:
Conjecture 1 (Chowla's conjecture) For any fixed integer $m$ and exponents $a_1, a_2, \dots, a_m \geq 0$, with at least one of the $a_i$ odd (so as not to completely destroy the sign cancellation), we have
$$\sum_{n \leq x} \mu(n+1)^{a_1} \mu(n+2)^{a_2} \cdots \mu(n+m)^{a_m} = o(x)$$
as $x \to \infty$.
Note that as $\mu(n)^{a+2} = \mu(n)^a$ for any $a \geq 1$, we can reduce to the case when the $a_i$ take values in $\{0,1,2\}$ here. When only one of the $a_i$ is odd, this is essentially the prime number theorem in arithmetic progressions (after some elementary sieving), but when two or more of the $a_i$ are odd, the problem becomes completely open. For instance, the estimate
$$\sum_{n \leq x} \mu(n) \mu(n+2) = o(x)$$
is morally very close to the conjectured asymptotic
$$\sum_{n \leq x} \Lambda(n) \Lambda(n+2) = 2 \Pi_2 x + o(x)$$
for the von Mangoldt function $\Lambda$, where $\Pi_2 := \prod_{p > 2} \left( 1 - \frac{1}{(p-1)^2} \right) = 0.6601\ldots$ is the twin prime constant; this asymptotic in turn implies the twin prime conjecture. (To formally deduce estimates for von Mangoldt from estimates for Möbius, though, typically requires some better control on the error terms than $o(x)$; in particular, gains of some power of $\log x$ are usually needed. See this previous blog post for more discussion.)
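Here is a small numerical illustration of these two sums (a sketch; the cutoff $10^6$ and the sieving routines are my own choices). The normalised Chowla sum should be close to zero, while the twin-prime sum, normalised by the conjectured main term $2 \Pi_2 x$, should be close to one:

```python
import math

def mobius_and_von_mangoldt(n):
    """Sieve mu(k) and Lambda(k) for k <= n."""
    mu = [1] * (n + 1)
    lam = [0.0] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] *= -1
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0
            pk = p
            while pk <= n:
                lam[pk] = math.log(p)  # Lambda equals log p on prime powers
                pk *= p
    return mu, lam

x = 10**6
mu, lam = mobius_and_von_mangoldt(x + 2)
Pi2 = 0.6601618158  # twin prime constant

chowla = sum(mu[n] * mu[n + 2] for n in range(1, x + 1)) / x
twin = sum(lam[n] * lam[n + 2] for n in range(1, x + 1)) / (2 * Pi2 * x)

print(f"(1/x) sum mu(n) mu(n+2)                  = {chowla:+.4f}")
print(f"(1/(2 Pi_2 x)) sum Lambda(n) Lambda(n+2) = {twin:.4f}")
```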
Remark 2 The Chowla conjecture resembles an assertion that, for $n$ chosen randomly and uniformly from $1$ to $x$, the random variables $\mu(n+1), \dots, \mu(n+m)$ become asymptotically independent of each other (in the probabilistic sense) as $x \to \infty$. However, this is not quite accurate, because some moments (namely those with all exponents $a_i$ even) have the "wrong" asymptotic value, leading to some unwanted correlation between these variables. For instance, the events $\mu(n+1) = 0$ and $\mu(n+5) = 0$ have a strong correlation with each other, basically because they are both strongly correlated with the event of $n+1$ being divisible by $4$. A more accurate interpretation of the Chowla conjecture is that the random variables $\mu(n+1), \dots, \mu(n+m)$ are asymptotically conditionally independent of each other, after conditioning on the zero pattern $\mu(n+1)^2, \dots, \mu(n+m)^2$; thus, it is the sign of the Möbius function that fluctuates like random noise, rather than the zero pattern. (The situation is a bit cleaner if one works with the Liouville function $\lambda$ instead of the Möbius function $\mu$, as this function never vanishes, but we will stick to the traditional Möbius function formalism here.)
A more recent formulation of the Möbius randomness heuristic is the following conjecture of Sarnak. Given a bounded sequence $f: \mathbb{N} \to \mathbb{C}$, define the topological entropy of the sequence to be the least exponent $\sigma$ with the property that for any fixed $\epsilon > 0$, and for $N$ going to infinity, the set $\{ (f(n+1), \dots, f(n+N)) : n \in \mathbb{N} \}$ of $N$-blocks of the sequence can be covered by $O( \exp( \sigma N ) )$ balls of radius $\epsilon$ (in the $\ell^\infty$ metric). (If $f$ arises from a minimal topological dynamical system $(X, T)$ by $f(n) = F(T^n x)$ for some continuous $F$, and the topology of $X$ is generated by $F$ and its shifts, the above notion is equivalent to the usual notion of the topological entropy of a dynamical system.) For instance, if the sequence is a bit sequence (i.e. it takes values in $\{0,1\}$), then there are only $O( \exp( \sigma N ) )$ $N$-bit patterns that can appear as blocks of $N$ consecutive bits in this sequence. As a special case, a Turing machine with bounded memory that had access to a random number generator at the rate of one random bit produced every $k$ units of time, but otherwise evolved deterministically, would have an output sequence that had a topological entropy of at most $O(1/k)$. A bounded sequence is said to be deterministic if its topological entropy is zero. A typical example is a polynomial sequence such as $f(n) := e^{2\pi i \alpha n^2}$ for some fixed $\alpha$; the $N$-blocks of such polynomial sequences have covering numbers that only grow polynomially in $N$, rather than exponentially, thus yielding the zero entropy. Unipotent flows, such as the horocycle flow on a compact hyperbolic surface, are another good source of deterministic sequences.
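The block-counting definition can be illustrated with a quick experiment (a sketch with my own choice of sequences; the deterministic example here is the Thue-Morse sequence, which reappears in the list of known cases below): the number of distinct length-$k$ blocks grows only linearly in $k$ for the Thue-Morse sequence (and subexponentially for any deterministic sequence), but exponentially for a genuinely random bit sequence.

```python
import random

def thue_morse(N):
    """First N terms of the Thue-Morse sequence (parity of the binary digit sum)."""
    return [bin(n).count("1") % 2 for n in range(N)]

def block_count(seq, k):
    """Number of distinct length-k blocks occurring in the sequence."""
    return len({tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)})

random.seed(0)
N = 10**5
tm = thue_morse(N)
rnd = [random.randint(0, 1) for _ in range(N)]

for k in [4, 8, 12, 16]:
    print(f"k = {k:>2}: Thue-Morse blocks = {block_count(tm, k):>5},  "
          f"random blocks = {block_count(rnd, k):>6},  2^k = {2**k}")
```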
Conjecture 3 (Sarnak's conjecture) Let $f: \mathbb{N} \to \mathbb{C}$ be a deterministic bounded sequence. Then
$$\sum_{n \leq x} \mu(n) f(n) = o(x).$$
This conjecture in general is still quite far from being solved. However, special cases are known:
- For constant sequences, this is essentially the prime number theorem (1).
- For periodic sequences, this is essentially the prime number theorem in arithmetic progressions.
- For quasiperiodic sequences such as $f(n) = F(\alpha n \bmod 1)$ for some continuous $F$, this follows from the work of Davenport.
- For nilsequences, this is a result of Ben Green and myself.
- For horocycle flows, this is a result of Bourgain, Sarnak, and Ziegler.
- For the Thue-Morse sequence, this is a result of Dartyge-Tenenbaum (with a stronger error term obtained by Mauduit-Rivat). A subsequent result of Bourgain handles all bounded rank one sequences (though the Thue-Morse sequence is actually of rank two), and a related result of Green establishes asymptotic orthogonality of the Möbius function to bounded depth circuits, although such functions are not necessarily deterministic in nature.
- For the Rudin-Shapiro sequence, I sketched out an argument at this MathOverflow post.
- The Möbius function is known to itself be non-deterministic, because its square $\mu^2$ (i.e. the indicator of the square-free numbers) is known to be non-deterministic (indeed, its topological entropy is the positive quantity $\frac{6}{\pi^2} \log 2$). (The corresponding question for the Liouville function $\lambda$, however, remains open, as the square $\lambda^2 = 1$ has zero entropy.)
- In the converse direction, it is easy to construct sequences of arbitrarily small positive entropy that correlate with the Möbius function (a rather silly example is $\mu(n) 1_{q | n}$ for some fixed large (squarefree) $q$, which has topological entropy at most $O(1/q)$ but clearly correlates with $\mu$).
See this survey of Sarnak for further discussion of this and related topics.
In this post I wanted to give a very nice argument of Sarnak that links the above two conjectures:
Proposition 4 The Chowla conjecture implies the Sarnak conjecture.
The argument does not use any number-theoretic properties of the Möbius function; one could replace $\mu$ in both conjectures by any other function from the natural numbers to $\{-1, 0, +1\}$ and obtain the same implication. The argument consists of the following ingredients:
- To show that $\sum_{n \leq x} \mu(n) f(n) = o(x)$, it suffices to show that the expectation of the random variable $\frac{1}{m} ( \mu(n+1) f(n+1) + \dots + \mu(n+m) f(n+m) )$, where $n$ is drawn uniformly at random from $1$ to $x$, can be made arbitrarily small by making $m$ large (and $x$ even larger).
- By the union bound and the zero topological entropy of $f$, it suffices to show that for any bounded deterministic coefficients $c_1, \dots, c_m$, the random variable $\frac{1}{m} ( c_1 \mu(n+1) + \dots + c_m \mu(n+m) )$ concentrates with exponentially high probability.
- Finally, this exponentially high concentration can be achieved by the moment method, using a slight variant of the moment method proof of the large deviation estimates such as the Chernoff inequality or Hoeffding inequality (as discussed in this blog post).
As is often the case, though, while the “top-down” order of steps presented above is perhaps the clearest way to think conceptually about the argument, in order to present the argument formally it is more convenient to present the arguments in the reverse (or “bottom-up”) order. This is the approach taken below the fold.
In this, the final lecture notes of this course, we discuss one of the motivating applications of the theory developed thus far, namely to count solutions to linear equations in primes (or in dense subsets $A$ of the primes $\mathcal{P}$). Unfortunately, the most famous linear equations in primes, namely the twin prime equation $p_2 - p_1 = 2$ and the even Goldbach equation $p_1 + p_2 = N$, remain out of reach of this technology (because the relevant affine linear forms involved are commensurate, and thus have infinite complexity with respect to the Gowers norms), but most other systems of equations, in particular that of arithmetic progressions $p_i = n + i r$ for $i = 0, \dots, k-1$ (or equivalently, $p_i + p_{i+2} = 2 p_{i+1}$ for $i = 0, \dots, k-3$), as well as the odd Goldbach equation $p_1 + p_2 + p_3 = N$, are tractable.
To illustrate the main ideas, we will focus on the following result of Green:
Theorem 1 (Roth's theorem in the primes) Let $A$ be a subset of primes whose upper density $\limsup_{N \to \infty} |A \cap [N]| / |\mathcal{P} \cap [N]|$ is positive. Then $A$ contains infinitely many arithmetic progressions of length three.
This should be compared with Roth's theorem in the integers (Notes 2), which is the same statement but with the primes replaced by the integers $\mathbb{Z}$ (or natural numbers $\mathbb{N}$). Indeed, Roth's theorem for the primes is proven by transferring Roth's theorem for the integers to the prime setting; the latter theorem is used as a "black box". The key difficulty here in performing this transference is that the primes have zero density inside the integers; indeed, from the prime number theorem we have $|\mathcal{P} \cap [N]| = (1+o(1)) \frac{N}{\log N} = o(N)$.
There are a number of generalisations of this transference technique. In a paper of Green and myself, we extended the above theorem to progressions of longer length (thus transferring Szemerédi’s theorem to the primes). In a series of papers (culminating in a paper to appear shortly) of Green, myself, and also Ziegler, related methods are also used to obtain an asymptotic for the number of solutions in the primes to any system of linear equations of bounded complexity. This latter result uses the full power of higher order Fourier analysis, in particular relying heavily on the inverse conjecture for the Gowers norms; in contrast, Roth’s theorem and Szemerédi’s theorem in the primes are “softer” results that do not need this conjecture.
To transfer results from the integers to the primes, there are three basic steps:
- A general transference principle, that transfers certain types of additive combinatorial results from dense subsets of the integers to dense subsets of a suitably “pseudorandom set” of integers (or more precisely, to the integers weighted by a suitably “pseudorandom measure”);
- An application of sieve theory to show that the primes (or more precisely, an affine modification of the primes) lie inside a suitably pseudorandom set of integers (or more precisely, have significant mass with respect to a suitably pseudorandom measure).
- If one is seeking asymptotics for patterns in the primes, and not simply lower bounds, one also needs to control correlations between the primes (or proxies for the primes, such as the Möbius function) with various objects that arise from higher order Fourier analysis, such as nilsequences.
The first step can be accomplished in a number of ways. For progressions of length three (and more generally, for controlling linear patterns of complexity at most one), transference can be accomplished by Fourier-analytic methods. For more complicated patterns, one can use techniques inspired by ergodic theory; more recently, simplified and more efficient methods based on duality (the Hahn-Banach theorem) have also been used. No number theory is used in this step. (In the case of transference to genuinely random sets, rather than pseudorandom sets, similar ideas appeared earlier in the graph theory setting; see this paper of Kohayakawa, Luczak, and Rodl.)
The second step is accomplished by fairly standard sieve theory methods (e.g. the Selberg sieve, or the slight variants of this sieve used by Goldston and Yildirim). Remarkably, very little of the formidable apparatus of modern analytic number theory is needed for this step; for instance, the only fact about the Riemann zeta function that is truly needed is that it has a simple pole at $s = 1$, and no knowledge of L-functions is needed.
The third step does draw more significantly on analytic number theory techniques and results (most notably, the method of Vinogradov to compute oscillatory sums over the primes, and also the Siegel-Walfisz theorem that gives a good error term on the prime number theorem in arithmetic progressions). As these techniques are somewhat orthogonal to the main topic of this course, we shall only touch briefly on this aspect of the transference strategy.