This week I am at Penn State University, giving this year’s Marker lectures.  My chosen theme for my four lectures here is “recent developments in additive prime number theory”.  My first lecture, “Long arithmetic progressions in primes”, is similar to my AMS lecture on the same topic and so I am not reposting it here.  The second lecture, the notes for which begin after the fold, is on “Linear equations in primes”.  These two lectures focus primarily on work of myself and Ben Green.  The third and fourth lectures, entitled “Small gaps between primes” and “Sieving for almost primes and expander graphs”, will instead be focused on the work of Goldston-Yildirim-Pintz and Bourgain-Gamburd-Sarnak respectively.
In my previous lecture (or the AMS lecture), I focused on finding a specific type of pattern inside the prime numbers, namely that of an arithmetic progression $n, n+r, \ldots, n+(k-1)r$.  The main reason why the analysis there is specific to progressions is because of its reliance on Szemerédi’s theorem, which shows that arithmetic progressions are necessarily abundant in sufficiently “large” sets of integers.  There are several other variants and generalisations of this theorem known to a few other types of patterns (e.g. polynomial progressions $n+P_1(r), \ldots, n+P_k(r)$, where $P_1,\ldots,P_k$ are polynomials from the integers to the integers with $P_1(0) = \ldots = P_k(0)=0$ and $r \neq 0$), and in some cases the analogous results about primes are known (e.g. Tamar Ziegler and myself showed that for any given $P_1,\ldots,P_k$ as above, there are infinitely many polynomial progressions of primes).

However, for most patterns, there is no analogue of Szemerédi’s theorem, and the strategy used in the previous lecture or AMS lecture cannot be directly applied.  For instance, it is certainly not true that any subset of integers with positive upper density contains any twins $n,n+2$; the multiples of three, for instance, form a counterexample, among many others.  (In fact, there are so many counterexamples here, that it looks unlikely that the twin prime conjecture can be attacked by this method without a significant new idea.)

Furthermore, even in the cases when these methods do work, for instance in demonstrating for each k that there are infinitely many progressions of length k inside the primes, they do not settle the more quantitative problem of how many progressions of length k there are asymptotically in any given finite range of primes, e.g. the primes less than a number N in the asymptotic limit $N \to \infty$.  This is because Szemerédi’s theorem provides a lower bound for the number of progressions in a large finite set, but not a matching upper bound.  (For instance, given a subset A of $\{1,\ldots,N\}$ of density 1/2, the number of progressions of length 3 in A can be as large as $(1/4 + o(1)) N^2$ (if $A$ consists of the even integers from 1 to N, for instance) and as small as $(1/8 + o(1)) N^2$ (if $A$ is a randomly chosen set of density 1/2), and can even be a little bit smaller by perturbing this example slightly.)

On the other hand, as discussed in the previous lecture or AMS lecture, one can use standard random models for the primes to predict what the correct asymptotic for these questions should be.  For instance, the number of arithmetic progressions $n, n+r, \ldots, n+(k-1)r$ of a fixed length k consisting of primes less than N should be asymptotically

$\displaystyle (\frac{1}{2(k-1)} (\prod_p \beta_p) + o(1)) \frac{N^2}{\log^k N}$ (1)

where the product is over all primes p, and the quantity $\beta_p$ is defined as

$\displaystyle \beta_p := \frac{1}{p} (\frac{p}{p-1})^{k-1}$

for $p \leq k$, and

$\displaystyle \beta_p := (1-\frac{k-1}{p}) (\frac{p}{p-1})^{k-1}$

for $p \geq k$.

The various terms in this complicated-looking formula can be explained as follows.  The “Archimedean” factors $\frac{1}{2(k-1)}$ and $N^2$ come from the fact that the number of arithmetic progressions $n, n+r, \ldots,n+(k-1)r$ of natural numbers less than N is $(\frac{1}{2(k-1)} +o(1)) N^2$.  The “density” factor $\frac{1}{\log^k N}$ comes from the prime number theorem, which roughly speaking asserts that each of the k elements $n, n+r, \ldots,n+(k-1)r$ in a typical arithmetic progression has a $(1+o(1)) \frac{1}{\log N}$ “probability” of being prime.  The “local” factors $\beta_p$ measures how much bias arithmetic progressions with respect to being coprime to a fixed prime p, which is relevant for progressions of primes, since primes of course tend to be coprime to p.  More precisely, $\beta_p$ can be defined as the probability that a random arithmetic progression of length k has all entries coprime to p, divided by the probability that a random collection of k independent numbers are all coprime to p.  It is not difficult to show that the product $\prod_p \beta_p$ converges to some finite non-zero number for each k.

Similar heuristic asymptotic formulae exist for the number of many other patterns of primes; for instance, the number of representations $N=p_1+p_2$ of a large integer N as the sum of two primes should be equal to $(\prod_p \beta_{p,N} + o(1)) N$, where $\beta_{p,N}$ is the probability that two randomly chosen numbers conditioned to be coprime to p sum to N modulo p, divided by the probability that two randomly chosen numbers sum to N modulo p.  (In particular, $\beta_{2,N}=0$ for odd N, reflecting the fact that it is very difficult for an odd number to be representable as the sum of two primes.  More generally, one can compute that $\beta_{p,N} = 1 + \frac{1}{p-1}$ when p divides N, and $\beta_{p,N} = 1 - \frac{1}{(p-1)^2}$ otherwise.)  A more general prediction for counting linear patterns inside primes exists, and is essentially the Hardy-Littlewood prime tuples conjecture.  This conjecture, which is widely believed to be true, would imply many other conjectures in the subject, such as the twin primes conjecture and the Goldbach conjecture (for sufficiently large even numbers).  Unfortunately, the cases of the prime tuples conjecture which would have these consequences remain out of reach of current technology.

Using some elementary linear algebra, one can recast the prime tuples conjecture not as a question of finding linear patterns inside primes, but rather that of solving linear equations in which all the unknowns are required to be prime, subject to some additional linear inequalities.  For instance, finding progressions of length k consisting entirely of primes less than N is essentially the same as asking for solutions to the system of equations

$p_2 - p_1 = p_3 - p_2 = \ldots = p_k - p_{k-1}$

and inequalities

$0 \leq p_1,\ldots,p_k \leq N$

where the unknowns $p_1,\ldots,p_k$ are required to be primes.  More generally, one could imagine the question of asking the number of k-tuples $(p_1,\ldots,p_k)$ consisting entirely of primes which is contained in some convex set B in ${\Bbb R}^k$ of some intermediate dimension $1 \leq d \leq k$, which is contained in a ball of radius O(N) around the origin.  For instance, in the above example B is the 2-dimensional set

$\{ (x_1,\ldots,x_k) \in {\Bbb R}^k: x_2-x_1 = \ldots = x_k-x_{k-1}; 0 \leq x_1,\ldots,x_k \leq N \}$.

(We make the technical assumption that the linear coefficients of the equations defining the d-dimensional subspace that B lives in are independent of N; k and d are of course also assumed to be independent of N.  The constant coefficients, however, are allowed to vary with N; this is the situation that comes up for instance in the Goldbach conjectures.)  One can think of the problem of finding points in B as that of solving k-d equations in k unknowns.  One can also generalise this problem slightly by enforcing some residue constraints $x_j = a_j \mod q_j$ on the unknowns, but we will ignore this minor extension to simplify the discussion.

The prime tuples conjecture for this problem can roughly speaking be phrased as follows.  Suppose that the number if k-tuples $(n_1,\ldots,n_k)$ in B consisting of natural numbers is known to be

$(\beta_\infty + o(1)) N^d$

for some constant $\beta_\infty$ independent of N (one can think of this constant as the normalised volume of B).  Suppose also that for any fixed prime p, the number of k-tuples in B consisting of natural numbers coprime to p is known to be

$\displaystyle (\beta_\infty \beta_p + o(1)) (1 - \frac{1}{p})^k N^d$

for some constant $latex\ beta_p$ independent of n.  (The factor $(1 - \frac{1}{p})^k$ is natural, as it represents the proportion of tuples of k natural numbers in which all the entries are coprime to p.)  Then the prime tuples conjecture asserts that the number of k-tuples in B consisting of primes should be

$\displaystyle (\beta_\infty \prod_p \beta_p + o(1)) \frac{N^d}{\log^k N}.$ (2)

In particular, this can be shown to imply that if $\beta_\infty > 0$ (thus there are no obstructions to solving the system of equations at infinity) and if $\beta_p > 0$ for all p (thus there are no obstructions to solvability mod p for any p) then there will exist many solutions to the system of equations in primes when N is large enough.

As mentioned earlier, this conjecture remains open in several important cases, most particularly in the one-dimensional case d=1.  For instance, the twin prime conjecture would follow from the case $B := \{ (x_1,x_2): x_2 -x_1 = 2, 0 \leq x_1,x_2 \leq N \}$, but this case remains open.  However, there has now been significant progress in the higher dimensional cases $d \geq 2$, especially when the codimension $k-d$ (representing the number of equations in the system) is low.  Firstly, the prime number theorem settles the zero codimension case $d=k$ (and is pretty much the only situation in which we can handle a $d=1$ case).  The Hardy-Littlewood circle method, based on Fourier analysis, settles all “non-degenerate” cases when $d \geq \max(k-1, 2)$, where “non-degenerate” roughly speaking means that the problem does not secretly contain a $d=1$ problem inside it as a lower-dimensional projection (e.g. $B := \{ (x_1,x_2,x_3): x_2-x_1=2, 0 \leq x_1,x_2,x_3 \leq N \}$ would be degenerate; more generally, B is non-degenerate if it is not contained in any hyperplane that can be defined using at most two of the unknowns).  It can also handle some cases in which the codimension k-d exceeds 1 (e.g. one could take the Cartesian product of some codimension 1 examples); the precise description of what problems are within reach of this method is a little technical to state and will not be given here.

The main result of this paper of Ben Green and myself (combined with results from these other papers of ours) is the following:

Theorem. The prime tuples conjecture is true in all non-degenerate situations in which $d \geq \max(k-2, 2)$.  If the inverse conjecture for the Gowers norms over the integers is true, then the prime tuples conjecture is true in all non-degenerate situations in which $d \geq 2$.

I will say a little bit more about what the inverse conjecture for the Gowers norms is later.  This theorem unfortunately does not touch the most interesting case $d=1$ (when the patterns one is seeking only have one degree of freedom), but it does largely settle all the other cases.  For instance, this theorem implies the asymptotic (1) for prime progressions of length k for $k \leq 4$ (the cases $k \leq 3$ were established earlier by van der Corput using the circle method), and the case of higher k would also follow from the theorem once the inverse conjecture is proven.  As with the circle method, we can also unconditionally handle some cases in which d is less than $\max(k-2,2)$, but the precise statement here is technical and will be omitted.  (Details and further examples can of course be found in our paper.)

Now I would like to turn to the proof of this theorem.  At first glance, the result looks like it is going to be quite complicated, due to the presence of all the different factors in the asymptotic (2) that one is trying to prove.  However, most of the factors can be dealt with by various standard tricks.  The Archimedean factor $\beta_\infty$ can be eliminated from the problem by working locally (with respect to the infinite place), covering B by cubes of sidelength o(N).  For similar reasons, the local factors $\beta_p$ can be eliminated by working locally mod p (i.e. restricting to a single residue class mod p for various small p).  The factors $\frac{1}{\log^k N}$, which come from the density $\frac{1}{\log N}$ of primes in the region of interest (i.e. from the prime number theorem), can largely be compensated for (with some effort) from the transference principle technology developed in our earlier paper on long progressions in the primes, which was discussed in the previous lecture (or at the AMS lecture).  After all this, the problem basically boils down to the following.  We have a certain subset A of the integers $\{1,\ldots,N\}$ with some density $0 < \delta < 1$ (one should think of A as being a “model” for the primes, after all the distorting structure coming from local obstructions has been stripped out; in actuality, one has to replace the set A by a weight function $f: \{1,\ldots,N\} \to [0,1]$ of mean value $\delta$, but let us ignore this technicality to simplify the discussion).  We pick a random instance of some linear pattern inside $\{1,\ldots,N\}$ (for sake of concreteness, let us pick a random arithmetic progression $n, n+r, \ldots,n+(k-1) r$) and ask what is the probability that all the elements of this pattern lie in A.  Since A has density $\delta$, we expect each element $n+jr$ of our random progression to have a probability $\delta+o(1)$ to lie in A.  (This is not quite the case if A is biased to lie on one side of $\{1,\ldots,N\}$ than on the other, but it turns out that one can ignore this possibility.)  Since our pattern consists of k elements, we thus expect

${\Bbb P}( n, n+r, \ldots, n+(k-1)r \in A ) = \delta^k + o(1).$  (3)

Roughly speaking, the key issue in proving the theorem is to work out some “easily checkable” conditions on A that would guarantee that the heuristic (3) is in fact valid.  One then verifies that these “easily checkable” conditions do indeed hold for the set A of interest (which is a proxy for the set of primes).

As stated before, we expect ${\Bbb P}(n+jr \in A) = \delta+o(1)$ for each $0 \leq j < k$.  Thus (3) is asserting in some sense that the events $n+jr \in A$ are “approximately independent”.  This would be a reasonable assertion if A was pseudorandom (i.e. it behaved like a random subset of $\{1,\ldots,N\}$ of the given density $\delta$), and is consistent with the general heuristic from number theory that we expect the prime numbers to behave randomly once all the “obvious” irregularities in distribution (in particular, irregularity modulo p for small p) has been dealt with.  But if A exhibits certain types of structure (or at least some bias towards structure), then (3) can fail.  For instance, suppose that A consists entirely of odd numbers.  Then, if the first two elements n, n+r of an arithmetic progression lie in A, they are necessarily odd, which then forces the rest of the elements of this progression to be odd.  As A is concentrated entirely in these odd numbers, these elements of the progression are thus expected to have an elevated probability of lying in A, and so the left-hand side of (3) would be expected to significantly exceed the former once $k \geq 3$.  (The asymptotic (3) becomes trivially true for $k<3$.) A similar distorting effect occurs if A is not entirely contained in the odd numbers, but is merely biased towards them, in that odd numbers are more likely to lie in A than even numbers.  In this example, the bias in A caused the number of progressions to go up from the expected number predicted by (3); it is also possible (but more tricky) to concoct examples in which bias in A forces the number of progressions to go down somewhat, though Szemerédi’s theorem does prevent one from extinguishing these progressions completely when N is large.

Bias towards odd or even numbers is equivalent to a correlation between A and the linear character $\chi(n) := (-1)^n$; the algebraic constraints between the $\chi(n+jr)$, and in particular the relationship

$\chi(n+2r) \chi(n+r)^{-2} \chi(n)=1$ (4)

can be viewed as the underlying source of the distorting effect that can prevent (3) from holding for $k \geq 3$.  The same algebraic constraint holds for any other linear character, e.g. the Fourier character $\chi(n) := e(\xi n)$ (where $e(x) := e^{2\pi i x}$) for some fixed frequency $\xi \in {\Bbb R}$, for much the same reason that two points on a line determine the rest of the line (it is also closely related to the fact that the second derivative of a linear function vanishes).  Because of this, we expect (3) to be distorted when A correlates with such a character (which means that the Fourier coefficient $\sum_{n \in A} \chi(n)$ is unexpectedly large in magnitude).

It turns out that in the case k=3 of progressions of length three, correlation with a linear character is the only source of distortion in the count (3).  A sign of this can be seen from the identity

$\displaystyle \# \{ (n,r): n,n+r,n+2r \in A \} = \int_0^1 (\sum_{n_1 \in A} e(\xi n_1) (\sum_{n_2 \in A} e(-2\xi n_2)) (\sum_{n_3 \in A} e(\xi n_3))\ d\xi$

which can be viewed as a “Fourier transform” of the algebraic identity (4).  One can formalise this using (a slight variant of) the above identity and some other Fourier-analytic tools (in particular, the Plancherel identity) to conclude

Inverse theorem for length three progressions. (Informal statement)  Let k=3. Suppose that A is a subset of $\{1,\ldots,N\}$ of density $\delta$ for which (3) fails.  Then A correlates with a non-trivial linear character $\chi(n) = e(\xi n)$.  (“Non-trivial” basically means that $\chi$ oscillates at least once on the interval $\{1,\ldots,N\}$.)

Applying this theorem in the contrapositive, we conclude that we can justify the asymptotic (3) in the k=3 case as long as we can show that A does not correlate with a linear character.  In the case when A is a proxy for the primes, this task essentially boils down to that of establishing non-trivial estimates for exponential sums over primes, such as

$\sum_{p < N} e(\xi p)$;

for technical reasons it is more convenient to deal with slight variants of this sum such as

$\sum_{n=1}^N \Lambda(n) e(\xi n)$ (5)

where $\Lambda$ is the von Mangoldt function, or

$\sum_{n=1}^N \mu(n) e(\xi n)$ (6)

where $\mu$ is the Möbius function.  (There are various elementary identities, such as summation by parts, that allow one to express one of these sums in terms of the others.  One has a lot of flexibility in here as long as one retains a factor in the sum which is somehow sensitive to the prime factorisation of n.)  The reason for using these functions instead is that they enjoy a number of very useful identities, such as

$\displaystyle \sum_{n=1}^\infty \frac{\Lambda(n) \chi(n)}{n^s} = -\frac{L'(s,\chi)}{L(s,\chi)}$ (7)

and

$\displaystyle \sum_{n=1}^\infty \frac{\mu(n) \chi(n)}{n^s} = \frac{1}{L(s,\chi)}$ (8)

for any Dirichlet character $\chi$ (where $L(s,\chi)$ is the Dirichlet L-function), and also multiplicative identities such as

$\displaystyle \Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d}$ (9)

and

$\displaystyle \mu(n) = \sum_{n=abc} \mu(a) \mu(b)$ (10)

To cut a long story very short, identities such as (7), (8) are useful for estimating (5), (6) respectively in the major arc case when $\xi$ is rational or close to rational (with small denominator), while (variants of) identities such as (9) or (10) (in particular, certain truncated versions of (9) and (10) such as Vaughan’s identity) are useful for estimating (5), (6) respectively in the minor arc case when $\xi$ is far from a rational with small denominator (or close to a rational with large denominator).  This theory was pioneered by Vinogradov (and also Hardy and Littlewood), and refined and simplified over the years with many contributions by Vaughan, Davenport, Heath-Brown, and others, with the upshot being that we now have a fairly good understanding of sums such as (5) and (6), and in particular that the sum (6) exhibits a strong cancellation (by a factor of $O_A(\log^{-A} N)$ for any fixed A) uniformly in $\xi$ (i.e. we can handle both major and minor arcs with a uniform estimate).  Combining this with the inverse theorem and the previous reductions, one can eventually establish the asymptotic (1) in the k=3 case.  (This is not exactly how the original proof of (1) by van der Corput in this case proceeded, but both proofs use the same general ingredients and method, i.e. the Hardy-Littlewood circle method and the Vinogradov method for estimating exponential sums.)

Now we turn to progressions of longer length, such as the case k=4.  Here, linear characters $\chi(n) = e(\xi n)$ continue to cause bias that distorts the expected asymptotic (3), and so it is still necessary to control sums such as (5) or (6) to prevent such bias from occuring.  However, a major new difficulty arises that new sources of bias also arise.  For instance, if one takes a quadratic character $\chi(n) := e(\xi n^2)$ for some $\xi$, then one easily verifies the identity

$\chi(n) \chi(n+r)^{-3} \chi(n+2r)^3 \chi(n+3r)^{-1} = 1$ (11)

which reflects the fact that the third derivative of a quadratic function (such as $n \mapsto \xi n^2$) is zero (it also reflects the fact that three points on the graph of a quadratic (i.e. a parabola) determine the entire parabola).  One consequence of this is that if $\chi(n), \chi(n+r), \chi(n+2r)$ are all close to 1 (say), then $\chi(n+3r)$ will be also.  This constraint between the four values of $\chi$ along an arithmetic progression suggests that if A exhibits significant correlation with $\chi$, then the event that n+3r lies in A will be influenced in some non-trivial manner by whether n, n+r, and n+2r already lie in A, which will lead to some distortion in (3).  Thus one will need to update the inverse theorem by taking quadratic characters into account.  (Easy examples show that it is possible for a set to correlate with a quadratic character without exhibiting any correlation with linear characters, by choosing a quadratic character with irrational coefficients.)  The most optimistic conjecture in this regard would be

Proposed inverse theorem for length four progressions. (Informal statement)  Let k=4. Suppose that A is a subset of $\{1,\ldots,N\}$ of density $\delta$ for which (3) fails.  Then A correlates with a non-trivial quadratic character $\chi(n) = e( \xi_2 n^2 + \xi_1 n)$.

Unfortunately, this conjecture fails.  The easiest way to see this is to consider a bracket quadratic character, such as $\chi(n) = e( \lfloor \sqrt{2} n \rfloor \sqrt{3} n )$, where $\lfloor \rfloor$ is the greatest integer function.  This is not quite a quadratic character, because $\lfloor \rfloor$ is not quite a linear function.  However, this function is linear “a positive fraction of the time”; if one picks x and y to be some generic real numbers, one expects $\lfloor x+y \rfloor$ to equal $\lfloor x \rfloor + \lfloor y \rfloor$ about half of the time.  Because of this, we see that while the identity (11) certainly doesn’t hold all the time for $\chi(n)$, it does hold a positive fraction of the time, and this is enough to still cause significant bias to disrupt (3) if A correlates with this object.  It is furthermore possible to concoct examples of sets A that correlate with bracket quadratic characters such as $e( \lfloor \sqrt{2} n \rfloor \sqrt{3} n )$ but not any linear or quadratic characters.  (The same phenomenon is not visible at the linear level; a bracket linear phase such as $e( \lfloor \sqrt{2} n \rfloor \sqrt{3} )$ can be rewritten as $e( \sqrt{6} n ) e( -\sqrt{3} \{ \sqrt{2} n \} )$, which by Fourier series can be expressed as a linear combination of linear characters $e( (\sqrt{6} + k \sqrt{2}) n)$ for integer k.  Note that the same trick does not work for $e( \lfloor \sqrt{2} n \rfloor \sqrt{3} n)$.)  Once one throws in these bracket quadratics, it turns out that these do in fact constitute all the possible obstructions to (3) holding in the k=4 case, as shown by Ben Green and myself:

Inverse theorem for length four progressions. (Informal statement)  Let k=4. Suppose that A is a subset of $\{1,\ldots,N\}$ of density $\delta$ for which (3) fails.  Then A correlates with a non-trivial bracket quadratic character $\chi(n) = e( \sum_{j=1}^J \lfloor \alpha_j n \rfloor \beta_n j + \xi_2 n^2 + \xi_1 n)$ for some real numbers $\alpha_j, \beta_j, \xi_1, \xi_2$ and bounded J.

The proof of this result involves both Fourier analysis and additive combinatorics, relying heavily on ideas from a paper of Gowers on Szemerédi’s theorem for progressions of length 4.  It will not be discussed here.

In view of the inverse theorem, the problem of establishing the asymptotic (1) for length 4 progressions then reduces (by suitable generalisations of the various methods discussed previously) to that of estimating exponential sums of which

$\sum_{n=1}^N \mu(n) e( \lfloor \sqrt{2} n \rfloor \sqrt{3} n )$

is a typical example.  One can begin to apply the methods of Vinogradov and Vaughan to control this type of expression.  But one is soon faced with the problem of understanding the distribution of quadratic phases such as $\lfloor \sqrt{2} n \rfloor \sqrt{3} n$, and in particular to estimate exponential sums such as

$\sum_{n=1}^N e( \lfloor \sqrt{2} n \rfloor \sqrt{3} n )$.

This turns out to be somewhat unpleasant; the standard technology of Weyl differencing and the van der Corput lemma eventually works (see this paper of Ben Green and myself), but does not scale well to bracket polynomials of higher degree such as $e( \lfloor \lfloor \sqrt{2} n \rfloor \sqrt{3} n \rfloor \sqrt{5} n )$, which would be necessary if one were to extend (1) to k beyond 4.

To resolve this, and inspired by the work in ergodic theory by Furstenberg-Weiss, Host-Kra, Bergelson-Leibman, and others, we re-interpreted bracket polynomials from a more dynamical systems perspective.  To motivate this, observe that the linear character $\chi(n) = e(\alpha n)$ is closely tied to the circle rotation $T: x \mapsto x+\alpha$ on the unit circle ${\Bbb R}/{\Bbb Z}$, in that the character $\chi$ can be described as a function $\chi(n) = F(T^n x_0)$ of an orbit $(T^n x_0)_{n \in {\Bbb Z}}$ on this system, where $x_0=0$ is the origin and $F(x) := e(x)$ is the exponential function.  In a similar spirit, a quadratic character such as $\chi(n) = e(\alpha \frac{n(n-1)}{2})$ can be expressed in terms of the skew shift system $(x,y) \mapsto (x+\alpha,y+x)$ on the torus $({\Bbb R}/{\Bbb Z})^2$, being of the form $\chi(n) = F(T^n x_0)$ where $x_0 := (0,0)$ and $F(x,y) := e(y)$.  More generally, one can also express bracket quadratic polynomials such as $e( \rfloor \sqrt{2} n \rfloor \sqrt{3} n)$ in the form $\chi(n) = F(T^n x_0)$, where T is now the action of a group element $x \mapsto gx$ on a 2-step nilmanifold $G/\Gamma$, and F is some reasonable (e.g. piecewise smooth) function on this nilmanifold.  (See my lecture notes, this paper of Bergelson and Leibman, or this paper of Ben and myself for details.)  The relevance of 2-step nilpotent groups and nilmanifolds to length 4 progressions can be glimpsed in the identity

$(g^n x) (g^{n+r} x)^{-3} (g^{n+3r} x)^{-1} (g^{n+2r} x)^3 = 1$

which is valid for all g, x in a 2-step nilpotent group G (compare this with (11)); it is an instructive exercise to prove this identity and to see how the 2-step nilpotency is used.  (To make the connection more precise, one needs a variant of this constraint in which x lies in $G/\Gamma$ rather than G, which is harder to state; see Ziegler’s thesis, or this paper of Ben and myself, for details.)  Indeed, one can reformulate the inverse theorem for length 4 progressions in an equivalent form:

Inverse theorem for length four progressions, again. (Informal statement)  Let k=4. Suppose that A is a subset of $\{1,\ldots,N\}$ of density $\delta$ for which (3) fails.  Then A correlates with a non-trivial 2-step nilsequence $\chi(n) = F(T^n x_0)$ for some 2-step nilmanifold $G/\Gamma$ (of bounded “complexity”), some group rotation $T: x \mapsto gx$, some starting point $x \in G/\Gamma$, and some function $F: G/\Gamma \to {\Bbb C}$ (also of “bounded complexity”; e.g. bounded Lipschitz norm will do).

The precise formulation of the theorem is a little technical; see this paper of Ben and myself for details.  Using this theorem and all the standard machinery, the task of establishing asymptotics such as (1) in the k=4 case now reduces to that of understanding sums such as

$\sum_{n=1}^N F(T^n x_0)$.

At this point, one can start using the existing theory of equidistribution of orbits on homogeneous spaces $G/\Gamma$ (of which nilmanifolds are an important example).  It turns out that the existing theory is not quite quantitative enough for our purposes, and we had to develop a quantitative analogue of this theory; see these blog posts for more discussion.  Anyway, it all works, and gives asymptotics for progressions of length 4 in the primes, as well as other linear patterns of similar “complexity” (e.g. any non-degenerate system of two equations in four prime unknowns is OK).  To handle higher patterns, what we need is

Inverse conjecture for arithmetic progressions. (Informal statement)  Let $k \geq 3$.  Suppose that A is a subset of $\{1,\ldots,N\}$ of density $\delta$ for which (3) fails.  Then A correlates with a non-trivial $k-2$-step nilsequence $\chi(n) = F(T^n x_0)$ for some (k-2)-step nilmanifold $G/\Gamma$ (of bounded “complexity”), some group rotation $T: x \mapsto gx$, some starting point $x \in G/\Gamma$, and some function $F: G/\Gamma \to {\Bbb C}$ (also of “bounded complexity”).

This is a consequence of (and very closely related to) the inverse conjecture for the Gowers norm.  It is already known for k=3 and k=4, and hopefully the higher k cases will be resolved in the near future, presumably using a mix of techniques from Fourier analysis, additive combinatorics, and ergodic theory.  At this time of writing, this is the only remaining obstacle before we can understand the asymptotics of linear patterns in primes which genuinely involve two or more free parameters (as mentioned earlier, one-parameter problems such as the twin prime conjecture seem well out of reach of these methods for a number of reasons, one of which is that there is definitely no analogue of the inverse conjecture for such one-parameter patterns).