You are currently browsing the monthly archive for March 2016.

Tamar Ziegler and I have just uploaded to the arXiv two related papers: “Concatenation theorems for anti-Gowers-uniform functions and Host-Kra characteoristic factors” and “polynomial patterns in primes“, with the former developing a “quantitative Bessel inequality” for local Gowers norms that is crucial in the latter.

We use the term “concatenation theorem” to denote results in which structural control of a function in two or more “directions” can be “concatenated” into structural control in a joint direction. A trivial example of such a concatenation theorem is the following: if a function {f: {\bf Z} \times {\bf Z} \rightarrow {\bf R}} is constant in the first variable (thus {x \mapsto f(x,y)} is constant for each {y}), and also constant in the second variable (thus {y \mapsto f(x,y)} is constant for each {x}), then it is constant in the joint variable {(x,y)}. A slightly less trivial example: if a function {f: {\bf Z} \times {\bf Z} \rightarrow {\bf R}} is affine-linear in the first variable (thus, for each {y}, there exist {\alpha(y), \beta(y)} such that {f(x,y) = \alpha(y) x + \beta(y)} for all {x}) and affine-linear in the second variable (thus, for each {x}, there exist {\gamma(x), \delta(x)} such that {f(x,y) = \gamma(x)y + \delta(x)} for all {y}) then {f} is a quadratic polynomial in {x,y}; in fact it must take the form

\displaystyle f(x,y) = \epsilon xy + \zeta x + \eta y + \theta \ \ \ \ \ (1)


for some real numbers {\epsilon, \zeta, \eta, \theta}. (This can be seen for instance by using the affine linearity in {y} to show that the coefficients {\alpha(y), \beta(y)} are also affine linear.)

The same phenomenon extends to higher degree polynomials. Given a function {f: G \rightarrow K} from one additive group {G} to another, we say that {f} is of degree less than {d} along a subgroup {H} of {G} if all the {d}-fold iterated differences of {f} along directions in {H} vanish, that is to say

\displaystyle \partial_{h_1} \dots \partial_{h_d} f(x) = 0

for all {x \in G} and {h_1,\dots,h_d \in H}, where {\partial_h} is the difference operator

\displaystyle \partial_h f(x) := f(x+h) - f(x).

(We adopt the convention that the only {f} of degree less than {0} is the zero function.)

We then have the following simple proposition:

Proposition 1 (Concatenation of polynomiality) Let {f: G \rightarrow K} be of degree less than {d_1} along one subgroup {H_1} of {G}, and of degree less than {d_2} along another subgroup {H_2} of {G}, for some {d_1,d_2 \geq 1}. Then {f} is of degree less than {d_1+d_2-1} along the subgroup {H_1+H_2} of {G}.

Note the previous example was basically the case when {G = {\bf Z} \times {\bf Z}}, {H_1 = {\bf Z} \times \{0\}}, {H_2 = \{0\} \times {\bf Z}}, {K = {\bf R}}, and {d_1=d_2=2}.

Proof: The claim is trivial for {d_1=1} or {d_2=1} (in which {f} is constant along {H_1} or {H_2} respectively), so suppose inductively {d_1,d_2 \geq 2} and the claim has already been proven for smaller values of {d_1-1}.

We take a derivative in a direction {h_1 \in H_1} along {h_1} to obtain

\displaystyle T^{-h_1} f = f + \partial_{h_1} f

where {T^{-h_1} f(x) = f(x+h_1)} is the shift of {f} by {-h_1}. Then we take a further shift by a direction {h_2 \in H_2} to obtain

\displaystyle T^{-h_1-h_2} f = T^{-h_2} f + T^{-h_2} \partial_{h_1} f = f + \partial_{h_2} f + T^{-h_2} \partial_{h_1} f

leading to the cocycle equation

\displaystyle \partial_{h_1+h_2} f = \partial_{h_2} f + T^{-h_2} \partial_{h_1} f.

Since {f} has degree less than {d_1} along {H_1} and degree less than {d_2} along {H_2}, {\partial_{h_1} f} has degree less than {d_1-1} along {H_1} and less than {d_2} along {H_2}, so is degree less than {d_1+d_2-2} along {H_1+H_2} by induction hypothesis. Similarly {\partial_{h_2} f} is also of degree less than {d_1+d_2-2} along {H_1+H_2}. Combining this with the cocycle equation we see that {\partial_{h_1+h_2}f} is of degree less than {d_1+d_2-2} along {H_1+H_2} for any {h_1+h_2 \in H_1+H_2}, and hence {f} is of degree less than {d_1+d_2-1} along {H_1+H_2}, as required. \Box

While this proposition is simple, it already illustrates some basic principles regarding how one would go about proving a concatenation theorem:

  • (i) One should perform induction on the degrees {d_1,d_2} involved, and take advantage of the recursive nature of degree (in this case, the fact that a function is of less than degree {d} along some subgroup {H} of directions iff all of its first derivatives along {H} are of degree less than {d-1}).
  • (ii) Structure is preserved by operations such as addition, shifting, and taking derivatives. In particular, if a function {f} is of degree less than {d} along some subgroup {H}, then any derivative {\partial_k f} of {f} is also of degree less than {d} along {H}, even if {k} does not belong to {H}.

Here is another simple example of a concatenation theorem. Suppose an at most countable additive group {G} acts by measure-preserving shifts {T: g \mapsto T^g} on some probability space {(X, {\mathcal X}, \mu)}; we call the pair {(X,T)} (or more precisely {(X, {\mathcal X}, \mu, T)}) a {G}-system. We say that a function {f \in L^\infty(X)} is a generalised eigenfunction of degree less than {d} along some subgroup {H} of {G} and some {d \geq 1} if one has

\displaystyle T^h f = \lambda_h f

almost everywhere for all {h \in H}, and some functions {\lambda_h \in L^\infty(X)} of degree less than {d-1} along {H}, with the convention that a function has degree less than {0} if and only if it is equal to {1}. Thus for instance, a function {f} is an generalised eigenfunction of degree less than {1} along {H} if it is constant on almost every {H}-ergodic component of {G}, and is a generalised function of degree less than {2} along {H} if it is an eigenfunction of the shift action on almost every {H}-ergodic component of {G}. A basic example of a higher order eigenfunction is the function {f(x,y) := e^{2\pi i y}} on the skew shift {({\bf R}/{\bf Z})^2} with {{\bf Z}} action given by the generator {T(x,y) := (x+\alpha,y+x)} for some irrational {\alpha}. One can check that {T^h f = \lambda_h f} for every integer {h}, where {\lambda_h: x \mapsto e^{2\pi i \binom{h}{2} \alpha} e^{2\pi i h x}} is a generalised eigenfunction of degree less than {2} along {{\bf Z}}, so {f} is of degree less than {3} along {{\bf Z}}.

We then have

Proposition 2 (Concatenation of higher order eigenfunctions) Let {(X,T)} be a {G}-system, and let {f \in L^\infty(X)} be a generalised eigenfunction of degree less than {d_1} along one subgroup {H_1} of {G}, and a generalised eigenfunction of degree less than {d_2} along another subgroup {H_2} of {G}, for some {d_1,d_2 \geq 1}. Then {f} is a generalised eigenfunction of degree less than {d_1+d_2-1} along the subgroup {H_1+H_2} of {G}.

The argument is almost identical to that of the previous proposition and is left as an exercise to the reader. The key point is the point (ii) identified earlier: the space of generalised eigenfunctions of degree less than {d} along {H} is preserved by multiplication and shifts, as well as the operation of “taking derivatives” {f \mapsto \lambda_k} even along directions {k} that do not lie in {H}. (To prove this latter claim, one should restrict to the region where {f} is non-zero, and then divide {T^k f} by {f} to locate {\lambda_k}.)

A typical example of this proposition in action is as follows: consider the {{\bf Z}^2}-system given by the {3}-torus {({\bf R}/{\bf Z})^3} with generating shifts

\displaystyle T^{(1,0)}(x,y,z) := (x+\alpha,y,z+y)

\displaystyle T^{(0,1)}(x,y,z) := (x,y+\alpha,z+x)

for some irrational {\alpha}, which can be checked to give a {{\bf Z}^2} action

\displaystyle T^{(n,m)}(x,y,z) := (x+n\alpha, y+m\alpha, z+ny+mx+nm\alpha).

The function {f(x,y,z) := e^{2\pi i z}} can then be checked to be a generalised eigenfunction of degree less than {2} along {{\bf Z} \times \{0\}}, and also less than {2} along {\{0\} \times {\bf Z}}, and less than {3} along {{\bf Z}^2}. One can view this example as the dynamical systems translation of the example (1) (see this previous post for some more discussion of this sort of correspondence).

The main results of our concatenation paper are analogues of these propositions concerning a more complicated notion of “polynomial-like” structure that are of importance in additive combinatorics and in ergodic theory. On the ergodic theory side, the notion of structure is captured by the Host-Kra characteristic factors {Z^{<d}_H(X)} of a {G}-system {X} along a subgroup {H}. These factors can be defined in a number of ways. One is by duality, using the Gowers-Host-Kra uniformity seminorms (defined for instance here) {\| \|_{U^d_H(X)}}. Namely, {Z^{<d}_H(X)} is the factor of {X} defined up to equivalence by the requirement that

\displaystyle \|f\|_{U^d_H(X)} = 0 \iff {\bf E}(f | Z^{<d}_H(X) ) = 0.

An equivalent definition is in terms of the dual functions {{\mathcal D}^d_H(f)} of {f} along {H}, which can be defined recursively by setting {{\mathcal D}^0_H(f) = 1} and

\displaystyle {\mathcal D}^d_H(f) = {\bf E}_h T^h f {\mathcal D}^{d-1}( f \overline{T^h f} )

where {{\bf E}_h} denotes the ergodic average along a Følner sequence in {G} (in fact one can also define these concepts in non-amenable abelian settings as per this previous post). The factor {Z^{<d}_H(X)} can then be alternately defined as the factor generated by the dual functions {{\mathcal D}^d_H(f)} for {f \in L^\infty(X)}.

In the case when {G=H={\bf Z}} and {X} is {G}-ergodic, a deep theorem of Host and Kra shows that the factor {Z^{<d}_H(X)} is equivalent to the inverse limit of nilsystems of step less than {d}. A similar statement holds with {{\bf Z}} replaced by any finitely generated group by Griesmer, while the case of an infinite vector space over a finite field was treated in this paper of Bergelson, Ziegler, and myself. The situation is more subtle when {X} is not {G}-ergodic, or when {X} is {G}-ergodic but {H} is a proper subgroup of {G} acting non-ergodically, when one has to start considering measurable families of directional nilsystems; see for instance this paper of Austin for some of the subtleties involved (for instance, higher order group cohomology begins to become relevant!).

One of our main theorems is then

Proposition 3 (Concatenation of characteristic factors) Let {(X,T)} be a {G}-system, and let {f} be measurable with respect to the factor {Z^{<d_1}_{H_1}(X)} and with respect to the factor {Z^{<d_2}_{H_2}(X)} for some {d_1,d_2 \geq 1} and some subgroups {H_1,H_2} of {G}. Then {f} is also measurable with respect to the factor {Z^{<d_1+d_2-1}_{H_1+H_2}(X)}.

We give two proofs of this proposition in the paper; an ergodic-theoretic proof using the Host-Kra theory of “cocycles of type {<d} (along a subgroup {H})”, which can be used to inductively describe the factors {Z^{<d}_H}, and a combinatorial proof based on a combinatorial analogue of this proposition which is harder to state (but which roughly speaking asserts that a function which is nearly orthogonal to all bounded functions of small {U^{d_1}_{H_1}} norm, and also to all bounded functions of small {U^{d_2}_{H_2}} norm, is also nearly orthogonal to alll bounded functions of small {U^{d_1+d_2-1}_{H_1+H_2}} norm). The combinatorial proof parallels the proof of Proposition 2. A key point is that dual functions {F := {\mathcal D}^d_H(f)} obey a property analogous to being a generalised eigenfunction, namely that

\displaystyle T^h F = {\bf E}_k \lambda_{h,k} F_k

where {F_k := T^k F} and {\lambda_{h,k} := {\mathcal D}^{d-1}( T^h f \overline{T^k f} )} is a “structured function of order {d-1}” along {H}. (In the language of this previous paper of mine, this is an assertion that dual functions are uniformly almost periodic of order {d}.) Again, the point (ii) above is crucial, and in particular it is key that any structure that {F} has is inherited by the associated functions {\lambda_{h,k}} and {F_k}. This sort of inheritance is quite easy to accomplish in the ergodic setting, as there is a ready-made language of factors to encapsulate the concept of structure, and the shift-invariance and {\sigma}-algebra properties of factors make it easy to show that just about any “natural” operation one performs on a function measurable with respect to a given factor, returns a function that is still measurable in that factor. In the finitary combinatorial setting, though, encoding the fact (ii) becomes a remarkably complicated notational nightmare, requiring a huge amount of “epsilon management” and “second-order epsilon management” (in which one manages not only scalar epsilons, but also function-valued epsilons that depend on other parameters). In order to avoid all this we were forced to utilise a nonstandard analysis framework for the combinatorial theorems, which made the arguments greatly resemble the ergodic arguments in many respects (though the two settings are still not equivalent, see this previous blog post for some comparisons between the two settings). Unfortunately the arguments are still rather complicated.

For combinatorial applications, dual formulations of the concatenation theorem are more useful. A direct dualisation of the theorem yields the following decomposition theorem: a bounded function which is small in {U^{d_1+d_2-1}_{H_1+H_2}} norm can be split into a component that is small in {U^{d_1}_{H_1}} norm, and a component that is small in {U^{d_2}_{H_2}} norm. (One may wish to understand this type of result by first proving the following baby version: any function that has mean zero on every coset of {H_1+H_2}, can be decomposed as the sum of a function that has mean zero on every {H_1} coset, and a function that has mean zero on every {H_2} coset. This is dual to the assertion that a function that is constant on every {H_1} coset and constant on every {H_2} coset, is constant on every {H_1+H_2} coset.) Combining this with some standard “almost orthogonality” arguments (i.e. Cauchy-Schwarz) give the following Bessel-type inequality: if one has a lot of subgroups {H_1,\dots,H_k} and a bounded function is small in {U^{2d-1}_{H_i+H_j}} norm for most {i,j}, then it is also small in {U^d_{H_i}} norm for most {i}. (Here is a baby version one may wish to warm up on: if a function {f} has small mean on {({\bf Z}/p{\bf Z})^2} for some large prime {p}, then it has small mean on most of the cosets of most of the one-dimensional subgroups of {({\bf Z}/p{\bf Z})^2}.)

There is also a generalisation of the above Bessel inequality (as well as several of the other results mentioned above) in which the subgroups {H_i} are replaced by more general coset progressions {H_i+P_i} (of bounded rank), so that one has a Bessel inequailty controlling “local” Gowers uniformity norms such as {U^d_{P_i}} by “global” Gowers uniformity norms such as {U^{2d-1}_{P_i+P_j}}. This turns out to be particularly useful when attempting to compute polynomial averages such as

\displaystyle \sum_{n \leq N} \sum_{r \leq \sqrt{N}} f(n) g(n+r^2) h(n+2r^2) \ \ \ \ \ (2)


for various functions {f,g,h}. After repeated use of the van der Corput lemma, one can control such averages by expressions such as

\displaystyle \sum_{n \leq N} \sum_{h,m,k \leq \sqrt{N}} f(n) f(n+mh) f(n+mk) f(n+m(h+k))

(actually one ends up with more complicated expressions than this, but let’s use this example for sake of discussion). This can be viewed as an average of various {U^2} Gowers uniformity norms of {f} along arithmetic progressions of the form {\{ mh: h \leq \sqrt{N}\}} for various {m \leq \sqrt{N}}. Using the above Bessel inequality, this can be controlled in turn by an average of various {U^3} Gowers uniformity norms along rank two generalised arithmetic progressions of the form {\{ m_1 h_1 + m_2 h_2: h_1,h_2 \le \sqrt{N}\}} for various {m_1,m_2 \leq \sqrt{N}}. But for generic {m_1,m_2}, this rank two progression is close in a certain technical sense to the “global” interval {\{ n: n \leq N \}} (this is ultimately due to the basic fact that two randomly chosen large integers are likely to be coprime, or at least have a small gcd). As a consequence, one can use the concatenation theorems from our first paper to control expressions such as (2) in terms of global Gowers uniformity norms. This is important in number theoretic applications, when one is interested in computing sums such as

\displaystyle \sum_{n \leq N} \sum_{r \leq \sqrt{N}} \mu(n) \mu(n+r^2) \mu(n+2r^2)


\displaystyle \sum_{n \leq N} \sum_{r \leq \sqrt{N}} \Lambda(n) \Lambda(n+r^2) \Lambda(n+2r^2)

where {\mu} and {\Lambda} are the Möbius and von Mangoldt functions respectively. This is because we are able to control global Gowers uniformity norms of such functions (thanks to results such as the proof of the inverse conjecture for the Gowers norms, the orthogonality of the Möbius function with nilsequences, and asymptotics for linear equations in primes), but much less control is currently available for local Gowers uniformity norms, even with the assistance of the generalised Riemann hypothesis (see this previous blog post for some further discussion).

By combining these tools and strategies with the “transference principle” approach from our previous paper (as improved using the recent “densification” technique of Conlon, Fox, and Zhao, discussed in this previous post), we are able in particular to establish the following result:

Theorem 4 (Polynomial patterns in the primes) Let {P_1,\dots,P_k: {\bf Z} \rightarrow {\bf Z}} be polynomials of degree at most {d}, whose degree {d} coefficients are all distinct, for some {d \geq 1}. Suppose that {P_1,\dots,P_k} is admissible in the sense that for every prime {p}, there are {n,r} such that {n+P_1(r),\dots,n+P_k(r)} are all coprime to {p}. Then there exist infinitely many pairs {n,r} of natural numbers such that {n+P_1(r),\dots,n+P_k(r)} are prime.

Furthermore, we obtain an asymptotic for the number of such pairs {n,r} in the range {n \leq N}, {r \leq N^{1/d}} (actually for minor technical reasons we reduce the range of {r} to be very slightly less than {N^{1/d}}). In fact one could in principle obtain asymptotics for smaller values of {r}, and relax the requirement that the degree {d} coefficients be distinct with the requirement that no two of the {P_i} differ by a constant, provided one had good enough local uniformity results for the Möbius or von Mangoldt functions. For instance, we can obtain an asymptotic for triplets of the form {n, n+r,n+r^d} unconditionally for {d \leq 5}, and conditionally on GRH for all {d}, using known results on primes in short intervals on average.

The {d=1} case of this theorem was obtained in a previous paper of myself and Ben Green (using the aforementioned conjectures on the Gowers uniformity norm and the orthogonality of the Möbius function with nilsequences, both of which are now proven). For higher {d}, an older result of Tamar and myself was able to tackle the case when {P_1(0)=\dots=P_k(0)=0} (though our results there only give lower bounds on the number of pairs {(n,r)}, and no asymptotics). Both of these results generalise my older theorem with Ben Green on the primes containing arbitrarily long arithmetic progressions. The theorem also extends to multidimensional polynomials, in which case there are some additional previous results; see the paper for more details. We also get a technical refinement of our previous result on narrow polynomial progressions in (dense subsets of) the primes by making the progressions just a little bit narrower in the case of the density of the set one is using is small.

. This latter Bessel type inequality is particularly useful in combinatorial and number-theoretic applications, as it allows one to convert “global” Gowers uniformity norm (basically, bounds on norms such as {U^{2d-1}_{H_i+H_j}}) to “local” Gowers uniformity norm control.

There is a very nice recent paper by Lemke Oliver and Soundararajan (complete with a popular science article about it by the consistently excellent Erica Klarreich for Quanta) about a surprising (but now satisfactorily explained) bias in the distribution of pairs of consecutive primes {p_n, p_{n+1}} when reduced to a small modulus {q}.

This phenomenon is superficially similar to the more well known Chebyshev bias concerning the reduction of a single prime {p_n} to a small modulus {q}, but is in fact a rather different (and much stronger) bias than the Chebyshev bias, and seems to arise from a completely different source. The Chebyshev bias asserts, roughly speaking, that a randomly selected prime {p} of a large magnitude {x} will typically (though not always) be slightly more likely to be a quadratic non-residue modulo {q} than a quadratic residue, but the bias is small (the difference in probabilities is only about {O(1/\sqrt{x})} for typical choices of {x}), and certainly consistent with known or conjectured positive results such as Dirichlet’s theorem or the generalised Riemann hypothesis. The reason for the Chebyshev bias can be traced back to the von Mangoldt explicit formula which relates the distribution of the von Mangoldt function {\Lambda} modulo {q} with the zeroes of the {L}-functions with period {q}. This formula predicts (assuming some standard conjectures like GRH) that the von Mangoldt function {\Lambda} is quite unbiased modulo {q}. The von Mangoldt function is mostly concentrated in the primes, but it also has a medium-sized contribution coming from squares of primes, which are of course all located in the quadratic residues modulo {q}. (Cubes and higher powers of primes also make a small contribution, but these are quite negligible asymptotically.) To balance everything out, the contribution of the primes must then exhibit a small preference towards quadratic non-residues, and this is the Chebyshev bias. (See this article of Rubinstein and Sarnak for a more technical discussion of the Chebyshev bias, and this survey of Granville and Martin for an accessible introduction. The story of the Chebyshev bias is also related to Skewes’ number, once considered the largest explicit constant to naturally appear in a mathematical argument.)

The paper of Lemke Oliver and Soundararajan considers instead the distribution of the pairs {(p_n \hbox{ mod } q, p_{n+1} \hbox{ mod } q)} for small {q} and for large consecutive primes {p_n, p_{n+1}}, say drawn at random from the primes comparable to some large {x}. For sake of discussion let us just take {q=3}. Then all primes {p_n} larger than {3} are either {1 \hbox{ mod } 3} or {2 \hbox{ mod } 3}; Chebyshev’s bias gives a very slight preference to the latter (of order {O(1/\sqrt{x})}, as discussed above), but apart from this, we expect the primes to be more or less equally distributed in both classes. For instance, assuming GRH, the probability that {p_n} lands in {1 \hbox{ mod } 3} would be {1/2 + O( x^{-1/2+o(1)} )}, and similarly for {2 \hbox{ mod } 3}.

In view of this, one would expect that up to errors of {O(x^{-1/2+o(1)})} or so, the pair {(p_n \hbox{ mod } 3, p_{n+1} \hbox{ mod } 3)} should be equally distributed amongst the four options {(1 \hbox{ mod } 3, 1 \hbox{ mod } 3)}, {(1 \hbox{ mod } 3, 2 \hbox{ mod } 3)}, {(2 \hbox{ mod } 3, 1 \hbox{ mod } 3)}, {(2 \hbox{ mod } 3, 2 \hbox{ mod } 3)}, thus for instance the probability that this pair is {(1 \hbox{ mod } 3, 1 \hbox{ mod } 3)} would naively be expected to be {1/4 + O(x^{-1/2+o(1)})}, and similarly for the other three tuples. These assertions are not yet proven (although some non-trivial upper and lower bounds for such probabilities can be obtained from recent work of Maynard).

However, Lemke Oliver and Soundararajan argue (backed by both plausible heuristic arguments (based ultimately on the Hardy-Littlewood prime tuples conjecture), as well as substantial numerical evidence) that there is a significant bias away from the tuples {(1 \hbox{ mod } 3, 1 \hbox{ mod } 3)} and {(2 \hbox{ mod } 3, 2 \hbox{ mod } 3)} – informally, adjacent primes don’t like being in the same residue class! For instance, they predict that the probability of attaining {(1 \hbox{ mod } 3, 1 \hbox{ mod } 3)} is in fact

\displaystyle  \frac{1}{4} - \frac{1}{8} \frac{\log\log x}{\log x} + O( \frac{1}{\log x} )

with similar predictions for the other three pairs (in fact they give a somewhat more precise prediction than this). The magnitude of this bias, being comparable to {\log\log x / \log x}, is significantly stronger than the Chebyshev bias of {O(1/\sqrt{x})}.

One consequence of this prediction is that the prime gaps {p_{n+1}-p_n} are slightly less likely to be divisible by {3} than naive random models of the primes would predict. Indeed, if the four options {(1 \hbox{ mod } 3, 1 \hbox{ mod } 3)}, {(1 \hbox{ mod } 3, 2 \hbox{ mod } 3)}, {(2 \hbox{ mod } 3, 1 \hbox{ mod } 3)}, {(2 \hbox{ mod } 3, 2 \hbox{ mod } 3)} all occurred with equal probability {1/4}, then {p_{n+1}-p_n} should equal {0 \hbox{ mod } 3} with probability {1/2}, and {1 \hbox{ mod } 3} and {2 \hbox{ mod } 3} with probability {1/4} each (as would be the case when taking the difference of two random numbers drawn from those integers not divisible by {3}); but the Lemke Oliver-Soundararajan bias predicts that the probability of {p_{n+1}-p_n} being divisible by three should be slightly lower, being approximately {1/2 - \frac{1}{4} \frac{\log\log x}{\log x}}.

Below the fold we will give a somewhat informal justification of (a simplified version of) this phenomenon, based on the Lemke Oliver-Soundararajan calculation using the prime tuples conjecture.

Read the rest of this entry »

Van Vu and I just posted to the arXiv our paper “sum-free sets in groups” (submitted to Discrete Analysis), as well as a companion survey article (submitted to J. Comb.). Given a subset {A} of an additive group {G = (G,+)}, define the quantity {\phi(A)} to be the cardinality of the largest subset {B} of {A} which is sum-free in {A} in the sense that all the sums {b_1+b_2} with {b_1,b_2} distinct elements of {B} lie outside of {A}. For instance, if {A} is itself a group, then {\phi(A)=1}, since no two elements of {A} can sum to something outside of {A}. More generally, if {A} is the union of {k} groups, then {\phi(A)} is at most {k}, thanks to the pigeonhole principle.

If {G} is the integers, then there are no non-trivial subgroups, and one can thus expect {\phi(A)} to start growing with {A}. For instance, one has the following easy result:

Proposition 1 Let {A} be a set of {2^k} natural numbers. Then {\phi(A) > k}.

Proof: We use an argument of Ruzsa, which is based in turn on an older argument of Choi. Let {x_1} be the largest element of {A}, and then recursively, once {x_1,\dots,x_i} has been selected, let {x_{i+1}} be the largest element of {A} not equal to any of the {x_1,\dots,x_i}, such that {x_{i+1}+x_j \not \in A} for all {j=1,\dots,i}, terminating this construction when no such {x_{i+1}} can be located. This gives a sequence {x_1 > x_2 > \dots > x_m} of elements in {A} which are sum-free in {A}, and with the property that for any {y \in A}, either {y} is equal to one of the {x_i}, or else {y + x_i \in A} for some {i} with {x_i > y}. Iterating this, we see that any {y \in A} is of the form {x_{i_1} - x_{i_2} - \dots - x_{i_j}} for some {j \geq 1} and {1 \leq i_1 < i_2 < \dots \leq i_j \leq m}. The number of such expressions {x_{i_1} - x_{i_2} - \dots - x_{i_j}} is at most {2^{m}-1}, thus {2^k \leq 2^m-1} which implies {m \geq k+1}. Since {\phi(A) \geq m}, the claim follows. \Box

In particular, we have {\phi(A) \gg \log |A|} for subsets {A} of the integers. It has been possible to improve upon this easy bound, but only with remarkable effort. The best lower bound currently is

\displaystyle \phi(A) \geq \log |A| (\log\log|A|)^{1/2 - o(1)},

a result of Shao (building upon earlier work of Sudakov, Szemeredi, and Vu and of Dousse). In the opposite direction, a construction of Ruzsa gives examples of large sets {A} with {\phi(A) \leq \exp( O( \sqrt{\log |A|} ) )}.

Using the standard tool of Freiman homomorphisms, the above results for the integers extend to other torsion-free abelian groups {G}. In our paper we study the opposite case where {G} is finite (but still abelian). In this paper of Erdös (in which the quantity {\phi(A)} was first introduced), the following question was posed: if {A} is sufficiently large depending on {\phi(A)}, does this imply the existence of two elements {x,y \in A} with {x+y=0}? As it turns out, we were able to find some simple counterexamples to this statement. For instance, if {H} is any finite additive group, then the set {A := \{ 1 \hbox{ mod } 7, 2 \hbox{ mod } 7, 4 \hbox{ mod } 7\} \times H \subset {\bf Z}/7{\bf Z} \times H} has {\phi(A)=3} but with no {x,y \in A} summing to zero; this type of example in fact works with {7} replaced by any larger Mersenne prime, and we also have a counterexample in {{\bf Z}/2^n{\bf Z}} for {n} arbitrarily large. However, in the positive direction, we can show that the answer to Erdös’s question is positive if {|G|} is assumed to have no small prime factors. That is to say,

Theorem 2 For every {k \geq 1} there exists {C \geq 1} such that if {G} is a finite abelian group whose order is not divisible by any prime less than or equal to {C}, and {A} is a subset of {G} with order at least {C} and {\phi(A) \leq k}, then there exist {x,y \in A} with {x+y=0}.

There are two main tools used to prove this result. One is an “arithmetic removal lemma” proven by Král, Serra, and Vena. Note that the condition {\phi(A) \leq k} means that for any distinct {x_1,\dots,x_{k+1} \in A}, at least one of the {x_i+x_j}, {1 \leq i < j \leq k+1}, must also lie in {A}. Roughly speaking, the arithmetic removal lemma allows one to “almost” remove the requirement that {x_1,\dots,x_{k+1}} be distinct, which basically now means that {x \in A \implies 2x \in A} for almost all {x \in A}. This near-dilation symmetry, when combined with the hypothesis that {|G|} has no small prime factors, gives a lot of “dispersion” in the Fourier coefficients of {1_A} which can now be exploited to prove the theorem.

The second tool is the following structure theorem, which is the main result of our paper, and goes a fair ways towards classifying sets {A} for which {\phi(A)} is small:

Theorem 3 Let {A} be a finite subset of an arbitrary additive group {G}, with {\phi(A) \leq k}. Then one can find finite subgroups {H_1,\dots,H_m} with {m \leq k} such that {|A \cap H_i| \gg_k |H_i|} and {|A \backslash (H_1 \cup \dots \cup H_m)| \ll_k 1}. Furthermore, if {m=k}, then the exceptional set {A \backslash (H_1 \cup \dots \cup H_m)} is empty.

Roughly speaking, this theorem shows that the example of the union of {k} subgroups mentioned earlier is more or less the “only” example of sets {A} with {\phi(A) \leq k}, modulo the addition of some small exceptional sets and some refinement of the subgroups to dense subsets.

This theorem has the flavour of other inverse theorems in additive combinatorics, such as Freiman’s theorem, and indeed one can use Freiman’s theorem (and related tools, such as the Balog-Szemeredi theorem) to easily get a weaker version of this theorem. Indeed, if there are no sum-free subsets of {A} of order {k+1}, then a fraction {\gg_k 1} of all pairs {a,b} in {A} must have their sum also in {A} (otherwise one could take {k+1} random elements of {A} and they would be sum-free in {A} with positive probability). From this and the Balog-Szemeredi theorem and Freiman’s theorem (in arbitrary abelian groups, as established by Green and Ruzsa), we see that {A} must be “commensurate” with a “coset progression” {H+P} of bounded rank. One can then eliminate the torsion-free component {P} of this coset progression by a number of methods (e.g. by using variants of the argument in Proposition 1), with the upshot being that one can locate a finite group {H_1} that has large intersection with {A}.

At this point it is tempting to simply remove {H_1} from {A} and iterate. But one runs into a technical difficulty that removing a set such as {H_1} from {A} can alter the quantity {\phi(A)} in unpredictable ways, so one has to still keep {H_1} around when analysing the residual set {A \backslash H_1}. A second difficulty is that the latter set {A \backslash H_1} could be considerably smaller than {A} or {H_1}, but still large in absolute terms, so in particular any error term whose size is only bounded by {\varepsilon |A|} for a small {\varepsilon} could be massive compared with the residual set {A\backslash H_1}, and so such error terms would be unacceptable. One can get around these difficulties if one first performs some preliminary “normalisation” of the group {H_1}, so that the residual set {A \backslash H_1} does not intersect any coset of {H_1} too strongly. The arguments become even more complicated when one starts removing more than one group {H_1,\dots,H_i} from {A} and analyses the residual set {A \backslash (H_1 \cup \dots \cup H_i)}; indeed the “epsilon management” involved became so fearsomely intricate that we were forced to use a nonstandard analysis formulation of the problem in order to keep the complexity of the argument at a reasonable level (cf. my previous blog post on this topic). One drawback of doing so is that we have no effective bounds for the implied constants in our main theorem; it would be of interest to obtain a more direct proof of our main theorem that would lead to effective bounds.

I’ve just uploaded to the arXiv my paper Finite time blowup for high dimensional nonlinear wave systems with bounded smooth nonlinearity, submitted to Comm. PDE. This paper is in the same spirit as (though not directly related to) my previous paper on finite time blowup of supercritical NLW systems, and was inspired by a question posed to me some time ago by Jeffrey Rauch. Here, instead of looking at supercritical equations, we look at an extremely subcritical equation, namely a system of the form

\displaystyle \Box u = f(u) \ \ \ \ \ (1)


where {u: {\bf R}^{1+d} \rightarrow {\bf R}^m} is the unknown field, and {f: {\bf R}^m \rightarrow {\bf R}^m} is the nonlinearity, which we assume to have all derivatives bounded. A typical example of such an equation is the higher-dimensional sine-Gordon equation

\displaystyle \Box u = \sin u

for a scalar field {u: {\bf R}^{1+d} \rightarrow {\bf R}}. Here {\Box = -\partial_t^2 + \Delta} is the d’Alembertian operator. We restrict attention here to classical (i.e. smooth) solutions to (1).

We do not assume any Hamiltonian structure, so we do not require {f} to be a gradient {f = \nabla F} of a potential {F: {\bf R}^m \rightarrow {\bf R}}. But even without such Hamiltonian structure, the equation (1) is very well behaved, with many a priori bounds available. For instance, if the initial position {u_0(x) = u(0,x)} and initial velocity {u_1(x) = \partial_t u(0,x)} are smooth and compactly supported, then from finite speed of propagation {u(t)} has uniformly bounded compact support for all {t} in a bounded interval. As the nonlinearity {f} is bounded, this immediately places {f(u)} in {L^\infty_t L^2_x} in any bounded time interval, which by the energy inequality gives an a priori {L^\infty_t H^1_x} bound on {u} in this time interval. Next, from the chain rule we have

\displaystyle \nabla f(u) = (\nabla_{{\bf R}^m} f)(u) \nabla u

which (from the assumption that {\nabla_{{\bf R}^m} f} is bounded) shows that {f(u)} is in {L^\infty_t H^1_x}, which by the energy inequality again now gives an a priori {L^\infty_t H^2_x} bound on {u}.

One might expect that one could keep iterating this and obtain a priori bounds on {u} in arbitrarily smooth norms. In low dimensions such as {d \leq 3}, this is a fairly easy task, since the above estimates and Sobolev embedding already place one in {L^\infty_t L^\infty_x}, and the nonlinear map {f} is easily verified to preserve the space {L^\infty_t H^k_x \cap L^\infty_t L^\infty_x} for any natural number {k}, from which one obtains a priori bounds in any Sobolev space; from this and standard energy methods, one can then establish global regularity for this equation (that is to say, any smooth choice of initial data generates a global smooth solution). However, one starts running into trouble in higher dimensions, in which no {L^\infty_x} bound is available. The main problem is that even a really nice nonlinearity such as {u \mapsto \sin u} is unbounded in higher Sobolev norms. The estimates

\displaystyle |\sin u| \leq |u|


\displaystyle |\nabla(\sin u)| \leq |\nabla u|

ensure that the map {u \mapsto \sin u} is bounded in low regularity spaces like {L^2_x} or {H^1_x}, but one already runs into trouble with the second derivative

\displaystyle \nabla^2(\sin u) = (\cos u) \nabla^2 u - (\sin u) \nabla u \nabla u

where there is a troublesome lower order term of size {O( |\nabla u|^2 )} which becomes difficult to control in higher dimensions, preventing the map {u \mapsto \sin u} to be bounded in {H^2_x}. Ultimately, the issue here is that when {u} is not controlled in {L^\infty}, the function {\sin u} can oscillate at a much higher frequency than {u}; for instance, if {u} is the one-dimensional wave {u = A \sin(kx)}for some {k > 0} and {A>1}, then {u} oscillates at frequency {k}, but the function {\sin(u)= \sin(A \sin(kx))} more or less oscillates at the larger frequency {Ak}.

In medium dimensions, it is possible to use dispersive estimates for the wave equation (such as the famous Strichartz estimates) to overcome these problems. This line of inquiry was pursued (albeit for slightly different classes of nonlinearity {f} than those considered here) by Heinz-von Wahl, Pecher (in a series of papers), Brenner, and Brenner-von Wahl; to cut a long story short, one of the conclusions of these papers was that one had global regularity for equations such as (1) in dimensions {d \leq 9}. (I reprove this result using modern Strichartz estimate and Littlewood-Paley techniques in an appendix to my paper. The references given also allow for some growth in the nonlinearity {f}, but we will not detail the precise hypotheses used in these papers here.)

In my paper, I complement these positive results with an almost matching negative result:

Theorem 1 If {d \geq 11} and {m \geq 2}, then there exists a nonlinearity {f: {\bf R}^m \rightarrow {\bf R}^m} with all derivatives bounded, and a solution {u} to (1) that is smooth at time zero, but develops a singularity in finite time.

The construction crucially relies on the ability to choose the nonlinearity {f}, and also needs some injectivity properties on the solution {u: {\bf R}^{1+d} \rightarrow {\bf R}^m} (after making a symmetry reduction using an assumption of spherical symmetry to view {u} as a function of {1+1} variables rather than {1+d}) which restricts our counterexample to the {m \geq 2} case. Thus the model case of the higher-dimensional sine-Gordon equation {\Box u =\sin u} is not covered by our arguments. Nevertheless (as with previous finite-time blowup results discussed on this blog), one can view this result as a barrier to trying to prove regularity for equations such as {\Box u = \sin u} in eleven and higher dimensions, as any such argument must somehow use a property of that equation that is not applicable to the more general system (1).

Let us first give some back-of-the-envelope calculations suggesting why there could be finite time blowup in eleven and higher dimensions. For sake of this discussion let us restrict attention to the sine-Gordon equation {\Box u = \sin u}. The blowup ansatz we will use is as follows: for each frequency {N_j} in a sequence {1 < N_1 < N_2 < N_3 < \dots} of large quantities going to infinity, there will be a spacetime “cube” {Q_j = \{ (t,x): t \sim \frac{1}{N_j}; x = O(\frac{1}{N_j})\}} on which the solution {u} oscillates with “amplitude” {N_j^\alpha} and “frequency” {N_j}, where {\alpha>0} is an exponent to be chosen later; this ansatz is of course compatible with the uncertainty principle. Since {N_j^\alpha \rightarrow \infty} as {j \rightarrow \infty}, this will create a singularity at the spacetime origin {(0,0)}. To make this ansatz plausible, we wish to make the oscillation of {u} on {Q_j} driven primarily by the forcing term {\sin u} at {Q_{j-1}}. Thus, by Duhamel’s formula, we expect a relation roughly of the form

\displaystyle u(t,x) \approx \int \frac{\sin((s-t)\sqrt{-\Delta})}{\sqrt{-\Delta}} \sin(1_{Q_{j-1}} u(s)) (x)\ ds

on {Q_j}, where {\frac{\sin((s-t)\sqrt{-\Delta})}{\sqrt{-\Delta}}} is the usual free wave propagator, and {1_{Q_{j-1}}} is the indicator function of {Q_{j-1}}.

On {Q_{j-1}}, {u} oscillates with amplitude {N_{j-1}^\alpha} and frequency {N_{j-1}}, we expect the derivative {\nabla_{t,x} u} to be of size about {N_{j-1}^{\alpha+1}}, and so from the principle of stationary phase we expect {\sin(u)} to oscillate at frequency about {N_{j-1}^{\alpha+1}}. Since the wave propagator {\frac{\sin((s-t)\sqrt{-\Delta})}{\sqrt{-\Delta}}} preserves frequencies, and {u} is supposed to be of frequency {N_j} on {Q_j} we are thus led to the requirement

\displaystyle N_j \approx N_{j-1}^{\alpha+1}. \ \ \ \ \ (2)


Next, when restricted to frequencies of order {N_{j}}, the propagator {\frac{\sin((s-t)\sqrt{-\Delta})}{\sqrt{-\Delta}}} “behaves like” {N_{j}^{\frac{d-3}{2}} (s-t)^{\frac{d-1}{2}} A_{s-t}}, where {A_{s-t}} is the spherical averaging operator

\displaystyle A_{s-t} f(x) := \frac{1}{\omega_{d-1}} \int_{S^{d-1}} f(x + (s-t)\theta)\ d\theta

where {d\theta} is surface measure on the unit sphere {S^{d-1}}, and {\omega_{d-1}} is the volume of that sphere. In our setting, {s-t} is comparable to {1/N_{j-1}}, and so we have the informal approximation

\displaystyle u(t,x) \approx N_j^{\frac{d-3}{2}} N_{j-1}^{-\frac{d-1}{2}} \int_{s \sim 1/N_{j-1}} A_{s-t} \sin(u(s))(x)\ ds

on {Q_j}.

Since {\sin(u(s))} is bounded, {A_{s-t} \sin(u(s))} is bounded as well. This gives a (non-rigorous) upper bound

\displaystyle u(t,x) \lessapprox N_j^{\frac{d-3}{2}} N_{j-1}^{-\frac{d-1}{2}} \frac{1}{N_{j-1}}

which when combined with our ansatz that {u} has ampitude about {N_j^\alpha} on {Q_j}, gives the constraint

\displaystyle N_j^\alpha \lessapprox N_j^{\frac{d-3}{2}} N_{j-1}^{-\frac{d-1}{2}} \frac{1}{N_{j-1}}

which on applying (2) gives the further constraint

\displaystyle \alpha(\alpha+1) \leq \frac{d-3}{2} (\alpha+1) - \frac{d-1}{2} - 1

which can be rearranged as

\displaystyle \left(\alpha - \frac{d-5}{4}\right)^2 \leq \frac{d^2-10d-7}{16}.

It is now clear that the optimal choice of {\alpha} is

\displaystyle \alpha = \frac{d-5}{4},

and this blowup ansatz is only self-consistent when

\displaystyle \frac{d^2-10d-7}{16} \geq 0

or equivalently if {d \geq 11}.

To turn this ansatz into an actual blowup example, we will construct {u} as the sum of various functions {u_j} that solve the wave equation with forcing term in {Q_{j+1}}, and which concentrate in {Q_j} with the amplitude and frequency indicated by the above heuristic analysis. The remaining task is to show that {\Box u} can be written in the form {f(u)} for some {f} with all derivatives bounded. For this one needs some injectivity properties of {u} (after imposing spherical symmetry to impose a dimensional reduction on the domain of {u} from {d+1} dimensions to {1+1}). This requires one to construct some solutions to the free wave equation that have some unusual restrictions on the range (for instance, we will need a solution taking values in the plane {{\bf R}^2} that avoid one quadrant of that plane). In order to do this we take advantage of the very explicit nature of the fundamental solution to the wave equation in odd dimensions (such as {d=11}), particularly under the assumption of spherical symmetry. Specifically, one can show that in odd dimension {d}, any spherically symmetric function {u(t,x) = u(t,r)} of the form

\displaystyle u(t,r) = \left(\frac{1}{r} \partial_r\right)^{\frac{d-1}{2}} (g(t+r) + g(t-r))

for an arbitrary smooth function {g: {\bf R} \rightarrow {\bf R}^m}, will solve the free wave equation; this is ultimately due to iterating the “ladder operator” identity

\displaystyle \left( \partial_{tt} + \partial_{rr} + \frac{d-1}{r} \partial_r \right) \frac{1}{r} \partial_r = \frac{1}{r} \partial_r \left( \partial_{tt} + \partial_{rr} + \frac{d-3}{r} \partial_r \right).

This precise and relatively simple formula for {u} allows one to create “bespoke” solutions {u} that obey various unusual properties, without too much difficulty.

It is not clear to me what to conjecture for {d=10}. The blowup ansatz given above is a little inefficient, in that the frequency {N_{j+1}} component of the solution is only generated from a portion of the {N_j} component, namely the portion close to a certain light cone. In particular, the solution does not saturate the Strichartz estimates that are used to establish the positive results for {d \leq 9}, which helps explain the slight gap between the positive and negative results. It may be that a more complicated ansatz could work to give a negative result in ten dimensions; conversely, it is also possible that one could use more advanced estimates than the Strichartz estimate (that somehow capture the “thinness” of the fundamental solution, and not just its dispersive properties) to stretch the positive results to ten dimensions. Which side the {d=10} case falls in all come down to some rather delicate numerology.

Nominations for the 2017 Breakthrough Prize in mathematics and the New Horizons Prizes in mathematics are now open.  In 2016, the Breakthrough Prize was awarded to Ian Agol.  The New Horizons prizes are for breakthroughs given by junior mathematicians, usually restricted to within 10 years of PhD; the 2016 prizes were awarded to Andre Neves, Larry Guth, and Peter Scholze (declined).

The rules for the prizes are listed on this page, and nominations can be made at this page.  (No self-nominations are allowed, for the obvious reasons; also, a third-party letter of recommendation is also required.)

Just a quick post to note that the arXiv overlay journal Discrete Analysis, managed by Timothy Gowers, has now gone live with its permanent (and quite modern looking) web site, which is run using the Scholastica platform, as well as the first half-dozen or so accepted papers (including one of my own).  See Tim’s announcement for more details.  I am one of the editors of this journal (and am already handling a few submissions). Needless to say, we are happy to take in more submissions (though they will have to be peer reviewed if they are to be accepted, of course).


RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.