
Tamar Ziegler and I have just uploaded to the arXiv our paper “Narrow progressions in the primes“, submitted to the special issue “Analytic Number Theory” in honor of the 60th birthday of Helmut Maier. The results here are vaguely reminiscent of the recent progress on bounded gaps in the primes, but use different methods.

About a decade ago, Ben Green and I showed that the primes contained arbitrarily long arithmetic progressions: given any {k}, one could find a progression {n, n+r, \dots, n+(k-1)r} with {r>0} consisting entirely of primes. In fact we showed the same statement was true if the primes were replaced by any subset of the primes of positive relative density.

A little while later, Tamar Ziegler and I obtained the following generalisation: given any {k} and any polynomials {P_1,\dots,P_k: {\bf Z} \rightarrow {\bf Z}} with {P_1(0)=\dots=P_k(0)}, one could find a “polynomial progression” {n+P_1(r),\dots,n+P_k(r)} with {r>0} consisting entirely of primes. Furthermore, we could make this progression somewhat “narrow” by taking {r = n^{o(1)}} (where {o(1)} denotes a quantity that goes to zero as {n} goes to infinity). Again, the same statement also applies if the primes were replaced by a subset of positive relative density. My previous result with Ben corresponds to the linear case {P_i(r) = (i-1)r}.

In this paper we were able to make the progressions a bit narrower still: given any {k} and any polynomials {P_1,\dots,P_k: {\bf Z} \rightarrow {\bf Z}} with {P_1(0)=\dots=P_k(0)}, one could find a “polynomial progression” {n+P_1(r),\dots,n+P_k(r)} with {r>0} consisting entirely of primes, and such that {r \leq \log^L n}, where {L} depends only on {k} and {P_1,\dots,P_k} (in fact it depends only on {k} and the degrees of {P_1,\dots,P_k}). The result is still true if the primes are replaced by a subset of positive density {\delta}, but unfortunately in our arguments we must then let {L} depend on {\delta}. However, in the linear case {P_i(r) = (i-1)r}, we were able to make {L} independent of {\delta} (although it is still somewhat large, of the order of {k 2^k}).

The polylogarithmic factor is somewhat necessary: using an upper bound sieve, one can easily construct a subset of the primes of density, say, {90\%}, whose arithmetic progressions {n,n+r,\dots,n+(k-1)r} of length {k} all obey the lower bound {r \gg \log^{k-1} n}. On the other hand, the prime tuples conjecture predicts that if one works with the actual primes rather than dense subsets of the primes, then one should have infinitely many length {k} arithmetic progressions of bounded width for any fixed {k}. The {k=2} case of this is precisely the celebrated theorem of Yitang Zhang that was the focus of the recently concluded Polymath8 project here. The higher {k} case is conjecturally true, but appears to be out of reach of known methods. (Using the multidimensional Selberg sieve of Maynard, one can get {m} primes inside an interval of length {O( \exp(O(m)) )}, but this is such a sparse set of primes that one would not expect to find even a progression of length three within such an interval.)
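As a purely illustrative sanity check on the shape of this statement, the following Python sketch searches for narrow three-term progressions in the primes; the exponent {L = 2} and the trial-division primality test are hypothetical choices for demonstration, not ingredients of the paper.

import math
def is_prime(m):
    if m < 2:
        return False
    return all(m % d for d in range(2, math.isqrt(m) + 1))
def narrow_progressions(N, k=3, L=2):
    # yield (n, r) with n, n+r, ..., n+(k-1)r all prime and 1 <= r <= log(n)^L
    for n in range(3, N):
        if is_prime(n):
            for r in range(1, int(math.log(n) ** L) + 1):
                if all(is_prime(n + i * r) for i in range(1, k)):
                    yield n, r
print(list(narrow_progressions(500)))  # starts (17, 6), (31, 6), (41, 6), ...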

The argument in the previous paper was unable to obtain a polylogarithmic bound on the width of the progressions, due to the reliance on a certain technical “correlation condition” on the Selberg sieve weight {\nu}. This correlation condition required one to control arbitrarily long correlations of {\nu}, which was not compatible with a bounded value of {L} (particularly if one wanted to keep {L} independent of {\delta}).

However, thanks to recent advances in this area by Conlon, Fox, and Zhao (who introduced a very nice “densification” technique), it is now possible (in principle, at least) to delete this correlation condition from the arguments. Conlon-Fox-Zhao did this for my original theorem with Ben; and in the current paper we apply the densification method to our previous argument to similarly remove the correlation condition. This method does not fully eliminate the need to control arbitrarily long correlations, but allows most of the factors in such a long correlation to be bounded, rather than merely controlled by an unbounded weight such as {\nu}. This turns out to be significantly easier to control, although in the non-linear case we still unfortunately had to make {L} large compared to {\delta} due to a certain “clearing denominators” step arising from the complicated nature of the Gowers-type uniformity norms that we were using to control polynomial averages. We believe though that this is an artefact of our method, and one should be able to prove our theorem with an {L} that is uniform in {\delta}.

Here is a simple instance of the densification trick in action. Suppose that one wishes to establish an estimate of the form

\displaystyle  {\bf E}_n {\bf E}_r f(n) g(n+r) h(n+r^2) = o(1) \ \ \ \ \ (1)

for some real-valued functions {f,g,h} which are bounded in magnitude by a weight function {\nu}, but which are not expected to be uniformly bounded; this average will naturally arise when trying to locate the pattern {(n,n+r,n+r^2)} in a set such as the primes. Here I will be vague as to exactly what range the parameters {n,r} are being averaged over. Suppose that the factor {g} (say) has enough uniformity that one can already show a smallness bound

\displaystyle  {\bf E}_n {\bf E}_r F(n) g(n+r) H(n+r^2) = o(1) \ \ \ \ \ (2)

whenever {F, H} are bounded functions. (One should think of {F,H} as being like the indicator functions of “dense” sets, in contrast to {f,h} which are like the normalised indicator functions of “sparse” sets). The bound (2) cannot be directly applied to control (1) because of the unbounded (or “sparse”) nature of {f} and {h}. However one can “densify” {f} and {h} as follows. Since {f} is bounded in magnitude by {\nu}, we can bound the left-hand side of (1) as

\displaystyle  {\bf E}_n \nu(n) | {\bf E}_r g(n+r) h(n+r^2) |.

The weight function {\nu} will be normalised so that {{\bf E}_n \nu(n) = O(1)}, so by the Cauchy-Schwarz inequality it suffices to show that

\displaystyle  {\bf E}_n \nu(n) | {\bf E}_r g(n+r) h(n+r^2) |^2 = o(1).

The left-hand side expands as

\displaystyle  {\bf E}_n {\bf E}_r {\bf E}_s \nu(n) g(n+r) h(n+r^2) g(n+s) h(n+s^2).

Now, it turns out that after an enormous (but finite) number of applications of the Cauchy-Schwarz inequality to steadily eliminate the {g,h} factors, together with a certain “polynomial forms condition” hypothesis on {\nu}, one can show that

\displaystyle  {\bf E}_n {\bf E}_r {\bf E}_s (\nu-1)(n) g(n+r) h(n+r^2) g(n+s) h(n+s^2) = o(1).

(Because of the polynomial shifts, this requires a method known as “PET induction”, but let me skip over this point here.) In view of this estimate, we now just need to show that

\displaystyle  {\bf E}_n {\bf E}_r {\bf E}_s g(n+r) h(n+r^2) g(n+s) h(n+s^2) = o(1).

Now we can reverse the previous steps. First, we collapse back to

\displaystyle  {\bf E}_n | {\bf E}_r g(n+r) h(n+r^2) |^2 = o(1).

One can bound {|{\bf E}_r g(n+r) h(n+r^2)|} by {{\bf E}_r \nu(n+r) \nu(n+r^2)}, which can be shown to be “bounded on average” in a suitable sense (e.g. bounded {L^4} norm) via the aforementioned polynomial forms condition. Because of this and the Hölder inequality, the above estimate is equivalent to

\displaystyle  {\bf E}_n | {\bf E}_r g(n+r) h(n+r^2) | = o(1).

By setting {F} to be the signum of {{\bf E}_r g(n+r) h(n+r^2)}, this is equivalent to

\displaystyle  {\bf E}_n {\bf E}_r F(n) g(n+r) h(n+r^2) = o(1).

This is halfway between (1) and (2); the sparsely supported function {f} has been replaced by its “densification” {F}, but we have not yet densified {h} to {H}. However, one can shift {n} by {r^2} and repeat the above arguments to achieve a similar densification of {h}, at which point one has reduced (1) to (2).
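One step in the above chain that can be checked mechanically is the expansion of {{\bf E}_n \nu(n) | {\bf E}_r g(n+r) h(n+r^2) |^2} into the quadruple average in {(n,r,s)}; here is a toy verification in Python over {{\bf Z}/N{\bf Z}} with random data (the cyclic averaging convention is a simplification for illustration only):

import random
N = 60
nu = [random.uniform(0, 2) for _ in range(N)]
g = [random.uniform(-1, 1) for _ in range(N)]
h = [random.uniform(-1, 1) for _ in range(N)]
E = lambda vals: sum(vals) / len(vals)
# E_n nu(n) |E_r g(n+r) h(n+r^2)|^2 ...
lhs = E([nu[n] * E([g[(n + r) % N] * h[(n + r * r) % N] for r in range(N)]) ** 2
         for n in range(N)])
# ... equals E_n E_r E_s nu(n) g(n+r) h(n+r^2) g(n+s) h(n+s^2)
rhs = E([nu[n] * g[(n + r) % N] * h[(n + r * r) % N]
               * g[(n + s) % N] * h[(n + s * s) % N]
         for n in range(N) for r in range(N) for s in range(N)])
print(abs(lhs - rhs) < 1e-9)  # True: |E_r X_r|^2 = E_r E_s X_r X_s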

Tamar Ziegler and I have just uploaded to the arXiv our joint paper “A multi-dimensional Szemerédi theorem for the primes via a correspondence principle“. This paper is related to an earlier result of Ben Green and mine in which we established that the primes contain arbitrarily long arithmetic progressions. Actually, in that paper we proved a more general result:

Theorem 1 (Szemerédi’s theorem in the primes) Let {A} be a subset of the primes {{\mathcal P}} of positive relative density, thus {\limsup_{N \rightarrow \infty} \frac{|A \cap [N]|}{|{\mathcal P} \cap [N]|} > 0}. Then {A} contains arbitrarily long arithmetic progressions.

This result was based in part on an earlier paper of Green that handled the case of progressions of length three. With the primes replaced by the integers, this is of course the famous theorem of Szemerédi.

Szemerédi’s theorem has now been generalised in many different directions. One of these is the multidimensional Szemerédi theorem of Furstenberg and Katznelson, who used ergodic-theoretic techniques to show that any dense subset of {{\bf Z}^d} necessarily contained infinitely many constellations of any prescribed shape. Our main result is to relativise that theorem to the primes as well:

Theorem 2 (Multidimensional Szemerédi theorem in the primes) Let {d \geq 1}, and let {A} be a subset of the {d^{th}} Cartesian power {{\mathcal P}^d} of the primes of positive relative density, thus

\displaystyle  \limsup_{N \rightarrow \infty} \frac{|A \cap [N]^d|}{|{\mathcal P}^d \cap [N]^d|} > 0.

Then for any {v_1,\ldots,v_k \in {\bf Z}^d}, {A} contains infinitely many “constellations” of the form {a+r v_1, \ldots, a + rv_k} with {a \in {\bf Z}^d} and {r} a positive integer.

In the case when {A} is itself a Cartesian product of one-dimensional sets (in particular, if {A} is all of {{\mathcal P}^d}), this result already follows from Theorem 1, but there does not seem to be a similarly easy argument to deduce the general case of Theorem 2 from previous results. Simultaneously with this paper, an independent proof of Theorem 2 using a somewhat different method has been established by Cook, Magyar, and Titichetrakun.

The result is reminiscent of an earlier result of mine on finding constellations in the Gaussian primes (or dense subsets thereof). That paper followed closely the arguments of my original paper with Ben Green, namely it first enclosed (a W-tricked version of) the primes or Gaussian primes (in a sieve-theoretic sense) by a slightly larger set (or more precisely, a weight function {\nu}) of almost primes or almost Gaussian primes, which one could then verify (using methods closely related to the sieve-theoretic methods in the ongoing Polymath8 project) to obey certain pseudorandomness conditions, known as the linear forms condition and the correlation condition. Very roughly speaking, these conditions assert statements of the following form: if {n} is a randomly selected integer, then the events of {n+h_1,\ldots,n+h_k} simultaneously being an almost prime (or almost Gaussian prime) are approximately independent for most choices of {h_1,\ldots,h_k}. Once these conditions are satisfied, one can then run a transference argument (initially based on ergodic-theory methods, but nowadays there are simpler transference results based on the Hahn-Banach theorem, due to Gowers and Reingold-Trevisan-Tulsiani-Vadhan) to obtain relative Szemerédi-type theorems from their absolute counterparts.

However, when one tries to adapt these arguments to sets such as {{\mathcal P}^2}, a new difficulty occurs: the natural analogue of the almost primes would be the Cartesian square {{\mathcal A}^2} of the almost primes – pairs {(n,m)} whose entries are both almost primes. (Actually, for technical reasons, one does not work directly with a set of almost primes, but would instead work with a weight function such as {\nu(n) \nu(m)} that is concentrated on a set such as {{\mathcal A}^2}, but let me ignore this distinction for now.) However, this set {{\mathcal A}^2} does not enjoy as many pseudorandomness conditions as one would need for a direct application of the transference strategy to work. More specifically, given any fixed {h, k}, and random {(n,m)}, the four events

\displaystyle  (n,m) \in {\mathcal A}^2

\displaystyle  (n+h,m) \in {\mathcal A}^2

\displaystyle  (n,m+k) \in {\mathcal A}^2

\displaystyle  (n+h,m+k) \in {\mathcal A}^2

do not behave independently (as they would if {{\mathcal A}^2} were replaced for instance by the Gaussian almost primes), because membership in {{\mathcal A}^2} is determined separately in each coordinate, and so any three of these events imply the fourth. This blocks the transference strategy for constellations which contain right angles (e.g. constellations of the form {(n,m), (n+r,m), (n,m+r)}), as such constellations soon turn into rectangles such as the one above after applying Cauchy-Schwarz a few times. (But a few years ago, Cook and Magyar showed that if one restricted attention to constellations which were in general position in the sense that any coordinate hyperplane contained at most one element in the constellation, then this obstruction does not occur and one can establish Theorem 2 in this case through the transference argument.) It’s worth noting that very recently, Conlon, Fox, and Zhao have succeeded in removing one of the pseudorandomness conditions (namely the correlation condition) from the transference principle, leaving only the linear forms condition as the remaining pseudorandomness condition to be verified, but unfortunately this does not completely solve the above problem because the linear forms condition also fails for {{\mathcal A}^2} (or for weights concentrated on {{\mathcal A}^2}) when applied to rectangular patterns.

There are now two ways known to get around this problem and establish Theorem 2 in full generality. The approach of Cook, Magyar, and Titichetrakun proceeds by starting with one of the known proofs of the multidimensional Szemerédi theorem – namely, the proof that proceeds through hypergraph regularity and hypergraph removal – and attaching pseudorandom weights directly within the proof itself, rather than trying to add the weights to the result of that proof through a transference argument. (A key technical issue is that weights have to be added to all the levels of the hypergraph – not just the vertices and top-order edges – in order to circumvent the failure of naive pseudorandomness.) As one has to modify the entire proof of the multidimensional Szemerédi theorem, rather than use that theorem as a black box, the Cook-Magyar-Titichetrakun argument is lengthier than ours; on the other hand, it is more general and does not rely on some difficult theorems about primes that are used in our paper.

In our approach, we continue to use the multidimensional Szemerédi theorem (or more precisely, the equivalent theorem of Furstenberg and Katznelson concerning multiple recurrence for commuting shifts) as a black box. The difference is that instead of using a transference principle to connect the relative multidimensional Szemerédi theorem we need to the multiple recurrence theorem, we instead proceed by a version of the Furstenberg correspondence principle, similar to the one that connects the absolute multidimensional Szemerédi theorem to the multiple recurrence theorem. I had discovered this approach many years ago in an unpublished note, but had abandoned it because it required an infinite number of linear forms conditions (in contrast to the transference technique, which only needed a finite number of linear forms conditions and (until the recent work of Conlon-Fox-Zhao) a correlation condition). The reason for this infinite number of conditions is that the correspondence principle has to build a probability measure on an entire {\sigma}-algebra; for this, it is not enough to specify the measure {\mu(A)} of a single set such as {A}, but one also has to specify the measure {\mu( T^{n_1} A \cap \ldots \cap T^{n_m} A)} of “cylinder sets” such as {T^{n_1} A \cap \ldots \cap T^{n_m} A} where {m} could be arbitrarily large. The larger {m} gets, the more linear forms conditions one needs to keep the correspondence under control.

With the sieve weights {\nu} we were using at the time, standard sieve theory methods could indeed provide a finite number of linear forms conditions, but not an infinite number, so my idea was abandoned. However, with my later work with Green and Ziegler on linear equations in primes (and related work on the Mobius-nilsequences conjecture and the inverse conjecture on the Gowers norm), Tamar and I realised that the primes themselves obey an infinite number of linear forms conditions, so one can basically use the primes (or a proxy for the primes, such as the von Mangoldt function {\Lambda}) as the enveloping sieve weight, rather than a classical sieve. Thus my old idea of using the Furstenberg correspondence principle to transfer Szemerédi-type theorems to the primes could actually be realised. In the one-dimensional case, this simply produces a much more complicated proof of Theorem 1 than the existing one; but it turns out that the argument works as well in higher dimensions and yields Theorem 2 relatively painlessly, except for the fact that it needs the results on linear equations in primes, the known proofs of which are extremely lengthy (and also require some of the transference machinery mentioned earlier). The problem of correlations in rectangles is avoided in the correspondence principle approach because one can compensate for such correlations by performing a suitable weighted limit to compute the measure {\mu( T^{n_1} A \cap \ldots \cap T^{n_m} A)} of cylinder sets, with each {m} requiring a different weighted correction. (This may be related to the Cook-Magyar-Titichetrakun strategy of weighting all of the facets of the hypergraph in order to recover pseudorandomness, although our contexts are rather different.)

Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our joint paper “Multiple recurrence and convergence results associated to {{\bf F}_{p}^{\omega}}-actions“. This paper is primarily concerned with limit formulae in the theory of multiple recurrence in ergodic theory. Perhaps the most basic formula of this type is the mean ergodic theorem, which (among other things) asserts that if {(X,{\mathcal X}, \mu,T)} is a measure-preserving {{\bf Z}}-system (which, in this post, means that {(X,{\mathcal X}, \mu)} is a probability space and {T: X \rightarrow X} is measure-preserving and invertible, thus giving an action {(T^n)_{n \in {\bf Z}}} of the integers), and {f,g \in L^2(X,{\mathcal X}, \mu)} are functions, and {X} is ergodic (which means that {L^2(X,{\mathcal X}, \mu)} contains no {T}-invariant functions other than the constants (up to almost everywhere equivalence, of course)), then the average

\displaystyle  \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x)\ d\mu \ \ \ \ \ (1)

converges as {N \rightarrow \infty} to the expression

\displaystyle  (\int_X f(x)\ d\mu) (\int_X g(x)\ d\mu);

see e.g. this previous blog post. Informally, one can interpret this limit formula as an equidistribution result: if {x} is drawn at random from {X} (using the probability measure {\mu}), and {n} is drawn at random from {\{1,\ldots,N\}} for some large {N}, then the pair {(x, T^n x)} becomes uniformly distributed in the product space {X \times X} (using product measure {\mu \times \mu}) in the limit as {N \rightarrow \infty}.
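As a toy numerical illustration of this limit formula (not from the paper; the parameter choices are purely for demonstration), one can take the ergodic rotation {Tx = x+\alpha} on {X = {\bf R}/{\bf Z}} with {\alpha} irrational, discretise the integral over {X}, and compare the average (1) against the product of the integrals:

import math
M, N = 500, 2000
alpha = math.sqrt(2)  # irrational, so the rotation is ergodic
f = lambda x: math.sin(2 * math.pi * x) + 1.0  # integral over R/Z is 1.0
g = lambda x: math.cos(2 * math.pi * x) ** 2   # integral over R/Z is 0.5
xs = [i / M for i in range(M)]                 # discretisation of X
avg = sum(f(x) * g((x + n * alpha) % 1.0) for n in range(1, N + 1)
          for x in xs) / (N * M)
prod = (sum(map(f, xs)) / M) * (sum(map(g, xs)) / M)
print(avg, prod)  # both close to 0.5, as the mean ergodic theorem predicts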

If we allow {(X,\mu)} to be non-ergodic, then we still have a limit formula, but it is a bit more complicated. Let {{\mathcal X}^T} be the {T}-invariant measurable sets in {{\mathcal X}}; the {{\bf Z}}-system {(X, {\mathcal X}^T, \mu, T)} can then be viewed as a factor of the original system {(X, {\mathcal X}, \mu, T)}, which is equivalent (in the sense of measure-preserving systems) to a trivial system {(Z_0, {\mathcal Z}_0, \mu_{Z_0}, 1)} (known as the invariant factor) in which the shift is trivial. There is then a projection map {\pi_0: X \rightarrow Z_0} to the invariant factor which is a factor map, and the average (1) converges in the limit to the expression

\displaystyle  \int_{Z_0} (\pi_0)_* f(z) (\pi_0)_* g(z)\ d\mu_{Z_0}(z), \ \ \ \ \ (2)

where {(\pi_0)_*: L^2(X,{\mathcal X},\mu) \rightarrow L^2(Z_0,{\mathcal Z}_0,\mu_{Z_0})} is the pushforward map associated to the map {\pi_0: X \rightarrow Z_0}; see e.g. this previous blog post. We can interpret this as an equidistribution result. If {(x,T^n x)} is a pair as before, then we no longer expect complete equidistribution in {X \times X} in the non-ergodic case, because there are now non-trivial constraints relating {x} with {T^n x}; indeed, for any {T}-invariant function {f: X \rightarrow {\bf C}}, we have the constraint {f(x) = f(T^n x)}; putting all these constraints together we see that {\pi_0(x) = \pi_0(T^n x)} (for almost every {x}, at least). The limit (2) can be viewed as an assertion that this constraint {\pi_0(x) = \pi_0(T^n x)} is in some sense the “only” constraint between {x} and {T^n x}, and that the pair {(x,T^n x)} is uniformly distributed relative to this constraint.

Limit formulae are known for multiple ergodic averages as well, although the statement becomes more complicated. For instance, consider the expression

\displaystyle  \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x) h(T^{2n} x)\ d\mu \ \ \ \ \ (3)

for three functions {f,g,h \in L^\infty(X, {\mathcal X}, \mu)}; this is analogous to the combinatorial task of counting length three progressions in various sets. For simplicity we assume the system {(X,{\mathcal X},\mu,T)} to be ergodic. Naively one might expect this limit to then converge to

\displaystyle  (\int_X f\ d\mu) (\int_X g\ d\mu) (\int_X h\ d\mu)

which would roughly speaking correspond to an assertion that the triplet {(x,T^n x, T^{2n} x)} is asymptotically equidistributed in {X \times X \times X}. However, even in the ergodic case there can be additional constraints on this triplet that cannot be seen at the level of the individual pairs {(x,T^n x)}, {(x, T^{2n} x)}. The key obstruction here is that of eigenfunctions of the shift {T: X \rightarrow X}, that is to say non-trivial functions {f: X \rightarrow S^1} that obey the eigenfunction equation {Tf = \lambda f} almost everywhere for some constant (or {T}-invariant) {\lambda}. Each such eigenfunction generates a constraint

\displaystyle  f(x) \overline{f(T^n x)}^2 f(T^{2n} x) = 1 \ \ \ \ \ (4)

tying together {x}, {T^n x}, and {T^{2n} x}. However, it turns out that these are in some sense the only constraints on {x,T^n x, T^{2n} x} that are relevant for the limit (3). More precisely, if one sets {{\mathcal X}_1} to be the sub-algebra of {{\mathcal X}} generated by the eigenfunctions of {T}, then it turns out that the factor {(X, {\mathcal X}_1, \mu, T)} is isomorphic to a shift system {(Z_1, {\mathcal Z}_1, \mu_{Z_1}, x \mapsto x+\alpha)} known as the Kronecker factor, for some compact abelian group {Z_1 = (Z_1,+)} and some (irrational) shift {\alpha \in Z_1}; the factor map {\pi_1: X \rightarrow Z_1} pushes eigenfunctions forward to (affine) characters on {Z_1}. It is then known that the limit of (3) is

\displaystyle  \int_\Sigma (\pi_1)_* f(x_1) (\pi_1)_* g(x_2) (\pi_1)_* h(x_3)\ d\mu_\Sigma

where {\Sigma \subset Z_1^3} is the closed subgroup

\displaystyle  \Sigma = \{ (x_1,x_2,x_3) \in Z_1^3: x_1-2x_2+x_3=0 \}

and {\mu_\Sigma} is the Haar probability measure on {\Sigma}; see this previous blog post. The equation {x_1-2x_2+x_3=0} defining {\Sigma} corresponds to the constraint (4) mentioned earlier. Among other things, this limit formula implies Roth’s theorem, which in the context of ergodic theory is the assertion that the limit (or at least the limit inferior) of (3) is positive when {f=g=h} is non-negative and not identically vanishing.
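A toy example (for illustration only) shows the constraint (4) at work: on the rotation {Tx = x + \alpha} take {f(x) = e(x)}, {g(x) = e(-2x)}, {h(x) = e(x)}, where {e(x) := e^{2\pi i x}}. Each factor has mean zero, so naive equidistribution of {(x, T^n x, T^{2n} x)} would predict that (3) vanishes in the limit; instead the identity {x - 2(x+n\alpha) + (x+2n\alpha) = 0} (which is exactly the equation defining {\Sigma}) forces the average to equal {1}:

import cmath, math
e = lambda x: cmath.exp(2j * math.pi * x)
M, N = 300, 500
alpha = math.sqrt(2)
xs = [i / M for i in range(M)]
# triple average (3) with f = e, g = e(-2 .), h = e: every term telescopes to 1
avg = sum(e(x) * e(-2 * (x + n * alpha)) * e(x + 2 * n * alpha)
          for n in range(1, N + 1) for x in xs) / (N * M)
print(avg)  # 1.0 up to roundoff, although each factor has mean zero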

If one considers a quadruple average

\displaystyle  \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x) h(T^{2n} x) k(T^{3n} x)\ d\mu \ \ \ \ \ (5)

(analogous to counting length four progressions) then the situation becomes more complicated still, even in the ergodic case. In addition to the (linear) eigenfunctions that already showed up in the computation of the triple average (3), a new type of constraint also arises from quadratic eigenfunctions {f: X \rightarrow S^1}, which obey an eigenfunction equation {Tf = \lambda f} in which {\lambda} is no longer constant, but is now a linear eigenfunction. For such functions, {f(T^n x)} behaves quadratically in {n}, and a short computation reveals the constraint

\displaystyle  f(x) \overline{f(T^n x)}^3 f(T^{2n} x)^3 \overline{f(T^{3n} x)} = 1 \ \ \ \ \ (6)

between {x}, {T^n x}, {T^{2n} x}, and {T^{3n} x} that is not detected at the triple average level. As it turns out, this is not the only type of constraint relevant for (5); there is a more general class of constraint involving two-step nilsystems which we will not detail here, but see e.g. this previous blog post for more discussion. Nevertheless there is still a similar limit formula to previous examples, involving a special factor {(Z_2, {\mathcal Z}_2, \mu_{Z_2}, S)} which turns out to be an inverse limit of two-step nilsystems; this limit theorem can be extracted from the structural theory in this paper of Host and Kra combined with a limit formula for nilsystems obtained by Lesigne, but will not be reproduced here. The pattern continues to higher averages (and higher step nilsystems); this was first done explicitly by Ziegler, and can also in principle be extracted from the structural theory of Host-Kra combined with nilsystem equidistribution results of Leibman. These sorts of limit formulae can lead to various recurrence results refining Roth’s theorem in various ways; see this paper of Bergelson, Host, and Kra for some examples of this.

The above discussion was concerned with {{\bf Z}}-systems, but one can adapt much of the theory to measure-preserving {G}-systems for other discrete countable abelian groups {G}, in which one now has a family {(T_g)_{g \in G}} of shifts indexed by {G} rather than a single shift, obeying the compatibility relation {T_{g+h}=T_g T_h}. The role of the intervals {\{1,\ldots,N\}} in this more general setting is replaced by that of Folner sequences. For arbitrary countable abelian {G}, the theory for double averages (1) and triple averages (3) is essentially identical to the {{\bf Z}}-system case. But when one turns to quadruple and higher limits, the situation becomes more complicated (and, for arbitrary {G}, still not fully understood). However one model case which is now well understood is the finite field case when {G = {\bf F}_p^\omega = \bigcup_{n=1}^\infty {\bf F}_p^n} is an infinite-dimensional vector space over a finite field {{\bf F}_p} (with the finite subspaces {{\bf F}_p^n} then being a good choice for the Folner sequence). Here, the analogue of the structural theory of Host and Kra was worked out by Vitaly, Tamar, and myself in these previous papers (treating the high characteristic and low characteristic cases respectively). In the finite field setting, it turns out that nilsystems no longer appear, and one only needs to deal with linear, quadratic, and higher order eigenfunctions (known collectively as phase polynomials). It is then natural to look for a limit formula that asserts, roughly speaking, that if {x} is drawn at random from a {{\bf F}_p^\omega}-system and {n} drawn randomly from a large subspace of {{\bf F}_p^\omega}, then the only constraints between {x, T^n x, \ldots, T^{(p-1)n} x} are those that arise from phase polynomials. The main theorem of this paper is to establish this limit formula (which, again, is a little complicated to state explicitly and will not be done here). In particular, we establish for the first time that the limit actually exists (a result which, for {{\bf Z}}-systems, was one of the main results of this paper of Host and Kra).

As a consequence, we can recover finite field analogues of most of the results of Bergelson-Host-Kra, though interestingly some of the counterexamples demonstrating sharpness of their results for {{\bf Z}}-systems (based on Behrend set constructions) do not seem to be present in the finite field setting (cf. this previous blog post on the cap set problem). In particular, we are able to largely settle the question of when one has a Khintchine-type theorem that asserts that for any measurable set {A} in an ergodic {{\bf F}_p^\omega}-system and any {\epsilon>0}, one has

\displaystyle  \mu( T_{c_1 n} A \cap \ldots \cap T_{c_k n} A ) > \mu(A)^k - \epsilon

for a syndetic set of {n}, where {c_1,\ldots,c_k \in {\bf F}_p} are distinct residue classes. It turns out that Khintchine-type theorems always hold for {k=1,2,3} (and for {k=1,2} ergodicity is not required), and for {k=4} it holds whenever {c_1,c_2,c_3,c_4} form a parallelogram, but not otherwise (though the counterexample here was such a painful computation that we ended up removing it from the paper, and may end up putting it online somewhere instead), and for larger {k} we could show that the Khintchine property failed for generic choices of {c_1,\ldots,c_k}, though the problem of determining exactly the tuples for which the Khintchine property failed looked to be rather messy and we did not completely settle it.

One of the basic problems in analytic number theory is to estimate sums of the form

\displaystyle  \sum_{p<x} f(p)

as {x \rightarrow \infty}, where {p} ranges over primes and {f} is some explicit function of interest (e.g. a linear phase function {f(p) = e^{2\pi i \alpha p}} for some real number {\alpha}). This is essentially the same task as obtaining estimates on the sum

\displaystyle  \sum_{n<x} \Lambda(n) f(n)

where {\Lambda} is the von Mangoldt function. If {f} is bounded, {f(n)=O(1)}, then from the prime number theorem one has the trivial bound

\displaystyle  \sum_{n<x} \Lambda(n) f(n) = O(x)

but often (when {f} is somehow “oscillatory” in nature) one is seeking the refinement

\displaystyle  \sum_{n<x} \Lambda(n) f(n) = o(x) \ \ \ \ \ (1)

or equivalently

\displaystyle  \sum_{p<x} f(p) = o(\frac{x}{\log x}). \ \ \ \ \ (2)

Thanks to identities such as

\displaystyle  \Lambda(n) = \sum_{d|n} \mu(d) \log(\frac{n}{d}), \ \ \ \ \ (3)

where {\mu} is the Möbius function, refinements such as (1) are similar in spirit to estimates of the form

\displaystyle  \sum_{n<x} \mu(n) f(n) = o(x). \ \ \ \ \ (4)

Unfortunately, the connection between (1) and (4) is not particularly tight; roughly speaking, one needs to improve the bounds in (4) (and variants thereof) by about two factors of {\log x} before one can use identities such as (3) to recover (1). Still, one generally thinks of (1) and (4) as being “morally” equivalent, even if they are not formally equivalent.
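Identity (3) is easy to test numerically; here is a quick brute-force check (purely illustrative):

import math
def mu(n):  # Mobius function by trial division
    sign, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # n is not squarefree
            sign = -sign
        p += 1
    return -sign if m > 1 else sign
def von_mangoldt(n):  # log p if n = p^k, else 0
    for p in range(2, n + 1):
        if n % p == 0:  # least divisor of n > 1, hence prime
            m = n
            while m % p == 0:
                m //= p
            return math.log(p) if m == 1 else 0.0
    return 0.0  # n = 1
for n in range(1, 200):
    rhs = sum(mu(d) * math.log(n / d) for d in range(1, n + 1) if n % d == 0)
    assert abs(von_mangoldt(n) - rhs) < 1e-9
print("identity (3) verified for n < 200")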

When {f} is oscillating in a sufficiently “irrational” way, then one standard way to proceed is the method of Type I and Type II sums, which uses truncated versions of divisor identities such as (3) to expand out either (1) or (4) into linear (Type I) or bilinear sums (Type II) with which one can exploit the oscillation of {f}. For instance, Vaughan’s identity lets one rewrite the sum in (1) as the sum of the Type I sum

\displaystyle  \sum_{d \leq U} \mu(d) (\sum_{V/d \leq r \leq x/d} (\log r) f(rd)),

the Type I sum

\displaystyle  -\sum_{d \leq UV} a(d) \sum_{V/d \leq r \leq x/d} f(rd),

the Type II sum

\displaystyle  -\sum_{V \leq d \leq x/U} \sum_{U < m \leq x/d} \Lambda(d) b(m) f(dm),

and the error term {\sum_{n \leq V} \Lambda(n) f(n)}, whenever {1 \leq U, V \leq x} are parameters, and {a, b} are the sequences

\displaystyle  a(d) := \sum_{e \leq U, f \leq V: ef = d} \mu(e) \Lambda(f)

and

\displaystyle  b(m) := \sum_{d|m: d \leq U} \mu(d).

Similarly one can express (4) as the Type I sum

\displaystyle  -\sum_{d \leq UV} c(d) \sum_{UV/d \leq r \leq x/d} f(rd),

the Type II sum

\displaystyle  - \sum_{V < d \leq x/U} \sum_{U < m \leq x/d} \mu(m) b(d) f(dm)

and the error term {\sum_{n \leq UV} \mu(n) f(n)}, whenever {1 \leq U,V \leq x} with {UV \leq x}, and {c} is the sequence

\displaystyle  c(d) := \sum_{e \leq U, f \leq V: ef = d} \mu(e) \mu(f).

After eliminating troublesome sequences such as {a(), b(), c()} via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as

\displaystyle  \sum_{r \leq y} f(rd)

or Type II sums such as

\displaystyle  \sum_{r \leq y} f(rd) \overline{f(rd')}

for various {y, d, d' \geq 1}. Here, the trivial bound is {O(y)}, but due to a number of logarithmic inefficiencies in the above method, one has to obtain bounds that are more like {O( \frac{y}{\log^C y})} for some constant {C} (e.g. {C=5}) in order to end up with an asymptotic such as (1) or (4).

However, in a recent paper of Bourgain, Sarnak, and Ziegler, it was observed that as long as one is only seeking the Mobius orthogonality (4) rather than the von Mangoldt orthogonality (1), one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of {f}. A special case of their orthogonality criterion (which actually dates back to an earlier paper of Katai, as was pointed out to me by Nikos Frantzikinakis) is as follows:

Proposition 1 (Orthogonality criterion) Let {f: {\bf N} \rightarrow {\bf C}} be a bounded function such that

\displaystyle  \sum_{n \leq x} f(pn) \overline{f(qn)} = o(x) \ \ \ \ \ (5)

for any distinct primes {p, q} (where the decay rate of the error term {o(x)} may depend on {p} and {q}). Then

\displaystyle  \sum_{n \leq x} \mu(n) f(n) =o(x). \ \ \ \ \ (6)

Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which {\mu} can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with the above weaker special case. (See also these notes of Harper, which use the Katai argument to give a slightly weaker quantitative bound in the same spirit.) This criterion can be viewed as a multiplicative variant of the classical van der Corput lemma, which in our notation asserts that {\sum_{n \leq x} f(n) = o(x)} if one has {\sum_{n \leq x} f(n+h) \overline{f(n)} = o(x)} for each fixed non-zero {h}.

As a sample application, Proposition 1 easily gives a proof of the asymptotic

\displaystyle  \sum_{n \leq x} \mu(n) e^{2\pi i \alpha n} = o(x)

for any irrational {\alpha}. (For rational {\alpha}, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.) The paper of Bourgain, Sarnak, and Ziegler also applies this criterion to nilsequences (obtaining a quick proof of a qualitative version of a result of Ben Green and myself, see these notes of Ziegler for details) and to horocycle flows (for which no Möbius orthogonality result was previously known).
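As an empirical complement (illustration only), one can sieve the Möbius function and watch the normalised sums decay for {\alpha = \sqrt{2}}:

import cmath, math
def mobius_upto(x):  # sieve mu(1), ..., mu(x)
    mu = [1] * (x + 1)
    is_comp = [False] * (x + 1)
    for p in range(2, x + 1):
        if not is_comp[p]:  # p is prime
            for m in range(p, x + 1, p):
                if m > p:
                    is_comp[m] = True
                mu[m] = -mu[m]
            for m in range(p * p, x + 1, p * p):
                mu[m] = 0  # not squarefree
    return mu
alpha = math.sqrt(2)
mu = mobius_upto(10 ** 5)
for x in (10 ** 3, 10 ** 4, 10 ** 5):
    s = sum(mu[n] * cmath.exp(2j * math.pi * alpha * n) for n in range(1, x + 1))
    print(x, abs(s) / x)  # the normalised sum shrinks as x grows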

Informally, the connection between (5) and (6) comes from the multiplicative nature of the Möbius function. If (6) failed, then {\mu(n)} exhibits strong correlation with {f(n)}; by change of variables, we then expect {\mu(pn)} to correlate with {f(pn)} and {\mu(qn)} to correlate with {f(qn)}, for “typical” {p,q} at least. On the other hand, since {\mu} is multiplicative, {\mu(pn)} exhibits strong correlation with {\mu(qn)}. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of course, correlation is not quite transitive, but it turns out that one can use the Cauchy-Schwarz inequality as a substitute for transitivity of correlation in this case.

I will give a proof of Proposition 1 below the fold (which is not quite based on the argument in the above-mentioned paper, but on a variant of that argument communicated to me by Tamar Ziegler, and also independently discovered by Adam Harper). The main idea is to exploit the following observation: if {P} is a “large” but finite set of primes (in the sense that the sum {A := \sum_{p \in P} \frac{1}{p}} is large), then for a typical large number {n} (much larger than the elements of {P}), the number of primes in {P} that divide {n} is pretty close to {A = \sum_{p \in P} \frac{1}{p}}:

\displaystyle  \sum_{p \in P: p|n} 1 \approx A. \ \ \ \ \ (7)

A more precise formalisation of this heuristic is provided by the Turan-Kubilius inequality, which is proven by a simple application of the second moment method.
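Here is a quick empirical look at (7) (a toy check of the kind of concentration that the Turan-Kubilius inequality makes precise):

P = [p for p in range(2, 100) if all(p % q for q in range(2, p))]  # primes < 100
A = sum(1.0 / p for p in P)
# count the primes in P dividing n, for n in a window of six-digit integers
counts = [sum(n % p == 0 for p in P) for n in range(10 ** 6, 10 ** 6 + 20000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(A, mean, var)  # mean is close to A; the variance is also O(A)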

In particular, one can sum (7) against {\mu(n) f(n)} and obtain an approximation

\displaystyle  \sum_{n \leq x} \mu(n) f(n) \approx \frac{1}{A} \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n)

that approximates a sum of {\mu(n) f(n)} by a bunch of sparser sums of {\mu(n) f(n)}. Since

\displaystyle  x = \frac{1}{A} \sum_{p \in P} \frac{x}{p},

we see (heuristically, at least) that in order to establish (4), it would suffice to establish the sparser estimates

\displaystyle  \sum_{n \leq x: p|n} \mu(n) f(n) = o(\frac{x}{p})

for all {p \in P} (or at least for “most” {p \in P}).

Now we make the change of variables {n = pm}. As the Möbius function is multiplicative, we usually have {\mu(n) = \mu(p) \mu(m) = - \mu(m)}. (There is an exception when {n} is divisible by {p^2}, but this will be a rare event and we will be able to ignore it.) So it should suffice to show that

\displaystyle  \sum_{m \leq x/p} \mu(m) f(pm) = o( x/p )

for most {p \in P}. However, by the hypothesis (5), the sequences {m \mapsto f(pm)} are asymptotically orthogonal as {p} varies, and this claim will then follow from a Cauchy-Schwarz argument.


Tamar Ziegler and I have just uploaded to the arXiv our paper “The inverse conjecture for the Gowers norm over finite fields in low characteristic“, submitted to Annals of Combinatorics. This paper completes another case of the inverse conjecture for the Gowers norm, this time for vector spaces {{\bf F}^n} over a fixed finite field {{\bf F} = {\bf F}_p} of prime order; with Vitaly Bergelson, we had previously established this claim when the characteristic of the field was large, so the main new result here is the extension to the low characteristic case. (The case of a cyclic group {{\bf Z}/N{\bf Z}} or interval {[N]} was established by Ben Green and ourselves in another recent paper. For an arbitrary abelian (or nilpotent) group, a general but less explicit description of the obstructions to Gowers uniformity was recently obtained by Szegedy; the latter result recovers the high-characteristic case of our result (as was done in a subsequent paper of Szegedy), as well as our results with Green, but it is not immediately evident whether Szegedy’s description of the obstructions matches up with the one predicted by the inverse conjecture in low characteristic.)

The statement of the main theorem is as follows. Given a finite-dimensional vector space {V = {\bf F}^n} and a function {f: V \rightarrow {\bf C}}, and an integer {s \geq 0}, one can define the Gowers uniformity norm {\|f\|_{U^{s+1}(V)}} by the formula

\displaystyle  \|f\|_{U^{s+1}(V)} := \left( \mathop{\bf E}_{x,h_1,\ldots,h_{s+1} \in V} \Delta_{h_1} \ldots \Delta_{h_{s+1}} f(x) \right)^{1/2^{s+1}}

where {\Delta_h f(x) := f(x+h) \overline{f(x)}}. If {f} is bounded in magnitude by {1}, it is easy to see that {\|f\|_{U^{s+1}(V)}} is bounded by {1} also, with equality if and only if {f(x) = e(P(x))} for some non-classical polynomial {P: V \rightarrow {\bf R}/{\bf Z}} of degree at most {s}, where {e(x) := e^{2\pi ix}}, and a non-classical polynomial of degree at most {s} is a function whose {(s+1)^{th}} “derivatives” vanish in the sense that

\displaystyle  \partial_{h_1} \ldots \partial_{h_{s+1}} P(x) = 0

for all {x,h_1,\ldots,h_{s+1} \in V}, where {\partial_h P(x) := P(x+h) - P(x)}. Our result generalises this to the case when the uniformity norm is not equal to {1}, but is still bounded away from zero:

Theorem 1 (Inverse conjecture) Let {f: V \rightarrow {\bf C}} be bounded by {1} with {\|f\|_{U^{s+1}(V)} \geq \delta > 0} for some {s \geq 0}. Then there exists a non-classical polynomial {P: V \rightarrow {\bf R}/{\bf Z}} of degree at most {s} such that {|\langle f, e(P) \rangle_{L^2(V)}| := |{\bf E}_{x \in V} f(x) e(-P(x))| \geq c(s,p, \delta) > 0}, where {c(s,p, \delta)} is a positive quantity depending only on the indicated parameters.

This theorem is trivial for {s=0}, and follows easily from Fourier analysis for {s=1}. The case {s=2} was done in odd characteristic by Ben Green and myself, and in even characteristic by Samorodnitsky. In two papers, one with Vitaly Bergelson, we established this theorem in the “high characteristic” case when the characteristic {p} of {{\bf F}} was greater than {s} (in which case there is essentially no distinction between non-classical polynomials and their classical counterparts, as discussed previously on this blog). The need to deal with genuinely non-classical polynomials is the main new difficulty in this paper that was not dealt with in previous literature.
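To see the definition in action, here is a brute-force evaluation of the norm straight from the formula, for the toy example {V = {\bf F}_3^2} and {s = 2} (illustration only; the cost is exponential in {|V|}). A classical quadratic phase attains norm {1}, while a randomly chosen phase function has substantially smaller norm:

import cmath, itertools, math, random
p, n, s = 3, 2, 2
V = list(itertools.product(range(p), repeat=n))
e_p = lambda a: cmath.exp(2j * math.pi * a / p)  # e(a/p)
def gowers_norm(f):
    total = 0.0
    for x in V:
        for hs in itertools.product(V, repeat=s + 1):
            val = 1
            # expand Delta_{h_1} ... Delta_{h_{s+1}} f(x): conjugate on odd faces
            for eps in itertools.product((0, 1), repeat=s + 1):
                y = tuple((x[i] + sum(w * h[i] for w, h in zip(eps, hs))) % p
                          for i in range(n))
                val *= f(y).conjugate() if sum(eps) % 2 else f(y)
            total += val.real  # the full average is real and non-negative
    return (max(total, 0.0) / len(V) ** (s + 2)) ** (1 / 2 ** (s + 1))
quad = lambda x: e_p(x[0] * x[0] + x[0] * x[1])   # classical quadratic phase
rnd = {x: e_p(random.randrange(p)) for x in V}    # random phase function
print(gowers_norm(quad), gowers_norm(rnd.get))    # 1.0 versus much smaller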

In our previous paper with Bergelson, a “weak” version of the above theorem was proven, in which the polynomial {P} in the conclusion had bounded degree {O_{s,p}(1)}, rather than being of degree at most {s}. In the current paper, we use this weak inverse theorem to reduce the inverse conjecture to a statement purely about polynomials:

Theorem 2 (Inverse conjecture for polynomials) Let {s \geq 0}, and let {P: V \rightarrow {\bf R}/{\bf Z}} be a non-classical polynomial of degree at most {s+1} such that {\|e(P)\|_{U^{s+1}(V)} \geq \delta > 0}. Then {P} has bounded rank in the sense that {P} is a function of {O_{s,p,\delta}(1)} polynomials of degree at most {s}.

This type of inverse theorem was first introduced by Bogdanov and Viola. The deduction of Theorem 1 from Theorem 2 and the weak inverse Gowers conjecture is fairly standard, so the main difficulty is to show Theorem 2.

The quantity {-\log_{|{\bf F}|} \|e(P)\|_{U^{s+1}(V)}^{2^{s+1}}} of a polynomial {P} of degree at most {s+1} was denoted the analytic rank of {P} by Gowers and Wolf. They observed that the analytic rank of {P} was closely related to the rank of {P}, defined as the least number of degree {s} polynomials needed to express {P}. For instance, in the quadratic case {s=1} the two ranks are identical (in odd characteristic, at least). For general {s}, it was easy to see that bounded rank implied bounded analytic rank; Theorem 2 is the converse statement.

We tried a number of ways to show that bounded analytic rank implied bounded rank, in particular spending a lot of time on ergodic-theoretic approaches, but eventually we settled on a “brute force” approach that relies on classifying those polynomials of bounded analytic rank as precisely as possible. The argument splits up into establishing three separate facts:

  1. (Classical case) If a classical polynomial has bounded analytic rank, then it has bounded rank.
  2. (Multiplication by {p}) If a non-classical polynomial {P} (of degree at most {s+1}) has bounded analytic rank, then {pP} (which can be shown to have degree at most {\max(s+2-p,0)}) also has bounded analytic rank.
  3. (Division by {p}) If {Q} is a non-classical polynomial of degree at most {\max(s+2-p,0)} of bounded rank, then there is a non-classical polynomial {P} of degree at most {s+1} of bounded rank such that {pP=Q}.

The multiplication by {p} and division by {p} facts allow one to easily extend the classical case of the theorem to the non-classical case of the theorem, basically because classical polynomials are the kernel of the multiplication-by-{p} homomorphism. Indeed, if {P} is a non-classical polynomial of bounded analytic rank of the right degree, then the multiplication by {p} claim tells us that {pP} also has bounded analytic rank, which by an induction hypothesis implies that {pP} has bounded rank. Applying the division by {p} claim, we find a bounded rank polynomial {P'} such that {pP = pP'}, thus {P} differs from {P'} by a classical polynomial, which necessarily has bounded analytic rank, hence bounded rank by the classical claim, and the claim follows.
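These objects can be experimented with directly. Over {{\bf F}_2}, the standard example of a genuinely non-classical polynomial is {P(x) = |x|/4 \hbox{ mod } 1}, with {|x|} the number of ones of {x}; the following brute-force sketch (illustration only) computes degrees straight from the definition and exhibits the drop in degree by {p-1} under multiplication by {p = 2}:

import itertools
from fractions import Fraction
n = 3
V = list(itertools.product(range(2), repeat=n))  # F_2^n
add = lambda x, h: tuple((a + b) % 2 for a, b in zip(x, h))
def derivative(P, h):  # additive derivative of P: F_2^n -> R/Z
    return lambda x: (P(add(x, h)) - P(x)) % 1
def multi(P, hs):
    for h in hs:
        P = derivative(P, h)
    return P
def degree(P, max_deg=6):  # least s with all (s+1)-fold derivatives zero
    for s in range(max_deg + 1):
        if all(not any(multi(P, hs)(x) for x in V)
               for hs in itertools.product(V, repeat=s + 1)):
            return s
P = lambda x: Fraction(sum(x), 4) % 1    # |x|/4 mod 1, non-classical
print(degree(P))                         # 2: a non-classical quadratic
print(degree(lambda x: (2 * P(x)) % 1))  # 1: multiplying by p = 2 drops the degree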

Of the three claims, the multiplication-by-{p} claim is the easiest to prove using known results; after a bit of Fourier analysis, it turns out to follow more or less immediately from the multidimensional Szemerédi theorem over finite fields of Bergelson, Leibman, and McCutcheon (one can also use the density Hales-Jewett theorem here if one desires).

The next easiest claim is the classical case. Here, the idea is to analyse a degree {s+1} classical polynomial {P: V \rightarrow {\bf F}} via its derivative {d^{s+1} P: V^{s+1} \rightarrow {\bf F}}, defined by the formula

\displaystyle  d^{s+1} P( h_1,\ldots,h_{s+1}) := \partial_{h_1} \ldots \partial_{h_{s+1}} P(x)

for any {x,h_1,\ldots,h_{s+1} \in V} (the RHS is independent of {x} as {P} has degree {s+1}). This is a multilinear form, and if {P} has bounded analytic rank, this form is biased (in the sense that the mean of {e(d^{s+1} P)} is large). Applying a general equidistribution theorem of Kaufman and Lovett (based on this earlier paper of Green and myself) this implies that {d^{s+1} P} is a function of a bounded number of multilinear forms of lower degree. Using some “regularity lemma” theory to clean up these forms so that they have good equidistribution properties, it is possible to understand exactly how the original multilinear form {d^{s+1} P} depends on these lower degree forms; indeed, the description one eventually obtains is so explicit that one can write down by inspection another bounded rank polynomial {Q} such that {d^{s+1} P} is equal to {d^{s+1} Q}. Thus {P} differs from the bounded rank polynomial {Q} by a lower degree error, which is automatically of bounded rank also, and the claim follows.

The trickiest thing to establish is the division by {p} claim. The polynomial {Q} is some function {F(R_1,\ldots,R_m)} of lower degree polynomials {R_1,\ldots,R_m}. Ideally, one would like to find a function {F'(R_1,\ldots,R_m)} of the same polynomials with {pF' = F}, such that {F'(R_1,\ldots,R_m)} has the correct degree; however, we have counterexamples that show that this is not always possible. (These counterexamples are the main obstruction to making the ergodic theory approach work: in ergodic theory, one is only allowed to work with “measurable” functions, which are roughly analogous in this context to functions of the indicated polynomials {Q, R_1,\ldots,R_m} and their shifts.) To get around this we have to first apply a regularity lemma to place {R_1,\ldots,R_m} in a suitably equidistributed form (although the fact that {R_1,\ldots,R_m} may be non-classical leads to a rather messy and technical description of this equidistribution), and then we have to extend each {R_j} to a higher degree polynomial {R'_j} with {pR'_j = R_j}. There is a crucial “exact roots” property of polynomials that allows one to do this, with {R'_j} having degree exactly {p-1} higher than {R_j}. It turns out that it is possible to find a function {P = F'(R'_1,\ldots,R'_m)} of these extended polynomials that has the right degree and solves the required equation {pP=Q}; this is established by classifying completely all functions of the equidistributed polynomials {R_1,\ldots,R_m} or {R'_1,\ldots,R'_m} that are of a given degree.

Ben Green, Tamar Ziegler, and I have just uploaded to the arXiv our paper “An inverse theorem for the Gowers U^{s+1}[N] norm“, which was previously announced on this blog.  We are still planning one final round of reviewing the preprint before submitting the paper, but it has gotten to the stage where we are comfortable with having the paper available on the arXiv.

The main result of the paper is to establish the inverse conjecture for the Gowers norm over the integers, which has a number of applications, in particular to counting solutions to various linear equations in primes.  In spirit, the proof of the paper follows the 21-page announcement that was uploaded previously.  However, for various rather annoying technical reasons, the 117-page paper has to devote a large amount of space to setting up various bits of auxiliary machinery (as well as a dozen or so pages worth of examples and discussion).  For instance, the announcement motivates many of the steps of the argument by heuristically identifying nilsequences n \mapsto F(g(n) \Gamma) with bracket polynomial phases such as n \mapsto e( \{ \alpha n \} \beta n ).  However, a rather significant amount of theory (which was already worked out to a large extent by Leibman) is needed to formalise the “bracket algebra” needed to manipulate such bracket polynomials and to connect them with nilsequences.  Furthermore, the “piecewise smooth” nature of bracket polynomials causes some technical issues with the equidistribution theory for these sequences.  Our original version of the paper (which was even longer than the current version) set out this theory.  But we eventually decided that it was best to eschew almost all use of bracket polynomials (except as motivation and examples), and run the argument almost entirely within the language of nilsequences, to keep the argument a bit more notationally focused (and to make the equidistribution theory easier to establish).  But this was not without a tradeoff; some statements that are almost trivially true for bracket polynomials required some “nilpotent algebra” to convert to the language of nilsequences.  Here are some examples of this:

  1. It is intuitively clear that a bracket polynomial phase e(P(n)) of degree k in one variable n can be “multilinearised” to a polynomial e(Q(n_1,\ldots,n_k)) of multi-degree (1,\ldots,1) in k variables n_1,\ldots,n_k, such that e(P(n)) and e(Q(n,\ldots,n)) agree modulo lower order terms.  For instance, if e(P(n)) = e(\alpha n \{ \beta n \{ \gamma n \} \}) (so k=3), then one could take e(Q(n_1,n_2,n_3)) = e( \alpha n_1 \{ \beta n_2 \{ \gamma n_3 \} \}).   The analogue of this statement for nilsequences is true, but required a moderately complicated nilpotent algebra construction using the Baker-Campbell-Hausdorff formula.
  2. Suppose one has a bracket polynomial phase e(P_h(n)) of degree k in one variable n that depends on an additional parameter h, in such a way that exactly one of the coefficients in each monomial depends on h.  Furthermore, suppose this dependence is bracket linear in h.  Then it is intuitively clear that this phase can be rewritten (modulo lower order terms) as e( Q(h,n) ) where Q is a bracket polynomial of multidegree (1,k) in (h,n).  For instance, if e(P_h(n)) = e( \{ \alpha_h n \} \beta n ) and \alpha_h = \{\gamma h \} \delta, then we can take e(Q(h,n)) = e(\{ \{\gamma h\} \delta n\} \beta n ).  The nilpotent algebra analogue of this claim is true, but requires another moderately complicated nilpotent algebra construction based on semi-direct products.
  3. A bracket polynomial has a fairly visible concept of a “degree” (analogous to the corresponding notion for true polynomials), as well as a “rank” (which, roughly speaking, measures the number of parentheses in the bracket monomials, plus one).  Thus, for instance, the bracket monomial \{\{ \alpha n^4 \} \beta n \} \gamma n^2 has degree 7 and rank 3.  Defining degree and rank for nilsequences requires one to generalise the notion of a (filtered) nilmanifold to one in which the lower central series is replaced by a filtration indexed by both the degree and the rank.

There are various other tradeoffs of this type in this paper.  For instance, nonstandard analysis tools were introduced to eliminate what would otherwise be quite a large number of epsilons and regularity lemmas to manage, at the cost of some notational overhead; and the piecewise discontinuities mentioned earlier were eliminated by the use of vector-valued nilsequences, though this again caused some further notational overhead.    These difficulties may be a sign that we do not yet have the “right” proof of this conjecture, but one will probably have to wait a few years before we get a proper amount of perspective and understanding on this circle of ideas and results.

Ben Green, Tamar Ziegler, and I have just uploaded to the arXiv the note “An inverse theorem for the Gowers norm {U^{s+1}[N]} (announcement)“, not intended for publication. This is an announcement of our forthcoming solution of the inverse conjecture for the Gowers norm, which roughly speaking asserts that the {U^{s+1}[N]} norm of a bounded function is large if and only if that function correlates with an {s}-step nilsequence of bounded complexity.

The full argument is quite lengthy (our most recent draft is about 90 pages long), but this is in large part due to the presence of various technical details which are necessary in order to make the argument fully rigorous. In this 20-page announcement, we instead sketch a heuristic proof of the conjecture, relying on a number of “cheats” to avoid the above-mentioned technical details. In particular:

  • In the announcement, we rely on somewhat vaguely defined terms such as “bounded complexity” or “linearly independent with respect to bounded linear combinations” or “equivalent modulo lower step errors” without specifying them rigorously. In the full paper we will use the machinery of nonstandard analysis to rigorously and precisely define these concepts.
  • In the announcement, we deal with the traditional linear nilsequences rather than the polynomial nilsequences that turn out to be better suited for finitary equidistribution theory, but require more notation and machinery in order to use.
  • In a similar vein, we restrict attention to scalar-valued nilsequences in the announcement, though due to topological obstructions arising from the twisted nature of the torus bundles used to build nilmanifolds, we will have to deal instead with vector-valued nilsequences in the main paper.
  • In the announcement, we pretend that nilsequences can be described by bracket polynomial phases, at least for the sake of making examples, although strictly speaking bracket polynomial phases only give examples of piecewise Lipschitz nilsequences rather than genuinely Lipschitz nilsequences.

With these cheats, it becomes possible to shorten the length of the argument substantially. Also, it becomes clearer that the main task is a cohomological one; in order to inductively deduce the inverse conjecture for a given step {s} from the conjecture for the preceding step {s-1}, the basic problem is to show that a certain (quasi-)cocycle is necessarily a (quasi-)coboundary. This in turn requires a detailed analysis of the top order and second-to-top order terms in the cocycle, which requires a certain amount of nilsequence equidistribution theory and additive combinatorics, as well as a “sunflower decomposition” to arrange the various nilsequences one encounters into a usable “normal form”.

It is often the case in modern mathematics that the informal heuristic way to explain an argument looks quite different (and is significantly shorter) than the way one would formally present the argument with all the details. This seems to be particularly true in this case; at a superficial level, the full paper has a very different set of notation than the announcement, and a lot of space is invested in setting up additional machinery that one can quickly gloss over in the announcement. We hope though that the announcement can provide a “road map” to help navigate the much longer paper to come.

Ben Green, Tamar Ziegler and I have just uploaded to the arXiv our paper “An inverse theorem for the Gowers U^4 norm“.  This paper establishes the next case of the inverse conjecture for the Gowers norm for the integers (after the U^3 case, which was done by Ben and myself a few years ago).  This conjecture has a number of combinatorial and number-theoretic consequences, for instance by combining this new inverse theorem with previous results, one can now get the correct asymptotic for the number of arithmetic progressions of primes of length five in any large interval [N] = \{1,\ldots,N\}.

To state the inverse conjecture properly requires a certain amount of notation.  Given a function f: {\Bbb Z} \to {\Bbb C} and a shift h \in {\Bbb Z}, define the multiplicative derivative

\Delta_h f(x) := f(x+h) \overline{f(x)}

and then define the Gowers U^{s+1}[N] norm of a function f: [N] \to {\Bbb C} to (essentially) be the quantity

\| f\|_{U^{s+1}[N]} := ({\Bbb E}_{h_1,\ldots,h_{s+1} \in [-N,N]} {\Bbb E}_{x \in [N]} |\Delta_{h_1} \ldots \Delta_{h_{s+1}} f(x)|)^{1/2^{s+1}},

where we extend f by zero outside of [N].  (Actually, we use a slightly different normalisation to ensure that the function 1 has a U^{s+1} norm of 1, but never mind this for now.)

Informally, the Gowers norm \|f\|_{U^{s+1}[N]} measures the amount of bias present in the (s+1)^{st} multiplicative derivatives of f.  In particular, if f = e(P) := e^{2\pi i P} for some polynomial P: {\Bbb Z} \to {\Bbb R} of degree at most s, then the (s+1)^{st} derivative of f is identically 1, and so is the Gowers norm.

However, polynomial phases are not the only functions with large Gowers norm.  For instance, consider the function f(n) := e( \lfloor \sqrt{2} n \rfloor \sqrt{3} n ), which is what we call a quadratic bracket polynomial phase.  This function isn’t quite quadratic, but it is close enough to being quadratic (because one has the approximate linearity relationship \lfloor x+y \rfloor = \lfloor x \rfloor + \lfloor y \rfloor holding a good fraction of the time) that it turns out that the third derivative is trivial fairly often, and the Gowers norm \|f\|_{U^3[N]} is comparable to 1.  This bracket polynomial phase can be modeled as a nilsequence n \mapsto F( g(n) \Gamma ), where n \mapsto g(n) \Gamma is a polynomial orbit on a nilmanifold G/\Gamma, which in this case has step 2.  (The function F is only piecewise smooth, due to the discontinuity in the floor function \lfloor \cdot \rfloor, so strictly speaking we would classify this as an almost nilsequence rather than a nilsequence, but let us ignore this technical issue here.)  In fact, there is a very close relationship between nilsequences and bracket polynomial phases, but I will detail this in a later post.
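One can test this numerically with the gowers_norm sketch from earlier (again purely illustrative code): a genuine quadratic phase has U^3[N] norm exactly 1 in this normalisation, the bracket quadratic phase should remain bounded away from zero as N grows, while a random phase should decay roughly like N^{-1/4}.

```python
N = 2048
n = np.arange(N)
e = lambda t: np.exp(2j * np.pi * t)
rng = np.random.default_rng(0)

quadratic = e(np.sqrt(3) * n ** 2)                      # e(sqrt(3) n^2)
bracket = e(np.floor(np.sqrt(2) * n) * np.sqrt(3) * n)  # e([sqrt(2) n] sqrt(3) n)
noise = e(rng.random(N))                                # random phase

for name, f in [("quadratic", quadratic), ("bracket", bracket), ("noise", noise)]:
    print(name, gowers_norm(f, 3))
# the quadratic phase gives exactly 1 (up to rounding), since its third
# multiplicative derivative is identically 1; the bracket phase should stay
# bounded away from 0 as N grows, while the noise value decays with N
```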

The inverse conjecture for the Gowers norm, GI(s), asserts that such nilsequences are the only obstruction to the Gowers norm being small.  Roughly speaking, it goes like this:

Inverse conjecture, GI(s). (Informal statement)  Suppose that f: [N] \to {\Bbb C} is bounded but has large U^{s+1}[N] norm.  Then there is an s-step nilsequence n \mapsto F( g(n) \Gamma ) of “bounded complexity” that correlates with f.

This conjecture is trivial for s=0, is a short consequence of Fourier analysis when s=1, and was proven for s=2 by Ben and myself.  In this paper we establish the s=3 case.  An equivalent formulation in this case is that any bounded function f of large U^4 norm must correlate with a “bracket cubic phase”, which is the product of a bounded number of phases from the following list

e( \alpha n^3 + \beta n^2 + \gamma n), e( \lfloor \alpha n \rfloor \beta n^2 ), e( \lfloor \alpha n \rfloor \lfloor \beta n \rfloor \gamma n ), e( \lfloor \alpha n \rfloor \beta n ) (*)

for various real numbers \alpha,\beta,\gamma.
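Purely to fix ideas, the four generating phases in (*) can be realised in the same illustrative Python setup, for some arbitrary choices of \alpha, \beta, \gamma; as a quick consistency check, the genuinely cubic phase has U^4 norm exactly 1 in the normalisation used earlier.

```python
alpha, beta, gamma = 2 ** 0.5, 3 ** 0.5, 5 ** 0.5   # arbitrary illustrative values
N = 128          # kept small: the naive U^4 recursion above is slow for large N
n = np.arange(N)
fl = np.floor

generators = {
    "e(a n^3 + b n^2 + c n)": e(alpha * n ** 3 + beta * n ** 2 + gamma * n),
    "e([a n] b n^2)":         e(fl(alpha * n) * beta * n ** 2),
    "e([a n][b n] c n)":      e(fl(alpha * n) * fl(beta * n) * gamma * n),
    "e([a n] b n)":           e(fl(alpha * n) * beta * n),
}
print(gowers_norm(generators["e(a n^3 + b n^2 + c n)"], 4))   # exactly 1, up to rounding
```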

It appears that our methods also work in higher step, though for technical reasons it is convenient to make a number of adjustments to our arguments to do so, most notably a switch from standard analysis to non-standard analysis, about which I hope to say more later.  But there are a number of simplifications available in the s=3 case which make the argument significantly shorter, and so we will be writing up the higher s argument in a separate paper.

The arguments largely follow those for the s=2 case (which in turn are based on this paper of Gowers).  Two major new ingredients are a deployment of a normal form and equidistribution theory for bracket quadratic phases, and a combinatorial decomposition of frequency space which we call the sunflower decomposition.  I will sketch these ideas below the fold.

Read the rest of this entry »

Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our paper “An inverse theorem for the uniformity seminorms associated with the action of F^\infty_p”. This paper establishes the ergodic inverse theorems that are needed in our other recent paper to establish the inverse conjecture for the Gowers norms over finite fields in high characteristic (and to establish a partial result in low characteristic), as follows:

Theorem. Let {\Bbb F} be a finite field of characteristic p.  Suppose that X = (X,{\mathcal B},\mu) is a probability space with an ergodic measure-preserving action (T_g)_{g \in {\Bbb F}^\omega} of {\Bbb F}^\omega.  Let f \in L^\infty(X) be such that the Gowers-Host-Kra seminorm \|f\|_{U^k(X)} (defined in a previous post) is non-zero.

  1. In the high-characteristic case p \geq k, there exists a phase polynomial g of degree <k (as defined in the previous post) such that |\int_X f \overline{g}\ d\mu| > 0.
  2. In general characteristic, there exists a phase polynomial g of degree <C(k) for some C(k) depending only on k such that |\int_X f \overline{g}\ d\mu| > 0.
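(Here, roughly speaking, a phase polynomial of degree <k is a function g \in L^\infty(X) of unit modulus all of whose k-fold multiplicative derivatives are trivial, i.e.

\displaystyle \Delta_{h_1} \ldots \Delta_{h_k} g = 1 \hbox{ a.e. for all } h_1,\ldots,h_k \in {\Bbb F}^\omega,

where \Delta_h g := (g \circ T_h) \overline{g}; see the previous post for the precise formulation.)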

This theorem is closely analogous to a similar theorem of Host and Kra on ergodic actions of {\Bbb Z}, in which the role of phase polynomials is played by functions that arise from nilsystem factors of X.  Indeed, our arguments rely heavily on the machinery of Host and Kra.

The paper is rather technical (60+ pages!) and difficult to describe in detail here, but I will try to sketch out (in very broad brush strokes) what the key steps in the proof of part 2 of the theorem are.  (Part 1 is similar but requires a more delicate analysis at various stages, keeping more careful track of the degrees of various polynomials.)

Read the rest of this entry »

Let k \geq 0 be an integer.  The concept of a polynomial P: {\Bbb R} \to {\Bbb R} of one variable of degree <k (or \leq k-1) can be defined in one of two equivalent ways:

  • (Global definition) P: {\Bbb R} \to {\Bbb R} is a polynomial of degree <k iff it can be written in the form P(x) = \sum_{0 \leq j < k} c_j x^j for some coefficients c_j \in {\Bbb R}.
  • (Local definition) P: {\Bbb R} \to {\Bbb R} is a polynomial of degree <k iff it is k-times continuously differentiable and \frac{d^k}{dx^k} P \equiv 0.

From single variable calculus we know that if P is a polynomial in the global sense, then it is a polynomial in the local sense; conversely, if P is a polynomial in the local sense, then from the Taylor series expansion

\displaystyle P(x) = \sum_{0 \leq j < k} \frac{P^{(j)}(0)}{j!} x^j

we see that P is a polynomial in the global sense. We make the trivial remark that we have no difficulty dividing by j! here, because the field {\Bbb R} is of characteristic zero.

The above equivalence carries over to higher dimensions:

  • (Global definition) P: {\Bbb R}^n \to {\Bbb R} is a polynomial of degree <k iff it can be written in the form P(x_1,\ldots,x_n) = \sum_{0 \leq j_1,\ldots,j_n; j_1+\ldots+j_n < k} c_{j_1,\ldots,j_n} x_1^{j_1} \ldots x_n^{j_n} for some coefficients c_{j_1,\ldots,j_n} \in {\Bbb R}.
  • (Local definition) P: {\Bbb R}^n \to {\Bbb R} is a polynomial of degree <k iff it is k-times continuously differentiable and (h_1 \cdot \nabla) \ldots (h_k \cdot \nabla) P \equiv 0 for all h_1,\ldots,h_k \in {\Bbb R}^n.

Again, it is not difficult to use several variable calculus to show that these two definitions of a polynomial are equivalent.

The purpose of this (somewhat technical) post here is to record some basic analogues of the above facts in finite characteristic, in which the underlying domain of the polynomial P is F or F^n for some finite field F.  In the “classical” case when the range of P is also the field F, it is a well-known fact (which we reproduce here) that the local and global definitions of polynomial are equivalent.  But in the “non-classical” case, when P ranges in a more general group (and in particular in the unit circle {\Bbb R}/{\Bbb Z}), the global definition needs to be corrected somewhat by adding some new monomials to the classical ones x_1^{j_1} \ldots x_n^{j_n}.  Once one does this, one can recover the equivalence between the local and global definitions.

(The results here are derived from forthcoming work with Vitaly Bergelson and Tamar Ziegler.)
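To make both phenomena concrete, here is a toy Python verification (purely illustrative code; the helper names are invented for this sketch).  Over {\Bbb F}_5, the classical polynomial x^3+x, with values embedded into \frac{1}{5}{\Bbb Z}/{\Bbb Z}, is annihilated by four additive derivatives but not by three, in accordance with the global definition.  Over {\Bbb F}_2, the function x \mapsto |x|/4 (where |x| \in \{0,1\} is the standard lift), taking values in \frac{1}{4}{\Bbb Z}/{\Bbb Z}, is annihilated by three derivatives but not by two; since classical monomials only produce values in \frac{1}{2}{\Bbb Z}/{\Bbb Z}, this gives a simple example of the non-classical corrections alluded to above.

```python
from fractions import Fraction
from itertools import product

def deriv(P, h, p):
    # additive derivative Delta_h P(x) = P(x+h) - P(x), with x ranging
    # over Z/p and values treated modulo 1
    return lambda x: (P((x + h) % p) - P(x)) % 1

def killed_by(P, k, p):
    # True iff every k-fold derivative Delta_{h_1}...Delta_{h_k} P
    # vanishes identically mod 1
    for hs in product(range(p), repeat=k):
        Q = P
        for h in hs:
            Q = deriv(Q, h, p)
        if any(Q(x) != 0 for x in range(p)):
            return False
    return True

# classical case: x^3 + x over F_5, with values embedded in (1/5)Z/Z
classical = lambda x: Fraction((x ** 3 + x) % 5, 5)
print(killed_by(classical, 4, 5), killed_by(classical, 3, 5))        # True False

# non-classical case: |x|/4 over F_2, with values in (1/4)Z/Z
nonclassical = lambda x: Fraction(x % 2, 4)
print(killed_by(nonclassical, 3, 2), killed_by(nonclassical, 2, 2))  # True False
```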

Read the rest of this entry »
