Tamar Ziegler and I have just uploaded to the arXiv our paper "Infinite partial sumsets in the primes". This is a short paper inspired by a recent result of Kra, Moreira, Richter, and Robertson (discussed for instance in this Quanta article from last December) showing that for any set $A$ of natural numbers of positive upper density, there exists a sequence $b_1 < b_2 < b_3 < \dots$ of natural numbers and a shift $t$ such that $b_i + b_j + t \in A$ for all $i < j$ (this answers a question of Erdős). In view of the "transference principle", it is then plausible to ask whether the same result holds if $A$ is replaced by the primes. We can show the following results:
Theorem 1
- (i) If the Hardy-Littlewood prime tuples conjecture (or the weaker conjecture of Dickson) is true, then there exists an increasing sequence $a_1 < a_2 < a_3 < \dots$ of primes such that $a_i + a_j - 1$ is prime for all $i < j$.
- (ii) Unconditionally, there exist increasing sequences $a_1 < a_2 < a_3 < \dots$ and $b_1 < b_2 < b_3 < \dots$ of natural numbers such that $a_i + b_j$ is prime for all $i < j$.
- (iii) These conclusions fail if "prime" is replaced by "positive (relative) density subset of the primes" (even if the density is equal to 1).
We remark that it was shown by Balog that there (unconditionally) exist arbitrarily long but finite sequences $p_1 < p_2 < \dots < p_k$ of primes such that $(p_i + p_j)/2$ is prime for all $i < j$. (This result can also be recovered from the later results of Ben Green, myself, and Tamar Ziegler.) Also, it had previously been shown by Granville that on the Hardy-Littlewood prime tuples conjecture, there existed increasing sequences $a_1 < a_2 < a_3 < \dots$ and $b_1 < b_2 < b_3 < \dots$ of natural numbers such that $a_i + b_j$ is prime for all $i, j$.
The conclusion of (i) is stronger than that of (ii) (which is of course consistent with the former being conditional and the latter unconditional). The conclusion (ii) also implies the well-known theorem of Maynard that for any given $k$, there exist infinitely many $k$-tuples of primes of bounded diameter, and indeed our proof of (ii) uses the same "Maynard sieve" that powers the proof of that theorem (though we use a formulation of that sieve closer to that in this blog post of mine). Indeed, the failure of (iii) basically arises from the failure of Maynard's theorem for dense subsets of primes, simply by removing those clusters of primes that are unusually closely spaced.
Our proof of (i) was initially inspired by the topological dynamics methods used by Kra, Moreira, Richter, and Robertson, but we managed to condense it to a purely elementary argument (taking up only half a page) that makes no reference to topological dynamics and builds up the sequence recursively by repeated application of the prime tuples conjecture.
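To see the flavour of the greedy recursion, here is a minimal numerical sketch (my own toy illustration, assuming the $a_i + a_j - 1$ pattern of Theorem 1(i) and the sympy library; restricting to primes $\equiv 1 \pmod 3$ is a convenience that keeps $a_i + a_j - 1$ away from multiples of $3$, a toy analogue of choosing an admissible configuration). In the actual proof, the prime tuples conjecture is what guarantees the extension step never gets stuck:

```python
from sympy import isprime, nextprime

# Greedily extend a list of primes so that p + q - 1 stays prime for every
# pair already chosen; we restrict to primes = 1 (mod 3) so that p + q - 1
# is never divisible by 3.
seq = [7]
while len(seq) < 6:
    q = nextprime(seq[-1])
    while q % 3 != 1 or not all(isprime(p + q - 1) for p in seq):
        q = nextprime(q)
    seq.append(q)

print(seq)  # [7, 13, 31, 67, 97, 367]: p + q - 1 is prime for all pairs
```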
The proof of (ii) takes up the majority of the paper. It is easiest to phrase the argument in terms of "prime-producing tuples" – tuples $(h_1, \dots, h_k)$ for which there are infinitely many $n$ with $n + h_1, \dots, n + h_k$ all prime. Maynard's theorem is equivalent to the existence of arbitrarily long prime-producing tuples; our theorem is equivalent to the stronger assertion that there exists an infinite sequence $h_1 < h_2 < h_3 < \dots$ such that every initial segment $(h_1, \dots, h_k)$ is prime-producing. The main new tool for achieving this is the following cute measure-theoretic lemma of Bergelson:
Lemma 2 (Bergelson intersectivity lemma) Let $E_1, E_2, E_3, \dots$ be subsets of a probability space $(X, \mu)$ of measure uniformly bounded away from zero, thus $\inf_n \mu(E_n) > 0$. Then there exists a subsequence $E_{n_1}, E_{n_2}, E_{n_3}, \dots$ such that

$$\mu(E_{n_1} \cap \dots \cap E_{n_k}) > 0$$

for all $k$.
This lemma has a short proof, though not an entirely obvious one. Firstly, by deleting a null set from $X$, one can assume that all finite intersections $E_{n_1} \cap \dots \cap E_{n_k}$ are either positive measure or empty. Secondly, a routine application of Fatou's lemma shows that the maximal function

$$\limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^N 1_{E_n}$$

has a positive integral, hence must be positive at some point $x_0$. Thus there is a subsequence $E_{n_1}, E_{n_2}, E_{n_3}, \dots$ whose finite intersections all contain $x_0$, and thus have positive measure as desired by the previous reduction.
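To spell out the Fatou step: the averages $\frac{1}{N} \sum_{n=1}^N 1_{E_n}$ are bounded by $1$, so the reverse form of Fatou's lemma applies and gives

$$\int_X \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^N 1_{E_n}\, d\mu \;\geq\; \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^N \mu(E_n) \;\geq\; \inf_n \mu(E_n) > 0,$$

so the maximal function is positive at some point $x_0$, which then necessarily lies in infinitely many of the $E_n$.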
It turns out that one cannot quite combine the standard Maynard sieve with the intersectivity lemma because the events that show up (which roughly correspond to the event that $n + h_i$ is prime for some random number $n$ (with a well-chosen probability distribution) and some shift $h_i$) have their probability going to zero, rather than being uniformly bounded from below. To get around this, we borrow an idea from a paper of Banks, Freiberg, and Maynard, and group the shifts $h_i$ into various clusters $h_{i,1}, \dots, h_{i,J_i}$, chosen in such a way that the probability that at least one of $n + h_{i,1}, \dots, n + h_{i,J_i}$ is prime is bounded uniformly from below. One then applies the Bergelson intersectivity lemma to those events and uses many applications of the pigeonhole principle to conclude.
Kaisa Matomäki, Maksym Radziwill, Joni Teräväinen, Tamar Ziegler and I have uploaded to the arXiv our paper Higher uniformity of bounded multiplicative functions in short intervals on average. This paper (which originated from a working group at an AIM workshop on Sarnak's conjecture) focuses on the local Fourier uniformity conjecture for bounded multiplicative functions such as the Liouville function $\lambda$. One form of this conjecture is the assertion that

$$\int_X^{2X} \| \lambda \|_{U^k([x, x+H])}\, dx = o(X)$$

as $X \to \infty$, for any fixed $k \geq 1$ and any $H = H(X) \leq X$ that goes to infinity as $X \to \infty$. The conjecture gets more difficult as $k$ increases, and also becomes more difficult the more slowly $H$ grows with $X$. The $k=1$ conjecture is equivalent to the assertion

$$\int_X^{2X} \Big| \sum_{x \leq n \leq x+H} \lambda(n) \Big|\, dx = o(XH).$$

For $k = 2$, the conjecture is equivalent to the assertion

$$\int_X^{2X} \sup_\alpha \Big| \sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n) \Big|\, dx = o(XH).$$
Now we apply the same strategy to (4). For abelian $G$ the claim follows easily from (3), so we focus on the non-abelian case. One now has a polynomial sequence $g_x$ attached to many $x$, and after a somewhat complicated adaptation of the above arguments one again ends up with an approximate functional equation for these sequences.
We give two applications of this higher order Fourier uniformity. One regards the growth of the number of length-$k$ sign patterns of the Liouville function, which we can now show grows superpolynomially in $k$. The second application is to obtain cancellation for various polynomial averages involving the Liouville function $\lambda$ or von Mangoldt function $\Lambda$, such as averages of these functions along polynomial progressions such as $n, n+m^2$.
Tamar Ziegler and I have just uploaded to the arXiv two related papers: "Concatenation theorems for anti-Gowers-uniform functions and Host-Kra characteristic factors" and "Polynomial patterns in the primes", with the former developing a "quantitative Bessel inequality" for local Gowers norms that is crucial in the latter.
We use the term "concatenation theorem" to denote results in which structural control of a function in two or more "directions" can be "concatenated" into structural control in a joint direction. A trivial example of such a concatenation theorem is the following: if a function $f: \mathbb{Z}^2 \to \mathbb{R}$ is constant in the first variable (thus $x \mapsto f(x,y)$ is constant for each $y$), and also constant in the second variable (thus $y \mapsto f(x,y)$ is constant for each $x$), then it is constant in the joint variable $(x,y)$.
A slightly less trivial example: if a function $f: \mathbb{Z}^2 \to \mathbb{R}$ is affine-linear in the first variable (thus, for each $y$, there exist $\alpha(y), \beta(y)$ such that $f(x,y) = \alpha(y) x + \beta(y)$ for all $x$) and affine-linear in the second variable (thus, for each $x$, there exist $\gamma(x), \delta(x)$ such that $f(x,y) = \gamma(x) y + \delta(x)$ for all $y$) then $f$ is a quadratic polynomial in $x, y$; in fact it must take the form

$$f(x,y) = \epsilon x y + \zeta x + \eta y + \theta \qquad (1)$$

for some real numbers $\epsilon, \zeta, \eta, \theta$. (This can be seen for instance by using the affine linearity in $y$ to show that the coefficients $\alpha(y), \beta(y)$ are also affine linear.)
The same phenomenon extends to higher degree polynomials. Given a function $f: G \to K$ from one additive group $G$ to another $K$, we say that $f$ is of degree less than $d$ along a subgroup $H$ of $G$ if all the $d$-fold iterated differences of $f$ along directions in $H$ vanish, that is to say

$$\partial_{h_1} \dots \partial_{h_d} f(x) = 0$$

for all $x \in G$ and $h_1, \dots, h_d \in H$, where $\partial_h$ is the difference operator

$$\partial_h f(x) := f(x+h) - f(x).$$

(We adopt the convention that the only function of degree less than $0$ is the zero function.)
We then have the following simple proposition:
Proposition 1 (Concatenation of polynomiality) Let $f: G \to K$ be of degree less than $d_1$ along one subgroup $H_1$ of $G$, and of degree less than $d_2$ along another subgroup $H_2$ of $G$, for some $d_1, d_2 \geq 1$. Then $f$ is of degree less than $d_1 + d_2 - 1$ along the subgroup $H_1 + H_2$ of $G$.
Note the previous example was basically the case when $G = \mathbb{Z}^2$, $H_1 = \mathbb{Z} \times \{0\}$, $H_2 = \{0\} \times \mathbb{Z}$, $d_1 = d_2 = 2$, and $d_1 + d_2 - 1 = 3$.
Proof: The claim is trivial for $d_1 = 1$ or $d_2 = 1$ (in which case $f$ is constant along $H_1$ or $H_2$ respectively), so suppose inductively that $d_1, d_2 \geq 2$ and the claim has already been proven for smaller values of $d_1 + d_2$.
We take a derivative in a direction $h_1 \in H_1$ to obtain

$$T_{h_1} f = f + \partial_{h_1} f,$$

where $T_{h_1} f(x) := f(x + h_1)$ is the shift of $f$ by $h_1$. Then we take a further shift by a direction $h_2 \in H_2$ to obtain

$$T_{h_1 + h_2} f = T_{h_2} f + T_{h_2} \partial_{h_1} f,$$

leading to the cocycle equation

$$\partial_{h_1 + h_2} f = \partial_{h_2} f + T_{h_2} \partial_{h_1} f.$$

Since $f$ has degree less than $d_1$ along $H_1$ and degree less than $d_2$ along $H_2$, $\partial_{h_1} f$ has degree less than $d_1 - 1$ along $H_1$ and less than $d_2$ along $H_2$, so it is of degree less than $d_1 + d_2 - 2$ along $H_1 + H_2$ by induction hypothesis (and the same is true of its shift $T_{h_2} \partial_{h_1} f$). Similarly $\partial_{h_2} f$ is also of degree less than $d_1 + d_2 - 2$ along $H_1 + H_2$. Combining this with the cocycle equation we see that $\partial_{h_1 + h_2} f$ is of degree less than $d_1 + d_2 - 2$ along $H_1 + H_2$ for any $h_1 \in H_1$, $h_2 \in H_2$, and hence $f$ is of degree less than $d_1 + d_2 - 1$ along $H_1 + H_2$, as required. $\Box$
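As a quick sanity check of Proposition 1, the following toy sketch (hypothetical code of mine, not from the paper) verifies the case $G = \mathbb{Z}^2$, $H_1 = \mathbb{Z} \times \{0\}$, $H_2 = \{0\} \times \mathbb{Z}$ numerically for the bilinear function $f(x,y) = xy$, which has degree less than $2$ along each axis and hence degree less than $3$ jointly:

```python
import itertools

def diff(f, h):
    # Difference operator along h: (partial_h f)(x) = f(x + h) - f(x)
    return lambda x: f((x[0] + h[0], x[1] + h[1])) - f(x)

f = lambda x: x[0] * x[1]  # bilinear example

H1 = [(h, 0) for h in range(-2, 3)]
H2 = [(0, k) for k in range(-2, 3)]
H12 = [(h, k) for h in range(-2, 3) for k in range(-2, 3)]  # H1 + H2
pts = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]

def max_iterated_diff(dirs, order):
    # largest |partial_{h_1} ... partial_{h_order} f(x)| over sampled data
    worst = 0
    for hs in itertools.product(dirs, repeat=order):
        g = f
        for h in hs:
            g = diff(g, h)
        worst = max(worst, max(abs(g(x)) for x in pts))
    return worst

print(max_iterated_diff(H1, 2))   # 0: degree < 2 along H1
print(max_iterated_diff(H2, 2))   # 0: degree < 2 along H2
print(max_iterated_diff(H12, 3))  # 0: degree < 2 + 2 - 1 = 3 along H1 + H2
```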
While this proposition is simple, it already illustrates some basic principles regarding how one would go about proving a concatenation theorem:
- (i) One should perform induction on the degrees $d_1, d_2$ involved, and take advantage of the recursive nature of degree (in this case, the fact that a function is of degree less than $d$ along some subgroup $H$ of directions iff all of its first derivatives along $H$ are of degree less than $d-1$).
- (ii) Structure is preserved by operations such as addition, shifting, and taking derivatives. In particular, if a function $f$ is of degree less than $d$ along some subgroup $H$, then any derivative $\partial_k f$ of $f$ is also of degree less than $d$ along $H$, even if $k$ does not belong to $H$.
Here is another simple example of a concatenation theorem. Suppose an at most countable additive group $G$ acts by measure-preserving shifts $T^g: X \to X$ on some probability space $(X, \mu)$; we call the pair $(X, (T^g)_{g \in G})$ (or more precisely $(X, \mathcal{X}, \mu, (T^g)_{g \in G})$) a $G$-system. We say that a function $f \in L^\infty(X)$ is a generalised eigenfunction of degree less than $d$ along some subgroup $H$ of $G$ and some $d \geq 1$ if one has

$$T^h f = \lambda_h f$$

almost everywhere for all $h \in H$, and some functions $\lambda_h \in L^\infty(X)$ of degree less than $d-1$ along $H$, with the convention that a function has degree less than $0$ if and only if it is equal to $1$. Thus for instance, a function $f$ is a generalised eigenfunction of degree less than $1$ along $H$ if it is constant on almost every $H$-ergodic component of $X$, and is a generalised eigenfunction of degree less than $2$ along $H$ if it is an eigenfunction of the shift action on almost every $H$-ergodic component of $X$. A basic example of a higher order eigenfunction is the function $f(x,y) := e(y)$ on the skew shift $(\mathbb{R}/\mathbb{Z})^2$ with $\mathbb{Z}$ action given by the generator $T(x,y) := (x + \alpha, y + x)$ for some irrational $\alpha$. One can check that

$$T^n f = \lambda_n f$$

for every integer $n$, where $\lambda_n(x,y) := e\big(nx + \binom{n}{2}\alpha\big)$ is a generalised eigenfunction of degree less than $2$ along $\mathbb{Z}$, so $f$ is of degree less than $3$ along $\mathbb{Z}$.
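For the record, the computation behind this claim is the standard skew shift calculation: induction on $n$ gives

$$T^n(x,y) = \Big(x + n\alpha,\; y + nx + \binom{n}{2}\alpha\Big),$$

so that $T^n f(x,y) = e\big(y + nx + \binom{n}{2}\alpha\big) = \lambda_n(x,y)\, f(x,y)$, and each $\lambda_n$ is an ordinary eigenfunction, since $\lambda_n \circ T = e(n\alpha)\, \lambda_n$ with constant eigenvalue $e(n\alpha)$.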
We then have
Proposition 2 (Concatenation of higher order eigenfunctions) Let $(X, (T^g)_{g \in G})$ be a $G$-system, and let $f$ be a generalised eigenfunction of degree less than $d_1$ along one subgroup $H_1$ of $G$, and a generalised eigenfunction of degree less than $d_2$ along another subgroup $H_2$ of $G$, for some $d_1, d_2 \geq 1$. Then $f$ is a generalised eigenfunction of degree less than $d_1 + d_2 - 1$ along the subgroup $H_1 + H_2$ of $G$.
The argument is almost identical to that of the previous proposition and is left as an exercise to the reader. The key point is the point (ii) identified earlier: the space of generalised eigenfunctions of degree less than $d$ along $H$ is preserved by multiplication and shifts, as well as the operation of "taking derivatives" $f \mapsto T^k f / f$ even along directions $k$ that do not lie in $H$. (To prove this latter claim, one should restrict to the region where $f$ is non-zero, and then divide $T^k f$ by $f$ to locate the derivative.)
A typical example of this proposition in action is as follows: consider the $\mathbb{Z}^2$-system given by the $3$-torus $(\mathbb{R}/\mathbb{Z})^3$ with generating shifts

$$T^{(1,0)}(x,y,z) := (x+\alpha, y, z+y), \qquad T^{(0,1)}(x,y,z) := (x, y+\alpha, z+x)$$

for some irrational $\alpha$, which can be checked to give a $\mathbb{Z}^2$ action

$$T^{(n,m)}(x,y,z) = (x + n\alpha,\; y + m\alpha,\; z + ny + mx + nm\alpha).$$

The function $f(x,y,z) := e(z)$ can then be checked to be a generalised eigenfunction of degree less than $2$ along $\mathbb{Z} \times \{0\}$, and also less than $2$ along $\{0\} \times \mathbb{Z}$, and less than $3$ along $\mathbb{Z}^2$. One can view this example as the dynamical systems translation of the example (1) (see this previous post for some more discussion of this sort of correspondence).
The main results of our concatenation paper are analogues of these propositions concerning a more complicated notion of "polynomial-like" structure that are of importance in additive combinatorics and in ergodic theory. On the ergodic theory side, the notion of structure is captured by the Host-Kra characteristic factors $Z^{<d}_H(X)$ of a $G$-system $X$ along a subgroup $H$. These factors can be defined in a number of ways. One is by duality, using the Gowers-Host-Kra uniformity seminorms (defined for instance here) $\| \cdot \|_{U^d_H(X)}$. Namely, $Z^{<d}_H(X)$ is the factor of $X$ defined up to equivalence by the requirement that

$$\|f\|_{U^d_H(X)} = 0 \iff \mathbb{E}(f | Z^{<d}_H(X)) = 0.$$
An equivalent definition is in terms of the dual functions $\mathcal{D}^d_H(f)$ of $f$ along $H$, which can be defined recursively by setting $\mathcal{D}^0_H(f) := 1$ and

$$\mathcal{D}^{d+1}_H(f) := \mathbb{E}_{h \in H} T^h f \, \mathcal{D}^d_H(\Delta_h f),$$

where $\Delta_h f := T^h f \cdot \overline{f}$ and $\mathbb{E}_{h \in H}$ denotes the ergodic average along a Følner sequence in $H$ (in fact one can also define these concepts in non-amenable abelian settings as per this previous post). The factor $Z^{<d}_H(X)$ can then be alternately defined as the factor generated by the dual functions $\mathcal{D}^d_H(f)$ for $f \in L^\infty(X)$.
In the case when $G = H = \mathbb{Z}$ and $X$ is $\mathbb{Z}$-ergodic, a deep theorem of Host and Kra shows that the factor $Z^{<d}_{\mathbb{Z}}(X)$ is equivalent to the inverse limit of nilsystems of step less than $d$. A similar statement holds with $\mathbb{Z}$ replaced by any finitely generated group by Griesmer, while the case of an infinite vector space over a finite field was treated in this paper of Bergelson, Ziegler, and myself. The situation is more subtle when $X$ is not $G$-ergodic, or when $X$ is $G$-ergodic but $H$ is a proper subgroup of $G$ acting non-ergodically, when one has to start considering measurable families of directional nilsystems; see for instance this paper of Austin for some of the subtleties involved (for instance, higher order group cohomology begins to become relevant!).
One of our main theorems is then
Proposition 3 (Concatenation of characteristic factors) Let $(X, (T^g)_{g \in G})$ be a $G$-system, and let $f$ be measurable with respect to the factor $Z^{<d_1}_{H_1}(X)$ and with respect to the factor $Z^{<d_2}_{H_2}(X)$ for some $d_1, d_2 \geq 1$ and some subgroups $H_1, H_2$ of $G$. Then $f$ is also measurable with respect to the factor $Z^{<d_1 + d_2 - 1}_{H_1 + H_2}(X)$.
We give two proofs of this proposition in the paper; an ergodic-theoretic proof using the Host-Kra theory of "cocycles of type $<d$ (along a subgroup $H$)", which can be used to inductively describe the factors $Z^{<d}_H(X)$, and a combinatorial proof based on a combinatorial analogue of this proposition which is harder to state (but which roughly speaking asserts that a function which is nearly orthogonal to all bounded functions of small $U^{d_1}_{H_1}$ norm, and also to all bounded functions of small $U^{d_2}_{H_2}$ norm, is also nearly orthogonal to all bounded functions of small $U^{d_1+d_2-1}_{H_1+H_2}$ norm). The combinatorial proof parallels the proof of Proposition 2. A key point is that dual functions $F = \mathcal{D}^d_H(f)$ obey a property analogous to being a generalised eigenfunction, namely that

$$T^h F = F G_h$$

where $h \in H$ and $G_h$ is a "structured function of order $d-1$" along $H$. (In the language of this previous paper of mine, this is an assertion that dual functions are uniformly almost periodic of order $d-1$.) Again, the point (ii) above is crucial, and in particular it is key that any structure that $F$ has is inherited by the associated functions $G_h$ and $T^h F$. This sort of inheritance is quite easy to accomplish in the ergodic setting, as there is a ready-made language of factors to encapsulate the concept of structure, and the shift-invariance and $\sigma$-algebra properties of factors make it easy to show that just about any "natural" operation one performs on a function measurable with respect to a given factor, returns a function that is still measurable in that factor. In the finitary combinatorial setting, though, encoding the fact (ii) becomes a remarkably complicated notational nightmare, requiring a huge amount of "epsilon management" and "second-order epsilon management" (in which one manages not only scalar epsilons, but also function-valued epsilons that depend on other parameters). In order to avoid all this we were forced to utilise a nonstandard analysis framework for the combinatorial theorems, which made the arguments greatly resemble the ergodic arguments in many respects (though the two settings are still not equivalent, see this previous blog post for some comparisons between the two settings). Unfortunately the arguments are still rather complicated.
For combinatorial applications, dual formulations of the concatenation theorem are more useful. A direct dualisation of the theorem yields the following decomposition theorem: a bounded function which is small in $U^{d_1+d_2-1}_{H_1+H_2}$ norm can be split into a component that is small in $U^{d_1}_{H_1}$ norm, and a component that is small in $U^{d_2}_{H_2}$ norm. (One may wish to understand this type of result by first proving the following baby version: any function that has mean zero on every coset of $H_1 + H_2$, can be decomposed as the sum of a function that has mean zero on every $H_1$ coset, and a function that has mean zero on every $H_2$ coset. This is dual to the assertion that a function that is constant on every $H_1$ coset and constant on every $H_2$ coset, is constant on every $H_1 + H_2$ coset.) Combining this with some standard "almost orthogonality" arguments (i.e. Cauchy-Schwarz) gives the following Bessel-type inequality: if one has a lot of subgroups $H_1, \dots, H_k$ and a bounded function is small in $U^{2d-1}_{H_i + H_j}$ norm for most $i, j$, then it is also small in $U^d_{H_i}$ norm for most $i$. (Here is a baby version one may wish to warm up on: if a function $f$ has small mean on $(\mathbb{Z}/p\mathbb{Z})^2$ for some large prime $p$, then it has small mean on most of the cosets of most of the one-dimensional subgroups of $(\mathbb{Z}/p\mathbb{Z})^2$.)
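Incidentally, the baby decomposition above can be written down explicitly (a sketch, in the setting where all groups are finite and means are uniform averages): if $f$ has mean zero on every coset of $H_1 + H_2$, split

$$f = \big(f - \mathbb{E}(f \mid x + H_1)\big) + \mathbb{E}(f \mid x + H_1),$$

where $\mathbb{E}(f \mid x + H_1)$ denotes the average of $f$ over the coset $x + H_1$. The first term has mean zero on every $H_1$ coset by construction, while averaging the second term over a coset $y + H_2$ amounts to averaging $f$ over $y + H_1 + H_2$, which vanishes by hypothesis.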
There is also a generalisation of the above Bessel inequality (as well as several of the other results mentioned above) in which the subgroups $H_i$ are replaced by more general coset progressions $Q_i$ (of bounded rank), so that one has a Bessel inequality controlling "local" Gowers uniformity norms such as $U^d(Q_i)$ by "global" Gowers uniformity norms such as $U^d(Q_i + Q_j)$. This turns out to be particularly useful when attempting to compute polynomial averages such as

$$\frac{1}{N M} \sum_{n \leq N} \sum_{m \leq M} f(n)\, g(n + m^2)\, h(n + 2m^2)$$

for various functions $f, g, h$. After repeated use of the van der Corput lemma, one can control such averages by expressions such as

$$\frac{1}{M} \sum_{m \leq M} \| f \|_{U^d(\{ 2 m m' : m' \leq M \})} \qquad (2)$$

(actually one ends up with more complicated expressions than this, but let's use this example for sake of discussion). This can be viewed as an average of various Gowers uniformity norms of $f$ along arithmetic progressions of the form $\{ 2 m m' : m' \leq M \}$ for various $m$. Using the above Bessel inequality, this can be controlled in turn by an average of various Gowers uniformity norms along rank two generalised arithmetic progressions of the form $\{ 2 m m' + 2 \tilde{m} m'' : m', m'' \leq M \}$ for various $m, \tilde{m}$. But for generic $m, \tilde{m}$, this rank two progression is close in a certain technical sense to the "global" interval $\{1, \dots, N\}$ (this is ultimately due to the basic fact that two randomly chosen large integers are likely to be coprime, or at least have a small gcd). As a consequence, one can use the concatenation theorems from our first paper to control expressions such as (2) in terms of global Gowers uniformity norms. This is important in number theoretic applications, when one is interested in computing sums such as

$$\frac{1}{N M} \sum_{n \leq N} \sum_{m \leq M} \mu(n)\, \mu(n + m^2)\, \mu(n + 2m^2)$$

or

$$\frac{1}{N M} \sum_{n \leq N} \sum_{m \leq M} \Lambda(n)\, \Lambda(n + m^2)\, \Lambda(n + 2m^2),$$

where $\mu$ and $\Lambda$ are the Möbius and von Mangoldt functions respectively. This is because we are able to control global Gowers uniformity norms of such functions (thanks to results such as the proof of the inverse conjecture for the Gowers norms, the orthogonality of the Möbius function with nilsequences, and asymptotics for linear equations in primes), but much less control is currently available for local Gowers uniformity norms, even with the assistance of the generalised Riemann hypothesis (see this previous blog post for some further discussion).
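As an aside, the coprimality fact invoked here is easy to test numerically (a toy Monte Carlo of mine, standard library only): the probability that two random large integers are coprime tends to $6/\pi^2 \approx 0.61$, which is why a generic rank two progression has generators with small gcd.

```python
import math, random

N, trials = 10**9, 200_000
hits = sum(math.gcd(random.randrange(1, N), random.randrange(1, N)) == 1
           for _ in range(trials))
print(hits / trials)      # empirical coprimality rate
print(6 / math.pi ** 2)   # limiting value 6/pi^2 = 0.6079...
```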
By combining these tools and strategies with the “transference principle” approach from our previous paper (as improved using the recent “densification” technique of Conlon, Fox, and Zhao, discussed in this previous post), we are able in particular to establish the following result:
Theorem 4 (Polynomial patterns in the primes) Let $P_1, \dots, P_k: \mathbb{Z} \to \mathbb{Z}$ be polynomials of degree at most $d$, whose degree $d$ coefficients are all distinct, for some $k, d \geq 1$. Suppose that $P_1, \dots, P_k$ is admissible in the sense that for every prime $p$, there are $n, m$ such that $n + P_1(m), \dots, n + P_k(m)$ are all coprime to $p$. Then there exist infinitely many pairs $(n, m)$ of natural numbers such that $n + P_1(m), \dots, n + P_k(m)$ are prime.
Furthermore, we obtain an asymptotic for the number of such pairs in the range $n \leq N$, $m \leq N^{1/d}$ (actually for minor technical reasons we reduce the range of $m$ to be very slightly less than $N^{1/d}$). In fact one could in principle obtain asymptotics for smaller values of $m$, and relax the requirement that the degree $d$ coefficients be distinct with the requirement that no two of the $P_i$ differ by a constant, provided one had good enough local uniformity results for the Möbius or von Mangoldt functions. For instance, we can obtain an asymptotic for triplets of the form $n, n+m, n+m^2$ unconditionally for a somewhat reduced range of $m$, and conditionally on GRH for an essentially full range of $m$, using known results on primes in short intervals on average.
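To illustrate the admissibility condition in Theorem 4 concretely, here is a hypothetical toy check (the tuple $P(m) = (0, m^2, 2m^2)$ is my own example, not one from the paper; note that for $k$ polynomials only primes $p \leq k$ can fail, since for $p > k$ one can always pick $m = 0$ and then choose $n$ avoiding the at most $k$ bad residues mod $p$):

```python
from itertools import product

# Check admissibility of the tuple (0, m^2, 2m^2): for each prime p we need
# some residues (n, m) mod p with n + P_i(m) nonzero mod p for every i.
P = [lambda m: 0, lambda m: m * m, lambda m: 2 * m * m]

def admissible_at(p):
    return any(all((n + Pi(m)) % p != 0 for Pi in P)
               for n, m in product(range(p), repeat=2))

print({p: admissible_at(p) for p in [2, 3, 5, 7]})  # all True

# e.g. at p = 2: take m = 0, n = 1, so all three values are odd.
```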
The case $d = 1$ of this theorem was obtained in a previous paper of myself and Ben Green (using the aforementioned conjectures on the Gowers uniformity norm and the orthogonality of the Möbius function with nilsequences, both of which are now proven). For higher $d$, an older result of Tamar and myself was able to tackle the case when $P_1(0) = \dots = P_k(0) = 0$ (though our results there only give lower bounds on the number of pairs $(n, m)$, and no asymptotics). Both of these results generalise my older theorem with Ben Green on the primes containing arbitrarily long arithmetic progressions. The theorem also extends to multidimensional polynomials, in which case there are some additional previous results; see the paper for more details. We also get a technical refinement of our previous result on narrow polynomial progressions in (dense subsets of) the primes by making the progressions just a little bit narrower in the case when the density of the set one is using is small.
A few years ago, Ben Green, Tamar Ziegler, and myself proved the following (rather technical-looking) inverse theorem for the Gowers norms:
Theorem 1 (Discrete inverse theorem for Gowers norms) Let $N \geq 1$ and $s \geq 1$ be integers, and let $\delta > 0$. Suppose that $f: \mathbb{Z} \to [-1,1]$ is a function supported on $\{1, \dots, N\}$ such that

$$\|f\|_{U^{s+1}[N]} \geq \delta.$$

Then there exists a filtered nilmanifold $G/\Gamma$ of degree $\leq s$ and complexity $O_{s,\delta}(1)$, a polynomial sequence $g: \mathbb{Z} \to G$, and a Lipschitz function $F: G/\Gamma \to \mathbb{R}$ of Lipschitz constant $O_{s,\delta}(1)$ such that

$$\Big| \frac{1}{N} \sum_{n} f(n)\, F(g(n)\Gamma) \Big| \gg_{s,\delta} 1.$$

For the definitions of "filtered nilmanifold", "degree", "complexity", and "polynomial sequence", see the paper of Ben, Tammy, and myself. (I should caution the reader that this blog post will presume a fair amount of familiarity with this subfield of additive combinatorics.) This result has a number of applications, for instance to establishing asymptotics for linear equations in the primes, but this will not be the focus of discussion here.
The purpose of this post is to record the observation that this “discrete” inverse theorem, together with an equidistribution theorem for nilsequences that Ben and I worked out in a separate paper, implies a continuous version:
Theorem 2 (Continuous inverse theorem for Gowers norms) Let $s \geq 1$ be an integer, and let $\delta > 0$. Suppose that $f: \mathbb{R} \to [-1,1]$ is a measurable function supported on $[0,1]$ such that

$$\|f\|_{U^{s+1}[0,1]} \geq \delta.$$

Then there exists a filtered nilmanifold $G/\Gamma$ of degree $\leq s$ and complexity $O_{s,\delta}(1)$, a (smooth) polynomial sequence $g: \mathbb{R} \to G$, and a Lipschitz function $F: G/\Gamma \to \mathbb{R}$ of Lipschitz constant $O_{s,\delta}(1)$ such that

$$\Big| \int_{\mathbb{R}} f(t)\, F(g(t)\Gamma)\, dt \Big| \gg_{s,\delta} 1.$$
The interval $[0,1]$ can be easily replaced with any other fixed interval by a change of variables. A key point here is that the bounds are completely uniform in the choice of $f$. Note though that the coefficients of $g$ can be arbitrarily large (and this is necessary, as can be seen just by considering functions of the form $f(t) = \cos(2\pi \xi t) 1_{[0,1]}(t)$ for some arbitrarily large frequency $\xi$).
It is likely that one could prove Theorem 2 by carefully going through the proof of Theorem 1 and replacing all instances of $\mathbb{Z}$ with $\mathbb{R}$ (and making appropriate modifications to the argument to accommodate this). However, the proof of Theorem 1 is quite lengthy. Here, we shall proceed by the usual limiting process of viewing the continuous interval $[0,1]$ as a limit of the discrete interval $\frac{1}{N} \{1, \dots, N\}$ as $N \to \infty$. However there will be some problems taking the limit due to a failure of compactness, and specifically with regards to the coefficients of the polynomial sequence $g$ produced by Theorem 1, after normalising these coefficients by $N$. Fortunately, a factorisation theorem from a paper of Ben Green and myself resolves this problem by splitting $g$ into a "smooth" part which does enjoy good compactness properties, as well as "totally equidistributed" and "periodic" parts which can be eliminated using the measurability (and thus, approximate smoothness), of $f$.
Tamar Ziegler and I have just uploaded to the arXiv our paper “Narrow progressions in the primes“, submitted to the special issue “Analytic Number Theory” in honor of the 60th birthday of Helmut Maier. The results here are vaguely reminiscent of the recent progress on bounded gaps in the primes, but use different methods.
About a decade ago, Ben Green and I showed that the primes contained arbitrarily long arithmetic progressions: given any $k$, one could find a progression $a, a+r, \dots, a+(k-1)r$ with $r > 0$ consisting entirely of primes. In fact we showed the same statement was true if the primes were replaced by any subset of the primes of positive relative density.
A little while later, Tamar Ziegler and I obtained the following generalisation: given any $k$ and any polynomials $P_1, \dots, P_k: \mathbb{Z} \to \mathbb{Z}$ with $P_1(0) = \dots = P_k(0) = 0$, one could find a "polynomial progression" $a + P_1(r), \dots, a + P_k(r)$ with $r > 0$ consisting entirely of primes. Furthermore, we could make this progression somewhat "narrow" by taking $r \leq a^{o(1)}$ (where $o(1)$ denotes a quantity that goes to zero as $a$ goes to infinity). Again, the same statement also applies if the primes were replaced by a subset of positive relative density. My previous result with Ben corresponds to the linear case $P_i(r) = (i-1)r$.
In this paper we were able to make the progressions a bit narrower still: given any $k$ and any polynomials $P_1, \dots, P_k: \mathbb{Z} \to \mathbb{Z}$ with $P_1(0) = \dots = P_k(0) = 0$, one could find a "polynomial progression" $a + P_1(r), \dots, a + P_k(r)$ with $r > 0$ consisting entirely of primes, and such that $r \leq \log^L a$, where $L$ depends only on $k$ and $P_1, \dots, P_k$ (in fact it depends only on $k$ and the degrees of $P_1, \dots, P_k$). The result is still true if the primes are replaced by a subset of positive density $\delta$, but unfortunately in our arguments we must then let $L$ depend on $\delta$. However, in the linear case $P_i(r) = (i-1)r$, we were able to make $L$ independent of $\delta$ (although it is still somewhat large).
The polylogarithmic factor is somewhat necessary: using an upper bound sieve, one can easily construct a subset of the primes of density, say, $1/2$, whose arithmetic progressions $a, a+r, \dots, a+(k-1)r$ of length $k$ all obey the lower bound $r \gg \log^{L} a$ for some $L$ depending on $k$. On the other hand, the prime tuples conjecture predicts that if one works with the actual primes rather than dense subsets of the primes, then one should have infinitely many length $k$ arithmetic progressions of bounded width for any fixed $k$. The $k = 2$ case of this is precisely the celebrated theorem of Yitang Zhang that was the focus of the recently concluded Polymath8 project here. The higher $k$ case is conjecturally true, but appears to be out of reach of known methods. (Using the multidimensional Selberg sieve of Maynard, one can get $m$ primes inside an interval of length $O(\exp(Cm))$, but this is such a sparse set of primes that one would not expect to find even a progression of length three within such an interval.)
The argument in the previous paper was unable to obtain a polylogarithmic bound on the width of the progressions, due to the reliance on a certain technical "correlation condition" on a certain Selberg sieve weight $\nu$. This correlation condition required one to control arbitrarily long correlations of $\nu$, which was not compatible with a bounded value of $L$ (particularly if one wanted to keep $L$ independent of $\delta$).
However, thanks to recent advances in this area by Conlon, Fox, and Zhao (who introduced a very nice "densification" technique), it is now possible (in principle, at least) to delete this correlation condition from the arguments. Conlon-Fox-Zhao did this for my original theorem with Ben; and in the current paper we apply the densification method to our previous argument to similarly remove the correlation condition. This method does not fully eliminate the need to control arbitrarily long correlations, but allows most of the factors in such a long correlation to be bounded, rather than merely controlled by an unbounded weight such as $\nu$. This turns out to be significantly easier to control, although in the non-linear case we still unfortunately had to make $L$ large compared to $1/\delta$ due to a certain "clearing denominators" step arising from the complicated nature of the Gowers-type uniformity norms that we were using to control polynomial averages. We believe though that this is an artefact of our method, and one should be able to prove our theorem with an $L$ that is uniform in $\delta$.
Here is a simple instance of the densification trick in action. Suppose that one wishes to establish an estimate of the form

$$\Big| \mathop{\mathbb{E}}_{n, r} f_1(n)\, f_2(n + P_1(r))\, f_3(n + P_2(r)) \Big| \leq \varepsilon \qquad (1)$$

for some real-valued functions $f_1, f_2, f_3$ which are bounded in magnitude by a weight function $\nu$, but which are not expected to be bounded; this average will naturally arise when trying to locate the pattern $n, n + P_1(r), n + P_2(r)$ in a set such as the primes. Here I will be vague as to exactly what range the parameters $n, r$ are being averaged over. Suppose that the factor $f_3$ (say) has enough uniformity that one can already show a smallness bound

$$\Big| \mathop{\mathbb{E}}_{n, r} g_1(n)\, g_2(n + P_1(r))\, f_3(n + P_2(r)) \Big| \leq \varepsilon \qquad (2)$$

whenever $g_1, g_2$ are bounded functions. (One should think of $g_1, g_2$ as being like the indicator functions of "dense" sets, in contrast to $f_1, f_2$ which are like the normalised indicator functions of "sparse" sets). The bound (2) cannot be directly applied to control (1) because of the unbounded (or "sparse") nature of $f_1$ and $f_2$. However one can "densify" $f_1$ and $f_2$ as follows. Since $f_1$ is bounded in magnitude by $\nu$, we can bound the left-hand side of (1) as

$$\mathop{\mathbb{E}}_{n} \nu(n) \Big| \mathop{\mathbb{E}}_{r} f_2(n + P_1(r))\, f_3(n + P_2(r)) \Big|.$$

The weight function $\nu$ will be normalised so that $\mathop{\mathbb{E}}_n \nu(n) = 1$, so by the Cauchy-Schwarz inequality it suffices to show that

$$\mathop{\mathbb{E}}_{n} \nu(n) \Big| \mathop{\mathbb{E}}_{r} f_2(n + P_1(r))\, f_3(n + P_2(r)) \Big|^2 \leq \varepsilon^2.$$
The left-hand side expands as

$$\mathop{\mathbb{E}}_{n, r, r'} \nu(n)\, f_2(n + P_1(r))\, f_3(n + P_2(r))\, f_2(n + P_1(r'))\, f_3(n + P_2(r')).$$

Now, it turns out that after an enormous (but finite) number of applications of the Cauchy-Schwarz inequality to steadily eliminate the $f_2, f_3$ factors, as well as a certain "polynomial forms condition" hypothesis on $\nu$, one can show that

$$\mathop{\mathbb{E}}_{n, r, r'} (\nu(n) - 1)\, f_2(n + P_1(r))\, f_3(n + P_2(r))\, f_2(n + P_1(r'))\, f_3(n + P_2(r')) = o(1).$$

(Because of the polynomial shifts, this requires a method known as "PET induction", but let me skip over this point here.) In view of this estimate, we now just need to show that

$$\mathop{\mathbb{E}}_{n, r, r'} f_2(n + P_1(r))\, f_3(n + P_2(r))\, f_2(n + P_1(r'))\, f_3(n + P_2(r')) \leq \varepsilon^2 + o(1).$$

Now we can reverse the previous steps. First, we collapse back to

$$\mathop{\mathbb{E}}_{n} \Big| \mathop{\mathbb{E}}_{r} f_2(n + P_1(r))\, f_3(n + P_2(r)) \Big|^2 \leq \varepsilon^2 + o(1).$$

One can bound $\big| \mathop{\mathbb{E}}_{r} f_2(n + P_1(r))\, f_3(n + P_2(r)) \big|$ by $\mathop{\mathbb{E}}_{r} \nu(n + P_1(r))\, \nu(n + P_2(r))$, which can be shown to be "bounded on average" in a suitable sense (e.g. bounded $L^2$ norm) via the aforementioned polynomial forms condition. Because of this and the Hölder inequality, the above estimate is equivalent (up to powers of $\varepsilon$) to

$$\mathop{\mathbb{E}}_{n} \Big| \mathop{\mathbb{E}}_{r} f_2(n + P_1(r))\, f_3(n + P_2(r)) \Big| \leq \varepsilon'.$$

By setting $g_1$ to be the signum of $\mathop{\mathbb{E}}_{r} f_2(n + P_1(r))\, f_3(n + P_2(r))$, this is equivalent to

$$\Big| \mathop{\mathbb{E}}_{n, r} g_1(n)\, f_2(n + P_1(r))\, f_3(n + P_2(r)) \Big| \leq \varepsilon'.$$

This is halfway between (1) and (2); the sparsely supported function $f_1$ has been replaced by its "densification" $g_1$, but we have not yet densified $f_2$ to $g_2$. However, one can shift $n$ by $P_1(r)$ and repeat the above arguments to achieve a similar densification of $f_2$, at which point one has reduced (1) to (2).
Tamar Ziegler and I have just uploaded to the arXiv our joint paper “A multi-dimensional Szemerédi theorem for the primes via a correspondence principle“. This paper is related to an earlier result of Ben Green and mine in which we established that the primes contain arbitrarily long arithmetic progressions. Actually, in that paper we proved a more general result:
Theorem 1 (Szemerédi's theorem in the primes) Let $A$ be a subset of the primes $\mathcal{P}$ of positive relative density, thus

$$\limsup_{N \to \infty} \frac{|A \cap [N]|}{|\mathcal{P} \cap [N]|} > 0.$$

Then $A$ contains arbitrarily long arithmetic progressions.
This result was based in part on an earlier paper of Green that handled the case of progressions of length three. With the primes replaced by the integers, this is of course the famous theorem of Szemerédi.
Szemerédi's theorem has now been generalised in many different directions. One of these is the multidimensional Szemerédi theorem of Furstenberg and Katznelson, who used ergodic-theoretic techniques to show that any dense subset of $\mathbb{Z}^d$ necessarily contained infinitely many constellations of any prescribed shape. Our main result is to relativise that theorem to the primes as well:
Theorem 2 (Multidimensional Szemerédi theorem in the primes) Let $d \geq 1$, and let $A$ be a subset of the $d^{th}$ Cartesian power $\mathcal{P}^d$ of the primes of positive relative density, thus

$$\limsup_{N \to \infty} \frac{|A \cap [N]^d|}{|\mathcal{P}^d \cap [N]^d|} > 0.$$

Then for any $v_1, \dots, v_k \in \mathbb{Z}^d$, $A$ contains infinitely many "constellations" of the form $a + r v_1, \dots, a + r v_k$ with $a \in \mathbb{Z}^d$ and $r$ a positive integer.
In the case when $A$ is itself a Cartesian product of one-dimensional sets (in particular, if $A$ is all of $\mathcal{P}^d$), this result already follows from Theorem 1, but there does not seem to be a similarly easy argument to deduce the general case of Theorem 2 from previous results. Simultaneously with this paper, an independent proof of Theorem 2 using a somewhat different method has been established by Cook, Magyar, and Titichetrakun.
The result is reminiscent of an earlier result of mine on finding constellations in the Gaussian primes (or dense subsets thereof). That paper followed closely the arguments of my original paper with Ben Green, namely it first enclosed (a W-tricked version of) the primes or Gaussian primes (in a sieve-theoretic sense) by a slightly larger set (or more precisely, a weight function $\nu$) of almost primes or almost Gaussian primes, which one could then verify (using methods closely related to the sieve-theoretic methods in the ongoing Polymath8 project) to obey certain pseudorandomness conditions, known as the linear forms condition and the correlation condition. Very roughly speaking, these conditions assert statements of the following form: if $n$ is a randomly selected integer, then the events of $n + h_1, \dots, n + h_k$ simultaneously being an almost prime (or almost Gaussian prime) are approximately independent for most choices of $h_1, \dots, h_k$. Once these conditions are satisfied, one can then run a transference argument (initially based on ergodic-theory methods, but nowadays there are simpler transference results based on the Hahn-Banach theorem, due to Gowers and Reingold-Trevisan-Tulsiani-Vadhan) to obtain relative Szemerédi-type theorems from their absolute counterparts.
However, when one tries to adapt these arguments to sets such as $\mathcal{P}^2$, a new difficulty occurs: the natural analogue of the almost primes would be the Cartesian square $B \times B$ of the set $B$ of almost primes – pairs $(n, m)$ whose entries are both almost primes. (Actually, for technical reasons, one does not work directly with a set of almost primes, but would instead work with a weight function such as $\nu(n) \nu(m)$ that is concentrated on a set such as $B \times B$, but let me ignore this distinction for now.) However, this set $B \times B$ does not enjoy as many pseudorandomness conditions as one would need for a direct application of the transference strategy to work. More specifically, given any fixed $h, k$, and random $(n, m)$, the four events

$$(n, m) \in B \times B; \quad (n + h, m) \in B \times B; \quad (n, m + k) \in B \times B; \quad (n + h, m + k) \in B \times B$$

do not behave independently (as they would if $B \times B$ were replaced for instance by the Gaussian almost primes), because any three of these events imply the fourth. This blocks the transference strategy for constellations which contain some right-angles to them (e.g. constellations of the form $a, a + r(1,0), a + r(0,1)$) as such constellations soon turn into rectangles such as the one above after applying Cauchy-Schwarz a few times. (But a few years ago, Cook and Magyar showed that if one restricted attention to constellations which were in general position in the sense that any coordinate hyperplane contained at most one element in the constellation, then this obstruction does not occur and one can establish Theorem 2 in this case through the transference argument.) It's worth noting that very recently, Conlon, Fox, and Zhao have succeeded in removing one of the pseudorandomness conditions (namely the correlation condition) from the transference principle, leaving only the linear forms condition as the remaining pseudorandomness condition to be verified, but unfortunately this does not completely solve the above problem because the linear forms condition also fails for $B \times B$ (or for weights concentrated on $B \times B$) when applied to rectangular patterns.
There are now two ways known to get around this problem and establish Theorem 2 in full generality. The approach of Cook, Magyar, and Titichetrakun proceeds by starting with one of the known proofs of the multidimensional Szemerédi theorem – namely, the proof that proceeds through hypergraph regularity and hypergraph removal – and attaching pseudorandom weights directly within the proof itself, rather than trying to add the weights to the result of that proof through a transference argument. (A key technical issue is that weights have to be added to all the levels of the hypergraph – not just the vertices and top-order edges – in order to circumvent the failure of naive pseudorandomness.) As one has to modify the entire proof of the multidimensional Szemerédi theorem, rather than use that theorem as a black box, the Cook-Magyar-Titichetrakun argument is lengthier than ours; on the other hand, it is more general and does not rely on some difficult theorems about primes that are used in our paper.
In our approach, we continue to use the multidimensional Szemerédi theorem (or more precisely, the equivalent theorem of Furstenberg and Katznelson concerning multiple recurrence for commuting shifts) as a black box. The difference is that instead of using a transference principle to connect the relative multidimensional Szemerédi theorem we need to the multiple recurrence theorem, we instead proceed by a version of the Furstenberg correspondence principle, similar to the one that connects the absolute multidimensional Szemerédi theorem to the multiple recurrence theorem. I had discovered this approach many years ago in an unpublished note, but had abandoned it because it required an infinite number of linear forms conditions (in contrast to the transference technique, which only needed a finite number of linear forms conditions and (until the recent work of Conlon-Fox-Zhao) a correlation condition). The reason for this infinite number of conditions is that the correspondence principle has to build a probability measure on an entire $\sigma$-algebra; for this, it is not enough to specify the measure $\mu(E)$ of a single set such as $E$, but one also has to specify the measure $\mu(T^{n_1} E \cap \dots \cap T^{n_m} E)$ of "cylinder sets", where $m$ could be arbitrarily large. The larger $m$ gets, the more linear forms conditions one needs to keep the correspondence under control.
With the sieve weights we were using at the time, standard sieve theory methods could indeed provide a finite number of linear forms conditions, but not an infinite number, so my idea was abandoned. However, with my later work with Green and Ziegler on linear equations in primes (and related work on the Möbius-nilsequences conjecture and the inverse conjecture on the Gowers norm), Tamar and I realised that the primes themselves obey an infinite number of linear forms conditions, so one can basically use the primes (or a proxy for the primes, such as the von Mangoldt function $\Lambda$) as the enveloping sieve weight, rather than a classical sieve. Thus my old idea of using the Furstenberg correspondence principle to transfer Szemerédi-type theorems to the primes could actually be realised. In the one-dimensional case, this simply produces a much more complicated proof of Theorem 1 than the existing one; but it turns out that the argument works as well in higher dimensions and yields Theorem 2 relatively painlessly, except for the fact that it needs the results on linear equations in primes, the known proofs of which are extremely lengthy (and also require some of the transference machinery mentioned earlier). The problem of correlations in rectangles is avoided in the correspondence principle approach because one can compensate for such correlations by performing a suitable weighted limit to compute the measure $\mu(T^{n_1} E \cap \dots \cap T^{n_m} E)$ of cylinder sets, with each $m$ requiring a different weighted correction. (This may be related to the Cook-Magyar-Titichetrakun strategy of weighting all of the facets of the hypergraph in order to recover pseudorandomness, although our contexts are rather different.)
Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our joint paper "Multiple recurrence and convergence results associated to $\mathbb{F}_p^\omega$-actions". This paper is primarily concerned with limit formulae in the theory of multiple recurrence in ergodic theory. Perhaps the most basic formula of this type is the mean ergodic theorem, which (among other things) asserts that if $(X, \mathcal{X}, \mu, T)$ is a measure-preserving $\mathbb{Z}$-system (which, in this post, means that $(X, \mathcal{X}, \mu)$ is a probability space and $T: X \to X$ is measure-preserving and invertible, thus giving an action $(T^n)_{n \in \mathbb{Z}}$ of the integers), and $f, g \in L^2(X)$ are functions, and the system is ergodic (which means that $L^2(X)$ contains no $T$-invariant functions other than the constants (up to almost everywhere equivalence, of course)), then the average

$$\frac{1}{N} \sum_{n=1}^N \int_X f(x)\, g(T^n x)\, d\mu(x) \qquad (1)$$

converges as $N \to \infty$ to the expression

$$\Big( \int_X f\, d\mu \Big) \Big( \int_X g\, d\mu \Big);$$
see e.g. this previous blog post. Informally, one can interpret this limit formula as an equidistribution result: if $x$ is drawn at random from $X$ (using the probability measure $\mu$), and $n$ is drawn at random from $\{1, \dots, N\}$ for some large $N$, then the pair $(x, T^n x)$ becomes uniformly distributed in the product space $X \times X$ (using product measure $\mu \times \mu$) in the limit as $N \to \infty$.
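To see the limit formula in action numerically, here is a toy sketch of mine (assuming numpy) for the simplest ergodic system, the irrational rotation $Tx = x + \alpha \bmod 1$, discretising the circle by a grid; both printed numbers should be close to $(\int f)(\int g) = 1/4$ for the chosen $f, g$:

```python
import numpy as np

alpha = np.sqrt(2) - 1                       # irrational rotation number
x = np.linspace(0, 1, 2000, endpoint=False)  # grid standing in for (X, mu)
f = np.sin(2 * np.pi * x) ** 2
g = lambda y: np.cos(2 * np.pi * y) ** 2

N = 3000
avg = np.mean([np.mean(f * g((x + n * alpha) % 1)) for n in range(1, N + 1)])
print(avg)                         # ergodic average (1), approx 0.25
print(np.mean(f) * np.mean(g(x)))  # product of the means, approx 0.25
```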
If we allow $(X, \mathcal{X}, \mu, T)$ to be non-ergodic, then we still have a limit formula, but it is a bit more complicated. Let $\mathcal{X}^T$ be the $T$-invariant measurable sets in $\mathcal{X}$; the $\mathbb{Z}$-system $(X, \mathcal{X}^T, \mu, T)$ can then be viewed as a factor of the original system $(X, \mathcal{X}, \mu, T)$, which is equivalent (in the sense of measure-preserving systems) to a trivial system $(Z_0, \mathcal{Z}_0, \mu_0, \mathrm{id})$ (known as the invariant factor) in which the shift is trivial. There is then a projection map $\pi_0: X \to Z_0$ to the invariant factor which is a factor map, and the average (1) converges in the limit to the expression

$$\int_{Z_0} ((\pi_0)_* f)\, ((\pi_0)_* g)\, d\mu_0, \qquad (2)$$

where $(\pi_0)_*$ is the pushforward map associated to the map $\pi_0$; see e.g. this previous blog post. We can interpret this as an equidistribution result. If $(x, T^n x)$ is a pair as before, then we no longer expect complete equidistribution in $X \times X$ in the non-ergodic case, because there are now non-trivial constraints relating $x$ with $T^n x$; indeed, for any $T$-invariant function $f: X \to \mathbb{C}$, we have the constraint $f(x) = f(T^n x)$; putting all these constraints together we see that

$$\pi_0(x) = \pi_0(T^n x)$$

(for almost every $x$, at least). The limit (2) can be viewed as an assertion that these constraints are in some sense the "only" constraints between $x$ and $T^n x$, and that the pair $(x, T^n x)$ is uniformly distributed relative to these constraints.
Limit formulae are known for multiple ergodic averages as well, although the statement becomes more complicated. For instance, consider the expression

$$\frac{1}{N} \sum_{n=1}^N \int_X f(x)\, g(T^n x)\, h(T^{2n} x)\, d\mu(x) \qquad (3)$$

for three functions $f, g, h \in L^\infty(X)$; this is analogous to the combinatorial task of counting length three progressions in various sets. For simplicity we assume the system to be ergodic. Naively one might expect this limit to then converge to

$$\Big( \int_X f\, d\mu \Big) \Big( \int_X g\, d\mu \Big) \Big( \int_X h\, d\mu \Big),$$

which would roughly speaking correspond to an assertion that the triplet $(x, T^n x, T^{2n} x)$ is asymptotically equidistributed in $X \times X \times X$.
However, even in the ergodic case there can be additional constraints on this triplet that cannot be seen at the level of the individual pairs $(x, T^n x)$, $(x, T^{2n} x)$. The key obstruction here is that of eigenfunctions of the shift $T$, that is to say non-trivial functions $f$ that obey the eigenfunction equation

$$f(Tx) = \lambda f(x)$$

almost everywhere for some constant (or $T$-invariant) $\lambda$. Each such eigenfunction generates a constraint

$$f(x)\, f(T^{2n} x) = f(T^n x)^2 \qquad (4)$$

tying together $x$, $T^n x$, and $T^{2n} x$. However, it turns out that these are in some sense the only constraints on $(x, T^n x, T^{2n} x)$ that are relevant for the limit (3). More precisely, if one sets $\mathcal{X}_1$ to be the sub-$\sigma$-algebra of $\mathcal{X}$ generated by the eigenfunctions of $T$, then it turns out that the factor $(X, \mathcal{X}_1, \mu, T)$ is isomorphic to a shift system $(Z_1, z \mapsto z + \alpha_1)$ known as the Kronecker factor, for some compact abelian group $Z_1$ and some (irrational) shift $\alpha_1 \in Z_1$; the factor map $\pi_1: X \to Z_1$ pushes eigenfunctions forward to (affine) characters on $Z_1$.
It is then known that the limit of (3) is

$$\int_W \tilde{f}(z_1)\, \tilde{g}(z_2)\, \tilde{h}(z_3)\, d\mu_W(z_1, z_2, z_3),$$

where $\tilde{f}, \tilde{g}, \tilde{h}$ are the pushforwards of $f, g, h$ to the Kronecker factor, $W$ is the closed subgroup

$$W := \{ (z_1, z_2, z_3) \in Z_1^3 : z_1 - 2 z_2 + z_3 = 0 \},$$

and $\mu_W$ is the Haar probability measure on $W$; see this previous blog post. The equation $z_1 - 2z_2 + z_3 = 0$ defining $W$ corresponds to the constraint (4) mentioned earlier. Among other things, this limit formula implies Roth's theorem, which in the context of ergodic theory is the assertion that the limit (or at least the limit inferior) of (3) is positive when $f = g = h$ is non-negative and not identically vanishing.
If one considers a quadruple average

$$\frac{1}{N} \sum_{n=1}^N \int_X f(x)\, g(T^n x)\, h(T^{2n} x)\, k(T^{3n} x)\, d\mu(x) \qquad (5)$$

(analogous to counting length four progressions) then the situation becomes more complicated still, even in the ergodic case. In addition to the (linear) eigenfunctions that already showed up in the computation of the triple average (3), a new type of constraint also arises from quadratic eigenfunctions $f$, which obey an eigenfunction equation

$$f(Tx) = \lambda(x) f(x)$$

in which $\lambda$ is no longer constant, but is now a linear eigenfunction. For such functions, $f(T^n x)$ behaves quadratically in $n$, and one can compute the existence of a constraint

$$f(x)\, \overline{f(T^n x)^3}\, f(T^{2n} x)^3\, \overline{f(T^{3n} x)} = 1$$

between $x$, $T^n x$, $T^{2n} x$, and $T^{3n} x$ that is not detected at the triple average level. As it turns out, this is not the only type of constraint relevant for (5); there is a more general class of constraint involving two-step nilsystems which we will not detail here, but see e.g. this previous blog post for more discussion. Nevertheless there is still a similar limit formula to previous examples, involving a special factor $(X, \mathcal{X}_2, \mu, T)$ which turns out to be an inverse limit of two-step nilsystems; this limit theorem can be extracted from the structural theory in this paper of Host and Kra combined with a limit formula for nilsystems obtained by Lesigne, but will not be reproduced here. The pattern continues to higher averages (and higher step nilsystems); this was first done explicitly by Ziegler, and can also in principle be extracted from the structural theory of Host-Kra combined with nilsystem equidistribution results of Leibman. These sorts of limit formulae can lead to various recurrence results refining Roth's theorem in various ways; see this paper of Bergelson, Host, and Kra for some examples of this.
The above discussion was concerned with $\mathbb{Z}$-systems, but one can adapt much of the theory to measure-preserving $G$-systems for other discrete countable abelian groups $G$, in which one now has a family $(T^g)_{g \in G}$ of shifts indexed by $G$ rather than a single shift, obeying the compatibility relation $T^{g+h} = T^g T^h$. The role of the intervals $\{1, \dots, N\}$ in this more general setting is replaced by that of Følner sequences. For arbitrary countable abelian $G$, the theory for double averages (1) and triple limits (3) is essentially identical to the $\mathbb{Z}$-system case. But when one turns to quadruple and higher limits, the situation becomes more complicated (and, for arbitrary $G$, still not fully understood). However one model case which is now well understood is the finite field case when $G = \mathbb{F}_p^\omega$ is an infinite-dimensional vector space over a finite field $\mathbb{F}_p$ (with the finite subspaces $\mathbb{F}_p^n$ then being a good choice for the Følner sequence). Here, the analogue of the structural theory of Host and Kra was worked out by Vitaly, Tamar, and myself in these previous papers (treating the high characteristic and low characteristic cases respectively). In the finite field setting, it turns out that nilsystems no longer appear, and one only needs to deal with linear, quadratic, and higher order eigenfunctions (known collectively as phase polynomials). It is then natural to look for a limit formula that asserts, roughly speaking, that if $x$ is drawn at random from a $G$-system and $g$ drawn randomly from a large subspace of $G$, then the only constraints between $x, T^g x, T^{2g} x, \dots$ are those that arise from phase polynomials. The main theorem of this paper is to establish this limit formula (which, again, is a little complicated to state explicitly and will not be done here). In particular, we establish for the first time that the limit actually exists (a result which, for $\mathbb{Z}$-systems, was one of the main results of this paper of Host and Kra).
As a consequence, we can recover finite field analogues of most of the results of Bergelson-Host-Kra, though interestingly some of the counterexamples demonstrating sharpness of their results for $\mathbb{Z}$-systems (based on Behrend set constructions) do not seem to be present in the finite field setting (cf. this previous blog post on the cap set problem). In particular, we are able to largely settle the question of when one has a Khintchine-type theorem that asserts that for any measurable set $A$ in an ergodic $\mathbb{F}_p^\omega$-system and any $\varepsilon > 0$, one has

$$\mu(T^{c_1 g} A \cap \dots \cap T^{c_k g} A) > \mu(A)^k - \varepsilon$$

for a syndetic set of $g$, where $c_1, \dots, c_k \in \mathbb{F}_p$ are distinct residue classes. It turns out that Khintchine-type theorems always hold for $k \leq 3$ (and for $k \leq 2$ ergodicity is not required), and for $k = 4$ it holds whenever $c_1, c_2, c_3, c_4$ form a parallelogram, but not otherwise (though the counterexample here was such a painful computation that we ended up removing it from the paper, and may end up putting it online somewhere instead), and for larger $k$ we could show that the Khintchine property failed for generic choices of $c_1, \dots, c_k$, though the problem of determining exactly the tuples for which the Khintchine property failed looked to be rather messy and we did not completely settle it.
One of the basic problems in analytic number theory is to estimate sums of the form

$$\sum_{p \leq x} f(p)$$

as $x \to \infty$, where $p$ ranges over primes and $f$ is some explicit function of interest (e.g. a linear phase function $f(n) = e(\alpha n)$ for some real number $\alpha$, where $e(\theta) := e^{2\pi i \theta}$). This is essentially the same task as obtaining estimates on the sum

$$\sum_{n \leq x} \Lambda(n) f(n),$$

where $\Lambda$ is the von Mangoldt function. If $f$ is bounded, $f = O(1)$, then from the prime number theorem one has the trivial bound

$$\sum_{n \leq x} \Lambda(n) f(n) = O(x),$$

but often (when $f$ is somehow "oscillatory" in nature) one is seeking the refinement

$$\sum_{n \leq x} \Lambda(n) f(n) = o(x). \qquad (1)$$

Thanks to identities such as

$$\Lambda(n) = \sum_{d | n} \mu(d) \log \frac{n}{d}, \qquad (3)$$

where $\mu$ is the Möbius function, refinements such as (1) are similar in spirit to estimates of the form

$$\sum_{n \leq x} \mu(n) f(n) = o(x). \qquad (4)$$

Unfortunately, the connection between (1) and (4) is not particularly tight; roughly speaking, one needs to improve the bounds in (4) (and variants thereof) by about two factors of $\log x$ before one can use identities such as (3) to recover (1). Still, one generally thinks of (1) and (4) as being "morally" equivalent, even if they are not formally equivalent.
When $f$ is oscillating in a sufficiently "irrational" way, then one standard way to proceed is the method of Type I and Type II sums, which uses truncated versions of divisor identities such as (3) to expand out either (1) or (4) into linear (Type I) or bilinear sums (Type II) with which one can exploit the oscillation of $f$. For instance, Vaughan's identity lets one rewrite the sum in (1) as the sum of the Type I sum

$$\sum_{d \leq U} \mu(d) \sum_{m \leq x/d} \log(m)\, f(dm),$$

the Type I sum

$$-\sum_{d \leq UV} a_d \sum_{m \leq x/d} f(dm),$$

the Type II sum

$$\sum_{d > U} b_d \sum_{m > V:\ dm \leq x} \Lambda(m)\, f(dm),$$

and the error term $\sum_{n \leq V} \Lambda(n) f(n)$, whenever $U, V \geq 1$ are parameters, and $a_d, b_d$ are the sequences

$$a_d := \sum_{e \leq U,\ f \leq V:\ ef = d} \mu(e) \Lambda(f)$$

and

$$b_d := \sum_{e | d:\ e > U} \mu(e).$$
Similarly one can express (4) as the Type I sum

$$-\sum_{d \leq UV} c_d \sum_{m \leq x/d} f(dm),$$

the Type II sum

$$\sum_{d > U} b_d \sum_{m > V:\ dm \leq x} \mu(m)\, f(dm),$$

and the error term $\sum_{n \leq \max(U,V)} O(|f(n)|)$, whenever $U, V \geq 1$ with $UV \leq x$, and $c_d$ is the sequence

$$c_d := \sum_{e \leq U,\ f \leq V:\ ef = d} \mu(e) \mu(f).$$
After eliminating troublesome sequences such as $a_d, b_d, c_d$ via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as

$$\sum_{m \leq x/d} f(dm)$$

or Type II sums such as

$$\sum_{m \leq x/\max(d, d')} f(dm)\, \overline{f(d'm)}$$

for various $d, d'$. Here, the trivial bound is $O(x/d)$, but due to a number of logarithmic inefficiencies in the above method, one has to obtain bounds that are more like $O(x / (d \log^C x))$ for some constant $C$ in order to end up with an asymptotic such as (1) or (4).
However, in a recent paper of Bourgain, Sarnak, and Ziegler, it was observed that as long as one is only seeking the Möbius orthogonality (4) rather than the von Mangoldt orthogonality (1), one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of $f$. A special case of their orthogonality criterion (which actually dates back to an earlier paper of Katai, as was pointed out to me by Nikos Frantzikinakis) is as follows:
Proposition 1 (Orthogonality criterion) Let $f: \mathbb{N} \to \mathbb{C}$ be a bounded function such that

$$\sum_{n \leq x} f(pn)\, \overline{f(qn)} = o(x) \qquad (5)$$

for any distinct primes $p, q$ (where the decay rate of the error term $o(x)$ may depend on $p$ and $q$). Then

$$\sum_{n \leq x} \mu(n) f(n) = o(x). \qquad (6)$$
Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which $\mu$ can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with the above weaker special case. (See also these notes of Harper, which uses the Katai argument to give a slightly weaker quantitative bound in the same spirit.) This criterion can be viewed as a multiplicative variant of the classical van der Corput lemma, which in our notation asserts that

$$\sum_{n \leq x} f(n) = o(x)$$

if one has

$$\sum_{n \leq x} f(n+h)\, \overline{f(n)} = o(x)$$

for each fixed non-zero $h$.
As a sample application, Proposition 1 easily gives a proof of the asymptotic

$$\sum_{n \leq x} \mu(n) e(\alpha n) = o(x)$$

for any irrational $\alpha$. (For rational $\alpha$, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.) The paper of Bourgain, Sarnak, and Ziegler also applies this criterion to nilsequences (obtaining a quick proof of a qualitative version of a result of Ben Green and myself, see these notes of Ziegler for details) and to horocycle flows (for which no Möbius orthogonality result was previously known).
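As a quick empirical sanity check of this asymptotic (a toy computation of mine, assuming sympy; convergence is slow, but the normalised sums visibly decay):

```python
import cmath
from sympy.ntheory import mobius

alpha = (5 ** 0.5 - 1) / 2  # irrational (inverse golden ratio)
for x in [10**3, 10**4, 10**5]:
    s = sum(mobius(n) * cmath.exp(2j * cmath.pi * alpha * n)
            for n in range(1, x + 1))
    print(x, abs(s) / x)  # |sum_{n <= x} mu(n) e(alpha n)| / x, tending to 0
```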
Informally, the connection between (5) and (6) comes from the multiplicative nature of the Möbius function. If (6) failed, then $\mu(n)$ exhibits strong correlation with $f(n)$; by change of variables, we then expect $\mu(pn)$ to correlate with $f(pn)$ and $\mu(qn)$ to correlate with $f(qn)$, for "typical" $p, q$ at least. On the other hand, since $\mu$ is multiplicative, $\mu(pn)$ exhibits strong correlation with $\mu(qn)$. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of course, correlation is not quite transitive, but it turns out that one can use the Cauchy-Schwarz inequality as a substitute for transitivity of correlation in this case.
I will give a proof of Proposition 1 below the fold (which is not quite based on the argument in the above mentioned paper, but on a variant of that argument communicated to me by Tamar Ziegler, and also independently discovered by Adam Harper). The main idea is to exploit the following observation: if $P$ is a "large" but finite set of primes (in the sense that the sum $\sum_{p \in P} \frac{1}{p}$ is large), then for a typical large number $n$ (much larger than the elements of $P$), the number of primes in $P$ that divide $n$ is pretty close to $\sum_{p \in P} \frac{1}{p}$:

$$\sum_{p \in P: p | n} 1 \approx \sum_{p \in P} \frac{1}{p}. \qquad (7)$$
A more precise formalisation of this heuristic is provided by the Turan-Kubilius inequality, which is proven by a simple application of the second moment method.
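The heuristic (7) is also easy to test empirically (a toy check of mine, assuming sympy): for random $n$ of size about $10^9$ and $P$ the primes below $100$, the count of prime divisors from $P$ concentrates around $\sum_{p \in P} 1/p \approx 1.8$, with variance of the same order, which is exactly the second-moment behaviour that Turan-Kubilius captures.

```python
import random
from sympy import primerange

P = list(primerange(2, 100))
expected = sum(1 / p for p in P)
samples = [sum(1 for p in P if n % p == 0)
           for n in (random.randrange(10**8, 10**9) for _ in range(10**4))]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(expected, mean, var)  # mean matches sum 1/p; variance = O(sum 1/p)
```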
In particular, one can sum (7) against $\mu(n) f(n)$ and obtain an approximation

$$\sum_{n \leq x} \mu(n) f(n) \approx \frac{1}{\sum_{p \in P} \frac{1}{p}} \sum_{p \in P} \sum_{n \leq x: p | n} \mu(n) f(n)$$

that approximates a sum of $\mu(n) f(n)$ by a bunch of sparser sums of $\mu(n) f(n)$. Since

$$x = \frac{1}{\sum_{p \in P} \frac{1}{p}} \sum_{p \in P} \frac{x}{p},$$

we see (heuristically, at least) that in order to establish (4), it would suffice to establish the sparser estimates

$$\sum_{n \leq x: p | n} \mu(n) f(n) = o\Big( \frac{x}{p} \Big)$$

for all $p \in P$ (or at least for "most" $p \in P$).
Now we make the change of variables $n = pm$. As the Möbius function is multiplicative, we usually have $\mu(pm) = \mu(p) \mu(m) = -\mu(m)$. (There is an exception when $m$ is divisible by $p$, but this will be a rare event and we will be able to ignore it.) So it should suffice to show that

$$\sum_{m \leq x/p} \mu(m) f(pm) = o\Big( \frac{x}{p} \Big)$$

for most $p \in P$. However, by the hypothesis (5), the sequences $m \mapsto f(pm)$ are asymptotically orthogonal as $p$ varies, and this claim will then follow from a Cauchy-Schwarz argument.
Tamar Ziegler and I have just uploaded to the arXiv our paper "The inverse conjecture for the Gowers norm over finite fields in low characteristic", submitted to Annals of Combinatorics. This paper completes another case of the inverse conjecture for the Gowers norm, this time for vector spaces $V$ over a fixed finite field $\mathbb{F} = \mathbb{F}_p$ of prime order; with Vitaly Bergelson, we had previously established this claim when the characteristic of the field was large, so the main new result here is the extension to the low characteristic case. (The case of a cyclic group $\mathbb{Z}/N\mathbb{Z}$ or interval $\{1, \dots, N\}$ was established by Ben Green and ourselves in another recent paper. For an arbitrary abelian (or nilpotent) group, a general but less explicit description of the obstructions to Gowers uniformity was recently obtained by Szegedy; the latter result recovers the high-characteristic case of our result (as was done in a subsequent paper of Szegedy), as well as our results with Green, but it is not immediately evident whether Szegedy's description of the obstructions matches up with the one predicted by the inverse conjecture in low characteristic.)
The statement of the main theorem is as follows. Given a finite-dimensional vector space $V$ over $\mathbb{F}$ and a function $f: V \to \mathbb{C}$, and an integer $d \geq 0$, one can define the Gowers uniformity norm $\|f\|_{U^{d+1}(V)}$ by the formula

$$\|f\|_{U^{d+1}(V)} := \Big| \mathbb{E}_{x, h_1, \dots, h_{d+1} \in V} \prod_{\omega \in \{0,1\}^{d+1}} \mathcal{C}^{|\omega|} f(x + \omega_1 h_1 + \dots + \omega_{d+1} h_{d+1}) \Big|^{1/2^{d+1}},$$

where $\mathcal{C}: z \mapsto \overline{z}$ is the complex conjugation operator and $|\omega| := \omega_1 + \dots + \omega_{d+1}$. If $f$ is bounded in magnitude by $1$, it is easy to see that $\|f\|_{U^{d+1}(V)}$ is bounded by $1$ also, with equality if and only if $f = e(\phi)$ for some non-classical polynomial $\phi: V \to \mathbb{R}/\mathbb{Z}$ of degree at most $d$, where $e(\theta) := e^{2\pi i \theta}$, and a non-classical polynomial of degree at most $d$ is a function whose $(d+1)$-fold iterated "derivatives" vanish in the sense that

$$\partial_{h_1} \dots \partial_{h_{d+1}} \phi(x) = 0$$

for all $x, h_1, \dots, h_{d+1} \in V$, where $\partial_h \phi(x) := \phi(x+h) - \phi(x)$. Our result generalises this to the case when the uniformity norm is not equal to $1$, but is still bounded away from zero:
Theorem 1 (Inverse conjecture) Let $f: V \to \mathbb{C}$ be bounded by $1$ with $\|f\|_{U^{d+1}(V)} \geq \delta$ for some $\delta > 0$. Then there exists a non-classical polynomial $\phi: V \to \mathbb{R}/\mathbb{Z}$ of degree at most $d$ such that $|\mathbb{E}_{x \in V} f(x)\, e(-\phi(x))| \geq c(d, \delta, \mathbb{F})$, where $c(d, \delta, \mathbb{F}) > 0$ is a positive quantity depending only on the indicated parameters.
This theorem is trivial for $d = 0$, and follows easily from Fourier analysis for $d = 1$. The case $d = 2$ was done in odd characteristic by Ben Green and myself, and in even characteristic by Samorodnitsky. In two papers, one with Vitaly Bergelson, we established this theorem in the "high characteristic" case when the characteristic $p$ of $\mathbb{F}$ was greater than $d$ (in which case there is essentially no distinction between non-classical polynomials and their classical counterparts, as discussed previously on this blog). The need to deal with genuinely non-classical polynomials is the main new difficulty in this paper that was not dealt with in previous literature.
In our previous paper with Bergelson, a "weak" version of the above theorem was proven, in which the polynomial $\phi$ in the conclusion had bounded degree $O_{d,\delta}(1)$, rather than being of degree at most $d$. In the current paper, we use this weak inverse theorem to reduce the inverse conjecture to a statement purely about polynomials:
Theorem 2 (Inverse conjecture for polynomials) Let $d \geq 1$, and let $\phi: V \to \mathbb{R}/\mathbb{Z}$ be a non-classical polynomial of degree at most $d$ such that $\|e(\phi)\|_{U^d(V)} \geq \delta$. Then $\phi$ has bounded rank in the sense that $\phi$ is a function of $O_{d,\delta}(1)$ polynomials of degree at most $d-1$.
This type of inverse theorem was first introduced by Bogdanov and Viola. The deduction of Theorem 1 from Theorem 2 and the weak inverse Gowers conjecture is fairly standard, so the main difficulty is to show Theorem 2.
The quantity $-\log_{|\mathbb{F}|} \|e(P)\|_{U^{s+1}(V)}^{2^{s+1}}$ of a polynomial $P$ of degree at most $s+1$ was denoted the analytic rank of $P$ by Gowers and Wolf. They observed that the analytic rank of $P$ was closely related to the rank of $P$, defined as the least number of degree $\leq s$ polynomials needed to express $P$. For instance, in the quadratic case $s+1 = 2$ the two ranks are identical (in odd characteristic, at least). For general $s$, it was easy to see that bounded rank implied bounded analytic rank; Theorem 2 is the converse statement.
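For instance (an illustrative computation with arbitrarily chosen matrices, rather than anything from the paper): for a classical quadratic $P(x) = \frac{1}{p} x^T M x$ in odd characteristic, the average $\mathbb{E}_{h_1,h_2} e(\Delta_{h_1}\Delta_{h_2} P)$ equals $p^{-r}$, where $r$ is the rank of the symmetric matrix $M + M^T$, and $r$ is also the least number of linear forms needed to express $P$, so the two notions of rank coincide:

```python
import itertools, cmath

p, n = 3, 3                            # odd characteristic, tiny dimension
V = list(itertools.product(range(p), repeat=n))

def bias(M):
    """E_{h1,h2 in V} e(h1^T (M + M^T) h2 / p); equals p^(-rank(M + M^T))."""
    S = [[(M[i][j] + M[j][i]) % p for j in range(n)] for i in range(n)]
    total = 0j
    for h1 in V:
        for h2 in V:
            q = sum(h1[i] * S[i][j] * h2[j] for i in range(n) for j in range(n))
            total += cmath.exp(2j * cmath.pi * q / p)
    return (total / len(V) ** 2).real

M_rank1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]  # P = x_1^2/p: one linear form suffices
M_full  = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # P = (x_1^2+x_2^2+x_3^2)/p: three forms
print(round(bias(M_rank1), 6))   # 0.333333 = p^{-1}
print(round(bias(M_full), 6))    # 0.037037 = p^{-3}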
We tried a number of ways to show that bounded analytic rank implied bounded rank, in particular spending a lot of time on ergodic-theoretic approaches, but eventually we settled on a “brute force” approach that relies on classifying those polynomials of bounded analytic rank as precisely as possible. The argument splits up into establishing three separate facts:
- (Classical case) If a classical polynomial has bounded analytic rank, then it has bounded rank.
- (Multiplication by $p$) If a non-classical polynomial $P$ (of degree at most $s+1$) has bounded analytic rank, then $pP$ (which can be shown to have degree at most $s+2-p$) also has bounded analytic rank.
- (Division by $p$) If $R$ is a non-classical polynomial of degree $s+2-p$ of bounded rank, then there is a non-classical polynomial $Q$ of degree at most $s+1$ of bounded rank such that $pQ = R$.
The multiplication by $p$ and division by $p$ facts allow one to easily extend the classical case of the theorem to the non-classical case of the theorem, basically because classical polynomials are the kernel of the multiplication-by-$p$ homomorphism. Indeed, if $P$ is a non-classical polynomial of bounded analytic rank of the right degree, then the multiplication by $p$ claim tells us that $pP$ also has bounded analytic rank, which by an induction hypothesis implies that $pP$ has bounded rank. Applying the division by $p$ claim, we find a bounded rank polynomial $Q$ such that $pQ = pP$, thus $P$ differs from $Q$ by a classical polynomial, which necessarily has bounded analytic rank, hence bounded rank by the classical claim, and the claim follows.
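The mechanism here is already visible in one dimension over $\mathbb{F}_2$, using the non-classical polynomial $\phi(x) = x/4$ from the earlier sketch (a worked example rather than anything from the paper):
$$\phi(x) = \frac{x}{4} \ \ (\text{degree } 2, \text{ non-classical}) \quad \xrightarrow{\times 2} \quad 2\phi(x) = \frac{x}{2} \ \ (\text{degree } 1, \text{ classical}),$$
so that multiplication by $p = 2$ lowers the degree by exactly $p - 1 = 1$; in the other direction, any $\phi$ with $2\phi = 0$ takes values in $\frac{1}{2}\mathbb{Z}/\mathbb{Z}$ and is therefore classical, which is precisely the kernel fact driving the induction.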
Of the three claims, the multiplication-by-$p$ claim is the easiest to prove using known results; after a bit of Fourier analysis, it turns out to follow more or less immediately from the multidimensional Szemerédi theorem over finite fields of Bergelson, Leibman, and McCutcheon (one can also use the density Hales-Jewett theorem here if one desires).
The next easiest claim is the classical case. Here, the idea is to analyse a degree $s+1$ classical polynomial $P$ via its derivative $T$, defined by the formula
$$T(h_1, \dots, h_{s+1}) := \Delta_{h_1} \cdots \Delta_{h_{s+1}} P(x)$$
for any $x \in V$ (the RHS is independent of $x$ as $P$ has degree $s+1$). This is a multilinear form, and if $P$ has bounded analytic rank, this form is biased (in the sense that the mean of $e(T)$ is large). Applying a general equidistribution theorem of Kaufman and Lovett (based on this earlier paper of Green and myself) this implies that $T$ is a function of a bounded number of multilinear forms of lower degree. Using some “regularity lemma” theory to clean up these forms so that they have good equidistribution properties, it is possible to understand exactly how the original multilinear form $T$ depends on these lower degree forms; indeed, the description one eventually obtains is so explicit that one can write down by inspection another bounded rank polynomial $Q$ such that $\Delta_{h_1} \cdots \Delta_{h_{s+1}} Q(x)$ is equal to $T(h_1, \dots, h_{s+1})$. Thus $P$ differs from the bounded rank polynomial $Q$ by a lower degree error, which is automatically of bounded rank also, and the claim follows.
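To make the objects in this argument concrete (again a purely illustrative sketch; the choices of $p$ and of the quadratic are arbitrary), here is a check that the double derivative of a degree $2$ classical polynomial is indeed an $x$-independent multilinear form:

```python
import itertools
from fractions import Fraction

p, n = 3, 2
V = list(itertools.product(range(p), repeat=n))

def P(x):
    """Classical quadratic P(x) = x_1 x_2 / p, valued in R/Z."""
    return Fraction(x[0] * x[1], p)

def add(x, h):
    return tuple((a + b) % p for a, b in zip(x, h))

def T(h1, h2, x):
    """Double derivative Delta_{h1} Delta_{h2} P(x), reduced mod 1."""
    return (P(add(add(x, h1), h2)) - P(add(x, h1)) - P(add(x, h2)) + P(x)) % 1

# independent of the base point x, since P has degree 2:
assert all(T(h1, h2, x) == T(h1, h2, V[0])
           for h1 in V for h2 in V for x in V)
# ... and it is the bilinear form (h1, h2) -> (h1_1 h2_2 + h1_2 h2_1)/p:
assert all(T(h1, h2, V[0]) == Fraction(h1[0] * h2[1] + h1[1] * h2[0], p) % 1
           for h1 in V for h2 in V)
print("the double derivative is an x-independent bilinear form")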
The trickiest thing to establish is the division by $p$ claim. The polynomial $R$ is some function $F(Q_1, \dots, Q_m)$ of lower degree polynomials $Q_1, \dots, Q_m$. Ideally, one would like to find a function $G$ of the same polynomials with $pG(Q_1, \dots, Q_m) = R$, such that $G(Q_1, \dots, Q_m)$ has the correct degree; however, we have counterexamples that show that this is not always possible. (These counterexamples are the main obstruction to making the ergodic theory approach work: in ergodic theory, one is only allowed to work with “measurable” functions, which are roughly analogous in this context to functions of the indicated polynomials $Q_1, \dots, Q_m$ and their shifts.) To get around this we have to first apply a regularity lemma to place $Q_1, \dots, Q_m$ in a suitably equidistributed form (although the fact that the $Q_i$ may be non-classical leads to a rather messy and technical description of this equidistribution), and then we have to extend each $Q_i$ to a higher degree polynomial $Q'_i$ with $pQ'_i = Q_i$. There is a crucial “exact roots” property of polynomials that allows one to do this, with $Q'_i$ having degree exactly $p-1$ higher than $Q_i$. It turns out that it is possible to find a function $G$ of these extended polynomials that has the right degree and which solves the required equation $pG(Q'_1, \dots, Q'_m) = R$; this is established by classifying completely all functions of the equidistributed polynomials $Q_i$ or $Q'_i$ that are of a given degree.
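The “exact roots” property can also be illustrated in one dimension (an illustrative check with $p = 3$; the specific polynomials are hypothetical examples, not taken from the paper): the classical polynomial $\psi(x) := x/3$ of degree $1$ has the root $\phi(x) := x/9$ (lifting $x$ to $\{0,1,2\}$), which satisfies $3\phi = \psi$ and has degree exactly $1 + (p-1) = 3$:

```python
from fractions import Fraction
from itertools import product

p = 3

def derivative(g, h):
    """Delta_h g(x) := g(x+h) - g(x), addition in F_p."""
    return lambda x: g((x + h) % p) - g(x)

def degree(g, max_deg=6):
    """Least d such that every (d+1)-fold derivative of g vanishes mod 1."""
    for d in range(max_deg + 1):
        ok = True
        for hs in product(range(p), repeat=d + 1):
            f = g
            for h in hs:
                f = derivative(f, h)
            if any(f(x) % 1 != 0 for x in range(p)):
                ok = False
                break
        if ok:
            return d
    return None

psi = lambda x: Fraction(x % p, p)       # classical, degree 1
phi = lambda x: Fraction(x % p, p * p)   # the exact root: p * phi = psi
assert all((p * phi(x) - psi(x)) % 1 == 0 for x in range(p))
print(degree(psi), degree(phi))          # 1 3: the degree jumps by exactly p-1 = 2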
Ben Green, Tamar Ziegler, and I have just uploaded to the arXiv our paper “An inverse theorem for the Gowers U^{s+1}[N] norm“, which was previously announced on this blog. We are still planning one final round of reviewing the preprint before submitting the paper, but it has gotten to the stage where we are comfortable with having the paper available on the arXiv.
The main result of the paper is to establish the inverse conjecture for the Gowers norm $U^{s+1}[N]$ over the integers, which has a number of applications, in particular to counting solutions to various linear equations in primes. In spirit, the proof of the paper follows the 21-page announcement that was uploaded previously. However, for various rather annoying technical reasons, the 117-page paper has to devote a large amount of space to setting up various bits of auxiliary machinery (as well as a dozen or so pages worth of examples and discussion). For instance, the announcement motivates many of the steps of the argument by heuristically identifying nilsequences with bracket polynomial phases such as $n \mapsto e(\alpha n \lfloor \beta n \rfloor)$. However, a rather significant amount of theory (which was already worked out to a large extent by Leibman) is needed to formalise the “bracket algebra” needed to manipulate such bracket polynomials and to connect them with nilsequences. Furthermore, the “piecewise smooth” nature of bracket polynomials causes some technical issues with the equidistribution theory for these sequences. Our original version of the paper (which was even longer than the current version) set out this theory. But we eventually decided that it was best to eschew almost all use of bracket polynomials (except as motivation and examples), and run the argument almost entirely within the language of nilsequences, to keep the argument a bit more notationally focused (and to make the equidistribution theory easier to establish). But this was not without a tradeoff; some statements that are almost trivially true for bracket polynomials required some “nilpotent algebra” to convert to the language of nilsequences. Here are some examples of this:
- It is intuitively clear that a bracket polynomial phase e(P(n)) of degree k in one variable n can be “multilinearised” to a polynomial $e(Q(n_1, \dots, n_k))$ of multi-degree $(1, \dots, 1)$ in k variables $n_1, \dots, n_k$, such that $Q(n, \dots, n)$ and $P(n)$ agree modulo lower order terms. For instance, if $P(n) = \alpha n^2 \lfloor \beta n \rfloor$ (so k=3), then one could take $Q(n_1, n_2, n_3) := \frac{\alpha}{3} ( n_1 n_2 \lfloor \beta n_3 \rfloor + n_2 n_3 \lfloor \beta n_1 \rfloor + n_3 n_1 \lfloor \beta n_2 \rfloor )$ (a numerical check of this example appears after this list). The analogue of this statement for nilsequences is true, but required a moderately complicated nilpotent algebra construction using the Baker-Campbell-Hausdorff formula.
- Suppose one has a bracket polynomial phase e(P_h(n)) of degree k in one variable n that depends on an additional parameter h, in such a way that exactly one of the coefficients in each monomial depends on h. Furthermore, suppose this dependence is bracket linear in h. Then it is intuitively clear that this phase can be rewritten (modulo lower order terms) as e( Q(h,n) ) where Q is a bracket polynomial of multidegree (1,k) in (h,n). For instance, if $P_h(n) = \alpha_h n^2$ and $\alpha_h = \lfloor \gamma h \rfloor \delta$, then we can take $Q(h,n) := \lfloor \gamma h \rfloor \delta n^2$. The nilpotent algebra analogue of this claim is true, but requires another moderately complicated nilpotent algebra construction based on semi-direct products.
- A bracket polynomial has a fairly visible concept of a “degree” (analogous to the corresponding notion for true polynomials), as well as a “rank” (which, roughly speaking, measures the number of parentheses in the bracket monomials, plus one). Thus, for instance, the bracket monomial $\lfloor \lfloor \alpha n^3 \rfloor \beta n^2 \rfloor \gamma n^2$ has degree 7 and rank 3. Defining degree and rank for nilsequences requires one to generalise the notion of a (filtered) nilmanifold to one in which the lower central series is replaced by a filtration indexed by both the degree and the rank.
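Here is the numerical sanity check of the multilinearisation example promised in the first item above ($\alpha$ and $\beta$ are arbitrary irrational coefficients chosen for illustration; for this particular symmetrised Q the diagonal agreement is exact, so no lower order correction is even needed):

```python
import math, random

alpha, beta = math.sqrt(2), math.sqrt(3)   # arbitrary irrational coefficients

def P(n):
    """Bracket monomial of degree 3 (and rank 2): alpha n^2 floor(beta n)."""
    return alpha * n * n * math.floor(beta * n)

def Q(n1, n2, n3):
    """Symmetrised multilinearisation, of multi-degree (1,1,1) in (n1,n2,n3)."""
    return (alpha / 3.0) * (n1 * n2 * math.floor(beta * n3)
                            + n2 * n3 * math.floor(beta * n1)
                            + n3 * n1 * math.floor(beta * n2))

# on the diagonal n1 = n2 = n3 = n the two expressions agree (exactly, here):
for n in random.sample(range(1, 1000), 10):
    assert abs(Q(n, n, n) - P(n)) <= 1e-9 * abs(P(n))
print("Q(n, n, n) = P(n) on the diagonal")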
There are various other tradeoffs of this type in this paper. For instance, nonstandard analysis tools were introduced to eliminate what would otherwise be quite a large number of epsilons and regularity lemmas to manage, at the cost of some notational overhead; and the piecewise discontinuities mentioned earlier were eliminated by the use of vector-valued nilsequences, though this again caused some further notational overhead. These difficulties may be a sign that we do not yet have the “right” proof of this conjecture, but one will probably have to wait a few years before we get a proper amount of perspective and understanding on this circle of ideas and results.