
One of the most notorious problems in elementary mathematics that remains unsolved is the Collatz conjecture, concerning the function $f_0: {\bf N} \rightarrow {\bf N}$ defined by setting $f_0(n) := 3n+1$ when $n$ is odd, and $f_0(n) := n/2$ when $n$ is even. (Here, ${\bf N}$ is understood to be the positive natural numbers $\{1,2,3,\ldots\}$.)

Conjecture 1 (Collatz conjecture) For any given natural number $n$, the orbit $n, f_0(n), f_0^2(n), f_0^3(n), \ldots$ passes through $1$ (i.e. $f_0^k(n) = 1$ for some $k$).

Open questions with this level of notoriety can lead to what Richard Lipton calls “mathematical diseases” (and what I termed an unhealthy amount of obsession on a single famous problem). (See also this xkcd comic regarding the Collatz conjecture.) As such, most practicing mathematicians tend to spend the majority of their time on more productive research areas that are only just beyond the range of current techniques. Nevertheless, it can still be diverting to spend a day or two each year on these sorts of questions, before returning to other matters; so I recently had a go at the problem. Needless to say, I didn’t solve the problem, but I have a better appreciation of why the conjecture is (a) plausible, and (b) unlikely to be proven by current technology, and I thought I would share what I had found out here on this blog.

Let me begin with some very well known facts. If $n$ is odd, then $f_0(n) = 3n+1$ is even, and so $f_0^2(n) = \frac{3n+1}{2}$. Because of this, one could replace $f_0$ by the function $f_1: {\bf N} \rightarrow {\bf N}$, defined by $f_1(n) := \frac{3n+1}{2}$ when $n$ is odd, and $f_1(n) := n/2$ when $n$ is even, and obtain an equivalent conjecture. Now we see that if one chooses $n$ “at random”, in the sense that it is odd with probability $1/2$ and even with probability $1/2$, then $f_1$ increases $n$ by a factor of roughly $3/2$ half the time, and decreases it by a factor of $1/2$ half the time. Furthermore, if $n$ is uniformly distributed modulo $4$, one easily verifies that $f_1(n)$ is uniformly distributed modulo $2$, and so $f_1^2(n)$ should be roughly $3/2$ times as large as $f_1(n)$ half the time, and roughly $1/2$ times as large as $f_1(n)$ the other half of the time. Continuing this at a heuristic level, we expect generically that $f_1^{k+1}(n) \approx \frac{3}{2} f_1^k(n)$ half the time, and $f_1^{k+1}(n) \approx \frac{1}{2} f_1^k(n)$ the other half of the time. The logarithm $\log f_1^k(n)$ of this orbit can then be modeled heuristically by a random walk with steps $\log \frac{3}{2}$ and $\log \frac{1}{2}$ occurring with equal probability. The expectation

$$\frac{1}{2} \log \frac{3}{2} + \frac{1}{2} \log \frac{1}{2} = \frac{1}{2} \log \frac{3}{4}$$

is negative, and so (by the classic gambler’s ruin) we expect the orbit to decrease over the long term. This can be viewed as heuristic justification of the Collatz conjecture, at least in the “average case” scenario in which $n$ is chosen uniformly at random (e.g. in some large interval $\{1,\ldots,N\}$). (It also suggests that if one modifies the problem, e.g. by replacing $3n+1$ by $5n+1$, then one can obtain orbits that tend to increase over time, and indeed numerically for this variant one sees orbits that appear to escape to infinity.) Unfortunately, one can only rigorously keep the orbit uniformly distributed modulo $2$ for time about $O(\log N)$ or so; after that, the system is too complicated for naive methods to control at anything other than a heuristic level.
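The heuristic above can be probed numerically. The following is a minimal Python sketch. The helper names `f1`, `parity_vector` and `DRIFT` are hypothetical and chosen only for illustration. It does two things:

- It computes the mean step $\frac{1}{2}\log\frac{3}{4}$ of the heuristic random walk.
- For one value of $m$, it checks the elementary fact behind the “time about $O(\log N)$” remark: as $n$ runs over a complete residue system modulo $2^m$, the parities of $n, f_1(n), \ldots, f_1^{m-1}(n)$ run over all $2^m$ patterns exactly once.

```python
import math

def f1(n: int) -> int:
    """Accelerated Collatz map f_1: (3n+1)/2 for odd n, n/2 for even n."""
    return (3 * n + 1) // 2 if n % 2 else n // 2

def parity_vector(n: int, m: int) -> tuple:
    """Parities of the first m terms n, f_1(n), ..., f_1^{m-1}(n)."""
    bits = []
    for _ in range(m):
        bits.append(n % 2)
        n = f1(n)
    return tuple(bits)

# Mean step of the heuristic random walk for log f_1^k(n):
# (1/2) log(3/2) + (1/2) log(1/2) = (1/2) log(3/4), which is negative.
DRIFT = 0.5 * math.log(3 / 2) + 0.5 * math.log(1 / 2)

if __name__ == "__main__":
    print(f"heuristic drift per step: {DRIFT:.4f}")
    # n ranges over a complete residue system mod 2^m (namely [2^m, 2^{m+1})).
    # If all 2^m parity patterns are distinct, each pattern occurs exactly once,
    # i.e. the first m parities are uniformly distributed.
    m = 12
    patterns = {parity_vector(n, m) for n in range(2 ** m, 2 ** (m + 1))}
    print(len(patterns) == 2 ** m)
```

Beyond roughly $\log_2 N$ steps this exact equidistribution is no longer available for $n \leq N$, which is where the rigorous control stops.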

Remark 1 One can obtain a rigorous analogue of the above arguments by extending $f_1$ from the integers ${\bf Z}$ to the $2$-adics ${\bf Z}_2$. This compact abelian group comes with a Haar probability measure, and one can verify that this measure is invariant with respect to $f_1$; with a bit more effort one can verify that it is ergodic. This suggests the introduction of ergodic theory methods. For instance, using the pointwise ergodic theorem, we see that if $n$ is a random $2$-adic integer, then almost surely the orbit $n, f_1(n), f_1^2(n), \ldots$ will be even half the time and odd half the time asymptotically, thus supporting the above heuristics. Unfortunately, this does not directly tell us much about the dynamics on ${\bf Z}$, as this is a measure zero subset of ${\bf Z}_2$. More generally, unless a dynamical system is somehow “polynomial”, “nilpotent”, or “unipotent” in nature, the current state of ergodic theory is usually only able to say something meaningful about *generic* orbits, but not about *all* orbits. For instance, the very simple system $x \mapsto 10x$ on the unit circle ${\bf R}/{\bf Z}$ is well understood from ergodic theory (in particular, almost all orbits will be uniformly distributed), but the orbit of a specific point, e.g. $\pi \bmod 1$, is still nearly impossible to understand (this particular problem being equivalent to the notorious unsolved question of whether the digits of $\pi$ are uniformly distributed).

The above heuristic argument only suggests decreasing orbits for *almost all* $n$. Even this remains unproven; the state of the art is that the number of $n$ in $\{1,\ldots,N\}$ that eventually go to $1$ is $\gg N^{0.84}$, a result of Krasikov and Lagarias. The heuristic leaves open the possibility of some very rare exceptional $n$ for which the orbit goes to infinity, or gets trapped in a periodic loop. Since the only loop that $1$ lies in is $1, 4, 2$ (for $f_0$) or $1, 2$ (for $f_1$), we thus may isolate a weaker consequence of the Collatz conjecture:

Conjecture 2 (Weak Collatz conjecture) Suppose that $n$ is a natural number such that $f_0^k(n) = n$ for some $k \geq 1$. Then $n$ is equal to $1$, $2$, or $4$.

Of course, we may replace $f_0$ with $f_1$ (and delete “$4$”) and obtain an equivalent conjecture.

This weaker version of the Collatz conjecture is also unproven. However, it was observed by Bohm and Sontacchi that this weak conjecture is equivalent to a divisibility problem involving powers of $2$ and $3$:

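Conjecture 3 (Reformulated weak Collatz conjecture) There does not exist $k \geq 1$ and integers

$$0 = a_1 < a_2 < \ldots < a_{k+1}$$

such that $2^{a_{k+1}} - 3^k$ is a positive integer that is a proper divisor of

$$3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k},$$

i.e.

$$(2^{a_{k+1}} - 3^k) n = 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k} \tag{1}$$

for a natural number $n > 1$.

Proposition 4 Conjecture 2 and Conjecture 3 are equivalent.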

*Proof:* To see this, it is convenient to reformulate Conjecture 2 slightly. Define an equivalence relation $\sim$ on ${\bf N}$ by declaring $a \sim b$ if $a/b = 2^m$ for some integer $m$. This gives rise to the quotient space ${\bf N}/\sim$ of equivalence classes $[n]$, which can be placed, if one wishes, in one-to-one correspondence with the odd natural numbers. We can then define a function $f_2: {\bf N}/\sim \rightarrow {\bf N}/\sim$ by declaring

$$f_2([n]) := [3n + 2^a] \tag{2}$$

for any $n \in {\bf N}$, where $2^a$ is the largest power of $2$ that divides $n$. It is easy to see that $f_2$ is well-defined (it is essentially the *Syracuse function*, after identifying ${\bf N}/\sim$ with the odd natural numbers), and that periodic orbits of $f_2$ correspond to periodic orbits of $f_1$ or $f_0$. Thus, Conjecture 2 is equivalent to the conjecture that $[1]$ is the only periodic orbit of $f_2$.

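Now suppose that Conjecture 2 failed, thus there exists $[n] \neq [1]$ such that $f_2^k([n]) = [n]$ for some $k \geq 1$. Without loss of generality we may take $n$ to be odd, in which case $n > 1$. It is easy to see that $[1]$ is the only fixed point of $f_2$, and so $k > 1$. An easy induction using (2) shows that

$$f_2^k([n]) = [3^k n + 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k}]$$

where, for each $1 \leq i \leq k$, $2^{a_i}$ is the largest power of $2$ that divides

$$n_i := 3^{i-1} n + 3^{i-2} 2^{a_1} + \ldots + 2^{a_{i-1}}. \tag{3}$$

In particular, as $n_1 = n$ is odd, $a_1 = 0$. Using the recursion

$$n_{i+1} = 3 n_i + 2^{a_i}, \tag{4}$$

we see from induction that $2^{a_i+1}$ divides $n_{i+1}$. Indeed, $n_i$ is $2^{a_i}$ times an odd number, so $3 n_i + 2^{a_i}$ is $2^{a_i}$ times an even number. Thus $a_{i+1} > a_i$:

$$0 = a_1 < a_2 < \ldots < a_k.$$

Since $f_2^k([n]) = [n]$, we have

$$2^{a_{k+1}} n = 3^k n + 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k} = 3 n_k + 2^{a_k}$$

for some integer $a_{k+1}$. Since $3 n_k + 2^{a_k}$ is divisible by $2^{a_k+1}$, and $n$ is odd, we conclude $a_{k+1} > a_k$. Rearranging the above equation as (1), we obtain a counterexample to Conjecture 3.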

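Conversely, suppose that Conjecture 3 failed. Then we have $k \geq 1$, integers

$$0 = a_1 < a_2 < \ldots < a_{k+1}$$

and a natural number $n > 1$ such that (1) holds. As $a_1 = 0$, we see that the right-hand side of (1) is odd, so $n$ is odd also. If we then introduce the natural numbers $n_i$ by the formula (3), then an easy induction using (4) shows that

$$(2^{a_{k+1}} - 3^k) n_i = 3^{k-1} 2^{a_i} + 3^{k-2} 2^{a_{i+1}} + \ldots + 2^{a_{i+k-1}} \tag{5}$$

with the periodic convention $a_{k+j} := a_j + a_{k+1}$ for $j > 1$. As the $a_i$ are increasing in $i$ (even for $i \geq k+1$), we see that $2^{a_i}$ is the largest power of $2$ that divides the right-hand side of (5). As $2^{a_{k+1}} - 3^k$ is odd, we conclude that $2^{a_i}$ is also the largest power of $2$ that divides $n_i$. We conclude that

$$f_2([n_i]) = [3 n_i + 2^{a_i}] = [n_{i+1}]$$

and thus $[n]$ is a periodic orbit of $f_2$. Since $n$ is an odd number larger than $1$, this contradicts Conjecture 2.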
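To make the bookkeeping in the proof of Proposition 4 concrete, here is a small Python sketch; the function names are hypothetical. It implements $f_2$ on odd representatives and runs the recursion (4) to produce the exponents $a_i$. For small odd $n$ and $k$, it checks three facts from the argument:

- the $a_i$ start at $0$ and strictly increase;
- $n_{k+1}$ agrees with $3^k n + 3^{k-1}2^{a_1} + \ldots + 2^{a_k}$;
- its odd part is the odd representative of $f_2^k([n])$.

It is a sanity check on the algebra only, not a search for cycles.

```python
def two_adic_valuation(n: int) -> int:
    """Exponent a such that 2^a is the largest power of 2 dividing the positive integer n."""
    return (n & -n).bit_length() - 1

def odd_part(n: int) -> int:
    return n >> two_adic_valuation(n)

def f2(n: int) -> int:
    """Odd representative of f_2([n]) = [3n + 2^a], with 2^a the largest power of 2 dividing n."""
    return odd_part(3 * n + (1 << two_adic_valuation(n)))

def orbit_exponents(n: int, k: int):
    """Run n_1 = n, n_{i+1} = 3 n_i + 2^{a_i} (recursion (4)), where 2^{a_i} is the largest
    power of 2 dividing n_i. Returns ([a_1, ..., a_k], n_{k+1})."""
    exps, ni = [], n
    for _ in range(k):
        a = two_adic_valuation(ni)
        exps.append(a)
        ni = 3 * ni + (1 << a)
    return exps, ni

def closed_form(n: int, exps: list) -> int:
    """3^k n + 3^{k-1} 2^{a_1} + ... + 2^{a_k}, i.e. formula (3) with i = k + 1."""
    k = len(exps)
    return 3 ** k * n + sum(3 ** (k - 1 - i) * 2 ** a for i, a in enumerate(exps))

if __name__ == "__main__":
    # f_2 is well defined on classes: n and 2n give the same output.
    print(all(f2(n) == f2(2 * n) for n in range(1, 1000)))
    # Odd fixed points of f_2 below 10^4 (the proof says only [1] is fixed).
    print([n for n in range(1, 10 ** 4, 2) if f2(n) == n])
    exps, n_next = orbit_exponents(7, 6)
    print(exps, n_next == closed_form(7, exps))
    # Trivial cycle: n = 1, k = 1 gives a_1 = 0 and n_2 = 4 = 2^2 * 1, so (1) holds
    # with a_2 = 2 and n = 1, which Conjecture 3 excludes by requiring n > 1.
    print(orbit_exponents(1, 1))
```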

Call a *counterexample* a tuple $(a_1,\ldots,a_{k+1})$ that contradicts Conjecture 3, i.e. an integer $k \geq 1$ and an increasing set of integers

$$0 = a_1 < a_2 < \ldots < a_{k+1}$$

such that (1) holds for some $n > 1$. We record a simple bound on such counterexamples, due to Terras and to Garner:

Lemma 5 (Exponent bounds) Let $N \geq 1$, and suppose that the Collatz conjecture is true for all $n < N$. Let $(a_1,\ldots,a_{k+1})$ be a counterexample. Then

$$\frac{\log 3}{\log 2} k < a_{k+1} \leq \frac{\log(3 + \frac{1}{N})}{\log 2} k.$$

*Proof:* The first bound is immediate from the positivity of $2^{a_{k+1}} - 3^k$. To prove the second bound, observe from the proof of Proposition 4 that the counterexample $(a_1,\ldots,a_{k+1})$ will generate a counterexample to Conjecture 2, i.e. a non-trivial periodic orbit $n, f_0(n), \ldots, f_0^K(n) = n$. As the conjecture is true for all $n < N$, all terms in this orbit must be at least $N$. An inspection of the proof of Proposition 4 reveals that this orbit consists of $k$ steps of the form $x \mapsto 3x+1$, and $a_{k+1}$ steps of the form $x \mapsto x/2$. As all terms are at least $N$, each of the former steps can increase magnitude by a multiplicative factor of at most $3 + \frac{1}{N}$. As the orbit returns to where it started, we conclude that

$$1 \leq \left(3 + \frac{1}{N}\right)^k \left(\frac{1}{2}\right)^{a_{k+1}}$$

whence the claim.
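Lemma 5 yields lower bounds on $k$ because the window $\left(k\frac{\log 3}{\log 2}, k\frac{\log(3+1/N)}{\log 2}\right]$ must contain the integer $a_{k+1}$, and for small $k$ it is empty. Here is a minimal Python sketch of that observation. It uses exact integer arithmetic: it tests $2^a > 3^k$ and $2^a N^k \leq (3N+1)^k$, the two bounds with denominators cleared. The function names are hypothetical, and it is in no way a reconstruction of Garner's actual computation.

```python
def exponent_window(k: int, N: int) -> list:
    """All integers a with 3^k < 2^a and 2^a * N^k <= (3N + 1)^k, i.e. the values of
    a_{k+1} permitted by Lemma 5, computed exactly with Python integers."""
    a = (3 ** k).bit_length()   # smallest a with 2^a > 3^k (3^k is never a power of 2)
    upper = (3 * N + 1) ** k
    scale = N ** k
    window = []
    while (2 ** a) * scale <= upper:
        window.append(a)
        a += 1
    return window

def smallest_admissible_k(N: int, k_max: int):
    """First k <= k_max whose window is non-empty, or None if there is none."""
    for k in range(1, k_max + 1):
        if exponent_window(k, N):
            return k
    return None

if __name__ == "__main__":
    print(exponent_window(1, 1))                 # with the vacuous N = 1, a_2 = 2 is allowed
    print(smallest_admissible_k(10 ** 6, 100))   # larger N empties the window for small k
```

Larger verified values of $N$ shrink the window $k \log_2(1 + \frac{1}{3N})$ further, which is why the verification bound translates into a lower bound on cycle length.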

The Collatz conjecture has already been verified for all $n$ up to a very large explicit bound, according to this web site. Inserting such a value of $N$ into the above lemma, one can get lower bounds on $k$. For instance, by methods such as this, an explicit and rather large lower bound on the length of any non-trivial periodic orbit was obtained in Garner’s paper. That bound uses the much smaller value of $N$ that was available in 1981, and can surely be improved using the most recent computational bounds.

Now we can perform a heuristic count on the number of counterexamples. Fix $k$ and $a := a_{k+1}$, and write $q := 2^a - 3^k$. From basic combinatorics we see that there are $\binom{a-1}{k-1}$ different ways to choose the remaining integers

$$0 = a_1 < a_2 < \ldots < a_k < a_{k+1}$$

to form a potential counterexample $(a_1,\ldots,a_{k+1})$. As a crude heuristic, one expects that for a “random” such choice of integers, the expression (1) has a probability $1/q$ of holding for some integer $n$. (Note that $q$ is not divisible by $2$ or $3$, and so one does not expect the special structure of the right-hand side of (1) with respect to those moduli to be relevant. There will be some choices of $a_1,\ldots,a_k$ where the right-hand side in (1) is too small to be divisible by $q$, but using the estimates in Lemma 5, one expects this to occur very infrequently.) Thus, the total expected number of solutions for this choice of $a, k$ is

$$\frac{1}{q} \binom{a-1}{k-1}.$$

The heuristic number of solutions overall is then expected to be

$$\sum_{a,k} \frac{1}{q} \binom{a-1}{k-1}, \tag{6}$$

where, in view of Lemma 5, one should restrict the double summation to the heuristic regime $a \approx \frac{\log 3}{\log 2} k$, with the approximation here accurate to many decimal places.

We need a lower bound on $q$. Here, we will use Baker’s theorem (as discussed in this previous post), which among other things gives the lower bound

$$q = 2^a - 3^k \gg 2^a / a^C \tag{7}$$

for some absolute constant $C$. Meanwhile, Stirling’s formula (as discussed in this previous post) combined with the approximation $k \approx \frac{\log 2}{\log 3} a$ gives

$$\binom{a-1}{k-1} \approx \exp\left(h\left(\frac{\log 2}{\log 3}\right)\right)^a$$

where $h$ is the *entropy function*

$$h(x) := -x \log x - (1-x) \log(1-x).$$

A brief computation shows that

$$\exp\left(h\left(\frac{\log 2}{\log 3}\right)\right) \approx 1.9318\ldots$$

and so (ignoring all subexponential terms)

$$\frac{1}{q} \binom{a-1}{k-1} \approx (0.9659\ldots)^a$$

which makes the series (6) convergent. (Actually, one does not need the full strength of Lemma 5 here. Since $h(x) < \log 2$ for $x \neq 1/2$, anything that kept $k/a$ well away from $1/2$ would suffice. In particular, one does not need an enormous value of $N$; even a small value of $N$ would be more than sufficient to obtain the heuristic that there are finitely many counterexamples.) Heuristically applying the Borel-Cantelli lemma, we thus expect that there are only a finite number of counterexamples to the weak Collatz conjecture. Inserting an explicit lower bound on $k$, such as the one from Garner’s paper, one in fact expects it to be extremely likely that there are no counterexamples at all.
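The “brief computation” above can be reproduced in a few lines. The following Python sketch (names hypothetical) evaluates $\exp(h(\log 2/\log 3))$ and its ratio to $2$. Dividing by $2$ stands in for dividing by $q \approx 2^a$, ignoring the $a^C$ loss from Baker's theorem. The sketch also compares the prediction with an exact binomial coefficient for one moderately large $a$; the two differ only by Stirling's $O(\log a / a)$ correction.

```python
import math

def entropy(x: float) -> float:
    """Natural-log entropy function h(x) = -x log x - (1 - x) log(1 - x)."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

theta = math.log(2) / math.log(3)    # heuristic ratio k/a from Lemma 5
growth = math.exp(entropy(theta))    # predicted growth rate of binom(a-1, k-1) per unit of a
ratio = growth / 2                   # per-step factor once we divide by q ~ 2^a

def empirical_rate(a: int) -> float:
    """(1/a) log binom(a-1, k-1) with k = round(theta * a), computed exactly."""
    k = round(theta * a)
    return math.log(math.comb(a - 1, k - 1)) / a

if __name__ == "__main__":
    print(f"exp(h(theta)) = {growth:.4f}, ratio = {ratio:.4f}")
    print(f"h(theta) = {entropy(theta):.4f}, empirical at a=3000: {empirical_rate(3000):.4f}")
```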

This, of course, is far short of any rigorous proof of Conjecture 2. In order to make rigorous progress on this conjecture, it seems that one would need to somehow exploit the structural properties of numbers of the form

$$3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k}. \tag{8}$$

In some very special cases, this can be done. For instance, suppose that one had $a_{i+1} = a_i + 1$ with at most one exception (this is essentially what is called a *1-cycle* by Steiner). Then (8) simplifies via the geometric series formula to a combination of just a bounded number of powers of $2$ and $3$, rather than an unbounded number. In that case, one can start using tools from transcendence theory such as Baker’s theorem to obtain good results. For instance, in the above-referenced paper of Steiner, it was shown that 1-cycles cannot actually occur. Similar methods were used by Simons and de Weger to show that $m$-cycles (in which there are at most $m$ exceptions to $a_{i+1} = a_i + 1$) do not occur for any $m$ up to an explicit bound. However, for general increasing tuples of integers $a_1,\ldots,a_k$, there is no such representation by bounded numbers of powers. It does not seem that methods from transcendence theory will be sufficient to control the expressions (8) to the extent that one can understand their divisibility properties by quantities such as $2^a - 3^k$.

Amusingly, there is a slight connection to Littlewood-Offord theory in additive combinatorics – the study of the $2^n$ random sums

$$\pm v_1 \pm v_2 \pm \ldots \pm v_n$$

generated by some elements $v_1,\ldots,v_n$ of an additive group $G$, or equivalently, the vertices of an $n$-dimensional parallelepiped inside $G$. Here, the relevant group is ${\bf Z}/q{\bf Z}$. The point is that if one fixes $k$ and $a$ (and hence $q$), and lets $a_1,\ldots,a_k$ vary inside the simplex

$$\Delta := \{ (a_1,\ldots,a_k) \in {\bf Z}^k: 0 = a_1 < \ldots < a_k < a \}$$

then the set $S$ of all sums of the form (8) (viewed as elements of ${\bf Z}/q{\bf Z}$) contains many large parallelepipeds. (Note, incidentally, that once one fixes $k$, all the sums of the form (8) are distinct. Given (8) and $k$, one can read off $2^{a_1}$ as the largest power of $2$ that divides (8); subtracting off $3^{k-1} 2^{a_1}$, one can then read off $2^{a_2}$, and so forth.) This is because the simplex $\Delta$ contains many large cubes. Indeed, if one picks a typical element $(a_1,\ldots,a_k)$ of $\Delta$, then one expects (thanks to Lemma 5) that there will be $\gg k$ indices $2 \leq i_1 < \ldots < i_m \leq k$ such that $a_{i_j+1} > a_{i_j} + 1$ for $j = 1,\ldots,m$. This allows one to adjust each of the $a_{i_j}$ independently by $1$ if desired and still remain inside $\Delta$. This gives a cube in $\Delta$ of dimension $\gg k$, which then induces a parallelepiped of the same dimension in $S$. A short computation shows that the generators of this parallelepiped consist of products of a power of $2$ and a power of $3$ (increasing $a_i$ by one adds $3^{k-i} 2^{a_i}$ to (8)), and in particular will be coprime to $q$.
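The parenthetical remark that the sums (8) can be decoded, and hence are distinct for fixed $k$, is easy to check by machine. This Python sketch (hypothetical names) recovers $a_1, a_2, \ldots$ one at a time from the $2$-adic valuation of what remains, exactly as described above. It then verifies, for one small choice of $k$ and $a$, that decoding inverts (8) on the whole simplex $\Delta$.

```python
from itertools import combinations

def collatz_sum(exps) -> int:
    """The quantity (8): 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + ... + 2^{a_k}."""
    k = len(exps)
    return sum(3 ** (k - 1 - i) * 2 ** a for i, a in enumerate(exps))

def read_off(value: int, k: int) -> list:
    """Invert (8): the largest power of 2 dividing the value is 2^{a_1}; subtract
    3^{k-1} 2^{a_1} and repeat to get a_2, and so on."""
    exps = []
    for i in range(k):
        a = (value & -value).bit_length() - 1   # 2-adic valuation of what is left
        exps.append(a)
        value -= 3 ** (k - 1 - i) * 2 ** a
    return exps

def simplex(k: int, a: int):
    """All tuples 0 = a_1 < a_2 < ... < a_k < a."""
    return [(0,) + rest for rest in combinations(range(1, a), k - 1)]

if __name__ == "__main__":
    k, a = 4, 9
    tuples = simplex(k, a)
    print(all(read_off(collatz_sum(t), k) == list(t) for t in tuples))
    print(len({collatz_sum(t) for t in tuples}) == len(tuples))
```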

If the weak Collatz conjecture is true, then the set $S$ must avoid the residue class $0$ in ${\bf Z}/q{\bf Z}$. Let us suppose temporarily that we did not know about Baker’s theorem (and the associated bound (7)), so that $q$ could potentially be quite small. Then we would have a large parallelepiped inside a small cyclic group ${\bf Z}/q{\bf Z}$ that did not cover all of ${\bf Z}/q{\bf Z}$, which would not be possible for $q$ small enough. Indeed, an easy induction shows that a $d$-dimensional parallelepiped in ${\bf Z}/q{\bf Z}$, with all generators coprime to $q$, has cardinality at least $\min(q, d+1)$. This argument already shows the lower bound $q \gg k$. In other words, we have

Proposition 6 Suppose the weak Collatz conjecture is true. Then for any natural numbers $a, k$ with $2^a > 3^k$, one has $2^a - 3^k \gg k$.
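The “easy induction” behind Proposition 6 works as follows. Adding a generator $v$ to a parallelepiped $P$ produces $P \cup (P + v)$. Either this gains a new point, or $P$ is invariant under translation by $v$, hence (as $v$ is coprime to $q$) is all of ${\bf Z}/q{\bf Z}$. A brute-force Python check of the resulting bound $\min(q, d+1)$ on random small examples is sketched below. The names and the random test setup are illustrative assumptions.

```python
import math
import random

def parallelepiped(q: int, base: int, gens) -> set:
    """All sums base + e_1 v_1 + ... + e_d v_d (each e_i in {0, 1}), reduced mod q."""
    points = {base % q}
    for v in gens:
        points |= {(p + v) % q for p in points}
    return points

def units(q: int) -> list:
    """Residues in [1, q) coprime to q."""
    return [v for v in range(1, q) if math.gcd(v, q) == 1]

if __name__ == "__main__":
    rng = random.Random(1)
    ok = True
    for q in range(2, 40):
        for d in range(8):
            gens = [rng.choice(units(q)) for _ in range(d)]
            ok &= len(parallelepiped(q, rng.randrange(q), gens)) >= min(q, d + 1)
    print(ok)
```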

This bound is very weak when compared against the unconditional bound (7). However, I know of no way to get a nontrivial separation property between powers of $2$ and powers of $3$ other than via transcendence theory methods. Thus, this result strongly suggests that any proof of the Collatz conjecture must either use existing results in transcendence theory, or else must contribute a new method to give non-trivial results in transcendence theory. (This already rules out a lot of possible approaches to solve the Collatz conjecture.)

By using more sophisticated tools in additive combinatorics, one can improve the above proposition (though it is still well short of the transcendence theory bound (7)):

Proposition 7 Suppose the weak Collatz conjecture is true. Then for any natural numbers $a, k$ with $2^a > 3^k$, one has $2^a - 3^k \gg (1+\epsilon)^k$ for some absolute constant $\epsilon > 0$.

*Proof:* (Informal sketch only) Suppose not; then we can find $a, k$ with $q := 2^a - 3^k$ of size $(1+o(1))^k = \exp(o(k))$. We form the set $S$ as before, which contains parallelepipeds in ${\bf Z}/q{\bf Z}$ of large dimension $d \gg k$ that avoid $0$. We can count the number of times $0$ occurs in one of these parallelepipeds by a standard Fourier-analytic computation involving Riesz products (see Chapter 7 of my book with Van Vu, or this recent preprint of Maples). Using this Fourier representation, the fact that this parallelepiped avoids $0$ (and the fact that $q = \exp(o(k)) = \exp(o(d))$) forces the generators $v_1,\ldots,v_d$ to be concentrated in a *Bohr set*. That is, one can find a non-zero frequency $\xi \in {\bf Z}/q{\bf Z}$ such that $(1-o(1)) d$ of the $d$ generators lie in the set $\{ v: \xi v / q = o(1) \bmod 1 \}$. However, one can choose the generators to essentially have the structure of a (generalised) geometric progression. Up to scaling, it resembles something like $2^{\lfloor \alpha i \rfloor} 3^{-i}$ for $i$ ranging over a generalised arithmetic progression, and $\alpha$ a fixed irrational. One can show that such progressions cannot be concentrated in Bohr sets. This is similar in spirit to the exponential sum estimates of Bourgain on approximate multiplicative subgroups of ${\bf Z}/q{\bf Z}$, though one can use more elementary methods here due to the very strong nature of the Bohr set concentration (being of the “99% concentration” variety rather than the “1% concentration” variety). This furnishes the required contradiction.

Thus we see that any proposed proof of the Collatz conjecture must either use transcendence theory, or introduce new techniques that are powerful enough to create exponential separation between powers of $2$ and powers of $3$.

Unfortunately, once one uses the transcendence theory bound (7), the size $q$ of the cyclic group ${\bf Z}/q{\bf Z}$ becomes larger than the volume of any cube in $S$, and Littlewood-Offord techniques are no longer of much use. (They can be used to show that $S$ is highly equidistributed in ${\bf Z}/q{\bf Z}$, but this does not directly give any way to prevent $S$ from containing $0$.)

One possible toy model problem for the (weak) Collatz conjecture is a conjecture of Erdos asserting that for $n > 8$, the base $3$ representation of $2^n$ contains at least one $2$. (See this paper of Lagarias for some work on this conjecture and on related problems.) To put it another way, the conjecture asserts that there are no integer solutions to

$$2^n = 3^{a_1} + 3^{a_2} + \ldots + 3^{a_k}$$

with $n > 8$ and $0 \leq a_1 < \ldots < a_k$. (When $n = 8$, of course, one has $2^8 = 3^0 + 3^1 + 3^2 + 3^5$.) In this form we see a resemblance to Conjecture 3, but it looks like a simpler problem to attack (though one which is still a fair distance beyond what one can do with current technology). Note that this conjecture has heuristic support similar to that for Conjecture 3. A number of magnitude $2^n$ has about $n \frac{\log 2}{\log 3}$ base $3$ digits, so the heuristic probability that none of these digits are equal to $2$ is $(2/3)^{n \log 2 / \log 3} = 2^{-(1 - \log 2/\log 3) n}$, which is absolutely summable.
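A quick Python experiment, sketched below with hypothetical helper names, lists the exponents $n$ in a small range for which $2^n$ has no digit $2$ in base $3$. It also checks the $n = 8$ identity and the closed form of the heuristic probability. It says nothing, of course, about $n$ beyond the range tested.

```python
import math

def ternary_digits(n: int) -> list:
    """Base-3 digits of n, least significant first."""
    digits = []
    while n:
        digits.append(n % 3)
        n //= 3
    return digits

def powers_of_two_without_digit_2(limit: int) -> list:
    """Exponents n < limit such that 2^n has no digit 2 in base 3."""
    return [n for n in range(limit) if 2 not in ternary_digits(2 ** n)]

def heuristic_probability(n: int) -> float:
    """(2/3) raised to the approximate number n log 2 / log 3 of base-3 digits of 2^n."""
    return (2 / 3) ** (n * math.log(2) / math.log(3))

if __name__ == "__main__":
    print(powers_of_two_without_digit_2(300))
    print(2 ** 8 == 3 ** 0 + 3 ** 1 + 3 ** 2 + 3 ** 5)
```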

In 1977, Furstenberg established his multiple recurrence theorem:

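Theorem 1 (Furstenberg multiple recurrence) Let $(X, {\mathcal X}, \mu, T)$ be a measure-preserving system, thus $(X, {\mathcal X}, \mu)$ is a probability space and $T: X \rightarrow X$ is a measure-preserving bijection such that $T$ and $T^{-1}$ are both measurable. Let $E$ be a measurable subset of $X$ of positive measure $\mu(E) > 0$. Then for any $k \geq 1$, there exists $n > 0$ such that

$$\mu(E \cap T^{-n} E \cap \ldots \cap T^{-(k-1)n} E) > 0.$$

Equivalently, there exists $n > 0$ and $x \in X$ such that

$$x, T^n x, \ldots, T^{(k-1)n} x \in E.$$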

As is well known, the Furstenberg multiple recurrence theorem is equivalent to Szemerédi’s theorem, thanks to the Furstenberg correspondence principle; see for instance these lecture notes of mine.

The multiple recurrence theorem is proven, roughly speaking, by an induction on the “complexity” of the system $(X, {\mathcal X}, \mu, T)$. Indeed, for very simple systems, such as periodic systems, the theorem is trivial. (In a periodic system, $T^n$ is the identity for some $n > 0$; this is for instance the case for the circle shift $X = {\bf R}/{\bf Z}$, $Tx := x + \alpha$ with a rational shift $\alpha$.) At a slightly more advanced level, the theorem is also quite easy for *almost periodic* (or *compact*) systems. (In such systems, $\{T^n f: n \in {\bf Z}\}$ is a precompact subset of $L^2(X)$ for every $f \in L^2(X)$; this is for instance the case for irrational circle shifts.) One then shows that the multiple recurrence property is preserved under various *extension* operations (specifically, compact extensions, weakly mixing extensions, and limits of chains of extensions). This gives the multiple recurrence theorem as a consequence of the *Furstenberg-Zimmer structure theorem* for measure-preserving systems. See these lecture notes for further discussion.

From a high-level perspective, this is still one of the most conceptual proofs known of Szemerédi’s theorem. However, the individual components of the proof are still somewhat intricate. Perhaps the most difficult step is the demonstration that the multiple recurrence property is preserved under *compact extensions*; see for instance these lecture notes, which are devoted entirely to this step. This step requires quite a bit of measure-theoretic and/or functional analytic machinery, such as the theory of disintegrations, relatively almost periodic functions, or Hilbert modules.

However, I recently realised that there is a special case of the compact extension step – namely that of *finite* extensions – which avoids almost all of these technical issues while still capturing the essence of the argument (and in particular, the key idea of using van der Waerden’s theorem). As such, this may serve as a pedagogical device for motivating this step of the proof of the multiple recurrence theorem.

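Let us first explain what a finite extension is. Suppose we are given a measure-preserving system $X = (X, {\mathcal X}, \mu, T)$, a finite set $Y$, and a measurable map $\rho: X \rightarrow \hbox{Sym}(Y)$ from $X$ to the permutation group of $Y$. Then one can form the *finite extension*

$$X \ltimes_\rho Y = (X \times Y, {\mathcal X} \times {\mathcal Y}, \mu \times \nu, S).$$

As a probability space, this is the product of $(X, {\mathcal X}, \mu)$ with the finite probability space $Y = (Y, {\mathcal Y}, \nu)$, where $Y$ carries the discrete $\sigma$-algebra and uniform probability measure. The shift map is

$$S(x, y) := (Tx, \rho(x) y).$$

One easily verifies that this is indeed a measure-preserving system. We refer to $\rho$ as the *cocycle* of the system.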

An example of finite extensions comes from group theory. Suppose we have a short exact sequence

$$0 \rightarrow K \rightarrow G \rightarrow H \rightarrow 0$$

of finite groups. Let $g$ be a group element of $G$, and let $h$ be its projection in $H$. Give $G$ and $H$ the discrete $\sigma$-algebra and uniform probability measure. Then the shift map $x \mapsto gx$ on $G$ can be viewed as a finite extension of the shift map $y \mapsto hy$ on $H$, as follows:

- arbitrarily select a section $\phi: H \rightarrow G$ that inverts the projection map;
- identify $G$ with $H \times K$ by identifying $k \phi(y)$ with $(y, k)$ for $y \in H$, $k \in K$;
- use the cocycle

$$\rho(y) k := g k \phi(y) \phi(hy)^{-1}.$$

Thus, for instance, the unit shift $x \mapsto x+1$ on ${\bf Z}/N{\bf Z}$ can be thought of as a finite extension of the unit shift $x \mapsto x+1$ on ${\bf Z}/M{\bf Z}$ whenever $N$ is a multiple of $M$.
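The ${\bf Z}/N{\bf Z}$ example can be checked by machine. The following Python sketch writes the cocycle additively, $\rho(y) k = g + k + \phi(y) - \phi(h + y)$, with $g = 1$, $h = 1 \bmod M$, $K = M{\bf Z}/N{\bf Z}$. It assumes the section $\phi(y) = y$ for $0 \leq y < M$; this is a hypothetical choice, since any section would do. It then verifies that the skew product $(y, k) \mapsto (y + 1, \rho(y) k)$ corresponds to $x \mapsto x + 1$ under the identification $(y, k) \leftrightarrow k + \phi(y)$.

```python
def section(y: int, M: int) -> int:
    """Section phi: Z/MZ -> Z/NZ choosing the representative 0 <= y < M (an arbitrary choice)."""
    return y % M

def cocycle(y: int, k: int, M: int, N: int) -> int:
    """rho(y) k = g + k + phi(y) - phi(h + y) in additive notation, with g = 1, h = 1 mod M."""
    return (1 + k + section(y, M) - section(y + 1, M)) % N

def skew_shift(y: int, k: int, M: int, N: int):
    """The finite-extension shift S(y, k) = (h + y, rho(y) k) on H x K."""
    return (y + 1) % M, cocycle(y, k, M, N)

def to_G(y: int, k: int, M: int, N: int) -> int:
    """Identification of (y, k) in H x K with k + phi(y) in G = Z/NZ."""
    return (k + section(y, M)) % N

if __name__ == "__main__":
    M, N = 3, 12
    pairs = [(y, k) for y in range(M) for k in range(0, N, M)]   # K = {0, M, 2M, ...}
    print(sorted(to_G(y, k, M, N) for y, k in pairs) == list(range(N)))
    print(all(to_G(*skew_shift(y, k, M, N), M, N) == (to_G(y, k, M, N) + 1) % N
              for y, k in pairs))
```

With this section the cocycle is trivial except at $y = M - 1$, where it adds $M$; that is the usual “carry” picture of ${\bf Z}/N{\bf Z}$ over ${\bf Z}/M{\bf Z}$.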

Another example comes from Riemannian geometry. If $M$ is a Riemannian manifold that is a finite cover of another Riemannian manifold $N$ (with the metric on $M$ being the pullback of that on $N$), then (unit time) geodesic flow on the cosphere bundle of $M$ is a finite extension of the corresponding flow on $N$.

Here, then, is the finite extension special case of the compact extension step in the proof of the multiple recurrence theorem:

Proposition 2 (Finite extensions) Let $X \ltimes_\rho Y$ be a finite extension of a measure-preserving system $X$. If $X$ obeys the conclusion of the Furstenberg multiple recurrence theorem, then so does $X \ltimes_\rho Y$.

Before we prove this proposition, let us first give the combinatorial analogue.

Lemma 3 Let $A$ be a subset of the integers that contains arbitrarily long arithmetic progressions, and let $c: A \rightarrow \{1,\ldots,r\}$ be a colouring of $A$ by $r$ colours (or equivalently, a partition of $A$ into $r$ colour classes $A_1,\ldots,A_r$). Then at least one of the $A_i$ contains arbitrarily long arithmetic progressions.

*Proof:* By the infinite pigeonhole principle, it suffices to show that for each $k \geq 1$, one of the colour classes $A_i$ contains an arithmetic progression of length $k$.

Let $N$ be a large integer (depending on $k$ and $r$) to be chosen later. Then $A$ contains an arithmetic progression of length $N$, which may be identified with $\{0,\ldots,N-1\}$. The colouring of $A$ then induces a colouring on $\{0,\ldots,N-1\}$ into $r$ colour classes. Applying (the finitary form of) van der Waerden’s theorem, we conclude that if $N$ is sufficiently large depending on $r$ and $k$, then one of these colour classes contains an arithmetic progression of length $k$. Undoing the identification, we conclude that one of the $A_i$ contains an arithmetic progression of length $k$, as desired.

Of course, by specialising to the case $A = {\bf Z}$, we see that the above Lemma is in fact equivalent to van der Waerden’s theorem.
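The only combinatorial input in Lemma 3 (and in Proposition 2 below) is the finitary van der Waerden theorem. The smallest nontrivial case can be checked exhaustively: every $2$-colouring of $\{0,\ldots,8\}$ has a monochromatic $3$-term progression, while the colouring $0,0,1,1,0,0,1,1$ of $\{0,\ldots,7\}$ has none (this is the fact $W(2,3) = 9$). The brute-force search below is a hypothetical helper, written only for illustration.

```python
from itertools import product

def mono_ap(colouring, k: int):
    """Return (start, step) of a monochromatic k-term arithmetic progression in
    colouring[0], ..., colouring[N-1], or None if there is none (brute force)."""
    n = len(colouring)
    for step in range(1, n):
        for start in range(n - (k - 1) * step):   # last term start + (k-1)*step <= n-1
            c = colouring[start]
            if all(colouring[start + j * step] == c for j in range(k)):
                return start, step
    return None

if __name__ == "__main__":
    print(all(mono_ap(c, 3) is not None for c in product((0, 1), repeat=9)))
    print(mono_ap((0, 0, 1, 1, 0, 0, 1, 1), 3))
```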

Now we prove Proposition 2.

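*Proof:* Fix $k$. Let $E$ be a positive measure subset of $X \ltimes_\rho Y$. By Fubini’s theorem, we have

$$(\mu \times \nu)(E) = \int_X f(x)\, d\mu(x)$$

where $f(x) := \nu(E_x)$ and $E_x := \{ y \in Y: (x, y) \in E \}$ is the fibre of $E$ at $x$. Since $(\mu \times \nu)(E)$ is positive, we conclude that the set

$$F := \{ x \in X: f(x) > 0 \} = \{ x \in X: E_x \neq \emptyset \}$$

is a positive measure subset of $X$. Note for each $x \in F$, we can find an element $g(x) \in Y$ such that $(x, g(x)) \in E$. While not strictly necessary for this argument, one can ensure if one wishes that the function $g$ is measurable by totally ordering $Y$, and then letting $g(x)$ be the minimal element of $Y$ for which $(x, g(x)) \in E$.

Let $N$ be a large integer (which will depend on $k$ and the cardinality $|Y|$ of $Y$) to be chosen later. Because $X$ obeys the multiple recurrence theorem, we can find a positive integer $n$ and $x \in X$ such that

$$x, T^n x, T^{2n} x, \ldots, T^{(N-1)n} x \in F. \tag{1}$$

Now consider the sequence of $N$ points

$$S^{-mn}(T^{mn} x, g(T^{mn} x))$$

for $m = 0, \ldots, N-1$. From (1), these points are well defined, and since $S$ is a skew product over $T$ we see that

$$S^{-mn}(T^{mn} x, g(T^{mn} x)) = (x, c(m)) \tag{2}$$

for some sequence $c(0), \ldots, c(N-1) \in Y$. This can be viewed as a colouring of $\{0,\ldots,N-1\}$ by $|Y|$ colours. Applying van der Waerden’s theorem, we conclude (if $N$ is sufficiently large depending on $k$ and $|Y|$) that there is an arithmetic progression $a, a+r, \ldots, a+(k-1)r$ in $\{0,\ldots,N-1\}$ with $r > 0$ such that

$$c(a) = c(a+r) = \ldots = c(a+(k-1)r) = y_0$$

for some $y_0 \in Y$. If we then let $z := (x, y_0)$, we see from (2) that

$$S^{(a+ir)n} z = (T^{(a+ir)n} x, g(T^{(a+ir)n} x)) \in E$$

for all $i = 0, \ldots, k-1$, and the claim follows.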

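Remark 1 The precise connection between Lemma 3 and Proposition 2 arises from the following observation. Let $E, F, g$ be as in the proof of Proposition 2, and let $x \in X$. Then the set

$$A := \{ n \in {\bf Z}: T^n x \in F \}$$

can be partitioned into the classes

$$A_i := \{ n \in {\bf Z}: S^n(x, i) \in E' \}, \quad i \in Y,$$

where $E' := \{ (x, g(x)): x \in F \} \subset E$ is the graph of $g$. The multiple recurrence property for $X$ ensures that $A$ contains arbitrarily long arithmetic progressions, and so therefore one of the $A_i$ must also, which gives the multiple recurrence property for $X \ltimes_\rho Y$.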

Remark 2 Compact extensions can be viewed as a generalisation of finite extensions, in which the fibres are no longer finite sets, but are themselves measure spaces obeying an additional property. Roughly speaking, this property asserts that for many functions $f$ on the extension, the shifts of $f$ behave in an almost periodic fashion on most fibres, so that the orbits become totally bounded on each fibre. This total boundedness allows one to obtain an analogue of the above colouring map to which van der Waerden’s theorem can be applied.

Let $G$ be an abelian countable discrete group. A *measure-preserving $G$-system* $X = (X, {\mathcal X}, \mu, (T_g)_{g \in G})$ (or *$G$-system* for short) is a probability space $(X, {\mathcal X}, \mu)$, equipped with a measure-preserving action $T_g: X \rightarrow X$ of the group $G$, thus

$$\mu(T_g(E)) = \mu(E)$$

for all $E \in {\mathcal X}$ and $g \in G$, and

$$T_g T_h = T_{g+h}$$

for all $g, h \in G$, with $T_0$ equal to the identity map. Classically, ergodic theory has focused on the cyclic case $G = {\bf Z}$, in which the $T_g$ are iterates of a single map $T = T_1$, with elements of $G$ being interpreted as a time parameter. But one can certainly consider actions of other groups $G$ also (including continuous or non-abelian groups).

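A $G$-system is said to be *strongly $2$-mixing*, or strongly mixing for short, if one has

$$\lim_{g \rightarrow \infty} \mu(A \cap T_g B) = \mu(A) \mu(B)$$

for all $A, B \in {\mathcal X}$, where the convergence is with respect to the one-point compactification of $G$. Thus, for every $\epsilon > 0$, there exists a compact (hence finite) subset $K$ of $G$ such that $|\mu(A \cap T_g B) - \mu(A) \mu(B)| \leq \epsilon$ for all $g \notin K$.

Similarly, we say that a $G$-system is *strongly $3$-mixing* if one has

$$\lim_{g, h, h-g \rightarrow \infty} \mu(A \cap T_g B \cap T_h C) = \mu(A) \mu(B) \mu(C)$$

for all $A, B, C \in {\mathcal X}$. Thus, for every $\epsilon > 0$, there exists a finite subset $K$ of $G$ such that

$$|\mu(A \cap T_g B \cap T_h C) - \mu(A) \mu(B) \mu(C)| \leq \epsilon$$

whenever $g, h, h-g$ all lie outside $K$.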

It is obvious that a strongly $3$-mixing system is necessarily strongly $2$-mixing. In the case of ${\bf Z}$-systems, it has been an open problem for some time, due to Rohlin, whether the converse is true:

Problem 1 (Rohlin’s problem) Is every strongly mixing ${\bf Z}$-system necessarily strongly $3$-mixing?

This is a surprisingly difficult problem. In the positive direction, a routine application of the Cauchy-Schwarz inequality (via van der Corput’s inequality) shows that every strongly mixing system is *weakly $3$-mixing*. Roughly speaking, this means that $\mu(A \cap T_g B \cap T_h C)$ converges to $\mu(A) \mu(B) \mu(C)$ for *most* $g, h \in {\bf Z}$. Indeed, every weakly mixing system is in fact weakly mixing of all orders; see for instance this blog post of Carlos Matheus, or these lecture notes of myself. So the problem is to exclude the possibility of correlation between $A$, $T_g B$, and $T_h C$ for a small but non-trivial number of pairs $(g, h)$.

It is also known that the answer to Rohlin’s problem is affirmative for rank one transformations (a result of Kalikow) and for shifts with purely singular continuous spectrum (a result of Host; note that strongly mixing systems cannot have any non-trivial point spectrum). Indeed, any counterexample to the problem, if it exists, is likely to be highly pathological.

In the other direction, Rohlin’s problem is known to have a negative answer for ${\bf Z}^2$-systems, by a well-known counterexample of Ledrappier which can be described as follows. One can view a ${\bf Z}^2$-system as being essentially equivalent to a stationary process $(x_{n,m})_{(n,m) \in {\bf Z}^2}$ of random variables $x_{n,m}$ in some range space $\Omega$ indexed by ${\bf Z}^2$. Here $X$ is $\Omega^{{\bf Z}^2}$ with the obvious shift map

$$T_{(g,h)} (x_{n,m})_{(n,m) \in {\bf Z}^2} := (x_{n-g,m-h})_{(n,m) \in {\bf Z}^2}.$$

In Ledrappier’s example, the $x_{n,m}$ take values in the finite field ${\bf F}_2$ of two elements, and are selected uniformly at random subject to the “Pascal’s triangle” linear constraints

$$x_{n,m} = x_{n-1,m} + x_{n,m-1}.$$

A routine application of the Kolmogorov extension theorem allows one to build such a process. The point is that due to the properties of Pascal’s triangle modulo $2$ (known as Sierpinski’s triangle), one has

$$x_{n,m} = x_{n-2^k,m} + x_{n,m-2^k}$$

for all powers of two $2^k$. This is enough to destroy strong $3$-mixing, because it shows a strong correlation between $x$, $T_{(2^k,0)} x$, and $T_{(0,2^k)} x$ for arbitrarily large $k$ and randomly chosen $x \in X$. On the other hand, one can still show that $x$ and $T_g x$ are asymptotically uncorrelated for large $g$, giving strong $2$-mixing. Unfortunately, there are significant obstructions to converting Ledrappier’s example from a ${\bf Z}^2$-system to a ${\bf Z}$-system, as pointed out by de la Rue.
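The Sierpinski-triangle identity can be seen on a finite patch. The following Python sketch fills a quadrant patch over ${\bf F}_2$ from random values on the two axes, imposing $x_{n,m} = x_{n-1,m} + x_{n,m-1}$ for $n, m \geq 1$. It then checks $x_{n,m} = x_{n-2^k,m} + x_{n,m-2^k}$ wherever $n, m \geq 2^k$. This is only an illustration under stated assumptions: the random boundary data and the quadrant are choices of this sketch, and the actual stationary process on all of ${\bf Z}^2$ comes from the Kolmogorov extension argument, not from this code.

```python
import random

def ledrappier_patch(size: int, seed: int = 0) -> list:
    """size x size patch over F_2 obeying x[n][m] = x[n-1][m] + x[n][m-1] (mod 2)
    for n, m >= 1, filled in from random values on the axes n = 0 and m = 0."""
    rng = random.Random(seed)
    x = [[0] * size for _ in range(size)]
    for i in range(size):
        x[i][0] = rng.randint(0, 1)
        x[0][i] = rng.randint(0, 1)
    for n in range(1, size):
        for m in range(1, size):
            x[n][m] = x[n - 1][m] ^ x[n][m - 1]   # addition in F_2 is XOR
    return x

def dyadic_identity_holds(x: list, k: int) -> bool:
    """Check x[n][m] = x[n - 2^k][m] + x[n][m - 2^k] (mod 2) wherever n, m >= 2^k.
    This follows from (A + B)^{2^k} = A^{2^k} + B^{2^k} for the two shift operators mod 2."""
    s, size = 2 ** k, len(x)
    return all(x[n][m] == x[n - s][m] ^ x[n][m - s]
               for n in range(s, size) for m in range(s, size))

if __name__ == "__main__":
    patch = ledrappier_patch(40, seed=3)
    print([dyadic_identity_holds(patch, k) for k in range(5)])
```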

In this post, I would like to record a “finite field” variant of Ledrappier’s construction, in which ${\bf Z}$ is replaced by the function field ring ${\bf F}_3[t]$, which is a “dyadic” (or more precisely, “triadic”) model for the integers (cf. this earlier blog post of mine). In other words:

Theorem 2 There exists an ${\bf F}_3[t]$-system that is strongly $2$-mixing but not strongly $3$-mixing.

The idea is much the same as that of Ledrappier; one builds a stationary ${\bf F}_3[t]$-process $(x_n)_{n \in {\bf F}_3[t]}$ in which the $x_n \in {\bf F}_3$ are chosen uniformly at random subject to the constraints

$$x_n + x_{n + t^k} + x_{n + 2t^k} = 0 \tag{1}$$

for all $n \in {\bf F}_3[t]$ and all $k \geq 0$. Again, this system is manifestly not strongly $3$-mixing, but can be shown to be strongly $2$-mixing; I give details below the fold.
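To see concretely what (1) asks for, here is a small Python checker. It works on a finite window of ${\bf F}_3[t]$, namely polynomials of degree less than $D$ stored as coefficient tuples. It tests some deterministic configurations $n \mapsto x_n$ against (1), for example an affine function of the coefficients and the product $n_0 n_1$ of the two lowest coefficients. These particular configurations are my own illustrative choices, not the random process itself, which is the uniform measure on all solutions. Taking $n = 0$ in (1) gives $x_0 + x_{t^k} + x_{2t^k} = 0$ for every $k$, which is the correlation that rules out strong $3$-mixing.

```python
from itertools import product

P = 3   # the field F_3
D = 4   # finite window: polynomials in F_3[t] of degree < D, as coefficient tuples

def add(n: tuple, m: tuple) -> tuple:
    return tuple((a + b) % P for a, b in zip(n, m))

def scale(c: int, n: tuple) -> tuple:
    return tuple((c * a) % P for a in n)

def t_power(k: int) -> tuple:
    return tuple(1 if i == k else 0 for i in range(D))

def satisfies_constraints(x) -> bool:
    """Check x_n + x_{n + t^k} + x_{n + 2t^k} = 0 in F_3 for all n in the window, k < D."""
    for n in product(range(P), repeat=D):
        for k in range(D):
            tk = t_power(k)
            if (x(n) + x(add(n, tk)) + x(add(n, scale(2, tk)))) % P:
                return False
    return True

affine = lambda n: (2 * n[0] + n[2] + 1) % P   # illustrative: affine in the coefficients
quadratic = lambda n: (n[0] * n[1]) % P        # illustrative: product of two coefficients
square = lambda n: (n[0] ** 2) % P             # illustrative non-example

if __name__ == "__main__":
    print(satisfies_constraints(affine), satisfies_constraints(quadratic),
          satisfies_constraints(square))
```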

As I discussed in this previous post, in many cases the dyadic model serves as a good guide for the non-dyadic model. However, in this case there is a curious rigidity phenomenon that seems to prevent Ledrappier-type examples from being transferable to the one-dimensional non-dyadic setting; once one restores the Archimedean nature of the underlying group, the constraints (1) not only reinforce each other strongly, but also force so much linearity on the system that one loses the strong mixing property.

Tanja Eisner and I have just uploaded to the arXiv our paper “Large values of the Gowers-Host-Kra seminorms”, submitted to Journal d’Analyse Mathematique. This paper is concerned with the properties of three closely related families of (semi)norms, indexed by a positive integer $k$:

- The *Gowers uniformity norms* $\|f\|_{U^k(G)}$ of a (bounded, measurable, compactly supported) function $f: G \rightarrow {\bf C}$ on a locally compact abelian group $G$, equipped with a Haar measure $\mu$;
- The Gowers uniformity norms $\|f\|_{U^k([N])}$ of a function $f: [N] \rightarrow {\bf C}$ on a discrete interval $[N] := \{1,\ldots,N\}$; and
- The Gowers-Host-Kra seminorms $\|f\|_{U^k(X)}$ of a function $f \in L^\infty(X)$ on an ergodic measure-preserving system $X = (X, {\mathcal X}, \mu, T)$.

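These norms have been discussed in depth in previous blog posts, so I will just quickly review the definition of the first norm here (the other two (semi)norms are defined similarly). The $U^k(G)$ norm is defined recursively by setting

$$\|f\|_{U^1(G)} := \left| \int_G f\, d\mu \right|$$

and

$$\|f\|_{U^k(G)}^{2^k} := \int_G \|\Delta_h f\|_{U^{k-1}(G)}^{2^{k-1}}\, d\mu(h)$$

where $\Delta_h f(x) := f(x+h) \overline{f(x)}$. Equivalently, one has

$$\|f\|_{U^k(G)} := \left( \int_G \cdots \int_G \Delta_{h_1} \cdots \Delta_{h_k} f(x)\, d\mu(x)\, d\mu(h_1) \cdots d\mu(h_k) \right)^{1/2^k}.$$

Informally, the Gowers uniformity norm $\|f\|_{U^k(G)}$ measures the extent to which (the phase of) $f$ behaves like a polynomial of degree less than $k$. Indeed, suppose $\|f\|_{L^\infty(G)} \leq 1$ and $G$ is compact with normalised Haar measure $\mu(G) = 1$. Then it is not difficult to show that $\|f\|_{U^k(G)}$ is at most $1$. Equality holds if and only if $f$ takes the form $f = e(P) := e^{2\pi i P}$ almost everywhere, where $P: G \rightarrow {\bf R}/{\bf Z}$ is a polynomial of degree less than $k$. This means that $\partial_{h_1} \cdots \partial_{h_k} P(x) = 0$ for all $x, h_1, \ldots, h_k \in G$, where $\partial_h P(x) := P(x+h) - P(x)$.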
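The recursive definition translates directly into code. Below is a minimal Python sketch on the cyclic group ${\bf Z}/N{\bf Z}$ with normalised counting measure; the function names are hypothetical. It computes $\|f\|_{U^k}^{2^k}$ by the recursion and illustrates the extremal case. For $N = 7$, the quadratic phase $e(x^2/N)$ (a polynomial of degree $2$ on ${\bf Z}/N{\bf Z}$ when $N$ is odd) has $U^3$ norm $1$, but $U^2$ norm only $N^{-1/4}$, because its derivatives $\Delta_h f$ are non-constant linear phases for $h \neq 0$.

```python
import cmath
import random

def e(theta: float) -> complex:
    """e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)

def diff(f: list, h: int) -> list:
    """Multiplicative derivative Delta_h f(x) = f(x + h) * conj(f(x)) on Z/NZ."""
    N = len(f)
    return [f[(x + h) % N] * f[x].conjugate() for x in range(N)]

def gowers_power(f: list, k: int) -> float:
    """||f||_{U^k}^{2^k} with normalised counting measure on Z/NZ, via the recursion
    ||f||_{U^1}^2 = |E f|^2 and ||f||_{U^k}^{2^k} = E_h ||Delta_h f||_{U^{k-1}}^{2^{k-1}}."""
    N = len(f)
    if k == 1:
        return abs(sum(f) / N) ** 2
    return sum(gowers_power(diff(f, h), k - 1) for h in range(N)) / N

def gowers_norm(f: list, k: int) -> float:
    return gowers_power(f, k) ** (1 / 2 ** k)

if __name__ == "__main__":
    N = 7
    quad = [e(x * x / N) for x in range(N)]
    print(gowers_norm(quad, 3), gowers_norm(quad, 2), N ** -0.25)
    rng = random.Random(2)
    g = [e(rng.random()) * rng.random() for _ in range(N)]   # random f with |f| <= 1
    print([gowers_norm(g, k) <= 1 + 1e-12 for k in (1, 2, 3)])
```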

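Our first result is to show that this result is robust, uniformly over all choices of group $G$:

Theorem 1 ($L^\infty$-near extremisers) Let $G$ be a compact abelian group with normalised Haar measure $\mu(G) = 1$, and let $f \in L^\infty(G)$ be such that $\|f\|_{L^\infty(G)} \leq 1$ and $\|f\|_{U^k(G)} \geq 1 - \epsilon$ for some $k \geq 1$ and $0 < \epsilon \leq 1$. Then there exists a polynomial $P: G \rightarrow {\bf R}/{\bf Z}$ of degree at most $k-1$ such that $\|f - e(P)\|_{L^1(G)} = o(1)$, where $o(1)$ is bounded by a quantity $c_k(\epsilon)$ that goes to zero as $\epsilon \rightarrow 0$ for fixed $k$.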

The $o(1)$ quantity can be described effectively (it is of polynomial size in $\varepsilon$), but we did not seek to optimise it here. This result was already known in the case of vector spaces over a fixed finite field (where it is essentially equivalent to the assertion that the property of being a polynomial of degree at most $k-1$ is locally testable); the extension to general groups turns out to be fairly routine. The basic idea is to use the recursive structure of the Gowers norms, which tells us in particular that if $\|f\|_{U^k(G)}$ is close to one, then $\|\Delta_h f\|_{U^{k-1}(G)}$ is close to one for most $h$, where $\Delta_h f$ is the multiplicative derivative of $f$ in the direction $h$; by induction, this implies that $\Delta_h f$ is close to $e(Q_h)$ for some polynomials $Q_h$ of degree at most $k-2$ and for most $h$. (Actually, it is not difficult to use cocycle equations for the $\Delta_h f$ (valid when $|f| = 1$) to upgrade “for most $h$” to “for all $h$”.) To finish the job, one would like to express the $Q_h$ as derivatives $Q_h = \partial_h P$ of a polynomial $P$ of degree at most $k-1$. This turns out to be equivalent to requiring that the $Q_h$ obey the cocycle equation

$$Q_{h+h'} = Q_h + T^h Q_{h'}$$

where $T^h F$ is the translate of $F$ by $h$. (In the paper, the sign conventions are reversed, in order to be compatible with ergodic theory notation, but this makes no substantial difference to the arguments or results.) However, one does not quite get this right away; instead, by using some separation properties of polynomials, one can show the weaker statement that

$$Q_{h+h'} = Q_h + T^h Q_{h'} + c_{h,h'} \qquad (1)$$

where the $c_{h,h'}$ are small real constants. To eliminate these constants, one exploits the trivial cohomology of the real line. From (1) one soon concludes that the $c_{h,h'}$ obey the $2$-cocycle equation

$$c_{h,h'} + c_{h+h',h''} = c_{h,h'+h''} + c_{h',h''},$$

and an averaging argument then shows that $c$ is a $2$-coboundary in the sense that

$$c_{h,h'} = b_{h+h'} - b_h - b_{h'}$$

for some small scalar $b_h$ depending on $h$. Subtracting $b_h$ from $Q_h$ then gives the claim.
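
The averaging step can be seen in the simplest possible setting: a finite cyclic group ${\bf Z}/N{\bf Z}$ and an exact real-valued $2$-cocycle (the paper works with compact groups and approximate identities, which this sketch ignores). Averaging the cocycle equation $c_{h,h'} + c_{h+h',h''} = c_{h,h'+h''} + c_{h',h''}$ over $h''$ shows that $b_h := -{\bf E}_{h'} c_{h,h'}$ satisfies $c_{h,h'} = b_{h+h'} - b_h - b_{h'}$. The helper names below are hypothetical.

```python
import numpy as np

def coboundary(b):
    """The 2-coboundary c[h, h2] = b[h + h2] - b[h] - b[h2] on Z/NZ."""
    N = len(b)
    idx = np.arange(N)
    return b[(idx[:, None] + idx[None, :]) % N] - b[:, None] - b[None, :]

def is_two_cocycle(c, tol=1e-12):
    """Check c[h,h2] + c[h+h2,h3] == c[h,h2+h3] + c[h2,h3] for all h, h2, h3 in Z/NZ."""
    N = c.shape[0]
    h, h2, h3 = np.meshgrid(np.arange(N), np.arange(N), np.arange(N), indexing="ij")
    lhs = c[h, h2] + c[(h + h2) % N, h3]
    rhs = c[h, (h2 + h3) % N] + c[h2, h3]
    return bool(np.max(np.abs(lhs - rhs)) < tol)

def trivialise(c):
    """Averaging step: b[h] = -(mean over h2 of c[h, h2]) satisfies c == coboundary(b)."""
    return -c.mean(axis=1)

rng = np.random.default_rng(0)
N = 7
c = coboundary(rng.standard_normal(N))  # a real 2-cocycle on Z/7Z
b = trivialise(c)
print(is_two_cocycle(c), np.allclose(coboundary(b), c))  # True True
```

On a finite group every real $2$-cocycle arises this way, which is the finite shadow of the “trivial cohomology of the real line” used above.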

Similar results and arguments also hold for the and norms, which we will not detail here.

Dimensional analysis reveals that the $L^\infty$ norm is not actually the most natural norm against which to compare the $U^k$ norms. An application of Young’s convolution inequality in fact shows that one has the inequality

$$\|f\|_{U^k(G)} \le \|f\|_{L^{p_k}(G)} \qquad (2)$$

where $p_k$ is the critical exponent $p_k := 2^k/(k+1)$, without any compactness or normalisation hypothesis on the group $G$ and the Haar measure $\mu$. This allows us to extend the $U^k(G)$ norm to all of $L^{p_k}(G)$. There is then a stronger inverse theorem available:
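
The inequality $\|f\|_{U^k(G)} \le \|f\|_{L^{p_k}(G)}$ with $p_k = 2^k/(k+1)$ can be spot-checked on ${\bf Z}/N{\bf Z}$. This minimal sketch assumes normalised counting measure (rescaling the Haar measure by $c$ multiplies both sides, raised to the power $2^k$, by $c^{k+1}$, so the choice does not matter) and computes the Gowers norm by brute force via $\|f\|_{U^k}^{2^k} = {\bf E}_h \|\Delta_h f\|_{U^{k-1}}^{2^{k-1}}$. The indicator of a subgroup gives equality; the function names are hypothetical.

```python
import numpy as np

def gowers_power(f, k):
    """||f||_{U^k}^(2^k) on Z/NZ with normalised counting measure."""
    if k == 1:
        return abs(f.mean()) ** 2
    return float(np.mean([gowers_power(np.roll(f, -h) * np.conj(f), k - 1)
                          for h in range(len(f))]))

def gowers_norm(f, k):
    return gowers_power(np.asarray(f, dtype=complex), k) ** (1.0 / 2 ** k)

def lp_norm(f, p):
    """L^p norm with respect to normalised counting measure."""
    return float(np.mean(np.abs(f) ** p) ** (1.0 / p))

def critical_exponent(k):
    return 2 ** k / (k + 1)

N = 30
rng = np.random.default_rng(0)
for k in (2, 3):
    f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    print(k, gowers_norm(f, k), lp_norm(f, critical_exponent(k)))

# Indicator of the subgroup 5Z/30Z (six elements): both sides coincide.
subgroup = (np.arange(N) % 5 == 0).astype(float)
print(gowers_norm(subgroup, 3), lp_norm(subgroup, critical_exponent(3)))
```

For $k = 2$ this is just the Hausdorff-Young inequality $\|\hat f\|_{\ell^4} \le \|f\|_{L^{4/3}}$, since $\|f\|_{U^2} = \|\hat f\|_{\ell^4}$.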

Theorem 2 (-near extremisers)Let be a locally compact abelian group, and let be such that and for some and . Then there exists a coset of a compact open subgroup of , and a polynomial of degree at most such that .

Conversely, it is not difficult to show that equality in (2) is attained when $f$ takes the form described in Theorem 2, namely a constant multiple of a polynomial phase of degree at most $k-1$ supported on a coset of a compact open subgroup. The main idea of proof is to use an inverse theorem for Young’s inequality due to Fournier to reduce matters to the $L^\infty$ case that was already established. An analogous result is also obtained for the $U^k(X)$ norm on an ergodic system; but for technical reasons, the methods do not seem to apply easily to the $U^k[N]$ norm. (This norm is essentially equivalent to the $U^k({\bf Z}/\tilde N{\bf Z})$ norm up to constants, with $\tilde N$ comparable to $N$, but when working with near-extremisers, norms that are only equivalent up to constants can have quite different near-extremal behaviour.)

In the case when $G$ is a Euclidean group ${\bf R}^d$, it is possible to use the sharp Young inequality of Beckner and of Brascamp-Lieb to improve (2) somewhat. For instance, when $k = 3$, one has

$$\|f\|_{U^3({\bf R}^d)} \le 2^{-d/8} \|f\|_{L^2({\bf R}^d)}$$

with equality attained if and only if $f$ is a gaussian modulated by a quadratic polynomial phase. This additional gain of $2^{-d/8}$ allows one to pinpoint the threshold for the previous near-extremiser results in the case of $U^3$ norms. For instance, by using the Host-Kra machinery of characteristic factors for the $U^3(X)$ norm, combined with an explicit and concrete analysis of the $2$-step nilsystems generated by that machinery, we can show that

$$\|f\|_{U^3(X)} \le 2^{-1/8} \|f\|_{L^2(X)}$$

whenever $X$ is a *totally* ergodic system and $f$ is orthogonal to all linear and quadratic eigenfunctions (which would otherwise form immediate counterexamples to the above inequality), with the factor $2^{-1/8}$ being best possible. We can also establish analogous results for the $U^3({\bf Z}/N{\bf Z})$ and $U^3[N]$ norms (using the inverse theorem of Ben Green and myself, in place of the Host-Kra machinery), although it is not clear to us whether the $2^{-1/8}$ threshold remains best possible in this case.
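
As a check on the gaussian case, the ratio $\|f\|_{U^3}/\|f\|_{L^2}$ for the standard gaussian can be computed by hand. The derivation below is our own sketch, assuming Lebesgue measure and the normalisation $f(x) = e^{-\pi x^2}$ on ${\bf R}$; it only verifies that gaussians give the ratio $2^{-1/8}$ per dimension, and does not by itself show that no other function does better (that is where the sharp Young inequality enters).

```latex
% d = 1, f(x) = e^{-\pi x^2}; the vertices of the cube are x + \omega \cdot h, \omega \in \{0,1\}^3.
\|f\|_{U^3(\mathbf{R})}^8
  = \int_{\mathbf{R}^4} \exp\Big( -\pi \sum_{\omega \in \{0,1\}^3} (x + \omega \cdot h)^2 \Big)\, dx\, dh .
% Substitute y = x + (h_1 + h_2 + h_3)/2 (unit Jacobian), so that x + \omega \cdot h = y + \tfrac12 \epsilon \cdot h
% with \epsilon = 2\omega - 1 \in \{\pm 1\}^3; the cross terms cancel and \sum_\epsilon (\epsilon \cdot h)^2 = 8|h|^2:
\sum_{\epsilon \in \{\pm 1\}^3} \big( y + \tfrac12 \epsilon \cdot h \big)^2 = 8 y^2 + 2 |h|^2 ,
\qquad
\|f\|_{U^3(\mathbf{R})}^8 = \int_{\mathbf{R}} e^{-8\pi y^2}\, dy \prod_{i=1}^3 \int_{\mathbf{R}} e^{-2\pi h_i^2}\, dh_i
  = 8^{-1/2} \cdot 2^{-3/2} = 2^{-3} .
% Meanwhile \|f\|_{L^2(\mathbf{R})}^8 = \big( \int_{\mathbf{R}} e^{-2\pi x^2}\, dx \big)^4 = 2^{-2}, hence
\frac{\|f\|_{U^3(\mathbf{R})}}{\|f\|_{L^2(\mathbf{R})}} = \big( 2^{-3} / 2^{-2} \big)^{1/8} = 2^{-1/8} .
% The gaussian on R^d is a tensor product of d copies, so the ratio there is 2^{-d/8}.
```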

Ben Green, Tamar Ziegler, and I have just uploaded to the arXiv our paper “An inverse theorem for the Gowers U^{s+1}[N] norm”, which was previously announced on this blog. We are still planning one final round of reviewing the preprint before submitting the paper, but it has gotten to the stage where we are comfortable with having the paper available on the arXiv.

The main result of the paper is to establish the inverse conjecture for the Gowers norm over the integers, which has a number of applications, in particular to counting solutions to various linear equations in primes. In spirit, the proof of the paper follows the 21-page announcement that was uploaded previously. However, for various rather annoying technical reasons, the 117-page paper has to devote a large amount of space to setting up various bits of auxiliary machinery (as well as a dozen or so pages worth of examples and discussion). For instance, the announcement motivates many of the steps of the argument by heuristically identifying nilsequences with bracket polynomial phases such as $e(\{\alpha n\} \beta n)$. However, a rather significant amount of theory (which was already worked out to a large extent by Leibman) is required to formalise the “bracket algebra” needed to manipulate such bracket polynomials and to connect them with nilsequences. Furthermore, the “piecewise smooth” nature of bracket polynomials causes some technical issues with the equidistribution theory for these sequences. Our original version of the paper (which was even longer than the current version) set out this theory. But we eventually decided that it was best to eschew almost all use of bracket polynomials (except as motivation and examples), and run the argument almost entirely within the language of nilsequences, to keep the argument a bit more notationally focused (and to make the equidistribution theory easier to establish). But this was not without a tradeoff; some statements that are almost trivially true for bracket polynomials required some “nilpotent algebra” to convert to the language of nilsequences. Here are some examples of this:

- It is intuitively clear that a bracket polynomial phase e(P(n)) of degree k in one variable n can be “multilinearised” to a polynomial of multi-degree in k variables , such that and agree modulo lower order terms. For instance, if (so k=3), then one could take . The analogue of this statement for nilsequences is true, but required a moderately complicated nilpotent algebra construction using the Baker-Campbell-Hausdorff formula.
- Suppose one has a bracket polynomial phase e(P_h(n)) of degree k in one variable n that depends on an additional parameter h, in such a way that exactly one of the coefficients in each monomial depends on h. Furthermore, suppose this dependence is bracket linear in h. Then it is intuitively clear that this phase can be rewritten (modulo lower order terms) as e( Q(h,n) ) where Q is a bracket polynomial of multidegree (1,k) in (h,n). For instance, if and , then we can take . The nilpotent algebra analogue of this claim is true, but requires another moderately complicated nilpotent algebra construction based on semi-direct products.
- A bracket polynomial has a fairly visible concept of a “degree” (analogous to the corresponding notion for true polynomials), as well as a “rank” (which, roughly speaking, measures the number of parentheses in the bracket monomials, plus one). Thus, for instance, the bracket monomial has degree 7 and rank 3. Defining degree and rank for nilsequences requires one to generalise the notion of a (filtered) nilmanifold to one in which the lower central series is replaced by a filtration indexed by both the degree and the rank.

There are various other tradeoffs of this type in this paper. For instance, nonstandard analysis tools were introduced to eliminate what would otherwise be quite a large number of epsilons and regularity lemmas to manage, at the cost of some notational overhead; and the piecewise discontinuities mentioned earlier were eliminated by the use of vector-valued nilsequences, though this again caused some further notational overhead. These difficulties may be a sign that we do not yet have the “right” proof of this conjecture, but we will probably have to wait a few years before we gain a proper perspective on, and understanding of, this circle of ideas and results.

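A (smooth) Riemannian manifold is a smooth manifold $M$ without boundary, equipped with a Riemannian metric ${\rm g}$, which assigns a length $|v|_{{\rm g}(x)} \in {\bf R}^+$ to every tangent vector $v \in T_x M$ at a point $x \in M$, and more generally assigns an inner product

$$\langle v, w \rangle_{{\rm g}(x)} \in {\bf R}$$

to every pair of tangent vectors $v, w \in T_x M$ at a point $x \in M$. (We use Roman font for ${\rm g}$ here, as we will need to use $g$ to denote group elements later in this post.) This inner product is assumed to be symmetric, positive definite, and smoothly varying in $x$, and the length is then given in terms of the inner product by the formula

$$|v|_{{\rm g}(x)}^2 := \langle v, v \rangle_{{\rm g}(x)}.$$

In coordinates (and also using abstract index notation), the metric ${\rm g}$ can be viewed as an invertible symmetric rank $(0,2)$ tensor ${\rm g}_{ij}(x)$, with

$$\langle v, w \rangle_{{\rm g}(x)} = {\rm g}_{ij}(x) v^i w^j.$$

One can also view the Riemannian metric as providing a (self-adjoint) identification between the tangent bundle $TM$ of the manifold and the cotangent bundle $T^* M$; indeed, every tangent vector $v \in T_x M$ is then identified with the cotangent vector $v^\flat \in T^*_x M$, defined by the formula

$$v^\flat(w) := \langle v, w \rangle_{{\rm g}(x)}.$$

In coordinates, $(v^\flat)_i = {\rm g}_{ij} v^j$.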
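
In coordinates the identification is just index lowering by the matrix $({\rm g}_{ij}(x))$, and the inverse matrix $({\rm g}^{ij}(x))$ raises indices back. A minimal numerical sketch at a single point, assuming a hypothetical symmetric positive definite matrix `g` standing in for the metric: lowering the indices of $v$ and $w$ and pairing the resulting covectors with ${\rm g}^{ij}$ recovers $\langle v, w \rangle_{{\rm g}(x)}$.

```python
import numpy as np

def lower_index(g, v):
    """Covector components (g v)_i = g_ij v^j identified with the tangent vector v."""
    return g @ v

def inverse_inner(g, p, q):
    """Inverse inner product g^{ij} p_i q_j, computed by solving g y = q rather than inverting g."""
    return p @ np.linalg.solve(g, q)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
g = A @ A.T + 3 * np.eye(3)  # hypothetical metric at one point: symmetric positive definite
v, w = rng.standard_normal(3), rng.standard_normal(3)

print(np.isclose(v @ g @ w, inverse_inner(g, lower_index(g, v), lower_index(g, w))))  # True
```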

A fundamental dynamical system on the tangent bundle (or equivalently, the cotangent bundle, using the above identification) of a Riemannian manifold is that of geodesic flow. Recall that geodesics are smooth curves $\gamma: [a,b] \to M$ that minimise the length

$$|\gamma| := \int_a^b |\gamma'(t)|_{{\rm g}(\gamma(t))}\ dt.$$

There is some degeneracy in this definition, because one can reparameterise the curve without affecting the length. In order to fix this degeneracy (and also because the square of the speed is a more tractable quantity analytically than the speed itself), it is better if one replaces the length with the *energy*

$$E(\gamma) := \frac{1}{2} \int_a^b |\gamma'(t)|_{{\rm g}(\gamma(t))}^2\ dt.$$

Minimising the energy of a parameterised curve turns out to be the same as minimising the length, together with an additional requirement that the speed stay constant in time. Minimisers (and more generally, critical points) of the energy functional (holding the endpoints fixed) are known as *geodesic flows*. From a physical perspective, geodesic flow governs the motion of a particle that is subject to no external forces and thus moves freely, save for the constraint that it must always lie on the manifold $M$.

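One can also view geodesic flows as a dynamical system on the tangent bundle (with the state at any time $t$ given by the position $\gamma(t) \in M$ and the velocity $\gamma'(t) \in T_{\gamma(t)} M$) or on the cotangent bundle (with the state then given by the position $\gamma(t)$ and the *momentum* $p(t) \in T^*_{\gamma(t)} M$, the cotangent vector identified with $\gamma'(t)$ by the metric). With the latter perspective (sometimes referred to as *cogeodesic flow*), geodesic flow becomes a Hamiltonian flow, with Hamiltonian $H: T^* M \to {\bf R}$ given as

$$H(x, p) := \frac{1}{2} \langle p, p \rangle_{{\rm g}(x)^{-1}}$$

where $\langle \cdot, \cdot \rangle_{{\rm g}(x)^{-1}}$ is the inverse inner product to ${\rm g}(x)$ on $T^*_x M$, which can be defined for instance by the formula

$$\langle p, q \rangle_{{\rm g}(x)^{-1}} := \langle v, w \rangle_{{\rm g}(x)}$$

whenever $p, q \in T^*_x M$ are the cotangent vectors identified with $v, w \in T_x M$; in coordinates, $\langle p, q \rangle_{{\rm g}(x)^{-1}} = {\rm g}^{ij}(x) p_i q_j$, where $({\rm g}^{ij})$ is the inverse matrix of $({\rm g}_{ij})$.

In coordinates, geodesic flow is given by Hamilton’s equations of motion

$$\frac{d}{dt} x^i = {\rm g}^{ij} p_j; \qquad \frac{d}{dt} p_k = -\frac{1}{2} (\partial_k {\rm g}^{ij}) p_i p_j.$$

In terms of the velocity $v^i := \frac{d}{dt} x^i = {\rm g}^{ij} p_j$, we can rewrite these equations as the geodesic equation

$$\frac{d}{dt} v^i = - \Gamma^i_{jk} v^j v^k$$

where

$$\Gamma^i_{jk} = \frac{1}{2} {\rm g}^{il} \left( \partial_j {\rm g}_{kl} + \partial_k {\rm g}_{jl} - \partial_l {\rm g}_{jk} \right)$$

are the Christoffel symbols; using the Levi-Civita connection $\nabla$, this can be written more succinctly as

$$\nabla_{\gamma'} \gamma' = 0.$$
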
If the manifold $M$ is an embedded submanifold of a larger Euclidean space ${\bf R}^n$, with the metric ${\rm g}$ on $M$ being induced from the standard metric on ${\bf R}^n$, then the geodesic flow equation can be rewritten in the equivalent form

$$\gamma''(t) \perp T_{\gamma(t)} M,$$

where $\gamma$ is now viewed as taking values in ${\bf R}^n$, and $T_{\gamma(t)} M$ is similarly viewed as a subspace of ${\bf R}^n$. This is intuitively obvious from the geometric interpretation of geodesics: if the curvature of a curve contains components that are tangential to the manifold rather than normal to it, then it is geometrically clear that one should be able to shorten the curve by shifting it along the indicated tangential direction. It is an instructive exercise to rigorously formulate the above intuitive argument. This fact also conforms well with one’s physical intuition of geodesic flow as the motion of a free particle constrained to be in $M$; the normal quantity $\gamma''(t)$ then corresponds to the *centripetal force* necessary to keep the particle lying in $M$ (otherwise it would fly off along a tangent line to $M$, as per Newton’s first law). The precise value of the normal vector $\gamma''(t)$ can be computed via the second fundamental form as $\gamma''(t) = \Pi_{\gamma(t)}(\gamma'(t), \gamma'(t))$, but we will not need this formula here.
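
The Hamiltonian description can be tested on the simplest curved example. This is a minimal sketch (our illustration), assuming the unit sphere $S^2 \subset {\bf R}^3$ in spherical coordinates $(\theta, \phi)$, where the metric is $d\theta^2 + \sin^2\theta\, d\phi^2$ and so $H = \frac12 (p_\theta^2 + p_\phi^2 / \sin^2 \theta)$. It integrates Hamilton’s equations with a classical fourth-order Runge-Kutta step, then checks that $H$ is conserved and that the embedded orbit stays in a plane through the origin. The latter is the embedded characterisation at work: on the unit sphere the normal direction at $x$ is $x$ itself, and a curve with $\gamma''$ parallel to $\gamma$ has $\gamma \times \gamma'$ constant, so it is a great circle.

```python
import numpy as np

def rhs(state):
    """Hamilton's equations for H = (p_theta^2 + p_phi^2 / sin(theta)^2) / 2."""
    theta, phi, p_theta, p_phi = state
    s, c = np.sin(theta), np.cos(theta)
    return np.array([p_theta, p_phi / s**2, p_phi**2 * c / s**3, 0.0])

def hamiltonian(state):
    theta, _, p_theta, p_phi = state
    return 0.5 * (p_theta**2 + p_phi**2 / np.sin(theta)**2)

def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def embed(theta, phi):
    """Point of the unit sphere in R^3 with spherical coordinates (theta, phi)."""
    return np.array([np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)])

def flow(state, dt=1e-3, steps=5000):
    traj = [state]
    for _ in range(steps):
        state = rk4_step(state, dt)
        traj.append(state)
    return np.array(traj)

state0 = np.array([1.0, 0.0, 0.3, 0.5])  # (theta, phi, p_theta, p_phi); p_phi != 0 keeps us off the poles
traj = flow(state0)

# Initial velocity in R^3: theta' * d(embed)/d(theta) + phi' * d(embed)/d(phi).
theta0, phi0, p_theta0, p_phi0 = state0
d_theta = np.array([np.cos(theta0) * np.cos(phi0), np.cos(theta0) * np.sin(phi0), -np.sin(theta0)])
d_phi = np.array([-np.sin(theta0) * np.sin(phi0), np.sin(theta0) * np.cos(phi0), 0.0])
velocity0 = p_theta0 * d_theta + (p_phi0 / np.sin(theta0)**2) * d_phi
normal = np.cross(embed(theta0, phi0), velocity0)
normal /= np.linalg.norm(normal)

energy_drift = max(abs(hamiltonian(s) - hamiltonian(state0)) for s in traj)
plane_error = max(abs(normal @ embed(s[0], s[1])) for s in traj)
print(energy_drift, plane_error)  # both tiny: H is conserved and the orbit is a great circle
```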

In a beautiful paper from 1966, Vladimir Arnold (who, sadly, passed away last week) observed that many basic equations in physics, including the Euler equations of motion of a rigid body, and also (by what is *a priori* a remarkable coincidence) the Euler equations of fluid dynamics of an inviscid incompressible fluid, can be viewed (formally, at least) as geodesic flows on a (finite or infinite dimensional) Riemannian manifold. And not just any Riemannian manifold: the manifold is a Lie group (or, to be truly pedantic, a torsor of that group), equipped with a right-invariant (or left-invariant, depending on one’s conventions) metric. In the context of rigid bodies, the Lie group is the group of rigid motions; in the context of incompressible fluids, it is the group of measure-preserving diffeomorphisms. The right-invariance makes the Hamiltonian mechanics of geodesic flow in this context (where it is sometimes known as the *Euler-Arnold equation* or the *Euler-Poisson equation*) quite special; it becomes (formally, at least) completely integrable, and also indicates (in principle, at least) a way to reformulate these equations in a Lax pair formulation. And indeed, many further completely integrable equations, such as the Korteweg-de Vries equation, have since been reinterpreted as Euler-Arnold flows.

From a physical perspective, this all fits well with the interpretation of geodesic flow as the free motion of a system subject only to a physical constraint, such as rigidity or incompressibility. (I do not know, though, of a similarly intuitive explanation as to why the Korteweg-de Vries equation is a geodesic flow.)

One consequence of being a completely integrable system is that one has a large number of conserved quantities. In the case of the Euler equations of motion of a rigid body, the conserved quantities are the linear and angular momentum (as observed in an external reference frame, rather than the frame of the object). In the case of the two-dimensional Euler equations, the conserved quantities are the pointwise values of the vorticity (as viewed in Lagrangian coordinates, rather than Eulerian coordinates). In higher dimensions, the conserved quantity is now (the Hodge star of) the vorticity, again viewed in Lagrangian coordinates. The vorticity itself then evolves by the *vorticity equation*, and is subject to vortex stretching as the diffeomorphism between the initial and final state becomes increasingly sheared.
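
For the rigid body, these conservation laws can be watched numerically. In the body frame, and in one common sign convention, the Euler-Arnold equation on the rotation group reduces to Euler’s equations $\dot M = M \times \Omega$, where $M$ is the body angular momentum and $\Omega = I^{-1} M$ is the angular velocity for a (hypothetical, diagonal) inertia tensor $I$. The sketch below is our illustration, not Arnold’s: it integrates these equations with a Runge-Kutta step and checks two conserved quantities, the kinetic energy $\frac12 \langle M, \Omega \rangle$ and $|M|^2$, which is how conservation of angular momentum in the external frame shows up in the body frame.

```python
import numpy as np

INERTIA = np.array([1.0, 2.0, 3.0])  # hypothetical principal moments of inertia

def euler_rhs(M):
    """Euler's rigid body equations in the body frame: dM/dt = M x Omega, Omega = M / I."""
    return np.cross(M, M / INERTIA)

def rk4_step(M, dt):
    k1 = euler_rhs(M)
    k2 = euler_rhs(M + 0.5 * dt * k1)
    k3 = euler_rhs(M + 0.5 * dt * k2)
    k4 = euler_rhs(M + dt * k3)
    return M + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def energy(M):
    """Kinetic energy (1/2) <M, Omega>: the Hamiltonian of the Euler-Arnold flow."""
    return 0.5 * np.dot(M, M / INERTIA)

def casimir(M):
    """|M|^2, equal to the squared length of the conserved spatial angular momentum."""
    return np.dot(M, M)

M = np.array([1.0, 0.5, 0.2])
E0, C0 = energy(M), casimir(M)
for _ in range(10000):
    M = rk4_step(M, 1e-3)
print(abs(energy(M) - E0), abs(casimir(M) - C0))  # both tiny
```

Both quantities are conserved by the exact flow because $M \times \Omega$ is orthogonal to both $M$ and $\Omega$; the printed drift is just integration error.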

The elegant Euler-Arnold formalism is reasonably well-known in some circles (particularly in Lagrangian and symplectic dynamics, where it can be viewed as a special case of the *Euler-Poincaré formalism* or *Lie-Poisson formalism* respectively), but not in others; I for instance was only vaguely aware of it until recently, and I think that even in fluid mechanics this perspective on the subject is not always emphasised. Given the circumstances, I thought it would be appropriate to present Arnold’s original 1966 paper here. (For a more modern treatment of these topics, see the books of Arnold-Khesin and Marsden-Ratiu.)

In order to avoid technical issues, I will work formally, ignoring questions of regularity or integrability, and pretending that infinite-dimensional manifolds behave in exactly the same way as their finite-dimensional counterparts. In the finite-dimensional setting, it is not difficult to make all of the formal discussion below rigorous; but the situation in infinite dimensions is substantially more delicate. (Indeed, it is a notorious open problem whether the Euler equations for incompressible fluids even form a global continuous flow in a reasonable topology in the first place!) However, I do not want to discuss these analytic issues here; see this paper of Ebin and Marsden for a treatment of these topics.

Ben Green, Tamar Ziegler, and I have just uploaded to the arXiv the note “An inverse theorem for the Gowers $U^{s+1}[N]$ norm (announcement)”, not intended for publication. This is an announcement of our forthcoming solution of the *inverse conjecture for the Gowers norm*, which roughly speaking asserts that the $U^{s+1}[N]$ norm of a bounded function is large if and only if that function correlates with an $s$-step nilsequence of bounded complexity.

The full argument is quite lengthy (our most recent draft is about 90 pages long), but this is in large part due to the presence of various technical details which are necessary in order to make the argument fully rigorous. In this 20-page announcement, we instead sketch a *heuristic* proof of the conjecture, relying on a number of “cheats” to avoid the above-mentioned technical details. In particular:

- In the announcement, we rely on somewhat vaguely defined terms such as “bounded complexity” or “linearly independent with respect to bounded linear combinations” or “equivalent modulo lower step errors” without specifying them rigorously. In the full paper we will use the machinery of nonstandard analysis to rigorously and precisely define these concepts.
- In the announcement, we deal with the traditional linear nilsequences rather than the polynomial nilsequences that turn out to be better suited for finitary equidistribution theory, but require more notation and machinery in order to use.
- In a similar vein, we restrict attention to scalar-valued nilsequences in the announcement, though due to topological obstructions arising from the twisted nature of the torus bundles used to build nilmanifolds, we will have to deal instead with vector-valued nilsequences in the main paper.
- In the announcement, we pretend that nilsequences can be described by bracket polynomial phases, at least for the sake of making examples, although strictly speaking bracket polynomial phases only give examples of *piecewise* Lipschitz nilsequences rather than genuinely Lipschitz nilsequences.

With these cheats, it becomes possible to shorten the length of the argument substantially. Also, it becomes clearer that the main task is a cohomological one; in order to inductively deduce the inverse conjecture for a given step $s$ from the conjecture for the preceding step $s-1$, the basic problem is to show that a certain (quasi-)cocycle is necessarily a (quasi-)coboundary. This in turn requires a detailed analysis of the top order and second-to-top order terms in the cocycle, which requires a certain amount of nilsequence equidistribution theory and additive combinatorics, as well as a “sunflower decomposition” to arrange the various nilsequences one encounters into a usable “normal form”.

It is often the case in modern mathematics that the informal heuristic way to explain an argument looks quite different from (and is significantly shorter than) the way one would formally present the argument with all the details. This seems to be particularly true in this case; at a superficial level, the full paper has a very different set of notation than the announcement, and a lot of space is invested in setting up additional machinery that one can quickly gloss over in the announcement. We hope though that the announcement can provide a “road map” to help navigate the much longer paper to come.

In Notes 5, we saw that the Gowers uniformity norms on vector spaces ${\bf F}^n$ in high characteristic were controlled by classical polynomial phases $e(\phi)$.

Now we study the analogous situation on cyclic groups ${\bf Z}/N{\bf Z}$. Here, there is an unexpected surprise: the polynomial phases (classical or otherwise) are no longer sufficient to control the Gowers norms $U^{s+1}({\bf Z}/N{\bf Z})$ once $s$ exceeds $1$. To resolve this problem, one must enlarge the space of polynomials to a larger class. It turns out that there are at least three closely related options for this class: the *local polynomials*, the *bracket polynomials*, and the *nilsequences*. Each of the three classes has its own strengths and weaknesses, but in my opinion the nilsequences seem to be the most natural class, due to the rich algebraic and dynamical structure coming from the nilpotent Lie group undergirding such sequences. For reasons of space we shall focus primarily on the nilsequence viewpoint here.

Traditionally, nilsequences have been defined in terms of linear orbits $n \mapsto g^n x$ on nilmanifolds $G/\Gamma$; however, in recent years it has been realised that it is convenient for technical reasons (particularly for the quantitative “single-scale” theory) to generalise this setup to that of *polynomial* orbits $n \mapsto g(n) \Gamma$, and this is the perspective we will take here.

A polynomial phase $n \mapsto e(\phi(n))$ on a finite abelian group $H$ is formed by starting with a polynomial $\phi: H \to {\bf R}/{\bf Z}$ to the unit circle, and then composing it with the exponential function $e: {\bf R}/{\bf Z} \to {\bf C}$. To create a nilsequence $n \mapsto F(g(n)\Gamma)$, we generalise this construction by starting with a polynomial orbit $n \mapsto g(n)\Gamma$ into a *nilmanifold* $G/\Gamma$, and then composing this with a Lipschitz function $F: G/\Gamma \to {\bf C}$. (The Lipschitz regularity class is convenient for minor technical reasons, but one could also use other regularity classes here if desired.) These classes of sequences certainly include the polynomial phases, but are somewhat more general; for instance, they *almost* include *bracket polynomial* phases such as $n \mapsto e(\lfloor \alpha n \rfloor \beta n)$. (The “almost” here is because the relevant functions $F$ involved are only piecewise Lipschitz rather than Lipschitz, but this is primarily a technical issue and one should view bracket polynomial phases as “morally” being nilsequences.)
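
To see concretely why bracket polynomial phases are “almost” nilsequences, here is a sketch with one standard choice; the coordinates and the function $F$ are our own illustrative assumptions. Take $G$ to be the Heisenberg group of upper unitriangular real $3 \times 3$ matrices, $\Gamma$ the subgroup of integer matrices, $g$ the matrix with entries $\alpha, \beta$ just above the diagonal, and $F(m) := e(z - x\lfloor y \rfloor)$ for the matrix $m$ with entries $x, y$ above the diagonal and $z$ in the corner. Right multiplication by an element of $\Gamma$ changes $z - x\lfloor y \rfloor$ by an integer, so $F$ is well defined on $G/\Gamma$, and $F(g^n\Gamma) = e(\binom{n}{2}\alpha\beta - n\alpha\lfloor n\beta \rfloor)$ is a bracket quadratic phase; but $F$ jumps when $y$ crosses an integer, so it is only piecewise Lipschitz.

```python
import math
import numpy as np

def heisenberg(x, y, z):
    """Upper unitriangular matrix [[1, x, z], [0, 1, y], [0, 0, 1]]."""
    return np.array([[1.0, x, z], [0.0, 1.0, y], [0.0, 0.0, 1.0]])

def F(m):
    """e(z - x * floor(y)) for m = heisenberg(x, y, z).

    Right multiplication by an integer matrix shifts z - x * floor(y) by an integer,
    so F is well defined on G / Gamma; it jumps when y crosses an integer.
    """
    x, y, z = m[0, 1], m[1, 2], m[0, 2]
    return np.exp(2j * np.pi * (z - x * math.floor(y)))

alpha, beta = math.sqrt(2), math.sqrt(3)
g = heisenberg(alpha, beta, 0.0)

def nilsequence(n):
    """n -> F(g^n Gamma); here g^n = heisenberg(n alpha, n beta, n(n-1)/2 alpha beta)."""
    return F(np.linalg.matrix_power(g, n))

def bracket_phase(n):
    """The same sequence as the bracket phase e(C(n,2) alpha beta - n alpha floor(n beta))."""
    return np.exp(2j * np.pi * (n * (n - 1) / 2 * alpha * beta - n * alpha * math.floor(n * beta)))

for n in range(1, 6):
    print(n, abs(nilsequence(n) - bracket_phase(n)) < 1e-9)  # True for each n
```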

In these notes we set out the basic theory for these nilsequences, including their equidistribution theory (which generalises the equidistribution theory of polynomial flows on tori from Notes 1) and show that they are indeed obstructions to the Gowers norm being small. This leads to the *inverse conjecture for the Gowers norms* that shows that the Gowers norms on cyclic groups are indeed controlled by these sequences.

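A (complex, semi-definite) inner product space is a complex vector space $V$ equipped with a sesquilinear form $\langle \cdot, \cdot \rangle: V \times V \to {\bf C}$ which is conjugate symmetric, in the sense that $\langle w, v \rangle = \overline{\langle v, w \rangle}$ for all $v, w \in V$, and non-negative in the sense that $\langle v, v \rangle \ge 0$ for all $v \in V$. By inspecting the non-negativity of $\langle \lambda v + \mu w, \lambda v + \mu w \rangle$ for complex numbers $\lambda, \mu \in {\bf C}$, one obtains the Cauchy-Schwarz inequality

$$|\langle v, w \rangle| \le |\langle v, v \rangle|^{1/2} |\langle w, w \rangle|^{1/2};$$

if one then defines $\|v\| := |\langle v, v \rangle|^{1/2}$, one quickly concludes the triangle inequality

$$\|v + w\| \le \|v\| + \|w\|,$$

which then soon implies that $\|\cdot\|$ is a semi-norm on $V$. If we make the additional assumption that the inner product $\langle \cdot, \cdot \rangle$ is positive definite, i.e. that $\langle v, v \rangle > 0$ whenever $v$ is non-zero, then this semi-norm becomes a norm. If $V$ is complete with respect to the metric $d(v,w) := \|v - w\|$ induced by this norm, then $V$ is called a Hilbert space.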
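
The inspection step can be spelled out as follows; this is the standard argument, sketched here assuming the form is linear in its first argument and conjugate-linear in its second.

```latex
% Write \langle v, w \rangle = |\langle v, w \rangle| e^{i\theta} and take \lambda = t, \mu = -s e^{i\theta} with t, s \ge 0:
0 \le \langle \lambda v + \mu w, \lambda v + \mu w \rangle
  = t^2 \langle v, v \rangle - 2ts\, |\langle v, w \rangle| + s^2 \langle w, w \rangle .
% If \langle v,v \rangle, \langle w,w \rangle > 0, take t = \langle w,w \rangle^{1/2}, s = \langle v,v \rangle^{1/2} and divide by 2ts;
% if, say, \langle v,v \rangle = 0, take s = 1 and let t \to \infty, which forces \langle v,w \rangle = 0. Either way
|\langle v, w \rangle| \le \langle v, v \rangle^{1/2} \langle w, w \rangle^{1/2} .
% Triangle inequality, using \langle w, v \rangle = \overline{\langle v, w \rangle} and then Cauchy-Schwarz:
\|v + w\|^2 = \|v\|^2 + 2 \operatorname{Re} \langle v, w \rangle + \|w\|^2
  \le \|v\|^2 + 2 \|v\| \|w\| + \|w\|^2 = (\|v\| + \|w\|)^2 .
```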

The above material is extremely standard, and can be found in any graduate real analysis course; I myself covered it here. But what is perhaps less well known (except inside the fields of additive combinatorics and ergodic theory) is that the above theory of classical Hilbert spaces is just the first case of a hierarchy of *higher order Hilbert spaces*, in which the binary inner product is replaced with a $2^d$-ary inner product that obeys an appropriate generalisation of the conjugate symmetry, sesquilinearity, and positive semi-definiteness axioms. Such inner products then obey a higher order Cauchy-Schwarz inequality, known as the *Cauchy-Schwarz-Gowers* inequality, and then also obey a triangle inequality and become semi-norms (or norms, if the inner product was non-degenerate). Examples of such norms and spaces include the *Gowers uniformity norms* $U^d(G)$, the *Gowers box norms* $\Box^d(X_1 \times \cdots \times X_d)$, and the *Gowers-Host-Kra seminorms* $U^d(X)$; a more elementary example is the family of Lebesgue spaces $L^{2^d}(X)$ when the exponent is a power of two. They play a central role in modern additive combinatorics and in certain aspects of ergodic theory, particularly those relating to Szemerédi’s theorem (or its ergodic counterpart, the Furstenberg multiple recurrence theorem); they also arise in the regularity theory of hypergraphs (which is not unrelated to the other two topics).

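A simple example to keep in mind here is the order two Hilbert space $L^4(X)$ on a measure space $X = (X, {\mathcal B}, \mu)$, where the inner product takes the form

$$\langle f_{00}, f_{01}, f_{10}, f_{11} \rangle_{L^4(X)} := \int_X f_{00}(x) \overline{f_{01}(x)}\, \overline{f_{10}(x)} f_{11}(x)\ d\mu(x).$$
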
In this brief note I would like to set out the abstract theory of such higher order Hilbert spaces. This is not new material, being already implicit in the breakthrough papers of Gowers and Host-Kra, but I just wanted to emphasise the fact that the material is abstract, and is not particularly tied to any explicit choice of norm so long as a certain axiom is satisfied. (Also, I wanted to write things down so that I would not have to reconstruct this formalism again in the future.) Unfortunately, the notation is quite heavy and the abstract axiom is a little strange; it may be that there is a better way to formulate things. In this particular case it does seem that a concrete approach is significantly clearer, but abstraction is at least possible.
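
A minimal numerical sketch of the Cauchy-Schwarz-Gowers inequality for the order two Gowers box norm $\Box^2(X \times Y)$, taking $X$ and $Y$ finite with normalised counting measure and storing functions as matrices (the helper names are hypothetical). The inner product is $\langle f_{00}, f_{01}, f_{10}, f_{11} \rangle := {\bf E}_{x,x',y,y'} f_{00}(x,y) \overline{f_{01}(x,y')}\, \overline{f_{10}(x',y)} f_{11}(x',y')$, and the inequality asserts that its absolute value is at most $\prod_\omega \|f_\omega\|_{\Box^2}$. (For the $L^4$ example this would reduce to Hölder’s inequality, so the box norm is a slightly more interesting test.)

```python
import numpy as np

def box_inner(f00, f01, f10, f11):
    """E_{x,x',y,y'} f00(x,y) conj(f01(x,y')) conj(f10(x',y)) f11(x',y') on X x Y."""
    nx, ny = f00.shape
    # Index letters: a = x, b = y, c = y', d = x'.
    total = np.einsum("ab,ac,db,dc->", f00, np.conj(f01), np.conj(f10), f11)
    return total / (nx ** 2 * ny ** 2)

def box_norm(f):
    """||f||_{Box^2} = <f, f, f, f>^(1/4); this inner product is real and non-negative."""
    return box_inner(f, f, f, f).real ** 0.25

rng = np.random.default_rng(0)
fs = [rng.standard_normal((5, 6)) + 1j * rng.standard_normal((5, 6)) for _ in range(4)]
lhs = abs(box_inner(*fs))
rhs = np.prod([box_norm(f) for f in fs])
print(lhs <= rhs)  # True, as the Cauchy-Schwarz-Gowers inequality predicts
```

Non-negativity of $\langle f, f, f, f \rangle$ comes from writing it as ${\bf E}_{x,x'} |{\bf E}_y f(x,y) \overline{f(x',y)}|^2$, the analogue of the positive semi-definiteness axiom.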

Note: the discussion below is likely to be comprehensible only to readers who already have some exposure to the Gowers norms.

(Linear) Fourier analysis can be viewed as a tool to study an arbitrary function $f$ on (say) the integers ${\bf Z}$, by looking at how such a function correlates with *linear phases* such as $n \mapsto e(\xi n)$, where $e(x) := e^{2\pi i x}$ is the fundamental character, and $\xi \in {\bf R}$ is a frequency. These correlations control a number of expressions relating to $f$, such as the expected behaviour of $f$ on arithmetic progressions of length three.

In this course we will be studying higher-order correlations, such as the correlation of $f$ with quadratic phases such as $n \mapsto e(\xi n^2)$, as these will control the expected behaviour of $f$ on more complex patterns, such as arithmetic progressions of length four. In order to do this, we must first understand the behaviour of exponential sums such as

$$\sum_{n=1}^N e(\alpha n^2).$$

Such sums are closely related to the distribution of expressions such as $\alpha n^2 \bmod 1$ in the unit circle ${\bf T} := {\bf R}/{\bf Z}$, as $n$ varies from $1$ to $N$. More generally, one is interested in the distribution of polynomials $P: {\bf Z}^d \to {\bf T}^k$ of one or more variables taking values in a torus ${\bf T}^k$; for instance, one might be interested in the distribution of the quadruplet $(\alpha n^2, \alpha(n+r)^2, \alpha(n+2r)^2, \alpha(n+3r)^2)$ as $n, r$ both vary from $1$ to $N$. Roughly speaking, once we understand these types of distributions, the general machinery of quadratic Fourier analysis will allow us to understand the distribution of the quadruplet $(f(n), f(n+r), f(n+2r), f(n+3r))$ for more general classes of functions $f$; this can lead for instance to an understanding of the distribution of arithmetic progressions of length $4$ in the primes, if $f$ is somehow related to the primes.
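
The contrast between rational and irrational $\alpha$ shows up directly in the normalised Weyl sums $\frac{1}{N}\sum_{n=1}^N e(k\alpha n^2)$; by Weyl’s criterion, these all tend to zero (for each fixed $k \neq 0$) exactly when $\alpha n^2$ is asymptotically equidistributed in ${\bf T}$. In this minimal sketch (the function name is hypothetical), $\alpha = 1/3$ makes $\alpha n^2 \bmod 1$ take only the values $0$ and $1/3$, so the $k = 1$ average has modulus $1/\sqrt{3}$ when $3 \mid N$, while for $\alpha = \sqrt{2}$ the averages are small.

```python
import numpy as np

def weyl_average(alpha, N, k=1):
    """Return (1/N) * sum_{n=1}^{N} e(k * alpha * n^2), where e(x) = exp(2*pi*i*x)."""
    n = np.arange(1, N + 1, dtype=np.float64)
    phases = (k * alpha * n ** 2) % 1.0  # reduce mod 1 before exponentiating
    return np.exp(2j * np.pi * phases).mean()

N = 9999  # divisible by 3, so n^2 mod 3 runs over whole periods
for alpha in (1 / 3, np.sqrt(2)):
    print(alpha, [round(abs(weyl_average(alpha, N, k)), 4) for k in (1, 2, 3)])
```

Floating point limits this to moderate $N$: the phase $\alpha n^2$ is only computed to about $10^{-8}$ here, which is harmless at this scale.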

More generally, to find arithmetic progressions such as $n, n+r, n+2r, n+3r$ in a set $A$, it would suffice to understand the equidistribution of the quadruplet $(1_A(n), 1_A(n+r), 1_A(n+2r), 1_A(n+3r))$ in $\{0,1\}^4$ as $n$ and $r$ vary. This is the starting point for the fundamental connection between *combinatorics* (and more specifically, the task of finding patterns inside sets) and *dynamics* (and more specifically, the theory of equidistribution and recurrence in measure-preserving dynamical systems, which is a subfield of ergodic theory). This connection was explored in one of my previous classes; it will also be important in this course (particularly as a source of motivation), but the primary focus will be on finitary, and Fourier-based, methods.

The theory of equidistribution of polynomial orbits was developed in the linear case by Dirichlet and Kronecker, and in the polynomial case by Weyl. There are two regimes of interest: the (qualitative) *asymptotic regime* in which the scale parameter $N$ is sent to infinity, and the (quantitative) *single-scale regime* in which $N$ is kept fixed (but large). Traditionally, it is the asymptotic regime which is studied, which connects the subject to other asymptotic fields of mathematics, such as dynamical systems and ergodic theory. However, for many applications (such as the study of the primes), it is the single-scale regime which is of greater importance. The two regimes are not directly equivalent, but are closely related: the single-scale theory can usually be used to derive analogous results in the asymptotic regime, and conversely the arguments in the asymptotic regime can serve as a simplified model to show the way to proceed in the single-scale regime. The analogy between the two can be made tighter by introducing the (qualitative) *ultralimit regime*, which is formally equivalent to the single-scale regime (except for the fact that explicitly quantitative bounds are abandoned in the ultralimit), but resembles the asymptotic regime quite closely.

We will view the equidistribution theory of polynomial orbits as a special case of Ratner’s theorem, which we will study in more generality later in this course.

For the finitary portion of the course, we will be using *asymptotic notation*: $X = O(Y)$, $X \ll Y$, or $Y \gg X$ denotes the bound $|X| \le CY$ for some absolute constant $C$, and if we need $C$ to depend on additional parameters then we will indicate this by subscripts, e.g. $X \ll_d Y$ means that $|X| \le C_d Y$ for some $C_d$ depending only on $d$. In the ultralimit theory we will use an analogue of asymptotic notation, which we will review later in these notes.
