However, there is an intriguing “alternate universe” in which the Möbius function *is* strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero”. In this scenario, the parity problem obstruction disappears, and it becomes possible, *in principle*, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements is true:

- (Twin prime conjecture) There are infinitely many primes $p$ such that $p+2$ is also prime.
- (No Siegel zeroes) There exists a constant $c > 0$ such that for every real Dirichlet character $\chi$ of conductor $q > 1$, the associated Dirichlet $L$-function $s \mapsto L(s,\chi)$ has no zeroes in the interval $[1 - \frac{c}{\log q}, 1]$.

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory that should not actually exist, but is surprisingly self-consistent and has to date proven impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

$$ \sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2) $$

for some large value of $x$, where $\Lambda$ is the von Mangoldt function. Actually, in this post we will work with the slight variant

$$ \sum_{x \leq n \leq 2x} \Lambda_2(n(n+2)) \nu(n(n+2)) \qquad (1) $$

where

$$ \Lambda_2(n) := (\mu * \log^2)(n) = \sum_{d | n} \mu(d) \log^2 \frac{n}{d} $$

is the second von Mangoldt function, and $*$ denotes Dirichlet convolution, and $\nu$ is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval $x \leq n \leq 2x$ and remove very small primes from $n$, but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve $\nu$ is essentially replaced by the more combinatorial restriction $1_{(n(n+2), q^{1/C}\#) = 1}$ for some large $C$, where $q^{1/C}\#$ is the primorial of $q^{1/C}$, but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)
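The claim that the second von Mangoldt function detects twin primes can be checked on small numbers. The sketch below assumes the standard definition $\Lambda_2(n) = \sum_{d|n} \mu(d) \log^2(n/d)$; the helper names (`divisors`, `mobius`, `lambda2`, `omega`) are illustrative only. It uses the fact that $\Lambda_2(m)$ vanishes as soon as $m$ has three or more distinct prime factors, so that for odd $n$ the quantity $\Lambda_2(n(n+2))$ is non-zero only when $n$ and $n+2$ are both prime powers.

```python
from math import isqrt, log

def divisors(n):
    """All positive divisors of n (in no particular order)."""
    ds = []
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            ds.append(d)
            if d * d != n:
                ds.append(n // d)
    return ds

def mobius(n):
    """Moebius function mu(n), by trial division."""
    sign = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            sign = -sign
        p += 1
    return -sign if n > 1 else sign

def lambda2(n):
    """Second von Mangoldt function: sum over d | n of mu(d) * log(n/d)^2."""
    return sum(mobius(d) * log(n // d) ** 2 for d in divisors(n))

def omega(n):
    """Number of distinct prime factors of n."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

# For a twin prime pair (p, p+2), Lambda_2(p(p+2)) equals 2 log p log(p+2).
twin_value = lambda2(11 * 13)
```

Floating-point cancellation means that the values which vanish in exact arithmetic are only zero up to rounding error, so comparisons should use a small tolerance.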

If there is a Siegel zero $L(\beta,\chi) = 0$ with $\beta$ close to $1$ and $\chi$ a Dirichlet character of conductor $q$, then multiplicative number theory methods can be used to show that the Möbius function $\mu$ “pretends” to be like the character $\chi$ in the sense that $\mu(p) \approx \chi(p)$ for “most” primes $p$ near $q$ (e.g. in the range $q^\varepsilon \leq p \leq q^C$ for some small $\varepsilon > 0$ and large $C > 0$). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that $\mu$ pretends to be like $\chi$ can be used to construct a tractable approximation (after inserting the sieve weight $\nu$) in the range $[x,2x]$ (where $x = q^C$ for some large $C$) for the second von Mangoldt function $\Lambda_2$, namely the function

Roughly speaking, we think of the periodic function $\chi$ and the slowly varying function $\log$ as being of about the same “complexity” as the constant function $1$, so that $\tilde \Lambda_2$ is roughly of the same “complexity” as the divisor function

$$ \tau(n) := (1 * 1)(n) = \sum_{d | n} 1, $$

which is considerably simpler to obtain asymptotics for than the von Mangoldt function as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate $\sum_{x \leq n \leq 2x} \tau(n)$ to accuracy $O(\sqrt{x})$ with little difficulty, whereas to obtain a comparable level of accuracy for $\sum_{x \leq n \leq 2x} \Lambda(n)$ or $\sum_{x \leq n \leq 2x} \Lambda_2(n)$ is essentially the Riemann hypothesis.)
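As a small illustration of why the divisor function is so tractable, here is a sketch of the Dirichlet hyperbola method for the summatory function $\sum_{n \leq x} \tau(n)$. The function names are hypothetical; the identity used is the classical one, $\sum_{n \leq x} \tau(n) = 2 \sum_{d \leq \sqrt{x}} \lfloor x/d \rfloor - \lfloor \sqrt{x} \rfloor^2$, and the main term $x \log x + (2\gamma - 1)x$ is the standard one with error $O(\sqrt{x})$.

```python
from math import isqrt, log, sqrt

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def divisor_sum_naive(x):
    """Sum of tau(n) for n <= x, counting pairs (d, m) with d * m <= x directly."""
    return sum(x // d for d in range(1, x + 1))

def divisor_sum_hyperbola(x):
    """Same sum via the hyperbola method, using only about sqrt(x) terms."""
    r = isqrt(x)
    return 2 * sum(x // d for d in range(1, r + 1)) - r * r

def dirichlet_main_term(x):
    """Classical main term x log x + (2 gamma - 1) x for the divisor sum."""
    return x * log(x) + (2 * EULER_GAMMA - 1) * x

x = 10**6
error = divisor_sum_hyperbola(x) - dirichlet_main_term(x)
print(error, sqrt(x))  # the error should be small compared with sqrt(x)
```

The sum over the dyadic range $x \leq n \leq 2x$ is then just a difference of two such summatory functions.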

One expects $\tilde \Lambda_2(n)$ to be a good approximant to $\Lambda_2(n)$ if $n$ is of size $x^{O(1)}$ and has no prime factors less than $q^{1/C}$ for some large constant $C$. The Selberg sieve $\nu$ will be mostly supported on numbers with no prime factor less than $q^{1/C}$. As such, one can hope to approximate (1) by the expression

$$ \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)) \nu(n(n+2)); \qquad (2) $$

as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

$$ \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)). $$

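As discussed above, this sum should be thought of as a slightly more complicated version of the sum

$$ \sum_{x \leq n \leq 2x} \tau(n(n+2)). \qquad (3) $$

Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation $ab + 2 = cd$ with $a,b,c,d$ in various ranges; this is clearly related to understanding the equidistribution of the hyperbola $\{ (a,b) \in (\mathbf{Z}/d\mathbf{Z})^2: ab + 2 = 0 \hbox{ mod } d \}$ in $(\mathbf{Z}/d\mathbf{Z})^2$. Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

$$ \sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e\left( \frac{am + b\overline{m}}{r} \right) $$

where $e(\theta) := e^{2\pi i \theta}$ and $\overline{m}$ denotes the inverse of $m$ in $(\mathbf{Z}/r\mathbf{Z})^\times$. One can then use the Weil bound

$$ \sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e\left( \frac{am + b\overline{m}}{r} \right) \ll r^{1/2 + o(1)} (a,b,r)^{1/2} \qquad (4) $$

where $(a,b,r)$ is the greatest common divisor of $a,b,r$ (with the convention that this is equal to $r$ if $a,b$ vanish), and the $o(1)$ decays to zero as $r \rightarrow \infty$. The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that $x = q^C$ with $C$ large enough).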

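Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of $r$ will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has

$$ \sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e\left( \frac{am + b\overline{m}}{r} \right) \ll r^{3/4 + o(1)} \qquad (5) $$

whenever $r \geq 1$ and $a,b$ are coprime to $r$, where the $o(1)$ is with respect to the limit $r \rightarrow \infty$ (and is uniform in $a,b$).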
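To get a feel for the sizes involved, one can evaluate Kloosterman sums numerically. The following is a minimal sketch (the function name `kloosterman` is an illustrative choice) that computes $\sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e((am + b\overline{m})/r)$ directly and compares it against the classical Weil bound $2\sqrt{p}$ for a prime modulus $p$ and $a,b$ coprime to $p$. It relies on Python 3.8+ for the modular inverse `pow(m, -1, r)`.

```python
import cmath
from math import gcd, sqrt

def kloosterman(a, b, r):
    """Kloosterman sum over the units m mod r (r >= 2) of e((a*m + b*m^{-1}) / r)."""
    total = 0j
    for m in range(1, r):
        if gcd(m, r) == 1:
            m_inv = pow(m, -1, r)  # inverse of m modulo r (Python 3.8+)
            phase = (a * m + b * m_inv) % r
            total += cmath.exp(2j * cmath.pi * phase / r)
    return total

p = 101
largest = max(abs(kloosterman(a, 1, p)) for a in range(1, p))
print(largest, 2 * sqrt(p))  # the largest sum should not exceed the Weil bound
```

The sums are real-valued (replace $m$ by $-m$), so the imaginary part is only rounding error.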

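*Proof:* Observe from change of variables that the Kloosterman sum $\sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e\left( \frac{am + b\overline{m}}{r} \right)$ is unchanged if one replaces $(a,b)$ with $(\lambda a, \lambda^{-1} b)$ for $\lambda \in (\mathbf{Z}/r\mathbf{Z})^\times$. For fixed $a,b$, the number of such pairs $(\lambda a, \lambda^{-1} b)$ is at least $r^{1-o(1)}$, thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

$$ \sum_{a,b \in \mathbf{Z}/r\mathbf{Z}} \left| \sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e\left( \frac{am + b\overline{m}}{r} \right) \right|^4 \ll r^{4+o(1)}. $$

The left-hand side can be rearranged as

$$ \sum_{m_1,m_2,m_3,m_4 \in (\mathbf{Z}/r\mathbf{Z})^\times} \sum_{a,b \in \mathbf{Z}/r\mathbf{Z}} e\left( \frac{a(m_1+m_2-m_3-m_4) + b(\overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4})}{r} \right) $$

which by Fourier summation is equal to

$$ r^2 \# \{ (m_1,m_2,m_3,m_4) \in ((\mathbf{Z}/r\mathbf{Z})^\times)^4: m_1+m_2-m_3-m_4 = \overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4} = 0 \hbox{ mod } r \}. $$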

Observe from the quadratic formula and the divisor bound that each pair has at most solutions to the system of equations . Hence the number of quadruples of the desired form is , and the claim follows.
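The two exact facts used in this proof can be verified by brute force for small moduli: the invariance of the Kloosterman sum under $(a,b) \mapsto (\lambda a, \lambda^{-1} b)$, and the identity expressing the fourth moment as $r^2$ times a count of quadruples $(m_1,m_2,m_3,m_4)$ of units with $m_1+m_2 = m_3+m_4$ and $\overline{m_1}+\overline{m_2} = \overline{m_3}+\overline{m_4}$. The sketch below is only a sanity check under those assumptions, with hypothetical helper names; it counts the quadruples by grouping pairs $(m_1,m_2)$ according to their sum and their sum of inverses.

```python
import cmath
from collections import Counter
from math import gcd

def units(r):
    """Residues in 1..r-1 that are coprime to r (for r >= 2)."""
    return [m for m in range(1, r) if gcd(m, r) == 1]

def kloosterman(a, b, r):
    """Sum over units m mod r of e((a*m + b*m^{-1}) / r)."""
    return sum(
        cmath.exp(2j * cmath.pi * ((a * m + b * pow(m, -1, r)) % r) / r)
        for m in units(r)
    )

def fourth_moment(r):
    """Sum of |K(a, b; r)|^4 over all residues a, b mod r."""
    return sum(abs(kloosterman(a, b, r)) ** 4 for a in range(r) for b in range(r))

def quadruple_count(r):
    """Count unit quadruples with m1+m2 = m3+m4 and 1/m1+1/m2 = 1/m3+1/m4 mod r."""
    pairs = Counter()
    for m1 in units(r):
        for m2 in units(r):
            key = ((m1 + m2) % r, (pow(m1, -1, r) + pow(m2, -1, r)) % r)
            pairs[key] += 1
    # Two pairs with the same key give one admissible quadruple.
    return sum(count * count for count in pairs.values())

r = 9
print(fourth_moment(r), r * r * quadruple_count(r))  # should agree up to rounding
```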

We will also need another easy case of the Weil bound to handle some other portions of (2):

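Lemma 3 (Easy Weil bound) Let $\chi$ be a primitive real Dirichlet character of conductor $q$, and let $a,b,c,d \in \mathbf{Z}/q\mathbf{Z}$. Then

$$ \sum_{n \in \mathbf{Z}/q\mathbf{Z}} \chi(an+b) \chi(cn+d) \ll q^{o(1)} (ad-bc, q). $$

*Proof:* As $q$ is the conductor of a primitive real Dirichlet character, $q$ is equal to $2^j$ times a squarefree odd number for some $j \leq 3$. By the Chinese remainder theorem, it thus suffices to establish the claim when $q$ is an odd prime. We may assume that $ad-bc$ is not divisible by this prime $q$, as the claim is trivial otherwise. If $a$ vanishes then $c$ does not vanish, and the claim follows from the mean zero nature of $\chi$; similarly if $c$ vanishes. Hence we may assume that $a,c$ do not vanish, and then we can normalise them to equal $1$. By completing the square it now suffices to show that

$$ \sum_{n \in \mathbf{Z}/q\mathbf{Z}} \chi(n^2 - b) \ll 1 $$

whenever $b \neq 0$. As $\chi$ is $+1$ on the quadratic residues and $-1$ on the non-residues, it now suffices to show that

$$ \# \{ (m,n) \in (\mathbf{Z}/q\mathbf{Z})^2: n^2 - b = m^2 \} = q + O(1). $$

But by making the change of variables $(x,y) = (n+m, n-m)$, the left-hand side becomes $\# \{ (x,y) \in (\mathbf{Z}/q\mathbf{Z})^2: xy = b \}$, and the claim follows.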
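The prime case of this argument is easy to test numerically. The sketch below assumes that the sum in Lemma 3 is $\sum_n \chi(an+b)\chi(cn+d)$ with $\chi$ the Legendre symbol modulo an odd prime $p$ (the only primitive real character of that conductor); the function names are illustrative. It checks that the sum has size at most $1$ whenever $ad - bc$ is non-zero mod $p$, and that the hyperbola count in the last step of the proof is exactly $p - 1$.

```python
def legendre(n, p):
    """Legendre symbol (n / p) for an odd prime p, via Euler's criterion."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1

def char_sum(a, b, c, d, p):
    """Sum over n mod p of chi(a*n + b) * chi(c*n + d)."""
    return sum(legendre(a * n + b, p) * legendre(c * n + d, p) for n in range(p))

def hyperbola_count(b, p):
    """Number of pairs (m, n) mod p with n^2 - b = m^2."""
    return sum(1 for m in range(p) for n in range(p) if (n * n - b - m * m) % p == 0)

p = 13
print([char_sum(1, 0, 1, d, p) for d in range(1, p)])  # each entry has size at most 1
print([hyperbola_count(b, p) for b in range(1, p)])    # each entry equals p - 1
```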

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which did not need to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function $\Lambda_2(n(n+2))$ in place of $\Lambda(n) \Lambda(n+2)$. These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.

** — 1. Consequences of a Siegel zero — **

It is convenient to phrase Heath-Brown’s theorem in the following equivalent form:

Theorem 4 Suppose one has a sequence $\chi_n$ of real Dirichlet characters of conductor $q_n$ going to infinity, and a sequence $\beta_n$ of real zeroes $L(\beta_n, \chi_n) = 0$ with $\beta_n = 1 - o(\frac{1}{\log q_n})$ as $n \rightarrow \infty$. Then there are infinitely many prime twins.

Henceforth, we omit the dependence on $n$ from all of our quantities (unless they are explicitly declared to be “fixed”), and the asymptotic notation $O()$, $o()$, $\ll$, etc. will always be understood to be with respect to the $n$ parameter, e.g. $X \ll Y$ means that $X \leq CY$ for some fixed $C$. (In the language of this previous blog post, we are thus implicitly using “cheap nonstandard analysis”, although we will not explicitly use nonstandard analysis notation (other than the asymptotic notation mentioned above) further in this post.) With this convention, we now have a single (but not fixed) Dirichlet character $\chi$ of some conductor $q$ with a Siegel zero

It will also be convenient to use the crude bound

which can be proven by elementary means (see e.g. Exercise 57 of this post), although one can use Siegel’s theorem to obtain the better bound . Standard arguments (see also Lemma 59 of this blog post) then give

We now use this Siegel zero to show that pretends to be like for primes that are comparable (in log-scale) to :

For more precise estimates on the error, see the paper of Heath-Brown (particularly Lemma 3).

*Proof:* It suffices to show, for sufficiently large fixed , that

for each fixed natural number .

We begin by considering the sum

for some large (which we will eventually take to be a power of ); we will exploit the fact that this sum is very stable for comparable to in log-scale. By the Dirichlet hyperbola method, we can write this as

Since , one can show through summation by parts (see Lemma 71 of this previous post) that

for any , while from the integral test (see Lemma 2 of this previous post) we have

We can thus estimate (9) as

From summation by parts we again have

and we have the crude bound

so by using (7) and we arrive at

for any , where the exponent does not depend on . In particular, if and is large enough, then by (6), (7), (8) we have

Setting and and subtracting, we conclude that

On the other hand, observe that is always non-negative, and that whenever and , with primes with . Since any number with has at most representations of the form with and , and no outside of the range has such a representation, we thus see that

Comparing this with (10), we conclude that

since , the claim follows.
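The positivity step in this proof can be illustrated concretely. The sketch below assumes that the non-negative function in question is the Dirichlet convolution $1 * \chi$, given by $(1*\chi)(n) = \sum_{d | n} \chi(d)$, and takes $\chi$ to be the Legendre symbol modulo an odd prime $q$ as a stand-in for a real primitive character; both the choice of character and the function names are assumptions made for illustration. Since $1 * \chi$ is multiplicative with $(1*\chi)(p) = 1 + \chi(p)$, it is non-negative, and it vanishes at primes $p$ with $\chi(p) = -1$.

```python
def legendre(n, q):
    """Legendre symbol (n / q) for an odd prime q, via Euler's criterion."""
    n %= q
    if n == 0:
        return 0
    return 1 if pow(n, (q - 1) // 2, q) == 1 else -1

def one_conv_chi(n, q):
    """(1 * chi)(n): the sum of chi(d) over the divisors d of n."""
    return sum(legendre(d, q) for d in range(1, n + 1) if n % d == 0)

q = 7
values = [one_conv_chi(n, q) for n in range(1, 31)]
print(values)  # all entries are non-negative
```

In the Siegel zero scenario most primes near $q$ have $\chi(p) = -1$, which is what makes $1 * \chi$ sparse on numbers built from such primes.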

** — 2. Main argument — **

We let be a large absolute constant ( will do) and set to be the primorial of . Set for some large fixed (large compared to or ). Let be a smooth non-negative function supported on and equal to at . Set

and

Thus is a smooth cutoff to the region , and is a smooth cutoff to the region . It will suffice to establish the lower bound

because the non-twin primes contribute at most to the left-hand side. The weight $\nu$ is an unsquared Selberg sieve designed to damp out those $n$ for which $n$ or $n+2$ have somewhat small prime factors; we did not square this weight as is customary with the Selberg sieve in order to simplify the calculations slightly (the fact that the weight can sometimes be negative will not be a serious concern for us).

Thus is non-negative, and supported on those products of primes with and . Convolving (11) by and using the identity , we have

where . (The quantities are all non-negative, but we will not take advantage of these facts here.) It thus suffices to establish the two bounds

the intuition here is that Lemma 5 is showing that is “sparse” and so the contribution of should be relatively small.

We begin with (13). Let be a small fixed quantity to be chosen later. Observe that if is non-zero, then must have a factor on which is non-zero, which implies that is either divisible by a prime with , or by the square of a prime. If the former case occurs, then either or is divisible by ; since , this implies that either is divisible by a prime with , or that is divisible by a prime less than . To summarise, at least one of the following three statements must hold:

- is divisible by a prime .
- is divisible by the square of a prime .
- is divisible by a prime with .

It thus suffices to establish the estimates

as the claim then follows by summing and sending slowly to zero.

We begin with (15). Observe that if divides then either divides or divides . In particular the number of with is . The summand is by the divisor bound, so the left-hand side of (15) is bounded by

and the claim follows.

Next we turn to (14). We can very crudely bound

By Mertens’ theorem, it suffices to show that

for all .

We use a modification of the argument used to prove Proposition 4.2 of this Polymath8b paper. By Fourier inversion, we may write

for some rapidly decreasing function , so that

and hence

and hence by the triangle inequality

for any fixed . Since , we can thus (after substituting ) bound the left-hand side of (18) by

and so it will suffice to show the bound

for any and .

We factor where are primes, and then write where and is the largest index for which . Clearly and with , and the least prime factor of is such that

we have on the support of , and so

and thus . Clearly we have

We write , where denotes the number of prime factors of counting multiplicity. We can thus bound the left-hand side of (19) by

We may replace the weight with a restriction of to the interval . The constraint removes two residue classes modulo every odd prime less than , while the constraint restricts to residue classes modulo . Standard sieve theory then gives

and so we are reduced to showing that

Factoring , we can bound the left-hand side by

which (for large enough) is bounded by

which by Mertens’ theorem is bounded by

and the claim follows.

For future reference we observe that the above arguments also establish the bound

for all .

Finally, we turn to (16). Using (17) again, it suffices to show that

The claim then follows from (21) and Lemma 5.

It remains to prove (12), which we write as

On the support of , we can write

The contribution of the error term can be bounded by

applying (20), this is bounded by which is acceptable for large enough. Thus it suffices to show that

which we write as

where . We split where , , are smooth truncations of to the intervals , , and respectively. It will suffice to establish the bounds

We begin with (24), which is a relatively easy consequence of the cancellation properties of . We may rewrite the left-hand side as

The summand vanishes unless , , and is coprime to , so that . For fixed , the constraints , restricts to residue classes of the form , with , in particular and for some with . Let us fix and consider the sum

Writing , this becomes

From Lemma 3, we have

since is coprime to . From summation by parts we thus have

(noting that if is large enough) and so we can bound the left-hand side of (24) in magnitude by

and (24) follows.

Now we prove (23), which is where we need nontrivial bounds on Kloosterman sums. Expanding out and using the triangle inequality, it suffices (for large enough) to show that

for all . By Fourier expansion of the and constraints (retaining only the restriction that is odd), it suffices to show that

for every .

Fix . If for an odd , then we can uniquely factor such that , , and . It thus suffices to show that

Actually, we may delete the condition since this is implied by the constraints and odd.

We first dispose of the case when is large in the sense that . Making the change of variables , we may rewrite the left-hand side as

We can assume is coprime to and odd with coprime to and , as the contribution of all other cases vanish. The constraints that is odd and then restricts to a single residue class modulo , with restricted to a single residue class modulo . We split this into residue classes modulo to make the phase constant on each residue class. The modulus is not divisible by , since is coprime to and . As such, has mean zero on every consecutive elements in each residue class modulo under consideration, and from summation by parts we then have

and hence the contribution of the case to (25) is

which is acceptable.

It remains to control the contribution of the case to (25). By the triangle inequality, it suffices to show that

for all coprime to . We can of course restrict to be coprime to each other and to . Writing , the constraint is equivalent to

and so we can rewrite the left-hand side as

By Fourier expansion, we can write as a linear combination of with bounded coefficients and , so it suffices to show that

Next, by Fourier expansion of the constraint , we write the left-hand side as

From Poisson summation and the smoothness of , we see that the inner sum is unless

for some integer , where denotes the distance from to the nearest integer. The contribution of the which do not satisfy this relation is easily seen to be acceptable. From the support of we see in particular that there are only remaining choices for . Thus it suffices by the triangle inequality to show that

for each of the form (26).

We rearrange the left-hand side as

Suppose first that is of the form for some integer . Then the phase is periodic with period and has mean zero here (since ). From this, we can estimate the inner sum by ; since is restricted to be of size , this contribution is certainly acceptable. Thus we may assume that is not of the form . A similar argument works when (say), so we may assume that , so that .

By (26), this forces the denominator of in lowest form to be . By Lemma 2, we thus have

for any , so from Poisson summation we have

since is constrained to be , the claim follows.

Finally, we prove (22), which is a routine sieve-theoretic calculation. We rewrite the left-hand side as

The summand vanishes unless are coprime to with and . From Poisson summation one then has

The error term is certainly negligible, so it suffices to show that

We can control the left-hand side by Fourier analysis. Writing

and

for some rapidly decreasing functions , the left-hand side may be expressed as

for , and

for . From Mertens’ theorem we have the crude bound

which by the rapid decrease of allows one to restrict to the range with an error of . In particular, we now have .

Recalling that

for , we can factor

where

(the restriction being to prevent vanishing for and small) and one has

for , and

and

for odd . In particular, from the Cauchy integral formula we see that

for . Since we also have in this region, we thus can write (27) as

and our task is now to show that

We have

when (even when have negative real part); since , we conclude from the Cauchy integral formula that

when . For the remaining primes , we have

when and . Summing in using Lemma 5 to handle those between and , and Mertens’ theorem and the trivial bound for all other , we conclude that

and thus

From this and the rapid decrease of , we may restrict the range of even further to for any that goes to infinity arbitrarily slowly with . For sufficiently slow , the above estimates on and Lemma 5 (now used to handle those between and for some going sufficiently slowly to zero) give

and so we are reduced to establishing that

We may once again use the rapid decrease of to remove the prefactor as well as the restrictions , and reduce to showing that

For large enough, it will suffice to show that

with the implied constant independent of . But the left-hand side evaluates to , and the claim follows.


via fractional linear transformations:

$$ \begin{pmatrix} a & b \\ c & d \end{pmatrix} z := \frac{az+b}{cz+d}. \qquad (1) $$

Here and in the rest of the post we will abuse notation by identifying elements $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ of the special linear group $SL_2(\mathbf{R})$ with their equivalence class $\{ \pm \begin{pmatrix} a & b \\ c & d \end{pmatrix} \}$ in $PSL_2(\mathbf{R})$; this will occasionally create or remove a factor of two in our formulae, but otherwise has very little effect, though one has to check that various definitions and expressions (such as (1)) are unaffected if one replaces a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ by its negation $\begin{pmatrix} -a & -b \\ -c & -d \end{pmatrix}$. In particular, we recommend that the reader ignore the signs $\pm$ that appear from time to time in the discussion below.

As the action of $PSL_2(\mathbf{R})$ on $\mathbf{H}$ is transitive, and any given point in $\mathbf{H}$ (e.g. $i$) has a stabiliser isomorphic to the projective rotation group $PSO_2(\mathbf{R})$, we can view the Poincaré upper half-plane $\mathbf{H}$ as a homogeneous space for $PSL_2(\mathbf{R})$, and more specifically as the quotient space of $PSL_2(\mathbf{R})$ by a maximal compact subgroup $PSO_2(\mathbf{R})$. In fact, we can make the half-plane $\mathbf{H}$ a symmetric space for $PSL_2(\mathbf{R})$, by endowing $\mathbf{H}$ with the Riemannian metric

$$ ds^2 := \frac{dx^2 + dy^2}{y^2} $$

(using Cartesian coordinates $z = x+iy$), which is invariant with respect to the $PSL_2(\mathbf{R})$ action. Like any other Riemannian metric, the metric on $\mathbf{H}$ generates a number of other important geometric objects on $\mathbf{H}$, such as the distance function $d(z_1,z_2)$ which can be computed to be given by the formula

$$ 2(\cosh(d(z_1,z_2)) - 1) = \frac{|z_1 - z_2|^2}{y_1 y_2}, \qquad (2) $$

the volume measure $\mu = \mu_{\mathbf{H}}$, which can be computed to be

$$ d\mu = \frac{dx\, dy}{y^2}, $$

and the Laplace-Beltrami operator, which can be computed to be $\Delta = y^2 (\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2})$ (here we use the negative definite sign convention for $\Delta$). As the metric $ds^2$ was $PSL_2(\mathbf{R})$-invariant, all of these quantities arising from the metric are similarly $PSL_2(\mathbf{R})$-invariant in the appropriate sense.

The Gauss curvature of the Poincaré half-plane can be computed to be the constant $-1$, thus $\mathbf{H}$ is a model for two-dimensional hyperbolic geometry, in much the same way that the unit sphere $S^2$ in $\mathbf{R}^3$ is a model for two-dimensional spherical geometry (or $\mathbf{R}^2$ is a model for two-dimensional Euclidean geometry). (Indeed, $\mathbf{H}$ is isomorphic (via projection to a null hyperplane) to the upper unit hyperboloid in Minkowski spacetime, which is the direct analogue of the unit sphere in Euclidean space or the plane in Galilean spacetime.)
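The invariance of the hyperbolic distance under fractional linear transformations is easy to spot-check numerically. The sketch below represents a matrix as a tuple `(a, b, c, d)` with $ad - bc = 1$ and uses the standard closed form $\cosh d(z,w) = 1 + \frac{|z-w|^2}{2 \,\mathrm{Im}(z) \mathrm{Im}(w)}$ for the distance on the upper half-plane; the function names are illustrative.

```python
from math import acosh

def act(g, z):
    """Apply the fractional linear transformation g = (a, b, c, d) to z."""
    a, b, c, d = g
    return (a * z + b) / (c * z + d)

def hyperbolic_distance(z, w):
    """Distance between two points of the upper half-plane."""
    return acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))

z, w = complex(0.3, 1.2), complex(-1.0, 0.5)
g = (2, 1, 1, 1)  # determinant 2*1 - 1*1 = 1
print(hyperbolic_distance(z, w), hyperbolic_distance(act(g, z), act(g, w)))
```

The two printed numbers should agree up to rounding, and the images `act(g, z)` remain in the upper half-plane because $\mathrm{Im}(gz) = \mathrm{Im}(z)/|cz+d|^2$.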

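One can inject arithmetic into this geometric structure by passing from the Lie group $PSL_2(\mathbf{R})$ to the full modular group

$$ PSL_2(\mathbf{Z}) := \left\{ \pm \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in \mathbf{Z}; ad-bc = 1 \right\} $$

or congruence subgroups such as

$$ \Gamma_0(q) := \left\{ \pm \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in PSL_2(\mathbf{Z}): c = 0 \hbox{ mod } q \right\} $$

for natural number $q$, or to the discrete stabiliser of the point at infinity:

$$ \Gamma_\infty := \left\{ \pm \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}: b \in \mathbf{Z} \right\}. $$

These are discrete subgroups of $PSL_2(\mathbf{R})$, nested by the subgroup inclusions

$$ \Gamma_\infty \leq \Gamma_0(q) \leq PSL_2(\mathbf{Z}). $$

There are many further discrete subgroups of $PSL_2(\mathbf{R})$ (known collectively as Fuchsian groups) that one could consider, but we will focus attention on these three groups in this post.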

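Any discrete subgroup $\Gamma$ of $PSL_2(\mathbf{R})$ generates a quotient space $\Gamma \backslash \mathbf{H}$, which in general will be a non-compact two-dimensional orbifold. One can understand such a quotient space by working with a fundamental domain – a set consisting of a single representative of each of the orbits of $\Gamma$ in $\mathbf{H}$. This fundamental domain is by no means uniquely defined, but if the fundamental domain is chosen with some reasonable amount of regularity, one can view $\Gamma \backslash \mathbf{H}$ as the fundamental domain with the boundaries glued together in an appropriate sense. Among other things, fundamental domains can be used to induce a volume measure $\mu = \mu_{\Gamma \backslash \mathbf{H}}$ on $\Gamma \backslash \mathbf{H}$ from the volume measure $\mu = \mu_{\mathbf{H}}$ on $\mathbf{H}$ (restricted to a fundamental domain). By abuse of notation we will refer to both measures simply as $\mu$ when there is no chance of confusion.

For instance, a fundamental domain for $\Gamma_\infty \backslash \mathbf{H}$ is given (up to null sets) by the strip $\{ z \in \mathbf{H}: -\frac{1}{2} < \mathrm{Re}(z) < \frac{1}{2} \}$, with $\Gamma_\infty \backslash \mathbf{H}$ identifiable with the cylinder formed by gluing together the two sides of the strip. A fundamental domain for $PSL_2(\mathbf{Z}) \backslash \mathbf{H}$ is famously given (again up to null sets) by an upper portion $\{ z \in \mathbf{H}: -\frac{1}{2} < \mathrm{Re}(z) < \frac{1}{2}; |z| > 1 \}$ of this strip, with the left and right sides again glued to each other, and the left and right halves of the circular boundary glued to itself. A fundamental domain for $\Gamma_0(q) \backslash \mathbf{H}$ can be formed by gluing together

$$ [PSL_2(\mathbf{Z}) : \Gamma_0(q)] = q \prod_{p | q} \left(1 + \frac{1}{p}\right) $$

copies of a fundamental domain for $PSL_2(\mathbf{Z}) \backslash \mathbf{H}$ in a rather complicated but interesting fashion.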
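The number of copies needed is the index of $\Gamma_0(q)$ in the full modular group, classically equal to $q \prod_{p | q} (1 + \frac{1}{p})$. A minimal sketch, with hypothetical function names, compares this formula against a direct count: the cosets of $\Gamma_0(q)$ are in bijection with the projective line over $\mathbf{Z}/q\mathbf{Z}$ (via the bottom row $(c : d)$ of a matrix), whose size is the number of pairs $(c,d)$ mod $q$ with $\gcd(c,d,q) = 1$ divided by the number of units mod $q$.

```python
from math import gcd

def prime_factors(q):
    """Distinct prime factors of q, by trial division."""
    factors, p = [], 2
    while p * p <= q:
        if q % p == 0:
            factors.append(p)
            while q % p == 0:
                q //= p
        p += 1
    if q > 1:
        factors.append(q)
    return factors

def index_formula(q):
    """q * prod over primes p | q of (1 + 1/p), in exact integer arithmetic."""
    index = q
    for p in prime_factors(q):
        index = index // p * (p + 1)
    return index

def index_by_counting(q):
    """Size of the projective line over Z/qZ, counted by brute force."""
    primitive_pairs = sum(
        1 for c in range(q) for d in range(q) if gcd(gcd(c, d), q) == 1
    )
    unit_count = sum(1 for u in range(1, q + 1) if gcd(u, q) == 1)
    return primitive_pairs // unit_count

print([(q, index_formula(q)) for q in (2, 3, 4, 6, 12)])
```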

While fundamental domains can be a convenient choice of coordinates to work with for some computations (as well as for drawing appropriate pictures), it is geometrically more natural to avoid working explicitly on such domains, and instead work directly on the quotient spaces $\Gamma \backslash \mathbf{H}$. In order to analyse functions $f: \Gamma \backslash \mathbf{H} \rightarrow \mathbf{C}$ on such orbifolds, it is convenient to lift such functions back up to $\mathbf{H}$ and identify them with functions $f: \mathbf{H} \rightarrow \mathbf{C}$ which are *$\Gamma$-automorphic* in the sense that $f(\gamma z) = f(z)$ for all $z \in \mathbf{H}$ and $\gamma \in \Gamma$. Such functions will be referred to as $\Gamma$-automorphic forms, or *automorphic forms* for short (we always implicitly assume all such functions to be measurable). (Strictly speaking, these are the automorphic forms with trivial factor of automorphy; one can certainly consider other factors of automorphy, particularly when working with holomorphic modular forms, which correspond to sections of a more non-trivial line bundle over $\Gamma \backslash \mathbf{H}$ than the trivial bundle $(\Gamma \backslash \mathbf{H}) \times \mathbf{C}$ that is implicitly present when analysing scalar functions $f: \mathbf{H} \rightarrow \mathbf{C}$. However, we will not discuss this (important) more general situation here.)

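An important way to create a $\Gamma$-automorphic form is to start with a non-automorphic function $f: \mathbf{H} \rightarrow \mathbf{C}$ obeying suitable decay conditions (e.g. bounded with compact support will suffice) and form the Poincaré series $P_\Gamma[f]: \mathbf{H} \rightarrow \mathbf{C}$ defined by

$$ P_\Gamma[f](z) := \sum_{\gamma \in \Gamma} f(\gamma z), $$

which is clearly $\Gamma$-automorphic. (One could equivalently write $f(\gamma^{-1} z)$ in place of $f(\gamma z)$ here; there are good arguments for both conventions, but I have ultimately decided to use the $f(\gamma z)$ convention, which makes explicit computations a little neater at the cost of making the group actions work in the opposite order.) Thus we naturally see sums over $\Gamma$ associated with $\Gamma$-automorphic forms. A little more generally, given a subgroup $\Gamma'$ of $\Gamma$ and a $\Gamma'$-automorphic function $f: \mathbf{H} \rightarrow \mathbf{C}$ of suitable decay, we can form a relative Poincaré series $P_{\Gamma' \backslash \Gamma}[f]: \mathbf{H} \rightarrow \mathbf{C}$ by

$$ P_{\Gamma' \backslash \Gamma}[f](z) := \sum_{\gamma \in \Gamma' \backslash \Gamma} f(\gamma z), $$

where $\Gamma' \backslash \Gamma$ is any fundamental domain for $\Gamma' \backslash \Gamma$, that is to say a subset of $\Gamma$ consisting of exactly one representative for each right coset of $\Gamma'$. As $f$ is $\Gamma'$-automorphic, we see (if $f$ has suitable decay) that $P_{\Gamma' \backslash \Gamma}[f]$ does not depend on the precise choice of fundamental domain, and is $\Gamma$-automorphic. These operations are all compatible with each other, for instance $P_\Gamma[f] = P_{\Gamma' \backslash \Gamma}[P_{\Gamma'}[f]]$. A key example of Poincaré series is given by the Eisenstein series, although there are of course many other Poincaré series one can consider by varying the test function $f$.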

For future reference we record the basic but fundamental *unfolding identities*

for any function with sufficient decay, and any -automorphic function of reasonable growth (e.g. bounded and compact support, and bounded, will suffice). Note that is viewed as a function on on the left-hand side, and as a -automorphic function on on the right-hand side. More generally, one has

whenever are discrete subgroups of , is a -automorphic function with sufficient decay on , and is a -automorphic (and thus also -automorphic) function of reasonable growth. These identities will allow us to move fairly freely between the three domains , , and in our analysis.

When computing various statistics of a Poincaré series , such as its values at special points , or the quantity , expressions of interest to analytic number theory naturally emerge. We list three basic examples of this below, discussed somewhat informally in order to highlight the main ideas rather than the technical details.

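The first example we will give concerns the problem of estimating the sum

$$ \sum_{n \leq x} \tau(n) \tau(n+1), \qquad (7) $$

where $\tau(n) := \sum_{d | n} 1$ is the divisor function. This can be rewritten (by factoring $n = bc$ and $n+1 = ad$) as

$$ \sum_{a,b,c,d \in \mathbf{N}: ad - bc = 1; bc \leq x} 1 \qquad (8) $$

which is basically a sum over the full modular group $PSL_2(\mathbf{Z})$. At this point we will “cheat” a little by moving to the related, but different, sum

$$ \sum_{a,b,c,d \in \mathbf{Z}: ad - bc = 1; a^2+b^2+c^2+d^2 \leq x} 1. \qquad (9) $$

This sum is not exactly the same as (8), but will be a little easier to handle, and it is plausible that the methods used to handle this sum can be modified to handle (8). Observe from (2) and some calculation that the distance between $i$ and $\gamma i = \frac{ai+b}{ci+d}$, where $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, is given by the formula

$$ 2(\cosh(d(i, \gamma i)) - 1) = a^2+b^2+c^2+d^2 - 2 $$

and so one can express the above sum as

$$ 2 \sum_{\gamma \in PSL_2(\mathbf{Z})} 1_{d(i, \gamma i) \leq R}, \qquad R := \cosh^{-1}(x/2) $$

(the factor of $2$ coming from the quotient by $\{\pm 1\}$ in the projective special linear group); one can express this as $2 P_\Gamma[f](i)$, where $\Gamma = PSL_2(\mathbf{Z})$ and $f$ is the indicator function of the ball $B(i,R)$. Thus we see that expressions such as (7) are related to evaluations of Poincaré series. (In practice, it is much better to use smoothed out versions of indicator functions in order to obtain good control on sums such as (7) or (9), but we gloss over this technical detail here.)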
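Both manipulations in this example can be checked by brute force on small inputs. The sketch below assumes that the sum under discussion is $\sum_{n \leq x} \tau(n) \tau(n+1)$ and that it is rewritten by factoring $n = bc$ and $n+1 = ad$; it also checks the identity $2 \cosh d(i, \gamma i) = a^2+b^2+c^2+d^2$ for integer matrices of determinant one, using the standard formula $\cosh d(z,w) = 1 + \frac{|z-w|^2}{2 \,\mathrm{Im}(z) \mathrm{Im}(w)}$. All function names are illustrative.

```python
def tau(n):
    """Number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def shifted_divisor_sum(x):
    """Sum of tau(n) * tau(n + 1) over 1 <= n <= x."""
    return sum(tau(n) * tau(n + 1) for n in range(1, x + 1))

def count_unimodular(x):
    """Number of natural number quadruples (a, b, c, d) with ad - bc = 1, bc <= x."""
    count = 0
    for b in range(1, x + 1):
        for c in range(1, x // b + 1):
            # a and d are divisors of bc + 1, so neither exceeds bc + 1.
            for a in range(1, b * c + 2):
                for d in range(1, b * c + 2):
                    if a * d - b * c == 1:
                        count += 1
    return count

def cosh_distance_from_i(a, b, c, d):
    """cosh of the hyperbolic distance from i to (a*i + b) / (c*i + d)."""
    w = (a * 1j + b) / (c * 1j + d)
    return 1 + abs(w - 1j) ** 2 / (2 * w.imag)

print(shifted_divisor_sum(12), count_unimodular(12))
print(2 * cosh_distance_from_i(2, 1, 1, 1), 2**2 + 1 + 1 + 1)
```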

The second example concerns the relative

$$ \sum_{n \leq x} \tau(n^2+1) \qquad (10) $$

of the sum (7). Note from multiplicativity that (7) can be written as $\sum_{n \leq x} \tau(n^2+n)$, which is superficially very similar to (10), but with the key difference that the polynomial $n^2+1$ is irreducible over the integers.

As with (7), we may expand (10) as

At first glance this does not look like a sum over a modular group, but one can manipulate this expression into such a form in one of two (closely related) ways. First, observe that any factorisation of into Gaussian integers gives rise (upon taking norms) to an identity of the form , where and . Conversely, by using the unique factorisation of the Gaussian integers, every identity of the form gives rise to a factorisation of the form , essentially uniquely up to units. Now note that is of the form if and only if , in which case . Thus we can essentially write the above sum as something like

and the modular group is now manifest. An equivalent way to see these manipulations is as follows. A triple $a,b,c$ of natural numbers with $b^2+1 = ac$ gives rise to a positive quadratic form $(x,y) \mapsto ax^2 + 2bxy + cy^2$ of normalised discriminant $b^2 - ac$ equal to $-1$ with integer coefficients (it is natural here to allow $b$ to take integer values rather than just natural number values by essentially doubling the sum). The group $SL_2(\mathbf{Z})$ acts on the space of such quadratic forms in a natural fashion (by composing the quadratic form with the inverse of an element of $SL_2(\mathbf{Z})$). Because the discriminant $-1$ has class number one (this fact is equivalent to the unique factorisation of the Gaussian integers, as discussed in this previous post), every form in this space is equivalent (under the action of some element of $SL_2(\mathbf{Z})$) with the standard quadratic form $(x,y) \mapsto x^2+y^2$. In other words, one has

which (up to a harmless sign) is exactly the representation , , introduced earlier, and leads to the same reformulation of the sum (10) in terms of expressions like (11). Similar considerations also apply if the quadratic polynomial is replaced by another quadratic, although one has to account for the fact that the class number may now exceed one (so that unique factorisation in the associated quadratic ring of integers breaks down), and in the positive discriminant case the fact that the group of units might be infinite presents another significant technical problem.
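The class number one statement can be made concrete with the classical Gauss reduction algorithm, which is not part of the argument above but gives an explicit way to see the equivalence. The sketch below assumes forms are written as $ax^2 + 2bxy + cy^2$ with $ac - b^2 = 1$ and $a, c > 0$; it alternates the substitutions $x \mapsto x - ty$ and $(x,y) \mapsto (y,-x)$, both of determinant one, until the form is reduced. The function names are hypothetical.

```python
def reduce_form(a, b, c):
    """Reduce the positive definite form a x^2 + 2 b x y + c y^2 by SL_2(Z) moves."""
    disc = a * c - b * b
    while True:
        # Shear x -> x - t*y, with t the nearest integer to b / a,
        # which moves the middle coefficient b into [-a/2, a/2).
        t = (2 * b + a) // (2 * a)
        b, c = b - t * a, c - 2 * t * b + t * t * a
        assert a * c - b * b == disc  # both moves preserve the discriminant
        if a <= c:
            return a, b, c
        # Rotation (x, y) -> (y, -x) swaps the outer coefficients.
        a, b, c = c, -b, a

def forms_with_middle_coefficient(b):
    """All (a, b, c) with a, c natural numbers and a * c = b^2 + 1."""
    n = b * b + 1
    return [(a, b, n // a) for a in range(1, n + 1) if n % a == 0]

for form in forms_with_middle_coefficient(7):
    print(form, "->", reduce_form(*form))
```

Each loop iteration that does not return strictly decreases the leading coefficient, so the procedure terminates; for discriminant $-1$ the only reduced form is $x^2 + y^2$.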

Note that has real part and imaginary part . Thus (11) is (up to a factor of two) the Poincaré series as in the preceding example, except that is now the indicator of the sector .

Sums involving subgroups of the full modular group, such as , often arise when imposing congruence conditions on sums such as (10), for instance when trying to estimate the expression when and are large. As before, one then soon arrives at the problem of evaluating a Poincaré series at one or more special points, where the series is now over rather than .

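The third and final example concerns averages of Kloosterman sums

$$ S(m,n;c) := \sum_{x \in (\mathbf{Z}/c\mathbf{Z})^\times} e\left( \frac{mx + n\overline{x}}{c} \right) \qquad (12) $$

where $e(\theta) := e^{2\pi i \theta}$ and $\overline{x}$ is the inverse of $x$ in the multiplicative group $(\mathbf{Z}/c\mathbf{Z})^\times$. It turns out that the $L^2$ norms of Poincaré series $P_\Gamma[f]$ or $P_{\Gamma_\infty \backslash \Gamma}[f]$ are closely tied to such averages. Consider for instance the quantity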

where is a natural number and is a -automorphic form that is of the form

for some integer and some test function , which for sake of discussion we will take to be smooth and compactly supported. Using the unfolding formula (6), we may rewrite (13) as

To compute this, we use the double coset decomposition

where for each , are arbitrarily chosen integers such that . To see this decomposition, observe that every element in outside of can be assumed to have by applying a sign , and then using the row and column operations coming from left and right multiplication by (that is, shifting the top row by an integer multiple of the bottom row, and shifting the right column by an integer multiple of the left column) one can place in the interval and to be any specified integer pair with . From this we see that

and so from further use of the unfolding formula (5) we may expand (13) as

The first integral is just . The second expression is more interesting. We have

so we can write

as

which on shifting by simplifies a little to

and then on scaling by simplifies a little further to

Note that as , we have modulo . Comparing the above calculations with (12), we can thus write (13) as

is a certain integral involving and a parameter , but which does not depend explicitly on parameters such as . Thus we have indeed expressed the expression (13) in terms of Kloosterman sums. It is possible to invert this analysis and express various weighted sums of Kloosterman sums in terms of expressions (possibly involving inner products instead of norms) of Poincaré series, but we will not do so here; see Chapter 16 of Iwaniec and Kowalski for further details.

Traditionally, automorphic forms have been analysed using the spectral theory of the Laplace-Beltrami operator on spaces such as or , so that a Poincaré series such as might be expanded out using inner products of (or, by the unfolding identities, ) with various generalised eigenfunctions of (such as cuspidal eigenforms, or Eisenstein series). With this approach, special functions, and specifically the modified Bessel functions of the second kind, play a prominent role, basically because the -automorphic functions

for and non-zero are generalised eigenfunctions of (with eigenvalue ), and are almost square-integrable on (the norm diverges only logarithmically at one end of the cylinder , while decaying exponentially fast at the other end ).

However, as discussed in this previous post, the spectral theory of an essentially self-adjoint operator such as is basically equivalent to the theory of various solution operators associated to partial differential equations involving that operator, such as the Helmholtz equation , the heat equation , the Schrödinger equation , or the wave equation . Thus, one can hope to rephrase many arguments that involve spectral data of into arguments that instead involve resolvents , heat kernels , Schrödinger propagators , or wave propagators , or involve the PDE more directly (e.g. applying integration by parts and energy methods to solutions of such PDE). This is certainly done to some extent in the existing literature; resolvents and heat kernels, for instance, are often utilised. In this post, I would like to explore the possibility of reformulating spectral arguments instead using the inhomogeneous wave equation

Actually it will be a bit more convenient to normalise the Laplacian by , and look instead at the *automorphic wave equation*

This equation somewhat resembles a “Klein-Gordon” type equation, except that the mass is imaginary! This would lead to pathological behaviour were it not for the negative curvature, which in principle creates a spectral gap of that cancels out this factor.

The point is that the wave equation approach gives access to some nice PDE techniques, such as energy methods, Sobolev inequalities and finite speed of propagation, which are somewhat submerged in the spectral framework. The wave equation also interacts well with Poincaré series; if for instance and are -automorphic solutions to (15) obeying suitable decay conditions, then their Poincaré series and will be -automorphic solutions to the same equation (15), basically because the Laplace-Beltrami operator commutes with translations. Because of these facts, it is possible to replicate several standard spectral theory arguments in the wave equation framework, without having to deal directly with things like the asymptotics of modified Bessel functions. The wave equation approach to automorphic theory was introduced by Faddeev and Pavlov (using the Lax-Phillips scattering theory), and developed further by Lax and Phillips, to recover many spectral facts about the Laplacian on modular curves, such as the Weyl law and the Selberg trace formula. Here, I will illustrate this by deriving three basic applications of automorphic methods in a wave equation framework, namely

- Using the Weil bound on Kloosterman sums to derive Selberg’s 3/16 theorem on the least non-trivial eigenvalue for on (discussed previously here);
- Conversely, showing that Selberg’s eigenvalue conjecture (improving Selberg’s bound to the optimal ) implies an optimal bound on (smoothed) sums of Kloosterman sums; and
- Using the same bound to obtain pointwise bounds on Poincaré series similar to the ones discussed above. (Actually, the argument here does not use the wave equation; instead, it just uses the Sobolev inequality.)

This post originated from an attempt to finally learn this part of analytic number theory properly, and to see if I could use a PDE-based perspective to understand it better. Ultimately, this is not that dramatic a departure from the standard approach to this subject, but I found it useful to think of things in this fashion, probably due to my existing background in PDE.

I thank Bill Duke and Ben Green for helpful discussions. My primary reference for this theory was Chapters 15, 16, and 21 of Iwaniec and Kowalski.

** — 1. Selberg’s theorem — **

We begin with a proof of the following celebrated result of Selberg:

Theorem 1 Let be a natural number. Then every eigenvalue of on (the mean zero functions on ) is at least .

One can show that has only pure point spectrum below on (see this previous blog post for more discussion). Thus, this theorem shows that the spectrum of on is contained in .

We now prove this theorem. Suppose this were not the case; then we have a non-zero eigenfunction of in with eigenvalue for some ; we may assume to be real-valued, and by elliptic regularity it is smooth (on ). If it is constant in the horizontal variable, thus , then by the -automorphic nature of it is easy to see that is globally constant, contradicting the fact that it is mean zero but not identically zero. Thus it is not identically constant in the horizontal variable. By Fourier analysis on the cylinder , one can then find a -automorphic function of the form for some non-zero integer which has a non-zero inner product with on , where is a smooth compactly supported function.

Now we evolve by the wave equation

to obtain a smooth function such that and ; the existence (and uniqueness) of such a solution to this initial value problem can be established by standard wave equation methods (e.g. parametrices and energy estimates), or by the spectral theory of the Laplacian. (One can also solve for explicitly in terms of modified Bessel functions, but we will not need to do so here, which is one of the main points of using the wave equation method.) Since the initial data obeyed the translation symmetry for all and , we see (from the uniqueness theory and translation invariance of the wave equation) that also obeys this symmetry; in particular is -automorphic for all times . By finite speed of propagation, remains compactly supported in for all time , in fact for positive time it will lie in the strip , where we allow the implied constants to depend on the initial data .

Taking the inner product of with the eigenfunction on , differentiating under the integral sign, and integrating by parts, we see that

Since is initially non-zero with zero velocity, we conclude from solving the ODE that is a non-zero multiple of . In particular, it grows like as . Using the unfolding identity (6) to write

and then using the Cauchy-Schwarz inequality, we conclude that

as , where we allow implied constants to depend on .

We complement this lower bound with a slightly crude upper bound in which we are willing to lose some powers of . We have already seen that is supported in the strip . Compactly supported solutions to (16) on the cylinder conserve the energy

In particular, this quantity is for all time (recall we allow implied constants to depend on ). From Hardy’s inequality, the quantity

is non-negative. Discarding this term and using , and using the fact that is non-zero, we arrive at the bounds

and

(We allow implied constants to depend on , but not on .) From the fundamental theorem of calculus and Minkowski’s inequality in , the latter inequality implies that

for , which on combining with the former inequality gives

The function also obeys the wave equation (16), so a similar argument gives

Applying a Sobolev inequality on unit squares (for ) or on squares of length comparable to (for ) we conclude the pointwise estimates

for and

for . In particular, if we write , we have the somewhat crude estimates

for all and . (One can do better than this, particularly for large , but this bound will suffice for us.)

By repeating the analysis of (13) at the start of this post, we see that the quantity

where

Since is supported on and is bounded by , the integral is for . We also see that vanishes unless (otherwise and cannot simultaneously be ), and for such values of , we have the triangle inequality bound

Evaluating the integral and then the integral, we arrive at

and so we can bound (18) (ignoring any potential cancellation in ) by

Now we use the Weil bound for Kloosterman sums, which gives

(see e.g. this previous post for a discussion of this bound) to arrive at the bound

as . Comparing this with (17) we obtain a contradiction as since we have , and the claim follows.
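As a numerical sanity check on the Weil bound used above (not part of the argument), here is a small Python sketch. It assumes the standard definition of the Kloosterman sum, namely the sum of e((ax + bx̄)/c) over invertible residues x modulo c, and the standard form of the Weil bound, in which the sum is at most τ(c) (a,b,c)^{1/2} c^{1/2} in magnitude, with τ the divisor function. The function names are hypothetical, and `pow(x, -1, c)` requires Python 3.8 or later.

```python
import cmath
from math import gcd, pi, sqrt


def kloosterman(a, b, c):
    """Kloosterman sum S(a, b; c) for c >= 2 (assumed standard definition)."""
    total = 0j
    for x in range(1, c):
        if gcd(x, c) == 1:
            x_inv = pow(x, -1, c)  # inverse of x in (Z/cZ)^*
            total += cmath.exp(2j * pi * (a * x + b * x_inv) / c)
    # The sum is real, since x -> -x pairs each term with its conjugate.
    return total.real


def num_divisors(c):
    """Divisor function tau(c), by brute force."""
    return sum(1 for k in range(1, c + 1) if c % k == 0)


def weil_bound(a, b, c):
    """Right-hand side tau(c) * gcd(a, b, c)^(1/2) * c^(1/2) of the Weil bound."""
    return num_divisors(c) * sqrt(gcd(gcd(a, b), c)) * sqrt(c)


if __name__ == "__main__":
    for c in (7, 12, 101):
        print(c, kloosterman(1, 1, c), weil_bound(1, 1, c))
```

The square-root cancellation visible in the output is exactly what drives the contradiction with the exponential growth in (17).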

Remark 2 It was conjectured by Linnik that as for any fixed ; this, when combined with a more refined analysis of the above type, implies the Selberg eigenvalue conjecture that all eigenvalues of on are at least .

** — 2. Consequences of Selberg’s conjecture — **

In the previous section we saw how bounds on Kloosterman sums gave rise to lower bounds on eigenvalues of the Laplacian. It turns out that this implication is reversible. The simplest case (at least from the perspective of wave equation methods) is when Selberg’s eigenvalue conjecture is true, so that the Laplacian on has spectrum in . Equivalently, one has the inequality

for all (interpreting derivatives in a distributional sense if necessary). Integrating by parts, this shows that

for all , where the gradient and its magnitude are computed using the Riemannian metric in .

Now suppose one has a smooth, compactly supported in space solution to the inhomogeneous wave equation

for some forcing term which is also smooth and compactly supported in space. We assume that has mean zero for all . Introducing the energy

which is non-negative thanks to (19) and integrating by parts, we obtain the energy identity

and hence by Cauchy-Schwarz

and hence

(in a distributional sense at least), giving rise to the energy inequality

We can lift this inequality to the cylinder , concluding that for any smooth, compactly supported in space solution to the inhomogeneous equation

for some forcing term which is also smooth and compactly supported in space, with mean zero for all time, we have the energy inequality

One can use this inequality to analyse the norm of Poincaré series by testing on various functions (and working out using (20)). Suppose for instance that is a fixed natural number, and is a smooth compactly supported function. We consider the traveling wave given by the formula

where is the primitive of ; the point is that this is an approximate solution to the homogeneous wave equation, particularly at small values of . Clearly is compactly supported with mean zero for , in the region (we allow implied constants to depend on but not on ). In the region , and its first derivatives are , giving a contribution of to the energy (note that the shifts of the region by have bounded overlap). In particular we have

and thus by the energy inequality (using only the portion of the energy)

for , where

Clearly is supported on the region . For , one can compute that , giving a contribution of to the right-hand side. When is much less than but much larger than , we have , which after some calculation yields . As this decays so quickly as , one can compute (using for instance the expansion (14) of (13) and crude estimates, ignoring all cancellation) that this contributes a total of to the right-hand side also. Finally one has to deal with the region , but is much less than . Here, is equal to , and is equal to , which after some computation makes equal to . Again, one can compute the contribution of this term to the energy inequality to be . We conclude that

Applying the expansion (14) of (13), we conclude that

The expression is only non-zero when , and the integrand is only non-zero when and , which makes the phase of size . For much smaller than , the phase is thus largely irrelevant and the quantity is roughly comparable to for . As such, the bound (21) can be viewed as a smoothed out version of the estimate

which is basically Linnik’s conjecture, mentioned in Remark 2. One can make this connection between Selberg’s eigenvalue conjecture and Linnik’s conjecture more precise: see Section 16.6 of Iwaniec and Kowalski, which goes through modified Bessel functions rather than through wave equation methods.

** — 3. Pointwise bounds on Poincaré series — **

The formula (14) for (13) allows one to compute norms of Poincaré series. By using Sobolev embedding, one can then obtain pointwise control on such Poincaré series, as long as one stays away from the cusps. For instance, suppose we are interested in evaluating a Poincaré series at a point of the form for some . From the Sobolev inequality we have

for any smooth function , and thus by translation

The ball meets only boundedly many translates of the standard fundamental domain of , and hence does too. Since is a subgroup of , we conclude that meets only boundedly many translates of a fundamental domain for . In particular, we obtain the Sobolev inequality

for any smooth -automorphic function . This estimate is unfortunately a little inefficient when is large, since the ball has area comparable to one, whereas the quotient space has area roughly comparable to , so that one is conceding quite a bit by replacing the ball by the quotient space. Nevertheless this estimate is still useful enough to give some good results. We illustrate this by proving the estimate

for with coprime to , where is a fixed smooth function supported in, say, (and implied constants are allowed to depend on ), and the asymptotic notation is with regard to the limit . This type of estimate (appearing for instance, in a stronger form, in this paper of Duke, Friedlander, and Iwaniec; see also Proposition 21.10 of Iwaniec and Kowalski) establishes some equidistribution of the square roots as varies (while staying comparable to ). For comparison, crude estimates (ignoring the cancellation in the phase ) give a bound of , so the bound (23) is non-trivial whenever is significantly smaller than . Estimates such as (23) are also useful for getting good error terms in the asymptotics for the expression (10), as was first done by Hooley.

One can write (23) in terms of Poincaré series much as was done for (10). Using the fact that the discriminant has class number one as before, we see that for every positive and with , we can find an element of such that has imaginary part and real part modulo one (thus, and ); this element is unique up to left translation by . We can thus write the left-hand side of (23) as

where

and are the bottom two entries of the matrix (determined up to sign). The condition implies (since must be coprime) that are coprime to with for some with ; conversely, if obey such a condition then . The number of such is at most . Thus it suffices to show that

for each such .

The constraint constrains to a single right coset of . Thus the left-hand side can be written as

which is just . Applying (22) (and interchanging the Poincaré series and the Laplacian), it thus suffices to show that

where

By hypothesis, the coefficient is bounded, and so has all derivatives bounded while remaining supported in . Because of this, the arguments used to establish (24) can be adapted without difficulty to establish (25).

Using the expansion (14) of (13), we can write the left-hand side of (24) as

where

The first term can be computed to give a contribution of , so it suffices to show that

The quantity vanishes unless . In that case, the integrand vanishes unless and , so by the triangle inequality we have . So the left-hand side of (26) is bounded by

By the Weil bound for Kloosterman sums, we have , so on factoring out from we can bound the previous expression by

and the claim follows.

Remark 3 By using improvements to Selberg’s 3/16 theorem (such as the result of Kim and Sarnak improving this fraction to ) one can improve the second term in the right-hand side of (23) slightly.

Filed under: expository, math.AP, math.NT, math.SP Tagged: automorphic forms, Poincare series, wave equation

for any continuous (or equivalently, for any smooth) function . By approximating uniformly by a Fourier series, this claim is equivalent to that of showing that

for any non-zero integer (where ), which is easily verified from the irrationality of and the geometric series formula. Conversely, if is rational, then clearly fails to go to zero when is a multiple of the denominator of .
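The geometric series computation can be checked numerically with a short Python sketch. It assumes the usual notation e(x) = exp(2πix) and uses the standard bound that a sum of N consecutive terms e(nθ) has magnitude at most 1/(2‖θ‖), where ‖θ‖ is the distance from θ to the nearest integer. The function names are hypothetical.

```python
import cmath
from math import pi, sqrt


def e(x):
    """The standard character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * pi * x)


def dist_to_nearest_integer(x):
    return abs(x - round(x))


def normalised_sum(alpha, k, N):
    """|(1/N) * sum_{n=1}^N e(k * alpha * n)|."""
    return abs(sum(e(k * alpha * n) for n in range(1, N + 1))) / N


if __name__ == "__main__":
    N = 2000
    for k in range(1, 6):
        bound = 1 / (2 * N * dist_to_nearest_integer(k * sqrt(2)))
        print(k, normalised_sum(sqrt(2), k, N), "<=", bound)
    # Rational alpha = 3/7: no decay when k is a multiple of the denominator.
    print(normalised_sum(3 / 7, 7, N))
```

For the irrational frequency the normalised sums decay like 1/N, while for the rational frequency 3/7 the sum with k = 7 stays at full size.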

One can then ask for more quantitative information about the decay of exponential sums of , or more generally on exponential sums of the form for an arithmetic progression (in this post all progressions are understood to be finite) and a polynomial . It will be convenient to phrase such information in the form of an *inverse theorem*, describing those phases for which the exponential sum is large. Indeed, we have

Lemma 1 (Geometric series formula, inverse form) Let be an arithmetic progression of length at most for some , and let be a linear polynomial for some . If for some , then there exists a subprogression of of size such that varies by at most on (that is to say, lies in a subinterval of of length at most ).

*Proof:* By a linear change of variable we may assume that is of the form for some . We may of course assume that is non-zero in , so that ( denotes the distance from to the nearest integer). From the geometric series formula we see that

and so . Setting for some sufficiently small absolute constant , we obtain the claim.

Thus, in order for a linear phase to fail to be equidistributed on some long progression , must in fact be almost constant on a large piece of .

As is well known, this phenomenon generalises to higher order polynomials. To achieve this, we need two elementary additional lemmas. The first relates the exponential sums of to the exponential sums of its “first derivatives” .

Lemma 2 (Van der Corput lemma, inverse form) Let be an arithmetic progression of length at most , and let be an arbitrary function such that

for some . Then, for integers , there exists a subprogression of , of the same spacing as , such that

*Proof:* Squaring (1), we see that

We write and conclude that

where is a subprogression of of the same spacing. Since , we conclude that

for values of (this can be seen, much like the pigeonhole principle, by arguing via contradiction for a suitable choice of implied constants). The claim follows.
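The squaring step in this proof rests on an exact identity: the squared magnitude of the exponential sum equals the sum, over all shifts h, of the "differenced" sums of e(f(n+h) - f(n)) over those n for which both n and n+h lie in the progression. Here is a minimal Python check, assuming for simplicity that the progression is the interval {0, ..., N-1} and using a hypothetical test phase.

```python
import cmath
from math import pi, sqrt


def e(x):
    return cmath.exp(2j * pi * x)


def squared_sum(f, N):
    """|sum_{n=0}^{N-1} e(f(n))|^2."""
    return abs(sum(e(f(n)) for n in range(N))) ** 2


def differenced_sums(f, N):
    """For each shift h, the sum of e(f(n+h) - f(n)) over n with n, n+h in [0, N)."""
    sums = {}
    for h in range(-(N - 1), N):
        lo, hi = max(0, -h), min(N, N - h)
        sums[h] = sum(e(f(n + h) - f(n)) for n in range(lo, hi))
    return sums


if __name__ == "__main__":
    f = lambda n: sqrt(2) * n * n + 0.3 * n  # hypothetical test phase
    N = 50
    sums = differenced_sums(f, N)
    print(squared_sum(f, N), sum(sums.values()))
```

Since there are 2N-1 shifts, at least one differenced sum has magnitude at least the squared sum divided by 2N-1; the lemma upgrades this pigeonhole observation to a statement about many shifts.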

The second lemma (which we recycle from this previous blog post) is a variant of the equidistribution theorem.

Lemma 3 (Vinogradov lemma) Let be an interval for some , and let be such that for at least values of , for some . Then either or

or else there is a natural number such that

*Proof:* We may assume that and , since we are done otherwise. Then there are at least two with , and by the pigeonhole principle we can find in with and . By the triangle inequality, we conclude that there exists at least one natural number for which

We take to be minimal amongst all such natural numbers; then we see that there exists coprime to and such that

If then we are done, so suppose that . Suppose that are elements of such that and . Writing for some , we have

By hypothesis, ; note that as and we also have . This implies that and thus . We then have

We conclude that for fixed with , there are at most elements of such that . Iterating this with a greedy algorithm, we see that the number of with is at most ; since , this implies that

and the claim follows.
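The dichotomy behind the Vinogradov lemma can be seen numerically. The following Python sketch counts how often a multiple of a frequency lands within a tolerance ε of an integer; the function names and the specific parameters are illustrative assumptions and do not track the precise constants of the lemma. For a frequency with no good rational approximation of small denominator, the count is close to the "random" value 2εN, whereas a frequency extremely close to 1/7 produces far more near-integers, namely one for every multiple of 7.

```python
from math import sqrt


def dist_to_nearest_integer(x):
    return abs(x - round(x))


def count_near_integers(alpha, N, eps):
    """Number of n in {1, ..., N} with alpha * n within eps of an integer."""
    return sum(
        1 for n in range(1, N + 1) if dist_to_nearest_integer(alpha * n) <= eps
    )


if __name__ == "__main__":
    N, eps = 10000, 0.01
    print("expected for a generic frequency:", 2 * eps * N)
    print("alpha = sqrt(2):      ", count_near_integers(sqrt(2), N, eps))
    print("alpha = 1/7 + 1e-9:   ", count_near_integers(1 / 7 + 1e-9, N, eps))
```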

Now we can quickly obtain a higher degree version of Lemma 1:

Proposition 4 (Weyl exponential sum estimate, inverse form) Let be an arithmetic progression of length at most for some , and let be a polynomial of some degree at most . If for some , then there exists a subprogression of with such that varies by at most on .

*Proof:* We induct on . The cases are immediate from Lemma 1. Now suppose that , and that the claim had already been proven for . To simplify the notation we allow implied constants to depend on . Let the hypotheses be as in the proposition. Clearly cannot exceed . By shrinking as necessary we may assume that for some sufficiently small constant depending on .

By rescaling we may assume . By Lemma 2, we see that there are choices of such that

for some interval . We write ; then is a polynomial of degree at most with leading coefficient . We conclude from the induction hypothesis that for each such , there exists a natural number such that ; by double-counting, this implies that there are integers in the interval such that . Applying Lemma 3, we conclude that either , or that

In the former case the claim is trivial (just take to be a point), so we may assume that we are in the latter case.

We partition into arithmetic progressions of spacing and length comparable to for some large depending on to be chosen later. By hypothesis, we have

so by the pigeonhole principle, we have

for at least one such progression . On this progression, we may use the binomial theorem and (4) to write as a polynomial in of degree at most , plus an error of size . We thus can write for for some polynomial of degree at most . By the triangle inequality, we thus have (for large enough) that

and hence by the induction hypothesis we may find a subprogression of of size such that varies by at most on , and thus (for large enough again) that varies by at most on , and the claim follows.

This gives the following corollary (also given as Exercise 16 in this previous blog post):

Corollary 5 (Weyl exponential sum estimate, inverse form II) Let be a discrete interval for some , and let be a polynomial of some degree at most for some . If for some , then there is a natural number such that for all .

One can obtain much better exponents here using Vinogradov’s mean value theorem; see Theorem 1.6 of this paper of Wooley. (Thanks to Mariusz Mirek for this reference.) However, this weaker result already suffices for many applications, and does not need any result as deep as the mean value theorem.

*Proof:* To simplify notation we allow implied constants to depend on . As before, we may assume that for some small constant depending only on . We may also assume that for some large , as the claim is trivial otherwise (set ).

Applying Proposition 4, we can find a natural number and an arithmetic subprogression of such that and such that varies by at most on . Writing for some interval of length and some , we conclude that the polynomial varies by at most on . Taking order differences, we conclude that the coefficient of this polynomial is ; by the binomial theorem, this implies that differs by at most on from a polynomial of degree at most . Iterating this, we conclude that the coefficient of is for , and the claim then follows by inverting the change of variables (and replacing with a larger quantity such as as necessary).
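To illustrate the corollary in the simplest non-linear case, here is a Python sketch comparing normalised quadratic Weyl sums for a leading coefficient that is rational with small denominator against one that is badly approximable. The function name and the choice of the pure quadratic phase αn² are assumptions made for the illustration. For α = 1/3 the squares cycle through the residues 0, 1, 1 modulo 3, so the normalised sum has magnitude |1 + 2e(1/3)|/3 = √3/3 when N is a multiple of 3; for α = √2 the sum exhibits substantial cancellation.

```python
import cmath
from math import pi, sqrt


def e(x):
    return cmath.exp(2j * pi * x)


def normalised_quadratic_weyl_sum(alpha, N):
    """|(1/N) * sum_{n=1}^N e(alpha * n^2)|."""
    return abs(sum(e(alpha * n * n) for n in range(1, N + 1))) / N


if __name__ == "__main__":
    N = 3000
    print("alpha = 1/3:    ", normalised_quadratic_weyl_sum(1 / 3, N))
    print("alpha = sqrt(2):", normalised_quadratic_weyl_sum(sqrt(2), N))
```

This is the content of the corollary read in the contrapositive: a large sum forces the non-constant coefficients to be close to rationals with a small common denominator.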

For future reference we also record a higher degree version of the Vinogradov lemma.

Lemma 6 (Polynomial Vinogradov lemma) Let be a discrete interval for some , and let be a polynomial of degree at most for some such that for at least values of , for some . Then either

or else there is a natural number such that

for all .

*Proof:* We induct on . For this follows from Lemma 3 (noting that if then ), so suppose that and that the claim is already proven for . We now allow all implied constants to depend on .

For each , let denote the number of such that . By hypothesis, , and clearly , so we must have for choices of . For each such , we then have for choices of , so by induction hypothesis, either (5) or (6) holds, or else for choices of , there is a natural number such that

for , where are the coefficients of the degree polynomial . We may of course assume it is the latter which holds. By the pigeonhole principle we may take to be independent of .

Since , we have

for choices of , so by Lemma 3, either (5) or (6) holds, or else (after increasing as necessary) we have

We can again assume it is the latter that holds. This implies that modulo , so that

for choices of . Arguing as before and iterating, we obtain the claim.

The above results also extend to higher dimensions. Here is the higher dimensional version of Proposition 4:

Proposition 7 (Multidimensional Weyl exponential sum estimate, inverse form) Let and , and let be arithmetic progressions of length at most for each . Let be a polynomial of degrees at most in each of the variables separately. If for some , then there exists a subprogression of with for each such that varies by at most on .

A much more general statement, in which the polynomial phase is replaced by a nilsequence, and in which one does not necessarily assume the exponential sum is small, is given in Theorem 8.6 of this paper of Ben Green and myself, but it involves far more notation to even state properly.

*Proof:* We induct on . The case was established in Proposition 4, so we assume that and that the claim has already been proven for . To simplify notation we allow all implied constants to depend on . We may assume that for some small depending only on .

By a linear change of variables, we may assume that for all .

We write . First suppose that . Then by the pigeonhole principle we can find such that

and the claim then follows from the induction hypothesis. Thus we may assume that for some large depending only on . Similarly we may assume that for all .

By the triangle inequality, we have

The inner sum is , and the outer sum has terms. Thus, for choices of , one has

for some polynomials of degrees at most in the variables . For each obeying (7), we apply Corollary 5 to conclude that there exists a natural number such that

for (the claim also holds for but we discard it as being trivial). By the pigeonhole principle, there thus exists a natural number such that

for all and for choices of . If we write

where is a polynomial of degrees at most , then for choices of we then have

Applying Lemma 6 in the and the largeness hypotheses on the (and also the assumption that ) we conclude (after enlarging as necessary, and pigeonholing to keep independent of ) that

for all (note that we now include that case, which is no longer trivial) and for choices of . Iterating this, we eventually conclude (after enlarging as necessary) that

whenever for , with nonzero. Permuting the indices, and observing that the claim is trivial for , we in fact obtain (8) for all , at which point the claim easily follows by taking for each .

An inspection of the proof of the above result (or alternatively, by combining the above result again with many applications of Lemma 6) reveals the following general form of Proposition 4, which was posed as Exercise 17 in this previous blog post, but had a slight misprint in it (it did not properly treat the possibility that some of the could be small) and was a bit trickier to prove than anticipated (in fact, the reason for this post was that I was asked to supply a more detailed solution for this exercise):

Proposition 8 (Multidimensional Weyl exponential sum estimate, inverse form, II) Let be a natural number, and for each , let be a discrete interval for some . Let be a polynomial in variables of multidegrees for some . If

for some , or else there is a natural number such that

Again, the factor of is natural in this bound. In the case, the option (10) may be deleted since (11) trivially holds in this case, but this simplification is no longer available for since one needs (10) to hold for *all* (not just one ) to make (11) completely trivial. Indeed, the above proposition fails for if one removes (10) completely, as can be seen for instance by inspecting the exponential sum , which has size comparable to regardless of how irrational is.

Filed under: expository, math.CA, math.NT Tagged: equidistribution, polynomials, Vinogradov lemma, Weyl equidistribution theorem

Applications for Postdoctoral Fellowships, Research Memberships, and Research Professorships for this program (and for other MSRI programs in this time period, namely the companion program in Harmonic Analysis and the Fall program in Geometric Group Theory, as well as the complementary program in all other areas of mathematics) have just opened up today. Applications are open to everyone (until they close on Dec 1), but require supporting documentation, such as a CV, statement of purpose, and letters of recommendation from other mathematicians; see the application page for more details.

Filed under: advertising, math.NT, non-technical Tagged: Andrew Granville, Chantal David, Emmanuel Kowalski, Kannan Soundararajan, MSRI, Phillipe Michel

As mentioned in the previous two posts, Ben Green, Tamar Ziegler, and myself proved the following inverse theorem for the Gowers norms:

Theorem 1 (Inverse theorem for Gowers norms) Let and be integers, and let . Suppose that is a function supported on such that Then there exists a filtered nilmanifold of degree and complexity , a polynomial sequence , and a Lipschitz function of Lipschitz constant such that

There is a higher dimensional generalisation, which first appeared explicitly (in a more general form) in this preprint of Szegedy (which used a slightly different argument than the one of Ben, Tammy, and myself; see also this previous preprint of Szegedy with related results):

Theorem 2 (Inverse theorem for multidimensional Gowers norms) Let and be integers, and let . Suppose that is a function supported on such that Then there exists a filtered nilmanifold of degree and complexity , a polynomial sequence , and a Lipschitz function of Lipschitz constant such that

The case of this theorem was recently used by Wenbo Sun. One can replace the polynomial sequence with a linear sequence if desired by using a lifting trick (essentially due to Furstenberg, but which appears explicitly in Appendix C of my paper with Ben and Tammy).

In this post I would like to record a very neat and simple observation of Ben Green and Nikos Frantzikinakis, that uses the tool of Freiman isomorphisms to derive Theorem 2 as a corollary of the one-dimensional theorem. Namely, consider the linear map defined by

that is to say is the digit string base that has digits . This map is a linear map from to a subset of of density . Furthermore it has the following “Freiman isomorphism” property: if lie in with in the image set of for all , then there exist (unique) lifts such that

and

for all . Indeed, the injectivity of on uniquely determines the sum for each , and one can use base arithmetic to verify that the alternating sum of these sums on any -facet of the cube vanishes, which gives the claim. (In the language of additive combinatorics, the point is that is a Freiman isomorphism of order (say) on .)
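The Freiman isomorphism property can be verified by brute force in small cases. The sketch below is a toy model under stated assumptions: coordinates are taken in {0, ..., N-1}, the map reads a tuple as a digit string in a chosen base, and the function names are hypothetical. It checks that, when the base exceeds k(N-1), the sum of the images of any k tuples determines the coordinatewise sum of those tuples (no carries can occur), and that this fails once the base is too small.

```python
from itertools import product


def digit_map(n, base):
    """Read the tuple n = (n_1, ..., n_d) as a digit string in the given base."""
    return sum(digit * base**i for i, digit in enumerate(n))


def preserves_additive_structure(N, d, k, base):
    """Check that sums of k images determine the coordinatewise sums of the tuples."""
    points = list(product(range(N), repeat=d))
    seen = {}
    for combo in product(points, repeat=k):
        image_sum = sum(digit_map(p, base) for p in combo)
        coord_sum = tuple(sum(coords) for coords in zip(*combo))
        # Two k-tuples with the same image sum must share the same coordinate sum.
        if seen.setdefault(image_sum, coord_sum) != coord_sum:
            return False
    return True


if __name__ == "__main__":
    N, d, k = 3, 2, 2
    print(preserves_additive_structure(N, d, k, base=k * (N - 1) + 1))  # no carries
    print(preserves_additive_structure(N, d, k, base=N))                # carries occur
```

Taking the base to be a sufficiently large multiple of the side length, with the multiple depending on the order of the Gowers norm, is what allows the additive parallelepipeds appearing in the norm to be lifted uniquely.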

Now let be the function defined by setting whenever , with vanishing outside of . If obeys (1), then from the above Freiman isomorphism property we have

Applying the one-dimensional inverse theorem (Theorem 1), with reduced by a factor of and replaced by , this implies the existence of a filtered nilmanifold of degree and complexity , a polynomial sequence , and a Lipschitz function of Lipschitz constant such that

which by the Freiman isomorphism property again implies that

But the map is clearly a polynomial map from to (the composition of two polynomial maps is polynomial, see e.g. Appendix B of my paper with Ben and Tammy), and the claim follows.

Remark 3 This trick appears to be largely restricted to the case of boundedly generated groups such as ; I do not see any easy way to deduce an inverse theorem for, say, from the -inverse theorem by this method.

Remark 4 By combining this argument with the one in the previous post, one can obtain a weak ergodic inverse theorem for -actions. Interestingly, the Freiman isomorphism argument appears to be difficult to implement directly in the ergodic category; in particular, there does not appear to be an obvious direct way to derive the Host-Kra inverse theorem for actions (a result first obtained in the PhD thesis of Griesmer) from the counterpart for actions.


As mentioned in the previous post, Ben Green, Tamar Ziegler, and I proved the following inverse theorem for the Gowers norms:

Theorem 1 (Inverse theorem for Gowers norms) Let and be integers, and let . Suppose that is a function supported on such that Then there exists a filtered nilmanifold of degree and complexity , a polynomial sequence , and a Lipschitz function of Lipschitz constant such that

This result was conjectured earlier by Ben Green and myself; this conjecture was strongly motivated by an analogous inverse theorem in ergodic theory by Host and Kra, which we formulate here in a form designed to resemble Theorem 1 as closely as possible:

Theorem 2 (Inverse theorem for Gowers-Host-Kra seminorms) Let be an integer, and let be an ergodic, countably generated measure-preserving system. Suppose that one has for all non-zero (all spaces are real-valued in this post). Then is an inverse limit (in the category of measure-preserving systems, up to almost everywhere equivalence) of ergodic degree nilsystems, that is to say systems of the form for some degree filtered nilmanifold and a group element that acts ergodically on .

It is a natural question to ask if there is any logical relationship between the two theorems. In the finite field category, one can deduce the combinatorial inverse theorem from the ergodic inverse theorem by a variant of the Furstenberg correspondence principle, as worked out by Tamar Ziegler and myself; however, in the current context of -actions, the connection is less clear.

One can split Theorem 2 into two components:

Theorem 3 (Weak inverse theorem for Gowers-Host-Kra seminorms) Let be an integer, and let be an ergodic, countably generated measure-preserving system. Suppose that one has for all non-zero , where . Then is a *factor* of an inverse limit of ergodic degree nilsystems.

Theorem 4 (Pro-nilsystems closed under factors) Let be an integer. Then any factor of an inverse limit of ergodic degree nilsystems is again an inverse limit of ergodic degree nilsystems.

Indeed, it is clear that Theorem 2 implies both Theorem 3 and Theorem 4, and conversely that the two latter theorems jointly imply the former. Theorem 4 is, in principle, purely a fact about nilsystems, and should have an independent proof, but this is not known; the only known proofs go through the full machinery needed to prove Theorem 2 (or the closely related theorem of Ziegler). (However, the fact that a factor of a nilsystem is again a nilsystem was established previously by Parry.)

The purpose of this post is to record a partial implication in the reverse direction to the correspondence principle:

Proposition 5 Theorem 1 implies Theorem 3.

As mentioned at the start of the post, a fair amount of familiarity with the area is presumed here, and some routine steps will be presented with only a fairly brief explanation.

To show that is a factor of another system up to almost everywhere equivalence, it suffices to obtain a unital algebra homomorphism from to that intertwines with , and which is measure-preserving (or more precisely, integral-preserving). On the other hand, by hypothesis, is generated (as a von Neumann algebra) by the dual functions

for , where

indeed we may restrict to a countable sequence that is dense in in the (say) topology, together with their shifts. To obtain such a factor representation, it thus suffices to find a “model” associated to each dual function in such a fashion that

for all and , and all polynomials . Of course it suffices to do so for those polynomials with rational coefficients (so now there are only a countable number of constraints to consider).

We may normalise all the to take values in . For any , we can find a scale such that

If we then define the exceptional set

then has measure at most (say), and so the function is absolutely integrable. By the maximal ergodic theorem, we thus see that for almost every , there exists a finite such that

for all and all . Informally, we thus have the approximation

for “most” .

Next, we observe from the Cauchy-Schwarz-Gowers inequality that for almost every , the dual function is anti-uniform in the sense that

for any function . By the usual structure theorems (e.g. Theorem 1.2 of this paper of Ben Green and myself) this shows that for almost every and every , there exists a degree nilsequence of complexity such that

(say). (Sketch of proof: standard structure theorems give a decomposition of the form

where is a nilsequence as above, is small in norm, and is very small in norm; has small inner product with , , and , and thus with itself, and so and are both small in , giving the claim.)

For each , let denote the set of all such that there exists a degree nilsequence (depending on ) of complexity such that

From the Hardy-Littlewood maximal inequality (and the measure-preserving nature of ) we see that has measure . This implies that the functions

are uniformly bounded in as , which by Fatou’s lemma implies that

is also absolutely integrable. In particular, for almost every , we have

for some finite , which implies that

for an infinite sequence of (the exact choice of sequence depends on ); in particular, there is a such that for all in this sequence, one has

for all and all . Thus

for all in this sequence, all , and all ; combining with (2) we see (for almost every ) that

and thus for all , all , and all we have

where the limit is along the sequence.

For given , there are only finitely many possibilities for the nilmanifold , so by the usual diagonalisation argument we may pass to a subsequence of and assume that does not depend on for any . By Arzelà-Ascoli we may similarly assume that the Lipschitz function converges uniformly to , so we now have

along the remaining subsequence for all , all , and all .

By repeatedly breaking the coefficients of the polynomial sequence into fractional parts and integer parts, and absorbing the latter in , we may assume that these coefficients are bounded. Thus, by Bolzano-Weierstrass and refining the sequence of further, we may assume that converges locally uniformly in as goes to infinity to a polynomial sequence , for every . We thus have (for almost every ) that

for all , all , and all . Henceforth we shall cease to keep control of the complexity of or .

We can lift the polynomial sequence up to a linear sequence (enlarging as necessary), thus

for all , all , and some . By replacing various nilsystems with Cartesian powers, we may assume that the nilsystems are increasing in and in the sense that the nilsystem for is a factor of that for or , with the origin mapping to the origin. Then, by restricting to the orbit of the origin, we may assume that all the nilsystems are ergodic (and thus also uniquely ergodic, by the special properties of nilsystems). The nilsystems then have an ergodic inverse limit with an origin , and each function lifts up to a continuous function on , with . Thus

From the triangle inequality we see in particular that

for all and all , which by unique ergodicity of the nilsystems implies that

Thus the sequence is Cauchy in and tends to some limit .

If is generic for (which is true for almost every ), we conclude from (4) and unique ergodicity of nilsystems that

for , which on taking limits as gives

A similar argument gives (1) for almost every , for each choice of . Since one only needs to verify a countable number of these conditions, we can find an for which all the (1) hold simultaneously, and the claim follows.

Remark 6 In order to use the combinatorial inverse theorem to prove the full ergodic inverse theorem (and not just the weak version), it appears that one needs an “algorithmic” or “measurable” version of the combinatorial inverse theorem, in which the nilsequence produced by the inverse theorem can be generated in a suitable “algorithmic” sense from the original function . In the setting of the inverse theorem over finite fields, a result in this direction was established by Tulsiani and Wolf (building upon a well-known paper of Goldreich and Levin handling the case). It is thus reasonable to expect that a similarly algorithmic version of the combinatorial inverse conjecture is true for higher Gowers uniformity norms, though this has not yet been achieved in the literature to my knowledge.


Theorem 1 (Discrete inverse theorem for Gowers norms) Let and be integers, and let . Suppose that is a function supported on such that

For the definitions of “filtered nilmanifold”, “degree”, “complexity”, and “polynomial sequence”, see the paper of Ben, Tammy, and myself. (I should caution the reader that this blog post will presume a fair amount of familiarity with this subfield of additive combinatorics.) This result has a number of applications, for instance to establishing asymptotics for linear equations in the primes, but this will not be the focus of discussion here.

The purpose of this post is to record the observation that this “discrete” inverse theorem, together with an equidistribution theorem for nilsequences that Ben and I worked out in a separate paper, implies a continuous version:

Theorem 2 (Continuous inverse theorem for Gowers norms) Let be an integer, and let . Suppose that is a measurable function supported on such that Then there exists a filtered nilmanifold of degree and complexity , a (smooth) polynomial sequence , and a Lipschitz function of Lipschitz constant such that

The interval can be easily replaced with any other fixed interval by a change of variables. A key point here is that the bounds are completely uniform in the choice of . Note though that the coefficients of can be arbitrarily large (and this is necessary, as can be seen just by considering functions of the form for some arbitrarily large frequency ).

It is likely that one could prove Theorem 2 by carefully going through the proof of Theorem 1 and replacing all instances of with (and making appropriate modifications to the argument to accommodate this). However, the proof of Theorem 1 is quite lengthy. Here, we shall proceed by the usual limiting process of viewing the continuous interval as a limit of the discrete interval as . However, there will be some problems taking the limit due to a failure of compactness, specifically with regard to the coefficients of the polynomial sequence produced by Theorem 1, after normalising these coefficients by . Fortunately, a factorisation theorem from a paper of Ben Green and myself resolves this problem by splitting into a “smooth” part which does enjoy good compactness properties, as well as “totally equidistributed” and “periodic” parts which can be eliminated using the measurability (and thus, approximate smoothness) of .

We now prove Theorem 2. Firstly observe from Hölder’s inequality that the Gowers norm expression in the left-hand side of (1) is continuous in in the topology. As such, it suffices to prove the theorem for a dense class of , such as the Lipschitz-continuous , so long as the bounds remain uniform in . Thus, we may assume without loss of generality that is Lipschitz continuous.

Now let be a large integer (which will eventually be sent to infinity along a subsequence). As is Lipschitz continuous, the integral in (1) is certainly Riemann integrable, and so for sufficiently large (where we allow “sufficiently large” to depend on the Lipschitz constant) we will have

(say). Applying Theorem 1, we can thus find, for sufficiently large , a filtered nilmanifold of degree and complexity , a polynomial sequence , and a Lipschitz function of Lipschitz constant such that

Now we prepare to take limits as , passing to subsequences as necessary. Using Mal’cev bases, one can easily check that there are only finitely many filtered nilmanifolds of a fixed degree and complexity; hence by passing to a subsequence of we may assume that is independent of . The Lipschitz functions are now equicontinuous on a fixed compact domain , so by the Arzelà-Ascoli theorem and further passage to a subsequence we may assume that converges uniformly to a Lipschitz function of Lipschitz constant . In particular (passing to a further subsequence as necessary) we have

We have removed a lot of the dependencies of the nilsequence on ; however, there is still a serious lack of compactness in the remaining dependency of the polynomial sequence on . Fortunately, we can argue as follows. Let be large quantities (depending on , and the Lipschitz constant of ) to be chosen later. Applying the factorisation theorem for polynomial sequences (see Theorem 1.19 of this paper of Ben Green and myself), we may find, for each in the current subsequence, an integer , a rational subgroup of whose associated filtered nilmanifold has structure constants that are -rational with respect to the Mal’cev basis of , and a decomposition

where

- is a polynomial sequence which is -smooth;
- is a polynomial sequence which is totally -equidistributed in ; and
- is a polynomial sequence which is -rational, and is periodic with period at most .

See the above referenced paper for a definition of all the terminology used here.

Once again we can make a lot of the data here independent of by passing to a subsequence. Firstly, takes only finitely many values so by passing to a subsequence we may assume that is independent of . Then the number of rational subgroups with -rational structure constants is also finite, so by passing to a further subsequence we may take independent of , so is also independent of . Up to right multiplication by polynomial sequences from to (which do not affect the value of ), there are only finitely many -rational polynomial sequences that are periodic with period at most , so we may take independent of . Finally, using coordinates one can write where is a continuous polynomial sequence whose coefficients are bounded uniformly in . By Bolzano-Weierstrass, we may assume on passing to a subsequence that converges locally uniformly to a limit , which is again a continuous polynomial sequence. Thus, on passing to a further subsequence, we have

Let be the period of . By the pigeonhole principle (and again passing to a subsequence) we may find a residue class independent of such that

Because is totally -equidistributed in , and is -rational, the conjugate is totally -equidistributed in , where and ; see Section 2 of this paper of Ben and myself for a derivation of this fact. From this, we have the approximation

for any and any fixed interval , where is the length of and the integral is with respect to Haar measure, and are sufficiently large depending on . Using Riemann integration, we thus see that the left-hand side of (2) is of the form

for sufficiently large if are sufficiently large depending on , and the Lipschitz constant of , and so (writing ) we have

if are sufficiently large depending on , and the Lipschitz constant of . If we let be a continuous polynomial sequence that is equidistributed in (which will happen as soon as the sequence is equidistributed with respect to the abelianisation of , by an old result of Leon Green), then a similar argument shows that

and thus there exists such that

Setting , we obtain the claim.

I thank Ben Green for helpful conversations that inspired this post.


Theorem 1 (Szemerédi’s theorem) Let be a positive integer, and let be a function with for some , where we use the averaging notation , , etc. Then for we have for some depending only on .

The equivalence is basically thanks to an averaging argument of Varnavides; see for instance Chapter 11 of my book with Van Vu or this previous blog post for a discussion. We have removed the cases as they are trivial and somewhat degenerate.
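For readers who want to experiment, the following is a minimal sketch of the kind of multilinear average that Theorem 1 bounds from below, computed by brute force on a small cyclic group. The precise display is not reproduced here, so the normalisation (averaging over all base points and all steps, including the zero step) is an assumption, and the function name `ap_average` and the test set `A` are purely illustrative.

```python
from fractions import Fraction

def ap_average(f, k):
    """Average of f(x) f(x+r) ... f(x+(k-1)r) over all x, r in Z/NZ, N = len(f)."""
    N = len(f)
    total = Fraction(0)
    for x in range(N):
        for r in range(N):
            term = Fraction(1)
            for j in range(k):
                term *= f[(x + j * r) % N]
            total += term
    return total / (N * N)

N = 7
A = {0, 1, 3}  # hypothetical test set in Z/7Z
f = [1 if x in A else 0 for x in range(N)]
print(ap_average(f, 3))
```

For this particular set the only contributions come from the degenerate progressions with zero step, which is why the theorem is only interesting once the group is large compared with the density.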

There are now many proofs of this theorem. Some time ago, I took an ergodic-theoretic proof of Furstenberg and converted it to a purely finitary proof of the theorem. The argument used some simplifying innovations that had been developed since the original work of Furstenberg (in particular, deployment of the Gowers uniformity norms, as well as a “dual” norm that I called the uniformly almost periodic norm, and an emphasis on van der Waerden’s theorem for handling the “compact extension” component of the argument). But the proof was still quite messy. However, as discussed in this previous blog post, messy finitary proofs can often be cleaned up using nonstandard analysis. Thus, there should be a nonstandard version of the Furstenberg ergodic theory argument that is relatively clean. I decided (after some encouragement from Ben Green and Isaac Goldbring) to write down most of the details of this argument in this blog post, though for the sake of brevity I will skim rather quickly over arguments that were already discussed at length in other blog posts. In particular, I will presume familiarity with nonstandard analysis (specifically, the notion of a standard part of a bounded real number, and the Loeb measure construction); see for instance this previous blog post for a discussion.

By routine “compactness and contradiction” arguments (as discussed in this previous post), Theorem 1 can be deduced from the following nonstandard variant:

Theorem 2 Let be a nonstandard positive integer, let be the nonstandard cyclic group , and let be an internal function with . Then for any standard , Here of course the averaging notation is interpreted internally.

Indeed, if Theorem 1 failed, one could create a sequence of functions of density at least for some fixed , and a fixed such that

taking ultralimits one can then soon obtain a counterexample to Theorem 2.

It remains to prove Theorem 2. Henceforth is a fixed nonstandard positive integer, and . By the Loeb measure construction (discussed in this previous blog post), one can give the structure of a probability space (the *Loeb space* of ), such that every internal subset of is (Loeb) measurable with

which implies that any bounded internal function has standard part which is (Loeb) measurable with

Conversely, a countable saturation argument shows that any function in is equal almost everywhere to the standard part of a bounded internal function.

From Hölder’s inequality we see that the -linear form

vanishes if one of the has standard part vanishing almost everywhere. As such, we can (by abuse of notation) extend this -linear form to functions that are elements of , rather than bounded internal functions. With this convention, we see that Theorem 2 is equivalent to the following assertion.

Theorem 3 For any non-negative with , one has for any standard ,

The next step is to introduce the *Gowers-Host-Kra uniformity seminorms* , defined for by the formula

where is any bounded internal function whose standard part equals almost everywhere. From Hölder’s inequality one can see that the exact choice of does not matter, so that this seminorm is well-defined. (It is indeed a seminorm, but we will not need this fact here.)
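As a sanity check on the definition, here is a small sketch in the ordinary finitary setting (a standard cyclic group rather than the nonstandard one used above, and only the second uniformity norm). It computes the fourth power of the norm of a real-valued function by brute force and compares it with the fourth moment of the Fourier coefficients, a standard identity. The function names and the test vector are assumptions made for illustration.

```python
import cmath

def gowers_u2_fourth_power(f):
    """Average over x, h1, h2 in Z/NZ of f(x) f(x+h1) f(x+h2) f(x+h1+h2),
    for a real-valued f given as a list of length N."""
    N = len(f)
    total = 0.0
    for x in range(N):
        for h1 in range(N):
            for h2 in range(N):
                total += (f[x] * f[(x + h1) % N]
                          * f[(x + h2) % N] * f[(x + h1 + h2) % N])
    return total / N**3

def fourier_fourth_moment(f):
    """Sum over frequencies xi of |f_hat(xi)|^4, where
    f_hat(xi) is the average over x of f(x) exp(-2 pi i x xi / N)."""
    N = len(f)
    total = 0.0
    for xi in range(N):
        coeff = sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N)
                    for x in range(N)) / N
        total += abs(coeff) ** 4
    return total

f = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]  # hypothetical test function on Z/8Z
print(gowers_u2_fourth_power(f), fourier_fourth_moment(f))
```

The two printed numbers should agree up to floating-point error; higher uniformity norms have no such simple Fourier description, which is one reason the inverse theory is harder there.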

We have the following application of the van der Corput inequality:

Theorem 4 (Generalised von Neumann theorem) Let be standard. For any with for some , one has

This estimate is proven in numerous places in the literature (e.g. Lemma 11.4 of my book with Van Vu, or Exercise 23 of this blog post) and will not be repeated here. In particular, from multilinearity we see that

Dual to the Gowers norms are the uniformly almost periodic norms . Let us first define the internal version of these norms. We define to be the space of constant internal functions , with internal norm . Once is defined for some , we define to be the internal normed vector space of internal functions for which there exists a nonstandard real number , an internally finite non-empty set , an internal family of internal functions bounded in magnitude by one for each , and an internal family of internal functions in the unit ball of such that one has the representation

for all , where is the shift of by . The internal infimum of all such is then the norm of . This gives each of the the structure of an internal shift-invariant Banach algebra; see Section 5 of . The norms also control the supremum norm:

In particular, if we write for the space of standard parts of internal functions of bounded norm in , then is an (external) Banach algebra contained (as a real vector space) in . For , we can then define a factor of to be the probability space , where is the subalgebra of consisting of those sets such that lies in the closure of . This is easily seen to be a shift-invariant -algebra, and so is a factor.

We have the following key *characteristic factor* relationship:

Theorem 5 Let with . Then .

In fact the converse implication is true also (making the *universal characteristic factor* for the seminorm), but we will not need this direction of the implication.

*Proof:* Suppose for contradiction that ; we can normalise . Writing for some bounded internal , we then see that has a non-zero inner product with , where the *dual function* for is the bounded internal function

From the easily verified identity

and a routine induction, we see that lies in the unit ball of , and so is measurable with respect to . By hypothesis this implies that is orthogonal to , a contradiction.
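The role of the dual function in this proof can be illustrated in the simplest finitary case. The sketch below works on a standard cyclic group with the second uniformity norm only, which is an assumption made for brevity (the argument above uses internal functions and a general order). It checks, in exact arithmetic, that the inner product of a function with its dual function equals the fourth power of its norm, so that a non-zero norm forces a non-zero correlation. All names are illustrative.

```python
from fractions import Fraction

def u2_fourth_power(f):
    """Average over x, h1, h2 of f(x) f(x+h1) f(x+h2) f(x+h1+h2) on Z/NZ."""
    N = len(f)
    total = 0
    for x in range(N):
        for h1 in range(N):
            for h2 in range(N):
                total += (f[x] * f[(x + h1) % N]
                          * f[(x + h2) % N] * f[(x + h1 + h2) % N])
    return Fraction(total, N**3)

def dual_function_u2(f):
    """Dual function: x -> average over h1, h2 of f(x+h1) f(x+h2) f(x+h1+h2)."""
    N = len(f)
    values = []
    for x in range(N):
        total = 0
        for h1 in range(N):
            for h2 in range(N):
                total += f[(x + h1) % N] * f[(x + h2) % N] * f[(x + h1 + h2) % N]
        values.append(Fraction(total, N**2))
    return values

def inner_product(f, g):
    """Normalised inner product: the average over x of f(x) g(x)."""
    return sum(Fraction(a) * b for a, b in zip(f, g)) / len(f)

f = [1, -1, 2, 0, 1]  # hypothetical integer-valued test function on Z/5Z
print(inner_product(f, dual_function_u2(f)), u2_fourth_power(f))
```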

In view of the above theorem and (1), we may replace by without affecting the average in Theorem 3. Thus that theorem is equivalent to the following.

Theorem 6 Let and be standard. Then for any non-negative with , one has

We only apply this theorem in the case and , but for inductive purposes it is convenient to decouple the two parameters.

We prove Theorem 6 by induction on (allowing to be arbitrary). When , the claim is obvious for any because all functions in are essentially constant. Now suppose that and that the claim has already been proven for .

Let be a nonnegative function whose mean is positive; we may normalise to take values in . Let be standard, and let be a sufficiently small standard quantity depending on to be chosen later (one could for instance take , but we will not attempt to optimise in ). As is -measurable, one can find an internal function with and bounded norm such that . (Note though that while the norm of is bounded, this bound could be extremely large compared to , , .)

Set . We define the relative inner product for by the formula

and the relative norm

This gives the structure of a (pre-)Hilbert module over , as discussed in this previous blog post.

A crucial point is that the function is *relatively almost periodic* over the previous characteristic factor , in the following sense.

Proposition 7 (Relative almost periodicity) There exists a standard natural number and functions in the unit ball of with the following “relative total boundedness” property: for any , there exists a -measurable function such that almost everywhere (where is short-hand for ).

*Proof:* This will be a relative version of the standard analysis fact that integral operators on finite measure spaces with bounded kernel are in the Hilbert-Schmidt class, and thus compact.

By construction, there exists an internally finite non-empty set , an internal collection of internal functions that are uniformly bounded in , and an internal collection of internal functions that are uniformly bounded in , such that

for all . Note in particular that the all lie in a bounded subset of , and the all lie in a bounded subset of .

We give the -algebra generated from the standard parts of bounded internal functions such that the standard parts of all lie in a bounded subset of ; this gives a probability space that extends the product measure of and . We define an operator as follows. If , then is the standard part of some bounded internal function . We then define by the formula

This can easily be seen to not depend on the choice of , and defines a -linear operator (embedding into both and in the obvious fashion). Note that lies in the range of applied to a function in the unit ball of .

Now we claim that this operator is *relatively Hilbert-Schmidt* over , in the sense that there exists a finite bound such that

for all finite collections of functions that are relatively orthonormal over in the sense that

and

for all and . Indeed, the left-hand side of (4) may be expanded first as

for some sequence in with , and then as

where we use Loeb measure on and is the function , and are lifted up to in the obvious fashion. By Cauchy-Schwarz and the boundedness of , we can bound this by

But the are relatively orthonormal over (this reflects the relative orthogonality of and over ), so that

and the claim follows from the hypotheses on .

Using the relative spectral theorem for relative Hilbert-Schmidt operators (see Corollary 17 of this blog post), we may thus find relatively orthonormal systems in and respectively over and a non-increasing sequence of non-negative coefficients (the relative singular values) with almost everywhere, such that we have the spectral decomposition

with the sum converging in . (If were standard Borel spaces, one could deduce this theorem from the usual spectral theorem for Hilbert-Schmidt operators using disintegration. Loeb spaces are certainly not standard Borel, but as discussed in the linked blog post above, one can adapt the *proof* of the spectral theorem to the relative setting without using the device of disintegration.)

Since and the are decreasing, one can find an such that almost everywhere for all . For in the unit ball of , this lets one approximate by the finite rank operator to within almost everywhere in norm. If one rounds to the nearest multiple of for each , and lets be the collection of linear combinations of the form with a multiple of , we obtain the claim.

We return to the proof of (2). Since and , we have

if is small enough. In particular there is a -measurable set of measure at least such that on . Since

we see from Markov’s inequality (for small enough ) that there is a -measurable subset of of measure at least such that

for the relative norm. In particular we have

Let be a sufficiently large standard natural number (depending on and the quantity from Proposition 7; in fact it will essentially be a van der Waerden number of these inputs) to be chosen later. Applying the induction hypothesis, we have

In particular, there is a standard , such that for in a subset of of measure at least , we have

or equivalently that the set

has measure at least .

Let be as above, and let be the functions from Proposition 7. Then for , we can find a measurable function such that

almost everywhere on , hence by (5) we have

almost everywhere on . From this and the relative Hölder inequality, we see that

a.e. on whenever .

Now, for large enough, we see from van der Waerden’s theorem that there exist measurable such that

almost everywhere in , and hence in (this can be seen by partitioning into finitely many pieces, with each of the constant on each of these pieces). For that choice of we have

and

and thus

almost everywhere on . But from (6) one has

a.e. on , so from Hölder’s inequality we have (for sufficiently small) that

From non-negativity of , this implies that

which on integrating in gives

Averaging in , we conclude that

Shifting by , we conclude that

Dilating by (and noting that the map is at most -to-one on ), we conclude that

and (2) follows.
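The appeal to van der Waerden’s theorem in the argument above is purely qualitative, but the smallest nontrivial case is easy to verify by machine. The following sketch (with hypothetical helper names) checks by exhaustive search that every two-colouring of nine consecutive integers contains a monochromatic three-term arithmetic progression, while eight integers do not suffice. The van der Waerden numbers actually needed in the proof are of course far larger and depend on the parameters there.

```python
from itertools import product

def has_mono_ap(colouring, k):
    """True if some k-term arithmetic progression with positive step inside
    {0, ..., n-1} receives a single colour, where n = len(colouring)."""
    n = len(colouring)
    for a in range(n):
        for r in range(1, n):
            if a + (k - 1) * r >= n:
                break  # larger steps also leave the interval
            if len({colouring[a + j * r] for j in range(k)}) == 1:
                return True
    return False

def every_colouring_has_ap(n, colours, k):
    """Exhaustively test all colourings of {0, ..., n-1} with the given number of colours."""
    return all(has_mono_ap(c, k) for c in product(range(colours), repeat=n))

print(every_colouring_has_ap(8, 2, 3))
print(every_colouring_has_ap(9, 2, 3))
```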


- Every positive integer has a prime factorisation into (not necessarily distinct) primes , which is unique up to rearrangement. Taking logarithms, we obtain a partition of .

- (Prime number theorem) A randomly selected integer of size will be prime with probability when is large.
- If is a randomly selected large integer of size , and is a randomly selected prime factor of (with each index being chosen with probability ), then is approximately uniformly distributed between and . (See Proposition 9 of this previous blog post.)
- The set of real numbers arising from the prime factorisation of a large random number converges (away from the origin, and in a suitable weak sense) to the Poisson-Dirichlet process in the limit . (See the previously mentioned blog post for a definition of the Poisson-Dirichlet process, and a proof of this claim.)

Now for the facts about the cycle decomposition of large permutations:

- Every permutation has a cycle decomposition into disjoint cycles , which is unique up to rearrangement, and where we count each fixed point of as a cycle of length . If is the length of the cycle , we obtain a partition of .

- (Prime number theorem for permutations) A randomly selected permutation of will be an -cycle with probability exactly . (This was noted in this previous blog post.)
- If is a random permutation in , and is a randomly selected cycle of (with each being selected with probability ), then is exactly uniformly distributed on . (See Proposition 8 of this blog post.)
- The set of real numbers arising from the cycle decomposition of a random permutation converges (in a suitable sense) to the Poisson-Dirichlet process in the limit . (Again, see this previous blog post for details.)
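The two exact statements in the list above (the probability of being a single long cycle, and the uniform distribution of the length of a size-biased random cycle) can be confirmed by direct enumeration for small symmetric groups. Here is a minimal sketch for permutations of five elements; the helper name `cycle_lengths` and the choice of five are just for illustration.

```python
from fractions import Fraction
from itertools import permutations

def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a tuple with perm[i] = image of i."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths

n = 5
perms = list(permutations(range(n)))
total = len(perms)

# Probability that a uniformly random permutation is a single n-cycle.
p_ncycle = Fraction(sum(1 for p in perms if cycle_lengths(p) == [n]), total)

# Distribution of the length of a random cycle, where each cycle of the
# permutation is selected with probability (its length) / n.
size_biased = {k: Fraction(0) for k in range(1, n + 1)}
for p in perms:
    for length in cycle_lengths(p):
        size_biased[length] += Fraction(length, n) / total

print(p_ncycle, size_biased)
```

Selecting a cycle with probability proportional to its length is the same as selecting a uniformly random element and taking the cycle through it, which is what makes the distribution exactly uniform.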

See this previous blog post (or the aforementioned article of Granville, or the Notices article of Arratia, Barbour, and Tavaré) for further exploration of the analogy between prime factorisation of integers and cycle decomposition of permutations.

There is however something unsatisfying about the analogy, in that it is not clear *why* there should be such a kinship between integer prime factorisation and permutation cycle decomposition. It turns out that the situation is clarified if one uses another fundamental analogy in number theory, namely the analogy between integers and polynomials over a finite field , discussed for instance in this previous post; this is the simplest case of the more general function field analogy between number fields and function fields. Just as we restrict attention to positive integers when talking about prime factorisation, it will be reasonable to restrict attention to monic polynomials . We then have another analogous list of facts, proven very similarly to the corresponding list of facts for the integers:

- Every monic polynomial has a factorisation into irreducible monic polynomials , which is unique up to rearrangement. Taking degrees, we obtain a partition of .

- (Prime number theorem for polynomials) A randomly selected monic polynomial of degree will be irreducible with probability when is fixed and is large.
- If is a random monic polynomial of degree , and is a random irreducible factor of (with each selected with probability ), then is approximately uniformly distributed in when is fixed and is large.
- The set of real numbers arising from the factorisation of a randomly selected polynomial of degree converges (in a suitable sense) to the Poisson-Dirichlet process when is fixed and is large.

The above list of facts addressed the *large limit* of the polynomial ring , where the order of the field is held fixed, but the degrees of the polynomials go to infinity. This is the limit that is most closely analogous to the integers . However, there is another interesting asymptotic limit of polynomial rings to consider, namely the *large limit* where it is now the *degree* that is held fixed, but the order of the field goes to infinity. Actually to simplify the exposition we will use the slightly more restrictive limit where the *characteristic* of the field goes to infinity (again keeping the degree fixed), although all of the results proven below for the large limit turn out to be true as well in the large limit.

The large $q$ (or large $p$) limit is technically a different limit than the large $n$ limit, but in practice the asymptotic statistics of the two limits often agree quite closely. For instance, here is the prime number theorem in the large $q$ limit:

Theorem 1 (Prime number theorem) The probability that a random monic polynomial $f \in {\bf F}_q[T]$ of degree $n$ is irreducible is $\frac{1}{n}+o(1)$ in the limit where $n$ is fixed and the characteristic $p$ goes to infinity.

*Proof:* There are $q^n$ monic polynomials $f \in {\bf F}_q[T]$ of degree $n$. If $f$ is irreducible, then the $n$ zeroes of $f$ are distinct and lie in the finite field ${\bf F}_{q^n}$, but do not lie in any proper subfield of that field. Conversely, every element $\alpha$ of ${\bf F}_{q^n}$ that does not lie in a proper subfield is the root of a unique monic polynomial in ${\bf F}_q[T]$ of degree $n$ (the minimal polynomial of $\alpha$). Since the union of all the proper subfields of ${\bf F}_{q^n}$ has size $o(q^n)$, the total number of irreducible polynomials of degree $n$ is thus $\frac{1}{n} q^n - o(q^n)$, and the claim follows.

Remark 2 The above argument and inclusion-exclusion in fact gives the well known exact formula $\frac{1}{n} \sum_{d|n} \mu(\frac{n}{d}) q^d$ for the number of irreducible monic polynomials of degree $n$.
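As a quick numerical sanity check of the exact formula and of the $\frac{1}{n}$ density, here is a minimal Python sketch. The function names (`mobius`, `count_irreducible`, `irreducible_density`) are purely illustrative and do not come from any library; the code assumes $q$ is a prime power but does not verify this.

```python
from fractions import Fraction

def mobius(n: int) -> int:
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a repeated prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result

def count_irreducible(q: int, n: int) -> int:
    """(1/n) * sum_{d | n} mu(n/d) q^d: monic irreducibles of degree n over F_q."""
    total = sum(mobius(n // d) * q**d for d in range(1, n + 1) if n % d == 0)
    return total // n

def irreducible_density(q: int, n: int) -> Fraction:
    """Probability that a random monic polynomial of degree n is irreducible."""
    return Fraction(count_irreducible(q, n), q**n)
```

For fixed $n$, multiplying `irreducible_density(q, n)` by $n$ should approach $1$ as $q$ grows, in line with Theorem 1.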

Now we can give a precise connection between the cycle distribution of a random permutation, and (the large $q$ limit of) the irreducible factorisation of a polynomial, giving a (somewhat indirect, but still connected) link between permutation cycle decomposition and integer factorisation:

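Theorem 3 The partition $\{ \deg P_1, \dots, \deg P_r \}$ of a random monic polynomial $f = P_1 \dots P_r \in {\bf F}_q[T]$ of degree $n$ converges in distribution to the partition $\{ |C_1|, \dots, |C_r| \}$ of a random permutation $\sigma = C_1 \dots C_r \in S_n$ of length $n$, in the limit where $n$ is fixed and the characteristic $p$ goes to infinity.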

We can quickly prove this theorem as follows. We first need a basic fact:

Lemma 4 (Most polynomials square-free in large $q$ limit) A random monic polynomial $f \in {\bf F}_q[T]$ of degree $n$ will be square-free with probability $1-o(1)$ when $n$ is fixed and $q$ (or $p$) goes to infinity. In a similar spirit, two randomly selected monic polynomials $f, g$ of degree $n, m$ will be coprime with probability $1-o(1)$ if $n, m$ are fixed and $q$ or $p$ goes to infinity.

*Proof:* For any polynomial $P$ of degree $m$, the probability that $f$ is divisible by $P^2$ is at most $1/q^{2m}$. Summing over all polynomials of degree $1 \leq m \leq n/2$, and using the union bound, we see that the probability that $f$ is *not* squarefree is at most $\sum_{1 \leq m \leq n/2} \frac{q^m}{q^{2m}} = o(1)$, giving the first claim. For the second, observe from the first claim (and the fact that $fg$ has only a bounded number of factors) that $fg$ is squarefree with probability $1-o(1)$, giving the claim.
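The union bound in this proof can be checked by brute force for small prime fields. The following is only a sketch under simplifying assumptions: it works over ${\bf F}_p$ with $p$ prime (so that arithmetic mod $p$ is field arithmetic), represents a polynomial as a tuple of coefficients with the lowest degree first, and uses helper names (`poly_mul`, `monic_polys`, `count_non_squarefree`, `union_bound`) that are purely illustrative.

```python
from itertools import product

def poly_mul(a, b, p):
    """Multiply two polynomials over F_p (coefficient tuples, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return tuple(out)

def monic_polys(p, deg):
    """Generate all monic polynomials of the given degree over F_p."""
    for lower in product(range(p), repeat=deg):
        yield tuple(lower) + (1,)

def count_non_squarefree(p, n):
    """Count monic degree-n polynomials over F_p of the form g^2 * h with deg g >= 1."""
    bad = set()
    for m in range(1, n // 2 + 1):
        for g in monic_polys(p, m):
            g2 = poly_mul(g, g, p)
            for h in monic_polys(p, n - 2 * m):
                bad.add(poly_mul(g2, h, p))
    return len(bad)

def union_bound(p, n):
    """The bound sum_{1 <= m <= n/2} p^m / p^(2m) on the non-squarefree probability."""
    return sum(p**m / p ** (2 * m) for m in range(1, n // 2 + 1))
```

In small cases the brute-force count comes out as $p^{n-1}$, matching the standard fact that exactly a $1/q$ proportion of monic polynomials of degree $n \geq 2$ fail to be squarefree; this is comfortably within the union bound and goes to zero as $q$ grows.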

Now we can prove the theorem. Elementary combinatorics tells us that the probability of a random permutation $\sigma \in S_n$ consisting of $c_k$ cycles of length $k$ for $k=1,\dots,n$, where $c_k$ are nonnegative integers with $\sum_{k=1}^n k c_k = n$, is precisely

$$\frac{1}{\prod_{k=1}^n c_k! k^{c_k}},$$

since there are $\prod_{k=1}^n c_k! k^{c_k}$ ways to write a given tuple of cycles in cycle notation in nondecreasing order of length, and $n!$ ways to select the labels for the cycle notation. On the other hand, by Theorem 1 (and using Lemma 4 to isolate the small number of cases involving repeated factors) the number of monic polynomials of degree $n$ that are the product of $c_k$ irreducible polynomials of degree $k$ is

$$\prod_{k=1}^n \frac{1}{c_k!} \left( \left(\frac{1}{k}+o(1)\right) q^k \right)^{c_k}$$

which simplifies to

$$\frac{1+o(1)}{\prod_{k=1}^n c_k! k^{c_k}} q^n$$

and the claim follows.
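The permutation side of this computation is easy to confirm by exhaustive enumeration. Here is a small Python sketch (function names are illustrative, not from any library) that compares the closed-form probability of a cycle type against a direct count over all of $S_n$ for small $n$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def cycle_type(perm):
    """Sorted tuple of cycle lengths of a permutation given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))

def cycle_type_probability(lengths):
    """1 / prod_k (c_k! * k^c_k), where c_k is the number of cycles of length k."""
    denom = 1
    for k, c in Counter(lengths).items():
        denom *= factorial(c) * k**c
    return Fraction(1, denom)

def enumerated_distribution(n):
    """Exact distribution of cycle types over S_n, obtained by listing every permutation."""
    counts = Counter(cycle_type(p) for p in permutations(range(n)))
    return {t: Fraction(c, factorial(n)) for t, c in counts.items()}
```

For instance, `cycle_type_probability((1, 1, 2))` is $\frac{1}{4}$, matching the six transpositions among the $24$ elements of $S_4$.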

This was a fairly short calculation, but it still doesn’t quite explain *why* there is such a link between the cycle decomposition of permutations and the factorisation of a polynomial. One immediate thought might be to try to link the multiplication structure of permutations in $S_n$ with the multiplication structure of polynomials; however, these structures are too dissimilar to set up a convincing analogy. For instance, the multiplication law on polynomials is abelian and non-invertible, whilst the multiplication law on $S_n$ is (extremely) non-abelian but invertible. Also, the multiplication of a degree $n$ and a degree $m$ polynomial is a degree $n+m$ polynomial, whereas the group multiplication law on permutations does not take a permutation in $S_n$ and a permutation in $S_m$ and return a permutation in $S_{n+m}$.

I recently found (after some discussions with Ben Green) what I feel to be a satisfying conceptual (as opposed to computational) explanation of this link, which I will place below the fold.

To put cycle decomposition of permutations and factorisation of polynomials on an equal footing, we generalise the notion of a permutation to the notion of a *partial permutation* $\sigma$ on a fixed (but possibly infinite) domain $X$, which consists of a finite non-empty subset $S$ of the set $X$, together with a bijection $\sigma: S \to S$ on $S$; I’ll call $S$ the *support* of the partial permutation. We say that a partial permutation $\sigma$ is of *size* $n$ if the support $S$ is of cardinality $n$, and denote this size as $|\sigma|$. And now we can introduce a multiplication law on partial permutations that is much closer to that of polynomials: if two partial permutations $\sigma: S \to S$, $\sigma': S' \to S'$ on the same domain $X$ have disjoint supports $S, S'$, then we can form their disjoint union $\sigma \uplus \sigma'$, supported on $S \cup S'$, to be the bijection on $S \cup S'$ that agrees with $\sigma$ on $S$ and with $\sigma'$ on $S'$. Note that this is a commutative and associative operation (where it is defined), and the disjoint union of a partial permutation of size $n$ and a partial permutation of size $m$ is a partial permutation of size $n+m$, so this operation is much closer in behaviour to the multiplication law on polynomials than the group law on $S_n$. There is the defect that the disjoint union operation is sometimes undefined (when the two partial permutations have overlapping support); but in the asymptotic regime where the size is fixed and the set $X$ is extremely large, this will be very rare (compare with Lemma 4).
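To make the definition concrete, here is one possible way to model partial permutations in Python, as dictionaries sending each point of the support to its image. This representation and the names `disjoint_union` and `cycles` are assumptions made for illustration only; the sketch also assumes the domain elements are comparable (for example integers) so that `min` can pick a starting point.

```python
def disjoint_union(sigma, tau):
    """Disjoint union of two partial permutations (dicts mapping support -> support).

    Returns None when the supports overlap, i.e. when the operation is undefined.
    """
    if sigma.keys() & tau.keys():
        return None
    return {**sigma, **tau}

def cycles(sigma):
    """Decompose a partial permutation into partial cycles (its irreducible pieces)."""
    remaining, parts = set(sigma), []
    while remaining:
        x = min(remaining)
        cycle = {}
        while x in remaining:
            remaining.remove(x)
            cycle[x] = sigma[x]
            x = sigma[x]
        parts.append(cycle)
    return parts
```

With this model, sizes add under `disjoint_union` just as degrees add under polynomial multiplication, and the operation is visibly commutative because dictionary equality ignores insertion order.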

Note that a partial permutation is irreducible with respect to disjoint union if and only if it is a cycle on its support, and every partial permutation $\sigma$ has a decomposition $\sigma = C_1 \uplus \dots \uplus C_r$ into such partial cycles, unique up to permutations. If one then selects some set ${\mathcal P}$ of partial cycles on the domain $X$ to serve as “generalised primes”, then one can define (in the spirit of Beurling integers) the set ${\mathcal N}$ of “generalised integers”, defined as those partial permutations that are the disjoint union of partial cycles in ${\mathcal P}$. If one lets ${\mathcal N}_n$ denote the set of generalised integers of size $n$, one can (assuming that this set is non-empty and finite) select a partial permutation $\sigma$ uniformly at random from ${\mathcal N}_n$, and consider the partition of $n$ arising from the decomposition into generalised primes.

We can now embed both the cycle decomposition for (complete) permutations and the factorisation of polynomials into this common framework. We begin with the cycle decomposition for permutations. Let $N$ be a large natural number, and set the domain $X$ to be the set $\{1,\dots,N\}$. We define ${\mathcal P}_n$ to be the set of *all* partial cycles on $X$ of size $n$, and let ${\mathcal P}$ be the union of the ${\mathcal P}_n$, that is to say the set of *all* partial cycles on $X$ (of arbitrary size). Then ${\mathcal N}$ is of course the set of all partial permutations on $X$, and ${\mathcal N}_n$ is the set of all partial permutations on $X$ of size $n$. To generate an element of ${\mathcal N}_n$ uniformly at random for $1 \leq n \leq N$, one simply has to randomly select an $n$-element subset $S$ of $X$, and then form a random permutation of the $n$ elements of $S$. From this, it is obvious that the partition of $n$ coming from a randomly chosen element of ${\mathcal N}_n$ has exactly the same distribution as the partition of $n$ coming from a randomly chosen element of $S_n$, as long as $N$ is at least as large as $n$ of course.

Now we embed the factorisation of polynomials into the same framework. The domain $X$ is now taken to be the algebraic closure $\overline{{\bf F}_q}$ of ${\bf F}_q$, or equivalently the direct limit of the finite fields ${\bf F}_{q^n}$ (with the obvious inclusion maps). This domain has a fundamental bijection on it, the Frobenius map $\mathrm{Frob}: x \mapsto x^q$, which is a field automorphism that has ${\bf F}_q$ as its fixed points. We define ${\mathcal N}$ to be the set of partial permutations on $X$ formed by restricting the Frobenius map to a finite Frobenius-invariant set. It is easy to see that the irreducible Frobenius-invariant sets (that is to say, the orbits of $\mathrm{Frob}$) arise from taking an element $x$ of $\overline{{\bf F}_q}$ together with all of its Galois conjugates, and so if we define ${\mathcal P}$ to be the set of restrictions of Frobenius to a single such Galois orbit, then ${\mathcal N}$ is the set of generalised integers associated to the generalised primes ${\mathcal P}$ in the sense above. Next, observe that, when the characteristic $p$ is sufficiently large, every squarefree monic polynomial $f \in {\bf F}_q[T]$ of degree $n$ generates a generalised integer of size $n$, namely the restriction of the Frobenius map to the $n$ roots of $f$ (which will be necessarily distinct when the characteristic is large and $f$ is squarefree). This generalised integer will be a generalised prime precisely when $f$ is irreducible. Conversely, every generalised integer of size $n$ generates a squarefree monic polynomial in ${\bf F}_q[T]$, namely the product of $T - x$ as $x$ ranges over the support of the integer. This product is clearly monic, squarefree, and Frobenius-invariant, thus it lies in ${\bf F}_q[T]$. Thus we may identify ${\mathcal N}_n$ with the monic squarefree polynomials of ${\bf F}_q[T]$ of degree $n$. With this identification, the (now partially defined) multiplication operation on monic squarefree polynomials coincides exactly with the disjoint union operation on partial permutations. As such, we see that the partition associated to a randomly chosen squarefree monic polynomial of degree $n$ has exactly the same distribution as the partition associated to a randomly chosen generalised integer of size $n$. By Lemma 4, one can drop the condition of being squarefree while only distorting the distribution by $o(1)$.
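The correspondence between Frobenius orbits and irreducible polynomials can be explored numerically without implementing finite field arithmetic. The sketch below relies on the standard fact that the nonzero elements of the field with $q^n$ elements form a cyclic group, so each can be written as $g^i$ for a generator $g$, and the Frobenius map $x \mapsto x^q$ then acts on exponents by $i \mapsto qi \bmod (q^n - 1)$. The function names are illustrative, and the code assumes (without checking) that $q$ is a prime power.

```python
def frobenius_orbit_sizes(q, n):
    """Sizes of the orbits of x -> x^q on the field with q^n elements.

    Nonzero elements are indexed by exponents i of a generator, on which
    Frobenius acts as i -> q * i mod (q^n - 1); zero is a fixed point.
    """
    order = q**n - 1
    seen = [False] * order
    sizes = [1]  # the orbit {0}
    for i in range(order):
        if seen[i]:
            continue
        size, j = 0, i
        while not seen[j]:
            seen[j] = True
            j = (j * q) % order
            size += 1
        sizes.append(size)
    return sizes

def count_irreducible_by_orbits(q, n):
    """Orbits of size exactly n correspond to monic irreducibles of degree n."""
    return sum(1 for s in frobenius_orbit_sizes(q, n) if s == n)
```

Each orbit of size $n$ is the root set of one generalised prime of size $n$, so this count should agree with the number of monic irreducible polynomials of degree $n$ (for example, two irreducible cubics over ${\bf F}_2$).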

Now that we have placed cycle decomposition of permutations and factorisation of polynomials into the same framework, we can explain Theorem 3 as a consequence of the following *universality* result for generalised prime factorisations:

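Theorem 5 (Universality) Let ${\mathcal P}, {\mathcal N}$ be collections of generalised primes and integers respectively on a domain $X$, all of which depend on some asymptotic parameter $p$ that goes to infinity. Suppose that for any fixed $n, m$ and $p$ going to infinity, the sets ${\mathcal N}_n$ are non-empty with cardinalities obeying the asymptotic

$$|{\mathcal N}_{n+m}| = (1+o(1)) |{\mathcal N}_n| |{\mathcal N}_m|. \ \ \ \ \ (1)$$

Also, suppose that only $o( |{\mathcal N}_n| |{\mathcal N}_m| )$ of the pairs $(\sigma, \sigma') \in {\mathcal N}_n \times {\mathcal N}_m$ have overlapping supports (informally, this means that $\sigma \uplus \sigma'$ is defined with probability $1-o(1)$). Then, for fixed $n$ and $p$ going to infinity, the distribution of the partition of a random generalised integer from ${\mathcal N}_n$ is universal in the limit; that is to say, the limiting distribution does not depend on the precise choice of $X, {\mathcal P}, {\mathcal N}$.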

Note that when ${\mathcal N}_n$ consists of all the partial permutations of size $n$ on $\{1,\dots,N\}$ we have

$$|{\mathcal N}_n| = \binom{N}{n} n! = (1+o(1)) N^n$$

while when ${\mathcal N}_n$ consists of the monic squarefree polynomials of degree $n$ in ${\bf F}_q[T]$ then from Lemma 4 we also have

$$|{\mathcal N}_n| = (1+o(1)) q^n$$

so in both cases the first hypothesis (1) is satisfied. The second hypothesis is easy to verify in the former case and follows from Lemma 4 in the latter case. Thus, Theorem 5 gives Theorem 3 as a corollary.
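For the partial permutation model, both hypotheses of Theorem 5 can be checked with exact arithmetic. The following sketch uses illustrative function names and works on the domain $\{1,\dots,N\}$; the overlap probability is computed for the supports alone, which suffices because every support of a given size carries the same number of permutations.

```python
from fractions import Fraction
from math import comb, factorial

def num_partial_perms(N, n):
    """Number of partial permutations of size n on {1,...,N}: pick the support, then permute it."""
    return comb(N, n) * factorial(n)

def product_ratio(N, n, m):
    """Ratio of the size n+m count to the product of the size n and size m counts."""
    return Fraction(num_partial_perms(N, n + m),
                    num_partial_perms(N, n) * num_partial_perms(N, m))

def overlap_probability(N, n, m):
    """Probability that independent uniform supports of sizes n and m intersect."""
    return 1 - Fraction(comb(N - n, m), comb(N, m))
```

For fixed $n, m$, `product_ratio(N, n, m)` tends to $1$ and `overlap_probability(N, n, m)` tends to $0$ as $N$ grows, both with an error of order $nm/N$.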

Remark 6 An alternate way to interpret Theorem 3 is as an equidistribution theorem: if one randomly labels the $n$ zeroes of a random degree $n$ polynomial as $1,\dots,n$, then the resulting permutation on $1,\dots,n$ induced by the Frobenius map is asymptotically equidistributed in the large $q$ (or large $p$) limit. This is the simplest case of a much more general (and deeper) result known as the Deligne equidistribution theorem, discussed for instance in this survey of Kowalski. See also this paper of Church, Ellenberg, and Farb concerning more precise asymptotics for the number of squarefree polynomials with a given cycle decomposition of Frobenius.

It remains to prove Theorem 5. The key is to establish an abstract form of the prime number theorem in this setting.

Theorem 7 (Prime number theorem) Let the hypotheses be as in Theorem 5. Then for fixed $n$ and $p \to \infty$, the density of ${\mathcal P}_n$ in ${\mathcal N}_n$ is $\frac{1}{n}+o(1)$. In particular, the asymptotic density is universal (it does not depend on the choice of $X, {\mathcal P}, {\mathcal N}$).

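*Proof:* Let $\rho_n := |{\mathcal P}_n| / |{\mathcal N}_n|$ (this may only be defined for $p$ sufficiently large depending on $n$); our task is to show that $\rho_n = \frac{1}{n} + o(1)$ for each fixed $n$.

Consider the set of pairs $(\sigma, x)$ where $\sigma$ is an element of ${\mathcal N}_n$ and $x$ is an element of the support of $\sigma$. Clearly, the number of such pairs is $n |{\mathcal N}_n|$. On the other hand, given such a pair $(\sigma, x)$, there is a unique factorisation $\sigma = \pi \uplus \sigma'$, where $\pi$ is the generalised prime in the decomposition of $\sigma$ that contains $x$ in its support, and $\sigma'$ is formed from the remaining components of $\sigma$. $\pi$ has some size $1 \leq m \leq n$, $\sigma'$ has the complementary size $n-m$ and has disjoint support from $\pi$, and $x$ has to be one of the $m$ elements of the support of $\pi$. Conversely, if one selects $1 \leq m \leq n$, then selects a generalised prime $\pi \in {\mathcal P}_m$, and a generalised integer $\sigma' \in {\mathcal N}_{n-m}$ with disjoint support from $\pi$, and an element $x$ in the support of $\pi$, we recover the pair $(\sigma, x)$. Using the hypotheses of Theorem 5, we thus obtain the double counting identity

$$n |{\mathcal N}_n| = \sum_{m=1}^n (m + o(1)) |{\mathcal P}_m| |{\mathcal N}_{n-m}| = \left( \sum_{m=1}^n m \rho_m + o(1) \right) |{\mathcal N}_n|$$

and thus $\sum_{m=1}^n m \rho_m = n + o(1)$ for every fixed $n$, and so $\rho_n = \frac{1}{n} + o(1)$ for fixed $n$ as claimed.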
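In the model case where the generalised integers are all partial permutations on $\{1,\dots,N\}$, the double counting argument holds exactly rather than asymptotically, which makes it easy to test. The sketch below (with illustrative function names) counts the pairs both ways and reads off the density of partial cycles.

```python
from fractions import Fraction
from math import comb, factorial

def num_integers(N, n):
    """Number of partial permutations of size n on {1,...,N}."""
    return comb(N, n) * factorial(n)

def num_primes(N, m):
    """Number of partial cycles of size m on {1,...,N}."""
    return comb(N, m) * factorial(m - 1)

def pairs_direct(N, n):
    """Pairs (sigma, x): sigma of size n, x in the support of sigma."""
    return n * num_integers(N, n)

def pairs_by_factorisation(N, n):
    """The same pairs, organised by the size m of the cycle containing x."""
    total = 0
    for m in range(1, n + 1):
        # a cycle of size m, one of its m points, and a disjoint remainder of size n - m
        remainder = comb(N - m, n - m) * factorial(n - m)
        total += m * num_primes(N, m) * remainder
    return total

def prime_density(N, n):
    """Proportion of partial permutations of size n that are single cycles."""
    return Fraction(num_primes(N, n), num_integers(N, n))
```

Here each value of $m$ contributes exactly the same number of pairs, which is the exact form of the statement that the cycle through a marked point has uniformly distributed length, and `prime_density(N, n)` equals $\frac{1}{n}$ on the nose.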

Remark 8 One could cast this argument in a language more reminiscent of analytic number theory by forming generating series of $|{\mathcal N}_n|$ and $|{\mathcal P}_n|$ and treating these series as analogous to a zeta function and its log-derivative (in close analogy to what is done with Beurling primes), but we will not do so here.

We can now finish the proof of Theorem 5. To show asymptotic universality of the partition of a random generalised integer $\sigma \in {\mathcal N}_n$, we may assume inductively that asymptotic universality has already been shown for all smaller choices of $n$. To generate a uniformly random generalised integer $\sigma$ of size $n$, we can repeat the process used to prove Theorem 7. It of course suffices to generate a uniformly random pair $(\sigma, x)$, where $\sigma$ is a generalised integer of size $n$ and $x$ is an element of the support of $\sigma$, since on dropping $x$ we would obtain a uniformly drawn $\sigma$.

To obtain the pair $(\sigma, x)$, we first select $m \in \{1,\dots,n\}$ uniformly at random, then select a generalised prime $\pi$ randomly from ${\mathcal P}_m$ and a generalised integer $\sigma'$ randomly from ${\mathcal N}_{n-m}$ (independently of $\pi$ once $m$ is fixed). Finally, we select $x$ uniformly at random from the support of $\pi$, and set $\sigma := \pi \uplus \sigma'$. The pair $(\sigma, x)$ is certainly a pair of the required form, but this random variable is not quite uniformly distributed amongst all such pairs. However, by repeating the calculations in Theorem 7 (and in particular relying on the conclusion $|{\mathcal P}_m| = (\frac{1}{m} + o(1)) |{\mathcal N}_m|$), we see that this distribution is within $o(1)$ of the uniform distribution in total variation norm. Thus, the distribution of the cycle partition of a uniformly chosen $\sigma$ lies within $o(1)$ in total variation of the distribution of the cycle partition of a $\pi \uplus \sigma'$ chosen by the above recipe. However, the cycle partition of $\pi \uplus \sigma'$ is simply the union (with multiplicity) of $\{m\}$ with the cycle partition of $\sigma'$. As the latter was already assumed to be asymptotically universal, we conclude that the former is also, as required.
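The recipe in this proof determines the universal limiting distribution recursively: pick $m$ uniformly from $\{1,\dots,n\}$, then adjoin $m$ to a partition of $n-m$ drawn by the same rule. As a sanity check, the following sketch (illustrative names, exact rational arithmetic) computes that recursive distribution and compares it with the cycle type formula for random permutations.

```python
from collections import Counter
from fractions import Fraction
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def limiting_distribution(n):
    """Distribution of the partition of n built by: choose m uniformly in 1..n,
    then adjoin m to a partition of n - m drawn by the same recipe."""
    if n == 0:
        return {(): Fraction(1)}
    dist = {}
    for m in range(1, n + 1):
        for rest, prob in limiting_distribution(n - m).items():
            partition = tuple(sorted(rest + (m,)))
            dist[partition] = dist.get(partition, Fraction(0)) + prob / n
    return dist

def cycle_type_probability(partition):
    """1 / prod_k (c_k! * k^c_k), the cycle type law of a uniform permutation."""
    denom = 1
    for k, c in Counter(partition).items():
        denom *= factorial(c) * k**c
    return Fraction(1, denom)
```

The two agree exactly for each small $n$ tried, consistent with the limiting distribution in Theorem 5 being that of the cycle partition of a uniformly random permutation.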

Remark 9 The above analysis helps explain why one could not easily link permutation cycle decomposition with integer factorisation – to produce permutations here with the right asymptotics we needed both the large $q$ limit and the Frobenius map, both of which are available in the function field setting but not in the number field setting.


Let me discuss the former question first. Gromov’s theorem tells us that if a finite subset of a group exhibits polynomial growth in the sense that grows polynomially in , then the group generated by is virtually nilpotent (the converse direction is also true, and is relatively easy to establish). This theorem has been strengthened a number of times over the years. For instance, a few years ago, I proved with Shalom that the condition that grew polynomially in could be replaced by for a *single* , as long as was sufficiently large depending on (in fact we gave a fairly explicit quantitative bound on how large needed to be). A little more recently, with Breuillard and Green, the condition was weakened to , that is to say it sufficed to have polynomial *relative* growth at a finite scale. In fact, the latter paper gave more information on in this case, roughly speaking it showed (at least in the case when was a symmetric neighbourhood of the identity) that was “commensurate” with a very structured object known as a *coset nilprogression*. This can then be used to establish further control on . For instance, it was recently shown by Breuillard and Tointon (again in the symmetric case) that if for a single that was sufficiently large depending on , then all the for have a doubling constant bounded by a bound depending only on , thus for all .

In this paper we are able to refine this analysis a bit further; under the same hypotheses, we can show an estimate of the form

for all and some piecewise linear, continuous, non-decreasing function with , where the error is bounded by a constant depending only on , and where has at most pieces, each of which has a slope that is a natural number of size . To put it another way, the function for behaves (up to multiplicative constants) like a piecewise polynomial function, where the degree of the function and number of pieces is bounded by a constant depending on .

One could ask whether the function has any convexity or concavity properties. It turns out that it can exhibit either convex or concave behaviour (or a combination of both). For instance, if is contained in a large finite group, then will eventually plateau to a constant, exhibiting concave behaviour. On the other hand, in nilpotent groups one can see convex behaviour; for instance, in the Heisenberg group , if one sets to be a set of matrices of the form for some large (abusing the notation somewhat), then grows cubically for but then grows quartically for .

To prove this proposition, it turns out (after using a somewhat difficult inverse theorem proven previously by Breuillard, Green, and myself) that one has to analyse the volume growth of nilprogressions . In the “infinitely proper” case where there are no unexpected relations between the generators of the nilprogression, one can lift everything to a simply connected Lie group (where one can take logarithms and exploit the Baker-Campbell-Hausdorff formula heavily), eventually describing with fair accuracy by a certain convex polytope with vertices depending polynomially on , which implies that depends polynomially on up to constants. If one is not in the “infinitely proper” case, then at some point the nilprogression develops a “collision”, but then one can use this collision to show (after some work) that the dimension of the “Lie model” of has dropped by at least one from the dimension of (the notion of a Lie model being developed in the previously mentioned paper of Breuillard, Green, and myself), so that this sort of collision can only occur a bounded number of times, with essentially polynomial volume growth behaviour between these collisions.

The arguments also give a precise description of the location of a set for which grows polynomially in . In the symmetric case, what ends up happening is that becomes commensurate to a “coset nilprogression” of bounded rank and nilpotency class, whilst is “virtually” contained in a scaled down version of that nilprogression. What “virtually” means is a little complicated; roughly speaking, it means that there is a set of bounded cardinality such that for all . Conversely, if is virtually contained in , then is commensurate to (and more generally, is commensurate to for any natural number ), giving quite a (qualitatively) precise description of in terms of coset nilprogressions.

The main tool used to prove these results is the structure theorem for approximate groups established by Breuillard, Green, and myself, which roughly speaking asserts that approximate groups are always commensurate with coset nilprogressions. A key additional trick is a pigeonholing argument of Sanders, which in this context is the assertion that if is comparable to , then there is an between and such that is very close in size to (up to a relative error of ). It is this fact, together with the comparability of to a coset nilprogression , that allows us (after some combinatorial argument) to virtually place inside .

Similar arguments apply when discussing iterated convolutions of (symmetric) probability measures on a (discrete) group , rather than combinatorial powers of a finite set. Here, the analogue of volume is given by the negative power of the norm of (thought of as a non-negative function on of total mass 1). One can also work with other norms here than , but this norm has some minor technical conveniences (and other measures of the “spread” of end up being more or less equivalent for our purposes). There is an analogous structure theorem that asserts that if spreads at most polynomially in , then is “commensurate” with the uniform probability distribution on a coset progression , and itself is largely concentrated near . The factor of here is the familiar scaling factor in random walks that arises for instance in the central limit theorem. The proof of (the precise version of) this statement proceeds similarly to the combinatorial case, using pigeonholing to locate a scale where has almost the same norm as .

A special case of this theory occurs when is the uniform probability measure on elements of and their inverses. The probability measure is then the distribution of a random product , where each is equal to one of or its inverse , selected at random with drawn uniformly from with replacement. This is very close to the Littlewood-Offord situation of random products where each is equal to or selected independently at random (thus is now fixed to equal rather than being randomly drawn from ). In the case when is abelian, it turns out that a little bit of Fourier analysis shows that these two random walks have “comparable” distributions in a certain sense. As a consequence, the results in this paper can be used to recover an essentially optimal abelian inverse Littlewood-Offord theorem of Nguyen and Vu. In the nonabelian case, the only Littlewood-Offord theorem I am aware of is a recent result of Tiep and Vu for matrix groups, but in this case I do not know how to relate the above two random walks to each other, and so we can only obtain an analogue of the Tiep-Vu results for the symmetrised random walk instead of the ordered random walk .
