
This is another sequel to a recent post in which I showed the Riemann zeta function can be locally approximated by a polynomial, in the sense that for randomly chosen one has an approximation

where grows slowly with , and is a polynomial of degree . It turns out that in the function field setting there is an exact version of this approximation which captures many of the known features of the Riemann zeta function, namely Dirichlet L-functions for a random character of given modulus over a function field. This model was (essentially) studied in a fairly recent paper by Andrade, Miller, Pratt, and Trinh; I am not sure if there is any further literature on this model beyond this paper (though the number field analogue of low-lying zeroes of Dirichlet L-functions is certainly well studied). In this model it is possible to set fixed and let go to infinity, thus providing a simple finite-dimensional model problem for problems involving the statistics of zeroes of the zeta function.

In this post I would like to record this analogue precisely. We will need a finite field of some order and a natural number , and set

We will primarily think of as being large and as being either fixed or growing very slowly with , though it is possible to also consider other asymptotic regimes (such as holding fixed and letting go to infinity). Let be the ring of polynomials of one variable with coefficients in , and let be the multiplicative semigroup of monic polynomials in ; one should view and as the function field analogue of the integers and natural numbers respectively. We use the valuation for polynomials (with ); this is the analogue of the usual absolute value on the integers. We select an irreducible polynomial of size (i.e., has degree ). The multiplicative group can be shown to be cyclic of order . A Dirichlet character of modulus is a completely multiplicative function of modulus , that is periodic of period and vanishes on those not coprime to . From Fourier analysis we see that there are exactly Dirichlet characters of modulus . A Dirichlet character is said to be *odd* if it is not identically one on the group of non-zero constants; there are only non-odd characters (including the principal character), so in the limit most Dirichlet characters are odd. We will work primarily with odd characters in order to be able to ignore the effect of the place at infinity.
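As a sanity check on the setup above, here is a minimal Python sketch for one small case. The specific choices (the field of order 3, the cubic modulus X^3 + 2X + 1, and all function names) are my own illustrative assumptions, and I am reading the size of the modulus as T = q^N. The sketch confirms that the unit group modulo the irreducible modulus is cyclic of order T - 1, and counts the characters that are trivial on the non-zero constants (the non-odd ones).

```python
from itertools import product

p, N = 3, 3                  # hypothetical small example: F_3 and a modulus of degree 3
Q = (1, 2, 0, 1)             # X^3 + 2X + 1, coefficients listed from low to high degree

# A cubic with no roots in F_p is irreducible.
assert all(sum(c * a**i for i, c in enumerate(Q)) % p != 0 for a in range(p))

def reduce_mod_Q(f):
    """Reduce a coefficient list modulo p and modulo the monic polynomial Q."""
    f = [c % p for c in f]
    while len(f) > N:
        lead = f.pop()                   # coefficient of X^len(f) after the pop
        shift = len(f) - N
        for i in range(N):
            f[shift + i] = (f[shift + i] - lead * Q[i]) % p
    return tuple(f + [0] * (N - len(f)))

def mul(a, b):
    """Multiply two residue classes modulo Q."""
    prod = [0] * (2 * N - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return reduce_mod_Q(prod)

one = (1,) + (0,) * (N - 1)
units = [f for f in product(range(p), repeat=N) if any(f)]   # all non-zero classes

def order(g):
    """Multiplicative order of the unit g."""
    k, x = 1, g
    while x != one:
        x = mul(x, g)
        k += 1
    return k

group_order = p**N - 1
generator = next(g for g in units if order(g) == group_order)

# Discrete logarithm table: element -> exponent k with generator^k = element.
dlog, x = {}, one
for k in range(group_order):
    dlog[x] = k
    x = mul(x, generator)

# The character chi_a sends generator^k to exp(2 pi i a k / (T - 1)).
# It is non-odd exactly when it is trivial on every non-zero constant.
constants = [(c,) + (0,) * (N - 1) for c in range(1, p)]
non_odd = [a for a in range(group_order)
           if all(a * dlog[c] % group_order == 0 for c in constants)]
print(group_order, len(non_odd))
```

In this example there are 26 characters, of which 13 = (T - 1)/(q - 1) are non-odd. As q grows with N fixed, this proportion 1/(q - 1) tends to zero.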

Let be an odd Dirichlet character of modulus . The Dirichlet L-function is then defined (for of sufficiently large real part, at least) as

Note that for , the set is invariant under shifts whenever ; since this covers a full set of residue classes of , and the odd character has mean zero on this set of residue classes, we conclude that the sum vanishes for . In particular, the L-function is entire, and for any real number and complex number , we can write the L-function as a polynomial

where and the coefficients are given by the formula

Note that can easily be normalised to zero by the relation

In particular, the dependence on is periodic with period (so by abuse of notation one could also take to be an element of ).

Fourier inversion yields a functional equation for the polynomial :

Proposition 1 (Functional equation) Let be an odd Dirichlet character of modulus , and . There exists a phase (depending on ) such that for all , or equivalently that

where .

*Proof:* We can normalise . Let be the finite field . We can write

where denotes the subgroup of consisting of (residue classes of) polynomials of degree less than . Let be a non-trivial character of whose kernel lies in the space (this is easily achieved by pulling back a non-trivial character from the quotient ). We can use the Fourier inversion formula to write

where

From change of variables we see that is a scalar multiple of ; from Plancherel we conclude that

for some phase . We conclude that

The inner sum equals if , and vanishes otherwise, thus

For in , and the contribution of the sum vanishes as is odd. Thus we may restrict to , so that

By the multiplicativity of , this factorises as

From the one-dimensional version of (3) (and the fact that is odd) we have

for some phase . The claim follows.

As one corollary of the functional equation, is a phase rotation of and thus is non-zero, so has degree exactly . The functional equation is then equivalent to the zeroes of being symmetric across the unit circle. In fact we have the stronger

Theorem 2 (Riemann hypothesis for Dirichlet L-functions over function fields) Let be an odd Dirichlet character of modulus , and . Then all the zeroes of lie on the unit circle.

We derive this result from the Riemann hypothesis for curves over function fields below the fold.
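Theorem 2 is easy to test numerically in small cases. The following Python sketch is only an illustration under my own assumptions: the field of order 3, the modulus X^3 + 2X + 1, the variable u = q^{-s} in place of the normalised variable used above, and all function names are hypothetical choices. It sums each odd character over monic polynomials of each degree, checks that the sum over degree N vanishes, and checks that the zeroes of the resulting polynomial in u have modulus q^{-1/2}. After rescaling, this is the statement that the zeroes lie on the unit circle.

```python
import cmath
from itertools import product
import numpy as np

p, N = 3, 3                  # hypothetical small example: F_3, modulus of degree 3
Q = (1, 2, 0, 1)             # X^3 + 2X + 1 (irreducible over F_3), low to high degree
T = p**N

def reduce_mod_Q(f):
    """Reduce a coefficient sequence modulo p and modulo the monic polynomial Q."""
    f = [c % p for c in f]
    while len(f) > N:
        lead = f.pop()
        shift = len(f) - N
        for i in range(N):
            f[shift + i] = (f[shift + i] - lead * Q[i]) % p
    return tuple(f + [0] * (N - len(f)))

def mul(a, b):
    prod = [0] * (2 * N - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return reduce_mod_Q(prod)

one = (1,) + (0,) * (N - 1)

def powers(g):
    """Discrete log table for g if g generates the unit group, else None."""
    table, x = {}, one
    for k in range(T - 1):
        if x in table:
            return None
        table[x] = k
        x = mul(x, g)
    return table

candidates = (powers(g) for g in product(range(p), repeat=N) if any(g))
dlog = next(t for t in candidates if t is not None)

def chi(a, f):
    """The character chi_a at a polynomial f; it vanishes on multiples of Q."""
    r = reduce_mod_Q(f)
    if not any(r):
        return 0.0
    return cmath.exp(2j * cmath.pi * a * dlog[r] / (T - 1))

def coefficient(a, n):
    """Sum of chi_a over all monic polynomials of degree n."""
    return sum(chi(a, lower + (1,)) for lower in product(range(p), repeat=n))

def l_polynomial_roots(a):
    coeffs = [coefficient(a, n) for n in range(N)]      # degrees 0, ..., N-1
    return np.roots(coeffs[::-1])                        # np.roots wants leading coefficient first

odd = [a for a in range(T - 1) if a % (p - 1) != 0]      # non-trivial on non-zero constants
radii = [abs(z) for a in odd for z in l_polynomial_roots(a)]
top_sums = [abs(coefficient(a, N)) for a in odd]
print(min(radii), max(radii), p ** -0.5, max(top_sums))
```

The printed minimum and maximum radii should both agree with q^{-1/2} up to rounding, and the degree N sums should vanish up to rounding, for every odd character of this modulus.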

In view of this theorem (and the fact that ), we may write

for some unitary matrix . It is possible to interpret as the action of the geometric Frobenius map on a certain cohomology group, but we will not do so here. The situation here is simpler than in the number field case because the factor arising from very small primes is now absent (in the function field setting there are no primes of size between and ).

We now let vary uniformly at random over all odd characters of modulus , and uniformly over , independently of ; we also make the distribution of the random variable conjugation invariant in . We use to denote the expectation with respect to this randomness. One can then ask what the limiting distribution of is in various regimes; we will focus in this post on the regime where is fixed and is being sent to infinity. In the spirit of the Sato-Tate conjecture, one should expect to converge in distribution to the circular unitary ensemble (CUE), that is to say Haar probability measure on . This may well be provable from Deligne’s “Weil II” machinery (in the spirit of this monograph of Katz and Sarnak), though I do not know how feasible this is or whether it has already been done in the literature; here we shall avoid using this machinery and study what partial results towards this CUE hypothesis one can make without it.

If one lets be the eigenvalues of (ordered arbitrarily), then we now have

and hence the are essentially elementary symmetric polynomials of the eigenvalues:

One can take log derivatives to conclude

On the other hand, as in the number field case one has the Dirichlet series expansion

where has sufficiently large real part, , and the von Mangoldt function is defined as when is the power of an irreducible and otherwise. We conclude the “explicit formula”

Similarly on inverting we have

Since we also have

for sufficiently large real part, where the Möbius function is equal to when is the product of distinct irreducibles, and otherwise, we conclude that the Möbius coefficients

are just the complete homogeneous symmetric polynomials of the eigenvalues:

One can then derive various algebraic relationships between the coefficients from various identities involving symmetric polynomials, but we will not do so here.
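The symmetric polynomial identities underlying the last few displays can be checked numerically. Here is a minimal Python sketch. The sign and phase normalisations in the post's own formulas are not reproduced here, and the eigenvalues are just random points on the unit circle chosen for illustration. It verifies that the reciprocal of the generating function of the elementary symmetric polynomials generates the complete homogeneous ones, and that the logarithmic derivative produces the power sums (Newton's identities).

```python
from itertools import combinations_with_replacement
import numpy as np

rng = np.random.default_rng(0)
n, K = 4, 6
lam = np.exp(2j * np.pi * rng.random(n))     # hypothetical eigenvalues on the unit circle

# np.poly gives prod (x - lam_i) = x^n - e_1 x^(n-1) + e_2 x^(n-2) - ...
e = [(-1) ** k * c for k, c in enumerate(np.poly(lam))]

# Complete homogeneous polynomials from sum_i (-1)^i e_i h_(k-i) = 0 for k >= 1.
h = [1.0 + 0j]
for k in range(1, K + 1):
    h.append(sum((-1) ** (i - 1) * e[i] * h[k - i] for i in range(1, min(k, n) + 1)))

# Brute-force definition: sum of all monomials of degree k.
h_brute = [sum(np.prod(c) for c in combinations_with_replacement(lam, k))
           for k in range(K + 1)]

# Power sums p_k = tr U^k, and the Newton-type identity k h_k = sum_i h_(k-i) p_i.
power = [np.sum(lam ** k) for k in range(K + 1)]
newton_gap = max(abs(k * h[k] - sum(h[k - i] * power[i] for i in range(1, k + 1)))
                 for k in range(1, K + 1))
recursion_gap = max(abs(a - b) for a, b in zip(h, h_brute))
print(recursion_gap, newton_gap)
```

Both gaps should be at the level of rounding error.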

What do we know about the distribution of ? By construction, it is conjugation-invariant; from (2) it is also invariant with respect to the rotations for any phase . We also have the function field analogue of the Rudnick-Sarnak asymptotics:

Proposition 3 (Rudnick-Sarnak asymptotics) Let be nonnegative integers. If is equal to in the limit (holding fixed) unless for all , in which case it is equal to

Comparing this with Proposition 1 from this previous post, we thus see that all the low moments of are consistent with the CUE hypothesis (and also with the ACUE hypothesis, again by the previous post). The case of this proposition was essentially established by Andrade, Miller, Pratt, and Trinh.

*Proof:* We may assume the homogeneity relationship

since otherwise the claim follows from the invariance under phase rotation . By (6), the expression (9) is equal to

where

and consists of copies of for each , and similarly consists of copies of for each .

The polynomials and are monic of degree , which by hypothesis is less than the degree of , and thus they can only be scalar multiples of each other in if they are identical (in ). As such, we see that the average

vanishes unless , in which case this average is equal to . Thus the expression (9) simplifies to

There are at most choices for the product , and each one contributes to the above sum. All but of these choices are square-free, so by accepting an error of , we may restrict attention to square-free . This forces to all be irreducible (as opposed to powers of irreducibles); as is a unique factorisation domain, this forces and to be a permutation of . By the size restrictions, this then forces for all (if the above expression is to be anything other than ), and each is associated to possible choices of . Writing and then reinstating the non-squarefree possibilities for , we can thus write the above expression as

Using the prime number theorem , we obtain the claim.
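The prime number theorem in function fields is an exact identity, which makes it pleasant to verify by brute force. A minimal Python sketch follows, assuming the theorem is used in the form that the sum of d times the number of monic irreducibles of degree d, over all d dividing n, equals q^n (this is the von Mangoldt sum over monic polynomials of degree n, up to the choice of normalisation). The field of order 2 and the function names are illustrative choices of mine.

```python
from itertools import product

p, n_max = 2, 6              # hypothetical small example: F_2, degrees up to 6

def monic(d):
    """All monic polynomials of degree d over F_p, as coefficient tuples (low to high)."""
    return [lower + (1,) for lower in product(range(p), repeat=d)]

def multiply(a, b):
    """Product of two polynomials over F_p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return tuple(out)

# A monic polynomial of degree d is reducible exactly when it is a product of
# monic polynomials of degrees k and d - k for some 1 <= k <= d/2.
irreducible_count = {}
for d in range(1, n_max + 1):
    reducible = {multiply(a, b)
                 for k in range(1, d // 2 + 1)
                 for a in monic(k) for b in monic(d - k)}
    irreducible_count[d] = p**d - len(reducible)

# Exact prime number theorem: sum over d | n of d * (number of irreducibles of degree d).
von_mangoldt_sums = {n: sum(d * irreducible_count[d]
                            for d in range(1, n + 1) if n % d == 0)
                     for n in range(1, n_max + 1)}
print(irreducible_count)
print(von_mangoldt_sums)
```

In particular the number of irreducibles of degree n is q^n/n up to lower order terms, which is the only input needed in the proof above.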

Comparing this with Proposition 1 from this previous post, we thus see that all the low moments of are consistent with the CUE and ACUE hypotheses:

Corollary 4 (CUE statistics at low frequencies) Let be the eigenvalues of , permuted uniformly at random. Let be a linear combination of monomials where are integers with either or . Then

The analogue of the GUE hypothesis in this setting would be the CUE hypothesis, which asserts that the threshold here can be replaced by an arbitrarily large quantity. As far as I know this is not known even for (though, as mentioned previously, in principle one may be able to resolve such cases using Deligne’s proof of the Riemann hypothesis for function fields). Among other things, this would allow one to distinguish CUE from ACUE, since as discussed in the previous post, these two distributions agree when tested against monomials up to threshold , though not to .

*Proof:* By permutation symmetry we can take to be symmetric, and by linearity we may then take to be the symmetrisation of a single monomial . If then both expectations vanish due to the phase rotation symmetry, so we may assume that and . We can write this symmetric polynomial as a constant multiple of plus other monomials with a smaller value of . Since , the claim now follows by induction from Proposition 3 and Proposition 1 from the previous post.

Thus, for instance, for , the moment

is equal to

because all the monomials in are of the required form when . The latter expectation can be computed exactly (for any natural number ) using a formula

of Baker-Forrester and Keating-Snaith, thus for instance

and more generally

when , where are the integers

and more generally

(OEIS A039622). Thus we have

for if and is sufficiently slowly growing depending on . The CUE hypothesis would imply that this formula also holds for higher . (The situation here is cleaner than in the number field case, in which the GUE hypothesis only suggests the correct lower bound for the moments rather than an asymptotic, due to the absence of the wildly fluctuating additional factor that is present in the Riemann zeta function model.)

Now we can recover the analogue of Montgomery’s work on the pair correlation conjecture. Consider the statistic

where

is some finite linear combination of monomials independent of . We can expand the above sum as

Assuming the CUE hypothesis, then by Example 3 of the previous post, we would conclude that

This is the analogue of Montgomery’s pair correlation conjecture. Proposition 3 implies that this claim is true whenever is supported on . If instead we assume the ACUE hypothesis (or the weaker Alternative Hypothesis that the phase gaps are non-zero multiples of ), one should instead have

for arbitrary ; this is the function field analogue of a recent result of Baluyot. In any event, since is non-negative, we unconditionally have the lower bound

By applying (12) for various choices of test functions we can obtain various bounds on the behaviour of eigenvalues. For instance suppose we take the Fejér kernel

Then (12) applies unconditionally and we conclude that

The right-hand side evaluates to . On the other hand, is non-negative, and equal to when . Thus

The sum is at least , and is at least if is not a simple eigenvalue. Thus

and thus the expected number of simple eigenvalues is at least ; in particular, at least two thirds of the eigenvalues are simple asymptotically on average. If we had (12) without any restriction on the support of , the same arguments allow one to show that the expected proportion of simple eigenvalues is .

Suppose that the phase gaps in are all greater than almost surely. Suppose further that is non-negative and non-positive for outside of the arc . Then from (13) one has

so by taking contrapositives one can force the existence of a gap less than asymptotically if one can find with non-negative, non-positive for outside of the arc , and for which one has the inequality

By a suitable choice of (based on a minorant of Selberg) one can ensure this for for large; see Section 5 of these notes of Goldston. This is not the smallest value of currently obtainable in the literature for the number field case (which is currently , due to Goldston and Turnage-Butterbaugh, by a somewhat different method), but is still significantly less than the trivial value of . On the other hand, due to the compatibility of the ACUE distribution with Proposition 3, it is not possible to lower below purely through the use of Proposition 3.

In some cases it is possible to go beyond Proposition 3. Consider the mollified moment

where

for some coefficients . We can compute this moment in the CUE case:

Proposition 5 We have

*Proof:* From (5) one has

hence

where we suppress the dependence on the eigenvalues . Now observe the Pieri formula

where are the hook Schur polynomials

and we adopt the convention that vanishes for , or when and . Then also vanishes for . We conclude that

As the Schur polynomials are orthonormal on the unitary group, the claim follows.

The CUE hypothesis would then imply the corresponding mollified moment conjecture

(See this paper of Conrey, and this paper of Radziwill, for some discussion of the analogous conjecture for the zeta function, which is essentially due to Farmer.)

From Proposition 3 one sees that this conjecture holds in the range . It is likely that the function field analogue of the calculations of Conrey (based ultimately on deep exponential sum estimates of Deshouillers and Iwaniec) can extend this range to for any , if is sufficiently large depending on ; these bounds thus go beyond what is available from Proposition 3. On the other hand, as discussed in Remark 7 of the previous post, ACUE would also predict (14) for as large as , so the available mollified moment estimates are not strong enough to rule out ACUE. It would be interesting to see if there is some other estimate in the function field setting that can be used to exclude the ACUE hypothesis (possibly one that exploits the fact that GRH is available in the function field case?).

In a recent post I discussed how the Riemann zeta function can be locally approximated by a polynomial, in the sense that for randomly chosen one has an approximation

where grows slowly with , and is a polynomial of degree . Assuming the Riemann hypothesis (as we will throughout this post), the zeroes of should all lie on the unit circle, and one should then be able to write as a scalar multiple of the characteristic polynomial of (the inverse of) a unitary matrix , which we normalise as

Here is some quantity depending on . We view as a random element of ; in the limit , the GUE hypothesis is equivalent to becoming equidistributed with respect to Haar measure on (also known as the Circular Unitary Ensemble, CUE; it is to the unit circle what the Gaussian Unitary Ensemble (GUE) is on the real line). One can also view as analogous to the “geometric Frobenius” operator in the function field setting, though unfortunately it is difficult at present to make this analogy any more precise (due, among other things, to the lack of a sufficiently satisfactory theory of the “field of one element”).

Taking logarithmic derivatives of (2), we have

and hence on taking logarithmic derivatives of (1) in the variable we (heuristically) have

Morally speaking, we have

so on comparing coefficients we expect to interpret the moments of as a finite Dirichlet series:

To understand the distribution of in the unitary group , it suffices to understand the distribution of the moments

where denotes averaging over , and . The GUE hypothesis asserts that in the limit , these moments converge to their CUE counterparts

where is now drawn uniformly in with respect to the CUE ensemble, and denotes expectation with respect to that measure.

The moment (6) vanishes unless one has the homogeneity condition

This follows from the fact that for any phase , has the same distribution as , where we use the number theory notation .

In the case when the degree is low, we can use representation theory to establish the following simple formula for the moment (6), as evaluated by Diaconis and Shahshahani:

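Proposition 1 (Low moments in CUE model) If

$\displaystyle \sum_{j=1}^k j a_j \leq N \ \ \ \ \ (8)$

then the moment (6) vanishes unless $a_j = b_j$ for all $j$, in which case it is equal to

$\displaystyle \prod_{j=1}^k j^{a_j} a_j!. \ \ \ \ \ (9)$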

Another way of viewing this proposition is that for distributed according to CUE, the random variables are distributed like independent complex random variables of mean zero and variance , as long as one only considers moments obeying (8). This identity definitely breaks down for larger values of , so one only obtains central limit theorems in certain limiting regimes, notably when one only considers a fixed number of ‘s and lets go to infinity. (The paper of Diaconis and Shahshahani writes in place of , but I believe this to be a typo.)

*Proof:* Let be the left-hand side of (8). We may assume that (7) holds since we are done otherwise, hence

Our starting point is Schur-Weyl duality. Namely, we consider the -dimensional complex vector space

This space has an action of the product group : the symmetric group acts by permutation on the tensor factors, while the general linear group acts diagonally on the factors, and the two actions commute with each other. Schur-Weyl duality gives a decomposition

where ranges over Young tableaux of size with at most rows, is the -irreducible unitary representation corresponding to (which can be constructed for instance using Specht modules), and is the -irreducible polynomial representation corresponding with highest weight .

Let be a permutation consisting of cycles of length (this is uniquely determined up to conjugation), and let . The pair then acts on , with the action on basis elements given by

The trace of this action can then be computed as

where is the matrix coefficient of . Breaking up into cycles and summing, this is just

But we can also compute this trace using the Schur-Weyl decomposition (10), yielding the identity

where is the character on associated to , and is the character on associated to . As is well known, is just the Schur polynomial of weight applied to the (algebraic, generalised) eigenvalues of . We can specialise to unitary matrices to conclude that

and similarly

where consists of cycles of length for each . On the other hand, the characters are an orthonormal system on with the CUE measure. Thus we can write the expectation (6) as

Now recall that ranges over all the Young tableaux of size with at most rows. But by (8) we have , and so the condition of having rows is redundant. Hence now ranges over *all* Young tableaux of size , which as is well known enumerates all the irreducible representations of . One can then use the standard orthogonality properties of characters to show that the sum (12) vanishes if , are not conjugate, and is equal to divided by the size of the conjugacy class of (or equivalently, by the size of the centraliser of ) otherwise. But the latter expression is easily computed to be , giving the claim.
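The last step of the proof uses the standard fact that a permutation with a_j cycles of length j (for each j) has a centraliser of size equal to the product of j^{a_j} a_j! over all j. Here is a short brute-force confirmation in Python for the symmetric group on five letters (the function names are my own).

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def cycle_type(perm):
    """Sorted cycle lengths of a permutation given as the tuple of images of 0..D-1."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths))

def centraliser_size(lengths):
    """Product of j^(a_j) * a_j!, where a_j is the number of cycles of length j."""
    return prod(j ** a * factorial(a) for j, a in Counter(lengths).items())

D = 5
class_sizes = Counter(cycle_type(s) for s in permutations(range(D)))

# Orbit-stabiliser: (size of conjugacy class) * (size of centraliser) = D!
checks = {lengths: size * centraliser_size(lengths) for lengths, size in class_sizes.items()}
print(checks)
```

For instance, the three cycle types in the symmetric group on three letters have centralisers of size 6, 2 and 3.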

Example 2 We illustrate the identity (11) when , . The Schur polynomials are given as where are the (generalised) eigenvalues of , and the formula (11) in this case becomes

The functions are orthonormal on , so the three functions are also, and their norms are , , and respectively, reflecting the size in of the centralisers of the permutations , , and respectively. If is instead set to say , then the terms now disappear (the Young tableau here has too many rows), and the three quantities here now have some non-trivial covariance.

Example 3 Consider the moment ${\bf E}_{CUE} |\mathrm{tr} U^j|^2$. For $j \leq N$, the above proposition shows us that this moment is equal to $j$. What happens for $j > N$? The formula (12) computes this moment as

$\displaystyle \sum_\lambda |\chi_\lambda(\pi)|^2$

where $\pi$ is a cycle of length $j$ in $S_j$, and $\lambda$ ranges over all Young tableaux with size $j$ and at most $N$ rows. The Murnaghan-Nakayama rule tells us that $\chi_\lambda(\pi)$ vanishes unless $\lambda$ is a hook (all but one of the non-zero rows consisting of just a single box; this also can be interpreted as an exterior power representation on the space of vectors in ${\bf C}^j$ whose coordinates sum to zero), in which case it is equal to $\pm 1$ (depending on the parity of the number of non-zero rows). As such we see that this moment is equal to $N$. Thus in general we have

$\displaystyle {\bf E}_{CUE} |\mathrm{tr} U^j|^2 = \min(j, N). \ \ \ \ \ (13)$
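The conclusion of this example, that the CUE expectation of |tr U^j|^2 equals min(j, N), can be confirmed exactly (up to rounding) for small N. The sketch below is one way to do this. It assumes the Weyl integration formula with eigenvalue density |V|^2/N! on the torus. Since the integrand is a trigonometric polynomial whose frequencies are all smaller than N + j in magnitude, the integral equals the average over a grid of (N + j)-th roots of unity. The function name is hypothetical.

```python
from itertools import product
from math import factorial
import numpy as np

def cue_second_moment(N, j):
    """E |tr U^j|^2 for Haar-distributed U in U(N), via the Weyl integration formula."""
    M = N + j                                    # every frequency is < M in absolute value
    grid = np.array(list(product(range(M), repeat=N))) / M    # points of ((1/M)Z/Z)^N
    z = np.exp(2j * np.pi * grid)                # eigenvalues, shape (M^N, N)
    vandermonde_sq = np.ones(len(z))
    for a in range(N):
        for b in range(a + 1, N):
            vandermonde_sq *= np.abs(z[:, a] - z[:, b]) ** 2
    trace_sq = np.abs((z ** j).sum(axis=1)) ** 2
    # The grid average of a trigonometric polynomial with no aliasing is its integral.
    return (vandermonde_sq * trace_sq).mean() / factorial(N)

moments = {(N, j): cue_second_moment(N, j) for N in (2, 3) for j in range(1, 6)}
for (N, j), value in moments.items():
    print(N, j, round(value, 10), min(j, N))
```

The same quadrature trick works for any of the moments (6), at the cost of a grid whose size grows exponentially in N.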

Now we discuss what is known for the analogous moments (5). Here we shall be rather non-rigorous, in particular ignoring an annoying “Archimedean” issue that the product of the ranges and is not quite the range but instead leaks into the adjacent range . This issue can be addressed by working in a “weak” sense in which parameters such as are averaged over fairly long scales, or by passing to a function field analogue of these questions, but we shall simply ignore the issue completely and work at a heuristic level only. For similar reasons we will ignore some technical issues arising from the sharp cutoff of to the range (it would be slightly better technically to use a smooth cutoff).

One can morally expand out (5) using (4) as

where , , and the integers are in the ranges

for and , and

for and . Morally, the expectation here is negligible unless

in which case the expectation oscillates with magnitude one. In particular, if (7) fails (with some room to spare) then the moment (5) should be negligible, which is consistent with the analogous behaviour for the moments (6). Now suppose that (8) holds (with some room to spare). Then is significantly less than , so the multiplicative error in (15) becomes an additive error of . On the other hand, because of the fundamental *integrality gap* – that the integers are always separated from each other by a distance of at least – this forces the integers , to in fact be equal:

The von Mangoldt factors effectively restrict to be prime (the effect of prime powers is negligible). By the fundamental theorem of arithmetic, the constraint (16) then forces , and to be a permutation of , which then forces for all . For a given , the number of possible is then , and the expectation in (14) is equal to . Thus this expectation is morally

and using Mertens’ theorem this soon simplifies asymptotically to the same quantity in Proposition 1. Thus we see that (morally at least) the moments (5) associated to the zeta function asymptotically match the moments (6) coming from the CUE model in the low degree case (8), thus lending support to the GUE hypothesis. (These observations are basically due to Rudnick and Sarnak, with the degree case of pair correlations due to Montgomery, and the degree case due to Hejhal.)

With some rare exceptions (such as those estimates coming from “Kloostermania”), the moment estimates of Rudnick and Sarnak basically represent the state of the art for what is known for the moments (5). For instance, Montgomery’s pair correlation conjecture, in our language, is basically the analogue of (13) for , thus

for all . Montgomery showed this for (essentially) the range (as remarked above, this is a special case of the Rudnick-Sarnak result), but no further cases of this conjecture are known.

These estimates can be used to give some non-trivial information on the largest and smallest spacings between zeroes of the zeta function, which in our notation corresponds to spacing between eigenvalues of . One such method used today for this is due to Montgomery and Odlyzko and was greatly simplified by Conrey, Ghosh, and Gonek. The basic idea, translated to our random matrix notation, is as follows. Suppose is some random polynomial depending on of degree at most . Let denote the eigenvalues of , and let be a parameter. Observe from the pigeonhole principle that if the quantity

then the arcs cannot all be disjoint, and hence there exists a pair of eigenvalues making an angle of less than ( times the mean angle separation). Similarly, if the quantity (18) falls below that of (19), then these arcs cannot cover the unit circle, and hence there exists a pair of eigenvalues making an angle of greater than times the mean angle separation. By judiciously choosing the coefficients of as functions of the moments , one can ensure that both quantities (18), (19) can be computed by the Rudnick-Sarnak estimates (or estimates of equivalent strength); indeed, from the residue theorem one can write (18) as

for sufficiently small , and this can be computed (in principle, at least) using (3) if the coefficients of are in an appropriate form. Using this sort of technology (translated back to the Riemann zeta function setting), one can show that gaps between consecutive zeroes of zeta are less than times the mean spacing and greater than times the mean spacing infinitely often for certain ; the current records are (due to Goldston and Turnage-Butterbaugh) and (due to Bui and Milinovich, who input some additional estimates beyond the Rudnick-Sarnak set, namely the twisted fourth moment estimates of Bettin, Bui, Li, and Radziwill, and using a technique based on Hall’s method rather than the Montgomery-Odlyzko method).

It would be of great interest if one could push the upper bound for the smallest gap below . The reason for this is that this would then exclude the Alternative Hypothesis that the spacings between zeroes are asymptotically always (or almost always) a non-zero half-integer multiple of the mean spacing, or in our language that the gaps between the phases of the eigenvalues of are asymptotically always non-zero integer multiples of . The significance of this hypothesis is that it is implied by the existence of a Siegel zero (of conductor a small power of ); see this paper of Conrey and Iwaniec. (In our language, what is going on is that if there is a Siegel zero in which is very close to zero, then behaves like the Kronecker delta, and hence (by the Riemann-Siegel formula) the combined L-function will have a polynomial approximation which in our language looks like a scalar multiple of , where and is a phase. The zeroes of this approximation lie on a coset of the roots of unity; the polynomial is a factor of this approximation and hence its zeroes will also lie in this coset, implying in particular that all eigenvalue spacings are multiples of . Taking then gives the claim.)

Unfortunately, the known methods do not seem to break this barrier without some significant new input; already the original paper of Montgomery and Odlyzko observed this limitation for their particular technique (and in fact their technique falls very slightly short, as observed in unpublished work of Goldston and of Milinovich). In this post I would like to record another way to see this, by providing an “alternative” probability distribution to the CUE distribution (which one might dub the *Alternative Circular Unitary Ensemble* (ACUE)) which is indistinguishable in low moments in the sense that the expectation for this model also obeys Proposition 1, but for which the phase spacings are always a multiple of . This shows that if one is to rule out the Alternative Hypothesis (and thus in particular rule out Siegel zeroes), one needs to input some additional moment information beyond Proposition 1. It would be interesting to see if any of the other known moment estimates that go beyond this proposition are consistent with this alternative distribution. (UPDATE: it looks like they are, see Remark 7 below.)

To describe this alternative distribution, let us first recall the Weyl description of the CUE measure on the unitary group $U(N)$ in terms of the distribution of the phases $e(\theta_1),\dots,e(\theta_N)$ of the eigenvalues, randomly permuted in any order. This distribution is given by the probability measure

$\displaystyle \frac{1}{N!} |V(\theta)|^2\ d\theta_1 \dots d\theta_N; \quad \theta_1,\dots,\theta_N \in {\bf R}/{\bf Z} \ \ \ \ \ (20)$

where

$\displaystyle V(\theta) := \prod_{1 \leq i < j \leq N} (e(\theta_i) - e(\theta_j))$

is the Vandermonde determinant; see for instance this previous blog post for the derivation of a very similar formula for the GUE distribution, which can be adapted to CUE without much difficulty. To see that this is a probability measure, first observe the Vandermonde determinant identity

where , denotes the dot product, and is the “long word”, which implies that (20) is a trigonometric series with constant term ; it is also clearly non-negative, so it is a probability measure. One can thus generate a random CUE matrix by first drawing using the probability measure (20), and then generating to be a random unitary matrix with eigenvalues .
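Here is a small numerical illustration of this expansion in Python. It assumes the squared Vandermonde determinant is expanded as a double sum over permutations with signs, and it takes the exponent vector to be (0, 1, ..., N-1). This normalisation is my own choice, and a different one changes V only by a phase, which does not affect |V|^2. The sketch compares the two sides at a random point and counts the diagonal terms, which give the constant term N!.

```python
from itertools import permutations
from math import factorial
import numpy as np

N = 3
rho = np.arange(N)                       # assumed normalisation of the exponent vector
perms = list(permutations(range(N)))

def sign(perm):
    """Sign of a permutation, computed from its number of inversions."""
    inversions = sum(perm[a] > perm[b] for a in range(N) for b in range(a + 1, N))
    return -1 if inversions % 2 else 1

theta = np.random.default_rng(1).random(N)
z = np.exp(2j * np.pi * theta)

# Left-hand side: |V(theta)|^2 as a product over pairs of eigenvalues.
lhs = np.prod([abs(z[a] - z[b]) ** 2 for a in range(N) for b in range(a + 1, N)])

# Right-hand side: sum over pairs of permutations of signed plane waves.
rhs = sum(sign(s) * sign(t) * np.exp(2j * np.pi * theta @ (rho[list(s)] - rho[list(t)]))
          for s in perms for t in perms)

# Only the diagonal pairs s = t have zero frequency, each with sign +1.
constant_term = sum(sign(s) * sign(t) for s in perms for t in perms if s == t)
print(lhs, rhs, constant_term)
```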

For the alternative distribution, we first draw on the discrete torus (thus each is a root of unity) with probability density function

shift by a phase drawn uniformly at random, and then select to be a random unitary matrix with eigenvalues . Let us first verify that (21) is a probability density function. Clearly it is non-negative. It is the linear combination of exponentials of the form for . The diagonal contribution gives the constant function , which has total mass one. All of the other exponentials have a frequency that is not a multiple of , and hence will have mean zero on . The claim follows.

From construction it is clear that the matrix drawn from this alternative distribution will have all eigenvalue phase spacings be a non-zero multiple of . Now we verify that the alternative distribution also obeys Proposition 1. The alternative distribution remains invariant under rotation by phases, so the claim is again clear when (8) fails. Inspecting the proof of that proposition, we see that it suffices to show that the Schur polynomials with of size at most and of equal size remain orthonormal with respect to the alternative measure. That is to say,

when have size equal to each other and at most . In this case the phase in the definition of is irrelevant. In terms of eigenvalue measures, we are then reduced to showing that

By Fourier decomposition, it then suffices to show that the trigonometric polynomial does not contain any components of the form for some non-zero lattice vector . But we have already observed that is a linear combination of plane waves of the form for . Also, as is well known, is a linear combination of plane waves where is majorised by , and similarly is a linear combination of plane waves where is majorised by . So the product is a linear combination of plane waves of the form . But every coefficient of the vector lies between and , and so cannot be of the form for any non-zero lattice vector , giving the claim.

Example 4 If , then the distribution (21) assigns a probability of to any pair that is a permuted rotation of , and a probability of to any pair that is a permuted rotation of . Thus, a matrix drawn from the alternative distribution will be conjugate to a phase rotation of with probability , and to with probability . A similar computation when gives conjugate to a phase rotation of with probability , to a phase rotation of or its adjoint with probability of each, and a phase rotation of with probability .
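Computations of this type are easy to automate. The Python sketch below rests on an assumption about (21) that I should flag clearly: I take the discrete torus to consist of the phases that are multiples of 1/2N (so that each eigenvalue is a 2N-th root of unity, matching the Alternative Hypothesis spacing), with weight |V|^2 divided by N! (2N)^N at each point. Under that reading it checks that the weights sum to one, and groups the two-dimensional case by the gap between the two phases. The function name is hypothetical.

```python
import cmath
from collections import defaultdict
from itertools import product
from math import factorial

def acue_weights(N):
    """Assumed form of (21): weight |V|^2 / (N! (2N)^N) at theta = k / 2N, keyed by k."""
    weights = {}
    for ks in product(range(2 * N), repeat=N):
        z = [cmath.exp(2j * cmath.pi * k / (2 * N)) for k in ks]
        v_sq = 1.0
        for a in range(N):
            for b in range(a + 1, N):
                v_sq *= abs(z[a] - z[b]) ** 2
        weights[ks] = v_sq / (factorial(N) * (2 * N) ** N)
    return weights

total_2 = sum(acue_weights(2).values())
total_3 = sum(acue_weights(3).values())

# For N = 2, group the mass by the phase gap, measured in units of 1/4 of the circle.
by_gap = defaultdict(float)
for (k1, k2), w in acue_weights(2).items():
    gap = min((k1 - k2) % 4, (k2 - k1) % 4)
    by_gap[gap] += w
print(total_2, total_3, dict(by_gap))
```

Under this reading, in two dimensions half of the mass sits on pairs of eigenvalues a quarter-turn apart and half on antipodal pairs, with none on repeated eigenvalues.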

Remark 5 For large it does not seem that this specific alternative distribution is the only distribution consistent with Proposition 1 and which has all phase spacings a non-zero multiple of ; in particular, it may not be the only distribution consistent with a Siegel zero. Still, it is a very explicit distribution that might serve as a test case for the limitations of various arguments for controlling quantities such as the largest or smallest spacing between zeroes of zeta. The ACUE is in some sense the distribution that maximally resembles CUE (in the sense that it has the greatest number of Fourier coefficients agreeing) while still also being consistent with the Alternative Hypothesis, and so should be the most difficult enemy to eliminate if one wishes to disprove that hypothesis.

In some cases, even just a tiny improvement in known results would be able to exclude the alternative hypothesis. For instance, if the alternative hypothesis held, then is periodic in with period , so from Proposition 1 for the alternative distribution one has

which differs from (13) for any . (This fact was implicitly observed recently by Baluyot, in the original context of the zeta function.) Thus a verification of the pair correlation conjecture (17) for even a single with would rule out the alternative hypothesis. Unfortunately, such a verification appears to be of comparable difficulty to (an averaged version of) the Hardy-Littlewood conjecture, with power saving error term. (This is consistent with the fact that Siegel zeroes can cause distortions in the Hardy-Littlewood conjecture, as (implicitly) discussed in this previous blog post.)

Remark 6 One can view the CUE as normalised Lebesgue measure on (viewed as a smooth submanifold of ). One can similarly view ACUE as normalised Lebesgue measure on the (disconnected) smooth submanifold of consisting of those unitary matrices whose phase spacings are non-zero integer multiples of ; informally, ACUE is CUE restricted to this lower dimensional submanifold. As is well known, the phases of CUE eigenvalues form a determinantal point process with kernel (or one can equivalently take ); in a similar spirit, the phases of ACUE eigenvalues, once they are rotated to be roots of unity, become a discrete determinantal point process on those roots of unity with exactly the same kernel (except for a normalising factor of ). In particular, the -point correlation functions of ACUE (after this rotation) are precisely the restriction of the -point correlation functions of CUE after normalisation, that is to say they are proportional to .

Remark 7 One family of estimates that goes beyond the Rudnick-Sarnak family of estimates consists of twisted moment estimates for the zeta function, such as ones that give asymptotics for a twisted moment, for some small even exponent (almost always or ) and some short Dirichlet polynomial ; see for instance this paper of Bettin, Bui, Li, and Radziwill for some examples of such estimates. The analogous unitary matrix average would be something like

where is now some random medium degree polynomial that depends on the unitary matrix associated to (and in applications will typically also contain some negative power of to cancel the corresponding powers of in ). Unfortunately such averages generally are unable to distinguish the CUE from the ACUE. For instance, if all the coefficients of involve products of traces of total order less than , then in terms of the eigenvalue phases , is a linear combination of plane waves where the frequencies have coefficients of magnitude less than . On the other hand, as each coefficient of is an elementary symmetric function of the eigenvalues, is a linear combination of plane waves where the frequencies have coefficients of magnitude at most . Thus is a linear combination of plane waves where the frequencies have coefficients of magnitude less than , and thus is orthogonal to the difference between the CUE and ACUE measures on the phase torus by the previous arguments. In other words, has the same expectation with respect to ACUE as it does with respect to CUE. Thus one can only start distinguishing CUE from ACUE if the mollifier has degree close to or exceeding , which corresponds to Dirichlet polynomials of length close to or exceeding , which is far beyond current technology for such moment estimates.

Remark 8 The GUE hypothesis for the zeta function asserts that the average for any and any test function , where is the Dyson sine kernel and are the ordinates of zeroes of the zeta function. This corresponds to the CUE distribution for . The ACUE distribution then corresponds to an “alternative gaussian unitary ensemble (AGUE)” hypothesis, in which the average (22) is instead predicted to equal a Riemann sum version of the integral (23):

This is a stronger version of the alternative hypothesis that the spacing between adjacent zeroes is almost always approximately a half-integer multiple of the mean spacing. I do not know of any known moment estimates for Dirichlet series that are able to eliminate this AGUE hypothesis (even assuming GRH). (UPDATE: These facts have also been independently observed in forthcoming work of Lagarias and Rodgers.)

A useful rule of thumb in complex analysis is that holomorphic functions behave like large degree polynomials . This can be evidenced for instance at a “local” level by the Taylor series expansion for a complex analytic function in the disk, or at a “global” level by factorisation theorems such as the Weierstrass factorisation theorem (or the closely related Hadamard factorisation theorem). One can truncate these theorems in a variety of ways (e.g., Taylor’s theorem with remainder) to be able to approximate a holomorphic function by a polynomial on various domains.

In some cases it can be convenient instead to work with polynomials of another variable such as (or more generally for a scaling parameter ). In the case of the Riemann zeta function, defined by meromorphic continuation of the formula

one ends up having the following heuristic approximation in the neighbourhood of a point on the critical line:

Heuristic 1 (Polynomial approximation) Let be a height, let be a “typical” element of , and let be an integer. Let be the linear change of variables

The requirement is necessary since the right-hand side is periodic with period in the variable (or period in the variable), whereas the zeta function is not expected to have any such periodicity, even approximately.

Let us give two non-rigorous justifications of this heuristic. Firstly, it is standard that inside the critical strip (with ) we have an approximate form

of (11). If we group the integers from to into bins depending on what powers of they lie between, we thus have

For with and we heuristically have

and so

where are the partial Dirichlet series

This gives the desired polynomial approximation.

A second non-rigorous justification is as follows. From factorisation theorems such as the Hadamard factorisation theorem we expect to have

where runs over the non-trivial zeroes of , and there are some additional factors arising from the trivial zeroes and poles of which we will ignore here; we will also completely ignore the issue of how to renormalise the product to make it converge properly. In the region , the dominant contribution to this product (besides multiplicative constants) should arise from zeroes that are also in this region. The Riemann-von Mangoldt formula suggests that for “typical” one should have about such zeroes. If one lets be any enumeration of zeroes closest to , and then repeats this set of zeroes periodically by period , one then expects to have an approximation of the form

again ignoring all issues of convergence. If one writes and , then Euler’s famous product formula for sine basically gives

(here we are glossing over some technical issues regarding renormalisation of the infinite products, which can be dealt with by studying the asymptotics as ) and hence we expect

This again gives the desired polynomial approximation.
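The Euler product formula for sine invoked in this second justification can be sanity-checked numerically. The following is a minimal Python sketch, assuming the standard form sin(pi z) = pi z prod_{n >= 1} (1 - z^2/n^2); the function name `sine_product` and the truncation parameter `terms` are introduced here purely for illustration and do not come from the post.

```python
import cmath
import math


def sine_product(z, terms):
    """Truncated Euler product: pi * z * prod_{n=1}^{terms} (1 - z^2 / n^2)."""
    value = math.pi * z
    for n in range(1, terms + 1):
        value *= 1 - (z * z) / (n * n)
    return value


if __name__ == "__main__":
    z = 0.3 + 0.2j
    exact = cmath.sin(math.pi * z)
    for terms in (10, 1000, 100000):
        approx = sine_product(z, terms)
        # The discrepancy shrinks roughly like |z|^2 / terms.
        print(terms, abs(approx - exact))
```

The truncated product converges only slowly, with a relative error of order |z|^2 divided by the number of factors, so many factors are needed before the two sides agree to several digits.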

Below the fold we give a rigorous version of the second argument suitable for “microscale” analysis. More precisely, we will show

Theorem 2 Let be an integer going sufficiently slowly to infinity. Let go to zero sufficiently slowly depending on . Let be drawn uniformly at random from . Then with probability (in the limit ), and possibly after adjusting by , there exists a polynomial of degree and obeying the functional equation (9) below, such that

It should be possible to refine the arguments to extend this theorem to the mesoscale setting by letting be anything growing like , and anything growing like ; also we should be able to delete the need to adjust by . We have not attempted these optimisations here.

Many conjectures and arguments involving the Riemann zeta function can be heuristically translated into arguments involving the polynomials , which one can view as random degree polynomials if is interpreted as a random variable drawn uniformly at random from . These can be viewed as providing a “toy model” for the theory of the Riemann zeta function, in which the complex analysis is simplified to the study of the zeroes and coefficients of this random polynomial (for instance, the role of the gamma function is now played by a monomial in ). This model also makes the zeta function theory more closely resemble the function field analogues of this theory (in which the analogue of the zeta function is also a polynomial (or a rational function) in some variable , as per the Weil conjectures). The parameter is at our disposal to choose, and reflects the scale at which one wishes to study the zeta function. For “macroscopic” questions, at which one wishes to understand the zeta function at unit scales, it is natural to take (or very slightly larger), while for “microscopic” questions one would take close to and only growing very slowly with . For the intermediate “mesoscopic” scales one would take somewhere between and . Unfortunately, the statistical properties of are only understood well at a conjectural level at present; even if one assumes the Riemann hypothesis, our understanding of is largely restricted to the computation of low moments (e.g., the second or fourth moments) of various linear statistics of and related functions (e.g., , , or ).

Let’s now heuristically explore the polynomial analogues of this theory in a bit more detail. The Riemann hypothesis basically corresponds to the assertion that all the zeroes of the polynomial lie on the unit circle (which, after the change of variables , corresponds to being real); in a similar vein, the GUE hypothesis corresponds to having the asymptotic law of a random scalar times the characteristic polynomial of a random unitary matrix. Next, we consider what happens to the functional equation

A routine calculation involving Stirling’s formula reveals that

with ; one also has the closely related approximation

when . Since , applying (5) with and using the approximation (2) suggests a functional equation for :

where is the polynomial with all the coefficients replaced by their complex conjugate. Thus if we write

then the functional equation can be written as

We remark that if we use the heuristic (3) (interpreting the cutoffs in the summation in a suitably vague fashion) then this equation can be viewed as an instance of the Poisson summation formula.

Another consequence of the functional equation is that the zeroes of are symmetric with respect to inversion across the unit circle. This is of course consistent with the Riemann hypothesis, but does not obviously imply it. The phase is of little consequence in this functional equation; one could easily conceal it by working with the phase rotation of instead.

One consequence of the functional equation is that is real for any ; the same is then true for the derivative . Among other things, this implies that cannot vanish unless does also; thus the zeroes of will not lie on the unit circle except where has repeated zeroes. The analogous statement is true for ; the zeroes of will not lie on the critical line except where has repeated zeroes.
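These two consequences can be illustrated numerically. The sketch below is written under an assumption: since the displayed functional equation is not reproduced here, it takes the functional equation in the generic form P(z) = omega z^N conj(P(1/conj(z))) with a unit phase omega = exp(i phi), which in terms of coefficients reads a_k = omega conj(a_{N-k}). All function names and the parameter `phi` are hypothetical. The code checks that such a polynomial, after a phase rotation, is real on the unit circle, and that the identity linking P(z) to P(1/conj(z)) holds off the circle (which is what forces the zeroes to be symmetric under inversion across the unit circle).

```python
import cmath
import random


def self_inversive_coeffs(N, phi, rng):
    """Random coefficients a_0..a_N with a_k = omega * conj(a_{N-k}), omega = exp(i*phi)."""
    omega = cmath.exp(1j * phi)
    b = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N + 1)]
    # Symmetrising an arbitrary coefficient vector enforces the assumed relation.
    return [b[k] + omega * b[N - k].conjugate() for k in range(N + 1)]


def evaluate(coeffs, z):
    """Horner evaluation of sum_k coeffs[k] * z**k."""
    result = 0j
    for a in reversed(coeffs):
        result = result * z + a
    return result


def rotated_value(coeffs, phi, alpha):
    """exp(-i*(phi + N*alpha)/2) * P(exp(i*alpha)); expected to be real."""
    N = len(coeffs) - 1
    return cmath.exp(-1j * (phi + N * alpha) / 2) * evaluate(coeffs, cmath.exp(1j * alpha))


def functional_equation_defect(coeffs, phi, z):
    """|P(z) - omega * z^N * conj(P(1/conj(z)))|; expected to be near zero."""
    N = len(coeffs) - 1
    reflected = evaluate(coeffs, 1 / z.conjugate()).conjugate()
    return abs(evaluate(coeffs, z) - cmath.exp(1j * phi) * z ** N * reflected)


if __name__ == "__main__":
    rng = random.Random(0)
    phi = 0.7
    coeffs = self_inversive_coeffs(6, phi, rng)
    print(max(abs(rotated_value(coeffs, phi, 0.1 * j).imag) for j in range(63)))
    print(functional_equation_defect(coeffs, phi, 0.8 + 0.3j))
```

Both printed quantities should be at the level of floating-point rounding error. Since P(z) vanishes exactly when P(1/conj(z)) does (for non-zero z), the second check is a numerical restatement of the inversion symmetry of the zeroes.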

Relating to this fact, it is a classical result of Speiser that the Riemann hypothesis is true if and only if all the zeroes of the derivative of the zeta function in the critical strip lie on or to the *right* of the critical line. The analogous result for polynomials is

Proposition 3 We have (where all zeroes are counted with multiplicity.) In particular, the zeroes of all lie on the unit circle if and only if the zeroes of lie in the closed unit disk.

*Proof:* From the functional equation we have

Thus it will suffice to show that and have the same number of zeroes outside the closed unit disk.

Set , then is a rational function that does not have a zero or pole at infinity. For not a zero of , we have already seen that and are real, so on dividing we see that is always real, that is to say

(This can also be seen by writing , where runs over the zeroes of , and using the fact that these zeroes are symmetric with respect to reflection across the unit circle.) When is a zero of , has a simple pole at with residue a positive multiple of , and so stays on the right half-plane if one traverses a semicircular arc around outside the unit disk. From this and continuity we see that stays on the right half-plane in a circle slightly larger than the unit circle, and hence by the argument principle it has the same number of zeroes and poles outside of this circle, giving the claim.

From the functional equation and the chain rule, is a zero of if and only if is a zero of . We can thus write the above proposition in the equivalent form

One can use this identity to get a lower bound on the number of zeroes of by the method of mollifiers. Namely, for any other polynomial , we clearly have

By Jensen’s formula, we have for any that

We therefore have

As the logarithm function is concave, we can apply Jensen’s inequality to conclude

where the expectation is over the parameter. It turns out that by choosing the mollifier carefully in order to make behave like the function (while keeping the degree small enough that one can compute the second moment here), and then optimising in , one can use this inequality to get a positive fraction of zeroes of on the unit circle on average. This is the polynomial analogue of a classical argument of Levinson, who used this to show that at least one third of the zeroes of the Riemann zeta function are on the critical line; all later improvements on this fraction have been based on some version of Levinson’s method, mainly focusing on more advanced choices for the mollifier and of the differential operator that implicitly appears in the above approach. (The most recent lower bound I know of is , due to Pratt and Robles. In principle (as observed by Farmer) this bound can get arbitrarily close to if one is allowed to use arbitrarily long mollifiers, but establishing this seems of comparable difficulty to unsolved problems such as the pair correlation conjecture; see this paper of Radziwill for more discussion.) A variant of these techniques can also establish “zero density estimates” of the following form: for any , the number of zeroes of that lie further than from the unit circle is of order on average for some absolute constant . Thus, roughly speaking, most zeroes of lie within of the unit circle. (Analogues of these results for the Riemann zeta function were worked out by Selberg, by Jutila, and by Conrey, with increasingly strong values of .)

The zeroes of tend to live somewhat closer to the origin than the zeroes of . Suppose for instance that we write

where are the zeroes of , then by evaluating at zero we see that

and the right-hand side is of unit magnitude by the functional equation. However, if we differentiate

where are the zeroes of , then by evaluating at zero we now see that

The right-hand side would now be typically expected to be of size , and so on average we expect the to have magnitude like , that is to say pushed inwards from the unit circle by a distance roughly . The analogous result for the Riemann zeta function is that the zeroes of at height lie at a distance roughly to the right of the critical line on the average; see this paper of Levinson and Montgomery for a precise statement.

(This post is mostly intended for my own reference, as I found myself repeatedly looking up several conversions between polynomial bases on various occasions.)

Let denote the vector space of polynomials of one variable with real coefficients of degree at most . This is a vector space of dimension , and the sequence of these spaces forms a filtration:

A standard basis for these vector spaces is given by the monomials : every polynomial in can be expressed uniquely as a linear combination of the first monomials . More generally, if one has any sequence of polynomials, with each of degree exactly , then an easy induction shows that forms a basis for .

In particular, if we have *two* such sequences and of polynomials, with each of degree and each of degree , then must be expressible uniquely as a linear combination of the polynomials , thus we have an identity of the form

for some *change of basis coefficients* . These coefficients describe how to convert a polynomial expressed in the basis into a polynomial expressed in the basis.
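To make the notion of change of basis coefficients concrete, here is a minimal Python sketch that computes them by triangular elimination in exact rational arithmetic. The helper names (`change_of_basis`, `monomials`, `shifted_monomials`, `falling_factorials`) and the representation of a polynomial as a list of monomial coefficients indexed by degree are assumptions of this sketch. The two example bases (shifted monomials (x+1)^n and falling factorials x(x-1)...(x-n+1)) are chosen to match the examples discussed below.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = degree)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def monomials(n):
    """The basis 1, x, ..., x^n."""
    return [[Fraction(0)] * k + [Fraction(1)] for k in range(n + 1)]


def shifted_monomials(n, h=1):
    """The basis 1, (x+h), ..., (x+h)^n."""
    basis = [[Fraction(1)]]
    for _ in range(n):
        basis.append(poly_mul(basis[-1], [Fraction(h), Fraction(1)]))
    return basis


def falling_factorials(n):
    """The basis 1, x, x(x-1), ..., x(x-1)...(x-n+1)."""
    basis = [[Fraction(1)]]
    for k in range(n):
        basis.append(poly_mul(basis[-1], [Fraction(-k), Fraction(1)]))
    return basis


def change_of_basis(target, basis):
    """Coefficients c_k with target = sum_k c_k * basis[k].

    Requires basis[k] to have degree exactly k, so the system is triangular
    and can be solved from the top degree downwards.
    """
    remainder = [Fraction(c) for c in target]
    coeffs = [Fraction(0)] * len(remainder)
    for k in range(len(remainder) - 1, -1, -1):
        coeffs[k] = remainder[k] / basis[k][k]
        for j, b in enumerate(basis[k]):
            remainder[j] -= coeffs[k] * b
    return coeffs


if __name__ == "__main__":
    n = 3
    print(change_of_basis(shifted_monomials(n)[n], monomials(n)))
    print(change_of_basis(falling_factorials(n)[n], monomials(n)))
    print(change_of_basis(monomials(n)[n], falling_factorials(n)))
```

For n = 3 the three printed coefficient vectors are a row of Pascal's triangle, the signed coefficients of x(x-1)(x-2) = x^3 - 3x^2 + 2x, and the coefficients expressing x^3 in falling factorials.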

Many standard combinatorial quantities involving two natural numbers can be interpreted as such change of basis coefficients. The most familiar examples are the binomial coefficients , which measure the conversion from the shifted monomial basis to the monomial basis , thanks to (a special case of) the binomial formula:

thus for instance

More generally, for any shift , the conversion from to is measured by the coefficients , thanks to the general case of the binomial formula.

But there are other bases of interest too. For instance if one uses the falling factorial basis

then the conversion from falling factorials to monomials is given by the Stirling numbers of the first kind :

thus for instance

and the conversion back is given by the Stirling numbers of the second kind :

thus for instance

If one uses the binomial functions as a basis instead of the falling factorials, one of course can rewrite these conversions as

and

thus for instance

and

As a slight variant, if one instead uses rising factorials

then the conversion to monomials yields the unsigned Stirling numbers of the first kind:

thus for instance

One final basis comes from the polylogarithm functions

For instance one has

and more generally one has

for all natural numbers and some polynomial of degree (the *Eulerian polynomials*), which when converted to the monomial basis yields the (shifted) Eulerian numbers

For instance

These particular coefficients also have useful combinatorial interpretations. For instance:

- The binomial coefficient is of course the number of -element subsets of .
- The unsigned Stirling numbers of the first kind are the number of permutations of with exactly cycles. The signed Stirling numbers are then given by the formula .
- The Stirling numbers of the second kind are the number of ways to partition into non-empty subsets.
- The Eulerian numbers are the number of permutations of with exactly ascents.
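These combinatorial interpretations can be checked by brute force for small sizes. The Python sketch below counts subsets, permutations by number of cycles, set partitions by number of blocks, and permutations by number of ascents, and compares each tally with a table generated from a two-term recurrence. The recurrences used are the standard ones, written in a notation assumed for this sketch: B(n+1,k) = B(n,k) + B(n,k-1) for binomial coefficients, c(n+1,k) = n c(n,k) + c(n,k-1) for unsigned Stirling numbers of the first kind, S(n+1,k) = k S(n,k) + S(n,k-1) for Stirling numbers of the second kind, and A(n+1,k) = (k+1) A(n,k) + (n+1-k) A(n,k-1) for Eulerian numbers indexed by ascents. The indexing convention for the Eulerian numbers in particular is an assumption and may differ from the shifted convention used above.

```python
from itertools import combinations, permutations


def cycle_count(perm):
    """Number of cycles of a permutation of range(n) given in one-line form."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


def ascent_count(perm):
    """Number of positions i with perm[i] < perm[i+1]."""
    return sum(1 for a, b in zip(perm, perm[1:]) if a < b)


def count_set_partitions(n, k):
    """Count partitions of an n-element set into k non-empty blocks.

    Enumerates restricted growth strings: each new element either joins an
    existing block or opens the next unused block.
    """
    def extend(length, blocks):
        if length == n:
            return 1 if blocks == k else 0
        return sum(extend(length + 1, max(blocks, label + 1))
                   for label in range(blocks + 1))
    return extend(0, 0)


def tally_permutations(n, statistic):
    """counts[k] = number of permutations of range(n) with statistic equal to k."""
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        counts[statistic(perm)] += 1
    return counts


def recurrence_row(n, weight_same, weight_prev):
    """Row n of T(m+1,k) = weight_same(m,k)*T(m,k) + weight_prev(m,k)*T(m,k-1), T(0,0)=1."""
    row = [1]
    for m in range(n):
        new = [0] * (m + 2)
        for k in range(m + 2):
            same = row[k] if k <= m else 0
            prev = row[k - 1] if k >= 1 else 0
            new[k] = weight_same(m, k) * same + weight_prev(m, k) * prev
        row = new
    return row


if __name__ == "__main__":
    n = 4
    print([len(list(combinations(range(n), k))) for k in range(n + 1)],
          recurrence_row(n, lambda m, k: 1, lambda m, k: 1))
    print(tally_permutations(n, cycle_count),
          recurrence_row(n, lambda m, k: m, lambda m, k: 1))
    print([count_set_partitions(n, k) for k in range(n + 1)],
          recurrence_row(n, lambda m, k: k, lambda m, k: 1))
    print(tally_permutations(n, ascent_count),
          recurrence_row(n, lambda m, k: k + 1, lambda m, k: m + 1 - k))
```

Each printed pair should consist of two identical lists, one obtained by enumeration and one from the recurrence.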

These coefficients behave similarly to each other in several ways. For instance, the binomial coefficients obey the well known Pascal identity

(with the convention that vanishes outside of the range ). In a similar spirit, the unsigned Stirling numbers of the first kind obey the identity

and the signed counterparts obey the identity

The Stirling numbers of the second kind obey the identity

and the Eulerian numbers obey the identity

I was pleased to learn this week that the 2019 Abel Prize was awarded to Karen Uhlenbeck. Uhlenbeck laid much of the foundations of modern geometric PDE. One of the few papers I have in this area is in fact a joint paper with Gang Tian extending a famous singularity removal theorem of Uhlenbeck for four-dimensional Yang-Mills connections to higher dimensions. In both these papers, it is crucial to be able to construct “Coulomb gauges” for various connections, and there is a clever trick of Uhlenbeck for doing so, introduced in another important paper of hers, which is absolutely critical in my own paper with Tian. Nowadays it would be considered a standard technique, but it was definitely not so at the time that Uhlenbeck introduced it.

Suppose one has a smooth connection on a (closed) unit ball in for some , taking values in some Lie algebra associated to a compact Lie group . This connection then has a curvature , defined in coordinates by the usual formula

It is natural to place the curvature in a scale-invariant space such as , and then the natural space for the connection would be the Sobolev space . It is easy to see from (1) and Sobolev embedding that if is bounded in , then will be bounded in . One can then ask the converse question: if is bounded in , is bounded in ? This can be viewed as asking whether the curvature equation (1) enjoys “elliptic regularity”.

There is a basic obstruction provided by gauge invariance. For any smooth gauge taking values in the Lie group, one can gauge transform to

and then a brief calculation shows that the curvature is conjugated to

This gauge symmetry does not affect the norm of the curvature tensor , but can make the connection extremely large in , since there is no control on how wildly can oscillate in space.

However, one can hope to overcome this problem by *gauge fixing*: perhaps if is bounded in , then one can make bounded in *after* applying a gauge transformation. The basic and useful result of Uhlenbeck is that this can be done if the norm of is sufficiently small (and then the conclusion is that is small in ). (For large connections there is a serious issue related to the Gribov ambiguity.) In my (much) later paper with Tian, we adapted this argument, replacing Lebesgue spaces by Morrey space counterparts. (This result was also independently obtained at about the same time by Meyer and Rivière.)

To make the problem elliptic, one can try to impose the *Coulomb gauge condition*

(also known as the *Lorenz gauge* or *Hodge gauge* in various papers), together with a natural boundary condition on that will not be discussed further here. This turns (1), (2) into a divergence-curl system that is elliptic at the linear level at least. Indeed if one takes the divergence of (1) using (2) one sees that

and if one could somehow ignore the nonlinear term then we would get the required regularity on by standard elliptic regularity estimates.

The problem is then how to handle the nonlinear term. If we already knew that was small in the right norm then one can use Sobolev embedding, Hölder’s inequality, and elliptic regularity to show that the second term in (3) is small compared to the first term, and so one could then hope to eliminate it by perturbative analysis. However, proving that is small in this norm is exactly what we are trying to prove! So this approach seems circular.

Uhlenbeck’s clever way out of this circularity is a textbook example of what is now known as a “continuity” argument. Instead of trying to work just with the original connection , one works with the rescaled connections for , with associated rescaled curvatures . If the original curvature is small in norm (e.g. bounded by some small ), then so are all the rescaled curvatures . We want to obtain a Coulomb gauge at time ; this is difficult to do directly, but it is trivial to obtain a Coulomb gauge at time , because the connection vanishes at this time. On the other hand, once one has successfully obtained a Coulomb gauge at some time with small in the natural norm (say bounded by for some constant which is large in absolute terms, but not so large compared with say ), the perturbative argument mentioned earlier (combined with the qualitative hypothesis that is smooth) actually works to show that a Coulomb gauge can also be constructed and be small for all sufficiently close *nearby* times to ; furthermore, the perturbative analysis actually shows that the nearby gauges enjoy a slightly better bound on the norm, say rather than . As a consequence of this, the set of times for which one has a good Coulomb gauge obeying the claimed estimates is both open and closed in , and also contains . Since the unit interval is connected, it must then also contain . This concludes the proof.

One of the lessons I drew from this example is to not be deterred (especially in PDE) by an argument seeming to be circular; if the argument is still sufficiently “nontrivial” in nature, it can often be modified into a usefully non-circular argument that achieves what one wants (possibly under an additional qualitative hypothesis, such as a continuity or smoothness hypothesis).

The celebrated decomposition theorem of Fefferman and Stein shows that every function of bounded mean oscillation can be decomposed in the form

modulo constants, for some , where are the Riesz transforms. A technical note here: a function in BMO is defined only up to constants (as well as up to the usual almost everywhere equivalence); related to this, if is an function, then the Riesz transform is well defined as an element of , but is also only defined up to constants and almost everywhere equivalence.

The original proof of Fefferman and Stein was indirect (relying for instance on the Hahn-Banach theorem). A constructive proof was later given by Uchiyama, and was in fact the topic of the second post on this blog. A notable feature of Uchiyama’s argument is that the construction is quite nonlinear; the vector-valued function is defined to take values on a sphere, and the iterative construction to build these functions from involves repeatedly projecting a potential approximant to this function to the sphere (also, the high-frequency components of this approximant are constructed in a manner that depends nonlinearly on the low-frequency components, which is a type of technique that has become increasingly common in analysis and PDE in recent years).

It is natural to ask whether the Fefferman-Stein decomposition (1) can be made linear in , in the sense that each of the depends linearly on . Strictly speaking this is easily accomplished using the axiom of choice: take a Hamel basis of , choose a decomposition (1) for each element of this basis, and then extend linearly to all finite linear combinations of these basis functions, which then cover by definition of Hamel basis. But these linear operations have no reason to be continuous as a map from to . So the correct question is whether the decomposition can be made *continuously linear* (or equivalently, boundedly linear) in , that is to say whether there exist continuous linear transformations such that

modulo constants for all . Note from the open mapping theorem that one can choose the functions to depend in a bounded fashion on (thus for some constant ); however, the open mapping theorem does not guarantee linearity. Using a result of Bartle and Graves one can also make the depend continuously on , but again the dependence is not guaranteed to be linear.

It is generally accepted folklore that continuous linear dependence is known to be impossible, but I had difficulty recently tracking down an explicit proof of this assertion in the literature (if anyone knows of a reference, I would be glad to know of it). The closest I found was a proof of a similar statement in this paper of Bourgain and Brezis, which I was able to adapt to establish the current claim. The basic idea is to average over the symmetries of the decomposition, which in the case of (1) are translation invariance, rotation invariance, and dilation invariance. This effectively makes the operators invariant under all these symmetries, which forces them to themselves be linear combinations of the identity and Riesz transform operators; however, no such non-trivial linear combination maps to , and the claim follows. Formal details of this argument (which we phrase in a dual form in order to avoid some technicalities) appear below the fold.

Let be a field, and let be a finite extension of that field; in this post we will denote such a relationship by . We say that is a Galois extension of if the cardinality of the automorphism group of fixing is as large as it can be, namely the degree of the extension. In that case, we call the Galois group of over and denote it also by . The fundamental theorem of Galois theory then gives a one-to-one correspondence (also known as the *Galois correspondence*) between the intermediate extensions between and and the subgroups of :

Theorem 1 (Fundamental theorem of Galois theory) Let be a Galois extension of .

- (i) If is an intermediate field between and , then is a Galois extension of , and is a subgroup of .
- (ii) Conversely, if is a subgroup of , then there is a unique intermediate field such that ; namely is the set of elements of that are fixed by .
- (iii) If and , then if and only if is a subgroup of .
- (iv) If is an intermediate field between and , then is a Galois extension of if and only if is a normal subgroup of . In that case, is isomorphic to the quotient group .

Example 2 Let , and let be the degree Galois extension formed by adjoining a primitive root of unity (that is to say, is the cyclotomic field of order ). Then is isomorphic to the multiplicative cyclic group (the invertible elements of the ring ). Amongst the intermediate fields, one has the cyclotomic fields of the form where divides ; they are also Galois extensions, with isomorphic to and isomorphic to the elements of such that modulo . (There can also be other intermediate fields, corresponding to other subgroups of .)
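The group-theoretic side of this example can be checked directly for small cases. The Python sketch below is written with assumed symbols: `n` stands for the order of the cyclotomic field, `m` for a divisor of `n`, and the function names are hypothetical. It verifies that the invertible residues modulo `n` that are congruent to 1 modulo `m` form a subgroup, and that its index equals the number of invertible residues modulo `m`, as the Galois correspondence for the intermediate cyclotomic field would predict.

```python
from math import gcd


def units(n):
    """Invertible residues modulo n, represented in range(n)."""
    return [a for a in range(n) if gcd(a, n) == 1]


def congruent_to_one(n, m):
    """Units modulo n that reduce to 1 modulo the divisor m."""
    return [a for a in units(n) if (a - 1) % m == 0]


def is_subgroup(subset, n):
    """A non-empty finite subset of a group closed under the operation is a subgroup."""
    members = set(subset)
    return bool(members) and all((a * b) % n in members
                                 for a in members for b in members)


if __name__ == "__main__":
    n = 12
    for m in (d for d in range(1, n + 1) if n % d == 0):
        kernel = congruent_to_one(n, m)
        # Index of the subgroup versus the size of the unit group modulo m.
        print(m, kernel, is_subgroup(kernel, n),
              len(units(n)) // len(kernel), len(units(m)))
```

For each divisor `m` the last two printed numbers should agree.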

Example 3 Let be the field of rational functions of one indeterminate with complex coefficients, and let be the field formed by adjoining an root to , thus . Then is a degree Galois extension of with Galois group isomorphic to (with an element corresponding to the field automorphism of that sends to ). The intermediate fields are of the form where divides ; they are also Galois extensions, with isomorphic to and isomorphic to the multiples of in .

There is an analogous Galois correspondence in the covering theory of manifolds. For simplicity we restrict attention to finite covers. If is a connected manifold and is a finite covering map of by another connected manifold , we denote this relationship by . (Later on we will change our function notations slightly and write in place of the more traditional , and similarly for the deck transformations below; more on this below the fold.) If , we can define to be the group of deck transformations: continuous maps which preserve the fibres of . We say that this covering map is a *Galois cover* if the cardinality of the group is as large as it can be. In that case we call the *Galois group* of over and denote it by .

Suppose is a finite cover of . An *intermediate cover* between and is a cover of by , such that , in such a way that the covering maps are compatible, in the sense that is the composition of and . This sort of compatibility condition will be implicitly assumed whenever we chain together multiple instances of the notation. Two intermediate covers are *equivalent* if they cover each other, in a fashion compatible with all the other covering maps, thus and . We then have the analogous Galois correspondence:

Theorem 4 (Fundamental theorem of covering spaces) Let be a Galois covering.

- (i) If is an intermediate cover between and , then is a Galois cover of , and is a subgroup of .
- (ii) Conversely, if is a subgroup of , then there is an intermediate cover , unique up to equivalence, such that .
- (iii) If and , then if and only if is a subgroup of .
- (iv) If , then is a Galois cover of if and only if is a normal subgroup of . In that case, is isomorphic to the quotient group .

Example 5 Let , and let be the -fold cover of with covering map . Then is a Galois cover of , and is isomorphic to the cyclic group . The intermediate covers are (up to equivalence) of the form with covering map where divides ; they are also Galois covers, with isomorphic to and isomorphic to the multiples of in .
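The subgroup structure underlying this example (and Example 3) is that of a finite cyclic group, whose subgroups are exactly the sets of multiples of a divisor of the group order. A minimal Python sketch of this fact follows; the symbols `n` (the order of the cyclic group) and `m` (a divisor of `n`), as well as the function names, are assumptions of the sketch.

```python
def generated_subgroup(g, n):
    """The subgroup of the additive group Z/nZ generated by g."""
    return frozenset((g * t) % n for t in range(n))


def all_subgroups(n):
    """Every subgroup of a cyclic group is cyclic, so these are all of them."""
    return {generated_subgroup(g, n) for g in range(n)}


def multiples(m, n):
    """The multiples of a divisor m of n inside Z/nZ (a subgroup of index m)."""
    return frozenset(range(0, n, m))


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


if __name__ == "__main__":
    n = 12
    for m in divisors(n):
        print(m, sorted(multiples(m, n)))
    # The two descriptions of the subgroup lattice coincide.
    print(all_subgroups(n) == {multiples(m, n) for m in divisors(n)})
```

In particular the number of subgroups equals the number of divisors of `n`, matching the count of intermediate covers (up to equivalence) in the example.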

Given the strong similarity between the two theorems, it is natural to ask if there is some more concrete connection between Galois theory and the theory of finite covers.

In one direction, if the manifolds have an algebraic structure (or a complex structure), then one can relate covering spaces to field extensions by considering the field of rational functions (or meromorphic functions) on the space. For instance, if and is the coordinate on , one can consider the field of rational functions on ; the -fold cover with coordinate from Example 5 similarly has a field of rational functions. The covering relates the two coordinates by the relation , at which point one sees that the rational functions on are a degree extension of that of (formed by adjoining the root of unity to ). In this way we see that Example 5 is in fact closely related to Example 3.

Exercise 6 What happens if one uses meromorphic functions in place of rational functions in the above example? (To answer this question, I found it convenient to use a discrete Fourier transform associated to the multiplicative action of the roots of unity on to decompose the meromorphic functions on as a linear combination of functions invariant under this action, times a power of the coordinate for .)

I was curious however about the reverse direction. Starting with some field extensions , is it possible to create manifold-like spaces associated to these fields in such a fashion that (say) behaves like a “covering space” to with a group of deck transformations isomorphic to , so that the Galois correspondences agree? Also, given how the notion of a path (and associated concepts such as loops, monodromy and the fundamental group) plays a prominent role in the theory of covering spaces, can spaces such as or also come with a notion of a path that is somehow compatible with the Galois correspondence?

The standard answer from modern algebraic geometry (as articulated for instance in this nice MathOverflow answer by Minhyong Kim) is to set equal to the spectrum of the field . As a set, the spectrum of a commutative ring is defined as the set of prime ideals of . Generally speaking, the map that maps a commutative ring to its spectrum tends to act like an inverse of the operation that maps a space to a ring of functions on that space. For instance, if one considers the commutative ring of regular functions on , then each point in gives rise to the prime ideal , and one can check that these are the only such prime ideals (other than the zero ideal ), giving an almost one-to-one correspondence between and . (The zero ideal corresponds instead to the generic point of .)

Of course, the spectrum of a field such as is just a point, as the zero ideal is the only prime ideal. Naively, it would then seem that there is not enough space inside such a point to support a rich enough structure of paths to recover the Galois theory of this field. In modern algebraic geometry, one addresses this issue by considering not just the set-theoretic elements of , but more general “base points” that map from some other (affine) scheme to (one could also consider non-affine base points of course). One has to rework many of the fundamentals of the subject to accommodate this “relative point of view”, for instance replacing the usual notion of topology with an étale topology, but once one does so one obtains a very satisfactory theory.

As an exercise, I set myself the task of trying to interpret Galois theory as an analogue of covering space theory in a more classical fashion, without explicit reference to more modern concepts such as schemes, spectra, or étale topology. After some experimentation, I found a reasonably satisfactory way to do so as follows. The space that one associates with in this classical perspective is not the single point , but instead the much larger space consisting of ring homomorphisms from to arbitrary integral domains ; informally, consists of all the “models” or “representations” of (in the spirit of this previous blog post). (There is a technical set-theoretic issue here because the class of integral domains is a proper class, so that will also be a proper class; I will completely ignore such technicalities in this post.) We view each such homomorphism as a single point in . The analogous notion of a path from one point to another is then a homomorphism of integral domains, such that is the composition of with . Note that every prime ideal in the spectrum of a commutative ring gives rise to a point in the space defined here, namely the quotient map to the ring , which is an integral domain because is prime. So one can think of as being a distinguished subset of ; alternatively, one can think of as a sort of “penumbra” surrounding . In particular, when is a field, defines a special point in , namely the identity homomorphism .

Below the fold I would like to record this interpretation of Galois theory, by first revisiting the theory of covering spaces using paths as the basic building block, and then adapting that theory to the theory of field extensions using the spaces indicated above. This is not too far from the usual scheme-theoretic way of phrasing the connection between the two topics (basically I have replaced étale-type points with more classical points ), but I had not seen it explicitly articulated before, so I am recording it here for my own benefit and for any other readers who may be interested.

About six years ago on this blog, I started thinking about trying to make a web-based game based around high-school algebra, and ended up using Scratch to write a short but playable puzzle game in which one solves linear equations for an unknown using a restricted set of moves. (At almost the same time, there were a number of more professionally made games released along similar lines, most notably Dragonbox.)

Since then, I have thought a couple of times about whether there were other parts of mathematics which could be gamified in a similar fashion. Shortly after my first blog posts on this topic, I experimented with a similar gamification of Lewis Carroll’s classic list of logic puzzles, but the results were quite clunky, and I was never satisfied with them.

Over the last few weeks I returned to this topic though, thinking in particular about how to gamify the rules of inference of propositional logic, in a manner that at least vaguely resembles how mathematicians actually go about making logical arguments (e.g., splitting into cases, arguing by contradiction, using previous results as lemmas to help with subsequent ones, and so forth). The rules of inference are a list of a dozen or so deductive rules concerning propositional sentences (things like “( AND ) OR (NOT )”, where are some formulas). A typical such rule is Modus Ponens: if the sentence is known to be true, and the implication “ IMPLIES ” is also known to be true, then one can deduce that is also true. Furthermore, in this deductive calculus it is possible to temporarily introduce some unproven statements as an assumption, only to discharge them later. In particular, we have the deduction theorem: if, after making an assumption , one is able to derive the statement , then one can conclude that the implication “ IMPLIES ” is true without any further assumption.
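As a sanity check on a rule such as Modus Ponens, one can verify by truth table that the corresponding propositional sentence is a tautology. The Python sketch below is only an illustration with hypothetical function names; it is a semantic (truth-table) check, which is a different technique from the syntactic, rule-by-rule deduction that the game is built around, and it is not taken from the project's code.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'p IMPLIES q'."""
    return (not p) or q


def is_tautology(formula, num_vars):
    """Return True if formula holds under every truth assignment."""
    assignments = product([False, True], repeat=num_vars)
    return all(formula(*values) for values in assignments)


def modus_ponens(a, b):
    """(A AND (A IMPLIES B)) IMPLIES B, the sentence behind Modus Ponens."""
    return implies(a and implies(a, b), b)


def affirming_the_consequent(a, b):
    """(B AND (A IMPLIES B)) IMPLIES A, which is NOT a valid rule."""
    return implies(b and implies(a, b), a)


if __name__ == "__main__":
    print("Modus Ponens valid:", is_tautology(modus_ponens, 2))
    print("Affirming the consequent valid:",
          is_tautology(affirming_the_consequent, 2))
```

The second formula fails at the assignment where A is false and B is true, which is why no such rule appears among the rules of inference.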

It took a while for me to come up with a workable game-like graphical interface for all of this, but I finally managed to set one up, now using JavaScript instead of Scratch (which would be hopelessly inadequate for this task); indeed, part of the motivation of this project was to finally learn how to program in JavaScript, which turned out to be not as formidable as I had feared (certainly having experience with other C-like languages like C++, Java, or Lua, as well as some prior knowledge of HTML, was very helpful). The main code for this project is available here. Using this code, I have created an interactive textbook in the style of a computer game, which I have titled “QED”. This text contains thirty-odd exercises arranged in twelve sections that function as game “levels”, in which one has to use a given set of rules of inference, together with a given set of hypotheses, to reach a desired conclusion. The set of available rules increases as one advances through the text; in particular, each new section gives one or more rules, and additionally each exercise one solves automatically becomes a new deduction rule one can exploit in later levels, much as lemmas and propositions are used in actual mathematics to prove more difficult theorems. The text automatically tries to match available deduction rules to the sentences one clicks on or drags, to try to minimise the amount of manual input one needs to actually make a deduction.

Most of one’s proof activity takes place in a “root environment” of statements that are known to be true (under the given hypothesis), but for more advanced exercises one has to also work in sub-environments in which additional assumptions are made. I found the graphical metaphor of nested boxes to be useful to depict this tree of sub-environments, and it seems to combine well with the drag-and-drop interface.

The text also logs one’s moves in a more traditional proof format, which shows how the mechanics of the game correspond to a traditional mathematical argument. My hope is that this will give students a way to understand the underlying concept of forming a proof in a manner that is more difficult to achieve using traditional, non-interactive textbooks.

I have tried to organise the exercises in a game-like progression in which one first works with easy levels that train the player on a small number of moves, and then is introduced to more advanced moves one at a time. As such, the order in which the rules of inference are introduced is a little idiosyncratic. The most powerful rule (the law of the excluded middle, which is what separates classical logic from intuitionistic logic) is saved for the final section of the text.

Anyway, I am now satisfied enough with the state of the code and the interactive text that I am willing to make both available (and open source; I selected a CC-BY licence for both), and would be happy to receive feedback on any aspect of either. In principle one could extend the game mechanics to other mathematical topics than the propositional calculus – the rules of inference for first-order logic being an obvious next candidate – but it seems to make sense to focus just on propositional logic for now.

Let be a measure-preserving system – a probability space equipped with a measure-preserving translation (which for simplicity of discussion we shall assume to be invertible). We will informally think of two points in this space as being “close” if for some that is not too large; this allows one to distinguish between “local” structure at a point (in which one only looks at nearby points for moderately large ) and “global” structure (in which one looks at the entire space ). The local/global distinction is also known as the time-averaged/space-averaged distinction in ergodic theory.

A measure-preserving system is said to be ergodic if all the invariant sets are either zero measure or full measure. An equivalent form of this statement is that any measurable function which is *locally essentially constant* in the sense that for -almost every , is necessarily *globally essentially constant* in the sense that there is a constant such that for -almost every . A basic consequence of ergodicity is the mean ergodic theorem: if , then the averages converge in norm to the mean . (The mean ergodic theorem also applies to other spaces with , though it is usually proven first in the Hilbert space .) Informally: in ergodic systems, time averages are asymptotically equal to space averages. Specialising to the case of indicator functions, this implies in particular that converges to for any measurable set .
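The statement that time averages approach space averages can be observed numerically in a standard ergodic example. The sketch below is an illustration under assumptions not made in the text: it uses the rotation of the unit circle by the golden-ratio conjugate (an irrational rotation, which is ergodic for Lebesgue measure), a hypothetical helper `time_average`, and an arbitrarily chosen interval as the measurable set. It computes the fraction of time an orbit spends in the interval and compares it with the length of the interval.

```python
import math


def time_average(x0, alpha, a, b, n_steps):
    """Fraction of times 0 <= n < n_steps with (x0 + n*alpha) mod 1 in [a, b)."""
    hits = 0
    x = x0 % 1.0
    for _ in range(n_steps):
        if a <= x < b:
            hits += 1
        x = (x + alpha) % 1.0  # one step of the rotation x -> x + alpha mod 1
    return hits / n_steps


if __name__ == "__main__":
    alpha = (math.sqrt(5) - 1) / 2  # irrational rotation number (assumed choice)
    a, b = 0.2, 0.5                 # the set A = [0.2, 0.5), of measure 0.3
    for n_steps in (10, 100, 1000, 10000):
        avg = time_average(0.0, alpha, a, b, n_steps)
        print(f"N = {n_steps:5d}: time average = {avg:.4f}, space average = {b - a:.4f}")
```

For a rational rotation number the system is not ergodic, and the time average would in general depend on the starting point rather than converge to the measure of the set.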

In this short note I would like to use the mean ergodic theorem to show that ergodic systems also have the property that “somewhat locally constant” functions are necessarily “somewhat globally constant”; this is not a deep observation, and probably already in the literature, but I found it a cute statement that I had not previously seen. More precisely:

Corollary 1 Let be an ergodic measure-preserving system, and let be measurable. Suppose that

for some . Then there exists a constant such that for in a set of measure at least .

Informally: if is locally constant on pairs at least of the time, then is globally constant at least of the time. Of course the claim fails if the ergodicity hypothesis is dropped, as one can simply take to be an invariant function that is not essentially constant, such as the indicator function of an invariant set of intermediate measure. This corollary can be viewed as a manifestation of the general principle that ergodic systems have the same “global” (or “space-averaged”) behaviour as “local” (or “time-averaged”) behaviour, in contrast to non-ergodic systems in which local properties do not automatically transfer over to their global counterparts.

*Proof:* By composing with (say) the arctangent function, we may assume without loss of generality that is bounded. Let , and partition as , where is the level set

For each , only finitely many of the are non-empty. By (1), one has

Using the ergodic theorem, we conclude that

On the other hand, . Thus there exists such that , thus

By the Bolzano-Weierstrass theorem, we may pass to a subsequence where converges to a limit , then we have

for infinitely many , and hence

The claim follows.

Let , be additive groups (i.e., groups with an abelian addition group law). A map is a homomorphism if one has

for all . A map is an *affine* homomorphism if one has

for all *additive quadruples* in , by which we mean that and . The two notions are closely related; it is easy to verify that is an affine homomorphism if and only if is the sum of a homomorphism and a constant.
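The relation between the two notions can be tested by brute force in a small cyclic group. The following Python sketch is an illustration under stated assumptions: both groups are taken to be the integers modulo 7, an additive quadruple is read as four elements whose first two sum to the same value as the last two, and the helper names are hypothetical. It checks that a map of the form x -> 3x + 5 is an affine homomorphism but not a homomorphism, and that subtracting the constant value at 0 yields a homomorphism.

```python
from itertools import product


def is_homomorphism(f, n):
    """Check f(x + y) == f(x) + f(y) in Z/nZ for all x, y."""
    return all((f((x + y) % n) - f(x) - f(y)) % n == 0
               for x, y in product(range(n), repeat=2))


def is_affine_homomorphism(f, n):
    """Check f(x1) + f(x2) == f(x3) + f(x4) in Z/nZ on all additive quadruples.

    Assumption: an additive quadruple is (x1, x2, x3, x4) with
    x1 + x2 == x3 + x4, so x4 is determined by the other three.
    """
    for x1, x2, x3 in product(range(n), repeat=3):
        x4 = (x1 + x2 - x3) % n
        if (f(x1) + f(x2) - f(x3) - f(x4)) % n != 0:
            return False
    return True


if __name__ == "__main__":
    n = 7
    affine = lambda x: (3 * x + 5) % n            # homomorphism plus a constant
    linear = lambda x: (affine(x) - affine(0)) % n  # constant removed
    square = lambda x: (x * x) % n                # neither notion holds

    print(is_affine_homomorphism(affine, n), is_homomorphism(affine, n))
    print(is_affine_homomorphism(linear, n), is_homomorphism(linear, n))
    print(is_affine_homomorphism(square, n), is_homomorphism(square, n))
```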

Now suppose that also has a translation-invariant metric . A map is said to be a quasimorphism if one has

for all , where denotes a quantity at a bounded distance from the origin. Similarly, is an *affine quasimorphism* if

for all additive quadruples in . Again, one can check that is an affine quasimorphism if and only if it is the sum of a quasimorphism and a constant (with the implied constant of the quasimorphism controlled by the implied constant of the affine quasimorphism). (Since every constant is itself a quasimorphism, it is in fact the case that affine quasimorphisms are quasimorphisms, but now the implied constant in the latter is not controlled by the implied constant of the former.)

“Trivial” examples of quasimorphisms include the sum of a homomorphism and a bounded function. Are there others? In some cases, the answer is no. For instance, suppose we have a quasimorphism . Iterating (2), we see that for any integer and natural number , which we can rewrite as for non-zero . Also, is Lipschitz. Sending , we can verify that is a Cauchy sequence as and thus tends to some limit ; we have for , hence for positive , and then one can use (2) one last time to obtain for all . Thus is the sum of the homomorphism and a bounded sequence.
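A concrete instance of the argument above is the map from the integers to the reals sending n to the floor of n times a fixed real number. The Python sketch below is illustrative only: the choice of the square root of 2 as the real number, and the names `f` and `defect`, are assumptions made for the example. It checks numerically that the additive defect stays bounded, that f(n)/n approaches the fixed real number, and that f differs from the corresponding homomorphism by a bounded amount.

```python
import math

ALPHA = math.sqrt(2)  # assumed slope; any real number would do


def f(n):
    """A quasimorphism from the integers to the reals: n -> floor(ALPHA * n)."""
    return math.floor(ALPHA * n)


def defect(x, y):
    """The additive defect f(x + y) - f(x) - f(y)."""
    return f(x + y) - f(x) - f(y)


if __name__ == "__main__":
    span = range(-100, 101)
    defects = {defect(x, y) for x in span for y in span}
    print("defect values:", sorted(defects))

    for n in (10, 1000, 10**6):
        print(f"f({n}) / {n} = {f(n) / n:.6f}")

    worst = max(abs(f(n) - ALPHA * n) for n in span)
    print("max |f(n) - ALPHA * n| on the sample:", worst)
```

Here the limiting slope recovers the fixed real number, and the bounded remainder is the negative of the fractional part, so this example is "trivial" in exactly the sense described.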

In general, one can phrase this problem in the language of group cohomology (discussed in this previous post). Call a map a *-cocycle*. A *-cocycle* is a map obeying the identity

for all . Given a -cocycle , one can form its *derivative* by the formula

Such functions are called *-coboundaries*. It is easy to see that the abelian group of -coboundaries is a subgroup of the abelian group of -cocycles. The quotient of these two groups is the first group cohomology of with coefficients in , and is denoted .

If a -cocycle is bounded then its derivative is a bounded -coboundary. The quotient of the group of bounded -cocycles by the derivatives of bounded -cocycles is called the *bounded first group cohomology* of with coefficients in , and is denoted . There is an obvious homomorphism from to , formed by taking a coset of the space of derivatives of bounded -cocycles, and enlarging it to a coset of the space of -coboundaries. By chasing all the definitions, we see that all quasimorphisms from to are the sum of a homomorphism and a bounded function if and only if this homomorphism is injective; in fact the quotient of the space of quasimorphisms by the sum of homomorphisms and bounded functions is isomorphic to the kernel of .

In additive combinatorics, one is often working with functions which only have additive structure a fraction of the time, thus for instance (1) or (3) might only hold “ of the time”. This makes it somewhat difficult to directly interpret the situation in terms of group cohomology. However, thanks to tools such as the Balog-Szemerédi-Gowers lemma, one can upgrade this sort of -structure to -structure – at the cost of restricting the domain to a smaller set. Here I record one such instance of this phenomenon, thus giving a tentative link between additive combinatorics and group cohomology. (I thank Yuval Wigderson for suggesting the problem of locating such a link.)

Theorem 1 Let , be additive groups with , let be a subset of , let , and let be a function such that for additive quadruples in . Then there exists a subset of containing with , a subset of with , and a function such that

for all (thus, the derivative takes values in on ), and such that for each , one has

Presumably the constants and can be improved further, but we have not attempted to optimise these constants. We chose as the domain on which one has a bounded derivative, as one can use the Bogolyubov lemma (see e.g., Proposition 4.39 of my book with Van Vu) to find a large Bohr set inside . In applications, the set need not have bounded size, or even bounded doubling; for instance, in the inverse theory over a small finite field , one would be interested in the situation where is the group of matrices with coefficients in (for some large ), and being the subset consisting of those matrices of rank bounded by some bound .

*Proof:* By hypothesis, there are triples such that and

Thus, there is a set with such that for all , one has (6) for pairs with ; in particular, there exists such that (6) holds for values of . Setting , we conclude that for each , one has

Consider the bipartite graph whose vertex sets are two copies of , and and connected by a (directed) edge if and (7) holds. Then this graph has edges. Applying (a slight modification of) the Balog-Szemerédi-Gowers theorem (for instance by modifying the proof of Corollary 5.19 of my book with Van Vu), we can then find a subset of with with the property that for any , there exist triples such that the edges all lie in this bipartite graph. This implies that, for all , there exist septuples obeying the constraints

and for . These constraints imply in particular that

Also observe that

Thus, if and are such that , we see that

for octuples in the hyperplane

By the pigeonhole principle, this implies that for any fixed , there can be at most sets of the form with , that are pairwise disjoint. Using a greedy algorithm, we conclude that there is a set of cardinality , such that each set with , intersects for some , or in other words that

This implies that there exists a subset of with , and an element for each , such that

for all . Note we may assume without loss of generality that and .

By construction of , and permuting labels, we can find 16-tuples such that

and

for . We sum this to obtain

and hence by (8)

where . Since

we see that there are only possible values of . By the pigeonhole principle, we conclude that at most of the sets can be disjoint. Arguing as before, we conclude that there exists a set of cardinality such that

whenever (10) holds.

For any , write arbitrarily as for some (with if , and if ) and then set

Then from (11) we have (4). For we have , and (5) then follows from (9).
