You are currently browsing the category archive for the ‘math.NT’ category.

I recently came across this question on MathOverflow asking if there are any polynomials {P} of two variables with rational coefficients, such that the map {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}} is a bijection. The answer to this question is almost surely “no”, but it is remarkable how hard this problem resists any attempt at rigorous proof. (MathOverflow users with enough privileges to see deleted answers will find that there are no fewer than seventeen deleted attempts at a proof in response to this question!)

On the other hand, the one surviving response to the question does point out this paper of Poonen which shows that assuming a powerful conjecture in Diophantine geometry known as the Bombieri-Lang conjecture (discussed in this previous post), it is at least possible to exhibit polynomials {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}} which are injective.

I believe that it should be possible to also rule out the existence of bijective polynomials {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}} if one assumes the Bombieri-Lang conjecture, and have sketched out a strategy to do so, but filling in the gaps requires a fair bit more algebraic geometry than I am capable of. So as a sort of experiment, I would like to see if a rigorous implication of this form (similarly to the rigorous implication of the Erdos-Ulam conjecture from the Bombieri-Lang conjecture in my previous post) can be crowdsourced, in the spirit of the polymath projects (though I feel that this particular problem should be significantly quicker to resolve than a typical such project).

Here is how I imagine a Bombieri-Lang-powered resolution of this question should proceed (modulo a large number of unjustified and somewhat vague steps that I believe to be true but have not established rigorously). Suppose for contradiction that we have a bijective polynomial {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}}. Then for any polynomial {Q: {\bf Q} \rightarrow {\bf Q}} of one variable, the surface

\displaystyle  S_Q := \{ (x,y,z) \in \mathbb{A}^3: P(x,y) = Q(z) \}

has infinitely many rational points; indeed, every rational {z \in {\bf Q}} lifts to exactly one rational point in {S_Q}. I believe that for “typical” {Q} this surface {S_Q} should be irreducible. One can now split into two cases:

  • (a) The rational points in {S_Q} are Zariski dense in {S_Q}.
  • (b) The rational points in {S_Q} are not Zariski dense in {S_Q}.

Consider case (b) first. By definition, this case asserts that the rational points in {S_Q} are contained in a finite number of algebraic curves. By Faltings’ theorem (a special case of the Bombieri-Lang conjecture), any curve of genus two or higher only contains a finite number of rational points. So all but finitely many of the rational points in {S_Q} are contained in a finite union of genus zero and genus one curves. I think all genus zero curves are birational to a line, and all the genus one curves are birational to an elliptic curve (though I don’t have an immediate reference for this). These curves {C} all can have an infinity of rational points, but very few of them should have “enough” rational points {C \cap {\bf Q}^3} that their projection {\pi(C \cap {\bf Q}^3) := \{ z \in {\bf Q} : (x,y,z) \in C \hbox{ for some } x,y \in {\bf Q} \}} to the third coordinate is “large”. In particular, I believe

  • (i) If {C \subset {\mathbb A}^3} is birational to an elliptic curve, then the number of elements of {\pi(C \cap {\bf Q}^3)} of height at most {H} should grow at most polylogarithmically in {H} (i.e., be of order {O( \log^{O(1)} H )}.
  • (ii) If {C \subset {\mathbb A}^3} is birational to a line but not of the form {\{ (f(z), g(z), z) \}} for some rational {f,g}, then then the number of elements of {\pi(C \cap {\bf Q}^3)} of height at most {H} should grow slower than {H^2} (in fact I think it can only grow like {O(H)}).

I do not have proofs of these results (though I think something similar to (i) can be found in Knapp’s book, and (ii) should basically follow by using a rational parameterisation {\{(f(t),g(t),h(t))\}} of {C} with {h} nonlinear). Assuming these assertions, this would mean that there is a curve of the form {\{ (f(z),g(z),z)\}} that captures a “positive fraction” of the rational points of {S_Q}, as measured by restricting the height of the third coordinate {z} to lie below a large threshold {H}, computing density, and sending {H} to infinity (taking a limit superior). I believe this forces an identity of the form

\displaystyle  P(f(z), g(z)) = Q(z) \ \ \ \ \ (1)

for all {z}. Such identities are certainly possible for some choices of {Q} (e.g. {Q(z) = P(F(z), G(z))} for arbitrary polynomials {F,G} of one variable) but I believe that the only way that such identities hold for a “positive fraction” of {Q} (as measured using height as before) is if there is in fact a rational identity of the form

\displaystyle  P( f_0(z), g_0(z) ) = z

for some rational functions {f_0,g_0} with rational coefficients (in which case we would have {f = f_0 \circ Q} and {g = g_0 \circ Q}). But such an identity would contradict the hypothesis that {P} is bijective, since one can take a rational point {(x,y)} outside of the curve {\{ (f_0(z), g_0(z)): z \in {\bf Q} \}}, and set {z := P(x,y)}, in which case we have {P(x,y) = P(f_0(z), g_0(z) )} violating the injective nature of {P}. Thus, modulo a lot of steps that have not been fully justified, we have ruled out the scenario in which case (b) holds for a “positive fraction” of {Q}.

This leaves the scenario in which case (a) holds for a “positive fraction” of {Q}. Assuming the Bombieri-Lang conjecture, this implies that for such {Q}, any resolution of singularities of {S_Q} fails to be of general type. I would imagine that this places some very strong constraints on {P,Q}, since I would expect the equation {P(x,y) = Q(z)} to describe a surface of general type for “generic” choices of {P,Q} (after resolving singularities). However, I do not have a good set of techniques for detecting whether a given surface is of general type or not. Presumably one should proceed by viewing the surface {\{ (x,y,z): P(x,y) = Q(z) \}} as a fibre product of the simpler surface {\{ (x,y,w): P(x,y) = w \}} and the curve {\{ (z,w): Q(z) = w \}} over the line {\{w \}}. In any event, I believe the way to handle (a) is to show that the failure of general type of {S_Q} implies some strong algebraic constraint between {P} and {Q} (something in the spirit of (1), perhaps), and then use this constraint to rule out the bijectivity of {P} by some further ad hoc method.

This is another sequel to a recent post in which I showed the Riemann zeta function {\zeta} can be locally approximated by a polynomial, in the sense that for randomly chosen {t \in [T,2T]} one has an approximation

\displaystyle  \zeta(\frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (1)

where {N} grows slowly with {T}, and {P_t} is a polynomial of degree {N}. It turns out that in the function field setting there is an exact version of this approximation which captures many of the known features of the Riemann zeta function, namely Dirichlet {L}-functions for a random character of given modulus over a function field. This model was (essentially) studied in a fairly recent paper by Andrade, Miller, Pratt, and Trinh; I am not sure if there is any further literature on this model beyond this paper (though the number field analogue of low-lying zeroes of Dirichlet {L}-functions is certainly well studied). In this model it is possible to set {N} fixed and let {T} go to infinity, thus providing a simple finite-dimensional model problem for problems involving the statistics of zeroes of the zeta function.

In this post I would like to record this analogue precisely. We will need a finite field {{\mathbb F}} of some order {q} and a natural number {N}, and set

\displaystyle  T := q^{N+1}.

We will primarily think of {q} as being large and {N} as being either fixed or growing very slowly with {q}, though it is possible to also consider other asymptotic regimes (such as holding {q} fixed and letting {N} go to infinity). Let {{\mathbb F}[X]} be the ring of polynomials of one variable {X} with coefficients in {{\mathbb F}}, and let {{\mathbb F}[X]'} be the multiplicative semigroup of monic polynomials in {{\mathbb F}[X]}; one should view {{\mathbb F}[X]} and {{\mathbb F}[X]'} as the function field analogue of the integers and natural numbers respectively. We use the valuation {|n| := q^{\mathrm{deg}(n)}} for polynomials {n \in {\mathbb F}[X]} (with {|0|=0}); this is the analogue of the usual absolute value on the integers. We select an irreducible polynomial {Q \in {\mathbb F}[X]} of size {|Q|=T} (i.e., {Q} has degree {N+1}). The multiplicative group {({\mathbb F}[X]/Q{\mathbb F}[X])^\times} can be shown to be cyclic of order {|Q|-1=T-1}. A Dirichlet character of modulus {Q} is a completely multiplicative function {\chi: {\mathbb F}[X] \rightarrow {\bf C}} of modulus {Q}, that is periodic of period {Q} and vanishes on those {n \in {\mathbb F}[X]} not coprime to {Q}. From Fourier analysis we see that there are exactly {\phi(Q) := |Q|-1} Dirichlet characters of modulus {Q}. A Dirichlet character is said to be odd if it is not identically one on the group {{\mathbb F}^\times} of non-zero constants; there are only {\frac{1}{q-1} \phi(Q)} non-odd characters (including the principal character), so in the limit {q \rightarrow \infty} most Dirichlet characters are odd. We will work primarily with odd characters in order to be able to ignore the effect of the place at infinity.

Let {\chi} be an odd Dirichlet character of modulus {Q}. The Dirichlet {L}-function {L(s, \chi)} is then defined (for {s \in {\bf C}} of sufficiently large real part, at least) as

\displaystyle  L(s,\chi) := \sum_{n \in {\mathbb F}[X]'} \frac{\chi(n)}{|n|^s}

\displaystyle  = \sum_{m=0}^\infty q^{-sm} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n).

Note that for {m \geq N+1}, the set {n \in {\mathbb F}[X]': |n| = q^m} is invariant under shifts {h} whenever {|h| < T}; since this covers a full set of residue classes of {{\mathbb F}[X]/Q{\mathbb F}[X]}, and the odd character {\chi} has mean zero on this set of residue classes, we conclude that the sum {\sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n)} vanishes for {m \geq N+1}. In particular, the {L}-function is entire, and for any real number {t} and complex number {z}, we can write the {L}-function as a polynomial

\displaystyle  L(\frac{1}{2} + it - \frac{2\pi i z}{\log T},\chi) = P(Z) = P_{t,\chi}(Z) := \sum_{m=0}^N c^1_m(t,\chi) Z^j

where {Z := e(z/N) = e^{2\pi i z/N}} and the coefficients {c^1_m = c^1_m(t,\chi)} are given by the formula

\displaystyle  c^1_m(t,\chi) := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n).

Note that {t} can easily be normalised to zero by the relation

\displaystyle  P_{t,\chi}(Z) = P_{0,\chi}( q^{-it} Z ). \ \ \ \ \ (2)

In particular, the dependence on {t} is periodic with period {\frac{2\pi}{\log q}} (so by abuse of notation one could also take {t} to be an element of {{\bf R}/\frac{2\pi}{\log q}{\bf Z}}).

Fourier inversion yields a functional equation for the polynomial {P}:

Proposition 1 (Functional equation) Let {\chi} be an odd Dirichlet character of modulus {Q}, and {t \in {\bf R}}. There exists a phase {e(\theta)} (depending on {t,\chi}) such that

\displaystyle  a_{N-m}^1 = e(\theta) \overline{c^1_m}

for all {0 \leq m \leq N}, or equivalently that

\displaystyle  P(1/Z) = e^{i\theta} Z^{-N} \overline{P}(Z)

where {\overline{P}(Z) := \overline{P(\overline{Z})}}.

Proof: We can normalise {t=0}. Let {G} be the finite field {{\mathbb F}[X] / Q {\mathbb F}[X]}. We can write

\displaystyle  a_{N-m} = q^{-(N-m)/2} \sum_{n \in q^{N-m} + H_{N-m}} \chi(n)

where {H_j} denotes the subgroup of {G} consisting of (residue classes of) polynomials of degree less than {j}. Let {e_G: G \rightarrow S^1} be a non-trivial character of {G} whose kernel lies in the space {H_N} (this is easily achieved by pulling back a non-trivial character from the quotient {G/H_N \equiv {\mathbb F}}). We can use the Fourier inversion formula to write

\displaystyle  a_{N-m} = q^{(m-N)/2} \sum_{\xi \in G} \hat \chi(\xi) \sum_{n \in T^{N-m} + H_{N-m}} e_G( n\xi )

where

\displaystyle  \hat \chi(\xi) := q^{-N-1} \sum_{n \in G} \chi(n) e_G(-n\xi).

From change of variables we see that {\hat \chi} is a scalar multiple of {\overline{\chi}}; from Plancherel we conclude that

\displaystyle  \hat \chi = e(\theta_0) q^{-(N+1)/2} \overline{\chi} \ \ \ \ \ (3)

for some phase {e(\theta_0)}. We conclude that

\displaystyle  a_{N-m} = e(\theta_0) q^{-(2N-m+1)/2} \sum_{\xi \in G} \overline{\chi}(\xi) e_G( T^{N-j} \xi) \sum_{n \in H_{N-j}} e_G( n\xi ). \ \ \ \ \ (4)

The inner sum {\sum_{n \in H_{N-m}} e_G( n\xi )} equals {q^{N-m}} if {\xi \in H_{j+1}}, and vanishes otherwise, thus

\displaystyle a_{N-m} = e(\theta_0) q^{-(m+1)/2} \sum_{\xi \in H_{j+1}} \overline{\chi}(\xi) e_G( T^{N-m} \xi).

For {\xi} in {H_j}, {e_G(T^{N-m} \xi)=1} and the contribution of the sum vanishes as {\chi} is odd. Thus we may restrict {\xi} to {H_{m+1} \backslash H_m}, so that

\displaystyle a_{N-m} = e(\theta_0) q^{-(m+1)/2} \sum_{h \in {\mathbb F}^\times} e_G( T^{N} h) \sum_{\xi \in h T^m + H_{m}} \overline{\chi}(\xi).

By the multiplicativity of {\chi}, this factorises as

\displaystyle a_{N-m} = e(\theta_0) q^{-(m+1)/2} (\sum_{h \in {\mathbb F}^\times} \overline{\chi}(h) e_G( T^{N} h)) (\sum_{\xi \in T^m + H_{m}} \overline{\chi}(\xi)).

From the one-dimensional version of (3) (and the fact that {\chi} is odd) we have

\displaystyle  \sum_{h \in {\mathbb F}^\times} \overline{\chi}(h) e_G( T^{N} h) = e(\theta_1) q^{1/2}

for some phase {e(\theta_1)}. The claim follows. \Box

As one corollary of the functional equation, {a_N} is a phase rotation of {\overline{a_1} = 1} and thus is non-zero, so {P} has degree exactly {N}. The functional equation is then equivalent to the {N} zeroes of {P} being symmetric across the unit circle. In fact we have the stronger

Theorem 2 (Riemann hypothesis for Dirichlet {L}-functions over function fields) Let {\chi} be an odd Dirichlet character of modulus {Q}, and {t \in {\bf R}}. Then all the zeroes of {P} lie on the unit circle.

We derive this result from the Riemann hypothesis for curves over function fields below the fold.

In view of this theorem (and the fact that {a_1=1}), we may write

\displaystyle  P(Z) = \mathrm{det}(1 - ZU)

for some unitary {N \times N} matrix {U = U_{t,\chi}}. It is possible to interpret {U} as the action of the geometric Frobenius map on a certain cohomology group, but we will not do so here. The situation here is simpler than in the number field case because the factor {\exp(A)} arising from very small primes is now absent (in the function field setting there are no primes of size between {1} and {q}).

We now let {\chi} vary uniformly at random over all odd characters of modulus {Q}, and {t} uniformly over {{\bf R}/\frac{2\pi}{\log q}{\bf Z}}, independently of {\chi}; we also make the distribution of the random variable {U} conjugation invariant in {U(N)}. We use {{\mathbf E}_Q} to denote the expectation with respect to this randomness. One can then ask what the limiting distribution of {U} is in various regimes; we will focus in this post on the regime where {N} is fixed and {q} is being sent to infinity. In the spirit of the Sato-Tate conjecture, one should expect {U} to converge in distribution to the circular unitary ensemble (CUE), that is to say Haar probability measure on {U(N)}. This may well be provable from Deligne’s “Weil II” machinery (in the spirit of this monograph of Katz and Sarnak), though I do not know how feasible this is or whether it has already been done in the literature; here we shall avoid using this machinery and study what partial results towards this CUE hypothesis one can make without it.

If one lets {\lambda_1,\dots,\lambda_N} be the eigenvalues of {U} (ordered arbitrarily), then we now have

\displaystyle  \sum_{m=0}^N c^1_m Z^m = P(Z) = \prod_{j=1}^N (1 - \lambda_j Z)

and hence the {c^1_m} are essentially elementary symmetric polynomials of the eigenvalues:

\displaystyle  c^1_m = (-1)^j e_m( \lambda_1,\dots,\lambda_N). \ \ \ \ \ (5)

One can take log derivatives to conclude

\displaystyle  \frac{P'(Z)}{P(Z)} = \sum_{j=1}^N \frac{\lambda_j}{1-\lambda_j Z}.

On the other hand, as in the number field case one has the Dirichlet series expansion

\displaystyle  Z \frac{P'(Z)}{P(Z)} = \sum_{n \in {\mathbb F}[X]'} \frac{\Lambda_q(n) \chi(n)}{|n|^s}

where {s = \frac{1}{2} + it - \frac{2\pi i z}{\log T}} has sufficiently large real part, {Z = e(z/N)}, and the von Mangoldt function {\Lambda_q(n)} is defined as {\log_q |p| = \mathrm{deg} p} when {n} is the power of an irreducible {p} and {0} otherwise. We conclude the “explicit formula”

\displaystyle  c^{\Lambda_q}_m = \sum_{j=1}^N \lambda_j^m = \mathrm{tr}(U^m) \ \ \ \ \ (6)

for {m \geq 1}, where

\displaystyle  c^{\Lambda_q}_m := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \Lambda_q(n) \chi(n).

Similarly on inverting {P(Z)} we have

\displaystyle  P(Z)^{-1} = \prod_{j=1}^N (1 - \lambda_j Z)^{-1}.

Since we also have

\displaystyle  P(Z)^{-1} = \sum_{n \in {\mathbb F}[X]'} \frac{\mu(n) \chi(n)}{|n|^s}

for {s} sufficiently large real part, where the Möbius function {\mu(n)} is equal to {(-1)^k} when {n} is the product of {k} distinct irreducibles, and {0} otherwise, we conclude that the Möbius coefficients

\displaystyle  c^\mu_m := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \mu(n) \chi(n)

are just the complete homogeneous symmetric polynomials of the eigenvalues:

\displaystyle  c^\mu_m = h_m( \lambda_1,\dots,\lambda_N). \ \ \ \ \ (7)

One can then derive various algebraic relationships between the coefficients {c^1_m, c^{\Lambda_q}_m, c^\mu_m} from various identities involving symmetric polynomials, but we will not do so here.

What do we know about the distribution of {U}? By construction, it is conjugation-invariant; from (2) it is also invariant with respect to the rotations {U \rightarrow e^{i\theta} U} for any phase {\theta \in{\bf R}}. We also have the function field analogue of the Rudnick-Sarnak asymptotics:

Proposition 3 (Rudnick-Sarnak asymptotics) Let {a_1,\dots,a_k,b_1,\dots,b_k} be nonnegative integers. If

\displaystyle  \sum_{j=1}^k j a_j \leq N, \ \ \ \ \ (8)

then the moment

\displaystyle  {\bf E}_{Q} \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (9)

is equal to {o(1)} in the limit {q \rightarrow \infty} (holding {N,a_1,\dots,a_k,b_1,\dots,b_k} fixed) unless {a_j=b_j} for all {j}, in which case it is equal to

\displaystyle  \prod_{j=1}^k j^{a_j} a_j! + o(1). \ \ \ \ \ (10)

Comparing this with Proposition 1 from this previous post, we thus see that all the low moments of {U} are consistent with the CUE hypothesis (and also with the ACUE hypothesis, again by the previous post). The case {\sum_{j=1}^k a_j + \sum_{j=1}^k b_j \leq 2} of this proposition was essentially established by Andrade, Miller, Pratt, and Trinh.

Proof: We may assume the homogeneity relationship

\displaystyle  \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j \ \ \ \ \ (11)

since otherwise the claim follows from the invariance under phase rotation {U \mapsto e^{i\theta} U}. By (6), the expression (9) is equal to

\displaystyle  q^{-D} {\bf E}_Q \sum_{n_1,\dots,n_l,n'_1,\dots,n'_{l'} \in {\mathbb F}[X]': |n_i| = q^{s_i}, |n'_i| = q^{s'_i}} (\prod_{i=1}^l \Lambda_q(n_i) \chi(n_i)) \prod_{i=1}^{l'} \Lambda_q(n'_i) \overline{\chi(n'_i)}

where

\displaystyle  D := \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j

\displaystyle  l := \sum_{j=1}^k a_j

\displaystyle  l' := \sum_{j=1}^k b_j

and {s_1 \leq \dots \leq s_l} consists of {a_j} copies of {j} for each {j=1,\dots,k}, and similarly {s'_1 \leq \dots \leq s'_{l'}} consists of {b_j} copies of {j} for each {j=1,\dots,k}.

The polynomials {n_1 \dots n_l} and {n'_1 \dots n'_{l'}} are monic of degree {D}, which by hypothesis is less than the degree of {Q}, and thus they can only be scalar multiples of each other in {{\mathbb F}[X] / Q {\mathbb F}[X]} if they are identical (in {{\mathbb F}[X]}). As such, we see that the average

\displaystyle  {\bf E}_Q \chi(n_1) \dots \chi(n_l) \overline{\chi(n'_1)} \dots \overline{\chi(n'_{l'})}

vanishes unless {n_1 \dots n_l = n'_1 \dots n'_{l'}}, in which case this average is equal to {1}. Thus the expression (9) simplifies to

\displaystyle  q^{-D} \sum_{n_1,\dots,n_l,n'_1,\dots,n'_{l'}: |n_i| = q^{s_i}, |n'_i| = q^{s'_i}; n_1 \dots n_l = n'_1 \dots n'_l} (\prod_{i=1}^l \Lambda_q(n_i)) \prod_{i=1}^{l'} \Lambda_q(n'_i).

There are at most {q^D} choices for the product {n_1 \dots n_l}, and each one contributes {O_D(1)} to the above sum. All but {o(q^D)} of these choices are square-free, so by accepting an error of {o(1)}, we may restrict attention to square-free {n_1 \dots n_l}. This forces {n_1,\dots,n_l,n'_1,\dots,n'_{l'}} to all be irreducible (as opposed to powers of irreducibles); as {{\mathbb F}[X]} is a unique factorisation domain, this forces {l=l'} and {n_1,\dots,n_l} to be a permutation of {n'_1,\dots,n'_{l'}}. By the size restrictions, this then forces {a_j = b_j} for all {j} (if the above expression is to be anything other than {o(1)}), and each {n_1,\dots,n_l} is associated to {\prod_{j=1}^k a_j!} possible choices of {n'_1,\dots,n'_{l'}}. Writing {\Lambda_q(n'_i) = s'_i} and then reinstating the non-squarefree possibilities for {n_1 \dots n_l}, we can thus write the above expression as

\displaystyle  q^{-D} \prod_{j=1}^k j a_j! \sum_{n_1,\dots,n_l,n'_1,\dots,n'_{l'}\in {\mathbb F}[X]': |n_i| = q^{s_i}} \prod_{i=1}^l \Lambda_q(n_i) + o(1).

Using the prime number theorem {\sum_{n \in {\mathbb F}[X]': |n| = q^s} \Lambda_q(n) = q^s}, we obtain the claim. \Box

Comparing this with Proposition 1 from this previous post, we thus see that all the low moments of {U} are consistent with the CUE and ACUE hypotheses:

Corollary 4 (CUE statistics at low frequencies) Let {\lambda_1,\dots,\lambda_N} be the eigenvalues of {U}, permuted uniformly at random. Let {R(\lambda)} be a linear combination of monomials {\lambda_1^{a_1} \dots \lambda_N^{a_N}} where {a_1,\dots,a_N} are integers with either {\sum_{j=1}^N a_j \neq 0} or {\sum_{j=1}^N |a_j| \leq 2N}. Then

\displaystyle  {\bf E}_Q R(\lambda) = {\bf E}_{CUE} R(\lambda) + o(1).

The analogue of the GUE hypothesis in this setting would be the CUE hypothesis, which asserts that the threshold {2N} here can be replaced by an arbitrarily large quantity. As far as I know this is not known even for {2N+2} (though, as mentioned previously, in principle one may be able to resolve such cases using Deligne’s proof of the Riemann hypothesis for function fields). Among other things, this would allow one to distinguish CUE from ACUE, since as discussed in the previous post, these two distributions agree when tested against monomials up to threshold {2N}, though not to {2N+2}.

Proof: By permutation symmetry we can take {R} to be symmetric, and by linearity we may then take {R} to be the symmetrisation of a single monomial {\lambda_1^{a_1} \dots \lambda_N^{a_N}}. If {\sum_{j=1}^N a_j \neq 0} then both expectations vanish due to the phase rotation symmetry, so we may assume that {\sum_{j=1}^N a_j \neq 0} and {\sum_{j=1}^N |a_j| \leq 2N}. We can write this symmetric polynomial as a constant multiple of {\mathrm{tr}(U^{a_1}) \dots \mathrm{tr}(U^{a_N})} plus other monomials with a smaller value of {\sum_{j=1}^N |a_j|}. Since {\mathrm{tr}(U^{-a}) = \overline{\mathrm{tr}(U^a)}}, the claim now follows by induction from Proposition 3 and Proposition 1 from the previous post. \Box

Thus, for instance, for {k=1,2}, the {2k^{th}} moment

\displaystyle {\bf E}_Q |\det(1-U)|^{2k} = {\bf E}_Q |P(1)|^{2k} = {\bf E}_Q |L(\frac{1}{2} + it, \chi)|^{2k}

is equal to

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^{2k} + o(1)

because all the monomials in {\prod_{j=1}^N (1-\lambda_j)^k (1-\lambda_j^{-1})^k} are of the required form when {k \leq 2}. The latter expectation can be computed exactly (for any natural number {k}) using a formula

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^{2k} = \prod_{j=1}^N \frac{\Gamma(j) \Gamma(j+2k)}{\Gamma(j+k)^2}

of Baker-Forrester and Keating-Snaith, thus for instance

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^2 = N+1

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^4 = \frac{(N+1)(N+2)^2(N+3)}{12}

and more generally

\displaystyle  {\bf E}_{CUE}|\det(1-U)|^{2k} = \frac{g_k+o(1)}{(k^2)!} N^{k^2}

when {N \rightarrow \infty}, where {g_k} are the integers

\displaystyle  g_1 = 1, g_2 = 2, g_3 = 42, g_4 = 24024, \dots

and more generally

\displaystyle  g_k := \frac{(k^2)!}{\prod_{i=1}^{2k-1} i^{k-|k-i|}}

(OEIS A039622). Thus we have

\displaystyle {\bf E}_Q |\det(1-U)|^{2k} = \frac{g_k+o(1)}{k^2!} N^{k^2}

for {k=1,2} if {Q \rightarrow \infty} and {N} is sufficiently slowly growing depending on {Q}. The CUE hypothesis would imply that that this formula also holds for higher {k}. (The situation here is cleaner than in the number field case, in which the GUE hypothesis only suggests the correct lower bound for the moments rather than an asymptotic, due to the absence of the wildly fluctuating additional factor {\exp(A)} that is present in the Riemann zeta function model.)

Now we can recover the analogue of Montgomery’s work on the pair correlation conjecture. Consider the statistic

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j )

where

\displaystyle R(z) = \sum_m \hat R(m) z^m

is some finite linear combination of monomials {z^m} independent of {q}. We can expand the above sum as

\displaystyle  \sum_m \hat R(m) {\bf E}_Q \mathrm{tr}(U^m) \mathrm{tr}(U^{-m}).

Assuming the CUE hypothesis, then by Example 3 of the previous post, we would conclude that

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = N^2 \hat R(0) + \sum_m \min(|m|,N) \hat R(m) + o(1). \ \ \ \ \ (12)

This is the analogue of Montgomery’s pair correlation conjecture. Proposition 3 implies that this claim is true whenever {\hat R} is supported on {[-N,N]}. If instead we assume the ACUE hypothesis (or the weaker Alternative Hypothesis that the phase gaps are non-zero multiples of {1/2N}), one should instead have

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = \sum_{k \in {\bf Z}} N^2 \hat R(2Nk) + \sum_{1 \leq |m| \leq N} |m| \hat R(m+2Nk) + o(1)

for arbitrary {R}; this is the function field analogue of a recent result of Baluyot. In any event, since {\mathrm{tr}(U^m) \mathrm{tr}(U^{-m})} is non-negative, we unconditionally have the lower bound

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) \geq N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m) + o(1). \ \ \ \ \ (13)

if {\hat R(m)} is non-negative for {|m| > N}.

By applying (12) for various choices of test functions {R} we can obtain various bounds on the behaviour of eigenvalues. For instance suppose we take the Fejér kernel

\displaystyle  R(z) = |1 + z + \dots + z^N|^2 = \sum_{m=-N}^N (N+1-|m|) z^m.

Then (12) applies unconditionally and we conclude that

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = N^2 (N+1) + \sum_{1 \leq |m| \leq N} (N+1-|m|) |m| + o(1).

The right-hand side evaluates to {\frac{2}{3} N(N+1)(2N+1)+o(1)}. On the other hand, {R(\lambda_i/\lambda_j)} is non-negative, and equal to {(N+1)^2} when {\lambda_i = \lambda_j}. Thus

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} 1_{\lambda_i = \lambda_j} \leq \frac{2}{3} \frac{N(2N+1)}{N+1} + o(1).

The sum {\sum_{1 \leq j \leq N} 1_{\lambda_i = \lambda_j}} is at least {1}, and is at least {2} if {\lambda_i} is not a simple eigenvalue. Thus

\displaystyle  {\bf E}_Q \sum_{1 \leq i, \leq N} 1_{\lambda_i \hbox{ not simple}} \leq \frac{1}{3} \frac{N(N-1)}{N+1} + o(1),

and thus the expected number of simple eigenvalues is at least {\frac{2N}{3} \frac{N+4}{N+1} + o(1)}; in particular, at least two thirds of the eigenvalues are simple asymptotically on average. If we had (12) without any restriction on the support of {\hat R}, the same arguments allow one to show that the expected proportion of simple eigenvalues is {1-o(1)}.

Suppose that the phase gaps in {U} are all greater than {c/N} almost surely. Let {\hat R} is non-negative and {R(e^{i\theta})} non-positive for {\theta} outside of the arc {[-c/N,c/N]}. Then from (13) one has

\displaystyle  R(0) N \geq N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m) + o(1),

so by taking contrapositives one can force the existence of a gap less than {c/N} asymptotically if one can find {R} with {\hat R} non-negative, {R} non-positive for {\theta} outside of the arc {[-c/N,c/N]}, and for which one has the inequality

\displaystyle  R(0) N < N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m).

By a suitable choice of {R} (based on a minorant of Selberg) one can ensure this for {c \approx 0.6072} for {N} large; see Section 5 of these notes of Goldston. This is not the smallest value of {c} currently obtainable in the literature for the number field case (which is currently {0.50412}, due to Goldston and Turnage-Butterbaugh, by a somewhat different method), but is still significantly less than the trivial value of {1}. On the other hand, due to the compatibility of the ACUE distribution with Proposition 3, it is not possible to lower {c} below {0.5} purely through the use of Proposition 3.

In some cases it is possible to go beyond Proposition 3. Consider the mollified moment

\displaystyle  {\bf E}_Q |M(U) P(1)|^2

where

\displaystyle  M(U) = \sum_{m=0}^d a_m h_m(\lambda_1,\dots,\lambda_N)

for some coefficients {a_0,\dots,a_d}. We can compute this moment in the CUE case:

Proposition 5 We have

\displaystyle  {\bf E}_{CUE} |M(U) P(1)|^2 = |a_0|^2 + N \sum_{m=1}^d |a_m - a_{m-1}|^2.

Proof: From (5) one has

\displaystyle  P(1) = \sum_{i=0}^N (-1)^i e_i(\lambda_1,\dots,\lambda_N)

hence

\displaystyle  M(U) P(1) = \sum_{i=0}^N \sum_{m=0}^d (-1)^i a_m e_i h_m

where we suppress the dependence on the eigenvalues {\lambda}. Now observe the Pieri formula

\displaystyle  e_i h_m = s_{m 1^i} + s_{(m+1) 1^{i-1}}

where {s_{m 1^i}} are the hook Schur polynomials

\displaystyle  s_{m 1^i} = \sum_{a_1 \leq \dots \leq a_m; a_1 < b_1 < \dots < b_i} \lambda_{a_1} \dots \lambda_{a_m} \lambda_{b_1} \dots \lambda_{b_i}

and we adopt the convention that {s_{m 1^i}} vanishes for {i = -1}, or when {m = 0} and {i > 0}. Then {s_{m1^i}} also vanishes for {i\geq N}. We conclude that

\displaystyle  M(U) P(1) = a_0 s_{0 1^0} + \sum_{0 \leq i \leq N-1} \sum_{m \geq 1} (-1)^i (a_m - a_{m-1}) s_{m 1^i}.

As the Schur polynomials are orthonormal on the unitary group, the claim follows. \Box

The CUE hypothesis would then imply the corresponding mollified moment conjecture

\displaystyle  {\bf E}_{Q} |M(U) P(1)|^2 = |a_0|^2 + N \sum_{m=1}^d |a_m - a_{m-1}|^2 + o(1). \ \ \ \ \ (14)

(See this paper of Conrey, and this paper of Radziwill, for some discussion of the analogous conjecture for the zeta function, which is essentially due to Farmer.)

From Proposition 3 one sees that this conjecture holds in the range {d \leq \frac{1}{2} N}. It is likely that the function field analogue of the calculations of Conrey (based ultimately on deep exponential sum estimates of Deshouillers and Iwaniec) can extend this range to {d < \theta N} for any {\theta < \frac{4}{7}}, if {N} is sufficiently large depending on {\theta}; these bounds thus go beyond what is available from Proposition 3. On the other hand, as discussed in Remark 7 of the previous post, ACUE would also predict (14) for {d} as large as {N-2}, so the available mollified moment estimates are not strong enough to rule out ACUE. It would be interesting to see if there is some other estimate in the function field setting that can be used to exclude the ACUE hypothesis (possibly one that exploits the fact that GRH is available in the function field case?).

Read the rest of this entry »

In a recent post I discussed how the Riemann zeta function {\zeta} can be locally approximated by a polynomial, in the sense that for randomly chosen {t \in [T,2T]} one has an approximation

\displaystyle  \zeta(\frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (1)

where {N} grows slowly with {T}, and {P_t} is a polynomial of degree {N}. Assuming the Riemann hypothesis (as we will throughout this post), the zeroes of {P_t} should all lie on the unit circle, and one should then be able to write {P_t} as a scalar multiple of the characteristic polynomial of (the inverse of) a unitary matrix {U = U_t \in U(N)}, which we normalise as

\displaystyle  P_t(Z) = \exp(A_t) \mathrm{det}(1 - ZU). \ \ \ \ \ (2)

Here {A_t} is some quantity depending on {t}. We view {U} as a random element of {U(N)}; in the limit {T \rightarrow \infty}, the GUE hypothesis is equivalent to {U} becoming equidistributed with respect to Haar measure on {U(N)} (also known as the Circular Unitary Ensemble, CUE; it is to the unit circle what the Gaussian Unitary Ensemble (GUE) is on the real line). One can also view {U} as analogous to the “geometric Frobenius” operator in the function field setting, though unfortunately it is difficult at present to make this analogy any more precise (due, among other things, to the lack of a sufficiently satisfactory theory of the “field of one element“).

Taking logarithmic derivatives of (2), we have

\displaystyle  -\frac{P'_t(Z)}{P_t(Z)} = \mathrm{tr}( U (1-ZU)^{-1} ) = \sum_{j=1}^\infty Z^{j-1} \mathrm{tr} U^j \ \ \ \ \ (3)

and hence on taking logarithmic derivatives of (1) in the {z} variable we (heuristically) have

\displaystyle  -\frac{2\pi i}{\log T} \frac{\zeta'}{\zeta}( \frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx \frac{2\pi i}{N} \sum_{j=1}^\infty e^{2\pi i jz/N} \mathrm{tr} U^j.

Morally speaking, we have

\displaystyle  - \frac{\zeta'}{\zeta}( \frac{1}{2} + it - \frac{2\pi i z}{\log T}) = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^{1/2+it}} e^{2\pi i z (\log n/\log T)}

so on comparing coefficients we expect to interpret the moments {\mathrm{tr} U^j} of {U} as a finite Dirichlet series:

\displaystyle  \mathrm{tr} U^j \approx \frac{N}{\log T} \sum_{T^{(j-1)/N} < n \leq T^{j/N}} \frac{\Lambda(n)}{n^{1/2+it}}. \ \ \ \ \ (4)

To understand the distribution of {U} in the unitary group {U(N)}, it suffices to understand the distribution of the moments

\displaystyle  {\bf E}_t \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (5)

where {{\bf E}_t} denotes averaging over {t \in [T,2T]}, and {k, a_1,\dots,a_k, b_1,\dots,b_k \geq 0}. The GUE hypothesis asserts that in the limit {T \rightarrow \infty}, these moments converge to their CUE counterparts

\displaystyle  {\bf E}_{\mathrm{CUE}} \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (6)

where {U} is now drawn uniformly in {U(n)} with respect to the CUE ensemble, and {{\bf E}_{\mathrm{CUE}}} denotes expectation with respect to that measure.

The moment (6) vanishes unless one has the homogeneity condition

\displaystyle  \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j. \ \ \ \ \ (7)

This follows from the fact that for any phase {\theta \in {\bf R}}, {e(\theta) U} has the same distribution as {U}, where we use the number theory notation {e(\theta) := e^{2\pi i\theta}}.

In the case when the degree {\sum_{j=1}^k j a_j} is low, we can use representation theory to establish the following simple formula for the moment (6), as evaluated by Diaconis and Shahshahani:

Proposition 1 (Low moments in CUE model) If

\displaystyle  \sum_{j=1}^k j a_j \leq N, \ \ \ \ \ (8)

then the moment (6) vanishes unless {a_j=b_j} for all {j}, in which case it is equal to

\displaystyle  \prod_{j=1}^k j^{a_j} a_j!. \ \ \ \ \ (9)

Another way of viewing this proposition is that for {U} distributed according to CUE, the random variables {\mathrm{tr} U^j} are distributed like independent complex random variables of mean zero and variance {j}, as long as one only considers moments obeying (8). This identity definitely breaks down for larger values of {a_j}, so one only obtains central limit theorems in certain limiting regimes, notably when one only considers a fixed number of {j}‘s and lets {N} go to infinity. (The paper of Diaconis and Shahshahani writes {\sum_{j=1}^k a_j + b_j} in place of {\sum_{j=1}^k j a_j}, but I believe this to be a typo.)

Proof: Let {D} be the left-hand side of (8). We may assume that (7) holds since we are done otherwise, hence

\displaystyle  D = \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j.

Our starting point is Schur-Weyl duality. Namely, we consider the {n^D}-dimensional complex vector space

\displaystyle  ({\bf C}^n)^{\otimes D} = {\bf C}^n \otimes \dots \otimes {\bf C}^n.

This space has an action of the product group {S_D \times GL_n({\bf C})}: the symmetric group {S_D} acts by permutation on the {D} tensor factors, while the general linear group {GL_n({\bf C})} acts diagonally on the {{\bf C}^n} factors, and the two actions commute with each other. Schur-Weyl duality gives a decomposition

\displaystyle  ({\bf C}^n)^{\otimes D} \equiv \bigoplus_\lambda V^\lambda_{S_D} \otimes V^\lambda_{GL_n({\bf C})} \ \ \ \ \ (10)

where {\lambda} ranges over Young tableaux of size {D} with at most {n} rows, {V^\lambda_{S_D}} is the {S_D}-irreducible unitary representation corresponding to {\lambda} (which can be constructed for instance using Specht modules), and {V^\lambda_{GL_n({\bf C})}} is the {GL_n({\bf C})}-irreducible polynomial representation corresponding with highest weight {\lambda}.

Let {\pi \in S_D} be a permutation consisting of {a_j} cycles of length {j} (this is uniquely determined up to conjugation), and let {g \in GL_n({\bf C})}. The pair {(\pi,g)} then acts on {({\bf C}^n)^{\otimes D}}, with the action on basis elements {e_{i_1} \otimes \dots \otimes e_{i_D}} given by

\displaystyle  g e_{\pi(i_1)} \otimes \dots \otimes g_{\pi(i_D)}.

The trace of this action can then be computed as

\displaystyle  \sum_{i_1,\dots,i_D \in \{1,\dots,n\}} g_{\pi(i_1),i_1} \dots g_{\pi(i_D),i_D}

where {g_{i,j}} is the {ij} matrix coefficient of {g}. Breaking up into cycles and summing, this is just

\displaystyle  \prod_{j=1}^k \mathrm{tr}(g^j)^{a_j}.

But we can also compute this trace using the Schur-Weyl decomposition (10), yielding the identity

\displaystyle  \prod_{j=1}^k \mathrm{tr}(g^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(g) \ \ \ \ \ (11)

where {\chi_\lambda: S_D \rightarrow {\bf C}} is the character on {S_D} associated to {V^\lambda_{S_D}}, and {s_\lambda: GL_n({\bf C}) \rightarrow {\bf C}} is the character on {GL_n({\bf C})} associated to {V^\lambda_{GL_n({\bf C})}}. As is well known, {s_\lambda(g)} is just the Schur polynomial of weight {\lambda} applied to the (algebraic, generalised) eigenvalues of {g}. We can specialise to unitary matrices to conclude that

\displaystyle  \prod_{j=1}^k \mathrm{tr}(U^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(U)

and similarly

\displaystyle  \prod_{j=1}^k \mathrm{tr}(U^j)^{b_j} = \sum_\lambda \chi_\lambda(\pi') s_\lambda(U)

where {\pi' \in S_D} consists of {b_j} cycles of length {j} for each {j=1,\dots,k}. On the other hand, the characters {s_\lambda} are an orthonormal system on {L^2(U(N))} with the CUE measure. Thus we can write the expectation (6) as

\displaystyle  \sum_\lambda \chi_\lambda(\pi) \overline{\chi_\lambda(\pi')}. \ \ \ \ \ (12)

Now recall that {\lambda} ranges over all the Young tableaux of size {D} with at most {N} rows. But by (8) we have {D \leq N}, and so the condition of having {N} rows is redundant. Hence {\lambda} now ranges over all Young tableaux of size {D}, which as is well known enumerates all the irreducible representations of {S_D}. One can then use the standard orthogonality properties of characters to show that the sum (12) vanishes if {\pi}, {\pi'} are not conjugate, and is equal to {D!} divided by the size of the conjugacy class of {\pi} (or equivalently, by the size of the centraliser of {\pi}) otherwise. But the latter expression is easily computed to be {\prod_{j=1}^k j^{a_j} a_j!}, giving the claim. \Box

Example 2 We illustrate the identity (11) when {D=3}, {n \geq 3}. The Schur polynomials are given as

\displaystyle  s_{3}(g) = \sum_i \lambda_i^3 + \sum_{i<j} \lambda_i^2 \lambda_j + \lambda_i \lambda_j^2 + \sum_{i<j<k} \lambda_i \lambda_j \lambda_k

\displaystyle  s_{2,1}(g) = \sum_{i < j} \lambda_i^2 \lambda_j + \sum_{i < j,k} \lambda_i \lambda_j \lambda_k

\displaystyle  s_{1,1,1}(g) = \sum_{i<j<k} \lambda_i \lambda_j \lambda_k

where {\lambda_1,\dots,\lambda_n} are the (generalised) eigenvalues of {g}, and the formula (11) in this case becomes

\displaystyle  \mathrm{tr}(g^3) = s_{3}(g) - s_{2,1}(g) + s_{1,1,1}(g)

\displaystyle  \mathrm{tr}(g^2) \mathrm{tr}(g) = s_{3}(g) - s_{1,1,1}(g)

\displaystyle  \mathrm{tr}(g)^3 = s_{3}(g) + 2 s_{2,1}(g) + s_{1,1,1}(g).

The functions {s_{1,1,1}, s_{2,1}, s_3} are orthonormal on {U(n)}, so the three functions {\mathrm{tr}(g^3), \mathrm{tr}(g^2) \mathrm{tr}(g), \mathrm{tr}(g)^3} are also, and their {L^2} norms are {\sqrt{3}}, {\sqrt{2}}, and {\sqrt{6}} respectively, reflecting the size in {S_3} of the centralisers of the permutations {(123)}, {(12)}, and {\mathrm{id}} respectively. If {n} is instead set to say {2}, then the {s_{1,1,1}} terms now disappear (the Young tableau here has too many rows), and the three quantities here now have some non-trivial covariance.

Example 3 Consider the moment {{\bf E}_{\mathrm{CUE}} |\mathrm{tr} U^j|^2}. For {j \leq N}, the above proposition shows us that this moment is equal to {D}. What happens for {j>N}? The formula (12) computes this moment as

\displaystyle  \sum_\lambda |\chi_\lambda(\pi)|^2

where {\pi} is a cycle of length {j} in {S_j}, and {\lambda} ranges over all Young tableaux with size {j} and at most {N} rows. The Murnaghan-Nakayama rule tells us that {\chi_\lambda(\pi)} vanishes unless {\lambda} is a hook (all but one of the non-zero rows consisting of just a single box; this also can be interpreted as an exterior power representation on the space {{\bf C}^j_{\sum=0}} of vectors in {{\bf C}^j} whose coordinates sum to zero), in which case it is equal to {\pm 1} (depending on the parity of the number of non-zero rows). As such we see that this moment is equal to {N}. Thus in general we have

\displaystyle  {\bf E}_{\mathrm{CUE}} |\mathrm{tr} U^j|^2 = \min(j,N). \ \ \ \ \ (13)

Now we discuss what is known for the analogous moments (5). Here we shall be rather non-rigorous, in particular ignoring an annoying “Archimedean” issue that the product of the ranges {T^{(j-1)/N} < n \leq T^{j/N}} and {T^{(k-1)/N} < n \leq T^{k/N}} is not quite the range {T^{(j+k-1)/N} < n \leq T^{j+k/N}} but instead leaks into the adjacent range {T^{(j+k-2)/N} < n \leq T^{j+k-1/N}}. This issue can be addressed by working in a “weak" sense in which parameters such as {j,k} are averaged over fairly long scales, or by passing to a function field analogue of these questions, but we shall simply ignore the issue completely and work at a heuristic level only. For similar reasons we will ignore some technical issues arising from the sharp cutoff of {t} to the range {[T,2T]} (it would be slightly better technically to use a smooth cutoff).

One can morally expand out (5) using (4) as

\displaystyle  (\frac{N}{\log T})^{J+K} \sum_{n_1,\dots,n_J,m_1,\dots,m_K} \frac{\Lambda(n_1) \dots \Lambda(n_J) \Lambda(m_1) \dots \Lambda(m_K)}{n_1^{1/2} \dots n_J^{1/2} m_1^{1/2} \dots m_K^{1/2}} \times \ \ \ \ \ (14)

\displaystyle  \times {\bf E}_t (m_1 \dots m_K / n_1 \dots n_J)^{it}

where {J := \sum_{j=1}^k a_j}, {K := \sum_{j=1}^k b_j}, and the integers {n_i,m_i} are in the ranges

\displaystyle  T^{(j-1)/N} < n_{a_1 + \dots + a_{j-1} + i} \leq T^{j/N}

for {j=1,\dots,k} and {1 \leq i \leq a_j}, and

\displaystyle  T^{(j-1)/N} < m_{b_1 + \dots + b_{j-1} + i} \leq T^{j/N}

for {j=1,\dots,k} and {1 \leq i \leq b_j}. Morally, the expectation here is negligible unless

\displaystyle  m_1 \dots m_K = (1 + O(1/T)) n_1 \dots n_J \ \ \ \ \ (15)

in which case the expecation is oscillates with magnitude one. In particular, if (7) fails (with some room to spare) then the moment (5) should be negligible, which is consistent with the analogous behaviour for the moments (6). Now suppose that (8) holds (with some room to spare). Then {n_1 \dots n_J} is significantly less than {T}, so the {O(1/T)} multiplicative error in (15) becomes an additive error of {o(1)}. On the other hand, because of the fundamental integrality gap – that the integers are always separated from each other by a distance of at least {1} – this forces the integers {m_1 \dots m_K}, {n_1 \dots n_J} to in fact be equal:

\displaystyle  m_1 \dots m_K = n_1 \dots n_J. \ \ \ \ \ (16)

The von Mangoldt factors {\Lambda(n_1) \dots \Lambda(n_J) \Lambda(m_1) \dots \Lambda(m_K)} effectively restrict {n_1,\dots,n_J,m_1,\dots,m_K} to be prime (the effect of prime powers is negligible). By the fundamental theorem of arithmetic, the constraint (16) then forces {J=K}, and {n_1,\dots,n_J} to be a permutation of {m_1,\dots,m_K}, which then forces {a_j = b_j} for all {j=1,\dots,k}._ For a given {n_1,\dots,n_J}, the number of possible {m_1 \dots m_K} is then {\prod_{j=1}^k a_j!}, and the expectation in (14) is equal to {1}. Thus this expectation is morally

\displaystyle  (\frac{N}{\log T})^{J+K} \sum_{n_1,\dots,n_J} \frac{\Lambda^2(n_1) \dots \Lambda^2(n_J) }{n_1 \dots n_J} \prod_{j=1}^k a_j!

and using Mertens’ theorem this soon simplifies asymptotically to the same quantity in Proposition 1. Thus we see that (morally at least) the moments (5) associated to the zeta function asymptotically match the moments (6) coming from the CUE model in the low degree case (8), thus lending support to the GUE hypothesis. (These observations are basically due to Rudnick and Sarnak, with the degree {1} case of pair correlations due to Montgomery, and the degree {2} case due to Hejhal.)

With some rare exceptions (such as those estimates coming from “Kloostermania”), the moment estimates of Rudnick and Sarnak basically represent the state of the art for what is known for the moments (5). For instance, Montgomery’s pair correlation conjecture, in our language, is basically the analogue of (13) for {{\mathbf E}_t}, thus

\displaystyle  {\bf E}_{t} |\mathrm{tr} U^j|^2 \approx \min(j,N) \ \ \ \ \ (17)

for all {j \geq 0}. Montgomery showed this for (essentially) the range {j \leq N} (as remarked above, this is a special case of the Rudnick-Sarnak result), but no further cases of this conjecture are known.

These estimates can be used to give some non-trivial information on the largest and smallest spacings between zeroes of the zeta function, which in our notation corresponds to spacing between eigenvalues of {U}. One such method used today for this is due to Montgomery and Odlyzko and was greatly simplified by Conrey, Ghosh, and Gonek. The basic idea, translated to our random matrix notation, is as follows. Suppose {Q_t(Z)} is some random polynomial depending on {t} of degree at most {N}. Let {\lambda_1,\dots,\lambda_n} denote the eigenvalues of {U}, and let {c > 0} be a parameter. Observe from the pigeonhole principle that if the quantity

\displaystyle  \sum_{j=1}^n \int_0^{c/N} |Q_t( e(\theta) \lambda_j )|^2\ d\theta \ \ \ \ \ (18)

exceeds the quantity

\displaystyle  \int_{0}^{2\pi} |Q_t(e(\theta))|^2\ d\theta, \ \ \ \ \ (19)

then the arcs {\{ e(\theta) \lambda_j: 0 \leq \theta \leq c \}} cannot all be disjoint, and hence there exists a pair of eigenvalues making an angle of less than {c/N} ({c} times the mean angle separation). Similarly, if the quantity (18) falls below that of (19), then these arcs cannot cover the unit circle, and hence there exists a pair of eigenvalues making an angle of greater than {c} times the mean angle separation. By judiciously choosing the coefficients of {Q_t} as functions of the moments {\mathrm{tr}(U^j)}, one can ensure that both quantities (18), (19) can be computed by the Rudnick-Sarnak estimates (or estimates of equivalent strength); indeed, from the residue theorem one can write (18) as

\displaystyle  \frac{1}{2\pi i} \int_0^{c/N} (\int_{|z| = 1+\varepsilon} - \int_{|z|=1-\varepsilon}) Q_t( e(\theta) z ) \overline{Q_t}( \frac{1}{e(\theta) z} ) \frac{P'_t(z)}{P_t(z)}\ dz

for sufficiently small {\varepsilon>0}, and this can be computed (in principle, at least) using (3) if the coefficients of {Q_t} are in an appropriate form. Using this sort of technology (translated back to the Riemann zeta function setting), one can show that gaps between consecutive zeroes of zeta are less than {\mu} times the mean spacing and greater than {\lambda} times the mean spacing infinitely often for certain {0 < \mu < 1 < \lambda}; the current records are {\mu = 0.50412} (due to Goldston and Turnage-Butterbaugh) and {\lambda = 3.18} (due to Bui and Milinovich, who input some additional estimates beyond the Rudnick-Sarnak set, namely the twisted fourth moment estimates of Bettin, Bui, Li, and Radziwill, and using a technique based on Hall’s method rather than the Montgomery-Odlyzko method).

It would be of great interest if one could push the upper bound {\mu} for the smallest gap below {1/2}. The reason for this is that this would then exclude the Alternative Hypothesis that the spacing between zeroes are asymptotically always (or almost always) a non-zero half-integer multiple of the mean spacing, or in our language that the gaps between the phases {\theta} of the eigenvalues {e^{2\pi i\theta}} of {U} are nasymptotically always non-zero integer multiples of {1/2N}. The significance of this hypothesis is that it is implied by the existence of a Siegel zero (of conductor a small power of {T}); see this paper of Conrey and Iwaniec. (In our language, what is going on is that if there is a Siegel zero in which {L(1,\chi)} is very close to zero, then {1*\chi} behaves like the Kronecker delta, and hence (by the Riemann-Siegel formula) the combined {L}-function {\zeta(s) L(s,\chi)} will have a polynomial approximation which in our language looks like a scalar multiple of {1 + e(\theta) Z^{2N+M}}, where {q \approx T^{M/N}} and {\theta} is a phase. The zeroes of this approximation lie on a coset of the {(2N+M)^{th}} roots of unity; the polynomial {P} is a factor of this approximation and hence will also lie in this coset, implying in particular that all eigenvalue spacings are multiples of {1/(2N+M)}. Taking {M = o(N)} then gives the claim.)

Unfortunately, the known methods do not seem to break this barrier without some significant new input; already the original paper of Montgomery and Odlyzko observed this limitation for their particular technique (and in fact fall very slightly short, as observed in unpublished work of Goldston and of Milinovich). In this post I would like to record another way to see this, by providing an “alternative” probability distribution to the CUE distribution (which one might dub the Alternative Circular Unitary Ensemble (ACUE) which is indistinguishable in low moments in the sense that the expectation {{\bf E}_{ACUE}} for this model also obeys Proposition 1, but for which the phase spacings are always a multiple of {1/2N}. This shows that if one is to rule out the Alternative Hypothesis (and thus in particular rule out Siegel zeroes), one needs to input some additional moment information beyond Proposition 1. It would be interesting to see if any of the other known moment estimates that go beyond this proposition are consistent with this alternative distribution. (UPDATE: it looks like they are, see Remark 7 below.)

To describe this alternative distribution, let us first recall the Weyl description of the CUE measure on the unitary group {U(n)} in terms of the distribution of the phases {\theta_1,\dots,\theta_N \in {\bf R}/{\bf Z}} of the eigenvalues, randomly permuted in any order. This distribution is given by the probability measure

\displaystyle  \frac{1}{N!} |V(\theta)|^2\ d\theta_1 \dots d\theta_N; \ \ \ \ \ (20)

where

\displaystyle  V(\theta) := \prod_{1 \leq i<j \leq N} (e(\theta_i)-e(\theta_j))

is the Vandermonde determinant; see for instance this previous blog post for the derivation of a very similar formula for the GUE distribution, which can be adapted to CUE without much difficulty. To see that this is a probability measure, first observe the Vandermonde determinant identity

\displaystyle  V(\theta) = \sum_{\pi \in S_N} \mathrm{sgn}(\pi) e(\theta \cdot \pi(\rho))

where {\theta := (\theta_1,\dots,\theta_N)}, {\cdot} denotes the dot product, and {\rho := (1,2,\dots,N)} is the “long word”, which implies that (20) is a trigonometric series with constant term {1}; it is also clearly non-negative, so it is a probability measure. One can thus generate a random CUE matrix by first drawing {(\theta_1,\dots,\theta_n) \in ({\bf R}/{\bf Z})^N} using the probability measure (20), and then generating {U} to be a random unitary matrix with eigenvalues {e(\theta_1),\dots,e(\theta_N)}.

For the alternative distribution, we first draw {(\theta_1,\dots,\theta_N)} on the discrete torus {(\frac{1}{2N}{\bf Z}/{\bf Z})^N} (thus each {\theta_j} is a {2N^{th}} root of unity) with probability density function

\displaystyle  \frac{1}{(2N)^N} \frac{1}{N!} |V(\theta)|^2 \ \ \ \ \ (21)

shift by a phase {\alpha \in {\bf R}/{\bf Z}} drawn uniformly at random, and then select {U} to be a random unitary matrix with eigenvalues {e^{i(\theta_1+\alpha)}, \dots, e^{i(\theta_N+\alpha)}}. Let us first verify that (21) is a probability density function. Clearly it is non-negative. It is the linear combination of exponentials of the form {e(\theta \cdot (\pi(\rho)-\pi'(\rho))} for {\pi,\pi' \in S_N}. The diagonal contribution {\pi=\pi'} gives the constant function {\frac{1}{(2N)^N}}, which has total mass one. All of the other exponentials have a frequency {\pi(\rho)-\pi'(\rho)} that is not a multiple of {2N}, and hence will have mean zero on {(\frac{1}{2N}{\bf Z}/{\bf Z})^N}. The claim follows.

From construction it is clear that the matrix {U} drawn from this alternative distribution will have all eigenvalue phase spacings be a non-zero multiple of {1/2N}. Now we verify that the alternative distribution also obeys Proposition 1. The alternative distribution remains invariant under rotation by phases, so the claim is again clear when (8) fails. Inspecting the proof of that proposition, we see that it suffices to show that the Schur polynomials {s_\lambda} with {\lambda} of size at most {N} and of equal size remain orthonormal with respect to the alternative measure. That is to say,

\displaystyle  \int_{U(N)} s_\lambda(U) \overline{s_{\lambda'}(U)}\ d\mu_{\mathrm{CUE}}(U) = \int_{U(N)} s_\lambda(U) \overline{s_{\lambda'}(U)}\ d\mu_{\mathrm{ACUE}}(U)

when {\lambda,\lambda'} have size equal to each other and at most {N}. In this case the phase {\alpha} in the definition of {U} is irrelevant. In terms of eigenvalue measures, we are then reduced to showing that

\displaystyle  \int_{({\bf R}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2\ d\theta = \frac{1}{(2N)^N} \sum_{\theta \in (\frac{1}{2N}{\bf Z}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2.

By Fourier decomposition, it then suffices to show that the trigonometric polynomial {s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2} does not contain any components of the form {e( \theta \cdot 2N k)} for some non-zero lattice vector {k \in {\bf Z}^N}. But we have already observed that {|V(\theta)|^2} is a linear combination of plane waves of the form {e(\theta \cdot (\pi(\rho)-\pi'(\rho))} for {\pi,\pi' \in S_N}. Also, as is well known, {s_\lambda(\theta)} is a linear combination of plane waves {e( \theta \cdot \kappa )} where {\kappa} is majorised by {\lambda}, and similarly {s_{\lambda'}(\theta)} is a linear combination of plane waves {e( \theta \cdot \kappa' )} where {\kappa'} is majorised by {\lambda'}. So the product {s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2} is a linear combination of plane waves of the form {e(\theta \cdot (\kappa - \kappa' + \pi(\rho) - \pi'(\rho)))}. But every coefficient of the vector {\kappa - \kappa' + \pi(\rho) - \pi'(\rho)} lies between {1-2N} and {2N-1}, and so cannot be of the form {2Nk} for any non-zero lattice vector {k}, giving the claim.

Example 4 If {N=2}, then the distribution (21) assigns a probability of {\frac{1}{4^2 2!} 2} to any pair {(\theta_1,\theta_2) \in (\frac{1}{4} {\bf Z}/{\bf Z})^2} that is a permuted rotation of {(0,\frac{1}{4})}, and a probability of {\frac{1}{4^2 2!} 4} to any pair that is a permuted rotation of {(0,\frac{1}{2})}. Thus, a matrix {U} drawn from the alternative distribution will be conjugate to a phase rotation of {\mathrm{diag}(1, i)} with probability {1/2}, and to {\mathrm{diag}(1,-1)} with probability {1/2}.

A similar computation when {N=3} gives {U} conjugate to a phase rotation of {\mathrm{diag}(1, e(1/6), e(1/3))} with probability {1/12}, to a phase rotation of {\mathrm{diag}( 1, e(1/6), -1)} or its adjoint with probability of {1/3} each, and a phase rotation of {\mathrm{diag}(1, e(1/3), e(2/3))} with probability {1/4}.

Remark 5 For large {N} it does not seem that this specific alternative distribution is the only distribution consistent with Proposition 1 and which has all phase spacings a non-zero multiple of {1/2N}; in particular, it may not be the only distribution consistent with a Siegel zero. Still, it is a very explicit distribution that might serve as a test case for the limitations of various arguments for controlling quantities such as the largest or smallest spacing between zeroes of zeta. The ACUE is in some sense the distribution that maximally resembles CUE (in the sense that it has the greatest number of Fourier coefficients agreeing) while still also being consistent with the Alternative Hypothesis, and so should be the most difficult enemy to eliminate if one wishes to disprove that hypothesis.

In some cases, even just a tiny improvement in known results would be able to exclude the alternative hypothesis. For instance, if the alternative hypothesis held, then {|\mathrm{tr}(U^j)|} is periodic in {j} with period {2N}, so from Proposition 1 for the alternative distribution one has

\displaystyle  {\bf E}_{\mathrm{ACUE}} |\mathrm{tr} U^j|^2 = \min_{k \in {\bf Z}} |j-2Nk|

which differs from (13) for any {|j| > N}. (This fact was implicitly observed recently by Baluyot, in the original context of the zeta function.) Thus a verification of the pair correlation conjecture (17) for even a single {j} with {|j| > N} would rule out the alternative hypothesis. Unfortunately, such a verification appears to be on comparable difficulty with (an averaged version of) the Hardy-Littlewood conjecture, with power saving error term. (This is consistent with the fact that Siegel zeroes can cause distortions in the Hardy-Littlewood conjecture, as (implicitly) discussed in this previous blog post.)

Remark 6 One can view the CUE as normalised Lebesgue measure on {U(N)} (viewed as a smooth submanifold of {{\bf C}^{N^2}}). One can similarly view ACUE as normalised Lebesgue measure on the (disconnected) smooth submanifold of {U(N)} consisting of those unitary matrices whose phase spacings are non-zero integer multiples of {1/2N}; informally, ACUE is CUE restricted to this lower dimensional submanifold. As is well known, the phases of CUE eigenvalues form a determinantal point process with kernel {K(\theta,\theta') = \frac{1}{N} \sum_{j=0}^{N-1} e(j(\theta - \theta'))} (or one can equivalently take {K(\theta,\theta') = \frac{\sin(\pi N (\theta-\theta'))}{N\sin(\pi(\theta-\theta'))}}; in a similar spirit, the phases of ACUE eigenvalues, once they are rotated to be {2N^{th}} roots of unity, become a discrete determinantal point process on those roots of unity with exactly the same kernel (except for a normalising factor of {\frac{1}{2}}). In particular, the {k}-point correlation functions of ACUE (after this rotation) are precisely the restriction of the {k}-point correlation functions of CUE after normalisation, that is to say they are proportional to {\mathrm{det}( K( \theta_i,\theta_j) )_{1 \leq i,j \leq k}}.

Remark 7 One family of estimates that go beyond the Rudnick-Sarnak family of estimates are twisted moment estimates for the zeta function, such as ones that give asymptotics for

\displaystyle  \int_T^{2T} |\zeta(\frac{1}{2}+it)|^{2k} |Q(\frac{1}{2}+it)|^2\ dt

for some small even exponent {2k} (almost always {2} or {4}) and some short Dirichlet polynomial {Q}; see for instance this paper of Bettin, Bui, Li, and Radziwill for some examples of such estimates. The analogous unitary matrix average would be something like

\displaystyle  {\bf E}_t |P_t(1)|^{2k} |Q_t(1)|^2

where {Q_t} is now some random medium degree polynomial that depends on the unitary matrix {U} associated to {P_t} (and in applications will typically also contain some negative power of {\exp(A_t)} to cancel the corresponding powers of {\exp(A_t)} in {|P_t(1)|^{2k}}). Unfortunately such averages generally are unable to distinguish the CUE from the ACUE. For instance, if all the coefficients of {Q} involve products of traces {\mathrm{tr}(U^k)} of total order less than {N-k}, then in terms of the eigenvalue phases {\theta}, {|Q(1)|^2} is a linear combination of plane waves {e(\theta \cdot \xi)} where the frequencies {\xi} have coefficients of magnitude less than {N-k}. On the other hand, as each coefficient of {P_t} is an elementary symmetric function of the eigenvalues, {P_t(1)} is a linear combination of plane waves {e(\theta \cdot \xi)} where the frequencies {\xi} have coefficients of magnitude at most {1}. Thus {|P_t(1)|^{2k} |Q_t(1)|^2} is a linear combination of plane waves where the frequencies {\xi} have coefficients of magnitude less than {N}, and thus is orthogonal to the difference between the CUE and ACUE measures on the phase torus {({\bf R}/{\bf Z})^n} by the previous arguments. In other words, {|P_t(1)|^{2k} |Q_t(1)|^2} has the same expectation with respect to ACUE as it does with respect to CUE. Thus one can only start distinguishing CUE from ACUE if the mollifier {Q_t} has degree close to or exceeding {N}, which corresponds to Dirichlet polynomials {Q} of length close to or exceeding {T}, which is far beyond current technology for such moment estimates.

Remark 8 The GUE hypothesis for the zeta function asserts that the average

\displaystyle  \lim_{T \rightarrow \infty} \frac{1}{T} \int_T^{2T} \sum_{\gamma_1,\dots,\gamma_n \hbox{ distinct}} \eta( \frac{\log T}{2\pi}(\gamma_1-t),\dots, \frac{\log T}{2\pi}(\gamma_k-t))\ dt \ \ \ \ \ (22)

is equal to

\displaystyle  \int_{{\bf R}^n} \eta(x) \det(K(x_i-x_j))_{1 \leq i,j \leq k}\ dx_1 \dots dx_k \ \ \ \ \ (23)

for any {k \geq 1} and any test function {\eta: {\bf R}^k \rightarrow {\bf C}}, where {K(x) := \frac{\sin \pi x}{\pi x}} is the Dyson sine kernel and {\gamma_i} are the ordinates of zeroes of the zeta function. This corresponds to the CUE distribution for {U}. The ACUE distribution then corresponds to an “alternative gaussian unitary ensemble (AGUE)” hypothesis, in which the average (22) is instead predicted to equal a Riemann sum version of the integral (23):

\displaystyle  \int_0^1 2^{-k} \sum_{x_1,\dots,x_k \in \frac{1}{2} {\bf Z} + \theta} \eta(x) \det(K(x_i-x_j))_{1 \leq i,j \leq k}\ d\theta.

This is a stronger version of the alternative hypothesis that the spacing between adjacent zeroes is almost always approximately a half-integer multiple of the mean spacing. I do not know of any known moment estimates for Dirichlet series that is able to eliminate this AGUE hypothesis (even assuming GRH). (UPDATE: These facts have also been independently observed in forthcoming work of Lagarias and Rodgers.)

A useful rule of thumb in complex analysis is that holomorphic functions {f(z)} behave like large degree polynomials {P(z)}. This can be evidenced for instance at a “local” level by the Taylor series expansion for a complex analytic function in the disk, or at a “global” level by factorisation theorems such as the Weierstrass factorisation theorem (or the closely related Hadamard factorisation theorem). One can truncate these theorems in a variety of ways (e.g., Taylor’s theorem with remainder) to be able to approximate a holomorphic function by a polynomial on various domains.

In some cases it can be convenient instead to work with polynomials {P(Z)} of another variable {Z} such as {Z = e^{2\pi i z}} (or more generally {Z=e^{2\pi i z/N}} for a scaling parameter {N}). In the case of the Riemann zeta function, defined by meromorphic continuation of the formula

\displaystyle  \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} \ \ \ \ \ (1)

one ends up having the following heuristic approximation in the neighbourhood of a point {\frac{1}{2}+it} on the critical line:

Heuristic 1 (Polynomial approximation) Let {T \ggg 1} be a height, let {t} be a “typical” element of {[T,2T]}, and let {1 \lll N \ll \log T} be an integer. Let {\phi_t = \phi_{t,T}: {\bf C} \rightarrow {\bf C}} be the linear change of variables

\displaystyle  \phi_t(z) := \frac{1}{2} + it - \frac{2\pi i z}{\log T}.

Then one has an approximation

\displaystyle  \zeta( \phi_t(z) ) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (2)

for {z = o(N)} and some polynomial {P_t = P_{t,T}} of degree {N}.

The requirement {z=o(N)} is necessary since the right-hand side is periodic with period {N} in the {z} variable (or period {\frac{2\pi i N}{\log T}} in the {s = \phi_t(z)} variable), whereas the zeta function is not expected to have any such periodicity, even approximately.

Let us give two non-rigorous justifications of this heuristic. Firstly, it is standard that inside the critical strip (with {\mathrm{Im}(s) = O(T)}) we have an approximate form

\displaystyle  \zeta(s) \approx \sum_{n \leq T} \frac{1}{n^s}

of (11). If we group the integers {n} from {1} to {T} into {N} bins depending on what powers of {T^{1/N}} they lie between, we thus have

\displaystyle  \zeta(s) \approx \sum_{j=0}^N \sum_{T^{j/N} \leq n < T^{(j+1)/N}} \frac{1}{n^s}

For {s = \phi_t(z)} with {z = o(N)} and {T^{j/N} \leq n < T^{(j+1)/N}} we heuristically have

\displaystyle  \frac{1}{n^s} \approx \frac{1}{n^{\frac{1}{2}+it}} e^{2\pi i j z / N}

and so

\displaystyle  \zeta(s) \approx \sum_{j=0}^N a_j(t) (e^{2\pi i z/N})^j

where {a_j(t)} are the partial Dirichlet series

\displaystyle  a_j(t) \approx \sum_{T^{j/N} \leq n < T^{(j+1)/N}} \frac{1}{n^{\frac{1}{2}+it}}. \ \ \ \ \ (3)

This gives the desired polynomial approximation.

A second non-rigorous justification is as follows. From factorisation theorems such as the Hadamard factorisation theorem we expect to have

\displaystyle  \zeta(s) \propto \prod_\rho (s-\rho) \times \dots

where {\rho} runs over the non-trivial zeroes of {\zeta}, and there are some additional factors arising from the trivial zeroes and poles of {\zeta} which we will ignore here; we will also completely ignore the issue of how to renormalise the product to make it converge properly. In the region {s = \frac{1}{2} + it + o( N / \log T) = \phi_t( \{ z: z = o(N) \})}, the dominant contribution to this product (besides multiplicative constants) should arise from zeroes {\rho} that are also in this region. The Riemann-von Mangoldt formula suggests that for “typical” {t} one should have about {N} such zeroes. If one lets {\rho_1,\dots,\rho_N} be any enumeration of {N} zeroes closest to {\frac{1}{2}+it}, and then repeats this set of zeroes periodically by period {\frac{2\pi i N}{\log T}}, one then expects to have an approximation of the form

\displaystyle  \zeta(s) \propto \prod_{j=1}^N \prod_{k \in {\bf Z}} (s-(\rho_j+\frac{2\pi i kN}{\log T}) )

again ignoring all issues of convergence. If one writes {s = \phi_t(z)} and {\rho_j = \phi_t(\lambda_j)}, then Euler’s famous product formula for sine basically gives

\displaystyle  \prod_{k \in {\bf Z}} (s-(\rho_j+\frac{2\pi i kN}{\log T}) ) \propto \prod_{k \in {\bf Z}} (z - (\lambda_j+2\pi k N) )

\displaystyle  \propto (e^{2\pi i z/N} - e^{2\pi i \lambda j/N})

(here we are glossing over some technical issues regarding renormalisation of the infinite products, which can be dealt with by studying the asymptotics as {\mathrm{Im}(z) \rightarrow \infty}) and hence we expect

\displaystyle  \zeta(s) \propto \prod_{j=1}^N (e^{2\pi i z/N} - e^{2\pi i \lambda j/N}).

This again gives the desired polynomial approximation.

Below the fold we give a rigorous version of the second argument suitable for “microscale” analysis. More precisely, we will show

Theorem 2 Let {N = N(T)} be an integer going sufficiently slowly to infinity. Let {W_0 \ll N} go to zero sufficiently slowly depending on {N}. Let {t} be drawn uniformly at random from {[T,2T]}. Then with probability {1-o(1)} (in the limit {T \rightarrow \infty}), and possibly after adjusting {N} by {1}, there exists a polynomial {P_t(Z)} of degree {N} and obeying the functional equation (9) below, such that

\displaystyle  \zeta( \phi_t(z) ) = (1+o(1)) P_t( e^{2\pi i z/N} ) \ \ \ \ \ (4)

whenever {|z| \leq W_0}.

It should be possible to refine the arguments to extend this theorem to the mesoscale setting by letting {N} be anything growing like {o(\log T)}, and {W_0} anything growing like {o(N)}; also we should be able to delete the need to adjust {N} by {1}. We have not attempted these optimisations here.

Many conjectures and arguments involving the Riemann zeta function can be heuristically translated into arguments involving the polynomials {P_t(Z)}, which one can view as random degree {N} polynomials if {t} is interpreted as a random variable drawn uniformly at random from {[T,2T]}. These can be viewed as providing a “toy model” for the theory of the Riemann zeta function, in which the complex analysis is simplified to the study of the zeroes and coefficients of this random polynomial (for instance, the role of the gamma function is now played by a monomial in {Z}). This model also makes the zeta function theory more closely resemble the function field analogues of this theory (in which the analogue of the zeta function is also a polynomial (or a rational function) in some variable {Z}, as per the Weil conjectures). The parameter {N} is at our disposal to choose, and reflects the scale {\approx N/\log T} at which one wishes to study the zeta function. For “macroscopic” questions, at which one wishes to understand the zeta function at unit scales, it is natural to take {N \approx \log T} (or very slightly larger), while for “microscopic” questions one would take {N} close to {1} and only growing very slowly with {T}. For the intermediate “mesoscopic” scales one would take {N} somewhere between {1} and {\log T}. Unfortunately, the statistical properties of {P_t} are only understood well at a conjectural level at present; even if one assumes the Riemann hypothesis, our understanding of {P_t} is largely restricted to the computation of low moments (e.g., the second or fourth moments) of various linear statistics of {P_t} and related functions (e.g., {1/P_t}, {P'_t/P_t}, or {\log P_t}).

Let’s now heuristically explore the polynomial analogues of this theory in a bit more detail. The Riemann hypothesis basically corresponds to the assertion that all the {N} zeroes of the polynomial {P_t(Z)} lie on the unit circle {|Z|=1} (which, after the change of variables {Z = e^{2\pi i z/N}}, corresponds to {z} being real); in a similar vein, the GUE hypothesis corresponds to {P_t(Z)} having the asymptotic law of a random scalar {a_N(t)} times the characteristic polynomial of a random unitary {N \times N} matrix. Next, we consider what happens to the functional equation

\displaystyle  \zeta(s) = \chi(s) \zeta(1-s) \ \ \ \ \ (5)

where

\displaystyle  \chi(s) := 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s).

A routine calculation involving Stirling’s formula reveals that

\displaystyle  \chi(\frac{1}{2}+it) = (1+o(1)) e^{-2\pi i L(t)} \ \ \ \ \ (6)

with {L(t) := \frac{t}{2\pi} \log \frac{t}{2\pi} - \frac{t}{2\pi} + \frac{7}{8}}; one also has the closely related approximation

\displaystyle  \frac{\chi'}{\chi}(s) = -\log T + O(1) \ \ \ \ \ (7)

and hence

\displaystyle  \chi(\phi_t(z)) = (1+o(1)) e^{-2\pi i \theta(t)} e^{2\pi i z} \ \ \ \ \ (8)

when {z = o(\log T)}. Since {\zeta(1-s) = \overline{\zeta(\overline{1-s})}}, applying (5) with {s = \phi_t(z)} and using the approximation (2) suggests a functional equation for {P_t}:

\displaystyle  P_t(e^{2\pi i z/N}) = e^{-2\pi i L(t)} e^{2\pi i z} \overline{P_t(e^{2\pi i \overline{z}/N})}

or in terms of {Z := e^{2\pi i z/N}},

\displaystyle  P_t(Z) = e^{-2\pi i L(t)} Z^N \overline{P_t}(1/Z) \ \ \ \ \ (9)

where {\overline{P_t}(Z) := \overline{P_t(\overline{Z})}} is the polynomial {P_t} with all the coefficients replaced by their complex conjugate. Thus if we write

\displaystyle  P_t(Z) = \sum_{j=0}^N a_j Z^j

then the functional equation can be written as

\displaystyle  a_j(t) = e^{-2\pi i L(t)} \overline{a_{N-j}(t)}.

We remark that if we use the heuristic (3) (interpreting the cutoffs in the {n} summation in a suitably vague fashion) then this equation can be viewed as an instance of the Poisson summation formula.

Another consequence of the functional equation is that the zeroes of {P_t} are symmetric with respect to inversion {Z \mapsto 1/\overline{Z}} across the unit circle. This is of course consistent with the Riemann hypothesis, but does not obviously imply it. The phase {L(t)} is of little consequence in this functional equation; one could easily conceal it by working with the phase rotation {e^{\pi i L(t)} P_t} of {P_t} instead.

One consequence of the functional equation is that {e^{\pi i L(t)} e^{-i N \theta/2} P_t(e^{i\theta})} is real for any {\theta \in {\bf R}}; the same is then true for the derivative {e^{\pi i L(t)} e^{i N \theta} (i e^{i\theta} P'_t(e^{i\theta}) - i \frac{N}{2} P_t(e^{i\theta})}. Among other things, this implies that {P'_t(e^{i\theta})} cannot vanish unless {P_t(e^{i\theta})} does also; thus the zeroes of {P'_t} will not lie on the unit circle except where {P_t} has repeated zeroes. The analogous statement is true for {\zeta}; the zeroes of {\zeta'} will not lie on the critical line except where {\zeta} has repeated zeroes.

Relating to this fact, it is a classical result of Speiser that the Riemann hypothesis is true if and only if all the zeroes of the derivative {\zeta'} of the zeta function in the critical strip lie on or to the right of the critical line. The analogous result for polynomials is

Proposition 3 We have

\displaystyle  \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| > 1: P'_t(Z) = 0 \}

(where all zeroes are counted with multiplicity.) In particular, the zeroes of {P_t(Z)} all lie on the unit circle if and only if the zeroes of {P'_t(Z)} lie in the closed unit disk.

Proof: From the functional equation we have

\displaystyle  \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| > 1: P_t(Z) = 0 \}.

Thus it will suffice to show that {P_t} and {P'_t} have the same number of zeroes outside the closed unit disk.

Set {f(z) := z \frac{P'(z)}{P(z)}}, then {f} is a rational function that does not have a zero or pole at infinity. For {e^{i\theta}} not a zero of {P_t}, we have already seen that {e^{\pi i L(t)} e^{-i N \theta/2} P_t(e^{i\theta})} and {e^{\pi i L(t)} e^{i N \theta} (i e^{i\theta} P'_t(e^{i\theta}) - i \frac{N}{2} P_t(e^{i\theta})} are real, so on dividing we see that {i f(e^{i\theta}) - \frac{iN}{2}} is always real, that is to say

\displaystyle  \mathrm{Re} f(e^{i\theta}) = \frac{N}{2}.

(This can also be seen by writing {f(e^{i\theta}) = \sum_\lambda \frac{1}{1-e^{-i\theta} \lambda}}, where {\lambda} runs over the zeroes of {P_t}, and using the fact that these zeroes are symmetric with respect to reflection across the unit circle.) When {e^{i\theta}} is a zero of {P_t}, {f(z)} has a simple pole at {e^{i\theta}} with residue a positive multiple of {e^{i\theta}}, and so {f(z)} stays on the right half-plane if one traverses a semicircular arc around {e^{i\theta}} outside the unit disk. From this and continuity we see that {f} stays on the right-half plane in a circle slightly larger than the unit circle, and hence by the argument principle it has the same number of zeroes and poles outside of this circle, giving the claim. \Box

From the functional equation and the chain rule, {Z} is a zero of {P'_t} if and only if {1/\overline{Z}} is a zero of {N P_t - P'_t}. We can thus write the above proposition in the equivalent form

\displaystyle  \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| < 1: NP_t(Z) - P'_t(Z) = 0 \}.

One can use this identity to get a lower bound on the number of zeroes of {P_t} by the method of mollifiers. Namely, for any other polynomial {M_t}, we clearly have

\displaystyle  \# \{ |Z| = 1: P_t(Z) = 0 \}

\displaystyle \geq N - 2 \# \{ |Z| < 1: M_t(Z)(NP_t(Z) - P'_t(Z)) = 0 \}.

By Jensen’s formula, we have for any {r>1} that

\displaystyle  \log |M_t(0)| |NP_t(0)-P'_t(0)|

\displaystyle \leq -(\log r) \# \{ |Z| < 1: M_t(Z)(NP_t(Z) - P'_t(Z)) = 0 \}

\displaystyle + \frac{1}{2\pi} \int_0^{2\pi} \log |M_t(re^{i\theta})(NP_t(e^{i\theta}) - P'_t(re^{i\theta}))|\ d\theta.

We therefore have

\displaystyle  \# \{ |Z| = 1: P_t(Z) = 0 \} \geq N + \frac{2}{\log r} \log |M_t(0)| |NP_t(0)-P'_t(0)|

\displaystyle - \frac{1}{\log r} \frac{1}{2\pi} \int_0^{2\pi} \log |M_t(re^{i\theta})(NP_t(e^{i\theta}) - P'_t(re^{i\theta}))|^2\ d\theta.

As the logarithm function is concave, we can apply Jensen’s inequality to conclude

\displaystyle  {\bf E} \# \{ |Z| = 1: P_t(Z) = 0 \} \geq N

\displaystyle + {\bf E} \frac{2}{\log r} \log |M_t(0)| |NP_t(0)-P'_t(0)|

\displaystyle - \frac{1}{\log r} \log \left( \frac{1}{2\pi} \int_0^{2\pi} {\bf E} |M_t(re^{i\theta})(NP_t(e^{i\theta}) - P'_t(re^{i\theta}))|^2\ d\theta\right).

where the expectation is over the {t} parameter. It turns out that by choosing the mollifier {M_t} carefully in order to make {M_t P_t} behave like the function {1} (while keeping the degree {M_t} small enough that one can compute the second moment here), and then optimising in {r}, one can use this inequality to get a positive fraction of zeroes of {P_t} on the unit circle on average. This is the polynomial analogue of a classical argument of Levinson, who used this to show that at least one third of the zeroes of the Riemann zeta function are on the critical line; all later improvements on this fraction have been based on some version of Levinson’s method, mainly focusing on more advanced choices for the mollifier {M_t} and of the differential operator {N - \partial_z} that implicitly appears in the above approach. (The most recent lower bound I know of is {0.4191637}, due to Pratt and Robles. In principle (as observed by Farmer) this bound can get arbitrarily close to {1} if one is allowed to use arbitrarily long mollifiers, but establishing this seems of comparable difficulty to unsolved problems such as the pair correlation conjecture; see this paper of Radziwill for more discussion.) A variant of these techniques can also establish “zero density estimates” of the following form: for any {W \geq 1}, the number of zeroes of {P_t} that lie further than {\frac{W}{N}} from the unit circle is of order {O( e^{-cW} N )} on average for some absolute constant {c>0}. Thus, roughly speaking, most zeroes of {P_t} lie within {O(1/N)} of the unit circle. (Analogues of these results for the Riemann zeta function were worked out by Selberg, by Jutila, and by Conrey, with increasingly strong values of {c}.)

The zeroes of {P'_t} tend to live somewhat closer to the origin than the zeroes of {P_t}. Suppose for instance that we write

\displaystyle  P_t(Z) = \sum_{j=0}^N a_j(t) Z^j = a_N(t) \prod_{j=1}^N (Z - \lambda_j)

where {\lambda_1,\dots,\lambda_N} are the zeroes of {P_t(Z)}, then by evaluating at zero we see that

\displaystyle  \lambda_1 \dots \lambda_N = (-1)^N a_0(t) / a_N(t)

and the right-hand side is of unit magnitude by the functional equation. However, if we differentiate

\displaystyle  P'_t(Z) = \sum_{j=1}^N a_j(t) j Z^{j-1} = N a_N(t) \prod_{j=1}^{N-1} (Z - \lambda'_j)

where {\lambda'_1,\dots,\lambda'_{N-1}} are the zeroes of {P'_t}, then by evaluating at zero we now see that

\displaystyle  \lambda'_1 \dots \lambda'_{N-1} = (-1)^N a_1(t) / N a_N(t).

The right-hand side would now be typically expected to be of size {O(1/N) \approx \exp(- \log N)}, and so on average we expect the {\lambda'_j} to have magnitude like {\exp( - \frac{\log N}{N} )}, that is to say pushed inwards from the unit circle by a distance roughly {\frac{\log N}{N}}. The analogous result for the Riemann zeta function is that the zeroes of {\zeta'(s)} at height {\sim T} lie at a distance roughly {\frac{\log\log T}{\log T}} to the right of the critical line on the average; see this paper of Levinson and Montgomery for a precise statement.

Read the rest of this entry »

The Polymath15 paper “Effective approximation of heat flow evolution of the Riemann {\xi} function, and a new upper bound for the de Bruijn-Newman constant“, submitted to Research in the Mathematical Sciences, has just been uploaded to the arXiv. This paper records the mix of theoretical and computational work needed to improve the upper bound on the de Bruijn-Newman constant {\Lambda}. This constant can be defined as follows. The function

\displaystyle H_0(z) := \frac{1}{8} \xi\left(\frac{1}{2} + \frac{iz}{2}\right),

where {\xi} is the Riemann {\xi} function

\displaystyle \xi(s) := \frac{s(s-1)}{2} \pi^{-s/2} \Gamma\left(\frac{s}{2}\right) \zeta(s)

has a Fourier representation

\displaystyle H_0(z) = \int_0^\infty \Phi(u) \cos(zu)\ du

where {\Phi} is the super-exponentially decaying function

\displaystyle \Phi(u) := \sum_{n=1}^\infty (2\pi^2 n^4 e^{9u} - 3\pi n^2 e^{5u} ) \exp(-\pi n^2 e^{4u} ).

The Riemann hypothesis is equivalent to the claim that all the zeroes of {H_0} are real. De Bruijn introduced (in different notation) the deformations

\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du

of {H_0}; one can view this as the solution to the backwards heat equation {\partial_t H_t = -\partial_{zz} H_t} starting at {H_0}. From the work of de Bruijn and of Newman, it is known that there exists a real number {\Lambda} – the de Bruijn-Newman constant – such that {H_t} has all zeroes real for {t \geq \Lambda} and has at least one non-real zero for {t < \Lambda}. In particular, the Riemann hypothesis is equivalent to the assertion {\Lambda \leq 0}. Prior to this paper, the best known bounds for this constant were

\displaystyle 0 \leq \Lambda < 1/2

with the lower bound due to Rodgers and myself, and the upper bound due to Ki, Kim, and Lee. One of the main results of the paper is to improve the upper bound to

\displaystyle \Lambda \leq 0.22. \ \ \ \ \ (1)

At a purely numerical level this gets “closer” to proving the Riemann hypothesis, but the methods of proof take as input a finite numerical verification of the Riemann hypothesis up to some given height {T} (in our paper we take {T \sim 3 \times 10^{10}}) and converts this (and some other numerical verification) to an upper bound on {\Lambda} that is of order {O(1/\log T)}. As discussed in the final section of the paper, further improvement of the numerical verification of RH would thus lead to modest improvements in the upper bound on {\Lambda}, although it does not seem likely that our methods could for instance improve the bound to below {0.1} without an infeasible amount of computation.

We now discuss the methods of proof. An existing result of de Bruijn shows that if all the zeroes of {H_{t_0}(z)} lie in the strip {\{ x+iy: |y| \leq y_0\}}, then {\Lambda \leq t_0 + \frac{1}{2} y_0^2}; we will verify this hypothesis with {t_0=y_0=0.2}, thus giving (1). Using the symmetries and the known zero-free regions, it suffices to show that

\displaystyle H_{0.2}(x+iy) \neq 0 \ \ \ \ \ (2)

whenever {x \geq 0} and {0.2 \leq y \leq 1}.

For large {x} (specifically, {x \geq 6 \times 10^{10}}), we use effective numerical approximation to {H_t(x+iy)} to establish (2), as discussed in a bit more detail below. For smaller values of {x}, the existing numerical verification of the Riemann hypothesis (we use the results of Platt) shows that

\displaystyle H_0(x+iy) \neq 0

for {0 \leq x \leq 6 \times 10^{10}} and {0.2 \leq y \leq 1}. The problem though is that this result only controls {H_t} at time {t=0} rather than the desired time {t = 0.2}. To bridge the gap we need to erect a “barrier” that, roughly speaking, verifies that

\displaystyle H_t(x+iy) \neq 0 \ \ \ \ \ (3)

for {0 \leq t \leq 0.2}, {x = 6 \times 10^{10} + O(1)}, and {0.2 \leq y \leq 1}; with a little bit of work this barrier shows that zeroes cannot sneak in from the right of the barrier to the left in order to produce counterexamples to (2) for small {x}.

To enforce this barrier, and to verify (2) for large {x}, we need to approximate {H_t(x+iy)} for positive {t}. Our starting point is the Riemann-Siegel formula, which roughly speaking is of the shape

\displaystyle H_0(x+iy) \approx B_0(x+iy) ( \sum_{n=1}^N \frac{1}{n^{\frac{1+y-ix}{2}}} + \gamma_0(x+iy) \sum_{n=1}^N \frac{n^y}{n^{\frac{1+y+ix}{2}}} )

where {N := \sqrt{x/4\pi}}, {B_0(x+iy)} is an explicit “gamma factor” that decays exponentially in {x}, and {\gamma_0(x+iy)} is a ratio of gamma functions that is roughly of size {(x/4\pi)^{-y/2}}. Deforming this by the heat flow gives rise to an approximation roughly of the form

\displaystyle H_t(x+iy) \approx B_t(x+iy) ( \sum_{n=1}^N \frac{b_n^t}{n^{s_*}} + \gamma_t(x+iy) \sum_{n=1}^N \frac{n^y}{n^{\overline{s_*}}} ) \ \ \ \ \ (4)

where {B_t(x+iy)} and {\gamma_t(x+iy)} are variants of {B_0(x+iy)} and {\gamma_0(x+iy)}, {b_n^t := \exp( \frac{t}{4} \log^2 n )}, and {s_*} is an exponent which is roughly {\frac{1+y-ix}{2} + \frac{t}{4} \log \frac{x}{4\pi}}. In particular, for positive values of {t}, {s_*} increases (logarithmically) as {x} increases, and the two sums in the Riemann-Siegel formula become increasingly convergent (even in the face of the slowly increasing coefficients {b_n^t}). For very large values of {x} (in the range {x \geq \exp(C/t)} for a large absolute constant {C}), the {n=1} terms of both sums dominate, and {H_t(x+iy)} begins to behave in a sinusoidal fashion, with the zeroes “freezing” into an approximate arithmetic progression on the real line much like the zeroes of the sine or cosine functions (we give some asymptotic theorems that formalise this “freezing” effect). This lets one verify (2) for extremely large values of {x} (e.g., {x \geq 10^{12}}). For slightly less large values of {x}, we first multiply the Riemann-Siegel formula by an “Euler product mollifier” to reduce some of the oscillation in the sum and make the series converge better; we also use a technical variant of the triangle inequality to improve the bounds slightly. These are sufficient to establish (2) for moderately large {x} (say {x \geq 6 \times 10^{10}}) with only a modest amount of computational effort (a few seconds after all the optimisations; on my own laptop with very crude code I was able to verify all the computations in a matter of minutes).

The most difficult computational task is the verification of the barrier (3), particularly when {t} is close to zero where the series in (4) converge quite slowly. We first use an Euler product heuristic approximation to {H_t(x+iy)} to decide where to place the barrier in order to make our numerical approximation to {H_t(x+iy)} as large in magnitude as possible (so that we can afford to work with a sparser set of mesh points for the numerical verification). In order to efficiently evaluate the sums in (4) for many different values of {x+iy}, we perform a Taylor expansion of the coefficients to factor the sums as combinations of other sums that do not actually depend on {x} and {y} and so can be re-used for multiple choices of {x+iy} after a one-time computation. At the scales we work in, this computation is still quite feasible (a handful of minutes after software and hardware optimisations); if one assumes larger numerical verifications of RH and lowers {t_0} and {y_0} to optimise the value of {\Lambda} accordingly, one could get down to an upper bound of {\Lambda \leq 0.1} assuming an enormous numerical verification of RH (up to height about {4 \times 10^{21}}) and a very large distributed computing project to perform the other numerical verifications.

This post can serve as the (presumably final) thread for the Polymath15 project (continuing this post), to handle any remaining discussion topics for that project.

Joni Teräväinen and I have just uploaded to the arXiv our paper “Value patterns of multiplicative functions and related sequences“, submitted to Forum of Mathematics, Sigma. This paper explores how to use recent technology on correlations of multiplicative (or nearly multiplicative functions), such as the “entropy decrement method”, in conjunction with techniques from additive combinatorics, to establish new results on the sign patterns of functions such as the Liouville function {\lambda}. For instance, with regards to length 5 sign patterns

\displaystyle  (\lambda(n+1),\dots,\lambda(n+5)) \in \{-1,+1\}^5

of the Liouville function, we can now show that at least {24} of the {32} possible sign patterns in {\{-1,+1\}^5} occur with positive upper density. (Conjecturally, all of them do so, and this is known for all shorter sign patterns, but unfortunately {24} seems to be the limitation of our methods.)

The Liouville function can be written as {\lambda(n) = e^{2\pi i \Omega(n)/2}}, where {\Omega(n)} is the number of prime factors of {n} (counting multiplicity). One can also consider the variant {\lambda_3(n) = e^{2\pi i \Omega(n)/3}}, which is a completely multiplicative function taking values in the cube roots of unity {\{1, \omega, \omega^2\}}. Here we are able to show that all {27} sign patterns in {\{1,\omega,\omega^2\}} occur with positive lower density as sign patterns {(\lambda_3(n+1), \lambda_3(n+2), \lambda_3(n+3))} of this function. The analogous result for {\lambda} was already known (see this paper of Matomäki, Radziwiłł, and myself), and in that case it is even known that all sign patterns occur with equal logarithmic density {1/8} (from this paper of myself and Teräväinen), but these techniques barely fail to handle the {\lambda_3} case by itself (largely because the “parity” arguments used in the case of the Liouville function no longer control three-point correlations in the {\lambda_3} case) and an additional additive combinatorial tool is needed. After applying existing technology (such as entropy decrement methods), the problem roughly speaking reduces to locating patterns {a \in A_1, a+r \in A_2, a+2r \in A_3} for a certain partition {G = A_1 \cup A_2 \cup A_3} of a compact abelian group {G} (think for instance of the unit circle {G={\bf R}/{\bf Z}}, although the general case is a bit more complicated, in particular if {G} is disconnected then there is a certain “coprimality” constraint on {r}, also we can allow the {A_1,A_2,A_3} to be replaced by any {A_{c_1}, A_{c_2}, A_{c_3}} with {c_1+c_2+c_3} divisible by {3}), with each of the {A_i} having measure {1/3}. An inequality of Kneser just barely fails to guarantee the existence of such patterns, but by using an inverse theorem for Kneser’s inequality in this previous paper of mine we are able to identify precisely the obstruction for this method to work, and rule it out by an ad hoc method.

The same techniques turn out to also make progress on some conjectures of Erdös-Pomerance and Hildebrand regarding patterns of the largest prime factor {P^+(n)} of a natural number {n}. For instance, we improve results of Erdös-Pomerance and of Balog demonstrating that the inequalities

\displaystyle  P^+(n+1) < P^+(n+2) < P^+(n+3)

and

\displaystyle  P^+(n+1) > P^+(n+2) > P^+(n+3)

each hold for infinitely many {n}, by demonstrating the stronger claims that the inequalities

\displaystyle  P^+(n+1) < P^+(n+2) < P^+(n+3) > P^+(n+4)

and

\displaystyle  P^+(n+1) > P^+(n+2) > P^+(n+3) < P^+(n+4)

each hold for a set of {n} of positive lower density. As a variant, we also show that we can find a positive density set of {n} for which

\displaystyle  P^+(n+1), P^+(n+2), P^+(n+3) > n^\gamma

for any fixed {\gamma < e^{-1/3} = 0.7165\dots} (this improves on a previous result of Hildebrand with {e^{-1/3}} replaced by {e^{-1/2} = 0.6065\dots}. A number of other results of this type are also obtained in this paper.

In order to obtain these sorts of results, one needs to extend the entropy decrement technology from the setting of multiplicative functions to that of what we call “weakly stable sets” – sets {A} which have some multiplicative structure, in the sense that (roughly speaking) there is a set {B} such that for all small primes {p}, the statements {n \in A} and {pn \in B} are roughly equivalent to each other. For instance, if {A} is a level set {A = \{ n: \omega(n) = 0 \hbox{ mod } 3 \}}, one would take {B = \{ n: \omega(n) = 1 \hbox{ mod } 3 \}}; if instead {A} is a set of the form {\{ n: P^+(n) \geq n^\gamma\}}, then one can take {B=A}. When one has such a situation, then very roughly speaking, the entropy decrement argument then allows one to estimate a one-parameter correlation such as

\displaystyle  {\bf E}_n 1_A(n+1) 1_A(n+2) 1_A(n+3)

with a two-parameter correlation such as

\displaystyle  {\bf E}_n {\bf E}_p 1_B(n+p) 1_B(n+2p) 1_B(n+3p)

(where we will be deliberately vague as to how we are averaging over {n} and {p}), and then the use of the “linear equations in primes” technology of Ben Green, Tamar Ziegler, and myself then allows one to replace this average in turn by something like

\displaystyle  {\bf E}_n {\bf E}_r 1_B(n+r) 1_B(n+2r) 1_B(n+3r)

where {r} is constrained to be not divisible by small primes but is otherwise quite arbitrary. This latter average can then be attacked by tools from additive combinatorics, such as translation to a continuous group model (using for instance the Furstenberg correspondence principle) followed by tools such as Kneser’s inequality (or inverse theorems to that inequality).

This is the eleventh research thread of the Polymath15 project to upper bound the de Bruijn-Newman constant {\Lambda}, continuing this post. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

There are currently two strands of activity.  One is writing up the paper describing the combination of theoretical and numerical results needed to obtain the new bound \Lambda \leq 0.22.  The latest version of the writeup may be found here, in this directory.  The theoretical side of things have mostly been written up; the main remaining tasks to do right now are

  1. giving a more detailed description and illustration of the two major numerical verifications, namely the barrier verification that establishes a zero-free region for H_t(x+iy)=0 for 0 \leq t \leq 0.2, 0.2 \leq y \leq 1, |x - 6 \times 10^{10} - 83952| \leq 0.5, and the Dirichlet series bound that establishes a zero-free region for t = 0.2, 0.2 \leq y \leq 1, x \geq 6 \times 10^{10} + 83952; and
  2. giving more detail on the conditional results assuming more numerical verification of RH.

Meanwhile, several of us have been exploring the behaviour of the zeroes of H_t for negative t; this does not directly lead to any new progress on bounding \Lambda (though there is a good chance that it may simplify the proof of \Lambda \geq 0), but there have been some interesting numerical phenomena uncovered, as summarised in this set of slides.  One phenomenon is that for large negative t, many of the complex zeroes begin to organise themselves near the curves

\displaystyle y = -\frac{t}{2} \log \frac{x}{4\pi n(n+1)} - 1.

(An example of the agreement between the zeroes and these curves may be found here.)  We now have a (heuristic) theoretical explanation for this; we should have an approximation

\displaystyle H_t(x+iy) \approx B_t(x+iy) \sum_{n=1}^\infty \frac{b_n^t}{n^{s_*}}

in this region (where B_t, b_n^t, n^{s_*} are defined in equations (11), (15), (17) of the writeup, and the above curves arise from (an approximation of) those locations where two adjacent terms \frac{b_n^t}{n^{s_*}}, \frac{b_{n+1}^t}{(n+1)^{s_*}} in this series have equal magnitude (with the other terms being of lower order).

However, we only have a partial explanation at present of the interesting behaviour of the real zeroes at negative t, for instance the surviving zeroes at extremely negative values of t appear to lie on the curve where the quantity N is close to a half-integer, where

\displaystyle \tilde x := x + \frac{\pi t}{4}

\displaystyle N := \sqrt{\frac{\tilde x}{4\pi}}

The remaining zeroes exhibit a pattern in (N,u) coordinates that is approximately 1-periodic in N, where

\displaystyle u := \frac{4\pi |t|}{\tilde x}.

A plot of the zeroes in these coordinates (somewhat truncated due to the numerical range) may be found here.

We do not yet have a total explanation of the phenomena seen in this picture.  It appears that we have an approximation

\displaystyle H_t(x) \approx A_t(x) \sum_{n=1}^\infty \exp( -\frac{|t| \log^2(n/N)}{4(1-\frac{iu}{8\pi})} - \frac{1+i\tilde x}{2} \log(n/N) )

where A_t(x) is the non-zero multiplier

\displaystyle A_t(x) := e^{\pi^2 t/64} M_0(\frac{1+i\tilde x}{2}) N^{-\frac{1+i\tilde x}{2}} \sqrt{\frac{\pi}{1-\frac{iu}{8\pi}}}

and

\displaystyle M_0(s) := \frac{1}{8}\frac{s(s-1)}{2}\pi^{-s/2} \sqrt{2\pi} \exp( (\frac{s}{2}-\frac{1}{2}) \log \frac{s}{2} - \frac{s}{2} )

The derivation of this formula may be found in this wiki page.  However our initial attempts to simplify the above approximation further have proven to be somewhat inaccurate numerically (in particular giving an incorrect prediction for the location of zeroes, as seen in this picture).  We are in the process of using numerics to try to resolve the discrepancies (see this page for some code and discussion).

 

Kaisa Matomäki, Maksym Radziwill, and I just uploaded to the arXiv our paper “Fourier uniformity of bounded multiplicative functions in short intervals on average“. This paper is the outcome of our attempts during the MSRI program in analytic number theory last year to attack the local Fourier uniformity conjecture for the Liouville function {\lambda}. This conjecture generalises a landmark result of Matomäki and Radziwill, who show (among other things) that one has the asymptotic

\displaystyle  \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n)|\ dx = o(HX) \ \ \ \ \ (1)

whenever {X \rightarrow \infty} and {H = H(X)} goes to infinity as {X \rightarrow \infty}. Informally, this says that the Liouville function has small mean for almost all short intervals {[x,x+H]}. The remarkable thing about this theorem is that there is no lower bound on how {H} goes to infinity with {X}; one can take for instance {H = \log\log\log X}. This lack of lower bound was crucial when I applied this result (or more precisely, a generalisation of this result to arbitrary non-pretentious bounded multiplicative functions) a few years ago to solve the Erdös discrepancy problem, as well as a logarithmically averaged two-point Chowla conjecture, for instance it implies that

\displaystyle  \sum_{n \leq X} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log X).

The local Fourier uniformity conjecture asserts the stronger asymptotic

\displaystyle  \int_X^{2X} \sup_{\alpha \in {\bf R}} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|\ dx = o(HX) \ \ \ \ \ (2)

under the same hypotheses on {H} and {X}. As I worked out in a previous paper, this conjecture would imply a logarithmically averaged three-point Chowla conjecture, implying for instance that

\displaystyle  \sum_{n \leq X} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log X).

This particular bound also follows from some slightly different arguments of Joni Teräväinen and myself, but the implication would also work for other non-pretentious bounded multiplicative functions, whereas the arguments of Joni and myself rely more heavily on the specific properties of the Liouville function (in particular that {\lambda(p)=-1} for all primes {p}).

There is also a higher order version of the local Fourier uniformity conjecture in which the linear phase {{}e(-\alpha n)} is replaced with a polynomial phase such as {e(-\alpha_d n^d - \dots - \alpha_1 n - \alpha_0)}, or more generally a nilsequence {\overline{F(g(n) \Gamma)}}; as shown in my previous paper, this conjecture implies (and is in fact equivalent to, after logarithmic averaging) a logarithmically averaged version of the full Chowla conjecture (not just the two-point or three-point versions), as well as a logarithmically averaged version of the Sarnak conjecture.

The main result of the current paper is to obtain some cases of the local Fourier uniformity conjecture:

Theorem 1 The asymptotic (2) is true when {H = X^\theta} for a fixed {\theta > 0}.

Previously this was known for {\theta > 5/8} by the work of Zhan (who in fact proved the stronger pointwise assertion {\sup_{\alpha \in {\bf R}} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|= o(H)} for {X \leq x \leq 2X} in this case). In a previous paper with Kaisa and Maksym, we also proved a weak version

\displaystyle  \sup_{\alpha \in {\bf R}} \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|\ dx = o(HX) \ \ \ \ \ (3)

of (2) for any {H} growing arbitrarily slowly with {X}; this is stronger than (1) (and is in fact proven by a variant of the method) but significantly weaker than (2), because in the latter the worst-case {\alpha} is permitted to depend on the {x} parameter, whereas in (3) {\alpha} must remain independent of {x}.

Unfortunately, the restriction {H = X^\theta} is not strong enough to give applications to Chowla-type conjectures (one would need something more like {H = \log^\theta X} for this). However, it can still be used to control some sums that had not previously been manageable. For instance, a quick application of the circle method lets one use the above theorem to derive the asymptotic

\displaystyle  \sum_{h \leq H} \sum_{n \leq X} \lambda(n) \Lambda(n+h) \Lambda(n+2h) = o( H X )

whenever {H = X^\theta} for a fixed {\theta > 0}, where {\Lambda} is the von Mangoldt function. Amusingly, the seemingly simpler question of establishing the expected asymptotic for

\displaystyle  \sum_{h \leq H} \sum_{n \leq X} \Lambda(n+h) \Lambda(n+2h)

is only known in the range {\theta \geq 1/6} (from the work of Zaccagnini). Thus we have a rare example of a number theory sum that becomes easier to control when one inserts a Liouville function!

We now give an informal description of the strategy of proof of the theorem (though for numerous technical reasons, the actual proof deviates in some respects from the description given here). If (2) failed, then for many values of {x \in [X,2X]} we would have the lower bound

\displaystyle  |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha_x n)| \gg 1

for some frequency {\alpha_x \in{\bf R}}. We informally describe this correlation between {\lambda(n)} and {e(\alpha_x n)} by writing

\displaystyle  \lambda(n) \approx e(\alpha_x n) \ \ \ \ \ (4)

for {n \in [x,x+H]} (informally, one should view this as asserting that {\lambda(n)} “behaves like” a constant multiple of {e(\alpha_x n)}). For sake of discussion, suppose we have this relationship for all {x \in [X,2X]}, not just many.

As mentioned before, the main difficulty here is to understand how {\alpha_x} varies with {x}. As it turns out, the multiplicativity properties of the Liouville function place a significant constraint on this dependence. Indeed, if we let {p} be a fairly small prime (e.g. of size {H^\varepsilon} for some {\varepsilon>0}), and use the identity {\lambda(np) = \lambda(n) \lambda(p) = - \lambda(n)} for the Liouville function to conclude (at least heuristically) from (4) that

\displaystyle  \lambda(n) \approx e(\alpha_x n p)

for {n \in [x/p, x/p + H/p]}. (In practice, we will have this sort of claim for many primes {p} rather than all primes {p}, after using tools such as the Turán-Kubilius inequality, but we ignore this distinction for this informal argument.)

Now let {x, y \in [X,2X]} and {p,q \sim P} be primes comparable to some fixed range {P = H^\varepsilon} such that

\displaystyle  x/p = y/q + O( H/P). \ \ \ \ \ (5)

Then we have both

\displaystyle  \lambda(n) \approx e(\alpha_x n p)

and

\displaystyle  \lambda(n) \approx e(\alpha_y n q)

on essentially the same range of {n} (two nearby intervals of length {\sim H/P}). This suggests that the frequencies {p \alpha_x} and {q \alpha_y} should be close to each other modulo {1}, in particular one should expect the relationship

\displaystyle  p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } 1. \ \ \ \ \ (6)

Comparing this with (5) one is led to the expectation that {\alpha_x} should depend inversely on {x} in some sense (for instance one can check that

\displaystyle  \alpha_x = T/x \ \ \ \ \ (7)

would solve (6) if {T = O( X / H^2 )}; by Taylor expansion, this would correspond to a global approximation of the form {\lambda(n) \approx n^{iT}}). One now has a problem of an additive combinatorial flavour (or of a “local to global” flavour), namely to leverage the relation (6) to obtain global control on {\alpha_x} that resembles (7).

A key obstacle in solving (6) efficiently is the fact that one only knows that {p \alpha_x} and {q \alpha_y} are close modulo {1}, rather than close on the real line. One can start resolving this problem by the Chinese remainder theorem, using the fact that we have the freedom to shift (say) {\alpha_y} by an arbitrary integer. After doing so, one can arrange matters so that one in fact has the relationship

\displaystyle  p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } p \ \ \ \ \ (8)

whenever {x,y \in [X,2X]} and {p,q \sim P} obey (5). (This may force {\alpha_q} to become extremely large, on the order of {\prod_{p \sim P} p}, but this will not concern us.)

Now suppose that we have {y,y' \in [X,2X]} and primes {q,q' \sim P} such that

\displaystyle  y/q = y'/q' + O(H/P). \ \ \ \ \ (9)

For every prime {p \sim P}, we can find an {x} such that {x/p} is within {O(H/P)} of both {y/q} and {y'/q'}. Applying (8) twice we obtain

\displaystyle  p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } p

and

\displaystyle  p \alpha_x = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } p

and thus by the triangle inequality we have

\displaystyle  q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } p

for all {p \sim P}; hence by the Chinese remainder theorem

\displaystyle  q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } \prod_{p \sim P} p.

In practice, in the regime {H = X^\theta} that we are considering, the modulus {\prod_{p \sim P} p} is so huge we can effectively ignore it (in the spirit of the Lefschetz principle); so let us pretend that we in fact have

\displaystyle  q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \ \ \ \ \ (10)

whenever {y,y' \in [X,2X]} and {q,q' \sim P} obey (9).

Now let {k} be an integer to be chosen later, and suppose we have primes {p_1,\dots,p_k,q_1,\dots,q_k \sim P} such that the difference

\displaystyle  q = |p_1 \dots p_k - q_1 \dots q_k|

is small but non-zero. If {k} is chosen so that

\displaystyle  P^k \approx \frac{X}{H}

(where one is somewhat loose about what {\approx} means) then one can then find real numbers {x_1,\dots,x_k \sim X} such that

\displaystyle  \frac{x_j}{p_j} = \frac{x_{j+1}}{q_j} + O( \frac{H}{P} )

for {j=1,\dots,k}, with the convention that {x_{k+1} = x_1}. We then have

\displaystyle  p_j \alpha_{x_j} = q_j \alpha_{x_{j+1}} + O( \frac{P}{H} )

which telescopes to

\displaystyle  p_1 \dots p_k \alpha_{x_1} = q_1 \dots q_k \alpha_{x_1} + O( \frac{P^k}{H} )

and thus

\displaystyle  q \alpha_{x_1} = O( \frac{P^k}{H} )

and hence

\displaystyle  \alpha_{x_1} = O( \frac{P^k}{H} ) \approx O( \frac{X}{H^2} ).

In particular, for each {x \sim X}, we expect to be able to write

\displaystyle  \alpha_x = \frac{T_x}{x} + O( \frac{1}{H} )

for some {T_x = O( \frac{X^2}{H^2} )}. This quantity {T_x} can vary with {x}; but from (10) and a short calculation we see that

\displaystyle  T_y = T_{y'} + O( \frac{X}{H} )

whenever {y, y' \in [X,2X]} obey (9) for some {q,q' \sim P}.

Now imagine a “graph” in which the vertices are elements {y} of {[X,2X]}, and two elements {y,y'} are joined by an edge if (9) holds for some {q,q' \sim P}. Because of exponential sum estimates on {\sum_{q \sim P} q^{it}}, this graph turns out to essentially be an “expander” in the sense that any two vertices {y,y' \in [X,2X]} can be connected (in multiple ways) by fairly short paths in this graph (if one allows one to modify one of {y} or {y'} by {O(H)}). As a consequence, we can assume that this quantity {T_y} is essentially constant in {y} (cf. the application of the ergodic theorem in this previous blog post), thus we now have

\displaystyle  \alpha_x = \frac{T}{x} + O(\frac{1}{H} )

for most {x \in [X,2X]} and some {T = O(X^2/H^2)}. By Taylor expansion, this implies that

\displaystyle  \lambda(n) \approx n^{iT}

on {[x,x+H]} for most {x}, thus

\displaystyle  \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) n^{-iT}|\ dx \gg HX.

But this can be shown to contradict the Matomäki-Radziwill theorem (because the multiplicative function {n \mapsto \lambda(n) n^{-iT}} is known to be non-pretentious).

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of correlations of multiplicative functions at almost all scales, with applications to the Chowla and Elliott conjectures“. This is a sequel to our previous paper that studied logarithmic correlations of the form

\displaystyle  f(a) := \lim^*_{x \rightarrow \infty} \frac{1}{\log \omega(x)} \sum_{x/\omega(x) \leq n \leq x} \frac{g_1(n+ah_1) \dots g_k(n+ah_k)}{n},

where {g_1,\dots,g_k} were bounded multiplicative functions, {h_1,\dots,h_k \rightarrow \infty} were fixed shifts, {1 \leq \omega(x) \leq x} was a quantity going off to infinity, and {\lim^*} was a generalised limit functional. Our main technical result asserted that these correlations were necessarily the uniform limit of periodic functions {f_i}. Furthermore, if {g_1 \dots g_k} (weakly) pretended to be a Dirichlet character {\chi}, then the {f_i} could be chosen to be {\chi}isotypic in the sense that {f_i(ab) = f_i(a) \chi(b)} whenever {a,b} are integers with {b} coprime to the periods of {\chi} and {f_i}; otherwise, if {g_1 \dots g_k} did not weakly pretend to be any Dirichlet character {\chi}, then {f} vanished completely. This was then used to verify several cases of the logarithmically averaged Elliott and Chowla conjectures.

The purpose of this paper was to investigate the extent to which the methods could be extended to non-logarithmically averaged settings. For our main technical result, we now considered the unweighted averages

\displaystyle  f_d(a) := \lim^*_{x \rightarrow \infty} \frac{1}{x/d} \sum_{n \leq x/d} g_1(n+ah_1) \dots g_k(n+ah_k),

where {d>1} is an additional parameter. Our main result was now as follows. If {g_1 \dots g_k} did not weakly pretend to be a twisted Dirichlet character {n \mapsto \chi(n) n^{it}}, then {f_d(a)} converged to zero on (doubly logarithmic) average as {d \rightarrow \infty}. If instead {g_1 \dots g_k} did pretend to be such a twisted Dirichlet character, then {f_d(a) d^{it}} converged on (doubly logarithmic) average to a limit {f(a)} of {\chi}-isotypic functions {f_i}. Thus, roughly speaking, one has the approximation

\displaystyle  \lim^*_{x \rightarrow \infty} \frac{1}{x/d} \sum_{n \leq x/d} g_1(n+ah_1) \dots g_k(n+ah_k) \approx f(a) d^{-it}

for most {d}.

Informally, this says that at almost all scales {x} (where “almost all” means “outside of a set of logarithmic density zero”), the non-logarithmic averages behave much like their logarithmic counterparts except for a possible additional twisting by an Archimedean character {d \mapsto d^{it}} (which interacts with the Archimedean parameter {d} in much the same way that the Dirichlet character {\chi} interacts with the non-Archimedean parameter {a}). One consequence of this is that most of the recent results on the logarithmically averaged Chowla and Elliott conjectures can now be extended to their non-logarithmically averaged counterparts, so long as one excludes a set of exceptional scales {x} of logarithmic density zero. For instance, the Chowla conjecture

\displaystyle  \lim_{x \rightarrow\infty} \frac{1}{x} \sum_{n \leq x} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

is now established for {k} either odd or equal to {2}, so long as one excludes an exceptional set of scales.

In the logarithmically averaged setup, the main idea was to combine two very different pieces of information on {f(a)}. The first, coming from recent results in ergodic theory, was to show that {f(a)} was well approximated in some sense by a nilsequence. The second was to use the “entropy decrement argument” to obtain an approximate isotopy property of the form

\displaystyle  f(a) g_1 \dots g_k(p)\approx f(ap)

for “most” primes {p} and integers {a}. Combining the two facts, one eventually finds that only the almost periodic components of the nilsequence are relevant.

In the current situation, each {a \mapsto f_d(a)} is approximated by a nilsequence, but the nilsequence can vary with {d} (although there is some useful “Lipschitz continuity” of this nilsequence with respect to the {d} parameter). Meanwhile, the entropy decrement argument gives an approximation basically of the form

\displaystyle  f_{dp}(a) g_1 \dots g_k(p)\approx f_d(ap)

for “most” {d,p,a}. The arguments then proceed largely as in the logarithmically averaged case. A key lemma to handle the dependence on the new parameter {d} is the following cohomological statement: if one has a map {\alpha: (0,+\infty) \rightarrow S^1} that was a quasimorphism in the sense that {\alpha(xy) = \alpha(x) \alpha(y) + O(\varepsilon)} for all {x,y \in (0,+\infty)} and some small {\varepsilon}, then there exists a real number {t} such that {\alpha(x) = x^{it} + O(\varepsilon)} for all small {\varepsilon}. This is achieved by applying a standard “cocycle averaging argument” to the cocycle {(x,y) \mapsto \alpha(xy) \alpha(x)^{-1} \alpha(y)^{-1}}.

It would of course be desirable to not have the set of exceptional scales. We only know of one (implausible) scenario in which we can do this, namely when one has far fewer (in particular, subexponentially many) sign patterns for (say) the Liouville function than predicted by the Chowla conjecture. In this scenario (roughly analogous to the “Siegel zero” scenario in multiplicative number theory), the entropy of the Liouville sign patterns is so small that the entropy decrement argument becomes powerful enough to control all scales rather than almost all scales. On the other hand, this scenario seems to be self-defeating, in that it allows one to establish a large number of cases of the Chowla conjecture, and the full Chowla conjecture is inconsistent with having unusually few sign patterns. Still it hints that future work in this direction may need to split into “low entropy” and “high entropy” cases, in analogy to how many arguments in multiplicative number theory have to split into the “Siegel zero” and “no Siegel zero” cases.

This is a sequel to this previous blog post, in which we discussed the effect of the heat flow evolution

\displaystyle  \partial_t P(t,z) = \partial_{zz} P(t,z)

on the zeroes of a time-dependent family of polynomials {z \mapsto P(t,z)}, with a particular focus on the case when the polynomials {z \mapsto P(t,z)} had real zeroes. Here (inspired by some discussions I had during a recent conference on the Riemann hypothesis in Bristol) we record the analogous theory in which the polynomials instead have zeroes on a circle {\{ z: |z| = \sqrt{q} \}}, with the heat flow slightly adjusted to compensate for this. As we shall discuss shortly, a key example of this situation arises when {P} is the numerator of the zeta function of a curve.

More precisely, let {g} be a natural number. We will say that a polynomial

\displaystyle  P(z) = \sum_{j=0}^{2g} a_j z^j

of degree {2g} (so that {a_{2g} \neq 0}) obeys the functional equation if the {a_j} are all real and

\displaystyle  a_j = q^{g-j} a_{2g-j}

for all {j=0,\dots,2g}, thus

\displaystyle  P(\overline{z}) = \overline{P(z)}

and

\displaystyle  P(q/z) = q^g z^{-2g} P(z)

for all non-zero {z}. This means that the {2g} zeroes {\alpha_1,\dots,\alpha_{2g}} of {P(z)} (counting multiplicity) lie in {{\bf C} \backslash \{0\}} and are symmetric with respect to complex conjugation {z \mapsto \overline{z}} and inversion {z \mapsto q/z} across the circle {\{ |z| = \sqrt{q}\}}. We say that this polynomial obeys the Riemann hypothesis if all of its zeroes actually lie on the circle {\{ z = \sqrt{q}\}}. For instance, in the {g=1} case, the polynomial {z^2 - a_1 z + q} obeys the Riemann hypothesis if and only if {|a_1| \leq 2\sqrt{q}}.

Such polynomials arise in number theory as follows: if {C} is a projective curve of genus {g} over a finite field {\mathbf{F}_q}, then, as famously proven by Weil, the associated local zeta function {\zeta_{C,q}(z)} (as defined for instance in this previous blog post) is known to take the form

\displaystyle  \zeta_{C,q}(z) = \frac{P(z)}{(1-z)(1-qz)}

where {P} is a degree {2g} polynomial obeying both the functional equation and the Riemann hypothesis. In the case that {C} is an elliptic curve, then {g=1} and {P} takes the form {P(z) = z^2 - a_1 z + q}, where {a_1} is the number of {{\bf F}_q}-points of {C} minus {q+1}. The Riemann hypothesis in this case is a famous result of Hasse.

Another key example of such polynomials arise from rescaled characteristic polynomials

\displaystyle  P(z) := \det( 1 - \sqrt{q} F ) \ \ \ \ \ (1)

of {2g \times 2g} matrices {F} in the compact symplectic group {Sp(g)}. These polynomials obey both the functional equation and the Riemann hypothesis. The Sato-Tate conjecture (in higher genus) asserts, roughly speaking, that “typical” polyomials {P} arising from the number theoretic situation above are distributed like the rescaled characteristic polynomials (1), where {F} is drawn uniformly from {Sp(g)} with Haar measure.

Given a polynomial {z \mapsto P(0,z)} of degree {2g} with coefficients

\displaystyle  P(0,z) = \sum_{j=0}^{2g} a_j(0) z^j,

we can evolve it in time by the formula

\displaystyle  P(t,z) = \sum_{j=0}^{2g} \exp( t(j-g)^2 ) a_j(0) z^j,

thus {a_j(t) = \exp(t(j-g)) a_j(0)} for {t \in {\bf R}}. Informally, as one increases {t}, this evolution accentuates the effect of the extreme monomials, particularly, {z^0} and {z^{2g}} at the expense of the intermediate monomials such as {z^g}, and conversely as one decreases {t}. This family of polynomials obeys the heat-type equation

\displaystyle  \partial_t P(t,z) = (z \partial_z - g)^2 P(t,z). \ \ \ \ \ (2)

In view of the results of Marcus, Spielman, and Srivastava, it is also very likely that one can interpret this flow in terms of expected characteristic polynomials involving conjugation over the compact symplectic group {Sp(n)}, and should also be tied to some sort of “{\beta=\infty}” version of Brownian motion on this group, but we have not attempted to work this connection out in detail.

It is clear that if {z \mapsto P(0,z)} obeys the functional equation, then so does {z \mapsto P(t,z)} for any other time {t}. Now we investigate the evolution of the zeroes. Suppose at some time {t_0} that the zeroes {\alpha_1(t_0),\dots,\alpha_{2g}(t_0)} of {z \mapsto P(t_0,z)} are distinct, then

\displaystyle  P(t_0,z) = a_{2g}(0) \exp( t_0g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t_0) ).

From the inverse function theorem we see that for times {t} sufficiently close to {t_0}, the zeroes {\alpha_1(t),\dots,\alpha_{2g}(t)} of {z \mapsto P(t,z)} continue to be distinct (and vary smoothly in {t}), with

\displaystyle  P(t,z) = a_{2g}(0) \exp( t g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t) ).

Differentiating this at any {z} not equal to any of the {\alpha_j(t)}, we obtain

\displaystyle  \partial_t P(t,z) = P(t,z) ( g^2 - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)})

and

\displaystyle  \partial_z P(t,z) = P(t,z) ( \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)})

and

\displaystyle  \partial_{zz} P(t,z) = P(t,z) ( \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}).

Inserting these formulae into (2) (expanding {(z \partial_z - g)^2} as {z^2 \partial_{zz} - (2g-1) z \partial_z + g^2}) and canceling some terms, we conclude that

\displaystyle  - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)} = z^2 \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}

\displaystyle  - (2g-1) z \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)}

for {t} sufficiently close to {t_0}, and {z} not equal to {\alpha_1(t),\dots,\alpha_{2g}(t)}. Extracting the residue at {z = \alpha_j(t)}, we conclude that

\displaystyle  - \alpha'_j(t) = 2 \alpha_j(t)^2 \sum_{1 \leq k \leq 2g: k \neq j} \frac{1}{\alpha_j(t) - \alpha_k(t)} - (2g-1) \alpha_j(t)

which we can rearrange as

\displaystyle  \frac{\alpha'_j(t)}{\alpha_j(t)} = - \sum_{1 \leq k \leq 2g: k \neq j} \frac{\alpha_j(t)+\alpha_k(t)}{\alpha_j(t)-\alpha_k(t)}.

If we make the change of variables {\alpha_j(t) = \sqrt{q} e^{i\theta_j(t)}} (noting that one can make {\theta_j} depend smoothly on {t} for {t} sufficiently close to {t_0}), this becomes

\displaystyle  \partial_t \theta_j(t) = \sum_{1 \leq k \leq 2g: k \neq j} \cot \frac{\theta_j(t) - \theta_k(t)}{2}. \ \ \ \ \ (3)

Intuitively, this equation asserts that the phases {\theta_j} repel each other if they are real (and attract each other if their difference is imaginary). If {z \mapsto P(t_0,z)} obeys the Riemann hypothesis, then the {\theta_j} are all real at time {t_0}, then the Picard uniqueness theorem (applied to {\theta_j(t)} and its complex conjugate) then shows that the {\theta_j} are also real for {t} sufficiently close to {t_0}. If we then define the entropy functional

\displaystyle  H(\theta_1,\dots,\theta_{2g}) := \sum_{1 \leq j < k \leq 2g} \log \frac{1}{|\sin \frac{\theta_j-\theta_k}{2}| }

then the above equation becomes a gradient flow

\displaystyle  \partial_t \theta_j(t) = - 2 \frac{\partial H}{\partial \theta_j}( \theta_1(t),\dots,\theta_{2g}(t) )

which implies in particular that {H(\theta_1(t),\dots,\theta_{2g}(t))} is non-increasing in time. This shows that as one evolves time forward from {t_0}, there is a uniform lower bound on the separation between the phases {\theta_1(t),\dots,\theta_{2g}(t)}, and hence the equation can be solved indefinitely; in particular, {z \mapsto P(t,z)} obeys the Riemann hypothesis for all {t > t_0} if it does so at time {t_0}. Our argument here assumed that the zeroes of {z \mapsto P(t_0,z)} were simple, but this assumption can be removed by the usual limiting argument.

For any polynomial {z \mapsto P(0,z)} obeying the functional equation, the rescaled polynomials {z \mapsto e^{-g^2 t} P(t,z)} converge locally uniformly to {a_{2g}(0) (z^{2g} + q^g)} as {t \rightarrow +\infty}. By Rouche’s theorem, we conclude that the zeroes of {z \mapsto P(t,z)} converge to the equally spaced points {\{ e^{2\pi i(j+1/2)/2g}: j=1,\dots,2g\}} on the circle {\{ |z| = \sqrt{q}\}}. Together with the symmetry properties of the zeroes, this implies in particular that {z \mapsto P(t,z)} obeys the Riemann hypothesis for all sufficiently large positive {t}. In the opposite direction, when {t \rightarrow -\infty}, the polynomials {z \mapsto P(t,z)} converge locally uniformly to {a_g(0) z^g}, so if {a_g(0) \neq 0}, {g} of the zeroes converge to the origin and the other {g} converge to infinity. In particular, {z \mapsto P(t,z)} fails the Riemann hypothesis for sufficiently large negative {t}. Thus (if {a_g(0) \neq 0}), there must exist a real number {\Lambda}, which we call the de Bruijn-Newman constant of the original polynomial {z \mapsto P(0,z)}, such that {z \mapsto P(t,z)} obeys the Riemann hypothesis for {t \geq \Lambda} and fails the Riemann hypothesis for {t < \Lambda}. The situation is a bit more complicated if {a_g(0)} vanishes; if {k} is the first natural number such that {a_{g+k}(0)} (or equivalently, {a_{g-j}(0)}) does not vanish, then by the above arguments one finds in the limit {t \rightarrow -\infty} that {g-k} of the zeroes go to the origin, {g-k} go to infinity, and the remaining {2k} zeroes converge to the equally spaced points {\{ e^{2\pi i(j+1/2)/2k}: j=1,\dots,2k\}}. In this case the de Bruijn-Newman constant remains finite except in the degenerate case {k=g}, in which case {\Lambda = -\infty}.

For instance, consider the case when {g=1} and {P(0,z) = z^2 - a_1 z + q} for some real {a_1} with {|a_1| \leq 2\sqrt{q}}. Then the quadratic polynomial

\displaystyle  P(t,z) = e^t z^2 - a_1 z + e^t q

has zeroes

\displaystyle  \frac{a_1 \pm \sqrt{a_1^2 - 4 e^{2t} q}}{2e^t}

and one easily checks that these zeroes lie on the circle {\{ |z|=\sqrt{q}\}} when {t \geq \log \frac{|a_1|}{2\sqrt{q}}}, and are on the real axis otherwise. Thus in this case we have {\Lambda = \log \frac{|a_1|}{2\sqrt{q}}} (with {\Lambda=-\infty} if {a_1=0}). Note how as {t} increases to {+\infty}, the zeroes repel each other and eventually converge to {\pm i \sqrt{q}}, while as {t} decreases to {-\infty}, the zeroes collide and then separate on the real axis, with one zero going to the origin and the other to infinity.

The arguments in my paper with Brad Rodgers (discussed in this previous post) indicate that for a “typical” polynomial {P} of degree {g} that obeys the Riemann hypothesis, the expected time to relaxation to equilibrium (in which the zeroes are equally spaced) should be comparable to {1/g}, basically because the average spacing is {1/g} and hence by (3) the typical velocity of the zeroes should be comparable to {g}, and the diameter of the unit circle is comparable to {1}, thus requiring time comparable to {1/g} to reach equilibrium. Taking contrapositives, this suggests that the de Bruijn-Newman constant {\Lambda} should typically take on values comparable to {-1/g} (since typically one would not expect the initial configuration of zeroes to be close to evenly spaced). I have not attempted to formalise or prove this claim, but presumably one could do some numerics (perhaps using some of the examples of {P} given previously) to explore this further.

Archives