You are currently browsing the category archive for the ‘math.NT’ category.

Define the Collatz map {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} on the natural numbers {{\bf N}+1 = \{1,2,\dots\}} by setting {\mathrm{Col}(N)} to equal {3N+1} when {N} is odd and {N/2} when {N} is even, and let {\mathrm{Col}^{\bf N}(N) := \{ N, \mathrm{Col}(N), \mathrm{Col}^2(N), \dots \}} denote the forward Collatz orbit of {N}. The notorious Collatz conjecture asserts that {1 \in \mathrm{Col}^{\bf N}(N)} for all {N \in {\bf N}+1}. Equivalently, if we define the backwards Collatz orbit {(\mathrm{Col}^{\bf N})^*(N) := \{ M \in {\bf N}+1: N \in \mathrm{Col}^{\bf N}(M) \}} to be all the natural numbers {M} that encounter {N} in their forward Collatz orbit, then the Collatz conjecture asserts that {(\mathrm{Col}^{\bf N})^*(1) = {\bf N}+1}. As a partial result towards this latter statement, Krasikov and Lagarias in 2003 established the bound

\displaystyle \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(1) \} \gg x^\gamma \ \ \ \ \ (1)

 

for all {x \geq 1} and {\gamma = 0.84}. (This improved upon previous values of {\gamma = 0.81} obtained by Applegate and Lagarias in 1995, {\gamma = 0.65} by Applegate and Lagarias in 1995 by a different method, {\gamma=0.48} by Wirsching in 1993, {\gamma=0.43} by Krasikov in 1989, {\gamma=0.3} by Sander in 1990, and some {\gamma>0} by Crandall in 1978.) This is still the largest value of {\gamma} for which (1) has been established. Of course, the Collatz conjecture would imply that we can take {\gamma} equal to {1}, which is the assertion that a positive density set of natural numbers obeys the Collatz conjecture. This is not yet established, although the results in my previous paper do at least imply that a positive density set of natural numbers iterates to an (explicitly computable) bounded set, so in principle the {\gamma=1} case of (1) could now be verified by an (enormous) finite computation in which one verifies that every number in this explicit bounded set iterates to {1}. In this post I would like to record a possible alternate route to this problem that depends on the distribution of a certain family of random variables that appeared in my previous paper, that I called Syracuse random variables.

Definition 1 (Syracuse random variables) For any natural number {n}, a Syracuse random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on the cyclic group {{\bf Z}/3^n{\bf Z}} is defined as a random variable of the form

\displaystyle \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = \sum_{m=1}^n 3^{n-m} 2^{-{\mathbf a}_m-\dots-{\mathbf a}_n} \ \ \ \ \ (2)

 

where {\mathbf{a}_1,\dots,\mathbf{a_n}} are independent copies of a geometric random variable {\mathbf{Geom}(2)} on the natural numbers with mean {2}, thus

\displaystyle \mathop{\bf P}( \mathbf{a}_1=a_1,\dots,\mathbf{a}_n=a_n) = 2^{-a_1-\dots-a_n}

} for {a_1,\dots,a_n \in {\bf N}+1}. In (2) the arithmetic is performed in the ring {{\bf Z}/3^n{\bf Z}}.

Thus for instance

\displaystyle \mathrm{Syrac}({\bf Z}/3{\bf Z}) = 2^{-\mathbf{a}_1} \hbox{ mod } 3

\displaystyle \mathrm{Syrac}({\bf Z}/3^2{\bf Z}) = 2^{-\mathbf{a}_1-\mathbf{a}_2} + 3 \times 2^{-\mathbf{a}_2} \hbox{ mod } 3^2

\displaystyle \mathrm{Syrac}({\bf Z}/3^3{\bf Z}) = 2^{-\mathbf{a}_1-\mathbf{a}_2-\mathbf{a}_3} + 3 \times 2^{-\mathbf{a}_2-\mathbf{a}_3} + 3^2 \times 2^{-\mathbf{a}_3} \hbox{ mod } 3^3

and so forth. After reversing the labeling of the {\mathbf{a}_1,\dots,\mathbf{a}_n}, one could also view {\mathrm{Syrac}({\bf Z}/3^n{\bf Z})} as the mod {3^n} reduction of a {3}-adic random variable

\displaystyle \mathbf{Syrac}({\bf Z}_3) = \sum_{m=1}^\infty 3^{m-1} 2^{-{\mathbf a}_1-\dots-{\mathbf a}_m}.

The probability density function {b \mapsto \mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b )} of the Syracuse random variable can be explicitly computed by a recursive formula (see Lemma 1.12 of my previous paper). For instance, when {n=1}, {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3{\bf Z}) = b )} is equal to {0,1/3,2/3} for {x=b,1,2 \hbox{ mod } 3} respectively, while when {n=2}, {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3^2{\bf Z}) = b )} is equal to

\displaystyle 0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}

when {b=0,\dots,8 \hbox{ mod } 9} respectively.

The relationship of these random variables to the Collatz problem can be explained as follows. Let {2{\bf N}+1 = \{1,3,5,\dots\}} denote the odd natural numbers, and define the Syracuse map {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1} by

\displaystyle \mathrm{Syr}(N) := \frac{3n+1}{2^{\nu_2(3N+1)}}

where the {2}valuation {\nu_2(3n+1) \in {\bf N}} is the number of times {2} divides {3N+1}. We can define the forward orbit {\mathrm{Syr}^{\bf N}(n)} and backward orbit {(\mathrm{Syr}^{\bf N})^*(N)} of the Syracuse map as before. It is not difficult to then see that the Collatz conjecture is equivalent to the assertion {(\mathrm{Syr}^{\bf N})^*(1) = 2{\bf N}+1}, and that the assertion (1) for a given {\gamma} is equivalent to the assertion

\displaystyle \# \{ N \leq x: N \in (\mathrm{Syr}^{\bf N})^*(1) \} \gg x^\gamma \ \ \ \ \ (3)

 

for all {x \geq 1}, where {N} is now understood to range over odd natural numbers. A brief calculation then shows that for any odd natural number {N} and natural number {n}, one has

\displaystyle \mathrm{Syr}^n(N) = 3^n 2^{-a_1-\dots-a_n} N + \sum_{m=1}^n 3^{n-m} 2^{-a_m-\dots-a_n}

where the natural numbers {a_1,\dots,a_n} are defined by the formula

\displaystyle a_i := \nu_2( 3 \mathrm{Syr}^{i-1}(N) + 1 ),

so in particular

\displaystyle \mathrm{Syr}^n(N) = \sum_{m=1}^n 3^{n-m} 2^{-a_m-\dots-a_n} \hbox{ mod } 3^n.

Heuristically, one expects the {2}-valuation {a = \nu_2(N)} of a typical odd number {N} to be approximately distributed according to the geometric distribution {\mathbf{Geom}(2)}, so one therefore expects the residue class {\mathrm{Syr}^n(N) \hbox{ mod } 3^n} to be distributed approximately according to the random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}.

The Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} will always avoid multiples of three (this reflects the fact that {\mathrm{Syr}(N)} is never a multiple of three), but attains any non-multiple of three in {{\bf Z}/3^n{\bf Z}} with positive probability. For any natural number {n}, set

\displaystyle c_n := \inf_{b \in {\bf Z}/3^n{\bf Z}: 3 \nmid b} \mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b ).

Equivalently, {c_n} is the greatest quantity for which we have the inequality

\displaystyle \sum_{(a_1,\dots,a_n) \in S_{n,N}} 2^{-a_1-\dots-a_m} \geq c_n \ \ \ \ \ (4)

 

for all integers {N} not divisible by three, where {S_{n,N} \subset ({\bf N}+1)^n} is the set of all tuples {(a_1,\dots,a_n)} for which

\displaystyle N = \sum_{m=1}^n 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n.

Thus for instance {c_0=1}, {c_1 = 1/3}, and {c_2 = 2/63}. On the other hand, since all the probabilities {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b)} sum to {1} as {b \in {\bf Z}/3^n{\bf Z}} ranges over the non-multiples of {3}, we have the trivial upper bound

\displaystyle c_n \leq \frac{3}{2} 3^{-n}.

There is also an easy submultiplicativity result:

Lemma 2 For any natural numbers {n_1,n_2}, we have

\displaystyle c_{n_1+n_2-1} \geq c_{n_1} c_{n_2}.

Proof: Let {N} be an integer not divisible by {3}, then by (4) we have

\displaystyle \sum_{(a_1,\dots,a_{n_1}) \in S_{n_1,N}} 2^{-a_1-\dots-a_{n_1}} \geq c_{n_1}.

If we let {S'_{n_1,N}} denote the set of tuples {(a_1,\dots,a_{n_1-1})} that can be formed from the tuples in {S_{n_1,N}} by deleting the final component {a_{n_1}} from each tuple, then we have

\displaystyle \sum_{(a_1,\dots,a_{n_1-1}) \in S'_{n_1,N}} 2^{-a_1-\dots-a_{n_1-1}} \geq c_{n_1}. \ \ \ \ \ (5)

 

Next, observe that if {(a_1,\dots,a_{n_1-1}) \in S'_{n_1,N}}, then

\displaystyle N = \sum_{m=1}^{n_1-1} 3^{m-1} 2^{-a_1-\dots-a_m} + 3^{n_1-1} 2^{-a_1-\dots-a_{n_1-1}} M

with {M = M_{N,n_1,a_1,\dots,a_{n_1-1}}} an integer not divisible by three. By definition of {S_{n_2,M}} and a relabeling, we then have

\displaystyle M = \sum_{m=1}^{n_2} 3^{m-1} 2^{-a_{n_1}-\dots-a_{m+n_1-1}} \hbox{ mod } 3^{n_2}

for all {(a_{n_1},\dots,a_{n_1+n_2-1}) \in S_{n_2,M}}. For such tuples we then have

\displaystyle N = \sum_{m=1}^{n_1+n_2-1} 3^{m-1} 2^{-a_1-\dots-a_{n_1+n_2-1}} \hbox{ mod } 3^{n_1+n_2-1}

so that {(a_1,\dots,a_{n_1+n_2-1}) \in S_{n_1+n_2-1,N}}. Since

\displaystyle \sum_{(a_{n_1},\dots,a_{n_1+n_2-1}) \in S_{n_2,M}} 2^{-a_{n_1}-\dots-a_{n_1+n_2-1}} \geq c_{n_2}

for each {M}, the claim follows. \Box

From this lemma we see that {c_n = 3^{-\beta n + o(n)}} for some absolute constant {\beta \geq 1}. Heuristically, we expect the Syracuse random variables to be somewhat approximately equidistributed amongst the multiples of {{\bf Z}/3^n{\bf Z}} (in Proposition 1.4 of my previous paper I prove a fine scale mixing result that supports this heuristic). As a consequence it is natural to conjecture that {\beta=1}. I cannot prove this, but I can show that this conjecture would imply that we can take the exponent {\gamma} in (1), (3) arbitrarily close to one:

Proposition 3 Suppose that {\beta=1} (that is to say, {c_n = 3^{-n+o(n)}} as {n \rightarrow \infty}). Then

\displaystyle \# \{ N \leq x: N \in (\mathrm{Syr}^{\bf N})^*(1) \} \gg x^{1-o(1)}

as {x \rightarrow \infty}, or equivalently

\displaystyle \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(1) \} \gg x^{1-o(1)}

as {x \rightarrow \infty}. In other words, (1), (3) hold for all {\gamma < 1}.

I prove this proposition below the fold. A variant of the argument shows that for any value of {\beta}, (1), (3) holds whenever {\gamma < f(\beta)}, where {f: [0,1] \rightarrow [0,1]} is an explicitly computable function with {f(\beta) \rightarrow 1} as {\beta \rightarrow 1}. In principle, one could then improve the Krasikov-Lagarias result {\gamma = 0.84} by getting a sufficiently good upper bound on {\beta}, which is in principle achievable numerically (note for instance that Lemma 2 implies the bound {c_n \leq 3^{-\beta(n-1)}} for any {n}, since {c_{kn-k+1} \geq c_n^k} for any {k}).

Read the rest of this entry »

Just a brief post to record some notable papers in my fields of interest that appeared on the arXiv recently.

  • A sharp square function estimate for the cone in {\bf R}^3“, by Larry Guth, Hong Wang, and Ruixiang Zhang.  This paper establishes an optimal (up to epsilon losses) square function estimate for the three-dimensional light cone that was essentially conjectured by Mockenhaupt, Seeger, and Sogge, which has a number of other consequences including Sogge’s local smoothing conjecture for the wave equation in two spatial dimensions, which in turn implies the (already known) Bochner-Riesz, restriction, and Kakeya conjectures in two dimensions.   Interestingly, modern techniques such as polynomial partitioning and decoupling estimates are not used in this argument; instead, the authors mostly rely on an induction on scales argument and Kakeya type estimates.  Many previous authors (including myself) were able to get weaker estimates of this type by an induction on scales method, but there were always significant inefficiencies in doing so; in particular knowing the sharp square function estimate at smaller scales did not imply the sharp square function estimate at the given larger scale.  The authors here get around this issue by finding an even stronger estimate that implies the square function estimate, but behaves significantly better with respect to induction on scales.
  • On the Chowla and twin primes conjectures over {\mathbb F}_q[T]“, by Will Sawin and Mark Shusterman.  This paper resolves a number of well known open conjectures in analytic number theory, such as the Chowla conjecture and the twin prime conjecture (in the strong form conjectured by Hardy and Littlewood), in the case of function fields where the field is a prime power q=p^j which is fixed (in contrast to a number of existing results in the “large q” limit) but has a large exponent j.  The techniques here are orthogonal to those used in recent progress towards the Chowla conjecture over the integers (e.g., in this previous paper of mine); the starting point is an algebraic observation that in certain function fields, the Mobius function behaves like a quadratic Dirichlet character along certain arithmetic progressions.  In principle, this reduces problems such as Chowla’s conjecture to problems about estimating sums of Dirichlet characters, for which more is known; but the task is still far from trivial.
  • Bounds for sets with no polynomial progressions“, by Sarah Peluse.  This paper can be viewed as part of a larger project to obtain quantitative density Ramsey theorems of Szemeredi type.  For instance, Gowers famously established a relatively good quantitative bound for Szemeredi’s theorem that all dense subsets of integers contain arbitrarily long arithmetic progressions a, a+r, \dots, a+(k-1)r.  The corresponding question for polynomial progressions a+P_1(r), \dots, a+P_k(r) is considered more difficult for a number of reasons.  One of them is that dilation invariance is lost; a dilation of an arithmetic progression is again an arithmetic progression, but a dilation of a polynomial progression will in general not be a polynomial progression with the same polynomials P_1,\dots,P_k.  Another issue is that the ranges of the two parameters a,r are now at different scales.  Peluse gets around these difficulties in the case when all the polynomials P_1,\dots,P_k have distinct degrees, which is in some sense the opposite case to that considered by Gowers (in particular, she avoids the need to obtain quantitative inverse theorems for high order Gowers norms; which was recently obtained in this integer setting by Manners but with bounds that are probably not strong enough to for the bounds in Peluse’s results, due to a degree lowering argument that is available in this case).  To resolve the first difficulty one has to make all the estimates rather uniform in the coefficients of the polynomials P_j, so that one can still run a density increment argument efficiently.  To resolve the second difficulty one needs to find a quantitative concatenation theorem for Gowers uniformity norms.  Many of these ideas were developed in previous papers of Peluse and Peluse-Prendiville in simpler settings.
  • On blow up for the energy super critical defocusing non linear Schrödinger equations“, by Frank Merle, Pierre Raphael, Igor Rodnianski, and Jeremie Szeftel.  This paper (when combined with two companion papers) resolves a long-standing problem as to whether finite time blowup occurs for the defocusing supercritical nonlinear Schrödinger equation (at least in certain dimensions and nonlinearities).  I had a previous paper establishing a result like this if one “cheated” by replacing the nonlinear Schrodinger equation by a system of such equations, but remarkably they are able to tackle the original equation itself without any such cheating.  Given the very analogous situation with Navier-Stokes, where again one can create finite time blowup by “cheating” and modifying the equation, it does raise hope that finite time blowup for the incompressible Navier-Stokes and Euler equations can be established…  In fact the connection may not just be at the level of analogy; a surprising key ingredient in the proofs here is the observation that a certain blowup ansatz for the nonlinear Schrodinger equation is governed by solutions to the (compressible) Euler equation, and finite time blowup examples for the latter can be used to construct finite time blowup examples for the former.

Let us call an arithmetic function {f: {\bf N} \rightarrow {\bf C}} {1}-bounded if we have {|f(n)| \leq 1} for all {n \in {\bf N}}. In this section we focus on the asymptotic behaviour of {1}-bounded multiplicative functions. Some key examples of such functions include:

  • The Möbius function {\mu};
  • The Liouville function {\lambda};
  • Archimedean” characters {n \mapsto n^{it}} (which I call Archimedean because they are pullbacks of a Fourier character {x \mapsto x^{it}} on the multiplicative group {{\bf R}^+}, which has the Archimedean property);
  • Dirichlet characters (or “non-Archimedean” characters) {\chi} (which are essentially pullbacks of Fourier characters on a multiplicative cyclic group {({\bf Z}/q{\bf Z})^\times} with the discrete (non-Archimedean) metric);
  • Hybrid characters {n \mapsto \chi(n) n^{it}}.

The space of {1}-bounded multiplicative functions is also closed under multiplication and complex conjugation.

Given a multiplicative function {f}, we are often interested in the asymptotics of long averages such as

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n)

for large values of {X}, as well as short sums

\displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} f(n)

where {H} and {x} are both large, but {H} is significantly smaller than {x}. (Throughout these notes we will try to normalise most of the sums and integrals appearing here as averages that are trivially bounded by {O(1)}; note that other normalisations are preferred in some of the literature cited here.) For instance, as we established in Theorem 58 of Notes 1, the prime number theorem is equivalent to the assertion that

\displaystyle  \frac{1}{X} \sum_{n \leq X} \mu(n) = o(1) \ \ \ \ \ (1)

as {X \rightarrow \infty}. The Liouville function behaves almost identically to the Möbius function, in that estimates for one function almost always imply analogous estimates for the other:

Exercise 1 Without using the prime number theorem, show that (1) is also equivalent to

\displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n) = o(1) \ \ \ \ \ (2)

as {X \rightarrow \infty}. (Hint: use the identities {\lambda(n) = \sum_{d^2|n} \mu(n/d^2)} and {\mu(n) = \sum_{d^2|n} \mu(d) \lambda(n/d^2)}.)

Henceforth we shall focus our discussion more on the Liouville function, and turn our attention to averages on shorter intervals. From (2) one has

\displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) = o(1) \ \ \ \ \ (3)

as {x \rightarrow \infty} if {H = H(x)} is such that {H \geq \varepsilon x} for some fixed {\varepsilon>0}. However it is significantly more difficult to understand what happens when {H} grows much slower than this. By using the techniques based on zero density estimates discussed in Notes 6, it was shown by Motohashi and that one can also establish \eqref. On the Riemann Hypothesis Maier and Montgomery lowered the threshold to {H \geq x^{1/2} \log^C x} for an absolute constant {C} (the bound {H \geq x^{1/2+\varepsilon}} is more classical, following from Exercise 33 of Notes 2). On the other hand, the randomness heuristics from Supplement 4 suggest that {H} should be able to be taken as small as {x^\varepsilon}, and perhaps even {\log^{1+\varepsilon} x} if one is particularly optimistic about the accuracy of these probabilistic models. On the other hand, the Chowla conjecture (mentioned for instance in Supplement 4) predicts that {H} cannot be taken arbitrarily slowly growing in {x}, due to the conjectured existence of arbitrarily long strings of consecutive numbers where the Liouville function does not change sign (and in fact one can already show from the known partial results towards the Chowla conjecture that (3) fails for some sequence {x \rightarrow \infty} and some sufficiently slowly growing {H = H(x)}, by modifying the arguments in these papers of mine).

The situation is better when one asks to understand the mean value on almost all short intervals, rather than all intervals. There are several equivalent ways to formulate this question:

Exercise 2 Let {H = H(X)} be a function of {X} such that {H \rightarrow \infty} and {H = o(X)} as {X \rightarrow \infty}. Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded function. Show that the following assertions are equivalent:

  • (i) One has

    \displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} f(n) = o(1)

    as {X \rightarrow \infty}, uniformly for all {x \in [X,2X]} outside of a set of measure {o(X)}.

  • (ii) One has

    \displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|\ dx = o(1)

    as {X \rightarrow \infty}.

  • (iii) One has

    \displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx = o(1) \ \ \ \ \ (4)

    as {X \rightarrow \infty}.

As it turns out the second moment formulation in (iii) will be the most convenient for us to work with in this set of notes, as it is well suited to Fourier-analytic techniques (and in particular the Plancherel theorem).

Using zero density methods, for instance, it was shown by Ramachandra that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll_{A,\varepsilon} \log^{-A} X

whenever {X^{1/6+\varepsilon} \leq H \leq X} and {\varepsilon>0}. With this quality of bound (saving arbitrary powers of {\log X} over the trivial bound of {O(1)}), this is still the lowest value of {H} one can reach unconditionally. However, in a striking recent breakthrough, it was shown by Matomaki and Radziwill that as long as one is willing to settle for weaker bounds (saving a small power of {\log X} or {\log H}, or just a qualitative decay of {o(1)}), one can obtain non-trivial estimates on far shorter intervals. For instance, they show

Theorem 3 (Matomaki-Radziwill theorem for Liouville) For any {2 \leq H \leq X}, one has

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll \log^{-c} H

for some absolute constant {c>0}.

In fact they prove a slightly more precise result: see Theorem 1 of that paper. In particular, they obtain the asymptotic (4) for any function {H = H(X)} that goes to infinity as {X \rightarrow \infty}, no matter how slowly! This ability to let {H} grow slowly with {X} is important for several applications; for instance, in order to combine this type of result with the entropy decrement methods from Notes 9, it is essential that {H} be allowed to grow more slowly than {\log X}. See also this survey of Soundararajan for further discussion.

Exercise 4 In this exercise you may use Theorem 3 freely.

  • (i) Establish the lower bound

    \displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) > -1+c

    for some absolute constant {c>0} and all sufficiently large {X}. (Hint: if this bound failed, then {\lambda(n)=\lambda(n+1)} would hold for almost all {n}; use this to create many intervals {[x,x+H]} for which {\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)} is extremely large.)

  • (ii) Show that Theorem 3 also holds with {\lambda(n)} replaced by {\chi_2 \lambda(n)}, where {\chi_2} is the principal character of period {2}. (Use the fact that {\lambda(2n)=-\lambda(n)} for all {n}.) Use this to establish the corresponding upper bound

    \displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) < 1-c

    to (i).

(There is a curious asymmetry to the difficulty level of these bounds; the upper bound in (ii) was established much earlier by Harman, Pintz, and Wolke, but the lower bound in (i) was only established in the Matomaki-Radziwill paper.)

The techniques discussed previously were highly complex-analytic in nature, relying in particular on the fact that functions such as {\mu} or {\lambda} have Dirichlet series {{\mathcal D} \mu(s) = \frac{1}{\zeta(s)}}, {{\mathcal D} \lambda(s) = \frac{\zeta(2s)}{\zeta(s)}} that extend meromorphically into the critical strip. In contrast, the Matomaki-Radziwill theorem does not rely on such meromorphic continuations, and in fact holds for more general classes of {1}-bounded multiplicative functions {f}, for which one typically does not expect any meromorphic continuation into the strip. Instead, one can view the Matomaki-Radziwill theory as following the philosophy of a slightly different approach to multiplicative number theory, namely the pretentious multiplicative number theory of Granville and Soundarajan (as presented for instance in their draft monograph). A basic notion here is the pretentious distance between two {1}-bounded multiplicative functions {f,g} (at a given scale {X}), which informally measures the extent to which {f} “pretends” to be like {g} (or vice versa). The precise definition is

Definition 5 (Pretentious distance) Given two {1}-bounded multiplicative functions {f,g}, and a threshold {X>0}, the pretentious distance {\mathbb{D}(f,g;X)} between {f} and {g} up to scale {X} is given by the formula

\displaystyle  \mathbb{D}(f,g;X) := \left( \sum_{p \leq X} \frac{1 - \mathrm{Re}(f(p) \overline{g(p)})}{p} \right)^{1/2}

Note that one can also define an infinite version {\mathbb{D}(f,g;\infty)} of this distance by removing the constraint {p \leq X}, though in such cases the pretentious distance may then be infinite. The pretentious distance is not quite a metric (because {\mathbb{D}(f,f;X)} can be non-zero, and furthermore {\mathbb{D}(f,g;X)} can vanish without {f,g} being equal), but it is still quite close to behaving like a metric, in particular it obeys the triangle inequality; see Exercise 16 below. The philosophy of pretentious multiplicative number theory is that two {1}-bounded multiplicative functions {f,g} will exhibit similar behaviour at scale {X} if their pretentious distance {\mathbb{D}(f,g;X)} is bounded, but will become uncorrelated from each other if this distance becomes large. A simple example of this philosophy is given by the following “weak Halasz theorem”, proven in Section 2:

Proposition 6 (Logarithmically averaged version of Halasz) Let {X} be sufficiently large. Then for any {1}-bounded multiplicative functions {f,g}, one has

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n) \overline{g(n)}}{n} \ll \exp( - c \mathbb{D}(f, g;X)^2 )

for an absolute constant {c>0}.

In particular, if {f} does not pretend to be {1}, then the logarithmic average {\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}} will be small. This condition is basically necessary, since of course {\frac{1}{\log X} \sum_{n \leq X} \frac{1}{n} = 1 + o(1)}.

If one works with non-logarithmic averages {\frac{1}{X} \sum_{n \leq X} f(n)}, then not pretending to be {1} is insufficient to establish decay, as was already observed in Exercise 11 of Notes 1: if {f} is an Archimedean character {f(n) = n^{it}} for some non-zero real {t}, then {\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}} goes to zero as {X \rightarrow \infty} (which is consistent with Proposition 6), but {\frac{1}{X} \sum_{n \leq X} f(n)} does not go to zero. However, this is in some sense the “only” obstruction to these averages decaying to zero, as quantified by the following basic result:

Theorem 7 (Halasz’s theorem) Let {X} be sufficiently large. Then for any {1}-bounded multiplicative function {f}, one has

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - c \min_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{T}

for an absolute constant {c>0} and any {T > 0}.

Informally, we refer to a {1}-bounded multiplicative function as “pretentious’; if it pretends to be a character such as {n^{it}}, and “non-pretentious” otherwise. The precise distinction is rather malleable, as the precise class of characters that one views as “obstructions” varies from situation to situation. For instance, in Proposition 6 it is just the trivial character {1} which needs to be considered, but in Theorem 7 it is the characters {n \mapsto n^{it}} with {|t| \leq T}. In other contexts one may also need to add Dirichlet characters {\chi(n)} or hybrid characters such as {\chi(n) n^{it}} to the list of characters that one might pretend to be. The division into pretentious and non-pretentious functions in multiplicative number theory is faintly analogous to the division into major and minor arcs in the circle method applied to additive number theory problems; see Notes 8. The Möbius and Liouville functions are model examples of non-pretentious functions; see Exercise 24.

In the contrapositive, Halasz’ theorem can be formulated as the assertion that if one has a large mean

\displaystyle  |\frac{1}{X} \sum_{n \leq X} f(n)| \geq \eta

for some {\eta > 0}, then one has the pretentious property

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \ll \sqrt{\log(1/\eta)}

for some {t \ll \eta^{-1}}. This has the flavour of an “inverse theorem”, of the type often found in arithmetic combinatorics.

Among other things, Halasz’s theorem gives yet another proof of the prime number theorem (1); see Section 2.

We now give a version of the Matomaki-Radziwill theorem for general (non-pretentious) multiplicative functions that is formulated in a similar contrapositive (or “inverse theorem”) fashion, though to simplify the presentation we only state a qualitative version that does not give explicit bounds.

Theorem 8 ((Qualitative) Matomaki-Radziwill theorem) Let {\eta>0}, and let {1 \leq H \leq X}, with {H} sufficiently large depending on {\eta}. Suppose that {f} is a {1}-bounded multiplicative function such that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \geq \eta^2.

Then one has

\displaystyle  \mathbb{D}(f, n \mapsto n^{it};X) \ll_\eta 1

for some {t \ll_\eta \frac{X}{H}}.

The condition {t \ll_\eta \frac{X}{H}} is basically optimal, as the following example shows:

Exercise 9 Let {\varepsilon>0} be a sufficiently small constant, and let {1 \leq H \leq X} be such that {\frac{1}{\varepsilon} \leq H \leq \varepsilon X}. Let {f} be the Archimedean character {f(n) = n^{it}} for some {|t| \leq \varepsilon \frac{X}{H}}. Show that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \asymp 1.

Combining Theorem 8 with standard non-pretentiousness facts about the Liouville function (see Exercise 24), we recover Theorem 3 (but with a decay rate of only {o(1)} rather than {\log^{-c} H}). We refer the reader to the original paper of Matomaki-Radziwill (as well as this followup paper with myself) for the quantitative version of Theorem 8 that is strong enough to recover the full version of Theorem 3, and which can also handle real-valued pretentious functions.

With our current state of knowledge, the only arguments that can establish the full strength of Halasz and Matomaki-Radziwill theorems are Fourier analytic in nature, relating sums involving an arithmetic function {f} with its Dirichlet series

\displaystyle  {\mathcal D} f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}

which one can view as a discrete Fourier transform of {f} (or more precisely of the measure {\sum_{n=1}^\infty \frac{f(n)}{n} \delta_{\log n}}, if one evaluates the Dirichlet series on the right edge {\{ 1+it: t \in {\bf R} \}} of the critical strip). In this aspect, the techniques resemble the complex-analytic methods from Notes 2, but with the key difference that no analytic or meromorphic continuation into the strip is assumed. The key identity that allows us to pass to Dirichlet series is the following variant of Proposition 7 of Notes 2:

Proposition 10 (Parseval type identity) Let {f,g: {\bf N} \rightarrow {\bf C}} be finitely supported arithmetic functions, and let {\psi: {\bf R} \rightarrow {\bf R}} be a Schwartz function. Then

\displaystyle  \sum_{n=1}^\infty \sum_{m=1}^\infty \frac{f(n)}{n} \frac{\overline{g(m)}}{m} \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} {\mathcal D} f(1+it) \overline{{\mathcal D} g(1+it)} \hat \psi(t)\ dt

where {\hat \psi(t) := \int_{\bf R} \psi(u) e^{itu}\ du} is the Fourier transform of {\psi}. (Note that the finite support of {f,g} and the Schwartz nature of {\psi,\hat \psi} ensure that both sides of the identity are absolutely convergent.)

The restriction that {f,g} be finitely supported will be slightly annoying in places, since most multiplicative functions will fail to be finitely supported, but this technicality can usually be overcome by suitably truncating the multiplicative function, and taking limits if necessary.

Proof: By expanding out the Dirichlet series, it suffices to show that

\displaystyle  \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} \frac{1}{n^{it}} \frac{1}{m^{-it}} \hat \psi(t)\ dt

for any natural numbers {n,m}. But this follows from the Fourier inversion formula {\psi(u) = \frac{1}{2\pi} \int_{\bf R} e^{-itu} \hat \psi(t)\ dt} applied at {u = \log n - \log m}. \Box

For applications to Halasz type theorems, one sets {g(n)} equal to the Kronecker delta {\delta_{n=1}}, producing weighted integrals of {{\mathcal D} f(1+it)} of “{L^1}” type. For applications to Matomaki-Radziwill theorems, one instead sets {f=g}, and more precisely uses the following corollary of the above proposition, to obtain weighted integrals of {|{\mathcal D} f(1+it)|^2} of “{L^2}” type:

Exercise 11 (Plancherel type identity) If {f: {\bf N} \rightarrow {\bf C}} is finitely supported, and {\varphi: {\bf R} \rightarrow {\bf R}} is a Schwartz function, establish the identity

\displaystyle  \int_0^\infty |\sum_{n=1}^\infty \frac{f(n)}{n} \varphi(\log n - \log y)|^2 \frac{dy}{y} = \frac{1}{2\pi} \int_{\bf R} |{\mathcal D} f(1+it)|^2 |\hat \varphi(t)|^2\ dt.

In contrast, information about the non-pretentious nature of a multiplicative function {f} will give “pointwise” or “{L^\infty}” type control on the Dirichlet series {{\mathcal D} f(1+it)}, as is suggested from the Euler product factorisation of {{\mathcal D} f}.

It will be convenient to formalise the notion of {L^1}, {L^2}, and {L^\infty} control of the Dirichlet series {{\mathcal D} f}, which as previously mentioned can be viewed as a sort of “Fourier transform” of {f}:

Definition 12 (Fourier norms) Let {f: {\bf N} \rightarrow {\bf C}} be finitely supported, and let {\Omega \subset {\bf R}} be a bounded measurable set. We define the Fourier {L^\infty} norm

\displaystyle  \| f\|_{FL^\infty(\Omega)} := \sup_{t \in \Omega} |{\mathcal D} f(1+it)|,

the Fourier {L^2} norm

\displaystyle  \| f\|_{FL^2(\Omega)} := \left(\int_\Omega |{\mathcal D} f(1+it)|^2\ dt\right)^{1/2},

and the Fourier {L^1} norm

\displaystyle  \| f\|_{FL^1(\Omega)} := \int_\Omega |{\mathcal D} f(1+it)|\ dt.

One could more generally define {FL^p} norms for other exponents {p}, but we will only need the exponents {p=1,2,\infty} in this current set of notes. It is clear that all the above norms are in fact (semi-)norms on the space of finitely supported arithmetic functions.

As mentioned above, Halasz’s theorem gives good control on the Fourier {L^\infty} norm for restrictions of non-pretentious functions to intervals:

Exercise 13 (Fourier {L^\infty} control via Halasz) Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded multiplicative function, let {I} be an interval in {[C^{-1} X, CX]} for some {X \geq C \geq 1}, let {R \geq 1}, and let {\Omega \subset {\bf R}} be a bounded measurable set. Show that

\displaystyle  \| f 1_I \|_{FL^\infty(\Omega)} \ll_C \exp( - c \min_{t: \mathrm{dist}(t,\Omega) \leq R} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{R}.

(Hint: you will need to use summation by parts (or an equivalent device) to deal with a {\frac{1}{n}} weight.)

Meanwhile, the Plancherel identity in Exercise 11 gives good control on the Fourier {L^2} norm for functions on long intervals (compare with Exercise 2 from Notes 6):

Exercise 14 ({L^2} mean value theorem) Let {T \geq 1}, and let {f: {\bf N} \rightarrow {\bf C}} be finitely supported. Show that

\displaystyle  \| f \|_{FL^2([-T,T])}^2 \ll \sum_n \frac{1}{n} (\frac{T}{n} \sum_{m: |n-m| \leq n/T} |f(m)|)^2.

Conclude in particular that if {f} is supported in {[C^{-1} N, C N]} for some {C \geq 1} and {N \gg T}, then

\displaystyle  \| f \|_{FL^2([-T,T])}^2 \ll C^{O(1)} \frac{1}{N} \sum_n |f(n)|^2.

In the simplest case of the logarithmically averaged Halasz theorem (Proposition 6), Fourier {L^\infty} estimates are already sufficient to obtain decent control on the (weighted) Fourier {L^1} type expressions that show up. However, these estimates are not enough by themselves to establish the full Halasz theorem or the Matomaki-Radziwill theorem. To get from Fourier {L^\infty} control to Fourier {L^1} or {L^2} control more efficiently, the key trick is use Hölder’s inequality, which when combined with the basic Dirichlet series identity

\displaystyle  {\mathcal D}(f*g) = ({\mathcal D} f) ({\mathcal D} g)

gives the inequalities

\displaystyle  \| f*g \|_{FL^1(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^2(\Omega)} \ \ \ \ \ (5)

and

\displaystyle  \| f*g \|_{FL^2(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^\infty(\Omega)} \ \ \ \ \ (6)

The strategy is then to factor (or approximately factor) the original function {f} as a Dirichlet convolution (or average of convolutions) of various components, each of which enjoys reasonably good Fourier {L^2} or {L^\infty} estimates on various regions {\Omega}, and then combine them using the Hölder inequalities (5), (6) and the triangle inequality. For instance, to prove Halasz’s theorem, we will split {f} into the Dirichlet convolution of three factors, one of which will be estimated in {FL^\infty} using the non-pretentiousness hypothesis, and the other two being estimated in {FL^2} using Exercise 14. For the Matomaki-Radziwill theorem, one uses a significantly more complicated decomposition of {f} into a variety of Dirichlet convolutions of factors, and also splits up the Fourier domain {[-T,T]} into several subregions depending on whether the Dirichlet series associated to some of these components are large or small. In each region and for each component of these decompositions, all but one of the factors will be estimated in {FL^\infty}, and the other in {FL^2}; but the precise way in which this is done will vary from component to component. For instance, in some regions a key factor will be small in {FL^\infty} by construction of the region; in other places, the {FL^\infty} control will come from Exercise 13. Similarly, in some regions, satisfactory {FL^2} control is provided by Exercise 14, but in other regions one must instead use “large value” theorems (in the spirit of Proposition 9 from Notes 6), or amplify the power of the standard {L^2} mean value theorems by combining the Dirichlet series with other Dirichlet series that are known to be large in this region.

There are several ways to achieve the desired factorisation. In the case of Halasz’s theorem, we can simply work with a crude version of the Euler product factorisation, dividing the primes into three categories (“small”, “medium”, and “large” primes) and expressing {f} as a triple Dirichlet convolution accordingly. For the Matomaki-Radziwill theorem, one instead exploits the Turan-Kubilius phenomenon (Section 5 of Notes 1, or Lemma 2 of Notes 9)) that for various moderately wide ranges {[P,Q]} of primes, the number of prime divisors of a large number {n} in the range {[P,Q]} is almost always close to {\log\log Q - \log\log P}. Thus, if we introduce the arithmetic functions

\displaystyle  w_{[P,Q]}(n) = \frac{1}{\log\log Q - \log\log P} \sum_{P \leq p \leq Q} 1_{n=p} \ \ \ \ \ (7)

then we have

\displaystyle  1 \approx 1 * w_{[P,Q]}

and more generally we have a twisted approximation

\displaystyle  f \approx f * fw_{[P,Q]}

for multiplicative functions {f}. (Actually, for technical reasons it will be convenient to work with a smoothed out version of these functions; see Section 3.) Informally, these formulas suggest that the “{FL^2} energy” of a multiplicative function {f} is concentrated in those regions where {f w_{[P,Q]}} is extremely large in a {FL^\infty} sense. Iterations of this formula (or variants of this formula, such as an identity due to Ramaré) will then give the desired (approximate) factorisation of {{\mathcal D} f}.

Read the rest of this entry »

In these notes we presume familiarity with the basic concepts of probability theory, such as random variables (which could take values in the reals, vectors, or other measurable spaces), probability, and expectation. Much of this theory is in turn based on measure theory, which we will also presume familiarity with. See for instance this previous set of lecture notes for a brief review.

The basic objects of study in analytic number theory are deterministic; there is nothing inherently random about the set of prime numbers, for instance. Despite this, one can still interpret many of the averages encountered in analytic number theory in probabilistic terms, by introducing random variables into the subject. Consider for instance the form

\displaystyle  \sum_{n \leq x} \mu(n) = o(x) \ \ \ \ \ (1)

of the prime number theorem (where we take the limit {x \rightarrow \infty}). One can interpret this estimate probabilistically as

\displaystyle  {\mathbb E} \mu(\mathbf{n}) = o(1) \ \ \ \ \ (2)

where {\mathbf{n} = \mathbf{n}_{\leq x}} is a random variable drawn uniformly from the natural numbers up to {x}, and {{\mathbb E}} denotes the expectation. (In this set of notes we will use boldface symbols to denote random variables, and non-boldface symbols for deterministic objects.) By itself, such an interpretation is little more than a change of notation. However, the power of this interpretation becomes more apparent when one then imports concepts from probability theory (together with all their attendant intuitions and tools), such as independence, conditioning, stationarity, total variation distance, and entropy. For instance, suppose we want to use the prime number theorem (1) to make a prediction for the sum

\displaystyle  \sum_{n \leq x} \mu(n) \mu(n+1).

After dividing by {x}, this is essentially

\displaystyle  {\mathbb E} \mu(\mathbf{n}) \mu(\mathbf{n}+1).

With probabilistic intuition, one may expect the random variables {\mu(\mathbf{n}), \mu(\mathbf{n}+1)} to be approximately independent (there is no obvious relationship between the number of prime factors of {\mathbf{n}}, and of {\mathbf{n}+1}), and so the above average would be expected to be approximately equal to

\displaystyle  ({\mathbb E} \mu(\mathbf{n})) ({\mathbb E} \mu(\mathbf{n}+1))

which by (2) is equal to {o(1)}. Thus we are led to the prediction

\displaystyle  \sum_{n \leq x} \mu(n) \mu(n+1) = o(x). \ \ \ \ \ (3)

The asymptotic (3) is widely believed (it is a special case of the Chowla conjecture, which we will discuss in later notes; while there has been recent progress towards establishing it rigorously, it remains open for now.

How would one try to make these probabilistic intuitions more rigorous? The first thing one needs to do is find a more quantitative measurement of what it means for two random variables to be “approximately” independent. There are several candidates for such measurements, but we will focus in these notes on two particularly convenient measures of approximate independence: the “{L^2}” measure of independence known as covariance, and the “{L \log L}” measure of independence known as mutual information (actually we will usually need the more general notion of conditional mutual information that measures conditional independence). The use of {L^2} type methods in analytic number theory is well established, though it is usually not described in probabilistic terms, being referred to instead by such names as the “second moment method”, the “large sieve” or the “method of bilinear sums”. The use of {L \log L} methods (or “entropy methods”) is much more recent, and has been able to control certain types of averages in analytic number theory that were out of reach of previous methods such as {L^2} methods. For instance, in later notes we will use entropy methods to establish the logarithmically averaged version

\displaystyle  \sum_{n \leq x} \frac{\mu(n) \mu(n+1)}{n} = o(\log x) \ \ \ \ \ (4)

of (3), which is implied by (3) but strictly weaker (much as the prime number theorem (1) implies the bound {\sum_{n \leq x} \frac{\mu(n)}{n} = o(\log x)}, but the latter bound is much easier to establish than the former).

As with many other situations in analytic number theory, we can exploit the fact that certain assertions (such as approximate independence) can become significantly easier to prove if one only seeks to establish them on average, rather than uniformly. For instance, given two random variables {\mathbf{X}} and {\mathbf{Y}} of number-theoretic origin (such as the random variables {\mu(\mathbf{n})} and {\mu(\mathbf{n}+1)} mentioned previously), it can often be extremely difficult to determine the extent to which {\mathbf{X},\mathbf{Y}} behave “independently” (or “conditionally independently”). However, thanks to second moment tools or entropy based tools, it is often possible to assert results of the following flavour: if {\mathbf{Y}_1,\dots,\mathbf{Y}_k} are a large collection of “independent” random variables, and {\mathbf{X}} is a further random variable that is “not too large” in some sense, then {\mathbf{X}} must necessarily be nearly independent (or conditionally independent) to many of the {\mathbf{Y}_i}, even if one cannot pinpoint precisely which of the {\mathbf{Y}_i} the variable {\mathbf{X}} is independent with. In the case of the second moment method, this allows us to compute correlations such as {{\mathbb E} {\mathbf X} \mathbf{Y}_i} for “most” {i}. The entropy method gives bounds that are significantly weaker quantitatively than the second moment method (and in particular, in its current incarnation at least it is only able to say non-trivial assertions involving interactions with residue classes at small primes), but can control significantly more general quantities {{\mathbb E} F( {\mathbf X}, \mathbf{Y}_i )} for “most” {i} thanks to tools such as the Pinsker inequality.

Read the rest of this entry »

I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values“, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the {3x+1} conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let {{\bf N}+1 = \{1,2,\dots\}} denote the positive integers (with {{\bf N} =\{0,1,2,\dots\}} the natural numbers), and let {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} be the map defined by setting {\mathrm{Col}(N)} equal to {3N+1} when {N} is odd and {N/2} when {N} is even. Let {\mathrm{Col}_{\min}(N) := \inf_{n \in {\bf N}} \mathrm{Col}^n(N)} be the minimal element of the Collatz orbit {N, \mathrm{Col}(N), \mathrm{Col}^2(N),\dots}. Then we have

Conjecture 1 (Collatz conjecture) One has {\mathrm{Col}_{\min}(N)=1} for all {N \in {\bf N}+1}.

Establishing the conjecture for all {N} remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” {N} in some sense. For instance, it is a result of Krasikov and Lagarias that

\displaystyle  \{ N \leq x: \mathrm{Col}_{\min}(N) = 1 \} \gg x^{0.84}

for all sufficiently large {x}. In another direction, it was shown by Terras that for almost all {N} (in the sense of natural density), one has {\mathrm{Col}_{\min}(N) < N}. This was then improved by Allouche to {\mathrm{Col}_{\min}(N)  0.869}, and extended later by Korec to cover all {\theta > \frac{\log 3}{\log 4} \approx 0.7924}. In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2 Let {f: {\bf N}+1 \rightarrow {\bf R}} be any function with {\lim_{N \rightarrow \infty} f(N) = +\infty}. Then we have {\mathrm{Col}_{\min}(N) < f(N)} for almost all {N} (in the sense of logarithmic density).

Thus for instance one has {\mathrm{Col}_{\min}(N) < \log\log\log\log N} for almost all {N} (in the sense of logarithmic density).

The difficulty here is one usually only expects to establish “local-in-time” results that control the evolution {\mathrm{Col}^n(N)} for times {n} that only get as large as a small multiple {c \log N} of {\log N}; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get {\mathrm{Col}^n(N)} all the way down to {f(N)} one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state {N=O(1)}.

However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an invariant measure for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if {\theta > \frac{\log 3}{\log 4}} one picks at random an integer {N} from a large interval {[1,x]}, then in most cases, the orbit of {N} will eventually move into the interval {[1,x^{\theta}]}. Similarly, if one picks an integer {M} at random from {[1,x^\theta]}, then in most cases, the orbit of {M} will eventually move into {[1,x^{\theta^2}]}. It is then tempting to concatenate the two statements and conclude that for most {N} in {[1,x]}, the orbit will eventually move {[1,x^{\theta^2}]}. Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn {N \in [1,x]} reaches {[1,x^\theta]}, the distribution of the final value is unlikely to be close to being uniformly distributed on {[1,x^\theta]}, and in particular could potentially concentrate almost entirely in the exceptional set of {M \in [1,x^\theta]} that do not make it into {[1,x^{\theta^2}]}. The point here is the uniform measure on {[1,x]} is not transported by Collatz dynamics to anything resembling the uniform measure on {[1,x^\theta]}.

So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the Syracuse map {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1}, defined on the odd numbers {2{\bf N}+1 = \{1,3,5,\dots\}} by setting {\mathrm{Syr}(N) = (3N+1)/2^a}, where {2^a} is the largest power of {2} that divides {3N+1}. (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication of {3} at each iteration step, which makes the map better behaved when performing “{3}-adic” analysis.)

When viewed {3}-adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, {\mathrm{Syr}(N)} is never divisible by {3}. A little less obviously, {\mathrm{Syr}(N)} is twice as likely to equal {2} mod {3} as it is to equal {1} mod {3}. This is because for a randomly chosen odd {\mathbf{N}}, the number of times {\mathbf{a}} that {2} divides {3\mathbf{N}+1} can be seen to have a geometric distribution of mean {2} – it equals any given value {a \in{\bf N}+1} with probability {2^{-a}}. Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of {3}. For instance, one can compute that for large random odd {\mathbf{N}}, {\mathrm{Syr}^2(\mathbf{N}) \hbox{ mod } 9} will take the residue classes {0,1,2,3,4,5,6,7,8 \hbox{ mod } 9} with probabilities

\displaystyle  0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}

respectively. More generally, for any {n}, {\mathrm{Syr}^n(N) \hbox{ mod } 3^n} will be distributed according to the law of a random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on {{\bf Z}/3^n{\bf Z}} that we call a Syracuse random variable, and can be described explicitly as

\displaystyle  \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = 2^{-\mathbf{a}_1} + 3^1 2^{-\mathbf{a}_1-\mathbf{a}_2} + \dots + 3^{n-1} 2^{-\mathbf{a}_1-\dots-\mathbf{a}_n} \hbox{ mod } 3^n, \ \ \ \ \ (1)

where {\mathbf{a}_1,\dots,\mathbf{a}_n} are iid copies of a geometric random variable of mean {2}.

In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this {3}-adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} to construct such a measure, but only if these random variables stabilise in the limit {n \rightarrow \infty} in a certain total variation sense. More precisely, in the paper we establish the estimate

\displaystyle  \sum_{Y \in {\bf Z}/3^n{\bf Z}} | \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z})=Y) - 3^{m-n} \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^m{\bf Z})=Y \hbox{ mod } 3^m)| \ \ \ \ \ (2)

\displaystyle  \ll_A m^{-A}

for any {1 \leq m \leq n} and any {A > 0}. This type of stabilisation is plausible from entropy heuristics – the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} of geometric random variables that generates {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} has Shannon entropy {n \log 4}, which is significantly larger than the total entropy {n \log 3} of the uniform distribution on {{\bf Z}/3^n{\bf Z}}, so we expect a lot of “mixing” and “collision” to occur when converting the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} to {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}; these heuristics can be supported by numerics (which I was able to work out up to about {n=10} before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.

A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

\displaystyle  2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n}

are all distinct as {(a_1,\dots,a_n)} vary over tuples in {({\bf N}+1)^n}. Unfortunately, the process of reducing mod {3^n} creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

\displaystyle  2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n \ \ \ \ \ (3)

are mostly distinct for “typical” {a_1,\dots,a_m} (as drawn using the geometric distribution) as long as {m} is a bit smaller than {\frac{\log 3}{\log 4} n} (basically because the rational number appearing in (3) then typically takes a form like {M/2^{2m}} with {M} an integer between {0} and {3^n}). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on { \mathbf{Syrac}({\bf Z}/3^n{\bf Z})} (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of {{\bf Z}/3^n{\bf Z}} of density less than {n^{-C}} for some large absolute constant {C>0}). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}, and more precisely in showing that

\displaystyle  \mathbb{E} e^{-2\pi i \xi \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) / 3^n} \ll_A n^{-A} \ \ \ \ \ (4)

for any {A > 0} and any {\xi \in {\bf Z}/3^n{\bf Z}} that is not divisible by {3}.

If the random variable (1) was the sum of independent terms, one could express this characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic factor does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming {n} is even for sake of discussion) as

\displaystyle  (2^{\mathbf{a}_2} + 3) 2^{-\mathbf{b}_1} + (2^{\mathbf{a}_4}+3) 3^2 2^{-\mathbf{b}_1-\mathbf{b}_2} + \dots

\displaystyle  + (2^{\mathbf{a}_n}+3) 3^{n-2} 2^{-\mathbf{b}_1-\dots-\mathbf{b}_{n/2}} \hbox{ mod } 3^n

where {\mathbf{b}_j := \mathbf{a}_{2j-1} + \mathbf{a}_{2j}}. The point here is that after conditioning on the {\mathbf{b}_1,\dots,\mathbf{b}_{n/2}} to be fixed, the random variables {\mathbf{a}_2, \mathbf{a}_4,\dots,\mathbf{a}_n} remain independent (though the distribution of each {\mathbf{a}_{2j}} depends on the value that we conditioned {\mathbf{b}_j} to), and so the above expression is a conditional sum of independent random variables. This lets one express the characeteristic function of (1) as an averaged Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

\displaystyle  \frac{\xi 3^{2j-2} (2^{-\mathbf{b}_1-\dots-\mathbf{b}_j+1} \mod 3^n)}{3^n}

is not close to an integer for a moderately large number ({\gg A \log n}, to be precise) of indices {j = 1,\dots,n/2}. (Actually, for technical reasons we have to also restrict to those {j} for which {\mathbf{b}_j=3}, but let us ignore this detail here.) To put it another way, if we let {B} denote the set of pairs {(j,l)} for which

\displaystyle  \frac{\xi 3^{2j-2} (2^{-l+1} \mod 3^n)}{3^n} \in [-\varepsilon,\varepsilon] + {\bf Z},

we have to show that (with overwhelming probability) the random walk

\displaystyle (1,\mathbf{b}_1), (2, \mathbf{b}_1 + \mathbf{b}_2), \dots, (n/2, \mathbf{b}_1+\dots+\mathbf{b}_{n/2})

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of {B}.

A little bit of elementary number theory and combinatorics allows one to describe the set {B} as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of {B} after passing through any given such triangle, and it then becomes relatively easy to then show that the renewal process usually has the required number of points outside of {B}. The most difficult case is when the renewal process passes through a particularly large triangle in {B}. However, it turns out that large triangles enjoy particularly good separation properties, and in particular afer passing through a large triangle one is likely to only encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of {B} that one can finish the proof of (4), and thus Theorem 2.

In the fall quarter (starting Sep 27) I will be teaching a graduate course on analytic prime number theory.  This will be similar to a graduate course I taught in 2015, and in particular will reuse several of the lecture notes from that course, though it will also incorporate some new material (and omit some material covered in the previous course, to compensate).  I anticipate covering the following topics:

  1. Elementary multiplicative number theory
  2. Complex-analytic multiplicative number theory
  3. The entropy decrement argument
  4. Bounds for exponential sums
  5. Zero density theorems
  6. Halasz’s theorem and the Matomaki-Radziwill theorem
  7. The circle method
  8. (If time permits) Chowla’s conjecture and the Erdos discrepancy problem

Lecture notes for topics 3, 6, and 8 will be forthcoming.

 

William Banks, Kevin Ford, and I have just uploaded to the arXiv our paper “Large prime gaps and probabilistic models“. In this paper we introduce a random model to help understand the connection between two well known conjectures regarding the primes {{\mathcal P} := \{2,3,5,\dots\}}, the Cramér conjecture and the Hardy-Littlewood conjecture:

Conjecture 1 (Cramér conjecture) If {x} is a large number, then the largest prime gap {G_{\mathcal P}(x) := \sup_{p_n, p_{n+1} \leq x} p_{n+1}-p_n} in {[1,x]} is of size {\asymp \log^2 x}. (Granville refines this conjecture to {\gtrsim \xi \log^2 x}, where {\xi := 2e^{-\gamma} = 1.1229\dots}. Here we use the asymptotic notation {X \gtrsim Y} for {X \geq (1-o(1)) Y}, {X \sim Y} for {X \gtrsim Y \gtrsim X}, {X \gg Y} for {X \geq C^{-1} Y}, and {X \asymp Y} for {X \gg Y \gg X}.)

Conjecture 2 (Hardy-Littlewood conjecture) If {\mathcal{H} := \{h_1,\dots,h_k\}} are fixed distinct integers, then the number of numbers {n \in [1,x]} with {n+h_1,\dots,n+h_k} all prime is {({\mathfrak S}(\mathcal{H}) +o(1)) \int_2^x \frac{dt}{\log^k t}} as {x \rightarrow \infty}, where the singular series {{\mathfrak S}(\mathcal{H})} is defined by the formula

\displaystyle {\mathfrak S}(\mathcal{H}) := \prod_p \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p}\right) (1-\frac{1}{p})^{-k}.

(One can view these conjectures as modern versions of two of the classical Landau problems, namely Legendre’s conjecture and the twin prime conjecture respectively.)

A well known connection between the Hardy-Littlewood conjecture and prime gaps was made by Gallagher. Among other things, Gallagher showed that if the Hardy-Littlewood conjecture was true, then the prime gaps {p_{n+1}-p_n} with {n \leq x} were asymptotically distributed according to an exponential distribution of mean {\log x}, in the sense that

\displaystyle | \{ n: p_n \leq x, p_{n+1}-p_n \geq \lambda \log x \}| = (e^{-\lambda}+o(1)) \frac{x}{\log x} \ \ \ \ \ (1)

 

as {x \rightarrow \infty} for any fixed {\lambda \geq 0}. Roughly speaking, the way this is established is by using the Hardy-Littlewood conjecture to control the mean values of {\binom{|{\mathcal P} \cap (p_n, p_n + \lambda \log x)|}{k}} for fixed {k,\lambda}, where {p_n} ranges over the primes in {[1,x]}. The relevance of these quantities arises from the Bonferroni inequalities (or “Brun pure sieve“), which can be formulated as the assertion that

\displaystyle 1_{N=0} \leq \sum_{k=0}^K (-1)^k \binom{N}{k}

when {K} is even and

\displaystyle 1_{N=0} \geq \sum_{k=0}^K (-1)^k \binom{N}{k}

when {K} is odd, for any natural number {N}; setting {N := |{\mathcal P} \cap (p_n, p_n + \lambda \log x)|} and taking means, one then gets upper and lower bounds for the probability that the interval {(p_n, p_n + \lambda \log x)} is free of primes. The most difficult step is to control the mean values of the singular series {{\mathfrak S}(\mathcal{H})} as {{\mathcal H}} ranges over {k}-tuples in a fixed interval such as {[0, \lambda \log x]}.

Heuristically, if one extrapolates the asymptotic (1) to the regime {\lambda \asymp \log x}, one is then led to Cramér’s conjecture, since the right-hand side of (1) falls below {1} when {\lambda} is significantly larger than {\log x}. However, this is not a rigorous derivation of Cramér’s conjecture from the Hardy-Littlewood conjecture, since Gallagher’s computations only establish (1) for fixed choices of {\lambda}, which is only enough to establish the far weaker bound {G_{\mathcal P}(x) / \log x \rightarrow \infty}, which was already known (see this previous paper for a discussion of the best known unconditional lower bounds on {G_{\mathcal P}(x)}). An inspection of the argument shows that if one wished to extend (1) to parameter choices {\lambda} that were allowed to grow with {x}, then one would need as input a stronger version of the Hardy-Littlewood conjecture in which the length {k} of the tuple {{\mathcal H} = (h_1,\dots,h_k)}, as well as the magnitudes of the shifts {h_1,\dots,h_k}, were also allowed to grow with {x}. Our initial objective in this project was then to quantify exactly what strengthening of the Hardy-Littlewood conjecture would be needed to rigorously imply Cramer’s conjecture. The precise results are technical, but roughly we show results of the following form:

Theorem 3 (Large gaps from Hardy-Littlewood, rough statement)

  • If the Hardy-Littlewood conjecture is uniformly true for {k}-tuples of length {k \ll \frac{\log x}{\log\log x}}, and with shifts {h_1,\dots,h_k} of size {O( \log^2 x )}, with a power savings in the error term, then {G_{\mathcal P}(x) \gg \frac{\log^2 x}{\log\log x}}.
  • If the Hardy-Littlewood conjecture is “true on average” for {k}-tuples of length {k \ll \frac{y}{\log x}} and shifts {h_1,\dots,h_k} of size {y} for all {\log x \leq y \leq \log^2 x \log\log x}, with a power savings in the error term, then {G_{\mathcal P}(x) \gg \log^2 x}.

In particular, we can recover Cramer’s conjecture given a sufficiently powerful version of the Hardy-Littlewood conjecture “on the average”.

Our proof of this theorem proceeds more or less along the same lines as Gallagher’s calculation, but now with {k} allowed to grow slowly with {x}. Again, the main difficulty is to accurately estimate average values of the singular series {{\mathfrak S}({\mathfrak H})}. Here we found it useful to switch to a probabilistic interpretation of this series. For technical reasons it is convenient to work with a truncated, unnormalised version

\displaystyle V_{\mathcal H}(z) := \prod_{p \leq z} \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p} \right)

of the singular series, for a suitable cutoff {z}; it turns out that when studying prime tuples of size {t}, the most convenient cutoff {z(t)} is the “Pólya magic cutoff“, defined as the largest prime for which

\displaystyle \prod_{p \leq z(t)}(1-\frac{1}{p}) \geq \frac{1}{\log t} \ \ \ \ \ (2)

 

(this is well defined for {t \geq e^2}); by Mertens’ theorem, we have {z(t) \sim t^{1/e^\gamma}}. One can interpret {V_{\mathcal Z}(z)} probabilistically as

\displaystyle V_{\mathcal Z}(z) = \mathbf{P}( {\mathcal H} \subset \mathcal{S}_z )

where {\mathcal{S}_z \subset {\bf Z}} is the randomly sifted set of integers formed by removing one residue class {a_p \hbox{ mod } p} uniformly at random for each prime {p \leq z}. The Hardy-Littlewood conjecture can be viewed as an assertion that the primes {{\mathcal P}} behave in some approximate statistical sense like the random sifted set {\mathcal{S}_z}, and one can prove the above theorem by using the Bonferroni inequalities both for the primes {{\mathcal P}} and for the random sifted set, and comparing the two (using an even {K} for the sifted set and an odd {K} for the primes in order to be able to combine the two together to get a useful bound).

The proof of Theorem 3 ended up not using any properties of the set of primes {{\mathcal P}} other than that this set obeyed some form of the Hardy-Littlewood conjectures; the theorem remains true (with suitable notational changes) if this set were replaced by any other set. In order to convince ourselves that our theorem was not vacuous due to our version of the Hardy-Littlewood conjecture being too strong to be true, we then started exploring the question of coming up with random models of {{\mathcal P}} which obeyed various versions of the Hardy-Littlewood and Cramér conjectures.

This line of inquiry was started by Cramér, who introduced what we now call the Cramér random model {{\mathcal C}} of the primes, in which each natural number {n \geq 3} is selected for membership in {{\mathcal C}} with an independent probability of {1/\log n}. This model matches the primes well in some respects; for instance, it almost surely obeys the “Riemann hypothesis”

\displaystyle | {\mathcal C} \cap [1,x] | = \int_2^x \frac{dt}{\log t} + O( x^{1/2+o(1)})

and Cramér also showed that the largest gap {G_{\mathcal C}(x)} was almost surely {\sim \log^2 x}. On the other hand, it does not obey the Hardy-Littlewood conjecture; more precisely, it obeys a simplified variant of that conjecture in which the singular series {{\mathfrak S}({\mathcal H})} is absent.

Granville proposed a refinement {{\mathcal G}} to Cramér’s random model {{\mathcal C}} in which one first sieves out (in each dyadic interval {[x,2x]}) all residue classes {0 \hbox{ mod } p} for {p \leq A} for a certain threshold {A = \log^{1-o(1)} x = o(\log x)}, and then places each surviving natural number {n} in {{\mathcal G}} with an independent probability {\frac{1}{\log n} \prod_{p \leq A} (1-\frac{1}{p})^{-1}}. One can verify that this model obeys the Hardy-Littlewood conjectures, and Granville showed that the largest gap {G_{\mathcal G}(x)} in this model was almost surely {\gtrsim \xi \log^2 x}, leading to his conjecture that this bound also was true for the primes. (Interestingly, this conjecture is not yet borne out by numerics; calculations of prime gaps up to {10^{18}}, for instance, have shown that {G_{\mathcal P}(x)} never exceeds {0.9206 \log^2 x} in this range. This is not necessarily a conflict, however; Granville’s analysis relies on inspecting gaps in an extremely sparse region of natural numbers that are more devoid of primes than average, and this region is not well explored by existing numerics. See this previous blog post for more discussion of Granville’s argument.)

However, Granville’s model does not produce a power savings in the error term of the Hardy-Littlewood conjectures, mostly due to the need to truncate the singular series at the logarithmic cutoff {A}. After some experimentation, we were able to produce a tractable random model {{\mathcal R}} for the primes which obeyed the Hardy-Littlewood conjectures with power savings, and which reproduced Granville’s gap prediction of {\gtrsim \xi \log^2 x} (we also get an upper bound of {\lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x}} for both models, though we expect the lower bound to be closer to the truth); to us, this strengthens the case for Granville’s version of Cramér’s conjecture. The model can be described as follows. We select one residue class {a_p \hbox{ mod } p} uniformly at random for each prime {p}, and as before we let {S_z} be the sifted set of integers formed by deleting the residue classes {a_p \hbox{ mod } p} with {p \leq z}. We then set

\displaystyle {\mathcal R} := \{ n \geq e^2: n \in S_{z(t)}\}

with {z(t)} Pólya’s magic cutoff (this is the cutoff that gives {{\mathcal R}} a density consistent with the prime number theorem or the Riemann hypothesis). As stated above, we are able to show that almost surely one has

\displaystyle \xi \log^2 x \lesssim {\mathcal G}_{\mathcal R}(x) \lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x} \ \ \ \ \ (3)

 

and that the Hardy-Littlewood conjectures hold with power savings for {k} up to {\log^c x} for any fixed {c < 1} and for shifts {h_1,\dots,h_k} of size {O(\log^c x)}. This is unfortunately a tiny bit weaker than what Theorem 3 requires (which more or less corresponds to the endpoint {c=1}), although there is a variant of Theorem 3 that can use this input to produce a lower bound on gaps in the model {{\mathcal R}} (but it is weaker than the one in (3)). In fact we prove a more precise almost sure asymptotic formula for {{\mathcal G}_{\mathcal R}(x) } that involves the optimal bounds for the linear sieve (or interval sieve), in which one deletes one residue class modulo {p} from an interval {[0,y]} for all primes {p} up to a given threshold. The lower bound in (3) relates to the case of deleting the {0 \hbox{ mod } p} residue classes from {[0,y]}; the upper bound comes from the delicate analysis of the linear sieve by Iwaniec. Improving on either of the two bounds looks to be quite a difficult problem.

The probabilistic analysis of {{\mathcal R}} is somewhat more complicated than of {{\mathcal C}} or {{\mathcal G}} as there is now non-trivial coupling between the events {n \in {\mathcal R}} as {n} varies, although moment methods such as the second moment method are still viable and allow one to verify the Hardy-Littlewood conjectures by a lengthy but fairly straightforward calculation. To analyse large gaps, one has to understand the statistical behaviour of a random linear sieve in which one starts with an interval {[0,y]} and randomly deletes a residue class {a_p \hbox{ mod } p} for each prime {p} up to a given threshold. For very small {p} this is handled by the deterministic theory of the linear sieve as discussed above. For medium sized {p}, it turns out that there is good concentration of measure thanks to tools such as Bennett’s inequality or Azuma’s inequality, as one can view the sieving process as a martingale or (approximately) as a sum of independent random variables. For larger primes {p}, in which only a small number of survivors are expected to be sieved out by each residue class, a direct combinatorial calculation of all possible outcomes (involving the random graph that connects interval elements {n \in [0,y]} to primes {p} if {n} falls in the random residue class {a_p \hbox{ mod } p}) turns out to give the best results.

I recently came across this question on MathOverflow asking if there are any polynomials {P} of two variables with rational coefficients, such that the map {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}} is a bijection. The answer to this question is almost surely “no”, but it is remarkable how hard this problem resists any attempt at rigorous proof. (MathOverflow users with enough privileges to see deleted answers will find that there are no fewer than seventeen deleted attempts at a proof in response to this question!)

On the other hand, the one surviving response to the question does point out this paper of Poonen which shows that assuming a powerful conjecture in Diophantine geometry known as the Bombieri-Lang conjecture (discussed in this previous post), it is at least possible to exhibit polynomials {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}} which are injective.

I believe that it should be possible to also rule out the existence of bijective polynomials {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}} if one assumes the Bombieri-Lang conjecture, and have sketched out a strategy to do so, but filling in the gaps requires a fair bit more algebraic geometry than I am capable of. So as a sort of experiment, I would like to see if a rigorous implication of this form (similarly to the rigorous implication of the Erdos-Ulam conjecture from the Bombieri-Lang conjecture in my previous post) can be crowdsourced, in the spirit of the polymath projects (though I feel that this particular problem should be significantly quicker to resolve than a typical such project).

Here is how I imagine a Bombieri-Lang-powered resolution of this question should proceed (modulo a large number of unjustified and somewhat vague steps that I believe to be true but have not established rigorously). Suppose for contradiction that we have a bijective polynomial {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}}. Then for any polynomial {Q: {\bf Q} \rightarrow {\bf Q}} of one variable, the surface

\displaystyle  S_Q := \{ (x,y,z) \in \mathbb{A}^3: P(x,y) = Q(z) \}

has infinitely many rational points; indeed, every rational {z \in {\bf Q}} lifts to exactly one rational point in {S_Q}. I believe that for “typical” {Q} this surface {S_Q} should be irreducible. One can now split into two cases:

  • (a) The rational points in {S_Q} are Zariski dense in {S_Q}.
  • (b) The rational points in {S_Q} are not Zariski dense in {S_Q}.

Consider case (b) first. By definition, this case asserts that the rational points in {S_Q} are contained in a finite number of algebraic curves. By Faltings’ theorem (a special case of the Bombieri-Lang conjecture), any curve of genus two or higher only contains a finite number of rational points. So all but finitely many of the rational points in {S_Q} are contained in a finite union of genus zero and genus one curves. I think all genus zero curves are birational to a line, and all the genus one curves are birational to an elliptic curve (though I don’t have an immediate reference for this). These curves {C} all can have an infinity of rational points, but very few of them should have “enough” rational points {C \cap {\bf Q}^3} that their projection {\pi(C \cap {\bf Q}^3) := \{ z \in {\bf Q} : (x,y,z) \in C \hbox{ for some } x,y \in {\bf Q} \}} to the third coordinate is “large”. In particular, I believe

  • (i) If {C \subset {\mathbb A}^3} is birational to an elliptic curve, then the number of elements of {\pi(C \cap {\bf Q}^3)} of height at most {H} should grow at most polylogarithmically in {H} (i.e., be of order {O( \log^{O(1)} H )}.
  • (ii) If {C \subset {\mathbb A}^3} is birational to a line but not of the form {\{ (f(z), g(z), z) \}} for some rational {f,g}, then then the number of elements of {\pi(C \cap {\bf Q}^3)} of height at most {H} should grow slower than {H^2} (in fact I think it can only grow like {O(H)}).

I do not have proofs of these results (though I think something similar to (i) can be found in Knapp’s book, and (ii) should basically follow by using a rational parameterisation {\{(f(t),g(t),h(t))\}} of {C} with {h} nonlinear). Assuming these assertions, this would mean that there is a curve of the form {\{ (f(z),g(z),z)\}} that captures a “positive fraction” of the rational points of {S_Q}, as measured by restricting the height of the third coordinate {z} to lie below a large threshold {H}, computing density, and sending {H} to infinity (taking a limit superior). I believe this forces an identity of the form

\displaystyle  P(f(z), g(z)) = Q(z) \ \ \ \ \ (1)

for all {z}. Such identities are certainly possible for some choices of {Q} (e.g. {Q(z) = P(F(z), G(z))} for arbitrary polynomials {F,G} of one variable) but I believe that the only way that such identities hold for a “positive fraction” of {Q} (as measured using height as before) is if there is in fact a rational identity of the form

\displaystyle  P( f_0(z), g_0(z) ) = z

for some rational functions {f_0,g_0} with rational coefficients (in which case we would have {f = f_0 \circ Q} and {g = g_0 \circ Q}). But such an identity would contradict the hypothesis that {P} is bijective, since one can take a rational point {(x,y)} outside of the curve {\{ (f_0(z), g_0(z)): z \in {\bf Q} \}}, and set {z := P(x,y)}, in which case we have {P(x,y) = P(f_0(z), g_0(z) )} violating the injective nature of {P}. Thus, modulo a lot of steps that have not been fully justified, we have ruled out the scenario in which case (b) holds for a “positive fraction” of {Q}.

This leaves the scenario in which case (a) holds for a “positive fraction” of {Q}. Assuming the Bombieri-Lang conjecture, this implies that for such {Q}, any resolution of singularities of {S_Q} fails to be of general type. I would imagine that this places some very strong constraints on {P,Q}, since I would expect the equation {P(x,y) = Q(z)} to describe a surface of general type for “generic” choices of {P,Q} (after resolving singularities). However, I do not have a good set of techniques for detecting whether a given surface is of general type or not. Presumably one should proceed by viewing the surface {\{ (x,y,z): P(x,y) = Q(z) \}} as a fibre product of the simpler surface {\{ (x,y,w): P(x,y) = w \}} and the curve {\{ (z,w): Q(z) = w \}} over the line {\{w \}}. In any event, I believe the way to handle (a) is to show that the failure of general type of {S_Q} implies some strong algebraic constraint between {P} and {Q} (something in the spirit of (1), perhaps), and then use this constraint to rule out the bijectivity of {P} by some further ad hoc method.

This is another sequel to a recent post in which I showed the Riemann zeta function {\zeta} can be locally approximated by a polynomial, in the sense that for randomly chosen {t \in [T,2T]} one has an approximation

\displaystyle  \zeta(\frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (1)

where {N} grows slowly with {T}, and {P_t} is a polynomial of degree {N}. It turns out that in the function field setting there is an exact version of this approximation which captures many of the known features of the Riemann zeta function, namely Dirichlet {L}-functions for a random character of given modulus over a function field. This model was (essentially) studied in a fairly recent paper by Andrade, Miller, Pratt, and Trinh; I am not sure if there is any further literature on this model beyond this paper (though the number field analogue of low-lying zeroes of Dirichlet {L}-functions is certainly well studied). In this model it is possible to set {N} fixed and let {T} go to infinity, thus providing a simple finite-dimensional model problem for problems involving the statistics of zeroes of the zeta function.

In this post I would like to record this analogue precisely. We will need a finite field {{\mathbb F}} of some order {q} and a natural number {N}, and set

\displaystyle  T := q^{N+1}.

We will primarily think of {q} as being large and {N} as being either fixed or growing very slowly with {q}, though it is possible to also consider other asymptotic regimes (such as holding {q} fixed and letting {N} go to infinity). Let {{\mathbb F}[X]} be the ring of polynomials of one variable {X} with coefficients in {{\mathbb F}}, and let {{\mathbb F}[X]'} be the multiplicative semigroup of monic polynomials in {{\mathbb F}[X]}; one should view {{\mathbb F}[X]} and {{\mathbb F}[X]'} as the function field analogue of the integers and natural numbers respectively. We use the valuation {|n| := q^{\mathrm{deg}(n)}} for polynomials {n \in {\mathbb F}[X]} (with {|0|=0}); this is the analogue of the usual absolute value on the integers. We select an irreducible polynomial {Q \in {\mathbb F}[X]} of size {|Q|=T} (i.e., {Q} has degree {N+1}). The multiplicative group {({\mathbb F}[X]/Q{\mathbb F}[X])^\times} can be shown to be cyclic of order {|Q|-1=T-1}. A Dirichlet character of modulus {Q} is a completely multiplicative function {\chi: {\mathbb F}[X] \rightarrow {\bf C}} of modulus {Q}, that is periodic of period {Q} and vanishes on those {n \in {\mathbb F}[X]} not coprime to {Q}. From Fourier analysis we see that there are exactly {\phi(Q) := |Q|-1} Dirichlet characters of modulus {Q}. A Dirichlet character is said to be odd if it is not identically one on the group {{\mathbb F}^\times} of non-zero constants; there are only {\frac{1}{q-1} \phi(Q)} non-odd characters (including the principal character), so in the limit {q \rightarrow \infty} most Dirichlet characters are odd. We will work primarily with odd characters in order to be able to ignore the effect of the place at infinity.

Let {\chi} be an odd Dirichlet character of modulus {Q}. The Dirichlet {L}-function {L(s, \chi)} is then defined (for {s \in {\bf C}} of sufficiently large real part, at least) as

\displaystyle  L(s,\chi) := \sum_{n \in {\mathbb F}[X]'} \frac{\chi(n)}{|n|^s}

\displaystyle  = \sum_{m=0}^\infty q^{-sm} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n).

Note that for {m \geq N+1}, the set {n \in {\mathbb F}[X]': |n| = q^m} is invariant under shifts {h} whenever {|h| < T}; since this covers a full set of residue classes of {{\mathbb F}[X]/Q{\mathbb F}[X]}, and the odd character {\chi} has mean zero on this set of residue classes, we conclude that the sum {\sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n)} vanishes for {m \geq N+1}. In particular, the {L}-function is entire, and for any real number {t} and complex number {z}, we can write the {L}-function as a polynomial

\displaystyle  L(\frac{1}{2} + it - \frac{2\pi i z}{\log T},\chi) = P(Z) = P_{t,\chi}(Z) := \sum_{m=0}^N c^1_m(t,\chi) Z^j

where {Z := e(z/N) = e^{2\pi i z/N}} and the coefficients {c^1_m = c^1_m(t,\chi)} are given by the formula

\displaystyle  c^1_m(t,\chi) := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n).

Note that {t} can easily be normalised to zero by the relation

\displaystyle  P_{t,\chi}(Z) = P_{0,\chi}( q^{-it} Z ). \ \ \ \ \ (2)

In particular, the dependence on {t} is periodic with period {\frac{2\pi}{\log q}} (so by abuse of notation one could also take {t} to be an element of {{\bf R}/\frac{2\pi}{\log q}{\bf Z}}).

Fourier inversion yields a functional equation for the polynomial {P}:

Proposition 1 (Functional equation) Let {\chi} be an odd Dirichlet character of modulus {Q}, and {t \in {\bf R}}. There exists a phase {e(\theta)} (depending on {t,\chi}) such that

\displaystyle  a_{N-m}^1 = e(\theta) \overline{c^1_m}

for all {0 \leq m \leq N}, or equivalently that

\displaystyle  P(1/Z) = e^{i\theta} Z^{-N} \overline{P}(Z)

where {\overline{P}(Z) := \overline{P(\overline{Z})}}.

Proof: We can normalise {t=0}. Let {G} be the finite field {{\mathbb F}[X] / Q {\mathbb F}[X]}. We can write

\displaystyle  a_{N-m} = q^{-(N-m)/2} \sum_{n \in q^{N-m} + H_{N-m}} \chi(n)

where {H_j} denotes the subgroup of {G} consisting of (residue classes of) polynomials of degree less than {j}. Let {e_G: G \rightarrow S^1} be a non-trivial character of {G} whose kernel lies in the space {H_N} (this is easily achieved by pulling back a non-trivial character from the quotient {G/H_N \equiv {\mathbb F}}). We can use the Fourier inversion formula to write

\displaystyle  a_{N-m} = q^{(m-N)/2} \sum_{\xi \in G} \hat \chi(\xi) \sum_{n \in T^{N-m} + H_{N-m}} e_G( n\xi )

where

\displaystyle  \hat \chi(\xi) := q^{-N-1} \sum_{n \in G} \chi(n) e_G(-n\xi).

From change of variables we see that {\hat \chi} is a scalar multiple of {\overline{\chi}}; from Plancherel we conclude that

\displaystyle  \hat \chi = e(\theta_0) q^{-(N+1)/2} \overline{\chi} \ \ \ \ \ (3)

for some phase {e(\theta_0)}. We conclude that

\displaystyle  a_{N-m} = e(\theta_0) q^{-(2N-m+1)/2} \sum_{\xi \in G} \overline{\chi}(\xi) e_G( T^{N-j} \xi) \sum_{n \in H_{N-j}} e_G( n\xi ). \ \ \ \ \ (4)

The inner sum {\sum_{n \in H_{N-m}} e_G( n\xi )} equals {q^{N-m}} if {\xi \in H_{j+1}}, and vanishes otherwise, thus

\displaystyle a_{N-m} = e(\theta_0) q^{-(m+1)/2} \sum_{\xi \in H_{j+1}} \overline{\chi}(\xi) e_G( T^{N-m} \xi).

For {\xi} in {H_j}, {e_G(T^{N-m} \xi)=1} and the contribution of the sum vanishes as {\chi} is odd. Thus we may restrict {\xi} to {H_{m+1} \backslash H_m}, so that

\displaystyle a_{N-m} = e(\theta_0) q^{-(m+1)/2} \sum_{h \in {\mathbb F}^\times} e_G( T^{N} h) \sum_{\xi \in h T^m + H_{m}} \overline{\chi}(\xi).

By the multiplicativity of {\chi}, this factorises as

\displaystyle a_{N-m} = e(\theta_0) q^{-(m+1)/2} (\sum_{h \in {\mathbb F}^\times} \overline{\chi}(h) e_G( T^{N} h)) (\sum_{\xi \in T^m + H_{m}} \overline{\chi}(\xi)).

From the one-dimensional version of (3) (and the fact that {\chi} is odd) we have

\displaystyle  \sum_{h \in {\mathbb F}^\times} \overline{\chi}(h) e_G( T^{N} h) = e(\theta_1) q^{1/2}

for some phase {e(\theta_1)}. The claim follows. \Box

As one corollary of the functional equation, {a_N} is a phase rotation of {\overline{a_1} = 1} and thus is non-zero, so {P} has degree exactly {N}. The functional equation is then equivalent to the {N} zeroes of {P} being symmetric across the unit circle. In fact we have the stronger

Theorem 2 (Riemann hypothesis for Dirichlet {L}-functions over function fields) Let {\chi} be an odd Dirichlet character of modulus {Q}, and {t \in {\bf R}}. Then all the zeroes of {P} lie on the unit circle.

We derive this result from the Riemann hypothesis for curves over function fields below the fold.

In view of this theorem (and the fact that {a_1=1}), we may write

\displaystyle  P(Z) = \mathrm{det}(1 - ZU)

for some unitary {N \times N} matrix {U = U_{t,\chi}}. It is possible to interpret {U} as the action of the geometric Frobenius map on a certain cohomology group, but we will not do so here. The situation here is simpler than in the number field case because the factor {\exp(A)} arising from very small primes is now absent (in the function field setting there are no primes of size between {1} and {q}).

We now let {\chi} vary uniformly at random over all odd characters of modulus {Q}, and {t} uniformly over {{\bf R}/\frac{2\pi}{\log q}{\bf Z}}, independently of {\chi}; we also make the distribution of the random variable {U} conjugation invariant in {U(N)}. We use {{\mathbf E}_Q} to denote the expectation with respect to this randomness. One can then ask what the limiting distribution of {U} is in various regimes; we will focus in this post on the regime where {N} is fixed and {q} is being sent to infinity. In the spirit of the Sato-Tate conjecture, one should expect {U} to converge in distribution to the circular unitary ensemble (CUE), that is to say Haar probability measure on {U(N)}. This may well be provable from Deligne’s “Weil II” machinery (in the spirit of this monograph of Katz and Sarnak), though I do not know how feasible this is or whether it has already been done in the literature; here we shall avoid using this machinery and study what partial results towards this CUE hypothesis one can make without it.

If one lets {\lambda_1,\dots,\lambda_N} be the eigenvalues of {U} (ordered arbitrarily), then we now have

\displaystyle  \sum_{m=0}^N c^1_m Z^m = P(Z) = \prod_{j=1}^N (1 - \lambda_j Z)

and hence the {c^1_m} are essentially elementary symmetric polynomials of the eigenvalues:

\displaystyle  c^1_m = (-1)^j e_m( \lambda_1,\dots,\lambda_N). \ \ \ \ \ (5)

One can take log derivatives to conclude

\displaystyle  \frac{P'(Z)}{P(Z)} = \sum_{j=1}^N \frac{\lambda_j}{1-\lambda_j Z}.

On the other hand, as in the number field case one has the Dirichlet series expansion

\displaystyle  Z \frac{P'(Z)}{P(Z)} = \sum_{n \in {\mathbb F}[X]'} \frac{\Lambda_q(n) \chi(n)}{|n|^s}

where {s = \frac{1}{2} + it - \frac{2\pi i z}{\log T}} has sufficiently large real part, {Z = e(z/N)}, and the von Mangoldt function {\Lambda_q(n)} is defined as {\log_q |p| = \mathrm{deg} p} when {n} is the power of an irreducible {p} and {0} otherwise. We conclude the “explicit formula”

\displaystyle  c^{\Lambda_q}_m = \sum_{j=1}^N \lambda_j^m = \mathrm{tr}(U^m) \ \ \ \ \ (6)

for {m \geq 1}, where

\displaystyle  c^{\Lambda_q}_m := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \Lambda_q(n) \chi(n).

Similarly on inverting {P(Z)} we have

\displaystyle  P(Z)^{-1} = \prod_{j=1}^N (1 - \lambda_j Z)^{-1}.

Since we also have

\displaystyle  P(Z)^{-1} = \sum_{n \in {\mathbb F}[X]'} \frac{\mu(n) \chi(n)}{|n|^s}

for {s} sufficiently large real part, where the Möbius function {\mu(n)} is equal to {(-1)^k} when {n} is the product of {k} distinct irreducibles, and {0} otherwise, we conclude that the Möbius coefficients

\displaystyle  c^\mu_m := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \mu(n) \chi(n)

are just the complete homogeneous symmetric polynomials of the eigenvalues:

\displaystyle  c^\mu_m = h_m( \lambda_1,\dots,\lambda_N). \ \ \ \ \ (7)

One can then derive various algebraic relationships between the coefficients {c^1_m, c^{\Lambda_q}_m, c^\mu_m} from various identities involving symmetric polynomials, but we will not do so here.

What do we know about the distribution of {U}? By construction, it is conjugation-invariant; from (2) it is also invariant with respect to the rotations {U \rightarrow e^{i\theta} U} for any phase {\theta \in{\bf R}}. We also have the function field analogue of the Rudnick-Sarnak asymptotics:

Proposition 3 (Rudnick-Sarnak asymptotics) Let {a_1,\dots,a_k,b_1,\dots,b_k} be nonnegative integers. If

\displaystyle  \sum_{j=1}^k j a_j \leq N, \ \ \ \ \ (8)

then the moment

\displaystyle  {\bf E}_{Q} \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (9)

is equal to {o(1)} in the limit {q \rightarrow \infty} (holding {N,a_1,\dots,a_k,b_1,\dots,b_k} fixed) unless {a_j=b_j} for all {j}, in which case it is equal to

\displaystyle  \prod_{j=1}^k j^{a_j} a_j! + o(1). \ \ \ \ \ (10)

Comparing this with Proposition 1 from this previous post, we thus see that all the low moments of {U} are consistent with the CUE hypothesis (and also with the ACUE hypothesis, again by the previous post). The case {\sum_{j=1}^k a_j + \sum_{j=1}^k b_j \leq 2} of this proposition was essentially established by Andrade, Miller, Pratt, and Trinh.

Proof: We may assume the homogeneity relationship

\displaystyle  \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j \ \ \ \ \ (11)

since otherwise the claim follows from the invariance under phase rotation {U \mapsto e^{i\theta} U}. By (6), the expression (9) is equal to

\displaystyle  q^{-D} {\bf E}_Q \sum_{n_1,\dots,n_l,n'_1,\dots,n'_{l'} \in {\mathbb F}[X]': |n_i| = q^{s_i}, |n'_i| = q^{s'_i}} (\prod_{i=1}^l \Lambda_q(n_i) \chi(n_i)) \prod_{i=1}^{l'} \Lambda_q(n'_i) \overline{\chi(n'_i)}

where

\displaystyle  D := \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j

\displaystyle  l := \sum_{j=1}^k a_j

\displaystyle  l' := \sum_{j=1}^k b_j

and {s_1 \leq \dots \leq s_l} consists of {a_j} copies of {j} for each {j=1,\dots,k}, and similarly {s'_1 \leq \dots \leq s'_{l'}} consists of {b_j} copies of {j} for each {j=1,\dots,k}.

The polynomials {n_1 \dots n_l} and {n'_1 \dots n'_{l'}} are monic of degree {D}, which by hypothesis is less than the degree of {Q}, and thus they can only be scalar multiples of each other in {{\mathbb F}[X] / Q {\mathbb F}[X]} if they are identical (in {{\mathbb F}[X]}). As such, we see that the average

\displaystyle  {\bf E}_Q \chi(n_1) \dots \chi(n_l) \overline{\chi(n'_1)} \dots \overline{\chi(n'_{l'})}

vanishes unless {n_1 \dots n_l = n'_1 \dots n'_{l'}}, in which case this average is equal to {1}. Thus the expression (9) simplifies to

\displaystyle  q^{-D} \sum_{n_1,\dots,n_l,n'_1,\dots,n'_{l'}: |n_i| = q^{s_i}, |n'_i| = q^{s'_i}; n_1 \dots n_l = n'_1 \dots n'_l} (\prod_{i=1}^l \Lambda_q(n_i)) \prod_{i=1}^{l'} \Lambda_q(n'_i).

There are at most {q^D} choices for the product {n_1 \dots n_l}, and each one contributes {O_D(1)} to the above sum. All but {o(q^D)} of these choices are square-free, so by accepting an error of {o(1)}, we may restrict attention to square-free {n_1 \dots n_l}. This forces {n_1,\dots,n_l,n'_1,\dots,n'_{l'}} to all be irreducible (as opposed to powers of irreducibles); as {{\mathbb F}[X]} is a unique factorisation domain, this forces {l=l'} and {n_1,\dots,n_l} to be a permutation of {n'_1,\dots,n'_{l'}}. By the size restrictions, this then forces {a_j = b_j} for all {j} (if the above expression is to be anything other than {o(1)}), and each {n_1,\dots,n_l} is associated to {\prod_{j=1}^k a_j!} possible choices of {n'_1,\dots,n'_{l'}}. Writing {\Lambda_q(n'_i) = s'_i} and then reinstating the non-squarefree possibilities for {n_1 \dots n_l}, we can thus write the above expression as

\displaystyle  q^{-D} \prod_{j=1}^k j a_j! \sum_{n_1,\dots,n_l,n'_1,\dots,n'_{l'}\in {\mathbb F}[X]': |n_i| = q^{s_i}} \prod_{i=1}^l \Lambda_q(n_i) + o(1).

Using the prime number theorem {\sum_{n \in {\mathbb F}[X]': |n| = q^s} \Lambda_q(n) = q^s}, we obtain the claim. \Box

Comparing this with Proposition 1 from this previous post, we thus see that all the low moments of {U} are consistent with the CUE and ACUE hypotheses:

Corollary 4 (CUE statistics at low frequencies) Let {\lambda_1,\dots,\lambda_N} be the eigenvalues of {U}, permuted uniformly at random. Let {R(\lambda)} be a linear combination of monomials {\lambda_1^{a_1} \dots \lambda_N^{a_N}} where {a_1,\dots,a_N} are integers with either {\sum_{j=1}^N a_j \neq 0} or {\sum_{j=1}^N |a_j| \leq 2N}. Then

\displaystyle  {\bf E}_Q R(\lambda) = {\bf E}_{CUE} R(\lambda) + o(1).

The analogue of the GUE hypothesis in this setting would be the CUE hypothesis, which asserts that the threshold {2N} here can be replaced by an arbitrarily large quantity. As far as I know this is not known even for {2N+2} (though, as mentioned previously, in principle one may be able to resolve such cases using Deligne’s proof of the Riemann hypothesis for function fields). Among other things, this would allow one to distinguish CUE from ACUE, since as discussed in the previous post, these two distributions agree when tested against monomials up to threshold {2N}, though not to {2N+2}.

Proof: By permutation symmetry we can take {R} to be symmetric, and by linearity we may then take {R} to be the symmetrisation of a single monomial {\lambda_1^{a_1} \dots \lambda_N^{a_N}}. If {\sum_{j=1}^N a_j \neq 0} then both expectations vanish due to the phase rotation symmetry, so we may assume that {\sum_{j=1}^N a_j \neq 0} and {\sum_{j=1}^N |a_j| \leq 2N}. We can write this symmetric polynomial as a constant multiple of {\mathrm{tr}(U^{a_1}) \dots \mathrm{tr}(U^{a_N})} plus other monomials with a smaller value of {\sum_{j=1}^N |a_j|}. Since {\mathrm{tr}(U^{-a}) = \overline{\mathrm{tr}(U^a)}}, the claim now follows by induction from Proposition 3 and Proposition 1 from the previous post. \Box

Thus, for instance, for {k=1,2}, the {2k^{th}} moment

\displaystyle {\bf E}_Q |\det(1-U)|^{2k} = {\bf E}_Q |P(1)|^{2k} = {\bf E}_Q |L(\frac{1}{2} + it, \chi)|^{2k}

is equal to

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^{2k} + o(1)

because all the monomials in {\prod_{j=1}^N (1-\lambda_j)^k (1-\lambda_j^{-1})^k} are of the required form when {k \leq 2}. The latter expectation can be computed exactly (for any natural number {k}) using a formula

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^{2k} = \prod_{j=1}^N \frac{\Gamma(j) \Gamma(j+2k)}{\Gamma(j+k)^2}

of Baker-Forrester and Keating-Snaith, thus for instance

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^2 = N+1

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^4 = \frac{(N+1)(N+2)^2(N+3)}{12}

and more generally

\displaystyle  {\bf E}_{CUE}|\det(1-U)|^{2k} = \frac{g_k+o(1)}{(k^2)!} N^{k^2}

when {N \rightarrow \infty}, where {g_k} are the integers

\displaystyle  g_1 = 1, g_2 = 2, g_3 = 42, g_4 = 24024, \dots

and more generally

\displaystyle  g_k := \frac{(k^2)!}{\prod_{i=1}^{2k-1} i^{k-|k-i|}}

(OEIS A039622). Thus we have

\displaystyle {\bf E}_Q |\det(1-U)|^{2k} = \frac{g_k+o(1)}{k^2!} N^{k^2}

for {k=1,2} if {Q \rightarrow \infty} and {N} is sufficiently slowly growing depending on {Q}. The CUE hypothesis would imply that that this formula also holds for higher {k}. (The situation here is cleaner than in the number field case, in which the GUE hypothesis only suggests the correct lower bound for the moments rather than an asymptotic, due to the absence of the wildly fluctuating additional factor {\exp(A)} that is present in the Riemann zeta function model.)

Now we can recover the analogue of Montgomery’s work on the pair correlation conjecture. Consider the statistic

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j )

where

\displaystyle R(z) = \sum_m \hat R(m) z^m

is some finite linear combination of monomials {z^m} independent of {q}. We can expand the above sum as

\displaystyle  \sum_m \hat R(m) {\bf E}_Q \mathrm{tr}(U^m) \mathrm{tr}(U^{-m}).

Assuming the CUE hypothesis, then by Example 3 of the previous post, we would conclude that

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = N^2 \hat R(0) + \sum_m \min(|m|,N) \hat R(m) + o(1). \ \ \ \ \ (12)

This is the analogue of Montgomery’s pair correlation conjecture. Proposition 3 implies that this claim is true whenever {\hat R} is supported on {[-N,N]}. If instead we assume the ACUE hypothesis (or the weaker Alternative Hypothesis that the phase gaps are non-zero multiples of {1/2N}), one should instead have

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = \sum_{k \in {\bf Z}} N^2 \hat R(2Nk) + \sum_{1 \leq |m| \leq N} |m| \hat R(m+2Nk) + o(1)

for arbitrary {R}; this is the function field analogue of a recent result of Baluyot. In any event, since {\mathrm{tr}(U^m) \mathrm{tr}(U^{-m})} is non-negative, we unconditionally have the lower bound

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) \geq N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m) + o(1). \ \ \ \ \ (13)

if {\hat R(m)} is non-negative for {|m| > N}.

By applying (12) for various choices of test functions {R} we can obtain various bounds on the behaviour of eigenvalues. For instance suppose we take the Fejér kernel

\displaystyle  R(z) = |1 + z + \dots + z^N|^2 = \sum_{m=-N}^N (N+1-|m|) z^m.

Then (12) applies unconditionally and we conclude that

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = N^2 (N+1) + \sum_{1 \leq |m| \leq N} (N+1-|m|) |m| + o(1).

The right-hand side evaluates to {\frac{2}{3} N(N+1)(2N+1)+o(1)}. On the other hand, {R(\lambda_i/\lambda_j)} is non-negative, and equal to {(N+1)^2} when {\lambda_i = \lambda_j}. Thus

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} 1_{\lambda_i = \lambda_j} \leq \frac{2}{3} \frac{N(2N+1)}{N+1} + o(1).

The sum {\sum_{1 \leq j \leq N} 1_{\lambda_i = \lambda_j}} is at least {1}, and is at least {2} if {\lambda_i} is not a simple eigenvalue. Thus

\displaystyle  {\bf E}_Q \sum_{1 \leq i, \leq N} 1_{\lambda_i \hbox{ not simple}} \leq \frac{1}{3} \frac{N(N-1)}{N+1} + o(1),

and thus the expected number of simple eigenvalues is at least {\frac{2N}{3} \frac{N+4}{N+1} + o(1)}; in particular, at least two thirds of the eigenvalues are simple asymptotically on average. If we had (12) without any restriction on the support of {\hat R}, the same arguments allow one to show that the expected proportion of simple eigenvalues is {1-o(1)}.

Suppose that the phase gaps in {U} are all greater than {c/N} almost surely. Let {\hat R} is non-negative and {R(e^{i\theta})} non-positive for {\theta} outside of the arc {[-c/N,c/N]}. Then from (13) one has

\displaystyle  R(0) N \geq N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m) + o(1),

so by taking contrapositives one can force the existence of a gap less than {c/N} asymptotically if one can find {R} with {\hat R} non-negative, {R} non-positive for {\theta} outside of the arc {[-c/N,c/N]}, and for which one has the inequality

\displaystyle  R(0) N < N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m).

By a suitable choice of {R} (based on a minorant of Selberg) one can ensure this for {c \approx 0.6072} for {N} large; see Section 5 of these notes of Goldston. This is not the smallest value of {c} currently obtainable in the literature for the number field case (which is currently {0.50412}, due to Goldston and Turnage-Butterbaugh, by a somewhat different method), but is still significantly less than the trivial value of {1}. On the other hand, due to the compatibility of the ACUE distribution with Proposition 3, it is not possible to lower {c} below {0.5} purely through the use of Proposition 3.

In some cases it is possible to go beyond Proposition 3. Consider the mollified moment

\displaystyle  {\bf E}_Q |M(U) P(1)|^2

where

\displaystyle  M(U) = \sum_{m=0}^d a_m h_m(\lambda_1,\dots,\lambda_N)

for some coefficients {a_0,\dots,a_d}. We can compute this moment in the CUE case:

Proposition 5 We have

\displaystyle  {\bf E}_{CUE} |M(U) P(1)|^2 = |a_0|^2 + N \sum_{m=1}^d |a_m - a_{m-1}|^2.

Proof: From (5) one has

\displaystyle  P(1) = \sum_{i=0}^N (-1)^i e_i(\lambda_1,\dots,\lambda_N)

hence

\displaystyle  M(U) P(1) = \sum_{i=0}^N \sum_{m=0}^d (-1)^i a_m e_i h_m

where we suppress the dependence on the eigenvalues {\lambda}. Now observe the Pieri formula

\displaystyle  e_i h_m = s_{m 1^i} + s_{(m+1) 1^{i-1}}

where {s_{m 1^i}} are the hook Schur polynomials

\displaystyle  s_{m 1^i} = \sum_{a_1 \leq \dots \leq a_m; a_1 < b_1 < \dots < b_i} \lambda_{a_1} \dots \lambda_{a_m} \lambda_{b_1} \dots \lambda_{b_i}

and we adopt the convention that {s_{m 1^i}} vanishes for {i = -1}, or when {m = 0} and {i > 0}. Then {s_{m1^i}} also vanishes for {i\geq N}. We conclude that

\displaystyle  M(U) P(1) = a_0 s_{0 1^0} + \sum_{0 \leq i \leq N-1} \sum_{m \geq 1} (-1)^i (a_m - a_{m-1}) s_{m 1^i}.

As the Schur polynomials are orthonormal on the unitary group, the claim follows. \Box

The CUE hypothesis would then imply the corresponding mollified moment conjecture

\displaystyle  {\bf E}_{Q} |M(U) P(1)|^2 = |a_0|^2 + N \sum_{m=1}^d |a_m - a_{m-1}|^2 + o(1). \ \ \ \ \ (14)

(See this paper of Conrey, and this paper of Radziwill, for some discussion of the analogous conjecture for the zeta function, which is essentially due to Farmer.)

From Proposition 3 one sees that this conjecture holds in the range {d \leq \frac{1}{2} N}. It is likely that the function field analogue of the calculations of Conrey (based ultimately on deep exponential sum estimates of Deshouillers and Iwaniec) can extend this range to {d < \theta N} for any {\theta < \frac{4}{7}}, if {N} is sufficiently large depending on {\theta}; these bounds thus go beyond what is available from Proposition 3. On the other hand, as discussed in Remark 7 of the previous post, ACUE would also predict (14) for {d} as large as {N-2}, so the available mollified moment estimates are not strong enough to rule out ACUE. It would be interesting to see if there is some other estimate in the function field setting that can be used to exclude the ACUE hypothesis (possibly one that exploits the fact that GRH is available in the function field case?).

Read the rest of this entry »

In a recent post I discussed how the Riemann zeta function {\zeta} can be locally approximated by a polynomial, in the sense that for randomly chosen {t \in [T,2T]} one has an approximation

\displaystyle  \zeta(\frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (1)

where {N} grows slowly with {T}, and {P_t} is a polynomial of degree {N}. Assuming the Riemann hypothesis (as we will throughout this post), the zeroes of {P_t} should all lie on the unit circle, and one should then be able to write {P_t} as a scalar multiple of the characteristic polynomial of (the inverse of) a unitary matrix {U = U_t \in U(N)}, which we normalise as

\displaystyle  P_t(Z) = \exp(A_t) \mathrm{det}(1 - ZU). \ \ \ \ \ (2)

Here {A_t} is some quantity depending on {t}. We view {U} as a random element of {U(N)}; in the limit {T \rightarrow \infty}, the GUE hypothesis is equivalent to {U} becoming equidistributed with respect to Haar measure on {U(N)} (also known as the Circular Unitary Ensemble, CUE; it is to the unit circle what the Gaussian Unitary Ensemble (GUE) is on the real line). One can also view {U} as analogous to the “geometric Frobenius” operator in the function field setting, though unfortunately it is difficult at present to make this analogy any more precise (due, among other things, to the lack of a sufficiently satisfactory theory of the “field of one element“).

Taking logarithmic derivatives of (2), we have

\displaystyle  -\frac{P'_t(Z)}{P_t(Z)} = \mathrm{tr}( U (1-ZU)^{-1} ) = \sum_{j=1}^\infty Z^{j-1} \mathrm{tr} U^j \ \ \ \ \ (3)

and hence on taking logarithmic derivatives of (1) in the {z} variable we (heuristically) have

\displaystyle  -\frac{2\pi i}{\log T} \frac{\zeta'}{\zeta}( \frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx \frac{2\pi i}{N} \sum_{j=1}^\infty e^{2\pi i jz/N} \mathrm{tr} U^j.

Morally speaking, we have

\displaystyle  - \frac{\zeta'}{\zeta}( \frac{1}{2} + it - \frac{2\pi i z}{\log T}) = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^{1/2+it}} e^{2\pi i z (\log n/\log T)}

so on comparing coefficients we expect to interpret the moments {\mathrm{tr} U^j} of {U} as a finite Dirichlet series:

\displaystyle  \mathrm{tr} U^j \approx \frac{N}{\log T} \sum_{T^{(j-1)/N} < n \leq T^{j/N}} \frac{\Lambda(n)}{n^{1/2+it}}. \ \ \ \ \ (4)

To understand the distribution of {U} in the unitary group {U(N)}, it suffices to understand the distribution of the moments

\displaystyle  {\bf E}_t \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (5)

where {{\bf E}_t} denotes averaging over {t \in [T,2T]}, and {k, a_1,\dots,a_k, b_1,\dots,b_k \geq 0}. The GUE hypothesis asserts that in the limit {T \rightarrow \infty}, these moments converge to their CUE counterparts

\displaystyle  {\bf E}_{\mathrm{CUE}} \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (6)

where {U} is now drawn uniformly in {U(n)} with respect to the CUE ensemble, and {{\bf E}_{\mathrm{CUE}}} denotes expectation with respect to that measure.

The moment (6) vanishes unless one has the homogeneity condition

\displaystyle  \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j. \ \ \ \ \ (7)

This follows from the fact that for any phase {\theta \in {\bf R}}, {e(\theta) U} has the same distribution as {U}, where we use the number theory notation {e(\theta) := e^{2\pi i\theta}}.

In the case when the degree {\sum_{j=1}^k j a_j} is low, we can use representation theory to establish the following simple formula for the moment (6), as evaluated by Diaconis and Shahshahani:

Proposition 1 (Low moments in CUE model) If

\displaystyle  \sum_{j=1}^k j a_j \leq N, \ \ \ \ \ (8)

then the moment (6) vanishes unless {a_j=b_j} for all {j}, in which case it is equal to

\displaystyle  \prod_{j=1}^k j^{a_j} a_j!. \ \ \ \ \ (9)

Another way of viewing this proposition is that for {U} distributed according to CUE, the random variables {\mathrm{tr} U^j} are distributed like independent complex random variables of mean zero and variance {j}, as long as one only considers moments obeying (8). This identity definitely breaks down for larger values of {a_j}, so one only obtains central limit theorems in certain limiting regimes, notably when one only considers a fixed number of {j}‘s and lets {N} go to infinity. (The paper of Diaconis and Shahshahani writes {\sum_{j=1}^k a_j + b_j} in place of {\sum_{j=1}^k j a_j}, but I believe this to be a typo.)

Proof: Let {D} be the left-hand side of (8). We may assume that (7) holds since we are done otherwise, hence

\displaystyle  D = \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j.

Our starting point is Schur-Weyl duality. Namely, we consider the {n^D}-dimensional complex vector space

\displaystyle  ({\bf C}^n)^{\otimes D} = {\bf C}^n \otimes \dots \otimes {\bf C}^n.

This space has an action of the product group {S_D \times GL_n({\bf C})}: the symmetric group {S_D} acts by permutation on the {D} tensor factors, while the general linear group {GL_n({\bf C})} acts diagonally on the {{\bf C}^n} factors, and the two actions commute with each other. Schur-Weyl duality gives a decomposition

\displaystyle  ({\bf C}^n)^{\otimes D} \equiv \bigoplus_\lambda V^\lambda_{S_D} \otimes V^\lambda_{GL_n({\bf C})} \ \ \ \ \ (10)

where {\lambda} ranges over Young tableaux of size {D} with at most {n} rows, {V^\lambda_{S_D}} is the {S_D}-irreducible unitary representation corresponding to {\lambda} (which can be constructed for instance using Specht modules), and {V^\lambda_{GL_n({\bf C})}} is the {GL_n({\bf C})}-irreducible polynomial representation corresponding with highest weight {\lambda}.

Let {\pi \in S_D} be a permutation consisting of {a_j} cycles of length {j} (this is uniquely determined up to conjugation), and let {g \in GL_n({\bf C})}. The pair {(\pi,g)} then acts on {({\bf C}^n)^{\otimes D}}, with the action on basis elements {e_{i_1} \otimes \dots \otimes e_{i_D}} given by

\displaystyle  g e_{\pi(i_1)} \otimes \dots \otimes g_{\pi(i_D)}.

The trace of this action can then be computed as

\displaystyle  \sum_{i_1,\dots,i_D \in \{1,\dots,n\}} g_{\pi(i_1),i_1} \dots g_{\pi(i_D),i_D}

where {g_{i,j}} is the {ij} matrix coefficient of {g}. Breaking up into cycles and summing, this is just

\displaystyle  \prod_{j=1}^k \mathrm{tr}(g^j)^{a_j}.

But we can also compute this trace using the Schur-Weyl decomposition (10), yielding the identity

\displaystyle  \prod_{j=1}^k \mathrm{tr}(g^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(g) \ \ \ \ \ (11)

where {\chi_\lambda: S_D \rightarrow {\bf C}} is the character on {S_D} associated to {V^\lambda_{S_D}}, and {s_\lambda: GL_n({\bf C}) \rightarrow {\bf C}} is the character on {GL_n({\bf C})} associated to {V^\lambda_{GL_n({\bf C})}}. As is well known, {s_\lambda(g)} is just the Schur polynomial of weight {\lambda} applied to the (algebraic, generalised) eigenvalues of {g}. We can specialise to unitary matrices to conclude that

\displaystyle  \prod_{j=1}^k \mathrm{tr}(U^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(U)

and similarly

\displaystyle  \prod_{j=1}^k \mathrm{tr}(U^j)^{b_j} = \sum_\lambda \chi_\lambda(\pi') s_\lambda(U)

where {\pi' \in S_D} consists of {b_j} cycles of length {j} for each {j=1,\dots,k}. On the other hand, the characters {s_\lambda} are an orthonormal system on {L^2(U(N))} with the CUE measure. Thus we can write the expectation (6) as

\displaystyle  \sum_\lambda \chi_\lambda(\pi) \overline{\chi_\lambda(\pi')}. \ \ \ \ \ (12)

Now recall that {\lambda} ranges over all the Young tableaux of size {D} with at most {N} rows. But by (8) we have {D \leq N}, and so the condition of having {N} rows is redundant. Hence {\lambda} now ranges over all Young tableaux of size {D}, which as is well known enumerates all the irreducible representations of {S_D}. One can then use the standard orthogonality properties of characters to show that the sum (12) vanishes if {\pi}, {\pi'} are not conjugate, and is equal to {D!} divided by the size of the conjugacy class of {\pi} (or equivalently, by the size of the centraliser of {\pi}) otherwise. But the latter expression is easily computed to be {\prod_{j=1}^k j^{a_j} a_j!}, giving the claim. \Box

Example 2 We illustrate the identity (11) when {D=3}, {n \geq 3}. The Schur polynomials are given as

\displaystyle  s_{3}(g) = \sum_i \lambda_i^3 + \sum_{i<j} \lambda_i^2 \lambda_j + \lambda_i \lambda_j^2 + \sum_{i<j<k} \lambda_i \lambda_j \lambda_k

\displaystyle  s_{2,1}(g) = \sum_{i < j} \lambda_i^2 \lambda_j + \sum_{i < j,k} \lambda_i \lambda_j \lambda_k

\displaystyle  s_{1,1,1}(g) = \sum_{i<j<k} \lambda_i \lambda_j \lambda_k

where {\lambda_1,\dots,\lambda_n} are the (generalised) eigenvalues of {g}, and the formula (11) in this case becomes

\displaystyle  \mathrm{tr}(g^3) = s_{3}(g) - s_{2,1}(g) + s_{1,1,1}(g)

\displaystyle  \mathrm{tr}(g^2) \mathrm{tr}(g) = s_{3}(g) - s_{1,1,1}(g)

\displaystyle  \mathrm{tr}(g)^3 = s_{3}(g) + 2 s_{2,1}(g) + s_{1,1,1}(g).

The functions {s_{1,1,1}, s_{2,1}, s_3} are orthonormal on {U(n)}, so the three functions {\mathrm{tr}(g^3), \mathrm{tr}(g^2) \mathrm{tr}(g), \mathrm{tr}(g)^3} are also, and their {L^2} norms are {\sqrt{3}}, {\sqrt{2}}, and {\sqrt{6}} respectively, reflecting the size in {S_3} of the centralisers of the permutations {(123)}, {(12)}, and {\mathrm{id}} respectively. If {n} is instead set to say {2}, then the {s_{1,1,1}} terms now disappear (the Young tableau here has too many rows), and the three quantities here now have some non-trivial covariance.

Example 3 Consider the moment {{\bf E}_{\mathrm{CUE}} |\mathrm{tr} U^j|^2}. For {j \leq N}, the above proposition shows us that this moment is equal to {D}. What happens for {j>N}? The formula (12) computes this moment as

\displaystyle  \sum_\lambda |\chi_\lambda(\pi)|^2

where {\pi} is a cycle of length {j} in {S_j}, and {\lambda} ranges over all Young tableaux with size {j} and at most {N} rows. The Murnaghan-Nakayama rule tells us that {\chi_\lambda(\pi)} vanishes unless {\lambda} is a hook (all but one of the non-zero rows consisting of just a single box; this also can be interpreted as an exterior power representation on the space {{\bf C}^j_{\sum=0}} of vectors in {{\bf C}^j} whose coordinates sum to zero), in which case it is equal to {\pm 1} (depending on the parity of the number of non-zero rows). As such we see that this moment is equal to {N}. Thus in general we have

\displaystyle  {\bf E}_{\mathrm{CUE}} |\mathrm{tr} U^j|^2 = \min(j,N). \ \ \ \ \ (13)

Now we discuss what is known for the analogous moments (5). Here we shall be rather non-rigorous, in particular ignoring an annoying “Archimedean” issue that the product of the ranges {T^{(j-1)/N} < n \leq T^{j/N}} and {T^{(k-1)/N} < n \leq T^{k/N}} is not quite the range {T^{(j+k-1)/N} < n \leq T^{j+k/N}} but instead leaks into the adjacent range {T^{(j+k-2)/N} < n \leq T^{j+k-1/N}}. This issue can be addressed by working in a “weak" sense in which parameters such as {j,k} are averaged over fairly long scales, or by passing to a function field analogue of these questions, but we shall simply ignore the issue completely and work at a heuristic level only. For similar reasons we will ignore some technical issues arising from the sharp cutoff of {t} to the range {[T,2T]} (it would be slightly better technically to use a smooth cutoff).

One can morally expand out (5) using (4) as

\displaystyle  (\frac{N}{\log T})^{J+K} \sum_{n_1,\dots,n_J,m_1,\dots,m_K} \frac{\Lambda(n_1) \dots \Lambda(n_J) \Lambda(m_1) \dots \Lambda(m_K)}{n_1^{1/2} \dots n_J^{1/2} m_1^{1/2} \dots m_K^{1/2}} \times \ \ \ \ \ (14)

\displaystyle  \times {\bf E}_t (m_1 \dots m_K / n_1 \dots n_J)^{it}

where {J := \sum_{j=1}^k a_j}, {K := \sum_{j=1}^k b_j}, and the integers {n_i,m_i} are in the ranges

\displaystyle  T^{(j-1)/N} < n_{a_1 + \dots + a_{j-1} + i} \leq T^{j/N}

for {j=1,\dots,k} and {1 \leq i \leq a_j}, and

\displaystyle  T^{(j-1)/N} < m_{b_1 + \dots + b_{j-1} + i} \leq T^{j/N}

for {j=1,\dots,k} and {1 \leq i \leq b_j}. Morally, the expectation here is negligible unless

\displaystyle  m_1 \dots m_K = (1 + O(1/T)) n_1 \dots n_J \ \ \ \ \ (15)

in which case the expecation is oscillates with magnitude one. In particular, if (7) fails (with some room to spare) then the moment (5) should be negligible, which is consistent with the analogous behaviour for the moments (6). Now suppose that (8) holds (with some room to spare). Then {n_1 \dots n_J} is significantly less than {T}, so the {O(1/T)} multiplicative error in (15) becomes an additive error of {o(1)}. On the other hand, because of the fundamental integrality gap – that the integers are always separated from each other by a distance of at least {1} – this forces the integers {m_1 \dots m_K}, {n_1 \dots n_J} to in fact be equal:

\displaystyle  m_1 \dots m_K = n_1 \dots n_J. \ \ \ \ \ (16)

The von Mangoldt factors {\Lambda(n_1) \dots \Lambda(n_J) \Lambda(m_1) \dots \Lambda(m_K)} effectively restrict {n_1,\dots,n_J,m_1,\dots,m_K} to be prime (the effect of prime powers is negligible). By the fundamental theorem of arithmetic, the constraint (16) then forces {J=K}, and {n_1,\dots,n_J} to be a permutation of {m_1,\dots,m_K}, which then forces {a_j = b_j} for all {j=1,\dots,k}._ For a given {n_1,\dots,n_J}, the number of possible {m_1 \dots m_K} is then {\prod_{j=1}^k a_j!}, and the expectation in (14) is equal to {1}. Thus this expectation is morally

\displaystyle  (\frac{N}{\log T})^{J+K} \sum_{n_1,\dots,n_J} \frac{\Lambda^2(n_1) \dots \Lambda^2(n_J) }{n_1 \dots n_J} \prod_{j=1}^k a_j!

and using Mertens’ theorem this soon simplifies asymptotically to the same quantity in Proposition 1. Thus we see that (morally at least) the moments (5) associated to the zeta function asymptotically match the moments (6) coming from the CUE model in the low degree case (8), thus lending support to the GUE hypothesis. (These observations are basically due to Rudnick and Sarnak, with the degree {1} case of pair correlations due to Montgomery, and the degree {2} case due to Hejhal.)

With some rare exceptions (such as those estimates coming from “Kloostermania”), the moment estimates of Rudnick and Sarnak basically represent the state of the art for what is known for the moments (5). For instance, Montgomery’s pair correlation conjecture, in our language, is basically the analogue of (13) for {{\mathbf E}_t}, thus

\displaystyle  {\bf E}_{t} |\mathrm{tr} U^j|^2 \approx \min(j,N) \ \ \ \ \ (17)

for all {j \geq 0}. Montgomery showed this for (essentially) the range {j \leq N} (as remarked above, this is a special case of the Rudnick-Sarnak result), but no further cases of this conjecture are known.

These estimates can be used to give some non-trivial information on the largest and smallest spacings between zeroes of the zeta function, which in our notation corresponds to spacing between eigenvalues of {U}. One such method used today for this is due to Montgomery and Odlyzko and was greatly simplified by Conrey, Ghosh, and Gonek. The basic idea, translated to our random matrix notation, is as follows. Suppose {Q_t(Z)} is some random polynomial depending on {t} of degree at most {N}. Let {\lambda_1,\dots,\lambda_n} denote the eigenvalues of {U}, and let {c > 0} be a parameter. Observe from the pigeonhole principle that if the quantity

\displaystyle  \sum_{j=1}^n \int_0^{c/N} |Q_t( e(\theta) \lambda_j )|^2\ d\theta \ \ \ \ \ (18)

exceeds the quantity

\displaystyle  \int_{0}^{2\pi} |Q_t(e(\theta))|^2\ d\theta, \ \ \ \ \ (19)

then the arcs {\{ e(\theta) \lambda_j: 0 \leq \theta \leq c \}} cannot all be disjoint, and hence there exists a pair of eigenvalues making an angle of less than {c/N} ({c} times the mean angle separation). Similarly, if the quantity (18) falls below that of (19), then these arcs cannot cover the unit circle, and hence there exists a pair of eigenvalues making an angle of greater than {c} times the mean angle separation. By judiciously choosing the coefficients of {Q_t} as functions of the moments {\mathrm{tr}(U^j)}, one can ensure that both quantities (18), (19) can be computed by the Rudnick-Sarnak estimates (or estimates of equivalent strength); indeed, from the residue theorem one can write (18) as

\displaystyle  \frac{1}{2\pi i} \int_0^{c/N} (\int_{|z| = 1+\varepsilon} - \int_{|z|=1-\varepsilon}) Q_t( e(\theta) z ) \overline{Q_t}( \frac{1}{e(\theta) z} ) \frac{P'_t(z)}{P_t(z)}\ dz

for sufficiently small {\varepsilon>0}, and this can be computed (in principle, at least) using (3) if the coefficients of {Q_t} are in an appropriate form. Using this sort of technology (translated back to the Riemann zeta function setting), one can show that gaps between consecutive zeroes of zeta are less than {\mu} times the mean spacing and greater than {\lambda} times the mean spacing infinitely often for certain {0 < \mu < 1 < \lambda}; the current records are {\mu = 0.50412} (due to Goldston and Turnage-Butterbaugh) and {\lambda = 3.18} (due to Bui and Milinovich, who input some additional estimates beyond the Rudnick-Sarnak set, namely the twisted fourth moment estimates of Bettin, Bui, Li, and Radziwill, and using a technique based on Hall’s method rather than the Montgomery-Odlyzko method).

It would be of great interest if one could push the upper bound {\mu} for the smallest gap below {1/2}. The reason for this is that this would then exclude the Alternative Hypothesis that the spacing between zeroes are asymptotically always (or almost always) a non-zero half-integer multiple of the mean spacing, or in our language that the gaps between the phases {\theta} of the eigenvalues {e^{2\pi i\theta}} of {U} are nasymptotically always non-zero integer multiples of {1/2N}. The significance of this hypothesis is that it is implied by the existence of a Siegel zero (of conductor a small power of {T}); see this paper of Conrey and Iwaniec. (In our language, what is going on is that if there is a Siegel zero in which {L(1,\chi)} is very close to zero, then {1*\chi} behaves like the Kronecker delta, and hence (by the Riemann-Siegel formula) the combined {L}-function {\zeta(s) L(s,\chi)} will have a polynomial approximation which in our language looks like a scalar multiple of {1 + e(\theta) Z^{2N+M}}, where {q \approx T^{M/N}} and {\theta} is a phase. The zeroes of this approximation lie on a coset of the {(2N+M)^{th}} roots of unity; the polynomial {P} is a factor of this approximation and hence will also lie in this coset, implying in particular that all eigenvalue spacings are multiples of {1/(2N+M)}. Taking {M = o(N)} then gives the claim.)

Unfortunately, the known methods do not seem to break this barrier without some significant new input; already the original paper of Montgomery and Odlyzko observed this limitation for their particular technique (and in fact fall very slightly short, as observed in unpublished work of Goldston and of Milinovich). In this post I would like to record another way to see this, by providing an “alternative” probability distribution to the CUE distribution (which one might dub the Alternative Circular Unitary Ensemble (ACUE) which is indistinguishable in low moments in the sense that the expectation {{\bf E}_{ACUE}} for this model also obeys Proposition 1, but for which the phase spacings are always a multiple of {1/2N}. This shows that if one is to rule out the Alternative Hypothesis (and thus in particular rule out Siegel zeroes), one needs to input some additional moment information beyond Proposition 1. It would be interesting to see if any of the other known moment estimates that go beyond this proposition are consistent with this alternative distribution. (UPDATE: it looks like they are, see Remark 7 below.)

To describe this alternative distribution, let us first recall the Weyl description of the CUE measure on the unitary group {U(n)} in terms of the distribution of the phases {\theta_1,\dots,\theta_N \in {\bf R}/{\bf Z}} of the eigenvalues, randomly permuted in any order. This distribution is given by the probability measure

\displaystyle  \frac{1}{N!} |V(\theta)|^2\ d\theta_1 \dots d\theta_N; \ \ \ \ \ (20)

where

\displaystyle  V(\theta) := \prod_{1 \leq i<j \leq N} (e(\theta_i)-e(\theta_j))

is the Vandermonde determinant; see for instance this previous blog post for the derivation of a very similar formula for the GUE distribution, which can be adapted to CUE without much difficulty. To see that this is a probability measure, first observe the Vandermonde determinant identity

\displaystyle  V(\theta) = \sum_{\pi \in S_N} \mathrm{sgn}(\pi) e(\theta \cdot \pi(\rho))

where {\theta := (\theta_1,\dots,\theta_N)}, {\cdot} denotes the dot product, and {\rho := (1,2,\dots,N)} is the “long word”, which implies that (20) is a trigonometric series with constant term {1}; it is also clearly non-negative, so it is a probability measure. One can thus generate a random CUE matrix by first drawing {(\theta_1,\dots,\theta_n) \in ({\bf R}/{\bf Z})^N} using the probability measure (20), and then generating {U} to be a random unitary matrix with eigenvalues {e(\theta_1),\dots,e(\theta_N)}.

For the alternative distribution, we first draw {(\theta_1,\dots,\theta_N)} on the discrete torus {(\frac{1}{2N}{\bf Z}/{\bf Z})^N} (thus each {\theta_j} is a {2N^{th}} root of unity) with probability density function

\displaystyle  \frac{1}{(2N)^N} \frac{1}{N!} |V(\theta)|^2 \ \ \ \ \ (21)

shift by a phase {\alpha \in {\bf R}/{\bf Z}} drawn uniformly at random, and then select {U} to be a random unitary matrix with eigenvalues {e^{i(\theta_1+\alpha)}, \dots, e^{i(\theta_N+\alpha)}}. Let us first verify that (21) is a probability density function. Clearly it is non-negative. It is the linear combination of exponentials of the form {e(\theta \cdot (\pi(\rho)-\pi'(\rho))} for {\pi,\pi' \in S_N}. The diagonal contribution {\pi=\pi'} gives the constant function {\frac{1}{(2N)^N}}, which has total mass one. All of the other exponentials have a frequency {\pi(\rho)-\pi'(\rho)} that is not a multiple of {2N}, and hence will have mean zero on {(\frac{1}{2N}{\bf Z}/{\bf Z})^N}. The claim follows.

From construction it is clear that the matrix {U} drawn from this alternative distribution will have all eigenvalue phase spacings be a non-zero multiple of {1/2N}. Now we verify that the alternative distribution also obeys Proposition 1. The alternative distribution remains invariant under rotation by phases, so the claim is again clear when (8) fails. Inspecting the proof of that proposition, we see that it suffices to show that the Schur polynomials {s_\lambda} with {\lambda} of size at most {N} and of equal size remain orthonormal with respect to the alternative measure. That is to say,

\displaystyle  \int_{U(N)} s_\lambda(U) \overline{s_{\lambda'}(U)}\ d\mu_{\mathrm{CUE}}(U) = \int_{U(N)} s_\lambda(U) \overline{s_{\lambda'}(U)}\ d\mu_{\mathrm{ACUE}}(U)

when {\lambda,\lambda'} have size equal to each other and at most {N}. In this case the phase {\alpha} in the definition of {U} is irrelevant. In terms of eigenvalue measures, we are then reduced to showing that

\displaystyle  \int_{({\bf R}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2\ d\theta = \frac{1}{(2N)^N} \sum_{\theta \in (\frac{1}{2N}{\bf Z}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2.

By Fourier decomposition, it then suffices to show that the trigonometric polynomial {s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2} does not contain any components of the form {e( \theta \cdot 2N k)} for some non-zero lattice vector {k \in {\bf Z}^N}. But we have already observed that {|V(\theta)|^2} is a linear combination of plane waves of the form {e(\theta \cdot (\pi(\rho)-\pi'(\rho))} for {\pi,\pi' \in S_N}. Also, as is well known, {s_\lambda(\theta)} is a linear combination of plane waves {e( \theta \cdot \kappa )} where {\kappa} is majorised by {\lambda}, and similarly {s_{\lambda'}(\theta)} is a linear combination of plane waves {e( \theta \cdot \kappa' )} where {\kappa'} is majorised by {\lambda'}. So the product {s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2} is a linear combination of plane waves of the form {e(\theta \cdot (\kappa - \kappa' + \pi(\rho) - \pi'(\rho)))}. But every coefficient of the vector {\kappa - \kappa' + \pi(\rho) - \pi'(\rho)} lies between {1-2N} and {2N-1}, and so cannot be of the form {2Nk} for any non-zero lattice vector {k}, giving the claim.

Example 4 If {N=2}, then the distribution (21) assigns a probability of {\frac{1}{4^2 2!} 2} to any pair {(\theta_1,\theta_2) \in (\frac{1}{4} {\bf Z}/{\bf Z})^2} that is a permuted rotation of {(0,\frac{1}{4})}, and a probability of {\frac{1}{4^2 2!} 4} to any pair that is a permuted rotation of {(0,\frac{1}{2})}. Thus, a matrix {U} drawn from the alternative distribution will be conjugate to a phase rotation of {\mathrm{diag}(1, i)} with probability {1/2}, and to {\mathrm{diag}(1,-1)} with probability {1/2}.

A similar computation when {N=3} gives {U} conjugate to a phase rotation of {\mathrm{diag}(1, e(1/6), e(1/3))} with probability {1/12}, to a phase rotation of {\mathrm{diag}( 1, e(1/6), -1)} or its adjoint with probability of {1/3} each, and a phase rotation of {\mathrm{diag}(1, e(1/3), e(2/3))} with probability {1/4}.

Remark 5 For large {N} it does not seem that this specific alternative distribution is the only distribution consistent with Proposition 1 and which has all phase spacings a non-zero multiple of {1/2N}; in particular, it may not be the only distribution consistent with a Siegel zero. Still, it is a very explicit distribution that might serve as a test case for the limitations of various arguments for controlling quantities such as the largest or smallest spacing between zeroes of zeta. The ACUE is in some sense the distribution that maximally resembles CUE (in the sense that it has the greatest number of Fourier coefficients agreeing) while still also being consistent with the Alternative Hypothesis, and so should be the most difficult enemy to eliminate if one wishes to disprove that hypothesis.

In some cases, even just a tiny improvement in known results would be able to exclude the alternative hypothesis. For instance, if the alternative hypothesis held, then {|\mathrm{tr}(U^j)|} is periodic in {j} with period {2N}, so from Proposition 1 for the alternative distribution one has

\displaystyle  {\bf E}_{\mathrm{ACUE}} |\mathrm{tr} U^j|^2 = \min_{k \in {\bf Z}} |j-2Nk|

which differs from (13) for any {|j| > N}. (This fact was implicitly observed recently by Baluyot, in the original context of the zeta function.) Thus a verification of the pair correlation conjecture (17) for even a single {j} with {|j| > N} would rule out the alternative hypothesis. Unfortunately, such a verification appears to be on comparable difficulty with (an averaged version of) the Hardy-Littlewood conjecture, with power saving error term. (This is consistent with the fact that Siegel zeroes can cause distortions in the Hardy-Littlewood conjecture, as (implicitly) discussed in this previous blog post.)

Remark 6 One can view the CUE as normalised Lebesgue measure on {U(N)} (viewed as a smooth submanifold of {{\bf C}^{N^2}}). One can similarly view ACUE as normalised Lebesgue measure on the (disconnected) smooth submanifold of {U(N)} consisting of those unitary matrices whose phase spacings are non-zero integer multiples of {1/2N}; informally, ACUE is CUE restricted to this lower dimensional submanifold. As is well known, the phases of CUE eigenvalues form a determinantal point process with kernel {K(\theta,\theta') = \frac{1}{N} \sum_{j=0}^{N-1} e(j(\theta - \theta'))} (or one can equivalently take {K(\theta,\theta') = \frac{\sin(\pi N (\theta-\theta'))}{N\sin(\pi(\theta-\theta'))}}; in a similar spirit, the phases of ACUE eigenvalues, once they are rotated to be {2N^{th}} roots of unity, become a discrete determinantal point process on those roots of unity with exactly the same kernel (except for a normalising factor of {\frac{1}{2}}). In particular, the {k}-point correlation functions of ACUE (after this rotation) are precisely the restriction of the {k}-point correlation functions of CUE after normalisation, that is to say they are proportional to {\mathrm{det}( K( \theta_i,\theta_j) )_{1 \leq i,j \leq k}}.

Remark 7 One family of estimates that go beyond the Rudnick-Sarnak family of estimates are twisted moment estimates for the zeta function, such as ones that give asymptotics for

\displaystyle  \int_T^{2T} |\zeta(\frac{1}{2}+it)|^{2k} |Q(\frac{1}{2}+it)|^2\ dt

for some small even exponent {2k} (almost always {2} or {4}) and some short Dirichlet polynomial {Q}; see for instance this paper of Bettin, Bui, Li, and Radziwill for some examples of such estimates. The analogous unitary matrix average would be something like

\displaystyle  {\bf E}_t |P_t(1)|^{2k} |Q_t(1)|^2

where {Q_t} is now some random medium degree polynomial that depends on the unitary matrix {U} associated to {P_t} (and in applications will typically also contain some negative power of {\exp(A_t)} to cancel the corresponding powers of {\exp(A_t)} in {|P_t(1)|^{2k}}). Unfortunately such averages generally are unable to distinguish the CUE from the ACUE. For instance, if all the coefficients of {Q} involve products of traces {\mathrm{tr}(U^k)} of total order less than {N-k}, then in terms of the eigenvalue phases {\theta}, {|Q(1)|^2} is a linear combination of plane waves {e(\theta \cdot \xi)} where the frequencies {\xi} have coefficients of magnitude less than {N-k}. On the other hand, as each coefficient of {P_t} is an elementary symmetric function of the eigenvalues, {P_t(1)} is a linear combination of plane waves {e(\theta \cdot \xi)} where the frequencies {\xi} have coefficients of magnitude at most {1}. Thus {|P_t(1)|^{2k} |Q_t(1)|^2} is a linear combination of plane waves where the frequencies {\xi} have coefficients of magnitude less than {N}, and thus is orthogonal to the difference between the CUE and ACUE measures on the phase torus {({\bf R}/{\bf Z})^n} by the previous arguments. In other words, {|P_t(1)|^{2k} |Q_t(1)|^2} has the same expectation with respect to ACUE as it does with respect to CUE. Thus one can only start distinguishing CUE from ACUE if the mollifier {Q_t} has degree close to or exceeding {N}, which corresponds to Dirichlet polynomials {Q} of length close to or exceeding {T}, which is far beyond current technology for such moment estimates.

Remark 8 The GUE hypothesis for the zeta function asserts that the average

\displaystyle  \lim_{T \rightarrow \infty} \frac{1}{T} \int_T^{2T} \sum_{\gamma_1,\dots,\gamma_n \hbox{ distinct}} \eta( \frac{\log T}{2\pi}(\gamma_1-t),\dots, \frac{\log T}{2\pi}(\gamma_k-t))\ dt \ \ \ \ \ (22)

is equal to

\displaystyle  \int_{{\bf R}^n} \eta(x) \det(K(x_i-x_j))_{1 \leq i,j \leq k}\ dx_1 \dots dx_k \ \ \ \ \ (23)

for any {k \geq 1} and any test function {\eta: {\bf R}^k \rightarrow {\bf C}}, where {K(x) := \frac{\sin \pi x}{\pi x}} is the Dyson sine kernel and {\gamma_i} are the ordinates of zeroes of the zeta function. This corresponds to the CUE distribution for {U}. The ACUE distribution then corresponds to an “alternative gaussian unitary ensemble (AGUE)” hypothesis, in which the average (22) is instead predicted to equal a Riemann sum version of the integral (23):

\displaystyle  \int_0^1 2^{-k} \sum_{x_1,\dots,x_k \in \frac{1}{2} {\bf Z} + \theta} \eta(x) \det(K(x_i-x_j))_{1 \leq i,j \leq k}\ d\theta.

This is a stronger version of the alternative hypothesis that the spacing between adjacent zeroes is almost always approximately a half-integer multiple of the mean spacing. I do not know of any known moment estimates for Dirichlet series that is able to eliminate this AGUE hypothesis (even assuming GRH). (UPDATE: These facts have also been independently observed in forthcoming work of Lagarias and Rodgers.)

Archives