
Let {\Omega} be some domain (such as the real numbers). For any natural number {p}, let {L(\Omega^p)_{sym}} denote the space of symmetric real-valued functions {F^{(p)}: \Omega^p \rightarrow {\bf R}} on {p} variables {x_1,\dots,x_p \in \Omega}, thus

\displaystyle  F^{(p)}(x_{\sigma(1)},\dots,x_{\sigma(p)}) = F^{(p)}(x_1,\dots,x_p)

for any permutation {\sigma: \{1,\dots,p\} \rightarrow \{1,\dots,p\}}. For instance, for any natural numbers {k,p}, the elementary symmetric polynomials

\displaystyle  e_k^{(p)}(x_1,\dots,x_p) = \sum_{1 \leq i_1 < i_2 < \dots < i_k \leq p} x_{i_1} \dots x_{i_k}

will be an element of {L({\bf R}^p)_{sym}}. With the pointwise product operation, {L(\Omega^p)_{sym}} becomes a commutative real algebra. We include the case {p=0}, in which case {L(\Omega^0)_{sym}} consists solely of the real constants.

Given two natural numbers {k,p}, one can “lift” a symmetric function {F^{(k)} \in L(\Omega^k)_{sym}} of {k} variables to a symmetric function {[F^{(k)}]_{k \rightarrow p} \in L(\Omega^p)_{sym}} of {p} variables by the formula

\displaystyle  [F^{(k)}]_{k \rightarrow p}(x_1,\dots,x_p) = \sum_{1 \leq i_1 < i_2 < \dots < i_k \leq p} F^{(k)}(x_{i_1}, \dots, x_{i_k})

\displaystyle  = \frac{1}{k!} \sum_\pi F^{(k)}( x_{\pi(1)}, \dots, x_{\pi(k)} )

where {\pi} ranges over all injections from {\{1,\dots,k\}} to {\{1,\dots,p\}} (the latter formula making it clearer that {[F^{(k)}]_{k \rightarrow p}} is symmetric). Thus for instance

\displaystyle  [F^{(1)}(x_1)]_{1 \rightarrow p} = \sum_{i=1}^p F^{(1)}(x_i)

\displaystyle  [F^{(2)}(x_1,x_2)]_{2 \rightarrow p} = \sum_{1 \leq i < j \leq p} F^{(2)}(x_i,x_j)

and

\displaystyle  e_k^{(p)}(x_1,\dots,x_p) = [x_1 \dots x_k]_{k \rightarrow p}.

Also we have

\displaystyle  [1]_{k \rightarrow p} = \binom{p}{k} = \frac{p(p-1)\dots(p-k+1)}{k!}.

With these conventions, we see that {[F^{(k)}]_{k \rightarrow p}} vanishes for {p=0,\dots,k-1}, and is equal to {F^{(k)}} if {k=p}. We also have the transitivity

\displaystyle  [F^{(k)}]_{k \rightarrow p} = \frac{1}{\binom{p-k}{p-l}} [[F^{(k)}]_{k \rightarrow l}]_{l \rightarrow p}

if {k \leq l \leq p}.
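As a sanity check on these conventions, here is a minimal Python sketch that evaluates {[F^{(k)}]_{k \rightarrow p}} by brute force over {k}-element subsets and tests the transitivity identity at one sample point. The helper names `lift` and `prod`, the sample point, and the choice {F(x) = x^2} are illustrative assumptions, not notation from the text.

```python
from itertools import combinations
from math import comb

def lift(F, k, xs):
    """Evaluate [F]_{k -> p} at the point xs, where p = len(xs).

    F is assumed symmetric in its k arguments, so summing over increasing
    index tuples i_1 < ... < i_k suffices.
    """
    return sum(F(*(xs[i] for i in idx))
               for idx in combinations(range(len(xs)), k))

def prod(*args):
    """Product of the arguments (the empty product is 1)."""
    out = 1
    for a in args:
        out *= a
    return out

xs = (2, 3, 5, 7)          # a sample point in R^4, so p = 4
p = len(xs)

e2 = lift(prod, 2, xs)                  # e_2^{(4)} = [x_1 x_2]_{2 -> 4}
ones = lift(lambda *args: 1, 3, xs)     # [1]_{3 -> 4}, should equal binom(4, 3)

# Transitivity with k = 1, l = 2, p = 4 and F(x) = x^2.
k, l = 1, 2
F = lambda x: x * x
F_lifted_to_l = lambda *ys: lift(F, k, ys)   # [F]_{k -> l} as a function of l variables
direct = lift(F, k, xs)
via_l = lift(F_lifted_to_l, l, xs) / comb(p - k, p - l)

print(e2, ones, comb(p, 3), direct, via_l)
```

The division by {\binom{p-k}{p-l}} compensates for the fact that each {k}-element subset sits inside that many {l}-element subsets.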

The lifting map {[]_{k \rightarrow p}} is a linear map from {L(\Omega^k)_{sym}} to {L(\Omega^p)_{sym}}, but it is not a ring homomorphism. For instance, when {\Omega={\bf R}}, one has

\displaystyle  [x_1]_{1 \rightarrow p} [x_1]_{1 \rightarrow p} = (\sum_{i=1}^p x_i)^2 \ \ \ \ \ (1)

\displaystyle  = \sum_{i=1}^p x_i^2 + 2 \sum_{1 \leq i < j \leq p} x_i x_j

\displaystyle  = [x_1^2]_{1 \rightarrow p} + 2 [x_1 x_2]_{2 \rightarrow p}

\displaystyle  \neq [x_1^2]_{1 \rightarrow p}.

In general, one has the identity

\displaystyle  [F^{(k)}(x_1,\dots,x_k)]_{k \rightarrow p} [G^{(l)}(x_1,\dots,x_l)]_{l \rightarrow p} = \sum_{k,l \leq m \leq k+l} \frac{1}{k! l!} \ \ \ \ \ (2)

\displaystyle [\sum_{\pi, \rho} F^{(k)}(x_{\pi(1)},\dots,x_{\pi(k)}) G^{(l)}(x_{\rho(1)},\dots,x_{\rho(l)})]_{m \rightarrow p}

for all natural numbers {k,l,p} and {F^{(k)} \in L(\Omega^k)_{sym}}, {G^{(l)} \in L(\Omega^l)_{sym}}, where {\pi, \rho} range over all injections {\pi: \{1,\dots,k\} \rightarrow \{1,\dots,m\}}, {\rho: \{1,\dots,l\} \rightarrow \{1,\dots,m\}} with {\pi(\{1,\dots,k\}) \cup \rho(\{1,\dots,l\}) = \{1,\dots,m\}}. Combinatorially, the identity (2) follows from the fact that given any injections {\tilde \pi: \{1,\dots,k\} \rightarrow \{1,\dots,p\}} and {\tilde \rho: \{1,\dots,l\} \rightarrow \{1,\dots,p\}} with total image {\tilde \pi(\{1,\dots,k\}) \cup \tilde \rho(\{1,\dots,l\})} of cardinality {m}, one has {k,l \leq m \leq k+l}, and furthermore there exist precisely {m!} triples {(\pi, \rho, \sigma)} of injections {\pi: \{1,\dots,k\} \rightarrow \{1,\dots,m\}}, {\rho: \{1,\dots,l\} \rightarrow \{1,\dots,m\}}, {\sigma: \{1,\dots,m\} \rightarrow \{1,\dots,p\}} such that {\tilde \pi = \sigma \circ \pi} and {\tilde \rho = \sigma \circ \rho}.
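The identity (2) can be tested numerically by enumerating the injections {\pi, \rho} directly. The following is a brute-force sketch under stated assumptions: the function names `lift` and `product_via_law` are hypothetical, indices are 0-based, and the inputs are taken to be integers so that exact rational arithmetic can be used.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

def lift(F, k, xs):
    """[F]_{k -> p} at xs (p = len(xs)), for F symmetric in k arguments."""
    return sum(F(*(xs[i] for i in idx))
               for idx in combinations(range(len(xs)), k))

def product_via_law(F, k, G, l, xs):
    """Right-hand side of (2), evaluated at the point xs."""
    total = Fraction(0)
    for m in range(max(k, l), k + l + 1):
        full = set(range(m))

        def H(*ys):
            # Sum over injections pi, rho into {0,...,m-1} whose images cover it.
            s = 0
            for pi in permutations(range(m), k):
                for rho in permutations(range(m), l):
                    if set(pi) | set(rho) == full:
                        s += F(*(ys[i] for i in pi)) * G(*(ys[j] for j in rho))
            return s

        total += Fraction(lift(H, m, xs), factorial(k) * factorial(l))
    return total

F = lambda a, b: a * b      # F^{(2)} = x_1 x_2
G = lambda a: a             # G^{(1)} = x_1
xs = (1, 2, 3, 4)
lhs = lift(F, 2, xs) * lift(G, 1, xs)
rhs = product_via_law(F, 2, G, 1, xs)
print(lhs, rhs)
```

Here `permutations(range(m), k)` enumerates the ordered {k}-tuples of distinct elements, which are exactly the injections from a {k}-element set into an {m}-element set.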

Example 1 When {\Omega = {\bf R}}, one has

\displaystyle  [x_1 x_2]_{2 \rightarrow p} [x_1]_{1 \rightarrow p} = [\frac{1}{2! 1!}( 2 x_1^2 x_2 + 2 x_1 x_2^2 )]_{2 \rightarrow p} + [\frac{1}{2! 1!} 6 x_1 x_2 x_3]_{3 \rightarrow p}

\displaystyle  = [x_1^2 x_2 + x_1 x_2^2]_{2 \rightarrow p} + [3x_1 x_2 x_3]_{3 \rightarrow p}

which is just a restatement of the identity

\displaystyle  (\sum_{i < j} x_i x_j) (\sum_k x_k) = \sum_{i<j} (x_i^2 x_j + x_i x_j^2) + \sum_{i < j < k} 3 x_i x_j x_k.

Note that the coefficients appearing in (2) do not depend on the final number of variables {p}. We may therefore abstract the role of {p} from the law (2) by introducing the real algebra {L(\Omega^*)_{sym}} of formal sums

\displaystyle  F^{(*)} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}

where for each {k}, {F^{(k)}} is an element of {L(\Omega^k)_{sym}} (with only finitely many of the {F^{(k)}} being non-zero), and with the formal symbol {[]_{k \rightarrow *}} being formally linear, thus

\displaystyle  [F^{(k)}]_{k \rightarrow *} + [G^{(k)}]_{k \rightarrow *} := [F^{(k)} + G^{(k)}]_{k \rightarrow *}

and

\displaystyle  c [F^{(k)}]_{k \rightarrow *} := [cF^{(k)}]_{k \rightarrow *}

for {F^{(k)}, G^{(k)} \in L(\Omega^k)_{sym}} and scalars {c \in {\bf R}}, and with multiplication given by the analogue

\displaystyle  [F^{(k)}(x_1,\dots,x_k)]_{k \rightarrow *} [G^{(l)}(x_1,\dots,x_l)]_{l \rightarrow *} = \sum_{k,l \leq m \leq k+l} \frac{1}{k! l!} \ \ \ \ \ (3)

\displaystyle [\sum_{\pi, \rho} F^{(k)}(x_{\pi(1)},\dots,x_{\pi(k)}) G^{(l)}(x_{\rho(1)},\dots,x_{\rho(l)})]_{m \rightarrow *}

of (2). Thus for instance, in this algebra {L(\Omega^*)_{sym}} we have

\displaystyle  [x_1]_{1 \rightarrow *} [x_1]_{1 \rightarrow *} = [x_1^2]_{1 \rightarrow *} + 2 [x_1 x_2]_{2 \rightarrow *}

and

\displaystyle  [x_1 x_2]_{2 \rightarrow *} [x_1]_{1 \rightarrow *} = [x_1^2 x_2 + x_1 x_2^2]_{2 \rightarrow *} + [3 x_1 x_2 x_3]_{3 \rightarrow *}.

Informally, {L(\Omega^*)_{sym}} is an abstraction (or “inverse limit”) of the concept of a symmetric function of an unspecified number of variables, which are formed by summing terms that each involve only a bounded number of these variables at a time. One can check (somewhat tediously) that {L(\Omega^*)_{sym}} is indeed a commutative real algebra, with a unit {[1]_{0 \rightarrow *}}. (I do not know if this algebra has previously been studied in the literature; it is somewhat analogous to the abstract algebra of finite linear combinations of Schur polynomials, with multiplication given by a Littlewood-Richardson rule. )

For natural numbers {p}, there is an obvious specialisation map {[]_{* \rightarrow p}} from {L(\Omega^*)_{sym}} to {L(\Omega^p)_{sym}}, defined by the formula

\displaystyle  [\sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}]_{* \rightarrow p} := \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow p}.

Thus, for instance, {[]_{* \rightarrow p}} maps {[x_1]_{1 \rightarrow *}} to {[x_1]_{1 \rightarrow p}} and {[x_1 x_2]_{2 \rightarrow *}} to {[x_1 x_2]_{2 \rightarrow p}}. From (2) and (3) we see that this map {[]_{* \rightarrow p}: L(\Omega^*)_{sym} \rightarrow L(\Omega^p)_{sym}} is an algebra homomorphism, even though the maps {[]_{k \rightarrow *}: L(\Omega^k)_{sym} \rightarrow L(\Omega^*)_{sym}} and {[]_{k \rightarrow p}: L(\Omega^k)_{sym} \rightarrow L(\Omega^p)_{sym}} are not homomorphisms. By inspecting the {p^{th}} component of {L(\Omega^*)_{sym}} we see that the homomorphism {[]_{* \rightarrow p}} is in fact surjective.

Now suppose that we have a measure {\mu} on the space {\Omega}, which then induces a product measure {\mu^p} on every product space {\Omega^p}. To avoid degeneracies we will assume that the integral {\int_\Omega\ d\mu} is strictly positive. Assuming suitable measurability and integrability hypotheses, a function {F \in L(\Omega^p)_{sym}} can then be integrated against this product measure to produce a number

\displaystyle  \int_{\Omega^p} F\ d\mu^p.

In the event that {F} arises as a lift {[F^{(k)}]_{k \rightarrow p}} of another function {F^{(k)} \in L(\Omega^k)_{sym}}, then from Fubini’s theorem we obtain the formula

\displaystyle  \int_{\Omega^p} F\ d\mu^p = \binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ d\mu)^{p-k}.

Thus for instance, if {\Omega={\bf R}},

\displaystyle  \int_{{\bf R}^p} [x_1]_{1 \rightarrow p}\ d\mu^p = p (\int_{\bf R} x\ d\mu(x)) (\int_{\bf R}\ d\mu)^{p-1} \ \ \ \ \ (4)

and

\displaystyle  \int_{{\bf R}^p} [x_1 x_2]_{2 \rightarrow p}\ d\mu^p = \binom{p}{2} (\int_{{\bf R}^2} x_1 x_2\ d\mu(x_1) d\mu(x_2)) (\int_{\bf R}\ d\mu)^{p-2}. \ \ \ \ \ (5)

On summing, we see that if

\displaystyle  F^{(*)} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}

is an element of the formal algebra {L(\Omega^*)_{sym}}, then

\displaystyle  \int_{\Omega^p} [F^{(*)}]_{* \rightarrow p}\ d\mu^p = \sum_{k=0}^\infty \binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ d\mu)^{p-k}. \ \ \ \ \ (6)

Note that by hypothesis, only finitely many terms on the right-hand side are non-zero.

Now for a key observation: whereas the left-hand side of (6) only makes sense when {p} is a natural number, the right-hand side is meaningful when {p} takes a fractional value (or even when it takes negative or complex values!), interpreting the binomial coefficient {\binom{p}{k}} as a polynomial {\frac{p(p-1) \dots (p-k+1)}{k!}} in {p}. As such, this suggests a way to introduce a “virtual” concept of a symmetric function on a fractional power space {\Omega^p} for such values of {p}, and even to integrate such functions against product measures {\mu^p}, even if the fractional power {\Omega^p} does not exist in the usual set-theoretic sense (and {\mu^p} similarly does not exist in the usual measure-theoretic sense). More precisely, for arbitrary real or complex {p}, we now define {L(\Omega^p)_{sym}} to be the space of abstract objects

\displaystyle  F^{(p)} = [F^{(*)}]_{* \rightarrow p} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow p}

with {F^{(*)} \in L(\Omega^*)_{sym}} and {[]_{* \rightarrow p}} (and {[]_{k \rightarrow p}}) now interpreted as formal symbols, with the structure of a commutative real algebra inherited from {L(\Omega^*)_{sym}}, thus

\displaystyle  [F^{(*)}]_{* \rightarrow p} + [G^{(*)}]_{* \rightarrow p} := [F^{(*)} + G^{(*)}]_{* \rightarrow p}

\displaystyle  c [F^{(*)}]_{* \rightarrow p} := [c F^{(*)}]_{* \rightarrow p}

\displaystyle  [F^{(*)}]_{* \rightarrow p} [G^{(*)}]_{* \rightarrow p} := [F^{(*)} G^{(*)}]_{* \rightarrow p}.

In particular, the multiplication law (2) continues to hold for such values of {p}, thanks to (3). Given any measure {\mu} on {\Omega}, we formally define a measure {\mu^p} on {\Omega^p} with respect to which we can integrate elements {F^{(p)}} of {L(\Omega^p)_{sym}} by the formula (6) (provided one has sufficient measurability and integrability to make sense of this formula), thus providing a sort of “fractional dimensional integral” for symmetric functions. Thus, for instance, with this formalism the identities (4), (5) now hold for fractional values of {p}, even though the formal space {{\bf R}^p} no longer makes sense as a set, and the formal measure {\mu^p} no longer makes sense as a measure. (The formalism here is somewhat reminiscent of the technique of dimensional regularisation employed in the physical literature in order to assign values to otherwise divergent integrals. See also this post for an unrelated abstraction of the integration concept involving integration over supercommutative variables (and in particular over fermionic variables).)
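Since the right-hand side of (6) only involves ordinary integrals and the polynomial {\binom{p}{k}}, it is easy to evaluate for non-integer {p}. Below is a minimal sketch, assuming the ordinary integrals {\int_{\Omega^k} F^{(k)}\ d\mu^k} have already been computed and are passed in as a list; the function names and the choice of {\mu} as Lebesgue measure on {[0,2]} are illustrative assumptions.

```python
def binom(p, k):
    """Generalised binomial coefficient p(p-1)...(p-k+1)/k! for real or complex p."""
    out = 1.0
    for j in range(k):
        out *= (p - j) / (j + 1)
    return out

def fractional_integral(moments, total_mass, p):
    """Right-hand side of (6).

    moments[k] is the ordinary integral of F^{(k)} against mu^k, and
    total_mass is the integral of d mu over Omega (assumed strictly positive).
    """
    return sum(binom(p, k) * I_k * total_mass ** (p - k)
               for k, I_k in enumerate(moments))

# Illustration of (4): mu = Lebesgue measure on [0, 2], so that
# int d mu = 2 and int x d mu = 2; then (4) predicts p * 2 * 2^(p-1) = p * 2^p.
moments = [0.0, 2.0]        # F^{(0)} = 0, F^{(1)}(x) = x
for p in (3, 0.5, -1.0):
    print(p, fractional_integral(moments, 2.0, p), p * 2.0 ** p)
```

For {p=3} this agrees with the classical integral of {x_1+x_2+x_3} over {[0,2]^3}; for {p = 0.5} or {p=-1} there is no classical integral to compare against, only the formula.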

Example 2 Suppose {\mu} is a probability measure on {\Omega}, and {X: \Omega \rightarrow {\bf R}} is a random variable; on any power {\Omega^k}, we let {X_1,\dots,X_k: \Omega^k \rightarrow {\bf R}} be the usual independent copies of {X} on {\Omega^k}, thus {X_j(\omega_1,\dots,\omega_k) := X(\omega_j)} for {(\omega_1,\dots,\omega_k) \in \Omega^k}. Then for any real or complex {p}, the formal integral

\displaystyle  \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p

can be evaluated by first using the identity

\displaystyle  [X_1]_{1 \rightarrow p}^2 = [X_1^2]_{1 \rightarrow p} + 2[X_1 X_2]_{2 \rightarrow p}

(cf. (1)) and then using (6) and the probability measure hypothesis {\int_\Omega\ d\mu = 1} to conclude that

\displaystyle  \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p = \binom{p}{1} \int_{\Omega} X^2\ d\mu + 2 \binom{p}{2} \int_{\Omega^2} X_1 X_2\ d\mu^2

\displaystyle  = p (\int_\Omega X^2\ d\mu - (\int_\Omega X\ d\mu)^2) + p^2 (\int_\Omega X\ d\mu)^2

or in probabilistic notation

\displaystyle  \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p = p \mathbf{Var}(X) + p^2 \mathbf{E}(X)^2. \ \ \ \ \ (7)

For {p} a natural number, this identity has the probabilistic interpretation

\displaystyle  \mathbf{E}( X_1 + \dots + X_p)^2 = p \mathbf{Var}(X) + p^2 \mathbf{E}(X)^2 \ \ \ \ \ (8)

whenever {X_1,\dots,X_p} are jointly independent copies of {X}, which reflects the well known fact that the sum {X_1 + \dots + X_p} has expectation {p \mathbf{E} X} and variance {p \mathbf{Var}(X)}. One can thus view (7) as an abstract generalisation of (8) to the case when {p} is fractional, negative, or even complex, despite the fact that there is no sensible way in this case to talk about {p} independent copies {X_1,\dots,X_p} of {X} in the standard framework of probability theory.
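For natural numbers {p}, the identity (8) can be confirmed by exhaustive enumeration on a finite probability space, after which the same polynomial in {p} can be evaluated at a fractional value as in (7). The three-point distribution below is a hypothetical example chosen only for illustration, and the function names are likewise assumptions.

```python
from fractions import Fraction
from itertools import product

# A hypothetical three-point distribution for X.
values = [Fraction(-1), Fraction(0), Fraction(3)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

def expect(f):
    """E f(X) for the distribution above."""
    return sum(w * f(v) for v, w in zip(values, probs))

mean = expect(lambda v: v)
var = expect(lambda v: v * v) - mean ** 2

def second_moment_of_sum(p):
    """E (X_1 + ... + X_p)^2 by enumerating all outcomes; natural p only."""
    total = Fraction(0)
    for outcome in product(range(len(values)), repeat=p):
        weight = Fraction(1)
        s = Fraction(0)
        for i in outcome:
            weight *= probs[i]
            s += values[i]
        total += weight * s * s
    return total

def formula_7(p):
    """p Var(X) + p^2 E(X)^2, which makes sense for any p."""
    return p * var + p * p * mean ** 2

for p in range(5):
    print(p, second_moment_of_sum(p), formula_7(p))
print(formula_7(Fraction(1, 2)))   # the "virtual" value at p = 1/2
```

The enumeration is only available for natural {p}; the last line has no such interpretation and simply evaluates the right-hand side of (7).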

In this particular case, the quantity (7) is non-negative for every nonnegative {p}, which looks plausible given the form of the left-hand side. Unfortunately, this sort of non-negativity does not always hold; for instance, if {X} has mean zero, one can check that

\displaystyle  \int_{\Omega^p} [X_1]_{1 \rightarrow p}^4\ d\mu^p = p \mathbf{Var}(X^2) + p(3p-2) (\mathbf{E}(X^2))^2

and the right-hand side can become negative for {p < 2/3}. This is a shame, because otherwise one could hope to start endowing {L(\Omega^p)_{sym}} with some sort of commutative von Neumann algebra type structure (or the abstract probability structure discussed in this previous post) and then interpret it as a genuine measure space rather than as a virtual one. (This failure of positivity is related to the fact that the characteristic function of a random variable, when raised to the {p^{th}} power, need not be a characteristic function of any random variable once {p} is no longer a natural number: “fractional convolution” does not preserve positivity!) However, one vestige of positivity remains: if {F: \Omega \rightarrow {\bf R}} is non-negative, then so is

\displaystyle  \int_{\Omega^p} [F]_{1 \rightarrow p}\ d\mu^p = p (\int_\Omega F\ d\mu) (\int_\Omega\ d\mu)^{p-1}.
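To see the failure of positivity for the fourth moment concretely, one can take {X} to be a random sign, so that {\mathbf{E}(X^2) = \mathbf{E}(X^4) = 1} and {\mathbf{Var}(X^2) = 0}. The sketch below (function names are illustrative assumptions) checks the fourth moment formula against brute-force enumeration for small natural {p}, and then evaluates it at {p = 1/2}.

```python
from fractions import Fraction
from itertools import product

def fourth_moment_formula(p, ex2, ex4):
    """p Var(X^2) + p(3p-2) E(X^2)^2, for a mean-zero X with the given moments."""
    return p * (ex4 - ex2 ** 2) + p * (3 * p - 2) * ex2 ** 2

def fourth_moment_brute_force(p):
    """E (X_1 + ... + X_p)^4 for independent random signs; natural p only."""
    outcomes = list(product((-1, 1), repeat=p))
    return Fraction(sum(sum(signs) ** 4 for signs in outcomes), len(outcomes))

for p in range(6):
    print(p, fourth_moment_brute_force(p), fourth_moment_formula(p, 1, 1))

half = fourth_moment_formula(Fraction(1, 2), 1, 1)
print(half)   # negative, although the integrand is formally a fourth power
```

For this choice of {X} the formula reduces to {p(3p-2)}, which is negative for every {0 < p < 2/3}.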

One can wonder what the point is to all of this abstract formalism and how it relates to the rest of mathematics. For me, this formalism originated implicitly in an old paper I wrote with Jon Bennett and Tony Carbery on the multilinear restriction and Kakeya conjectures, though we did not have a good language for working with it at the time, instead working first with the case of natural number exponents {p} and appealing to a general extrapolation theorem to then obtain various identities in the fractional {p} case. The connection between these fractional dimensional integrals and more traditional integrals ultimately arises from the simple identity

\displaystyle  (\int_\Omega\ d\mu)^p = \int_{\Omega^p}\ d\mu^p

(where the right-hand side should be viewed as the fractional dimensional integral of the unit {[1]_{0 \rightarrow p}} against {\mu^p}). As such, one can manipulate {p^{th}} powers of ordinary integrals using the machinery of fractional dimensional integrals. A key lemma in this regard is

Lemma 3 (Differentiation formula) Suppose that a positive measure {\mu = \mu(t)} on {\Omega} depends on some parameter {t} and varies by the formula

\displaystyle  \frac{d}{dt} \mu(t) = a(t) \mu(t) \ \ \ \ \ (9)

for some function {a(t): \Omega \rightarrow {\bf R}}. Let {p} be any real or complex number. Then, assuming sufficient smoothness and integrability of all quantities involved, we have

\displaystyle  \frac{d}{dt} \int_{\Omega^p} F^{(p)}\ d\mu(t)^p = \int_{\Omega^p} F^{(p)} [a(t)]_{1 \rightarrow p}\ d\mu(t)^p \ \ \ \ \ (10)

for all {F^{(p)} \in L(\Omega^p)_{sym}} that are independent of {t}. If we allow {F^{(p)}(t)} to now depend on {t} also, then we have the more general total derivative formula

\displaystyle  \frac{d}{dt} \int_{\Omega^p} F^{(p)}(t)\ d\mu(t)^p \ \ \ \ \ (11)

\displaystyle  = \int_{\Omega^p} \frac{d}{dt} F^{(p)}(t) + F^{(p)}(t) [a(t)]_{1 \rightarrow p}\ d\mu(t)^p,

again assuming sufficient amounts of smoothness and regularity.

Proof: We just prove (10), as (11) then follows by the same argument used to prove the usual product rule. By linearity it suffices to verify this identity in the case {F^{(p)} = [F^{(k)}]_{k \rightarrow p}} for some symmetric function {F^{(k)} \in L(\Omega^k)_{sym}} for a natural number {k}. By (6), the left-hand side of (10) is then

\displaystyle  \frac{d}{dt} [\binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu(t)^k) (\int_\Omega\ d\mu(t))^{p-k}]. \ \ \ \ \ (12)

Differentiating under the integral sign using (9) we have

\displaystyle  \frac{d}{dt} \int_\Omega\ d\mu(t) = \int_\Omega\ a(t)\ d\mu(t)

and similarly

\displaystyle  \frac{d}{dt} \int_{\Omega^k} F^{(k)}\ d\mu(t)^k = \int_{\Omega^k} F^{(k)}(a_1+\dots+a_k)\ d\mu(t)^k

where {a_1,\dots,a_k} are the standard {k} copies of {a = a(t)} on {\Omega^k}:

\displaystyle  a_j(\omega_1,\dots,\omega_k) := a(\omega_j).

By the product rule, we can thus expand (12) as

\displaystyle  \binom{p}{k} (\int_{\Omega^k} F^{(k)}(a_1+\dots+a_k)\ d\mu^k ) (\int_\Omega\ d\mu)^{p-k}

\displaystyle  + \binom{p}{k} (p-k) (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ a\ d\mu) (\int_\Omega\ d\mu)^{p-k-1}

where we have suppressed the dependence on {t} for brevity. Since {\binom{p}{k} (p-k) = \binom{p}{k+1} (k+1)}, we can write this expression using (6) as

\displaystyle  \int_{\Omega^p} [F^{(k)} (a_1 + \dots + a_k)]_{k \rightarrow p} + [ F^{(k)} \ast a ]_{k+1 \rightarrow p}\ d\mu^p

where {F^{(k)} \ast a \in L(\Omega^{k+1})_{sym}} is the symmetric function

\displaystyle F^{(k)} \ast a(\omega_1,\dots,\omega_{k+1}) := \sum_{j=1}^{k+1} F^{(k)}(\omega_1,\dots,\omega_{j-1},\omega_{j+1} \dots \omega_{k+1}) a(\omega_j).

But from (2) one has

\displaystyle  [F^{(k)} (a_1 + \dots + a_k)]_{k \rightarrow p} + [ F^{(k)} \ast a ]_{k+1 \rightarrow p} = [F^{(k)}]_{k \rightarrow p} [a]_{1 \rightarrow p}

and the claim follows. \Box
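The differentiation formula (10) can be tested numerically at a fractional exponent on a finite measure space. The sketch below is a check under stated assumptions rather than part of the proof: {\Omega} has three points, {a} is taken independent of {t} so that (9) is solved by {\mu(t) = e^{at} \mu(0)}, the test function is {F^{(p)} = [F]_{1 \rightarrow p}}, and all numerical values and names are hypothetical. The left-hand side of (10) is approximated by a central difference.

```python
from math import exp

F = [1.0, 2.0, -0.5]      # values of F^{(1)} at the three points of Omega
a = [0.3, -0.2, 0.1]      # values of a (taken independent of t here)
w0 = [1.0, 0.5, 2.0]      # masses of mu(0)

def weights(t):
    """Masses of mu(t); solves d/dt mu = a mu pointwise."""
    return [w * exp(ai * t) for w, ai in zip(w0, a)]

def integral_of_lift(w, p):
    """int [F]_{1->p} d mu^p via (6): p (int F d mu) (int d mu)^(p-1)."""
    mass = sum(w)
    return p * sum(f * wi for f, wi in zip(F, w)) * mass ** (p - 1)

def rhs_of_10(w, p):
    """int [F]_{1->p} [a]_{1->p} d mu^p, expanded with (2) and then (6)."""
    mass = sum(w)
    int_F = sum(f * wi for f, wi in zip(F, w))
    int_a = sum(ai * wi for ai, wi in zip(a, w))
    int_Fa = sum(f * ai * wi for f, ai, wi in zip(F, a, w))
    # [F][a] = [F a]_{1->p} + [F(x_1) a(x_2) + F(x_2) a(x_1)]_{2->p}
    return (p * int_Fa * mass ** (p - 1)
            + p * (p - 1) * int_F * int_a * mass ** (p - 2))

p, t, h = 0.5, 0.7, 1e-5
lhs = (integral_of_lift(weights(t + h), p)
       - integral_of_lift(weights(t - h), p)) / (2 * h)
rhs = rhs_of_10(weights(t), p)
print(lhs, rhs)
```

The two printed numbers should agree up to the finite-difference error.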

Remark 4 It is also instructive to prove this lemma in the special case when {p} is a natural number, in which case the fractional dimensional integral {\int_{\Omega^p} F^{(p)}\ d\mu(t)^p} can be interpreted as a classical integral. In this case, the identity (10) is immediate from applying the product rule to (9) to conclude that

\displaystyle  \frac{d}{dt} d\mu(t)^p = [a(t)]_{1 \rightarrow p} d\mu(t)^p.

One could in fact derive (10) for arbitrary real or complex {p} from the case when {p} is a natural number by an extrapolation argument; see the appendix of my paper with Bennett and Carbery for details.

Let us give a simple PDE application of this lemma as illustration:

Proposition 5 (Heat flow monotonicity) Let {u: [0,+\infty) \times {\bf R}^d \rightarrow {\bf R}} be a solution to the heat equation {u_t = \Delta u} with initial data {\mu_0} a rapidly decreasing finite non-negative Radon measure, or more explicitly

\displaystyle  u(t,x) = \frac{1}{(4\pi t)^{d/2}} \int_{{\bf R}^d} e^{-|x-y|^2/4t}\ d\mu_0(y)

for all {t>0}. Then for any {p>0}, the quantity

\displaystyle  Q_p(t) := t^{\frac{d}{2} (p-1)} \int_{{\bf R}^d} u(t,x)^p\ dx

is monotone non-decreasing in {t \in (0,+\infty)} for {1 < p < \infty}, constant for {p=1}, and monotone non-increasing for {0 < p < 1}.

Proof: By a limiting argument we may assume that {d\mu_0} is absolutely continuous, with Radon-Nikodym derivative a test function; this is more than enough regularity to justify the arguments below.

For any {(t,x) \in (0,+\infty) \times {\bf R}^d}, let {\mu(t,x)} denote the Radon measure

\displaystyle  d\mu(t,x)(y) := \frac{1}{(4\pi)^{d/2}} e^{-|x-y|^2/4t}\ d\mu_0(y).

Then the quantity {Q_p(t)} can be written as a fractional dimensional integral

\displaystyle  Q_p(t) = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p}\ d\mu(t,x)^p\ dx.

Observe that

\displaystyle  \frac{\partial}{\partial t} d\mu(t,x) = \frac{|x-y|^2}{4t^2} d\mu(t,x)

and thus by Lemma 3 and the product rule

\displaystyle  \frac{d}{dt} Q_p(t) = -\frac{d}{2t} Q_p(t) + t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} [\frac{|x-y|^2}{4t^2}]_{1 \rightarrow p} d\mu(t,x)^p\ dx \ \ \ \ \ (13)

where we use {y} for the variable of integration in the factor space {{\bf R}^d} of {({\bf R}^d)^p}.

To simplify this expression we will take advantage of integration by parts in the {x} variable. Specifically, in any direction {x_j}, we have

\displaystyle  \frac{\partial}{\partial x_j} d\mu(t,x) = -\frac{x_j-y_j}{2t} d\mu(t,x)

and hence by Lemma 3

\displaystyle  \frac{\partial}{\partial x_j} \int_{({\bf R}^d)^p}\ d\mu(t,x)^p = - \int_{({\bf R}^d)^p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p.

Multiplying by {x_j} and integrating by parts, we see that

\displaystyle  d Q_p(t) = - t^{-d/2} \int_{{\bf R}^d} x_j \frac{\partial}{\partial x_j} \int_{({\bf R}^d)^p}\ d\mu(t,x)^p\ dx

\displaystyle  = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} x_j [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx

where we use the Einstein summation convention in {j}. Similarly, if {F_j(y)} is any reasonable function depending only on {y}, we have

\displaystyle  \frac{\partial}{\partial x_j} \int_{({\bf R}^d)^p}[F_j(y)]_{1 \rightarrow p}\ d\mu(t,x)^p

\displaystyle = - \int_{({\bf R}^d)^p} [F_j(y)]_{1 \rightarrow p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p

and hence on integration by parts

\displaystyle  0 = \int_{{\bf R}^d} \int_{({\bf R}^d)^p} [F_j(y)]_{1 \rightarrow p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx.

We conclude that

\displaystyle  \frac{d}{2t} Q_p(t) = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} (x_j - [F_j(y)]_{1 \rightarrow p}) [\frac{(x_j-y_j)}{4t^2}]_{1 \rightarrow p} d\mu(t,x)^p\ dx

and thus by (13)

\displaystyle  \frac{d}{dt} Q_p(t) = \frac{1}{4t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \int_{({\bf R}^d)^p}

\displaystyle [(x_j-y_j)(x_j-y_j)]_{1 \rightarrow p} - (x_j - [F_j(y)]_{1 \rightarrow p}) [x_j - y_j]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx.

The choice of {F_j} that then achieves the most cancellation turns out to be {F_j(y) = \frac{1}{p} y_j} (this cancels the terms that are linear or quadratic in the {x_j}), so that {x_j - [F_j(y)]_{1 \rightarrow p} = \frac{1}{p} [x_j - y_j]_{1 \rightarrow p}}. Repeating the calculations establishing (7), one has

\displaystyle  \int_{({\bf R}^d)^p} [(x_j-y_j)(x_j-y_j)]_{1 \rightarrow p}\ d\mu^p = p \mathop{\bf E} |x-Y|^2 (\int_{{\bf R}^d}\ d\mu)^{p}

and

\displaystyle  \int_{({\bf R}^d)^p} [x_j-y_j]_{1 \rightarrow p} [x_j-y_j]_{1 \rightarrow p}\ d\mu^p

\displaystyle = (p \mathbf{Var}(x-Y) + p^2 |\mathop{\bf E} x-Y|^2) (\int_{{\bf R}^d}\ d\mu)^{p}

where {Y} is the random variable drawn from {{\bf R}^d} with the normalised probability measure {\mu / \int_{{\bf R}^d}\ d\mu}. Since {\mathop{\bf E} |x-Y|^2 = \mathbf{Var}(x-Y) + |\mathop{\bf E} x-Y|^2}, one thus has

\displaystyle  \frac{d}{dt} Q_p(t) = (p-1) \frac{1}{4t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \mathbf{Var}(x-Y) (\int_{{\bf R}^d}\ d\mu)^{p}\ dx. \ \ \ \ \ (14)

This expression is clearly non-negative for {p>1}, equal to zero for {p=1}, and non-positive for {0 < p < 1}, giving the claim. (One could simplify {\mathbf{Var}(x-Y)} here as {\mathbf{Var}(Y)} if desired, though it is not strictly necessary to do so for the proof.) \Box
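The monotonicity in Proposition 5 can be observed numerically in a simple one-dimensional case. The sketch below makes several assumptions purely for illustration: {d=1}, the initial data {\mu_0} is a sum of two unit point masses at {\pm 1}, and {Q_p(t)} is approximated by a Riemann sum on a truncated interval; the names `u`, `Q`, `half_width` and `steps` are hypothetical.

```python
from math import exp, pi

CENTRES = (-1.0, 1.0)   # mu_0 = delta_{-1} + delta_{+1} (illustrative choice)

def u(t, x):
    """Heat extension of mu_0 in dimension d = 1."""
    return sum(exp(-(x - y) ** 2 / (4 * t)) for y in CENTRES) / (4 * pi * t) ** 0.5

def Q(p, t, half_width=60.0, steps=24000):
    """Riemann-sum approximation of t^{(p-1)/2} * int u(t,x)^p dx."""
    dx = 2 * half_width / steps
    total = sum(u(t, -half_width + i * dx) ** p for i in range(steps + 1))
    return t ** (0.5 * (p - 1)) * total * dx

times = (0.25, 1.0, 4.0)
for p in (0.5, 1.0, 2.0):
    print(p, [round(Q(p, t), 4) for t in times])
```

In this example the printed rows should decrease in {t} for {p=0.5}, stay at the total mass {2} for {p=1}, and increase for {p=2}, consistent with the sign of {p-1} in (14).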

Remark 6 As with Remark 4, one can also establish the identity (14) first for natural numbers {p} by direct computation avoiding the theory of fractional dimensional integrals, and then extrapolate to the case of more general values of {p}. This particular identity is also simple enough that it can be directly established by integration by parts without much difficulty, even for fractional values of {p}.

A more complicated version of this argument establishes the non-endpoint multilinear Kakeya inequality (without any logarithmic loss in a scale parameter {R}); this was established in my previous paper with Jon Bennett and Tony Carbery, but using the “natural number {p} first” approach rather than using the current formalism of fractional dimensional integration. However, the arguments can be translated into this formalism without much difficulty; we do so below the fold. (To simplify the exposition slightly we will not address issues of establishing enough regularity and integrability to justify all the manipulations, though in practice this can be done by standard limiting arguments.)


The following situation is very common in modern harmonic analysis: one has a large scale parameter {N} (sometimes written as {N=1/\delta} in the literature for some small scale parameter {\delta}, or as {N=R} for some large radius {R}), which ranges over some unbounded subset of {[1,+\infty)} (e.g. all sufficiently large real numbers {N}, or all powers of two), and one has some positive quantity {D(N)} depending on {N} that is known to be of polynomial size in the sense that

\displaystyle  C^{-1} N^{-C} \leq D(N) \leq C N^C \ \ \ \ \ (1)

for all {N} in the range and some constant {C>0}, and one wishes to obtain a subpolynomial upper bound for {D(N)}, by which we mean an upper bound of the form

\displaystyle  D(N) \leq C_\varepsilon N^\varepsilon \ \ \ \ \ (2)

for all {\varepsilon>0} and all {N} in the range, where {C_\varepsilon>0} can depend on {\varepsilon} but is independent of {N}. In many applications, this bound is nearly tight in the sense that one can easily establish a matching lower bound

\displaystyle  D(N) \geq C_\varepsilon N^{-\varepsilon}

in which case the property of having a subpolynomial upper bound is equivalent to that of being subpolynomial size in the sense that

\displaystyle  C_\varepsilon N^{-\varepsilon} \leq D(N) \leq C_\varepsilon N^\varepsilon \ \ \ \ \ (3)

for all {\varepsilon>0} and all {N} in the range. It would naturally be of interest to tighten these bounds further, for instance to show that {D(N)} is polylogarithmic or even bounded in size, but a subpolynomial bound is already sufficient for many applications.

Let us give some illustrative examples of this type of problem:

Example 1 (Kakeya conjecture) Here {N} ranges over all of {[1,+\infty)}. Let {d \geq 2} be a fixed dimension. For each {N \geq 1}, we pick a maximal {1/N}-separated set of directions {\Omega_N \subset S^{d-1}}. We let {D(N)} be the smallest constant for which one has the Kakeya inequality

\displaystyle  \| \sum_{\omega \in \Omega_N} 1_{T_\omega} \|_{L^{\frac{d}{d-1}}({\bf R}^d)} \leq D(N),

where {T_\omega} is a {1/N \times 1}-tube oriented in the direction {\omega}. The Kakeya maximal function conjecture is then equivalent to the assertion that {D(N)} has a subpolynomial upper bound (or equivalently, is of subpolynomial size). Currently this is only known in dimension {d=2}.

Example 2 (Restriction conjecture for the sphere) Here {N} ranges over all of {[1,+\infty)}. Let {d \geq 2} be a fixed dimension. We let {D(N)} be the smallest constant for which one has the restriction inequality

\displaystyle  \| \widehat{fd\sigma} \|_{L^{\frac{2d}{d-1}}(B(0,N))} \leq D(N) \| f \|_{L^\infty(S^{d-1})}

for all bounded measurable functions {f} on the unit sphere {S^{d-1}} equipped with surface measure {d\sigma}, where {B(0,N)} is the ball of radius {N} centred at the origin. The restriction conjecture of Stein for the sphere is then equivalent to the assertion that {D(N)} has a subpolynomial upper bound (or equivalently, is of subpolynomial size). Currently this is only known in dimension {d=2}.

Example 3 (Multilinear Kakeya inequality) Again {N} ranges over all of {[1,+\infty)}. Let {d \geq 2} be a fixed dimension, and let {S_1,\dots,S_d} be compact subsets of the sphere {S^{d-1}} which are transverse in the sense that there is a uniform lower bound {|\omega_1 \wedge \dots \wedge \omega_d| \geq c > 0} for the wedge product of directions {\omega_i \in S_i} for {i=1,\dots,d} (equivalently, there is no hyperplane through the origin that intersects all of the {S_i}). For each {N \geq 1}, we let {D(N)} be the smallest constant for which one has the multilinear Kakeya inequality

\displaystyle  \| \mathrm{geom} \sum_{T \in {\mathcal T}_i} 1_{T} \|_{L^{\frac{d}{d-1}}(B(0,N))} \leq D(N) \mathrm{geom} \# {\mathcal T}_i,

where for each {i=1,\dots,d}, {{\mathcal T}_i} is a collection of infinite tubes in {{\bf R}^d} of radius {1} oriented in a direction in {S_i}, which are separated in the sense that for any two tubes {T,T'} in {{\mathcal T}_i}, either the directions of {T,T'} differ by an angle of at least {1/N}, or {T,T'} are disjoint; and {\mathrm{geom} = \mathrm{geom}_{1 \leq i \leq d}} is our notation for the geometric mean

\displaystyle  \mathrm{geom} a_i := (a_1 \dots a_d)^{1/d}.

The multilinear Kakeya inequality of Bennett, Carbery, and myself establishes that {D(N)} is of subpolynomial size; a later argument of Guth improves this further by showing that {D(N)} is bounded (and in fact comparable to {1}).

Example 4 (Multilinear restriction theorem) Once again {N} ranges over all of {[1,+\infty)}. Let {d \geq 2} be a fixed dimension, and let {S_1,\dots,S_d} be compact subsets of the sphere {S^{d-1}} which are transverse as in the previous example. For each {N \geq 1}, we let {D(N)} be the smallest constant for which one has the multilinear restriction inequality

\displaystyle  \| \mathrm{geom} \widehat{f_id\sigma} \|_{L^{\frac{2d}{d-1}}(B(0,N))} \leq D(N) \mathrm{geom} \| f_i \|_{L^2(S^{d-1})}

for all bounded measurable functions {f_i} on {S_i} for {i=1,\dots,d}. Then the multilinear restriction theorem of Bennett, Carbery, and myself establishes that {D(N)} is of subpolynomial size; it is known to be bounded for {d=2} (as can be easily verified from Plancherel’s theorem), but it remains open whether it is bounded for any {d>2}.

Example 5 (Decoupling for the paraboloid) {N} now ranges over the square numbers. Let {d \geq 2}, and subdivide the unit cube {[0,1]^{d-1}} into {N^{(d-1)/2}} cubes {Q} of sidelength {1/N^{1/2}}. For any {g \in L^1([0,1]^{d-1})}, define the extension operators

\displaystyle  E_{[0,1]^{d-1}} g( x', x_d ) := \int_{[0,1]^{d-1}} e^{2\pi i (x' \cdot \xi + x_d |\xi|^2)} g(\xi)\ d\xi

and

\displaystyle  E_Q g( x', x_d ) := \int_{Q} e^{2\pi i (x' \cdot \xi + x_d |\xi|^2)} g(\xi)\ d\xi

for {x' \in {\bf R}^{d-1}} and {x_d \in {\bf R}}. We also introduce the weight function

\displaystyle  w_{B(0,N)}(x) := (1 + \frac{|x|}{N})^{-100d}.

For any {p}, let {D_p(N)} be the smallest constant for which one has the decoupling inequality

\displaystyle  \| E_{[0,1]^{d-1}} g \|_{L^p(w_{B(0,N)})} \leq D_p(N) (\sum_Q \| E_Q g \|_{L^p(w_{B(0,N)})}^2)^{1/2}.

The decoupling theorem of Bourgain and Demeter asserts that {D_p(N)} is of subpolynomial size for all {p} in the optimal range {2 \leq p \leq \frac{2(d+1)}{d-1}}.

Example 6 (Decoupling for the moment curve) {N} now ranges over the natural numbers. Let {d \geq 2}, and subdivide {[0,1]} into {N} intervals {J} of length {1/N}. For any {g \in L^1([0,1])}, define the extension operators

\displaystyle  E_{[0,1]} g(x_1,\dots,x_d) = \int_{[0,1]} e^{2\pi i ( x_1 \xi + x_2 \xi^2 + \dots + x_d \xi^d)} g(\xi)\ d\xi

and more generally

\displaystyle  E_J g(x_1,\dots,x_d) = \int_{J} e^{2\pi i ( x_1 \xi + x_2 \xi^2 + \dots + x_d \xi^d)} g(\xi)\ d\xi

for {(x_1,\dots,x_d) \in {\bf R}^d}. For any {p}, let {D_p(N)} be the smallest constant for which one has the decoupling inequality

\displaystyle  \| E_{[0,1]} g \|_{L^p(w_{B(0,N^d)})} \leq D_p(N) (\sum_J \| E_J g \|_{L^p(w_{B(0,N^d)})}^2)^{1/2}.

It was shown by Bourgain, Demeter, and Guth that {D_p(N)} is of subpolynomial size for all {p} in the optimal range {2 \leq p \leq d(d+1)}, which among other things implies the Vinogradov main conjecture (as discussed in this previous post).

It is convenient to use asymptotic notation to express these estimates. We write {X \lesssim Y}, {X = O(Y)}, or {Y \gtrsim X} to denote the inequality {|X| \leq CY} for some constant {C} independent of the scale parameter {N}, and write {X \sim Y} for {X \lesssim Y \lesssim X}. We write {X = o(Y)} to denote a bound of the form {|X| \leq c(N) Y} where {c(N) \rightarrow 0} as {N \rightarrow \infty} along the given range of {N}. We then write {X \lessapprox Y} for {X \lesssim N^{o(1)} Y}, and {X \approx Y} for {X \lessapprox Y \lessapprox X}. Then the statement that {D(N)} is of polynomial size can be written as

\displaystyle  D(N) \sim N^{O(1)},

while the statement that {D(N)} has a subpolynomial upper bound can be written as

\displaystyle  D(N) \lessapprox 1

and similarly the statement that {D(N)} is of subpolynomial size is simply

\displaystyle  D(N) \approx 1.
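To make the distinction concrete, note that {D(N) \lessapprox 1} is the same as saying that {\log D(N) / \log N} has non-positive limit superior. The following minimal Python sketch tabulates this ratio for two hypothetical test quantities (chosen purely for illustration, and not taken from the examples above): {N^{0.1}}, which is of polynomial but not subpolynomial size, and {\exp(\sqrt{\log N})}, which is unbounded but still has a subpolynomial upper bound.

```python
import math

def exponent(D, N):
    """Return log D(N) / log N, i.e. the theta for which D(N) = N**theta."""
    return math.log(D(N)) / math.log(N)

# Two hypothetical test quantities (illustrative only).
def polynomial(N):
    return N ** 0.1                            # exponent stays at 0.1

def subpolynomial(N):
    return math.exp(math.sqrt(math.log(N)))    # exponent is 1/sqrt(log N)

scales = [10.0 ** j for j in (2, 4, 8, 16, 32, 64)]
poly_exps = [exponent(polynomial, N) for N in scales]
sub_exps = [exponent(subpolynomial, N) for N in scales]

for N, p, s in zip(scales, poly_exps, sub_exps):
    print(f"N = {N:.0e}: polynomial {p:.4f}, subpolynomial {s:.4f}")
```

The first column of exponents stays fixed while the second decays to zero, albeit slowly; this is the behaviour that the {\lessapprox} notation is designed to capture.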

Many modern approaches to bounding quantities like {D(N)} in harmonic analysis rely on some sort of induction on scales approach in which {D(N)} is bounded using quantities such as {D(N^\theta)} for some exponents {0 < \theta < 1}. For instance, suppose one is somehow able to establish the inequality

\displaystyle  D(N) \lessapprox D(\sqrt{N}) \ \ \ \ \ (4)

for all {N \geq 1}, and suppose that {D} is also known to be of polynomial size. Then this implies that {D} has a subpolynomial upper bound. Indeed, one can iterate this inequality to show that

\displaystyle  D(N) \lessapprox D(N^{1/2^k})

for any fixed {k}; using the polynomial size hypothesis one thus has

\displaystyle  D(N) \lessapprox N^{C/2^k}

for some constant {C} independent of {k}. As {k} can be arbitrarily large, we conclude that {D(N) \lesssim N^\varepsilon} for any {\varepsilon>0}, and hence {D} has a subpolynomial upper bound. (This sort of iteration is used for instance in my paper with Bennett and Carbery to derive the multilinear restriction theorem from the multilinear Kakeya theorem.)

Exercise 7 If {D} is of polynomial size, and obeys the inequality

\displaystyle  D(N) \lessapprox D(N^{1-\varepsilon}) + N^{O(\varepsilon)}

for any fixed {\varepsilon>0}, where the implied constant in the {O(\varepsilon)} notation is independent of {\varepsilon}, show that {D} has a subpolynomial upper bound. This type of inequality is used to equate various linear estimates in harmonic analysis with their multilinear counterparts; see for instance this paper of myself, Vargas, and Vega for an early example of this method.

In more recent years, more sophisticated induction on scales arguments have emerged in which one or more auxiliary quantities besides {D(N)} also come into play. Here is one example, this time being an abstraction of a short proof of the multilinear Kakeya inequality due to Guth. Let {D(N)} be the quantity in Example 3. We define {D(N,M)} similarly to {D(N)} for any {M \geq 1}, except that we now also require that the diameter of each set {S_i} is at most {1/M}. One can then observe the following estimates:

  • (Triangle inequality) For any {N,M \geq 1}, we have

    \displaystyle  D(N,M) = M^{O(1)} D(N). \ \ \ \ \ (5)

  • (Multiplicativity) For any {N_1,N_2 = N^{O(1)}}, one has

    \displaystyle  D(N_1 N_2, M) \lessapprox D(N_1, M) D(N_2, M). \ \ \ \ \ (6)

  • (Loomis-Whitney inequality) We have

    \displaystyle  D(N,N) \lessapprox 1. \ \ \ \ \ (7)

These inequalities now imply that {D} has a subpolynomial upper bound, as we now demonstrate. Let {k} be a large natural number (independent of {N}) to be chosen later. From many iterations of (6) we have

\displaystyle  D(N, N^{1/k}) \lessapprox D(N^{1/k},N^{1/k})^k

and hence by (7) (with {N} replaced by {N^{1/k}}) and (5)

\displaystyle  D(N) \lessapprox N^{O(1/k)}

where the implied constant in the {O(1/k)} exponent does not depend on {k}. As {k} can be arbitrarily large, the claim follows. We remark that a nearly identical scheme lets one deduce decoupling estimates for the three-dimensional cone from that of the two-dimensional paraboloid; see the final section of this paper of Bourgain and Demeter.

Now we give a slightly more sophisticated example, abstracted from the proof of {L^p} decoupling of the paraboloid by Bourgain and Demeter, as described in this study guide after specialising the dimension to {2} and the exponent {p} to the endpoint {p=6} (the argument is also more or less summarised in this previous post). (In the cited papers, the argument was phrased only for the non-endpoint case {p<6}, but it has been observed independently by many experts that the argument extends with only minor modifications to the endpoint {p=6}.) Here we have a quantity {D_p(N)} that we wish to show is of subpolynomial size. For any {0 < \varepsilon < 1} and {0 \leq u \leq 1}, one can define an auxiliary quantity {A_{p,u,\varepsilon}(N)}. The precise definitions of {D_p(N)} and {A_{p,u,\varepsilon}(N)} are given in the study guide (where they are called {\mathrm{Dec}_2(1/N,p)} and {A_p(u, B(0,N^2), u, g)} respectively, setting {\delta = 1/N} and {\nu = \delta^\varepsilon}) but will not be of importance to us for this discussion. Suffice to say that the following estimates are known:

  • (Crude upper bound for {D_p}) {D_p(N)} is of polynomial size: {D_p(N) \sim N^{O(1)}}.
  • (Bilinear reduction, using parabolic rescaling) For any {0 \leq u \leq 1}, one has

    \displaystyle  D_p(N) \lessapprox D_p(N^{1-\varepsilon}) + N^{O(\varepsilon)+O(u)} A_{p,u,\varepsilon}(N). \ \ \ \ \ (8)

  • (Crude upper bound for {A_{p,u,\varepsilon}(N)}) For any {0 \leq u \leq 1} one has

    \displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)+O(u)} D_p(N) \ \ \ \ \ (9)

  • (Application of multilinear Kakeya and {L^2} decoupling) If {\varepsilon, u} are sufficiently small (e.g. both less than {1/4}), then

    \displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{p,2u,\varepsilon}(N)^{1/2} D_p(N^{1-u})^{1/2}. \ \ \ \ \ (10)

In all of these bounds the implied constant exponents such as {O(\varepsilon)} or {O(u)} are independent of {\varepsilon} and {u}, although the implied constants in the {\lessapprox} notation can depend on both {\varepsilon} and {u}. Here we gloss over an annoying technicality in that quantities such as {N^{1-\varepsilon}}, {N^{1-u}}, or {N^u} might not be an integer (and might not divide evenly into {N}), which is needed for the application to decoupling theorems; this can be resolved by restricting the scales involved to powers of two and restricting the values of {\varepsilon, u} to certain rational values, which introduces some complications to the later arguments below which we shall simply ignore as they do not significantly affect the numerology.

It turns out that these estimates imply that {D_p(N)} is of subpolynomial size. We give the argument as follows. As {D_p(N)} is known to be of polynomial size, we have some {\eta>0} for which we have the bound

\displaystyle  D_p(N) \lessapprox N^\eta \ \ \ \ \ (11)

for all {N}. We can pick {\eta} to be the minimal exponent for which this bound is attained: thus

\displaystyle  \eta = \limsup_{N \rightarrow \infty} \frac{\log D_p(N)}{\log N}. \ \ \ \ \ (12)

We will call this the upper exponent of {D_p(N)}. We need to show that {\eta \leq 0}. We assume for contradiction that {\eta > 0}. Let {\varepsilon>0} be a sufficiently small quantity depending on {\eta} to be chosen later. From (10) and (11) we then have

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{p,2u,\varepsilon}(N)^{1/2} N^{\eta (\frac{1}{2} - \frac{u}{2})}

for any sufficiently small {u}. A routine iteration then gives

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{p,2^k u,\varepsilon}(N)^{1/2^k} N^{\eta (1 - \frac{1}{2^k} - k\frac{u}{2})}

for any {k \geq 1} that is independent of {N}, if {u} is sufficiently small depending on {k}. A key point here is that the implied constant in the exponent {O(\varepsilon)} is uniform in {k} (the constant comes from summing a convergent geometric series). We now use the crude bound (9) followed by (11) and conclude that

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{\eta (1 - k\frac{u}{2}) + O(\varepsilon) + O(u)}.

Applying (8) we then have

\displaystyle  D_p(N) \lessapprox N^{\eta(1-\varepsilon)} + N^{\eta (1 - k\frac{u}{2}) + O(\varepsilon) + O(u)}.

If we choose {k} sufficiently large depending on {\eta} (which was assumed to be positive), then the negative term {-\eta k \frac{u}{2}} will dominate the {O(u)} term. If we then pick {u} sufficiently small depending on {k}, then finally {\varepsilon} sufficiently small depending on all previous quantities, we will obtain {D_p(N) \lessapprox N^{\eta'}} for some {\eta'} strictly less than {\eta}, contradicting the definition of {\eta}. Thus {\eta} cannot be positive, and hence {D_p(N)} has a subpolynomial upper bound as required.
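The order in which the parameters are chosen here ({k} first, then {u}, then {\varepsilon}) can be illustrated by the following Python sketch in exact rational arithmetic. It is only a sketch under stated assumptions: the constants `C_u` and `C_eps` are hypothetical stand-ins for the unspecified implied constants in the {O(u)} and {O(\varepsilon)} exponents, and the requirement {2^k u \leq 1/4} is an assumed proxy for "{u} sufficiently small depending on {k}".

```python
import math
from fractions import Fraction

def choose_parameters(eta, C_u=Fraction(10), C_eps=Fraction(10)):
    """Pick k, then u, then eps, and return them with the improved exponent.

    C_u and C_eps are hypothetical values for the implied constants in the
    O(u) and O(eps) exponents; the true constants are not specified.
    """
    eta = Fraction(eta)
    # k large depending on eta: eta*k/2 then exceeds C_u by at least eta/2.
    k = math.floor(2 * C_u / eta) + 2
    # u small depending on k (assumed proxy: 2^k u <= 1/4).
    u = Fraction(1, 4 * 2 ** k)
    # eps small depending on all previous quantities.
    eps = eta * u / (4 * C_eps)
    first = eta * (1 - eps)                                 # from D_p(N^{1-eps})
    second = eta * (1 - k * u / 2) + C_eps * eps + C_u * u  # from the A term
    return k, u, eps, max(first, second)

k, u, eps, eta_new = choose_parameters(Fraction(1, 2))
print(k, float(u), float(eps), eta_new < Fraction(1, 2))
```

With these choices the second exponent is at most {\eta - \eta u/4}, so the returned exponent is strictly smaller than {\eta} no matter how small the (positive) starting value of {\eta} is.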

Exercise 8 Show that one still obtains a subpolynomial upper bound if the estimate (10) is replaced with

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{p,2u,\varepsilon}(N)^{1-\theta} D_p(N)^{\theta}

for some constant {0 \leq \theta < 1/2}, so long as we also improve (9) to

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} D_p(N^{1-u}).

(This variant of the argument lets one handle the non-endpoint cases {2 < p < 6} of the decoupling theorem for the paraboloid.)

To establish decoupling estimates for the moment curve, restricting to the endpoint case {p = d(d+1)} for sake of discussion, an even more sophisticated induction on scales argument was deployed by Bourgain, Demeter, and Guth. The proof is discussed in this previous blog post, but let us just describe an abstract version of the induction on scales argument. To bound the quantity {D_p(N) = D_{d(d+1)}(N)}, some auxiliary quantities {A_{t,q,s,\varepsilon}(N)} are introduced for various exponents {1 \leq t \leq \infty} and {0 \leq q,s \leq 1} and {\varepsilon>0}, with the following bounds:

  • (Crude upper bound for {D}) {D_p(N)} is of polynomial size: {D_p(N) \sim N^{O(1)}}.
  • (Multilinear reduction, using non-isotropic rescaling) For any {0 \leq q,s \leq 1} and {1 \leq t \leq \infty}, one has

    \displaystyle  D_p(N) \lessapprox D_p(N^{1-\varepsilon}) + N^{O(\varepsilon)+O(q)+O(s)} A_{t,q,s,\varepsilon}(N). \ \ \ \ \ (13)

  • (Crude upper bound for {A_{t,q,s,\varepsilon}(N)}) For any {0 \leq q,s \leq 1} and {1 \leq t \leq \infty} one has

    \displaystyle  A_{t,q,s,\varepsilon}(N) \lessapprox N^{O(\varepsilon)+O(q)+O(s)} D_p(N) \ \ \ \ \ (14)

  • (Hölder) For {0 \leq q, s \leq 1} and {1 \leq t_0 \leq t_1 \leq \infty} one has

    \displaystyle  A_{t_0,q,s,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{t_1,q,s,\varepsilon}(N) \ \ \ \ \ (15)

    and also

    \displaystyle  A_{t_\theta,q,s,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{t_0,q,s,\varepsilon}(N)^{1-\theta} A_{t_1,q,s,\varepsilon}(N)^\theta \ \ \ \ \ (16)

    whenever {0 \leq \theta \leq 1}, where {\frac{1}{t_\theta} = \frac{1-\theta}{t_0} + \frac{\theta}{t_1}}.

  • (Rescaled decoupling hypothesis) For {0 \leq q,s \leq 1}, one has

    \displaystyle  A_{p,q,s,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} D_p(N^{1-q}). \ \ \ \ \ (17)

  • (Lower dimensional decoupling) If {1 \leq k \leq d-1} and {q \leq s/k}, then

    \displaystyle  A_{k(k+1),q,s,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{k(k+1),s/k,s,\varepsilon}(N). \ \ \ \ \ (18)

  • (Multilinear Kakeya) If {1 \leq k \leq d-1} and {0 \leq q \leq 1}, then

    \displaystyle  A_{kp/d,q,kq,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{kp/d,q,(k+1)q,\varepsilon}(N). \ \ \ \ \ (19)

It is now substantially less obvious that these estimates can be combined to demonstrate that {D_p(N)} is of subpolynomial size; nevertheless this can be done. A somewhat complicated arrangement of the argument (involving some rather unmotivated choices of expressions to induct over) appears in my previous blog post; I give an alternate proof later in this post.

These examples indicate a general strategy to establish that some quantity {D(N)} is of subpolynomial size, by

  • (i) Introducing some family of related auxiliary quantities, often parameterised by several further parameters;
  • (ii) establishing as many bounds between these quantities and the original quantity {D(N)} as possible; and then
  • (iii) appealing to some sort of “induction on scales” to conclude.

The first two steps (i), (ii) depend very much on the harmonic analysis nature of the quantities {D(N)} and the related auxiliary quantities, and the estimates in (ii) will typically be proven from various harmonic analysis inputs such as Hölder’s inequality, rescaling arguments, decoupling estimates, or Kakeya type estimates. The final step (iii) requires no knowledge of where these quantities come from in harmonic analysis, but the iterations involved can become extremely complicated.

In this post I would like to observe that one can clean up and make more systematic this final step (iii) by passing to upper exponents (12) to eliminate the role of the parameter {N} (and also “tropicalising” all the estimates), and then taking similar limit superiors to eliminate some other less important parameters, until one is left with a simple linear programming problem (which, among other things, could be amenable to computer-assisted proving techniques). This method is analogous to that of passing to a simpler asymptotic limit object in many other areas of mathematics (for instance using the Furstenberg correspondence principle to pass from a combinatorial problem to an ergodic theory problem, as discussed in this previous post). We use the limit superior exclusively in this post, but many of the arguments here would also apply with one of the other generalised limit functionals discussed in this previous post, such as ultrafilter limits.

For instance, if {\eta} is the upper exponent of a quantity {D(N)} of polynomial size obeying (4), then by comparing the upper exponents of both sides of (4) one arrives at the scalar inequality

\displaystyle  \eta \leq \frac{1}{2} \eta

from which it is immediate that {\eta \leq 0}, giving the required subpolynomial upper bound. Notice how the passage to upper exponents converts the {\lessapprox} estimate to a simpler inequality {\leq}.

Exercise 9 Repeat Exercise 7 using this method.

Similarly, given the quantities {D(N,M)} obeying the axioms (5), (6), (7), and assuming that {D(N)} is of polynomial size (which is easily verified for the application at hand), we see that for any real numbers {a, u \geq 0}, the quantity {D(N^a,N^u)} is also of polynomial size and hence has some upper exponent {\eta(a,u)}; meanwhile {D(N)} itself has some upper exponent {\eta}. By reparameterising we have the homogeneity

\displaystyle  \eta(\lambda a, \lambda u) = \lambda \eta(a,u)

for any {\lambda \geq 0}. Also, comparing the upper exponents of both sides of the axioms (5), (6), (7) we arrive at the inequalities

\displaystyle  \eta(1,u) = \eta + O(u)

\displaystyle  \eta(a_1+a_2,u) \leq \eta(a_1,u) + \eta(a_2,u)

\displaystyle  \eta(1,1) \leq 0.

For any natural number {k}, the third inequality combined with homogeneity gives {\eta(1/k,1/k) \leq 0}, which when combined with the second inequality gives {\eta(1,1/k) \leq k \eta(1/k,1/k) \leq 0}, which on combination with the first estimate gives {\eta \leq O(1/k)}. Sending {k} to infinity we obtain {\eta \leq 0} as required.

Now suppose that {D_p(N)}, {A_{p,u,\varepsilon}(N)} obey the axioms (8), (9), (10). For any fixed {u,\varepsilon}, the quantity {A_{p,u,\varepsilon}(N)} is of polynomial size (thanks to (9) and the polynomial size of {D_p}), and hence has some upper exponent {\eta(u,\varepsilon)}; similarly {D_p(N)} has some upper exponent {\eta}. (Actually, strictly speaking our axioms only give an upper bound on {A_{p,u,\varepsilon}} so we have to temporarily admit the possibility that {\eta(u,\varepsilon)=-\infty}, though this will soon be eliminated anyway.) Taking upper exponents of all the axioms we then conclude that

\displaystyle  \eta \leq \max( (1-\varepsilon) \eta, \eta(u,\varepsilon) + O(\varepsilon) + O(u) ) \ \ \ \ \ (20)

\displaystyle  \eta(u,\varepsilon) \leq \eta + O(\varepsilon) + O(u)

\displaystyle  \eta(u,\varepsilon) \leq \frac{1}{2} \eta(2u,\varepsilon) + \frac{1}{2} \eta (1-u) + O(\varepsilon)

for all {0 \leq u \leq 1} and {0 \leq \varepsilon \leq 1}.

Assume for contradiction that {\eta>0}, then {(1-\varepsilon) \eta < \eta}, and so the statement (20) simplifies to

\displaystyle  \eta \leq \eta(u,\varepsilon) + O(\varepsilon) + O(u).

At this point we can eliminate the role of {\varepsilon} and simplify the system by taking a second limit superior. If we write

\displaystyle  \eta(u) := \limsup_{\varepsilon \rightarrow 0} \eta(u,\varepsilon)

then on taking limit superiors of the previous inequalities we conclude that

\displaystyle  \eta(u) \leq \eta + O(u)

\displaystyle  \eta(u) \leq \frac{1}{2} \eta(2u) + \frac{1}{2} \eta (1-u) \ \ \ \ \ (21)

\displaystyle  \eta \leq \eta(u) + O(u)

for all {u}; in particular {\eta(u) = \eta + O(u)}. We take advantage of this by taking a further limit superior (or “upper derivative”) in the limit {u \rightarrow 0} to eliminate the role of {u} and simplify the system further. If we define

\displaystyle  \alpha := \limsup_{u \rightarrow 0^+} \frac{\eta(u)-\eta}{u},

so that {\alpha} is the best constant for which {\eta(u) \leq \eta + \alpha u + o(u)} as {u \rightarrow 0}, then {\alpha} is finite, and by inserting this “Taylor expansion” into the right-hand side of (21) we conclude that

\displaystyle  \alpha \leq \alpha - \frac{1}{2} \eta.

This leads to a contradiction when {\eta>0}, and hence {\eta \leq 0} as desired.

Exercise 10 Redo Exercise 8 using this method.

The same strategy now clarifies how to proceed with the more complicated system of quantities {A_{t,q,s,\varepsilon}(N)} obeying the axioms (13)-(19) with {D_p(N)} of polynomial size. Let {\eta} be the exponent of {D_p(N)}. From (14) we see that for fixed {t,q,s,\varepsilon}, each {A_{t,q,s,\varepsilon}(N)} is also of polynomial size (at least in upper bound) and so has some exponent {a( t,q,s,\varepsilon)} (which for now we can permit to be {-\infty}). Taking upper exponents of all the various axioms we can now eliminate {N} and arrive at the simpler axioms

\displaystyle  \eta \leq \max( (1-\varepsilon) \eta, a(t,q,s,\varepsilon) + O(\varepsilon) + O(q) + O(s) )

\displaystyle  a(t,q,s,\varepsilon) \leq \eta + O(\varepsilon) + O(q) + O(s)

\displaystyle  a(t_0,q,s,\varepsilon) \leq a(t_1,q,s,\varepsilon) + O(\varepsilon)

\displaystyle  a(t_\theta,q,s,\varepsilon) \leq (1-\theta) a(t_0,q,s,\varepsilon) + \theta a(t_1,q,s,\varepsilon) + O(\varepsilon)

\displaystyle  a(d(d+1),q,s,\varepsilon) \leq \eta(1-q) + O(\varepsilon)

for all {0 \leq q,s \leq 1}, {1 \leq t \leq \infty}, {1 \leq t_0 \leq t_1 \leq \infty} and {0 \leq \theta \leq 1}, with the lower dimensional decoupling inequality

\displaystyle  a(k(k+1),q,s,\varepsilon) \leq a(k(k+1),s/k,s,\varepsilon) + O(\varepsilon)

for {1 \leq k \leq d-1} and {q \leq s/k}, and the multilinear Kakeya inequality

\displaystyle  a(k(d+1),q,kq,\varepsilon) \leq a(k(d+1),q,(k+1)q,\varepsilon) + O(\varepsilon)

for {1 \leq k \leq d-1} and {0 \leq q \leq 1}.

As before, if we assume for sake of contradiction that {\eta>0} then the first inequality simplifies to

\displaystyle  \eta \leq a(t,q,s,\varepsilon) + O(\varepsilon) + O(q) + O(s).

We can then again eliminate the role of {\varepsilon} by taking a second limit superior as {\varepsilon \rightarrow 0}, introducing

\displaystyle  a(t,q,s) := \limsup_{\varepsilon \rightarrow 0} a(t,q,s,\varepsilon)

and thus getting the simplified axiom system

\displaystyle  a(t,q,s) \leq \eta + O(q) + O(s) \ \ \ \ \ (22)

\displaystyle  a(t_0,q,s) \leq a(t_1,q,s)

\displaystyle  a(t_\theta,q,s) \leq (1-\theta) a(t_0,q,s) + \theta a(t_1,q,s)

\displaystyle  a(d(d+1),q,s) \leq \eta(1-q)

\displaystyle  \eta \leq a(t,q,s) + O(q) + O(s) \ \ \ \ \ (23)

and also

\displaystyle  a(k(k+1),q,s) \leq a(k(k+1),s/k,s)

for {1 \leq k \leq d-1} and {q \leq s/k}, and

\displaystyle  a(k(d+1),q,kq) \leq a(k(d+1),q,(k+1)q)

for {1 \leq k \leq d-1} and {0 \leq q \leq 1}.

In view of the latter two estimates it is natural to restrict attention to the quantities {a(t,q,kq)} for {1 \leq k \leq d}. By the axioms (22), (23), these quantities are of the form {\eta + O(q)}. We can then eliminate the role of {q} by taking another limit superior

\displaystyle  \alpha_k(t) := \limsup_{q \rightarrow 0} \frac{a(t,q,kq)-\eta}{q}.

The axioms now simplify to

\displaystyle  \alpha_k(t) = O(1)

\displaystyle  \alpha_k(t_0) \leq \alpha_k(t_1) \ \ \ \ \ (24)

\displaystyle  \alpha_k(t_\theta) \leq (1-\theta) \alpha_k(t_0) + \theta \alpha_k(t_1) \ \ \ \ \ (25)

\displaystyle  \alpha_k(d(d+1)) \leq -\eta \ \ \ \ \ (26)

and

\displaystyle  \alpha_j(k(k+1)) \leq \frac{j}{k} \alpha_k(k(k+1)) \ \ \ \ \ (27)

for {1 \leq k \leq d-1} and {k \leq j \leq d}, and

\displaystyle  \alpha_k(k(d+1)) \leq \alpha_{k+1}(k(d+1)) \ \ \ \ \ (28)

for {1 \leq k \leq d-1}.

It turns out that the inequality (27) is strongest when {j=k+1}, thus

\displaystyle  \alpha_{k+1}(k(k+1)) \leq \frac{k+1}{k} \alpha_k(k(k+1)) \ \ \ \ \ (29)

for {1 \leq k \leq d-1}.

From the last two inequalities (28), (29) we see that a special role is likely to be played by the exponents

\displaystyle  \beta_k := \alpha_k(k(k-1))

for {2 \leq k \leq d} and

\displaystyle \gamma_k := \alpha_k(k(d+1))

for {1 \leq k \leq d}. From the convexity (25) and a brief calculation we have

\displaystyle  \alpha_{k+1}(k(d+1)) \leq \frac{1}{d-k+1} \alpha_{k+1}(k(k+1))

\displaystyle + \frac{d-k}{d-k+1} \alpha_{k+1}((k+1)(d+1)),

for {1 \leq k \leq d-1}, hence from (28) we have

\displaystyle  \gamma_k \leq \frac{1}{d-k+1} \beta_{k+1} + \frac{d-k}{d-k+1} \gamma_{k+1}. \ \ \ \ \ (30)

Similarly, from (25) and a brief calculation we have

\displaystyle  \alpha_k(k(k+1)) \leq \frac{(d-k)(k-1)}{(k+1)(d-k+2)} \alpha_k( k(k-1))

\displaystyle  + \frac{2(d+1)}{(k+1)(d-k+2)} \alpha_k(k(d+1))

for {2 \leq k \leq d-1}; the same bound holds for {k=1} if we drop the term with the {(k-1)} factor, thanks to (24). Thus from (29) we have

\displaystyle  \beta_{k+1} \leq \frac{(d-k)(k-1)}{k(d-k+2)} \beta_k + \frac{2(d+1)}{k(d-k+2)} \gamma_k, \ \ \ \ \ (31)

for {1 \leq k \leq d-1}, again with the understanding that we omit the first term on the right-hand side when {k=1}. Finally, (26) gives

\displaystyle  \gamma_d \leq -\eta.
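The two "brief calculations" behind (30) and (31) amount to solving {\frac{1}{t_\theta} = \frac{1-\theta}{t_0} + \frac{\theta}{t_1}} for {\theta} at the relevant exponents. As a sanity check, the following Python sketch (the function names are my own, introduced only for illustration) confirms in exact rational arithmetic that the interpolation weights are the ones claimed.

```python
from fractions import Fraction

def interpolation_weight(t0, t1, t):
    """Return theta with 1/t = (1 - theta)/t0 + theta/t1."""
    return (Fraction(1, t0) - Fraction(1, t)) / (Fraction(1, t0) - Fraction(1, t1))

def check_weights(d):
    """Check the convexity weights used for (30) and (31) in dimension d."""
    for k in range(1, d):
        # (30): interpolate t = k(d+1) between k(k+1) and (k+1)(d+1).
        theta = interpolation_weight(k * (k + 1), (k + 1) * (d + 1), k * (d + 1))
        assert theta == Fraction(d - k, d - k + 1)
        assert 1 - theta == Fraction(1, d - k + 1)
    for k in range(2, d):
        # (31): interpolate t = k(k+1) between k(k-1) and k(d+1).
        theta = interpolation_weight(k * (k - 1), k * (d + 1), k * (k + 1))
        assert theta == Fraction(2 * (d + 1), (k + 1) * (d - k + 2))
        assert 1 - theta == Fraction((d - k) * (k - 1), (k + 1) * (d - k + 2))
    return True

print(all(check_weights(d) for d in range(2, 12)))
```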

Let us write out the system of inequalities we have obtained in full:

\displaystyle  \beta_2 \leq 2 \gamma_1 \ \ \ \ \ (32)

\displaystyle  \gamma_1 \leq \frac{1}{d} \beta_2 + \frac{d-1}{d} \gamma_2 \ \ \ \ \ (33)

\displaystyle  \beta_3 \leq \frac{d-2}{2d} \beta_2 + \frac{2(d+1)}{2d} \gamma_2 \ \ \ \ \ (34)

\displaystyle  \gamma_2 \leq \frac{1}{d-1} \beta_3 + \frac{d-2}{d-1} \gamma_3 \ \ \ \ \ (35)

\displaystyle  \beta_4 \leq \frac{2(d-3)}{3(d-1)} \beta_3 + \frac{2(d+1)}{3(d-1)} \gamma_3

\displaystyle  \gamma_3 \leq \frac{1}{d-2} \beta_4 + \frac{d-3}{d-2} \gamma_4

\displaystyle  ...

\displaystyle  \beta_d \leq \frac{d-2}{(d-1) 3} \beta_{d-1} + \frac{2(d+1)}{(d-1) 3} \gamma_{d-1}

\displaystyle  \gamma_{d-1} \leq \frac{1}{2} \beta_d + \frac{1}{2} \gamma_d \ \ \ \ \ (36)

\displaystyle  \gamma_d \leq -\eta. \ \ \ \ \ (37)

We can then eliminate the variables one by one. Inserting (33) into (32) we obtain

\displaystyle  \beta_2 \leq \frac{2}{d} \beta_2 + \frac{2(d-1)}{d} \gamma_2

which simplifies to

\displaystyle  \beta_2 \leq \frac{2(d-1)}{d-2} \gamma_2.

Inserting this into (34) gives

\displaystyle  \beta_3 \leq 2 \gamma_2

which when combined with (35) gives

\displaystyle  \beta_3 \leq \frac{2}{d-1} \beta_3 + \frac{2(d-2)}{d-1} \gamma_3

which simplifies to

\displaystyle  \beta_3 \leq \frac{2(d-2)}{d-3} \gamma_3.

Iterating this we get

\displaystyle  \beta_{k+1} \leq 2 \gamma_k

for all {1 \leq k \leq d-1} and

\displaystyle  \beta_k \leq \frac{2(d-k+1)}{d-k} \gamma_k

for all {2 \leq k \leq d-1}. In particular

\displaystyle  \beta_d \leq 2 \gamma_{d-1}

which on insertion into (36), (37) gives

\displaystyle  \beta_d \leq \beta_d - \eta

which is absurd if {\eta>0}. Thus {\eta \leq 0} and so {D_p(N)} must be of subpolynomial growth.
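The elimination above can be mechanised. The Python sketch below (an illustration only, with helper names of my own choosing) carries out the same substitutions using the coefficients of (30) and (31), tracking the constants {m_k} and {c_k} for which {\beta_{k+1} \leq m_k \gamma_k} and {\beta_k \leq c_k \gamma_k}, in exact rational arithmetic.

```python
from fractions import Fraction

def a(d, k):
    """Coefficient of beta_k in (31); it vanishes when k = 1."""
    return Fraction((d - k) * (k - 1), k * (d - k + 2))

def b(d, k):
    """Coefficient of gamma_k in (31)."""
    return Fraction(2 * (d + 1), k * (d - k + 2))

def p(d, k):
    """Coefficient of beta_{k+1} in (30)."""
    return Fraction(1, d - k + 1)

def q(d, k):
    """Coefficient of gamma_{k+1} in (30)."""
    return Fraction(d - k, d - k + 1)

def eliminate(d):
    """Return dicts m, c with beta_{k+1} <= m[k] gamma_k, beta_k <= c[k] gamma_k."""
    m = {1: b(d, 1)}                      # this is (32)
    c = {}
    for k in range(1, d - 1):
        # Insert (30) into beta_{k+1} <= m[k] gamma_k and solve for beta_{k+1}.
        slack = 1 - m[k] * p(d, k)
        assert slack > 0                  # needed to divide without flipping signs
        c[k + 1] = m[k] * q(d, k) / slack
        # Insert the result into (31) with k replaced by k+1.
        m[k + 1] = a(d, k + 1) * c[k + 1] + b(d, k + 1)
    return m, c

m, c = eliminate(6)
print(m)
print(c)
```

For each {d} one can then confirm that every {m_k} equals {2} and that {c_k = \frac{2(d-k+1)}{d-k}}, matching the displayed bounds.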

Remark 11 (This observation is essentially due to Heath-Brown.) If we let {x} denote the column vector with entries {\beta_2,\dots,\beta_d,\gamma_1,\dots,\gamma_{d-1}} (arranged in whatever order one pleases), then the above system of inequalities (32)-(36) (using (37) to handle the appearance of {\gamma_d} in (36)) reads

\displaystyle  x \leq Px + \eta v \ \ \ \ \ (38)

for some explicit square matrix {P} with non-negative coefficients, where the inequality denotes pointwise domination, and {v} is an explicit vector with non-positive coefficients that reflects the effect of (37). It is possible to show (using (24), (26)) that all the coefficients of {x} are negative (assuming the counterfactual situation {\eta>0} of course). Then we can iterate this to obtain

\displaystyle  x \leq P^k x + \eta \sum_{j=0}^{k-1} P^j v

for any natural number {k}. This would lead to an immediate contradiction if the Perron-Frobenius eigenvalue of {P} exceeds {1} because {P^k x} would now grow exponentially; this is typically the situation for “non-endpoint” applications such as proving decoupling inequalities away from the endpoint. In the endpoint situation discussed above, the Perron-Frobenius eigenvalue is {1}, with {v} having a non-trivial projection to this eigenspace, so the sum {\sum_{j=0}^{k-1} \eta P^j v} now grows at least linearly, which still gives the required contradiction for any {\eta>0}. So it is important to gather “enough” inequalities so that the relevant matrix {P} has a Perron-Frobenius eigenvalue greater than or equal to {1} (and in the latter case one needs non-trivial injection of an induction hypothesis into an eigenspace corresponding to an eigenvalue {1}). More specifically, if {\rho} is the spectral radius of {P} and {w^T} is a left Perron-Frobenius eigenvector, that is to say a non-negative vector, not identically zero, such that {w^T P = \rho w^T}, then by taking inner products of (38) with {w} we obtain

\displaystyle  w^T x \leq \rho w^T x + \eta w^T v.

If {\rho > 1} this leads to a contradiction since {w^T x} is negative and {w^T v} is non-positive. When {\rho = 1} one still gets a contradiction as long as {w^T v} is strictly negative.

Remark 12 (This calculation is essentially due to Guo and Zorin-Kranich.) Here is a concrete application of the Perron-Frobenius strategy outlined above to the system of inequalities (32)-(37). Consider the weighted sum

\displaystyle  W := \sum_{k=2}^d (k-1) \beta_k + \sum_{k=1}^{d-1} 2k \gamma_k;

I had secretly calculated the weights {k-1}, {2k} as coming from the left Perron-Frobenius eigenvector of the matrix {P} described in the previous remark, but for this calculation the precise provenance of the weights is not relevant. Applying the inequalities (31), (30) we see that {W} is bounded by

\displaystyle  \sum_{k=2}^d (k-1) (\frac{(d-k+1)(k-2)}{(k-1)(d-k+3)} \beta_{k-1} + \frac{2(d+1)}{(k-1)(d-k+3)} \gamma_{k-1})

\displaystyle  + \sum_{k=1}^{d-1} 2k(\frac{1}{d-k+1} \beta_{k+1} + \frac{d-k}{d-k+1} \gamma_{k+1})

(with the convention that the {\beta_1} term is absent); this simplifies after some calculation to the bound

\displaystyle  W \leq W + (d-1) \gamma_d

and this and (37) then leads to the required contradiction.
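The "some calculation" in this remark is a matter of collecting coefficients, which is easy to automate. The following Python sketch (with a hypothetical helper name) expands the bound for {W} coming from (31) and (30) and records the total coefficient attached to each {\beta_k} and {\gamma_k}, so that the leftover multiple of {\gamma_d} can be read off.

```python
from collections import defaultdict
from fractions import Fraction

def weighted_sum_bound(d):
    """Coefficients of the upper bound for W obtained from (31) and (30)."""
    coeff = defaultdict(Fraction)
    for k in range(2, d + 1):
        # (k-1) beta_k, bounded via (31) with k replaced by j = k-1.
        j = k - 1
        denom = j * (d - j + 2)
        if j >= 2:  # the beta_1 term is absent by convention
            coeff["beta", j] += (k - 1) * Fraction((d - j) * (j - 1), denom)
        coeff["gamma", j] += (k - 1) * Fraction(2 * (d + 1), denom)
    for k in range(1, d):
        # 2k gamma_k, bounded via (30).
        coeff["beta", k + 1] += 2 * k * Fraction(1, d - k + 1)
        coeff["gamma", k + 1] += 2 * k * Fraction(d - k, d - k + 1)
    return dict(coeff)

bound = weighted_sum_bound(5)
for key in sorted(bound):
    print(key, bound[key])
```

In each dimension tried, the coefficients of {\beta_k} and {\gamma_k} for {k < d} (and of {\beta_d}) reproduce the weights {k-1} and {2k} defining {W}, and the only extra term is {\gamma_d} with coefficient {d-1}.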

Exercise 13

  • (i) Extend the above analysis to also cover the non-endpoint case {d^2 < p < d(d+1)}. (One will need to establish the claim {\alpha_k(t) \leq -\eta} for {t \leq p}.)
  • (ii) Modify the argument to deal with the remaining cases {2 < p \leq d^2} by dropping some of the steps.

I was recently asked to contribute a short comment to Nature Reviews Physics, as part of a series of articles on fluid dynamics on the occasion of the 200th anniversary (this August) of the birthday of George Stokes. My contribution is now online as “Searching for singularities in the Navier–Stokes equations”, where I discuss the global regularity problem for Navier-Stokes and my thoughts on how one could try to construct a solution that blows up in finite time via an approximately discretely self-similar “fluid computer”. (The rest of the series does not currently seem to be available online, but I expect they will become so shortly.)

 

Given three points {A,B,C} in the plane, the distances {|AB|, |BC|, |AC|} between them have to be non-negative and obey the triangle inequalities

\displaystyle  |AB| \leq |BC| + |AC|, |BC| \leq |AC| + |AB|, |AC| \leq |AB| + |BC|

but are otherwise unconstrained. But if one has four points {A,B,C,D} in the plane, then there is an additional constraint connecting the six distances {|AB|, |AC|, |AD|, |BC|, |BD|, |CD|} between them, coming from the Cayley-Menger determinant:

Proposition 1 (Cayley-Menger determinant) If {A,B,C,D} are four points in the plane, then the Cayley-Menger determinant

\displaystyle  \mathrm{det} \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & |AB|^2 & |AC|^2 & |AD|^2 \\ 1 & |AB|^2 & 0 & |BC|^2 & |BD|^2 \\ 1 & |AC|^2 & |BC|^2 & 0 & |CD|^2 \\ 1 & |AD|^2 & |BD|^2 & |CD|^2 & 0 \end{pmatrix} \ \ \ \ \ (1)

vanishes.

Proof: If we view {A,B,C,D} as vectors in {{\bf R}^2}, then we have the usual cosine rule {|AB|^2 = |A|^2 + |B|^2 - 2 A \cdot B}, and similarly for all the other distances. The {5 \times 5} matrix appearing in (1) can then be written as {M+M^T-2\tilde G}, where {M} is the matrix

\displaystyle  M := \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \end{pmatrix}

and {\tilde G} is the (augmented) Gram matrix

\displaystyle  \tilde G := \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & A \cdot A & A \cdot B & A \cdot C & A \cdot D \\ 0 & B \cdot A & B \cdot B & B \cdot C & B \cdot D \\ 0 & C \cdot A & C \cdot B & C \cdot C & C \cdot D \\ 0 & D \cdot A & D \cdot B & D \cdot C & D \cdot D \end{pmatrix}.

The matrix {M} is a rank one matrix, and so {M^T} is also. The Gram matrix {\tilde G} factorises as {\tilde G = \tilde \Sigma \tilde \Sigma^T}, where {\tilde \Sigma} is the {5 \times 2} matrix with rows {0,A,B,C,D}, and thus has rank at most {2}. Therefore the matrix {M+M^T-2\tilde G} in (1) has rank at most {1+1+2=4}, and hence has determinant zero as claimed. \Box

For instance, if we know that {|AB|=|AC|=|DB|=|DC|=1} and {|BC|=\sqrt{2}}, then in order for {A,B,C,D} to be coplanar, the remaining distance {|AD|} has to obey the equation

\displaystyle  \mathrm{det} \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & |AD|^2 \\ 1 & 1 & 0 & 2 & 1 \\ 1 & 1 & 2 & 0 & 1 \\ 1 & |AD|^2 & 1 & 1 & 0 \end{pmatrix} = 0.

After some calculation the left-hand side simplifies to {-4 |AD|^4 + 8 |AD|^2}, so the non-negative quantity {|AD|} is constrained to equal either {0} or {\sqrt{2}}. The former happens when {A,B,C} form a unit right-angled triangle with right angle at {A} and {D=A}; the latter happens when {A,B,D,C} form the vertices of a unit square traversed in that order. Any other value for {|AD|} is not compatible with the hypothesis that {A,B,C,D} lie in a plane; hence the Cayley-Menger determinant can be used as a test for planarity.
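Both Proposition 1 and this example are easy to test by machine. Here is a minimal Python sketch (the helper functions are my own, and the sample points are arbitrary choices) that evaluates the Cayley-Menger determinant exactly over the rationals, first for four planar points with integer coordinates and then for the matrix in the example as a function of {x = |AD|^2}.

```python
from fractions import Fraction

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)            # no pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign                  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def cayley_menger(sq):
    """Bordered determinant (1); sq[i][j] is the squared distance from i to j."""
    n = len(sq)
    return det([[0] + [1] * n] + [[1] + list(row) for row in sq])

def squared_distances(points):
    return [[sum((p - q) ** 2 for p, q in zip(P, Q)) for Q in points] for P in points]

def example(x):
    """The determinant from the example, with x = |AD|^2 and order A, B, C, D."""
    return cayley_menger([[0, 1, 1, x], [1, 0, 2, 1], [1, 2, 0, 1], [x, 1, 1, 0]])

planar = [(0, 0), (3, 1), (-2, 5), (7, -4)]   # arbitrary points in the plane
print(cayley_menger(squared_distances(planar)))
print([example(x) for x in (0, 1, 2)])
```

The first determinant vanishes, as Proposition 1 predicts, while `example(x)` agrees with {-4x^2 + 8x} and so vanishes only at {x=0} and {x=2}.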

Now suppose that we have four points {A,B,C,D} on a sphere {S_R} of radius {R}, with six distances {|AB|_R, |AC|_R, |AD|_R, |BC|_R, |BD|_R, |CD|_R} now measured as lengths of arcs on the sphere. There is a spherical analogue of the Cayley-Menger determinant:

Proposition 2 (Spherical Cayley-Menger determinant) If {A,B,C,D} are four points on a sphere {S_R} of radius {R} in {{\bf R}^3}, then the spherical Cayley-Menger determinant

\displaystyle  \mathrm{det} \begin{pmatrix} 1 & \cos \frac{|AB|_R}{R} & \cos \frac{|AC|_R}{R} & \cos \frac{|AD|_R}{R} \\ \cos \frac{|AB|_R}{R} & 1 & \cos \frac{|BC|_R}{R} & \cos \frac{|BD|_R}{R} \\ \cos \frac{|AC|_R}{R} & \cos \frac{|BC|_R}{R} & 1 & \cos \frac{|CD|_R}{R} \\ \cos \frac{|AD|_R}{R} & \cos \frac{|BD|_R}{R} & \cos \frac{|CD|_R}{R} & 1 \end{pmatrix} \ \ \ \ \ (2)

vanishes.

Proof: We can assume that the sphere {S_R} is centred at the origin of {{\bf R}^3}, and view {A,B,C,D} as vectors in {{\bf R}^3} of magnitude {R}. The angle subtended by {AB} from the origin is {|AB|_R/R}, so by the cosine rule we have

\displaystyle  A \cdot B = R^{2} \cos \frac{|AB|_R}{R}.

Similarly for all the other inner products. Thus the matrix in (2) can be written as {R^{-2} G}, where {G} is the Gram matrix

\displaystyle  G := \begin{pmatrix} A \cdot A & A \cdot B & A \cdot C & A \cdot D \\ B \cdot A & B \cdot B & B \cdot C & B \cdot D \\ C \cdot A & C \cdot B & C \cdot C & C \cdot D \\ D \cdot A & D \cdot B & D \cdot C & D \cdot D \end{pmatrix}.

We can factor {G = \Sigma \Sigma^T} where {\Sigma} is the {4 \times 3} matrix with rows {A,B,C,D}. Thus {R^{-2} G} has rank at most {3} and thus the determinant vanishes as required. \Box

Just as the Cayley-Menger determinant can be used to test for coplanarity, the spherical Cayley-Menger determinant can be used to test for lying on a sphere of radius {R}. For instance, if we know that {A,B,C,D} lie on {S_R} and {|AB|_R, |AC|_R, |BC|_R, |BD|_R, |CD|_R} are all equal to {\pi R/2}, then the above proposition gives

\displaystyle  \mathrm{det} \begin{pmatrix} 1 & 0 & 0 & \cos \frac{|AD|_R}{R} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \cos \frac{|AD|_R}{R} & 0 & 0 & 1 \end{pmatrix} = 0.

The left-hand side evaluates to {1 - \cos^2 \frac{|AD|_R}{R}}; as {|AD|_R} lies between {0} and {\pi R}, the only choices for this distance are then {0} and {\pi R}. The former happens for instance when {A} lies at the north pole {(R,0,0)}, {B = (0,R,0), C = (0,0,R)} are points on the equator with longitudes differing by 90 degrees, and {D=(R,0,0)} is also equal to the north pole; the latter occurs when {D=(-R,0,0)} is instead placed at the south pole.
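Proposition 2 and the example above can be checked numerically. The following is a minimal sketch; the helper names `det`, `spherical_cm` and `random_point` are hypothetical, and the arc lengths are recovered from coordinates purely for the purpose of the test.

```python
import math
import random

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def spherical_cm(points, R):
    """Matrix (2): entries cos(arc length / R) for points on the sphere S_R."""
    def arc(P, Q):
        c = sum(p * q for p, q in zip(P, Q)) / R ** 2
        return R * math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding
    return [[math.cos(arc(P, Q) / R) for Q in points] for P in points]

def random_point(R, rng):
    """A point on S_R obtained by normalising a Gaussian vector."""
    v = [rng.gauss(0, 1) for _ in range(3)]
    norm = math.sqrt(sum(x * x for x in v))
    return [R * x / norm for x in v]

rng = random.Random(0)
det_random = det(spherical_cm([random_point(2.5, rng) for _ in range(4)], 2.5))

# The worked example: A at the pole, B and C on the equator, D equal or antipodal to A.
R = 1.0
A, B, C = (R, 0, 0), (0, R, 0), (0, 0, R)
det_same = det(spherical_cm([A, B, C, A], R))
det_antipodal = det(spherical_cm([A, B, C, (-R, 0, 0)], R))

print(det_random, det_same, det_antipodal)  # all zero up to rounding error
```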

The Cayley-Menger and spherical Cayley-Menger determinants look slightly different from each other, but one can transform the latter into something resembling the former by row and column operations. Indeed, the determinant (2) can be rewritten as

\displaystyle  \mathrm{det} \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 - \cos \frac{|AB|_R}{R} & 1 - \cos \frac{|AC|_R}{R} & 1 - \cos \frac{|AD|_R}{R} \\ 1 & 1-\cos \frac{|AB|_R}{R} & 0 & 1-\cos \frac{|BC|_R}{R} & 1- \cos \frac{|BD|_R}{R} \\ 1 & 1-\cos \frac{|AC|_R}{R} & 1-\cos \frac{|BC|_R}{R} & 0 & 1-\cos \frac{|CD|_R}{R} \\ 1 & 1-\cos \frac{|AD|_R}{R} & 1-\cos \frac{|BD|_R}{R} & 1- \cos \frac{|CD|_R}{R} & 0 \end{pmatrix}

and by further row and column operations, this determinant vanishes if and only if the determinant

\displaystyle  \mathrm{det} \begin{pmatrix} \frac{1}{2R^2} & 1 & 1 & 1 & 1 \\ 1 & 0 & f_R(|AB|_R) & f_R(|AC|_R) & f_R(|AD|_R) \\ 1 & f_R(|AB|_R) & 0 & f_R(|BC|_R) & f_R(|BD|_R) \\ 1 & f_R(|AC|_R) & f_R(|BC|_R) & 0 & f_R(|CD|_R) \\ 1 & f_R(|AD|_R) & f_R(|BD|_R) & f_R(|CD|_R) & 0 \end{pmatrix} \ \ \ \ \ (3)

vanishes, where {f_R(x) := 2R^2 (1-\cos \frac{x}{R})}. In the limit {R \rightarrow \infty} (so that the curvature of the sphere {S_R} tends to zero), {|AB|_R} tends to {|AB|}, and by Taylor expansion {f_R(|AB|_R)} tends to {|AB|^2}; similarly for the other distances. Now we see that the planar Cayley-Menger determinant emerges as the limit of (3) as {R \rightarrow \infty}, as would be expected from the intuition that a plane is essentially a sphere of infinite radius.

In principle, one can now estimate the radius {R} of the Earth (assuming that it is either a sphere {S_R} or a flat plane {S_\infty}) if one is given the six distances {|AB|_R, |AC|_R, |AD|_R, |BC|_R, |BD|_R, |CD|_R} between four points {A,B,C,D} on the Earth. Of course, if one wishes to do so, one should have {A,B,C,D} rather far apart from each other, since otherwise it would be difficult to distinguish, for instance, the round Earth from a flat one. As an experiment, and just for fun, I wanted to see how accurate this would be with some real world data. I decided to take {A}, {B}, {C}, {D} to be the cities of London, Los Angeles, Tokyo, and Dubai respectively. As an initial test, I used distances from this online flight calculator, measured in kilometers:

\displaystyle  |AB|_R = 8790 \mathrm{km}

\displaystyle  |AC|_R = 9597 \mathrm{km}

\displaystyle  |AD|_R = 5488\mathrm{km}

\displaystyle  |BC|_R = 8849\mathrm{km}

\displaystyle  |BD|_R = 13435\mathrm{km}

\displaystyle  |CD|_R = 7957\mathrm{km}.

Given that the true radius of the Earth is about {R_0 := 6371 \mathrm{km}}, I chose the change of variables {R = R_0/k} (so that {k=1} corresponds to the round Earth model with the commonly accepted value for the Earth’s radius, and {k=0} corresponds to the flat Earth), and obtained the following plot for (3):

In particular, the determinant does indeed come very close to vanishing when {k=1}, which is unsurprising since, as explained on the web site, the online flight calculator uses a model in which the Earth is an ellipsoid of radii close to {6371} km. There is another radius that would also be compatible with this data at {k\approx 1.3} (corresponding to an Earth of radius about {4900} km), but presumably one could rule out this as a spurious coincidence by experimenting with other quadruples of cities than the ones I selected. On the other hand, these distances are highly incompatible with the flat Earth model {k=0}; one could also see this with a piece of paper and a ruler by trying to lay down four points {A,B,C,D} on the paper with (an appropriately rescaled) version of the above distances (e.g., with {|AB| = 8.790 \mathrm{cm}}, {|AC| = 9.597 \mathrm{cm}}, etc.).
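For readers who want to reproduce this kind of curve, here is a minimal sketch of how the determinant (3) can be evaluated as a function of {k}. The function names (`det`, `matrix3`, `exact_distances`) and the synthetic test points are my own assumptions rather than the code used for the plot; the matrix is rescaled to units in which {R_0 = 1}, which multiplies the determinant by a positive constant and so does not move its zeroes.

```python
import itertools
import math

R0 = 6371.0  # km; the reference radius used for the normalisation R = R0 / k

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 5 x 5)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def matrix3(dist, k):
    """Matrix (3) with R = R0 / k, in units where R0 = 1.

    k = 0 is read as the flat limit, in which f_R(x) becomes x^2.
    """
    def f(x_km):
        x = x_km / R0
        return x * x if k == 0 else (2 / k ** 2) * (1 - math.cos(k * x))
    rows = [[k * k / 2, 1, 1, 1, 1]]
    for P in "ABCD":
        rows.append([1] + [0 if P == Q else f(dist[frozenset(P + Q)]) for Q in "ABCD"])
    return rows

def exact_distances(unit_vectors):
    """Great-circle distances (km) between four points on a sphere of radius R0."""
    out = {}
    for (p, P), (q, Q) in itertools.combinations(list(zip("ABCD", unit_vectors)), 2):
        c = sum(a * b for a, b in zip(P, Q))
        out[frozenset(p + q)] = R0 * math.acos(max(-1.0, min(1.0, c)))
    return out

# Sanity check on synthetic data: exact distances on a sphere of radius R0
# should make the determinant vanish at k = 1 (up to rounding).
synthetic = exact_distances([(1, 0, 0), (0, 1, 0), (0.6, 0, 0.8), (0.48, 0.6, 0.64)])
print("synthetic, k = 1:", det(matrix3(synthetic, 1.0)))

# The six distances quoted above (A = London, B = Los Angeles, C = Tokyo, D = Dubai).
flight = {frozenset("AB"): 8790, frozenset("AC"): 9597, frozenset("AD"): 5488,
          frozenset("BC"): 8849, frozenset("BD"): 13435, frozenset("CD"): 7957}
for i in range(9):
    k = 0.2 * i
    print(f"k = {k:.1f}: det = {det(matrix3(flight, k)):+.6f}")
```

Scanning the printed values for sign changes (or refining with a root finder) gives the candidate values of {k}, and hence of {R = R_0/k}.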

If instead one goes to the flight time calculator and uses flight travel times instead of distances, one now gets the following data (measured in hours):

\displaystyle  |AB|_R = 10\mathrm{h}\ 3\mathrm{m}

\displaystyle  |AC|_R = 11\mathrm{h}\ 49\mathrm{m}

\displaystyle  |AD|_R = 6\mathrm{h}\ 56\mathrm{m}

\displaystyle  |BC|_R = 10\mathrm{h}\ 56\mathrm{m}

\displaystyle  |BD|_R = 16\mathrm{h}\ 23\mathrm{m}

\displaystyle  |CD|_R = 9\mathrm{h}\ 52\mathrm{m}.

Assuming that planes travel at about {800} kilometers per hour, the true radius of the Earth should be about {R_1 := 8\mathrm{h}} of flight time. If one then uses the normalisation {R = R_1/k}, one obtains the following plot:

Not too surprisingly, this is basically a rescaled version of the previous plot, with vanishing near {k=1} and at {k=1.3}. (The website for the flight calculator does say it calculates short and long haul flight times slightly differently, which may be the cause of the slight discrepancies between this figure and the previous one.)

Of course, these two data sets are “cheating” since they come from a model which already presupposes what the radius of the Earth is. But one can input real world flight times between these four cities instead of the above idealised data. Here one runs into the issue that the flight time from {A} to {B} is not necessarily the same as that from {B} to {A} due to such factors as windspeed. For instance, I looked up the online flight time from Tokyo to Dubai to be 11 hours and 10 minutes, whereas the online flight time from Dubai to Tokyo was 9 hours and 50 minutes. The simplest thing to do here is take an arithmetic mean of the two times as a preliminary estimate for the flight time without windspeed factors, thus for instance the Tokyo-Dubai flight time would now be 10 hours and 30 minutes, and more generally

\displaystyle  |AB|_R = 10\mathrm{h}\ 47\mathrm{m}

\displaystyle  |AC|_R = 12\mathrm{h}\ 0\mathrm{m}

\displaystyle  |AD|_R = 7\mathrm{h}\ 17\mathrm{m}

\displaystyle  |BC|_R = 10\mathrm{h}\ 50\mathrm{m}

\displaystyle  |BD|_R = 15\mathrm{h}\ 55\mathrm{m}

\displaystyle  |CD|_R = 10\mathrm{h}\ 30\mathrm{m}.

This data is not too far off from the online calculator data, but it does distort the graph slightly (taking {R=8/k} as before):

Now one gets estimates for the radius of the Earth that are off by about a factor of {2} from the truth, although the {k=1} round Earth model still is twice as accurate as the flat Earth model {k=0}.

Given that windspeed should additively affect flight velocity rather than flight time, and the two are inversely proportional to each other, it is more natural to take a harmonic mean rather than an arithmetic mean. This gives the slightly different values

\displaystyle  |AB|_R = 10\mathrm{h}\ 51\mathrm{m}

\displaystyle  |AC|_R = 11\mathrm{h}\ 59\mathrm{m}

\displaystyle  |AD|_R = 7\mathrm{h}\ 16\mathrm{m}

\displaystyle  |BC|_R = 10\mathrm{h}\ 46\mathrm{m}

\displaystyle  |BD|_R = 15\mathrm{h}\ 54\mathrm{m}

\displaystyle  |CD|_R = 10\mathrm{h}\ 27\mathrm{m}

but one still gets essentially the same plot:

So the inaccuracies are presumably coming from some other source. (Note for instance that the true flight time from Tokyo to Dubai is about {6\%} greater than the calculator predicts, while the flight time from LA to Dubai is about {3\%} less; these sorts of errors seem to pile up in this calculation.) Nevertheless, it does seem that flight time data is (barely) enough to establish the roundness of the Earth and obtain a somewhat ballpark estimate for its radius. (I assume that the fit would be better if one could include some Southern Hemisphere cities such as Sydney or Santiago, but I was not able to find a good quadruple of widely spaced cities on both hemispheres for which there were direct flights between all six pairs.)
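The two averaging rules are easy to compare on the Tokyo-Dubai pair quoted above. If the distance is {d}, the airspeed {v} and the wind {w}, the two flight times are {d/(v+w)} and {d/(v-w)}, and their harmonic mean is exactly {d/v}, which is why it is the more natural choice. A minimal sketch (the helper names `to_minutes` and `fmt` are purely illustrative):

```python
def to_minutes(hours, minutes):
    return 60 * hours + minutes

def fmt(total_minutes):
    """Format a duration in minutes as 'XhYYm', rounded to the nearest minute."""
    h, m = divmod(round(total_minutes), 60)
    return f"{h}h {m:02d}m"

outbound = to_minutes(11, 10)   # Tokyo to Dubai
inbound = to_minutes(9, 50)     # Dubai to Tokyo

arithmetic = (outbound + inbound) / 2
harmonic = 2 * outbound * inbound / (outbound + inbound)

print(fmt(arithmetic))  # 10h 30m
print(fmt(harmonic))    # 10h 27m
```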

This is another sequel to a recent post in which I showed the Riemann zeta function {\zeta} can be locally approximated by a polynomial, in the sense that for randomly chosen {t \in [T,2T]} one has an approximation

\displaystyle  \zeta(\frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (1)

where {N} grows slowly with {T}, and {P_t} is a polynomial of degree {N}. It turns out that in the function field setting there is an exact version of this approximation which captures many of the known features of the Riemann zeta function, namely Dirichlet {L}-functions for a random character of given modulus over a function field. This model was (essentially) studied in a fairly recent paper by Andrade, Miller, Pratt, and Trinh; I am not sure if there is any further literature on this model beyond this paper (though the number field analogue of low-lying zeroes of Dirichlet {L}-functions is certainly well studied). In this model it is possible to set {N} fixed and let {T} go to infinity, thus providing a simple finite-dimensional model problem for problems involving the statistics of zeroes of the zeta function.

In this post I would like to record this analogue precisely. We will need a finite field {{\mathbb F}} of some order {q} and a natural number {N}, and set

\displaystyle  T := q^{N+1}.

We will primarily think of {q} as being large and {N} as being either fixed or growing very slowly with {q}, though it is possible to also consider other asymptotic regimes (such as holding {q} fixed and letting {N} go to infinity). Let {{\mathbb F}[X]} be the ring of polynomials of one variable {X} with coefficients in {{\mathbb F}}, and let {{\mathbb F}[X]'} be the multiplicative semigroup of monic polynomials in {{\mathbb F}[X]}; one should view {{\mathbb F}[X]} and {{\mathbb F}[X]'} as the function field analogues of the integers and natural numbers respectively. We use the valuation {|n| := q^{\mathrm{deg}(n)}} for polynomials {n \in {\mathbb F}[X]} (with {|0|=0}); this is the analogue of the usual absolute value on the integers. We select an irreducible polynomial {Q \in {\mathbb F}[X]} of size {|Q|=T} (i.e., {Q} has degree {N+1}). The multiplicative group {({\mathbb F}[X]/Q{\mathbb F}[X])^\times} can be shown to be cyclic of order {|Q|-1=T-1}. A Dirichlet character of modulus {Q} is a completely multiplicative function {\chi: {\mathbb F}[X] \rightarrow {\bf C}} that is periodic of period {Q} and vanishes on those {n \in {\mathbb F}[X]} not coprime to {Q}. From Fourier analysis we see that there are exactly {\phi(Q) := |Q|-1} Dirichlet characters of modulus {Q}. A Dirichlet character is said to be odd if it is not identically one on the group {{\mathbb F}^\times} of non-zero constants; there are only {\frac{1}{q-1} \phi(Q)} non-odd characters (including the principal character), so in the limit {q \rightarrow \infty} most Dirichlet characters are odd. We will work primarily with odd characters in order to be able to ignore the effect of the place at infinity.

Let {\chi} be an odd Dirichlet character of modulus {Q}. The Dirichlet {L}-function {L(s, \chi)} is then defined (for {s \in {\bf C}} of sufficiently large real part, at least) as

\displaystyle  L(s,\chi) := \sum_{n \in {\mathbb F}[X]'} \frac{\chi(n)}{|n|^s}

\displaystyle  = \sum_{m=0}^\infty q^{-sm} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n).

Note that for {m \geq N+1}, the set {n \in {\mathbb F}[X]': |n| = q^m} is invariant under shifts {h} whenever {|h| < T}; since this covers a full set of residue classes of {{\mathbb F}[X]/Q{\mathbb F}[X]}, and the odd character {\chi} has mean zero on this set of residue classes, we conclude that the sum {\sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n)} vanishes for {m \geq N+1}. In particular, the {L}-function is entire, and for any real number {t} and complex number {z}, we can write the {L}-function as a polynomial

\displaystyle  L(\frac{1}{2} + it - \frac{2\pi i z}{\log T},\chi) = P(Z) = P_{t,\chi}(Z) := \sum_{m=0}^N c^1_m(t,\chi) Z^m

where {Z := e(z/N) = e^{2\pi i z/N}} and the coefficients {c^1_m = c^1_m(t,\chi)} are given by the formula

\displaystyle  c^1_m(t,\chi) := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n).

Note that {t} can easily be normalised to zero by the relation

\displaystyle  P_{t,\chi}(Z) = P_{0,\chi}( q^{-it} Z ). \ \ \ \ \ (2)

In particular, the dependence on {t} is periodic with period {\frac{2\pi}{\log q}} (so by abuse of notation one could also take {t} to be an element of {{\bf R}/\frac{2\pi}{\log q}{\bf Z}}).

Fourier inversion yields a functional equation for the polynomial {P}:

Proposition 1 (Functional equation) Let {\chi} be an odd Dirichlet character of modulus {Q}, and {t \in {\bf R}}. There exists a phase {e(\theta)} (depending on {t,\chi}) such that

\displaystyle  c^1_{N-m} = e(\theta) \overline{c^1_m}

for all {0 \leq m \leq N}, or equivalently that

\displaystyle  P(1/Z) = e(\theta) Z^{-N} \overline{P}(Z)

where {\overline{P}(Z) := \overline{P(\overline{Z})}}.

Proof: We can normalise {t=0}, and abbreviate {a_m := c^1_m}. Let {G} be the finite field {{\mathbb F}[X] / Q {\mathbb F}[X]}. We can write

\displaystyle  a_{N-m} = q^{-(N-m)/2} \sum_{n \in X^{N-m} + H_{N-m}} \chi(n)

where {H_j} denotes the subgroup of {G} consisting of (residue classes of) polynomials of degree less than {j}. Let {e_G: G \rightarrow S^1} be a non-trivial character of {G} whose kernel contains the space {H_N} (this is easily achieved by pulling back a non-trivial character from the quotient {G/H_N \equiv {\mathbb F}}). We can use the Fourier inversion formula to write

\displaystyle  a_{N-m} = q^{(m-N)/2} \sum_{\xi \in G} \hat \chi(\xi) \sum_{n \in X^{N-m} + H_{N-m}} e_G( n\xi )

where

\displaystyle  \hat \chi(\xi) := q^{-N-1} \sum_{n \in G} \chi(n) e_G(-n\xi).

From change of variables we see that {\hat \chi} is a scalar multiple of {\overline{\chi}}; from Plancherel we conclude that

\displaystyle  \hat \chi = e(\theta_0) q^{-(N+1)/2} \overline{\chi} \ \ \ \ \ (3)

for some phase {e(\theta_0)}. We conclude that

\displaystyle  a_{N-m} = e(\theta_0) q^{-(2N-m+1)/2} \sum_{\xi \in G} \overline{\chi}(\xi) e_G( X^{N-m} \xi) \sum_{n \in H_{N-m}} e_G( n\xi ). \ \ \ \ \ (4)

The inner sum {\sum_{n \in H_{N-m}} e_G( n\xi )} equals {q^{N-m}} if {\xi \in H_{m+1}}, and vanishes otherwise, thus

\displaystyle a_{N-m} = e(\theta_0) q^{-(m+1)/2} \sum_{\xi \in H_{m+1}} \overline{\chi}(\xi) e_G( X^{N-m} \xi).

For {\xi} in {H_m}, {e_G(X^{N-m} \xi)=1} and the contribution of the sum vanishes as {\chi} is odd. Thus we may restrict {\xi} to {H_{m+1} \backslash H_m}, so that

\displaystyle a_{N-m} = e(\theta_0) q^{-(m+1)/2} \sum_{h \in {\mathbb F}^\times} e_G( X^{N} h) \sum_{\xi \in h X^m + H_{m}} \overline{\chi}(\xi).

By the multiplicativity of {\chi}, this factorises as

\displaystyle a_{N-m} = e(\theta_0) q^{-(m+1)/2} (\sum_{h \in {\mathbb F}^\times} \overline{\chi}(h) e_G( X^{N} h)) (\sum_{\xi \in X^m + H_{m}} \overline{\chi}(\xi)).

From the one-dimensional version of (3) (and the fact that {\chi} is odd) we have

\displaystyle  \sum_{h \in {\mathbb F}^\times} \overline{\chi}(h) e_G( X^{N} h) = e(\theta_1) q^{1/2}

for some phase {e(\theta_1)}. The claim follows. \Box

As one corollary of the functional equation, {c^1_N} is a phase rotation of {\overline{c^1_0} = 1} and thus is non-zero, so {P} has degree exactly {N}. The functional equation is then equivalent to the {N} zeroes of {P} being symmetric across the unit circle. In fact we have the stronger

Theorem 2 (Riemann hypothesis for Dirichlet {L}-functions over function fields) Let {\chi} be an odd Dirichlet character of modulus {Q}, and {t \in {\bf R}}. Then all the zeroes of {P} lie on the unit circle.

We derive this result from the Riemann hypothesis for curves over finite fields below the fold.
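These statements can be tested by brute force on a tiny example. The sketch below takes {q=3}, {N=2} and {Q = X^3 + 2X + 1} (which has no roots in {{\mathbb F}_3} and is therefore irreducible), builds one odd character by a discrete logarithm, and checks the vanishing of the character sum in degree {N+1}, the functional equation, and the location of the zeroes of {P} at {t=0}. All function names, and the particular choices of {Q} and of the character, are assumptions made for illustration only.

```python
import cmath
import itertools
import math

p, N = 3, 2          # q = p = 3 and N = 2, so deg Q = N + 1 = 3
Q = (1, 2, 0, 1)     # Q = X^3 + 2X + 1, coefficients listed from low to high degree

def reduce_mod_Q(coeffs):
    """Reduce a coefficient list (low -> high) modulo the monic polynomial Q."""
    c = [x % p for x in coeffs]
    d = len(Q) - 1
    while len(c) > d:
        lead = c.pop()                 # coefficient of the current top power
        shift = len(c) - d             # X^(shift+d) = -X^shift (Q - X^d) mod Q
        for i in range(d):
            c[shift + i] = (c[shift + i] - lead * Q[i]) % p
    return tuple(c + [0] * (d - len(c)))

def mul(a, b):
    """Product of two residue classes modulo Q."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return reduce_mod_Q(prod)

ONE = reduce_mod_Q([1])
UNITS = [u for u in itertools.product(range(p), repeat=N + 1) if any(u)]

def powers(g):
    """The list 1, g, g^2, ... up to (not including) the first return to 1."""
    seq, x = [ONE], g
    while x != ONE:
        seq.append(x)
        x = mul(x, g)
    return seq

# The unit group is cyclic, so some element has order |UNITS| = q^(N+1) - 1.
GEN_POWERS = next(s for s in map(powers, UNITS) if len(s) == len(UNITS))
DLOG = {x: k for k, x in enumerate(GEN_POWERS)}

def chi(n, a=1):
    """Character g^k -> e(a k / (T - 1)); it is odd exactly when a is odd here."""
    r = reduce_mod_Q(list(n))
    if not any(r):
        return 0
    return cmath.exp(2j * math.pi * a * DLOG[r] / len(UNITS))

def char_sum(m, a=1):
    """Sum of chi over monic polynomials of degree m."""
    return sum(chi(low + (1,), a) for low in itertools.product(range(p), repeat=m))

c = [p ** (-m / 2) * char_sum(m) for m in range(N + 1)]    # c^1_m at t = 0
disc = cmath.sqrt(c[1] ** 2 - 4 * c[2] * c[0])
roots = [(-c[1] + disc) / (2 * c[2]), (-c[1] - disc) / (2 * c[2])]

print("chi(2) =", chi((2,)))                       # not 1, so chi is odd
print("degree N+1 sum:", abs(char_sum(N + 1)))     # vanishes up to rounding
print("|c_N| =", abs(c[N]))                        # a phase, by the functional equation
print("|zeroes| =", [abs(z) for z in roots])       # on the unit circle
```

For larger {q} or {N} the brute-force discrete logarithm becomes slow, but the same structure applies.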

In view of this theorem (and the fact that {c^1_0=1}), we may write

\displaystyle  P(Z) = \mathrm{det}(1 - ZU)

for some unitary {N \times N} matrix {U = U_{t,\chi}}. It is possible to interpret {U} as the action of the geometric Frobenius map on a certain cohomology group, but we will not do so here. The situation here is simpler than in the number field case because the factor {\exp(A)} arising from very small primes is now absent (in the function field setting there are no primes of size between {1} and {q}).

We now let {\chi} vary uniformly at random over all odd characters of modulus {Q}, and {t} uniformly over {{\bf R}/\frac{2\pi}{\log q}{\bf Z}}, independently of {\chi}; we also make the distribution of the random variable {U} conjugation invariant in {U(N)}. We use {{\mathbf E}_Q} to denote the expectation with respect to this randomness. One can then ask what the limiting distribution of {U} is in various regimes; we will focus in this post on the regime where {N} is fixed and {q} is being sent to infinity. In the spirit of the Sato-Tate conjecture, one should expect {U} to converge in distribution to the circular unitary ensemble (CUE), that is to say Haar probability measure on {U(N)}. This may well be provable from Deligne’s “Weil II” machinery (in the spirit of this monograph of Katz and Sarnak), though I do not know how feasible this is or whether it has already been done in the literature; here we shall avoid using this machinery and study what partial results towards this CUE hypothesis one can make without it.

If one lets {\lambda_1,\dots,\lambda_N} be the eigenvalues of {U} (ordered arbitrarily), then we now have

\displaystyle  \sum_{m=0}^N c^1_m Z^m = P(Z) = \prod_{j=1}^N (1 - \lambda_j Z)

and hence the {c^1_m} are essentially elementary symmetric polynomials of the eigenvalues:

\displaystyle  c^1_m = (-1)^m e_m( \lambda_1,\dots,\lambda_N). \ \ \ \ \ (5)

One can take log derivatives to conclude

\displaystyle  \frac{P'(Z)}{P(Z)} = -\sum_{j=1}^N \frac{\lambda_j}{1-\lambda_j Z}.

On the other hand, as in the number field case one has the Dirichlet series expansion

\displaystyle  Z \frac{P'(Z)}{P(Z)} = \sum_{n \in {\mathbb F}[X]'} \frac{\Lambda_q(n) \chi(n)}{|n|^s}

where {s = \frac{1}{2} + it - \frac{2\pi i z}{\log T}} has sufficiently large real part, {Z = e(z/N)}, and the von Mangoldt function {\Lambda_q(n)} is defined as {\log_q |p| = \mathrm{deg} p} when {n} is the power of an irreducible {p} and {0} otherwise. We conclude the “explicit formula”

\displaystyle  c^{\Lambda_q}_m = -\sum_{j=1}^N \lambda_j^m = -\mathrm{tr}(U^m) \ \ \ \ \ (6)

for {m \geq 1}, where

\displaystyle  c^{\Lambda_q}_m := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \Lambda_q(n) \chi(n).

Similarly on inverting {P(Z)} we have

\displaystyle  P(Z)^{-1} = \prod_{j=1}^N (1 - \lambda_j Z)^{-1}.

Since we also have

\displaystyle  P(Z)^{-1} = \sum_{n \in {\mathbb F}[X]'} \frac{\mu(n) \chi(n)}{|n|^s}

for {s} of sufficiently large real part, where the Möbius function {\mu(n)} is equal to {(-1)^k} when {n} is the product of {k} distinct irreducibles, and {0} otherwise, we conclude that the Möbius coefficients

\displaystyle  c^\mu_m := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \mu(n) \chi(n)

are just the complete homogeneous symmetric polynomials of the eigenvalues:

\displaystyle  c^\mu_m = h_m( \lambda_1,\dots,\lambda_N). \ \ \ \ \ (7)

One can then derive various algebraic relationships between the coefficients {c^1_m, c^{\Lambda_q}_m, c^\mu_m} from various identities involving symmetric polynomials, but we will not do so here.
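To give a flavour of such relationships, the sketch below checks numerically, for a random choice of unit-modulus eigenvalues, that the coefficients of {\prod_j (1-\lambda_j Z)} are {(-1)^m e_m}, that {\sum_{i=0}^m (-1)^i e_i h_{m-i} = 0} for {m \geq 1} (which expresses {P(Z) P(Z)^{-1} = 1}), and that Newton's identities link {e_m} to the power sums {\mathrm{tr}(U^m)}. The names `e`, `h`, `power_sum` and the sample size {N=4} are assumptions of the example.

```python
import cmath
import itertools
import math
import random

def prod(values):
    out = 1
    for v in values:
        out *= v
    return out

rng = random.Random(0)
N = 4
lam = [cmath.exp(2j * math.pi * rng.random()) for _ in range(N)]  # sample eigenvalues

def e(m):
    """Elementary symmetric polynomial e_m(lam); zero for m > N."""
    return sum(prod(c) for c in itertools.combinations(lam, m))

def h(m):
    """Complete homogeneous symmetric polynomial h_m(lam)."""
    return sum(prod(c) for c in itertools.combinations_with_replacement(lam, m))

def power_sum(m):
    """tr(U^m) = sum_j lam_j^m."""
    return sum(l ** m for l in lam)

# Coefficients of P(Z) = prod_j (1 - lam_j Z), lowest degree first.
P = [1]
for l in lam:
    P = [a - l * b for a, b in zip(P + [0], [0] + P)]

coeff_err = max(abs(P[m] - (-1) ** m * e(m)) for m in range(N + 1))
inverse_err = max(abs(sum((-1) ** i * e(i) * h(m - i) for i in range(m + 1)))
                  for m in range(1, 7))
newton_err = max(abs(m * e(m) - sum((-1) ** (i - 1) * e(m - i) * power_sum(i)
                                    for i in range(1, m + 1)))
                 for m in range(1, 7))
print(coeff_err, inverse_err, newton_err)  # all zero up to rounding error
```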

What do we know about the distribution of {U}? By construction, it is conjugation-invariant; from (2) it is also invariant with respect to the rotations {U \rightarrow e^{i\theta} U} for any phase {\theta \in{\bf R}}. We also have the function field analogue of the Rudnick-Sarnak asymptotics:

Proposition 3 (Rudnick-Sarnak asymptotics) Let {a_1,\dots,a_k,b_1,\dots,b_k} be nonnegative integers. If

\displaystyle  \sum_{j=1}^k j a_j \leq N, \ \ \ \ \ (8)

then the moment

\displaystyle  {\bf E}_{Q} \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (9)

is equal to {o(1)} in the limit {q \rightarrow \infty} (holding {N,a_1,\dots,a_k,b_1,\dots,b_k} fixed) unless {a_j=b_j} for all {j}, in which case it is equal to

\displaystyle  \prod_{j=1}^k j^{a_j} a_j! + o(1). \ \ \ \ \ (10)

Comparing this with Proposition 1 from this previous post, we thus see that all the low moments of {U} are consistent with the CUE hypothesis (and also with the ACUE hypothesis, again by the previous post). The case {\sum_{j=1}^k a_j + \sum_{j=1}^k b_j \leq 2} of this proposition was essentially established by Andrade, Miller, Pratt, and Trinh.

Proof: We may assume the homogeneity relationship

\displaystyle  \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j \ \ \ \ \ (11)

since otherwise the claim follows from the invariance under phase rotation {U \mapsto e^{i\theta} U}. By (6), the expression (9) is equal to

\displaystyle  q^{-D} {\bf E}_Q \sum_{n_1,\dots,n_l,n'_1,\dots,n'_{l'} \in {\mathbb F}[X]': |n_i| = q^{s_i}, |n'_i| = q^{s'_i}} (\prod_{i=1}^l \Lambda_q(n_i) \chi(n_i)) \prod_{i=1}^{l'} \Lambda_q(n'_i) \overline{\chi(n'_i)}

where

\displaystyle  D := \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j

\displaystyle  l := \sum_{j=1}^k a_j

\displaystyle  l' := \sum_{j=1}^k b_j

and {s_1 \leq \dots \leq s_l} consists of {a_j} copies of {j} for each {j=1,\dots,k}, and similarly {s'_1 \leq \dots \leq s'_{l'}} consists of {b_j} copies of {j} for each {j=1,\dots,k}.

The polynomials {n_1 \dots n_l} and {n'_1 \dots n'_{l'}} are monic of degree {D}, which by hypothesis is less than the degree of {Q}, and thus they can only be scalar multiples of each other in {{\mathbb F}[X] / Q {\mathbb F}[X]} if they are identical (in {{\mathbb F}[X]}). As such, we see that the average

\displaystyle  {\bf E}_Q \chi(n_1) \dots \chi(n_l) \overline{\chi(n'_1)} \dots \overline{\chi(n'_{l'})}

vanishes unless {n_1 \dots n_l = n'_1 \dots n'_{l'}}, in which case this average is equal to {1}. Thus the expression (9) simplifies to

\displaystyle  q^{-D} \sum_{n_1,\dots,n_l,n'_1,\dots,n'_{l'}: |n_i| = q^{s_i}, |n'_i| = q^{s'_i}; n_1 \dots n_l = n'_1 \dots n'_{l'}} (\prod_{i=1}^l \Lambda_q(n_i)) \prod_{i=1}^{l'} \Lambda_q(n'_i).

There are at most {q^D} choices for the product {n_1 \dots n_l}, and each one contributes {O_D(1)} to the above sum. All but {o(q^D)} of these choices are square-free, so by accepting an error of {o(1)}, we may restrict attention to square-free {n_1 \dots n_l}. This forces {n_1,\dots,n_l,n'_1,\dots,n'_{l'}} to all be irreducible (as opposed to powers of irreducibles); as {{\mathbb F}[X]} is a unique factorisation domain, this forces {l=l'} and {n_1,\dots,n_l} to be a permutation of {n'_1,\dots,n'_{l'}}. By the size restrictions, this then forces {a_j = b_j} for all {j} (if the above expression is to be anything other than {o(1)}), and each {n_1,\dots,n_l} is associated to {\prod_{j=1}^k a_j!} possible choices of {n'_1,\dots,n'_{l'}}. Writing {\Lambda_q(n'_i) = s'_i} and then reinstating the non-squarefree possibilities for {n_1 \dots n_l}, we can thus write the above expression as

\displaystyle  q^{-D} \prod_{j=1}^k j^{a_j} a_j! \sum_{n_1,\dots,n_l \in {\mathbb F}[X]': |n_i| = q^{s_i}} \prod_{i=1}^l \Lambda_q(n_i) + o(1).

Using the prime number theorem {\sum_{n \in {\mathbb F}[X]': |n| = q^s} \Lambda_q(n) = q^s}, we obtain the claim. \Box

Comparing this with Proposition 1 from this previous post, we thus see that all the low moments of {U} are consistent with the CUE and ACUE hypotheses:

Corollary 4 (CUE statistics at low frequencies) Let {\lambda_1,\dots,\lambda_N} be the eigenvalues of {U}, permuted uniformly at random. Let {R(\lambda)} be a linear combination of monomials {\lambda_1^{a_1} \dots \lambda_N^{a_N}} where {a_1,\dots,a_N} are integers with either {\sum_{j=1}^N a_j \neq 0} or {\sum_{j=1}^N |a_j| \leq 2N}. Then

\displaystyle  {\bf E}_Q R(\lambda) = {\bf E}_{CUE} R(\lambda) + o(1).

The analogue of the GUE hypothesis in this setting would be the CUE hypothesis, which asserts that the threshold {2N} here can be replaced by an arbitrarily large quantity. As far as I know this is not known even for {2N+2} (though, as mentioned previously, in principle one may be able to resolve such cases using Deligne’s proof of the Riemann hypothesis for function fields). Among other things, this would allow one to distinguish CUE from ACUE, since as discussed in the previous post, these two distributions agree when tested against monomials up to threshold {2N}, though not to {2N+2}.

Proof: By permutation symmetry we can take {R} to be symmetric, and by linearity we may then take {R} to be the symmetrisation of a single monomial {\lambda_1^{a_1} \dots \lambda_N^{a_N}}. If {\sum_{j=1}^N a_j \neq 0} then both expectations vanish due to the phase rotation symmetry, so we may assume that {\sum_{j=1}^N a_j = 0} and {\sum_{j=1}^N |a_j| \leq 2N}. We can write this symmetric polynomial as a constant multiple of {\mathrm{tr}(U^{a_1}) \dots \mathrm{tr}(U^{a_N})} plus other monomials with a smaller value of {\sum_{j=1}^N |a_j|}. Since {\mathrm{tr}(U^{-a}) = \overline{\mathrm{tr}(U^a)}}, the claim now follows by induction from Proposition 3 and Proposition 1 from the previous post. \Box

Thus, for instance, for {k=1,2}, the {2k^{th}} moment

\displaystyle {\bf E}_Q |\det(1-U)|^{2k} = {\bf E}_Q |P(1)|^{2k} = {\bf E}_Q |L(\frac{1}{2} + it, \chi)|^{2k}

is equal to

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^{2k} + o(1)

because all the monomials in {\prod_{j=1}^N (1-\lambda_j)^k (1-\lambda_j^{-1})^k} are of the required form when {k \leq 2}. The latter expectation can be computed exactly (for any natural number {k}) using a formula

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^{2k} = \prod_{j=1}^N \frac{\Gamma(j) \Gamma(j+2k)}{\Gamma(j+k)^2}

of Baker-Forrester and Keating-Snaith, thus for instance

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^2 = N+1

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^4 = \frac{(N+1)(N+2)^2(N+3)}{12}

and more generally

\displaystyle  {\bf E}_{CUE}|\det(1-U)|^{2k} = \frac{g_k+o(1)}{(k^2)!} N^{k^2}

when {N \rightarrow \infty}, where {g_k} are the integers

\displaystyle  g_1 = 1, g_2 = 2, g_3 = 42, g_4 = 24024, \dots

and more generally

\displaystyle  g_k := \frac{(k^2)!}{\prod_{i=1}^{2k-1} i^{k-|k-i|}}

(OEIS A039622). Thus we have

\displaystyle {\bf E}_Q |\det(1-U)|^{2k} = \frac{g_k+o(1)}{(k^2)!} N^{k^2}

for {k=1,2} if {q \rightarrow \infty} and {N} is sufficiently slowly growing depending on {q}. The CUE hypothesis would imply that this formula also holds for higher {k}. (The situation here is cleaner than in the number field case, in which the GUE hypothesis only suggests the correct lower bound for the moments rather than an asymptotic, due to the absence of the wildly fluctuating additional factor {\exp(A)} that is present in the Riemann zeta function model.)
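The closed forms above are easy to confirm in exact arithmetic, using {\Gamma(j) = (j-1)!} for positive integers {j}. A minimal sketch, with the hypothetical helper names `cue_moment` and `g`:

```python
from fractions import Fraction
from math import factorial

def cue_moment(N, k):
    """Exact value of prod_{j=1}^N Gamma(j) Gamma(j+2k) / Gamma(j+k)^2 for integer k >= 0."""
    out = Fraction(1)
    for j in range(1, N + 1):
        out *= Fraction(factorial(j - 1) * factorial(j + 2 * k - 1),
                        factorial(j + k - 1) ** 2)
    return out

def g(k):
    """g_k = (k^2)! / prod_{i=1}^{2k-1} i^(k - |k - i|)."""
    denom = 1
    for i in range(1, 2 * k):
        denom *= i ** (k - abs(k - i))
    return Fraction(factorial(k * k), denom)

for N in range(1, 8):
    assert cue_moment(N, 1) == N + 1
    assert cue_moment(N, 2) == Fraction((N + 1) * (N + 2) ** 2 * (N + 3), 12)

print([int(g(k)) for k in range(1, 5)])          # [1, 2, 42, 24024]

# The ratio to the predicted leading term g_k N^(k^2) / (k^2)! tends to 1 as N grows.
for k in (1, 2, 3):
    N = 10 ** 4
    print(k, float(cue_moment(N, k) * factorial(k * k) / (g(k) * N ** (k * k))))
```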

Now we can recover the analogue of Montgomery’s work on the pair correlation conjecture. Consider the statistic

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j )

where

\displaystyle R(z) = \sum_m \hat R(m) z^m

is some finite linear combination of monomials {z^m} independent of {q}. We can expand the above sum as

\displaystyle  \sum_m \hat R(m) {\bf E}_Q \mathrm{tr}(U^m) \mathrm{tr}(U^{-m}).

Assuming the CUE hypothesis, then by Example 3 of the previous post, we would conclude that

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = N^2 \hat R(0) + \sum_m \min(|m|,N) \hat R(m) + o(1). \ \ \ \ \ (12)

This is the analogue of Montgomery’s pair correlation conjecture. Proposition 3 implies that this claim is true whenever {\hat R} is supported on {[-N,N]}. If instead we assume the ACUE hypothesis (or the weaker Alternative Hypothesis that the phase gaps are non-zero multiples of {1/2N}), one should instead have

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = \sum_{k \in {\bf Z}} \left( N^2 \hat R(2Nk) + \sum_{1 \leq |m| \leq N} |m| \hat R(m+2Nk) \right) + o(1)

for arbitrary {R}; this is the function field analogue of a recent result of Baluyot. In any event, since {\mathrm{tr}(U^m) \mathrm{tr}(U^{-m})} is non-negative, we unconditionally have the lower bound

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) \geq N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m) + o(1) \ \ \ \ \ (13)

if {\hat R(m)} is non-negative for {|m| > N}.

By applying (12) for various choices of test functions {R} we can obtain various bounds on the behaviour of eigenvalues. For instance suppose we take the Fejér kernel

\displaystyle  R(z) = |1 + z + \dots + z^N|^2 = \sum_{m=-N}^N (N+1-|m|) z^m.

Then (12) applies unconditionally and we conclude that

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = N^2 (N+1) + \sum_{1 \leq |m| \leq N} (N+1-|m|) |m| + o(1).

The right-hand side evaluates to {\frac{2}{3} N(N+1)(2N+1)+o(1)}. On the other hand, {R(\lambda_i/\lambda_j)} is non-negative, and equal to {(N+1)^2} when {\lambda_i = \lambda_j}. Thus

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} 1_{\lambda_i = \lambda_j} \leq \frac{2}{3} \frac{N(2N+1)}{N+1} + o(1).

The sum {\sum_{1 \leq j \leq N} 1_{\lambda_i = \lambda_j}} is at least {1}, and is at least {2} if {\lambda_i} is not a simple eigenvalue. Thus

\displaystyle  {\bf E}_Q \sum_{1 \leq i \leq N} 1_{\lambda_i \hbox{ not simple}} \leq \frac{1}{3} \frac{N(N-1)}{N+1} + o(1),

and thus the expected number of simple eigenvalues is at least {\frac{2N}{3} \frac{N+2}{N+1} + o(1)}; in particular, at least two thirds of the eigenvalues are simple asymptotically on average. If we had (12) without any restriction on the support of {\hat R}, the same arguments allow one to show that the expected proportion of simple eigenvalues is {1-o(1)}.
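The arithmetic in this argument can be verified exactly for small {N}. In the sketch below (the function name `fejer_rhs` is an illustrative choice) the {o(1)} errors are ignored, and the lower bound on the number of simple eigenvalues is obtained simply by subtracting the bound on non-simple eigenvalues from {N}.

```python
from fractions import Fraction

def fejer_rhs(N):
    """N^2 (N+1) + sum over 1 <= |m| <= N of (N + 1 - |m|) |m|."""
    return N ** 2 * (N + 1) + 2 * sum((N + 1 - m) * m for m in range(1, N + 1))

for N in range(1, 8):
    rhs = fejer_rhs(N)
    assert 3 * rhs == 2 * N * (N + 1) * (2 * N + 1)

    # Each coincidence lam_i = lam_j contributes (N+1)^2 to the Fejer sum.
    coincidences = Fraction(rhs, (N + 1) ** 2)
    assert coincidences == Fraction(2 * N * (2 * N + 1), 3 * (N + 1))

    # Every index contributes at least 1, and at least 2 if lam_i is not simple.
    non_simple = coincidences - N
    assert non_simple == Fraction(N * (N - 1), 3 * (N + 1))

    simple = N - non_simple
    assert Fraction(2 * N, 3) <= simple <= N
    print(N, coincidences, non_simple, simple)
```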

Suppose that the phase gaps in {U} are all greater than {c/N} almost surely. Let {R} be such that {\hat R} is non-negative and {R(e^{i\theta})} is non-positive for {\theta} outside of the arc {[-c/N,c/N]}. Then from (13) one has

\displaystyle  R(0) N \geq N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m) + o(1),

so by taking contrapositives one can force the existence of a gap less than {c/N} asymptotically if one can find {R} with {\hat R} non-negative, {R} non-positive for {\theta} outside of the arc {[-c/N,c/N]}, and for which one has the inequality

\displaystyle  R(0) N < N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m).

By a suitable choice of {R} (based on a minorant of Selberg) one can ensure this for {c \approx 0.6072} for {N} large; see Section 5 of these notes of Goldston. This is not the smallest value of {c} currently obtainable in the literature for the number field case (which is currently {0.50412}, due to Goldston and Turnage-Butterbaugh, by a somewhat different method), but is still significantly less than the trivial value of {1}. On the other hand, due to the compatibility of the ACUE distribution with Proposition 3, it is not possible to lower {c} below {0.5} purely through the use of Proposition 3.

In some cases it is possible to go beyond Proposition 3. Consider the mollified moment

\displaystyle  {\bf E}_Q |M(U) P(1)|^2

where

\displaystyle  M(U) = \sum_{m=0}^d a_m h_m(\lambda_1,\dots,\lambda_N)

for some coefficients {a_0,\dots,a_d}. We can compute this moment in the CUE case:

Proposition 5 We have

\displaystyle  {\bf E}_{CUE} |M(U) P(1)|^2 = |a_0|^2 + N \sum_{m=1}^d |a_m - a_{m-1}|^2.

Proof: From (5) one has

\displaystyle  P(1) = \sum_{i=0}^N (-1)^i e_i(\lambda_1,\dots,\lambda_N)

hence

\displaystyle  M(U) P(1) = \sum_{i=0}^N \sum_{m=0}^d (-1)^i a_m e_i h_m

where we suppress the dependence on the eigenvalues {\lambda}. Now observe the Pieri formula

\displaystyle  e_i h_m = s_{m 1^i} + s_{(m+1) 1^{i-1}}

where {s_{m 1^i}} are the hook Schur polynomials

\displaystyle  s_{m 1^i} = \sum_{a_1 \leq \dots \leq a_m; a_1 < b_1 < \dots < b_i} \lambda_{a_1} \dots \lambda_{a_m} \lambda_{b_1} \dots \lambda_{b_i}

and we adopt the convention that {s_{m 1^i}} vanishes for {i = -1}, or when {m = 0} and {i > 0}. Then {s_{m1^i}} also vanishes for {i\geq N}. We conclude that

\displaystyle  M(U) P(1) = a_0 s_{0 1^0} + \sum_{0 \leq i \leq N-1} \sum_{m \geq 1} (-1)^i (a_m - a_{m-1}) s_{m 1^i}.

As the Schur polynomials are orthonormal on the unitary group, the claim follows. \Box
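As a sanity check on the Pieri formula and the vanishing conventions used in this proof, one can evaluate both sides at integer points. The sketch below is an illustration only: the helper names are mine, and the hook Schur polynomial is computed directly from the summation formula displayed above.

```python
from itertools import combinations, combinations_with_replacement
from math import prod

def elem(i, x):
    # Elementary symmetric polynomial e_i(x).
    return sum(prod(x[t] for t in c) for c in combinations(range(len(x)), i))

def complete(m, x):
    # Complete homogeneous symmetric polynomial h_m(x).
    return sum(prod(x[t] for t in c)
               for c in combinations_with_replacement(range(len(x)), m))

def hook_schur(m, i, x):
    # s_{m 1^i}(x), with the conventions of the text: zero for i = -1,
    # and for m = 0 it is 1 if i = 0 and zero if i > 0.
    if i < 0:
        return 0
    if m == 0:
        return 1 if i == 0 else 0
    n, total = len(x), 0
    for a in combinations_with_replacement(range(n), m):   # a_1 <= ... <= a_m
        for b in combinations(range(a[0] + 1, n), i):       # a_1 < b_1 < ... < b_i
            total += prod(x[t] for t in a) * prod(x[t] for t in b)
    return total

x = [2, 3, 5, 7]  # arbitrary integer test point with N = 4 variables
for i in range(len(x) + 1):
    for m in range(4):
        assert elem(i, x) * complete(m, x) == hook_schur(m, i, x) + hook_schur(m + 1, i - 1, x)
print("Pieri formula verified at", x)
```

Since the check is exact integer arithmetic at a single point, it supports the identity but of course does not prove it.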

The CUE hypothesis would then imply the corresponding mollified moment conjecture

\displaystyle  {\bf E}_{Q} |M(U) P(1)|^2 = |a_0|^2 + N \sum_{m=1}^d |a_m - a_{m-1}|^2 + o(1). \ \ \ \ \ (14)

(See this paper of Conrey, and this paper of Radziwill, for some discussion of the analogous conjecture for the zeta function, which is essentially due to Farmer.)

From Proposition 3 one sees that this conjecture holds in the range {d \leq \frac{1}{2} N}. It is likely that the function field analogue of the calculations of Conrey (based ultimately on deep exponential sum estimates of Deshouillers and Iwaniec) can extend this range to {d < \theta N} for any {\theta < \frac{4}{7}}, if {N} is sufficiently large depending on {\theta}; these bounds thus go beyond what is available from Proposition 3. On the other hand, as discussed in Remark 7 of the previous post, ACUE would also predict (14) for {d} as large as {N-2}, so the available mollified moment estimates are not strong enough to rule out ACUE. It would be interesting to see if there is some other estimate in the function field setting that can be used to exclude the ACUE hypothesis (possibly one that exploits the fact that GRH is available in the function field case?).


In a recent post I discussed how the Riemann zeta function {\zeta} can be locally approximated by a polynomial, in the sense that for randomly chosen {t \in [T,2T]} one has an approximation

\displaystyle  \zeta(\frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (1)

where {N} grows slowly with {T}, and {P_t} is a polynomial of degree {N}. Assuming the Riemann hypothesis (as we will throughout this post), the zeroes of {P_t} should all lie on the unit circle, and one should then be able to write {P_t} as a scalar multiple of the characteristic polynomial of (the inverse of) a unitary matrix {U = U_t \in U(N)}, which we normalise as

\displaystyle  P_t(Z) = \exp(A_t) \mathrm{det}(1 - ZU). \ \ \ \ \ (2)

Here {A_t} is some quantity depending on {t}. We view {U} as a random element of {U(N)}; in the limit {T \rightarrow \infty}, the GUE hypothesis is equivalent to {U} becoming equidistributed with respect to Haar measure on {U(N)} (also known as the Circular Unitary Ensemble, CUE; it is to the unit circle what the Gaussian Unitary Ensemble (GUE) is on the real line). One can also view {U} as analogous to the “geometric Frobenius” operator in the function field setting, though unfortunately it is difficult at present to make this analogy any more precise (due, among other things, to the lack of a sufficiently satisfactory theory of the “field of one element”).

Taking logarithmic derivatives of (2), we have

\displaystyle  -\frac{P'_t(Z)}{P_t(Z)} = \mathrm{tr}( U (1-ZU)^{-1} ) = \sum_{j=1}^\infty Z^{j-1} \mathrm{tr} U^j \ \ \ \ \ (3)

and hence on taking logarithmic derivatives of (1) in the {z} variable we (heuristically) have

\displaystyle  -\frac{2\pi i}{\log T} \frac{\zeta'}{\zeta}( \frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx -\frac{2\pi i}{N} \sum_{j=1}^\infty e^{2\pi i jz/N} \mathrm{tr} U^j.

Morally speaking, we have

\displaystyle  - \frac{\zeta'}{\zeta}( \frac{1}{2} + it - \frac{2\pi i z}{\log T}) = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^{1/2+it}} e^{2\pi i z (\log n/\log T)}

so on comparing coefficients we expect to interpret the moments {\mathrm{tr} U^j} of {U} as a finite Dirichlet series:

\displaystyle  \mathrm{tr} U^j \approx -\frac{N}{\log T} \sum_{T^{(j-1)/N} < n \leq T^{j/N}} \frac{\Lambda(n)}{n^{1/2+it}}. \ \ \ \ \ (4)
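The power series expansion in (3), which drives this comparison of coefficients, is easy to test numerically. Here is a minimal sketch, assuming {U} is diagonal with a few arbitrarily chosen unit-modulus eigenvalues and {|Z| < 1} so that the series converges; the function names are illustrative.

```python
import cmath

def resolvent_trace(eigs, Z):
    # tr(U (1 - ZU)^{-1}) for a unitary U with the given eigenvalues.
    return sum(lam / (1 - Z * lam) for lam in eigs)

def trace_series(eigs, Z, terms=200):
    # Truncation of sum_{j >= 1} Z^{j-1} tr(U^j).
    return sum(Z ** (j - 1) * sum(lam ** j for lam in eigs)
               for j in range(1, terms + 1))

# Three arbitrary eigenvalues on the unit circle and a point inside the disk.
eigs = [cmath.exp(2j * cmath.pi * t) for t in (0.1, 0.35, 0.8)]
Z = 0.5 * cmath.exp(0.7j)
print(abs(resolvent_trace(eigs, Z) - trace_series(eigs, Z)))
```

The printed discrepancy should be at the level of floating point rounding error.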

To understand the distribution of {U} in the unitary group {U(N)}, it suffices to understand the distribution of the moments

\displaystyle  {\bf E}_t \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (5)

where {{\bf E}_t} denotes averaging over {t \in [T,2T]}, and {k, a_1,\dots,a_k, b_1,\dots,b_k \geq 0}. The GUE hypothesis asserts that in the limit {T \rightarrow \infty}, these moments converge to their CUE counterparts

\displaystyle  {\bf E}_{\mathrm{CUE}} \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (6)

where {U} is now drawn from {U(N)} according to the CUE measure, and {{\bf E}_{\mathrm{CUE}}} denotes expectation with respect to that measure.

The moment (6) vanishes unless one has the homogeneity condition

\displaystyle  \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j. \ \ \ \ \ (7)

This follows from the fact that for any phase {\theta \in {\bf R}}, {e(\theta) U} has the same distribution as {U}, where we use the number theory notation {e(\theta) := e^{2\pi i\theta}}.

In the case when the degree {\sum_{j=1}^k j a_j} is low, we can use representation theory to establish the following simple formula for the moment (6), as evaluated by Diaconis and Shahshahani:

Proposition 1 (Low moments in CUE model) If

\displaystyle  \sum_{j=1}^k j a_j \leq N, \ \ \ \ \ (8)

then the moment (6) vanishes unless {a_j=b_j} for all {j}, in which case it is equal to

\displaystyle  \prod_{j=1}^k j^{a_j} a_j!. \ \ \ \ \ (9)

Another way of viewing this proposition is that for {U} distributed according to CUE, the random variables {\mathrm{tr} U^j} are distributed like independent complex Gaussian random variables of mean zero and variance {j}, as long as one only considers moments obeying (8). This identity definitely breaks down for larger values of {a_j}, so one only obtains central limit theorems in certain limiting regimes, notably when one only considers a fixed number of {j}'s and lets {N} go to infinity. (The paper of Diaconis and Shahshahani writes {\sum_{j=1}^k a_j + b_j} in place of {\sum_{j=1}^k j a_j}, but I believe this to be a typo.)

Proof: Let {D} be the left-hand side of (8). We may assume that (7) holds since we are done otherwise, hence

\displaystyle  D = \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j.

Our starting point is Schur-Weyl duality. Namely, we consider the {n^D}-dimensional complex vector space

\displaystyle  ({\bf C}^n)^{\otimes D} = {\bf C}^n \otimes \dots \otimes {\bf C}^n.

This space has an action of the product group {S_D \times GL_n({\bf C})}: the symmetric group {S_D} acts by permutation on the {D} tensor factors, while the general linear group {GL_n({\bf C})} acts diagonally on the {{\bf C}^n} factors, and the two actions commute with each other. Schur-Weyl duality gives a decomposition

\displaystyle  ({\bf C}^n)^{\otimes D} \equiv \bigoplus_\lambda V^\lambda_{S_D} \otimes V^\lambda_{GL_n({\bf C})} \ \ \ \ \ (10)

where {\lambda} ranges over Young tableaux of size {D} with at most {n} rows, {V^\lambda_{S_D}} is the {S_D}-irreducible unitary representation corresponding to {\lambda} (which can be constructed for instance using Specht modules), and {V^\lambda_{GL_n({\bf C})}} is the {GL_n({\bf C})}-irreducible polynomial representation with highest weight {\lambda}.

Let {\pi \in S_D} be a permutation consisting of {a_j} cycles of length {j} (this is uniquely determined up to conjugation), and let {g \in GL_n({\bf C})}. The pair {(\pi,g)} then acts on {({\bf C}^n)^{\otimes D}}, with the action on basis elements {e_{i_1} \otimes \dots \otimes e_{i_D}} given by

\displaystyle  g e_{\pi(i_1)} \otimes \dots \otimes g e_{\pi(i_D)}.

The trace of this action can then be computed as

\displaystyle  \sum_{i_1,\dots,i_D \in \{1,\dots,n\}} g_{\pi(i_1),i_1} \dots g_{\pi(i_D),i_D}

where {g_{i,j}} is the {ij} matrix coefficient of {g}. Breaking up into cycles and summing, this is just

\displaystyle  \prod_{j=1}^k \mathrm{tr}(g^j)^{a_j}.

But we can also compute this trace using the Schur-Weyl decomposition (10), yielding the identity

\displaystyle  \prod_{j=1}^k \mathrm{tr}(g^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(g) \ \ \ \ \ (11)

where {\chi_\lambda: S_D \rightarrow {\bf C}} is the character on {S_D} associated to {V^\lambda_{S_D}}, and {s_\lambda: GL_n({\bf C}) \rightarrow {\bf C}} is the character on {GL_n({\bf C})} associated to {V^\lambda_{GL_n({\bf C})}}. As is well known, {s_\lambda(g)} is just the Schur polynomial of weight {\lambda} applied to the (algebraic, generalised) eigenvalues of {g}. We can specialise to unitary matrices to conclude that

\displaystyle  \prod_{j=1}^k \mathrm{tr}(U^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(U)

and similarly

\displaystyle  \prod_{j=1}^k \mathrm{tr}(U^j)^{b_j} = \sum_\lambda \chi_\lambda(\pi') s_\lambda(U)

where {\pi' \in S_D} consists of {b_j} cycles of length {j} for each {j=1,\dots,k}. On the other hand, the characters {s_\lambda} are an orthonormal system on {L^2(U(N))} with the CUE measure. Thus we can write the expectation (6) as

\displaystyle  \sum_\lambda \chi_\lambda(\pi) \overline{\chi_\lambda(\pi')}. \ \ \ \ \ (12)

Now recall that {\lambda} ranges over all the Young tableaux of size {D} with at most {N} rows. But by (8) we have {D \leq N}, and so the condition of having at most {N} rows is redundant. Hence {\lambda} now ranges over all Young tableaux of size {D}, which as is well known enumerates all the irreducible representations of {S_D}. One can then use the standard orthogonality properties of characters to show that the sum (12) vanishes if {\pi}, {\pi'} are not conjugate, and is equal to {D!} divided by the size of the conjugacy class of {\pi} (or equivalently, to the size of the centraliser of {\pi}) otherwise. But the latter expression is easily computed to be {\prod_{j=1}^k j^{a_j} a_j!}, giving the claim. \Box
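The final step of the proof, that the centraliser of a permutation with {a_j} cycles of length {j} has size {\prod_j j^{a_j} a_j!}, can be confirmed by brute force in a small symmetric group. The sketch below does this for {S_5}; the function names are illustrative and the choice {D=5} is arbitrary.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def cycle_counts(p):
    # p[i] is the image of i; returns a Counter {cycle length j: a_j}.
    seen, counts = set(), Counter()
    for start in range(len(p)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        counts[length] += 1
    return counts

def compose(q, r):
    # (q o r)(i) = q[r[i]]
    return tuple(q[r[i]] for i in range(len(r)))

def centraliser_size(p, group):
    # Number of q in the group commuting with p.
    return sum(1 for q in group if compose(q, p) == compose(p, q))

def formula(p):
    # prod_j j^{a_j} a_j!
    return prod(j ** a * factorial(a) for j, a in cycle_counts(p).items())

D = 5
group = list(permutations(range(D)))
assert all(centraliser_size(p, group) == formula(p) for p in group)
print("centraliser formula verified on S_%d" % D)
```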

Example 2 We illustrate the identity (11) when {D=3}, {n \geq 3}. The Schur polynomials are given as

\displaystyle  s_{3}(g) = \sum_i \lambda_i^3 + \sum_{i<j} \lambda_i^2 \lambda_j + \lambda_i \lambda_j^2 + \sum_{i<j<k} \lambda_i \lambda_j \lambda_k

\displaystyle  s_{2,1}(g) = \sum_{i < j} \lambda_i^2 \lambda_j + \sum_{i < j,k} \lambda_i \lambda_j \lambda_k

\displaystyle  s_{1,1,1}(g) = \sum_{i<j<k} \lambda_i \lambda_j \lambda_k

where {\lambda_1,\dots,\lambda_n} are the (generalised) eigenvalues of {g}, and the formula (11) in this case becomes

\displaystyle  \mathrm{tr}(g^3) = s_{3}(g) - s_{2,1}(g) + s_{1,1,1}(g)

\displaystyle  \mathrm{tr}(g^2) \mathrm{tr}(g) = s_{3}(g) - s_{1,1,1}(g)

\displaystyle  \mathrm{tr}(g)^3 = s_{3}(g) + 2 s_{2,1}(g) + s_{1,1,1}(g).

The functions {s_{1,1,1}, s_{2,1}, s_3} are orthonormal on {U(n)}, so the three functions {\mathrm{tr}(g^3), \mathrm{tr}(g^2) \mathrm{tr}(g), \mathrm{tr}(g)^3} are mutually orthogonal, and their {L^2} norms are {\sqrt{3}}, {\sqrt{2}}, and {\sqrt{6}} respectively, reflecting the size in {S_3} of the centralisers of the permutations {(123)}, {(12)}, and {\mathrm{id}} respectively. If {n} is instead set to say {2}, then the {s_{1,1,1}} terms now disappear (the Young tableau here has too many rows), and the three quantities here now have some non-trivial covariance.
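The orthogonality and norm claims of this example reduce to inner products of the coefficient vectors in the three displayed expansions. A small sketch (the dictionary keys and the function name are purely illustrative), which treats the case {n=2} by discarding the {s_{1,1,1}} coordinate:

```python
# Coefficients of tr(g^3), tr(g^2)tr(g), tr(g)^3 in the basis (s_3, s_{2,1}, s_{1,1,1}).
coeffs = {
    "tr(g^3)":      (1, -1, 1),
    "tr(g^2)tr(g)": (1, 0, -1),
    "tr(g)^3":      (1, 2, 1),
}

def gram(keep):
    # Gram matrix using only the first `keep` Schur functions:
    # keep = 3 when n >= 3, keep = 2 when n = 2 (s_{1,1,1} vanishes).
    names = list(coeffs)
    return [[sum(coeffs[a][t] * coeffs[b][t] for t in range(keep)) for b in names]
            for a in names]

print(gram(3))  # diagonal: squared norms 3, 2, 6
print(gram(2))  # off-diagonal entries appear when n = 2
```

The first matrix is diagonal with entries {3, 2, 6}, while the second has non-zero off-diagonal entries, which is the non-trivial covariance mentioned above.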

Example 3 Consider the moment {{\bf E}_{\mathrm{CUE}} |\mathrm{tr} U^j|^2}. For {j \leq N}, the above proposition shows us that this moment is equal to {j}. What happens for {j>N}? The formula (12) computes this moment as

\displaystyle  \sum_\lambda |\chi_\lambda(\pi)|^2

where {\pi} is a cycle of length {j} in {S_j}, and {\lambda} ranges over all Young tableaux with size {j} and at most {N} rows. The Murnaghan-Nakayama rule tells us that {\chi_\lambda(\pi)} vanishes unless {\lambda} is a hook (all but one of the non-zero rows consisting of just a single box; this also can be interpreted as an exterior power representation on the space {{\bf C}^j_{\sum=0}} of vectors in {{\bf C}^j} whose coordinates sum to zero), in which case it is equal to {\pm 1} (depending on the parity of the number of non-zero rows). As such we see that this moment is equal to {N}. Thus in general we have

\displaystyle  {\bf E}_{\mathrm{CUE}} |\mathrm{tr} U^j|^2 = \min(j,N). \ \ \ \ \ (13)
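One can also confirm (13) for small {N} without any character theory. The sketch below assumes the standard Weyl eigenvalue density {\frac{1}{N!} \prod_{i<j} |e(\theta_i)-e(\theta_j)|^2} for CUE, expands the squared Vandermonde determinant as a signed double sum over permutations, and extracts the constant Fourier coefficient of the integrand exactly. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(p):
    # Sign of a permutation given as a tuple, via its inversion count.
    inversions = sum(1 for i in range(len(p))
                     for k in range(i + 1, len(p)) if p[i] > p[k])
    return -1 if inversions % 2 else 1

def cue_second_moment(j, N):
    # Constant Fourier coefficient of |sum_a e(j theta_a)|^2 |V(theta)|^2, over N!.
    signed = [(p, sign(p)) for p in permutations(range(1, N + 1))]
    total = 0
    for p, sp in signed:
        for q, sq in signed:
            for a in range(N):
                for b in range(N):
                    # Frequency of the plane wave e(theta.(p - q)) e(j theta_a) e(-j theta_b).
                    freq = [p[i] - q[i] for i in range(N)]
                    freq[a] += j
                    freq[b] -= j
                    if not any(freq):
                        total += sp * sq
    return Fraction(total, factorial(N))

for N in (1, 2, 3):
    for j in range(1, 8):
        assert cue_second_moment(j, N) == min(j, N)
print("E_CUE |tr U^j|^2 = min(j, N) verified for N <= 3, 1 <= j <= 7")
```

The cost grows like {(N!)^2 N^2}, so this is only practical for very small {N}.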

Now we discuss what is known for the analogous moments (5). Here we shall be rather non-rigorous, in particular ignoring an annoying “Archimedean” issue that the product of the ranges {T^{(j-1)/N} < n \leq T^{j/N}} and {T^{(k-1)/N} < n \leq T^{k/N}} is not quite the range {T^{(j+k-1)/N} < n \leq T^{(j+k)/N}} but instead leaks into the adjacent range {T^{(j+k-2)/N} < n \leq T^{(j+k-1)/N}}. This issue can be addressed by working in a “weak” sense in which parameters such as {j,k} are averaged over fairly long scales, or by passing to a function field analogue of these questions, but we shall simply ignore the issue completely and work at a heuristic level only. For similar reasons we will ignore some technical issues arising from the sharp cutoff of {t} to the range {[T,2T]} (it would be slightly better technically to use a smooth cutoff).

One can morally expand out (5) using (4) as

\displaystyle  (\frac{N}{\log T})^{J+K} \sum_{n_1,\dots,n_J,m_1,\dots,m_K} \frac{\Lambda(n_1) \dots \Lambda(n_J) \Lambda(m_1) \dots \Lambda(m_K)}{n_1^{1/2} \dots n_J^{1/2} m_1^{1/2} \dots m_K^{1/2}} \times \ \ \ \ \ (14)

\displaystyle  \times {\bf E}_t (m_1 \dots m_K / n_1 \dots n_J)^{it}

where {J := \sum_{j=1}^k a_j}, {K := \sum_{j=1}^k b_j}, and the integers {n_i,m_i} are in the ranges

\displaystyle  T^{(j-1)/N} < n_{a_1 + \dots + a_{j-1} + i} \leq T^{j/N}

for {j=1,\dots,k} and {1 \leq i \leq a_j}, and

\displaystyle  T^{(j-1)/N} < m_{b_1 + \dots + b_{j-1} + i} \leq T^{j/N}

for {j=1,\dots,k} and {1 \leq i \leq b_j}. Morally, the expectation here is negligible unless

\displaystyle  m_1 \dots m_K = (1 + O(1/T)) n_1 \dots n_J \ \ \ \ \ (15)

in which case the expectation oscillates with magnitude one. In particular, if (7) fails (with some room to spare) then the moment (5) should be negligible, which is consistent with the analogous behaviour for the moments (6). Now suppose that (8) holds (with some room to spare). Then {n_1 \dots n_J} is significantly less than {T}, so the {O(1/T)} multiplicative error in (15) becomes an additive error of {o(1)}. On the other hand, because of the fundamental integrality gap – that the integers are always separated from each other by a distance of at least {1} – this forces the integers {m_1 \dots m_K}, {n_1 \dots n_J} to in fact be equal:

\displaystyle  m_1 \dots m_K = n_1 \dots n_J. \ \ \ \ \ (16)

The von Mangoldt factors {\Lambda(n_1) \dots \Lambda(n_J) \Lambda(m_1) \dots \Lambda(m_K)} effectively restrict {n_1,\dots,n_J,m_1,\dots,m_K} to be prime (the effect of prime powers is negligible). By the fundamental theorem of arithmetic, the constraint (16) then forces {J=K}, and {n_1,\dots,n_J} to be a permutation of {m_1,\dots,m_K}, which then forces {a_j = b_j} for all {j=1,\dots,k}. For a given {n_1,\dots,n_J}, the number of possible {m_1,\dots,m_K} is then {\prod_{j=1}^k a_j!}, and the expectation in (14) is equal to {1}. Thus this expectation is morally

\displaystyle  (\frac{N}{\log T})^{J+K} \sum_{n_1,\dots,n_J} \frac{\Lambda^2(n_1) \dots \Lambda^2(n_J) }{n_1 \dots n_J} \prod_{j=1}^k a_j!

and using Mertens’ theorem this soon simplifies asymptotically to the same quantity in Proposition 1. Thus we see that (morally at least) the moments (5) associated to the zeta function asymptotically match the moments (6) coming from the CUE model in the low degree case (8), thus lending support to the GUE hypothesis. (These observations are basically due to Rudnick and Sarnak, with the degree {1} case of pair correlations due to Montgomery, and the degree {2} case due to Hejhal.)

With some rare exceptions (such as those estimates coming from “Kloostermania”), the moment estimates of Rudnick and Sarnak basically represent the state of the art for what is known for the moments (5). For instance, Montgomery’s pair correlation conjecture, in our language, is basically the analogue of (13) for {{\mathbf E}_t}, thus

\displaystyle  {\bf E}_{t} |\mathrm{tr} U^j|^2 \approx \min(j,N) \ \ \ \ \ (17)

for all {j \geq 0}. Montgomery showed this for (essentially) the range {j \leq N} (as remarked above, this is a special case of the Rudnick-Sarnak result), but no further cases of this conjecture are known.

These estimates can be used to give some non-trivial information on the largest and smallest spacings between zeroes of the zeta function, which in our notation corresponds to spacing between eigenvalues of {U}. One such method used today for this is due to Montgomery and Odlyzko and was greatly simplified by Conrey, Ghosh, and Gonek. The basic idea, translated to our random matrix notation, is as follows. Suppose {Q_t(Z)} is some random polynomial depending on {t} of degree at most {N}. Let {\lambda_1,\dots,\lambda_N} denote the eigenvalues of {U}, and let {c > 0} be a parameter. Observe from the pigeonhole principle that if the quantity

\displaystyle  \sum_{j=1}^N \int_0^{c/N} |Q_t( e(\theta) \lambda_j )|^2\ d\theta \ \ \ \ \ (18)

exceeds the quantity

\displaystyle  \int_{0}^{1} |Q_t(e(\theta))|^2\ d\theta, \ \ \ \ \ (19)

then the arcs {\{ e(\theta) \lambda_j: 0 \leq \theta \leq c/N \}} cannot all be disjoint, and hence there exists a pair of eigenvalues making an angle of less than {c/N} ({c} times the mean angle separation). Similarly, if the quantity (18) falls below that of (19), then these arcs cannot cover the unit circle, and hence there exists a pair of eigenvalues making an angle of greater than {c} times the mean angle separation. By judiciously choosing the coefficients of {Q_t} as functions of the moments {\mathrm{tr}(U^j)}, one can ensure that both quantities (18), (19) can be computed by the Rudnick-Sarnak estimates (or estimates of equivalent strength); indeed, from the residue theorem one can write (18) as

\displaystyle  \frac{1}{2\pi i} \int_0^{c/N} (\int_{|z| = 1+\varepsilon} - \int_{|z|=1-\varepsilon}) Q_t( e(\theta) z ) \overline{Q_t}( \frac{1}{e(\theta) z} ) \frac{P'_t(z)}{P_t(z)}\ dz\ d\theta

for sufficiently small {\varepsilon>0}, and this can be computed (in principle, at least) using (3) if the coefficients of {Q_t} are in an appropriate form. Using this sort of technology (translated back to the Riemann zeta function setting), one can show that gaps between consecutive zeroes of zeta are less than {\mu} times the mean spacing and greater than {\lambda} times the mean spacing infinitely often for certain {0 < \mu < 1 < \lambda}; the current records are {\mu = 0.50412} (due to Goldston and Turnage-Butterbaugh) and {\lambda = 3.18} (due to Bui and Milinovich, who input some additional estimates beyond the Rudnick-Sarnak set, namely the twisted fourth moment estimates of Bettin, Bui, Li, and Radziwill, and using a technique based on Hall’s method rather than the Montgomery-Odlyzko method).

It would be of great interest if one could push the upper bound {\mu} for the smallest gap below {1/2}. The reason for this is that this would then exclude the Alternative Hypothesis that the spacings between zeroes are asymptotically always (or almost always) non-zero half-integer multiples of the mean spacing, or in our language that the gaps between the phases {\theta} of the eigenvalues {e^{2\pi i\theta}} of {U} are asymptotically always non-zero integer multiples of {1/2N}. The significance of this hypothesis is that it is implied by the existence of a Siegel zero (of conductor a small power of {T}); see this paper of Conrey and Iwaniec. (In our language, what is going on is that if there is a Siegel zero in which {L(1,\chi)} is very close to zero, then {1*\chi} behaves like the Kronecker delta, and hence (by the Riemann-Siegel formula) the combined {L}-function {\zeta(s) L(s,\chi)} will have a polynomial approximation which in our language looks like a scalar multiple of {1 + e(\theta) Z^{2N+M}}, where {q \approx T^{M/N}} is the conductor of {\chi} and {\theta} is a phase. The zeroes of this approximation lie on a coset of the {(2N+M)^{th}} roots of unity; the polynomial {P} is a factor of this approximation and hence its zeroes will also lie in this coset, implying in particular that all eigenvalue spacings are multiples of {1/(2N+M)}. Taking {M = o(N)} then gives the claim.)

Unfortunately, the known methods do not seem to break this barrier without some significant new input; already the original paper of Montgomery and Odlyzko observed this limitation for their particular technique (and in fact fall very slightly short, as observed in unpublished work of Goldston and of Milinovich). In this post I would like to record another way to see this, by providing an “alternative” probability distribution to the CUE distribution (which one might dub the Alternative Circular Unitary Ensemble (ACUE)) which is indistinguishable in low moments in the sense that the expectation {{\bf E}_{ACUE}} for this model also obeys Proposition 1, but for which the phase spacings are always a multiple of {1/2N}. This shows that if one is to rule out the Alternative Hypothesis (and thus in particular rule out Siegel zeroes), one needs to input some additional moment information beyond Proposition 1. It would be interesting to see if any of the other known moment estimates that go beyond this proposition are consistent with this alternative distribution. (UPDATE: it looks like they are, see Remark 7 below.)

To describe this alternative distribution, let us first recall the Weyl description of the CUE measure on the unitary group {U(N)} in terms of the distribution of the phases {\theta_1,\dots,\theta_N \in {\bf R}/{\bf Z}} of the eigenvalues, randomly permuted in any order. This distribution is given by the probability measure

\displaystyle  \frac{1}{N!} |V(\theta)|^2\ d\theta_1 \dots d\theta_N; \ \ \ \ \ (20)

where

\displaystyle  V(\theta) := \prod_{1 \leq i<j \leq N} (e(\theta_i)-e(\theta_j))

is the Vandermonde determinant; see for instance this previous blog post for the derivation of a very similar formula for the GUE distribution, which can be adapted to CUE without much difficulty. To see that this is a probability measure, first observe the Vandermonde determinant identity

\displaystyle  V(\theta) = \sum_{\pi \in S_N} \mathrm{sgn}(\pi) e(\theta \cdot \pi(\rho))

where {\theta := (\theta_1,\dots,\theta_N)}, {\cdot} denotes the dot product, and {\rho := (1,2,\dots,N)} is the “long word”, which implies that (20) is a trigonometric series with constant term {1}; it is also clearly non-negative, so it is a probability measure. One can thus generate a random CUE matrix by first drawing {(\theta_1,\dots,\theta_N) \in ({\bf R}/{\bf Z})^N} using the probability measure (20), and then generating {U} to be a random unitary matrix with eigenvalues {e(\theta_1),\dots,e(\theta_N)}.
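Since the measure (20) only involves {|V(\theta)|^2}, it is enough for this argument that the product and alternating-sum expressions for {V} agree in modulus. The following sketch compares the two at a random point of the torus; it deliberately checks absolute values only, since the normalisation of {\rho} can introduce a unimodular factor that has no effect on (20). The function names are illustrative.

```python
import cmath
import random
from itertools import permutations

def e(t):
    # Number theory notation e(t) = exp(2 pi i t).
    return cmath.exp(2j * cmath.pi * t)

def sign(p):
    # Sign of a permutation given as a tuple, via its inversion count.
    inversions = sum(1 for i in range(len(p))
                     for k in range(i + 1, len(p)) if p[i] > p[k])
    return -1 if inversions % 2 else 1

def product_form(theta):
    # prod_{i<j} (e(theta_i) - e(theta_j))
    out = 1
    for i in range(len(theta)):
        for k in range(i + 1, len(theta)):
            out *= e(theta[i]) - e(theta[k])
    return out

def alternating_sum(theta):
    # sum over permutations of sgn(pi) e(theta . pi(rho)), with rho = (1, ..., N)
    N = len(theta)
    return sum(sign(p) * e(sum(t * r for t, r in zip(theta, p)))
               for p in permutations(range(1, N + 1)))

random.seed(0)
theta = [random.random() for _ in range(4)]
gap = abs(abs(product_form(theta)) ** 2 - abs(alternating_sum(theta)) ** 2)
print(gap)
```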

For the alternative distribution, we first draw {(\theta_1,\dots,\theta_N)} on the discrete torus {(\frac{1}{2N}{\bf Z}/{\bf Z})^N} (thus each {e(\theta_j)} is a {2N^{th}} root of unity) with probability density function

\displaystyle  \frac{1}{(2N)^N} \frac{1}{N!} |V(\theta)|^2 \ \ \ \ \ (21)

shift by a phase {\alpha \in {\bf R}/{\bf Z}} drawn uniformly at random, and then select {U} to be a random unitary matrix with eigenvalues {e(\theta_1+\alpha), \dots, e(\theta_N+\alpha)}. Let us first verify that (21) is a probability density function. Clearly it is non-negative. It is the linear combination of exponentials of the form {e(\theta \cdot (\pi(\rho)-\pi'(\rho)))} for {\pi,\pi' \in S_N}. The diagonal contribution {\pi=\pi'} gives the constant function {\frac{1}{(2N)^N}}, which has total mass one. All of the other exponentials have a frequency {\pi(\rho)-\pi'(\rho)} that is not a multiple of {2N}, and hence will have mean zero on {(\frac{1}{2N}{\bf Z}/{\bf Z})^N}. The claim follows.

From construction it is clear that the matrix {U} drawn from this alternative distribution will have all eigenvalue phase spacings be a non-zero multiple of {1/2N}. Now we verify that the alternative distribution also obeys Proposition 1. The alternative distribution remains invariant under rotation by phases, so the claim is again clear when (7) fails. Inspecting the proof of that proposition, we see that it suffices to show that the Schur polynomials {s_\lambda} with {\lambda} of size at most {N} and of equal size remain orthonormal with respect to the alternative measure. That is to say,

\displaystyle  \int_{U(N)} s_\lambda(U) \overline{s_{\lambda'}(U)}\ d\mu_{\mathrm{CUE}}(U) = \int_{U(N)} s_\lambda(U) \overline{s_{\lambda'}(U)}\ d\mu_{\mathrm{ACUE}}(U)

when {\lambda,\lambda'} have size equal to each other and at most {N}. In this case the phase {\alpha} in the definition of {U} is irrelevant. In terms of eigenvalue measures, we are then reduced to showing that

\displaystyle  \int_{({\bf R}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2\ d\theta = \frac{1}{(2N)^N} \sum_{\theta \in (\frac{1}{2N}{\bf Z}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2.

By Fourier decomposition, it then suffices to show that the trigonometric polynomial {s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2} does not contain any components of the form {e( \theta \cdot 2N k)} for some non-zero lattice vector {k \in {\bf Z}^N}. But we have already observed that {|V(\theta)|^2} is a linear combination of plane waves of the form {e(\theta \cdot (\pi(\rho)-\pi'(\rho)))} for {\pi,\pi' \in S_N}. Also, as is well known, {s_\lambda(\theta)} is a linear combination of plane waves {e( \theta \cdot \kappa )} where {\kappa} is majorised by {\lambda}, and similarly {s_{\lambda'}(\theta)} is a linear combination of plane waves {e( \theta \cdot \kappa' )} where {\kappa'} is majorised by {\lambda'}. So the product {s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2} is a linear combination of plane waves of the form {e(\theta \cdot (\kappa - \kappa' + \pi(\rho) - \pi'(\rho)))}. But every coefficient of the vector {\kappa - \kappa' + \pi(\rho) - \pi'(\rho)} lies between {1-2N} and {2N-1}, and so cannot be of the form {2Nk} for any non-zero lattice vector {k}, giving the claim.
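For {N=2} the orthonormality claim can be tested by summing directly over the sixteen points of the discrete torus with the weights (21). The sketch below is only an illustration: the helper names are mine, the Schur polynomials of size {2} and {3} in two variables are written out by hand, and the random phase {\alpha} is omitted since (as noted above) it is irrelevant for pairs of equal size.

```python
import cmath
from itertools import product
from math import factorial

def acue_points(N):
    # Yield (eigenvalues, probability) over the discrete torus, using the weight (21).
    M = 2 * N
    for idx in product(range(M), repeat=N):
        x = [cmath.rect(1.0, 2 * cmath.pi * i / M) for i in idx]
        v2 = 1.0
        for a in range(N):
            for b in range(a + 1, N):
                v2 *= abs(x[a] - x[b]) ** 2
        yield x, v2 / (M ** N * factorial(N))

def inner(f, g, N):
    # ACUE inner product E f(U) conj(g(U)).
    return sum(p * f(x) * g(x).conjugate() for x, p in acue_points(N))

# Schur polynomials in two variables.
s2 = lambda x: x[0] ** 2 + x[0] * x[1] + x[1] ** 2
s11 = lambda x: x[0] * x[1]
s3 = lambda x: x[0] ** 3 + x[0] ** 2 * x[1] + x[0] * x[1] ** 2 + x[1] ** 3

print(abs(inner(s2, s2, 2)), abs(inner(s11, s11, 2)), abs(inner(s2, s11, 2)))
print(abs(inner(s3, s3, 2)))  # size 3 > N = 2: no longer equal to the CUE value 1
```

The first line should print values close to {1, 1, 0}. The last line prints a value close to {0} rather than {1}, which suggests that the restriction to sizes at most {N} cannot simply be dropped.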

Example 4 If {N=2}, then the distribution (21) assigns a probability of {\frac{1}{4^2 2!} 2} to any pair {(\theta_1,\theta_2) \in (\frac{1}{4} {\bf Z}/{\bf Z})^2} that is a permuted rotation of {(0,\frac{1}{4})}, and a probability of {\frac{1}{4^2 2!} 4} to any pair that is a permuted rotation of {(0,\frac{1}{2})}. Thus, a matrix {U} drawn from the alternative distribution will be conjugate to a phase rotation of {\mathrm{diag}(1, i)} with probability {1/2}, and to {\mathrm{diag}(1,-1)} with probability {1/2}.

A similar computation when {N=3} gives {U} conjugate to a phase rotation of {\mathrm{diag}(1, e(1/6), e(1/3))} with probability {1/12}, to a phase rotation of {\mathrm{diag}( 1, e(1/6), -1)} or its adjoint with probability of {1/3} each, and a phase rotation of {\mathrm{diag}(1, e(1/3), e(2/3))} with probability {1/4}.
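The probabilities in these two computations can be reproduced by enumerating the discrete torus and grouping configurations up to permutation and rotation. In the sketch below (function name and class labels are illustrative), each class is labelled by the smallest sorted tuple of indices {k} with phases {k/2N} among its rotations, so for instance {(0,1,3)} stands for {\mathrm{diag}(1, e(1/6), -1)} and {(0,1,4)} for its adjoint.

```python
import cmath
from collections import defaultdict
from itertools import product
from math import factorial

def acue_classes(N):
    # Total probability under (21) of each configuration class,
    # up to permutation and rotation of the phases.
    M = 2 * N
    roots = [cmath.rect(1.0, 2 * cmath.pi * k / M) for k in range(M)]
    probs = defaultdict(float)
    for idx in product(range(M), repeat=N):
        if len(set(idx)) < N:
            continue  # repeated phases have zero Vandermonde weight
        v2 = 1.0
        for a in range(N):
            for b in range(a + 1, N):
                v2 *= abs(roots[idx[a]] - roots[idx[b]]) ** 2
        # Canonical representative: rotate each phase in turn to 0, sort, take the minimum.
        rep = min(tuple(sorted((i - s) % M for i in idx)) for s in idx)
        probs[rep] += v2 / (M ** N * factorial(N))
    return dict(probs)

print(acue_classes(2))
print(acue_classes(3))
```

For {N=2} this gives probability {1/2} to each of the classes {(0,1)} and {(0,2)}, and for {N=3} it gives {1/12}, {1/3}, {1/3}, {1/4} to the classes {(0,1,2)}, {(0,1,3)}, {(0,1,4)}, {(0,2,4)} respectively, up to rounding.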

Remark 5 For large {N} it does not seem that this specific alternative distribution is the only distribution consistent with Proposition 1 and which has all phase spacings a non-zero multiple of {1/2N}; in particular, it may not be the only distribution consistent with a Siegel zero. Still, it is a very explicit distribution that might serve as a test case for the limitations of various arguments for controlling quantities such as the largest or smallest spacing between zeroes of zeta. The ACUE is in some sense the distribution that maximally resembles CUE (in the sense that it has the greatest number of Fourier coefficients agreeing) while still also being consistent with the Alternative Hypothesis, and so should be the most difficult enemy to eliminate if one wishes to disprove that hypothesis.

In some cases, even just a tiny improvement in known results would be able to exclude the alternative hypothesis. For instance, if the alternative hypothesis held, then {|\mathrm{tr}(U^j)|} is periodic in {j} with period {2N}, so from Proposition 1 for the alternative distribution one has

\displaystyle  {\bf E}_{\mathrm{ACUE}} |\mathrm{tr} U^j|^2 = \min_{k \in {\bf Z}} |j-2Nk|

which differs from (13) for any {|j| > N}. (This fact was implicitly observed recently by Baluyot, in the original context of the zeta function.) Thus a verification of the pair correlation conjecture (17) for even a single {j} with {|j| > N} would rule out the alternative hypothesis. Unfortunately, such a verification appears to be of comparable difficulty to (an averaged version of) the Hardy-Littlewood conjecture, with power saving error term. (This is consistent with the fact that Siegel zeroes can cause distortions in the Hardy-Littlewood conjecture, as (implicitly) discussed in this previous blog post.)
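The discrepancy between the ACUE and CUE second moments can be seen numerically for {N=3} by summing over the discrete torus with the weights (21). This is a sketch under the assumptions already made in the text; the function name is illustrative, and the loop is restricted to {1 \leq j < 2N}, since for {j} a multiple of {2N} every eigenvalue of {U^j} is the same phase and {|\mathrm{tr} U^j| = N} deterministically.

```python
import cmath
from itertools import product
from math import factorial

def acue_second_moment(j, N):
    # E_ACUE |tr U^j|^2, summing over the discrete torus with the weight (21).
    # The random phase alpha does not affect the modulus, so it is omitted.
    M = 2 * N
    roots = [cmath.rect(1.0, 2 * cmath.pi * k / M) for k in range(M)]
    total = 0.0
    for idx in product(range(M), repeat=N):
        weight = 1.0
        for a in range(N):
            for b in range(a + 1, N):
                weight *= abs(roots[idx[a]] - roots[idx[b]]) ** 2
        trace = sum(roots[(j * i) % M] for i in idx)
        total += weight * abs(trace) ** 2
    return total / (M ** N * factorial(N))

N = 3
for j in range(1, 2 * N):
    predicted = min(abs(j - 2 * N * k) for k in range(-1, 3))
    print(j, round(acue_second_moment(j, N), 9), predicted, min(j, N))
```

The second and third columns should agree, while the last column (the CUE value from (13)) differs from them as soon as {j > N}.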

Remark 6 One can view the CUE as normalised Lebesgue measure on {U(N)} (viewed as a smooth submanifold of {{\bf C}^{N^2}}). One can similarly view ACUE as normalised Lebesgue measure on the (disconnected) smooth submanifold of {U(N)} consisting of those unitary matrices whose phase spacings are non-zero integer multiples of {1/2N}; informally, ACUE is CUE restricted to this lower dimensional submanifold. As is well known, the phases of CUE eigenvalues form a determinantal point process with kernel {K(\theta,\theta') = \frac{1}{N} \sum_{j=0}^{N-1} e(j(\theta - \theta'))} (or one can equivalently take {K(\theta,\theta') = \frac{\sin(\pi N (\theta-\theta'))}{N\sin(\pi(\theta-\theta'))}}); in a similar spirit, the phases of ACUE eigenvalues, once they are rotated to be {2N^{th}} roots of unity, become a discrete determinantal point process on those roots of unity with exactly the same kernel (except for a normalising factor of {\frac{1}{2}}). In particular, the {k}-point correlation functions of ACUE (after this rotation) are precisely the restriction of the {k}-point correlation functions of CUE after normalisation, that is to say they are proportional to {\mathrm{det}( K( \theta_i,\theta_j) )_{1 \leq i,j \leq k}}.

Remark 7 One family of estimates that go beyond the Rudnick-Sarnak family of estimates are twisted moment estimates for the zeta function, such as ones that give asymptotics for

\displaystyle  \int_T^{2T} |\zeta(\frac{1}{2}+it)|^{2k} |Q(\frac{1}{2}+it)|^2\ dt

for some small even exponent {2k} (almost always {2} or {4}) and some short Dirichlet polynomial {Q}; see for instance this paper of Bettin, Bui, Li, and Radziwill for some examples of such estimates. The analogous unitary matrix average would be something like

\displaystyle  {\bf E}_t |P_t(1)|^{2k} |Q_t(1)|^2

where {Q_t} is now some random medium degree polynomial that depends on the unitary matrix {U} associated to {P_t} (and in applications will typically also contain some negative power of {\exp(A_t)} to cancel the corresponding powers of {\exp(A_t)} in {|P_t(1)|^{2k}}). Unfortunately such averages generally are unable to distinguish the CUE from the ACUE. For instance, if all the coefficients of {Q_t} involve products of traces {\mathrm{tr}(U^k)} of total order less than {N-k}, then in terms of the eigenvalue phases {\theta}, {|Q_t(1)|^2} is a linear combination of plane waves {e(\theta \cdot \xi)} where the frequencies {\xi} have coefficients of magnitude less than {N-k}. On the other hand, as each coefficient of {P_t} is an elementary symmetric function of the eigenvalues, {P_t(1)} is a linear combination of plane waves {e(\theta \cdot \xi)} where the frequencies {\xi} have coefficients of magnitude at most {1}. Thus {|P_t(1)|^{2k} |Q_t(1)|^2} is a linear combination of plane waves where the frequencies {\xi} have coefficients of magnitude less than {N}, and thus is orthogonal to the difference between the CUE and ACUE measures on the phase torus {({\bf R}/{\bf Z})^N} by the previous arguments. In other words, {|P_t(1)|^{2k} |Q_t(1)|^2} has the same expectation with respect to ACUE as it does with respect to CUE. Thus one can only start distinguishing CUE from ACUE if the mollifier {Q_t} has degree close to or exceeding {N}, which corresponds to Dirichlet polynomials {Q} of length close to or exceeding {T}, which is far beyond current technology for such moment estimates.

Remark 8 The GUE hypothesis for the zeta function asserts that the average

\displaystyle  \lim_{T \rightarrow \infty} \frac{1}{T} \int_T^{2T} \sum_{\gamma_1,\dots,\gamma_k \hbox{ distinct}} \eta( \frac{\log T}{2\pi}(\gamma_1-t),\dots, \frac{\log T}{2\pi}(\gamma_k-t))\ dt \ \ \ \ \ (22)

is equal to

\displaystyle  \int_{{\bf R}^k} \eta(x) \det(K(x_i-x_j))_{1 \leq i,j \leq k}\ dx_1 \dots dx_k \ \ \ \ \ (23)

for any {k \geq 1} and any test function {\eta: {\bf R}^k \rightarrow {\bf C}}, where {K(x) := \frac{\sin \pi x}{\pi x}} is the Dyson sine kernel and {\gamma_i} are the ordinates of zeroes of the zeta function. This corresponds to the CUE distribution for {U}. The ACUE distribution then corresponds to an “alternative gaussian unitary ensemble (AGUE)” hypothesis, in which the average (22) is instead predicted to equal a Riemann sum version of the integral (23):

\displaystyle  \int_0^1 2^{-k} \sum_{x_1,\dots,x_k \in \frac{1}{2} {\bf Z} + \theta} \eta(x) \det(K(x_i-x_j))_{1 \leq i,j \leq k}\ d\theta.

This is a stronger version of the alternative hypothesis that the spacing between adjacent zeroes is almost always approximately a half-integer multiple of the mean spacing. I do not know of any moment estimates for Dirichlet series that are able to eliminate this AGUE hypothesis (even assuming GRH). (UPDATE: These facts have also been independently observed in forthcoming work of Lagarias and Rodgers.)

A useful rule of thumb in complex analysis is that holomorphic functions {f(z)} behave like large degree polynomials {P(z)}. This can be evidenced for instance at a “local” level by the Taylor series expansion for a complex analytic function in the disk, or at a “global” level by factorisation theorems such as the Weierstrass factorisation theorem (or the closely related Hadamard factorisation theorem). One can truncate these theorems in a variety of ways (e.g., Taylor’s theorem with remainder) to be able to approximate a holomorphic function by a polynomial on various domains.

In some cases it can be convenient instead to work with polynomials {P(Z)} of another variable {Z} such as {Z = e^{2\pi i z}} (or more generally {Z=e^{2\pi i z/N}} for a scaling parameter {N}). In the case of the Riemann zeta function, defined by meromorphic continuation of the formula

\displaystyle  \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} \ \ \ \ \ (1)

one ends up having the following heuristic approximation in the neighbourhood of a point {\frac{1}{2}+it} on the critical line:

Heuristic 1 (Polynomial approximation) Let {T \ggg 1} be a height, let {t} be a “typical” element of {[T,2T]}, and let {1 \lll N \ll \log T} be an integer. Let {\phi_t = \phi_{t,T}: {\bf C} \rightarrow {\bf C}} be the linear change of variables

\displaystyle  \phi_t(z) := \frac{1}{2} + it - \frac{2\pi i z}{\log T}.

Then one has an approximation

\displaystyle  \zeta( \phi_t(z) ) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (2)

for {z = o(N)} and some polynomial {P_t = P_{t,T}} of degree {N}.

The requirement {z=o(N)} is necessary since the right-hand side is periodic with period {N} in the {z} variable (or period {\frac{2\pi i N}{\log T}} in the {s = \phi_t(z)} variable), whereas the zeta function is not expected to have any such periodicity, even approximately.

Let us give two non-rigorous justifications of this heuristic. Firstly, it is standard that inside the critical strip (with {\mathrm{Im}(s) = O(T)}) we have an approximate form

\displaystyle  \zeta(s) \approx \sum_{n \leq T} \frac{1}{n^s}

of (11). If we group the integers {n} from {1} to {T} into {N} bins depending on what powers of {T^{1/N}} they lie between, we thus have

\displaystyle  \zeta(s) \approx \sum_{j=0}^N \sum_{T^{j/N} \leq n < T^{(j+1)/N}} \frac{1}{n^s}

For {s = \phi_t(z)} with {z = o(N)} and {T^{j/N} \leq n < T^{(j+1)/N}} we heuristically have

\displaystyle  \frac{1}{n^s} \approx \frac{1}{n^{\frac{1}{2}+it}} e^{2\pi i j z / N}

and so

\displaystyle  \zeta(s) \approx \sum_{j=0}^N a_j(t) (e^{2\pi i z/N})^j

where {a_j(t)} are the partial Dirichlet series

\displaystyle  a_j(t) \approx \sum_{T^{j/N} \leq n < T^{(j+1)/N}} \frac{1}{n^{\frac{1}{2}+it}}. \ \ \ \ \ (3)

This gives the desired polynomial approximation.
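The bookkeeping in this first argument can be sketched in a few lines of Python. The parameters {T}, {N}, {t} below are hypothetical and far too small for the asymptotic regime; the sketch only illustrates how the bins define the coefficients {a_j(t)}, and it checks the crude termwise bound {|e^{ia}-e^{ib}| \leq |a-b|} for real {z} rather than the (much stronger) cancellation that the heuristic relies on.

```python
import cmath
import math

# Hypothetical toy parameters (assumptions for illustration only).
T, N, t = 1000, 5, 1500.0
log_T = math.log(T)

def phi(z):
    """The change of variables phi_t(z) = 1/2 + it - 2 pi i z / log T."""
    return 0.5 + 1j * t - 2j * math.pi * z / log_T

def bin_index(n):
    """Index j with T^(j/N) <= n < T^((j+1)/N); n = T lands in bin N."""
    return min(N, int(N * math.log(n) / log_T))

# Partial Dirichlet series a_j(t), one per bin, as in (3).
a = [0j] * (N + 1)
for n in range(1, T + 1):
    a[bin_index(n)] += n ** -(0.5 + 1j * t)

def dirichlet_sum(z):
    """The truncated sum of n^(-s) over n <= T at s = phi_t(z)."""
    s = phi(z)
    return sum(n ** -s for n in range(1, T + 1))

def binned_polynomial(z):
    """The polynomial sum_j a_j(t) Z^j evaluated at Z = exp(2 pi i z / N)."""
    Z = cmath.exp(2j * math.pi * z / N)
    return sum(a_j * Z ** j for j, a_j in enumerate(a))

z = 0.2
gap = abs(dirichlet_sum(z) - binned_polynomial(z))
# Termwise bound: each phase is off by at most 2 pi |z| / N (valid for real z).
bound = (2 * math.pi * abs(z) / N) * sum(n ** -0.5 for n in range(1, T + 1))
print(gap, bound)
```

At {z=0} the two expressions agree exactly (up to rounding), since the binning merely regroups the terms of the sum.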

A second non-rigorous justification is as follows. From factorisation theorems such as the Hadamard factorisation theorem we expect to have

\displaystyle  \zeta(s) \propto \prod_\rho (s-\rho) \times \dots

where {\rho} runs over the non-trivial zeroes of {\zeta}, and there are some additional factors arising from the trivial zeroes and poles of {\zeta} which we will ignore here; we will also completely ignore the issue of how to renormalise the product to make it converge properly. In the region {s = \frac{1}{2} + it + o( N / \log T) = \phi_t( \{ z: z = o(N) \})}, the dominant contribution to this product (besides multiplicative constants) should arise from zeroes {\rho} that are also in this region. The Riemann-von Mangoldt formula suggests that for “typical” {t} one should have about {N} such zeroes. If one lets {\rho_1,\dots,\rho_N} be any enumeration of {N} zeroes closest to {\frac{1}{2}+it}, and then repeats this set of zeroes periodically by period {\frac{2\pi i N}{\log T}}, one then expects to have an approximation of the form

\displaystyle  \zeta(s) \propto \prod_{j=1}^N \prod_{k \in {\bf Z}} (s-(\rho_j+\frac{2\pi i kN}{\log T}) )

again ignoring all issues of convergence. If one writes {s = \phi_t(z)} and {\rho_j = \phi_t(\lambda_j)}, then Euler’s famous product formula for sine basically gives

\displaystyle  \prod_{k \in {\bf Z}} (s-(\rho_j+\frac{2\pi i kN}{\log T}) ) \propto \prod_{k \in {\bf Z}} (z - (\lambda_j+k N) )

\displaystyle  \propto (e^{2\pi i z/N} - e^{2\pi i \lambda_j/N})

(here we are glossing over some technical issues regarding renormalisation of the infinite products, which can be dealt with by studying the asymptotics as {\mathrm{Im}(z) \rightarrow \infty}) and hence we expect

\displaystyle  \zeta(s) \propto \prod_{j=1}^N (e^{2\pi i z/N} - e^{2\pi i \lambda_j/N}).

This again gives the desired polynomial approximation.

Below the fold we give a rigorous version of the second argument suitable for “microscale” analysis. More precisely, we will show

Theorem 2 Let {N = N(T)} be an integer going sufficiently slowly to infinity. Let {W_0 \ll N} go to infinity sufficiently slowly depending on {N}. Let {t} be drawn uniformly at random from {[T,2T]}. Then with probability {1-o(1)} (in the limit {T \rightarrow \infty}), and possibly after adjusting {N} by {1}, there exists a polynomial {P_t(Z)} of degree {N} and obeying the functional equation (9) below, such that

\displaystyle  \zeta( \phi_t(z) ) = (1+o(1)) P_t( e^{2\pi i z/N} ) \ \ \ \ \ (4)

whenever {|z| \leq W_0}.

It should be possible to refine the arguments to extend this theorem to the mesoscale setting by letting {N} be anything growing like {o(\log T)}, and {W_0} anything growing like {o(N)}; also we should be able to delete the need to adjust {N} by {1}. We have not attempted these optimisations here.

Many conjectures and arguments involving the Riemann zeta function can be heuristically translated into arguments involving the polynomials {P_t(Z)}, which one can view as random degree {N} polynomials if {t} is interpreted as a random variable drawn uniformly at random from {[T,2T]}. These can be viewed as providing a “toy model” for the theory of the Riemann zeta function, in which the complex analysis is simplified to the study of the zeroes and coefficients of this random polynomial (for instance, the role of the gamma function is now played by a monomial in {Z}). This model also makes the zeta function theory more closely resemble the function field analogues of this theory (in which the analogue of the zeta function is also a polynomial (or a rational function) in some variable {Z}, as per the Weil conjectures). The parameter {N} is at our disposal to choose, and reflects the scale {\approx N/\log T} at which one wishes to study the zeta function. For “macroscopic” questions, at which one wishes to understand the zeta function at unit scales, it is natural to take {N \approx \log T} (or very slightly larger), while for “microscopic” questions one would take {N} close to {1} and only growing very slowly with {T}. For the intermediate “mesoscopic” scales one would take {N} somewhere between {1} and {\log T}. Unfortunately, the statistical properties of {P_t} are only understood well at a conjectural level at present; even if one assumes the Riemann hypothesis, our understanding of {P_t} is largely restricted to the computation of low moments (e.g., the second or fourth moments) of various linear statistics of {P_t} and related functions (e.g., {1/P_t}, {P'_t/P_t}, or {\log P_t}).

Let’s now heuristically explore the polynomial analogues of this theory in a bit more detail. The Riemann hypothesis basically corresponds to the assertion that all the {N} zeroes of the polynomial {P_t(Z)} lie on the unit circle {|Z|=1} (which, after the change of variables {Z = e^{2\pi i z/N}}, corresponds to {z} being real); in a similar vein, the GUE hypothesis corresponds to {P_t(Z)} having the asymptotic law of a random scalar {a_N(t)} times the characteristic polynomial of a random unitary {N \times N} matrix. Next, we consider what happens to the functional equation

\displaystyle  \zeta(s) = \chi(s) \zeta(1-s) \ \ \ \ \ (5)

where

\displaystyle  \chi(s) := 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s).

A routine calculation involving Stirling’s formula reveals that

\displaystyle  \chi(\frac{1}{2}+it) = (1+o(1)) e^{-2\pi i L(t)} \ \ \ \ \ (6)

with {L(t) := \frac{t}{2\pi} \log \frac{t}{2\pi} - \frac{t}{2\pi} + \frac{7}{8}}; one also has the closely related approximation

\displaystyle  \frac{\chi'}{\chi}(s) = -\log T + O(1) \ \ \ \ \ (7)

and hence

\displaystyle  \chi(\phi_t(z)) = (1+o(1)) e^{-2\pi i L(t)} e^{2\pi i z} \ \ \ \ \ (8)

when {z = o(\log T)}. Since {\zeta(1-s) = \overline{\zeta(\overline{1-s})}}, applying (5) with {s = \phi_t(z)} and using the approximation (2) suggests a functional equation for {P_t}:

\displaystyle  P_t(e^{2\pi i z/N}) = e^{-2\pi i L(t)} e^{2\pi i z} \overline{P_t(e^{2\pi i \overline{z}/N})}

or in terms of {Z := e^{2\pi i z/N}},

\displaystyle  P_t(Z) = e^{-2\pi i L(t)} Z^N \overline{P_t}(1/Z) \ \ \ \ \ (9)

where {\overline{P_t}(Z) := \overline{P_t(\overline{Z})}} is the polynomial {P_t} with all the coefficients replaced by their complex conjugate. Thus if we write

\displaystyle  P_t(Z) = \sum_{j=0}^N a_j(t) Z^j

then the functional equation can be written as

\displaystyle  a_j(t) = e^{-2\pi i L(t)} \overline{a_{N-j}(t)}.

We remark that if we use the heuristic (3) (interpreting the cutoffs in the {n} summation in a suitably vague fashion) then this equation can be viewed as an instance of the Poisson summation formula.

Another consequence of the functional equation is that the zeroes of {P_t} are symmetric with respect to inversion {Z \mapsto 1/\overline{Z}} across the unit circle. This is of course consistent with the Riemann hypothesis, but does not obviously imply it. The phase {L(t)} is of little consequence in this functional equation; one could easily conceal it by working with the phase rotation {e^{\pi i L(t)} P_t} of {P_t} instead.

A further consequence of the functional equation is that {e^{\pi i L(t)} e^{-i N \theta/2} P_t(e^{i\theta})} is real for any {\theta \in {\bf R}}; the same is then true for the derivative {e^{\pi i L(t)} e^{-i N \theta/2} (i e^{i\theta} P'_t(e^{i\theta}) - i \frac{N}{2} P_t(e^{i\theta}))}. Among other things, this implies that {P'_t(e^{i\theta})} cannot vanish unless {P_t(e^{i\theta})} does also; thus the zeroes of {P'_t} will not lie on the unit circle except where {P_t} has repeated zeroes. The analogous statement is true for {\zeta}; the zeroes of {\zeta'} will not lie on the critical line except where {\zeta} has repeated zeroes.
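These consequences of the functional equation (9) are easy to test numerically. The following Python sketch uses a hypothetical set of zeroes (three on the unit circle, plus one pair symmetric under {Z \mapsto 1/\overline{Z}}); the constant `c` plays the role of {e^{-2\pi i L(t)}}, and the helper names are my own. It is only meant as a sanity check under these assumptions, not as a model of the actual {P_t}.

```python
import cmath

def poly_from_zeros(zeros):
    """Coefficients [a_0, ..., a_N] of the monic polynomial prod_j (Z - zero_j)."""
    coeffs = [1 + 0j]
    for lam in zeros:
        shifted = [0j] + coeffs                      # Z * p(Z)
        scaled = [-lam * c for c in coeffs] + [0j]   # -lam * p(Z)
        coeffs = [u + v for u, v in zip(shifted, scaled)]
    return coeffs

def evaluate(coeffs, Z):
    """Evaluate sum_j a_j Z^j by Horner's rule."""
    value = 0j
    for coeff in reversed(coeffs):
        value = value * Z + coeff
    return value

# Hypothetical zeroes: three on the unit circle, one pair symmetric across it.
zeros = [cmath.exp(1j * angle) for angle in (0.4, 1.7, 3.1)]
zeros += [1.5 * cmath.exp(2.2j), cmath.exp(2.2j) / 1.5]
a = poly_from_zeros(zeros)
N = len(a) - 1

# Unimodular constant playing the role of e^{-2 pi i L(t)} in (9).
c = a[0] / a[N].conjugate()

# Coefficient form of the functional equation: a_j = c * conj(a_{N-j}).
coefficient_gap = max(abs(a[j] - c * a[N - j].conjugate()) for j in range(N + 1))

# c = e^{2i * half_phase}, so e^{-i * half_phase} plays the role of e^{pi i L(t)}.
half_phase = cmath.phase(c) / 2

def rotated(theta):
    """The phase-rotated value e^{pi i L} e^{-i N theta / 2} P(e^{i theta})."""
    Z = cmath.exp(1j * theta)
    return cmath.exp(-1j * half_phase) * cmath.exp(-1j * N * theta / 2) * evaluate(a, Z)

imaginary_gap = max(abs(rotated(0.1 * m).imag) for m in range(63))
print(abs(c), coefficient_gap, imaginary_gap)
```

Both gaps should vanish up to rounding error, and `abs(c)` should equal one.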

Relating to this fact, it is a classical result of Speiser that the Riemann hypothesis is true if and only if all the zeroes of the derivative {\zeta'} of the zeta function in the critical strip lie on or to the right of the critical line. The analogous result for polynomials is

Proposition 3 We have

\displaystyle  \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| > 1: P'_t(Z) = 0 \}

(where all zeroes are counted with multiplicity.) In particular, the zeroes of {P_t(Z)} all lie on the unit circle if and only if the zeroes of {P'_t(Z)} lie in the closed unit disk.

Proof: From the functional equation we have

\displaystyle  \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| > 1: P_t(Z) = 0 \}.

Thus it will suffice to show that {P_t} and {P'_t} have the same number of zeroes outside the closed unit disk.

Set {f(z) := z \frac{P'_t(z)}{P_t(z)}}; then {f} is a rational function that does not have a zero or pole at infinity. For {e^{i\theta}} not a zero of {P_t}, we have already seen that {e^{\pi i L(t)} e^{-i N \theta/2} P_t(e^{i\theta})} and {e^{\pi i L(t)} e^{-i N \theta/2} (i e^{i\theta} P'_t(e^{i\theta}) - i \frac{N}{2} P_t(e^{i\theta}))} are real, so on dividing we see that {i f(e^{i\theta}) - \frac{iN}{2}} is always real, that is to say

\displaystyle  \mathrm{Re} f(e^{i\theta}) = \frac{N}{2}.

(This can also be seen by writing {f(e^{i\theta}) = \sum_\lambda \frac{1}{1-e^{-i\theta} \lambda}}, where {\lambda} runs over the zeroes of {P_t}, and using the fact that these zeroes are symmetric with respect to reflection across the unit circle.) When {e^{i\theta}} is a zero of {P_t}, {f(z)} has a simple pole at {e^{i\theta}} with residue a positive multiple of {e^{i\theta}}, and so {f(z)} stays on the right half-plane if one traverses a semicircular arc around {e^{i\theta}} outside the unit disk. From this and continuity we see that {f} stays on the right-half plane in a circle slightly larger than the unit circle, and hence by the argument principle it has the same number of zeroes and poles outside of this circle, giving the claim. \Box
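As a quick sanity check of Proposition 3 and of the identity {\mathrm{Re} f(e^{i\theta}) = \frac{N}{2}}, here is a Python sketch for one hypothetical cubic, {P(Z) = (Z-2)(Z-\frac{1}{2})(Z+1)}, which obeys the functional equation with trivial phase. The example and variable names are my own choices; the derivative is a quadratic, so its zeroes can be found with the quadratic formula.

```python
import cmath

# Hypothetical example: P(Z) = (Z - 2)(Z - 1/2)(Z + 1) = Z^3 - 1.5 Z^2 - 1.5 Z + 1.
N = 3
zeros_P = [2.0, 0.5, -1.0]

# P'(Z) = 3 Z^2 - 3 Z - 1.5, solved with the quadratic formula.
qa, qb, qc = 3.0, -3.0, -1.5
disc = cmath.sqrt(qb * qb - 4 * qa * qc)
zeros_dP = [(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)]

tol = 1e-9
on_circle = sum(1 for zero in zeros_P if abs(abs(zero) - 1) < tol)
outside_dP = sum(1 for zero in zeros_dP if abs(zero) > 1 + tol)

def f(Z):
    """The logarithmic derivative f(Z) = Z P'(Z) / P(Z) for this cubic."""
    P = Z**3 - 1.5 * Z**2 - 1.5 * Z + 1
    dP = 3 * Z**2 - 3 * Z - 1.5
    return Z * dP / P

# Sample points on the unit circle, avoiding the zero of P at theta = pi.
real_parts = [f(cmath.exp(1j * theta)).real for theta in (0.3, 1.0, 2.0, 4.5)]
print(on_circle, outside_dP, real_parts)
```

Here one zero of {P} lies on the unit circle and one zero of {P'} lies outside the closed unit disk, consistent with {1 = 3 - 2 \times 1}, and each sampled real part is {\frac{3}{2}} up to rounding.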

From the functional equation and the chain rule, a non-zero complex number {Z} is a zero of {P'_t} if and only if {1/\overline{Z}} is a zero of {N P_t(Z) - Z P'_t(Z)}. We can thus write the above proposition in the equivalent form

\displaystyle  \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| < 1: NP_t(Z) - Z P'_t(Z) = 0 \}.

One can use this identity to get a lower bound on the number of zeroes of {P_t} by the method of mollifiers. Namely, for any other polynomial {M_t}, we clearly have

\displaystyle  \# \{ |Z| = 1: P_t(Z) = 0 \}

\displaystyle \geq N - 2 \# \{ |Z| < 1: M_t(Z)(NP_t(Z) - Z P'_t(Z)) = 0 \}.

By Jensen’s formula, we have for any {r>1} that

\displaystyle  \log |M_t(0)| |NP_t(0)|

\displaystyle \leq -(\log r) \# \{ |Z| < 1: M_t(Z)(NP_t(Z) - Z P'_t(Z)) = 0 \}

\displaystyle + \frac{1}{2\pi} \int_0^{2\pi} \log |M_t(re^{i\theta})(NP_t(re^{i\theta}) - re^{i\theta} P'_t(re^{i\theta}))|\ d\theta.

We therefore have

\displaystyle  \# \{ |Z| = 1: P_t(Z) = 0 \} \geq N + \frac{2}{\log r} \log |M_t(0)| |NP_t(0)|

\displaystyle - \frac{1}{\log r} \frac{1}{2\pi} \int_0^{2\pi} \log |M_t(re^{i\theta})(NP_t(re^{i\theta}) - re^{i\theta} P'_t(re^{i\theta}))|^2\ d\theta.

As the logarithm function is concave, we can apply Jensen’s inequality to conclude

\displaystyle  {\bf E} \# \{ |Z| = 1: P_t(Z) = 0 \} \geq N

\displaystyle + {\bf E} \frac{2}{\log r} \log |M_t(0)| |NP_t(0)|

\displaystyle - \frac{1}{\log r} \log \left( \frac{1}{2\pi} \int_0^{2\pi} {\bf E} |M_t(re^{i\theta})(NP_t(re^{i\theta}) - re^{i\theta} P'_t(re^{i\theta}))|^2\ d\theta\right),

where the expectation is over the {t} parameter. It turns out that by choosing the mollifier {M_t} carefully in order to make {M_t P_t} behave like the function {1} (while keeping the degree of {M_t} small enough that one can compute the second moment here), and then optimising in {r}, one can use this inequality to get a positive fraction of zeroes of {P_t} on the unit circle on average. This is the polynomial analogue of a classical argument of Levinson, who used this to show that at least one third of the zeroes of the Riemann zeta function are on the critical line; all later improvements on this fraction have been based on some version of Levinson’s method, mainly focusing on more advanced choices for the mollifier {M_t} and of the differential operator {N - \partial_z} that implicitly appears in the above approach. (The most recent lower bound I know of is {0.4191637}, due to Pratt and Robles. In principle (as observed by Farmer) this bound can get arbitrarily close to {1} if one is allowed to use arbitrarily long mollifiers, but establishing this seems of comparable difficulty to unsolved problems such as the pair correlation conjecture; see this paper of Radziwill for more discussion.) A variant of these techniques can also establish “zero density estimates” of the following form: for any {W \geq 1}, the number of zeroes of {P_t} that lie further than {\frac{W}{N}} from the unit circle is of order {O( e^{-cW} N )} on average for some absolute constant {c>0}. Thus, roughly speaking, most zeroes of {P_t} lie within {O(1/N)} of the unit circle. (Analogues of these results for the Riemann zeta function were worked out by Selberg, by Jutila, and by Conrey, with increasingly strong values of {c}.)

The zeroes of {P'_t} tend to live somewhat closer to the origin than the zeroes of {P_t}. Suppose for instance that we write

\displaystyle  P_t(Z) = \sum_{j=0}^N a_j(t) Z^j = a_N(t) \prod_{j=1}^N (Z - \lambda_j)

where {\lambda_1,\dots,\lambda_N} are the zeroes of {P_t(Z)}, then by evaluating at zero we see that

\displaystyle  \lambda_1 \dots \lambda_N = (-1)^N a_0(t) / a_N(t)

and the right-hand side is of unit magnitude by the functional equation. However, if we differentiate

\displaystyle  P'_t(Z) = \sum_{j=1}^N a_j(t) j Z^{j-1} = N a_N(t) \prod_{j=1}^{N-1} (Z - \lambda'_j)

where {\lambda'_1,\dots,\lambda'_{N-1}} are the zeroes of {P'_t}, then by evaluating at zero we now see that

\displaystyle  \lambda'_1 \dots \lambda'_{N-1} = (-1)^{N-1} a_1(t) / (N a_N(t)).

The right-hand side would now be typically expected to be of size {O(1/N) \approx \exp(- \log N)}, and so on average we expect the {\lambda'_j} to have magnitude like {\exp( - \frac{\log N}{N} )}, that is to say pushed inwards from the unit circle by a distance roughly {\frac{\log N}{N}}. The analogous result for the Riemann zeta function is that the zeroes of {\zeta'(s)} at height {\sim T} lie at a distance roughly {\frac{\log\log T}{\log T}} to the right of the critical line on the average; see this paper of Levinson and Montgomery for a precise statement.


(This post is mostly intended for my own reference, as I found myself repeatedly looking up several conversions between polynomial bases on various occasions.)

Let {\mathrm{Poly}_{\leq n}} denote the vector space of polynomials {P:{\bf R} \rightarrow {\bf R}} of one variable {x} with real coefficients of degree at most {n}. This is a vector space of dimension {n+1}, and the sequence of these spaces forms a filtration:

\displaystyle  \mathrm{Poly}_{\leq 0} \subset \mathrm{Poly}_{\leq 1} \subset \mathrm{Poly}_{\leq 2} \subset \dots

A standard basis for these vector spaces is given by the monomials {x^0, x^1, x^2, \dots}: every polynomial {P(x)} in {\mathrm{Poly}_{\leq n}} can be expressed uniquely as a linear combination of the first {n+1} monomials {x^0, x^1, \dots, x^n}. More generally, if one has any sequence {Q_0(x), Q_1(x), Q_2(x), \dots} of polynomials, with each {Q_n} of degree exactly {n}, then an easy induction shows that {Q_0(x),\dots,Q_n(x)} forms a basis for {\mathrm{Poly}_{\leq n}}.

In particular, if we have two such sequences {Q_0(x), Q_1(x), Q_2(x),\dots} and {R_0(x), R_1(x), R_2(x), \dots} of polynomials, with each {Q_n} of degree {n} and each {R_k} of degree {k}, then {Q_n} must be expressible uniquely as a linear combination of the polynomials {R_0,R_1,\dots,R_n}, thus we have an identity of the form

\displaystyle  Q_n(x) = \sum_{k=0}^n c_{QR}(n,k) R_k(x)

for some change of basis coefficients {c_{QR}(n,k) \in {\bf R}}. These coefficients describe how to convert a polynomial expressed in the {Q_n} basis into a polynomial expressed in the {R_k} basis.

Many standard combinatorial quantities {c(n,k)} involving two natural numbers {0 \leq k \leq n} can be interpreted as such change of basis coefficients. The most familiar example is given by the binomial coefficients {\binom{n}{k}}, which measure the conversion from the shifted monomial basis {(x+1)^n} to the monomial basis {x^k}, thanks to (a special case of) the binomial formula:

\displaystyle  (x+1)^n = \sum_{k=0}^n \binom{n}{k} x^k,

thus for instance

\displaystyle  (x+1)^3 = \binom{3}{0} x^0 + \binom{3}{1} x^1 + \binom{3}{2} x^2 + \binom{3}{3} x^3

\displaystyle  = 1 + 3x + 3x^2 + x^3.

More generally, for any shift {h}, the conversion from {(x+h)^n} to {x^k} is measured by the coefficients {h^{n-k} \binom{n}{k}}, thanks to the general case of the binomial formula.

But there are other bases of interest too. For instance if one uses the falling factorial basis

\displaystyle  (x)_n := x (x-1) \dots (x-n+1)

then the conversion from falling factorials to monomials is given by the Stirling numbers of the first kind {s(n,k)}:

\displaystyle  (x)_n = \sum_{k=0}^n s(n,k) x^k,

thus for instance

\displaystyle  (x)_3 = s(3,0) x^0 + s(3,1) x^1 + s(3,2) x^2 + s(3,3) x^3

\displaystyle  = 0 + 2 x - 3x^2 + x^3

and the conversion back is given by the Stirling numbers of the second kind {S(n,k)}:

\displaystyle  x^n = \sum_{k=0}^n S(n,k) (x)_k

thus for instance

\displaystyle  x^3 = S(3,0) (x)_0 + S(3,1) (x)_1 + S(3,2) (x)_2 + S(3,3) (x)_3

\displaystyle  = 0 + x + 3 x(x-1) + x(x-1)(x-2).

If one uses the binomial functions {\binom{x}{n} = \frac{1}{n!} (x)_n} as a basis instead of the falling factorials, one of course can rewrite these conversions as

\displaystyle  \binom{x}{n} = \sum_{k=0}^n \frac{1}{n!} s(n,k) x^k

and

\displaystyle  x^n = \sum_{k=0}^n k! S(n,k) \binom{x}{k}

thus for instance

\displaystyle  \binom{x}{3} = 0 + \frac{1}{3} x - \frac{1}{2} x^2 + \frac{1}{6} x^3

and

\displaystyle  x^3 = 0 + \binom{x}{1} + 6 \binom{x}{2} + 6 \binom{x}{3}.
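Since I keep having to recompute these conversions, here is a small Python sketch that reproduces the worked examples above. It assumes nothing beyond the definitions: the first function expands {(x)_n} by repeated multiplication, and the second uses Newton's forward difference formula, under which the coefficient of {\binom{x}{k}} in {x^n} is the {k}-th forward difference of {x^n} at {0} (that is, {k! S(n,k)}). The function names are my own.

```python
from math import comb

def falling_factorial_coeffs(n):
    """Monomial coefficients [c_0, ..., c_n] of (x)_n = x (x-1) ... (x-n+1)."""
    coeffs = [1]                                   # (x)_0 = 1
    for m in range(n):
        shifted = [0] + coeffs                     # x * p(x)
        scaled = [-m * c for c in coeffs] + [0]    # -m * p(x)
        coeffs = [u + v for u, v in zip(shifted, scaled)]
    return coeffs

def binomial_basis_coeffs(n):
    """Coefficients of x^n in the basis binom(x, k), k = 0..n, via forward differences at 0."""
    return [
        sum((-1) ** (k - i) * comb(k, i) * i ** n for i in range(k + 1))
        for k in range(n + 1)
    ]

print(falling_factorial_coeffs(3))   # s(3,k) for k = 0..3
print(binomial_basis_coeffs(3))      # k! S(3,k) for k = 0..3
```

For {n=3} this prints the coefficient lists `[0, 2, -3, 1]` and `[0, 1, 6, 6]`, matching the expansions of {(x)_3} and {x^3} given above.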

As a slight variant, if one instead uses rising factorials

\displaystyle  (x)^n := x (x+1) \dots (x+n-1)

then the conversion to monomials yields the unsigned Stirling numbers {|s(n,k)|} of the first kind:

\displaystyle  (x)^n = \sum_{k=0}^n |s(n,k)| x^k

thus for instance

\displaystyle  (x)^3 = 0 + 2x + 3x^2 + x^3.

One final basis comes from the polylogarithm functions

\displaystyle  \mathrm{Li}_{-n}(x) := \sum_{j=1}^\infty j^n x^j.

For instance one has

\displaystyle  \mathrm{Li}_1(x) = -\log(1-x)

\displaystyle  \mathrm{Li}_0(x) = \frac{x}{1-x}

\displaystyle  \mathrm{Li}_{-1}(x) = \frac{x}{(1-x)^2}

\displaystyle  \mathrm{Li}_{-2}(x) = \frac{x}{(1-x)^3} (1+x)

\displaystyle  \mathrm{Li}_{-3}(x) = \frac{x}{(1-x)^4} (1+4x+x^2)

\displaystyle  \mathrm{Li}_{-4}(x) = \frac{x}{(1-x)^5} (1+11x+11x^2+x^3)

and more generally one has

\displaystyle  \mathrm{Li}_{-n-1}(x) = \frac{x}{(1-x)^{n+2}} E_n(x)

for all natural numbers {n} and some polynomial {E_n} of degree {n} (the Eulerian polynomials), which when converted to the monomial basis yields the (shifted) Eulerian numbers

\displaystyle  E_n(x) = \sum_{k=0}^n A(n+1,k) x^k.

For instance

\displaystyle  E_3(x) = A(4,0) x^0 + A(4,1) x^1 + A(4,2) x^2 + A(4,3) x^3

\displaystyle  = 1 + 11x + 11x^2 + x^3.
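One can check this against the defining series {\mathrm{Li}_{-n-1}(x) = \sum_j j^{n+1} x^j} by expanding {\frac{x}{(1-x)^{n+2}} E_n(x)} as a power series, using {\frac{1}{(1-x)^{n+2}} = \sum_{i \geq 0} \binom{i+n+1}{n+1} x^i}. A minimal Python sketch (the function name is my own, and the coefficients of {E_3} are simply copied from the display above):

```python
from math import comb

def polylog_series_coeff(eulerian_coeffs, j):
    """Coefficient of x^j in x * E_n(x) / (1 - x)^(n + 2).

    eulerian_coeffs lists the monomial coefficients of E_n, lowest degree first.
    """
    n = len(eulerian_coeffs) - 1
    total = 0
    for k, coeff in enumerate(eulerian_coeffs):
        i = j - 1 - k            # power of x taken from the expansion of 1/(1-x)^(n+2)
        if i >= 0:
            total += coeff * comb(i + n + 1, n + 1)
    return total

E3 = [1, 11, 11, 1]
print([polylog_series_coeff(E3, j) for j in range(1, 6)])
```

The output is the list of fourth powers `[1, 16, 81, 256, 625]`, as expected for {\mathrm{Li}_{-4}}.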

These particular coefficients also have useful combinatorial interpretations. For instance:

  • The binomial coefficient {\binom{n}{k}} is of course the number of {k}-element subsets of {\{1,\dots,n\}}.
  • The unsigned Stirling numbers {|s(n,k)|} of the first kind are the number of permutations of {\{1,\dots,n\}} with exactly {k} cycles. The signed Stirling numbers {s(n,k)} are then given by the formula {s(n,k) = (-1)^{n-k} |s(n,k)|}.
  • The Stirling numbers {S(n,k)} of the second kind are the number of ways to partition {\{1,\dots,n\}} into {k} non-empty subsets.
  • The Eulerian numbers {A(n,k)} are the number of permutations of {\{1,\dots,n\}} with exactly {k} ascents.

These coefficients behave similarly to each other in several ways. For instance, the binomial coefficients {\binom{n}{k}} obey the well known Pascal identity

\displaystyle  \binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1}

(with the convention that {\binom{n}{k}} vanishes outside of the range {0 \leq k \leq n}). In a similar spirit, the unsigned Stirling numbers {|s(n,k)|} of the first kind obey the identity

\displaystyle  |s(n+1,k)| = n |s(n,k)| + |s(n,k-1)|

and the signed counterparts {s(n,k)} obey the identity

\displaystyle  s(n+1,k) = -n s(n,k) + s(n,k-1).

The Stirling numbers of the second kind {S(n,k)} obey the identity

\displaystyle  S(n+1,k) = k S(n,k) + S(n,k-1)

and the Eulerian numbers {A(n,k)} obey the identity

\displaystyle  A(n+1,k) = (k+1) A(n,k) + (n-k+1) A(n,k-1).
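All five recurrences have the same shape {c(n+1,k) = w\, c(n,k) + w'\, c(n,k-1)}, so one can generate every table with a single routine. The sketch below is my own packaging, and it assumes the initial condition {c(0,0)=1} together with the convention that {c(n,k)} vanishes outside {0 \leq k \leq n} (for the Eulerian numbers this means taking {A(0,0)=1}).

```python
def triangle(rule, size):
    """Rows c(n, 0..n) for n = 0..size-1, built from c(0,0) = 1.

    rule(n, k) returns the weights (w_same, w_left) in
    c(n+1, k) = w_same * c(n, k) + w_left * c(n, k-1).
    """
    rows = [[1]]
    for n in range(size - 1):
        prev = rows[-1]

        def entry(k):
            # c(n, k), with the convention that it vanishes outside 0 <= k <= n
            return prev[k] if 0 <= k < len(prev) else 0

        new_row = []
        for k in range(n + 2):
            w_same, w_left = rule(n, k)
            new_row.append(w_same * entry(k) + w_left * entry(k - 1))
        rows.append(new_row)
    return rows

binomial = triangle(lambda n, k: (1, 1), 5)
stirling1_unsigned = triangle(lambda n, k: (n, 1), 5)
stirling1_signed = triangle(lambda n, k: (-n, 1), 5)
stirling2 = triangle(lambda n, k: (k, 1), 5)
eulerian = triangle(lambda n, k: (k + 1, n - k + 1), 5)

print(binomial[3], stirling1_signed[3], stirling2[3], eulerian[4])
```

The rows printed are `[1, 3, 3, 1]`, `[0, 2, -3, 1]`, `[0, 1, 3, 1]` and `[1, 11, 11, 1, 0]`, in agreement with the examples given earlier (the trailing zero is {A(4,4)}).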

I was pleased to learn this week that the 2019 Abel Prize was awarded to Karen Uhlenbeck. Uhlenbeck laid much of the foundations of modern geometric PDE. One of the few papers I have in this area is in fact a joint paper with Gang Tian extending a famous singularity removal theorem of Uhlenbeck for four-dimensional Yang-Mills connections to higher dimensions. In both these papers, it is crucial to be able to construct “Coulomb gauges” for various connections, and there is a clever trick of Uhlenbeck for doing so, introduced in another important paper of hers, which is absolutely critical in my own paper with Tian. Nowadays it would be considered a standard technique, but it was definitely not so at the time that Uhlenbeck introduced it.

Suppose one has a smooth connection {A} on a (closed) unit ball {B(0,1)} in {{\bf R}^n} for some {n \geq 1}, taking values in some Lie algebra {{\mathfrak g}} associated to a compact Lie group {G}. This connection then has a curvature {F(A)}, defined in coordinates by the usual formula

\displaystyle F(A)_{\alpha \beta} = \partial_\alpha A_\beta - \partial_\beta A_\alpha + [A_\alpha,A_\beta]. \ \ \ \ \ (1)

It is natural to place the curvature in a scale-invariant space such as {L^{n/2}(B(0,1))}, and then the natural space for the connection would be the Sobolev space {W^{n/2,1}(B(0,1))}. It is easy to see from (1) and Sobolev embedding that if {A} is bounded in {W^{n/2,1}(B(0,1))}, then {F(A)} will be bounded in {L^{n/2}(B(0,1))}. One can then ask the converse question: if {F(A)} is bounded in {L^{n/2}(B(0,1))}, is {A} bounded in {W^{n/2,1}(B(0,1))}? This can be viewed as asking whether the curvature equation (1) enjoys “elliptic regularity”.

There is a basic obstruction provided by gauge invariance. For any smooth gauge {U: B(0,1) \rightarrow G} taking values in the Lie group, one can gauge transform {A} to

\displaystyle A^U_\alpha := U^{-1} \partial_\alpha U + U^{-1} A_\alpha U

and then a brief calculation shows that the curvature is conjugated to

\displaystyle F(A^U)_{\alpha \beta} = U^{-1} F(A)_{\alpha \beta} U.

This gauge symmetry does not affect the {L^{n/2}(B(0,1))} norm of the curvature tensor {F(A)}, but can make the connection {A} extremely large in {W^{n/2,1}(B(0,1))}, since there is no control on how wildly {U} can oscillate in space.

However, one can hope to overcome this problem by gauge fixing: perhaps if {F(A)} is bounded in {L^{n/2}(B(0,1))}, then one can make {A} bounded in {W^{n/2,1}(B(0,1))} after applying a gauge transformation. The basic and useful result of Uhlenbeck is that this can be done if the {L^{n/2}} norm of {F(A)} is sufficiently small (and then the conclusion is that {A} is small in {W^{n/2,1}}). (For large connections there is a serious issue related to the Gribov ambiguity.) In my (much) later paper with Tian, we adapted this argument, replacing Lebesgue spaces by Morrey space counterparts. (This result was also independently obtained at about the same time by Meyer and Rivière.)

To make the problem elliptic, one can try to impose the Coulomb gauge condition

\displaystyle \partial^\alpha A_\alpha = 0 \ \ \ \ \ (2)

(also known as the Lorenz gauge or Hodge gauge in various papers), together with a natural boundary condition on {\partial B(0,1)} that will not be discussed further here. This turns (1), (2) into a divergence-curl system that is elliptic at the linear level at least. Indeed if one takes the divergence of (1) using (2) one sees that

\displaystyle \partial^\alpha F(A)_{\alpha \beta} = \Delta A_\beta + \partial^\alpha [A_\alpha,A_\beta] \ \ \ \ \ (3)

and if one could somehow ignore the nonlinear term {\partial^\alpha [A_\alpha,A_\beta]} then we would get the required regularity on {A} by standard elliptic regularity estimates.

The problem is then how to handle the nonlinear term. If we already knew that {A} was small in the right norm {W^{n/2,1}(B(0,1))} then one can use Sobolev embedding, Hölder’s inequality, and elliptic regularity to show that the second term in (3) is small compared to the first term, and so one could then hope to eliminate it by perturbative analysis. However, proving that {A} is small in this norm is exactly what we are trying to prove! So this approach seems circular.

Uhlenbeck’s clever way out of this circularity is a textbook example of what is now known as a “continuity” argument. Instead of trying to work just with the original connection {A}, one works with the rescaled connections {A^{(t)}_\alpha(x) := t A_\alpha(tx)} for {0 \leq t \leq 1}, with associated rescaled curvatures {F(A^{(t)})_{\alpha \beta}(x) = t^2 F(A)_{\alpha \beta}(tx)}. If the original curvature {F(A)} is small in {L^{n/2}} norm (e.g. bounded by some small {\varepsilon>0}), then so are all the rescaled curvatures {F(A^{(t)})}, since the change of variables {y = tx} shows that the {L^{n/2}(B(0,1))} norm of {F(A^{(t)})} equals the {L^{n/2}(B(0,t))} norm of {F(A)}. We want to obtain a Coulomb gauge at time {t=1}; this is difficult to do directly, but it is trivial to obtain a Coulomb gauge at time {t=0}, because the connection vanishes at this time. On the other hand, once one has successfully obtained a Coulomb gauge at some time {t \in [0,1]} with {A^{(t)}} small in the natural norm {W^{n/2,1}} (say bounded by {C \varepsilon} for some constant {C} which is large in absolute terms, but not so large compared with say {1/\varepsilon}), the perturbative argument mentioned earlier (combined with the qualitative hypothesis that {A} is smooth) actually works to show that a Coulomb gauge can also be constructed and be small for all times {t' \in [0,1]} sufficiently close to {t}; furthermore, the perturbative analysis actually shows that the nearby gauges enjoy a slightly better bound on the {W^{n/2,1}} norm, say {C\varepsilon/2} rather than {C\varepsilon}. As a consequence of this, the set of times {t} for which one has a good Coulomb gauge obeying the claimed estimates is both open and closed in {[0,1]}, and also contains {t=0}. Since the unit interval {[0,1]} is connected, it must then also contain {t=1}. This concludes the proof.

One of the lessons I drew from this example is to not be deterred (especially in PDE) by an argument seeming to be circular; if the argument is still sufficiently “nontrivial” in nature, it can often be modified into a usefully non-circular argument that achieves what one wants (possibly under an additional qualitative hypothesis, such as a continuity or smoothness hypothesis).

The celebrated decomposition theorem of Fefferman and Stein shows that every function {f \in \mathrm{BMO}({\bf R}^n)} of bounded mean oscillation can be decomposed in the form

\displaystyle f = f_0 + \sum_{i=1}^n R_i f_i \ \ \ \ \ (1)

 

modulo constants, for some {f_0,f_1,\dots,f_n \in L^\infty({\bf R}^n)}, where {R_i := |\nabla|^{-1} \partial_i} are the Riesz transforms. A technical note here: a function in BMO is defined only up to constants (as well as up to the usual almost everywhere equivalence); related to this, if {f_i} is an {L^\infty({\bf R}^n)} function, then the Riesz transform {R_i f_i} is well defined as an element of {\mathrm{BMO}({\bf R}^n)}, but is also only defined up to constants and almost everywhere equivalence.

The original proof of Fefferman and Stein was indirect (relying for instance on the Hahn-Banach theorem). A constructive proof was later given by Uchiyama, and was in fact the topic of the second post on this blog. A notable feature of Uchiyama’s argument is that the construction is quite nonlinear; the vector-valued function {(f_0,f_1,\dots,f_n)} is defined to take values on a sphere, and the iterative construction to build these functions from {f} involves repeatedly projecting a potential approximant to this function to the sphere (also, the high-frequency components of this approximant are constructed in a manner that depends nonlinearly on the low-frequency components, which is a type of technique that has become increasingly common in analysis and PDE in recent years).

It is natural to ask whether the Fefferman-Stein decomposition (1) can be made linear in {f}, in the sense that each of the {f_i, i=0,\dots,n} depends linearly on {f}. Strictly speaking, this is easily accomplished using the axiom of choice: take a Hamel basis of {\mathrm{BMO}({\bf R}^n)}, choose a decomposition (1) for each element of this basis, and then extend linearly to all finite linear combinations of these basis functions, which then cover {\mathrm{BMO}({\bf R}^n)} by definition of Hamel basis. But these linear operations have no reason to be continuous as a map from {\mathrm{BMO}({\bf R}^n)} to {L^\infty({\bf R}^n)}. So the correct question is whether the decomposition can be made continuously linear (or equivalently, boundedly linear) in {f}, that is to say whether there exist continuous linear transformations {T_i: \mathrm{BMO}({\bf R}^n) \rightarrow L^\infty({\bf R}^n)} such that

\displaystyle f = T_0 f + \sum_{i=1}^n R_i T_i f \ \ \ \ \ (2)

modulo constants for all {f \in \mathrm{BMO}({\bf R}^n)}. Note from the open mapping theorem that one can choose the functions {f_0,\dots,f_n} to depend in a bounded fashion on {f} (thus {\|f_i\|_{L^\infty} \leq C \|f\|_{\mathrm{BMO}}} for some constant {C}); however, the open mapping theorem does not guarantee linearity. Using a result of Bartle and Graves one can also make the {f_i} depend continuously on {f}, but again the dependence is not guaranteed to be linear.

It is generally accepted folklore that continuous linear dependence is impossible, but I had difficulty recently tracking down an explicit proof of this assertion in the literature (if anyone knows of a reference, I would be glad to know of it). The closest I found was a proof of a similar statement in this paper of Bourgain and Brezis, which I was able to adapt to establish the current claim. The basic idea is to average over the symmetries of the decomposition, which in the case of (1) are translation invariance, rotation invariance, and dilation invariance. This effectively makes the operators {T_0,T_1,\dots,T_n} invariant under all these symmetries, which forces them to themselves be linear combinations of the identity and Riesz transform operators; however, no such non-trivial linear combination maps {\mathrm{BMO}} to {L^\infty}, and the claim follows. Formal details of this argument (which we phrase in a dual form in order to avoid some technicalities) appear below the fold.
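
To illustrate the final step, here is a heuristic sketch in a hypothetical simplest case, in which the averaged operators are assumed to take the form {T_0 = a} and {T_i = b R_i} for some real constants {a, b} (this special form is an assumption made for illustration, not the general case treated in the formal argument). Using the identity {\sum_{i=1}^n R_i^2 = -1}, which follows from the Fourier multiplier {i \xi_i/|\xi|} of {R_i}, the decomposition (2) would read

\displaystyle  f = a f + b \sum_{i=1}^n R_i^2 f = (a - b) f

modulo constants, forcing {a - b = 1}. On the other hand, a multiple {a} of the identity can only map {\mathrm{BMO}} to {L^\infty} when {a = 0} (since {\mathrm{BMO}} contains unbounded functions such as {\log |x|}), and {b R_i} can only do so when {b = 0} (since {R_i} does not even map {L^\infty} to {L^\infty}). Thus {a = b = 0}, which is incompatible with {a - b = 1}.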
