You are currently browsing the tag archive for the ‘heat flow’ tag.

I have just uploaded to the arXiv my paper “Sharp bounds for multilinear curved Kakeya, restriction and oscillatory integral estimates away from the endpoint“, submitted to Mathematika. In this paper I return (after more than a decade’s absence) to one of my first research interests, namely the Kakeya and restriction family of conjectures. The starting point is the following “multilinear Kakeya estimate” first established in the non-endpoint case by Bennett, Carbery, and myself, and then in the endpoint case by Guth (with further proofs and extensions by Bourgain-Guth and Carbery-Valdimarsson:

Theorem 1 (Multilinear Kakeya estimate) Let {\delta > 0} be a radius. For each {j = 1,\dots,d}, let {\mathbb{T}_j} denote a finite family of infinite tubes {T_j} in {{\bf R}^d} of radius {\delta}. Assume the following axiom:

  • (i) (Transversality) whenever {T_j \in \mathbb{T}_j} is oriented in the direction of a unit vector {n_j} for {j =1,\dots,d}, we have

    \displaystyle  \left|\bigwedge_{j=1}^d n_j\right| \geq A^{-1}

    for some {A>0}, where we use the usual Euclidean norm on the wedge product {\bigwedge^d {\bf R}^d}.

Then, for any {p \geq \frac{1}{d-1}}, one has

\displaystyle  \left\| \prod_{j=1}^d \sum_{T_j \in \mathbb{T}_j} 1_{T_j} \right\|_{L^p({\bf R}^d)} \lesssim_{A,p} \delta^{\frac{d}{p}} \prod_{j \in [d]} \# \mathbb{T}_j. \ \ \ \ \ (1)

where {L^p({\bf R}^d)} are the usual Lebesgue norms with respect to Lebesgue measure, {1_{T_j}} denotes the indicator function of {T_j}, and {\# \mathbb{T}_j} denotes the cardinality of {\mathbb{T}_j}.

The original proof of this proceeded using a heat flow monotonicity method, which in my previous post I reinterpreted using a “virtual integration” concept on a fractional Cartesian product space. It turns out that this machinery is somewhat flexible, and can be used to establish some other estimates of this type. The first result of this paper is to extend the above theorem to the curved setting, in which one localises to a ball of radius {O(1)} (and sets {\delta} to be small), but allows the tubes {T_j} to be curved in a {C^2} fashion. If one runs the heat flow monotonicity argument, one now picks up some additional error terms arising from the curvature, but as the spatial scale approaches zero, the tubes become increasingly linear, and as such the error terms end up being an integrable multiple of the main term, at which point one can conclude by Gronwall’s inequality (actually for technical reasons we use a bootstrap argument instead of Gronwall). A key point in this approach is that one obtains optimal bounds (not losing factors of {\delta^{-\varepsilon}} or {\log^{O(1)} \frac{1}{\delta}}), so long as one stays away from the endpoint case {p=\frac{1}{d-1}} (which does not seem to be easily treatable by the heat flow methods). Previously, the paper of Bennett, Carbery, and myself was able to use an induction on scale argument to obtain a curved multilinear Kakeya estimate losing a factor of {\log^{O(1)} \frac{1}{\delta}} (after optimising the argument); later arguments of Bourgain-Guth and Carbery-Valdimarsson, based on algebraic topology methods, could also obtain a curved multilinear Kakeya estimate without such losses, but only in the algebraic case when the tubes were neighbourhoods of algebraic curves of bounded degree.

Perhaps more interestingly, we are also able to extend the heat flow monotonicity method to apply directly to the multilinear restriction problem, giving the following global multilinear restriction estimate:

Theorem 2 (Multilinear restriction theorem) Let {\frac{1}{d-1} < p \leq \infty} be an exponent, and let {A \geq 2} be a parameter. Let {M} be a sufficiently large natural number, depending only on {d}. For {j \in [d]}, let {U_j} be an open subset of {B^{d-1}(0,A)}, and let {h_j: U_j \rightarrow {\bf R}} be a smooth function obeying the following axioms:

  • (i) (Regularity) For each {j \in [d]} and {\xi \in U_j}, one has

    \displaystyle  |\nabla_\xi^{\otimes m} \otimes h_j(\xi)| \leq A \ \ \ \ \ (2)

    for all {1 \leq m \leq M}.

  • (ii) (Transversality) One has

    \displaystyle  \left| \bigwedge_{j \in [d]} (-\nabla_\xi h_j(\xi_j),1) \right| \geq A^{-1}

    whenever {\xi_j \in U_j} for {j \in [d]}.

Let {U_{j,1/A} \subset U_j} be the sets

\displaystyle  U_{j,1/A} := \{ \xi \in U_j: B^{d-1}(\xi,1/A) \subset U_j \}. \ \ \ \ \ (3)

Then one has

\displaystyle  \left\| \prod_{j \in [d]} {\mathcal E}_j f_j \right\|_{L^{2p}({\bf R}^d)} \leq A^{O(1)} \left(d-1-\frac{1}{p}\right)^{-O(1)} \prod_{j \in [d]} \|f_j \|_{L^2(U_{j,1/A})}

for any {f_j \in L^2(U_{j,1/A} \rightarrow {\bf C})}, {j \in [d]}, extended by zero outside of {U_{j,1/A}}, and {{\mathcal E}_j} denotes the extension operator

\displaystyle  {\mathcal E}_j f_j( x', x_d ) := \int_{U_j} e^{2\pi i (x' \xi^T + x_d h_j(\xi))} f_j(\xi)\ d\xi.

Local versions of such estimate, in which {L^{2p}({\bf R}^d)} is replaced with {L^{2p}(B^d(0,R))} for some {R \geq 2}, and one accepts a loss of the form {\log^{O(1)} R}, were already established by Bennett, Carbery, and myself using an induction on scale argument. In a later paper of Bourgain-Guth these losses were removed by “epsilon removal lemmas” to recover Theorme 2, but only in the case when all the hypersurfaces involved had curvatures bounded away from zero.

There are two main new ingredients in the proof of Theorem 2. The first is to replace the usual induction on scales scheme to establish multilinear restriction by a “ball inflation” induction on scales scheme that more closely resembles the proof of decoupling theorems. In particular, we actually prove the more general family of estimates

\displaystyle  \left\| \prod_{j \in [d]} E_{r}[{\mathcal E}_j f_j] \right\|_{L^{p}({\bf R}^d)} \leq A^{O(1)} \left(d-1 - \frac{1}{p}\right)^{O(1)} r^{\frac{d}{p}} \prod_{j \in [d]} \| f_j \|_{L^2(U_{j,1/A})}^2

where {E_r} denotes the local energies

\displaystyle  E_{r}[f](x',x_d) := \int_{B^{d-1}(x',r)} |f(y',x_d)|^2\ dy'

(actually for technical reasons it is more convenient to use a smoother weight than the strict cutoff to the disk {B^{d-1}(x',r)}). With logarithmic losses, it is not difficult to establish this estimate by an upward induction on {r}. To avoid such losses we use the heat flow monotonicity method. Here we run into the issue that the extension operators {{\mathcal E}_j f_j} are complex-valued rather than non-negative, and thus would not be expected to obey many good montonicity properties. However, the local energies {E_r[{\mathcal E}_j f_j]} can be expressed in terms of the magnitude squared of what is essentially the Gabor transform of {{\mathcal E}_j f_j}, and these are non-negative; furthermore, the dispersion relation associated to the extension operators {{\mathcal E}_j f_j} implies that these Gabor transforms propagate along tubes, so that the situation becomes quite similar (up to several additional lower order error terms) to that in the multilinear Kakeya problem. (This can be viewed as a continuous version of the usual wave packet decomposition method used to relate restriction and Kakeya problems, which when combined with the heat flow monotonicity method allows for one to use a continuous version of induction on scales methods that do not concede any logarithmic factors.)

Finally, one can combine the curved multilinear Kakeya result with the multilinear restriction result to obtain estimates for multilinear oscillatory integrals away from the endpoint. Again, this sort of implication was already established in the previous paper of Bennett, Carbery, and myself, but the arguments there had some epsilon losses in the exponents; here we were able to run the argument more carefully and avoid these losses.

Let {\Omega} be some domain (such as the real numbers). For any natural number {p}, let {L(\Omega^p)_{sym}} denote the space of symmetric real-valued functions {F^{(p)}: \Omega^p \rightarrow {\bf R}} on {p} variables {x_1,\dots,x_p \in \Omega}, thus

\displaystyle F^{(p)}(x_{\sigma(1)},\dots,x_{\sigma(p)}) = F^{(p)}(x_1,\dots,x_p)

for any permutation {\sigma: \{1,\dots,p\} \rightarrow \{1,\dots,p\}}. For instance, for any natural numbers {k,p}, the elementary symmetric polynomials

\displaystyle e_k^{(p)}(x_1,\dots,x_p) = \sum_{1 \leq i_1 < i_2 < \dots < i_k \leq p} x_{i_1} \dots x_{i_k}

will be an element of {L({\bf R}^p)_{sym}}. With the pointwise product operation, {L(\Omega^p)_{sym}} becomes a commutative real algebra. We include the case {p=0}, in which case {L(\Omega^0)_{sym}} consists solely of the real constants.

Given two natural numbers {k,p}, one can “lift” a symmetric function {F^{(k)} \in L(\Omega^k)_{sym}} of {k} variables to a symmetric function {[F^{(k)}]_{k \rightarrow p} \in L(\Omega^p)_{sym}} of {p} variables by the formula

\displaystyle [F^{(k)}]_{k \rightarrow p}(x_1,\dots,x_p) = \sum_{1 \leq i_1 < i_2 < \dots < i_k \leq p} F^{(k)}(x_{i_1}, \dots, x_{i_k})

\displaystyle = \frac{1}{k!} \sum_\pi F^{(k)}( x_{\pi(1)}, \dots, x_{\pi(k)} )

where {\pi} ranges over all injections from {\{1,\dots,k\}} to {\{1,\dots,p\}} (the latter formula making it clearer that {[F^{(k)}]_{k \rightarrow p}} is symmetric). Thus for instance

\displaystyle [F^{(1)}(x_1)]_{1 \rightarrow p} = \sum_{i=1}^p F^{(1)}(x_i)

\displaystyle [F^{(2)}(x_1,x_2)]_{2 \rightarrow p} = \sum_{1 \leq i < j \leq p} F^{(2)}(x_i,x_j)


\displaystyle e_k^{(p)}(x_1,\dots,x_p) = [x_1 \dots x_k]_{k \rightarrow p}.

Also we have

\displaystyle [1]_{k \rightarrow p} = \binom{p}{k} = \frac{p(p-1)\dots(p-k+1)}{k!}.

With these conventions, we see that {[F^{(k)}]_{k \rightarrow p}} vanishes for {p=0,\dots,k-1}, and is equal to {F} if {k=p}. We also have the transitivity

\displaystyle [F^{(k)}]_{k \rightarrow p} = \frac{1}{\binom{p-k}{p-l}} [[F^{(k)}]_{k \rightarrow l}]_{l \rightarrow p}

if {k \leq l \leq p}.

The lifting map {[]_{k \rightarrow p}} is a linear map from {L(\Omega^k)_{sym}} to {L(\Omega^p)_{sym}}, but it is not a ring homomorphism. For instance, when {\Omega={\bf R}}, one has

\displaystyle [x_1]_{1 \rightarrow p} [x_1]_{1 \rightarrow p} = (\sum_{i=1}^p x_i)^2 \ \ \ \ \ (1)


\displaystyle = \sum_{i=1}^p x_i^2 + 2 \sum_{1 \leq i < j \leq p} x_i x_j

\displaystyle = [x_1^2]_{1 \rightarrow p} + 2 [x_1 x_2]_{1 \rightarrow p}

\displaystyle \neq [x_1^2]_{1 \rightarrow p}.

In general, one has the identity

\displaystyle [F^{(k)}(x_1,\dots,x_k)]_{k \rightarrow p} [G^{(l)}(x_1,\dots,x_l)]_{l \rightarrow p} = \sum_{k,l \leq m \leq k+l} \frac{1}{k! l!} \ \ \ \ \ (2)


\displaystyle [\sum_{\pi, \rho} F^{(k)}(x_{\pi(1)},\dots,x_{\pi(k)}) G^{(l)}(x_{\rho(1)},\dots,x_{\rho(l)})]_{m \rightarrow p}

for all natural numbers {k,l,p} and {F^{(k)} \in L(\Omega^k)_{sym}}, {G^{(l)} \in L(\Omega^l)_{sym}}, where {\pi, \rho} range over all injections {\pi: \{1,\dots,k\} \rightarrow \{1,\dots,m\}}, {\rho: \{1,\dots,l\} \rightarrow \{1,\dots,m\}} with {\pi(\{1,\dots,k\}) \cup \rho(\{1,\dots,l\}) = \{1,\dots,m\}}. Combinatorially, the identity (2) follows from the fact that given any injections {\tilde \pi: \{1,\dots,k\} \rightarrow \{1,\dots,p\}} and {\tilde \rho: \{1,\dots,l\} \rightarrow \{1,\dots,p\}} with total image {\tilde \pi(\{1,\dots,k\}) \cup \tilde \rho(\{1,\dots,l\})} of cardinality {m}, one has {k,l \leq m \leq k+l}, and furthermore there exist precisely {m!} triples {(\pi, \rho, \sigma)} of injections {\pi: \{1,\dots,k\} \rightarrow \{1,\dots,m\}}, {\rho: \{1,\dots,l\} \rightarrow \{1,\dots,m\}}, {\sigma: \{1,\dots,m\} \rightarrow \{1,\dots,p\}} such that {\tilde \pi = \sigma \circ \pi} and {\tilde \rho = \sigma \circ \rho}.

Example 1 When {\Omega = {\bf R}}, one has

\displaystyle [x_1 x_2]_{2 \rightarrow p} [x_1]_{1 \rightarrow p} = [\frac{1}{2! 1!}( 2 x_1^2 x_2 + 2 x_1 x_2^2 )]_{2 \rightarrow p} + [\frac{1}{2! 1!} 6 x_1 x_2 x_3]_{3 \rightarrow p}

\displaystyle = [x_1^2 x_2 + x_1 x_2^2]_{2 \rightarrow p} + [3x_1 x_2 x_3]_{3 \rightarrow p}

which is just a restatement of the identity

\displaystyle (\sum_{i < j} x_i x_j) (\sum_k x_k) = \sum_{i<j} x_i^2 x_j + x_i x_j^2 + \sum_{i < j < k} 3 x_i x_j x_k.

Note that the coefficients appearing in (2) do not depend on the final number of variables {p}. We may therefore abstract the role of {p} from the law (2) by introducing the real algebra {L(\Omega^*)_{sym}} of formal sums

\displaystyle F^{(*)} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}

where for each {k}, {F^{(k)}} is an element of {L(\Omega^k)_{sym}} (with only finitely many of the {F^{(k)}} being non-zero), and with the formal symbol {[]_{k \rightarrow *}} being formally linear, thus

\displaystyle [F^{(k)}]_{k \rightarrow *} + [G^{(k)}]_{k \rightarrow *} := [F^{(k)} + G^{(k)}]_{k \rightarrow *}


\displaystyle c [F^{(k)}]_{k \rightarrow *} := [cF^{(k)}]_{k \rightarrow *}

for {F^{(k)}, G^{(k)} \in L(\Omega^k)_{sym}} and scalars {c \in {\bf R}}, and with multiplication given by the analogue

\displaystyle [F^{(k)}(x_1,\dots,x_k)]_{k \rightarrow *} [G^{(l)}(x_1,\dots,x_l)]_{l \rightarrow *} = \sum_{k,l \leq m \leq k+l} \frac{1}{k! l!} \ \ \ \ \ (3)


\displaystyle [\sum_{\pi, \rho} F^{(k)}(x_{\pi(1)},\dots,x_{\pi(k)}) G^{(l)}(x_{\rho(1)},\dots,x_{\rho(l)})]_{m \rightarrow *}

of (2). Thus for instance, in this algebra {L(\Omega^*)_{sym}} we have

\displaystyle [x_1]_{1 \rightarrow *} [x_1]_{1 \rightarrow *} = [x_1^2]_{1 \rightarrow *} + 2 [x_1 x_2]_{2 \rightarrow *}


\displaystyle [x_1 x_2]_{2 \rightarrow *} [x_1]_{1 \rightarrow *} = [x_1^2 x_2 + x_1 x_2^2]_{2 \rightarrow *} + [3 x_1 x_2 x_3]_{3 \rightarrow *}.

Informally, {L(\Omega^*)_{sym}} is an abstraction (or “inverse limit”) of the concept of a symmetric function of an unspecified number of variables, which are formed by summing terms that each involve only a bounded number of these variables at a time. One can check (somewhat tediously) that {L(\Omega^*)_{sym}} is indeed a commutative real algebra, with a unit {[1]_{0 \rightarrow *}}. (I do not know if this algebra has previously been studied in the literature; it is somewhat analogous to the abstract algebra of finite linear combinations of Schur polynomials, with multiplication given by a Littlewood-Richardson rule. )

For natural numbers {p}, there is an obvious specialisation map {[]_{* \rightarrow p}} from {L(\Omega^*)_{sym}} to {L(\Omega^p)_{sym}}, defined by the formula

\displaystyle [\sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}]_{* \rightarrow p} := \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow p}.

Thus, for instance, {[]_{* \rightarrow p}} maps {[x_1]_{1 \rightarrow *}} to {[x_1]_{1 \rightarrow p}} and {[x_1 x_2]_{2 \rightarrow *}} to {[x_1 x_2]_{2 \rightarrow p}}. From (2) and (3) we see that this map {[]_{* \rightarrow p}: L(\Omega^*)_{sym} \rightarrow L(\Omega^p)_{sym}} is an algebra homomorphism, even though the maps {[]_{k \rightarrow *}: L(\Omega^k)_{sym} \rightarrow L(\Omega^*)_{sym}} and {[]_{k \rightarrow p}: L(\Omega^k)_{sym} \rightarrow L(\Omega^p)_{sym}} are not homomorphisms. By inspecting the {p^{th}} component of {L(\Omega^*)_{sym}} we see that the homomorphism {[]_{* \rightarrow p}} is in fact surjective.

Now suppose that we have a measure {\mu} on the space {\Omega}, which then induces a product measure {\mu^p} on every product space {\Omega^p}. To avoid degeneracies we will assume that the integral {\int_\Omega \mu} is strictly positive. Assuming suitable measurability and integrability hypotheses, a function {F \in L(\Omega^p)_{sym}} can then be integrated against this product measure to produce a number

\displaystyle \int_{\Omega^p} F\ d\mu^p.

In the event that {F} arises as a lift {[F^{(k)}]_{k \rightarrow p}} of another function {F^{(k)} \in L(\Omega^k)_{sym}}, then from Fubini’s theorem we obtain the formula

\displaystyle \int_{\Omega^p} F\ d\mu^p = \binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ d\mu)^{p-k}.

Thus for instance, if {\Omega={\bf R}},

\displaystyle \int_{{\bf R}^p} [x_1]_{1 \rightarrow p}\ d\mu^p = p (\int_{\bf R} x\ d\mu(x)) (\int_{\bf R} \mu)^{p-1} \ \ \ \ \ (4)



\displaystyle \int_{{\bf R}^p} [x_1 x_2]_{2 \rightarrow p}\ d\mu^p = \binom{p}{2} (\int_{{\bf R}^2} x_1 x_2\ d\mu(x_1) d\mu(x_2)) (\int_{\bf R} \mu)^{p-2}. \ \ \ \ \ (5)


On summing, we see that if

\displaystyle F^{(*)} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}

is an element of the formal algebra {L(\Omega^*)_{sym}}, then

\displaystyle \int_{\Omega^p} [F^{(*)}]_{* \rightarrow p}\ d\mu^p = \sum_{k=0}^\infty \binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ d\mu)^{p-k}. \ \ \ \ \ (6)


Note that by hypothesis, only finitely many terms on the right-hand side are non-zero.

Now for a key observation: whereas the left-hand side of (6) only makes sense when {p} is a natural number, the right-hand side is meaningful when {p} takes a fractional value (or even when it takes negative or complex values!), interpreting the binomial coefficient {\binom{p}{k}} as a polynomial {\frac{p(p-1) \dots (p-k+1)}{k!}} in {p}. As such, this suggests a way to introduce a “virtual” concept of a symmetric function on a fractional power space {\Omega^p} for such values of {p}, and even to integrate such functions against product measures {\mu^p}, even if the fractional power {\Omega^p} does not exist in the usual set-theoretic sense (and {\mu^p} similarly does not exist in the usual measure-theoretic sense). More precisely, for arbitrary real or complex {p}, we now define {L(\Omega^p)_{sym}} to be the space of abstract objects

\displaystyle F^{(p)} = [F^{(*)}]_{* \rightarrow p} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow p}

with {F^{(*)} \in L(\Omega^*)_{sym}} and {[]_{* \rightarrow p}} (and {[]_{k \rightarrow p}} now interpreted as formal symbols, with the structure of a commutative real algebra inherited from {L(\Omega^*)_{sym}}, thus

\displaystyle [F^{(*)}]_{* \rightarrow p} + [G^{(*)}]_{* \rightarrow p} := [F^{(*)} + G^{(*)}]_{* \rightarrow p}

\displaystyle c [F^{(*)}]_{* \rightarrow p} := [c F^{(*)}]_{* \rightarrow p}

\displaystyle [F^{(*)}]_{* \rightarrow p} [G^{(*)}]_{* \rightarrow p} := [F^{(*)} G^{(*)}]_{* \rightarrow p}.

In particular, the multiplication law (2) continues to hold for such values of {p}, thanks to (3). Given any measure {\mu} on {\Omega}, we formally define a measure {\mu^p} on {\Omega^p} with regards to which we can integrate elements {F^{(p)}} of {L(\Omega^p)_{sym}} by the formula (6) (providing one has sufficient measurability and integrability to make sense of this formula), thus providing a sort of “fractional dimensional integral” for symmetric functions. Thus, for instance, with this formalism the identities (4), (5) now hold for fractional values of {p}, even though the formal space {{\bf R}^p} no longer makes sense as a set, and the formal measure {\mu^p} no longer makes sense as a measure. (The formalism here is somewhat reminiscent of the technique of dimensional regularisation employed in the physical literature in order to assign values to otherwise divergent integrals. See also this post for an unrelated abstraction of the integration concept involving integration over supercommutative variables (and in particular over fermionic variables).)

Example 2 Suppose {\mu} is a probability measure on {\Omega}, and {X: \Omega \rightarrow {\bf R}} is a random variable; on any power {\Omega^k}, we let {X_1,\dots,X_k: \Omega^k \rightarrow {\bf R}} be the usual independent copies of {X} on {\Omega^k}, thus {X_j(\omega_1,\dots,\omega_k) := X(\omega_j)} for {(\omega_1,\dots,\omega_k) \in \Omega^k}. Then for any real or complex {p}, the formal integral

\displaystyle \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p

can be evaluated by first using the identity

\displaystyle [X_1]_{1 \rightarrow p}^2 = [X_1^2]_{1 \rightarrow p} + 2[X_1 X_2]_{2 \rightarrow p}

(cf. (1)) and then using (6) and the probability measure hypothesis {\int_\Omega\ d\mu = 1} to conclude that

\displaystyle \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p = \binom{p}{1} \int_{\Omega} X^2\ d\mu + 2 \binom{p}{2} \int_{\Omega^2} X_1 X_2\ d\mu^2

\displaystyle = p (\int_\Omega X^2\ d\mu - (\int_\Omega X\ d\mu)^2) + p^2 (\int_\Omega X\ d\mu)^2

or in probabilistic notation

\displaystyle \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p = p \mathbf{Var}(X) + p^2 \mathbf{E}(X)^2. \ \ \ \ \ (7)


For {p} a natural number, this identity has the probabilistic interpretation

\displaystyle \mathbf{E}( X_1 + \dots + X_p)^2 = p \mathbf{Var}(X) + p^2 \mathbf{E}(X)^2 \ \ \ \ \ (8)


whenever {X_1,\dots,X_p} are jointly independent copies of {X}, which reflects the well known fact that the sum {X_1 + \dots + X_p} has expectation {p \mathbf{E} X} and variance {p \mathbf{Var}(X)}. One can thus view (7) as an abstract generalisation of (8) to the case when {p} is fractional, negative, or even complex, despite the fact that there is no sensible way in this case to talk about {p} independent copies {X_1,\dots,X_p} of {X} in the standard framework of probability theory.

In this particular case, the quantity (7) is non-negative for every nonnegative {p}, which looks plausible given the form of the left-hand side. Unfortunately, this sort of non-negativity does not always hold; for instance, if {X} has mean zero, one can check that

\displaystyle \int_{\Omega^p} [X_1]_{1 \rightarrow p}^4\ d\mu^p = p \mathbf{Var}(X^2) + p(3p-2) (\mathbf{E}(X^2))^2

and the right-hand side can become negative for {p < 2/3}. This is a shame, because otherwise one could hope to start endowing {L(X^p)_{sym}} with some sort of commutative von Neumann algebra type structure (or the abstract probability structure discussed in this previous post) and then interpret it as a genuine measure space rather than as a virtual one. (This failure of positivity is related to the fact that the characteristic function of a random variable, when raised to the {p^{th}} power, need not be a characteristic function of any random variable once {p} is no longer a natural number: “fractional convolution” does not preserve positivity!) However, one vestige of positivity remains: if {F: \Omega \rightarrow {\bf R}} is non-negative, then so is

\displaystyle \int_{\Omega^p} [F]_{1 \rightarrow p}\ d\mu^p = p (\int_\Omega F\ d\mu) (\int_\Omega\ d\mu)^{p-1}.

One can wonder what the point is to all of this abstract formalism and how it relates to the rest of mathematics. For me, this formalism originated implicitly in an old paper I wrote with Jon Bennett and Tony Carbery on the multilinear restriction and Kakeya conjectures, though we did not have a good language for working with it at the time, instead working first with the case of natural number exponents {p} and appealing to a general extrapolation theorem to then obtain various identities in the fractional {p} case. The connection between these fractional dimensional integrals and more traditional integrals ultimately arises from the simple identity

\displaystyle (\int_\Omega\ d\mu)^p = \int_{\Omega^p}\ d\mu^p

(where the right-hand side should be viewed as the fractional dimensional integral of the unit {[1]_{0 \rightarrow p}} against {\mu^p}). As such, one can manipulate {p^{th}} powers of ordinary integrals using the machinery of fractional dimensional integrals. A key lemma in this regard is

Lemma 3 (Differentiation formula) Suppose that a positive measure {\mu = \mu(t)} on {\Omega} depends on some parameter {t} and varies by the formula

\displaystyle \frac{d}{dt} \mu(t) = a(t) \mu(t) \ \ \ \ \ (9)


for some function {a(t): \Omega \rightarrow {\bf R}}. Let {p} be any real or complex number. Then, assuming sufficient smoothness and integrability of all quantities involved, we have

\displaystyle \frac{d}{dt} \int_{\Omega^p} F^{(p)}\ d\mu(t)^p = \int_{\Omega^p} F^{(p)} [a(t)]_{1 \rightarrow p}\ d\mu(t)^p \ \ \ \ \ (10)


for all {F^{(p)} \in L(\Omega^p)_{sym}} that are independent of {t}. If we allow {F^{(p)}(t)} to now depend on {t} also, then we have the more general total derivative formula

\displaystyle \frac{d}{dt} \int_{\Omega^p} F^{(p)}(t)\ d\mu(t)^p \ \ \ \ \ (11)


\displaystyle = \int_{\Omega^p} \frac{d}{dt} F^{(p)}(t) + F^{(p)}(t) [a(t)]_{1 \rightarrow p}\ d\mu(t)^p,

again assuming sufficient amounts of smoothness and regularity.

Proof: We just prove (10), as (11) then follows by same argument used to prove the usual product rule. By linearity it suffices to verify this identity in the case {F^{(p)} = [F^{(k)}]_{k \rightarrow p}} for some symmetric function {F^{(k)} \in L(\Omega^k)_{sym}} for a natural number {k}. By (6), the left-hand side of (10) is then

\displaystyle \frac{d}{dt} [\binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu(t)^k) (\int_\Omega\ d\mu(t))^{p-k}]. \ \ \ \ \ (12)


Differentiating under the integral sign using (9) we have

\displaystyle \frac{d}{dt} \int_\Omega\ d\mu(t) = \int_\Omega\ a(t)\ d\mu(t)

and similarly

\displaystyle \frac{d}{dt} \int_{\Omega^k} F^{(k)}\ d\mu(t)^k = \int_{\Omega^k} F^{(k)}(a_1+\dots+a_k)\ d\mu(t)^k

where {a_1,\dots,a_k} are the standard {k} copies of {a = a(t)} on {\Omega^k}:

\displaystyle a_j(\omega_1,\dots,\omega_k) := a(\omega_j).

By the product rule, we can thus expand (12) as

\displaystyle \binom{p}{k} (\int_{\Omega^k} F^{(k)}(a_1+\dots+a_k)\ d\mu^k ) (\int_\Omega\ d\mu)^{p-k}

\displaystyle + \binom{p}{k} (p-k) (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ a\ d\mu) (\int_\Omega\ d\mu)^{p-k-1}

where we have suppressed the dependence on {t} for brevity. Since {\binom{p}{k} (p-k) = \binom{p}{k+1} (k+1)}, we can write this expression using (6) as

\displaystyle \int_{\Omega^p} [F^{(k)} (a_1 + \dots + a_k)]_{k \rightarrow p} + [ F^{(k)} \ast a ]_{k+1 \rightarrow p}\ d\mu^p

where {F^{(k)} \ast a \in L(\Omega^{k+1})_{sym}} is the symmetric function

\displaystyle F^{(k)} \ast a(\omega_1,\dots,\omega_{k+1}) := \sum_{j=1}^{k+1} F^{(k)}(\omega_1,\dots,\omega_{j-1},\omega_{j+1} \dots \omega_{k+1}) a(\omega_j).

But from (2) one has

\displaystyle [F^{(k)} (a_1 + \dots + a_k)]_{k \rightarrow p} + [ F^{(k)} \ast a ]_{k+1 \rightarrow p} = [F^{(k)}]_{k \rightarrow p} [a]_{1 \rightarrow p}

and the claim follows. \Box

Remark 4 It is also instructive to prove this lemma in the special case when {p} is a natural number, in which case the fractional dimensional integral {\int_{\Omega^p} F^{(p)}\ d\mu(t)^p} can be interpreted as a classical integral. In this case, the identity (10) is immediate from applying the product rule to (9) to conclude that

\displaystyle \frac{d}{dt} d\mu(t)^p = [a(t)]_{1 \rightarrow p} d\mu(t)^p.

One could in fact derive (10) for arbitrary real or complex {p} from the case when {p} is a natural number by an extrapolation argument; see the appendix of my paper with Bennett and Carbery for details.

Let us give a simple PDE application of this lemma as illustration:

Proposition 5 (Heat flow monotonicity) Let {u: [0,+\infty) \times {\bf R}^d \rightarrow {\bf R}} be a solution to the heat equation {u_t = \Delta u} with initial data {\mu_0} a rapidly decreasing finite non-negative Radon measure, or more explicitly

\displaystyle u(t,x) = \frac{1}{(4\pi t)^{d/2}} \int_{{\bf R}^d} e^{-|x-y|^2/4t}\ d\mu_0(y)

for al {t>0}. Then for any {p>0}, the quantity

\displaystyle Q_p(t) := t^{\frac{d}{2} (p-1)} \int_{{\bf R}^d} u(t,x)^p\ dx

is monotone non-decreasing in {t \in (0,+\infty)} for {1 < p < \infty}, constant for {p=1}, and monotone non-increasing for {0 < p < 1}.

Proof: By a limiting argument we may assume that {d\mu_0} is absolutely continuous, with Radon-Nikodym derivative a test function; this is more than enough regularity to justify the arguments below.

For any {(t,x) \in (0,+\infty) \times {\bf R}^d}, let {\mu(t,x)} denote the Radon measure

\displaystyle d\mu(t,x)(y) := \frac{1}{(4\pi)^{d/2}} e^{-|x-y|^2/4t}\ d\mu_0(y).

Then the quantity {Q_p(t)} can be written as a fractional dimensional integral

\displaystyle Q_p(t) = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p}\ d\mu(t,x)^p\ dx.

Observe that

\displaystyle \frac{\partial}{\partial t} d\mu(t,x) = \frac{|x-y|^2}{4t^2} d\mu(t,x)

and thus by Lemma 3 and the product rule

\displaystyle \frac{d}{dt} Q_p(t) = -\frac{d}{2t} Q_p(t) + t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} [\frac{|x-y|^2}{4t^2}]_{1 \rightarrow p} d\mu(t,x)^p\ dx \ \ \ \ \ (13)


where we use {y} for the variable of integration in the factor space {{\bf R}^d} of {({\bf R}^d)^p}.

To simplify this expression we will take advantage of integration by parts in the {x} variable. Specifically, in any direction {x_j}, we have

\displaystyle \frac{\partial}{\partial x_j} d\mu(t,x) = -\frac{x_j-y_j}{2t} d\mu(t,x)

and hence by Lemma 3

\displaystyle \frac{\partial}{\partial x_j} \int_{({\bf R}^d)^p}\ d\mu(t,x)^p\ dx = - \int_{({\bf R}^d)^p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx.

Multiplying by {x_j} and integrating by parts, we see that

\displaystyle d Q_p(t) = \int_{{\bf R}^d} \int_{({\bf R}^d)^p} x_j [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx

\displaystyle = \int_{{\bf R}^d} \int_{({\bf R}^d)^p} x_j [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx

where we use the Einstein summation convention in {j}. Similarly, if {F_j(y)} is any reasonable function depending only on {y}, we have

\displaystyle \frac{\partial}{\partial x_j} \int_{({\bf R}^d)^p}[F_j(y)]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx

\displaystyle = - \int_{({\bf R}^d)^p} [F_j(y)]_{1 \rightarrow p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx

and hence on integration by parts

\displaystyle 0 = \int_{{\bf R}^d} \int_{({\bf R}^d)^p} [F_j(y) \frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx.

We conclude that

\displaystyle \frac{d}{2t} Q_p(t) = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} (x_j - [F_j(y)]_{1 \rightarrow p}) [\frac{(x_j-y_j)}{4t}]_{1 \rightarrow p} d\mu(t,x)^p\ dx

and thus by (13)

\displaystyle \frac{d}{dt} Q_p(t) = \frac{1}{4t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \int_{({\bf R}^d)^p}

\displaystyle [(x_j-y_j)(x_j-y_j)]_{1 \rightarrow p} - (x_j - [F_j(y)]_{1 \rightarrow p}) [x_j - y_j]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx.

The choice of {F_j} that then achieves the most cancellation turns out to be {F_j(y) = \frac{1}{p} y_j} (this cancels the terms that are linear or quadratic in the {x_j}), so that {x_j - [F_j(y)]_{1 \rightarrow p} = \frac{1}{p} [x_j - y_j]_{1 \rightarrow p}}. Repeating the calculations establishing (7), one has

\displaystyle \int_{({\bf R}^d)^p} [(x_j-y_j)(x_j-y_j)]_{1 \rightarrow p}\ d\mu^p = p \mathop{\bf E} |x-Y|^2 (\int_{{\bf R}^d}\ d\mu)^{p}


\displaystyle \int_{({\bf R}^d)^p} [x_j-y_j]_{1 \rightarrow p} [x_j-y_j]_{1 \rightarrow p}\ d\mu^p

\displaystyle = (p \mathbf{Var}(x-Y) + p^2 |\mathop{\bf E} x-Y|^2) (\int_{{\bf R}^d}\ d\mu)^{p}

where {Y} is the random variable drawn from {{\bf R}^d} with the normalised probability measure {\mu / \int_{{\bf R}^d}\ d\mu}. Since {\mathop{\bf E} |x-Y|^2 = \mathbf{Var}(x-Y) + |\mathop{\bf E} x-Y|^2}, one thus has

\displaystyle \frac{d}{dt} Q_p(t) = (p-1) \frac{1}{4t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \mathbf{Var}(x-Y) (\int_{{\bf R}^d}\ d\mu)^{p}\ dx. \ \ \ \ \ (14)


This expression is clearly non-negative for {p>1}, equal to zero for {p=1}, and positive for {0 < p < 1}, giving the claim. (One could simplify {\mathbf{Var}(x-Y)} here as {\mathbf{Var}(Y)} if desired, though it is not strictly necessary to do so for the proof.) \Box

Remark 6 As with Remark 4, one can also establish the identity (14) first for natural numbers {p} by direct computation avoiding the theory of fractional dimensional integrals, and then extrapolate to the case of more general values of {p}. This particular identity is also simple enough that it can be directly established by integration by parts without much difficulty, even for fractional values of {p}.

A more complicated version of this argument establishes the non-endpoint multilinear Kakeya inequality (without any logarithmic loss in a scale parameter {R}); this was established in my previous paper with Jon Bennett and Tony Carbery, but using the “natural number {p} first” approach rather than using the current formalism of fractional dimensional integration. However, the arguments can be translated into this formalism without much difficulty; we do so below the fold. (To simplify the exposition slightly we will not address issues of establishing enough regularity and integrability to justify all the manipulations, though in practice this can be done by standard limiting arguments.)

Read the rest of this entry »

This is a sequel to this previous blog post, in which we discussed the effect of the heat flow evolution

\displaystyle  \partial_t P(t,z) = \partial_{zz} P(t,z)

on the zeroes of a time-dependent family of polynomials {z \mapsto P(t,z)}, with a particular focus on the case when the polynomials {z \mapsto P(t,z)} had real zeroes. Here (inspired by some discussions I had during a recent conference on the Riemann hypothesis in Bristol) we record the analogous theory in which the polynomials instead have zeroes on a circle {\{ z: |z| = \sqrt{q} \}}, with the heat flow slightly adjusted to compensate for this. As we shall discuss shortly, a key example of this situation arises when {P} is the numerator of the zeta function of a curve.

More precisely, let {g} be a natural number. We will say that a polynomial

\displaystyle  P(z) = \sum_{j=0}^{2g} a_j z^j

of degree {2g} (so that {a_{2g} \neq 0}) obeys the functional equation if the {a_j} are all real and

\displaystyle  a_j = q^{g-j} a_{2g-j}

for all {j=0,\dots,2g}, thus

\displaystyle  P(\overline{z}) = \overline{P(z)}


\displaystyle  P(q/z) = q^g z^{-2g} P(z)

for all non-zero {z}. This means that the {2g} zeroes {\alpha_1,\dots,\alpha_{2g}} of {P(z)} (counting multiplicity) lie in {{\bf C} \backslash \{0\}} and are symmetric with respect to complex conjugation {z \mapsto \overline{z}} and inversion {z \mapsto q/z} across the circle {\{ |z| = \sqrt{q}\}}. We say that this polynomial obeys the Riemann hypothesis if all of its zeroes actually lie on the circle {\{ z = \sqrt{q}\}}. For instance, in the {g=1} case, the polynomial {z^2 - a_1 z + q} obeys the Riemann hypothesis if and only if {|a_1| \leq 2\sqrt{q}}.

Such polynomials arise in number theory as follows: if {C} is a projective curve of genus {g} over a finite field {\mathbf{F}_q}, then, as famously proven by Weil, the associated local zeta function {\zeta_{C,q}(z)} (as defined for instance in this previous blog post) is known to take the form

\displaystyle  \zeta_{C,q}(z) = \frac{P(z)}{(1-z)(1-qz)}

where {P} is a degree {2g} polynomial obeying both the functional equation and the Riemann hypothesis. In the case that {C} is an elliptic curve, then {g=1} and {P} takes the form {P(z) = z^2 - a_1 z + q}, where {a_1} is the number of {{\bf F}_q}-points of {C} minus {q+1}. The Riemann hypothesis in this case is a famous result of Hasse.

Another key example of such polynomials arise from rescaled characteristic polynomials

\displaystyle  P(z) := \det( 1 - \sqrt{q} F ) \ \ \ \ \ (1)

of {2g \times 2g} matrices {F} in the compact symplectic group {Sp(g)}. These polynomials obey both the functional equation and the Riemann hypothesis. The Sato-Tate conjecture (in higher genus) asserts, roughly speaking, that “typical” polyomials {P} arising from the number theoretic situation above are distributed like the rescaled characteristic polynomials (1), where {F} is drawn uniformly from {Sp(g)} with Haar measure.

Given a polynomial {z \mapsto P(0,z)} of degree {2g} with coefficients

\displaystyle  P(0,z) = \sum_{j=0}^{2g} a_j(0) z^j,

we can evolve it in time by the formula

\displaystyle  P(t,z) = \sum_{j=0}^{2g} \exp( t(j-g)^2 ) a_j(0) z^j,

thus {a_j(t) = \exp(t(j-g)) a_j(0)} for {t \in {\bf R}}. Informally, as one increases {t}, this evolution accentuates the effect of the extreme monomials, particularly, {z^0} and {z^{2g}} at the expense of the intermediate monomials such as {z^g}, and conversely as one decreases {t}. This family of polynomials obeys the heat-type equation

\displaystyle  \partial_t P(t,z) = (z \partial_z - g)^2 P(t,z). \ \ \ \ \ (2)

In view of the results of Marcus, Spielman, and Srivastava, it is also very likely that one can interpret this flow in terms of expected characteristic polynomials involving conjugation over the compact symplectic group {Sp(n)}, and should also be tied to some sort of “{\beta=\infty}” version of Brownian motion on this group, but we have not attempted to work this connection out in detail.

It is clear that if {z \mapsto P(0,z)} obeys the functional equation, then so does {z \mapsto P(t,z)} for any other time {t}. Now we investigate the evolution of the zeroes. Suppose at some time {t_0} that the zeroes {\alpha_1(t_0),\dots,\alpha_{2g}(t_0)} of {z \mapsto P(t_0,z)} are distinct, then

\displaystyle  P(t_0,z) = a_{2g}(0) \exp( t_0g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t_0) ).

From the inverse function theorem we see that for times {t} sufficiently close to {t_0}, the zeroes {\alpha_1(t),\dots,\alpha_{2g}(t)} of {z \mapsto P(t,z)} continue to be distinct (and vary smoothly in {t}), with

\displaystyle  P(t,z) = a_{2g}(0) \exp( t g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t) ).

Differentiating this at any {z} not equal to any of the {\alpha_j(t)}, we obtain

\displaystyle  \partial_t P(t,z) = P(t,z) ( g^2 - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)})


\displaystyle  \partial_z P(t,z) = P(t,z) ( \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)})


\displaystyle  \partial_{zz} P(t,z) = P(t,z) ( \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}).

Inserting these formulae into (2) (expanding {(z \partial_z - g)^2} as {z^2 \partial_{zz} - (2g-1) z \partial_z + g^2}) and canceling some terms, we conclude that

\displaystyle  - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)} = z^2 \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}

\displaystyle  - (2g-1) z \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)}

for {t} sufficiently close to {t_0}, and {z} not equal to {\alpha_1(t),\dots,\alpha_{2g}(t)}. Extracting the residue at {z = \alpha_j(t)}, we conclude that

\displaystyle  - \alpha'_j(t) = 2 \alpha_j(t)^2 \sum_{1 \leq k \leq 2g: k \neq j} \frac{1}{\alpha_j(t) - \alpha_k(t)} - (2g-1) \alpha_j(t)

which we can rearrange as

\displaystyle  \frac{\alpha'_j(t)}{\alpha_j(t)} = - \sum_{1 \leq k \leq 2g: k \neq j} \frac{\alpha_j(t)+\alpha_k(t)}{\alpha_j(t)-\alpha_k(t)}.

If we make the change of variables {\alpha_j(t) = \sqrt{q} e^{i\theta_j(t)}} (noting that one can make {\theta_j} depend smoothly on {t} for {t} sufficiently close to {t_0}), this becomes

\displaystyle  \partial_t \theta_j(t) = \sum_{1 \leq k \leq 2g: k \neq j} \cot \frac{\theta_j(t) - \theta_k(t)}{2}. \ \ \ \ \ (3)

Intuitively, this equation asserts that the phases {\theta_j} repel each other if they are real (and attract each other if their difference is imaginary). If {z \mapsto P(t_0,z)} obeys the Riemann hypothesis, then the {\theta_j} are all real at time {t_0}, then the Picard uniqueness theorem (applied to {\theta_j(t)} and its complex conjugate) then shows that the {\theta_j} are also real for {t} sufficiently close to {t_0}. If we then define the entropy functional

\displaystyle  H(\theta_1,\dots,\theta_{2g}) := \sum_{1 \leq j < k \leq 2g} \log \frac{1}{|\sin \frac{\theta_j-\theta_k}{2}| }

then the above equation becomes a gradient flow

\displaystyle  \partial_t \theta_j(t) = - 2 \frac{\partial H}{\partial \theta_j}( \theta_1(t),\dots,\theta_{2g}(t) )

which implies in particular that {H(\theta_1(t),\dots,\theta_{2g}(t))} is non-increasing in time. This shows that as one evolves time forward from {t_0}, there is a uniform lower bound on the separation between the phases {\theta_1(t),\dots,\theta_{2g}(t)}, and hence the equation can be solved indefinitely; in particular, {z \mapsto P(t,z)} obeys the Riemann hypothesis for all {t > t_0} if it does so at time {t_0}. Our argument here assumed that the zeroes of {z \mapsto P(t_0,z)} were simple, but this assumption can be removed by the usual limiting argument.

For any polynomial {z \mapsto P(0,z)} obeying the functional equation, the rescaled polynomials {z \mapsto e^{-g^2 t} P(t,z)} converge locally uniformly to {a_{2g}(0) (z^{2g} + q^g)} as {t \rightarrow +\infty}. By Rouche’s theorem, we conclude that the zeroes of {z \mapsto P(t,z)} converge to the equally spaced points {\{ e^{2\pi i(j+1/2)/2g}: j=1,\dots,2g\}} on the circle {\{ |z| = \sqrt{q}\}}. Together with the symmetry properties of the zeroes, this implies in particular that {z \mapsto P(t,z)} obeys the Riemann hypothesis for all sufficiently large positive {t}. In the opposite direction, when {t \rightarrow -\infty}, the polynomials {z \mapsto P(t,z)} converge locally uniformly to {a_g(0) z^g}, so if {a_g(0) \neq 0}, {g} of the zeroes converge to the origin and the other {g} converge to infinity. In particular, {z \mapsto P(t,z)} fails the Riemann hypothesis for sufficiently large negative {t}. Thus (if {a_g(0) \neq 0}), there must exist a real number {\Lambda}, which we call the de Bruijn-Newman constant of the original polynomial {z \mapsto P(0,z)}, such that {z \mapsto P(t,z)} obeys the Riemann hypothesis for {t \geq \Lambda} and fails the Riemann hypothesis for {t < \Lambda}. The situation is a bit more complicated if {a_g(0)} vanishes; if {k} is the first natural number such that {a_{g+k}(0)} (or equivalently, {a_{g-j}(0)}) does not vanish, then by the above arguments one finds in the limit {t \rightarrow -\infty} that {g-k} of the zeroes go to the origin, {g-k} go to infinity, and the remaining {2k} zeroes converge to the equally spaced points {\{ e^{2\pi i(j+1/2)/2k}: j=1,\dots,2k\}}. In this case the de Bruijn-Newman constant remains finite except in the degenerate case {k=g}, in which case {\Lambda = -\infty}.

For instance, consider the case when {g=1} and {P(0,z) = z^2 - a_1 z + q} for some real {a_1} with {|a_1| \leq 2\sqrt{q}}. Then the quadratic polynomial

\displaystyle  P(t,z) = e^t z^2 - a_1 z + e^t q

has zeroes

\displaystyle  \frac{a_1 \pm \sqrt{a_1^2 - 4 e^{2t} q}}{2e^t}

and one easily checks that these zeroes lie on the circle {\{ |z|=\sqrt{q}\}} when {t \geq \log \frac{|a_1|}{2\sqrt{q}}}, and are on the real axis otherwise. Thus in this case we have {\Lambda = \log \frac{|a_1|}{2\sqrt{q}}} (with {\Lambda=-\infty} if {a_1=0}). Note how as {t} increases to {+\infty}, the zeroes repel each other and eventually converge to {\pm i \sqrt{q}}, while as {t} decreases to {-\infty}, the zeroes collide and then separate on the real axis, with one zero going to the origin and the other to infinity.

The arguments in my paper with Brad Rodgers (discussed in this previous post) indicate that for a “typical” polynomial {P} of degree {g} that obeys the Riemann hypothesis, the expected time to relaxation to equilibrium (in which the zeroes are equally spaced) should be comparable to {1/g}, basically because the average spacing is {1/g} and hence by (3) the typical velocity of the zeroes should be comparable to {g}, and the diameter of the unit circle is comparable to {1}, thus requiring time comparable to {1/g} to reach equilibrium. Taking contrapositives, this suggests that the de Bruijn-Newman constant {\Lambda} should typically take on values comparable to {-1/g} (since typically one would not expect the initial configuration of zeroes to be close to evenly spaced). I have not attempted to formalise or prove this claim, but presumably one could do some numerics (perhaps using some of the examples of {P} given previously) to explore this further.

Let {P(z) = z^n + a_{n-1} z^{n-1} + \dots + a_0} be a monic polynomial of degree {n} with complex coefficients. Then by the fundamental theorem of algebra, we can factor {P} as

\displaystyle  P(z) = (z-z_1) \dots (z-z_n) \ \ \ \ \ (1)

for some complex zeroes {z_1,\dots,z_n} (possibly with repetition).

Now suppose we evolve {P} with respect to time by heat flow, creating a function {P(t,z)} of two variables with given initial data {P(0,z) = P(z)} for which

\displaystyle  \partial_t P(t,z) = \partial_{zz} P(t,z). \ \ \ \ \ (2)

On the space of polynomials of degree at most {n}, the operator {\partial_{zz}} is nilpotent, and one can solve this equation explicitly both forwards and backwards in time by the Taylor series

\displaystyle  P(t,z) = \sum_{j=0}^\infty \frac{t^j}{j!} \partial_z^{2j} P(0,z).

For instance, if one starts with a quadratic {P(0,z) = z^2 + bz + c}, then the polynomial evolves by the formula

\displaystyle  P(t,z) = z^2 + bz + (c+2t).

As the polynomial {P(t)} evolves in time, the zeroes {z_1(t),\dots,z_n(t)} evolve also. Assuming for sake of discussion that the zeroes are simple, the inverse function theorem tells us that the zeroes will (locally, at least) evolve smoothly in time. What are the dynamics of this evolution?

For instance, in the quadratic case, the quadratic formula tells us that the zeroes are

\displaystyle  z_1(t) = \frac{-b + \sqrt{b^2 - 4(c+2t)}}{2}


\displaystyle  z_2(t) = \frac{-b - \sqrt{b^2 - 4(c+2t)}}{2}

after arbitrarily choosing a branch of the square root. If {b,c} are real and the discriminant {b^2 - 4c} is initially positive, we see that we start with two real zeroes centred around {-b/2}, which then approach each other until time {t = \frac{b^2-4c}{8}}, at which point the roots collide and then move off from each other in an imaginary direction.

In the general case, we can obtain the equations of motion by implicitly differentiating the defining equation

\displaystyle  P( t, z_i(t) ) = 0

in time using (2) to obtain

\displaystyle  \partial_{zz} P( t, z_i(t) ) + \partial_t z_i(t) \partial_z P(t,z_i(t)) = 0.

To simplify notation we drop the explicit dependence on time, thus

\displaystyle  \partial_{zz} P(z_i) + (\partial_t z_i) \partial_z P(z_i)= 0.

From (1) and the product rule, we see that

\displaystyle  \partial_z P( z_i ) = \prod_{j:j \neq i} (z_i - z_j)


\displaystyle  \partial_{zz} P( z_i ) = 2 \sum_{k:k \neq i} \prod_{j:j \neq i,k} (z_i - z_j)

(where all indices are understood to range over {1,\dots,n}) leading to the equations of motion

\displaystyle  \partial_t z_i = \sum_{k:k \neq i} \frac{2}{z_k - z_i}, \ \ \ \ \ (3)

at least when one avoids those times in which there is a repeated zero. In the case when the zeroes {z_i} are real, each term {\frac{2}{z_k-z_i}} represents a (first-order) attraction in the dynamics between {z_i} and {z_k}, but the dynamics are more complicated for complex zeroes (e.g. purely imaginary zeroes will experience repulsion rather than attraction, as one already sees in the quadratic example). Curiously, this system resembles that of Dyson brownian motion (except with the brownian motion part removed, and time reversed). I learned of the connection between the ODE (3) and the heat equation from this paper of Csordas, Smith, and Varga, but perhaps it has been mentioned in earlier literature as well.

One interesting consequence of these equations is that if the zeroes are real at some time, then they will stay real as long as the zeroes do not collide. Let us now restrict attention to the case of real simple zeroes, in which case we will rename the zeroes as {x_i} instead of {z_i}, and order them as {x_1 < \dots < x_n}. The evolution

\displaystyle  \partial_t x_i = \sum_{k:k \neq i} \frac{2}{x_k - x_i}

can now be thought of as reverse gradient flow for the “entropy”

\displaystyle  H := -\sum_{i,j: i \neq j} \log |x_i - x_j|,

(which is also essentially the logarithm of the discriminant of the polynomial) since we have

\displaystyle  \partial_t x_i = \frac{\partial H}{\partial x_i}.

In particular, we have the monotonicity formula

\displaystyle  \partial_t H = 4E

where {E} is the “energy”

\displaystyle  E := \frac{1}{4} \sum_i (\frac{\partial H}{\partial x_i})^2

\displaystyle  = \sum_i (\sum_{k:k \neq i} \frac{1}{x_k-x_i})^2

\displaystyle  = \sum_{i,k: i \neq k} \frac{1}{(x_k-x_i)^2} + 2 \sum_{i,j,k: i,j,k \hbox{ distinct}} \frac{1}{(x_k-x_i)(x_j-x_i)}

\displaystyle  = \sum_{i,k: i \neq k} \frac{1}{(x_k-x_i)^2}

where in the last line we use the antisymmetrisation identity

\displaystyle  \frac{1}{(x_k-x_i)(x_j-x_i)} + \frac{1}{(x_i-x_j)(x_k-x_j)} + \frac{1}{(x_j-x_k)(x_i-x_k)} = 0.

Among other things, this shows that as one goes backwards in time, the entropy decreases, and so no collisions can occur to the past, only in the future, which is of course consistent with the attractive nature of the dynamics. As {H} is a convex function of the positions {x_1,\dots,x_n}, one expects {H} to also evolve in a convex manner in time, that is to say the energy {E} should be increasing. This is indeed the case:

Exercise 1 Show that

\displaystyle  \partial_t E = 2 \sum_{i,j: i \neq j} (\frac{2}{(x_i-x_j)^2} - \sum_{k: i,j,k \hbox{ distinct}} \frac{1}{(x_k-x_i)(x_k-x_j)})^2.

Symmetric polynomials of the zeroes are polynomial functions of the coefficients and should thus evolve in a polynomial fashion. One can compute this explicitly in simple cases. For instance, the center of mass is an invariant:

\displaystyle  \partial_t \frac{1}{n} \sum_i x_i = 0.

The variance decreases linearly:

Exercise 2 Establish the virial identity

\displaystyle  \partial_t \sum_{i,j} (x_i-x_j)^2 = - 4n^2(n-1).

As the variance (which is proportional to {\sum_{i,j} (x_i-x_j)^2}) cannot become negative, this identity shows that “finite time blowup” must occur – that the zeroes must collide at or before the time {\frac{1}{4n^2(n-1)} \sum_{i,j} (x_i-x_j)^2}.

Exercise 3 Show that the Stieltjes transform

\displaystyle  s(t,z) = \sum_i \frac{1}{x_i - z}

solves the viscous Burgers equation

\displaystyle  \partial_t s = \partial_{zz} s - 2 s \partial_z s,

either by using the original heat equation (2) and the identity {s = - \partial_z P / P}, or else by using the equations of motion (3). This relation between the Burgers equation and the heat equation is known as the Cole-Hopf transformation.

The paper of Csordas, Smith, and Varga mentioned previously gives some other bounds on the lifespan of the dynamics; roughly speaking, they show that if there is one pair of zeroes that are much closer to each other than to the other zeroes then they must collide in a short amount of time (unless there is a collision occuring even earlier at some other location). Their argument extends also to situations where there are an infinite number of zeroes, which they apply to get new results on Newman’s conjecture in analytic number theory. I would be curious to know of further places in the literature where this dynamics has been studied.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The Universality phenomenon for Wigner ensembles“. This survey is a longer version (58 pages) of a previous short survey we wrote up a few months ago. The survey focuses on recent progress in understanding the universality phenomenon for Hermitian Wigner ensembles, of which the Gaussian Unitary Ensemble (GUE) is the most well known. The one-sentence summary of this progress is that many of the asymptotic spectral statistics (e.g. correlation functions, eigenvalue gaps, determinants, etc.) that were previously known for GUE matrices, are now known for very large classes of Wigner ensembles as well. There are however a wide variety of results of this type, due to the large number of interesting spectral statistics, the varying hypotheses placed on the ensemble, and the different modes of convergence studied, and it is difficult to isolate a single such result currently as the definitive universality result. (In particular, there is at present a tradeoff between generality of ensemble and strength of convergence; the universality results that are available for the most general classes of ensemble are only presently able to demonstrate a rather weak sense of convergence to the universal distribution (involving an additional averaging in the energy parameter), which limits the applicability of such results to a number of interesting questions in which energy averaging is not permissible, such as the study of the least singular value of a Wigner matrix, or of related quantities such as the condition number or determinant. But it is conceivable that this tradeoff is a temporary phenomenon and may be eliminated by future work in this area; in the case of Hermitian matrices whose entries have the same second moments as that of the GUE ensemble, for instance, the need for energy averaging has already been removed.)

Nevertheless, throughout the family of results that have been obtained recently, there are two main methods which have been fundamental to almost all of the recent progress in extending from special ensembles such as GUE to general ensembles. The first method, developed extensively by Erdos, Schlein, Yau, Yin, and others (and building on an initial breakthrough by Johansson), is the heat flow method, which exploits the rapid convergence to equilibrium of the spectral statistics of matrices undergoing Dyson-type flows towards GUE. (An important aspect to this method is the ability to accelerate the convergence to equilibrium by localising the Hamiltonian, in order to eliminate the slowest modes of the flow; this refinement of the method is known as the “local relaxation flow” method. Unfortunately, the translation mode is not accelerated by this process, which is the principal reason why results obtained by pure heat flow methods still require an energy averaging in the final conclusion; it would of interest to find a way around this difficulty.) The other method, which goes all the way back to Lindeberg in his classical proof of the central limit theorem, and which was introduced to random matrix theory by Chatterjee and then developed for the universality problem by Van Vu and myself, is the swapping method, which is based on the observation that spectral statistics of Wigner matrices tend to be stable if one replaces just one or two entries of the matrix with another distribution, with the stability of the swapping process becoming stronger if one assumes that the old and new entries have many matching moments. The main formalisations of this observation are known as four moment theorems, because they require four matching moments between the entries, although there are some variant three moment theorems and two moment theorems in the literature as well. Our initial four moment theorems were focused on individual eigenvalues (and later also to eigenvectors), but it was later observed by Erdos, Yau, and Yin that simpler four moment theorems could also be established for aggregate spectral statistics, such as the coefficients of the Greens function, and Knowles and Yin also subsequently observed that these latter theorems could be used to recover a four moment theorem for eigenvalues and eigenvectors, giving an alternate approach to proving such theorems.

Interestingly, it seems that the heat flow and swapping methods are complementary to each other; the heat flow methods are good at removing moment hypotheses on the coefficients, while the swapping methods are good at removing regularity hypotheses. To handle general ensembles with minimal moment or regularity hypotheses, it is thus necessary to combine the two methods (though perhaps in the future a third method, or a unification of the two existing methods, might emerge).

Besides the heat flow and swapping methods, there are also a number of other basic tools that are also needed in these results, such as local semicircle laws and eigenvalue rigidity, which are also discussed in the survey. We also survey how universality has been established for wide variety of spectral statistics; the {k}-point correlation functions are the most well known of these statistics, but they do not tell the whole story (particularly if one can only control these functions after an averaging in the energy), and there are a number of other statistics, such as eigenvalue counting functions, determinants, or spectral gaps, for which the above methods can be applied.

In order to prevent the survey from becoming too enormous, we decided to restrict attention to Hermitian matrix ensembles, whose entries off the diagonal are identically distributed, as this is the case in which the strongest results are available. There are several results that are applicable to more general ensembles than these which are briefly mentioned in the survey, but they are not covered in detail.

We plan to submit this survey eventually to the proceedings of a workshop on random matrix theory, and will continue to update the references on the arXiv version until the time comes to actually submit the paper.

Finally, in the survey we issue some errata for previous papers of Van and myself in this area, mostly centering around the three moment theorem (a variant of the more widely used four moment theorem), for which the original proof of Van and myself was incomplete. (Fortunately, as the three moment theorem had many fewer applications than the four moment theorem, and most of the applications that it did have ended up being superseded by subsequent papers, the actual impact of this issue was limited, but still an erratum is in order.)

Below the fold is a version of my talk “Recent progress on the Kakeya conjecture” that I gave at the Fefferman conference.

Read the rest of this entry »