You are currently browsing the category archive for the ‘paper’ category.

Kevin Ford, Sergei Konyagin, James Maynard, Carl Pomerance, and I have uploaded to the arXiv our paper “Long gaps in sieved sets“, submitted to J. Europ. Math. Soc..

This paper originated from the MSRI program in analytic number theory last year, and was centred around variants of the question of finding large gaps between primes. As discussed for instance in this previous post, it is now known that within the set of primes {{\mathcal P} = \{2,3,5,\dots\}}, one can find infinitely many adjacent elements {a,b} whose gap {b-a} obeys a lower bound of the form

\displaystyle b-a \gg \log a \frac{\log_2 a \log_4 a}{\log_3 a}

where {\log_k} denotes the {k}-fold iterated logarithm. This compares with the trivial bound of {b-a \gg \log a} that one can obtain from the prime number theorem and the pigeonhole principle. Several years ago, Pomerance posed the question of whether analogous improvements to the trivial bound can be obtained for such sets as

\displaystyle {\mathcal P}_2 = \{ n \in {\bf N}: n^2+1 \hbox{ prime} \}.

Here there is the obvious initial issue that this set is not even known to be infinite (this is the fourth Landau problem), but let us assume for the sake of discussion that this set is indeed infinite, so that we have an infinite number of gaps to speak of. Standard sieve theory techniques give upper bounds for the density of {{\mathcal P}_2} that is comparable (up to an absolute constant) to the prime number theorem bounds for {{\mathcal P}}, so again we can obtain a trivial bound of {b-a \gg \log a} for the gaps of {{\mathcal P}_2}. In this paper we improve this to

\displaystyle b-a \gg \log a \log^c_2 a

for an absolute constant {c>0}; this is not as strong as the corresponding bound for {{\mathcal P}}, but still improves over the trivial bound. In fact we can handle more general “sifted sets” than just {{\mathcal P}_2}. Recall from the sieve of Eratosthenes that the elements of {{\mathcal P}} in, say, the interval {[x/2, x]} can be obtained by removing from {[x/2, x]} one residue class modulo {p} for each prime up to {\sqrt{x}}, namely the class {0} mod {p}. In a similar vein, the elements of {{\mathcal P}_2} in {[x/2,x]} can be obtained by removing for each prime {p} up to {x} zero, one, or two residue classes modulo {p}, depending on whether {-1} is a quadratic residue modulo {p}. On the average, one residue class will be removed (this is a very basic case of the Chebotarev density theorem), so this sieving system is “one-dimensional on the average”. Roughly speaking, our arguments apply to any other set of numbers arising from a sieving system that is one-dimensional on average. (One can consider other dimensions also, but unfortunately our methods seem to give results that are worse than a trivial bound when the dimension is less than or greater than one.)

The standard “Erdős-Rankin” method for constructing long gaps between primes proceeds by trying to line up some residue classes modulo small primes {p} so that they collectively occupy a long interval. A key tool in doing so are the smooth number estimates of de Bruijn and others, which among other things assert that if one removes from an interval such as {[1,x]} all the residue classes {0} mod {p} for {p} between {x^{1/u}} and {x} for some fixed {u>1}, then the set of survivors has exceptionally small density (roughly of the order of {u^{-u}}, with the precise density given by the Dickman function), in marked contrast to the situation in which one randomly removes one residue class for each such prime {p}, in which the density is more like {1/u}. One generally exploits this phenomenon to sieve out almost all the elements of a long interval using some of the primes available, and then using the remaining primes to cover up the remaining elements that have not already been sifted out. In the more recent work on this problem, advanced combinatorial tools such as hypergraph covering lemmas are used for the latter task.

In the case of {{\mathcal P}_2}, there does not appear to be any analogue of smooth numbers, in the sense that there is no obvious way to arrange the residue classes so that they have significantly fewer survivors than a random arrangement. Instead we adopt the following semi-random strategy to cover an interval {[1,y]} by residue classes. Firstly, we randomly remove residue classes for primes {p} up to some intermediate threshold {z} (smaller than {y} by a logarithmic factor), leaving behind a preliminary sifted set {S_{[2,z]}}. Then, for each prime {p} between {z} and another intermediate threshold {x/2}, we remove a residue class mod {p} that maximises (or nearly maximises) its intersection with {S_{[2,z]}}. This ends up reducing the number of survivors to be significantly below what one would achieve if one selects residue classes randomly, particularly if one also uses the hypergraph covering lemma from our previous paper. Finally, we cover each the remaining survivors by a residue class from a remaining available prime.

Brad Rodgers and I have uploaded to the arXiv our paper “The De Bruijn-Newman constant is non-negative“. This paper affirms a conjecture of Newman regarding to the extent to which the Riemann hypothesis, if true, is only “barely so”. To describe the conjecture, let us begin with the Riemann xi function

\displaystyle \xi(s) := \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(\frac{s}{2}) \zeta(s)

where {\Gamma(s) := \int_0^\infty e^{-t} t^{s-1}\ dt} is the Gamma function and {\zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}} is the Riemann zeta function. Initially, this function is only defined for {\mathrm{Re} s > 1}, but, as was already known to Riemann, we can manipulate it into a form that extends to the entire complex plane as follows. Firstly, in view of the standard identity {s \Gamma(s) = \Gamma(s+1)}, we can write

\displaystyle \frac{s(s-1)}{2} \Gamma(\frac{s}{2}) = 2 \Gamma(\frac{s+4}{2}) - 3 \Gamma( \frac{s+2}{2} )

and hence

\displaystyle \xi(s) = \sum_{n=1}^\infty 2 \pi^{-s/2} n^{-s} \int_0^\infty e^{-t} t^{\frac{s+4}{2}-1}\ dt - 3 \pi^{-s/2} n^{-s} \int_0^\infty e^{-t} t^{\frac{s+2}{2}-1}\ dt.

By a rescaling, one may write

\displaystyle \int_0^\infty e^{-t} t^{\frac{s+4}{2}-1}\ dt = (\pi n^2)^{\frac{s+4}{2}} \int_0^\infty e^{-\pi n^2 t} t^{\frac{s+4}{2}-1}\ dt

and similarly

\displaystyle \int_0^\infty e^{-t} t^{\frac{s+2}{2}-1}\ dt = (\pi n^2)^{\frac{s+2}{2}} \int_0^\infty e^{-\pi n^2 t} t^{\frac{s+2}{2}-1}\ dt

and thus (after applying Fubini’s theorem)

\displaystyle \xi(s) = \int_0^\infty \sum_{n=1}^\infty 2 \pi^2 n^4 e^{-\pi n^2 t} t^{\frac{s+4}{2}-1} - 3 \pi n^2 e^{-\pi n^2 t} t^{\frac{s+2}{2}-1}\ dt.

We’ll make the change of variables {t = e^{4u}} to obtain

\displaystyle \xi(s) = 4 \int_{\bf R} \sum_{n=1}^\infty (2 \pi^2 n^4 e^{8u} - 3 \pi n^2 e^{4u}) \exp( 2su - \pi n^2 e^{4u} )\ du.

If we introduce the mild renormalisation

\displaystyle H_0(z) := \frac{1}{8} \xi( \frac{1}{2} + \frac{iz}{2} )

of {\xi}, we then conclude (at least for {\mathrm{Im} z > 1}) that

\displaystyle H_0(z) = \frac{1}{2} \int_{\bf R} \Phi(u)\exp(izu)\ du \ \ \ \ \ (1)


where {\Phi: {\bf R} \rightarrow {\bf C}} is the function

\displaystyle \Phi(u) := \sum_{n=1}^\infty (2 \pi^2 n^4 e^{9u} - 3 \pi n^2 e^{5u}) \exp( - \pi n^2 e^{4u} ), \ \ \ \ \ (2)


which one can verify to be rapidly decreasing both as {u \rightarrow +\infty} and as {u \rightarrow -\infty}, with the decrease as {u \rightarrow +\infty} faster than any exponential. In particular {H_0} extends holomorphically to the upper half plane.

If we normalize the Fourier transform {{\mathcal F} f(\xi)} of a (Schwartz) function {f(x)} as {{\mathcal F} f(\xi) := \int_{\bf R} f(x) e^{-2\pi i \xi x}\ dx}, it is well known that the Gaussian {x \mapsto e^{-\pi x^2}} is its own Fourier transform. The creation operator {2\pi x - \frac{d}{dx}} interacts with the Fourier transform by the identity

\displaystyle {\mathcal F} (( 2\pi x - \frac{d}{dx} ) f) (\xi) = -i (2 \pi \xi - \frac{d}{d\xi} ) {\mathcal F} f(\xi).

Since {(-i)^4 = 1}, this implies that the function

\displaystyle x \mapsto (2\pi x - \frac{d}{dx})^4 e^{-\pi x^2} = 128 \pi^2 (2 \pi^2 x^4 - 3 \pi x^2) e^{-\pi x^2} + 48 \pi^2 e^{-\pi x^2}

is its own Fourier transform. (One can view the polynomial {128 \pi^2 (2\pi^2 x^4 - 3 \pi x^2) + 48 \pi^2} as a renormalised version of the fourth Hermite polynomial.) Taking a suitable linear combination of this with {x \mapsto e^{-\pi x^2}}, we conclude that

\displaystyle x \mapsto (2 \pi^2 x^4 - 3 \pi x^2) e^{-\pi x^2}

is also its own Fourier transform. Rescaling {x} by {e^{2u}} and then multiplying by {e^u}, we conclude that the Fourier transform of

\displaystyle x \mapsto (2 \pi^2 x^4 e^{9u} - 3 \pi x^2 e^{5u}) \exp( - \pi x^2 e^{4u} )


\displaystyle x \mapsto (2 \pi^2 x^4 e^{-9u} - 3 \pi x^2 e^{-5u}) \exp( - \pi x^2 e^{-4u} ),

and hence by the Poisson summation formula (using symmetry and vanishing at {n=0} to unfold the {n} summation in (2) to the integers rather than the natural numbers) we obtain the functional equation

\displaystyle \Phi(-u) = \Phi(u),

which implies that {\Phi} and {H_0} are even functions (in particular, {H_0} now extends to an entire function). From this symmetry we can also rewrite (1) as

\displaystyle H_0(z) = \int_0^\infty \Phi(u) \cos(zu)\ du,

which now gives a convergent expression for the entire function {H_0(z)} for all complex {z}. As {\Phi} is even and real-valued on {{\bf R}}, {H_0(z)} is even and also obeys the functional equation {H_0(\overline{z}) = \overline{H_0(z)}}, which is equivalent to the usual functional equation for the Riemann zeta function. The Riemann hypothesis is equivalent to the claim that all the zeroes of {H_0} are real.

De Bruijn introduced the family {H_t: {\bf C} \rightarrow {\bf C}} of deformations of {H_0: {\bf C} \rightarrow {\bf C}}, defined for all {t \in {\bf R}} and {z \in {\bf C}} by the formula

\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du.

From a PDE perspective, one can view {H_t} as the evolution of {H_0} under the backwards heat equation {\partial_t H_t(z) = - \partial_{zz} H_t(z)}. As with {H_0}, the {H_t} are all even entire functions that obey the functional equation {H_t(\overline{z}) = \overline{H_t(z)}}, and one can ask an analogue of the Riemann hypothesis for each such {H_t}, namely whether all the zeroes of {H_t} are real. De Bruijn showed that these hypotheses were monotone in {t}: if {H_t} had all real zeroes for some {t}, then {H_{t'}} would also have all zeroes real for any {t' \geq t}. Newman later sharpened this claim by showing the existence of a finite number {\Lambda \leq 1/2}, now known as the de Bruijn-Newman constant, with the property that {H_t} had all zeroes real if and only if {t \geq \Lambda}. Thus, the Riemann hypothesis is equivalent to the inequality {\Lambda \leq 0}. Newman then conjectured the complementary bound {\Lambda \geq 0}; in his words, this conjecture asserted that if the Riemann hypothesis is true, then it is only “barely so”, in that the reality of all the zeroes is destroyed by applying heat flow for even an arbitrarily small amount of time. Over time, a significant amount of evidence was established in favour of this conjecture; most recently, in 2011, Saouter, Gourdon, and Demichel showed that {\Lambda \geq -1.15 \times 10^{-11}}.

In this paper we finish off the proof of Newman’s conjecture, that is we show that {\Lambda \geq 0}. The proof is by contradiction, assuming that {\Lambda < 0} (which among other things, implies the truth of the Riemann hypothesis), and using the properties of backwards heat evolution to reach a contradiction.

Very roughly, the argument proceeds as follows. As observed by Csordas, Smith, and Varga (and also discussed in this previous blog post, the backwards heat evolution of the {H_t} introduces a nice ODE dynamics on the zeroes {x_j(t)} of {H_t}, namely that they solve the ODE

\displaystyle \frac{d}{dt} x_j(t) = -2 \sum_{j \neq k} \frac{1}{x_k(t) - x_j(t)} \ \ \ \ \ (3)


for all {j} (one has to interpret the sum in a principal value sense as it is not absolutely convergent, but let us ignore this technicality for the current discussion). Intuitively, this ODE is asserting that the zeroes {x_j(t)} repel each other, somewhat like positively charged particles (but note that the dynamics is first-order, as opposed to the second-order laws of Newtonian mechanics). Formally, a steady state (or equilibrium) of this dynamics is reached when the {x_k(t)} are arranged in an arithmetic progression. (Note for instance that for any positive {u}, the functions {z \mapsto e^{tu^2} \cos(uz)} obey the same backwards heat equation as {H_t}, and their zeroes are on a fixed arithmetic progression {\{ \frac{2\pi (k+\tfrac{1}{2})}{u}: k \in {\bf Z} \}}.) The strategy is to then show that the dynamics from time {-\Lambda} to time {0} creates a convergence to local equilibrium, in which the zeroes {x_k(t)} locally resemble an arithmetic progression at time {t=0}. This will be in contradiction with known results on pair correlation of zeroes (or on related statistics, such as the fluctuations on gaps between zeroes), such as the results of Montgomery (actually for technical reasons it is slightly more convenient for us to use related results of Conrey, Ghosh, Goldston, Gonek, and Heath-Brown). Another way of thinking about this is that even very slight deviations from local equilibrium (such as a small number of gaps that are slightly smaller than the average spacing) will almost immediately lead to zeroes colliding with each other and leaving the real line as one evolves backwards in time (i.e., under the forward heat flow). This is a refinement of the strategy used in previous lower bounds on {\Lambda}, in which “Lehmer pairs” (pairs of zeroes of the zeta function that were unusually close to each other) were used to limit the extent to which the evolution continued backwards in time while keeping all zeroes real.

How does one obtain this convergence to local equilibrium? We proceed by broad analogy with the “local relaxation flow” method of Erdos, Schlein, and Yau in random matrix theory, in which one combines some initial control on zeroes (which, in the case of the Erdos-Schlein-Yau method, is referred to with terms such as “local semicircular law”) with convexity properties of a relevant Hamiltonian that can be used to force the zeroes towards equilibrium.

We first discuss the initial control on zeroes. For {H_0}, we have the classical Riemann-von Mangoldt formula, which asserts that the number of zeroes in the interval {[0,T]} is {\frac{T}{4\pi} \log \frac{T}{4\pi} - \frac{T}{4\pi} + O(\log T)} as {T \rightarrow \infty}. (We have a factor of {4\pi} here instead of the more familiar {2\pi} due to the way {H_0} is normalised.) This implies for instance that for a fixed {\alpha}, the number of zeroes in the interval {[T, T+\alpha]} is {\frac{\alpha}{4\pi} \log T + O(\log T)}. Actually, because we get to assume the Riemann hypothesis, we can sharpen this to {\frac{\alpha}{4\pi} \log T + o(\log T)}, a result of Littlewood (see this previous blog post for a proof). Ideally, we would like to obtain similar control for the other {H_t}, {\Lambda \leq t < 0}, as well. Unfortunately we were only able to obtain the weaker claims that the number of zeroes of {H_t} in {[0,T]} is {\frac{T}{4\pi} \log \frac{T}{4\pi} - \frac{T}{4\pi} + O(\log^2 T)}, and that the number of zeroes in {[T, T+\alpha \log T]} is {\frac{\alpha}{4 \pi} \log^2 T + o(\log^2 T)}, that is to say we only get good control on the distribution of zeroes at scales {\gg \log T} rather than at scales {\gg 1}. Ultimately this is because we were only able to get control (and in particular, lower bounds) on {|H_t(x-iy)|} with high precision when {y \gg \log x} (whereas {|H_0(x-iy)|} has good estimates as soon as {y} is larger than (say) {2}). This control is obtained by the expressing {H_t(x-iy)} in terms of some contour integrals and using the method of steepest descent (actually it is slightly simpler to rely instead on the Stirling approximation for the Gamma function, which can be proven in turn by steepest descent methods). Fortunately, it turns out that this weaker control is still (barely) enough for the rest of our argument to go through.

Once one has the initial control on zeroes, we now need to force convergence to local equilibrium by exploiting convexity of a Hamiltonian. Here, the relevant Hamiltonian is

\displaystyle H(t) := \sum_{j,k: j \neq k} \log \frac{1}{|x_j(t) - x_k(t)|},

ignoring for now the rather important technical issue that this sum is not actually absolutely convergent. (Because of this, we will need to truncate and renormalise the Hamiltonian in a number of ways which we will not detail here.) The ODE (3) is formally the gradient flow for this Hamiltonian. Furthermore, this Hamiltonian is a convex function of the {x_j} (because {t \mapsto \log \frac{1}{t}} is a convex function on {(0,+\infty)}). We therefore expect the Hamiltonian to be a decreasing function of time, and that the derivative should be an increasing function of time. As time passes, the derivative of the Hamiltonian would then be expected to converge to zero, which should imply convergence to local equilibrium.

Formally, the derivative of the above Hamiltonian is

\displaystyle \partial_t H(t) = -4 E(t), \ \ \ \ \ (4)


where {E(t)} is the “energy”

\displaystyle E(t) := \sum_{j,k: j \neq k} \frac{1}{|x_j(t) - x_k(t)|^2}.

Again, there is the important technical issue that this quantity is infinite; but it turns out that if we renormalise the Hamiltonian appropriately, then the energy will also become suitably renormalised, and in particular will vanish when the {x_j} are arranged in an arithmetic progression, and be positive otherwise. One can also formally calculate the derivative of {E(t)} to be a somewhat complicated but manifestly non-negative quantity (a sum of squares); see this previous blog post for analogous computations in the case of heat flow on polynomials. After flowing from time {\Lambda} to time {0}, and using some crude initial bounds on {H(t)} and {E(t)} in this region (coming from the Riemann-von Mangoldt type formulae mentioned above and some further manipulations), we can eventually show that the (renormalisation of the) energy {E(0)} at time zero is small, which forces the {x_j} to locally resemble an arithmetic progression, which gives the required convergence to local equilibrium.

There are a number of technicalities involved in making the above sketch of argument rigorous (for instance, justifying interchanges of derivatives and infinite sums turns out to be a little bit delicate). I will highlight here one particular technical point. One of the ways in which we make expressions such as the energy {E(t)} finite is to truncate the indices {j,k} to an interval {I} to create a truncated energy {E_I(t)}. In typical situations, we would then expect {E_I(t)} to be decreasing, which will greatly help in bounding {E_I(0)} (in particular it would allow one to control {E_I(0)} by time-averaged quantities such as {\int_{\Lambda/2}^0 E_I(t)\ dt}, which can in turn be controlled using variants of (4)). However, there are boundary effects at both ends of {I} that could in principle add a large amount of energy into {E_I}, which is bad news as it could conceivably make {E_I(0)} undesirably large even if integrated energies such as {\int_{\Lambda/2}^0 E_I(t)\ dt} remain adequately controlled. As it turns out, such boundary effects are negligible as long as there is a large gap between adjacent zeroes at boundary of {I} – it is only narrow gaps that can rapidly transmit energy across the boundary of {I}. Now, narrow gaps can certainly exist (indeed, the GUE hypothesis predicts these happen a positive fraction of the time); but the pigeonhole principle (together with the Riemann-von Mangoldt formula) can allow us to pick the endpoints of the interval {I} so that no narrow gaps appear at the boundary of {I} for any given time {t}. However, there was a technical problem: this argument did not allow one to find a single interval {I} that avoided gaps for all times {\Lambda/2 \leq t \leq 0} simultaneously – the pigeonhole principle could produce a different interval {I} for each time {t}! Since the number of times was uncountable, this was a serious issue. (In physical terms, the problem was that there might be very fast “longitudinal waves” in the dynamics that, at each time, cause some gaps between zeroes to be highly compressed, but the specific gap that was narrow changed very rapidly with time. Such waves could, in principle, import a huge amount of energy into {E_I} by time {0}.) To resolve this, we borrowed a PDE trick of Bourgain’s, in which the pigeonhole principle was coupled with local conservation laws. More specifically, we use the phenomenon that very narrow gaps {g_i = x_{i+1}-x_i} take a nontrivial amount of time to expand back to a reasonable size (this can be seen by comparing the evolution of this gap with solutions of the scalar ODE {\partial_t g = \frac{4}{g^2}}, which represents the fastest at which a gap such as {g_i} can expand). Thus, if a gap {g_i} is reasonably large at some time {t_0}, it will also stay reasonably large at slightly earlier times {t \in [t_0-\delta, t_0]} for some moderately small {\delta>0}. This lets one locate an interval {I} that has manageable boundary effects during the times in {[t_0-\delta, t_0]}, so in particular {E_I} is basically non-increasing in this time interval. Unfortunately, this interval is a little bit too short to cover all of {[\Lambda/2,0]}; however it turns out that one can iterate the above construction and find a nested sequence of intervals {I_k}, with each {E_{I_k}} non-increasing in a different time interval {[t_k - \delta, t_k]}, and with all of the time intervals covering {[\Lambda/2,0]}. This turns out to be enough (together with the obvious fact that {E_I} is monotone in {I}) to still control {E_I(0)} for some reasonably sized interval {I}, as required for the rest of the arguments.

ADDED LATER: the following analogy (involving functions with just two zeroes, rather than an infinite number of zeroes) may help clarify the relation between this result and the Riemann hypothesis (and in particular why this result does not make the Riemann hypothesis any easier to prove, in fact it confirms the delicate nature of that hypothesis). Suppose one had a quadratic polynomial {P} of the form {P(z) = z^2 + \Lambda}, where {\Lambda} was an unknown real constant. Suppose that one was for some reason interested in the analogue of the “Riemann hypothesis” for {P}, namely that all the zeroes of {P} are real. A priori, there are three scenarios:

  • (Riemann hypothesis false) {\Lambda > 0}, and {P} has zeroes {\pm i |\Lambda|^{1/2}} off the real axis.
  • (Riemann hypothesis true, but barely so) {\Lambda = 0}, and both zeroes of {P} are on the real axis; however, any slight perturbation of {\Lambda} in the positive direction would move zeroes off the real axis.
  • (Riemann hypothesis true, with room to spare) {\Lambda < 0}, and both zeroes of {P} are on the real axis. Furthermore, any slight perturbation of {P} will also have both zeroes on the real axis.

The analogue of our result in this case is that {\Lambda \geq 0}, thus ruling out the third of the three scenarios here. In this simple example in which only two zeroes are involved, one can think of the inequality {\Lambda \geq 0} as asserting that if the zeroes of {P} are real, then they must be repeated. In our result (in which there are an infinity of zeroes, that become increasingly dense near infinity), and in view of the convergence to local equilibrium properties of (3), the analogous assertion is that if the zeroes of {H_0} are real, then they do not behave locally as if they were in arithmetic progression.

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions II. Divisor correlations in short ranges“. This is a sequel of sorts to our previous paper on divisor correlations, though the proof techniques in this paper are rather different. As with the previous paper, our interest is in correlations such as

\displaystyle  \sum_{n \leq X} d_k(n) d_l(n+h) \ \ \ \ \ (1)

for medium-sized {h} and large {X}, where {k \geq l \geq 1} are natural numbers and {d_k(n) = \sum_{n = m_1 \dots m_k} 1} is the {k^{th}} divisor function (actually our methods can also treat a generalisation in which {k} is non-integer, but for simplicity let us stick with the integer case for this discussion). Our methods also allow for one of the divisor function factors to be replaced with a von Mangoldt function, but (in contrast to the previous paper) we cannot treat the case when both factors are von Mangoldt.

As discussed in this previous post, one heuristically expects an asymptotic of the form

\displaystyle  \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O( X^{1/2+\varepsilon})

for any fixed {\varepsilon>0}, where {P_{k,l,h}} is a certain explicit (but rather complicated) polynomial of degree {k+l-1}. Such asymptotics are known when {l \leq 2}, but remain open for {k \geq l \geq 3}. In the previous paper, we were able to obtain a weaker bound of the form

\displaystyle  \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O_A( X \log^{-A} X)

for {1-O_A(\log^{-A} X)} of the shifts {-H \leq h \leq H}, whenever the shift range {H} lies between {X^{8/33+\varepsilon}} and {X^{1-\varepsilon}}. But the methods become increasingly hard to use as {H} gets smaller. In this paper, we use a rather different method to obtain the even weaker bound

\displaystyle  \sum_{n \leq X} d_k(n) d_l(n+h) = (1+o(1)) P_{k,l,h}( \log X ) X

for {1-o(1)} of the shifts {-H \leq h \leq H}, where {H} can now be as short as {H = \log^{10^4 k \log k} X}. The constant {10^4} can be improved, but there are serious obstacles to using our method to go below {\log^{k \log k} X} (as the exceptionally large values of {d_k} then begin to dominate). This can be viewed as an analogue to our previous paper on correlations of bounded multiplicative functions on average, in which the functions {d_k,d_l} are now unbounded, and indeed our proof strategy is based in large part on that paper (but with many significant new technical complications).

We now discuss some of the ingredients of the proof. Unsurprisingly, the first step is the circle method, expressing (1) in terms of exponential sums such as

\displaystyle  S(\alpha) := \sum_{n \leq X} d_k(n) e(\alpha).

Actually, it is convenient to first prune {d_k} slightly by zeroing out this function on “atypical” numbers {n} that have an unusually small or large number of factors in a certain sense, but let us ignore this technicality for this discussion. The contribution of {S(\alpha)} for “major arc” {\alpha} can be treated by standard techniques (and is the source of the main term {P_{k,l,h}(\log X) X}; the main difficulty comes from treating the contribution of “minor arc” {\alpha}.

In our previous paper on bounded multiplicative functions, we used Plancherel’s theorem to estimate the global {L^2} norm {\int_{{\bf R}/{\bf Z}} |S(\alpha)|^2\ d\alpha}, and then also used the Katai-Bourgain-Sarnak-Ziegler orthogonality criterion to control local {L^2} norms {\int_I |S(\alpha)|^2\ d\alpha}, where {I} was a minor arc interval of length about {1/H}, and these two estimates together were sufficient to get a good bound on correlations by an application of Hölder’s inequality. For {d_k}, it is more convenient to use Dirichlet series methods (and Ramaré-type factorisations of such Dirichlet series) to control local {L^2} norms on minor arcs, in the spirit of the proof of the Matomaki-Radziwill theorem; a key point is to develop “log-free” mean value theorems for Dirichlet series associated to functions such as {d_k}, so as not to wipe out the (rather small) savings one will get over the trivial bound from this method. On the other hand, the global {L^2} bound will definitely be unusable, because the {\ell^2} sum {\sum_{n \leq X} d_k(n)^2} has too many unwanted factors of {\log X}. Fortunately, we can substitute this global {L^2} bound with a “large values” bound that controls expressions such as

\displaystyle  \sum_{i=1}^J \int_{I_i} |S(\alpha)|^2\ d\alpha

for a moderate number of disjoint intervals {I_1,\dots,I_J}, with a bound that is slightly better (for {J} a medium-sized power of {\log X}) than what one would have obtained by bounding each integral {\int_{I_i} |S(\alpha)|^2\ d\alpha} separately. (One needs to save more than {J^{1/2}} for the argument to work; we end up saving a factor of about {J^{3/4}}.) This large values estimate is probably the most novel contribution of the paper. After taking the Fourier transform, matters basically reduce to getting a good estimate for

\displaystyle  \sum_{i=1}^J (\int_X^{2X} |\sum_{x \leq n \leq x+H} d_k(n) e(\alpha_i n)|^2\ dx)^{1/2},

where {\alpha_i} is the midpoint of {I_i}; thus we need some upper bound on the large local Fourier coefficients of {d_k}. These coefficients are difficult to calculate directly, but, in the spirit of a paper of Ben Green and myself, we can try to replace {d_k} by a more tractable and “pseudorandom” majorant {\tilde d_k} for which the local Fourier coefficients are computable (on average). After a standard duality argument, one ends up having to control expressions such as

\displaystyle  |\sum_{x \leq n \leq x+H} \tilde d_k(n) e((\alpha_i -\alpha_{i'}) n)|

after various averaging in the {x, i,i'} parameters. These local Fourier coefficients of {\tilde d_k} turn out to be small on average unless {\alpha_i -\alpha_{i'}} is “major arc”. One then is left with a mostly combinatorial problem of trying to bound how often this major arc scenario occurs. This is very close to a computation in the previously mentioned paper of Ben and myself; there is a technical wrinkle in that the {\alpha_i} are not as well separated as they were in my paper with Ben, but it turns out that one can modify the arguments in that paper to still obtain a satisfactory estimate in this case (after first grouping nearby frequencies {\alpha_i} together, and modifying the duality argument accordingly).

I have just uploaded to the arXiv the paper “An inverse theorem for an inequality of Kneser“, submitted to a special issue of the Proceedings of the Steklov Institute of Mathematics in honour of Sergei Konyagin. It concerns an inequality of Kneser discussed previously in this blog, namely that

\displaystyle \mu(A+B) \geq \min(\mu(A)+\mu(B), 1) \ \ \ \ \ (1)

whenever {A,B} are compact non-empty subsets of a compact connected additive group {G} with probability Haar measure {\mu}.  (A later result of Kemperman extended this inequality to the nonabelian case.) This inequality is non-trivial in the regime

\displaystyle \mu(A), \mu(B), 1- \mu(A)-\mu(B) > 0. \ \ \ \ \ (2)

The connectedness of {G} is essential, otherwise one could form counterexamples involving proper subgroups of {G} of positive measure. In the blog post, I indicated how this inequality (together with a more “robust” strengthening of it) could be deduced from submodularity inequalities such as

\displaystyle \mu( (A_1 \cup A_2) + B) + \mu( (A_1 \cap A_2) + B)

\displaystyle \leq \mu(A_1+B) + \mu(A_2+B) \ \ \ \ \ (3)

which in turn easily follows from the identity {(A_1 \cup A_2) + B = (A_1+B) \cup (A_2+B)} and the inclusion {(A_1 \cap A_2) + B \subset (A_1 +B) \cap (A_2+B)}, combined with the inclusion-exclusion formula.

In the non-trivial regime (2), equality can be attained in (1), for instance by taking {G} to be the unit circle {G = {\bf R}/{\bf Z}} and {A,B} to be arcs in that circle (obeying (2)). A bit more generally, if {G} is an arbitrary connected compact abelian group and {\xi: G \rightarrow {\bf R}/{\bf Z}} is a non-trivial character (i.e., a continuous homomorphism), then {\xi} must be surjective (as {{\bf R}/{\bf Z}} has no non-trivial connected subgroups), and one can take {A = \xi^{-1}(I)} and {B = \xi^{-1}(J)} for some arcs {I,J} in that circle (again choosing the measures of these arcs to obey (2)). The main result of this paper is an inverse theorem that asserts that this is the only way in which equality can occur in (1) (assuming (2)); furthermore, if (1) is close to being satisfied with equality and (2) holds, then {A,B} must be close (in measure) to an example of the above form {A = \xi^{-1}(I), B = \xi^{-1}(J)}. Actually, for technical reasons (and for the applications we have in mind), it is important to establish an inverse theorem not just for (1), but for the more robust version mentioned earlier (in which the sumset {A+B} is replaced by the partial sumset {A +_\varepsilon B} consisting of “popular” sums).

Roughly speaking, the idea is as follows. Let us informally call {(A,B)} a critical pair if (2) holds and the inequality (1) (or more precisely, a robust version of this inequality) is almost obeyed with equality. The notion of a critical pair obeys some useful closure properties. Firstly, it is symmetric in {A,B}, and invariant with respect to translation of either {A} or {B}. Furthermore, from the submodularity inequality (3), one can show that if {(A_1,B)} and {(A_2,B)} are critical pairs (with {\mu(A_1 \cap A_2)} and {1 - \mu(A_1 \cup A_2) - \mu(B)} positive), then {(A_1 \cap A_2,B)} and {(A_1 \cup A_2, B)} are also critical pairs. (Note that this is consistent with the claim that critical pairs only occur when {A,B} come from arcs of a circle.) Similarly, from associativity {(A+B)+C = A+(B+C)}, one can show that if {(A,B)} and {(A+B,C)} are critical pairs, then so are {(B,C)} and {(A,B+C)}.

One can combine these closure properties to obtain further ones. For instance, suppose {A,B} is such that {\mu(A+B)  0}. Then (cheating a little bit), one can show that {(A+B,C)} is also a critical pair, basically because {A+B} is the union of the {A+b}, {b \in B}, the {(A+b,C)} are all critical pairs, and the {A+b} all intersect each other. This argument doesn’t quite work as stated because one has to apply the closure property under union an uncountable number of times, but it turns out that if one works with the robust version of sumsets and uses a random sampling argument to approximate {A+B} by the union of finitely many of the {A+b}, then the argument can be made to work.

Using all of these closure properties, it turns out that one can start with an arbitrary critical pair {(A,B)} and end up with a small set {C} such that {(A,C)} and {(kC,C)} are also critical pairs for all {1 \leq k \leq 10^4} (say), where {kC} is the {k}-fold sumset of {C}. (Intuitively, if {A,B} are thought of as secretly coming from the pullback of arcs {I,J} by some character {\xi}, then {C} should be the pullback of a much shorter arc by the same character.) In particular, {C} exhibits linear growth, in that {\mu(kC) = k\mu(C)} for all {1 \leq k \leq 10^4}. One can now use standard technology from inverse sumset theory to show first that {C} has a very large Fourier coefficient (and thus is biased with respect to some character {\xi}), and secondly that {C} is in fact almost of the form {C = \xi^{-1}(K)} for some arc {K}, from which it is not difficult to conclude similar statements for {A} and {B} and thus finish the proof of the inverse theorem.

In order to make the above argument rigorous, one has to be more precise about what the modifier “almost” means in the definition of a critical pair. I chose to do this in the language of “cheap” nonstandard analysis (aka asymptotic analysis), as discussed in this previous blog post; one could also have used the full-strength version of nonstandard analysis, but this does not seem to convey any substantial advantages. (One can also work in a more traditional “non-asymptotic” framework, but this requires one to keep much more careful account of various small error terms and leads to a messier argument.)


[Update, Nov 15: Corrected the attribution of the inequality (1) to Kneser instead of Kemperman.  Thanks to John Griesmer for pointing out the error.]

Joni Teräväinen and I have just uploaded to the arXiv our paper “Odd order cases of the logarithmically averaged Chowla conjecture“, submitted to J. Numb. Thy. Bordeaux. This paper gives an alternate route to one of the main results of our previous paper, and more specifically reproves the asymptotic

\displaystyle \sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = o(\log x) \ \ \ \ \ (1)


for all odd {k} and all integers {h_1,\dots,h_k} (that is to say, all the odd order cases of the logarithmically averaged Chowla conjecture). Our previous argument relies heavily on some deep ergodic theory results of Bergelson-Host-Kra, Leibman, and Le (and was applicable to more general multiplicative functions than the Liouville function {\lambda}); here we give a shorter proof that avoids ergodic theory (but instead requires the Gowers uniformity of the (W-tricked) von Mangoldt function, established in several papers of Ben Green, Tamar Ziegler, and myself). The proof follows the lines sketched in the previous blog post. In principle, due to the avoidance of ergodic theory, the arguments here have a greater chance to be made quantitative; however, at present the known bounds on the Gowers uniformity of the von Mangoldt function are qualitative, except at the {U^2} level, which is unfortunate since the first non-trivial odd case {k=3} requires quantitative control on the {U^3} level. (But it may be possible to make the Gowers uniformity bounds for {U^3} quantitative if one assumes GRH, although when one puts everything together, the actual decay rate obtained in (1) is likely to be poor.)

Apoorva Khare and I have updated our paper “On the sign patterns of entrywise positivity preservers in fixed dimension“, announced at this post from last month. The quantitative results are now sharpened using a new monotonicity property of ratios {s_{\lambda}(u)/s_{\mu}(u)} of Schur polynomials, namely that such ratios are monotone non-decreasing in each coordinate of {u} if {u} is in the positive orthant, and the partition {\lambda} is larger than that of {\mu}. (This monotonicity was also independently observed by Rachid Ait-Haddou, using the theory of blossoms.) In the revised version of the paper we give two proofs of this monotonicity. The first relies on a deep positivity result of Lam, Postnikov, and Pylyavskyy, which uses a representation-theoretic positivity result of Haiman to show that the polynomial combination

\displaystyle s_{(\lambda \wedge \nu) / (\mu \wedge \rho)} s_{(\lambda \vee \nu) / (\mu \vee \rho)} - s_{\lambda/\mu} s_{\nu/\rho} \ \ \ \ \ (1)

of skew-Schur polynomials is Schur-positive for any partitions {\lambda,\mu,\nu,\rho} (using the convention that the skew-Schur polynomial {s_{\lambda/\mu}} vanishes if {\mu} is not contained in {\lambda}, and where {\lambda \wedge \nu} and {\lambda \vee \nu} denotes the pointwise min and max of {\lambda} and {\nu} respectively). It is fairly easy to derive the monotonicity of {s_\lambda(u)/s_\mu(u)} from this, by using the expansion

\displaystyle s_\lambda(u_1,\dots, u_n) = \sum_k u_1^k s_{\lambda/(k)}(u_2,\dots,u_n)

of Schur polynomials into skew-Schur polynomials (as was done in this previous post).

The second proof of monotonicity avoids representation theory by a more elementary argument establishing the weaker claim that the above expression (1) is non-negative on the positive orthant. In fact we prove a more general determinantal log-supermodularity claim which may be of independent interest:

Theorem 1 Let {A} be any {n \times n} totally positive matrix (thus, every minor has a non-negative determinant). Then for any {k}-tuples {I_1,I_2,J_1,J_2} of increasing elements of {\{1,\dots,n\}}, one has

\displaystyle \det( A_{I_1 \wedge I_2, J_1 \wedge J_2} ) \det( A_{I_1 \vee I_2, J_1 \vee J_2} ) - \det(A_{I_1,J_1}) \det(A_{I_2,J_2}) \geq 0

where {A_{I,J}} denotes the {k \times k} minor formed from the rows in {I} and columns in {J}.

For instance, if {A} is the matrix

\displaystyle A = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}

for some real numbers {a,\dots,p}, one has

\displaystyle a h - de\geq 0

(corresponding to the case {k=1}, {I_1 = (1), I_2 = (2), J_1 = (4), J_2 = (1)}), or

\displaystyle \det \begin{pmatrix} a & c \\ i & k \end{pmatrix} \det \begin{pmatrix} f & h \\ n & p \end{pmatrix} - \det \begin{pmatrix} e & h \\ i & l \end{pmatrix} \det \begin{pmatrix} b & c \\ n & o \end{pmatrix} \geq 0

(corresponding to the case {k=2}, {I_1 = (2,3)}, {I_2 = (1,4)}, {J_1 = (1,4)}, {J_2 = (2,3)}). It turns out that this claim can be proven relatively easy by an induction argument, relying on the Dodgson and Karlin identities from this previous post; the difficulties are largely notational in nature. Combining this result with the Jacobi-Trudi identity for skew-Schur polynomials (discussed in this previous post) gives the non-negativity of (1); it can also be used to directly establish the monotonicity of ratios {s_\lambda(u)/s_\mu(u)} by applying the theorem to a generalised Vandermonde matrix.

(Log-supermodularity also arises as the natural hypothesis for the FKG inequality, though I do not know of any interesting application of the FKG inequality in this current setting.)

Apoorva Khare and I have just uploaded to the arXiv our paper “On the sign patterns of entrywise positivity preservers in fixed dimension“. This paper explores the relationship between positive definiteness of Hermitian matrices, and entrywise operations on these matrices. The starting point for this theory is the Schur product theorem, which asserts that if {A = (a_{ij})_{1 \leq i,j \leq N}} and {B = (b_{ij})_{1 \leq i,j \leq N}} are two {N \times N} Hermitian matrices that are positive semi-definite, then their Hadamard product

\displaystyle  A \circ B := (a_{ij} b_{ij})_{1 \leq i,j \leq N}

is also positive semi-definite. (One should caution that the Hadamard product is not the same as the usual matrix product.) To prove this theorem, first observe that the claim is easy when {A = {\bf u} {\bf u}^*} and {B = {\bf v} {\bf v}^*} are rank one positive semi-definite matrices, since in this case {A \circ B = ({\bf u} \circ {\bf v}) ({\bf u} \circ {\bf v})^*} is also a rank one positive semi-definite matrix. The general case then follows by noting from the spectral theorem that a general positive semi-definite matrix can be expressed as a non-negative linear combination of rank one positive semi-definite matrices, and using the bilinearity of the Hadamard product and the fact that the set of positive semi-definite matrices form a convex cone. A modification of this argument also lets one replace “positive semi-definite” by “positive definite” in the statement of the Schur product theorem.

One corollary of the Schur product theorem is that any polynomial {P(z) = c_0 + c_1 z + \dots + c_d z^d} with non-negative coefficients {c_n \geq 0} is entrywise positivity preserving on the space {{\mathbb P}_N({\bf C})} of {N \times N} positive semi-definite Hermitian matrices, in the sense that for any matrix {A = (a_{ij})_{1 \leq i,j \leq N}} in {{\mathbb P}_N({\bf C})}, the entrywise application

\displaystyle  P[A] := (P(a_{ij}))_{1 \leq i,j \leq N}

of {P} to {A} is also positive semi-definite. (As before, one should caution that {P[A]} is not the application {P(A)} of {P} to {A} by the usual functional calculus.) Indeed, one can expand

\displaystyle  P[A] = c_0 A^{\circ 0} + c_1 A^{\circ 1} + \dots + c_d A^{\circ d},

where {A^{\circ i}} is the Hadamard product of {i} copies of {A}, and the claim now follows from the Schur product theorem and the fact that {{\mathbb P}_N({\bf C})} is a convex cone.

A slight variant of this argument, already observed by Pólya and Szegö in 1925, shows that if {I} is any subset of {{\bf C}} and

\displaystyle  f(z) = \sum_{n=0}^\infty c_n z^n \ \ \ \ \ (1)

is a power series with non-negative coefficients {c_n \geq 0} that is absolutely and uniformly convergent on {I}, then {f} will be entrywise positivity preserving on the set {{\mathbb P}_N(I)} of positive definite matrices with entries in {I}. (In the case that {I} is of the form {I = [0,\rho]}, such functions are precisely the absolutely monotonic functions on {I}.)

In the work of Schoenberg and of Rudin, we have a converse: if {f: (-1,1) \rightarrow {\bf C}} is a function that is entrywise positivity preserving on {{\mathbb P}_N((-1,1))} for all {N}, then it must be of the form (1) with {c_n \geq 0}. Variants of this result, with {(-1,1)} replaced by other domains, appear in the work of Horn, Vasudeva, and Guillot-Khare-Rajaratnam.

This gives a satisfactory classification of functions {f} that are entrywise positivity preservers in all dimensions {N} simultaneously. However, the question remains as to what happens if one fixes the dimension {N}, in which case one may have a larger class of entrywise positivity preservers. For instance, in the trivial case {N=1}, a function would be entrywise positivity preserving on {{\mathbb P}_1((0,\rho))} if and only if {f} is non-negative on {I}. For higher {N}, there is a necessary condition of Horn (refined slightly by Guillot-Khare-Rajaratnam) which asserts (at least in the case of smooth {f}) that all derivatives of {f} at zero up to {(N-1)^{th}} order must be non-negative in order for {f} to be entrywise positivity preserving on {{\mathbb P}_N((0,\rho))} for some {0 < \rho < \infty}. In particular, if {f} is of the form (1), then {c_0,\dots,c_{N-1}} must be non-negative. In fact, a stronger assertion can be made, namely that the first {N} non-zero coefficients in (1) (if they exist) must be positive, or equivalently any negative term in (1) must be preceded (though not necessarily immediately) by at least {N} positive terms. If {f} is of the form (1) is entrywise positivity preserving on the larger set {{\mathbb P}_N((0,+\infty))}, one can furthermore show that any negative term in (1) must also be followed (though not necessarily immediately) by at least {N} positive terms.

The main result of this paper is that these sign conditions are the only constraints for entrywise positivity preserving power series. More precisely:

Theorem 1 For each {n}, let {\epsilon_n \in \{-1,0,+1\}} be a sign.

  • Suppose that any negative sign {\epsilon_M = -1} is preceded by at least {N} positive signs (thus there exists {0 \leq n_0 < \dots < n_{N-1}< M} with {\epsilon_{n_0} = \dots = \epsilon_{n_{N-1}} = +1}). Then, for any {0 < \rho < \infty}, there exists a convergent power series (1) on {(0,\rho)}, with each {c_n} having the sign of {\epsilon_n}, which is entrywise positivity preserving on {{\bf P}_N((0,\rho))}.
  • Suppose in addition that any negative sign {\epsilon_M = -1} is followed by at least {N} positive signs (thus there exists {M < n_N < \dots < n_{2N-1}} with {\epsilon_{n_N} = \dots = \epsilon_{n_{2N-1}} = +1}). Then there exists a convergent power series (1) on {(0,+\infty)}, with each {c_n} having the sign of {\epsilon_n}, which is entrywise positivity preserving on {{\bf P}_N((0,+\infty))}.

One can ask the same question with {(0,\rho)} or {(0,+\infty)} replaced by other domains such as {(-\rho,\rho)}, or the complex disk {D(0,\rho)}, but it turns out that there are far fewer entrywise positivity preserving functions in those cases basically because of the non-trivial zeroes of Schur polynomials in these ranges; see the paper for further discussion. We also have some quantitative bounds on how negative some of the coefficients can be compared to the positive coefficients, but they are a bit technical to state here.

The heart of the proofs of these results is an analysis of the determinants {\mathrm{det} P[ {\bf u} {\bf u}^* ]} of polynomials {P} applied entrywise to rank one matrices {uu^*}; the positivity of these determinants can be used (together with a continuity argument) to establish the positive definiteness of {P[uu^*]} for various ranges of {P} and {u}. Using the Cauchy-Binet formula, one can rewrite such determinants as linear combinations of squares of magnitudes of generalised Vandermonde determinants

\displaystyle  \mathrm{det}( u_i^{n_j} )_{1 \leq i,j \leq N},

where {{\bf u} = (u_1,\dots,u_N)} and the signs of the coefficients in the linear combination are determined by the signs of the coefficients of {P}. The task is then to find upper and lower bounds for the magnitudes of such generalised Vandermonde determinants. These determinants oscillate in sign, which makes the problem look difficult; however, an algebraic miracle intervenes, namely the factorisation

\displaystyle  \mathrm{det}( u_i^{n_j} )_{1 \leq i,j \leq N} = \pm V({\bf u}) s_\lambda({\bf u})

of the generalised Vandermonde determinant into the ordinary Vandermonde determinant

\displaystyle  V({\bf u}) = \prod_{1 \leq i < j \leq N} (u_j - u_i)

and a Schur polynomial {s_\lambda} applied to {{\bf u}}, where the weight {\lambda} of the Schur polynomial is determined by {n_1,\dots,n_N} in a simple fashion. The problem then boils down to obtaining upper and lower bounds for these Schur polynomials. Because we are restricting attention to matrices taking values in {(0,\rho)} or {(0,+\infty)}, the entries of {{\bf u}} can be taken to be non-negative. One can then take advantage of the total positivity of the Schur polynomials to compare these polynomials with a monomial, at which point one can obtain good criteria for {P[A]} to be positive definite when {A} is a rank one positive definite matrix {A = {\bf u} {\bf u}^*}.

If we allow the exponents {n_1,\dots,n_N} to be real numbers rather than integers (thus replacing polynomials or power series by Pusieux series or Hahn series), then we lose the above algebraic miracle, but we can replace it with a geometric miracle, namely the Harish-Chandra-Itzykson-Zuber identity, which I discussed in this previous blog post. This factors the above generalised Vandermonde determinant as the product of the ordinary Vandermonde determinant and an integral of a positive quantity over the orthogonal group, which one can again compare with a monomial after some fairly elementary estimates.

It remains to understand what happens for more general positive semi-definite matrices {A}. Here we use a trick of FitzGerald and Horn to amplify the rank one case to the general case, by expressing a general positive semi-definite matrix {A} as a linear combination of a rank one matrix {{\bf u} {\bf u}^*} and another positive semi-definite matrix {B} that vanishes on the last row and column (and is thus effectively a positive definite {N-1 \times N-1} matrix). Using the fundamental theorem of calculus to continuously deform the rank one matrix {{\bf u} {\bf u}^*} to {A} in the direction {B}, one can then obtain positivity results for {P[A]} from positivity results for {P[{\bf u} {\bf u}^*]} combined with an induction hypothesis on {N}.

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of logarithmically averaged correlations of multiplicative functions, with applications to the Chowla and Elliott conjectures“, submitted to Duke Mathematical Journal. This paper builds upon my previous paper in which I introduced an “entropy decrement method” to prove the two-point (logarithmically averaged) cases of the Chowla and Elliott conjectures. A bit more specifically, I showed that

\displaystyle  \lim_{m \rightarrow \infty} \frac{1}{\log \omega_m} \sum_{x_m/\omega_m \leq n \leq x_m} \frac{g_0(n+h_0) g_1(n+h_1)}{n} = 0

whenever {1 \leq \omega_m \leq x_m} were sequences going to infinity, {h_0,h_1} were distinct integers, and {g_0,g_1: {\bf N} \rightarrow {\bf C}} were {1}-bounded multiplicative functions which were non-pretentious in the sense that

\displaystyle  \liminf_{X \rightarrow \infty} \inf_{|t_j| \leq X} \sum_{p \leq X} \frac{1-\mathrm{Re}( g_j(p) \overline{\chi_j}(p) p^{it_j})}{p} = \infty \ \ \ \ \ (1)

for all Dirichlet characters {\chi_j} and for {j=0,1}. Thus, for instance, one had the logarithmically averaged two-point Chowla conjecture

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+h)}{n} = o(\log x)

for fixed any non-zero {h}, where {\lambda} was the Liouville function.

One would certainly like to extend these results to higher order correlations than the two-point correlations. This looks to be difficult (though perhaps not completely impossible if one allows for logarithmic averaging): in a previous paper I showed that achieving this in the context of the Liouville function would be equivalent to resolving the logarithmically averaged Sarnak conjecture, as well as establishing logarithmically averaged local Gowers uniformity of the Liouville function. However, in this paper we are able to avoid having to resolve these difficult conjectures to obtain partial results towards the (logarithmically averaged) Chowla and Elliott conjecture. For the Chowla conjecture, we can obtain all odd order correlations, in that

\displaystyle  \sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = o(\log x) \ \ \ \ \ (2)

for all odd {k} and all integers {h_1,\dots,h_k} (which, in the odd order case, are no longer required to be distinct). (Superficially, this looks like we have resolved “half of the cases” of the logarithmically averaged Chowla conjecture; but it seems the odd order correlations are significantly easier than the even order ones. For instance, because of the Katai-Bourgain-Sarnak-Ziegler criterion, one can basically deduce the odd order cases of (2) from the even order cases (after allowing for some dilations in the argument {n}).

For the more general Elliott conjecture, we can show that

\displaystyle  \lim_{m \rightarrow \infty} \frac{1}{\log \omega_m} \sum_{x_m/\omega_m \leq n \leq x_m} \frac{g_1(n+h_1) \dots g_k(n+h_k)}{n} = 0

for any {k}, any integers {h_1,\dots,h_k} and any bounded multiplicative functions {g_1,\dots,g_k}, unless the product {g_1 \dots g_k} weakly pretends to be a Dirichlet character {\chi} in the sense that

\displaystyle  \sum_{p \leq X} \frac{1 - \hbox{Re}( g_1 \dots g_k(p) \overline{\chi}(p)}{p} = o(\log\log X).

This can be seen to imply (2) as a special case. Even when {g_1,\dots,g_k} does pretend to be a Dirichlet character {\chi}, we can still say something: if the limits

\displaystyle  f(a) := \lim_{m \rightarrow \infty} \frac{1}{\log \omega_m} \sum_{x_m/\omega_m \leq n \leq x_m} \frac{g_1(n+ah_1) \dots g_k(n+ah_k)}{n}

exist for each {a \in {\bf Z}} (which can be guaranteed if we pass to a suitable subsequence), then {f} is the uniform limit of periodic functions {f_i}, each of which is {\chi}isotypic in the sense that {f_i(ab) = f_i(a) \chi(b)} whenever {a,b} are integers with {b} coprime to the periods of {\chi} and {f_i}. This does not pin down the value of any single correlation {f(a)}, but does put significant constraints on how these correlations may vary with {a}.

Among other things, this allows us to show that all {16} possible length four sign patterns {(\lambda(n+1),\dots,\lambda(n+4)) \in \{-1,+1\}^4} of the Liouville function occur with positive density, and all {65} possible length four sign patterns {(\mu(n+1),\dots,\mu(n+4)) \in \{-1,0,+1\}^4 \backslash \{-1,+1\}^4} occur with the conjectured logarithmic density. (In a previous paper with Matomaki and Radziwill, we obtained comparable results for length three patterns of Liouville and length two patterns of Möbius.)

To describe the argument, let us focus for simplicity on the case of the Liouville correlations

\displaystyle  f(a) := \lim_{X \rightarrow \infty} \frac{1}{\log X} \sum_{n \leq X} \frac{\lambda(n) \lambda(n+a) \dots \lambda(n+(k-1)a)}{n}, \ \ \ \ \ (3)

assuming for sake of discussion that all limits exist. (In the paper, we instead use the device of generalised limits, as discussed in this previous post.) The idea is to combine together two rather different ways to control this function {f}. The first proceeds by the entropy decrement method mentioned earlier, which roughly speaking works as follows. Firstly, we pick a prime {p} and observe that {\lambda(pn)=-\lambda(n)} for any {n}, which allows us to rewrite (3) as

\displaystyle  (-1)^k f(a) = \lim_{X \rightarrow \infty} \frac{1}{\log X}

\displaystyle  \sum_{n \leq X} \frac{\lambda(pn) \lambda(pn+ap) \dots \lambda(pn+(k-1)ap)}{n}.

Making the change of variables {n' = pn}, we obtain

\displaystyle  (-1)^k f(a) = \lim_{X \rightarrow \infty} \frac{1}{\log X}

\displaystyle \sum_{n' \leq pX} \frac{\lambda(n') \lambda(n'+ap) \dots \lambda(n'+(k-1)ap)}{n'} p 1_{p|n'}.

The difference between {n' \leq pX} and {n' \leq X} is negligible in the limit (here is where we crucially rely on the log-averaging), hence

\displaystyle  (-1)^k f(a) = \lim_{X \rightarrow \infty} \frac{1}{\log X} \sum_{n \leq X} \frac{\lambda(n) \lambda(n+ap) \dots \lambda(n+(k-1)ap)}{n} p 1_{p|n}

and thus by (3) we have

\displaystyle  (-1)^k f(a) = f(ap) + \lim_{X \rightarrow \infty} \frac{1}{\log X}

\displaystyle \sum_{n \leq X} \frac{\lambda(n) \lambda(n+ap) \dots \lambda(n+(k-1)ap)}{n} (p 1_{p|n}-1).

The entropy decrement argument can be used to show that the latter limit is small for most {p} (roughly speaking, this is because the factors {p 1_{p|n}-1} behave like independent random variables as {p} varies, so that concentration of measure results such as Hoeffding’s inequality can apply, after using entropy inequalities to decouple somewhat these random variables from the {\lambda} factors). We thus obtain the approximate isotopy property

\displaystyle  (-1)^k f(a) \approx f(ap) \ \ \ \ \ (4)

for most {a} and {p}.

On the other hand, by the Furstenberg correspondence principle (as discussed in these previous posts), it is possible to express {f(a)} as a multiple correlation

\displaystyle  f(a) = \int_X g(x) g(T^a x) \dots g(T^{(k-1)a} x)\ d\mu(x)

for some probability space {(X,\mu)} equipped with a measure-preserving invertible map {T: X \rightarrow X}. Using results of Bergelson-Host-Kra, Leibman, and Le, this allows us to obtain a decomposition of the form

\displaystyle  f(a) = f_1(a) + f_2(a) \ \ \ \ \ (5)

where {f_1} is a nilsequence, and {f_2} goes to zero in density (even along the primes, or constant multiples of the primes). The original work of Bergelson-Host-Kra required ergodicity on {X}, which is very definitely a hypothesis that is not available here; however, the later work of Leibman removed this hypothesis, and the work of Le refined the control on {f_1} so that one still has good control when restricting to primes, or constant multiples of primes.

Ignoring the small error {f_2(a)}, we can now combine (5) to conclude that

\displaystyle  f(a) \approx (-1)^k f_1(ap).

Using the equidistribution theory of nilsequences (as developed in this previous paper of Ben Green and myself), one can break up {f_1} further into a periodic piece {f_0} and an “irrational” or “minor arc” piece {f_3}. The contribution of the minor arc piece {f_3} can be shown to mostly cancel itself out after dilating by primes {p} and averaging, thanks to Vinogradov-type bilinear sum estimates (transferred to the primes). So we end up with

\displaystyle  f(a) \approx (-1)^k f_0(ap),

which already shows (heuristically, at least) the claim that {f} can be approximated by periodic functions {f_0} which are isotopic in the sense that

\displaystyle  f_0(a) \approx (-1)^k f_0(ap).

But if {k} is odd, one can use Dirichlet’s theorem on primes in arithmetic progressions to restrict to primes {p} that are {1} modulo the period of {f_0}, and conclude now that {f_0} vanishes identically, which (heuristically, at least) gives (2).

The same sort of argument works to give the more general bounds on correlations of bounded multiplicative functions. But for the specific task of proving (2), we initially used a slightly different argument that avoids using the ergodic theory machinery of Bergelson-Host-Kra, Leibman, and Le, but replaces it instead with the Gowers uniformity norm theory used to count linear equations in primes. Basically, by averaging (4) in {p} using the “{W}-trick”, as well as known facts about the Gowers uniformity of the von Mangoldt function, one can obtain an approximation of the form

\displaystyle  (-1)^k f(a) \approx {\bf E}_{b: (b,W)=1} f(ab)

where {b} ranges over a large range of integers coprime to some primorial {W = \prod_{p \leq w} p}. On the other hand, by iterating (4) we have

\displaystyle  f(a) \approx f(apq)

for most semiprimes {pq}, and by again averaging over semiprimes one can obtain an approximation of the form

\displaystyle  f(a) \approx {\bf E}_{b: (b,W)=1} f(ab).

For {k} odd, one can combine the two approximations to conclude that {f(a)=0}. (This argument is not given in the current paper, but we plan to detail it in a subsequent one.)

I’ve just uploaded to the arXiv my paper “On the universality of the incompressible Euler equation on compact manifolds“, submitted to Discrete and Continuous Dynamical Systems. This is a variant of my recent paper on the universality of potential well dynamics, but instead of trying to embed dynamical systems into a potential well {\partial_{tt} u = -\nabla V(u)}, here we try to embed dynamical systems into the incompressible Euler equations

\displaystyle  \partial_t u + \nabla_u u = - \mathrm{grad}_g p \ \ \ \ \ (1)

\displaystyle  \mathrm{div}_g u = 0

on a Riemannian manifold {(M,g)}. (One is particularly interested in the case of flat manifolds {M}, particularly {{\bf R}^3} or {({\bf R}/{\bf Z})^3}, but for the main result of this paper it is essential that one is permitted to consider curved manifolds.) This system, first studied by Ebin and Marsden, is the natural generalisation of the usual incompressible Euler equations to curved space; it can be viewed as the formal geodesic flow equation on the infinite-dimensional manifold of volume-preserving diffeomorphisms on {M} (see this previous post for a discussion of this in the flat space case).

The Euler equations can be viewed as a nonlinear equation in which the nonlinearity is a quadratic function of the velocity field {u}. It is thus natural to compare the Euler equations with quadratic ODE of the form

\displaystyle  \partial_t y = B(y,y) \ \ \ \ \ (2)

where {y: {\bf R} \rightarrow {\bf R}^n} is the unknown solution, and {B: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}^n} is a bilinear map, which we may assume without loss of generality to be symmetric. One can ask whether such an ODE may be linearly embedded into the Euler equations on some Riemannian manifold {(M,g)}, which means that there is an injective linear map {U: {\bf R}^n \rightarrow \Gamma(TM)} from {{\bf R}^n} to smooth vector fields on {M}, as well as a bilinear map {P: {\bf R}^n \times {\bf R}^n \rightarrow C^\infty(M)} to smooth scalar fields on {M}, such that the map {y \mapsto (U(y), P(y,y))} takes solutions to (2) to solutions to (1), or equivalently that

\displaystyle  U(B(y,y)) + \nabla_{U(y)} U(y) = - \mathrm{grad}_g P(y,y)

\displaystyle  \mathrm{div}_g U(y) = 0

for all {y \in {\bf R}^n}.

For simplicity let us restrict {M} to be compact. There is an obvious necessary condition for this embeddability to occur, which comes from energy conservation law for the Euler equations; unpacking everything, this implies that the bilinear form {B} in (2) has to obey a cancellation condition

\displaystyle  \langle B(y,y), y \rangle = 0 \ \ \ \ \ (3)

for some positive definite inner product {\langle, \rangle: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}} on {{\bf R}^n}. The main result of the paper is the converse to this statement: if {B} is a symmetric bilinear form obeying a cancellation condition (3), then it is possible to embed the equations (2) into the Euler equations (1) on some Riemannian manifold {(M,g)}; the catch is that this manifold will depend on the form {B} and on the dimension {n} (in fact in the construction I have, {M} is given explicitly as {SO(n) \times ({\bf R}/{\bf Z})^{n+1}}, with a funny metric on it that depends on {B}).

As a consequence, any finite dimensional portion of the usual “dyadic shell models” used as simplified toy models of the Euler equation, can actually be embedded into a genuine Euler equation, albeit on a high-dimensional and curved manifold. This includes portions of the self-similar “machine” I used in a previous paper to establish finite time blowup for an averaged version of the Navier-Stokes (or Euler) equations. Unfortunately, the result in this paper does not apply to infinite-dimensional ODE, so I cannot yet establish finite time blowup for the Euler equations on a (well-chosen) manifold. It does not seem so far beyond the realm of possibility, though, that this could be done in the relatively near future. In particular, the result here suggests that one could construct something resembling a universal Turing machine within an Euler flow on a manifold, which was one ingredient I would need to engineer such a finite time blowup.

The proof of the main theorem proceeds by an “elimination of variables” strategy that was used in some of my previous papers in this area, though in this particular case the Nash embedding theorem (or variants thereof) are not required. The first step is to lessen the dependence on the metric {g} by partially reformulating the Euler equations (1) in terms of the covelocity {g \cdot u} (which is a {1}-form) instead of the velocity {u}. Using the freedom to modify the dimension of the underlying manifold {M}, one can also decouple the metric {g} from the volume form that is used to obtain the divergence-free condition. At this point the metric can be eliminated, with a certain positive definiteness condition between the velocity and covelocity taking its place. After a substantial amount of trial and error (motivated by some “two-and-a-half-dimensional” reductions of the three-dimensional Euler equations, and also by playing around with a number of variants of the classic “separation of variables” strategy), I eventually found an ansatz for the velocity and covelocity that automatically solved most of the components of the Euler equations (as well as most of the positive definiteness requirements), as long as one could find a number of scalar fields that obeyed a certain nonlinear system of transport equations, and also obeyed a positive definiteness condition. Here I was stuck for a bit because the system I ended up with was overdetermined – more equations than unknowns. After trying a number of special cases I eventually found a solution to the transport system on the sphere, except that the scalar functions sometimes degenerated and so the positive definiteness property I wanted was only obeyed with positive semi-definiteness. I tried for some time to perturb this example into a strictly positive definite solution before eventually working out that this was not possible. Finally I had the brainwave to lift the solution from the sphere to an even more symmetric space, and this quickly led to the final solution of the problem, using the special orthogonal group rather than the sphere as the underlying domain. The solution ended up being rather simple in form, but it is still somewhat miraculous to me that it exists at all; in retrospect, given the overdetermined nature of the problem, relying on a large amount of symmetry to cut down the number of equations was basically the only hope.

I’ve just uploaded to the arXiv my paper “On the universality of potential well dynamics“, submitted to Dynamics of PDE. This is a spinoff from my previous paper on blowup of nonlinear wave equations, inspired by some conversations with Sungjin Oh. Here we focus mainly on the zero-dimensional case of such equations, namely the potential well equation

\displaystyle  \partial_{tt} u = - (\nabla F)(u) \ \ \ \ \ (1)

for a particle {u: {\bf R} \rightarrow {\bf R}^m} trapped in a potential well with potential {F: {\bf R}^m \rightarrow {\bf R}}, with {F(z) \rightarrow +\infty} as {z \rightarrow \infty}. This ODE always admits global solutions from arbitrary initial positions {u(0)} and initial velocities {\partial_t u(0)}, thanks to conservation of the Hamiltonian {\frac{1}{2} |\partial_t u|^2 + F(u)}. As this Hamiltonian is coercive (in that its level sets are compact), solutions to this equation are always almost periodic. On the other hand, as can already be seen using the harmonic oscillator {\partial_{tt} u = - k^2 u} (and direct sums of this system), this equation can generate periodic solutions, as well as quasiperiodic solutions.

All quasiperiodic motions are almost periodic. However, there are many examples of dynamical systems that admit solutions that are almost periodic but not quasiperiodic. So one can pose the question: are the dynamics of potential wells universal in the sense that they can capture all almost periodic solutions?

A precise question can be phrased as follows. Let {M} be a compact manifold, and let {X} be a smooth vector field on {M}; to avoid degeneracies, let us take {X} to be non-singular in the sense that it is everywhere non-vanishing. Then the trajectories of the first-order ODE

\displaystyle  \partial_t u = X(u) \ \ \ \ \ (2)

for {u: {\bf R} \rightarrow M} are always global and almost periodic. Can we then find a (coercive) potential {F: {\bf R}^m \rightarrow {\bf R}} for some {m}, as well as a smooth embedding {\phi: M \rightarrow {\bf R}^m}, such that every solution {u} to (2) pushes forward under {\phi} to a solution to (1)? (Actually, for technical reasons it is preferable to map into the phase space {{\bf R}^m \times {\bf R}^m}, rather than position space {{\bf R}^m}, but let us ignore this detail for this discussion.)

It turns out that the answer is no; there is a very specific obstruction. Given a pair {(M,X)} as above, define a strongly adapted {1}-form to be a {1}-form {\phi} on {M} such that {\phi(X)} is pointwise positive, and the Lie derivative {{\mathcal L}_X \phi} is an exact {1}-form. We then have

Theorem 1 A smooth compact non-singular dynamics {(M,X)} can be embedded smoothly in a potential well system if and only if it admits a strongly adapted {1}-form.

For the “only if” direction, the key point is that potential wells (viewed as a Hamiltonian flow on the phase space {{\bf R}^m \times {\bf R}^m}) admit a strongly adapted {1}-form, namely the canonical {1}-form {p dq}, whose Lie derivative is the derivative {dL} of the Lagrangian {L := \frac{1}{2} |\partial_t u|^2 - F(u)} and is thus exact. The converse “if” direction is mainly a consequence of the Nash embedding theorem, and follows the arguments used in my previous paper.

Interestingly, the same obstruction also works for potential wells in a more general Riemannian manifold than {{\bf R}^m}, or for nonlinear wave equations with a potential; combining the two, the obstruction is also present for wave maps with a potential.

It is then natural to ask whether this obstruction is non-trivial, in the sense that there are at least some examples of dynamics {(M,X)} that do not support strongly adapted {1}-forms (and hence cannot be modeled smoothly by the dynamics of a potential well, nonlinear wave equation, or wave maps). I posed this question on MathOverflow, and Robert Bryant provided a very nice construction, showing that the vector field {(\sin(2\pi x), \cos(2\pi x))} on the {2}-torus {({\bf R}/{\bf Z})^2} had no strongly adapted {1}-forms, and hence the dynamics of this vector field cannot be smoothly reproduced by a potential well, nonlinear wave equation, or wave map:

On the other hand, the suspension of any diffeomorphism does support a strongly adapted {1}-form (the derivative {dt} of the time coordinate), and using this and the previous theorem I was able to embed a universal Turing machine into a potential well. In particular, there are flows for an explicitly describable potential well whose trajectories have behavior that is undecidable using the usual ZFC axioms of set theory! So potential well dynamics are “effectively” universal, despite the presence of the aforementioned obstruction.

In my previous work on blowup for Navier-Stokes like equations, I speculated that if one could somehow replicate a universal Turing machine within the Euler equations, one could use this machine to create a “von Neumann machine” that replicated smaller versions of itself, which on iteration would lead to a finite time blowup. Now that such a mechanism is present in nonlinear wave equations, it is tempting to try to make this scheme work in that setting. Of course, in my previous paper I had already demonstrated finite time blowup, at least in a three-dimensional setting, but that was a relatively simple discretely self-similar blowup in which no computation occurred. This more complicated blowup scheme would be significantly more effort to set up, but would be proof-of-concept that the same scheme would in principle be possible for the Navier-Stokes equations, assuming somehow that one can embed a universal Turing machine into the Euler equations. (But I’m still hopelessly stuck on how to accomplish this latter task…)