
I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values“, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the {3x+1} conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let {{\bf N}+1 = \{1,2,\dots\}} denote the positive integers (with {{\bf N} =\{0,1,2,\dots\}} the natural numbers), and let {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} be the map defined by setting {\mathrm{Col}(N)} equal to {3N+1} when {N} is odd and {N/2} when {N} is even. Let {\mathrm{Col}_{\min}(N) := \inf_{n \in {\bf N}} \mathrm{Col}^n(N)} be the minimal element of the Collatz orbit {N, \mathrm{Col}(N), \mathrm{Col}^2(N),\dots}. Then we have

Conjecture 1 (Collatz conjecture) One has {\mathrm{Col}_{\min}(N)=1} for all {N \in {\bf N}+1}.
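
As a quick illustration (this is just a sanity check, not part of the paper; the function name is ad hoc), one can verify Conjecture 1 for small {N} with a few lines of Python:

```python
def collatz_min(N, max_steps=10**6):
    """Minimum value attained by the Collatz orbit of N (we stop once the orbit reaches 1)."""
    best = n = N
    for _ in range(max_steps):
        if n == 1:
            return 1
        n = 3 * n + 1 if n % 2 == 1 else n // 2
        best = min(best, n)
    return best  # orbit did not reach 1 within max_steps

# Conjecture 1 for all N up to 10^4 (numerical verifications in the literature go far beyond this range):
assert all(collatz_min(N) == 1 for N in range(1, 10**4))
```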

Establishing the conjecture for all {N} remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” {N} in some sense. For instance, it is a result of Krasikov and Lagarias that

\displaystyle \# \{ N \leq x: \mathrm{Col}_{\min}(N) = 1 \} \gg x^{0.84}

for all sufficiently large {x}. In another direction, it was shown by Terras that for almost all {N} (in the sense of natural density), one has {\mathrm{Col}_{\min}(N) < N}. This was then improved by Allouche to {\mathrm{Col}_{\min}(N) < N^\theta} for almost all {N} and any fixed {\theta > 0.869}, and extended later by Korec to cover all {\theta > \frac{\log 3}{\log 4} \approx 0.7924}. In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2 Let {f: {\bf N}+1 \rightarrow {\bf R}} be any function with {\lim_{N \rightarrow \infty} f(N) = +\infty}. Then we have {\mathrm{Col}_{\min}(N) < f(N)} for almost all {N} (in the sense of logarithmic density).

Thus for instance one has {\mathrm{Col}_{\min}(N) < \log\log\log\log N} for almost all {N} (in the sense of logarithmic density).

The difficulty here is that one usually only expects to establish “local-in-time” results that control the evolution {\mathrm{Col}^n(N)} for times {n} that only get as large as a small multiple {c \log N} of {\log N}; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get {\mathrm{Col}^n(N)} all the way down to {f(N)} one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state {N=O(1)}.

However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an invariant measure for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if {\theta > \frac{\log 3}{\log 4}} and one picks at random an integer {N} from a large interval {[1,x]}, then in most cases, the orbit of {N} will eventually move into the interval {[1,x^{\theta}]}. Similarly, if one picks an integer {M} at random from {[1,x^\theta]}, then in most cases, the orbit of {M} will eventually move into {[1,x^{\theta^2}]}. It is then tempting to concatenate the two statements and conclude that for most {N} in {[1,x]}, the orbit will eventually move into {[1,x^{\theta^2}]}. Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn {N \in [1,x]} reaches {[1,x^\theta]}, the distribution of the final value is unlikely to be close to being uniformly distributed on {[1,x^\theta]}, and in particular could potentially concentrate almost entirely in the exceptional set of {M \in [1,x^\theta]} that do not make it into {[1,x^{\theta^2}]}. The point here is that the uniform measure on {[1,x]} is not transported by Collatz dynamics to anything resembling the uniform measure on {[1,x^\theta]}.

So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the Syracuse map {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1}, defined on the odd numbers {2{\bf N}+1 = \{1,3,5,\dots\}} by setting {\mathrm{Syr}(N) = (3N+1)/2^a}, where {2^a} is the largest power of {2} that divides {3N+1}. (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication by {3} at each iteration step, which makes the map better behaved when performing “{3}-adic” analysis.)
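
Concretely, here is a small Python sketch (not from the paper; the helper names are ad hoc) of the Syracuse map, together with a check that iterating it reproduces exactly the odd values visited by the Collatz orbit:

```python
def syracuse(n):
    """One step of the Syracuse map on an odd n: divide 3n+1 by the largest power of 2 dividing it."""
    assert n % 2 == 1
    m = 3 * n + 1
    while m % 2 == 0:
        m //= 2
    return m

def collatz_odd_values(n):
    """Odd values visited by the Collatz orbit of an odd n, stopping at 1."""
    values = [n]
    while n != 1:
        n = 3 * n + 1 if n % 2 == 1 else n // 2
        if n % 2 == 1:
            values.append(n)
    return values

# The Syracuse orbit is exactly the sequence of odd values of the Collatz orbit:
for start in range(3, 2000, 2):
    orbit = [start]
    while orbit[-1] != 1:
        orbit.append(syracuse(orbit[-1]))
    assert orbit == collatz_odd_values(start)
```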

When viewed {3}-adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, {\mathrm{Syr}(N)} is never divisible by {3}. A little less obviously, {\mathrm{Syr}(N)} is twice as likely to equal {2} mod {3} as it is to equal {1} mod {3}. This is because for a randomly chosen odd {\mathbf{N}}, the number of times {\mathbf{a}} that {2} divides {3\mathbf{N}+1} can be seen to have a geometric distribution of mean {2} – it equals any given value {a \in{\bf N}+1} with probability {2^{-a}}. Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of {3}. For instance, one can compute that for large random odd {\mathbf{N}}, {\mathrm{Syr}^2(\mathbf{N}) \hbox{ mod } 9} will take the residue classes {0,1,2,3,4,5,6,7,8 \hbox{ mod } 9} with probabilities

\displaystyle 0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}

respectively. More generally, for any {n}, {\mathrm{Syr}^n(N) \hbox{ mod } 3^n} will be distributed according to the law of a random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on {{\bf Z}/3^n{\bf Z}} that we call a Syracuse random variable, and can be described explicitly as

\displaystyle \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = 2^{-\mathbf{a}_1} + 3^1 2^{-\mathbf{a}_1-\mathbf{a}_2} + \dots + 3^{n-1} 2^{-\mathbf{a}_1-\dots-\mathbf{a}_n} \hbox{ mod } 3^n, \ \ \ \ \ (1)

 

where {\mathbf{a}_1,\dots,\mathbf{a}_n} are iid copies of a geometric random variable of mean {2}.
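
To get a feel for (1), here is a quick Monte Carlo sketch (purely illustrative, not from the paper; parameter choices are arbitrary) that samples the Syracuse random variable on {{\bf Z}/9{\bf Z}} directly from the formula and compares the empirical frequencies with the probabilities displayed above:

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 10**6
q = 9                                          # 3^n with n = 2
inv2 = pow(2, -1, q)                           # 2^{-1} mod 9, i.e. 5
order = 6                                      # multiplicative order of inv2 mod 9
powers = np.array([pow(inv2, r, q) for r in range(order)])

# a_1, a_2 iid geometric of mean 2: P(a = k) = 2^{-k} for k = 1, 2, ...
a = rng.geometric(0.5, size=(trials, 2))
c = np.cumsum(a, axis=1)                       # the exponents a_1 and a_1 + a_2
# Syrac(Z/9Z) = 2^{-a_1} + 3 * 2^{-a_1-a_2} mod 9, as in formula (1)
samples = (powers[c[:, 0] % order] + 3 * powers[c[:, 1] % order]) % q

freq = np.bincount(samples, minlength=q) / trials
print(np.round(freq, 3))                                            # empirical frequencies
print(np.round(np.array([0, 8, 16, 0, 11, 4, 0, 2, 22]) / 63, 3))   # the probabilities displayed above
```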

In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this {3}-adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} to construct such a measure, but only if these random variables stabilise in the limit {n \rightarrow \infty} in a certain total variation sense. More precisely, in the paper we establish the estimate

\displaystyle \sum_{Y \in {\bf Z}/3^n{\bf Z}} | \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z})=Y) - 3^{m-n} \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^m{\bf Z})=Y \hbox{ mod } 3^m)| \ \ \ \ \ (2)

 

\displaystyle \ll_A m^{-A}

for any {1 \leq m \leq n} and any {A > 0}. This type of stabilisation is plausible from entropy heuristics – the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} of geometric random variables that generates {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} has Shannon entropy {n \log 4}, which is significantly larger than the total entropy {n \log 3} of the uniform distribution on {{\bf Z}/3^n{\bf Z}}, so we expect a lot of “mixing” and “collision” to occur when converting the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} to {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}; these heuristics can be supported by numerics (which I was able to work out up to about {n=10} before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.

A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

\displaystyle 2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n}

are all distinct as {(a_1,\dots,a_n)} vary over tuples in {({\bf N}+1)^n}. Unfortunately, the process of reducing mod {3^n} creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

\displaystyle 2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n \ \ \ \ \ (3)

 

are mostly distinct for “typical” {a_1,\dots,a_m} (as drawn using the geometric distribution) as long as {m} is a bit smaller than {\frac{\log 3}{\log 4} n} (basically because the rational number appearing in (3) then typically takes a form like {M/2^{2m}} with {M} an integer between {0} and {3^n}). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on { \mathbf{Syrac}({\bf Z}/3^n{\bf Z})} (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of {{\bf Z}/3^n{\bf Z}} of density less than {n^{-C}} for some large absolute constant {C>0}). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}, and more precisely in showing that

\displaystyle \mathbb{E} e^{-2\pi i \xi \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) / 3^n} \ll_A n^{-A} \ \ \ \ \ (4)

 

for any {A > 0} and any {\xi \in {\bf Z}/3^n{\bf Z}} that is not divisible by {3}.

If the random variable (1) were the sum of independent terms, one could express this characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic function does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming {n} is even for sake of discussion) as

\displaystyle (2^{\mathbf{a}_2} + 3) 2^{-\mathbf{b}_1} + (2^{\mathbf{a}_4}+3) 3^2 2^{-\mathbf{b}_1-\mathbf{b}_2} + \dots

\displaystyle + (2^{\mathbf{a}_n}+3) 3^{n-2} 2^{-\mathbf{b}_1-\dots-\mathbf{b}_{n/2}} \hbox{ mod } 3^n

where {\mathbf{b}_j := \mathbf{a}_{2j-1} + \mathbf{a}_{2j}}. The point here is that after conditioning on the {\mathbf{b}_1,\dots,\mathbf{b}_{n/2}} to be fixed, the random variables {\mathbf{a}_2, \mathbf{a}_4,\dots,\mathbf{a}_n} remain independent (though the distribution of each {\mathbf{a}_{2j}} depends on the value that we conditioned {\mathbf{b}_j} to), and so the above expression is a conditional sum of independent random variables. This lets one express the characteristic function of (1) as an averaged Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

\displaystyle \frac{\xi 3^{2j-2} (2^{-\mathbf{b}_1-\dots-\mathbf{b}_j+1} \mod 3^n)}{3^n}

is not close to an integer for a moderately large number ({\gg A \log n}, to be precise) of indices {j = 1,\dots,n/2}. (Actually, for technical reasons we have to also restrict to those {j} for which {\mathbf{b}_j=3}, but let us ignore this detail here.) To put it another way, if we let {B} denote the set of pairs {(j,l)} for which

\displaystyle \frac{\xi 3^{2j-2} (2^{-l+1} \mod 3^n)}{3^n},

is close to an integer, then we have to show that (with overwhelming probability) the random walk

\displaystyle (1,\mathbf{b}_1), (2, \mathbf{b}_1 + \mathbf{b}_2), \dots, (n/2, \mathbf{b}_1+\dots+\mathbf{b}_{n/2})

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of {B}.

A little bit of elementary number theory and combinatorics allows one to describe the set {B} as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of {B} after passing through any given such triangle, and it then becomes relatively easy to show that the renewal process usually has the required number of points outside of {B}. The most difficult case is when the renewal process passes through a particularly large triangle in {B}. However, it turns out that large triangles enjoy particularly good separation properties, and in particular after passing through a large triangle one is likely to encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of {B} that one can finish the proof of (4), and thus Theorem 2.

 

William Banks, Kevin Ford, and I have just uploaded to the arXiv our paper “Large prime gaps and probabilistic models“, submitted to Inventiones. In this paper we introduce a random model to help understand the connection between two well known conjectures regarding the primes {{\mathcal P} := \{2,3,5,\dots\}}, the Cramér conjecture and the Hardy-Littlewood conjecture:

Conjecture 1 (Cramér conjecture) If {x} is a large number, then the largest prime gap {G_{\mathcal P}(x) := \sup_{p_n, p_{n+1} \leq x} p_{n+1}-p_n} in {[1,x]} is of size {\asymp \log^2 x}. (Granville refines this conjecture to {\gtrsim \xi \log^2 x}, where {\xi := 2e^{-\gamma} = 1.1229\dots}. Here we use the asymptotic notation {X \gtrsim Y} for {X \geq (1-o(1)) Y}, {X \sim Y} for {X \gtrsim Y \gtrsim X}, {X \gg Y} for {X \geq C^{-1} Y}, and {X \asymp Y} for {X \gg Y \gg X}.)

Conjecture 2 (Hardy-Littlewood conjecture) If {\mathcal{H} := \{h_1,\dots,h_k\}} are fixed distinct integers, then the number of numbers {n \in [1,x]} with {n+h_1,\dots,n+h_k} all prime is {({\mathfrak S}(\mathcal{H}) +o(1)) \int_2^x \frac{dt}{\log^k t}} as {x \rightarrow \infty}, where the singular series {{\mathfrak S}(\mathcal{H})} is defined by the formula

\displaystyle  {\mathfrak S}(\mathcal{H}) := \prod_p \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p}\right) (1-\frac{1}{p})^{-k}.

(One can view these conjectures as modern versions of two of the classical Landau problems, namely Legendre’s conjecture and the twin prime conjecture respectively.)
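
For concreteness, the singular series in Conjecture 2 is easy to evaluate numerically by truncating the Euler product; here is a small Python sketch (not from the paper; the helper names and the truncation point are ad hoc). For the twin prime tuple {\{0,2\}} one should recover the familiar twin prime constant {\approx 1.3203}, while an inadmissible tuple such as {\{0,1\}} gives zero.

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def singular_series(H, prime_bound=10**6):
    """Truncation of the Hardy-Littlewood singular series for the tuple H."""
    k = len(H)
    S = 1.0
    for p in primes_up_to(prime_bound):
        S *= (1 - len({h % p for h in H}) / p) * (1 - 1 / p) ** (-k)
    return S

print(singular_series((0, 2)))   # roughly 1.3203... (the twin prime constant)
print(singular_series((0, 1)))   # 0.0: the p = 2 factor vanishes, as n and n+1 cannot both be odd primes
```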

A well known connection between the Hardy-Littlewood conjecture and prime gaps was made by Gallagher. Among other things, Gallagher showed that if the Hardy-Littlewood conjecture was true, then the prime gaps {p_{n+1}-p_n} with {n \leq x} were asymptotically distributed according to an exponential distribution of mean {\log x}, in the sense that

\displaystyle  | \{ n: p_n \leq x, p_{n+1}-p_n \geq \lambda \log x \}| = (e^{-\lambda}+o(1)) \frac{x}{\log x} \ \ \ \ \ (1)

as {x \rightarrow \infty} for any fixed {\lambda \geq 0}. Roughly speaking, the way this is established is by using the Hardy-Littlewood conjecture to control the mean values of {\binom{|{\mathcal P} \cap (p_n, p_n + \lambda \log x)|}{k}} for fixed {k,\lambda}, where {p_n} ranges over the primes in {[1,x]}. The relevance of these quantities arises from the Bonferroni inequalities (or “Brun pure sieve“), which can be formulated as the assertion that

\displaystyle  1_{N=0} \leq \sum_{k=0}^K (-1)^k \binom{N}{k}

when {K} is even and

\displaystyle  1_{N=0} \geq \sum_{k=0}^K (-1)^k \binom{N}{k}

when {K} is odd, for any natural number {N}; setting {N := |{\mathcal P} \cap (p_n, p_n + \lambda \log x)|} and taking means, one then gets upper and lower bounds for the probability that the interval {(p_n, p_n + \lambda \log x)} is free of primes. The most difficult step is to control the mean values of the singular series {{\mathfrak S}(\mathcal{H})} as {{\mathcal H}} ranges over {k}-tuples in a fixed interval such as {[0, \lambda \log x]}.
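
The Bonferroni inequalities themselves are elementary; the following tiny Python check (illustrative only) verifies the two displayed inequalities for small values of {N} and {K}:

```python
from math import comb

# 1_{N=0} <= sum_{k<=K} (-1)^k C(N,k) for even K, and >= for odd K
for N in range(0, 15):
    indicator = 1 if N == 0 else 0
    for K in range(0, 10):
        partial = sum((-1) ** k * comb(N, k) for k in range(K + 1))
        assert (indicator <= partial) if K % 2 == 0 else (indicator >= partial)
print("Bonferroni inequalities verified for N < 15, K < 10")
```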

Heuristically, if one extrapolates the asymptotic (1) to the regime {\lambda \asymp \log x}, one is then led to Cramér’s conjecture, since the right-hand side of (1) falls below {1} when {\lambda} is significantly larger than {\log x}. However, this is not a rigorous derivation of Cramér’s conjecture from the Hardy-Littlewood conjecture, since Gallagher’s computations only establish (1) for fixed choices of {\lambda}, which is only enough to establish the far weaker bound {G_{\mathcal P}(x) / \log x \rightarrow \infty}, which was already known (see this previous paper for a discussion of the best known unconditional lower bounds on {G_{\mathcal P}(x)}). An inspection of the argument shows that if one wished to extend (1) to parameter choices {\lambda} that were allowed to grow with {x}, then one would need as input a stronger version of the Hardy-Littlewood conjecture in which the length {k} of the tuple {{\mathcal H} = (h_1,\dots,h_k)}, as well as the magnitudes of the shifts {h_1,\dots,h_k}, were also allowed to grow with {x}. Our initial objective in this project was then to quantify exactly what strengthening of the Hardy-Littlewood conjecture would be needed to rigorously imply Cramer’s conjecture. The precise results are technical, but roughly we show results of the following form:

Theorem 3 (Large gaps from Hardy-Littlewood, rough statement)

  • If the Hardy-Littlewood conjecture is uniformly true for {k}-tuples of length {k \ll \frac{\log x}{\log\log x}}, and with shifts {h_1,\dots,h_k} of size {O( \log^2 x )}, with a power savings in the error term, then {G_{\mathcal P}(x) \gg \frac{\log^2 x}{\log\log x}}.
  • If the Hardy-Littlewood conjecture is “true on average” for {k}-tuples of length {k \ll \frac{y}{\log x}} and shifts {h_1,\dots,h_k} of size {y} for all {\log x \leq y \leq \log^2 x \log\log x}, with a power savings in the error term, then {G_{\mathcal P}(x) \gg \log^2 x}.

In particular, we can recover Cramér’s conjecture given a sufficiently powerful version of the Hardy-Littlewood conjecture “on the average”.

Our proof of this theorem proceeds more or less along the same lines as Gallagher’s calculation, but now with {k} allowed to grow slowly with {x}. Again, the main difficulty is to accurately estimate average values of the singular series {{\mathfrak S}({\mathcal H})}. Here we found it useful to switch to a probabilistic interpretation of this series. For technical reasons it is convenient to work with a truncated, unnormalised version

\displaystyle  V_{\mathcal H}(z) := \prod_{p \leq z} \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p} \right)

of the singular series, for a suitable cutoff {z}; it turns out that when studying prime tuples of size {t}, the most convenient cutoff {z(t)} is the “Pólya magic cutoff“, defined as the largest prime for which

\displaystyle  \prod_{p \leq z(t)}(1-\frac{1}{p}) \geq \frac{1}{\log t} \ \ \ \ \ (2)

(this is well defined for {t \geq e^2}); by Mertens’ theorem, we have {z(t) \sim t^{1/e^\gamma}}. One can interpret {V_{\mathcal H}(z)} probabilistically as

\displaystyle  V_{\mathcal H}(z) = \mathbf{P}( {\mathcal H} \subset \mathcal{S}_z )

where {\mathcal{S}_z \subset {\bf Z}} is the randomly sifted set of integers formed by removing one residue class {a_p \hbox{ mod } p} uniformly at random for each prime {p \leq z}. The Hardy-Littlewood conjecture can be viewed as an assertion that the primes {{\mathcal P}} behave in some approximate statistical sense like the random sifted set {\mathcal{S}_z}, and one can prove the above theorem by using the Bonferroni inequalities both for the primes {{\mathcal P}} and for the random sifted set, and comparing the two (using an even {K} for the sifted set and an odd {K} for the primes in order to be able to combine the two together to get a useful bound).
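
As a small numerical aside (not from the paper; the helper names and parameter choices are ad hoc), one can compute the Pólya magic cutoff {z(t)} from (2) directly and compare it with the Mertens prediction {t^{1/e^\gamma}}:

```python
import math

def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def polya_cutoff(t, prime_bound=10**6):
    """Largest prime z with prod_{p <= z} (1 - 1/p) >= 1/log t, as in (2)."""
    target = 1.0 / math.log(t)
    product, z = 1.0, None
    for p in primes_up_to(prime_bound):
        product *= 1 - 1 / p
        if product < target:
            break
        z = p
    return z

gamma = 0.5772156649015329
for t in (10**4, 10**6, 10**8):
    print(t, polya_cutoff(t), round(t ** math.exp(-gamma)))   # exact cutoff vs the (rough) Mertens prediction
```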

The proof of Theorem 3 ended up not using any properties of the set of primes {{\mathcal P}} other than that this set obeyed some form of the Hardy-Littlewood conjectures; the theorem remains true (with suitable notational changes) if this set were replaced by any other set. In order to convince ourselves that our theorem was not vacuous due to our version of the Hardy-Littlewood conjecture being too strong to be true, we then started exploring the question of coming up with random models of {{\mathcal P}} which obeyed various versions of the Hardy-Littlewood and Cramér conjectures.

This line of inquiry was started by Cramér, who introduced what we now call the Cramér random model {{\mathcal C}} of the primes, in which each natural number {n \geq 3} is selected for membership in {{\mathcal C}} with an independent probability of {1/\log n}. This model matches the primes well in some respects; for instance, it almost surely obeys the “Riemann hypothesis”

\displaystyle  | {\mathcal C} \cap [1,x] | = \int_2^x \frac{dt}{\log t} + O( x^{1/2+o(1)})

and Cramér also showed that the largest gap {G_{\mathcal C}(x)} was almost surely {\sim \log^2 x}. On the other hand, it does not obey the Hardy-Littlewood conjecture; more precisely, it obeys a simplified variant of that conjecture in which the singular series {{\mathfrak S}({\mathcal H})} is absent.

Granville proposed a refinement {{\mathcal G}} to Cramér’s random model {{\mathcal C}} in which one first sieves out (in each dyadic interval {[x,2x]}) all residue classes {0 \hbox{ mod } p} for {p \leq A} for a certain threshold {A = \log^{1-o(1)} x = o(\log x)}, and then places each surviving natural number {n} in {{\mathcal G}} with an independent probability {\frac{1}{\log n} \prod_{p \leq A} (1-\frac{1}{p})^{-1}}. One can verify that this model obeys the Hardy-Littlewood conjectures, and Granville showed that the largest gap {G_{\mathcal G}(x)} in this model was almost surely {\gtrsim \xi \log^2 x}, leading to his conjecture that this bound also was true for the primes. (Interestingly, this conjecture is not yet borne out by numerics; calculations of prime gaps up to {10^{18}}, for instance, have shown that {G_{\mathcal P}(x)} never exceeds {0.9206 \log^2 x} in this range. This is not necessarily a conflict, however; Granville’s analysis relies on inspecting gaps in an extremely sparse region of natural numbers that are more devoid of primes than average, and this region is not well explored by existing numerics. See this previous blog post for more discussion of Granville’s argument.)

However, Granville’s model does not produce a power savings in the error term of the Hardy-Littlewood conjectures, mostly due to the need to truncate the singular series at the logarithmic cutoff {A}. After some experimentation, we were able to produce a tractable random model {{\mathcal R}} for the primes which obeyed the Hardy-Littlewood conjectures with power savings, and which reproduced Granville’s gap prediction of {\gtrsim \xi \log^2 x} (we also get an upper bound of {\lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x}} for both models, though we expect the lower bound to be closer to the truth); to us, this strengthens the case for Granville’s version of Cramér’s conjecture. The model can be described as follows. We select one residue class {a_p \hbox{ mod } p} uniformly at random for each prime {p}, and as before we let {S_z} be the sifted set of integers formed by deleting the residue classes {a_p \hbox{ mod } p} with {p \leq z}. We then set

\displaystyle  {\mathcal R} := \{ n \geq e^2: n \in S_{z(n)}\}

with {z(n)} Pólya’s magic cutoff (this is the cutoff that gives {{\mathcal R}} a density consistent with the prime number theorem or the Riemann hypothesis). As stated above, we are able to show that almost surely one has

\displaystyle  \xi \log^2 x \lesssim {\mathcal G}_{\mathcal R}(x) \lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x} \ \ \ \ \ (3)

and that the Hardy-Littlewood conjectures hold with power savings for {k} up to {\log^c x} for any fixed {c < 1} and for shifts {h_1,\dots,h_k} of size {O(\log^c x)}. This is unfortunately a tiny bit weaker than what Theorem 3 requires (which more or less corresponds to the endpoint {c=1}), although there is a variant of Theorem 3 that can use this input to produce a lower bound on gaps in the model {{\mathcal R}} (but it is weaker than the one in (3)). In fact we prove a more precise almost sure asymptotic formula for {{\mathcal G}_{\mathcal R}(x) } that involves the optimal bounds for the linear sieve (or interval sieve), in which one deletes one residue class modulo {p} from an interval {[0,y]} for all primes {p} up to a given threshold. The lower bound in (3) relates to the case of deleting the {0 \hbox{ mod } p} residue classes from {[0,y]}; the upper bound comes from the delicate analysis of the linear sieve by Iwaniec. Improving on either of the two bounds looks to be quite a difficult problem.

The probabilistic analysis of {{\mathcal R}} is somewhat more complicated than of {{\mathcal C}} or {{\mathcal G}} as there is now non-trivial coupling between the events {n \in {\mathcal R}} as {n} varies, although moment methods such as the second moment method are still viable and allow one to verify the Hardy-Littlewood conjectures by a lengthy but fairly straightforward calculation. To analyse large gaps, one has to understand the statistical behaviour of a random linear sieve in which one starts with an interval {[0,y]} and randomly deletes a residue class {a_p \hbox{ mod } p} for each prime {p} up to a given threshold. For very small {p} this is handled by the deterministic theory of the linear sieve as discussed above. For medium sized {p}, it turns out that there is good concentration of measure thanks to tools such as Bennett’s inequality or Azuma’s inequality, as one can view the sieving process as a martingale or (approximately) as a sum of independent random variables. For larger primes {p}, in which only a small number of survivors are expected to be sieved out by each residue class, a direct combinatorial calculation of all possible outcomes (involving the random graph that connects interval elements {n \in [0,y]} to primes {p} if {n} falls in the random residue class {a_p \hbox{ mod } p}) turns out to give the best results.
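
To get a feel for the random linear sieve described in the last paragraph, here is a toy Monte Carlo sketch (purely illustrative; this is the sifted-interval experiment rather than the actual model {{\mathcal R}}, and the parameter choices and helper names are arbitrary):

```python
import random

def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def random_sifted_interval(y, z, rng=random):
    """Survivors of [0, y] after deleting one uniformly random residue class mod p for each prime p <= z."""
    survivors = set(range(y + 1))
    for p in primes_up_to(z):
        a = rng.randrange(p)
        survivors -= set(range(a, y + 1, p))
    return sorted(survivors)

random.seed(0)
y, z = 10**5, 300
surv = random_sifted_interval(y, z)
gaps = [b - a for a, b in zip(surv, surv[1:])]
print(len(surv), max(gaps))    # number of survivors and the largest gap in one random instance
```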

I’ve just uploaded to the arXiv my paper “Quantitative bounds for critically bounded solutions to the Navier-Stokes equations“, submitted to the proceedings of the Linde Hall Inaugural Math Symposium. (I unfortunately had to cancel my physical attendance at this symposium for personal reasons, but was still able to contribute to the proceedings.) In recent years I have been interested in working towards establishing the existence of classical solutions for the Navier-Stokes equations

\displaystyle \partial_t u + (u \cdot \nabla) u = \Delta u - \nabla p

\displaystyle \nabla \cdot u = 0

that blow up in finite time, but this time for a change I took a look at the other side of the theory, namely the conditional regularity results for this equation. There are several such results that assert that if a certain norm of the solution stays bounded (or grows at a controlled rate), then the solution stays regular; taken in the contrapositive, they assert that if a solution blows up at a certain finite time {T_*}, then certain norms of the solution must also go to infinity. Here are some examples (not an exhaustive list) of such blowup criteria:

  • (Leray blowup criterion, 1934) If {u} blows up at a finite time {T_*}, and {3 < p \leq \infty}, then {\liminf_{t \rightarrow T_*} (T_* - t)^{\frac{1}{2}-\frac{3}{2p}} \| u(t) \|_{L^p_x({\bf R}^3)} \geq c} for an absolute constant {c>0}.
  • (Prodi-Serrin-Ladyzhenskaya blowup criterion, 1959-1967) If {u} blows up at a finite time {T_*}, and {3 < p \leq \infty}, then {\| u \|_{L^q_t L^p_x([0,T_*) \times {\bf R}^3)} =+\infty}, where {\frac{1}{q} := \frac{1}{2} - \frac{3}{2p}}.
  • (Beale-Kato-Majda blowup criterion, 1984) If {u} blows up at a finite time {T_*}, then {\| \omega \|_{L^1_t L^\infty_x([0,T_*) \times {\bf R}^3)} = +\infty}, where {\omega := \nabla \times u} is the vorticity.
  • (Kato blowup criterion, 1984) If {u} blows up at a finite time {T_*}, then {\liminf_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} \geq c} for some absolute constant {c>0}.
  • (Escauriaza-Seregin-Sverak blowup criterion, 2003) If {u} blows up at a finite time {T_*}, then {\limsup_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} = +\infty}.
  • (Seregin blowup criterion, 2012) If {u} blows up at a finite time {T_*}, then {\lim_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} = +\infty}.
  • (Phuc blowup criterion, 2015) If {u} blows up at a finite time {T_*}, then {\limsup_{t \rightarrow T_*} \|u(t) \|_{L^{3,q}_x({\bf R}^3)} = +\infty} for any {q < \infty}.
  • (Gallagher-Koch-Planchon blowup criterion, 2016) If {u} blows up at a finite time {T_*}, then {\limsup_{t \rightarrow T_*} \|u(t) \|_{\dot B_{p,q}^{-1+3/p}({\bf R}^3)} = +\infty} for any {3 < p, q < \infty}.
  • (Albritton blowup criterion, 2016) If {u} blows up at a finite time {T_*}, then {\lim_{t \rightarrow T_*} \|u(t) \|_{\dot B_{p,q}^{-1+3/p}({\bf R}^3)} = +\infty} for any {3 < p, q < \infty}.

My current paper is most closely related to the Escauriaza-Seregin-Sverak blowup criterion, which was the first to show a critical (i.e., scale-invariant, or dimensionless) spatial norm, namely {L^3_x({\bf R}^3)}, had to become large. This result now has many proofs; for instance, many of the subsequent blowup criterion results imply the Escauriaza-Seregin-Sverak one as a special case, and there are also additional proofs by Gallagher-Koch-Planchon (building on ideas of Kenig-Koch), and by Dong-Du. However, all of these proofs rely on some form of a compactness argument: given a finite time blowup, one extracts some suitable family of rescaled solutions that converges in some weak sense to a limiting solution that has some additional good properties (such as almost periodicity modulo symmetries), which one can then rule out using additional qualitative tools, such as unique continuation and backwards uniqueness theorems for parabolic heat equations. In particular, all known proofs use some version of the backwards uniqueness theorem of Escauriaza, Seregin, and Sverak. Because of this reliance on compactness, the existing proofs of the Escauriaza-Seregin-Sverak blowup criterion are qualitative, in that they do not provide any quantitative information on how fast the {\|u(t)\|_{L^3_x({\bf R}^3)}} norm will go to infinity (along a subsequence of times).

On the other hand, it is a general principle that qualitative arguments established using compactness methods ought to have quantitative analogues that replace the use of compactness by more complicated substitutes that give effective bounds; see for instance these previous blog posts for more discussion. I therefore was interested in trying to obtain a quantitative version of this blowup criterion that gave reasonably good effective bounds (in particular, my objective was to avoid truly enormous bounds such as tower-exponential or Ackermann function bounds, which often arise if one “naively” tries to make a compactness argument effective). In particular, I obtained the following triple-exponential quantitative regularity bounds:

Theorem 1 If {u} is a classical solution to Navier-Stokes on {[0,T) \times {\bf R}^3} with

\displaystyle \| u \|_{L^\infty_t L^3_x([0,T) \times {\bf R}^3)} \leq A \ \ \ \ \ (1)

 

for some {A \geq 2}, then

\displaystyle | \nabla^j u(t,x)| \leq \exp\exp\exp(A^{O(1)}) t^{-\frac{j+1}{2}}

and

\displaystyle | \nabla^j \omega(t,x)| \leq \exp\exp\exp(A^{O(1)}) t^{-\frac{j+2}{2}}

for {(t,x) \in [0,T) \times {\bf R}^3} and {j=0,1}.

As a corollary, one can now improve the Escauriaza-Seregin-Sverak blowup criterion to

\displaystyle \limsup_{t \rightarrow T_*} \frac{\|u(t) \|_{L^3_x({\bf R}^3)}}{(\log\log\log \frac{1}{T_*-t})^c} = +\infty

for some absolute constant {c>0}, which to my knowledge is the first (very slightly) supercritical blowup criterion for Navier-Stokes in the literature.

The proof uses many of the same quantitative inputs as previous arguments, most notably the Carleman inequalities used to establish unique continuation and backwards uniqueness theorems for backwards heat equations, but also some additional techniques that make the quantitative bounds more efficient. The proof focuses initially on points of concentration of the solution, which we define as points {(t_0,x_0)} where there is a frequency {N_0} for which one has the bound

\displaystyle |N_0^{-1} P_{N_0} u(t_0,x_0)| \geq A^{-C_0} \ \ \ \ \ (2)

 

for a large absolute constant {C_0}, where {P_{N_0}} is a Littlewood-Paley projection to frequencies {\sim N_0}. (This can be compared with the upper bound of {O(A)} for the quantity on the left-hand side that follows from (1).) The factor of {N_0^{-1}} normalises the left-hand side of (2) to be dimensionless (i.e., critical). The main task is to show that the dimensionless quantity {t_0 N_0^2} cannot get too large; in particular, we end up establishing a bound of the form

\displaystyle t_0 N_0^2 \lesssim \exp\exp\exp A^{O(C_0^6)}

from which the above theorem ends up following from a routine adaptation of the local well-posedness and regularity theory for Navier-Stokes.

The strategy is to show that any concentration such as (2) when {t_0 N_0^2} is large must force a significant component of the {L^3_x} norm of {u(t_0)} to also show up at many other locations than {x_0}, which eventually contradicts (1) if one can produce enough such regions of non-trivial {L^3_x} norm. (This can be viewed as a quantitative variant of the “rigidity” theorems in some of the previous proofs of the Escauriaza-Seregin-Sverak theorem that rule out solutions that exhibit too much “compactness” or “almost periodicity” in the {L^3_x} topology.) The chain of causality that leads from a concentration (2) at {(t_0,x_0)} to significant {L^3_x} norm at other regions of the time slice {t_0 \times {\bf R}^3} is somewhat involved (though simpler than the much more convoluted schemes I initially envisaged for this argument):

  1. Firstly, by using Duhamel’s formula, one can show that a concentration (2) can only occur (with {t_0 N_0^2} large) if there was also a preceding concentration

    \displaystyle |N_1^{-1} P_{N_1} u(t_1,x_1)| \geq A^{-C_0} \ \ \ \ \ (3)

     

    at some slightly previous point {(t_1,x_1)} in spacetime, with {N_1} also close to {N_0} (more precisely, we have {t_1 = t_0 - A^{-O(C_0^3)} N_0^{-2}}, {N_1 = A^{O(C_0^2)} N_0}, and {x_1 = x_0 + O( A^{O(C_0^4)} N_0^{-1})}). This can be viewed as a sort of contrapositive of a “local regularity theorem”, such as the ones established by Caffarelli, Kohn, and Nirenberg. A key point here is that the lower bound {A^{-C_0}} in the conclusion (3) is precisely the same as the lower bound in (2), so that this backwards propagation of concentration can be iterated.

  2. Iterating the previous step, one can find a sequence of concentration points

    \displaystyle |N_n^{-1} P_{N_n} u(t_n,x_n)| \geq A^{-C_0} \ \ \ \ \ (4)

     

    with the {(t_n,x_n)} propagating backwards in time; by using estimates ultimately resulting from the dissipative term in the energy identity, one can extract such a sequence in which the {t_0-t_n} increase geometrically with time, the {N_n} are comparable (up to polynomial factors in {A}) to the natural frequency scale {(t_0-t_n)^{-1/2}}, and one has {x_n = x_0 + O( |t_0-t_n|^{1/2} )}. Using the “epochs of regularity” theory that ultimately dates back to Leray, and tweaking the {t_n} slightly, one can also place the times {t_n} in intervals {I_n} (of length comparable to a small multiple of {|t_0-t_n|}) in which the solution is quite regular (in particular, {u, \nabla u, \omega, \nabla \omega} enjoy good {L^\infty_t L^\infty_x} bounds on {I_n \times {\bf R}^3}).

  3. The concentration (4) can be used to establish a lower bound for the {L^2_t L^2_x} norm of the vorticity {\omega} near {(t_n,x_n)}. As is well known, the vorticity obeys the vorticity equation

    \displaystyle \partial_t \omega = \Delta \omega - (u \cdot \nabla) \omega + (\omega \cdot \nabla) u.

    In the epoch of regularity {I_n \times {\bf R}^3}, the coefficients {u, \nabla u} of this equation obey good {L^\infty_x} bounds, allowing the machinery of Carleman estimates to come into play. Using a Carleman estimate that is used to establish unique continuation results for backwards heat equations, one can propagate this lower bound to also give lower {L^2} bounds on the vorticity (and its first derivative) in annuli of the form {\{ (t,x) \in I_n \times {\bf R}^3: R \leq |x-x_n| \leq R' \}} for various radii {R,R'}, although the lower bounds decay at a gaussian rate with {R}.

  4. Meanwhile, using an energy pigeonholing argument of Bourgain (which, in this Navier-Stokes context, is actually an enstrophy pigeonholing argument), one can locate some annuli {\{ x \in {\bf R}^3: R \leq |x-x_n| \leq R'\}} where (a slightly normalised form of) the enstrophy is small at time {t=t_n}; using a version of the localised enstrophy estimates from a previous paper of mine, one can then propagate this sort of control forward in time, obtaining an “annulus of regularity” of the form {\{ (t,x) \in [t_n,t_0] \times {\bf R}^3: R_n \leq |x-x_n| \leq R'_n\}} in which one has good estimates; in particular, one has {L^\infty_x} type bounds on {u, \nabla u, \omega, \nabla \omega} in this cylindrical annulus.
  5. By intersecting the previous epoch of regularity {I_n \times {\bf R}^3} with the above annulus of regularity, we have some lower bounds on the {L^2} norm of the vorticity (and its first derivative) in the annulus of regularity. Using a Carleman estimate first introduced by Escauriaza, Seregin, and Sverak, as well as a second application of the Carleman estimate used previously, one can then propagate this lower bound back up to time {t=t_0}, establishing a lower bound for the vorticity on the spatial annulus {\{ (t_0,x): R_n \leq |x-x_n| \leq R'_n \}}. By some basic Littlewood-Paley theory one can parlay this lower bound to a lower bound on the {L^3} norm of the velocity {u}; crucially, this lower bound is uniform in {n}.
  6. If {t_0 N_0^2} is very large (triple exponential in {A}!), one can then find enough scales {n} with disjoint {\{ (t_0,x): R_n \leq |x-x_n| \leq R'_n \}} annuli that the total lower bound on the {L^3_x} norm of {u(t_0)} provided by the above arguments is inconsistent with (1), thus establishing the claim.

The chain of causality is summarised in the following image:

[Figure: schematic diagram summarising the above chain of causality.]

It seems natural to conjecture that similar triply logarithmic improvements can be made to several of the other blowup criteria listed above, but I have not attempted to pursue this question. It seems difficult to improve the triple logarithmic factor using only the techniques here; the Bourgain pigeonholing argument inevitably costs one exponential, the Carleman inequalities cost a second, and the stacking of scales at the end to contradict the {L^3} upper bound costs the third.

 

Peter Denton, Stephen Parke, Xining Zhang, and I have just uploaded to the arXiv the short unpublished note “Eigenvectors from eigenvalues“. This note gives two proofs of a general eigenvector identity observed recently by Denton, Parke and Zhang in the course of some quantum mechanical calculations. The identity is as follows:

Theorem 1 Let {A} be an {n \times n} Hermitian matrix, with eigenvalues {\lambda_1(A),\dots,\lambda_n(A)}. Let {v_i} be a unit eigenvector corresponding to the eigenvalue {\lambda_i(A)}, and let {v_{i,j}} be the {j^{th}} component of {v_i}. Then

\displaystyle  |v_{i,j}|^2 \prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A)) = \prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j))

where {M_j} is the {n-1 \times n-1} Hermitian matrix formed by deleting the {j^{th}} row and column from {A}.

For instance, if we have

\displaystyle  A = \begin{pmatrix} a & X^* \\ X & M \end{pmatrix}

for some real number {a}, {n-1}-dimensional vector {X}, and {n-1 \times n-1} Hermitian matrix {M}, then we have

\displaystyle  |v_{i,1}|^2 = \frac{\prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M))}{\prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A))} \ \ \ \ \ (1)

assuming that the denominator is non-zero.

Once one is aware of the identity, it is not so difficult to prove it; we give two proofs, each about half a page long, one of which is based on a variant of the Cauchy-Binet formula, and the other based on properties of the adjugate matrix. But perhaps it is surprising that such a formula exists at all; one does not normally expect to learn much information about eigenvectors purely from knowledge of eigenvalues. In the random matrix theory literature, for instance in this paper of Erdos, Schlein, and Yau, or this later paper of Van Vu and myself, a related identity has been used, namely

\displaystyle  |v_{i,1}|^2 = \frac{1}{1 + \| (M-\lambda_i(A))^{-1} X \|^2}, \ \ \ \ \ (2)

but it is not immediately obvious that one can derive the former identity from the latter. (I do so below the fold; we ended up not putting this proof in the note as it was longer than the two other proofs we found. I also give two other proofs below the fold, one from a more geometric perspective and one proceeding via Cramer’s rule.) It was certainly something of a surprise to me that there is no explicit appearance of the {a,X} components of {A} in the formula (1) (though they do indirectly appear through their effect on the eigenvalues {\lambda_k(A)}; for instance from taking traces one sees that {a = \sum_{k=1}^n \lambda_k(A) - \sum_{k=1}^{n-1} \lambda_k(M)}).
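
The identity is easy to test numerically; here is a short check with numpy (illustrative only, with arbitrary choices of the indices and matrix size) comparing the two sides of the formula in Theorem 1 for a random Hermitian matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2                       # a random Hermitian matrix
lam, V = np.linalg.eigh(A)                     # eigenvalues (ascending) and unit eigenvectors (columns)

i, j = 2, 4                                    # pick an eigenvalue index i and a coordinate j
M_j = np.delete(np.delete(A, j, axis=0), j, axis=1)
mu = np.linalg.eigvalsh(M_j)                   # eigenvalues of the minor M_j

lhs = abs(V[j, i]) ** 2 * np.prod([lam[i] - lam[k] for k in range(n) if k != i])
rhs = np.prod(lam[i] - mu)
print(lhs, rhs)                                # the two sides agree up to rounding error
```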

One can get some feeling of the identity (1) by considering some special cases. Suppose for instance that {A} is a diagonal matrix with all distinct entries. The upper left entry {a} of {A} is one of the eigenvalues of {A}. If it is equal to {\lambda_i(A)}, then the eigenvalues of {M} are the other {n-1} eigenvalues of {A}, and now the left and right-hand sides of (1) are equal to {1}. At the other extreme, if {a} is equal to a different eigenvalue of {A}, then {\lambda_i(A)} now appears as an eigenvalue of {M}, and both sides of (1) now vanish. More generally, if we order the eigenvalues {\lambda_1(A) \leq \dots \leq \lambda_n(A)} and {\lambda_1(M) \leq \dots \leq \lambda_{n-1}(M)}, then the Cauchy interlacing inequalities tell us that

\displaystyle  0 \leq \lambda_i(A) - \lambda_k(M) \leq \lambda_i(A) - \lambda_k(A)

for {1 \leq k < i}, and

\displaystyle  \lambda_i(A) - \lambda_{k+1}(A) \leq \lambda_i(A) - \lambda_k(M) < 0

for {i \leq k \leq n-1}, so that the right-hand side of (1) lies between {0} and {1}, which is of course consistent with (1) as {v_i} is a unit vector. Thus the identity relates the coefficient sizes of an eigenvector with the extent to which the Cauchy interlacing inequalities are sharp.


I have just uploaded to the arXiv my paper “Sharp bounds for multilinear curved Kakeya, restriction and oscillatory integral estimates away from the endpoint“, submitted to Mathematika. In this paper I return (after more than a decade’s absence) to one of my first research interests, namely the Kakeya and restriction family of conjectures. The starting point is the following “multilinear Kakeya estimate” first established in the non-endpoint case by Bennett, Carbery, and myself, and then in the endpoint case by Guth (with further proofs and extensions by Bourgain-Guth and Carbery-Valdimarsson):

Theorem 1 (Multilinear Kakeya estimate) Let {\delta > 0} be a radius. For each {j = 1,\dots,d}, let {\mathbb{T}_j} denote a finite family of infinite tubes {T_j} in {{\bf R}^d} of radius {\delta}. Assume the following axiom:

  • (i) (Transversality) whenever {T_j \in \mathbb{T}_j} is oriented in the direction of a unit vector {n_j} for {j =1,\dots,d}, we have

    \displaystyle  \left|\bigwedge_{j=1}^d n_j\right| \geq A^{-1}

    for some {A>0}, where we use the usual Euclidean norm on the wedge product {\bigwedge^d {\bf R}^d}.

Then, for any {p \geq \frac{1}{d-1}}, one has

\displaystyle  \left\| \prod_{j=1}^d \sum_{T_j \in \mathbb{T}_j} 1_{T_j} \right\|_{L^p({\bf R}^d)} \lesssim_{A,p} \delta^{\frac{d}{p}} \prod_{j \in [d]} \# \mathbb{T}_j. \ \ \ \ \ (1)

where {L^p({\bf R}^d)} are the usual Lebesgue norms with respect to Lebesgue measure, {1_{T_j}} denotes the indicator function of {T_j}, and {\# \mathbb{T}_j} denotes the cardinality of {\mathbb{T}_j}.
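
The {d=2}, {p=1} case of (1) is elementary (two transversal infinite strips of width {2\delta} overlap in a parallelogram of area {(2\delta)^2/|n_1 \wedge n_2|}), but even that case illustrates the role of the transversality constant {A}. Here is a tiny numerical check of this trivial case (illustrative only, with arbitrary parameter choices, and of course not how the theorem is proved):

```python
import numpy as np

rng = np.random.default_rng(1)
delta, A = 1e-3, 2.0
N1, N2 = 200, 300

# Directions of the first family cluster near e_1 and of the second near e_2,
# so the transversality |n_1 /\ n_2| = |sin(angle)| stays at least 1/A.
theta1 = rng.uniform(-0.1, 0.1, N1)
theta2 = rng.uniform(np.pi / 2 - 0.1, np.pi / 2 + 0.1, N2)

# L^1 norm of the product: sum over pairs of the intersection area (2*delta)^2 / |sin(theta)|,
# which does not depend on where the (infinite) strips are placed.
s = np.abs(np.sin(theta2[None, :] - theta1[:, None]))
assert s.min() >= 1 / A
lhs = np.sum((2 * delta) ** 2 / s)

rhs = delta ** 2 * N1 * N2          # delta^{d/p} * #T_1 * #T_2 with d = 2, p = 1
print(lhs / rhs)                    # stays bounded (here by 4*A), uniformly in delta and the tube counts
```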

The original proof of this proceeded using a heat flow monotonicity method, which in my previous post I reinterpreted using a “virtual integration” concept on a fractional Cartesian product space. It turns out that this machinery is somewhat flexible, and can be used to establish some other estimates of this type. The first result of this paper is to extend the above theorem to the curved setting, in which one localises to a ball of radius {O(1)} (and sets {\delta} to be small), but allows the tubes {T_j} to be curved in a {C^2} fashion. If one runs the heat flow monotonicity argument, one now picks up some additional error terms arising from the curvature, but as the spatial scale approaches zero, the tubes become increasingly linear, and as such the error terms end up being an integrable multiple of the main term, at which point one can conclude by Gronwall’s inequality (actually for technical reasons we use a bootstrap argument instead of Gronwall). A key point in this approach is that one obtains optimal bounds (not losing factors of {\delta^{-\varepsilon}} or {\log^{O(1)} \frac{1}{\delta}}), so long as one stays away from the endpoint case {p=\frac{1}{d-1}} (which does not seem to be easily treatable by the heat flow methods). Previously, the paper of Bennett, Carbery, and myself was able to use an induction on scale argument to obtain a curved multilinear Kakeya estimate losing a factor of {\log^{O(1)} \frac{1}{\delta}} (after optimising the argument); later arguments of Bourgain-Guth and Carbery-Valdimarsson, based on algebraic topology methods, could also obtain a curved multilinear Kakeya estimate without such losses, but only in the algebraic case when the tubes were neighbourhoods of algebraic curves of bounded degree.

Perhaps more interestingly, we are also able to extend the heat flow monotonicity method to apply directly to the multilinear restriction problem, giving the following global multilinear restriction estimate:

Theorem 2 (Multilinear restriction theorem) Let {\frac{1}{d-1} < p \leq \infty} be an exponent, and let {A \geq 2} be a parameter. Let {M} be a sufficiently large natural number, depending only on {d}. For {j \in [d]}, let {U_j} be an open subset of {B^{d-1}(0,A)}, and let {h_j: U_j \rightarrow {\bf R}} be a smooth function obeying the following axioms:

  • (i) (Regularity) For each {j \in [d]} and {\xi \in U_j}, one has

    \displaystyle  |\nabla_\xi^{\otimes m} \otimes h_j(\xi)| \leq A \ \ \ \ \ (2)

    for all {1 \leq m \leq M}.

  • (ii) (Transversality) One has

    \displaystyle  \left| \bigwedge_{j \in [d]} (-\nabla_\xi h_j(\xi_j),1) \right| \geq A^{-1}

    whenever {\xi_j \in U_j} for {j \in [d]}.

Let {U_{j,1/A} \subset U_j} be the sets

\displaystyle  U_{j,1/A} := \{ \xi \in U_j: B^{d-1}(\xi,1/A) \subset U_j \}. \ \ \ \ \ (3)

Then one has

\displaystyle  \left\| \prod_{j \in [d]} {\mathcal E}_j f_j \right\|_{L^{2p}({\bf R}^d)} \leq A^{O(1)} \left(d-1-\frac{1}{p}\right)^{-O(1)} \prod_{j \in [d]} \|f_j \|_{L^2(U_{j,1/A})}

for any {f_j \in L^2(U_{j,1/A} \rightarrow {\bf C})}, {j \in [d]}, extended by zero outside of {U_{j,1/A}}, and {{\mathcal E}_j} denotes the extension operator

\displaystyle  {\mathcal E}_j f_j( x', x_d ) := \int_{U_j} e^{2\pi i (x' \xi^T + x_d h_j(\xi))} f_j(\xi)\ d\xi.

Local versions of this estimate, in which {L^{2p}({\bf R}^d)} is replaced with {L^{2p}(B^d(0,R))} for some {R \geq 2}, and one accepts a loss of the form {\log^{O(1)} R}, were already established by Bennett, Carbery, and myself using an induction on scale argument. In a later paper of Bourgain-Guth these losses were removed by “epsilon removal lemmas” to recover Theorem 2, but only in the case when all the hypersurfaces involved had curvatures bounded away from zero.

There are two main new ingredients in the proof of Theorem 2. The first is to replace the usual induction on scales scheme to establish multilinear restriction by a “ball inflation” induction on scales scheme that more closely resembles the proof of decoupling theorems. In particular, we actually prove the more general family of estimates

\displaystyle  \left\| \prod_{j \in [d]} E_{r}[{\mathcal E}_j f_j] \right\|_{L^{p}({\bf R}^d)} \leq A^{O(1)} \left(d-1 - \frac{1}{p}\right)^{-O(1)} r^{\frac{d}{p}} \prod_{j \in [d]} \| f_j \|_{L^2(U_{j,1/A})}^2

where {E_r} denotes the local energies

\displaystyle  E_{r}[f](x',x_d) := \int_{B^{d-1}(x',r)} |f(y',x_d)|^2\ dy'

(actually for technical reasons it is more convenient to use a smoother weight than the strict cutoff to the disk {B^{d-1}(x',r)}). With logarithmic losses, it is not difficult to establish this estimate by an upward induction on {r}. To avoid such losses we use the heat flow monotonicity method. Here we run into the issue that the extension operators {{\mathcal E}_j f_j} are complex-valued rather than non-negative, and thus would not be expected to obey many good monotonicity properties. However, the local energies {E_r[{\mathcal E}_j f_j]} can be expressed in terms of the magnitude squared of what is essentially the Gabor transform of {{\mathcal E}_j f_j}, and these are non-negative; furthermore, the dispersion relation associated to the extension operators {{\mathcal E}_j f_j} implies that these Gabor transforms propagate along tubes, so that the situation becomes quite similar (up to several additional lower order error terms) to that in the multilinear Kakeya problem. (This can be viewed as a continuous version of the usual wave packet decomposition method used to relate restriction and Kakeya problems, which when combined with the heat flow monotonicity method allows one to use a continuous version of induction on scales methods that do not concede any logarithmic factors.)

Finally, one can combine the curved multilinear Kakeya result with the multilinear restriction result to obtain estimates for multilinear oscillatory integrals away from the endpoint. Again, this sort of implication was already established in the previous paper of Bennett, Carbery, and myself, but the arguments there had some epsilon losses in the exponents; here we were able to run the argument more carefully and avoid these losses.

The AMS and MAA have recently published (and made available online) a collection of essays entitled “Living Proof: Stories of Resilience Along the Mathematical Journey”. Each author contributes a story of how they encountered some internal or external difficulty in advancing their mathematical career, and how they were able to deal with such difficulties. I myself have contributed one of these essays; I was initially somewhat surprised when I was approached for a contribution, as my career trajectory has been somewhat of an outlier, and I have been very fortunate not to have experienced, to the same extent, many of the obstacles that other contributors write about in this text. Nevertheless there was a turning point in my career during my graduate years, which I write about here, when I found that the improvised and poorly disciplined study habits that had been enough to get me into graduate school (thanks to an over-reliance on raw mathematical ability) were completely inadequate for the graduate qualifying exam. With a combination of an astute advisor and some sheer luck, I was able to pass the exam and finally develop a more sustainable approach to learning and doing mathematics, but it could easily have gone quite differently. (My 25-year old writeup of this examination, complete with spelling errors, may be found here.)

 

 

I was recently asked to contribute a short comment to Nature Reviews Physics, as part of a series of articles on fluid dynamics on the occasion of the 200th anniversary (this August) of the birthday of George Stokes.  My contribution is now online as “Searching for singularities in the Navier–Stokes equations“, where I discuss the global regularity problem for Navier-Stokes and my thoughts on how one could try to construct a solution that blows up in finite time via an approximately discretely self-similar “fluid computer”.  (The rest of the series does not currently seem to be available online, but I expect they will become so shortly.)

 

The Polymath15 paper “Effective approximation of heat flow evolution of the Riemann {\xi} function, and a new upper bound for the de Bruijn-Newman constant“, submitted to Research in the Mathematical Sciences, has just been uploaded to the arXiv. This paper records the mix of theoretical and computational work needed to improve the upper bound on the de Bruijn-Newman constant {\Lambda}. This constant can be defined as follows. The function

\displaystyle H_0(z) := \frac{1}{8} \xi\left(\frac{1}{2} + \frac{iz}{2}\right),

where {\xi} is the Riemann {\xi} function

\displaystyle \xi(s) := \frac{s(s-1)}{2} \pi^{-s/2} \Gamma\left(\frac{s}{2}\right) \zeta(s)

has a Fourier representation

\displaystyle H_0(z) = \int_0^\infty \Phi(u) \cos(zu)\ du

where {\Phi} is the super-exponentially decaying function

\displaystyle \Phi(u) := \sum_{n=1}^\infty (2\pi^2 n^4 e^{9u} - 3\pi n^2 e^{5u} ) \exp(-\pi n^2 e^{4u} ).
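
As a crude sanity check of these formulas (not from the paper; the truncation parameters below are arbitrary), one can evaluate {H_0} numerically by truncating {\Phi} and the integral. The value at {z=0} should come out to roughly {\xi(1/2)/8 \approx 0.062}, and the first real zero should appear near {z \approx 28.27}, twice the ordinate of the first nontrivial zero of {\zeta}.

```python
import numpy as np

def Phi(u, n_terms=20):
    """Truncation of the super-exponentially decaying function Phi(u) defined above."""
    n = np.arange(1, n_terms + 1)[:, None]
    return np.sum((2 * np.pi**2 * n**4 * np.exp(9 * u) - 3 * np.pi * n**2 * np.exp(5 * u))
                  * np.exp(-np.pi * n**2 * np.exp(4 * u)), axis=0)

def H0(z, U=2.0, steps=20001):
    """Trapezoidal quadrature of int_0^U Phi(u) cos(zu) du (the integrand is negligible past u ~ 1)."""
    u = np.linspace(0.0, U, steps)
    f = Phi(u) * np.cos(z * u)
    return float(np.sum((f[:-1] + f[1:]) * np.diff(u)) / 2)

print(H0(0.0))              # roughly 0.0622, i.e. xi(1/2)/8
print(H0(28.0), H0(28.5))   # sign change near z ~ 28.27 (twice the ordinate 14.1347... of the first zeta zero)
```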

The Riemann hypothesis is equivalent to the claim that all the zeroes of {H_0} are real. De Bruijn introduced (in different notation) the deformations

\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du

of {H_0}; one can view this as the solution to the backwards heat equation {\partial_t H_t = -\partial_{zz} H_t} starting at {H_0}. From the work of de Bruijn and of Newman, it is known that there exists a real number {\Lambda} – the de Bruijn-Newman constant – such that {H_t} has all zeroes real for {t \geq \Lambda} and has at least one non-real zero for {t < \Lambda}. In particular, the Riemann hypothesis is equivalent to the assertion {\Lambda \leq 0}. Prior to this paper, the best known bounds for this constant were

\displaystyle 0 \leq \Lambda < 1/2

with the lower bound due to Rodgers and myself, and the upper bound due to Ki, Kim, and Lee. One of the main results of the paper is to improve the upper bound to

\displaystyle \Lambda \leq 0.22. \ \ \ \ \ (1)

At a purely numerical level this gets “closer” to proving the Riemann hypothesis, but the methods of proof take as input a finite numerical verification of the Riemann hypothesis up to some given height {T} (in our paper we take {T \sim 3 \times 10^{10}}) and converts this (and some other numerical verification) to an upper bound on {\Lambda} that is of order {O(1/\log T)}. As discussed in the final section of the paper, further improvement of the numerical verification of RH would thus lead to modest improvements in the upper bound on {\Lambda}, although it does not seem likely that our methods could for instance improve the bound to below {0.1} without an infeasible amount of computation.

We now discuss the methods of proof. An existing result of de Bruijn shows that if all the zeroes of {H_{t_0}(z)} lie in the strip {\{ x+iy: |y| \leq y_0\}}, then {\Lambda \leq t_0 + \frac{1}{2} y_0^2}; we will verify this hypothesis with {t_0=y_0=0.2}, thus giving (1). Using the symmetries and the known zero-free regions, it suffices to show that

\displaystyle H_{0.2}(x+iy) \neq 0 \ \ \ \ \ (2)

whenever {x \geq 0} and {0.2 \leq y \leq 1}.

For large {x} (specifically, {x \geq 6 \times 10^{10}}), we use effective numerical approximation to {H_t(x+iy)} to establish (2), as discussed in a bit more detail below. For smaller values of {x}, the existing numerical verification of the Riemann hypothesis (we use the results of Platt) shows that

\displaystyle H_0(x+iy) \neq 0

for {0 \leq x \leq 6 \times 10^{10}} and {0.2 \leq y \leq 1}. The problem though is that this result only controls {H_t} at time {t=0} rather than the desired time {t = 0.2}. To bridge the gap we need to erect a “barrier” that, roughly speaking, verifies that

\displaystyle H_t(x+iy) \neq 0 \ \ \ \ \ (3)

for {0 \leq t \leq 0.2}, {x = 6 \times 10^{10} + O(1)}, and {0.2 \leq y \leq 1}; with a little bit of work this barrier shows that zeroes cannot sneak in from the right of the barrier to the left in order to produce counterexamples to (2) for small {x}.

To enforce this barrier, and to verify (2) for large {x}, we need to approximate {H_t(x+iy)} for positive {t}. Our starting point is the Riemann-Siegel formula, which roughly speaking is of the shape

\displaystyle H_0(x+iy) \approx B_0(x+iy) ( \sum_{n=1}^N \frac{1}{n^{\frac{1+y-ix}{2}}} + \gamma_0(x+iy) \sum_{n=1}^N \frac{n^y}{n^{\frac{1+y+ix}{2}}} )

where {N := \lfloor \sqrt{x/4\pi} \rfloor}, {B_0(x+iy)} is an explicit “gamma factor” that decays exponentially in {x}, and {\gamma_0(x+iy)} is a ratio of gamma functions that is roughly of size {(x/4\pi)^{-y/2}}. Deforming this by the heat flow gives rise to an approximation roughly of the form

\displaystyle H_t(x+iy) \approx B_t(x+iy) ( \sum_{n=1}^N \frac{b_n^t}{n^{s_*}} + \gamma_t(x+iy) \sum_{n=1}^N \frac{n^y}{n^{\overline{s_*}}} ) \ \ \ \ \ (4)

where {B_t(x+iy)} and {\gamma_t(x+iy)} are variants of {B_0(x+iy)} and {\gamma_0(x+iy)}, {b_n^t := \exp( \frac{t}{4} \log^2 n )}, and {s_*} is an exponent which is roughly {\frac{1+y-ix}{2} + \frac{t}{4} \log \frac{x}{4\pi}}. In particular, for positive values of {t}, {s_*} increases (logarithmically) as {x} increases, and the two sums in the Riemann-Siegel formula become increasingly convergent (even in the face of the slowly increasing coefficients {b_n^t}). For very large values of {x} (in the range {x \geq \exp(C/t)} for a large absolute constant {C}), the {n=1} terms of both sums dominate, and {H_t(x+iy)} begins to behave in a sinusoidal fashion, with the zeroes “freezing” into an approximate arithmetic progression on the real line much like the zeroes of the sine or cosine functions (we give some asymptotic theorems that formalise this “freezing” effect). This lets one verify (2) for extremely large values of {x} (e.g., {x \geq 10^{12}}). For slightly less large values of {x}, we first multiply the Riemann-Siegel formula by an “Euler product mollifier” to reduce some of the oscillation in the sum and make the series converge better; we also use a technical variant of the triangle inequality to improve the bounds slightly. These are sufficient to establish (2) for moderately large {x} (say {x \geq 6 \times 10^{10}}) with only a modest amount of computational effort (a few seconds after all the optimisations; on my own laptop with very crude code I was able to verify all the computations in a matter of minutes).
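To get a rough feel for the approximation (4), here is a toy sketch (again, not the paper's verified computation: the factors {B_t} and {\gamma_t} are not reproduced here, so we simply drop {B_t}, which is nonzero and hence does not affect the location of zeroes, and use the rough size {(x/4\pi)^{-y/2}} of {\gamma_0} as a stand-in for {\gamma_t}) that tabulates the bracketed expression in (4) together with the size of its last term:

import math

def bracketed_sum(t, x, y):
    """Crude evaluation of the bracketed expression in (4), with the stand-ins
    N = floor(sqrt(x/4pi)), s_* = (1+y-ix)/2 + (t/4) log(x/4pi),
    gamma_t ~ (x/4pi)^{-y/2}, and b_n^t = exp((t/4) log^2 n)."""
    N = int(math.sqrt(x / (4 * math.pi)))
    s_star = complex((1 + y) / 2 + (t / 4) * math.log(x / (4 * math.pi)), -x / 2)
    gamma_t = (x / (4 * math.pi)) ** (-y / 2)
    S1 = sum(math.exp(t / 4 * math.log(n) ** 2) / n ** s_star for n in range(1, N + 1))
    S2 = sum(n ** y / n ** s_star.conjugate() for n in range(1, N + 1))
    # size of the final term of the first sum, to illustrate the improved decay for t > 0
    last_term = math.exp(t / 4 * math.log(N) ** 2) / N ** s_star.real
    return S1 + gamma_t * S2, last_term

for x in (6e10, 1e12):
    val, last = bracketed_sum(0.2, x, 0.2)
    print(f"x = {x:.1e}: |bracket| = {abs(val):.3f}, size of last term = {last:.2e}")

The point of the printout is that for {t=0.2} the effective exponent {\mathrm{Re}(s_*)} grows with {\log x}, so the terms of the sums decay noticeably faster than in the undeformed {t=0} case.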

The most difficult computational task is the verification of the barrier (3), particularly when {t} is close to zero, where the series in (4) converge quite slowly. We first use an Euler product heuristic approximation to {H_t(x+iy)} to decide where to place the barrier, in order to make our numerical approximation to {H_t(x+iy)} as large in magnitude as possible (so that we can afford to work with a sparser set of mesh points for the numerical verification). In order to efficiently evaluate the sums in (4) for many different values of {x+iy}, we perform a Taylor expansion of the coefficients to factor the sums as combinations of other sums that do not actually depend on {x} and {y}, and so can be re-used for multiple choices of {x+iy} after a one-time computation. At the scales we work in, this computation is still quite feasible (a handful of minutes after software and hardware optimisations); if one assumes larger numerical verifications of RH and lowers {t_0} and {y_0} to optimise the value of {\Lambda} accordingly, one could in principle push the upper bound down to {\Lambda \leq 0.1}, but this would require an enormous numerical verification of RH (up to height about {4 \times 10^{21}}) and a very large distributed computing project to perform the other numerical verifications.
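The coefficient-factoring trick can be illustrated schematically as follows (a toy sketch only, with parameters chosen for readability rather than to match the actual barrier computation): to evaluate a sum of the shape {\sum_{n \leq N} b_n n^{-s_0 - \delta}} at many nearby shifts {\delta} (i.e. many nearby values of {x+iy}), one expands {n^{-\delta} = \sum_k (-\delta \log n)^k / k!} and precomputes the moment sums {A_k = \sum_{n \leq N} b_n (\log n)^k n^{-s_0}} once; each new {\delta} then costs only a short Taylor series.

import math

def precompute_moments(N, s0, t, K=12):
    """One-time computation of A_k = sum_{n<=N} b_n^t (log n)^k / n^{s0}, k = 0..K."""
    A = [0j] * (K + 1)
    for n in range(1, N + 1):
        term = math.exp(t / 4 * math.log(n) ** 2) / n ** s0
        log_n = math.log(n)
        for k in range(K + 1):
            A[k] += term
            term *= log_n
    return A

def shifted_sum(A, delta):
    """Cheap evaluation of sum_{n<=N} b_n^t / n^{s0+delta} from the precomputed moments."""
    total, power = 0j, 1 + 0j
    for k, Ak in enumerate(A):
        total += power * Ak / math.factorial(k)
        power *= -delta
    return total

# usage sketch: one precomputation, then many cheap evaluations along the barrier;
# in practice the x-range is subdivided so that |delta log N| stays small.
N, t = 1000, 0.2
s0 = complex(0.7, -3e10)          # purely illustrative base exponent
A = precompute_moments(N, s0, t)
for dx in (0.0, 0.05, 0.1):       # small horizontal shifts in x
    print(abs(shifted_sum(A, complex(0.0, -dx / 2))))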

This post can serve as the (presumably final) thread for the Polymath15 project (continuing this post), to handle any remaining discussion topics for that project.

Joni Teräväinen and I have just uploaded to the arXiv our paper “Value patterns of multiplicative functions and related sequences“, submitted to Forum of Mathematics, Sigma. This paper explores how to use recent technology on correlations of multiplicative (or nearly multiplicative) functions, such as the “entropy decrement method”, in conjunction with techniques from additive combinatorics, to establish new results on the sign patterns of functions such as the Liouville function {\lambda}. For instance, with regards to length 5 sign patterns

\displaystyle  (\lambda(n+1),\dots,\lambda(n+5)) \in \{-1,+1\}^5

of the Liouville function, we can now show that at least {24} of the {32} possible sign patterns in {\{-1,+1\}^5} occur with positive upper density. (Conjecturally, all of them do so, and this is known for all shorter sign patterns, but unfortunately {24} seems to be the limit of our methods.)

The Liouville function can be written as {\lambda(n) = e^{2\pi i \Omega(n)/2}}, where {\Omega(n)} is the number of prime factors of {n} (counting multiplicity). One can also consider the variant {\lambda_3(n) = e^{2\pi i \Omega(n)/3}}, which is a completely multiplicative function taking values in the cube roots of unity {\{1, \omega, \omega^2\}}. Here we are able to show that all {27} sign patterns in {\{1,\omega,\omega^2\}^3} occur with positive lower density as sign patterns {(\lambda_3(n+1), \lambda_3(n+2), \lambda_3(n+3))} of this function. The analogous result for {\lambda} was already known (see this paper of Matomäki, Radziwiłł, and myself), and in that case it is even known that all sign patterns occur with equal logarithmic density {1/8} (from this paper of myself and Teräväinen); but these techniques barely fail to handle the {\lambda_3} case by themselves (largely because the “parity” arguments used in the case of the Liouville function no longer control three-point correlations in the {\lambda_3} case), and an additional additive combinatorial tool is needed. After applying existing technology (such as the entropy decrement method), the problem roughly speaking reduces to locating patterns {a \in A_1, a+r \in A_2, a+2r \in A_3} for a certain partition {G = A_1 \cup A_2 \cup A_3} of a compact abelian group {G} into sets {A_i} of measure {1/3}. (Think for instance of the unit circle {G={\bf R}/{\bf Z}}; the general case is a bit more complicated, in that if {G} is disconnected there is a certain “coprimality” constraint on {r}, and one can also allow the {A_1,A_2,A_3} to be replaced by any {A_{c_1}, A_{c_2}, A_{c_3}} with {c_1+c_2+c_3} divisible by {3}.) An inequality of Kneser just barely fails to guarantee the existence of such patterns, but by using an inverse theorem for Kneser's inequality from this previous paper of mine we are able to identify precisely the obstruction to this method, and rule it out by an ad hoc argument.
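As a purely empirical aside (this plays no role in the proofs, and the cutoff {10^6} and the use of plain counting rather than logarithmic density below are arbitrary choices), one can sieve {\Omega(n)} up to a modest height and simply record which sign patterns of {\lambda} and {\lambda_3} actually appear:

def sieve_Omega(limit):
    """Omega(n), the number of prime factors of n counted with multiplicity, for n < limit."""
    Omega = [0] * limit
    for p in range(2, limit):
        if Omega[p] == 0:                      # p is prime
            pk = p
            while pk < limit:
                for m in range(pk, limit, pk):
                    Omega[m] += 1
                pk *= p
    return Omega

limit = 10 ** 6
Om = sieve_Omega(limit)

lam = [1 if Om[n] % 2 == 0 else -1 for n in range(limit)]   # Liouville function lambda(n)
pat5 = {tuple(lam[n + 1:n + 6]) for n in range(limit - 6)}
print(len(pat5), "of the 32 length-5 sign patterns of lambda observed")

lam3 = [Om[n] % 3 for n in range(limit)]                    # records lambda_3(n) via Omega(n) mod 3
pat3 = {tuple(lam3[n + 1:n + 4]) for n in range(limit - 4)}
print(len(pat3), "of the 27 length-3 sign patterns of lambda_3 observed")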

The same techniques turn out to also make progress on some conjectures of Erdös-Pomerance and Hildebrand regarding patterns of the largest prime factor {P^+(n)} of a natural number {n}. For instance, we improve results of Erdös-Pomerance and of Balog demonstrating that the inequalities

\displaystyle  P^+(n+1) < P^+(n+2) < P^+(n+3)

and

\displaystyle  P^+(n+1) > P^+(n+2) > P^+(n+3)

each hold for infinitely many {n}, by demonstrating the stronger claims that the inequalities

\displaystyle  P^+(n+1) < P^+(n+2) < P^+(n+3) > P^+(n+4)

and

\displaystyle  P^+(n+1) > P^+(n+2) > P^+(n+3) < P^+(n+4)

each hold for a set of {n} of positive lower density. As a variant, we also show that we can find a positive density set of {n} for which

\displaystyle  P^+(n+1), P^+(n+2), P^+(n+3) > n^\gamma

for any fixed {\gamma < e^{-1/3} = 0.7165\dots} (this improves on a previous result of Hildebrand with {e^{-1/3}} replaced by {e^{-1/2} = 0.6065\dots}). A number of other results of this type are also obtained in this paper.
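These inequalities are also easy to explore numerically; a finite computation of course says nothing about positive lower density, so the snippet below (with an arbitrary cutoff) is merely a sanity check:

def sieve_largest_prime_factor(limit):
    """P^+(n), the largest prime factor of n, for n < limit (with P^+(1) = 1)."""
    P = list(range(limit))
    for p in range(2, limit):
        if P[p] == p:                        # p is prime
            for m in range(2 * p, limit, p):
                P[m] = p                     # the last prime to overwrite m is its largest prime factor
    return P

limit = 10 ** 6
P = sieve_largest_prime_factor(limit)

up_up_down = sum(1 for n in range(1, limit - 5)
                 if P[n + 1] < P[n + 2] < P[n + 3] > P[n + 4])
down_down_up = sum(1 for n in range(1, limit - 5)
                   if P[n + 1] > P[n + 2] > P[n + 3] < P[n + 4])
print("frequency of P+(n+1) < P+(n+2) < P+(n+3) > P+(n+4):", up_up_down / (limit - 6))
print("frequency of P+(n+1) > P+(n+2) > P+(n+3) < P+(n+4):", down_down_up / (limit - 6))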

In order to obtain these sorts of results, one needs to extend the entropy decrement technology from the setting of multiplicative functions to that of what we call “weakly stable sets” – sets {A} which have some multiplicative structure, in the sense that (roughly speaking) there is a set {B} such that for all small primes {p}, the statements {n \in A} and {pn \in B} are roughly equivalent to each other. For instance, if {A} is a level set {A = \{ n: \Omega(n) = 0 \hbox{ mod } 3 \}}, one would take {B = \{ n: \Omega(n) = 1 \hbox{ mod } 3 \}}; if instead {A} is a set of the form {\{ n: P^+(n) \geq n^\gamma\}}, then one can take {B=A}. When one has such a situation, then very roughly speaking, the entropy decrement argument allows one to replace a one-parameter correlation such as

\displaystyle  {\bf E}_n 1_A(n+1) 1_A(n+2) 1_A(n+3)

by a two-parameter correlation such as

\displaystyle  {\bf E}_n {\bf E}_p 1_B(n+p) 1_B(n+2p) 1_B(n+3p)

(where we will be deliberately vague as to how we are averaging over {n} and {p}), and then the use of the “linear equations in primes” technology of Ben Green, Tamar Ziegler, and myself then allows one to replace this average in turn by something like

\displaystyle  {\bf E}_n {\bf E}_r 1_B(n+r) 1_B(n+2r) 1_B(n+3r)

where {r} is constrained to be not divisible by small primes but is otherwise quite arbitrary. This latter average can then be attacked by tools from additive combinatorics, such as translation to a continuous group model (using for instance the Furstenberg correspondence principle) followed by tools such as Kneser’s inequality (or inverse theorems to that inequality).

I have just uploaded to the arXiv my paper “On the universality of the incompressible Euler equation on compact manifolds, II. Non-rigidity of Euler flows“, submitted to Pure and Applied Functional Analysis. This paper continues my attempts to establish “universality” properties of the Euler equations on Riemannian manifolds {(M,g)}, as I conjecture that the freedom to set the metric {g} ought to allow one to “program” such Euler flows to exhibit a wide range of behaviour, and in particular to achieve finite time blowup (if the dimension is sufficiently large, at least).

In coordinates, the Euler equations read

\displaystyle \partial_t u^k + u^j \nabla_j u^k = - \nabla^k p \ \ \ \ \ (1)

 

\displaystyle \nabla_k u^k = 0

where {p: [0,T] \rightarrow C^\infty(M)} is the pressure field and {u: [0,T] \rightarrow \Gamma(TM)} is the velocity field, and {\nabla} denotes the Levi-Civita connection with the usual Penrose abstract index notation conventions; we restrict attention here to the case where {u,p} are smooth and {M} is compact, smooth, orientable, connected, and without boundary. Let’s call {u} an Euler flow on {M} (for the time interval {[0,T]}) if it solves the above system of equations for some pressure {p}, and an incompressible flow if it just obeys the divergence-free relation {\nabla_k u^k=0}. Thus every Euler flow is an incompressible flow, but the converse is certainly not true; for instance the various conservation laws of the Euler equation, such as conservation of energy, will already block most incompressible flows from being an Euler flow, or even being approximated in a reasonably strong topology by such Euler flows.

However, one can ask if an incompressible flow can be extended to an Euler flow by adding some additional dimensions to {M}. In my paper, I formalise this by considering warped products {\tilde M} of {M} which (as a smooth manifold) are products {\tilde M = M \times ({\bf R}/{\bf Z})^m} of {M} with a torus, with a metric {\tilde g} given by

\displaystyle d \tilde g^2 = g_{ij}(x) dx^i dx^j + \sum_{s=1}^m \tilde g_{ss}(x) (d\theta^s)^2

for {(x,\theta) \in \tilde M}, where {\theta^1,\dots,\theta^m} are the coordinates of the torus {({\bf R}/{\bf Z})^m}, and {\tilde g_{ss}: M \rightarrow {\bf R}^+} are smooth positive coefficients for {s=1,\dots,m}; in order to preserve the incompressibility condition, we also require the volume preservation property

\displaystyle \prod_{s=1}^m \tilde g_{ss}(x) = 1 \ \ \ \ \ (2)

though in practice we can quickly dispose of this condition by adding one further “dummy” dimension to the torus {({\bf R}/{\bf Z})^m}. We say that an incompressible flow {u} is extendible to an Euler flow if there exists a warped product {\tilde M} extending {M}, and an Euler flow {\tilde u} on {\tilde M} of the form

\displaystyle \tilde u(t,(x,\theta)) = u^i(t,x) \frac{d}{dx^i} + \sum_{s=1}^m \tilde u^s(t,x) \frac{d}{d\theta^s}

for some “swirl” fields {\tilde u^s: [0,T] \times M \rightarrow {\bf R}}. This set-up is motivated by the familiar situation of studying axisymmetric Euler flows {\tilde u} on {{\bf R}^3}, which in cylindrical coordinates take the form

\displaystyle \tilde u(t,(r,z,\theta)) = u^r(t,r,z) \frac{d}{dr} + u^z(t,r,z) \frac{d}{dz} + \tilde u^\theta(t,r,z) \frac{d}{d\theta}.

The base component

\displaystyle u^r(t,r,z) \frac{d}{dr} + u^z(t,r,z) \frac{d}{dz}

of this flow is then a flow on the two-dimensional {r,z} plane which is not quite incompressible (due to the failure of the volume preservation condition (2) in this case) but still satisfies a system of equations (coupled with a passive scalar field {\rho} that is basically the square of the swirl {\tilde u^\theta}) that is reminiscent of the Boussinesq equations.

On a fixed {d}-dimensional manifold {(M,g)}, let {{\mathcal F}} denote the space of incompressible flows {u: [0,T] \rightarrow \Gamma(TM)}, equipped with the smooth topology (in spacetime), and let {{\mathcal E} \subset {\mathcal F}} denote the space of such flows that are extendible to Euler flows. Our main theorem is

Theorem 1

  • (i) (Generic inextendibility) Assume {d \geq 3}. Then {{\mathcal E}} is of the first category in {{\mathcal F}} (the countable union of nowhere dense sets in {{\mathcal F}}).
  • (ii) (Non-rigidity) Assume {M = ({\bf R}/{\bf Z})^d} (with an arbitrary metric {g}). Then {{\mathcal E}} is somewhere dense in {{\mathcal F}} (that is, the closure of {{\mathcal E}} has non-empty interior).

More informally, starting with an incompressible flow {u}, one usually cannot extend it to an Euler flow just by extending the manifold, warping the metric, and adding swirl coefficients, even if one is allowed to select the dimension of the extension, as well as the metric and coefficients, arbitrarily. However, many such flows can be perturbed to be extendible in such a manner (though different perturbations will require different extensions; in particular, the dimension of the extension will not be fixed). Among other things, this means that conservation laws such as energy (or momentum, helicity, or circulation) no longer present an obstruction when one is allowed to perform an extension, basically because the swirl components of the extension can exchange energy (or momentum, etc.) with the base components in an essentially arbitrary fashion.

These results fall short of my hopes to use the ability to extend the manifold to create universal behaviour in Euler flows, because each flow requires a different extension in order to achieve the desired dynamics. Still, it does seem to provide a little bit of support to the idea that high-dimensional Euler flows are quite “flexible” in their behaviour, though not completely so due to the generic inextendibility phenomenon. This flexibility reminds me a little bit of the flexibility of weak solutions to equations such as the Euler equations provided by the “{h}-principle” of Gromov and its variants (as discussed in these recent notes), although in this case the flexibility comes from adding additional dimensions, rather than by repeatedly adding high-frequency corrections to the solution.

The proof of part (i) of the theorem basically proceeds by a dimension counting argument (similar to that in the proof of Proposition 9 of these recent lecture notes of mine). Heuristically, the point is that an arbitrary incompressible flow {u} is essentially determined by {d-1} independent functions of space and time, whereas the warping factors {\tilde g_{ss}} are functions of space only, the pressure field is one function of space and time, and the swirl fields {\tilde u^s} are technically functions of both space and time, but have the same number of degrees of freedom as a function just of space, because they solve an evolution equation. When {d>2}, this means that there are fewer unknown functions of space and time than prescribed functions of space and time, which is the source of the generic inextendibility. This simple argument breaks down when {d=2}, but we do not know whether the claim is actually false in this case.
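To spell out this informal tally (a heuristic count only, not a rigorous statement):

  • prescribed data: the {d-1} independent components of the incompressible flow {u}, each a function of space and time;
  • unknown functions of space and time: just the pressure {p};
  • unknowns carrying only “function of space” degrees of freedom: the warping factors {\tilde g_{ss}}, and the swirl fields {\tilde u^s} (whose time dependence is determined from their initial data by an evolution equation).

For {d \geq 3} one has {d-1 > 1}, so the prescribed data outnumber the space-time unknowns.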

The proof of part (ii) proceeds by direct calculation of the effect of the warping factors and swirl velocities, which effectively create a forcing term (of Boussinesq type) in the first equation of (1) that is a combination of functions of the Eulerian spatial coordinates {x^i} (coming from the warping factors) and the Lagrangian spatial coordinates {a^\beta} (which arise from the swirl velocities, which are passively transported by the flow). In a non-empty open subset of {{\mathcal F}}, the combination of these coordinates becomes a non-degenerate set of coordinates for spacetime, and one can then use the Stone-Weierstrass theorem to conclude. The requirement that {M} be topologically a torus is a technical hypothesis in order to avoid topological obstructions such as the hairy ball theorem, but it may be that the hypothesis can be dropped (and it may in fact be true, in the {M = ({\bf R}/{\bf Z})^d} case at least, that {{\mathcal E}} is dense in all of {{\mathcal F}}, not just in a non-empty open subset).

 
