<p>
I’ve just uploaded to the arXiv my paper “<a href=”http://arxiv.org/abs/1909.03562″>Almost all Collatz orbits attain almost bounded values</a>”, submitted to the proceedings of the <a href=”https://www.cambridge.org/core/journals/forum-of-mathematics-pi”>Forum of Mathematics, Pi</a>. In this paper I returned to the topic of the notorious <a href=”https://en.wikipedia.org/wiki/Collatz_conjecture”>Collatz conjecture</a> (also known as the {3x+1} conjecture), which I previously discussed in <a href=”https://terrytao.wordpress.com/2011/08/25/the-collatz-conjecture-littlewood-offord-theory-and-powers-of-2-and-3/”>this blog post</a>. This conjecture can be phrased as follows. Let {{\bf N}+1 = \{1,2,\dots\}} denote the positive integers (with {{\bf N} =\{0,1,2,\dots\}} the natural numbers), and let {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} be the map defined by setting {\mathrm{Col}(N)} equal to {3N+1} when {N} is odd and {N/2} when {N} is even. Let {\mathrm{Col}_{\min}(N) := \inf_{n \in {\bf N}} \mathrm{Col}^n(N)} be the minimal element of the Collatz orbit {N, \mathrm{Col}(N), \mathrm{Col}^2(N),\dots}. Then we have
<p>

<blockquote><b>Conjecture 1 (Collatz conjecture)</b> One has {\mathrm{Col}_{\min}(N)=1} for all {N \in {\bf N}+1}. </blockquote>

<p>

<p>
Establishing the conjecture for all {N} remains out of reach of current techniques (for instance, as discussed in the <a href=”https://terrytao.wordpress.com/2011/08/25/the-collatz-conjecture-littlewood-offord-theory-and-powers-of-2-and-3/”>previous blog post</a>, it is basically at least as difficult as <a href=”https://en.wikipedia.org/wiki/Baker%27s_theorem”>Baker’s theorem</a>, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” {N} in some sense. For instance, it is a result <a href=”https://mathscinet.ams.org/mathscinet-getitem?mr=1980260″>of Krasikov and Lagarias</a> that <p align=center>\displaystyle \{ N \leq x: \mathrm{Col}_{\min}(N) = 1 \} \gg x^{0.84}</p>
for all sufficiently large {x}. In another direction, it was shown <a href=”https://mathscinet.ams.org/mathscinet-getitem?mr=568274″>by Terras</a> that for almost all {N} (in the sense of <a href=”https://en.wikipedia.org/wiki/Natural_density”>natural density</a>), one has {\mathrm{Col}_{\min}(N) < N}. This was then improved <a href=”https://mathscinet.ams.org/mathscinet-getitem?mr=567874″>by Allouche</a> to {\mathrm{Col}_{\min}(N) < N^\theta} for almost all {N} and any fixed {\theta > 0.869}, and extended later <a href=”https://mathscinet.ams.org/mathscinet-getitem?mr=1290275″>by Korec</a> to cover all {\theta > \frac{\log 3}{\log 4} \approx 0.7924}. In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):
<p>

<blockquote><b>Theorem 2</b> <a name=”main”></a> Let {f: {\bf N}+1 \rightarrow {\bf R}} be any function with {\lim_{N \rightarrow \infty} f(N) = +\infty}. Then we have {\mathrm{Col}_{\min}(N) < f(N)} for almost all {N} (in the sense of logarithmic density). </blockquote>

<p>

<p>
Thus for instance one has {\mathrm{Col}_{\min}(N) < \log\log\log\log N} for almost all {N} (in the sense of logarithmic density).
<p>
The difficulty here is one usually only expects to establish “local-in-time” results that control the evolution {\mathrm{Col}^n(N)} for times {n} that only get as large as a small multiple {c \log N} of {\log N}; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get {\mathrm{Col}^n(N)} all the way down to {f(N)} one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state {N=O(1)}.
<p>
However, as <a href=”https://mathscinet.ams.org/mathscinet-getitem?mr=1309539″>observed by Bourgain</a> in the context of nonlinear Schr&ouml;dinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an <em>invariant measure</em> for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if {\theta > \frac{\log 3}{\log 4}} one picks at random an integer {N} from a large interval {[1,x]}, then in most cases, the orbit of {N} will eventually move into the interval {[1,x^{\theta}]}. Similarly, if one picks an integer {M} at random from {[1,x^\theta]}, then in most cases, the orbit of {M} will eventually move into {[1,x^{\theta^2}]}. It is then tempting to concatenate the two statements and conclude that for most {N} in {[1,x]}, the orbit will eventually move {[1,x^{\theta^2}]}. Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn {N \in [1,x]} reaches {[1,x^\theta]}, the distribution of the final value is unlikely to be close to being uniformly distributed on {[1,x^\theta]}, and in particular could potentially concentrate almost entirely in the exceptional set of {M \in [1,x^\theta]} that do not make it into {[1,x^{\theta^2}]}. The point here is the uniform measure on {[1,x]} is not transported by Collatz dynamics to anything resembling the uniform measure on {[1,x^\theta]}.
<p>
So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the <em>Syracuse map</em> {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1}, defined on the odd numbers {2{\bf N}+1 = \{1,3,5,\dots\}} by setting {\mathrm{Syr}(N) = (3N+1)/2^a}, where {2^a} is the largest power of {2} that divides {3N+1}. (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication of {3} at each iteration step, which makes the map better behaved when performing “{3}-adic” analysis.)
<p>
When viewed {3}-adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, {\mathrm{Syr}(N)} is never divisible by {3}. A little less obviously, {\mathrm{Syr}(N)} is twice as likely to equal {2} mod {3} as it is to equal {1} mod {3}. This is because for a randomly chosen odd {\mathbf{N}}, the number of times {\mathbf{a}} that {2} divides {3\mathbf{N}+1} can be seen to have a <a href=”https://en.wikipedia.org/wiki/Geometric_distribution”>geometric distribution</a> of mean {2} – it equals any given value {a \in{\bf N}+1} with probability {2^{-a}}. Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of {3}. For instance, one can compute that for large random odd {\mathbf{N}}, {\mathrm{Syr}^2(\mathbf{N}) \hbox{ mod } 9} will take the residue classes {0,1,2,3,4,5,6,7,8 \hbox{ mod } 9} with probabilities <p align=center>\displaystyle 0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63} </p>
respectively. More generally, for any {n}, {\mathrm{Syr}^n(N) \hbox{ mod } 3^n} will be distributed according to the law of a random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on {{\bf Z}/3^n{\bf Z}} that we call a <em>Syracuse random variable</em>, and can be described explicitly as <a name=”coe”><p align=center>\displaystyle \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = 2^{-\mathbf{a}_1} + 3^1 2^{-\mathbf{a}_1-\mathbf{a}_2} + \dots + 3^{n-1} 2^{-\mathbf{a}_1-\dots-\mathbf{a}_n} \hbox{ mod } 3^n, \ \ \ \ \ (1)</p>
</a> where {\mathbf{a}_1,\dots,\mathbf{a}_n} are iid copies of a geometric random variable of mean {2}.
<p>
In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this {3}-adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} to construct such a measure, but only if these random variables stabilise in the limit {n \rightarrow \infty} in a certain total variation sense. More precisely, in the paper we establish the estimate <a name=”starb”><p align=center>\displaystyle \sum_{Y \in {\bf Z}/3^n{\bf Z}} | \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z})=Y) - 3^{m-n} \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^m{\bf Z})=Y \hbox{ mod } 3^m)| \ \ \ \ \ (2)</p>
</a> <p align=center>\displaystyle \ll_A m^{-A}</p>
for any {1 \leq m \leq n} and any {A > 0}. This type of stabilisation is plausible from entropy heuristics – the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} of geometric random variables that generates {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} has Shannon entropy {n \log 4}, which is significantly larger than the total entropy {n \log 3} of the uniform distribution on {{\bf Z}/3^n{\bf Z}}, so we expect a lot of “mixing” and “collision” to occur when converting the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} to {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}; these heuristics can be supported by numerics (which I was able to work out up to about {n=10} before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.
<p>
A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers <p align=center>\displaystyle 2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n}</p>
are all distinct as {(a_1,\dots,a_n)} vary over tuples in {({\bf N}+1)^n}. Unfortunately, the process of reducing mod {3^n} creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions <a name=”rate”><p align=center>\displaystyle 2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n \ \ \ \ \ (3)</p>
</a> are mostly distinct for “typical” {a_1,\dots,a_m} (as drawn using the geometric distribution) as long as {m} is a bit smaller than {\frac{\log 3}{\log 4} n} (basically because the rational number appearing in <a href=”#rate”>(3)</a> then typically takes a form like {M/2^{2m}} with {M} an integer between {0} and {3^n}). This analysis of the component <a href=”#rate”>(3)</a> of <a href=”#coe”>(1)</a> is already enough to get quite a bit of spreading on { \mathbf{Syrac}({\bf Z}/3^n{\bf Z})} (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of {{\bf Z}/3^n{\bf Z}} of density less than {n^{-C}} for some large absolute constant {C>0}). To get from this to a stabilisation property <a href=”#starb”>(2)</a> we have to exploit the mixing effects of the remaining portion of <a href=”#coe”>(1)</a> that does not come from <a href=”#rate”>(3)</a>. After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the <a href=”https://en.wikipedia.org/wiki/Characteristic_function_(probability_theory)”>characteristic function</a> of {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}, and more precisely in showing that <a name=”charb”><p align=center>\displaystyle \mathbb{E} e^{-2\pi i \xi \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) / 3^n} \ll_A n^{-A} \ \ \ \ \ (4)</p>
</a> for any {A > 0} and any {\xi \in {\bf Z}/3^n{\bf Z}} that is not divisible by {3}.
<p>
If the random variable <a href=”#coe”>(1)</a> was the sum of independent terms, one could express this characteristic function as something like a <a href=”https://www.encyclopediaofmath.org/index.php/Riesz_product”>Riesz product</a>, which would be straightforward to estimate well. Unfortunately, the terms in <a href=”#coe”>(1)</a> are loosely coupled together, and so the characteristic factor does not immediately factor into a Riesz product. However, if one groups adjacent terms in <a href=”#coe”>(1)</a> together, one can rewrite it (assuming {n} is even for sake of discussion) as <p align=center>\displaystyle (2^{\mathbf{a}_2} + 3) 2^{-\mathbf{b}_1} + (2^{\mathbf{a}_4}+3) 3^2 2^{-\mathbf{b}_1-\mathbf{b}_2} + \dots</p>
<p align=center>\displaystyle + (2^{\mathbf{a}_n}+3) 3^{n-2} 2^{-\mathbf{b}_1-\dots-\mathbf{b}_{n/2}} \hbox{ mod } 3^n</p>
where {\mathbf{b}_j := \mathbf{a}_{2j-1} + \mathbf{a}_{2j}}. The point here is that after conditioning on the {\mathbf{b}_1,\dots,\mathbf{b}_{n/2}} to be fixed, the random variables {\mathbf{a}_2, \mathbf{a}_4,\dots,\mathbf{a}_n} remain independent (though the distribution of each {\mathbf{a}_{2j}} depends on the value that we conditioned {\mathbf{b}_j} to), and so the above expression is a <em>conditional</em> sum of independent random variables. This lets one express the characeteristic function of <a href=”#coe”>(1)</a> as an <em>averaged</em> Riesz product. One can use this to establish the bound <a href=”#charb”>(4)</a> as long as one can show that the expression <p align=center>\displaystyle \frac{\xi 3^{2j-2} (2^{-\mathbf{b}_1-\dots-\mathbf{b}_j+1} \mod 3^n)}{3^n} </p>
is not close to an integer for a moderately large number ({\gg A \log n}, to be precise) of indices {j = 1,\dots,n/2}. (Actually, for technical reasons we have to also restrict to those {j} for which {\mathbf{b}_j=3}, but let us ignore this detail here.) To put it another way, if we let {B} denote the set of pairs {(j,l)} for which <p align=center>\displaystyle \frac{\xi 3^{2j-2} (2^{-l+1} \mod 3^n)}{3^n} \in [-\varepsilon,\varepsilon] + {\bf Z}, </p>
we have to show that (with overwhelming probability) the random walk <p align=center>\displaystyle (1,\mathbf{b}_1), (2, \mathbf{b}_1 + \mathbf{b}_2), \dots, (n/2, \mathbf{b}_1+\dots+\mathbf{b}_{n/2})</p>
(which we view as a two-dimensional <a href=”https://en.wikipedia.org/wiki/Renewal_theory”>renewal process</a>) contains at least a few points lying outside of {B}.
<p>
A little bit of elementary number theory and combinatorics allows one to describe the set {B} as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of {B} after passing through any given such triangle, and it then becomes relatively easy to then show that the renewal process usually has the required number of points outside of {B}. The most difficult case is when the renewal process passes through a particularly large triangle in {B}. However, it turns out that large triangles enjoy particularly good separation properties, and in particular afer passing through a large triangle one is likely to only encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of {B} that one can finish the proof of <a href=”#charb”>(4)</a>, and thus Theorem <a href=”#main”>2</a>.
<p>