I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values”, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the $3x+1$ conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let $\mathbb{N}+1 = \{1, 2, 3, \dots\}$ denote the positive integers (with $\mathbb{N} = \{0, 1, 2, \dots\}$ the natural numbers), and let $\mathrm{Col}: \mathbb{N}+1 \to \mathbb{N}+1$ be the map defined by setting $\mathrm{Col}(N)$ equal to $3N+1$ when $N$ is odd and $N/2$ when $N$ is even. Let $\mathrm{Col}_{\min}(N) := \inf_{n \in \mathbb{N}} \mathrm{Col}^n(N)$ be the minimal element of the Collatz orbit $N, \mathrm{Col}(N), \mathrm{Col}^2(N), \dots$. Then we have
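In code, the map and the orbit minimum just defined can be sketched in a few lines (note that the loop below terminates only when the orbit actually reaches $1$, which is exactly what the conjecture predicts; this is known to hold for every starting value one can feasibly test):

```python
# A direct sketch of the maps just defined.  col performs a single Collatz
# step; col_min computes the minimal element of the orbit N, Col(N), Col^2(N), ...
# The loop terminates only if the orbit reaches 1 (from which it cycles
# 1 -> 4 -> 2 -> 1), which is exactly what the conjecture asserts.

def col(n: int) -> int:
    """One step of the Collatz map: 3N+1 for odd N, N/2 for even N."""
    return 3 * n + 1 if n % 2 == 1 else n // 2

def col_min(n: int) -> int:
    """Minimal element of the Collatz orbit of n (assumes the orbit reaches 1)."""
    smallest = n
    while n != 1:
        n = col(n)
        smallest = min(smallest, n)
    return smallest

print(col_min(27))  # 27 has a famously long orbit, but it still bottoms out at 1
```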

Conjecture 1 (Collatz conjecture). One has $\mathrm{Col}_{\min}(N) = 1$ for all $N \in \mathbb{N}+1$.

Establishing the conjecture for all $N$ remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” $N$ in some sense. For instance, it is a result of Krasikov and Lagarias that

$$\# \{ N \leq x : \mathrm{Col}_{\min}(N) = 1 \} \gg x^{0.84}$$

for all sufficiently large $x$. In another direction, it was shown by Terras that for almost all $N$ (in the sense of natural density), one has $\mathrm{Col}_{\min}(N) < N$. This was then improved by Allouche to $\mathrm{Col}_{\min}(N) < N^{\theta}$ for almost all $N$ and any fixed $\theta > 0.869$, and extended later by Korec to cover all $\theta > \frac{\ln 3}{\ln 4} \approx 0.7924$. In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2. Let $f: \mathbb{N}+1 \to \mathbb{R}$ be any function with $\lim_{N \to \infty} f(N) = +\infty$. Then we have $\mathrm{Col}_{\min}(N) < f(N)$ for almost all $N \in \mathbb{N}+1$ (in the sense of logarithmic density).

Thus for instance one has $\mathrm{Col}_{\min}(N) < \log\log\log\log N$ for almost all $N$ (in the sense of logarithmic density).
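Though of course no substitute for a proof, the flavour of Theorem 2 is easy to probe numerically: for randomly sampled starting values $N$, the orbit minimum is (in every case anyone has ever computed) equal to $1$, and in particular lies below even a very slowly growing threshold such as $\log\log N$. A quick sketch:

```python
import math
import random

# Illustration (emphatically not a proof) of the "almost all" phenomenon:
# for randomly sampled N, the orbit minimum is 1, which in particular is far
# below a slowly growing threshold like log log N.

def col_min(n: int) -> int:
    smallest = n
    while n != 1:
        n = 3 * n + 1 if n % 2 == 1 else n // 2
        smallest = min(smallest, n)
    return smallest

random.seed(0)
samples = [random.randint(10**6, 2 * 10**6) for _ in range(200)]
below = sum(1 for n in samples if col_min(n) < math.log(math.log(n)))
print(f"{below}/200 sampled N satisfy Col_min(N) < log log N")  # 200/200
```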

The difficulty here is that one usually only expects to establish “local-in-time” results that control the evolution $N, \mathrm{Col}(N), \mathrm{Col}^2(N), \dots, \mathrm{Col}^T(N)$ for times $T$ that only get as large as a small multiple of $\log N$; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get all the way down to $f(N)$ one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state $O(1)$.

However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an *invariant measure* for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result, which (roughly speaking) asserts that if one picks at random an integer $N$ from a large interval $[x, 2x]$, then in most cases, the orbit of $N$ will eventually move into the interval $[1, x^{0.8}]$ (say). Similarly, if one picks an integer $M$ at random from $[1, x^{0.8}]$, then in most cases, the orbit of $M$ will eventually move into $[1, x^{0.64}]$. It is then tempting to concatenate the two statements and conclude that for most $N$ in $[x, 2x]$, the orbit will eventually move into $[1, x^{0.64}]$. Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn $N \in [x, 2x]$ reaches $[1, x^{0.8}]$, the distribution of the final value is unlikely to be close to being uniformly distributed on $[1, x^{0.8}]$, and in particular could potentially concentrate almost entirely in the exceptional set of $M \in [1, x^{0.8}]$ that do not make it into $[1, x^{0.64}]$. The point here is that the uniform measure on $[x, 2x]$ is not transported by Collatz dynamics to anything resembling the uniform measure on $[1, x^{0.8}]$.

So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the *Syracuse map* $\mathrm{Syr}: 2\mathbb{N}+1 \to 2\mathbb{N}+1$, defined on the odd numbers by setting $\mathrm{Syr}(N) := (3N+1)/2^{\nu_2(3N+1)}$, where $2^{\nu_2(3N+1)}$ is the largest power of $2$ that divides $3N+1$. (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication by $3$ at each iteration step, which makes the map better behaved when performing “$3$-adic” analysis.)
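A direct implementation of the Syracuse map, dividing out the full power of $2$ in a single step:

```python
def syr(n: int) -> int:
    """One step of the Syracuse map on odd n: form 3n+1, then divide out the
    largest power of 2, so exactly one multiplication by 3 occurs per step."""
    assert n % 2 == 1, "the Syracuse map is defined on odd integers"
    m = 3 * n + 1
    while m % 2 == 0:
        m //= 2
    return m

# The Syracuse orbit of an odd N is just the subsequence of odd values of the
# Collatz orbit of N, so the two maps see the same odd orbit minima.
print(syr(3), syr(7))  # 3 -> 10 -> 5 and 7 -> 22 -> 11, so this prints "5 11"
```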

When viewed $3$-adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, $\mathrm{Syr}(N)$ is never divisible by $3$. A little less obviously, $\mathrm{Syr}(N)$ is twice as likely to equal $2$ mod $3$ as it is to equal $1$ mod $3$. This is because for a randomly chosen odd $N$, the number $a$ of times that $2$ divides $3N+1$ can be seen to have a geometric distribution of mean $2$ – it equals any given value $a \in \{1, 2, 3, \dots\}$ with probability $2^{-a}$. Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of $3$. For instance, one can compute that for large random odd $N$, $\mathrm{Syr}^2(N)$ will take the residue classes $1, 2, 4, 5, 7, 8 \ \mathrm{mod}\ 9$ with probabilities

$$\frac{8}{63}, \frac{16}{63}, \frac{11}{63}, \frac{4}{63}, \frac{2}{63}, \frac{22}{63}$$
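The mod $3$ irregularity just described is easy to confirm empirically: tallying $\mathrm{Syr}(N)$ mod $3$ over all odd $N$ in a large range, the class $0$ never occurs and the class $2$ occurs about twice as often as the class $1$:

```python
from collections import Counter

def syr(n: int) -> int:
    m = 3 * n + 1
    while m % 2 == 0:
        m //= 2
    return m

# Tally Syr(N) mod 3 over all odd N below a large cutoff: the class 0 never
# occurs, and class 2 occurs about twice as often as class 1, reflecting the
# geometric distribution of the 2-adic valuation of 3N+1.
counts = Counter(syr(n) % 3 for n in range(1, 200001, 2))
print(counts[0], counts[1], counts[2])  # counts[2] is roughly twice counts[1]
```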

respectively. More generally, for any $n$, $\mathrm{Syr}^n(N)$ will be distributed according to the law of a random variable $\mathrm{Syrac}(\mathbb{Z}/3^n\mathbb{Z})$ on $\mathbb{Z}/3^n\mathbb{Z}$ that we call a *Syracuse random variable*, and can be described explicitly as

$$\mathrm{Syrac}(\mathbb{Z}/3^n\mathbb{Z}) = 2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n} \ (\mathrm{mod}\ 3^n), \qquad (1)$$

where $a_1, \dots, a_n$ are iid copies of a geometric random variable of mean $2$.
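One can sample such a Syracuse random variable directly; the sketch below does this in the case $n = 2$, drawing two iid geometric variables and forming the two-term sum mod $9$, with negative powers of $2$ interpreted via the inverse of $2$ mod $9$. (The exact formula here is my paraphrase of the construction, so treat it as an assumption.) The empirical frequencies reproduce the irregular distribution on the six classes coprime to $3$:

```python
import random
from collections import Counter

random.seed(1)

def geom2() -> int:
    """Geometric random variable with P(k) = 2^{-k} for k = 1, 2, ... (mean 2)."""
    k = 1
    while random.random() < 0.5:
        k += 1
    return k

INV2 = pow(2, -1, 9)  # the inverse of 2 mod 9, namely 5

def syrac_mod9() -> int:
    # Two-term Syracuse sum 2^{-a1} + 3 * 2^{-(a1+a2)} reduced mod 9
    # (my paraphrase of the n = 2 case of the construction in the post).
    a1, a2 = geom2(), geom2()
    return (pow(INV2, a1, 9) + 3 * pow(INV2, a1 + a2, 9)) % 9

counts = Counter(syrac_mod9() for _ in range(100_000))
print(sorted(counts.items()))  # mass only on the six classes coprime to 3
```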

In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this $3$-adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables $\mathrm{Syrac}(\mathbb{Z}/3^n\mathbb{Z})$ to construct such a measure, but only if these random variables stabilise in the limit $n \to \infty$ in a certain total variation sense. More precisely, in the paper we establish the estimate

$$\left\| \mathrm{Syrac}(\mathbb{Z}/3^n\mathbb{Z}) \ \mathrm{mod}\ 3^m \ - \ \mathrm{Syrac}(\mathbb{Z}/3^m\mathbb{Z}) \right\|_{\mathrm{TV}} \lesssim_A m^{-A} \qquad (2)$$

for any $1 \leq m \leq n$ and any $A > 0$. This type of stabilisation is plausible from entropy heuristics – the tuple $(a_1, \dots, a_n)$ of geometric random variables that generates $\mathrm{Syrac}(\mathbb{Z}/3^n\mathbb{Z})$ has Shannon entropy $2 n \log 2$, which is significantly larger than the total entropy $n \log 3$ of the uniform distribution on $\mathbb{Z}/3^n\mathbb{Z}$, so we expect a lot of “mixing” and “collision” to occur when converting the tuple $(a_1, \dots, a_n)$ to $\mathrm{Syrac}(\mathbb{Z}/3^n\mathbb{Z})$; these heuristics can be supported by numerics (which I was able to work out up to about $n = 10$ before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.
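The entropy figures in this heuristic are elementary to verify: a geometric variable with $\mathbb{P}(a = k) = 2^{-k}$ has Shannon entropy $\sum_k 2^{-k} \cdot k \log 2 = 2 \log 2 \approx 1.386$ per coordinate, which indeed exceeds $\log 3 \approx 1.099$, the per-coordinate entropy of the uniform distribution on $\mathbb{Z}/3^n\mathbb{Z}$:

```python
import math

# Entropy bookkeeping for the heuristic above: the geometric variable with
# P(a = k) = 2^{-k} has entropy sum_k 2^{-k} * k * log 2 = 2 log 2 per
# coordinate, versus log 3 per coordinate for the uniform law on Z/3^nZ.

h_geom = sum(2.0**-k * k * math.log(2) for k in range(1, 200))
print(h_geom, math.log(3))  # about 1.386 vs about 1.099
```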

A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

$$2^{-a_1} + 3^1 2^{-a_1-a_2} + 3^2 2^{-a_1-a_2-a_3} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n}$$

are all distinct as $(a_1, \dots, a_n)$ vary over tuples in $(\mathbb{N}+1)^n$. Unfortunately, the process of reducing mod $3^n$ creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

$$2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{m-1} 2^{-a_1-\dots-a_m} \ (\mathrm{mod}\ 3^n) \qquad (3)$$

are mostly distinct for “typical” $(a_1, \dots, a_m)$ (as drawn using the geometric distribution) as long as $m$ is a bit smaller than (basically because the rational number appearing in (3) then typically takes a form like with an integer between and ). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on $\mathbb{Z}/3^n\mathbb{Z}$ (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of $\mathbb{Z}/3^n\mathbb{Z}$ of density less than for some large absolute constant ). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of $\mathrm{Syrac}(\mathbb{Z}/3^n\mathbb{Z})$, and more precisely in showing that

for any $n$ and any $\xi \in \mathbb{Z}/3^n\mathbb{Z}$ that is not divisible by $3$.

If the random variable (1) were the sum of independent terms, one could express this characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic function does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming $n$ is even for the sake of discussion) as

where $b_j := a_{2j-1} + a_{2j}$. The point here is that after conditioning on the $b_1, \dots, b_{n/2}$ to be fixed, the random variables remain independent (though the distribution of each depends on the value $b_j$ that we conditioned to), and so the above expression is a *conditional* sum of independent random variables. This lets one express the characteristic function of (1) as an *averaged* Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

is not close to an integer for a moderately large number of indices $j$. (Actually, for technical reasons we have to also restrict to those for which , but let us ignore this detail here.) To put it another way, if we let denote the set of pairs for which

we have to show that (with overwhelming probability) the random walk

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of .
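As a rough illustration (with details that are my own simplification, not the paper’s), the renewal process in question can be simulated as a walk with steps $(1, a_j)$, where the $a_j$ are iid geometric of mean $2$, so that it visits the points $(j, a_1 + \dots + a_j)$ and on average hugs a line of slope $2$:

```python
import random

random.seed(2)

def geom2() -> int:
    """Geometric random variable with P(k) = 2^{-k}, k = 1, 2, ... (mean 2)."""
    k = 1
    while random.random() < 0.5:
        k += 1
    return k

def renewal_walk(n: int):
    """Visit the points (j, a_1 + ... + a_j) for iid geometric a_j of mean 2."""
    total, points = 0, []
    for j in range(1, n + 1):
        total += geom2()
        points.append((j, total))
    return points

walk = renewal_walk(1000)
print(walk[-1])  # final height is typically close to 2 * 1000
```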

A little bit of elementary number theory and combinatorics allows one to describe the set as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of after passing through any given such triangle, and it then becomes relatively easy to show that the renewal process usually has the required number of points outside of . The most difficult case is when the renewal process passes through a particularly large triangle in . However, it turns out that large triangles enjoy particularly good separation properties, and in particular after passing through a large triangle one is likely to encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of that one can finish the proof of (4), and thus Theorem 2.

## 75 comments


10 September, 2019 at 8:40 am

goingtoinfinity: What are the relations between results for “almost all” cases vs. subsequent proofs of the full result, from historic examples? Are there good examples where the former influenced the development of the latter? Or is it more common that proving the full result of a mathematical question is conducted in an entirely different way?

As an example, Faltings’ proof that there are only finitely many solutions to Fermat’s Last Theorem — did his techniques influence and appear in Wiles’s/Taylor’s final proof?

10 September, 2019 at 9:10 am

Terence Tao: One can broadly divide arguments involving some parameter (with a non-empty range) into three types: “worst case analysis”, which establish some claim for all choices of parameters; “average case analysis”, which establish some claim for almost all choices of parameters; and “best case analysis”, which establish some claim for at least one choice of parameters. (One can also introduce an often useful variant of the average case analysis by working with “a positive fraction” of choices rather than “almost all”, but let us ignore this variant for sake of this discussion.) There are obvious implications between the three: worst case analysis results imply average case analysis results (these are often referred to as “deterministic arguments”), and average case analysis results imply best case analysis results (the “probabilistic method”). In the contrapositive, if a claim fails in the average case, then it will also fail in the worst case; and if it fails even in the best case, then it fails in the average case. However, besides these obvious implications, one generally sees quite different methods used in the three different types of results. In particular, average case analysis (such as the arguments discussed in this blog post) gets to exploit methods from probability (and related areas such as measure theory and ergodic theory); best case analysis relies a lot on explicit constructions to design the most favorable parameters for the problem; but worst case analysis is largely excluded from using any of these methods, except when there is some “invariance”, “dispersion”, “unique ergodicity”, “averaging” or “mixing” property that allows one to derive worst-case results from average-case results by showing that every worst-case counterexample must generate enough siblings that they begin to be detected by the average-case analysis.
For instance, one can derive Vinogradov’s theorem (all large odd numbers are a sum of three primes) from a (suitably quantitative) almost all version of the even Goldbach conjecture (almost all even numbers are the sum of two primes), basically because a single counterexample to the former implies a lot of counterexamples to the latter (see Chapter 19.4 of Iwaniec-Kowalski for details). At a more trivial (but still widely used) level, if there is so much invariance with respect to a parameter that the truth value of a given property does not actually depend on the choice of parameter, then the worst, average, and best case results are equivalent, so one can reduce the worst case to the best case (such arguments are generally described as “without loss of generality” reductions).

However, in the absence of such mixing properties, one usually cannot rigorously convert positive average case results to positive worst case results, and when the worst case result is eventually proved, it is often by a quite different set of techniques (as was done for instance with FLT). So it is often better to think of these different types of analysis as living in parallel, but somewhat disjoint, “worlds”. (In additive combinatorics, there is a similar distinction made between the “100% world”, “99% world”, and “1% world”, corresponding roughly to worst case analysis and the two variants of average case analysis respectively, although in this setting there are some important non-trivial connections between these worlds.) In the specific case of the Collatz conjecture, the only obvious invariance property is that coming from the Collatz map itself ($N$ obeys the Collatz conjecture if and only if $\mathrm{Col}(N)$ does), but this seems too weak of an invariance to hope to obtain worst case results from average case ones (unless the average case results were really, really strong).

11 September, 2019 at 4:42 am

Buzzman: Let and denote the odd and even parities, respectively, of the terms in the Collatz orbit , where . Define such that

,

where with the floor function being a stopping time criterion. Parenthetically, one can also express taking . Then a simple proposition for satisfying the “almost all” argument, in the sense that , is of the form

,

where as . Observe that decomposes into probability for even and for odd. The same argument applies for if one takes . Exceptional cases are due mostly to or when such are iterates of .

11 September, 2019 at 7:15 am

suitably impressed: That’s a wonderfully clarifying set of remarks that deserves promotion to a blog post of its own.

14 September, 2019 at 9:39 am

random reader: In addition to Terry’s (amazing) remarks on worst vs. average case, the methodological distinctions between qualitative vs. quantitative, and constructive vs. nonconstructive, are further dividing lines that are predictive of when the proof of a weaker result can be reused in a proof of a stronger result.

As with the worst vs. average case distinction, unless there is some transference principle to go from one side of the divide to the other, methods and results on the weak side will typically be dissimilar to those on the strong side.

The example of Faltings’ theorem (Mordell conjecture) was a good one. It is a qualitative result: a finite number of rational points, with no bound on the size of the solutions. Finding all of the rational points on specific curves, and proving that those are all the solutions, is superficially a similar problem but in practice uses different techniques than Faltings’.

14 September, 2019 at 11:17 am

Terence Tao: One further addendum: even when there are no direct transference results between different “worlds” of result types, and even when the methods of proof in different worlds are quite distinct, one can still use the relative difficulty between results in one world as a predictor of the relative difficulty in another world. For instance, suppose there are two similar-looking statements $X$ and $Y$, with the “almost all” versions of both $X$ and $Y$ known, but the “all” versions of both statements not known; then there may be no way to transfer the almost all theorems and methods to the all setting. But if the almost all version of $X$ was more difficult to establish than the almost all version of $Y$, then it is reasonable to predict that the “all” version of $X$ will similarly be more difficult to establish than the “all” version of $Y$.

14 September, 2019 at 12:16 pm

random reader: To formalize: comparative difficulty tends to transfer along (both the horizontal and vertical directions of) the commutative diagrams we call “analogies”.

Speaking of transference principles, having arbitrarily slow unbounded functions seems similar to the principle in nonstandard analysis that infinite (nonstandard) objects are equivalent to unbounded sequences of finite/standard objects, e.g., over/under-spill, or the translation between nonstandard and epsilon-delta definitions in calculus. Here there is no obvious transference from arbitrarily good almost-everywhere asymptotic bounds to a finite bound holding a.e., but are there similar situations in analysis where the finite a.e. statement (the equivalent in this context of “almost all integers go to 1 under the Collatz iteration”) is actually known to be false?

14 September, 2019 at 6:00 pm

Terence Tao: One can easily cook up examples. For instance, for any function that goes to infinity, one can easily show (using the Weyl equidistribution theorem) that for almost all (in the sense of natural density), where is the fractional part of , but there is no bound such that for almost all . (Instead, the previous claim is equivalent to the assertion that for each bound , the bound holds for a set of of density that depends on , but goes to 1 as .)

14 September, 2019 at 1:47 pm

random reader: This is turning into a very interesting side discussion making (somewhat) precise what is normally unwritten folk wisdom. To that end I’ll add an addendum to Terry’s addendum on relative difficulty.

The informal judgement that result X is harder to prove than related (e.g., weaker) result Y, often rests on the use of particular advanced tools as modules in the proof of X and not Y. Typical examples:

X requires explicit error estimates in the prime number theorem (or some analytic property of the Riemann zeta function), but Y only uses the infinitude of prime numbers.

X uses the Weil conjectures or things only known to follow from those, but Y only uses Lang-Weil estimates for the number of points on varieties mod p.

X uses the classification of finite simple groups, Y uses basic representation theory of finite groups.

X uses algebraic K-theory, Y uses De Rham cohomology on smooth manifolds.

In many cases of this sort, the machinery used in X is a black box, effectively an extra set of axioms. So X is from this point of view a result in a stronger formal system (as long as one does not resort to re-proving from scratch results equivalent to the black box) than the one used to prove Y, and the comparison of difficulty becomes similar to that of results proved with and without a particular axiom (e.g., the Axiom of Choice, or Excluded Middle). This makes the comparison closer to the constructive/nonconstructive distinction, polynomial vs exponential time for algorithms, and independence results in logic where the weaker system simply cannot prove a particular theorem. Also, there is often a sort of “reverse mathematics” where a cluster of X-like results all imply or are implied by the advanced module used to prove X (analogous to the polynomial reductions in NP completeness), or are equivalent to each other, giving evidence that the heavy machinery is really necessary.

10 September, 2019 at 11:29 am

Pete: To give a different viewpoint…

Today, immediately after you conjecture ‘all X satisfy..’ you would want to at least heuristically convince yourself that your conjecture holds for almost all X (ideally with a few different random models).

But suppose you prove ‘almost all X’ for some reasonable random model. Is this good evidence that ‘all X’? Well, for an (obviously stupid) example, you might conjecture every -vertex graph with minimum degree is Hamiltonian. This is false – but true a.a.s. for a bunch of natural random models. The proofs that the random models are a.a.s. Hamiltonian, though, use the randomness in a very essential way.

On the other hand, suppose I prove that ‘all X such that horrible structural condition’. If ‘horrible structural condition’ happens to hold for almost all X (for some reasonable model) then I’m likely to title my paper ‘Almost all X…’ simply because it sells better. Such a proof quite possibly could be useful in a proof of ‘all X’; maybe later I (or someone) can weaken the horribleness and deal with the remaining cases.

Short version: no rigorous connection, but if you look at the proof of ‘almost all’ you may sometimes find it is useful.

10 September, 2019 at 11:17 am

omarleoblog: The theorem of the conjecture is very awesome. Maybe because the theorem has a lot of results.

10 September, 2019 at 12:13 pm

Terry Tao partial results on Collatz conjecture[…] Tao announced today that he has new partial results toward proving the Collatz conjecture. His blog post and arXiv paper are both entitled “Almost all Collatz orbits attain almost bounded […]

10 September, 2019 at 12:45 pm

Terry Tao’s partial results on the Collatz conjecture {/} For coders {}: […] results toward proving the Collatz conjecture. His blog post as well as his arXiv paper are both titled “Almost all […]

10 September, 2019 at 2:22 pm

Anonymous: It seems that in the summands of (2) a second probability symbol is missing.

[Corrected, thanks – T.]

10 September, 2019 at 5:17 pm

Anonymous: “in the sense of logarithmic density”

I do not understand this. Could you please explain this?

10 September, 2019 at 8:17 pm

Lior Silberman: Let $A$ be a set of positive integers. We measure its “natural density” as follows: we impose a cutoff $x$, and ask: “what proportion of the integers up to $x$ are contained in $A$?”. If this proportion tends to a limit $\delta$ as $x \to \infty$, we say that the “natural density” of $A$ is $\delta$. In symbols,

$$\delta = \lim_{x \to \infty} \frac{\# \{ n \leq x : n \in A \}}{x}.$$

This notion of “density” gives each integer an equal weight. We can define a new notion of “size” by giving the integer $n$ the weight $\frac{1}{n}$. We would then measure the “proportion” of the interval $[1, x]$ occupied by $A$ by the following fraction:

$$\frac{\sum_{n \leq x,\ n \in A} \frac{1}{n}}{\sum_{n \leq x} \frac{1}{n}}.$$

Suppose now that this ratio tends to a limit $\delta_{\log}$ as $x \to \infty$. Note that the denominator (a partial sum of the harmonic series) is very close to $\log x$. We therefore have

$$\delta_{\log} = \lim_{x \to \infty} \frac{1}{\log x} \sum_{n \leq x,\ n \in A} \frac{1}{n}.$$

This limit (if it exists) is called the “logarithmic density” of $A$ (note the logarithm in the denominator).
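A classical example worth computing: the set of integers with leading decimal digit $1$ has no natural density (the proportion oscillates as the cutoff moves through the decades), but it does have a logarithmic density, namely $\log_{10} 2 \approx 0.301$. A quick numerical illustration:

```python
import math

def leading_one(n: int) -> bool:
    """True if the leading decimal digit of n is 1."""
    while n >= 10:
        n //= 10
    return n == 1

def natural_density_at(x: int) -> float:
    return sum(1 for n in range(1, x + 1) if leading_one(n)) / x

def log_density_at(x: int) -> float:
    return sum(1 / n for n in range(1, x + 1) if leading_one(n)) / math.log(x)

# The proportion with leading digit 1 oscillates with the cutoff, so there is
# no natural density; the logarithmically weighted proportion tends (slowly)
# to log10(2) ~ 0.301 as the cutoff grows.
print(natural_density_at(10**5), natural_density_at(2 * 10**5))
print(log_density_at(10**5))
```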

10 September, 2019 at 8:39 pm

Alex Jones: Fix a property $P$ that a positive integer might have. Say $f(n) = 1$ if $n$ has property $P$ and $f(n) = 0$ otherwise. It is said that almost all positive integers (w.r.t. logarithmic density) have property $P$ if $\lim_{x \to \infty} \frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n} = 1$.

10 September, 2019 at 8:25 pm

Lior Silberman: On page 2 of the paper (definition of logarithmic density) you have “for all ” where you probably mean “for all ”.

[Thanks, this will be corrected in the next revision of the ms. -T]

10 September, 2019 at 9:29 pm

Anonymous: Thank you for your reply!

10 September, 2019 at 10:39 pm

Buzzman: Your formulation takes $\mathrm{Col}(N) = 3N+1$ if $N$ is odd and $N/2$ if $N$ is even. There should be appropriate clarification when citing previous results based on $(3N+1)/2$ if $N$ is odd.

11 September, 2019 at 1:24 pm

Terence Tao: I refer to this acceleration of the Collatz map as in the paper. It is not difficult to see that for all , so all the results in the literature concerning hold without any changes if one works with instead.

11 September, 2019 at 8:12 pm

Buzzman: I see now. And I can only smile at how shallow I can be, but always fervent at learning your depth. Thanks.

11 September, 2019 at 3:27 am

Anonymous: It seems that in Definition 1.2 in the arXiv paper, should be

[This will be corrected in the next revision of the ms, thanks -T.]

11 September, 2019 at 4:15 am

Anonymous: In Definition 1.7 it seems clearer to replace by the strict inequality

[This will be corrected in the next revision of the ms, thanks -T.]

11 September, 2019 at 7:17 am

David: It would obviously be nice to extend the result to showing that $\mathrm{Col}_{\min}$ was bounded on almost all integers. Having toyed around with this result for a bit, it seems that you clearly can’t have a positive logarithmic density subset on which $\mathrm{Col}_{\min}$ limits to infinity (because then you can set $f$ appropriately and some element will have to be bounded by its own value).

However, it feels hard to construct such a set even if you assume the stronger result for a contradiction. Is there a good reason for an obstruction to this method?

11 September, 2019 at 7:36 am

David: More specifically, are there well known examples of functions of $n$ which are not bounded (more weakly) on any subset of positive density or (more strongly) on almost all integers, and yet have no positive density subsets on which the function goes to infinity?

11 September, 2019 at 1:44 pm

Terence Tao: If there is no set of positive density on which a function $f$ is bounded, then for each threshold $T$, the set $\{ n : f(n) \leq T \}$ must necessarily be of zero density, and a standard diagonalisation argument then shows that $f(n) \to \infty$ outside of a set of density zero.

11 September, 2019 at 1:37 pm

Terence Tao: This is discussed in Remark 1.4 of the paper. The best one seems to be able to do with the methods in the paper is to show that for each , one has for a set of positive integers of logarithmic density for some absolute constant (though there is some hope of reducing the error a bit). In particular, for sufficiently large , one can show for a positive logarithmic density set of integers; in principle, this could be combined with a numerical verification of cases up to to obtain the result that the Collatz conjecture held for a set of positive logarithmic density. (Thanks to Ben Green for pointing out this possibility to me.) Unfortunately the value of produced by the arguments in my paper, while in principle explicit, is almost certainly too enormous for this strategy to be easily implemented, but perhaps some variant of the methods in my paper could be used to achieve this.

11 September, 2019 at 12:59 pm

David Lowry-Duda: I haven’t had the time to work out the details, but it seems to me that much of what you’ve written would apply to something like the 5n+1 variant of the Collatz conjecture. I wonder if it is also true that almost all (5n+1)-Collatz orbits attain almost bounded values?

I think this would be a bit interesting since it is believed that there are orbits in the (5n+1)-collatz problem that get arbitrarily large.

Does this happen to be something you have thought about?

11 September, 2019 at 1:33 pm

Terence Tao: The argument relies in many places on the inequality $3 < 4$, to ensure that the Collatz iterations have average negative logarithmic drift. Since $5 > 4$, I would instead expect the reverse to be true for the $5n+1$ map, namely that almost all iterates would escape to infinity. (Unfortunately the method of proof in this paper won’t extend to this case, as the probability measures are now being stretched into ever larger regions of space rather than being compressed into ever smaller regions, and one can’t hope to keep the probability measures stable in total variation in the former case.)
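The drift computation behind this remark is one line: each Syracuse-type step multiplies by $q$ and divides by $2^a$ with $a$ geometric of mean $2$, so the expected change in $\log N$ is $\log q - 2 \log 2$, negative for $q = 3$ and positive for $q = 5$:

```python
import math

# Expected change in log N per accelerated step: multiply by q, then divide
# by 2^a where a is geometric of mean 2, giving drift log q - 2 log 2.

def drift(q: int) -> float:
    return math.log(q) - 2 * math.log(2)

print(drift(3), drift(5))  # negative for 3n+1, positive for 5n+1
```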

11 September, 2019 at 5:00 pm

David Lowry-Duda: Aha, I understand. Thank you.

14 September, 2019 at 7:31 am

oculus drift: Is the negative-drift condition enough to get the same type of result for generalized Collatz recursions (finite partition of N into arithmetic progressions with a linear function on each progression)? I.e., if the map heuristically pulls integers toward 1 on average, then the almost everywhere quasi-boundedness holds. It’s hard to hope for anything more than that, and the statement is not so strong as to contradict the undecidability of such recursions in general.

14 September, 2019 at 8:33 am

Terence Tao: This is a very good question. One has to be careful however to define “negative drift” properly – a naive definition can give incorrect predictions. For instance, consider the map that sends $n$ to $(n-r)/10$ when $n \equiv r \ (\mathrm{mod}\ 10)$ for some $r = 1, \dots, 9$, and sends $n$ to $2n$ when $n \equiv 0 \ (\mathrm{mod}\ 10)$. A typical number has a 90% chance of being reduced in size by 10 by this map, and a 10% chance of doubling, which would naively suggest a negative drift, but in fact pretty much all orbits get sent to infinity, because they almost certainly meet the residue class $0 \ (\mathrm{mod}\ 10)$ at some point, at which point they grow exponentially.
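Here is one plausible instantiation of this example (the exact map is my reconstruction, since the formulas in the comment above did not survive extraction; the qualitative behaviour is what matters): the classes $1, \dots, 9$ mod $10$ get mapped down by roughly a factor of $10$, while the class $0$ mod $10$ doubles, so any orbit that ever reaches a multiple of $10$ doubles forever:

```python
def step(n: int) -> int:
    """Hypothetical Collatz-type map: on the residue class r mod 10 with
    r != 0, send n to (n - r)/10 (i.e., strip the last digit); on the class
    0 mod 10, double.  Naive drift: 90% of classes shrink 10x, 10% double."""
    if n % 10 == 0:
        return 2 * n
    return n // 10 if n >= 10 else n  # single digits just sit still here

def hits_doubling_class(n: int, max_steps: int = 100) -> bool:
    for _ in range(max_steps):
        if n % 10 == 0:
            return True  # from here the orbit doubles forever
        n = step(n)
    return False

# An orbit escapes to infinity as soon as some truncation of n ends in 0:
print(hits_doubling_class(105), hits_doubling_class(111))  # True False
```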

Perhaps to define negative drift properly one has to define the dynamics on the profinite integers instead of the natural numbers (actually one should not need to use all primes here, only the primes that show up in the modulus of the residue classes and in the linear coefficient of the affine maps should be relevant, for instance for Collatz it is the dynamics on which is relevant). If the dynamics on maps Haar measure towards an ergodic invariant probability measure then one can then define the drift to be the mean value of with respect to this measure, and one could tentatively conjecture that the results of my paper would extend to any dynamics with an ergodic limiting measure with negative drift. For instance in the case of Syracuse the attracting measure on is the law of , where (in the notation of the paper) are independent random variables with the laws of respectively; if one includes other primes then one just adds on independent copies of . Here the drift is . (If one uses the original Collatz map rather than the Syracuse acceleration then the invariant measure is somewhat more complicated to describe; roughly speaking it is , where is a non-normalised uniform probability kernel on the natural numbers independent of , conditioned to the event for a geometric random variable independent of all previous variables.) By the ergodic theorem, negative drift would imply that almost all profinite integers would exhibit orbits whose cumulative linear multiplier would go to zero in the Archimedean sense, which would heuristically support almost all orbits becoming bounded.

One interesting question would be to see if one can construct an iteration that had negative drift in the above sense, but which still had at least one unbounded orbit. I don’t think the Conway-style FRACTRAN iterations can achieve this, but perhaps some more subtle variant of them can.

14 September, 2019 at 10:34 am

random reader: The nondegeneracy condition seems to be that the associated finite Markov chain has a single ergodic component (i.e., strongly connected subgraph of recurring states), and then the contraction/expansion factor should be the same as on the adelic thing. I am assuming here that, taking N = the product of all the primes (each to the largest power it occurs in the congruence moduli, numerators and denominators of the linear maps), there is a suitable and not too complicated sense in which the iteration is a Markov chain whose states are “randomly chosen integers in one residue class mod N”, which would be a restatement and formalization of the probability heuristic, and then to use that chain to calculate the contraction factor.

Assuming that this could be done, is it plausible that your methods adapt without major new ideas to the general iteration, or are they tied in some way to the map being 3x+1?

A related question (to assess dependence on 3x+1 versus variants with the same contraction factor) is the adversarial 3x+1 problem: the Collatz iteration, but sometimes changed to 3x+A where A is chosen each time from a given finite set of alternatives (such as anything from -10 to +10). If an adversary can change 100 percent, or 1 percent, or O(log n), of the applications of 3x+1 to (3x+A)’s, can it force the iteration to go unbounded? If the adversary always has enough freedom to force divergence, is that still true if the choices of A’s are required to equal, on average, the A in the iteration (such as +1 for Collatz) or to concentrate suitably around that number?

14 September, 2019 at 5:50 pm

Terence Tao: I think it would be non-trivial, but perhaps possible, to extend the arguments in this paper to this more general setting. The original Collatz problem has a nice property that I rely on somewhat heavily in my paper, namely that the iteration is controlled entirely by the 2-adic nature of the starting point, but the irregularities of distribution caused by the iteration only manifest in the 3-adic statistics. (In particular, there is a key use of the Chinese remainder theorem in the proof of Lemma 5.2 that basically relies on this decoupling between the controlling coordinates and the irregularly distributed coordinates.) For more general Collatz-type systems this would not be the case and the analysis may become somewhat more difficult (though perhaps not completely intractable).

I would imagine that an adversarial model could flip a negative drift model such as the original Collatz to a positive drift one given enough options available to the adversary, which would provide heuristic justification at least for unbounded orbits.

14 September, 2019 at 6:58 pm

random reader: Thanks for that inspiring answer (and the blog that makes such Q & A possible). It sounds like either the full generalization is attainable, or there are very interesting things to learn about the decoupling property and its extensions and possible role as a (dis)obstruction.

14 September, 2019 at 11:41 pm

Alberto Ibañez: Despite having no prestige to lose, I do not want to interfere in your great work with my comments of, surely, low quality, but in turn, I think it is time to try to expose my intuitions about the conjecture. When you talk about a “nondegeneracy condition”, is it because the conjecture avoids this condition, or seeks it? I say this because if it is possible to represent the binary relations of the function in some way, we would obtain an infinite Boolean matrix, and apparently, it must be singular or degenerate. “Singular matrices are rare in the sense that a square matrix randomly selected from a continuous uniform distribution on its entries will almost never be singular.” (Wikipedia). If this is so, and if the conjecture is true, it is precisely because the function 3n + 1, and apparently perhaps only it, manages to avoid the residue class 0 mod 3 and, at the same time, from an initial N, also manages to avoid the residue class 0 mod N. These characteristics prevent it from getting trapped in a periodic orbit. In the 5n + 1 conjecture, we know that certain numbers fall into periodic orbits and, I don’t know if by chance, but at least a multiple of 3 is involved. Perhaps this condition is what makes it impossible for any multiple of 3 to fall into a periodic orbit, and perhaps the truth of the conjecture lies here, because it is the multiples of 3 that enable the possibility of entering a periodic orbit.

Maybe I am extending too much, but I want to take the opportunity to ask whether Professor Tao’s strategy would change if, instead of using mod 3^n, one used mod N, where N is the starting point, to see if the residue class 0 mod N is avoided.

Another question I wanted to ask is whether it is interesting, or possible, to try to prove that if we assume that no N falls into a periodic orbit, it could be shown that no N escapes to infinity, so that the truth of the conjecture would depend solely on demonstrating that it is not possible to fall into a periodic orbit. Thanks.

11 September, 2019 at 11:10 pm

Anonymous: Dear Prof. Tao,

Thank you very much.

We are at a math conference in our country now. We are talking about you very much. We hope you achieve a hat-trick in September 2019, with the twin primes conjecture and the Goldbach conjecture besides the above Collatz conjecture.

Best wishes,

11 September, 2019 at 3:50 pm

Ridiculous Amateur: I’m an amateur who didn’t take any math beyond undergrad, and so *surely* I will embarrass myself here, but I’ve come back to Collatz a lot over the years and I always feel like I’m stumbling somewhere near the answer.

If we can show that repeated application of Col() to any N > 1 will always eventually produce an integer < N (‘reduce’), that removes both the possibility of any other orbits and the possibility of unbounded growth in one go. Easier said than done, but that one property is all we need.

So what I do next is start carving up N into constructible categories that reduce, thereby chipping away at the space where a counterexample could exist.

Obviously even N reduce immediately and need no further consideration.

Fairly trivially, N of the form 4k+1 reduce in 3 applications of Col(), since 4k+1 is necessarily odd, and 3(4k+1) + 1 = 12k + 4, which is divisible by 4 and so halves twice to yield 3k + 1. Since 3k + 1 < 4k + 1, we reduce.

That’s now 3/4 of all N that reduce! Wow, we’re really cooking! Duh, the constructions get hairier from here, but there’s still a pattern hiding in there.

Skipping over the repeated applications and just listing some reductions with (the proportion of N now shown to reduce) and [H]alve and [M]ultiply sequence, we have:

2k -> k (1/2)[H]

4k + 1 -> 3k + 1 (3/4) [MHH]

16k + 3 -> 9k + 2 (13/16) [MHMHHH]

32k + 11 -> 27k + 10 (27/32) [MHMHHMHH]

32k + 23 -> 27k + 20 (28/32) [MHMHMHHH]

128k + 7 -> 81k + 5 (113/128) [MHMHMHHMHHH]

128k + 15 -> 81k + 10 (114/128) [MHMHMHMHHHH]

This is hardly comprehensive, but continuing to fill these out will inch us closer and closer to showing ‘all N’ reduce and if along the way somebody sees a hidden relation or pattern we can close the whole thing.

What I see is that the power of 2 acts as the left-hand coefficient and counts the number of ‘halving’ applications of Col(), and the right-hand power of 3 counts the ‘multiplying’ applications. We know the power of 2 is always just large enough to exceed the power of 3 on the other side. Any less and it wouldn’t be a reduction; any more and we would have ‘passed’ the reduction.

This means some powers of 2 will never show up, because there’s no available power of 3 for them to reduce. Any 8k + x is more-than-reduced from a 3k + y form, and insufficient to reduce a 9k + y form.

So we can trace out the ‘possible’ forms by examining the relationship between growth of powers of 2 and 3. We know how many M/H operations will happen and it’s only the order that can change.

The possible orderings are bounded by needing to always follow M with H and needing to ‘spread out’ the H operations so as to not overcome the M until the end. The k term remains even until the final application of Col().

So that leaves the constant term to ‘select’ the order within the valid options – or the order to select the constant, however you want to view it.

Even if this approach *doesn’t* reveal any fundamental truth or provide unique insight to solve the problem, it *does* allow a computer program to very quickly prove an arbitrary fraction of possible N reduce, which is much better than counting and testing individual numbers.
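The symbolic bookkeeping above is easy to automate. Here is a small sketch (my own illustration of the procedure described in the comment, with a hypothetical helper name): it tracks the residue class A·k + B, applies the Collatz step while the parity is forced (i.e. while A is even, so the parity of A·k + B equals that of B), and records the [H]/[M] word.

```python
def reduce_class(A, B):
    """Iterate the Collatz map symbolically on the class A*k + B.
    While A is even, the parity of A*k + B is that of B, so each
    step is determined; stop once A becomes odd."""
    word = ""
    while A % 2 == 0:
        if B % 2 == 0:                      # even class: [H]alve
            A, B, word = A // 2, B // 2, word + "H"
        else:                               # odd class: [M]ultiply
            A, B, word = 3 * A, 3 * B + 1, word + "M"
    return A, B, word

# reproduces the table entries above, e.g. 16k + 3 -> 9k + 2 [MHMHHH]
print(reduce_class(4, 1))     # (3, 1, 'MHH')
print(reduce_class(16, 3))    # (9, 2, 'MHMHHH')
print(reduce_class(32, 11))   # (27, 10, 'MHMHHMHH')
print(reduce_class(128, 7))   # (81, 5, 'MHMHMHHMHHH')
```

The loop always terminates: an M step (possible only when B is odd) makes B even, so M’s never occur twice in a row, while each H removes one factor of 2 from A.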

12 September, 2019 at 4:06 am

Terence Tao: I think you may be on your way to rediscovering the arguments and results of Terras (link to article here). In the language of this blog post or my paper, the conclusion is that Col_min(N) < N for almost all N (in the sense of natural density). It is true that if one could in fact show that Col_min(N) < N for all N (not just almost all N), this would resolve the Collatz conjecture, but this is one of those situations where there seems to be an enormous gap in difficulty between “almost all” results and “all” results.

14 September, 2019 at 12:14 am

Anonymous: Suppose that a function from into itself has the property that if then the Collatz conjecture would follow. Is the smallest function with this property?

[It appears that your comments are being parsed incorrectly due to the use of < and > symbols that are parsed as HTML tags. Use &lt; and &gt; instead. -T.]

14 September, 2019 at 9:18 am

Anonymous: Suppose that a function has the property that for all implies the Collatz conjecture. Is there such a function (with this property) whose growth to infinity is faster than ?

14 September, 2019 at 9:57 am

Terence Tao: Well, one trivially has the bound , so this question is equivalent to the full Collatz conjecture.

11 September, 2019 at 5:34 pm

Almost all Collatz orbits attain almost bounded values – INDIA NEWS: […] Article URL: https://terrytao.wordpress.com/2019/09/10/almost-all-collatz-orbits-attain-almost-bounded-values/ […]

11 September, 2019 at 8:12 pm

Anonymous: 1) almost all {N} (in the sense of natural density)

2) almost all {N} (in the sense of logarithmic density).

Is 1) not equivalent to 2)?

12 September, 2019 at 3:59 am

Terence Tao: If a property holds for almost all N in the sense of natural density, then it also holds for almost all N in the sense of logarithmic density (this is a good exercise in summation by parts). However, the converse is not always true. For instance, the property of not having a number of digits that is a perfect square (that is, not lying in any interval of the form [10^{n^2-1}, 10^{n^2}-1]) holds for almost all N in the sense of logarithmic density, but not in the sense of natural density.
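This example can be checked exactly, with no enumeration of integers, since there are exactly 9·10^(d-1) integers with d digits. The following sketch of mine (using the intervals [10^{n^2-1}, 10^{n^2}-1] quoted in a later comment) computes the natural-density ratio up to 10^D for several D:

```python
from fractions import Fraction

def count_not_square_digits(D):
    """Count integers in [1, 10**D - 1] whose digit count is NOT a
    perfect square; there are 9 * 10**(d-1) integers with d digits."""
    squares = {k * k for k in range(1, D + 1)}
    return sum(9 * 10 ** (d - 1) for d in range(1, D + 1)
               if d not in squares)

for D in (3, 4, 8, 9, 15, 16):
    ratio = Fraction(count_not_square_digits(D), 10 ** D - 1)
    print(D, float(ratio))
# the ratio swings between roughly 0.99 (D just below a square)
# and roughly 0.10 (D equal to a square), so the natural-density
# limit cannot exist
```

In the logarithmic average, by contrast, each excluded block [10^{n^2-1}, 10^{n^2}-1] contributes only about log 10 to a harmonic sum of size about n²·log 10 at that scale, so the excluded part gets vanishing weight and the logarithmic density is 1.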

13 September, 2019 at 1:03 am

Alberto Ibañez: First of all, congratulations on this new breakthrough. It has made me very happy that you dedicate your time and effort to this conjecture, about which so many amateurs dream of making some small advance that helps toward its resolution, although it is evident that the mathematics necessary to solve it is only within the reach of high-level mathematicians.

About “…the possibility of some very rare exceptional {n} for which the orbit goes to infinity, or gets trapped in a periodic loop”, from your previous post: how does it stand now with this new result? I mean, is the possibility now even smaller, or does it remain the same but reinforced from a stronger point of view, in the sense of logarithmic density?

About “…the property of not having a number of digits that is a perfect square (that is, not lying in any interval of the form [10^{n^2-1}, 10^{n^2}-1]) holds for almost all N in the sense of logarithmic density, but not in the sense of natural density.”: I wish I could understand; is there a simple way to explain this? Thank you. (Written with the help of Google Translate and Grammarly)

13 September, 2019 at 7:11 am

Terence Tao: One can use the arguments in the paper to show that the set of points that are in a periodic loop has zero density (in fact the set of exceptions up to is of size for some ; I will sketch the argument in the next revision of the ms). However, the arguments do not say anything directly about the proportion of points whose orbit will eventually be trapped in a periodic loop, or will eventually go to infinity; the analysis in the paper can control most initial values to the point where they can be carried very close to , but after that they are free to either get sent to , get trapped in some other periodic loop, or go to infinity, and I don’t see any rigorous tools to determine which of these is the most probable outcome.

The verification of the claims about natural density and logarithmic density is a good exercise in computing sufficiently good upper and lower bounds on various averages. The continuous version of the exercise might be a good warm-up for you. Namely: let be the set of positive reals that lie outside the intervals . See if you can show that (a) the limit does not exist, and (b) the limit does exist and is equal to 1. For (a) one needs to get a lower bound for the average for some values of and an upper bound for other values of that are incompatible with the scenario where the limit exists. For (b) one needs upper and lower bounds for the average for all large that converge to each other as . Using the asymptotic notation will make the calculations easier, and drawing pictures of the set (and perhaps of the areas under the curve one needs to integrate) may guide your intuition.

12 September, 2019 at 5:21 am

Terry Tao: can an approach used to prove almost all cases be extended to prove all cases? – Julia Wolffe's Notes[…] Terry Tao posted to the arXiv his paper Almost all Collatz orbits attain almost bounded values, which caused quite the stir on social media. For instance, this Reddit post about it is only a day […]

12 September, 2019 at 4:23 pm

Craig: Does the result also hold for the 3n-1 conjecture (where the plus is replaced with a minus)? This conjecture is known to be false.

13 September, 2019 at 12:57 am

Anonymous: Craig, what do you mean about 3n-1 being known to be false? Is it known to diverge for some n? I just tested it up to 1e8 and found the informal paper http://vixra.org/abs/1610.0106, which claims to have tested it to 1e9. Are there more publications? It’s surprising that there is not much info about this sequence. It doesn’t seem to have been studied much, but its behaviour superficially resembles 3n+1’s. Wikipedia also says nothing about it.

13 September, 2019 at 4:51 am

Craig: The 3n-1 conjecture does not hold for n=5 and n=17: you get a cycle. It probably never diverges, just like the 3n+1 conjecture.

13 September, 2019 at 6:39 am

Craig: The probable reason why the 3n-1 conjecture is not found so much in the literature is that it is false. But it is very important, since it demonstrates that any proof of the 3n+1 conjecture must use the fact that the 3n+1 function has a plus sign and not a minus sign.

13 September, 2019 at 10:29 am

Anonymous: Ok, thanks. I took the 3n-1 conjecture to mean that 1, 5, and 17 generate the only cycles, per the vixra paper. A similar thing happens with the 3n+1 conjecture if you allow negative n: there are several known cycles, but it’s an open question whether there are more.

13 September, 2019 at 11:43 am

Craig: There is nothing stopping a 3n+1 sequence or a 3n-1 sequence from diverging to infinity. That can happen if most of the iterates in such a sequence use the function (3n+1)/2 or (3n-1)/2 instead of n/2. Even though the probability of this happening is nil, it is still possible. Because of this, I can’t see how anyone will ever prove that either sequence cannot diverge to infinity. The best one can do is what Prof. Tao did in his new paper.

13 September, 2019 at 4:43 pm

Anonymous: I’m saying 3n-1 has 3 known cycles, so the 3n-1 conjecture (i.e. a statement whose truth value is unknown) would be that all n end up in one of those 3 cycles. The conjecture is false if some number escapes to infinity, or if some number lands in a cycle other than the known 3. Anyway, it sounds like that question is still open.
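The three known cycles (with minimal elements 1, 5, and 17) are easy to confirm by brute force. The sketch below is my own code, exhaustive only up to the small bound tested: it iterates n → n/2 (even) or 3n−1 (odd) until a value repeats, then reports the smallest element of the cycle reached.

```python
def cycle_min(n):
    """Iterate n -> n // 2 (n even) or 3n - 1 (n odd) until a value
    repeats, then return the smallest element of that cycle."""
    step = lambda m: m // 2 if m % 2 == 0 else 3 * m - 1
    seen = set()
    while n not in seen:
        seen.add(n)
        n = step(n)
    # n now lies on the cycle; walk once around it to find its minimum
    smallest, x = n, step(n)
    while x != n:
        smallest = min(smallest, x)
        x = step(x)
    return smallest

# every starting value up to 10**4 lands in one of the three known
# cycles, e.g. 17 -> 50 -> 25 -> 74 -> ... -> 34 -> 17
assert {cycle_min(n) for n in range(1, 10**4)} == {1, 5, 17}
```

Of course, such a check says nothing about divergent orbits; it only confirms that no new cycle appears below the tested bound.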

13 September, 2019 at 7:05 am

Terence Tao: One would have to check the details carefully, but on a quick glance it would appear that the main results of my paper would also cover the 3x-1 iteration, after inserting some minus signs in various places (most notably in the definition of the offset map). Heuristically one would expect all orbits of this iteration to be bounded (and thus eventually periodic), although the precise set of possible limiting cycles is different.

EDIT: One slightly tricky thing is that in the 3x-1 case the analogue of the offset map would only be expected to stabilise if one fixes the parity of , due to a new factor of in the map. So one may have to work only with even iterates of the analogue of the Syracuse map, rather than all iterates, in order to get around this problem. But this should only require some minor notational complications in the argument rather than necessitate any serious new difficulties.

SECOND EDIT: On further thought, the offset map for the 3x-1 map doesn’t actually contain a factor, so the arguments should in fact go through without difficulty (indeed, one can think of the 3x-1 iteration as nothing more than the usual 3x+1 iteration applied to negative integers). The previous comments would apply, however, to other iterations of this type, for instance.

14 September, 2019 at 12:58 pm

Alberto Ibañez: I do not know if I can shed some light, but I think that 3n+1 on the negative numbers is symmetric, with respect to zero, to 3n-1 on the positive numbers; likewise, 3n-1 on the negative numbers behaves the same as, and is symmetric with respect to 0 to, 3n+1 on the positive numbers.

17 September, 2019 at 11:58 am

Jeff: So since an infinite subset of the natural numbers maps to 5 under 3x-1, “almost all” leaves that many numbers unaccounted for, especially when this method sets a target bound of 1!

Basically, 3x-1 gives a concrete example that this ms represents no progress whatsoever on 3x+1. Just like all of the references it cites.

13 September, 2019 at 10:33 am

Жандос Мәмбетәлі: Collatz conjecture in reverse:

Suppose we have such a Collatz fraction:

which is equal to a natural number. Then, for each power of two we have many other powers of the form , ; any of which can replace the power of two to give another integer (natural) solution of the fraction.

– are different natural numbers.

Example:

for: .

Depending on the , we equate the result of calculating such a fraction to , then the relations of finite differences will be:

where this is a subscript exponent associated with , (the index of the power of two of the variable part of the fraction).

13 September, 2019 at 11:20 am

Terry Tao prouve presque la conjecture de Collatz - Technologik.fr: […] Almost all Collatz orbits attain almost bounded values (blog post) […]

13 September, 2019 at 2:32 pm

Anonymous: Is it possible to extend the method to some Collatz-like sequences in ?

14 September, 2019 at 10:15 am

RS: Has there been any work on quantifying “almost all”? For instance, is it known that there are c, d < 1 such that for large enough N, all but N^c of the integers between 1 and N have minimal element of the Collatz orbit at most N^d?

14 September, 2019 at 5:45 pm

Terence Tao: This particular claim essentially follows from the first part of Proposition 1.11 of my paper (which also applies if the logarithmic distribution is replaced by the uniform distribution). It may also have implicitly appeared in the previous papers of Allouche and Korec mentioned in the paper, though I don’t have those references handy at present.

14 September, 2019 at 12:37 pm

Anonymous: Is there any probabilistic conjecture on the distribution of the lengths of orbits, starting with a given distribution of the initial element of the Collatz sequence?

14 September, 2019 at 5:40 pm

Terence Tao: Yes; see for instance this paper of Kontorovich and Lagarias.

14 September, 2019 at 1:53 pm

Anonymous: Since it is not known if the Collatz conjecture is decidable, the current situation is similar to that of the RH (if it is undecidable then it must be true!)

14 September, 2019 at 2:16 pm

not necessarily: No, because boundedness of individual orbits (unlike RH counterexamples) cannot necessarily be determined by finite calculation. The Collatz conjecture, whether that means “all orbits include 1” or “all orbits eventually go below 3 zillion”, could be undecidable because one particular orbit is unbounded but there is no proof of that in the axiom system chosen as a foundation. If there were a proof that a particular finite algorithm can test any given N for having an unbounded orbit or not, then the problem would become RH-like in that undecidable implies true (i.e., every N passes the test, so there are no counterexamples, but there is no uniform proof that all N have this property).

15 September, 2019 at 6:19 am

itaibn: If I’m not mistaken, the proof of the prime number theorem can be thought of as first establishing that the prime numbers are on average distributed that way in the sense of logarithmic density, and then showing that the prime numbers are not correlated to any , using Fourier analysis to establish the distribution of the prime numbers in ordinary density. Is it possible that a similar approach here will establish a result like this one, but in the sense of ordinary density instead of logarithmic density? Specifically, we can seek out complex measures of the form , where is continuous in the 3-adic topology, which are eigenfunctions of the dynamics. I believe these measures can be characterized similarly to the Syracuse measure in (1), but with an extra weight , modulo sign and similar errors. If all the eigenvalues for have norm less than one, then we would expect that starting from a uniform distribution these components will become negligible and only the Syracuse distribution will remain.

15 September, 2019 at 6:34 am

Terence Tao: Yes, I believe that some argument along these lines (considering the joint behaviour of the Syracuse iteration in both the “3-adic” and “Archimedean” aspects, by working with a joint character instead of just a 3-adic character in Proposition 1.17 (or equation (7.2)) of my paper, and in particular trying to estimate expressions such as ) should eventually be able to upgrade the “logarithmic density” aspect of my theorem to “natural density”, although the calculations in Section 5 will have to be replaced by a more careful computation that keeps much better track of the Archimedean behaviour of the Syracuse iteration and its potential coupling with the 3-adic behaviour. (It may potentially be that the notation and arguments become cleaner if one works on the adeles instead of on various components of the adeles such as , , or (as is done in my paper) , in the spirit of Tate’s thesis, particularly if one wants to generalise to other Collatz-type iterations as discussed in other comments here. I also have the vague sense that the nonabelian Fourier analysis of the affine (semi-)group on the adeles might also be naturally involved in all this analysis.) If one is lucky, one may be able to leverage the existing purely 3-adic theory in my paper as a “black box” to handle one case, and use some other, simpler, argument to handle the other case, again somewhat in analogy with the prime number theorem. (The analysis in this paper of Lagarias and Soundararajan on Benford’s law in the Collatz iteration may also be of relevance here. Related to this, I suspect that to get a sufficient amount of Archimedean equidistribution one would need something like Baker’s theorem on the Diophantine nature of .)

I don’t have any plans to work on this particular refinement of the results (and all my current graduate students and postdocs are busy with their own projects and/or are working in rather different areas of mathematics than the ones that are relevant here), so you (or anyone else) are certainly welcome to pursue it if you wish.

15 September, 2019 at 7:45 am

mayor of Simpleton: In Remark 1.4 of the paper, at the end of the paragraph, is the equivalence literally with sets of log density approaching 1 (i.e., there always exists a regular enough subset whose log density exists), or the a priori weaker statement that the lim inf (of the sequence computing the log density) goes to 1 as the Collatz orbit-minimum parameter is raised?

In either case there is a formulation of the main result without slow-growing functions, as “[lim sup of the] log-density of exceptions (Collatz orbit stays above C) goes to 0 as C increases”, but it wasn’t immediately obvious which sense of lim sup applies.

15 September, 2019 at 8:07 am

Terence Tao: Oh, good point. I had assumed without thinking about it too hard that every set of lim-inf log density at least had a subset whose log density existed and was equal to , which is why I formulated the remark as stated, but now that I look at it more carefully this is not the case, and one should indeed use the lim-inf formulation instead to get the actual formal equivalence (although it may be possible in this specific case to still recover the stronger claim currently stated in Remark 1.4, even if it is not logically equivalent to the main theorem). This will be corrected in the next revision of the ms.

15 September, 2019 at 11:30 am

Terrence Tao’s Progress on the Collatz Conjecture | An Unstructured Mess: […] Paper: https://arxiv.org/abs/1909.03562 Tao’s blog post about the paper: https://terrytao.wordpress.com/2019/09/10/almost-all-collatz-orbits-attain-almost-bounded-values/ Wikipedia page for Collatz conjecture: https://en.wikipedia.org/wiki/Collatz_conjecture A slightly […]

17 September, 2019 at 3:54 am

Sandor Pozsik: Dear Mr. Tao,

There could be a way to prove the conjecture:

1) An odd number will always be followed by an even number.

2) An even number will be followed either by an odd number (then see point 1), or by another even number.

3) No number is repeated within one series.

4) If the number is a power of 2, the series will converge to 1.

5) From points 1, 2 and 3 we see that unrepeated even numbers will be generated infinitely, unless the series reaches a number which is a power of 2 (point 4).

6) The probability of generating a power of 2, when generating an infinite number of even numbers, is 100%, so it is 100% sure that the series will reach a power of 2 and then converge to 1.

Best Regards,

Sandor