One of the most notorious problems in elementary mathematics that remains unsolved is the Collatz conjecture, concerning the function $f_0: {\bf N} \rightarrow {\bf N}$ defined by setting $f_0(n) := 3n+1$ when $n$ is odd, and $f_0(n) := n/2$ when $n$ is even. (Here, ${\bf N}$ is understood to be the positive natural numbers $\{1,2,3,\ldots\}$.)
Conjecture 1 (Collatz conjecture) For any given natural number $n$, the orbit $n, f_0(n), f_0^2(n), f_0^3(n), \ldots$ passes through $1$ (i.e. $f_0^k(n) = 1$ for some $k$).
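As a small illustration of Conjecture 1, here is a minimal Python sketch that iterates the standard Collatz map (taken here to be $n \mapsto 3n+1$ for odd $n$ and $n \mapsto n/2$ for even $n$) and counts the steps needed to reach $1$. The function names `f0` and `steps_to_one` and the cutoff `max_steps` are illustrative choices and not part of the original discussion.

```python
def f0(n: int) -> int:
    """One step of the Collatz map: 3n+1 for odd n, n/2 for even n."""
    return 3 * n + 1 if n % 2 else n // 2


def steps_to_one(n: int, max_steps: int = 10_000) -> int:
    """Return the least k with f0^k(n) == 1.

    Raises RuntimeError if 1 is not reached within max_steps iterations
    (max_steps is an arbitrary safety cutoff, not a mathematical bound).
    """
    k = 0
    while n != 1:
        if k >= max_steps:
            raise RuntimeError("orbit did not reach 1 within max_steps")
        n = f0(n)
        k += 1
    return k


if __name__ == "__main__":
    for start in (6, 7, 27):
        print(start, steps_to_one(start))
```

Of course, a finite computation of this kind can only confirm the conjecture for individual starting values; it says nothing about all $n$.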
Open questions with this level of notoriety can lead to what Richard Lipton calls “mathematical diseases” (and what I termed an unhealthy amount of obsession on a single famous problem). (See also this xkcd comic regarding the Collatz conjecture.) As such, most practicing mathematicians tend to spend the majority of their time on more productive research areas that are only just beyond the range of current techniques. Nevertheless, it can still be diverting to spend a day or two each year on these sorts of questions, before returning to other matters; so I recently had a go at the problem. Needless to say, I didn’t solve the problem, but I have a better appreciation of why the conjecture is (a) plausible, and (b) unlikely to be proven by current technology, and I thought I would share what I had found out here on this blog.
Let me begin with some very well known facts. If $n$ is odd, then $f_0(n) = 3n+1$ is even, and so $f_0^2(n) = \frac{3n+1}{2}$. Because of this, one could replace $f_0$ by the function $f_1: {\bf N} \rightarrow {\bf N}$, defined by $f_1(n) := \frac{3n+1}{2}$ when $n$ is odd, and $f_1(n) := n/2$ when $n$ is even, and obtain an equivalent conjecture. Now we see that if one chooses $n$ “at random”, in the sense that it is odd with probability $1/2$ and even with probability $1/2$, then $f_1$ increases $n$ by a factor of roughly $3/2$ half the time, and decreases it by a factor of $1/2$ half the time. Furthermore, if $n$ is uniformly distributed modulo $4$, one easily verifies that $f_1(n)$ is uniformly distributed modulo $2$, and so $f_1^2(n)$ should be roughly $3/2$ times as large as $f_1(n)$ half the time, and roughly $1/2$ times as large as $f_1(n)$ the other half of the time. Continuing this at a heuristic level, we expect generically that $f_1^{k+1}(n) \approx \frac{3}{2} f_1^k(n)$ half the time, and $f_1^{k+1}(n) \approx \frac{1}{2} f_1^k(n)$ the other half of the time. The logarithm $\log f_1^k(n)$ of this orbit can then be modeled heuristically by a random walk with steps $\log \frac{3}{2}$ and $\log \frac{1}{2}$ occurring with equal probability. The expectation
$$\frac{1}{2} \log \frac{3}{2} + \frac{1}{2} \log \frac{1}{2} = \frac{1}{2} \log \frac{3}{4}$$
is negative, and so (by the classic gambler’s ruin) we expect the orbit to decrease over the long term. This can be viewed as heuristic justification of the Collatz conjecture, at least in the “average case” scenario in which $n$ is chosen uniformly at random (e.g. in some large interval $\{1,\ldots,N\}$). (It also suggests that if one modifies the problem, e.g. by replacing $3n+1$ by $5n+1$, then one can obtain orbits that tend to increase over time, and indeed numerically for this variant one sees orbits that appear to escape to infinity.) Unfortunately, one can only rigorously keep the orbit uniformly distributed modulo $2$ for time about $O(\log N)$ or so; after that, the system is too complicated for naive methods to control at anything other than a heuristic level.
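The random walk heuristic above can be probed numerically. The following Python sketch (an illustration only, with hypothetical names `f1`, `empirical_drift`, and an arbitrarily chosen 4000-bit random starting point) iterates the accelerated map $n \mapsto (3n+1)/2$ for odd $n$, $n \mapsto n/2$ for even $n$, and compares the average change in $\log n$ per step with the predicted drift $\frac{1}{2} \log \frac{3}{4} \approx -0.1438$.

```python
import math
import random


def f1(n: int) -> int:
    """Accelerated Collatz map: (3n+1)/2 for odd n, n/2 for even n."""
    return (3 * n + 1) // 2 if n % 2 else n // 2


def empirical_drift(n: int, steps: int) -> float:
    """Average change of log f1^k(n) per step over the first `steps` steps."""
    start = n
    for _ in range(steps):
        n = f1(n)
    # math.log accepts arbitrarily large Python integers.
    return (math.log(n) - math.log(start)) / steps


# Drift predicted by the random-walk model: (1/2) log(3/4).
PREDICTED_DRIFT = 0.5 * math.log(3 / 4)

# Assumption for this experiment: a 4000-bit starting value, so that the
# orbit stays far away from 1 during the 2000 steps that are sampled.
rng = random.Random(0)
n0 = rng.getrandbits(4000) | (1 << 3999)

if __name__ == "__main__":
    print(empirical_drift(n0, 2000), PREDICTED_DRIFT)
```

The number of steps is kept below the bit length of the starting value, which is the regime in which the parities along the orbit are genuinely equidistributed for a random starting point.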
Remark 1 One can obtain a rigorous analogue of the above arguments by extending $f_1$ from the integers ${\bf Z}$ to the $2$-adics ${\bf Z}_2$. This compact abelian group comes with a Haar probability measure, and one can verify that this measure is invariant with respect to $f_1$; with a bit more effort one can verify that it is ergodic. This suggests the introduction of ergodic theory methods. For instance, using the pointwise ergodic theorem, we see that if $n$ is a random $2$-adic integer, then almost surely the orbit $n, f_1(n), f_1^2(n), \ldots$ will be even half the time and odd half the time asymptotically, thus supporting the above heuristics. Unfortunately, this does not directly tell us much about the dynamics on ${\bf Z}$, as this is a measure zero subset of ${\bf Z}_2$. More generally, unless a dynamical system is somehow “polynomial”, “nilpotent”, or “unipotent” in nature, the current state of ergodic theory is usually only able to say something meaningful about generic orbits, but not about all orbits. For instance, the very simple system $x \mapsto 10x$ on the unit circle ${\bf R}/{\bf Z}$ is well understood from ergodic theory (in particular, almost all orbits will be uniformly distributed), but the orbit of a specific point, e.g. $\pi \hbox{ mod } 1$, is still nearly impossible to understand (this particular problem being equivalent to the notorious unsolved question of whether the digits of $\pi$ are uniformly distributed).
The above heuristic argument only suggests decreasing orbits for almost all $n$ (though even this remains unproven; the state of the art is that the number of $n$ in $\{1,\ldots,N\}$ that eventually go to $1$ is $\gg N^{0.84}$, a result of Krasikov and Lagarias). It leaves open the possibility of some very rare exceptional $n$ for which the orbit goes to infinity, or gets trapped in a periodic loop. Since the only loop that $1$ lies in is $1, 4, 2$ (for $f_0$) or $1, 2$ (for $f_1$), we thus may isolate a weaker consequence of the Collatz conjecture:
Conjecture 2 (Weak Collatz conjecture) Suppose that $n$ is a natural number such that $f_0^k(n) = n$ for some $k \geq 1$. Then $n$ is equal to $1$, $2$, or $4$.
Of course, we may replace $f_0$ with $f_1$ (and delete “$4$”) and obtain an equivalent conjecture.
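This weaker version of the Collatz conjecture is also unproven. However, it was observed by Bohm and Sontacchi that this weak conjecture is equivalent to a divisibility problem involving powers of $2$ and $3$:
Conjecture 3 (Reformulated weak Collatz conjecture) There does not exist $k \geq 1$ and integers
$$0 = a_1 < a_2 < \ldots < a_{k+1}$$
such that $2^{a_{k+1}} - 3^k$ is a positive integer that is a proper divisor of
$$3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k},$$
i.e.
$$(2^{a_{k+1}} - 3^k) n = 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k} \qquad (1)$$
for some natural number $n > 1$.
Proposition 4 Conjecture 2 and Conjecture 3 are equivalent.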
Proof: To see this, it is convenient to reformulate Conjecture 2 slightly. Define an equivalence relation $\sim$ on ${\bf N}$ by declaring $a \sim b$ if $a/b = 2^m$ for some integer $m$, thus giving rise to the quotient space ${\bf N}/\sim$ of equivalence classes $[n]$ (which can be placed, if one wishes, in one-to-one correspondence with the odd natural numbers). We can then define a function $f_2: {\bf N}/\sim \rightarrow {\bf N}/\sim$ by declaring
$$f_2([n]) := [3n + 2^a] \qquad (2)$$
for any $n \in {\bf N}$, where $2^a$ is the largest power of $2$ that divides $n$. It is easy to see that $f_2$ is well-defined (it is essentially the Syracuse function, after identifying ${\bf N}/\sim$ with the odd natural numbers), and that periodic orbits of $f_2$ correspond to periodic orbits of $f_1$ or $f_0$. Thus, Conjecture 2 is equivalent to the conjecture that $[1]$ is the only periodic orbit of $f_2$.
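Now suppose that Conjecture 2 failed, thus there exists $[n] \neq [1]$ such that $f_2^k([n]) = [n]$ for some $k \geq 1$. Without loss of generality we may take $n$ to be odd, then $n > 1$. It is easy to see that $[1]$ is the only fixed point of $f_2$, and so $k > 1$. An easy induction using (2) shows that
$$f_2^k([n]) = [3^k n + 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k}]$$
where, for each $1 \leq i \leq k$, $2^{a_i}$ is the largest power of $2$ that divides
$$n_i := 3^{i-1} n + 3^{i-2} 2^{a_1} + \ldots + 2^{a_{i-1}}. \qquad (3)$$
In particular, as $n_1 = n$ is odd, $a_1 = 0$. Using the recursion
$$n_{i+1} = 3 n_i + 2^{a_i}, \qquad (4)$$
we see from induction that $2^{a_i + 1}$ divides $n_{i+1}$, and thus $a_{i+1} > a_i$:
$$0 = a_1 < a_2 < \ldots < a_k.$$
Since $f_2^k([n]) = [n]$, we have
$$2^{a_{k+1}} n = 3^k n + 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k} = 3 n_k + 2^{a_k}$$
for some integer $a_{k+1}$. Since $3 n_k + 2^{a_k}$ is divisible by $2^{a_k + 1}$, and $n$ is odd, we conclude $a_{k+1} > a_k$; if we rearrange the above equation as (1), then we obtain a counterexample to Conjecture 3.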
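Conversely, suppose that Conjecture 3 failed. Then we have $k \geq 1$, integers
$$0 = a_1 < a_2 < \ldots < a_{k+1}$$
and a natural number $n > 1$ such that (1) holds. As $a_1 = 0$, we see that the right-hand side of (1) is odd, so $n$ is odd also. If we then introduce the natural numbers $n_i$ by the formula (3), then an easy induction using (4) shows that
$$(2^{a_{k+1}} - 3^k) n_i = 3^{k-1} 2^{a_i} + 3^{k-2} 2^{a_{i+1}} + \ldots + 2^{a_{i+k-1}} \qquad (5)$$
with the periodic convention $a_{k+j} := a_j + a_{k+1}$ for $j \geq 1$. As the $a_i$ are increasing in $i$ (even for $i \geq k+1$), we see that $2^{a_i}$ is the largest power of $2$ that divides the right-hand side of (5); as $2^{a_{k+1}} - 3^k$ is odd, we conclude that $2^{a_i}$ is also the largest power of $2$ that divides $n_i$. We conclude that
$$f_2([n_i]) = [3 n_i + 2^{a_i}] = [n_{i+1}]$$
and thus $[n]$ is a periodic orbit of $f_2$. Since $n$ is an odd number larger than $1$, this contradicts Conjecture 2.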
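To make the divisibility reformulation concrete, here is a brute-force Python sketch that searches for small counterexamples, i.e. tuples $0 = a_1 < \ldots < a_{k+1}$ with $2^{a_{k+1}} - 3^k > 0$ dividing $3^{k-1} 2^{a_1} + \ldots + 2^{a_k}$ with quotient $n > 1$. The function name `find_counterexamples` and the search bound `max_a` are illustrative assumptions; since no small non-trivial Collatz cycles exist, the search is expected to come back empty.

```python
from itertools import combinations


def find_counterexamples(max_a: int):
    """Search all k >= 1 and 0 = a_1 < ... < a_{k+1} <= max_a for solutions of
    (2^{a_{k+1}} - 3^k) * n = sum_{i=1}^{k} 3^{k-i} * 2^{a_i}  with n > 1."""
    found = []
    for a_last in range(1, max_a + 1):
        for k in range(1, a_last + 1):
            q = 2 ** a_last - 3 ** k
            if q <= 0:
                continue  # 2^{a_{k+1}} - 3^k must be positive
            # a_1 = 0 is forced; choose a_2 < ... < a_k from {1, ..., a_last - 1}.
            for rest in combinations(range(1, a_last), k - 1):
                a = (0,) + rest
                s = sum(3 ** (k - 1 - i) * 2 ** a[i] for i in range(k))
                if s % q == 0 and s // q > 1:
                    found.append((k, a + (a_last,), s // q))
    return found


if __name__ == "__main__":
    print(find_counterexamples(12))
```

The trivial cycle shows up in this search only with quotient $n = 1$ (for instance $k = 1$, $a_2 = 2$ gives $(2^2 - 3) \cdot 1 = 1$), which is why the condition $n > 1$ is imposed.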
Call a counterexample a tuple $(k, a_1, \ldots, a_{k+1})$ that contradicts Conjecture 3, i.e. an integer $k \geq 1$ and an increasing set of integers
$$0 = a_1 < a_2 < \ldots < a_{k+1}$$
such that (1) holds for some $n > 1$. We record a simple bound on such counterexamples, due to Terras and to Garner:
Lemma 5 (Exponent bounds) Let $N \geq 1$, and suppose that the Collatz conjecture is true for all $n < N$. Let $(k, a_1, \ldots, a_{k+1})$ be a counterexample. Then
$$\frac{\log 3}{\log 2} k < a_{k+1} < \frac{\log(3 + \frac{1}{N})}{\log 2} k.$$
Proof: The first bound is immediate from the positivity of $2^{a_{k+1}} - 3^k$. To prove the second bound, observe from the proof of Proposition 4 that the counterexample $(k, a_1, \ldots, a_{k+1})$ will generate a counterexample to Conjecture 2, i.e. a non-trivial periodic orbit $n, f_0(n), \ldots, f_0^K(n) = n$. As the conjecture is true for all $n < N$, all terms in this orbit must be at least $N$. An inspection of the proof of Proposition 4 reveals that this orbit consists of $k$ steps of the form $x \mapsto 3x+1$, and $a_{k+1}$ steps of the form $x \mapsto x/2$. As all terms are at least $N$, the former steps can increase magnitude by a multiplicative factor of at most $3 + \frac{1}{N}$. As the orbit returns to where it started, we conclude that
$$1 \leq \left(3 + \frac{1}{N}\right)^k \left(\frac{1}{2}\right)^{a_{k+1}}$$
whence the claim.
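The exponent bounds of Lemma 5 translate directly into a lower bound on the number $k$ of odd steps in a hypothetical cycle: one needs an integer $a$ with $3^k < 2^a < (3 + \frac{1}{N})^k$. The following Python sketch finds the smallest such $k$ using exact integer arithmetic. The name `min_odd_steps` and the cutoff `k_max` are illustrative, and the small values of $N$ in the example are chosen only to keep the integers manageable.

```python
def min_odd_steps(N: int, k_max: int = 10_000):
    """Smallest k >= 1 for which some integer a satisfies
    3^k < 2^a < (3 + 1/N)^k, returned as (k, a); None if no k <= k_max works."""
    for k in range(1, k_max + 1):
        # 3^k is never a power of two, so its bit length is the least a
        # with 2^a > 3^k; larger a can only make the upper bound harder.
        a = (3 ** k).bit_length()
        # 2^a < (3 + 1/N)^k  <=>  2^a * N^k < (3N + 1)^k  (exact integer test)
        if (2 ** a) * N ** k < (3 * N + 1) ** k:
            return k, a
    return None


if __name__ == "__main__":
    for N in (1, 10, 100, 1000):
        print(N, min_odd_steps(N))
```

Working with integers avoids the floating-point cancellation that would otherwise appear for large $N$, where $\log(3 + \frac{1}{N})$ and $\log 3$ are extremely close.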
The Collatz conjecture has already been verified for many values of $n$ (up to a very large threshold $N$, according to this web site). Inserting this into the above lemma, one can get lower bounds on $k$. For instance, by methods such as this, it is known that any non-trivial periodic orbit must be extremely long, as shown in Garner’s paper (and this bound, which uses the much smaller value of $N$ that was available in 1981, can surely be improved using the most recent computational bounds).
Now we can perform a heuristic count on the number of counterexamples. If we fix $k$ and $a := a_{k+1}$, then $2^a > 3^k$, and from basic combinatorics we see that there are $\binom{a-1}{k-1}$ different ways to choose the remaining integers
$$0 = a_1 < a_2 < \ldots < a_{k+1}$$
to form a potential counterexample $(k, a_1, \ldots, a_{k+1})$. As a crude heuristic, one expects that for a “random” such choice of integers, the expression (1) has a probability $1/q$ of holding for some integer $n$, where $q := 2^a - 3^k$. (Note that $q$ is not divisible by $2$ or $3$, and so one does not expect the special structure of the right-hand side of (1) with respect to those moduli to be relevant. There will be some choices of $a_1, \ldots, a_k$ where the right-hand side in (1) is too small to be divisible by $q$, but using the estimates in Lemma 5, one expects this to occur very infrequently.) Thus, the total expected number of solutions for this choice of $a, k$ is
$$\frac{1}{q} \binom{a-1}{k-1}.$$
The heuristic number of solutions overall is then expected to be
$$\sum_{a,k} \frac{1}{q} \binom{a-1}{k-1}, \qquad (6)$$
where, in view of Lemma 5, one should restrict the double summation to the heuristic regime $a \approx \frac{\log 3}{\log 2} k$, with the approximation here accurate to many decimal places.
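We need a lower bound on $q = 2^a - 3^k$. Here, we will use Baker’s theorem (as discussed in this previous post), which among other things gives the lower bound
$$q = 2^a - 3^k \gg 2^a / a^C \qquad (7)$$
for some absolute constant $C$. Meanwhile, Stirling’s formula (as discussed in this previous post) combined with the approximation $k \approx \frac{\log 2}{\log 3} a$ gives
$$\binom{a-1}{k-1} \approx \exp\left(h\left(\frac{\log 2}{\log 3}\right)\right)^a$$
where $h$ is the entropy function
$$h(x) := -x \log x - (1-x) \log(1-x).$$
A brief computation shows that
$$\exp\left(h\left(\frac{\log 2}{\log 3}\right)\right) \approx 1.9318\ldots$$
and so (ignoring all subexponential terms)
$$\frac{1}{q} \binom{a-1}{k-1} \approx (0.9659\ldots)^a$$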
which makes the series (6) convergent. (Actually, one does not need the full strength of Lemma 5 here; anything that kept $k$ well away from $a/2$ would suffice. In particular, one does not need an enormous value of $N$; even a fairly small value of $N$ would be more than sufficient to obtain the heuristic that there are finitely many counterexamples.) Heuristically applying the Borel-Cantelli lemma, we thus expect that there are only a finite number of counterexamples to the weak Collatz conjecture (and inserting the known lower bounds on $k$, one in fact expects it to be extremely likely that there are no counterexamples at all).
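The “brief computation” behind this heuristic count can be reproduced in a few lines. The sketch below (with illustrative names `entropy`, `growth`, and `ratio`) evaluates the entropy function $h(x) = -x \log x - (1-x) \log(1-x)$ at $x = \frac{\log 2}{\log 3}$ and compares the exponential growth rate of the binomial coefficient with the factor $2$ coming from the size of $2^a - 3^k$ (ignoring the polynomial loss from Baker’s theorem).

```python
import math


def entropy(x: float) -> float:
    """Entropy function h(x) = -x log x - (1 - x) log(1 - x), natural logs."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


# Ratio k/a in the heuristic regime a ~ (log 3 / log 2) k.
x = math.log(2) / math.log(3)

# Exponential growth rate (in a) of the number of candidate tuples.
growth = math.exp(entropy(x))

# Dividing by 2^a (the rough size of 2^a - 3^k) leaves a base below 1,
# which is what makes the heuristic sum over a converge.
ratio = growth / 2

if __name__ == "__main__":
    print(f"exp(h(log2/log3)) = {growth:.4f}, per-unit decay = {ratio:.4f}")
```

Since the base `ratio` is strictly less than $1$, the terms decay geometrically in $a$, and a polynomial factor such as $a^C$ does not affect convergence.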
This, of course, is far short of any rigorous proof of Conjecture 2. In order to make rigorous progress on this conjecture, it seems that one would need to somehow exploit the structural properties of numbers of the form
$$3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k}. \qquad (8)$$
In some very special cases, this can be done. For instance, suppose that one had $a_{i+1} = a_i + 1$ with at most one exception (this is essentially what is called a $1$-cycle by Steiner). Then (8) simplifies via the geometric series formula to a combination of just a bounded number of powers of $2$ and $3$, rather than an unbounded number. In that case, one can start using tools from transcendence theory such as Baker’s theorem to obtain good results; for instance, in the above-referenced paper of Steiner, it was shown that $1$-cycles cannot actually occur, and similar methods have been used to show that $m$-cycles (in which there are at most $m$ exceptions to $a_{i+1} = a_i + 1$) do not occur for any $m$ up to an explicit bound, as was shown by Simons and de Weger. However, for general increasing tuples of integers $a_1, \ldots, a_k$, there is no such representation by bounded numbers of powers, and it does not seem that methods from transcendence theory will be sufficient to control the expressions (8) to the extent that one can understand their divisibility properties by quantities such as $2^{a_{k+1}} - 3^k$.
Amusingly, there is a slight connection to Littlewood-Offord theory in additive combinatorics – the study of the random sums
$$\pm v_1 \pm v_2 \pm \ldots \pm v_n$$
generated by some elements $v_1, \ldots, v_n$ of an additive group $G$, or equivalently, the vertices of an $n$-dimensional parallelepiped inside $G$. Here, the relevant group is ${\bf Z}/q{\bf Z}$, where $q := 2^{a_{k+1}} - 3^k$. The point is that if one fixes $k$ and $a_{k+1}$ (and hence $q$), and lets $(a_1, \ldots, a_k)$ vary inside the simplex
$$\Delta := \{ (a_1, \ldots, a_k) \in {\bf Z}^k: 0 = a_1 < a_2 < \ldots < a_k < a_{k+1} \}$$
then the set $S$ of all sums of the form (8) (viewed as an element of ${\bf Z}/q{\bf Z}$) contains many large parallelepipeds. (Note, incidentally, that once one fixes $k$, all the sums of the form (8) are distinct; because given (8) and $k$, one can read off $2^{a_1}$ as the largest power of $2$ that divides (8), and then subtracting off $3^{k-1} 2^{a_1}$ one can then read off $2^{a_2}$, and so forth.) This is because the simplex $\Delta$ contains many large cubes. Indeed, if one picks a typical element $(a_1, \ldots, a_k)$ of $\Delta$, then one expects (thanks to Lemma 5) that there will be $\gg k$ indices $1 \leq i_1 < \ldots < i_m \leq k$ such that $a_{i_j + 1} > a_{i_j} + 1$ for $j = 1, \ldots, m$, which allows one to adjust each of the $a_{i_j}$ independently by $1$ if desired and still remain inside $\Delta$. This gives a cube in $\Delta$ of dimension $\gg k$, which then induces a parallelepiped of the same dimension in $S$. A short computation shows that the generators of this parallelepiped consist of products of a power of $2$ and a power of $3$, and in particular will be coprime to $q$.
If the weak Collatz conjecture is true, then the set $S$ must avoid the residue class $0$ in ${\bf Z}/q{\bf Z}$. Let us suppose temporarily that we did not know about Baker’s theorem (and the associated bound (7)), so that $q$ could potentially be quite small. Then we would have a large parallelepiped inside a small cyclic group ${\bf Z}/q{\bf Z}$ that did not cover all of ${\bf Z}/q{\bf Z}$, which would not be possible for $q$ small enough. Indeed, an easy induction shows that a $d$-dimensional parallelepiped in ${\bf Z}/q{\bf Z}$, with all generators coprime to $q$, has cardinality at least $\min(q, d+1)$. This argument already shows the lower bound $q \gg k$. In other words, we have
Proposition 6 Suppose the weak Collatz conjecture is true. Then for any natural numbers $a, k$ with $2^a > 3^k$, one has $2^a - 3^k \gg k$.
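The “easy induction” on the size of a parallelepiped can be sanity-checked by brute force for small moduli. In the Python sketch below, a $d$-dimensional parallelepiped in ${\bf Z}/q{\bf Z}$ is modelled (up to translation, which does not change its cardinality) as the set of all subset sums of $d$ generators; the names `parallelepiped` and `check_bound` are illustrative assumptions. The check confirms that when every generator is coprime to $q$, the set has at least $\min(q, d+1)$ elements.

```python
from itertools import product
from math import gcd


def parallelepiped(generators, q: int) -> set:
    """All subset sums of `generators` in Z/qZ (vertices of the parallelepiped)."""
    sums = {0}
    for v in generators:
        # Adding a generator v replaces the current set A by A + {0, v}.
        sums |= {(s + v) % q for s in sums}
    return sums


def check_bound(q: int, d: int) -> bool:
    """Verify |parallelepiped| >= min(q, d + 1) for every choice of d
    generators coprime to q (exhaustive, so only feasible for small q, d)."""
    units = [v for v in range(1, q) if gcd(v, q) == 1]
    return all(
        len(parallelepiped(gens, q)) >= min(q, d + 1)
        for gens in product(units, repeat=d)
    )


if __name__ == "__main__":
    print(all(check_bound(q, d) for q in range(2, 12) for d in range(1, 4)))
```

The reason behind the bound is visible in the update step: if $v$ generates ${\bf Z}/q{\bf Z}$, then $A + v = A$ forces $A$ to be the whole group, so each new generator adds at least one new element until the group is exhausted.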
This bound is very weak when compared against the unconditional bound (7). However, I know of no way to get a nontrivial separation property between powers of $2$ and powers of $3$ other than via transcendence theory methods. Thus, this result strongly suggests that any proof of the Collatz conjecture must either use existing results in transcendence theory, or else must contribute a new method to give non-trivial results in transcendence theory. (This already rules out a lot of possible approaches to solve the Collatz conjecture.)
By using more sophisticated tools in additive combinatorics, one can improve the above proposition (though it is still well short of the transcendence theory bound (7)):
Proposition 7 Suppose the weak Collatz conjecture is true. Then for any natural numbers $a, k$ with $2^a > 3^k$, one has $2^a - 3^k \gg (1+\epsilon)^k$ for some absolute constant $\epsilon > 0$.
Proof: (Informal sketch only) Suppose not, then we can find $a, k$ with $q := 2^a - 3^k$ of size $(1+o(1))^k = \exp(o(k))$. We form the set $S$ as before, which contains parallelepipeds in ${\bf Z}/q{\bf Z}$ of large dimension $d \gg k$ that avoid $0$. We can count the number of times $0$ occurs in one of these parallelepipeds by a standard Fourier-analytic computation involving Riesz products (see Chapter 7 of my book with Van Vu, or this recent preprint of Maples). Using this Fourier representation, the fact that this parallelepiped avoids $0$ (and the fact that $q = \exp(o(k)) = \exp(o(d))$) forces the generators $v_1, \ldots, v_d$
to be concentrated in a Bohr set, in that one can find a non-zero frequency $\xi \in {\bf Z}/q{\bf Z}$ such that $(1-o(1)) d$ of the $d$ generators lie in the set $\{ v: \xi v = o(q) \hbox{ mod } q \}$. However, one can choose the generators to essentially have the structure of a (generalised) geometric progression (up to scaling, it resembles something like $2^i 3^{\lfloor \alpha i \rfloor}$ for $i$ ranging over a generalised arithmetic progression, and $\alpha$ a fixed irrational), and one can show that such progressions cannot be concentrated in Bohr sets (this is similar in spirit to the exponential sum estimates of Bourgain on approximate multiplicative subgroups of ${\bf Z}/q{\bf Z}$, though one can use more elementary methods here due to the very strong nature of the Bohr set concentration, being of the “$99\%$ concentration” variety rather than the “$1\%$ concentration”). This furnishes the required contradiction.
Thus we see that any proposed proof of the Collatz conjecture must either use transcendence theory, or introduce new techniques that are powerful enough to create exponential separation between powers of $2$ and powers of $3$.
Unfortunately, once one uses the transcendence theory bound (7), the size $q$ of the cyclic group ${\bf Z}/q{\bf Z}$ becomes larger than the volume of any cube in $S$, and Littlewood-Offord techniques are no longer of much use (they can be used to show that $S$ is highly equidistributed in ${\bf Z}/q{\bf Z}$, but this does not directly give any way to prevent $S$ from containing $0$).
One possible toy model problem for the (weak) Collatz conjecture is a conjecture of Erdos asserting that for $n > 8$, the base $3$ representation of $2^n$ contains at least one $2$. (See this paper of Lagarias for some work on this conjecture and on related problems.) To put it another way, the conjecture asserts that there are no integer solutions to
$$2^n = 3^{a_1} + 3^{a_2} + \ldots + 3^{a_k}$$
with $n > 8$ and $0 \leq a_1 < \ldots < a_k$. (When $n = 8$, of course, one has $2^8 = 3^0 + 3^1 + 3^2 + 3^5$.) In this form we see a resemblance to Conjecture 3, but it looks like a simpler problem to attack (though one which is still a fair distance beyond what one can do with current technology). Note that one has a similar heuristic support for this conjecture as one does for Conjecture 3; a number of magnitude $2^n$ has about $n \frac{\log 2}{\log 3}$ base $3$ digits, so the heuristic probability that none of these digits are equal to $2$ is $(2/3)^{n \frac{\log 2}{\log 3}}$, which is absolutely summable.
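The Erdos conjecture is easy to test numerically for small exponents. The following Python sketch (with hypothetical helper names `has_digit_two_base3` and `exceptions`) lists the exponents $n$ up to a chosen limit for which the base $3$ expansion of $2^n$ contains no digit $2$.

```python
def has_digit_two_base3(m: int) -> bool:
    """Return True if the base-3 expansion of m >= 0 contains the digit 2."""
    while m:
        m, r = divmod(m, 3)
        if r == 2:
            return True
    return False


def exceptions(limit: int):
    """Exponents 0 <= n <= limit such that 2^n has no digit 2 in base 3."""
    return [n for n in range(limit + 1) if not has_digit_two_base3(2 ** n)]


if __name__ == "__main__":
    print(exceptions(500))
```

In this range the only exponents found are $n = 0, 2, 8$, corresponding to $1$, $4 = 3 + 1$, and $256 = 3^5 + 3^2 + 3 + 1$, consistent with the conjecture that nothing further occurs for $n > 8$.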
I’ve been focusing my blog posts recently on the mathematics around Hilbert’s fifth problem (is every locally Euclidean group a Lie group?), but today, I will be discussing another of Hilbert’s problems, namely Hilbert’s seventh problem, on the transcendence of powers $a^b$ of two algebraic numbers $a, b$. (I am not randomly going through Hilbert’s list, by the way; I hope to explain my interest in the seventh problem in a later post.) This problem was famously solved by Gelfond and Schneider in the 1930s:
Theorem 1 (Gelfond-Schneider theorem) Let $a, b$ be algebraic numbers, with $a \neq 0, 1$ and $b$ irrational. Then (any of the values of the possibly multi-valued expression) $a^b$ is transcendental.
For the sake of simplifying the discussion, let us focus on just one specific consequence of this theorem:
Corollary 2 $\frac{\log 2}{\log 3}$ is transcendental.
Proof: If not, one could obtain a contradiction to the Gelfond-Schneider theorem by setting $a := 3$ and $b := \frac{\log 2}{\log 3}$. (Note that $\frac{\log 2}{\log 3}$ is clearly irrational, since $3^p \neq 2^q$ for any integers $p, q$ with $q$ positive.)
In the 1960s, Alan Baker established a major generalisation of the Gelfond-Schneider theorem known as Baker’s theorem, as part of his work in transcendence theory that later earned him a Fields Medal. Among other things, this theorem provided explicit quantitative bounds on exactly how transcendental quantities such as $\frac{\log 2}{\log 3}$ were. In particular, it gave a strong bound on how irrational such quantities were (i.e. how easily they were approximable by rationals). Here, in particular, is one special case of Baker’s theorem:
Proposition 3 (Special case of Baker’s theorem) For any integers $p, q$ with $q$ positive, one has
$$\left| \frac{\log 2}{\log 3} - \frac{p}{q} \right| \geq c \frac{1}{q^C}$$
for some absolute (and effectively computable) constants $c, C > 0$.
This theorem may be compared with (the easily proved) Liouville’s theorem on diophantine approximation, which asserts that if $\alpha$ is an irrational algebraic number of degree $d$, then
$$\left| \alpha - \frac{p}{q} \right| \geq c \frac{1}{q^d}$$
for all $p, q$ with $q$ positive, and some effectively computable $c > 0$, and (the significantly more difficult) Thue-Siegel-Roth theorem, which under the same hypotheses gives the bound
$$\left| \alpha - \frac{p}{q} \right| \geq c_\epsilon \frac{1}{q^{2+\epsilon}}$$
for all $\epsilon > 0$, all $p, q$ with $q$ positive and an ineffective constant $c_\epsilon > 0$. Finally, one should compare these results against Dirichlet’s theorem on Diophantine approximation, which asserts that for any real number $\alpha$ one has
$$\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^2}$$
for infinitely many $p, q$ with $q$ positive. (The reason the Thue-Siegel-Roth theorem is ineffective is because it relies heavily on the dueling conspiracies argument, i.e. playing off multiple “conspiracies” $\alpha \approx \frac{p}{q}$ against each other; the other results however only focus on one approximation at a time and thus avoid ineffectivity.)
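To see Dirichlet-type approximations to $\frac{\log 2}{\log 3}$ concretely, one can compute continued fraction convergents. The Python sketch below is a floating-point illustration only (the function name `convergents` is hypothetical, and double precision limits it to the first several convergents); each convergent $p/q$ it produces satisfies $|\alpha - p/q| < 1/q^2$, and corresponds to a near-coincidence $3^p \approx 2^q$ such as $3^{12} \approx 2^{19}$.

```python
import math
from fractions import Fraction


def convergents(alpha: float, count: int):
    """First `count` continued-fraction convergents p/q of alpha.

    Floating-point sketch: only the first few convergents are reliable.
    """
    result = []
    p_prev, q_prev = 1, 0
    p, q = math.floor(alpha), 1
    x = alpha
    for _ in range(count):
        result.append(Fraction(p, q))
        frac = x - math.floor(x)
        if frac == 0:
            break  # alpha is (numerically) rational; expansion terminates
        x = 1 / frac
        a = math.floor(x)  # next partial quotient
        p_prev, q_prev, p, q = p, q, a * p + p_prev, a * q + q_prev
    return result


ALPHA = math.log(2) / math.log(3)

if __name__ == "__main__":
    for c in convergents(ALPHA, 8):
        err = abs(ALPHA - c)
        print(c, err, err * c.denominator ** 2)
```

The last printed column stays below $1$, as Dirichlet’s theorem predicts for convergents; Baker’s theorem is the statement that it cannot become smaller than a fixed negative power of $q$.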
Proposition 3 easily implies the following separation property between powers of $2$ and powers of $3$:
Corollary 4 (Separation between powers of $2$ and powers of $3$) For any positive integers $p, q$ one has
$$|3^p - 2^q| \geq \frac{c}{q^C} 3^p$$
for some effectively computable constants $c, C > 0$ (which may be slightly different from those in Proposition 3).
Indeed, this follows quickly from Proposition 3, the identity
$$3^p - 2^q = 3^p \left( 1 - 3^{q \left( \frac{\log 2}{\log 3} - \frac{p}{q} \right)} \right) \qquad (1)$$
and some elementary estimates.
In particular, the gap between powers of three $3^p$ and powers of two $2^q$ grows exponentially in the exponents $p, q$. I do not know of any other way to establish this fact other than essentially going through some version of Baker’s argument (which will be given below).
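The separation between powers of $2$ and powers of $3$ can be explored with exact integer arithmetic. In the Python sketch below (the names `relative_gap` and `closest_power_of_two` are illustrative), the relative gap $|3^p - 2^q| / 3^p$ is computed for the power of two nearest to $3^p$; even at the best near-coincidences, such as $3^{12}$ against $2^{19}$ or $3^{53}$ against $2^{84}$, the relative gap shrinks only slowly, so the absolute gap still grows exponentially.

```python
from fractions import Fraction


def relative_gap(p: int, q: int) -> Fraction:
    """Exact value of |3^p - 2^q| / 3^p."""
    return Fraction(abs(3 ** p - 2 ** q), 3 ** p)


def closest_power_of_two(p: int) -> int:
    """Exponent q minimising |3^p - 2^q| (ties resolved to the smaller q)."""
    b = (3 ** p).bit_length()  # 2^(b-1) < 3^p < 2^b for p >= 1
    return min((b - 1, b), key=lambda q: abs(3 ** p - 2 ** q))


if __name__ == "__main__":
    for p in (1, 2, 5, 12, 41, 53):
        q = closest_power_of_two(p)
        print(p, q, float(relative_gap(p, q)))
```

This only illustrates the phenomenon for a handful of exponents; the content of Corollary 4 is that the relative gap is bounded below by $c / q^C$ for all $p, q$.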
For comparison, by exploiting the trivial (yet fundamental) integrality gap – the obvious fact that if an integer $n$ is non-zero, then its magnitude is at least $1$ – we have the trivial bound
$$|3^p - 2^q| \geq 1$$
for all positive integers $p, q$ (since, from the fundamental theorem of arithmetic, $3^p - 2^q$ cannot vanish). Putting this into (1) we obtain a very weak version of Proposition 3, that only gives exponential bounds instead of polynomial ones:
Proposition 5 (Trivial bound) For any integers
with
positive, one has
for some absolute (and effectively computable) constant
.
The proof of Baker’s theorem (or even of the simpler special case in Proposition 3) is largely elementary (except for some very basic complex analysis), but is quite intricate and lengthy, as a lot of careful book-keeping is necessary in order to get a bound as strong as that in Proposition 3. To illustrate the main ideas, I will prove a bound that is weaker than Proposition 3, but still significantly stronger than Proposition 5, and whose proof already captures many of the key ideas of Baker:
Proposition 6 (Weak special case of Baker’s theorem) For any integers $p, q$ with $q > 1$, one has
$$\left| \frac{\log 2}{\log 3} - \frac{p}{q} \right| \geq \exp( - C \log^{C'} q )$$
for some absolute constants $C, C' > 0$.
Note that Proposition 3 is equivalent to the assertion that one can take $C' = 1$ (and $C$ effective) in the above proposition.
The proof of Proposition 6 can be made effective (for instance, it is not too difficult to make the $C'$ close to $2$); however, in order to simplify the exposition (and in particular, to use some nonstandard analysis terminology to reduce the epsilon management), I will establish Proposition 6 with ineffective constants $C, C'$.
Like many other results in transcendence theory, the proof of Baker’s theorem (and Proposition 6) relies on what we would nowadays call the polynomial method – to play off upper and lower bounds on the complexity of polynomials that vanish (or nearly vanish) to high order on a specified set of points. (I have discussed the polynomial method in relation to problems in incidence geometry in several previous blog posts.) In the specific case of Proposition 6, the points in question are of the form
$$\Gamma_N := \{ (2^n, 3^n): n = 1, \ldots, N \} \subset {\bf R}^2$$
for some large integer $N$. On the one hand, the irrationality of $\frac{\log 2}{\log 3}$ ensures that the curve
$$\gamma := \{ (2^t, 3^t): t \in {\bf R} \}$$
is not algebraic, and so it is difficult for a polynomial $P$ of controlled complexity to vanish (or nearly vanish) to high order at all the points of $\Gamma_N$; the trivial bound in Proposition 5 allows one to make this statement more precise. (Here, “complexity” of a polynomial is an informal term referring both to the degree of the polynomial, and the height of the coefficients, which in our application will essentially be integers up to some normalisation factors.) On the other hand, if Proposition 6 failed, then $\frac{\log 2}{\log 3}$ is close to a rational, which by Taylor expansion makes $\gamma$ close to an algebraic curve over the rationals (up to some rescaling by factors such as $\log 2$ and $\log 3$) at each point of $\Gamma_N$. This, together with a pigeonholing argument, allows one to find a polynomial $P$ of reasonably controlled complexity to (nearly) vanish to high order at every point of $\Gamma_N$.
These observations, by themselves, are not sufficient to get beyond the trivial bound in Proposition 5. However, Baker’s key insight was to exploit the integrality gap to bootstrap the (near) vanishing of $P$ on a set $\Gamma_N$ to imply near-vanishing of $P$ on a larger set $\Gamma_{N'}$ with $N' > N$. The point is that if a polynomial $P$ of controlled degree and size (nearly) vanishes to high order on a lot of points on an analytic curve such as $\gamma$, then it will also be fairly small on many other points in $\gamma$ as well. (To quantify this statement efficiently, it is convenient to use the tools of complex analysis, which are particularly well suited to understand zeroes (or small values) of polynomials.) But then, thanks to the integrality gap (and the controlled complexity of $P$), we can amplify “fairly small” to “very small”.
Using this observation and an iteration argument, Baker was able to take a polynomial $P$ of controlled complexity that nearly vanished to high order on a relatively small set $\Gamma_{N_0}$, and bootstrap that to show near-vanishing on a much larger set $\Gamma_{N_k}$. This bootstrap allows one to dramatically bridge the gap between the upper and lower bounds on the complexity of polynomials that nearly vanish to a specified order on a given $\Gamma_N$, and eventually leads to Proposition 6 (and, with much more care and effort, to Proposition 3).
Below the fold, I give the details of this argument. My treatment here is inspired by this expose of Serre, and these lecture notes of Soundararajan (as transcribed by Ian Petrow).