I’ve been focusing my blog posts recently on the mathematics around Hilbert’s fifth problem (is every locally Euclidean group a Lie group?), but today, I will be discussing another of Hilbert’s problems, namely Hilbert’s seventh problem, on the transcendence of powers of two algebraic numbers . (I am not randomly going through Hilbert’s list, by the way; I hope to explain my interest in the seventh problem in a later post.) This problem was famously solved by Gelfond and Schneider in the 1930s:
Theorem 1 (Gelfond-Schneider theorem) Let be algebraic numbers, with and irrational. Then (any of the values of the possibly multi-valued expression) is transcendental.
For sake of simplifying the discussion, let us focus on just one specific consequence of this theorem:
Corollary 2 is transcendental.
Proof: If not, one could obtain a contradiction to the Gelfond-Schneider theorem by setting and . (Note that is clearly irrational, since for any integers with positive.)
In the 1960s, Alan Baker established a major generalisation of the Gelfond-Schneider theorem known as Baker’s theorem, as part of his work in transcendence theory that later earned him a Fields Medal. Among other things, this theorem provided explicit quantitative bounds on exactly how transcendental quantities such as were. In particular, it gave a strong bound on how irrational such quantities were (i.e. how easily they were approximable by rationals). Here, in particular, is one special case of Baker’s theorem:
for some absolute (and effectively computable) constants .
This theorem may be compared with (the easily proved) Liouville’s theorem on diophantine approximation, which asserts that if is an irrational algebraic number of degree , then
for all with positive, and some effectively computable , and (the more significantly difficult) Thue-Siegel-Roth theorem, which under the same hypotheses gives the bound
for all , all with positive and an ineffective constant . Finally, one should compare these results against Dirichlet’s theorem on Diophantine approximation, which asserts that for any real number one has
for infinitely many with positive. (The reason the Thue-Siegel-Roth theorem is ineffective is because it relies heavily on the dueling conspiracies argument, i.e. playing off multiple “conspiracies” against each other; the other results however only focus on one approximation at a time and thus avoid ineffectivity.)
Proposition 3 easily implies the following separation property between powers of and powers of :
Corollary 4 (Separation between powers of and powers of ) For any positive integers one has
for some effectively computable constants (which may be slightly different from those in Proposition 3).
Indeed, this follows quickly from Proposition 3, the identity
and some elementary estimates.
In particular, the gap between powers of three and powers of two grows exponentially in the exponents . I do not know of any other way to establish this fact other than essentially going through some version of Baker’s argument (which will be given below).
For comparison, by exploiting the trivial (yet fundamental) integrality gap – the obvious fact that if an integer is non-zero, then its magnitude is at least – we have the trivial bound
for all positive integers (since, from the fundamental theorem of arithmetic, cannot vanish). Putting this into (1) we obtain a very weak version of Proposition 3, that only gives exponential bounds instead of polynomial ones:
for some absolute (and effectively computable) constant .
The proof of Baker’s theorem (or even of the simpler special case in Proposition 3) is largely elementary (except for some very basic complex analysis), but is quite intricate and lengthy, as a lot of careful book-keeping is necessary in order to get a bound as strong as that in Proposition 3. To illustrate the main ideas, I will prove a bound that is weaker than Proposition 3, but still significantly stronger than Proposition 5, and whose proof already captures many of the key ideas of Baker:
for some absolute constants .
Note that Proposition 3 is equivalent to the assertion that one can take (and effective) in the above proposition.
The proof of Proposition 6 can be made effective (for instance, it is not too difficult to make the close to ); however, in order to simplify the exposition (and in particular, to use some nonstandard analysis terminology to reduce the epsilon management), I will establish Proposition 6 with ineffective constants .
Like many other results in transcendence theory, the proof of Baker’s theorem (and Proposition 6) rely on what we would nowadays call the polynomial method – to play off upper and lower bounds on the complexity of polynomials that vanish (or nearly vanish) to high order on a specified set of points. (I have discussed the polynomial method in relation to problems in incidence geometry in several previous blog posts.) In the specific case of Proposition 6, the points in question are of the form
for some large integer . On the one hand, the irrationality of ensures that the curve
is not algebraic, and so it is difficult for a polynomial of controlled complexity to vanish (or nearly vanish) to high order at all the points of ; the trivial bound in Proposition 5 allows one to make this statement more precise. (Here, “complexity” of a polynomial is an informal term referring both to the degree of the polynomial, and the height of the coefficients, which in our application will essentially be integers up to some normalisation factors.) On the other hand, if Proposition 6 failed, then is close to a rational, which by Taylor expansion makes close to an algebraic curve over the rationals (up to some rescaling by factors such as and ) at each point of . This, together with a pigeonholing argument, allows one to find a polynomial of reasonably controlled complexity to (nearly) vanish to high order at every point of .
These observations, by themselves, are not sufficient to get beyond the trivial bound in Proposition 5. However, Baker’s key insight was to exploit the integrality gap to bootstrap the (near) vanishing of on a set to imply near-vanishing of on a larger set with . The point is that if a polynomial of controlled degree and size (nearly) vanishes to higher order on a lot of points on an analytic curve such as , then it will also be fairly small on many other points in as well. (To quantify this statement efficiently, it is convenient to use the tools of complex analysis, which are particularly well suited to understand zeroes (or small values) of polynomials.) But then, thanks to the integrality gap (and the controlled complexity of ), we can amplify “fairly small” to “very small”.
Using this observation and an iteration argument, Baker was able to take a polynomial of controlled complexity that nearly vanished to high order on a relatively small set , and bootstrap that to show near-vanishing on a much larger set . This bootstrap allows one to dramatically bridge the gap between the upper and lower bounds on the complexity of polynomials that nearly vanish to a specified order on a given , and eventually leads to Proposition 6 (and, with much more care and effort, to Proposition 3).
— 1. Nonstandard formulation —
The proof of Baker’s theorem requires a lot of “epsilon management” in that one has to carefully choose a lot of parameters such as and in order to make the argument work properly. This is particularly the case if one wants a good value of exponents in the final result, such as the quantity in Proposition 6. To simplify matters, we will abandon all attempts to get good values of constants anywhere, which allows one to retreat to the nonstandard analysis setting where the notation is much cleaner, and much (though not all) of the epsilon management is eliminated. This is a relatively mild use of nonstandard analysis, though, and it is not difficult to turn all the arguments below into standard effective arguments (but at the cost of explicitly tracking all the constants ). See for instance the notes of Soundararajan for such an effective treatment.
We turn to the details. We will assume some basic familiarity with nonstandard analysis, as covered for instance in this previous blog post (but one should be able to follow this argument using only non-rigorous intuition of what terms such as “unbounded” or “infinitesimal” mean).
Let be an unbounded (nonstandard) positive real number. Relative to this , we can define various notions of size:
- A nonstandard number is said to be of polynomial size if one has for some standard .
- A nonstandard number is said to be of polylogarithmic size if one has for some standard .
- A nonstandard number is said to be of quasipolynomial size if one has for some standard .
- A nonstandard number is said to be quasiexponentially small if one has for every standard .
- Given two nonstandard numbers with non-negative, we write or if for some standard . We write or if we have for all standard .
As a general rule of thumb, in our analysis all exponents will be of polylogarithmic size, all coefficients will be of quasipolynomial size, and all error terms will be quasiexponentially small.
In this nonstandard analysis setting, there is a clean calculus (analogous to the calculus of the asymptotic notations and ) to manipulate these sorts of quantities without having to explicitly track the constants . For instance:
- The sum, product, or difference of two quantities of a given size (polynomial, polylogarithmic, quasipolynomial, or quasiexponentially small) remains of that given size (i.e. each size range forms a ring).
- If , and is of a given size, then is also of that size.
- If is of quasipolynomial size and is of polylogarithmic size, then is of quasipolynomial size, and (if is a natural number) is also of quasipolynomial size.
- If is quasiexponentially small, and is of quasipolynomial size, then is also quasiexponentially small. (Thus, the quasiexponentially small numbers form an ideal in the ring of quasipolynomial numbers.)
- Any quantity of polylogarithmic size, is of polynomial size; and any quantity of polynomial size, is of quasipolynomial size.
We will refer to these sorts of facts as asymptotic calculus, and rely upon them heavily to simplify a lot of computations (particularly regarding error terms).
Proposition 6 is then equivalent to the following assertion:
Proposition 7 (Nonstandard weak special case of Baker) Let be an unbounded nonstandard natural number, and let be a rational of height at most (i.e. ). Then is not quasiexponentially small (relative to , of course).
Let us quickly see why Proposition 7 implies Proposition 6 (the converse is easy and is left to the reader). This is the usual “compactness and contradiction” argument. Suppose for contradiction that Proposition 6 failed. Carefully negating the quantifiers, we may then find a sequence of (standard) rationals with , such that
for all natural numbers . As is irrational, must go to infinity. Taking the ultralimit of the , and setting to be (say) , we contradict Proposition 7.
It remains to prove Proposition 7. We fix the unbounded nonstandard natural number , and assume for contradiction that is quasiexponentially close to a nonstandard rational of height at most . We will write for the assertion that is quasiexponentially small, thus
The objective is to show that (2) leads to a contradiction.
— 2. The polynomial method —
Now it is time to introduce the polynomial method. We will be working with the following class of polynomials:
of two nonstandard variables of some (nonstandard) degree at most (in each variable), where is a nonstandard natural number of polylogarithmic size, and whose coefficients are (nonstandard) integers of quasipolynomial size. (A technical point: we require the to depend in an internal fashion on the indices , in order for the nonstandard summation here to be well-defined.) Define the height of the polynomial to be the maximum magnitude of the coefficients in ; thus, by hypothesis, is of quasipolynomial size.
We have a key definition:
for all nonstandard natural numbers and .
Now, from (2) we have
Using the asymptotic calculus (and the hypotheses that are of polylogarithmic size, and the are of quasipolynomial size) we conclude that the left-hand side of (4) is
The quantity (and its reciprocal) is of quasipolynomial size. Thus, the condition (4) is equivalent to the assertion that
for all and .
Using this reformulation of (4), we can now give some upper and lower bounds on the complexity of good polynomials that nearly vanish to a high order on a set . We first give an lower bound, that prevents the degree from being smaller than :
Proof: Suppose for contradiction that . Then from (6) we have
for ; thus there is a non-trivial linear dependence between the (nonstandard) vectors for . But, from the formula for the Vandermonde determinant, this would imply that two of the are equal, which is absurd.
In the converse direction, we can obtain polynomials that vanish to a high order on , but with degree larger than :
Then there exists a non-trivial good polynomial of degree at most that vanishes to order on . Furthermore, has height at most
Proof: We use the pigeonholing argument of Thue and Siegel. Let be an positive quantity of quasipolynomial size to be chosen later, and choose coefficients for that are nonstandard natural numbers between and . There are possible ways to make such a selection. For each such selection, we consider the expressions arising as the left-hand side of (6) with and . These expressions are nonstandard integers whose magnitude is bounded by
which by asymptotic calculus simplifies to be bounded by
The number of possible values of these expressions is thus
By the hypothesis and asymptotic calculus, we can make this quantity less than for some of size
In particular, can be taken to be of polylogarithmic size. Thus, by the pigeonhole principle, one can find two choices for the coefficients which give equal values for the expressions in the left-hand side of (6). Subtracting those two choices we obtain the result.
— 3. The bootstrap —
At present, there is no contradiction between the lower bound in Proposition 10 and the upper bound in Proposition 11, because there is plenty of room between the two bounds. To bridge the gap between the bounds, we need a bootstrap argument that uses vanishing on one to imply vanishing (to slightly lower order) on a larger . The key bootstrap in this regard is:
Let be a good polynomial of degree at most and height , that nearly vanishes to order on . Then also vanishes to order on for any .
Proof: It is convenient to use complex analysis methods. We consider the entire function
thus by (3)
By hypothesis, we have
for all and . We wish to show that
for and . Clearly we may assume that .
(oriented anticlockwise), where is to be chosen later, and estimate it in two different ways. Firstly, we have
so for , we have the bound
when , which by the hypotheses and asymptotic calculus, simplifies to
Also, when we have
We conclude the upper bound
Now we consider the poles at . For each such , we see that the first derivatives of are quasiexponentially small at . Thus, by Taylor expansion (and asymptotic calculus), one can express as the sum of a polynomial of degree with quasiexponentially small coefficients, plus an entire function that vanishes to order at . The latter term contributes nothing to the residue at , while from the Cauchy integral formula (applied, for instance, to a circle of radius around ) and asymptotic calculus, we see that the former term contributes a residue is quasiexponentially small. In particular, it is less than . We conclude that
choosing to be a large standard multiple of and using the hypothesis , we can simplify this to
To improve this bound, we use the integrality gap. Recall that from (5) that
in particular, is quasiexponentially close to a (nonstandard) integer. Since
(say). Using the integrality gap, we conclude that
which implies that nearly vanishes to order on , as required.
with a gap of a positive power of between each such inequality. For instance, one could take
many other choices are possible (and one can optimise these choices eventually to get a good value of exponent in Proposition 6).
Using Proposition 11, we can find a good polynomial which vanishes to order on , of height , and hence (by the assumptions on ) of height .
Applying Proposition 12, nearly vanishes to order on for any . Iterating this, an easy induction shows that for any standard , nearly vanishes to order on for any . As was chosen to be larger than a positive power of , we conclude that nearly vanishes to order at least on for any of polylogarithmic size. But for large enough, this contradicts Proposition 10.
Remark 1 The above argument places a lower bound on quantities such as
for integer . Baker’s theorem, in its full generality, gives a lower bound on quantities such as
for algebraic numbers , which is polynomial in the height of the quantities involved, assuming of course that are multiplicatively independent, and that all quantities are of bounded degree. The proof is more intricate than the one given above, but follows a broadly similar strategy, and the constants are completely effective.