Let {\bar{{\bf Q}}} be the algebraic closure of {{\bf Q}}, that is to say the field of algebraic numbers. We fix an embedding of {\bar{{\bf Q}}} into {{\bf C}}, giving rise to a complex absolute value {z \mapsto |z|} for algebraic numbers {z \in \bar{{\bf Q}}}.

Let {\alpha \in \bar{{\bf Q}}} be of degree {D > 1}, so that {\alpha} is irrational. A classical theorem of Liouville gives the quantitative bound

\displaystyle  |\alpha - \frac{p}{q}| \geq c \frac{1}{|q|^D} \ \ \ \ \ (1)

for the extent to which {\alpha} fails to be approximated by rational numbers {p/q}, where {c>0} depends on {\alpha,D} but not on {p,q}. Indeed, if one lets {\alpha = \alpha_1, \alpha_2, \dots, \alpha_D} be the Galois conjugates of {\alpha}, then the quantity {\prod_{i=1}^D |q \alpha_i - p|} is a non-zero natural number divided by a constant, and so we have the trivial lower bound

\displaystyle  \prod_{i=1}^D |q \alpha_i - p| \geq c

from which the bound (1) easily follows. A well known corollary of the bound (1) is that Liouville numbers are automatically transcendental.
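As a numerical sanity check of (1), here is a minimal Python sketch for the specific choice {\alpha = \sqrt{2}} (so {D=2}). The function name `smallest_scaled_gap` and the explicit constant `C` are illustrative choices and do not come from the text. The constant is obtained from the conjugate argument above: {|p^2 - 2q^2| \geq 1}, and {|\sqrt{2} + p/q| \leq 2\sqrt{2}+1} whenever {|\sqrt{2} - p/q| \leq 1}. The check uses floating point, so it is only meaningful for moderate {q}.

```python
import math

ALPHA = math.sqrt(2.0)
# Assumed explicit constant for alpha = sqrt(2): c = 1 / (2*sqrt(2) + 1).
C = 1.0 / (2.0 * ALPHA + 1.0)

def smallest_scaled_gap(max_q):
    """Return the minimum over 1 <= q <= max_q of q^2 * |sqrt(2) - p/q|,
    where p is the integer nearest to q*sqrt(2) (the best numerator for that q)."""
    best = float("inf")
    for q in range(1, max_q + 1):
        p = round(q * ALPHA)
        gap = abs(ALPHA - p / q)
        best = min(best, q * q * gap)
    return best
```

The returned value staying above `C` is the Liouville bound for this {\alpha}; the same value staying below {1} reflects the fact that the exponent {2} cannot be lowered.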

The famous theorem of Thue, Siegel and Roth improves the bound (1) to

\displaystyle  |\alpha - \frac{p}{q}| \geq c \frac{1}{|q|^{2+\epsilon}} \ \ \ \ \ (2)

for any {\epsilon>0} and rationals {\frac{p}{q}}, where {c>0} depends on {\alpha,\epsilon} but not on {p,q}. Apart from the {\epsilon} in the exponent and the constant {c}, this bound is optimal, as can be seen from Dirichlet’s theorem. The Thue-Siegel-Roth theorem is a good example of the ineffectivity phenomenon that affects a large portion of modern number theory: the constant {c} in (2) is known to be positive, but there is no explicit lower bound for it in terms of the coefficients of the polynomial defining {\alpha} (in contrast to (1), for which an effective bound may be easily established). This is ultimately due to the reliance on the “dueling conspiracy” (or “repulsion phenomenon”) strategy. We do not as yet have a good way to rule out one counterexample to (2), in which {\frac{p}{q}} is far closer to {\alpha} than {\frac{1}{|q|^{2+\epsilon}}}; however we can rule out two such counterexamples, by playing them off of each other.

A powerful strengthening of the Thue-Siegel-Roth theorem is given by the subspace theorem, first proven by Schmidt and then generalised further by several authors. To motivate the theorem, first observe that the Thue-Siegel-Roth theorem may be rephrased as a bound of the form

\displaystyle  | \alpha p - \beta q | \times | \alpha' p - \beta' q | \geq c (1 + |p| + |q|)^{-\epsilon} \ \ \ \ \ (3)

for any algebraic numbers {\alpha,\beta,\alpha',\beta'} with {(\alpha,\beta)} and {(\alpha',\beta')} linearly independent (over the algebraic numbers), and any {(p,q) \in {\bf Z}^2} and {\epsilon>0}, with the exception when {\alpha,\beta} or {\alpha',\beta'} are rationally dependent (i.e. one is a rational multiple of the other), in which case one has to remove some lines (i.e. subspaces in {{\bf Q}^2}) of rational slope from the space {{\bf Z}^2} of pairs {(p,q)} to which the bound (3) does not apply (namely, those lines for which the left-hand side vanishes). Here {c>0} can depend on {\alpha,\beta,\alpha',\beta',\epsilon} but not on {p,q}. More generally, we have

Theorem 1 (Schmidt subspace theorem) Let {d} be a natural number. Let {L_1,\dots,L_d: \bar{{\bf Q}}^d \rightarrow \bar{{\bf Q}}} be linearly independent linear forms. Then for any {\epsilon>0}, one has the bound

\displaystyle  \prod_{i=1}^d |L_i(x)| \geq c (1 + \|x\| )^{-\epsilon}

for all {x \in {\bf Z}^d}, outside of a finite number of proper subspaces of {{\bf Q}^d}, where

\displaystyle  \| (x_1,\dots,x_d) \| := \max( |x_1|, \dots, |x_d| )

and {c>0} depends on {\epsilon, d} and the coefficients {\alpha_{i,j}} of the linear forms {L_i}, but is independent of {x}.

Being a generalisation of the Thue-Siegel-Roth theorem, it is unsurprising that the known proofs of the subspace theorem are also ineffective with regards to the constant {c}. (However, the number of exceptional subspaces may be bounded effectively; cf. the situation with the Skolem-Mahler-Lech theorem, discussed in this previous blog post.) Once again, the lower bound here is basically sharp except for the {\epsilon} factor and the implied constant: given any {\delta_1,\dots,\delta_d > 0} with {\delta_1 \dots \delta_d = 1}, a simple volume packing argument (the same one used to prove the Dirichlet approximation theorem) shows that for any sufficiently large {N \geq 1}, one can find integers {x_1,\dots,x_d \in [-N,N]}, not all zero, such that

\displaystyle  |L_i(x)| \ll \delta_i

for all {i=1,\dots,d}. Thus one can get {\prod_{i=1}^d |L_i(x)|} comparable to {1} in many different ways.

There are important generalisations of the subspace theorem to other number fields than the rationals (and to other valuations than the Archimedean valuation {z \mapsto |z|}); we will develop one such generalisation below.

The subspace theorem is one of many finiteness theorems in Diophantine geometry; in this case, it is the number of exceptional subspaces which is finite. It turns out that finiteness theorems are very compatible with the language of nonstandard analysis. (See this previous blog post for a review of the basics of nonstandard analysis, and in particular for the nonstandard interpretation of asymptotic notation such as {\ll} and {o()}.) The reason for this is that a standard set {X} is finite if and only if it contains no strictly nonstandard elements (that is to say, elements of {{}^* X \backslash X}). This makes for a clean formulation of finiteness theorems in the nonstandard setting. For instance, the standard form of Bezout’s theorem asserts that if {P(x,y), Q(x,y)} are coprime polynomials over some field, then the curves {\{ (x,y): P(x,y) = 0\}} and {\{ (x,y): Q(x,y)=0\}} intersect in only finitely many points. The nonstandard version of this is then

Theorem 2 (Bezout’s theorem, nonstandard form) Let {P(x,y), Q(x,y)} be standard coprime polynomials. Then there are no strictly nonstandard solutions to {P(x,y)=Q(x,y)=0}.

Now we reformulate Theorem 1 in nonstandard language. We need a definition:

Definition 3 (General position) Let {K \subset L} be nested fields. A point {x = (x_1,\dots,x_d)} in {L^d} is said to be in {K}-general position if it is not contained in any hyperplane of {L^d} definable over {K}, or equivalently if one has

\displaystyle  a_1 x_1 + \dots + a_d x_d = 0 \iff a_1=\dots = a_d = 0

for any {a_1,\dots,a_d \in K}.

Theorem 4 (Schmidt subspace theorem, nonstandard version) Let {d} be a standard natural number. Let {L_1,\dots,L_d: \bar{{\bf Q}}^d \rightarrow \bar{{\bf Q}}} be linearly independent standard linear forms. Let {x \in {}^* {\bf Z}^d} be a tuple of nonstandard integers which is in {{\bf Q}}-general position (in particular, this forces {x} to be strictly nonstandard). Then one has

\displaystyle  \prod_{i=1}^d |L_i(x)| \gg \|x\|^{-o(1)},

where we extend {L_i} from {\bar{{\bf Q}}} to {{}^* \bar{{\bf Q}}} (and also similarly extend {\| \|} from {{\bf Z}^d} to {{}^* {\bf Z}^d}) in the usual fashion.

Observe that (as is usual when translating to nonstandard analysis) some of the epsilons and quantifiers that are present in the standard version become hidden in the nonstandard framework, being moved inside concepts such as “strictly nonstandard” or “general position”. We remark that as {x} is in {{\bf Q}}-general position, it is also in {\bar{{\bf Q}}}-general position (as an easy Galois-theoretic argument shows), and the requirement that the {L_1,\dots,L_d} are linearly independent is thus equivalent to {L_1(x),\dots,L_d(x)} being {\bar{{\bf Q}}}-linearly independent.

Exercise 1 Verify that Theorem 1 and Theorem 4 are equivalent. (Hint: there are only countably many proper subspaces of {{\bf Q}^d}.)

We will not prove the subspace theorem here, but instead focus on a particular application of the subspace theorem, namely to counting integer points on curves. In this paper of Corvaja and Zannier, the subspace theorem was used to give a new proof of the following basic result of Siegel:

Theorem 5 (Siegel’s theorem on integer points) Let {P \in {\bf Q}[x,y]} be an irreducible polynomial of two variables, such that the affine plane curve {C := \{ (x,y): P(x,y)=0\}} either has genus at least one, or has at least three points on the line at infinity, or both. Then {C} has only finitely many integer points {(x,y) \in {\bf Z}^2}.

This is a finiteness theorem, and as such may be easily converted to a nonstandard form:

Theorem 6 (Siegel’s theorem, nonstandard form) Let {P \in {\bf Q}[x,y]} be a standard irreducible polynomial of two variables, such that the affine plane curve {C := \{ (x,y): P(x,y)=0\}} either has genus at least one, or has at least three points on the line at infinity, or both. Then {C} does not contain any strictly nonstandard integer points {(x_*,y_*) \in {}^* {\bf Z}^2 \backslash {\bf Z}^2}.

Note that Siegel’s theorem can fail for genus zero curves that only meet the line at infinity at just one or two points; the key examples here are the graphs {\{ (x,y): y - f(x) = 0\}} for a polynomial {f \in {\bf Z}[x]}, and the Pell equation curves {\{ (x,y): x^2 - dy^2 = 1 \}}. Siegel’s theorem can be compared with the more difficult theorem of Faltings, which establishes finiteness of rational points (not just integer points), but now needs the stricter requirement that the curve {C} has genus at least two (to avoid the additional counterexample of elliptic curves of positive rank, which have infinitely many rational points).
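To make the Pell counterexample concrete, the following Python sketch generates integer points on {x^2 - dy^2 = 1} from one known solution. The function name `pell_points` is hypothetical, and the starting solution is an input that the caller must supply; the recurrence is the usual multiplication by {x_1 + y_1 \sqrt{d}}.

```python
def pell_points(d, x1, y1, count):
    """Return the first `count` positive integer points on x^2 - d*y^2 = 1,
    obtained by repeatedly multiplying by the given solution (x1, y1)."""
    assert x1 * x1 - d * y1 * y1 == 1, "(x1, y1) must itself solve the equation"
    points = []
    x, y = x1, y1
    for _ in range(count):
        points.append((x, y))
        # (x + y*sqrt(d)) * (x1 + y1*sqrt(d)), expanded in integer arithmetic
        x, y = x1 * x + d * y1 * y, x1 * y + y1 * x
    return points
```

For instance, `pell_points(2, 3, 2, 5)` starts from {(3,2)} and never terminates in principle, which is exactly why curves of this type must be excluded from Siegel's theorem.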

The standard proofs of Siegel’s theorem rely on a combination of the Thue-Siegel-Roth theorem and a number of results on abelian varieties (notably the Mordell-Weil theorem). The Corvaja-Zannier argument rebalances the difficulty of the argument by replacing the Thue-Siegel-Roth theorem by the more powerful subspace theorem (in fact, they need one of the stronger versions of this theorem alluded to earlier), while greatly reducing the reliance on results on abelian varieties. Indeed, for curves with three or more points at infinity, no theory from abelian varieties is needed at all, while for the remaining cases, one mainly needs the existence of the Abel-Jacobi embedding, together with a relatively elementary theorem of Chevalley-Weil which is used in the proof of the Mordell-Weil theorem, but is significantly easier to prove.

The Corvaja-Zannier argument (together with several further applications of the subspace theorem) is presented nicely in this Bourbaki expose of Bilu. To establish the theorem in full generality requires a certain amount of algebraic number theory machinery, such as the theory of valuations on number fields, or of relative discriminants between such number fields. However, the basic ideas can be presented without much of this machinery by focusing on simple special cases of Siegel’s theorem. For instance, we can handle irreducible cubics that meet the line at infinity at exactly three points {[1,\alpha_1,0], [1,\alpha_2,0], [1,\alpha_3,0]}:

Theorem 7 (Siegel’s theorem with three points at infinity) Siegel’s theorem holds when the irreducible polynomial {P(x,y)} takes the form

\displaystyle  P(x,y) = (y - \alpha_1 x) (y - \alpha_2 x) (y - \alpha_3 x) + Q(x,y)

for some quadratic polynomial {Q \in {\bf Q}[x,y]} and some distinct algebraic numbers {\alpha_1,\alpha_2,\alpha_3}.

Proof: We use the nonstandard formalism. Suppose for sake of contradiction that we can find a strictly nonstandard integer point {(x_*,y_*) \in {}^* {\bf Z}^2 \backslash {\bf Z}^2} on a curve {C := \{ (x,y): P(x,y)=0\}} of the indicated form. As this point is infinitesimally close to the line at infinity, {y_*/x_*} must be infinitesimally close to one of {\alpha_1,\alpha_2,\alpha_3}; without loss of generality we may assume that {y_*/x_*} is infinitesimally close to {\alpha_1}.

We now use a version of the polynomial method, to find some polynomials of controlled degree that vanish to high order on the “arm” of the cubic curve {C} that asymptotes to {[1,\alpha_1,0]}. More precisely, let {D \geq 3} be a large integer (actually {D=3} will already suffice here), and consider the {\bar{{\bf Q}}}-vector space {V} of polynomials {R(x,y) \in \bar{{\bf Q}}[x,y]} of degree at most {D}, and of degree at most {2} in the {y} variable; this space has dimension {3D}. Also, as one traverses the arm {y/x \rightarrow \alpha_1} of {C}, any polynomial {R} in {V} grows at a rate of at most {D}, that is to say {R} has a pole of order at most {D} at the point at infinity {[1,\alpha_1,0]}. By performing Laurent expansions around this point (which is a non-singular point of {C}, as the {\alpha_i} are assumed to be distinct), we may thus find a basis {R_1, \dots, R_{3D}} of {V}, with the property that {R_j} has a pole of order at most {D+1-j} at {[1,\alpha_1,0]} for each {j=1,\dots,3D}.

From the control of the pole at {[1,\alpha_1,0]}, we have

\displaystyle  |R_j(x_*,y_*)| \ll (|x_*|+|y_*|)^{D+1-j}

for all {j=1,\dots,3D}. The exponents here become negative for {j > D+1}, and on multiplying them all together we see that

\displaystyle  \prod_{j=1}^{3D} |R_j(x_*,y_*)| \ll (|x_*|+|y_*|)^{3D(D+1) - \frac{3D(3D+1)}{2}}.

This exponent is negative for {D} large enough (or just take {D=3}). If we expand

\displaystyle  R_j(x_*,y_*) = \sum_{a+b \leq D; b \leq 2} \alpha_{j,a,b} x_*^a y_*^b

for some algebraic numbers {\alpha_{j,a,b}}, then we thus have

\displaystyle  \prod_{j=1}^{3D} |\sum_{a+b \leq D; b \leq 2} \alpha_{j,a,b} x_*^a y_*^b| \ll (|x_*|+|y_*|)^{-\epsilon}

for some standard {\epsilon>0}. Note that the {3D}-dimensional vectors {(\alpha_{j,a,b})_{a+b \leq D; b \leq 2}} are linearly independent in {{\bf C}^{3D}}, because the {R_j} are linearly independent in {V}. Applying the Schmidt subspace theorem in the contrapositive, we conclude that the {3D}-tuple {( x_*^a y_*^b )_{a+b \leq D; b \leq 2} \in {}^* {\bf Z}^{3D}} is not in {{\bf Q}}-general position. That is to say, one has a non-trivial constraint of the form

\displaystyle  \sum_{a+b \leq D; b \leq 2} c_{a,b} x_*^a y_*^b = 0 \ \ \ \ \ (4)

for some standard rational coefficients {c_{a,b}}, not all zero. But, as {P} is irreducible and cubic in {y}, it has no common factor with the standard polynomial {\sum_{a+b \leq D; b \leq 2} c_{a,b} x^a y^b}, so by Bezout’s theorem (Theorem 2) the constraint (4) only has standard solutions, contradicting the strictly nonstandard nature of {(x_*,y_*)}. \Box
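The two pieces of bookkeeping in this proof, namely the dimension {3D} of {V} and the exponent obtained by summing {D+1-j} over {j=1,\dots,3D}, can be checked mechanically. The sketch below is only an illustration; the function names are hypothetical.

```python
def monomial_exponents(D):
    """List the pairs (a, b) with a + b <= D and b <= 2, i.e. the monomials
    x^a y^b spanning the space V of the proof."""
    return [(a, b) for b in range(3) for a in range(D - b + 1)]

def pole_exponent_sum(D):
    """Sum of the exponents D + 1 - j over j = 1, ..., 3D."""
    return sum(D + 1 - j for j in range(1, 3 * D + 1))
```

In closed form the sum equals {3D(1-D)/2}, which is {-9} for {D=3}, consistent with the claim that {D=3} already gives a negative exponent.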

Exercise 2 Rewrite the above argument so that it makes no reference to nonstandard analysis. (In this case, the rewriting is quite straightforward; however, there will be a subsequent argument in which the standard version is significantly messier than the nonstandard counterpart, which is the reason why I am working with the nonstandard formalism in this blog post.)

A similar argument works for higher degree curves that meet the line at infinity in three or more points, though if the curve has singularities at infinity then it becomes convenient to rely on the Riemann-Roch theorem to control the dimension of the analogue of the space {V}. Note that when there are only two or fewer points at infinity, though, one cannot get the negative exponent of {-\epsilon} needed to usefully apply the subspace theorem. To deal with this case we require some additional tricks. For simplicity we focus on the case of Mordell curves, although it will be convenient to work with more general number fields {{\bf Q} \subset K \subset \bar{{\bf Q}}} than the rationals:

Theorem 8 (Siegel’s theorem for Mordell curves) Let {k} be a non-zero integer. Then there are only finitely many integer solutions {(x,y) \in {\bf Z}^2} to {y^2 - x^3 = k}. More generally, for any number field {K}, and any nonzero {k \in K}, there are only finitely many algebraic integer solutions {(x,y) \in {\mathcal O}_K^2} to {y^2-x^3=k}, where {{\mathcal O}_K} is the ring of algebraic integers in {K}.
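For a feel of what Theorem 8 asserts, one can search a finite window for integer points on a Mordell curve. This is a brute-force sketch with a hypothetical function name; it says nothing about points outside the window, and in particular it does not prove finiteness.

```python
import math

def mordell_points(k, x_max):
    """Return all integer (x, y) with y^2 - x^3 = k and |x| <= x_max."""
    points = []
    for x in range(-x_max, x_max + 1):
        rhs = x ** 3 + k
        if rhs < 0:
            continue                      # y^2 cannot be negative
        y = math.isqrt(rhs)
        if y * y == rhs:
            points.append((x, y))
            if y != 0:
                points.append((x, -y))    # the curve is symmetric in y
    return points
```

With {k=17} and `x_max = 60`, the search finds (among others) the points {(-2,\pm 3)}, {(8,\pm 23)}, {(43,\pm 282)} and {(52,\pm 375)}.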

Again, we will establish the nonstandard version. We need some additional notation:

Definition 9

  • We define an almost rational integer to be a nonstandard {x \in {}^* {\bf Q}} such that {Mx \in {}^* {\bf Z}} for some standard positive integer {M}, and write {{\bf Q} {}^* {\bf Z}} for the {{\bf Q}}-algebra of almost rational integers.
  • If {K} is a standard number field, we define an almost {K}-integer to be a nonstandard {x \in {}^* K} such that {Mx \in {}^* {\mathcal O}_K} for some standard positive integer {M}, and write {K {}^* {\bf Z} = K {}^* {\mathcal O}_K} for the {K}-algebra of almost {K}-integers.
  • We define an almost algebraic integer to be a nonstandard {x \in {}^* \bar{{\bf Q}}} such that {Mx} is a nonstandard algebraic integer for some standard positive integer {M}, and write {\bar{{\bf Q}} {}^* {\bf Z}} for the {\bar{{\bf Q}}}-algebra of almost algebraic integers.

    Theorem 10 (Siegel for Mordell, nonstandard version) Let {k} be a non-zero standard algebraic number. Then the curve {\{ (x,y): y^2 - x^3 = k \}} does not contain any strictly nonstandard almost algebraic integer point.

    Another way of phrasing this theorem is that if {x,y} are strictly nonstandard almost algebraic integers, then {y^2-x^3} is either strictly nonstandard or zero.

    Exercise 3 Verify that Theorem 8 and Theorem 10 are equivalent.

    Due to all the ineffectivity, our proof does not supply any bound on the solutions {x,y} in terms of {k}, even if one removes all references to nonstandard analysis. It is a conjecture of Hall (a special case of the notorious ABC conjecture) that one has the bound {|x| \ll_\epsilon |k|^{2+\epsilon}} for all {\epsilon>0} (or equivalently {|y| \ll_\epsilon |k|^{3+\epsilon}}), but even the weaker conjecture that {x,y} are of polynomial size in {k} is open. (The best known bounds are of exponential nature, and are proven using a version of Baker’s method: see for instance this text of Sprindzuk.)

    A direct repetition of the arguments used to prove Theorem 7 will not work here, because the Mordell curve {\{ (x,y): y^2 - x^3 = k \}} only hits the line at infinity at one point, {[0,1,0]}. To get around this we will exploit the fact that the Mordell curve is an elliptic curve and thus has a group law on it. We will then divide all the integer points on this curve by two; as elliptic curves have four 2-torsion points, this will end up placing us in a situation like Theorem 7, with four points at infinity. However, there is an obstruction: it is not obvious that dividing an integer point on the Mordell curve by two will produce another integer point. However, this is essentially true (after enlarging the ring of integers slightly) thanks to a general principle of Chevalley and Weil, which can be worked out explicitly in the case of division by two on Mordell curves by relatively elementary means (relying mostly on unique factorisation of ideals of algebraic integers). We give the details below the fold.

    — 1. Dividing by two on the Mordell curve —

    This section will be elementary (in the sense that the arguments may be phrased in the first-order language of rings), and so it will make no difference whether we are working in the standard or nonstandard formalism. As such, we make no reference to nonstandard analysis here.

    As is well known, any elliptic curve {C} (after adjoining a point {0} at infinity) has a group law {+}, with {P+Q+R=0} whenever {P,Q,R} are distinct collinear points on {C}, or {2P+R=0} if the tangent line at {P} meets {C} at another point {R}. In the case of the Mordell curve {C = \{ (x,y): y^2 - x^3 = k \}}, the double {2(x,y) = (x_2,y_2)} of a point {(x,y) \in C} is given by the formulae

    \displaystyle  \lambda := \frac{3x^2}{2y} \ \ \ \ \ (5)

    \displaystyle  x_2 := \lambda^2 - 2x

    \displaystyle  y_2 := \lambda (x-x_2)-y

    as long as {y \neq 0}. (For {y=0}, the double will be the point at infinity.) Geometrically, {\lambda} is the slope of the tangent line to {C} at {(x,y)}.
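The doubling formulae can be tried out in exact rational arithmetic. The following is a small sketch (the function name `double_point` is hypothetical) that implements (5) and the two formulae after it, assuming {y \neq 0}.

```python
from fractions import Fraction

def double_point(x, y):
    """Double the point (x, y) on a Mordell curve y^2 - x^3 = k.
    Returns (x2, y2, lam); requires y != 0."""
    x, y = Fraction(x), Fraction(y)
    lam = 3 * x * x / (2 * y)      # tangent slope, formula (5)
    x2 = lam * lam - 2 * x
    y2 = lam * (x - x2) - y
    return x2, y2, lam
```

For example, on {y^2 - x^3 = 17} the point {(-2,3)} has tangent slope {\lambda = 2} and doubles to {(8,-23)}. Doubling {(8,-23)} again gives slope {-96/23}, so the result is rational but no longer integral; this is the phenomenon that makes the reverse operation (dividing by two) delicate.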

    Over the complex numbers, the constraint {y \neq 0} removes three points {(-k^{1/3},0), (-\omega k^{1/3},0), (-\omega^2 k^{1/3},0)} from {C}, where {\omega := e^{2\pi i/3}} is the standard cube root of unity. This thrice punctured curve is no longer an affine plane curve, but one can make it affine again by incorporating {\lambda} as another coordinate, giving rise to the lifted curve

    \displaystyle  C' := \{ (x,y,\lambda): y^2 -x^3 = k; 2y \lambda = 3x^2 \},

    and the doubling map {P \mapsto 2P} lifts to the polynomial map {f: C' \rightarrow C} defined by

    \displaystyle  f( x, y, \lambda ) := ( \lambda^2 - 2x, \lambda(x-(\lambda^2-2x)) - y).

    Over the complex numbers, it is well known that the elliptic curve {C} is isomorphic as a group to the torus {({\bf R}/{\bf Z})^2}, which implies that the map {f} is four-to-one. Now we consider the inversion problem explicitly: given a point {(x_2,y_2) \in C}, find a point {(x,y,\lambda) \in C'} such that

    \displaystyle  x_2 = \lambda^2 - 2x

    and

    \displaystyle  y_2 = \lambda (x-x_2)-y.

    We would also like to avoid using division as much as possible, since in our application {(x_2,y_2)} will be an integer point, and we would like {x,y,\lambda} to also be “close” to being integers too (in a sense to be made precise later). We can rearrange the above two equations as

    \displaystyle  x = \frac{\lambda^2-x_2}{2} \ \ \ \ \ (6)

    and

    \displaystyle  y = \lambda(x-x_2)-y_2 \ \ \ \ \ (7)

    and so we see that to find {(x,y,\lambda)}, it suffices to find {\lambda}, and furthermore if {x_2,y_2,\lambda} are “close” to integers, then {x,y} are close to integers also (up to a single division by two). One can also check from a routine computation that if (6), (7), (5) hold, then we have {y^2-x^3=y_2^2 - x_2^3}, so that {(x,y,\lambda)} indeed lies in {C'}.

    If one inserts (6), (7) into (5), one arrives at the quartic equation

    \displaystyle  \lambda^4 - 6 x_2 \lambda^2 - 8 \lambda y_2 - 3 x_2^2 = 0. \ \ \ \ \ (8)
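For the reader who wants the substitution spelled out, here is one way to organise the routine expansion, using (5) in the cleared form {2y\lambda = 3x^2}:

```latex
\begin{aligned}
x - x_2 &= \frac{\lambda^2 - 3x_2}{2} && \text{by (6)}, \\
2y\lambda &= \lambda^2(\lambda^2 - 3x_2) - 2\lambda y_2 && \text{by (7)}, \\
3x^2 &= \tfrac{3}{4}(\lambda^2 - x_2)^2 && \text{by (6)}, \\
0 &= 4\,(2y\lambda - 3x^2) = \lambda^4 - 6x_2\lambda^2 - 8\lambda y_2 - 3x_2^2 && \text{by (5)}.
\end{aligned}
```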

    One could solve this equation by the general quartic formula, but actually the formula simplifies substantially for this specific quartic equation, and can be obtained as follows. The Mordell equation {y_2^2 - x_2^3 = k} can be rewritten as

    \displaystyle  y_2^2 = (x_2 + k^{1/3}) (x_2 + \omega k^{1/3}) (x_2 + \omega^2 k^{1/3}).

    We can take square roots and then write

    \displaystyle  y_2 = r s t

    where

    \displaystyle  r^2 = x_2 + k^{1/3}

    \displaystyle  s^2 = x_2 + \omega k^{1/3}

    \displaystyle  t^2 = x_2 + \omega^2 k^{1/3}.

    As each {r,s,t} is determined up to sign, and {rst} is fixed, there are exactly four choices for {(r,s,t)} here (even when {y_2} vanishes). For any such choice, if one sets

    \displaystyle  \lambda := r+s+t, \ \ \ \ \ (9)

    then on squaring and rearranging we have

    \displaystyle  \lambda^2 - 3x_2 = 2(rs + st + tr)

    and on squaring again

    \displaystyle  \lambda^4 - 6x_2 \lambda^2 + 9 x_2^2 = 4(rs + st + tr)^2;

    since

    \displaystyle  (rs + st + tr)^2 = 3x_2^2 + 2 rst (r+s+t)

    one soon sees that (8) holds. Thus we have found the four solutions {(x,y,\lambda)} to the equation {f(x,y,\lambda) = (x_2,y_2)} by choosing the square roots {r,s,t} of {x_2+k^{1/3}, x_2+\omega k^{1/3}, x_2+\omega^2 k^{1/3}}, and then defining {\lambda} by (9) and {x,y} by (6), (7). In particular, if {x_2,y_2} are (rational) integers, then {r,s,t,\lambda} are algebraic integers, and {x,y} are algebraic integers divided by two.
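The recipe of this section can be tested numerically over the complex numbers. The sketch below uses hypothetical function names and floating point arithmetic, and assumes {y_2 \neq 0} so that {t} can be recovered as {y_2/(rs)}. It computes the four values of {\lambda} from (9), evaluates the left-hand side of (8), and recovers {(x,y)} from (6), (7).

```python
import cmath

def halving_slopes(x2, y2, k):
    """Return the four values r + s + t of (9) for a point (x2, y2)
    with y2^2 - x2^3 = k and y2 != 0."""
    omega = cmath.exp(2j * cmath.pi / 3)      # cube root of unity from the text
    c = complex(k) ** (1.0 / 3.0)             # one fixed complex cube root of k
    r0 = cmath.sqrt(x2 + c)                   # a square root of x2 + k^(1/3)
    s0 = cmath.sqrt(x2 + omega * c)           # a square root of x2 + omega k^(1/3)
    slopes = []
    for sign_r in (1, -1):
        for sign_s in (1, -1):
            r, s = sign_r * r0, sign_s * s0
            t = y2 / (r * s)                  # forces rst = y2
            slopes.append(r + s + t)
    return slopes

def quartic_residual(lam, x2, y2):
    """Left-hand side of (8); should vanish up to rounding at each slope."""
    return lam ** 4 - 6 * x2 * lam ** 2 - 8 * lam * y2 - 3 * x2 ** 2

def lift(lam, x2, y2):
    """Recover (x, y) from lambda via (6) and (7)."""
    x = (lam * lam - x2) / 2
    y = lam * (x - x2) - y2
    return x, y
```

For {(x_2,y_2) = (8,-23)} on {y^2 - x^3 = 17}, one of the four slopes is (numerically) {\lambda = 2}, and `lift` then returns the point {(-2,3)}, whose double is indeed {(8,-23)}.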

    — 2. The Chevalley-Weil principle —

    The Chevalley-Weil principle asserts, roughly speaking, that if one has a “finite cover” of one affine variety {V} by another {W}, with the varieties and covering maps defined over the rationals, then every integer point in {V} lifts to a “near”-integer point of {W}. (There is also a version for projective varieties, which I will not discuss here; one can also generalise integers and rationals to other number fields.) To make the notion of “finite cover” precise, one needs the notion of an etale morphism; to make the notion of “near”-integer precise, we can use the notion of almost algebraic integer. We will not formalise the principle in full generality here; see for instance this text of Lang, this text of Hindry-Silverman, this text of Serre, or this text of Bombieri and Gubler, for a precise statement.

    Here, we will focus on simple special cases of the Chevalley-Weil principle, in which one can work “by hand” using quite classical methods from algebraic number theory. We begin with a very elementary example, which in some sense goes all the way back to Diophantus himself:

    Theorem 11 (Baby case of Chevalley-Weil) Let {k} be a non-zero integer, and let {(x,y) \in {\bf Z}^2} be such that {y^2 = x(x+k)}. Then {x,x+k} are “nearly perfect squares” in the sense that one has {x = au^2}, {x+k = bv^2} for some integers {u,v,a,b} with {a,b} dividing {k}.

    To interpret this result as a special case of the Chevalley-Weil principle, we take {V} to be the affine plane curve {\{ (x,y): y^2 = x(x+k)\}}, {W} to be the affine curve {\{ (x,r,s): r^2 = x; s^2 = x+k \}}, with covering map {(x,r,s) \mapsto (x,rs)}; thus {W} is a double cover of {V}, and the theorem is asserting that integer points in {V} are covered by near-integer points in {W} (up to square roots of factors of {k}).

    By transfer, the above theorem also applies in the setting where {x,y} are non-standard integers and {k} remains standard; in that case, {u,v} now become nonstandard also, but {a,b} remain standard. In particular, {\pm x^{1/2} = \pm a^{1/2} u} and {\pm (x+k)^{1/2} = \pm b^{1/2} v} become almost algebraic integers.

    Proof: We can assume that {y}, and hence {x} and {x+k}, are non-zero, since the {y=0} case is easily verified. We use the fundamental theorem of arithmetic to factor {y} into primes {y = \pm p_1^{c_1} \dots p_r^{c_r}} for some distinct primes {p_1,\dots,p_r} and positive natural numbers {c_1,\dots,c_r}, thus

    \displaystyle  x(x+k) = p_1^{2c_1} \dots p_r^{2c_r}.

    On the other hand, the greatest common divisor of {x} and {x+k} is a divisor of {k}. From this, we see that for each prime {p_i} not dividing {k}, {p_i} will divide one of {x,x+k} exactly {2c_i} times, and not divide the other, whereas for primes {p_i} dividing {k}, {p_i} will either divide each of {x,x+k} an even number of times, or an odd number of times. Collecting terms (including the units {\pm 1} in the prime factorisations of {x,x+k}) to form {a,b,u,v}, the claim follows. \Box

    Remark 1 One can reformulate the above argument by using the {p}-adic valuations {||_p} instead of the fundamental theorem of arithmetic. That formulation is more convenient when proving the Chevalley-Weil theorem in full generality.
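Theorem 11 is easy to test by brute force. In the sketch below (hypothetical function names), {a} and {b} are taken to be the signed squarefree parts of {x} and {x+k}; by the argument in the proof, only primes dividing {k} can occur to an odd power, so these squarefree parts divide {k}. The degenerate points with {y=0} are skipped.

```python
import math

def squarefree_part(n):
    """Return the signed squarefree a with n = a * u^2, for a non-zero integer n."""
    a = 1 if n > 0 else -1
    m, p = abs(n), 2
    while p * p <= m:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        if e % 2 == 1:
            a *= p
        p += 1
    return a * m    # the leftover m is 1 or a single prime

def nearly_square_data(k, x_max):
    """For each integer point on y^2 = x(x+k) with y != 0 and |x| <= x_max,
    return the triple (x, a, b) with x = a*u^2 and x + k = b*v^2."""
    found = []
    for x in range(-x_max, x_max + 1):
        prod = x * (x + k)
        if prod <= 0:
            continue
        y = math.isqrt(prod)
        if y * y == prod:
            found.append((x, squarefree_part(x), squarefree_part(x + k)))
    return found
```

For example, with {k=24} the point {x=3} (where {3 \cdot 27 = 9^2}) yields {a=b=3}, and {3} divides {24} as the theorem predicts.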

    Now we give a variant of the above theorem which is of relevance for our application. It will be convenient to phrase the variant in the nonstandard setting.

    Theorem 12 (Toddler case of Chevalley-Weil) Let {k} be a non-zero standard algebraic number, and let {(x,y)\in (\bar{{\bf Q}} {}^* {\bf Z})^2} be a pair of almost algebraic integers such that {y^2-x^3=k}. Then we can write {x+k^{1/3} = r^2}, {x + \omega k^{1/3} = s^2}, {x + \omega^2 k^{1/3} = t^2}, where {r,s,t} are almost algebraic integers.

    This is a case of the Chevalley-Weil principle with {V = \{ (x,y): y^2 - x^3 = k\}}, {W = \{ (x,r,s,t): r^2 = x+k^{1/3}, s^2 = x+\omega k^{1/3}, t^2 = x +\omega^2 k^{1/3} \}}, and covering map {(x,r,s,t) \mapsto (x,rst)}. The claim is then that any pre-image of an almost algebraic integer point is again an almost algebraic integer point.

    Proof: We use essentially the same argument as the previous theorem, but in the language of ideals rather than numbers. The case {y=0} is easy, so we assume that {y \neq 0}. By choosing a suitably large standard number field {K}, we may assume that {Mx, My, Mk^{1/3}, M\omega k^{1/3}, M \omega^2 k^{1/3}} are nonstandard algebraic integers in {{}^* {\mathcal O}_K} for some standard {M \geq 1}. As is well known, the ring of algebraic integers {{\mathcal O}_K} is a Dedekind domain, so that one has unique factorisation of ideals. In particular, the nonstandard principal ideal {(My)}, which is an ideal of {{}^* {\mathcal O}_K}, factorises as a nonstandard product

    \displaystyle  (My) = P_1^{c_1} \dots P_r^{c_r}

    for some distinct nonstandard prime ideals {P_1,\dots,P_r} (not necessarily principal), a nonstandard natural number {r}, and positive nonstandard natural numbers {c_1,\dots,c_r}. Since {y^2 = (x+k^{1/3}) (x+\omega k^{1/3}) (x+\omega^2 k^{1/3})}, we conclude in particular that

    \displaystyle  (Mx+Mk^{1/3}) (Mx+M\omega k^{1/3}) (Mx+M\omega^2 k^{1/3}) = (M) P_1^{2c_1} \dots P_r^{2c_r}.

    If (say) {(Mx+Mk^{1/3})} and {(Mx+M\omega k^{1/3})} are both divisible by a prime ideal {P_i}, then {(Mk^{1/3} - M\omega k^{1/3})} is also; but the latter ideal is standard, and so such {P_i} are standard, and furthermore the number of such {P_i} is bounded. Similarly for other pairs from {(Mx+Mk^{1/3}), (Mx+M\omega k^{1/3}), (Mx+M\omega^2 k^{1/3})}. Thus for all other {P_i}, the {P_i} divide exactly one of {(x+k^{1/3}), (x+\omega k^{1/3}), (x+\omega^2 k^{1/3})} an even number of times, and do not divide the other two ideals. This implies a factorisation of the form

    \displaystyle  (Mx+Mk^{1/3}) = A U^2;

    \displaystyle  (Mx+M\omega k^{1/3}) = B V^2;

    \displaystyle  (Mx+M\omega^2 k^{1/3}) = C W^2

    where {U,V,W} are nonstandard ideals and {A,B,C} are standard ideals. From algebraic number theory, the class group of {K} is finite, which implies that any nonstandard ideal can be expressed as the product of a nonstandard principal ideal and a standard fractional ideal. Thus we may write

    \displaystyle  (x+k^{1/3}) = A' (u)^2; (x+\omega k^{1/3}) = B' (v)^2; \quad (x+\omega^2 k^{1/3}) = C' (w)^2

    where {A',B',C'} are now standard fractional ideals. However, the ratio of two principal ideals is a principal fractional ideal, and thus {A' = (a')}, {B' = (b')}, {C' = (c')} for some standard {a',b',c' \in K}. In other words, we have

    \displaystyle  x+k^{1/3} = e a' u^2; x+\omega k^{1/3} = fb' v^2; \quad x+\omega^2 k^{1/3} = gc' w^2

    for some nonstandard units {e,f,g}. But from

    \begin

    Let {k} be a standard non-zero algebraic number, and let {C'} be the affine curve

    \displaystyle  C' := \{ (x,y,\lambda): y^2 -x^3 = k; 2y \lambda = 3x^2 \},

    Then {C'} does not contain any strictly nonstandard almost algebraic integer point {(x,y,\lambda)}.

    Indeed, Theorem 12 and the calculations from the previous section reveal that if the curve {\{ (x,y): y^2 - x^3 = k \}} contains a strictly nonstandard almost algebraic integer point, then the lifted curve {C'} does also.

    The argument from Theorem 7 can be adapted without much difficulty to show that {C'} does not contain any strictly nonstandard point in {{}^* {\bf Z}^3}. However to work with almost algebraic integers instead of nonstandard integers, we need to extend the subspace theorem to this setting, a topic to which we now turn.

    — 3. The subspace theorem in number fields —

    If {x \in {\bf Q} {}^* {\bf Z}} is a nonstandard almost rational integer which is non-zero, then one clearly has {|x| \gg 1}; this is (up to the {o(1)} exponent) the {d=1} case of the subspace theorem, Theorem 4.

    Now suppose that {x \in K {}^* {\bf Z}} is a non-zero nonstandard almost algebraic integer for some standard number field {K}. Then it is no longer necessarily the case that {|x| \gg 1}; for instance, this would imply that

    \displaystyle  | p - \sqrt{2} q| \gg 1

    for any nonstandard integers {p,q}, which contradicts the Dirichlet approximation theorem. However, if {K} is a Galois extension of {{\bf Q}}, and {G = \hbox{Gal}(K/{\bf Q})} is the Galois group, then it is still true that

    \displaystyle  \prod_{\sigma \in G} |\sigma(x)| \gg 1

    since {\prod_{\sigma \in G} \sigma(x)} is a non-zero element of {{\bf Q} {}^* {\bf Z}}.
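The contrast between a single absolute value and the product over the Galois group is already visible in {K = {\bf Q}(\sqrt{2})}. The sketch below (hypothetical function names) produces pairs {(p,q)} for which {|p - \sqrt{2} q|} tends to zero, while the exact product of the two conjugates, {p^2 - 2q^2}, stays equal to {\pm 1}.

```python
import math

def small_elements(count):
    """Return pairs (p, q) with p^2 - 2*q^2 = +-1, generated by the
    recurrence (p, q) -> (p + 2q, p + q) starting from (1, 1)."""
    pairs = []
    p, q = 1, 1
    for _ in range(count):
        pairs.append((p, q))
        p, q = p + 2 * q, p + q
    return pairs

def conjugate_product(p, q):
    """Exact value of (p - sqrt(2) q)(p + sqrt(2) q), the product over Gal(Q(sqrt 2)/Q)."""
    return p * p - 2 * q * q

def single_absolute_value(p, q):
    """Floating point value of |p - sqrt(2) q| under one fixed embedding."""
    return abs(p - math.sqrt(2.0) * q)
```

Along this sequence `single_absolute_value` decays roughly like {1/(2\sqrt{2} q)}, so no lower bound of the form {|x| \gg 1} can hold for one embedding alone.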

    In a similar spirit, we claim the following generalisation of the subspace theorem to number fields. If {x} is an algebraic number of degree {D}, we write

    \displaystyle  \| x \|_\infty := (|x^{(1)}| \dots |x^{(D)}|)^{1/D}

    for the norm at {\infty}, where {x^{(1)},\dots,x^{(D)}} are the Galois conjugates of {x}, and for {(x_1,\dots,x_d) \in \bar{{\bf Q}}^d}, we write

    \displaystyle  \| (x_1,\dots,x_d) \|_\infty := \max( \|x_1\|_\infty, \dots, \|x_d\|_\infty ).

    Clearly this norm is Galois-invariant.

    Theorem 13 (Schmidt subspace theorem for number fields) Let {d} be a standard natural number, and let {K} be a standard Galois extension of {{\bf Q}}, with Galois group {G := \hbox{Gal}(K/{\bf Q})}. For each {\sigma \in G}, let {L_{\sigma,1},\dots,L_{\sigma,d}: \bar{{\bf Q}}^d \rightarrow \bar{{\bf Q}}} be linearly independent standard linear forms. Let {x \in (K {}^* {\bf Z})^d} be in {K}-general position. Then one has

    \displaystyle  \prod_{\sigma \in G} \prod_{i=1}^d |\sigma( L_{\sigma,i}(x))| \gg \|x\|_\infty^{-o(1)},

    where {\sigma} acts on {K {}^* {\bf Z}} (and hence on {(K {}^* {\bf Z})^d}) in the obvious fashion.

    Exercise 4 State a logically equivalent standard form of Theorem 13 (analogous to Theorem 1), and prove the logical equivalence.

    Theorem 13 (in the logically equivalent standard form) was proven by Schlickewei (who also handled more general valuations than the complex absolute value {z \mapsto |z|}, and also allowed the coefficients of {x} to be {S}-integers rather than algebraic integers). However, Theorem 13 can also be deduced by using Theorem 1 as a black box, as we shall shortly demonstrate.

    Let us see how Theorem 13 implies Theorem 2.

    Proof: (of Theorem 2 assuming Theorem 13). This is a modification of the argument of Theorem 7. Suppose for contradiction that we can find a non-zero algebraic number {k} such that the curve

    \displaystyle  C' := \{ (x,y,\lambda): y^2 -x^3 = k; 2y \lambda = 3x^2 \}

    contains a strictly nonstandard almost algebraic integer point {(x_*,y_*,\lambda_*)}. We may place {x_*,y_*,\lambda_*} inside {K {}^* {\bf Z}} for some number field {K}, which we may take to be Galois by enlarging {K} as necessary. Let {G = \hbox{Gal}(K/{\bf Q})} be the Galois group.

    For each {\sigma \in G}, the point {\sigma(x_*,y_*,\lambda_*)} is either bounded (i.e., {|\sigma(x_*)|, |\sigma(y_*)|, |\sigma(\lambda_*)| \ll 1}) or unbounded. If {\sigma(x_*,y_*,\lambda_*)} is bounded for all {\sigma \in G} then the coefficients of {x_*,y_*,\lambda_* \in K {}^* {\bf Z}} are all bounded, so that {(x_*,y_*,\lambda_*)} is standard, a contradiction. Thus {\sigma(x_*,y_*,\lambda_*)} is unbounded for at least one {\sigma \in G}.

    Let {D} be a large integer to be chosen later (actually {D=3} will do here), and consider the vector space {V} of polynomials {R(x,y,\lambda) \in \bar{{\bf Q}}[x,y,\lambda]} spanned by the monomials

    \displaystyle  x^i y^j \hbox{ for } j=0,\dots,D \hbox{ and } i=0,1,2

    and

    \displaystyle  x^i \lambda^j \hbox{ for } j=1,\dots,3D \hbox{ and } i=0,1,2.

    This space thus has dimension {3(4D+1)}. We claim that none of the non-zero polynomials {R(x,y,\lambda)} in {V} vanish identically on {C'}. To see this, we divide into two cases. If {R} contains at least one monomial with a non-trivial power of {\lambda}, then we can write {R(x,y,\lambda) = \lambda^j P(x) + \dots} for some {1 \leq j \leq 3D}, where {P} is a non-zero polynomial in {x} of degree at most {2} and {\dots} has degree less than {j} in {\lambda}. By inspecting the limits {x \rightarrow - k^{1/3}, -\omega k^{1/3}, -\omega^2 k^{1/3}}, {y \rightarrow 0}, {\lambda\rightarrow \infty} of {C'}, we see that if {R} vanished on {C'}, then the non-zero polynomial {P} of degree at most {2} would have to vanish at all three points {x = -k^{1/3}, -\omega k^{1/3}, -\omega^2 k^{1/3}}, a contradiction. Finally, if {R} has no terms involving {\lambda}, then it is a polynomial in {x,y} of degree at most {2} in {x}, and thus not divisible by {y^2-x^3-k}, and the claim now follows from Bezout’s theorem.
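
    The dimension count {3(4D+1)} can be checked by simply enumerating the spanning monomials. The following is a small sketch; the helper name `monomial_basis` and the encoding of {x^i y^j \lambda^l} as an exponent triple are illustrative choices, not notation from the post.

```python
def monomial_basis(D):
    """Exponent triples (i, j, l) encoding the monomials x^i y^j lambda^l that span V."""
    # x^i y^j for j = 0, ..., D and i = 0, 1, 2
    in_y = [(i, j, 0) for i in range(3) for j in range(D + 1)]
    # x^i lambda^j for j = 1, ..., 3D and i = 0, 1, 2
    in_lambda = [(i, 0, j) for i in range(3) for j in range(1, 3 * D + 1)]
    return in_y + in_lambda

if __name__ == "__main__":
    for D in range(1, 6):
        basis = monomial_basis(D)
        # The monomials are distinct, and there are 3(D+1) + 9D = 3(4D+1) of them.
        assert len(set(basis)) == len(basis) == 3 * (4 * D + 1)
        print(D, len(basis))
```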

    We claim that for every {\sigma \in G}, we may find a basis {R_{\sigma,1},\dots,R_{\sigma,3(4D+1)}} for {V} with the property that

    \displaystyle  \prod_{i=1}^{3(4D+1)} |\sigma( R_{\sigma,i}( x_*, y_*, \lambda_* ))| \ll 1,

    with the improvement

    \displaystyle  \prod_{i=1}^{3(4D+1)} |\sigma( R_{\sigma,i}( x_*, y_*, \lambda_* ))| \ll \| (x_*,y_*,\lambda_*) \|^{-\epsilon} \ \ \ \ \ (10)

    for some standard {\epsilon > 0} if {\sigma(x_*,y_*,\lambda_*)} is unbounded (which, as previously observed, must hold at least once). If this claim held, then on multiplying these bounds together and applying Theorem 13, we conclude that the tuple

    \displaystyle  ( ( x_*^i y_*^j )_{j=0,\dots,D; i=0,1,2}, ( x_*^i \lambda_*^j )_{j=1,\dots,3D; i=0,1,2} )

    is not in {K}-general position, which implies that {R(x_*,y_*,\lambda_*)=0} for some non-zero {R \in V}. But {R} does not vanish identically on the irreducible curve {C'}, so by Bezout’s theorem {R} can only vanish at standard points of {C'}, a contradiction.

    It remains to prove the claim. To simplify the notation we just handle the case when {\sigma} is the identity {1}, as the other cases are treated similarly. If {(x_*, y_*, \lambda_* )} is bounded, then the claim is trivial (taking {R_{1,i}} to be an arbitrary basis of {V}, e.g. the monomial basis), so assume that {(x_*,y_*,\lambda_*)} is unbounded. From the geometry of {C'}, we see that this can only occur if either {y_*} is unbounded and {\lambda_*} is infinitesimally close to zero, or else if {\lambda_*} is unbounded, {y_*} is infinitesimally close to zero, and {x_*} is infinitesimally close to one of {-k^{1/3}}, {-\omega k^{1/3}}, or {-\omega^2 k^{1/3}}.

    First suppose that {y_*} is unbounded. Then as {y \rightarrow \infty}, one can perform a Puiseux series expansion of any polynomial in {V} in powers of {y^{1/3}}, with the highest power being {y^{D + 2/3}}. As such, we may find a basis {R_1,\dots,R_{3(4D+1)}} of {V} for which

    \displaystyle  |R_i( x_*,y_*,\lambda_*)| \ll |y_*|^{D+\frac{2-i}{3}}

    for {i=1,\dots,3(4D+1)}, which gives (10) for {D} large enough.

    Now suppose that it is {\lambda_*} which is unbounded. Then as {\lambda \rightarrow \infty}, {y \rightarrow 0}, and {x} converges to one of {-k^{1/3}, -\omega k^{1/3}, -\omega^2 k^{1/3}}, one can perform a Laurent series expansion of any polynomial in {V} in powers of {\lambda}, with the highest power being {\lambda^{3D}}. Thus we may find a basis {R_1,\dots,R_{3(4D+1)}} of {V} for which

    \displaystyle  |R_i( x_*,y_*,\lambda_*)| \ll |\lambda_*|^{3D+1-i}

    for {i=1,\dots,3(4D+1)}, and again (10) follows for {D} large enough. This concludes the proof of Theorem 2. \Box
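
    To see why multiplying the basis bounds gives a negative power, one can sum the exponents exactly. The sketch below takes the exponents {D + \frac{2-i}{3}} and {3D+1-i} from the two displayed bounds at face value and adds them over {i=1,\dots,3(4D+1)}; the function names are illustrative only. With {N = 3(4D+1)}, the totals come out as {-DN} and {-(3D+1)N} respectively, so the product of the bounds is a negative power of {|y_*|} or {|\lambda_*|}.

```python
from fractions import Fraction

def basis_size(D):
    """Dimension 3(4D+1) of the space V."""
    return 3 * (4 * D + 1)

def exponent_sum_y(D):
    """Sum of D + (2 - i)/3 over i = 1, ..., 3(4D+1) (case of unbounded y)."""
    return sum(Fraction(D) + Fraction(2 - i, 3) for i in range(1, basis_size(D) + 1))

def exponent_sum_lambda(D):
    """Sum of 3D + 1 - i over i = 1, ..., 3(4D+1) (case of unbounded lambda)."""
    return sum(3 * D + 1 - i for i in range(1, basis_size(D) + 1))

if __name__ == "__main__":
    for D in range(1, 5):
        print(D, exponent_sum_y(D), exponent_sum_lambda(D))
```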

    — 4. Amplifying the subspace theorem —

    Now we show how Theorem 4 may be amplified to prove Theorem 13. Our arguments will use very little number theory, relying primarily on linear algebra and the nonstandard analysis framework. (As always, one can reformulate these arguments in the standard setting, but the linear algebra arguments become quite complicated, requiring the use of many rank reduction arguments, as per Section 2 of this previous blog post.)

    Let {d, K, G, x} be as in Theorem 13. The space {K} is a {|G|}-dimensional vector space over {{\bf Q}}, and so we may (non-canonically) identify {K} with {{\bf Q}^{|G|}}, and {(K {}^* {\bf Z})^d} with {({\bf Q} {}^* {\bf Z})^{|G| d}}. In particular, {x} can be viewed as a {|G|d}-tuple of almost rational integers. It would be helpful if we knew that this tuple was in {{\bf Q}}-general position, as one could then apply Theorem 4 directly to conclude. However, our hypothesis is only the weaker assertion that {x} is in {K}-general position. Nevertheless, we may place {x} in a certain “row echelon form” in terms of {{\bf Q}}-general position parameters, and we will then be able to deduce Theorem 13 from averaging together various applications of Theorem 4, with the precise applications to use being uncovered through linear algebra.

    We turn to the details. Recall that if {V} is an {n}-dimensional vector space over a field {K}, then a flag in {V} is a nested sequence of spaces

    \displaystyle  \{0\} = V_0 \subset V_1 \subset \dots \subset V_n = V

    such that each {V_i} is a {K}-linear subspace of dimension {i}.

    Theorem 14 (Standard row echelon form) Let {K} be a {D}-dimensional field extension of a field {F}, let {W} be a {d}-dimensional {K}-vector space, and let {V} be an {F}-linear subspace of {W} (viewed as a {Dd}-dimensional vector space over {F}). Suppose also that the {F}-dimension of {\pi(V)} is at least {s} for every non-zero {K}-linear functional {\pi: W \rightarrow K} and some {s \geq 0}. Then one may find a flag

    \displaystyle  \{0\} = W_0 \subset W_1 \subset \dots \subset W_d = W

    of {K}-subspaces of {W}, as well as natural numbers

    \displaystyle  D \geq r_1 \geq \dots \geq r_d \geq s

    such that {(V \cap W_i) / W_{i-1}} has {F}-dimension {r_i} for {i=1,\dots,d}.

    Proof: The case {d=0} is trivial, so assume inductively that {d \geq 1} and that the claim has already been proven for {d-1}. Let {\pi: W \rightarrow K} be a non-zero {K}-linear functional which minimises the {F}-dimension of the space {\pi(V)}; if we denote the dimension of this space by {r_d}, then {s \leq r_d \leq D}. We set {W_d := W}, and let {W_{d-1}} be the kernel of {\pi}; then {W_{d-1}} is a {K}-hyperplane in {W}. We claim that the {F}-dimension of {\pi'(V \cap W_{d-1})} is at least {r_d} for any non-zero {\pi': W_{d-1} \rightarrow K}, for if this were not the case, one could find an extension {\tilde \pi: W \rightarrow K} of {\pi'} to {W} which annihilated a section of {\pi(V)} in {V}, so that {\tilde \pi(V)= \pi'(V \cap W_{d-1})}, contradicting the minimality of {r_d}. We may thus apply the induction hypothesis to find a flag

    \displaystyle  \{0\} = W_0 \subset W_1 \subset \dots \subset W_{d-1}

    and natural numbers

    \displaystyle  D \geq r_1 \geq \dots \geq r_{d-1} \geq r_d

    such that {(V \cap W_i) / W_{i-1}} has {F}-dimension {r_i} for {i=1,\dots,d-1}. The claim follows. \Box

    Corollary 15 Let {K} be a degree {D} extension of {{\bf Q}}, and let {x \in (K {}^* {\bf Z})^d} be in {K}-general position. Then one can find a {{\bf Q}}-subspace {V} of {K^d} such that {x} is an element of the {{}^*{\bf Z}}-module {{}^* {\bf Z} V = \{ n_1 v_1 + \dots + n_l v_l: n_1,\dots,n_l \in {\bf Q} {}^* {\bf Z} \}}, where {v_1,\dots,v_l} is an arbitrary {{\bf Q}}-basis of {V}. Furthermore, {x} is a {{\bf Q}}-generic point of {V}, in the sense that for any {{\bf Q}}-basis {v_1,\dots,v_l} of {V}, one has {x = n_1 v_1 + \dots + n_l v_l} with {(n_1,\dots,n_l)} in {{\bf Q}}-general position. (Equivalently, {x} does not lie in {{}^*{\bf Z} W} for any proper {{\bf Q}}-subspace {W} of {V}.)

    Furthermore, one may find linearly independent {K}-linear forms {L_1,\dots,L_d: K^d \rightarrow K} and natural numbers

    \displaystyle  D \geq r_1 \geq \dots \geq r_d \geq 1

    such that for each {1 \leq i \leq d}, the space

    \displaystyle  V'_i := L_i( V \cap \hbox{ker}(L_{i+1}) \cap \dots \cap \hbox{ker}(L_d) ) \ \ \ \ \ (11)

    (which is a {{\bf Q}}-subspace of {K}) has {{\bf Q}}-dimension {r_i}.

    This corollary is a sort of “regularity lemma” for points in {{}^* {\bf Z} V}; this is perhaps the component of the argument which is the most difficult to convert into a standard setting.

    Proof: Let {V} denote the intersection of all the {{\bf Q}}-subspaces {U} of {K^d} such that {x \in {}^* {\bf Z} U}. Then {V} is itself a {{\bf Q}}-subspace of {K^d} with {x \in {}^* {\bf Z} V}; one can think of {V} as the {{\bf Q}}-linear analogue of a “Zariski closure” of {x}. If {v_1,\dots,v_l} is a basis of {V}, then we have {x = n_1 v_1 + \dots + n_l v_l} for some {n_1,\dots,n_l \in {\bf Q} {}^* {\bf Z}}; if {(n_1,\dots,n_l)} is not in {{\bf Q}}-general position, then we have a non-trivial dependence {c_1 n_1 + \dots + c_l n_l = 0} for some {c_1,\dots,c_l \in {\bf Q}}, not all zero, and we may use this to place {x} in {{}^* {\bf Z} U} for some proper {{\bf Q}}-subspace {U} of {V}, a contradiction. Thus {x} is a {{\bf Q}}-generic point in {{}^* {\bf Z} V}.

    Since {x} is in {K}-general position, {\pi(V)} is non-trivial for any {K}-linear form {\pi: K^d \rightarrow K}. From Theorem 14 we may then find a flag

    \displaystyle  \{0\} = W_0 \subset W_1 \subset \dots \subset W_d = K^d

    of {K}-subspaces of {K^d}, as well as natural numbers

    \displaystyle  D \geq r_1 \geq \dots \geq r_d \geq 1

    such that {(V \cap W_i) / W_{i-1}} has {{\bf Q}}-dimension {r_i} for {i=1,\dots,d}. If, for each {i=1,\dots,d}, we let {L_i: K^d \rightarrow K} be a {K}-linear form that annihilates {W_{i-1}} but does not annihilate {W_i}, we obtain the claim. \Box

    Next, we need a classical lemma on intersection of flags.

    Lemma 16 (Schubert cell decomposition) Let

    \displaystyle  \{0\} = V_0 \subset V_1 \subset \dots \subset V_n = V

    and

    \displaystyle  \{0\} = W_0 \subset W_1 \subset \dots \subset W_n = V

    be two flags on the same {n}-dimensional vector space {V} over a field {F}. Then one can find a basis {e_1,\dots,e_n} of {V} and a permutation {\rho: \{1,\dots,n\} \rightarrow \{1,\dots,n\}} such that {e_i \in V_i \cap W_{\rho(i)}} for all {i=1,\dots,n}.

    Proof: For any {1 \leq i \leq n}, the function {j \mapsto \hbox{dim}(V_i \cap W_j) - \hbox{dim}(V_{i-1} \cap W_j)} is non-decreasing in {j}, is equal to {0} when {j=0}, and {1} when {j=n}; thus there exists a unique {1 \leq \rho(i) \leq n} such that this function equals {1} for {j \geq \rho(i)} and {0} for {j < \rho(i)}. By inspecting the quantities {i \mapsto \hbox{dim}(V_i \cap W_j) - \hbox{dim}(V_i \cap W_{j-1})} for a given {j}, which similarly increase from {0} to {1}, we see that there is at most one {i} with {\rho(i)=j}, thus {\rho} is a permutation. If we then choose {e_i} to be an element of {V_i \cap W_{\rho(i)}} that lies outside of {V_{i-1}}, we obtain the claim. \Box
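
    The permutation {\rho} in Lemma 16 can be computed directly from the dimension jumps used in the proof. The following is a minimal sketch over {F = {\bf Q}}, assuming each flag is presented by an adapted basis (so that {V_i} is the span of the first {i} vectors, and similarly for {W_j}); the function names are illustrative and not taken from the post. It uses the identity {\hbox{dim}(A \cap B) = \hbox{dim}(A) + \hbox{dim}(B) - \hbox{dim}(A+B)}.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of rational vectors, by Gaussian elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            factor = rows[k][col] / rows[r][col]
            rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r

def dim_intersection(A, B):
    """dim(span A intersect span B) = rank A + rank B - rank of the combined list."""
    return rank(A) + rank(B) - rank(A + B)

def schubert_permutation(V_basis, W_basis):
    """With V_i = span(V_basis[:i]) and W_j = span(W_basis[:j]), return [rho(1), ..., rho(n)]."""
    n = len(V_basis)
    rho = []
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            jump = (dim_intersection(V_basis[:i], W_basis[:j])
                    - dim_intersection(V_basis[:i - 1], W_basis[:j]))
            if jump == 1:  # first j at which V_i meets W_j in more than V_{i-1} does
                rho.append(j)
                break
    return rho

if __name__ == "__main__":
    e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(schubert_permutation(e, e))        # identical flags
    print(schubert_permutation(e, e[::-1]))  # opposite flags
```

    For identical flags the jump occurs at {j = i}, and for opposite flags at {j = n+1-i}, which are the two extreme Schubert cells.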

    Let {d, K, G, L_{\sigma,i}, x} be as in Theorem 13, and let {L_1,\dots,L_d} and {r_1,\dots,r_d} be as in Corollary 15, with {D=|G|} being the degree of {K}. By permuting the {L_{\sigma,i}}, we may assume that {|\sigma(L_{\sigma,i}(x))|} is non-decreasing in {i}.

    For each {\sigma \in G}, we apply Lemma 16 to the flags

    \displaystyle  \{0\} \subset \hbox{span}(L_d) \subset \hbox{span}(L_d,L_{d-1}) \subset \dots \subset (\bar{{\bf Q}}^d)^*

    and

    \displaystyle  \{0\} \subset \hbox{span}(L_{\sigma,1}) \subset \hbox{span}(L_{\sigma,1},L_{\sigma,2}) \subset \dots \subset (\bar{{\bf Q}}^d)^*

    (with the span being over {\bar{{\bf Q}}}, and with the {L_i} extended from {K}-linear forms to {\bar{{\bf Q}}}-linear forms). We may then find a {\bar{{\bf Q}}}-basis {L'_{\sigma,1},\dots,L'_{\sigma,d}} of the {\bar{{\bf Q}}}-linear forms {(\bar{{\bf Q}}^d)^*} on {\bar{{\bf Q}}^d} and a permutation {\rho_\sigma: \{1,\dots,d\} \rightarrow \{1,\dots,d\}} such that for each {1 \leq i \leq d}, {L'_{\sigma,i}} is a {\bar{{\bf Q}}}-linear combination of {L_d,\ldots,L_{d+1-i}}, and is also a {\bar{{\bf Q}}}-linear combination of {L_{\sigma,1},\dots,L_{\sigma,\rho_\sigma(i)}}. From the latter claim, the non-decreasing nature of {|\sigma(L_{\sigma,i}(x))|}, and the triangle inequality we have

    \displaystyle  \prod_{i=1}^d |L'_{\sigma,i}(\sigma(x))| \ll \prod_{i=1}^d |\sigma(L_{\sigma,i}(x))|.

    It will now suffice to show that

    \displaystyle  \prod_{\sigma \in G} \prod_{i=1}^d |\sigma(L'_{\sigma,i}(x))| \gg \|x\|_\infty^{-o(1)}. \ \ \ \ \ (12)

    We will prove the inequalities

    \displaystyle  \prod_{\sigma \in G} \prod_{i=1}^j |\sigma(L'_{\sigma,i}(x))|^{r_{d+1-i}/|G|} \gg \|x\|_\infty^{-o(1)}; \ \ \ \ \ (13)

    for all {1 \leq j \leq d}; since the standard quantities {|G|/r_{d+1-i}} are finite and non-increasing in {i}, the claim (12) follows from taking a telescoping product of powers of (13).

    To prove (13), we need a classical lemma concerning the canonical embedding of {K} into {K^G} via the Galois group action.

    Lemma 17 The space {\{ (\sigma(x))_{\sigma \in G}: x \in K \}} spans {K^G} as a {K}-linear space.

    Proof: We give a nonstandard analysis proof. If this were not the case, then we would have coefficients {c_\sigma \in K} for {\sigma \in G}, not all zero, such that

    \displaystyle  \sum_{\sigma \in G} c_\sigma \sigma(x) = 0

    for all {x \in K}, and hence also for all {x \in K {}^* {\bf Z}}. Now let {\delta_\sigma > 0} be nonstandard real numbers for {\sigma \in G} such that {\prod_{\sigma \in G} \delta_\sigma = 1}, and such that no two {\delta_\sigma} are comparable. From the Dirichlet approximation theorem argument given at the start of this post, we can find a non-zero {x \in K {}^* {\bf Z}} such that {|\sigma(x)| \ll \delta_\sigma}. On the other hand, the norm {\prod_{\sigma \in G} \sigma(x)} is a non-zero rational integer and so {\prod_{\sigma \in G} |\sigma(x)| \gg 1}. Combining the two estimates, we conclude that {|\sigma(x)| \sim \delta_\sigma} for all {\sigma \in G}. But as the {\delta_\sigma} are incomparable, this forces the {\sigma(x)} to be linearly independent over {K}, a contradiction. \Box

    Corollary 18 If {V} is a {{\bf Q}}-subspace of {K} of dimension {r}, then there exist distinct elements {\sigma_1,\dots,\sigma_r} of {G} such that the {{\bf Q}}-linear functions {x \mapsto \sigma_i(x)} for {i=1,\dots,r} are {{\bf Q}}-linearly independent on {V}.

    Proof: Let {e_1,\dots,e_r} be a {{\bf Q}}-basis for {V}, which we complete to a {{\bf Q}}-basis {e_1,\dots,e_D} for {K}. By Lemma 17, the vectors {(\sigma(e_i))_{\sigma \in G}} for {i=1,\dots,D} span {K^G} as a {K}-linear space, and are thus {K}-linearly independent. In particular, the matrix {(\sigma(e_i))_{\sigma \in G; 1 \leq i \leq r}} has full rank, and so has a minor {(\sigma_j(e_i))_{1 \leq i,j \leq r}} which is non-singular, giving the claim. \Box
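
    As a concrete instance of Lemma 17 and the minor used above, one can take {K = {\bf Q}(\sqrt{2})} with {G} consisting of the identity and the conjugation {\sqrt{2} \mapsto -\sqrt{2}}. The sketch below, with an illustrative helper class `QSqrt2` that is not part of the post, computes the determinant of the {2 \times 2} matrix {(\sigma(e_i))} exactly; it is non-zero for a {{\bf Q}}-basis such as {1, \sqrt{2}} and vanishes for a {{\bf Q}}-dependent pair such as {1, 3}.

```python
from fractions import Fraction

class QSqrt2:
    """Element a + b*sqrt(2) of K = Q(sqrt(2)), with rational a and b."""
    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)
    def __mul__(self, other):
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)
    def __sub__(self, other):
        return QSqrt2(self.a - other.a, self.b - other.b)
    def conjugate(self):
        """The non-trivial Galois automorphism sqrt(2) -> -sqrt(2)."""
        return QSqrt2(self.a, -self.b)
    def is_zero(self):
        return self.a == 0 and self.b == 0

def embedding_determinant(e1, e2):
    """Determinant of the matrix with rows (e_i, conjugate(e_i)) for i = 1, 2."""
    return e1 * e2.conjugate() - e2 * e1.conjugate()

if __name__ == "__main__":
    det = embedding_determinant(QSqrt2(1), QSqrt2(0, 1))
    print(det.a, det.b)  # coefficients of 1 and sqrt(2) in the determinant
```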

    Now we can prove (13) for a given {1 \leq j \leq d}. For each {1 \leq i \leq j}, we apply Corollary 18 to the space {V'_i} defined in (11) to find distinct elements {\sigma_{i,1},\dots,\sigma_{i,r_{d+1-i}}} of {G} such that the forms {x \mapsto \sigma_{i,l}(x)} for {l=1,\dots,r_{d+1-i}} are {{\bf Q}}-linearly independent on {V'_i}; applying an arbitrary element {\mu} of {G}, we also see that the forms {x \mapsto \mu \sigma_{i,l}(x)} for {l=1,\dots,r_{d+1-i}} are {{\bf Q}}-linearly independent on {V'_i}. We will show that

    \displaystyle  \prod_{i=1}^j \prod_{l=1}^{r_{d+1-i}} |\mu\sigma_{i,l}(L'_{\mu \sigma_{i,l},i}(x))| \gg \|x\|_\infty^{-o(1)}; \ \ \ \ \ (14)

    taking the geometric mean over all {\mu \in G}, we obtain (13).

    It remains to prove (14). By construction of the {L'_{\sigma,i}}, we know that each {L'_{\mu \sigma_{i,l},i}(x)} is a linear combination of the entries of the tuple

    \displaystyle  x_j := (L_d(x), \dots, L_{d+1-j}(x)) \in (K {}^* {\bf Z})^j.

    This tuple lies in {{}^*{\bf Z} V_j}, where {V_j \subset K^j} is the image of {V} under the map {x \mapsto (L_d(x), \dots, L_{d+1-j}(x))}. By Corollary 15, {V_j} is a {{\bf Q}}-vector space of dimension {r_d + \dots + r_{d+1-j}}, and {x_j} is a {{\bf Q}}-generic point of {V_j}. By Theorem 4 (and the subsequent remarks), we will be able to conclude (14) as soon as we show that the {r_d + \dots + r_{d+1-j}} expressions {\mu\sigma_{i,l}(L'_{\mu \sigma_{i,l},i}(x))} for {i=1,\dots,j} and {l=1,\dots,r_{d+1-i}} are {\bar{{\bf Q}}}-linearly independent. Suppose for contradiction that one has a non-trivial linear dependence

    \displaystyle  \sum_{i=1}^j \sum_{l=1}^{r_{d+1-i}} c_{i,l} \mu\sigma_{i,l}(L'_{\mu \sigma_{i,l},i}(x)) = 0

    for some {c_{i,l} \in \bar{{\bf Q}}}, not all zero. Let {i_*} be the largest value of {i=1,\dots,j} for which there is a non-zero value of {c_{i,l}}. Each {L'_{\mu \sigma_{i_*,l},i_*}(x)} is a {\bar{{\bf Q}}}-linear combination of {L_d(x),\dots,L_{d+1-i_*}(x)}, with the coefficient of {L_{d+1-i_*}(x)} being non-zero (otherwise the {L'_{\sigma,i}} could not be linearly independent in {i}). We conclude that there is a non-trivial {\bar{{\bf Q}}}-linear combination of the {\mu \sigma_{i_*,l}( L_{d+1-i_*}(x) )} for {l=1,\dots,r_{d+1-i_*}}, which is equal to a {\bar{{\bf Q}}}-linear combination of the {\sigma(L_{d+1-i}(x))} for {i=1,\dots,i_*-1} and {\sigma \in G}. But as {x_j} is a {{\bf Q}}-generic point in {V_j}, this can only happen if the corresponding linear combination of the forms {\mu \sigma_{i_*,l}( L_{d+1-i_*})} and {\sigma(L_{d+1-i})} vanishes on {V_j} (with the {L_{d+1-i}} now interpreted as coordinate functions on {V_j}). This implies that the forms {x \mapsto \mu \sigma_{i_*,l}(x)} are {{\bf Q}}-linearly dependent on {V'_{i_*}}, giving the required contradiction. This proves (14), and Theorem 13 follows.

    — 5. The flag structure of the magnitude of linear forms —

    Although we will not need this fact here, it is interesting to note that the subspace theorem (Theorem 4) gives rather precise control on the size of various linear combinations of algebraic numbers.

    Theorem 19 (Schmidt subspace theorem, flag version) Let {d} be a standard natural number, and let {x \in {}^* {\bf Z}^d} be in {{\bf Q}}-general position. Then one has a standard flag

    \displaystyle  \{0\} = V_0 \subset V_1 \subset \dots \subset V_d = (\bar{{\bf Q}}^d)^*

    in the space {(\bar{{\bf Q}}^d)^*} of linear forms {L: \bar{{\bf Q}}^d \rightarrow \bar{{\bf Q}}}, as well as standard real numbers

    \displaystyle  -\infty < \lambda_1 \leq \dots \leq \lambda_d = 1

    with

    \displaystyle  \lambda_1 + \dots + \lambda_d \geq 0 \ \ \ \ \ (15)

    such that

    \displaystyle  |L(x)| = \|x\|^{\lambda_i + o(1)} \ \ \ \ \ (16)

    whenever {i=1,\dots,d} and {L \in V_i \backslash V_{i-1}}.

    For instance, if {x_1, x_2 \in {}^* {\bf Z}} are linearly independent over {{\bf Q}}, then Theorem 19 asserts that

    \displaystyle  | \alpha x_1 + \beta x_2 | = \|(x_1,x_2)\|^{1+o(1)}

    for all {(\alpha,\beta) \in \bar{{\bf Q}}^2} outside of a one-dimensional subspace {V_1} of {\bar{{\bf Q}}^2}, and we have

    \displaystyle  | \alpha x_1 + \beta x_2 | = \|(x_1,x_2)\|^{\lambda_1 + o(1)}

    for all non-zero {(\alpha,\beta) \in V_1} and some {\lambda_1 \geq -1}. One can use Theorem 13 to obtain a similar description of the magnitude of {K}-linear forms (and their Galois conjugates) of a point {x \in (K {}^* {\bf Z})^d} in {K}-general position, but we will not do so here.
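
    A finite computation can suggest what the exponents in Theorem 19 look like in the two-dimensional case, although the {o(1)} errors only disappear in the nonstandard limit. The sketch below assumes {(x_1,x_2) = (p,q)} is taken far along the Pell-type recursion {(p,q) \mapsto (p+2q,p+q)}; the function names are illustrative only. The form {x_1 - \sqrt{2} x_2} then has exponent close to {-1} (the extreme case {\lambda_1 = -1}), while a form outside {V_1}, such as {x_1 + x_2}, has exponent close to {\lambda_2 = 1}.

```python
from math import log, sqrt

def pell_pair(steps):
    """Apply the recursion (p, q) -> (p + 2q, p + q) `steps` times to (1, 1)."""
    p, q = 1, 1
    for _ in range(steps):
        p, q = p + 2 * q, p + q
    return p, q

def special_exponent(p, q):
    """log|p - sqrt(2) q| / log max(p, q), using |p - sqrt(2) q| = 1 / (p + sqrt(2) q)."""
    # The identity holds because |p^2 - 2 q^2| = 1 along this recursion; it avoids
    # the catastrophic cancellation of computing p - sqrt(2) q in floating point.
    return -log(p + sqrt(2) * q) / log(max(p, q))

def generic_exponent(p, q):
    """log|p + q| / log max(p, q), for the form with (alpha, beta) = (1, 1)."""
    return log(p + q) / log(max(p, q))

if __name__ == "__main__":
    p, q = pell_pair(40)
    print(special_exponent(p, q), generic_exponent(p, q))
```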

    Proof: For each standard {\lambda \in {\bf R}}, the space

    \displaystyle  V(\lambda) := \{ L \in (\bar{{\bf Q}}^d)^*: |L(x)| \leq \|x\|^{\lambda+o(1)} \}

    is a {\bar{{\bf Q}}}-linear subspace of {(\bar{{\bf Q}}^d)^*}, which is non-decreasing and right-continuous in {\lambda}, and equals all of {(\bar{{\bf Q}}^d)^*} when {\lambda \geq 1}. As subspaces of {(\bar{{\bf Q}}^d)^*} must have an integer dimension between {0} and {d}, we may thus find a complete flag

    \displaystyle  \{0\} = V_0 \subset V_1 \subset \dots \subset V_d = (\bar{{\bf Q}}^d)^*

    and extended real numbers

    \displaystyle  -\infty = \lambda_0 \leq \lambda_1 \leq \dots \leq \lambda_d \leq 1 \leq \lambda_{d+1} = +\infty

    such that {V(\lambda) = V_i} whenever {1 \leq i \leq d} and {\lambda_i \leq \lambda < \lambda_{i+1}}. Selecting {L_i} to be an element of {V_i \backslash V_{i-1}} for {i=1,\dots,d} and applying Theorem 4, we obtain (15), which in particular implies that the {\lambda_1,\dots,\lambda_d} are finite, at which point (16) follows from the definition of the {V(\lambda)}. Finally, by taking the coordinate linear forms we see that {|L(x)| \gg \|x\|} for at least one {L}, which forces {\lambda_d=1}. \Box

    Exercise 5 Try to state and prove a standard version of Theorem 19. (This is remarkably tricky – it takes the form of a “regularity lemma” in the spirit of Section 2 of this blog post – and may help to illustrate the power of the nonstandard formalism.)