I had occasion recently to look up the proof of Hilbert’s nullstellensatz, which I haven’t studied since cramming for my algebra qualifying exam as a graduate student. I was a little unsatisfied with the proofs I was able to locate – they were fairly abstract and used a certain amount of algebraic machinery, which I was terribly rusty on – so, as an exercise, I tried to find a more computational proof that avoided as much abstract machinery as possible. I found a proof which used only the extended Euclidean algorithm and high school algebra, together with an induction on dimension and the obvious observation that any non-zero polynomial of one variable on an algebraically closed field has at least one non-root. It probably isn’t new (in particular, it might be related to the standard model-theoretic proof of the nullstellensatz, with the Euclidean algorithm and high school algebra taking the place of quantifier elimination), but I thought I’d share it here anyway.

Throughout this post, F is going to be a fixed algebraically closed field (e.g. the complex numbers {\Bbb C}). I’d like to phrase the nullstellensatz in a fairly concrete fashion, in terms of the problem of solving a set of simultaneous polynomial equations P_1(x) = \ldots = P_m(x) = 0 in several variables x = (x_1,\ldots,x_d) \in F^d over F, thus P_1,\ldots,P_m \in F[x] are polynomials in d variables. One obvious obstruction to solvability of this system is if the equations one is trying to solve are inconsistent in the sense that they can be used to imply 1=0. In particular, if one can find polynomials Q_1,\ldots,Q_m \in F[x] such that P_1 Q_1 + \ldots + P_m Q_m = 1, then clearly one cannot solve P_1(x)=\ldots=P_m(x)=0. The weak nullstellensatz asserts that this is, in fact, the only obstruction:

Weak nullstellensatz. Let P_1,\ldots,P_m \in F[x] be polynomials. Then exactly one of the following statements holds:

  1. The system of equations P_1(x)=\ldots=P_m(x)=0 has a solution x \in F^d.
  2. There exist polynomials Q_1,\ldots,Q_m \in F[x] such that P_1 Q_1 + \ldots + P_m Q_m = 1.

Note that the hypothesis that F is algebraically closed is crucial; for instance, if F is the reals, then the equation x^2+1=0 has no solution, but there is no polynomial Q(x) such that (x^2+1) Q(x) = 1.
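
As a toy illustration (my own example, not from the literature): the system x - 1 = x - 2 = 0 is inconsistent, and indeed the certificate Q_1 = 1, Q_2 = -1 exhibits 1 as a combination of the two polynomials, since (x-1) - (x-2) = 1. A quick exact-arithmetic check:

```python
from fractions import Fraction

# Polynomials as coefficient lists: entry i is the coefficient of x^i.

def pmul(p, q):
    """Multiply two coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def padd(p, q):
    """Add two coefficient lists, stripping trailing zeros."""
    out = [Fraction(0)] * max(len(p), len(q))
    for i, a in enumerate(p):
        out[i] += a
    for i, b in enumerate(q):
        out[i] += b
    while out and out[-1] == 0:
        out.pop()
    return out

P1, P2 = [-1, 1], [-2, 1]    # x - 1 and x - 2
Q1, Q2 = [1], [-1]           # the certificate: (x-1) - (x-2) = 1
certificate = padd(pmul(P1, Q1), pmul(P2, Q2))   # equals [1], i.e. the constant 1
```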

Like many results of the “The only obstructions are the obvious obstructions” type, the power of the nullstellensatz lies in the ability to take a hypothesis about non-existence (in this case, non-existence of solutions to P_1(x)=\ldots=P_m(x)=0) and deduce a conclusion about existence (in this case, existence of Q_1,\ldots,Q_m such that P_1 Q_1 + \ldots + P_m Q_m = 1). The ability to get “something from nothing” is clearly going to be both non-trivial and useful. In particular, the nullstellensatz offers an important correspondence between algebraic geometry (the conclusion 1 is an assertion that a certain algebraic variety is non-empty) and commutative algebra (the conclusion 2 is an assertion that a certain ideal is non-proper).

Now suppose one is trying to solve the more complicated system P_1(x)=\ldots=P_m(x)=0; R(x) \neq 0 for some polynomials P_1,\ldots,P_m, R. Again, any identity of the form P_1 Q_1 + \ldots + P_m Q_m = 1 will be an obstruction to solvability, but now more obstructions are possible: any identity of the form P_1 Q_1 + \ldots + P_m Q_m = R^r for some non-negative integer r will also obstruct solvability. The strong nullstellensatz asserts that this is the only obstruction:

Strong nullstellensatz. Let P_1,\ldots,P_m, R \in F[x] be polynomials. Then exactly one of the following statements holds:

  1. The system of equations P_1(x)=\ldots=P_m(x)=0, R(x) \neq 0 has a solution x \in F^d.
  2. There exist polynomials Q_1,\ldots,Q_m \in F[x] and a non-negative integer r such that P_1 Q_1 + \ldots + P_m Q_m = R^r.

Of course, the weak nullstellensatz corresponds to the special case in which R=1. The strong nullstellensatz is usually phrased instead in terms of ideals and radicals, but the above formulation is easily shown to be equivalent to the usual version (modulo Hilbert’s basis theorem).

One could consider generalising the nullstellensatz a little further by considering systems of the form P_1(x)=\ldots=P_m(x)=0, R_1(x),\ldots,R_n(x) \neq 0, but this is not a significant generalisation, since all the inequations R_1(x) \neq 0, \ldots, R_n(x) \neq 0 can be concatenated into a single inequation R_1(x) \ldots R_n(x) \neq 0. The presence of the exponent r in conclusion (2) is a little annoying; to get rid of it, one needs to generalise the notion of an algebraic variety to that of a scheme (which is worth doing for several other reasons too, in particular one can now work over much more general objects than just algebraically closed fields), but that is a whole story in itself (and one that I am not really qualified to tell).

[Update, Nov 26: It turns out that my approach is more complicated than I first thought, and so I had to revise the proof quite a bit to fix a certain gap, in particular making it significantly messier than my first version. On the plus side, I was able to at least eliminate any appeal to Hilbert's basis theorem, so in particular the proof is now manifestly effective (but with terrible bounds). In any case, I am keeping the argument here in case it has some interest.]

– The base case d=1

I had initially posted here an attempted proof of the weak nullstellensatz by induction on dimension, and then deduced the strong nullstellensatz from the weak via a lifting trick of Zariski (the key observation being that the inequation x \neq 0 was equivalent to the solvability of the equation xy-1 = 0 for some y). But I realised that my proof of the weak nullstellensatz was incorrect (it required the strong nullstellensatz in one lower dimension as the inductive hypothesis), and so now I am proceeding by establishing the strong nullstellensatz directly.

We shall induct on the dimension d (i.e. the number of variables in the system of equations).

The case d=0 is trivial, so we use the d=1 case as the base case, thus P_1, \ldots, P_m, R are all polynomials of one variable. This case follows easily from the fundamental theorem of algebra, but it will be important for later purposes to instead use a more algorithmic proof in which the coefficients of the polynomials Q_1,\ldots,Q_m required for (2) are obtained from the coefficients of P_1, \ldots, P_m, R in an explicitly computable fashion (using only the operations of addition, subtraction, multiplication, division, and branching on whether a given field element is zero or non-zero). In particular, one does not need to locate roots of polynomials in order to construct Q_1,\ldots,Q_m (although one will of course need to do so to locate a solution x to (1)). [It is likely that one could get these sorts of computability properties on Q_1,\ldots,Q_m "for free" from Galois theory, but I have not attempted to do so.] It is however instructive to secretly apply the fundamental theorem of algebra throughout the proof which follows, to clarify what is going on.

Let us say that a collection (P_1,\ldots,P_m; R) of polynomials obeys the nullstellensatz if at least one of (1) and (2) is true. It is clear that (1) and (2) cannot both be true, so to prove the nullstellensatz it suffices to show that every collection (P_1,\ldots,P_m; R) obeys the nullstellensatz.

We can of course throw away any of the P_i that are identically zero, as this does not affect whether (P_1,\ldots,P_m;R) obeys the nullstellensatz. If none of the P_i remain, then we are in case (1), because the polynomial R has at most finitely many zeroes, and because an algebraically closed field must be infinite. So suppose that we have some non-zero P_i. We repeatedly use the extended Euclidean algorithm to locate the greatest common divisor D(x) of the remaining P_i. Note that this algorithm automatically supplies for us some polynomials Q_1(x),\ldots,Q_m(x) such that

P_1(x) Q_1(x) + \ldots + P_m(x) Q_m(x) = D(x).

Because of this, we see that (P_1,\ldots,P_m;R) obeys the nullstellensatz if and only if (D;R) obeys the nullstellensatz. So we have effectively reduced to the case m=1.
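
To make this step concrete, here is a sketch of the extended Euclidean algorithm for two polynomials (iterating it over P_1,\ldots,P_m then yields D together with the cofactors Q_1,\ldots,Q_m). The coefficient-list encoding over the rationals is my own choice for illustration; any exact field works the same way.

```python
from fractions import Fraction

# Polynomials as lists of coefficients, entry i the coefficient of x^i;
# the zero polynomial is the empty list [].

def trim(p):
    """Strip trailing zero coefficients."""
    while p and p[-1] == 0:
        p = p[:-1]
    return p

def padd(p, q):
    out = [Fraction(0)] * max(len(p), len(q))
    for i, a in enumerate(p):
        out[i] += a
    for i, b in enumerate(q):
        out[i] += b
    return trim(out)

def psub(p, q):
    return padd(p, [-b for b in q])

def pmul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def pdivmod(p, q):
    """Polynomial long division: p = quo * q + rem with deg rem < deg q."""
    quo = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    rem = list(p)
    while len(rem) >= len(q):
        c = rem[-1] / q[-1]
        k = len(rem) - len(q)
        quo[k] = c
        for j, b in enumerate(q):
            rem[k + j] -= c * b
        rem = trim(rem)
    return trim(quo), rem

def ext_gcd(p, q):
    """Extended Euclidean algorithm: return (g, u, v) with g = gcd(p, q)
    monic and u*p + v*q = g."""
    r0, r1 = trim([Fraction(c) for c in p]), trim([Fraction(c) for c in q])
    u0, u1 = [Fraction(1)], []
    v0, v1 = [], [Fraction(1)]
    while r1:
        quo, rem = pdivmod(r0, r1)
        r0, r1 = r1, rem
        u0, u1 = u1, psub(u0, pmul(quo, u1))
        v0, v1 = v1, psub(v0, pmul(quo, v1))
    if r0:  # normalise the gcd to be monic
        lc = r0[-1]
        r0, u0, v0 = ([c / lc for c in s] for s in (r0, u0, v0))
    return r0, u0, v0

# x^2 - 1 and x^2 - x have gcd x - 1, with the Bezout cofactors supplied:
g, u, v = ext_gcd([-1, 0, 1], [0, -1, 1])
```

Note that, exactly as in the text, the algorithm uses only field operations and zero-tests on the coefficients; no root-finding occurs.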

Now we apply the extended Euclidean algorithm again, this time to D and R, to express the gcd D’ of D and R as a combination D’ = D A + R B, and also to factor D = D’ S and R = D’ T for some polynomials A, B, S, T with AS + BT = 1. A little algebra then shows that one has a solution to the problem

D(x) = 0; R(x) \neq 0

whenever one has a solution to the problem

S(x) = 0; D'(x) \neq 0.

Also, if some power of D’ is a multiple of S, then some power of R is a multiple of D (indeed, if (D')^r = S Q, then R^{r+1} = D' (D')^r T^{r+1} = D Q T^{r+1}). Thus we see that if (S; D’) obeys the nullstellensatz, then (D; R) does also. But the net degree of S and D’ is less than the net degree of D and R unless R is constant, so by infinite descent we may reduce to the case when R is constant. If R is zero then we are clearly in case (2), so we may assume R is a non-zero constant. If D is constant then we are again in case (2), so assume that D is non-constant. But then as the field is algebraically closed, D has at least one root, and so we are in case (1). This completes the proof of the d=1 case.
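
The descent in the m=1 case can be phrased as a loop: repeatedly replace (D, R) by (S, D') until the second entry is constant, then read off which case holds. A sketch (again over the rationals for concreteness; over a field like {\Bbb Q} that is not algebraically closed, the "case (1)" outcome only promises a root in the algebraic closure):

```python
from fractions import Fraction

# Coefficient-list encoding: entry i is the coefficient of x^i,
# and the zero polynomial is the empty list [].

def trim(p):
    while p and p[-1] == 0:
        p = p[:-1]
    return p

def pdivmod(p, q):
    """Long division: p = quo * q + rem with deg rem < deg q (q nonzero)."""
    quo = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    rem = list(p)
    while len(rem) >= len(q):
        c = rem[-1] / q[-1]
        k = len(rem) - len(q)
        quo[k] = c
        for j, b in enumerate(q):
            rem[k + j] -= c * b
        rem = trim(rem)
    return trim(quo), rem

def pgcd(p, q):
    """Monic gcd via the plain Euclidean algorithm."""
    while q:
        p, q = q, pdivmod(p, q)[1]
    return [c / p[-1] for c in p] if p else []

def decide(D, R):
    """Decide the one-variable system D(x) = 0, R(x) != 0 over the algebraic
    closure: return 1 if it is solvable there, and 2 if instead some power
    of R is a multiple of D (conclusion (2) with m = 1)."""
    D = trim([Fraction(c) for c in D])
    R = trim([Fraction(c) for c in R])
    if not R:
        return 2               # R = 0: the inequation is unsatisfiable
    if not D:
        return 1               # D = 0: any non-root of R will do
    while len(R) >= 2:         # while deg R >= 1, run the descent step
        Dp = pgcd(D, R)        # D' = gcd(D, R)
        S = pdivmod(D, Dp)[0]  # D = D' S
        D, R = S, Dp           # (D, R) -> (S, D'); the net degree drops
    # R is now a nonzero constant: non-constant D has a root, constant D does not
    return 1 if len(D) >= 2 else 2
```

For instance, (x-1)(x-2) = 0 with x - 1 \neq 0 is solvable (take x = 2), while (x-1)^2 = 0 with x - 1 \neq 0 is not.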

For the inductive step, it is important to remark that the above proof is algorithmic in the sense that a computer which was given the coefficients for P_1,\ldots,P_m,R as inputs could apply a finite number of arithmetic operations (addition, subtraction, multiplication, division), as well as a finite number of branching operations based on whether a given variable was zero or non-zero, in order to output either

  1. the coefficients of a non-constant polynomial D with the property that any root x of D would solve (1);
  2. the coefficients of a non-zero polynomial R with the property that any non-root x of R would solve (1); or
  3. the coefficients of polynomials Q_1,\ldots,Q_m which solved (2) for some specific r.

In most cases, the number of branching operations is rather large (see for instance the example of solving two linear equations below). There is however one simple case in which only one branching is involved, namely when m=2, R=1, and P_1, P_2 are monic. In this case, we have an identity of the form

P_1 S_1 + P_2 S_2 = \hbox{Res}(P_1,P_2)

where S_1,S_2 are polynomials (whose coefficients are polynomial combinations of the coefficients of P_1 and P_2), and \hbox{Res}(P_1,P_2) \in F is the resultant of P_1 and P_2, which is another polynomial combination of the coefficients of P_1 and P_2. If the resultant is non-zero then we have

\frac{1}{\hbox{Res}(P_1,P_2)} S_1 P_1 + \frac{1}{\hbox{Res}(P_1,P_2)} S_2 P_2  = 1

and so the system is unsolvable (we are in case (2)); otherwise, the system is solvable.
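
For concreteness, here is the standard Sylvester-matrix description of the resultant (this is the usual textbook definition, not anything special to this argument), with the determinant computed by exact Gaussian elimination; for monic P_1, P_2 it vanishes precisely when the two polynomials share a root in the algebraic closure.

```python
from fractions import Fraction

def sylvester(p, q):
    """Sylvester matrix of p and q, given as coefficient lists with the
    LEADING coefficient first (so [1, 0, -1] is t^2 - 1)."""
    m, n = len(p) - 1, len(q) - 1
    size = m + n
    rows = []
    for i in range(n):   # n shifted copies of p's coefficients
        rows.append([Fraction(0)] * i + [Fraction(c) for c in p]
                    + [Fraction(0)] * (size - i - m - 1))
    for j in range(m):   # m shifted copies of q's coefficients
        rows.append([Fraction(0)] * j + [Fraction(c) for c in q]
                    + [Fraction(0)] * (size - j - n - 1))
    return rows

def det(matrix):
    """Determinant by exact Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]
    return result

def resultant(p, q):
    return det(sylvester(p, q))

# t - 1 and t - 2 share no root (resultant -1); t^2 - 1 and t - 1 share t = 1.
```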

– The inductive case d>1

Now we do the inductive case, when d \geq 2 and the claim has already been proven for d-1. The basic idea is to view the system (1) not as a system of equations in d unknowns, but as a (d-1)-dimensional family of systems in one unknown. We will then apply the d=1 theory to each system in that family and use the algorithmic nature of that theory to glue everything together properly.

We write the variable x \in F^d as x = (y, t) for y \in F^{d-1} and t \in F. The ring F[x] of polynomials in d variables can thus be viewed as a ring F[y][t] of polynomials in one variable t, in which the coefficients lie in the ring F[y].

Let I be the ideal in F[x] generated by P_1,\ldots,P_m. We either need to solve the system

P_1(y,t) = \ldots = P_m(y,t) = 0; R(y,t) \neq 0 (1)

or show that

R^r = 0 \hbox{ mod } I for some r. (2)

We assume that no solution to (1) exists, and use this to synthesise a relation of the form (2).

Let y \in F^{d-1} be arbitrary. We can view the polynomials P_1(y,t),\ldots,P_m(y,t), R(y,t) as polynomials in F[t], whose coefficients lie in F but happen to depend in a polynomial fashion on y. To emphasise this, we write P_{j,y}(t) for P_j(y,t) and R_{y}(t) for R(y,t). Then by hypothesis, there is no t for which

P_{1,y}(t) = \ldots = P_{m,y}(t) = 0; R_{y}(t) \neq 0.

To motivate the strategy, let us consider the easy case when R = 1, m=2, and P_1, P_2 are monic polynomials in t. Then by our previous discussion, the above system is solvable for any fixed y precisely when \hbox{Res}(P_{1,y}, P_{2,y}) is zero. So either the equation \hbox{Res}(P_{1,y}, P_{2,y}) = 0 has a solution, in which case we are in case (1), or it does not. But in the latter case, by applying the nullstellensatz at one lower dimension we see that \hbox{Res}(P_{1,y}, P_{2,y}) must be constant in y. But recall that the resultant is a linear combination P_{1,y} S_{1,y} + P_{2,y} S_{2,y} of P_{1,y} and P_{2,y}, where the polynomials S_{1,y} and S_{2,y} depend polynomially on P_{1,y} and P_{2,y} and thus on y itself. Thus we end up in case (2), and the induction closes in this case.
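
To see the dependence on y in a toy instance (my own example: the monic family P_{1,y}(t) = t - y, P_{2,y}(t) = t - 1, with R = 1 and d = 2), the 2×2 Sylvester determinant gives \hbox{Res}(P_{1,y}, P_{2,y}) = y - 1, which is non-constant in y, so we are in case (1): the system is solvable exactly on the variety y = 1.

```python
from fractions import Fraction

def res_family(y):
    """Res(P_{1,y}, P_{2,y}) for the toy family P_{1,y}(t) = t - y,
    P_{2,y}(t) = t - 1, via the 2x2 Sylvester determinant
      | 1  -y |
      | 1  -1 |  =  (1)(-1) - (-y)(1)  =  y - 1."""
    y = Fraction(y)
    a, b = Fraction(1), -y            # top row: coefficients of t - y
    c, d = Fraction(1), Fraction(-1)  # bottom row: coefficients of t - 1
    return a * d - b * c

# The resultant vanishes exactly at y = 1, where t = 1 solves both equations.
```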

Now we turn to the general case. Applying the d=1 analysis, we conclude that there exist polynomials Q_{1,y},\ldots,Q_{m,y} \in F[t] of t, and an r = r_{y} \geq 0, such that

P_{1,y}(t) Q_{1,y}(t) + \ldots + P_{m,y}(t) Q_{m,y}(t) = R^{r_{y}}_{y}(t). (3)

Now, if the exponent r_{y} was constant in y, and the coefficients of Q_{1,y},\ldots,Q_{m,y} depended polynomially on y, we would be in case (2) and therefore done.

It is not difficult to make r_{y} constant in y. Indeed, we observe that the degrees of P_{1,y}(t),\ldots,P_{m,y}(t) are bounded uniformly in y. Inspecting the d=1 analysis, we conclude that the exponent r_{y} returned by that algorithm is then also bounded uniformly in y. We can always raise the value of r_{y} by multiplying both sides of (3) by R_{y}, and so we can make r = r_{y} independent of y, thus

P_1(y,t)  Q_{1,y}(t) + \ldots + P_m(y,t) Q_{m,y}(t) = R^r(y,t). (4)

Now we need to work on the Q’s. Unfortunately, the coefficients of the Q’s are not polynomial in y; instead, they are piecewise rational in y. Indeed, by inspecting the algorithm used to prove the d=1 case, we see that the algorithm makes a finite number of branches, depending on whether certain polynomial expressions T(y) of y are zero or non-zero. At the end of each branching path, the algorithm returns polynomials Q_{1,y},\ldots,Q_{m,y} whose coefficients are rational combinations of the coefficients of P_{1,y},\ldots,P_{m,y} and are thus rational functions of y. Furthermore, all the division operations are by polynomials T(y) which were guaranteed to be non-zero by some stage of the branching process, and so the net denominator of any of these coefficients is some product of the T(y) that are guaranteed non-zero.

An example might help illustrate what’s going on here. Suppose m=2, that R = 1, and that P_1(y,t), P_2(y,t) are linear in t, thus

P_{1,y}(t) = a(y) + t b(y); P_{2,y}(t) = c(y) + t d(y)

for some polynomials a,b,c,d \in F[y]. To find the gcd of P_{1,y} and P_{2,y} for a given y, which determines the solvability of the system P_{1,y}(t)=P_{2,y}(t)=0, the Euclidean algorithm branches as follows:

  1. If b(y) is zero, then
    1. If a(y) is zero, then
      1. If d(y) is non-zero, then 0  P_{1,y} + \frac{1}{d(y)} P_{2,y} is the gcd (and the system is solvable).
      2. Otherwise, if d(y) is zero, then
        1. If c(y) is non-zero, then 0 P_{1,y} + \frac{1}{c(y)} P_{2,y} = 1 is the gcd (and the system is unsolvable).
        2. Otherwise, if c(y) is zero, then 0 P_{1,y} + 0 P_{2,y} is the gcd (and the system is solvable).
    2. Otherwise, if a(y) is non-zero, then \frac{1}{a(y)} P_{1,y} + 0 P_{2,y} = 1 is the gcd (and the system is unsolvable).
  2. Otherwise, if b(y) is non-zero, then
    1. If a(y)d(y)-b(y)c(y) is non-zero, then \frac{d(y)}{a(y) d(y)-b(y)c(y)} P_{1,y} - \frac{b(y)}{a(y) d(y)-b(y)c(y)} P_{2,y} = 1 is the gcd (and the system is unsolvable).
    2. Otherwise, if a(y)d(y)-b(y)c(y) is zero, then \frac{1}{b(y)} P_{1,y} + 0 P_{2,y} is the gcd (and the system is solvable).
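
The tree above translates directly into code. The sketch below is my own encoding: for each branch it returns the two scalar cofactors, the gcd as a coefficient list (constant term first, [] for the zero polynomial), and the solvability verdict.

```python
from fractions import Fraction

def linear_gcd(a, b, c, d):
    """Follow the branching tree for P1(t) = a + t*b, P2(t) = c + t*d.
    Returns (q1, q2, g, solvable), where q1, q2 are scalars with
    q1*P1 + q2*P2 = g, g is the monic (or zero) gcd as a coefficient
    list, and `solvable` records whether P1(t) = P2(t) = 0 has a solution."""
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    zero, one = Fraction(0), Fraction(1)
    if b == 0:
        if a == 0:
            if d != 0:
                return zero, 1 / d, [c / d, one], True
            if c != 0:
                return zero, 1 / c, [one], False
            return zero, zero, [], True        # both polynomials vanish identically
        return 1 / a, zero, [one], False
    delta = a * d - b * c   # vanishes iff P1 and P2 are proportional
    if delta != 0:
        return d / delta, -b / delta, [one], False
    return 1 / b, zero, [a / b, one], True

def check_identity(q1, q2, a, b, c, d, t):
    """Evaluate q1*P1 + q2*P2 at the point t (for sanity checking)."""
    return q1 * (a + t * b) + q2 * (c + t * d)
```

Note how every division performed is by a quantity that an earlier branch has certified to be non-zero; this is exactly the feature exploited below when clearing denominators.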

So we see that even in the rather simple case of solving two linear equations in one unknown, there is a moderately complicated branching tree involved. Nevertheless, there are only finitely many branching paths. Some of these paths may be infeasible, in the sense that there do not exist any y \in F^{d-1} which can follow these paths. But given any feasible path, say one in which the polynomials S_1(y),\ldots,S_a(y) are observed to be zero, and T_1(y),\ldots,T_b(y) are observed to be non-zero, we know (since we are assuming no solution to (1)) that the algorithm creates an identity of the form (4) in which the coefficients of Q_{1,y},\ldots,Q_{m,y} are rational functions of y, whose denominators are products of T_1,\ldots,T_b. We may thus clear denominators (enlarging r if necessary) and obtain an identity of the form

P_1(y,t) U_1(y,t) + \ldots + P_m(y,t) U_m(y,t) = (T_1(y) \ldots T_b(y) R(y,t))^r

for some polynomials U_1,\ldots,U_m. This identity holds whenever y is such that S_1(y),\ldots,S_a(y) are zero and T_1(y),\ldots,T_b(y) are non-zero. But an inspection of the algorithm shows that the only reason we needed T_1(y),\ldots,T_b(y) to be non-zero was in order to divide by these quantities; if we clear denominators throughout, we thus see that we can remove these constraints and deduce that this identity holds whenever S_1(y),\ldots,S_a(y) are zero. Further inspection of the algorithm then shows that even if S_1(y),\ldots,S_a(y) do not all vanish, this only introduces additional terms to the identity which are combinations (over F[y,t]) of S_1, \ldots, S_a. Thus, for any feasible path, we obtain an identity in F[y,t] of the form

P_1 U_1 + \ldots + P_m U_m = (T_1 \ldots T_b R)^r + S_1 V_1 + \ldots + S_a V_a

for some polynomials U_1,\ldots,U_m,V_1,\ldots,V_a \in F[y,t]. In other words, we see that

(T_1 \ldots T_b R)^r = 0 \hbox{ mod } I, S_1, \ldots, S_a (5)

for any feasible path.

Now what we need to do is fold up the branching tree and simplify the relations (5) until we obtain (2). More precisely, we claim that (5) holds (for some r) not only for complete feasible paths (in which we follow the branching tree all the way to the end), but for partial feasible paths, in which we branch some of the way and then stop in a place where at least one y \in F^{d-1} can solve all the constraints branched on so far. In particular, the empty feasible path will then give (2).

To prove this claim, we induct backwards on the length of the partial path. So suppose we have some partial feasible path, which required S_1(y),\ldots,S_a(y) to be zero and T_1(y),\ldots,T_b(y) to be non-zero in order to get here. If this path is complete, then we are already done, so suppose there is a further branching, say on a polynomial W(y). At least one of the cases W(y)=0 and W(y) \neq 0 must be feasible; and so we now divide into three cases.

Case 1: W(y)=0 is feasible and W(y) \neq 0 is infeasible. If we follow the W(y)=0 path and use the inductive hypothesis, we obtain a constraint

(T_1 \ldots T_b R)^r = 0 \hbox{ mod } I, S_1, \ldots, S_a, W (6)

for some r. On the other hand, since W(y) \neq 0 is infeasible, we see that the problem

S_1(y) = \ldots = S_a(y) = 0; T_1(y) \ldots T_b(y) W(y) \neq 0

has no solution. Since the nullstellensatz is assumed to hold for dimension d-1, we conclude that

(T_1 \ldots T_b W)^{r'} = 0 \hbox{ mod } S_1,\ldots,S_a.

for some r’. If we then raise (6) to the power r’ and multiply the result by (T_1 \ldots T_b R)^{r'}, we can use the above relation to eliminate the role of W, and conclude (5) (with exponent rr’+r’) as required.
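
For the record, the bookkeeping here can be spelled out as follows (abbreviating T := T_1 \cdots T_b, and writing (6) as (TR)^r = W Z' modulo I, S_1,\ldots,S_a for some polynomial Z'):

```latex
% Raise (6) to the power r', then multiply by (TR)^{r'}:
\begin{aligned}
(TR)^{rr'} &\equiv W^{r'} (Z')^{r'} && \bmod\ (I, S_1,\dots,S_a),\\
(TR)^{rr'+r'} &\equiv (TW)^{r'}\, R^{r'} (Z')^{r'} \equiv 0 && \bmod\ (I, S_1,\dots,S_a),
\end{aligned}
% using in the last step that (TW)^{r'} \equiv 0 \bmod (S_1,\dots,S_a).
```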

Case 2: W(y)=0 is infeasible and W(y) \neq 0 is feasible. If we follow the W(y) \neq 0 path, we obtain a constraint

(T_1 \ldots T_b W R)^{r''} = 0 \hbox{ mod } I, S_1, \ldots, S_a (7)

for some r”, while the infeasibility of the W(y)=0 path means that there is no solution to

S_1(y) = \ldots = S_a(y) = W(y) = 0; T_1(y) \ldots T_b(y) \neq 0

and so by the nullstellensatz in dimension d-1 we have

(T_1 \ldots T_b)^{r'''} = W Z \hbox{ mod } S_1,\ldots,S_a

for some polynomial Z and some r”’. If we then multiply (7) by Z^{r''} to eliminate W, and then by R^{r'' r'''} to return to the form of (5), we obtain (5) as desired (with exponent r''(r'''+1)).
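
Spelling out the bookkeeping here as well (abbreviating T := T_1 \cdots T_b, so that (7) reads (TWR)^{r''} \equiv 0 modulo I, S_1,\ldots,S_a and the displayed relation reads T^{r'''} \equiv WZ modulo S_1,\ldots,S_a):

```latex
% Multiply (7) by Z^{r''}, then by R^{r'' r'''}:
\begin{aligned}
(TR)^{r''} (WZ)^{r''} &\equiv 0 && \bmod\ (I, S_1,\dots,S_a),\\
(TR)^{r''(r'''+1)} = (TR)^{r''}\, T^{r'' r'''} R^{r'' r'''}
  &\equiv (TR)^{r''} (WZ)^{r''} R^{r'' r'''} \equiv 0 && \bmod\ (I, S_1,\dots,S_a).
\end{aligned}
```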

Case 3: W(y)=0 and W(y) \neq 0 are both feasible.

In this case we obtain the constraints (6) and (7). We rewrite (6) in the form

(T_1 \ldots T_b R)^{r} = W Z \hbox{ mod } I, S_1,\ldots,S_a

for some Z, and then multiply (7) by Z^{r''} to eliminate W and obtain (5) as desired (with exponent r''(r+1)).

This inductively establishes (5) for all partial branching paths, leading eventually to (2) as desired.