to multiple algebraic equations

$\displaystyle P_1(x) = \ldots = P_m(x) = 0 \ \ \ \ \ (1)$

in multiple unknowns $x = (x_1,\ldots,x_n)$ in a field $k$, where the $P_1,\ldots,P_m: k^n \rightarrow k$ are polynomials of various degrees $d_1,\ldots,d_m$. We adopt the classical perspective of viewing the solution set

$\displaystyle Z = \{ x \in k^n : P_1(x) = \ldots = P_m(x) = 0 \}$

as a set (and specifically, as an algebraic set), rather than as a scheme. Without loss of generality we may order the degrees in non-increasing order: $d_1 \geq d_2 \geq \ldots \geq d_m \geq 1$.
We can distinguish between the underdetermined case $m < n$, when there are more unknowns than equations; the determined case $m = n$, when there are exactly as many unknowns as equations; and the overdetermined case $m > n$, when there are more equations than unknowns.
Experience has shown that the theory of such equations is significantly simpler if one assumes that the underlying field $k$ is algebraically closed, and so we shall make this assumption throughout the rest of this post. In particular, this covers the important case when $k = {\bf C}$ is the field of complex numbers (but it does not cover the case $k = {\bf R}$ of real numbers – see Remark 3 below).
From the general “soft” theory of algebraic geometry, we know that the algebraic set $Z$ is a union of finitely many algebraic varieties, each of dimension at least $n-m$, with none of these components contained in any other. In particular, in the underdetermined case $m < n$, there are no zero-dimensional components of $Z$, and thus $Z$ is either empty or infinite.
Now we turn to the determined case $m = n$, where we expect the solution set to be zero-dimensional and thus finite. Here, the basic control on the solution set is given by Bezout’s theorem. In our notation, the planar case $m = n = 2$ of this theorem states the following:

Theorem 1 (Bezout’s theorem) Let $P, Q \in k[x,y]$ be polynomials of degree $d_1, d_2$ respectively, with no common factor. Then the set $\{ (x,y) \in k^2 : P(x,y) = Q(x,y) = 0 \}$ is finite with cardinality at most $d_1 d_2$.
This result can be found in any introductory algebraic geometry textbook; it can for instance be proven using the classical tool of resultants. The solution set will be finite when the two polynomials $P, Q$ are coprime, but can (and will) be infinite if $P, Q$ share a non-trivial common factor.
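As a quick sanity check, the bound (and the role of the resultant) can be illustrated with a computer algebra system; the following sketch uses sympy, with a conic and a line of my own choosing rather than any example from the post.

```python
# A small illustration: checking the Bezout bound |{P = Q = 0}| <= d1*d2
# for a concrete coprime pair, using resultants to eliminate a variable.
from sympy import symbols, Poly, gcd, resultant, solve

x, y = symbols('x y')

P = x**2 + y**2 - 1   # a conic, d1 = 2
Q = x - y             # a line,  d2 = 1

# Theorem 1 applies since P and Q are coprime.
assert gcd(P, Q) == 1

# Eliminating y gives a nonzero univariate polynomial whose roots
# contain the x-coordinates of all common zeroes.
res = resultant(P, Q, y)
assert res != 0

sols = solve([P, Q], [x, y])
d1 = Poly(P, x, y).total_degree()
d2 = Poly(Q, x, y).total_degree()
print(len(sols), d1 * d2)   # 2 2
```

Here the two common zeroes $(\pm 1/\sqrt{2}, \pm 1/\sqrt{2})$ exactly meet the bound $d_1 d_2 = 2$.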
By defining the right notion of multiplicity on the solution set (and adopting a suitably “scheme-theoretic” viewpoint), and working in projective space rather than affine space, one can make the inequality $\leq d_1 d_2$ an equality. However, for many applications (and in particular, for the applications to combinatorial incidence geometry), the upper bound usually suffices.
Bezout’s theorem can be generalised in a number of ways. For instance, the restriction on the finiteness of the solution set can be dropped by restricting attention to $\{ P = Q = 0 \}^{(0)}$, the union of the zero-dimensional irreducible components of $\{ P = Q = 0 \}$:
Corollary 2 (Bezout’s theorem, again) Let $P, Q \in k[x,y]$ be polynomials of degree $d_1, d_2$ respectively. Then $\{ P = Q = 0 \}^{(0)}$ has cardinality at most $d_1 d_2$.
Proof: We factor $P, Q$ into irreducible factors (using unique factorisation of polynomials). By removing repeated factors (which only decreases the degrees and does not affect the zero sets), we may assume $P, Q$ are square-free. We then write $P = P' R$ and $Q = Q' R$, where $R$ is the greatest common divisor of $P, Q$ and $P', Q'$ are coprime. Observe that the zero-dimensional component of $\{ P = Q = 0 \}$ is contained in $\{ P' = Q' = 0 \}$ (the remaining portion $\{ R = 0 \}$ of the zero set being a union of curves, with no isolated points), which is finite from the coprimality of $P', Q'$ and Theorem 1. The claim follows.
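To make the proof concrete, here is a small sympy computation (with polynomials of my own choosing, not taken from the post) that strips the common factor and counts the zero-dimensional part.

```python
# Illustration of Corollary 2: P and Q share the factor x - y, so their
# common zero set is a line plus isolated points; stripping the gcd
# leaves coprime P', Q' whose finitely many common zeroes contain the
# zero-dimensional component.
from sympy import symbols, gcd, quo, solve

x, y = symbols('x y')

P = (x - y)*(x**2 + y**2 - 1)   # degree d1 = 3
Q = (x - y)*x                   # degree d2 = 2

R = gcd(P, Q)                   # the common factor (x - y, up to units)
Pp, Qp = quo(P, R), quo(Q, R)   # the coprime parts P', Q'
assert gcd(Pp, Qp) == 1

sols = solve([Pp, Qp], [x, y])  # common zeroes of the coprime parts
print(sols)
assert len(sols) <= 3 * 2       # Bezout bound d1*d2
```

The two isolated points $(0, \pm 1)$ survive the stripping, while the line $x = y$ (which would make the naive count infinite) is discarded.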
It is also not difficult to use Bezout’s theorem to handle the overdetermined case $m > n = 2$ in the plane:
Corollary 3 (Bezout’s theorem, yet again) Let $P_1, \ldots, P_m \in k[x,y]$ be polynomials of degree $d_1 \geq \ldots \geq d_m$ respectively. Then $\{ P_1 = \ldots = P_m = 0 \}^{(0)}$ has cardinality at most $d_1 d_2$.
Proof: We may assume all the $P_i$ are square-free. We write $P_1 = Q_2 R_2$, where $Q_2$ is coprime to $P_2$ and $R_2$ divides $P_2$ (and also $P_1$). We then write $R_2 = Q_3 R_3$, where $Q_3$ is coprime to $P_3$ and $R_3$ divides $P_3$ (and also $R_2$). Continuing in this fashion we obtain a factorisation $P_1 = Q_2 Q_3 \ldots Q_m R_m$, in which each $Q_j$ is coprime to $P_j$, and $R_m$ divides each of $P_1, \ldots, P_m$; in particular $\{ R_m = 0 \}$ lies in $\{ P_1 = \ldots = P_m = 0 \}$ and (if non-constant) is a curve, thus contributing no zero-dimensional components. One then observes that $\{ P_1 = \ldots = P_m = 0 \}^{(0)}$ is contained in the set $\bigcup_{j=2}^m \{ Q_j = P_j = 0 \}$, which by Theorem 1 has cardinality at most $\sum_{j=2}^m \hbox{deg}(Q_j) d_j$. Since $\sum_{j=2}^m \hbox{deg}(Q_j) \leq d_1$ and $d_j \leq d_2$, the claim follows.
Remark 1 Of course, in the overdetermined case $m > n$ one generically expects the solution set to be empty, but if there is enough degeneracy or numerical coincidence then non-empty solution sets can occur. In particular, by considering the case when $P_1$ is a product of $d_1$ distinct linear forms in $x$ only, and $P_2 = \ldots = P_m$ is a product of $d_2$ distinct linear forms in $y$ only, we see that the bound $d_1 d_2$ can be sharp in some cases. However, one can do a little better in this situation; by decomposing $P_1$ into irreducible components, for instance, one can improve the upper bound of $d_1 d_2$ slightly. However, this improvement seems harder to obtain in higher dimensions (see below).
Bezout’s theorem also extends to higher dimensions. Indeed, we have

Theorem 4 (Bezout’s theorem in higher dimensions) Let $P_1, \ldots, P_n: k^n \rightarrow k$ be polynomials of degree $d_1, \ldots, d_n$ respectively. Then $\{ P_1 = \ldots = P_n = 0 \}^{(0)}$ has cardinality at most $d_1 \ldots d_n$.
This is a standard fact, and can for instance be proved from the more general and powerful machinery of intersection theory. A typical application of this theorem is to show that, given a degree $d$ polynomial $P: {\bf R}^n \rightarrow {\bf R}$ over the reals, the number of connected components of $\{ x \in {\bf R}^n : P(x) \neq 0 \}$ is $O(d)^n$. The main idea of the proof is to locate a critical point of $P$ inside each connected component, and use Bezout’s theorem to count the number of zeroes of the polynomial map $\nabla P: {\bf R}^n \rightarrow {\bf R}^n$. (This doesn’t quite work directly because some components may be unbounded, and because the fibre of $\nabla P$ at the origin may contain positive-dimensional components, but one can use truncation and generic perturbation to deal with these issues; see my recent paper with Solymosi for further discussion.)
Bezout’s theorem can be extended to the overdetermined case $m \geq n$ as before:

Theorem 5 (Bezout’s inequality) Let $P_1, \ldots, P_m: k^n \rightarrow k$ be polynomials of degree $d_1 \geq \ldots \geq d_m$ respectively, with $m \geq n$. Then $\{ P_1 = \ldots = P_m = 0 \}^{(0)}$ has cardinality at most $d_1 \ldots d_n$.
Remark 2 Theorem 5 ostensibly only controls the zero-dimensional components of $\{ P_1 = \ldots = P_m = 0 \}$, but by throwing in a few generic affine-linear forms to the set of polynomials (thus intersecting the solution set with a bunch of generic hyperplanes) we can also control the total degree of all the $i$-dimensional components of $\{ P_1 = \ldots = P_m = 0 \}$ for any fixed $i$. (Again, by using intersection theory one can get a slightly more precise bound than this, but the proof of that bound is more complicated than the arguments given here.)
This time, though, it is a slightly non-trivial matter to deduce Theorem 5 from Theorem 4, due to the standard difficulty that the intersection of irreducible varieties need not be irreducible (which can be viewed in some ways as the source of many other related difficulties, such as the fact that not every algebraic variety is a complete intersection), and so one cannot evade all irreducibility issues merely by assuming that the original polynomials are irreducible. Theorem 5 first appeared explicitly in the work of Heintz.
As before, the most systematic way to establish Theorem 5 is via intersection theory. In this post, though, I would like to give a slightly more elementary argument (essentially due to Schmid), based on generically perturbing the polynomials in the problem; this method is less powerful than the intersection-theoretic methods, which can be used for a wider range of algebraic geometry problems, but suffices for the narrow objective of proving Theorem 5. The argument requires some of the “soft” or “qualitative” theory of algebraic geometry (in particular, one needs to understand the semicontinuity properties of preimages of dominant maps), as well as basic linear algebra. As such, the proof is not completely elementary, but it uses only a small amount of algebraic machinery, and I found it easier to understand than the intersection theory arguments.
Theorem 5 is a statement about arbitrary polynomials $P_1, \ldots, P_m$. However, it turns out (in the determined case $m = n$, at least) that upper bounds on the size of the solution set are Zariski closed properties, and so it will suffice to establish such bounds for generic polynomials $P_1, \ldots, P_n$. On the other hand, it is possible to use duality to deduce such upper bounds from a Zariski open condition, namely that a certain collection of polynomials are linearly independent. As such, to verify the generic case of this open condition, it suffices to establish this condition for a single family of polynomials, such as a family of monomials, in which case the condition can be verified by direct inspection. Thus we see an example of the somewhat strange strategy of establishing the general case from a specific one, using the generic case as an intermediate step.
Remark 3 There is an important caveat to note here, which is that these theorems only hold for algebraically closed fields, and in particular can fail over the reals ${\bf R}$. For instance, in ${\bf R}^3$, the polynomials

$\displaystyle P_1(x,y,z) = \left(\prod_{i=1}^N (x-i)\right)^2 + \left(\prod_{j=1}^N (y-j)\right)^2, \quad P_2(x,y,z) = P_3(x,y,z) = z$

have degrees $2N, 1, 1$ respectively, but their common zero locus $\{ (i,j,0) : 1 \leq i, j \leq N \}$ has cardinality $N^2$, far exceeding the bound $d_1 d_2 d_3 = 2N$ of Theorem 5 for $N$ large. In some cases one can safely obtain incidence bounds in ${\bf R}^n$ by embedding ${\bf R}^n$ inside ${\bf C}^n$, but as the above example shows, one needs to be careful when doing so.
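The failure over the reals is easy to verify numerically; the following check (with $N = 3$, a concrete instance of my own construction in this spirit) exhibits $N^2 = 9$ isolated real common zeroes for three polynomials whose degrees multiply to only $2N = 6$.

```python
# Over R, the system P1 = P2 = P3 = 0 with
#   P1 = ((x-1)(x-2)(x-3))^2 + ((y-1)(y-2)(y-3))^2,  P2 = P3 = z
# has degrees 6, 1, 1 but N^2 = 9 isolated real zeroes (i, j, 0),
# exceeding the algebraically-closed bound d1*d2*d3 = 6.
from sympy import symbols

x, y, z = symbols('x y z')
N = 3
P1 = ((x-1)*(x-2)*(x-3))**2 + ((y-1)*(y-2)*(y-3))**2
P2 = P3 = z

pts = [(i, j, 0) for i in range(1, N+1) for j in range(1, N+1)]
assert all(P1.subs({x: a, y: b}) == 0 and P2.subs(z, c) == 0
           for (a, b, c) in pts)
print(len(pts), 2*N)   # 9 6
```

Over ${\bf C}$ there is no contradiction: there the zero set of $P_1$ is a (positive-dimensional) curve, and the grid points stop being zero-dimensional components.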
— 1. The determined case —
We begin by establishing Theorem 4. Fix $n$ and $d_1, \ldots, d_n$. If one wishes, one can dispose of the trivial case $n = 1$ (which follows from the fundamental theorem of algebra) and assume $n \geq 2$, although this is not strictly necessary for the argument below.
As mentioned in the introduction, the first idea is to generically perturb the $P_1, \ldots, P_n$. To this end, let $\overline{V}$ be the vector space $\overline{V} := V_{d_1} \times \ldots \times V_{d_n}$, where $V_d$ is the vector space of all polynomials $P: k^n \rightarrow k$ of degree at most $d$; thus $\overline{V}$ is the configuration space of all possible tuples $(P_1, \ldots, P_n)$. This is a finite-dimensional vector space over $k$ with an explicit dimension which we will not need to compute here. The set in (1) can thus be viewed as the fibre over $(P_1, \ldots, P_n)$ of the algebraic set $\Sigma \subset \overline{V} \times k^n$, defined as

$\displaystyle \Sigma := \{ ((P_1,\ldots,P_n), x) \in \overline{V} \times k^n : P_1(x) = \ldots = P_n(x) = 0 \}.$
This algebraic set is cut out by $n$ polynomial equations on $\overline{V} \times k^n$ and thus is the union of finitely many algebraic varieties $\Sigma_1, \ldots, \Sigma_r$, each of which has codimension at most $n$ in $\overline{V} \times k^n$. In particular, each $\Sigma_i$ has dimension at least $\hbox{dim}(\overline{V})$.
Consider one of the components $\Sigma_i$ of $\Sigma$, and let $\pi: \Sigma_i \rightarrow \overline{V}$ denote the projection map. Its image $\pi(\Sigma_i)$ in $\overline{V}$ will be Zariski dense in its closure $\overline{\pi(\Sigma_i)}$, which is necessarily an irreducible variety since $\Sigma_i$ is, and the projection map from $\Sigma_i$ to $\overline{\pi(\Sigma_i)}$ will, by definition, be a dominant map. By basic algebraic geometry, the preimages in $\Sigma_i$ of a point in $\overline{\pi(\Sigma_i)}$ will generically have dimension $\hbox{dim}(\Sigma_i) - \hbox{dim}(\overline{\pi(\Sigma_i)})$ (i.e. this dimension will occur outside of an algebraic subset of $\overline{\pi(\Sigma_i)}$ of positive codimension), and even in the non-generic case will have dimension at least $\hbox{dim}(\Sigma_i) - \hbox{dim}(\overline{\pi(\Sigma_i)})$. Since $\hbox{dim}(\Sigma_i) \geq \hbox{dim}(\overline{V}) \geq \hbox{dim}(\overline{\pi(\Sigma_i)})$, we thus see that the only components $\Sigma_i$ that can contribute to the zero-dimensional set $\{ P_1 = \ldots = P_n = 0 \}^{(0)}$ are those with $\hbox{dim}(\Sigma_i) = \hbox{dim}(\overline{\pi(\Sigma_i)}) = \hbox{dim}(\overline{V})$. Now, for such a component, the projection map $\pi: \Sigma_i \rightarrow \overline{\pi(\Sigma_i)}$ is a dominant map between two varieties of the same dimension, and is thus generically finite, with the preimages generically having some constant cardinality $c_i$, and non-generically the preimages have a zero-dimensional component of cardinality at most $c_i$. (Topologically, one can see this claim (at least in the model case $k = {\bf C}$) as follows. Any isolated point $x_0$ in a non-generic preimage must perturb to a zero-dimensional set or to an empty set in nearby preimages, since $\Sigma_i$ is closed. On a generic preimage, $x_0$ must perturb to a zero-dimensional set (for if it generically perturbs to the empty set, the dimension of $\Sigma_i$ will be too small); since a generic preimage has $c_i$ points, we conclude that the non-generic preimage can contain at most $c_i$ isolated points. For a more detailed proof of this claim, see Proposition 1 of Heintz’s paper.)
As a consequence of this analysis, we see that the generic fibre always has at least as many zero-dimensional components as a non-generic fibre, and so to establish Theorem 4, it suffices to do so for generic $(P_1, \ldots, P_n)$.
Now take $(P_1, \ldots, P_n)$ to be generic. We know that generically, the set $Z = \{ x \in k^n : P_1(x) = \ldots = P_n(x) = 0 \}$ is finite; we seek to bound its cardinality by $d_1 \ldots d_n$. To do this, we dualise the problem. Let $W$ be the space of all affine-linear forms $\ell: k^n \rightarrow k$; this is an $(n+1)$-dimensional vector space over $k$. We consider the set $Z^* \subset W$ of all affine-linear forms whose kernel intersects $Z$. This is a union of $|Z|$ hyperplanes in $W$, and is thus a hypersurface of degree $|Z|$. Thus, to upper bound the size of $Z$, it suffices to upper bound the degree of the hypersurface $Z^*$, and this can be done by finding a non-zero polynomial of controlled degree that vanishes identically on $Z^*$. The point of this observation is that the property of a polynomial being non-zero is a Zariski-open condition, and so we have a chance of establishing the generic case from a special case.
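The dualisation step can be made concrete with a toy computation (an example of my own choosing): for a two-point set $Z$, the forms $\ell = a + bx + cy$ that vanish somewhere on $Z$ form a union of two hyperplanes in the coefficient space, cut out by a polynomial of degree $|Z|$.

```python
# The dual set Z* of Z = {(0,1), (0,-1)}: an affine-linear form
# ell = a + b*x + c*y lies in Z* iff ell vanishes at some point of Z,
# so Z* is cut out by the product of ell evaluated at the points of Z,
# a polynomial of degree |Z| in the coefficients (a, b, c).
from sympy import symbols, expand, Poly

a, b, c = symbols('a b c')
Z = [(0, 1), (0, -1)]

F = expand((a + b*Z[0][0] + c*Z[0][1]) * (a + b*Z[1][0] + c*Z[1][1]))
print(F)                                   # a**2 - c**2
assert Poly(F, a, b, c).total_degree() == len(Z)
```

Conversely, knowing that some non-zero polynomial of degree $D$ vanishes identically on $Z^*$ forces $|Z| \leq D$, since a union of $|Z|$ distinct hyperplanes cannot be cut out by a polynomial of smaller degree.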
Now let us look for a (generically non-trivial) polynomial on $W$ of degree at most $d_1 \ldots d_n$ that vanishes on $Z^*$. The idea is to try to dualise the assertion that the monomials $x_1^{a_1} \ldots x_n^{a_n}$ with $0 \leq a_i < d_i$ for all $i$ (of which there are exactly $d_1 \ldots d_n$) generically span the function ring of $Z$, to become a statement about $Z^*$.
Let $D$ be a sufficiently large integer, and let $V_D$ be the space of all polynomials of degree at most $D$. This is a finite-dimensional vector space over $k$, generated by the monomials $x_1^{a_1} \ldots x_n^{a_n}$ with $a_1, \ldots, a_n$ non-negative integers adding up to at most $D$. We can split $V_D$ as a direct sum of the span $V'$ of the monomials with $a_i < d_i$ for all $i$, and the span of the remaining monomials, each of which is divisible by $x_i^{d_i}$ for some $i$.

Now, given an affine-linear form $\ell \in W$, consider the sum

$\displaystyle \ell \cdot V' + P_1 \cdot V_{D - d_1} + \ldots + P_n \cdot V_{D - d_n}. \ \ \ \ \ (5)$

This is a subspace of $V_D$ for $D$ large enough. In view of the above splitting of $V_D$, we see that this sum can equal all of $V_D$ in the case when $P_i = x_i^{d_i}$ for all $i$ and $\ell = 1$. Since spanning is equivalent to the non-vanishing of a maximal minor of the matrix of coefficients of the generators, the property of (5) spanning all of $V_D$ is a Zariski-open condition, and thus we see that (5) spans $V_D$ for generic choices of $P_1, \ldots, P_n$ and $\ell$.

On the other hand, suppose that $\ell \in Z^*$, thus $\ell(x_0) = 0$ for some $x_0 \in Z$. Then observe that each generator of (5) lies in the hyperplane $\{ Q \in V_D : Q(x_0) = 0 \}$ of $V_D$, and so (5) does not span $V_D$ in this case. Thus, for generic $P_1, \ldots, P_n$, we see that (5) spans $V_D$ for generic $\ell$ but not for any $\ell$ in $Z^*$.

The property of (5) spanning $V_D$ is equivalent to a certain resultant-like determinant of a matrix being non-zero, where the rows are given by the generators $\ell \cdot x^a$ (for the monomials $x^a$ spanning $V'$) and $P_i \cdot x^b$ (for the monomials $x^b$ spanning $V_{D - d_i}$), and the columns are indexed by the monomials generating $V_D$. For generic $P_1, \ldots, P_n$, this determinant is a non-trivial polynomial in the coefficients of $\ell$ which vanishes on $Z^*$; an inspection of the matrix reveals that this determinant has degree at most $\hbox{dim}(V') = d_1 \ldots d_n$ in the coefficients of $\ell$, since only the rows coming from $\ell \cdot V'$ involve $\ell$. Thus we have found a non-trivial polynomial of degree at most $d_1 \ldots d_n$ that vanishes on $Z^*$, and the claim follows.
— 2. The overdetermined case —
Now we establish Theorem 5. Fix $P_1, \ldots, P_m$ with $m \geq n$ and degrees $d_1 \geq \ldots \geq d_m$, write $Z := \{ x \in k^n : P_1(x) = \ldots = P_m(x) = 0 \}$, and let $Z'$ be the union of all the positive-dimensional components of $Z$, thus $Z^{(0)} = Z \backslash Z'$.
The main idea here is to perturb the $P_1, \ldots, P_m$ into a regular sequence of polynomials outside of $Z'$. More precisely, we have
Lemma 6 (Regular sequence) For any $1 \leq i \leq n$, one can find polynomials $Q_1, \ldots, Q_i$, such that for each $1 \leq j \leq i$, $Q_j$ is a linear combination of $P_j, P_{j+1}, \ldots, P_m$ (and thus has degree at most $d_j$), with the coefficient of $P_j$ being non-zero, and the set $\{ Q_1 = \ldots = Q_i = 0 \} \backslash Z'$ in $k^n$ has dimension at most $n - i$ (in the sense that it is covered by finitely many varieties of dimension at most $n - i$).
Proof: We establish this claim by induction on $i$. For $i = 1$ the claim follows by setting $Q_1 := P_1$. Now suppose inductively that $1 < i \leq n$, and that $Q_1, \ldots, Q_{i-1}$ have already been found with the stated properties for $i - 1$.
By construction, the polynomials $Q_1, \ldots, Q_{i-1}$ are linear combinations of $P_1, \ldots, P_m$, with a triangular structure: each $Q_j$ involves only $P_j, \ldots, P_m$, with the coefficient of $P_j$ non-zero. Thus, on the set $\{ Q_1 = \ldots = Q_{i-1} = 0 \} \backslash Z'$, the polynomials $P_i, \ldots, P_m$ can only simultaneously vanish on the zero-dimensional set $Z^{(0)}$ (if $P_i, \ldots, P_m$ all vanish at such a point, the triangular structure forces $P_{i-1}, \ldots, P_1$ to vanish there as well, placing the point in $Z \backslash Z' = Z^{(0)}$). On the other hand, each irreducible component of $\{ Q_1 = \ldots = Q_{i-1} = 0 \} \backslash Z'$ of dimension exceeding $n - i$ has dimension $n - i + 1$, which is positive. Thus it is not possible for $P_i, \ldots, P_m$ to simultaneously vanish on any of these components. If one then sets $Q_i$ to be a generic linear combination of $P_i, \ldots, P_m$, then we thus see that $Q_i$ will also not vanish identically on any of these components, and so $\{ Q_1 = \ldots = Q_i = 0 \} \backslash Z'$ has dimension at most $n - i$. Also, generically the coefficient of $P_i$ in $Q_i$ is non-zero, and the claim follows.
From the above lemma (with $i = n$) we see that $Z^{(0)}$ is contained in the set $\{ Q_1 = \ldots = Q_n = 0 \}^{(0)}$, where each $Q_j$ has degree at most $d_j$. By Theorem 4, the latter set has cardinality at most $d_1 \ldots d_n$, and the claim follows.
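To close, here is a toy instance of this reduction (with polynomials of my own choosing, not from the post): three polynomials in two variables whose zero set is a line plus one isolated point, where replacing $P_2, P_3$ by a single generic combination $Q_2$ yields a determined system that still captures the isolated point.

```python
# Lemma 6 in action for n = 2, m = 3: the system
#   P1 = x(x-1), P2 = x(y-1), P3 = x(x+y-2)
# has zero set Z = {x = 0} (a line, so Z' = {x = 0}) together with the
# isolated point (1,1), i.e. Z^(0) = {(1,1)}.
from sympy import symbols, quo, solve

x, y = symbols('x y')
P1 = x*(x - 1)
P2 = x*(y - 1)
P3 = x*(x + y - 2)

# Q1 = P1, and Q2 is a (fixed, but typical) generic combination of P2, P3.
Q1 = P1.expand()
Q2 = (2*P2 + 3*P3).expand()

# Off the line Z' = {x = 0} we may divide both Q's by x; the remaining
# determined system picks out the zero-dimensional part.
sols = solve([quo(Q1, x), quo(Q2, x)], [x, y], dict=True)
print(sols)                    # [{x: 1, y: 1}]
assert len(sols) <= 2 * 2      # Bezout bound d1*d2 for degrees 2, 2
```

The single surviving solution $(1,1)$ is exactly $Z^{(0)}$, comfortably within the Theorem 5 bound $d_1 d_2 = 4$.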