Notational convention: As in Notes 2, I will colour a statement red in this post if it assumes the axiom of choice.  We will, of course, rely on every other axiom of Zermelo-Frankel set theory here (and in the rest of the course).  \diamond

In this course we will often need to iterate some sort of operation “infinitely many times” (e.g. to create a infinite basis by choosing one basis element at a time).  In order to do this rigorously, we will rely on Zorn’s lemma:

Zorn’s Lemma. Let (X, \leq) be a non-empty partially ordered set, with the property that every chain (i.e. a totally ordered set) in X has an upper bound.  Then X contains a maximal element (i.e. an element with no larger element).

Indeed, we have used this lemma several times already in previous notes.  Given the other standard axioms of set theory, this lemma is logically equivalent to

Axiom of choice. Let X be a set, and let {\mathcal F} be a collection of non-empty subsets of X.  Then there exists a choice function f: {\mathcal F} \to X, i.e. a function such that f(A) \in A for all A \in {\mathcal F}.

One implication is easy:

Proof of axiom of choice using Zorn’s lemma. Define a partial choice function to be a pair ({\mathcal F}', f'), where {\mathcal F}' is a subset of {\mathcal F} and f': {\mathcal F}' \to X is a choice function for {\mathcal F'}.  We can partially order the collection of partial choice functions by writing ({\mathcal F}', f') \leq ({\mathcal F}'', f'') if {\mathcal F}' \subset {\mathcal F}'' and f” extends f’.  The collection of partial choice functions is non-empty (since it contains the pair (\emptyset, ()) consisting of the empty set and the empty function), and it is easy to see that any chain of partial choice functions has an upper bound (formed by gluing all the partial choices together).  Hence, by Zorn’s lemma, there is a maximal partial choice function ({\mathcal F}_*, f_*).  But the domain {\mathcal F}_* of this function must be all of {\mathcal F}, since otherwise one could enlarge {\mathcal F}_* by a single set A and extend f_* to A by choosing a single element of A.  (One does not need the axiom of choice to make a single choice, or finitely many choices; it is only when making infinitely many choices that the axiom becomes necessary.)  The claim follows. \Box

In the rest of these notes I would like to supply the reverse implication, using the machinery of well-ordered sets.  Instead of giving the shortest or slickest proof of Zorn’s lemma here, I would like to take the opportunity to place the lemma in the context of several related topics, such as ordinals and transfinite induction, noting that much of this material is in fact independent of the axiom of choice.  The material here is standard, but for the purposes of this course one may simply take Zorn’s lemma as a “black box” and not worry about the proof, so this material is optional.

— Well-ordered sets —

To prove Zorn’s lemma, we first need to strengthen the notion of a totally ordered set.

Definition 1. A well-ordered set is a totally ordered set X = (X,\leq) such that every non-empty subset A of X has a minimal element \min(A) \in A.  Two well-ordered sets X, Y are isomorphic if there is an order isomorphism \phi: X \to Y between them, i.e. a bijection \phi which is monotone (\phi(x) < \phi(x') whenever x < x').

Example 1. The natural numbers are well-ordered (this is the well-ordering principle), as is any finite totally ordered set (including the empty set), but the integers, rationals, or reals are not well-ordered. \diamond

Example 2. Any subset of a well-ordered set is again well-ordered.  In particular, if a, b are two elements of a well-ordered set, then intervals such as {}[a,b] := \{ c \in X: a \leq c \leq b \}, {}[a,b) := \{ c \in X: a \leq c < b \}, etc. are also well-ordered. \diamond

Example 3. If X is a well-ordered set, then the ordered set X \oplus \{+\infty\}, defined by adjoining a new element +\infty to X and declaring it to be larger than all the elements of X, is also well-ordered.  More generally, if X and Y are well-ordered sets, then the ordered set X \oplus Y, defined as the disjoint union of X and Y, with any element of Y declared to be larger than any element of X, is also well-ordered.  Observe that the operation \oplus is associative (up to isomorphism), but not commutative in general: for instance, {\Bbb N} \oplus \{\infty\} is not isomorphic to \{\infty\} \oplus {\Bbb N}. \diamond

Example 4. If X, Y are well-ordered sets, then the ordered set X \otimes Y, defined as the Cartesian product X \times Y with the lexicographical ordering (thus (x,y) \leq (x',y') if x < x', or if x=x' and y \leq y'), is again a well-ordered set.  Again, this operation is associative (up to isomorphism) but not commutative.  Note that we have one-sided distributivity: (X \oplus Y) \otimes Z is isomorphic to (X \otimes Z) \oplus (Y \otimes Z), but Z \otimes (X \oplus Y) is not isomorphic to (Z \otimes X) \oplus (Z \otimes Y) in general. \diamond

Remark 1. The axiom of choice is trivially true in the case when X is well-ordered, since one can take \min to be the choice function.  Thus, the axiom of choice follows from the well-ordering theorem (every set has at least one well-ordering)Conversely, we will be able to deduce the well-ordering theorem from Zorn’s lemma (and hence from the axiom of choice): see Exercise 11 below. \diamond

One of the reasons that well-ordered sets are useful is that one can perform induction on them.  This is easiest to describe for the principle of strong induction:

Exercise 1. (Strong induction on well-ordered sets)  Let X be a well-ordered set, and let P: X \mapsto \{ \hbox{true}, \hbox{false}\} be a property of elements of X.  Suppose that whenever x \in X is such that P(y) is true for all y<x, then P(x) is true.  Then P(x) is true for every x \in X. This is called the principle of strong induction.  Conversely, show that a totally ordered set X enjoys the principle of strong induction if and only if it is well-ordered.  (For partially ordered sets, the corresponding notion is that of being well-founded.) \diamond

To describe the analogue of the ordinary principle of induction for well-ordered sets, we need some more notation.  Given a subset A of a non-empty well-ordered set X, we define the supremum \sup(A) \in X \oplus \{+\infty\} of A to be the least upper bound

\sup(A) := \min( \{ y \in X \oplus \{+\infty\}: x \leq y \hbox{ for all } x \in A \} (1)

of A (thus for instance the supremum of the empty set is \min(X)).  If x \in X, we define the successor \hbox{succ}(x) \in X \oplus \{+\infty\} by the formula

\hbox{succ}(x) := \min( (x,+\infty] ). (2)

We have the following Peano-type axioms:

Exercise 2. If x is an element of a non-empty well-ordered set X, show that exactly one of the following statements hold:

  1. (Limit case) x = \sup( [\min(X),x) ).
  2. (Successor case) x = \hbox{succ}(y) for some y.

In particular, \min(X) is not the successor of any element in X. \diamond

Exercise 3. Show that if x, y are elements of a well-ordered set such that \hbox{succ}(x)=\hbox{succ}(y), then x=y. \diamond

Exercise 4. (Transfinite induction for well-ordered sets)   Let X be a non-empty well-ordered set, and let P: X \mapsto \{ \hbox{true}, \hbox{false}\} be a property of elements of X. Suppose that

  1. (Base case) P( \min(X) ) is true.
  2. (Successor case) If x \in X and P(x) is true, then P(\hbox{succ}(x)) is true.
  3. (Limit case) If x = \sup([\min(X),x)) and P(y) is true for all y<x, then P(x) is true.  [Note that this subsumes the base case.]

Then P(x) is true for all x \in X\diamond

Remark 2. The usual Peano axioms for succession are the special case of Exercises 2-4 in which the limit case of Exercise 2 only occurs for \min(X) (which is denoted 0), and the successor function never attains +\infty.  With these additional axioms, X is necessarily isomorphic to {\Bbb N}\diamond

Now we introduce two more key concepts.

Definition 2. An initial segment of a well-ordered set X is a subset Y of X such that {}[\min(X),y] \subset Y for all y \in Y (i.e. whenever y lies in Y, all elements of X that are less than y also lie in Y).

A morphism from one well-ordered set X to another Y is a map \phi: X \to Y which is strictly monotone (thus \phi(x) < \phi(x') whenever x < x') and such that \phi(X) is an initial segment of Y.

Example 5. The only morphism from \{1,2,3\} to \{1,2,3,4,5\} is the inclusion map.  There is no morphism from \{1,2,3,4,5\} to \{1,2,3\}\diamond

Remark 3. With this notion of a morphism, the class of well-ordered sets becomes a category. \diamond

We can identify the initial segments of X with elements of X \cup \{+\infty\}:

Exercise 5. Let X be a non-empty well-ordered set.  Show that every initial segment I of X is of the form I = [\min(X),a) for exactly one a \in X \cup \{+\infty\}. \diamond

Exercise 6. Show that an arbitrary union or arbitrary intersection of initial segments is again an initial segment. \diamond

Exercise 7. Let \phi: X \to Y be a morphism.  Show that \phi maps initial segments of X to initial segments of Y.  If x, x' \in X is such that x’ is the successor of x, show that \phi(x') is the successor of \phi(x)\diamond

As Example 5 suggests, there are very few morphisms between well-ordered sets.  Indeed, we have

Proposition 1. (Uniqueness of morphisms) Given two well-ordered sets X and Y, there is at most one morphism from X to Y.

Proof. Suppose we have two morphisms \phi: X \to Y, \psi: X \to Y.  By using transfinite induction (Exercise 4) and Exercise 7, one can show that \phi, \psi agree on {}[\min(X),a) for every a \in X \oplus \{+\infty\}; setting a=+\infty gives the claim.  \Box

Exercise 8. (Schroder-Bernstein theorem for well-ordered sets) Show that two well-ordered sets X, Y are isomorphic if and only if there is a morphism from X to Y, and a morphism from Y to X.  \diamond

We can complement the uniqueness in Proposition 1 with existence:

Proposition 2. (Existence of morphisms) Given two well-ordered sets X and Y, there is either a morphism from X to Y or a morphism from Y to X.

Proof. Call an element a \in X \oplus \{+\infty\} good if there is a morphism \phi_a from {}[\min(X),a) to Y, thus \min(X) is good.  If +\infty is good, then we are done.  From uniqueness we see that if every element in a set A is good, then the supremum \sup(A) is also good.  Applying transfinite induction (Exercise 4), we thus see that we are done unless there exists a good a \in X such that \hbox{succ}(a) is not good.  By Exercise 5, \phi_a( [\min(X),a) ) = [\min(Y),b) for some b \in Y \oplus \{+\infty\}.  If b \in Y then we could extend the morphism \phi_a to {}[\min(X),a] = [\min(X),\hbox{succ}(a)) by mapping a to b, contradicting the fact that \hbox{succ}(a) is not good; thus b=+\infty and so \phi_a is surjective.  It is then easy to check that \phi_a^{-1} exists and is a morphism from Y to X, and the claim follows. \Box

Remark 4. Formally, Proposition 1, Exercise 8, and Proposition 2 tell us that the collection of all well-ordered sets, modulo isomorphism, is totally ordered by declaring one well-ordered set X to be at least as large as another Y when there is a morphism from Y to X.  However, this is not quite the case, because the collection of well-ordered sets is only a class rather than a set.  Indeed, as we shall soon see, this is not a technicality, but is in fact a fundamental fact about well-ordered sets that lies at the heart of Zorn’s lemma.  (From Russell’s paradox we know that the notions of class and set are necessarily distinct.)  \Box

— Ordinals —

As we learn very early on in our mathematics education, a finite set of a certain cardinality (e.g. a set \{a,b,c,d,e\}) can be put in one-to-one correspondence with a “standard” set of the same cardinality (e.g. the set \{1,2,3,4,5\}); two finite sets have the same cardinality if and only if they correspond to the same “standard” set \{1,\ldots,N\}).  (The same fact is true for infinite sets; see Exercise 12 below.) Similarly, we would like to place every well-ordered set in a “standard” form.  This motivates

Definition 3. A representation \rho of the well-ordered sets is an assignment of a well-ordered set \rho(X) to every well-ordered set X such that

  1. \rho(X) is isomorphic to X for every well-ordered set X.  (In particular, if \rho(X) and \rho(Y) are equal, then X and Y are isomorphic.)
  2. If there exists a morphism from X to Y, then \rho(X) is a subset of \rho(Y) (and the order structure on \rho(X) is induced from that on \rho(Y).  (In particular, if X and Y are isomorphic, then \rho(X) and \rho(Y) are equal.)

Remark 5. In the language of category theory, a representation is a covariant functor from the category of well-ordered sets to itself which turns all morphisms into inclusions, and which is naturally isomorphic to the identity functor. \diamond

Remark 6. Because the collection of all well-ordered sets is a class rather than a set, \rho is not actually a function (it is sometimes referred to as a class function).   \diamond

It turns out that several representations of the well-ordered sets exist.  The most commonly used one is that of the ordinals, defined by von Neumann as follows.

Definition 4. An ordinal is a well-ordered set \alpha with the property that x = \{ y \in \alpha: y < x \} for all x \in \alpha.  (In particular, each element of \alpha is also a subset of \alpha, and the strict order relation < on \alpha is identical to the set membership relation \in.)

Example 6. For each natural number n = 0,1,2,\ldots, define the ordinal number n^{th} recursively by setting 0^{th} := \emptyset and n^{th} := \{0^{th}, 1^{th},\ldots, (n-1)^{th} \} for all n \geq 1, thus for instance

0^{th} := \emptyset

1^{th} := \{0^{th}\} = \{ \emptyset \}

2^{th} := \{0^{th}, 1^{th} \} = \{ \emptyset, \{ \emptyset \} \} (3)

3^{th} := \{ 0^{th}, 1^{th}, 2^{th} \} = \{ \emptyset, \{ \emptyset \}, \{ \emptyset, \{\emptyset\}\} \},

and so forth.  (Of course, to be compatible with the English language conventions for ordinals, we should write 1^{st} instead of 1^{th}, etc., but let us ignore this discrepancy.)  One can easily check by induction that n^{th} is an ordinal for every n.  Furthermore, if we define \omega := \{ n^{th}: n \in {\Bbb N} \}, then \omega is also an ordinal. (In the foundations of set theory, this construction, together with the axiom of infinity, is sometimes used to define the natural numbers (so that n = n^{th} for all natural numbers n), although this construction can lead to some conceptually strange-looking consequences that blur the distinction between numbers and sets, such as 3 \in 5 and 4 = \{0,1,2,3\}.) \diamond

The fundamental theorem about ordinals is

Theorem 1.

  1. Given any two ordinals \alpha, \beta, one is a subset of the other (and the order structure on \alpha is induced from that on \beta).
  2. Every well-ordered set X is isomorphic to exactly one ordinal \hbox{ord}(X).

In particular, \hbox{ord} is a representation of the well-ordered sets.

Proof. We first prove 1.  From Proposition 2 and symmetry, we may assume that there is a morphism \phi from \alpha to \beta.   By strong induction (Exercise 1) and Definition 4, we see that \phi(x)=x for all x \in \alpha, and so \phi is the inclusion map from \alpha into \beta.  The claim follows.

Now we prove 2.  If uniqueness failed, then we would have two distinct ordinals that are isomorphic to each other, but as one ordinal is a subset of the other, this would contradict Proposition 1 (the inclusion morphism is not an isomorphism); so it suffices to prove existence.

We use transfinite induction.  It suffices to show that for every a \in X \oplus \{+\infty\}, that {}[\min(X),a) is isomorphic to an ordinal \alpha(a) (which we know to be unique).  This is of course true in the base case a = \min(X).  To handle the successor case a = \hbox{succ}(b), we set \alpha(a) := \alpha(b) \cup \{ \alpha(b)\}, which is easily verified to be an ordinal isomorphic to {}[\min(X).a).  To handle the limit case a = \sup( [\min(X), a) ), we take all the ordinals associated to elements in {}{[}\min(X), a{)} and take their union (here we rely crucially on the axiom schema of replacement and the axiom of union); by use of part 1 one can show that this union is an ordinal isomorphic to a as required.  \Box

Remark 7. Operations on well-ordered sets, such as the sum \oplus and product \otimes defined in Examples 3, 4, induce corresponding operations on ordinals, leading to ordinal arithmetic, which we will not discuss here. (Note that the convention for which order multiplication proceeds in is swapped in some of the literature, thus \alpha \beta would be the ordinal of \beta \otimes \alpha rather than \alpha \otimes \beta.) \diamond

Exercise 9. (Ordinals are themselves well-ordered)  Let {\mathcal F} be a non-empty class of ordinals.  Show that there is a least ordinal \min({\mathcal F}) in this class, which is a subset of all the other ordinals in this class.  In particular, this shows that any set of ordinals is well-ordered by set inclusion. \diamond

Remark 8. Because of Exercise 9, we can meaningfully talk about “the least ordinal obeying property P”, as soon as we can exhibit at least one ordinal with that property P.  For instance, once one can demonstrate the existence of an uncountable ordinal (which follows from Exercise 11 below), one can talk about the least uncountable ordinal.  \diamond

Exercise 10. (Transfinite induction for ordinals)  Let P(\alpha) be a property pertaining to ordinals \alpha.  Suppose that

  1. (Base case) P(\emptyset) is true.
  2. (Successor case) If \alpha = \beta \cup \{\beta\} for some ordinal \beta, and P(\beta) is true, then P(\alpha) is true.
  3. (Limit case) If \alpha = \bigcup_{\beta \in \alpha} \beta, and P(\beta) is true for all \beta \in \alpha, then P(\alpha) is true.

Show that P(\alpha) is true for every ordinal \alpha. \diamond

Now we show a fundamental fact, that the well-ordered sets are just too “numerous” to all fit inside a single set, even modulo isomorphism.

Theorem 2. There does not exist a set A and a representation \rho of the well-ordered sets such that \rho(X) \in A for all well-ordered sets X.

Proof. By Theorem 1, any two distinct ordinals are non-isomorphic, and so get mapped under \rho to a different element of A.  Thus we can identify the class of ordinals with a subset of A, and so the class of ordinals is in fact a set.  In particular, by the axiom of union, we may take the union of all the ordinals, which one can verify to be another ordinal \varepsilon_0.  But then \varepsilon_0 \cup \{ \varepsilon_0\} is another ordinal, which implies that \varepsilon_0 \in \varepsilon_0, which contradicts the axiom of foundation. \Box

Remark 9. It is also possible to prove Theorem 2 without the theory of ordinals, or the axiom of foundation.  One first observes (by transfinite induction) that given two well-ordered sets X, X’, one of the sets \rho(X), \rho(X') is a subset of the other.  Because of this, one can show that the union S of all the \rho(X) (where X ranges over all well-ordered sets) is well-defined (because the \rho(X) form a subset of A) and well-ordered.  Now we look at the well-ordered set S \cup \{+\infty\}; by Proposition 1, it is not isomorphic to any subset of S, but \rho(S \cup \{+\infty\}) is necessarily contained in S, a contradiction. \diamond

Remark 10. The same argument also shows that there is no representation of the ordinals inside a given set; the ordinals are “too big” to be placed in anything other than a class.  \diamond

— Zorn’s lemma —

Now we can prove Zorn’s lemma. The key proposition is

Proposition 2. Let X be a partially ordered set, and let {\mathcal C} be the set of all well-ordered sets in X.  Then there does not exist a function g: {\mathcal C} \to X such that g(C) is a strict upper bound for C (i.e. g(C) > x for all x \in C) for all well-ordered C \in {\mathcal C}.

Proof. Suppose for contradiction that there existed X and g with the above properties.  Then, given any well-ordered set Y, we claim that there exists exactly one isomorphism \phi_Y: Y \to \rho(Y) from Y to a well-ordered set \rho(Y) in X such that \phi_Y(y) = g( \phi_Y([\min(Y),y) ) ) for all y \in Y.  Indeed, the uniqueness and existence can both be established by a transfinite induction that we leave as an exercise. (Informally, \phi_Y is what one gets by “applying g Y times, starting with the empty set”.)  From uniqueness we see that \rho(Y)=\rho(Y') whenever Y and Y’ are isomorphic, and another transfinite induction shows that \rho(Y) \subset \rho(Y') whenever Y is a subset of Y’.  Thus \rho is a representation of the ordinals.  But this contradicts Theorem 2. \Box

Remark 11. One can use transfinite induction on ordinals rather than well-ordered sets if one wishes here, using Remark 10 in place of Theorem 2.  \diamond

Proof of Zorn’s lemma. Suppose for contradiction that one had a non-empty partially ordered set X without maximal elements, such that every chain had an upper bound.  As there are no maximal elements, every element in X must be bounded by a strictly larger element in X, and so every chain in fact has a strict upper bound; in particular every well-ordered set has a strict upper bound.  Applying the axiom of choice, we may thus find a choice function g: {\mathcal C} \to X from the space of well-ordered sets in X to X, that maps every such set to a strict upper bound.  But this contradicts Proposition 2. \Box

Remark 12. It is important for Zorn’s lemma that X is a set, rather than a class.  Consider for instance the class of all ordinals.  Every chain of ordinals has an upper bound (namely, the union of the ordinals in that chain), and the class is certainly non-empty, but there is no maximal ordinal.  (Compare also Theorem 1 and Theorem 2.) \diamond

Remark 13. It is also important that every chain have an upper bound, and not just countable chains.  Indeed, the collection of countable subsets of an uncountable set (such as {\Bbb R}) is non-empty, and every countable chain has an upper bound, but there is no maximal element.  \diamond

Remark 14. The above argument shows that the hypothesis of Zorn’s lemma can be relaxed slightly; one does not need every chain to have an upper bound, merely every well-ordered set needs to have one.  But I do not know of any application in which this apparently stronger version of Zorn’s lemma dramatically simplifies an argument.  (In practice, either Zorn’s lemma can be applied routinely, or it fails utterly to be applicable at all.) \diamond

Exercise 11. Use Zorn’s lemma to establish the well-ordering theorem (every set has at least one well-ordering).   \diamond

Remark 15. By the above exercise, {\Bbb R} can be well-ordered. However, if one drops the axiom of choice from the axioms of set theory, one can no longer prove that {\Bbb R} is well-ordered.  Indeed, given a well-ordering of {\Bbb R}, it is not difficult (using Remark 1) to remove the axiom of choice from the Banach-Tarski constructions in Notes 2, and thus obtain constructions of non-measurable subsets of {\Bbb R}.  But a deep theorem of Solovay gives a model of set theory (without the axiom of choice) in which every set of reals is measurable.  \diamond

Exercise 12. Define a (von Neumann) cardinal to be an ordinal \alpha with the property that all smaller ordinals have strictly lesser cardinality (i.e. cannot be placed in one-to-one correspondence with \alpha).  Show that every set can be placed in one-to-one correspondence with exactly one cardinal. (This gives a representation of the category of sets, similar to how \hbox{ord} gives a representation of well-ordered sets.) \diamond

It seems appropriate to close these notes with a quote from Jerry Bona:

“The Axiom of Choice is obviously true, the well-ordering principle obviously false, and who can tell about Zorn’s Lemma?”

[Update, Jan 29: some more remarks added.]

[Update, Jan 30: more remarks added.]

[Update, Jan 31: See also “How to use Zorn’s lemma“, by Tim Gowers.]