Notational convention: As in Notes 2, I will colour a statement red in this post if it assumes the axiom of choice.  We will, of course, rely on every other axiom of Zermelo-Frankel set theory here (and in the rest of the course).  $\diamond$

In this course we will often need to iterate some sort of operation “infinitely many times” (e.g. to create a infinite basis by choosing one basis element at a time).  In order to do this rigorously, we will rely on Zorn’s lemma:

Zorn’s Lemma. Let $(X, \leq)$ be a non-empty partially ordered set, with the property that every chain (i.e. a totally ordered set) in X has an upper bound.  Then X contains a maximal element (i.e. an element with no larger element).

Indeed, we have used this lemma several times already in previous notes.  Given the other standard axioms of set theory, this lemma is logically equivalent to

Axiom of choice. Let X be a set, and let ${\mathcal F}$ be a collection of non-empty subsets of X.  Then there exists a choice function $f: {\mathcal F} \to X$, i.e. a function such that $f(A) \in A$ for all $A \in {\mathcal F}$.

One implication is easy:

Proof of axiom of choice using Zorn’s lemma. Define a partial choice function to be a pair $({\mathcal F}', f')$, where ${\mathcal F}'$ is a subset of ${\mathcal F}$ and $f': {\mathcal F}' \to X$ is a choice function for ${\mathcal F'}$.  We can partially order the collection of partial choice functions by writing $({\mathcal F}', f') \leq ({\mathcal F}'', f'')$ if ${\mathcal F}' \subset {\mathcal F}''$ and f” extends f’.  The collection of partial choice functions is non-empty (since it contains the pair $(\emptyset, ())$ consisting of the empty set and the empty function), and it is easy to see that any chain of partial choice functions has an upper bound (formed by gluing all the partial choices together).  Hence, by Zorn’s lemma, there is a maximal partial choice function $({\mathcal F}_*, f_*)$.  But the domain ${\mathcal F}_*$ of this function must be all of ${\mathcal F}$, since otherwise one could enlarge ${\mathcal F}_*$ by a single set A and extend $f_*$ to A by choosing a single element of A.  (One does not need the axiom of choice to make a single choice, or finitely many choices; it is only when making infinitely many choices that the axiom becomes necessary.)  The claim follows. $\Box$

In the rest of these notes I would like to supply the reverse implication, using the machinery of well-ordered sets.  Instead of giving the shortest or slickest proof of Zorn’s lemma here, I would like to take the opportunity to place the lemma in the context of several related topics, such as ordinals and transfinite induction, noting that much of this material is in fact independent of the axiom of choice.  The material here is standard, but for the purposes of this course one may simply take Zorn’s lemma as a “black box” and not worry about the proof, so this material is optional.

— Well-ordered sets —

To prove Zorn’s lemma, we first need to strengthen the notion of a totally ordered set.

Definition 1. A well-ordered set is a totally ordered set $X = (X,\leq)$ such that every non-empty subset A of X has a minimal element $\min(A) \in A$.  Two well-ordered sets X, Y are isomorphic if there is an order isomorphism $\phi: X \to Y$ between them, i.e. a bijection $\phi$ which is monotone ($\phi(x) < \phi(x')$ whenever $x < x'$).

Example 1. The natural numbers are well-ordered (this is the well-ordering principle), as is any finite totally ordered set (including the empty set), but the integers, rationals, or reals are not well-ordered. $\diamond$

Example 2. Any subset of a well-ordered set is again well-ordered.  In particular, if a, b are two elements of a well-ordered set, then intervals such as ${}[a,b] := \{ c \in X: a \leq c \leq b \}$, ${}[a,b) := \{ c \in X: a \leq c < b \}$, etc. are also well-ordered. $\diamond$

Example 3. If X is a well-ordered set, then the ordered set $X \oplus \{+\infty\}$, defined by adjoining a new element $+\infty$ to $X$ and declaring it to be larger than all the elements of X, is also well-ordered.  More generally, if X and Y are well-ordered sets, then the ordered set $X \oplus Y$, defined as the disjoint union of X and Y, with any element of Y declared to be larger than any element of X, is also well-ordered.  Observe that the operation $\oplus$ is associative (up to isomorphism), but not commutative in general: for instance, ${\Bbb N} \oplus \{\infty\}$ is not isomorphic to $\{\infty\} \oplus {\Bbb N}$. $\diamond$

Example 4. If X, Y are well-ordered sets, then the ordered set $X \otimes Y$, defined as the Cartesian product $X \times Y$ with the lexicographical ordering (thus $(x,y) \leq (x',y')$ if $x < x'$, or if $x=x'$ and $y \leq y'$), is again a well-ordered set.  Again, this operation is associative (up to isomorphism) but not commutative.  Note that we have one-sided distributivity: $(X \oplus Y) \otimes Z$ is isomorphic to $(X \otimes Z) \oplus (Y \otimes Z)$, but $Z \otimes (X \oplus Y)$ is not isomorphic to $(Z \otimes X) \oplus (Z \otimes Y)$ in general. $\diamond$

Remark 1. The axiom of choice is trivially true in the case when X is well-ordered, since one can take $\min$ to be the choice function.  Thus, the axiom of choice follows from the well-ordering theorem (every set has at least one well-ordering)Conversely, we will be able to deduce the well-ordering theorem from Zorn’s lemma (and hence from the axiom of choice): see Exercise 11 below. $\diamond$

One of the reasons that well-ordered sets are useful is that one can perform induction on them.  This is easiest to describe for the principle of strong induction:

Exercise 1. (Strong induction on well-ordered sets)  Let X be a well-ordered set, and let $P: X \mapsto \{ \hbox{true}, \hbox{false}\}$ be a property of elements of X.  Suppose that whenever $x \in X$ is such that $P(y)$ is true for all $y, then $P(x)$ is true.  Then $P(x)$ is true for every $x \in X$. This is called the principle of strong induction.  Conversely, show that a totally ordered set X enjoys the principle of strong induction if and only if it is well-ordered.  (For partially ordered sets, the corresponding notion is that of being well-founded.) $\diamond$

To describe the analogue of the ordinary principle of induction for well-ordered sets, we need some more notation.  Given a subset A of a non-empty well-ordered set X, we define the supremum $\sup(A) \in X \oplus \{+\infty\}$ of A to be the least upper bound

$\sup(A) := \min( \{ y \in X \oplus \{+\infty\}: x \leq y \hbox{ for all } x \in A \}$ (1)

of A (thus for instance the supremum of the empty set is $\min(X)$).  If $x \in X$, we define the successor $\hbox{succ}(x) \in X \oplus \{+\infty\}$ by the formula

$\hbox{succ}(x) := \min( (x,+\infty] )$. (2)

We have the following Peano-type axioms:

Exercise 2. If x is an element of a non-empty well-ordered set X, show that exactly one of the following statements hold:

1. (Limit case) $x = \sup( [\min(X),x) )$.
2. (Successor case) $x = \hbox{succ}(y)$ for some $y$.

In particular, $\min(X)$ is not the successor of any element in X. $\diamond$

Exercise 3. Show that if x, y are elements of a well-ordered set such that $\hbox{succ}(x)=\hbox{succ}(y)$, then x=y. $\diamond$

Exercise 4. (Transfinite induction for well-ordered sets)   Let X be a non-empty well-ordered set, and let $P: X \mapsto \{ \hbox{true}, \hbox{false}\}$ be a property of elements of X. Suppose that

1. (Base case) $P( \min(X) )$ is true.
2. (Successor case) If $x \in X$ and $P(x)$ is true, then $P(\hbox{succ}(x))$ is true.
3. (Limit case) If $x = \sup([\min(X),x))$ and $P(y)$ is true for all $y, then $P(x)$ is true.  [Note that this subsumes the base case.]

Then $P(x)$ is true for all $x \in X$$\diamond$

Remark 2. The usual Peano axioms for succession are the special case of Exercises 2-4 in which the limit case of Exercise 2 only occurs for $\min(X)$ (which is denoted 0), and the successor function never attains $+\infty$.  With these additional axioms, X is necessarily isomorphic to ${\Bbb N}$$\diamond$

Now we introduce two more key concepts.

Definition 2. An initial segment of a well-ordered set X is a subset Y of X such that ${}[\min(X),y] \subset Y$ for all $y \in Y$ (i.e. whenever y lies in Y, all elements of X that are less than y also lie in Y).

A morphism from one well-ordered set X to another Y is a map $\phi: X \to Y$ which is strictly monotone (thus $\phi(x) < \phi(x')$ whenever $x < x'$) and such that $\phi(X)$ is an initial segment of Y.

Example 5. The only morphism from $\{1,2,3\}$ to $\{1,2,3,4,5\}$ is the inclusion map.  There is no morphism from $\{1,2,3,4,5\}$ to $\{1,2,3\}$$\diamond$

Remark 3. With this notion of a morphism, the class of well-ordered sets becomes a category. $\diamond$

We can identify the initial segments of X with elements of $X \cup \{+\infty\}$:

Exercise 5. Let X be a non-empty well-ordered set.  Show that every initial segment I of X is of the form $I = [\min(X),a)$ for exactly one $a \in X \cup \{+\infty\}$. $\diamond$

Exercise 6. Show that an arbitrary union or arbitrary intersection of initial segments is again an initial segment. $\diamond$

Exercise 7. Let $\phi: X \to Y$ be a morphism.  Show that $\phi$ maps initial segments of X to initial segments of Y.  If $x, x' \in X$ is such that x’ is the successor of x, show that $\phi(x')$ is the successor of $\phi(x)$$\diamond$

As Example 5 suggests, there are very few morphisms between well-ordered sets.  Indeed, we have

Proposition 1. (Uniqueness of morphisms) Given two well-ordered sets X and Y, there is at most one morphism from X to Y.

Proof. Suppose we have two morphisms $\phi: X \to Y$, $\psi: X \to Y$.  By using transfinite induction (Exercise 4) and Exercise 7, one can show that $\phi, \psi$ agree on ${}[\min(X),a)$ for every $a \in X \oplus \{+\infty\}$; setting $a=+\infty$ gives the claim.  $\Box$

Exercise 8. (Schroder-Bernstein theorem for well-ordered sets) Show that two well-ordered sets X, Y are isomorphic if and only if there is a morphism from X to Y, and a morphism from Y to X.  $\diamond$

We can complement the uniqueness in Proposition 1 with existence:

Proposition 2. (Existence of morphisms) Given two well-ordered sets X and Y, there is either a morphism from X to Y or a morphism from Y to X.

Proof. Call an element $a \in X \oplus \{+\infty\}$ good if there is a morphism $\phi_a$ from ${}[\min(X),a)$ to Y, thus $\min(X)$ is good.  If $+\infty$ is good, then we are done.  From uniqueness we see that if every element in a set A is good, then the supremum $\sup(A)$ is also good.  Applying transfinite induction (Exercise 4), we thus see that we are done unless there exists a good $a \in X$ such that $\hbox{succ}(a)$ is not good.  By Exercise 5, $\phi_a( [\min(X),a) ) = [\min(Y),b)$ for some $b \in Y \oplus \{+\infty\}$.  If $b \in Y$ then we could extend the morphism $\phi_a$ to ${}[\min(X),a] = [\min(X),\hbox{succ}(a))$ by mapping a to b, contradicting the fact that $\hbox{succ}(a)$ is not good; thus $b=+\infty$ and so $\phi_a$ is surjective.  It is then easy to check that $\phi_a^{-1}$ exists and is a morphism from Y to X, and the claim follows. $\Box$

Remark 4. Formally, Proposition 1, Exercise 8, and Proposition 2 tell us that the collection of all well-ordered sets, modulo isomorphism, is totally ordered by declaring one well-ordered set X to be at least as large as another Y when there is a morphism from Y to X.  However, this is not quite the case, because the collection of well-ordered sets is only a class rather than a set.  Indeed, as we shall soon see, this is not a technicality, but is in fact a fundamental fact about well-ordered sets that lies at the heart of Zorn’s lemma.  (From Russell’s paradox we know that the notions of class and set are necessarily distinct.)  $\Box$

— Ordinals —

As we learn very early on in our mathematics education, a finite set of a certain cardinality (e.g. a set $\{a,b,c,d,e\}$) can be put in one-to-one correspondence with a “standard” set of the same cardinality (e.g. the set $\{1,2,3,4,5\}$); two finite sets have the same cardinality if and only if they correspond to the same “standard” set $\{1,\ldots,N\}$).  (The same fact is true for infinite sets; see Exercise 12 below.) Similarly, we would like to place every well-ordered set in a “standard” form.  This motivates

Definition 3. A representation $\rho$ of the well-ordered sets is an assignment of a well-ordered set $\rho(X)$ to every well-ordered set X such that

1. $\rho(X)$ is isomorphic to X for every well-ordered set X.  (In particular, if $\rho(X)$ and $\rho(Y)$ are equal, then X and Y are isomorphic.)
2. If there exists a morphism from X to Y, then $\rho(X)$ is a subset of $\rho(Y)$ (and the order structure on $\rho(X)$ is induced from that on $\rho(Y)$.  (In particular, if X and Y are isomorphic, then $\rho(X)$ and $\rho(Y)$ are equal.)

Remark 5. In the language of category theory, a representation is a covariant functor from the category of well-ordered sets to itself which turns all morphisms into inclusions, and which is naturally isomorphic to the identity functor. $\diamond$

Remark 6. Because the collection of all well-ordered sets is a class rather than a set, $\rho$ is not actually a function (it is sometimes referred to as a class function).   $\diamond$

It turns out that several representations of the well-ordered sets exist.  The most commonly used one is that of the ordinals, defined by von Neumann as follows.

Definition 4. An ordinal is a well-ordered set $\alpha$ with the property that $x = \{ y \in \alpha: y < x \}$ for all $x \in \alpha$.  (In particular, each element of $\alpha$ is also a subset of $\alpha$, and the strict order relation $<$ on $\alpha$ is identical to the set membership relation $\in$.)

Example 5. For each natural number $n = 0,1,2,\ldots$, define the ordinal number $n^{th}$ recursively by setting $0^{th} := \emptyset$ and $n^{th} := \{0^{th}, 1^{th},\ldots, (n-1)^{th} \}$ for all $n \geq 1$, thus for instance

$0^{th} := \emptyset$

$1^{th} := \{0^{th}\} = \{ \emptyset \}$

$2^{th} := \{0^{th}, 1^{th} \} = \{ \emptyset, \{ \emptyset \} \}$ (3)

$3^{th} := \{ 0^{th}, 1^{th}, 2^{th} \} = \{ \emptyset, \{ \emptyset \}, \{ \emptyset, \{\emptyset\}\} \},$

and so forth.  (Of course, to be compatible with the English language conventions for ordinals, we should write $1^{st}$ instead of $1^{th}$, etc., but let us ignore this discrepancy.)  One can easily check by induction that $n^{th}$ is an ordinal for every n.  Furthermore, if we define $\omega := \{ n^{th}: n \in {\Bbb N} \}$, then $\omega$ is also an ordinal. (In the foundations of set theory, this construction, together with the axiom of infinity, is sometimes used to define the natural numbers (so that $n = n^{th}$ for all natural numbers n), although this construction can lead to some conceptually strange-looking consequences that blur the distinction between numbers and sets, such as $3 \in 5$ and $4 = \{0,1,2,3\}$.) $\diamond$

The fundamental theorem about ordinals is

Theorem 1.

1. Given any two ordinals $\alpha, \beta$, one is a subset of the other (and the order structure on $\alpha$ is induced from that on $\beta$).
2. Every well-ordered set X is isomorphic to exactly one ordinal $\hbox{ord}(X)$.

In particular, $\hbox{ord}$ is a representation of the well-ordered sets.

Proof. We first prove 1.  From Proposition 2 and symmetry, we may assume that there is a morphism $\phi$ from $\alpha$ to $\beta$.   By strong induction (Exercise 1) and Definition 4, we see that $\phi(x)=x$ for all $x \in \alpha$, and so $\phi$ is the inclusion map from $\alpha$ into $\beta$.  The claim follows.

Now we prove 2.  If uniqueness failed, then we would have two distinct ordinals that are isomorphic to each other, but as one ordinal is a subset of the other, this would contradict Proposition 1 (the inclusion morphism is not an isomorphism); so it suffices to prove existence.

We use transfinite induction.  It suffices to show that for every $a \in X \oplus \{+\infty\}$, that ${}[\min(X),a)$ is isomorphic to an ordinal $\alpha(a)$ (which we know to be unique).  This is of course true in the base case $a = \min(X)$.  To handle the successor case $a = \hbox{succ}(b)$, we set $\alpha(a) := \alpha(b) \cup \{ \alpha(b)\}$, which is easily verified to be an ordinal isomorphic to ${}[\min(X).a)$.  To handle the limit case $a = \sup( [\min(X), a) )$, we take all the ordinals associated to elements in ${}{[}\min(X), a{)}$ and take their union (here we rely crucially on the axiom schema of replacement and the axiom of union); by use of part 1 one can show that this union is an ordinal isomorphic to a as required.  $\Box$

Remark 7. Operations on well-ordered sets, such as the sum $\oplus$ and product $\otimes$ defined in Examples 3, 4, induce corresponding operations on ordinals, leading to ordinal arithmetic, which we will not discuss here. (Note that the convention for which order multiplication proceeds in is swapped in some of the literature, thus $\alpha \beta$ would be the ordinal of $\beta \otimes \alpha$ rather than $\alpha \otimes \beta$.) $\diamond$

Exercise 9. (Ordinals are themselves well-ordered)  Let ${\mathcal F}$ be a non-empty class of ordinals.  Show that there is a least ordinal $\min({\mathcal F})$ in this class, which is a subset of all the other ordinals in this class.  In particular, this shows that any set of ordinals is well-ordered by set inclusion. $\diamond$

Remark 8. Because of Exercise 9, we can meaningfully talk about “the least ordinal obeying property P”, as soon as we can exhibit at least one ordinal with that property P.  For instance, once one can demonstrate the existence of an uncountable ordinal (which follows from Exercise 11 below), one can talk about the least uncountable ordinal.  $\diamond$

Exercise 10. (Transfinite induction for ordinals)  Let $P(\alpha)$ be a property pertaining to ordinals $\alpha$.  Suppose that

1. (Base case) $P(\emptyset)$ is true.
2. (Successor case) If $\alpha = \beta \cup \{\beta\}$ for some ordinal $\beta$, and $P(\beta)$ is true, then $P(\alpha)$ is true.
3. (Limit case) If $\alpha = \bigcup_{\beta \in \alpha} \beta$, and $P(\beta)$ is true for all $\beta \in \alpha$, then $P(\alpha)$ is true.

Show that $P(\alpha)$ is true for every ordinal $\alpha$. $\diamond$

Now we show a fundamental fact, that the well-ordered sets are just too “numerous” to all fit inside a single set, even modulo isomorphism.

Theorem 2. There does not exist a set A and a representation $\rho$ of the well-ordered sets such that $\rho(X) \in A$ for all well-ordered sets X.

Proof. By Theorem 1, any two distinct ordinals are non-isomorphic, and so get mapped under $\rho$ to a different element of A.  Thus we can identify the class of ordinals with a subset of A, and so the class of ordinals is in fact a set.  In particular, by the axiom of union, we may take the union of all the ordinals, which one can verify to be another ordinal $\varepsilon_0$.  But then $\varepsilon_0 \cup \{ \varepsilon_0\}$ is another ordinal, which implies that $\varepsilon_0 \in \varepsilon_0$, which contradicts the axiom of foundation. $\Box$

Remark 9. It is also possible to prove Theorem 2 without the theory of ordinals, or the axiom of foundation.  One first observes (by transfinite induction) that given two well-ordered sets X, X’, one of the sets $\rho(X), \rho(X')$ is a subset of the other.  Because of this, one can show that the union S of all the $\rho(X)$ (where X ranges over all well-ordered sets) is well-defined (because the $\rho(X)$ form a subset of A) and well-ordered.  Now we look at the well-ordered set $S \cup \{+\infty\}$; by Proposition 1, it is not isomorphic to any subset of S, but $\rho(S \cup \{+\infty\})$ is necessarily contained in S, a contradiction. $\diamond$

Remark 10. The same argument also shows that there is no representation of the ordinals inside a given set; the ordinals are “too big” to be placed in anything other than a class.  $\diamond$

— Zorn’s lemma —

Now we can prove Zorn’s lemma. The key proposition is

Proposition 2. Let X be a partially ordered set, and let ${\mathcal C}$ be the set of all well-ordered sets in X.  Then there does not exist a function $g: {\mathcal C} \to X$ such that $g(C)$ is a strict upper bound for C (i.e. $g(C) > x$ for all $x \in C$) for all well-ordered $C \in {\mathcal C}$.

Proof. Suppose for contradiction that there existed X and g with the above properties.  Then, given any well-ordered set Y, we claim that there exists exactly one isomorphism $\phi_Y: Y \to \rho(Y)$ from Y to a well-ordered set $\rho(Y)$ in X such that $\phi_Y(y) = g( \phi_Y([\min(Y),y) ) )$ for all $y \in Y$.  Indeed, the uniqueness and existence can both be established by a transfinite induction that we leave as an exercise. (Informally, $\phi_Y$ is what one gets by “applying g Y times, starting with the empty set”.)  From uniqueness we see that $\rho(Y)=\rho(Y')$ whenever Y and Y’ are isomorphic, and another transfinite induction shows that $\rho(Y) \subset \rho(Y')$ whenever Y is a subset of Y’.  Thus $\rho$ is a representation of the ordinals.  But this contradicts Theorem 2. $\Box$

Remark 11. One can use transfinite induction on ordinals rather than well-ordered sets if one wishes here, using Remark 10 in place of Theorem 2.  $\diamond$

Proof of Zorn’s lemma. Suppose for contradiction that one had a non-empty partially ordered set X without maximal elements, such that every chain had an upper bound.  As there are no maximal elements, every element in X must be bounded by a strictly larger element in X, and so every chain in fact has a strict upper bound; in particular every well-ordered set has a strict upper bound.  Applying the axiom of choice, we may thus find a choice function $g: {\mathcal C} \to X$ from the space of well-ordered sets in X to X, that maps every such set to a strict upper bound.  But this contradicts Proposition 2. $\Box$

Remark 12. It is important for Zorn’s lemma that X is a set, rather than a class.  Consider for instance the class of all ordinals.  Every chain of ordinals has an upper bound (namely, the union of the ordinals in that chain), and the class is certainly non-empty, but there is no maximal ordinal.  (Compare also Theorem 1 and Theorem 2.) $\diamond$

Remark 13. It is also important that every chain have an upper bound, and not just countable chains.  Indeed, the collection of countable subsets of an uncountable set (such as ${\Bbb R}$) is non-empty, and every countable chain has an upper bound, but there is no maximal element.  $\diamond$

Remark 14. The above argument shows that the hypothesis of Zorn’s lemma can be relaxed slightly; one does not need every chain to have an upper bound, merely every well-ordered set needs to have one.  But I do not know of any application in which this apparently stronger version of Zorn’s lemma dramatically simplifies an argument.  (In practice, either Zorn’s lemma can be applied routinely, or it fails utterly to be applicable at all.) $\diamond$

Exercise 11. Use Zorn’s lemma to establish the well-ordering theorem (every set has at least one well-ordering).   $\diamond$

Remark 15. By the above exercise, ${\Bbb R}$ can be well-ordered. However, if one drops the axiom of choice from the axioms of set theory, one can no longer prove that ${\Bbb R}$ is well-ordered.  Indeed, given a well-ordering of ${\Bbb R}$, it is not difficult (using Remark 1) to remove the axiom of choice from the Banach-Tarski constructions in Notes 2, and thus obtain constructions of non-measurable subsets of ${\Bbb R}$.  But a deep theorem of Solovay gives a model of set theory (without the axiom of choice) in which every set of reals is measurable.  $\diamond$

Exercise 12. Define a (von Neumann) cardinal to be an ordinal $\alpha$ with the property that all smaller ordinals have strictly lesser cardinality (i.e. cannot be placed in one-to-one correspondence with $\alpha$).  Show that every set can be placed in one-to-one correspondence with exactly one cardinal. (This gives a representation of the category of sets, similar to how $\hbox{ord}$ gives a representation of well-ordered sets.) $\diamond$

It seems appropriate to close these notes with a quote from Jerry Bona:

“The Axiom of Choice is obviously true, the well-ordering principle obviously false, and who can tell about Zorn’s Lemma?”

[Update, Jan 29: some more remarks added.]

[Update, Jan 30: more remarks added.]

[Update, Jan 31: See also “How to use Zorn’s lemma“, by Tim Gowers.]