One of the best-known problems from ancient Greek mathematics was that of trisecting an angle by straightedge and compass, which was eventually proven impossible in 1837 by Pierre Wantzel, using methods from Galois theory.

Formally, one can set up the problem as follows. Define a configuration to be a finite collection {{\mathcal C}} of points, lines, and circles in the Euclidean plane. Define a construction step to be one of the following operations to enlarge the collection {{\mathcal C}}:

  • (Straightedge) Given two distinct points {A, B} in {{\mathcal C}}, form the line {\overline{AB}} that connects {A} and {B}, and add it to {{\mathcal C}}.
  • (Compass) Given two distinct points {A, B} in {{\mathcal C}}, and given a third point {O} in {{\mathcal C}} (which may or may not equal {A} or {B}), form the circle with centre {O} and radius equal to the length {|AB|} of the line segment joining {A} and {B}, and add it to {{\mathcal C}}.
  • (Intersection) Given two distinct curves {\gamma, \gamma'} in {{\mathcal C}} (thus {\gamma} is either a line or a circle in {{\mathcal C}}, and similarly for {\gamma'}), select a point {P} that is common to both {\gamma} and {\gamma'} (there are at most two such points), and add it to {{\mathcal C}}.

We say that a point, line, or circle is constructible by straightedge and compass from a configuration {{\mathcal C}} if it can be obtained from {{\mathcal C}} after applying a finite number of construction steps.

Problem 1 (Angle trisection) Let {A, B, C} be distinct points in the plane. Is it always possible to construct by straightedge and compass from {A,B,C} a line {\ell} through {A} that trisects the angle {\angle BAC}, in the sense that the angle between {\ell} and {\overline{AB}} is one third of the angle {\angle BAC}?

Thanks to Wantzel’s result, the answer to this problem is known to be “no” in general; a generic angle {\angle BAC} cannot be trisected by straightedge and compass. (On the other hand, some special angles can certainly be trisected by straightedge and compass, such as a right angle. Also, one can certainly trisect generic angles using methods other than straightedge and compass; see the Wikipedia page on angle trisection for some examples of this.)

The impossibility of angle trisection stands in sharp contrast to the easy construction of angle bisection via straightedge and compass, which we briefly review as follows:

  1. Start with three points {A, B, C}.
  2. Form the circle {c_0} with centre {A} and radius {|AB|}, and intersect it with the line {\overline{AC}}. Let {D} be the point in this intersection that lies on the same side of {A} as {C}. ({D} may well be equal to {C}).
  3. Form the circle {c_1} with centre {B} and radius {|AB|}, and the circle {c_2} with centre {D} and radius {|AB|}. Let {E} be the point of intersection of {c_1} and {c_2} that is not {A}.
  4. The line {\ell := \overline{AE}} will then bisect the angle {\angle BAC}.
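As an informal numerical sanity check (not part of the argument), the four steps above can be carried out in coordinates. This is a sketch of our own: it assumes points are encoded as complex numbers {x+iy}, the helper names circle_intersections, bisector_point and angle are hypothetical, and it takes {E} to be the intersection point farther from {A}, which is valid as long as {\angle BAC} is neither zero nor a straight angle.

```python
import cmath
import math


def circle_intersections(p1, r1, p2, r2):
    """Both intersection points of two circles that meet in two points, with the
    point (x, y) encoded as the complex number x + iy."""
    d2 = abs(p2 - p1) ** 2
    t = 0.5 - (r2**2 - r1**2) / (2 * d2)
    h = math.sqrt(max(0.0, (r1**2 - t**2 * d2) / d2))  # max() absorbs round-off
    base = p1 + t * (p2 - p1)
    perp = 1j * (p2 - p1)  # p2 - p1 rotated by 90 degrees
    return base + h * perp, base - h * perp


def bisector_point(A, B, C):
    """Steps 1-4: return E, so that the line AE bisects the angle BAC."""
    r = abs(B - A)
    # Step 2: D is the point of c_0 (centre A, radius |AB|) on the ray from A through C
    D = A + r * (C - A) / abs(C - A)
    # Step 3: E is the point of c_1 (centre B) and c_2 (centre D) other than A
    return max(circle_intersections(B, r, D, r), key=lambda P: abs(P - A))


def angle(P, A, Q):
    """The unsigned angle PAQ, in radians."""
    return abs(cmath.phase((Q - A) / (P - A)))
```

For instance, with {A = 1+i}, {B = 4+i} and {C = 1+5i} (a right angle), bisector_point returns {4+4i}, and the angle {\angle BAE} is {\pi/4} up to rounding. The point {E} is always the fourth vertex {B+D-A} of the rhombus {ABED}, which is why the diagonal {\overline{AE}} bisects the angle.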

The key difference between angle trisection and angle bisection ultimately boils down to the following trivial number-theoretic fact:

Lemma 2 There is no power of {2} that is evenly divisible by {3}.

Proof: Obvious by modular arithmetic, by induction, or by the fundamental theorem of arithmetic. \Box
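Concretely, the modular arithmetic argument is the one-line computation

\displaystyle 2^k = (3-1)^k \equiv (-1)^k \not\equiv 0 \pmod 3 \hbox{ for all } k \geq 0,

so {2^k} always leaves remainder {1} or {2} on division by {3}.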

In contrast, there are of course plenty of powers of {2} that are evenly divisible by {2}, and this is ultimately why angle bisection is easy while angle trisection is hard.

The standard way in which Lemma 2 is used to demonstrate the impossibility of angle trisection is via Galois theory. The implication is quite short if one knows this theory, but quite opaque otherwise. We briefly sketch the proof of this implication here, though we will not need it in the rest of the discussion. Firstly, Lemma 2 implies the following fact about field extensions.

Corollary 3 Let {F} be a field, and let {E} be an extension of {F} that can be constructed out of {F} by a finite sequence of quadratic extensions. Then {E} does not contain any cubic extensions {K} of {F}.

Proof: If {E} contained a cubic extension {K} of {F}, then by the tower law the dimension {[E:F] = [E:K][K:F] = 3[E:K]} of {E} over {F} would be a multiple of three. On the other hand, if {E} is obtained from {F} by a tower of quadratic extensions, then the dimension {[E:F]} is a power of two. The claim then follows from Lemma 2. \Box

To conclude the proof, one then notes that any point, line, or circle that can be constructed from a configuration {{\mathcal C}} is definable in a field obtained from the coefficients of all the objects in {{\mathcal C}} after taking a finite number of quadratic extensions, whereas a trisection of an angle {\angle BAC} will generically only be definable in a cubic extension of the field generated by the coordinates of {A,B,C}.
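To see the cubic concretely, write {\theta} for the angle {\angle BAC}. The triple angle formula

\displaystyle \cos \theta = 4 \cos^3(\theta/3) - 3 \cos(\theta/3)

shows that {x = \cos(\theta/3)} is a root of the cubic {4x^3 - 3x - \cos \theta}. For the classical example {\theta = 60^\circ} (with {A = (0,0)}, {B = (1,0)}, {C = (1/2, \sqrt{3}/2)}, whose coordinates lie in the quadratic extension {{\bf Q}(\sqrt{3})} of {{\bf Q}}), this becomes

\displaystyle 8x^3 - 6x - 1 = 0.

By the rational root theorem the only possible rational roots are {\pm 1, \pm 1/2, \pm 1/4, \pm 1/8}, none of which is actually a root. The cubic is therefore irreducible over {{\bf Q}}, and {\cos 20^\circ} generates a cubic extension of {{\bf Q}}. By Corollary 3 (with {F = {\bf Q}}), {\cos 20^\circ} does not lie in any field obtained from {{\bf Q}} by a tower of quadratic extensions, so the {60^\circ} angle cannot be trisected.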

The Galois theory method also allows one to obtain many other impossibility results of this type, most famously the Abel-Ruffini theorem on the insolvability of the quintic equation by radicals. For this reason (and also because of the many applications of Galois theory to number theory and other branches of mathematics), the Galois theory argument is the “right” way to prove the impossibility of angle trisection within the broader framework of modern mathematics. However, this argument has the drawback that it requires one to first understand Galois theory (or at least field theory), which is usually not presented until an advanced undergraduate algebra or number theory course, whilst the angle trisection problem requires only high-school level mathematics to formulate. Even if one is allowed to “cheat” and sweep several technicalities under the rug, one still needs to possess a fair amount of solid intuition about advanced algebra in order to appreciate the proof. (This was undoubtedly one reason why, even after Wantzel’s impossibility result was published, a large amount of effort was still expended by amateur mathematicians to try to trisect a general angle.)

In this post I would therefore like to present a different proof (or perhaps more accurately, a disguised version of the standard proof) of the impossibility of angle trisection by straightedge and compass, that avoids explicit mention of Galois theory (though it is never far beneath the surface). With “cheats”, the proof is actually quite simple and geometric (except for Lemma 2, which is still used at a crucial juncture), based on the basic geometric concept of monodromy; unfortunately, however, some technical work is needed to remove these cheats.

To describe the intuitive idea of the proof, let us return to the angle bisection construction, which takes a triple {A, B, C} of points as input and returns a bisecting line {\ell} as output. We iterate the construction to create a quadrisecting line {m}, via the following sequence of steps that extend the original bisection construction:

  1. Start with three points {A, B, C}.
  2. Form the circle {c_0} with centre {A} and radius {|AB|}, and intersect it with the line {\overline{AC}}. Let {D} be the point in this intersection that lies on the same side of {A} as {C}. ({D} may well be equal to {C}).
  3. Form the circle {c_1} with centre {B} and radius {|AB|}, and the circle {c_2} with centre {D} and radius {|AB|}. Let {E} be the point of intersection of {c_1} and {c_2} that is not {A}.
  4. Let {F} be the point on the line {\ell := \overline{AE}} which lies on {c_0}, and is on the same side of {A} as {E}.
  5. Form the circle {c_3} with centre {F} and radius {|AB|}. Let {G} be the point of intersection of {c_1} and {c_3} that is not {A}.
  6. The line {m := \overline{AG}} will then quadrisect the angle {\angle BAC}.

Let us fix the points {A} and {B}, but not {C}, and view {m} (as well as intermediate objects such as {D}, {c_2}, {E}, {\ell}, {F}, {c_3}, {G}) as a function of {C}.

Let us now do the following: we begin rotating {C} counterclockwise around {A}, which drags along the other objects {D}, {c_2}, {E}, {\ell}, {F}, {c_3}, {G} that were constructed from {C}. For instance, here is an early stage of this rotation process, when the angle {\angle BAC} has become obtuse:

Now for the slightly tricky bit. We are going to keep rotating {C} beyond a half-rotation of {180^\circ}, so that {\angle BAC} now becomes a reflex angle. At this point, a singularity occurs; the point {E} collides into {A}, and so there is an instant in which the line {\ell = \overline{AE}} is not well-defined. However, this turns out to be a removable singularity (and the easiest way to demonstrate this will be to tap the power of complex analysis, as complex numbers can easily route around such a singularity), and we can blast through it to the other side, giving a picture like this:

Note that we have now deviated from the original construction in that {F} and {E} are no longer on the same side of {A}; we are thus now working in a continuation of that construction rather than with the construction itself. Nevertheless, we can still work with this continuation (much as, say, one works with analytic continuations of infinite series such as {\sum_{n=1}^\infty \frac{1}{n^s}} beyond their original domain of definition).

We now keep rotating {C} around {A}. Here, {\angle BAC} is approaching a full rotation of {360^\circ}:

When {\angle BAC} reaches a full rotation, a different singularity occurs: {c_1} and {c_2} coincide. Nevertheless, this is also a removable singularity, and we blast through to beyond a full rotation:

And now {C} is back where it started, as are {D}, {c_2}, {E}, and {\ell}… but the point {F} has moved, from one intersection point of {\ell \cap c_0} to the other. As a consequence, {c_3}, {G}, and {m} have also changed, with {m} being at right angles to where it was before. (In the jargon of modern mathematics, the quadrisection construction has a non-trivial monodromy.)

But nothing stops us from rotating {C} some more. If we continue this procedure, we see that after two full rotations of {C} around {A}, all points, lines, and circles constructed from {A, B, C} have returned to their original positions. Because of this, we shall say that the quadrisection construction described above is periodic with period {2}.
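The rotation experiment above can be imitated numerically. In the following sketch (our own; the helper names are hypothetical) we fix {A = 0} and {B = 1}, put {C = e^{i\theta}} on the unit circle, and continue the construction past the singular positions. At each small step, the candidate for {F} closest to its previous position is kept, and the other steps use the "intersection point other than {A}" rule. The sampling is chosen so that no sample lands exactly on a degenerate position ({\theta} a multiple of {\pi}); this is an assumption of the sketch, not a proof that the singularities are removable.

```python
import cmath
import math


def circle_intersections(p1, r1, p2, r2):
    """Both intersection points of two circles that meet in two points, with the
    point (x, y) encoded as the complex number x + iy."""
    d2 = abs(p2 - p1) ** 2
    t = 0.5 - (r2**2 - r1**2) / (2 * d2)
    # max(0, ...) only absorbs round-off; the circles are assumed to meet
    h = math.sqrt(max(0.0, (r1**2 - t**2 * d2) / d2))
    base = p1 + t * (p2 - p1)
    perp = 1j * (p2 - p1)  # p2 - p1 rotated by 90 degrees
    return base + h * perp, base - h * perp


def quadrisection(theta, F_prev=None):
    """Quadrisection construction for A = 0, B = 1, C = e^{i theta}; returns (F, G).

    F_prev=None: F is the point of ell on c_0 on the same side of A as E.
    Otherwise: F is whichever of the two points is closest to F_prev."""
    A, B = 0j, 1 + 0j
    r = abs(B - A)  # every circle in the construction has radius |AB|
    C = cmath.exp(1j * theta)
    D = A + r * (C - A) / abs(C - A)  # point of c_0 on the ray from A through C
    # E: the common point of c_1 (centre B) and c_2 (centre D) other than A
    E = max(circle_intersections(B, r, D, r), key=lambda P: abs(P - A))
    u = (E - A) / abs(E - A)  # unit direction of ell = line AE
    candidates = (A + r * u, A - r * u)  # the two points of ell on c_0
    if F_prev is None:
        F = candidates[0]
    else:
        F = min(candidates, key=lambda P: abs(P - F_prev))
    # G: the common point of c_1 and c_3 (centre F) other than A
    G = max(circle_intersections(B, r, F, r), key=lambda P: abs(P - A))
    return F, G


def line_angle(P):
    """Direction of the line through A = 0 and P, as an angle modulo pi."""
    return cmath.phase(P) % math.pi


def track(theta0, rotations, steps_per_rotation=3600):
    """Rotate C through the given number of full turns about A, following F by
    continuity, and return the direction of m = line AG at the end."""
    F, G = quadrisection(theta0)
    for k in range(1, rotations * steps_per_rotation + 1):
        theta = theta0 + 2 * math.pi * k / steps_per_rotation
        F, G = quadrisection(theta, F_prev=F)
    return line_angle(G)
```

With {\theta_0 = 1} radian, track(1.0, 0) returns {0.25 = \theta_0/4} (so {m} quadrisects the angle). The difference between track(1.0, 1) and track(1.0, 0) is {\pi/2} modulo {\pi}, which is a right angle. track(1.0, 2) agrees with track(1.0, 0) modulo {\pi}. All three results hold up to rounding.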

Similarly, if one performs an octisection of the angle {\angle BAC} by bisecting the quadrisection, one can verify that this octisection is periodic with period {4}; it takes four full rotations of {C} around {A} before the configuration returns to where it started. More generally, one can show

Proposition 4 Any construction of straightedge and compass from the points {A,B,C} is periodic with period equal to a power of {2}.

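The reason for this, ultimately, is that any two circles or lines intersect each other in at most two points, so at each step of a straightedge-and-compass construction there is an ambiguity of at most two choices, which can only be permuted in {2! = 2} ways. Suppose that all the objects constructed before a given point return to their original positions after some number {p} of rotations of {C} around {A}. Then after those {p} rotations the given point has either returned to its original position or been swapped with the other intersection point, and in either case it has returned after {2p} rotations. Proceeding through the construction one point at a time in this fashion, the period at most doubles at each step, and one obtains the proposition.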
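For iterated bisections, the period can be computed numerically. The sketch below makes a simplifying assumption of our own. Since every circle in the construction has radius {|AB|}, the point {E} in each bisection is the fourth vertex {B + P - A} of a rhombus. So, with {A = 0}, {B = 1} and points encoded as complex numbers, each new point on {c_0} is one of {\pm (1+P)/|1+P|}, with the sign chosen by continuity as {C} rotates. The function name iterated_bisection_period is hypothetical.

```python
import cmath
import math


def iterated_bisection_period(levels, steps_per_rotation=2000, max_rotations=64):
    """Full turns of C = e^{i theta} about A = 0 (with B = 1) needed before every
    intermediate point of `levels` nested bisections returns to its start.

    Simplified model: bisecting the angle between B and a point P of c_0 gives the
    rhombus vertex 1 + P, and the next point of c_0 is +-(1 + P)/|1 + P|,
    with the sign chosen by continuity (or, at the start, on the side of 1 + P)."""
    theta0 = 1.0  # generic starting angle, away from degenerate positions

    def run(theta, prev):
        P = cmath.exp(1j * theta)  # D = C, the first point on c_0
        points = []
        for j in range(levels - 1):
            u = (1 + P) / abs(1 + P)
            if prev is not None and abs(-u - prev[j]) < abs(u - prev[j]):
                u = -u  # continuation: keep the choice closest to the last one
            P = u
            points.append(P)
        return points

    start = run(theta0, None)
    current = start
    for k in range(1, max_rotations * steps_per_rotation + 1):
        current = run(theta0 + 2 * math.pi * k / steps_per_rotation, current)
        if k % steps_per_rotation == 0 and all(
            abs(p - q) < 1e-6 for p, q in zip(current, start)
        ):
            return k // steps_per_rotation
    return None
```

With levels equal to 1, 2, 3 and 4 (bisection, quadrisection, octisection, and one further bisection), the function reports periods 1, 2, 4 and 8, in line with Proposition 4. The final line in each case returns together with the last tracked point, so tracking the points on {c_0} is enough.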

But now consider a putative trisection operation, which starts with an arbitrary angle {\angle BAC} and somehow uses some sequence of straightedge and compass constructions to end up with a trisecting line {\ell}:

What is the period of this construction? If we continuously rotate {C} around {A}, we observe that a full rotation of {C} only causes the trisecting line {\ell} to rotate by a third of a full rotation (i.e. by {120^\circ}):

Because of this, we see that the period of any construction that contains {\ell} must be a multiple of {3}. But this contradicts Proposition 4 and Lemma 2.
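By contrast, here is a trisecting direction that needs three full rotations to return. It is modelled directly, with no construction involved (none exists), as a cube root {w} of {C = e^{i\theta}} chosen by continuity. The sketch is our own and the function name trisector_period is hypothetical. A line through {A} is identified with its direction up to sign.

```python
import cmath
import math


def trisector_period(steps_per_rotation=1000, max_rotations=12):
    """Rotate C = e^{i theta} about A = 0, following a cube root w of C by
    continuity, and return the number of full turns after which the line
    through 0 in direction w (i.e. w up to sign) is back where it started."""
    theta0 = 1.0
    omega = cmath.exp(2j * math.pi / 3)  # primitive cube root of unity
    w0 = cmath.exp(1j * theta0 / 3)  # direction of a trisecting line at the start
    w = w0
    for k in range(1, max_rotations * steps_per_rotation + 1):
        C = cmath.exp(1j * (theta0 + 2 * math.pi * k / steps_per_rotation))
        root = C ** (1 / 3)  # principal cube root; the others differ by powers of omega
        w = min((root * omega**j for j in range(3)), key=lambda z: abs(z - w))
        if k % steps_per_rotation == 0 and min(abs(w - w0), abs(w + w0)) < 1e-6:
            return k // steps_per_rotation
    return None
```

The function returns 3. By Lemma 2 no power of two is a multiple of three, so this period is incompatible with Proposition 4.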

Below the fold, I will make the above proof rigorous. Unfortunately, in doing so, I had to again leave the world of high-school mathematics, as one needs a little bit of algebraic geometry and complex analysis to resolve the issues with singularities that we saw in the above sketch. Still, I feel that at an intuitive level at least, this argument is more geometric and accessible than the Galois-theoretic argument (though anyone familiar with Galois theory will note that there is really not that much difference between the proofs, ultimately, as one has simply replaced the Galois group with a closely related monodromy group instead).

1. Details

We now make the argument more rigorous. We will assume for the sake of contradiction that for every triple {A,B,C} of distinct points, we can find a construction by straightedge and compass that trisects the angle {\angle BAC}, and eventually deduce a contradiction from this.

We remark that we do not initially assume any uniformity in this construction; for instance, it could be possible that the trisection procedure for obtuse angles is completely different from that of acute angles, using a totally different set of constructions, while some exceptional angles (e.g. right angles or degenerate angles) might use yet another construction. We will address these issues later.

The first step is to get rid of some possible degeneracies in one’s construction. At present, nothing in our definition of a construction prevents us from adding a point, line, or circle to the construction that was already present in the existing collection {{\mathcal C}} of points, lines, and circles. However, it is clear that any such step in the construction is redundant, and can be omitted. Thus, we may assume without loss of generality that for each {A,B,C}, the construction used to trisect the angle contains no such redundant steps. (This may make the construction even less uniform than it was previously, but we will address this issue later.)

Another form of degeneracy that we will need to eliminate for technical reasons is that of tangency. At present, we allow in our construction the ability to take two tangent circles, or a circle and a tangent line, and add the tangent point to the collection (if it was not already present in the construction). This would ordinarily be a harmless thing to do, but it complicates our strategy of perturbing the configuration, so we now act to eliminate it. Suppose first that one had two circles {c_1,c_2} already constructed in the configuration {{\mathcal C}} and tangent to each other, and one wanted to add the tangent point {T} to the configuration. But note that in order to have added {c_1} and {c_2} to {{\mathcal C}}, one must previously have added the centres {A_1} and {A_2} of these circles to {{\mathcal C}} also. One can then add {T} to {{\mathcal C}} by intersecting the line {\overline{A_1 A_2}} with {c_1} and picking the point that lies on {c_2}; this way, one does not need to intersect two tangent curves together.

Similarly, suppose that we already had a circle {c} and a tangent line {\ell} constructed in the configuration, but with the tangent point {T} absent. The centre {A} of {c}, and at least two points {B, C} on {\ell}, must previously have also been constructed in order to have {c} and {\ell} present; note that {B, C} are not equal to {T} by hypothesis. One can then obtain {T} by dropping a perpendicular from {A} to {\ell} by the usual construction (i.e. drawing a circle centred at {A} with radius {|AB|} to hit {\ell} again at {D}, then drawing circles centred at {B} and {D} with the same radius {|AB|} to meet at a point {E} distinct from {A}, then intersecting {\overline{AE}} with {\ell} to obtain {T}), thus avoiding tangencies again. (This construction may happen to use lines or circles that had already appeared in the construction, but in those cases one can simply skip those steps.)
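Both tangency reductions can be checked numerically. The sketch below is our own: the helper names are hypothetical, points are {(x,y)} tuples, and lines are coefficient triples {(a,b,c)} for {ax+by+c=0}. It only ever intersects curves that meet non-tangentially. The perpendicular construction works because {E = B + D - A} is the fourth vertex of the rhombus {ABED}, whose diagonal {\overline{AE}} is perpendicular to the other diagonal {\overline{BD} = \ell}.

```python
import math


def line_through(P, Q):
    """Coefficients (a, b, c) of the line ax + by + c = 0 through P and Q."""
    (x1, y1), (x2, y2) = P, Q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def line_circle(line, centre, r):
    """The two points where a secant line meets the circle (centre, r)."""
    a, b, c = line
    x1, y1 = centre
    n2 = a * a + b * b
    s = (a * x1 + b * y1 + c) / n2
    h = math.sqrt(max(0.0, r * r / n2 - s * s))  # max() absorbs round-off
    fx, fy = x1 - s * a, y1 - s * b  # foot of the perpendicular from the centre
    return (fx + h * b, fy - h * a), (fx - h * b, fy + h * a)


def circle_circle(p1, r1, p2, r2):
    """The two points where two non-tangent circles meet."""
    (x1, y1), (x2, y2) = p1, p2
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    t = 0.5 - (r2 * r2 - r1 * r1) / (2 * d2)
    h = math.sqrt(max(0.0, (r1 * r1 - t * t * d2) / d2))
    bx, by = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
    px, py = h * (y1 - y2), h * (x2 - x1)
    return (bx + px, by + py), (bx - px, by - py)


def line_line(l1, l2):
    """The intersection point of two non-parallel lines."""
    (a, b, c), (a2, b2, c2) = l1, l2
    det = a * b2 - b * a2
    return ((b * c2 - b2 * c) / det, (a2 * c - c2 * a) / det)


def dist(P, Q):
    return math.hypot(P[0] - Q[0], P[1] - Q[1])


def tangent_point_of_circles(A1, r1, A2, r2):
    """Tangent circles c_1, c_2: meet the line A1A2 with c_1 (a secant through
    the centre) and keep the intersection point that lies on c_2."""
    return min(line_circle(line_through(A1, A2), A1, r1),
               key=lambda P: abs(dist(P, A2) - r2))


def tangent_point_of_line(A, B, C):
    """Circle centred at A, tangent to the line BC at an unknown point T:
    drop a perpendicular from A to the line using only non-tangent intersections."""
    ell = line_through(B, C)
    r = dist(A, B)
    # D: the second point where the circle (A, |AB|) meets ell
    D = max(line_circle(ell, A, r), key=lambda P: dist(P, B))
    # E: the point where the circles (B, |AB|) and (D, |AB|) meet, other than A
    E = max(circle_circle(B, r, D, r), key=lambda P: dist(P, A))
    return line_line(line_through(A, E), ell)
```

For instance, take the circle centred at {(1,2)} with tangent line {y = -1}, given through {B = (-2,-1)} and {C = (5,-1)}. Here tangent_point_of_line returns {(1,-1)}. For the externally tangent circles centred at {(0,0)} and {(5,0)} with radii {2} and {3}, tangent_point_of_circles returns {(2,0)}. Both results hold up to rounding.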

As a consequence of these reductions, we may now assume that our construction is nondegenerate in the sense that

  • Any point, line, or circle added at a step in the construction does not already appear earlier in that construction.
  • Whenever one intersects two circles in a construction together to add another point to the construction, the circles are non-tangent (and thus meet in exactly two points).
  • Whenever one intersects a circle and a line in a construction together to add another point to the construction, the circle and line are non-tangent (and thus meet in exactly two points).

The reason why we restrict attention to nondegenerate constructions is that they are stable with respect to perturbations. Note for instance that if one has two circles {c_1, c_2} that intersect in two different points, and one of them is labeled {P}, then we may perturb {c_1} and {c_2} by a small amount, and still have an intersection point close to {P} (with the other intersection point staying well away from {P}). Thus, {P} is locally a continuous function of {c_1} and {c_2}. The same holds if one forms the intersection of a circle and a secant (a line which meets the circle non-tangentially). In a similar vein, given two distinct points {A} and {B}, the line {\overline{AB}} between them varies continuously with {A} and {B} as long as one does not move {A} and {B} so far that they collide; and given two lines {\ell_1} and {\ell_2} that intersect at a point {C} (and in particular are non-parallel), {C} also depends continuously on {\ell_1} and {\ell_2}. Thus, in a nondegenerate construction starting from the original three points {A,B,C}, every point, line, or circle created by the construction can be viewed as a continuous function of {A,B,C}, as long as one only works in a sufficiently small neighbourhood of the original configuration {(A,B,C)}. In particular, the final line {\ell} varies continuously in this fashion. Note however that the trisection property may be lost by this perturbation; just because {\ell} happens to trisect {\angle BAC} when {A,B,C} are in the original positions, it does not follow that after one perturbs {A,B,C}, the resulting perturbed line {\ell} still trisects the angle. (For instance, there are a number of ways to trisect a right angle (e.g. by bisecting an angle of an equilateral triangle), but if one perturbs the angle to be slightly acute or slightly obtuse, the line created by this procedure would not be expected to continue to trisect that angle.)

The next step is to allow analytic geometry (and thence algebraic geometry) to enter the picture, by using Cartesian coordinates. We may identify the Euclidean plane with the analytic plane {{\bf R}^2 := \{ (x,y): x,y \in {\bf R}\}}; we may also normalise {A, B} to be the points {A = (0,0)}, {B = (1,0)} by this identification. We will also restrict {C} to lie on the unit circle {S^1 := \{ (x,y) \in {\bf R}^2: x^2+y^2 = 1 \}}, so that there is now just one degree of freedom in the configuration {(A,B,C)}. One can describe a line in {{\bf R}^2} by an equation of the form

\displaystyle \{ (x,y) \in {\bf R}^2: ax+by+c = 0 \}

(with {a,b} not both zero), and describe a circle in {{\bf R}^2} by an equation of the form

\displaystyle \{ (x,y) \in {\bf R}^2: (x-x_0)^2 + (y-y_0)^2 = r^2 \}

with {r} non-zero. There is some non-uniqueness in these representations: for the line, one can multiply {a,b,c} by the same constant without altering the line, and for the circle, one can replace {r} by {-r}. However, this will not be a serious concern for us. Note that any two distinct points {P = (x_1,y_1)}, {Q = (x_2,y_2)} determine a line

\displaystyle \{ (x,y) \in {\bf R}^2: x y_1 - xy_2 - y x_1 + y x_2 + x_1 y_2 - x_2 y_1 = 0 \}

and given three points {O = (x_0,y_0)}, {A = (x_1,y_1)}, {B = (x_2,y_2)}, one can form a circle

\displaystyle \{ (x,y) \in {\bf R}^2: (x-x_0)^2 + (y-y_0)^2 = (x_1-x_2)^2 + (y_1-y_2)^2\}

with centre {O} and radius {|AB|}. Given two distinct non-parallel lines

\displaystyle \ell = \{ (x,y) \in {\bf R}^2: ax+by+c=0\}

and

\displaystyle \ell' = \{ (x,y) \in {\bf R}^2: a'x+b'y+c'=0\},

their unique intersection point is given as

\displaystyle ( \frac{bc'-b'c}{ab'-ba'}, \frac{a'c-c'a}{ab'-ba'} );

similarly, given two circles

\displaystyle c_1 = \{ (x,y) \in {\bf R}^2: (x-x_1)^2 + (y-y_1)^2 = r_1^2 \}

and

\displaystyle c_2 = \{ (x,y) \in {\bf R}^2: (x-x_2)^2 + (y-y_2)^2 = r_2^2 \},

their points of intersection (if they exist in {{\bf R}^2}) are given as

\displaystyle (x_1,y_1) + t (x_2-x_1,y_2-y_1) \pm (\frac{r_1^2 - t^2 d^2}{d^2})^{1/2} (y_1-y_2,x_2-x_1) \ \ \ \ \ (1)

 

where

\displaystyle t := \frac{1}{2} - \frac{r_2^2-r_1^2}{2d^2}

and

\displaystyle d^2 := (x_1-x_2)^2+(y_1-y_2)^2,

and the points of intersection between {\ell} and {c_1} (if they exist in {{\bf R}^2}) are given as

\displaystyle (x_1,y_1) - \frac{ax_1+by_1+c}{a^2+b^2}(a,b) \pm \sqrt{\frac{r_1^2}{a^2+b^2} - (\frac{ax_1+by_1+c}{a^2+b^2})^2} (b,-a). \ \ \ \ \ (2)
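The following sketch (our own; the helper names are hypothetical) checks the line-line intersection formula and formulas (1) and (2) numerically, by verifying that each output point satisfies the equations of both curves. In the line-circle formula, the coefficient of {(b,-a)} is written as {\sqrt{r_1^2/(a^2+b^2) - s^2}} with {s = (ax_1+by_1+c)/(a^2+b^2)}, which is what is needed when {(a,b)} is not a unit vector. The test line {3x+4y-2=0} deliberately has {a^2+b^2 = 25}.

```python
import math


def line_line(a, b, c, a2, b2, c2):
    """Intersection of the non-parallel lines ax+by+c=0 and a2 x+b2 y+c2=0 (Cramer's rule)."""
    det = a * b2 - b * a2
    return ((b * c2 - b2 * c) / det, (a2 * c - c2 * a) / det)


def circle_circle(x1, y1, r1, x2, y2, r2):
    """Formula (1): both intersection points of two circles that meet in R^2."""
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    t = 0.5 - (r2**2 - r1**2) / (2 * d2)
    k = math.sqrt((r1**2 - t**2 * d2) / d2)
    bx, by = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
    return [(bx + s * k * (y1 - y2), by + s * k * (x2 - x1)) for s in (1, -1)]


def line_circle(a, b, c, x1, y1, r1):
    """Formula (2): both intersection points of the line ax+by+c=0 and a circle."""
    n2 = a**2 + b**2
    s0 = (a * x1 + b * y1 + c) / n2  # foot of perpendicular is centre - s0*(a, b)
    k = math.sqrt(r1**2 / n2 - s0**2)  # r1^2 is divided by a^2+b^2 as well
    fx, fy = x1 - s0 * a, y1 - s0 * b
    return [(fx + s * k * b, fy - s * k * a) for s in (1, -1)]


def on_line(P, a, b, c, tol=1e-9):
    return abs(a * P[0] + b * P[1] + c) < tol


def on_circle(P, x0, y0, r, tol=1e-9):
    return abs((P[0] - x0) ** 2 + (P[1] - y0) ** 2 - r**2) < tol
```

For example, line_line(1.0, 2.0, -3.0, 4.0, -1.0, 2.0) gives {(-1/9, 14/9)} up to rounding, which lies on both {x+2y-3=0} and {4x-y+2=0}.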

 

The precise expressions given above are not particularly important for our argument, save to note that these expressions are always algebraic functions of the input coordinates such as {x_0,x_1,x_2,y_0,y_1,y_2,a,b,c,a',b',c',r_1,r_2}, defined over the reals {{\bf R}}, and that the only operation needed here besides the arithmetic operations of addition, subtraction, multiplication, and division is the square root. Thus, we see that any particular construction of, say, a line {\ell} from a configuration {(A,B,C)} will locally be an algebraic function of {C} (recall that we have already fixed {A,B}), and this definition can be extended until one reaches a degeneracy (two points, lines, or circles collide, two curves become tangent, or two lines become parallel); however, this degeneracy only occurs in a proper real algebraic set of configurations, and in particular for {C} in a dimension zero subset of the circle {S^1}.

These degeneracies are annoying because they disconnect the circle {S^1}, and can potentially block off large regions of that circle on which the construction is not even defined (because two circles stop intersecting, or a circle and line stop intersecting, in {{\bf R}^2}, due to the lack of a real square root for negative numbers). To fix this, we move now from the real plane {{\bf R}^2} to the complex plane {{\bf C}^2}. Note that the algebraic definitions of a line and a circle continue to make perfect sense in {{\bf C}^2} (with coefficients such as {a,b,c,x_0,y_0,r} now allowed to be complex numbers instead of real numbers), and the algebraic intersection formulae given previously continue to make sense in the complex setting. The point {C} is now allowed to range in the complex circle {S^1_{\bf C} = \{ (x,y) \in {\bf C}^2: x^2+y^2 = 1 \}}, which is a Riemann surface (conformal, after stereographic projection, to the Riemann sphere {{\bf C} \cup \{\infty\}} with two points removed). Furthermore, because all non-zero complex numbers have square roots, any given construction that was valid for at least one configuration is now valid (though possibly multi-valued) as an algebraic function on {S^1_{\bf C}} outside of a dimension zero set of singularities, i.e. outside of a finite number of exceptional values of {C}. But note now that these singularities do not disconnect the complex circle {S^1_{\bf C}}, which has topological dimension two instead of one.
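To see concretely why passing to {{\bf C}^2} helps, the sketch below (our own; the helper names are hypothetical) evaluates formula (1) with a complex square root. It uses the unit circles centred at {(0,0)} and {(3,0)}, which do not meet in {{\bf R}^2}. The resulting points have non-real coordinates but satisfy both circle equations exactly in the algebraic sense, with squares rather than absolute values. The sketch assumes the two centres are distinct real points, so that {d^2 \neq 0}.

```python
import cmath


def circle_circle_complex(x1, y1, r1, x2, y2, r2):
    """Formula (1) with a complex square root: two points (x, y) of C^2 lying on
    both circles, whether or not the real circles meet."""
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    t = 0.5 - (r2**2 - r1**2) / (2 * d2)
    k = cmath.sqrt((r1**2 - t**2 * d2) / d2)  # defined for every complex argument
    bx, by = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
    return [(bx + s * k * (y1 - y2), by + s * k * (x2 - x1)) for s in (1, -1)]


def residual(P, x0, y0, r):
    """(x - x0)^2 + (y - y0)^2 - r^2, computed algebraically (no absolute values)."""
    x, y = P
    return (x - x0) ** 2 + (y - y0) ** 2 - r**2
```

Here the two points come out as {(3/2, \pm i\sqrt{5}/2)}, up to rounding. Indeed {(3/2)^2 + (i\sqrt{5}/2)^2 = 9/4 - 5/4 = 1}, and likewise for the circle centred at {(3,0)}.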

As mentioned earlier, a line {\ell} given by such a construction may or may not trisect the original angle {\angle BAC}. But this trisection property can be expressed algebraically (e.g. using the triple angle formulae from trigonometry, or by building rotation matrices), and in particular makes sense over {{\bf C}}. Thus, for any given construction of a line {\ell}, the set of {C} in {S^1_{\bf C}} for which the construction is non-degenerate and trisects {\angle BAC} is a constructible set (a boolean combination of algebraic sets). But {S^1_{\bf C}} is an irreducible one-dimensional complex variety. As such, the aforementioned set of {C} is either generic (the complement of a dimension zero algebraic set, i.e. of finitely many points), or has dimension zero (i.e. is finite). (Here we are implicitly using the fundamental theorem of algebra, because the basic dimension theory of algebraic geometry only works properly over algebraically closed fields.)

On the other hand, there are at most countably many constructions, and by hypothesis, for each choice of {C} in the real circle {S^1} other than {B}, at least one of these constructions has to trisect the angle. If each construction trisected the angle for only finitely many {C}, then countably many finite sets would cover the uncountable circle {S^1}, which is absurd. (One could also appeal here to the Baire category theorem, to countable additivity of Lebesgue measure, or to the algebraic geometry fact that an algebraic variety over an uncountable field cannot be covered by the union of countably many algebraic sets of smaller dimension.) We conclude that there is a single construction which trisects the angle {\angle BAC} for a generic choice of {C}; i.e. for all {C} in {S^1_{\bf C}} outside of a finite set of points, this construction, amongst its multiple possible values, is able to output at least one line {\ell} that trisects {\angle BAC}.

Now one performs monodromy. Suppose we move {C} around a closed loop in {S^1_{\bf C}} that avoids all points of degeneracy. Then all the other points, lines, and circles constructed from {A, B, C} can be continuously extended from an initial configuration as discussed earlier, with each such object tracing out its own path in its own configuration space. Because of the presence of square roots in constructions such as the intersection (1) between two circles, or the intersection (2) between a circle and a line, these constructions may map a closed loop to an open loop; but because the square root function forms a double cover of {{\bf C} \backslash 0}, we see that any closed loop in {{\bf C} \backslash 0}, if doubled, will continue to be a closed loop upon taking a square root. (Alternatively, one can argue geometrically rather than algebraically, noting that in the intersection of (say) two non-degenerate circles {c_1, c_2}, there are only two possible choices for the intersection point of these two circles, and so if one performs monodromy along a loop of possible pairs {(c_1,c_2)} of circles, either these two choices return to where they initially started, or are swapped; so if one doubles the loop, one must necessarily leave the intersection points unchanged.) Iterating this, we see that any object constructed by straightedge and compass from {A,B,C} must have period {2^k} for some power of two {2^k}, in the sense that if one iterates a loop of {C} in {S^1_{\bf C}} avoiding degenerate points {2^k} times, the object must return to where it started. (In more algebraic terminology: the monodromy group must be a {2}-group.)
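The square-root mechanism can be seen in isolation. In this sketch (our own; the name loops_until_return is hypothetical), {w = \sqrt{z}} is followed by continuity as {z} runs around a circle avoiding {0}. A loop winding once around {0} swaps the two square roots, so the doubled loop is needed for {w} to return. A loop not enclosing {0} returns at once.

```python
import cmath
import math


def loops_until_return(centre, radius, steps=2000, max_loops=8):
    """Follow w = sqrt(z) by continuity while z runs around the circle
    |z - centre| = radius (assumed to avoid 0), and return the number of
    loops after which w is back at its starting value."""
    w0 = cmath.sqrt(centre + radius)
    w = w0
    for k in range(1, max_loops * steps + 1):
        z = centre + radius * cmath.exp(2j * math.pi * k / steps)
        root = cmath.sqrt(z)
        # of the two square roots +-root, keep the one closest to the previous w
        w = root if abs(root - w) <= abs(root + w) else -root
        if k % steps == 0 and abs(w - w0) < 1e-9:
            return k // steps
    return None
```

Here loops_until_return(0, 1) gives 2, while loops_until_return(3, 1) gives 1. Continuity is used instead of cmath.sqrt alone because the principal square root jumps across the negative real axis.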

Now, one traverses {C} along a slight perturbation of a single rotation of the real unit circle {S^1}, taking a slight detour into {S^1_{\bf C}} around the finitely many degeneracy points one encounters along the way. Since {\ell} has to trisect the angle {\angle BAC} at all but finitely many points of this path, while varying continuously with {C}, we see that when {C} traverses a full rotation, {\ell} has only traversed one third of a rotation (or two thirds, depending on which trisection one obtained). Since rotating a line {k} times by {120^\circ} (or by {240^\circ}) returns it to its original position only when {3} divides {k}, the period of {\ell} must be a multiple of three; but this contradicts Lemma 2, and the claim follows.