In my discussion of the Oppenheim conjecture in my recent post on Ratner’s theorems, I mentioned in passing the simple but crucial fact that the (orthochronous) special orthogonal group SO(Q)^+ of an indefinite quadratic form on {\Bbb R}^3 can be generated by unipotent elements. This is not a difficult fact to prove, as one can simply diagonalise Q and then explicitly write down some unipotent elements (the magic words here are “null rotations”). But this is a purely algebraic approach; I thought it would also be instructive to show the geometric (or dynamic) reason for why unipotent elements appear in the orthogonal group of indefinite quadratic forms in three dimensions. (I’ll give away the punch line right away: it’s because the parabola is a conic section.) This is not a particularly deep or significant observation, and will not be surprising to the experts, but I would like to record it anyway, as it allows me to review some useful bits and pieces of elementary linear algebra.

— Unipotent matrices —

Before we get to unipotent elements of a group, let us first understand geometrically what a unipotent matrix (or linear transformation) A is. Suppose we consider an orbit x_n = A^n x of some initial vector x with respect to this transformation A (thus x_n is a linear recurrence sequence). How does x_n behave geometrically as n \to \infty?

Despite the simple and explicit description x_n = A^n x of the orbit, the geometric behaviour can be rather complicated, depending crucially on the spectrum of A (and, to a lesser extent, on the choice of x). If for instance A has a real eigenvalue \lambda > 1, and x is an eigenvector of A with eigenvalue \lambda, then we will of course have x_n = \lambda^n x_0, thus this orbit will grow exponentially. Similarly, if A has a real eigenvalue between 0 and 1, then it is possible for the orbit to decay exponentially.

If one has eigenvalues with a complex phase, one can have oscillation. If for instance A is the rotation matrix A = R_\theta := \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix} corresponding to anticlockwise rotation around the origin by some non-trivial angle \theta (and which has complex eigenvalues e^{i\theta} and e^{-i\theta}), and (say) x_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, then the orbit x_n = \begin{pmatrix} \cos n \theta \\ \sin n\theta \end{pmatrix} will oscillate around the unit circle indefinitely.
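To see this numerically, here is a minimal numpy sketch (the angle \theta = 0.7 and the number of steps are arbitrary choices of mine): every point of the orbit has unit norm, so the orbit oscillates around the unit circle without growth or decay.

import numpy as np

theta = 0.7  # an arbitrary non-trivial angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.array([1.0, 0.0])
for n in range(6):
    # x_n = (cos n*theta, sin n*theta) always has norm 1
    print(n, x, np.linalg.norm(x))
    x = A @ x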

If an eigenvalue has non-trivial magnitude and non-trivial phase, one gets a combination of exponential growth or decay and oscillation, leading for instance to orbits which follow a logarithmic spiral (this will be the case for instance if A = \lambda R_\theta for some rotation matrix R_\theta and some dilation factor \lambda > 0).

One can have even more complicated behaviour if there are multiple eigenvalues in play. Consider for instance the matrix A := \begin{pmatrix} \lambda & 0 \\ 0 & 1/\lambda \end{pmatrix} with \lambda > 1, with the initial vector x := \begin{pmatrix} y \\ z \end{pmatrix} with both y and z non-zero (so that x has a non-trivial presence in both the unstable and stable modes of A). Then the orbit x_n = \begin{pmatrix} \lambda^n y \\ \lambda^{-n} z\end{pmatrix} will expand exponentially in the unstable mode and contract exponentially in the stable mode, and the orbit will lie along the rectangular hyperbola \{ \begin{pmatrix} a \\ b \end{pmatrix}: ab = yz \}.
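Again as a quick numerical sketch (the choices \lambda = 2, y = 3, z = 5 are arbitrary): the product of the two coordinates is conserved, so the orbit indeed stays on the rectangular hyperbola ab = yz while expanding in the unstable mode and contracting in the stable one.

import numpy as np

lam = 2.0                      # arbitrary choice of lambda > 1
A = np.diag([lam, 1.0 / lam])  # unstable and stable modes
x = np.array([3.0, 5.0])       # y = 3, z = 5, both non-zero
for n in range(6):
    # the invariant a_n * b_n = y*z = 15 pins the orbit to a rectangular hyperbola
    print(n, x, x[0] * x[1])
    x = A @ x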

As the above examples show, orbits of linear transformations can exhibit a variety of behaviours, from exponential growth to exponential decay to oscillation to some combination of all three. But there is one special case in which the behaviour is much simpler, namely that the orbit remains polynomial. This occurs when A is a unipotent matrix, i.e. A = I + N where N is nilpotent (i.e. N^m=0 for some finite m). A typical example of a unipotent matrix is

\displaystyle A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} (1)

(and indeed, by the Jordan canonical form, all unipotent matrices are similar to direct sums of matrices of this type). For unipotent matrices, the binomial formula terminates after m terms, yielding a polynomial expansion for A^n:

\displaystyle A^n = (I + N)^n

\displaystyle = I + n N + \frac{n(n-1)}{2} N^2 + \ldots + \frac{n \ldots (n-m+2)}{(m-1)!} N^{m-1}.

From this we easily see that, regardless of the choice of initial vector x, the coefficients of x_n are polynomial in n. (Conversely, if the coefficients of x_n are polynomial in n for every x, it is not hard to show that A is unipotent; I’ll leave this as an exercise.) It is instructive to see what is going on at the coefficient level, using the matrix (1) as an example. If we express the orbit x_n in coordinates as x_n = \begin{pmatrix} a_n \\ b_n \\ c_n \end{pmatrix}, then the recurrence x_{n+1} = A x_n becomes

a_{n+1} = a_n + b_n

b_{n+1} = b_n + c_n

c_{n+1} = c_n.

We thus see that the sequence c_n is constant, the sequence b_n grows linearly, and a_n grows quadratically, so the whole orbit x_n has polynomial coefficients. If one views the recurrence x_{n+1} = A x_n as a dynamical system, the polynomial nature of the dynamics is caused by the absence of (both positive and negative) feedback loops: c affects b, and b affects a, but there is no loop in which a component ultimately affects itself; such loops are the source of exponential growth, exponential decay, and oscillation. Indeed, one can view this absence of feedback loops as a definition of unipotence.
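One can watch this polynomial behaviour directly in a short numpy sketch (the initial vector is an arbitrary choice of mine): taking second finite differences of the first coordinate produces a constant sequence, confirming that a_n is a quadratic polynomial in n.

import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])   # the unipotent matrix (1)
x = np.array([0, 0, 1])     # arbitrary initial vector with c_0 = 1

orbit = []
for n in range(8):
    orbit.append(x.copy())
    x = A @ x

a = np.array([v[0] for v in orbit])
print(a)              # 0, 0, 1, 3, 6, 10, 15, 21: quadratic growth
print(np.diff(a, 2))  # constant second differences: a_n is a quadratic in n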

For the purposes of proving a dynamical theorem such as Ratner’s theorem, unipotence is important for several reasons. The lack of exponentially growing modes means that the dynamics is not exponentially unstable going forward in time; similarly, the lack of exponentially decaying modes means that the dynamics is not exponentially unstable going backward in time. The lack of oscillation does not improve the stability further, but it does have an important effect on the smoothness of the dynamics. Indeed, because of this lack of oscillation, orbits which are polynomial in nature obey an important dichotomy: either they go to infinity, or they are constant. There is a quantitative version of this statement, known as Bernstein’s inequality: if a polynomial remains bounded over a long interval, then its derivative is necessarily small. (From a Fourier-analytic perspective, being polynomial with low degree is analogous to being “low frequency”; the Fourier-analytic counterpart of Bernstein’s inequality is closely related to the Sobolev inequality, and is extremely useful in PDE. But I digress.) These facts seem to play a fundamental role in all arguments that yield Ratner-type theorems.
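For concreteness, here is one standard quantitative formulation of the inequality just mentioned (strictly speaking, this polynomial form is usually attributed to the Markov brothers; it is a close cousin of Bernstein’s inequality, and I state it here without proof): if p is a polynomial of degree n with |p(x)| \leq M on an interval I of length L, then

\displaystyle \sup_{x \in I} |p'(x)| \leq \frac{2n^2 M}{L}.

Thus a polynomial of controlled degree which stays bounded on a long interval must have a uniformly small derivative throughout that interval.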

— Unipotent actions —

Now that we understand unipotent matrices, let us consider what it means for the action g: x \mapsto gx of a group element g \in G on a homogeneous space G/\Gamma to be unipotent. By definition, this means that the adjoint action g: X \mapsto g X g^{-1} on the Lie algebra {\mathfrak g} of G is unipotent. By the above discussion, this is the same as saying that the orbit (g^n X g^{-n})_{n \in {\Bbb Z}} always behaves polynomially in n.

This statement can be interpreted via the dynamics on the homogeneous space G/\Gamma. Consider a point x \in G/\Gamma, and look at the orbit (g^n x)_{n \in {\Bbb Z}}. Now let us perturb x infinitesimally in the direction of some Lie algebra element X to create a new point x_\epsilon := (1 + \epsilon X) x, where one should think of \epsilon as being infinitesimally small (or alternatively, one can insert errors of O(\epsilon^2) all over the place). Then the perturbed orbit g^n x_\epsilon at time n is located at

\displaystyle g^n (1 + \epsilon X) x = (1 + \epsilon g^n X g^{-n}) g^n x.

If g is unipotent, we thus see that the two orbits (g^n x)_{n \in {\Bbb Z}} and (g^n x_\epsilon)_{n \in {\Bbb Z}} only diverge polynomially in n, without any oscillation. In particular, we have the dichotomy that two orbits either diverge, or are translates of each other, together with Bernstein-like quantitative formulations of this dichotomy. This dichotomy is a crucial component in the proof of Ratner’s theorem, and explains why we need the group action to be generated by unipotent elements.
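One can see this polynomial divergence concretely in SL(2,{\Bbb R}) with a small numpy sketch (the choices of g and X below are arbitrary): for the standard unipotent shear g, the conjugates g^n X g^{-n} have entries that are polynomial (here quadratic) in n, with no oscillation.

import numpy as np

g = np.array([[1.0, 1.0],
              [0.0, 1.0]])  # unipotent shear; g^n = [[1, n], [0, 1]]
X = np.array([[0.0, 0.0],
              [1.0, 0.0]])  # an arbitrary Lie algebra direction

mp = np.linalg.matrix_power
for n in range(5):
    # g^n X g^{-n} = [[n, -n^2], [1, -n]]: entries grow polynomially in n
    print(n, (mp(g, n) @ X @ mp(g, -n)).tolist())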

— Elliptic, parabolic, and hyperbolic elements of SL(2,{\Bbb R}) —

I have described the distinction between exponential growth/decay, oscillation, and unipotent (polynomial) behaviour. This distinction is particularly easy to visualise geometrically in the context of actions of SL(2,{\Bbb R}) on the (affine) plane. Specifically, let us consider an affine linear recurrence sequence

x_{n+1} := A x_n + b; \quad x_0 := x (2)

where x \in {\Bbb R}^2 is an element of the plane, A \in SL(2,{\Bbb R}) is a special linear transformation (i.e. a 2 \times 2 matrix of determinant 1), and b \in {\Bbb R}^2 is a shift vector. If A-I is invertible, one can eliminate the shift b by translating the orbit x_n, or more specifically making the substitution

y_n := x_n + (A-I)^{-1} b

which simplifies (2) to

y_{n+1} = A y_n; \quad y_0 = x + (A-I)^{-1} b

which allows us to solve for the orbit x_n explicitly as

x_n = A^n (x + (A-I)^{-1} b) - (A-I)^{-1} b.
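As a quick numerical check of this closed form (a sketch; the particular matrix A, shift b, and initial point x below are arbitrary choices with A - I invertible):

import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 0.5]])   # determinant 1, and A - I is invertible
b = np.array([1.0, 1.0])     # arbitrary shift vector
x = np.array([0.3, -0.7])    # arbitrary initial point

# iterate the affine recurrence (2) directly
xn = x.copy()
N = 6
for _ in range(N):
    xn = A @ xn + b

# compare with the closed form x_n = A^n (x + (A-I)^{-1} b) - (A-I)^{-1} b
c = np.linalg.solve(A - np.eye(2), b)
closed = np.linalg.matrix_power(A, N) @ (x + c) - c
print(xn, closed)            # the two agree up to rounding error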

Of course, we have to analyse things a little differently in the degenerate case that A-I is not invertible; in this case the lower order term b plays a more significant role. Leaving that case aside for the moment, we see from the above formula that the behaviour of the orbit x_n is going to be largely controlled by the spectrum of A. Here A will have two (generalised) eigenvalues \lambda, 1/\lambda whose product is 1 (since \det(A)=1) and whose sum is real (since A has real trace). This gives three possibilities:

  1. Elliptic case. Here \lambda = e^{i\theta} is a non-trivial unit phase. Then A is similar (after a real linear transformation) to the rotation matrix R_\theta described earlier, and so the orbit x_n lies along a linear transform of a circle, i.e. the orbit lies along an ellipse.
  2. Hyperbolic case. Here \lambda is real with |\lambda| > 1 or 0 < |\lambda| < 1. In this case A is similar to the diagonal matrix \begin{pmatrix} \lambda & 0 \\ 0 & 1/\lambda \end{pmatrix}, and so by previous discussion we see that the orbit x_n lies along a linear transform of a rectangular hyperbola, i.e. the orbit lies along a general hyperbola.
  3. Parabolic case. This is the boundary case between the elliptic and hyperbolic cases, in which \lambda = 1. (The other boundary case \lambda = -1 reduces to this one upon replacing A by A^2, which splits the orbit into two suborbits.) Then either A is the identity (in which case x_n travels along a line, or is constant), or else (by the Jordan canonical form) A is similar to the matrix \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. Applying a linear change of coordinates, we thus see that the affine recurrence x_{n+1} = A x_n + b is equivalent to the 2 \times 2 system

    y_{n+1} = y_n + z_n + c
    z_{n+1} = z_n + d

    for some real constants c, d and some real sequences y_n, z_n. If d is non-zero, we see that z_n varies linearly in n and y_n varies quadratically in n, and so (y_n,z_n) lives on a parabola. Undoing the linear change of coordinates, we thus see in this case that the original orbit x_n also lies along a parabola. (If d vanishes, the orbit lies instead on a line, or is constant.)

Thus we see that all elements of SL(2,{\Bbb R}) preserve some sort of conic section. The elliptic elements trap their orbits along ellipses, the hyperbolic elements trap their orbits along hyperbolae, and the parabolic elements trap their orbits along parabolae (or along lines, in some degenerate cases). The elliptic elements thus generate oscillation, the hyperbolic elements generate exponential growth and decay, and the parabolic elements are unipotent and generate polynomial growth. (If one interprets elements of SL(2,{\Bbb R}) as area-preserving linear or affine transformations, then elliptic elements are rotations around some origin (and in some coordinate system), hyperbolic elements are compressions along one axis and dilations along another, and parabolic elements are shear transformations and translations.)
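In concrete terms, the trichotomy can be read off from the trace, since the eigenvalues \lambda, 1/\lambda satisfy \lambda + 1/\lambda = \hbox{tr}(A): one has the elliptic case when |\hbox{tr}(A)| < 2, the parabolic case when |\hbox{tr}(A)| = 2 (this boundary also includes \pm the identity), and the hyperbolic case when |\hbox{tr}(A)| > 2. A minimal sketch (the helper name classify_sl2 is my own invention):

import numpy as np

def classify_sl2(A, tol=1e-12):
    """Classify a matrix in SL(2, R) by its trace lambda + 1/lambda."""
    t = abs(np.trace(A))
    if t < 2 - tol:
        return "elliptic"    # eigenvalues e^{+-i theta}: oscillation
    if t > 2 + tol:
        return "hyperbolic"  # real eigenvalues lambda, 1/lambda: growth/decay
    return "parabolic"       # lambda = +-1: polynomial orbits (up to sign)

print(classify_sl2(np.array([[0.0, -1.0], [1.0, 0.0]])))   # rotation: elliptic
print(classify_sl2(np.array([[2.0, 0.0], [0.0, 0.5]])))    # diagonal: hyperbolic
print(classify_sl2(np.array([[1.0, 1.0], [0.0, 1.0]])))    # shear: parabolic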

[It is curious that every element of SL(2,{\Bbb R}) preserves at least one non-trivial quadratic form; this statement is highly false in higher dimensions (consider for instance what happens to diagonal matrices). I don’t have a “natural” explanation of this fact – some sort of fixed point theorem at work, perhaps? I can cobble together a proof using the observations that (a) every matrix in SL(2,{\Bbb R}) is similar to its inverse, (b) the space of quadratic forms on {\Bbb R}^2 is odd-dimensional, (c) any linear transformation on an odd-dimensional vector space which is similar to its inverse has at least one eigenvalue equal to \pm 1, (d) the action of a non-degenerate linear transformation on quadratic forms preserves positive definiteness, and thus cannot have negative eigenvalues, but this argument seems rather ad hoc to me.]

One can view the parabolic elements of SL(2,{\Bbb R}) as the limit of elliptic or hyperbolic ones in a number of ways. For instance, the matrix \begin{pmatrix} 1 & 1 \\ \epsilon & 1+\epsilon \end{pmatrix} (which has determinant 1 and trace 2+\epsilon) is hyperbolic when \epsilon > 0, parabolic when \epsilon = 0, and elliptic when -4 < \epsilon < 0. This is related to how the hyperbola, parabola, and ellipse emerge as sections of the light cone. Another way to obtain the parabola as a limit is to view that parabola as an infinitely large ellipse (or hyperbola), with centre infinitely far away. For instance, the ellipse of vertical radius R and horizontal radius \sqrt{R} centred at (0,R) is given by the equation \frac{x^2}{R} + \frac{(y-R)^2}{R^2} = 1, which can be rearranged as y = \frac{1}{2} x^2 + \frac{1}{2R} y^2. In the limit R \to \infty, this ellipse becomes the parabola y = \frac{1}{2} x^2, and rotations associated with those ellipses can converge to parabolic affine maps of the type described above. A similar construction allows one to view the parabola as a limit of hyperbolae; incidentally, one can use (the Fourier transform of) this limit to show (formally, at least) that the Schrödinger equation emerges as the non-relativistic limit of the Klein-Gordon equation.

— The Lorentz group —

Every non-degenerate quadratic form Q on d variables comes with its own symmetry group SO(Q) \leq SL(d,{\Bbb R}), defined as the group of special linear transformations which preserve Q. (Note that Q determines a translation-invariant pseudo-Riemannian metric, and thus a Haar measure; so any transformation which preserves Q must be volume-preserving and thus have a determinant of \pm 1. So the requirement that the linear transformation be special is not terribly onerous.) Equivalently, SO(Q) is the space of special linear transformations which preserve each of the level sets \{ x \in {\Bbb R}^d: Q(x) = \hbox{const} \} (which, by definition, is a quadric surface).

A non-degenerate quadratic form can always be diagonalised (e.g. by applying the Gram-Schmidt orthogonalisation process), and so after a linear change of coordinates one can express Q as

Q(x_1,\ldots,x_d) = x_1^2 + \ldots + x_r^2 - x_{r+1}^2 - \ldots - x_d^2

for some 0 \leq r \leq d. The pair (r,d-r) is the signature of Q, and SO(Q) is isomorphic to the group SO(r,d-r). The signature is an invariant of Q; this is Sylvester’s law of inertia.

In the Euclidean (i.e. definite) case r=d (or r=0), the level sets of Q are spheres (in diagonalised form) or ellipsoids (in general), and so the orbits of elements in SO(Q) \cong SO(d) stay trapped on spheres or ellipsoids. Thus their orbits cannot exhibit exponential growth or decay, or non-trivial polynomial behaviour; they must instead oscillate, much like the elliptic elements of SL(2,{\Bbb R}). In particular, SO(Q) does not contain any non-trivial unipotent elements.

In the indefinite case d=2, r=1, the level sets of Q are hyperbolae (as well as the light cone \{ (x_1,x_2): x_1^2 - x_2^2 = 0 \}, which in two dimensions is just a pair of intersecting lines). It is then geometrically clear that most elements of SO(Q) \cong SO(1,1) are going to be hyperbolic, as their orbits will typically escape to infinity along hyperbolae. (The only exceptions are the identity and the negative identity.) Elements of SO(1,1) are also known as Lorentz boosts. (More generally, SO(d,1) (or SO(1,d)) is the structure group for special relativity in d space and 1 time dimensions.)
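A small numerical sketch (the rapidity t = 0.8 and the test vector are arbitrary choices): the standard boost has eigenvalues e^t and e^{-t}, so it is a hyperbolic element, and it preserves the form x_1^2 - x_2^2.

import numpy as np

t = 0.8                                    # arbitrary rapidity
B = np.array([[np.cosh(t), np.sinh(t)],
              [np.sinh(t), np.cosh(t)]])   # a Lorentz boost in SO(1,1)

Q = lambda v: v[0]**2 - v[1]**2            # the indefinite form of signature (1,1)
v = np.array([3.0, 1.0])
print(Q(v), Q(B @ v))                      # equal: the boost preserves Q
print(np.linalg.eigvals(B))                # e^t, e^{-t}: hyperbolic behaviour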

Now we turn to the case of interest, namely d=3 and Q indefinite, thus r=1 or r=2. By changing the sign of Q if necessary we may take r=1, and after diagonalising we can write

Q(x_1, x_2, x_3) = x_1^2 + x_2^2 - x_3^2.

The level sets of Q are mostly hyperboloids, together with the light cone \{ (x_1,x_2,x_3): x_1^2 + x_2^2 - x_3^2 = 0 \}. So a typical element of SO(Q) \cong SO(2,1) will have orbits that are trapped inside light cones or on hyperboloids.

In general, these orbits will wander in some complicated fashion over such a cone or hyperboloid. But for some special elements of SO(Q), the orbit is contained in a smaller variety. For instance, consider a Euclidean rotation around the x_3 axis by some angle \theta. This clearly preserves Q, and the orbits of this rotation lie on horizontal circles, which are of course each contained in a hyperboloid or light cone. So we see that SO(Q) contains elliptic elements, and this is “because” we can get ellipses as sections of hyperboloids and cones, by slicing them with spacelike planes.

Similarly, if one considers a Lorentz boost in the x_1 (or x_2) direction (which mixes that spatial direction with the timelike direction x_3), we also preserve Q, and the orbits of this boost lie on vertical hyperbolae (or on a one-dimensional light cone). So we see that SO(Q) contains hyperbolic elements, which is “because” we can get hyperbolae as sections of hyperboloids and cones, by slicing them with timelike planes.

So, to get unipotent elements of SO(Q), it is clear what we should do: we should exploit the fact that parabolae are also sections of hyperboloids and cones, obtained by slicing these surfaces along null planes. For instance, if we slice the hyperboloid \{ (x_1,x_2,x_3): x_1^2 + x_2^2 - x_3^2 = 1\} with the null plane \{ (x_1,x_2,x_3): x_3 = x_2 + 1 \} we obtain the parabola \{ (x_1, x_3-1, x_3): 2x_3 = x_1^2 \}. A small amount of calculation then lets us find a linear transformation which preserves both the hyperboloid and the null plane (and thus preserves Q and preserves the parabola); indeed, if we introduce null coordinates (y_1,y_2,y_3) := (x_1, x_3-x_2, x_3+x_2), then the hyperboloid and null plane are given by the equations y_1^2 = y_2 y_3 + 1 and y_2 = 1 respectively; a little bit of algebra shows that the linear transformations (y_1,y_2,y_3) \mapsto (y_1+ay_2, y_2, y_3 + 2a y_1 + a^2 y_2) will preserve both surfaces for any constant a. This provides a one-parameter family of unipotent elements (known as null rotations) in SO(Q); this family is in fact the unipotent radical of a parabolic subgroup. By rotating the null plane around we can get many such one-parameter families, whose orbits trace out all sorts of parabolae, and it is not too hard at this point to show that these unipotent elements can in fact be used to generate all of SO(Q)^+ (note that SO(Q) itself is disconnected, and unipotent elements lie in the identity component SO(Q)^+).
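To make this concrete, here is a verification sketch (the matrix below is just the null rotation above rewritten in the original coordinates (x_1,x_2,x_3), via x_2 = (y_3-y_2)/2 and x_3 = (y_3+y_2)/2; the parameter a = 1.5 and the test vector are arbitrary choices): numerically, the null rotation preserves Q and is unipotent.

import numpy as np

a = 1.5  # arbitrary parameter of the null rotation
# the map (y_1,y_2,y_3) -> (y_1 + a y_2, y_2, y_3 + 2a y_1 + a^2 y_2),
# expressed in the original coordinates (x_1, x_2, x_3):
A = np.array([[1.0,           -a,            a],
              [  a, 1 - a**2 / 2,     a**2 / 2],
              [  a,    -a**2 / 2, 1 + a**2 / 2]])

Q = lambda v: v[0]**2 + v[1]**2 - v[2]**2

v = np.array([0.2, -1.3, 0.7])        # arbitrary test vector
print(Q(v), Q(A @ v))                 # equal: A preserves Q
N = A - np.eye(3)
print(np.linalg.matrix_power(N, 3))   # the zero matrix: A is unipotent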

[Incidentally, the fact that the parabola is a section of a cone or hyperboloid of one higher dimension allows one (via the Fourier transform) to embed solutions to the free Schrödinger equation as solutions to the wave or Klein-Gordon equations of one higher dimension; this trick allows one, for instance, to derive the conservation laws of the former from those of the latter. See for instance Exercises 2.11, 3.2, and 3.30 of my book on dispersive PDE.]

[Update, Oct 6: Typos corrected.]