In my discussion of the Oppenheim conjecture in my recent post on Ratner’s theorems, I mentioned in passing the simple but crucial fact that the (orthochronous) special orthogonal group $SO(Q)^+$ of an indefinite quadratic form on $\mathbf{R}^3$ can be generated by unipotent elements. This is not a difficult fact to prove, as one can simply diagonalise Q and then explicitly write down some unipotent elements (the magic words here are “null rotations”). But this is a purely algebraic approach; I thought it would also be instructive to show the geometric (or dynamic) reason for why unipotent elements appear in the orthogonal group of indefinite quadratic forms in three dimensions. (I’ll give away the punch line right away: it’s because the parabola is a conic section.) This is not a particularly deep or significant observation, and will not be surprising to the experts, but I would like to record it anyway, as it allows me to review some useful bits and pieces of elementary linear algebra.
— Unipotent matrices —
Before we get to unipotent elements of a group, let us first understand geometrically what a unipotent matrix (or linear transformation) A is. Suppose we consider an orbit $x_n = A^n x$ of some initial vector x with respect to this transformation A (thus $x_n$ is a linear recurrence sequence). How does $x_n$ behave geometrically as $n \to \infty$?
Despite the simple and explicit description of the orbit, the geometric behaviour can be rather complicated, depending crucially on the spectrum of A (and, to a lesser extent, on the choice of x). If for instance A has an eigenvalue $\lambda$ with $\lambda > 1$, and x is an eigenvector of A with eigenvalue $\lambda$, then we will of course have $x_n = \lambda^n x$, thus this orbit will grow exponentially. Similarly, if one has an eigenvalue between 0 and 1, then it is possible for the orbit to decay exponentially.
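If one has eigenvalues with a complex phase, one can have oscillation. If for instance A is the rotation matrix $A = R_\theta := \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ corresponding to anticlockwise rotation around the origin by some non-trivial angle $\theta$ (and which has complex eigenvalues $e^{i\theta}$ and $e^{-i\theta}$), and (say) $x = (1,0)$, then the orbit $x_n = (\cos n\theta, \sin n\theta)$ will oscillate around the unit circle indefinitely.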
If an eigenvalue has non-trivial magnitude and non-trivial phase, one gets a combination of exponential growth or decay and oscillation, leading for instance to orbits which follow a logarithmic spiral (this will be the case for instance if $A = \lambda R_\theta$ for some rotation matrix $R_\theta$ and some dilation factor $\lambda > 0$, $\lambda \neq 1$).
One can have even more complicated behaviour if there are multiple eigenvalues in play. Consider for instance the matrix $A := \begin{pmatrix} \lambda & 0 \\ 0 & 1/\lambda \end{pmatrix}$ with $\lambda > 1$, with the initial vector $x := (y, z)$ with both y and z non-zero (so that x has a non-trivial presence in both the unstable and stable modes of A). Then the orbit $x_n = (\lambda^n y, \lambda^{-n} z)$ will expand exponentially in the unstable mode and contract exponentially in the stable mode, and the orbit will lie along the rectangular hyperbola $\{ (a,b) : ab = yz \}$.
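As a quick sanity check of this last example, here is a minimal Python sketch. It assumes A is the diagonal matrix with entries $\lambda$ and $1/\lambda$; the helper name `diag_orbit` and the sample values ($\lambda = 2$, initial vector (3, 5)) are purely illustrative choices of mine. It iterates the map with exact rational arithmetic and records the product of the two coordinates, which should never change.

```python
from fractions import Fraction

def diag_orbit(lam, y, z, steps):
    """Iterate (a, b) -> (lam * a, b / lam), i.e. apply diag(lam, 1/lam)."""
    a, b = Fraction(y), Fraction(z)
    orbit = [(a, b)]
    for _ in range(steps):
        a, b = lam * a, b / lam
        orbit.append((a, b))
    return orbit

# Illustrative values: lam = 2, initial vector (3, 5).
orbit = diag_orbit(Fraction(2), 3, 5, steps=6)

# The first coordinate grows like 2^n and the second decays like 2^(-n),
# but their product is the same at every step.
products = [a * b for a, b in orbit]
```

The constant product is exactly the statement that the orbit stays on one rectangular hyperbola.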
As the above examples show, orbits of linear transformations can exhibit a variety of behaviours, from exponential growth to exponential decay to oscillation to some combination of all three. But there is one special case in which the behaviour is much simpler, namely that the orbit remains polynomial. This occurs when A is a unipotent matrix, i.e. A = I + N where N is nilpotent (i.e. $N^m = 0$ for some finite m). A typical example of a unipotent matrix is
$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \qquad (1)$$
(and indeed, by the Jordan canonical form, all unipotent matrices are similar to direct sums of matrices of this type). For unipotent matrices, the binomial formula terminates after m terms to obtain a polynomial expansion for $A^n$:
$$A^n = (I+N)^n = I + nN + \frac{n(n-1)}{2} N^2 + \ldots + \frac{n(n-1) \cdots (n-m+2)}{(m-1)!} N^{m-1}.$$
From this we easily see that, regardless of the choice of initial vector x, the coefficients of $A^n x$ are polynomial in n. (Conversely, if the coefficients of $A^n x$ are polynomial in n for every x, it is not hard to show that A is unipotent; I’ll leave this as an exercise.) It is instructive to see what is going on at the coefficient level, using the matrix (1) as an example. If we express the orbit $x_n$ in coordinates as $x_n = \begin{pmatrix} a_n \\ b_n \\ c_n \end{pmatrix}$, then the recurrence $x_{n+1} = A x_n$ becomes
$$a_{n+1} = a_n + b_n; \quad b_{n+1} = b_n + c_n; \quad c_{n+1} = c_n.$$
We thus see that the sequence $c_n$ is constant, the sequence $b_n$ grows linearly, and $a_n$ grows quadratically, so the whole orbit $x_n$ has polynomial coefficients. If one views the recurrence $x_{n+1} = A x_n$ as a dynamical system, the polynomial nature of the dynamics is caused by the absence of (both positive and negative) feedback loops: c affects b, and b affects a, but there is no loop in which a component ultimately affects itself, which is the source of exponential growth, exponential decay, and oscillation. Indeed, one can view this absence of feedback loops as a definition of unipotence.
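To see the polynomial behaviour concretely, here is a small Python sketch. It assumes that (1) is the 3×3 matrix with ones on the diagonal and superdiagonal, which is what the coordinate recurrence $a_{n+1} = a_n + b_n$, $b_{n+1} = b_n + c_n$, $c_{n+1} = c_n$ describes; the helper names are my own. It checks that N = A - I satisfies $N^3 = 0$, and compares the iterated orbit with the closed form obtained by solving the recurrence from the bottom up.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(A, x):
    """Apply the matrix A to the column vector x (given as a tuple)."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

A = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
I3 = [[int(i == j) for j in range(3)] for i in range(3)]
N = [[A[i][j] - I3[i][j] for j in range(3)] for i in range(3)]
N_cubed = mat_mul(N, mat_mul(N, N))  # should be the zero matrix

def orbit_point(x, n):
    """Compute A^n x by applying A to x a total of n times."""
    for _ in range(n):
        x = mat_vec(A, x)
    return x

def closed_form(x, n):
    """Solve the recurrence directly: c is constant, b linear, a quadratic."""
    a, b, c = x
    return (a + n * b + n * (n - 1) // 2 * c, b + n * c, c)
```

For example, `orbit_point((2, -3, 5), n)` and `closed_form((2, -3, 5), n)` agree for every n one cares to try, with no exponentials or trigonometric functions in sight.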
For the purposes of proving a dynamical theorem such as Ratner’s theorem, unipotence is important for several reasons. The lack of exponentially growing modes means that the dynamics is not exponentially unstable going forward in time; similarly, the lack of exponentially decaying modes means that the dynamics is not exponentially unstable going backward in time. The lack of oscillation does not improve the stability further, but it does have an important effect on the smoothness of the dynamics. Indeed, because of this lack of oscillation, orbits which are polynomial in nature obey an important dichotomy: either they go to infinity, or they are constant. There is a quantitative version of this statement, known as Bernstein’s inequality: if a polynomial remains bounded over a long interval, then its derivative is necessarily small. (From a Fourier-analytic perspective, being polynomial with low degree is analogous to being “low frequency”; the Fourier-analytic counterpart of Bernstein’s inequality is closely related to the Sobolev inequality, and is extremely useful in PDE. But I digress.) These facts seem to play a fundamental role in all arguments that yield Ratner-type theorems.
— Unipotent actions —
Now that we understand unipotent matrices, let us understand what it means for the action of a group element $g \in G$ on a homogeneous space $G/\Gamma$ to be unipotent. By definition, this means that the adjoint action $g: X \mapsto g X g^{-1}$ on the Lie algebra $\mathfrak{g}$ of G is unipotent. By the above discussion, this is the same as saying that the orbit $(g^n X g^{-n})_{n \in \mathbf{Z}}$ always behaves polynomially in n.
This statement can be interpreted via the dynamics on the homogeneous space $G/\Gamma$. Consider a point $x \in G/\Gamma$, and look at the orbit $(g^n x)_{n \in \mathbf{Z}}$. Now let us perturb x infinitesimally in the direction of some Lie algebra element X to create a new point $y := (1 + \epsilon X) x$, where one should think of $\epsilon$ as being infinitesimally small (or alternatively, one can insert errors of $O(\epsilon^2)$ all over the place). Then the perturbed orbit $g^n y$ at time n is located at
$$g^n (1 + \epsilon X) x = (1 + \epsilon g^n X g^{-n}) g^n x.$$
If g is unipotent, we thus see that the two orbits $(g^n x)_{n \in \mathbf{Z}}$ and $(g^n y)_{n \in \mathbf{Z}}$ only diverge polynomially in n, without any oscillation. In particular, we have the dichotomy that two orbits either diverge, or are translates of each other, together with Bernstein-like quantitative formulations of this dichotomy. This dichotomy is a crucial component in the proof of Ratner’s theorem, and explains why we need the group action to be generated by unipotent elements.
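Here is a hedged illustration of the polynomial divergence in the simplest case I can think of. The choice of group ($SL_2$), the unipotent element g (the standard shear), and the trace-free test matrix X are all assumptions made for this sketch rather than anything forced by the discussion above. The code conjugates X by g repeatedly and compares the result with an explicit formula whose entries are polynomials in n of degree at most 2.

```python
def mat_mul(A, B):
    """Multiply two 2x2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

g = [[1, 1], [0, 1]]       # unipotent shear (illustrative choice)
g_inv = [[1, -1], [0, 1]]  # its inverse

def conjugate(X, n):
    """Compute g^n X g^(-n) by conjugating n times."""
    for _ in range(n):
        X = mat_mul(mat_mul(g, X), g_inv)
    return X

def predicted(X, n):
    """Closed form for X = [[a, b], [c, -a]], from g^n = [[1, n], [0, 1]]."""
    (a, b), (c, _) = X
    return [[a + n * c, b - 2 * n * a - n * n * c],
            [c, -a - n * c]]

X = [[1, 2], [3, -1]]  # a trace-free matrix, i.e. an element of sl_2
```

Since the perturbed orbit sits at $(1 + \epsilon g^n X g^{-n}) g^n x$, these polynomial entries are exactly what governs how fast the two orbits separate.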
— Elliptic, parabolic, and hyperbolic elements of $SL_2(\mathbf{R})$ —
I have described the distinction between exponential growth/decay, oscillation, and unipotent (polynomial) behaviour. This distinction is particularly easy to visualise geometrically in the context of actions of $SL_2(\mathbf{R})$ on the (affine) plane. Specifically, let us consider an affine linear recurrence sequence
$$x_{n+1} := A x_n + b \qquad (2)$$
where $x_0 \in \mathbf{R}^2$ is an element of the plane, $A \in SL_2(\mathbf{R})$ is a special linear transformation (i.e. a $2 \times 2$ matrix of determinant 1), and $b \in \mathbf{R}^2$ is a shift vector. If $A - I$ is invertible, one can eliminate the shift b by translating the orbit $x_n$, or more specifically making the substitution
$$y_n := x_n + (A-I)^{-1} b$$
which simplifies (2) to
$$y_{n+1} = A y_n$$
which allows us to solve for the orbit explicitly as
$$x_n = A^n (x_0 + (A-I)^{-1} b) - (A-I)^{-1} b.$$
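The substitution above can be checked mechanically. The following Python sketch uses an arbitrary hyperbolic matrix A, shift b and starting point of my own choosing (none of them come from the discussion), and works in exact rational arithmetic. It compares direct iteration of $x_{n+1} = A x_n + b$ with the closed form $x_n = A^n(x_0 + (A-I)^{-1} b) - (A-I)^{-1} b$.

```python
from fractions import Fraction

def mat_vec(A, x):
    """Apply a 2x2 matrix to a vector in the plane."""
    return (A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1])

def inverse_2x2(M):
    """Invert a 2x2 matrix exactly; fails if the determinant is zero."""
    (p, q), (r, s) = M
    det = Fraction(p * s - q * r)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[s / det, -q / det], [-r / det, p / det]]

def iterate(A, b, x0, n):
    """Run the affine recurrence x -> A x + b for n steps."""
    x = x0
    for _ in range(n):
        Ax = mat_vec(A, x)
        x = (Ax[0] + b[0], Ax[1] + b[1])
    return x

def closed_form(A, b, x0, n):
    """Evaluate A^n (x0 + s) - s, where s = (A - I)^(-1) b."""
    A_minus_I = [[A[0][0] - 1, A[0][1]], [A[1][0], A[1][1] - 1]]
    s = mat_vec(inverse_2x2(A_minus_I), b)
    y = (x0[0] + s[0], x0[1] + s[1])
    for _ in range(n):
        y = mat_vec(A, y)
    return (y[0] - s[0], y[1] - s[1])

A = [[2, 1], [1, 1]]   # determinant 1, and A - I is invertible
b = (3, -1)
x0 = (1, 4)
```

Note that `inverse_2x2` raises an error precisely when $A - I$ is singular, which is the degenerate case that needs separate treatment.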
Of course, we have to analyse things a little differently in the degenerate case that $A - I$ is not invertible; in particular, the lower order term b plays a more significant role in this case. Leaving that case aside for the moment, we see from the above formula that the behaviour of the orbit $x_n$ is going to be largely controlled by the spectrum of A. In this case, A will have two (generalised) eigenvalues $\lambda, 1/\lambda$ whose product is 1 (since $\det(A) = 1$) and whose sum is real (since A clearly has real trace). This gives three possibilities:
- Elliptic case. Here $\lambda = e^{i\theta}$ is a non-trivial unit phase. Then A is similar (after a real linear transformation) to the rotation matrix $R_\theta$ described earlier, and so the orbit $x_n$ lies along a linear transform of a circle, i.e. the orbit lies along an ellipse.
- Hyperbolic case. Here $\lambda$ is real with $|\lambda| > 1$ or $0 < |\lambda| < 1$. In this case A is similar to the diagonal matrix $\begin{pmatrix} \lambda & 0 \\ 0 & 1/\lambda \end{pmatrix}$, and so by previous discussion we see that the orbit $x_n$ lies along a linear transform of a rectangular hyperbola, i.e. the orbit lies along a general hyperbola.
- Parabolic case. This is the boundary case between the elliptic and hyperbolic cases, in which $\lambda = 1/\lambda = 1$. Then either A is the identity (in which case $x_n$ travels along a line, or is constant), or else (by the Jordan canonical form) A is similar to the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. Applying a linear change of coordinates, we thus see that the affine recurrence $x_{n+1} = A x_n + b$ is equivalent to the $2 \times 2$ system $y_{n+1} = y_n + z_n + c$, $z_{n+1} = z_n + d$ for some real constants c, d and some real sequences $y_n, z_n$. If d is non-zero, we see that $z_n$ varies linearly in n and $y_n$ varies quadratically in n, and so $(y_n, z_n)$ lives on a parabola. Undoing the linear change of coordinates, we thus see in this case that the original orbit $x_n$ also lies along a parabola. (If d vanishes, then $z_n$ is constant and $y_n$ is linear in n, so the orbit lies instead on a line, or is constant.)
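The parabolic case is easy to experiment with. The sketch below assumes the normalised system $y_{n+1} = y_n + z_n + c$, $z_{n+1} = z_n + d$ (my reading of the recurrence after the change of coordinates). The function names and the sample constants are illustrative only. It iterates the system, compares with the closed form obtained by summing an arithmetic progression, and uses a cross-product test to tell a genuinely curved orbit from a straight one.

```python
def parabolic_orbit(y0, z0, c, d, steps):
    """Iterate y -> y + z + c, z -> z + d, returning all points visited."""
    y, z = y0, z0
    points = [(y, z)]
    for _ in range(steps):
        y, z = y + z + c, z + d  # simultaneous update: uses the old z
        points.append((y, z))
    return points

def closed_form(y0, z0, c, d, n):
    """z_n is linear in n; y_n picks up the quadratic term d * n(n-1)/2."""
    return (y0 + n * (z0 + c) + d * n * (n - 1) // 2, z0 + n * d)

def collinear(p, q, r):
    """True when the three points lie on a common line."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]) == 0

# With c = 1, d = 2 and zero initial data one gets (y_n, z_n) = (n^2, 2n),
# which lies on the parabola 4y = z^2.
curved = parabolic_orbit(0, 0, 1, 2, steps=8)

# With d = 0 the second coordinate never moves, so the orbit is straight.
straight = parabolic_orbit(0, 0, 1, 0, steps=8)
```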
Thus we see that all elements of $SL_2(\mathbf{R})$ preserve some sort of conic section. The elliptic elements trap their orbits along ellipses, the hyperbolic elements trap their orbits along hyperbolae, and the parabolic elements trap their orbits along parabolae (or along lines, in some degenerate cases). The elliptic elements thus generate oscillation, the hyperbolic elements generate exponential growth and decay, and the parabolic elements are unipotent and generate polynomial growth. (If one interprets elements of $SL_2(\mathbf{R})$ as area-preserving linear or affine transformations, then elliptic elements are rotations around some origin (and in some coordinate system), hyperbolic elements are compressions along one axis and dilations along another, and parabolic elements are shear transformations and translations.)
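Since the two eigenvalues $\lambda, 1/\lambda$ have product 1, their sum is the trace of A, and the trichotomy can be read off from the trace alone: $|\mathrm{tr}(A)| < 2$ forces a non-trivial unit phase, $|\mathrm{tr}(A)| > 2$ forces real eigenvalues away from $\pm 1$, and $|\mathrm{tr}(A)| = 2$ is the boundary. The trace criterion is not spelled out above, so treat the following Python sketch as my own packaging of it. The function name is hypothetical, and the handling of trace $-2$ (where $\lambda = -1$) and of $\pm I$ is a convention I have chosen for illustration.

```python
def classify_sl2(A):
    """Classify a 2x2 integer or rational matrix of determinant 1."""
    (a, b), (c, d) = A
    if a * d - b * c != 1:
        raise ValueError("matrix does not have determinant 1")
    if b == 0 and c == 0 and a == d:
        return "central"      # plus or minus the identity
    t = abs(a + d)
    if t < 2:
        return "elliptic"     # eigenvalues e^{i theta}, e^{-i theta}
    if t > 2:
        return "hyperbolic"   # real eigenvalues lambda, 1/lambda
    return "parabolic"        # repeated eigenvalue 1 (or -1, by convention)

examples = {
    "quarter turn": [[0, -1], [1, 0]],
    "cat map": [[2, 1], [1, 1]],
    "shear": [[1, 1], [0, 1]],
}
labels = {name: classify_sl2(M) for name, M in examples.items()}
```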
[It is curious that every element of $SL_2(\mathbf{R})$ preserves at least one non-trivial quadratic form; this statement is highly false in higher dimensions (consider for instance what happens to diagonal matrices). I don’t have a “natural” explanation of this fact – some sort of fixed point theorem at work, perhaps? I can cobble together a proof using the observations that (a) every matrix in $SL_2(\mathbf{R})$ is similar to its inverse, (b) the space of quadratic forms on $\mathbf{R}^2$ is odd-dimensional, (c) any linear transformation on an odd-dimensional vector space which is similar to its inverse has at least one eigenvalue equal to $\pm 1$, (d) the action of a non-degenerate linear transformation on quadratic forms preserves positive definiteness, and thus cannot have negative eigenvalues, but this argument seems rather ad hoc to me.]
One can view the parabolic elements of $SL_2(\mathbf{R})$ as the limit of elliptic or hyperbolic ones in a number of ways. For instance, the matrix $\begin{pmatrix} 1 & 1 \\ t & 1+t \end{pmatrix}$ is hyperbolic when $t > 0$, parabolic when $t = 0$, and elliptic when $-4 < t < 0$. This is related to how the hyperbola, parabola, and ellipse emerge as sections of the light cone. Another way to obtain the parabola as a limit is to view the parabola as an infinitely large ellipse (or hyperbola), with centre infinitely far away. For instance, the ellipse of vertical radius R and horizontal radius
$\sqrt{R}$ centred at (0,R) is given by the equation $\frac{x^2}{R} + \frac{(y-R)^2}{R^2} = 1$, which can be rearranged as $y = \frac{1}{2} x^2 + \frac{1}{2R} y^2$. In the limit $R \to \infty$, this ellipse becomes the parabola $y = \frac{1}{2} x^2$, and rotations associated with those ellipses can converge to parabolic affine maps of the type described above. A similar construction allows one to view the parabola as a limit of hyperbolae; incidentally, one can use (the Fourier transform of) this limit to show (formally, at least) that the Schrödinger equation emerges as the non-relativistic limit of the Klein-Gordon equation.
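The ellipse-to-parabola limit can also be seen numerically. The sketch below assumes that the horizontal radius is $\sqrt{R}$, so that the ellipse is $x^2/R + (y-R)^2/R^2 = 1$ and rearranges to $y = \frac{1}{2}x^2 + \frac{1}{2R}y^2$. The function names are mine, and floating point arithmetic is used, so the comparisons are only approximate.

```python
import math

def ellipse_point(R, s):
    """Point on the ellipse with centre (0, R), radii sqrt(R) and R."""
    return (math.sqrt(R) * math.cos(s), R + R * math.sin(s))

def gap_from_parabola(R, s):
    """Return y - x^2/2 at an ellipse point; algebraically this is y^2/(2R)."""
    x, y = ellipse_point(R, s)
    return y - x * x / 2

def lower_branch(R, x):
    """Height of the lower arc of the ellipse above the point (x, 0)."""
    return R - R * math.sqrt(1 - x * x / R)

# For fixed x, the lower arc approaches the parabola y = x^2 / 2 as R grows.
heights = [lower_branch(R, 1.0) for R in (10.0, 1000.0, 1000000.0)]
```

The correction term $\frac{1}{2R} y^2$ is what disappears in the limit: for bounded y it is of size O(1/R).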
— The Lorentz group —
Every non-degenerate quadratic form Q in d variables comes with its own symmetry group $SO(Q)$, defined as the group of special linear transformations which preserve Q. (Note that Q determines a translation-invariant pseudo-Riemannian metric, and thus a Haar measure; so any transformation which preserves Q must be volume-preserving and thus have a determinant of $\pm 1$. So the requirement that the linear transformation be special is not terribly onerous.) Equivalently, SO(Q) is the space of special linear transformations which preserve each of the level sets $\{ x \in \mathbf{R}^d : Q(x) = \hbox{const} \}$ (which, by definition, are quadric surfaces).
A non-degenerate quadratic form can always be diagonalised (e.g. by applying the Gram-Schmidt orthogonalisation process), and so after a linear change of coordinates one can express Q as
$$Q(x_1,\ldots,x_d) = x_1^2 + \ldots + x_r^2 - x_{r+1}^2 - \ldots - x_d^2$$
for some $0 \leq r \leq d$. The pair $(r, d-r)$ is the signature of Q, and SO(Q) is isomorphic to the group SO(r,d-r). The signature is an invariant of Q; this is Sylvester’s law of inertia.
In the Euclidean (i.e. definite) case r=d (or r=0), the level sets of Q are spheres (in diagonalised form) or ellipsoids (in general), and so the orbits of elements in $SO(Q)$ stay trapped on spheres or ellipsoids. Thus their orbits cannot exhibit exponential growth or decay, or polynomial behaviour; they must instead oscillate, much like the elliptic elements of $SL_2(\mathbf{R})$. In particular, SO(Q) does not contain any non-trivial unipotent elements.
In the indefinite case d=2, r=1, the level sets of Q are hyperbolae (as well as the light cone $\{ x : Q(x) = 0 \}$, which in two dimensions is just a pair of intersecting lines). It is then geometrically clear that most elements of $SO(Q)$ are going to be hyperbolic, as their orbits will typically escape to infinity along hyperbolae. (The only exceptions are the identity and the negative identity.) Elements of SO(1,1) are also known as Lorentz boosts. (More generally, SO(d-1,1) (or SO(1,d-1)) is the structure group for special relativity in d-1 space and 1 time dimensions.)
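For a concrete Lorentz boost, one can take hyperbolic cosine 5/4 and hyperbolic sine 3/4 (so that $\cosh^2 - \sinh^2 = 25/16 - 9/16 = 1$), which keeps all the arithmetic rational. The Python sketch below assumes the diagonalised form $Q(x_1,x_2) = x_1^2 - x_2^2$. The particular boost and starting point are just illustrative choices of mine. It follows an orbit and checks that it stays on one level set of Q while escaping to infinity.

```python
from fractions import Fraction

def Q(p):
    """The indefinite form x1^2 - x2^2 in two variables."""
    return p[0] ** 2 - p[1] ** 2

# A boost with cosh = 5/4 and sinh = 3/4; its eigenvalues are 2 and 1/2.
BOOST = [[Fraction(5, 4), Fraction(3, 4)],
         [Fraction(3, 4), Fraction(5, 4)]]

def apply(M, p):
    """Apply a 2x2 matrix to a point of the plane."""
    return (M[0][0] * p[0] + M[0][1] * p[1],
            M[1][0] * p[0] + M[1][1] * p[1])

def boost_orbit(p, steps):
    """Return the points p, Bp, B^2 p, ... for the boost B above."""
    points = [p]
    for _ in range(steps):
        p = apply(BOOST, p)
        points.append(p)
    return points

orbit = boost_orbit((Fraction(1), Fraction(0)), steps=6)
determinant = BOOST[0][0] * BOOST[1][1] - BOOST[0][1] * BOOST[1][0]
```

Every point of `orbit` satisfies Q = 1, so the orbit runs out along one branch of the hyperbola $x_1^2 - x_2^2 = 1$.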
Now we turn to the case of interest, namely d=3 and Q indefinite, thus r=1 or r=2. By changing the sign of Q if necessary we may take r=2, and after diagonalising we can write
$$Q(x_1,x_2,x_3) = x_1^2 + x_2^2 - x_3^2.$$
The level sets of Q are mostly hyperboloids, together with the light cone $\{ x : Q(x) = 0 \}$. So a typical element of $SO(Q)$ will have orbits that are trapped inside light cones or on hyperboloids.
In general, these orbits will wander in some complicated fashion over such a cone or hyperboloid. But for some special elements of SO(Q), the orbit is contained in a smaller variety. For instance, consider a Euclidean rotation around the $x_3$ axis by some angle $\theta$. This clearly preserves Q, and the orbits of this rotation lie on horizontal circles, which are of course each contained in a hyperboloid or light cone. So we see that SO(Q) contains elliptic elements, and this is “because” we can get ellipses as sections of hyperboloids and cones, by slicing them with spacelike planes.
Similarly, if one considers a Lorentz boost in the $x_1, x_3$ directions, we also preserve Q, and the orbits of this boost lie on vertical hyperbolae (or on a one-dimensional light cone). So we see that SO(Q) contains hyperbolic elements, which is “because” we can get hyperbolae as sections of hyperboloids and cones, by slicing them with timelike planes.
So, to get unipotent elements of SO(Q), it is clear what we should do: we should exploit the fact that parabolae are also sections of hyperboloids and cones, obtained by slicing these surfaces along null planes. For instance, if we slice the hyperboloid $\{ (x_1,x_2,x_3) : x_1^2 + x_2^2 - x_3^2 = 1 \}$ with the null plane $\{ (x_1,x_2,x_3) : x_3 = x_2 + 1 \}$ we obtain the parabola $\{ (x_1, x_2, x_2+1) : 2 x_2 = x_1^2 - 2 \}$. A small amount of calculation then lets us find a linear transformation which preserves both the hyperboloid and the null plane (and thus preserves Q and preserves the parabola); indeed, if we introduce null coordinates $(u, v) := (x_2 + x_3, x_2 - x_3)$, then the hyperboloid and null plane are given by the equations $x_1^2 + uv = 1$ and $v = -1$ respectively; a little bit of algebra shows that the linear transformations $(x_1, u, v) \mapsto (x_1 + a v, u - 2 a x_1 - a^2 v, v)$ will preserve both surfaces for any constant a. This provides a one-parameter family (a parabolic subgroup, in fact) of unipotent elements (known as null rotations) in SO(Q). By rotating the null plane around we can get many such one-parameter families, whose orbits trace out all sorts of parabolae, and it is not too hard at this point to show that the unipotent elements can in fact be used to generate all of SO(Q) (or $SO(Q)^+$).
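For readers who would like to see a null rotation as an honest 3×3 matrix, here is a Python sketch. It assumes the diagonalised form $Q(x_1,x_2,x_3) = x_1^2 + x_2^2 - x_3^2$ and the null coordinates $u = x_2 + x_3$, $v = x_2 - x_3$, in which $Q = x_1^2 + uv$. The matrix `null_rotation(a)` is what one gets by writing the map $(x_1, u, v) \mapsto (x_1 + av, u - 2a x_1 - a^2 v, v)$ back in the original coordinates. The function names and the test vectors are my own.

```python
from fractions import Fraction

def Q(x):
    """The indefinite form x1^2 + x2^2 - x3^2."""
    return x[0] ** 2 + x[1] ** 2 - x[2] ** 2

def null_rotation(a):
    """Matrix of (x1, u, v) -> (x1 + a v, u - 2 a x1 - a^2 v, v) in x-coordinates."""
    a = Fraction(a)
    h = a * a / 2
    return [[1, a, -a],
            [-a, 1 - h, h],
            [-a, -h, 1 + h]]

def mat_mul(A, B):
    """Multiply two 3x3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(A, x):
    """Apply a 3x3 matrix to a vector."""
    return tuple(sum(A[i][k] * x[k] for k in range(3)) for i in range(3))

def minus_identity(A):
    """Return A - I, the candidate nilpotent part."""
    return [[A[i][j] - int(i == j) for j in range(3)] for i in range(3)]

M = null_rotation(3)
N = minus_identity(M)
N_squared = mat_mul(N, N)
N_cubed = mat_mul(N_squared, N)   # zero, so M is unipotent
samples = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -5, 7), (1, 1, 2)]
```

One finds that `N_cubed` vanishes while `N_squared` does not, so orbits of a null rotation are quadratic in n, in keeping with their lying on parabolae. The matrices also satisfy `null_rotation(a)` times `null_rotation(b)` equals `null_rotation(a + b)`, which is the one-parameter group law.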
[Incidentally, the fact that the parabola is a section of a cone or hyperboloid of one higher dimension allows one (via the Fourier transform) to embed solutions to the free Schrödinger equation as solutions to the wave or Klein-Gordon equations of one higher dimension; this trick allows one, for instance, to derive the conservation laws of the former from those of the latter. See for instance Exercises 2.11, 3.2, and 3.30 of my book on dispersive PDE.]
[Update, Oct 6: Typos corrected.]
8 comments
6 October, 2007 at 1:14 am
Emmanuel Kowalski
A small typo: the matrix depending on $t$ (before the section on Lorentz group) has a $t$ missing in the right-hand corner.
6 October, 2007 at 2:32 am
Attila Smith
Dear Terence,
on line 14 of the first section (“Unipotent matrices”) when you define A=R_(theta), I think you might have to exchange the signs preceding the two occurrences of sin(theta) on the antidiagonal of your matrix.
Your devoted,
Attila
6 October, 2007 at 8:46 am
Terence Tao
Thanks for the corrections!
8 October, 2007 at 6:19 am
Genghis Khan
I envy you guys who can handle numbers and make them jump hoops. I can’t.
8 October, 2007 at 10:20 am
Mike R.
Great article, Terence. Classifying orbits on homogeneous spaces (for G, an exceptional group) arises in the study of black holes in supergravity. This is one of the topics covered in Phys 261 this fall. The class starts this Thursday (Oct. 11th) in PAB 4-330. Feel free to stop by.
8 October, 2007 at 3:39 pm
Doug
Hi Terence,
Is there a PDF version of your book referenced above in the last sentence of your incidental note as: “my book on dispersive PDE.”?
I am not able to open a PS file.
8 October, 2007 at 5:20 pm
Terence Tao
Dear Doug: The sample chapters are now in PDF format.
18 October, 2007 at 11:47 am
Doug
I have noticed that some factors of the Monster [Borcherds] may be written in a [psuedo?]-unipotent format?
((19*3*2^5-1)*3^3*2^2)-1=
((19*3*32)-1)*27*4)-1=
196884-1=196883=(71*59*47)
=(72-1)*(60-1)*(48-1)
=(2^3*3^2-1)*(2^2*3*5-1)*(2^4*3-1)