In the last few weeks, the Great Internet Mersenne Prime Search (GIMPS) announced the discovery of two new Mersenne primes, both over ten million digits in length, including one discovered by the computing team right here at UCLA (see this page for more information).  [I was not involved in this computing effort.]  As for the question “Why do we want to find such big primes anyway?”, see this page, though this is not the focus of my post today.

The GIMPS approach to finding Mersenne primes relies of course on modern computing power, parallelisation, and efficient programming, but the number-theoretic heart of it – aside from some basic optimisation tricks such as fast multiplication and preliminary sieving to eliminate some obviously non-prime Mersenne number candidates – is the Lucas-Lehmer primality test for Mersenne numbers, which is much faster for this special type of number than any known general-purpose (deterministic) primality test (such as, say, the AKS test).  This test is easy enough to describe, and I will do so later in this post, and also has some short elementary proofs of correctness; but the proofs are sometimes presented in a way that involves pulling a lot of rabbits out of hats, giving the argument a magical feel rather than a natural one.  In this post, I will try to explain the basic ideas that make the primality test work, seeking a proof which is perhaps less elementary and a little longer than some of the proofs in the literature, but is perhaps a bit better motivated.

— Order —

We begin with a general discussion of how to tell when a given number n (which is not necessarily of the Mersenne form n = 2^m-1) is prime or not.  One should think of n as being moderately large, e.g. n \sim 10^{10^7} (which is broadly the size of the Mersenne primes discovered recently).

Our starting point will be Lagrange’s theorem, which asserts that

a^{|G|} = 1 (1)

for any finite group G and any a \in G, thus the order \hbox{ord}_G(a) of a in G divides |G|.  Specialising this to the multiplicative group {\Bbb F}_p^\times of a finite field of prime order p, we obtain Fermat’s little theorem

a^{p-1} = 1 \hbox{ mod } p (2)

for p prime and a coprime to p; applying it instead to the multiplicative group ({\Bbb Z}/n{\Bbb Z})^\times of a cyclic group of order n, we obtain Euler’s theorem

a^{\phi(n)} = 1 \hbox{ mod } n (3)

whenever a is coprime to n, where \phi(n) := |({\Bbb Z}/n{\Bbb Z})^\times| is the Euler totient function of n.

Fermat’s little theorem (2) already gives a necessary condition for the primality of a candidate prime n: take any a coprime to n (typically one picks a small number such as a=2 or a=3), and compute a^{n-1} modulo n.  If it is not equal to 1, then n cannot be prime.  This is a (barely) feasible test to execute for n as large as 10^{10^7}, because one can compute exponents such as a^{n-1} relatively quickly, by the trick of repeatedly squaring a modulo n to obtain a, a^2, a^{2^2}, a^{2^3}, \ldots \hbox{ mod } n, and then decomposing n-1 into binary to compute a^{n-1}.  (If n is a Mersenne number, then there are some pretty obvious shortcuts one can take for this last step, using division instead of multiplication.)  Unfortunately, while this test is necessary for primality, it is not sufficient, due to the existence of pseudoprimes.  So Fermat’s little theorem alone does not provide the answer, at least if one wants a deterministic certificate of primality rather than a probabilistic one.

Nevertheless, the above facts do provide some important information about the order \hbox{ord}_n(a) := \hbox{ord}_{({\Bbb Z}/n{\Bbb Z})^*}(a) of a modulo n.  If n is prime, Fermat’s little theorem (2) tells us that \hbox{ord}_n(a) is at most n-1 (in fact, it divides n-1).  On the other hand, if n is not prime, then Euler’s theorem (3) tells us that \hbox{ord}_n(a) cannot be as large as n-1, but is instead at most \phi(n), which is now strictly less than n-1.  Thus: if we can find a number a coprime to n such that \hbox{ord}_n(a) is exactly n-1, we have certified that n is prime.

Testing whether a given number a is coprime to n is very easy and fast (thanks to the Euclidean algorithm). Unfortunately, it is difficult in general to compute the order \hbox{ord}_n(a) of a number a if the base n is large; a brute-force approach would require one to compute up to n-1 powers of a, which is prohibitively expensive if n has size comparable to 10^{10^7}.  (More generally, the problem of computing order exactly in general is closely related to the discrete logarithm problem, which is notoriously difficult.) However, there are a few cases in which the order can be found very quickly.  Suppose that we somehow find positive integers a, k such that

a^{2^k} = - 1 \hbox{ mod } n (4)

(which in particular implies that a is coprime to n).  Squaring this, we obtain

a^{2^{k+1}} = 1 \hbox{ mod } n

and so we see that \hbox{ord}_n(a) divides 2^{k+1} but not 2^k, and thus must be exactly equal to 2^{k+1}.   Conversely, if \hbox{ord}_n(a) = 2^{k+1}, then we must have (4).  So in the special case when the order of a is a power of 2, we can compute the order using only one exponentiation, which is computationally feasible for the orders of magnitude we are considering.

Unfortunately, this is not quite what we need for the Mersenne prime problem, because if n is a Mersenne number, then it is n plus 1 which is a power of 2, rather than n minus 1.  (The above method would be ideal for finding Fermat primes rather than Mersenne primes, but it is likely that in fact there are no more such primes to be found.)

So this is a frustrating near miss: if n is a Mersenne number, we can easily check if a number has order n+1 modulo n, but we needed a test for when a number has order n-1 instead.  And indeed, even when n is prime, Fermat’s little theorem (2) shows that it is impossible for a number to have order n+1 modulo n, since the order needs to divide n-1.  So we seem to be a bit stuck.

But while n+1 clearly does not divide n-1, it does divide n^2-1.  Looking at Lagrange’s theorem (1), we then see that it could be possible to find elements of order n+1 in a multiplicative group of order n^2-1 rather than n-1.  Recall that if n was prime, then the multiplicative group {\Bbb F}_n^\times of the finite field {\Bbb F}_n had order n-1.  But {\Bbb F}_{n^2} is also a finite field, and its multiplicative group {\Bbb F}_{n^2}^\times has order n^2-1.  Aha!

So the plan (assuming for sake of argument that n is prime) is to somehow work in the finite field {\Bbb F}_{n^2} instead of {\Bbb F}_n, in order to find elements of order n+1.  We can get our hands on this larger finite field more concretely by viewing it as a quadratic extension {\Bbb F}_n[\sqrt{a}], where a is a quadratic non-residue of n.

Let’s now take n to be a Mersenne prime.  What numbers are quadratic non-residues?  A quick appeal to quadratic reciprocity and some elementary number theory soon reveals that 2 is a quadratic residue of n, but that 3 is not.  Thus we can take {\Bbb F}_{n^2} \equiv {\Bbb F}_n[\sqrt{3}].  (One could work with other numbers than 3 here, but being the smallest quadratic non-residue available, it is the simplest one to use, and the one which is most likely to be able to take advantage of the strong law of small numbers.)  Henceforth all calculations will be in this field {\Bbb F}_n[\sqrt{3}], which of course contains {\Bbb F}_n as a subfield.

Now we need to look for a field element a of order n+1, which is a power of 2.  Thus (by adapting (4) to {\Bbb F}_n[\sqrt{3}]) we need to find a solution to the equation

a^{(n+1)/2} = -1 (5)

in this field.

Let’s compute some expressions of the form a^{(n+1)/2}.  From Fermat’s little theorem (2) we have

3^{n-1} = 1;

because 3 is not a quadratic residue, we see (from taking discrete logarithms) that

3^{(n-1)/2} = -1 (6)

and thus

3^{(n+1)/2} = -3.

Similarly we have

2^{(n+1)/2} = +2 (7).

These are pretty close to (5), but not quite right.  To go further, it is convenient to work with n^{th} powers rather than ((n+1)/2)^{th} powers – i.e. we work with the Frobenius automorphism x \mapsto x^n.  Indeed, since {\Bbb F}_n[\sqrt{3}] has characteristic n, we have the automorphism properties

(x+y)^n = x^n + y^n; (xy)^n = x^n y^n. (8)

From (6) we have (\sqrt{3})^n = - \sqrt{3}, while from (2) we have a^n = a for a \in {\Bbb F}_n.  From (8) we thus see that

(a + b \sqrt{3})^n = a - b \sqrt{3}

for a, b \in {\Bbb F_n}; thus the Frobenius automorphism is nothing more than Galois conjugation.  (Actually, this can be deduced quite readily from standard Galois theory.)

Now we go back from n^{th} powers to ((n+1)/2)^{th} powers.  Multiplying both sides of the preceding equation by a + b \sqrt{3}, we obtain

(a+b \sqrt{3})^{n+1} = a^2 - 3b^2.

Squaring a+b\sqrt{3}, we conclude

(a^2+3b^2 + 2ab\sqrt{3})^{(n+1)/2} = a^2-3b^2.

Now we in a good position to solve the equation (5).  We cannot make a^2 - 3b^2 equal to -1 – since -1 is not a quadratic residue modulo 3 – but we can make it equal to, say, -2, by setting a=1 and b=-1 (say):

(4 + 2 \sqrt{3})^{(n+1)/2} = -2.

Dividing this by (7) we obtain the desired solution

\omega^{(n+1)/2} = -1 (9)

to (5), where \omega := 2 + \sqrt{3}. (Note that one could also use \omega^{-1} = 2-\sqrt{3} here; indeed, Galois theory tells us that +\sqrt{3} and -\sqrt{3} are interchangeable in these computations.)

To summarise, we have shown that

If n is a Mersenne prime, then (9) holds.

Based on our previous discussion, we expect to be able to reverse this implication.  Indeed, we have the following converse:

Lemma 1. Let n be a Mersenne number.  If (9) holds (in the ring ({\Bbb Z}/n{\Bbb Z})[\sqrt{3}]), then n is prime.

Proof. Let q be a prime divisor of n.  Then \omega^{(n+1)/2}=-1 in the field {\Bbb F}_q[\sqrt{3}] (which we define as {\Bbb F}_q if 3 is a quadratic residue there), thus \omega has order exactly n+1 (cf. (4)).  By Lagrange’s theorem (1), this means that n+1 divides the multiplicative order of {\Bbb F}_q[\sqrt{3}]^\times, which is q^2-1 (if 3 is a non-residue modulo q) or q-1 (if 3 is a residue modulo q).  In particular, q has to exceed \sqrt{n}.  Thus the only prime divisors of n exceed \sqrt{n}, and so by the sieve of Eratosthenes, n is prime. (This proof is due to Bruce.) \Box

We have thus shown

Corollary 1. (Lucas-Lehmer test, preliminary version)  Let n = 2^m-1 with m odd.  Then n is prime if and only if (9) holds in {\Bbb Z}/n{\Bbb Z}[\sqrt{3}].

This is already a reasonable criterion, but it is a little non-elementary (and also a little unpleasant numerically) due to the presence of the quadratic extension by \sqrt{3}.  One can get rid of this extension by the Galois theory trick of taking traces.  Indeed, observe that \omega^{-1} = 2 - \sqrt{3} is the Galois conjugate of \omega.  Basic Galois theory tells us that \omega^{(n+1)/4} + \omega^{-(n+1)/4} lies in {\Bbb Z}/n{\Bbb Z}, and vanishes precisely when \omega^{(n+1)/2} is equal to -1.  So it suffices to show that

\omega^{(n+1)/4} + \omega^{-(n+1)/4} = \omega^{2^{m-2}} + \omega^{-2^{m-2}} =: S_{m-2}

vanishes in {\Bbb Z}/n{\Bbb Z}.

The quantity \omega^{(n+1)/4} = \omega^{2^{m-2}} could be computed by repeated squaring in {\Bbb Z}/n{\Bbb Z}[\sqrt{3}].  The quantity S_{m-2} can be computed by a similar device in {\Bbb Z}/n{\Bbb Z}.  Indeed, the sequence S_j := \omega^{2^j} + \omega^{-2^j} is easily seen to obey the recursion

S_j = S_{j-1}^2 - 2; \quad S_0 = 4 (10)

and so we have

Theorem 1. (Lucas-Lehmer test, final version) Let n = 2^m-1 with m odd.  Then n is prime if and only if S_{m-2} vanishes modulo n, where S_{m-2} is given by the recursion (10).

To apply this test, one needs to perform about m squaring operations modulo n.  Doing everything as efficiently as possible (in particular, using fast multiplication), the total cost of testing a single Mersenne number n = 2^m-1 for primality is about O(m^2) (modulo some \log m terms).  This turns out to barely be within reach of modern computers for m \sim 10^7, especially since the algorithm is somewhat parallelisable.  (In contrast, the best known general-purpose deterministic primality testing algorithm, the AKS algorithm, has a run time of about O(m^6) (with a sizable implicit constant), which is not feasible for m \sim 10^7.)  There are general-purpose probabilistic tests (such as Miller-Rabin) which have run-time comparable to the Lucas-Lehmer test, but as mentioned at the beginning, we are only interested here in deterministic (and unconditional, e.g. not relying on GRH) certificates of primality.

[Updated, Nov 16: Lemma 1 corrected.]