The Agrawal-Kayal-Saxena (AKS) primality test, discovered in 2002, is the first provably deterministic algorithm to determine the primality of a given number with a run time which is guaranteed to be polynomial in the number of digits, thus, given a large number {n}, the algorithm will correctly determine whether that number is prime or not in time {O(\log^{O(1)} n)}. (Many previous primality testing algorithms existed, but they were either probabilistic in nature, had a running time slower than polynomial, or the correctness could not be guaranteed without additional hypotheses such as GRH.)

The AKS test is of some relevance to the polymath project “Finding primes“, so I thought I would sketch the details of the test (and the proof that it works) here. (Of course, full details can be found in the original paper, which is nine pages in length and almost entirely elementary in nature.) It relies on polynomial identities that are true modulo {n} when {n} is prime, but cannot hold for {n} non-prime as they would generate a large number of additional polynomial identities, eventually violating the factor theorem (which asserts that a polynomial identity of degree at most {d} can be obeyed by at most {d} values of the unknown).

Our starting point is Fermat’s little theorem, which asserts that

\displaystyle  a^p = a \hbox{ mod } p \ \ \ \ \ (1)

for every prime {p} and every {a}. This theorem suggests an obvious primality test: to test whether a number {n} is prime, pick a few values of {a} and see whether {a^n = a \hbox{ mod } n}. (Note that {a^n} can be computed in time {O(\log^{O(1)} n)} for any fixed {a} by expressing {n} in binary, and repeatedly squaring {a}.) If the statement {a^n = a \hbox{ mod } n} fails for some {a}, then {n} would be composite. Unfortunately, the converse is not true: there exist non-prime numbers {n}, known as Carmichael numbers, for which {a^n = a \hbox{ mod } n} for all {a} coprime to {n} ({561} is the first example). So Fermat’s little theorem cannot be used, by itself, to establish primality for general {n}, because it is too weak to eliminate all non-prime numbers. (The situation improves though for more special types of {n}, such as Mersenne numbers; see my earlier post on the Lucas-Lehmer test for more discussion.)

However, there is a stronger version of Fermat’s little theorem which does eliminate all non-prime numbers. Specifically, if {p} is prime and {a} is arbitrary, then one has the polynomial identity

\displaystyle  (X+a)^p = X^p + a \hbox{ mod } p \ \ \ \ \ (2)

where {X} is an indeterminate variable. (More formally, we have the identity {(X+a)^p=X^p+a} in the ring {F_p[X]} of polynomials of one variable {X} over the finite field {F_p} of {p} elements.) This identity (a manifestation of the Frobenius endomorphism) clearly implies (1) by setting {X=0}; conversely, one can easily deduce (2) from (1) by expanding out {(X+a)^p} using the binomial theorem and the observation that the binomial coefficients {\binom{p}{i} = \frac{p \cdot \ldots \cdot (p-i+1)}{i!}} are divisible by {p} for all {1 \leq i < p}. Conversely, if

\displaystyle  (X+a)^n = X^n + a \hbox{ mod } n \ \ \ \ \ (3)

(i.e. {(X+a)^n = X^n+a} in {({\mathbb Z}/n{\mathbb Z})[X]}) for some {a} coprime to {n}, then by comparing coefficients using the binomial theorem we see that {\binom{n}{i}} is divisible by {n} for all {1 \leq i < n}. But if {n} is divisible by some smaller prime {p}, then by setting {i} equal to the largest power of {p} that divides {n}, one sees that {\binom{n}{i}} is not divisible by enough powers of {p} to be divisible by {n}, a contradiction. Thus one can use (3) (for a single value of {a} coprime to {n}) to decide whether {n} is prime or not.

Unfortunately, this algorithm, while deterministic, is not polynomial-time, because the polynomial {(X+a)^n} has {n+1} coefficients and will therefore take at least {O(n)} time to compute. However, one can speed up the process by descending to a quotient ring of {({\mathbb Z}/n{\mathbb Z})[X]}, such as {{\mathbb Z}/n{\mathbb Z}[X]/(X^r-1)} for some {r}. Clearly, if the identity {(X+a)^n=X^n+a} holds in {({\mathbb Z}/n{\mathbb Z})[X]}, then it will also hold in {({\mathbb Z}/n{\mathbb Z})[X]/(X^r-1)}, thus

\displaystyle  (X+a)^n = X^n + a \hbox{ mod } n, X^r-1. \ \ \ \ \ (4)

The point of doing this is that (if {r} is not too large) the left-hand side of (4) can now be computed quickly (again by expanding {n} in binary and performing repeated squaring), because all polynomials can be reduced to be of degree less than {r}, rather than being as large as {n}. Indeed, if {r = O( \log^{O(1)} n )}, then one can test (4) in time {O(\log^{O(1)} n)}.

We are not done yet, because it could happen that (4) holds but (3) fails. But we have the following key theorem:

Theorem 1 (AKS theorem) Suppose that for all {1 \leq a, r \leq O( \log^{O(1)} n)}, (4) holds, and {a} is coprime to {n}. Then {n} is either a prime, or a power of a prime.

Of course, coprimality of {a} and {n} can be quickly tested using the Euclidean algorithm, and if coprimality fails then {n} is of course composite. Also, it is easy to quickly test for the property that {n} is a power of an integer (just compute the roots {n^{1/k}} for {1 \leq k \leq \log_2 n}), and such powers are clearly composite. From all this (and (2), one soon sees that theorem gives rise to a deterministic polynomial-time test for primality. One can optimise the powers of {\log n} in the bounds for {a,r} (as is done in the original paper), but we will not do so here to keep the exposition uncluttered.

Actually, we don’t need (4) satisfied for all that many exponents {r} to make the theorem work; just one well-chosen {r} will do. More precisely, we have

Theorem 2 (AKS theorem, key step) Let {r} be coprime to {n}, and such that {n} has order greater than {\log^2_2 n} in the multiplicative group {({\mathbb Z}/r{\mathbb Z})^\times} (i.e. the residues {n^i \hbox{ mod } r} for {1 \leq i \leq \log^2 n} are distinct). Suppose that for all {1 \leq a \leq O( r \log^{O(1)} n)}, (4) holds, and {a} is coprime to {n}. Then {n} is either a prime, or a power of a prime.

To find an {r} with the above properties we have

Lemma 3 (Existence of good {r}) There exists {r = O(\log^{O(1)} n)} coprime to {n}, such that {n} has order greater than {\log^2_2 n} in {({\mathbb Z}/r{\mathbb Z})^\times}.

Proof: For each {1 \leq i \leq \log^2_2 n}, the number {n^i - 1} has at most {O( \log^{O(1)} n)} prime divisors (by the fundamental theorem of arithmetic). If one picks {r} to be the first prime not equal to any of these prime divisors, one obtains the claim. (One can use a crude version of the prime number theorem to get the upper bound on {r}.) \Box

It is clear that Theorem 1 follows from Theorem 2 and Lemma 3, so it suffices now to prove Theorem 2.

Suppose for contradiction that Theorem 2 fails. Then {n} is divisible by some smaller prime {p}, but is not a power of {p}. Since {n} is coprime to all numbers of size {O(\log^{O(1)} n)} we know that {p} is not of polylogarithmic size, thus we may assume {p \geq \log^C n} for any fixed {C}. As {r} is coprime to {n}, we see that {r} is not a multiple of {p} (indeed, one should view {p} as being much larger than {r}).

Let {F} be a field extension of {F_p} by a primitive {r^{th}} root of unity {X}, thus {F = F_p[X]/h(X)} for some factor {h(X)} (in {F_p[X]}) of the {r^{th}} cyclotomic polynomial {\Phi_r(X)}. From the hypothesis (4), we see that

\displaystyle  (X+a)^n = X^n + a

in {F} for all {1 \leq a \leq A}, where {A = O(r \log^{O(1)} n)}. Note that {n} is coprime to every integer less than {A}, and thus {A < p}.

Meanwhile, from (2) one has

\displaystyle  (X+a)^p = X^p + a

in {F} for all such {a}. The two equations give

\displaystyle  (X^p+a)^{n/p} = (X^p)^{n/p} + a.

Note that the {p^{th}} power {X^p} of a primitive {r^{th}} root of unity {X} is again a primitive {r^{th}} root of unity (and conversely, every primitive {r^{th}} root arises in this fashion) and hence we also have

\displaystyle  (X+a)^{n/p} = X^{n/p} + a

in {F} for all {1 \leq a \leq A}.

Inspired by this, we define a key concept: a positive integer {m} is said to be introspective if one has

\displaystyle  (X+a)^m = X^m + a

in {F} for all {1 \leq a \leq A}, or equivalently if {(X+a)^m=\phi_m(X+a)}, where {\phi_m: F \rightarrow F} is the ring homomorphism that sends {X} to {X^m}.

We have just shown that {p, n, n/p} are all introspective; {1} is also trivially introspective. Furthermore, if {m} and {m'} are introspective, it is not hard to see that {mm'} is also introspective. Thus we in fact have a lot of introspective integers: any number of the form {p^i (n/p)^j} for {i,j \geq 0} is introspective.

It turns out in fact that it is not possible to create so many different introspective numbers, basically the presence of so many polynomial identities in the field would eventually violate the factor theorem. To see this, let {{\mathcal G} \subset F^\times} be the multiplicative group generated by the quantities {X+a} for {1 \leq a \leq A}. Observe that {z^m = \phi_m(z)} for all {z \in {\mathcal G}}. We now show that this places incompatible lower and upper bounds on {{\mathcal G}}. We begin with the lower bound:

Proposition 4 (Lower bound on {{\mathcal G}}) {|{\mathcal G}| \geq 2^t}.

Proof: Let {P(X)} be a product of less than {t} of the quantities {X+1,\ldots,X+A} (allowing repetitions), then {P(X)} lies in {{\mathcal G}}. Since {A \geq 2r \geq 2t}, there are certainly at least {2^t} ways to pick such a product. So to establish the proposition it suffices to show that all these products are distinct.

Suppose for contradiction that {P(X)=Q(X)}, where {P,Q} are different products of less than {t} of the {X+1,\ldots,X+A}. Then, for every introspective {m}, {P(X^m)=Q(X^m)} as well (note that {P(X^m) = \phi_m(P(X))}). In particular, this shows that {X^{m_1},\ldots,X^{m_t}} are all roots of the polynomial {P-Q}. But this polynomial has degree less than {t}, and the {X^{m_1},\ldots,X^{m_t}} are distinct by hypothesis, and we obtain the desired contradiction by the factor theorem. \Box

Proposition 5 (Upper bound on {{\mathcal G}}) Suppose that there are exactly {t} residue classes modulo {r} of the form {p^i (n/p)^j \hbox{ mod } r} for {i,j \geq 0}. Then {|{\mathcal G}| \leq n^{\sqrt{t}}}.

Proof: By the pigeonhole principle, we must have a collision

\displaystyle p^i (n/p)^j = p^{i'} (n/p)^{j'} \hbox{ mod } r

for some {0 \leq i,j,i',j' \leq \sqrt{t}} with {(i,j) \neq (i',j')}. Setting {m := p^i (n/p)^j} and {m' := p^{i'} (n/p)^{j'}}, we thus see that there are two distinct introspective numbers {m,m'} of size most {n^{\sqrt{t}}} which are equal modulo {r}. (To ensure that {m,m'} are distinct, we use the hypothesis that {n} is not a power of {p}.) This implies that {\phi_m = \phi_{m'}}, and thus {z^m = z^{m'}} for all {z \in {\mathcal G}}. But the polynomial {z^m - z^{m'}} has degree at most {n^{\sqrt{t}}}, and the claim now follows from the factor theorem. \Box

Since {n} has order greater than {\log^2 n} in {({\mathbb Z}/r{\mathbb Z})^\times}, we see that the number {t} of residue classes {r} of the form {p^i (n/p)^j} is at least {\log^2 n}. But then {2^t > n^{\sqrt{t}}}, and so Propositions 4, 5 are incompatible.

Update, Aug 12: A thorough discussion of the AKS algorithm by Andrew Granville for Bull. Amer. Math. Soc. can be found here.