Let {V} be a quasiprojective variety defined over a finite field {{\bf F}_q}, thus for instance {V} could be an affine variety

\displaystyle  V = \{ x \in {\bf A}^d: P_1(x) = \dots = P_m(x) = 0\} \ \ \ \ \ (1)

where {{\bf A}^d} is {d}-dimensional affine space and {P_1,\dots,P_m: {\bf A}^d \rightarrow {\bf A}} are a finite collection of polynomials with coefficients in {{\bf F}_q}. Then one can define the set {V[{\bf F}_q]} of {{\bf F}_q}-rational points, and more generally the set {V[{\bf F}_{q^n}]} of {{\bf F}_{q^n}}-rational points for any {n \geq 1}, since {{\bf F}_{q^n}} can be viewed as a field extension of {{\bf F}_q}. Thus for instance in the affine case (1) we have

\displaystyle  V[{\bf F}_{q^n}] := \{ x \in {\bf F}_{q^n}^d: P_1(x) = \dots = P_m(x) = 0\}.
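
To make this definition concrete, here is a minimal brute-force sketch in Python. It is only an illustration under stated assumptions: the helper name `count_points` and the two sample varieties are hypothetical, and it only treats the case {n=1} with {q=p} prime, since arithmetic modulo {p} does not model the extension fields {{\bf F}_{q^n}} for {n>1}. For the hyperbola {xy=1} each non-zero {x} determines {y} uniquely, so one expects {p-1} points.

```python
from itertools import product

def count_points(p, polys, d):
    """Count x in (Z/pZ)^d with P(x) = 0 mod p for every P in polys.

    Brute force over all p**d points; only valid for a prime p (so that
    Z/pZ is the field F_p) and only practical for small p and d.
    """
    return sum(
        1
        for x in product(range(p), repeat=d)
        if all(P(*x) % p == 0 for P in polys)
    )

# Illustrative varieties in the affine plane (d = 2).
hyperbola = [lambda x, y: x * y - 1]        # xy = 1
circle = [lambda x, y: x * x + y * y - 1]   # x^2 + y^2 = 1

for p in (3, 5, 7, 11):
    print(p, count_points(p, hyperbola, 2), count_points(p, circle, 2))
```

The enumeration costs {p^d} evaluations, so it is only a sanity check for tiny cases rather than a practical counting method.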

The Weil conjectures are concerned with understanding the number

\displaystyle  S_n := |V[{\bf F}_{q^n}]| \ \ \ \ \ (2)

of {{\bf F}_{q^n}}-rational points over a variety {V}. The first of these conjectures was proven by Dwork, and can be phrased as follows.

Theorem 1 (Rationality of the zeta function) Let {V} be a quasiprojective variety defined over a finite field {{\bf F}_q}, and let {S_n} be given by (2). Then there exist a finite number of algebraic integers {\alpha_1,\dots,\alpha_k, \beta_1,\dots,\beta_{k'} \in O_{\overline{{\bf Q}}}} (known as characteristic values of {V}), such that

\displaystyle  S_n = \alpha_1^n + \dots + \alpha_k^n - \beta_1^n - \dots - \beta_{k'}^n

for all {n \geq 1}.

After cancelling, we may of course assume that {\alpha_i \neq \beta_j} for any {i=1,\dots,k} and {j=1,\dots,k'}, and then one can show (as we will see below) that the {\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_{k'}} become uniquely determined up to permutations of the {\alpha_1,\dots,\alpha_k} and {\beta_1,\dots,\beta_{k'}}. These values are known as the characteristic values of {V}. Since {S_n} is a rational integer (i.e. an element of {{\bf Z}}) rather than merely an algebraic integer (i.e. an element of the ring of integers {O_{\overline{{\bf Q}}}} of the algebraic closure {\overline{{\bf Q}}} of {{\bf Q}}), we conclude from the above-mentioned uniqueness that the set of characteristic values is invariant with respect to the Galois group {Gal(\overline{{\bf Q}} / {\bf Q} )}. To emphasise this Galois invariance, we will not fix a specific embedding {\iota_\infty: \overline{{\bf Q}} \rightarrow {\bf C}} of the algebraic numbers into the complex field {{\bf C} = {\bf C}_\infty}, but work with all such embeddings simultaneously. (Thus, for instance, {\overline{{\bf Q}}} contains three cube roots of {2}, but which of these is assigned to the complex numbers {2^{1/3}}, {e^{2\pi i/3} 2^{1/3}}, {e^{4\pi i/3} 2^{1/3}} will depend on the choice of embedding {\iota_\infty}.)

An equivalent way of phrasing Dwork’s theorem is that the ({T}-form of the) zeta function

\displaystyle \zeta_V(T) := \exp( \sum_{n=1}^\infty \frac{S_n}{n} T^n )

associated to {V} (which is well defined as a formal power series in {T}, at least) is equal to a rational function of {T} (with the reciprocals of the {\alpha_1,\dots,\alpha_k} and of the {\beta_1,\dots,\beta_{k'}} being the poles and zeroes of {\zeta_V} respectively). Here, we use the formal exponential

\displaystyle  \exp(X) := 1 + X + \frac{X^2}{2!} + \frac{X^3}{3!} + \dots.

Equivalently, the ({s}-form of the) zeta-function {s \mapsto \zeta_V(q^{-s})} is a meromorphic function on the complex numbers {{\bf C}} which is also periodic with period {2\pi i/\log q}, and which has only finitely many poles and zeroes up to this periodicity.
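
The passage from the counts {S_n} to the power series coefficients of {\zeta_V} can be carried out exactly. The following is a small sketch, not part of Dwork's argument: the function name `zeta_coefficients` is hypothetical, and the test sequence {S_n = q^n - 1} is the point count of the hyperbola {xy=1} (each non-zero {x} determines {y}). For this sequence the zeta function should be {(1-T)/(1-qT)}.

```python
from fractions import Fraction

def zeta_coefficients(S, N):
    """Return c_0..c_N with exp(sum_{n>=1} S(n) T^n / n) = sum c_n T^n.

    Uses the identity zeta'(T) = (sum_{n>=1} S(n) T^(n-1)) * zeta(T), which
    gives the recurrence n * c_n = sum_{k=1..n} S(k) * c_{n-k}.
    """
    c = [Fraction(1)]
    for n in range(1, N + 1):
        c.append(sum(S(k) * c[n - k] for k in range(1, n + 1)) / n)
    return c

q = 3
S = lambda n: q**n - 1   # point counts of xy = 1 over F_{q^n}

# Power series coefficients of the candidate rational function (1 - T)/(1 - qT).
expected = [Fraction(1)] + [Fraction(q**n - q**(n - 1)) for n in range(1, 9)]
print(zeta_coefficients(S, 8) == expected)
```

Working with the logarithmic derivative avoids expanding the formal exponential term by term, and keeps all arithmetic in the rationals.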

Dwork’s argument relies primarily on {p}-adic analysis, an analogue of complex analysis carried out over an algebraically complete (and metrically complete) extension {{\bf C}_p} of the {p}-adic field {{\bf Q}_p}, rather than over the Archimedean complex numbers {{\bf C}}. The argument is quite effective, and in particular gives explicit upper bounds for the number {k+k'} of characteristic values in terms of the complexity of the variety {V}; for instance, in the affine case (1) with {V} of degree {D}, Bombieri used Dwork’s methods (in combination with Deligne’s theorem below) to obtain the bound {k+k' \leq (4D+9)^{d+m}}, and a subsequent paper of Hooley established the slightly weaker bound {k+k' \leq (11D+11)^{d+m+2}} purely from Dwork’s methods (a similar bound had also been pointed out in unpublished work of Dwork). In particular, one has bounds that are uniform in the field {{\bf F}_q}, which is an important fact for many analytic number theory applications.

These {p}-adic arguments stand in contrast with Deligne’s resolution of the last (and deepest) of the Weil conjectures:

Theorem 2 (Riemann hypothesis) Let {V} be a quasiprojective variety defined over a finite field {{\bf F}_q}, and let {\lambda \in \overline{{\bf Q}}} be a characteristic value of {V}. Then there exists a natural number {w} such that {|\iota_\infty(\lambda)|_\infty = q^{w/2}} for every embedding {\iota_\infty: \overline{{\bf Q}} \rightarrow {\bf C}}, where {| |_\infty} denotes the usual absolute value on the complex numbers {{\bf C} = {\bf C}_\infty}. (Informally: {\lambda} and all of its Galois conjugates have complex magnitude {q^{w/2}}.)

To put it another way that closely resembles the classical Riemann hypothesis, all the zeroes and poles of the {s}-form {s \mapsto \zeta_V(q^{-s})} lie on the critical lines {\{ s \in {\bf C}: \hbox{Re}(s) = \frac{w}{2} \}} for {w=0,1,2,\dots}. (See this previous blog post for further comparison of various instantiations of the Riemann hypothesis.) Whereas Dwork uses {p}-adic analysis, Deligne uses the essentially orthogonal technique of {\ell}-adic cohomology to establish his theorem. However, {\ell}-adic methods can be used (via the Grothendieck-Lefschetz trace formula) to establish rationality, and conversely, in this paper of Kedlaya {p}-adic methods are used to establish the Riemann hypothesis. As pointed out by Kedlaya, the {\ell}-adic methods are tied to the intrinsic geometry of {V} (such as the structure of sheaves and covers over {V}), while the {p}-adic methods are more tied to the extrinsic geometry of {V} (how {V} sits inside its ambient affine or projective space).

In this post, I would like to record my notes on Dwork’s proof of Theorem 1, drawing heavily on the expositions of Serre, Hooley, Koblitz, and others.

The basic strategy is to control the rational integers {S_n} both in an “Archimedean” sense (embedding the rational integers inside the complex numbers {{\bf C}_\infty} with the usual norm {||_\infty}) as well as in the “{p}-adic” sense, with {p} the characteristic of {{\bf F}_q} (embedding the integers now in the “complexification” {{\bf C}_p} of the {p}-adic numbers {{\bf Q}_p}, which is equipped with a norm {||_p} that we will recall later). (This is in contrast to the methods of ell-adic cohomology, in which one primarily works over an {\ell}-adic field {{\bf Q}_\ell} with {\ell \neq p,\infty}.) The Archimedean control is trivial:

Proposition 3 (Archimedean control of {S_n}) With {S_n} as above, and any embedding {\iota_\infty: \overline{{\bf Q}} \rightarrow {\bf C}}, we have

\displaystyle  |\iota_\infty(S_n)|_\infty \leq C q^{A n}

for all {n} and some {C, A >0} independent of {n}.

Proof: Since {S_n} is a rational integer, {|\iota_\infty(S_n)|_\infty} is just {|S_n|_\infty}. By decomposing {V} into affine pieces, we may assume that {V} is of the affine form (1). We then trivially have {|S_n|_\infty \leq q^{nd}}, and the claim follows. \Box

Another way of thinking about this Archimedean control is that it guarantees that the zeta function {T \mapsto \zeta_V(T)} can be defined holomorphically on the open disk in {{\bf C}_\infty} of radius {q^{-A}} centred at the origin.

The {p}-adic control is significantly more difficult, and is the main component of Dwork’s argument:

Proposition 4 ({p}-adic control of {S_n}) With {S_n} as above, and using an embedding {\iota_p: \overline{{\bf Q}} \rightarrow {\bf C}_p} (defined later) with {p} the characteristic of {{\bf F}_q}, we can find for any real {A > 0} a finite number of elements {\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_{k'} \in {\bf C}_p} such that

\displaystyle  |\iota_p(S_n) - (\alpha_1^n + \dots + \alpha_k^n - \beta_1^n - \dots - \beta_{k'}^n)|_p \leq q^{-An}

for all {n}.

Another way of thinking about this {p}-adic control is that it guarantees that the zeta function {T \mapsto \zeta_V(T)} can be defined meromorphically on the entire {p}-adic complex field {{\bf C}_p}.

Proposition 4 is ostensibly much weaker than Theorem 1 because of (a) the error term of {p}-adic magnitude at most {q^{-An}}; (b) the fact that the number {k+k'} of potential characteristic values here may go to infinity as {A \rightarrow \infty}; and (c) the potential characteristic values {\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_{k'}} only exist inside the complexified {p}-adics {{\bf C}_p}, rather than in the algebraic integers {O_{\overline{{\bf Q}}}}. However, it turns out that by combining {p}-adic control on {S_n} in Proposition 4 with the trivial control on {S_n} in Proposition 3, one can obtain Theorem 1 by an elementary argument that does not use any further properties of {S_n} (other than the obvious fact that the {S_n} are rational integers), with the {A} in Proposition 4 chosen to exceed the {A} in Proposition 3. We give this argument (essentially due to Borel) below the fold.

The proof of Proposition 4 can be split into two pieces. The first piece, which can be viewed as the number-theoretic component of the proof, uses external descriptions of {V} such as (1) to obtain the following decomposition of {S_n}:

Proposition 5 (Decomposition of {S_n}) With {\iota_p} and {S_n} as above, we can decompose {\iota_p(S_n)} as a finite linear combination (over the integers) of sequences {S'_n \in {\bf C}_p}, such that for each such sequence {n \mapsto S'_n}, the zeta functions

\displaystyle  \zeta'(T) := \exp( \sum_{n=1}^\infty \frac{S'_n}{n} T^n ) = \sum_{n=0}^\infty c_n T^n

are entire in {{\bf C}_p}, by which we mean that

\displaystyle  |c_n|_p^{1/n} \rightarrow 0

as {n \rightarrow \infty}.

This proposition will ultimately be a consequence of the properties of the Teichmuller lifting {\tau: \overline{{\bf F}_p}^\times \rightarrow {\bf C}_p^\times}.

The second piece, which can be viewed as the “{p}-adic complex analytic” component of the proof, relates the {p}-adic entire nature of a zeta function with control on the associated sequence {S'_n}, and can be interpreted (after some manipulation) as a {p}-adic version of the Weierstrass preparation theorem:

Proposition 6 ({p}-adic Weierstrass preparation theorem) Let {S'_n} be a sequence in {{\bf C}_p}, such that the zeta function

\displaystyle  \zeta'(T) := \exp( \sum_{n=1}^\infty \frac{S'_n}{n} T^n )

is entire in {{\bf C}_p}. Then for any real {A > 0}, there exist a finite number of elements {\beta_1,\dots,\beta_{k'} \in {\bf C}_p} such that

\displaystyle  |S'_n + \beta_1^n + \dots + \beta_{k'}^n|_p \leq q^{-An}

for all {n}.

Clearly, the combination of Proposition 5 and Proposition 6 (and the non-Archimedean nature of the {||_p} norm) implies Proposition 4.

— 1. Constructing the complex {p}-adics —

Given a field {k}, a norm on that field is defined to be a map {||: k \rightarrow {\bf R}^+} obeying the following axioms for {x,y \in k}:

  • (Non-degeneracy) {|x|=0} if and only if {x=0}.
  • (Multiplicativity) {|xy| = |x| |y|}.
  • (Triangle inequality) {|x+y| \leq |x| + |y|}.

If the triangle inequality can be improved to the ultra-triangle inequality

\displaystyle  |x+y| \leq \max(|x|, |y|) \ \ \ \ \ (3)

then we say that the norm is non-Archimedean. The pair {(k, ||)} will be referred to as a normed field.

The most familiar example of a norm is the usual (Archimedean) absolute value {z \mapsto |z| = |z|_\infty} on the complex numbers {{\bf C} = {\bf C}_\infty}, and thus also on its subfields {{\bf R}} and {{\bf Q}}. For a given prime {p}, we also have the {p}-adic norm {x \mapsto |x|_p} defined initially on the rationals {{\bf Q}} by the formula

\displaystyle  |\frac{a}{b}|_p := p^{\hbox{ord}_p(b) - \hbox{ord}_p(a)}

for any rational {\frac{a}{b}}, where {\hbox{ord}_p(a)} is the number of times {p} divides an integer {a} (with the conventions {\hbox{ord}_p(0)=+\infty} and {p^{-\infty} = 0}). Thus for instance {|p^j|_p = p^{-j}} for any integer {j}, which is of course inverse to the Archimedean norm {|p^j|_\infty = p^j}. (More generally, the fundamental theorem of arithmetic can be elegantly rephrased as the identity {\prod_\nu |x|_\nu = 1} for all non-zero rationals {x}, where {\nu} ranges over all places, i.e. over all the rational primes {p}, together with {\infty}.) It is easy to see that {||_p} is indeed a non-Archimedean norm. A classical theorem of Ostrowski asserts that all non-trivial norms on {{\bf Q}} are equivalent to either the Archimedean norm {||_\infty} or one of the {p}-adic norms {||_p}, although we will not need this result here.
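
The definition of {||_p} on the rationals is easy to experiment with. The sketch below is only illustrative (the helper names `ord_p` and `padic_norm` are hypothetical); it computes the norm exactly, checks the product formula on one sample rational, and spot-checks the ultra-triangle inequality (3) for {p=5}.

```python
from fractions import Fraction

def ord_p(a, p):
    """Number of times the prime p divides the non-zero integer a."""
    k = 0
    while a % p == 0:
        a //= p
        k += 1
    return k

def padic_norm(x, p):
    """p-adic norm |x|_p of a rational x, as an exact Fraction."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)  # convention: ord_p(0) = +infinity
    return Fraction(p) ** (ord_p(x.denominator, p) - ord_p(x.numerator, p))

x = Fraction(-12, 35)
# Product over all places: only p = 2, 3, 5, 7 contribute a factor other than 1.
product_formula = abs(x)
for p in (2, 3, 5, 7):
    product_formula *= padic_norm(x, p)
print(product_formula)

# Spot check of the ultra-triangle inequality for p = 5 on small integers.
print(all(
    padic_norm(a + b, 5) <= max(padic_norm(a, 5), padic_norm(b, 5))
    for a in range(-30, 31) for b in range(-30, 31)
))
```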

A norm {||} on a field {k} defines a metric {d(x,y) := |x-y|}, and then one can define the metric completion {\hbox{Clos}_{||}(k)} of this field in the usual manner (as equivalence classes of Cauchy sequences in {k} with respect to this metric). It is easy to see that the resulting completion is again a field, and that the norm {||} on {k} extends continuously to a norm on the metric completion {\hbox{Clos}_{||}(k)}.

The metric completion of a non-Archimedean normed field is again a non-Archimedean normed field. Once one has metric completeness, one can form infinite series {\sum_{n=1}^\infty x_n} of elements of the field in the usual manner; but the non-Archimedean setting is somewhat better behaved than the Archimedean setting. In particular, it is easy to see that if {k = (k,||)} is a non-Archimedean metrically complete normed field, then an infinite series {\sum_{n=1}^\infty x_n} is convergent if and only if it obeys the zero test {\lim_{n \rightarrow \infty} |x_n| = 0}, and furthermore that convergent series are automatically unconditionally convergent. (The notion of absolute convergence is not particularly relevant in non-Archimedean fields.) Thus we can talk about a countable series {\sum_{n \in I} x_n} in a non-Archimedean metrically complete normed field being convergent without having to be concerned about the ordering of the series.

As key examples of metric completion, we recall that using the Archimedean norm {||_\infty}, the metric completion {\hbox{Clos}_{||_\infty}({\bf Q})} of the rationals is the reals {{\bf R} = {\bf Q}_\infty}, whereas using a {p}-adic norm {||_p}, the metric completion {\hbox{Clos}_{||_p}({\bf Q})} of the rationals is instead the {p}-adic field {{\bf Q}_p}.
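
As a small illustration of the zero test in {{\bf Q}_p}, consider the geometric series {\sum_{n \geq 0} p^n}, whose terms have norm {p^{-n} \rightarrow 0}. The sketch below (with a hypothetical helper `padic_norm`, and with the limit {1/(1-p)} taken from the usual geometric series formula) measures the {p}-adic distance from the partial sums to that limit using exact rational arithmetic.

```python
from fractions import Fraction

def padic_norm(x, p):
    """p-adic norm of a rational x, as an exact Fraction."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    num, den, exponent = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        exponent -= 1
    while den % p == 0:
        den //= p
        exponent += 1
    return Fraction(p) ** exponent

p = 5
limit = Fraction(1, 1 - p)      # candidate value of 1 + p + p^2 + ...
partial = Fraction(0)
errors = []
for n in range(8):
    partial += Fraction(p) ** n
    errors.append(padic_norm(partial - limit, p))
print(errors)
```

The partial sums grow without bound in the Archimedean sense, yet the recorded {p}-adic errors shrink geometrically.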

Note that the metric notion of completeness (convergence of every Cauchy sequence) is distinct from the algebraic notion of completeness (solvability of every non-constant polynomial equation, also known as being algebraically closed). For instance, the fields {{\bf R}={\bf Q}_\infty} and {{\bf Q}_p} are metrically complete, but not algebraically complete. However, the two notions of completeness are related to each other in a number of ways. Firstly, the metric completion of an algebraically complete field remains algebraically complete:

Lemma 7 Let {(k, ||)} be a normed field which is algebraically closed. Then the metric completion {\hbox{Clos}_{||}(k)} is also algebraically closed.

Proof: Let {P(x) = x^d + a_{d-1} x^{d-1} + \dots + a_0} be a monic polynomial of some degree {d \geq 1} with coefficients {a_0,\dots,a_{d-1}} in {\hbox{Clos}_{||}(k)}. We need to show that {P} has at least one root in {\hbox{Clos}_{||}(k)}. By construction of {\hbox{Clos}_{||}(k)}, we can view {P} as the limit of polynomials {P_n(x) = x^d + a_{d-1,n} x^{d-1} + \dots + a_{0,n}} with coefficients {a_{0,n},\dots,a_{d-1,n} \in k}, where the convergence is in the sense that each coefficient {a_{i,n}} converges to {a_i} as {n \rightarrow \infty} for {i=0,\dots,d-1}. As {k} is already algebraically closed, each {P_n} has {d} roots {z_{1,n},\dots,z_{d,n} \in k} (possibly with repetition). Because the {a_{i,n}} are bounded, it is easy to see from the equation {P_n(z_{i,n})=0} that the roots {z_{i,n}} are uniformly bounded in {n}. Among other things, this implies that {P_n(z_{i,m})} converges to zero as {n,m \rightarrow \infty}, since {P_m(z_{i,m})=0} and the coefficients of {P_m-P_n} converge to zero. Writing {P_n(z_{i,m}) = \prod_{j=1}^d (z_{i,m}-z_{j,n})}, we conclude that the distance between {z_{i,m}} and the zero set {\{z_{1,n},\dots,z_{d,n}\}} goes to zero as {n,m \rightarrow \infty}. From this one can easily extract a Cauchy sequence {z_{i_j, n_j}} with {n_j \rightarrow \infty}, which then converges to a limit {z \in \hbox{Clos}_{||}(k)} which can be seen to be a zero of {P}, giving the claim. \Box

In the other direction, in the case of the {p}-adics at least, it is possible to extend a norm on a field to the algebraic closure of that field:

Lemma 8 For any {z} in the algebraic closure {\overline{{\bf Q}_p}} of {{\bf Q}_p}, define the norm {|z|_p} of {z} by the formula

\displaystyle  |z| := |z_1 \dots z_d|^{1/d} \ \ \ \ \ (4)

where {z_1,\dots,z_d} are the Galois conjugates of {z} in {\overline{{\bf Q}_p}} (so in particular {z_1 \dots z_d \in {\bf Q}_p}). Then {\overline{{\bf Q}_p}} becomes a non-Archimedean normed field with this norm.

The situation is much more complicated in the Archimedean case, as there is no canonical way to extend the norm in this case. For instance, if one wishes to extend the Archimedean norm {||_\infty} from {{\bf Q}} to {\overline{{\bf Q}}}, one can do so by choosing an embedding {\iota_\infty: \overline{{\bf Q}} \rightarrow {\bf C}} and using the Archimedean norm on {{\bf C}}, but this is not a Galois-invariant definition. For instance, one of the two roots of the equation {x^2 - x - 1 = 0} will have a larger norm than the other (one norm being the golden ratio, and the other being its reciprocal), but the choice of root that has the larger norm depends on the choice of embedding {\iota_\infty}. Note that the definition (4) fails to be a norm in the Archimedean case; for instance, in {\overline{{\bf Q}}}, (4) would require {3+2\sqrt{2}} and {3-2\sqrt{2}} to have norm {1}, while their sum would have norm {6}, violating the triangle inequality.
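
The failure of (4) in the Archimedean case can be checked by direct computation. In the sketch below (the function name `candidate_norm` is hypothetical), an element {a + b\sqrt{2}} of {{\bf Q}(\sqrt{2})} is stored as the integer pair {(a,b)}, and formula (4) with the Archimedean absolute value on {{\bf Q}} becomes {|a^2 - 2b^2|^{1/2}}, which also reduces to {|a|} when {b=0}.

```python
import math

def candidate_norm(a, b):
    """Formula (4) applied to z = a + b*sqrt(2), with the Archimedean
    absolute value on Q: |z * conj(z)|^(1/2) = |a^2 - 2 b^2|^(1/2)."""
    return math.sqrt(abs(a * a - 2 * b * b))

z = (3, 2)                          # 3 + 2*sqrt(2)
w = (3, -2)                         # 3 - 2*sqrt(2), the Galois conjugate of z
total = (z[0] + w[0], z[1] + w[1])  # z + w = 6

lhs = candidate_norm(*total)
rhs = candidate_norm(*z) + candidate_norm(*w)
print(lhs, rhs, lhs <= rhs)
```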

Proof: The only non-trivial task is to show the ultra-triangle inequality (3). It suffices to show that for every finite Galois extension {E} of {{\bf Q}_p} and every {z,w \in E}, one has

\displaystyle  |z+w| \leq \max(|z|, |w|).

We view {E} as a finite-dimensional vector space over {{\bf Q}_p} of some dimension {d}, and identify each {z \in E} with the multiplication operator {M_z: E \rightarrow E} defined by {M_z x := zx}. Each {M_z} can be viewed as an element of {\hbox{Hom}_{{\bf Q}_p}(E \rightarrow E)}, the space of {{\bf Q}_p}-linear maps from {E} to itself, and the determinant of {M_z} has norm {|z|^d} by construction. We pick some arbitrary {{\bf Q}_p}-basis {e_1,\dots,e_d} of {E} and use this to define a non-Archimedean “norm” {\| \|} on {E} by the formula

\displaystyle  \| x_1 e_1 + \dots + x_d e_d \| := \sup_{i=1,\dots,d} |x_i|

for {x_1,\dots,x_d \in {\bf Q}_p}, and then define a “norm” {\|\|_{op}} on {\hbox{Hom}_{{\bf Q}_p}(E \rightarrow E)} by

\displaystyle  \| T \|_{op} := \sup \{ \|Tx\|: x \in E, \|x\| \leq 1 \}.

It is easy to see that the space {\{ M_z: z \in E \}} is then a closed linear subspace of {\hbox{Hom}_{{\bf Q}_p}(E \rightarrow E)}. In particular, since {{\bf Q}_p} is locally compact, we see that for any compact interval {I \subset (0,+\infty)}, the set {\{ M_z: \|M_z\|_{op} \in I \}} is compact. On the other hand, as all the {M_z} are invertible, {\hbox{det}(M_z)} is non-zero on this compact set. Thus, for any {I}, there exists a constant {C = C_I > 0} such that

\displaystyle  C^{-1} \leq |\hbox{det}(M_z)| \leq C

for all {z \in E} with {\|M_z\|_{op} \in I}. Since {|\hbox{det}(M_z)| = |z|^d}, we then see from a rescaling argument that there is a constant {C'>0} such that

\displaystyle  (C')^{-1} \|z\|_{op} \leq |z| \leq C' \|z\|_{op}

for all {z \in E}. Since {|z| = |z^n|^{1/n}}, we conclude the spectral radius formula

\displaystyle  |z| = \lim_{n \rightarrow \infty} \|z^n\|_{op}^{1/n}. \ \ \ \ \ (5)

Now we can prove the ultra-triangle inequality via the tensor power trick. If {|z|, |w| \leq A}, then from (5) we have

\displaystyle  \|z^n\|_{op}, \|w^n\|_{op} \leq (A+o(1))^n

as {n \rightarrow \infty}; from this and the easy bounds {\|zw\|_{op} \leq \|z\|_{op} \|w\|_{op}, \|z+w\|_{op} \leq \max( \|z\|_{op}, \|w\|_{op} )} and binomial expansion we also conclude that

\displaystyle  \|(z+w)^n\|_{op} \leq (A+o(1))^n

as {n \rightarrow \infty}. A second application of (5) then gives {|z+w| \leq A}, and the ultra-triangle inequality follows. \Box

Combining the two lemmas, we see that if we define

\displaystyle {\bf C}_p = \hbox{Clos}_{||_p}( \overline{{\bf Q}_p} )

to be the metric completion of the algebraic completion {\overline{{\bf Q}_p}} of the {p}-adic field {{\bf Q}_p}, then this is a non-Archimedean normed field which is both metrically complete and algebraically complete, and serves as the analogue of the complex field {{\bf C} = {\bf C}_\infty}. Note that {{\bf C}_p} comes with an embedding {\iota_p: \overline{{\bf Q}} \rightarrow {\bf C}_p}, since {\overline{{\bf Q}}} may clearly be embedded into {\overline{{\bf Q}_p}}. Also, the norm on {\overline{{\bf Q}}} induced from this embedding is clearly Galois-invariant and thus independent of the choice of embedding. Finally, we remark from construction that every non-zero element of {\overline{{\bf Q}_p}} has a norm which is a rational power {p^{a/b}} of {p}, so on taking limits (and using the ultra-triangle inequality) we see that the same is true for non-zero elements of {{\bf C}_p}.

Remark 9 In the Archimedean case, the analogue of {{\bf Q}_p} is the reals {{\bf R} = {\bf Q}_\infty}, and in this case the algebraic completion {\overline{{\bf R}}} is a finite extension of {{\bf R}} (in fact it is just a quadratic extension) and is thus already metrically complete. However, in the {p}-adic case, it turns out that {\overline{{\bf Q}_p}} is an infinite extension of {{\bf Q}_p} (for instance, it contains {n^{th}} roots of {p} for every {n \geq 1}), and is no longer metrically complete, requiring an additional metric completion, with Lemma 7 ensuring that algebraic completeness is not lost in the process.

— 2. From meromorphicity to rationality —

We now show how Proposition 3 and Proposition 4 imply Theorem 1. The basic idea is to exploit the fact that a non-zero rational integer {N} cannot be simultaneously small in the Archimedean sense and in the {p}-adic sense, and in particular that we have an “uncertainty principle”

\displaystyle  |N|_\infty \times |N|_p \geq 1 \ \ \ \ \ (6)

which is immediate from the fundamental theorem of arithmetic. We would like to use this uncertainty principle to eliminate the error term in Proposition 4, but run into the issue that many of the quantities involved here are not rational integers, but instead merely lie in {{\bf C}_p}. To get around this, we have to work with expressions that are guaranteed to be rational integers, such as polynomial combinations of the {S_n} with integer coefficients. To this end, we introduce the following classical lemma:
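
The uncertainty principle (6) can be verified mechanically on small integers. In this sketch (helper names hypothetical), the product {|N|_\infty \times |N|_p} is computed as the positive integer {|N| / p^{\hbox{ord}_p(N)}}, i.e. the part of {|N|} coprime to {p}; the second printout lists the {N} in the tested range where (6) is an equality.

```python
def ord_p(n, p):
    """Number of times the prime p divides the non-zero integer n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def uncertainty_product(N, p):
    """|N|_inf * |N|_p for a non-zero integer N.

    Since |N|_p = p^(-ord_p(N)), this is |N| / p^ord_p(N), a positive integer.
    """
    return abs(N) // p ** ord_p(N, p)

p = 5
values = {N: uncertainty_product(N, p) for N in range(-200, 201) if N != 0}
print(min(values.values()))
print(sorted(N for N, v in values.items() if v == 1))
```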

Lemma 10 (Rationality criterion) Let {S_n} be a sequence in a field {k}, with the property that there exists a natural number {m \geq 0} such that the {(m+1) \times (m+1)} determinants

\displaystyle  \det ( S_{n+i+j} )_{0 \leq i,j \leq m} \ \ \ \ \ (7)

vanish for all sufficiently large {n}. Then there exist {a_0,\dots,a_m \in k}, not all zero, such that we have the linear recurrence

\displaystyle  a_0 S_n + a_1 S_{n+1} + \dots + a_m S_{n+m} = 0 \ \ \ \ \ (8)

for all sufficiently large {n}. (Equivalently, the formal power series {\sum_n S_n T^n} is a rational function of {T}.)

Note that in the converse direction, row operations show that if one has the recurrence (8), then (7) vanishes.
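
Both directions can be observed on a toy sequence. The sketch below is an illustration only (the helpers `det` and `hankel_det` are hypothetical names, and the sequence {S_n = q^n - 1}, with the two characteristic values {q} and {1}, is chosen for convenience). It computes the determinants (7) exactly for {m=1} and {m=2}.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals (exact)."""
    A = [[Fraction(x) for x in row] for row in M]
    n, sign, result = len(A), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)      # no pivot in this column: singular matrix
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            sign = -sign
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return sign * result

def hankel_det(S, n, m):
    """The (m+1) x (m+1) determinant (7): det(S(n+i+j)) for 0 <= i, j <= m."""
    return det([[S(n + i + j) for j in range(m + 1)] for i in range(m + 1)])

q = 3
S = lambda n: q**n - 1   # two characteristic values, q and 1

print([hankel_det(S, n, 1) for n in range(1, 5)])   # 2 x 2 determinants
print([hankel_det(S, n, 2) for n in range(1, 5)])   # 3 x 3 determinants
```

For this sequence the {2 \times 2} determinants equal {-q^n (q-1)^2} and so never vanish, while the {3 \times 3} determinants vanish identically, matching the recurrence {q S_n - (q+1) S_{n+1} + S_{n+2} = 0} of the form (8).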

Proof: We may assume that the {m \times m} determinants

\displaystyle  \det ( S_{n+i+j} )_{0 \leq i,j \leq m-1} \ \ \ \ \ (9)

are non-vanishing for infinitely many {n} (this is a vacuous condition if {m=0}), since otherwise we can replace {m} by {m-1} in the hypotheses and conclusion.

Let {n} be large enough that (7) vanishes, and suppose that the determinant (9) vanishes for this value of {n}. We claim that the determinant

\displaystyle  \det ( S_{n+1+i+j} )_{0 \leq i,j \leq m-1} \ \ \ \ \ (10)

also vanishes; induction then shows that (9) vanishes for all sufficiently large {n}, a contradiction.

To see why (10) vanishes, we argue as follows. As (9) vanishes, there is a non-trivial linear dependence among the {m} rows of the matrix in (9). If this dependence does not involve the first row, then it also creates a non-trivial dependence among the first {m-1} rows of the matrix in (10), and we are done. Thus we may assume that the first row in (9) is a linear combination of the next {m-1} rows. As a consequence, the first row in (7) is a linear combination of the next {m-1} rows, plus a vector of the form {(0,\dots,0,\beta)} for some {\beta \in k}. If {\beta} is non-zero, then row operations and cofactor expansion show that the determinant (7) is plus or minus {\beta} times the determinant (10); since (7) vanishes, this gives the claim. If {\beta} is instead zero, then the first {m} rows of the matrix in (7) have a non-trivial linear dependence, which on deleting the first column shows that the {m} rows of the matrix in (10) also have a non-trivial linear dependence, giving the claim.

We thus conclude that (9) is non-vanishing for all sufficiently large {n}. In particular, the matrix in (7) has rank exactly {m} for all sufficiently large {n}. An easy induction then shows that the row span of the matrix in (7) is a hyperplane in {k^{m+1}} (spanned by either the first {m} rows or the last {m} rows), which is independent of {n}. Writing this hyperplane as {\{ (x_0,\dots,x_m): a_0 x_0 + \dots + a_m x_m = 0 \}}, we obtain the claim. \Box

Now let {S_n} be as in Proposition 3 and Proposition 4, let {m} be a large natural number to be chosen later, and consider the determinant (7). This is clearly a rational integer. On the one hand, from Proposition 3 we have the upper bound

\displaystyle  |\det ( S_{n+i+j} )_{0 \leq i,j \leq m}|_\infty \leq C_m q^{A n (m+1)} \ \ \ \ \ (11)

for all {n} and some {C_m, A>0}, with {A} independent of {m}. On the other hand, from Proposition 4 we can write each row in the {(m+1) \times (m+1)} matrix in (7) (after applying the embedding {\iota_p}) as the linear combination of at most {k} vectors of the form {(1, \lambda, \dots, \lambda^m)} for various {\lambda \in {\bf C}_p}, plus an error vector whose coefficients all have norm at most {q^{-(A+1)n}} (say), where {k} is independent of {m}. Taking determinants, we conclude that

\displaystyle  |\det ( S_{n+i+j} )_{0 \leq i,j \leq m}|_p \leq C'_m q^{-(A+1) n (m-k)} \ \ \ \ \ (12)

for sufficiently large {n,m} and some {C'_m > 0}. Inserting the two bounds (11), (12) into the uncertainty principle (6), we conclude the vanishing

\displaystyle  \det ( S_{n+i+j} )_{0 \leq i,j \leq m} = 0

for all sufficiently large {n,m}. Applying Lemma 10, we conclude that there exists a natural number {m \geq 0} and rational coefficients {a_0,\dots,a_m \in {\bf Q}}, not all zero, such that

\displaystyle  a_0 S_n + a_1 S_{n+1} + \dots + a_m S_{n+m} = 0 \ \ \ \ \ (13)

for all sufficiently large {n}. By clearing denominators, we may assume that the {a_i} are all rational integers. By deleting zero terms, we may assume that {a_0} and {a_m} are non-zero.

We can use this recurrence to improve the conclusions of Proposition 4. Observe from that proposition, after collecting like terms and absorbing any characteristic value {\lambda} with {|\lambda|_p \leq q^{-A}} into the error term, that for any {A > 0} we can find a finite number of distinct characteristic values {\lambda_1,\dots,\lambda_k \in {\bf C}_p} with {|\lambda_i|_p > q^{-A}} for {i=1,\dots,k}, as well as non-zero integers {c_1,\dots,c_k \in {\bf Z}}, such that

\displaystyle  |\iota_p(S_n) - \sum_{i=1}^k c_i \lambda_i^n|_p \leq q^{-An}

for all {n}. Applying (13) to eliminate the {S_n}, we conclude that

\displaystyle  |\sum_{i=1}^k c_i P(\lambda_i) \lambda_i^n|_p \leq q^{-An}

for all sufficiently large {n}, where {P} is the characteristic polynomial

\displaystyle  P(z) := a_0 + a_1 z + \dots + a_m z^m.

If one of the {\lambda_i} is not a root of {P}, then by applying difference operators

\displaystyle  \partial_\lambda S_n := S_{n+1} - \lambda S_n

to eliminate all the other characteristic values, we eventually conclude that

\displaystyle  |\lambda_i^n|_p \leq C_i q^{-An}

for all sufficiently large {n} and some {C_i} independent of {n}, contradicting the hypothesis {|\lambda_i|_p > q^{-A}}. Thus all the {\lambda_i} are zeroes of {P} and in particular lie in {\overline{{\bf Q}}}. If we then let {\zeta_1,\dots,\zeta_l \in \overline{{\bf Q}}} be an enumeration of the distinct zeroes of {P} (which are all non-zero by the non-vanishing of {a_0}), and choose {A} such that {|\zeta_i|_p > q^{-A}} for all {i=1,\dots,l}, we conclude that for each such {A}, there exist integers {c_i = c_{i,A}} for {i=1,\dots,l} such that

\displaystyle  |S_n - \sum_{i=1}^l c_i \zeta_i^n|_p \leq q^{-An} \ \ \ \ \ (14)

for all {n}. The coefficients {c_i} ostensibly depend on {A}, but a repetition of the above arguments shows that they are in fact independent of {A}, since given two {A,A'} with {|\zeta_i|_p > q^{-A} \geq q^{-A'}}, we see from the triangle inequality that

\displaystyle  |\sum_{i=1}^l (c_{i,A} - c_{i,A'}) \zeta_i^n|_p \leq q^{-An}

for all {n}, and then by applying difference operators to isolate a single {\zeta_i}, we see that {c_{i,A} = c_{i,A'}} for all {i}. (Note that this argument also gives the uniqueness of the characteristic values that was asserted in the introduction.) As the {c_i} are independent of {A}, we may send {A \rightarrow \infty} in (14), and conclude that

\displaystyle  S_n = \sum_{i=1}^l c_i \zeta_i^n \ \ \ \ \ (15)

for all {n}.
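
The difference-operator step used several times above can be seen on a toy sequence. In this sketch the characteristic values {2, 3, 5} and their integer coefficients are made up for illustration, and `difference` is a hypothetical helper implementing {\partial_\lambda}. Applying {\partial_2} and then {\partial_3} removes the characteristic values {2} and {3}, leaving a non-zero multiple of {5^n}.

```python
def difference(S, lam):
    """Apply the difference operator: (d_lam S)(n) = S(n+1) - lam * S(n)."""
    return lambda n: S(n + 1) - lam * S(n)

# Toy sequence S(n) = 4 * 2^n - 3^n + 7 * 5^n (values chosen arbitrarily).
coeffs = {2: 4, 3: -1, 5: 7}
S = lambda n: sum(c * lam**n for lam, c in coeffs.items())

# Each operator multiplies the lam'-term by (lam' - lam), killing the lam-term.
T = difference(difference(S, 2), 3)

# The surviving term is 7 * (5 - 2) * (5 - 3) * 5^n = 42 * 5^n.
print([T(n) for n in range(4)])
```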

We are now nearly done, except that the {\zeta_i \in \overline{{\bf Q}}} are algebraic numbers rather than algebraic integers. However, as the {S_n} are rational integers, we have {|S_n|_\ell \leq 1} for all {n} and all primes {\ell}, and applying difference operators to (15) to isolate {\zeta_i} we conclude that {|\zeta_i|_\ell \leq 1} for all {i} and all {\ell}. As the characteristic values are closed under the absolute Galois group of {{\bf Q}}, we conclude that all Galois conjugates of {\zeta_i} also have {\ell}-adic norm at most one, so the minimal polynomial of {\zeta_i} over {{\bf Q}} has coefficients that are rational and have {\ell}-adic norm at most one for every {\ell}, and are thus rational integers, so that {\zeta_i} is an algebraic integer as required. Theorem 1 follows.

Remark 11 The argument above has been slightly rearranged from the standard argument in the literature, in which one establishes rationality of the zeta function {\exp(\sum_{n=1}^\infty \frac{S_n}{n} T^n)} directly, rather than first establishing rationality of the generating function {\sum_{n=1}^\infty S_n T^n} (which is essentially the logarithmic derivative of the zeta function). The reason I did so was to highlight the fact that transcendental operations such as exponentiation do not play a role in this portion of the argument, in contrast to Propositions 5 and 6, which crucially exploit the properties of the exponential function.

— 3. The {p}-adic Weierstrass preparation theorem —

Now we prove Proposition 6. We begin with a theorem somewhat analogous to Rouche’s theorem in complex analysis, which approximately locates a zero of an entire function that is dominated by a monomial {a_m T^m}.

Lemma 12 (Rouche-type theorem) Let

\displaystyle  f(T) = 1 + a_1 T + a_2 T^2 + \dots

be an entire function on {{\bf C}_p}, thus {a_1,a_2,\dots \in {\bf C}_p} and {|a_n|_p^{1/n} \rightarrow 0} as {n \rightarrow \infty}. Suppose that {|a_m|_p^{1/m} \geq 1/R} for some {m \geq 1} and {R>0}. Then there exists a root {z \in {\bf C}_p} of {f} (thus {f(z)=0}) with {|z|_p \leq R}.

Proof: We first consider the polynomials

\displaystyle  f_n(T) = 1 + a_1 T + \dots + a_n T^n

for some {n \geq m}. As {{\bf C}_p} is algebraically closed, there must be a factorisation

\displaystyle  f_n(T) = (1-\beta_{1,n} T) \dots (1-\beta_{n,n} T)

for some {\beta_{1,n},\dots,\beta_{n,n} \in {\bf C}_p}, thus {a_m} is plus or minus the {m^{th}} elementary symmetric polynomial of the {\beta_{i,n}}. Since {|a_m|_p \geq 1/R^m}, we conclude from the non-archimedean nature of the norm that {|\beta_{i,n}|_p \geq 1/R} for at least one {i}. Similarly, given any {R'>0}, if there are exactly {k} indices {i} for which {|\beta_{i,n}|_p \geq 1/R'}, then by computing the {k^{th}} elementary symmetric polynomial we conclude that {|a_k|_p \geq (1/R')^k}. Since {|a_n|_p^{1/n}} goes to zero, we conclude that for any {R'}, the number of {i} for which {|\beta_{i,n}|_p \geq 1/R'} is bounded uniformly in {n}; the same argument shows that the {|\beta_{i,n}|_p} are uniformly bounded away from zero.

Now we run an argument somewhat similar to the proof of Lemma 7. Let {n, n'} be large natural numbers, and let {i=i_n} be such that {|\beta_{i,n}|_p \geq 1/R}. We have

\displaystyle  f_n( \frac{1}{\beta_{i,n}} ) = 0

and hence (since {|a_n|_p^{1/n} \rightarrow 0})

\displaystyle  f_{n'}( \frac{1}{\beta_{i,n}} ) \rightarrow 0

as {n,n' \rightarrow \infty}; thus

\displaystyle  \prod_{j=1}^{n'} | 1 - \frac{\beta_{j,n'}}{\beta_{i,n}}|_p \rightarrow 0.

On the other hand, since {|\beta_{i,n}|_p \geq 1/R}, and since {|\beta_{j,n'}|_p < 1/R} for all but a bounded number of {j}, we see from the non-archimedean nature of the norm that {| 1 - \frac{\beta_{j,n'}}{\beta_{i,n}}|_p = 1} for all but a bounded number of {j}. Since the {\beta_{j,n'}} are also uniformly bounded away from zero, we conclude that

\displaystyle  \inf_{1 \leq j \leq n'} |\beta_{i,n} - \beta_{j,n'}|_p \rightarrow 0

as {n,n' \rightarrow \infty}. From this, we can form a Cauchy sequence {\beta_{i_k,n_k}} such that {n_k \rightarrow \infty} and {|\beta_{i_k,n_k}|_p \geq 1/R}; taking limits, we obtain {\beta \in {\bf C}_p} with {|\beta|_p \geq 1/R} such that {f( \frac{1}{\beta} ) = 0}, giving the claim. \Box

One can refine the methods in this proof to read off the {p}-adic magnitudes of all the zeroes of {f} in terms of the Newton polytope of {f} (yielding a {p}-adic analogue of Jensen’s formula from complex analysis), but we will not need to do so here.

By iteratively removing the zeroes generated by the above lemma, we have

Proposition 13 ({p}-adic Weierstrass preparation theorem, alternate form) Let

\displaystyle  f(T) = 1 + a_1 T + a_2 T^2 + \dots

be an entire function on {{\bf C}_p}, and let {R>0}. Then there exists a factorisation

\displaystyle  f(T) = (\prod_{i=1}^m (1-\beta_i T)) g(T)

where {\beta_1,\dots,\beta_m \in {\bf C}_p} and

\displaystyle  g(T) = 1 + b_1 T + b_2 T^2 + \dots

is an entire function such that {|b_n|_p \leq R^{-n}} for all {n}.

Proof: We can make {R} a rational power of {p}, and then by rescaling we may normalise {R=1}.

Since {|a_n|_p^{1/n} \rightarrow 0}, the function {R^n |a_n|_p} goes to zero as {n \rightarrow \infty}, and so there exists a natural number {m \geq 0} such that

\displaystyle  |a_n|_p \leq |a_m|_p \ \ \ \ \ (16)

for all {n \geq 0}, with the convention that {a_0=1}, and with strict inequality if {n<m}. We now induct on {m}. If {m=0} then we are already done (setting {g=f}). Now suppose that {m \geq 1}, and that the claim has already been proven for {m-1}. From the strict form of (16) with {n=0} we have {|a_m|_p > 1}, so by Lemma 12 (applied with {1/R := |a_m|_p^{1/m} > 1}) we can find {\beta \in {\bf C}_p} with {|\beta|_p > 1} such that {f(1/\beta)=0}. We can then factor

\displaystyle  f(T) = (1-\beta T) f'(T)

where

\displaystyle  f'(T) = 1 + a'_1 T + a'_2 T^2 + \dots

and

\displaystyle  a'_n = \beta^n + a_1 \beta^{n-1} + \dots + a_n.

Since {f(1/\beta)=0}, we also have

\displaystyle  a'_n = - \beta^{-1} a_{n+1} - \beta^{-2} a_{n+2} - \dots.

Using (16) and the non-archimedean property, one easily verifies that

\displaystyle  |a'_n|_p \leq \frac{|a_m|_p}{|\beta|_p} = |a'_{m-1}|_p

for all {n \geq 0}, with strict inequality if {n<m-1}. Applying the induction hypothesis to {f'}, we obtain the claim. \Box
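The two formulae for the coefficients {a'_n} in the above proof can be sanity-checked in a toy situation. The following is only a sketch under simplifying assumptions: it works with a polynomial over the rationals rather than an entire function on {{\bf C}_p}, and the function names `divide_by_linear_factor` and `tail_formula` are hypothetical names introduced here for illustration.

```python
from fractions import Fraction

def divide_by_linear_factor(a, beta):
    """Coefficients a'_0..a'_N of f(T)/(1 - beta*T), where a = [a_0, ..., a_N].

    Uses a'_n = beta^n + a_1 beta^(n-1) + ... + a_n, in the equivalent
    recursive form a'_n = beta * a'_(n-1) + a_n.
    """
    out = []
    prev = 0
    for coeff in a:
        prev = beta * prev + coeff
        out.append(prev)
    return out

def tail_formula(a, beta, n):
    """a'_n = -(a_(n+1)/beta + a_(n+2)/beta^2 + ...), valid when f(1/beta) = 0.

    Here f is a polynomial, so the tail is a finite sum.
    """
    return -sum(a[n + j] / beta ** j for j in range(1, len(a) - n))

# Toy example: f(T) = (1 - 2T)(1 - 3T) = 1 - 5T + 6T^2, with beta = 2.
f = [1, -5, 6]
quotient = divide_by_linear_factor(f, 2)   # coefficients of 1 - 3T
```

In this example both formulae give the coefficients of {1-3T}, and the top coefficient of the quotient vanishes precisely because {f(1/2)=0}.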

Now we prove Proposition 6. Let {S'_n} be as in that proposition, and let {A>0} be arbitrary. By Proposition 13 we have

\displaystyle  \exp( \sum_{n=1}^\infty \frac{S'_n}{n} T^n ) = (\prod_{i=1}^{k} (1-\beta_i T)) g(T)

for some {\beta_1,\dots,\beta_k \in {\bf C}_p} and some entire function

\displaystyle  g(T) = 1 + b_1 T + b_2 T^2 + \dots

with {|b_n|_p \leq q^{-An}} for all {n}. Now we apply formal logarithms

\displaystyle  \log(1-X) := - X - \frac{X^2}{2} - \frac{X^3}{3} - \dots

to both sides. Clearly {\log(\exp(f(T))) = f(T)} for any formal power series {f(T)} with no constant term that actually converges in {{\bf C}}; comparing coefficients, we conclude that the formal identity {\log(\exp(f(T)))=f(T)} holds in any characteristic zero field. For similar reasons we have {\log(f(T) g(T)) = \log(f(T))+\log(g(T))} for any formal power series {f(T), g(T)} with {f(0)=g(0)=1} with coefficients in a characteristic zero field. We conclude that

\displaystyle  \sum_{n=1}^\infty \frac{S'_n}{n} T^n = - \sum_{n=1}^\infty \sum_{i=1}^k \frac{\beta_i^n}{n} T^n + \log g(T).

But by working out the power series, we see that

\displaystyle  \log g(T) = c_1 T + c_2 T^2 + \dots

where the coefficients {c_n} obey the bounds

\displaystyle  |c_n|_p \leq C^n q^{-An}

for some constant {C} independent of {A}. The claim then follows after increasing {A} as necessary.

— 4. Factorising the zeta function —

Now we establish Proposition 5, which is the most “number-theoretical” component of Dwork’s argument.

First observe that by covering the quasiprojective variety {V} by affine pieces, and using an induction on the dimension of {V} to take care of any double-counted terms, we may reduce to the case when {V} is an affine variety (1), thus

\displaystyle  S_n = | \{ x \in {\bf F}_{q^n}^d: P_1(x) = \dots = P_m(x) = 0 \}|.

We can view {S_n} as a sum {S_n= \sum_{x \in V[{\bf F}_{q^n}]} 1} over the affine variety {V}. The next step is Fourier expansion in order to “complete” the sum {S_n} into exponential sums over an ambient affine space. Write {q=p^r}. For any {n \geq 1} define the trace map {\hbox{Tr}_n: {\bf F}_{p^n} \rightarrow {\bf F}_p} by the formula

\displaystyle  \hbox{Tr}_n( x ) := x + x^p + x^{p^2} + \dots + x^{p^{n-1}}.

This is a linear map over {{\bf F}_p}.
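As a concrete illustration of the trace map, here is a minimal sketch for the field {{\bf F}_9}, under the assumption that we model {{\bf F}_9} as {{\bf F}_3[i]/(i^2+1)} (which is legitimate since {-1} is not a square modulo {3}). Elements are stored as pairs `(a, b)` standing for {a+bi}; all function names are hypothetical and chosen only for this example.

```python
# Elements of F_9 = F_3[i]/(i^2 + 1) are pairs (a, b) representing a + b*i.
ELEMENTS = [(a, b) for a in range(3) for b in range(3)]

def add(x, y):
    """Addition in F_9."""
    return ((x[0] + y[0]) % 3, (x[1] + y[1]) % 3)

def mul(x, y):
    """Multiplication in F_9, using i^2 = -1."""
    a, b = x
    c, d = y
    return ((a * c - b * d) % 3, (a * d + b * c) % 3)

def power(x, e):
    """x^e in F_9 by repeated multiplication (e is a small non-negative integer)."""
    result = (1, 0)
    for _ in range(e):
        result = mul(result, x)
    return result

def trace(x):
    """Tr_2(x) = x + x^3, the trace from F_9 down to F_3."""
    return add(x, power(x, 3))
```

One can then check by brute force that `trace(x)` always has vanishing second component (so that it lies in {{\bf F}_3}), and that `trace(add(x, y))` agrees with `add(trace(x), trace(y))` for all pairs of elements.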

Let {\varepsilon} be a primitive {p^{th}} root of unity in {{\bf C}_p}. Then from Fourier analysis we see that for any {x \in {\bf F}_{q^n}^d = {\bf F}_{p^{nr}}^d}, the sum

\displaystyle  \sum_{y_1,\dots,y_m \in {\bf F}_{q^n}} \varepsilon^{\hbox{Tr}_{nr}( y_1 P_1(x) + \dots + y_m P_m(x) )}

is equal to {q^{mn}} if {P_1(x)=\dots=P_m(x)=0} and equal to zero otherwise. Hence

\displaystyle  q^{mn} S_n = q^{mn} \iota_p(S_n) = \sum_{(x,y_1,\dots,y_m) \in {\bf F}_{q^n}^{d+m}} \varepsilon^{\hbox{Tr}_{nr}( y_1 P_1(x) + \dots + y_m P_m(x) )}.

In view of this (replacing {d+m} by {d}, and rescaling the zeta function), it suffices to show that for any polynomial {P: {\bf A}^d \rightarrow {\bf A}} defined over {{\bf F}_q}, one can decompose the sequence

\displaystyle  n \mapsto \sum_{x \in {\bf F}_{q^n}^d} \varepsilon^{\hbox{Tr}_{nr}( P(x) )}

as a finite linear combination over {{\bf Z}} of sequences {S'_n \in {\bf C}_p} with {\exp( \sum_{n=1}^\infty \frac{S'_n}{n} T^n )} entire.
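The Fourier-analytic completion of {S_n} used above can be tested numerically in the simplest case. The sketch below makes several simplifying assumptions that are not part of the argument: it takes {n=r=1} (so that {q=p} and the trace is the identity), and it replaces the {p}-adic root of unity {\varepsilon} by the complex number {e^{2\pi i/p}}, which obeys the same orthogonality relations. The names `count_directly`, `completed_sum` and `circle` are hypothetical.

```python
import cmath
from itertools import product

def count_directly(p, polys, d):
    """Number of x in F_p^d at which every polynomial in polys vanishes mod p."""
    return sum(
        1
        for x in product(range(p), repeat=d)
        if all(P(*x) % p == 0 for P in polys)
    )

def completed_sum(p, polys, d):
    """Sum of eps^(y_1 P_1(x) + ... + y_m P_m(x)) over all (x, y) in F_p^(d+m)."""
    eps = cmath.exp(2j * cmath.pi / p)  # complex stand-in for the p-adic root of unity
    m = len(polys)
    total = 0
    for x in product(range(p), repeat=d):
        values = [P(*x) % p for P in polys]
        for y in product(range(p), repeat=m):
            exponent = sum(yi * vi for yi, vi in zip(y, values)) % p
            total += eps ** exponent
    return total

# Hypothetical example: the circle x^2 + y^2 = 1 over F_5 (so d = 2, m = 1).
circle = [lambda x, y: x * x + y * y - 1]
```

Up to floating-point error, `completed_sum(5, circle, 2)` should agree with `5 * count_directly(5, circle, 2)`, in accordance with the identity {q^{mn} S_n = \sum \varepsilon^{\dots}} with {q=5} and {m=n=1}.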

It is convenient to remove the coordinate hyperplanes. Note that {{\bf F}_{q^n}^d} splits as the space {({\bf F}_{q^n}^\times)^d} plus some lower-dimensional spaces, where {{\bf F}_{q^n}^\times := {\bf F}_{q^n} \backslash \{0\}}. By an induction on dimension, it thus suffices to show that the sequence

\displaystyle  n \mapsto \sum_{x \in ({\bf F}_{q^n}^\times)^d} \varepsilon^{\hbox{Tr}_{nr}( P(x) )}

decomposes as a finite linear combination over {{\bf Z}} of sequences {S'_n \in {\bf C}_p} with {\exp( \sum_{n=1}^\infty \frac{S'_n}{n} T^n )} entire.

To prove this, we will establish the following trace formula:

Theorem 14 (Trace formula) There exists a formal power series

\displaystyle  G(X_1,\dots,X_d) = \sum_{w_1,\dots,w_d \geq 0} c_{w_1,\dots,w_d} X_1^{w_1} \dots X_d^{w_d}

in {d} variables with coefficients {c_{w_1,\dots,w_d}} in {{\bf C}_p}, or more compactly

\displaystyle  G(X) = \sum_{w \in {\bf N}^d} c_w X^w,

such that

\displaystyle  |c_w|_p \leq p^{-M|w|} \ \ \ \ \ (17)

for all {w \in {\bf N}^d} and some {M>0} (where {|w| := w_1+\dots+w_d}), such that one has the trace formula

\displaystyle  \sum_{x \in ({\bf F}_{q^n}^\times)^d} \varepsilon^{\hbox{Tr}_{nr}( P(x) )} = (q^n-1)^d \hbox{tr}_R( \Psi^n ) \ \ \ \ \ (18)

for all {n \geq 1}, where {\Psi = \Psi_{q,G}: R \rightarrow R} is the {{\bf C}_p}-linear map on the (infinite-dimensional) vector space {R} of all formal power series {F(X) = \sum_{u \in {\bf N}^d} a_u X^u} in {d} variables defined by

\displaystyle  \Psi(F(X)) = T_q( G(X) F(X) ) \ \ \ \ \ (19)

where {T_q: R \rightarrow R} is the linear map

\displaystyle  T_q( \sum_{u \in {\bf N}^d} a_u X^u ) := \sum_{u \in {\bf N}^d} a_{qu} X^u

and the trace {\hbox{tr}_R} on {R} is computed using the monomial basis {X^u} of {R}, thus if

\displaystyle  \Psi^n( X^u ) = \sum_{v \in {\bf N}^d} c^{(n)}_{uv} X^v

then

\displaystyle  \hbox{tr}_R(\Psi^n) := \sum_{u \in {\bf N}^d} c^{(n)}_{uu};

one can easily verify that this sum is convergent. (We will not address the subtle issue as to whether trace is a basis-independent concept in infinite dimensions.)

Let us assume this trace formula for the moment and conclude the proof of Proposition 5. Expanding out the {(q^n-1)^d} factor in (18) and arguing as before, it will suffice to show that the zeta function

\displaystyle  \exp( - \sum_{n=1}^\infty \frac{\hbox{tr}_R( \Psi^n ) T^n}{n} ) \ \ \ \ \ (20)

is entire; thus if {b_m} is the {T^m} coefficient of this zeta function, our task is to show that

\displaystyle  |b_m|_p^{1/m} \rightarrow 0 \ \ \ \ \ (21)

as {m \rightarrow \infty}.

Note from the formal identity

\displaystyle  1 - X = \exp( - \sum_{n=1}^\infty \frac{X^n}{n} )

(which is true for small complex {X}, and is thus also true for formal power series in characteristic zero) and the Jordan normal form that

\displaystyle  \det( 1 - AT ) = \exp( - \sum_{n=1}^\infty \frac{\hbox{tr}(A^n) T^n}{n} )

on the level of formal power series in {T} for any finite-dimensional matrix {A = (a_{ij})_{1 \leq i,j \leq k}} in characteristic zero. In particular, for any natural number {m}, the {T^m} coefficient of {\exp( - \sum_{n=1}^\infty \frac{\hbox{tr}(A^n) T^n}{n} )} is given by the formula

\displaystyle  (-1)^m \sum_* \hbox{sgn}(\sigma) a_{i_1,\sigma(i_1)} \dots a_{i_m,\sigma(i_m)}

where the sum {\sum_*} ranges over distinct elements {i_1,\dots,i_m} of {\{1,\dots,k\}}, and over permutations {\sigma} of {\{i_1,\dots,i_m\}}. This is a universal polynomial identity in characteristic zero, and so we conclude that the {T^m} coefficient {b_m} of the zeta function (20) is given by the formula

\displaystyle  b_m = (-1)^m \sum_* \hbox{sgn}(\sigma) c^{(1)}_{i_1,\sigma(i_1)} \dots c^{(1)}_{i_m,\sigma(i_m)}

where {i_1,\dots,i_m} now range over distinct natural numbers, and {\sigma} ranges over permutations of {\{i_1,\dots,i_m\}}; again, one can check that this sum is convergent. By the non-archimedean nature of the metric, it thus suffices to show that

\displaystyle  \sup_* (|c^{(1)}_{i_1,\sigma(i_1)}|_p \dots |c^{(1)}_{i_m,\sigma(i_m)}|_p)^{1/m} \rightarrow 0

as {m \rightarrow \infty}.
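The finite-dimensional identity {\det(1-AT) = \exp( - \sum_{n=1}^\infty \frac{\hbox{tr}(A^n) T^n}{n} )} used above is easy to test on a small matrix. The following sketch uses exact rational arithmetic; the exponential is computed via the standard recursion {n e_n = \sum_{k=1}^n k s_k e_{n-k}} for the coefficients of {\exp(\sum_k s_k T^k)}, which here becomes {n e_n = - \sum_{k=1}^n \hbox{tr}(A^k) e_{n-k}}. The function names are hypothetical and the choice of matrix is arbitrary.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    k = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(k)) for j in range(k)]
            for i in range(k)]

def power_traces(A, N):
    """Return [tr(A), tr(A^2), ..., tr(A^N)]."""
    traces, M = [], A
    for _ in range(N):
        traces.append(sum(M[i][i] for i in range(len(A))))
        M = mat_mul(M, A)
    return traces

def exp_of_minus_trace_series(A, N):
    """Coefficients e_0..e_N of exp(-sum_{n>=1} tr(A^n) T^n / n)."""
    t = power_traces(A, N)
    e = [Fraction(1)]
    for n in range(1, N + 1):
        # n * e_n = -sum_{k=1}^{n} tr(A^k) * e_{n-k}
        e.append(-sum(t[k - 1] * e[n - k] for k in range(1, n + 1)) / Fraction(n))
    return e

# For A = [[1, 2], [3, 4]] one has det(1 - A T) = 1 - 5 T - 2 T^2.
coefficients = exp_of_minus_trace_series([[1, 2], [3, 4]], 4)
```

In this example the coefficients beyond {T^2} vanish, as they must since {\det(1-AT)} is a polynomial of degree at most {2}.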

Now from (17) and construction of {\Psi}, we have

\displaystyle  |c^{(1)}_{ij}|_p \leq p^{-M (q|j| - |i|)}

for any {i,j}, and so (as {\sigma} is a permutation)

\displaystyle  (|c^{(1)}_{i_1,\sigma(i_1)}|_p \dots |c^{(1)}_{i_m,\sigma(i_m)}|_p)^{1/m} \leq p^{-M (q-1) \frac{1}{m} (|i_1| + \dots + |i_m|)}.

But because there are only a finite number of elements {i} of {{\bf N}^d} of a given length {|i|}, we see that {|i_1| + \dots + |i_m|} grows superlinearly in {m} (in fact it must grow by {\gg m^{1+\frac{1}{d}}}), and the claim follows.

It remains to establish the trace formula (18). We first write the trace {\hbox{tr}_R( \Psi^n )} in a more tractable form. For any natural number {k}, let {\mu_k := \{ z \in {\bf C}_p: z^k = 1 \}} denote the group of {k^{th}} roots of unity.

Lemma 15 If

\displaystyle  G(X) = \sum_{w \in {\bf N}^d} c_w X^w

is a power series with {|c_w|_p \leq p^{-M|w|}} for some {M>0} and all {w}, then for any {n \geq 1} we have

\displaystyle  (q^n-1)^d \hbox{tr}_R( \Psi^n ) = \sum_{x \in \mu_{q^n-1}^d} G(x) G(x^q) \dots G(x^{q^{n-1}})

where we use the notation

\displaystyle  (x_1,\dots,x_d)^q := (x_1^q,\dots,x_d^q).

Note that the power series for {G} converges at all roots of unity.

Proof: Observe that

\displaystyle  H(X) T_q( F(X) ) = T_q( H(X^q) F(X) )

for any {F,H \in R}. Iterating this using (19), we conclude the identity

\displaystyle  \Psi_{q,G}^n( F(X) ) = T_{q^n}( G(X^{q^{n-1}}) \dots G(X^q) G(X) F(X) )

\displaystyle  = \Psi_{q^n, G(X) \dots G(X^{q^{n-1}})}( F(X) )

and so to prove the lemma it suffices to do so in the {n=1} case, that is to say

\displaystyle  (q-1)^d \hbox{tr}_R( \Psi ) = \sum_{x \in \mu_{q-1}^d} G(x).

The right-hand side expands as

\displaystyle  \sum_{w \in {\bf N}^d} c_w \sum_{x \in \mu_{q-1}^d} x^w.

From Fourier analysis we see that {\sum_{x \in \mu_{q-1}^d} x^w} equals {(q-1)^d} when {w} is a multiple of {q-1} and zero otherwise, so the sum simplifies to

\displaystyle  (q-1)^d \sum_{w \in {\bf N}^d} c_{qw-w}.

The claim follows. \Box
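The {n=1} case of Lemma 15 can be checked numerically in one variable. The sketch below is only a toy model: it assumes {d=1}, takes {G} to be a polynomial rather than a convergent power series, and uses complex roots of unity in place of the {p}-adic group {\mu_{q-1}} (the orthogonality relations are the same). The names `sum_over_roots` and `trace_of_psi` are hypothetical.

```python
import cmath

def sum_over_roots(coeffs, q):
    """Sum of G(x) over the (q-1)-th roots of unity, with G(X) = sum_w coeffs[w] X^w."""
    k = q - 1
    total = 0
    for j in range(k):
        x = cmath.exp(2j * cmath.pi * j / k)
        total += sum(c * x ** w for w, c in enumerate(coeffs))
    return total

def trace_of_psi(coeffs, q):
    """tr(Psi) in the monomial basis: the diagonal entries are c_{qu-u} = c_{(q-1)u}."""
    return sum(coeffs[w] for w in range(0, len(coeffs), q - 1))

# Arbitrary example with q = 5; the relevant coefficients are c_0, c_4, c_8.
G = [3, 1, 4, 1, 5, 9, 2, 6, 5]
```

Here `trace_of_psi(G, 5)` is {3+5+5=13}, and `sum_over_roots(G, 5)` should equal {(q-1) \cdot 13 = 52} up to rounding error.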

In view of this lemma, it will suffice to obtain an identity of the form

\displaystyle  \sum_{x \in ({\bf F}_{q^n}^\times)^d} \varepsilon^{\hbox{Tr}_{nr}( P(x) )} = \sum_{x \in \mu_{q^n-1}^d} G(x) G(x^q) \dots G(x^{q^{n-1}}) \ \ \ \ \ (22)

for some power series

\displaystyle  G(X) = \sum_{w \in {\bf N}^d} c_w X^w

with {|c_w|_p \leq p^{-M|w|}} for some {M>0} and all {w}.

This will be deduced from the following basic fact in {p}-adic analysis, namely the existence of a canonical multiplicative embedding of the algebraic closure {\overline{{\bf F}_p}} of {{\bf F}_p} inside {{\bf C}_p}.

Lemma 16 (Teichmuller lifting) Let {\overline{{\bf F}_p} = \bigcup_{n=1}^\infty {\bf F}_{p^n}} be the algebraic closure of {{\bf F}_p}. Then there exists a map {\tau: \overline{{\bf F}_p} \rightarrow {\bf C}_p^\times}, known as the Teichmuller lift, with the following properties:

  • (Homomorphism) One has

    \displaystyle  \tau(xy) = \tau(x) \tau(y) \ \ \ \ \ (23)

    for all {x,y \in \overline{{\bf F}_p}}.

  • (Bijection) For each {n \geq 1}, {\tau} is a bijection (and hence group isomorphism, by (23)) between {{\bf F}_{p^n}^\times} and {\mu_{p^n-1}}. In particular, {|\tau(x)|_p=1} for all {x \in \overline{{\bf F}_p}^\times}.
  • (Description of trace) There exists a power series

    \displaystyle  \Theta(T) = \sum_{n=0}^\infty a_n T^n

    with {|a_n|_p \leq p^{-n/(p-1)}} for all {n}, such that

    \displaystyle  \varepsilon^{\hbox{Tr}_n(x)} = \Theta(\tau(x)) \Theta(\tau(x)^p) \dots \Theta(\tau(x)^{p^{n-1}}) \ \ \ \ \ (24)

    for all {n \geq 1} and {x \in {\bf F}_{p^n}}.

Let us see how the lemma implies an identity of the form (22). Writing

\displaystyle  P(x) = \sum_w a_w x^w

for some finite set of multi-indices {w} and coefficients {a_w \in {\bf F}_q}, we have from the linearity of trace that

\displaystyle  \varepsilon^{\hbox{Tr}_{nr}(P(x))} = \prod_w \varepsilon^{\hbox{Tr}_{nr}(a_w x^w)}

and hence by (24) we have

\displaystyle  \varepsilon^{\hbox{Tr}_{nr}(P(x))} = G(\tau(x)) G(\tau(x)^q) \dots G(\tau(x)^{q^{n-1}})

where

\displaystyle  G(z) := \prod_{s=0}^{r-1} \prod_w \Theta(\tau(a_w)^{p^s} z^{p^s w})

and {\tau(x_1,\dots,x_d) := (\tau(x_1),\dots,\tau(x_d))}. Note that the required decay of the coefficients of {G} follows from that of {\Theta}, since the {\tau(a_w)} have unit {p}-norm. The claim now follows from the bijective nature of the Teichmuller lift.

The only remaining task is to establish Lemma 16; here I will follow the exposition of Koblitz. We begin by constructing the Teichmuller lift {\tau: {\bf F}_{p^n}^\times \rightarrow \mu_{p^n-1}} for a given {n \geq 1}. Let {\alpha} be a primitive element of {{\bf F}_{p^n}^\times}, thus

\displaystyle  {\bf F}_{p^n}^\times = \{ 1, \alpha, \alpha^2, \dots, \alpha^{p^n-2} \}.

(The existence of such a primitive element can be seen by counting how many elements of {{\bf F}_{p^n}^\times} have order strictly less than {p^n-1}.) The minimal polynomial of {\alpha} over {{\bf F}_p} thus has degree {n}, that is to say it is of the form

\displaystyle  P(x) = x^n + a_{n-1} x^{n-1} + \dots + a_0

for some {a_0,\dots,a_{n-1} \in {\bf F}_p}. We arbitrarily lift this to the {p}-adic integers {{\bf Z}_p} as

\displaystyle  \tilde P(x) = x^n + \tilde a_{n-1} x^{n-1} + \dots + \tilde a_0

where {\tilde a_0,\dots,\tilde a_{n-1} \in {\bf Z}_p} reduce to {a_0,\dots,a_{n-1}} modulo {p}. Since the minimal polynomial {P} is irreducible in {{\bf F}_p}, the lift {\tilde P} is irreducible in {{\bf Z}_p} and hence also in {{\bf Q}_p} (here we use Lemma 12 to reach a contradiction if a monic factor of {\tilde P} has a coefficient of {p}-norm greater than {1}). Thus, if we let {\alpha \in \overline{{\bf Q}_p}} be a root of {\tilde P}, then {|\alpha|_p \leq 1} and {{\bf Q}_p(\alpha)} is a degree {n} extension of {{\bf Q}_p}. In this field, we define the valuation ring {A := \{ x \in {\bf Q}_p(\alpha): |x|_p \leq 1\}} and its maximal ideal {M := \{ x \in {\bf Q}_p(\alpha): |x|_p < 1 \}}. Then {A/M} is a field generated over {{\bf F}_p} by {\alpha \hbox{ mod } M}, which is a root of {P}, thus {A/M} is a degree {n} extension of {{\bf F}_p} and may be identified with {{\bf F}_{p^n}}.

We claim that the field extension {{\bf Q}_p(\alpha)} is unramified in the sense that all of the non-zero elements of {{\bf Q}_p(\alpha)} have norms that are integer powers of {p}, and in particular that {M = pA}. Suppose this were not the case; then there exists an element {\pi} of {{\bf Q}_p(\alpha)} with {1/p < |\pi|_p < 1}. If one lets {e_1,\dots,e_n} be a linear basis of {{\bf F}_{p^n}} over {{\bf F}_p}, and lets {\tilde e_1,\dots,\tilde e_n} be representatives of this basis in {A}, one can then show that {\tilde e_1,\dots,\tilde e_n, \pi \tilde e_1,\dots,\pi \tilde e_n} are linearly independent over {{\bf Q}_p}, contradicting the fact that {{\bf Q}_p(\alpha)} is a degree {n} extension.

Let {\alpha} be an element of {{\bf F}_{p^n}^\times}, then {\alpha^{p^n-1}=1}. As discussed earlier, we can view {\alpha} as an element of {A/M = A/pA}. By applying (a slight variant of) Hensel’s lemma, we can find a lift {\tau(\alpha) \in A} of {\alpha} such that {\tau(\alpha)^{p^n-1} = 1}. This gives an injective map from {{\bf F}_{p^n}^\times} to {\mu_{p^n-1}}, which on comparing cardinalities must be a bijection. Since the quotient map from {A} to {A/pA} is a homomorphism, we see that {\tau} is a homomorphism. One can check that the maps {\tau: {\bf F}_{p^n}^\times \rightarrow {\bf C}_p^\times} for different {n} are compatible, and glue together to form a single map {\tau: \overline{{\bf F}_p}^\times \rightarrow {\bf C}_p^\times} obeying the homomorphism and bijection properties.
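In the simplest case {n=1}, the Teichmuller lift can be computed explicitly to any finite {p}-adic precision. The sketch below does not use Hensel's lemma as in the text; instead it uses the alternative (and standard) description of the lift as the limit of the iterated {p^{th}} powers {a^{p^j}}, truncated modulo {p^k}. The function name `teichmuller` and the parameter `k` (the precision) are assumptions made for this illustration.

```python
def teichmuller(a, p, k):
    """Teichmuller representative of the residue a (mod p), computed modulo p^k.

    Since a^(p^(k-1)) raised to the power p - 1 is a^phi(p^k), which is
    congruent to 1 modulo p^k when p does not divide a, the result is a
    (p-1)-th root of unity modulo p^k that reduces to a modulo p.
    """
    return pow(a, p ** (k - 1), p ** k)

# Example: lifting the residues 1, 2, 3, 4 modulo 5 to precision 5^4 = 625.
lifts = {a: teichmuller(a, 5, 4) for a in range(1, 5)}
```

For instance `teichmuller(2, 5, 2)` is {7}, which is a square root of {-1} modulo {25}; this reflects the fact that {\tau(2)} is a primitive fourth root of unity in {{\bf Z}_5}.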

Now we have to construct {\Theta}. We first give a heuristic discussion. From the construction of {\tau}, we morally have {\tau(x) = x \hbox{ mod } p} for all {x \in \overline{{\bf F}_p}^\times}, where we are deliberately vague as to what “{\hbox{ mod } p}” means. Since the map {x \mapsto \varepsilon^x} should morally be periodic modulo {p}, we thus expect

\displaystyle  \varepsilon^{\hbox{Tr}_n(x)} = \varepsilon^{x + x^p + \dots + x^{p^{n-1}}}

\displaystyle  = \varepsilon^{\tau(x) + \tau(x^p) + \dots + \tau(x^{p^{n-1}})}

\displaystyle  = \varepsilon^{\tau(x)} \varepsilon^{\tau(x^p)} \dots \varepsilon^{\tau(x^{p^{n-1}})}

and so one is led to the initial guess

\displaystyle  \Theta(T) ?= \varepsilon^T

for {\Theta}. To make this heuristic discussion rigorous, we have to formally define what {\varepsilon^T} means as a power series in {T}. We write {\varepsilon = 1+\lambda}, thus {\lambda \neq 0} and

\displaystyle  (1+\lambda)^p = 1;

so that

\displaystyle  p + \binom{p}{2} \lambda + \dots + \binom{p}{p-1} \lambda^{p-2} + \lambda^{p-1} = 0;

thus the {p-1} Galois conjugates of {\lambda} multiply to {\pm p}, and so {|\lambda|_p = p^{-\frac{1}{p-1}}}. We can then define {\varepsilon^T = (1+\lambda)^T} by formal binomial expansion as

\displaystyle  (1+\lambda)^T := \sum_{i=0}^\infty \frac{T (T-1) \dots (T-i+1)}{i!} \lambda^i.

This is well-defined (over {{\bf C}_p}) as a formal power series in {T}. However, the convergence properties are bad, because of the denominator {i!}. Indeed, a standard computation shows that

\displaystyle  | \frac{1}{i!} |_p = p^{\frac{i-S_i}{p-1}}

where {S_i} is the sum of the digits of the base {p} expansion of {i}, and so

\displaystyle  | \frac{\lambda^i}{i!} |_p = p^{\frac{-S_i}{p-1}}.

The sequence {S_i} does not go to infinity as {i \rightarrow \infty}, and so the power series {(1+\lambda)^T} does not converge for {|T|_p \geq 1}. This is a problem, since we want to apply {\Theta} to norm one quantities such as {\tau(x^{p^j})}, and furthermore we are claiming a slightly larger radius of convergence in Lemma 16, namely (almost) {p^{1/(p-1)}}.
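The "standard computation" of {|1/i!|_p} above is Legendre's formula, which asserts that the exponent of {p} in {i!} is {(i-S_i)/(p-1)}. Here is a short brute-force check; the helper names `digit_sum` and `valuation` are hypothetical.

```python
from math import factorial

def digit_sum(i, p):
    """Sum S_i of the digits of i written in base p."""
    s = 0
    while i:
        s += i % p
        i //= p
    return s

def valuation(n, p):
    """Largest v such that p^v divides the positive integer n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def legendre_holds(i, p):
    """Check that (p - 1) * v_p(i!) equals i - S_i."""
    return (p - 1) * valuation(factorial(i), p) == i - digit_sum(i, p)
```

Note that {S_i = 1} whenever {i} is a power of {p}, which is one way to see that {S_i} does not go to infinity.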

It turns out that there is a way to tweak the series {T \mapsto (1+\lambda)^T} to significantly improve the {p}-adic convergence behaviour. Namely, we (formally) define the corrected function

\displaystyle  \Theta(T) := F(T,\lambda)

where {F(T,Y)} is the formal power series in two variables {T,Y} defined by

\displaystyle  F(T,Y) := (1+Y)^T \prod_{j=1}^\infty (1+Y^{p^j})^{(T^{p^j} - T^{p^{j-1}})/p^j}; \ \ \ \ \ (25)

one can verify that this is well-defined as a formal power series, and for fixed {t \in {\bf C}_p}, {F(t,Y)} is a formal power series in {Y}. By telescoping series, we have

\displaystyle  F(t,Y) F(t^p,Y) \dots F(t^{p^{n-1}},Y) = (1+Y)^{t+t^p+\dots+t^{p^{n-1}}}

as a formal power series, whenever {t \in \mu_{p^n-1}}. In particular,

\displaystyle  F(\tau(x),Y) F(\tau(x^p),Y) \dots F(\tau(x^{p^{n-1}}),Y) = (1+Y)^{\tau(x)+\tau(x)^p+\dots+\tau(x)^{p^{n-1}}}

for {x \in {\bf F}_{p^n}^\times}. If we can show that {F(T,\lambda)} makes sense as a formal power series {\sum_n a_n T^n} with {|a_n|_p \leq p^{-n/(p-1)}} for all {n}, we thus have

\displaystyle  \Theta(\tau(x)) \Theta(\tau(x^p)) \dots \Theta(\tau(x^{p^{n-1}})) = (1+\lambda)^{\tau(x)+\tau(x)^p+\dots+\tau(x)^{p^{n-1}}};

since {x, x^p, \dots, x^{p^{n-1}}} are the Galois conjugates of {x \in {\bf F}_{p^n}} over {{\bf F}_p}, one can verify that {\tau(x), \tau(x)^p,\dots,\tau(x)^{p^{n-1}}} are the Galois conjugates of {\tau(x) \in \overline{{\bf Q}_p}} over {{\bf Q}_p}, and so {\tau(x)+\tau(x)^p+\dots+\tau(x)^{p^{n-1}}} lies in {{\bf Q}_p}; since {\tau(x)} has norm {1}, this quantity in fact lies in {{\bf Z}_p}. Quotienting out by the maximal ideal {\{ z \in {\bf Q}_p(\tau(x)): |z|_p < 1 \}}, we conclude that

\displaystyle  \tau(x)+\tau(x)^p+\dots+\tau(x)^{p^{n-1}} = x + x^p + \dots + x^{p^{n-1}} = \hbox{Tr}_n(x) \hbox{ mod } p

and (24) follows.

So we are at last reduced to showing that {F(T,\lambda) = \sum_n a_n T^n} with {|a_n|_p \leq p^{-n/(p-1)}} for all {n}. From (25) we have the identity

\displaystyle  F(T^p, Y^p) = F(T,Y)^p \frac{(1+Y^p)^T}{(1+Y)^{pT}}.

On the other hand, we have {1+Y^p = (1 + Y)^p \hbox{ mod } pY{\bf Z}_p[Y]}, and hence

\displaystyle  \frac{1+Y^p}{(1+Y)^p} = 1 + p Y G(Y)

for some formal power series {G(Y)} with coefficients in {{\bf Z}_p}, and hence

\displaystyle  F(T^p, Y^p) = F(T,Y)^p (1 + p Y G(Y))^T.

We can expand {(1 + pY G(Y))^T} as {1 + pH(Y,T)}, where {H(Y,T)} is a formal power series with coefficients in {{\bf Z}_p} and no constant term, hence

\displaystyle  F(T^p, Y^p) = F(T,Y)^p (1 + p H(Y,T)).

As {F(0,0)=1}, the {T^a Y^b} coefficient of this identity lets us express the {T^a Y^b} coefficient of {F(T,Y)} as a polynomial combination (over {{\bf Z}_p}) of lower degree coefficients of {F} (as well as coefficients of {H}), which by induction shows that all coefficients of {F} lie in {{\bf Z}_p}. Replacing {Y} by {\lambda}, the desired claim follows.

Remark 17 The function {\Theta} can also be defined as

\displaystyle  \Theta(T) = E_p( \pi T )

where {E_p} is the Artin-Hasse exponential

\displaystyle  E_p(x) := \exp( \sum_{i=0}^\infty \frac{x^{p^i}}{p^i} )

and {\pi \in {\bf C}_p} is a root of the power series {\sum_{i=0}^\infty \frac{x^{p^i}}{p^i}}.
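It is a classical fact that the Artin-Hasse exponential {E_p} has coefficients that are rational numbers with denominators coprime to {p} (in contrast to the ordinary exponential). This can be observed experimentally with exact rational arithmetic; the sketch below computes the first few coefficients using the recursion {n e_n = \sum_{k=1}^n k s_k e_{n-k}} for {\exp(\sum_k s_k x^k)}, where {s_k = 1/k} when {k} is a power of {p} and {s_k = 0} otherwise. The function name `artin_hasse_coefficients` is hypothetical.

```python
from fractions import Fraction

def artin_hasse_coefficients(p, N):
    """Coefficients e_0..e_N of E_p(x) = exp(x + x^p/p + x^(p^2)/p^2 + ...)."""
    # s[k] is the coefficient of x^k in the exponent: 1/k if k is a power of p, else 0.
    s = [Fraction(0)] * (N + 1)
    q = 1
    while q <= N:
        s[q] = Fraction(1, q)
        q *= p
    e = [Fraction(1)]
    for n in range(1, N + 1):
        # n * e_n = sum_{k=1}^{n} k * s_k * e_{n-k}
        e.append(sum(k * s[k] * e[n - k] for k in range(1, n + 1)) / n)
    return e

# Example: for p = 2 the series begins 1 + x + x^2 + (2/3) x^3 + (2/3) x^4 + ...
first_terms = artin_hasse_coefficients(2, 4)
```

One can then inspect the `denominator` attribute of each coefficient and observe that it is never divisible by {p}.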

Remark 18 A small modification of Dwork’s argument also establishes rationality of (the zeta function associated to) exponential sums such as

\displaystyle  \sum_{x \in V[{\bf F}_{q^n}]} \chi(\hbox{Tr}_{nr}(P(x)))

for some polynomial {P: V \rightarrow {\bf A}} defined over {{\bf F}_q}, and some additive character {\chi: {\bf F}_p \rightarrow {\bf C}}.