Let {F} be a finite field, with algebraic closure {\overline{F}}, and let {V} be an (affine) algebraic variety defined over {\overline{F}}, by which I mean a set of the form

\displaystyle  V = \{ x \in \overline{F}^d: P_1(x) = \ldots = P_m(x) = 0 \}

for some ambient dimension {d \geq 0}, and some finite number of polynomials {P_1,\ldots,P_m: \overline{F}^d \rightarrow \overline{F}}. In order to reduce the number of subscripts later on, let us say that {V} has complexity at most {M} if {d}, {m}, and the degrees of the {P_1,\ldots,P_m} are all less than or equal to {M}. Note that we do not require at this stage that {V} be irreducible (i.e. not the union of two strictly smaller varieties), or defined over {F}, though we will often specialise to these cases later in this post. (Also, everything said here can also be applied with almost no changes to projective varieties, but we will stick with affine varieties for the sake of concreteness.)

One can consider two crude measures of how “big” the variety {V} is. The first measure, which is algebraic geometric in nature, is the dimension {\hbox{dim}(V)} of the variety {V}, which is an integer between {0} and {d} (or, depending on convention, {-\infty}, {-1}, or undefined, if {V} is empty) that can be defined in a large number of ways (e.g. it is the largest {r} for which the generic linear projection from {V} to {\overline{F}^r} is dominant, or the smallest {r} for which the intersection with a generic codimension {r} subspace is finite). The second measure, which is number-theoretic in nature, is the number {|V(F)| = |V \cap F^d|} of {F}-points of {V}, i.e. points {x = (x_1,\ldots,x_d)} in {V} all of whose coordinates lie in the finite field, or equivalently the number of solutions to the system of equations {P_i(x_1,\ldots,x_d) = 0} for {i=1,\ldots,m} with variables {x_1,\ldots,x_d} in {F}.

These two measures are linked together in a number of ways. For instance, we have the basic Schwarz-Zippel type bound (which, in this qualitative form, goes back at least to Lemma 1 of the work of Lang and Weil in 1954).

Lemma 1 (Schwarz-Zippel type bound) Let {V} be a variety of complexity at most {M}. Then we have {|V(F)| \ll_M |F|^{\hbox{dim}(V)}}.

Proof: (Sketch) For the purposes of exposition, we will not carefully track the dependencies of implied constants on the complexity {M}, instead simply assuming that all of these quantities remain controlled throughout the argument. (If one wished, one could obtain ineffective bounds on these quantities by an ultralimit argument, as discussed in this previous post, or equivalently by moving everything over to a nonstandard analysis framework; one could also obtain such uniformity using the machinery of schemes.)

We argue by induction on the ambient dimension {d} of the variety {V}. The {d=0} case is trivial, so suppose {d \geq 1} and that the claim has already been proven for {d-1}. By breaking up {V} into irreducible components we may assume that {V} is irreducible (this requires some control on the number and complexity of these components, but this is available, as discussed in this previous post). For each {x_1,\ldots,x_{d-1} \in \overline{F}}, the fibre {\{ x_d \in \overline{F}: (x_1,\ldots,x_{d-1},x_d) \in V \}} is either one-dimensional (and thus all of {\overline{F}}) or zero-dimensional. In the latter case, one has {O_M(1)} points in the fibre from the fundamental theorem of algebra (indeed one has a bound of {M} in this case, as the defining polynomials have degree at most {M}), and {(x_1,\ldots,x_{d-1})} lives in the projection of {V} to {\overline{F}^{d-1}}, which is a variety of dimension at most {\hbox{dim}(V)} and controlled complexity, so the contribution of this case is acceptable from the induction hypothesis. In the former case, the fibre contributes {|F|} {F}-points, but {(x_1,\ldots,x_{d-1})} lies in a variety in {\overline{F}^{d-1}} of dimension at most {\hbox{dim}(V)-1} (since otherwise {V} would contain a subvariety of dimension at least {\hbox{dim}(V)+1}, which is absurd) and controlled complexity, and so the contribution of this case is also acceptable from the induction hypothesis. \Box

One can improve the bound on the implied constant to be linear in the degree of {V} (see e.g. Claim 7.2 of this paper of Dvir, Kollar, and Lovett, or Lemma A.3 of this paper of Ellenberg, Oberlin, and myself), but we will not be concerned with these improvements here.

Without further hypotheses on {V}, the above upper bound is sharp (except for improvements in the implied constants). For instance, the variety

\displaystyle  V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: \prod_{j=1}^D (x_d - a_j) = 0\},

where {a_1,\ldots,a_D \in F} are distinct, is the union of {D} distinct hyperplanes of dimension {d-1}, with {|V(F)| = D |F|^{d-1}} and complexity {\max(D,d)}; similar examples can easily be concocted for other choices of {\hbox{dim}(V)}. In the other direction, there is also no non-trivial lower bound for {|V(F)|} without further hypotheses on {V}. For a trivial example, if {a} is an element of {\overline{F}} that does not lie in {F}, then the hyperplane

\displaystyle  V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: x_d - a = 0 \}

clearly has no {F}-points whatsoever, despite being a {d-1}-dimensional variety in {\overline{F}^d} of complexity {d}. For a slightly less trivial example, if {a} is an element of {F} that is not a quadratic residue, then the variety

\displaystyle  V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: x_d^2 - a = 0 \},

which is the union of two hyperplanes, still has no {F}-points, even though this time the variety is defined over {F} instead of {\overline{F}} (by which we mean that the defining polynomial(s) have all of their coefficients in {F}). There is however the important Lang-Weil bound that allows for a much better estimate as long as {V} is both defined over {F} and irreducible:

Theorem 2 (Lang-Weil bound) Let {V} be a variety of complexity at most {M}. Assume that {V} is defined over {F}, and that {V} is irreducible as a variety over {\overline{F}} (i.e. {V} is geometrically irreducible or absolutely irreducible). Then

\displaystyle  |V(F)| = (1 + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(V)}.

Again, more explicit bounds on the implied constant here are known, but will not be the focus of this post. As the previous examples show, the hypotheses of definability over {F} and geometric irreducibility are both necessary.
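As a quick numerical sanity check on these examples (not part of the argument), one can count {F}-points by brute force over a small prime field. The sketch below assumes {F = F_7} and ambient dimension {d=3}; the helper name `count_points` and the specific polynomials are illustrative choices, not taken from the discussion above.

```python
from itertools import product

def count_points(polys, p, d):
    """Count x in F_p^d with P(x) = 0 (mod p) for every P in polys."""
    return sum(
        all(P(*x) % p == 0 for P in polys)
        for x in product(range(p), repeat=d)
    )

p, d = 7, 3

# Union of D = 2 hyperplanes x_3 = 1 and x_3 = 2: D * p^(d-1) points.
union_count = count_points([lambda x1, x2, x3: (x3 - 1) * (x3 - 2)], p, d)

# x_3^2 = 3, where 3 is a quadratic non-residue mod 7: no F_p-points.
empty_count = count_points([lambda x1, x2, x3: x3 * x3 - 3], p, d)

# The irreducible surface x_3 = x_1 * x_2 (a graph): exactly p^2 points.
graph_count = count_points([lambda x1, x2, x3: x3 - x1 * x2], p, d)

print(union_count, empty_count, graph_count)  # prints 98 0 49
```

The first two counts illustrate why irreducibility and definability over {F} cannot be dropped, while the third is consistent with the leading term {|F|^{\hbox{dim}(V)}}.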

The Lang-Weil bound is already non-trivial in the model case {d=2, \hbox{dim}(V)=1} of plane curves:

Theorem 3 (Hasse-Weil bound) Let {P: \overline{F}^2 \rightarrow \overline{F}} be an irreducible polynomial of degree {D} with coefficients in {F}. Then

\displaystyle  |\{ (x,y) \in F^2: P(x,y) = 0 \}| = |F| + O_D( |F|^{1/2} ).

Thus, for instance, if {a,b \in F}, then the elliptic curve {\{ (x,y) \in F^2: y^2 = x^3 + ax + b \}} has {|F| + O(|F|^{1/2})} {F}-points, a result first established by Hasse. The Hasse-Weil bound is already quite non-trivial, being the analogue of the Riemann hypothesis for plane curves. For hyper-elliptic curves, an elementary proof (due to Stepanov) is discussed in this previous post. For general plane curves, the first proof was by Weil (leading to his famous Weil conjectures); there is also a nice version of Stepanov’s argument due to Bombieri covering this case which is a little less elementary (relying crucially on the Riemann-Roch theorem for the upper bound, and a lifting trick to then get the lower bound), which I briefly summarise later in this post. The full Lang-Weil bound is deduced from the Hasse-Weil bound by an induction argument using generic hyperplane slicing, as I will also summarise later in this post.
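For readers who like to experiment, here is a minimal brute-force check of Hasse's bound for a single elliptic curve over a prime field {F_p}. It assumes {p > 3} and a non-vanishing discriminant {4a^3+27b^2}, and uses the classical explicit form {|N - p| \leq 2\sqrt{p}} for the number {N} of affine points (which is sharper than the {O(|F|^{1/2})} statement used in this post). The function names are hypothetical.

```python
def affine_point_count(a, b, p):
    """Number of (x, y) in F_p^2 with y^2 = x^3 + a*x + b."""
    # square_roots[t] = number of y in F_p with y^2 = t
    square_roots = [0] * p
    for y in range(p):
        square_roots[y * y % p] += 1
    return sum(square_roots[(x ** 3 + a * x + b) % p] for x in range(p))

def satisfies_hasse(a, b, p):
    """Check |N - p| <= 2*sqrt(p) for the affine point count N."""
    if (4 * a ** 3 + 27 * b ** 2) % p == 0:
        raise ValueError("singular curve: discriminant vanishes mod p")
    n = affine_point_count(a, b, p)
    # Compare squares to stay in exact integer arithmetic.
    return (n - p) ** 2 <= 4 * p

for p in (5, 7, 11, 13, 101, 1009):
    print(p, affine_point_count(1, 1, p), satisfies_hasse(1, 1, p))
```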

The hypotheses of definability over {F} and geometric irreducibility in the Lang-Weil bound can be removed after inserting a geometric factor:

Corollary 4 (Lang-Weil bound, alternate form) Let {V} be a variety of complexity at most {M}. Then one has

\displaystyle  |V(F)| = (c(V) + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(V)}

where {c(V)} is the number of top-dimensional components of {V} (i.e. geometrically irreducible components of {V} of dimension {\hbox{dim}(V)}) that are definable over {F}, or equivalently are invariant with respect to the Frobenius endomorphism {x \mapsto x^{|F|}} that defines {F}.

Proof: By breaking up a general variety {V} into components (and using Lemma 1 to dispose of any lower-dimensional components), it suffices to establish this claim when {V} is itself geometrically irreducible. If {V} is definable over {F}, the claim follows from Theorem 2. If {V} is not definable over {F}, then it is not fixed by the Frobenius endomorphism {Frob} (since otherwise one could produce a set of defining polynomials that were fixed by Frobenius and thus defined over {F} by using some canonical basis (such as a reduced Grobner basis) for the associated ideal), and so {V \cap Frob(V)} has strictly smaller dimension than {V}. But {V \cap Frob(V)} captures all the {F}-points of {V}, so in this case the claim follows from Lemma 1. \Box

Note that if {V} is reducible but is itself defined over {F}, then the Frobenius endomorphism preserves {V} itself, but may permute the components of {V} around. In this case, {c(V)} is the number of fixed points of this permutation action of Frobenius on the components. In particular, {c(V)} is always a natural number between {0} and {O_M(1)}; thus we see that regardless of the geometry of {V}, the normalised count {|V(F)|/|F|^{\hbox{dim}(V)}} is asymptotically restricted to a bounded range of natural numbers (in the regime where the complexity stays bounded and {|F|} goes to infinity).

Example 1 Consider the variety

\displaystyle  V := \{ (x,y) \in \overline{F}^2: x^2 - ay^2 = 0 \}

for some non-zero parameter {a \in F}. Geometrically (by which we basically mean “when viewed over the algebraically closed field {\overline{F}}”), this is the union of two lines, with slopes corresponding to the two square roots of {a}. If {a} is a quadratic residue, then both of these lines are defined over {F}, and are fixed by Frobenius, and {c(V) = 2} in this case. If {a} is not a quadratic residue, then the lines are not defined over {F}, and the Frobenius automorphism permutes the two lines while preserving {V} as a whole, giving {c(V)=0} in this case.
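Example 1 is easy to test numerically. The following sketch, assuming {F = F_p} for an odd prime {p}, counts the {F}-points of {x^2 - ay^2 = 0} directly; one expects {2p-1} points (two lines through the origin) when {a} is a quadratic residue and a single point (the origin) otherwise, matching {c(V)=2} and {c(V)=0} to leading order. The function names are illustrative.

```python
def count_conic(a, p):
    """Number of (x, y) in F_p^2 with x^2 - a*y^2 = 0."""
    return sum((x * x - a * y * y) % p == 0
               for x in range(p) for y in range(p))

def is_residue(a, p):
    """True if a is a non-zero square mod the odd prime p."""
    return any(t * t % p == a % p for t in range(1, p))

p = 7
for a in range(1, p):
    print(a, is_residue(a, p), count_conic(a, p))
```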

Corollary 4 effectively computes (at least to leading order) the number-theoretic size {|V(F)|} of a variety in terms of geometric information about {V}, namely its dimension {\hbox{dim}(V)} and the number {c(V)} of top-dimensional components fixed by Frobenius. It turns out that with a little bit more effort, one can extend this connection to cover not just a single variety {V}, but a family of varieties indexed by points in some base space {W}. More precisely, suppose we now have two affine varieties {V,W} of bounded complexity, together with a regular map {\phi: V \rightarrow W} of bounded complexity (the definition of complexity of a regular map is a bit technical, see e.g. this paper, but one can think for instance of a polynomial or rational map of bounded degree as a good example). It will be convenient to assume that the base space {W} is irreducible. If the map {\phi} is a dominant map (i.e. the image {\phi(V)} is Zariski dense in {W}), then standard algebraic geometry results tell us that the fibres {\phi^{-1}(\{w\})} are an unramified family of {\hbox{dim}(V)-\hbox{dim}(W)}-dimensional varieties outside of an exceptional subset {W'} of {W} of dimension strictly smaller than {\hbox{dim}(W)} (and with {\phi^{-1}(W')} having dimension strictly smaller than {\hbox{dim}(V)}); see e.g. Section I.6.3 of Shafarevich.

Now suppose that {V}, {W}, and {\phi} are defined over {F}. Then, by Lang-Weil, {W(F)} has {(1 + O(|F|^{-1/2})) |F|^{\hbox{dim}(W)}} {F}-points, and by Schwarz-Zippel, for all but {O( |F|^{\hbox{dim}(W)-1})} of these {F}-points {w} (the exceptions being the ones that lie in the subvariety {W'}), the fibre {\phi^{-1}(\{w\})} is an algebraic variety defined over {F} of dimension {\hbox{dim}(V)-\hbox{dim}(W)}. By using ultraproduct arguments (see e.g. Lemma 3.7 of this paper of mine with Emmanuel Breuillard and Ben Green), this variety can be shown to have bounded complexity, and thus by Corollary 4, has {(c(\phi^{-1}(\{w\})) + O(|F|^{-1/2})) |F|^{\hbox{dim}(V)-\hbox{dim}(W)}} {F}-points. One can then ask how the quantity {c(\phi^{-1}(\{w\}))} is distributed. A simple but illustrative example occurs when {V=W=F} (with {F} of odd characteristic) and {\phi: F \rightarrow F} is the polynomial {\phi(x) := x^2}. Then {c(\phi^{-1}(\{w\}))} equals {2} when {w} is a non-zero quadratic residue and {0} when {w} is a non-zero quadratic non-residue (and {1} when {w} is zero, but this is a negligible fraction of all {w}). In particular, in the asymptotic limit {|F| \rightarrow \infty}, {c(\phi^{-1}(\{w\}))} is equal to {2} half of the time and {0} half of the time.

Now we describe the asymptotic distribution of the {c(\phi^{-1}(\{w\}))}. We need some additional notation. Let {w_0} be an {F}-point in {W \backslash W'}, and let {\pi_0( \phi^{-1}(\{w_0\}) )} be the connected components of the fibre {\phi^{-1}(\{w_0\})}. As {\phi^{-1}(\{w_0\})} is defined over {F}, this set of components is permuted by the Frobenius endomorphism {Frob}. But there is also an action by monodromy of the fundamental group {\pi_1(W \backslash W')} (this requires a certain amount of étale machinery to properly set up, as we are working over a positive characteristic field rather than over the complex numbers, but I am going to ignore this rather important detail here, as I still don’t fully understand it). This fundamental group may be infinite, but (by the étale construction) is always profinite, and in particular has a Haar probability measure, in which every finite index subgroup (and their cosets) are measurable. Thus we may meaningfully talk about elements drawn uniformly at random from this group, so long as we work only with the profinite {\sigma}-algebra on {\pi_1(W \backslash W')} that is generated by the cosets of the finite index subgroups of this group (which will be the only relevant sets we need to measure when considering the action of this group on finite sets, such as the components of a generic fibre).

Theorem 5 (Lang-Weil with parameters) Let {V, W} be varieties of complexity at most {M} with {W} irreducible, and let {\phi: V \rightarrow W} be a dominant map of complexity at most {M}. Let {w_0} be an {F}-point of {W \backslash W'}. Then, for any natural number {a}, one has {c(\phi^{-1}(\{w\})) = a} for {(\mathop{\bf P}( X = a ) + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(W)}} values of {w \in W(F)}, where {X} is the random variable that counts the number of components of a generic fibre {\phi^{-1}(w_0)} that are invariant under {g \circ Frob}, where {g} is an element chosen uniformly at random from the étale fundamental group {\pi_1(W \backslash W')}. In particular, in the asymptotic limit {|F| \rightarrow \infty}, and with {w} chosen uniformly at random from {W(F)}, {c(\phi^{-1}(\{w\}))} (or, equivalently, {|\phi^{-1}(\{w\})(F)| / |F|^{\hbox{dim}(V)-\hbox{dim}(W)}}) and {X} have the same asymptotic distribution.

This theorem generalises Corollary 4 (which is the case when {W} is just a point, so that {\phi^{-1}(\{w\})} is just {V} and {g} is trivial). Informally, the effect of a non-trivial parameter space {W} on the Lang-Weil bound is to push around the Frobenius map by monodromy for the purposes of counting invariant components, and a randomly chosen set of parameters corresponds to a randomly chosen loop on which to perform monodromy.

Example 2 Let {V=W=F} and {\phi(x) = x^m} for some fixed {m \geq 1}; to avoid some technical issues let us suppose that {m} is coprime to {|F|}. Then {W'} can be taken to be {\{0\}}, and for a base point {w_0 \in W \backslash W'} we can take {w_0=1}. The fibre {\phi^{-1}(\{1\})} (the {m^{th}} roots of unity) can be identified with the cyclic group {{\bf Z}/m{\bf Z}} by using a primitive root of unity. The étale fundamental group {\pi_1(W \backslash W') = \pi_1(\overline{F} \backslash \{0\})} is (I think) isomorphic to the profinite closure {\hat {\bf Z}} of the integers {{\bf Z}} (excluding the part of that closure coming from the characteristic of {F}). Not coincidentally, the integers {{\bf Z}} are the fundamental group of the complex analogue {{\bf C} \backslash \{0\}} of {W \backslash W'}. (Brian Conrad points out to me though that for more complicated varieties, such as covers of {\overline{F} \backslash \{0\}} by a power of the characteristic, the étale fundamental group is more complicated than just a profinite closure of the ordinary fundamental group, due to the presence of Artin-Schreier covers that are only ramified at infinity.) The action of this fundamental group on the fibres {{\bf Z}/m{\bf Z}} can be given by translation. Meanwhile, the Frobenius map {Frob} on {{\bf Z}/m{\bf Z}} is given by multiplication by {|F|}. A random element {g \circ Frob} then becomes a random affine map {x \mapsto |F|x+b} on {{\bf Z}/m{\bf Z}}, where {b} is chosen uniformly at random from {{\bf Z}/m{\bf Z}}. The number of fixed points of this map is equal to the greatest common divisor {(|F|-1,m)} of {|F|-1} and {m} when {b} is divisible by {(|F|-1,m)}, and equal to {0} otherwise. This matches up with the elementary number-theoretic fact that a randomly chosen non-zero element of {F} will be an {m^{th}} power with probability {1/(|F|-1,m)}, and when this occurs, the number of {m^{th}} roots in {F} will be {(|F|-1,m)}.
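The two sides of this comparison can be computed exactly for small prime fields. The sketch below assumes {F = F_p} with {p} prime and {m} coprime to {p}; it tabulates the distribution of the number of {m^{th}} roots of a uniformly random non-zero {w}, and separately the distribution of the number of fixed points of {x \mapsto px + b} on {{\bf Z}/m{\bf Z}} with {b} uniform. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def root_count_distribution(p, m):
    """Distribution of #{x in F_p : x^m = w} over uniform non-zero w."""
    roots = Counter(pow(x, m, p) for x in range(1, p))
    counts = Counter(roots.get(w, 0) for w in range(1, p))
    return {k: Fraction(v, p - 1) for k, v in counts.items()}

def affine_fixed_point_distribution(q, m):
    """Distribution of #{x in Z/mZ : q*x + b = x} over uniform b in Z/mZ."""
    counts = Counter(
        sum((q * x + b - x) % m == 0 for x in range(m))
        for b in range(m)
    )
    return {k: Fraction(v, m) for k, v in counts.items()}

for p, m in [(7, 2), (7, 4), (13, 4), (11, 5), (7, 5)]:
    print(p, m, root_count_distribution(p, m),
          affine_fixed_point_distribution(p, m))
```

In each of these cases the two distributions agree exactly (not just asymptotically), which reflects how simple this particular family is.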

Example 3 (Thanks to Jordan Ellenberg for this example.) Consider a random elliptic curve {E = \{ y^2 = x^3 + ax + b \}}, where {a,b} are chosen uniformly at random, and let {m \geq 1}. Let {E[m]} be the {m}-torsion points of {E} (i.e. those elements {g \in E} with {mg = 0} using the elliptic curve addition law); as a group, this is isomorphic to {{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}} (assuming that {F} has sufficiently large characteristic, for simplicity). Consider the number of {F}-points of {E[m]}, which is a random variable taking values in the natural numbers between {0} and {m^2}. In this case, the base variety {W} is the modular curve {X(1)}, and the covering variety {V} is the modular curve {X_1(m)}. The generic fibre here can be identified with {{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}, the monodromy action projects down to the action of {SL_2({\bf Z}/m{\bf Z})}, and the action of Frobenius on this fibre can be shown to be given by a {2 \times 2} matrix with determinant {|F|} (with the exact choice of matrix depending on the choice of fibre and of the identification), so the distribution of the number of {F}-points of {E[m]} is asymptotic to the distribution of the number of fixed points {X} of a random linear map of determinant {|F|} on {{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}.
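The limiting distribution in Example 3 can be tabulated by brute force for small {m}. The sketch below only computes the matrix side of the comparison: it interprets “random linear map of determinant {|F|}” as a matrix drawn uniformly from all {2 \times 2} matrices over {{\bf Z}/m{\bf Z}} with that determinant (an assumption on my part), and the function name is hypothetical. It does not count torsion points on actual elliptic curves.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def torsion_fixed_point_distribution(q, m):
    """Distribution of #{v in (Z/mZ)^2 : A v = v} over all 2x2 matrices
    A with entries in Z/mZ and det A = q (mod m), chosen uniformly."""
    counts = Counter()
    for a, b, c, d in product(range(m), repeat=4):
        if (a * d - b * c - q) % m != 0:
            continue
        fixed = sum(
            (a * x + b * y - x) % m == 0 and (c * x + d * y - y) % m == 0
            for x, y in product(range(m), repeat=2)
        )
        counts[fixed] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}

# m = 2 and |F| odd: the six matrices of SL_2(Z/2Z).
print(torsion_fixed_point_distribution(3, 2))
```

For {m=2} this gives {4}, {2}, {1} fixed points with probabilities {1/6}, {1/2}, {1/3}, which is consistent with the cubic {x^3+ax+b} having three, one, or no roots in {F} with those asymptotic frequencies.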

Theorem 5 seems to be well known “folklore” among arithmetic geometers, though I do not know of an explicit reference for it. I enjoyed deriving it for myself (though my derivation is somewhat incomplete due to my lack of understanding of étale cohomology) from the ordinary Lang-Weil theorem and the moment method. I’m recording this derivation later in this post, mostly for my own benefit (as I am still in the process of learning this material), though perhaps some other readers may also be interested in it.

Caveat: not all details are fully fleshed out in this writeup, particularly those involving the finer points of algebraic geometry and étale cohomology, as my understanding of these topics is not as complete as I would like it to be.

Many thanks to Brian Conrad and Jordan Ellenberg for helpful discussions on these topics.

— 1. The Stepanov-Bombieri proof of the Hasse-Weil bound —

We now give (most of) the Stepanov-Bombieri proof of Theorem 3, following the exposition of Bombieri (see also these lecture notes of Kowalski, focusing on the model case of curves of the form {\{ (x,y): y^d = P(x)\}}). In this section all implied constants are allowed to depend on the degree {D} of the polynomial {P(x,y)}.

Let {C(F)} be the {F}-points of the curve {C := \{ (x,y) \in \overline{F}^2: P(x,y) = 0\}}; the hypothesis that {P} is irreducible means that the curve {C} is also irreducible. Our task is to establish the upper bound

\displaystyle  |C(F)| \leq |F| + O( |F|^{1/2} ) \ \ \ \ \ (1)

and the lower bound

\displaystyle  |C(F)| \geq |F| - O( |F|^{1/2} ). \ \ \ \ \ (2)

For technical reasons, we only prove these bounds directly when {|F|} is a perfect square; the general case then follows by using the explicit formula for {|C(F_{p^r})|} as a function of {r} and the tensor power trick, as explained in this previous post.

Now we prove the upper bound. One uses Stepanov’s polynomial method (discussed in these previous blog posts, for a somewhat different set of applications). The basic idea here is to bound the size of a finite set {C(F)} by constructing a non-trivial polynomial of controlled degree that vanishes to high order at each point of {C(F)}. Stepanov’s original argument projected the curve {C} onto an affine line {\overline{F}} so that one could use ordinary one-dimensional polynomials, but Bombieri’s approach works directly on the curve {C}. For this it becomes convenient to work in the language of divisors of rational functions, rather than zeroes of polynomials (and also it becomes slightly more convenient to work projectively rather than affinely, though I will gloss over this minor detail). Given a rational function {f} on the curve {C} (with coefficients in {\overline{F}}) that is not identically zero or identically infinite on {C}, one can define the divisor {(f)} to be the formal sum {\sum_Q m_Q Q} of the zeroes and poles {Q} of {f} on {C}, weighted by the multiplicity {m_Q} of the zero or pole (with poles being viewed as having negative multiplicity; one has to be a little careful defining multiplicity at singular points of {C}, but this can be done by using a sufficiently algebraic formalism). The irreducibility of {C} (which, of course, must be used at some juncture to prove the Hasse-Weil bound) ensures that this formal sum only has finitely many non-zero terms. One can show that the total signed multiplicities of zeroes and poles of {(f)} always add up to zero: {\sum_Q m_Q = 0}. (This generalises the fundamental theorem of algebra, which is the case when {C} is a projective line.)

Fix a point {P_\infty} of {C} (which one can think of as being at infinity), and for each degree {M}, let {L(M P_\infty)} be the space of those rational functions {f} which are either zero, or have {(f) \geq -M P_\infty}, i.e. {f} has a pole of order at most {M} at {P_\infty} but no other poles; this generalises the space of polynomials of degree at most {M}, which corresponds to the case when {C} is a projective line and {P_\infty} is the point at infinity. It is easy to see that {L(M P_\infty)} is a vector space (over {\overline{F}}), which is non-decreasing in {M}. In the case when {C} is a line, this space clearly has dimension {M+1}. The Riemann-Roch theorem is the generalisation of this to the case of a general curve. In its most precise form, it asserts the identity

\displaystyle  \hbox{dim} L(M P_\infty) - \hbox{dim} L(K - M P_\infty) = M - g + 1

where {g} is the genus of {C}, and {K} is the canonical divisor of {C} (i.e. the divisor of the canonical line bundle of {C}). For our application, we will need two specific corollaries of the Riemann-Roch theorem, coming from the non-negativity and monotonicity properties of {L}. The first is the Riemann inequality

\displaystyle  M-g+1 \leq \hbox{dim} L(M P_\infty) \leq M+1

(so in particular {\hbox{dim} L(M P_\infty) = M+O(1)}), and the second is the inequality

\displaystyle  \hbox{dim} L(M P_\infty) \leq \hbox{dim} L((M+1) P_\infty) \leq \hbox{dim} L(M P_\infty) + 1

for all {M}. In particular, if we set {{\mathcal N}} to be the set of all degrees {M} such that {\hbox{dim} L(M P_\infty) =\hbox{dim} L((M-1) P_\infty) + 1}, then {{\mathcal N}} consists of all but at most {g} elements of the natural numbers, and we can find a sequence {\{ e_M: M \in {\mathcal N}\}} of rational functions on {C} with {e_M \in L(M P_\infty) \backslash L((M-1) P_\infty)} (i.e. {e_M} has a pole of order exactly {M} at {P_\infty} and no other poles), with each {L(M P_\infty)} being spanned by the {e_{M'}} with {M' \in {\mathcal N}} and {M' \leq M}. The {e_M} can be viewed as a generalisation of the standard monomial basis {1,x,x^2,\ldots} of the polynomials of one variable. A key point here is that the degrees of this basis are all distinct; this will be crucial later on for ensuring that the polynomial we will be using for Stepanov’s method does not vanish identically.

We will not prove the Riemann-Roch theorem here, and simply assume it (and its consequences) as a black box. Next, we consider the Frobenius map {Frob: \overline{F}^2 \rightarrow \overline{F}^2} defined by {Frob(x,y) := (x^{|F|}, y^{|F|})}. As {C} is defined over {F}, this map preserves {C}, and the fixed points of this map are precisely the {F}-points of {C}.
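The characterisation of {F}-points as fixed points of Frobenius can be seen concretely in a small extension field. The sketch below assumes {p \equiv 3 \hbox{ mod } 4}, so that {F_{p^2}} can be modelled as {F_p[i]} with {i^2=-1}, and represents {a+bi} as the pair `(a, b)`; it checks which elements of {F_{p^2}} are fixed by {x \mapsto x^p} (the one-variable version of the coordinatewise map above). All names are illustrative.

```python
def mul(u, v, p):
    """Multiply u = (a, b) and v = (c, d), standing for a + b*i and
    c + d*i with i^2 = -1, reducing coefficients mod p."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def frobenius(u, p):
    """Compute u^p by repeated multiplication."""
    result = (1, 0)
    for _ in range(p):
        result = mul(result, u, p)
    return result

def frobenius_fixed_points(p):
    """All u in F_p[i] with u^p = u."""
    return {(a, b) for a in range(p) for b in range(p)
            if frobenius((a, b), p) == (a, b)}

# For p = 7 the fixed points are exactly the pairs (a, 0), i.e. F_7 itself.
print(sorted(frobenius_fixed_points(7)))
```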

The plan is now to find a non-trivial element {f \in L(M_* P_\infty)} for some controlled {M_*} that vanishes to order at least {m} at each fixed point of {Frob} on {C} for some {m}. As the total number of zeroes and poles of {f} must agree, this forces

\displaystyle  m |C(F)| \leq M_*

and by optimising the parameters {m,M_*} this should lead to (1).

The trick is to pick rational functions {f} of a specific form, namely

\displaystyle  f(x,y) = \sum_{M \in {\mathcal N}: M \leq M_0} f_M(x,y)^m e_M( Frob( x, y) ) \ \ \ \ \ (3)

where {M_0} is a parameter to be optimised later, and for each {M}, {f_M} lies in {L( M_1 P_\infty )} for another parameter {M_1} to be optimised later. We also pick {m} to divide {|F|}, thus {m} is a power of the characteristic {p} of {F} that is less than or equal to {|F|}. There are several reasons for working with rational functions of this form. The first is that because of the Frobenius endomorphism identity {e_M( Frob( x, y) ) = e_M(x,y)^{|F|}}, and {m} divides {|F|} by hypothesis, each term in {f(x,y)} is an {m^{th}} power, and hence (since {a^m+b^m = (a+b)^m} in characteristic {p}) {f} is itself the {m^{th}} power of some rational function. As such, whenever {f} vanishes at a point, it automatically vanishes to order at least {m}.
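The identity {a^m+b^m = (a+b)^m} for {m} a power of {p} comes down to the vanishing mod {p} of the intermediate binomial coefficients {\binom{m}{k}}, {0 < k < m}. A minimal check of this, with a hypothetical helper name, is:

```python
from math import comb

def middle_binomials_vanish(m, p):
    """True if C(m, k) is divisible by p for every 0 < k < m, which is
    what makes (a + b)^m = a^m + b^m hold in characteristic p."""
    return all(comb(m, k) % p == 0 for k in range(1, m))

print(middle_binomials_vanish(9, 3))   # m = 3^2 is a power of p = 3
print(middle_binomials_vanish(6, 3))   # m = 6 is not a power of 3
```

The second call returns `False` (for instance {\binom{6}{3} = 20} is not divisible by {3}), which illustrates why {m} is required to be a power of the characteristic rather than just a multiple of it.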

Secondly, in order for {f} to vanish at every fixed point of the Frobenius map on {C} (and thus vanish to order at least {m} at each such point, by the above discussion), it of course suffices to have the identity of rational functions

\displaystyle  \sum_{M \in {\mathcal N}: M \leq M_0} f_M(x,y)^m e_M( x, y ) = 0. \ \ \ \ \ (4)

This is a constraint on the {f_M} which is linear over {F_p} (because the map {x \mapsto x^m} is linear over {F_p}, though not over {F}), and so we can try to enforce this constraint by linear algebra. Note that if each {f_M} lies in {L(M_1 P_\infty)}, then the rational function on the left-hand side of (4) lies in {L( (mM_1 + M_0) P_\infty)}, so the vanishing (4) imposes {r \hbox{dim}( L((mM_1 + M_0) P_\infty) ) = r(mM_1 + M_0 + O(1))} homogeneous linear constraints (possibly dependent) over {F_p} on the {f_M}, where {|F| =: p^r}. On the other hand, the dimension (over {F_p}) of the space of all possible {f_M} is

\displaystyle r |\{ M \in {\mathcal N}: M \leq M_0\}| \hbox{dim} L( M_1 P_\infty ) = r (M_0 + O(1)) (M_1 + O(1)).

Thus, by linear algebra, we can find a collection of {f_M \in L(M_1 P_\infty)}, not all zero, obeying (4) as long as we have

\displaystyle  (M_0 - C)_+ (M_1 - C)_+ > m M_1 + M_0 + C \ \ \ \ \ (5)

We are not done yet, because even if the {f_M} are not all zero, it is conceivable that the combined function (3) could still vanish. But observe that if {M} is the largest index for which {f_M} is non-zero, then the rational function {f_M(x,y)^m e_M( Frob( x, y) )} has a pole of order at least {M|F|} at {P_\infty}, and all the other terms have a pole of order at most {(M-1)|F| + m M_1} (here it is crucial that the {e_M} have poles of different orders at {P_\infty}, thanks to Riemann-Roch). So as long as we have

\displaystyle  m M_1 < |F|, \ \ \ \ \ (6)

we have that {f} does not vanish identically. As it vanishes to order at least {m} at every point of {C(F)}, and lies in {L( (mM_1 + |F| M_0) P_\infty )}, we obtain the upper bound

\displaystyle  |C(F)| \leq \frac{mM_1 + |F| M_0}{m}.

Now we optimise in the parameters {m, M_0, M_1} subject to the constraints (5), (6). As {|F|} is assumed to be a perfect square and sufficiently large, a little work shows that one can satisfy the constraints with {m := |F|^{1/2}}, {M_1 := |F|^{1/2}-1}, and {M_0 := |F|^{1/2}+C'} for some sufficiently large {C'}, leading to the desired bound (1).
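The “little work” here is just arithmetic, and can be spot-checked numerically. The sketch below is only illustrative: the constant {C} in (5) is not specified in the argument, so the value used is a placeholder, and the choice {C' := 2C+2} is my own assumption about what “sufficiently large” can mean. With {s := |F|^{1/2}}, the resulting upper bound works out to {|F| + (C'+1)s - 1}.

```python
from fractions import Fraction

def stepanov_parameters(s, C):
    """For |F| = s^2 and a hypothetical constant C in (5), test the
    constraints (5), (6) for m = s, M1 = s - 1, M0 = s + C' with
    C' = 2C + 2 (an illustrative choice), and return the bound
    (m*M1 + |F|*M0) / m on |C(F)|."""
    q = s * s
    C_prime = 2 * C + 2
    m, M1, M0 = s, s - 1, s + C_prime
    ok5 = max(M0 - C, 0) * max(M1 - C, 0) > m * M1 + M0 + C
    ok6 = m * M1 < q
    bound = Fraction(m * M1 + q * M0, m)
    return ok5, ok6, bound

# |F| = 3^8 = 6561, so s = 81 is a power of the characteristic.
print(stepanov_parameters(81, 2))
# For small fields constraint (5) can fail with this choice.
print(stepanov_parameters(9, 2))
```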

Remark 1 When {|F|} is not a perfect square, but is not a prime, one can still choose other values of {m} and obtain a weaker version of (1) that is still non-trivial. But it is curious that Bombieri’s version of Stepanov’s argument breaks down completely when {F} has prime order. As mentioned previously, one can still recover this case a posteriori by the tensor power trick, but this requires the explicit formula for {|C(F_{p^r})|} which is not exactly trivial. On the other hand, other versions of Stepanov’s argument (such as the one given in this previous blog post) do work in the prime order case, so this obstruction may be purely artificial in nature.

Now we need to pass from the upper bound (1) to the lower bound (2). In model cases, such as curves of the form {C = \{ (x,y): y^d= P(x) \}} with {d} dividing {|F|-1}, one can proceed by observing that the multiplicative group {F^\times} foliates into {d} cosets {g_1 H,\ldots, g_d H} of the subgroup {H := \{ x^d: x \in F^\times\}} of {d^{th}} powers. Because of this, we see that the union {C' := C_1 \cup \ldots \cup C_d} of the curves {C_i := \{ (x,y): g_i y^d = P(x) \}} contains exactly {d} {F}-points on all but {O(1)} vertical lines {\{ x = a \}}, {a \in F}, and hence the union has {d|F|+O(1)} {F}-points. (In other words, the projection {(x,y) \mapsto x} from {C'} to {\overline{F}} is generically {d}-to-one.) Also, as the {C_i} are all dilations of {C}, they will be irreducible if {C} is, and thus by (1) they each have at most {|F| + O(|F|^{1/2})} {F}-points; from subtracting all but one of the curves from {C_1 \cup \ldots \cup C_d} we then see that each of the {C_i} has at least {|F| - O(|F|^{1/2})} {F}-points also. As we can take one of the {g_i} to be the identity, the claim (2) follows in this case. The general case follows in a similar fashion, using Galois theory to lift a general curve {C} to another curve {C'} whose projection to some coordinate line {\overline{F}} has fixed multiplicity (basically by ensuring that the function field of the former is a Galois extension of the function field of the latter); see Bombieri’s paper for details.
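The coset-counting step in the model case can be verified by brute force. The sketch below assumes {F = F_p} with {d} dividing {p-1}, picks coset representatives {g_i} of the {d^{th}} powers greedily (so that {g_1 = 1}), and counts the {F}-points on each twisted curve {g_i y^d = P(x)}. Summing over {i}, each {x} contributes exactly {d} points (including the roots of {P}, which contribute one point {(x,0)} to each curve), so the total is exactly {dp}. The function names and the sample polynomial are illustrative.

```python
def coset_representatives(p, d):
    """Representatives of the cosets of H = {x^d} in F_p^x, found greedily."""
    H = {pow(x, d, p) for x in range(1, p)}
    reps, covered = [], set()
    for g in range(1, p):
        if g not in covered:
            reps.append(g)
            covered |= {g * h % p for h in H}
    return reps

def twisted_counts(P, p, d):
    """Number of F_p-points on each curve g * y^d = P(x), g a coset rep."""
    return [
        sum((g * pow(y, d, p) - P(x)) % p == 0
            for x in range(p) for y in range(p))
        for g in coset_representatives(p, d)
    ]

p, d = 13, 3
counts = twisted_counts(lambda x: x ** 3 + x + 1, p, d)
print(coset_representatives(p, d), counts, sum(counts))
```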

— 2. The proof of the Lang-Weil bound —

Now we prove the Lang-Weil bound. We allow all implied constants to depend on the complexity bound {M}, and will ignore the issues of how to control the complexity of the various algebraic objects that arise in our analysis (as before, one can obtain such bounds more or less “for free” from an ultraproduct argument, if desired).

The proof proceeds by induction on the dimension {\hbox{dim}(V)} of the variety {V}. The first non-trivial case occurs at dimension one. If {V} is a plane curve (so {\hbox{dim}(V)=1}, and the ambient dimension is {2}), then the claim follows directly from the Hasse-Weil bound. If {V} is a higher-dimensional curve, we can use resultants to eliminate all but two of the variables (or, alternatively, one can apply the primitive element theorem to the function field of the curve), and convert the curve to a birationally equivalent (over {F}) plane curve, to reduce to the plane curve case (noting that the singular points of the birational transformation only influence the number of {F}-points by {O(1)} at most).

Now suppose inductively that {\hbox{dim}(V) \geq 2}, and that the claim has already been proven for irreducible varieties of dimension {\hbox{dim}(V)-1}. The induction step is then based on the following two parallel facts about hyperplane slicing, the first in the algebraic geometric category, and the second in the combinatorial category:

Lemma 6 (Bertini’s theorem) Let {V} be an irreducible variety in {\overline{F}^d} of dimension at least two. Then, for a generic affine hyperplane {H} (over {\overline{F}}), the slice {V \cap H} is an irreducible variety of dimension {\hbox{dim}(V)-1}.

Lemma 7 (Random sampling) Let {E} be a subset of {F^d} for some {d=O(1)}. Then, for an affine hyperplane {H} (defined over {F}) chosen uniformly at random, the random variable {|E \cap H|} has mean {|E|/|F|} and variance {O(|E|/|F|)}. In particular, by Chebyshev’s inequality, the median value of {|E \cap H|} is {|E|/|F| + O( (|E|/|F|)^{1/2} )}.

Let us see how the two lemmas conclude the induction. There are {|F| \frac{|F|^d-1}{|F|-1} = (1 + O(|F|^{-1})) |F|^d} different affine hyperplanes {H} over {F}. The set of hyperplanes {H} over {\overline{F}} for which {V \cap H} fails to be an irreducible variety of dimension {\hbox{dim}(V)-1} is, by Bertini’s theorem, a proper algebraic variety in the Grassmannian space, which one can show to have bounded complexity; and so by Lemma 1, only a proportion {O(1/|F|)} of all such hyperplanes over {F} fall into this variety. For the remaining proportion {1-O(1/|F|)} of such hyperplanes, we may apply the induction hypothesis and conclude that {|V(F) \cap H| = (1 + O(|F|^{-1/2})) |F|^{\hbox{dim}(V)-1}}. In particular, if {H} is an affine hyperplane chosen uniformly at random, then the median value of {|V(F) \cap H|} is {(1 + O(|F|^{-1/2})) |F|^{\hbox{dim}(V)-1}}. But this is consistent with Lemma 7 only when {|V(F)| = (1 + O(|F|^{-1/2})) |F|^{\hbox{dim}(V)}}, closing the induction.
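The slicing step can be watched in action on a toy example. The following sketch (a hypothetical test case, not part of the argument) takes the surface {V = \{ (x,y,z): z = xy \}} over {F_7}, which has exactly {|F|^2} points, enumerates every affine hyperplane over {F}, and records the slice sizes {|V(F) \cap H|}.

```python
from itertools import product
from statistics import median


def affine_hyperplanes(p, d):
    """Yield (normal, offset) for each affine hyperplane {x : normal.x = offset} in F_p^d.

    Normals are normalised so that their first nonzero coordinate is 1,
    so every hyperplane is listed exactly once.
    """
    for normal in product(range(p), repeat=d):
        first = next((c for c in normal if c != 0), None)
        if first == 1:
            for offset in range(p):
                yield normal, offset


p = 7                                                        # hypothetical small prime
V = [(x, y, x * y % p) for x in range(p) for y in range(p)]  # surface z = x*y
slice_sizes = [
    sum(1 for v in V if sum(a * c for a, c in zip(normal, v)) % p == offset)
    for normal, offset in affine_hyperplanes(p, 3)
]
print(len(slice_sizes), sorted(set(slice_sizes)), median(slice_sizes))
```

Here a slice by {ax+by+cz=e} with {c \neq 0} is the conic {(cx+b)(cy+a) = ce+ab}, which has {|F|-1} points unless {ce+ab=0}, when it degenerates into a pair of lines with {2|F|-1} points; slices with {c=0} are lines with {|F|} points. The reducible slices make up about a {1/|F|} proportion of all hyperplanes, and the median slice size is {|F|-1 = |F| + O(1)}, in line with the induction step.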

Remark 2 One can also proceed without the variance bound in Lemma 7 by using Lemma 1 as a crude estimate for {|V(F) \cap H|} for the exceptional {H}, after first making an easy reduction to the case where {V(F)} is not already contained inside a single hyperplane. However, I think the variance bound is illuminating, as it illustrates the concentration of measure that occurs when {|E|} becomes much larger than {|F|}, which only occurs in dimensions two and higher. In contrast, if one attempts to slice a curve by a hyperplane, one typically gets zero or multiple points, so there is no concentration of measure and irreducibility usually fails. So the slicing method can reduce the dimension of {V} down to one, but not to zero, so the need to invoke Hasse-Weil cannot be avoided by this method.

We first show Lemma 7, which is a routine first and second moment calculation. Any given point {x} in {F^d} lies in precisely {1/|F|} of the hyperplanes {H}, so summing in {x} we obtain {\mathop{\bf E} |E \cap H| = |E|/|F|}. Next, any two distinct points {x,y} in {F^d} lie in {\frac{1}{|F|} \frac{|F|^{d-1}-1}{|F|^d-1} = \frac{1}{|F|^2} + O( |F|^{-d-1} )} of the hyperplanes {H}, and so

\displaystyle  \mathop{\bf E} |E \cap H|^2 = |E|/|F| + |E| (|E|-1) (\frac{1}{|F|^2} + O( |F|^{-d-1} ))

\displaystyle  = (\mathop{\bf E} |E \cap H|)^2 + O( |E|/|F| )

giving the variance bound.
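The first and second moment identities above are exact, so they can be confirmed by exhaustive enumeration. The sketch below does this with rational arithmetic for a pseudorandomly chosen set {E} of {40} points in {F_5^3}; the prime, the dimension, the size of {E} and the seed are all arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product
import random


def affine_hyperplanes(p, d):
    """Yield (normal, offset) for each affine hyperplane in F_p^d, each exactly once."""
    for normal in product(range(p), repeat=d):
        first = next((c for c in normal if c != 0), None)
        if first == 1:                       # normalised: first nonzero entry is 1
            for offset in range(p):
                yield normal, offset


def slice_moments(E, p, d):
    """Exact first and second moments of |E ∩ H| over a uniformly random hyperplane H."""
    sizes = [
        sum(1 for x in E if sum(a * c for a, c in zip(normal, x)) % p == offset)
        for normal, offset in affine_hyperplanes(p, d)
    ]
    n = len(sizes)
    return Fraction(sum(sizes), n), Fraction(sum(s * s for s in sizes), n)


p, d = 5, 3                                   # hypothetical small parameters
rng = random.Random(0)
E = rng.sample(list(product(range(p), repeat=d)), 40)
mean, second = slice_moments(E, p, d)

# Proportion of hyperplanes containing two given distinct points.
pair_prob = Fraction(p**(d - 1) - 1, p * (p**d - 1))
predicted_second = Fraction(len(E), p) + len(E) * (len(E) - 1) * pair_prob
print(mean, second, predicted_second, second - mean**2)
```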

Now we show Bertini’s theorem. A dimension count shows that the space of pairs {(p,H)} where {H} is a hyperplane and {p} is a singular point of {V} in {H} has dimension at most {\hbox{dim}(V)-1+d-1}, and so for a generic hyperplane, the space of singular points of {V} in {H} is contained in a variety of dimension at most {\hbox{dim}(V)-2}. Similarly, the space of pairs {(p,H)} where {p} is a smooth point of {V} and {H} is a hyperplane containing the tangent space to {V} at {p} has dimension at most {\hbox{dim}(V)+d-\hbox{dim}(V)-1}, and so the space of smooth points of {V} in {H} whose tangent space is not transverse to a generic {H} also has dimension at most {\hbox{dim}(V)-2}. Thus, for generic {H}, the slice {V \cap H} is a {\hbox{dim}(V)-1}-dimensional variety, which generically comes from smooth points of {V} whose tangent space is transverse to {H}.

The remaining task is to establish irreducibility for generic {H}. As irreducibility is an algebraic condition, {V \cap H} is either generically irreducible or generically reducible. Suppose for contradiction that {V \cap H} is generically reducible. Consider the set {S} of triples {(p,q,H)} where {p,q} are distinct smooth points of {V}, and {H} is a hyperplane through {p,q} that is transverse to the tangent spaces of {V} at both {p} and {q}, with {H \cap V} reducible. This is a quasiprojective variety of dimension {d + (\hbox{dim}(V)-1) + (\hbox{dim}(V)-1)}. It can be decomposed into two top-dimensional components: one where {p,q} lie in the same component of {V \cap H}, and one where they lie in distinct components. (The hypothesis {\hbox{dim}(V) \geq 2} is crucial to ensure that the first component is non-empty.) As such, {S} is disconnected in the Zariski topology. On the other hand, for fixed {p,q}, the set of all {H} contributing to {S} is connected, while the set of all distinct smooth pairs {p,q} is also connected, and so {S} (and hence its closure) is also connected, a contradiction.

Remark 3 See also page 174 of Griffiths-Harris for a slight variant of this argument (thanks to Jordan Ellenberg for pointing this out). In that reference, it is also noted that the claim can also be derived from the Lefschetz hyperplane theorem in the case that {V} is smooth.

— 3. Lang-Weil with parameters —

Now we prove Theorem 5. Again we allow all implied constants to depend on the complexity parameter {M}, and gloss over the task of making sure that all algebraic objects have complexity {O_M(1)}.

We will proceed by the moment method. Observe that {c(w)} is a bounded natural number random variable, so to show that {c(w)} has the asymptotic distribution of {X} up to errors of {O(|F|^{-1/2})}, it suffices to show that

\displaystyle  \mathop{\bf E}_{w \in W(F) \backslash W'(F)} c(w)^k = \mathop{\bf E} X^k + O(|F|^{-1/2})

for any fixed natural number {k=O(1)}, or equivalently (by the Lang-Weil bound for {W(F)} and for the fibres {\phi^{-1}(\{w\})})

\displaystyle  \sum_{w \in W(F) \backslash W'(F)} |\phi^{-1}(\{w\})(F)|^k = (\mathop{\bf E} X^k + O(|F|^{-1/2})) \ \ \ \ \ (7)

\displaystyle  \times |F|^{\hbox{dim}(W) + k (\hbox{dim}(V)-\hbox{dim}(W))}.
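The reduction to moments rests on the fact that a random variable taking values in {\{0,\ldots,K\}} is determined by its moments {\mathop{\bf E} X^0, \ldots, \mathop{\bf E} X^K}, since the corresponding Vandermonde system is invertible. Here is a small sketch of that inversion in exact arithmetic; the test distribution (the number of fixed points of a uniformly random permutation in {S_3}) is a hypothetical stand-in for {X}.

```python
from fractions import Fraction
from itertools import permutations


def solve(A, b):
    """Gauss-Jordan elimination over the rationals; assumes A is square and invertible."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [entry / M[col][col] for entry in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[n] for row in M]


def distribution_from_moments(moments):
    """Recover P(X = j), j = 0..K, from E X^0, ..., E X^K for X valued in {0, ..., K}."""
    K = len(moments) - 1
    vandermonde = [[Fraction(j) ** k for j in range(K + 1)] for k in range(K + 1)]
    return solve(vandermonde, moments)


# Hypothetical example: fixed points of a uniform random element of S_3.
fixed = [sum(1 for i, s in enumerate(perm) if i == s) for perm in permutations(range(3))]
moments = [Fraction(sum(f**k for f in fixed), len(fixed)) for k in range(4)]
recovered = distribution_from_moments(moments)
print(moments, recovered)
```

Since {K} is bounded, an error of {O(|F|^{-1/2})} in each moment translates into an error of the same order in each probability, which is why it is enough to control finitely many moments.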

Fix {k}. We form the {k}-fold fibre product {V_k} of {V} over {W \backslash W'}, consisting of all {k}-tuples {(v_1,\ldots,v_k) \in V^k} such that {\phi(v_1)=\ldots=\phi(v_k)} lies in {W \backslash W'}. This fibre product is a quasiprojective variety of dimension {\hbox{dim}(W) + k (\hbox{dim}(V)-\hbox{dim}(W))}, is defined over {F}, and the left-hand side of (7) is precisely equal to the number of {F}-points of {V_k}. Thus, by the Lang-Weil bound for {V_k}, it suffices to show that

\displaystyle  c( V_k ) = \mathop{\bf E} X^k,

i.e. that {V_k} has precisely {\mathop{\bf E} X^k} top-dimensional components which are Frobenius-invariant.
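The identification of the left-hand side of (7) with a point count on the fibre product can be checked on a very simple example. In the sketch below (all choices hypothetical), {V} and {W} are both the affine line over {F_p} with {p} odd, {\phi(y) = y^2}, and {W' = \{0\}} is the branch point, so that the fibres over {W \backslash W'} are zero-dimensional and {c(w)} is just the number of {F}-points in the fibre.

```python
from itertools import product


def fibre_sizes(p):
    """Number of y in F_p with y^2 = w, for each w != 0."""
    sizes = {w: 0 for w in range(1, p)}
    for y in range(1, p):
        sizes[y * y % p] += 1
    return sizes


def fibre_product_points(p, k):
    """Number of k-tuples of nonzero y_i in F_p with y_1^2 = ... = y_k^2."""
    return sum(1 for ys in product(range(1, p), repeat=k)
               if len({y * y % p for y in ys}) == 1)


p = 11                                            # hypothetical odd prime
for k in (1, 2, 3):
    moment_sum = sum(s**k for s in fibre_sizes(p).values())
    print(k, moment_sum, fibre_product_points(p, k), (p - 1) * 2**(k - 1))
```

Each nonzero square has exactly two square roots and each non-square has none, so both counts equal {(|F|-1) 2^{k-1}}. This is what one would expect from (7) if {X} takes the values {0} and {2} with equal probability, since then {\mathop{\bf E} X^k = 2^{k-1}}.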

We will in fact prove a more general statement: if {\phi: U \rightarrow W} is any dominant map which has smooth unramified fibres on {W \backslash W'}, then the number {c(U)} of top-dimensional components of {U} that are Frobenius-invariant is equal to the expected number of top-dimensional components of a generic fibre {\phi^{-1}(\{w_0\})} that are fixed by {g \circ Frob}, where {g} is selected uniformly at random from {\pi_1(W \backslash W')}. Specialising this to the fibre product {V_k} (noting that the generic fibre of {V_k} is just the {k}-fold Cartesian power of the generic fibre of {V}, and similarly for the action of Frobenius and the fundamental group), we obtain the claim.

Now we prove this more general statement. By decomposing into irreducible components, we may certainly assume that {U} is irreducible. If {U} is not Frobenius-invariant, then {Frob(U)} is generically disjoint from {U} and so there certainly can be no components of {\phi^{-1}(\{w_0\})} that are fixed by {g \circ Frob}. Thus we may assume that {U} is Frobenius-invariant (i.e. defined over {F}). Then the situation becomes very close to that of Burnside’s lemma. If there are {K} components {S_1,\ldots,S_K} of {\phi^{-1}(\{w_0\})}, the connectedness of {U} implies that the fundamental group {\pi_1(W \backslash W')} acts transitively on these components, and so for any single component {S_i} and uniformly chosen {g \in \pi_1(W \backslash W')}, {g S_i} will be uniformly distributed amongst the {S_1,\ldots,S_K}, as will {g(Frob(S_i))}. In particular, each {S_i} has a {1/K} chance of being a fixed point of {g \circ Frob}, so the expected number of fixed points of {g \circ Frob} is {1} as required.
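The Burnside-type computation at the end of this argument is purely combinatorial and easy to test. In the sketch below, a finite permutation group stands in for the image of {\pi_1(W \backslash W')} acting on the components {S_1,\ldots,S_K}, and an arbitrary permutation `frob` stands in for the action of Frobenius; the specific groups are hypothetical examples.

```python
from fractions import Fraction
from itertools import permutations


def generate_group(generators, n):
    """Closure of the given permutations of range(n) under composition."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = tuple(s[g[i]] for i in range(n))      # h = s o g
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def is_transitive(group, n):
    """True if the orbit of 0 under the group is all of range(n)."""
    return {g[0] for g in group} == set(range(n))


def expected_fixed_points(group, frob, n):
    """Average over g in the group of the number of i with g(frob(i)) = i."""
    total = sum(1 for g in group for i in range(n) if g[frob[i]] == i)
    return Fraction(total, len(group))


C4 = generate_group([(1, 2, 3, 0)], 4)                # cyclic group, transitive on 4 points
swap = generate_group([(1, 0, 2, 3)], 4)              # not transitive: orbits {0,1}, {2}, {3}
print(is_transitive(C4, 4), expected_fixed_points(C4, (1, 0, 3, 2), 4))
print(is_transitive(swap, 4), expected_fixed_points(swap, (0, 1, 2, 3), 4))
```

For the transitive group the average is {1} whatever permutation is used for `frob`, as in the argument above. For the non-transitive group with `frob` equal to the identity, one instead recovers the number of orbits, which is the usual form of Burnside’s lemma.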

Exercise 1 Obtain the following extension of Theorem 5: if one has {k = O(1)} different dominant maps {\phi_i: V_i \rightarrow W} of varieties {V_i,W} of complexity {O(1)}, with unramified, non-singular fibres on {W \backslash W'}, then the joint distribution of {c(\phi_1^{-1}(\{w\})), \ldots, c(\phi_k^{-1}(\{w\}))} converges in distribution to the joint distribution of {X_1,\ldots,X_k}, where each {X_i} is the number of fixed points of {g \circ Frob} on the components of a generic fibre of {\phi_i}, where {g} is independent of {i} and drawn uniformly from {\pi_1(W \backslash W')}.

Remark 4 While the moment computation is simple and cute, I would be interested in seeing either a heuristic or rigorous explanation of Theorem 5 (or Exercise 1) that did not proceed through moments. The theorem seems to be asserting a statement roughly of the following form (ignoring for now the fact that paths do not actually make much sense in positive characteristic): if one takes a random path {\gamma} in {W} connecting two random {F}-points, then the loop {Frob(\gamma) - \gamma} behaves as if it is distributed uniformly in the fundamental group {\pi_1(W)} (or {\pi_1(W \backslash W')}, for any lower-dimensional set {W'}). But I do not know how to make this heuristic precise.