[This blog post was written jointly by Terry Tao and Will Sawin.]
In the previous blog post, one of us (Terry) implicitly introduced a notion of rank for tensors which is a little different from the usual notion of tensor rank, and which (following BCCGNSU) we will call “slice rank”. This notion of rank could then be used to encode the Croot-Lev-Pach-Ellenberg-Gijswijt argument that uses the polynomial method to control capsets.
Afterwards, several papers have applied the slice rank method to further problems – to control tri-colored sum-free sets in abelian groups (BCCGNSU, KSS) and from there to the triangle removal lemma in vector spaces over finite fields (FL), to control sunflowers (NS), and to bound progression-free sets in $p$-groups (P).
In this post we investigate the notion of slice rank more systematically. In particular, we show how to give lower bounds for the slice rank. In many cases, we can show that the upper bounds on slice rank given in the aforementioned papers are sharp to within a subexponential factor. This still leaves open the possibility of getting a better bound for the original combinatorial problem using the slice rank of some other tensor, but for very long arithmetic progressions (at least eight terms), we show that the slice rank method cannot improve over the trivial bound using any tensor.
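It will be convenient to work in a “basis independent” formalism, namely working in the category of abstract finite-dimensional vector spaces over a fixed field $\mathbf{F}$. (In the applications to the capset problem one takes $\mathbf{F} = \mathbf{F}_3$ to be the finite field of three elements, but most of the discussion here applies to arbitrary fields.) Given $k$ such vector spaces $V_1,\dots,V_k$, we can form the tensor product $\bigotimes_{i=1}^k V_i$, generated by the tensor products $v_1 \otimes \dots \otimes v_k$ with $v_i \in V_i$ for $i=1,\dots,k$, subject to the constraint that the tensor product operation $(v_1,\dots,v_k) \mapsto v_1 \otimes \dots \otimes v_k$ is multilinear. For each $1 \leq j \leq k$, we have the smaller tensor products $\bigotimes_{1 \leq i \leq k: i \neq j} V_i$, as well as the $j^{\mathrm{th}}$ tensor product
$$ \otimes_j: V_j \times \bigotimes_{1 \leq i \leq k: i \neq j} V_i \to \bigotimes_{i=1}^k V_i $$
defined in the obvious fashion. Elements of $\bigotimes_{i=1}^k V_i$ of the form $v_j \otimes_j v_{\hat j}$ for some $v_j \in V_j$ and $v_{\hat j} \in \bigotimes_{1 \leq i \leq k: i \neq j} V_i$ will be called rank one functions, and the slice rank (or rank for short) $\operatorname{rank}(v)$ of an element $v$ of $\bigotimes_{i=1}^k V_i$ is defined to be the least nonnegative integer $r$ such that $v$ is a linear combination of $r$ rank one functions. If $V_1,\dots,V_k$ are finite-dimensional, then the rank is always well defined as a non-negative integer (in fact it cannot exceed $\min(\dim(V_1),\dots,\dim(V_k))$). It is also clearly subadditive:
$$ \operatorname{rank}(v+w) \leq \operatorname{rank}(v) + \operatorname{rank}(w). \qquad (1) $$
For $k=1$, $\operatorname{rank}(v)$ is $0$ when $v$ is zero, and $1$ otherwise. For $k=2$, $\operatorname{rank}(v)$ is the usual rank of the $2$-tensor $v \in V_1 \otimes V_2$ (which can for instance be identified with a linear map from $V_1$ to the dual space $V_2^*$). The usual notion of tensor rank for higher order tensors uses complete tensor products $v_1 \otimes \dots \otimes v_k$, $v_i \in V_i$, as the rank one objects, rather than $v_j \otimes_j v_{\hat j}$, giving a rank that is greater than or equal to the slice rank studied here.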
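As a quick sanity check of the definition, here is a minimal Python sketch over a prime field $\mathbf{F}_p$. It rests on one observation: expanding $v$ along a basis of the row space of its $j$-th flattening writes $v$ as that many rank one functions of type $j$, so the matrix rank of each flattening is an upper bound for the slice rank (and for $k=2$ it is the usual matrix rank). The dictionary representation of tensors and the helper names `rank_mod_p`, `flattening` and `slice_rank_upper_bound` are assumptions made for this illustration only.

```python
from itertools import product

def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field F_p, by Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, needs Python 3.8+
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def flattening(tensor, dims, j):
    """Matrix with rows indexed by the j-th coordinate and columns by all the others."""
    others = [range(d) for i, d in enumerate(dims) if i != j]
    rows = []
    for s in range(dims[j]):
        row = []
        for rest in product(*others):
            idx = rest[:j] + (s,) + rest[j:]
            row.append(tensor.get(idx, 0))
        rows.append(row)
    return rows

def slice_rank_upper_bound(tensor, dims, p):
    """min over j of the rank of the j-th flattening; an upper bound only."""
    return min(rank_mod_p(flattening(tensor, dims, j), p) for j in range(len(dims)))

# A rank one function f(x) g(y, z) over F_3, and the diagonal tensor on {0,1,2}^3.
f = [1, 2, 0]
g = [[1, 0, 2], [0, 1, 1], [2, 2, 0]]
rank_one = {(x, y, z): f[x] * g[y][z] % 3 for x in range(3) for y in range(3) for z in range(3)}
diagonal = {(i, i, i): 1 for i in range(3)}
```

The bound is not sharp in general; it is only meant to make the objects in the definition concrete.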
From basic linear algebra we have the following equivalences:
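Lemma 1 Let $V_1,\dots,V_k$ be finite-dimensional vector spaces over a field $\mathbf{F}$, let $v$ be an element of $V_1 \otimes \dots \otimes V_k$, and let $r$ be a non-negative integer. Then the following are equivalent:
- (i) One has $\operatorname{rank}(v) \leq r$.
- (ii) One has a representation of the form
$$ v = \sum_{j=1}^k \sum_{s \in S_j} v_{j,s} \otimes_j v_{\hat j,s} $$
where $S_1,\dots,S_k$ are finite sets of total cardinality $|S_1|+\dots+|S_k|$ at most $r$, and for each $1 \leq j \leq k$ and $s \in S_j$, $v_{j,s} \in V_j$ and $v_{\hat j,s} \in \bigotimes_{1 \leq i \leq k: i \neq j} V_i$.
- (iii) One has
$$ v \in \sum_{j=1}^k U_j \otimes_j \bigotimes_{1 \leq i \leq k: i \neq j} V_i $$
where for each $j=1,\dots,k$, $U_j$ is a subspace of $V_j$ of total dimension $\dim(U_1)+\dots+\dim(U_k)$ at most $r$, and we view $U_j \otimes_j \bigotimes_{1 \leq i \leq k: i \neq j} V_i$ as a subspace of $\bigotimes_{i=1}^k V_i$ in the obvious fashion.
- (iv) (Dual formulation) There exist subspaces $W_j$ of the dual space $V_j^*$ for $j=1,\dots,k$, of total dimension at least $\dim(V_1)+\dots+\dim(V_k) - r$, such that $v$ is orthogonal to $\bigotimes_{j=1}^k W_j$, in the sense that one has the vanishing
$$ \Big\langle \bigotimes_{j=1}^k w_j, v \Big\rangle = 0 $$
for all $w_j \in W_j$, where $\langle \cdot, \cdot \rangle: \bigotimes_{j=1}^k V_j^* \times \bigotimes_{j=1}^k V_j \to \mathbf{F}$ is the obvious pairing.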
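Proof: The equivalence of (i) and (ii) is clear from definition. To get from (ii) to (iii) one simply takes $U_j$ to be the span of the $v_{j,s}$, and conversely to get from (iii) to (ii) one takes the $(v_{j,s})_{s \in S_j}$ to be a basis of the $U_j$ and computes $v_{\hat j,s}$ by using a basis for the tensor product $U_j \otimes_j \bigotimes_{1 \leq i \leq k: i \neq j} V_i$ consisting entirely of functions of the form $v_{j,s} \otimes_j e$ for various $e$. To pass from (iii) to (iv) one takes $W_j$ to be the annihilator $\{ w_j \in V_j^*: \langle w_j, v_j \rangle = 0 \ \forall v_j \in U_j \}$ of $U_j$, and conversely to pass from (iv) to (iii) one takes $U_j$ to be the annihilator of $W_j$.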
One corollary of the formulation (iv) is that the set of tensors of slice rank at most $r$ is Zariski closed (if the field $\mathbf{F}$ is algebraically closed), and so the slice rank itself is a lower semi-continuous function. This is in contrast to the usual tensor rank, which is not necessarily semicontinuous.
Corollary 2 Let
be finite-dimensional vector spaces over an algebraically closed field
. Let
be a nonnegative integer. The set of elements of
of slice rank at most
is closed in the Zariski topology.
Proof: In view of Lemma 1(i and iv), this set is the union over tuples of integers with
of the projection from
of the set of tuples
with
orthogonal to
, where
is the Grassmannian parameterizing
-dimensional subspaces of
.
One can check directly that the set of tuples with
orthogonal to
is Zariski closed in
using a set of equations of the form
locally on
. Hence, because the Grassmannian is a complete variety, the projection of this set to
is also Zariski closed. So the finite union over tuples
of these projections is also Zariski closed.
We also have good behaviour with respect to linear transformations:
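Lemma 3 Let $V_1,\dots,V_k, W_1,\dots,W_k$ be finite-dimensional vector spaces over a field $\mathbf{F}$, let $v$ be an element of $V_1 \otimes \dots \otimes V_k$, and for each $1 \leq j \leq k$, let $\phi_j: V_j \to W_j$ be a linear transformation, with $\bigotimes_{j=1}^k \phi_j: \bigotimes_{j=1}^k V_j \to \bigotimes_{j=1}^k W_j$ the tensor product of these maps. Then
$$ \operatorname{rank}\Big( \Big(\bigotimes_{j=1}^k \phi_j\Big)(v) \Big) \leq \operatorname{rank}(v). \qquad (2) $$
Furthermore, if the $\phi_j$ are all injective, then one has equality in (2).
Thus, for instance, the rank of a tensor $v$ is intrinsic in the sense that it is unaffected by any enlargements of the spaces $V_1,\dots,V_k$.
Proof: The bound (2) is clear from the formulation (ii) of rank in Lemma 1. For equality, apply (2) to the injective $\phi_j$, as well as to some arbitrarily chosen left inverses $\phi_j^{-1}: W_j \to V_j$ of the $\phi_j$.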
Computing the rank of a tensor is difficult in general; however, the problem becomes a combinatorial one if one has a suitably sparse representation of that tensor in some basis, where we will measure sparsity by the property of being an antichain.
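Proposition 4 Let $V_1,\dots,V_k$ be finite-dimensional vector spaces over a field $\mathbf{F}$. For each $1 \leq j \leq k$, let $(v_{j,s})_{s \in S_j}$ be a linearly independent set in $V_j$ indexed by some finite set $S_j$. Let $\Gamma$ be a subset of $S_1 \times \dots \times S_k$. Let $v \in \bigotimes_{j=1}^k V_j$ be a tensor of the form
$$ v = \sum_{(s_1,\dots,s_k) \in \Gamma} c_{s_1,\dots,s_k} v_{1,s_1} \otimes \dots \otimes v_{k,s_k} \qquad (3) $$
where for each $(s_1,\dots,s_k) \in \Gamma$, $c_{s_1,\dots,s_k}$ is a coefficient in $\mathbf{F}$. Then one has
$$ \operatorname{rank}(v) \leq \min_{\Gamma = \Gamma_1 \cup \dots \cup \Gamma_k} |\pi_1(\Gamma_1)| + \dots + |\pi_k(\Gamma_k)| \qquad (4) $$
where the minimum ranges over all coverings of $\Gamma$ by sets $\Gamma_1,\dots,\Gamma_k$, and $\pi_j: S_1 \times \dots \times S_k \to S_j$ for $j=1,\dots,k$ are the projection maps.
Now suppose that the coefficients $c_{s_1,\dots,s_k}$ are all non-zero, that each of the $S_j$ are equipped with a total ordering $\leq_j$, and $\Gamma'$ is the set of maximal elements of $\Gamma$, thus there do not exist distinct $(s_1,\dots,s_k) \in \Gamma'$, $(t_1,\dots,t_k) \in \Gamma$ such that $s_j \leq_j t_j$ for all $1 \leq j \leq k$. Then one has
$$ \operatorname{rank}(v) \geq \min_{\Gamma' = \Gamma'_1 \cup \dots \cup \Gamma'_k} |\pi_1(\Gamma'_1)| + \dots + |\pi_k(\Gamma'_k)|. \qquad (5) $$
In particular, if $\Gamma$ is an antichain (i.e. every element is maximal), then equality holds in (4).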
Proof: By Lemma 3 (or by enlarging the bases ), we may assume without loss of generality that each of the
is spanned by the
. By relabeling, we can also assume that each
is of the form
with the usual ordering, and by Lemma 3 we may take each to be
, with
the standard basis.
Let denote the rank of
. To show (4), it suffices to show the inequality
for any covering of by
. By removing repeated elements we may assume that the
are disjoint. For each
, the tensor
can (after collecting terms) be written as
for some . Summing and using (1), we conclude the inequality (6).
Now assume that the are all non-zero and that
is the set of maximal elements of
. To conclude the proposition, it suffices to show that the reverse inequality
holds for some covering
. By Lemma 1(iv), there exist subspaces
of
whose dimension
sums to
Let . Using Gaussian elimination, one can find a basis
of
whose representation in the standard dual basis
of
is in row-echelon form. That is to say, there exist natural numbers
such that for all ,
is a linear combination of the dual vectors
, with the
coefficient equal to one.
We now claim that is disjoint from
. Suppose for contradiction that this were not the case, thus there exists
for each
such that
As is the set of maximal elements of
, this implies that
for any tuple other than
. On the other hand, we know that
is a linear combination of
, with the
coefficient one. We conclude that the tensor product
is equal to
plus a linear combination of other tensor products with
not in
. Taking inner products with (3), we conclude that
, contradicting the fact that
is orthogonal to
. Thus we have
disjoint from
.
For each , let
denote the set of tuples
in
with
not of the form
. From the previous discussion we see that the
cover
, and we clearly have
, and hence from (8) we have (7) as claimed.
As an instance of this proposition, we recover the computation of diagonal rank from the previous blog post:
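Example 5 Let $V_1,\dots,V_k$ be finite-dimensional vector spaces over a field $\mathbf{F}$ for some $k \geq 2$. Let $d$ be a natural number, and for $1 \leq j \leq k$, let $e_{j,1},\dots,e_{j,d}$ be a linearly independent set in $V_j$. Let $c_1,\dots,c_d$ be non-zero coefficients in $\mathbf{F}$. Then
$$ \sum_{t=1}^d c_t e_{1,t} \otimes \dots \otimes e_{k,t} $$
has rank $d$. Indeed, one applies the proposition with $S_1,\dots,S_k$ all equal to $\{1,\dots,d\}$, with $\Gamma$ the diagonal in $S_1 \times \dots \times S_k$; this is an antichain if we give one of the $S_i$ the standard ordering, and another of the $S_i$ the opposite ordering (and ordering the remaining $S_i$ arbitrarily). In this case, the $\pi_j$ are all bijective, and so it is clear that the minimum in (4) is simply $d$.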
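For small supports the combinatorial minimum in (4) can simply be brute-forced. The sketch below is only an illustration (the function name `min_cover_cost` is made up here): it assigns each tuple of the support to exactly one of the $k$ classes and minimises the sum of the sizes of the projections. Restricting to partitions is harmless, since removing repeated elements from a covering cannot increase the cost.

```python
from itertools import product

def min_cover_cost(gamma, k):
    """Brute-force the minimum in (4): put each tuple of gamma into one of k classes
    and minimise the total number of distinct j-th coordinates seen in class j."""
    gamma = list(gamma)
    best = None
    for labels in product(range(k), repeat=len(gamma)):
        projections = [set() for _ in range(k)]
        for tup, j in zip(gamma, labels):
            projections[j].add(tup[j])
        cost = sum(len(s) for s in projections)
        if best is None or cost < best:
            best = cost
    return best

# The diagonal in {0,...,3}^3 from the example above, and the support of a 2x2 matrix.
diagonal = [(t, t, t) for t in range(4)]
corner = [(0, 0), (0, 1), (1, 0)]
```

The search is exponential in the size of the support, so this is only usable for toy cases such as the diagonal, where it returns the number of diagonal entries.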
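The combinatorial minimisation problem in the above proposition can be solved asymptotically when working with tensor powers, using the notion of the Shannon entropy $h(X) := \sum_x \mathbf{P}(X=x) \log \frac{1}{\mathbf{P}(X=x)}$ of a discrete random variable $X$.
Proposition 6 Let $V_1,\dots,V_k$ be finite-dimensional vector spaces over a field $\mathbf{F}$. For each $1 \leq j \leq k$, let $(v_{j,s})_{s \in S_j}$ be a linearly independent set in $V_j$ indexed by some finite set $S_j$. Let $\Gamma$ be a non-empty subset of $S_1 \times \dots \times S_k$.
Let $v \in \bigotimes_{j=1}^k V_j$ be a tensor of the form (3) for some coefficients $c_{s_1,\dots,s_k}$. For each natural number $n$, let $v^{\otimes n}$ be the tensor power of $n$ copies of $v$, viewed as an element of $\bigotimes_{j=1}^k V_j^{\otimes n}$. Then
$$ \operatorname{rank}(v^{\otimes n}) \leq \exp((H + o(1)) n) \qquad (9) $$
as $n \to \infty$, where $H$ is the quantity
$$ H := \sup_{(X_1,\dots,X_k)} \min( h(X_1), \dots, h(X_k) ) \qquad (10) $$
and $(X_1,\dots,X_k)$ range over the random variables taking values in $\Gamma$.
Now suppose that the coefficients $c_{s_1,\dots,s_k}$ are all non-zero and that each of the $S_j$ are equipped with a total ordering $\leq_j$. Let $\Gamma'$ be the set of maximal elements of $\Gamma$ in the product ordering, and let
$$ H' := \sup_{(X_1,\dots,X_k)} \min( h(X_1), \dots, h(X_k) ) $$
where $(X_1,\dots,X_k)$ range over random variables taking values in $\Gamma'$. Then
$$ \operatorname{rank}(v^{\otimes n}) \geq \exp((H' + o(1)) n) \qquad (11) $$
as $n \to \infty$. In particular, if the maximizer in (10) is supported on the maximal elements of $\Gamma$ (which always holds if $\Gamma$ is an antichain in the product ordering), then equality holds in (9).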
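Proof: It suffices to show that
$$ \min_{\Gamma^n = \Gamma_{n,1} \cup \dots \cup \Gamma_{n,k}} |\pi_{n,1}(\Gamma_{n,1})| + \dots + |\pi_{n,k}(\Gamma_{n,k})| = \exp((H + o(1)) n) \qquad (12) $$
as $n \to \infty$, where $\pi_{n,j}: \prod_{i=1}^k S_i^n \to S_j^n$ is the projection map. Then the same thing will apply to $\Gamma'$ and $H'$. Then applying Proposition 4, using the lexicographical ordering on $S_j^n$ and noting that, if $\Gamma'$ are the maximal elements of $\Gamma$, then $\Gamma'^n$ are the maximal elements of $\Gamma^n$, we obtain both (9) and (11).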
We first prove the lower bound. By compactness (and the continuity properties of entropy), we can find a random variable taking values in
such that
Let be a small positive quantity that goes to zero sufficiently slowly with
. Let
denote the set of all tuples
in
that are within
of being distributed according to the law of
, in the sense that for all
, one has
By the asymptotic equipartition property, the cardinality of can be computed to be
if goes to zero slowly enough. Similarly one has
Now let be an arbitrary covering of
. By the pigeonhole principle, there exists
such that
which by (13) implies that
noting that the factor can be absorbed into the
error). This gives the lower bound in (12).
Now we prove the upper bound. We can cover by
sets of the form
for various choices of random variables
taking values in
. For each such random variable
, we can find
such that
; we then place all of
in
. It is then clear that the
cover
and that
for all , giving the required upper bound.
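To get a feel for the quantity being optimised in (10), the following Python sketch evaluates the minimum of the marginal entropies $h(X_1),\dots,h(X_k)$ for one explicitly given joint distribution. It does not perform the supremum; the dictionary encoding of the distribution and the names `entropy` and `min_marginal_entropy` are assumptions of this illustration.

```python
from collections import defaultdict
from math import log

def entropy(dist):
    """Shannon entropy (natural logarithm) of a distribution given as {value: probability}."""
    return -sum(p * log(p) for p in dist.values() if p > 0)

def min_marginal_entropy(joint, k):
    """min(h(X_1), ..., h(X_k)) for a joint distribution {(s_1,...,s_k): probability}."""
    marginals = [defaultdict(float) for _ in range(k)]
    for tup, p in joint.items():
        for j in range(k):
            marginals[j][tup[j]] += p
    return min(entropy(m) for m in marginals)

# Uniform distribution on the diagonal of {0,...,3}^3: every marginal is uniform.
uniform_diagonal = {(t, t, t): 0.25 for t in range(4)}
```

For the uniform distribution on a diagonal with $d$ entries every marginal is uniform, so the value is $\log d$, consistent with the diagonal tensor of Example 5 having tensor powers of rank $d^n$.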
It is of interest to compute the quantity in (10). We have the following criterion for when a maximiser occurs:
Proposition 7 Let
be finite sets, and
be non-empty. Let
be the quantity in (10). Let
be a random variable taking values in
, and let
denote the essential range of
, that is to say the set of tuples
such that
is non-zero. Then the following are equivalent:
- (i)
attains the maximum in (10).
- (ii) There exist weights
and a finite quantity
, such that
whenever
, and such that
for all
, with equality if
. (In particular,
must vanish if there exists a
with
.)
Furthermore, when (i) and (ii) hold, one has
Proof: We first show that (i) implies (ii). The function is concave on
. As a consequence, if we define
to be the set of tuples
such that there exists a random variable
taking values in
with
, then
is convex. On the other hand, by (10),
is disjoint from the orthant
. Thus, by the hyperplane separation theorem, we conclude that there exists a half-space
where are reals that are not all zero, and
is another real, which contains
on its boundary and
in its interior, such that
avoids the interior of the half-space. Since
is also on the boundary of
, we see that the
are non-negative, and that
whenever
.
By construction, the quantity
is maximised when . At this point we could use the method of Lagrange multipliers to obtain the required constraints, but because we have some boundary conditions on the
(namely, that the probability that they attain a given element of
has to be non-negative) we will work things out by hand. Let
be an element of
, and
an element of
. For
small enough, we can form a random variable
taking values in
, whose probability distribution is the same as that for
except that the probability of attaining
is increased by
, and the probability of attaining
is decreased by
. If there is any
for which
and
, then one can check that
for sufficiently small , contradicting the maximality of
; thus we have
whenever
. Taylor expansion then gives
for small , where
and similarly for . We conclude that
for all
and
, thus there exists a quantity
such that
for all
, and
for all
. By construction
must be nonnegative. Sampling
using the distribution of
, one has
almost surely; taking expectations we conclude that
The inner sum is , which equals
when
is non-zero, giving (17).
Now we show conversely that (ii) implies (i). As noted previously, the function is concave on
, with derivative
. This gives the inequality
for any (note the right-hand side may be infinite when
and
). Let
be any random variable taking values in
, then on applying the above inequality with
and
, multiplying by
, and summing over
and
gives
By construction, one has
and
so to prove that (which would give (i)), it suffices to show that
or equivalently that the quantity
is maximised when . Since
it suffices to show this claim for the quantity
One can view this quantity as
By (ii), this quantity is bounded by , with equality if
is equal to
(and is in particular ranging in
), giving the claim.
The second half of the proof of Proposition 7 only uses the marginal distributions and the equation (16), not the actual distribution of
, so it can also be used to prove an upper bound on
when the exact maximizing distribution is not known, given suitable probability distributions in each variable. The logarithm of the probability distribution here plays the role that the weight functions do in BCCGNSU.
Remark 8 Suppose one is in the situation of (i) and (ii) above; assume the nondegeneracy condition that
is positive (or equivalently that
is positive). We can assign a “degree”
to each element
by the formula
then every tuple
in
has total degree at most
, and those tuples in
have degree exactly
. In particular, every tuple in
has degree at most
, and hence by (17), each such tuple has a
-component of degree less than or equal to
for some
with
. On the other hand, we can compute from (19) and the fact that
for
that
. Thus, by asymptotic equipartition, and assuming
, the number of “monomials” in
of total degree at most
is at most
; one can in fact use (19) and (18) to show that this is in fact an equality. This gives a direct way to cover
by sets
with
, which is in the spirit of the Croot-Lev-Pach-Ellenberg-Gijswijt arguments from the previous post.
We can now show that the rank computation for the capset problem is sharp:
Proposition 9 Let
denote the space of functions from
to
. Then the function
from
to
, viewed as an element of
, has rank
as
, where
is given by the formula
Proof: In , we have
Thus, if we let be the space of functions from
to
(with domain variable denoted
respectively), and define the basis functions
of indexed by
(with the usual ordering), respectively, and set
to be the set
then is a linear combination of the
with
, and all coefficients non-zero. Then we have
. We will show that the quantity
of (10) agrees with the quantity
of (20), and that the optimizing distribution is supported on
, so that by Proposition 6 the rank of
is
.
To compute the quantity at (10), we use the criterion in Proposition 7. We take to be the random variable taking values in
that attains each of the values
with a probability of
, and each of
with a probability of
; then each of the
attains the values of
with probabilities
respectively, so in particular
is equal to the quantity
in (20). If we now set
and
we can verify the condition (16) with equality for all , which from (17) gives
as desired.
This statement already follows from the result of Kleinberg-Sawin-Speyer, which gives a “tri-colored sum-free set” in of size
, as the slice rank of this tensor is an upper bound for the size of a tri-colored sum-free set. If one were to go over the proofs more carefully to evaluate the subexponential factors, this argument would give a stronger lower bound than KSS, as it does not deal with the substantial loss that comes from Behrend’s construction. However, because it actually constructs a set, the KSS result rules out more possible approaches to give an exponential improvement of the upper bound for capsets. The lower bound on slice rank shows that the bound cannot be improved using only the slice rank of this particular tensor, whereas KSS shows that the bound cannot be improved using any method that does not take advantage of the “single-colored” nature of the problem.
We can also show that the slice rank upper bound in a result of Naslund-Sawin is similarly sharp:
Proposition 10 Let
denote the space of functions from
to
. Then the function
from
, viewed as an element of
, has slice rank
Proof: Let and
be a basis for the space
of functions on
, itself indexed by
. Choose similar bases for
and
, with
and
.
Set . Then
is a linear combination of the
with
, and all coefficients non-zero. Order
the usual way so that
is an antichain. We will show that the quantity
of (10) is
, so that applying the last statement of Proposition 6, we conclude that the rank of
is
,
Let be the random variable taking values in
that attains each of the values
with a probability of
. Then each of the
attains the value
with probability
and
with probability
, so
Setting and
, we can verify the condition (16) with equality for all
, which from (17) gives
as desired.
We used a slightly different method in each of the last two results. In the first one, we use the most natural bases for all three vector spaces, and distinguish from its set of maximal elements
. In the second one we modify one basis element slightly, with
instead of the more obvious choice
, which allows us to work with
instead of
. Because
is an antichain, we do not need to distinguish
and
. Both methods in fact work with either problem, and they are both about equally difficult, but we include both as either might turn out to be substantially more convenient in future work.
Proposition 11 Let $k \geq 8$ be a natural number and let $G$ be a finite abelian group. Let $\mathbf{F}$ be any field. Let $V_1 = \dots = V_k = \mathbf{F}^G$ denote the space of functions from $G$ to $\mathbf{F}$.
Let $F$ be any $\mathbf{F}$-valued function on $G^k$ that is nonzero only when the $k$ elements of $G$ form a $k$-term arithmetic progression, and is nonzero on every $k$-term constant progression.
Then the slice rank of $F$ is $|G|$.
Proof: We apply Proposition 4, using the standard bases of . Let
be the support of
. Suppose that we have
orderings on
such that the constant progressions are maximal elements of
and thus all constant progressions lie in
. Then for any partition
of
,
can contain at most
constant progressions, and as all
constant progressions must lie in one of the
, we must have
. By Proposition 4, this implies that the slice rank of
is at least
. Since
is a
tensor, the slice rank is at most
, hence exactly
.
So it is sufficient to find orderings on
such that the constant progressions are maximal elements of
. We make several simplifying reductions: We may as well assume that
consists of all the
-term arithmetic progressions, because if the constant progressions are maximal among the set of all progressions then they are maximal among its subset
. So we are looking for an ordering in which the constant progressions are maximal among all
-term arithmetic progressions. We may as well assume that
is cyclic, because if for each cyclic group we have an ordering where constant progressions are maximal, on an arbitrary finite abelian group the lexicographic product of these orderings is an ordering for which the constant progressions are maximal. We may assume
, as if we have an
-tuple of orderings where constant progressions are maximal, we may add arbitrary orderings and the constant progressions will remain maximal.
So it is sufficient to find orderings on the cyclic group
such that the constant progressions are maximal elements of the set of
-term progressions in
in the
-fold product ordering. To do that, let the first, second, third, and fifth orderings be the usual order on
and let the fourth, sixth, seventh, and eighth orderings be the reverse of the usual order on
.
Then let be a constant progression and for contradiction assume that
is a progression greater than
in this ordering. We may assume that
, because otherwise we may reverse the order of the progression, which has the effect of reversing all eight orderings, and then apply the transformation
, which again reverses the eight orderings, bringing us back to the original problem but with
.
Take a representative of the residue class in the interval
. We will abuse notation and call this
. Observe that
, and
are all contained in the interval
modulo
. Take a representative of the residue class
in the interval
. Then
is in the interval
for some
. The distance between any distinct pair of intervals of this type is greater than
, but the distance between
and
is at most
, so
is in the interval
. By the same reasoning,
is in the interval
. Therefore
. But then the distance between
and
is at most
, so by the same reasoning
is in the interval
. Because
is between
and
, it also lies in the interval
. Because
is in the interval
, and by assumption it is congruent mod
to a number in the set
greater than or equal to
, it must be exactly
. Then, remembering that
and
lie in
, we have
and
, so
, hence
, thus
, which contradicts the assumption that
.
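The maximality claim just proved can be spot-checked by brute force for small cyclic groups. The sketch below is only an illustration, under the assumption that the "usual order" on the cyclic group of order $m$ means the order on the representatives $0,\dots,m-1$; the names `SIGNS`, `dominates` and `constants_are_maximal` are made up here. It encodes the eight orderings as a sign pattern (usual order in positions 1, 2, 3, 5, reversed order in positions 4, 6, 7, 8) and searches for a progression other than a given constant one that dominates it in the product ordering.

```python
# +1: usual order, -1: reversed order, for positions 1..8 of the progression.
SIGNS = (+1, +1, +1, -1, +1, -1, -1, -1)

def dominates(x, c):
    """True if the 8-tuple x is >= the constant tuple (c,...,c) in every one of the eight orderings."""
    return all((xi >= c) if s > 0 else (xi <= c) for xi, s in zip(x, SIGNS))

def constants_are_maximal(m):
    """Check that no 8-term progression mod m, other than (c,...,c) itself, dominates (c,...,c)."""
    for c in range(m):
        for a in range(m):
            for d in range(m):
                x = tuple((a + i * d) % m for i in range(8))
                if x != (c,) * 8 and dominates(x, c):
                    return False
    return True
```

Such a search covers only finitely many moduli, so it complements rather than replaces the argument above.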
In fact, given a $k$-term progression mod
and a constant, we can form a
-term binary sequence with a
for each step of the progression that is greater than the constant and a
for each step that is less. Because a rotation map, viewed as a dynamical system, has zero topological entropy, the number of
-term binary sequences that appear grows subexponentially in
. Hence there must be, for large enough
, at least one sequence that does not appear. In this proof we exploit a sequence that does not appear for
.
A capset in the vector space $\mathbf{F}_3^n$ over the finite field $\mathbf{F}_3$ of three elements is a subset $A$ of $\mathbf{F}_3^n$ that does not contain any lines $\{x, x+r, x+2r\}$, where $x, r \in \mathbf{F}_3^n$ and $r \neq 0$. A basic problem in additive combinatorics (discussed in one of the very first posts on this blog) is to obtain good upper and lower bounds for the maximal size of a capset in $\mathbf{F}_3^n$.
Trivially, one has $|A| \leq 3^n$. Using Fourier methods (and the density increment argument of Roth), the bound of $|A| \leq O(3^n/n)$ was obtained by Meshulam, and improved only as late as 2012 to $O(3^n/n^{1+c})$ for some absolute constant $c > 0$ by Bateman and Katz. But in a very recent breakthrough, Ellenberg (and independently Gijswijt) obtained the exponentially superior bound $|A| \leq O((2.756)^n)$, using a version of the polynomial method recently introduced by Croot, Lev, and Pach. (In the converse direction, a construction of Edel gives capsets as large as $(2.2174)^n$.) Given the success of the polynomial method in superficially similar problems such as the finite field Kakeya problem (discussed in this previous post), it was natural to wonder whether this method could be applicable to the cap set problem (see for instance this MathOverflow comment of mine on this from 2010), but it took a surprisingly long time before Croot, Lev, and Pach were able to identify the precise variant of the polynomial method that would actually work here.
The proof of the capset bound is very short (Ellenberg’s and Gijswijt’s preprints are both 3 pages long, and Croot-Lev-Pach is 6 pages), but I thought I would present a slight reformulation of the argument which treats the three points on a line in $\mathbf{F}_3^n$ symmetrically (as opposed to treating the third point differently from the first two, as is done in the Ellenberg and Gijswijt papers; Croot-Lev-Pach also treat the middle point of a three-term arithmetic progression differently from the two endpoints, although this is a very natural thing to do in their context of $(\mathbf{Z}/4\mathbf{Z})^n$). The basic starting point is this: if $A$ is a capset, then one has the identity
$$ \delta_{0^n}(x+y+z) = \sum_{a \in A} \delta_a(x) \delta_a(y) \delta_a(z) \qquad (1) $$
for all $(x,y,z) \in A^3$, where $\delta_a(x) := 1_{a=x}$ is the Kronecker delta function, which we view as taking values in $\mathbf{F}_3$. Indeed, (1) reflects the fact that the equation $x+y+z=0$ has solutions precisely when $x,y,z$ are either all equal, or form a line, and the latter is ruled out precisely when $A$ is a capset.
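The identity (1) is easy to test by machine on small examples. The following Python sketch is a toy check, not part of the argument: it represents points of $\mathbf{F}_3^n$ as tuples of residues, and the function names `is_capset` and `identity_holds` are invented for this illustration.

```python
from itertools import product

def is_capset(A):
    """A has no line {x, x+r, x+2r} with r != 0. In characteristic 3 this is the
    same as: no three distinct points of A sum to zero."""
    A = set(A)
    for x, y in product(A, repeat=2):
        if x != y:
            z = tuple((-a - b) % 3 for a, b in zip(x, y))
            if z in A:  # z is automatically distinct from x and y when x != y
                return False
    return True

def identity_holds(A):
    """Check that delta_0(x+y+z) equals sum_a delta_a(x) delta_a(y) delta_a(z) on A^3."""
    A = list(A)
    for x, y, z in product(A, repeat=3):
        lhs = int(all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z)))
        rhs = int(x == y == z)  # the diagonal sum is 1 exactly when x = y = z
        if lhs != rhs:
            return False
    return True

square = [(0, 0), (0, 1), (1, 0), (1, 1)]   # a capset in F_3^2
line = [(0, 0), (1, 1), (2, 2)]             # a line, hence not a capset
```

On `square` both checks succeed, while on `line` the identity fails at any ordering of the three distinct points, since they sum to zero without being equal.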
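To exploit (1), we will show that the left-hand side of (1) is “low rank” in some sense, while the right-hand side is “high rank”. Recall that a function $F: A \times A \to \mathbf{F}$ taking values in a field $\mathbf{F}$ is of rank one if it is non-zero and of the form $(x,y) \mapsto f(x) g(y)$ for some $f, g: A \to \mathbf{F}$, and that the rank of a general function $F: A \times A \to \mathbf{F}$ is the least number of rank one functions needed to express $F$ as a linear combination. More generally, if $k \geq 2$, we define the rank of a function $F: A^k \to \mathbf{F}$ to be the least number of “rank one” functions of the form
$$ (x_1,\dots,x_k) \mapsto f(x_i) g(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k) $$
for some $i=1,\dots,k$ and some functions $f: A \to \mathbf{F}$, $g: A^{k-1} \to \mathbf{F}$, that are needed to generate $F$ as a linear combination. For instance, when $k=3$, the rank one functions take the form $(x,y,z) \mapsto f(x) g(y,z)$, $(x,y,z) \mapsto f(y) g(x,z)$, $(x,y,z) \mapsto f(z) g(x,y)$, and linear combinations of $r$ such rank one functions will give a function of rank at most $r$.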
It is a standard fact in linear algebra that the rank of a diagonal matrix is equal to the number of non-zero entries. This phenomenon extends to higher dimensions:
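Lemma 1 (Rank of diagonal hypermatrices) Let $k \geq 2$, let $A$ be a finite set, let $\mathbf{F}$ be a field, and for each $a \in A$, let $c_a \in \mathbf{F}$ be a coefficient. Then the rank of the function
$$ (x_1,\dots,x_k) \mapsto \sum_{a \in A} c_a \delta_a(x_1) \dots \delta_a(x_k) \qquad (2) $$
is equal to the number of non-zero coefficients $c_a$.
Proof: We induct on $k$. As mentioned above, the case $k=2$ follows from standard linear algebra, so suppose now that $k>2$ and the claim has already been proven for $k-1$.
It is clear that the function (2) has rank at most equal to the number of non-zero $c_a$ (since the summands on the right-hand side are rank one functions), so it suffices to establish the lower bound. By deleting from $A$ those elements $a \in A$ with $c_a=0$ (which cannot increase the rank), we may assume without loss of generality that all the $c_a$ are non-zero. Now suppose for contradiction that (2) has rank at most $|A|-1$, then we obtain a representation
$$ \sum_{a \in A} c_a \delta_a(x_1) \dots \delta_a(x_k) = \sum_{i=1}^k \sum_{\alpha \in I_i} f_{i,\alpha}(x_i) g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k) \qquad (3) $$
for some sets $I_1,\dots,I_k$ of cardinalities adding up to at most $|A|-1$, and some functions $f_{i,\alpha}: A \to \mathbf{F}$ and $g_{i,\alpha}: A^{k-1} \to \mathbf{F}$.
Consider the space of functions $h: A \to \mathbf{F}$ that are orthogonal to all the $f_{k,\alpha}$, $\alpha \in I_k$ in the sense that
$$ \sum_{x \in A} f_{k,\alpha}(x) h(x) = 0 $$
for all $\alpha \in I_k$. This space is a vector space whose dimension $d$ is at least $|A| - |I_k|$. A basis of this space generates a $d \times |A|$ coordinate matrix of full rank, which implies that there is at least one non-singular $d \times d$ minor. This implies that there exists a function $h: A \to \mathbf{F}$ in this space which is nowhere vanishing on some subset $A'$ of $A$ of cardinality at least $|A| - |I_k|$.
If we multiply (3) by $h(x_k)$ and sum in $x_k$, we conclude that
$$ \sum_{a \in A} c_a h(a) \delta_a(x_1) \dots \delta_a(x_{k-1}) = \sum_{i=1}^{k-1} \sum_{\alpha \in I_i} f_{i,\alpha}(x_i) \tilde g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_{k-1}) $$
where
$$ \tilde g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_{k-1}) := \sum_{x_k \in A} g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k) h(x_k). $$
The right-hand side has rank at most $|A| - 1 - |I_k|$, since the summands are rank one functions. On the other hand, from induction hypothesis the left-hand side has rank at least $|A| - |I_k|$, giving the required contradiction.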
On the other hand, we have the following (symmetrised version of a) beautifully simple observation of Croot, Lev, and Pach:
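Lemma 2 On $\mathbf{F}_3^n \times \mathbf{F}_3^n \times \mathbf{F}_3^n$, the rank of the function $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ is at most $3N$, where
$$ N := \sum_{a,b,c \geq 0: a+b+c=n, b+2c \leq 2n/3} \frac{n!}{a!b!c!}. $$
Proof: Using the identity $\delta_0(x) = 1 - x^2$ for $x \in \mathbf{F}_3$, we have
$$ \delta_{0^n}(x+y+z) = \prod_{i=1}^n (1 - (x_i+y_i+z_i)^2). $$
The right-hand side is clearly a polynomial of degree $2n$ in $x,y,z$, which is then a linear combination of monomials
$$ x_1^{i_1} \dots x_n^{i_n} y_1^{j_1} \dots y_n^{j_n} z_1^{k_1} \dots z_n^{k_n} $$
with $i_1,\dots,i_n,j_1,\dots,j_n,k_1,\dots,k_n \in \{0,1,2\}$ with
$$ i_1 + \dots + i_n + j_1 + \dots + j_n + k_1 + \dots + k_n \leq 2n. $$
In particular, from the pigeonhole principle, at least one of $i_1+\dots+i_n$, $j_1+\dots+j_n$, $k_1+\dots+k_n$ is at most $2n/3$.
Consider the contribution of the monomials for which $i_1+\dots+i_n \leq 2n/3$. We can regroup this contribution as
$$ \sum_\alpha f_\alpha(x) g_\alpha(y,z) $$
where $\alpha$ ranges over those $(i_1,\dots,i_n) \in \{0,1,2\}^n$ with $i_1+\dots+i_n \leq 2n/3$, $f_\alpha$ is the monomial
$$ f_\alpha(x_1,\dots,x_n) := x_1^{i_1} \dots x_n^{i_n} $$
and $g_\alpha: \mathbf{F}_3^n \times \mathbf{F}_3^n \to \mathbf{F}_3$ is some explicitly computable function whose exact form will not be of relevance to our argument. The number of such $\alpha$ is equal to $N$, so this contribution has rank at most $N$. The remaining contributions arising from the cases $j_1+\dots+j_n \leq 2n/3$ and $k_1+\dots+k_n \leq 2n/3$ similarly have rank at most $N$ (grouping the monomials so that each monomial is only counted once), so the claim follows.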
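The count $N$ of exponent vectors in $\{0,1,2\}^n$ of total degree at most $2n/3$ is easy to tabulate exactly. The Python sketch below (the name `count_low_degree_monomials` is just for this illustration) builds the coefficients of $(1+t+t^2)^n$ by repeated convolution and sums those of degree at most $2n/3$, which lets one see for which $n$ the bound $3N$ beats the trivial bound $3^n$.

```python
def count_low_degree_monomials(n):
    """Number of exponent vectors in {0,1,2}^n with total degree at most 2n/3."""
    counts = [1]  # counts[d] = number of vectors of the current length with degree d
    for _ in range(n):
        new = [0] * (len(counts) + 2)
        for d, c in enumerate(counts):
            for e in (0, 1, 2):
                new[d + e] += c
        counts = new
    # 3*d <= 2*n is the integer form of the degree constraint d <= 2n/3
    return sum(c for d, c in enumerate(counts) if 3 * d <= 2 * n)
```

For $n=3$ this gives $10$ (degrees $0$, $1$, $2$ contribute $1+3+6$), so $3N = 30$ is still worse than $3^3 = 27$; the exponential gain only becomes visible for larger $n$.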
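Upon restricting from $(\mathbf{F}_3^n)^3$ to $A^3$, the rank of $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ is still at most $3N$. The two lemmas then combine to give the Ellenberg-Gijswijt bound
$$ |A| \leq 3N. $$
All that remains is to compute the asymptotic behaviour of $N$. This can be done using the general tool of Cramer’s theorem, but can also be derived from Stirling’s formula (discussed in this previous post). Indeed, if $a = (\alpha+o(1)) n$, $b = (\beta+o(1)) n$, $c = (\gamma+o(1)) n$ for some $\alpha,\beta,\gamma \geq 0$ summing to $1$, Stirling’s formula gives
$$ \frac{n!}{a!b!c!} = \exp( n (h(\alpha,\beta,\gamma) + o(1)) ) $$
where $h$ is the entropy function
$$ h(\alpha,\beta,\gamma) = \alpha \log \frac{1}{\alpha} + \beta \log \frac{1}{\beta} + \gamma \log \frac{1}{\gamma}. $$
We then have
$$ N = \exp( n (X + o(1)) ) $$
where $X$ is the maximum entropy
$$ X := \max h(\alpha,\beta,\gamma) $$
subject to the constraints
$$ \alpha,\beta,\gamma \geq 0; \quad \alpha+\beta+\gamma = 1; \quad \beta+2\gamma \leq 2/3. $$
A routine Lagrange multiplier computation shows that the maximum occurs when
$$ \alpha = \frac{32}{3(15+\sqrt{33})}, \quad \beta = \frac{4(\sqrt{33}-1)}{3(15+\sqrt{33})}, \quad \gamma = \frac{(\sqrt{33}-1)^2}{6(15+\sqrt{33})} $$
and $h$ is approximately $1.013455$, giving rise to the claimed bound of $O((2.756)^n)$.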
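The constrained entropy maximisation can also be checked numerically. The sketch below assumes (as is forced by concavity, since the unconstrained maximum at the uniform distribution has $\beta+2\gamma = 1 > 2/3$) that the maximum sits on the boundary $\beta+2\gamma = 2/3$, parametrises that boundary by $t = \gamma$, and runs a ternary search. The function names `h` and `max_entropy` are choices made for this illustration.

```python
from math import exp, log, sqrt

def h(alpha, beta, gamma):
    """Entropy function h(alpha, beta, gamma) with natural logarithms."""
    return -sum(p * log(p) for p in (alpha, beta, gamma) if p > 0)

def max_entropy():
    """Maximise h on the segment alpha+beta+gamma=1, beta+2*gamma=2/3, via t = gamma."""
    f = lambda t: h(1/3 + t, 2/3 - 2*t, t)
    lo, hi = 0.0, 1/3
    for _ in range(200):  # ternary search; f is concave on [0, 1/3]
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    t = (lo + hi) / 2
    return f(t), (1/3 + t, 2/3 - 2*t, t)

X, (alpha, beta, gamma) = max_entropy()
closed_form_gamma = (sqrt(33) - 1) ** 2 / (6 * (15 + sqrt(33)))
```

The stationarity condition on this segment is $\beta^2 = \alpha\gamma$, which gives a quick independent check of whatever maximiser the search returns; `exp(X)` is the base of the exponential bound.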
Remark 3 As noted in the Ellenberg and Gijswijt papers, the above argument extends readily to other fields than
to control the maximal size of subset of
that has no non-trivial solutions to the equation
, where
are non-zero constants that sum to zero. Of course one replaces the function
in Lemma 2 by
in this case.
Remark 4 This symmetrised formulation suggests that one possible way to improve slightly on the numerical quantity $2.756$ is to find a more efficient way to decompose $\delta_{0^n}(x+y+z)$ into rank one functions; however, I was not able to do so (though such improvements are reminiscent of the Strassen type algorithms for fast matrix multiplication).
Remark 5 It is tempting to see if this method can get non-trivial upper bounds for sets
with no length
progressions, in (say)
. One can run the above arguments, replacing the function
with
this leads to the bound
where
Unfortunately,
is asymptotic to
and so this bound is in fact slightly worse than the trivial bound
! However, there is a slim chance that there is a more efficient way to decompose
into rank one functions that would give a non-trivial bound on
. I experimented with a few possible such decompositions but unfortunately without success.
Remark 6 Return now to the capset problem. Since Lemma 1 is valid for any field $\mathbf{F}$, one could perhaps hope to get better bounds by viewing the Kronecker delta function $\delta$ as taking values in another field than $\mathbf{F}_3$, such as the complex numbers $\mathbf{C}$. However, as soon as one works in a field of characteristic other than $3$, one can adjoin a cube root $\omega$ of unity, and one now has the Fourier decomposition
$$ \delta_{0^n}(x+y+z) = \frac{1}{3^n} \sum_{\xi \in \mathbf{F}_3^n} \omega^{\xi \cdot x} \omega^{\xi \cdot y} \omega^{\xi \cdot z}. $$
Moving to the Fourier basis, we conclude from Lemma 1 that the function $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ on $\mathbf{F}_3^n$ now has rank exactly $3^n$, and so one cannot improve upon the trivial bound of $|A| \leq 3^n$ by this method using fields of characteristic other than three as the range field. So it seems one has to stick with $\mathbf{F}_3$ (or the algebraic completion thereof).
Thanks to Jordan Ellenberg and Ben Green for helpful discussions.
Let be a finite field of order
, and let
be an absolutely irreducible smooth projective curve defined over
(and hence over the algebraic closure
of that field). For instance,
could be the projective elliptic curve
in the projective plane , where
are coefficients whose discriminant
is non-vanishing, which is the projective version of the affine elliptic curve
To each such curve one can associate a genus
, which we will define later; for instance, elliptic curves have genus
. We can also count the cardinality
of the set
of
-points of
. The Hasse-Weil bound relates the two:
The usual proofs of this bound proceed by first establishing a trace formula of the form
$\displaystyle |C({\bf F}_{q^n})| = q^n - \sum_{i=1}^{2g} \alpha_i^n + 1 \ \ \ \ \ (1)$
for some complex numbers $\alpha_1,\ldots,\alpha_{2g}$ independent of $n$; this is in fact a special case of the Lefschetz-Grothendieck trace formula, and can be interpreted as an assertion that the zeta function associated to the curve $C$ is rational. The task is then to establish a bound $|\alpha_i| \leq \sqrt{q}$ for all $i=1,\ldots,2g$; this (or more precisely, the slightly stronger assertion $|\alpha_i| = \sqrt{q}$) is the Riemann hypothesis for such curves. This can be done either by passing to the Jacobian variety of $C$ and using a certain duality available on the cohomology of such varieties, known as Rosati involution; alternatively, one can pass to the product surface $C \times C$ and apply the Riemann-Roch theorem for that surface.
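In the genus one case the Hasse-Weil bound can be tested by brute force. The sketch below assumes a prime field of characteristic at least $5$ and the Weierstrass form $y^2 = x^3 + ax + b$ from the example above; the helper names are hypothetical.

```python
def count_projective_points(p, a, b):
    """Number of F_p-points on y^2 z = x^3 + a x z^2 + b z^3, for a prime p >= 5."""
    if (4 * a ** 3 + 27 * b ** 2) % p == 0:
        raise ValueError("vanishing discriminant: the curve is singular")
    # sqrt_count[t] = number of y in F_p with y^2 = t
    sqrt_count = [0] * p
    for y in range(p):
        sqrt_count[y * y % p] += 1
    affine = sum(sqrt_count[(x ** 3 + a * x + b) % p] for x in range(p))
    return affine + 1  # add the single point at infinity [0, 1, 0]

def satisfies_hasse(p, a, b):
    """Check | |C(F)| - p - 1 | <= 2 sqrt(p), i.e. the genus g = 1 case."""
    n = count_projective_points(p, a, b)
    return (n - p - 1) ** 2 <= 4 * p
```

For instance, `count_projective_points(5, 1, 1)` counts eight affine solutions plus the point at infinity.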
In 1969, Stepanov introduced an elementary method (a version of what is now known as the polynomial method) to count (or at least to upper bound) the quantity . The method was initially restricted to hyperelliptic curves, but was soon extended to general curves. In particular, Bombieri used this method to give a short proof of the following weaker version of the Hasse-Weil bound:
Theorem 2 (Weak Hasse-Weil bound) If $q$ is a perfect square, and $q \geq (g+1)^4$, then $|C(F)| \leq q + (2g+1) \sqrt{q} + 1$.
In fact, the bound on can be sharpened a little bit further, as we will soon see.
Theorem 2 is only an upper bound on $|C(F)|$, but there is a Galois-theoretic trick to convert (a slight generalisation of) this upper bound to a matching lower bound, and if one then uses the trace formula (1) (and the “tensor power trick” of sending $n$ to infinity to control the weights $\alpha_i$) one can then recover the full Hasse-Weil bound. We discuss these steps below the fold.
I’ve discussed Bombieri’s proof of Theorem 2 in this previous post (in the special case of hyperelliptic curves), but now wish to present the full proof, with some minor simplifications from Bombieri’s original presentation; it is mostly elementary, with the deepest fact from algebraic geometry needed being Riemann’s inequality (a weak form of the Riemann-Roch theorem).
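The first step is to reinterpret $|C(F)|$ as the number of points of intersection between two curves $C_1, C_2$ in the surface $C \times C$. Indeed, if we define the Frobenius endomorphism $\hbox{Frob}_q$ on any projective space by
$\displaystyle \hbox{Frob}_q( [x_0,\ldots,x_n] ) := [x_0^q, \ldots, x_n^q]$
then this map preserves the curve $C$, and the fixed points of this map are precisely the $F$ points of $C$:
$\displaystyle C(F) = \{ z \in C: \hbox{Frob}_q(z) = z \}.$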
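Thus one can interpret $|C(F)|$ as the number of points of intersection between the diagonal curve
$\displaystyle \{ (z,z): z \in C \}$
and the Frobenius graph
$\displaystyle \{ (z, \hbox{Frob}_q(z)): z \in C \}$
which are copies of $C$ inside $C \times C$. But we can use the additional hypothesis that $q$ is a perfect square to write this more symmetrically, by taking advantage of the fact that the Frobenius map has a square root
$\displaystyle \hbox{Frob}_q = \hbox{Frob}_{\sqrt{q}}^2$
with $\hbox{Frob}_{\sqrt{q}}$ also preserving $C$. One can then also interpret $|C(F)|$ as the number of points of intersection between the curve
$\displaystyle C_1 := \{ (z, \hbox{Frob}_{\sqrt{q}}(z)): z \in C \}$
and its transpose
$\displaystyle C_2 := \{ (\hbox{Frob}_{\sqrt{q}}(w), w): w \in C \}.$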
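The description of $F$-points as Frobenius fixed points, and the existence of a square root of Frobenius when $q$ is a perfect square, can be seen in the smallest example $q = 9$. The sketch below is only an illustration on the affine line: it models ${\bf F}_9$ as ${\bf F}_3[i]$ with $i^2 = -1$ (an assumed but standard model, valid since $-1$ is not a square mod $3$), and all names are hypothetical.

```python
P = 3  # characteristic; elements of F_9 are pairs (a, b) meaning a + b*i with i^2 = -1

def mul(u, v):
    """Multiply two elements of F_9 = F_3[i]."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def power(u, n):
    """Compute u^n by repeated multiplication (n is small here)."""
    result = (1, 0)
    for _ in range(n):
        result = mul(result, u)
    return result

def frob(u, q):
    """The map u -> u^q; q = 9 is Frobenius for F_9, q = 3 is its square root."""
    return power(u, q)

F9 = [(a, b) for a in range(P) for b in range(P)]
fixed_by_frob3 = [u for u in F9 if frob(u, 3) == u]  # the F_3-points
fixed_by_frob9 = [u for u in F9 if frob(u, 9) == u]  # the F_9-points
```

Here `frob(frob(u, 3), 3)` agrees with `frob(u, 9)` for every `u`, which is the square root relation used to symmetrise the intersection count.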
Let be the field of rational functions on
(with coefficients in
), and define
,
, and
analogously (although
is likely to be disconnected, so
will just be a ring rather than a field). We then (morally) have the commuting square
if we ignore the issue that a rational function on, say, , might blow up on all of
and thus not have a well-defined restriction to
. We use
and
to denote the restriction maps. Furthermore, we have obvious isomorphisms
,
coming from composing with the graphing maps
and
.
The idea now is to find a rational function on the surface
of controlled degree which vanishes when restricted to
, but is non-vanishing (and not blowing up) when restricted to
. On
, we thus get a non-zero rational function
of controlled degree which vanishes on
– which then lets us bound the cardinality of
in terms of the degree of
. (In Bombieri’s original argument, one required vanishing to high order on the
side, but in our presentation, we have factored out a
term which removes this high order vanishing condition.)
To find this , we will use linear algebra. Namely, we will locate a finite-dimensional subspace
of
(consisting of certain “controlled degree” rational functions) which projects injectively to
, but whose projection to
has strictly smaller dimension than
itself. The rank-nullity theorem then forces the existence of a non-zero element
of
whose projection to
vanishes, but whose projection to
is non-zero.
Now we build . Pick a
point
of
, which we will think of as being a point at infinity. (For the purposes of proving Theorem 2, we may clearly assume that
is non-empty.) Thus
is fixed by
. To simplify the exposition, we will also assume that
is fixed by the square root
of
; in the opposite case when
has order two when acting on
, the argument is essentially the same, but all references to
in the second factor of
need to be replaced by
(we leave the details to the interested reader).
For any natural number , define
to be the set of rational functions
which are allowed to have a pole of order up to
at
, but have no other poles on
; note that as we are assuming
to be smooth, it is unambiguous what a pole is (and what order it will have). (In the fancier language of divisors and Cech cohomology, we have
.) The space
is clearly a vector space over
; one can view intuitively as the space of “polynomials” on
of “degree” at most
. When
,
consists just of the constant functions. Indeed, if
, then the image
of
avoids
and so lies in the affine line
; but as
is projective, the image
needs to be compact (hence closed) in
, and must therefore be a point, giving the claim.
For higher , we have the easy relations
The former inequality just comes from the trivial inclusion . For the latter, observe that if two functions
lie in
, so that they each have a pole of order at most
at
, then some linear combination of these functions must have a pole of order at most
at
; thus
has codimension at most one in
, giving the claim.
From (3) and induction we see that each of the are finite dimensional, with the trivial upper bound
Riemann’s inequality complements this with the lower bound
thus one has for all but at most
exceptions (in fact, exactly
exceptions as it turns out). This is a consequence of the Riemann-Roch theorem; it can be proven from abstract nonsense (the snake lemma) if one defines the genus
in a non-standard fashion (as the dimension of the first Cech cohomology
of the structure sheaf
of
), but to obtain this inequality with a standard definition of
(e.g. as the dimension of the zeroth Cech cohomology
of the line bundle of differentials) requires the more non-trivial tool of Serre duality.
At any rate, now that we have these vector spaces , we will define
to be a tensor product space
for some natural numbers which we will optimise in later. That is to say,
is spanned by functions of the form
with
and
. This is clearly a linear subspace of
of dimension
, and hence by Riemann’s inequality we have
Observe that maps a tensor product
to a function
. If
and
, then we see that the function
has a pole of order at most
at
. We conclude that
and in particular by (4)
We will choose to be a bit bigger than
, to make the
image of
smaller than that of
. From (6), (10) we see that if we have the inequality
(together with (7)) then cannot be injective.
On the other hand, we have the following basic fact:
Proof: From (3), we can find a linear basis of
such that each of the
has a distinct order
of pole at
(somewhere between
and
inclusive). Similarly, we may find a linear basis
of
such that each of the
has a distinct order
of pole at
(somewhere between
and
inclusive). The functions
then span
, and the order of pole at
is
. But since
, these orders are all distinct, and so these functions must be linearly independent. The claim follows.
This gives us the following bound:
Proposition 4 Let
be natural numbers such that (7), (11), (12) hold. Then
.
Proof: As is not injective, we can find
with
vanishing. By the above lemma, the function
is then non-zero, but it must also vanish on
, which has cardinality
. On the other hand, by (8),
has a pole of order at most
at
and no other poles. Since the number of poles and zeroes of a rational function on a projective curve must add up to zero, the claim follows.
If , we may make the explicit choice
and a brief calculation then gives Theorem 2. In some cases one can optimise things a bit further. For instance, in the genus zero case (e.g. if
is just the projective line
) one may take
and conclude the absolutely sharp bound
in this case; in the case of the projective line
, the function
is in fact the very concrete function
.
Remark 1 When
is not a perfect square, one can try to run the above argument using the factorisation
instead of
. This gives a weaker version of the above bound, of the shape
. In the hyperelliptic case at least, one can erase this loss by working with a variant of the argument in which one requires
to vanish to high order at
, rather than just to first order; see this survey article of mine for details.
Let $F$ be a finite field, with algebraic closure $\overline{F}$, and let $V$ be an (affine) algebraic variety defined over $\overline{F}$, by which I mean a set of the form
$\displaystyle V = \{ x \in \overline{F}^d: P_1(x) = \ldots = P_m(x) = 0 \}$
for some ambient dimension $d \geq 0$, and some finite number of polynomials $P_1,\ldots,P_m: \overline{F}^d \rightarrow \overline{F}$. In order to reduce the number of subscripts later on, let us say that $V$ has complexity at most $M$ if $d$, $m$, and the degrees of the $P_1,\ldots,P_m$ are all less than or equal to $M$. Note that we do not require at this stage that $V$ be irreducible (i.e. not the union of two strictly smaller varieties), or defined over $F$, though we will often specialise to these cases later in this post. (Also, everything said here can also be applied with almost no changes to projective varieties, but we will stick with affine varieties for sake of concreteness.)
One can consider two crude measures of how “big” the variety is. The first measure, which is algebraic geometric in nature, is the dimension
of the variety
, which is an integer between
and
(or, depending on convention,
,
, or undefined, if
is empty) that can be defined in a large number of ways (e.g. it is the largest
for which the generic linear projection from
to
is dominant, or the smallest
for which the intersection with a generic codimension
subspace is non-empty). The second measure, which is number-theoretic in nature, is the number
of
-points of
, i.e. points
in
all of whose coefficients lie in the finite field, or equivalently the number of solutions to the system of equations
for
with variables
in
.
These two measures are linked together in a number of ways. For instance, we have the basic Schwarz-Zippel type bound (which, in this qualitative form, goes back at least to Lemma 1 of the work of Lang and Weil in 1954).
Lemma 1 (Schwarz-Zippel type bound) Let $V$ be a variety of complexity at most $M$. Then we have $|V(F)| \ll_M |F|^{\hbox{dim}(V)}$.
Proof: (Sketch) For the purposes of exposition, we will not carefully track the dependencies of implied constants on the complexity , instead simply assuming that all of these quantities remain controlled throughout the argument. (If one wished, one could obtain ineffective bounds on these quantities by an ultralimit argument, as discussed in this previous post, or equivalently by moving everything over to a nonstandard analysis framework; one could also obtain such uniformity using the machinery of schemes.)
We argue by induction on the ambient dimension of the variety
. The
case is trivial, so suppose
and that the claim has already been proven for
. By breaking up
into irreducible components we may assume that
is irreducible (this requires some control on the number and complexity of these components, but this is available, as discussed in this previous post). For each
, the fibre
is either one-dimensional (and thus all of
) or zero-dimensional. In the latter case, one has
points in the fibre from the fundamental theorem of algebra (indeed one has a bound of
in this case), and
lives in the projection of
to
, which is a variety of dimension at most
and controlled complexity, so the contribution of this case is acceptable from the induction hypothesis. In the former case, the fibre contributes
-points, but
lies in a variety in
of dimension at most
(since otherwise
would contain a subvariety of dimension at least
, which is absurd) and controlled complexity, and so the contribution of this case is also acceptable from the induction hypothesis.
One can improve the bound on the implied constant to be linear in the degree of (see e.g. Claim 7.2 of this paper of Dvir, Kollar, and Lovett, or Lemma A.3 of this paper of Ellenberg, Oberlin, and myself), but we will not be concerned with these improvements here.
Without further hypotheses on , the above upper bound is sharp (except for improvements in the implied constants). For instance, the variety
where are distinct, is the union of
distinct hyperplanes of dimension
, with
and complexity
; similar examples can easily be concocted for other choices of
. In the other direction, there is also no non-trivial lower bound for
without further hypotheses on
. For a trivial example, if
is an element of
that does not lie in
, then the hyperplane
clearly has no -points whatsoever, despite being a
-dimensional variety in
of complexity
. For a slightly less trivial example, if
is an element of
that is not a quadratic residue, then the variety
which is the union of two hyperplanes, still has no -points, even though this time the variety is defined over
instead of
(by which we mean that the defining polynomial(s) have all of their coefficients in
). There is however the important Lang-Weil bound that allows for a much better estimate as long as
is both defined over
and irreducible:
Theorem 2 (Lang-Weil bound) Let $V$ be a variety of complexity at most $M$. Assume that $V$ is defined over $F$, and that $V$ is irreducible as a variety over $\overline{F}$ (i.e. $V$ is geometrically irreducible or absolutely irreducible). Then
$\displaystyle |V(F)| = (1 + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(V)}.$
Again, more explicit bounds on the implied constant here are known, but will not be the focus of this post. As the previous examples show, the hypotheses of definability over $F$ and geometric irreducibility are both necessary.
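The examples discussed above can be reproduced by brute-force point counting over a small prime field. The sketch below is only illustrative: the specific defining polynomials (a product of linear factors in the first coordinate, the equation $x_1^2 = a$, and the hyperbola $xy = 1$) are assumed instances consistent with the descriptions in the text, and the function names are hypothetical.

```python
from itertools import product

def count_points(p, d, polys):
    """Count x in F_p^d with P(x) = 0 (mod p) for every P in polys."""
    return sum(all(P(x) % p == 0 for P in polys)
               for x in product(range(p), repeat=d))

def union_of_hyperplanes(roots):
    """The polynomial (x_1 - a_1)...(x_1 - a_D), cutting out D parallel hyperplanes."""
    def P(x):
        value = 1
        for a in roots:
            value *= x[0] - a
        return value
    return P

def shifted_square(a):
    """The polynomial x_1^2 - a, which has no F_p-zeroes when a is a non-residue."""
    return lambda x: x[0] ** 2 - a

def hyperbola(x):
    """The geometrically irreducible curve x*y = 1 in the plane."""
    return x[0] * x[1] - 1
```

Over ${\bf F}_7$ in the plane, three parallel lines give $3 \cdot 7$ points, the equation $x_1^2 = 3$ gives none (as $3$ is a non-residue mod $7$), and the hyperbola gives $7 - 1$ points, in line with the $|F| + O(|F|^{1/2})$ prediction for an irreducible curve defined over $F$.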
The Lang-Weil bound is already non-trivial in the model case of plane curves:
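Theorem 3 (Hasse-Weil bound) Let $P \in \overline{F}[x,y]$ be an irreducible polynomial of degree $d$ with coefficients in $F$. Then
$\displaystyle |\{ (x,y) \in F^2: P(x,y) = 0 \}| = |F| + O_d( |F|^{1/2} ).$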
Thus, for instance, if , then the elliptic curve
has
-points, a result first established by Hasse. The Hasse-Weil bound is already quite non-trivial, being the analogue of the Riemann hypothesis for plane curves. For hyper-elliptic curves, an elementary proof (due to Stepanov) is discussed in this previous post. For general plane curves, the first proof was by Weil (leading to his famous Weil conjectures); there is also a nice version of Stepanov’s argument due to Bombieri covering this case which is a little less elementary (relying crucially on the Riemann-Roch theorem for the upper bound, and a lifting trick to then get the lower bound), which I briefly summarise later in this post. The full Lang-Weil bound is deduced from the Hasse-Weil bound by an induction argument using generic hyperplane slicing, as I will also summarise later in this post.
The hypotheses of definability over $F$ and geometric irreducibility in the Lang-Weil bound can be removed after inserting a geometric factor:
Corollary 4 (Lang-Weil bound, alternate form) Let $V$ be a variety of complexity at most $M$. Then one has
$\displaystyle |V(F)| = (c(V) + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(V)}$
where $c(V)$ is the number of top-dimensional components of $V$ (i.e. geometrically irreducible components of $V$ of dimension $\hbox{dim}(V)$) that are definable over $F$, or equivalently are invariant with respect to the Frobenius endomorphism $x \mapsto x^{|F|}$ that defines $F$.
Proof: By breaking up a general variety into components (and using Lemma 1 to dispose of any lower-dimensional components), it suffices to establish this claim when
is itself geometrically irreducible. If
is definable over
, the claim follows from Theorem 2. If
is not definable over
, then it is not fixed by the Frobenius endomorphism
(since otherwise one could produce a set of defining polynomials that were fixed by Frobenius and thus defined over
by using some canonical basis (such as a reduced Grobner basis) for the associated ideal), and so
has strictly smaller dimension than
. But
captures all the
-points of
, so in this case the claim follows from Lemma 1.
Note that if is reducible but is itself defined over
, then the Frobenius endomorphism preserves
itself, but may permute the components of
around. In this case,
is the number of fixed points of this permutation action of Frobenius on the components. In particular,
is always a natural number between
and
; thus we see that regardless of the geometry of
, the normalised count
is asymptotically restricted to a bounded range of natural numbers (in the regime where the complexity stays bounded and
goes to infinity).
Example 1 Consider the variety
for some non-zero parameter
. Geometrically (by which we basically mean “when viewed over the algebraically closed field
”), this is the union of two lines, with slopes corresponding to the two square roots of
. If
is a quadratic residue, then both of these lines are defined over
, and are fixed by Frobenius, and
in this case. If
is not a quadratic residue, then the lines are not defined over
, and the Frobenius automorphism permutes the two lines while preserving
as a whole, giving
in this case.
Corollary 4 effectively computes (at least to leading order) the number-theoretic size of a variety in terms of geometric information about
, namely its dimension
and the number
of top-dimensional components fixed by Frobenius. It turns out that with a little bit more effort, one can extend this connection to cover not just a single variety
, but a family of varieties indexed by points in some base space
. More precisely, suppose we now have two affine varieties
of bounded complexity, together with a regular map
of bounded complexity (the definition of complexity of a regular map is a bit technical, see e.g. this paper, but one can think for instance of a polynomial or rational map of bounded degree as a good example). It will be convenient to assume that the base space
is irreducible. If the map
is a dominant map (i.e. the image
is Zariski dense in
), then standard algebraic geometry results tell us that the fibres
are an unramified family of
-dimensional varieties outside of an exceptional subset
of
of dimension strictly smaller than
(and with
having dimension strictly smaller than
); see e.g. Section I.6.3 of Shafarevich.
Now suppose that ,
, and
are defined over
. Then, by Lang-Weil,
has
-points, and by Schwarz-Zippel, for all but
of these
-points
(the ones that lie in the subvariety
), the fibre
is an algebraic variety defined over
of dimension
. By using ultraproduct arguments (see e.g. Lemma 3.7 of this paper of mine with Emmanuel Breuillard and Ben Green), this variety can be shown to have bounded complexity, and thus by Corollary 4, has
-points. One can then ask how the quantity
is distributed. A simple but illustrative example occurs when
and
is the polynomial
. Then
equals
when
is a non-zero quadratic residue and
when
is a non-zero quadratic non-residue (and
when
is zero, but this is a negligible fraction of all
). In particular, in the asymptotic limit
,
is equal to
half of the time and
half of the time.
Now we describe the asymptotic distribution of the . We need some additional notation. Let
be an
-point in
, and let
be the connected components of the fibre
. As
is defined over
, this set of components is permuted by the Frobenius endomorphism
. But there is also an action by monodromy of the fundamental group
(this requires a certain amount of étale machinery to properly set up, as we are working over a positive characteristic field rather than over the complex numbers, but I am going to ignore this rather important detail here, as I still don’t fully understand it). This fundamental group may be infinite, but (by the étale construction) is always profinite, and in particular has a Haar probability measure, in which every finite index subgroup (and their cosets) are measurable. Thus we may meaningfully talk about elements drawn uniformly at random from this group, so long as we work only with the profinite
-algebra on
that is generated by the cosets of the finite index subgroups of this group (which will be the only relevant sets we need to measure when considering the action of this group on finite sets, such as the components of a generic fibre).
Theorem 5 (Lang-Weil with parameters) Let
be varieties of complexity at most
with
irreducible, and let
be a dominant map of complexity at most
. Let
be an
-point of
. Then, for any natural number
, one has
for
values of
, where
is the random variable that counts the number of components of a generic fibre
that are invariant under
, where
is an element chosen uniformly at random from the étale fundamental group
. In particular, in the asymptotic limit
, and with
chosen uniformly at random from
,
(or, equivalently,
) and
have the same asymptotic distribution.
This theorem generalises Corollary 4 (which is the case when is just a point, so that
is just
and
is trivial). Informally, the effect of a non-trivial parameter space
on the Lang-Weil bound is to push around the Frobenius map by monodromy for the purposes of counting invariant components, and a randomly chosen set of parameters corresponds to a randomly chosen loop on which to perform monodromy.
Example 2 Let
and
for some fixed
; to avoid some technical issues let us suppose that
is coprime to
. Then
can be taken to be
, and for a base point
we can take
. The fibre
– the
roots of unity – can be identified with the cyclic group
by using a primitive root of unity. The étale fundamental group
is (I think) isomorphic to the profinite closure
of the integers
(excluding the part of that closure coming from the characteristic of
). Not coincidentally, the integers
are the fundamental group of the complex analogue
of
. (Brian Conrad points out to me though that for more complicated varieties, such as covers of
by a power of the characteristic, the etale fundamental group is more complicated than just a profinite closure of the ordinary fundamental group, due to the presence of Artin-Schreier covers that are only ramified at infinity.) The action of this fundamental group on the fibres
can be given by translation. Meanwhile, the Frobenius map
on
is given by multiplication by
. A random element
then becomes a random affine map
on
, where
is chosen uniformly at random from
. The number of fixed points of this map is equal to the greatest common divisor
of
and
when
is divisible by
, and equal to
otherwise. This matches up with the elementary number-theoretic fact that a randomly chosen non-zero element of
will be an
power with probability
, and when this occurs, the number of
roots in
will be
.
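The two counts being matched in Example 2 can be compared by brute force. The sketch below assumes $q$ is a prime (so that ${\bf F}_q^\times$ can be modelled by the non-zero residues mod $q$) and writes the random affine map as $x \mapsto qx + a$ on ${\bf Z}/d{\bf Z}$; the function names are hypothetical.

```python
from math import gcd

def fixed_points(q, a, d):
    """Number of fixed points of the affine map x -> q*x + a on Z/dZ."""
    return sum((q * x + a - x) % d == 0 for x in range(d))

def dth_roots(t, d, q):
    """Number of solutions x in F_q^* (q prime) of x^d = t."""
    return sum(pow(x, d, q) == t for x in range(1, q))

def check(q, d):
    """Verify both sides of the correspondence for a prime q and an integer d."""
    g = gcd(d, q - 1)
    # Monodromy side: g fixed points when g divides a, none otherwise.
    ok_affine = all(fixed_points(q, a, d) == (g if a % g == 0 else 0)
                    for a in range(d))
    # Arithmetic side: every non-zero t has either 0 or g d-th roots ...
    ok_roots = all(dth_roots(t, d, q) in (0, g) for t in range(1, q))
    # ... and exactly a 1/g proportion of the non-zero t are d-th powers.
    n_powers = sum(dth_roots(t, d, q) > 0 for t in range(1, q))
    return ok_affine and ok_roots and n_powers * g == q - 1
```

In both models the count is $(d, q-1)$ with probability $1/(d,q-1)$ and zero otherwise, which is the agreement of distributions asserted by Theorem 5 in this case.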
Example 3 (Thanks to Jordan Ellenberg for this example.) Consider a random elliptic curve
, where
are chosen uniformly at random, and let
. Let
be the
-torsion points of
(i.e. those elements
with
using the elliptic curve addition law); as a group, this is isomorphic to
(assuming that
has sufficiently large characteristic, for simplicity), and consider the number of
points of
, which is a random variable taking values in the natural numbers between
and
. In this case, the base variety
is the modular curve
, and the covering variety
is the modular curve
. The generic fibre here can be identified with
, the monodromy action projects down to the action of
, and the action of Frobenius on this fibre can be shown to be given by a
matrix with determinant
(with the exact choice of matrix depending on the choice of fibre and of the identification), so the distribution of the number of
-points of
is asymptotic to the distribution of the number of fixed points
of a random linear map of determinant
on
.
Theorem 5 seems to be well known “folklore” among arithmetic geometers, though I do not know of an explicit reference for it. I enjoyed deriving it for myself (though my derivation is somewhat incomplete due to my lack of understanding of étale cohomology) from the ordinary Lang-Weil theorem and the moment method. I’m recording this derivation later in this post, mostly for my own benefit (as I am still in the process of learning this material), though perhaps some other readers may also be interested in it.
Caveat: not all details are fully fleshed out in this writeup, particularly those involving the finer points of algebraic geometry and étale cohomology, as my understanding of these topics is not as complete as I would like it to be.
Many thanks to Brian Conrad and Jordan Ellenberg for helpful discussions on these topics.
The ham sandwich theorem asserts that, given bounded open sets
in
, there exists a hyperplane
that bisects each of these sets
, in the sense that each of the two half-spaces
on either side of the hyperplane captures exactly half of the volume of
. The shortest proof of this result proceeds by invoking the Borsuk-Ulam theorem.
A useful generalisation of the ham sandwich theorem is the polynomial ham sandwich theorem, which asserts that given bounded open sets
in
, there exists a hypersurface
of degree
(thus
is a polynomial of degree
) such that the two semi-algebraic sets
and
capture half the volume of each of the
. (More precisely, the degree will be at most
, where
is the first positive integer for which
exceeds
.) This theorem can be deduced from the Borsuk-Ulam theorem in the same manner that the ordinary ham sandwich theorem is (and can also be deduced directly from the ordinary ham sandwich theorem via the Veronese embedding).
The polynomial ham sandwich theorem is a theorem about continuous bodies (bounded open sets), but a simple limiting argument leads one to the following discrete analogue: given finite sets
in
, there exists a hypersurface
of degree
, such that each of the two semi-algebraic sets
and
contain at most half of the points in
(note that some of the points of
can certainly lie on the boundary
). This can be iterated to give a useful cell decomposition:
Proposition 1 (Cell decomposition) Let
be a finite set of points in
, and let
be a positive integer. Then there exists a polynomial
of degree at most
, and a decomposition
into the hypersurface
and a collection
of cells bounded by
, such that
, and such that each cell
contains at most
points.
A proof is sketched in this previous blog post. The cells in the argument are not necessarily connected (being instead formed by intersecting together a number of semi-algebraic sets such as and
), but it is a classical result (established independently by Oleinik-Petrovskii, Milnor, and Thom) that any degree
hypersurface
divides
into
connected components, so one can easily assume that the cells are connected if desired. (Incidentally, one does not need the full machinery of the results in the above cited papers – which control not just the number of components, but all the Betti numbers of the complement of
– to get the bound on connected components; one can instead observe that every bounded connected component has a critical point where
, and one can control the number of these points by Bezout’s theorem, after perturbing
slightly to enforce genericity, and then count the unbounded components by an induction on dimension.)
Remark 1 By setting
as large as
, we obtain as a limiting case of the cell decomposition the fact that any finite set
of points in
can be captured by a hypersurface of degree
. This fact is in fact true over arbitrary fields (not just over
), and can be proven by a simple linear algebra argument (see e.g. this previous blog post). However, the cell decomposition is more flexible than this algebraic fact due to the ability to arbitrarily select the degree parameter
.
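The linear algebra argument mentioned in Remark 1 can be sketched concretely in the plane: the space of polynomials of degree at most $D$ has dimension $(D+1)(D+2)/2$, so once this exceeds the number of points, the evaluation map has a non-trivial kernel. The code below is a minimal sketch over the rationals (an assumption made for exact arithmetic), with hypothetical function names.

```python
from fractions import Fraction

def monomials(degree):
    """Exponent pairs (i, j) with i + j <= degree."""
    return [(i, j) for i in range(degree + 1) for j in range(degree + 1 - i)]

def vanishing_polynomial(points):
    """Return (monomials, coeffs) for a non-zero polynomial vanishing on the
    given points of Q^2, of the least degree D with (D+1)(D+2)/2 > len(points)."""
    degree = 0
    while (degree + 1) * (degree + 2) // 2 <= len(points):
        degree += 1
    mons = monomials(degree)
    rows = [[Fraction(x) ** i * Fraction(y) ** j for (i, j) in mons]
            for (x, y) in points]
    # Gaussian elimination to reduced row echelon form.
    pivots, r = [], 0
    for c in range(len(mons)):
        pivot = next((k for k in range(r, len(rows)) if rows[k][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]
        for k in range(len(rows)):
            if k != r and rows[k][c] != 0:
                factor = rows[k][c]
                rows[k] = [vk - factor * vr for vk, vr in zip(rows[k], rows[r])]
        pivots.append(c)
        r += 1
    # More columns than rows, so some column is free; read off a kernel vector.
    free = next(c for c in range(len(mons)) if c not in pivots)
    coeffs = [Fraction(0)] * len(mons)
    coeffs[free] = Fraction(1)
    for row_index, c in enumerate(pivots):
        coeffs[c] = -rows[row_index][free]
    return mons, coeffs

def evaluate(mons, coeffs, point):
    """Evaluate the polynomial with the given coefficients at a point."""
    x, y = point
    return sum(c * Fraction(x) ** i * Fraction(y) ** j
               for (i, j), c in zip(mons, coeffs))
```

The degree chosen here grows like the square root of the number of points, and nothing in the argument used the ordering of the field, which is why the fact holds over arbitrary fields.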
The cell decomposition can be viewed as a structural theorem for arbitrary large configurations of points in space, much as the Szemerédi regularity lemma can be viewed as a structural theorem for arbitrary large dense graphs. Indeed, just as many problems in the theory of large dense graphs can be profitably attacked by first applying the regularity lemma and then inspecting the outcome, it now seems that many problems in combinatorial incidence geometry can be attacked by applying the cell decomposition (or a similar such decomposition), with a parameter to be optimised later, to a relevant set of points, and seeing how the cells interact with each other and with the other objects in the configuration (lines, planes, circles, etc.). This strategy was spectacularly illustrated recently with Guth and Katz’s use of the cell decomposition to resolve the Erdős distinct distance problem (up to logarithmic factors), as discussed in this blog post.
In this post, I wanted to record a simpler (but still illustrative) version of this method (that I learned from Nets Katz), namely to provide yet another proof of the Szemerédi-Trotter theorem in incidence geometry:
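Theorem 2 (Szemerédi-Trotter theorem) Given a finite set of points $P$ and a finite set of lines $L$ in ${\bf R}^2$, the set of incidences $I(P,L) := \{ (p,\ell) \in P \times L: p \in \ell \}$ has cardinality
$\displaystyle |I(P,L)| \ll |P|^{2/3} |L|^{2/3} + |P| + |L|.$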
This theorem has many short existing proofs, including one via crossing number inequalities (as discussed in this previous post) or via a slightly different type of cell decomposition (as discussed here). The proof given below is not that different, in particular, from the latter proof, but I believe it still serves as a good introduction to the polynomial method in combinatorial incidence geometry.
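To get a feel for the Szemerédi-Trotter bound, one can count incidences in a standard grid configuration. The sketch below uses a well-known example (not taken from this post): the points $\{1,\ldots,k\} \times \{1,\ldots,2k^2\}$ and the lines $y = ax+b$ with $1 \leq a \leq k$ and $1 \leq b \leq k^2$. The function names are hypothetical, and the comparison quantity omits the implied constant.

```python
def grid_example(k):
    """Points {1..k} x {1..2k^2} and lines y = a*x + b with 1<=a<=k, 1<=b<=k^2."""
    points = {(x, y) for x in range(1, k + 1) for y in range(1, 2 * k * k + 1)}
    lines = [(a, b) for a in range(1, k + 1) for b in range(1, k * k + 1)]
    return points, lines

def count_incidences(points, lines):
    """Count pairs (point, line) with the point on the non-vertical line y = a*x + b."""
    xs = sorted({x for x, _ in points})
    return sum((x, a * x + b) in points for (a, b) in lines for x in xs)

def szemeredi_trotter_scale(n_points, n_lines):
    """The quantity |P|^{2/3} |L|^{2/3} + |P| + |L| (implied constant omitted)."""
    return (n_points * n_lines) ** (2 / 3) + n_points + n_lines
```

Each of the $k^3$ lines meets the grid in exactly $k$ points, giving $k^4$ incidences, while the main term $|P|^{2/3}|L|^{2/3}$ is $2^{2/3} k^4$; so in this configuration the bound is attained up to a constant.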
Combinatorial incidence geometry is the study of the possible combinatorial configurations between geometric objects such as lines and circles. One of the basic open problems in the subject has been the Erdős distance problem, posed in 1946:
Problem 1 (Erdős distance problem) Let $N$ be a large natural number. What is the least number $\# \{ |x_i - x_j|: 1 \leq i < j \leq N \}$ of distances that are determined by $N$ points $x_1,\ldots,x_N$ in the plane?
Erdős called this least number $g(N)$. For instance, one can check that $g(3)=1$ and $g(4)=2$, although the precise computation of $g$ rapidly becomes more difficult after this. By considering $N$ points in arithmetic progression, we see that $g(N) \leq N-1$. By considering the slightly more sophisticated example of a $\sqrt{N} \times \sqrt{N}$ lattice grid (assuming that $N$ is a square number for simplicity), and using some analytic number theory, one can obtain the slightly better asymptotic bound $g(N) = O( N / \sqrt{\log N} )$.
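The two upper bound constructions are easy to experiment with. The following minimal sketch counts distinct distances by comparing squared distances (which avoids floating point issues for integer points); the function names are illustrative only.

```python
def distinct_squared_distances(points):
    """Number of distinct distances among the points, via squared distances."""
    pts = list(points)
    return len({(x1 - x2) ** 2 + (y1 - y2) ** 2
                for i, (x1, y1) in enumerate(pts)
                for (x2, y2) in pts[i + 1:]})

def arithmetic_progression(n):
    """n equally spaced points on a line."""
    return [(i, 0) for i in range(n)]

def square_grid(side):
    """A side x side lattice grid, with side**2 points in total."""
    return [(i, j) for i in range(side) for j in range(side)]
```

For sixteen points, the progression determines fifteen distances while the $4 \times 4$ grid determines only nine, because many sums of two squares coincide.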
On the other hand, lower bounds are more difficult to obtain. As observed by Erdős, an easy argument, ultimately based on the incidence geometry fact that any two circles intersect in at most two points, gives the lower bound $g(N) \gg N^{1/2}$. The exponent $1/2$ has been slowly increasing over the years by a series of increasingly intricate arguments combining incidence geometry facts with other known results in combinatorial incidence geometry (most notably the Szemerédi-Trotter theorem) and also some tools from additive combinatorics; however, these methods seemed to fall quite short of getting to the optimal exponent of $1$. (Indeed, prior to last week, the best lower bound known was approximately $N^{0.8641}$, due to Katz and Tardos.)
Very recently, though, Guth and Katz have obtained a near-optimal result:
Theorem (Guth-Katz) One has $g(N) \gg N / \log N$.
The proof neatly combines together several powerful and modern tools in a new way: a recent geometric reformulation of the problem due to Elekes and Sharir; the polynomial method as used recently by Dvir, Guth, and Guth-Katz on related incidence geometry problems (and discussed previously on this blog); and the somewhat older method of cell decomposition (also discussed on this blog). A key new insight is that the polynomial method (and more specifically, the polynomial Ham Sandwich theorem, also discussed previously on this blog) can be used to efficiently create cells.
In this post, I thought I would sketch some of the key ideas used in the proof, though I will not give the full argument here (the paper itself is largely self-contained, well motivated, and of only moderate length). In particular I will not go through all the various cases of configuration types that one has to deal with in the full argument, but only some illustrative special cases.
To simplify the exposition, I will repeatedly rely on “pigeonholing cheats”. A typical such cheat: if I have $N$ objects (e.g. $N$ points or $N$ lines), each of which could be of one of two types, I will assume that either all $N$ of the objects are of the first type, or all $N$ of the objects are of the second type. (In truth, I can only assume that at least $N/2$ of the objects are of the first type, or at least $N/2$ of the objects are of the second type; but in practice, having $N/2$ instead of $N$ only ends up costing an unimportant multiplicative constant in the type of estimates used here.) A related such cheat: if one has $N$ objects $x_1,\ldots,x_N$ (again, think of $N$ points or $N$ circles), and to each object $x_i$ one can associate some natural number $k_i$ (e.g. some sort of “multiplicity” for $x_i$) that is of “polynomial size” (of size $O(N^{O(1)})$), then I will assume in fact that all the $k_i$ are in a fixed dyadic range $[k,2k]$ for some $k$. (In practice, the dyadic pigeonhole principle can only achieve this after throwing away all but about $N/\log N$ of the original $N$ objects; it is this type of logarithmic loss that eventually leads to the logarithmic factor in the main theorem.) Using the notation $X \sim Y$ to denote the assertion that $C^{-1} Y \leq X \leq C Y$ for an absolute constant $C$, we thus have $k_i \sim k$ for all $i$, thus $k_i$ is morally constant.
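The dyadic pigeonholing step can be illustrated with a small sketch. This is not code from the Guth-Katz paper: the function name `dyadic_pigeonhole` and the sample multiplicities are hypothetical, and the multiplicities are assumed to be positive integers. The sketch groups the objects by the dyadic range $[2^j, 2^{j+1})$ containing their multiplicity and keeps the most populated range, so that the surviving multiplicities agree up to a factor of two.

```python
from collections import defaultdict


def dyadic_pigeonhole(multiplicities):
    """Return (j, indices) for the most populated dyadic class.

    Object i belongs to class j when 2**j <= multiplicities[i] < 2**(j + 1).
    Assumes every multiplicity is a positive integer.
    """
    classes = defaultdict(list)
    for i, k in enumerate(multiplicities):
        if k < 1:
            raise ValueError("multiplicities must be positive integers")
        j = k.bit_length() - 1  # the unique j with 2**j <= k < 2**(j + 1)
        classes[j].append(i)
    # Pigeonhole: some class holds at least N / (number of classes) objects.
    j_best = max(classes, key=lambda j: len(classes[j]))
    return j_best, classes[j_best]


# Hypothetical multiplicities of polynomial size for N = 1000 objects.
N = 1000
ks = [1 + (i * i) % 997 for i in range(N)]
j, kept = dyadic_pigeonhole(ks)

# There are at most max(ks).bit_length() classes, i.e. O(log N) of them,
# so the kept class retains at least that fraction of the objects.
num_classes = max(ks).bit_length()
assert len(kept) * num_classes >= N
assert all(2**j <= ks[i] < 2 ** (j + 1) for i in kept)
print(f"kept {len(kept)} of {N} objects, multiplicities in [{2**j}, {2**(j + 1)})")
```

Since multiplicities of polynomial size fall into only about $\log N$ dyadic classes, the class that is kept retains roughly a $1/\log N$ fraction of the objects, which is the logarithmic loss described above.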
I will also use asymptotic notation rather loosely, to avoid cluttering the exposition with a certain amount of routine but tedious bookkeeping of constants. In particular, I will use the informal notation $X \ll Y$ or $Y \gg X$ to denote the statement that $X$ is “much less than” $Y$ or $Y$ is “much larger than” $X$, by some large constant factor.
See also Janos Pach’s recent reaction to the Guth-Katz paper on Kalai’s blog.
Below the fold is a version of my talk “Recent progress on the Kakeya conjecture” that I gave at the Fefferman conference.
Jordan Ellenberg, Richard Oberlin, and I have just uploaded to the arXiv the paper “The Kakeya set and maximal conjectures for algebraic varieties over finite fields”, submitted to Mathematika. This paper builds upon some work of Dvir and later authors on the Kakeya problem in finite fields, which I have discussed in this earlier blog post. Dvir established the following:
Kakeya set conjecture for finite fields. Let F be a finite field, and let E be a subset of $F^n$ that contains a line in every direction. Then E has cardinality at least $c_n |F|^n$ for some $c_n > 0$.
The initial argument of Dvir gave $c_n = 1/n!$. This was improved to $c_n = c^n$ for some explicit $0 < c < 1$ by Saraf and Sudan, and recently to $c_n = 1/2^n$ by Dvir, Kopparty, Saraf, and Sudan, which is within a factor 2 of the optimal result.
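In our work we investigate a somewhat different set of improvements to Dvir’s result. The first concerns the Kakeya maximal function $f^*$ of a function $f: F^n \to \mathbf{R}$, defined for all directions $\xi \in \mathbf{P}^{n-1}(F)$ in the projective hyperplane at infinity by the formula
$$ f^*(\xi) := \sup_{\ell \parallel \xi} \sum_{x \in \ell} |f(x)| $$
where the supremum ranges over all lines $\ell$ in $F^n$ oriented in the direction $\xi$. Our first result is the endpoint $L^n$ estimate for this operator, namely
Kakeya maximal function conjecture in finite fields. We have
$$ \| f^* \|_{\ell^n(\mathbf{P}^{n-1}(F))} \leq C_n |F|^{(n-1)/n} \| f \|_{\ell^n(F^n)} $$
for some constant $C_n > 0$.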
This result implies Dvir’s result, since if f is the indicator function of the set E in Dvir’s result, then $f^*(\xi) = |F|$ for every $\xi \in \mathbf{P}^{n-1}(F)$. However, it also gives information on more general sets E which do not necessarily contain a line in every direction, but instead contain a certain fraction of a line in a subset of directions. The exponents here are best possible in the sense that all other $\ell^p \to \ell^q$ mapping properties of the operator can be deduced (with bounds that are optimal up to constants) by interpolating the above estimate with more trivial estimates. This result is the finite field analogue of a long-standing (and still open) conjecture for the Kakeya maximal function in Euclidean spaces; we rely on the polynomial method of Dvir, which thus far has not extended to the Euclidean setting (but note the very interesting variant of this method by Guth that has established the endpoint multilinear Kakeya maximal function estimate in this setting, see this blog post for further discussion).
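For very small fields the maximal function can be computed by brute force, which makes the remark about indicator functions concrete. The sketch below is only an illustration under stated assumptions: it takes $F$ to be the prime field of integers mod $p$, it takes the maximal function of $f$ in a direction to be the largest sum of $|f|$ over a line in that direction, and the helper names `directions`, `line` and `kakeya_maximal` are hypothetical rather than taken from the paper.

```python
from itertools import product


def directions(p, n):
    """One representative per direction: nonzero vectors in F_p^n whose
    first nonzero coordinate is 1 (p is assumed to be prime)."""
    reps = []
    for v in product(range(p), repeat=n):
        first_nonzero = next((c for c in v if c != 0), None)
        if first_nonzero == 1:
            reps.append(v)
    return reps


def line(p, base, direction):
    """The p points base + t * direction, for t in F_p."""
    return {
        tuple((b + t * d) % p for b, d in zip(base, direction))
        for t in range(p)
    }


def kakeya_maximal(f, p, n):
    """Map each direction to the largest sum of |f| over a line in that
    direction. f is a dict on points of F_p^n; missing points count as 0."""
    result = {}
    for xi in directions(p, n):
        result[xi] = max(
            sum(abs(f.get(x, 0)) for x in line(p, base, xi))
            for base in product(range(p), repeat=n)
        )
    return result


# Indicator function of the horizontal line y = 0 in F_3^2.
f = {(0, 0): 1, (1, 0): 1, (2, 0): 1}
for xi, value in kakeya_maximal(f, 3, 2).items():
    print(xi, value)
```

For the indicator of a single line the maximal function equals $|F|$ only in the direction of that line, whereas for the indicator of a set containing a line in every direction it equals $|F|$ in every direction.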
It turns out that a direct application of the polynomial method is not sufficient to recover the full strength of the maximal function estimate; but by combining the polynomial method with the Nikishin-Maurey-Pisier-Stein “method of random rotations” (as interpreted nowadays by Stein and later by Bourgain, and originally inspired by the factorisation theorems of Nikishin, Maurey, and Pisier), one can already recover a “restricted weak type” version of the above estimate. If one then enhances the polynomial method with the “method of multiplicities” (as introduced by Saraf and Sudan) we can then recover the full “strong type” estimate; a few more details below the fold.
It turns out that one can generalise the above results to more general affine or projective algebraic varieties over finite fields. In particular, we showed
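Kakeya maximal function conjecture in algebraic varieties. Suppose that $W \subset \mathbf{P}^N$ is an (n-1)-dimensional algebraic variety. Let $d \geq 1$ be an integer. Then we have
$$ \Big\| \sup_{\gamma \ni x;\ \gamma \not\subset W} \sum_{y \in \gamma} f(y) \Big\|_{\ell^n_x(W(F))} \leq C_{n,d,N,W} |F|^{(n-1)/n} \| f \|_{\ell^n(\mathbf{P}^N(F))} $$
for some constant $C_{n,d,N,W} > 0$, where the supremum is over all irreducible algebraic curves $\gamma$ of degree at most d that pass through x but do not lie in W, and W(F) denotes the F-points of W.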
The ordinary Kakeya maximal function conjecture corresponds to the case when N=n, W is the hyperplane at infinity, and the degree d is equal to 1. One corollary of this estimate is a Dvir-type result: a subset of $\mathbf{P}^N(F)$ which contains, for each x in W, an irreducible algebraic curve of degree d passing through x but not lying in W, has cardinality
if
. (In particular this implies a lower bound for Nikodym sets worked out by Li.) The dependence of the implied constant on W is only via the degree of W.
The techniques used in the flat case can easily handle curves of higher degree (provided that we allow the implied constants to depend on d), but the method of random rotations does not seem to work directly on the algebraic variety W as there are usually no symmetries of this variety to exploit. Fortunately, we can get around this by using a “random projection trick” to “flatten” W into a hyperplane (after first expressing W as the zero locus of some polynomials, and then composing with the graphing map for such polynomials), reducing the non-flat case to the flat case.
Below the fold, I wish to sketch two of the key ingredients in our arguments, the random rotations method and the random projections trick. (We of course also use some algebraic geometry, but mostly low-tech stuff, on the level of Bezout’s theorem, though we do need one non-trivial result of Kleiman (from SGA6), that asserts that bounded degree varieties can be cut out by a bounded number of polynomials of bounded degree.)
[Update, March 14: See also Jordan’s own blog post on our paper.]
One of my favourite families of conjectures (and one that has preoccupied a significant fraction of my own research) is the family of Kakeya conjectures in geometric measure theory and harmonic analysis. There are many (not quite equivalent) conjectures in this family. The cleanest one to state is the set conjecture:
Kakeya set conjecture: Let $n \geq 1$, and let $E \subset \mathbf{R}^n$ contain a unit line segment in every direction (such sets are known as Kakeya sets or Besicovitch sets). Then E has Hausdorff dimension and Minkowski dimension equal to n.
One reason why I find these conjectures fascinating is the sheer variety of mathematical fields that arise both in the partial results towards this conjecture, and in the applications of those results to other problems. See for instance this survey of Wolff, my Notices article and this article of Łaba on the connections between this problem and other problems in Fourier analysis, PDE, and additive combinatorics; there have even been some connections to number theory and to cryptography. At the other end of the pipeline, the mathematical tools that have gone into the proofs of various partial results have included:
- Maximal functions, covering lemmas, $L^2$ methods (Cordoba, Strömberg, Cordoba-Fefferman);
- Fourier analysis (Nagel-Stein-Wainger);
- Multilinear integration (Drury, Christ);
- Paraproducts (Katz);
- Combinatorial incidence geometry (Bourgain, Wolff);
- Multi-scale analysis (Barrionuevo, Katz-Łaba-Tao, Łaba-Tao, Alfonseca-Soria-Vargas);
- Probabilistic constructions (Bateman-Katz, Bateman);
- Additive combinatorics and graph theory (Bourgain, Katz-Łaba-Tao, Katz-Tao, Katz-Tao);
- Sum-product theorems (Bourgain-Katz-Tao);
- Bilinear estimates (Tao-Vargas-Vega);
- Perron trees (Perron, Schoenberg, Keich);
- Group theory (Katz);
- Low-degree algebraic geometry (Schlag, Tao, Mockenhaupt-Tao);
- High-degree algebraic geometry (Dvir, Saraf-Sudan);
- Heat flow monotonicity formulae (Bennett-Carbery-Tao).
[This list is not exhaustive.]
Very recently, I was pleasantly surprised to see yet another mathematical tool used to obtain new progress on the Kakeya conjecture, namely (a generalisation of) the famous Ham Sandwich theorem from algebraic topology. This was recently used by Guth to establish a certain endpoint multilinear Kakeya estimate left open by the work of Bennett, Carbery, and myself. With regards to the Kakeya set conjecture, Guth’s arguments assert, roughly speaking, that the only Kakeya sets that can fail to have full dimension are those which obey a certain “planiness” property, which informally means that the line segments that pass through a typical point in the set must be essentially coplanar. (This property first surfaced in my paper with Katz and Łaba.) Guth’s arguments can be viewed as a partial analogue of Dvir’s arguments in the finite field setting (which I discussed in this blog post) to the Euclidean setting; in particular, both arguments rely crucially on the ability to create a polynomial of controlled degree that vanishes at or near a large number of points. Unfortunately, while these arguments fully settle the Kakeya conjecture in the finite field setting, it appears that some new ideas are still needed to finish off the problem in the Euclidean setting. Nevertheless this is an interesting new development in the long history of this conjecture, in particular demonstrating that the polynomial method can be successfully applied to continuous Euclidean problems (i.e. it is not confined to the finite field setting).
In this post I would like to sketch some of the key ideas in Guth’s paper, in particular the role of the Ham Sandwich theorem (or more precisely, a polynomial generalisation of this theorem first observed by Gromov).
One of my favourite unsolved problems in mathematics is the Kakeya conjecture in geometric measure theory. This conjecture is descended from the
Kakeya needle problem. (1917) What is the least area in the plane required to continuously rotate a needle of unit length and zero thickness around completely (i.e. by $360^\circ$)?
For instance, one can rotate a unit needle inside a unit disk, which has area $\pi/4$. By using a deltoid one requires only $\pi/8$ area.
In 1928, Besicovitch showed that in fact one could rotate a unit needle using an arbitrarily small amount of positive area. This unintuitive fact was a corollary of two observations. The first, which is easy, is that one can translate a needle using arbitrarily small area, by sliding the needle along the direction it points in for a long distance (which costs zero area), turning it slightly (costing a small amount of area), sliding back, and then undoing the turn. The second fact, which is less obvious, can be phrased as follows. Define a Kakeya set in $\mathbf{R}^2$ to be any set which contains a unit line segment in each direction. (See this Java applet of mine, or the Wikipedia page, for some pictures of such sets.)
Theorem. (Besicovitch, 1919) There exist Kakeya sets in $\mathbf{R}^2$ of arbitrarily small area (or more precisely, Lebesgue measure).
In fact, one can construct such sets with zero Lebesgue measure. On the other hand, it was shown by Davies that even though these sets had zero area, they were still necessarily two-dimensional (in the sense of either Hausdorff dimension or Minkowski dimension). This led to an analogous conjecture in higher dimensions:
Kakeya conjecture. A Besicovitch set in $\mathbf{R}^n$ (i.e. a subset of $\mathbf{R}^n$ that contains a unit line segment in every direction) has Minkowski and Hausdorff dimension equal to n.
This conjecture remains open in dimensions three and higher (and gets more difficult as the dimension increases), although many partial results are known. For instance, when n=3, it is known that Besicovitch sets have Hausdorff dimension at least 5/2 and (upper) Minkowski dimension at least $5/2 + 10^{-10}$. See my Notices article for a general survey of this problem (and its connections with Fourier analysis, additive combinatorics, and PDE), my paper with Katz for a more technical survey, and Wolff’s survey for a systematic treatment of the field (up to about 1998 or so).
In 1999, Wolff proposed a simpler finite field analogue of the Kakeya conjecture as a model problem that avoided all the technical issues involving Minkowski and Hausdorff dimension. If $F^n$ is a vector space over a finite field F, define a Kakeya set to be a subset of $F^n$ which contains a line in every direction.
Finite field Kakeya conjecture. Let $E \subset F^n$ be a Kakeya set. Then E has cardinality at least $c_n |F|^n$, where $c_n > 0$ depends only on n.
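To make the finite field definition concrete, here is a minimal brute-force sketch that tests whether a set contains a line in every direction. It is an illustration only and plays no part in Dvir's argument: it assumes F is the prime field of integers mod $p$, and the function name `is_kakeya` and the example set are hypothetical choices made for the demonstration.

```python
from itertools import product


def is_kakeya(E, p, n):
    """Check whether E (a collection of points of F_p^n, p prime)
    contains a full line in every direction."""
    E = set(E)
    points = list(product(range(p), repeat=n))
    for d in points:
        if not any(d):
            continue  # the zero vector is not a direction
        # Look for a base point whose whole line in direction d lies in E.
        # Proportional vectors d give the same lines, so this repeats some
        # work, which is harmless for a small demonstration.
        found = any(
            all(
                tuple((b + t * c) % p for b, c in zip(base, d)) in E
                for t in range(p)
            )
            for base in points
        )
        if not found:
            return False
    return True


# A hand-picked 7-point subset of F_3^2: the lines y = 0 and x = 0,
# the line through (1, 0) in direction (1, 1), and the line through
# (2, 0) in direction (1, 2).
E = {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (2, 1), (1, 1)}
print(is_kakeya(E, 3, 2))
print(is_kakeya(E - {(0, 0)}, 3, 2))
```

The search runs over all base points and all nonzero direction vectors, so it is only practical for very small $p$ and $n$; its purpose is to spell out what “a line in every direction” means, not to probe the conjecture numerically.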
This conjecture has had a significant influence in the subject, in particular inspiring work on the sum-product phenomenon in finite fields, which has since proven to have many applications in number theory and computer science. Modulo minor technicalities, the progress on the finite field Kakeya conjecture was, until very recently, essentially the same as that of the original “Euclidean” Kakeya conjecture.
Last week, the finite field Kakeya conjecture was proven using a beautifully simple argument by Zeev Dvir, using the polynomial method in algebraic extremal combinatorics. The proof is so short that I can present it in full here.