A capset in the vector space ${\bf F}_3^n$ over the finite field ${\bf F}_3$ of three elements is a subset $A$ of ${\bf F}_3^n$ that does not contain any lines $\{x, x+r, x+2r\}$, where $x, r \in {\bf F}_3^n$ and $r \neq 0$. A basic problem in additive combinatorics (discussed in one of the very first posts on this blog) is to obtain good upper and lower bounds for the maximal size of a capset in ${\bf F}_3^n$.
Trivially, one has $|A| \leq 3^n$. Using Fourier methods (and the density increment argument of Roth), the bound of $|A| \leq O(3^n/n)$ was obtained by Meshulam, and improved only as late as 2012 to $|A| \leq O(3^n/n^{1+c})$ for some absolute constant $c > 0$ by Bateman and Katz. But in a very recent breakthrough, Ellenberg (and independently Gijswijt) obtained the exponentially superior bound $|A| \leq O(2.756^n)$, using a version of the polynomial method recently introduced by Croot, Lev, and Pach. (In the converse direction, a construction of Edel gives capsets as large as $(2.2174)^n$.) Given the success of the polynomial method in superficially similar problems such as the finite field Kakeya problem (discussed in this previous post), it was natural to wonder whether this method could be applicable to the cap set problem (see for instance this MathOverflow comment of mine on this from 2010), but it took a surprisingly long time before Croot, Lev, and Pach were able to identify the precise variant of the polynomial method that would actually work here.
The proof of the capset bound is very short (Ellenberg's and Gijswijt's preprints are both 3 pages long, and Croot-Lev-Pach is 6 pages), but I thought I would present a slight reformulation of the argument which treats the three points on a line in ${\bf F}_3^n$ symmetrically (as opposed to treating the third point differently from the first two, as is done in the Ellenberg and Gijswijt papers; Croot-Lev-Pach also treat the middle point of a three-term arithmetic progression differently from the two endpoints, although this is a very natural thing to do in their context of ${\bf Z}_4^n$). The basic starting point is this: if $A$ is a capset, then one has the identity

$$\delta_{0^n}(x+y+z) = \sum_{a \in A} \delta_a(x) \delta_a(y) \delta_a(z) \ \ \ \ \ (1)$$

for all $(x,y,z) \in A^3$, where $\delta_a$ is the Kronecker delta function, which we view as taking values in ${\bf F}_3$. Indeed, (1) reflects the fact that the equation $x+y+z=0$ has solutions precisely when $x,y,z$ are either all equal, or form a line, and the latter is ruled out precisely when $A$ is a capset.
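As a quick sanity check on the dichotomy behind (1), one can verify by brute force for small $n$ that every solution of $x+y+z=0$ in ${\bf F}_3^n$ is either a constant triple or a line; this is a minimal sketch (the helper names are my own, not from any of the papers):

```python
from itertools import product

def solutions(n):
    """All triples (x, y, z) in (F_3^n)^3 with x + y + z = 0 componentwise mod 3."""
    pts = list(product(range(3), repeat=n))
    return [(x, y, z) for x in pts for y in pts for z in pts
            if all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z))]

def is_line_or_point(x, y, z):
    """True iff x, y, z are all equal, or form a line {a, a+r, a+2r} with r != 0."""
    if x == y == z:
        return True
    # In characteristic 3, x + y + z = 0 forces x, y, z to be an arithmetic
    # progression; a genuine line needs a non-zero common difference r.
    r = tuple((b - a) % 3 for a, b in zip(x, y))
    return any(r) and z == tuple((b + s) % 3 for b, s in zip(y, r))

# Every solution of x + y + z = 0 is a constant triple or a line (n = 1, 2):
for n in (1, 2):
    assert all(is_line_or_point(x, y, z) for x, y, z in solutions(n))
```

In particular, restricting to a capset $A$ removes all the lines, leaving only the diagonal solutions counted on the right-hand side of (1).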
To exploit (1), we will show that the left-hand side of (1) is “low rank” in some sense, while the right-hand side is “high rank”. Recall that a function $F: X \times Y \rightarrow {\bf F}$ taking values in a field ${\bf F}$ is of rank one if it is non-zero and of the form $(x,y) \mapsto f(x) g(y)$ for some $f: X \rightarrow {\bf F}$ and $g: Y \rightarrow {\bf F}$, and that the rank of a general function $F: X \times Y \rightarrow {\bf F}$ is the least number of rank one functions needed to express $F$ as a linear combination. More generally, if $k \geq 2$, we define the rank of a function $F: X_1 \times \dots \times X_k \rightarrow {\bf F}$ to be the least number of “rank one” functions of the form

$$(x_1,\dots,x_k) \mapsto f(x_i) g(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k)$$

for some $i = 1,\dots,k$ and some functions $f: X_i \rightarrow {\bf F}$, $g: X_1 \times \dots \times X_{i-1} \times X_{i+1} \times \dots \times X_k \rightarrow {\bf F}$, that are needed to generate $F$ as a linear combination. For instance, when $k=3$, the rank one functions take the form $(x,y,z) \mapsto f(x) g(y,z)$, $(x,y,z) \mapsto f(y) g(x,z)$, $(x,y,z) \mapsto f(z) g(x,y)$, and linear combinations of $r$ such rank one functions will give a function of rank at most $r$.
It is a standard fact in linear algebra that the rank of a diagonal matrix is equal to the number of non-zero entries. This phenomenon extends to higher dimensions:
Lemma 1 (Rank of diagonal hypermatrices) Let $k \geq 2$, let $A$ be a finite set, let ${\bf F}$ be a field, and for each $a \in A$, let $c_a \in {\bf F}$ be a coefficient. Then the rank of the function

$$(x_1,\dots,x_k) \mapsto \sum_{a \in A} c_a \delta_a(x_1) \dots \delta_a(x_k) \ \ \ \ \ (2)$$

is equal to the number of non-zero coefficients $c_a$.

Proof: We induct on $k$. As mentioned above, the case $k=2$ follows from standard linear algebra, so suppose now that $k > 2$ and the claim has already been proven for $k-1$.
It is clear that the function (2) has rank at most equal to the number of non-zero coefficients $c_a$ (since the summands on the right-hand side are rank one functions), so it suffices to establish the lower bound. By deleting from $A$ those elements $a \in A$ with $c_a = 0$ (which cannot increase the rank), we may assume without loss of generality that all the $c_a$ are non-zero. Now suppose for contradiction that (2) has rank at most $|A|-1$, then we obtain a representation

$$\sum_{a \in A} c_a \delta_a(x_1) \dots \delta_a(x_k) = \sum_{i=1}^k \sum_{\alpha \in I_i} f_{i,\alpha}(x_i) g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k) \ \ \ \ \ (3)$$

for some sets $I_1,\dots,I_k$ of cardinalities adding up to at most $|A|-1$, and some functions $f_{i,\alpha}: A \rightarrow {\bf F}$ and $g_{i,\alpha}: A^{k-1} \rightarrow {\bf F}$.
Consider the space of functions $h: A \rightarrow {\bf F}$ that are orthogonal to all the $f_{k,\alpha}$, $\alpha \in I_k$, in the sense that

$$\sum_{x \in A} f_{k,\alpha}(x) h(x) = 0$$

for all $\alpha \in I_k$. This space is a vector space whose dimension $d$ is at least $|A| - |I_k|$. A basis of this space generates a $d \times |A|$ coordinate matrix of full rank, which implies that there is at least one non-singular $d \times d$ minor. This implies that there exists a function $h: A \rightarrow {\bf F}$ in this space which is nowhere vanishing on some subset $A'$ of $A$ of cardinality at least $|A| - |I_k|$.
If we multiply (3) by $h(x_k)$ and sum in $x_k$, we conclude that

$$\sum_{a \in A} c_a h(a) \delta_a(x_1) \dots \delta_a(x_{k-1}) = \sum_{i=1}^{k-1} \sum_{\alpha \in I_i} f_{i,\alpha}(x_i) \tilde g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_{k-1})$$

where

$$\tilde g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_{k-1}) := \sum_{x_k \in A} g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k) h(x_k).$$

The right-hand side has rank at most $|A| - 1 - |I_k|$, since the summands are rank one functions. On the other hand, from the induction hypothesis the left-hand side has rank at least $|A| - |I_k|$ (since the coefficients $c_a h(a)$ are non-zero for all $a$ in $A'$), giving the required contradiction. $\Box$
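The $k=2$ base case invoked at the start of the proof (the rank of a diagonal matrix equals its number of non-zero entries) can be sanity-checked numerically; a minimal sketch over the reals using numpy, with coefficient values of my own choosing:

```python
import numpy as np

# k = 2 case of Lemma 1: the function sum_a c_a delta_a(x) delta_a(y) is the
# diagonal matrix diag(c_1, ..., c_m), whose rank is #{a : c_a != 0}.
coeffs = np.array([3.0, 0.0, -1.0, 2.0, 0.0])
D = np.diag(coeffs)
rank = np.linalg.matrix_rank(D)
nonzero = int(np.count_nonzero(coeffs))
assert rank == nonzero == 3
```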
On the other hand, we have the following (symmetrised version of a) beautifully simple observation of Croot, Lev, and Pach:
Lemma 2 On $({\bf F}_3^n)^3$, the rank of the function $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ is at most $3N$, where

$$N := \sum_{a+b+c=n; b+2c \leq 2n/3} \frac{n!}{a! b! c!}.$$

Proof: Using the identity $\delta_0(x) = 1 - x^2$ for $x \in {\bf F}_3$, we have

$$\delta_{0^n}(x+y+z) = \prod_{i=1}^n \left( 1 - (x_i + y_i + z_i)^2 \right).$$

The right-hand side is clearly a polynomial of degree $2n$ in $x,y,z$, which is then a linear combination of monomials

$$x_1^{r_1} \dots x_n^{r_n} y_1^{s_1} \dots y_n^{s_n} z_1^{t_1} \dots z_n^{t_n}$$

with $r_1,\dots,r_n,s_1,\dots,s_n,t_1,\dots,t_n \in \{0,1,2\}$ with

$$r_1 + \dots + r_n + s_1 + \dots + s_n + t_1 + \dots + t_n \leq 2n.$$

In particular, from the pigeonhole principle, at least one of $r_1+\dots+r_n$, $s_1+\dots+s_n$, $t_1+\dots+t_n$ is at most $2n/3$.
Consider the contribution of the monomials for which $r_1+\dots+r_n \leq 2n/3$. We can regroup this contribution as

$$\sum_\alpha f_\alpha(x) g_\alpha(y,z)$$

where $\alpha$ ranges over those $(r_1,\dots,r_n) \in \{0,1,2\}^n$ with $r_1+\dots+r_n \leq 2n/3$, $f_\alpha(x)$ is the monomial $x_1^{r_1} \dots x_n^{r_n}$, and $g_\alpha(y,z)$ is some explicitly computable function whose exact form will not be of relevance to our argument. The number of such $\alpha$ is equal to $N$ (group the exponent vectors according to the numbers $a, b, c$ of indices $i$ with $r_i = 0, 1, 2$ respectively), so this contribution has rank at most $N$. The remaining contributions arising from the cases $s_1+\dots+s_n \leq 2n/3$ and $t_1+\dots+t_n \leq 2n/3$ similarly have rank at most $N$ (grouping the monomials so that each monomial is only counted once), so the claim follows. $\Box$
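The counting step in the proof, namely that the multinomial sum defining $N$ agrees with the number of exponent vectors $(r_1,\dots,r_n) \in \{0,1,2\}^n$ of total degree at most $2n/3$, can be checked directly for small $n$; a brute-force sketch (function names are mine):

```python
from itertools import product
from math import factorial

def N(n):
    """N = sum of n!/(a! b! c!) over a + b + c = n with b + 2c <= 2n/3."""
    total = 0
    for b in range(n + 1):
        for c in range(n - b + 1):
            if 3 * (b + 2 * c) <= 2 * n:          # integer-safe b + 2c <= 2n/3
                a = n - b - c
                total += factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return total

def monomial_count(n):
    """Number of exponent vectors (r_1, ..., r_n) in {0,1,2}^n with sum <= 2n/3."""
    return sum(1 for r in product(range(3), repeat=n) if 3 * sum(r) <= 2 * n)

# The multinomial formula agrees with the direct monomial count:
for n in range(1, 9):
    assert N(n) == monomial_count(n)
```

(The grouping is by the number $a, b, c$ of coordinates with exponent $0, 1, 2$ respectively, each group contributing a multinomial coefficient $\frac{n!}{a!b!c!}$.)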
Upon restricting from $({\bf F}_3^n)^3$ to $A^3$, the rank of $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ is still at most $3N$. The two lemmas then combine to give the Ellenberg-Gijswijt bound

$$|A| \leq 3N.$$

All that remains is to compute the asymptotic behaviour of $N$. This can be done using the general tool of Cramer’s theorem, but can also be derived from Stirling’s formula (discussed in this previous post). Indeed, if $a = (\alpha + o(1)) n$, $b = (\beta + o(1)) n$, $c = (\gamma + o(1)) n$ for some $\alpha, \beta, \gamma \geq 0$ summing to $1$, Stirling’s formula gives

$$\frac{n!}{a! b! c!} = e^{(h(\alpha,\beta,\gamma) + o(1)) n}$$

where $h$ is the entropy function

$$h(\alpha,\beta,\gamma) := \alpha \log \frac{1}{\alpha} + \beta \log \frac{1}{\beta} + \gamma \log \frac{1}{\gamma}.$$

We then have

$$N = e^{(h_{\max} + o(1)) n}$$

where $h_{\max}$ is the maximum entropy $h(\alpha,\beta,\gamma)$ subject to the constraints

$$\alpha + \beta + \gamma = 1; \quad \alpha, \beta, \gamma \geq 0; \quad \beta + 2\gamma \leq \frac{2n}{3n} = \frac{2}{3}.$$

A routine Lagrange multiplier computation shows that the maximum occurs on the boundary $\beta + 2\gamma = \frac{2}{3}$ at the point where $\beta^2 = \alpha \gamma$ (that is, $\gamma = \frac{9 - \sqrt{33}}{18}$), and $e^{h_{\max}}$ is approximately $2.755$, giving rise to the claimed bound of $O(2.756^n)$.
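The entropy maximisation can also be done numerically; here is a brute-force sketch in pure Python (variable names are mine), searching the boundary $\beta + 2\gamma = \frac{2}{3}$ of the constraint region and comparing against the closed-form critical point $\beta^2 = \alpha\gamma$:

```python
from math import log, exp, sqrt

def entropy(a, b, c):
    """h(alpha, beta, gamma) = sum of -t log t over the three weights."""
    return sum(-t * log(t) for t in (a, b, c) if t > 0)

# Maximise h subject to alpha + beta + gamma = 1 and beta + 2*gamma <= 2/3.
# The unconstrained maximum (1/3, 1/3, 1/3) violates the constraint, so the
# maximum lies on the boundary beta + 2*gamma = 2/3; grid-search over it.
best = 0.0
steps = 200000
for i in range(steps + 1):
    g = (1.0 / 3.0) * i / steps          # gamma ranges over [0, 1/3]
    b = 2.0 / 3.0 - 2.0 * g              # boundary beta + 2*gamma = 2/3
    a = 1.0 - b - g
    best = max(best, entropy(a, b, g))

# Closed form from the Lagrange condition beta^2 = alpha * gamma:
gamma = (9 - sqrt(33)) / 18
beta = 2.0 / 3.0 - 2.0 * gamma
alpha = 1.0 - beta - gamma
h_max = entropy(alpha, beta, gamma)

assert abs(best - h_max) < 1e-8          # grid search matches the critical point
assert abs(exp(h_max) - 2.755) < 1e-3    # exp(h_max) is approximately 2.755
```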
Remark 3 As noted in the Ellenberg and Gijswijt papers, the above argument extends readily to other fields than ${\bf F}_3$, to control the maximal size of a subset of ${\bf F}_q^n$ that has no non-trivial solutions to the equation $c_1 x_1 + \dots + c_k x_k = 0$, where $c_1,\dots,c_k \in {\bf F}_q$ are non-zero constants that sum to zero. Of course one replaces the function $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ in Lemma 2 by $(x_1,\dots,x_k) \mapsto \delta_{0^n}(c_1 x_1 + \dots + c_k x_k)$ in this case.
Remark 4 This symmetrised formulation suggests one possible way to improve slightly on the numerical quantity $2.756$: finding a more efficient way to decompose $\delta_{0^n}(x+y+z)$ into rank one functions. However, I was not able to do so (though such improvements are reminiscent of the Strassen type algorithms for fast matrix multiplication).
Remark 5 It is tempting to see if this method can get non-trivial upper bounds for sets $A$ with no length $4$ progressions, in (say) ${\bf F}_5^n$. One can run the above arguments, replacing the function $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ with

$$(x_1,x_2,x_3,x_4) \mapsto \delta_{0^n}(x_1 - 2x_2 + x_3) \delta_{0^n}(x_2 - 2x_3 + x_4);$$

this leads to the bound $|A| \leq 4N$ where

$$N := \sum_{a+b+c+d+e = n; b + 2c + 3d + 4e \leq 2n} \frac{n!}{a! b! c! d! e!}.$$

Unfortunately, $N$ is asymptotic to $\frac{1}{2} 5^n$ and so this bound is in fact slightly worse than the trivial bound $|A| \leq 5^n$! However, there is a slim chance that there is a more efficient way to decompose the above function into rank one functions that would give a non-trivial bound on $|A|$. I experimented with a few possible such decompositions but unfortunately without success.
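The failure in this remark is easy to see numerically: over ${\bf F}_5$ one has $\delta_0(x) = 1 - x^4$, so the exponents lie in $\{0,\dots,4\}$ with mean $2$, and roughly half of all $5^n$ monomials in a given variable have degree at most $2n$. A small brute-force sketch (function name is mine):

```python
from itertools import product

def N5(n):
    """Number of exponent vectors in {0,...,4}^n with total degree <= 2n;
    equivalently, the multinomial sum over a+b+c+d+e = n, b+2c+3d+4e <= 2n."""
    return sum(1 for r in product(range(5), repeat=n) if sum(r) <= 2 * n)

# Since each exponent averages 2, about half of all 5^n monomials qualify,
# so N ~ 5^n / 2 and the resulting bound 4*N exceeds the trivial bound 5^n:
for n in range(1, 8):
    assert 4 * N5(n) > 5 ** n
```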
Remark 6 Return now to the capset problem. Since Lemma 1 is valid for any field ${\bf F}$, one could perhaps hope to get better bounds by viewing the Kronecker delta function $\delta$ as taking values in another field than ${\bf F}_3$, such as the complex numbers ${\bf C}$. However, as soon as one works in a field of characteristic other than $3$, one can adjoin a cube root $\omega$ of unity, and one now has the Fourier decomposition

$$\delta_{0^n}(x+y+z) = \frac{1}{3^n} \sum_{\xi \in {\bf F}_3^n} \omega^{\xi \cdot x} \omega^{\xi \cdot y} \omega^{\xi \cdot z}.$$

Moving to the Fourier basis, we conclude from Lemma 1 that the function $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ on $({\bf F}_3^n)^3$ now has rank exactly $3^n$, and so one cannot improve upon the trivial bound of $3^n$ by this method using fields of characteristic other than three as the range field. So it seems one has to stick with ${\bf F}_3$ (or the algebraic closure thereof).
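The Fourier decomposition itself is elementary to verify over ${\bf C}$; a numerical check for $n=1$ (helper name is mine):

```python
import cmath

# Over C, adjoin a primitive cube root of unity and expand the delta function:
# delta_0(x + y + z) = 3^{-1} * sum_{xi in F_3} omega^(xi*x) omega^(xi*y) omega^(xi*z)
omega = cmath.exp(2j * cmath.pi / 3)

def delta_fourier(x, y, z):
    return sum(omega ** (xi * x) * omega ** (xi * y) * omega ** (xi * z)
               for xi in range(3)) / 3

# The sum collapses to 1 when x + y + z = 0 mod 3 and to 0 otherwise:
for x in range(3):
    for y in range(3):
        for z in range(3):
            expected = 1.0 if (x + y + z) % 3 == 0 else 0.0
            assert abs(delta_fourier(x, y, z) - expected) < 1e-12
```

Each Fourier character $\omega^{\xi \cdot x} \omega^{\xi \cdot y} \omega^{\xi \cdot z}$ is a product of single-variable functions, so this decomposition exhibits the full rank $3^n$ structure referred to in the remark.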
Thanks to Jordan Ellenberg and Ben Green for helpful discussions.
I’ve just uploaded to the arXiv my paper “Equivalence of the logarithmically averaged Chowla and Sarnak conjectures“, submitted to the Festschrift “Number Theory – Diophantine problems, uniform distribution and applications” in honour of Robert F. Tichy. This paper is a spinoff of my previous paper establishing a logarithmically averaged version of the Chowla (and Elliott) conjectures in the two-point case. In that paper, the estimate

$$\sum_{n \leq x} \frac{\lambda(n) \lambda(n+h)}{n} = o(\log x)$$

as $x \rightarrow \infty$ was demonstrated, where $h$ was any positive integer and $\lambda$ denoted the Liouville function. The proof proceeded using a method I call the “entropy decrement argument”, which ultimately reduced matters to establishing a bound of the form

$$\sum_{h \leq H} \left| \sum_{n \leq X} \lambda(n) \lambda(n+h) \right| = o(HX)$$

whenever $H$ was a slowly growing function of $X$. This was in turn established in a previous paper of Matomaki, Radziwill, and myself, using the recent breakthrough of Matomaki and Radziwill.
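As a purely numerical illustration (not part of the paper's argument; the sieve and variable names are mine), one can compute the logarithmically averaged two-point correlation of the Liouville function for modest $x$ and observe that it is small compared to $\log x$:

```python
from math import log

def liouville_sieve(limit):
    """lam[n] = (-1)^Omega(n) for 1 <= n <= limit, via a smallest-prime-factor sieve."""
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:                      # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for m in range(2, limit + 1):
        lam[m] = -lam[m // spf[m]]           # removing one prime factor flips the sign
    return lam

x = 10 ** 5
lam = liouville_sieve(x + 1)
# Logarithmically averaged two-point correlation with shift h = 1:
corr = sum(lam[n] * lam[n + 1] / n for n in range(1, x + 1))
# The theorem asserts this quantity is o(log x) as x -> infinity.
```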
It is natural to see to what extent the arguments can be adapted to attack the higher-point cases of the logarithmically averaged Chowla conjecture (ignoring for this post the more general Elliott conjecture for other bounded multiplicative functions than the Liouville function). That is to say, one would like to prove that

$$\sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = o(\log x)$$

as $x \rightarrow \infty$ for any fixed distinct integers $h_1,\dots,h_k$. As it turns out (and as is detailed in the current paper), the entropy decrement argument extends to this setting (after using some known facts about linear equations in primes), and allows one to reduce the above estimate to an estimate of the form

$$\frac{1}{X} \int_X^{2X} \| \lambda \|_{U^d([x,x+H])} \ dx = o(1)$$

for a slowly growing function $H$ of $X$ and some fixed $d$ (in fact we can take $d := k-1$ for the $k$-point conjecture), where $\| \lambda \|_{U^d([x,x+H])}$ is the (normalised) local Gowers uniformity norm. (In the case $k = 3$, $d = 2$, this becomes the Fourier-uniformity conjecture discussed in this previous post.) If one then applies the (now proven) inverse conjecture for the Gowers norms, this estimate is in turn equivalent to the more complicated looking assertion

$$\frac{1}{X} \int_X^{2X} \sup \left| \frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) F(g(n) \Gamma) \right| \ dx = o(1) \ \ \ \ \ (1)$$

where the supremum is over all possible choices of nilsequences $n \mapsto F(g(n) \Gamma)$ of controlled step and complexity (see the paper for definitions of these terms).
The main novelty in the paper (elaborating upon a previous comment I had made on this blog) is to observe that this latter estimate in turn follows from the logarithmically averaged form of Sarnak’s conjecture (discussed in this previous post), namely that

$$\sum_{n \leq x} \frac{\lambda(n) f(n)}{n} = o(\log x)$$

whenever $f: {\bf N} \rightarrow {\bf C}$ is a bounded zero entropy (i.e. deterministic) sequence. Morally speaking, this follows from the well-known fact that nilsequences have zero entropy, but the presence of the supremum in (1) means that we need a little bit more; roughly speaking, we need the class of nilsequences of a given step and complexity to have “uniformly zero entropy” in some sense.
On the other hand, it was already known (see previous post) that the Chowla conjecture implied the Sarnak conjecture, and similarly for the logarithmically averaged form of the two conjectures. Putting all these implications together, we obtain the pleasant fact that the logarithmically averaged Sarnak and Chowla conjectures are equivalent, which is the main result of the current paper. There have been a large number of special cases of the Sarnak conjecture worked out (when the deterministic sequence involved came from a special dynamical system), so these results can now also be viewed as partial progress towards the Chowla conjecture (at least with logarithmic averaging). However, my feeling is that the full resolution of these conjectures will not come from these sorts of special cases; instead, conjectures like the Fourier-uniformity conjecture in this previous post look more promising to attack.
It would also be nice to get rid of the pesky logarithmic averaging, but this seems to be an inherent requirement of the entropy decrement argument method, so one would probably have to find a way to avoid that argument if one were to remove the log averaging.
When teaching mathematics, the traditional method of lecturing in front of a blackboard is still hard to improve upon, despite all the advances in modern technology. However, there are some nice things one can do in an electronic medium, such as this blog. Here, I would like to experiment with the ability to animate images, which I think can convey some mathematical concepts in ways that cannot be easily replicated by traditional static text and images. Given that many readers may find these animations annoying, I am placing the rest of the post below the fold.
Over the last few years, a large group of mathematicians have been developing an online database to systematically collect the known facts, numerical data, and algorithms concerning some of the most central types of objects in modern number theory, namely the L-functions associated to various number fields, curves, and modular forms, as well as further data about these modular forms. This of course includes the most famous examples of L-functions and modular forms respectively, namely the Riemann zeta function $\zeta(s)$ and the discriminant modular form $\Delta(q) = q \prod_{n=1}^\infty (1 - q^n)^{24}$, but there are countless other examples of both. The connections between these classes of objects lie at the heart of the Langlands programme.
As of today, the “L-functions and modular forms database” is now out of beta, and open to the public; at present the database is mostly geared towards specialists in computational number theory, but will hopefully develop into a more broadly useful resource over time. An article by John Cremona summarising the purpose of the database can be found here.
(Thanks to Andrew Sutherland and Kiran Kedlaya for the information.)