A *capset* in the vector space over the finite field of three elements is a subset of that does not contain any lines , where and . A basic problem in additive combinatorics (discussed in one of the very first posts on this blog) is to obtain good upper and lower bounds for the maximal size of a capset in .

Trivially, one has . Using Fourier methods (and the density increment argument of Roth), the bound of was obtained by Meshulam, and improved only as late as 2012 to for some absolute constant by Bateman and Katz. But in a very recent breakthrough, Ellenberg (and independently Gijswijt) obtained the exponentially superior bound , using a version of the polynomial method recently introduced by Croot, Lev, and Pach. (In the converse direction, a construction of Edel gives capsets as large as .) Given the success of the polynomial method in superficially similar problems such as the finite field Kakeya problem (discussed in this previous post), it was natural to wonder that this method could be applicable to the cap set problem (see for instance this MathOverflow comment of mine on this from 2010), but it took a surprisingly long time before Croot, Lev, and Pach were able to identify the precise variant of the polynomial method that would actually work here.

The proof of the capset bound is very short (Ellenberg’s and Gijswijt’s preprints are both 3 pages long, and Croot-Lev-Pach is 6 pages), but I thought I would present a slight reformulation of the argument which treats the three points on a line in symmetrically (as opposed to treating the third point differently from the first two, as is done in the Ellenberg and Gijswijt papers; Croot-Lev-Pach also treat the middle point of a three-term arithmetic progression differently from the two endpoints, although this is a very natural thing to do in their context of ). The basic starting point is this: if is a capset, then one has the identity

for all , where is the Kronecker delta function, which we view as taking values in . Indeed, (1) reflects the fact that the equation has solutions precisely when are either all equal, or form a line, and the latter is ruled out precisely when is a capset.

To exploit (1), we will show that the left-hand side of (1) is “low rank” in some sense, while the right-hand side is “high rank”. Recall that a function taking values in a field is of *rank one* if it is non-zero and of the form for some , and that the rank of a general function is the least number of rank one functions needed to express as a linear combination. More generally, if , we define the *rank* of a function to be the least number of “rank one” functions of the form

for some and some functions , , that are needed to generate as a linear combination. For instance, when , the rank one functions take the form , , , and linear combinations of such rank one functions will give a function of rank at most .

It is a standard fact in linear algebra that the rank of a diagonal matrix is equal to the number of non-zero entries. This phenomenon extends to higher dimensions:

Lemma 1 (Rank of diagonal hypermatrices)Let , let be a finite set, let be a field, and for each , let be a coefficient. Then the rank of the function

*Proof:* We induct on . As mentioned above, the case follows from standard linear algebra, so suppose now that and the claim has already been proven for .

It is clear that the function (2) has rank at most equal to the number of non-zero (since the summands on the right-hand side are rank one functions), so it suffices to establish the lower bound. By deleting from those elements with (which cannot increase the rank), we may assume without loss of generality that all the are non-zero. Now suppose for contradiction that (2) has rank at most , then we obtain a representation

for some sets of cardinalities adding up to at most , and some functions and .

Consider the space of functions that are orthogonal to all the , in the sense that

for all . This space is a vector space whose dimension is at least . A basis of this space generates a coordinate matrix of full rank, which implies that there is at least one non-singular minor. This implies that there exists a function in this space which is nowhere vanishing on some subset of of cardinality at least .

If we multiply (3) by and sum in , we conclude that

where

The right-hand side has rank at most , since the summands are rank one functions. On the other hand, from induction hypothesis the left-hand side has rank at least , giving the required contradiction.

On the other hand, we have the following (symmetrised version of a) beautifully simple observation of Croot, Lev, and Pach:

*Proof:* Using the identity for , we have

The right-hand side is clearly a polynomial of degree in , which is then a linear combination of monomials

with with

In particular, from the pigeonhole principle, at least one of is at most .

Consider the contribution of the monomials for which . We can regroup this contribution as

where ranges over those with , is the monomial

and is some explicitly computable function whose exact form will not be of relevance to our argument. The number of such is equal to , so this contribution has rank at most . The remaining contributions arising from the cases and similarly have rank at most (grouping the monomials so that each monomial is only counted once), so the claim follows.

Upon restricting from to , the rank of is still at most . The two lemmas then combine to give the Ellenberg-Gijswijt bound

All that remains is to compute the asymptotic behaviour of . This can be done using the general tool of Cramer’s theorem, but can also be derived from Stirling’s formula (discussed in this previous post). Indeed, if , , for some summing to , Stirling’s formula gives

where is the entropy function

We then have

where is the maximum entropy subject to the constraints

A routine Lagrange multiplier computation shows that the maximum occurs when

and is approximately , giving rise to the claimed bound of .

Remark 3As noted in the Ellenberg and Gijswijt papers, the above argument extends readily to other fields than to control the maximal size of subset of that has no non-trivial solutions to the equation , where are non-zero constants that sum to zero. Of course one replaces the function in Lemma 2 by in this case.

Remark 4This symmetrised formulation suggests that one possible way to improve slightly on the numerical quantity by finding a more efficient way to decompose into rank one functions, however I was not able to do so (though such improvements are reminiscent of the Strassen type algorithms for fast matrix multiplication).

Remark 5It is tempting to see if this method can get non-trivial upper bounds for sets with no length progressions, in (say) . One can run the above arguments, replacing the functionwith

this leads to the bound where

Unfortunately, is asymptotic to and so this bound is in fact slightly worse than the trivial bound ! However, there is a slim chance that there is a more efficient way to decompose into rank one functions that would give a non-trivial bound on . I experimented with a few possible such decompositions but unfortunately without success.

Remark 6Return now to the capset problem. Since Lemma 1 is valid for any field , one could perhaps hope to get better bounds by viewing the Kronecker delta function as taking values in another field than , such as the complex numbers . However, as soon as one works in a field of characteristic other than , one can adjoin a cube root of unity, and one now has the Fourier decompositionMoving to the Fourier basis, we conclude from Lemma 1 that the function on now has rank exactly , and so one cannot improve upon the trivial bound of by this method using fields of characteristic other than three as the range field. So it seems one has to stick with (or the algebraic completion thereof).

Thanks to Jordan Ellenberg and Ben Green for helpful discussions.

## 76 comments

Comments feed for this article

18 May, 2016 at 4:14 am

JSEBeautiful! The symmetric formulation “feels right” and is surely the best way to express it.

20 May, 2016 at 12:57 am

AnonymousIt seems to be a proof from “the book”!

18 May, 2016 at 4:23 am

Nets KatzTerry, is it obvious that decompositions of the quadratic form in Remark 5 into rank 1 pieces is the only way to go? Any argument that gives a lower bound for the dimension of a set of vectors which pairwise vanish would work. Must such an argument have a simple algebraic formulation? I would like to comment that applying the technology to characteristic zero meets with similar obstructions. A simple model for characteristic zero is to replace F_p^d addition with addition that has carrying but for which the carrying in each digit is independent. Then you have to look at a product of 2^{d-1} forms, one for each carrying profile.

18 May, 2016 at 7:02 am

JSEI was thinking along similar lines, Nets. It’s an interesting question — what formal properties of polynomials are needed to make some argument of this kind work? Are there families of functions on (Z/NZ) which are “polynomial-like” enough? I truly don’t know. My first instinct would be to look at the Witt vector construction of (Z/p^n Z), which is somehow of the most canonical way of expressing “carrying” and passing from char p to char 0.

18 May, 2016 at 10:02 am

Terence TaoIt could well be that working with more general classes of functions than polynomials of a given degree would be useful – for instance polynomials taking values in some other commutative ring, or even a non-commutative ring (though the linear algebra becomes replaced by something nastier of course if one strays too far from the field setting). One could possibly play with completely non-linear spaces also. Maybe it is worth working with very low dimensional examples (as in my other recent comment here) to experiment if there are other decompositions than the polynomial ones which are more efficient. I played around for a while with multi-degrees instead of degrees, for instance, but eventually concluded that one did not gain much asymptotically by doing so.

18 May, 2016 at 6:01 pm

WillIt depends on whether we want to replace the additive structure of polynomials or just multiplicative. Since like most mathematicians I will only stop using linear algebra under extreme duress, I thought about how to generalize the polynomial method here while still using the notion of tensor rank to measure the complexity of functions.

So we can ask the general question: given a tensor f in V^3, viewing f tensor f tensor … f as a tensor in (V^k)^3, how may we upper bound its tensor rank as a function of k?

We may also try to lower bound its tensor rank, to let us know that we are not missing an easy way to improve the bound, or for use on the other side of the inequality. So maybe the right question is what is the asymptotic of the tensor rank – is it always an exponential function in k?

Anyways the upper bound argument generalizes immediately from polynomial functions to any functions that can be written as a sum of v_a tensor v_b tensor v_c, where v_a,v_b,v_c lie in some list of vector that come with weights w_a,w_b,w_c, such that we have an upper bound w_a + w_b + w_c < W for every term appearing in the sum.

Then we get an upper bound in terms of the maximum entropy of a probability distribution on the indices a where the expected value of w_a is at most W/3, by precisely the same argument.

To recover the polynomial special case, take v_a,v_b,v_c to be monomials in one variable, w_a,w_b,w_c the degree of the monomial, and W the degree of the polynomial function.

In the case where v_a range through a basis v_1,..,v_n (e.g. the monomials in one variable over F_p of degree less than p), then as soon as W/3 is less than the average value of w under the uniform distribution we will force the distribution to have an entropy slightly smaller than log n and hence we will get a nontrivial bound for the tensor ranks.

So for a 3-variable polynomial over F_p, the degree needs to be less than 3(p-1)/2 to get a nontrivial bound, and for a d-variable polynomial, less than d(p-1)/2.

In what should be interesting to some, this condition is precisely the Hilbert-Mumford criterion for GIT-instability under the action of (SL_n)^3 on P^{n^3-1}. So we can summarize the key idea in higher-level language as: There is a power savings bound for the rank of powers of GIT-unstable tensors.

Are there any known bounds for ranks of tensor powers that cannot be derived in this way from tuples of vectors and weight functions? Is there any method that would allow us to lower bound tensor ranks so as to show that this upper bound is sometimes sharp?

19 May, 2016 at 11:59 pm

Terence TaoIf has some basis , then we can associate a “Newton diagram” to a function by declaring to be an element of if the coefficient of is non-zero. Define the “capacity” of this diagram to be the maximum value of , where ranges over random variables taking values in and denotes Shannon entropy. Then I think Stirling’s formula ends up saying that the rank of (viewed as an element of ) is at most , basically because every “monomial” in becomes associated with a discrete random variable in (taking values). I think this bound covers the previous bounds on rank mentioned here, though I haven’t checked fully.

20 May, 2016 at 6:13 am

WillI think the bound you sketched and the bound I sketched are the same because the problems you need to solve to compute each bound are dual in a convex optimization sense. Given a weight function, you can upper bound the entropy of the marginal distributions of any probability distribution by observing min(E(w_X,w_Y,w_Z))<E(w_X+w_Y+w_Z)<sup(w_X+w_Y+w_Z)<W. If you have any distribution, you can define a weight function as the derivative of the entropy of each marginal distribution by the probability of each marginal value times the Lagrange multiplier of the entropy. Then the upper bound we get from the weight function is exactly the minimum entropy of a marginal distribution.

This makes it easy to check you aren't missing an obvious improvement – write down a weight function and a distribution and check that their entropies agree.

20 May, 2016 at 11:26 am

Terence TaoI think one may have to generalise to the non-symmetric setting where the weight functions associated to each of the three components are allowed to be different from each other, but I think I agree with your point that the two formulations are dual.

18 May, 2016 at 5:09 am

Fred LunnonTypos (presumptive?):

for “F_3^n” read “F_{3^n}” (throughout);

for “F^3” read “F_3” (once).

[Second erratum corrected. First is not an erratum, I have tried to clarify in post – T.]18 May, 2016 at 5:46 am

WillOne can not improve on your bound for 4-term arithmetic progressions using only polynomials. Indeed if is any multivariate polynomial over that is outside a a linear subspace of codimension but not everywhere, then restricted to some subspace of dimension is a polynomial that vanishes outside a point but not at that point, which by a fragment of Chevalley-Warning (http://mathoverflow.net/a/238527/18060) has degree at least . So a nontrivial polynomial which is nonvanishing for -term arithmetic progressions in has degree at least , and the upper bound for the degree in a single vector is , which is precisely the critical value one needs to beat to improve the bound.

18 May, 2016 at 7:09 am

JSEOne note about this higher notion of “rank” is that rank <= r is not a closed condition, the way it is for matrices! I always find this confusing and it's probably good to keep it in mind.

18 May, 2016 at 5:31 pm

WillAre you sure that is true for this notion of tensor rank and not just the more commonly used notion of tensor rank? I might have a proof that this notion of rank is in fact semicontinuous.

18 May, 2016 at 6:03 pm

JSEOh is Terry’s notion not the same as usual tensor rank? I thought it was, but didn’t think about it carefully.

18 May, 2016 at 6:47 pm

WillThe usual definition allows only functions of the form f(x)g(y)h(z), which is much more restrictive. For instance it is clear that every n x n x n tensor has rank at most n in Terry’s sense but this is not true in the usual sense. Maybe we should call Terry’s version something like lax tensor rank to remove ambiguity.

Let me sketch my argument that the space of tensors of bounded lax tensor rank is closed:

A tensor in V tensor V tensor V is of rank < k if it is in the space spanned by W1 x V x V + V x W2 x V + V x V x W3 where dim W1 + dim W2 + dim W3 < k. Now for fixed W1, W2, W3 the space spanned by W1 x V x V + V x W2 x V + V x V x W3 is a linear subspace of V x V x V and varies smoothly as W1 , W2, W3 vary in the Grassmanian of subspaces of fixed dimension, so this forms a closed subset of V tensor V tensor V x G(dim W1, V) x G(dim W2, V) x G(dim W3, V) that projects to a closed subset of V tensor V tensor V. Then a finite sum over all triples of dimensions satisfying the inequality gives the desired result.

19 May, 2016 at 7:42 am

JSEOh good point! I had failed to grasp this distinction.

18 May, 2016 at 7:29 am

WillAlso two quick observations:

1. It seems that the bound given for sets in F_p^n with no 3-term arithmetic progressions is ((c+o(1))p)^n for some constant c<1 that I doubt can be put in closed form.

2. For each set S of size 2n+1 in F_p one gets power-saving bounds for subsets of F_p^n that do not contain the image of S under any nonconstant map whose coordinates are polynomials of degree n. I didn't see a trivial proof of power savings here for general S (though for special S we may reduce using special polynomials to a simpler bound) but I also didn't see a relationship to any well-known open problem in combinatorics.

18 May, 2016 at 11:16 am

Eric NaslundIn fact we can show that c=e^{-1/6}. That is we have the upper bound (e^{-1/6}+o(1))^n p^n, where the o(1) term is positive and tending to zero as p–>infty.

18 May, 2016 at 9:35 am

Terence TaoA small but perhaps amusing computation: for the question of finding the largest number of cards in the game Set without a “set” (or equivalently, the largest capset in ), the upper bound given here is

This can be compared with the asymptotic prediction of for the upper bound (or the asymptotic prediction of for the lower bound. Theorem 1.2 of Meshulam’s paper using the Fourier method gives an upper bound of . The maximal size of a capset in is actually known to be 20 by the work of Pellegrino. So even in this low dimensional case, the bound given by the polynomial method improves upon existing bounds, but is a bit short of the truth. One could imagine in this case at least one could improve upon the bound of while staying within the framework of the polynomial method (it might also be worth working out numbers for other low dimensional cases).

19 May, 2016 at 5:22 am

Fedor PetrovWell, 45 may be replaced by 30, as may be replaced to always, see a proof here (sorry for doubling this link, but I think it may be convenient)

For specific this looks better improvement than it actually is:)

27 June, 2016 at 12:12 am

Anurag BishnoiThe bounds given by Bierbrauer and Edel (based on Meshulam’s Fourier-Analytic proof) seem to be closer to the truth in the low-dimensional cases than the polynomial method bound. For dimensions 4, 5 and 6, they get upper bounds, 21, 48 and 114, respectively. See Section 3 in http://onlinelibrary.wiley.com/doi/10.1002/jcd.10000/abstract. The values of the polynomial bound (constant 2 instead of 3, as obtained by Fedor) for these dimensions are 30, 102 and 336.

18 May, 2016 at 11:30 am

SamI think remark 5 has a typo. As written, is asymptotic to . Should it say

?

[Corrected, thanks – T.]18 May, 2016 at 11:59 am

Eric NaslundUsing this general approach, Will and I applied this directly to the Erdos-Szemeredi sunflower problem, proving that if F is a sunflower-free collection of subsets of {1,2,3,…,n}, then |F|< (3/2^{2/3})^n: https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxuYXNsdW5kZXJpY3xneDozZDQ2NzRjNzAxNmJjNDlh

18 May, 2016 at 1:22 pm

The Ellenberg-Gijswijt bound on cap sets | Anurag's Math Blog[…] [2] Polymath 10 Emergency Post 5: The Erdos-Szemeredi Sunflower Conjecture is Now Proven. [3] A symmetric formulation of the Croot-Lev-Pach-Ellenberg-Gijswijt capset bound [4] Large caps in projective Galois […]

18 May, 2016 at 4:14 pm

AnonymousIt seems that the above constant is (simply) given by

18 May, 2016 at 4:28 pm

Fedor PetrovI think, we may improve multiple 3 to 2 (not a big deal, yes.) But maybe modification of method is useful elsewhere.

For denote by a number of points

in with sum of coordiantes at most

. Assume that and

for some

(min is attained, of course,

for about

and is exponentially smaller than by a suitable verison of Law of Large Numbers.)

We need

Lemma. Let be a field, be a finite set and let

be a -dimensional linear subspace of the space of functions

on . Then the squares of functions from span a space of dimension at least .

Proof (inspired by Ilya Bogdanov). Let .

By Gauss' method, we may find a basis of such that

, where is the number of the first nonzero

entry in . Their squares work.

Denote by the space of -valued functions on such that

for any polynomial

over , which has degree at most 2 in each variable and total degree at most .

Dimension of this space

is at least .

If , for any polynomial of degree at most we have

(this is clear for monomials: sum factorizes, and there exists zero factor).

Assume that for , in . Consider the space of polynomials

of degree at most 2 in each variable, equal to 0

0 outside , of total degree at most .

Dimension of the space of such polynomials is at least .

And these polynomials are all different as functions on

(since they are different as functions on

). For all

we have

By lemma the functions , , form a space of dimension at least ,

and functions form a space of dimension at least . Sum of dimensions

exceed , thus orthogonal complement of the first subspace can not lie in the second.

18 May, 2016 at 5:03 pm

Fedor PetrovFormatting looks really ugly, sorry. Here is pdf

19 May, 2016 at 11:20 pm

Terence TaoNice! I actually get instead of (as I believe it is the former that is equal to . This seems to give the bounds , , , , if I did the arithmetic correctly; this is sharp in one and two dimensions but misses a little in three (I believe ) and higher. In terms of the tensor rank formulation, the point seems to be this: if one wishes to write as a sum

then Lemma 1 of the post tells us that we need . But if we have the additional information that and for , then we can improve this condition to . I guess there is some sort of “symmetric rank” here where we symmetrise under the operation of interchanging x and y. (It also seems tempting to symmetrise in all of x,y,z, but there seems to be some issue in characteristic 3 in that one can’t divide by 3!. Given that the bound is now sharp in one and two dimensions, I guess it is not too realistic to hope for a really big further gain.)

20 May, 2016 at 1:17 am

Fedor PetrovYes, of course . I hope that now everything in the linked pdf is correct, and it is expressed in more symmetric form, without two steps. The space satisfies a condition for all in . In general it implies that , but we are additionally given that supports of all functions in are quite large. Does it allow to get essentially better bound on the dimension? Without conditions on the support it is tight, but the example I have in mind (linear span of vectors ) has small supports.

It gives at least some improvement, say, estimate 30 in the game set problem may be easily improved (though I doubt that down to 20.)

19 May, 2016 at 8:25 am

Polymath 10 Emergency Post 5: The Erdos-Szemeredi Sunflower Conjecture is Now Proven. | Combinatorics and more[…] nice modification) of the Croot-Lev-Pach-Ellenberg-Gijswijt capset and further discussion see this post and comment thread on Terry Tao’s […]

19 May, 2016 at 9:28 am

Eric NaslundHere is another application of this method which can be seen as progress on the Erdos-Rado sunflower problem. The functions used are combinations of indicator functions of the elements in the underlying set rather than polynomials: https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxuYXNsdW5kZXJpY3xneDozZWM4ZTI0ZTcyNGMyY2M2

19 May, 2016 at 11:05 am

Jop BrietDear Terry, thanks for the great post! Tiny typo: Eq (3) seems to be missing an equality symbol.

[Corrected, thanks – T.]20 May, 2016 at 3:54 pm

Terence TaoFor the record: the known values of (i.e.for ) are at https://oeis.org/A090245

21 May, 2016 at 1:36 am

Fedor PetrovWe may mix Croot-Lev-Pach and Olson’s ideas to rephrase this proof using group rings.

Let be odd prime. Denote by a cyclic group of order . Let be a finite Abelian -group with generators , generates . Assume that we managed to find a subspace of a group algebra of codimension satisfying for all . Then any subset of cardinality at least contains three different elements such that .

Why? Assume the contrary. Consider the subspaces of with all coefficients outside (resp. outside ) equal to 0. They have dimensions at least . Consider the product

where two first summands are chosen from and the second from . If only for , the coefficient of 1 in this product is nothing but . But the space of functions of the form , is at least , as we may see by choosing basis in by Gauss elimination, and it can not be orthogonal to the space formed by functions , which is isomorphic to . (This argument is the same as before.)

Now how to choose such a space with low codimension. Group algebra is generated by the products , where . Let be a subspace generated by monomials for which . Then any product for has some in a power strictly greater than , but .

What is the codimension of this subspace? If are random variables uniformly distributed in the set

, then is the probability that . It decays expoentially in . Alas, this does not seem to be better than partitioning onto classes modulo and reasoning in them separately.

21 May, 2016 at 5:12 am

Fedor PetrovWell, at least this gives better exponent for, say, $C_9^n$ compared to $3^n$ copies of $C_3^n$, so I hope that it touches some effect.

31 May, 2016 at 8:54 am

Fedor PetrovIt gives similar estimates for progression-free sets , where such that any large enough finite group (at least of odd order, though this is hardily important) contains elements without 3-progressions?

31 May, 2016 at 8:57 am

Fedor PetrovI tried to say that it works for non-Abelian -groups, but something went wrong, probably inequality signs have eaten my post. Never mind. Here is a draft

22 May, 2016 at 11:55 am

Tom GurShould delta(x_1 – a) * … * delta(x_k – a) be delta_a(x_1) *…* delta_a(x_k) ?

(Near the end of the proof of Lemma 1.)

[Corrected, thanks – T.]25 May, 2016 at 8:50 am

Mind Boggling: Following the work of Croot, Lev, and Pach, Jordan Ellenberg settled the cap set problem! | Combinatorics and more[…] See also this post by Tao (presenting a symmetric version of the proof) and this post by […]

25 May, 2016 at 10:01 am

Thomas BloomInterestingly, this proof also works for the similar problem over polynomial rings. For example, bounding the size of the largest subset of F_2[t]_{\deg f< n} without non-trivial solutions to a+tb=(1+t)c, this method gives an exponential upper bound (I think something like 1.9^n for this particular problem).

This is great, because before this not even the $N/\log N$ bound was known, but rather $N(\log\log N)^2/\log N$ – I used to think that this problem was closer to the normal integer setting than the cap set problem, but not any more!

And, of course, there are many analogues between the integers and polynomials over finite fields…

25 May, 2016 at 7:08 pm

Terence TaoBen Green has recently obtained an exponential bound for the analogue of Sarkozy’s theorem in function fields using these methods: http://arxiv.org/abs/1605.07263

26 May, 2016 at 12:33 pm

Mario CarneiroYou comment on https://oeis.org/A090245 that “Asymptotically, a(n) = O(3^n/n)” which contradicts this post.

26 May, 2016 at 5:13 pm

Terence TaoIn combinatorics, is used to denote an upper bound only (one would write instead if one wanted to assert the exact order of magnitude).

26 May, 2016 at 7:25 pm

Mario CarneiroOops, newbie mistake. Indeed I read it as a(n) ~ 3^n/n.

27 May, 2016 at 12:11 am

Polymath 10 post 6: The Erdos-Rado sunflower conjecture, and the Turan (4,3) problem: homological approaches. | Combinatorics and more[…] We will not regard attacks on the sunflower conjecture based on the recent solution of the cap set problem and the Erdos Szemeredi sunflower conjecture (which is a weaker version of the Erdos-Rado conjecture) as part of the present polymath10 project, but, of course, I will be happy to hear also reports on progress or comments on these directions. Some updates on this story: Eric Naslund and Will Sawin gave a direct proof based on the polynomial method for the Erdos-Szemeredi sunflower conjecture, and an even stronger result is given by Naslund here. (Eric also has stronger quantitative bounds for Erdos-Szemeredi based on bounds for cap sets.) More cap set updates are added to this post, and can be found in the comment threads here and here. […]

27 May, 2016 at 12:38 pm

AnonymousTypo: I believe $k$ should be $k-1$ in the equation following “If we multiply (3) by…”

[Corrected, thanks – T.]29 May, 2016 at 3:54 am

Klaus85This is a very interesting post, thanks. I wonder, do you have a personal favorite-list of open problems? (it seemed this one was on the list). Or do professional mathematicians not have such lists but concentrate in developing the field further independent whether there is some identified open question? I’m honestly interested in this. Thanks!

29 May, 2016 at 8:22 am

Terence TaoThe blog posts listed at https://terrytao.wordpress.com/category/question/ are devoted to discussion of a number of open questions that I am interested in. I did at one point try to maintain a list of such open questions more systematically, but have not done so for more than a decade.

1 June, 2016 at 8:14 am

Terence TaoA shout out, by the way, to Ben Green, who to my knowledge was the first to dare conjecture [EDIT:

explicitly, andin print] that cap sets had to be exponentially sparse (see Conjecture 4.3 of http://arxiv.org/abs/math/0409420 ). I actually disagreed with him at the time, and even said so in my 2007 blog post, but am happy to have been proven wrong on this.While on the topic of past inaccuracies, it seems I may have inadvertently propagated a slightly incorrect terminology in that post: sets in free of collinear triples are known as

affine capsor simplycapsin the design theory literature, rather thancap sets. The latter terminology seems to have become rather entrenched, though, at least in additive combinatorics circles…1 June, 2016 at 9:33 am

samuelfhopkinsIn the commentary after Conjecture 4.3, Green says he would expect any progress on that conjecture to lead to improvements on bounding the size of a maximum subset of $\mathbb{Z}_n$ without 3-term APs. Is the jury out at the moment about the possibility of these latest developments being applied to the $\mathbb{Z}_n$ setting?

1 June, 2016 at 11:56 am

Gil KalaiIt is interesting to look at the original 1987 paper of Peter Frankl, Ron Graham, and Vojta Rodl.

FGR gave a upper bound using Rusza-Szemeredi’s theorem (that applies also to show that ). In that paper FGR asked if cap sets are exponentially sparse. They also relate the question to the Erdos-Rado’s sunflower conjecture.

I myself worked with Meshulam on this problem around 2000, and we had a Fourier statement about inequality between certain norms that in its strongest form would give exponentially good upper bound, (and in some weaker forms will give weaker improvement to Roy’s original 95 bound.) (These norm-inequalities are still completely open, I think.) Roy certainly conjectured that cap sets are exponentially sparse while I changed sides several times over the years. In 2009 I discussed another approach also initiated with discussions with Roy with certain weakening and strengthening of the 3AP condition, potential relations to variants of the Frankl-Wilson and Frankl-Rodl theorems, and some slightly related polynomial-method lemmas.

On another matter it is certainly an obvious question at this point to try to give upper bounds for the rank of subsets of without a

combinatorialline. The most optimistic constructions are given in terms of minimum of level sets corresponding to 3AP free sets of levels. Maybe there could be a way to dualize somehow and get upper bounds in terms of maximum of some related question. (It will be sufficiently wonderful to relate upper bounds for DHJ to upper bounds for .)1 June, 2016 at 12:46 pm

Terence TaoOne can certainly try to apply the same method to the DHJ problem – one replaces the function by the indicator function of all the triples in that form a combinatorial line. Unfortunately the rank of this tensor appears to be too large for the method to work (and in any event, a suitable modification of the Behrend construction shows that an exponential gain is not possible in this case anyway).

At present it seems like the most plausible way one could use these methods to improve upon the problem is through progress on DHJ (or slight variants thereof), perhaps in for some suitable choice of and with (the Behrend example suggests that has to grow with in order for there to be any chance here). I myself haven’t seriously tried this, but presumably some of the many other mathematicians now experimenting with this method would be considering it; it’s worth a shot, certainly.

p.s. Regarding by previous comment, I should clarify that Ben was certainly not the first to pose the question of whether capsets are exponentially sparse or not – the question was also raised as “an interesting open problem” in the 2004 paper of Edel. But I don’t know of an earlier reference in the literature in which the author came out strongly in favour of the exponentially sparse scenario, rather than being undecided or posing the opposing scenario as a question (as was done for instance in the Frankl-Graham-Rodl paper).

2 June, 2016 at 5:52 pm

Paul DelhantyEdel, in the 2004 paper that you linked previously, writes, “That leaves us with two research problems. Firstly to improve the bounds on $\mu(3)$ by finding better *capsets*.”

So you get a pass on the 2nd point.

2 June, 2016 at 9:36 pm

Paul DelhantyAh, my bad – sorry! On reading through Edel’s paper more carefully, *capset* is defined in Definition 9 and it is a somewhat different and much more specific concept from a cap. It looks like a capset is a set of indexing tuples that represent a set of product caps that get unioned together to create the final cap.

1 June, 2016 at 3:04 pm

anthonyquasVery nice! I have 3 minor comments and a question.

1) I think it would be helpful to say that is *nowhere zero* on (rather than non-zero, which is ambiguous);

2) In the first line of Lemma 2, for (not for general ;

3) In my browser, I was unable to find the equation label (3) (but I inferred where it was).

And the question: Am I right that the term “capset” comes from spherical caps that were used to build large sets without arithmetic progressions? The terminology is a bit mysterious to non-experts.

[Corrected, thanks. The notion of an “affine cap” comes from design theory, but I do not know the provenance exactly – T.]2 June, 2016 at 1:39 am

kodluIn Remark 3, “to control the maximal size of aubset” should end with “a subset”, I presume.

[Corrected, thanks -T.]8 June, 2016 at 8:02 pm

aramI’m confused about (1). The LHS =1 if x,y,z form a line but the RHS = 0 then. And the empty set is a valid capset, so the RHS could be always 0. And the RHS could be sometimes 2 but the LHS is always 0 or 1.

9 June, 2016 at 11:01 am

Terence TaoThe case when form a line never occurs when and is a capset, by definition of capset. In the case of the empty set, (1) is vacuously true for . The RHS cannot actually attain 2 because at most one of the summands can be non-zero at a time.

9 June, 2016 at 7:38 pm

aramOh, I missed that x,y,z have to belong to A! That makes sense.

23 June, 2016 at 9:25 am

AbbasNice argument and presentation!

On the last line of Remark 3, should be ? Also in Remark 5, 3rd line from below.

[Corrected, thanks – T.]8 July, 2016 at 1:12 pm

Bounds for sum free sets in prime power cyclic groups — three ways | Secret Blogging Seminar[…] now-standard argument (I like Terry Tao’s exposition) shows that is bounded by three times the number of monomials of degree . One needs to check that […]

29 July, 2016 at 8:05 am

Comments on the Ellenberg-Gijswijt bound for caps in higher characteristic | Blogderbeweise - Proofs from the blog[…] breakthrough due to Croot, Lev and Pach, recycled by Ellenberg-Gijswijt and streamlined by Tao on this blog post, revolutionised the landscape of Roth’s type problems in vector spaces over finite fields. […]

24 August, 2016 at 7:52 am

Notes on the “slice rank” of tensors | What's new[…] the previous blog post, one of us (Terry) implicitly introduced a notion of rank for tensors which is a little different […]

4 September, 2016 at 6:17 am

Paata IvanishviliOnce at the end of a seminar about capset problem fedja said: “you don’t need Lagrange multipliers. For clearly . Minimizing the LHS gives "

5 September, 2016 at 11:59 pm

WillYes, a similar method is very useful for quickly doing slice rank calculations. Eric Naslund and I used it on the last page of this: https://arxiv.org/pdf/1606.09575v1.pdf

However, it is not obvious that the upper bounds produced by this method are optimal. To analyze that question I don’t see how to avoid the use of Lagrange multipliers. See https://terrytao.wordpress.com/2016/08/24/notes-on-the-slice-rank-of-tensors/

6 September, 2016 at 12:10 am

Fedor PetrovThe multiplicative constant obtained on this way (variants of which are called Cramer theorem or Chernoff bound in probability and which is essentially a saddle-point method) is of course optimal. Further asymptotics (polynomial multiples) may be also obtained by saddle-point method, this is more technical.

6 September, 2016 at 5:27 am

WillBut in the probability theory problem we are trying to bound the tail probability of a fixed random variable. I agree that the method is optimal there. Here we have the freedom to choose the random variable, subject to some constraints. If you write down the constraints, then any way you slice it, you have a convex optimization problem for which Lagrange multipliers are a suitable tool.

Of course optimizing the random variable for progression-free sets in F_q^n with q=3 is not very hard. But for higher values of q it is nontrivial, though solved by these papers: https://arxiv.org/abs/1608.00243 https://arxiv.org/abs/1608.05740

26 October, 2016 at 5:04 am

Thomas Dybdahl AhleWait, why is that clearly an upper bound?

26 October, 2016 at 5:57 am

Thomas Dybdahl AhleAh, . You just flipped the inequality on .

9 September, 2016 at 9:48 am

Introduction to the polynomial method (and other similar things) | Short, Fat Matrices[…] result then follows from analyzing the asymptotic behavior of (see this post for details). The proof of the lemma is perhaps more interesting, considering this is where polynomials […]

15 May, 2017 at 1:40 am

Notes of the seminar ‘On the slice rank of functions’ | Points And Lines[…] try to elude the idea of ‘slice rank’ of a tensor, as was defined by Terry Tao in his reformulation of the Ellenberg-Gijswijt proof. I will try to indicate where the idea might have come from and how […]

23 June, 2017 at 2:55 pm

TheoryFest Day 3: Plenaries | A bunch of data[…] came out of the recent flurry of work by Croot, Lev, Pach, Ellenberg, Gijswijt and Tao (and cited Tao’s blog post as well as the papers, which I thought was […]

1 July, 2017 at 9:29 pm

수학동아 연재 (2016년) - Prof. Sang-il Oum[…] 참고 블로그글: Ellenberg 블로그, Tim Gowers 블로그, Terrence Tao 블로그 […]

16 April, 2018 at 7:21 am

GeorgeIn regards to Remark 5 and Will’s nice response to it above, it seems that Hongze Li has made progress for large q. https://arxiv.org/pdf/1610.00247.pdf

16 April, 2018 at 5:45 pm

George ShakanAfter reading the paper and getting in touch with the author, unfortunately there is an error.

19 March, 2019 at 4:03 am

수학동아 연재 (2016년) - Sang-il Oum[…] 참고 블로그글: Ellenberg 블로그, Tim Gowers 블로그, Terrence Tao 블로그 […]