This is a continuation of several threads from Tim Gowers’ massively collaborative mathematics project, “A combinatorial approach to density Hales-Jewett“, which I discussed in my previous post.

For any positive integer n, let {}[3]^n be the set of strings of length n consisting of 1s, 2s, and 3s, and define a combinatorial line to be a triple of such strings arising by taking a string in \{1,2,3,x\}^n with at least one wildcard x, and substituting x=1, x=2, x=3 in that string (e.g. xx1x3 would give the combinatorial line \{11113, 22123, 33133\}).  Call a set A \subset [3]^n of strings line-free if it contains no combinatorial lines, and let c_n be the size of the largest set A which is line-free.  We then have

Density Hales-Jewett theorem (k=3): c_n = o(3^n), i.e. \lim_{n \to \infty} c_n/3^n = 0.

This theorem implies several other important results, most notably Roth’s theorem on length three progressions (in an arbitrary abelian group!) and also the corners theorem of Ajtai and Szemeredi (again in an arbitrary abelian group).

Here are some of the questions we are initially exploring in this thread (and which I will comment further on below the fold):

  1. What are the best upper and lower bounds for c_n for small n?  Currently we know c_0=1, c_1=2, c_2=6, c_3=18, c_4=52, with some partial results for higher n.    We also have some relationships between different c_n.
  2. What do extremal or near-extremal sets (i.e. line-free sets with cardinality close to c_n) look like?
  3. What are some good constructions of line-free sets that lead to asymptotic lower bounds on c_n?  The best asymptotic lower bound we have currently is c_n \geq 3^{n - O( \sqrt{\log n} )}.
  4. Can these methods extend to other related problems, such as the Moser cube problem or the capset problem?
  5. How feasible is it to extend the existing combinatorial proofs of Roth’s theorem (or the corners theorem), in particular the arguments of Szemeredi, to the Density Hales-Jewett theorem?

But I imagine that further threads might develop in the course of the discussion.

As with the rest of the project, this is supposed to be an open collaboration: please feel free to pose a question or a comment, even if (or especially if) it is just barely non-trivial.  (For more on the “rules of the game”, see this post.)

— I. c_n for small n —

Because the Cartesian product of two line-free sets is line-free, we have a lower bound

c_{n+m} \geq c_n c_m (1)

for all n,m \geq 1, although this seems to be an inferior bound in practice.

There is a trivial bound

c_{n+1} \leq 3 c_n (2)

which comes from the observation that any line-free subset of {}[3]^{n+1} can be split into three slices, each of which is essentially a line-free subset of {}[3]^n.  In particular, since c_1 = 2, we see that

c_n \leq 2 \times 3^{n-1} (3)

for n \geq 1.

If we identify {}[3] = \{1,2,3\} with {\Bbb Z}/3{\Bbb Z} and consider the set

D_n := \{ (x_1,\ldots,x_n) \in {\Bbb Z}/3{\Bbb Z}: x_1+\ldots+x_n \neq 0 \hbox{ mod } 3 \}, (4)

thus |D_n| = 2 \times 3^{n-1}.  Observe that a combinatorial line with a,b,c,k 1s, 2s, 3s, and wildcards respectively lies in D_n if and only if k is a multiple of 3, and a is not equal to b mod 3.  In particular, D_n is line-free for n = 1,2,3, thus matching the upper bound (3) and concluding that

c_1 = 2; c_2 = 6; c_3 = 18. (5)

For n=4, the only combinatorial lines remaining in D_4 are 1xxx, 2xxx and their permutations.  Thus (as observed in Neylon.83) the set D_4 \backslash \{1111, 2222\} is line-free, leading to the lower bound c_4 \geq 52.

This bound was complemented with the lower bound c_4 \leq 52 in Jakobsen.90.  Thus c_4 = 52.

For c_5, the best lower bound currently is c_5 \geq 145 (Neylon.83) and c_5 \leq 3c_4 = 156.

No attempt at getting reasonable bounds for c_6, c_7, etc. appears in the previous threads.

Question I.A: Can we improve the upper and lower bounds on c_5, c_6, c_7?

Question I.B: Is there some reasonably efficient way to automate this process?

Question I.C: What do the extremal line-free sets (sets of size close to c_n) look like?

At some point we should submit this sequence to the OEIS.

— II. Lower bounds on c_n for large n —

For any integers a,b,c adding up to n, let \Gamma_{a,b,c} \subset [3]^n be the set of all strings with a 1s, b 2s, and c 3s.  Observe that if B is any set of triples (a,b,c) that avoids equilateral triangles (a+r,b,c), (a,b+r,c), (a,b,c+r) for r > 0, then the set \Gamma_B := \bigcup_{(a,b,c) \in B} \Gamma_{a,b,c} is line-free.  In particular, if we choose B to be the set of all triples (a,b,c) such that a+2b lies in a set free of length three arithmetic progressions (e.g. a Behrend set), then \Gamma_B is line-free.  This seems to give a lower bound

c_n \geq 3^{n - O(\sqrt{\log n})}. (6)

Question II.A: Using the Elkin bound on Behrend sets, what is the precise lower bound one gets? (Solymosi.46.5 notes that we should restrict the Behrend set to scale \sqrt{n} rather than n).

Question II.B: What does one get for higher k than 3?  Presumably one would use the O’Bryant bound (O’Bryant.126).

Question II.C: Do the examples from Section I extend to give competitive lower bounds for large n?

Question II.D: Are there better ways to avoid triangles or corners in two dimensions than simply lifting up the Behrend example from one dimension?

— III. Miscellanous facts about c_n

Using the colouring Hales-Jewett theorem, one can show that for every m, one has c_n < 3^{n-m} c_m for sufficiently large m, improving (2) very slightly; see (Tao.85).

Question III.A: Anything else we can say about the c_n?  For instance, is there a relation to Hales-Jewett numbers HJ(3,r) (defined as the first n such that every r-colouring of {}[3]^n contains a monochromatic combinatorial line?)

Note: as pointed out in Gasarch.45.5, Hindman and Tressler have recently established that HJ(3,2)=4, HJ(3,3) > 6, and HJ(4,2) > 6.

— IV. Related problems —

Let c'_n denote the largest subset of {}[3]^n which does not contain any geometric line (which is the same as a combinatorial line, but has a second wildcard y which goes from 3 to 1 whilst x goes from 1 to 3, e.g. xx2yy gives the geometric line 11233, 22222, 33211).  The Moser cube problem is to understand the behaviour of c'_n.  The first few values are (see OEIS A003142):

c'_0 = 1; c'_1 = 2; c'_2 = 6; c'_3 = 16; c'_4 = 43. (7)

It is clear that c'_n \leq c_n.

Elsholtz.43 proposed modifying the construction of (6) to give a similar lower bound on c'_n, but a difficulty was found in Solymosi.48.  Currently, the best known asymptotic bounds for c'_n are

3^n / \sqrt{n} \ll c'_n \ll o(3^n) (8)

Question IV.A.  Can we improve the lower bound in (8)?

Question IV.B. Can we get good bounds on c'_5, c'_6?

Define c''_n to be the largest cardinality of a subset of {}[3]^n which contains no algebraic lines, defined as a triple \{ x, x+r, x+2r\} where x, r \in ({\Bbb Z}/3{\Bbb Z})^n \equiv [3]^n; such sets are known as capsets.  Clearly c''_n \leq c'_n \leq c_n.  As noted my previous blog post, the best asymptotic bounds are

(2.2174\ldots)^n \leq c''_n \ll 3^n/n (9)

Question IV.C. Can we improve on these bounds in any way?

Question IV.D. What are the first few values of c''_n?  Brute force calculation reveals c''_0 = 1, c''_1 = 2, c''_2 = 4.  (Presumably this sequence is also in the OEIS.)

[Update, Feb 7: I will now maintain a running table of the current “world records” for c_n, c'_n, c''_n for small n, formed by combining all the bounds already mentioned in this post and comments.]

[Update, Feb 8: A more complete (and up-to-date) spreadsheet can now be found here.]

n c_n c'_n c''_n
0 1 1 1
1 2 2 2
2 6 6 4
3 18 16 9
4 52 43 20
5 [150,154] [96,129] 45
6 [450,462] [258,387] 112
7 [1302,1386] [688,1161] [236,292]
8 [3780,4158] [1849,3483] [472,773]
9 [11340,12474] [4128,10449] [1008,2075]
10 [32864,37422] [11094,31347] [2240,5632]
11 [95964,112266] [29584,94041] [5040,15425]
12 [287892,336798] [79507,282123] [12544,42569]
13 [837850,1010394] [175504,846369] [26432,118237]
14 [2458950,3031182] [477042,2539107] [52864,330222]
15 [7376850,9093946] [1272112,7617321] [112896,926687]
\to \infty {}[3^{n-O(\sqrt{\log n})}, o(3^n)] {}[\Omega(\frac{3^n}{\sqrt{n}}), o(3^n)] {}[(2.21\ldots)^n, O(\frac{3^n}{n})]
  • green – obtained via the inequality (1)
  • orange – obtained via the inequality (2)
  • lavender – uses the inequality c''_n \leq (3c''_{n-1}+1)/(1 + c''_{n-1}/3^{n-1})

— V. Szemeredi’s proof of Roth’s theorem —

Szemeredi provided a short proof of Roth’s theorem by purely combinatorial means, and also famously proved the full case of Szemeredi’s theorem by much more intricate combinatorial argument.  Both arguments are based on the density increment method.  In the setting of {}[3]^n, the density increment method revolves around the critical density \alpha = \lim_{n \to \infty} c_n / 3^n, which exists by (2); the density Hales-Jewett theorem is precisely the statement that \alpha vanishes, so suppose for contradiction that \alpha is positive.

Then one can find a line-free set A \subset [3]^n of density \alpha + o_{n \to \infty}(1) such that A has density at most \alpha + o_{m \to \infty}(1) for every m-dimensional subspace of {}[3]^n (and one can complement this lower bound with an upper bound for “most” subspaces, see Tao.100).  One can also regularise the density on other sets than subspaces, see Tao.121).  So, basically, the moment one gets a significant density increment on a non-trivial subspace, one is done (see Tao.120, O’Donnell.122, Tao.127).

Szemeredi’s proof of Roth’s theorem proceeds by first showing that a dense set A contains a large cube Q, and thus (if it is free of algebraic lines) must completely avoid the set 2.Q - A.  This set contains (most of) lots of parallel subspaces and so squeezes A to have higher density on some other subspace, closing the argument.  (See Tao.86, Tao.118, Tao.131)

Question V.A: Is there some way to adapt this argument to density Hales-Jewett, or at least to the corners problem?  The sparsity of lines is a serious difficulty, see Tao.118, Solymosi.135.

One can at least generate lots of cubes inside A without difficulty (Tao.102, Solymosi.135), lots of combinatorial lines with two elements inside A (Tao.130, Bukh.132, O’Donnell.133, Solymosi.135), and many unorganised subspaces in the complement of A (Tao.118, Solymosi.135), but this does not seem to be enough as yet.

Question V.B: What about using the techniques from Szemeredi’s big paper on Szemeredi’s theorem?  (The immediate issue is that there is a key point where one needs to regularise a graph, and the relevant graph in HJ or even for corners is far too sparse; I’ll try to clarify this at some point.)

— A note on comments —

As in the first thread, please number your comments (starting with 200, then 201, etc.) and provide a title, to assist with following the comments.  (Presumably, in future projects of this type, we will use a platform that allows for comment threading; see this post for further discussion.)