This is a continuation of several threads from Tim Gowers’ massively collaborative mathematics project, “A combinatorial approach to density Hales-Jewett“, which I discussed in my previous post.

For any positive integer n, let ${}^n$ be the set of strings of length n consisting of 1s, 2s, and 3s, and define a combinatorial line to be a triple of such strings arising by taking a string in $\{1,2,3,x\}^n$ with at least one wildcard x, and substituting x=1, x=2, x=3 in that string (e.g. xx1x3 would give the combinatorial line $\{11113, 22123, 33133\}$).  Call a set $A \subset ^n$ of strings line-free if it contains no combinatorial lines, and let $c_n$ be the size of the largest set A which is line-free.  We then have

Density Hales-Jewett theorem (k=3): $c_n = o(3^n)$, i.e. $\lim_{n \to \infty} c_n/3^n = 0$.

This theorem implies several other important results, most notably Roth’s theorem on length three progressions (in an arbitrary abelian group!) and also the corners theorem of Ajtai and Szemeredi (again in an arbitrary abelian group).

Here are some of the questions we are initially exploring in this thread (and which I will comment further on below the fold):

1. What are the best upper and lower bounds for $c_n$ for small n?  Currently we know $c_0=1, c_1=2, c_2=6, c_3=18, c_4=52$, with some partial results for higher n.    We also have some relationships between different $c_n$.
2. What do extremal or near-extremal sets (i.e. line-free sets with cardinality close to $c_n$) look like?
3. What are some good constructions of line-free sets that lead to asymptotic lower bounds on $c_n$?  The best asymptotic lower bound we have currently is $c_n \geq 3^{n - O( \sqrt{\log n} )}$.
4. Can these methods extend to other related problems, such as the Moser cube problem or the capset problem?
5. How feasible is it to extend the existing combinatorial proofs of Roth’s theorem (or the corners theorem), in particular the arguments of Szemeredi, to the Density Hales-Jewett theorem?

But I imagine that further threads might develop in the course of the discussion.

As with the rest of the project, this is supposed to be an open collaboration: please feel free to pose a question or a comment, even if (or especially if) it is just barely non-trivial.  (For more on the “rules of the game”, see this post.)

— I. $c_n$ for small n —

Because the Cartesian product of two line-free sets is line-free, we have a lower bound $c_{n+m} \geq c_n c_m$ (1)

for all $n,m \geq 1$, although this seems to be an inferior bound in practice.

There is a trivial bound $c_{n+1} \leq 3 c_n$ (2)

which comes from the observation that any line-free subset of ${}^{n+1}$ can be split into three slices, each of which is essentially a line-free subset of ${}^n$.  In particular, since $c_1 = 2$, we see that $c_n \leq 2 \times 3^{n-1}$ (3)

for $n \geq 1$.

If we identify ${} = \{1,2,3\}$ with ${\Bbb Z}/3{\Bbb Z}$ and consider the set $D_n := \{ (x_1,\ldots,x_n) \in {\Bbb Z}/3{\Bbb Z}: x_1+\ldots+x_n \neq 0 \hbox{ mod } 3 \}$, (4)

thus $|D_n| = 2 \times 3^{n-1}$.  Observe that a combinatorial line with a,b,c,k 1s, 2s, 3s, and wildcards respectively lies in $D_n$ if and only if k is a multiple of 3, and a is not equal to b mod 3.  In particular, $D_n$ is line-free for $n = 1,2,3$, thus matching the upper bound (3) and concluding that $c_1 = 2; c_2 = 6; c_3 = 18.$ (5)

For n=4, the only combinatorial lines remaining in $D_4$ are $1xxx, 2xxx$ and their permutations.  Thus (as observed in Neylon.83) the set $D_4 \backslash \{1111, 2222\}$ is line-free, leading to the lower bound $c_4 \geq 52$.

This bound was complemented with the lower bound $c_4 \leq 52$ in Jakobsen.90.  Thus $c_4 = 52$.

For $c_5$, the best lower bound currently is $c_5 \geq 145$ (Neylon.83) and $c_5 \leq 3c_4 = 156$.

No attempt at getting reasonable bounds for $c_6, c_7$, etc. appears in the previous threads.

Question I.A: Can we improve the upper and lower bounds on $c_5, c_6, c_7$?

Question I.B: Is there some reasonably efficient way to automate this process?

Question I.C: What do the extremal line-free sets (sets of size close to $c_n$) look like?

At some point we should submit this sequence to the OEIS.

— II. Lower bounds on $c_n$ for large n —

For any integers a,b,c adding up to n, let $\Gamma_{a,b,c} \subset ^n$ be the set of all strings with a 1s, b 2s, and c 3s.  Observe that if $B$ is any set of triples (a,b,c) that avoids equilateral triangles $(a+r,b,c), (a,b+r,c), (a,b,c+r)$ for $r > 0$, then the set $\Gamma_B := \bigcup_{(a,b,c) \in B} \Gamma_{a,b,c}$ is line-free.  In particular, if we choose B to be the set of all triples $(a,b,c)$ such that a+2b lies in a set free of length three arithmetic progressions (e.g. a Behrend set), then $\Gamma_B$ is line-free.  This seems to give a lower bound $c_n \geq 3^{n - O(\sqrt{\log n})}$. (6)

Question II.A: Using the Elkin bound on Behrend sets, what is the precise lower bound one gets? (Solymosi.46.5 notes that we should restrict the Behrend set to scale $\sqrt{n}$ rather than n).

Question II.B: What does one get for higher k than 3?  Presumably one would use the O’Bryant bound (O’Bryant.126).

Question II.C: Do the examples from Section I extend to give competitive lower bounds for large n?

Question II.D: Are there better ways to avoid triangles or corners in two dimensions than simply lifting up the Behrend example from one dimension?

— III. Miscellanous facts about $c_n$

Using the colouring Hales-Jewett theorem, one can show that for every m, one has $c_n < 3^{n-m} c_m$ for sufficiently large m, improving (2) very slightly; see (Tao.85).

Question III.A: Anything else we can say about the $c_n$?  For instance, is there a relation to Hales-Jewett numbers HJ(3,r) (defined as the first n such that every r-colouring of ${}^n$ contains a monochromatic combinatorial line?)

Note: as pointed out in Gasarch.45.5, Hindman and Tressler have recently established that $HJ(3,2)=4$, $HJ(3,3) > 6$, and $HJ(4,2) > 6$.

— IV. Related problems —

Let $c'_n$ denote the largest subset of ${}^n$ which does not contain any geometric line (which is the same as a combinatorial line, but has a second wildcard y which goes from 3 to 1 whilst x goes from 1 to 3, e.g. xx2yy gives the geometric line $11233, 22222, 33211$).  The Moser cube problem is to understand the behaviour of $c'_n$.  The first few values are (see OEIS A003142): $c'_0 = 1; c'_1 = 2; c'_2 = 6; c'_3 = 16; c'_4 = 43$. (7)

It is clear that $c'_n \leq c_n$.

Elsholtz.43 proposed modifying the construction of (6) to give a similar lower bound on $c'_n$, but a difficulty was found in Solymosi.48.  Currently, the best known asymptotic bounds for $c'_n$ are $3^n / \sqrt{n} \ll c'_n \ll o(3^n)$ (8)

Question IV.A.  Can we improve the lower bound in (8)?

Question IV.B. Can we get good bounds on $c'_5, c'_6$?

Define $c''_n$ to be the largest cardinality of a subset of ${}^n$ which contains no algebraic lines, defined as a triple $\{ x, x+r, x+2r\}$ where $x, r \in ({\Bbb Z}/3{\Bbb Z})^n \equiv ^n$; such sets are known as capsets.  Clearly $c''_n \leq c'_n \leq c_n$.  As noted my previous blog post, the best asymptotic bounds are $(2.2174\ldots)^n \leq c''_n \ll 3^n/n$ (9)

Question IV.C. Can we improve on these bounds in any way?

Question IV.D. What are the first few values of $c''_n$?  Brute force calculation reveals $c''_0 = 1, c''_1 = 2, c''_2 = 4$.  (Presumably this sequence is also in the OEIS.)

[Update, Feb 7: I will now maintain a running table of the current “world records” for $c_n, c'_n, c''_n$ for small n, formed by combining all the bounds already mentioned in this post and comments.]

[Update, Feb 8: A more complete (and up-to-date) spreadsheet can now be found here.]

 n $c_n$ $c'_n$ $c''_n$ 0 1 1 1 1 2 2 2 2 6 6 4 3 18 16 9 4 52 43 20 5 [150,154] [96,129] 45 6 [450,462] [258,387] 112 7 [1302,1386] [688,1161] [236,292] 8 [3780,4158] [1849,3483] [472,773] 9 [11340,12474] [4128,10449] [1008,2075] 10 [32864,37422] [11094,31347] [2240,5632] 11 [95964,112266] [29584,94041] [5040,15425] 12 [287892,336798] [79507,282123] [12544,42569] 13 [837850,1010394] [175504,846369] [26432,118237] 14 [2458950,3031182] [477042,2539107] [52864,330222] 15 [7376850,9093946] [1272112,7617321] [112896,926687] $\to \infty$ ${}[3^{n-O(\sqrt{\log n})}, o(3^n)]$ ${}[\Omega(\frac{3^n}{\sqrt{n}}), o(3^n)]$ ${}[(2.21\ldots)^n, O(\frac{3^n}{n})]$
• green – obtained via the inequality (1)
• orange – obtained via the inequality (2)
• lavender – uses the inequality $c''_n \leq (3c''_{n-1}+1)/(1 + c''_{n-1}/3^{n-1})$

— V. Szemeredi’s proof of Roth’s theorem —

Szemeredi provided a short proof of Roth’s theorem by purely combinatorial means, and also famously proved the full case of Szemeredi’s theorem by much more intricate combinatorial argument.  Both arguments are based on the density increment method.  In the setting of ${}^n$, the density increment method revolves around the critical density $\alpha = \lim_{n \to \infty} c_n / 3^n$, which exists by (2); the density Hales-Jewett theorem is precisely the statement that $\alpha$ vanishes, so suppose for contradiction that $\alpha$ is positive.

Then one can find a line-free set $A \subset ^n$ of density $\alpha + o_{n \to \infty}(1)$ such that A has density at most $\alpha + o_{m \to \infty}(1)$ for every m-dimensional subspace of ${}^n$ (and one can complement this lower bound with an upper bound for “most” subspaces, see Tao.100).  One can also regularise the density on other sets than subspaces, see Tao.121).  So, basically, the moment one gets a significant density increment on a non-trivial subspace, one is done (see Tao.120, O’Donnell.122, Tao.127).

Szemeredi’s proof of Roth’s theorem proceeds by first showing that a dense set A contains a large cube Q, and thus (if it is free of algebraic lines) must completely avoid the set $2.Q - A$.  This set contains (most of) lots of parallel subspaces and so squeezes A to have higher density on some other subspace, closing the argument.  (See Tao.86, Tao.118, Tao.131)

Question V.A: Is there some way to adapt this argument to density Hales-Jewett, or at least to the corners problem?  The sparsity of lines is a serious difficulty, see Tao.118, Solymosi.135.

One can at least generate lots of cubes inside A without difficulty (Tao.102, Solymosi.135), lots of combinatorial lines with two elements inside A (Tao.130, Bukh.132, O’Donnell.133, Solymosi.135), and many unorganised subspaces in the complement of A (Tao.118, Solymosi.135), but this does not seem to be enough as yet.

Question V.B: What about using the techniques from Szemeredi’s big paper on Szemeredi’s theorem?  (The immediate issue is that there is a key point where one needs to regularise a graph, and the relevant graph in HJ or even for corners is far too sparse; I’ll try to clarify this at some point.)

— A note on comments —

As in the first thread, please number your comments (starting with 200, then 201, etc.) and provide a title, to assist with following the comments.  (Presumably, in future projects of this type, we will use a platform that allows for comment threading; see this post for further discussion.)