A family {A_1,\dots,A_r} of sets for some {r \geq 1} is a sunflower if there is a core set {A_0} contained in each of the {A_i} such that the petal sets {A_i \backslash A_0, i=1,\dots,r} are disjoint. If {k,r \geq 1}, let {\mathrm{Sun}(k,r)} denote the smallest natural number with the property that any family of {\mathrm{Sun}(k,r)} distinct sets of cardinality at most {k} contains {r} distinct elements {A_1,\dots,A_r} that form a sunflower. The celebrated Erdös-Rado theorem asserts that {\mathrm{Sun}(k,r)} is finite; in fact Erdös and Rado gave the bounds

\displaystyle  (r-1)^k \leq \mathrm{Sun}(k,r) \leq (r-1)^k k! + 1. \ \ \ \ \ (1)
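
As a quick illustration of the definition (not part of the argument), here is a minimal Python check; it uses the equivalent formulation that a family is a sunflower precisely when every pairwise intersection equals the common intersection of the family, with that common intersection serving as the core.

```python
from itertools import combinations

def is_sunflower(family):
    # A family is a sunflower iff, taking the core to be the common
    # intersection, the petals A_i \ core are pairwise disjoint; equivalently,
    # every pairwise intersection equals the common intersection.
    family = [frozenset(A) for A in family]
    if len(family) <= 1:
        return True  # a single set is trivially a sunflower
    core = frozenset.intersection(*family)
    return all(A & B == core for A, B in combinations(family, 2))

# Any two sets form a sunflower; three sets need not.
assert is_sunflower([{1, 2}, {2, 3}])
assert is_sunflower([{1, 2, 3}, {1, 4, 5}, {1, 6, 7}])   # core {1}, disjoint petals
assert not is_sunflower([{1, 2}, {2, 3}, {3, 1}])        # pairwise intersections differ
```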

The sunflower conjecture asserts in fact that the upper bound can be improved to {\mathrm{Sun}(k,r) \leq O(1)^k r^k}. This remains open at present despite much effort (including a Polymath project); after a long series of improvements to the upper bound, the best general bound known currently is

\displaystyle  \mathrm{Sun}(k,r) \leq O( r \log(kr) )^k \ \ \ \ \ (2)

for all {k,r \geq 2}, established in 2019 by Rao (building upon a breakthrough obtained a month earlier by Alweiss, Lovett, Wu, and Zhang). Here we exclude the easy cases {k=1} and {r=1} in order to make the logarithmic factor {\log(kr)} a little cleaner.

Rao’s argument used the Shannon noiseless coding theorem. It turns out that the argument can be arranged in the very slightly different language of Shannon entropy, and I would like to present it here. The argument proceeds by locating the core and petals of the sunflower separately (this strategy is also followed in Alweiss-Lovett-Wu-Zhang). In both cases the following definition will be key. In this post all random variables, such as random sets, will be understood to be discrete random variables taking values in a finite range. We always use boldface symbols to denote random variables, and non-boldface for deterministic quantities.

Definition 1 (Spread set) Let {R > 1}. A random set {{\bf A}} is said to be {R}-spread if one has

\displaystyle  {\mathbb P}( S \subset {\bf A}) \leq R^{-|S|}

for all sets {S}. A family {(A_i)_{i \in I}} of sets is said to be {R}-spread if {I} is non-empty and the random variable {A_{\bf i}} is {R}-spread, where {{\bf i}} is drawn uniformly from {I}.
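
This definition is easy to test by brute force for small families; the following Python sketch (not from the post) computes the largest {R} for which a given finite family is {R}-spread. Only non-empty subsets of members of the family need to be examined, since any other set is contained in no {A_i}.

```python
from itertools import combinations

def spread(family):
    # Largest R such that the family (A_i) is R-spread, i.e.
    # Pr_i(S ⊆ A_i) <= R^{-|S|} for every non-empty set S.
    family = [frozenset(A) for A in family]
    n = len(family)
    best = float("inf")
    seen = set()
    for A in family:
        for size in range(1, len(A) + 1):
            for combo in combinations(sorted(A), size):
                S = frozenset(combo)
                if S in seen:
                    continue
                seen.add(S)
                p = sum(S <= B for B in family) / n
                best = min(best, p ** (-1.0 / len(S)))
    return best

# All 2-element subsets of {0,...,9}: a given point lies in 9 of the 45 pairs
# (probability 1/5), while each pair itself occurs once (probability 1/45,
# giving the weaker constraint R <= 45**0.5 ≈ 6.7), so the family is 5-spread.
pairs = [set(p) for p in combinations(range(10), 2)]
print(spread(pairs))   # 5.0 (up to floating point)
```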

The core can then be selected greedily in such a way that the remainder of a family becomes spread:

Lemma 2 (Locating the core) Let {(A_i)_{i \in I}} be a family of subsets of a finite set {X}, each of cardinality at most {k}, and let {R > 1}. Then there exists a “core” set {S_0} of cardinality at most {k} such that the set

\displaystyle  J := \{ i \in I: S_0 \subset A_i \} \ \ \ \ \ (3)

has cardinality at least {R^{-|S_0|} |I|}, and such that the family {(A_j \backslash S_0)_{j \in J}} is {R}-spread. Furthermore, if {|I| > R^k} and the {A_i} are distinct, then {|S_0| < k}.

Proof: We may assume {I} is non-empty, as the claim is trivial otherwise. For any {S \subset X}, define the quantity

\displaystyle  Q(S) := R^{|S|} |\{ i \in I: S \subset A_i\}|,

and let {S_0} be a subset of {X} that maximizes {Q(S)}. Since {Q(\emptyset) = |I| > 0} and {Q(S)=0} when {|S| >k}, we see that {0 \leq |S_0| \leq k}. If the {A_i} are distinct and {|I| > R^k}, then a set {S} of cardinality {k} is contained in at most one of the {A_i} (namely {A_i = S}), so we also have {Q(S) \leq R^k < |I| = Q(\emptyset)} when {|S|=k}, thus in this case we have {|S_0| < k}.

Let {J} be the set (3). Since {Q(S_0) \geq Q(\emptyset)>0}, {J} is non-empty. It remains to check that the family {(A_j \backslash S_0)_{j \in J}} is {R}-spread. But for any {S \subset X} and {{\bf j}} drawn uniformly at random from {J} one has

\displaystyle  {\mathbb P}( S \subset A_{\bf j} \backslash S_0 ) = \frac{|\{ i \in I: S_0 \cup S \subset A_i\}|}{|\{ i \in I: S_0 \subset A_i\}|} = R^{|S_0|-|S_0 \cup S|} \frac{Q(S_0 \cup S)}{Q(S_0)}.

Observe that {Q(S_0 \cup S) \leq Q(S_0)} by the maximality of {S_0}, and the probability is only non-zero when {S, S_0} are disjoint, so that {|S_0|-|S_0 \cup S| = - |S|}. The claim follows. \Box
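
As a concrete illustration of this greedy selection (not needed for the argument), here is a small Python sketch; the enumeration is exponential in {k}, so it is only suitable for toy inputs, and the example family at the end is made up purely for illustration.

```python
from itertools import combinations

def locate_core(family, R):
    # Greedy core selection following the proof of Lemma 2: pick S0 maximizing
    # Q(S) = R^{|S|} * #{i : S ⊆ A_i}.  The maximizer is the empty set or a
    # subset of one of the A_i (any other S has Q(S) = 0), so those are the
    # only candidates that need to be enumerated.
    family = [frozenset(A) for A in family]
    candidates = {frozenset()}
    for A in family:
        for size in range(1, len(A) + 1):
            candidates.update(frozenset(c) for c in combinations(sorted(A), size))
    Q = lambda S: R ** len(S) * sum(S <= A for A in family)
    S0 = max(candidates, key=Q)
    J = [i for i, A in enumerate(family) if S0 <= A]
    return S0, J

family = [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 6, 7}, {8, 9, 10}]
S0, J = locate_core(family, R=2.0)
print(sorted(S0), J)   # [1, 2] [0, 1, 2]: Q({1,2}) = 4*3 = 12 beats Q(∅) = 5 and Q({1}) = 8
```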

In view of the above lemma, the bound (2) will then follow from

Proposition 3 (Locating the petals) Let {r, k \geq 2} be natural numbers, and suppose that {R \geq C r \log(kr)} for a sufficiently large constant {C}. Let {(A_i)_{i \in I}} be a finite family of subsets of a finite set {X}, each of cardinality at most {k}, which is {R}-spread. Then there exist {i_1,\dots,i_r \in I} such that {A_{i_1},\dots,A_{i_r}} are disjoint.

Indeed, to prove (2), we assume that {(A_i)_{i \in I}} is a family of more than {R^k} distinct sets of cardinality at most {k}, for some {R \geq Cr \log(kr)}; by discarding redundant elements and sets we may assume that {I} is finite and that all the {A_i} are contained in a common finite set {X}. Apply Lemma 2 to find a set {S_0 \subset X} of cardinality {|S_0| < k} such that the family {(A_j \backslash S_0)_{j \in J}} is {R}-spread. Since the {A_j} are distinct, at most one index {j \in J} can have {A_j = S_0}; discarding that index from {J} if necessary (which at worst degrades the spread constant from {R} to {R/2}, a loss that can be absorbed by enlarging {C}), we may assume that the sets {A_j \backslash S_0}, {j \in J}, are all non-empty. By Proposition 3 we can find {j_1,\dots,j_r \in J} such that {A_{j_1} \backslash S_0,\dots,A_{j_r} \backslash S_0} are disjoint; since these sets are non-empty, this implies that the {j_1,\dots,j_r} are distinct. Hence {A_{j_1},\dots,A_{j_r}} form a sunflower as required.

Remark 4 Proposition 3 is easy to prove if we strengthen the condition on {R} to {R > k(r-1)}. In this case, we have {\mathop{\bf P}_{i \in I}( x \in A_i) < 1/(k(r-1))} for every {x \in X}, hence by the union bound we see that for any {i_1,\dots,i_j \in I} with {j \leq r-1} there exists {i_{j+1} \in I} such that {A_{i_{j+1}}} is disjoint from the set {A_{i_1} \cup \dots \cup A_{i_j}}, which has cardinality at most {k(r-1)}. Iterating this, we obtain the conclusion of Proposition 3 in this case. This recovers a bound of the form {\mathrm{Sun}(k,r) \leq (k(r-1))^k+1}, and by pursuing this idea a little further one can recover the original upper bound (1) of Erdös and Rado.
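
The greedy iteration of Remark 4 is easy to run directly; here is a minimal Python sketch (the family of all {2}-element subsets of a {10}-element set, which is {5}-spread, is used only as a toy input).

```python
from itertools import combinations

def greedy_disjoint(family, r):
    # Greedy argument of Remark 4: repeatedly pick a set disjoint from the
    # union of the sets chosen so far.  When the family is R-spread with
    # R > k(r-1) the union bound guarantees this never gets stuck; here we
    # simply report failure if it does.
    family = [frozenset(A) for A in family]
    chosen, used = [], frozenset()
    for _ in range(r):
        for A in family:
            if not (A & used):
                chosen.append(A)
                used |= A
                break
        else:
            return None   # stuck: the family was not spread enough
    return chosen

pairs = [frozenset(p) for p in combinations(range(10), 2)]   # 5-spread, k = 2
print(greedy_disjoint(pairs, r=3))   # e.g. {0,1}, {2,3}, {4,5}, since 5 > k(r-1) = 4
```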

It remains to prove Proposition 3. In fact we can locate the petals one at a time, placing each petal inside a random set.

Proposition 5 (Locating a single petal) Let the notation and hypotheses be as in Proposition 3. Let {{\bf V}} be a random subset of {X}, such that each {x \in X} lies in {{\bf V}} with an independent probability of {1/r}. Then with probability greater than {1-1/r}, {{\bf V}} contains one of the {A_i}.

To see that Proposition 5 implies Proposition 3, we randomly partition {X} into {{\bf V}_1 \cup \dots \cup {\bf V}_r} by placing each {x \in X} into one of the {{\bf V}_j}, {j=1,\dots,r} chosen uniformly and independently at random. By Proposition 5 and the union bound, we see that with positive probability, it is simultaneously true for all {j=1,\dots,r} that each {{\bf V}_j} contains one of the {A_i}. Selecting one such {A_i} for each {{\bf V}_j}, we obtain the required disjoint petals.
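
This reduction can also be illustrated in code. The following Monte Carlo sketch (not part of the proof; the family of all {3}-element subsets of a {30}-element set is just a convenient, very spread, toy example) colours the ground set with {r} colours and looks for a member of the family inside each colour class.

```python
import random
from itertools import combinations

def petals_by_random_partition(family, r, X, trials=100):
    # Reduction of Proposition 3 to Proposition 5: colour each element of X
    # with one of r colours uniformly at random and look for a member of the
    # family inside each colour class; sets found in different classes are
    # automatically disjoint.
    family = [frozenset(A) for A in family]
    for _ in range(trials):
        colour = {x: random.randrange(r) for x in X}
        chosen = []
        for j in range(r):
            Vj = {x for x in X if colour[x] == j}
            inside = [A for A in family if A <= Vj]
            if not inside:
                break
            chosen.append(random.choice(inside))
        else:
            return chosen
    return None

X = range(30)
triples = [frozenset(t) for t in combinations(X, 3)]   # a very spread family
print(petals_by_random_partition(triples, r=3, X=X))
```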

We will prove Proposition 5 by gradually increasing the density of the random set and arranging the sets {A_i} to get quickly absorbed by this random set. The key iteration step is

Proposition 6 (Refinement inequality) Let {R > 1} and {0 < \delta < 1}. Let {{\bf A}} be a random subset of a finite set {X} which is {R}-spread, and let {{\bf V}} be a random subset of {X} independent of {{\bf A}}, such that each {x \in X} lies in {{\bf V}} with an independent probability of {\delta}. Then there exists another {R}-spread random subset {{\bf A}'} of {X} whose support is contained in the support of {{\bf A}}, such that {{\bf A}' \backslash {\bf V} \subset {\bf A}} and

\displaystyle  {\mathbb E} |{\bf A}' \backslash {\bf V}| \leq \frac{5}{\log(R\delta)} {\mathbb E} |{\bf A}|.

Note that a direct application of the first moment method gives only the bound

\displaystyle  {\mathbb E} |{\bf A} \backslash {\bf V}| \leq (1-\delta) {\mathbb E} |{\bf A}|,

but the point is that by switching from {{\bf A}} to an equivalent {{\bf A}'} we can replace the {1-\delta} factor by a quantity significantly smaller than {1}.

One can iterate the above proposition, repeatedly replacing {{\bf A}, X} with {{\bf A}' \backslash {\bf V}, X \backslash {\bf V}} (noting that {{\bf A}' \backslash {\bf V}} is still {R}-spread, since {{\mathbb P}( S \subset {\bf A}' \backslash {\bf V}) \leq {\mathbb P}( S \subset {\bf A}') \leq R^{-|S|}} for every {S}) to conclude

Corollary 7 (Iterated refinement inequality) Let {R > 1}, {0 < \delta < 1}, and {m \geq 1}. Let {{\bf A}} be a random subset of a finite set {X} which is {R}-spread, and let {{\bf V}} be a random subset of {X} independent of {{\bf A}}, such that each {x \in X} lies in {{\bf V}} with an independent probability of {1-(1-\delta)^m}. Then there exists another random subset {{\bf A}'} of {X} with support contained in the support of {{\bf A}}, such that

\displaystyle  {\mathbb E} |{\bf A}' \backslash {\bf V}| \leq (\frac{5}{\log(R\delta)})^m {\mathbb E} |{\bf A}|.

Now we can prove Proposition 5. Let {m \geq 1} be a natural number to be chosen shortly. Applying Corollary 7 with {{\bf A}} drawn uniformly at random from the {(A_i)_{i \in I}}, setting {1-(1-\delta)^m = 1/r}, or equivalently {\delta = 1 - (1 - 1/r)^{1/m}}, and using {{\mathbb E} |{\bf A}| \leq k}, we have

\displaystyle  {\mathbb E} |{\bf A}' \backslash {\bf V}| \leq (\frac{5}{\log(R\delta)})^m k.

In particular, if we set {m = \lceil \log kr \rceil}, so that {\delta \sim \frac{1}{r \log kr}}, then by choice of {R} we have {\frac{5}{\log(R\delta)} < \frac{1}{2}}, hence

\displaystyle  {\mathbb E} |{\bf A}' \backslash {\bf V}| < \frac{1}{r}.

In particular, by Markov’s inequality, we have {|{\bf A}' \backslash {\bf V}| = 0} with probability greater than {1 - \frac{1}{r}}; since {{\bf A}'} is supported in the family {(A_i)_{i \in I}}, this means that some {A_i} is contained in {{\bf V}}, giving the proposition.
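
The numerology here is easy to confirm numerically. In the following sketch the constant {C} is set to {2048}, which is purely an illustrative choice (no value is specified in the argument); logarithms are base two, as in the rest of the post.

```python
import math

def petal_bound(k, r, C=2048):
    # With R = C r log2(kr), m = ceil(log2(kr)) and delta chosen so that
    # 1-(1-delta)^m = 1/r, the factor 5/log2(R*delta) drops below 1/2 and the
    # resulting bound on E|A' \ V| is below 1/r.  (C = 2048 is an illustrative
    # choice of the unspecified constant.)
    R = C * r * math.log2(k * r)
    m = math.ceil(math.log2(k * r))
    delta = 1 - (1 - 1 / r) ** (1 / m)
    ratio = 5 / math.log2(R * delta)
    return ratio, ratio ** m * k

for k, r in [(2, 2), (10, 10), (100, 5), (3, 1000)]:
    ratio, bound = petal_bound(k, r)
    print((k, r), ratio < 0.5, bound < 1 / r)   # True, True in each case
```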

It remains to establish Proposition 6. This is the difficult step, and requires a clever way to find the variant {{\bf A}'} of {{\bf A}} that has better containment properties in {{\bf V}} than {{\bf A}} does. The main trick is to make a conditional copy {({\bf A}', {\bf V}')} of {({\bf A}, {\bf V})} that is conditionally independent of {({\bf A}, {\bf V})} subject to the constraint {{\bf A} \cup {\bf V} = {\bf A}' \cup {\bf V}'}. The point here is that this constraint implies the inclusions

\displaystyle  {\bf A}' \backslash {\bf V} \subset {\bf A} \cap {\bf A}' \subset {\bf A} \ \ \ \ \ (4)

and

\displaystyle  {\bf A}' \backslash {\bf A} \subset {\bf V}. \ \ \ \ \ (5)

Because of the {R}-spread hypothesis, it is hard for {{\bf A}} to contain any fixed large set. If we could apply this observation in the contrapositive to {{\bf A} \cap {\bf A}'} we could hope to get a good upper bound on the size of {{\bf A} \cap {\bf A}'} and hence on {{\bf A}' \backslash {\bf V}} thanks to (4). One can also hope to improve such an upper bound by also employing (5), since it is also hard for the random set {{\bf V}} to contain a fixed large set. There are however difficulties with implementing this approach due to the fact that the random sets {{\bf A} \cap {\bf A}', {\bf A}' \backslash {\bf A}} are coupled with {{\bf A}, {\bf V}} in a moderately complicated fashion. In Rao’s argument a somewhat complicated encoding scheme was created to give information-theoretic control on these random variables; below the fold we accomplish a similar effect by using Shannon entropy inequalities in place of explicit encoding. A certain amount of information-theoretic sleight of hand is required to decouple certain random variables to the extent that the Shannon inequalities can be effectively applied. The argument bears some resemblance to the “entropy compression method” discussed in this previous blog post; there may be a way to more explicitly express the argument below in terms of that method. (There is also some kinship with the method of dependent random choice, which is used for instance to establish the Balog-Szemerédi-Gowers lemma, and was also translated into information theoretic language in these unpublished notes of Van Vu and myself.)

— 1. Shannon entropy —

In this section we lay out all the tools from the theory of Shannon entropy that we will need.

Define an empirical sequence for a random variable {{\bf X}} taking values in a discrete set {S} to be a sequence {X_1,X_2,\dots} in {S} such that the empirical samples of this sequence converge in distribution to {{\bf X}} in the sense that

\displaystyle  {\mathbb P}( X_{\bf n} = X ) = {\mathbb P}( {\bf X} = X ) + o(1) \ \ \ \ \ (6)

as {N \rightarrow \infty}, where {{\bf n} = {\bf n}_N} is a uniform random variable drawn from {\{1,\dots,N\}}, and {o(1)} denotes a quantity that goes to zero as {N \rightarrow \infty} (uniformly over other parameters independent of {N}). Every random variable {{\bf X}} has at least one empirical sequence; indeed, from the strong law of large numbers one can find such a sequence almost surely by taking an infinite number of independent trials of {{\bf X}}.

If {{\bf X}} is a random variable taking values in some set {S}, its Shannon entropy {{\mathbb H}({\bf X})} is defined by the formula

\displaystyle  {\mathbb H}({\bf X}) := \sum_{X \in S} {\mathbb P}( {\bf X} = X ) \log \frac{1}{{\mathbb P}({\bf X} = X)}

with the convention {0 \log \frac{1}{0} = 0}, and using the base two logarithm throughout. We can condition the Shannon entropy to events {E} of positive probability,

\displaystyle  {\mathbb H}({\bf X}|E) := \sum_{X \in S} {\mathbb P}( {\bf X} = X |E) \log \frac{1}{{\mathbb P}({\bf X} = X|E)},

and condition the Shannon entropy to other random variables {{\bf Y}} taking values in a set {T} by

\displaystyle  {\mathbb H}({\bf X}|{\bf Y}) := \sum_{Y \in T} {\mathbb P}( {\bf Y} = Y) {\mathbb H}( {\bf X} | {\bf Y} = Y)

with the convention that the summand is zero if {{\mathbb P}({\bf Y} = Y) = 0}.

We record the following standard and easily verified facts:

Lemma 8 (Basic Shannon inequalities) Let {{\bf X}, {\bf Y}, {\bf Z}} be random variables.
  • (i) (Monotonicity) If {{\bf X}} is a deterministic function {{\bf X} = f({\bf Y})} of {{\bf Y}}, then {0 \leq {\mathbb H}({\bf X}) \leq {\mathbb H}({\bf Y})}. More generally, if {{\bf X}} is a deterministic function {{\bf X} = f({\bf Y}, {\bf Z})} of {{\bf Y}} and {{\bf Z}}, then {0 \leq {\mathbb H}({\bf X}|{\bf Z}) \leq {\mathbb H}({\bf Y}|{\bf Z})}. If {{\bf Y}} is a deterministic function of {{\bf Z}}, then {{\mathbb H}({\bf X}|{\bf Z}) \leq {\mathbb H}({\bf X}|{\bf Y})}.
  • (ii) (Subadditivity) One has {{\mathbb H}({\bf X}, {\bf Y}) \leq {\mathbb H}({\bf X})+{\mathbb H}({\bf Y})}, with equality iff {{\bf X}}, {{\bf Y}} are independent. More generally, one has {{\mathbb H}({\bf X}, {\bf Y}|{\bf Z}) \leq {\mathbb H}({\bf X}|{\bf Z})+{\mathbb H}({\bf Y}|{\bf Z})}, with equality iff {{\bf X}}, {{\bf Y}} are conditionally independent with respect to {{\bf Z}}.
  • (iii) (Chain rule) One has {{\mathbb H}({\bf X}, {\bf Y}) = {\mathbb H}({\bf X}) + {\mathbb H}({\bf Y}|{\bf X}) = {\mathbb H}({\bf Y}) + {\mathbb H}({\bf X}|{\bf Y})}. More generally {{\mathbb H}({\bf X}, {\bf Y}|{\bf Z}) = {\mathbb H}({\bf X}|{\bf Z}) + {\mathbb H}({\bf Y}|{\bf X},{\bf Z}) = {\mathbb H}({\bf Y}|{\bf Z}) + {\mathbb H}({\bf X}|{\bf Y},{\bf Z})}. In particular {0 \leq {\mathbb H}({\bf X}|{\bf Y}) \leq {\mathbb H}({\bf X})}, and {{\mathbb H}({\bf X}|{\bf Y}) = {\mathbb H}({\bf X})} iff {{\bf X},{\bf Y}} are independent; similarly, {0 \leq {\mathbb H}({\bf X}|{\bf Y},{\bf Z}) \leq {\mathbb H}({\bf X}|{\bf Z})}, and {{\mathbb H}({\bf X}|{\bf Y},{\bf Z}) = {\mathbb H}({\bf X}|{\bf Z})} iff {{\bf X},{\bf Y}} are conditionally independent with respect to {{\bf Z}}.
  • (iv) (Jensen’s inequality) If {{\bf X}} takes values in a finite set {S} then {{\mathbb H}({\bf X}) \leq \log |S|}, with equality iff {{\bf X}} is uniformly distributed in {S}. More generally, if {{\bf X}} takes values in a set {S_{\bf Y}} that depends on {{\bf Y}}, then {{\mathbb H}({\bf X}|{\bf Y}) \leq {\mathbb E} \log |S_{\bf Y}|}, with equality iff {{\bf X}} is uniformly distributed in {S_{\bf Y}} after conditioning on {{\bf Y}}.
  • (v) (Gibbs inequality) If {{\bf X}, {\bf Y}} take values in the same finite set {S}, then

    \displaystyle  {\mathbb H}({\bf X}) \leq \sum_{X \in S} {\mathbb P}({\bf X} = X) \log \frac{1}{{\mathbb P}({\bf Y}=X)}

    (we permit the right-hand side to be infinite, which makes the inequality vacuously true).

See this previous blog post for some intuitive analogies to understand Shannon entropy.
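
For readers who like to experiment, these quantities are easy to compute from a finite joint distribution; the following Python sketch (purely illustrative) checks a few of the inequalities of Lemma 8 on a randomly generated joint law of three bits.

```python
import math, random
from collections import defaultdict

def H(joint, coords):
    # Shannon entropy (base 2) of the marginal of `joint` on the given
    # coordinates; `joint` maps tuples of outcomes to probabilities.
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[c] for c in coords)] += p
    return sum(p * math.log2(1 / p) for p in marg.values() if p > 0)

def Hc(joint, coords, given):
    # Conditional entropy H(coords | given), computed from the definition as
    # the average over the conditioning variable of the conditioned entropies.
    py, pxy = defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        x = tuple(outcome[c] for c in coords)
        y = tuple(outcome[c] for c in given)
        py[y] += p
        pxy[(x, y)] += p
    return sum(p * math.log2(py[y] / p) for (x, y), p in pxy.items() if p > 0)

# A random joint distribution of (X, Y, Z) on {0,1}^3.
outcomes = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
w = [random.random() for _ in outcomes]
total = sum(w)
joint = {o: wi / total for o, wi in zip(outcomes, w)}
X, Y, Z = [0], [1], [2]

assert H(joint, X + Y) <= H(joint, X) + H(joint, Y) + 1e-9           # subadditivity (ii)
assert abs(H(joint, X + Y) - H(joint, X) - Hc(joint, Y, X)) < 1e-9   # chain rule (iii)
assert Hc(joint, X, Y + Z) <= Hc(joint, X, Z) + 1e-9                 # conditioning reduces entropy
```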

Now we establish some inequalities of relevance to random sets.

We first observe that any small random set largely determines any of its subsets. Define a random subset of a random set {{\bf B}} to be a random set {{\bf A}} such that {{\bf A} \subset {\bf B}} holds almost surely.

Lemma 9 (Subsets of small sets have small conditional entropy) Let {{\bf B}} be a random finite set.
  • (i) One has {{\mathbb H}({\bf A}|{\bf B}) \leq {\mathbb E} |{\bf B}|} for any random subset {{\bf A}} of {{\bf B}}.
  • (ii) One has {{\mathbb H}( |{\bf B}| ) \leq 1+{\mathbb E} |{\bf B}|}. If {{\bf B}} is almost surely non-empty, we can improve this to {{\mathbb H}( |{\bf B}| ) \leq {\mathbb E} |{\bf B}|}.

Proof: The set {{\bf A}} takes values in the power set {2^{{\bf B}}} of {{\bf B}}, so the claim (i) follows from Lemma 8(iv). (Note how it is convenient here that we are using the base {2} for the logarithm.)

For (ii), apply Lemma 8(v) with {{\bf X} = |{\bf B}|} and {{\bf Y}} the geometric random variable {{\mathbb P}({\bf Y} = n) = 2^{-n-1}} for natural numbers {n} (or {{\mathbb P}({\bf Y} = n) = 2^{-n}} for positive {n}, if {{\bf B}} is non-empty). \Box
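
As a quick numerical sanity check of Lemma 9(ii) (nothing more), one can compare the entropy of an arbitrary finitely supported size distribution with {1 + {\mathbb E} |{\bf B}|}:

```python
import math, random

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Lemma 9(ii): for any law of |B| on {0, 1, 2, ...}, H(|B|) <= 1 + E|B|.
for _ in range(5):
    w = [random.random() for _ in range(8)]
    probs = [x / sum(w) for x in w]            # probs[n] = P(|B| = n)
    mean = sum(n * p for n, p in enumerate(probs))
    assert entropy(probs) <= 1 + mean + 1e-9
```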

Now we encode the property of a random variable {{\bf A}} being {R}-spread in the language of Shannon entropy.

Lemma 10 (Information-theoretic interpretation of spread) Let {{\bf A}} be a random finite set that is {R}-spread for some {R \geq 1}.
  • (i) If {{\bf A}} is uniformly distributed amongst some finite collection {{\mathcal A}} of sets, then

    \displaystyle  {\mathbb H}({\bf A} | {\bf S} ) \leq {\mathbb H}({\bf A}) - (\log R) {\mathbb E} |{\bf S}|

    for all random subsets {{\bf S}} of {{\bf A}}.
  • (ii) In the general case, if {A_1, A_2, A_3,\dots} are an empirical sequence of {{\bf A}}, then

    \displaystyle  {\mathbb H}({\bf n} | {\bf S} ) \leq {\mathbb H}({\bf n}) - (\log R) {\mathbb E} |{\bf S}| - o(1)

    as {N \rightarrow \infty}, where {{\bf n}} is drawn uniformly from {\{1,\dots,N\}} and {{\bf S}} is a random subset of {A_{\bf n}}.

Informally: large random subsets of an {R}-spread set {{\bf A}} necessarily have a lot of mutual information with {{\bf A}}. Conversely, one can bound the size of a random subset of an {R}-spread set by bounding its mutual information with {{\bf A}}.

Proof: In case (i), it suffices by Lemma 8(iv) to establish the bound

\displaystyle  {\mathbb H}({\bf A} | {\bf S} = S ) \leq \log |{\mathcal A}| - (\log R) |S|

for each finite set {S} in the range of {{\bf S}}. But as {{\bf A}} is {R}-spread and uniformly distributed in {{\mathcal A}}, there are at most {R^{-|S|} |{\mathcal A}|} elements of {{\mathcal A}} that contain {S}, and the claim follows from a second appeal to Lemma 8(iv). Case (ii) follows from case (i) thanks to (6). \Box

Given a finite non-empty set {X} and {0 \leq k \leq |X|}, let {\binom{X}{k}} denote the collection of {k}-element subsets of {X}. A uniformly chosen element {{\bf W}} of {\binom{X}{k}} is thus a random {k}-element subset of {X}; we refer to the quantity {\delta = k/|X|} as the density of this random subset, and {{\bf W}} as a uniformly chosen random subset of {X} of density {\delta}. (Of course, this is only defined when {0 \leq \delta \leq 1} is an integer multiple of {1/|X|}.) Uniformly chosen random sets have the following information-theoretic relationships to small random sets:

Lemma 11 Let {X} be a finite non-empty set, let {{\bf W}} be a uniformly chosen random subset of {X} of some density {0 \leq \delta \leq 1} (which is a multiple of {1/|X|}).
  • (i) (Spread) If {{\bf S}} is a random subset of {{\bf W}}, then

    \displaystyle  {\mathbb H}({\bf W} | {\bf S} ) \leq {\mathbb H}({\bf W}) - (\log \frac{1}{\delta}) {\mathbb E} |{\bf S}|.

  • (ii) (Absorption) If {{\bf S}} is a random subset of {X}, then

    \displaystyle  {\mathbb H}({\bf W} \cup {\bf S} ) \leq {\mathbb H}({\bf W}) + 1 + (1+\log \frac{1}{\delta}) {\mathbb E} |{\bf S}|.

Proof: To prove (i), it suffices by Lemma 10(i) to show that {{\bf W}} is {\frac{1}{\delta}}-spread, which amounts to showing that

\displaystyle  \binom{|X \backslash S|}{\delta|X| - |S|} \leq \delta^{|S|} \binom{|X|}{\delta |X|}

for all subsets {S \subset X} with {0 \leq |S| \leq \delta |X|}. But this comes from multiplying together the inequalities {\frac{\delta |X|-i}{|X|-i} \leq \delta} for {0 \leq i < |S|}.

For (ii), by replacing {{\bf S}} with {{\bf S} \backslash {\bf W}} we may assume that {{\bf S}, {\bf W}} are disjoint. From Lemma 8(iii) and Lemma 9(ii) it suffices to show that

\displaystyle  {\mathbb H}({\bf W} \cup {\bf S} | |{\bf S}| ) \leq {\mathbb H}({\bf W}) + \log \frac{1}{\delta} {\mathbb E} |{\bf S}|

which in turn is implied by

\displaystyle  {\mathbb H}({\bf W} \cup {\bf S} | |{\bf S}| = k ) \leq {\mathbb H}({\bf W}) + \log \frac{1}{\delta} k

for each {0 \leq k \leq |X| - \delta |X|}. By Lemma 8(iv) it suffices to show that

\displaystyle  \binom{|X|}{\delta |X|+k} \leq \delta^{-k} \binom{|X|}{\delta |X|}

but this follows from multiplying together the inequalities {\frac{|X|+i}{\delta |X|+i}\leq \frac{1}{\delta}} for {1 \leq i \leq k}. \Box
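
The two counting inequalities used in this proof are easy to confirm numerically for small parameters; the following sketch does so with exact binomial coefficients (the parameter choices are arbitrary, subject to {|S| \leq \delta |X|} and {k \leq |X| - \delta |X|}).

```python
from math import comb

def lemma11_counting_bounds(n, d, s, k):
    # The two binomial-coefficient inequalities from the proof of Lemma 11,
    # with |X| = n, density delta = d/n, |S| = s (s <= d) and shift k (k <= n-d).
    delta = d / n
    containment = comb(n - s, d - s) <= delta ** s * comb(n, d) + 1e-9
    absorption = comb(n, d + k) <= delta ** (-k) * comb(n, d) + 1e-9
    return containment, absorption

for n, d, s, k in [(10, 3, 2, 4), (20, 5, 5, 10), (12, 6, 0, 6)]:
    print((n, d, s, k), lemma11_counting_bounds(n, d, s, k))   # (True, True) each time
```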

The following “relative product” construction will be important for us. Given a random variable {{\bf X}} and a deterministic function {f({\bf X})} of that variable, one can construct a conditionally independent copy {{\bf X}'} of {{\bf X}} subject to the condition {f({\bf X})=f({\bf X}')}, with the joint distribution

\displaystyle  {\mathbb P}( ({\bf X}, {\bf X}') = (X,X') ) = 1_{f(X)=f(X')} {\mathbb P}( f({\bf X}) = f(X) ) {\mathbb P}( {\bf X} = X | f({\bf X}) = f(X) ) {\mathbb P}( {\bf X} = X' | f({\bf X}) = f(X) ).

(Note that this is usually not the same as starting with a completely independent copy {{\bf X}'} of {{\bf X}} and then conditioning to the event {f({\bf X}) = f({\bf X}')}; cf. Simpson’s paradox.) By construction, {{\bf X}'} has the same distribution as {{\bf X}}, and is conditionally independent of {{\bf X}} relative to {f({\bf X})}. In particular, from Lemma 8 we have

\displaystyle  {\mathbb H}( {\bf X} | {\bf X}' ) = {\mathbb H}( {\bf X} | f({\bf X}) ) = {\mathbb H}({\bf X}) - {\mathbb H}(f({\bf X})) \ \ \ \ \ (7)

which we can also write as an “entropy Cauchy-Schwarz identity”

\displaystyle  {\mathbb H}({\bf X}, {\bf X}' ) = 2 {\mathbb H}({\bf X}) - {\mathbb H}(f({\bf X})).

This can be compared with the combinatorial inequality

\displaystyle  |\{ (x,x') \in X^2: f(x)=f(x') \}| \geq \frac{|X|^2}{|f(X)|}

or equivalently

\displaystyle  \log |\{ (x,x') \in X^2: f(x)=f(x') \}| \geq 2 \log |X| - \log |f(X)|

whenever {f: X \rightarrow Y} is a function on a non-empty finite set {X}, which is easily proven as a consequence of Cauchy-Schwarz. One nice advantage of the entropy formalism over the combinatorial one is that the analogue of this instance of the Cauchy-Schwarz inequality automatically becomes an equality (this is related to the asymptotic equipartition property in the microstates interpretation of entropy).
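
The relative product construction is concrete enough to simulate: the following Python sketch (an illustration only) builds the joint law of {({\bf X}, {\bf X}')} from the displayed formula and confirms the entropy Cauchy-Schwarz identity on a toy example, a uniform variable on six points with {f} the parity map.

```python
import math
from collections import defaultdict

def entropy(dist):
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def relative_product(px, f):
    # Joint law of (X, X'), where X' is a conditionally independent copy of X
    # subject to f(X) = f(X'), following the displayed formula above.
    pf = defaultdict(float)
    for x, p in px.items():
        pf[f(x)] += p
    joint = defaultdict(float)
    for x, p in px.items():
        for x2, p2 in px.items():
            if f(x) == f(x2):
                joint[(x, x2)] += p * p2 / pf[f(x)]   # = P(f=v) P(x|f=v) P(x'|f=v)
    return joint

px = {x: 1 / 6 for x in range(6)}      # X uniform on {0,...,5}
f = lambda x: x % 2                     # f(x) = x mod 2
joint = relative_product(px, f)
pf = defaultdict(float)
for x, p in px.items():
    pf[f(x)] += p

lhs = entropy(joint)                    # H(X, X')
rhs = 2 * entropy(px) - entropy(pf)     # 2 H(X) - H(f(X))
print(lhs, rhs)                         # both equal log2(18) ≈ 4.1699
```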

— 2. Proof of refinement inequality —

Now we have enough tools to prove Proposition 6. Let {R, \delta, {\bf A}, {\bf V}} be as in that proposition. If {\emptyset} is in the support of {{\bf A}}, we can simply take {{\bf A}' = \emptyset} (all the required properties are then trivial); thus we may assume that {{\bf A}} is never empty. In particular

\displaystyle  {\mathbb E} |{\bf A}| \geq 1. \ \ \ \ \ (8)

In order to use Lemma 10 we fix an empirical sequence {A_1,A_2,\dots} for {{\bf A}}. We relabel {X} as {\{1,\dots,|X|\}}, and let {N \geq |X|} be a parameter going off to infinity (so in particular {X} is identified with a subset of {\{1,\dots,N\}}). We let {{\bf n}} be drawn uniformly at random from {\{1,\dots,N\}}, and let {{\bf W}} be a uniform random subset of {\{1,\dots,N\}} of density {\lfloor \delta N\rfloor/N = \delta + o(1)} independent of {{\bf n}}. Observe from Stirling’s formula that {{\bf W} \cap X} converges in distribution to {{\bf V}}. Thus it will suffice to find another random variable {{\bf n}'}, uniformly distributed on {\{1,\dots,N\}}, such that

\displaystyle  {\mathbb E} |A_{{\bf n}'} \backslash {\bf W}| \leq \frac{5}{\log(R\delta)} {\mathbb E} |A_{{\bf n}}| + o(1) \ \ \ \ \ (9)

as {N \rightarrow \infty}, since we can pass to a subsequence in which {(A_{{\bf n}'}, {\bf W} \cap X)} converges in distribution to {({\bf A}, {\bf V})}. From (8) we have

\displaystyle  {\mathbb E} |A_{\bf n}| \geq 1 - o(1). \ \ \ \ \ (10)

From {{\bf n}, {\bf W}} we can form the random set {A_{\bf n} \cup {\bf W}}; we then form a conditionally independent copy {({\bf n}', {\bf W}')} of {({\bf n}, {\bf W})} subject to the constraint

\displaystyle  A_{{\bf n}} \cup {\bf W} = A_{{\bf n}'} \cup {\bf W}'. \ \ \ \ \ (11)

We use {{\bf n}'} as the uniform variable to establish (9). The point is that the relation (11) implies that

\displaystyle  A_{{\bf n}'} \backslash {\bf W} \subset A_{{\bf n}} \cap A_{{\bf n}'}

so it will suffice to show that

\displaystyle  {\mathbb E} |A_{{\bf n}} \cap A_{{\bf n}'}| \leq \frac{5}{\log(R\delta)} {\mathbb E} |A_{{\bf n}}| + o(1) \ \ \ \ \ (12)

By Lemma 11(ii), (10) we have

\displaystyle  {\mathbb H}(A_{\bf n} \cup {\bf W}) \leq {\mathbb H}({\bf W}) + (2 + \log \frac{1}{\delta}) {\bf E} |A_{\bf n}| + o(1),

and hence by (7)

\displaystyle  {\mathbb H}( {\bf n}, {\bf W} | {\bf n}', {\bf W}' ) \geq {\mathbb H}({\bf n}, {\bf W}) - {\mathbb H}({\bf W}) - (2 + \log \frac{1}{\delta}) {\bf E} |A_{\bf n}| - o(1)

and hence by Lemma 8(ii) and independence of {{\bf n}, {\bf W}}

\displaystyle  {\mathbb H}({\bf n}) \leq {\mathbb H}( {\bf n}, {\bf W} | {\bf n}', {\bf W}' ) + (2 + \log \frac{1}{\delta}) {\bf E} |A_{\bf n}| + o(1).

Now we try to relate the first term on the right-hand side with {A_{{\bf n}} \cap A_{{\bf n}'}}. Note from (11) that we have the identity

\displaystyle  {\bf W} = (A_{{\bf n}'} \cup {\bf W}') \backslash (A_{\bf n} \backslash {\bf W})

and hence by Lemma 8(i)

\displaystyle  {\mathbb H}( {\bf n}, {\bf W} | {\bf n}', {\bf W}' ) \leq {\mathbb H}( {\bf n}, A_{\bf n} \backslash {\bf W} | {\bf n}', {\bf W}' ).

We estimate the conditional entropy of {{\bf n}, A_{\bf n} \backslash {\bf W}} here by selecting first {A_{\bf n} \cap A_{{\bf n}'}}, then {{\bf n}}, then {A_{\bf n} \backslash {\bf W}}. More precisely, using the chain rule and monotonicity (Lemma 8(i), (iii)) we have

\displaystyle  {\mathbb H}( {\bf n}, A_{\bf n} \backslash {\bf W} | {\bf n}', {\bf W}' ) \leq {\mathbb H}( A_{\bf n} \cap A_{{\bf n}'} | {\bf n}', {\bf W}' )

\displaystyle  + {\mathbb H}( {\bf n} | A_{\bf n} \cap A_{{\bf n}'}, {\bf n}', {\bf W}' ) + {\mathbb H}( A_{\bf n} \backslash {\bf W} | {\bf n}, A_{\bf n} \cap A_{{\bf n}'}, {\bf n}', {\bf W}' ).

From Lemma 9(i) we have

\displaystyle {\mathbb H}( A_{\bf n} \cap A_{{\bf n}'} | {\bf n}', {\bf W}' ) \leq {\mathbb H}( A_{\bf n} \cap A_{{\bf n}'} | A_{{\bf n}'} ) \leq {\mathbb E} |A_{\bf n}|

and

\displaystyle  {\mathbb H}( A_{\bf n} \backslash {\bf W} | {\bf n}, A_{\bf n} \cap A_{{\bf n}'}, {\bf n}', {\bf W}' ) \leq {\mathbb H}( A_{\bf n} \backslash {\bf W} | A_{{\bf n}} ) \leq {\mathbb E} |A_{\bf n}|.

Putting all this together, we conclude

\displaystyle  {\mathbb H}({\bf n}) \leq {\mathbb H}( {\bf n} | A_{\bf n} \cap A_{{\bf n}'}, {\bf n}', {\bf W}' ) + (4 + \log \frac{1}{\delta}) {\bf E} |A_{\bf n}| + o(1).

If we apply Lemma 10(ii) right away we will get the estimate

\displaystyle  (\log R) {\bf E} |A_{\bf n} \cap A_{{\bf n}'}| \leq (4 + \log \frac{1}{\delta}) {\bf E} |A_{\bf n}| + o(1)

which is a bound resembling (12), but the dependence on the parameters {\delta, R} is too weak. To do better we return to the relative product construction to decouple some of the random variables here. From the tuple {({\bf n}, {\bf W}, {\bf n}', {\bf W}')} we can form the random variable {(A_{\bf n} \cap A_{{\bf n}'}, A_{{\bf n}} \cup {\bf W})}, then form a conditionally independent copy {({\bf m}, {\bf U}, {\bf m}', {\bf U}')} of {({\bf n}, {\bf W}, {\bf n}', {\bf W}')} subject to the constraints

\displaystyle  A_{\bf n} \cap A_{{\bf n}'} = A_{\bf m} \cap A_{{\bf m}'}; \quad A_{\bf n} \cup {\bf W} = A_{\bf m} \cup {\bf U}. \ \ \ \ \ (13)

From (11) and Lemma 8(i) we then have

\displaystyle  {\mathbb H}( {\bf n} | A_{\bf n} \cap A_{{\bf n}'}, {\bf n}', {\bf W}' ) \leq {\mathbb H}( {\bf n} | A_{\bf n} \cap A_{{\bf n}'}, A_{\bf n} \cup {\bf W} )

\displaystyle = {\mathbb H}( {\bf m} | A_{\bf n} \cap A_{{\bf n}'}, A_{\bf n} \cup {\bf W} ).

The point is that {{\bf m}} is now conditionally independent of {{\bf n}, {\bf W}} relative to {A_{\bf n} \cap A_{{\bf n}'}, A_{\bf n} \cup {\bf W}}, so we can also rewrite the above conditional entropy as

\displaystyle  {\mathbb H}( {\bf m} | A_{\bf n} \cap A_{{\bf n}'}, A_{\bf n} \cup {\bf W}, {\bf n}, {\bf W} )

\displaystyle  ={\mathbb H}( {\bf m} | A_{\bf n} \cap A_{{\bf n}'}, {\bf n}, {\bf W} )

We now use the chain rule to disentangle the role of {{\bf W}}, writing the previous as

\displaystyle  {\mathbb H}( {\bf m}, {\bf W} | A_{\bf n} \cap A_{{\bf n}'}, {\bf n} ) - {\mathbb H}( {\bf W} | A_{\bf n} \cap A_{{\bf n}'}, {\bf n} )

\displaystyle  = {\mathbb H}( {\bf m}, {\bf W} | A_{\bf n} \cap A_{{\bf n}'}, {\bf n} ) - {\mathbb H}( {\bf W} | {\bf n} )

\displaystyle + {\mathbb H}( A_{\bf n} \cap A_{{\bf n'}} | {\bf n} ) - {\mathbb H}( A_{\bf n} \cap A_{{\bf n'}} | {\bf n}, {\bf W} ).

From independence we have

\displaystyle {\mathbb H}( {\bf W} | {\bf n} ) = {\mathbb H}( {\bf W} )

and from Lemma 9(i) we have

\displaystyle  {\mathbb H}( A_{\bf n} \cap A_{{\bf n'}} | {\bf n} ) \leq {\bf E} |A_{\bf n}|.

We discard the negative term {{\mathbb H}( A_{\bf n} \cap A_{{\bf n'}} | {\bf n}, {\bf W} )}. Putting all this together, we obtain

\displaystyle  {\mathbb H}({\bf n}) \leq {\mathbb H}( {\bf m}, {\bf W} | A_{\bf n} \cap A_{{\bf n}'}, {\bf n} ) \ \ \ \ \ (14)

\displaystyle  - {\mathbb H}( {\bf W} ) + (5 + \log \frac{1}{\delta}) {\bf E} |A_{\bf n}| + o(1).

Now, from the constraints (11), (13) we have

\displaystyle  A_{\bf n} \cap A_{{\bf n}'} \subset A_{{\bf m}}

and

\displaystyle  A_{{\bf n}'} \backslash A_{\bf n} \subset {\bf W}.

Thus by Lemma 8(i), (ii), followed by Lemma 10(ii) and Lemma 11(i), we have

\displaystyle  {\mathbb H}( {\bf m}, {\bf W} | A_{\bf n} \cap A_{{\bf n}'}, {\bf n} ) \leq {\mathbb H}( {\bf m} | A_{\bf n} \cap A_{{\bf n}'} ) + {\mathbb H}( {\bf W} | A_{{\bf n}'} \backslash A_{\bf n} )

\displaystyle  \leq {\mathbb H}({\bf m}) - (\log R) {\mathbb E} |A_{\bf n} \cap A_{{\bf n}'}| + {\mathbb H}({\bf W}) - \log \frac{1}{\delta} {\mathbb E} |A_{{\bf n}'} \backslash A_{\bf n}| + o(1)

which when inserted back into (14) using {{\mathbb H}({\bf m}) = {\mathbb H}({\bf n})}, {|A_{{\bf n}'} \backslash A_{\bf n}| = |A_{{\bf n}'}| - |A_{{\bf n}'} \cap A_{\bf n}|}, and {{\mathbb E} |A_{{\bf n}'}| = {\mathbb E} |A_{\bf n}|} (as {{\bf n}'} is also uniformly distributed on {\{1,\dots,N\}}) simplifies to

\displaystyle  (\log (R\delta)) {\mathbb E} |A_{\bf n} \cap A_{{\bf n}'}| \leq 5 {\bf E} |A_{\bf n}| + o(1)

and the claim follows.