As part of the polymath1 project, I would like to set up a reading seminar on this blog for the following three papers and notes:
- H. Furstenberg, Y. Katznelson, “A density version of the Hales-Jewett theorem for k=3”, Graph Theory and Combinatorics (Cambridge, 1988), Discrete Math. 75 (1989), no. 1-3, 227–241.
- R. McCutcheon, “The conclusion of the proof of the density Hales-Jewett theorem for k=3”, unpublished.
- H. Furstenberg, Y. Katznelson, “A density version of the Hales-Jewett theorem”, J. Anal. Math. 57 (1991), 64–119.
As I understand it, paper #1 begins the proof of DHJ(3) (the k=3 version of density Hales-Jewett), but the proof is not quite complete; the notes in #2 complete the proof using ideas from both paper #1 and paper #3. Paper #3, of course, does DHJ(k) for all k. For the purposes of the polymath1 project, though, I think it would be best if we focus exclusively on k=3.
While this seminar is of course related in content to the main discussion threads in the polymath1 project, I envision this to be a more sedate affair, in which we go slowly through various sections of various papers, asking questions of each other along the way, and presenting various bits and pieces of the proof. The papers require a certain technical background in ergodic theory to understand, but my hope is that if enough other people (in particular, combinatorialists) ask questions here (and “naive” or “silly” questions are strongly encouraged) then we should be able to make a fair amount of the arguments here accessible. I also hope that some ergodic theorists who have been intending to read these papers already, but didn’t get around to it, will join me in reading them.
This is the first time I am trying something like this, and so we shall be using the carefully thought out protocol known as “making things up as we go along”. My initial plan is to start understanding the “big picture” (in particular, to outline the general strategy of proof), while also slowly going through the key stages of that proof in something resembling a linear order. But I imagine that the focus may change as the seminar progresses.
I’ll start the ball rolling with some initial impressions of paper #1 in the comments below. As with other threads in this project, I would like all comments to come with a number and title, starting with 600 and then incrementing (the numbers 1-599 being reserved by other threads in this project).
11 February, 2009 at 6:36 pm
Terence Tao
600. Initial impressions of paper #1
Paper #1 outlines a proof of DHJ(3): given any $\delta > 0$, any subset A of $[3]^n$ of density at least $\delta$ contains a combinatorial line if n is sufficiently large depending on $\delta$. (The paper refers to combinatorial lines as “HJ sequences”, and uses its own notation for $[3]^n$, but I am translating to be more compatible with the notation already used in other threads.) It proceeds by first reducing the problem to one concerning a family of measurable sets on a probability space, indexed by words, and for which there is a family of (non-commuting!) measure-preserving transformations relating these sets to each other. This reduction is performed in Proposition 3.1.
A key point in this reduction is that a certain “stationarity” property is obtained on this family of sets. To obtain this stationarity, one needs a version of the Carlson-Simpson theorem, which seems to be an infinitary strengthening of the colouring Hales-Jewett theorem. (It looks vaguely reminiscent of Hindman’s theorem; presumably there is a relation.)
With the above reductions, the game is now to show that a certain triple intersection of three sets has positive measure (Theorem B). This is not actually done in paper #1, but is covered in #2 or #3. In #1, the simpler (but still non-trivial) result is shown that all pairwise intersections of the three sets can simultaneously have positive measure (Theorem 4.1). (It would be interesting to see what the combinatorial analogue of this weaker statement is. Getting just one of the pairs to have positive measure sounds similar to the “DHJ(2.5)” I formulated in my comment 130, which turns out to be equivalent to DHJ(2) and thus an easy consequence of Sperner’s theorem.)
The proof of Theorem 4.1 is established from some soft functional analysis results about IP-systems of unitary operators in Section 5.
In Section 6, there is a discussion of what is needed to establish the full strength of Theorem B, and I guess this is completed in #2.
My initial plan is to go to Section 3 and start understanding the equivalence between the combinatorial problem and the ergodic theory one. There seem to be several stages to this equivalence: a relatively easy correspondence between a combinatorial problem and an ergodic problem without stationarity, and then the reduction to the stationary case, which requires the Carlson-Simpson type theorems.
11 February, 2009 at 8:01 pm
Terence Tao
601. Proposition 3.1
Proposition 3.1 of paper #1 equates the combinatorial statement DHJ(3) with various more ergodic-flavoured versions of the same statement. There are three main statements in this equivalence, (a), (b), and (c). (There are also minor variants (a*), (b*), (c*) of (a), (b), and (c), but I think we won’t need to focus on them too much.)
DHJ(3)(a) is the combinatorial version of DHJ(3), i.e. for any $\delta > 0$ and for n sufficiently large depending on $\delta$, every subset of $[3]^n$ of density at least $\delta$ contains a combinatorial line.
DHJ(3)(b) is a preliminary ergodic version of DHJ(3). Define a $[3]^n$-system to be a collection of subsets $(E_w)_{w \in [3]^n}$ in some probability space $(X, \mathcal{X}, \mu)$, indexed by strings in $[3]^n$.

DHJ(3)(b): If $\delta > 0$ and n is sufficiently large depending on $\delta$, and $(E_w)_{w \in [3]^n}$ is a $[3]^n$-system with all sets $E_w$ having measure $\mu(E_w)$ greater than or equal to $\delta$, then there exists a combinatorial line $w^0, w^1, w^2$ such that $\mu( E_{w^0} \cap E_{w^1} \cap E_{w^2} ) > 0$.
This can be stated in probabilistic language: if A is a random subset of $[3]^n$ with the property that each element w of $[3]^n$ belongs to A with probability at least $\delta$, then with positive probability, A contains a combinatorial line. (Indeed, just take $A = A(x)$ to be the set of those $w \in [3]^n$ for which $x \in E_w$ holds.)
To deduce DHJ(3)(b) from DHJ(3)(a), observe from linearity of expectation that the expected density of A is at least $\delta$, and so A has density at least $\delta/2$ (say) with positive probability.
To deduce DHJ(3)(a) from DHJ(3)(b), one can use random averaging arguments (as discussed for instance in comment 460). [Paper #1 uses a somewhat different derivation of this implication which I will skip here.] To recap that argument briefly: we pick a medium integer m between 1 and n, and pick a random m-dimensional subspace inside $[3]^n$. Pulling back the dense subset A of $[3]^n$ to $[3]^m$, we get a random subset A’ of $[3]^m$ with each point lying in A’ with probability at least $\delta/2$ (say), if n is large enough depending on m. The claim then follows.
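To make the sampling step concrete, here is a minimal Python sketch of one way to draw such a random subspace and pull A back (assuming, for simplicity, that each wildcard occupies a single position; the function names are mine, not the paper’s):

```python
import random
from itertools import product

def random_pullback(A, n, m):
    """Pick a random m-dimensional combinatorial subspace of [3]^n (each
    wildcard used exactly once, remaining coordinates frozen at random)
    and pull the set A of length-n strings back to a subset of [3]^m."""
    wildcard_pos = sorted(random.sample(range(n), m))
    frozen = [random.choice("012") for _ in range(n)]

    def embed(w):  # w: a length-m string over {0,1,2}
        x = frozen[:]
        for digit, pos in zip(w, wildcard_pos):
            x[pos] = digit
        return "".join(x)

    return {w for w in ("".join(t) for t in product("012", repeat=m))
            if embed(w) in A}
```

Since each wildcard appears exactly once, combinatorial lines in $[3]^m$ map to combinatorial lines in $[3]^n$, which is why a line found in the pulled-back set yields one in A.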
DHJ(3)(c) is DHJ(3)(b) with an additional stationarity hypothesis thrown in, and will be discussed subsequently.
12 February, 2009 at 2:10 am
gowers
602.
Dear Terry,
I think it’s a very interesting idea to have an online reading seminar. Of course, it requires us to do the reading, which is quite a big investment to make, and presumably why you envisage a sedate affair. I think what would be most useful to me is basically what you encourage: rather than reading the paper very carefully and then explaining bits I understand here, I would prefer to try to use this post and its associated comments to read the paper sloppily and present my very partial understanding, or even complete misunderstanding, in the hope that others will be able to say, “Yes, you’re getting there — what you need to grasp next is this,” or “No, that’s not quite it — if that were the case then so would this be but it clearly isn’t,” and things like that. So when I get a moment, I’ll try to skim-read Randall’s paper and give some kind of initial reaction to it. I haven’t yet looked at it at all, but my eventual aim would be to obtain some kind of non-ergodic understanding of the proof. (Here I mean non-ergodic in the shallow sense of trying to explain things without getting completely into the ergodic language, rather than in the deep sense of genuinely doing without ergodic theory.)
12 February, 2009 at 4:32 am
zugzwang trebuchet
603.
Hi gowers,
Why not work out the ergodic details here?
12 February, 2009 at 8:19 am
David Speyer
604.
I’m going to try to restate Proposition 3.1 to see if I understand it. If I get it right, this will probably be redundant with comment 601 above, but if I get it wrong, it will be one of those naive questions Terry asked for.
Prop. 3.1 says that six things are equivalent, called (a), (a*), (b), (b*), (c) and (c*). In each case, the unstarred version is a uniform variant on the starred version. So the starred version will say “If we have a sequence of sets $A_n$ where lim sup (some measure of the size of $A_n$) is positive then…” while the unstarred version will say “For every $\delta > 0$, there is an integer $n_0(\delta)$ such that, if $n \geq n_0(\delta)$ and (some measure of the size of $A_n$) is at least $\delta$ then…”.
We’re going to concentrate on the unstarred versions.
(a) is what we’re trying to prove. For every $\delta > 0$, there is an $n_0(\delta)$ such that any subset of $[3]^n$ of cardinality more than $\delta 3^n$ contains a combinatorial line.
(b) says that we can’t break (a) by a random construction. Suppose we had some random process, like drawing ping pong balls out of a bag. The set of balls is called $X$, and the probability that an individual ball $x$ is drawn will be $\mu(x)$. For each ball $x$, our process produces a particular subset of $[3]^n$, which I will denote $A(x)$. Confusingly (to me), H and K use the opposite indexing; for each $w \in [3]^n$, they write $B_w$ for the set of all $x$ such that $w \in A(x)$, but they don’t have a notation for $A(x)$.

For fixed $\delta$, if $n$ is large enough, the following holds: as long as the probability of any particular element $w$ of $[3]^n$ being in $A(x)$ is at least $\delta$, then there will be some particular combinatorial line which occurs in $A(x)$ with nonzero probability.
Some comments:
Notice that (b)’s framework is general enough to destroy any probabilistic construction I might attempt. In particular, there is no hypothesis that the events “$w_1$ is in $A(x)$” and “$w_2$ is in $A(x)$” are independent in any way.

The only kind of probabilistic construction I could imagine which would evade the hypotheses of (b) is a process which, although it produces sets of density $\delta$, has much lower probability of including certain particular elements of $[3]^n$. However, the problem is so symmetric that I see no reason that this would help. Presumably, the proof of (b)–>(a) is to make this idea rigorous.
Finally, I described things above with a finite collection of ping pong balls, each with a positive probability. H and K phrase things in terms of a general measure space. I assume this generality will be helpful, although it seems pointless so far.
Whew, that’s long! I’ll save condition (c) for later, as I need to get going now.
12 February, 2009 at 11:51 am
Terence Tao
605.
Dear Tim: Well, I don’t really have an organised plan as to where this will go, but I would like to get as many contributions (from as many backgrounds) as possible, to take full advantage of Metcalfe’s law. So if some people read slowly and carefully, others comment loosely and freely, and yet others translate from ergodic theory to combinatorics or back, I think that will be an instructive mix. (One advantage of an internet-based seminar, as opposed to a physical one, is that one can have multiple concurrent discussions on different aspects of a paper without too much confusion.)
Incidentally, the McCutcheon notes rely heavily on paper #1, so I think it is best to start with that paper first, though it may still be valuable to skim through #2 at this preliminary stage nevertheless.
Dear David: Welcome! I’m glad you took the initiative to jump in (there is no roster or anything for this seminar :-). Please don’t worry about redundancy – in fact, I think the more perspectives we have on an individual component of the argument, the better.
Regarding the implication of (a) from (b), I can give you a simpler version of it, in which we are not looking for combinatorial lines in $[3]^n$, but rather algebraic lines $x, x+r, x+2r$ in $\mathbb{F}_3^n$; the analogue of DHJ(3) here is called “Roth’s theorem in $\mathbb{F}_3^n$”. Given a (deterministic) set A in $\mathbb{F}_3^n$ of density at least $\delta$, we can define a random set A(x) in $\mathbb{F}_3^n$ by setting $A(x) := A - x$, where x is a random element of $\mathbb{F}_3^n$. Note that each element w of $\mathbb{F}_3^n$ will lie in A(x) with probability at least $\delta$, and if A(x) contains an algebraic line with positive probability, then A certainly contains an algebraic line also. So in this case the deduction of (a) from (b) is pretty simple.

This doesn’t work in the DHJ setting because combinatorial lines are not translation invariant, but one can do a somewhat different trick, embedding $[3]^m$ randomly in $[3]^n$ in a combinatorial line-preserving manner, rather than by translating $\mathbb{F}_3^n$ randomly back to itself.
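The translation trick is easy to simulate; a small Python sketch (the helper name is mine):

```python
import random

def translated_set(A, n):
    """The random set A(x) := A - x, for x drawn uniformly from F_3^n.
    Points of F_3^n are length-n tuples over {0,1,2}; subtraction is mod 3."""
    x = tuple(random.randrange(3) for _ in range(n))
    return {tuple((a - b) % 3 for a, b in zip(w, x)) for w in A}

# Each fixed point w lies in A(x) with probability exactly |A|/3^n, and
# since translation maps algebraic lines to algebraic lines, A(x) contains
# a line with positive probability only if A itself contains one.
```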
12 February, 2009 at 1:13 pm
Terence Tao
606. Stationarity
In Proposition 3.1(b), the $[3]^n$-system $(E_w)_{w \in [3]^n}$ (which is called $(B_w)_{w \in [3]^n}$ in the notation of #1, and is equivalent to the notion of a random subset A(x) of $[3]^n$ as discussed in Speyer.604), has no structure other than having every set $E_w$ be large. But the deepest part of Proposition 3.1 is a reduction to a special type of $[3]^n$-system, namely a stationary $[3]^n$-system. This is a system where the sets $E_w$ take the form

$E_w = T_w^{-1}(E) \ \ \ \ (1)$

where $E$ is a fixed measurable set and, writing $w = w_1 \ldots w_n$,

$T_w := T^{(1)}_{w_1} \circ \cdots \circ T^{(n)}_{w_n} \ \ \ \ (2)$

for some measure-preserving transformations $T^{(i)}_j: X \to X$, $i = 1, \ldots, n$, $j = 0, 1, 2$.

Clearly, if (1) holds, and $E$ has measure at least $\delta$, then all the $E_w$ have measure at least $\delta$. So DHJ(3)(b) certainly implies
DHJ(3)(c): If $\delta > 0$ and n is sufficiently large depending on $\delta$, and $(E_w)_{w \in [3]^n}$ is a stationary $[3]^n$-system with $E$ having measure at least $\delta$, then there exists a combinatorial line $w^0, w^1, w^2$ such that $E_{w^0} \cap E_{w^1} \cap E_{w^2}$ has positive measure.
The non-trivial fact is that DHJ(3)(c) also implies DHJ(3)(b). I guess I’ll try to discuss that in future comments.
It is tempting to try to describe stationarity in terms of a measure-preserving action on X by the free group $F_3$ on three generators $g_0, g_1, g_2$, but this isn’t quite right, because the $T^{(i)}_j$ are not constant in i. But if one adds a discrete time variable i, thus replacing X with $X \times \mathbb{Z}$, then one can recover a measure-preserving action $\rho$ of $F_3$, with each generator $g_j$, $j = 0, 1, 2$ of $F_3$ mapping $(x, i)$ to $(T^{(i)}_j x, i-1)$ for $i \geq 1$ and to $(x, i-1)$ for $i \leq 0$ (here we have to define $T^{(i)}_j$ for non-positive i in some arbitrary fashion, e.g. setting these transformations to be the identity). With this action, and embedding $[3]^n$ inside $F_3$ by identifying a string $w = w_1 \ldots w_n$ with the group element $g_{w_1} \cdots g_{w_n}$, one can rewrite (1), (2) as

$E_w \times \{n\} = \rho(w)^{-1}( E \times \{0\} )$

and so the task is to show that there exists a combinatorial line $w^0, w^1, w^2$ such that

$\rho(w^0)^{-1}(E \times \{0\}) \cap \rho(w^1)^{-1}(E \times \{0\}) \cap \rho(w^2)^{-1}(E \times \{0\})$

has positive measure.

At this stage I do not know whether this group action perspective is useful, although it does make the problem look a little bit like simpler ergodic recurrence theorems. For instance, Roth’s theorem is equivalent to the assertion that if T acts in a measure-preserving way on a probability space X, then for every set E of positive measure, there exists an arithmetic progression $a, a+r, a+2r$ of integers (with $r$ non-zero) such that $T^{-a}(E) \cap T^{-a-r}(E) \cap T^{-a-2r}(E)$ has positive measure.
12 February, 2009 at 1:40 pm
Kristal Cantwell
I would be interested in this seminar. I have downloaded the three papers and will be looking at them in the next few days along with the comments here.
12 February, 2009 at 1:56 pm
Terence Tao
607. Stationarity cont.
Welcome, Kristal! Feel free to jump in and contribute with any questions, comments, or suggestions, even if (or especially if) they overlap with those already in this post.
One small comment, continuing 606: it appears that the stationarity gives quite strong constraints on the $E_w$. Most obviously, they are all forced to have the same measure, but there are other constraints too. For instance, if n=2, then $E_{00} \cap E_{10}$ has to have the same measure as $E_{02} \cap E_{12}$, since both sets are pullbacks of $(T^{(1)}_0)^{-1}(E) \cap (T^{(1)}_1)^{-1}(E)$ under the measure-preserving transformations $T^{(2)}_0$ and $T^{(2)}_2$ respectively. To put it another way, the random set $A(x)$ is just as likely to contain the pair (0,0), (1,0) as it is to contain the pair (0,2), (1,2).

More generally, stationarity seems to be saying that if one looks at a slice of $A(x)$ formed by freezing the last m positions of A(x), that the distribution of that slice (or more precisely, the identification of that slice with a subset of $[3]^{n-m}$) is in fact independent of the choice of the frozen coordinates $w \in [3]^m$. (This would explain the terminology “stationary”.) I do not know yet whether this property already captures the full strength of stationarity, or whether stationarity in fact says something more.
12 February, 2009 at 2:06 pm
Terence Tao
608. Symmetry breaking
I wanted to repeat a remark of Austin.54 here: the original formulation DHJ(3)(a) of the problem, together with its probabilistic variant DHJ(3)(b), are invariant under permutation of the indices in [n], but the notion of stationarity is not invariant (being dependent on the ordering of the indices), and so DHJ(3)(c) has “broken” the permutation symmetry. There is nothing particularly harmful about this – the standard proofs of colouring Hales-Jewett also break symmetry, for instance – but it seems like a noteworthy point nonetheless.
12 February, 2009 at 6:43 pm
David Speyer
609: My first dumb question: In (b) and (c), would it add any generality to allow the measure space $X$ to depend on $n$? When I think of sampling methods I might use, it seems natural to me to use different sample spaces $X_n$ for different $n$. But I suspect that I could use the product of the $X_n$ as my sample space, and thus reduce to H and K’s set up.
12 February, 2009 at 8:46 pm
Randall
610: Fixed some typos I found on an airplane today and put a better version up. The most potentially confusing of these were in Lemma 1 (continued), the first display on page 4 and in the definition of \delta on page 6. There may be others so please speak up if you see anything that looks incongruous.
The notion of stationarity is more familiar for (linear) processes. If (X_i) are random variables taking values say in {a,b,…,z}, then we call (X_i) a process and we say that the process is stationary if the probability that X_1X_2X_3=”cat” is the same as the probability that X_17X_18X_19=”cat” (for all cats and all shifts).
In the current setup, you can view the sets B_w as random variables X_w taking values in {0,1}. Stationarity says: take any finite set of words. Call that set I. The probability of seeing a certain pattern of 0s and 1s in the random variables X_w, w in I, is exactly the same as the probability of seeing the same pattern of 0s and 1s in the random variables X_wv, w in I, where v is any word. Here “v” plays the role 16 did in the cat example, the pattern of 0s and 1s plays the role of “cat”, and I plays the role of {1,2,3}.
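Randall’s linear-process example is easy to test numerically; here is a toy Monte Carlo check (the three-letter alphabet and the pattern are my choices, not from the papers):

```python
import random

# For an i.i.d. process on {a,b,c}, stationarity predicts that
# P(X_1 X_2 X_3 = "cab") equals P(X_17 X_18 X_19 = "cab") (both 1/27).
def spells(pattern, start, n=20):
    xs = "".join(random.choice("abc") for _ in range(n))
    return xs[start:start + len(pattern)] == pattern

trials = 100_000
p_early = sum(spells("cab", 0) for _ in range(trials)) / trials
p_late = sum(spells("cab", 16) for _ in range(trials)) / trials
print(p_early, p_late)  # both should be close to 1/27 ≈ 0.037
```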
I can certainly sympathize with the sentiment that it seems quite unclear what 3.1 can possibly do to help. It gives you some measure preserving transformations, but nothing resembling honest ergodic theory, really, because although there are measure preserving transformations around, they don’t even constitute an action (as Terry pointed out). I guess the thing to do at this early stage is just have faith….
12 February, 2009 at 8:51 pm
Terence Tao
611: the measure space X
David, I think you’re right, the actual choice of space X is more or less irrelevant. In a lot of these problems one can take the Cantor space $\{0,1\}^{\mathbb{N}}$ (or the unit interval) as a universal space (note that any countably generated probability space can be lifted up to either of these two spaces).
One of the philosophies of probability theory, by the way, is to try to hide the underlying space X as much as possible; thus one talks about events E and their probabilities, random variables and their expectations and correlations, etc. without explicit mention of the probability space and probability measure, etc. It’s like studying a manifold or variety through its ring of functions rather than by looking at its constituent points. It has the advantage that one can concatenate any number of random processes together (e.g. flip infinitely many coins, and then pull balls out of urns infinitely often, etc.) without having to think about what happens to the underlying probability space.
In ergodic theory, though, one often takes almost the opposite view: one starts with an abstract measure space X (with some groups acting on it) and then creates some factors Y of that space which often have a very explicit algebraic structure (e.g. they are the homogeneous space for some Lie group). One then studies the geometry and dynamics of that space Y from a very concrete point of view, thus moving from the category of abstract measure spaces and measure preserving transformations to something much more algebraic.
13 February, 2009 at 1:53 am
DHJ — possible proof strategies « Gowers’s Weblog
[…] is his comment 460, which again is responded to by other comments. Terry has also begun an online reading seminar on the Furstenberg-Katznelson proof, using not just the original paper of Furstenberg and […]
13 February, 2009 at 11:56 am
Kristal Cantwell
612. Hindman’s theorem is a special case of one of their versions of the Carlson-Simpson theorem according to the paper.
13 February, 2009 at 8:56 pm
Terence Tao
613. Stationarity II
DHJ(3)(c) finds combinatorial lines in stationary $[3]^n$-systems $(E_w)_{w \in [3]^n}$. It turns out to be convenient to eliminate the role of n, and instead work with stationary $[3]^*$-systems $(E_w)_{w \in [3]^*}$, where $[3]^* := \bigcup_{n=0}^\infty [3]^n$ is the space of all words of finite length with alphabet {0,1,2}. (Paper #1 uses its own notation for $[3]^*$.) It is clear that DHJ(3)(c) implies

DHJ(3)(c*): If $(E_w)_{w \in [3]^*}$ is a stationary $[3]^*$-system, with E having positive measure, then there is a combinatorial line $w^0, w^1, w^2$ such that $\mu( E_{w^0} \cap E_{w^1} \cap E_{w^2} ) > 0$.

In a similar spirit, DHJ(3)(b) is easily seen to be equivalent to

DHJ(3)(b*): If $(E_w)_{w \in [3]^*}$ is a $[3]^*$-system, with the $E_w$ having measure uniformly bounded away from zero, then there is a combinatorial line $w^0, w^1, w^2$ such that $\mu( E_{w^0} \cap E_{w^1} \cap E_{w^2} ) > 0$.

To finish the proof of Proposition 3.1, we need to deduce DHJ(3)(b*) from DHJ(3)(c*). Broadly speaking, the idea is to start with a $[3]^*$-system and repeatedly use Carlson-Simpson-type Ramsey theorems to pass to subsystems which become increasingly stationary.

A $[3]^*$-system can be viewed as a collection of random sets $A_n(x) \subseteq [3]^n$, one for each n. As noted already in 610, stationarity implies that $A_n(x)$ has the same distribution as the slice $\{ u \in [3]^n: uv \in A_{n+r}(x) \}$ for all n and all words $v \in [3]^r$, where of course we identify the set of words $uv$, $u \in [3]^n$, with $[3]^n$ in the obvious manner.
13 February, 2009 at 10:30 pm
Terence Tao
614. A possible simplification
It appears to me that one can in fact get stationarity for free (and avoid invoking the Carlson-Simpson theorem here). The basic observation is that when one deduces DHJ(3)(a) from DHJ(3)(b) by using a random m-dimensional subspace inside $[3]^n$, the random set obtained in this fashion is already essentially stationary by construction. I’ll write up the details on the wiki and report back here when I’m done.
(This does not completely eliminate the need for Carlson-Simpson, though, as such results are also used later in the paper.)
14 February, 2009 at 9:12 am
Ryan O'Donnell
615. Let me try to recap Terry’s points, passing from DHJ(3)(a) to DHJ(3)(c), in my own language.
Let $A \subseteq [3]^n$ be a given set of density $\delta$.

Let $m$ be some moderate integer; perhaps logarithmic in $n$ when $\delta$ is a “constant”.

Let $J \subseteq [n]$ be a random subset of cardinality $m$. Imagine choosing a “random restriction” $y$ to the coordinates $[n] \setminus J$. I.e., $y \in \{0,1,2\}^{[n] \setminus J}$ is chosen uniformly at random.
Then $y$ induces a probability distribution on subsets of $\{0,1,2\}^{J}$. Call the induced subset $A_y$.

Let $x$ denote a string in $\{0,1,2\}^{J}$. I will use the notation “$x \circ y$” for the string in $\{0,1,2\}^{n}$ gotten by appropriately piecing $x$ and $y$ together.

We claim that for *each* string $x$, the probability (over choice of $y$) that $x \in A_y$ is at least $\delta - o(1)$. On one hand, note that $x \circ y$ is *not* a uniformly distributed string in $\{0,1,2\}^n$. For example, if $x$ is the all-0’s string, then the distribution on $x \circ y$ is slightly biased towards those strings with more 0’s.

Still, if $m$ is “small enough” then the distribution will be very close to uniform (except around the “extreme slices” of $\{0,1,2\}^n$, but $A$ has almost all its density outside these slices anyway). And this should prove the claim with a minimum of calculations.
At this point we’re at DHJ(3)(b); on to DHJ(3)(c). “Stationarity” here would seem to be that for any further restriction $z$ to coordinates $J' \subseteq J$, the resulting distribution on the set $A_{y \circ z}$ (a subset of $\{0,1,2\}^{J \setminus J'}$) is the same.

Again, this is not true, but it should be “almost” true. And it seems like we’re only going for “almost stationarity”, not “stationarity” anyway.

In fact, it seems to me that whereas “(almost) stationarity” importantly involves an *ordering* on the coordinates, it seems this argument doesn’t really. I.e., $J'$ need not be a “suffix” of $J$ or anything.
14 February, 2009 at 10:10 am
Terence Tao
616.
Thanks Ryan, this fleshes out what I was trying to say.
I’ve started a running “combinatorial translation” of paper #1 on the wiki at
http://michaelnielsen.org/polymath1/index.php?title=Furstenberg-Katznelson_argument
It’s likely to evolve in unstable ways over time (in particular, I am likely to overhaul the notation as I understand more of the big picture) but hopefully will eventually settle down.
As you note, the random sampling construction gives additional stationarity properties beyond those used in Paper #1. In particular, the random sets $A_m \subseteq [3]^m$ are also stationary wrt permutations of the index set [m]. It’s not clear whether this additional symmetry could be useful though (it might even be harmful, because later parts of the proof may need to break this symmetry and so trying to hold on to it may be counterproductive).
14 February, 2009 at 10:35 am
Terence Tao
617. Recap
I’m now going to try to recapitulate the previous discussion in light of the new simplification, thus moving away from the presentation in Paper #1. (I guess I am changing the rules a little bit here from a “reading seminar” to a “revisionist seminar”, but I said that the rules would be made up as we went along. :-) ) I will try to keep the wiki page above current with the latest revision of thinking.
Let $A$ be a dense line-free subset of $[3]^n$, where n is very large. For every small integer m, we can then form a random line-free subset $A_m$ of $[3]^m$ by randomly embedding $[3]^m$ inside $[3]^n$ (by picking m random positions for the wildcards, and then picking the frozen positions randomly), and then pulling $A$ back by this map. One can think of $A_m$ as a random variable taking values in the power set $2^{[3]^m}$, so its probability distribution is a probability measure $\mu_m$ on $2^{[3]^m}$.
By construction, $A_m$ is stationary with respect to the permutation group $S_m$ of [m], i.e. $\mu_m$ is invariant wrt the action of $S_m$. Also, for any word $w \in [3]^r$, the slicing map $U_w: 2^{[3]^{m+r}} \to 2^{[3]^m}$ defined by

$U_w(B) := \{ v \in [3]^m: vw \in B \}$

is almost stationary in the sense that for fixed m, r, and w, the random variables $A_m$ and $U_w(A_{m+r})$ have asymptotically the same distribution in the limit $n \to \infty$.
Taking a subsequence limit as $n \to \infty$ (and using the Helly selection principle, a.k.a. the vague sequential compactness of probability measures), we can then find random line-free sets $A_m \subseteq [3]^m$ (or equivalently, a probability measure $\mu_m$ on $2^{[3]^m}$) which is stationary with respect to the permutation groups $S_m$ acting on $[3]^m$, and also with respect to the slicing maps $U_w$. Also, if the original sets all have density at least $\delta$, then so do the $A_m$ (in expectation): in particular, $A_m$ will be non-empty with probability at least $\delta$, which implies that for any $w \in [3]^m$, that $w \in A_m$ with probability at least $\delta / 3^m$ also.
This is all very well and good, but there is one minor technical problem which will cause difficulty in later parts of paper #1: the slicing maps $U_w$ are not invertible. We would like to upgrade them to be invertible (thus upgrading the semigroupoid of transformations $U_w$ to a groupoid), thus giving us DHJ(3)(c*) in its full strength. This can be done, but requires extending the measure spaces $(2^{[3]^m}, \mu_m)$ a bit (or, in probabilistic language, adding a whole bunch of “dice” or “random number generators”). I plan to talk about this in a later comment.
14 February, 2009 at 11:56 am
Terence Tao
618. Inversion of maps
There is a simple lemma that says that any finite probability-preserving surjective map can be inverted if one throws in a random number generator that draws uniformly from [0,1]. More precisely:
Lemma Let $U: X \to Y$ be a surjective map from one finite probability space to another which preserves the probability measure (in the sense that U pushes the probability measure on X forward to the probability measure on Y). Then one can lift this map to an invertible measure-preserving map $\tilde U: X \times [0,1] \to Y \times [0,1]$, with product measure (using Lebesgue measure on $[0,1]$).
I prove this lemma in the wiki notes, but let’s just illustrate this with the simplest example: the non-injective map from the two-element space {a,b} (with probability 1/2 of each) to the one-element space {c}. One invertible lift of this map is the map $\tilde U: \{a,b\} \times [0,1] \to \{c\} \times [0,1]$ which maps (a,t) to (c,t/2) and (b,t) to (c,(t+1)/2). Many other choices are possible; this lift is not unique.
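In code, this two-point example and its inverse look like the following sketch (half-open intervals are used so the case split is clean):

```python
def lift(point, t):
    """Invertible lift of U: {a,b} -> {c}.  (a,t) -> (c, t/2), (b,t) -> (c, (t+1)/2);
    a fair coin times uniform [0,1) maps onto uniform [0,1), measure-preservingly."""
    return ("c", t / 2) if point == "a" else ("c", (t + 1) / 2)

def lift_inverse(point, s):
    """Recover which half of [0,1) we landed in, then undo the squeeze."""
    return ("a", 2 * s) if s < 0.5 else ("b", 2 * s - 1)
```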
Applying this lift to all of the generator slicing maps $U_j$, $j = 0, 1, 2$, we can now find probability spaces $X_m$ (extending the $2^{[3]^m}$) and invertible measure-preserving maps $T_j: X_{m+1} \to X_m$ that are compatible with the slicing maps $U_j$ (the obvious diagrams commute). Concatenating these maps then gives us more invertible measure-preserving maps $T_w: X_{m+r} \to X_m$ for all words w of length r. We are now basically in the situation of DHJ(3)(c*), which the rest of Paper #1 aims to prove. (It’s slightly different because we’ve “graded” the underlying probability space X, replacing it with a sequence $(X_m)_{m=0}^\infty$ of probability spaces, but I don’t think this will make much of a difference.)
[Remark: the lemma here is my interpretation of Lemma 3.2 in Paper #1; the statement there is false as stated, but can be repaired by throwing in the [0,1] as is done above.]
14 February, 2009 at 12:53 pm
Terence Tao
619.
A frivolous comment, for the combinatorially minded: the Lemma in 618 is in some sense a continuous version of (a special case of) Hall’s marriage theorem, and serves much the same purpose (viz. conjuring up a bunch of artificial but useful perfect matchings out of thin air).
14 February, 2009 at 2:01 pm
Ryan O'Donnell
620. Inversion of maps.
I’m getting slightly lost in notation and also slightly lost because I don’t quite know where this is going. However, it seems to me that we might not eventually need a completely general Lemma like the one in Terry.618. Some natural randomized-inverse map might work in our context; e.g., tack on a random substring from $[3]^r$.
14 February, 2009 at 2:39 pm
Terence Tao
621. Inversion of maps
Sorry about the notation shifting – I also don’t fully know where we are going either, and this I think is reflected in the notation…
We’re not inverting a map from $[3]^{m+r}$ to $[3]^m$, but rather a map from $2^{[3]^{m+r}}$ to $2^{[3]^m}$ (which, I guess, is induced by an embedding of $[3]^m$ in $[3]^{m+r}$). Given a word w of length r, and a set $B \subseteq [3]^{m+r}$, we have the slice $B|_w \subseteq [3]^m$, defined as

$B|_w := \{ v \in [3]^m: vw \in B \}.$

Stationarity tells us that if we take the random set $A_{m+r}$ and look at a slice $A_{m+r}|_w$ of it (think of w as fixed), one gets back something with the same distribution as the random set $A_m$. But one cannot initially invert this procedure: starting with the random set $A_m$, one cannot deterministically lift it back to a set in $2^{[3]^{m+r}}$ that has the same distribution as $A_{m+r}$ (and which agrees with $A_m$ on the slice) – there is not enough entropy in the domain and too much in the range. But one can do this once one has a random number generator, which I am modeling by a uniform distribution on [0,1] – this gives us an infinite amount of entropy on both sides and removes the obstruction to invertibility, as per “Hilbert’s hotel”. (One has to take care that one only performs “reversible computations” in order to maintain invertibility, but this is a minor technical detail.)
14 February, 2009 at 8:05 pm
Ryan O'Donnell
622. Understood, I’m just thinking that since we got to $[3]^m$ in the first place by making a large random restriction, perhaps we can “remember” this and “unrestrict” $r$ coordinates later. Just guessing… I’m probably being too speculative right now. Best to see where the next parts of the arguments go.
14 February, 2009 at 11:02 pm
Terence Tao
623. Oh, that’s an interesting idea. The artificial nature of the transformations used in the Furstenberg-Katznelson argument bothered me a bit, and this would be a much more natural way to do it. I don’t quite see how to set it up so that everything is completely reversible (i.e. the relevant maps $T_w$ are invertible), so that entropy is neither created nor destroyed, but I can imagine that it is possible.
I’ve read a little further into the paper (an interesting experiment, reading a paper collaboratively a small fragment at a time). With the above revisionist notation, we now have a sequence $(X_m)_{m=0}^\infty$ of probability spaces and invertible measure-preserving maps $T_w: X_{m+r} \to X_m$ for all words w of length r, obeying a semigroup law. The claim is now that for every set $E \subseteq X_0$ of positive measure, that there exists a combinatorial line $w^0, w^1, w^2$ in some cube $[3]^n$ for some n such that the triple intersection of $T_{w^0}^{-1}(E)$, $T_{w^1}^{-1}(E)$, and $T_{w^2}^{-1}(E)$ has positive measure (or equivalently, that the random set $A_n$ contains the line $w^0, w^1, w^2$ with positive probability).
In Paper #1, a somewhat weaker result is proven, namely there exists a line $w^0, w^1, w^2$, such that the three pairwise intersections of $T_{w^0}^{-1}(E)$, $T_{w^1}^{-1}(E)$, and $T_{w^2}^{-1}(E)$ have positive measure (thus the random set $A_n$ will contain any two of the points on this line with positive probability). The notes in #2 finish the job to get all three points on the line. This weaker statement is already not obvious to me combinatorially – it’s some sort of triple version of the DHJ(2.5) statement we discussed back around 130 or so. (I think it is asserting, for any dense $A \subseteq [3]^n$, the existence of a combinatorial line template with some positive number r of wildcards such that for each ij=01,12,20, there exists an instance of this line which intersects A in the i and j positions.) I think I’ll ask it at the 500 thread to see if anyone can come up with a combinatorial proof.
15 February, 2009 at 4:19 pm
Terence Tao
624. Compactification of the string space
I’ve scanned through the rest of the paper, and it seems like one has no choice but to start understanding the statement of the Carlson-Simpson theorem (and its variants) before progressing much further, although one can at least take these statements as “black boxes”. This theorem is an infinite-dimensional version of the colouring Hales-Jewett theorem, but to state it properly, we need some notation.
I’m using $[3]^n$ for the strings of length n, and $[3]^* := \bigcup_{n=0}^\infty [3]^n$ for the finite length strings.

We can give $[3]^*$ a “3-adic” metric by defining the distance between two distinct strings x, y (possibly of different length) to be $3^{-l}$, where l is the first digit at which x and y differ (or terminate).
The metric space $[3]^*$ is totally bounded but incomplete. One can complete it by attaching the space $[3]^\omega$ of infinite strings. The space $[3]^* \cup [3]^\omega$ is totally bounded and complete, with $[3]^*$ as a dense subspace, thus $[3]^* \cup [3]^\omega$ is a compactification of $[3]^*$. (Roughly, one can think of $[3]^*$ as being like the terminating decimals base 3 in $[0,1]$, and $[3]^* \cup [3]^\omega$ as being like the entire interval $[0,1]$.)
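A quick sketch of this metric in Python (I am counting digit positions from 0; the specific base is unimportant, since any base greater than 1 gives the same topology):

```python
def dist(x, y):
    """'3-adic' distance between finite strings over {0,1,2}: 3^(-l), where
    l is the first position at which x and y differ (or one terminates)."""
    if x == y:
        return 0.0
    l = 0
    while l < min(len(x), len(y)) and x[l] == y[l]:
        l += 1
    return 3.0 ** (-l)

# dist("0120", "0121") == dist("0120", "012") == 3**(-3); strings sharing a
# long common prefix are close, so uniformly continuous maps are insensitive
# to far-out digits.
```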
[Paper #1 uses its own notation in place of $[3]^*$.]
Let X be a compact metric space. Observe that a map $F: [3]^* \to X$ is uniformly continuous if and only if it extends to a continuous map $\bar F: [3]^* \cup [3]^\omega \to X$. One can think of a uniformly continuous F as an operation that takes a string as input and spits out a point in X as output, but is increasingly insensitive to fluctuations in the $l$-th and higher digits as $l \to \infty$.
Of course, not every map on $[3]^*$ is continuous. (A good example of a discontinuous map: the map $F: [3]^* \to \{0,1\}$ that counts the parity of the number of 1s in the string.) But the Carlson-Simpson theorem is an assertion that every map F from $[3]^*$ to a compact set (or a finite set) becomes uniformly continuous when restricted to an “infinite-dimensional combinatorial subspace”. This allows one to extend F at some key points of $[3]^\omega$. The proof is going to proceed by taking the shift maps $T_w$ (viewed as unitary operators) and taking a suitable limit (in the weak operator topology) as $w$ approaches such points, and working with the limiting operators (which turn out to be orthogonal projections). I don’t know yet why this is going to be a useful thing to do.
In a subsequent comment I will state the Carlson-Simpson theorem properly, both in the finite colouring version (which is the standard formulation) and in the compact version (which is the version preferred by paper #1).
16 February, 2009 at 1:00 pm
Ryan O'Donnell
625. In the analysis of boolean functions there is general intuition of continuous/measurable functions vs. discontinuous/nonmeasurable functions. Roughly, the former are those functions which are “asymptotically noise stable” (in the sense of Benjamini-Kalai-Schramm: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.2927 ) or have “analogue” functions on Gaussian space or the sphere. I wonder if this viewpoint will be helpful.
16 February, 2009 at 2:59 pm
Terence Tao
626. Colouring theorems I: finitary theorems
In this post and the next I want to describe a square of four Ramsey theorems: two finitary theorems, and two infinitary theorems. It is the latter which is used in Paper #1.
The first finitary theorem is the colouring Hales-Jewett theorem. The k=3 version of this theorem asserts that if [3]^n is partitioned into c colour classes, and n is sufficiently large depending on c, then one of the colour classes contains a combinatorial line.
Iterating this theorem, we conclude that if [3]^n is partitioned into c colour classes, and n is sufficiently large depending on c and m, then one of the colour classes contains an m-dimensional combinatorial subspace.
The second finitary theorem is the Graham-Rothschild theorem. The k=3 version of this theorem asserts that if, instead of colouring the vertices of [3]^n as in Hales-Jewett, one colours the combinatorial lines in [3]^n, partitioning the space of all such lines into c colours, then (if n is sufficiently large depending on c, m) one can find an m-dimensional combinatorial subspace in which all lines in that space have the same colour.
Recall that a combinatorial line can be viewed as a string of 0s, 1s, 2s, and a wildcard *, with the wildcard appearing at least once (e.g. 01*2* is a combinatorial line). Identifying the wildcard with the symbol 3, we can thus view a combinatorial line as a point in $[4]^n = \{0,1,2,3\}^n$ containing at least one 3. Thus one can rephrase the k=3 Graham-Rothschild theorem in another way:
Graham-Rothschild theorem. If the vertices of [4]^n are partitioned into c colour classes, then (if n is sufficiently large depending on m, c) there is an m-dimensional subspace (with none of the fixed coordinates equal to 3) such that all points in this subspace (with at least one coordinate equal to 3) have the same colour.
Written this way, Graham-Rothschild looks a lot like colouring Hales-Jewett, and it is in fact possible to deduce the latter from the former (the precise deduction escapes me right now, though).
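This encoding of lines as points of $[4]^n$ is easy to make concrete; a small Python sketch (names mine):

```python
from itertools import product

def lines(n):
    """Combinatorial lines of [3]^n, encoded as strings over {0,1,2,3}
    in which the wildcard is written as 3 and appears at least once."""
    for t in product("0123", repeat=n):
        if "3" in t:
            yield "".join(t)

def points(template):
    """The three points of a line: substitute j = 0, 1, 2 for every wildcard."""
    return ["".join(str(j) if c == "3" else c for c in template)
            for j in range(3)]

# points("0132") == ["0102", "0112", "0122"]
```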
The k=1 version of the Graham-Rothschild theorem implies Folkman’s theorem: if the positive integers are finitely coloured, then for any m, one can find m positive integers such that all finite sums of those integers (other than the trivial sum 0) have the same colour. This can be shown by considering the colouring on $\{0,1\}^n$ that colours a vertex by the colour given to the number of 1s in that vertex.
In the next comment I describe the infinitary strengthenings of the colouring Hales-Jewett and the Graham-Rothschild theorems, which are the Carlson-Simpson theorem and Carlson’s theorem respectively.
16 February, 2009 at 3:21 pm
Terence Tao
627. Colouring theorems II. Infinitary theorems
The infinitary strengthening of the Hales-Jewett theorem is the Carlson-Simpson theorem. To define it, I need some notation. Define an infinite-dimensional combinatorial subspace of $[3]^*$ to be an object indexed by an infinite string of 0s, 1s, 2s, and an infinite number of wildcards $*_1, *_2, *_3, \ldots$, where each wildcard $*_i$ appears in a non-empty consecutive block, with each block of $*_i$s appearing to the left of the block of $*_{i+1}$s. For instance, the following string describes an infinite-dimensional combinatorial subspace:

$1 0 *_1 *_1 2 *_2 0 1 *_3 \ldots$

This gives an embedding of $[3]^n$ into $[3]^*$ by mapping any n-digit string into the string formed from the above infinite string by substituting these digits into the first n wildcards $*_1, \ldots, *_n$, and then truncating everything after the block of $*_n$s. For instance, in the above example, 20 maps to $102220$.
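Here is a small sketch of this embedding (the template encoding, with fixed digits as characters and the wildcard $*_k$ written as the integer k, is my own):

```python
def embed(digits, template):
    """Substitute digits[k-1] for each occurrence of wildcard k, then
    truncate everything after the block of the len(digits)-th wildcard."""
    n = len(digits)
    out, cut = [], 0
    for entry in template:
        if isinstance(entry, int):   # a wildcard *_entry
            if entry > n:
                break
            out.append(digits[entry - 1])
            if entry == n:
                cut = len(out)       # current end of the block of *_n
        else:                        # a fixed digit '0', '1' or '2'
            out.append(entry)
    return "".join(out[:cut])

template = ["1", "0", 1, 1, "2", 2, "0", "1", 3]  # the string 10*1*12*201*3...
assert embed("20", template) == "102220"
```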
Carlson-Simpson theorem If $[3]^*$ is finitely coloured, then one of the colour classes contains an infinite-dimensional combinatorial subspace.
This theorem implies the colouring Hales-Jewett theorem but is significantly stronger, in particular it does not seem to be deducible solely from the colouring Hales-Jewett theorem (much as the infinite pigeonhole principle cannot be deduced from its finite counterpart).
It implies a compact version:
Carlson-Simpson theorem, compact version If $F: [3]^* \to X$ is a map from $[3]^*$ into a compact metric space X, then there exists an infinite-dimensional combinatorial subspace $V$ such that the restriction of $F$ to $V$ is uniformly continuous (i.e. it extends continuously to the compactification $\bar V$).
It is not hard to see that the two versions are equivalent (one has to cover the compact metric space by an increasingly fine mesh of small balls, which can then be used to colour $[3]^*$). In paper #1 it is remarked that the compact case is in fact slightly easier to prove than the finite colouring case (I suppose that the induction is somehow cleaner).
In a similar spirit, the Graham-Rothschild theorem has the following infinitary generalisation:
Carlson’s theorem Suppose that the combinatorial lines in $[3]^*$ are finitely coloured. Then there is an infinite-dimensional combinatorial subspace on which all lines have the same colour.
Or equivalently:
Carlson’s theorem Suppose that $[4]^*$ is finitely coloured. Then there is an infinite-dimensional combinatorial subspace with no fixed coordinate equal to 3, such that any element of this space with at least one 3 has the same colour.
Again, there is a compact version:
Carlson’s theorem Suppose that $F: [4]^* \to X$ is a map into a compact metric space X. Then there is an infinite-dimensional combinatorial subspace with no fixed coordinate equal to 3, such that the restriction of F to this space is uniformly continuous on those points with at least one 3.
As I understand it, Carlson’s theorem was discovered independently by Carlson and by Furstenberg-Katznelson.
The k=1 case of Carlson’s theorem is essentially Hindman’s theorem (the infinitary generalisation of Folkman’s theorem). Given that the cleanest proof of Hindman’s theorem involves idempotent ultrafilters, I would imagine that this is how Carlson’s theorem is also proved. Needless to say, such theorems are not easily finitisable.
17 February, 2009 at 9:59 pm
Terence Tao
628. Convergence to an orthogonal projection
Recall that our objective is to show that, given any system of invertible measure-preserving transformations $T_w: X_{m+r} \to X_m$ obeying a groupoid law, and any set $E \subseteq X_0$ of positive measure, that there exists a combinatorial line $w^0, w^1, w^2$ in some $[3]^n$ such that $T_{w^i}^{-1}(E) \cap T_{w^j}^{-1}(E)$ has positive measure for ij=01,12,20.

The measure of $T_{w^i}^{-1}(E) \cap T_{w^j}^{-1}(E)$ can also be written as

$\langle T_{w^i} 1_E, T_{w^j} 1_E \rangle_{L^2(X_n)} \ \ \ \ (1)$

where we let measure-preserving transformations $T$ act on functions $f$ in the usual manner, $Tf := f \circ T$.
The key claim is

Claim There exists a sequence of combinatorial lines $\ell_m = (w^0_m, w^1_m, w^2_m)$ such that for each ij=01,12,20, the unitary operators $T_{w^j_m}^* T_{w^i_m}$ (acting on $L^2(X_0)$) converge in the weak operator topology to orthogonal projections $P_{ij}$.

Indeed, if this claim held, then (1) for the words $w^i_m, w^j_m$ would converge as $m \to \infty$ to

$\langle P_{ij} 1_E, 1_E \rangle.$

On the other hand, since the $T_w$ and their adjoints fix the constant function 1,

$\langle P_{ij} 1_E, 1 \rangle = \lim_m \langle T_{w^j_m}^* T_{w^i_m} 1_E, 1 \rangle = \langle 1_E, 1 \rangle = \mu(E) > 0,$

we must have $P_{ij} 1_E \neq 0$, and hence $\langle P_{ij} 1_E, 1_E \rangle = \| P_{ij} 1_E \|^2 > 0$, and so (1) is positive for sufficiently large m.
The main tool in proving the claim is the Carlson-Simpson theorem.
This is a somewhat infinitary argument; at present I do not see what the combinatorial analogue of it will be.
20 February, 2009 at 11:30 am
Terence Tao
629. The invertible maps $T_w$ and group structure
I am beginning to understand a bit better the strange collection of invertible measure-preserving transformations $T_j: X_{m+1} \to X_m$ that appear in the papers 1, 2, 3, which in turn generate the compound maps $T_w: X_{m+r} \to X_m$.

Firstly, as Ryan proposed in 622, there is a nice probabilistic interpretation of these transformations that does not require the artificial device of introducing a new source of randomness, such as the uniform probability measure on [0,1]. Roughly speaking, one can view each $X_m$ as the space of random m-dimensional combinatorial subspaces in $[3]^n$, where each of the m wildcards $*_1, \ldots, *_m$ appears exactly once, plus some additional data which is a bit tricky to describe cleanly (details are on the wiki).
The transformation $T_j: X_{m+1} \to X_m$ for j=0,1,2 is the map that takes the random m+1-dimensional subspace to an m-dimensional slice, formed by replacing the final wildcard $*_{m+1}$ with j.
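Ignoring the extra “dice” data, the slice map on subspace templates is simple; a sketch (same template encoding as in 627, with wildcards as integers):

```python
def slice_map(template, j):
    """T_j on templates: replace the final wildcard *_{m+1} of an
    (m+1)-dimensional template by the fixed digit j."""
    last = max(e for e in template if isinstance(e, int))
    return [str(j) if e == last else e for e in template]

# slice_map(["0", 1, "2", 2], 1) == ["0", 1, "2", "1"]
```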
To invert this transformation (i.e. extending an m-dimensional subspace to an m+1-dimensional one), what one would like to do is start with the random m-dimensional subspace, search the fixed coordinates randomly for an instance of j, and replace this j with a wildcard $*_{m+1}$. Of course, this is a random map rather than a deterministic one; so what one has to do is to “roll the dice in advance” and store the choices of fixed coordinates one would use to extend the transformation as part of the data used to define an element of $X_m$. In fact one needs to roll the dice once for every element of the free group $F_3$ on three generators, representing all the future possibilities of extending and contracting the space.
The construction actually gives more than the invertible maps $T_j$; it also gives a permutation action of $S_m$ on each $X_m$, and one can combine all the $T_j$ to get an honest-to-goodness measure-preserving group action $\rho$ of the free group $F_3$, rather than this funny groupoid action that Furstenberg and Katznelson use. However (as was pointed out all the way back in Austin.6), all this additional structure is actually a hindrance to the proof, because one has to let go of it in order to take advantage of the powerful Ramsey theorems available such as Carlson’s theorem.
I still feel that there should be some clean algebraic substitute for the concept of a group action that can clarify conceptually the dynamics of this system, but I do not yet know what it should be (something like a semi-groupoid action). The key thing seems to be that the structure one has generates lots of “IP-systems”, which I will discuss next.
20 February, 2009 at 11:48 am
Terence Tao
630. An IP convergence lemma
Let $g_1, g_2, g_3, \ldots$ be a sequence of group elements, then we can define $U_\alpha$ for any finite set $\alpha$ of natural numbers by the formula

$U_\alpha := g_{i_k} \cdots g_{i_1}$

when $\alpha = \{ i_1 < \ldots < i_k \}$. The $U_\alpha$ form an IP-system, which means that they have the restricted multiplicativity property

$U_{\alpha \cup \beta} = U_\alpha U_\beta$

whenever $\beta$ is to the left of $\alpha$ (i.e. every element of $\beta$ is less than every element of $\alpha$). (This is an example of the “semi-groupoid action” concept I am struggling to formalise in a clean algebraic fashion.)
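A toy numerical check of this restricted multiplicativity, with arbitrary matrices standing in for the group elements (the whole setup here is illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
g = [rng.standard_normal((3, 3)) for _ in range(8)]  # "group elements" g_0..g_7

def U(alpha):
    """U_alpha = g_{i_k} ... g_{i_1} for alpha = {i_1 < ... < i_k}."""
    prod = np.eye(3)
    for i in sorted(alpha):
        prod = g[i] @ prod        # apply g_{i_1} first, g_{i_k} last
    return prod

alpha, beta = {4, 6}, {1, 2}      # beta lies entirely to the left of alpha
assert np.allclose(U(alpha | beta), U(alpha) @ U(beta))
```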
If G has a topology on it, we say that the IP-system $(U_\alpha)$ converges to a limit $U$ if every neighbourhood of U contains the $U_\alpha$ “for sufficiently large $\alpha$”, which means that the least element $\min \alpha$ of $\alpha$ is sufficiently large.
Hindman’s theorem is equivalent to the claim that any IP-system in a compact metrisable space has a convergent sub-IP-system (I will not define what a sub-IP-system is here; again, I do not feel that I have the correct algebraic formalism for all this yet).
The following lemma is the key in Paper #1 (together with Carlson’s theorem, which has a similar flavour to Hindman’s theorem) to proving the claim in 628:
Lemma. Let $(U_\alpha)$ be an IP system of unitary operators on a separable Hilbert space H that converges (in the weak operator topology) to a limit P. Then P is an orthogonal projection.
[Note that the weak operator topology on the operators with norm at most 1 on H is compact metrisable.]
Proof. Since the $U_\alpha$ have operator norm at most 1, P has norm at most 1 as well. Since the only idempotents of norm at most 1 are the orthogonal projections, it suffices to show that $P^2 = P$.

Let $v \in H$. Then $U_\alpha^* v$ converges weakly to $P^* v$ as $\alpha \to \infty$. But for fixed $\beta$, $U^*_\beta U_\alpha^* v$ converges weakly to $U^*_\beta P^* v$ as $\alpha \to \infty$; thus

$\lim_\beta \lim_\alpha U^*_\beta U_\alpha^* v = \lim_\beta U^*_\beta P^* v = (P^*)^2 v$

where the limits are in the weak sense. On the other hand, using the IP property ($U^*_\beta U^*_\alpha = U^*_{\alpha \cup \beta}$ when $\beta$ is to the left of $\alpha$), the limit on the left is $P^* v$. Thus $(P^*)^2 = P^*$ and so P is idempotent.
21 February, 2009 at 8:25 pm
Terence Tao
631. An informal combinatorial translation of the ergodic proof of DHJ(2)
I’m beginning to understand (though still in a somewhat fuzzy, informal way) what the ergodic argument in Paper #1 would translate to in purely combinatorial terms. For simplicity I’ll start with DHJ(2) – the claim that any dense subset $A$ of $[2]^n$ contains a combinatorial line.

Let m be an integer that grows slowly with n (think of m as being like $\log n$, for instance), and let V be a random m-dimensional combinatorial subspace of $[2]^n$, where I will be deliberately vague as to what distribution V is drawn from here. One can use this random subspace V to create various real-valued random variables f(V) which depend only on what A is doing on V. For instance, one could consider the indicator random variable $1_A(V(0^m))$, which equals 1 when the bottom corner of V (formed by sending all m wildcards to 0) lies in A; one could consider the random variable $|A \cap V| / |V|$ that measures the density of A in V; and so forth.
Now, the random subspace V has a lot of fixed coordinates, and so with very high probability it’s going to have a lot of fixed 0s and fixed 1s (about n/2 of each). Let V’ be the random subspace parallel to V formed by taking one of the fixed 0s at random and flipping it to a 1. Then every random variable f(V) that is associated to V has a “shifted” version f(V’). Note that we expect V’ to have almost the same distribution as V, and so f(V) and f(V’) should also have about the same distribution; this is a stationarity property. However, this does not mean that f(V) and f(V’) are necessarily close to each other.
There are two fundamental types of random variable:
* A random variable f(V) is periodic if f(V’) and f(V) are close, thus $\mathbf{E} |f(V') - f(V)|^2$ is small.
* A random variable f(V) is mixing if f(V’) and f(V) are uncorrelated, thus $\mathbf{E} f(V) f(V') - (\mathbf{E} f(V)) (\mathbf{E} f(V'))$ is small.
Examples: any constant random variable is periodic. If the set A has “low influence” (which means that flipping a bit from 0 to 1 is unlikely to affect membership in A) then every random variable f(V) is in fact periodic. At the other extreme, if A is a random set of a fixed density $\delta$, then any random variable f(V) of mean zero will be mixing, and only the constants will be periodic.
Now, here is the basic fact (modulo some lies):
Fact. Every random variable f(V) can be decomposed (uniquely) into a periodic random variable Pf(V) and a mixing random variable (1-P)f(V). Furthermore, the periodic random variables and mixing random variables are always uncorrelated (i.e. orthogonal to each other).
This fact is a heavily disguised version of the lemma in 630, and can also be viewed as a sort of “IP mean ergodic theorem”. It is not quite true as stated; to be true, one has to redefine V’ to not be formed by flipping just one fixed coordinate of V, but by flipping an unspecified but bounded number of fixed coordinates. (The number of digits to flip will be determined by a Ramsey-type theorem, such as Folkman’s theorem, Graham-Rothschild, etc. The Ramsey theorem will also tell us what distribution V is to be drawn from, and in particular how many times each wildcard will appear.).
Assume this fact. Now we can prove DHJ(2). If we let $f(V) := 1_A(V(0^m))$ be the indicator of the event that the bottom left corner of V lies in A, it suffices to show that $\mathbf{E} f(V) f(V')$ is large (note that the bottom corners of V and V’ form the two points of a combinatorial line).
Observe that the claim would be easy if $f$ was periodic, since then $\mathbf{E} f(V) f(V') \approx \mathbf{E} f(V)^2 \geq (\mathbf{E} f(V))^2$. The claim would also be easy if $f$ was mixing, since then the two random variables $f(V), f(V')$ would be roughly independent. In general, the Fact tells us that we are in a combination of the two situations. Splitting both $f(V)$ and $f(V')$ into periodic and mixing components, we get four terms, three of which are small, leaving us with

$\mathbf{E} f(V) f(V') \approx \mathbf{E}\, Pf(V)\, Pf(V') \approx \mathbf{E}\, (Pf(V))^2 \ \ \ \ (1)$

But observe that the mixing function $(1-P)f$ is orthogonal to the periodic function 1, and thus has zero expectation. Since $f(V)$ has expectation $\delta$, we conclude that $Pf(V)$ has expectation $\delta$, and so (by Cauchy Schwarz) the expression (1) is large, and we are done.
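The last step is just linear algebra, and can be checked numerically on a finite probability space; in this toy sketch P is an orthogonal projection onto a subspace containing the constants (all names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 100, 7
# "Periodic" subspace: span of the constant function and d-1 random directions.
B = np.column_stack([np.ones(N), rng.standard_normal((N, d - 1))])
Q, _ = np.linalg.qr(B)
P = Q @ Q.T                      # orthogonal projection onto the subspace

f = rng.random(N)                # a bounded "random variable" on N points
Pf, Mf = P @ f, f - P @ f        # periodic and mixing components

assert abs(Mf.mean()) < 1e-10                     # mixing part has mean zero
assert (Pf ** 2).mean() >= f.mean() ** 2 - 1e-10  # E[(Pf)^2] >= (E f)^2
```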
In my next comment I will try to indicate how DHJ(3) works in some special cases. Here there is not just one projection P, but now three separate projections $P_{01}, P_{12}, P_{20}$, reflecting interchange between 0 and 1, 1 and 2, or 2 and 0. Things are relatively easy when one of the projections is extreme (either everything is periodic, or only the constants are periodic), but the case in which all projections are intermediate (so some random variables are periodic and others are mixing) is quite tricky.
21 February, 2009 at 9:27 pm
Terence Tao
632. Some special cases of DHJ(3)
Now we move from k=2 to k=3. Again, we consider an m-dimensional random subspace V of
. Instead of a single companion subspace V’, we now have three spaces
, formed from V by flipping a fixed 0 to a 1, or a fixed 1 to a 2, or a fixed 2 to a 0. We can then define the notion of a random variable f(V) being 01-periodic or 01-mixing (resp. 12-periodic or 12-mixing, etc.) by comparing f(V) with
, etc. We then get decompositions
into 01-periodic and 01-mixing components, etc.
What makes life difficult here is that the three orthogonal projections
can be in general position with respect to each other (in particular, they need not commute). But things are simpler when one of the orthogonal projections is trivial.
At one extreme, consider the case when
(say) is the identity – in other words, that every random variable f(V) is 12-periodic. This is basically saying that the original set A is essentially insensitive to flipping a 1 to 2 or vice versa. Because of this, we can basically identify 1 and 2 together and reduce the k=3 problem to the k=2 problem.
A more non-trivial case is the other extreme, in which
is the expectation operator – i.e. every random variable f(V) of expectation zero is 12-periodic, or equivalently, f(V) and g(V_{1 \to 2}) are independent for every f, g.
Now let f(V), g(V), h(V) be random variables. We claim that if f is 01-mixing OR if g is 20-mixing, then

$\mathbf{E}\, h(V)\, f(V_{0 \to 1})\, g(V_{0 \to 2}) \approx 0$

where $V_{0 \to 2}$ is defined by taking the same digit that was flipped from 0 to 1 in the definition of $V_{0 \to 1}$, and flipping it to 2 instead.

To show this, it suffices (by a certain “IP van der Corput lemma”, which one should think of as a variant of Cauchy-Schwarz) to eliminate h and show that

$\mathbf{E}\, f(V_{11})\, g(V_{21})\, f(V_{12})\, g(V_{22}) \approx 0 \ \ \ \ (2)$

where the $V_{ij}$ are formed by choosing two of the fixed 0s of V and flipping them to i and j (and it is the same two digits being flipped for all four of the $V_{ij}$).

The triviality of $P_{12}$ (or more precisely, a variant of this with m replaced by m+1) can then be used to show that the random variables $f(V_{11})\, g(V_{21})$ and $f(V_{12})\, g(V_{22})$ are roughly independent, because these random variables involve parallel m+1-dimensional subspaces which differ from each other by flipping a single 1 to a 2. Assuming this, we can simplify the expression in (2) to

$\mathbf{E}\big[ f(V_{11})\, g(V_{21}) \big]\ \mathbf{E}\big[ f(V_{12})\, g(V_{22}) \big].$

But one of these factors vanishes since f is 01 mixing or g is 20 mixing, and we are done.
For general f, g, we can now use the Fact from the previous comment to obtain the “generalised von Neumann theorem”.
Now we can prove DHJ(3). It suffices to show that

$\mathbf{E}\, 1_A(V)\, 1_A(V_{0 \to 1})\, 1_A(V_{0 \to 2})$

is large (writing, as in 631, $1_A(V)$ for the indicator that the bottom corner of V lies in A). Using the generalised von Neumann theorem, this expression can be rewritten, up to small errors, as

$\mathbf{E}\, 1_A(V)\, P_{01} 1_A(V_{0 \to 1})\, P_{20} 1_A(V_{0 \to 2}).$

But, as we saw in 631, $P_{01} 1_A$ has positive inner product with $1_A$. Furthermore, the same argument shows that $P_{01} 1_{E'}$ has positive inner product with $1_{E'}$ for any set E’ of positive measure. One can check that $P_{01}$ is monotone, and so $P_{01} 1_A$ has positive inner product with $1_{E'}$ for any $E' \subseteq A$ of positive measure; thus $P_{01} 1_A$ is positive almost surely on $A$ (indeed, $P_{01} 1_A$ is the conditional expectation of $1_A$ to a certain factor), and similarly for $P_{20} 1_A$, and we are done.
25 February, 2009 at 11:37 am
Terence Tao
633. Thread moving
I have now got an informal combinatorial translation of Paper #2 on the wiki at
http://michaelnielsen.org/polymath1/index.php?title=Furstenberg-Katznelson_argument
and it is now being discussed on the 800 thread at
http://gowers.wordpress.com/2009/02/23/brief-review-of-polymath1/
so I am now moving discussion of this paper over to that thread.
3 June, 2013 at 8:32 pm
Polymath proposal: bounded gaps between primes | The polymath blog
[…] 2 of this project could be run as an online reading seminar, similar to the online reading seminar of the Furstenberg-Katznelson paper that was part of the Polymath1 project. It would likely focus on the second half of Zhang’s […]
14 January, 2015 at 8:59 pm
There Are Many Primes | Gödel's Lost Letter and P=NP
[…] . This led to a new multidimensional formulation with Yitzhak Katznelson, and the two later proved positive-density versions of the Hales-Jewett […]