As part of the polymath1 project, I would like to set up a reading seminar on this blog for the following three papers and notes:

- H. Furstenberg, Y. Katznelson, “A density version of the Hales-Jewett theorem for k=3“, Graph Theory and Combinatorics (Cambridge, 1988). Discrete Math. 75 (1989), no. 1-3, 227–241.
- R. McCutcheon, “The conclusion of the proof of the density Hales-Jewett theorem for k=3“, unpublished.
- H. Furstenberg, Y. Katznelson, “A density version of the Hales-Jewett theorem“, J. Anal. Math. 57 (1991), 64–119.

As I understand it, paper #1 begins the proof of DHJ(3) (the k=3 version of density Hales-Jewett), but the proof is not quite complete, and the notes in #2 completes the proof using ideas from both paper #1 and paper #3. Paper #3, of course, does DHJ(k) for all k. For the purposes of the polymath1 project, though, I think it would be best if we focus exclusively on k=3.

While this seminar is of course related in content to the main discussion threads in the polymath1 project, I envision this to be a more sedate affair, in which we go slowly through various sections of various papers, asking questions of each other along the way, and presenting various bits and pieces of the proof. The papers require a certain technical background in ergodic theory in order to understand, but my hope is that if enough other people (in particular, combinatorialists) ask questions here (and “naive” or “silly” questions are *strongly* encouraged) then we should be able to make a fair amount of the arguments here accessible. I also hope that some ergodic theorists who have been intending to read these papers already, but didn’t get around to it, will join with reading the papers with me.

This is the first time I am trying something like this, and so we shall be using the carefully thought out protocol known as “making things up as we go along”. My initial plan is to start understanding the “big picture” (in particular, to outline the general strategy of proof), while also slowly going through the key stages of that proof in something resembling a linear order. But I imagine that the focus may change as the seminar progresses.

I’ll start the ball rolling with some initial impressions of paper #1 in the comments below. As with other threads in this project, I would like all comments to come with a number and title, starting with 600 and then incrementing (the numbers 1-599 being reserved by other threads in this project).

### Like this:

Like Loading...

*Related*

## 37 comments

Comments feed for this article

11 February, 2009 at 6:36 pm

Terence Tao600. Initial impressions of paper #1

Paper #1 outlines a proof of DHJ(3): given any , that any subset A of of density at least contains a combinatorial line if n is sufficiently large depending on . (The paper refers to combinatorial lines as “HJ sequences”, and as , but I am translating to be more compatible with the notation already used in other threads.) It proceeds by first reducing the problem to one concerning a family of measurable sets on a probability space, indexed by words, and for which there is a family of (non-commuting!) measure-preserving transformations relating these sets to each other. This reduction is performed in Proposition 3.1.

A key point in this reduction is that a certain “stationarity” property is obtained on this family of sets. To obtain this stationarity, one needs a version of the Carlson-Simpson theorem, which seems to be an infinitary strengthening of the colouring Hales-Jewett theorem. (It looks vaguely reminiscent of Hindman’s theorem; presumably there is a relation.)

With the above reductions, the game is now to show that a certain triple intersection of three sets has positive measure (Theorem B). This is not actually done in paper #1, but is covered in #2 or #3. In #1, the simpler (but still non-trivial) result is shown that all pairwise intersections of the three sets can simultaneously have positive measure (Theorem 4.1). (It would be interesting to see what the combinatorial analogue of this weaker statement is. Getting just one of the pairs to have positive measure sounds similar to the “DHJ(2.5)” I formulated in my comment 130, which turns out to be equivalent to DHJ(2) and thus an easy consequence of Sperner’s theorem.)

The proof of Theorem 4.1 is established from some soft functional analysis results about IP-systems of unitary operators in Section 5.

In Section 6, there is a discussion of what is needed to establish the full strength of Theorem B, and I guess this is completed in #2.

My initial plan is to go to Section 3 and start understanding the equivalence between the combinatorial problem and the ergodic theory one. There seems to be several stages to this equivalence; a relatively easy correspondence between a combinatorial problem and an ergodic problem without stationarity, and then the reduction to the stationary case which requires the Carlson-Simpson type theorems.

11 February, 2009 at 8:01 pm

Terence Tao601. Proposition 3.1

Proposition 3.1 of paper #1 equates the combinatorial statement DHJ(3) with various more ergodic-flavoured versions of the same statement. There are three main statements in this equivalence, (a), (b), and (c). (There are also minor variants (a*), (b*), (c*) of (a), (b), and (c), but I think we won’t need to focus on them too much.)

DHJ(3)(a) is the combinatorial version of DHJ(3), i.e. for any and for n sufficiently large depending on , every subset of of density at least contains a combinatorial line.

DHJ(3)(b) is a preliminary ergodic version of DHJ(3). Define a

-systemto be a collection of subsets in some probability space , indexed by strings in .DHJ(3)(b): If and n is sufficiently large depending on , and is a -system with all sets having measure greater than or equal to , then there exists a combinatorial line such that .)This can be stated in a probabilistic language: if A is a

randomsubset of with the property that each element w of belongs to A with probability at least , then with positive probability, A contains a combinatorial line. (Indeed, just take A to be the set of those for which holds.)To deduce DHJ(3)(b) from DHJ(3)(a), observe from linearity of expectation that the expected density of A is at least , and so A has density at least (say) with positive probability.

To deduce DHJ(3)(a) from DHJ(3)(b), one can use random averaging arguments (as discussed for instance in comment 460). [Paper #1 uses a somewhat different derivation of this implication which I will skip here.] To recap that argument briefly: we pick a medium integer m between 1 and n, and pick a random m-dimensional subspace inside . Pulling back the dense subset A of to , we get a random subset A’ of with each point lying in A’ with probability at least (say), if n is large enough depending on m. The claim then follows.

DHJ(3)(c) is DHJ(3)(b) with an additional stationarity hypothesis thrown in, and will be discussed subsequently.

12 February, 2009 at 2:10 am

gowers602.

Dear Terry,

I think it’s a very interesting idea to have an online reading seminar. Of course, it requires us to do the reading, which is quite a big investment to make, and presumably why you envisage a sedate affair. I think what would be most useful to me is basically what you encourage: rather than reading the paper very carefully and then explaining bits I understand here, I would prefer to try to use this post and its associated comments to read the paper sloppily and present my very partial understanding, or even complete misunderstanding, in the hope that others will be able to say, “Yes, you’re getting there — what you need to grasp next is this,” or “No, that’s not quite it — if that were the case then so would this be but it clearly isn’t,” and things like that. So when I get a moment, I’ll try to skim-read Randall’s paper and give some kind of initial reaction to it. I haven’t yet looked at it at all, but my eventual aim would be to obtain some kind of non-ergodic understanding of the proof. (Here I mean non-ergodic in the shallow sense of trying to explain things without getting completely into the ergodic language, rather than in the deep sense of genuinely doing without ergodic theory.)

12 February, 2009 at 4:32 am

zugzwang trebuchet603.

Hi gowers,

Why not work out the ergodic details here?

12 February, 2009 at 8:19 am

David Speyer604.

I’m going to try to restate proposition 3.1 to see if I understand it. If I get it right, this will probably be redundant with commment 601 above, but if I get it wrong, it will be one of those naive quesitons Terry asked for.

Prop. 3.1 says that six things are equivalent, called (a), (a*), (b), (b*), (c) and (c*). In each case, the unstarred version is a uniform variant on the starred version. So the starred version will say “If we have a sequence where lim sup (some measure of the size of ) then…” while the unstarred version will say “For every , there is an integer such that, if and (some measure of the size of ) then…”.

We’re going to concentrate on the unstarred versions.

(a) is what we’re trying to prove. For every , there is an such that any subset of of cardinatliy more that contains a combinatorial line.

(b) says that we can’t break (a) by a random construction. Suppose we had some random process, like drawing ping pong balls out of a bag. The set of balls is called , and the probaility that an individual ball is drawn will be . For each ball , our process produces a particular subset of , which I will denote . Confusingly (to me), H and K use the opposite indexing; for each , they write for the set of all such that , but they don’t have a notation for .

For fixed , if is large enough, the following holds: is long as the probability of any particular element of being in is at least , then there will be some particular combiantorial line which occurs in with nonzero probablity.

Some comments:

Notice that (b)’s framework is general enough to destroy any probabilistic construction I might attempt. In particular, there is no hypothesis that the events “ is in ” and “ is in ” are independent in any way.

The only kind of probabilistic construction I could imagine which would violate the hypotheses of (a) is a process which, although they produce sets of density , have much lower probability of including certain particular elements of . However, the problem is so symmetric that I see no reason that this would help. Presumably, the proof of (b)–>(a) is to make this idea rigorous.

Finally, I described things above with a finite collection of ping pong balls, each with a positive probability. H and K phrase things in terms of a general measure space. I assume this generality will be helpful, although it seems pointless so far.

Whew, that’s long! I’ll save condition (c) for later, asI need to get going now.

12 February, 2009 at 11:51 am

Terence Tao605.

Dear Tim: Well, I don’t really have an organised plan as to where this will go, but I would like to get as many contributions (from as many backgrounds) as possible, to take full advantage of Metcalfe’s law. So if some people read slowly and carefully, others comment loosely and freely, and yet others translate from ergodic theory to combinatorics or back, I think that will be an instructive mix. (One advantage of an internet-based seminar, as opposed to a physical one, is that one can have multiple concurrent discussions on different aspects of a paper without too much confusion.)

Incidentally, the McCutcheon notes rely heavily on paper #1, so I think it is best to start with that paper first, though it may still be valuable to skim through #2 at this preliminary stage nevertheless.

Dear David: Welcome! I’m glad you took the initiative to jump in (there is no roster or anything for this seminar :-). Please don’t worry about redundancy – in fact, I think the more perspectives we have on an individual component of the argument, the better.

Regarding the implication of (b) from (a), I can give you a simpler version of it, in which we are not looking for combinatorial lines in , but rather algebraic lines in ; the analogue of DHJ(3) here is called “Roth’s theorem in “. Given a (deterministic) set A in of density at least , we can define a random set A(x) in by setting where x is a random element of . Note that each element w of will lie in A(x) with probability at least , and if A(x) contains an algebraic line with positive probability, then A certainly contains an algebraic line also. So in this case the deduction of (a) from (b) is pretty simple.

This doesn’t work in the DHJ setting because combinatorial lines are not translation invariant, but one can do a somewhat different trick, embedding randomly in in a combinatorial line-preserving manner, rather than by translating randomly back to .

12 February, 2009 at 1:13 pm

Terence Tao606. Stationarity

In Proposition 3.1(b), the -system (which is called in the notation of #1, and is equivalent to the notion of a random subset A(x) of as discussed in Speyer.604), has no structure other than having every set be large. But the deepest part of Proposition 3.1 is a reduction to a special type of -system, namely a

stationary-system. This is a system where the sets take the form, (1)

where

, (2)

is a set, and the for and are measure-preserving invertible transformations on X. [These transformations are

notassumed to commute with each other, and this is apparently a significant technical difficulty in the rest of the argument.]Clearly, if (1) holds, and has measure at least , then all the have measure at least . So DHJ(3)(b) certainly implies

DHJ(3)(c):If and n is sufficiently large depending on , and is a stationary -system with having measure at least , then there exists a combinatorial line such that has positive measure.The non-trivial fact is that DHJ(3)(c) also implies DHJ(3)(b). I guess I’ll try to discuss that in future comments.

It is tempting to try to describe stationarity in terms of a measure-preserving action on X by the free group on three generators , but this isn’t quite right, because the are not constant in i. But if one adds a discrete time variable i, thus replacing X with , then one can recover a measure-preserving action of , with each generator , of mapping to for and (here we have to define for non-positive i in some arbitrary fashion, e.g. setting these transformations to be the identity). With this action, and embedding inside by identifying a string with the group element , one can rewrite (1), (2) as

(3)

and so the task is to show that there exists a combinatorial line such that

. (4)

At this stage I do not know whether this group action perspective is useful, although it does make the problem look a little bit like simpler ergodic recurrence theorems. For instance, Roth’s theorem is equivalent to the assertion that if T acts in a measure-preserving way on a probability space X, then for every set E of positive measure, there exists an arithmetic progression of integers such that

. (5)

12 February, 2009 at 1:40 pm

Kristal CantwellI would be interested in this seminar. I have downloaded the three papers and will be looking at them in the next few days along with the comments here.

12 February, 2009 at 1:56 pm

Terence Tao607. Stationarity cont.

Welcome, Kristal! Feel free to jump in and contribute with any questions, comments, or suggestions, even if (or especially if) they overlap with those already in this post.

One small comment, continuing 606: it appears that the stationarity gives quite strong constraints on the . Most obviously, they are all forced to have the same measure, but there are other constraints too. For instance, if n=2, then has to have the same measure as , since the latter set can be obtained from the former set by applying the measure-preserving . To put it another way, the random set is just as likely to contain the pair (0,0), (1,0) as it is to contain the pair (0,2), (1,2).

More generally, stationarity seems to be saying that if one looks at a slice of formed by freezing the last m positions of A(x), that the distribution of that slice (or more precisely, the identification of that slice with a subset of ) is in fact independent of the choice of the frozen coordinates

. (This would explain the terminology “stationary”.) I do not know yet whether this property already captures the full strength of stationarity, or whether stationarity in fact says something more.

12 February, 2009 at 2:06 pm

Terence Tao608. Symmetry breaking

I wanted to repeat a remark of Austin.54 here: the original formulation DHJ(3)(a) of the problem, together with its probabilistic variant DHJ(3)(b), are invariant under permutation of the indices in [n], but the notion of stationarity is not invariant (being dependent on the ordering of the indices), and so DHJ(3)(c) has “broken” the permutation symmetry. There is nothing particularly harmful about this – the standard proofs of colouring Hales-Jewett also break symmetry, for instance – but it seems like a noteworthy point nonetheless.

12 February, 2009 at 6:43 pm

David Speyer609: My first dumb question: In (b) and (c), would it add any generality to allow the measure space to depend on ? When I think of sampling methods I might use, it seems natural to me to use different sample spaces for different . But I suspect that I could use as my sample space, and thus reduce to H and K’s set up.

12 February, 2009 at 8:46 pm

Randall610: Fixed some typos I found on an airplane today and put a better version up. The most potentially confusing of these were in Lemma 1 (continued), the first display on page 4 and in the definition of \delta on page 6. There may be others so please speak up if you see anything that looks incongruous.

The notion of stationarity is more familiar for (linear) processes. If (X_i) are random variables taking values say in {a,b,…,z}, then we call (X_i) a process and we say that the process is stationary if the probability that X_1X_2X_3=”cat” is the same as the probability that X_17X_18X_19=”cat” (for all cats and all shifts).

In the current setup, you can view the sets B_w as random variables X_w taking values in {0,1}. Stationarity says: take any finite set of words. Call that set I. The probability of seeing a certain pattern of 0s and 1s in the random variables X_w, w in I, is exactly the same as the probability of seeing the same pattern of 0s and 1s in the random variables X_wv, w in I, where v is any word. Here “v” plays the role 16 did in the cat example, the pattern of 0s and 1s plays the role of “cat”, and I plays the role of {1,2,3}.

I can certainly sympathize with the sentiment that it seems quite unclear what 3.1 can possibly do to help. It gives you some measure preserving transformations, but nothing resembling honest ergodic theory, really, because although there are measure preserving transformations around, they don’t even constitute an action (as Terry pointed out). I guess the thing to do at this early stage is just have faith….

12 February, 2009 at 8:51 pm

Terence Tao611: the measure space X

David, I think you’re right, the actual choice of space X is more or less irrelevant. In a lot of these problems one can take the Cantor space (or the unit interval) as a universal space (note that any countably generated probability space can be lifted up to either of these two spaces).

One of the philosophies of probability theory, by the way, is to try to hide the underlying space X as much as possible; thus one talks about events E and their probabilities, random variables and their expectations and correlations, etc. without explicit mention of the probability space and probability measure, etc. It’s like studying a manifold or variety through its ring of functions rather than by looking at its constituent points. It has the advantage that one can concatenate any number of random processes together (e.g. flip infinitely many coins, and then pull balls out of urns infinitely often, etc.) without having to think about what happens to the underlying probability space.

In ergodic theory, though, one often takes almost the opposite view: one starts with an abstract measure space X (with some groups acting on it) and then creates some factors Y of that space which often have a very explicit algebraic structure (e.g. they are the homogeneous space for some Lie group). One then studies the geometry and dynamics of that space Y from a very concrete point of view, thus moving from the category of abstract measure spaces and measure preserving transformations to something much more algebraic.

13 February, 2009 at 1:53 am

DHJ — possible proof strategies « Gowers’s Weblog[…] is his comment 460, which again is responded to by other comments. Terry has also begun an online reading seminar on the Furstenberg-Katznelson proof, using not just the original paper of Furstenberg and […]

13 February, 2009 at 11:56 am

Kristal Cantwell612. Hindman’s theorem is a special case of one of their versions of the Carlson-Simpson theorem according to the paper.

13 February, 2009 at 8:56 pm

Terence Tao613. Stationarity II

DHJ(3)(c) finds combinatorial lines in stationary -systems . It turns out to be convenient to eliminate the role of n, and instead work with stationary -systems , where is the space of all words of finite length with alphabet {0,1,2}. (Paper #1 uses the notation for .) It is clear that DHJ(3)(c) implies

DHJ(3)(c*): If is a stationary -system, with E having positive measure, then there is a combinatorial line such that .In a similar spirit, DHJ(3)(b) is easily seen to be equivalent to

DHJ(3)(b*): If is a -system, with the having measure uniformly bounded away from zero, then there is a combinatorial line such that .To finish the proof of Proposition 3.1, we need to deduce DHJ(3)(b*) from DHJ(3)(c*). Broadly speaking, the idea is to start with a -system and repeatedly use Carlson-Simpson-type Ramsey theorems to pass to subsystems in which become increasingly stationary.

A -system can be viewed as a collection of random sets , one for each n. As noted already in 610, stationarity implies that has the same distribution as for all n and all words , where of course we identify with in the obvious manner.

13 February, 2009 at 10:30 pm

Terence Tao614. A possible simplification

It appears to me that one can in fact get stationarity for free (and avoid invoking the Carlson-Simpson theorem here). The basic observation is that when one deduces DHJ(3)(b) from DHJ(3)(a) by using a random m-dimensional subspace inside , the random set obtained in this fashion already is essentially stationary by construction. I’ll write up the details on the wiki and report back here when I’m done.

(This does not completely eliminate the need for Carlson-Simpson, though, as such results are also used later in the paper.)

14 February, 2009 at 9:12 am

Ryan O'Donnell615. Let me try to recap Terry’s points, passing from DHJ(3)(a) to DHJ(3)(c), in my own language.

Let be a given set of density .

Let be some moderate integer; perhaps when is a “constant”.

Let be a random subset of cardinality . Imagine choosing a “random restriction” to the coordinates . I.e., is chosen uniformly at random.

Then induces a probability distribution on subsets of . Call the induced subset .

Let denote a string in . I will use the notation “” for the string in gotten by appropriately piecing and together.

We claim that for *each* string , the probability (over choice of ) that is at least . On one hand, note that is *not* a uniformly distributed string in . For example, if is the all-0’s string, then the distribution on is slightly biased towards those strings with more 0’s.

Still if is “small enough” then the distribution will be very close to uniform (except around the “extreme slices” of , but has almost all its density outside these slices anyway). And this should prove the claim with a minimum of calculations.

At this point we’re at DHJ(3)(b); on to DHJ(3)(c). “Stationarity” here would seem to be that for any further restriction to coordinates , the resulting distribution on the set (a subset of ) is the same.

Again, this is not true, but it should be “almost” true. And it seems like we’re only going for “almost stationarity”, not “stationarity” anyway.

In fact, it seems to me that whereas “(almost) stationarity” importantly involves an *ordering* on the coordinates , it seems this argument doesn’t really. I.e., need not be a “suffix” of or anything.

14 February, 2009 at 10:10 am

Terence Tao616.

Thanks Ryan, this fleshes out what I was trying to say.

I’ve started a running “combinatorial translation” of paper #1 on the wiki at

http://michaelnielsen.org/polymath1/index.php?title=Furstenberg-Katznelson_argument

It’s likely to evolve in unstable ways over time (in particular, I am likely to overhaul the notation as I understand more of the big picture) but hopefully will eventually settle down.

As you note, the random sampling construction gives additional stationarity properties beyond those used in Paper #1. In particular, the random sets are also stationary wrt permutations of the index set [m]. It’s not clear whether this additional symmetry could be useful though (it might even be harmful, because later parts of the proof may need to break this symmetry and so trying to hold on to it may be counterproductive).

14 February, 2009 at 10:35 am

Terence Tao617. Recap

I’m now going to try to recapitulate the previous discussion in light of the new simplification, thus moving away from the presentation in Paper #1. (I guess I am changing the rules a little bit here from a “reading seminar” to a “revisionist seminar”, but I said that the rules would be made up as we went along. :-) ) I will try to keep the wiki page above current with the latest revision of thinking.

Let be a dense line-free subset of , where n is very large. For every small integer m, we can then form a random line-free subset by randomly embedding inside (by picking m random positions for the wildcards, and then picking the frozen positions randomly), and then pulling back by this map. One can think of as a random variable taking values in the power set , so its probability distribution is a probability measure on .

By construction, is stationary with respect to the permutation group of [m], i.e. is invariant wrt the action of . Also, for any word , the

slicing mapdefined byis

almost stationaryin the sense that for fixed m, r, and w, the random variables and have asymptotically the same distribution in the limit .Taking a subsequence limit as (and using the Helly selection principle, a.k.a. the vague sequential compactness of probability measures), we can then find random line-free sets (or equivalently, a probability measure on ) which is stationary with respect to the permutation groups acting on , and also with respect to the slicing maps . Also, if the all have density at least , then so do the : in particular, will be non-empty with probability at least , which implies that for any , that with probability at least also.

This is all very well and good, but there is one minor technical problem which will cause difficulty in later parts of paper #1: the slicing maps are not invertible. We would like to upgrade them to be invertible (thus upgrading the semigroupoid of transformations to a groupoid), thus giving us DHJ(3)(c*) in its full strength. This can be done, but requires extending the measure spaces a bit (or, in probabilistic language, adding a whole bunch of “dice” or “random number generators”). I plan to talk about this in a later comment.

14 February, 2009 at 11:56 am

Terence Tao618. Inversion of maps

There is a simple lemma that says that any finite probability-preserving surjective map can be inverted if one throws in a random number generator that draws uniformly from [0,1]. More precisely:

LemmaLet be a surjective map from one finite probability space to another which preserves the probability measure (in the sense that U pushes the probability measure on X forward to the probability measure on Y). Then one can lift this map to aninvertiblemeasure-preserving map , with product measure.I prove this lemma in the wiki notes, but let’s just illustrate this with the simplest example: the non-injective map from the two-element space {a,b} (with probability 1/2 of each) to the one-element space {c}. One invertible lift of this map is the map which maps (a,t) to (c,t/2) and (b,t) to (c,(t+1)/2). Many other choices are possible; this lift is not unique.

Applying this lift to all of the generators , we can now find probability spaces and invertible measure-preserving maps that commute with the slicing maps . Concatenating these maps then give us more invertible measure-preserving maps for all words w of length r. We are now basically in the situation of DHJ(3)(c*), which the rest of Paper #1 aims to prove. (It’s slightly different because we’ve “graded” the underlying probability space X, replacing it with a sequence of probability spaces, but I don’t think this will make much of a difference.)

[Remark: the lemma here is my interpretation of Lemma 3.2 in Paper #1; the statement there is false as stated, but can be repaired by throwing in the [0,1] as is done above.]

14 February, 2009 at 12:53 pm

Terence Tao619.

A frivolous comment, for the combinatorially minded: the Lemma in 618 is in some sense a continuous version of (a special case of) Hall’s marriage theorem, and serves much the same purpose (viz. conjuring up a bunch of artificial but useful perfect matchings out of thin air).

14 February, 2009 at 2:01 pm

Ryan O'Donnell620. Inversion of maps.

I’m getting slightly lost in notation and also slightly lost because I don’t quite know where this is going. However it seems to me that we might not eventually need a completely general Lemma like the one in Terry.618. Some natural randomized-inverse map might work in our context; e.g., tack on a random substring from .

14 February, 2009 at 2:39 pm

Terence Tao621. Inversion of maps

Sorry about the notation shifting – I also don’t fully know where we are going either, and this I think is reflected in the notation…

We’re not inverting a map from to , but rather a map from to (which, I guess, is induced by an embedding of in ). Given a word w of length r, and a set , we have the slice , defined as

.

Stationarity tells us that if we take the random set and look at a slice of it (think of w as fixed), one gets back something with the same distribution as the random set . But one cannot initially invert this procedure: starting with the random set , one cannot

deterministicallylift it back to a set in that has the same distribution as (and which agrees with to on the slice) – there is not enough entropy in the domain and too much in the range. But one can do this once one has a random number generator, which I am modeling by a uniform distribution on [0,1] – this gives us an infinite amount of entropy on both sides and removes the obstruction to invertibility, as per “Hilbert’s hotel”. (One has to take care that one only performs “reversible computations” in order to maintain invertibility, but this is a minor technical detail.)14 February, 2009 at 8:05 pm

Ryan O'Donnell622. Understood, I’m just thinking that since we got to in the first place by making a large random restriction, perhaps we can “remember” this and “unrestrict” coordinates later. Just guessing… I’m probably being too speculative right now. Best to see where the next parts of the arguments go.

14 February, 2009 at 11:02 pm

Terence Tao623. Oh, that’s an interesting idea. The artificial nature of the transformations used in the Furstenberg-Katznelson argument bothered me a bit, and this would be a much more natural way to do it. I don’t quite see how to set it up so that everything is completely reversible (i.e. the relevant maps are invertible), so that entropy is neither created nor destroyed, but I can imagine that it is possible.

I’ve read a little further into the paper (an interesting experiment, reading a paper collaboratively a small fragment at a time). With the above revisionist notation, we now have a sequence of probability spaces and invertible measure-preserving maps for all words w, obeying a semigroup law. The claim is now that for every set of positive measure, that there exists a combinatorial line in some cube for some n such that the triple intersection of , , and has positive measure (or equivalently, that the random set contains the line with positive probability).

In Paper #1, a somewhat weaker result is proven, namely there exists a line , such that the three

pairwiseintersections of , , and have positive measure (thus the random set will contain any two of the points on this line with positive probability). The notes in #2 finish the job to get all three points on the line. This weaker statement is already not obvious to me combinatorially – it’s some sort of triple version of the DHJ(2.5) statement we discussed back around 130 or so. (I think it is asserting the existence of a positive r for any dense with r wildcards such that for each ij=01,12,20, there exists a combinatorial line which intersects A in the i and j position.) I think I’ll ask it at the 500 thread to see if anyone can come up with a combinatorial proof.15 February, 2009 at 4:19 pm

Terence Tao624. Compactification of the string space

I’ve scanned through the rest of the paper, and it seems like one has no choice but to start understanding the statement of the Carlson-Simpson theorem (and its variants) before progressing much further, although one can at least take these statements as “black boxes”. This theorem is an infinite-dimensional version of the colouring Hales-Jewett theorem, but to state it properly, we need some notation.

I’m using for the strings of length n, and for the finite length strings.

We can give a “3-adic” metric by defining the distance between two distinct strings x, y (possibly of different length) to be , where l is the first digit at which x and y differ (or terminate).

The metric space is totally bounded but incomplete. One can complete it by attaching the space of infinite strings. The space is totally bounded and complete, with as a dense subspace, thus is a compactification of . (Roughly, one can think of as being like the terminating decimals base 3 in , and as being like the entire interval .)

[Paper #1 uses in place of .]

Let X be a compact metric space. Observe that a map is uniformly continuous if and only if it extends to a continuous map . One can think of a uniformly continuous F as an operation that takes a string as input and spits out a point in X as output, but is increasingly insensitive to fluctuations in the and higher digits as .

Of course, not every map on is continuous. (A good example of a discontinuous map: the map that counts the parity of the number of 1s in the string.) But the Carlson-Simpson theorem is an assertion that every map F from to a compact set (or a finite set) becomes uniformly continuous when restricted to an “infinite-dimensional combinatorial subspace”. This allows one to extend F at some key points of . The proof is going to proceed by taking the shift maps (viewed as unitary operators) and taking a suitable limit (in the weak operator topology) as and working with the limiting operators (which turn out to be orthogonal projections). I don’t know yet why this is going to be a useful thing to do.

In a subsequent comment I will state the Carlson-Simpson theorem properly, both in the finite colouring version (which is the standard formulation) and in the compact version (which is the version preferred by paper #1).

16 February, 2009 at 1:00 pm

Ryan O'Donnell625. In the analysis of boolean functions there is general intuition of continuous/measurable functions vs. discontinuous/nonmeasurable functions. Roughly, the former are those functions which are “asymptotically noise stable” (in the sense of Benjamini-Kalai-Schramm: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.2927 ) or have “analogue” functions on Gaussian space or the sphere. I wonder if this viewpoint will be helpful.

16 February, 2009 at 2:59 pm

Terence Tao626. Colouring theorems I: finitary theorems

In this post and the next I want to describe a square of four Ramsey theorems: two finitary theorems, and two infinitary theorems. It is the latter which is used in Paper #1.

The first finitary theorem is the

colouring Hales-Jewett theorem. The k=3 version of this theorem asserts that if [3]^n is partitioned into c colour classes, and n is sufficiently large depending on c, then one of the colour classes contains a combinatorial line.Iterating this theorem, we conclude that [3]^n is partitioned into c colour classes, and n is sufficiently large depending on c, m, then one of the colour classes contains an m-dimensional combinatorial subspace.

The second finitary theorem is the

Graham-Rothschild theorem. The k=3 version of this theorem asserts that if, instead of colouring the vertices of [3]^n as in Hales-Jewett, one colours thecombinatorial linesin [3]^n, partitioning the space of all such lines into c colours, then (if n is sufficiently large depending on c, m) one can find an m-dimensional combinatorial subspace in which all lines in that space have the same colour.Recall that a combinatorial line can be viewed as a string of 0s, 1s, 2s, and a wildcard *, with the wildcard appearing at least once (e.g. 01*2* is a combinatorial line). Identifying the wildcard with the symbol 3, we can thus view a combinatorial line as a point in containing at least one 4. Thus one can rephrase the k=3 Graham-Rothschild theorem in another way:

Graham-Rothschild theorem. If the vertices of [4]^n are partitioned into c colour classes, then (if n is sufficiently large depending on m, c) there is an m-dimensional subspace (with none of the fixed coordinates equal to 3) such that all points in this subspace (with at least one coordinate equal to 3) have the same colour.Written this way, Graham-Rothschild looks a lot like colouring Hales-Jewett, and it is in fact possible to deduce the latter from the former (the precise deduction escapes me right now, though).

The k=1 version of the Graham-Rothschild theorem implies Folkman’s theorem: if the positive integers are finitely coloured, then for any m, one can find m positive integers such that all finite sums of those integers (other than the trivial sum 0) have the same colour. This can be shown by considering the colouring on that colours a vertex by the colour given to the number of 1s on that vertex.

In the next comment I describe the infinitary strengthenings of the colouring Hales-Jewett and the Graham-Rothschild theorems, which are the Carlson-Simpson theorem and Carlson’s theorem respectively.

16 February, 2009 at 3:21 pm

Terence Tao627. Colouring theorems II. Infinitary theorems

The infinitary strengthening of the Hales-Jewett theorem is the Carlson-Simpson theorem. To define it, I need some notation. Define an

infinite-dimensional combinatorial subspaceof to be an object indexed by an infinite string of 0s, 1s, 2s, and an infinite number of wildcards , where each wildcard appears in a non-empty consecutive block , with each appearing to the left of . For instance, the following string describes an infinite-dimensional combinatorial subspace:This gives an embedding of into by mapping any n-digit string into the string formed from the above infinite string by substituting these digits into the first n wildcards , and then truncating everything after . For instance, in the above example, 20 maps to

.

Carlson-Simpson theoremIf is finitely coloured, then one of the colour classes contains an infinite-dimensional combinatorial subspace.This theorem implies the colouring Hales-Jewett theorem but is significantly stronger, in particular it does not seem to be deducible solely from the colouring Hales-Jewett theorem (much as the infinite pigeonhole principle cannot be deduced from its finite counterpart).

It implies a compact version:

Carlson-Simpson theorem, compact versionIf is a map from into a compact metric space X, then there exists a infinite-dimensional combinatorial subspace such that is uniformly continuous (i.e. it extends continuously to the compactification ).It is not hard to see that the two versions are equivalent (one has to cover the compact metric space by an increasingly fine mesh of small balls, which can then be used to colour ). In paper #1 it is remarked that the compact case is in fact slightly easier to prove than the finite colouring case (I suppose that the induction is somehow cleaner.)

In a similar spirit, the Graham-Rothschild theorem has the following infinitary generalisation:

Carlson’s theoremSuppose that the combinatorial lines in are finitely coloured. Then there is an infinite-dimensional combinatorial subspace on which all lines have the same colour.Or equivalently:

Carlson’s theoremSuppose that is finitely coloured. Then there is an infinite-dimensional combinatorial subspace with no fixed coordinate equal to 3, such that any element of this space with at least one 3 has the same colour.Again, there is a compact version:

Carlson’s theoremSuppose that is a map into a compact metric space X. Then there is an infinite-dimensional combinatorial subspace with no fixed coordinate equal to 3, such that the restriction of F to this space is uniformly continuous on those points with at least one 3.As I understand it, Carlson’s theorem was discovered independently by Carlson and by Furstenberg-Katznelson.

The k=1 case of Carlson’s theorem is essentially Hindman’s theorem (the infinitary generalisation of Folkman’s theorem). Given that the cleanest proof of Hindman’s theorem involves idempotent ultrafilters, I would imagine that this is how Carlson’s theorem is also proved. Needless to say, such theorems are not easily finitisable.

17 February, 2009 at 9:59 pm

Terence Tao628. Convergence to an orthogonal projection

Recall that our objective is to show that, given any system of measure-preserving transformations obeying a groupoid law, and any set of positive measure, that there exists a combinatorial line in some such that has positive measure for ij=01,12,20.

The measure of can also be written as

(1)

where we let measure-preserving transformations act on functions in the usual manner, .

The key claim is

ClaimThere exists a sequence of combinatorial lines such that for each ij=01,12,20, the unitary operators converge in the weak operator topology to orthogonal projections .Indeed, if this claim held, then (1) for the word would converge as to

On the other hand, since

we must have and hence , and so (1) is positive for sufficiently large m.

The main tool in proving the claim is the Carlson-Simpson theorem.

This is a somewhat infinitary argument; at present I do not see what the combinatorial analogue of it will be.

20 February, 2009 at 11:30 am

Terence Tao629. The invertible maps and group structure

I am beginning to understand a bit better the strange collection of invertible measure-preserving transformations that appear in the papers 1, 2, 3, which in turn generate the compound maps .

Firstly, as Ryan proposed in 622, there is a nice probabilistic interpretation of these transformations that does not require the artificial device of introducing a new source of randomness, such as the uniform probability measure on [0,1]. Roughly speaking, one can view each as the space of random m-dimensional combinatorial subspaces in , where each of the m wildcards appear exactly once, plus some additional data which is a bit tricky to describe cleanly (details are on the wiki).

The transformation for j=1,2,3 is the map that takes the random m+1-dimensional subspace to an m-dimensional slice, formed by replacing the final wildcard with j.

To invert this transformation (i.e. extending an m-dimensional subspace with an m+1-dimensional one), what one would like to do is start with the random m-dimensional subspace, search the fixed coordinates randomly for an instance of j, and replace this j with a wildcard . Of course, this is a random map rather than a deterministic one; so what one has to do is to “roll the dice in advance” and store the choices of fixed coordinates one would use to extend the transformation as part of the data used to define an element of . In fact one needs to roll the dice once for every element of the free group on three generators, representing all the future possibilities of extending and contracting the space.

The construction actually gives more than the invertible maps ; it also gives a permutation action on each , and one can multiply all the to get an honest-to-goodness measure-preserving group action of the free group , rather than this funny groupoid action that Furstenberg and Katznelson use. However (as was pointed out all the way back in Austin.6), all this additional structure is actually a hindrance to the proof, because one has to let go of it in order to take advantage of the powerful Ramsey theorems available such as Carlson’s theorem.

I still feel that there should be some clean algebraic substitute for the concept of a group action that can clarify conceptually the dynamics of this system, but I do not yet know what it should be (something like a semi-groupoid action). The key thing seems to be that the structure one has generates lots of “IP-systems”, which I will discuss next.

20 February, 2009 at 11:48 am

Terence Tao630. An IP convergence lemma

Let be a sequence of group elements, then we can define for any finite set of natural numbers by the formula

when . The form an

IP-system, which means that they have the restricted multiplicativity propertywhenever is to the left of . (This is an example of the “semi-groupoid action” concept I am struggling to formalise in a clean algebraic fashion.)

If G has a topology on it, we say that the IP-system converges to a limit if every neighbourhood of U contains the “for sufficiently large “, which means that the least element of is sufficiently large.

Hindman’s theorem is equivalent to the claim that any IP-system in a compact metrisable space has a convergent sub-IP-system (I will not define what a sub-IP-system is here; again, I do not feel that I have that the correct algebraic formalism for all this yet.).

The following lemma is the key in Paper #1 (together with Carlson’s theorem, which has a similar flavour to Hindman’s theorem) to proving the claim in 628:

Lemma.Let be an IP system of unitary operators on a separable Hilbert space H that converges (in the weak operator topology) to a limit P. Then P is an orthogonal projection.[Note that the weak operator topology on the operators with norm at most 1 on H is compact metrisable.]

Proof. Since the have operator norm at most 1, P has norm at most 1 as well. Since the only idempotents of norm at most 1 are the orthogonal projections, it suffices to show that .

Let . Then converges weakly to as . But for fixed , $U^*_\beta U_\alpha^* v$ converges weakly to as ; thus

,

where the limits are in the weak sense. On the other hand, using the IP property, the limit on the left is . Thus and so P is idempotent.

21 February, 2009 at 8:25 pm

Terence Tao631. An informal combinatorial translation of the ergodic proof of DHJ(2)

I’m beginning to understand (though still in a somewhat fuzzy, informal way) what the ergodic argument in Paper #1 would translate to in purely combinatorial terms. For simplicity I’ll start with DHJ(2) – the claim that any dense subset of contains a combinatorial line.

Let m be an integer that grows slowly with n (think of m as being like , for instance), and let V be a random m-dimensional combinatorial subspace of , where I will be deliberately vague as to what distribution V is drawn from here. One can use this random subspace V to create various real-valued random variables f(V) which depend only on what A is doing on V. For instance, one could consider the indicator random variable , which equals 1 when the bottom corner of V (formed by sending all m wildcards to 0) lies in A; one could consider the random variable that measures the density of A in V; and so forth.

Now, the random subspace V has a lot of fixed coordinates, and so with very high probability it’s going to have a lot of fixed 0s and fixed 1s (about n/2 of each). Let V’ be the random subspace parallel to V formed by taking one of the fixed 0s at random and flipping it to a 1. Then every random variable f(V) that is associated to V has a “shifted” version f(V’). Note that we expect V’ to have almost the same distribution as V, and so f(V) and f(V’) should also have about the same distribution; this is a

stationarityproperty. However, this does not mean that f(V) and f(V’) are necessarily close to each other.There are two fundamental types of random variable:

* A random variable f(V) is

periodicif f(V’) and f(V) are close, thus is small.* A random variable f(V) is

mixingif f(V’) and f(V) are uncorrelated, thus is small.Examples: any constant random variable is periodic. If the set A has “low influence” (which means that flipping a bit from 0 to 1 is unlikely to affect membership in A) then every random variable f(V) is in fact periodic. At the other extreme, if A is a random set of a fixed density , then any random variable f(V) of mean zero will be mixing, and only the constants will be periodic.

Now, here is the basic fact (modulo some lies):

Fact. Every random variable f(V) can be decomposed (uniquely) into a periodic random variable Pf(V) and a mixing random variable (1-P)f(V). Furthermore, the periodic random variables and mixing random variables are always uncorrelated (i.e. orthogonal to each other).This fact is a heavily disguised version of the lemma in 630, and can also be viewed as a sort of “IP mean ergodic theorem”. It is not quite true as stated; to be true, one has to redefine V’ to not be formed by flipping just one fixed coordinate of V, but by flipping an unspecified but bounded number of fixed coordinates. (The number of digits to flip will be determined by a Ramsey-type theorem, such as Folkman’s theorem, Graham-Rothschild, etc. The Ramsey theorem will also tell us what distribution V is to be drawn from, and in particular how many times each wildcard will appear.).

Assume this fact. Now we can prove DHJ(2). If we let be the event that the bottom left corner of V lies in A, it suffices to show that

is large.

Observe that the claim would be easy if was periodic, since then . The claim would also be easy if was mixing, since then the two random variables would be roughly independent. In general, the Fact tells us that we are in a combination of the two situations. Splitting both and into periodic and mixing components, we get four terms, three of which are small, leaving us with

. (1)

But observe that the mixing function is orthogonal to the periodic function 1, and thus has zero expectation. Since has expectation , we conclude that has expectation , and so (by Cauchy Schwarz) the expression (1) is large, and we are done.

In my next comment I will try to indicate how DHJ(3) works in some special cases. Here there is not just one projection P, but now three separate projections P, reflecting interchange between 0 and 1, 1 and 2, or 2 and 0. Things are relatively easy when one of the projections is extreme (either everything is periodic, or only the constants are periodic), but the case in which all projections are intermediate (so some random variables are periodic and others are mixing) is quite tricky.

21 February, 2009 at 9:27 pm

Terence Tao632. Some special cases of DHJ(3)

Now we move from k=2 to k=3. Again, we consider an m-dimensional random subspace V of . Instead of a single companion subspace V’, we now have three spaces , formed from V by flipping a fixed 0 to a 1, or a fixed 1 to a 2, or a fixed 2 to a 0. We can then define the notion of a random variable f(V) being 01-periodic or 01-mixing (resp. 12-periodic or 12-mixing, etc.) by comparing f(V) with , etc. We then get decompositions into 01-periodic and 01-mixing components, etc.

What makes life difficult here is that the three orthogonal projections can be in general position with respect to each other (in particular, they need not commute). But things are simpler when one of the orthogonal projections is trivial.

At one extreme, consider the case when (say) is the identity – in other words, that

everyrandom variable f(V) is 12-periodic. This is basically saying that the original set A is essentially insensitive to flipping a 1 to 2 or vice versa. Because of this, we can basically identify 1 and 2 together and reduce the k=3 problem to the k=2 problem.A more non-trivial case is the other extreme, in which is the expectation operator – i.e. every random variable f(V) of expectation zero is 12-periodic, or equivalently, f(V) and g(V_{1 \to 2}) are independent for every f, g.

Now let f(V), g(V), h(V) be random variables. We claim that if f is 01-mixing OR if g is 20-mixing, then

, (1)

where is defined by taking the

samedigit that was flipped from 0 to 1 in the definition of , and flipping it to 2 instead.To show this, it suffices (by a certain “IP van der Corput lemma”, which one should think of as a variant of Cauchy-Schwarz) to eliminate h and show that

(2)

where the are formed by choosing two of the fixed 0s of V and flipping them to i and j (and it is the same two digits being flipped for all four of the ).

The triviality of (or more precisely, a variant of this with m replaced by m+1) can then be used to show that the random variables and , because these random variables involve parallel m+1-dimensional subspaces which differ from each other by flipping a single 1 to a 2. Assuming this, we can simplify the expression in (2) to

.

But one of these factor vanishes since f is 01 mixing or g is 20 mixing, and we are done.

For general f, g, we can now use the Fact from the previous comment to obtain the “generalised von Neumann theorem”.

.

Now we can prove DHJ(3). It suffices to show that

is large. Using the generalised von Neumann theorem, this expression can be rewritten as

.

But, as we saw in 631, has positive inner product with . Furthermore has positive inner product with for any subset E’ of of positive measure. One can check that is monotone, and so has positive inner product with for any of positive measure; thus is positive almost surely on (indeed, is the conditional expectation of to a certain factor), and similarly for , and we are done.

25 February, 2009 at 11:37 am

Terence Tao633. Thread moving

I have now got an informal combinatorial translation of Paper #2 on the wiki at

http://michaelnielsen.org/polymath1/index.php?title=Furstenberg-Katznelson_argument

and it is now being discussed on the 800 thread at

http://gowers.wordpress.com/2009/02/23/brief-review-of-polymath1/

so I am now moving discussion of this paper over to that thread.

3 June, 2013 at 8:32 pm

Polymath proposal: bounded gaps between primes | The polymath blog[…] 2 of this project could be run as an online reading seminar, similar to the online reading seminar of the Furstenberg-Katznelson paper that was part of the Polymath1 project. It would likely focus on the second half of Zhang’s […]