One of the most basic methods in additive number theory is the Hardy-Littlewood circle method. This method is based on expressing a quantity of interest to additive number theory, such as the number of representations {f_3(x)} of an integer {x} as the sum of three primes {x = p_1+p_2+p_3}, as a Fourier-analytic integral over the unit circle {{\bf R}/{\bf Z}} involving exponential sums such as

\displaystyle  S(x,\alpha) := \sum_{p \leq x} e( \alpha p) \ \ \ \ \ (1)

where the sum here ranges over all primes up to {x}, and {e(x) := e^{2\pi i x}}. For instance, the expression {f(x)} mentioned earlier can be written as

\displaystyle  f_3(x) = \int_{{\bf R}/{\bf Z}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha. \ \ \ \ \ (2)

The strategy is then to obtain sufficiently accurate bounds on exponential sums such as {S(x,\alpha)} in order to obtain non-trivial bounds on quantities such as {f_3(x)}. For instance, if one can show that {f_3(x)>0} for all odd integers {x} greater than some given threshold {x_0}, this implies that all odd integers greater than {x_0} are expressible as the sum of three primes, thus establishing all but finitely many instances of the odd Goldbach conjecture.

Remark 1 In practice, it can be more efficient to work with smoother sums than the partial sum (1), for instance by replacing the cutoff {p \leq x} with a smoother cutoff {\chi(p/x)} for a suitable chocie of cutoff function {\chi}, or by replacing the restriction of the summation to primes by a more analytically tractable weight, such as the von Mangoldt function {\Lambda(n)}. However, these improvements to the circle method are primarily technical in nature and do not have much impact on the heuristic discussion in this post, so we will not emphasise them here. One can also certainly use the circle method to study additive combinations of numbers from other sets than the set of primes, but we will restrict attention to additive combinations of primes for sake of discussion, as it is historically one of the most studied sets in additive number theory.

In many cases, it turns out that one can get fairly precise evaluations on sums such as {S(x,\alpha)} in the major arc case, when {\alpha} is close to a rational number {a/q} with small denominator {q}, by using tools such as the prime number theorem in arithmetic progressions. For instance, the prime number theorem itself tells us that

\displaystyle  S(x,0) \approx \frac{x}{\log x}

and the prime number theorem in residue classes modulo {q} suggests more generally that

\displaystyle  S(x,\frac{a}{q}) \approx \frac{\mu(q)}{\phi(q)} \frac{x}{\log x}

when {q} is small and {a} is close to {q}, basically thanks to the elementary calculation that the phase {e(an/q)} has an average value of {\mu(q)/\phi(q)} when {n} is uniformly distributed amongst the residue classes modulo {q} that are coprime to {q}. Quantifying the precise error in these approximations can be quite challenging, though, unless one assumes powerful hypotheses such as the Generalised Riemann Hypothesis.

In the minor arc case when {\alpha} is not close to a rational {a/q} with small denominator, one no longer expects to have such precise control on the value of {S(x,\alpha)}, due to the “pseudorandom” fluctuations of the quantity {e(\alpha p)}. Using the standard probabilistic heuristic (supported by results such as the central limit theorem or Chernoff’s inequality) that the sum of {k} “pseudorandom” phases should fluctuate randomly and be of typical magnitude {\sim \sqrt{k}}, one expects upper bounds of the shape

\displaystyle  |S(x,\alpha)| \lessapprox \sqrt{\frac{x}{\log x}} \ \ \ \ \ (3)

for “typical” minor arc {\alpha}. Indeed, a simple application of the Plancherel identity, followed by the prime number theorem, reveals that

\displaystyle  \int_{{\bf R}/{\bf Z}} |S(x,\alpha)|^2\ d\alpha \sim \frac{x}{\log x} \ \ \ \ \ (4)

which is consistent with (though weaker than) the above heuristic. In practice, though, we are unable to rigorously establish bounds anywhere near as strong as (3); upper bounds such as {x^{4/5+o(1)}} are far more typical.

Because one only expects to have upper bounds on {|S(x,\alpha)|}, rather than asymptotics, in the minor arc case, one cannot realistically hope to make much use of phases such as {e(-x\alpha)} for the minor arc contribution to integrals such as (2) (at least if one is working with a single, deterministic, value of {x}, so that averaging in {x} is unavailable). In particular, from upper bound information alone, it is difficult to avoid the “conspiracy” that the magnitude {|S(x,\alpha)|^3} oscillates in sympathetic resonance with the phase {e(-x\alpha)}, thus essentially eliminating almost all of the possible gain in the bounds that could arise from exploiting cancellation from that phase. Thus, one basically has little option except to use the triangle inequality to control the portion of the integral on the minor arc region {\Omega_{minor}}:

\displaystyle  |\int_{\Omega_{minor}} |S(x,\alpha)|^3 e(-x\alpha)\ d\alpha| \leq \int_{\Omega_{minor}} |S(x,\alpha)|^3\ d\alpha.

Despite this handicap, though, it is still possible to get enough bounds on both the major and minor arc contributions of integrals such as (2) to obtain non-trivial lower bounds on quantities such as {f(x)}, at least when {x} is large. In particular, this sort of method can be developed to give a proof of Vinogradov’s famous theorem that every sufficiently large odd integer {x} is the sum of three primes; my own result that all odd numbers greater than {1} can be expressed as the sum of at most five primes is also proven by essentially the same method (modulo a number of minor refinements, and taking advantage of some numerical work on both the Goldbach problems and on the Riemann hypothesis ). It is certainly conceivable that some further variant of the circle method (again combined with a suitable amount of numerical work, such as that of numerically establishing zero-free regions for the Generalised Riemann Hypothesis) can be used to settle the full odd Goldbach conjecture; indeed, under the assumption of the Generalised Riemann Hypothesis, this was already achieved by Deshouillers, Effinger, te Riele, and Zinoviev back in 1997. I am optimistic that an unconditional version of this result will be possible within a few years or so, though I should say that there are still significant technical challenges to doing so, and some clever new ideas will probably be needed to get either the Vinogradov-style argument or numerical verification to work unconditionally for the three-primes problem medium-sized ranges of {x}, such as {x \sim 10^{50}}. (But the intermediate problem of representing all even natural numbers as the sum of at most four primes looks somewhat closer to being feasible, though even this would require some substantially new and non-trivial ideas beyond what is in my five-primes paper.)

However, I (and many other analytic number theorists) are considerably more skeptical that the circle method can be applied to the even Goldbach problem of representing a large even number {x} as the sum {x = p_1 + p_2} of two primes, or the similar (and marginally simpler) twin prime conjecture of finding infinitely many pairs of twin primes, i.e. finding infinitely many representations {2 = p_1 - p_2} of {2} as the difference of two primes. At first glance, the situation looks tantalisingly similar to that of the Vinogradov theorem: to settle the even Goldbach problem for large {x}, one has to find a non-trivial lower bound for the quantity

\displaystyle  f_2(x) = \int_{{\bf R}/{\bf Z}} S(x,\alpha)^2 e(-x\alpha)\ d\alpha \ \ \ \ \ (5)

for sufficiently large {x}, as this quantity {f_2(x)} is also the number of ways to represent {x} as the sum {x=p_1+p_2} of two primes {p_1,p_2}. Similarly, to settle the twin prime problem, it would suffice to obtain a lower bound for the quantity

\displaystyle  \tilde f_2(x) = \int_{{\bf R}/{\bf Z}} |S(x,\alpha)|^2 e(-2\alpha)\ d\alpha \ \ \ \ \ (6)

that goes to infinity as {x \rightarrow \infty}, as this quantity {\tilde f_2(x)} is also the number of ways to represent {2} as the difference {2 = p_1-p_2} of two primes less than or equal to {x}.

In principle, one can achieve either of these two objectives by a sufficiently fine level of control on the exponential sums {S(x,\alpha)}. Indeed, there is a trivial (and uninteresting) way to take any (hypothetical) solution of either the asymptotic even Goldbach problem or the twin prime problem and (artificially) convert it to a proof that “uses the circle method”; one simply begins with the quantity {f_2(x)} or {\tilde f_2(x)}, expresses it in terms of {S(x,\alpha)} using (5) or (6), and then uses (5) or (6) again to convert these integrals back into a the combinatorial expression of counting solutions to {x=p_1+p_2} or {2=p_1-p_2}, and then uses the hypothetical solution to the given problem to obtain the required lower bounds on {f_2(x)} or {\tilde f_2(x)}.

Of course, this would not qualify as a genuine application of the circle method by any reasonable measure. One can then ask the more refined question of whether one could hope to get non-trivial lower bounds on {f_2(x)} or {\tilde f_2(x)} (or similar quantities) purely from the upper and lower bounds on {S(x,\alpha)} or similar quantities (and of various {L^p} type norms on such quantities, such as the {L^2} bound (4)). Of course, we do not yet know what the strongest possible upper and lower bounds in {S(x,\alpha)} are yet (otherwise we would already have made progress on major conjectures such as the Riemann hypothesis); but we can make plausible heuristic conjectures on such bounds. And this is enough to make the following heuristic conclusions:

  • (i) For “binary” problems such as computing (5), (6), the contribution of the minor arcs potentially dominates that of the major arcs (if all one is given about the minor arc sums is magnitude information), in contrast to “ternary” problems such as computing (2), in which it is the major arc contribution which is absolutely dominant.
  • (ii) Upper and lower bounds on the magnitude of {S(x,\alpha)} are not sufficient, by themselves, to obtain non-trivial bounds on (5), (6) unless these bounds are extremely tight (within a relative error of {O(1/\log x)} or better); but
  • (iii) obtaining such tight bounds is a problem of comparable difficulty to the original binary problems.

I will provide some justification for these conclusions below the fold; they are reasonably well known “folklore” to many researchers in the field, but it seems that they are rarely made explicit in the literature (in part because these arguments are, by their nature, heuristic instead of rigorous) and I have been asked about them from time to time, so I decided to try to write them down here.

In view of the above conclusions, it seems that the best one can hope to do by using the circle method for the twin prime or even Goldbach problems is to reformulate such problems into a statement of roughly comparable difficulty to the original problem, even if one assumes powerful conjectures such as the Generalised Riemann Hypothesis (which lets one make very precise control on major arc exponential sums, but not on minor arc ones). These are not rigorous conclusions – after all, we have already seen that one can always artifically insert the circle method into any viable approach on these problems – but they do strongly suggest that one needs a method other than the circle method in order to fully solve either of these two problems. I do not know what such a method would be, though I can give some heuristic objections to some of the other popular methods used in additive number theory (such as sieve methods, or more recently the use of inverse theorems); this will be done at the end of this post.

Read the rest of this entry »

This is a sequel to my previous blog post “Cayley graphs and the geometry of groups“. In that post, the concept of a Cayley graph of a group {G} was used to place some geometry on that group {G}. In this post, we explore a variant of that theme, in which (fragments of) a Cayley graph on {G} is used to describe the basic algebraic structure of {G}, and in particular on elementary word identities in {G}. Readers who are familiar with either category theory or group homology/cohomology will recognise these concepts lurking not far beneath the surface; we wil remark briefly on these connections later in this post. However, no knowledge of categories or cohomology is needed for the main discussion, which is primarily focused on elementary group theory.

Throughout this post, we fix a single group {G = (G,\cdot)}, which is allowed to be non-abelian and/or infinite. All our graphs will be directed, with loops and multiple edges permitted.

In the previous post, we drew the entire Cayley graph of a group {G}. Here, we will be working much more locally, and will only draw the portions of the Cayley graph that are relevant to the discussion. In this graph, the vertices are elements {x} of the group {G}, and one draws a directed edge from {x} to {xg} labeled (or “coloured”) by the group element {g} for any {x, g \in G}; the graph consisting of all such vertices and edges will be denoted {Cay(G,G)}. Thus, a typical edge in {Cay(G,G)} looks like this:

Figure 1.

One usually does not work with the complete Cayley graph {Cay(G,G)}. It is customary to instead work with smaller Cayley graphs {Cay(G,S)}, in which the edge colours {g} are restricted to a smaller subset of {G}, such as a set of generators for {G}. As we will be working locally, we will in fact work with even smaller fragments of {Cay(G,G)} at a time; in particular, we only use a handful of colours (no more than nine, in fact, for any given diagram), and we will not require these colours to generate the entire group (we do not care if the Cayley graph is connected or not, as this is a global property rather than a local one).

Cayley graphs are left-invariant: for any {a \in G}, the left translation map {x \mapsto ax} is a graph isomorphism. To emphasise this left invariance, we will usually omit the vertex labels, and leave only the coloured directed edge, like so:

Figure 2.

This is analogous to how, in undergraduate mathematics and physics, vectors in Euclidean space are often depicted as arrows of a given magnitude and direction, with the initial and final points of this arrow being of secondary importance only. (Indeed, this depiction of vectors in a vector space can be viewed as an abelian special case of the more general depiction of group elements used in this post.)

Let us define a diagram to be a finite directed graph {H = (V,E)}, with edges coloured by elements of {G}, which has at least one graph homomorphism into the complete Cayley graph {Cay(G,G)} of {G}; thus there exists a map {\phi: V \rightarrow G} (not necessarily injective) with the property that {\phi(w) = \phi(v) g} whenever {(v,w)} is a directed edge in {H} coloured by a group element {g \in G}. Informally, a diagram is a finite subgraph of a Cayley graph with the vertex labels omitted, and with distinct vertices permitted to represent the same group element. Thus, for instance, the single directed edge displayed in Figure 2 is a very simple example of a diagram. An even simpler example of a diagram would be a depiction of the identity element:


Figure 3.

We will however omit the identity loops in our diagrams in order to reduce clutter.

We make the obvious remark that any directed edge in a diagram can be coloured by at most one group element {g}, since {y=xg, y=xh} implies {g=h}. This simple observation provides a way to prove group theoretic identities using diagrams: to show that two group elements {g, h} are equal, it suffices to show that they connect together (with the same orientation) the same pair of vertices in a diagram.

Remark 1 One can also interpret these diagrams as commutative diagrams in a category in which all the objects are copies of {G}, and the morphisms are right-translation maps. However, we will deviate somewhat from the category theoretic way of thinking here by focusing on the geometric arrangement and shape of these diagrams, rather than on their abstract combinatorial description. In particular, we view the arrows more as distorted analogues of vector arrows, than as the abstract arrows appearing in category theory.

Just as vector addition can be expressed via concatenation of arrows, group multiplication can be described by concatenation of directed edges. Indeed, for any {x,g,h \in G}, the vertices {x, xg, xgh} can be connected by the following triangular diagram:

Figure 4.

In a similar spirit, inversion is described by the following diagram:

Figure 5.

We make the pedantic remark though that we do not consider a {g^{-1}} edge to be the reversal of the {g} edge, but rather as a distinct edge that just happens to have the same initial and final endpoints as the reversal of the {g} edge. (This will be of minor importance later, when we start integrating “{1}-forms” on such edges.)

A fundamental operation for us will be that of gluing two diagrams together.

Lemma 1 ((Labeled) gluing) Let {D_1 = (V_1,E_1), D_2 = (V_2,E_2)} be two diagrams of a given group {G}. Suppose that the intersection {D_1 \cap D_2 := (V_1 \cap V_2, E_1 \cap E_2)} of the two diagrams connects all of {V_1 \cap V_2} (i.e. any two elements of {V_1 \cap V_2} are joined by a path in {D_1 \cap D_2}). Then the union {D_1 \cup D_2 := (V_1 \cup V_2, E_1 \cup E_2)} is also a diagram of {G}.

Proof: By hypothesis, we have graph homomorphisms {\phi_1: D_1 \rightarrow Cay(G,G)}, {\phi_2: D_2 \rightarrow Cay(G,G)}. If they agree on {D_1 \cap D_2} then one simply glues together the two homomorphisms to create a new graph homomorphism {\phi: D_1 \cup D_2 \rightarrow Cay(G,G)}. If they do not agree, one can apply a left translation to either {\phi_1} or {\phi_2} to make the two diagrams agree on at least one vertex of {D_1 \cap D_2}; then by the connected nature of {D_1 \cap D_2} we see that they now must agree on all vertices of {D_1 \cap D_2}, and then we can form the glued graph homomorphism as before. \Box

The above lemma required one to specify the label the vertices of {D_1,D_2} (in order to form the intersection {D_1 \cap D_2} and union {D_1 \cup D_2}). However, if one is presented with two diagrams {D_1, D_2} with unlabeled vertices, one can identify some partial set of vertices of {D_1} with a partial set of vertices of {D_2} of matching cardinality. Provided that the subdiagram common to {D_1} and {D_2} after this identification connects all of the common vertices together, we may use the above lemma to create a glued diagram {D}.

For instance, if a diagram {D} contains two of the three edges in the triangular diagram in Figure 4, one can “fill in” the triangle by gluing in the third edge:

Figure 6.

One can use glued diagrams to demonstrate various basic group-theoretic identities. For instance, by gluing together two copies of the triangular diagram in Figure 4 to create the glued diagram

Figure 7.

and then filling in two more triangles, we obtain a tetrahedral diagram that demonstrates the associative law {(gh)k = g(hk)}:

Figure 8.

Similarly, by gluing together two copies of Figure 4 with three copies of Figure 5 in an appropriate order, we can demonstrate the Abel identity {(gh)^{-1} = h^{-1} g^{-1}}:

Figure 9.

In addition to gluing, we will also use the trivial operation of erasing: if {D} is a diagram for a group {G}, then any subgraph of {D} (formed by removing vertices and/or edges) is also a diagram of {G}. This operation is not strictly necessary for our applications, but serves to reduce clutter in the pictures.

If two group elements {g, h} commute, then we obtain a parallelogram as a diagram, exactly as in the vector space case:

Figure 10.

In general, of course, two arbitrary group elements {g,h} will fail to commute, and so this parallelogram is no longer available. However, various substitutes for this diagram exist. For instance, if we introduce the conjugate {g^h := h^{-1} g h} of one group element {g} by another, then we have the following slightly distorted parallelogram:

Figure 11.

By appropriate gluing and filling, this can be used to demonstrate the homomorphism properties of a conjugation map {g \mapsto g^h}:

Figure 12.

Figure 13.

Another way to replace the parallelogram in Figure 10 is to introduce the commutator {[g,h] := g^{-1}h^{-1}gh} of two elements, in which case we can perturb the parallelogram into a pentagon:

Figure 14.

We will tend to depict commutator edges as being somewhat shorter than the edges generating that commutator, reflecting a “perturbative” or “nilpotent” philosophy. (Of course, to fully reflect a nilpotent perspective, one should orient commutator edges in a different dimension from their generating edges, but of course the diagrams drawn here do not have enough dimensions to display this perspective easily.) We will also be adopting a “Lie” perspective of interpreting groups as behaving like perturbations of vector spaces, in particular by trying to draw all edges of the same colour as being approximately (though not perfectly) parallel to each other (and with approximately the same length).

Gluing the above pentagon with the conjugation parallelogram and erasing some edges, we discover a “commutator-conjugate” triangle, describing the basic identity {g^h = g [g,h]}:

Figure 15.

Other gluings can also give the basic relations between commutators and conjugates. For instance, by gluing the pentagon in Figure 14 with its reflection, we see that {[g,h] = [h,g]^{-1}}. The following diagram, obtained by gluing together copies of Figures 11 and 15, demonstrates that {[h,g^{-1}] = [g,h]^{g^{-1}}},

Figure 16.

while this figure demonstrates that {[g,hk] = [g,k] [g,h]^k}:

Figure 17.

Now we turn to a more sophisticated identity, the Hall-Witt identity

\displaystyle [[g,h],k^g] [[k,g],h^k] [[h,k],g^h] = 1,

which is the fully noncommutative version of the more well-known Jacobi identity for Lie algebras.

The full diagram for the Hall-Witt identity resembles a slightly truncated parallelopiped. Drawing this truncated paralleopiped in full would result in a rather complicated looking diagram, so I will instead display three components of this diagram separately, and leave it to the reader to mentally glue these three components back to form the full parallelopiped. The first component of the diagram is formed by gluing together three pentagons from Figure 14, and looks like this:

Figure 18.

This should be thought of as the “back” of the truncated parallelopiped needed to establish the Hall-Witt identity.

While it is not needed for proving the Hall-Witt identity, we also observe for future reference that we may also glue in some distorted parallelograms and obtain a slightly more complicated diagram:

Figure 19.

To form the second component, let us now erase all interior components of Figure 18 or Figure 19:

Figure 20.

Then we fill in three distorted parallelograms:

Figure 21.

This is the second component, and is the “front” of the truncated praallelopiped, minus the portions exposed by the truncation.

Finally, we turn to the third component. We begin by erasing the outer edges from the second component in Figure 21:

Figure 22.

We glue in three copies of the commutator-conjugate triangle from Figure 15:

Figure 23.

But now we observe that we can fill in three pentagons, and obtain a small triangle with edges {[[g,h],k^g] [[k,g],h^k] [[h,k],g^h]}:

Figure 24.

Erasing everything except this triangle gives the Hall-Witt identity. Alternatively, one can glue together Figures 18, 21, and 24 to obtain a truncated parallelopiped which one can view as a geometric representation of the proof of the Hall-Witt identity.

Among other things, I found these diagrams to be useful to visualise group cohomology; I give a simple example of this below, developing an analogue of the Hall-Witt identity for {2}-cocycles.

Read the rest of this entry »

Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity {r_4(F^n)} for a vector space {F^n} over a finite field {F} of characteristic greater than {4}, where {r_4(F^n)} is defined as the cardinality of the largest subset of {F^n} that does not contain an arithmetic progression of length {4}. In our earlier paper, we gave two arguments that bounded {r_4(F^n)} in the regime when the field {F} was fixed and {n} was large. The first “cheap” argument gave the bound

\displaystyle  r_4(F^n) \ll |F|^n \exp( - c \sqrt{\log n} )

and the more complicated “expensive” argument gave the improvement

\displaystyle  r_4(F^n) \ll |F|^n n^{-c} \ \ \ \ \ (1)

for some constant {c>0} depending only on {F}.

Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset {A} of {F^n} that has no arithmetic progressions of length {4}, and seeks to locate a subspace on which {A} has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse {U^3} theorem of Ben and myself (and also independently by Samorodnitsky), one approximates {A} by a “quadratically structured” function {f}, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because {A} has no progressions of length {4}, the count of progressions of length {4} weighted by {f} will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which {f} has increased density.

The error in the paper was to conclude from this that the original function {1_A} also had increased density on the same subspace; it turns out that the manner in which {f} approximates {1_A} is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on {r_4(F^n)} one gets at the end of the day to be worse than the cheap argument.)

After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of {c = 2^{-22}} for the exponent {c} in (1). In fact, it gives the following stronger result:

Theorem 1 Let {A} be a subset of {F^n} of density at least {\alpha}, and let {\epsilon>0}. Then there is a subspace {W} of {F^n} of codimension {O( \epsilon^{-2^{20}})} such that the number of (possibly degenerate) progressions {a, a+r, a+2r, a+3r} in {A \cap W} is at least {(\alpha^4-\epsilon)|W|^2}.

The bound (1) is an easy consequence of this theorem after choosing {\epsilon := \alpha^4/2} and removing the degenerate progressions from the conclusion of the theorem.

The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to {1_A} with a significantly stronger local approximation to {1_A} on a subspace {W}. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse {U^3} theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace {W} on which {A} is quite dense (of density {\alpha-O(\epsilon)}) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.

I had another long plane flight recently, so I decided to try making another game, to explore exactly what types of mathematical reasoning might be amenable to gamification.  I decided to start with one of the simplest types of logical argument (and one of the few that avoids the disjunction problem mentioned in the previous post), namely the Aristotelian logic of syllogistic reasoning, most famously exemplified by the classic syllogism:

  • Major premise: All men are mortal.
  • Minor premise: Socrates is a man.
  • Conclusion: Socrates is a mortal.

There is a classic collection of logic puzzles of Lewis Carroll (from his book on symbolic logic), in which he presents a set of premises and asks to derive a conclusion using all of the premises.  Here are four examples of such sets:

  • Babies are illogical;
  • Nobody is despised who can manage a crocodile;
  • Illogical persons are despised.
  • My saucepans are the only things that I have that are made of tin;
  • I find all your presents very useful;
  • None of my saucepans are of the slightest use.
  • No  potatoes of mine, that are new, have been boiled;
  • All of my potatoes in this dish are fit to eat;
  • No unboiled potatoes of mine are fit to eat.
  • No ducks waltz;
  • No officers ever decline to waltz;
  • All my poultry are ducks.

After a certain amount of effort, I was able to gamify the solution process to these sort of puzzles in a Scratch game, although I am not fully satisfied with the results (in part due to the inherent technical limitations of the Scratch software, but also because I have not yet found a smooth user interface for this process).   In order to not have to build a natural language parser, I modified Lewis Carroll’s sentences somewhat in order to be machine-readable.  Here is a typical screenshot:

Unfortunately, the gameplay is somewhat clunkier than in the algebra game, basically because one needs three or four clicks and a keyboard press in order to make a move, whereas in the algebra game each click corresponded to one move. This is in part due to Scratch not having an easy way to have drag-and-drop or right-click commands, but even with a fully featured GUI, I am unsure how to make an interface that would make the process of performing a deduction easy; one may need a “smart” interface that is able to guess some possible intended moves from a minimal amount of input from the user, and then suggest these choices (somewhat similarly to the “auto-complete” feature in a search box).   This would require more effort than I could expend on a plane trip, though (as well as the use of a more powerful language than Scratch).

There are of course several existing proof assistants one could try to use as a model (Coq, Isabelle, etc.), but my impression is that the syntax for such assistants would only be easily mastered by someone who already is quite experienced with computer languages as well as proof writing, which would defeat the purpose of the games I have in mind.  But perhaps it is possible to create a proof assistant for a very restricted logic (such as one without disjunction) that can be easily used by non-experts…

Just a quick update on my previous post on gamifying the problem-solving process in high school algebra. I had a little time to spare (on an airplane flight, of all things), so I decided to rework the mockup version of the algebra game into something a bit more structured, namely as 12 progressively difficult levels of solving a linear equation in one unknown.  (Requires either Java or Flash.)   Somewhat to my surprise, I found that one could create fairly challenging puzzles out of this simple algebra problem by carefully restricting the moves available at each level. Here is a screenshot of a typical level:


After completing each level, an icon appears which one can click on to proceed to the next level.  (There is no particular rationale, by the way, behind my choice of icons; these are basically selected arbitrarily from the default collection of icons (or more precisely, “costumes”) available in Scratch.)

The restriction of moves made the puzzles significantly more artificial in nature, but I think that this may end up ultimately being a good thing, as to solve some of the harder puzzles one is forced to really start thinking about how the process of solving for an unknown actually works. (One could imagine that if one decided to make a fully fledged game out of this, one could have several modes of play, ranging from a puzzle mode in which one solves some carefully constructed, but artificial, puzzles, to a free-form mode in which one can solve arbitrary equations (including ones that you input yourself) using the full set of available algebraic moves.)

One advantage to gamifying linear algebra, as opposed to other types of algebra, is that there is no need for disjunction (i.e. splitting into cases). In contrast, if one has to solve a problem which involves at least one quadratic equation, then at some point one may be forced to divide the analysis into two disjoint cases, depending on which branch of a square root one is taking. I am not sure how to gamify this sort of branching in a civilised manner, and would be interested to hear of any suggestions in this regard. (A similar problem also arises in proving propositions in Euclidean geometry, which I had thought would be another good test case for gamification, because of the need to branch depending on the order of various points on a line, or rays through a point, or whether two lines are parallel or intersect.)

High school algebra marks a key transition point in one’s early mathematical education, and is a common point at which students feel that mathematics becomes really difficult. One of the reasons for this is that the problem solving process for a high school algebra question is significantly more free-form than the mechanical algorithms one is taught for elementary arithmetic, and a certain amount of planning and strategy now comes into play. For instance, if one wants to, say, write {\frac{1,572,342}{4,124}} as a mixed fraction, there is a clear (albeit lengthy) algorithm to do this: one simply sets up the long division problem, extracts the quotient and remainder, and organises these numbers into the desired mixed fraction. After a suitable amount of drill, this is a task that can be accomplished by a large fraction of students at the middle school level. But if, for instance, one has to solve a system of equations such as

\displaystyle  a^2 + bc = 7

\displaystyle  2b - c = 1

\displaystyle  a + 2 = c

for {a,b,c}, there is no similarly mechanical procedure that can be easily taught to a high school student in order to solve such a problem “mindlessly”. (I doubt, for instance, that any attempt to teach Buchberger’s algorithm to such students will be all that successful.) Instead, one is taught the basic “moves” (e.g. multiplying both sides of an equation by some factor, subtracting one equation from another, substituting an expression for a variable into another equation, and so forth), and some basic principles (e.g. simplifying an expression whenever possible, for instance by gathering terms, or solving for one variable in terms of others in order to eliminate it from the system). It is then up to the student to find a suitable combination of moves that isolates each of the variables in turn, to reveal their identities.

Once one is sufficiently expert in algebraic manipulation, this is not all that difficult to do, but when one is just starting to learn algebra, this type of problem can be quite daunting, in part because of an “embarrasment of riches”; there are several possible “moves” one could try to apply to the equations given, and to the novice it is not always clear in advance which moves will simplify the problem and which ones will make it more complicated. Often, such a student may simply try moves at random, which can lead to a dishearteningly large amount of effort expended without getting any closer to a solution. What is worse, each move introduces the possibility of an arithmetic error (such as a sign error), the effect of which is usually not apparent until the student finally arrives at his or her solution and either checks it against the original equation, or submits the answer to be graded. (My own son can get quite frustrated after performing a lengthy series of computations to solve an algebra problem, only to be told that the answer was wrong due to an arithmetic error; I am sure this experience is common to many other schoolchildren.)

It occurred to me recently, though, that the set of problem-solving skills needed to solve algebra problems (and, to some extent, calculus problems also) is somewhat similar to the set of skills needed to solve puzzle-type computer games, in which a certain limited set of moves must be applied in a certain order to achieve a desired result. (There are countless games of this general type; a typical example is “Factory balls“.) Given that the computer game format is already quite familiar to many schoolchildren, one could then try to teach the strategy component of algebraic problem-solving via such a game, which could automate mechanical tasks such as gathering terms and performing arithmetic in order to reduce some of the more frustrating aspects of algebra. (Note that this is distinct from the type of maths games one often sees on educational web sites, which are usually based on asking the player to correctly answer some maths problem in order to advance within the game, making the game essentially a disguised version of a maths quiz. Here, the focus is not so much on being able to supply the correct answer, but on being able to select an effective problem-solving strategy.)

It is difficult to explain in words exactly what type of game I have in mind, so I decided to create a quick mockup of what a sample “level” would look like here (note: requires Java). I didn’t want to spend too much time to make this mockup, so I wrote it in Scratch, which is a somewhat limited programming language intended for use by children, but which has the benefit of being able to churn out simple but functional apps very quickly (the mockup took less than half an hour to write). (I would certainly not attempt to write a full game in this language, though.) In this mockup level, the objective is to solve a single linear equation in one variable, such as {2x+7=11}, with only two “moves” available: the ability to subtract {1} from both sides of the equation, and the ability to divide both sides of the equation by {2}, which one performs by clicking on an appropriate icon.

It seems to me that one could actually teach a fair amount of algebra through a game such as this, with a progressively difficult sequence of levels that gradually introduce more and more types of “moves” that can handle increasingly difficult problems (e.g. simultaneous linear equations in several unknowns, quadratic equations in one or more variables, inequalities, etc.). Even within a single class of problem, such as solving linear equations, one could create additional challenge by placing some restriction on the available moves, for instance by limiting the number of available moves (as was done in the mockup), or requiring that each move cost some amount of game currency (which might possibly be waived if one is willing to perform the move “by hand”, i.e. by entering the transformed equation manually). And of course one could also make the graphics, sound, and gameplay fancier (e.g. by allowing for various competitive gameplay modes). One could also imagine some other types of high-school and lower-division undergraduate mathematics being amenable to this sort of gamification (calculus and ODE comes to mind, and maybe propositional logic), though I doubt that one could use it effectively for advanced undergraduate or graduate topics. (Though I have sometimes wished for an “integrate by parts” or “use Sobolev embedding” button when trying to control solutions to a PDE…)

This would however be a fair amount of work to actually implement, and is not something I could do by myself with the time I have available these days. But perhaps it may be possible to develop such a game (or platform for such a game) collaboratively, somewhat in the spirit of the polymath projects? I have almost no experience in modern software development (other than a summer programming job I had as a teenager, which hardly counts), so I would be curious to know how projects such as this actually get started in practice.

Nonstandard analysis is a mathematical framework in which one extends the standard mathematical universe {{\mathfrak U}} of standard numbers, standard sets, standard functions, etc. into a larger nonstandard universe {{}^* {\mathfrak U}} of nonstandard numbers, nonstandard sets, nonstandard functions, etc., somewhat analogously to how one places the real numbers inside the complex numbers, or the rationals inside the reals. This nonstandard universe enjoys many of the same properties as the standard one; in particular, we have the transfer principle that asserts that any statement in the language of first order logic is true in the standard universe if and only if it is true in the nonstandard one. (For instance, because Fermat’s last theorem is known to be true for standard natural numbers, it is automatically true for nonstandard natural numbers as well.) However, the nonstandard universe also enjoys some additional useful properties that the standard one does not, most notably the countable saturation property, which is a property somewhat analogous to the completeness property of a metric space; much as metric completeness allows one to assert that the intersection of a countable family of nested closed balls is non-empty, countable saturation allows one to assert that the intersection of a countable family of nested satisfiable formulae is simultaneously satisfiable. (See this previous blog post for more on the analogy between the use of nonstandard analysis and the use of metric completions.) Furthermore, by viewing both the standard and nonstandard universes externally (placing them both inside a larger metatheory, such as a model of Zermelo-Frankel-Choice (ZFC) set theory; in some more advanced set-theoretic applications one may also wish to add some large cardinal axioms), one can place some useful additional definitions and constructions on these universes, such as defining the concept of an infinitesimal nonstandard number (a number which is smaller in magnitude than any positive standard number). The ability to rigorously manipulate infinitesimals is of course one of the most well-known advantages of working with nonstandard analysis.

To build a nonstandard universe {{}^* {\mathfrak U}} from a standard one {{\mathfrak U}}, the most common approach is to take an ultrapower of {{\mathfrak U}} with respect to some non-principal ultrafilter over the natural numbers; see e.g. this blog post for details. Once one is comfortable with ultrafilters and ultrapowers, this becomes quite a simple and elegant construction, and greatly demystifies the nature of nonstandard analysis.

On the other hand, nonprincipal ultrafilters do have some unappealing features. The most notable one is that their very existence requires the axiom of choice (or more precisely, a weaker form of this axiom known as the boolean prime ideal theorem). Closely related to this is the fact that one cannot actually write down any explicit example of a nonprincipal ultrafilter, but must instead rely on nonconstructive tools such as Zorn’s lemma, the Hahn-Banach theorem, Tychonoff’s theorem, the Stone-Cech compactification, or the boolean prime ideal theorem to locate one. As such, ultrafilters definitely belong to the “infinitary” side of mathematics, and one may feel that it is inappropriate to use such tools for “finitary” mathematical applications, such as those which arise in hard analysis. From a more practical viewpoint, because of the presence of the infinitary ultrafilter, it can be quite difficult (though usually not impossible, with sufficient patience and effort) to take a finitary result proven via nonstandard analysis and coax an effective quantitative bound from it.

There is however a “cheap” version of nonstandard analysis which is less powerful than the full version, but is not as infinitary in that it is constructive (in the sense of not requiring any sort of choice-type axiom), and which can be translated into standard analysis somewhat more easily than a fully nonstandard argument; indeed, a cheap nonstandard argument can often be presented (by judicious use of asymptotic notation) in a way which is nearly indistinguishable from a standard one. It is obtained by replacing the nonprincipal ultrafilter in fully nonstandard analysis with the more classical Fréchet filter of cofinite subsets of the natural numbers, which is the filter that implicitly underlies the concept of the classical limit {\lim_{{\bf n} \rightarrow \infty} a_{\bf n}} of a sequence when the underlying asymptotic parameter {{\bf n}} goes off to infinity. As such, “cheap nonstandard analysis” aligns very well with traditional mathematics, in which one often allows one’s objects to be parameterised by some external parameter such as {{\bf n}}, which is then allowed to approach some limit such as {\infty}. The catch is that the Fréchet filter is merely a filter and not an ultrafilter, and as such some of the key features of fully nonstandard analysis are lost. Most notably, the law of the excluded middle does not transfer over perfectly from standard analysis to cheap nonstandard analysis; much as there exist bounded sequences of real numbers (such as {0,1,0,1,\ldots}) which do not converge to a (classical) limit, there exist statements in cheap nonstandard analysis which are neither true nor false (at least without passing to a subsequence, see below). The loss of such a fundamental law of mathematical reasoning may seem like a major disadvantage for cheap nonstandard analysis, and it does indeed make cheap nonstandard analysis somewhat weaker than fully nonstandard analysis. But in some situations (particularly when one is reasoning in a “constructivist” or “intuitionistic” fashion, and in particular if one is avoiding too much reliance on set theory) it turns out that one can survive the loss of this law; and furthermore, the law of the excluded middle is still available for standard analysis, and so one can often proceed by working from time to time in the standard universe to temporarily take advantage of this law, and then transferring the results obtained there back to the cheap nonstandard universe once one no longer needs to invoke the law of the excluded middle. Furthermore, the law of the excluded middle can be recovered by adopting the freedom to pass to subsequences with regards to the asymptotic parameter {{\bf n}}; this technique is already in widespread use in the analysis of partial differential equations, although it is generally referred to by names such as “the compactness method” rather than as “cheap nonstandard analysis”.

Below the fold, I would like to describe this cheap version of nonstandard analysis, which I think can serve as a pedagogical stepping stone towards fully nonstandard analysis, as it is formally similar to (though weaker than) fully nonstandard analysis, but on the other hand is closer in practice to standard analysis. As we shall see below, the relation between cheap nonstandard analysis and standard analysis is analogous in many ways to the relation between probabilistic reasoning and deterministic reasoning; it also resembles somewhat the preference in much of modern mathematics for viewing mathematical objects as belonging to families (or to categories) to be manipulated en masse, rather than treating each object individually. (For instance, nonstandard analysis can be used as a partial substitute for scheme theory in order to obtain uniformly quantitative results in algebraic geometry, as discussed for instance in this previous blog post.)

Read the rest of this entry »

I recently finished the first draft of the the first of my books, entitled “Hilbert’s fifth problem and related topics“, based on the lecture notes for my graduate course of the same name.    The PDF of this draft is available here.  As always, comments and corrections are welcome.

A few days ago, Endre Szemerédi was awarded the 2012 Abel prize “for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.” The full citation for the prize may be found here, and the written notes for a talk given by Tim Gowers on Endre’s work at the announcement may be found here (and video of the talk can be found here).

As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:

Theorem 1 (Szemerédi’s theorem) Let {A} be a set of integers of positive upper density, thus {\lim \sup_{N \rightarrow\infty} \frac{|A \cap [-N,N]|}{|[-N,N]|} > 0}, where {[-N,N] := \{-N, -N+1,\ldots,N\}}. Then {A} contains an arithmetic progression of length {k} for any {k>1}.

Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.

In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occuring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemeredi’s argument at some future juncture.)

Instead of trying to describe the entire argument here, I thought I would instead show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof – namely van der Waerden’s theorem and the Szemerédi regularity lemma – become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions {P_1 = \{ a, a+r_1, a+2r_1, \ldots, a+(k_1-1)r_1\}}, but also progressions of progressions {P_2 := \{ P_1, P_1 + r_2, P_1+2r_2, \ldots, P_1+(k_2-1)r_2\}}, progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as generalised arithmetic progressions of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set {A} (arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length {k} that consists entirely of elements of {A}.

To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression {P, P + r, \ldots, P+(k-1)r} of progressions of length {k}, with each progression {P+ir}, {i=0,\ldots,k-1} being quite long, and containing a near-maximal amount of elements of {A}, thus

\displaystyle  |A \cap (P+ir)| \approx \delta |P|

where {\delta := \lim \sup_{|P| \rightarrow \infty} \frac{|A \cap P|}{|P|}} is the “maximal density” of {A} along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use the imprecise symbols such as {\approx} instead.) By hypothesis, {\delta} is positive. The objective is then to locate a progression {a, a+r', \ldots,a+(k-1)r'} in {A}, with each {a+ir} in {P+ir} for {i=0,\ldots,k-1}. It may help to view the progression of progressions {P, P + r, \ldots, P+(k-1)r} as a tall thin rectangle {P \times \{0,\ldots,k-1\}}.

If we write {A_i := \{ a \in P: a+ir \in A \}} for {i=0,\ldots,k-1}, then the problem is equivalent to finding a (possibly degenerate) arithmetic progression {a_0,a_1,\ldots,a_{k-1}}, with each {a_i} in {A_i}.

By hypothesis, we know already that each set {A_i} has density about {\delta} in {P}:

\displaystyle  |A_i \cap P| \approx \delta |P|. \ \ \ \ \ (1)

Let us now make a “weakly mixing” assumption on the {A_i}, which roughly speaking asserts that

\displaystyle  |A_i \cap E| \approx \delta \sigma |P| \ \ \ \ \ (2)

for “most” subsets {E} of {P} of density {\approx \sigma} of a certain form to be specified shortly. This is a plausible type of assumption if one believes {A_i} to behave like a random set, and if the sets {E} are constructed “independently” of the {A_i} in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions {a_0,\ldots,a_{k-1}} of the desired form.

We will inductively consider the following (nonrigorously defined) sequence of claims {C(i,j)} for each {0 \leq i \leq j < k}:

  • {C(i,j)}: For most choices of {a_j \in P}, there are {\sim \delta^i |P|} arithmetic progressions {a_0,\ldots,a_{k-1}} in {P} with the specified choice of {a_j}, such that {a_l \in A_l} for all {l=0,\ldots,i-1}.

(Actually, to avoid boundary issues one should restrict {a_j} to lie in the middle third of {P}, rather than near the edges, but let us ignore this minor technical detail.) The quantity {\delta^i |P|} is natural here, given that there are {\sim |P|} arithmetic progressions {a_0,\ldots,a_{k-1}} in {P} that pass through {a_i} in the {i^{th}} position, and that each one ought to have a probability of {\delta^i} or so that the events {a_0 \in A_0, \ldots, a_{i-1} \in A_{i-1}} simultaneously hold.) If one has the claim {C(k-1,k-1)}, then by selecting a typical {a_{k-1}} in {A_{k-1}}, we obtain a progression {a_0,\ldots,a_{k-1}} with {a_i \in A_i} for all {i=0,\ldots,k-1}, as required. (In fact, we obtain about {\delta^k |P|^2} such progressions by this method.)

We can heuristically justify the claims {C(i,j)} by induction on {i}. For {i=0}, the claims {C(0,j)} are clear just from direct counting of progressions (as long as we keep {a_j} away from the edges of {P}). Now suppose that {i>0}, and the claims {C(i-1,j)} have already been proven. For any {i \leq j < k} and for most {a_j \in P}, we have from hypothesis that there are {\sim \delta^{i-1} |P|} progressions {a_0,\ldots,a_{k-1}} in {P} through {a_j} with {a_0 \in A_0,\ldots,a_{i-2}\in A_{i-2}}. Let {E = E(a_j)} be the set of all the values of {a_{i-1}} attained by these progressions, then {|E| \sim \delta^{i-1} |P|}. Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of {a_j}, we have

\displaystyle  |A_{i-1} \cap E| \sim \delta^i |P|

which then gives the desired claim {C(i,j)}.

The observant reader will note that we only needed the claim {C(i,j)} in the case {j=k-1} for the above argument, but for technical reasons, the full proof requires one to work with more general values of {j} (also the claim {C(i,j)} needs to be replaced by a more complicated version of itself, but let’s ignore this for sake of discussion).

We now return to the question of how to justify the weak mixing hypothesis (2). For a single block {A_i} of {A}, one can easily concoct a scenario in which this hypothesis fails, by choosing {E} to overlap with {A_i} too strongly, or to be too disjoint from {A_i}. However, one can do better if one can select {A_i} from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:

Proposition 2 (Single upper bound) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). Let {E} be a subset of {P} of density {\approx \sigma}. Then (if {M} is large enough) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E| \lessapprox \delta \sigma |P|.

Proof: The key is the double counting identity

\displaystyle  \sum_{i=0}^{M-1} |A_i \cap E| = \sum_{a \in E} |A \cap \{ a, a+r, \ldots, a+(M-1) r\}|.

Because {A} has maximal density {\delta} and {M} is large, we have

\displaystyle  |A \cap \{ a, a+r, \ldots, a+(M-1) r\}| \lessapprox \delta M

for each {a}, and thus

\displaystyle  \sum_{i=0}^{M-1} |A_i \cap E| \lessapprox \delta M |E|.

The claim then follows from the pigeonhole principle. \Box

Now suppose we want to obtain weak mixing not just for a single set {E}, but for a small number {E_1,\ldots,E_m} of such sets, i.e. we wish to find an {i} for which

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|. \ \ \ \ \ (3)

for all {j=1,\ldots,m}, where {\sigma_j} is the density of {E_j} in {P}. The above proposition gives, for each {j}, a choice of {i} for which (3) holds, but it could be a different {i} for each {j}, and so it is not immediately obvious how to use Proposition 2 to find an {i} for which (3) holds simultaneously for all {j}. However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:

Proposition 3 (Multiple upper bound) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {1 \leq j \leq m}, let {E_i} be a subset of {P} of density {\approx \sigma_j}. Then (if {M} is large enough depending on {j}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|

simultaneously for all {1 \leq j \leq m}.

Proof: Suppose that the claim failed (for some suitably large {M}). Then, for each {i = 0,\ldots,M-1}, there exists {j \in \{1,\ldots,m\}} such that

\displaystyle  |A_i \cap E_j| \gg \delta \sigma_j |P|.

This can be viewed as a colouring of the interval {\{1,\ldots,M\}} by {m} colours. If we take {M} large compared to {m}, van der Waerden’s theorem allows us to then find a long subprogression of {\{1,\ldots,M\}} which is monochromatic, so that {j} is constant on this progression. But then this will furnish a counterexample to Proposition 2. \Box

One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:

Proposition 4 (Multiple mixing) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {1 \leq j \leq m}, let {E_i} be a subset of {P} of density {\approx \sigma_j}. Then (if {M} is large enough depending on {m}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_j| \approx \delta \sigma_j |P|

simultaneously for all {1 \leq j \leq m}.

Proof: By applying the previous proposition to the collection of sets {E_1,\ldots,E_m} and their complements {P\backslash E_1,\ldots,P \backslash E_m} (thus replacing {m} with {2m}, one can find an {i} for which

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|

and

\displaystyle  |A_i \cap (P \backslash E_j)| \lessapprox \delta (1-\sigma_j) |P|

which gives the claim. \Box

However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number {m} of sets {E_1,\ldots,E_m} for which mixing is established is too small compared with the length {M} of the progression one has to use in order to obtain that mixing. However, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of {E_i} to be mixed (at the cost of excluding a small fraction of exceptions):

Proposition 5 (Really multiple mixing) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {v} in some (large) finite set {V}, let {E_v} be a subset of {P} of density {\approx \sigma_v}. Then (if {M} is large enough, but not dependent on the size of {V}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_v| \approx \delta \sigma_v |P|

simultaneously for almost all {v \in V}.

Proof: We build a bipartite graph {G = (P, V, E)} connecting the progression {P} to the finite set {V} by placing an edge {(a,v)} between an element {a \in P} and an element {v \in V} whenever {a \in E_v}. The number {|E_v| \approx \sigma_v |P|} can then be interpreted as the degree of {v} in this graph, while the number {|A_i \cap E_v|} is the number of neighbours of {v} that land in {A_i}.

We now apply the regularity lemma to this graph {G}. Roughly speaking, what this lemma does is to partition {P} and {V} into almost equally sized cells {P = P_1 \cup \ldots P_m} and {V = V_1 \cup \ldots V_m} such that for most pairs {P_j, V_k} of cells, the graph {G} resembles a random bipartite graph of some density {d_{jk}} between these two cells. The key point is that the number {m} of cells here is bounded uniformly in the size of {P} and {V}. As a consequence of this lemma, one can show that for most vertices {v} in a typical cell {V_k}, the number {|E_v|} is approximately equal to

\displaystyle  |E_v| \approx \sum_{j=1}^m d_{ij} |P_j|

and the number {|A_i \cap E_v|} is approximately equal to

\displaystyle  |A_i \cap E_v| \approx \sum_{j=1}^m d_{ij} |A_i \cap P_j|.

The point here is that the {|V|} different statistics {|A_i \cap E_v|} are now controlled by a mere {m} statistics {|A_i \cap P_j|} (this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4 to find an {i} for which

\displaystyle  |A_i \cap P_j| \approx \delta |P_j|

simultaneously for all {j=1,\ldots,m}, and the claim follows. \Box

This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions {P, P+r, \ldots, P+(k-1)r} of progressions containing a near-maximal density of elements of {A}, we will now have to work with a family {(P_\lambda, P_\lambda+r_\lambda,\ldots,P_\lambda+(k-1)r_\lambda)_{\lambda \in \Lambda}} of such progression of progressions, where {\Lambda} ranges over some suitably large parameter set. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given {i}, it should be possible to find many reasonably large subfamilies of this family for which the {i^{th}} terms {P_\lambda + i r_\lambda} of the progression of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets {E_v} in Proposition 5 are not allowed to depend on {i}, one also needs the progressions {P_\lambda + i' r_\lambda} for any given {0 \leq i' < i} to be “similar” in the sense that they intersect {A} in the same fashion (thus the sets {A \cap (P_\lambda + i' r_\lambda)} as {\lambda} varies need to be translates of each other).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set {\Lambda} in order to gain good mixing properties for at least one of the sets {P_\lambda +i r_\lambda} in the progression of progressions.

Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank {d} than the rank {2} progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length {k} progressions, one has to consider generalised progressions of rank as high as {2^k+1}. It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by {A} in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate inside a suitable large generalised progression of some rank {d}, a family of large generalised progressions of rank {d-1} which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of {i}. (But getting this sort of property for all values of {i} simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index {i}, one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential factor in the rank {2^k+1} that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)

This is an addendum to last quarter’s course notes on Hilbert’s fifth problem, which I am in the process of reviewing in order to transcribe them into a book (as was done similarly for several other sets of lecture notes on this blog). When reviewing the zeroth set of notes in particular, I found that I had made a claim (Proposition 11 from those notes) which asserted, roughly speaking, that any sufficiently large nilprogression was an approximate group, and promised to prove it later in the course when we had developed the ability to calculate efficiently in nilpotent groups. As it turned out, I managed finish the course without the need to develop these calculations, and so the proposition remained unproven. In order to rectify this, I will use this post to lay out some of the basic algebra of nilpotent groups, and use it to prove the above proposition, which turns out to be a bit tricky. (In my paper with Breuillard and Green, we avoid the need for this proposition by restricting attention to a special type of nilprogression, which we call a nilprogression in {C}-normal form, for which the computations are simpler.)

There are several ways to think about nilpotent groups; for instance one can use the model example of the Heisenberg group

\displaystyle  H(R) :=\begin{pmatrix} 1 & R & R \\ 0 & 1 & R\\ 0 & 0 & 1 \end{pmatrix}

over an arbitrary ring {R} (which need not be commutative), or more generally any matrix group consisting of unipotent upper triangular matrices, and view a general nilpotent group as being an abstract generalisation of such concrete groups. (In the case of nilpotent Lie groups, at least, this is quite an accurate intuition, thanks to Engel’s theorem.) Or, one can adopt a Lie-theoretic viewpoint and try to think of nilpotent groups as somehow arising from nilpotent Lie algebras; this intuition is rigorous when working with nilpotent Lie groups (at least when the characteristic is large, in order to avoid issues coming from the denominators in the Baker-Campbell-Hausdorff formula), but also retains some conceptual value in the non-Lie setting. In particular, nilpotent groups (particularly finitely generated ones) can be viewed in some sense as “nilpotent Lie groups over {{\bf Z}}“, even though Lie theory does not quite work perfectly when the underlying scalars merely form an integral domain instead of a field.

Another point of view, which arises naturally both in analysis and in algebraic geometry, is to view nilpotent groups as modeling “infinitesimal” perturbations of the identity, where the infinitesimals have a certain finite order. For instance, given a (not necessarily commutative) ring {R} without identity (representing all the “small” elements of some larger ring or algebra), we can form the powers {R^j} for {j=1,2,\ldots}, defined as the ring generated by {j}-fold products {r_1 \ldots r_j} of elements {r_1,\ldots,r_j} in {R}; this is an ideal of {R} which represents the elements which are “{j^{th}} order” in some sense. If one then formally adjoins an identity {1} onto the ring {R}, then for any {s \geq 1}, the multiplicative group {G := 1+R \hbox{ mod } R^{s+1}} is a nilpotent group of step at most {s}. For instance, if {R} is the ring of strictly upper {s \times s} matrices (over some base ring), then {R^{s+1}} vanishes and {G} becomes the group of unipotent upper triangular matrices over the same ring, thus recovering the previous matrix-based example. In analysis applications, {R} might be a ring of operators which are somehow of “order” {O(\epsilon)} or {O(\hbar)} for some small parameter {\epsilon} or {\hbar}, and one wishes to perform Taylor expansions up to order {O(\epsilon^s)} or {O(\hbar^s)}, thus discarding (i.e. quotienting out) all errors in {R^{s+1}}.

From a dynamical or group-theoretic perspective, one can also view nilpotent groups as towers of central extensions of a trivial group. Finitely generated nilpotent groups can also be profitably viewed as a special type of polycylic group; this is the perspective taken in this previous blog post. Last, but not least, one can view nilpotent groups from a combinatorial group theory perspective, as being words from some set of generators of various “degrees” subject to some commutation relations, with commutators of two low-degree generators being expressed in terms of higher degree objects, and all commutators of a sufficiently high degree vanishing. In particular, generators of a given degree can be moved freely around a word, as long as one is willing to generate commutator errors of higher degree.

With this last perspective, in particular, one can start computing in nilpotent groups by adopting the philosophy that the lowest order terms should be attended to first, without much initial concern for the higher order errors generated in the process of organising the lower order terms. Only after the lower order terms are in place should attention then turn to higher order terms, working successively up the hierarchy of degrees until all terms are dealt with. This turns out to be a relatively straightforward philosophy to implement in many cases (particularly if one is not interested in explicit expressions and constants, being content instead with qualitative expansions of controlled complexity), but the arguments are necessarily recursive in nature and as such can become a bit messy, and require a fair amount of notation to express precisely. So, unfortunately, the arguments here will be somewhat cumbersome and notation-heavy, even if the underlying methods of proof are relatively simple.

Read the rest of this entry »

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.

RSS Buzz feed

Follow

Get every new post delivered to your Inbox.

Join 1,277 other followers