You are currently browsing the monthly archive for May 2012.
One of the most basic methods in additive number theory is the Hardy-Littlewood circle method. This method is based on expressing a quantity of interest to additive number theory, such as the number of representations of an integer as the sum of three primes , as a Fourier-analytic integral over the unit circle involving exponential sums such as
The strategy is then to obtain sufficiently accurate bounds on exponential sums such as in order to obtain non-trivial bounds on quantities such as . For instance, if one can show that for all odd integers greater than some given threshold , this implies that all odd integers greater than are expressible as the sum of three primes, thus establishing all but finitely many instances of the odd Goldbach conjecture.
Remark 1 In practice, it can be more efficient to work with smoother sums than the partial sum (1), for instance by replacing the cutoff with a smoother cutoff for a suitable choice of cutoff function , or by replacing the restriction of the summation to primes by a more analytically tractable weight, such as the von Mangoldt function . However, these improvements to the circle method are primarily technical in nature and do not have much impact on the heuristic discussion in this post, so we will not emphasise them here. One can also certainly use the circle method to study additive combinations of numbers from other sets than the set of primes, but we will restrict attention to additive combinations of primes for sake of discussion, as it is historically one of the most studied sets in additive number theory.
In many cases, it turns out that one can get fairly precise evaluations on sums such as in the major arc case, when is close to a rational number with small denominator , by using tools such as the prime number theorem in arithmetic progressions. For instance, the prime number theorem itself tells us that
and the prime number theorem in residue classes modulo suggests more generally that
when is small and is close to , basically thanks to the elementary calculation that the phase has an average value of when is uniformly distributed amongst the residue classes modulo that are coprime to . Quantifying the precise error in these approximations can be quite challenging, though, unless one assumes powerful hypotheses such as the Generalised Riemann Hypothesis.
In the minor arc case when is not close to a rational with small denominator, one no longer expects to have such precise control on the value of , due to the “pseudorandom” fluctuations of the quantity . Using the standard probabilistic heuristic (supported by results such as the central limit theorem or Chernoff’s inequality) that the sum of “pseudorandom” phases should fluctuate randomly and be of typical magnitude , one expects upper bounds of the shape
which is consistent with (though weaker than) the above heuristic. In practice, though, we are unable to rigorously establish bounds anywhere near as strong as (3); upper bounds such as are far more typical.
Because one only expects to have upper bounds on , rather than asymptotics, in the minor arc case, one cannot realistically hope to make much use of phases such as for the minor arc contribution to integrals such as (2) (at least if one is working with a single, deterministic, value of , so that averaging in is unavailable). In particular, from upper bound information alone, it is difficult to avoid the “conspiracy” that the magnitude oscillates in sympathetic resonance with the phase , thus essentially eliminating almost all of the possible gain in the bounds that could arise from exploiting cancellation from that phase. Thus, one basically has little option except to use the triangle inequality to control the portion of the integral on the minor arc region :
Despite this handicap, though, it is still possible to get enough bounds on both the major and minor arc contributions of integrals such as (2) to obtain non-trivial lower bounds on quantities such as , at least when is large. In particular, this sort of method can be developed to give a proof of Vinogradov’s famous theorem that every sufficiently large odd integer is the sum of three primes; my own result that all odd numbers greater than can be expressed as the sum of at most five primes is also proven by essentially the same method (modulo a number of minor refinements, and taking advantage of some numerical work on both the Goldbach problems and on the Riemann hypothesis ). It is certainly conceivable that some further variant of the circle method (again combined with a suitable amount of numerical work, such as that of numerically establishing zero-free regions for the Generalised Riemann Hypothesis) can be used to settle the full odd Goldbach conjecture; indeed, under the assumption of the Generalised Riemann Hypothesis, this was already achieved by Deshouillers, Effinger, te Riele, and Zinoviev back in 1997. I am optimistic that an unconditional version of this result will be possible within a few years or so, though I should say that there are still significant technical challenges to doing so, and some clever new ideas will probably be needed to get either the Vinogradov-style argument or numerical verification to work unconditionally for the three-primes problem at medium-sized ranges of , such as . (But the intermediate problem of representing all even natural numbers as the sum of at most four primes looks somewhat closer to being feasible, though even this would require some substantially new and non-trivial ideas beyond what is in my five-primes paper.)
However, I (and many other analytic number theorists) are considerably more skeptical that the circle method can be applied to the even Goldbach problem of representing a large even number as the sum of two primes, or the similar (and marginally simpler) twin prime conjecture of finding infinitely many pairs of twin primes, i.e. finding infinitely many representations of as the difference of two primes. At first glance, the situation looks tantalisingly similar to that of the Vinogradov theorem: to settle the even Goldbach problem for large , one has to find a non-trivial lower bound for the quantity
for sufficiently large , as this quantity is also the number of ways to represent as the sum of two primes . Similarly, to settle the twin prime problem, it would suffice to obtain a lower bound for the quantity
that goes to infinity as , as this quantity is also the number of ways to represent as the difference of two primes less than or equal to .
In principle, one can achieve either of these two objectives by a sufficiently fine level of control on the exponential sums . Indeed, there is a trivial (and uninteresting) way to take any (hypothetical) solution of either the asymptotic even Goldbach problem or the twin prime problem and (artificially) convert it to a proof that “uses the circle method”; one simply begins with the quantity or , expresses it in terms of using (5) or (6), and then uses (5) or (6) again to convert these integrals back into a the combinatorial expression of counting solutions to or , and then uses the hypothetical solution to the given problem to obtain the required lower bounds on or .
Of course, this would not qualify as a genuine application of the circle method by any reasonable measure. One can then ask the more refined question of whether one could hope to get non-trivial lower bounds on or (or similar quantities) purely from the upper and lower bounds on or similar quantities (and of various type norms on such quantities, such as the bound (4)). Of course, we do not yet know what the strongest possible upper and lower bounds in are yet (otherwise we would already have made progress on major conjectures such as the Riemann hypothesis); but we can make plausible heuristic conjectures on such bounds. And this is enough to make the following heuristic conclusions:
- (i) For “binary” problems such as computing (5), (6), the contribution of the minor arcs potentially dominates that of the major arcs (if all one is given about the minor arc sums is magnitude information), in contrast to “ternary” problems such as computing (2), in which it is the major arc contribution which is absolutely dominant.
- (ii) Upper and lower bounds on the magnitude of are not sufficient, by themselves, to obtain non-trivial bounds on (5), (6) unless these bounds are extremely tight (within a relative error of or better); but
- (iii) obtaining such tight bounds is a problem of comparable difficulty to the original binary problems.
I will provide some justification for these conclusions below the fold; they are reasonably well known “folklore” to many researchers in the field, but it seems that they are rarely made explicit in the literature (in part because these arguments are, by their nature, heuristic instead of rigorous) and I have been asked about them from time to time, so I decided to try to write them down here.
In view of the above conclusions, it seems that the best one can hope to do by using the circle method for the twin prime or even Goldbach problems is to reformulate such problems into a statement of roughly comparable difficulty to the original problem, even if one assumes powerful conjectures such as the Generalised Riemann Hypothesis (which lets one make very precise control on major arc exponential sums, but not on minor arc ones). These are not rigorous conclusions – after all, we have already seen that one can always artifically insert the circle method into any viable approach on these problems – but they do strongly suggest that one needs a method other than the circle method in order to fully solve either of these two problems. I do not know what such a method would be, though I can give some heuristic objections to some of the other popular methods used in additive number theory (such as sieve methods, or more recently the use of inverse theorems); this will be done at the end of this post.
This is a sequel to my previous blog post “Cayley graphs and the geometry of groups“. In that post, the concept of a Cayley graph of a group was used to place some geometry on that group . In this post, we explore a variant of that theme, in which (fragments of) a Cayley graph on is used to describe the basic algebraic structure of , and in particular on elementary word identities in . Readers who are familiar with either category theory or group homology/cohomology will recognise these concepts lurking not far beneath the surface; we wil remark briefly on these connections later in this post. However, no knowledge of categories or cohomology is needed for the main discussion, which is primarily focused on elementary group theory.
Throughout this post, we fix a single group , which is allowed to be non-abelian and/or infinite. All our graphs will be directed, with loops and multiple edges permitted.
In the previous post, we drew the entire Cayley graph of a group . Here, we will be working much more locally, and will only draw the portions of the Cayley graph that are relevant to the discussion. In this graph, the vertices are elements of the group , and one draws a directed edge from to labeled (or “coloured”) by the group element for any ; the graph consisting of all such vertices and edges will be denoted . Thus, a typical edge in looks like this:
One usually does not work with the complete Cayley graph . It is customary to instead work with smaller Cayley graphs , in which the edge colours are restricted to a smaller subset of , such as a set of generators for . As we will be working locally, we will in fact work with even smaller fragments of at a time; in particular, we only use a handful of colours (no more than nine, in fact, for any given diagram), and we will not require these colours to generate the entire group (we do not care if the Cayley graph is connected or not, as this is a global property rather than a local one).
Cayley graphs are left-invariant: for any , the left translation map is a graph isomorphism. To emphasise this left invariance, we will usually omit the vertex labels, and leave only the coloured directed edge, like so:
This is analogous to how, in undergraduate mathematics and physics, vectors in Euclidean space are often depicted as arrows of a given magnitude and direction, with the initial and final points of this arrow being of secondary importance only. (Indeed, this depiction of vectors in a vector space can be viewed as an abelian special case of the more general depiction of group elements used in this post.)
Let us define a diagram to be a finite directed graph , with edges coloured by elements of , which has at least one graph homomorphism into the complete Cayley graph of ; thus there exists a map (not necessarily injective) with the property that whenever is a directed edge in coloured by a group element . Informally, a diagram is a finite subgraph of a Cayley graph with the vertex labels omitted, and with distinct vertices permitted to represent the same group element. Thus, for instance, the single directed edge displayed in Figure 2 is a very simple example of a diagram. An even simpler example of a diagram would be a depiction of the identity element:
We will however omit the identity loops in our diagrams in order to reduce clutter.
We make the obvious remark that any directed edge in a diagram can be coloured by at most one group element , since implies . This simple observation provides a way to prove group theoretic identities using diagrams: to show that two group elements are equal, it suffices to show that they connect together (with the same orientation) the same pair of vertices in a diagram.
Remark 1 One can also interpret these diagrams as commutative diagrams in a category in which all the objects are copies of , and the morphisms are right-translation maps. However, we will deviate somewhat from the category theoretic way of thinking here by focusing on the geometric arrangement and shape of these diagrams, rather than on their abstract combinatorial description. In particular, we view the arrows more as distorted analogues of vector arrows, than as the abstract arrows appearing in category theory.
Just as vector addition can be expressed via concatenation of arrows, group multiplication can be described by concatenation of directed edges. Indeed, for any , the vertices can be connected by the following triangular diagram:
In a similar spirit, inversion is described by the following diagram:
We make the pedantic remark though that we do not consider a edge to be the reversal of the edge, but rather as a distinct edge that just happens to have the same initial and final endpoints as the reversal of the edge. (This will be of minor importance later, when we start integrating “-forms” on such edges.)
A fundamental operation for us will be that of gluing two diagrams together.
Lemma 1 ((Labeled) gluing) Let be two diagrams of a given group . Suppose that the intersection of the two diagrams connects all of (i.e. any two elements of are joined by a path in ). Then the union is also a diagram of .
Proof: By hypothesis, we have graph homomorphisms , . If they agree on then one simply glues together the two homomorphisms to create a new graph homomorphism . If they do not agree, one can apply a left translation to either or to make the two diagrams agree on at least one vertex of ; then by the connected nature of we see that they now must agree on all vertices of , and then we can form the glued graph homomorphism as before.
The above lemma required one to specify the label the vertices of (in order to form the intersection and union ). However, if one is presented with two diagrams with unlabeled vertices, one can identify some partial set of vertices of with a partial set of vertices of of matching cardinality. Provided that the subdiagram common to and after this identification connects all of the common vertices together, we may use the above lemma to create a glued diagram .
For instance, if a diagram contains two of the three edges in the triangular diagram in Figure 4, one can “fill in” the triangle by gluing in the third edge:
One can use glued diagrams to demonstrate various basic group-theoretic identities. For instance, by gluing together two copies of the triangular diagram in Figure 4 to create the glued diagram
and then filling in two more triangles, we obtain a tetrahedral diagram that demonstrates the associative law :
Similarly, by gluing together two copies of Figure 4 with three copies of Figure 5 in an appropriate order, we can demonstrate the Abel identity :
In addition to gluing, we will also use the trivial operation of erasing: if is a diagram for a group , then any subgraph of (formed by removing vertices and/or edges) is also a diagram of . This operation is not strictly necessary for our applications, but serves to reduce clutter in the pictures.
If two group elements commute, then we obtain a parallelogram as a diagram, exactly as in the vector space case:
In general, of course, two arbitrary group elements will fail to commute, and so this parallelogram is no longer available. However, various substitutes for this diagram exist. For instance, if we introduce the conjugate of one group element by another, then we have the following slightly distorted parallelogram:
By appropriate gluing and filling, this can be used to demonstrate the homomorphism properties of a conjugation map :
Another way to replace the parallelogram in Figure 10 is to introduce the commutator of two elements, in which case we can perturb the parallelogram into a pentagon:
We will tend to depict commutator edges as being somewhat shorter than the edges generating that commutator, reflecting a “perturbative” or “nilpotent” philosophy. (Of course, to fully reflect a nilpotent perspective, one should orient commutator edges in a different dimension from their generating edges, but of course the diagrams drawn here do not have enough dimensions to display this perspective easily.) We will also be adopting a “Lie” perspective of interpreting groups as behaving like perturbations of vector spaces, in particular by trying to draw all edges of the same colour as being approximately (though not perfectly) parallel to each other (and with approximately the same length).
Gluing the above pentagon with the conjugation parallelogram and erasing some edges, we discover a “commutator-conjugate” triangle, describing the basic identity :
Other gluings can also give the basic relations between commutators and conjugates. For instance, by gluing the pentagon in Figure 14 with its reflection, we see that . The following diagram, obtained by gluing together copies of Figures 11 and 15, demonstrates that ,
while this figure demonstrates that :
Now we turn to a more sophisticated identity, the Hall-Witt identity
which is the fully noncommutative version of the more well-known Jacobi identity for Lie algebras.
The full diagram for the Hall-Witt identity resembles a slightly truncated parallelopiped. Drawing this truncated paralleopiped in full would result in a rather complicated looking diagram, so I will instead display three components of this diagram separately, and leave it to the reader to mentally glue these three components back to form the full parallelopiped. The first component of the diagram is formed by gluing together three pentagons from Figure 14, and looks like this:
This should be thought of as the “back” of the truncated parallelopiped needed to establish the Hall-Witt identity.
While it is not needed for proving the Hall-Witt identity, we also observe for future reference that we may also glue in some distorted parallelograms and obtain a slightly more complicated diagram:
To form the second component, let us now erase all interior components of Figure 18 or Figure 19:
Then we fill in three distorted parallelograms:
This is the second component, and is the “front” of the truncated praallelopiped, minus the portions exposed by the truncation.
Finally, we turn to the third component. We begin by erasing the outer edges from the second component in Figure 21:
We glue in three copies of the commutator-conjugate triangle from Figure 15:
But now we observe that we can fill in three pentagons, and obtain a small triangle with edges :
Erasing everything except this triangle gives the Hall-Witt identity. Alternatively, one can glue together Figures 18, 21, and 24 to obtain a truncated parallelopiped which one can view as a geometric representation of the proof of the Hall-Witt identity.
Among other things, I found these diagrams to be useful to visualise group cohomology; I give a simple example of this below, developing an analogue of the Hall-Witt identity for -cocycles.
Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity for a vector space over a finite field of characteristic greater than , where is defined as the cardinality of the largest subset of that does not contain an arithmetic progression of length . In our earlier paper, we gave two arguments that bounded in the regime when the field was fixed and was large. The first “cheap” argument gave the bound
for some constant depending only on .
Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset of that has no arithmetic progressions of length , and seeks to locate a subspace on which has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse theorem of Ben and myself (and also independently by Samorodnitsky), one approximates by a “quadratically structured” function , which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because has no progressions of length , the count of progressions of length weighted by will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which has increased density.
The error in the paper was to conclude from this that the original function also had increased density on the same subspace; it turns out that the manner in which approximates is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on one gets at the end of the day to be worse than the cheap argument.)
After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of for the exponent in (1). In fact, it gives the following stronger result:
Theorem 1 Let be a subset of of density at least , and let . Then there is a subspace of of codimension such that the number of (possibly degenerate) progressions in is at least .
The bound (1) is an easy consequence of this theorem after choosing and removing the degenerate progressions from the conclusion of the theorem.
The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to with a significantly stronger local approximation to on a subspace . This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace on which is quite dense (of density ) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.