One of the most basic methods in additive number theory is the Hardy-Littlewood circle method. This method is based on expressing a quantity of interest to additive number theory, such as the number of representations $f(x)$ of an integer $x$ as the sum of three primes $x = p_1 + p_2 + p_3$, as a Fourier-analytic integral over the unit circle $\mathbb{R}/\mathbb{Z}$ involving exponential sums such as

$\displaystyle S(x,\alpha) := \sum_{p \leq x} e(\alpha p) \ \ \ \ \ (1)$

where the sum here ranges over all primes up to $x$, and $e(\theta) := e^{2\pi i \theta}$. For instance, the expression $f(x)$ mentioned earlier can be written as

$\displaystyle f(x) = \int_{\mathbb{R}/\mathbb{Z}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha. \ \ \ \ \ (2)$

The strategy is then to obtain sufficiently accurate bounds on exponential sums such as $S(x,\alpha)$ in order to obtain non-trivial bounds on quantities such as $f(x)$. For instance, if one can show that $f(x) > 0$ for all odd integers $x$ greater than some given threshold $x_0$, this implies that all odd integers greater than $x_0$ are expressible as the sum of three primes, thus establishing all but finitely many instances of the odd Goldbach conjecture.
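To make the Fourier identity (2) concrete, here is a minimal numerical sketch (an illustrative toy computation, not taken from the post; the helper names and the grid size M are ad hoc choices). Since $S(x,\alpha)$ is a trigonometric polynomial of degree at most $x$, the integral (2) can be evaluated exactly by averaging over sufficiently many equally spaced frequencies, and the result matches a direct count of representations:

```python
import cmath

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [q for q in range(2, n + 1) if sieve[q]]

def e(t):
    """The additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

N = 101                       # a small odd integer to represent
P = primes_up_to(N)
Pset = set(P)

def S(alpha):
    """S(N, alpha) = sum over primes p <= N of e(alpha * p)."""
    return sum(e(alpha * p) for p in P)

# Evaluate the integral (2) exactly as a finite Fourier average: all frequencies
# p1 + p2 + p3 - N lie strictly between -M and M, so the average over the grid
# alpha = k/M, k = 0, ..., M-1 isolates the representations of N exactly.
M = 3 * N + 1
fourier_count = sum(S(k / M) ** 3 * e(-N * k / M) for k in range(M)) / M

# Direct count of ordered triples (p1, p2, p3) of primes with p1 + p2 + p3 = N.
direct_count = sum(1 for p1 in P for p2 in P if N - p1 - p2 in Pset)

print(round(fourier_count.real), direct_count)   # the two counts agree
```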
Remark 1 In practice, it can be more efficient to work with smoother sums than the partial sum (1), for instance by replacing the cutoff $1_{p \leq x}$ with a smoother cutoff $\eta(p/x)$ for a suitable choice of cutoff function $\eta$, or by replacing the restriction of the summation to primes by a more analytically tractable weight, such as the von Mangoldt function $\Lambda(n)$. However, these improvements to the circle method are primarily technical in nature and do not have much impact on the heuristic discussion in this post, so we will not emphasise them here. One can also certainly use the circle method to study additive combinations of numbers from other sets than the set of primes, but we will restrict attention to additive combinations of primes for sake of discussion, as it is historically one of the most studied sets in additive number theory.
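For concreteness, the von Mangoldt-weighted variant alluded to in this remark would replace (1) by a sum of the shape (this display is a standard normalisation given here for illustration, not a formula taken from the post)

$\displaystyle \sum_{n \leq x} \Lambda(n) e(\alpha n), \qquad \Lambda(n) := \begin{cases} \log p & \text{if } n = p^j \text{ for some prime } p \text{ and some } j \geq 1 \\ 0 & \text{otherwise,} \end{cases}$

possibly with the sharp cutoff $1_{n \leq x}$ further replaced by a smooth weight $\eta(n/x)$; the logarithmic weight at the primes interacts more cleanly with partial summation and with the explicit formulae of $L$-function theory.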
In many cases, it turns out that one can get fairly precise evaluations on sums such as (1) in the major arc case, when $\alpha$ is close to a rational number $\frac{a}{q}$ with small denominator $q$, by using tools such as the prime number theorem in arithmetic progressions. For instance, the prime number theorem itself tells us that

$\displaystyle S(x,0) = \pi(x) = (1+o(1)) \frac{x}{\log x},$

and the prime number theorem in residue classes modulo $q$ suggests more generally that

$\displaystyle S(x,\alpha) \approx \frac{\mu(q)}{\phi(q)} \frac{x}{\log x}$

when $q$ is small and $\alpha$ is close to $\frac{a}{q}$, basically thanks to the elementary calculation that the phase $e(an/q)$ has an average value of $\frac{\mu(q)}{\phi(q)}$ when $n$ is uniformly distributed amongst the residue classes modulo $q$ that are coprime to $q$. Quantifying the precise error in these approximations can be quite challenging, though, unless one assumes powerful hypotheses such as the Generalised Riemann Hypothesis.
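As a sanity check of this major arc prediction, one can compare $S(x, a/q)$ numerically against $\frac{\mu(q)}{\phi(q)} \pi(x)$ for a few small denominators (a rough illustration under the notational conventions above, not code from the post; the helper functions are deliberately naive so that the snippet is self-contained):

```python
import cmath
from math import gcd

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [q for q in range(2, n + 1) if sieve[q]]

def mobius(q):
    """Naive Moebius function via trial division."""
    m, result, p = q, 1, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0          # square factor present
            result = -result
        p += 1
    return -result if m > 1 else result

def phi(q):
    """Naive Euler totient."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)

x = 10 ** 5
P = primes_up_to(x)

def S(alpha):
    """S(x, alpha) = sum over primes p <= x of e(alpha * p)."""
    return sum(cmath.exp(2j * cmath.pi * alpha * p) for p in P)

for a, q in [(1, 3), (1, 4), (2, 5), (1, 6)]:
    prediction = mobius(q) / phi(q) * len(P)      # (mu(q)/phi(q)) * pi(x)
    print(f"alpha = {a}/{q}:  S = {S(a / q).real:9.1f},  prediction = {prediction:9.1f}")
```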
In the minor arc case when $\alpha$ is not close to a rational $\frac{a}{q}$ with small denominator, one no longer expects to have such precise control on the value of $S(x,\alpha)$, due to the “pseudorandom” fluctuations of the quantity $e(\alpha p)$. Using the standard probabilistic heuristic (supported by results such as the central limit theorem or Chernoff’s inequality) that the sum of $\sim \frac{x}{\log x}$ “pseudorandom” phases should fluctuate randomly and be of typical magnitude $\sim \sqrt{\frac{x}{\log x}}$, one expects upper bounds of the shape

$\displaystyle |S(x,\alpha)| \lesssim \sqrt{x} \ \ \ \ \ (3)$

(ignoring factors of $\log x$) for “typical” minor arc $\alpha$. Indeed, a simple application of the Plancherel identity, followed by the prime number theorem, reveals that

$\displaystyle \int_{\mathbb{R}/\mathbb{Z}} |S(x,\alpha)|^2\ d\alpha = \pi(x) = (1+o(1)) \frac{x}{\log x}, \ \ \ \ \ (4)$

which is consistent with (though weaker than) the above heuristic. In practice, though, we are unable to rigorously establish bounds anywhere near as strong as (3); upper bounds that only gain a few powers of $\log x$ over the trivial bound $|S(x,\alpha)| \leq \pi(x)$ are far more typical.
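The $L^2$ identity (4) and the square-root heuristic (3) can also be checked numerically on a small scale (again an ad hoc illustration rather than anything from the post; “generic” frequencies are simulated here by a few badly approximable numbers):

```python
import cmath, math

x = 2000
sieve = [True] * (x + 1); sieve[0] = sieve[1] = False
for p in range(2, int(x ** 0.5) + 1):
    if sieve[p]:
        sieve[p * p::p] = [False] * len(sieve[p * p::p])
P = [q for q in range(2, x + 1) if sieve[q]]

def S(alpha):
    return sum(cmath.exp(2j * cmath.pi * alpha * p) for p in P)

# Plancherel (4): averaging |S|^2 over a grid finer than the degree of the
# trigonometric polynomial S recovers pi(x) exactly (up to rounding error).
M = 2 * x + 1
l2_mass = sum(abs(S(k / M)) ** 2 for k in range(M)) / M
print(f"L^2 mass = {l2_mass:.1f},  pi(x) = {len(P)}")

# Typical magnitudes: at alpha = 0 the sum has size pi(x), while at a few
# badly approximable frequencies it is much smaller, broadly consistent with
# the square-root heuristic (3).
print(f"|S(0)| = {abs(S(0)):.1f},  sqrt(pi(x)) = {math.sqrt(len(P)):.1f}")
for alpha in ((math.sqrt(5) - 1) / 2, math.sqrt(2) - 1, math.sqrt(3) - 1):
    print(f"|S({alpha:.4f})| = {abs(S(alpha)):.1f}")
```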
Because one only expects to have upper bounds on $|S(x,\alpha)|$, rather than asymptotics, in the minor arc case, one cannot realistically hope to make much use of phases such as $e(-x\alpha)$ for the minor arc contribution to integrals such as (2) (at least if one is working with a single, deterministic, value of $x$, so that averaging in $x$ is unavailable). In particular, from upper bound information alone, it is difficult to avoid the “conspiracy” that the magnitude $|S(x,\alpha)|^3$ oscillates in sympathetic resonance with the phase $e(-x\alpha)$, thus essentially eliminating almost all of the possible gain in the bounds that could arise from exploiting cancellation from that phase. Thus, one basically has little option except to use the triangle inequality to control the portion of the integral on the minor arc region $\Omega_{minor}$:

$\displaystyle \left| \int_{\Omega_{minor}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha \right| \leq \int_{\Omega_{minor}} |S(x,\alpha)|^3\ d\alpha.$
Despite this handicap, though, it is still possible to get enough bounds on both the major and minor arc contributions of integrals such as (2) to obtain non-trivial lower bounds on quantities such as $f(x)$, at least when $x$ is large. In particular, this sort of method can be developed to give a proof of Vinogradov’s famous theorem that every sufficiently large odd integer $x$ is the sum of three primes; my own result that all odd numbers greater than $1$ can be expressed as the sum of at most five primes is also proven by essentially the same method (modulo a number of minor refinements, and taking advantage of some numerical work on both the Goldbach problems and on the Riemann hypothesis). It is certainly conceivable that some further variant of the circle method (again combined with a suitable amount of numerical work, such as that of numerically establishing zero-free regions for the Generalised Riemann Hypothesis) can be used to settle the full odd Goldbach conjecture; indeed, under the assumption of the Generalised Riemann Hypothesis, this was already achieved by Deshouillers, Effinger, te Riele, and Zinoviev back in 1997. I am optimistic that an unconditional version of this result will be possible within a few years or so, though I should say that there are still significant technical challenges to doing so, and some clever new ideas will probably be needed to get either the Vinogradov-style argument or the numerical verification to work unconditionally for the three-primes problem at medium-sized ranges of $x$. (But the intermediate problem of representing all even natural numbers as the sum of at most four primes looks somewhat closer to being feasible, though even this would require some substantially new and non-trivial ideas beyond what is in my five-primes paper.)
However, I (and many other analytic number theorists) am considerably more skeptical that the circle method can be applied to the even Goldbach problem of representing a large even number $x$ as the sum $x = p_1 + p_2$ of two primes, or to the similar (and marginally simpler) twin prime conjecture of finding infinitely many pairs of twin primes, i.e. finding infinitely many representations $2 = p_1 - p_2$ of $2$ as the difference of two primes. At first glance, the situation looks tantalisingly similar to that of the Vinogradov theorem: to settle the even Goldbach problem for large $x$, one has to find a non-trivial lower bound for the quantity

$\displaystyle g(x) := \int_{\mathbb{R}/\mathbb{Z}} S(x,\alpha)^2 e(-x\alpha)\ d\alpha \ \ \ \ \ (5)$

for sufficiently large $x$, as this quantity $g(x)$ is also the number of ways to represent $x$ as the sum $x = p_1 + p_2$ of two primes $p_1, p_2$. Similarly, to settle the twin prime problem, it would suffice to obtain a lower bound for the quantity

$\displaystyle h(x) := \int_{\mathbb{R}/\mathbb{Z}} |S(x,\alpha)|^2 e(-2\alpha)\ d\alpha \ \ \ \ \ (6)$

that goes to infinity as $x \to \infty$, as this quantity $h(x)$ is also the number of ways to represent $2$ as the difference $p_1 - p_2$ of two primes $p_1, p_2$ less than or equal to $x$.
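Both integrals can again be evaluated exactly as finite Fourier averages for small $x$, which makes the combinatorial interpretations of (5) and (6) concrete (another ad hoc numerical sketch under the notation above, not code from the post):

```python
import cmath

x = 1000
sieve = [True] * (x + 1); sieve[0] = sieve[1] = False
for p in range(2, int(x ** 0.5) + 1):
    if sieve[p]:
        sieve[p * p::p] = [False] * len(sieve[p * p::p])
P = [q for q in range(2, x + 1) if sieve[q]]
Pset = set(P)

def e(t):
    return cmath.exp(2j * cmath.pi * t)

def S(alpha):
    return sum(e(alpha * p) for p in P)

M = 2 * x + 1                       # grid finer than the degree of S(x, .)
samples = [S(k / M) for k in range(M)]

# (5): number of ordered representations x = p1 + p2 by primes p1, p2 <= x
goldbach_fourier = sum(s * s * e(-x * k / M) for k, s in enumerate(samples)) / M
goldbach_direct = sum(1 for p in P if x - p in Pset)

# (6): number of pairs of primes p1, p2 <= x with p1 - p2 = 2
twin_fourier = sum(abs(s) ** 2 * e(-2 * k / M) for k, s in enumerate(samples)) / M
twin_direct = sum(1 for p in P if p + 2 in Pset)

print(round(goldbach_fourier.real), goldbach_direct)   # the counts agree
print(round(twin_fourier.real), twin_direct)
```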
In principle, one can achieve either of these two objectives by a sufficiently fine level of control on the exponential sums $S(x,\alpha)$. Indeed, there is a trivial (and uninteresting) way to take any (hypothetical) solution of either the asymptotic even Goldbach problem or the twin prime problem and (artificially) convert it to a proof that “uses the circle method”; one simply begins with the quantity $g(x)$ or $h(x)$, expresses it in terms of $S(x,\alpha)$ using (5) or (6), and then uses (5) or (6) again to convert these integrals back into the combinatorial expression of counting solutions to $x = p_1 + p_2$ or $2 = p_1 - p_2$, and then uses the hypothetical solution to the given problem to obtain the required lower bounds on $g(x)$ or $h(x)$.
Of course, this would not qualify as a genuine application of the circle method by any reasonable measure. One can then ask the more refined question of whether one could hope to get non-trivial lower bounds on (5) or (6) (or similar quantities) purely from upper and lower bounds on the magnitude $|S(x,\alpha)|$ or similar quantities (and on various $L^p$ type norms of such quantities, such as the $L^2$ bound (4)). Of course, we do not yet know what the strongest possible upper and lower bounds on $|S(x,\alpha)|$ are (otherwise we would already have made progress on major conjectures such as the Riemann hypothesis); but we can make plausible heuristic conjectures on such bounds. And this is enough to make the following heuristic conclusions:
- (i) For “binary” problems such as computing (5), (6), the contribution of the minor arcs potentially dominates that of the major arcs (if all one is given about the minor arc sums is magnitude information), in contrast to “ternary” problems such as computing (2), in which it is the major arc contribution which is absolutely dominant.
- (ii) Upper and lower bounds on the magnitude of $S(x,\alpha)$ are not sufficient, by themselves, to obtain non-trivial bounds on (5), (6) unless these bounds are extremely tight (within a relative error of $O(1/\log x)$ or better; see the rough count after this list); but
- (iii) obtaining such tight bounds is a problem of comparable difficulty to the original binary problems.
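To give a rough sense of the numerology behind (i) and (ii), here is a back-of-the-envelope count (a heuristic sketch in the spirit of the post, using the notation set up above, rather than a rigorous argument). For the binary integral (5), the major arcs are expected to contribute a main term of size roughly $\frac{x}{\log^2 x}$ (up to a bounded “singular series” factor), while the best bound on the minor arc contribution that uses only magnitude information is the triangle inequality combined with the $L^2$ bound (4):

$\displaystyle \left| \int_{\Omega_{minor}} S(x,\alpha)^2 e(-x\alpha)\ d\alpha \right| \leq \int_{\mathbb{R}/\mathbb{Z}} |S(x,\alpha)|^2\ d\alpha \sim \frac{x}{\log x},$

which exceeds the expected main term by a factor of about $\log x$; this is the origin of the “relative error of $O(1/\log x)$” requirement in (ii). By contrast, for the ternary integral (2) the expected major arc main term has size roughly $\frac{x^2}{\log^3 x}$, while the minor arc contribution can be bounded by

$\displaystyle \sup_{\alpha \in \Omega_{minor}} |S(x,\alpha)| \cdot \int_{\mathbb{R}/\mathbb{Z}} |S(x,\alpha)|^2\ d\alpha,$

so that any minor arc bound of the shape $|S(x,\alpha)| \ll x / \log^A x$ with $A > 2$ already makes the minor arcs negligible.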
I will provide some justification for these conclusions below the fold; they are reasonably well known “folklore” to many researchers in the field, but it seems that they are rarely made explicit in the literature (in part because these arguments are, by their nature, heuristic instead of rigorous) and I have been asked about them from time to time, so I decided to try to write them down here.
In view of the above conclusions, it seems that the best one can hope to do by using the circle method for the twin prime or even Goldbach problems is to reformulate such problems into a statement of roughly comparable difficulty to the original problem, even if one assumes powerful conjectures such as the Generalised Riemann Hypothesis (which gives very precise control on major arc exponential sums, but not on minor arc ones). These are not rigorous conclusions – after all, we have already seen that one can always artificially insert the circle method into any viable approach on these problems – but they do strongly suggest that one needs a method other than the circle method in order to fully solve either of these two problems. I do not know what such a method would be, though I can give some heuristic objections to some of the other popular methods used in additive number theory (such as sieve methods, or more recently the use of inverse theorems); this will be done at the end of this post.
This is a sequel to my previous blog post “Cayley graphs and the geometry of groups“. In that post, the concept of a Cayley graph of a group $G$ was used to place some geometry on that group $G$. In this post, we explore a variant of that theme, in which (fragments of) a Cayley graph on $G$ are used to describe the basic algebraic structure of $G$, and in particular to describe elementary word identities in $G$. Readers who are familiar with either category theory or group homology/cohomology will recognise these concepts lurking not far beneath the surface; we will remark briefly on these connections later in this post. However, no knowledge of categories or cohomology is needed for the main discussion, which is primarily focused on elementary group theory.
Throughout this post, we fix a single group $G$, which is allowed to be non-abelian and/or infinite. All our graphs will be directed, with loops and multiple edges permitted.
In the previous post, we drew the entire Cayley graph of a group $G$. Here, we will be working much more locally, and will only draw the portions of the Cayley graph that are relevant to the discussion. In this graph, the vertices are elements $x$ of the group $G$, and one draws a directed edge from $x$ to $xg$ labeled (or “coloured”) by the group element $g$ for any $x, g \in G$; the graph consisting of all such vertices and edges will be denoted $\mathrm{Cay}(G,G)$. Thus, a typical edge in $\mathrm{Cay}(G,G)$ looks like this:
Figure 1.
One usually does not work with the complete Cayley graph $\mathrm{Cay}(G,G)$. It is customary to instead work with smaller Cayley graphs $\mathrm{Cay}(G,S)$, in which the edge colours $g$ are restricted to a smaller subset $S$ of $G$, such as a set of generators for $G$. As we will be working locally, we will in fact work with even smaller fragments of $\mathrm{Cay}(G,G)$ at a time; in particular, we only use a handful of colours (no more than nine, in fact, for any given diagram), and we will not require these colours to generate the entire group (we do not care if the Cayley graph is connected or not, as this is a global property rather than a local one).
Cayley graphs are left-invariant: for any $h \in G$, the left translation map $x \mapsto hx$ is a graph isomorphism. To emphasise this left invariance, we will usually omit the vertex labels, and leave only the coloured directed edge, like so:
Figure 2.
This is analogous to how, in undergraduate mathematics and physics, vectors in Euclidean space are often depicted as arrows of a given magnitude and direction, with the initial and final points of this arrow being of secondary importance only. (Indeed, this depiction of vectors in a vector space can be viewed as an abelian special case of the more general depiction of group elements used in this post.)
Let us define a diagram to be a finite directed graph $D = (V,E)$, with edges coloured by elements of $G$, which has at least one graph homomorphism into the complete Cayley graph $\mathrm{Cay}(G,G)$ of $G$; thus there exists a map $\phi: V \to G$ (not necessarily injective) with the property that $\phi(w) = \phi(v) g$ whenever $(v,w)$ is a directed edge in $D$ coloured by a group element $g \in G$. Informally, a diagram is a finite subgraph of a Cayley graph with the vertex labels omitted, and with distinct vertices permitted to represent the same group element. Thus, for instance, the single directed edge displayed in Figure 2 is a very simple example of a diagram. An even simpler example of a diagram would be a depiction of the identity element $1$ (a single vertex with a loop coloured by $1$):
Figure 3.
We will however omit the identity loops in our diagrams in order to reduce clutter.
We make the obvious remark that any directed edge in a diagram can be coloured by at most one group element $g$, since $xg = xh$ implies $g = h$. This simple observation provides a way to prove group theoretic identities using diagrams: to show that two group elements $g, h$ are equal, it suffices to show that they connect together (with the same orientation) the same pair of vertices in a diagram.
Remark 1 One can also interpret these diagrams as commutative diagrams in a category in which all the objects are copies of $G$, and the morphisms are right-translation maps. However, we will deviate somewhat from the category theoretic way of thinking here by focusing on the geometric arrangement and shape of these diagrams, rather than on their abstract combinatorial description. In particular, we view the arrows more as distorted analogues of vector arrows, than as the abstract arrows appearing in category theory.
Just as vector addition can be expressed via concatenation of arrows, group multiplication can be described by concatenation of directed edges. Indeed, for any $x, g, h \in G$, the vertices $x$, $xg$, $xgh$ can be connected by the following triangular diagram:
Figure 4.
In a similar spirit, inversion is described by the following diagram:
Figure 5.
We make the pedantic remark though that we do not consider a $g^{-1}$ edge to be the reversal of the $g$ edge, but rather as a distinct edge that just happens to have the same initial and final endpoints as the reversal of the $g$ edge. (This will be of minor importance later, when we start integrating “$1$-forms” on such edges.)
A fundamental operation for us will be that of gluing two diagrams together.
Lemma 1 ((Labeled) gluing) Let $D_1, D_2$ be two diagrams of a given group $G$. Suppose that the intersection $D_1 \cap D_2$ of the two diagrams connects all of its vertices together (i.e. any two vertices of $D_1 \cap D_2$ are joined by a path in $D_1 \cap D_2$). Then the union $D_1 \cup D_2$ is also a diagram of $G$.
Proof: By hypothesis, we have graph homomorphisms $\phi_1: D_1 \to \mathrm{Cay}(G,G)$ and $\phi_2: D_2 \to \mathrm{Cay}(G,G)$. If they agree on $D_1 \cap D_2$ then one simply glues together the two homomorphisms to create a new graph homomorphism $\phi: D_1 \cup D_2 \to \mathrm{Cay}(G,G)$. If they do not agree, one can apply a left translation to either $\phi_1$ or $\phi_2$ to make the two diagrams agree on at least one vertex of $D_1 \cap D_2$; then by the connected nature of $D_1 \cap D_2$ we see that they now must agree on all vertices of $D_1 \cap D_2$, and then we can form the glued graph homomorphism as before.
The above lemma required one to specify the labels of the vertices of $D_1, D_2$ (in order to form the intersection $D_1 \cap D_2$ and union $D_1 \cup D_2$). However, if one is presented with two diagrams $D_1, D_2$ with unlabeled vertices, one can identify some partial set of vertices of $D_1$ with a partial set of vertices of $D_2$ of matching cardinality. Provided that the subdiagram common to $D_1$ and $D_2$ after this identification connects all of the common vertices together, we may use the above lemma to create a glued diagram $D_1 \cup D_2$.
For instance, if a diagram contains two of the three edges in the triangular diagram in Figure 4, one can “fill in” the triangle by gluing in the third edge:
Figure 6.
One can use glued diagrams to demonstrate various basic group-theoretic identities. For instance, by gluing together two copies of the triangular diagram in Figure 4 to create the glued diagram
Figure 7.
and then filling in two more triangles, we obtain a tetrahedral diagram that demonstrates the associative law $(gh)k = g(hk)$:
Figure 8.
Similarly, by gluing together two copies of Figure 4 with three copies of Figure 5 in an appropriate order, we can demonstrate the Abel identity $(gh)^{-1} = h^{-1} g^{-1}$:
Figure 9.
In addition to gluing, we will also use the trivial operation of erasing: if $D$ is a diagram for a group $G$, then any subgraph of $D$ (formed by removing vertices and/or edges) is also a diagram of $G$. This operation is not strictly necessary for our applications, but serves to reduce clutter in the pictures.
If two group elements $g, h$ commute, then we obtain a parallelogram as a diagram, exactly as in the vector space case:
Figure 10.
In general, of course, two arbitrary group elements will fail to commute, and so this parallelogram is no longer available. However, various substitutes for this diagram exist. For instance, if we introduce the conjugate $g^h := h^{-1} g h$ of one group element $g$ by another $h$, then we have the following slightly distorted parallelogram:
Figure 11.
By appropriate gluing and filling, this can be used to demonstrate the homomorphism properties of a conjugation map $g \mapsto g^h$:
Figure 12.
Figure 13.
Another way to replace the parallelogram in Figure 10 is to introduce the commutator $[g,h] := g^{-1} h^{-1} g h$ of two elements $g, h$, in which case we can perturb the parallelogram into a pentagon:
Figure 14.
We will tend to depict commutator edges as being somewhat shorter than the edges generating that commutator, reflecting a “perturbative” or “nilpotent” philosophy. (Of course, to fully reflect a nilpotent perspective, one should orient commutator edges in a different dimension from their generating edges, but of course the diagrams drawn here do not have enough dimensions to display this perspective easily.) We will also be adopting a “Lie” perspective of interpreting groups as behaving like perturbations of vector spaces, in particular by trying to draw all edges of the same colour as being approximately (though not perfectly) parallel to each other (and with approximately the same length).
Gluing the above pentagon with the conjugation parallelogram and erasing some edges, we discover a “commutator-conjugate” triangle, describing the basic identity $g^h = g [g,h]$:
Figure 15.
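As a quick algebraic cross-check of the diagrammatic argument (a one-line verification, assuming the conventions $g^h := h^{-1} g h$ and $[g,h] := g^{-1} h^{-1} g h$ used above):

$\displaystyle g [g,h] = g (g^{-1} h^{-1} g h) = h^{-1} g h = g^h.$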
Other gluings can also give the basic relations between commutators and conjugates. For instance, by gluing the pentagon in Figure 14 with its reflection, we see that $[h,g] = [g,h]^{-1}$. The following diagram, obtained by gluing together copies of Figures 11 and 15, demonstrates one such relation,
Figure 16.
while this figure demonstrates another:
Figure 17.
Now we turn to a more sophisticated identity, the Hall-Witt identity

$\displaystyle [[g,h^{-1}],k]^h \cdot [[h,k^{-1}],g]^k \cdot [[k,g^{-1}],h]^g = 1,$

which is the fully noncommutative version of the more well-known Jacobi identity for Lie algebras.
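Since the conventions matter here, a brute-force check of this identity in a small concrete group may be reassuring (an illustrative sketch, not part of the post; it assumes the conventions $g^h = h^{-1} g h$ and $[g,h] = g^{-1} h^{-1} g h$ as above and verifies the identity over all of $S_4$):

```python
from itertools import permutations

def mul(a, b):            # composition of permutations: (a*b)(i) = a[b[i]]
    return tuple(a[b[i]] for i in range(len(b)))

def inv(a):               # inverse permutation
    out = [0] * len(a)
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)

def conj(g, h):           # g^h = h^{-1} g h
    return mul(inv(h), mul(g, h))

def comm(g, h):           # [g,h] = g^{-1} h^{-1} g h
    return mul(inv(g), mul(inv(h), mul(g, h)))

identity = tuple(range(4))
S4 = list(permutations(range(4)))

# check [[g,h^{-1}],k]^h * [[h,k^{-1}],g]^k * [[k,g^{-1}],h]^g = 1 for all triples
assert all(
    mul(conj(comm(comm(g, inv(h)), k), h),
        mul(conj(comm(comm(h, inv(k)), g), k),
            conj(comm(comm(k, inv(g)), h), g))) == identity
    for g in S4 for h in S4 for k in S4
)
print("Hall-Witt identity verified for all triples in S_4")
```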
The full diagram for the Hall-Witt identity resembles a slightly truncated parallelopiped. Drawing this truncated parallelopiped in full would result in a rather complicated looking diagram, so I will instead display three components of this diagram separately, and leave it to the reader to mentally glue these three components back to form the full parallelopiped. The first component of the diagram is formed by gluing together three pentagons from Figure 14, and looks like this:
Figure 18.
This should be thought of as the “back” of the truncated parallelopiped needed to establish the Hall-Witt identity.
While it is not needed for proving the Hall-Witt identity, we also observe for future reference that we may also glue in some distorted parallelograms and obtain a slightly more complicated diagram:
Figure 19.
To form the second component, let us now erase all interior components of Figure 18 or Figure 19:
Figure 20.
Then we fill in three distorted parallelograms:
Figure 21.
This is the second component, and is the “front” of the truncated parallelopiped, minus the portions exposed by the truncation.
Finally, we turn to the third component. We begin by erasing the outer edges from the second component in Figure 21:
Figure 22.
We glue in three copies of the commutator-conjugate triangle from Figure 15:
Figure 23.
But now we observe that we can fill in three pentagons, and obtain a small triangle with edges $[[g,h^{-1}],k]^h$, $[[h,k^{-1}],g]^k$, $[[k,g^{-1}],h]^g$:
Figure 24.
Erasing everything except this triangle gives the Hall-Witt identity. Alternatively, one can glue together Figures 18, 21, and 24 to obtain a truncated parallelopiped which one can view as a geometric representation of the proof of the Hall-Witt identity.
Among other things, I found these diagrams to be useful to visualise group cohomology; I give a simple example of this below, developing an analogue of the Hall-Witt identity for $2$-cocycles.
Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited”, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries”. The main objective in both papers is to bound the quantity $r_4(F^n)$ for a vector space $F^n$ over a finite field $F$ of characteristic greater than $3$, where $r_4(F^n)$ is defined as the cardinality of the largest subset of $F^n$ that does not contain an arithmetic progression of length $4$. In our earlier paper, we gave two arguments that bounded $r_4(F^n)$ in the regime when the field $F$ was fixed and $n$ was large: a first “cheap” argument, which gave a relatively modest bound, and a more complicated “expensive” argument, which gave the improvement

$\displaystyle r_4(F^n) \ll |F|^n n^{-c} \ \ \ \ \ (1)$

for some constant $c > 0$ depending only on $F$.
Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset $A$ of $F^n$ that has no arithmetic progressions of length $4$, and seeks to locate a subspace on which $A$ has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse $U^3$ theorem of Ben and myself (and also independently by Samorodnitsky), one approximates $1_A$ by a “quadratically structured” function $f$, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because $A$ has no progressions of length $4$, the count of progressions of length $4$ weighted by $f$ will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which $f$ has increased density.

The error in the paper was to conclude from this that the original function $1_A$ also had increased density on the same subspace; it turns out that the manner in which $f$ approximates $1_A$ is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of the approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on $r_4(F^n)$ one gets at the end of the day to be worse than the cheap argument.)
After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), which in particular yields an explicit value for the exponent $c$ in (1). In fact, it gives the following stronger result:
Theorem 1 Let $A$ be a subset of $F^n$ of density at least $\alpha$, and let $\epsilon > 0$. Then there is a subspace $W$ of $F^n$ of codimension $O(\epsilon^{-O(1)})$ such that the number of (possibly degenerate) progressions $x, x+r, x+2r, x+3r$ in $A$ with common difference $r \in W$ is at least $(\alpha^4 - \epsilon) |F^n| |W|$.
The bound (1) is an easy consequence of this theorem after choosing $\epsilon$ appropriately (of size comparable to $\alpha^4$, say) and removing the degenerate progressions from the conclusion of the theorem.
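To get a feel for the quantities in play, here is a small numerical sketch (an illustration only, not from the paper): for a reasonably random subset $A$ of $G = F_5^n$ of density about $\alpha$, the number of (possibly degenerate) progressions $x, x+r, x+2r, x+3r$ contained in $A$ is comparable to the heuristic prediction $\alpha^4 |G|^2$ for a random set, with the degenerate progressions (those with $r = 0$) contributing only an additional $|A|$ of them; it is this last observation that lets one discard the degenerate progressions when deducing bounds such as (1).

```python
import itertools, random

p, n, alpha = 5, 3, 0.4
G = list(itertools.product(range(p), repeat=n))      # the group F_5^3
random.seed(0)
A = {v for v in G if random.random() < alpha}        # random set of density ~ alpha

def term(x, r, j):
    """The j-th term x + j*r of the progression, computed coordinatewise mod p."""
    return tuple((xi + j * ri) % p for xi, ri in zip(x, r))

total = degenerate = 0
for x in G:
    for r in G:
        if all(term(x, r, j) in A for j in range(4)):
            total += 1
            if r == (0,) * n:
                degenerate += 1

density = len(A) / len(G)
print("progressions found:        ", total)
print("degenerate progressions:   ", degenerate, "(= |A| =", len(A), ")")
print("heuristic alpha^4 * |G|^2: ", round(density ** 4 * len(G) ** 2))
```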
The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to $1_A$ for a significantly stronger local approximation to $1_A$ on a subspace $W$. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantitative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse $U^3$ theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace $W$ on which $A$ is quite dense (of density comparable to $\alpha$), and on which $1_A$ can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.