You are currently browsing the monthly archive for October 2014.
I’ve just uploaded to the arXiv my paper “The Elliott-Halberstam conjecture implies the Vinogradov least quadratic nonresidue conjecture”. As the title suggests, this paper links together the Elliott-Halberstam conjecture from sieve theory with the conjecture of Vinogradov concerning the least quadratic nonresidue $n(p)$ of a prime $p$. Vinogradov established the bound
$\displaystyle n(p) \ll p^{\frac{1}{2\sqrt{e}}} \log^2 p \ \ \ \ \ (1)$
and conjectured that
$\displaystyle n(p) \ll p^{\varepsilon}$
for any fixed $\varepsilon > 0$. Unconditionally, the best result so far (up to logarithmic factors) that holds for all primes $p$ is by Burgess, who showed that
$\displaystyle n(p) \ll p^{\frac{1}{4\sqrt{e}}+\varepsilon}$
for any fixed $\varepsilon > 0$. See this previous post for a proof of these bounds.
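To make the quantity being bounded concrete, here is a minimal Python sketch that finds the least quadratic nonresidue of an odd prime by brute force, using Euler's criterion. The function name `least_quadratic_nonresidue` is my own choice for illustration, and the search is only intended for small primes.

```python
def least_quadratic_nonresidue(p: int) -> int:
    """Return the smallest n >= 2 that is not a square modulo the odd prime p.

    Euler's criterion: n is a nonresidue iff n^((p-1)/2) = -1 (mod p).
    The caller is assumed to pass an odd prime; this is not checked.
    """
    for n in range(2, p):
        if pow(n, (p - 1) // 2, p) == p - 1:
            return n
    raise ValueError("no nonresidue found; is p an odd prime?")


if __name__ == "__main__":
    for p in (7, 23, 71, 311, 10007):
        print(p, least_quadratic_nonresidue(p))
```

Tabulating the output against powers of the prime gives a feel for how far the true values sit below the exponents in the bounds above.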
In this paper, we show that the Vinogradov conjecture is a consequence of the Elliott-Halberstam conjecture. Using a variant of the argument, we also show that the “Type II” estimates established by Zhang and numerically improved by the Polymath8a project can be used to improve a little on the Vinogradov bound (1), to
although this falls well short of the Burgess bound. However, the method is somewhat different (although in both cases it is the Weil exponential sum bounds that are the source of the gain over (1)) and it is conceivable that a numerically stronger version of the Type II estimates could obtain results that are more competitive with the Burgess bound. At any rate, this demonstrates that the equidistribution estimates introduced by Zhang may have further applications beyond the family of results related to bounded gaps between primes.
The connection between the least quadratic nonresidue problem and Elliott-Halberstam is as follows. Suppose for contradiction we can find a prime $p$ with least quadratic nonresidue $n(p)$ unusually large. Letting $\chi$ be the quadratic character modulo $p$, this implies that the sums
$\displaystyle \sum_{n \leq x} \chi(n)$
are also unusually large for a significant range of $x$
(e.g.
), although the sum is also quite small for large
(e.g.
), due to the cancellation present in the quadratic character $\chi$. It turns out (by a sort of “uncertainty principle” for multiplicative functions, as per this paper of Granville and Soundararajan) that these facts force
to be unusually large in magnitude for some large
(with
for two large absolute constants
). By the periodicity of $\chi$, this means that
must be unusually large also. However, because $n(p)$ is large, one can factorise
as
for a fairly sparsely supported function
. The Elliott-Halberstam conjecture, which controls the distribution of the von Mangoldt function $\Lambda$ in arithmetic progressions on the average, can then be used to show that
is small, giving the required contradiction.
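As a numerical illustration of the first step only (not of the full argument), the following sketch tabulates partial sums of the quadratic character for a small prime. The helper names `legendre` and `character_sum` are hypothetical, and the character is evaluated by Euler's criterion. Below the least nonresidue every term equals $+1$, so the partial sum is as large as it can be, while the sum over a full period vanishes.

```python
def legendre(n: int, p: int) -> int:
    """Quadratic character modulo an odd prime p: returns +1, -1 or 0."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is always 0, 1 or p - 1


def character_sum(x: int, p: int) -> int:
    """Partial sum chi(1) + ... + chi(x) of the quadratic character mod p."""
    return sum(legendre(n, p) for n in range(1, x + 1))


if __name__ == "__main__":
    p = 71  # least quadratic nonresidue is 7
    for x in (3, 6, 10, 35, 70, 71, 77):
        print(x, character_sum(x, p))
```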
The implication involving Type II estimates is proven by a variant of the argument. If $n(p)$ is large, then a character sum
is unusually large for a certain
. By multiplicativity of
, this shows that
correlates with
, and then by periodicity of
, this shows that
correlates with
for various small
. By the Cauchy-Schwarz inequality (cf. this previous blog post), this implies that
correlates with
for some distinct
. But this can be ruled out by using Type II estimates.
I’ll record here a well-known observation concerning potential counterexamples to any improvement to the Burgess bound, that is to say an infinite sequence of primes $p$ with
$\displaystyle n(p) = p^{\frac{1}{4\sqrt{e}} + o(1)}$
. Suppose we let
be the asymptotic mean value of the quadratic character
at
and
the mean value of
; these quantities are defined precisely in my paper, but roughly speaking one can think of
and
Thanks to the basic Dirichlet convolution identity , one can establish the Wirsing integral equation
for all ; see my paper for details (actually far sharper claims than this appear in previous work of Wirsing and Granville-Soundararajan). If we have an infinite sequence of counterexamples to any improvement to the Burgess bound, then we have
while from the Burgess exponential sum estimates we have
These two constraints, together with the Wirsing integral equation, end up determining and
completely. It turns out that we must have
and
and then for ,
evolves by the integral equation
For instance
and then oscillates in a somewhat strange fashion towards zero as
. This very odd behaviour of
is surely impossible, but it seems remarkably hard to exclude it without invoking a strong hypothesis, such as GRH or the Elliott-Halberstam conjecture (or weaker versions thereof).
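The prime number theorem can be expressed as the assertion
$\displaystyle \sum_{n \leq x} \Lambda(n) = x + o(x)$
as $x \rightarrow \infty$, where
$\displaystyle \Lambda(n) := \sum_{d|n} \mu(d) \log \frac{n}{d}$
is the von Mangoldt function. It is a basic result in analytic number theory, but requires a bit of effort to prove. One “elementary” proof of this theorem proceeds through the Selberg symmetry formula
$\displaystyle \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + O(x)$
where the second von Mangoldt function $\Lambda_2$ is defined by the formula
$\displaystyle \Lambda_2(n) := \sum_{d|n} \mu(d) \log^2 \frac{n}{d} = \Lambda(n) \log n + \sum_{d|n} \Lambda(d) \Lambda(\frac{n}{d}).$
(We are avoiding the use of the $*$ symbol here to denote Dirichlet convolution, as we will need this symbol to denote ordinary convolution shortly.) For the convenience of the reader, we give a proof of the Selberg symmetry formula below the fold. Actually, for the purposes of proving the prime number theorem, the weaker estimate
$\displaystyle \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + o(x \log x)$
suffices.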
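For readers who like to see numbers, here is a small sketch that tabulates the von Mangoldt function and the second von Mangoldt function (in the form $\Lambda(n) \log n + \sum_{d|n} \Lambda(d) \Lambda(n/d)$) with a sieve, and then prints the normalised error terms in the prime number theorem and in the Selberg symmetry formula. The function names and the cutoff $10^4$ are arbitrary choices of mine; the output only illustrates the asymptotics and proves nothing.

```python
import math


def von_mangoldt_table(N: int) -> list[float]:
    """lam[n] = log p if n is a power of a prime p, else 0, for 0 <= n <= N."""
    lam = [0.0] * (N + 1)
    is_composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if is_composite[p]:
            continue
        for m in range(p * p, N + 1, p):
            is_composite[m] = True
        q = p
        while q <= N:  # mark p, p^2, p^3, ...
            lam[q] = math.log(p)
            q *= p
    return lam


def second_von_mangoldt_table(N: int) -> list[float]:
    """lam2[n] = lam[n] * log n + sum over d | n of lam[d] * lam[n/d]."""
    lam = von_mangoldt_table(N)
    lam2 = [0.0] * (N + 1)
    for n in range(2, N + 1):
        lam2[n] = lam[n] * math.log(n)
    for d in range(2, N + 1):
        if lam[d] == 0.0:
            continue
        for m in range(d, N + 1, d):
            lam2[m] += lam[d] * lam[m // d]
    return lam2


if __name__ == "__main__":
    x = 10_000
    lam = von_mangoldt_table(x)
    lam2 = second_von_mangoldt_table(x)
    print("sum of Lambda / x            =", sum(lam) / x)
    print("(sum of Lambda_2 - 2x log x)/x =",
          (sum(lam2) - 2 * x * math.log(x)) / x)
```

The second printed quantity staying bounded as the cutoff grows is what the $O(x)$ error term in the symmetry formula asserts.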
In this post I would like to record a somewhat “soft analysis” reformulation of the elementary proof of the prime number theorem in terms of Banach algebras, and specifically in Banach algebra structures on (completions of) the space $C_c({\bf R})$ of compactly supported continuous functions $f: {\bf R} \rightarrow {\bf C}$ equipped with the convolution operation
$\displaystyle f*g(t) := \int_{\bf R} f(s) g(t-s)\ ds.$
This soft argument does not easily give any quantitative decay rate in the prime number theorem, but by the same token it avoids many of the quantitative calculations in the traditional proofs of this theorem. Ultimately, the key “soft analysis” fact used is the spectral radius formula
$\displaystyle \lim_{n \rightarrow \infty} \|f^n\|^{1/n} = \sup_{\lambda \in \hat A} |\lambda(f)| \ \ \ \ \ (6)$
for any element $f$ of a unital commutative Banach algebra $A$, where $\hat A$ is the space of characters (i.e., continuous unital algebra homomorphisms from $A$ to ${\bf C}$) of $A$. This formula is due to Gelfand and may be found in any text on Banach algebras; for the sake of completeness we prove it below the fold.
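The spectral radius formula, in its standard form $\lim_{n \rightarrow \infty} \|f^n\|^{1/n} = \sup_\lambda |\lambda(f)|$, can be tested by hand in a finite toy model. The sketch below uses the algebra $\ell^1({\bf Z}/N{\bf Z})$ with cyclic convolution, whose characters are the $N$ discrete Fourier coefficients. This toy algebra, the sample function and all helper names are my own choices for illustration and are not the Banach algebra constructed in this post.

```python
import cmath


def convolve(f, g):
    """Cyclic convolution on Z/N: (f*g)(t) = sum_s f(s) g(t-s)."""
    N = len(f)
    return [sum(f[s] * g[(t - s) % N] for s in range(N)) for t in range(N)]


def l1_norm(f):
    """The l^1 norm, which is submultiplicative under convolution."""
    return sum(abs(v) for v in f)


def character_values(f):
    """Values at f of the N characters f -> sum_t f(t) exp(-2 pi i k t / N)."""
    N = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * k * t / N) for t in range(N))
            for k in range(N)]


def spectral_radius_estimate(f, n):
    """Return ||f^{*n}||^{1/n}, where f^{*n} is the n-fold convolution power."""
    power = f
    for _ in range(n - 1):
        power = convolve(power, f)
    return l1_norm(power) ** (1.0 / n)


if __name__ == "__main__":
    f = [1.0, -2.0, 0.5, 0.0, 0.0, 3.0, 0.0, -1.0]
    print("sup over characters:", max(abs(v) for v in character_values(f)))
    for n in (1, 4, 16, 64):
        print(n, spectral_radius_estimate(f, n))
```

In this model one has the elementary sandwich $\rho^n \leq \|f^{*n}\|_{\ell^1} \leq \sqrt{N} \rho^n$ with $\rho$ the largest Fourier coefficient in magnitude, so the convergence is visible already for moderate $n$.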
The connection between prime numbers and Banach algebras is given by the following consequence of the Selberg symmetry formula.
Theorem 1 (Construction of a Banach algebra norm) For any
, let
denote the quantity
Then
is a seminorm on
with the bound
for all
. Furthermore, we have the Banach algebra bound
for all
.
We prove this theorem below the fold. The prime number theorem then follows from Theorem 1 and the following two assertions. The first is an application of the spectral radius formula (6) and some basic Fourier analysis (in particular, the observation that $C_c({\bf R})$ contains a plentiful supply of local units):
Theorem 2 (Non-trivial Banach algebras with many local units have non-trivial spectrum) Let
be a seminorm on
obeying (7), (8). Suppose that
is not identically zero. Then there exists
such that
for all
. In particular, by (7), one has
whenever
is a non-negative function.
The second is a consequence of the Selberg symmetry formula and the fact that is real (as well as Mertens’ theorem, in the
case), and is closely related to the non-vanishing of the Riemann zeta function
on the line
:
Theorem 3 (Breaking the parity barrier) Let
. Then there exists
such that
is non-negative, and
Assuming Theorems 1, 2, 3, we may now quickly establish the prime number theorem as follows. Theorem 2 and Theorem 3 imply that the seminorm constructed in Theorem 1 is trivial, and thus
as for any Schwartz function
(the decay rate in
may depend on
). Specialising to functions of the form
for some smooth compactly supported
on
, we conclude that
as ; by the smooth Urysohn lemma this implies that
as for any fixed
, and the prime number theorem then follows by a telescoping series argument.
The same argument also yields the prime number theorem in arithmetic progressions, or equivalently that
for any fixed Dirichlet character ; the one difference is that the use of Mertens’ theorem is replaced by the basic fact that the quantity
is non-vanishing.
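In graph theory, the recently developed theory of graph limits has proven to be a useful tool for analysing large dense graphs, being a convenient reformulation of the Szemerédi regularity lemma. Roughly speaking, the theory asserts that given any sequence $G_n = (V_n, E_n)$ of finite graphs, one can extract a subsequence $G_{n_j} = (V_{n_j}, E_{n_j})$ which converges (in a specific sense) to a continuous object known as a “graphon” – a symmetric measurable function $p: [0,1] \times [0,1] \rightarrow [0,1]$. What “converges” means in this context is that subgraph densities converge to the associated integrals of the graphon $p$. For instance, the edge density
$\displaystyle \frac{1}{|V_{n_j}|^2} \sum_{x,y \in V_{n_j}} 1_{E_{n_j}}(x,y)$
converges to the integral
$\displaystyle \int_0^1 \int_0^1 p(x,y)\ dx dy,$
the triangle density
$\displaystyle \frac{1}{|V_{n_j}|^3} \sum_{x,y,z \in V_{n_j}} 1_{E_{n_j}}(x,y) 1_{E_{n_j}}(y,z) 1_{E_{n_j}}(z,x)$
converges to the integral
$\displaystyle \int_0^1 \int_0^1 \int_0^1 p(x,y) p(y,z) p(z,x)\ dx dy dz,$
the four-cycle density
$\displaystyle \frac{1}{|V_{n_j}|^4} \sum_{x,y,z,w \in V_{n_j}} 1_{E_{n_j}}(x,y) 1_{E_{n_j}}(y,z) 1_{E_{n_j}}(z,w) 1_{E_{n_j}}(w,x)$
converges to the integral
$\displaystyle \int_0^1 \int_0^1 \int_0^1 \int_0^1 p(x,y) p(y,z) p(z,w) p(w,x)\ dx dy dz dw,$
and so forth. One can use graph limits to prove many results in graph theory that were traditionally proven using the regularity lemma, such as the triangle removal lemma, and can also reduce many asymptotic graph theory problems to continuous problems involving multilinear integrals (although the latter problems are not necessarily easy to solve!). See this text of Lovász for a detailed study of graph limits and their applications.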
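The subgraph densities in question are normalised counts of maps from a small pattern graph into the large graph that send edges to edges. A brute-force sketch, with a hypothetical helper `homomorphism_density` and the complete graph on four vertices as the test case, might look as follows; it is exponential in the pattern size and meant purely as a definition check.

```python
from itertools import product


def homomorphism_density(pattern_edges, num_pattern_vertices, vertices, edges):
    """Fraction of maps from the pattern's vertices to the graph's vertices
    that send every pattern edge to an edge of the graph."""
    adjacent = set(edges) | {(y, x) for (x, y) in edges}  # symmetrise
    hits = 0
    for image in product(vertices, repeat=num_pattern_vertices):
        if all((image[a], image[b]) in adjacent for a, b in pattern_edges):
            hits += 1
    return hits / len(vertices) ** num_pattern_vertices


EDGE = [(0, 1)]
TRIANGLE = [(0, 1), (1, 2), (2, 0)]
FOUR_CYCLE = [(0, 1), (1, 2), (2, 3), (3, 0)]

if __name__ == "__main__":
    V = list(range(4))
    E = [(i, j) for i in range(4) for j in range(i + 1, 4)]  # complete graph K_4
    print(homomorphism_density(EDGE, 2, V, E))        # 12/16
    print(homomorphism_density(TRIANGLE, 3, V, E))    # 24/64
    print(homomorphism_density(FOUR_CYCLE, 4, V, E))  # 84/256
```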
One can also express graph limits (and more generally hypergraph limits) in the language of nonstandard analysis (or of ultraproducts); see for instance this paper of Elek and Szegedy, Section 6 of this previous blog post, or this paper of Towsner. (In this post we assume some familiarity with nonstandard analysis, as reviewed for instance in the previous blog post.) Here, one starts as before with a sequence of finite graphs, and then takes an ultraproduct (with respect to some arbitrarily chosen non-principal ultrafilter
) to obtain a nonstandard graph
, where
is the ultraproduct of the
, and similarly for the
. The set
can then be viewed as a symmetric subset of
which is measurable with respect to the Loeb
-algebra
of the product
(see this previous blog post for the construction of Loeb measure). A crucial point is that this
-algebra is larger than the product
of the Loeb
-algebra of the individual vertex set
. This leads to a decomposition
where the “graphon” is the orthogonal projection of
onto
, and the “regular error”
is orthogonal to all product sets
for
. The graphon
then captures the statistics of the nonstandard graph
, in exact analogy with the more traditional graph limits: for instance, the edge density
(or equivalently, the limit of the along the ultrafilter
) is equal to the integral
where denotes Loeb measure on a nonstandard finite set
; the triangle density
(or equivalently, the limit along of the triangle densities of
) is equal to the integral
and so forth. Note that with this construction, the graphon is living on the Cartesian square of an abstract probability space
, which is likely to be inseparable; but it is possible to cut down the Loeb
-algebra on
to minimal countable
-algebra for which
remains measurable (up to null sets), and then one can identify
with
, bringing this construction of a graphon in line with the traditional notion of a graphon. (See Remark 5 of this previous blog post for more discussion of this point.)
Additive combinatorics, which studies things like the additive structure of finite subsets of an abelian group
, has many analogies and connections with asymptotic graph theory; in particular, there is the arithmetic regularity lemma of Green which is analogous to the graph regularity lemma of Szemerédi. (There is also a higher order arithmetic regularity lemma analogous to hypergraph regularity lemmas, but this is not the focus of the discussion here.) Given this, it is natural to suspect that there is a theory of “additive limits” for large additive sets of bounded doubling, analogous to the theory of graph limits for large dense graphs. The purpose of this post is to record a candidate for such an additive limit. This limit can be used as a substitute for the arithmetic regularity lemma in certain results in additive combinatorics, at least if one is willing to settle for qualitative results rather than quantitative ones; I give a few examples of this below the fold.
It seems that to allow for the most flexible and powerful manifestation of this theory, it is convenient to use the nonstandard formulation (among other things, it allows for full use of the transfer principle, whereas a more traditional limit formulation would only allow for a transfer of those quantities continuous with respect to the notion of convergence). Here, the analogue of a nonstandard graph is an ultra approximate group in a nonstandard group
, defined as the ultraproduct of finite $K$-approximate groups $A_n$ for some standard $K$. (A $K$-approximate group $A$ is a symmetric set containing the origin such that $A+A$ can be covered by $K$ or fewer translates of $A$.) We then let
be the external subgroup of
generated by
; equivalently,
is the union of
over all standard
. This space has a Loeb measure
, defined by setting
whenever is an internal subset of
for any standard
, and extended to a countably additive measure; the arguments in Section 6 of this previous blog post can be easily modified to give a construction of this measure.
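The definition of an approximate group (a symmetric set containing the origin whose sumset with itself is covered by a bounded number of its translates) is easy to test by brute force on small examples. The sketch below checks that an interval of integers centred at the origin is covered by two translates of itself; the helper names and the choice of shifts are mine.

```python
def sumset(A, B):
    """The sumset A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}


def is_symmetric_with_origin(A):
    """Check that 0 lies in A and that A = -A."""
    return 0 in A and all(-a in A for a in A)


def covered_by_translates(A, shifts):
    """Check that A + A lies in the union of the translates A + s, s in shifts."""
    union = set()
    for s in shifts:
        union |= {a + s for a in A}
    return sumset(A, A) <= union


if __name__ == "__main__":
    N = 50
    A = set(range(-N, N + 1))
    print(is_symmetric_with_origin(A))        # symmetric, contains 0
    print(covered_by_translates(A, [-N, N]))  # two translates suffice
    print(covered_by_translates(A, [0]))      # one translate does not
```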
The Loeb measure is a translation invariant measure on
, normalised so that
has Loeb measure one. As such, one should think of
as being analogous to a locally compact abelian group equipped with a Haar measure. It should be noted though that
is not actually a locally compact group with Haar measure, for two reasons:
- There is not an obvious topology on
that makes it simultaneously locally compact, Hausdorff, and $\sigma$-compact. (One can get one or two out of three without difficulty, though.)
- The addition operation
is not measurable from the product Loeb algebra
to
. Instead, it is measurable from the coarser Loeb algebra
to
(compare with the analogous situation for nonstandard graphs).
Nevertheless, the analogy is a useful guide for the arguments that follow.
Let denote the space of bounded Loeb measurable functions
(modulo almost everywhere equivalence) that are supported on
for some standard
; this is a complex algebra with respect to pointwise multiplication. There is also a convolution operation
, defined by setting
whenever ,
are bounded nonstandard functions (extended by zero to all of
), and then extending to arbitrary elements of
by density. Equivalently,
is the pushforward of the
-measurable function
under the map
.
The basic structural theorem is then as follows.
Theorem 1 (Kronecker factor) Let
be an ultra approximate group. Then there exists a (standard) locally compact abelian group
of the form
for some standard
and some compact abelian group
, equipped with a Haar measure
and a measurable homomorphism
(using the Loeb
-algebra on
and the Baire
-algebra on
), with the following properties:
- (i)
has dense image, and
is the pushforward of Loeb measure
by
.
- (ii) There exists sets
with
open and
compact, such that
- (iii) Whenever
with
compact and
open, there exists a nonstandard finite set
such that
- (iv) If
, then we have the convolution formula
where
are the pushforwards of
to
, the convolution
on the right-hand side is convolution using
, and
is the pullback map from
to
. In particular, if
, then
for all
.
One can view the locally compact abelian group as a “model” or “Kronecker factor” for the ultra approximate group
(in close analogy with the Kronecker factor from ergodic theory). In the case that
is a genuine nonstandard finite group rather than an ultra approximate group, the non-compact components
of the Kronecker group
are trivial, and this theorem was implicitly established by Szegedy. The compact group
is quite large, and in particular is likely to be inseparable; but as with the case of graphons, when one is only studying at most countably many functions
, one can cut down the size of this group to be separable (or equivalently, second countable or metrisable) if desired, so one often works with a “reduced Kronecker factor” which is a quotient of the full Kronecker factor
. Once one is in the separable case, the Baire sigma algebra is identical with the more familiar Borel sigma algebra.
Given any sequence of uniformly bounded functions for some fixed
, we can view the function
defined by
as an “additive limit” of the , in much the same way that graphons
are limits of the indicator functions
. The additive limits capture some of the statistics of the
, for instance the normalised means
converge (along the ultrafilter ) to the mean
and for three sequences of functions, the normalised correlation
converges along to the correlation
the normalised Gowers norm
converges along to the
Gowers norm
and so forth. We caution however that some correlations that involve evaluating more than one function at the same point will not necessarily be preserved in the additive limit; for instance the normalised norm
does not necessarily converge to the norm
but can converge instead to a larger quantity, due to the presence of the orthogonal projection in the definition (4) of
.
An important special case of an additive limit occurs when the functions involved are indicator functions
of some subsets
of
. The additive limit
does not necessarily remain an indicator function, but instead takes values in
(much as a graphon
takes values in
even though the original indicators
take values in
). The convolution
is then the ultralimit of the normalised convolutions
; in particular, the measure of the support of
provides a lower bound on the limiting normalised cardinality
of a sumset. In many situations this lower bound is an equality, but this is not necessarily the case, because the sumset
could contain a large number of elements which have very few (
) representations as the sum of two elements of
, and in the limit these portions of the sumset fall outside of the support of
. (One can think of the support of
as describing the “essential” sumset of
, discarding those elements that have only very few representations.) Similarly for higher convolutions of
. Thus one can use additive limits to partially control the growth
of iterated sumsets of subsets
of approximate groups
, in the regime where
stays bounded and
goes to infinity.
Theorem 1 can be proven by Fourier-analytic means (combined with Freiman’s theorem from additive combinatorics), and we will do so below the fold. For now, we give some illustrative examples of additive limits.
Example 2 (Bohr sets) We take
to be the intervals
, where
is a sequence going to infinity; these are
-approximate groups for all
. Let
be an irrational real number, let
be an interval in
, and for each natural number
let
be the Bohr set
In this case, the (reduced) Kronecker factor
can be taken to be the infinite cylinder
with the usual Lebesgue measure
. The additive limits of
and
end up being
and
, where
is the finite cylinder
and
is the rectangle
Geometrically, one should think of
and
as being wrapped around the cylinder
via the homomorphism
, and then one sees that
is converging in some normalised weak sense to
, and similarly for
and
. In particular, the additive limit predicts the growth rate of the iterated sumsets
to be quadratic in
until
becomes comparable to
, at which point the growth transitions to linear growth, in the regime where
is bounded and
is large.
If
were rational instead of irrational, then one would need to replace
by the finite subgroup
here.
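One can experiment with the sumset growth of Bohr sets directly. The sketch below builds a Bohr set inside an interval of integers and lists the sizes of its first few iterated sumsets. The parameters (the interval length, the frequency $\sqrt{2}$, and an interval centred at the origin of half-width $0.05$) are arbitrary choices of mine, and whether the transition from quadratic to linear growth is clearly visible at such small scales is something the reader should inspect rather than take on trust.

```python
import math


def bohr_set(N, alpha, half_width):
    """Integers a in [-N, N] such that a*alpha is within half_width of an integer."""
    return {a for a in range(-N, N + 1)
            if abs(a * alpha - round(a * alpha)) <= half_width}


def iterated_sumset_sizes(A, k_max):
    """Sizes |A|, |2A|, ..., |k_max A| of the iterated sumsets of A."""
    sizes, current = [], set(A)
    for _ in range(k_max):
        sizes.append(len(current))
        current = {x + a for x in current for a in A}
    return sizes


if __name__ == "__main__":
    N = 200
    A = bohr_set(N, math.sqrt(2), 0.05)
    for k, size in enumerate(iterated_sumset_sizes(A, 8), start=1):
        print(k, size)
```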
Example 3 (Structured subsets of progressions) We take
to be the rank two progression
where
is a sequence going to infinity; these are
-approximate groups for all
. Let
be the subset
Then the (reduced) Kronecker factor can be taken to be
with Lebesgue measure
, and the additive limits of the
and
are then
and
, where
is the square
and
is the circle
Geometrically, the picture is similar to the Bohr set one, except now one uses a Freiman homomorphism
for
to embed the original sets
into the plane
. In particular, one now expects the growth rate of the iterated sumsets
and
to be quadratic in
, in the regime where
is bounded and
is large.
Example 4 (Dissociated sets) Let
be a fixed natural number, and take
where
are randomly chosen elements of a large cyclic group
, where
is a sequence of primes going to infinity. These are
-approximate groups. The (reduced) Kronecker factor
can (almost surely) then be taken to be
with counting measure, and the additive limit of
is
, where
and
is the standard basis of
. In particular, the growth rates of
should grow approximately like
for
bounded and
large.
Example 5 (Random subsets of groups) Let
be a sequence of finite additive groups whose order is going to infinity. Let
be a random subset of
of some fixed density
. Then (almost surely) the Kronecker factor here can be reduced all the way to the trivial group
, and the additive limit of the
is the constant function
. The convolutions
then converge in the ultralimit (modulo almost everywhere equivalence) to the pullback of
; this reflects the fact that
of the elements of
can be represented as the sum of two elements of
in
ways. In particular,
occupies a proportion
of
.
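The behaviour of random sets is easy to simulate. The sketch below draws a random subset of a cyclic group with a fixed seed and counts, for each group element, the number of ways to write it as a sum of two elements of the set. The modulus, density and seed are arbitrary choices of mine; one expects most counts to be close to the square of the density times the order of the group.

```python
import random


def random_subset(N, density, seed):
    """Random subset of Z/N in which each element is kept independently."""
    rng = random.Random(seed)
    return {x for x in range(N) if rng.random() < density}


def representation_counts(A, N):
    """counts[x] = number of pairs (a, b) in A x A with a + b = x mod N."""
    counts = [0] * N
    for a in A:
        for b in A:
            counts[(a + b) % N] += 1
    return counts


if __name__ == "__main__":
    N, density = 1009, 0.3
    A = random_subset(N, density, seed=0)
    counts = representation_counts(A, N)
    print("density of A:       ", len(A) / N)
    print("proportion of A + A:", sum(1 for c in counts if c > 0) / N)
    print("mean count / N:     ", sum(counts) / N / N, "vs", density ** 2)
```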
Example 6 (Trigonometric series) Take
for a sequence
of primes going to infinity, and for each
let
be an infinite sequence of frequencies chosen uniformly and independently from
. Let
denote the random trigonometric series
Then (almost surely) we can take the reduced Kronecker factor
to be the infinite torus
(with the Haar probability measure
), and the additive limit of the
then becomes the function
defined by the formula
In fact, the pullback
is the ultralimit of the
. As such, for any standard exponent
, the normalised
norm
can be seen to converge to the limit
The reader is invited to consider combinations of the above examples, e.g. random subsets of Bohr sets, to get a sense of the general case of Theorem 1.
It is likely that this theorem can be extended to the noncommutative setting, using the noncommutative Freiman theorem of Emmanuel Breuillard, Ben Green, and myself, but I have not attempted to do so here (see though this recent preprint of Anush Tserunyan for some related explorations); in a separate direction, there should be extensions that can control higher Gowers norms, in the spirit of the work of Szegedy.
Note: the arguments below will presume some familiarity with additive combinatorics and with nonstandard analysis, and will be a little sketchy in places.
One of the first basic theorems in group theory is Cayley’s theorem, which links abstract finite groups with concrete finite groups (otherwise known as permutation groups).
Theorem 1 (Cayley’s theorem) Let $G$ be a group of some finite order $n$. Then $G$ is isomorphic to a subgroup $\tilde G$ of the symmetric group $S_n$ on $n$ elements $\{1,\dots,n\}$. Furthermore, this subgroup is simply transitive: given two elements $x, y$ of $\{1,\dots,n\}$, there is precisely one element $\sigma$ of $\tilde G$ such that $\sigma(x) = y$.
One can therefore think of $S_n$ as a sort of “universal” group that contains (up to isomorphism) all the possible groups of order $n$.
Proof: The group $G$ acts on itself by multiplication on the left, thus each element $g \in G$ may be identified with a permutation $\sigma_g: G \rightarrow G$ on $G$ given by the map $\sigma_g(h) := gh$. This can be easily verified to identify $G$ with a simply transitive permutation group on $G$. The claim then follows by arbitrarily identifying $G$ with $\{1,\dots,n\}$.
More explicitly, the permutation group $\tilde G$ arises by arbitrarily enumerating $G$ as $\{s_1,\dots,s_n\}$ and then associating to each group element $g \in G$ the permutation $\sigma_g: \{1,\dots,n\} \rightarrow \{1,\dots,n\}$ defined by the formula
$\displaystyle g s_i = s_{\sigma_g(i)}.$
The simply transitive group $\tilde G$ given by Cayley’s theorem is not unique, due to the arbitrary choice of identification of $G$ with $\{1,\dots,n\}$, but is unique up to conjugation by an element of $S_n$. On the other hand, it is easy to see that every simply transitive subgroup of $S_n$ is of order $n$, and that two such groups are isomorphic if and only if they are conjugate by an element of $S_n$. Thus Cayley’s theorem in fact identifies the moduli space of groups of order $n$ (up to isomorphism) with the simply transitive subgroups of $S_n$ (up to conjugacy by elements of $S_n$).
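The explicit construction can be carried out mechanically: enumerate the group, and send each element $g$ to the permutation recording where left multiplication by $g$ moves each index. Here is a short sketch for the symmetric group on three letters, viewed as an abstract group of order six; the function names and the 0-based indexing are my own conventions.

```python
from itertools import permutations


def cayley_embedding(elements, multiply):
    """Send g to the tuple sigma_g with g * s_i = s_{sigma_g(i)} (0-based)."""
    index = {s: i for i, s in enumerate(elements)}
    return {g: tuple(index[multiply(g, s)] for s in elements) for g in elements}


def compose(sigma, rho):
    """Composition of permutations stored as tuples: (sigma o rho)(i)."""
    return tuple(sigma[rho[i]] for i in range(len(rho)))


# Example group: S_3, with elements stored as tuples and product given by compose.
S3 = list(permutations(range(3)))
embedding = cayley_embedding(S3, compose)

if __name__ == "__main__":
    for g in S3:
        print(g, "->", embedding[g])
```

The image consists of six permutations of six indices, and one can check directly that it is closed under composition and simply transitive.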
One can generalise Cayley’s theorem to groups of infinite order without much difficulty. But in this post, I would like to note an (easy) generalisation of Cayley’s theorem in a different direction, in which the group is no longer assumed to be of order $n$, but rather to have an index $n$ subgroup that is isomorphic to a fixed group $H$. The generalisation is:
Theorem 2 (Cayley’s theorem for
-sets) Let
be a group, and let
be a group that contains an index
subgroup isomorphic to
. Then
is isomorphic to a subgroup
of the semidirect product
, defined explicitly as the set of tuples
with product
and inverse
(This group is a wreath product of
with
, and is sometimes denoted
, or more precisely
.) Furthermore,
is simply transitive in the following sense: given any two elements
of
and
, there is precisely one
in
such that
and
.
Of course, Theorem 1 is the special case of Theorem 2 when $H$ is trivial. This theorem allows one to view
as a “universal” group for modeling all groups containing a copy of
as an index
subgroup, in exactly the same way that
is a universal group for modeling groups of order
. This observation is not at all deep, but I had not seen it before, so I thought I would record it here. (EDIT: as pointed out in comments, this is a slight variant of the universal embedding theorem of Krasner and Kaloujnine, which covers the case when
is normal, in which case one can embed
into the wreath product
, which is a subgroup of
.)
Proof: The basic idea here is to replace the category of sets in Theorem 1 by the category of $H$-sets, by which we mean sets $X$ with a right-action of the group $H$. A morphism between two $H$-sets
is a function
which respects the right action of
, thus
for all
and
.
Observe that if contains a copy of
as a subgroup, then one can view
as an
-set, using the right-action of
(which we identify with the indicated subgroup of
). The left action of
on itself commutes with the right-action of
, and so we can represent
by
-set automorphisms on the
-set
.
As has index
in
, we see that
is (non-canonically) isomorphic (as an
-set) to the
-set
with the obvious right action of
:
. It is easy to see that the group of
-set automorphisms of
can be identified with
, with the latter group acting on the former
-set by the rule
(it is routine to verify that this is indeed an action of $S_n \ltimes H^n$ by $H$-set automorphisms). It is then a routine matter to verify the claims (the simple transitivity of
follows from the simple transitivity of the action of
on itself).
More explicitly, the group arises by arbitrarily enumerating the left-cosets of
in
as
and then associating to each group element
the element
, where the permutation
and the elements
are defined by the formula
By noting that is an index
normal subgroup of
, we recover the classical result of Poincaré that any group
that contains
as an index
subgroup, contains a normal subgroup
of index dividing
that is contained in
. (Quotienting out the
right-action, we recover also the classical proof of this result, as the action of
on itself then collapses to the action of
on the quotient space
, the stabiliser of which is
.)
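The coset construction can also be checked by machine. In the sketch below I take the ambient group to be the symmetric group on three letters and the subgroup to be the (non-normal) subgroup of index three generated by a transposition. Each $g$ is sent to a pair $(\sigma, (h_i))$ determined by $g s_i = s_{\sigma(i)} h_i$, and the product rule $(\sigma, (h_i)) (\rho, (k_i)) = (\sigma \circ \rho, (h_{\rho(i)} k_i))$ used in `wreath_product` is my reconstruction of the convention compatible with that formula, so it should be read as an assumption. All function names are hypothetical.

```python
from itertools import permutations


def compose(g, h):
    """Product in S_3 with (g h)(i) = g(h(i)); elements are tuples."""
    return tuple(g[h[i]] for i in range(len(h)))


def inverse(g):
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)


def coset_representatives(G, H):
    """One representative s from each left coset s H."""
    reps, covered = [], set()
    for s in G:
        if s not in covered:
            reps.append(s)
            covered |= {compose(s, h) for h in H}
    return reps


def wreath_embedding(G, H, reps):
    """Send g to (sigma, (h_i)) where g s_i = s_{sigma(i)} h_i with h_i in H."""
    image = {}
    for g in G:
        sigma, hs = [], []
        for s in reps:
            x = compose(g, s)
            # the unique coset index j with x in s_j H
            j = next(j for j, t in enumerate(reps)
                     if compose(inverse(t), x) in H)
            sigma.append(j)
            hs.append(compose(inverse(reps[j]), x))
        image[g] = (tuple(sigma), tuple(hs))
    return image


def wreath_product(a, b):
    """Assumed product rule: (sigma, h)(rho, k) = (sigma o rho, (h_{rho(i)} k_i))."""
    (sigma, hs), (rho, ks) = a, b
    n = len(rho)
    return (tuple(sigma[rho[i]] for i in range(n)),
            tuple(compose(hs[rho[i]], ks[i]) for i in range(n)))


G = list(permutations(range(3)))
H = {(0, 1, 2), (1, 0, 2)}  # index 3 subgroup generated by a transposition
reps = coset_representatives(G, H)
image = wreath_embedding(G, H, reps)

if __name__ == "__main__":
    for g in G:
        print(g, "->", image[g])
```

With these conventions the map is an injective homomorphism, and for each index $x$ and each element $h$ of the subgroup the assignment $g \mapsto (\sigma(x), h_x h)$ is a bijection, which is one way to phrase simple transitivity.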
Exercise 1 Show that a simply transitive subgroup
of
contains a copy of
as an index
subgroup; in particular, there is a canonical embedding of
into
, and
can be viewed as an
-set.
Exercise 2 Show that any two simply transitive subgroups
of
are isomorphic simultaneously as groups and as
-sets (that is, there is a bijection
that is simultaneously a group isomorphism and an
-set isomorphism) if and only if they are conjugate by an element of
.
[UPDATE: Exercises corrected; thanks to Keith Conrad for some additional corrections and comments.]