In Notes 5, we saw that the Gowers uniformity norms $\|\cdot\|_{U^{s+1}(V)}$ on vector spaces $V = \mathbb{F}^n$ in high characteristic were controlled by classical polynomial phases $e(\phi)$ of degree at most $s$.

Now we study the analogous situation on cyclic groups $\mathbb{Z}/N\mathbb{Z}$. Here, there is an unexpected surprise: the polynomial phases (classical or otherwise) are no longer sufficient to control the Gowers norms $\|\cdot\|_{U^{s+1}(\mathbb{Z}/N\mathbb{Z})}$ once $s$ exceeds $1$. To resolve this problem, one must enlarge the space of polynomials to a larger class. It turns out that there are at least three closely related options for this class: the local polynomials, the bracket polynomials, and the nilsequences. Each of the three classes has its own strengths and weaknesses, but in my opinion the nilsequences seem to be the most natural class, due to the rich algebraic and dynamical structure coming from the nilpotent Lie group undergirding such sequences. For reasons of space we shall focus primarily on the nilsequence viewpoint here.
Traditionally, nilsequences have been defined in terms of linear orbits $n \mapsto g^n x$ on nilmanifolds $G/\Gamma$; however, in recent years it has been realised that it is convenient for technical reasons (particularly for the quantitative "single-scale" theory) to generalise this setup to that of polynomial orbits $n \mapsto g(n)\Gamma$, and this is the perspective we will take here.
A polynomial phase on a finite abelian group $H$ is formed by starting with a polynomial $\phi: H \to \mathbb{R}/\mathbb{Z}$ to the unit circle, and then composing it with the exponential function $e: \mathbb{R}/\mathbb{Z} \to \mathbb{C}$, $e(x) := e^{2\pi i x}$. To create a nilsequence $n \mapsto F(g(n)\Gamma)$, we generalise this construction by starting with a polynomial orbit $n \mapsto g(n)\Gamma$ into a nilmanifold $G/\Gamma$, and then composing this with a Lipschitz function $F: G/\Gamma \to \mathbb{C}$. (The Lipschitz regularity class is convenient for minor technical reasons, but one could also use other regularity classes here if desired.) These classes of sequences certainly include the polynomial phases, but are somewhat more general; for instance, they almost include bracket polynomial phases such as $n \mapsto e(\lfloor \alpha n \rfloor \beta n)$. (The "almost" here is because the relevant functions $F: G/\Gamma \to \mathbb{C}$ involved are only piecewise Lipschitz rather than Lipschitz, but this is primarily a technical issue and one should view bracket polynomial phases as "morally" being nilsequences.)
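To illustrate how bracket phases arise from this construction, here is the standard Heisenberg example (a sketch only; the normalisations and sign conventions below are one common choice and may differ slightly from those adopted later in these notes). Take
$$ G := \begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix}, \qquad \Gamma := \begin{pmatrix} 1 & \mathbb{Z} & \mathbb{Z} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix}, \qquad g(n) := \begin{pmatrix} 1 & \alpha n & 0 \\ 0 & 1 & \beta n \\ 0 & 0 & 1 \end{pmatrix}, $$
so that $g$ is a (linear, hence polynomial) orbit in the Heisenberg group $G$. Multiplying $g(n)$ on the right by a suitable element of $\Gamma$ so that the $(1,2)$ and $(2,3)$ entries land in $[0,1)$, the $(1,3)$ entry becomes $-\alpha n \lfloor \beta n \rfloor$ modulo $1$; composing with the (piecewise Lipschitz) function $F$ that applies $e(\cdot)$ to this entry, one obtains the bracket polynomial phase $n \mapsto e(-\lfloor \beta n \rfloor \alpha n)$, which (up to relabeling and sign) is of the type mentioned above.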
In these notes we set out the basic theory for these nilsequences, including their equidistribution theory (which generalises the equidistribution theory of polynomial flows on tori from Notes 1), and show that they are indeed obstructions to the Gowers norm being small. This leads to the inverse conjecture for the Gowers norms, which asserts that the Gowers norms on cyclic groups are indeed controlled by these sequences.
In Notes 3, we saw that the number of additive patterns in a given set was (in principle, at least) controlled by the Gowers uniformity norms of functions associated to that set. Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces $V$ over a finite field $\mathbb{F}$.
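For orientation, let us recall the definition of the Gowers uniformity norm of a function $f: V \to \mathbb{C}$ on a finite additive group $V$ (in one standard normalisation, which may differ slightly from the convention of equation (18) in Notes 3):
$$ \|f\|_{U^{s+1}(V)} := \Big( \mathbb{E}_{x, h_1, \ldots, h_{s+1} \in V} \prod_{\omega \in \{0,1\}^{s+1}} \mathcal{C}^{|\omega|} f\big(x + \omega_1 h_1 + \cdots + \omega_{s+1} h_{s+1}\big) \Big)^{1/2^{s+1}}, $$
where $\mathcal{C}: z \mapsto \overline{z}$ denotes complex conjugation and $|\omega| := \omega_1 + \cdots + \omega_{s+1}$.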
In this case, the Gowers norms $\|\cdot\|_{U^{s+1}(V)}$ are closely tied to the space $\mathrm{Poly}_{\leq s}(V)$ of polynomials of degree at most $s$. Indeed, as noted in Exercise 20 of Notes 4, a function $f: V \to \mathbb{C}$ of $L^\infty(V)$ norm $1$ has $U^{s+1}(V)$ norm equal to $1$ if and only if $f = e(\phi)$ for some $\phi \in \mathrm{Poly}_{\leq s}(V)$; thus polynomials solve the "$100\%$ inverse problem" for the trivial inequality $\|f\|_{U^{s+1}(V)} \leq \|f\|_{L^\infty(V)}$. They are also a crucial component of the solution to the "$99\%$ inverse problem" and "$1\%$ inverse problem". For the former, we will soon show:
Proposition 1 ($99\%$ inverse theorem for $U^{s+1}(V)$) Let $f: V \to \mathbb{C}$ be such that $\|f\|_{L^\infty(V)} \leq 1$ and $\|f\|_{U^{s+1}(V)} \geq 1 - \epsilon$ for some $\epsilon > 0$. Then there exists $\phi \in \mathrm{Poly}_{\leq s}(V)$ such that $\|f - e(\phi)\|_{L^2(V)} = O(\epsilon^{c})$, where $c = c_s > 0$ is a constant depending only on $s$.
Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:
Exercise 1 (Converse to $99\%$ inverse theorem for $U^{s+1}(V)$) If $f: V \to \mathbb{C}$ is such that $\|f\|_{L^\infty(V)} \leq 1$ and $\|f - e(\phi)\|_{L^2(V)} \leq \epsilon$ for some $\phi \in \mathrm{Poly}_{\leq s}(V)$, then $\|f\|_{U^{s+1}(V)} \geq 1 - O(\epsilon^{c})$, where $c = c_s > 0$ is a constant depending only on $s$.
In the $1\%$ world, one no longer expects $f$ to be close to a polynomial phase. Instead, one expects $f$ to correlate with a polynomial phase. Indeed, one has
Lemma 2 (Converse to the $1\%$ inverse theorem for $U^{s+1}(V)$) If $f: V \to \mathbb{C}$ and $\phi \in \mathrm{Poly}_{\leq s}(V)$ are such that $\|f\|_{L^\infty(V)} \leq 1$ and $|\langle f, e(\phi) \rangle_{L^2(V)}| \geq \epsilon$, where $\epsilon > 0$, then $\|f\|_{U^{s+1}(V)} \geq \epsilon$.
Proof: From the definition of the $U^1$ norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has
$$ \|f\|_{U^{s+1}(V)} = \|f e(-\phi)\|_{U^{s+1}(V)} \geq \|f e(-\phi)\|_{U^1(V)} = |\langle f, e(\phi) \rangle_{L^2(V)}| $$
and the claim follows. $\Box$
In the high characteristic case at least, this can be reversed:
Theorem 3 ($1\%$ inverse theorem for $U^{s+1}(V)$) Suppose that $s < \mathrm{char}(\mathbb{F})$. If $f: V \to \mathbb{C}$ is such that $\|f\|_{L^\infty(V)} \leq 1$ and $\|f\|_{U^{s+1}(V)} \geq \epsilon$, then there exists $\phi \in \mathrm{Poly}_{\leq s}(V)$ such that $|\langle f, e(\phi) \rangle_{L^2(V)}| \gg_{s,\epsilon,\mathbb{F}} 1$.
This result is sometimes referred to as the inverse conjecture for the Gowers norm (in high, but bounded, characteristic). For small $s$, the claim is easy:
Exercise 2 Verify the cases $s = 0, 1$ of this theorem. (Hint: to verify the $s = 1$ case, use the Fourier-analytic identities $\|f\|_{U^2(V)} = (\sum_{\xi \in \hat{V}} |\hat{f}(\xi)|^4)^{1/4}$ and $\|f\|_{L^2(V)} = (\sum_{\xi \in \hat{V}} |\hat{f}(\xi)|^2)^{1/2}$, where $\hat{V}$ is the space of all homomorphisms $x \mapsto \xi \cdot x$ from $V$ to $\mathbb{R}/\mathbb{Z}$, and $\hat{f}(\xi) := \mathbb{E}_{x \in V} f(x) e(-\xi \cdot x)$ are the Fourier coefficients of $f$.)
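For the reader's convenience, here is a sketch of the standard $U^2$ argument (essentially the argument the hint is pointing towards): if $\|f\|_{L^\infty(V)} \leq 1$ and $\|f\|_{U^2(V)} \geq \epsilon$, then
$$ \epsilon^4 \leq \|f\|_{U^2(V)}^4 = \sum_{\xi \in \hat{V}} |\hat{f}(\xi)|^4 \leq \Big(\sup_{\xi \in \hat{V}} |\hat{f}(\xi)|^2\Big) \sum_{\xi \in \hat{V}} |\hat{f}(\xi)|^2 = \Big(\sup_{\xi \in \hat{V}} |\hat{f}(\xi)|^2\Big) \|f\|_{L^2(V)}^2 \leq \sup_{\xi \in \hat{V}} |\hat{f}(\xi)|^2, $$
so there exists $\xi$ with $|\langle f, e(\xi \cdot x) \rangle_{L^2(V)}| = |\hat{f}(\xi)| \geq \epsilon^2$; since $x \mapsto \xi \cdot x$ is a polynomial of degree at most $1$, this gives the $s=1$ case (with a correlation of $\epsilon^2$ rather than $\epsilon$).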
This conjecture for larger values of $s$ is more difficult to establish. The $s = 2$ case of the theorem was established by Ben Green and myself in the high characteristic case $\mathrm{char}(\mathbb{F}) > 2$; the low characteristic case $\mathrm{char}(\mathbb{F}) = 2$ was independently and simultaneously established by Samorodnitsky. The cases $s \geq 3$ in the high characteristic case were established in two stages, firstly using a modification of the Furstenberg correspondence principle, due to Ziegler and myself, to convert the problem to an ergodic theory counterpart, and then using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.
The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials in the above conjecture with the essentially equivalent space of classical polynomials. However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if $\mathbb{F} = \mathbb{F}_2$ and $s = 3$; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.
The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:
Theorem 4 (Szemerédi’s theorem for finite fields) Let $\mathbb{F} = \mathbb{F}_p$ be a finite field, let $\delta > 0$, and let $A \subset \mathbb{F}^n$ be such that $|A| \geq \delta |\mathbb{F}^n|$. If $n$ is sufficiently large depending on $p$ and $\delta$, then $A$ contains an (affine) line $\{x, x+r, x+2r, \ldots, x+(p-1)r\}$ for some $x, r \in \mathbb{F}^n$ with $r \neq 0$.
Exercise 3 Use Theorem 4 to establish the following generalisation: with the notation as above, if $k \geq 1$ and $n$ is sufficiently large depending on $p$, $\delta$, and $k$, then $A$ contains an affine $k$-dimensional subspace.
We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.
A (complex, semi-definite) inner product space is a complex vector space $V$ equipped with a sesquilinear form $\langle \cdot, \cdot \rangle: V \times V \to \mathbb{C}$ which is conjugate symmetric, in the sense that $\langle w, v \rangle = \overline{\langle v, w \rangle}$ for all $v, w \in V$, and non-negative in the sense that $\langle v, v \rangle \geq 0$ for all $v \in V$. By inspecting the non-negativity of $\langle v + \lambda w, v + \lambda w \rangle$ for complex numbers $\lambda$, one obtains the Cauchy-Schwarz inequality
$$ |\langle v, w \rangle| \leq \langle v, v \rangle^{1/2} \langle w, w \rangle^{1/2}; $$
if one then defines $\|v\| := \langle v, v \rangle^{1/2}$, one then quickly concludes the triangle inequality
$$ \|v + w\| \leq \|v\| + \|w\|, $$
which then soon implies that $\|\cdot\|$ is a semi-norm on $V$. If we make the additional assumption that the inner product $\langle \cdot, \cdot \rangle$ is positive definite, i.e. that $\langle v, v \rangle > 0$ whenever $v$ is non-zero, then this semi-norm becomes a norm. If $V$ is complete with respect to the metric $d(v,w) := \|v - w\|$ induced by this norm, then $V$ is called a Hilbert space.
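To spell out the Cauchy-Schwarz step (a standard computation, written with the convention that the form is linear in the first variable): assuming $\langle w, w \rangle > 0$, one expands
$$ 0 \leq \langle v + \lambda w, v + \lambda w \rangle = \langle v, v \rangle + \lambda \langle w, v \rangle + \overline{\lambda} \langle v, w \rangle + |\lambda|^2 \langle w, w \rangle $$
and substitutes $\lambda := -\langle v, w \rangle / \langle w, w \rangle$ to obtain $|\langle v, w \rangle|^2 \leq \langle v, v \rangle \langle w, w \rangle$; the degenerate case $\langle w, w \rangle = 0$ is handled by instead taking $\lambda := -t \langle v, w \rangle$ and letting $t \to +\infty$, which forces $\langle v, w \rangle = 0$.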
The above material is extremely standard, and can be found in any graduate real analysis course; I myself covered it here. But what is perhaps less well known (except inside the fields of additive combinatorics and ergodic theory) is that the above theory of classical Hilbert spaces is just the first case of a hierarchy of higher order Hilbert spaces, in which the binary inner product $f, g \mapsto \langle f, g \rangle$ is replaced with a $2^d$-ary inner product $(f_\omega)_{\omega \in \{0,1\}^d} \mapsto \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle$ that obeys an appropriate generalisation of the conjugate symmetry, sesquilinearity, and positive semi-definiteness axioms. Such inner products then obey a higher order Cauchy-Schwarz inequality, known as the Cauchy-Schwarz-Gowers inequality, and then also obey a triangle inequality and become semi-norms (or norms, if the inner product was non-degenerate). Examples of such norms and spaces include the Gowers uniformity norms $\|\cdot\|_{U^d(G)}$, the Gowers box norms $\|\cdot\|_{\Box^d(X_1 \times \cdots \times X_d)}$, and the Gowers-Host-Kra seminorms $\|\cdot\|_{U^d(X)}$; a more elementary example is the family of Lebesgue spaces $L^{2^d}(X)$ when the exponent is a power of two. They play a central role in modern additive combinatorics and in certain aspects of ergodic theory, particularly those relating to Szemerédi’s theorem (or its ergodic counterpart, the Furstenberg multiple recurrence theorem); they also arise in the regularity theory of hypergraphs (which is not unrelated to the other two topics).
A simple example to keep in mind here is the order two Hilbert space $L^4(X)$ on a measure space $X = (X, \mathcal{B}, \mu)$, where the inner product takes the form
$$ \langle f_{00}, f_{01}, f_{10}, f_{11} \rangle := \int_X f_{00}(x)\, \overline{f_{01}(x)}\, \overline{f_{10}(x)}\, f_{11}(x)\ d\mu(x). $$
In this brief note I would like to set out the abstract theory of such higher order Hilbert spaces. This is not new material, being already implicit in the breakthrough papers of Gowers and Host-Kra, but I just wanted to emphasise the fact that the material is abstract, and is not particularly tied to any explicit choice of norm so long as a certain axiom is satisfied. (Also, I wanted to write things down so that I would not have to reconstruct this formalism again in the future.) Unfortunately, the notation is quite heavy and the abstract axiom is a little strange; it may be that there is a better way to formulate things. In this particular case it does seem that a concrete approach is significantly clearer, but abstraction is at least possible.
Note: the discussion below is likely to be comprehensible only to readers who already have some exposure to the Gowers norms.
Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Localization of the eigenvalues and the necessity of four moments”, submitted to Probability Theory and Related Fields. This paper concerns the distribution of the eigenvalues
$$ \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n) $$
of a Wigner random matrix $M_n$. More specifically, we consider $n \times n$ Hermitian random matrices whose entries have mean zero and variance one, with the upper-triangular portion of the matrix independent, with the diagonal elements iid, and the real and imaginary parts of the strictly upper-triangular portion of the matrix iid. For technical reasons we also assume that the distribution of the coefficients decays exponentially or better. Examples of Wigner matrices include the Gaussian Unitary Ensemble (GUE) and random symmetric complex Bernoulli matrices (which equal $\pm 1$ on the diagonal, and $\pm \frac{1}{\sqrt{2}} \pm \frac{i}{\sqrt{2}}$ off the diagonal). The Gaussian Orthogonal Ensemble (GOE) is also an example once one makes the minor change of setting the diagonal entries to have variance two instead of one.
The most fundamental theorem about the distribution of these eigenvalues is the Wigner semi-circular law, which asserts that (almost surely) one has
$$ \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} \to \rho_{sc}(x)\ dx $$
(in the vague topology) where $\rho_{sc}(x) := \frac{1}{2\pi} (4 - x^2)_+^{1/2}$ is the semicircular distribution. (See these lecture notes on this blog for more discussion of this law.)
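As a quick numerical sanity check of the semicircular law (this is only an illustration, not taken from the paper; the choice of a real symmetric random sign ensemble and all parameters below are arbitrary), one can compare a histogram of the normalised eigenvalues with the semicircular density:

```python
import numpy as np

# Sample a real symmetric random sign matrix (a Wigner-type ensemble with
# mean zero, variance one entries) and compare the empirical distribution of
# the normalised eigenvalues lambda_i / sqrt(n) with the semicircular density
# (1/(2*pi)) * sqrt(4 - x^2) on [-2, 2].
n = 2000
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=(n, n))
M = np.triu(signs) + np.triu(signs, 1).T          # symmetrise the matrix
evals = np.linalg.eigvalsh(M) / np.sqrt(n)        # normalised eigenvalues

hist, edges = np.histogram(evals, bins=60, range=(-2.5, 2.5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
semicircle = np.sqrt(np.clip(4.0 - centers**2, 0.0, None)) / (2.0 * np.pi)

# The sup-norm discrepancy should be small, and shrink as n grows.
print("max discrepancy:", np.abs(hist - semicircle).max())
```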
One can phrase this law in a number of equivalent ways. For instance, in the bulk region $\epsilon n \leq i \leq (1-\epsilon) n$, one almost surely has
$$ \lambda_i(M_n) = \sqrt{n}\, \gamma_{i/n} + o(\sqrt{n}) \ \ \ \ \ (1) $$
uniformly for $i$ in this range, where the classical location $\gamma_{i/n} \in [-2,2]$ of the (normalised) $i^{th}$ eigenvalue $\frac{1}{\sqrt{n}} \lambda_i(M_n)$ is defined by the formula
$$ \int_{-2}^{\gamma_{i/n}} \rho_{sc}(x)\ dx = \frac{i}{n}. $$
The bound (1) also holds in the edge case (by using the operator norm bound $\|M_n\|_{op} = (2+o(1)) \sqrt{n}$, due to Bai and Yin), but for sake of exposition we shall restrict attention here only to the bulk case.
From (1) we see that the semicircular law controls the eigenvalues at the coarse scale of $\sqrt{n}$. There has been a significant amount of work in the literature in obtaining control at finer scales, and in particular at the scale of the average eigenvalue spacing, which is of the order of $n^{-1/2}$. For instance, we now have a universal limit theorem for the normalised eigenvalue spacing $\sqrt{n}(\lambda_{i+1}(M_n) - \lambda_i(M_n))$ in the bulk for all Wigner matrices, a result of Erdos, Ramirez, Schlein, Vu, Yau, and myself. One tool for this is the four moment theorem of Van and myself, which roughly speaking shows that the behaviour of the eigenvalues at the scale $n^{-1/2}$ (and even at the slightly finer scale of $n^{-1/2-c}$ for some absolute constant $c > 0$) depends only on the first four moments of the matrix entries. There is also a slight variant, the three moment theorem, which asserts that the behaviour of the eigenvalues at the slightly coarser scale of $n^{-1/2+\epsilon}$ depends only on the first three moments of the matrix entries.
It is natural to ask whether these moment conditions are necessary. From the result of Erdos, Ramirez, Schlein, Vu, Yau, and myself, it is known that to control the eigenvalue spacing at the critical scale $n^{-1/2}$, no knowledge of any moments beyond the second (i.e. beyond the mean and variance) is needed. So it is natural to conjecture that the same is true for the eigenvalues themselves.

The main result of this paper is to show that this is not the case; that at the critical scale $n^{-1/2}$, the distribution of the eigenvalues $\lambda_i(M_n)$ is sensitive to the fourth moment, and so the hypothesis of the four moment theorem cannot be relaxed.
Heuristically, the reason for this is easy to explain. One begins with an inspection of the expected fourth moment
$$ \mathbb{E} \frac{1}{n} \sum_{i=1}^n \lambda_i(M_n)^4 = \mathbb{E} \frac{1}{n} \operatorname{tr} M_n^4. $$
A standard moment method computation shows that the right hand side is equal to $2n^2$ plus lower order terms, with the $O(n)$ correction term depending linearly on $a$, where $a$ is the fourth moment of the real part of the off-diagonal coefficients of $M_n$. In particular, a change in the fourth moment $a$ by $O(1)$ leads to a change in the expression $\mathbb{E} \frac{1}{n} \sum_{i=1}^n \lambda_i(M_n)^4$ by $O(n)$. Thus, for a typical $i$, one expects $\lambda_i(M_n)^4$ to shift by $O(n)$; since $\lambda_i(M_n) = O(\sqrt{n})$ on the average, we thus expect $\lambda_i(M_n)$ itself to shift by about $O(n^{-1/2})$ by the mean-value theorem.
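To spell out that final mean-value step (an informal order-of-magnitude computation):
$$ \delta \lambda_i \approx \frac{\delta(\lambda_i^4)}{4 \lambda_i^3} \approx \frac{O(n)}{n^{3/2}} = O(n^{-1/2}), $$
which is exactly the critical scale of the average eigenvalue spacing in the bulk.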
To make this rigorous, one needs a sufficiently strong concentration of measure result for $\lambda_i(M_n)$ that keeps it close to its mean value. There are already a number of such results in the literature. For instance, Guionnet and Zeitouni showed that $\lambda_i(M_n)$ was sharply concentrated around an interval of size $O(n^{\epsilon})$ around its mean for any $\epsilon > 0$ (in the sense that the probability that one was outside this interval was exponentially small). In one of my papers with Van, we showed that $\lambda_i(M_n)$ was also weakly concentrated around an interval of size $O(n^{-c})$ around its mean, in the sense that the probability that one was outside this interval was $O(n^{-c})$, for some absolute constant $c > 0$. Finally, if one made an additional log-Sobolev hypothesis on the entries, it was shown by Erdos, Yau, and Yin that the average variance of $\lambda_i(M_n)$ as $i$ varied from $1$ to $n$ was of the size of $O(n^{-c})$ for some absolute constant $c > 0$.
As it turns out, the first two concentration results are not sufficient to justify the previous heuristic argument. The Erdos-Yau-Yin argument suffices, but requires a log-Sobolev hypothesis. In our paper, we argue differently, using the three moment theorem (together with the theory of the eigenvalues of GUE, which is extremely well developed) to show that the variance of each individual $\lambda_i(M_n)$ is $O(n^{-c})$ for some absolute constant $c > 0$ (without averaging in $i$). No log-Sobolev hypothesis is required, but instead we need to assume that the third moment of the coefficients vanishes (because we want to use the three moment theorem to compare the Wigner matrix to GUE, and the coefficients of the latter have a vanishing third moment). From this we are able to make the previous arguments rigorous, and show that the mean $\mathbb{E} \lambda_i(M_n)$ is indeed sensitive to the fourth moment of the entries at the critical scale $n^{-1/2}$.
One curious feature of the analysis is how differently the median and the mean of the eigenvalue $\lambda_i(M_n)$ react to the available technology. To control the global behaviour of the eigenvalues (after averaging in $i$), it is much more convenient to use the mean, and we have very precise control on global averages of these means thanks to the moment method. But to control local behaviour, it is the median which is much better controlled. For instance, we can localise the median of $\lambda_i(M_n)$ to an interval of size $O(n^{-c})$ for some absolute constant $c > 0$, but can only localise the mean to a much larger interval of size $O(n^{-c'})$ for a much smaller constant $c' > 0$. Ultimately, this is because with our current technology there is a possible exceptional event of probability as large as $n^{-c'}$ for which all eigenvalues could deviate as far as $n^{o(1)}$ from their expected location, instead of their typical deviation of $O(n^{-c})$. The reason for this is technical, coming from the fact that the four moment theorem method breaks down when two eigenvalues are very close together (less than $n^{-c'}$ times the average eigenvalue spacing), and so one has to cut out this event, which occurs with a probability of the shape $n^{-c'}$. It may be possible to improve the four moment theorem proof to be less sensitive to eigenvalue near-collisions, in which case the above bounds are likely to improve.
Emmanuel Breuillard, Ben Green, and I have just uploaded to the arXiv our paper “Approximate subgroups of linear groups”, submitted to GAFA. This paper contains the first part of the results announced previously by us; the second part of these results, concerning expander groups, will appear subsequently. The release of this paper has been coordinated with the release of a parallel paper by Pyber and Szabo (previously announced within an hour(!) of our own announcement).
Our main result describes (with polynomial accuracy) the “sufficiently Zariski dense” approximate subgroups of simple algebraic groups $G(k)$, or more precisely absolutely almost simple algebraic groups over a field $k$, such as $SL_d(k)$. More precisely, define a $K$-approximate subgroup of a genuine group $G$ to be a finite symmetric neighbourhood of the identity $A$ (thus $1 \in A$ and $A^{-1} = A$) such that the product set $A \cdot A$ can be covered by $K$ left-translates (and equivalently, $K$ right-translates) of $A$.
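As a standard elementary example to keep in mind (not specific to the linear groups considered in this paper): in the additive group $\mathbb{Z}$, the symmetric interval $A := \{-N, \ldots, N\}$ is a $2$-approximate subgroup, since
$$ A + A = \{-2N, \ldots, 2N\} \subseteq (A - N) \cup (A + N); $$
more generally, symmetric generalised arithmetic progressions of rank $r$ are $2^r$-approximate subgroups, while the $1$-approximate subgroups are precisely the genuine finite subgroups.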
Let $k$ be a field, and let $\overline{k}$ be its algebraic closure. For us, an absolutely almost simple algebraic group over $k$ is a linear algebraic group $G(k)$ defined over $k$ (i.e. an algebraic subvariety of $GL_d(k)$ for some $d$ with group operations given by regular maps) which is connected (i.e. irreducible), and such that the completion $G(\overline{k})$ has no proper normal subgroups of positive dimension (i.e. the only normal subgroups are either finite, or are all of $G(\overline{k})$). To avoid degeneracies we also require $G$ to be non-abelian (i.e. not one-dimensional). These groups can be classified in terms of their associated finite-dimensional simple complex Lie algebra, which of course is determined by its Dynkin diagram, together with a choice of weight lattice (and there are only finitely many such choices once the Lie algebra is fixed). However, the exact classification of these groups is not directly used in our work.
Our first main theorem classifies the approximate subgroups of such a group $G(k)$ in the model case when $A$ generates the entire group $G(k)$, and $k$ is finite; they are either very small or very large.
Theorem 1 (Approximate groups that generate) Let $G(k)$ be an absolutely almost simple algebraic group over $k$. If $k$ is finite and $A$ is a $K$-approximate subgroup of $G(k)$ that generates $G(k)$, then either $|A| \leq K^{O(1)}$ or $|A| \geq K^{-O(1)} |G(k)|$, where the implied constants depend only on $G$.
The hypothesis that $A$ generates $G(k)$ cannot be removed completely, since if $A$ was a proper subgroup of $G(k)$ of size intermediate between that of the trivial group and of $G(k)$, then the conclusion would fail (with $K = 1$). However, one can relax the hypothesis of generation to that of being sufficiently Zariski-dense in $G(k)$. More precisely, we have
Theorem 2 (Zariski-dense approximate groups) Let $G(k)$ be an absolutely almost simple algebraic group over $k$. If $A \subset G(k)$ is a $K$-approximate group which is not contained in any proper algebraic subgroup of $G$ of complexity at most $M$ (where $M$ is sufficiently large depending on $G$), then either $|A| \leq K^{O(1)}$ or $|A| \geq K^{-O(1)} |\langle A \rangle|$, where the implied constants depend only on $G$ and $\langle A \rangle$ is the group generated by $A$.
Here, we say that an algebraic variety has complexity at most $M$ if it can be cut out of an ambient affine or projective space of dimension at most $M$ by using at most $M$ polynomials, each of degree at most $M$. (Note that this is not an intrinsic notion of complexity, but will depend on how one embeds the algebraic variety into an ambient space; but we are assuming that our algebraic group $G$ is a linear group and thus comes with such an embedding.)
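For instance, with this convention the group $SL_d$ has complexity at most $d^2$: it sits inside the affine space of $d \times d$ matrices, which has dimension $d^2$, and is cut out by the single equation
$$ \det(g) = 1, $$
a polynomial of degree $d$ in the matrix entries.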
In the case when $k = \mathbb{C}$, the second option of this theorem cannot occur since $\langle A \rangle$ is infinite, leading to a satisfactory classification of the Zariski-dense approximate subgroups of almost simple connected algebraic groups over $\mathbb{C}$. On the other hand, every approximate subgroup of $GL_d(\mathbb{C})$ is Zariski-dense in some algebraic subgroup, which can then be split as an extension of a semisimple algebraic quotient group by a solvable algebraic group (the radical of the Zariski closure). Pursuing this idea (and glossing over some annoying technical issues relating to connectedness), together with the Freiman theory for solvable groups over $\mathbb{C}$ due to Breuillard and Green, we obtain our third theorem:
Theorem 3 (Freiman’s theorem in $GL_d(\mathbb{C})$) Let $A$ be a $K$-approximate subgroup of $GL_d(\mathbb{C})$. Then there exists a nilpotent $K^{O_d(1)}$-approximate subgroup $B$ of size at most $K^{O_d(1)} |A|$, such that $A$ is covered by $K^{O_d(1)}$ translates of $B$.
This can be compared with Gromov’s celebrated theorem that any finitely generated group of polynomial growth is virtually nilpotent. Indeed, the above theorem easily implies Gromov’s theorem in the case of finitely generated subgroups of $GL_d(\mathbb{C})$.
By fairly standard arguments, the above classification theorems for approximate groups can be used to give bounds on the expansion and diameter of Cayley graphs; for instance, one can establish a conjecture of Babai and Seress that connected Cayley graphs on absolutely almost simple groups over a finite field have diameter at most polylogarithmic in the order of the group. Applications to expanders include the result on Suzuki groups mentioned in a previous post; further applications will appear in a forthcoming paper.
Apart from the general structural theory of algebraic groups, and some quantitative analogues of the basic theory of algebraic geometry (which we chose to obtain via ultrafilters, as discussed in this post), we rely on two basic tools. Firstly, we use a version of the pivot argument developed first by Konyagin and Bourgain-Glibichuk-Konyagin in the setting of sum-product estimates, and generalised to more non-commutative settings by Helfgott; this is discussed in this previous post. Secondly, we adapt an argument of Larsen and Pink (which we learned from a paper of Hrushovski) to obtain a sharp bound on the extent to which a sufficiently Zariski-dense approximate group can concentrate in a (bounded complexity) subvariety; this is discussed at the end of this blog post.
In the previous lectures, we have focused mostly on the equidistribution of polynomials, or on linear patterns, on a subset of the integers $\mathbb{Z}$, and in particular on intervals $[N]$. The integers are of course a very important domain to study in additive combinatorics; but there are also other fundamental model examples of domains to study. One of these is that of a vector space $V$ over a finite field $\mathbb{F}$ of prime order. Such domains are of interest in computer science (particularly when $\mathbb{F} = \mathbb{F}_2$) and also in number theory; but they also serve as an important simplified “dyadic model” for the integers. See this survey article of Green for further discussion of this point.
The additive combinatorics of the integers $\mathbb{Z}$, and of vector spaces $V$ over finite fields, are analogous, but not quite identical. For instance, the analogue of an arithmetic progression in $\mathbb{Z}$ is a subspace of $V$. In many cases, the finite field theory is a little bit simpler than the integer theory; for instance, subspaces are closed under addition, whereas arithmetic progressions are only “almost” closed under addition in various senses. (For instance, $[N]$ is closed under addition approximately half of the time.) However, there are some ways in which the integers are better behaved. For instance, because the integers can be generated by a single generator, a homomorphism from $\mathbb{Z}$ to some other group $G$ can be described by a single group element $g$: $n \mapsto g^n$. However, to specify a homomorphism from a vector space $V$ to $G$ one would need to specify one group element for each dimension of $V$. Thus we see that there is a tradeoff when passing from $\mathbb{Z}$ (or $[N]$) to a vector space model; one gains a bounded torsion property, at the expense of conceding the bounded generation property. (Of course, if one wants to deal with arbitrarily large domains, one has to concede one or the other; the only additive groups that have both bounded torsion and boundedly many generators, are bounded.)
The starting point for this course (Notes 1) was the study of equidistribution of polynomials $P: \mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ from the integers to the unit circle. We now turn to the parallel theory of equidistribution of polynomials $P: V \to \mathbb{R}/\mathbb{Z}$ from vector spaces over finite fields to the unit circle. Actually, for simplicity we will mostly focus on the classical case, when the polynomials in fact take values in the $p^{th}$ roots of unity $\frac{1}{p}\mathbb{Z}/\mathbb{Z}$ (where $p$ is the characteristic of the field $\mathbb{F}$). As it turns out, the non-classical case is also of importance (particularly in low characteristic), but the theory is more difficult; see these notes for some further discussion.
Michael Nielsen has posted a draft of his article “Introduction to the Polymath Project and “Density Hales-Jewett and Moser Numbers”” for comments, which will precede our own polymath article in the Szemerédi birthday conference proceedings.
As an unrelated link, Jordan Ellenberg has just written a piece for the Washington Post on statistical sampling and how it could make the US census more accurate. (Whether this is politically viable is, of course, a different matter.)