You are currently browsing the monthly archive for June 2014.
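As laid out in the foundational work of Kolmogorov, a classical probability space (or probability space for short) is a triplet $(X, \mathcal{X}, \mu)$, where $X$ is a set, $\mathcal{X}$ is a $\sigma$-algebra of subsets of $X$, and $\mu: \mathcal{X} \to [0,1]$ is a countably additive probability measure on $\mathcal{X}$. Given such a space, one can form a number of interesting function spaces, including
- the (real) Hilbert space $L^2(X, \mathcal{X}, \mu)$ of square-integrable functions $f: X \to \mathbb{R}$, modulo $\mu$-almost everywhere equivalence, and with the positive definite inner product $\langle f, g\rangle_{L^2(\mu)} := \int_X fg\ d\mu$; and
- the unital commutative Banach algebra $L^\infty(X, \mathcal{X}, \mu)$ of essentially bounded functions $f: X \to \mathbb{R}$, modulo $\mu$-almost everywhere equivalence, with $\|f\|_{L^\infty(\mu)}$ defined as the essential supremum of $|f|$.
There is also a trace $\tau_\mu: L^\infty(X, \mathcal{X}, \mu) \to \mathbb{R}$ defined by integration: $\tau_\mu(f) := \int_X f\ d\mu$.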
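One can form the category $\mathbf{Prb}$ of classical probability spaces, by defining a morphism $\phi: (X_1, \mathcal{X}_1, \mu_1) \to (X_2, \mathcal{X}_2, \mu_2)$ between probability spaces to be a function $\phi: X_1 \to X_2$ which is measurable (thus $\phi^{-1}(E) \in \mathcal{X}_1$ for all $E \in \mathcal{X}_2$) and measure-preserving (thus $\mu_1(\phi^{-1}(E)) = \mu_2(E)$ for all $E \in \mathcal{X}_2$).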
Let us now abstract the algebraic features of these spaces as follows; for want of a better name, I will refer to this abstraction as an algebraic probability space. It is very similar to the non-commutative probability spaces studied in this previous post, except that these spaces are now commutative (and real).
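Definition 1 An algebraic probability space is a pair $(\mathcal{A}, \tau)$ where
- $\mathcal{A}$ is a unital commutative real algebra;
- $\tau: \mathcal{A} \to \mathbb{R}$ is a linear functional (a homomorphism of real vector spaces) such that $\tau(1) = 1$ and $\tau(f^2) \geq 0$ for all $f \in \mathcal{A}$;
- Every element $f$ of $\mathcal{A}$ is bounded in the sense that $\sup_{k \geq 1} \tau(f^{2k})^{1/2k} < \infty$. (Technically, this isn’t an algebraic property, but I need it for technical reasons.)
A morphism $\phi: (\mathcal{A}_1, \tau_1) \to (\mathcal{A}_2, \tau_2)$ is a homomorphism $\phi^*: \mathcal{A}_2 \to \mathcal{A}_1$ which is trace-preserving, in the sense that $\tau_1(\phi^* a) = \tau_2(a)$ for all $a \in \mathcal{A}_2$.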
For want of a better name, I’ll denote the category of algebraic probability spaces as $\mathbf{AlgPrb}$. One can view this category as the opposite category to that of (a subcategory of) the category of tracial commutative real algebras. One could emphasise this opposite nature by denoting the algebraic probability space as $(\mathcal{A}, \tau)^{op}$ rather than $(\mathcal{A}, \tau)$; another suggestive (but slightly inaccurate) notation, inspired by the language of schemes, would be $\mathrm{Spec}(\mathcal{A}, \tau)$ rather than $(\mathcal{A}, \tau)$. However, we will not adopt these conventions here, and refer to algebraic probability spaces just by the pair $(\mathcal{A}, \tau)$.
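By the previous discussion, we have a covariant functor $\mathbf{Alg}: \mathbf{Prb} \to \mathbf{AlgPrb}$ that takes a classical probability space $(X, \mathcal{X}, \mu)$ to its algebraic counterpart $(L^\infty(X, \mathcal{X}, \mu), \tau_\mu)$, with a morphism $\phi: (X_1, \mathcal{X}_1, \mu_1) \to (X_2, \mathcal{X}_2, \mu_2)$ of classical probability spaces mapping to a morphism $\mathbf{Alg}(\phi): (L^\infty(X_1, \mathcal{X}_1, \mu_1), \tau_{\mu_1}) \to (L^\infty(X_2, \mathcal{X}_2, \mu_2), \tau_{\mu_2})$ of the corresponding algebraic probability spaces by the formula
$$ \mathbf{Alg}(\phi)^* f := f \circ \phi $$
for $f \in L^\infty(X_2, \mathcal{X}_2, \mu_2)$. One easily verifies that this is a functor.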
In this post I would like to describe a functor $\mathbf{Prb}: \mathbf{AlgPrb} \to \mathbf{Prb}$ which partially inverts $\mathbf{Alg}$ (up to natural isomorphism), that is to say a recipe for starting with an algebraic probability space $(\mathcal{A}, \tau)$ and producing a classical probability space $(X, \mathcal{X}, \mu)$. This recipe is not new: it is basically the (commutative) Gelfand-Naimark-Segal construction (discussed in this previous post) combined with the Loomis-Sikorski theorem (discussed in this previous post). However, I wanted to put the construction in a single location for sake of reference. I also wanted to make the point that $\mathbf{Alg}$ and $\mathbf{Prb}$ are not complete inverses; there is a bit of information in the algebraic probability space (e.g. topological information) which is lost when passing back to the classical probability space. In some future posts, I would like to develop some ergodic theory using the algebraic foundations of probability theory rather than the classical foundations; this turns out to be convenient in the ergodic theory arising from nonstandard analysis (such as that described in this previous post), in which the groups involved are uncountable and the underlying spaces are not standard Borel spaces.
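Let us describe how to construct the functor $\mathbf{Prb}$, with details postponed to below the fold.
- Starting with an algebraic probability space $(\mathcal{A}, \tau)$, form an inner product on $\mathcal{A}$ by the formula $\langle f, g \rangle_{L^2(\tau)} := \tau(fg)$, and also form the spectral radius $\rho(f) := \lim_{k \to \infty} \tau(f^{2^k})^{1/2^k}$.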
- The inner product is clearly positive semi-definite. Quotienting out the null vectors and taking completions, we arrive at a real Hilbert space $L^2 = L^2(\mathcal{A}, \tau)$, to which the trace $\tau$ may be extended.
- Somewhat less obviously, the spectral radius is well-defined and gives a norm on $\mathcal{A}$. Taking $L^2$ limits of sequences in $\mathcal{A}$ of bounded spectral radius gives us a subspace $L^\infty = L^\infty(\mathcal{A}, \tau)$ of $L^2$ that has the structure of a real commutative Banach algebra.
- The idempotents $1_E$ of the Banach algebra $L^\infty$ may be indexed by elements $E$ of an abstract $\sigma$-algebra $\mathcal{B}$.
- The Boolean algebra homomorphisms $\delta_x: \mathcal{B} \to \{0,1\}$ (or equivalently, the real algebra homomorphisms $\iota_x: L^\infty \to \mathbb{R}$) may be indexed by elements $x$ of a space $X$.
- Let $\mathcal{X}$ denote the $\sigma$-algebra on $X$ generated by the basic sets $\overline{E} := \{ x \in X: \delta_x(E) = 1 \}$ for every $E \in \mathcal{B}$.
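- Let $\mathcal{N}$ be the $\sigma$-ideal of $\mathcal{X}$ generated by the sets $\bigcap_n \overline{E_n}$, where $E_n \in \mathcal{B}$ is a sequence with $\bigwedge_n E_n = 0$ in $\mathcal{B}$.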
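- One verifies that $\mathcal{B}$ is isomorphic to $\mathcal{X}/\mathcal{N}$. Using this isomorphism, the trace $\tau$ on $L^\infty$ can be used to construct a countably additive measure $\mu$ on $\mathcal{X}$. The classical probability space $\mathbf{Prb}(\mathcal{A}, \tau)$ is then $(X, \mathcal{X}, \mu)$, and the abstract spaces $L^2, L^\infty$ may now be identified with their concrete counterparts $L^2(X, \mathcal{X}, \mu)$, $L^\infty(X, \mathcal{X}, \mu)$.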
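- Every algebraic probability space morphism $\phi: (\mathcal{A}_1, \tau_1) \to (\mathcal{A}_2, \tau_2)$ generates a classical probability morphism $\mathbf{Prb}(\phi): (X_1, \mathcal{X}_1, \mu_1) \to (X_2, \mathcal{X}_2, \mu_2)$ via the formula $\delta_{\mathbf{Prb}(\phi)(x_1)}(E_2) = \delta_{x_1}(\phi^* E_2)$, using a pullback operation $\phi^*: \mathcal{B}_2 \to \mathcal{B}_1$ on the abstract $\sigma$-algebras that can be defined by density.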
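To see what the spectral radius $\rho(f) := \lim_{k \to \infty} \tau(f^{2^k})^{1/2^k}$ measures, here is a minimal Python sketch under an illustrative assumption not taken from the text: $\mathcal{A}$ is the algebra of real functions on a four-point space, and $\tau(f) = \sum_i p_i f(i)$ for hypothetical weights $p_i$, one of which is zero. The iterates increase (they are power means) towards the largest value of $|f|$ on the support of the weights, i.e. the essential supremum. As a result, the value of $f$ at the point of zero weight is invisible to $\rho$.

```python
def trace(weights, f):
    """tau(f) = sum_i p_i f(i): integration against a finite probability vector."""
    return sum(p * x for p, x in zip(weights, f))


def spectral_radius_iterate(weights, f, k):
    """Return tau(f^(2^k))^(1/2^k); these values increase in k towards rho(f)."""
    e = 2 ** k
    return trace(weights, [x ** e for x in f]) ** (1.0 / e)


# Hypothetical example: the fourth point carries zero weight (a null set).
weights = [0.5, 0.25, 0.25, 0.0]
f = [3.0, -1.0, 2.0, 5.0]

iterates = [spectral_radius_iterate(weights, f, k) for k in range(1, 9)]
for k, r in enumerate(iterates, start=1):
    print(k, round(r, 4))
# The iterates approach 3 = ess sup |f|; the value 5 on the null point is ignored.
```

Even exponents $2^k$ with $k \geq 1$ keep every power non-negative, so the fractional root is always real. Stopping at $k = 8$ keeps $5^{256}$ within floating-point range.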
Remark 1 The classical probability space $(X, \mathcal{X}, \mu)$ constructed by the functor $\mathbf{Prb}$ has some additional structure; namely $X$ is a $\sigma$-Stone space (a Stone space with the property that the closure of any countable union of clopen sets is clopen), $\mathcal{X}$ is the Baire $\sigma$-algebra (generated by the clopen sets), and the null sets are the meager sets. However, we will not use this additional structure here.
The partial inversion relationship between the functors $\mathbf{Alg}$ and $\mathbf{Prb}$ is given by the following assertion:
- There is a natural transformation from $\mathbf{Alg} \circ \mathbf{Prb}: \mathbf{AlgPrb} \to \mathbf{AlgPrb}$ to the identity functor $\mathrm{ID}: \mathbf{AlgPrb} \to \mathbf{AlgPrb}$.
More informally: if one starts with an algebraic probability space $(\mathcal{A}, \tau)$ and converts it back into a classical probability space $(X, \mathcal{X}, \mu)$, then there is a trace-preserving algebra homomorphism of $\mathcal{A}$ to $L^\infty(X, \mathcal{X}, \mu)$, which respects morphisms of the algebraic probability space. While this relationship is far weaker than an equivalence of categories (which would require that $\mathbf{Alg} \circ \mathbf{Prb}$ and $\mathbf{Prb} \circ \mathbf{Alg}$ are both naturally isomorphic to the identity functors), it is still good enough to allow many ergodic theory problems formulated using classical probability spaces to be reformulated instead as an equivalent problem in algebraic probability spaces.
Remark 2 The opposite composition $\mathbf{Prb} \circ \mathbf{Alg}: \mathbf{Prb} \to \mathbf{Prb}$ is a little odd: it takes an arbitrary probability space $(X, \mathcal{X}, \mu)$ and returns a more complicated probability space $(X', \mathcal{X}', \mu')$, with $X'$ being the space of homomorphisms $\iota: L^\infty(X, \mathcal{X}, \mu) \to \mathbb{R}$. While there is “morally” an embedding of $X$ into $X'$ using the evaluation map, this map does not exist in general because points in $X$ may well have zero measure. However, if one takes a “pointless” approach and focuses just on the measure algebras $(\mathcal{X}, \mu)$, $(\mathcal{X}', \mu')$, then these algebras become naturally isomorphic after quotienting out by null sets.
Remark 3 An algebraic probability space captures a bit more structure than a classical probability space, because $\mathcal{A}$ may be identified with a proper subset of $L^\infty$ that describes the “regular” functions (or random variables) of the space. For instance, starting with the unit circle $\mathbb{R}/\mathbb{Z}$ (with the usual Haar measure and the usual trace $\tau(f) := \int_{\mathbb{R}/\mathbb{Z}} f(x)\ dx$), any unital subalgebra $\mathcal{A}$ of $L^\infty(\mathbb{R}/\mathbb{Z})$ that is dense in $L^2(\mathbb{R}/\mathbb{Z})$ will generate the same classical probability space $\mathbf{Prb}(\mathcal{A}, \tau)$ on applying the functor $\mathbf{Prb}$, namely one will get the space $\widehat{\mathbb{R}/\mathbb{Z}}$ of homomorphisms from $L^\infty(\mathbb{R}/\mathbb{Z})$ to $\mathbb{R}$ (with the measure induced from $\tau$). Thus for instance $\mathcal{A}$ could be the continuous functions $C(\mathbb{R}/\mathbb{Z})$, the Wiener algebra $A(\mathbb{R}/\mathbb{Z})$ or the full space $L^\infty(\mathbb{R}/\mathbb{Z})$, but the classical space $\mathbf{Prb}(\mathcal{A}, \tau)$ will be unable to distinguish these spaces from each other. In particular, the functor $\mathbf{Prb}$ loses information (roughly speaking, this functor takes an algebraic probability space and completes it to a von Neumann algebra, but then forgets exactly what algebra was initially used to create this completion). In ergodic theory, this sort of “extra structure” is traditionally encoded in topological terms, by assuming that the underlying probability space $X$ has a nice topological structure (e.g. a standard Borel space); however, with the algebraic perspective one has the freedom to have non-topological notions of extra structure, by choosing $\mathcal{A}$ to be something other than an algebra $C(X)$ of continuous functions on a topological space. I hope to discuss one such example of extra structure (coming from the Gowers-Host-Kra theory of uniformity seminorms) in a later blog post (this generalises the example of the Wiener algebra given previously, which is encoding “Fourier structure”).
A small example of how one could use the functors $\mathbf{Alg}, \mathbf{Prb}$ is as follows. Suppose one has a classical probability space $(X, \mathcal{X}, \mu)$ with a measure-preserving action $(T^\gamma)_{\gamma \in \Gamma}$ of an uncountable group $\Gamma$, which is only defined (and an action) up to almost everywhere equivalence; thus for instance for any set $E \in \mathcal{X}$ and any $\gamma, \gamma' \in \Gamma$, $T^{\gamma \gamma'} E$ and $T^\gamma T^{\gamma'} E$ might not be exactly equal, but only equal up to a null set. For similar reasons, an element $E$ of the invariant factor $\mathcal{X}^\Gamma$ might not be exactly invariant with respect to $\Gamma$, but instead one only has $T^\gamma E$ and $E$ equal up to null sets for each $\gamma \in \Gamma$. One might like to “clean up” the action of $\Gamma$ to make it defined everywhere, and a genuine action everywhere, but this is not immediately achievable if $\Gamma$ is uncountable, since the union of all the null sets where something bad occurs may cease to be a null set. However, by applying the functor $\mathbf{Alg}$, each shift $T^\gamma: X \to X$ defines a morphism $\mathbf{Alg}(T^\gamma)$ on the associated algebraic probability space (i.e. the Koopman operator), and then applying $\mathbf{Prb}$, we obtain a shift $\mathbf{Prb}(\mathbf{Alg}(T^\gamma))$ on a new classical probability space $(X', \mathcal{X}', \mu') := \mathbf{Prb}(\mathbf{Alg}(X, \mathcal{X}, \mu))$ which now gives a genuine measure-preserving action of $\Gamma$, and which is equivalent to the original action from a measure algebra standpoint. The invariant factor $\mathcal{X}'^\Gamma$ now consists of those sets in $\mathcal{X}'$ which are genuinely $\Gamma$-invariant, not just up to null sets. (Basically, the classical probability space $(X', \mathcal{X}', \mu')$ contains a Boolean algebra $\mathcal{B}'$ with the property that every measurable set $E \in \mathcal{X}'$ is equivalent up to null sets to precisely one set in $\mathcal{B}'$, allowing for a canonical “retraction” onto $\mathcal{B}'$ that eliminates all null set issues.)
More indirectly, the functors $\mathbf{Alg}, \mathbf{Prb}$ suggest that one should be able to develop a “pointless” form of ergodic theory, in which the underlying probability spaces are given algebraically rather than classically. I hope to give some more specific examples of this in later posts.
There are a number of ways to construct the real numbers $\mathbb{R}$, for instance
- as the metric completion of $\mathbb{Q}$ (thus, $\mathbb{R}$ is defined as the set of Cauchy sequences of rationals, modulo Cauchy equivalence);
- as the space of Dedekind cuts on the rationals $\mathbb{Q}$;
- as the space of quasimorphisms $\phi: \mathbb{Z} \to \mathbb{Z}$ on the integers, quotiented by bounded functions. (I believe this construction first appears in this paper of Street, who credits the idea to Schanuel, though the germ of this construction arguably goes all the way back to Eudoxus.)
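A quasimorphism is a map $\phi: \mathbb{Z} \to \mathbb{Z}$ for which the defect $\phi(m+n) - \phi(m) - \phi(n)$ is bounded. As an illustration of the third construction, the following Python sketch takes $n \mapsto \lfloor n\sqrt{2} \rfloor$ as a representative quasimorphism for $\sqrt{2}$. This choice of representative is ours and is only one of many. The sketch checks that the defect is at most $1$ and that the "slope" $\phi(n)/n$ recovers $\sqrt{2}$. It also checks that a bounded perturbation, which represents the same real number, still has bounded defect.

```python
import math


def q(n: int) -> int:
    """floor(n * sqrt(2)), computed exactly with integer square roots."""
    if n == 0:
        return 0
    r = math.isqrt(2 * n * n)  # floor(|n| sqrt 2); 2 n^2 is never a perfect square
    return r if n > 0 else -r - 1  # floor of a negative irrational number


def defect(f, bound):
    """max |f(m+n) - f(m) - f(n)| over |m|, |n| <= bound."""
    rng = range(-bound, bound + 1)
    return max(abs(f(m + n) - f(m) - f(n)) for m in rng for n in rng)


def q2(n: int) -> int:
    """A bounded perturbation of q; it represents the same real number."""
    return q(n) + (n % 3)


print(defect(q, 100))        # 1, since floor(x + y) - floor(x) - floor(y) is 0 or 1
print(q(10**12) / 10**12)    # approximately sqrt(2) = 1.41421356...
print(defect(q2, 100))       # still bounded (at most 4 here)
```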
There is also a fourth family of constructions that proceeds via nonstandard analysis, as a special case of what is known as the nonstandard hull construction. (Here I will assume some basic familiarity with nonstandard analysis and ultraproducts, as covered for instance in this previous blog post.) Given an unbounded nonstandard natural number $N \in {}^*\mathbb{N} \backslash \mathbb{N}$, one can define two external additive subgroups of the nonstandard integers ${}^*\mathbb{Z}$:
- The group $O(N) := \{ n \in {}^*\mathbb{Z}: |n| \leq CN \text{ for some standard } C > 0 \}$ of all nonstandard integers of magnitude less than or comparable to $N$; and
- The group $o(N) := \{ n \in {}^*\mathbb{Z}: |n| \leq cN \text{ for all standard } c > 0 \}$ of nonstandard integers of magnitude infinitesimally smaller than $N$.
The group $o(N)$ is a subgroup of $O(N)$, so we may form the quotient group $O(N)/o(N)$. This space is isomorphic to the reals $\mathbb{R}$, and can in fact be used to construct the reals:
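Proposition 1 For any coset $n + o(N)$ of $O(N)/o(N)$, there is a unique real number $\mathrm{st} \frac{n}{N}$ with the property that $\frac{n}{N} = \mathrm{st} \frac{n}{N} + o(1)$. The map $n + o(N) \mapsto \mathrm{st} \frac{n}{N}$ is then an isomorphism between the additive groups $O(N)/o(N)$ and $\mathbb{R}$.
Proof: Uniqueness is clear. For existence, observe that the set $\{ q \in \mathbb{Q}: qN < n \}$ is a Dedekind cut, and its supremum can be verified to have the required properties for $\mathrm{st} \frac{n}{N}$.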
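In a similar vein, we can view the unit interval $[0,1]$ in the reals as the quotient
$$ [0,1] \equiv [N] / o(N) \tag{1}$$
where $[N]$ is the nonstandard (i.e. internal) set $\{ n \in {}^*\mathbb{N}: 1 \leq n \leq N \}$; of course, $[N]$ is not a group, so one should interpret $[N]/o(N)$ as the image of $[N]$ under the quotient map ${}^*\mathbb{Z} \to {}^*\mathbb{Z}/o(N)$ (or $O(N) \to O(N)/o(N)$, if one prefers). Or to put it another way, (1) asserts that $[0,1]$ is the image of $[N]$ with respect to the map $\pi: n \mapsto \mathrm{st} \frac{n}{N}$.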
In this post I would like to record a nice measure-theoretic version of the equivalence (1), which essentially appears already in standard texts on Loeb measure (see e.g. this text of Cutland). To describe the results, we must first quickly recall the construction of Loeb measure on $[N]$. Given an internal subset $A$ of $[N]$, we may define the elementary measure $\mu_0(A)$ of $A$ by the formula
$$ \mu_0(A) := \mathrm{st} \frac{|A|}{N}.$$
This is a finitely additive probability measure on the Boolean algebra of internal subsets of $[N]$. We can then construct the Loeb outer measure $\mu^*(E)$ of any subset $E \subset [N]$ in complete analogy with Lebesgue outer measure by the formula
$$ \mu^*(E) := \inf \sum_{n=1}^\infty \mu_0(A_n)$$
where $(A_n)_{n=1}^\infty$ ranges over all sequences of internal subsets of $[N]$ that cover $E$. We say that a subset $E$ of $[N]$ is Loeb measurable if, for any (standard) $\varepsilon > 0$, one can find an internal subset $A$ of $[N]$ which differs from $E$ by a set of Loeb outer measure at most $\varepsilon$, and in that case we define the Loeb measure $\mu(E)$ of $E$ to be $\mu^*(E)$. It is a routine matter to show (e.g. using the Carathéodory extension theorem) that the space $\mathcal{L}$ of Loeb measurable sets is a $\sigma$-algebra, and that $\mu$ is a countably additive probability measure on this space that extends the elementary measure $\mu_0$. Thus $[N]$ now has the structure of a probability space $([N], \mathcal{L}, \mu)$.
Now, the group $o(N)$ acts (Loeb-almost everywhere) on the probability space $([N], \mathcal{L}, \mu)$ by the addition map, thus $T^h n := n + h$ for $n \in [N]$ and $h \in o(N)$ (excluding a set of Loeb measure zero where $n + h$ exits $[N]$). This action is clearly seen to be measure-preserving. As such, we can form the invariant factor $([N], \mathcal{L}^{o(N)}, \mu)$, defined by restricting attention to those Loeb measurable sets $E \subset [N]$ with the property that $T^h E$ is equal $\mu$-almost everywhere to $E$ for each $h \in o(N)$.
The claim is then that this invariant factor is equivalent (up to almost everywhere equivalence) to the unit interval $[0,1]$ with Lebesgue measure $m$ (and the trivial action of $o(N)$), by the same factor map $\pi: n \mapsto \mathrm{st} \frac{n}{N}$ used in (1). More precisely:
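Theorem 2 Given a set $E \in \mathcal{L}^{o(N)}$, there exists a Lebesgue measurable set $F \subset [0,1]$, unique up to $m$-a.e. equivalence, such that $E$ is $\mu$-a.e. equivalent to the set $\pi^{-1}(F) := \{ n \in [N]: \mathrm{st} \frac{n}{N} \in F \}$. Conversely, if $F \subset [0,1]$ is Lebesgue measurable, then $\pi^{-1}(F)$ is in $\mathcal{L}^{o(N)}$, and $\mu(\pi^{-1}(F)) = m(F)$.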
More informally, we have the measure-theoretic version
$$ ([0,1], m) \equiv ([N], \mathcal{L}^{o(N)}, \mu) $$
of (1).
Proof: We first prove the converse. It is clear that $\pi^{-1}(F)$ is $o(N)$-invariant, so it suffices to show that $\pi^{-1}(F)$ is Loeb measurable with Loeb measure $m(F)$. This is easily verified when $F$ is an elementary set (a finite union of intervals). By countable subadditivity of outer measure, this implies that the Loeb outer measure of $\pi^{-1}(E)$ is bounded by the Lebesgue outer measure of $E$ for any set $E \subset [0,1]$; since every Lebesgue measurable set differs from an elementary set by a set of arbitrarily small Lebesgue outer measure, the claim follows.
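The elementary-set case of the converse rests on a finitary counting fact. For a finite union $F$ of intervals, the proportion of $n \in \{1,\dots,N\}$ with $n/N \in F$ differs from $m(F)$ by less than (number of intervals)$/N$. Taking standard parts of this proportion gives $\mu_0 = m(F)$. The Python sketch below checks this exactly with rational arithmetic for a hypothetical $F = [1/5, 1/2) \cup [7/10, 3/4)$. It is only a finite shadow of the nonstandard statement, not a proof of it.

```python
from fractions import Fraction


def pullback_density(intervals, N):
    """|{n in [1, N] : n/N in F}| / N, for F a finite union of half-open intervals [a, b)."""
    count = sum(1 for n in range(1, N + 1)
                if any(a <= Fraction(n, N) < b for a, b in intervals))
    return Fraction(count, N)


def lebesgue(intervals):
    """Lebesgue measure of a disjoint finite union of intervals [a, b)."""
    return sum(b - a for a, b in intervals)


# Hypothetical elementary set F = [1/5, 1/2) u [7/10, 3/4), of measure 7/20.
F = [(Fraction(1, 5), Fraction(1, 2)), (Fraction(7, 10), Fraction(3, 4))]
for N in (10, 1000, 10**4):
    print(N, float(pullback_density(F, N)), float(lebesgue(F)))
```

Each interval $[a,b)$ contains $\lceil bN \rceil - \lceil aN \rceil$ integers, which is within $1$ of $(b-a)N$. This explains the error bound.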
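Now we establish the forward claim. Uniqueness is clear from the converse claim, so it suffices to show existence. Let $E \in \mathcal{L}^{o(N)}$. Let $\varepsilon > 0$ be an arbitrary standard real number; then we can find an internal set $A \subset [N]$ which differs from $E$ by a set of Loeb measure at most $\varepsilon$. As $E$ is $o(N)$-invariant, we conclude that for every $h \in o(N)$, $A$ and $A+h$ differ by a set of Loeb measure (and hence elementary measure) at most $2\varepsilon$. By the (contrapositive of the) underspill principle, there must exist a standard $\delta > 0$ such that $A$ and $A+h$ differ by a set of elementary measure at most $3\varepsilon$ for all $|h| \leq \delta N$. If we then set $H := \lfloor \delta N \rfloor$ and define the nonstandard function $f: [N] \to {}^*\mathbb{R}$ by the formula
$$ f(n) := \frac{1}{H} \sum_{h=1}^{H} 1_A(n+h)$$
then from the (nonstandard) triangle inequality we have
$$ \frac{1}{N} \sum_{n \in [N]} |f(n) - 1_A(n)| \leq 4\varepsilon$$
(say). On the other hand, $f$ has the Lipschitz continuity property
$$ |f(n) - f(m)| \leq \frac{|n-m|}{H}$$
and so in particular we see that
$$ \mathrm{st} f(n) = \tilde f\left( \mathrm{st} \frac{n}{N} \right)$$
for some Lipschitz continuous function $\tilde f: [0,1] \to [0,1]$. If we then let $F_\varepsilon$ be the set where $\tilde f \geq 1/2$, one can check that $\pi^{-1}(F_\varepsilon)$ differs from $A$ by a set of Loeb outer measure $O(\varepsilon)$, and hence $E$ does so also. Sending $\varepsilon$ to zero, we see (from the converse claim) that $1_{F_\varepsilon}$ is a Cauchy sequence in $L^1$ and thus converges in $L^1$ to $1_F$ for some Lebesgue measurable $F$. The sets $\pi^{-1}(F_\varepsilon)$ then converge in Loeb outer measure to $\pi^{-1}(F)$, giving the claim.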
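Thanks to the Lebesgue differentiation theorem, the conditional expectation $\mathbf{E}(f | \mathcal{L}^{o(N)})$ of a bounded Loeb-measurable function $f: [N] \to \mathbb{R}$ can be expressed (as a function on $[N]$, defined $\mu$-a.e.) as
$$ \mathbf{E}(f | \mathcal{L}^{o(N)})(n) = \lim_{\varepsilon \to 0} \frac{1}{2\varepsilon} \int_{[n - \varepsilon N, n + \varepsilon N]} f\ d\mu.$$
By the abstract ergodic theorem from the previous post, one can also view this conditional expectation as the element in the closed convex hull of the shifts $T^h f$, $h \in o(N)$, of minimal $L^2$ norm. In particular, we obtain a form of the von Neumann ergodic theorem in this context: the averages $\frac{1}{H} \sum_{h=1}^{H} T^h f$ for $H \in o(N)$ converge (as a net, rather than a sequence) in $L^2$ to $\mathbf{E}(f | \mathcal{L}^{o(N)})$.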
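If $f$ is (the standard part of) an internal function, that is to say the ultralimit of a sequence $f_n: [N_n] \to \mathbb{R}$ of finitary bounded functions, one can view the measurable function $\tilde f: [0,1] \to \mathbb{R}$ defined (via Theorem 2) by $\mathbf{E}(f | \mathcal{L}^{o(N)}) = \tilde f \circ \pi$ as a limit of the $f_n$ that is analogous to the “graphons” that emerge as limits of graphs (see e.g. the recent text of Lovasz on graph limits). Indeed, the measurable function $\tilde f$ is related to the discrete functions $f_n$ by the formula
$$ \int_a^b \tilde f(x)\ dx = \lim_{n \to p} \frac{1}{N_n} \sum_{a N_n \leq m \leq b N_n} f_n(m)$$
for all $0 \leq a < b \leq 1$, where $p$ is the nonprincipal ultrafilter used to define the nonstandard universe. In particular, from the Arzela-Ascoli diagonalisation argument there is a subsequence $n_j$ such that
$$ \int_a^b \tilde f(x)\ dx = \lim_{j \to \infty} \frac{1}{N_{n_j}} \sum_{a N_{n_j} \leq m \leq b N_{n_j}} f_{n_j}(m),$$
thus $\tilde f$ is the asymptotic density function of the $f_n$. For instance, if $f_n$ is the indicator function of a randomly chosen subset of $[N_n]$, then the asymptotic density function would equal $1/2$ (almost everywhere, at least).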
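The random-subset example can be checked numerically at a single finite scale. The sketch below is a finitary illustration only. It assumes $N = 10^5$ and a fixed random seed, both arbitrary choices. It computes the normalised window sums $\frac{1}{N}\sum_{aN \leq m \leq bN} f_N(m)$ for the indicator $f_N$ of a random subset and compares them with $\int_a^b \frac{1}{2}\, dx = (b-a)/2$.

```python
import math
import random


def window_density(indicator, a, b):
    """(1/N) * sum of indicator(m) over integers m in [1, N] with aN <= m <= bN."""
    N = len(indicator)
    lo = max(1, math.ceil(a * N))
    hi = min(N, math.floor(b * N))
    return sum(indicator[m - 1] for m in range(lo, hi + 1)) / N


rng = random.Random(2014)  # arbitrary fixed seed, for reproducibility
N = 10**5
f_N = [rng.randint(0, 1) for _ in range(N)]  # indicator of a random subset of [N]

for a, b in [(0.0, 1.0), (0.25, 0.5), (0.9, 0.95)]:
    print(a, b, window_density(f_N, a, b), (b - a) / 2)
```

The fluctuations are of size roughly $\sqrt{(b-a)N}/(2N)$, so at this scale they are already well below $10^{-2}$.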
I’m continuing to look into understanding the ergodic theory of $o(N)$ actions, as I believe this may allow one to apply ergodic theory methods to the “single-scale” or “non-asymptotic” setting (in which one averages only over scales comparable to a large parameter $N$, rather than the traditional asymptotic approach of letting the scale go to infinity). I’m planning some further posts in this direction, though this is still a work in progress.
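The von Neumann ergodic theorem (the Hilbert space version of the mean ergodic theorem) asserts that if $U: H \to H$ is a unitary operator on a Hilbert space $H$, and $v \in H$ is a vector in that Hilbert space, then one has
$$ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N U^n v = \pi_{H^U} v$$
in the strong topology, where $H^U := \{ w \in H: Uw = w \}$ is the $U$-invariant subspace of $H$, and $\pi_{H^U}$ is the orthogonal projection to $H^U$. (See e.g. these previous lecture notes for a proof.) The same proof extends to more general amenable groups: if $G$ is a countable amenable group acting on a Hilbert space $H$ by unitary transformations $T^g: H \to H$ for $g \in G$, and $v \in H$ is a vector in that Hilbert space, then one has
$$ \lim_{N \to \infty} \frac{1}{|\Phi_N|} \sum_{g \in \Phi_N} T^g v = \pi_{H^G} v \tag{1}$$
for any Følner sequence $\Phi_N$ of $G$, where $H^G := \{ w \in H: T^g w = w \text{ for all } g \in G \}$ is the $G$-invariant subspace. Thus one can interpret $\pi_{H^G} v$ as a certain average of elements of the orbit $Gv := \{ T^g v: g \in G \}$ of $v$.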
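Here is a finite-dimensional sanity check of the von Neumann ergodic theorem, as a minimal Python sketch. The operator is a hypothetical example chosen for transparency: a diagonal unitary $U e_j = e(\theta_j) e_j$ on $\mathbb{C}^3$ with $e(x) := e^{2\pi i x}$, one eigenvalue equal to $1$ and two irrational rotations. The invariant subspace $H^U$ is then the first coordinate axis. The averages $\frac{1}{N}\sum_{n=1}^N U^n v$ should approach the projection of $v$ onto it at rate $O(1/N)$.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def ergodic_average(thetas, v, N):
    """(1/N) sum_{n=1}^N U^n v for the diagonal unitary U e_j = e(theta_j) e_j."""
    return [vj * sum(e(n * t) for n in range(1, N + 1)) / N
            for t, vj in zip(thetas, v)]


# Hypothetical example: theta = 0 is the only eigenvalue-1 direction here.
thetas = [0.0, math.sqrt(2), math.sqrt(3)]
v = [1.0, 2.0, -1.0]
projection = [vj if t == 0.0 else 0.0 for t, vj in zip(thetas, v)]  # pi_{H^U} v

for N in (10, 1000, 10000):
    avg = ergodic_average(thetas, v, N)
    err = math.sqrt(sum(abs(a - p) ** 2 for a, p in zip(avg, projection)))
    print(N, err)
```

The geometric series bound $|\sum_{n=1}^N e(n\theta)| \leq 1/|\sin(\pi\theta)|$ explains the $O(1/N)$ decay seen here. For a general unitary there is no rate, only convergence.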
I recently discovered that there is a simple variant of this ergodic theorem that holds even when the group is not amenable (or not discrete), using a more abstract notion of averaging:
Theorem 1 (Abstract ergodic theorem) Let $G$ be an arbitrary group acting unitarily on a Hilbert space $H$, and let $v$ be a vector in $H$. Then $\pi_{H^G} v$ is the element in the closed convex hull of $Gv := \{ T^g v: g \in G \}$ of minimal norm, and is also the unique element of $H^G$ in this closed convex hull.
Proof: As the closed convex hull of $Gv$ is closed, convex, and non-empty in a Hilbert space, it is a classical fact (see e.g. Proposition 1 of this previous post) that it has a unique element $F$ of minimal norm. If $T^g F \neq F$ for some $g$, then the midpoint of $T^g F$ and $F$ would be in the closed convex hull and be of smaller norm, a contradiction; thus $F$ is $G$-invariant. To finish the first claim, it suffices to show that $v - F$ is orthogonal to every element $w$ of $H^G$. But if this were not the case for some such $w$, we would have $\langle T^g v, w \rangle = \langle v, w \rangle \neq \langle F, w \rangle$ for all $g \in G$, and thus on taking convex hulls $\langle F, w \rangle = \langle v, w \rangle$, a contradiction.
Finally, since $T^g v - F$ is orthogonal to $H^G$ for every $g \in G$, the same is true for $F' - F$ for any $F'$ in the closed convex hull of $Gv$, and this gives the second claim.
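Theorem 1 can be checked by brute force when the orbit is finite. This Python sketch uses a hypothetical example: $\mathbb{Z}/4\mathbb{Z}$ acting on $\mathbb{R}^4$ by cyclic shifts of coordinates. Here the invariant vectors are the constant vectors, so $\pi_{H^G} v$ is the mean of $v$ in every coordinate. The minimal-norm point of the convex hull is computed by projected gradient descent on the convex weights. That solver is just one convenient choice and is not part of the argument above. It starts from a vertex of the hull, so the answer is not built in.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def project_to_simplex(y):
    """Euclidean projection of y onto {w : w_i >= 0, sum w_i = 1} (sort-based method)."""
    u = sorted(y, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold for the largest admissible index
    return [max(yi - theta, 0.0) for yi in y]


def min_norm_in_hull(points, steps=2000):
    """Minimise ||sum_k w_k p_k||^2 over probability vectors w by projected gradient."""
    m = len(points)
    gram = [[dot(p, q) for q in points] for p in points]
    # Gershgorin: this bounds the Lipschitz constant 2 * lambda_max(gram) of the gradient.
    L = 2 * max(sum(abs(g) for g in row) for row in gram)
    w = [1.0] + [0.0] * (m - 1)  # start at a vertex of the hull
    for _ in range(steps):
        grad = [2 * dot(row, w) for row in gram]
        w = project_to_simplex([wi - gi / L for wi, gi in zip(w, grad)])
    dim = len(points[0])
    return [sum(wk * p[i] for wk, p in zip(w, points)) for i in range(dim)]


v = [3.0, 1.0, 0.0, -2.0]
orbit = [v[k:] + v[:k] for k in range(len(v))]  # Z/4 acting by cyclic shifts
print(min_norm_in_hull(orbit))     # approximately [0.5, 0.5, 0.5, 0.5]
print([sum(v) / len(v)] * len(v))  # the projection of v onto the constant vectors
```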
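This result is due to Alaoglu and Birkhoff. It implies the amenable ergodic theorem (1); indeed, given any $\varepsilon > 0$, Theorem 1 implies that there is a finite convex combination $v_\varepsilon$ of shifts $T^g v$ of $v$ which lies within $\varepsilon$ (in the $H$ norm) of $\pi_{H^G} v$. By the triangle inequality, all the averages $\frac{1}{|\Phi_N|} \sum_{g \in \Phi_N} T^g v_\varepsilon$ also lie within $\varepsilon$ of $\pi_{H^G} v$, but by the Følner property this implies that the averages $\frac{1}{|\Phi_N|} \sum_{g \in \Phi_N} T^g v$ are eventually within $2\varepsilon$ (say) of $\pi_{H^G} v$, giving the claim.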
It turns out to be possible to use Theorem 1 as a substitute for the mean ergodic theorem in a number of contexts, thus removing the need for an amenability hypothesis. Here is a basic application:
Corollary 2 (Relative orthogonality) Let $G$ be a group acting unitarily on a Hilbert space $H$, and let $V$ be a $G$-invariant closed subspace of $H$. Then $V$ and $H^G$ are relatively orthogonal over their common subspace $V^G$, that is to say the restrictions of $V$ and $H^G$ to the orthogonal complement of $V^G$ are orthogonal to each other.
Proof: By Theorem 1, we have $\pi_{H^G} v = \pi_{V^G} v$ for all $v \in V$, and the claim follows. (Thanks to Gergely Harcos for this short argument.)
Now we give a more advanced application of Theorem 1, to establish some “Mackey theory” over arbitrary groups $G$. Define a $G$-system $(X, \mathcal{X}, \mu, T)$ to be a probability space $X = (X, \mathcal{X}, \mu)$ together with a measure-preserving action $T = (T^g)_{g \in G}$ of $G$ on $X$; this gives an action of $G$ on $L^2(X) = L^2(X, \mathcal{X}, \mu)$, which by abuse of notation we also call $T$:
$$ T^g f := f \circ T^{g^{-1}}.$$
(In this post we follow the usual convention of defining the $L^p$ spaces by quotienting out by almost everywhere equivalence.) We say that a $G$-system is ergodic if $L^2(X)^G$ consists only of the constants.
(A technical point: the theory becomes slightly cleaner if we interpret our measure spaces abstractly (or “pointlessly”), removing the underlying space $X$ and quotienting $\mathcal{X}$ by the $\sigma$-ideal of null sets, and considering maps such as $T^g$ only on this quotient $\sigma$-algebra (or on the associated von Neumann algebra $L^\infty(X)$ or Hilbert space $L^2(X)$). However, we will stick with the more traditional setting of classical probability spaces here to keep the notation familiar, but with the understanding that many of the statements below should be understood modulo null sets.)
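A factor of a $G$-system $X = (X, \mathcal{X}, \mu, (T^g)_{g \in G})$ is another $G$-system $Y = (Y, \mathcal{Y}, \nu, (S^g)_{g \in G})$ together with a factor map $\pi: X \to Y$ which commutes with the $G$-action (thus $\pi \circ T^g = S^g \circ \pi$ for all $g \in G$) and respects the measure in the sense that $\mu(\pi^{-1}(E)) = \nu(E)$ for all $E \in \mathcal{Y}$. For instance, the $G$-invariant factor $Z^0_G(X) := (X, \mathcal{X}^G, \mu|_{\mathcal{X}^G}, (T^g)_{g \in G})$, formed by restricting $X$ to the invariant algebra $\mathcal{X}^G := \{ E \in \mathcal{X}: T^g E = E \text{ a.e. for all } g \in G \}$, is a factor of $X$. (This factor is the first factor in an important hierarchy, the next element of which is the Kronecker factor $Z^1_G(X)$, but we will not discuss higher elements of this hierarchy further here.) If $Y$ is a factor of $X$, we refer to $X$ as an extension of $Y$.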
From Corollary 2 we have
Corollary 3 (Relative independence) Let $X$ be a $G$-system for a group $G$, and let $Y$ be a factor of $X$. Then $Y$ and $Z^0_G(X)$ are relatively independent over their common factor $Z^0_G(Y)$, in the sense that the spaces $L^2(Y)$ and $L^2(Z^0_G(X))$ are relatively orthogonal over $L^2(Z^0_G(Y))$ when all these spaces are embedded into $L^2(X)$.
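This has a simple consequence regarding the product $X \times Y = (X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu, T \times S)$ of two $G$-systems $X = (X, \mathcal{X}, \mu, T)$ and $Y = (Y, \mathcal{Y}, \nu, S)$, in the case when the $Y$ action is trivial:
Lemma 4 If $X, Y$ are two $G$-systems, with the action of $G$ on $Y$ trivial, then $Z^0_G(X \times Y)$ is isomorphic to $Z^0_G(X) \times Y$ in the obvious fashion.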
This lemma is immediate for countable $G$, since for a $G$-invariant function $f$, one can ensure that $T^g f = f$ holds simultaneously for all $g \in G$ outside of a null set, but is a little trickier for uncountable $G$.
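Proof: It is clear that $Z^0_G(X) \times Y$ is a factor of $Z^0_G(X \times Y)$. To obtain the reverse inclusion, suppose that it fails, thus there is a non-zero $f \in L^2(Z^0_G(X \times Y))$ which is orthogonal to $L^2(Z^0_G(X) \times Y)$. In particular, we have $f(x,y) \phi(y)$ orthogonal to $L^2(Z^0_G(X))$ for any $\phi \in L^\infty(Y)$. Since $f(x,y) \phi(y)$ lies in $L^2(Z^0_G(X \times Y))$, we conclude from Corollary 3 (viewing $X$ as a factor of $X \times Y$) that $f(x,y) \phi(y)$ is also orthogonal to $L^2(X)$. Since $\phi$ is an arbitrary element of $L^\infty(Y)$, we conclude that $f$ is orthogonal to $L^2(X \times Y)$ and in particular is orthogonal to itself, a contradiction. (Thanks to Gergely Harcos for this argument.)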
Now we discuss the notion of a group extension.
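Definition 5 (Group extension) Let $G$ be an arbitrary group, let $Y = (Y, \mathcal{Y}, \nu, S)$ be a $G$-system, and let $K$ be a compact metrisable group. A $K$-extension of $Y$ is an extension $X = (X, \mathcal{X}, \mu, T)$ whose underlying space is $X = Y \times K$ (with $\mathcal{X}$ the product of $\mathcal{Y}$ and the Borel $\sigma$-algebra on $K$), the factor map is $\pi: (y,k) \mapsto y$, and the shift maps $T^g$ are given by
$$ T^g(y,k) = (S^g y, \rho_g(y) k)$$
where for each $g \in G$, $\rho_g: Y \to K$ is a measurable map (known as the cocycle associated to the $K$-extension $X$).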
An important special case of a $K$-extension arises when the measure $\mu$ is the product of $\nu$ with the Haar measure $dk$ on $K$. In this case, $X$ also has a $K$-action $k': (y,k) \mapsto (y, k k'^{-1})$ that commutes with the $G$-action, making $X$ a $G \times K$-system. More generally, $\mu$ could be the product of $\nu$ with the Haar measure $dh$ of some closed subgroup $H$ of $K$, with $\rho_g$ taking values in $H$; then $X$ is now a $G \times H$ system. In this latter case we will call $X$ $H$-uniform.
If $X$ is a $K$-extension of $Y$ and $F: Y \to K$ is a measurable map, we can define the gauge transform $X_F$ of $X$ to be the $K$-extension of $Y$ whose measure $\mu_F$ is the pushforward of $\mu$ under the map $(y,k) \mapsto (y, F(y) k)$, and whose cocycles $\rho_{F,g}: Y \to K$ for $g \in G$ are given by the formula
$$ \rho_{F,g}(y) := F(S^g y) \rho_g(y) F(y)^{-1}.$$
It is easy to see that $X_F$ is a $K$-extension that is isomorphic to $X$ as a $K$-extension of $Y$; we will refer to $X_F$ and $X$ as equivalent systems, and $\rho_F$ as cohomologous to $\rho$. We then have the following fundamental result of Mackey and of Zimmer:
Theorem 6 (Mackey-Zimmer theorem) Let $G$ be an arbitrary group, let $Y$ be an ergodic $G$-system, and let $K$ be a compact metrisable group. Then every ergodic $K$-extension $X$ of $Y$ is equivalent to an $H$-uniform extension of $Y$ for some closed subgroup $H$ of $K$.
This theorem is usually stated for amenable groups $G$, but by using Theorem 1 (or more precisely, Corollary 3) the result is in fact also valid for arbitrary groups; we give the proof below the fold. (In the usual formulations of the theorem, $X$ and $Y$ are also required to be Lebesgue spaces, or at least standard Borel, but again with our abstract approach here, such hypotheses will be unnecessary.) Among other things, this theorem plays an important role in the Furstenberg-Zimmer structural theory of measure-preserving systems (as well as subsequent refinements of this theory by Host and Kra); see this previous blog post for some relevant discussion. One can obtain similar descriptions of non-ergodic extensions by working relative to the invariant factor (or via the ergodic decomposition, if one has enough separability hypotheses on the system), but the result becomes more complicated to state, and we will not do so here; see this paper of Austin for details.
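The gauge transform is defined by the cocycle formula $\rho_{F,g}(y) = F(S^g y)\rho_g(y)F(y)^{-1}$. The map $\Phi: (y,k) \mapsto (y, F(y)k)$ should intertwine the original shift with the gauge-transformed one. This Python sketch checks that identity pointwise on a small, entirely hypothetical example. The base is $Y = \mathbb{Z}/5\mathbb{Z}$ with the rotation $S(y) = y+1$, a single shift stands in for a generator of $G = \mathbb{Z}$, and $K$ is the non-abelian group $S_3$ (so the order of the factors matters). The cocycle and gauge are arbitrary choices. Measures are ignored here, since only the algebraic identity is being tested.

```python
from itertools import permutations

K = list(permutations(range(3)))  # the symmetric group S_3, as tuples


def mul(p, q):
    """Group law (p q)(i) = p(q(i)): apply q first, then p."""
    return tuple(p[j] for j in q)


def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)


Y = range(5)


def S(y):
    """Hypothetical base system: rotation on Z/5Z."""
    return (y + 1) % 5


rho = {y: K[(2 * y) % 6] for y in Y}      # arbitrary cocycle Y -> K
F = {y: K[(y * y + 1) % 6] for y in Y}    # arbitrary gauge function Y -> K


def T(y, k):       # the K-extension: T(y, k) = (S y, rho(y) k)
    return S(y), mul(rho[y], k)


def rho_F(y):      # gauge-transformed cocycle F(S y) rho(y) F(y)^{-1}
    return mul(mul(F[S(y)], rho[y]), inv(F[y]))


def T_F(y, k):
    return S(y), mul(rho_F(y), k)


def Phi(y, k):     # the map (y, k) -> (y, F(y) k) pushing mu forward to mu_F
    return y, mul(F[y], k)


# Phi intertwines the two extensions at every point: Phi o T == T_F o Phi.
print(all(Phi(*T(y, k)) == T_F(*Phi(y, k)) for y in Y for k in K))  # True
```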
This should be the final thread (for now, at least) for the Polymath8 project (encompassing the original Polymath8a paper, the nearly finished Polymath8b paper, and the retrospective paper), superseding the previous Polymath8b thread (which was quite full) and the Polymath8a/retrospective thread (which was more or less inactive).
On Polymath8a: I talked briefly with Andrew Granville, who is handling the paper for Algebra & Number Theory, and he said that a referee report should be coming in soon. Apparently the length of the paper is a bit of an issue (not surprising, as it is 163 pages long) and there will be some suggestions to trim the size down a bit.
In view of the length issue at A&NT, I’m now leaning towards taking up Ken Ono’s offer to submit the Polymath8b paper to the new open access journal “Research in the Mathematical Sciences”. I think the paper is almost ready to be submitted (after the current participants sign off on it, of course), but it might be worth waiting on the Polymath8a referee report in case the changes suggested impact the 8b paper.
Finally, it is perhaps time to start working on the retrospective article, and collect some impressions from participants. I wrote up a quick draft of my own experiences, and also pasted in Pace Nielsen’s thoughts, as well as a contribution from an undergraduate following the project (Andrew Gibson). Hopefully we can collect a few more (either through comments on this blog, through email, or through Dropbox), and then start working on editing them together and finding some suitable concluding points to make about the Polymath8 project, and what lessons we can take from it for future projects of this type.
Given two unit vectors $v, w$ in a real inner product space, one can define the correlation between these vectors to be their inner product $\langle v, w \rangle$, or in more geometric terms, the cosine of the angle $\angle(v, w)$ subtended by $v$ and $w$. By the Cauchy-Schwarz inequality, this is a quantity between $-1$ and $+1$, with the extreme positive correlation $+1$ occurring when $v, w$ are identical, the extreme negative correlation $-1$ occurring when $v, w$ are diametrically opposite, and the zero correlation $0$ occurring when $v, w$ are orthogonal. This notion is closely related to the notion of correlation between two non-constant square-integrable real-valued random variables $X, Y$, which is the same as the correlation between two unit vectors $v, w$ lying in the Hilbert space $L^2(\Omega)$ of square-integrable random variables, with $v$ being the normalisation of $X$ defined by subtracting off the mean $\mathbf{E} X$ and then dividing by the standard deviation of $X$, and similarly for $w$ and $Y$.
One can also define correlation for complex (Hermitian) inner product spaces by taking the real part of the complex inner product to recover a real inner product.
While reading the (highly recommended) recent popular maths book “How not to be wrong”, by my friend and co-author Jordan Ellenberg, I came across the (important) point that correlation is not necessarily transitive: if $u$ correlates with $v$, and $v$ correlates with $w$, then this does not imply that $u$ correlates with $w$. A simple geometric example is provided by the three unit vectors
$$ u := (1,0); \quad v := \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right); \quad w := (0,1)$$
in the Euclidean plane $\mathbb{R}^2$: $u$ and $v$ have a positive correlation of $\frac{1}{\sqrt{2}}$, as do $v$ and $w$, but $u$ and $w$ are not correlated with each other. Or: for a typical undergraduate course, it is generally true that good exam scores are correlated with a deep understanding of the course material, and memorising from flash cards is correlated with good exam scores, but this does not imply that memorising flash cards is correlated with deep understanding of the course material.
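However, there are at least two situations in which some partial version of transitivity of correlation can be recovered. The first is in the “99%” regime in which the correlations are very close to $1$: if $u, v, w$ are unit vectors such that $u$ is very highly correlated with $v$, and $v$ is very highly correlated with $w$, then this does imply that $u$ is very highly correlated with $w$. Indeed, from the identity
$$ \|u - v\| = 2^{1/2} (1 - \langle u, v \rangle)^{1/2}$$
(and similarly for $v - w$ and $u - w$) and the triangle inequality $\|u - w\| \leq \|u - v\| + \|v - w\|$, we see that
$$ (1 - \langle u, w \rangle)^{1/2} \leq (1 - \langle u, v \rangle)^{1/2} + (1 - \langle v, w \rangle)^{1/2}. \tag{1}$$
Thus, for instance, if $\langle u, v \rangle \geq 1 - \varepsilon$ and $\langle v, w \rangle \geq 1 - \varepsilon$, then $\langle u, w \rangle \geq 1 - 4\varepsilon$. This is of course closely related to (though slightly weaker than) the triangle inequality for angles:
$$ \angle(u, w) \leq \angle(u, v) + \angle(v, w).$$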
Remark 1 (Thanks to Andrew Granville for conversations leading to this observation.) The inequality (1) also holds for sub-unit vectors, i.e. vectors $u, v, w$ with $\|u\|, \|v\|, \|w\| \leq 1$. This comes by extending $u, v, w$ in directions orthogonal to all three original vectors and to each other in order to make them unit vectors, enlarging the ambient Hilbert space $H$ if necessary. More concretely, one can apply (1) to the unit vectors
$$ (u, \sqrt{1 - \|u\|^2}, 0, 0), \quad (v, 0, \sqrt{1 - \|v\|^2}, 0), \quad (w, 0, 0, \sqrt{1 - \|w\|^2})$$
in $H \times \mathbb{R}^3$.
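The following Python sketch is a quick numerical companion to the “99%” discussion, not a proof. It reproduces the planar example of non-transitivity. It also tests the inequality $(1 - \langle u,w\rangle)^{1/2} \leq (1 - \langle u,v\rangle)^{1/2} + (1 - \langle v,w\rangle)^{1/2}$ on random unit vectors in $\mathbb{R}^5$. The dimension, seed and trial count are arbitrary choices, and a tolerance of $10^{-12}$ absorbs rounding.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def unit(x):
    n = math.sqrt(dot(x, x))
    return [a / n for a in x]


def gap(x, y):
    """(1 - <x, y>)^(1/2) for unit vectors, clamped at 0 against rounding."""
    return math.sqrt(max(0.0, 1.0 - dot(x, y)))


# Non-transitivity in the plane: <u,v> = <v,w> = 1/sqrt(2), but <u,w> = 0.
u, v, w = [1.0, 0.0], [1 / math.sqrt(2), 1 / math.sqrt(2)], [0.0, 1.0]
print(dot(u, v), dot(v, w), dot(u, w))


def inequality_1_holds(trials=10000, dim=5, seed=0):
    """Test the triangle-type inequality (1) on random triples of unit vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (unit([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(3))
        if gap(a, c) > gap(a, b) + gap(b, c) + 1e-12:
            return False
    return True


print(inequality_1_holds())  # True
```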
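But even in the “1%” regime in which correlations are very weak, there is still a version of transitivity of correlation, known as the van der Corput lemma, which basically asserts that if a unit vector $v$ is correlated with many unit vectors $u_1, \dots, u_n$, then many of the pairs $u_i, u_j$ will then be correlated with each other. Indeed, from the Cauchy-Schwarz inequality
$$ \left|\left\langle v, \sum_{i=1}^n u_i \right\rangle\right|^2 \leq \|v\|^2 \left\|\sum_{i=1}^n u_i\right\|^2 \tag{2}$$
we have
$$ \left(\sum_{i=1}^n \langle v, u_i \rangle\right)^2 \leq \sum_{1 \leq i, j \leq n} \langle u_i, u_j \rangle. \tag{3}$$
Thus, for instance, if $\langle v, u_i \rangle \geq \varepsilon$ for at least $\varepsilon n$ values of $i = 1, \dots, n$, then (after removing those indices $i$ for which $\langle v, u_i \rangle < \varepsilon$) $\sum_{i,j} \langle u_i, u_j \rangle$ must be at least $\varepsilon^4 n^2$, which implies that $\langle u_i, u_j \rangle \geq \varepsilon^4/2$ for at least $\varepsilon^4 n^2/2$ pairs $(i,j)$. Or as another example: if a random variable $X$ exhibits at least $\varepsilon$ positive correlation with $n$ other random variables $Y_1, \dots, Y_n$, then if $n > 1/\varepsilon^2$, at least two distinct $Y_i, Y_j$ must have positive correlation with each other (although this argument does not tell you which pair $Y_i, Y_j$ are so correlated). Thus one can view this inequality as a sort of “pigeonhole principle” for correlation.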
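A similar argument (multiplying each $u_i$ by an appropriate sign $\pm 1$) shows the related van der Corput inequality
$$ \left(\sum_{i=1}^n |\langle v, u_i \rangle|\right)^2 \leq \sum_{1 \leq i, j \leq n} |\langle u_i, u_j \rangle|, \tag{4}$$
and this inequality is also true for complex inner product spaces. (Also, the $u_i$ do not need to be unit vectors for this inequality to hold.)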
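The van der Corput inequality $(\sum_i |\langle v, u_i\rangle|)^2 \leq \sum_{i,j} |\langle u_i, u_j\rangle|$, with $v$ a unit vector, can be spot-checked in a complex inner product space. The Python sketch below draws random Gaussian vectors in $\mathbb{C}^6$ and normalises only $v$, since, as noted above, the $u_i$ need not be unit vectors. The dimension, the number of vectors and the seeds are arbitrary assumptions, and the check illustrates the inequality rather than proving it.

```python
import random


def inner(x, y):
    """Hermitian inner product <x, y> = sum_k x_k * conj(y_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def random_vector(rng, dim):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]


def van_der_corput_sides(v, us):
    """Return (LHS, RHS) of (sum_i |<v,u_i>|)^2 <= sum_{i,j} |<u_i,u_j>|."""
    lhs = sum(abs(inner(v, u)) for u in us) ** 2
    rhs = sum(abs(inner(ui, uj)) for ui in us for uj in us)
    return lhs, rhs


def random_trial(seed, dim=6, n=20):
    rng = random.Random(seed)
    v = random_vector(rng, dim)
    norm = abs(inner(v, v)) ** 0.5
    v = [a / norm for a in v]  # v must be a unit vector
    us = [random_vector(rng, dim) for _ in range(n)]  # the u_i need not be unit vectors
    return van_der_corput_sides(v, us)


print(random_trial(0))  # (LHS, RHS) with LHS <= RHS
```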
Geometrically, the picture is this: if $v$ positively correlates with all of the $u_1, \dots, u_n$, then the $u_1, \dots, u_n$ are all squashed into a somewhat narrow cone centred at $v$. The cone is still wide enough to allow a few pairs $u_i, u_j$ to be orthogonal (or even negatively correlated) with each other, but (when $n$ is large enough) it is not wide enough to allow all of the $u_i, u_j$ to be so widely separated. Remarkably, the bound here does not depend on the dimension of the ambient inner product space; while increasing the number of dimensions should in principle add more “room” to the cone, this effect is counteracted by the fact that in high dimensions, almost all pairs of vectors are close to orthogonal, and the exceptional pairs that are even weakly correlated to each other become exponentially rare. (See this previous blog post for some related discussion; in particular, Lemma 2 from that post is closely related to the van der Corput inequality presented here.)
A particularly common special case of the van der Corput inequality arises when $v$ is a unit vector fixed by some unitary operator $T$, and the $u_i$ are shifts $u_i = T^i u$ of a single unit vector $u$. In this case, the inner products $\langle v, u_i \rangle$ are all equal to $\langle v, u \rangle$, and we arrive at the useful van der Corput inequality
$$ |\langle v, u \rangle|^2 \leq \frac{1}{n^2} \sum_{1 \leq i, j \leq n} |\langle T^i u, T^j u \rangle|. \tag{5}$$
(In fact, one can even remove the absolute values from the right-hand side, by using (2) instead of (4).) Thus, to show that $v$ has negligible correlation with $u$, it suffices to show that the shifts of $u$ have negligible correlation with each other.
Here is a basic application of the van der Corput inequality:
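Proposition 2 (Weyl equidistribution estimate) Let $P(n) = \alpha_d n^d + \dots + \alpha_1 n + \alpha_0$ be a polynomial with at least one of its non-constant coefficients $\alpha_1, \dots, \alpha_d$ irrational. Then one has
$$ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N e(P(n)) = 0,$$
where $e(x) := e^{2\pi i x}$.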
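Note that this assertion implies the more general assertion
$$ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N e(kP(n)) = 0$$
for any non-zero integer $k$ (simply by replacing $P$ by $kP$), which by the Weyl equidistribution criterion is equivalent to the sequence $P(1), P(2), \dots$ being asymptotically equidistributed in $\mathbb{R}/\mathbb{Z}$.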
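The following Python sketch computes the normalised exponential sums $|\frac{1}{N}\sum_{n=1}^N e(P(n))|$ numerically for two illustrative polynomials chosen by us. The first, $\sqrt{2}\,n^2$, has an irrational top coefficient, so Proposition 2 predicts decay to $0$. The second, $n^2/3$, has only rational coefficients, so the proposition does not apply. There $n^2 \bmod 3$ is periodic and the averages tend to $|1 + 2e(1/3)|/3 = 1/\sqrt{3}$ instead. Floating-point phases are accurate to about $10^{-5}$ at these sizes, which is ample for the comparison.

```python
import cmath
import math


def weyl_average(P, N):
    """|(1/N) sum_{n=1}^N e(P(n))| with e(x) = exp(2 pi i x)."""
    total = sum(cmath.exp(2j * math.pi * (P(n) % 1.0)) for n in range(1, N + 1))
    return abs(total) / N


def irrational(n):
    return math.sqrt(2) * n * n  # irrational top coefficient


def rational(n):
    return n * n / 3  # all coefficients rational


for N in (10**3, 10**4, 10**5):
    print(N, weyl_average(irrational, N), weyl_average(rational, N))
```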
Proof: We induct on the degree $d$ of the polynomial $P$, which must be at least one. If $d$ is equal to one, the claim is easily established from the geometric series formula, so suppose that $d > 1$ and that the claim has already been proven for $d - 1$. If the top coefficient $\alpha_d$ of $P(n)$ is rational, say $\alpha_d = \frac{a}{q}$, then by partitioning the natural numbers into residue classes modulo $q$, we see that the claim follows from the induction hypothesis; so we may assume that the top coefficient $\alpha_d$ is irrational.
In order to use the van der Corput inequality as stated above (i.e. in the formalism of inner product spaces) we will need a non-principal ultrafilter $p$ (see e.g. this previous blog post for basic theory of ultrafilters); we leave it as an exercise to the reader to figure out how to present the argument below without the use of ultrafilters (or similar devices, such as Banach limits). The ultrafilter $p$ defines an inner product $\langle \cdot, \cdot \rangle_p$ on bounded complex sequences $z = (z_1, z_2, z_3, \dots)$ by setting
$$ \langle z, w \rangle_p := \lim_{N \to p} \frac{1}{N} \sum_{n=1}^N z_n \overline{w_n}.$$
Strictly speaking, this inner product is only positive semi-definite rather than positive definite, but one can quotient out by the null vectors to obtain a positive-definite inner product. To establish the claim, it will suffice to show that
$$ \lim_{N \to p} \frac{1}{N} \sum_{n=1}^N e(P(n)) = 0$$
for every non-principal ultrafilter $p$.
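Note that the space of bounded sequences (modulo null vectors) admits a shift $T$, defined by
$$ T(z_1, z_2, z_3, \dots) := (z_2, z_3, z_4, \dots).$$
This shift becomes unitary once we quotient out by null vectors, and the constant sequence $1$ is clearly a unit vector that is invariant with respect to the shift. So by the van der Corput inequality, writing $e(P)$ for the sequence $(e(P(n)))_{n \geq 1}$, we have
$$ |\langle e(P), 1 \rangle_p|^2 \leq \frac{1}{H^2} \sum_{1 \leq h, h' \leq H} |\langle T^h e(P), T^{h'} e(P) \rangle_p|$$
for any $H \geq 1$. But we may rewrite $\langle T^h e(P), T^{h'} e(P) \rangle_p = \lim_{N \to p} \frac{1}{N} \sum_{n=1}^N e(P(n+h) - P(n+h'))$. Then observe that if $h \neq h'$, $P(n+h) - P(n+h')$ is a polynomial of degree $d-1$ whose $n^{d-1}$ coefficient $d(h - h')\alpha_d$ is irrational, so by the induction hypothesis we have $\langle T^h e(P), T^{h'} e(P) \rangle_p = 0$ for $h \neq h'$. For $h = h'$ we of course have $\langle T^h e(P), T^{h'} e(P) \rangle_p = 1$, and so
$$ |\langle e(P), 1 \rangle_p|^2 \leq \frac{1}{H}$$
for any $H$. Letting $H \to \infty$, we obtain the claim.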