You are currently browsing the monthly archive for February 2013.
The following result is due independently to Furstenberg and to Sarkozy:
Theorem 1 (Furstenberg-Sarkozy theorem) Let
, and suppose that
is sufficiently large depending on
. Then every subset
of
of density
at least
contains a pair
for some natural numbers
with
.
This theorem is of course similar in spirit to results such as Roth’s theorem or Szemerédi’s theorem, in which the pattern is replaced by
or
for some fixed
respectively. There are by now many proofs of this theorem (see this recent paper of Lyall for a survey), but most proofs involve some form of Fourier analysis (or spectral theory). This may be compared with the standard proof of Roth’s theorem, which combines some Fourier analysis with what is now known as the density increment argument.
A few years ago, Ben Green, Tamar Ziegler, and myself observed that it is possible to prove the Furstenberg-Sarkozy theorem by just using the Cauchy-Schwarz inequality (or van der Corput lemma) and the density increment argument, removing all invocations of Fourier analysis, and instead relying on Cauchy-Schwarz to linearise the quadratic shift . As such, this theorem can be considered as even more elementary than Roth’s theorem (and its proof can be viewed as a toy model for the proof of Roth’s theorem). We ended up not doing too much with this observation, so decided to share it here.
The first step is to use the density increment argument that goes back to Roth. For any , let
denote the assertion that for
sufficiently large, all sets
of density at least
contain a pair
with
non-zero. Note that
is vacuously true for
. We will show that for any
, one has the implication
for some absolute constant . This implies that
is true for any
(as can be seen by considering the infimum of all
for which
holds), which gives Theorem 1.
It remains to establish the implication (1). Suppose for sake of contradiction that we can find for which
holds (for some sufficiently small absolute constant
), but
fails. Thus, we can find arbitrarily large
, and subsets
of
of density at least
, such that
contains no patterns of the form
with
non-zero. In particular, we have
(The exact ranges of and
are not too important here, and could be replaced by various other small powers of
if desired.)
Let be the density of
, so that
. Observe that
and
If we thus set , then
In particular, for large enough,
On the other hand, one easily sees that
and hence by the Cauchy-Schwarz inequality
which we can rearrange as
Shifting by
we obtain (again for
large enough)
In particular, by the pigeonhole principle (and deleting the diagonal case , which we can do for
large enough) we can find distinct
such that
so in particular
If we set and shift
by
, we can simplify this (again for
large enough) as
On the other hand, since
we have
for any , and thus
Averaging this with (2) we conclude that
In particular, by the pigeonhole principle we can find such that
or equivalently has density at least
on the arithmetic progression
, which has length
and spacing
, for some absolute constant
. By partitioning this progression into subprogressions of spacing
and length
(plus an error set of size
, we see from the pigeonhole principle that we can find a progression
of length
and spacing
on which
has density at least
(and hence at least
) for some absolute constant
. If we then apply the induction hypothesis to the set
we conclude (for large enough) that
contains a pair
for some natural numbers
with
non-zero. This implies that
lie in
, a contradiction, establishing the implication (1).
A more careful analysis of the above argument reveals a more quantitative version of Theorem 1: for (say), any subset of
of density at least
for some sufficiently large absolute constant
contains a pair
with
non-zero. This is not the best bound known; a (difficult) result of Pintz, Steiger, and Szemeredi allows the density to be as low as
. On the other hand, this already improves on the (simpler) Fourier-analytic argument of Green that works for densities at least
(although the original argument of Sarkozy, which is a little more intricate, works up to
). In the other direction, a construction of Rusza gives a set of density
without any pairs
.
Remark 1 A similar argument also applies with
replaced by
for fixed
, because this sort of pattern is preserved by affine dilations
into arithmetic progressions whose spacing
is a
power. By re-introducing Fourier analysis, one can also perform an argument of this type for
where
is the sum of two squares; see the above-mentioned paper of Green for details. However there seems to be some technical difficulty in extending it to patterns of the form
for polynomials
that consist of more than a single monomial (and with the normalisation
, to avoid local obstructions), because one no longer has this preservation property.
The fundamental notions of calculus, namely differentiation and integration, are often viewed as being the quintessential concepts in mathematical analysis, as their standard definitions involve the concept of a limit. However, it is possible to capture most of the essence of these notions by purely algebraic means (almost completely avoiding the use of limits, Riemann sums, and similar devices), which turns out to be useful when trying to generalise these concepts to more abstract situations in which it becomes convenient to permit the underlying number systems involved to be something other than the real or complex numbers, even if this makes many standard analysis constructions unavailable. For instance, the algebraic notion of a derivation often serves as a substitute for the analytic notion of a derivative in such cases, by abstracting out the key algebraic properties of differentiation, namely linearity and the Leibniz rule (also known as the product rule).
Abstract algebraic analogues of integration are less well known, but can still be developed. To motivate such an abstraction, consider the integration functional from the space
of complex-valued Schwarz functions
to the complex numbers, defined by
where the integration on the right is the usual Lebesgue integral (or improper Riemann integral) from analysis. This functional obeys two obvious algebraic properties. Firstly, it is linear over , thus
for all and
. Secondly, it is translation invariant, thus
for all , where
is the translation of
by
. Motivated by the uniqueness theory of Haar measure, one might expect that these two axioms already uniquely determine
after one sets a normalisation, for instance by requiring that
This is not quite true as stated (one can modify the proof of the Hahn-Banach theorem, after first applying a Fourier transform, to create pathological translation-invariant linear functionals on that are not multiples of the standard Fourier transform), but if one adds a mild analytical axiom, such as continuity of
(using the usual Schwartz topology on
), then the above axioms are enough to uniquely pin down the notion of integration. Indeed, if
is a continuous linear functional that is translation invariant, then from the linearity and translation invariance axioms one has
for all and non-zero reals
. If
is Schwartz, then as
, one can verify that the Newton quotients
converge in the Schwartz topology to the derivative
of
, so by the continuity axiom one has
Next, note that any Schwartz function of integral zero has an antiderivative which is also Schwartz, and so annihilates all zero-integral Schwartz functions, and thus must be a scalar multiple of the usual integration functional. Using the normalisation (4), we see that
must therefore be the usual integration functional, giving the claimed uniqueness.
Motivated by the above discussion, we can define the notion of an abstract integration functional taking values in some vector space
, and applied to inputs
in some other vector space
that enjoys a linear action
(the “translation action”) of some group
, as being a functional which is both linear and translation invariant, thus one has the axioms (1), (2), (3) for all
, scalars
, and
. The previous discussion then considered the special case when
,
,
, and
was the usual translation action.
Once we have performed this abstraction, we can now present analogues of classical integration which bear very little analytic resemblance to the classical concept, but which still have much of the algebraic structure of integration. Consider for instance the situation in which we keep the complex range , the translation group
, and the usual translation action
, but we replace the space
of Schwartz functions by the space
of polynomials
of degree at most
with complex coefficients, where
is a fixed natural number; note that this space is translation invariant, so it makes sense to talk about an abstract integration functional
. Of course, one cannot apply traditional integration concepts to non-zero polynomials, as they are not absolutely integrable. But one can repeat the previous arguments to show that any abstract integration functional must annihilate derivatives of polynomials of degree at most
:
Clearly, every polynomial of degree at most is thus annihilated by
, which makes
a scalar multiple of the functional that extracts the top coefficient
of a polynomial, thus if one sets a normalisation
for some constant , then one has
for any polynomial . So we see that up to a normalising constant, the operation of extracting the top order coefficient of a polynomial of fixed degree serves as the analogue of integration. In particular, despite the fact that integration is supposed to be the “opposite” of differentiation (as indicated for instance by (5)), we see in this case that integration is basically (
-fold) differentiation; indeed, compare (6) with the identity
In particular, we see, in contrast to the usual Lebesgue integral, the integration functional (6) can be localised to an arbitrary location: one only needs to know the germ of the polynomial at a single point
in order to determine the value of the functional (6). This localisation property may initially seem at odds with the translation invariance, but the two can be reconciled thanks to the extremely rigid nature of the class
, in contrast to the Schwartz class
which admits bump functions and so can generate local phenomena that can only be detected in small regions of the underlying spatial domain, and which therefore forces any translation-invariant integration functional on such function classes to measure the function at every single point in space.
The reversal of the relationship between integration and differentiation is also reflected in the fact that the abstract integration operation on polynomials interacts with the scaling operation in essentially the opposite way from the classical integration operation. Indeed, for classical integration on
, one has
for Schwartz functions , and so in this case the integration functional
obeys the scaling law
In contrast, the abstract integration operation defined in (6) obeys the opposite scaling law
Remark 1 One way to interpret what is going on is to view the integration operation (6) as a renormalised version of integration. A polynomial
is, in general, not absolutely integrable, and the partial integrals
diverge as
. But if one renormalises these integrals by the factor
, then one recovers convergence,
thus giving an interpretation of (6) as a renormalised classical integral, with the renormalisation being responsible for the unusual scaling relationship in (7). However, this interpretation is a little artificial, and it seems that it is best to view functionals such as (6) from an abstract algebraic perspective, rather than to try to force an analytic interpretation on them.
Now we return to the classical Lebesgue integral
As noted earlier, this integration functional has a translation invariance associated to translations along the real line , as well as a dilation invariance by real dilation parameters
. However, if we refine the class
of functions somewhat, we can obtain a stronger family of invariances, in which we allow complex translations and dilations. More precisely, let
denote the space of all functions
which are entire (or equivalently, are given by a Taylor series with an infinite radius of convergence around the origin) and also admit rapid decay in a sectorial neighbourhood of the real line, or more precisely there exists an
such that for every
there exists
such that one has the bound
whenever . For want of a better name, we shall call elements of this space Schwartz entire functions. This is clearly a complex vector space. A typical example of a Schwartz entire function are the complex gaussians
where are complex numbers with
. From the Cauchy integral formula (and its derivatives) we see that if
lies in
, then the restriction of
to the real line lies in
; conversely, from analytic continuation we see that every function in
has at most one extension in
. Thus one can identify
with a subspace of
, and in particular the integration functional (8) is inherited by
, and by abuse of notation we denote the resulting functional
as
also. Note, in analogy with the situation with polynomials, that this abstract integration functional is somewhat localised; one only needs to evaluate the function
on the real line, rather than the entire complex plane, in order to compute
. This is consistent with the rigid nature of Schwartz entire functions, as one can uniquely recover the entire function from its values on the real line by analytic continuation.
Of course, the functional remains translation invariant with respect to real translation:
However, thanks to contour shifting, we now also have translation invariance with respect to complex translation:
where of course we continue to define the translation operator for complex
by the usual formula
. In a similar vein, we also have the scaling law
for any , if
is a complex number sufficiently close to
(where “sufficiently close” depends on
, and more precisely depends on the sectoral aperture parameter
associated to
); again, one can verify that
lies in
for
sufficiently close to
. These invariances (which relocalise the integration functional
onto other contours than the real line
) are very useful for computing integrals, and in particular for computing gaussian integrals. For instance, the complex translation invariance tells us (after shifting by
) that
when with
, and then an application of the complex scaling law (and a continuity argument, observing that there is a compact path connecting
to
in the right half plane) gives
using the branch of on the right half-plane for which
. Using the normalisation (4) we thus have
giving the usual gaussian integral formula
This is a basic illustration of the power that a large symmetry group (in this case, the complex homothety group) can bring to bear on the task of computing integrals.
One can extend this sort of analysis to higher dimensions. For any natural number , let
denote the space of all functions
which is jointly entire in the sense that
can be expressed as a Taylor series in
which is absolutely convergent for all choices of
, and such that there exists an
such that for any
there is
for which one has the bound
whenever for all
, where
and
. Again, we call such functions Schwartz entire functions; a typical example is the function
where is an
complex symmetric matrix with positive definite real part,
is a vector in
, and
is a complex number. We can then define an abstract integration functional
by integration on the real slice
:
where is the usual Lebesgue measure on
. By contour shifting in each of the
variables
separately, we see that
is invariant with respect to complex translations of each of the
variables, and is thus invariant under translating the joint variable
by
. One can also verify the scaling law
for complex matrices
sufficiently close to the origin, where
. This can be seen for shear transformations
by Fubini’s theorem and the aforementioned translation invariance, while for diagonal transformations near the origin this can be seen from
applications of one-dimensional scaling law, and the general case then follows by composition. Among other things, these laws then easily lead to the higher-dimensional generalisation
whenever is a complex symmetric matrix with positive definite real part,
is a vector in
, and
is a complex number, basically by repeating the one-dimensional argument sketched earlier. Here, we choose the branch of
for all matrices
in the indicated class for which
.
Now we turn to an integration functional suitable for computing complex gaussian integrals such as
where is now a complex variable
is the adjoint
is a complex
matrix with positive definite Hermitian part,
are column vectors in
,
is a complex number, and
is
times Lebesgue measure on
. (The factors of two here turn out to be a natural normalisation, but they can be ignored on a first reading.) As we shall see later, such integrals are relevant when performing computations on the Gaussian Unitary Ensemble (GUE) in random matrix theory. Note that the integrand here is not complex analytic due to the presence of the complex conjugates. However, this can be dealt with by the trick of replacing the complex conjugate
by a variable
which is formally conjugate to
, but which is allowed to vary independently of
. More precisely, let
be the space of all functions
of two independent
-tuples
of complex variables, which is jointly entire in all variables (in the sense defined previously, i.e. there is a joint Taylor series that is absolutely convergent for all independent choices of
), and such that there is an
such that for every
there is
such that one has the bound
whenever . We will call such functions Schwartz analytic. Note that the integrand in (11) is Schwartz analytic when
has positive definite Hermitian part, if we reinterpret
as the transpose of
rather than as the adjoint of
in order to make the integrand entire in
and
. We can then define an abstract integration functional
by the formula
thus can be localised to the slice
of
(though, as with previous functionals, one can use contour shifting to relocalise
to other slices also.) One can also write this integral as
and note that the integrand here is a Schwartz entire function on , thus linking the Schwartz analytic integral with the Schwartz entire integral. Using this connection, one can verify that this functional
is invariant with respect to translating
and
by independent shifts in
(thus giving a
translation symmetry), and one also has the independent dilation symmetry
for complex matrices
that are sufficiently close to the identity, where
. Arguing as before, we can then compute (11) as
In particular, this gives an integral representation for the determinant-reciprocal of a complex
matrix with positive definite Hermitian part, in terms of gaussian expressions in which
only appears linearly in the exponential:
This formula is then convenient for computing statistics such as
for random matrices drawn from the Gaussian Unitary Ensemble (GUE), and some choice of spectral parameter
with
; we review this computation later in this post. By the trick of matrix differentiation of the determinant (as reviewed in this recent blog post), one can also use this method to compute matrix-valued statistics such as
However, if one restricts attention to classical integrals over real or complex (and in particular, commuting or bosonic) variables, it does not seem possible to easily eradicate the negative determinant factors in such calculations, which is unfortunate because many statistics of interest in random matrix theory, such as the expected Stieltjes transform
which is the Stieltjes transform of the density of states. However, it turns out (as I learned recently from Peter Sarnak and Tom Spencer) that it is possible to cancel out these negative determinant factors by balancing the bosonic gaussian integrals with an equal number of fermionic gaussian integrals, in which one integrates over a family of anticommuting variables. These fermionic integrals are closer in spirit to the polynomial integral (6) than to Lebesgue type integrals, and in particular obey a scaling law which is inverse to the Lebesgue scaling (in particular, a linear change of fermionic variables ends up transforming a fermionic integral by
rather than
), which conveniently cancels out the reciprocal determinants in the previous calculations. Furthermore, one can combine the bosonic and fermionic integrals into a unified integration concept, known as the Berezin integral (or Grassmann integral), in which one integrates functions of supervectors (vectors with both bosonic and fermionic components), and is of particular importance in the theory of supersymmetry in physics. (The prefix “super” in physics means, roughly speaking, that the object or concept that the prefix is attached to contains both bosonic and fermionic aspects.) When one applies this unified integration concept to gaussians, this can lead to quite compact and efficient calculations (provided that one is willing to work with “super”-analogues of various concepts in classical linear algebra, such as the supertrace or superdeterminant).
Abstract integrals of the flavour of (6) arose in quantum field theory, when physicists sought to formally compute integrals of the form
where are familiar commuting (or bosonic) variables (which, in particular, can often be localised to be scalar variables taking values in
or
), while
were more exotic anticommuting (or fermionic) variables, taking values in some vector space of fermions. (As we shall see shortly, one can formalise these concepts by working in a supercommutative algebra.) The integrand
was a formally analytic function of
, in that it could be expanded as a (formal, noncommutative) power series in the variables
. For functions
that depend only on bosonic variables, it is certainly possible for such analytic functions to be in the Schwartz class and thus fall under the scope of the classical integral, as discussed previously. However, functions
that depend on fermionic variables
behave rather differently. Indeed, a fermonic variable
must anticommute with itself, so that
. In particular, any power series in
terminates after the linear term in
, so that a function
can only be analytic in
if it is a polynomial of degree at most
in
; more generally, an analytic function
of
fermionic variables
must be a polynomial of degree at most
, and an analytic function
of
bosonic and
fermionic variables can be Schwartz in the bosonic variables but will be polynomial in the fermonic variables. As such, to interpret the integral (14), one can use classical (Lebesgue) integration (or the variants discussed above for integrating Schwartz entire or Schwartz analytic functions) for the bosonic variables, but must use abstract integrals such as (6) for the fermonic variables, leading to the concept of Berezin integration mentioned earlier.
In this post I would like to set out some of the basic algebraic formalism of Berezin integration, particularly with regards to integration of gaussian-type expressions, and then show how this formalism can be used to perform computations involving GUE (for instance, one can compute the density of states of GUE by this machinery without recourse to the theory of orthogonal polynomials). The use of supersymmetric gaussian integrals to analyse ensembles such as GUE appears in the work of Efetov (and was also proposed in the slightly earlier works of Parisi-Sourlas and McKane, with a related approach also appearing in the work of Wegner); the material here is adapted from this survey of Mirlin, as well as the later papers of Disertori-Pinson-Spencer and of Disertori.
Consider the free Schrödinger equation in spatial dimensions, which I will normalise as
where is the unknown field and
is the spatial Laplacian. To avoid irrelevant technical issues I will restrict attention to smooth (classical) solutions to this equation, and will work locally in spacetime avoiding issues of decay at infinity (or at other singularities); I will also avoid issues involving branch cuts of functions such as
(if one wishes, one can restrict
to be even in order to safely ignore all branch cut issues). The space of solutions to (1) enjoys a number of symmetries. A particularly non-obvious symmetry is the pseudoconformal symmetry: if
solves (1), then the pseudoconformal solution
defined by
for can be seen after some computation to also solve (1). (If
has suitable decay at spatial infinity and one chooses a suitable branch cut for
, one can extend
continuously to the
spatial slice, whereupon it becomes essentially the spatial Fourier transform of
, but we will not need this fact for the current discussion.)
An analogous symmetry exists for the free wave equation in spatial dimensions, which I will write as
where is the unknown field. In analogy to pseudoconformal symmetry, we have conformal symmetry: if
solves (3), then the function
, defined in the interior
of the light cone by the formula
also solves (3).
There are also some direct links between the Schrödinger equation in dimensions and the wave equation in
dimensions. This can be easily seen on the spacetime Fourier side: solutions to (1) have spacetime Fourier transform (formally) supported on a
-dimensional hyperboloid, while solutions to (3) have spacetime Fourier transform formally supported on a
-dimensional cone. To link the two, one then observes that the
-dimensional hyperboloid can be viewed as a conic section (i.e. hyperplane slice) of the
-dimensional cone. In physical space, this link is manifested as follows: if
solves (1), then the function
defined by
solves (3). More generally, for any non-zero scaling parameter , the function
defined by
solves (3).
As an “extra challenge” posed in an exercise in one of my books (Exercise 2.28, to be precise), I asked the reader to use the embeddings (or more generally
) to explicitly connect together the pseudoconformal transformation
and the conformal transformation
. It turns out that this connection is a little bit unusual, with the “obvious” guess (namely, that the embeddings
intertwine
and
) being incorrect, and as such this particular task was perhaps too difficult even for a challenge question. I’ve been asked a couple times to provide the connection more explicitly, so I will do so below the fold.
Let be
Hermitian matrices, with eigenvalues
and
. The Harish-Chandra–Itzykson-Zuber integral formula exactly computes the integral
where is integrated over the Haar probability measure of the unitary group
and
is a non-zero complex parameter, as the expression
when the eigenvalues of are simple, where
denotes the Vandermonde determinant
and is the constant
There are at least two standard ways to prove this formula in the literature. One way is by applying the Duistermaat-Heckman theorem to the pushforward of Liouville measure on the coadjoint orbit (or more precisely, a rotation of such an orbit by
) under the moment map
, and then using a stationary phase expansion. Another way, which I only learned about recently, is to use the formulae for evolution of eigenvalues under Dyson Brownian motion (as well as the closely related formulae for the GUE ensemble), which were derived in this previous blog post. Both of these approaches can be found in several places in the literature (the former being observed in the original paper of Duistermaat and Heckman, and the latter observed in the paper of Itzykson and Zuber as well as in this later paper of Johansson), but I thought I would record both of these here for my own benefit.
The Harish-Chandra-Itzykson-Zuber formula can be extended to other compact Lie groups than . At first glance, this might suggest that these formulae could be of use in the study of the GOE ensemble, but unfortunately the Lie algebra associated to
corresponds to real anti-symmetric matrices rather than real symmetric matrices. This also occurs in the
case, but there one can simply multiply by
to rotate a complex skew-Hermitian matrix into a complex Hermitian matrix. This is consistent, though, with the fact that the (somewhat rarely studied) anti-symmetric GOE ensemble has cleaner formulae (in particular, having a determinantal structure similar to GUE) than the (much more commonly studied) symmetric GOE ensemble.
[These are notes intended mostly for myself, as these topics are useful in random matrix theory, but may be of interest to some readers also. -T.]
One of the most fundamental partial differential equations in mathematics is the heat equation
where is a scalar function
of both time and space, and
is the Laplacian
. For the purposes of this post, we will ignore all technical issues of regularity and decay, and always assume that the solutions to equations such as (1) have all the regularity and decay in order to justify all formal operations such as the chain rule, integration by parts, or differentiation under the integral sign. The factor of
in the definition of the heat propagator
is of course an arbitrary normalisation, chosen for some minor technical reasons; one can certainly continue the discussion below with other choices of normalisations if desired.
In probability theory, this equation takes on particular significance when is restricted to be non-negative, and furthermore to be a probability measure at each time, in the sense that
for all . (Actually, it suffices to verify this constraint at time
, as the heat equation (1) will then preserve this constraint.) Indeed, in this case, one can interpret
as the probability distribution of a Brownian motion
where is a stochastic process with initial probability distribution
; see for instance this previous blog post for more discussion.
A model example of a solution to the heat equation to keep in mind is that of the fundamental solution
defined for any , which represents the distribution of Brownian motion of a particle starting at the origin
at time
. At time
,
represents an
-valued random variable, each coefficient of which is an independent random variable of mean zero and variance
. (As
,
converges in the sense of distributions to a Dirac mass at the origin.)
The heat equation can also be viewed as the gradient flow for the Dirichlet form
since one has the integration by parts identity
for all smooth, rapidly decreasing , which formally implies that
is (half of) the negative gradient of the Dirichlet energy
with respect to the
inner product. Among other things, this implies that the Dirichlet energy decreases in time:
For instance, for the fundamental solution (3), one can verify for any time that
(assuming I have not made a mistake in the calculation). In a similar spirit we have
Since is non-negative, the formula (6) implies that
is integrable in time, and in particular we see that
converges to zero as
, in some averaged
sense at least; similarly, (8) suggests that
also converges to zero. This suggests that
converges to a constant function; but as
is also supposed to decay to zero at spatial infinity, we thus expect solutions to the heat equation in
to decay to zero in some sense as
. However, the decay is only expected to be polynomial in nature rather than exponential; for instance, the solution (3) decays in the
norm like
.
Since , we also observe the basic cancellation property
for any function .
There are other quantities relating to that also decrease in time under heat flow, particularly in the important case when
is a probability measure. In this case, it is natural to introduce the entropy
Thus, for instance, if is the uniform distribution on some measurable subset
of
of finite measure
, the entropy would be
. Intuitively, as the entropy decreases, the probability distribution gets wider and flatter. For instance, in the case of the fundamental solution (3), one has
for any
, reflecting the fact that
is approximately uniformly distributed on a ball of radius
(and thus of measure
).
A short formal computation shows (if one assumes for simplicity that is strictly positive, which is not an unreasonable hypothesis, particularly in view of the strong maximum principle) using (9), (5) that
where is the square root of
. For instance, if
is the fundamental solution (3), one can check that
(note that this is a significantly cleaner formula than (7)!).
In particular, the entropy is decreasing, which corresponds well to one’s intuition that the heat equation (or Brownian motion) should serve to spread out a probability distribution over time.
Actually, one can say more: the rate of decrease of the entropy is itself decreasing, or in other words the entropy is convex. I do not have a satisfactorily intuitive reason for this phenomenon, but it can be proved by straightforward application of basic several variable calculus tools (such as the chain rule, product rule, quotient rule, and integration by parts), and completing the square. Namely, by using the chain rule we have
valid for for any smooth function , we see from (1) that
and thus (again assuming that , and hence
, is strictly positive to avoid technicalities)
We thus have
It is now convenient to compute using the Einstein summation convention to hide the summation over indices . We have
and
By integration by parts and interchanging partial derivatives, we may write the first integral as
and from the quotient and product rules, we may write the second integral as
Gathering terms, completing the square, and making the summations explicit again, we see that
and so in particular is always decreasing.
The above identity can also be written as
Exercise 1 Give an alternate proof of the above identity by writing
,
and deriving the equation
for
.
It was observed in a well known paper of Bakry and Emery that the above monotonicity properties hold for a much larger class of heat flow-type equations, and lead to a number of important relations between energy and entropy, such as the log-Sobolev inequality of Gross and of Federbush, and the hypercontractivity inequality of Nelson; we will discuss one such family of generalisations (or more precisely, variants) below the fold.
Emmanuel Breuillard, Ben Green, and I have just uploaded to the arXiv our survey “Small doubling in groups“, for the proceedings of the upcoming Erdos Centennial. This is a short survey of the known results on classifying finite subsets of an (abelian) additive group
or a (not necessarily abelian) multiplicative group
that have small doubling in the sense that the sum set
or product set
is small. Such sets behave approximately like finite subgroups of
(and there is a closely related notion of an approximate group in which the analogy is even tighter) , and so this subject can be viewed as a sort of approximate version of finite group theory. (Unfortunately, thus far the theory does not have much new to say about the classification of actual finite groups; progress has been largely made instead on classifying the (highly restricted) number of ways in which approximate groups can differ from a genuine group.)
In the classical case when is the integers
, these sets were classified (in a qualitative sense, at least) by a celebrated theorem of Freiman, which roughly speaking says that such sets
are necessarily “commensurate” in some sense with a (generalised) arithmetic progression
of bounded rank. There are a number of essentially equivalent ways to define what “commensurate” means here; for instance, in the original formulation of the theorem, one asks that
be a dense subset of
, but in modern formulations it is often more convenient to require instead that
be of comparable size to
and be covered by a bounded number of translates of
, or that
and
have an intersection that is of comparable size to both
and
(cf. the notion of commensurability in group theory).
Freiman’s original theorem was extended to more general abelian groups in a sequence of papers culminating in the paper of Green and Ruzsa that handled arbitrary abelian groups. As such groups now contain non-trivial finite subgroups, the conclusion of the theorem must be modified by allowing for “coset progressions” , which can be viewed as “extensions” of generalized arithmetic progressions
by genuine finite groups
.
The proof methods in these abelian results were Fourier-analytic in nature (except in the cases of sets of very small doubling, in which more combinatorial approaches can be applied, and there were also some geometric or combinatorial methods that gave some weaker structural results). As such, it was a challenge to extend these results to nonabelian groups, although for various important special types of ambient group (such as an linear group over a finite or infinite field) it turns out that one can use tools exploiting the special structure of those groups (e.g. for linear groups one would use tools from Lie theory and algebraic geometry) to obtain quite satisfactory results; see e.g. this survey of Pyber and Szabo for the linear case. When the ambient group
is completely arbitrary, it turns out the problem is closely related to the classical Hilbert’s fifth problem of determining the minimal requirements of a topological group in order for such groups to have Lie structure; this connection was first observed and exploited by Hrushovski, and then used by Breuillard, Green, and myself to obtain the analogue of Freiman’s theorem for an arbitrary nonabelian group.
This survey is too short to discuss in much detail the proof techniques used in these results (although the abelian case is discussed in this book of mine with Vu, and the nonabelian case discussed in this more recent book of mine), but instead focuses on the statements of the various known results, as well as some remaining open questions in the subject (in particular, there is substantial work left to be done in making the estimates more quantitative, particularly in the nonabelian setting).
Recent Comments