Suppose $F: X \to Y$ is a continuous (but nonlinear) map from one normed vector space $X$ to another $Y$. The continuity means, roughly speaking, that if $x, y \in X$ are such that $\|x-y\|_X$ is small, then $\|F(x)-F(y)\|_Y$ is also small (though the precise notion of "smallness" may depend on $x$ or $y$, particularly if $F$ is not known to be uniformly continuous). If $F$ is known to be differentiable (in, say, the Fréchet sense), then we in fact have a linear bound of the form

$$\|F(x)-F(y)\|_Y \leq C(x) \|x-y\|_X$$

for some $C(x)$ depending on $x$, if $\|x-y\|_X$ is small enough; one can of course make $C(x)$ independent of $x$ (and drop the smallness condition) if $F$ is known instead to be Lipschitz continuous.
In many applications in analysis, one would like more explicit and quantitative bounds that estimate quantities like $\|F(x)-F(y)\|_Y$ in terms of quantities like $\|x-y\|_X$. There are a number of ways to do this. First of all, there is of course the trivial estimate arising from the triangle inequality:

$$\|F(x)-F(y)\|_Y \leq \|F(x)\|_Y + \|F(y)\|_Y. \qquad (1)$$

This estimate is usually not very good when $x$ and $y$ are close together. However, when $x$ and $y$ are far apart, this estimate can be more or less sharp. For instance, if the magnitude of $F$ varies so much from $x$ to $y$ that $\|F(x)\|_Y$ is more than (say) twice $\|F(y)\|_Y$, or vice versa, then (1) is sharp up to a multiplicative constant. Also, if $F$ is oscillatory in nature, and the distance between $x$ and $y$ exceeds the "wavelength" of the oscillation of $F$ at $x$ (or at $y$), then one also typically expects (1) to be close to sharp. Conversely, if $F$ does not vary much in magnitude from $x$ to $y$, and the distance between $x$ and $y$ is less than the wavelength of any oscillation present in $F$, one expects to be able to improve upon (1).
When $F$ is relatively simple in form, one can sometimes proceed simply by substituting $y = x + h$. For instance, if $F(x) := x^2$ is the squaring function in a commutative ring $R$, one has

$$F(x+h) = (x+h)^2 = x^2 + 2xh + h^2$$

and thus

$$F(x+h) - F(x) = 2xh + h^2,$$

or in terms of the original variables $x, y$ one has

$$F(y) - F(x) = 2x(y-x) + (y-x)^2.$$

If the ring $R$ is not commutative, one has to modify this to

$$F(y) - F(x) = x(y-x) + (y-x)x + (y-x)^2.$$

Thus, for instance, if $x, y$ are $n \times n$ matrices and $\|\cdot\|_{op}$ denotes the operator norm, one sees from the triangle inequality and the sub-multiplicativity $\|ab\|_{op} \leq \|a\|_{op} \|b\|_{op}$ of the operator norm that

$$\|F(y) - F(x)\|_{op} \leq 2 \|x\|_{op} \|y-x\|_{op} + \|y-x\|_{op}^2. \qquad (2)$$
If $F(x)$ involves $x$ (or various components of $x$) in several places, one can sometimes get a good estimate by "swapping" $x$ with $y$ at each of the places in turn, using a telescoping series. For instance, if we again use the squaring function $F(x) = x^2$ in a non-commutative ring, we have

$$F(y) - F(x) = y^2 - xy + xy - x^2 = (y-x)y + x(y-x),$$

which for instance leads to a slight improvement of (2):

$$\|F(y) - F(x)\|_{op} \leq (\|x\|_{op} + \|y\|_{op}) \|y-x\|_{op}.$$
More generally, for any natural number $n$, one has the identity

$$y^n - x^n = (y-x)(y^{n-1} + y^{n-2} x + \dots + x^{n-1})$$

in a commutative ring, while in a non-commutative ring one must modify this to

$$y^n - x^n = \sum_{i=0}^{n-1} y^i (y-x) x^{n-1-i}, \qquad (3)$$

and for matrices one has

$$\|y^n - x^n\|_{op} \leq n \max(\|x\|_{op}, \|y\|_{op})^{n-1} \|y-x\|_{op}.$$
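These algebraic identities are easy to test numerically; here is a minimal NumPy sketch of the non-commutative identity (3) and the ensuing operator norm bound (the dimensions, seed, and test matrices below are arbitrary choices, not part of the argument):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 5
x, y = rng.standard_normal((d, d)), rng.standard_normal((d, d))
mp = np.linalg.matrix_power
op = lambda a: np.linalg.norm(a, 2)   # operator norm = largest singular value

# y^n - x^n = sum_{i=0}^{n-1} y^i (y-x) x^{n-1-i}
lhs = mp(y, n) - mp(x, n)
rhs = sum(mp(y, i) @ (y - x) @ mp(x, n - 1 - i) for i in range(n))
assert np.allclose(lhs, rhs)
assert op(lhs) <= n * max(op(x), op(y)) ** (n - 1) * op(y - x) + 1e-9
```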
Exercise 1 If $A$ and $B$ are unitary $n \times n$ matrices, show that the commutator $[A,B] := ABA^{-1}B^{-1}$ obeys the inequality

$$\| [A,B] - 1 \|_{op} \leq 2 \|A - 1\|_{op} \|B - 1\|_{op}.$$

(Hint: first control $\|AB - BA\|_{op}$.)
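As a sanity check on this inequality (not, of course, a proof), one can sample random unitary matrices via QR factorisation of Gaussian matrices and test the bound:

```python
import numpy as np

rng = np.random.default_rng(1)
op = lambda a: np.linalg.norm(a, 2)
I = np.eye(6)
for _ in range(100):
    A, _ = np.linalg.qr(rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6)))
    B, _ = np.linalg.qr(rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6)))
    comm = A @ B @ A.conj().T @ B.conj().T   # A B A^{-1} B^{-1}; unitary inverse = adjoint
    assert op(comm - I) <= 2 * op(A - I) * op(B - I) + 1e-9
```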
Now suppose (for simplicity) that $F: {\bf R}^n \to {\bf R}^m$ is a map between Euclidean spaces. If $F$ is continuously differentiable, then one can use the fundamental theorem of calculus to write

$$F(y) - F(x) = \int_0^1 \frac{d}{dt} F(\gamma(t))\, dt,$$

where $\gamma: [0,1] \to {\bf R}^n$ is any continuously differentiable path from $x$ to $y$. For instance, if one uses the straight line path $\gamma(t) := (1-t)x + ty$, one has

$$F(y) - F(x) = \int_0^1 \big( (y-x) \cdot \nabla F \big) ((1-t)x + ty)\, dt. \qquad (4)$$

In the one-dimensional case $n = 1$, this simplifies to

$$F(y) - F(x) = (y-x) \int_0^1 F'((1-t)x + ty)\, dt.$$
Among other things, this immediately implies the factor theorem for $C^k$ functions: if $F$ is a $C^k$ function for some $k \geq 1$ that vanishes at some point $x_0$, then $F(x)$ factors as the product of $x - x_0$ and some $C^{k-1}$ function $G$. Another basic consequence is that if $\nabla F$ is uniformly bounded in magnitude by some constant $C$, then $F$ is Lipschitz continuous with the same constant $C$.
Applying (4) to the power function $F(x) := x^n$, we obtain the identity

$$y^n - x^n = n (y-x) \int_0^1 ((1-t)x + ty)^{n-1}\, dt \qquad (5)$$

which can be compared with (3). Indeed, for $x$ and $y$ close to $1$, one can use logarithms and Taylor expansion to arrive at the approximation $((1-t)x + ty)^{n-1} \approx y^{t(n-1)} x^{(1-t)(n-1)}$, so (3) behaves a little like a Riemann sum approximation to (5).
Exercise 2 For each $1 \leq i \leq n$, let $X_i$ and $Y_i$ be random variables taking values in a measurable space $R_i$, and let $F: R_1 \times \dots \times R_n \to {\bf C}$ be a bounded measurable function.

- (i) (Lindeberg exchange identity) Show that

$${\bf E} F(X_1,\dots,X_n) - {\bf E} F(Y_1,\dots,Y_n) = \sum_{i=1}^n {\bf E}\big( F(X_1,\dots,X_i,Y_{i+1},\dots,Y_n) - F(X_1,\dots,X_{i-1},Y_i,\dots,Y_n) \big).$$

- (ii) (Knowles-Yin exchange identity) Show that

$${\bf E} F(X_1,\dots,X_n) - {\bf E} F(Y_1,\dots,Y_n) = \int_0^1 \sum_{i=1}^n {\bf E}\big( F(Z^{(t)}_1,\dots,Z^{(t)}_{i-1},X_i,Z^{(t)}_{i+1},\dots,Z^{(t)}_n) - F(Z^{(t)}_1,\dots,Z^{(t)}_{i-1},Y_i,Z^{(t)}_{i+1},\dots,Z^{(t)}_n) \big)\, dt,$$

where $Z^{(t)}_i := 1_{U_i \leq t} X_i + 1_{U_i > t} Y_i$ is a mixture of $X_i$ and $Y_i$, with $U_1,\dots,U_n$ drawn uniformly from $[0,1]$ independently of each other and of the $X_i, Y_i$.

- (iii) Discuss the relationship between the identities in parts (i), (ii) with the identities (3), (5).

(The identity in (i) is the starting point for the Lindeberg exchange method in probability theory, discussed for instance in this previous post. The identity in (ii) can also be used in the Lindeberg exchange method; the terms in the right-hand side are slightly more symmetric in the indices $1,\dots,n$, which can be a technical advantage in some applications; see this paper of Knowles and Yin for an instance of this.)
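Part (i) reduces to a pointwise telescoping identity, which can be checked on concrete samples; a small NumPy sketch with an arbitrary test function of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
F = lambda v: np.sin(v).prod() + v.sum() ** 2        # arbitrary test function
X, Y = rng.standard_normal(n), rng.standard_normal(n)

# Lindeberg telescope: swap Y_i for X_i one index at a time
lhs = F(X) - F(Y)
rhs = sum(F(np.concatenate([X[:i + 1], Y[i + 1:]]))
          - F(np.concatenate([X[:i], Y[i:]])) for i in range(n))
assert np.isclose(lhs, rhs)
```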
Exercise 3 If $F: {\bf R} \to {\bf R}$ is $k$ times continuously differentiable for some $k \geq 1$, establish Taylor's theorem with remainder

$$F(y) = \sum_{j=0}^{k-1} \frac{F^{(j)}(x)}{j!} (y-x)^j + (y-x)^k \int_0^1 \frac{(1-t)^{k-1}}{(k-1)!} F^{(k)}((1-t)x + ty)\, dt.$$

If $F^{(k)}$ is bounded in magnitude by some constant $C$, conclude that

$$\Big| F(y) - \sum_{j=0}^{k-1} \frac{F^{(j)}(x)}{j!} (y-x)^j \Big| \leq \frac{C}{k!} |y-x|^k.$$
For real scalar functions $F: {\bf R} \to {\bf R}$, the average value of the continuous real-valued function $F'((1-t)x + ty)$ must be attained at some point $t$ in the interval $[0,1]$. We thus conclude the mean-value theorem

$$F(y) - F(x) = F'((1-t)x + ty)\, (y-x)$$

for some $t \in [0,1]$ (that can depend on $x$, $y$, and $F$). This can for instance give a second proof of the fact that continuously differentiable functions $F$ with bounded derivative are Lipschitz continuous. However it is worth stressing that the mean-value theorem is only available for real scalar functions; it is false for instance for complex scalar functions. A basic counterexample is given by the function $e(x) := e^{2\pi i x}$; there is no $t$ for which $e(1) - e(0) = e'(t)$. On the other hand, as $e'$ has magnitude $2\pi$, we still know from (4) that $e$ is Lipschitz of constant $2\pi$, and when combined with (1) we obtain the basic bounds

$$|e(x) - e(y)| \leq \min(2, 2\pi |x-y|),$$

which are already very useful for many applications.
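This pair of bounds is easily spot-checked numerically (the sampling range and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
e = lambda x: np.exp(2j * np.pi * x)
x, y = rng.uniform(-5, 5, 10**5), rng.uniform(-5, 5, 10**5)
# (1) gives the bound 2; Lipschitz continuity from (4) gives 2*pi*|x - y|
assert np.all(np.abs(e(x) - e(y)) <= np.minimum(2, 2 * np.pi * np.abs(x - y)) + 1e-12)
```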
Exercise 4 Let $A, B$ be $n \times n$ matrices, and let $t$ be a non-negative real.

- (i) Establish the Duhamel formula

$$e^{tB} = e^{tA} + \int_0^t e^{(t-s)A} (B-A) e^{sB}\, ds,$$

where $e^{tA}$ denotes the matrix exponential of $tA$. (Hint: Differentiate $e^{(t-s)A} e^{sB}$ or $e^{-sA} e^{sB}$ in $s$.)

- (ii) Establish the iterated Duhamel formula

$$e^{tB} = e^{tA} + \sum_{j=1}^{k} \int_{0 \leq t_1 \leq \dots \leq t_j \leq t} e^{(t-t_j)A} (B-A) e^{(t_j - t_{j-1})A} \cdots (B-A) e^{t_1 A}\, dt_1 \dots dt_j$$
$$+ \int_{0 \leq t_1 \leq \dots \leq t_{k+1} \leq t} e^{(t-t_{k+1})A} (B-A) e^{(t_{k+1} - t_k)A} \cdots (B-A) e^{t_1 B}\, dt_1 \dots dt_{k+1}$$

for any natural number $k$. (See the sketch after this exercise for a numerical check of (i).)

- (iii) Establish the infinitely iterated Duhamel formula

$$e^{tB} = e^{tA} + \sum_{j=1}^{\infty} \int_{0 \leq t_1 \leq \dots \leq t_j \leq t} e^{(t-t_j)A} (B-A) e^{(t_j - t_{j-1})A} \cdots (B-A) e^{t_1 A}\, dt_1 \dots dt_j.$$

- (iv) If $A(t)$ is an $n \times n$ matrix depending in a continuously differentiable fashion on $t$, establish the variation formula

$$\frac{d}{dt} e^{A(t)} = \big( F(\mathrm{ad}(A(t))) A'(t) \big) e^{A(t)},$$

where $\mathrm{ad}(A)$ is the adjoint representation $\mathrm{ad}(A): X \mapsto AX - XA$ applied to $A(t)$, and $F$ is the function

$$F(z) := \int_0^1 e^{sz}\, ds$$

(thus $F(z) = \frac{e^z-1}{z}$ for non-zero $z$), with $F(\mathrm{ad}(A(t)))$ defined using functional calculus.
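The Duhamel formula of part (i) can be verified numerically with SciPy's matrix exponential and an adaptive array-valued quadrature (the matrix sizes and the value of $t$ below are our own test choices):

```python
import numpy as np
from scipy.integrate import quad_vec
from scipy.linalg import expm

rng = np.random.default_rng(4)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
t = 0.7

# e^{tB} = e^{tA} + int_0^t e^{(t-s)A} (B - A) e^{sB} ds
integral, _ = quad_vec(lambda s: expm((t - s) * A) @ (B - A) @ expm(s * B), 0, t)
assert np.allclose(expm(t * B), expm(t * A) + integral)
```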
We remark that further manipulation of (iv) of the above exercise using the fundamental theorem of calculus eventually leads to the Baker-Campbell-Hausdorff-Dynkin formula, as discussed in this previous blog post.
Exercise 5 Let $A, B$ be positive definite $n \times n$ matrices, and let $Y$ be an $n \times n$ matrix. Show that there is a unique solution $X$ to the Sylvester equation

$$AX + XB = Y,$$

which is given by the formula

$$X = \int_0^\infty e^{-sA} Y e^{-sB}\, ds.$$
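Here is a quick numerical confirmation of the integral formula, compared against SciPy's direct Sylvester solver; the positive definiteness of $A$ and $B$ guarantees the decay of the integrand, and the test matrices are arbitrary:

```python
import numpy as np
from scipy.integrate import quad_vec
from scipy.linalg import expm, solve_sylvester

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4)); A = M @ M.T + np.eye(4)   # positive definite
N = rng.standard_normal((4, 4)); B = N @ N.T + np.eye(4)
Y = rng.standard_normal((4, 4))

X, _ = quad_vec(lambda s: expm(-s * A) @ Y @ expm(-s * B), 0, np.inf)
assert np.allclose(A @ X + X @ B, Y)                 # solves the Sylvester equation
assert np.allclose(X, solve_sylvester(A, B, Y))      # agrees with the direct solver
```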
In the above examples we had applied the fundamental theorem of calculus along linear curves $\gamma(t) = (1-t)x + ty$. However, it is sometimes better to use other curves. For instance, the circular arc $\gamma(t) = \cos(\pi t/2)\, x + \sin(\pi t/2)\, y$ can be useful, particularly if $x$ and $y$ are "orthogonal" or "independent" in some sense; a good example of this is the proof by Maurey and Pisier of the gaussian concentration inequality, given in Theorem 8 of this previous blog post. In a similar vein, if one wishes to compare a scalar random variable $X$ of mean zero and variance one with a Gaussian random variable $G$ of mean zero and variance one, it can be useful to introduce the intermediate random variables $\gamma(t) = \cos(\pi t/2)\, X + \sin(\pi t/2)\, G$ (where $X$ and $G$ are independent); note that these variables have mean zero and variance one, and after coupling them together appropriately they evolve by the Ornstein-Uhlenbeck process, which has many useful properties. For instance, one can use these ideas to establish monotonicity formulae for entropy; see e.g. this paper of Courtade for an example of this and further references. More generally, one can exploit curves $\gamma(t)$ that flow according to some geometrically natural ODE or PDE; several examples of this occur famously in Perelman's proof of the Poincaré conjecture via Ricci flow, discussed for instance in this previous set of lecture notes.
In some cases, it is difficult to compute the difference $F(B) - F(A)$ or the derivative $\nabla F$ directly, but one can instead proceed by implicit differentiation, or some variant thereof. Consider for instance the matrix inversion map $F(A) := A^{-1}$ (defined on the open dense subset of $n \times n$ matrices consisting of invertible matrices). If one wants to compute $F(B) - F(A)$ for $B$ close to $A$, one can temporarily write this difference as $E$, thus

$$B^{-1} = A^{-1} + E.$$

Multiplying both sides on the left by $B$ to eliminate the $B^{-1}$ term, and on the right by $A$ to eliminate the $A^{-1}$ term, one obtains

$$A = B + BEA,$$

and thus on reversing these steps we arrive at the basic identity

$$B^{-1} - A^{-1} = B^{-1} (A - B) A^{-1}. \qquad (6)$$
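Identity (6) in NumPy form, on a pair of random (comfortably invertible) test matrices of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)
B = rng.standard_normal((5, 5)) + 5 * np.eye(5)
Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
assert np.allclose(Bi - Ai, Bi @ (A - B) @ Ai)   # identity (6)
```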
For instance, if $A, B$ are $n \times n$ matrices, and we consider the resolvents

$$R_A(z) := (A - z)^{-1}; \quad R_B(z) := (B - z)^{-1},$$

then we have the resolvent identity

$$R_B(z) - R_A(z) = R_B(z) (A - B) R_A(z) \qquad (7)$$

as long as $z$ does not lie in the spectrum of $A$ or $B$ (for instance, if $A$, $B$ are self-adjoint then one can take $z$ to be any strictly complex number). One can iterate this identity to obtain

$$R_B(z) = \sum_{j=0}^{k-1} R_A(z) \big( (A-B) R_A(z) \big)^j + R_B(z) \big( (A-B) R_A(z) \big)^k$$

for any natural number $k$; in particular, if $(A-B) R_A(z)$ has operator norm less than one, one has the Neumann series

$$R_B(z) = \sum_{j=0}^{\infty} R_A(z) \big( (A-B) R_A(z) \big)^j.$$
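A small numerical illustration of the Neumann series: for a self-adjoint $A$, a strictly complex $z$, and a small perturbation $B$ of $A$, the partial sums converge rapidly to $R_B(z)$ (the sizes of the perturbation and the truncation are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                              # self-adjoint, so z = 3i is safe
B = A + 0.05 * rng.standard_normal((5, 5))     # small perturbation of A
z = 3j
RA = np.linalg.inv(A - z * np.eye(5))
RB = np.linalg.inv(B - z * np.eye(5))

assert np.linalg.norm((A - B) @ RA, 2) < 1     # contraction: series converges
term, series = RA.copy(), RA.copy()
for _ in range(60):
    term = term @ (A - B) @ RA                 # R_A ((A-B) R_A)^j
    series = series + term
assert np.allclose(series, RB)
```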
Similarly, if $A(t)$ is a family of invertible matrices that depends in a continuously differentiable fashion on a time variable $t$, then by implicitly differentiating the identity

$$A(t) A(t)^{-1} = 1$$

in $t$ using the product rule, we obtain

$$A'(t) A(t)^{-1} + A(t) \frac{d}{dt} A(t)^{-1} = 0$$

and hence

$$\frac{d}{dt} A(t)^{-1} = -A(t)^{-1} A'(t) A(t)^{-1}$$

(this identity may also be easily derived from (6)). One can then use the fundamental theorem of calculus to obtain variants of (6); for instance, by using the curve $\gamma(t) = (1-t)A + tB$ we arrive at

$$B^{-1} - A^{-1} = -\int_0^1 ((1-t)A + tB)^{-1} (B-A) ((1-t)A + tB)^{-1}\, dt,$$

assuming that the curve stays entirely within the set of invertible matrices. While this identity may seem more complicated than (6), it is more symmetric, which conveys some advantages. For instance, using this identity it is easy to see that if $A, B$ are positive definite with $A > B$ in the sense of positive definite matrices (that is, $A - B$ is positive definite), then $B^{-1} > A^{-1}$. (Try to prove this using (6) instead!)
Exercise 6 If $A$ is an invertible $n \times n$ matrix and $u, v$ are $n$-dimensional vectors, establish the Sherman-Morrison formula

$$(A + t u v^T)^{-1} = A^{-1} - \frac{t A^{-1} u v^T A^{-1}}{1 + t v^T A^{-1} u}$$

whenever $t$ is a scalar such that $1 + t v^T A^{-1} u$ is non-zero. (See also this previous blog post for more discussion of these sorts of identities.)
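The Sherman-Morrison formula checked on random data (all the test quantities below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)
u, v = rng.standard_normal((5, 1)), rng.standard_normal((5, 1))
t = 0.9
Ai = np.linalg.inv(A)
denom = 1 + t * (v.T @ Ai @ u).item()          # assumed non-zero here
lhs = np.linalg.inv(A + t * u @ v.T)
assert np.allclose(lhs, Ai - t * (Ai @ u) @ (v.T @ Ai) / denom)
```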
One can use the Cauchy integral formula to extend these identities to other functions of matrices. For instance, if $f: {\bf C} \to {\bf C}$ is an entire function, and $\gamma$ is a counterclockwise contour that goes around the spectrum of both $A$ and $B$, then we have

$$f(A) = \frac{-1}{2\pi i} \int_\gamma f(z) R_A(z)\, dz$$

and similarly

$$f(B) = \frac{-1}{2\pi i} \int_\gamma f(z) R_B(z)\, dz,$$

and hence by (7) one has

$$f(B) - f(A) = \frac{-1}{2\pi i} \int_\gamma f(z) R_B(z) (A-B) R_A(z)\, dz;$$

similarly, if $A(t)$ depends on $t$ in a continuously differentiable fashion, then

$$\frac{d}{dt} f(A(t)) = \frac{1}{2\pi i} \int_\gamma f(z) (A(t) - z)^{-1} A'(t) (A(t) - z)^{-1}\, dz$$

as long as $\gamma$ goes around the spectrum of $A(t)$.
Exercise 7 If $A(t)$ is an $n \times n$ matrix depending continuously differentiably on $t$, and $f$ is an entire function, establish the tracial chain rule

$$\frac{d}{dt} \hbox{tr}\, f(A(t)) = \hbox{tr}\big( f'(A(t)) A'(t) \big).$$
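A finite-difference check of the tracial chain rule, taking the entire function $f = \exp$ (so that $f' = \exp$ as well) and the curve $A(t) = A_0 + t A_1$ as a test case of our own choosing:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(9)
A0, A1 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
trace_f = lambda t: np.trace(expm(A0 + t * A1))   # tr f(A(t)) with f = exp

h = 1e-6
numeric = (trace_f(h) - trace_f(-h)) / (2 * h)    # central difference at t = 0
exact = np.trace(expm(A0) @ A1)                   # tr( f'(A(0)) A'(0) )
assert np.isclose(numeric, exact, rtol=1e-4)
```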
In a similar vein, given that the logarithm function is the antiderivative of the reciprocal, one can express the matrix logarithm $\log A$ of a positive definite matrix $A$ by the fundamental theorem of calculus identity

$$\log A = \int_0^\infty \big( (s+1)^{-1} - (s+A)^{-1} \big)\, ds \qquad (8)$$

(with the constant term $(s+1)^{-1}$ needed to prevent a logarithmic divergence in the integral). Differentiating, we see that if $A(t)$ is a family of positive definite matrices depending continuously differentiably on $t$, then

$$\frac{d}{dt} \log A(t) = \int_0^\infty (s + A(t))^{-1} A'(t) (s + A(t))^{-1}\, ds.$$

This can be used for instance to show that $\log$ is a monotone increasing function, in the sense that $\log A > \log B$ whenever $A > B$ in the sense of positive definite matrices. One can of course integrate this formula to obtain some formulae for the difference $\log A - \log B$ of the logarithm of two positive definite matrices $A, B$.
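The representation (8) can be compared against SciPy's matrix logarithm; the integrand decays like $O(1/s^2)$, so adaptive quadrature over $[0,\infty)$ converges (the test matrix and tolerance below are arbitrary):

```python
import numpy as np
from scipy.integrate import quad_vec
from scipy.linalg import logm

rng = np.random.default_rng(10)
M = rng.standard_normal((4, 4))
A = M @ M.T + np.eye(4)                  # positive definite
I = np.eye(4)

integrand = lambda s: I / (s + 1) - np.linalg.inv(s * I + A)
integral, _ = quad_vec(integrand, 0, np.inf)
assert np.allclose(integral, logm(A), atol=1e-6)
```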
Comparing the square roots $A^{1/2}, B^{1/2}$ of two positive definite matrices $A, B$ is trickier; there are multiple ways to proceed. One approach is to use contour integration as before (but one has to take some care to avoid branch cuts of the square root). Another is to express the square root in terms of exponentials via the formula

$$A^{1/2} = \frac{1}{\Gamma(-1/2)} \int_0^\infty (e^{-sA} - 1)\, s^{-1/2}\, \frac{ds}{s},$$

where $\Gamma$ is the gamma function; this formula can be verified by first diagonalising $A$ to reduce to the scalar case and using the definition of the Gamma function. Then one has

$$A^{1/2} - B^{1/2} = \frac{1}{\Gamma(-1/2)} \int_0^\infty (e^{-sA} - e^{-sB})\, s^{-1/2}\, \frac{ds}{s}$$

and one can use some of the previous identities to control $e^{-sA} - e^{-sB}$. This is pretty messy though. A third way to proceed is via implicit differentiation. If for instance $A(t)$ is a family of positive definite matrices depending continuously differentiably on $t$, we can differentiate the identity

$$A(t) = A(t)^{1/2} A(t)^{1/2}$$

to obtain

$$A'(t) = A(t)^{1/2} \frac{d}{dt} A(t)^{1/2} + \Big( \frac{d}{dt} A(t)^{1/2} \Big) A(t)^{1/2}.$$

This can for instance be solved using Exercise 5 to obtain

$$\frac{d}{dt} A(t)^{1/2} = \int_0^\infty e^{-s A(t)^{1/2}} A'(t) e^{-s A(t)^{1/2}}\, ds,$$

and this can in turn be integrated to obtain a formula for $A^{1/2} - B^{1/2}$. This is again a rather messy formula, but it does at least demonstrate that the square root is a monotone increasing function on positive definite matrices: $A > B$ implies $A^{1/2} > B^{1/2}$.
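The monotonicity of the matrix square root can be spot-checked on random positive definite pairs with $A > B$ (the construction of the test pairs is arbitrary):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(11)
for _ in range(100):
    M = rng.standard_normal((4, 4)); B = M @ M.T + 0.1 * np.eye(4)
    N = rng.standard_normal((4, 4)); A = B + N @ N.T + 0.1 * np.eye(4)  # A - B > 0
    gap = np.real(sqrtm(A) - sqrtm(B))
    assert np.linalg.eigvalsh((gap + gap.T) / 2).min() > 0   # A^{1/2} - B^{1/2} > 0
```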
Several of the above identities for matrices can be (carefully) extended to operators on Hilbert spaces provided that they are sufficiently well behaved (in particular, if they have a good functional calculus, and if various spectral hypotheses are obeyed). We will not attempt to do so here, however.
It occurred to me recently that the mathematical blog medium may be a good venue not just for expository “short stories” on mathematical concepts or results, but also for more technical discussions of individual mathematical “tricks”, which would otherwise not be significant enough to warrant a publication-length (and publication-quality) article. So I thought today that I would discuss the amplification trick in harmonic analysis and combinatorics (and in particular, in the study of estimates); this trick takes an established estimate involving an arbitrary object (such as a function f), and obtains a stronger (or amplified) estimate by transforming the object in a well-chosen manner (often involving some new parameters) into a new object, applying the estimate to that new object, and seeing what that estimate says about the original object (after optimising the parameters or taking a limit). The amplification trick works particularly well for estimates which enjoy some sort of symmetry on one side of the estimate that is not represented on the other side; indeed, it can be viewed as a way to “arbitrage” differing amounts of symmetry between the left- and right-hand sides of an estimate. It can also be used in the contrapositive, amplifying a weak counterexample to an estimate into a strong counterexample. This trick also sheds some light as to why dimensional analysis works; an estimate which is not dimensionally consistent can often be amplified into a stronger estimate which is dimensionally consistent; in many cases, this new estimate is so strong that it cannot in fact be true, and thus dimensionally inconsistent inequalities tend to be either false or inefficient, which is why we rarely see them. (More generally, any inequality on which a group acts on either the left or right-hand side can often be “decomposed” into the “isotypic components” of the group action, either by the amplification trick or by other related tools, such as Fourier analysis.)
The amplification trick is a deceptively simple one, but it can become particularly powerful when one is arbitraging an unintuitive symmetry, such as symmetry under tensor powers. Indeed, the “tensor power trick”, which can eliminate constants and even logarithms in an almost magical manner, can lead to some interesting proofs of sharp inequalities, which are difficult to establish by more direct means.
The most familiar example of the amplification trick in action is probably the textbook proof of the Cauchy-Schwarz inequality

$$|\langle v, w \rangle| \leq \|v\| \|w\| \qquad (1)$$

for vectors $v, w$ in a complex Hilbert space. To prove this inequality, one might start by exploiting the obvious inequality

$$\|v - w\|^2 \geq 0 \qquad (2)$$

but after expanding everything out, one only gets the weaker inequality

$$\hbox{Re}\, \langle v, w \rangle \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2. \qquad (3)$$
Now (3) is weaker than (1) for two reasons; the left-hand side is smaller, and the right-hand side is larger (thanks to the arithmetic mean-geometric mean inequality). However, we can amplify (3) by arbitraging some symmetry imbalances. Firstly, observe that the phase rotation symmetry $v \mapsto e^{i\theta} v$ preserves the RHS of (3) but not the LHS. We exploit this by replacing $v$ by $e^{i\theta} v$ in (3) for some phase $\theta$ to be chosen later, to obtain

$$\hbox{Re}\, e^{i\theta} \langle v, w \rangle \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2.$$

Now we are free to choose $\theta$ at will (as long as it is real, of course), so it is natural to choose $\theta$ to optimise the inequality, which in this case means to make the left-hand side as large as possible. This is achieved by choosing $e^{i\theta}$ to cancel the phase of $\langle v, w \rangle$, and we obtain

$$|\langle v, w \rangle| \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2. \qquad (4)$$
This is closer to (1); we have fixed the left-hand side, but the right-hand side is still too weak. But we can amplify further, by exploiting an imbalance in a different symmetry, namely the homogenisation symmetry $(v, w) \mapsto (\lambda v, \frac{1}{\lambda} w)$ for a scalar $\lambda > 0$, which preserves the left-hand side but not the right. Inserting this transform into (4) we conclude that

$$|\langle v, w \rangle| \leq \frac{\lambda^2}{2} \|v\|^2 + \frac{1}{2\lambda^2} \|w\|^2,$$

where $\lambda > 0$ is at our disposal to choose. We can optimise in $\lambda$ by minimising the right-hand side, and indeed one easily sees that the minimum (or infimum, if one of $v$ and $w$ vanishes) is $\|v\| \|w\|$ (which is achieved when $\lambda = (\|w\|/\|v\|)^{1/2}$ when $v, w$ are non-zero, or in an asymptotic limit $\lambda \to 0$ or $\lambda \to \infty$ in the degenerate cases), and so we have amplified our way to the Cauchy-Schwarz inequality (1). [See also this discussion by Tim Gowers on the Cauchy-Schwarz inequality.]
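To see the final optimisation concretely, here is a small numerical illustration (with arbitrary random test vectors): at the optimal choice of $\lambda$ the right-hand side $\frac{\lambda^2}{2}\|v\|^2 + \frac{1}{2\lambda^2}\|w\|^2$ collapses to $\|v\| \|w\|$, which indeed dominates $|\langle v, w \rangle|$:

```python
import numpy as np

rng = np.random.default_rng(12)
v = rng.standard_normal(8) + 1j * rng.standard_normal(8)
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)
nv, nw = np.linalg.norm(v), np.linalg.norm(w)

lam = np.sqrt(nw / nv)                            # the optimal choice of lambda
bound = lam**2 * nv**2 / 2 + nw**2 / (2 * lam**2)
assert np.isclose(bound, nv * nw)                 # optimised bound = ||v|| ||w||
assert np.abs(np.vdot(v, w)) <= bound + 1e-12     # Cauchy-Schwarz holds
```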