You are currently browsing the category archive for the ‘tricks’ category.

In this previous blog post I noted the following easy application of Cauchy-Schwarz:

Lemma 1 (Van der Corput inequality) Let ${v,u_1,\dots,u_n}$ be unit vectors in a Hilbert space ${H}$. Then

$\displaystyle (\sum_{i=1}^n |\langle v, u_i \rangle_H|)^2 \leq \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle_H|.$

Proof: The left-hand side may be written as ${|\langle v, \sum_{i=1}^n \epsilon_i u_i \rangle_H|^2}$ for some unit complex numbers ${\epsilon_i}$. By Cauchy-Schwarz we have

$\displaystyle |\langle v, \sum_{i=1}^n \epsilon_i u_i \rangle_H|^2 \leq \langle \sum_{i=1}^n \epsilon_i u_i, \sum_{j=1}^n \epsilon_j u_j \rangle_H$

and the claim now follows from the triangle inequality. $\Box$

As a corollary, correlation becomes transitive in a statistical sense (even though it is not transitive in an absolute sense):

Corollary 2 (Statistical transitivity of correlation) Let ${v,u_1,\dots,u_n}$ be unit vectors in a Hilbert space ${H}$ such that ${|\langle v,u_i \rangle_H| \geq \delta}$ for all ${i=1,\dots,n}$ and some ${0 < \delta \leq 1}$. Then we have ${|\langle u_i, u_j \rangle_H| \geq \delta^2/2}$ for at least ${\delta^2 n^2/2}$ of the pairs ${(i,j) \in \{1,\dots,n\}^2}$.

Proof: From the lemma, we have

$\displaystyle \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle_H| \geq \delta^2 n^2.$

The contribution of those ${i,j}$ with ${|\langle u_i, u_j \rangle_H| < \delta^2/2}$ is at most ${\delta^2 n^2/2}$, and all the remaining summands are at most ${1}$, giving the claim. $\Box$

One drawback with this corollary is that it does not tell us which pairs ${u_i,u_j}$ correlate. In particular, if the vector ${v}$ also correlates with a separate collection ${w_1,\dots,w_n}$ of unit vectors, the pairs ${(i,j)}$ for which ${u_i,u_j}$ correlate may have no intersection whatsoever with the pairs in which ${w_i,w_j}$ correlate (except of course on the diagonal ${i=j}$ where they must correlate).

While working on an ongoing research project, I recently found that there is a very simple way to get around the latter problem by exploiting the tensor power trick:

Corollary 3 (Simultaneous statistical transitivity of correlation) Let ${v, u^k_i}$ be unit vectors in a Hilbert space for ${i=1,\dots,n}$ and ${k=1,\dots,K}$ such that ${|\langle v, u^k_i \rangle_H| \geq \delta_k}$ for all ${i=1,\dots,n}$, ${k=1,\dots,K}$ and some ${0 < \delta_k \leq 1}$. Then there are at least ${(\delta_1 \dots \delta_K)^2 n^2/2}$ pairs ${(i,j) \in \{1,\dots,n\}^2}$ such that ${\prod_{k=1}^K |\langle u^k_i, u^k_j \rangle_H| \geq (\delta_1 \dots \delta_K)^2/2}$. In particular (by Cauchy-Schwarz) we have ${|\langle u^k_i, u^k_j \rangle_H| \geq (\delta_1 \dots \delta_K)^2/2}$ for all ${k}$.

Proof: Apply Corollary 2 to the unit vectors ${v^{\otimes K}}$ and ${u^1_i \otimes \dots \otimes u^K_i}$, ${i=1,\dots,n}$ in the tensor power Hilbert space ${H^{\otimes K}}$. $\Box$

It is surprisingly difficult to obtain even a qualitative version of the above conclusion (namely, if ${v}$ correlates with all of the ${u^k_i}$, then there are many pairs ${(i,j)}$ for which ${u^k_i}$ correlates with ${u^k_j}$ for all ${k}$ simultaneously) without some version of the tensor power trick. For instance, even the powerful Szemerédi regularity lemma, when applied to the set of pairs ${i,j}$ for which one has correlation of ${u^k_i}$, ${u^k_j}$ for a single ${i,j}$, does not seem to be sufficient. However, there is a reformulation of the argument using the Schur product theorem as a substitute for (or really, a disguised version of) the tensor power trick. For simplicity of notation let us just work with real Hilbert spaces to illustrate the argument. We start with the identity

$\displaystyle \langle u^k_i, u^k_j \rangle_H = \langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H + \langle \pi(u^k_i), \pi(u^k_j) \rangle_H$

where ${\pi}$ is the orthogonal projection to the complement of ${v}$. This implies a Gram matrix inequality

$\displaystyle (\langle u^k_i, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ (\langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ 0$

for each ${k}$ where ${A \succ B}$ denotes the claim that ${A-B}$ is positive semi-definite. By the Schur product theorem, we conclude that

$\displaystyle (\prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ (\prod_{k=1}^K \langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H)_{1 \leq i,j \leq n}$

and hence for a suitable choice of signs ${\epsilon_1,\dots,\epsilon_n}$,

$\displaystyle \sum_{1 \leq i, j \leq n} \epsilon_i \epsilon_j \prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H \geq \delta_1^2 \dots \delta_K^2 n^2.$

One now argues as in the proof of Corollary 2.

A separate application of tensor powers to amplify correlations was also noted in this previous blog post giving a cheap version of the Kabatjanskii-Levenstein bound, but this seems to not be directly related to this current application.

In one of the earliest posts on this blog, I talked about the ability to “arbitrage” a disparity of symmetry in an inequality, and in particular to “amplify” such an inequality into a stronger one. (The principle can apply to other mathematical statements than inequalities, with the “hypothesis” and “conclusion” of that statement generally playing the role of the “right-hand side” and “left-hand side” of an inequality, but for sake of discussion I will restrict attention here to inequalities.) One can formalise this principle as follows. Many inequalities in analysis can be expressed in the form

$\displaystyle A(f) \leq B(f) \ \ \ \ \ (1)$

for all ${f}$ in some space ${X}$ (in many cases ${X}$ will be a function space, and ${f}$ a function in that space), where ${A(f)}$ and ${B(f)}$ are some functionals of ${f}$ (that is to say, real-valued functions of ${f}$). For instance, ${B(f)}$ might be some function space norm of ${f}$ (e.g. an ${L^p}$ norm), and ${A(f)}$ might be some function space norm of some transform of ${f}$. In addition, we assume we have some group ${G}$ of symmetries ${T: X \rightarrow X}$ acting on the underlying space. For instance, if ${X}$ is a space of functions on some spatial domain, the group might consist of translations (e.g. ${Tf(x) = f(x-h)}$ for some shift ${h}$), or perhaps dilations with some normalisation (e.g. ${Tf(x) = \frac{1}{\lambda^\alpha} f(\frac{x}{\lambda})}$ for some dilation factor ${\lambda > 0}$ and some normalisation exponent ${\alpha \in {\bf R}}$, which can be thought of as the dimensionality of length one is assigning to ${f}$). If we have

$\displaystyle A(Tf) = A(f)$

for all symmetries ${T \in G}$ and all ${f \in X}$, we say that ${A}$ is invariant with respect to the symmetries in ${G}$; otherwise, it is not.

Suppose we know that the inequality (1) holds for all ${f \in X}$, but that there is an imbalance of symmetry: either ${A}$ is ${G}$-invariant and ${B}$ is not, or vice versa. Suppose first that ${A}$ is ${G}$-invariant and ${B}$ is not. Substituting ${f}$ by ${Tf}$ in (1) and taking infima, we can then amplify (1) to the stronger inequality

$\displaystyle A(f) \leq \inf_{T \in G} B(Tf).$

In particular, it is often the case that there is a way to send ${T}$ off to infinity in such a way that the functional ${B(Tf)}$ has a limit ${B_\infty(f)}$, in which case we obtain the amplification

$\displaystyle A(f) \leq B_\infty(f) \ \ \ \ \ (2)$

of (1). Note that these amplified inequalities will now be ${G}$-invariant on both sides (assuming that the way in which we take limits as ${T \rightarrow \infty}$ is itself ${G}$-invariant, which it often is in practice). Similarly, if ${B}$ is ${G}$-invariant but ${A}$ is not, we may instead amplify (1) to

$\displaystyle \sup_{T \in G} A(Tf) \leq B(f)$

and in particular (if ${A(Tf)}$ has a limit ${A_\infty(f)}$ as ${T \rightarrow \infty}$)

$\displaystyle A_\infty(f) \leq B(f). \ \ \ \ \ (3)$

If neither ${A(f)}$ nor ${B(f)}$ has a ${G}$-symmetry, one can still use the ${G}$-symmetry by replacing ${f}$ by ${Tf}$ and taking a limit to conclude that

$\displaystyle A_\infty(f) \leq B_\infty(f),$

though now this inequality is not obviously stronger than the original inequality (1) (for instance it could well be trivial). In some cases one can also average over ${G}$ instead of taking a limit as ${T \rightarrow \infty}$, thus averaging a non-invariant inequality into an invariant one.

As discussed in the previous post, this use of amplification gives rise to a general principle about inequalities: the most efficient inequalities are those in which the left-hand side and right-hand side enjoy the same symmetries. It is certainly possible to have true inequalities that have an imbalance of symmetry, but as shown above, such inequalities can always be amplified to more efficient and more symmetric inequalities. In the case when limits such as ${A_\infty}$ and ${B_\infty}$ exist, the limiting functionals ${A_\infty(f)}$ and ${B_\infty(f)}$ are often simpler in form, or more tractable analytically, than their non-limiting counterparts ${A(f)}$ and ${B(f)}$ (this is one of the main reasons why we take limits at infinity in the first place!), and so in many applications there is really no reason to use the weaker and more complicated inequality (1), when stronger, simpler, and more symmetric inequalities such as (2), (3) are available. Among other things, this explains why many of the most useful and natural inequalities one sees in analysis are dimensionally consistent.

One often tries to prove inequalities (1) by directly chaining together simpler inequalities. For instance, one might attempt to prove (1) by by first bounding ${A(f)}$ by some auxiliary quantity ${C(f)}$, and then bounding ${C(f)}$ by ${B(f)}$, thus obtaining (1) by chaining together two inequalities

$\displaystyle A(f) \leq C(f) \leq B(f). \ \ \ \ \ (4)$

A variant of the above principle then asserts that when proving inequalities by such direct methods, one should, whenever possible, try to maintain the symmetries that are present in both sides of the inequality. Why? Well, suppose that we ignored this principle and tried to prove (1) by establishing (4) for some ${C}$ that is not ${G}$-invariant. Assuming for sake of argument that (4) were actually true, we could amplify the first half ${A(f) \leq C(f)}$ of this inequality to conclude that

$\displaystyle A(f) \leq \inf_{T \in G} C(Tf)$

and also amplify the second half ${C(f) \leq B(f)}$ of the inequality to conclude that

$\displaystyle \sup_{T \in G} C(Tf) \leq B(f)$

and hence (4) amplifies to

$\displaystyle A(f) \leq \inf_{T \in G} C(Tf) \leq \sup_{T \in G} C(Tf) \leq B(f). \ \ \ \ \ (5)$

Let’s say for sake of argument that all the quantities involved here are positive numbers (which is often the case in analysis). Then we see in particular that

$\displaystyle \frac{\sup_{T \in G} C(Tf)}{\inf_{T \in G} C(Tf)} \leq \frac{B(f)}{A(f)}. \ \ \ \ \ (6)$

Informally, (6) asserts that in order for the strategy (4) used to prove (1) to work, the extent to which ${C}$ fails to be ${G}$-invariant cannot exceed the amount of “room” present in (1). In particular, when dealing with those “extremal” ${f}$ for which the left and right-hand sides of (1) are comparable to each other, one can only have a bounded amount of non-${G}$-invariance in the functional ${C}$. If ${C}$ fails so badly to be ${G}$-invariant that one does not expect the left-hand side of (6) to be at all bounded in such extremal situations, then the strategy of proving (1) using the intermediate quantity ${C}$ is doomed to failure – even if one has already produced some clever proof of one of the two inequalities ${A(f) \leq C(f)}$ or ${C(f) \leq B(f)}$ needed to make this strategy work. And even if it did work, one could amplify (4) to a simpler inequality

$\displaystyle A(f) \leq C_\infty(f) \leq B(f) \ \ \ \ \ (7)$

(assuming that the appropriate limit ${C_\infty(f) = \lim_{T \rightarrow \infty} C(Tf)}$ existed) which would likely also be easier to prove (one can take whatever proofs one had in mind of the inequalities in (4), conjugate them by ${T}$, and take a limit as ${T \rightarrow \infty}$ to extract a proof of (7)).

Here are some simple (but somewhat contrived) examples to illustrate these points. Suppose one wishes to prove the inequality

$\displaystyle xy \leq x^2 + y^2 \ \ \ \ \ (8)$

for all ${x,y>0}$. Both sides of this inequality are invariant with respect to interchanging ${x}$ with ${y}$, so the principle suggests that when proving this inequality directly, one should only use sub-inequalities that are also invariant with respect to this interchange. However, in this particular case there is enough “room” in the inequality that it is possible (though somewhat unnatural) to violate this principle. For instance, one could decide (for whatever reason) to start with the inequality

$\displaystyle 0 \leq (x - y/2)^2 = x^2 - xy + y^2/4$

to conclude that

$\displaystyle xy \leq x^2 + y^2/4$

and then use the obvious inequality ${x^2 + y^2/4 \leq x^2+y^2}$ to conclude the proof. Here, the intermediate quantity ${x^2 + y^2/4}$ is not invariant with respect to interchange of ${x}$ and ${y}$, but the failure is fairly mild (changing ${x}$ and ${y}$ only modifies the quantity ${x^2 + y^2/4}$ by a multiplicative factor of ${4}$ at most), and disappears completely in the most extremal case ${x=y}$, which helps explain why one could get away with using this quantity in the proof here. But it would be significantly harder (though still not impossible) to use non-symmetric intermediaries to prove the sharp version

$\displaystyle xy \leq \frac{x^2 + y^2}{2}$

of (8) (that is to say, the arithmetic mean-geometric mean inequality). Try it!

Similarly, consider the task of proving the triangle inequality

$\displaystyle |z+w| \leq |z| + |w| \ \ \ \ \ (9)$

for complex numbers ${z, w}$. One could try to leverage the triangle inequality ${|x+y| \leq |x| + |y|}$ for real numbers by using the crude estimate

$\displaystyle |z+w| \leq |\hbox{Re}(z+w)| + |\hbox{Im}(z+w)|$

and then use the real triangle inequality to obtain

$\displaystyle |\hbox{Re}(z+w)| \leq |\hbox{Re}(z)| + |\hbox{Re}(w)|$

and

$\displaystyle |\hbox{Im}(z+w)| \leq |\hbox{Im}(z)| + |\hbox{Im}(w)|$

and then finally use the inequalities

$\displaystyle |\hbox{Re}(z)|, |\hbox{Im}(z)| \leq |z| \ \ \ \ \ (10)$

and

$\displaystyle |\hbox{Re}(w)|, |\hbox{Im}(w)| \leq |w| \ \ \ \ \ (11)$

but when one puts this all together at the end of the day, one loses a factor of two:

$\displaystyle |z+w| \leq 2(|z| + |w|).$

One can “blame” this loss on the fact that while the original inequality (9) was invariant with respect to phase rotation ${(z,w) \mapsto (e^{i\theta} z, e^{i\theta} w)}$, the intermediate expressions we tried to use when proving it were not, leading to inefficient estimates. One can try to be smarter than this by using Pythagoras’ theorem ${|z|^2 = |\hbox{Re}(z)|^2 + |\hbox{Im}(z)|^2}$; this reduces the loss from ${2}$ to ${\sqrt{2}}$ but does not eliminate it completely, which is to be expected as one is still using non-invariant estimates in the proof. But one can remove the loss completely by using amplification; see the previous blog post for details (we also give a reformulation of this amplification below).

Here is a slight variant of the above example. Suppose that you had just learned in class to prove the triangle inequality

$\displaystyle (\sum_{n=1}^\infty |a_n+b_n|^2)^{1/2} \leq (\sum_{n=1}^\infty |a_n|^2)^{1/2} + (\sum_{n=1}^\infty |b_n|^2)^{1/2} \ \ \ \ \ (12)$

for (say) real square-summable sequences ${(a_n)_{n=1}^\infty}$, ${(b_n)_{n=1}^\infty}$, and was tasked to conclude the corresponding inequality

$\displaystyle (\sum_{n \in {\bf Z}} |a_n+b_n|^2)^{1/2} \leq (\sum_{n \in {\bf Z}} |a_n|^2)^{1/2} + (\sum_{n \in {\bf Z}} |b_n|^2)^{1/2} \ \ \ \ \ (13)$

for doubly infinite square-summable sequences ${(a_n)_{n \in {\bf Z}}, (b_n)_{n \in {\bf Z}}}$. The quickest way to do this is of course to exploit a bijection between the natural numbers ${1,2,\dots}$ and the integers, but let us say for sake of argument that one was unaware of such a bijection. One could then proceed instead by splitting the integers into the positive integers and the non-positive integers, and use (12) on each component separately; this is very similar to the strategy of proving (9) by splitting a complex number into real and imaginary parts, and will similarly lose a factor of ${2}$ or ${\sqrt{2}}$. In this case, one can “blame” this loss on the abandonment of translation invariance: both sides of the inequality (13) are invariant with respect to shifting the sequences ${(a_n)_{n \in {\bf Z}}}$, ${(b_n)_{n \in {\bf Z}}}$ by some shift ${h}$ to arrive at ${(a_{n-h})_{n \in {\bf Z}}, (b_{n-h})_{n \in {\bf Z}}}$, but the intermediate quantities caused by splitting the integers into two subsets are not invariant. Another way of thinking about this is that the splitting of the integers gives a privileged role to the origin ${n=0}$, whereas the inequality (13) treats all values of ${n}$ equally thanks to the translation invariance, and so using such a splitting is unnatural and not likely to lead to optimal estimates. On the other hand, one can deduce (13) from (12) by sending this symmetry to infinity; indeed, after applying a shift to (12) we see that

$\displaystyle (\sum_{n=-N}^\infty |a_n+b_n|^2)^{1/2} \leq (\sum_{n=-N}^\infty |a_n|^2)^{1/2} + (\sum_{n=-N}^\infty |b_n|^2)^{1/2}$

for any ${N}$, and on sending ${N \rightarrow \infty}$ we obtain (13) (one could invoke the monotone convergence theorem here to justify the limit, though in this case it is simple enough that one can just use first principles).

Note that the principle of preserving symmetry only applies to direct approaches to proving inequalities such as (1). There is a complementary approach, discussed for instance in this previous post, which is to spend the symmetry to place the variable ${f}$ “without loss of generality” in a “normal form”, “convenient coordinate system”, or a “good gauge”. Abstractly: suppose that there is some subset ${Y}$ of ${X}$ with the property that every ${f \in X}$ can be expressed in the form ${f = Tg}$ for some ${T \in G}$ and ${g \in Y}$ (that is to say, ${X = GY}$). Then, if one wishes to prove an inequality (1) for all ${f \in X}$, and one knows that both sides ${A(f), B(f)}$ of this inequality are ${G}$-invariant, then it suffices to check (1) just for those ${f}$ in ${Y}$, as this together with the ${G}$-invariance will imply the same inequality (1) for all ${f}$ in ${GY=X}$. By restricting to those ${f}$ in ${Y}$, one has given up (or spent) the ${G}$-invariance, as the set ${Y}$ will in typical not be preserved by the group action ${G}$. But by the same token, by eliminating the invariance, one also eliminates the prohibition on using non-invariant proof techniques, and one is now free to use a wider range of inequalities in order to try to establish (1). Of course, such inequalities should make crucial use of the restriction ${f \in Y}$, for if they did not, then the arguments would work in the more general setting ${f \in X}$, and then the previous principle would again kick in and warn us that the use of non-invariant inequalities would be inefficient. Thus one should “spend” the symmetry wisely to “buy” a restriction ${f \in Y}$ that will be of maximal utility in calculations (for instance by setting as many annoying factors and terms in one’s analysis to be ${0}$ or ${1}$ as possible).

As a simple example of this, let us revisit the complex triangle inequality (9). As already noted, both sides of this inequality are invariant with respect to the phase rotation symmetry ${(z,w) \mapsto (e^{i\theta} z, e^{i\theta} w)}$. This seems to limit one to using phase-rotation-invariant techniques to establish the inequality, in particular ruling out the use of real and imaginary parts as discussed previously. However, we can instead spend the phase rotation symmetry to restrict to a special class of ${z}$ and ${w}$. It turns out that the most efficient way to spend the symmetry is to achieve the normalisation of ${z+w}$ being a nonnegative real; this is of course possible since any complex number ${z+w}$ can be turned into a nonnegative real by multiplying by an appropriate phase ${e^{i\theta}}$. Once ${z+w}$ is a nonnegative real, the imaginary part disappears and we have

$\displaystyle |z+w| = \hbox{Re}(z+w) = \hbox{Re}(z) + \hbox{Re}(w),$

and the triangle inequality (9) is now an immediate consequence of (10), (11). (But note that if one had unwisely spent the symmetry to normalise, say, ${z}$ to be a non-negative real, then one is no closer to establishing (9) than before one had spent the symmetry.)

A few days ago, I was talking with Ed Dunne, who is currently the Executive Editor of Mathematical Reviews (and in particular with its online incarnation at MathSciNet).  At the time, I was mentioning how laborious it was for me to create a BibTeX file for dozens of references by using MathSciNet to locate each reference separately, and to export each one to BibTeX format.  He then informed me that underneath to every MathSciNet reference there was a little link to add the reference to a Clipboard, and then one could export the entire Clipboard at once to whatever format one wished.  In retrospect, this was a functionality of the site that had always been visible, but I had never bothered to explore it, and now I can populate a BibTeX file much more quickly.

This made me realise that perhaps there are many other useful features of popular mathematical tools out there that only a few users actually know about, so I wanted to create a blog post to encourage readers to post their own favorite tools, or features of tools, that are out there, often in plain sight, but not always widely known.  Here are a few that I was able to recall from my own workflow (though for some of them it took quite a while to consciously remember, since I have been so used to them for so long!):

1. TeX for Gmail.  A Chrome plugin that lets one write TeX symbols in emails sent through Gmail (by writing the LaTeX code and pressing a hotkey, usually F8).
2. Boomerang for Gmail.  Another Chrome plugin for Gmail, which does two main things.  Firstly, it can “boomerang” away an email from your inbox to return at some specified later date (e.g. one week from today).  I found this useful to declutter my inbox regarding mail that I needed to act on in the future, but was unable to deal with at present due to travel, or because I was waiting for some other piece of data to arrive first.   Secondly, it can send out email with some specified delay (e.g. by tomorrow morning), giving one time to cancel the email if necessary.  (Thanks to Julia Wolf for telling me about Boomerang!)
3. Which just reminds me, the Undo Send feature on Gmail has saved me from embarrassment a few times (but one has to set it up first; it delays one’s emails by a short period, such as 30 seconds, during which time it is possible to undo the email).
4. LaTeX rendering in Inkscape.  I used to use plain text to write mathematical formulae in my images, which always looked terrible.  It took me years to realise that Inkscape had the functionality to compile LaTeX within it.
5. Bookmarks in TeXnicCenter.  I probably only use a tiny fraction of the functionality that TeXnicCenter offers, but one little feature I quite like is the ability to bookmark a portion of the TeX file (e.g. the bibliography at the end, or the place one is currently editing) with one hot-key (Ctrl-F2) and then one can cycle quickly between one bookmarked location and another with some further hot-keys (F2 and shift-F2).
6. Actually, there are a number of Windows keyboard shortcuts that are worth experimenting with (and similarly for Mac or Linux systems of course).
7. Detexify has been the quickest way for me to locate the TeX code for a symbol that I couldn’t quite remember (or when hunting for a new symbol that would roughly be shaped like something I had in mind).
8. For writing LaTeX on my blog, I use Luca Trevisan’s LaTeX to WordPress Python script (together with a little batch file I wrote to actually run the python script).
9. Using the camera on my phone to record a blackboard computation or a slide (or the wifi password at a conference centre, or any other piece of information that is written or displayed really).  If the phone is set up properly this can be far quicker than writing it down with pen and paper.  (I guess this particular trick is now quite widely used, but I still see people surprised when someone else uses a phone instead of a pen to record things.)
10. Using my online calendar not only to record scheduled future appointments, but also to block out time to do specific tasks (e.g. reserve 2-3pm at Tuesday to read paper X, or do errand Y).  I have found I am able to get a much larger fraction of my “to do” list done on days in which I had previously blocked out such specific chunks of time, as opposed to days in which I had left several hours unscheduled (though sometimes those hours were also very useful for finding surprising new things to do that I had not anticipated).  (I learned of this little trick online somewhere, but I have long since lost the original reference.)

Anyway, I would very much like to hear what other little tools or features other readers have found useful in their work.

This is going to be a somewhat experimental post. In class, I mentioned that when solving the type of homework problems encountered in a graduate real analysis course, there are really only about a dozen or so basic tricks and techniques that are used over and over again. But I had not thought to actually try to make these tricks explicit, so I am going to try to compile here a list of some of these techniques here. But this list is going to be far from exhaustive; perhaps if other recent students of real analysis would like to share their own methods, then I encourage you to do so in the comments (even – or especially – if the techniques are somewhat vague and general in nature).

(See also the Tricki for some general mathematical problem solving tips.  Once this page matures somewhat, I might migrate it to the Tricki.)

Note: the tricks occur here in no particular order, reflecting the stream-of-consciousness way in which they were arrived at.  Indeed, this list will be extended on occasion whenever I find another trick that can be added to this list.

This is a technical post inspired by separate conversations with Jim Colliander and with Soonsik Kwon on the relationship between two techniques used to control non-radiating solutions to dispersive nonlinear equations, namely the “double Duhamel trick” and the “in/out decomposition”. See for instance these lecture notes of Killip and Visan for a survey of these two techniques and other related methods in the subject. (I should caution that this post is likely to be unintelligible to anyone not already working in this area.)

For sake of discussion we shall focus on solutions to a nonlinear Schrödinger equation

$\displaystyle iu_t + \Delta u = F(u)$

and we will not concern ourselves with the specific regularity of the solution ${u}$, or the specific properties of the nonlinearity ${F}$ here. We will also not address the issue of how to justify the formal computations being performed here.

Solutions to this equation enjoy the forward Duhamel formula

$\displaystyle u(t) = e^{i(t-t_0)\Delta} u(t_0) - i \int_{t_0}^t e^{i(t-t')\Delta} F(u(t'))\ dt'$

for times ${t}$ to the future of ${t_0}$ in the lifespan of the solution, as well as the backward Duhamel formula

$\displaystyle u(t) = e^{i(t-t_1)\Delta} u(t_1) + i \int_t^{t_1} e^{i(t-t')\Delta} F(u(t'))\ dt'$

for all times ${t}$ to the past of ${t_1}$ in the lifespan of the solution. The first formula asserts that the solution at a given time is determined by the initial state and by the immediate past, while the second formula is the time reversal of the first, asserting that the solution at a given time is determined by the final state and the immediate future. These basic causal formulae are the foundation of the local theory of these equations, and in particular play an instrumental role in establishing local well-posedness for these equations. In this local theory, the main philosophy is to treat the homogeneous (or linear) term ${e^{i(t-t_0)\Delta} u(t_0)}$ or ${e^{i(t-t_1)\Delta} u(t_1)}$ as the main term, and the inhomogeneous (or nonlinear, or forcing) integral term as an error term.

The situation is reversed when one turns to the global theory, and looks at the asymptotic behaviour of a solution as one approaches a limiting time ${T}$ (which can be infinite if one has global existence, or finite if one has finite time blowup). After a suitable rescaling, the linear portion of the solution often disappears from view, leaving one with an asymptotic blowup profile solution which is non-radiating in the sense that the linear components of the Duhamel formulae vanish, thus

$\displaystyle u(t) = - i \int_{t_0}^t e^{i(t-t')\Delta} F(u(t'))\ dt' \ \ \ \ \ (1)$

and

$\displaystyle u(t) = i \int_t^{t_1} e^{i(t-t')\Delta} F(u(t'))\ dt' \ \ \ \ \ (2)$

where ${t_0, t_1}$ are the endpoint times of existence. (This type of situation comes up for instance in the Kenig-Merle approach to critical regularity problems, by reducing to a minimal blowup solution which is almost periodic modulo symmetries, and hence non-radiating.) These types of non-radiating solutions are propelled solely by their own nonlinear self-interactions from the immediate past or immediate future; they are generalisations of “nonlinear bound states” such as solitons.

A key task is then to somehow combine the forward representation (1) and the backward representation (2) to obtain new information on ${u(t)}$ itself, that cannot be obtained from either representation alone; it seems that the immediate past and immediate future can collectively exert more control on the present than they each do separately. This type of problem can be abstracted as follows. Let ${\|u(t)\|_{Y_+}}$ be the infimal value of ${\|F_+\|_N}$ over all forward representations of ${u(t)}$ of the form

$\displaystyle u(t) = \int_{t_0}^t e^{i(t-t')\Delta} F_+(t') \ dt' \ \ \ \ \ (3)$

where ${N}$ is some suitable spacetime norm (e.g. a Strichartz-type norm), and similarly let ${\|u(t)\|_{Y_-}}$ be the infimal value of ${\|F_-\|_N}$ over all backward representations of ${u(t)}$ of the form

$\displaystyle u(t) = \int_{t}^{t_1} e^{i(t-t')\Delta} F_-(t') \ dt'. \ \ \ \ \ (4)$

Typically, one already has (or is willing to assume as a bootstrap hypothesis) control on ${F(u)}$ in the norm ${N}$, which gives control of ${u(t)}$ in the norms ${Y_+, Y_-}$. The task is then to use the control of both the ${Y_+}$ and ${Y_-}$ norm of ${u(t)}$ to gain control of ${u(t)}$ in a more conventional Hilbert space norm ${X}$, which is typically a Sobolev space such as ${H^s}$ or ${L^2}$.

One can use some classical functional analysis to clarify this situation. By the closed graph theorem, the above task is (morally, at least) equivalent to establishing an a priori bound of the form

$\displaystyle \| u \|_X \lesssim \|u\|_{Y_+} + \|u\|_{Y_-} \ \ \ \ \ (5)$

for all reasonable ${u}$ (e.g. test functions). The double Duhamel trick accomplishes this by establishing the stronger estimate

$\displaystyle |\langle u, v \rangle_X| \lesssim \|u\|_{Y_+} \|v\|_{Y_-} \ \ \ \ \ (6)$

for all reasonable ${u, v}$; note that setting ${u=v}$ and applying the arithmetic-geometric inequality then gives (5). The point is that if ${u}$ has a forward representation (3) and ${v}$ has a backward representation (4), then the inner product ${\langle u, v \rangle_X}$ can (formally, at least) be expanded as a double integral

$\displaystyle \int_{t_0}^t \int_{t}^{t_1} \langle e^{i(t''-t')\Delta} F_+(t'), e^{i(t''-t')\Delta} F_-(t') \rangle_X\ dt'' dt'.$

The dispersive nature of the linear Schrödinger equation often causes ${\langle e^{i(t''-t')\Delta} F_+(t'), e^{i(t''-t')\Delta} F_-(t') \rangle_X}$ to decay, especially in high dimensions. In high enough dimension (typically one needs five or higher dimensions, unless one already has some spacetime control on the solution), the decay is stronger than ${1/|t'-t''|^2}$, so that the integrand becomes absolutely integrable and one recovers (6).

Unfortunately it appears that estimates of the form (6) fail in low dimensions (for the type of norms ${N}$ that actually show up in applications); there is just too much interaction between past and future to hope for any reasonable control of this inner product. But one can try to obtain (5) by other means. By the Hahn-Banach theorem (and ignoring various issues related to reflexivity), (5) is equivalent to the assertion that every ${u \in X}$ can be decomposed as ${u = u_+ + u_-}$, where ${\|u_+\|_{Y_+^*} \lesssim \|u\|_X}$ and ${\|u_-\|_{Y_-^*} \lesssim \|v\|_X}$. Indeed once one has such a decomposition, one obtains (5) by computing the inner product of ${u}$ with ${u=u_++u_-}$ in ${X}$ in two different ways. One can also (morally at least) write ${\|u_+\|_{Y_+^*}}$ as ${\| e^{i(\cdot-t)\Delta} u_+\|_{N^*([t_0,t])}}$ and similarly write ${\|u_-\|_{Y_-^*}}$ as ${\| e^{i(\cdot-t)\Delta} u_-\|_{N^*([t,t_1])}}$

So one can dualise the task of proving (5) as that of obtaining a decomposition of an arbitrary initial state ${u}$ into two components ${u_+}$ and ${u_-}$, where the former disperses into the past and the latter disperses into the future under the linear evolution. We do not know how to achieve this type of task efficiently in general – and doing so would likely lead to a significant advance in the subject (perhaps one of the main areas in this topic where serious harmonic analysis is likely to play a major role). But in the model case of spherically symmetric data ${u}$, one can perform such a decomposition quite easily: one uses microlocal projections to set ${u_+}$ to be the “inward” pointing component of ${u}$, which propagates towards the origin in the future and away from the origin in the past, and ${u_-}$ to simimlarly be the “outward” component of ${u}$. As spherical symmetry significantly dilutes the amplitude of the solution (and hence the strength of the nonlinearity) away from the origin, this decomposition tends to work quite well for applications, and is one of the main reasons (though not the only one) why we have a global theory for low-dimensional nonlinear Schrödinger equations in the radial case, but not in general.

The in/out decomposition is a linear one, but the Hahn-Banach argument gives no reason why the decomposition needs to be linear. (Note that other well-known decompositions in analysis, such as the Fefferman-Stein decomposition of BMO, are necessarily nonlinear, a fact which is ultimately equivalent to the non-complemented nature of a certain subspace of a Banach space; see these lecture notes of mine and this old blog post for some discussion.) So one could imagine a sophisticated nonlinear decomposition as a general substitute for the in/out decomposition. See for instance this paper of Bourgain and Brezis for some of the subtleties of decomposition even in very classical function spaces such as ${H^{1/2}(R)}$. Alternatively, there may well be a third way to obtain estimates of the form (5) that do not require either decomposition or the double Duhamel trick; such a method may well clarify the relative relationship between past, present, and future for critical nonlinear dispersive equations, which seems to be a key aspect of the theory that is still only partially understood. (In particular, it seems that one needs a fairly strong decoupling of the present from both the past and the future to get the sort of elliptic-like regularity results that allow us to make further progress with such equations.)

In this post I would like to make some technical notes on a standard reduction used in the (Euclidean, maximal) Kakeya problem, known as the two ends reduction. This reduction (which takes advantage of the approximate scale-invariance of the Kakeya problem) was introduced by Wolff, and has since been used many times, both for the Kakeya problem and in other similar problems (e.g. by Jim Wright and myself to study curved Radon-like transforms). I was asked about it recently, so I thought I would describe the trick here. As an application I give a proof of the ${d=\frac{n+1}{2}}$ case of the Kakeya maximal conjecture.

From Tim Gowers’ blog comes the announcement that the Tricki – a wiki for various tricks and strategies for proving mathematical results – is now live.  (My own articles for the Tricki are also on this blog; also Ben Green has written up an article on using finite fields to prove results about infinite fields which is loosely based on my own post on the topic, which is in turn based on an article of Serre.)  It seems to already be growing at a reasonable rate, with many contributors.

Today I’d like to discuss (in the Tricks Wiki format) a fundamental trick in “soft” analysis, sometimes known as the “limiting argument” or “epsilon regularisation argument”.

Title: Give yourself an epsilon of room.

Quick description: You want to prove some statement $S_0$ about some object $x_0$ (which could be a number, a point, a function, a set, etc.).  To do so, pick a small $\varepsilon > 0$, and first prove a weaker statement $S_\varepsilon$ (which allows for “losses” which go to zero as $\varepsilon \to 0$) about some perturbed object $x_\varepsilon$.  Then, take limits $\varepsilon \to 0$.  Provided that the dependency and continuity of the weaker conclusion $S_\varepsilon$ on $\varepsilon$ are sufficiently controlled, and $x_\varepsilon$ is converging to $x_0$ in an appropriately strong sense, you will recover the original statement.

One can of course play a similar game when proving a statement $S_\infty$ about some object $X_\infty$, by first proving a weaker statement $S_N$ on some approximation $X_N$ to $X_\infty$ for some large parameter N, and then send $N \to \infty$ at the end.

General discussion: Here are some typical examples of a target statement $S_0$, and the approximating statements $S_\varepsilon$ that would converge to $S$:

 $S_0$ $S_\varepsilon$ $f(x_0) = g(x_0)$ $f(x_\varepsilon) = g(x_\varepsilon) + o(1)$ $f(x_0) \leq g(x_0)$ $f(x_\varepsilon) \leq g(x_\varepsilon) + o(1)$ $f(x_0) > 0$ $f(x_\varepsilon) \geq c - o(1)$ for some $c>0$ independent of $\varepsilon$ $f(x_0)$ is finite $f(x_\varepsilon)$ is bounded uniformly in $\varepsilon$ $f(x_0) \geq f(x)$ for all $x \in X$ (i.e. $x_0$ maximises f) $f(x_\varepsilon) \geq f(x)-o(1)$ for all $x \in X$ (i.e. $x_\varepsilon$ nearly maximises f) $f_n(x_0)$ converges as $n \to \infty$ $f_n(x_\varepsilon)$ fluctuates by at most o(1) for sufficiently large n $f_0$ is a measurable function $f_\varepsilon$ is a measurable function converging pointwise to $f_0$ $f_0$ is a continuous function $f_\varepsilon$ is an equicontinuous family of functions converging pointwise to $f_0$ OR $f_\varepsilon$ is continuous and converges (locally) uniformly to $f_0$ The event $E_0$ holds almost surely The event $E_\varepsilon$ holds with probability 1-o(1) The statement $P_0(x)$ holds for almost every x The statement $P_\varepsilon(x)$ holds for x outside of a set of measure o(1)

Of course, to justify the convergence of $S_\varepsilon$ to $S_0$, it is necessary that $x_\varepsilon$ converge to $x_0$ (or $f_\varepsilon$ converge to $f_0$, etc.) in a suitably strong sense. (But for the purposes of proving just upper bounds, such as $f(x_0) \leq M$, one can often get by with quite weak forms of convergence, thanks to tools such as Fatou’s lemma or the weak closure of the unit ball.)  Similarly, we need some continuity (or at least semi-continuity) hypotheses on the functions f, g appearing above.

It is also necessary in many cases that the control $S_\varepsilon$ on the approximating object $x_\varepsilon$ is somehow “uniform in $\varepsilon$“, although for “$\sigma$-closed” conclusions, such as measurability, this is not required. [It is important to note that it is only the final conclusion $S_\varepsilon$ on $x_\varepsilon$ that needs to have this uniformity in $\varepsilon$; one is permitted to have some intermediate stages in the derivation of $S_\varepsilon$ that depend on $\varepsilon$ in a non-uniform manner, so long as these non-uniformities cancel out or otherwise disappear at the end of the argument.]

By giving oneself an epsilon of room, one can evade a lot of familiar issues in soft analysis.  For instance, by replacing “rough”, “infinite-complexity”, “continuous”,  “global”, or otherwise “infinitary” objects $x_0$ with “smooth”, “finite-complexity”, “discrete”, “local”, or otherwise “finitary” approximants $x_\varepsilon$, one can finesse most issues regarding the justification of various formal operations (e.g. exchanging limits, sums, derivatives, and integrals).  [It is important to be aware, though, that any quantitative measure on how smooth, discrete, finite, etc. $x_\varepsilon$ should be expected to degrade in the limit $\varepsilon \to 0$, and so one should take extreme caution in using such quantitative measures to derive estimates that are uniform in $\varepsilon$.]  Similarly, issues such as whether the supremum $M := \sup \{ f(x): x \in X \}$ of a function on a set is actually attained by some maximiser $x_0$ become moot if one is willing to settle instead for an almost-maximiser $x_\varepsilon$, e.g. one which comes within an epsilon of that supremum M (or which is larger than $1/\varepsilon$, if M turns out to be infinite).  Last, but not least, one can use the epsilon room to avoid degenerate solutions, for instance by perturbing a non-negative function to be strictly positive, perturbing a non-strictly monotone function to be strictly monotone, and so forth.

To summarise: one can view the epsilon regularisation argument as a “loan” in which one borrows an epsilon here and there in order to be able to ignore soft analysis difficulties, and can temporarily be able to utilise estimates which are non-uniform in epsilon, but at the end of the day one needs to “pay back” the loan by establishing a final “hard analysis” estimate which is uniform in epsilon (or whose error terms decay to zero as epsilon goes to zero).

A variant: It may seem that the epsilon regularisation trick is useless if one is already in “hard analysis” situations when all objects are already “finitary”, and all formal computations easily justified.  However, there is an important variant of this trick which applies in this case: namely, instead of sending the epsilon parameter to zero, choose epsilon to be a sufficiently small (but not infinitesimally small) quantity, depending on other parameters in the problem, so that one can eventually neglect various error terms and to obtain a useful bound at the end of the day.  (For instance, any result proven using the Szemerédi regularity lemma is likely to be of this type.)  Since one is not sending epsilon to zero, not every term in the final bound needs to be uniform in epsilon, though for quantitative applications one still would like the dependencies on such parameters to be as favourable as possible.

Prerequisites: Graduate real analysis.  (Actually, this isn’t so much a prerequisite as it is a corequisite: the limiting argument plays a central role in many fundamental results in real analysis.)  Some examples also require some exposure to PDE.

Title: Use basic examples to calibrate exponents

Motivation: In the more quantitative areas of mathematics, such as analysis and combinatorics, one has to frequently keep track of a large number of exponents in one’s identities, inequalities, and estimates.  For instance, if one is studying a set of N elements, then many expressions that one is faced with will often involve some power $N^p$ of N; if one is instead studying a function f on a measure space X, then perhaps it is an $L^p$ norm $\|f\|_{L^p(X)}$ which will appear instead.  The exponent $p$ involved will typically evolve slowly over the course of the argument, as various algebraic or analytic manipulations are applied.  In some cases, the exact value of this exponent is immaterial, but at other times it is crucial to have the correct value of $p$ at hand.   One can (and should) of course carefully go through one’s arguments line by line to work out the exponents correctly, but it is all too easy to make a sign error or other mis-step at one of the lines, causing all the exponents on subsequent lines to be incorrect.  However, one can guard against this (and avoid some tedious line-by-line exponent checking) by continually calibrating these exponents at key junctures of the arguments by using basic examples of the object of study (sets, functions, graphs, etc.) as test cases.  This is a simple trick, but it lets one avoid many unforced errors with exponents, and also lets one compute more rapidly.

Quick description: When trying to quickly work out what an exponent p in an estimate, identity, or inequality should be without deriving that statement line-by-line, test that statement with a simple example which has non-trivial behaviour with respect to that exponent p, but trivial behaviour with respect to as many other components of that statement as one is able to manage.   The “non-trivial” behaviour should be parametrised by some very large or very small parameter.  By matching the dependence on this parameter on both sides of the estimate, identity, or inequality, one should recover p (or at least a good prediction as to what p should be).

General discussion: The test examples should be as basic as possible; ideally they should have trivial behaviour in all aspects except for one feature that relates to the exponent p that one is trying to calibrate, thus being only “barely” non-trivial.   When the object of study is a function, then (appropriately rescaled, or otherwise modified) bump functions are very typical test objects, as are Dirac masses, constant functions, Gaussians, or other functions that are simple and easy to compute with.  In additive combinatorics, when the object of study is a subset of a group, then subgroups, arithmetic progressions, or random sets are typical test objects.  In graph theory, typical examples of test objects include complete graphs, complete bipartite graphs, and random graphs. And so forth.

This trick is closely related to that of using dimensional analysis to recover exponents; indeed, one can view dimensional analysis as the special case of exponent calibration when using test objects which are non-trivial in one dimensional aspect (e.g. they exist at a single very large or very small length scale) but are otherwise of a trivial or “featureless” nature.   But the calibration trick is more general, as it can involve parameters (such as probabilities, angles, or eccentricities) which are not commonly associated with the physical concept of a dimension.  And personally, I find example-based calibration to be a much more satisfying (and convincing) explanation of an exponent than a calibration arising from formal dimensional analysis.

When one is trying to calibrate an inequality or estimate, one should try to pick a basic example which one expects to saturate that inequality or estimate, i.e. an example for which the inequality is close to being an equality.  Otherwise, one would only expect to obtain some partial information on the desired exponent p (e.g. a lower bound or an upper bound only).  Knowing the examples that saturate an estimate that one is trying to prove is also useful for several other reasons – for instance, it strongly suggests that any technique which is not efficient when applied to the saturating example, is unlikely to be strong enough to prove the estimate in general, thus eliminating fruitless approaches to a problem and (hopefully) refocusing one’s attention on those strategies which actually have a chance of working.

Calibration is best used for the type of quick-and-dirty calculations one uses when trying to rapidly map out an argument that one has roughly worked out already, but without precise details; in particular, I find it particularly useful when writing up a rapid prototype.  When the time comes to write out the paper in full detail, then of course one should instead carefully work things out line by line, but if all goes well, the exponents obtained in that process should match up with the preliminary guesses for those exponents obtained by calibration, which adds confidence that there are no exponent errors have been committed.

Quick description: If one wants to prove an inequality $X \leq Y$ for some non-negative quantities X, Y, but can only see how to prove a quasi-inequality $X \leq CY$ that loses a multiplicative constant C, then try to replace all objects involved in the problem by “tensor powers” of themselves and apply the quasi-inequality to those powers.  If all goes well, one can show that $X^M \leq C Y^M$ for all $M \geq 1$, with a constant C which is independent of M, which implies that $X \leq Y$ as desired by taking $M^{th}$ roots and then taking limits as $M \to \infty$.