This is going to be a somewhat experimental post. In class, I mentioned that when solving the type of homework problems encountered in a graduate real analysis course, there are really only about a dozen or so basic tricks and techniques that are used over and over again. But I had not thought to actually try to make these tricks explicit, so I am going to try to compile a list of some of these techniques here. This list is going to be far from exhaustive; if other recent students of real analysis would like to share their own methods, then I encourage you to do so in the comments (even – or especially – if the techniques are somewhat vague and general in nature).

(See also the Tricki for some general mathematical problem solving tips.  Once this page matures somewhat, I might migrate it to the Tricki.)

Note: the tricks occur here in no particular order, reflecting the stream-of-consciousness way in which they were arrived at.  Indeed, this list will be extended on occasion whenever I find another trick that can be added to this list.

This is a technical post inspired by separate conversations with Jim Colliander and with Soonsik Kwon on the relationship between two techniques used to control non-radiating solutions to dispersive nonlinear equations, namely the “double Duhamel trick” and the “in/out decomposition”. See for instance these lecture notes of Killip and Visan for a survey of these two techniques and other related methods in the subject. (I should caution that this post is likely to be unintelligible to anyone not already working in this area.)

For sake of discussion we shall focus on solutions to a nonlinear Schrödinger equation

$\displaystyle iu_t + \Delta u = F(u)$

and we will not concern ourselves with the specific regularity of the solution ${u}$, or the specific properties of the nonlinearity ${F}$ here. We will also not address the issue of how to justify the formal computations being performed here.

Solutions to this equation enjoy the forward Duhamel formula

$\displaystyle u(t) = e^{i(t-t_0)\Delta} u(t_0) - i \int_{t_0}^t e^{i(t-t')\Delta} F(u(t'))\ dt'$

for times ${t}$ to the future of ${t_0}$ in the lifespan of the solution, as well as the backward Duhamel formula

$\displaystyle u(t) = e^{i(t-t_1)\Delta} u(t_1) + i \int_t^{t_1} e^{i(t-t')\Delta} F(u(t'))\ dt'$

for all times ${t}$ to the past of ${t_1}$ in the lifespan of the solution. The first formula asserts that the solution at a given time is determined by the initial state and by the immediate past, while the second formula is the time reversal of the first, asserting that the solution at a given time is determined by the final state and the immediate future. These basic causal formulae are the foundation of the local theory of these equations, and in particular play an instrumental role in establishing local well-posedness for these equations. In this local theory, the main philosophy is to treat the homogeneous (or linear) term ${e^{i(t-t_0)\Delta} u(t_0)}$ or ${e^{i(t-t_1)\Delta} u(t_1)}$ as the main term, and the inhomogeneous (or nonlinear, or forcing) integral term as an error term.
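As a sanity check on the signs in the forward Duhamel formula, here is a minimal numerical sketch for a toy model. The model is an assumption made purely for illustration: a single Fourier mode, so that ${\Delta}$ acts as multiplication by ${-\omega}$ for a hypothetical frequency parameter `omega`, together with the cubic nonlinearity ${F(u) = |u|^2 u}$. The code integrates the resulting ODE and compares the endpoint value with the right-hand side of the forward formula.

```python
import cmath

def rk4_solve(omega, u0, t0, t1, steps):
    """Integrate u' = -i (omega*u + |u|^2 u) by classical RK4.

    This is the toy single-mode version of i u_t + Delta u = F(u), with
    Delta replaced by multiplication by -omega and F(u) = |u|^2 u
    (both choices are illustrative assumptions).
    """
    def rhs(u):
        return -1j * (omega * u + abs(u) ** 2 * u)

    h = (t1 - t0) / steps
    ts, us = [t0], [u0]
    u = u0
    for k in range(steps):
        k1 = rhs(u)
        k2 = rhs(u + 0.5 * h * k1)
        k3 = rhs(u + 0.5 * h * k2)
        k4 = rhs(u + h * k3)
        u = u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        ts.append(t0 + (k + 1) * h)
        us.append(u)
    return ts, us

def forward_duhamel(omega, ts, us):
    """Right-hand side of the forward Duhamel formula at the final time.

    The time integral is approximated by the trapezoid rule on the grid ts.
    """
    t0, t = ts[0], ts[-1]
    linear = cmath.exp(-1j * omega * (t - t0)) * us[0]
    vals = [cmath.exp(-1j * omega * (t - s)) * abs(u) ** 2 * u
            for s, u in zip(ts, us)]
    h = ts[1] - ts[0]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return linear - 1j * integral

ts, us = rk4_solve(omega=2.0, u0=0.5 + 0.3j, t0=0.0, t1=1.0, steps=2000)
duhamel_error = abs(us[-1] - forward_duhamel(2.0, ts, us))
```

In this toy model `duhamel_error` should be at the level of the quadrature error. The backward formula can be checked the same way by integrating from ${t_1}$ down to ${t}$.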

The situation is reversed when one turns to the global theory, and looks at the asymptotic behaviour of a solution as one approaches a limiting time ${T}$ (which can be infinite if one has global existence, or finite if one has finite time blowup). After a suitable rescaling, the linear portion of the solution often disappears from view, leaving one with an asymptotic blowup profile solution which is non-radiating in the sense that the linear components of the Duhamel formulae vanish, thus

$\displaystyle u(t) = - i \int_{t_0}^t e^{i(t-t')\Delta} F(u(t'))\ dt' \ \ \ \ \ (1)$

and

$\displaystyle u(t) = i \int_t^{t_1} e^{i(t-t')\Delta} F(u(t'))\ dt' \ \ \ \ \ (2)$

where ${t_0, t_1}$ are the endpoint times of existence. (This type of situation comes up for instance in the Kenig-Merle approach to critical regularity problems, by reducing to a minimal blowup solution which is almost periodic modulo symmetries, and hence non-radiating.) These types of non-radiating solutions are propelled solely by their own nonlinear self-interactions from the immediate past or immediate future; they are generalisations of “nonlinear bound states” such as solitons.

A key task is then to somehow combine the forward representation (1) and the backward representation (2) to obtain new information on ${u(t)}$ itself, that cannot be obtained from either representation alone; it seems that the immediate past and immediate future can collectively exert more control on the present than they each do separately. This type of problem can be abstracted as follows. Let ${\|u(t)\|_{Y_+}}$ be the infimal value of ${\|F_+\|_N}$ over all forward representations of ${u(t)}$ of the form

$\displaystyle u(t) = \int_{t_0}^t e^{i(t-t')\Delta} F_+(t') \ dt' \ \ \ \ \ (3)$

where ${N}$ is some suitable spacetime norm (e.g. a Strichartz-type norm), and similarly let ${\|u(t)\|_{Y_-}}$ be the infimal value of ${\|F_-\|_N}$ over all backward representations of ${u(t)}$ of the form

$\displaystyle u(t) = \int_{t}^{t_1} e^{i(t-t')\Delta} F_-(t') \ dt'. \ \ \ \ \ (4)$

Typically, one already has (or is willing to assume as a bootstrap hypothesis) control on ${F(u)}$ in the norm ${N}$, which gives control of ${u(t)}$ in the norms ${Y_+, Y_-}$. The task is then to use the control of both the ${Y_+}$ and ${Y_-}$ norm of ${u(t)}$ to gain control of ${u(t)}$ in a more conventional Hilbert space norm ${X}$, which is typically a Sobolev space such as ${H^s}$ or ${L^2}$.

One can use some classical functional analysis to clarify this situation. By the closed graph theorem, the above task is (morally, at least) equivalent to establishing an a priori bound of the form

$\displaystyle \| u \|_X \lesssim \|u\|_{Y_+} + \|u\|_{Y_-} \ \ \ \ \ (5)$

for all reasonable ${u}$ (e.g. test functions). The double Duhamel trick accomplishes this by establishing the stronger estimate

$\displaystyle |\langle u, v \rangle_X| \lesssim \|u\|_{Y_+} \|v\|_{Y_-} \ \ \ \ \ (6)$

for all reasonable ${u, v}$; note that setting ${u=v}$ and applying the arithmetic mean-geometric mean inequality then gives (5). The point is that if ${u}$ has a forward representation (3) and ${v}$ has a backward representation (4), then the inner product ${\langle u, v \rangle_X}$ can (formally, at least) be expanded as a double integral

$\displaystyle \int_{t_0}^t \int_{t}^{t_1} \langle e^{i(t''-t')\Delta} F_+(t'), F_-(t'') \rangle_X\ dt'' dt'.$

(Here we have used the unitarity of the Schrödinger propagator on ${X}$ to move both propagators onto ${F_+}$.) The dispersive nature of the linear Schrödinger equation often causes ${\langle e^{i(t''-t')\Delta} F_+(t'), F_-(t'') \rangle_X}$ to decay, especially in high dimensions. In high enough dimension (typically one needs five or higher dimensions, unless one already has some spacetime control on the solution), the decay is stronger than ${1/|t'-t''|^2}$, so that the integrand becomes absolutely integrable and one recovers (6).
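To see why the threshold decay rate is ${1/|t'-t''|^2}$, here is a minimal sketch which evaluates the model double integral of ${(a+b)^{-\alpha}}$ over ${1 \leq a, b \leq T}$, where ${a = t - t'}$ and ${b = t'' - t}$, so that ${a + b = t'' - t'}$. Several things here are assumptions for illustration only: the decay exponent ${\alpha = d/2}$ (the standard dispersive decay rate in ${d}$ dimensions), the truncation to ${a, b \geq 1}$ (short-time interactions are assumed to be handled by other means), and the function name.

```python
def model_double_integral(alpha, T):
    """Closed form of the integral of (a + b)**(-alpha) over 1 <= a, b <= T.

    Uses the second antiderivative G(s) = s**(2 - alpha) / ((alpha - 1)(alpha - 2)),
    so alpha must differ from 1 and 2. Illustrative model only.
    """
    def G(s):
        return s ** (2.0 - alpha) / ((alpha - 1.0) * (alpha - 2.0))

    return G(2.0 * T) - 2.0 * G(1.0 + T) + G(2.0)

# alpha = d/2: d = 5 gives 2.5 (stays bounded as T grows),
# d = 3 gives 1.5 (grows like sqrt(T)).
bounded = [model_double_integral(2.5, T) for T in (1e1, 1e3, 1e6)]
growing = [model_double_integral(1.5, T) for T in (1e1, 1e3, 1e6)]
```

In this model the long-time contribution is finite exactly when ${\alpha > 2}$, which with ${\alpha = d/2}$ corresponds to ${d \geq 5}$.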

Unfortunately it appears that estimates of the form (6) fail in low dimensions (for the type of norms ${N}$ that actually show up in applications); there is just too much interaction between past and future to hope for any reasonable control of this inner product. But one can try to obtain (5) by other means. By the Hahn-Banach theorem (and ignoring various issues related to reflexivity), (5) is equivalent to the assertion that every ${u \in X}$ can be decomposed as ${u = u_+ + u_-}$, where ${\|u_+\|_{Y_+^*} \lesssim \|u\|_X}$ and ${\|u_-\|_{Y_-^*} \lesssim \|u\|_X}$. Indeed once one has such a decomposition, one obtains (5) by computing the inner product of ${u}$ with ${u=u_++u_-}$ in ${X}$ in two different ways. One can also (morally at least) write ${\|u_+\|_{Y_+^*}}$ as ${\| e^{i(\cdot-t)\Delta} u_+\|_{N^*([t_0,t])}}$ and similarly write ${\|u_-\|_{Y_-^*}}$ as ${\| e^{i(\cdot-t)\Delta} u_-\|_{N^*([t,t_1])}}$.

So one can dualise the task of proving (5) as that of obtaining a decomposition of an arbitrary initial state ${u}$ into two components ${u_+}$ and ${u_-}$, where the former disperses into the past and the latter disperses into the future under the linear evolution. We do not know how to achieve this type of task efficiently in general – and doing so would likely lead to a significant advance in the subject (perhaps one of the main areas in this topic where serious harmonic analysis is likely to play a major role). But in the model case of spherically symmetric data ${u}$, one can perform such a decomposition quite easily: one uses microlocal projections to set ${u_+}$ to be the “inward” pointing component of ${u}$, which propagates towards the origin in the future and away from the origin in the past, and ${u_-}$ to similarly be the “outward” component of ${u}$. As spherical symmetry significantly dilutes the amplitude of the solution (and hence the strength of the nonlinearity) away from the origin, this decomposition tends to work quite well for applications, and is one of the main reasons (though not the only one) why we have a global theory for low-dimensional nonlinear Schrödinger equations in the radial case, but not in general.

The in/out decomposition is a linear one, but the Hahn-Banach argument gives no reason why the decomposition needs to be linear. (Note that other well-known decompositions in analysis, such as the Fefferman-Stein decomposition of BMO, are necessarily nonlinear, a fact which is ultimately equivalent to the non-complemented nature of a certain subspace of a Banach space; see these lecture notes of mine and this old blog post for some discussion.) So one could imagine a sophisticated nonlinear decomposition as a general substitute for the in/out decomposition. See for instance this paper of Bourgain and Brezis for some of the subtleties of decomposition even in very classical function spaces such as ${H^{1/2}(R)}$. Alternatively, there may well be a third way to obtain estimates of the form (5) that does not require either decomposition or the double Duhamel trick; such a method may well clarify the relationship between past, present, and future for critical nonlinear dispersive equations, which seems to be a key aspect of the theory that is still only partially understood. (In particular, it seems that one needs a fairly strong decoupling of the present from both the past and the future to get the sort of elliptic-like regularity results that allow us to make further progress with such equations.)

In this post I would like to make some technical notes on a standard reduction used in the (Euclidean, maximal) Kakeya problem, known as the two ends reduction. This reduction (which takes advantage of the approximate scale-invariance of the Kakeya problem) was introduced by Wolff, and has since been used many times, both for the Kakeya problem and in other similar problems (e.g. by Jim Wright and myself to study curved Radon-like transforms). I was asked about it recently, so I thought I would describe the trick here. As an application I give a proof of the ${d=\frac{n+1}{2}}$ case of the Kakeya maximal conjecture.

From Tim Gowers’ blog comes the announcement that the Tricki – a wiki for various tricks and strategies for proving mathematical results – is now live.  (My own articles for the Tricki are also on this blog; also Ben Green has written up an article on using finite fields to prove results about infinite fields which is loosely based on my own post on the topic, which is in turn based on an article of Serre.)  It seems to already be growing at a reasonable rate, with many contributors.

Today I’d like to discuss (in the Tricks Wiki format) a fundamental trick in “soft” analysis, sometimes known as the “limiting argument” or “epsilon regularisation argument”.

Title: Give yourself an epsilon of room.

Quick description: You want to prove some statement $S_0$ about some object $x_0$ (which could be a number, a point, a function, a set, etc.).  To do so, pick a small $\varepsilon > 0$, and first prove a weaker statement $S_\varepsilon$ (which allows for “losses” which go to zero as $\varepsilon \to 0$) about some perturbed object $x_\varepsilon$.  Then, take limits $\varepsilon \to 0$.  Provided that the dependency and continuity of the weaker conclusion $S_\varepsilon$ on $\varepsilon$ are sufficiently controlled, and $x_\varepsilon$ is converging to $x_0$ in an appropriately strong sense, you will recover the original statement.

One can of course play a similar game when proving a statement $S_\infty$ about some object $X_\infty$, by first proving a weaker statement $S_N$ on some approximation $X_N$ to $X_\infty$ for some large parameter N, and then send $N \to \infty$ at the end.

General discussion: Here are some typical examples of a target statement $S_0$, and the approximating statements $S_\varepsilon$ that would converge to $S_0$:

| $S_0$ | $S_\varepsilon$ |
| --- | --- |
| $f(x_0) = g(x_0)$ | $f(x_\varepsilon) = g(x_\varepsilon) + o(1)$ |
| $f(x_0) \leq g(x_0)$ | $f(x_\varepsilon) \leq g(x_\varepsilon) + o(1)$ |
| $f(x_0) > 0$ | $f(x_\varepsilon) \geq c - o(1)$ for some $c>0$ independent of $\varepsilon$ |
| $f(x_0)$ is finite | $f(x_\varepsilon)$ is bounded uniformly in $\varepsilon$ |
| $f(x_0) \geq f(x)$ for all $x \in X$ (i.e. $x_0$ maximises f) | $f(x_\varepsilon) \geq f(x)-o(1)$ for all $x \in X$ (i.e. $x_\varepsilon$ nearly maximises f) |
| $f_n(x_0)$ converges as $n \to \infty$ | $f_n(x_\varepsilon)$ fluctuates by at most o(1) for sufficiently large n |
| $f_0$ is a measurable function | $f_\varepsilon$ is a measurable function converging pointwise to $f_0$ |
| $f_0$ is a continuous function | $f_\varepsilon$ is an equicontinuous family of functions converging pointwise to $f_0$ OR $f_\varepsilon$ is continuous and converges (locally) uniformly to $f_0$ |
| The event $E_0$ holds almost surely | The event $E_\varepsilon$ holds with probability 1-o(1) |
| The statement $P_0(x)$ holds for almost every x | The statement $P_\varepsilon(x)$ holds for x outside of a set of measure o(1) |

Of course, to justify the convergence of $S_\varepsilon$ to $S_0$, it is necessary that $x_\varepsilon$ converge to $x_0$ (or $f_\varepsilon$ converge to $f_0$, etc.) in a suitably strong sense. (But for the purposes of proving just upper bounds, such as $f(x_0) \leq M$, one can often get by with quite weak forms of convergence, thanks to tools such as Fatou’s lemma or the weak closure of the unit ball.)  Similarly, we need some continuity (or at least semi-continuity) hypotheses on the functions f, g appearing above.

It is also necessary in many cases that the control $S_\varepsilon$ on the approximating object $x_\varepsilon$ is somehow “uniform in $\varepsilon$”, although for “$\sigma$-closed” conclusions, such as measurability, this is not required. [It is important to note that it is only the final conclusion $S_\varepsilon$ on $x_\varepsilon$ that needs to have this uniformity in $\varepsilon$; one is permitted to have some intermediate stages in the derivation of $S_\varepsilon$ that depend on $\varepsilon$ in a non-uniform manner, so long as these non-uniformities cancel out or otherwise disappear at the end of the argument.]

By giving oneself an epsilon of room, one can evade a lot of familiar issues in soft analysis.  For instance, by replacing “rough”, “infinite-complexity”, “continuous”,  “global”, or otherwise “infinitary” objects $x_0$ with “smooth”, “finite-complexity”, “discrete”, “local”, or otherwise “finitary” approximants $x_\varepsilon$, one can finesse most issues regarding the justification of various formal operations (e.g. exchanging limits, sums, derivatives, and integrals).  [It is important to be aware, though, that any quantitative measure of the smoothness, discreteness, finiteness, etc. of $x_\varepsilon$ should be expected to degrade in the limit $\varepsilon \to 0$, and so one should take extreme caution in using such quantitative measures to derive estimates that are uniform in $\varepsilon$.]  Similarly, issues such as whether the supremum $M := \sup \{ f(x): x \in X \}$ of a function on a set is actually attained by some maximiser $x_0$ become moot if one is willing to settle instead for an almost-maximiser $x_\varepsilon$, e.g. one which comes within an epsilon of that supremum M (or which is larger than $1/\varepsilon$, if M turns out to be infinite).  Last, but not least, one can use the epsilon room to avoid degenerate solutions, for instance by perturbing a non-negative function to be strictly positive, perturbing a non-strictly monotone function to be strictly monotone, and so forth.
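To make the almost-maximiser point concrete, here is a minimal sketch with a hypothetical example chosen for illustration: the function $f(x) = x/(1+x)$ on $X = [0,\infty)$, whose supremum $M = 1$ is not attained. The names `f` and `almost_maximiser` are assumptions of this sketch, not notation from the discussion above.

```python
def f(x):
    """Illustrative function on [0, infinity); its supremum 1 is never attained."""
    return x / (1.0 + x)

def almost_maximiser(eps):
    """Return x_eps >= 0 with f(x_eps) = 1 - eps (up to rounding), for 0 < eps < 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    # Solving x / (1 + x) = 1 - eps for x gives x = (1 - eps) / eps.
    return (1.0 - eps) / eps

# No exact maximiser exists, but an epsilon-maximiser exists for each eps.
gaps = [1.0 - f(almost_maximiser(eps)) for eps in (0.1, 0.01, 0.001)]
```

Note that $x_\varepsilon$ itself escapes to infinity as $\varepsilon \to 0$, which is a small instance of the warning above that quantitative features of $x_\varepsilon$ tend to degrade in the limit.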

To summarise: one can view the epsilon regularisation argument as a “loan” in which one borrows an epsilon here and there in order to be able to ignore soft analysis difficulties, and can temporarily be able to utilise estimates which are non-uniform in epsilon, but at the end of the day one needs to “pay back” the loan by establishing a final “hard analysis” estimate which is uniform in epsilon (or whose error terms decay to zero as epsilon goes to zero).

A variant: It may seem that the epsilon regularisation trick is useless if one is already in “hard analysis” situations when all objects are already “finitary”, and all formal computations easily justified.  However, there is an important variant of this trick which applies in this case: namely, instead of sending the epsilon parameter to zero, choose epsilon to be a sufficiently small (but not infinitesimally small) quantity, depending on other parameters in the problem, so that one can eventually neglect various error terms and obtain a useful bound at the end of the day.  (For instance, any result proven using the Szemerédi regularity lemma is likely to be of this type.)  Since one is not sending epsilon to zero, not every term in the final bound needs to be uniform in epsilon, though for quantitative applications one still would like the dependencies on such parameters to be as favourable as possible.
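A toy instance of this variant (a sketch under hypothetical assumptions, not taken from any particular application): suppose a finite non-negative quantity $X$ is known to obey $X \leq \varepsilon C X + K/\varepsilon$ for every $0 < \varepsilon \leq 1$, where $C \geq 1/2$ and $K > 0$ are given constants. Rather than sending $\varepsilon \to 0$, one fixes $\varepsilon$ in terms of $C$ so that the error term $\varepsilon C X$ can be absorbed into the left-hand side:

```latex
X \le \varepsilon C X + \frac{K}{\varepsilon}
\quad\text{with}\quad \varepsilon := \frac{1}{2C}
\;\Longrightarrow\;
X \le \tfrac{1}{2} X + 2CK
\;\Longrightarrow\;
X \le 4CK.
```

The term $K/\varepsilon$ is not uniform in $\varepsilon$, but this is harmless here since $\varepsilon$ is never sent to zero; the price is the dependence of the final bound on $C$.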

Prerequisites: Graduate real analysis.  (Actually, this isn’t so much a prerequisite as it is a corequisite: the limiting argument plays a central role in many fundamental results in real analysis.)  Some examples also require some exposure to PDE.

Title: Use basic examples to calibrate exponents

Motivation: In the more quantitative areas of mathematics, such as analysis and combinatorics, one has to frequently keep track of a large number of exponents in one’s identities, inequalities, and estimates.  For instance, if one is studying a set of N elements, then many expressions that one is faced with will often involve some power $N^p$ of N; if one is instead studying a function f on a measure space X, then perhaps it is an $L^p$ norm $\|f\|_{L^p(X)}$ which will appear instead.  The exponent $p$ involved will typically evolve slowly over the course of the argument, as various algebraic or analytic manipulations are applied.  In some cases, the exact value of this exponent is immaterial, but at other times it is crucial to have the correct value of $p$ at hand.   One can (and should) of course carefully go through one’s arguments line by line to work out the exponents correctly, but it is all too easy to make a sign error or other mis-step at one of the lines, causing all the exponents on subsequent lines to be incorrect.  However, one can guard against this (and avoid some tedious line-by-line exponent checking) by continually calibrating these exponents at key junctures of the arguments by using basic examples of the object of study (sets, functions, graphs, etc.) as test cases.  This is a simple trick, but it lets one avoid many unforced errors with exponents, and also lets one compute more rapidly.

Quick description: When trying to quickly work out what an exponent p in an estimate, identity, or inequality should be without deriving that statement line-by-line, test that statement with a simple example which has non-trivial behaviour with respect to that exponent p, but trivial behaviour with respect to as many other components of that statement as one is able to manage.   The “non-trivial” behaviour should be parametrised by some very large or very small parameter.  By matching the dependence on this parameter on both sides of the estimate, identity, or inequality, one should recover p (or at least a good prediction as to what p should be).

General discussion: The test examples should be as basic as possible; ideally they should have trivial behaviour in all aspects except for one feature that relates to the exponent p that one is trying to calibrate, thus being only “barely” non-trivial.   When the object of study is a function, then (appropriately rescaled, or otherwise modified) bump functions are very typical test objects, as are Dirac masses, constant functions, Gaussians, or other functions that are simple and easy to compute with.  In additive combinatorics, when the object of study is a subset of a group, then subgroups, arithmetic progressions, or random sets are typical test objects.  In graph theory, typical examples of test objects include complete graphs, complete bipartite graphs, and random graphs. And so forth.

This trick is closely related to that of using dimensional analysis to recover exponents; indeed, one can view dimensional analysis as the special case of exponent calibration when using test objects which are non-trivial in one dimensional aspect (e.g. they exist at a single very large or very small length scale) but are otherwise of a trivial or “featureless” nature.   But the calibration trick is more general, as it can involve parameters (such as probabilities, angles, or eccentricities) which are not commonly associated with the physical concept of a dimension.  And personally, I find example-based calibration to be a much more satisfying (and convincing) explanation of an exponent than a calibration arising from formal dimensional analysis.
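As an illustration of calibration with a single length scale, here is a minimal sketch that recovers the exponent $q$ in a Sobolev-type estimate $\|f\|_{L^q({\bf R}^d)} \leq C \|\nabla f\|_{L^p({\bf R}^d)}$ by testing it on rescaled bump functions $f_\lambda(x) := f(x/\lambda)$ and matching powers of $\lambda$ on both sides. The choice of estimate and the function names are assumptions made for this example.

```python
from fractions import Fraction

def norm_scaling_exponent(p, d, derivatives=0):
    """Power of lam in || D^k f_lam ||_{L^p(R^d)} for f_lam(x) = f(x / lam).

    The change of variables x -> lam * x contributes lam**(d/p), and each
    derivative contributes a factor of lam**(-1).
    """
    return Fraction(d) / Fraction(p) - derivatives

def calibrate_sobolev_q(p, d):
    """Solve d/q = d/p - 1 for q; return None if no finite positive q exists."""
    target = norm_scaling_exponent(p, d, derivatives=1)
    if target <= 0:
        return None  # scaling rules out any finite exponent q
    return Fraction(d) / target

q_3d = calibrate_sobolev_q(2, 3)   # Fraction(6, 1): the H^1 -> L^6 exponent in R^3
q_2d = calibrate_sobolev_q(2, 2)   # None: scaling alone gives no finite q
```

Matching exponents in this way only shows which $q$ is consistent with scaling; it does not by itself prove the estimate.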

When one is trying to calibrate an inequality or estimate, one should try to pick a basic example which one expects to saturate that inequality or estimate, i.e. an example for which the inequality is close to being an equality.  Otherwise, one would only expect to obtain some partial information on the desired exponent p (e.g. a lower bound or an upper bound only).  Knowing the examples that saturate an estimate that one is trying to prove is also useful for several other reasons – for instance, it strongly suggests that any technique which is not efficient when applied to the saturating example, is unlikely to be strong enough to prove the estimate in general, thus eliminating fruitless approaches to a problem and (hopefully) refocusing one’s attention on those strategies which actually have a chance of working.

Calibration is best used for the type of quick-and-dirty calculations one uses when trying to rapidly map out an argument that one has roughly worked out already, but without precise details; in particular, I find it particularly useful when writing up a rapid prototype.  When the time comes to write out the paper in full detail, then of course one should instead carefully work things out line by line, but if all goes well, the exponents obtained in that process should match up with the preliminary guesses for those exponents obtained by calibration, which adds confidence that no exponent errors have been committed.

As many readers may already know, my good friend and fellow mathematical blogger Tim Gowers, having wrapped up work on the Princeton Companion to Mathematics (which I believe is now in press), has begun another mathematical initiative, namely a “Tricks Wiki” to act as a repository for mathematical tricks and techniques.    Tim has already started the ball rolling with several seed articles on his own blog, and asked me to also contribute some articles.  (As I understand it, these articles will be migrated to the Wiki in a few months, once it is fully set up, and then they will evolve with edits and contributions by anyone who wishes to pitch in, in the spirit of Wikipedia; in particular, articles are not intended to be permanently authored or signed by any single contributor.)

So today I’d like to start by extracting some material from an old post of mine on “Amplification, arbitrage, and the tensor power trick” (as well as from some of the comments), and converting it to the Tricks Wiki format, while also taking the opportunity to add a few more examples.

Title: The tensor power trick

Quick description: If one wants to prove an inequality $X \leq Y$ for some non-negative quantities X, Y, but can only see how to prove a quasi-inequality $X \leq CY$ that loses a multiplicative constant C, then try to replace all objects involved in the problem by “tensor powers” of themselves and apply the quasi-inequality to those powers.  If all goes well, one can show that $X^M \leq C Y^M$ for all $M \geq 1$, with a constant C which is independent of M, which implies that $X \leq Y$ as desired by taking $M^{th}$ roots and then taking limits as $M \to \infty$.
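The final limiting step can be seen numerically. The following minimal sketch (with a hypothetical function name and an arbitrarily chosen constant $C = 1000$) computes the constant $C^{1/M}$ that survives in $X \leq C^{1/M} Y$ after taking $M^{th}$ roots of $X^M \leq C Y^M$.

```python
def tensor_power_loss(C, M):
    """Constant left in X <= C**(1/M) * Y after taking M-th roots of X**M <= C * Y**M."""
    return C ** (1.0 / M)

# The loss decays to 1 as M grows, provided C does not depend on M.
losses = [tensor_power_loss(1000.0, M) for M in (1, 10, 100, 1000, 10000)]
```

The independence of $C$ from $M$ matters: a constant growing like $C^M$ would leave a loss of $C$ after taking roots, and nothing would be gained.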

It occurred to me recently that the mathematical blog medium may be a good venue not just for expository “short stories” on mathematical concepts or results, but also for more technical discussions of individual mathematical “tricks”, which would otherwise not be significant enough to warrant a publication-length (and publication-quality) article. So I thought today that I would discuss the amplification trick in harmonic analysis and combinatorics (and in particular, in the study of estimates); this trick takes an established estimate involving an arbitrary object (such as a function f), and obtains a stronger (or amplified) estimate by transforming the object in a well-chosen manner (often involving some new parameters) into a new object, applying the estimate to that new object, and seeing what that estimate says about the original object (after optimising the parameters or taking a limit). The amplification trick works particularly well for estimates which enjoy some sort of symmetry on one side of the estimate that is not represented on the other side; indeed, it can be viewed as a way to “arbitrage” differing amounts of symmetry between the left- and right-hand sides of an estimate. It can also be used in the contrapositive, amplifying a weak counterexample to an estimate into a strong counterexample. This trick also sheds some light as to why dimensional analysis works; an estimate which is not dimensionally consistent can often be amplified into a stronger estimate which is dimensionally consistent; in many cases, this new estimate is so strong that it cannot in fact be true, and thus dimensionally inconsistent inequalities tend to be either false or inefficient, which is why we rarely see them. (More generally, any inequality on which a group acts on either the left or right-hand side can often be “decomposed” into the “isotypic components” of the group action, either by the amplification trick or by other related tools, such as Fourier analysis.)

The amplification trick is a deceptively simple one, but it can become particularly powerful when one is arbitraging an unintuitive symmetry, such as symmetry under tensor powers. Indeed, the “tensor power trick”, which can eliminate constants and even logarithms in an almost magical manner, can lead to some interesting proofs of sharp inequalities, which are difficult to establish by more direct means.

The most familiar example of the amplification trick in action is probably the textbook proof of the Cauchy-Schwarz inequality

$|\langle v, w \rangle| \leq \|v\| \|w\|$ (1)

for vectors v, w in a complex Hilbert space. To prove this inequality, one might start by exploiting the obvious inequality

$\|v-w\|^2 \geq 0$ (2)

but after expanding everything out, one only gets the weaker inequality

$\hbox{Re} \langle v, w \rangle \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2$. (3)

Now (3) is weaker than (1) for two reasons; the left-hand side is smaller, and the right-hand side is larger (thanks to the arithmetic mean-geometric mean inequality). However, we can amplify (3) by arbitraging some symmetry imbalances. Firstly, observe that the phase rotation symmetry $v \mapsto e^{i\theta} v$ preserves the RHS of (3) but not the LHS. We exploit this by replacing v by $e^{i\theta} v$ in (3) for some phase $\theta$ to be chosen later, to obtain

$\hbox{Re} e^{i\theta} \langle v, w \rangle \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2$.

Now we are free to choose $\theta$ at will (as long as it is real, of course), so it is natural to choose $\theta$ to optimise the inequality, which in this case means to make the left-hand side as large as possible. This is achieved by choosing $e^{i\theta}$ to cancel the phase of $\langle v, w \rangle$, and we obtain

$|\langle v, w \rangle| \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2$ (4)

This is closer to (1); we have fixed the left-hand side, but the right-hand side is still too weak. But we can amplify further, by exploiting an imbalance in a different symmetry, namely the homogenisation symmetry $(v,w) \mapsto (\lambda v, \frac{1}{\lambda} w)$ for a scalar $\lambda > 0$, which preserves the left-hand side but not the right. Inserting this transform into (4) we conclude that

$|\langle v, w \rangle| \leq \frac{\lambda^2}{2} \|v\|^2 + \frac{1}{2\lambda^2} \|w\|^2$

where $\lambda > 0$ is at our disposal to choose. We can optimise in $\lambda$ by minimising the right-hand side, and indeed one easily sees that the minimum (or infimum, if one of v and w vanishes) is $\|v\| \|w\|$ (which is achieved when $\lambda = \sqrt{\|w\|/\|v\|}$ when $v,w$ are non-zero, or in an asymptotic limit $\lambda \to 0$ or $\lambda \to \infty$ in the degenerate cases), and so we have amplified our way to the Cauchy-Schwarz inequality (1). [See also this discussion by Tim Gowers on the Cauchy-Schwarz inequality.]
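The two amplification steps can be checked numerically. Here is a minimal sketch for non-zero vectors in ${\bf C}^n$, represented as Python lists of complex numbers; the helper names and the sample vectors are assumptions of this example.

```python
import cmath
import math

def inner(v, w):
    """Inner product on C^n, linear in v and conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def norm(v):
    return math.sqrt(inner(v, v).real)

def amplify(v, w):
    """Return (Re<v,w>, phase-amplified LHS, scaling-amplified RHS, unamplified RHS).

    Assumes v and w are both non-zero, so that the optimal lambda is finite.
    """
    ip = inner(v, w)
    basic_rhs = 0.5 * norm(v) ** 2 + 0.5 * norm(w) ** 2      # RHS of (3) and (4)
    theta = -cmath.phase(ip)                                 # cancels the phase of <v,w>
    rotated_lhs = (cmath.exp(1j * theta) * ip).real          # equals |<v,w>|
    lam = math.sqrt(norm(w) / norm(v))                       # optimal scaling parameter
    scaled_rhs = 0.5 * lam ** 2 * norm(v) ** 2 + 0.5 / lam ** 2 * norm(w) ** 2
    return ip.real, rotated_lhs, scaled_rhs, basic_rhs

v = [1 + 2j, 3 - 1j]
w = [2 - 1j, 0.5j]
re_ip, rotated_lhs, scaled_rhs, basic_rhs = amplify(v, w)
```

For these sample vectors the chain `re_ip <= rotated_lhs <= scaled_rhs <= basic_rhs` holds, with `rotated_lhs` equal to $|\langle v, w \rangle|$ and `scaled_rhs` equal to $\|v\| \|w\|$ up to rounding.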