This is going to be a somewhat experimental post. In class, I mentioned that when solving the type of homework problems encountered in a graduate real analysis course, there are really only about a dozen or so basic tricks and techniques that are used over and over again. But I had not thought to actually try to make these tricks explicit, so I am going to try to compile a list of some of these techniques here. This list is going to be far from exhaustive; if other recent students of real analysis would like to share their own methods, I encourage them to do so in the comments (even – or especially – if the techniques are somewhat vague and general in nature).

(See also the Tricki for some general mathematical problem solving tips.  Once this page matures somewhat, I might migrate it to the Tricki.)

Note: the tricks occur here in no particular order, reflecting the stream-of-consciousness way in which they were arrived at.  Indeed, this list will be extended on occasion whenever I find another trick that can be added to this list.

1.  Split up equalities into inequalities.

If one has to show that two numerical quantities X and Y are equal, try proving that $X \leq Y$ and $Y \leq X$ separately.  (Often one of these will be very easy, and the other one harder; but the easy direction may still provide some clue as to what needs to be done to establish the other direction.)

In a similar spirit, to show that two sets E and F are equal, try proving that $E \subset F$ and $F \subset E$.

2.  Give yourself an epsilon of room.

If one has to show that $X \leq Y$, try proving that $X \leq Y+\varepsilon$ for any $\varepsilon > 0$. (This trick combines well with Trick 1.)

In a similar spirit, if one needs to show that a quantity $X$ vanishes, try showing that $|X| \leq \varepsilon$ for every $\varepsilon > 0$.

Or: if one wishes to show that two functions $f, g$ agree almost everywhere, try showing first that $|f(x)-g(x)| \leq \varepsilon$ holds for almost every x, or even just outside of a set of measure at most $\varepsilon$, for any given $\varepsilon > 0$.

Or: if one wants to show that a sequence $x_n$ of real numbers converges to zero, try showing that $\limsup_{n \to \infty} |x_n| \leq \varepsilon$ for every $\varepsilon > 0$.
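
As a small illustration of Tricks 1 and 2 working together (this example is added for illustration; $A, B$ are assumed to be non-empty bounded sets of reals, and $A+B := \{a+b: a \in A, b \in B\}$), consider the identity $\sup(A+B) = \sup A + \sup B$.  The inequality $\sup(A+B) \leq \sup A + \sup B$ is the easy direction, since $a+b \leq \sup A + \sup B$ for every $a \in A$ and $b \in B$.  For the other direction, given $\varepsilon > 0$ one can pick $a \in A$ and $b \in B$ with $a \geq \sup A - \varepsilon$ and $b \geq \sup B - \varepsilon$, so that

$\sup A + \sup B \leq a + b + 2\varepsilon \leq \sup(A+B) + 2\varepsilon,$

and sending $\varepsilon \to 0$ gives the claim.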

Don’t be too focused on getting all your error terms adding up to exactly $\varepsilon$ – usually, as long as the final error bound consists of terms that can all be made as small as one wishes by choosing parameters in a suitable way, that is enough.  For instance, an error term such as $10\varepsilon$ is certainly OK, or even more complicated expressions such as $10 \varepsilon/\delta + 4 \delta$ if one has the ability to choose $\delta$ as small as one wishes, and then after $\delta$ is chosen, one can then also set $\varepsilon$ as small as one wishes (in a manner that can depend on $\delta$).
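
To make the last example concrete (a sketch; the target $\eta > 0$ and the specific choices below are illustrative assumptions, and many other choices would work equally well): suppose one wants the error $10\varepsilon/\delta + 4\delta$ to be at most $\eta$.  One can first choose $\delta := \eta/8$, and then choose $\varepsilon := \delta \eta/20$, which gives

$10 \varepsilon/\delta + 4 \delta = \eta/2 + \eta/2 = \eta.$

Note the order of the choices: $\varepsilon$ is allowed to depend on $\delta$, but not conversely.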

One caveat: for finite $x$, and any $\varepsilon > 0$, it is true that $x+\varepsilon > x$ and $x-\varepsilon < x$, but this statement is not true when $x$ is equal to $+\infty$ (or $-\infty$).  So remember to exercise some care with the epsilon of room trick when some quantities are infinite.

3.  Decompose or approximate a rough or general object by a smoother or simpler one.

If one has to prove something about an unbounded (or infinite measure) set, consider proving it for bounded (or finite measure) sets first if this looks easier.

Similarly:

1. if one has to prove something about a measurable set, try proving it for open, closed, compact, bounded, or elementary sets first.
2. if one has to prove something about a measurable function, try proving it for functions that are continuous, bounded, compactly supported, simple, absolutely integrable, etc.
3. if one has to prove something about an infinite sum or sequence, try proving it first for finite truncations of that sum or sequence (but try to get all the bounds independent of the number of terms in that truncation, so that you can still pass to the limit!).
4. if one has to prove something about a complex-valued function, try it for real-valued functions first.
5. if one has to prove something about a real-valued function, try it for unsigned functions first.
6. if one has to prove something about a simple function, try it for indicator functions first.

In order to pass back to the general case from these special cases, one will have to somehow decompose the general object into a combination of special ones, or approximate general objects by special ones (or as a limit of a sequence of special objects).  In the latter case, one may need an epsilon of room (trick 2), and some sort of limiting analysis may be needed to deal with the errors in the approximation (it is not always enough to just “pass to the limit”, as one has to justify that the desirable properties of the approximating object are preserved in the limit).
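
For instance, the standard decompositions behind items 4 to 6 of the above list can be sketched as follows (the notation is introduced here for illustration: $f_+ := \max(f,0)$ and $f_- := \max(-f,0)$ are the positive and negative parts of a real-valued $f$, and $c_1,\ldots,c_k$ and $E_1,\ldots,E_k$ are the values and level sets of a simple function).  A complex-valued function splits into real-valued ones via

$f = \hbox{Re}(f) + i \hbox{Im}(f),$

a real-valued function splits into unsigned ones via

$f = f_+ - f_-,$

and a simple function splits into indicator functions via

$f = \sum_{i=1}^k c_i 1_{E_i}.$

A statement that is linear in $f$ (such as an integral identity) can then often be passed from the special case to the general case term by term, as long as the quantities being subtracted are finite.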

Note: one should not do this blindly, as one might then be loading on a bunch of distracting but ultimately useless hypotheses that end up being a lot less help than one might hope.  But they should be kept in mind as something to try if one starts having thoughts such as “Gee, it would be nice at this point if I could assume that $f$ is continuous / real-valued / simple / unsigned / etc.”.

In the more quantitative areas of analysis and PDE, one sees a common variant of the above technique, namely the method of a priori estimates.  Here, one needs to prove an estimate or inequality for all functions in a large, rough class (e.g. all rough solutions to a PDE).  One can often then first prove this inequality in a much smaller (but still “dense”) class of “nice” functions, so that there is little difficulty justifying the various manipulations (e.g. exchanging integrals, sums, or limits, or integrating by parts) that one wishes to perform.  Once one obtains these a priori estimates, one can then often take some sort of limiting argument to recover the general case.

4.  If one needs to flip an upper bound to a lower bound or vice versa, look for a way to take reflections or complements.

Sometimes one needs a lower bound for some quantity, but only has techniques that give upper bounds.  In some cases, though, one can “reflect” an upper bound into a lower bound (or vice versa) by replacing a set $E$ contained in some space $X$ with its complement $X \backslash E$, or a function $f$ with its negation $-f$ (or perhaps subtracting $f$ from some dominating function $F$ to obtain $F-f$).  This trick works best when the objects being reflected are contained in some sort of “bounded”, “finite measure”, or “absolutely integrable” container, so that one avoids having the dangerous situation of having to subtract infinite quantities from each other.
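
Two standard instances of this reflection may be sketched as follows, under the stated finiteness assumptions.  If $E \subset X$ are measurable sets with $m(X) < \infty$, then an upper bound on the complement becomes a lower bound on $E$:

$m(X \backslash E) \leq \varepsilon \implies m(E) = m(X) - m(X \backslash E) \geq m(X) - \varepsilon.$

Similarly, if $f_n$ are measurable functions with $|f_n| \leq F$ for some absolutely integrable $F$, then applying Fatou’s lemma to the unsigned functions $F - f_n$ and cancelling the finite quantity $\int F$ gives the “reverse Fatou” inequality

$\limsup_{n \to \infty} \int f_n \leq \int \limsup_{n \to \infty} f_n.$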

5.  Uncountable unions can sometimes be replaced by countable or finite unions.

Uncountable unions are not well-behaved in measure theory; for instance, an uncountable union of null sets need not be a null set (or even a measurable set).  (On the other hand, the uncountable union of open sets remains open; this can often be important to know.)  However, in many cases one can replace an uncountable union by a countable one.  For instance, if one needs to prove a statement for all $\varepsilon > 0$, then there are uncountably many values of $\varepsilon$ that one needs to check, which may threaten measurability; but in many cases it is enough to only work with a countable sequence of values of $\varepsilon$, such as the numbers $1/m$ for $m=1,2,3,\ldots$.
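
As an illustration (a standard identity, with $f_n, f$ assumed here to be finite-valued measurable functions on a common domain), the set of points where $f_n$ fails to converge to $f$ can be written as

$\{ x: f_n(x) \not \to f(x) \} = \bigcup_{m=1}^\infty \bigcap_{N=1}^\infty \bigcup_{n \geq N} \{ x: |f_n(x) - f(x)| > 1/m \}.$

The right-hand side only involves countable unions and intersections of measurable sets, and so is measurable.  Replacing the union over $m$ by a union over all real $\varepsilon > 0$ would describe the same set, but would not make its measurability apparent.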

In a similar spirit, given a real parameter $\lambda$, this parameter initially ranges over uncountably many values, but in some cases one can get away with only working with a countable set of such values, such as the rationals.  In a similar spirit, rather than work with all boxes (of which there are uncountably many), one might work with the dyadic boxes (of which there are only countably many; also, they obey nicer nesting properties than general boxes and so are often desirable to work with in any event).

If you are working on a compact set, then one can often replace even uncountable unions with finite ones, so long as one is working with open sets.  When this option is available, it is often worth spending an epsilon of measure (or whatever other resource is available to spend) to make one’s sets open, just so that one can take advantage of compactness.

6.  If it is difficult to work globally, work locally instead.

A domain such as Euclidean space ${\bf R}^d$ has infinite measure, and this creates a number of technical difficulties when trying to do measure theory directly on such spaces.  Sometimes it is best to work more locally, for instance working on a large ball $B(0,R)$ or even a small ball such as $B(x,\varepsilon)$ first, and then figuring out how to patch things together later.  Compactness (or the closely related property of total boundedness) is often useful for patching together small balls to cover a large ball.  Patching together large balls into the whole space tends to work well when the properties one is trying to establish are local in nature (such as continuity, or pointwise convergence) or behave well with respect to countable unions.  For instance, to prove that a sequence of functions $f_n$ converges pointwise almost everywhere to $f$ on ${\bf R}^d$, it suffices to verify this pointwise almost everywhere convergence on the ball $B(0,R)$ for every $R > 0$ (which one can take to be an integer to get countability, see Trick 5).
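
In symbols (a sketch; the names $N_R$ and $N$ for the exceptional sets are introduced here for illustration): if $N_R := \{ x \in B(0,R): f_n(x) \not \to f(x) \}$ is a null set for each $R = 1,2,3,\ldots$, then the full exceptional set $N := \bigcup_{R=1}^\infty N_R$ obeys

$m(N) \leq \sum_{R=1}^\infty m(N_R) = 0$

by countable subadditivity.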

7.  Be willing to throw away an exceptional set.

The “Lebesgue philosophy” to measure theory is that null sets are often “irrelevant”, and so one should be very willing to cut out a set of measure zero on which bad things are happening (e.g. a function is undefined or infinite, a sequence of functions is not converging, etc.).  One should also be only slightly less willing to throw away sets of positive but small measure, e.g. sets of measure at most $\varepsilon$.  If such sets can be made arbitrarily small in measure, this is often almost as good as just throwing away a null set.

Many things in measure theory improve after throwing away a small set.  The most notable examples of this are Egorov’s theorem (pointwise a.e. convergence becomes locally uniform convergence after throwing away a small set) and Lusin’s theorem (measurable functions become continuous after throwing away a small set).  A related (and simpler) principle is that a measurable function $f$ that is finite almost everywhere can become locally bounded after throwing away a small set (this can be seen by applying downward monotone convergence to exceptional sets such as $\{ x \in B(0,R): |f(x)| \geq N \}$ as $N \to \infty$).  From Markov’s inequality, one also sees that a function that is small in $L^1$ norm becomes small uniformly after throwing away a small set.  (The same is true for functions that are small in other $L^p$ norms, or (by definition) for functions that are converging to zero in measure.)
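
To spell out the remark about Markov’s inequality (a sketch; the normalisation by $\varepsilon^2$ below is just one convenient choice): for an absolutely integrable function $f$ and any $\lambda > 0$, one has

$m( \{ x: |f(x)| \geq \lambda \} ) \leq \frac{1}{\lambda} \int |f|.$

So if $\|f\|_{L^1} \leq \varepsilon^2$, then taking $\lambda = \varepsilon$ shows that $|f(x)| < \varepsilon$ for all $x$ outside of a set of measure at most $\varepsilon$.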

A little later in this course, we will also start seeing a similar trick of throwing away most of a sequence and working with a subsequence instead.  (This trick can also be interpreted as “throwing away a small set”, but to understand what “small” means in this context, one needs the language of ultrafilters, which will not be discussed here.)  See Trick 17 below.

8.  Draw pictures and try to build counterexamples.

Measure theory, particularly on Euclidean spaces, has a significant geometric aspect to it, and you should be exploiting your geometric intuition.  Drawing pictures and graphs of all the objects being studied is a good way to start.  These pictures need not be completely realistic; they should be just complicated enough to hint at the complexities of the problem, but not more.  (For instance, usually one- or two-dimensional pictures suffice for understanding problems in ${\bf R}^d$; drawing intricate 3D (or 4D, etc.) pictures does not often make things simpler.  To indicate that a function is not continuous, one or two discontinuities or oscillations might suffice; make it too ornate and it becomes less clear what to do about that function.)

A common mistake is to try to draw a picture in which both the hypotheses and conclusion of the problem hold.  This is actually not all that useful, as it often does not reveal the causal relationship between the former and the latter.  One should try instead to draw a picture in which the hypotheses hold but for which the conclusion does not – in other words, a counterexample to the problem.  Of course, you should be expected to fail at this task, given that the statement of the problem is presumably true. However, the way in which your picture fails to accomplish this task is often very instructive, and can reveal vital clues as to how the solution to the problem is supposed to proceed.

9.  Try simpler cases first.

This advice of course extends well beyond measure theory, but if one is completely stuck on a problem, try making the problem simpler (while still capturing at least one of the difficulties of the problem that you cannot currently resolve).  For instance, if faced with a problem in ${\bf R}^d$, try the one-dimensional case $d=1$ first.  Faced with a problem about a general measurable function $f$, try a much simpler case first, such as an indicator function $f = 1_E$.  Faced with a problem about a general measurable set, try an elementary set first.  Faced with a problem about a sequence of functions, try a monotone sequence of functions first.  And so forth.  (Note that this trick overlaps quite a bit with Trick 3.)

The problem should not be made so simple that it becomes trivial, as this doesn’t really gain you any new insight about the original problem; instead, one should try to keep the “essential” difficulties of the problem while throwing away those aspects that you think are less important (but are still serving to add to the overall difficulty level).

On the other hand, if the simplified problem is unexpectedly easy, but one cannot extend the methods to the general case (or somehow leverage the simplified case to the general case, as in Trick 3), this is an indication that the true difficulty lies elsewhere.  For instance, if a problem involving general functions could be solved easily for monotone functions, but one cannot then extend that argument to the general case, this suggests that the true enemy is oscillation, and perhaps one should try another simple case in which the function is allowed to be highly oscillatory (but perhaps simple in other ways, e.g. bounded with compact support).

10.  Abstract away any information that you believe or suspect to be irrelevant.

Sometimes one is faced with an embarrassment of riches when it comes to what choice of technique to use on a problem; there are so many different facts that one knows about the problem, and so many different pieces of theory that one could apply, that one doesn’t quite know where to begin.

When this happens, abstraction can be a vital tool to clear away some of the conceptual clutter.  Here, one wants to “forget” part of the setting that the problem is phrased in, and only keep the part that seems to be most relevant to the hypotheses and conclusions of the problem (and thus, presumably, to the solution as well).

For instance, if one is working in a problem that is set in Euclidean space ${\bf R}^d$, but the hypotheses and conclusions only involve measure-theoretic concepts (e.g. measurability, integrability, measure, etc.) rather than topological structure, metric structure, etc., then it may be worthwhile to try abstracting the problem to the more general setting of an abstract measure space, thus forgetting that one was initially working in ${\bf R}^d$.   The point of doing this is that it cuts down on the number of possible ways to start attacking the problem.  For instance, facts such as outer regularity (every measurable set can be approximated from above by an open set) do not hold in abstract measure spaces (which do not even have a meaningful notion of an open set), and so presumably will not play a role in the solution; similarly for any facts involving boxes.  Instead, one should be trying to use general facts about measure, such as countable additivity, which are not specific to ${\bf R}^d$.

[It is worth noting that this abstraction method does not always work; for instance, when viewed as a measure space, ${\bf R}^d$ is not completely arbitrary, but does have one or two features that distinguish it from a generic measure space, most notably the fact that it is $\sigma$-finite.  So, even if the hypotheses and conclusion of a problem about ${\bf R}^d$ are purely measure-theoretic in nature, one may still need to use some measure-theoretic facts specific to ${\bf R}^d$.  Here, it becomes useful to know a little bit about the classification of measure spaces to have some intuition as to how “generic” a measure space such as ${\bf R}^d$ really is.  This intuition is hard to convey at this level of the subject, but in general, measure spaces form a very “non-rigid” category, with very few invariants, and so it is largely true that one measure space usually behaves much the same as any other.]

Another example of abstraction: suppose that a problem involves a large number of sets (e.g. $E_n$ and $F_n$) and their measures, but that the conclusion of the problem only involves the measures $m(E_n), m(F_n)$ of the sets, rather than the sets themselves.  Then one can try to abstract the sets out of the problem, by trying to write down every single relationship between the numerical quantities $m(E_n), m(F_n)$ that one can easily deduce from the given hypotheses (together with basic properties of measure, such as monotonicity or countable additivity).  One can then rename these quantities (e.g. $a_n := m(E_n)$ and $b_n := m(F_n)$) to “forget” that these quantities arose from a measure-theoretic context, and then work with a purely numerical problem, in which one is starting with hypotheses on some sequences $a_n, b_n$ of numbers and trying to deduce a conclusion about such sequences.  Such a problem is often easier to solve than the original problem due to the simpler context.  Sometimes, this simplified problem will end up being false, but the counterexample will often be instructive, either in indicating the need to add an additional hypothesis connecting the $a_n, b_n$, or to indicate that one cannot work at this level of abstraction but must introduce some additional concrete ingredient.

Note that this trick is in many ways the antithesis of Trick 9, because by passing to a special case, one often makes the problem more concrete, with more things that one is now able to start trying.  However, the two tricks can work together.  One particularly useful “advanced move” in mathematical problem solving is to first abstract the problem to a more general one, and then consider a special case of that more abstract problem which is not directly related to the original one, but is somehow simpler than the original while still capturing some of the essence of the difficulty.   Attacking this alternate problem can then lead to some indirect but important ways to make progress on the original problem.

11.  Exploit Zeno’s paradox: a single epsilon can be cut up into countably many sub-epsilons.

A particularly useful fact in measure theory is that one can cut up a single epsilon into countably many pieces, for instance by using the geometric series identity

$\varepsilon = \varepsilon/2 + \varepsilon/4 + \varepsilon/8 + \ldots;$

this observation arguably goes all the way back to Zeno.  As such, even if one only has an epsilon of room budgeted for a problem, one can still use this budget to do a countably infinite number of things.  This fact underlies many of the countable additivity and subadditivity properties in measure theory, and also makes the ability to approximate rough objects by smoother ones to be useful even when countably many rough objects need to be approximated.

In general, one should be alert to this sort of trick when one has to spend an epsilon or so on an infinite number of objects.  If one was forced to spend the same epsilon on each object, one would soon end up with an unacceptable loss; but if one can get away with using a different epsilon each time, then Zeno’s trick comes in very handy.
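
A typical instance is the proof of countable subadditivity of Lebesgue outer measure (sketched here under the assumption that the outer measure $m^*(E)$ is defined as the infimum of $\sum_k |B_k|$ over all countable covers of $E$ by boxes $B_k$, and that each $m^*(E_n)$ is finite, as the claim is trivial otherwise).  Given sets $E_1, E_2, \ldots$ and $\varepsilon > 0$, one can choose for each $n$ a countable family of boxes $B_{n,1}, B_{n,2}, \ldots$ covering $E_n$ with $\sum_{k=1}^\infty |B_{n,k}| \leq m^*(E_n) + \varepsilon/2^n$.  The boxes $B_{n,k}$ together form a countable cover of $\bigcup_{n=1}^\infty E_n$, and so

$m^*( \bigcup_{n=1}^\infty E_n ) \leq \sum_{n=1}^\infty \sum_{k=1}^\infty |B_{n,k}| \leq \sum_{n=1}^\infty ( m^*(E_n) + \varepsilon/2^n ) = \sum_{n=1}^\infty m^*(E_n) + \varepsilon.$

Spending the same $\varepsilon$ on every $E_n$ would instead have produced an infinite total error.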

12.  If you expand your way to a double sum, a double integral, a sum of an integral, or an integral of a sum, try interchanging the two operations.

Or, to put it another way: “The Fubini-Tonelli theorem is your friend”.  Provided that one is either in the unsigned or absolutely convergent worlds, this theorem allows you to interchange sums and integrals with each other.  In many cases, a double sum or integral that is difficult to sum in one direction can become easier to sum (or at least to upper bound, which is often all that one needs in analysis).  In fact, if in the course of expanding an expression, you encounter such a double sum or integral, you should reflexively try interchanging the operations to see if the resulting expression looks any simpler.

Note that in some cases the parameters in the summation may be constrained, and one may have to take a little care to sum it properly.  For instance,

$\sum_{n=-\infty}^\infty \sum_{m=n}^\infty a_{m,n}$ (1)

interchanges (assuming that the $a_{m,n}$ are either unsigned or absolutely summable) to

$\sum_{m=-\infty}^\infty \sum_{n=-\infty}^m a_{m,n}$

(why? try plotting the set of pairs $(m,n)$ that appear in both).  If one is having trouble interchanging constrained sums or integrals, one solution is to re-express the constraint using indicator functions. For instance, one can rewrite the constrained sum (1) as the unconstrained sum

$\sum_{n=-\infty}^\infty \sum_{m=-\infty}^\infty 1_{m \geq n} a_{m,n}$

(extending the domain of $a_{m,n}$ if necessary), at which point interchanging the summations is easily accomplished.
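
As a quick worked example of this (an illustration added here, with $a_m$ assumed to be unsigned for $m \geq 1$): writing the constraint $m \geq n$ as an indicator and interchanging, one has

$\sum_{n=1}^\infty \sum_{m=n}^\infty a_m = \sum_{n=1}^\infty \sum_{m=1}^\infty 1_{m \geq n} a_m = \sum_{m=1}^\infty \sum_{n=1}^\infty 1_{m \geq n} a_m = \sum_{m=1}^\infty \sum_{n=1}^m a_m = \sum_{m=1}^\infty m a_m.$

The inner sum on the left is a tail that may have no closed form, whereas after the interchange the inner sum simply counts the $m$ values of $n$ with $1 \leq n \leq m$.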

The following point is obvious, but bears mentioning explicitly: while the interchanging sums/integrals trick can be very powerful, one should not apply it twice in a row to the same double sum or double operation, unless one is doing something genuinely non-trivial in between those two applications.  So, after one exchanges a sum or integral, the next move should be something other than another exchange (unless one is dealing with a triple or higher sum or integral).

A related move (not so commonly used in measure theory, but occurring in other areas of analysis, particularly those involving the geometry of Euclidean spaces) is to merge two sums or integrals into a single sum or integral over the product space, in order to use some additional feature of the product space (e.g. rotation symmetry) that is not readily visible in the factor spaces.  The classic example of this trick is the evaluation of the gaussian integral $\int_{-\infty}^\infty e^{-x^2}\ dx$ by squaring it, rewriting that square as the two-dimensional gaussian integral $\int_{{\bf R}^2} e^{-x^2-y^2}\ dx dy$, and then switching to polar coordinates.
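
In more detail (this is the standard computation, writing $I$ for the gaussian integral $\int_{-\infty}^\infty e^{-x^2}\ dx$): by Tonelli’s theorem and the change of variables to polar coordinates,

$I^2 = \int_{{\bf R}^2} e^{-x^2-y^2}\ dx dy = \int_0^{2\pi} \int_0^\infty e^{-r^2} r\ dr d\theta = 2\pi \cdot \frac{1}{2} = \pi,$

so that $I = \sqrt{\pi}$ (taking the positive root, since $I > 0$).  The rotation symmetry of $e^{-x^2-y^2}$ is what makes the polar integral elementary, and it is not visible in either one-dimensional factor.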

13.  Pointwise control, uniform control, and integrated (average) control are all partially convertible to each other.

There are three main ways to control functions (or sequences of functions, etc.) in analysis.  Firstly, there is pointwise control, in which one can control the function at every point (or almost every point), but in a non-uniform way.  Then there is uniform control, where one can control the function in the same way at most points (possibly throwing out a set of zero measure, or small measure).  Finally, there is integrated control (or control “on the average”), in which one controls the integral of a function, rather than the pointwise values of that function.

It is important to realise that control of one type can often be partially converted to another type.  Simple examples include the deduction of pointwise convergence from uniform convergence, or integrating a pointwise bound $f(x) \leq g(x)$ to obtain an integrated bound $\int f \leq \int g$.  Of course, these conversions are not reversible and thus lose some information; not every pointwise convergent sequence is uniformly convergent, and an integral bound does not imply a pointwise bound.  However, one can partially reverse such implications if one is willing to throw away an exceptional set (Trick 7).  For instance, Egorov’s theorem lets one convert pointwise convergence to (local) uniform convergence after throwing away an exceptional set, and Markov’s inequality lets one convert integral bounds to pointwise bounds, again after throwing away an exceptional set.
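
Two small examples may help calibrate these conversions (both are standard, and are added here for illustration).  Uniform control gives integrated control on sets of finite measure: if $|f(x)| \leq M$ for all $x \in E$ and $m(E) < \infty$, then

$\int_E |f| \leq M m(E).$

On the other hand, pointwise control alone does not give integrated control: the functions $f_n := n 1_{(0,1/n)}$ on ${\bf R}$ converge pointwise to zero everywhere, but

$\int_{\bf R} f_n = 1$

for every $n$.  (In accordance with Egorov’s theorem, the convergence does become uniform once one throws away the small set $(0,\varepsilon)$.)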

14.  If the conclusion and hypotheses look particularly close to each other, just expand out all the definitions and follow your nose.

This trick is particularly useful when building the most basic foundations of a theory.  Here, one may not need to experiment too much with generalisations, abstractions, or special cases, or try to marshall a lot of possibly relevant facts about the objects being studied: sometimes, all one has to do is go back to first principles, write out all the definitions with their epsilons and deltas, and start plugging away at the problem.

Knowing when to just follow one’s nose, and when to instead look for a more high-level approach to a problem, can require some judgement or experience.  A direct approach tends to work best when the conclusion and hypothesis already look quite similar to each other (e.g. they both state that a certain set or family of sets is measurable, or they both state that a certain function or family of functions is continuous, etc.).  But when the conclusion looks quite different from the hypotheses (e.g. the conclusion is some sort of integral identity, and the hypotheses involve measurability or convergence properties), then one may need to use more sophisticated tools than what one can easily get from using first principles.

15.  Don’t worry too much about exactly what $\varepsilon$ (or $\delta$, or $N$, etc.) needs to be.  It can usually be chosen or tweaked later if necessary.

Often in the middle of an argument, you will want to use some fact that involves a parameter, such as $\varepsilon$, that you are completely free to choose (subject of course to reasonable constraints, such as requiring $\varepsilon$ to be positive).  For instance, you may have a measurable set and decide to approximate it from above by an open set of at most $\varepsilon$ more measure.  But it may not be obvious exactly what value to give this parameter $\varepsilon$; you have so many choices available that you don’t know which one to pick!

In many cases, one can postpone thinking about this problem by leaving $\varepsilon$ undetermined for now, and continuing on with one’s argument, which will gradually start being decorated with $\varepsilon$’s all over the place.  At some point, one will need $\varepsilon$ to do something (and, in the particular case of $\varepsilon$, “doing something” almost always means “being small enough”), e.g. one may need $3n\varepsilon$ to be less than $\delta$, where $n, \delta$ are some other positive quantities in one’s problem that do not depend on $\varepsilon$.  At this point, one could now set $\varepsilon$ to be whatever is needed to get past this step in the argument, e.g. one could set $\varepsilon$ to equal $\delta/(4n)$.  But perhaps one still wishes to retain the freedom to set $\varepsilon$ because it might come in handy later.  In that case, one sets aside the requirement “$3n \varepsilon < \delta$” and keeps going.  Perhaps a bit later on, one might need $\varepsilon$ to do something else; for instance, one might also need $5\varepsilon \leq 2^{-n}$.  Once one has compiled the complete “wish list” of everything one wishes one’s parameters to do, then one can finally make the decision of what value to set those parameters equal to.  For instance, if the above two inequalities are the only inequalities required of $\varepsilon$, one can choose $\varepsilon$ equal to $\min( \delta/(4n), 2^{-n}/5)$.  This may be a choice of $\varepsilon$ which was not obvious at the start of the argument, but becomes so as the argument progresses.

There is however one big caveat when adopting this “choose parameters later” approach, which is that one needs to avoid a circular dependence of constants.  For instance, it is perfectly fine to have two arbitrary parameters $\varepsilon$ and $\delta$ floating around unspecified for most of the argument, until at some point you realise that you need $\varepsilon$ to be smaller than $\delta$, and so one chooses $\varepsilon$ accordingly (e.g. one sets it to equal $\delta/2$).  Or, perhaps instead one needs $\delta$ to be smaller than $\varepsilon$, and so sets $\delta$ equal to $\varepsilon/2$.  One can execute either of these two choices separately, but of course one cannot perform them simultaneously; this sets up an inconsistent circular dependency in which $\varepsilon$ needs to be defined after $\delta$ is chosen, and $\delta$ can only be chosen after $\varepsilon$ is fixed.  So, if one is going to delay choosing a parameter such as $\varepsilon$ until later, it becomes important to mentally keep track of what objects in one’s argument depend on $\varepsilon$, and which ones are independent of $\varepsilon$.  One can choose $\varepsilon$ in terms of the latter quantities, but one usually cannot do so in terms of the former quantities (unless one takes the care to show that the interlinked constraints between the quantities are still consistent, and thus simultaneously satisfiable).

See also the Tricki article “Keep parameters unspecified until it is clear how to optimize them”.

16.  Once one has started to lose some constants, don’t be hesitant to lose some more.

Many techniques in analysis end up giving inequalities that are inefficient by a constant factor.  For instance, any argument involving dyadic decomposition and powers of two tends to involve losses of factors of 2.  When arguing using balls in Euclidean space, one sometimes loses factors involving the volume of the unit ball (although this factor often cancels itself out if one tracks it more carefully).  And so forth.  However, in many cases these constant factors end up being of little importance: an upper bound of $2\varepsilon$ or $100\varepsilon$ is often just as good as an upper bound of $\varepsilon$ for the purposes of analysis (cf. Trick 15).  So it is often best not to invest too much energy in carefully computing and optimising these constants; giving these constants a symbol such as $C$, and not worrying about their exact value, is often the simplest approach.  (One can also use asymptotic notation, such as $O()$, which is very convenient to use once you know how it works.)

Now there are some cases in which one really does not want to lose any constants at all.  For instance, if one is using Trick 1 to prove that $X=Y$, it is not enough to show that $X \leq 2Y$ and $Y \leq 2X$; one really needs to show $X \leq Y$ and $Y \leq X$ without losing any constants.  (But proving $X \leq (1+\varepsilon)Y$ and $Y \leq (1+\varepsilon)X$ is OK, by Trick 2.)  But once one has already performed one step that loses a constant, there is little further to be lost by losing more; there can be a big difference between $X \leq Y$ and $X \leq 2Y$, but there is little difference in practice between $X \leq 2Y$ and $X \leq 100Y$, at least for the purposes of mathematical analysis.  At that stage, one should put oneself in the mental mode of thought where “constants don’t matter”, which can lead to some simplifications.  For instance, if one has to estimate a sum $X+Y$ of two positive quantities, one can start using such estimates as

$\max(X,Y) \leq X+Y \leq 2 \max(X,Y),$

which says that, up to a factor of $2$, $X+Y$ is the same thing as $\max(X,Y)$.  In some cases the latter is easier to work with (e.g. $\max(X,Y)^n$ is equal to $\max(X^n,Y^n)$, whereas the formula for $(X+Y)^n$ is messier).
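
For instance (a sketch, for non-negative $X, Y$ and an exponent $n \geq 1$), raising the above estimate to the $n^{th}$ power gives

$\max(X^n,Y^n) \leq (X+Y)^n \leq 2^n \max(X^n,Y^n) \leq 2^n (X^n + Y^n),$

which loses a factor of $2^n$ but avoids the binomial expansion entirely.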

17.  One can often pass to a subsequence to improve the convergence properties.

In real analysis, one often ends up possessing a sequence of objects, such as a sequence of functions $f_n$, which may converge in some rather slow or weak fashion to a limit $f$.  Often, one can improve the convergence of this sequence by passing to a subsequence.  For instance:

1. In a metric space, if a sequence $x_n$ converges to a limit $x$, then one can find a subsequence $x_{n_j}$ which converges quickly to the same limit $x$, for instance one can ensure that $d(x_{n_j},x) \leq 2^{-j}$ (or one can replace $2^{-j}$ with any other positive expression depending on $j$).  In particular, one can make $\sum_{j=1}^\infty d(x_{n_j},x)$ and $\sum_{j=1}^\infty d(x_{n_j},x_{n_{j+1}})$ absolutely convergent, which is sometimes useful.
2. A sequence of functions that converges in $L^1$ norm or in measure can be refined to a subsequence that converges pointwise almost everywhere as well.
3. A sequence in a (sequentially) compact space may not converge at all, but some subsequence of it will always converge.
4. The pigeonhole principle: A sequence which takes only finitely many values has a subsequence that is constant.  More generally, a sequence which lives in the union of finitely many sets has a subsequence that lives in just one of these sets.
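
To illustrate how the first item on this list can lead to the second in the $L^1$ case (a sketch of the standard argument, assuming that $f_n$ converges to $f$ in $L^1$ norm): choose a subsequence $f_{n_j}$ with $\|f_{n_j} - f\|_{L^1} \leq 2^{-j}$.  Then by Tonelli’s theorem (Trick 12),

$\int \sum_{j=1}^\infty |f_{n_j} - f| = \sum_{j=1}^\infty \int |f_{n_j} - f| \leq \sum_{j=1}^\infty 2^{-j} = 1 < \infty,$

so the sum $\sum_{j=1}^\infty |f_{n_j}(x) - f(x)|$ is finite for almost every $x$, and hence $f_{n_j}(x)$ converges to $f(x)$ for almost every $x$.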

Often, the subsequence is good enough for one’s applications, and there are also a number of ways to get back from a subsequence to the original sequence:

1. In a metric space, if you know that $x_n$ is a Cauchy sequence, and some subsequence of $x_n$ already converges to $x$, then this drags the entire sequence with it, i.e. $x_n$ converges to $x$ also.
2. The Urysohn subsequence principle: in a topological space, if every subsequence of a sequence $x_n$ itself has a subsequence that converges to a limit $x$, then the entire sequence converges to $x$.
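
The first of these is a direct application of the triangle inequality (a sketch, with $x_{n_j}$ denoting the convergent subsequence): for any $n$ and $j$, one has

$d(x_n, x) \leq d(x_n, x_{n_j}) + d(x_{n_j}, x).$

Given $\varepsilon > 0$, the Cauchy property makes the first term at most $\varepsilon$ once $n$ and $n_j$ are large enough, and the convergence of the subsequence makes the second term at most $\varepsilon$ once $j$ is large enough, so that $d(x_n,x) \leq 2\varepsilon$ for all sufficiently large $n$.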

18.  A real limit can be viewed as a meeting of the limit superior and limit inferior.

A sequence $x_n$ of real numbers does not necessarily have a limit $\lim_{n \to \infty} x_n$, but the limit superior $\limsup_{n \to \infty} x_n := \inf_N \sup_{n>N} x_n$ and the limit inferior $\liminf_{n \to \infty} x_n := \sup_N \inf_{n>N} x_n$ always exist (though they may be infinite), and can be easily defined in terms of infima and suprema.  Because of this, it is often convenient to work with the lim sup and lim inf instead of a limit.  For instance, to show that a limit $\lim_{n \to \infty} x_n$ exists, it suffices to show that

$\limsup_{n \to \infty} x_n \leq \liminf_{n \to \infty} x_n + \varepsilon$

for all $\varepsilon > 0$.   In a similar spirit, to show that a sequence $x_n$ of real numbers converges to zero, it suffices to show that

$\limsup_{n \to \infty} |x_n| \leq \varepsilon$

for all $\varepsilon > 0$.  It can be more convenient to work with lim sups and lim infs instead of limits because one does not need to worry about the issue of whether the limit exists or not, and many tools (notably Fatou’s lemma and its relatives) still work in this setting.  One should however be cautious that lim sups and lim infs tend to have only one half of the linearity properties that limits do; for instance, lim sups are subadditive but not necessarily additive, while lim infs are superadditive but not necessarily additive.
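
A simple example showing that the subadditivity inequality can be strict (added here for illustration): taking $x_n := (-1)^n$ and $y_n := (-1)^{n+1}$, one has $x_n + y_n = 0$ for all $n$, and so

$\limsup_{n \to \infty} (x_n + y_n) = 0 < 2 = \limsup_{n \to \infty} x_n + \limsup_{n \to \infty} y_n.$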