Nothing is particularly hard if you divide it into small jobs. (Henry Ford)

A typical argument in modern mathematics is often quite intricate, requiring many different steps, ingredients, and notation. The authors of the argument, having been intimately involved in all aspects of its construction, often do not realise just how complicated such an argument appears to a reader who is encountering it for the first time (I myself have been guilty of this oversight).

Part of the reason for this is that there is plenty of implicit structure in a paper which is crucial to understanding it properly, and which is known to the authors, but is not readily apparent to the readers. Suppose for instance that part of a paper goes like this:

  • In Section 2, facts A, B, and C are derived, and then used to deduce D.
  • In Section 3, D is used to derive E.
  • In Section 4, D, and another fact F, are used to derive G.
  • In Section 5, E, G, and another fact H, are used to derive I.

A reader who is going through this paper one section at a time will try to keep A, B, C, and D all in mind after finishing Section 2 and moving on to later sections. However, facts A, B, and C are never used again; they were instrumental to the argument because they allowed one to establish D, but once D is established, A, B, and C can be safely forgotten. Note, though, that the reader does not know this. As a consequence, while reading Sections 3, 4, and 5, the reader has to set aside some of his or her mental resources to retain facts which are of no further use, thus obscuring the structure of the argument and making it more difficult to follow.

Now one could address this by placing some remarks at the end of Section 2 along the lines of “Facts A, B, and C will not be used again in the rest of the paper”, or by devoting more thought to organising and motivating the paper. These are all worthwhile things to do, but a much more elegant solution is simply to encapsulate D as a lemma, and to place A, B, C inside the proof of that lemma. This conveys several useful structural cues to the reader: firstly, that D is likely to be an important fact to use in later parts of the argument, and secondly, that A, B, and C are not needed elsewhere in the paper and can be safely forgotten. This additional structure will be useful to all readers, but will be especially appreciated by those readers who are already expert in how to prove facts such as D, since they can then glance at the statement of the lemma, readily convince themselves that the lemma is plausible (possibly by using other tools than A, B, and C), and then quickly move on to the next part of the argument. One can also add some remarks “for the experts” after the proof of a lemma, discussing possible alternate proofs, refinements, special cases, connections to other lemmas in the literature, etc.
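In LaTeX terms, this folding might look like the following minimal sketch. It assumes the amsthm package; the label lem:D, the section titles, and the placeholder statements standing in for A, B, C, D, and E are all hypothetical, chosen only to mirror the example above.

```latex
\documentclass{article}
\usepackage{amsthm}
\newtheorem{lemma}{Lemma}[section]

\begin{document}

\section{Preliminaries}

% D is the only fact from this section that later sections cite.
\begin{lemma}\label{lem:D}
  (Statement of fact D.)
\end{lemma}

\begin{proof}
  % A, B, and C appear only inside this proof environment,
  % which signals to the reader that they are not needed afterwards.
  We first establish A. Next, we establish B. Then we establish C.
  Finally, D follows from A, B, and C.
\end{proof}

\section{Deriving E}

% Later sections refer to D by its label and never mention A, B, or C.
By Lemma~\ref{lem:D}, we obtain E.

\end{document}
```

The boundary of the proof environment does the work that a remark such as "A, B, and C will not be used again" would otherwise have to do.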

We have seen how “folding” the argument into lemmas can reduce the complexity of that argument by “localising” certain facts in the argument. The same method can also be used to localise notation, e.g. some special-purpose notation that is used to prove D but is not used elsewhere in the paper. The design philosophy here is similar to that of information hiding in software engineering. (Other relevant software engineering philosophies for mathematical writing include structured programming and modularity.)

Lemmas also provide a good opportunity to explicitly “recap” all the running hypotheses, assumptions, and notational conventions that have already been introduced in the argument, which can be invaluable to a reader who has misunderstood or forgotten about part of this implied context. (Sometimes, such a recap would be tediously long, in which case a sentence such as “Let the notation and assumptions be as above” may suffice. When these situations occur, one might wish to formalise all the running assumptions by judiciously introducing some good notation.)

In some cases, the conclusion of the lemma may only need a portion of these hypotheses; this might be worth stating explicitly within the statement of the lemma, as it can clarify the nature of that lemma, and may also make it more useful for future applications.

One should write the statement of a lemma in a way that makes it easy to use, rather than easy to prove. Thus, one should try to make the hypotheses of the lemma natural and easy to verify, and the conclusions of the lemma manifestly useful. Basically, the idea is to push as many of the details of the argument into the lemma as one can, to make the rest of the argument as simple as possible. Also, it may turn out that you (or someone else) will eventually find a simpler proof of that lemma, thus reducing the net complexity of the paper markedly (cf. the object-oriented approach to software engineering).

Folding the argument into lemmas also makes it easier to write a rapid prototype, as once one finalises the statement of the lemma, one can defer the proof of the lemma until later.
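As a rough illustration of such a prototype, a draft might fix the statement of the lemma and leave a placeholder for its proof. The sketch below assumes the amsthm package; the label lem:D and the placeholder statements for D and E are hypothetical.

```latex
\documentclass{article}
\usepackage{amsthm}
\newtheorem{lemma}{Lemma}

\begin{document}

\begin{lemma}\label{lem:D}
  (Finalised statement of fact D.)
\end{lemma}

\begin{proof}
  % TODO: proof deferred while the rest of the draft is written.
  To be supplied.
\end{proof}

% The main argument can already be drafted against the statement alone.
By Lemma~\ref{lem:D}, we obtain E.

\end{document}
```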

In summary, it’s almost always a good idea to have plenty of lemmas (and propositions and corollaries too, of course) in an argument: doing so makes the overall structure of the argument more apparent, makes the argument easier to follow, and can also provide some useful tools for future work in the area.

There is, however, one exception to the above rule. If you have a pair of technical lemmas, neither of which is of much interest in its own right, and which only become useful when used together, then they should be unified into a single lemma, which will be easier for the reader to keep in mind. For instance, suppose your paper contains the following two lemmas:

Lemma 15. If Natural Hypothesis A is true, then Technical Statement B is true.

Lemma 16. If Technical Statement B is true, then Natural Conclusion C is true.

If Lemma 16 is the only place where B is used, and Lemma 15 is the only place where B is verified, then it can be better to unify these two lemmas as

Lemma 15′. If Natural Hypothesis A is true, then Natural Conclusion C is true.

Now, the technical statement B has been encapsulated into the proof of Lemma 15′ (which, of course, will be a concatenation of the proofs of Lemma 15 and Lemma 16), and no longer needs to make an appearance outside of this lemma.  In some cases it may also be appropriate to insert something like

Sublemma 16′. Under the hypotheses of Lemma 15′, Technical Statement B is true.

within the proof of Lemma 15′.
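One possible LaTeX layout for the unified lemma, with the technical statement kept inside its proof, is sketched below. It assumes the amsthm package; the sublemma environment and the labels lem:A-implies-C and sub:B are hypothetical names introduced only for this illustration, and the automatic numbering will not reproduce the 15′ and 16′ used above.

```latex
\documentclass{article}
\usepackage{amsthm}
\newtheorem{lemma}{Lemma}
\newtheorem{sublemma}{Sublemma}

\begin{document}

\begin{lemma}\label{lem:A-implies-C}
  If Natural Hypothesis A is true, then Natural Conclusion C is true.
\end{lemma}

\begin{proof}
  We begin with an intermediate claim.
  % B is stated and proved only inside this proof.
  \begin{sublemma}\label{sub:B}
    Under the hypotheses of Lemma~\ref{lem:A-implies-C},
    Technical Statement B is true.
  \end{sublemma}
  \begin{proof}
    (The former proof of Lemma 15 goes here.)
  \end{proof}
  By Sublemma~\ref{sub:B}, Technical Statement B is true.
  (The former proof of Lemma 16 goes here, concluding C.)
\end{proof}

\end{document}
```

Nothing outside the outer proof environment refers to sub:B, so a reader who skips that proof never encounters B.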

This improves the paper for a number of related reasons:

  • The reader can safely forget about B after leaving the proof of Lemma 15′, thus freeing up mental space.
  • An expert reader might be able to find an alternate proof of Lemma 15′ that avoids B, or at least a good heuristic argument that convinces that reader that Lemma 15′ is true.  Then that reader can skip the proof of Lemma 15′ completely, and not have to deal with B at all.
  • A follow-up paper may find a simpler proof of Lemma 15′ that avoids B, in which case that paper can simplify the original paper without having to make any modifications other than the reproof of Lemma 15′.  (In some cases, the very act of reorganising the paper may prompt the reorganiser to find such a simplification by herself or himself.)