Nothing is particularly hard if you divide it into small jobs. (Henry Ford)
A typical argument in modern mathematics is often quite intricate, requiring many different steps, ingredients, and notation. The authors of the argument, having been intimately involved in all aspects of its construction, often do not realise just how complicated such an argument appears to a reader who is encountering it for the first time (I myself have been guilty of this oversight).
Part of the reason for this is that there is plenty of implicit structure in a paper which is crucial to understanding it properly, and which is known to the authors, but is not readily apparent to the readers. Suppose for instance that part of a paper goes like this:
- …
- In Section 2, facts A, B, and C are derived, and then used to deduce D.
- In Section 3, D is used to derive E.
- In Section 4, D, and another fact F, are used to derive G.
- In Section 5, E, G, and another fact H, are used to derive I.
- …
A reader who is going through this paper one section at a time will try to keep A, B, C, and D all in mind after finishing Section 2 and moving on to later sections. However, facts A, B, and C are never used again; they were instrumental to the argument because they allowed one to establish D, but once D is established, A, B, and C can be safely forgotten. Note, though, that the reader does not know this. As a consequence, while reading Sections 3, 4, and 5, the reader has to set aside some of his or her mental resources to retain facts which are of no further use, thus obscuring the structure of the argument and making it more difficult to follow.
Now one could address this by placing some remarks at the end of Section 2 along the lines of “Facts A, B, and C will not be used again in the rest of the paper”, or by devoting more thought to organising and motivating the paper. These are all worthwhile things to do, but a much more elegant solution is simply to encapsulate D as a lemma, and to place A, B, C inside the proof of that lemma. This conveys several useful structural cues to the reader: firstly, that D is likely to be an important fact to use in later parts of the argument, and secondly, that A, B, and C are not needed elsewhere in the paper and can be safely forgotten. This additional structure will be useful to all readers, but will be especially appreciated by those readers who are already expert in how to prove facts such as D, since they can then glance at the statement of the lemma, readily convince themselves that the lemma is plausible (possibly by using other tools than A, B, and C), and then quickly move on to the next part of the argument. One can also add some remarks “for the experts” after the proof of a lemma, discussing possible alternate proofs, refinements, special cases, connections to other lemmas in the literature, etc.
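As a rough illustration, the layout described above might look as follows in LaTeX source. This is only a sketch, assuming the amsthm package; the label `lem:D` and the section titles are hypothetical, and "Fact A" through "Fact E" stand in for actual mathematical statements.

```latex
\documentclass{article}
\usepackage{amsthm}
\newtheorem{lemma}{Lemma}

\begin{document}

\section{The key fact}

\begin{lemma}\label{lem:D}
% D is the only fact from this section that later sections will cite.
Fact D holds.
\end{lemma}

\begin{proof}
% A, B, and C appear only inside this proof, which signals to the
% reader that they can be forgotten once the proof ends.
We first establish Fact A. Next, we establish Fact B, and then Fact C.
Combining A, B, and C gives D.
\end{proof}

\section{Deriving E}

% Only the lemma is cited here; A, B, and C are never mentioned again.
By Lemma~\ref{lem:D}, Fact E follows.

\end{document}
```

A reader skimming the source (or the compiled paper) sees at a glance that Sections 3 onwards depend on Section 2 only through the one labelled statement.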
We have seen how “folding” the argument into lemmas can reduce the complexity of that argument by “localising” certain facts in the argument. The same method can also be used to localise notation, e.g. some special-purpose notation that is used to prove D but is not used elsewhere in the paper. The design philosophy here is similar to that of information hiding in software engineering. (Other relevant software engineering philosophies for mathematical writing include structured programming and modularity.)
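One way to make the localisation of notation literal, at least at the level of the LaTeX source, is to define the special-purpose macro inside the proof itself. The sketch below assumes the amsthm package; the macro name `\aux` and the object it denotes are hypothetical. It relies on the fact that `\newcommand` makes a definition local to the enclosing group, and that every LaTeX environment forms a group.

```latex
\documentclass{article}
\usepackage{amsthm}
\newtheorem{lemma}{Lemma}

\begin{document}

\begin{lemma}\label{lem:D}
Fact D holds.
\end{lemma}

\begin{proof}
% \aux is defined inside the proof environment, so the definition
% disappears at \end{proof}.
\newcommand{\aux}{\widetilde{X}}
Let $\aux$ denote an auxiliary object that is used only in this proof.
Working with $\aux$, we deduce D.
\end{proof}

% Using \aux at this point would give an "Undefined control sequence"
% error, so the notation cannot leak into the rest of the paper.

\end{document}
```

This is the same scoping discipline as a local variable inside a function: the compiler, rather than the author's vigilance, guarantees that the notation is not reused elsewhere.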
Lemmas also provide a good opportunity to explicitly “recap” all the running hypotheses, assumptions, and notational conventions that have already been introduced in the argument, which can be invaluable to a reader who has misunderstood or forgotten about part of this implied context. (Sometimes, such a recap would be tediously long, in which case a sentence such as “Let the notation and assumptions be as above” may suffice. When these situations occur, one might wish to formalise all the running assumptions by judiciously introducing some good notation.)
In some cases, the conclusion of the lemma may only need a portion of these hypotheses; this might be worth stating explicitly within the statement of the lemma, as it can clarify the nature of that lemma, and may also make it more useful for future applications.
One should write the statement of a lemma in a way that makes it easy to use, rather than easy to prove. Thus, one should try to make the hypotheses of the lemma natural and easy to verify, and the conclusions of the lemma manifestly useful. Basically, the idea is to push as many of the details of the argument into the lemma as one can, to make the rest of the argument as simple as possible. Also, it may end up that you (or someone else) will eventually find a simpler proof of that lemma, thus reducing the net complexity of the paper markedly (cf. the object-oriented approach to software engineering).
Folding the argument into lemmas also makes it easier to write a rapid prototype, as once one finalises the statement of the lemma, one can defer the proof of the lemma until later.
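A draft written in this prototype style might look something like the following sketch, again assuming the amsthm package; the label `lem:D` and the placeholder text are hypothetical. The statement of the lemma is fixed so that the rest of the draft can cite it, while its proof is left as a visible stub.

```latex
\documentclass{article}
\usepackage{amsthm}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}

\begin{document}

\begin{lemma}\label{lem:D}
% The statement is finalised first, so later arguments can rely on it.
Fact D holds.
\end{lemma}

\begin{proof}
% Stub for the draft; to be replaced once the outline is stable.
\textbf{[Proof to be written.]}
\end{proof}

\begin{theorem}
Fact E holds.
\end{theorem}

\begin{proof}
% The main argument proceeds as if the lemma were already proved.
By Lemma~\ref{lem:D}, Fact D holds, and E follows.
\end{proof}

\end{document}
```

Making the stub bold and bracketed is one simple way to ensure that no deferred proof is overlooked before the paper is finalised.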
In summary, it’s almost always a good idea to have plenty of lemmas (and propositions and corollaries too, of course) in an argument; it makes the overall structure of the argument more apparent, it makes the argument easier to follow, and can also provide some useful tools for future work in the area.
There is, however, one exception to the above rule. If you have a pair of technical lemmas, neither of which is of much interest in its own right, and which only become useful when used together, then they should be unified into a single lemma, which will be easier for the reader to keep in mind. For instance, suppose your paper contains the following two lemmas:
Lemma 15. If Natural Hypothesis A is true, then Technical Statement B is true.
Lemma 16. If Technical Statement B is true, then Natural Conclusion C is true.
If Lemma 16 is the only place where B is used, and Lemma 15 is the only place where B is verified, then it can be better to unify these two lemmas as
Lemma 15′. If Natural Hypothesis A is true, then Natural Conclusion C is true.
Now, the technical statement B has been encapsulated into the proof of Lemma 15′ (which, of course, will be a concatenation of the proofs of Lemma 15 and Lemma 16), and no longer needs to make an appearance outside of this lemma. In some cases it may also be appropriate to insert something like
Sublemma 16′. Under the hypotheses of Lemma 15′, technical condition B is true.
within the proof of Lemma 15′.
This improves the paper for a number of related reasons:
- The reader can safely forget about B after leaving the proof of Lemma 15′, thus freeing up mental space.
- An expert reader might be able to find an alternate proof of Lemma 15′ that avoids B, or at least a good heuristic argument that convinces that reader that Lemma 15′ is true. Then that reader can skip the proof of Lemma 15′ completely, and not have to deal with B at all.
- A follow-up paper may find a simpler proof of Lemma 15′ that avoids B, in which case that paper can simplify the original paper without having to make any modifications other than the reproof of Lemma 15′. (In some cases, the very act of reorganising the paper may prompt the reorganiser to find such a simplification by herself or himself.)
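The unified lemma, with the technical statement B confined to a sublemma inside its proof, could be laid out roughly as follows. This is a sketch assuming the amsthm package; the label `lem:AtoC` and the unnumbered `sublemma` environment are hypothetical choices, and the two "arguments" stand in for the former proofs of Lemma 15 and Lemma 16.

```latex
\documentclass{article}
\usepackage{amsthm}
\newtheorem{lemma}{Lemma}
\newtheorem*{sublemma}{Sublemma}

\begin{document}

\begin{lemma}\label{lem:AtoC}
If Natural Hypothesis A is true, then Natural Conclusion C is true.
\end{lemma}

\begin{proof}
We first isolate the technical step.
\begin{sublemma}
Under the hypotheses of Lemma~\ref{lem:AtoC}, Technical Statement B is true.
\end{sublemma}
% Former proof of Lemma 15: A implies B.
Indeed, A implies B by the first argument.
% Former proof of Lemma 16: B implies C.
Since B holds, C follows by the second argument.
\end{proof}

\end{document}
```

Because the sublemma is unnumbered and sits inside the proof, nothing outside the proof can cite B, which matches the intent that B never needs to appear elsewhere.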
29 August, 2007 at 10:53 pm
Jasper
“In those days men were real men, women were real women, small furry creatures from Alpha Centauri were real small furry creatures from Alpha Centauri. And all dared to brave unknown terrors, to do mighty deeds, to boldly split infinitives that no man had split before – and thus was the Empire forged.” – D. Adams
4 October, 2009 at 2:56 pm
AMS Graduate Student Blog » Blog Archive » An Interview with Terence Tao
[…] my pages on writing papers, for instance, I discuss how computer programming philosophies such as encapsulation and information hiding can help one structure a paper in a more reader-friendly […]
25 October, 2009 at 1:11 pm
Chris
Thanks for all the great advice. I have a clarifying question.
If you have a bunch of auxiliary lemmas needed only for the proof of a main lemma, is it better to state and prove the auxiliary lemmas within the proof of the main lemma, or is it better to state and prove them outside the lemma — perhaps in a separate section?
Your advice above would seem to say that within the proof of the main lemma is better. But would you still do this even if including the proofs of all the auxiliary lemmas would stretch the proof of the main lemma to several pages? It’s also important to keep the proof of each lemma short, too, right (so the reader can keep it in his/her mind)? Thanks.
25 October, 2009 at 6:29 pm
Terence Tao
Well, if the proof of the main lemma is that lengthy, it is perhaps better to promote it to a proposition, and devote an entire section to its proof, e.g. “The purpose of this section is to prove the following key proposition, which will be needed in the proof of Theorem 1.1…”. Then one can create as many lemmas as needed within that section to prove the proposition, which can be safely forgotten once the reader is convinced that the proposition is correct. (If, in addition, the proposition is rather standard and well known to the experts, one can consider moving this entire section to an appendix.)
If the techniques used to prove the proposition are orthogonal to the techniques used in the rest of the paper, it may disrupt the flow of the paper to prove the proposition too early; one may then just state the proposition when it is first appropriate to do so, say something like “We will prove this proposition in Section 17, using techniques from theory X. Let us assume this proposition for now and continue the proof of Theorem 1.1…”, and then return to the proof later (in Section 17, in this case).
27 October, 2009 at 2:44 am
Anonymous
That was very helpful advice, thank you.
minor typo: “notational conventions that have been already been introduced in the argument” (“been” is repeated.)
Another issue:
“Also, it may end up that you (or someone else) will eventually find a simpler proof of that lemma, thus reducing the net simplicity of the paper markedly” (Is the simplicity reduced or increased?)
[Corrected, thanks – T.]
24 February, 2011 at 6:02 pm
Advice on writing paper « Success doesn't come overnight
[…] It is also assists readability if you factor the paper into smaller pieces, for instance by making plenty of lemmas. […]
19 May, 2013 at 9:07 am
Think ahead | Planting an Oak
[…] instance, if one is trying to prove a lemma for one reason or another, take a few moments to ask yourself questions such […]
18 August, 2014 at 6:24 am
Jorge
Terence, what is your styling suggestion if all proofs of theorems are in the appendix and in one particular proof you need to state and prove a lemma? Would you state and prove the lemma before the theorem proof (in the appendix)? I tried to state and prove the lemma “inside” the proof of the theorem but it didn’t look good.
12 September, 2017 at 9:43 am
Szemeredi’s proof of Szemeredi’s theorem | What's new
[…] regularity lemma of Frieze and Kannan is required. Secondly, the proof has been “factored” into a number of stand-alone propositions of independent interest, in particular involving […]
1 November, 2019 at 4:41 am
kirk
Hi Prof. Tao, I am really enjoying reading your blog. It is an invaluable resource of knowledge and information. I would like to kindly ask you if possible to suggest us some books on proofs. I’ve read both how to prove it by Velleman and book of proof by Hammack. A common theme that I found among the two is that they defer the explanations of propositions, lemmas, corollaries until later which in my case made my understanding more complicated than it should have been. Instead, after reading your post it made things more clear especially with the connections to CS (maybe due to my CS background). Can you suggest us some books on proofs which makes explicit and clear the definitions of lemmas, propositions and corollaries as well as the ambiguities that might arise in different situations from their definitions (or maybe connections with other fields, e.g. CS)? Thank you in advance!
24 April, 2020 at 11:28 am
Kosay
Dear Prof. Tao
I am sorry if my question is stupid. I wondered whether it is allowed to introduce a new definition for an operation, or a generalization of an old mathematical case, and then create a new lemma and theory. Suppose a researcher X has a new kind of matrix. How can he start? Is it allowed to use the new definition in research and then prove results about his “new definition”? How did Cauchy create his first matrix (Cauchy’s matrix), and how was it discussed at that time? Someone may say that Cauchy was a great mathematician, so the mathematical community would accept Cauchy’s definition, and that the situation has changed; but nowadays, is it allowed to do something like that?
24 April, 2020 at 3:23 pm
Terence Tao
Technically, one is “allowed” to write whatever one wishes. But if one is seeking other mathematicians to actually be interested in one’s work, and for mathematical journals to consider one’s articles for publication, there should be some connection to existing mathematics of interest, either through a direct or indirect application (or potential application) to existing mathematics, or by creating a new model for some existing interesting phenomenon that can be used to shed new insight on that phenomenon.
The particular history of the Cauchy matrix is discussed at https://math.stackexchange.com/questions/3383014/history-of-the-cauchy-matrix ; Cauchy did not decide to study this matrix ex nihilo, but arrived at it in the course of trying to compute the anti-symmetric component of a certain rational function of several variables.
24 April, 2020 at 5:20 pm
Anonymous
In some sense, lemmas may be interpreted as “building blocks” of theorems (as letters are “building blocks” of words) to reduce the structural total complexity.
1 August, 2020 at 9:59 am
Gabriel Silva
Thank you Prof. Tao for the nice post!
Advice about how to best write a mathematical proof is interesting and important. Regarding this subject, in case anyone is interested, I also found Leslie Lamport’s article (proof.pdf) to be worth reading :)
24 March, 2022 at 4:29 am
Lemmas
Thank you for the nice post!