We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. (Donald Knuth, “Literate programming”, paraphrasing Tony Hoare)

After all my other advice on how to write papers, I should add one counterbalancing note: there is a danger in being too much of a perfectionist, and in trying to make every part of a paper as “optimal” as possible. After all the “easy” improvements have been made to a paper, one encounters a law of diminishing returns, in which any further improvements either require large amounts of time and effort, or else require tradeoffs against other qualities of the paper.

For instance, suppose one has a serviceable lemma that suffices for the task of proving the main theorems of the paper at hand. One can then try to “optimise” this lemma by making the hypotheses weaker and the conclusion stronger, but this can come at the cost of lengthening the proof of the lemma, and of obscuring exactly how the lemma fits in with the rest of the paper. In the reverse direction, one could also “optimise” the same lemma by replacing it with a weaker (but easier to prove) statement which still barely suffices to prove the main theorem, but is now unsuitable for use in any later application. Thus one encounters a tradeoff when one tries to improve the lemma in either direction. (In this case, one resolution to the tradeoff is to state and prove one formulation of the lemma, and then add a remark about the other: state the strong version and remark that only a special case is used, or state the weak version and remark that stronger versions are possible.)

Carefully optimising results and notations in the hope that this will help future researchers in the field is a little risky; later authors may introduce new insights or new tools which render these painstakingly optimised results obsolete. The only time when this is really profitable is when you already know of a subsequent paper (perhaps a sequel to the one you are already writing) which will indeed rely heavily on these results and notations, or when the current paper is clearly going to be the definitive paper in the subject for a long while.

If you haven’t already written a rapid prototype for your paper, then optimising a lemma may in fact be a complete waste of time, because you may find later on in the writing process that the lemma will need to be modified anyway to deal with an unforeseen glitch in the original argument, or to improve the overall organisation of the paper.

I have sometimes seen authors try to optimise the length of the paper at the expense of all other attributes, in the mistaken belief that brevity is equivalent to simplicity. While shorter papers can indeed be simpler than longer ones, this is generally only true if the shortness of the paper was achieved naturally rather than artificially. If brevity was attained by removing all examples, remarks, whitespace, motivation, and discussion, or by striking out “redundant” English phrases and relying purely on mathematical abbreviations (e.g. $\forall$ instead of “For all”, etc.) and various ungrammatical contractions, then this is generally a poor tradeoff; somewhat ironically, a paper which has been overcompressed may be viewed by readers as being more difficult to read than a longer, gentler, and more leisurely treatment of the same material. (See also “Give appropriate amounts of detail.”)

On the other hand, optimising the readability of the paper is always a good thing (except when it comes at the expense of rigour or accuracy), and the effort put into doing so is appreciated by readers.