I have recently finished a draft version of my blog book “Poincaré’s legacies: pages from year two of a mathematical blog”, which covers all the mathematical posts from my blog in 2008, excluding those posts which primarily originated from other authors or speakers.

The draft is much longer – 694 pages – than the analogous draft from 2007 (which was 374 pages using the same style files). This is largely because of the two series of course lecture notes which dominate the book (and inspired its title), namely on ergodic theory and on the Poincaré conjecture. I am talking with the AMS staff about the possibility of splitting the book into two volumes, one focusing on ergodic theory, number theory, and combinatorics, and the other focusing on geometry, topology, and PDE (though there will certainly be miscellaneous sections that will be divided more or less arbitrarily between the two volumes).

The draft probably also needs an index, which I will attend to at some point before publication.

As in the previous book, those comments and corrections from readers which were of a substantive and mathematical nature have been acknowledged in the text. In many cases, I was only able to refer to commenters by their internet handles; please email me if you wish to be attributed differently (or not to be attributed at all).

Any other suggestions, corrections, etc. are, of course, welcome.

I learned some technical tricks for HTML to LaTeX conversion which made the process significantly faster than last year’s, although still rather tedious and time consuming; I thought I might share them below as they may be of use to anyone else contemplating a similar conversion.

Last year, I converted each post to LaTeX separately, which resulted in a lot of redundant search-and-replaces. This time around, I first dumped all the posts (in HTML format) into a single massive text file (1.7 MB!). Then I moved linearly through the file, and every time I saw a conversion that was needed I executed a global search and replace (if the conversion could be automated), or at least a global search (if it required manual editing). (In the most obvious such searches, e.g. changing <li> to \item, there were literally thousands of such replacements; each one thus saved me many tedious minutes of work.)
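To illustrate the automatable case, here is a minimal sketch in Python of this kind of global replacement pass; the replacement table is a small illustrative subset, and the file names are hypothetical, not the ones actually used.

```python
# Minimal sketch of a global HTML-to-LaTeX search-and-replace pass.
# Each pair is applied everywhere in the text, in order.
replacements = [
    ("<li>", "\\item "),
    ("</li>", ""),
    ("<ul>", "\\begin{itemize}"),
    ("</ul>", "\\end{itemize}"),
    ("<em>", "\\emph{"),
    ("</em>", "}"),
]

def convert(text):
    """Apply every replacement globally, in order."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text

# Usage (hypothetical file names):
#   text = open("all_posts.html").read()
#   open("all_posts.tex", "w").write(convert(text))
```

The point of the single massive file is visible here: one call to `convert` performs thousands of replacements at once, instead of one editing session per post.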

In some cases I had to do several passes to get the conversion done correctly. For instance, in the HTML many short mathematical expressions (e.g. a single isolated symbol) were often not marked up, so I would do a global search and replace to change, e.g. ” f ” to ” $f$ “. But this would occasionally mess up existing LaTeX expressions, such as $B( f , g )$, and I would then have to do a second pass, e.g. converting ” $f$ ,” back to ” f ,”. My rule of thumb was that such “false positives” were acceptable as long as their rate of occurrence was significantly less than 50%, so that the end result of the search and replace was closer to my goal than the starting point.
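This two-pass strategy can be sketched as follows (function names are hypothetical, and the second pass only repairs the one class of false positive described above):

```python
def first_pass(text, sym="f"):
    # Wrap each bare occurrence of the symbol in math mode.  This is
    # deliberately greedy, and creates some false positives inside
    # expressions that are already in math mode.
    return text.replace(" " + sym + " ", " $" + sym + "$ ")

def second_pass(text, sym="f"):
    # Undo one common class of false positives: a symbol followed by a
    # comma inside an existing LaTeX expression, e.g. "$B( $f$ , g )$"
    # is repaired back to "$B( f , g )$".
    return text.replace(" $" + sym + "$ ,", " " + sym + " ,")
```

For example, `second_pass(first_pass("$B( f , g )$"))` restores the original expression, while `first_pass("let f be given")` correctly marks up the bare symbol; the false-positive rate stays well under the 50% threshold as long as bare symbols outnumber symbols already in math mode.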

To mitigate the false positive problem, I went through the text linearly, and each section that was cleaned up and fully converted was moved into a separate text file, so that further global search-and-replaces did not mess up the portions of the text that had already been dealt with.

Perhaps the two most difficult conversion tasks, which could not be automated much at all, were the conversion of hypertext links to more traditional bibliographical citations, and the handling of Wikipedia links. For the former, I found that if a paper was cited multiple times, then a search-and-replace did cut down on the work required. Wikipedia links had to be largely abandoned, unfortunately, although I did adopt the convention that any technical term introduced with only a cursory definition would be italicised, to indicate that more information about that term could be obtained from an external source, such as Wikipedia.
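The repeated-citation case is the one piece of this that scripts well. The sketch below (URLs and citation keys are made up for illustration) replaces every hyperlink to a known paper with its anchor text plus a \cite command, and strips the markup from any other link:

```python
import re

# Match a simple HTML hyperlink, capturing the URL and the anchor text.
LINK = re.compile(r'<a href="([^"]+)"[^>]*>(.*?)</a>', re.DOTALL)

def cite_repeated_links(text, key_for_url):
    """Replace each link to a known paper with anchor text + \\cite;
    other links are reduced to their anchor text."""
    def repl(m):
        url, anchor = m.group(1), m.group(2)
        if url in key_for_url:
            return anchor + " \\cite{" + key_for_url[url] + "}"
        return anchor
    return LINK.sub(repl, text)
```

A paper cited many times then only needs one entry in the URL-to-key table, which is the search-and-replace saving mentioned above; links without an entry still need to be handled by hand.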

## 9 comments


29 January, 2009 at 2:16 pm

Anonymous

Surely the whole process could be automated.

You might consider using the LaTeXMathML script (http://math.etsu.edu/LaTeXMathML/) on your blog. Then you could write your posts in pure LaTeX with no conversion required.

Another option would be to write your posts in LaTeX and then use tex4ht (http://www.cse.ohio-state.edu/~gurari/TeX4ht/) to convert them to HTML before posting.

29 January, 2009 at 5:06 pm

bengreen

Terry,

Will this exhibit blowup in finite time? Having just taken delivery of the 2007 volume I can confirm that the initial data is fairly smooth, if that helps,

Ben

29 January, 2009 at 7:37 pm

Rod Carvalho

Prof. Tao,

You might be interested in trying HTML-to-TeX converters. After a couple of minutes of googling, I came across a few, like this one, or this other one. I haven’t used any of them yet, so I can’t comment on their merits / flaws. In any case, they should make the conversion process much more efficient! :-)

29 January, 2009 at 7:46 pm

Richard

This looks and reads much better than I would expect from a first draft! There’s material in there that I’m interested in, and I’ll be first in line to buy this book.

I wonder what triggered the unusual automatically generated “possibly related posts” above. I hope that this isn’t an example of state of the art computer intelligence.

30 January, 2009 at 1:45 am

Heinrich

Format conversions are much less tedious with support from automatic tools.

There is a small program called pandoc that is able to convert (subsets of) HTML to LaTeX and vice versa.

30 January, 2009 at 8:35 am

Tom

The new book looks very nice indeed!

As for the conversion issues, I think that to be as close as possible to Terry’s situation the easiest way might be for Terry to write his posts in LaTeX first (duly marking each $f$ required) and then convert that to a text format suitable to be directly cut and pasted into the WordPress window for new posts (i.e. a conversion adding the extra \latex bit where needed, dealing with URLs, etc.).

This conversion could be made by an easy-to-use custom Perl script that would only require the filenames as input. I’ll try to see if I can cook one up (and will post a link to it after careful debugging).

30 January, 2009 at 9:56 am

Terence Tao

Thanks for the suggestions! Actually, it turns out that Luca Trevisan has already done something like this for his own LaTeX-intensive posts, and has kindly shared his code for me to “beta test”, which I should probably do next week. (The next one or two posts have already been largely written, though, and won’t be in this new format.)

5 February, 2009 at 1:16 am

Nielsen’s posts tagged Science2.0 | The Daily Clique

[...] Poincaré’s legacies: pages from year two of a mathematical blog from Terry Tao’s blog: Tao announces a blog book based on all his mathematical posts from the past year. [...]

1 September, 2009 at 9:09 am

» Science 2.0: What Every Scientist Needs to Know About How the Web is Changing the Way They Work | Information/Science

[...] Nielsen opened his talk with a discussion of blogging in the sciences. In particular, he described the blog of Terence Tao, a mathematician and Fields medalist. What makes Tao’s blog special is that it is a place for very high-level thinking and discussion. Tao writes blog posts outlining a mathematical problem he’s working on, along with his ideas for how they might be solved, or introducing a new way to think about them. The comments section is full of other mathematicians offering advice, rebuttal, criticism, and discussion. It has, in essence, become a forum for mathematical thought that cannot be replicated in traditional journal-style publishing. The output of Tao’s blog is professional enough that it has been formally published in two volumes: “Structure and Randomness: pages from year one of a mathematical blog” and “Poincaré’s legacies: pages from year two of a mathematical blog”. [...]