As the previous discussion on displaying mathematics on the web has become quite lengthy, I am opening a fresh post to continue the topic. I’m leaving the previous thread open for those who wish to respond directly to some specific comments in that thread, but otherwise it would be preferable to start afresh on this thread to make it easier to follow the discussion.
It’s not easy to summarise the discussion so far, but the comments have identified several existing formats for displaying (and marking up) mathematics on the web (MathML, jsMath, MathJax, OpenMath), as well as a surprisingly large number of tools for converting mathematics into web-friendly formats (e.g. LaTeX2HTML, LaTeXMathML, LaTeX2WP, Windows 7 Math Input, itex2MML, Ritex, Gellmu, mathTeX, WP-LaTeX, TeX4ht, blahtex, plastex, TtH, WebEQ, techexplorer, etc.). Some of the formats are not widely supported by current software, and by current browsers in particular, but it seems that the situation will improve with the next generation of these browsers.
It seems that the tools that already exist are enough to improvise a passable way of displaying mathematics in various formats online, though there are still significant issues with accessibility, browser support, and ease of use. Even if all these issues are resolved, though, I feel that something is still missing. Currently, if I want to transfer some mathematical content from one location to another (e.g. from a LaTeX file to a blog, or from a wiki to a PDF, or from email to an online document, or whatever), or to input some new piece of mathematics, I have to think about exactly what format I need for the task at hand, and what conversion tool may be needed. In contrast, if one looks at non-mathematical content such as text, links, fonts, non-Latin alphabets, colours, tables, images, or even video, the formats here have been standardised, and one can manipulate this type of content in both online and offline formats more or less seamlessly (in principle, at least – there is still room for improvement), without the need for any particularly advanced technical expertise. It doesn’t look like we’re anywhere near that level currently with regard to mathematical content, though presumably things will improve when a single mathematics presentation standard, such as MathML, becomes universally adopted and supported in browsers, in operating systems, and in various other pieces of auxiliary software.
Anyway, it has been a very interesting and educational discussion for me, and hopefully for others also; I look forward to any further thoughts that readers have on these topics. (Also, feel free to recapitulate some of the points from the previous thread; the discussion has been far too multifaceted for me to attempt a coherent summary by myself.)
25 comments
4 November, 2009 at 3:41 pm
ARS
I had not heard about MathJax, a not-yet-ready successor to the excellent jsMath.
On the MathJax site it is announced that MathSciNet will adopt MathJax, which makes it a serious contender. Also linked on the site was the following article about math on the web:
http://www.americanscientist.org/issues/pub/2009/3/writing-math-on-the-web/1
4 November, 2009 at 3:50 pm
Kareem Carr
Much of mathematics is offline. While scans of older papers can be put online, the scans have many of the deficiencies that people have pointed out concerning using images to represent equations.
For that reason, equation OCR seems to me like it should be part of the puzzle. I recently found a program called INFTYreader which seems to convert scans to LaTeX or MathML. I would be interested to know if there are others.
4 November, 2009 at 11:24 pm
Andrew Stacey
One solution to this issue would be to persuade the reCaptcha people to provide a mathematical variant. For those who don’t know, reCaptcha is one of those ways of testing whether someone filling in a form is a human or robot. It asks the user to type in two words that it displays on the screen. One of these words is already known to the reCaptcha engine and that is used to test the humanity of the user. The other word is unknown and the user’s answer goes towards converting scanned documents.
At the moment, they’re doing the New York Times. It would be great if we could persuade them to do old mathematical papers.
5 November, 2009 at 8:46 am
Alex Nikolov
This will be more complicated. Do you suggest adding an additional interface to make entering math content easier (can’t expect users to LaTeX it in, I guess)? Then both words would have to be formulas; otherwise people might just not bother and put in gibberish for the math content, knowing the ‘real’ captcha is the other word.
5 November, 2009 at 9:58 am
Andrew Stacey
Even if it just did the words then it would be a step in the right direction. I agree that the mathematics might be a little trickier. Again, a first step would be to simply identify the characters, thus a “mathematical word” would be simply a letter. There are enough characters in use in mathematics that this would still be sufficiently tricky for a computer to do!
But even if the words are all that’s feasible, once the words are right it wouldn’t be too much work for a graduate student to fill in the rest.
Ideally, if we could get rid of the pesky copyright issues, when reading an old paper and taking notes on it, one should type bits of it in to some central site. But that’s another idea for another day.
5 November, 2009 at 8:27 am
jonathanfine
One of the problems is that it’s difficult to keep track of what’s going on in mathematical content. To help here, a couple of months ago I started bookmarking URLs related to technical aspects of mathematical content.
You can see what I’ve got at
http://delicious.com/mathcontent/
And if you go to
http://delicious.com/mathcontent/survey
you’ll see the American Scientist article that has been mentioned a couple of times in comments to the previous post. And you’ll also find a blog post by Jon Udell on WordPress and polymath. (This is a teaser to encourage you to visit the URL.)
I welcome all offers of help to keep the list of bookmarks up-to-date.
5 November, 2009 at 10:08 am
Andrew Stacey
I think that Michael’s comments in the other thread are the real “killer app” for MathML. To display mathematics on the web in a way that is accessible for all means MathML. Anything else is merely a patch while we wait for others to catch up. If all the maths blogs converted to MathML then I’d bet that we’d see things move a lot more quickly on the remaining implementation issues.
Then there’s the issue of generating the MathML, since who wants to type it directly? Here, the more tools the better! After all, it doesn’t matter what the source is, so long as the output is right. So someone who wants to do dynamic content will use something like itex2MML, which is fast but only a subset of LaTeX (well, not quite a subset), whereas someone who is producing mainly static content and who has one eye on the possibility of publishing that content as a book, will use genuine LaTeX as their source and something like TeX4ht as their converter.
I’m not sure that the non-mathematical picture is as rosy as you paint, but we’re more familiar with it and step around the potholes without noticing that they are there. Plus there are more people in the text, images, video business than in the mathematics business so they have an unfair advantage. But when I think of the hassles I have to go through when someone emails me a Word document, or the irritation at having to download yet another codec to play a DVD, or wonder if it’s legal to view images that are GIFs, then I’m not so sure that everything is “sorted out”.
And I’m not sure that I want it to be. I want to be able to use Emacs for writing all my documents. I want to be able to use LaTeX to do even invitations to birthday parties (thanks to the magic of tikz). I don’t expect everyone to do the same, but so long as the final format is something that I can read, I don’t really care how it was produced. But the important thing is knowing what formats can be widely read.
5 November, 2009 at 11:20 am
jonathanfine
Implicit in latex2wp is a standard for TeX-notation mathematics, and also for the LaTeX markup of the source document.
A standard for TeX-notation mathematics together with a high-quality translator of standard TeX-notation mathematics to MathML would be a great help. Only by having standards for input can we be sure that we will get decent output from the translator (Garbage In, Garbage Out).
For me, most or all of the benefits of MathML follow from being a specified language. On the basis of the specification software can be built. Because MathML is specified, MathML to voice software can be developed. The same is not, at present, true of TeX notation mathematics.
In the long term I’d like TeX-notation mathematics and MathML to be something like JSON and XML: two different ways of representing essentially the same underlying data.
5 November, 2009 at 1:15 pm
Andrew Stacey
I wouldn’t. I like the flexibility of TeX and I wouldn’t give that up for compatibility with MathML at all. I don’t want to change LaTeX one single bit to get it to work with the web, at least for static stuff. I’m prepared to make sacrifices for dynamic content, but then I want to work with a subset of LaTeX, or as close as possible.
But LaTeX is a programming language and that’s what makes it bearable. When writing a recent paper, my coauthor and I had absolutely no idea what would make good notation, so we built in a complicated system of macros so that not only could a few commands be changed by a single modification, but that would propagate on down into levels beyond what mere substitution macros can cope with. We basically had Object Oriented LaTeX. At one point, we even had it so that composition and evaluation of functions could be switched from left to right (and everything else would switch as well).
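A toy sketch of the kind of switchable convention described above might look as follows. It is not the macro system from the paper: the names \ifrighttoleft, \compose and \evaluate are invented for illustration, and the real thing was evidently far more elaborate.

```latex
\documentclass{article}

% Hypothetical switch (not from the paper): true means compositions are
% written right to left, f \circ g, and functions are written before
% their arguments, f(x).
\newif\ifrighttoleft
\righttolefttrue   % change to \righttoleftfalse to flip the whole document

% \compose{f}{g} always means "apply g first, then f";
% only the written order depends on the switch.
\newcommand{\compose}[2]{\ifrighttoleft #1 \circ #2\else #2 \circ #1\fi}

% \evaluate{f}{x} always means "f applied to x".
\newcommand{\evaluate}[2]{\ifrighttoleft #1(#2)\else (#2)#1\fi}

\begin{document}
The composite $\compose{f}{g}$ sends $x$ to
$\evaluate{f}{\evaluate{g}{x}}$.
\end{document}
```

Because the source records what is meant rather than how it is written, changing the single switch in the preamble rewrites every composition and evaluation consistently.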
I think that’s something that hasn’t been remarked on yet. It’s implicit in the dynamic vs static dichotomy. Papers tend to be written over a long scale of time (relatively speaking). Blog posts, even wiki pages, tend to be written over a much shorter scale of time. I put much more time and effort into a paper than into a webpage. I’m not completely sure I could explain why; partly, papers feel more “mine” than any webpage (even my own), and papers are, at the end of the day, what will last of my work (even though, I concede, they may be published online and in (X?HT|Math)ML format). So I craft them more carefully, and so want the full power of LaTeX to do so.
So don’t you dare downgrade LaTeX just to fit in with the rather limited spec of MathML!
6 November, 2009 at 12:43 am
jonathanfine
Andrew: You wrote “When writing a recent paper, my coauthor and I had absolutely no idea what would make good notation, so we built in a complicated system of macros so that not only could a few commands be changed by a single modification, but that would propagate on down into levels beyond what mere substitution macros can cope with.”
Would that be your paper “Hunting the Hopf ring” (whose source is available on the arXiv)? If so, I’ll take a look at your macros and text, and will then understand better your point.
6 November, 2009 at 1:33 am
Andrew Stacey
It would.
One aspect of the “Object Oriented” approach that has just occurred to me is that it does make it easier to add low-level semantics to the document. The fact that, in that paper, we write delt for an element of an object of a (concrete!) category D and dmor for a morphism in the category D even though these resolve to the same symbol means that from the source code, it is always clear when d means an element and when d means a morphism.
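As a minimal sketch of this point: the names delt and dmor come from the comment above, but writing them as the macros \delt and \dmor, and the definitions given here, are assumptions for illustration rather than the paper's actual code.

```latex
\documentclass{article}

% Two macros that print the same symbol but record different meanings.
% These definitions are illustrative guesses, not the paper's own.
\newcommand{\delt}{d}   % an element of an object of the category D
\newcommand{\dmor}{d}   % a morphism in the category D

\begin{document}
Let $\delt$ be an element of an object of $\mathcal{D}$, and let
$\dmor$ be a morphism in $\mathcal{D}$.
% Both occurrences typeset as the letter d, but the source still
% distinguishes them, so either one can later be restyled on its own.
\end{document}
```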
I’m partially doing this in the lecture notes for my current course. I find it hard to parse a slide full of stark (raving) mathematics, so I can only imagine how my students find it. So to try to help them make the connections, I’ve been adding colour: vectors are one colour, scalars another, matrices a third, and so forth (I won’t claim that I’ve been consistent: what colour should the entries of a matrix be?). I do this by using the code \type{vector}{x} and so forth. Then I can modify how all the vectors are displayed in a single command.
Someone made the point about semantics over in the other thread, and I pooh-poohed it a little. The point is that telling people to add semantics to make other people’s lives easier is never going to work: even if people agree that it’s a good idea, it just takes too much time. However, saying that adding “type” macros to your document makes it easier to write papers, and here’s an easy style file that makes it so very easy to do, oh and incidentally you’ll be helping a lot of other people. Well, then people might start doing it.
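One possible way to set up such a “type” macro is sketched below. Only the call \type{vector}{x} is taken from the comment; the helper names (\typevector and so on), the colours, and the use of the xcolor package are assumptions made for this example.

```latex
\documentclass{article}
\usepackage{xcolor}

% One display rule per type. The helper names and colours are
% hypothetical; edit a single line to restyle every object of a type.
\newcommand{\typevector}[1]{\textcolor{blue}{\mathbf{#1}}}
\newcommand{\typescalar}[1]{\textcolor{red}{#1}}
\newcommand{\typematrix}[1]{\textcolor{green!50!black}{#1}}

% \type{<kind>}{<symbol>} dispatches to the rule named type<kind>.
\newcommand{\type}[2]{\csname type#1\endcsname{#2}}

\begin{document}
\[
  \type{matrix}{A}\,\type{vector}{x}
    = \type{scalar}{\lambda}\,\type{vector}{x}
\]
\end{document}
```

With this layout, displaying all vectors differently means editing \typevector once, and the source keeps a record of which symbols are vectors, scalars, or matrices.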
6 November, 2009 at 12:54 am
Peter Krautzberger
Since it’s not mentioned yet, I wanted to point out the discussion from the first thread regarding PDF/A-1a http://www.pdfa.org/doku.php?id=artikel:en:improved_pdfa-1b
Since displaying mathematics on the web often involves PDFs (directly or indirectly), wouldn’t it be good to be able to display mathematics using PDF/A-1a?
Would it be useful to have such standards spill over to the web standards? I’m guessing it would make mathematics both more accessible and searchable. Is that realistic?
12 November, 2009 at 5:34 pm
Pacha Nambi
I would like to point out that chemists and others who also write papers with equations and math symbols in them do not use LaTeX. What is needed is a universal language that all scientists who write articles with math equations in them can use – and such a language should be flexible enough to accommodate the ever-growing need for interconversion between different flavors of these math languages. It is hard enough, time-wise, to do research; what is not needed is Baskin-Robbins flavors of markup languages. Why does the math community not embrace articles published in PDF format more? I know of some math journals that will not accept papers with mathematical equations submitted in PDF format only. Many word processing programs now permit creating documents with math equations, and most of them can produce decent PDF files, yet many math journals will not accept papers submitted in PDF format only. Why?
13 November, 2009 at 12:40 am
Andrew Stacey
In addition, the PDF produced may not be fully accessible. In particular, the PDF produced by pdftex is not.
12 November, 2009 at 7:15 pm
John Armstrong
Because the journals are going to replace your style files with their own style file before rerendering the paper. If you send them your own PDF then they’re stuck with using your style, which may not be theirs (including headers, footers, pagination, etc.).
12 November, 2009 at 10:21 pm
Pacha Nambi
I see your point and appreciate your comments. I checked to see what ACS does with regard to styles. It turns out each journal published by ACS has its own style – especially with respect to literature citations – even though there is some commonality among them. So, it appears we are left with using a number of interconversion tools for format exchange. What a waste of time!
13 November, 2009 at 9:45 am
Jacques Distler
“So, it appears we are left with using a number of interconversion tools for format exchange. What a waste of time!”
You have drawn precisely the wrong conclusion from the available data.
Because each journal has its own style and citation format, what you want is a submission format which separates style from content. PDF bakes stylistic decisions (right down to the fonts used, and the precise placement of glyphs on the page) into the file, which makes it utterly unsuitable as a submission format.
A LaTeX source file, with citations in BibTeX format, presents the content, but leaves the stylistic decisions to the journal’s choice of LaTeX document class and BibTeX style.
And, indeed, the ACS does have its own LaTeX document class (and BibTeX style) for preparing manuscripts. It’s available from their website, and is included in most contemporary TeX distributions (e.g. texlive2009).
13 November, 2009 at 9:58 am
Jacques Distler
Thirty-seven different journals are supported with the same document class:
\documentclass[journal=ancham,manuscript=review]{achemso}
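To make the separation of style from content concrete, here is a hedged sketch of a complete manuscript skeleton built on that class. The author details, title, citation key and the bibliography file name references.bib are placeholders invented for this example; the option manuscript=article is used in place of review.

```latex
% Changing the journal= key is meant to be the only edit needed to
% retarget the manuscript; layout and citation style follow from it.
\documentclass[journal=ancham,manuscript=article]{achemso}

% Placeholder metadata, invented for illustration.
\author{A. N. Author}
\affiliation{Department of Chemistry, Example University}
\email{author@example.org}
\title{A placeholder title}

\begin{document}

\begin{abstract}
  A one-paragraph placeholder abstract.
\end{abstract}

Body text with a citation.\cite{placeholder2009}

% references.bib is a hypothetical BibTeX database; the class selects
% the bibliography style, so no \bibliographystyle line appears here.
\bibliography{references}

\end{document}
```

None of the content lines mention fonts, margins, or reference formatting, which is exactly what a baked PDF cannot offer.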
23 November, 2009 at 6:52 am
Peter Krautzberger
I know it is a bit late in the thread, but does anybody know what http://www.mathjax.com/ is about? It sounds very promising, but also like vaporware.
15 December, 2009 at 2:48 pm
Paul Topping
We are gradually getting the MathJax site and s/w up and running. There is now a sample page up. There is a lot of fine-tuning and debugging left to do so try not to be too hard on it yet. We are also putting the final touches on the license, which is a process of trying to please multiple parties. Formal announcements will be out soon.
Paul Topping
Design Science, Inc.
one of the founding MathJax parties
6 January, 2010 at 3:19 pm
ateixeira
http://watchmath.com/vlog/?cat=72
http://www.watchmath.blogspot.com/
Seems to be a very good tool for right now. It allows a lot more from LaTeX than wordpress.com and, when used together with Luca Trevisan’s script (with a few changes so that it can work in blogspot), seems to be a really terrific tool.
Other than that I really can’t say much about this.
18 August, 2010 at 4:18 am
Michael Nielsen
Just wanted to point out that the latest build of WebKit now incorporates MathML:
http://webkit.org/blog/1366/announcing%E2%80%A6mathml/
This matters because WebKit is behind Safari, Chrome, the Android and Blackberry browsers, and many others. So in the future it looks very likely that those browsers will support MathML.
18 August, 2010 at 4:22 am
Michael Nielsen
I forgot to mention: Firefox also supports MathML natively, leaving IE as the only major browser which does not.
18 August, 2010 at 7:14 am
Jacques Distler
This is, indeed, big news.
Another important one is Firefox’s support for MathML-in-HTML5.
As for IE, it does support MathML through a (very nice) plugin.
And that plugin provides accessibility features to the visually impaired (reading equations aloud, or converting them to Braille).