As you may already know, Danica McKellar, the actress and UCLA mathematics alumna, has recently launched her book “Math Doesn’t Suck”, which is aimed at pre-teenage girls and offers a friendly introduction to middle-school mathematics, such as the arithmetic of fractions. The book has received quite a bit of publicity, most of it rather favourable, and is selling quite well; at one point, it even made the Amazon top 20 bestseller list, which is a remarkable achievement for a mathematics book. (The current Amazon rank can be viewed in the product details of the Amazon page for this book.)

I’m very happy that the book is successful for a number of reasons. Firstly, I got to know Danica over a few months (she took my Introduction to Topology class way back in 1997, and was in fact the second-best student there; the class web page has long since disappeared, but you can at least see the midterm and final), and it is always very heartening to see a former student put her or his mathematical knowledge to good use :-) . Secondly, Danica is a wonderful role model, and it seems that this book will encourage many school-age kids to give maths a chance. But the final reason is that the book is, in fact, rather good; the mathematical content is organised in a logical manner (for instance, it begins with prime factorisation, then covers least common multiples, then addition of fractions), well motivated, and interleaved with some entertaining, insightful, and slightly goofy digressions, anecdotes, and analogies. (To give one example: to motivate why dividing 6 by 1/2 should yield 12, she first discussed why 6 divided by 2 should give 3, by telling a story about having to serve lattes to a whole bunch of actors, where each actor demands two lattes, but one can only carry the weight of six lattes at a time, so that only 6/2=3 actors can be served in one go; she then asked what would happen if each actor instead wanted only half a latte. Danica also gives a very clear explanation of the concept of a variable (such as x), by using as an analogy the familiar concept of a nickname given to someone with a complicated real name.)

While I am not exactly in the target audience for this book, I can relate to its pedagogical approach. When I was a kid myself, one of my favourite maths books was a very obscure (and now completely out of print) book called “Creating Calculus”, which introduced the basics of single-variable calculus by concocting a number of slightly silly and rather contrived stories, which always involved one or more ants. For instance, to illustrate the concept of a derivative, in one of these stories an ant kept walking up a mathematician’s shin while he was relaxing against a tree, but started slipping down at the point where the slope of the shin reached a certain threshold; this got the mathematician interested enough to compute that slope from first principles. The humour in the book was rather corny, involving for instance some truly awful puns, but it was perfect for me when I was 11: it inspired me to play with calculus, which is an important step towards improving one’s understanding of the subject beyond a superficial level. (Two other books in a similarly playful spirit, yet still full of genuine scientific substance, are “Darwin for beginners” and “Mr. Tompkins in paperback”, both of which I also enjoyed very much as a kid. They are of course no substitute for a serious textbook on these subjects, but they complement such treatments excellently.)

Anyway, Danica’s book has already been reviewed in several places, and there’s not much more I can add to what has been said elsewhere. I thought, however, that I could talk about another of Danica’s contributions to mathematics, namely her paper “Percolation and Gibbs states multiplicity for ferromagnetic Ashkin-Teller models on {\Bbb Z}^2” (PDF available here), joint with Brandy Winn and my colleague Lincoln Chayes. (Brandy, incidentally, was the only student in my topology class who did better than Danica; she has recently obtained a PhD in mathematics from U. Chicago, with a thesis in PDE.) This paper is noted from time to time in the above-mentioned publicity, and its main result is sometimes referred to there as the “Chayes-McKellar-Winn theorem”, but as far as I know, no serious effort has been made to explain exactly what this theorem is, or the wider context into which the result fits :-) . So I’ll give it a shot; this also gives me an opportunity to talk about some beautiful topics in mathematical physics, namely statistical mechanics, spontaneous magnetisation, and percolation.

[Update, Aug 23: I added a non-technical “executive summary” of what the Chayes-McKellar-Winn theorem is at the very end of this post.]

— Statistical mechanics —

To begin the story, I would like to quickly review the theory of statistical mechanics. This is the theory which bridges the gap between the microscopic (particle physics) description of many-particle systems, and the macroscopic (thermodynamic) description, giving a semi-rigorous explanation of the empirical laws of the latter in terms of the fundamental laws of the former.

Statistical mechanics is a remarkably general theory for describing many-particle systems – for instance, it treats classical and quantum systems in almost exactly the same way! But to simplify things I will just discuss a toy model of the microscopic dynamics of a many-particle system S – namely, a finite Markov chain model. In this model, time is discrete, though the interval between discrete times should be thought of as extremely short. The state space is also discrete; at any given time, the number of possible microstates that the system S could be in is finite (though extremely large – typically, it is exponentially large in the number N of particles). One should view the state space of S as a directed graph with many vertices but relatively low degree. After each discrete time interval, the system may move from one microstate to an adjacent one on the graph, where the transition probability from one microstate to the next depends neither on time nor on the past history of the system. We make the key assumption that the counting measure on microstates is invariant, or equivalently that the sum of all the transition probabilities that lead one away from a given microstate equals the sum of all the transition probabilities that lead one towards that microstate. (In classical systems, governed by Hamiltonian mechanics, the analogue of this assumption is Liouville’s theorem; in quantum systems, governed by Schrödinger’s equation, the analogue is the unitarity of the evolution operator.) We also make the mild assumption that the transition probability across any edge is positive.

If the graph of microstates were connected (i.e. one could get from any microstate to any other by some path along the graph), then after a sufficiently long period of time, the probability distribution of the microstates would converge towards normalised counting measure, as can be seen from basic Markov chain theory. However, if the system S is isolated (i.e. not interacting with the outside world), conservation laws intervene to disconnect the graph. In particular, if each microstate x had a total energy H(x), and one had a law of conservation of energy which meant that microstates could only transition to other microstates with the same energy, then the probability distribution could be trapped on a single energy surface, defined as the collection \{ x: H(x) = E \} of all microstates of S with a fixed total energy E.
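
To make the connected case concrete, here is a minimal numerical sketch (the graph, transition probabilities, and number of steps are all made up purely for illustration): a chain on a small connected state graph whose transition matrix is doubly stochastic (so that counting measure is invariant, as in the key assumption above) is started in a single microstate, and its distribution converges to normalised counting measure. The same computation, restricted to a single connected energy surface, would produce the microcanonical ensemble discussed below.

    import numpy as np

    # Toy "microstate graph": n states arranged in a cycle, with symmetric
    # (hence doubly stochastic) transition probabilities.  Symmetry guarantees
    # that counting measure is invariant, mirroring the key assumption above;
    # the self-loop keeps the chain aperiodic.
    n = 5
    P = np.zeros((n, n))
    for i in range(n):
        P[i, (i + 1) % n] = 0.3   # hop "forwards"
        P[i, (i - 1) % n] = 0.3   # hop "backwards"
        P[i, i] = 0.4             # stay put

    assert np.allclose(P.sum(axis=1), 1) and np.allclose(P.sum(axis=0), 1)

    # Start in a single microstate and evolve the probability distribution.
    mu = np.zeros(n)
    mu[0] = 1.0
    for _ in range(200):
        mu = mu @ P

    print(mu)   # approaches the uniform distribution [0.2, 0.2, 0.2, 0.2, 0.2]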

Physics has many conservation laws, of course, but to simplify things let us suppose that energy is the only conserved quantity of any significance (roughly speaking, this means that no other conservation law has a significant impact on the entropy of possible microstates). In fact, let us make the stronger assumption that the energy surface is connected; informally, this means that there are no “secret” conservation laws beyond the energy which could prevent the system evolving from one side of the energy surface to the other.

In that case, Markov chain theory lets one conclude that if the system started out at a fixed total energy E, and the system S was isolated, then the limiting distribution of microstates would just be the uniform distribution on the energy surface \{ x: H(x) = E \}; every state on this surface is equally likely to occur at any given instant of time (this is known as the fundamental postulate of statistical mechanics, though in this simple Markov chain model we can actually derive this postulate rigorously). This distribution is known as the microcanonical ensemble of S at energy E. It is remarkable that this ensemble is largely independent of the actual values of the transition probabilities; it is only the energy E and the function H which are relevant. (This analysis is perfectly rigorous in the Markov chain model, but in more realistic models such as Hamiltonian mechanics or quantum mechanics, it is much more difficult to rigorously justify convergence to the microcanonical ensemble. The trouble is that while these models appear to have chaotic dynamics, which should thus exhibit very pseudorandom behaviour (similar to the genuinely random behaviour of a Markov chain model), it is very difficult to demonstrate this pseudorandomness rigorously; the same difficulty, incidentally, is present in the Navier-Stokes regularity problem.)

In practice, of course, a small system S is almost never truly isolated from the outside world S’, which is a far larger system; in particular, there will be additional transitions in the combined system S \cup S', through which S can exchange energy with S’. In this case we do not expect the S-energy H(x) of the combined microstate (x,x’) to remain constant; only the global energy H(x) + H'(x’) will equal a fixed number E. However, we can still view the larger system S \cup S' as a massive isolated system, which will have some microcanonical ensemble; we can then project this ensemble onto S to obtain the canonical ensemble for that system, which describes the distribution of S when it is in thermal equilibrium with S’. (Of course, since we have not yet experienced heat death, the entire outside world is not yet in the microcanonical ensemble; but in practice, we can immerse a small system in a heat bath, such as the atmosphere, which accomplishes a similar effect.)

Now it would seem that in order to compute what this canonical ensemble is, one would have to know a lot about the external system S’, or the total energy E. Rather astonishingly, though, as long as S’ is much larger than S, and obeys some plausible physical assumptions, we can specify the canonical ensemble of S using only a single scalar parameter, the temperature T. To see this, recall that in the microcanonical ensemble of S \cup S', each microstate (x,x’) with combined energy H(x)+H'(x’)=E has an equal probability of occurring at any given time. Thus, given any microstate x of S, the probability that x occurs at a given time will be proportional to the cardinality of the set \{ x': H(x') = E - H(x) \}. Now, as the outside system S’ is very large, this set will be enormous, and presumably very complicated as well; however, the key point is that it only depends on E and x through the quantity E-H(x). Indeed, we conclude that the canonical ensemble distribution of microstates at x is proportional to \Omega(E - H(x)), where \Omega(E') is the number of microstates of the outside system S’ with energy E’.

Now it seems that it is hopeless to compute \Omega(E') without knowing exactly how the system S’ works. But, in general, the number of microstates in a system tends to grow exponentially in the energy in some fairly smooth manner; thus we have \Omega(E') = \exp(F(E')) for some smooth increasing function F of E’ (although in some rare cases involving population inversion, F may be decreasing). Now, we are assuming S’ is much larger than S, so E should be very large compared with H(x). In such a regime, we expect Taylor expansion to be reasonably accurate, so that \Omega(E - H(x)) \approx \exp( F(E) - \beta H(x) ), where \beta := F'(E) is the derivative of F at E (or equivalently, the log-derivative of \Omega); note that we are assuming \beta to be positive. The quantity \exp(F(E)) doesn’t depend on x, and so we conclude that the canonical ensemble is proportional to counting measure, multiplied by the function \exp( - \beta H(x) ). Since probability distributions have total mass 1, we can in fact describe the probability P(x) that the canonical ensemble assigns to the microstate x exactly as

P(x) = \frac{1}{Z} e^{-\beta H(x)}

where Z is the partition function

Z := \sum_x e^{-\beta H(x)}.

The canonical ensemble is thus specified completely except for a single parameter \beta > 0, which depends on the external system S’ and on the total energy E. But if we take for granted the laws of thermodynamics (particularly the zeroth law), and compare S’ with an ideal gas, we can obtain the relationship \beta = 1/kT, where T is the temperature of S’ and k is Boltzmann’s constant. Thus the canonical ensemble of a system S is completely determined by the temperature T and by the energy functional H. The underlying transition graph and transition probabilities, while necessary to ensure that one eventually attains this ensemble, do not actually need to be known in order to compute what this ensemble is, and can now (amazingly enough) be discarded. (More generally, the microscopic laws of physics, whether they be classical or quantum, can similarly be discarded almost completely at this point in the theory of statistical mechanics; the only thing one needs those laws of physics to provide is a description of all the microstates and their energies [though in some situations one also needs to be able to compute other conserved quantities, such as particle number].)

At the temperature extreme T \to 0, the canonical ensemble becomes concentrated at the minimum possible energy E_{\hbox{min}} for the system (this fact, incidentally, inspires the numerical strategy of simulated annealing); whereas at the other temperature extreme T \to \infty, all microstates become equally likely, regardless of energy.
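
As a quick numerical illustration of the formula P(x) = \frac{1}{Z} e^{-\beta H(x)} and of these two temperature extremes, here is a small sketch (the five-state energy function and the choice of temperatures are made up purely for illustration, and Boltzmann’s constant k is set to 1):

    import numpy as np

    def canonical_ensemble(H, beta):
        # Boltzmann weights exp(-beta * H(x)), normalised by the partition function Z.
        weights = np.exp(-beta * H)
        Z = weights.sum()              # the partition function Z
        return weights / Z

    # A made-up energy function on five microstates (illustrative only).
    H = np.array([0.0, 1.0, 1.0, 2.0, 5.0])

    for T in (0.01, 1.0, 100.0):       # low, moderate, and high temperature (k = 1)
        print(T, canonical_ensemble(H, beta=1.0 / T))

    # As T -> 0 the distribution concentrates on the minimum-energy microstate;
    # as T -> infinity all five microstates become (nearly) equally likely.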

— Gibbs states —

One can of course continue developing the theory of statistical mechanics and relate temperature and energy to other macroscopic variables such as volume, particle number, and entropy (see for instance Schrödinger’s classic little book “Statistical thermodynamics“), but I’ll now turn to the topic of Gibbs states of infinite systems, which is one of the main concerns of the Chayes-McKellar-Winn paper.

A Gibbs state is simply a distribution of microstates of a system which is invariant under the dynamics of that system; physically, such states are supposed to represent an equilibrium state of the system. For systems S with finitely many degrees of freedom, the microcanonical and canonical ensembles given above are examples of Gibbs states. But now let us consider systems with infinitely many degrees of freedom, such as those arising from an infinite number of particles (e.g. particles in a lattice). One cannot now argue as before that the entire system is going to be in a canonical ensemble; indeed, as the total energy of the system is likely to be infinite, it is not even clear that such an ensemble still exists. However, one can still argue that any localised portion S_1 of the system (with finitely many degrees of freedom) should still be in a canonical ensemble, by treating the remaining portion S \backslash S_1 of the system as a heat bath that S_1 is immersed in. Furthermore, the zeroth law of thermodynamics suggests that all such localised subsystems should be at the same temperature T. This leads to the definition of a Gibbs state at temperature T for a global system S: it is any probability distribution of microstates whose projection to any local subsystem S_1 is in the canonical ensemble at temperature T. (To make this precise, one needs probability theory on infinite-dimensional spaces, but this can be put on a completely rigorous footing, using the theory of product measures. There are also some technical issues regarding compatibility on the boundary between S_1 and S \backslash S_1 which I will ignore here.)

For systems with finitely many degrees of freedom, there is only one canonical ensemble at temperature T, and thus only one Gibbs state at that temperature. However, for systems with infinitely many degrees of freedom, it is possible to have more than one Gibbs state at a given temperature. This phenomenon manifests itself physically via phase transitions, the most familiar of which involves transitions between solid, liquid, and gaseous forms of matter, but also includes things like spontaneous magnetisation or demagnetisation. Closely related to this is the phenomenon of spontaneous symmetry breaking, in which the underlying system (and in particular, the energy functional H) enjoys some symmetry (e.g. translation symmetry or rotation symmetry), but the Gibbs states for that system do not. For instance, the laws of magnetism in a bar of iron are rotation symmetric, but there are some obviously non-rotation-symmetric Gibbs states, such as the magnetised state in which all the iron atoms have magnetic dipoles oriented with the north pole pointing in (say) the upward direction. [Of course, as the universe is finite, these systems do not truly have infinitely many degrees of freedom, but they do behave analogously to such systems in many ways.]

It is thus of interest to determine, for any given physical system, under what choices of parameters (such as the temperature T) one has non-uniqueness of Gibbs states. For “real-life” physical systems, this question is rather difficult to answer, so mathematicians have focused attention instead on some simpler toy models. One of the most popular of these is the Ising model, which is a simplified model for studying phenomena such as spontaneous magnetism. A slight generalisation of the Ising model is the Potts model; the Ashkin-Teller model, which is studied by Chayes-McKellar-Winn, is an interpolant between a certain Ising model and a certain Potts model.

— Ising, Potts, and Ashkin-Teller models —

All three of these models involve particles on the infinite two-dimensional lattice {\Bbb Z}^2, with one particle at each lattice point (or site). (One can also consider these models in other dimensions, of course; the behaviour is quite different in different dimensions.) Each particle can be in one of a finite number of states, which one can think of as “magnetisations” of that particle. In the classical Ising model there are only two states (+1 and -1), though in the Chayes-McKellar-Winn paper, four-state models are considered. The particles do not move from their designated site, but can change their state over time, depending on the state of particles at nearby sites.

As discussed earlier, in order to do statistical mechanics, we do not actually need to specify the exact mechanism by which the particles interact with each other; we only need to describe the total energy of the system. In these models, the energy is contained in the bonds between adjacent sites on the lattice (i.e. sites which are a unit distance apart). The energy of the whole system is then the sum of the energies of all the bonds, and the energy of each bond depends only on the states of the particles at the two endpoints of the bond. (The total energy of the infinite system is then a divergent sum, but this is not a concern, since one only needs to be able to compute the energy of finite subsystems, in which one only considers those particles within, say, a square of side length R.) The Ising, Potts, and Ashkin-Teller models then differ only in the number of states and in the energy of various bond configurations. Up to some harmless normalisations, we can describe them as follows (a small code sketch after the list spells out these bond energies):

  1. In the classical Ising model, there are two magnetisation states (+1 and -1); the energy of a bond between two particles is -1/2 if they are in the same state, and +1/2 if they are in the opposite state (thus one expects the states to align at low temperatures and become non-aligned at high temperatures);
  2. In the four-state Ising model, there are four magnetisation states (+1,+1), (+1,-1), (-1,+1), and (-1,-1) (which can be viewed as four equally spaced vectors in the plane), and the energy of a bond between two particles is the sum of the classical Ising bond energy between the first components of the two particle states, and the classical Ising bond energy between the second components. Thus for instance the bond energy between particles in the same state is -1, between particles in opposing states is +1, and between particles in orthogonal states (e.g. (+1,+1) and (+1,-1)) is 0. This system is equivalent to two non-interacting classical Ising models, and so the four-state theory can be easily deduced from the two-state theory.
  3. In the degenerate Ising model, we have the same four magnetisation states, but now the bond energy between two particles is -1 if they are in the same state or in opposing states, and 0 if they are in orthogonal states (consistent with the \epsilon=-1 case of the Ashkin-Teller model below). This model essentially collapses to the two-state model after identifying (+1,+1) and (-1,-1) as a single state, and identifying (+1,-1) and (-1,+1) as a single state.
  4. In the four-state Potts model, we have the same four magnetisation states, but now the energy of a bond between two particles is -1 if they are in the same state and 0 otherwise.
  5. In the Ashkin-Teller model, we have the same four magnetisation states; the energy of a bond between two particles is -1 if they are in the same state, 0 if they are orthogonal, and \epsilon if they are in opposing states. The case \epsilon = +1 is the four-state Ising model, the case \epsilon = 0 is the Potts model, and the cases 0 < \epsilon < 1 are intermediate between the two, while the case \epsilon=-1 is the degenerate Ising model.
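
The following short sketch (hypothetical helper code, not taken from the paper) encodes the Ashkin-Teller bond energy as listed above, with each state represented as a pair of signs; setting \epsilon = +1 recovers the four-state Ising bond energies, \epsilon = 0 the Potts ones, and \epsilon = -1 the degenerate Ising ones.

    # Bond energy between two Ashkin-Teller particles, following the normalisation
    # in the list above.  States are pairs (s1, s2) with s1, s2 = +1 or -1.
    def bond_energy(x, y, eps):
        if x == y:
            return -1.0       # same state
        elif x[0] != y[0] and x[1] != y[1]:
            return eps        # opposing state (both components flipped)
        else:
            return 0.0        # orthogonal state (exactly one component flipped)

    # Print the 4 x 4 table of bond energies for an intermediate value of eps.
    states = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]
    for x in states:
        print([bond_energy(x, y, eps=0.5) for y in states])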

For the classical Ising model, there are two minimal-energy states: the state where all particles are magnetised at +1, and the state where all particles are magnetised at -1. (One can of course also take a probabilistic combination of these two states, but we may as well restrict attention to pure states here.) Since one expects the system to have near-minimal energy at low temperatures, we thus expect to have non-uniqueness of Gibbs states at low temperatures for the Ising model. Conversely, at sufficiently high temperatures the differences in bond energy should become increasingly irrelevant, and so one expects to have uniqueness of Gibbs states at high temperatures. (Nevertheless, there is an important duality relationship between the Ising model at low and high temperatures.)
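
One can see this low-temperature/high-temperature dichotomy numerically with a quick Monte Carlo experiment. The following is a rough sketch (not anything from the paper; the grid size, sweep count, and temperatures are arbitrary choices for illustration) which uses the standard single-spin Metropolis rule for the classical Ising model above, started from the all-(+1) state: at low temperature the magnetisation stays close to +1, while at high temperature it quickly relaxes to near zero.

    import numpy as np

    rng = np.random.default_rng(0)

    def metropolis_ising(L=32, T=1.0, sweeps=200):
        # Classical Ising model on an L x L periodic grid, with bond energy -1/2 for
        # aligned neighbours and +1/2 for anti-aligned neighbours; flipping a spin s
        # then changes the total energy by s times the sum of its four neighbours.
        spins = np.ones((L, L), dtype=int)      # start from the all-(+1) state
        for _ in range(sweeps):
            for _ in range(L * L):
                i, j = rng.integers(0, L, size=2)
                nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                      + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
                dE = spins[i, j] * nn           # energy change if this spin is flipped
                if dE <= 0 or rng.random() < np.exp(-dE / T):
                    spins[i, j] = -spins[i, j]
        return spins.mean()                     # average magnetisation per site

    # Well below and well above the critical temperature (roughly 1.13 in this
    # normalisation, by Onsager's exact solution):
    print(metropolis_ising(T=0.5), metropolis_ising(T=5.0))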

Similar heuristic arguments apply for the other models discussed above, though for the degenerate Ising model there are many more minimal-energy states and so even at very low temperatures one only expects to obtain partial ordering rather than total ordering in the magnetisations.

For the Ashkin-Teller models with 0 < \epsilon < 1, it has been known for some time that there is a unique critical temperature T_c (which has a physical interpretation as the Curie temperature), below which one has non-unique and magnetised Gibbs states (thus the expected magnetisation of any given particle is non-zero), and above which one has unique (non-magnetised) Gibbs states. (For \epsilon close to -1 there are two critical temperatures, describing the transition from totally ordered magnetisation to partially ordered, and from partially ordered to unordered.) The problem of computing this temperature T_c exactly, and of describing the nature of this transition, appears to be rather difficult, although there are a large number of partial results. What Chayes, McKellar, and Winn showed, though, is that this critical temperature T_c is also the critical temperature T_p for a somewhat simpler phenomenon, namely that of site percolation. Let us denote one of the magnetised states, say (+1,+1), as “blue”. We then consider the Gibbs state for a bounded region (e.g. an N x N square), subject to the boundary condition that the entire boundary is blue. In the zero temperature limit T \to 0 the entire square would then be blue; in the high temperature limit T \to +\infty each particle would have an independent random state. Consider the probability p_N that a particle at the center of this square is part of the blue “boundary cluster”; in other words, the particle is not only blue, but there is a path of bond edges connecting this particle to the boundary which only goes through blue vertices. Thus we expect this probability to be close to 1 at very low temperatures, and close to 0 at very high temperatures. And indeed, standard percolation theory arguments show that there is a critical temperature T_p below which \lim_{N \to \infty} p_N is positive (or equivalently, the boundary cluster has density bounded away from zero), and above which \lim_{N \to \infty} p_N = 0 (thus the boundary cluster has asymptotic density zero). The “Chayes-McKellar-Winn theorem” is then the claim that T_c = T_p.
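
To make the definition of p_N concrete, here is a small sketch of the cluster computation (hypothetical illustration code; it does not sample the Ashkin-Teller Gibbs state itself, which would require a Monte Carlo simulation of its own, but instead uses the independent-site regime corresponding to the high temperature limit, where the question reduces to classical site percolation): given a configuration of blue/non-blue sites with an all-blue boundary, a flood fill from the boundary decides whether the center site belongs to the blue boundary cluster.

    import numpy as np
    from collections import deque

    rng = np.random.default_rng(1)

    def center_in_blue_boundary_cluster(blue):
        # Given a boolean N x N array recording which sites are blue (with the
        # boundary already forced blue), flood-fill from the boundary through
        # blue sites and report whether the center site is reached.
        N = blue.shape[0]
        seen = np.zeros_like(blue, dtype=bool)
        queue = deque((i, j) for i in range(N) for j in (0, N - 1))
        queue.extend((i, j) for i in (0, N - 1) for j in range(N))
        while queue:
            i, j = queue.popleft()
            if not (0 <= i < N and 0 <= j < N) or seen[i, j] or not blue[i, j]:
                continue
            seen[i, j] = True
            queue.extend([(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)])
        return bool(seen[N // 2, N // 2])

    def estimate_pN(N=41, p_blue=0.6, trials=200):
        # Each interior site is independently blue with probability p_blue.
        hits = 0
        for _ in range(trials):
            blue = rng.random((N, N)) < p_blue
            blue[0, :] = blue[-1, :] = blue[:, 0] = blue[:, -1] = True
            hits += center_in_blue_boundary_cluster(blue)
        return hits / trials

    # Below versus above the site percolation threshold for the square lattice
    # (which is approximately 0.593):
    print(estimate_pN(p_blue=0.5), estimate_pN(p_blue=0.7))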

This result is part of a very successful program, initiated by Fortuin and Kasteleyn, to analyse the statistical mechanics of site models such as the Ising, Potts, and Ashkin-Teller models via the random clusters generated by the bonds between these sites. (One of the fruits of this program, by the way, was the FKG correlation inequality, which asserts that, under suitable hypotheses on the underlying measure, any two monotone properties on a lattice are positively correlated. This inequality has since proven to be incredibly useful in probability, combinatorics, and computer science.) The claims T_c \leq T_p and T_p \leq T_c are proven separately. To prove T_c \leq T_p (i.e. multiple Gibbs states implies percolation), the main tool is a theorem of Chayes and Machta that relates the non-uniqueness of Gibbs states to positive magnetisation (the existence of states in which the expected magnetisation of a particle is non-zero). To prove T_p \leq T_c (i.e. percolation implies multiple Gibbs states), the main tool is a theorem of Gandolfi, Keane, and Russo, who studied percolation on the infinite lattice and showed that under certain conditions (in particular, that a version of the FKG inequality is satisfied), there can be at most one infinite cluster; basically, one can use the colour of this cluster (which will exist if percolation occurs) to distinguish between different Gibbs states. (The fractal structure of this infinite cluster, especially near the critical temperature, is quite interesting, but that’s a whole other story.) One of the main tasks in the Chayes-McKellar-Winn paper is to verify the FKG inequality for the Ashkin-Teller model; this is done by viewing that model as a perturbation of the Ising model, and expanding the former using the random clusters of the latter.
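
As a small aside, the simplest instance of the FKG phenomenon (Harris’s inequality, for independent site variables) can even be checked numerically; the following sketch (with two arbitrarily chosen monotone events on a small grid of independent spins, purely for illustration) estimates P(A \cap B) and compares it with P(A) P(B), and the former comes out visibly larger.

    import numpy as np

    rng = np.random.default_rng(2)

    # Independent fair +1/-1 spins on a 5 x 5 grid.  Both events below are
    # monotone increasing in the spins, so by Harris's inequality (the product-
    # measure case of FKG) they are positively correlated.
    trials = 20000
    count_A = count_B = count_AB = 0
    for _ in range(trials):
        spins = rng.choice([-1, 1], size=(5, 5))
        A = spins[0, :].sum() > 0        # top row has a majority of up-spins
        B = spins.sum() > 0              # whole grid has a majority of up-spins
        count_A += A
        count_B += B
        count_AB += A and B

    print(count_AB / trials, (count_A / trials) * (count_B / trials))
    # The first number (an estimate of P(A and B)) exceeds the second (P(A) * P(B)).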

— Executive summary —

When one heats an iron bar magnet above a certain special temperature – the Curie temperature – the iron bar will cease to be magnetised; when one cools the bar again below this temperature, the bar can once again spontaneously magnetise in the presence of an external magnetic field. This phenomenon is still not perfectly understood; for instance, it is difficult to predict the Curie temperature precisely from the fundamental laws of physics, although one can at least prove that this temperature exists. However, Chayes, McKellar, and Winn have shown that for a certain simplified model for magnetism (known as the Ashkin-Teller model), the Curie temperature is equal to the critical temperature below which percolation can occur; this means that even when the bar is unmagnetised, enough of the iron atoms in the bar spin in the same direction that they can create a connected path from one end of the bar to another. Percolation in the Ashkin-Teller model is not fully understood either, but it is a simpler phenomenon to deal with than spontaneous magnetisation, and so this result represents an advance in our understanding of how the latter phenomenon works.

See also this explanation by John Baez.