I’ve spent the last week or so reworking the first draft of my universality article for Mathematics Awareness Month, in view of the useful comments and feedback received on that draft here on this blog, as well as elsewhere. In fact, I ended up rewriting the article from scratch, and expanding it substantially, in order to focus on a more engaging and less technical narrative. I found that I had to use a substantially different mindset than the one I am used to having for technical expository writing; indeed, the exercise reminded me more of my high school English assignments than of my professional work. (This is perhaps a bad sign: English was not exactly my strongest subject as a student.)
The piece now has the title “E pluribus unum: from complexity, universality”. This is a somewhat US-centric piece of wordplay, but Mathematics Awareness Month is, after all, a US-based initiative, even though awareness of mathematics certainly transcends national boundaries. Still, it is a trivial matter to modify the title later if a better proposal arises, and I am sure that if I send this text to be published, the editors may have some suggestions in this regard.
By coincidence, I moved up and expanded the other US-centric item – the discussion of the 2008 US presidential elections – to the front of the paper to play the role of the hook. I’ll try to keep the Commonwealth spelling conventions, though. :-)
I decided to cut out the discussion of the N-body problem for various values of N, in part due to the confusion over the notion of a “solution”; there is a nice mathematical story there, but perhaps one that gets in the way of the main story of universality.
I have added a fair number of relevant images, though some of them will have to be changed in the final version for copyright reasons. The narrow column format of this blog means that the image placement is not optimal, but I am sure that this can be rectified if this article is published professionally.
– E pluribus unum: from complexity, universality –
A brief tour of the mysteriously universal laws of mathematics and nature.
1. Prologue: the 2008 US presidential election and the law of large numbers.
Do I contradict myself?
Very well then I contradict myself,
(I am large, I contain multitudes.)
The US presidential elections of November 4, 2008 were a massively complicated affair. Over a hundred million voters from fifty states cast their ballot, with each voter’s decision being influenced in countless different ways by the campaign rhetoric, the media coverage, rumors, personal impressions of the candidates, or from discussing politics with friends and colleagues. There were millions of “swing” voters that were not firmly supporting either of the two major candidates; their final decision would be unpredictable, and perhaps even random in some cases. There was the same uncertainty at the state level: while many states were considered safe for one candidate or the other, at least a dozen states were considered “in play”, and could conceivably have gone either way.
In such a situation, it would seem impossible to be able to accurately forecast the election outcome in advance. Sure, there were electoral polls – hundreds of them – but each poll only surveyed a few hundred or a few thousand likely voters, which is only a tiny fraction of the entire population. And the polls often fluctuated wildly and disagreed with each other; not all polls were equally reliable or unbiased, and no two polling organisations used exactly the same methodology.
Nevertheless, well before election night was over, the polls had predicted the outcome of the presidential election, and most of the other elections taking place that night, quite accurately. Perhaps the most spectacular instance of this was the forecast of the statistician Nate Silver, who used a weighted analysis of all existing polls to correctly predict the outcome of the presidential election in 49 out of 50 states, as well as of all 35 of the 35 US Senate races. (The lone exception was the presidential election in Indiana, which Silver called narrowly for McCain, but which eventually favoured Obama by just 0.9%.)
The theoretical basis for polling is a mathematical law known as the law of large numbers. Roughly speaking, this law asserts that if one is making a set of samples via some random method, then as one makes the sample size larger and larger, the average outcome of those samples will almost always converge to a single number, known as the expected outcome of that random method. For instance, if one flips a fair coin a thousand times, then one can use this law to show that the proportion of heads one gets from doing so will usually be quite close to the expected value of 50%; indeed, it will be within 3% of 50% (i.e. between 470 and 530 heads out of 1000) about 95% of the time.
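The coin-flip figure quoted above can be checked exactly, without any simulation. The following is a minimal sketch in Python; the function name `prob_heads_between` is purely illustrative. It sums the binomial probabilities for getting between 470 and 530 heads in 1000 fair flips.

```python
from math import comb

def prob_heads_between(flips, low, high):
    """Exact probability that a fair coin shows between low and high heads
    (inclusive) in the given number of flips."""
    favourable = sum(comb(flips, k) for k in range(low, high + 1))
    return favourable / 2 ** flips  # exact integer ratio, rounded to a float

if __name__ == "__main__":
    print(prob_heads_between(1000, 470, 530))
```

The printed value should be close to 95%. Increasing `flips` while keeping the window at 3% either side of one half pushes the probability closer to 1, which is the law of large numbers at work.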
In a very similar vein, if one selects a thousand voters at random and in an unbiased fashion (so that each voter is equally likely to be selected by the poll), and finds out who they would vote for (out of two choices, such as Obama and McCain; assume for simplicity that third-party votes are negligible), then the outcome of this poll has a margin of error of about 3% with a 95% confidence level, which means that 95% of the time that such a poll is conducted, the result of the poll will be within 3% of the true result of the election.
One of the remarkable things about the law of large numbers is that it is universal. Does the election involve a hundred thousand voters, or a hundred million voters? It doesn’t matter – the margin of error for the poll will still be 3%. Is it a state that favours McCain to Obama 55% to 45%, or Obama to McCain 60% to 40%? Again, it doesn’t matter – the margin of error for the poll will still be 3%. Is the state a homogeneous bloc of (say) affluent white urban voters, or is the state instead a mix of voters of all incomes, races, and backgrounds? It still doesn’t matter – the margin of error for the poll will still be 3%. And so on and so forth. The only factor which really makes a significant difference[1] is the size of the poll; the larger the poll, the smaller the margin of error.
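To see how little the margin of error depends on anything other than the poll size, one can use the textbook normal-approximation formula. The sketch below assumes a simple random sample, a 95% confidence multiplier of 1.96, and the standard finite-population correction; the function and parameter names are illustrative only.

```python
from math import sqrt

Z_95 = 1.96  # normal-approximation multiplier for a 95% confidence level

def margin_of_error(sample_size, share=0.5, population=None):
    """Approximate 95% margin of error for a simple random sample.

    share is the true proportion favouring one candidate; population, if
    given, applies the standard finite-population correction.
    """
    moe = Z_95 * sqrt(share * (1 - share) / sample_size)
    if population is not None:
        moe *= sqrt((population - sample_size) / (population - 1))
    return moe

if __name__ == "__main__":
    for population in (100_000, 100_000_000):
        print(population, round(100 * margin_of_error(1000, 0.5, population), 2))
    for share in (0.45, 0.60):
        print(share, round(100 * margin_of_error(1000, share), 2))
```

Under these assumptions, the population size and the underlying split between the candidates shift the answer only in the second decimal place, whereas quadrupling `sample_size` halves the margin of error.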
In 2008, reliably accurate meta-polls were still something of a novelty. But they seem here to stay, particularly in high-profile elections in which hundreds of polls are conducted; expect to see more from them in the future.
2. Approaching normal: bell curves and other universal laws.
I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the “Law of Frequency of Error”. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along. (Sir Francis Galton, Natural Inheritance, 1889, describing what is now known as the central limit theorem.)
The law of large numbers is one of the simplest and best understood of the universal laws in mathematics and nature, but it is by no means the only one. Over the decades, many such universal laws have been found, that govern the behaviour of wide classes of complex systems, regardless of what the components of that system are, or even how they interact with each other. In this article we will have a quick tour of some of these remarkable laws.
After the law of large numbers, perhaps the next most fundamental example of a universal law is the central limit theorem. Roughly speaking, this theorem asserts that if one takes a statistic that is a combination of many independent and randomly fluctuating components, with no one component having a decisive influence on the whole, then that statistic will be approximately distributed according to a law called the normal distribution (or Gaussian distribution), and more popularly known as the bell curve; some examples of this mathematical curve already appeared above in Figure 3. The law is universal because it holds regardless of exactly how the individual components fluctuate, or how many components there are (although the accuracy of the law improves when the number of components increases); it can be seen[2] in a staggeringly diverse range of statistics, from the incidence rate of accidents, to the variation of height, weight, or other vital statistics amongst a species, to the financial gains or losses caused by chance, to the velocities of the component particles of a physical system. The size, width, location, and even the units of measurement of the distribution vary from statistic to statistic, but the bell curve shape can be discerned in all cases. This convergence arises not because of any “low-level” or “microscopic” connection between such diverse phenomena as car crashes, human height, trading profits, or stellar velocities, but because in all of these cases the “high-level” or “macroscopic” structure is the same, namely a compound statistic formed from a combination of the small influences of many independent factors. This is the essence of universality: the macroscopic behaviour of a large, complex system can be almost totally independent of its microscopic structure.
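A small simulation can make this universality tangible. The sketch below is an illustration only: the three component distributions, the choice of 50 components, and the function name are all assumptions made for the example. It sums many independent components of very different types and measures how often the sum lands within one standard deviation of its mean, which for a bell curve happens about 68% of the time.

```python
import random
from math import sqrt

def fraction_within_one_sd(draw, mean, variance, components=50, trials=5000, seed=1):
    """Sum `components` independent draws, repeat `trials` times, and return
    the fraction of sums lying within one standard deviation of the mean."""
    rng = random.Random(seed)
    total_mean = components * mean
    total_sd = sqrt(components * variance)
    hits = 0
    for _ in range(trials):
        total = sum(draw(rng) for _ in range(components))
        if abs(total - total_mean) <= total_sd:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # Three very different "microscopic" behaviours for a single component.
    uniform = fraction_within_one_sd(lambda r: r.random(), 0.5, 1 / 12)
    coin = fraction_within_one_sd(lambda r: r.choice((0, 1)), 0.5, 0.25)
    waiting = fraction_within_one_sd(lambda r: r.expovariate(1.0), 1.0, 1.0)
    print(uniform, coin, waiting)  # a bell curve puts about 68% within one sd
```

The three fractions should come out close to one another even though a uniform number, a coin flip, and an exponential waiting time look nothing alike individually.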
The universal nature of the central limit theorem is tremendously useful in many industries, allowing them to manage what would otherwise be an intractably complex and chaotic system. With this theorem, insurers can manage the risk of, say, their car insurance policies, without having to know all the complicated details of how car crashes actually occur; astronomers can measure the size and location of distant galaxies, without having to solve the complicated equations of celestial mechanics; electrical engineers can predict the effect of noise and interference on electronic communications, without having to know exactly how this noise was generated; and so forth. It is important to note, though, that the central limit theorem is not completely universal; there are important cases when the theorem does not apply, giving statistics with a distribution quite different from the bell curve. I’ll return to this point later.
There are some other distant cousins of the central limit theorem that are universal laws for slightly different types of statistics. For instance, Benford’s law is a universal law for the first few digits of a large statistic, such as the population of a country or the size of an account; it gives a number of counterintuitive predictions, for instance that any given statistic occurring in nature is more than six times as likely to start with the digit 1, than with the digit 9. Among other things, this law (which can be explained by combining the central limit theorem with the mathematical theory of logarithms) has been used to detect accounting fraud, since numbers that are simply made up, as opposed to arising naturally in nature, often do not obey this law.
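The "more than six times" figure follows from the usual statement of Benford's law, in which the leading digit d occurs with probability log10(1 + 1/d). The sketch below computes that ratio and, as an illustrative test case chosen for this example (not taken from the text), compares the law against the leading digits of the first thousand powers of 2.

```python
from math import log10

def benford_probability(digit):
    """Benford's law: probability that the leading digit equals `digit` (1-9)."""
    return log10(1 + 1 / digit)

def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

if __name__ == "__main__":
    print(benford_probability(1) / benford_probability(9))
    # Compare with the leading digits of the first thousand powers of 2,
    # a sequence that spans many orders of magnitude.
    counts = [0] * 10
    for exponent in range(1, 1001):
        counts[leading_digit(2 ** exponent)] += 1
    for digit in range(1, 10):
        print(digit, counts[digit] / 1000, round(benford_probability(digit), 3))
```

Data spread over only one or two orders of magnitude, such as adult heights in centimetres, would not be expected to follow this pattern.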
In a similar vein, Zipf’s law is a universal law that governs the largest statistics in a given category, such as the largest country populations in the world, or the most frequent words in the English language. It asserts that the size of a statistic is usually inversely proportional to its ranking; thus for instance the tenth largest statistic should be about half the size of the fifth largest statistic. (The law tends not to work so well for the top two or three statistics, but becomes more accurate after that.) Unlike the central limit theorem and Benford’s law, this law is primarily an empirical law; it is observed in practice, but mathematicians still do not have a fully satisfactory and convincing explanation for how the law comes about, and why it is so universal.
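In its idealised form, the law says that the item of rank r has size proportional to 1/r. A minimal sketch, with a hypothetical top value chosen purely for illustration:

```python
def zipf_size(rank, largest):
    """Idealised Zipf's law: size is inversely proportional to rank."""
    return largest / rank

if __name__ == "__main__":
    largest = 1000.0  # hypothetical size of the top-ranked item
    for rank in (1, 2, 5, 10):
        print(rank, zipf_size(rank, largest))
    print(zipf_size(10, largest) / zipf_size(5, largest))  # ratio of tenth to fifth
```

Whatever value is chosen for `largest`, the ratio of the tenth item to the fifth is one half, matching the example in the text; real data follow this only approximately.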
3. At the threshold: the universality of phase transitions
There is nothing so stable as change. (Bob Dylan, 1963. From “No Direction Home”, by Robert Shelton)
We’ve been discussing universal laws for individual statistics: complex numerical quantities that arise as the combination of many smaller and independent factors. But universal laws have also been found for more complicated objects than mere numerical statistics. One example of this is the laws governing the complicated shapes and structures that arise from phase transitions in physics and chemistry.
As we learn in high school science classes, matter comes in various states, including the three classic states of solid, liquid, and gas, but also a number of exotic states such as plasmas or superfluids. Ferromagnetic materials, such as iron, also have magnetised and non-magnetised states; other materials become electrical conductors at some temperatures and insulators at others. What state a given material is in depends on a number of factors, most notably the temperature and, in some cases, the pressure. (For some materials, the level of impurities is also relevant.) For a fixed value of the pressure, most materials tend to be in one state (e.g. a solid) for one range of temperatures, and in another state (e.g. a liquid) for another range. But when the material is at or very close to the temperature dividing these two ranges, interesting phase transitions occur, in which the material is not fully in one state or the other, but tends[3] to split up into beautifully fractal shapes known as clusters, each of which embodies one or the other of the two states.
There are countless different materials in existence, each of which has a different set of key parameters (such as the boiling point at a given pressure). There are also a large number of different mathematical models that physicists and chemists use to model these materials and their phase transitions, in which individual atoms or molecules are assumed to be connected to some of their neighbours by a random number of bonds, assigned according to some probabilistic rule. At the microscopic level, these models can look quite different from each other. For instance, the figures below display the small-scale structure of two typical such models: a site percolation model on a hexagonal lattice, in which each hexagon (or site) is an abstraction of an atom or molecule randomly placed in one of two states, with the clusters being the connected regions of a single colour; and a bond percolation model on a square lattice, in which the edges of the lattice are abstractions of molecular bonds that each have some probability of being activated, with the clusters being the connected regions given by the active bonds.
If, however, one zooms out to a more macroscopic scale, and looks at the large-scale structure of clusters when one is at or near the critical value of parameters such as temperature, the differences in microscopic structure fade away, and one begins to see a number of universal laws emerging. While the clusters have a random size and shape, they almost always have a fractal structure, which roughly speaking means that if one zooms in a little on any portion of the cluster, the resulting image resembles the cluster as a whole. Basic statistics such as the number of clusters, the average size of the clusters, or how often a cluster connects two given regions of space, appear to obey some specific universal laws, known as power laws (which are somewhat similar to, though not quite the same as, Zipf’s law, which was mentioned earlier). These laws seem to arise in almost every mathematical model that has been put forward to explain (continuous) phase transitions, and have also been observed many times in nature. As with other universal laws, the precise microscopic structure of the model or the material may affect some basic parameters, such as the phase transition temperature, but the underlying structure of the law is the same across all such models and materials.
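The bond percolation model on a square lattice is simple enough to experiment with directly. The following is a rough sketch, not a research-grade simulation: it activates each bond independently with probability `p`, groups sites into clusters with a union-find structure, and reports what fraction of the lattice the largest cluster occupies. The function name, lattice size, and seed are illustrative choices.

```python
import random

def largest_cluster_fraction(n, p, seed=0):
    """Bond percolation on an n-by-n square lattice: open each edge with
    probability p and return the fraction of sites in the largest cluster."""
    rng = random.Random(seed)
    parent = list(range(n * n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    for row in range(n):
        for col in range(n):
            site = row * n + col
            if col + 1 < n and rng.random() < p:
                union(site, site + 1)   # bond to the right-hand neighbour
            if row + 1 < n and rng.random() < p:
                union(site, site + n)   # bond to the neighbour below

    sizes = {}
    for i in range(n * n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / (n * n)

if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        print(p, round(largest_cluster_fraction(100, p), 3))
```

Well below the critical probability (which for this particular lattice is known to be one half) the largest cluster is a negligible speck, and well above it a single cluster swallows most of the lattice; the interesting fractal behaviour lives in the narrow window in between.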
In contrast to more classical universal laws such as the central limit theorem, our understanding of the universal laws of phase transition is still incomplete. Physicists have put forward some compelling heuristic arguments that explain or support many of these laws (based on a powerful, but not fully rigorous, tool known as the renormalisation group method), but a completely rigorous proof of these laws has not yet been obtained in all cases. This is very much a current area of research; for instance, in August of 2010 a Fields Medal (one of the most prestigious prizes in mathematics) was awarded to Stanislav Smirnov for his breakthroughs in rigorously establishing the validity of these universal laws for some key models (such as percolation models on a triangular lattice).
4. Nuclear resonance and the music of the primes: the universality of spectra
Even without Hollywood hyperbole, however, the chance encounter of Montgomery and Dyson was a genuinely dramatic moment. Their conversation revealed an unsuspected connection between areas of mathematics and physics that had seemed remote. Why should the same equation describe both the structure of an atomic nucleus and a sequence at the heart of number theory? And what do random matrices have to do with either of those realms? In recent years, the plot has thickened further, as random matrices have turned up in other unlikely places, such as games of solitaire, one-dimensional gases and chaotic quantum systems. Is it all just a cosmic coincidence, or is there something going on behind the scenes? (Brian Hayes, “The spectrum of Riemannium”, American Scientist, 2003).
We are nearing the end of our tour of universal laws, and I’ll now turn to another example of this phenomenon which is closer to my own area of research. Here, the object of study is not a single numerical statistic (as was the case of the central limit theorem) or a shape (as was the case for phase transitions), but a discrete spectrum: a sequence of points (or numbers, or frequencies, or energy levels) spread out along a line.
Perhaps the most familiar example of a discrete spectrum is the radio frequencies emitted by local radio stations; this is a sequence of frequencies in the radio portion of the electromagnetic spectrum, which one can of course access by turning a radio dial. These frequencies are not evenly spaced, but usually some effort is made to keep any two station frequencies separated from each other, to reduce interference.
Another familiar example of a discrete spectrum is the spectral lines of an atomic element that come from the frequencies that the electrons in the atomic shells can absorb and emit, according to the laws of quantum mechanics. When these frequencies lie in the visible portion of the electromagnetic spectrum, they give individual elements their distinctive colour, from the blue light of argon gas (which, confusingly, is often used in neon lamps, as pure neon emits orange-red light) to the yellow light of sodium. For simple elements, such as hydrogen, the equations of quantum mechanics can be solved relatively easily, and the spectral lines follow a regular pattern; but for heavier elements, the spectral lines become quite complicated, and not easy to work out just from first principles.
These resonances have an interesting distribution; they are not independent of each other, but instead seem to obey a precise repulsion law that makes it quite unlikely that two adjacent resonances are too close to each other, somewhat in analogy to how radio station frequencies tend to avoid being too close together, except that the former phenomenon arises from the laws of nature rather than from government regulation of the spectrum. In the 1950s, the renowned physicist and Nobel laureate Eugene Wigner investigated these resonance statistics and proposed a remarkable mathematical model to explain them, an example of what we now call a random matrix model. The precise mathematical details of these models are too technical to describe here, but roughly speaking one can view such models as a large collection of masses, all connected to each other by springs of various randomly selected strengths. Such a mechanical system will oscillate (or resonate) at a certain set of frequencies; and the Wigner hypothesis asserts that the resonances of a large atomic nucleus should resemble the resonances of such a random matrix model. In particular, they should experience the same repulsion phenomenon. Since it is possible to rigorously prove repulsion of the frequencies of a random matrix model, this gives a heuristic explanation for the same phenomenon that is experimentally observed for nuclei.
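The repulsion phenomenon already shows up in the smallest possible toy case. The sketch below is only an illustration, not Wigner's actual model: it uses random symmetric 2x2 matrices with Gaussian entries (an assumption made here because the eigenvalue gap then has a closed form) and compares how often the two eigenvalues nearly collide with how often neighbouring points nearly collide when points are scattered independently along a line.

```python
import random
from math import hypot

def matrix_gaps(samples, rng):
    """Eigenvalue gaps of random symmetric 2x2 matrices [[a, b], [b, c]]
    with independent Gaussian entries (off-diagonal variance 1/2)."""
    gaps = []
    for _ in range(samples):
        a = rng.gauss(0, 1)
        c = rng.gauss(0, 1)
        b = rng.gauss(0, 0.5 ** 0.5)
        gaps.append(hypot(a - c, 2 * b))  # closed form for the 2x2 gap
    return gaps

def independent_gaps(samples, rng):
    """Gaps between consecutive points dropped independently on a line
    (a Poisson process), which are exponentially distributed."""
    return [rng.expovariate(1.0) for _ in range(samples)]

def fraction_of_small_gaps(gaps, threshold=0.1):
    """Fraction of gaps smaller than `threshold` times the average gap."""
    mean_gap = sum(gaps) / len(gaps)
    return sum(g < threshold * mean_gap for g in gaps) / len(gaps)

if __name__ == "__main__":
    rng = random.Random(0)
    print(fraction_of_small_gaps(matrix_gaps(20000, rng)))
    print(fraction_of_small_gaps(independent_gaps(20000, rng)))
```

Near-collisions should turn out to be roughly ten times rarer for the matrix eigenvalues than for the independent points. The gap vanishes only when both `a - c` and `b` vanish at once, which is what makes small gaps so unlikely.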
Now, of course, an atomic nucleus does not actually resemble a large system of masses and springs (among other things, it is governed by the laws of quantum mechanics rather than of classical mechanics). Instead, as we have since discovered, Wigner’s hypothesis is a manifestation of a universal law that governs many types of spectral lines, including those that ostensibly have little in common with atomic nuclei or random matrix models. For instance, the same spacing distribution has been found in the waiting times between buses arriving at a bus stop.
The discovery of the GUE hypothesis, connecting the music of the primes and the energy levels of nuclei, occurred at the Institute for Advanced Study in 1972, and the story is legendary in mathematical circles. It concerns a chance meeting between the mathematician Hugh Montgomery, who had been working on the distribution of zeroes of the zeta function (and more specifically, on a certain statistic relating to that distribution known as the pair correlation function), and the renowned physicist Freeman Dyson. To tell the story, I will quote from Dan Rockmore’s book “Stalking the Riemann hypothesis”:
As Dyson recalls it, he and Montgomery had crossed paths from time to time at the Institute nursery[4] when picking up and dropping off their children. Nevertheless, they had not been formally introduced. In spite of Dyson’s fame, Montgomery hadn’t seen any purpose in meeting him. “What will we talk about?” is what Montgomery purportedly said when brought to tea. Nevertheless, Montgomery relented and upon being introduced, the amiable physicist asked the young number theorist about his work. Montgomery began to explain his recent results on the pair correlation, and Dyson stopped him short – “Did you get this?” he asked, writing down a particular mathematical formula. Montgomery almost fell over in surprise: Dyson had written down the sinc-infused pair correlation function. … Whereas Montgomery had traveled a number theorist’s road to a “prime picture” of the pair correlation, Dyson had arrived at this formula through the study of these energy levels in the mathematics of matrices…
The chance discovery by Montgomery and Dyson that the same universal law that governs random matrices and atomic spectra, also applies to the zeta function, was given substantial numerical support by the computational work of Andrew Odlyzko from the 1980s onwards (see Figure 22). But this does not mean that the primes are somehow nuclear-powered, or that atomic physics is somehow driven by the prime numbers; instead, it is evidence that a single law for spectra is so universal that it is the natural end product of any number of different processes, whether it comes from nuclear physics, random matrix models, or number theory.
The precise mechanism underlying this law has not yet been fully unearthed; in particular, we still do not have a compelling explanation, let alone a rigorous proof, of why the zeroes of the zeta function are subject to the GUE hypothesis. However, there is now a substantial body of rigorous work (including some by myself, and including some substantial breakthroughs in just the last one or two years) that gives support to the universality of this hypothesis, by showing that a wide variety of random matrix models (not just the most famous model of the Gaussian Unitary Ensemble) are all governed by essentially the same law for their spacings. At present, these demonstrations of universality have not extended to the number theoretic or physical settings, but they do give indirect support to the law being applicable in those cases.
The arguments used in this recent work are too technical to give here, but I will just mention one of the key ideas, which my co-author, Van Vu, and I borrowed from an old proof of the central limit theorem by Jarl Lindeberg in 1922. In terms of the mechanical analogy of a system of masses and springs mentioned earlier, the key strategy was to replace just one of the springs by another, randomly selected, spring, and show that the distribution of the frequencies of this system did not change significantly when doing so. Applying this replacement operation to each spring in turn, one can eventually replace a given random matrix model with a completely different model while keeping the distribution mostly unchanged, and this can be used to show that large classes of random matrix models have essentially the same distribution.
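The replacement strategy is easiest to see in Lindeberg's original setting of the central limit theorem, rather than in the random matrix setting, which is too involved for a short example. The sketch below is such an illustration under stated assumptions: it starts with a normalised sum of 100 fair plus-or-minus-one coin flips, swaps the terms for standard Gaussians one at a time, and tracks a smooth test statistic (the expected cosine of the sum) exactly. The function name and the choice of statistic are hypothetical.

```python
from math import cos, exp, sqrt

def hybrid_statistic(n, swapped, t=1.0):
    """E[cos(t * S / sqrt(n))] where S is a sum of n independent terms:
    `swapped` standard Gaussians and n - swapped fair +/-1 coin flips.

    Both kinds of term are symmetric, so the expectation factorises into
    a product of the individual characteristic functions.
    """
    gaussian_part = exp(-t * t * swapped / (2 * n))
    coin_part = cos(t / sqrt(n)) ** (n - swapped)
    return gaussian_part * coin_part

if __name__ == "__main__":
    n = 100
    values = [hybrid_statistic(n, k) for k in range(n + 1)]
    largest_step = max(abs(values[k + 1] - values[k]) for k in range(n))
    print(values[0], values[-1])   # all coin flips versus all Gaussians
    print(largest_step)            # effect of swapping a single term
```

Each individual swap changes the statistic by a tiny amount, so that even after all 100 swaps the total drift remains small; the fully swapped sum is exactly Gaussian, which is how the argument transfers the bell curve to the original sum.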
This is currently a very active area of research; for instance, simultaneously with my work with Van Vu last year, Laszlo Erdos, Benjamin Schlein, and Horng-Tzer Yau have also given a number of other demonstrations of universality for random matrix models, based on ideas from mathematical physics. The field is moving quickly, and in a few years we may have many more insights as to the nature of this still-mysterious universal law.
5. When universality fails: the complex middle ground
There are many, many other universal laws of mathematics and nature out there; the examples given above are only a small fraction of those that have been discovered over the years, from such diverse subjects as dynamical systems and quantum field theory. For instance, many of the “macroscopic” laws of physics, such as the laws of thermodynamics or the equations of fluid motion, are quite universal in nature, with the microscopic structure of the material or fluid being studied being almost completely irrelevant, other than via some key parameters such as viscosity, compressibility, or entropy.
However, the principle of universality does have definite limitations. Take for instance the central limit theorem, which gives a bell curve distribution to any quantity that arises from a combination of many small and independent factors. This theorem can fail when the required hypotheses are not met. For instance, the distribution of the heights of all human adults (male and female) does not obey a bell curve distribution, because one single factor – gender – makes so much of an impact on height that it is not averaged out by all the other environmental and genetic factors that influence this statistic.
Another very important way in which the central limit theorem fails is when the individual factors that make up a quantity do not fluctuate independently of each other, but are instead correlated, so that they tend to rise or fall in unison. In such cases, “fat tails” or “black swans” can develop, in which the quantity moves much further from its average value than the central limit theorem would predict. This phenomenon is particularly important in financial modeling, especially when dealing with complex financial instruments such as the collateralised debt obligations (CDOs) that were formed by aggregating several mortgages together. As long as the mortgages behaved independently of each other, the central limit theorem could be used to model the risk of these instruments; but in the recent financial crisis, this independence hypothesis broke down spectacularly, leading to significant financial losses for many holders of these obligations (and for their insurers). A mathematical model is only as strong as the assumptions behind it.
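The effect of correlation on the tails can be seen in a deliberately crude toy model, sketched below. All the numbers are hypothetical and chosen only for illustration: a pool of 100 loans, each with a 5% average chance of default, either defaulting independently or sharing a common "downturn" factor that raises every loan's risk at once.

```python
import random

def tail_probability(default_chance, loans=100, threshold=15, trials=5000, seed=2):
    """Estimate the chance that more than `threshold` of `loans` default.

    `default_chance(rng)` returns the per-loan default probability for one
    scenario; within a scenario the loans then default independently.
    """
    rng = random.Random(seed)
    bad_outcomes = 0
    for _ in range(trials):
        p = default_chance(rng)
        defaults = sum(rng.random() < p for _ in range(loans))
        if defaults > threshold:
            bad_outcomes += 1
    return bad_outcomes / trials

def independent(rng):
    return 0.05  # every loan defaults with the same fixed 5% chance

def shared_shock(rng):
    # Hypothetical common factor: one scenario in ten is a downturn that
    # raises every loan's default chance at once (the average is still 5%).
    return 0.32 if rng.random() < 0.1 else 0.02

if __name__ == "__main__":
    print(tail_probability(independent))
    print(tail_probability(shared_shock))
```

Both pools have the same average default rate, yet heavy losses are vanishingly rare in the independent pool and occur in roughly one scenario in ten in the correlated one. A risk model calibrated on the first assumption would badly underestimate the second.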
Another instance where the universal laws of fluid mechanics break down is at the mesoscopic scale – larger than the microscopic scale of individual molecules, but smaller than the macroscopic scales for which universality applies. An important example of a mesoscopic fluid is the blood flowing through blood vessels; the blood cells that make up this liquid are so large that they cannot be treated merely as an ensemble of microscopic molecules, but must instead be treated as mesoscopic agents with complex behaviour. Other examples of materials with interesting mesoscopic behaviour include colloidal fluids (such as mud), and certain types of nanomaterials; it is a continuing challenge to mathematically model such materials properly.
There are also many macroscopic situations in which no universal law is known to exist, particularly in cases where the system contains human agents. The stock market is a good example: despite extremely intensive efforts, no satisfactory universal laws to describe the movement of stock prices have been discovered (the central limit theorem, for instance, does not seem to be a good model, as discussed earlier). One reason for this, of course, is that any regularity in the market that is discovered is likely to be exploited by arbitrageurs until it disappears. For similar reasons, finding universal laws for macroeconomics appears to be a moving target; according to Goodhart’s law, if an observed statistical regularity in economic data is exploited for policy purposes, it tends to collapse. (Ironically, Goodhart’s law is itself arguably an example of a universal law…)
Even when universal laws do exist, it may still be practically impossible to use them to predict what happens next. For instance, we have universal laws for the motion of fluids, such as the Navier-Stokes equations, and these are certainly used all the time in such tasks as weather prediction, but these equations are so complex and unstable that even with the most powerful computers, we are still unable to accurately predict the weather more than a week or two into the future.
In conclusion, we see that between the vast, macroscopic systems for which universal laws hold sway, and the simple systems that can be analysed using the fundamental laws of nature, there is a substantial middle ground of systems that are too complex for fundamental analysis, but too simple to be universal. Plenty of room, in short, for all the complexities of life as we know it.
6. Further reading
- Percy Deift, “Universality for physical and dynamical systems”, International Congress of Mathematicians, Vol. I, 125–152, Eur. Math. Soc., Zürich, 2007.
- Brian Hayes, “The spectrum of Riemannium”, American Scientist, 2003.
- Dan Rockmore, “Stalking the Riemann hypothesis”, Pantheon Books, New York, 2005.
- Terence Tao, “Small samples and the margin of error”, blog post, 2008.
- Terence Tao, “Benford’s law, Zipf’s law, and the Pareto distribution”, blog post, 2009.
[1] Actually, this is an oversimplification: some of the factors mentioned in the text do make a slight difference to the margin of error. For instance, the margin of error when there are a hundred million voters is actually closer to 3.1%, while for a hundred thousand voters it turns out to be about 3.08%. More significantly, the idealised assumptions applied here do not hold in real-world polling. Not every voter is equally likely to be polled; for instance, in a telephone poll, voters with telephones clearly have a greater chance of being polled than voters without telephones. Not every person polled will eventually vote. Not every person polled will actually answer the pollster’s questions correctly, or at all. And, if a poll organisation is partisan, it may give favorable polls more publicity than unfavorable polls. All of these factors can widen the margin of error, or bias the result one way or another, although this can often be compensated for by using the correct methodology. On the other hand, the accuracy and bias of a polling organisation can be discerned through the long term record of that organisation’s polls, by yet another application of the law of large numbers. Once one has this information, one can combine the most recent polls from several organisations via a weighted average into what is effectively a single large poll, which in principle has a smaller margin of error and less bias than any individual poll. These meta-analyses of polls, which exploited the universal law of large numbers multiple times, ended up giving among the most accurate predictions of the election. (In the case of Nate Silver’s predictions, this method was also combined with a demographic analysis, for instance using polls from counties with similar demographic features to calibrate each other by a statistical technique known as regression, to increase the accuracy even further.)
[2] Again, this is an oversimplification; when dealing with statistics that cannot be negative numbers (such as number of car crashes, or human height, but not the profit or loss from a day of trading), then one often has to replace the normal distribution by a slight variant known as the log-normal distribution; when dealing with statistics of a multi-dimensional nature, such as a three-dimensional velocity, then another variant known as the chi-squared distribution may be needed; and so forth.
[3] Strictly speaking, this clustering behaviour only occurs for a certain subclass of phase transitions known as continuous or second-order phase transitions, which do not involve the additional complication of latent heat.
[4] The role of child care in mathematical research should not be underestimated. One of my own collaborations, with Emmanuel Candés in the subject of compressed sensing, took place in no small part in the preschool where we dropped off our respective children.