You are currently browsing the category archive for the ‘math.MP’ category.

This week I am at the American Institute of Mathematics, as an organiser on a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula

$\displaystyle \hbox{det}( 1 + AB ) = \hbox{det}(1 + BA)$

whenever ${A, B}$ are ${n \times k}$ and ${k \times n}$ matrices respectively (or more generally, ${A}$ and ${B}$ could be linear operators with sufficiently good spectral properties that make both sides equal). Note that the left-hand side is an ${n \times n}$ determinant, while the right-hand side is a ${k \times k}$ determinant; this formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of ${n \times n}$ determinants as ${n \rightarrow \infty}$ can be converted via this formula to determinants of a fixed size (independent of ${n}$), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.

There are many ways to prove the identity. One is to observe first that when ${A, B}$ are invertible square matrices of the same size, that ${1+BA}$ and ${1+AB}$ are conjugate to each other and thus clearly have the same determinant; a density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument then extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that ${AB}$ and ${BA}$ have the same non-zero eigenvalues.

By rescaling, one obtains the variant identity

$\displaystyle \hbox{det}( z + AB ) = z^{n-k} \hbox{det}(z + BA)$

which essentially relates the characteristic polynomial of ${AB}$ with that of ${BA}$. When ${n=k}$, a comparison of coefficients this already gives important basic identities such as ${\hbox{tr}(AB) = \hbox{tr}(BA)}$ and ${\hbox{det}(AB) = \hbox{det}(BA)}$; when ${n}$ is not equal to ${k}$, an inspection of the ${z^{n-k}}$ coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).

Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had in the back of my mind for a few months, and initially posed to me by Larry Abbott; I hope to talk more about this in a future post.

Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues ${\lambda_1 \leq \ldots \leq \lambda_n}$ of the ${n \times n}$ Gaussian Unitary Ensemble (GUE), where ${n}$ is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginebre distribution

$\displaystyle \frac{1}{Z_n} e^{-H(\lambda)}\ d\lambda$

where ${d\lambda = d\lambda_1 \ldots d\lambda_n}$ is Lebesgue measure on the Weyl chamber ${\{ (\lambda_1,\ldots,\lambda_n) \in {\bf R}^n: \lambda_1 \leq \ldots \leq \lambda_n \}}$, ${Z_n}$ is a constant, and the Hamiltonian ${H}$ is given by the formula

$\displaystyle H(\lambda_1,\ldots,\lambda_n) := \sum_{j=1}^n \frac{\lambda_j^2}{2} - 2 \sum_{1 \leq i < j \leq n} \log |\lambda_i-\lambda_j|.$

At the macroscopic scale of ${\sqrt{n}}$, the eigenvalues ${\lambda_j}$ are distributed according to the Wigner semicircle law

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

Indeed, if one defines the classical location ${\gamma_i^{cl}}$ of the ${i^{th}}$ eigenvalue to be the unique solution in ${[-2\sqrt{n}, 2\sqrt{n}]}$ to the equation

$\displaystyle \int_{-2\sqrt{n}}^{\gamma_i^{cl}/\sqrt{n}} \rho_{sc}(x)\ dx = \frac{i}{n}$

then it is known that the random variable ${\lambda_i}$ is quite close to ${\gamma_i^{cl}}$. Indeed, a result of Gustavsson shows that, in the bulk region when ${\epsilon n < i 0}$, ${\lambda_i}$ is distributed asymptotically as a gaussian random variable with mean ${\gamma_i^{cl}}$ and variance ${\sqrt{\frac{\log n}{\pi}} \times \frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl})}}$. Note that from the semicircular law, the factor ${\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl})}}$ is the mean eigenvalue spacing.

At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to ${1/\sqrt{n}}$ in the bulk, but can be as large as ${n^{-1/6}}$ at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process on the edge), as discussed in this previous post.

Here, I wish to discuss the mesoscopic structure of the eigenvalues, in which one involves scales that are intermediate between the microscopic scale ${1/\sqrt{n}}$ and the macroscopic scale ${\sqrt{n}}$, for instance in correlating the eigenvalues ${\lambda_i}$ and ${\lambda_j}$ in the regime ${|i-j| \sim n^\theta}$ for some ${0 < \theta < 1}$. Here, there is a surprising phenomenon; there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both ${\lambda_i}$ and ${\lambda_j}$ behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to ${1-\theta}$ (in the bulk, at least); thus, for instance, adjacent eigenvalues ${\lambda_{i+1}}$ and ${\lambda_i}$ are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but that even very distant eigenvalues, such as ${\lambda_{n/4}}$ and ${\lambda_{3n/4}}$, have a correlation comparable to ${1/\log n}$. One way to get a sense of this is to look at the trace

$\displaystyle \lambda_1 + \ldots + \lambda_n.$

This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of ${n}$. In contrast, each of the ${\lambda_i}$ (in the bulk, at least) has a variance comparable to ${\log n/n}$. In order for these two facts to be consistent, the average correlation between pairs of eigenvalues then has to be of the order of ${1/\log n}$.

Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian ${H(\lambda)}$ around the minimum ${\gamma}$, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.

The month of April has been designated as Mathematics Awareness Month by the major American mathematics organisations (the AMS, ASA, MAA, and SIAM).  I was approached to write a popular mathematics article for April 2011 (the theme for that month is “Mathematics and Complexity”).  While I have written a fair number of expository articles (including several on this blog) aimed at a mathematical audience, I actually have not had much experience writing articles at the popular mathematics level, and so I found this task to be remarkably difficult.  At this level of exposition, one not only needs to explain the facts, but also to tell a story; I have experience in the former but not in the latter.

I decided to write on the topic of universality – the phenomenon that the macroscopic behaviour of a dynamical system can be largely independent of the precise microscopic structure.   Below the fold is a first draft of the article; I would definitely welcome feedback and corrections.  It does not yet have any pictures, but I plan to rectify that in the final draft.  It also does not have a title, but this will be easy to address later.   But perhaps the biggest thing lacking right now is a narrative “hook”; I don’t yet have any good ideas as to how to make the story of universality compelling to a lay audience.  Any suggestions in this regard would be particularly appreciated.

I have not yet decided where I would try to publish this article; in fact, I might just publish it here on this blog (and eventually, in one of the blog book compilations).

As is now widely reported, the Fields medals for 2010 have been awarded to Elon Lindenstrauss, Ngo Bao Chau, Stas Smirnov, and Cedric Villani. Concurrently, the Nevanlinna prize (for outstanding contributions to mathematical aspects of information science) was awarded to Dan Spielman, the Gauss prize (for outstanding mathematical contributions that have found significant applications outside of mathematics) to Yves Meyer, and the Chern medal (for lifelong achievement in mathematics) to Louis Nirenberg. All of the recipients are of course exceptionally qualified and deserving for these awards; congratulations to all of them. (I should mention that I myself was only very tangentially involved in the awards selection process, and like everyone else, had to wait until the ceremony to find out the winners. I imagine that the work of the prize committees must have been extremely difficult.)

Today, I thought I would mention one result of each of the Fields medalists; by chance, three of the four medalists work in areas reasonably close to my own. (Ngo is rather more distant from my areas of expertise, but I will give it a shot anyway.) This will of course only be a tiny sample of each of their work, and I do not claim to be necessarily describing their “best” achievement, as I only know a portion of the research of each of them, and my selection choice may be somewhat idiosyncratic. (I may discuss the work of Spielman, Meyer, and Nirenberg in a later post.)

A recurring theme in mathematics is that of duality: a mathematical object ${X}$ can either be described internally (or in physical space, or locally), by describing what ${X}$ physically consists of (or what kind of maps exist into ${X}$), or externally (or in frequency space, or globally), by describing what ${X}$ globally interacts or resonates with (or what kind of maps exist out of ${X}$). These two fundamentally opposed perspectives on the object ${X}$ are often dual to each other in various ways: performing an operation on ${X}$ may transform it one way in physical space, but in a dual way in frequency space, with the frequency space description often being a “inversion” of the physical space description. In several important cases, one is fortunate enough to have some sort of fundamental theorem connecting the internal and external perspectives. Here are some (closely inter-related) examples of this perspective:

1. Vector space duality A vector space ${V}$ over a field ${F}$ can be described either by the set of vectors inside ${V}$, or dually by the set of linear functionals ${\lambda: V \rightarrow F}$ from ${V}$ to the field ${F}$ (or equivalently, the set of vectors inside the dual space ${V^*}$). (If one is working in the category of topological vector spaces, one would work instead with continuous linear functionals; and so forth.) A fundamental connection between the two is given by the Hahn-Banach theorem (and its relatives).
2. Vector subspace duality In a similar spirit, a subspace ${W}$ of ${V}$ can be described either by listing a basis or a spanning set, or dually by a list of linear functionals that cut out that subspace (i.e. a spanning set for the orthogonal complement ${W^\perp := \{ \lambda \in V^*: \lambda(w)=0 \hbox{ for all } w \in W \})}$. Again, the Hahn-Banach theorem provides a fundamental connection between the two perspectives.
3. Convex duality More generally, a (closed, bounded) convex body ${K}$ in a vector space ${V}$ can be described either by listing a set of (extreme) points whose convex hull is ${K}$, or else by listing a set of (irreducible) linear inequalities that cut out ${K}$. The fundamental connection between the two is given by the Farkas lemma.
4. Ideal-variety duality In a slightly different direction, an algebraic variety ${V}$ in an affine space ${A^n}$ can be viewed either “in physical space” or “internally” as a collection of points in ${V}$, or else “in frequency space” or “externally” as a collection of polynomials on ${A^n}$ whose simultaneous zero locus cuts out ${V}$. The fundamental connection between the two perspectives is given by the nullstellensatz, which then leads to many of the basic fundamental theorems in classical algebraic geometry.
5. Hilbert space duality An element ${v}$ in a Hilbert space ${H}$ can either be thought of in physical space as a vector in that space, or in momentum space as a covector ${w \mapsto \langle v, w \rangle}$ on that space. The fundamental connection between the two is given by the Riesz representation theorem for Hilbert spaces.
6. Semantic-syntactic duality Much more generally still, a mathematical theory can either be described internally or syntactically via its axioms and theorems, or externally or semantically via its models. The fundamental connection between the two perspectives is given by the Gödel completeness theorem.
7. Intrinsic-extrinsic duality A (Riemannian) manifold ${M}$ can either be viewed intrinsically (using only concepts that do not require an ambient space, such as the Levi-Civita connection), or extrinsically, for instance as the level set of some defining function in an ambient space. Some important connections between the two perspectives includes the Nash embedding theorem and the theorema egregium.
8. Group duality A group ${G}$ can be described either via presentations (lists of generators, together with relations between them) or representations (realisations of that group in some more concrete group of transformations). A fundamental connection between the two is Cayley’s theorem. Unfortunately, in general it is difficult to build upon this connection (except in special cases, such as the abelian case), and one cannot always pass effortlessly from one perspective to the other.
9. Pontryagin group duality A (locally compact Hausdorff) abelian group ${G}$ can be described either by listing its elements ${g \in G}$, or by listing the characters ${\chi: G \rightarrow {\bf R}/{\bf Z}}$ (i.e. continuous homomorphisms from ${G}$ to the unit circle, or equivalently elements of ${\hat G}$). The connection between the two is the focus of abstract harmonic analysis.
10. Pontryagin subgroup duality A subgroup ${H}$ of a locally compact abelian group ${G}$ can be described either by generators in ${H}$, or generators in the orthogonal complement ${H^\perp := \{ \xi \in \hat G: \xi \cdot h = 0 \hbox{ for all } h \in H \}}$. One of the fundamental connections between the two is the Poisson summation formula.
11. Fourier duality A (sufficiently nice) function ${f: G \rightarrow {\bf C}}$ on a locally compact abelian group ${G}$ (equipped with a Haar measure ${\mu}$) can either be described in physical space (by its values ${f(x)}$ at each element ${x}$ of ${G}$) or in frequency space (by the values ${\hat f(\xi) = \int_G f(x) e( - \xi \cdot x )\ d\mu(x)}$ at elements ${\xi}$ of the Pontryagin dual ${\hat G}$). The fundamental connection between the two is the Fourier inversion formula.
12. The uncertainty principle The behaviour of a function ${f}$ at physical scales above (resp. below) a certain scale ${R}$ is almost completely controlled by the behaviour of its Fourier transform ${\hat f}$ at frequency scales below (resp. above) the dual scale ${1/R}$ and vice versa, thanks to various mathematical manifestations of the uncertainty principle. (The Poisson summation formula can also be viewed as a variant of this principle, using subgroups instead of scales.)
13. Stone/Gelfand duality A (locally compact Hausdorff) topological space ${X}$ can be viewed in physical space (as a collection of points), or dually, via the ${C^*}$ algebra ${C(X)}$ of continuous complex-valued functions on that space, or (in the case when ${X}$ is compact and totally disconnected) via the boolean algebra of clopen sets (or equivalently, the idempotents of ${C(X)}$). The fundamental connection between the two is given by the Stone representation theorem or the (commutative) Gelfand-Naimark theorem.

I have discussed a fair number of these examples in previous blog posts (indeed, most of the links above are to my own blog). In this post, I would like to discuss the uncertainty principle, that describes the dual relationship between physical space and frequency space. There are various concrete formalisations of this principle, most famously the Heisenberg uncertainty principle and the Hardy uncertainty principle – but in many situations, it is the heuristic formulation of the principle that is more useful and insightful than any particular rigorous theorem that attempts to capture that principle. Unfortunately, it is a bit tricky to formulate this heuristic in a succinct way that covers all the various applications of that principle; the Heisenberg inequality ${\Delta x \cdot \Delta \xi \gtrsim 1}$ is a good start, but it only captures a portion of what the principle tells us. Consider for instance the following (deliberately vague) statements, each of which can be viewed (heuristically, at least) as a manifestation of the uncertainty principle:

1. A function which is band-limited (restricted to low frequencies) is featureless and smooth at fine scales, but can be oscillatory (i.e. containing plenty of cancellation) at coarse scales. Conversely, a function which is smooth at fine scales will be almost entirely restricted to low frequencies.
2. A function which is restricted to high frequencies is oscillatory at fine scales, but is negligible at coarse scales. Conversely, a function which is oscillatory at fine scales will be almost entirely restricted to high frequencies.
3. Projecting a function to low frequencies corresponds to averaging out (or spreading out) that function at fine scales, leaving only the coarse scale behaviour.
4. Projecting a frequency to high frequencies corresponds to removing the averaged coarse scale behaviour, leaving only the fine scale oscillation.
5. The number of degrees of freedom of a function is bounded by the product of its spatial uncertainty and its frequency uncertainty (or more generally, by the volume of the phase space uncertainty). In particular, there are not enough degrees of freedom for a non-trivial function to be simulatenously localised to both very fine scales and very low frequencies.
6. To control the coarse scale (or global) averaged behaviour of a function, one essentially only needs to know the low frequency components of the function (and vice versa).
7. To control the fine scale (or local) oscillation of a function, one only needs to know the high frequency components of the function (and vice versa).
8. Localising a function to a region of physical space will cause its Fourier transform (or inverse Fourier transform) to resemble a plane wave on every dual region of frequency space.
9. Averaging a function along certain spatial directions or at certain scales will cause the Fourier transform to become localised to the dual directions and scales. The smoother the averaging, the sharper the localisation.
10. The smoother a function is, the more rapidly decreasing its Fourier transform (or inverse Fourier transform) is (and vice versa).
11. If a function is smooth or almost constant in certain directions or at certain scales, then its Fourier transform (or inverse Fourier transform) will decay away from the dual directions or beyond the dual scales.
12. If a function has a singularity spanning certain directions or certain scales, then its Fourier transform (or inverse Fourier transform) will decay slowly along the dual directions or within the dual scales.
13. Localisation operations in position approximately commute with localisation operations in frequency so long as the product of the spatial uncertainty and the frequency uncertainty is significantly larger than one.
14. In the high frequency (or large scale) limit, position and frequency asymptotically behave like a pair of classical observables, and partial differential equations asymptotically behave like classical ordinary differential equations. At lower frequencies (or finer scales), the former becomes a “quantum mechanical perturbation” of the latter, with the strength of the quantum effects increasing as one moves to increasingly lower frequencies and finer spatial scales.
15. Etc., etc.
16. Almost all of the above statements generalise to other locally compact abelian groups than ${{\bf R}}$ or ${{\bf R}^n}$, in which the concept of a direction or scale is replaced by that of a subgroup or an approximate subgroup. (In particular, as we will see below, the Poisson summation formula can be viewed as another manifestation of the uncertainty principle.)

I think of all of the above (closely related) assertions as being instances of “the uncertainty principle”, but it seems difficult to combine them all into a single unified assertion, even at the heuristic level; they seem to be better arranged as a cloud of tightly interconnected assertions, each of which is reinforced by several of the others. The famous inequality ${\Delta x \cdot \Delta \xi \gtrsim 1}$ is at the centre of this cloud, but is by no means the only aspect of it.

The uncertainty principle (as interpreted in the above broad sense) is one of the most fundamental principles in harmonic analysis (and more specifically, to the subfield of time-frequency analysis), second only to the Fourier inversion formula (and more generally, Plancherel’s theorem) in importance; understanding this principle is a key piece of intuition in the subject that one has to internalise before one can really get to grips with this subject (and also with closely related subjects, such as semi-classical analysis and microlocal analysis). Like many fundamental results in mathematics, the principle is not actually that difficult to understand, once one sees how it works; and when one needs to use it rigorously, it is usually not too difficult to improvise a suitable formalisation of the principle for the occasion. But, given how vague this principle is, it is difficult to present this principle in a traditional “theorem-proof-remark” manner. Even in the more informal format of a blog post, I was surprised by how challenging it was to describe my own understanding of this piece of mathematics in a linear fashion, despite (or perhaps because of) it being one of the most central and basic conceptual tools in my own personal mathematical toolbox. In the end, I chose to give below a cloud of interrelated discussions about this principle rather than a linear development of the theory, as this seemed to more closely align with the nature of this principle.

This week at UCLA, Pierre-Louis Lions gave one of this year’s Distinguished Lecture Series, on the topic of mean field games. These are a relatively novel class of systems of partial differential equations, that are used to understand the behaviour of multiple agents each individually trying to optimise their position in space and time, but with their preferences being partly determined by the choices of all the other agents, in the asymptotic limit when the number of agents goes to infinity. A good example here is that of traffic congestion: as a first approximation, each agent wishes to get from A to B in the shortest path possible, but the speed at which one can travel depends on the density of other agents in the area. A more light-hearted example is that of a Mexican wave (or audience wave), which can be modeled by a system of this type, in which each agent chooses to stand, sit, or be in an intermediate position based on his or her comfort level, and also on the position of nearby agents.

Under some assumptions, mean field games can be expressed as a coupled system of two equations, a Fokker-Planck type equation evolving forward in time that governs the evolution of the density function ${m}$ of the agents, and a Hamilton-Jacobi (or Hamilton-Jacobi-Bellman) type equation evolving backward in time that governs the computation of the optimal path for each agent. The combination of both forward propagation and backward propagation in time creates some unusual “elliptic” phenomena in the time variable that is not seen in more conventional evolution equations. For instance, for Mexican waves, this model predicts that such waves only form for stadiums exceeding a certain minimum size (and this phenomenon has apparently been confirmed experimentally!).

Due to lack of time and preparation, I was not able to transcribe Lions’ lectures in full detail; but I thought I would describe here a heuristic derivation of the mean field game equations, and mention some of the results that Lions and his co-authors have been working on. (Video of a related series of lectures (in French) by Lions on this topic at the Collége de France is available here.)

To avoid (rather important) technical issues, I will work at a heuristic level only, ignoring issues of smoothness, convergence, existence and uniqueness, etc.

$\displaystyle i \hbar \partial_t |\psi \rangle = H |\psi\rangle$

is the fundamental equation of motion for (non-relativistic) quantum mechanics, modeling both one-particle systems and ${N}$-particle systems for ${N>1}$. Remarkably, despite being a linear equation, solutions ${|\psi\rangle}$ to this equation can be governed by a non-linear equation in the large particle limit ${N \rightarrow \infty}$. In particular, when modeling a Bose-Einstein condensate with a suitably scaled interaction potential ${V}$ in the large particle limit, the solution can be governed by the cubic nonlinear Schrödinger equation

$\displaystyle i \partial_t \phi = \Delta \phi + \lambda |\phi|^2 \phi. \ \ \ \ \ (1)$

I recently attended a talk by Natasa Pavlovic on the rigorous derivation of this type of limiting behaviour, which was initiated by the pioneering work of Hepp and Spohn, and has now attracted a vast recent literature. The rigorous details here are rather sophisticated; but the heuristic explanation of the phenomenon is fairly simple, and actually rather pretty in my opinion, involving the foundational quantum mechanics of ${N}$-particle systems. I am recording this heuristic derivation here, partly for my own benefit, but perhaps it will be of interest to some readers.

This discussion will be purely formal, in the sense that (important) analytic issues such as differentiability, existence and uniqueness, etc. will be largely ignored.

Gauge theory” is a term which has connotations of being a fearsomely complicated part of mathematics – for instance, playing an important role in quantum field theory, general relativity, geometric PDE, and so forth.  But the underlying concept is really quite simple: a gauge is nothing more than a “coordinate system” that varies depending on one’s “location” with respect to some “base space” or “parameter space”, a gauge transform is a change of coordinates applied to each such location, and a gauge theory is a model for some physical or mathematical system to which gauge transforms can be applied (and is typically gauge invariant, in that all physically meaningful quantities are left unchanged (or transform naturally) under gauge transformations).  By fixing a gauge (thus breaking or spending the gauge symmetry), the model becomes something easier to analyse mathematically, such as a system of partial differential equations (in classical gauge theories) or a perturbative quantum field theory (in quantum gauge theories), though the tractability of the resulting problem can be heavily dependent on the choice of gauge that one fixed.  Deciding exactly how to fix a gauge (or whether one should spend the gauge symmetry at all) is a key question in the analysis of gauge theories, and one that often requires the input of geometric ideas and intuition into that analysis.

I was asked recently to explain what a gauge theory was, and so I will try to do so in this post.  For simplicity, I will focus exclusively on classical gauge theories; quantum gauge theories are the quantization of classical gauge theories and have their own set of conceptual difficulties (coming from quantum field theory) that I will not discuss here. While gauge theories originated from physics, I will not discuss the physical significance of these theories much here, instead focusing just on their mathematical aspects.  My discussion will be informal, as I want to try to convey the geometric intuition rather than the rigorous formalism (which can, of course, be found in any graduate text on differential geometry).

My penultimate article for my PCM series is a very short one, on “Hamiltonians“. The PCM has a number of short articles to define terms which occur frequently in the longer articles, but are not substantive enough topics by themselves to warrant a full-length treatment. One of these is the term “Hamiltonian”, which is used in all the standard types of physical mechanics (classical or quantum, microscopic or statistical) to describe the total energy of a system. It is a remarkable feature of the laws of physics that this single object (which is a scalar-valued function in classical physics, and a self-adjoint operator in quantum mechanics) suffices to describe the entire dynamics of a system, although from a mathematical perspective it is not always easy to read off all the analytic aspects of this dynamics just from the form of the Hamiltonian.

In mathematics, Hamiltonians of course arise in the equations of mathematical physics (such as Hamilton’s equations of motion, or Schrödinger’s equations of motion), but also show up in symplectic geometry (as a special case of a moment map) and in microlocal analysis.

For this post, I would also like to highlight an article of my good friend Andrew Granville on one of my own favorite topics, “Analytic number theory“, focusing in particular on the classical problem of understanding the distribution of the primes, via such analytic tools as zeta functions and L-functions, sieve theory, and the circle method.

Einstein’s equation $E=mc^2$ describing the equivalence of mass and energy is arguably the most famous equation in physics. But his beautifully elegant derivation of this formula (here is the English translation) from previously understood laws of physics is considerably less famous. (There is an amusing Far Side cartoon in this regard, with the punchline “squared away”, which you can find on-line by searching hard enough, though I will not link to it directly.)

This topic had come up in recent discussion on this blog, so I thought I would present Einstein’s derivation here. Actually, to be precise, in the paper mentioned above, Einstein uses the postulates of special relativity and other known laws of physics to show the following:

Proposition. (Mass-energy equivalence) If a body at rest emits a total energy of E while remaining at rest, then the mass of that body decreases by $E/c^2$.

Assuming that bodies at rest with zero mass necessarily have zero energy, this implies the famous formula $E = mc^2$ – but only for bodies which are at rest. For moving bodies, there is a similar formula, but one has to first decide what the correct definition of mass is for moving bodies; I will not discuss this issue here, but see for instance the Wikipedia entry on this topic.

Broadly speaking, the derivation of the above proposition proceeds via the following five steps:

1. Using the postulates of special relativity, determine how space and time coordinates transform under changes of reference frame (i.e. derive the Lorentz transformations).
2. Using 1., determine how the temporal frequency $\nu$ (and wave number k) of photons transform under changes of reference frame (i.e. derive the formulae for relativistic Doppler shift).
3. Using Planck’s relation $E = h\nu$ (and de Broglie’s law $p = \hbar k$) and 2., determine how the energy E (and momentum p) of photons transform under changes of reference frame.
4. Using the law of conservation of energy (and momentum) and 3., determine how the energy (and momentum) of bodies transform under changes of reference frame.
5. Comparing the results of 4. with the classical Newtonian approximations $KE \approx \frac{1}{2} m|v|^2$ (and $p \approx mv$), deduce the relativistic relationship between mass and energy for bodies at rest (and more generally between mass, velocity, energy, and momentum for moving bodies).

Actually, as it turns out, Einstein’s analysis for bodies at rest only needs to understand changes of reference frame at infinitesimally low velocity, $|v| \ll c$. However, in order to see enough relativistic effects to deduce the mass-energy equivalence, one needs to obtain formulae which are accurate to second order in v (or more precisely, $v/c$), as opposed to those in Newtonian physics which are accurate to first order in v. Also, to understand the relationship between mass, velocity, energy, and momentum for moving bodies rather than bodies at rest, one needs to consider non-infinitesimal changes of reference frame.

Important note: Einstein’s argument is, of course, a physical argument rather than a mathematical one. While I will use the language and formalism of pure mathematics here, it should be emphasised that I am not exactly giving a formal proof of the above Proposition in the sense of modern mathematics; these arguments are instead more like the classical proofs of Euclid, in that numerous “self evident” assumptions about space, time, velocity, etc. will be made along the way. (Indeed, there is a very strong analogy between Euclidean geometry and the Minkowskian geometry of special relativity.) One can of course make these assumptions more explicit, and this has been done in many other places, but I will avoid doing so here in order not to overly obscure Einstein’s original argument. Read the rest of this entry »

I’m continuing my series of articles for the Princeton Companion to Mathematics with my article on phase space. This brief article, which overlaps to some extent with my article on the Schrödinger equation, introduces the concept of phase space, which is used to describe both the positions and momenta of a system in both classical and quantum mechanics, although in the latter one has to accept a certain amount of ambiguity (or non-commutativity, if one prefers) in this description thanks to the uncertainty principle. (Note that positions alone are not sufficient to fully characterise the state of a system; this observation essentially goes all the way back to Zeno with his arrow paradox.)

Phase space is also used in pure mathematics, where it is used to simultaneously describe position (or time) and frequency; thus the term “time-frequency analysis” is sometimes used to describe phase space-based methods in analysis. The counterpart of classical mechanics is then symplectic geometry and Hamiltonian ODE, while the counterpart of quantum mechanics is the theory of linear differential and pseudodifferential operators. The former is essentially the “high-frequency limit” of the latter; this can be made more precise using the techniques of microlocal analysis, semi-classical analysis, and geometric quantisation.

As usual, I will highlight another author’s PCM article in this post, this one being Frank Kelly‘s article “The mathematics of traffic in networks“, a subject which, as a resident of Los Angeles, I can relate to on a personal level :-) . Frank’s article also discusses in detail Braess’s paradox, which is the rather unintuitive fact that adding extra capacity to a network can sometimes increase the overall delay in the network, by inadvertently redirecting more traffic through bottlenecks! If nothing else, this paradox demonstrates that the mathematics of traffic is non-trivial.