Way back in 2007, I wrote a blog post giving Einstein’s derivation of his famous equation ${E=mc^2}$ for the rest energy of a body with mass ${m}$. (Throughout this post, mass is used to refer to the invariant mass (also known as rest mass) of an object.) This derivation used a number of physical assumptions, including the following:

1. The two postulates of special relativity: firstly, that the laws of physics are the same in every inertial reference frame, and secondly that the speed of light in vacuum is equal to ${c}$ in every such inertial frame.
2. Planck’s relation and de Broglie’s law for photons, relating the frequency, energy, and momentum of such photons together.
3. The law of conservation of energy, and the law of conservation of momentum, as well as the additivity of these quantities (i.e. the energy of a system is the sum of the energy of its components, and similarly for momentum).
4. The Newtonian approximations ${E \approx E_0 + \frac{1}{2} m|v|^2}$, ${p \approx m v}$ to energy and momentum at low velocities.

The argument was one-dimensional in nature, in the sense that only one of the three spatial dimensions was actually used in the proof.

As was pointed out in comments in the previous post by Laurens Gunnarsen, this derivation has the curious feature of needing some laws from quantum mechanics (specifically, the Planck and de Broglie laws) in order to derive an equation in special relativity (which does not ostensibly require quantum mechanics). One can then ask whether one can give a derivation that does not require such laws. As pointed out in previous comments, one can use the representation theory of the Lorentz group ${SO(d,1)}$ to give a nice derivation that avoids any quantum mechanics, but it now needs at least two spatial dimensions instead of just one. I decided to work out this derivation in a way that does not explicitly use representation theory (although it is certainly lurking beneath the surface). The concept of momentum is only barely used in this derivation, and the main ingredients are now reduced to the following:

1. The two postulates of special relativity;
2. The law of conservation of energy (and the additivity of energy);
3. The Newtonian approximation ${E \approx E_0 + \frac{1}{2} m|v|^2}$ at low velocities.

The argument (which uses a little bit of calculus, but is otherwise elementary) is given below the fold. Whereas Einstein’s original argument considers a mass emitting two photons in several different reference frames, the argument here considers a large mass breaking up into two equal smaller masses. Viewing this situation in different reference frames gives a functional equation for the relationship between energy, mass, and velocity, which can then be solved using some calculus, using the Newtonian approximation as a boundary condition, to give the famous ${E=mc^2}$ formula.

Disclaimer: As with the previous post, the arguments here are physical arguments rather than purely mathematical ones, and thus do not really qualify as a rigorous mathematical argument, due to the implicit use of a number of physical and metaphysical hypotheses beyond the ones explicitly listed above. (But it would be difficult to say anything non-tautological at all about the physical world if one could rely solely on ${100\%}$ rigorous mathematical reasoning.)

— 1. The main argument —

We will assume that the total energy ${E}$ of a moving body depends only on the mass ${m}$ of that body, and the velocity ${v}$ of that body:

$\displaystyle E = E(m,v).$

(This is actually a non-trivial assumption; it excludes the possibility that the energy might also be dependent on other features of the body, such as spin or charge.) At present, this functional relationship ${E: (m,v) \mapsto E(m,v)}$ is arbitrary. However, we can use some physical arguments to constrain this relationship. We first use the following argument of Galileo. Consider two bodies side by side, traveling at the same velocity ${v}$, with the first body of mass ${m_1}$ and the second of mass ${m_2}$. Then, the first body has energy ${E(m_1,v)}$ and the second has energy ${E(m_2,v)}$, so the combined system of two bodies has total energy ${E(m_1,v)+E(m_2,v)}$. On the other hand, if we imagine connecting the two bodies by an infinitesimally thin thread, we can view the system as a single body of mass ${m_1+m_2}$ traveling at the same velocity ${v}$. This leads us to the relationship

$\displaystyle E(m_1,v) + E(m_2,v) = E(m_1+m_2,v)$

for any ${m_1,m_2,v}$, which (under reasonable hypotheses of continuity) implies a linear relationship between energy and mass, thus

$\displaystyle E(m,v) = m f(v)$

for some function ${f: v \mapsto f(v)}$ depending only on the velocity ${v}$.

We still have to determine this unknown functional relationship ${f: v \mapsto f(v)}$. We assume rotational symmetry of the laws of physics (which one can view as a special case of the first postulate of special relativity): if two bodies of equal mass move at the same speed, but at different directions, the energies should be the same. In other words, ${f}$ should be spherically symmetric, so by abuse of notation we write

$\displaystyle f(v) = f(|v|). \ \ \ \ \ (1)$

Now consider a body ${B}$ of mass ${M}$ at rest at the origin (in some reference frame ${O}$), which somehow disintegrates (at time ${t=0}$, for simplicity) into two smaller bodies ${B_+, B_-}$ of equal mass ${m}$, one moving in the positive ${x}$ direction at some velocity ${(+v,0,0)}$, and the other moving in the negative ${x}$ direction at the opposite velocity ${(-v,0,0)}$ (note that this situation is consistent with the law of conservation of momentum). (If one prefers, one could also view the time-reversed situation, in which two masses of equal and opposite velocity collide to form a large stationary mass; the analysis of this situation is basically identical to the one given here.) In Newtonian mechanics, we have conservation and additivity of mass, so that ${M}$ must equal ${m+m}$; but we will not assume conservation and additivity of mass here (and in fact at least one of these laws must break down in special relativity, at least if one insists on using an invariant notion of mass). Instead, we can link ${M}$, ${m}$, and ${v}$ to each other by the law of conservation of energy. Before the disintegration, the body ${B}$ has total energy ${E(M,0) = M f(0)}$, while after the disintegration the system ${B_++B_-}$ has total energy ${E(m,(+v,0,0)) + E(m,(-v,0,0)) = 2 m f(|v|)}$ (using the spherically symmetric nature (1) of ${f}$), and so

$\displaystyle M f(0) = 2m f(v). \ \ \ \ \ (2)$

Now we view the same system relative to another reference frame ${O'}$, which relative to ${O}$ is moving at a velocity ${(w,0,0)}$ in the ${x}$ direction for some ${w}$, while keeping the ${y}$ and ${z}$ coordinates unchanged. The spacetime coordinates ${(x',y',z',t')}$ of ${O'}$ are then related to those ${(x,y,z,t)}$ of ${O}$ by the usual Lorentz transformations

$\displaystyle x' = \frac{x-wt}{\sqrt{1-w^2/c^2}}$

$\displaystyle y' = y$

$\displaystyle z' = z$

$\displaystyle t' = \frac{t-wx/c^2}{\sqrt{1-w^2/c^2}}$

which can be deduced from the postulates of special relativity by a standard derivation that we will not give here (it is sketched in the previous blog post). The pre-disintegration body ${B}$ is moving along the worldline ${\{ (0,0,0,t): t < 0\}}$ in the ${O}$ reference frame, and is thus moving along the line ${\{ (\frac{-wt}{\sqrt{1-w^2/c^2}}, 0, 0, \frac{t}{\sqrt{1-w^2/c^2}}): t < 0 \}}$ in the ${O'}$ reference frame; in particular, it has velocity ${-w}$ in this frame and thus has energy ${M f(w)}$ in this frame.

Now consider the first post-disintegration body ${B_+}$. It is moving along the worldline ${\{ (vt, 0,0,t): t>0 \}}$ in the ${O}$ reference frame, and thus along the line ${\{ (\frac{vt-wt}{\sqrt{1-w^2/c^2}}, 0, 0, \frac{t-vwt/c^2}{\sqrt{1-w^2/c^2}}): t > 0 \}}$ in the ${O'}$ reference frame; in particular, the speed of ${B_+}$ in this frame is ${\frac{v-w}{1-vw/c^2}}$ (the well known velocity addition (or subtraction) formula), and so the energy of this body is ${m f( \frac{v-w}{1-vw/c^2} )}$. Similarly, ${B_-}$ has energy ${m f( \frac{v+w}{1+vw/c^2} )}$. Equating energies, we are thus led to

$\displaystyle M f(w) = m (f( \frac{v-w}{1-vw/c^2} ) + f(\frac{v+w}{1+vw/c^2})).$
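As a sanity check on the velocity computations above, here is a minimal Python sketch (not part of the original argument) that applies the Lorentz transformation to two events on a worldline and reads off the resulting velocity in ${O'}$. The function names, and the choice of units in which ${c=1}$, are assumptions made purely for illustration.

```python
import math

C = 1.0  # speed of light; natural units are an assumption of this sketch


def boost_x(event, w, c=C):
    """Map an event (x, y, z, t) in O to the frame O' moving at velocity (w, 0, 0)."""
    x, y, z, t = event
    gamma = 1.0 / math.sqrt(1.0 - w * w / (c * c))
    return (gamma * (x - w * t), y, z, gamma * (t - w * x / (c * c)))


def velocity_in_boosted_frame(v, w, c=C):
    """x-velocity in O' of a body whose worldline in O is (v t, 0, 0, t)."""
    # Transform two events on the worldline (t = 0 and t = 1) and take the slope.
    x0, _, _, t0 = boost_x((0.0, 0.0, 0.0, 0.0), w, c)
    x1, _, _, t1 = boost_x((v, 0.0, 0.0, 1.0), w, c)
    return (x1 - x0) / (t1 - t0)


def velocity_subtraction(v, w, c=C):
    """The closed-form velocity subtraction formula (v - w) / (1 - v w / c^2)."""
    return (v - w) / (1.0 - v * w / (c * c))
```

Calling `velocity_in_boosted_frame(0.0, w)` recovers the velocity ${-w}$ of the pre-disintegration body, while `velocity_in_boosted_frame(v, w)` and `velocity_in_boosted_frame(-v, w)` agree with the closed-form expressions for ${B_+}$ and ${B_-}$ up to rounding.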

We can eliminate ${M,m}$ using (2), to obtain a functional equation for ${f}$:

$\displaystyle 2f(v) f(w) = f(0) (f( \frac{v-w}{1-vw/c^2} ) + f(\frac{v+w}{1+vw/c^2})). \ \ \ \ \ (3)$

This equation should hold for all (physically attainable) velocities ${v,w}$. To solve this equation, it is convenient to work with the change of variables

$\displaystyle v = c \tanh \alpha; w = c \tanh \beta;$

the hyperbolic angles ${\alpha,\beta}$ are known as the rapidities associated to ${v}$ and ${w}$ respectively. The point of using this change of variables is that the hyperbolic tangent addition formula yields

$\displaystyle \frac{v+w}{1+vw/c^2} = c \tanh(\alpha+\beta); \quad \frac{v-w}{1-vw/c^2} = c \tanh(\alpha-\beta).$
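The following short Python sketch illustrates numerically that rapidities simply add (or subtract) when velocities are composed relativistically. The helper names are hypothetical, and units with ${c=1}$ are assumed by default.

```python
import math

C = 1.0  # speed of light; natural units are an assumption of this sketch


def rapidity(v, c=C):
    """Rapidity alpha associated to a velocity v, defined by v = c tanh(alpha)."""
    return math.atanh(v / c)


def compose(v, w, c=C):
    """Relativistic composition (v + w) / (1 + v w / c^2) of collinear velocities."""
    return (v + w) / (1.0 + v * w / (c * c))


def compose_via_rapidity(v, w, c=C):
    """The same composition, computed by adding rapidities."""
    return c * math.tanh(rapidity(v, c) + rapidity(w, c))
```

Passing `-w` in place of `w` gives the subtraction formula, corresponding to ${c \tanh(\alpha-\beta)}$.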

Thus if we make the change of variables

$\displaystyle g(\alpha) := f( c \tanh \alpha )$

then (3) simplifies to

$\displaystyle 2 g(\alpha) g(\beta) = g(0) ( g(\alpha+\beta) + g(\alpha-\beta) ).$

It is tempting to plug some special values into this equation, such as ${\beta=0}$, but this only gives a trivial equation. However, if we first differentiate twice in ${\beta}$ to obtain

$\displaystyle 2 g(\alpha) g''(\beta) = g(0) ( g''(\alpha+\beta) + g''(\alpha-\beta) )$

and then set ${\beta=0}$, we obtain the non-trivial equation

$\displaystyle g(\alpha) g''(0) = g(0) g''(\alpha).$

This is a differential equation in ${g}$, and can be solved as

$\displaystyle g(\alpha) = A \cosh(k \alpha) + B \sinh(k \alpha)$

for some unknowns ${A,B}$, where ${k}$ is the square root of ${g''(0)/g(0)}$. From (1), ${g}$ should have vanishing derivative at the origin, and so ${B=0}$, and so we have

$\displaystyle f( c \tanh \alpha ) = A \cosh(k \alpha). \ \ \ \ \ (4)$
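One can check numerically that every function of the form (4) does solve the functional equation for ${g}$, whatever the values of ${A}$ and ${k}$; this is why the one-dimensional argument alone cannot pin down ${k}$. The sketch below is only an illustration, with hypothetical function names.

```python
import math


def g(alpha, A=1.0, k=1.0):
    """Candidate solution g(alpha) = A cosh(k alpha) from (4)."""
    return A * math.cosh(k * alpha)


def residual(alpha, beta, A=1.0, k=1.0):
    """Left side minus right side of 2 g(a) g(b) = g(0) (g(a + b) + g(a - b))."""
    lhs = 2.0 * g(alpha, A, k) * g(beta, A, k)
    rhs = g(0.0, A, k) * (g(alpha + beta, A, k) + g(alpha - beta, A, k))
    return lhs - rhs
```

For instance, `residual(0.8, 0.3, A=2.0, k=k)` vanishes up to rounding error for `k = 0.5`, `1.0`, `2.0` and `3.7` alike.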

This is significant progress in constraining the behaviour of ${f}$, but there are still two unknown parameters ${A, k}$. To proceed further, it becomes necessary to utilise a second dimension. Namely, we repeat the previous arguments, but with ${O'}$ now moving at velocity ${(0,w,0)}$ instead of ${(w,0,0)}$. The Lorentz transformations are now

$\displaystyle x' = x$

$\displaystyle y' = \frac{y-wt}{\sqrt{1-w^2/c^2}}$

$\displaystyle z' = z$

$\displaystyle t' = \frac{t-wy/c^2}{\sqrt{1-w^2/c^2}}.$

The pre-disintegration body ${B}$ is moving along the worldline ${\{ (0,0,0,t): t < 0\}}$ in the ${O}$ reference frame, and is thus moving along the line ${\{ (0, \frac{-wt}{\sqrt{1-w^2/c^2}}, 0, \frac{t}{\sqrt{1-w^2/c^2}}): t < 0 \}}$ in the ${O'}$ reference frame; in particular, it has velocity ${-w}$ in this frame and thus has energy ${M f(w)}$ in this frame.

Now consider the first post-disintegration body ${B_+}$. It is moving along the worldline ${\{ (vt, 0,0,t): t>0 \}}$ in the ${O}$ reference frame, and thus along the line ${\{ (vt, \frac{-wt}{\sqrt{1-w^2/c^2}}, 0, \frac{t}{\sqrt{1-w^2/c^2}}): t > 0 \}}$ in the ${O'}$ reference frame; in particular, the speed of ${B_+}$ in this frame is ${\sqrt{ v^2 (1-w^2/c^2) + w^2 }}$, and so the energy of this body is ${m f( \sqrt{ v^2 (1-w^2/c^2) + w^2 } )}$. Similarly for ${B_-}$. Equating energies, we are thus led to

$\displaystyle M f(w) = 2m f( \sqrt{ v^2 (1-w^2/c^2) + w^2 } ).$
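The speed of ${B_+}$ in the transversely moving frame can be double-checked by transforming two events on its worldline, as in the following Python sketch. The function names are assumptions for illustration, and units with ${c=1}$ are used by default.

```python
import math

C = 1.0  # speed of light; natural units are an assumption of this sketch


def boost_y(event, w, c=C):
    """Map an event (x, y, z, t) in O to the frame O' moving at velocity (0, w, 0)."""
    x, y, z, t = event
    gamma = 1.0 / math.sqrt(1.0 - w * w / (c * c))
    return (x, gamma * (y - w * t), z, gamma * (t - w * y / (c * c)))


def speed_in_boosted_frame(v, w, c=C):
    """Speed in O' of a body whose worldline in O is (v t, 0, 0, t)."""
    # Transform the events at t = 0 and t = 1 and measure the displacement rate.
    x0, y0, _, t0 = boost_y((0.0, 0.0, 0.0, 0.0), w, c)
    x1, y1, _, t1 = boost_y((v, 0.0, 0.0, 1.0), w, c)
    dt = t1 - t0
    return math.hypot((x1 - x0) / dt, (y1 - y0) / dt)


def predicted_speed(v, w, c=C):
    """Closed form sqrt(v^2 (1 - w^2/c^2) + w^2) from the text."""
    return math.sqrt(v * v * (1.0 - w * w / (c * c)) + w * w)
```

Since the result depends on ${v}$ only through ${v^2}$, the same speed is obtained for ${B_-}$ by passing `-v`, which is why both bodies contribute equally to the energy in this frame.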

We can eliminate ${M,m}$ using (2), to obtain a functional equation for ${f}$:

$\displaystyle f(v) f(w) = f(0) f( \sqrt{ v^2 (1-w^2/c^2) + w^2 } ).$

This equation should hold for all (physically attainable) velocities ${v,w}$. To solve this equation, we work with infinitesimal ${w}$ and perform a Taylor expansion. From the symmetry (1), ${f}$ should be flat at the origin, and so (assuming sufficient smoothness for ${f}$) we have

$\displaystyle f(w) = f(0) + \frac{1}{2} f''(0) w^2 + o(w^2),$

while from the Taylor approximation

$\displaystyle \sqrt{ v^2 (1-w^2/c^2) + w^2 } = v + \frac{1-v^2/c^2}{2v} w^2 + o(w^2)$

we have

$\displaystyle f( \sqrt{ v^2 (1-w^2/c^2) + w^2 } ) = f(v) + \frac{1-v^2/c^2}{2v} w^2 f'(v) + o(w^2).$
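To see the quality of the Taylor approximation for the speed, one can compare the exact expression with its second-order expansion for small ${w}$. The Python sketch below (hypothetical names, units with ${c=1}$ assumed) divides the discrepancy by ${w^2}$; this scaled error should tend to zero as ${w \rightarrow 0}$ if the remainder is indeed ${o(w^2)}$.

```python
import math

C = 1.0  # speed of light; natural units are an assumption of this sketch


def exact_speed(v, w, c=C):
    """Exact speed sqrt(v^2 (1 - w^2/c^2) + w^2) in the transversely boosted frame."""
    return math.sqrt(v * v * (1.0 - w * w / (c * c)) + w * w)


def second_order_speed(v, w, c=C):
    """Second-order Taylor approximation v + (1 - v^2/c^2) w^2 / (2 v), for v > 0."""
    return v + (1.0 - v * v / (c * c)) / (2.0 * v) * w * w


def scaled_error(v, w, c=C):
    """(exact - approximation) / w^2; tends to zero if the remainder is o(w^2)."""
    return (exact_speed(v, w, c) - second_order_speed(v, w, c)) / (w * w)
```

Evaluating `scaled_error(0.5, w)` for a decreasing sequence of `w` shows the scaled error shrinking, consistent with the expansion (for very small `w`, floating-point rounding eventually dominates).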

Inserting these expansions and extracting the ${w^2}$ coefficient, we obtain the differential equation

$\displaystyle \frac{1}{2} f(v) f''(0) = f(0) \frac{1-v^2/c^2}{2v} f'(v)$

which we can rewrite as

$\displaystyle \frac{d}{dv} \log f(v) = \frac{2C}{c^2} \frac{v}{1-v^2/c^2}$

for some constant ${C}$. We can integrate this as

$\displaystyle \log f(v) = -C \log(1-v^2/c^2) + C'$

and thus

$\displaystyle f(v) = D (1-v^2/c^2)^{-C}$

for some parameters ${C, D}$. In rapidity coordinates ${v = c \tanh \alpha}$, this becomes

$\displaystyle f( c \tanh \alpha) = D \cosh^{2C} \alpha.$
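It is worth noting that the transverse functional equation ${f(v) f(w) = f(0) f( \sqrt{ v^2 (1-w^2/c^2) + w^2 } )}$ is satisfied by this family for every exponent ${C}$, because ${1 - u^2/c^2 = (1-v^2/c^2)(1-w^2/c^2)}$ when ${u}$ denotes the speed inside the square root. The following Python sketch checks this numerically; the function names and the default ${c=1}$ are assumptions for illustration.

```python
import math


def f_power(v, exponent, D=1.0, c=1.0):
    """Candidate f(v) = D (1 - v^2/c^2)^(-exponent)."""
    return D * (1.0 - v * v / (c * c)) ** (-exponent)


def transverse_residual(v, w, exponent, D=1.0, c=1.0):
    """f(v) f(w) - f(0) f(u), where u is the speed seen in the transverse frame."""
    u = math.sqrt(v * v * (1.0 - w * w / (c * c)) + w * w)
    lhs = f_power(v, exponent, D, c) * f_power(w, exponent, D, c)
    rhs = f_power(0.0, exponent, D, c) * f_power(u, exponent, D, c)
    return lhs - rhs
```

Since the residual vanishes (up to rounding) for any choice of exponent, the two-dimensional argument by itself leaves ${C}$ undetermined.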

Comparing this with (4) (e.g. by performing a Taylor expansion to fourth order around ${\alpha=0}$) we see that ${k = 2C = 1}$, thus

$\displaystyle f(c \tanh \alpha) = A \cosh \alpha$

or equivalently

$\displaystyle f(v) = \frac{A}{\sqrt{1-v^2/c^2}}.$
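The step ${k = 2C = 1}$ can be unpacked as follows. Writing ${n = 2C}$, one has ${\cosh(k\alpha) = 1 + \frac{k^2}{2} \alpha^2 + \frac{k^4}{24} \alpha^4 + \ldots}$ and ${\cosh^n \alpha = 1 + \frac{n}{2} \alpha^2 + (\frac{n^2}{8} - \frac{n}{12}) \alpha^4 + \ldots}$. Matching the ${\alpha^2}$ coefficients gives ${k^2 = n}$, and matching the ${\alpha^4}$ coefficients then gives ${n^2/24 = n^2/8 - n/12}$, i.e. ${n(n-1)=0}$; the case ${n=0}$ would make ${f}$ constant, which is incompatible with the Newtonian approximation, so ${n=1}$. As a numerical cross-check, the sketch below (hypothetical function names, units with ${c=1}$ by default) evaluates the residual of the functional equation (3) on the power-law family, which should vanish only for the exponent ${C = 1/2}$.

```python
import math


def f_power(v, exponent, c=1.0):
    """Candidate f(v) = (1 - v^2/c^2)^(-exponent)."""
    return (1.0 - v * v / (c * c)) ** (-exponent)


def longitudinal_residual(v, w, exponent, c=1.0):
    """Left side minus right side of the functional equation (3)."""
    u_minus = (v - w) / (1.0 - v * w / (c * c))
    u_plus = (v + w) / (1.0 + v * w / (c * c))
    lhs = 2.0 * f_power(v, exponent, c) * f_power(w, exponent, c)
    rhs = f_power(0.0, exponent, c) * (
        f_power(u_minus, exponent, c) + f_power(u_plus, exponent, c)
    )
    return lhs - rhs
```

With ${v = 0.6}$ and ${w = 0.3}$, the residual is zero up to rounding for `exponent=0.5`, but visibly non-zero for exponents such as `0.25` or `1.0`.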

Thus we have

$\displaystyle E(m,v) = \frac{Am}{\sqrt{1-|v|^2/c^2}}.$

For infinitesimal velocities ${v}$, we may Taylor expand

$\displaystyle E(m,v) = A m + \frac{1}{2} \frac{A}{c^2} m|v|^2 + o(|v|^2)$

and so the kinetic energy of a slowly moving mass is ${\frac{1}{2} \frac{A}{c^2} m|v|^2 + o(|v|^2)}$. Comparing this with the Newtonian approximation of ${\frac{1}{2} m|v|^2}$ we conclude that ${A=c^2}$, and thus

$\displaystyle E(m,v) = \frac{mc^2}{\sqrt{1-|v|^2/c^2}}. \ \ \ \ \ (5)$

In particular, setting ${v=0}$ we see that the rest energy ${E(m,0)}$ of a body of mass ${m}$ is ${mc^2}$, as required.
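For a concrete feel for formula (5), the following Python sketch compares the relativistic kinetic energy ${E(m,v) - E(m,0)}$ with the Newtonian value ${\frac{1}{2} m |v|^2}$ in SI units. The function names are illustrative assumptions; the numerical value of ${c}$ is the defined SI value in metres per second.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s (exact SI definition)


def total_energy(m, v, c=C_LIGHT):
    """Total energy m c^2 / sqrt(1 - v^2/c^2) from (5), in joules for SI inputs."""
    return m * c * c / math.sqrt(1.0 - v * v / (c * c))


def kinetic_energy(m, v, c=C_LIGHT):
    """Relativistic kinetic energy: total energy minus rest energy."""
    return total_energy(m, v, c) - total_energy(m, 0.0, c)


def newtonian_kinetic_energy(m, v):
    """Newtonian kinetic energy (1/2) m v^2."""
    return 0.5 * m * v * v
```

At a speed of one thousandth of ${c}$, the ratio `kinetic_energy(1.0, v) / newtonian_kinetic_energy(1.0, v)` differs from ${1}$ by less than one part in ${10^5}$, while at ${v = 0.5c}$ the two expressions disagree by more than ten percent. (For speeds much smaller than this, subtracting the rest energy in floating point loses precision, so the sketch is not suitable for everyday velocities.)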

Remark 1 The above derivation did not explicitly use the law of conservation of momentum (other than to observe that the scenario of one mass at rest splitting into two smaller masses moving in equal and opposite directions was compatible with this law). Actually, if one defines the momentum ${p(m,v)}$ of a body of mass ${m}$ and velocity ${v}$ by the formula

$\displaystyle p(m,v) := \frac{mv}{\sqrt{1-|v|^2/c^2}}$

and the momentum of a system as the sum of the momenta of its components, one can use (5) and the Lorentz transformations to (after some algebra) express the total momentum of a system as a linear combination of the total energy of that system viewed in a couple of reference frames (or, if one prefers, as the derivatives of the total energy with respect to infinitesimal reference frame changes), and as a consequence one can actually derive the law of conservation of momentum from the law of conservation of energy, together with special relativity. (In fact, this can also be done in Galilean relativity, using the classical formula ${E(m,v)=E(m,0) + \frac{1}{2} m |v|^2}$; we leave this as an exercise to the reader.) Indeed, in special relativity it is natural to unify energy and momentum together as a single quantity known as the four-momentum.
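To make the algebra alluded to in Remark 1 a little more concrete in one spatial dimension: if ${E}$ and ${p}$ are the energy and momentum of a body in ${O}$, then its energy in the frame ${O'}$ moving at velocity ${w}$ works out to ${E' = (E - wp)/\sqrt{1-w^2/c^2}}$, so that ${p}$ can be recovered from ${E}$ and ${E'}$. The Python sketch below checks this identity numerically; the function names and the default ${c=1}$ are assumptions for illustration only.

```python
import math


def gamma(u, c=1.0):
    """Lorentz factor 1 / sqrt(1 - u^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - u * u / (c * c))


def energy(m, v, c=1.0):
    """Energy m c^2 / sqrt(1 - v^2/c^2) from (5)."""
    return m * c * c * gamma(v, c)


def momentum(m, v, c=1.0):
    """Momentum m v / sqrt(1 - v^2/c^2), as defined in Remark 1."""
    return m * v * gamma(v, c)


def energy_in_boosted_frame(m, v, w, c=1.0):
    """Energy of the body as seen from the frame O' moving at velocity w."""
    v_prime = (v - w) / (1.0 - v * w / (c * c))
    return energy(m, v_prime, c)


def momentum_from_energies(m, v, w, c=1.0):
    """Recover p from the energies in O and O' via E' = gamma(w) (E - w p); needs w != 0."""
    e, e_prime = energy(m, v, c), energy_in_boosted_frame(m, v, w, c)
    return (e - e_prime / gamma(w, c)) / w
```

Because `momentum_from_energies` is linear in the two energies, and total energy is additive and conserved in both frames, the total momentum of a system is conserved as well.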

Remark 2 The above arguments ultimately rely on the fact that the Lorentz group ${SO(d,1)}$ has an essentially unique linear action on ${{\bf R}^{1+d}}$ when the spatial dimension ${d}$ is at least two. For ${d=1}$, the group ${SO(d,1)}$ becomes abelian, and there is a multiplicity of such actions (parameterised by the different possibilities for the quantity ${k}$ appearing in (4)), and one could a priori have a number of different laws relating energy and momentum with mass and velocity that are consistent with special relativity and the conservation laws. Indeed, for any choice of ${k > 0}$, one could postulate the laws

$\displaystyle E(m,c \tanh \alpha) = \frac{m c^2}{k^2} \cosh(k \alpha)$

and

$\displaystyle p(m, c \tanh \alpha) = \frac{mc}{k} \sinh(k \alpha)$

for the energy and momentum of a body of mass ${m}$ moving at rapidity ${\alpha}$ (i.e. at velocity ${c\tanh \alpha}$). One can verify that such laws are consistent with the laws of conservation of energy and momentum, with the postulates of special relativity, and with the Newtonian approximation, as long as one is only in one spatial dimension; one needs to use at least one other dimension to be able to reduce to the ${k=1}$ case. Thus we see that higher-dimensional relativity is more rigid than one-dimensional relativity. In the case of Einstein’s original argument, the quantum mechanical properties of photons are used instead to show that ${E/|p| \rightarrow c}$ in the lightspeed limit ${\alpha \rightarrow \infty}$, which gives the reduction to ${k=1}$.
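The claims in Remark 2 can be explored numerically with the following Python sketch, which implements the hypothetical one-parameter family of laws above (function names and the default ${c=1}$ are assumptions for illustration). It checks that the disintegration scenario conserves energy in every one-dimensional boosted frame for any ${k}$, and that ${E/|p|}$ tends to ${c/k}$ at large rapidity, so that the photon condition ${E/|p| \rightarrow c}$ singles out ${k=1}$.

```python
import math


def energy_k(m, alpha, k, c=1.0):
    """Hypothetical energy law (m c^2 / k^2) cosh(k alpha)."""
    return m * c * c / (k * k) * math.cosh(k * alpha)


def momentum_k(m, alpha, k, c=1.0):
    """Hypothetical momentum law (m c / k) sinh(k alpha)."""
    return m * c / k * math.sinh(k * alpha)


def disintegration_residual(m, alpha, beta, k, c=1.0):
    """Energy before minus energy after the split, seen from a frame of rapidity beta."""
    # Mass of the parent body, fixed by energy conservation in the rest frame:
    # energy_k(M, 0, k) = 2 energy_k(m, alpha, k).
    big_m = 2.0 * m * math.cosh(k * alpha)
    before = energy_k(big_m, -beta, k, c)
    after = energy_k(m, alpha - beta, k, c) + energy_k(m, -alpha - beta, k, c)
    return before - after


def energy_momentum_ratio(alpha, k, c=1.0):
    """E / |p| at rapidity alpha (independent of the mass); needs alpha != 0."""
    return energy_k(1.0, alpha, k, c) / abs(momentum_k(1.0, alpha, k, c))
```

For example, `energy_momentum_ratio(20.0, k)` is very close to `1.0 / k`, which equals ${c}$ (in these units) only when `k = 1.0`.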