Way back in 2007, I wrote a blog post giving Einstein’s derivation of his famous equation $E = mc^2$ for the rest energy of a body with mass $m$. (Throughout this post, mass is used to refer to the invariant mass (also known as *rest mass*) of an object.) This derivation used a number of physical assumptions, including the following:

- The two postulates of special relativity: firstly, that the laws of physics are the same in every inertial reference frame, and secondly that the speed of light $c$ in vacuum is the same in every such inertial frame.
- Planck’s law and de Broglie’s law for photons, relating the frequency, energy, and momentum of such photons together.
- The law of conservation of energy, and the law of conservation of momentum, as well as the additivity of these quantities (i.e. the energy of a system is the sum of the energy of its components, and similarly for momentum).
- The Newtonian approximations $E \approx E_0 + \frac{1}{2} m |v|^2$, $p \approx m v$ to energy and momentum at low velocities.

The argument was one-dimensional in nature, in the sense that only one of the three spatial dimensions was actually used in the proof.

As was pointed out in comments in the previous post by Laurens Gunnarsen, this derivation has the curious feature of needing some laws from quantum mechanics (specifically, the Planck and de Broglie laws) in order to derive an equation in special relativity (which does not ostensibly require quantum mechanics). One can then ask whether one can give a derivation that does not require such laws. As pointed out in previous comments, one can use the representation theory of the Lorentz group to give a nice derivation that avoids any quantum mechanics, but it now needs at least two spatial dimensions instead of just one. I decided to work out this derivation in a way that does not explicitly use representation theory (although it is certainly lurking beneath the surface). The concept of momentum is only barely used in this derivation, and the main ingredients are now reduced to the following:

- The two postulates of special relativity;
- The law of conservation of energy (and the additivity of energy);
- The Newtonian approximation $E \approx E_0 + \frac{1}{2} m |v|^2$ at low velocities.

The argument (which uses a little bit of calculus, but is otherwise elementary) is given below the fold. Whereas Einstein’s original argument considers a mass emitting two photons in several different reference frames, the argument here considers a large mass breaking up into two equal smaller masses. Viewing this situation in different reference frames gives a functional equation for the relationship between energy, mass, and velocity, which can then be solved using some calculus, using the Newtonian approximation as a boundary condition, to give the famous formula.

*Disclaimer:* As with the previous post, the arguments here are physical arguments rather than purely mathematical ones, and thus do not really qualify as a rigorous mathematical argument, due to the implicit use of a number of physical and metaphysical hypotheses beyond the ones explicitly listed above. (But it would be difficult to say anything non-tautological at all about the physical world if one could rely solely on rigorous mathematical reasoning.)

**1. The main argument**

We will assume that the total energy $E$ of a moving body depends only on the mass $m$ of that body, and the velocity $v \in \mathbf{R}^3$ of that body:

$$E = E(m,v).$$

(This is actually a non-trivial assumption; it excludes the possibility that the energy might also be dependent on other features of the body, such as spin or charge.) At present, this functional relationship is arbitrary. However, we can use some physical arguments to constrain this relationship. We first use the following argument of Galileo. Consider two bodies side by side, traveling at the same velocity $v$, with the first body of mass $m_1$ and the second of mass $m_2$. Then, the first body has energy $E(m_1,v)$ and the second has energy $E(m_2,v)$, so the combined system of two bodies has total energy $E(m_1,v) + E(m_2,v)$. On the other hand, if we imagine connecting the two bodies by an infinitesimally thin thread, we can view the system as a single body of mass $m_1+m_2$ traveling at the same velocity $v$. This leads us to the relationship

$$E(m_1+m_2,v) = E(m_1,v) + E(m_2,v)$$

for any $m_1, m_2 > 0$, which (under reasonable hypotheses of continuity) implies a linear relationship between energy and mass, thus

$$E(m,v) = m f(v)$$

for some function $f$ depending only on the velocity $v$.

We still have to determine this unknown functional relationship $f$. We assume rotational symmetry of the laws of physics (which one can view as a special case of the first postulate of special relativity): if two bodies of equal mass move at the same speed, but in different directions, the energies should be the same. In other words, $f$ should be spherically symmetric, so by abuse of notation we write

$$f(v) = f(|v|). \qquad (1)$$

Now consider a body $B$ of mass $M$ at rest at the origin (in some reference frame $O$), which somehow disintegrates (at time $t=0$, for simplicity) into two smaller bodies $B_1, B_2$ of equal mass $m$, one moving in the positive $x$ direction at some velocity $(v,0,0)$, and the other moving in the negative $x$ direction at the opposite velocity $(-v,0,0)$ (note that this situation is consistent with the law of conservation of momentum). (If one prefers, one could also view the time-reversed situation, in which two masses of equal and opposite velocity collide to form a large stationary mass; the analysis of this situation is basically identical to the one given here.) In Newtonian mechanics, we have conservation and additivity of mass, so that $M$ must equal $2m$; but we will not assume conservation and additivity of mass here (and in fact at least one of these laws must break down in special relativity, at least if one insists on using an invariant notion of mass). Instead, we can link $M$, $m$, and $v$ to each other by the law of conservation of energy. Before the disintegration, the body has total energy $M f(0)$, while after the disintegration the system has total energy $m f(v,0,0) + m f(-v,0,0) = 2 m f(v)$ (using the spherically symmetric nature (1) of $f$), and so

$$M f(0) = 2 m f(v). \qquad (2)$$

Now we view the same system relative to another reference frame $O'$, which relative to $O$ is moving at a velocity $(-w,0,0)$ in the $x$ direction for some $w$ with $|w| < c$, while keeping the $y$ and $z$ coordinates unchanged. The spacetime coordinates $(t',x',y',z')$ of $O'$ are then related to those $(t,x,y,z)$ of $O$ by the usual Lorentz transformations

$$t' = \frac{t + wx/c^2}{\sqrt{1-w^2/c^2}}, \quad x' = \frac{x + wt}{\sqrt{1-w^2/c^2}}, \quad y' = y, \quad z' = z,$$

which can be deduced from the postulates of special relativity by a standard derivation that we will not give here (it is sketched in the previous blog post). The pre-disintegration body $B$ is moving along the worldline $\{ (t,0,0,0): t < 0 \}$ in the $O$ reference frame, and is thus moving along the line $\{ (t', wt', 0, 0): t' < 0 \}$ in the $O'$ reference frame; in particular, it has velocity $(w,0,0)$ in this frame and thus has energy $M f(w)$ in this frame.
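As a quick numerical illustration of the change of frame, here is a minimal Python sketch of the boost in the $x$ direction, applied to two events on a worldline to read off the velocity seen in $O'$. The helper names `boost_x` and `velocity_between`, the sample speeds, and the choice of units with $c=1$ are all assumptions made for this example; it is a sanity check, not part of the derivation.

```python
import math


def boost_x(event, w, c=1.0):
    """Coordinates in O' of an event (t, x, y, z) given in O.

    O' moves at velocity (-w, 0, 0) relative to O, matching the sign
    convention of the Lorentz transformations in the text.
    """
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - (w / c) ** 2)
    return (gamma * (t + w * x / c**2), gamma * (x + w * t), y, z)


def velocity_between(event_a, event_b):
    """Average velocity (vx, vy, vz) of a body passing through two events."""
    dt = event_b[0] - event_a[0]
    return tuple((b - a) / dt for a, b in zip(event_a[1:], event_b[1:]))


w = 0.6  # boost speed in units where c = 1 (illustrative value)

# Two events on the worldline of B, which is at rest at the origin of O.
rest_a = boost_x((-2.0, 0.0, 0.0, 0.0), w)
rest_b = boost_x((-1.0, 0.0, 0.0, 0.0), w)
vel_B = velocity_between(rest_a, rest_b)

# Two events on the worldline of a body moving at velocity (v, 0, 0) in O.
v = 0.5
moving_a = boost_x((1.0, v, 0.0, 0.0), w)
moving_b = boost_x((2.0, 2.0 * v, 0.0, 0.0), w)
vel_moving = velocity_between(moving_a, moving_b)

print(vel_B)       # x-component should be close to w
print(vel_moving)  # x-component should be close to (v + w) / (1 + v*w/c^2)
```

The second check anticipates the velocity addition formula: the same boost applied to a body that is already moving in $O$ composes the two velocities relativistically rather than simply adding them.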

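Now consider the first post-disintegration body $B_1$. It is moving along the worldline $\{ (t, vt, 0, 0): t > 0 \}$ in the $O$ reference frame, and thus along the line $\{ (t', \frac{v+w}{1+vw/c^2} t', 0, 0): t' > 0 \}$ in the $O'$ reference frame; in particular, the speed of $B_1$ in this frame is $\frac{v+w}{1+vw/c^2}$ (the well known velocity addition (or subtraction) formula), and so the energy of this body is $m f\left(\frac{w+v}{1+vw/c^2}\right)$. Similarly, $B_2$ has energy $m f\left(\frac{w-v}{1-vw/c^2}\right)$. Equating energies, we are thus led to

$$M f(w) = m f\left(\frac{w+v}{1+vw/c^2}\right) + m f\left(\frac{w-v}{1-vw/c^2}\right).$$

We can eliminate $M$ using (2), to obtain a functional equation for $f$:

$$2 f(v) f(w) = f(0) \left( f\left(\frac{w+v}{1+vw/c^2}\right) + f\left(\frac{w-v}{1-vw/c^2}\right) \right). \qquad (3)$$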

This equation should hold for all (physically attainable) velocities $v, w$. To solve this equation, it is convenient to work with the change of variables

$$v = c \tanh \alpha, \quad w = c \tanh \beta;$$

the hyperbolic angles $\alpha, \beta$ are known as the *rapidities* associated to $v$ and $w$ respectively. The point of using this change of variables is that the hyperbolic tangent addition formula yields

$$\frac{w \pm v}{1 \pm vw/c^2} = c \tanh(\beta \pm \alpha).$$
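The claim that rapidities simply add under composition of velocities is easy to test numerically. The following Python sketch does so for one pair of sample speeds; the function names `rapidity` and `compose`, the sample values, and the units with $c=1$ are assumptions made for illustration only.

```python
import math


def rapidity(v, c=1.0):
    """Rapidity (hyperbolic angle) of a speed v, defined by v = c * tanh(angle)."""
    return math.atanh(v / c)


def compose(w, v, c=1.0):
    """Relativistic composition (w + v) / (1 + v*w/c^2) of collinear velocities."""
    return (w + v) / (1.0 + v * w / c**2)


v, w = 0.5, 0.6  # illustrative speeds in units where c = 1
alpha, beta = rapidity(v), rapidity(w)

# Velocity addition corresponds to adding rapidities ...
sum_direct = compose(w, v)
sum_via_rapidity = math.tanh(beta + alpha)

# ... and velocity subtraction corresponds to subtracting them.
diff_direct = compose(w, -v)
diff_via_rapidity = math.tanh(beta - alpha)

print(sum_direct, sum_via_rapidity)
print(diff_direct, diff_via_rapidity)
```

Each printed pair should agree up to floating point rounding, which is what allows the awkward arguments in (3) to be replaced by plain sums and differences of rapidities.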

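Thus if we make the change of variables

$$g(\alpha) := f(c \tanh \alpha)$$

then (3) simplifies to

$$2 g(\alpha) g(\beta) = g(0) \left( g(\beta+\alpha) + g(\beta-\alpha) \right).$$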

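It is tempting to plug in some special values into this equation, such as $\alpha = 0$, but this only gives a trivial equation. However, if we first differentiate twice in $\alpha$ to obtain

$$2 g''(\alpha) g(\beta) = g(0) \left( g''(\beta+\alpha) + g''(\beta-\alpha) \right)$$

and *then* set $\alpha = 0$, we obtain the non-trivial equation

$$2 g''(0) g(\beta) = 2 g(0) g''(\beta).$$

This is a differential equation in $g$, and can be solved as

$$g(\beta) = A \cosh(k\beta) + B \sinh(k\beta)$$

for some unknowns $A, B$, where $k$ is the square root of $g''(0)/g(0)$. From (1), $g$ should have vanishing derivative at the origin, and so $B = 0$, and so we have

$$g(\beta) = A \cosh(k\beta). \qquad (4)$$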
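One way to see that the one-dimensional argument cannot pin down $k$ is to check numerically that $g(\beta) = A \cosh(k\beta)$ satisfies the rapidity form of the functional equation, $2 g(\alpha) g(\beta) = g(0)(g(\beta+\alpha) + g(\beta-\alpha))$, for every choice of $A$ and $k$. The Python sketch below does this for a few sample values; the function names and the sample parameters are assumptions made for this illustration.

```python
import math


def g(beta, A, k):
    """Candidate solution g(beta) = A * cosh(k * beta)."""
    return A * math.cosh(k * beta)


def residual(alpha, beta, A, k):
    """Left side minus right side of 2 g(a) g(b) = g(0) (g(b + a) + g(b - a))."""
    lhs = 2.0 * g(alpha, A, k) * g(beta, A, k)
    rhs = g(0.0, A, k) * (g(beta + alpha, A, k) + g(beta - alpha, A, k))
    return lhs - rhs


# Sample rapidities and parameters (arbitrary illustrative values).
samples = [(0.4, 0.7), (1.0, 0.2), (0.3, 0.3)]
parameters = [(1.0, 0.5), (2.0, 1.0), (2.0, 3.0)]

max_residual = max(
    abs(residual(alpha, beta, A, k))
    for alpha, beta in samples
    for A, k in parameters
)
print(max_residual)  # should be at the level of floating point rounding
```

The residual vanishes because $\cosh(k(\beta+\alpha)) + \cosh(k(\beta-\alpha)) = 2\cosh(k\alpha)\cosh(k\beta)$ for any $k$, so further input is needed to single out one value.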

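This is significant progress in constraining the behaviour of $g$, but there are still two unknown parameters $A, k$. To proceed further, it becomes necessary to utilise a second dimension. Namely, we repeat the previous arguments, but with $O'$ now moving at velocity $(0,-w,0)$ instead of $(-w,0,0)$. The Lorentz transformations are now

$$t' = \frac{t + wy/c^2}{\sqrt{1-w^2/c^2}}, \quad x' = x, \quad y' = \frac{y + wt}{\sqrt{1-w^2/c^2}}, \quad z' = z.$$

The pre-disintegration body $B$ is moving along the worldline $\{ (t,0,0,0): t < 0 \}$ in the $O$ reference frame, and is thus moving along the line $\{ (t', 0, wt', 0): t' < 0 \}$ in the $O'$ reference frame; in particular, it has velocity $(0,w,0)$ in this frame and thus has energy $M f(w)$ in this frame.

Now consider the first post-disintegration body $B_1$. It is moving along the worldline $\{ (t, vt, 0, 0): t > 0 \}$ in the $O$ reference frame, and thus along the line $\{ (t', \sqrt{1-w^2/c^2}\, v t', w t', 0): t' > 0 \}$ in the $O'$ reference frame; in particular, the speed of $B_1$ in this frame is $\sqrt{v^2 + w^2 - v^2 w^2/c^2}$, and so the energy of this body is $m f\left(\sqrt{v^2 + w^2 - v^2 w^2/c^2}\right)$. Similarly for $B_2$. Equating energies, we are thus led to

$$M f(w) = 2 m f\left(\sqrt{v^2 + w^2 - v^2 w^2/c^2}\right).$$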
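The speed of a body moving along the $x$ axis, as seen from a frame boosted in the $y$ direction, can be checked numerically by transforming two events on its worldline. The Python sketch below compares the result with the closed form $\sqrt{v^2 + w^2 - v^2 w^2/c^2}$; the helper names, the sample speeds, and the units with $c = 1$ are assumptions made for this example.

```python
import math


def boost_y(event, w, c=1.0):
    """Coordinates in O' of an event (t, x, y, z) given in O.

    O' moves at velocity (0, -w, 0) relative to O.
    """
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - (w / c) ** 2)
    return (gamma * (t + w * y / c**2), x, gamma * (y + w * t), z)


def speed_seen_in_boosted_frame(v, w, c=1.0):
    """Speed in O' of a body moving at velocity (v, 0, 0) in O."""
    start = boost_y((0.0, 0.0, 0.0, 0.0), w, c)
    end = boost_y((1.0, v, 0.0, 0.0), w, c)
    dt = end[0] - start[0]
    return math.sqrt(sum(((b - a) / dt) ** 2 for a, b in zip(start[1:], end[1:])))


def closed_form_speed(v, w, c=1.0):
    """The expression sqrt(v^2 + w^2 - v^2 w^2 / c^2) from the text."""
    return math.sqrt(v**2 + w**2 - v**2 * w**2 / c**2)


v, w = 0.5, 0.6  # illustrative speeds in units where c = 1
measured = speed_seen_in_boosted_frame(v, w)
predicted = closed_form_speed(v, w)
print(measured, predicted)  # the two values should agree up to rounding
```

Note that the $x$ component of the velocity is reduced by the factor $\sqrt{1-w^2/c^2}$ because of time dilation, which is the source of the cross term $-v^2 w^2/c^2$.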

We can eliminate $M$ using (2), to obtain a functional equation for $f$:

$$f(v) f(w) = f(0) f\left(\sqrt{v^2 + w^2 - v^2 w^2/c^2}\right).$$

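This equation should hold for all (physically attainable) velocities $v, w$. To solve this equation, we work with infinitesimal $w$ and perform a Taylor expansion. From the symmetry (1), $f$ should be flat at the origin, and so (assuming sufficient smoothness for $f$) we have

$$f(w) = f(0) + \frac{1}{2} f''(0) w^2 + O(w^3)$$

while from the Taylor approximation

$$\sqrt{v^2 + w^2 - v^2 w^2/c^2} = v + \frac{1 - v^2/c^2}{2v} w^2 + O(w^3)$$

we have

$$f\left(\sqrt{v^2 + w^2 - v^2 w^2/c^2}\right) = f(v) + \frac{1 - v^2/c^2}{2v} f'(v) w^2 + O(w^3).$$

Inserting these expansions and extracting the $w^2$ coefficient, we obtain the differential equation

$$\frac{1}{2} f''(0) f(v) = f(0) \frac{1 - v^2/c^2}{2v} f'(v),$$

which we can rewrite as

$$\frac{f'(v)}{f(v)} = \kappa \frac{v/c^2}{1 - v^2/c^2}$$

for some constant $\kappa$ (namely $\kappa = c^2 f''(0)/f(0)$). We can integrate this as

$$\log f(v) = \log C - \frac{\kappa}{2} \log(1 - v^2/c^2)$$

and thus

$$f(v) = C (1 - v^2/c^2)^{-\kappa/2}$$

for some parameters $C, \kappa$. In rapidity coordinates $v = c \tanh \beta$, this becomes

$$g(\beta) = C \cosh^\kappa(\beta).$$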

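Comparing this with (4) (e.g. by performing a Taylor expansion to fourth order around $\beta = 0$) we see that $C = A$ and $\kappa = k^2 = 1$ (the degenerate alternative $\kappa = k = 0$, in which $f$ is constant, is incompatible with the Newtonian approximation), thus

$$g(\beta) = A \cosh \beta$$

or equivalently

$$f(v) = \frac{A}{\sqrt{1 - v^2/c^2}}.$$

Thus we have

$$E(m,v) = \frac{mA}{\sqrt{1 - v^2/c^2}}.$$

For infinitesimal velocities $v$, we may Taylor expand

$$E(m,v) = mA + \frac{1}{2} m \frac{A}{c^2} v^2 + O(v^4),$$

and so the kinetic energy of a slowly moving mass is approximately $\frac{1}{2} m \frac{A}{c^2} v^2$. Comparing this with the Newtonian approximation of $\frac{1}{2} m v^2$ we conclude that $A = c^2$, and thus

$$E(m,v) = \frac{mc^2}{\sqrt{1 - v^2/c^2}}. \qquad (5)$$

In particular, setting $v = 0$ we see that the rest energy of a body of mass $m$ is $E = mc^2$, as required.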
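As a sanity check on the whole argument, one can verify numerically that $f(v) = c^2/\sqrt{1-v^2/c^2}$ satisfies both the one-dimensional equation (3) and the two-dimensional equation $f(v) f(w) = f(0) f(\sqrt{v^2+w^2-v^2w^2/c^2})$, whereas a candidate with $k \neq 1$ passes the first test but fails the second. The Python sketch below is only an illustration: the function names, the sample speeds, the units with $c=1$, and the normalisation $A = c^2/k^2$ used for the $k \neq 1$ candidates are all assumptions made here.

```python
import math

C = 1.0  # work in units where c = 1 (an assumption made for convenience)


def f(v, k=1.0, c=C):
    """Energy per unit mass (c^2/k^2) * cosh(k * beta), with beta = artanh(|v|/c).

    For k = 1 this equals c^2 / sqrt(1 - v^2/c^2), i.e. formula (5) with m = 1.
    """
    beta = math.atanh(abs(v) / c)
    return (c**2 / k**2) * math.cosh(k * beta)


def residual_1d(v, w, k=1.0, c=C):
    """Left side minus right side of the one-dimensional equation (3)."""
    plus = (w + v) / (1.0 + v * w / c**2)
    minus = (w - v) / (1.0 - v * w / c**2)
    return 2.0 * f(v, k, c) * f(w, k, c) - f(0.0, k, c) * (f(plus, k, c) + f(minus, k, c))


def residual_2d(v, w, k=1.0, c=C):
    """Left side minus right side of the two-dimensional functional equation."""
    speed = math.sqrt(v**2 + w**2 - v**2 * w**2 / c**2)
    return f(v, k, c) * f(w, k, c) - f(0.0, k, c) * f(speed, k, c)


v, w = 0.5, 0.6  # illustrative speeds

print(residual_1d(v, w, k=1.0), residual_2d(v, w, k=1.0))  # both near zero
print(residual_1d(v, w, k=2.0), residual_2d(v, w, k=2.0))  # only the first near zero

# Low-velocity check: kinetic energy m * (f(v) - f(0)) versus (1/2) m v^2.
m, slow = 2.0, 1e-3
kinetic = m * (f(slow) - f(0.0))
newtonian = 0.5 * m * slow**2
print(kinetic, newtonian)
```

The failure of the $k = 2$ candidate in the second test is a concrete instance of the extra rigidity that the second spatial dimension provides.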

**Remark 1** The above derivation did not explicitly use the law of conservation of momentum (other than to observe that the scenario of one mass at rest splitting into two smaller masses moving in equal and opposite directions was compatible with this law). Actually, if one *defines* the momentum $p$ of a body of mass $m$ and velocity $v$ by the formula

$$p = \frac{mv}{\sqrt{1 - |v|^2/c^2}}$$

and the momentum of a system as the sum of the momenta of its components, one can use (5) and the Lorentz transformations to (after some algebra) express the total momentum of a system as a linear combination of the total energy of that system viewed in a couple of reference frames (or, if one prefers, as the derivatives of the total energy with respect to infinitesimal reference frame changes), and as a consequence one can actually *derive* the law of conservation of momentum from the law of conservation of energy, together with special relativity. (Actually, this can also be done in Galilean relativity, using the classical formulas $E = \frac{1}{2} m |v|^2$ for the kinetic energy and $p = mv$ for the momentum; we leave this as an exercise to the reader.) Indeed, in special relativity it is natural to unify energy and momentum together as a single quantity known as the *four-momentum*.

**Remark 2** The above arguments ultimately rely on the fact that the Lorentz group $SO(d,1)$ has an essentially unique linear action on $\mathbf{R}^{d+1}$ when the spatial dimension $d$ is at least two. For $d = 1$, the group becomes abelian, and there is a multiplicity of such actions (parameterised by the different possibilities for the quantity $k$ appearing in (4)), and one could *a priori* have a number of different laws relating energy and momentum with mass and velocity that are consistent with special relativity and the conservation laws. Indeed, for any choice of $k > 0$, one could postulate the laws

$$E = \frac{mc^2}{k^2} \cosh(k\beta)$$

and

$$p = \frac{mc}{k} \sinh(k\beta)$$

for the energy and momentum of a body of mass $m$ moving at rapidity $\beta$ (i.e. at velocity $v = c \tanh \beta$). One can verify that such laws are consistent with the laws of conservation of mass and energy, with the postulates of special relativity, and with the Newtonian approximation, as long as one is only in one spatial dimension; one needs to use at least one other dimension to be able to reduce to the $k = 1$ case. Thus we see that higher-dimensional relativity is more rigid than one-dimensional relativity. In the case of Einstein’s original argument, the quantum mechanical properties of photons are used instead to show that $E/p \to c$ in the lightspeed limit $v \to c$, which gives the reduction to $k = 1$.
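To make the role of the parameter $k$ concrete, the following Python sketch takes the one-parameter family to be $E = \frac{mc^2}{k^2}\cosh(k\beta)$, $p = \frac{mc}{k}\sinh(k\beta)$ (a normalisation assumed here so that the Newtonian approximation holds) and checks two things: the low-velocity behaviour is Newtonian for every $k$, while the ratio $E/(pc)$ at large rapidity tends to $1/k$, so that only $k = 1$ is compatible with the photon relation $E = pc$. The function names, sample values, and units with $c=1$ are illustrative assumptions.

```python
import math


def energy(m, beta, k, c=1.0):
    """Hypothetical one-dimensional energy law E = (m c^2 / k^2) cosh(k beta)."""
    return m * c**2 / k**2 * math.cosh(k * beta)


def momentum(m, beta, k, c=1.0):
    """Hypothetical one-dimensional momentum law p = (m c / k) sinh(k beta)."""
    return m * c / k * math.sinh(k * beta)


m, c = 3.0, 1.0

for k in (0.5, 1.0, 2.0):
    # Low-velocity regime: compare with (1/2) m v^2 and m v, where v = c tanh(beta).
    beta_small = 1e-3
    v = c * math.tanh(beta_small)
    kinetic = energy(m, beta_small, k, c) - energy(m, 0.0, k, c)
    print(k, kinetic / (0.5 * m * v**2), momentum(m, beta_small, k, c) / (m * v))

    # Near-lightspeed regime: the ratio E / (p c) approaches 1/k.
    beta_large = 20.0
    ratio = energy(m, beta_large, k, c) / (momentum(m, beta_large, k, c) * c)
    print(k, ratio)
```

For each $k$ the two low-velocity ratios should be close to one, while the high-rapidity ratio should be close to $1/k$, which illustrates how information about the lightspeed limit can substitute for the second spatial dimension.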

## 34 comments


2 October, 2012 at 8:07 pm

Anonymous: Thank you Prof. Tao. It is a great post.

2 October, 2012 at 9:39 pm

Anonymous: Isn’t m missing in the Taylor expression for E(m,v)?

[Corrected, thanks - T.]

3 October, 2012 at 12:27 am

mircea: Is Galileo’s reasoning about linearity of energy in mass ok with special relativity? In which reference frame do we compare the velocities?

3 October, 2012 at 1:41 pm

Terence Tao: Galileo’s argument only requires a single frame of reference. (But thanks to the first postulate of special relativity, the relationship between energy, mass, and velocity should be the same in all reference frames.)

EDIT: Actually, on thinking about it a bit more, Galileo’s argument also needs the fact that the combined mass of two bodies traveling together at the same velocity is the sum of the masses of the individual bodies. By working in a comoving reference frame, it suffices to assume this when both bodies are at rest, which is a plausible enough assumption, but it does require the use of the additional reference frame.

3 October, 2012 at 2:25 am

Bo Jacoby: If a is a constant and x is a variable, the function y=ax is conventionally not written y=xa. How come E=mc^2 is not written E=c^2m?

3 October, 2012 at 3:38 am

Rn: Instead of using the one-dimensional velocity-addition (or subtraction) formula in two separate parts of the proof, what would happen if the two-dimensional formula were used just once? Would that work, or would it just complicate things and not even give you two conditions?

3 October, 2012 at 7:58 am

Terence Tao: Yes, this approach would work too (and ultimately leads back to understanding the representation theory of SO(3,1), which I alluded to earlier).

3 October, 2012 at 3:42 am

Rn: Another blog with special relativity stuff is:

http://thespectrumofriemannium.wordpress.com/

3 October, 2012 at 3:52 am

Number8: “In Newtonian mechanics, we have conservation of mass, so that M must equal m+m; but we will not assume conservation of mass here (and in fact this law turns out to be false in special relativity).” It’s still true for special relativity if the system is closed. The energy given out in a collision still has mass w.r.t. the system’s frame even if it’s in the form of photons.

3 October, 2012 at 7:57 am

Terence Tao: Ah, I realised that “conservation of mass” is a bit ambiguous. It’s really the conjunction of “conservation of mass” (total mass of a system remains constant), “additivity of mass” (mass of a system equals the sum of the masses of the components), and “invariance of mass” (mass is the same in every reference frame) which cannot simultaneously hold in special relativity. Depending on how exactly one defines mass, one can make two of these three properties true in SR, but not all three at once. I’ve reworded the text to reflect this. (Also I now realise, for similar reasons, that additivity of energy is implicitly being relied on rather crucially in the discussion.)

3 October, 2012 at 5:20 am

philh: Typo? In the equation just before (3), Mf(w) = 2m(f(…) + f(…)), I don’t think the 2 should be there.

[Corrected, thanks - T.]

3 October, 2012 at 6:36 am

updog: Nice, but I’m not going to pretend to understand any of that at 7am.

3 October, 2012 at 6:39 am

quantummoxie: Tom Moore has a similar derivation in his fantastic textbook, Six Ideas That Shaped Physics, Unit R: The Laws of Physics Are Frame-Independent, Second Edition (McGraw-Hill, 2003). It’s a bit simpler than yours, but maybe slightly more heuristic.

3 October, 2012 at 7:08 am

Marco M: I have a somewhat similar derivation in section 11 of my paper http://arxiv.org/abs/physics/0605204. There I do not use the Planck energy-frequency relation, but use the relation between energy and momentum in a light wave, which was known before the theory of relativity or quantum mechanics.

16 October, 2012 at 7:04 am

mmoriconi: Oops, sorry about this comment. I had no intention to promote any results. Just read the “about” section. My apologies if the post was inappropriate…

3 October, 2012 at 9:33 am

Hager El-Boghdady (@H_Sayed_M): It is a great post …

3 October, 2012 at 1:18 pm

teacher: You lost me right at equation (2). Why would we assume that the energies before and after the “disintegration” are equal in your scenario? You say that a body B of mass M “somehow disintegrates”… but in real life, massive bodies don’t split into two pieces flying off in equal and opposite directions without an energy input (some dynamite, perhaps). (Similarly, if you reverse time and have the two bodies collide, the energy of their motion will dissipate in one way or another, heat and light and sound, say.) So I’m not getting the physical intuition here that would lead me to equate the left and right sides in (2). Am I being too literal here, or missing some subtlety?

3 October, 2012 at 1:33 pm

Terence Tao: Well, the body can certainly contain within it some energy source (like the dynamite you mention, which contains energy and hence mass within its chemical bonds), and the total mass of that body will then have to take that energy into consideration. In the context of two bodies colliding, the new body might acquire some additional energy (through heat, etc.), and again this would be reflected in the total mass of the new body.

Note at the elementary particle level that it is quite common for one particle to spontaneously disintegrate into two or more particles (e.g. a neutron can decay into a proton, electron, and neutrino, a radioactive nucleus can decay into a smaller nucleus and an alpha particle, and so forth). Implicit in the above analysis is the assumption that there are enough interactions of this form to span enough physically realisable values of mass, velocity, etc. that one can safely extrapolate to arbitrary values of these parameters. (By the way, the assumption that the initial body splits into two bodies of equal mass is mostly for mathematical convenience; a similar argument would also work for more complicated collisions in which some number of bodies of various masses came together and formed some other number of bodies, again of various masses, moving with different velocities, but the mathematical analysis of such collisions becomes messier.)

3 October, 2012 at 2:27 pm

teacher: Thanks for the clarification! I guess the problem is that I’m not automatically taking into account the equivalence of mass and energy; I mean, I understand it abstractly well enough (in a limited, undergraduate-physics sense), but it’s not intuitive to me when I’m picturing concrete physical objects, as in your scenario. (I know, of course, that it can be measured concretely under the right experimental conditions, e.g. the gases released by the dynamite, cooled down and collected, would weigh ever so slightly less than the dynamite itself, but the effect is so far from perceptible under ordinary conditions that it doesn’t translate into real intuition.)

I think that the problem is that I was looking to your derivation of the relationship between mass and energy precisely in order to *supply* some of that intuition. And I guess to do so would be a bit circular, since in order to accept equation (2), I need to understand that the total mass of the object must take into account the potential energy of the “dynamite,” which means that I have to accept mass-energy equivalence in the first place. Which I suppose is OK — you’re not trying to prove mass-energy equivalence, just to quantify it (E=mc^2) — is that a reasonable assessment?

3 October, 2012 at 2:49 pm

Terence Tao: Yes, by assuming the functional relationship E = E(m,v), one is implicitly assuming some sort of mass-energy equivalence, which one is then trying to quantify. As mentioned at the start of the derivation, this is a non-trivial assumption. (But given that kinetic energy, at least, depends only on mass and velocity in Newtonian mechanics, it is not too unreasonable of a hypothesis.)

At the elementary particle level, this sort of assumption becomes more plausible because the particles have almost no internal structure which could conceivably supply a non-mass source of energy.

3 October, 2012 at 6:14 pm

teacher: Gotcha! After reading the whole derivation through a couple of times I had a moment of sudden enlightenment – in which I, all at once, thoroughly understood, and more importantly felt intuition for, the whole story. Thanks for this post and for engaging with me here in the comments – I really learned something.


4 October, 2012 at 7:39 pm

Aleksey: In the part where you say that “The pre-disintegration body B is moving along the worldline in the O reference frame, and is thus moving along the line in the O’ reference frame”, you are missing a 0 for one of the spatial coordinates.

[Corrected, thanks - T.]

5 October, 2012 at 1:25 am

Anonymous: Referring to the second volume of the Course of Theoretical Physics by Landau and Lifshitz, one can first find the action principle for special relativity. The action must be invariant under Lorentz transformations and the integrand must be a differential of first order, so the integrand should be A*ds where A is a constant and ds = sqrt(c^2(dt)^2 – (dx)^2 – (dy)^2 – (dz)^2). The low-velocity limit should give the action for Newtonian mechanics, so A can be found. Then p = dL/dv and E = p.v – L.

5 October, 2012 at 3:10 am

chorasimilarity: You take the soul out of the craft. On the other side, the Landau & Lifshitz derivation is clear and physical (i.e. frame independent, like in geometry).


13 October, 2012 at 7:37 am

tubinhpham: It’s great. But remains forever true and beautiful.

Thank Prof T. Tao

13 October, 2012 at 6:28 pm

Sujit Nair: In Remark 1, you state that “…one can actually derive the law of conservation of momentum from the law of conservation of energy, together with special relativity.” I always think of conservation laws as a consequence of symmetry + action principle + Noether’s Theorem. I am curious to know if what you state in this remark is equivalent to the Noether approach.

14 October, 2012 at 7:54 am

Terence Tao: When the least action principle is one of the laws of physics, yes; the fact that momentum conservation can be derived from energy conservation and Lorentz invariance is equivalent (via Noether’s theorem) to the fact that spatial translation symmetry can be derived from time translation symmetry and Lorentz invariance. But one could conceivably imagine alternate laws of physics in which there was no action principle, but for which Lorentz invariance and conservation of energy/momentum were still valid (but coming from some source other than minimisation of a Lagrangian).

14 October, 2012 at 6:46 am

John Jiang: This really fills in the missing part of Einstein’s layman treatment of SR, where he derives all the Lorentz spatial time transformations like a charm but fell short of giving a convincing (or at least accessible) argument for the energy mass relation. I like the mention of rigidity of SO(1,2) representations, but wonder if anything special occurs in d = 3. I remember reading an article in Scientific American years ago on that we are actually living in 2-d, like a holograph. Maybe there is some evidence in this derivation?

It also seems to me all the non-rigorous part of the derivation attributes to smoothness or continuity assumptions. Is that correct? I wonder if there is a formalism that clarifies the relation between regularity assumptions and physics derivation of the kind above.


16 November, 2012 at 2:16 am

Anonymous: I got it…

20 November, 2012 at 8:42 pm

Bob: You say that “it would be difficult to say anything non-tautological at all about the physical world if one could rely solely on 100% rigorous mathematical reasoning”.

I think, in fact, that even tautological statements could not be asserted, for in the absence of any information about the physical world, we can’t assume that we can even use logic (of any kind) and rules of inference to say anything about the world. It could very well be that there are worlds where completely different logical systems apply (or worlds where no such systems apply at all), and where tautologies deduced using rules of reasoning appropriate to our particular world would not apply.