You are currently browsing the monthly archive for December 2022.

This post is an unofficial sequel to one of my first blog posts from 2007, which was entitled “Quantum mechanics and Tomb Raider”.

One of the oldest and most famous allegories is Plato’s allegory of the cave. This allegory centers around a group of people chained to a wall in a cave who cannot see themselves or each other, but only the two-dimensional shadows of themselves cast on the wall in front of them by some light source they cannot directly see. Because of this, they identify reality with this two-dimensional representation, and have significant conceptual difficulties in trying to view themselves (or the world as a whole) as three-dimensional, until they are freed from the cave and able to venture into the sunlight.

There is a similar conceptual difficulty when trying to understand Einstein’s theory of special relativity (and more so for general relativity, but let us focus on special relativity for now). We are very much accustomed to thinking of reality as a three-dimensional space endowed with a Euclidean geometry that we traverse through in time, but in order to have the clearest view of the universe of special relativity it is better to think of reality instead as a four-dimensional spacetime that is endowed instead with a Minkowski geometry, which mathematically is similar to a (four-dimensional) Euclidean space but with a crucial change of sign in the underlying metric. Indeed, whereas the distance ${ds}$ between two points in Euclidean space ${{\bf R}^3}$ is given by the three-dimensional Pythagorean theorem

$\displaystyle ds^2 = dx^2 + dy^2 + dz^2$

under some standard Cartesian coordinate system ${(x,y,z)}$ of that space, and the distance ${ds}$ in a four-dimensional Euclidean space ${{\bf R}^4}$ would be similarly given by

$\displaystyle ds^2 = dx^2 + dy^2 + dz^2 + du^2$

under a standard four-dimensional Cartesian coordinate system ${(x,y,z,u)}$, the spacetime interval ${ds}$ in Minkowski space is given by

$\displaystyle ds^2 = dx^2 + dy^2 + dz^2 - c^2 dt^2$

(though in many texts the opposite sign convention ${ds^2 = -dx^2 -dy^2 - dz^2 + c^2dt^2}$ is preferred) in spacetime coordinates ${(x,y,z,t)}$, where ${c}$ is the speed of light. The geometry of Minkowski space is then quite similar algebraically to the geometry of Euclidean space (with the sign change replacing the traditional trigonometric functions ${\sin, \cos, \tan}$, etc. by their hyperbolic counterparts ${\sinh, \cosh, \tanh}$, and with various factors involving “${c}$” inserted in the formulae), but also has some qualitative differences to Euclidean space, most notably a causality structure connected to light cones that has no obvious counterpart in Euclidean space.
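As a quick numerical sanity check of the role played by the sign change (a minimal sketch only; the function names `interval_squared` and `boost_x` are hypothetical, and units with ${c=1}$ are assumed for the demonstration), one can verify that a Lorentz boost, written as a hyperbolic rotation by a rapidity ${\phi}$ with ${\tanh \phi = v/c}$, preserves ${ds^2}$ in the same way that an ordinary rotation preserves Euclidean distance:

```python
import math

def interval_squared(dx, dy, dz, dt, c=1.0):
    """Minkowski interval ds^2 = dx^2 + dy^2 + dz^2 - c^2 dt^2."""
    return dx * dx + dy * dy + dz * dz - (c * dt) ** 2

def boost_x(dx, dy, dz, dt, v, c=1.0):
    """Lorentz boost with velocity v along the x-axis, as a hyperbolic rotation."""
    phi = math.atanh(v / c)              # rapidity; requires |v| < c
    ch, sh = math.cosh(phi), math.sinh(phi)
    new_dx = dx * ch - c * dt * sh       # mixes the x and ct coordinates
    new_ct = c * dt * ch - dx * sh
    return new_dx, dy, dz, new_ct / c

event = (3.0, 1.0, 2.0, 5.0)             # an illustrative separation (dx, dy, dz, dt)
boosted = boost_x(*event, v=0.6)
# The two printed values should agree up to rounding error.
print(interval_squared(*event), interval_squared(*boosted))
```

The hyperbolic functions appear here exactly where ${\cos}$ and ${\sin}$ would appear in a Euclidean rotation, which is the algebraic similarity referred to above.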

That said, the analogy between Minkowski space and four-dimensional Euclidean space is strong enough that it serves as a useful conceptual aid when first learning special relativity; for instance the excellent introductory text “Spacetime physics” by Taylor and Wheeler very much adopts this view. On the other hand, this analogy doesn’t directly address the conceptual problem mentioned earlier of viewing reality as a four-dimensional spacetime in the first place, rather than as a three-dimensional space that objects move around in as time progresses. Of course, part of the issue is that we aren’t good at directly visualizing four dimensions in the first place. This latter problem can at least be easily addressed by removing one or two spatial dimensions from this framework – and indeed many relativity texts start with the simplified setting of only having one spatial dimension, so that spacetime becomes two-dimensional and can be depicted with relative ease by spacetime diagrams – but still there is conceptual resistance to the idea of treating time as another spatial dimension, since we clearly cannot “move around” in time as freely as we can in space, nor do we seem able to easily “rotate” between the spatial and temporal axes, the way that we can between the three coordinate axes of Euclidean space.

With this in mind, I thought it might be worth attempting a Plato-type allegory to reconcile the spatial and spacetime views of reality, in a way that can be used to describe (analogues of) some of the less intuitive features of relativity, such as time dilation, length contraction, and the relativity of simultaneity. I have (somewhat whimsically) decided to place this allegory in a Tolkienesque fantasy world (similarly to how my previous allegory to describe quantum mechanics was phrased in a world based on the computer game “Tomb Raider”). This is something of an experiment, and (like any other analogy) the allegory will not be able to perfectly capture every aspect of the phenomenon it is trying to represent, so any feedback to improve the allegory would be appreciated.

If ${\lambda>0}$, a Poisson random variable ${{\bf Poisson}(\lambda)}$ with mean ${\lambda}$ is a random variable taking values in the natural numbers with probability distribution

$\displaystyle {\bf P}( {\bf Poisson}(\lambda) = k) = e^{-\lambda} \frac{\lambda^k}{k!}.$

One is often interested in bounding upper tail probabilities

$\displaystyle {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u))$

for ${u \geq 0}$, or lower tail probabilities

$\displaystyle {\bf P}( {\bf Poisson}(\lambda) \leq \lambda(1+u))$

for ${-1 < u \leq 0}$. A standard tool for this is Bennett’s inequality:

Proposition 1 (Bennett’s inequality) One has

$\displaystyle {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u)) \leq \exp(-\lambda h(u))$

for ${u \geq 0}$ and

$\displaystyle {\bf P}( {\bf Poisson}(\lambda) \leq \lambda(1+u)) \leq \exp(-\lambda h(u))$

for ${-1 < u \leq 0}$, where

$\displaystyle h(u) := (1+u) \log(1+u) - u.$

From the Taylor expansion ${h(u) = \frac{u^2}{2} + O(u^3)}$ for ${u=O(1)}$ we conclude Gaussian type tail bounds in the regime ${u = o(1)}$ (and in particular when ${u = O(1/\sqrt{\lambda})}$), in the spirit of the Chernoff, Bernstein, and Hoeffding inequalities; but in the regime where ${u}$ is large and positive one obtains a slight gain over these other classical bounds (of ${\exp(- \lambda u \log u)}$ type, rather than ${\exp(-\lambda u)}$).
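For readers who like to see such bounds numerically, here is a minimal sketch (not part of the argument; the helper names `h`, `poisson_pmf`, `upper_tail` and `lower_tail` and the sample values of ${\lambda}$ and ${u}$ are illustrative assumptions) that compares the exact Poisson tails, computed by direct summation, with the bound ${\exp(-\lambda h(u))}$ from Proposition 1:

```python
import math

def h(u):
    """Bennett's function h(u) = (1+u) log(1+u) - u for u > -1."""
    return (1.0 + u) * math.log1p(u) - u

def poisson_pmf(lam, k):
    """P(Poisson(lam) = k), computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def upper_tail(lam, u):
    """P(Poisson(lam) >= lam (1+u)) for u >= 0, by direct summation."""
    k = math.ceil(lam * (1.0 + u))
    total = 0.0
    while True:
        term = poisson_pmf(lam, k)
        total += term
        # Past the mean the terms decay rapidly; stop once they are negligible.
        if k > lam and term <= 1e-17 * total:
            return total
        k += 1

def lower_tail(lam, u):
    """P(Poisson(lam) <= lam (1+u)) for -1 < u <= 0, by direct summation."""
    top = math.floor(lam * (1.0 + u))
    return sum(poisson_pmf(lam, k) for k in range(top + 1))

lam = 50.0
for u in (0.2, 1.0, 3.0):
    print(u, upper_tail(lam, u), math.exp(-lam * h(u)))   # exact tail, Bennett bound
for u in (-0.2, -0.6):
    print(u, lower_tail(lam, u), math.exp(-lam * h(u)))
```

In each printed row the exact tail should sit below the Bennett bound, with the gap between the two being the subject of the improvement discussed later in this post.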

Proof: We use the exponential moment method. For any ${t \geq 0}$, we have from Markov’s inequality that

$\displaystyle {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u)) \leq e^{-t \lambda(1+u)} {\bf E} \exp( t {\bf Poisson}(\lambda) ).$

A standard computation shows that the moment generating function of the Poisson distribution is given by

$\displaystyle {\bf E} \exp( t {\bf Poisson}(\lambda) ) = \exp( (e^t - 1) \lambda )$

and hence

$\displaystyle {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u)) \leq \exp( (e^t - 1)\lambda - t \lambda(1+u) ).$

For ${u \geq 0}$, it turns out that the right-hand side is optimized by setting ${t = \log(1+u)}$, in which case the right-hand side simplifies to ${\exp(-\lambda h(u))}$. This proves the first inequality; the second inequality is proven similarly (but now ${u}$ and ${t}$ are non-positive rather than non-negative). $\Box$
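For convenience, the two computations used in the above proof can be spelled out as follows (a routine verification; the symbol ${\phi}$ is notation introduced only for this remark). The moment generating function is obtained by summing the exponential series:

$\displaystyle {\bf E} \exp( t {\bf Poisson}(\lambda) ) = \sum_{k=0}^\infty e^{tk} e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^\infty \frac{(\lambda e^t)^k}{k!} = \exp( (e^t - 1) \lambda ).$

As for the optimization, the exponent ${\phi(t) := (e^t-1)\lambda - t\lambda(1+u)}$ is convex in ${t}$ with derivative ${\phi'(t) = \lambda e^t - \lambda(1+u)}$, which vanishes exactly when ${t = \log(1+u)}$, and at this value of ${t}$ one has

$\displaystyle \phi(\log(1+u)) = \lambda u - \lambda(1+u) \log(1+u) = -\lambda h(u).$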

Remark 2 Bennett’s inequality also applies for (suitably normalized) sums of bounded independent random variables. In some cases there are direct comparison inequalities available to relate those variables to the Poisson case. For instance, suppose ${S = X_1 + \dots + X_n}$ is the sum of independent Boolean variables ${X_1,\dots,X_n \in \{0,1\}}$ of total mean ${\sum_{j=1}^n {\bf E} X_j = \lambda}$ and with ${\sup_i {\bf P}(X_i = 1) \leq \varepsilon}$ for some ${0 < \varepsilon < 1}$. Then for any natural number ${k}$, we have

$\displaystyle {\bf P}(S=k) = \sum_{1 \leq i_1 < \dots < i_k \leq n} {\bf P}(X_{i_1}=1) \dots {\bf P}(X_{i_k}=1)$

$\displaystyle \times \prod_{i \neq i_1,\dots,i_k} {\bf P}(X_i=0)$

$\displaystyle \leq \frac{1}{k!} (\sum_{i=1}^n \frac{{\bf P}(X_i=1)}{{\bf P}(X_i=0)})^k \times \prod_{i=1}^n {\bf P}(X_i=0)$

$\displaystyle \leq \frac{1}{k!} (\frac{\lambda}{1-\varepsilon})^k \prod_{i=1}^n \exp( - {\bf P}(X_i = 1))$

$\displaystyle \leq e^{-\lambda} \frac{\lambda^k}{(1-\varepsilon)^k k!}$

$\displaystyle \leq e^{\frac{\varepsilon}{1-\varepsilon} \lambda} {\bf P}( \mathbf{Poisson}(\frac{\lambda}{1-\varepsilon}) = k).$

As such, for ${\varepsilon}$ small, one can efficiently control the tail probabilities of ${S}$ in terms of the tail probability of a Poisson random variable of mean close to ${\lambda}$; this is of course very closely related to the well-known fact that the Poisson distribution emerges as the limit of sums of many independent Boolean variables, each of which is non-zero with small probability. See this paper of Bentkus and this paper of Pinelis for some further useful (and less obvious) comparison inequalities of this type.
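The comparison inequality in this remark can be tested numerically. The following is a minimal sketch under made-up inputs (the probabilities `ps` and the helper names `bernoulli_sum_pmf`, `poisson_pmf` and `comparison_bound` are illustrative assumptions, not taken from the papers cited above): it computes the exact law of ${S}$ by dynamic programming and compares it pointwise against ${e^{\frac{\varepsilon}{1-\varepsilon} \lambda} {\bf P}( \mathbf{Poisson}(\frac{\lambda}{1-\varepsilon}) = k)}$.

```python
import math

def bernoulli_sum_pmf(ps):
    """Exact law of S = X_1 + ... + X_n for independent Bernoulli(p_i) variables."""
    dist = [1.0]                          # law of the empty sum
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)    # X_i = 0
            new[k + 1] += mass * p        # X_i = 1
        dist = new
    return dist

def poisson_pmf(mu, k):
    """P(Poisson(mu) = k), computed in log space."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def comparison_bound(ps, k):
    """Right-hand side of the comparison inequality, with eps = max_i p_i."""
    lam, eps = sum(ps), max(ps)
    return math.exp(eps * lam / (1.0 - eps)) * poisson_pmf(lam / (1.0 - eps), k)

# Hypothetical example: 50 variables with probabilities between 0.01 and 0.05.
ps = [0.01 * (1 + i % 5) for i in range(50)]
dist = bernoulli_sum_pmf(ps)
worst = max(dist[k] / comparison_bound(ps, k) for k in range(len(dist)))
print("largest ratio P(S=k) / bound:", worst)   # the inequality predicts a value <= 1
```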

In this note I wanted to record the observation that one can improve the Bennett bound by a small polynomial factor once one leaves the Gaussian regime ${u = O(1/\sqrt{\lambda})}$, in particular gaining a factor of ${1/\sqrt{\lambda}}$ when ${u \sim 1}$. This observation is not difficult and is implicit in the literature (one can extract it for instance from the much more general results of this paper of Talagrand, and the basic idea already appears in this paper of Glynn), but I was not able to find a clean version of this statement in the literature, so I am placing it here on my blog. (But if a reader knows of a reference that basically contains the bound below, I would be happy to know of it.)

Proposition 3 (Improved Bennett’s inequality) One has

$\displaystyle {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u)) \ll \frac{\exp(-\lambda h(u))}{\sqrt{1 + \lambda \min(u, u^2)}}$

for ${u \geq 0}$ and

$\displaystyle {\bf P}( {\bf Poisson}(\lambda) \leq \lambda(1+u)) \ll \frac{\exp(-\lambda h(u))}{\sqrt{1 + \lambda u^2 (1+u)}}$

for ${-1 < u \leq 0}$.

Proof: We begin with the first inequality. We may assume that ${u \geq 1/\sqrt{\lambda}}$, since otherwise the claim follows from the usual Bennett inequality. We expand out the left-hand side as

$\displaystyle e^{-\lambda} \sum_{k \geq \lambda(1+u)} \frac{\lambda^k}{k!}.$

Observe that for ${k \geq \lambda(1+u)}$ we have

$\displaystyle \frac{\lambda^{k+1}}{(k+1)!} \leq \frac{1}{1+u} \frac{\lambda^{k}}{k!} .$

Thus the sum is dominated by the first term times a geometric series ${\sum_{j=0}^\infty \frac{1}{(1+u)^j} = 1 + \frac{1}{u}}$. We can thus bound the left-hand side by

$\displaystyle \ll e^{-\lambda} (1 + \frac{1}{u}) \sup_{k \geq \lambda(1+u)} \frac{\lambda^k}{k!}.$

By the Stirling approximation, this is

$\displaystyle \ll e^{-\lambda} (1 + \frac{1}{u}) \sup_{k \geq \lambda(1+u)} \frac{1}{\sqrt{k}} \frac{(e\lambda)^k}{k^k}.$

The expression inside the supremum is decreasing in ${k}$ for ${k > \lambda}$, thus we can bound it by

$\displaystyle \ll e^{-\lambda} (1 + \frac{1}{u}) \frac{1}{\sqrt{\lambda(1+u)}} \frac{(e\lambda)^{\lambda(1+u)}}{(\lambda(1+u))^{\lambda(1+u)}},$

which simplifies to

$\displaystyle \ll \frac{\exp(-\lambda h(u))}{\sqrt{1 + \lambda \min(u, u^2)}}$

after a routine calculation.
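To spell out this routine calculation (a sketch, working in the regime ${\lambda \min(u,u^2) \geq 1}$, since for ${\lambda \min(u,u^2) < 1}$ the classical Bennett bound already gives the claim): the exponential factors combine as

$\displaystyle e^{-\lambda} \frac{(e\lambda)^{\lambda(1+u)}}{(\lambda(1+u))^{\lambda(1+u)}} = \exp( -\lambda + \lambda(1+u) - \lambda(1+u) \log(1+u) ) = \exp(-\lambda h(u)),$

while the remaining factor obeys

$\displaystyle (1 + \frac{1}{u}) \frac{1}{\sqrt{\lambda(1+u)}} = \sqrt{\frac{1+u}{\lambda u^2}} \leq \sqrt{\frac{2}{\lambda \min(u,u^2)}} \leq \frac{2}{\sqrt{1 + \lambda \min(u,u^2)}}.$

Here the first inequality follows by treating the cases ${u \leq 1}$ (where ${1+u \leq 2}$) and ${u > 1}$ (where ${1+u \leq 2u}$) separately, and the second uses ${1 + \lambda \min(u,u^2) \leq 2 \lambda \min(u,u^2)}$.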

Now we turn to the second inequality. As before we may assume that ${u \leq -1/\sqrt{\lambda}}$. We first dispose of a degenerate case in which ${\lambda(1+u) < 1}$. Here the left-hand side is just

$\displaystyle {\bf P}( {\bf Poisson}(\lambda) = 0 ) = e^{-\lambda}$

and the right-hand side is comparable to

$\displaystyle e^{-\lambda} \exp( - \lambda (1+u) \log (1+u) + \lambda(1+u) ) / \sqrt{\lambda(1+u)}.$

Since ${-\lambda(1+u) \log(1+u)}$ is non-negative and ${0 < \lambda(1+u) < 1}$, we see that the right-hand side is ${\gg e^{-\lambda}}$, and the estimate holds in this case.

It remains to consider the regime where ${u \leq -1/\sqrt{\lambda}}$ and ${\lambda(1+u) \geq 1}$. The left-hand side expands as

$\displaystyle e^{-\lambda} \sum_{k \leq \lambda(1+u)} \frac{\lambda^k}{k!}.$

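The sum is dominated by its last term (the one with the largest ${k}$) times a geometric series ${\sum_{j=-\infty}^0 \frac{1}{(1+u)^j} = \frac{1}{|u|}}$, since consecutive terms satisfy ${\frac{\lambda^{k-1}}{(k-1)!} = \frac{k}{\lambda} \frac{\lambda^k}{k!} \leq (1+u) \frac{\lambda^k}{k!}}$ for ${k \leq \lambda(1+u)}$. The maximal ${k}$ is comparable to ${\lambda(1+u)}$, so we can bound the left-hand side by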

$\displaystyle \ll e^{-\lambda} \frac{1}{|u|} \sup_{\lambda(1+u) \ll k \leq \lambda(1+u)} \frac{\lambda^k}{k!}.$

Using the Stirling approximation as before we can bound this by

$\displaystyle \ll e^{-\lambda} \frac{1}{|u|} \frac{1}{\sqrt{\lambda(1+u)}} \frac{(e\lambda)^{\lambda(1+u)}}{(\lambda(1+u))^{\lambda(1+u)}},$

which simplifies to

$\displaystyle \ll \frac{\exp(-\lambda h(u))}{\sqrt{1 + \lambda u^2 (1+u)}}$

after a routine calculation. $\Box$
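The routine calculation for the second inequality can be sketched in the same spirit (a verification only, under the standing assumptions ${u \leq -1/\sqrt{\lambda}}$ and ${\lambda(1+u) \geq 1}$). The exponential factors again combine to give

$\displaystyle e^{-\lambda} \frac{(e\lambda)^{\lambda(1+u)}}{(\lambda(1+u))^{\lambda(1+u)}} = \exp( \lambda u - \lambda(1+u) \log(1+u) ) = \exp(-\lambda h(u)),$

while

$\displaystyle \frac{1}{|u|} \frac{1}{\sqrt{\lambda(1+u)}} = \frac{1}{\sqrt{\lambda u^2 (1+u)}}.$

Since ${\lambda u^2 \geq 1}$ and ${\lambda(1+u) \geq 1}$, splitting into the cases ${|u| \leq 1/2}$ (where ${1+u \geq 1/2}$) and ${|u| > 1/2}$ (where ${u^2 > 1/4}$) gives ${\lambda u^2(1+u) \geq 1/4}$, hence ${1 + \lambda u^2 (1+u) \leq 5 \lambda u^2(1+u)}$, and the right-hand side of the last display is at most ${\sqrt{5} / \sqrt{1 + \lambda u^2(1+u)}}$.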

The same analysis can be reversed to show that the bounds given above are basically sharp up to constants, at least when ${\lambda}$ (and ${\lambda(1+u)}$) are large.
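As a numerical illustration of this sharpness (a sanity check rather than a proof; the helper names below and the choice ${u=1}$ are illustrative assumptions, and the implied constant in Proposition 3 is simply taken to be ${1}$), one can compare the exact upper tail with both the classical and the improved bounds as ${\lambda}$ grows:

```python
import math

def h(u):
    """Bennett's function h(u) = (1+u) log(1+u) - u."""
    return (1.0 + u) * math.log1p(u) - u

def upper_tail(lam, u):
    """P(Poisson(lam) >= lam (1+u)) for u >= 0, by direct summation in log space."""
    k = math.ceil(lam * (1.0 + u))
    total = 0.0
    while True:
        term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += term
        # Past the mean the terms decay rapidly; stop once they are negligible.
        if k > lam and term <= 1e-17 * total:
            return total
        k += 1

def bennett(lam, u):
    """Classical bound exp(-lam h(u)) from Proposition 1."""
    return math.exp(-lam * h(u))

def improved(lam, u):
    """Upper-tail bound of Proposition 3, with the implied constant set to 1."""
    return bennett(lam, u) / math.sqrt(1.0 + lam * min(u, u * u))

for lam in (10.0, 100.0, 1000.0):
    tail = upper_tail(lam, 1.0)
    # Columns: lambda, tail / classical bound, tail / improved bound.
    print(lam, tail / bennett(lam, 1.0), tail / improved(lam, 1.0))
```

One would expect the middle column to decay roughly like ${1/\sqrt{\lambda}}$, while the last column stays bounded above and below by absolute constants, consistent with the claim that the improved bound is sharp up to constants in this regime.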

[The following information was provided to me by Geordie Williamson, who is Director of the Sydney Mathematical Research Institute – T.]

We are currently advertising two positions in math and AI:

Both positions are for three years and are based at the Sydney Mathematical Research Institute. The positions are research only, but teaching at the University of Sydney is possible if desired. The successful candidate will have considerable time and flexibility to pursue their own research program.

We are after either:

1. excellent mathematicians with some interest in programming and modern AI;
2. excellent computer scientists with some interest and background in mathematics, as well as an interest in using AI to attack tough problems in mathematics.