In this post I would like to collect a list of resources that are available to mathematicians displaced by conflict. Here are some general resources:

There are also resources specific to the current crisis:

Finally, there are a number of institutes and departments who are willing to extend visiting or adjunct positions to such displaced mathematicians:

If readers have other such resources to contribute (or to update the ones already listed), please do so in the comments and I will modify the above lists as appropriate.

As with the previous post, any purely political comment not focused on such resources will be considered off-topic and thus subject to deletion.

[Note: while I am chair of the ICM Structure Committee, this blog post is not an official request from this committee, as events are still moving too rapidly to proceed at present via normal committee deliberations. We are however discussing these matters and may issue a more formal request in due course. -T.]

The International Mathematical Union has just made the following announcement concerning the International Congress of Mathematicians (ICM) that was previously scheduled to be held in St. Petersburg, Russia in July.

Decision of the Executive Committee of the IMU on the upcoming ICM 2022 and IMU General Assembly

On 26 February 2022, the Executive Committee of the International Mathematical Union (IMU) decided that:

1. The International Congress of Mathematicians (ICM) 2022 will take place as a fully virtual event, hosted outside Russia but following the original time schedule planned for Saint Petersburg.
2. Participation in the virtual ICM event will be free of charge.
3. The IMU General Assembly (GA) will take place as an in-person event outside Russia.
4. A prize ceremony will be held the day after the IMU GA, at the same venue as the IMU GA, for the awarding of the 2022 IMU prizes.
5. The dates for the ICM and the GA will remain unaltered.
6. We will return with further practical information regarding the two events.

An expanded version of the announcement can be found here. (See also this addendum.)

While I am not on the IMU Executive Committee and thus not privy to their deliberations, I have been in contact with several members of this committee and I support their final decision on these matters.

As we have all experienced during the COVID-19 pandemic, virtual conferences can be rather variable in quality, but there certainly are ways to make the experience more positive for both the speakers and participants. In the interest of maximizing the benefits that this meeting can still produce, I would like to invite readers of this blog to share any experiences they have had with very large virtual conferences, and any opinions on what types of virtual events were effective and engaging.

One idea that has been suggested to me has been to have (either unofficial, semi-official, or official) regional ICM hosting events at various places worldwide where mathematicians could gather in person to view ICM talks that would be streamed online (and perhaps some ICM speakers from that area could give talks in person in such locations). This would be very nonstandard, of course, but could be one way to salvage some of the physical ICM experience, and perhaps also a way to symbolically support the spirit of the Congress. I would be interested to get some feedback on this proposal.

Finally, I would like to request that comments to this post remain focused on the upcoming virtual ICM. Broader political issues are very much worth discussing at present, but there are other venues for such discussion, and as per my usual blog policy any off-topic comments may be subject to deletion.

As I have mentioned in some recent posts, I am interested in exploring unconventional modalities for presenting mathematics, for instance using media with high production value. One such recent example of this I saw was a presentation of the fundamental zero product property (or domain property) of the real numbers – namely, that ab=0 implies a=0 or b=0 for real numbers a,b – expressed through the medium of German-language rap:

EDIT: and here is a lesson on fractions, expressed through the medium of a burger chain advertisement:

I’d be interested to know what further examples of this type are out there.

SECOND EDIT: The following two examples from Wired magazine are slightly more conventional in nature, but still worth mentioning, I think. Firstly, my colleague at UCLA, Amit Sahai, presents the concept of zero knowledge proofs at various levels of technicality:

Secondly, Moon Duchin answers math questions of all sorts from Twitter:

I’ve just uploaded to the arXiv my preprint “Perfectly packing a square by squares of nearly harmonic sidelength“. This paper concerns a variant of an old problem of Meir and Moser, who ask whether it is possible to perfectly pack squares of sidelength {1/n} for {n \geq 2} into a single square or rectangle of area {\sum_{n=2}^\infty \frac{1}{n^2} = \frac{\pi^2}{6} - 1}. (The following variant problem, also posed by Meir and Moser and discussed for instance in this MathOverflow post, is perhaps even more well known: is it possible to perfectly pack rectangles of dimensions {1/n \times 1/(n+1)} for {n \geq 1} into a single square of area {\sum_{n=1}^\infty \frac{1}{n(n+1)} = 1}?) For the purposes of this paper, rectangles and squares are understood to have sides parallel to the axes, and a packing is perfect if it partitions the region being packed up to sets of measure zero. As one partial result towards these problems, it was shown by Paulhus that squares of sidelength {1/n} for {n \geq 2} can be packed (not quite perfectly) into a single rectangle of area {\frac{\pi^2}{6} - 1 + \frac{1}{1244918662}}, and rectangles of dimensions {1/n \times 1/(n+1)} for {n \geq 1} can be packed (again not quite perfectly) into a single square of area {1 + \frac{1}{10^9+1}}. (Paulhus’s paper had some gaps in it, but these were subsequently repaired by Grzegorek and Januszewski.)
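
As a quick sanity check on the two target areas above, here is a minimal Python sketch. The function names `rect_partial_area` and `square_partial_area` are hypothetical, introduced only for this illustration. The first uses exact rational arithmetic to exhibit the telescoping identity {\sum_{n=1}^N \frac{1}{n(n+1)} = 1 - \frac{1}{N+1}}; the second compares a floating-point partial sum of {\sum_{n \geq 2} 1/n^2} with {\frac{\pi^2}{6} - 1}.

```python
from fractions import Fraction
import math


def rect_partial_area(N):
    """Exact total area of the rectangles 1/n x 1/(n+1) for 1 <= n <= N."""
    return sum(Fraction(1, n * (n + 1)) for n in range(1, N + 1))


def square_partial_area(N):
    """Floating-point total area of the squares of sidelength 1/n for 2 <= n <= N."""
    return sum(1.0 / (n * n) for n in range(2, N + 1))


if __name__ == "__main__":
    N = 1000
    # Telescoping: the partial sum equals 1 - 1/(N+1), so the total area is 1.
    print(rect_partial_area(N) == 1 - Fraction(1, N + 1))
    # The partial sum approaches pi^2/6 - 1, with an error of size about 1/N.
    print(square_partial_area(N), math.pi ** 2 / 6 - 1)
```

This only confirms the areas; it says nothing about whether a perfect packing exists.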

Another direction in which partial progress has been made is to consider instead the problem of packing squares of sidelength {n^{-t}}, {n \geq 1} perfectly into a square or rectangle of total area {\sum_{n=1}^\infty \frac{1}{n^{2t}}}, for some fixed constant {t > 1/2} (this lower bound is needed to make the total area {\sum_{n=1}^\infty \frac{1}{n^{2t}}} finite), with the aim being to get {t} as close to {1} as possible. Prior to this paper, the most recent advance in this direction was by Januszewski and Zielonka last year, who achieved such a packing in the range {1/2 < t \leq 2/3}.

In this paper we are able to get {t} arbitrarily close to {1} (which turns out to be a “critical” value of this parameter), but at the expense of deleting the first few tiles:

Theorem 1 If {1/2 < t < 1}, and {n_0} is sufficiently large depending on {t}, then one can pack squares of sidelength {n^{-t}}, {n \geq n_0} perfectly into a square of area {\sum_{n=n_0}^\infty \frac{1}{n^{2t}}}.

As in previous works, the general strategy is to execute a greedy algorithm, which can be described somewhat incompletely as follows.

  • Step 1: Suppose that one has already managed to perfectly pack a square {S} of area {\sum_{n=n_0}^\infty \frac{1}{n^{2t}}} by squares of sidelength {n^{-t}} for {n_0 \leq n < n_1}, together with a further finite collection {{\mathcal R}} of rectangles with disjoint interiors. (Initially, we would have {n_1=n_0} and {{\mathcal R} = \{S\}}, but these parameters will change over the course of the algorithm.)
  • Step 2: Amongst all the rectangles in {{\mathcal R}}, locate the rectangle {R} of the largest width (defined as the shorter of the two sidelengths of {R}).
  • Step 3: Pack (as efficiently as one can) squares of sidelength {n^{-t}} for {n_1 \leq n < n_2} into {R} for some {n_2>n_1}, and decompose the portion of {R} not covered by this packing into rectangles {{\mathcal R}'}.
  • Step 4: Replace {n_1} by {n_2}, replace {{\mathcal R}} by {({\mathcal R} \backslash \{R\}) \cup {\mathcal R}'}, and return to Step 1.

The main innovation of this paper is to perform Step 3 somewhat more efficiently than in previous papers.

The above algorithm can get stuck if one reaches a point where one has already packed squares of sidelength {1/n^t} for {n_0 \leq n < n_1}, but all remaining rectangles {R} in {{\mathcal R}} have width less than {n_1^{-t}}, in which case there is no obvious way to fit in the next square. If we let {w(R)} and {h(R)} denote the width and height of these rectangles {R}, then the total area of the rectangles must be

\displaystyle  \sum_{R \in {\mathcal R}} w(R) h(R) = \sum_{n=n_0}^\infty \frac{1}{n^{2t}} - \sum_{n=n_0}^{n_1-1} \frac{1}{n^{2t}} \asymp n_1^{1-2t}

and the total perimeter {\mathrm{perim}({\mathcal R})} of these rectangles is

\displaystyle  \mathrm{perim}({\mathcal R}) = \sum_{R \in {\mathcal R}} 2(w(R)+h(R)) \asymp \sum_{R \in {\mathcal R}} h(R).

Thus we have

\displaystyle  n_1^{1-2t} \ll \mathrm{perim}({\mathcal R}) \sup_{R \in {\mathcal R}} w(R)

and so to ensure that there is at least one rectangle {R} with {w(R) \geq n_1^{-t}} it would be enough to have the perimeter bound

\displaystyle  \mathrm{perim}({\mathcal R}) \leq c n_1^{1-t}

for a sufficiently small constant {c>0}. It is here that we now see the critical nature of the exponent {t=1}: for {t<1}, the amount of perimeter we are permitted to have in the remaining rectangles increases as one progresses with the packing, but for {t=1} the amount of perimeter one is “budgeted” for stays constant (and for {t>1} the situation is even worse, in that the remaining rectangles {{\mathcal R}} should steadily decrease in total perimeter).
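
The two scalings in this heuristic can be explored numerically. The following Python sketch is only an illustration under stated assumptions: the helper names are hypothetical, the constant {c} is set to {1} for display purposes, and the infinite tail is approximated by a direct sum up to a cutoff plus an integral estimate. It checks that the remaining area is close to {n_1^{1-2t}/(2t-1)} and shows how the perimeter budget {c n_1^{1-t}} grows for {t<1} but stays constant at {t=1}.

```python
def tail_area(n1, t, cutoff=200_000):
    """Approximate sum_{n >= n1} n^(-2t) for t > 1/2 (requires n1 < cutoff).

    Terms below `cutoff` are summed directly; the rest is estimated by the
    integral of x^(-2t) from `cutoff` to infinity plus half the first term.
    """
    s = 2.0 * t
    head = sum(n ** (-s) for n in range(n1, cutoff))
    rest = cutoff ** (1.0 - s) / (s - 1.0) + 0.5 * cutoff ** (-s)
    return head + rest


def area_ratio(n1, t):
    """Remaining area divided by the leading term n1^(1-2t) / (2t-1)."""
    return tail_area(n1, t) * (2.0 * t - 1.0) / n1 ** (1.0 - 2.0 * t)


def perimeter_budget(n1, t, c=1.0):
    """The permitted perimeter c * n1^(1-t); c = 1 is an arbitrary choice."""
    return c * n1 ** (1.0 - t)


if __name__ == "__main__":
    for t in (0.6, 0.75, 0.9):
        print(t, area_ratio(1000, t), perimeter_budget(1000, t))
    # At the critical exponent t = 1 the budget no longer grows with n1.
    print(perimeter_budget(100, 1.0), perimeter_budget(10_000, 1.0))
```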

In comparison, the perimeter of the squares that one has already packed is equal to

\displaystyle  \sum_{n=n_0}^{n_1-1} 4 n^{-t}

which is comparable to {n_1^{1-t}} for {n_1} large (with the constants blowing up as {t} approaches the critical value of {1}). In previous algorithms, the total perimeter of the remainder rectangles {{\mathcal R}} was basically comparable to the perimeter of the squares already packed, and this is the main reason why the results only worked when {t} was sufficiently far away from {1}. In my paper, I am able to get the perimeter of {{\mathcal R}} significantly smaller than the perimeter of the squares already packed, by grouping those squares into lattice-like clusters (of about {M^2} squares arranged in an {M \times M} pattern), and sliding the squares in each cluster together to almost entirely eliminate the wasted space between each square, leaving only the space around the cluster as the main source of residual perimeter, which will be comparable to about {M n_1^{-t}} per cluster, as compared to the total perimeter of the squares in the cluster which is comparable to {M^2 n_1^{-t}}. This strategy is perhaps easiest to illustrate with a picture, in which {3 \times 4} squares {S_{i,j}} of slowly decreasing sidelength are packed together with relatively little wasted space:

By choosing the parameter {M} suitably large (and taking {n_0} sufficiently large depending on {M}), one can then prove the theorem. (In order to do some technical bookkeeping and to allow one to close an induction in the verification of the algorithm’s correctness, it is convenient to replace the perimeter {\sum_{R \in {\mathcal R}} 2(w(R)+h(R))} by a slightly weighted variant {\sum_{R \in {\mathcal R}} w(R)^\delta h(R)} for a small exponent {\delta}, but this is a rather artificial device that somewhat obscures the main ideas.)
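
The perimeter comparison behind the clustering idea can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions, not the construction from the paper: the function names are hypothetical, and the cluster is idealised as an {M \times M} block of squares of exactly equal sidelength pushed together with no gaps, whereas the actual squares have slowly decreasing sidelengths and leave a little wasted space.

```python
def packed_perimeter(n0, n1, t):
    """Total perimeter sum_{n0 <= n < n1} 4 n^(-t) of the squares packed so far."""
    return sum(4.0 * n ** (-t) for n in range(n0, n1))


def leading_term(n1, t):
    """Integral approximation 4 n1^(1-t) / (1-t); it blows up as t -> 1."""
    return 4.0 * n1 ** (1.0 - t) / (1.0 - t)


def cluster_perimeters(M, side):
    """Idealised M x M block of equal squares with no gaps between them.

    Returns (total perimeter of the M^2 squares, perimeter of the block).
    """
    total = 4.0 * M * M * side
    outer = 4.0 * M * side
    return total, outer


if __name__ == "__main__":
    n1, t = 10_000, 0.75
    print(packed_perimeter(1, n1, t) / leading_term(n1, t))
    total, outer = cluster_perimeters(10, n1 ** (-t))
    # The residual perimeter is smaller by a factor of M.
    print(outer / total)
```

In this idealised model the residual perimeter per cluster is smaller than the total perimeter of its squares by a factor of {M}, which matches the {M n_1^{-t}} versus {M^2 n_1^{-t}} comparison above.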

About a year ago, I was contacted by Masterclass (a subscription-based online education company) on the possibility of producing a series of classes with the premise of explaining mathematical ways of thinking (such as reducing a complex problem to simpler sub-problems, abstracting out inessential aspects of a problem, or applying transforms or analogies to find new ways of thinking about a problem). After a lot of discussion and planning, as well as a film shoot over the summer, the series is now completed. As per their business model, the full lecture series is only available to subscribers of their platform, but the above link does contain a trailer and some sample content.

Asgar Jamneshan and myself have just uploaded to the arXiv our preprint “The inverse theorem for the {U^3} Gowers uniformity norm on arbitrary finite abelian groups: Fourier-analytic and ergodic approaches“. This paper, which is a companion to another recent paper of ourselves and Or Shalom, studies the inverse theory for the third Gowers uniformity norm

\displaystyle  \| f \|_{U^3(G)}^8 = {\bf E}_{h_1,h_2,h_3,x \in G} \Delta_{h_1} \Delta_{h_2} \Delta_{h_3} f(x)

on an arbitrary finite abelian group {G}, where {\Delta_h f(x) := f(x+h) \overline{f(x)}} is the multiplicative derivative. Our main result is as follows:

Theorem 1 (Inverse theorem for {U^3(G)}) Let {G} be a finite abelian group, and let {f: G \rightarrow {\bf C}} be a {1}-bounded function with {\|f\|_{U^3(G)} \geq \eta} for some {0 < \eta \leq 1/2}. Then:
  • (i) (Correlation with locally quadratic phase) There exists a regular Bohr set {B(S,\rho) \subset G} with {|S| \ll \eta^{-O(1)}} and {\exp(-\eta^{-O(1)}) \ll \rho \leq 1/2}, a locally quadratic function {\phi: B(S,\rho) \rightarrow {\bf R}/{\bf Z}}, and a function {\xi: G \rightarrow \hat G} such that

    \displaystyle  {\bf E}_{x \in G} |{\bf E}_{h \in B(S,\rho)} f(x+h) e(-\phi(h)-\xi(x) \cdot h)| \gg \eta^{O(1)}.

  • (ii) (Correlation with nilsequence) There exists an explicit degree two filtered nilmanifold {H/\Lambda} of dimension {O(\eta^{-O(1)})}, a polynomial map {g: G \rightarrow H/\Lambda}, and a Lipschitz function {F: H/\Lambda \rightarrow {\bf C}} of constant {O(\exp(\eta^{-O(1)}))} such that

    \displaystyle  |{\bf E}_{x \in G} f(x) \overline{F}(g(x))| \gg \exp(-\eta^{-O(1)}).
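
To make the definition of the {U^3} norm concrete, here is a brute-force Python sketch for the cyclic group {G = {\bf Z}/N{\bf Z}}. The function names are hypothetical and the {O(N^4)} enumeration is only feasible for very small {N}. It computes {\|f\|_{U^3(G)}^8} directly from iterated multiplicative derivatives, and can be used to see that a quadratic phase {x \mapsto e(ax^2/N)} has {U^3} norm equal to {1}.

```python
import cmath
from itertools import product


def e(theta):
    """The standard character e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)


def mult_derivative(f, h):
    """Multiplicative derivative (Delta_h f)(x) = f(x+h) * conj(f(x)) on Z/NZ."""
    N = len(f)
    return [f[(x + h) % N] * f[x].conjugate() for x in range(N)]


def u3_norm_8(f):
    """The eighth power of the U^3 norm of f, where f is a list of length N."""
    N = len(f)
    total = 0
    for h1, h2, h3 in product(range(N), repeat=3):
        g = mult_derivative(mult_derivative(mult_derivative(f, h3), h2), h1)
        total += sum(g)
    return (total / N ** 4).real


if __name__ == "__main__":
    N = 7
    quadratic_phase = [e(3 * x * x / N) for x in range(N)]
    print(u3_norm_8(quadratic_phase))
```

For a quadratic phase the third derivative is identically {1}, which is why the average comes out as {1} up to rounding error.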

Such a theorem was proven by Ben Green and myself in the case when {|G|} was odd, and by Samorodnitsky in the {2}-torsion case {G = {\bf F}_2^n}. In all cases one uses the “higher order Fourier analysis” techniques introduced by Gowers. After some now-standard manipulations (using for instance what is now known as the Balog-Szemerédi-Gowers lemma), one arrives (for arbitrary {G}) at an estimate that is roughly of the form

\displaystyle  |{\bf E}_{x \in G} {\bf E}_{h,k \in B(S,\rho)} f(x+h+k) b(x,k) b(x,h) e(-B(h,k))| \gg \eta^{O(1)}

where {b} denotes various {1}-bounded functions whose exact values are not too important, and {B: B(S,\rho) \times B(S,\rho) \rightarrow {\bf R}/{\bf Z}} is a symmetric locally bilinear form. The idea is then to “integrate” this form by expressing it in the form

\displaystyle  B(h,k) = \phi(h+k) - \phi(h) - \phi(k) \ \ \ \ \ (1)

for some locally quadratic {\phi: B(S,\rho) \rightarrow {\bf R}/{\bf Z}}; this then allows us to write the above correlation as

\displaystyle  |{\bf E}_{x \in G} {\bf E}_{h,k \in B(S,\rho)} f(x+h+k) e(-\phi(h+k)) b(x,k) b(x,h)| \gg \eta^{O(1)}

(after adjusting the {b} functions suitably), and one can now conclude part (i) of the above theorem using some linear Fourier analysis. Part (ii) follows by encoding locally quadratic phase functions as nilsequences; for this we adapt an algebraic construction of Manners.

So the key step is to obtain a representation of the form (1), possibly after shrinking the Bohr set {B(S,\rho)} a little if needed. This has been done in the literature in two ways:

  • When {|G|} is odd, one has the ability to divide by {2}, and on the set {2 \cdot B(S,\frac{\rho}{10}) = \{ 2x: x \in B(S,\frac{\rho}{10})\}} one can establish (1) with {\phi(h) := B(\frac{1}{2} h, h)}. (This is similar to how in single variable calculus the function {x \mapsto \frac{1}{2} x^2} is a function whose second derivative is equal to {1}.)
  • When {G = {\bf F}_2^n}, then after a change of basis one can take the Bohr set {B(S,\rho)} to be {{\bf F}_2^m} for some {m}, and the bilinear form can be written in coordinates as

    \displaystyle  B(h,k) = \sum_{1 \leq i,j \leq m} a_{ij} h_i k_j / 2 \hbox{ mod } 1

    for some {a_{ij} \in {\bf F}_2} with {a_{ij}=a_{ji}}. The diagonal terms {a_{ii}} cause a problem, but by subtracting off the rank one form {(\sum_{i=1}^m a_{ii} h_i) (\sum_{i=1}^m a_{ii} k_i) / 2} one can write

    \displaystyle  B(h,k) = \sum_{1 \leq i,j \leq m} b_{ij} h_i k_j / 2 \hbox{ mod } 1

    on the orthogonal complement of {(a_{11},\dots,a_{mm})} for some coefficients {b_{ij}=b_{ji}} which now vanish on the diagonal: {b_{ii}=0}. One can now obtain (1) on this complement by taking

    \displaystyle  \phi(h) := \sum_{1 \leq i < j \leq m} b_{ij} h_i h_j / 2 \hbox{ mod } 1.
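
Both of these "integration" recipes can be tested by exhaustive search in small model cases. The Python sketch below is an illustration only: the function names are hypothetical, and the odd case is simplified to the full cyclic group {{\bf Z}/N{\bf Z}} with {B(h,k) = ahk/N} rather than a general Bohr set. It checks the identity (1) with {\phi(h) = B(\frac{1}{2} h, h)} for odd {N}, and with {\phi(h) = \sum_{i<j} b_{ij} h_i h_j/2} on {{\bf F}_2^m} when the diagonal of {b} vanishes.

```python
from fractions import Fraction
from itertools import product


def check_odd_cyclic(N, a):
    """Check (1) on Z/NZ, N odd, for B(h,k) = a*h*k/N and phi(h) = B(h/2, h)."""
    half = pow(2, -1, N)  # the inverse of 2 modulo N, which exists since N is odd

    def B(h, k):
        return Fraction(a * h * k, N) % 1

    def phi(h):
        return B((half * h) % N, h)

    return all(
        B(h, k) == (phi((h + k) % N) - phi(h) - phi(k)) % 1
        for h in range(N)
        for k in range(N)
    )


def check_f2(b):
    """Check (1) on F_2^m for a symmetric 0/1 matrix b, using only the i < j terms."""
    m = len(b)

    def B(h, k):
        return Fraction(sum(b[i][j] * h[i] * k[j] for i in range(m) for j in range(m)), 2) % 1

    def phi(h):
        return Fraction(sum(b[i][j] * h[i] * h[j] for i in range(m) for j in range(i + 1, m)), 2) % 1

    def add(h, k):
        return tuple((x + y) % 2 for x, y in zip(h, k))

    vectors = list(product((0, 1), repeat=m))
    return all(
        B(h, k) == (phi(add(h, k)) - phi(h) - phi(k)) % 1
        for h in vectors
        for k in vectors
    )


if __name__ == "__main__":
    print(check_odd_cyclic(9, 2))
    print(check_f2([[0, 1, 1], [1, 0, 0], [1, 0, 0]]))
    # A nonzero diagonal entry obstructs the identity.
    print(check_f2([[1]]))
```

The last call shows the problem caused by the diagonal terms: with {m=1} and {a_{11}=1}, the form {B(h,k) = hk/2} is not of the shape (1) for this choice of {\phi}.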

In our paper we can now treat the case of arbitrary finite abelian groups {G}, by means of the following two new ingredients:

  • (i) Using some geometry of numbers, we can lift the group {G} to a larger (possibly infinite, but still finitely generated) abelian group {G_S} with a projection map {\pi: G_S \rightarrow G}, and find a globally bilinear map {\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}} on the latter group, such that one has a representation

    \displaystyle  B(\pi(x), \pi(y)) = \tilde B(x,y) \ \ \ \ \ (2)

    of the locally bilinear form {B} by the globally bilinear form {\tilde B} when {x,y} are close enough to the origin.
  • (ii) Using an explicit construction, one can show that every globally bilinear map {\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}} has a representation of the form (1) for some globally quadratic function {\tilde \phi: G_S \rightarrow {\bf R}/{\bf Z}}.

To illustrate (i), consider the Bohr set {B(S,1/10) = \{ x \in {\bf Z}/N{\bf Z}: \|x/N\|_{{\bf R}/{\bf Z}} < 1/10\}} in {G = {\bf Z}/N{\bf Z}} (where {\|\|_{{\bf R}/{\bf Z}}} denotes the distance to the nearest integer), and consider a locally bilinear form {B: B(S,1/10) \times B(S,1/10) \rightarrow {\bf R}/{\bf Z}} of the form {B(x,y) = \alpha x y \hbox{ mod } 1} for some real number {\alpha} and all integers {x,y \in (-N/10,N/10)} (which we identify with elements of {G}). For generic {\alpha}, this form cannot be extended to a globally bilinear form on {G}; however if one lifts {G} to the finitely generated abelian group

\displaystyle  G_S := \{ (x,\theta) \in {\bf Z}/N{\bf Z} \times {\bf R}: \theta = x/N \hbox{ mod } 1 \}

(with projection map {\pi: (x,\theta) \mapsto x}) and introduces the globally bilinear form {\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}} by the formula

\displaystyle  \tilde B((x,\theta),(y,\sigma)) = N^2 \alpha \theta \sigma \hbox{ mod } 1

then one has (2) when {\theta,\sigma} lie in the interval {(-1/10,1/10)}. A similar construction works for higher rank Bohr sets.
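
This rank one example can be checked with exact arithmetic. In the Python sketch below (hypothetical function names; the real number {\alpha} is replaced by a rational stand-in so that comparisons modulo {1} are exact), an element {(x,\theta)} of {G_S} is represented by {\theta} alone, since {\tilde B} does not depend on the first coordinate. The check confirms (2) for the lifts {\theta = x/N}, {\sigma = y/N} near the origin, and shows that a different lift of the same {x} can give a different value.

```python
from fractions import Fraction


def B_local(x, y, alpha):
    """The locally bilinear form B(x,y) = alpha*x*y mod 1 on integer representatives."""
    return (alpha * x * y) % 1


def B_tilde(theta, sigma, alpha, N):
    """The globally bilinear form N^2 * alpha * theta * sigma mod 1 on G_S."""
    return (N * N * alpha * theta * sigma) % 1


def lifts_agree(N, alpha):
    """Check (2) for all integers x, y in (-N/10, N/10), lifted to theta = x/N."""
    small = [x for x in range(-N, N + 1) if 10 * abs(x) < N]
    return all(
        B_local(x, y, alpha) == B_tilde(Fraction(x, N), Fraction(y, N), alpha, N)
        for x in small
        for y in small
    )


if __name__ == "__main__":
    N, alpha = 101, Fraction(1, 7)
    print(lifts_agree(N, alpha))
    # The lift theta = 3/N + 1 of x = 3 lies outside (-1/10, 1/10); compare the two values.
    print(B_local(3, 2, alpha), B_tilde(Fraction(3, N) + 1, Fraction(2, N), alpha, N))
```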

To illustrate (ii), the key case turns out to be when {G_S} is a cyclic group {{\bf Z}/N{\bf Z}}, in which case {\tilde B} will take the form

\displaystyle  \tilde B(x,y) = \frac{axy}{N} \hbox{ mod } 1

for some integer {a}. One can then check by direct construction that (1) will be obeyed with

\displaystyle  \tilde \phi(x) = \frac{a \binom{x}{2}}{N} - \frac{a x \binom{N}{2}}{N^2} \hbox{ mod } 1

regardless of whether {N} is even or odd. A variant of this construction also works for {{\bf Z}}, and the general case follows from a short calculation verifying that the claim (ii) for any two groups {G_S, G'_S} implies the corresponding claim (ii) for the product {G_S \times G'_S}.
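
The closed form for {\tilde \phi} in the cyclic case can be checked mechanically. The following Python sketch (the helper names `phi_tilde`, `B_tilde` and `check_cyclic` are hypothetical, chosen only for this illustration) uses exact rational arithmetic to confirm, for small {N} and {a}, that {\tilde \phi} is well defined on {{\bf Z}/N{\bf Z}} and obeys (1). It does not cover the variant for {{\bf Z}} or the product step.

```python
from fractions import Fraction
from math import comb


def phi_tilde(x, a, N):
    """a*C(x,2)/N - a*x*C(N,2)/N^2 mod 1, for a non-negative integer x."""
    return (Fraction(a * comb(x, 2), N) - Fraction(a * x * comb(N, 2), N * N)) % 1


def B_tilde(x, y, a, N):
    """The bilinear form a*x*y/N mod 1 on Z/NZ."""
    return Fraction(a * x * y, N) % 1


def check_cyclic(a, N):
    """Check that phi_tilde is N-periodic and that (1) holds for all x, y in Z/NZ."""
    for x in range(N):
        # Well-definedness: shifting the representative by N changes nothing mod 1.
        if phi_tilde(x + N, a, N) != phi_tilde(x, a, N):
            return False
        for y in range(N):
            rhs = (phi_tilde(x + y, a, N) - phi_tilde(x, a, N) - phi_tilde(y, a, N)) % 1
            if B_tilde(x, y, a, N) != rhs:
                return False
    return True


if __name__ == "__main__":
    print(all(check_cyclic(a, N) for a in range(-3, 4) for N in range(1, 9)))
```

One way to see why this works for both parities of {N} is that the formula simplifies to {\tilde \phi(x) = \frac{a x (x-N)}{2N} \hbox{ mod } 1}, and {x(x-N)} changes by the multiple {2Nx} of {2N} when {x} is replaced by {x+N}.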

This concludes the Fourier-analytic proof of Theorem 1. In this paper we also give an ergodic theory proof of (a qualitative version of) Theorem 1(ii), using a correspondence principle argument adapted from this previous paper of Ziegler and myself. Basically, the idea is to randomly generate a dynamical system on the group {G}, by selecting an infinite number of random shifts {g_1, g_2, \dots \in G}, which induces an action of the infinitely generated free abelian group {{\bf Z}^\omega = \bigcup_{n=1}^\infty {\bf Z}^n} on {G} by the formula

\displaystyle  T^h x := x + \sum_{i=1}^\infty h_i g_i.

Much as the law of large numbers ensures the almost sure convergence of Monte Carlo integration, one can show that this action is almost surely ergodic (after passing to a suitable Furstenberg-type limit {X} where the size of {G} goes to infinity), and that the dynamical Host-Kra-Gowers seminorms of that system coincide with the combinatorial Gowers norms of the original functions. One is then well placed to apply an inverse theorem for the third Host-Kra-Gowers seminorm {U^3(X)} for {{\bf Z}^\omega}-actions, which was accomplished in the companion paper to this one. After doing so, one almost gets the desired conclusion of Theorem 1(ii), except that after undoing the application of the Furstenberg correspondence principle, the map {g: G \rightarrow H/\Lambda} is merely an almost polynomial rather than a polynomial, which roughly speaking means that instead of certain derivatives of {g} vanishing, they instead are merely very small outside of a small exceptional set. To conclude we need to invoke a “stability of polynomials” result, which at this level of generality was first established by Candela and Szegedy (though we also provide an independent proof here in an appendix), which roughly speaking asserts that every approximate polynomial is close in measure to an actual polynomial. (This general strategy is also employed in the Candela-Szegedy paper, though in the absence of the ergodic inverse theorem input that we rely upon here, the conclusion is weaker in that the filtered nilmanifold {H/\Lambda} is replaced with a general space known as a “CFR nilspace”.)

This transference principle approach seems to work well for the higher step cases (for instance, the stability of polynomials result is known in arbitrary degree); the main difficulty is to establish a suitable higher step inverse theorem in the ergodic theory setting, which we hope to do in future research.

Asgar Jamneshan, Or Shalom, and myself have just uploaded to the arXiv our preprint “The structure of arbitrary Conze–Lesigne systems“. As the title suggests, this paper is devoted to the structural classification of Conze-Lesigne systems, which are a type of measure-preserving system that are “quadratic” or of “complexity two” in a certain technical sense, and are of importance in the theory of multiple recurrence. There are multiple ways to define such systems; here is one. Take a countable abelian group {\Gamma} acting in a measure-preserving fashion on a probability space {(X,\mu)}, thus each group element {\gamma \in \Gamma} gives rise to a measure-preserving map {T^\gamma: X \rightarrow X}. Define the third Gowers-Host-Kra seminorm {\|f\|_{U^3(X)}} of a function {f \in L^\infty(X)} via the formula

\displaystyle  \|f\|_{U^3(X)}^8 := \lim_{n \rightarrow \infty} {\bf E}_{h_1,h_2,h_3 \in \Phi_n} \int_X \prod_{\omega_1,\omega_2,\omega_3 \in \{0,1\}}

\displaystyle {\mathcal C}^{\omega_1+\omega_2+\omega_3} f(T^{\omega_1 h_1 + \omega_2 h_2 + \omega_3 h_3} x)\ d\mu(x)

where {\Phi_n} is a Folner sequence for {\Gamma} and {{\mathcal C}: z \mapsto \overline{z}} is the complex conjugation map. One can show that this limit exists and is independent of the choice of Folner sequence, and that the {\| \|_{U^3(X)}} seminorm is indeed a seminorm. A Conze-Lesigne system is an ergodic measure-preserving system in which the {U^3(X)} seminorm is in fact a norm, thus {\|f\|_{U^3(X)}>0} whenever {f \in L^\infty(X)} is non-zero. Informally, this means that when one considers a generic parallelepiped in a Conze–Lesigne system {X}, the location of any vertex of that parallelepiped is more or less determined by the location of the other seven vertices. These are the important systems to understand in order to study “complexity two” patterns, such as arithmetic progressions of length four. While not all systems {X} are Conze-Lesigne systems, it turns out that they always have a maximal factor {Z^2(X)} that is a Conze-Lesigne system, known as the Conze-Lesigne factor or the second Host-Kra-Ziegler factor of the system, and this factor controls all the complexity two recurrence properties of the system.

The analogous theory in complexity one is well understood. Here, one replaces the {U^3(X)} norm by the {U^2(X)} norm

\displaystyle  \|f\|_{U^2(X)}^4 := \lim_{n \rightarrow \infty} {\bf E}_{h_1,h_2 \in \Phi_n} \int_X \prod_{\omega_1,\omega_2 \in \{0,1\}} {\mathcal C}^{\omega_1+\omega_2} f(T^{\omega_1 h_1 + \omega_2 h_2} x)\ d\mu(x)

and the ergodic systems for which {U^2} is a norm are called Kronecker systems. These systems are completely classified: a system is Kronecker if and only if it arises from a compact abelian group {Z} equipped with Haar probability measure and a translation action {T^\gamma \colon z \mapsto z + \phi(\gamma)} for some homomorphism {\phi: \Gamma \rightarrow Z} with dense image. Such systems can then be analyzed quite efficiently using the Fourier transform, and this can then be used to satisfactorily analyze “complexity one” patterns, such as length three progressions, in arbitrary systems (or, when translated back to combinatorial settings, in arbitrary dense sets of abelian groups).

We return now to the complexity two setting. The most famous examples of Conze-Lesigne systems are (order two) nilsystems, in which the space {X} is a quotient {G/\Lambda} of a two-step nilpotent Lie group {G} by a lattice {\Lambda} (equipped with Haar probability measure), and the action is given by a translation {T^\gamma x = \phi(\gamma) x} for some group homomorphism {\phi: \Gamma \rightarrow G}. For instance, the Heisenberg {{\bf Z}}-nilsystem

\displaystyle  \begin{pmatrix} 1 & {\bf R} & {\bf R} \\ 0 & 1 & {\bf R} \\ 0 & 0 & 1 \end{pmatrix} / \begin{pmatrix} 1 & {\bf Z} & {\bf Z} \\ 0 & 1 & {\bf Z} \\ 0 & 0 & 1 \end{pmatrix}

with a shift of the form

\displaystyle  Tx = \begin{pmatrix} 1 & \alpha & 0 \\ 0 & 1 & \beta \\ 0 & 0 & 1 \end{pmatrix} x

for {\alpha,\beta} two real numbers with {1,\alpha,\beta} linearly independent over {{\bf Q}}, is a Conze-Lesigne system. As the base case of a well known result of Host and Kra, it is shown in fact that all Conze-Lesigne {{\bf Z}}-systems are inverse limits of nilsystems (previous results in this direction were obtained by Conze-Lesigne, Furstenberg-Weiss, and others). Similar results are known for {\Gamma}-systems when {\Gamma} is finitely generated, thanks to the thesis work of Griesmer (with further proofs by Gutman-Lian and Candela-Szegedy). However, this is not the case once {\Gamma} is not finitely generated; as a recent example of Shalom shows, Conze-Lesigne systems need not be the inverse limit of nilsystems in this case.
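
To see concretely where the "order two" behaviour of the Heisenberg shift comes from, one can compute powers of the shift matrix. The Python sketch below is only an illustration: the function names are hypothetical, and {\alpha,\beta} are replaced by rational stand-ins so that the arithmetic is exact (the ergodic example above requires {1,\alpha,\beta} to be linearly independent over {{\bf Q}}, which rationals are not). It compares the {n}-fold product of the shift matrix with the closed form whose top right entry is {\binom{n}{2} \alpha \beta}, a quadratic expression in {n}.

```python
from fractions import Fraction
from math import comb


def mat_mul(A, B):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def shift_power(alpha, beta, n):
    """The n-th power (n >= 0) of the Heisenberg shift matrix, by repeated multiplication."""
    g = [[1, alpha, 0], [0, 1, beta], [0, 0, 1]]
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for _ in range(n):
        result = mat_mul(result, g)
    return result


def closed_form(alpha, beta, n):
    """Claimed closed form: linear entries n*alpha, n*beta and quadratic entry C(n,2)*alpha*beta."""
    return [[1, n * alpha, comb(n, 2) * alpha * beta], [0, 1, n * beta], [0, 0, 1]]


if __name__ == "__main__":
    alpha, beta = Fraction(1, 3), Fraction(2, 5)
    print(all(shift_power(alpha, beta, n) == closed_form(alpha, beta, n) for n in range(10)))
```

This computes the matrices only; it does not reduce modulo the lattice or say anything about ergodicity.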

Our main result is that even in the infinitely generated case, Conze-Lesigne systems are still inverse limits of a slight generalisation of the nilsystem concept, in which {G} is a locally compact Polish group rather than a Lie group:

Theorem 1 (Classification of Conze-Lesigne systems) Let {\Gamma} be a countable abelian group, and {X} an ergodic measure-preserving {\Gamma}-system. Then {X} is a Conze-Lesigne system if and only if it is the inverse limit of translational systems {G/\Lambda}, where {G} is a nilpotent locally compact Polish group of nilpotency class two, and {\Lambda} is a lattice in {G} (and also a lattice in the commutator group {[G,G]}), with {G/\Lambda} equipped with the Haar probability measure and a translation action {T^\gamma x = \phi(\gamma) x} for some homomorphism {\phi: \Gamma \rightarrow G}.

In a forthcoming companion paper to this one, Asgar Jamneshan and I will use this theorem to derive an inverse theorem for the Gowers norm {U^3(G)} for an arbitrary finite abelian group {G} (with no restrictions on the order of {G}, in particular our result handles the case of even and odd {|G|} in a unified fashion). In principle, having a higher order version of this theorem will similarly allow us to derive inverse theorems for {U^{s+1}(G)} norms for arbitrary {s} and finite abelian {G}; we hope to investigate this further in future work.

We sketch some of the main ideas used to prove the theorem. The existing machinery developed by Conze-Lesigne, Furstenberg-Weiss, Host-Kra, and others allows one to describe an arbitrary Conze-Lesigne system as a group extension {Z \rtimes_\rho K}, where {Z} is a Kronecker system (a rotational system on a compact abelian group {Z = (Z,+)} and translation action {\phi: \Gamma \rightarrow Z}), {K = (K,+)} is another compact abelian group, and the cocycle {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} is a collection of measurable maps {\rho_\gamma: Z \rightarrow K} obeying the cocycle equation

\displaystyle  \rho_{\gamma_1+\gamma_2}(x) = \rho_{\gamma_1}(T^{\gamma_2} x) + \rho_{\gamma_2}(x) \ \ \ \ \ (1)

for almost all {x \in Z}. Furthermore, {\rho} is of “type two”, which means in this concrete setting that it obeys an additional equation

\displaystyle  \rho_\gamma(x + z_1 + z_2) - \rho_\gamma(x+z_1) - \rho_\gamma(x+z_2) + \rho_\gamma(x) \ \ \ \ \ (2)

\displaystyle  = F(x + \phi(\gamma), z_1, z_2) - F(x,z_1,z_2)

for all {\gamma \in \Gamma} and almost all {x,z_1,z_2 \in Z}, and some measurable function {F: Z^3 \rightarrow K}; roughly speaking this asserts that {\rho_\gamma} is “linear up to coboundaries”. For technical reasons it is also convenient to reduce to the case where {Z} is separable. The problem is that the equation (2) is unwieldy to work with. In the model case when the target group {K} is a circle {{\bf T} = {\bf R}/{\bf Z}}, one can use some Fourier analysis to convert (2) into the more tractable Conze-Lesigne equation

\displaystyle  \rho_\gamma(x+z) - \rho_\gamma(x) = F_z(x+\phi(\gamma)) - F_z(x) + c_z(\gamma) \ \ \ \ \ (3)

for all {\gamma \in \Gamma}, all {z \in Z}, and almost all {x \in Z}, where for each {z}, {F_z: Z \rightarrow K} is a measurable function, and {c_z: \Gamma \rightarrow K} is a homomorphism. (For technical reasons it is often also convenient to enforce that {F_z, c_z} depend in a measurable fashion on {z}; this can always be achieved, at least when the Conze-Lesigne system is separable, but verifying that this is possible actually requires a certain amount of effort, which we devote an appendix to in our paper.) It is not difficult to see that (3) implies (2) for any group {K} (as long as one has the measurability in {z} mentioned previously), but the converse turns out to fail for some groups {K}, such as solenoid groups (e.g., inverse limits of {{\bf R}/2^n{\bf Z}} as {n \rightarrow \infty}), as was essentially shown by Rudolph. However, in our paper we were able to find a separate argument that also derived the Conze-Lesigne equation in the case of a cyclic group {K = \frac{1}{N}{\bf Z}/{\bf Z}}. Putting together the {K={\bf T}} and {K = \frac{1}{N}{\bf Z}/{\bf Z}} cases, one can then derive the Conze-Lesigne equation for arbitrary compact abelian Lie groups {K} (as such groups are isomorphic to direct products of finitely many tori and cyclic groups). As has been known for some time (see e.g., this paper of Host and Kra), once one has a Conze-Lesigne equation, one can more or less describe the system {X} as a translational system {G/\Lambda}, where the Host-Kra group {G} is the set of all pairs {(z, F_z)} that solve an equation of the form (3) (with these pairs acting on {X \equiv Z \rtimes_\rho K} by the law {(z,F_z) \cdot (x,k) := (x+z, k+F_z(x))}), and {\Lambda} is the stabiliser of a point in this system. This then establishes the theorem in the case when {K} is a Lie group, and the general case basically comes from the fact (from Fourier analysis or the Peter-Weyl theorem) that an arbitrary compact abelian group is an inverse limit of Lie groups.
(There is a technical issue here in that one has to check that the space of translational system factors of {X} forms a directed set in order to have a genuine inverse limit, but this can be dealt with by modifications of the tools mentioned here.)

There is an additional technical issue worth pointing out here (which unfortunately was glossed over in some previous work in the area). Because the cocycle equation (1) and the Conze-Lesigne equation (3) are only valid almost everywhere instead of everywhere, the action of {G} on {X} is technically only a near-action rather than a genuine action, and as such one cannot directly define {\Lambda} to be the stabiliser of a point without running into multiple problems. To fix this, one has to pass to a topological model of {X} in which the action becomes continuous, and the stabiliser becomes well defined, although one then has to work a little more to check that the action is still transitive. This can be done via Gelfand duality; we proceed using a mixture of a construction from this book of Host and Kra, and the machinery in this recent paper of Asgar and myself.

Now we discuss how to establish the Conze-Lesigne equation (3) in the cyclic group case {K = \frac{1}{N}{\bf Z}/{\bf Z}}. As this group embeds into the torus {{\bf T}}, it is easy to use existing methods to obtain (3) but with the homomorphism {c_z} and the function {F_z} taking values in {{\bf R}/{\bf Z}} rather than in {\frac{1}{N}{\bf Z}/{\bf Z}}. The main task is then to fix up the homomorphism {c_z} so that it takes values in {\frac{1}{N}{\bf Z}/{\bf Z}}, that is to say that {Nc_z} vanishes. This only needs to be done locally near the origin, because the claim is easy when {z} lies in the dense subgroup {\phi(\Gamma)} of {Z}, and also because the claim can be shown to be additive in {z}. Near the origin one can leverage the Steinhaus lemma to make {c_z} depend linearly (or more precisely, homomorphically) on {z}, and because the cocycle {\rho} already takes values in {\frac{1}{N}{\bf Z}/{\bf Z}}, {N\rho} vanishes and {Nc_z} must be an eigenvalue of the system {Z}. But as {Z} was assumed to be separable, there are only countably many eigenvalues, and by another application of Steinhaus and linearity one can then make {Nc_z} vanish on an open neighborhood of the identity, giving the claim.

As math educators, we often wish out loud that our students were more excited about mathematics. I finally came across a video that indicates what such a world might be like:

A popular way to visualise relationships between some finite number of sets is via Venn diagrams, or more generally Euler diagrams. In these diagrams, a set is depicted as a two-dimensional shape such as a disk or a rectangle, and the various Boolean relationships between these sets (e.g., that one set is contained in another, or that the intersection of two of the sets is equal to a third) are represented by the Boolean algebra of these shapes; Venn diagrams correspond to the case where the sets are in “general position” in the sense that all non-trivial Boolean combinations of the sets are non-empty. For instance, to depict the general situation of two sets {A,B} together with their intersection {A \cap B} and union {A \cup B}, one might use a Venn diagram such as


(where we have given each region depicted a different color, and moved the edges of each region a little away from each other in order to make them all visible separately), but if one wanted to instead depict a situation in which the intersection {A \cap B} was empty, one could use an Euler diagram such as


One can use the area of various regions in a Venn or Euler diagram as a heuristic proxy for the cardinality {|A|} (or measure {\mu(A)}) of the set {A} corresponding to such a region. For instance, the above Venn diagram can be used to intuitively justify the inclusion-exclusion formula

\displaystyle  |A \cup B| = |A| + |B| - |A \cap B|

for finite sets {A,B}, while the above Euler diagram similarly justifies the special case

\displaystyle  |A \cup B| = |A| + |B|

for finite disjoint sets {A,B}.
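As a quick sanity check of these two counting identities, here is a minimal Python sketch on small finite sets. The particular sets and the helper name `inclusion_exclusion_holds` are hypothetical choices made purely for illustration.

```python
# Illustrative finite sets (hypothetical values chosen for this example).
A = {1, 2, 3, 4}
B = {3, 4, 5}      # overlaps A in {3, 4}
C = {6, 7}         # disjoint from A

def inclusion_exclusion_holds(X, Y):
    """Check |X ∪ Y| = |X| + |Y| - |X ∩ Y| for finite sets X, Y."""
    return len(X | Y) == len(X) + len(Y) - len(X & Y)

# General (Venn diagram) case: the overlap is subtracted once.
print(inclusion_exclusion_holds(A, B))

# Disjoint (Euler diagram) case: the intersection is empty, so sizes simply add.
print(len(A & C) == 0 and len(A | C) == len(A) + len(C))
```

Both checks print `True` here; of course, a check on one example is only an illustration of the formula, not a proof of it.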

While Venn and Euler diagrams are traditionally two-dimensional in nature, there is nothing preventing one from using one-dimensional diagrams such as


or even three-dimensional diagrams such as this one from Wikipedia:


Of course, in such cases one would use length or volume as a heuristic proxy for cardinality or measure, rather than area.

With the addition of arrows, Venn and Euler diagrams can also accommodate (to some extent) functions between sets. Here for instance is a depiction of a function {f: A \rightarrow B}, the image {f(A)} of that function, and the image {f(A')} of some subset {A'} of {A}:


Here one can illustrate surjectivity of {f: A \rightarrow B} by having {f(A)} fill out all of {B}; one can similarly illustrate injectivity of {f} by giving {f(A)} exactly the same shape (or at least the same area) as {A}. So here for instance might be how one would illustrate an injective function {f: A \rightarrow B}:


Cartesian product operations can be incorporated into these diagrams by appropriate combinations of one-dimensional and two-dimensional diagrams. Here for instance is a diagram that illustrates the identity {(A \cup B) \times C = (A \times C) \cup (B \times C)}:


In this blog post I would like to propose a similar family of diagrams to illustrate relationships between vector spaces (over a fixed base field {k}, such as the reals) or abelian groups, rather than sets. The categories of ({k}-)vector spaces and abelian groups are quite similar in many ways; the former consists of modules over a base field {k}, while the latter consists of modules over the integers {{\bf Z}}; also, both categories are basic examples of abelian categories. The notion of a dimension in a vector space is analogous in many ways to that of cardinality of a set; see this previous post for an instance of this analogy (in the context of Shannon entropy). (UPDATE: I have learned that an essentially identical notation has also been proposed in an unpublished manuscript of Ravi Vakil.)


In everyday usage, we rely heavily on percentages to quantify probabilities and proportions: we might say that a prediction is {50\%} accurate or {80\%} accurate, that there is a {2\%} chance of dying from some disease, and so forth. However, for those without extensive mathematical training, it can sometimes be difficult to assess whether a given percentage amounts to a “good” or “bad” outcome, because this depends very much on the context of how the percentage is used. For instance:

  • (i) In a two-party election, an outcome of say {51\%} to {49\%} might be considered close, but {55\%} to {45\%} would probably be viewed as a convincing mandate, and {60\%} to {40\%} would likely be viewed as a landslide.
  • (ii) Similarly, if one were to poll an upcoming election, a poll of {51\%} to {49\%} would be too close to call, {55\%} to {45\%} would be an extremely favorable result for the candidate, and {60\%} to {40\%} would mean that it would be a major upset if the candidate lost the election.
  • (iii) On the other hand, a medical operation that only had a {51\%}, {55\%}, or {60\%} chance of success would be viewed as being incredibly risky, especially if failure meant death or permanent injury to the patient. Even an operation that was {90\%} or {95\%} likely to be non-fatal (i.e., a {10\%} or {5\%} chance of death) would not be conducted lightly.
  • (iv) A weather prediction of, say, {30\%} chance of rain during a vacation trip might be sufficient cause to pack an umbrella, even though it is more likely than not that rain would not occur. On the other hand, if the prediction was for an {80\%} chance of rain, and it ended up that the skies remained clear, this does not seriously damage the accuracy of the prediction – indeed, such an outcome would be expected in one out of every five such predictions.
  • (v) Even extremely tiny percentages of toxic chemicals in everyday products can be considered unacceptable. For instance, EPA rules require action to be taken when the percentage of lead in drinking water exceeds {0.0000015\%} (15 parts per billion). At the opposite extreme, recycling contamination rates as high as {10\%} are often considered acceptable.

Because of all the very different ways in which percentages could be used, I think it may make sense to propose an alternate system of units to measure one class of probabilities, namely the probabilities of avoiding some highly undesirable outcome, such as death, accident or illness. The units I propose are that of “nines”, which are already commonly used to measure availability of some service or purity of a material, but can be equally used to measure the safety (i.e., lack of risk) of some activity. Informally, nines measure how many consecutive appearances of the digit {9} are in the probability of successfully avoiding the negative outcome, thus

  • {90\%} success = one nine of safety
  • {99\%} success = two nines of safety
  • {99.9\%} success = three nines of safety
and so forth. Using the mathematical device of logarithms, one can also assign a fractional number of nines of safety to a general probability:

Definition 1 (Nines of safety) An activity (affecting one or more persons, over some given period of time) that has a probability {p} of the “safe” outcome and probability {1-p} of the “unsafe” outcome will have {k} nines of safety against the unsafe outcome, where {k} is defined by the formula

\displaystyle  k = -\log_{10}(1-p) \ \ \ \ \ (1)

(where {\log_{10}} is the logarithm to base ten), or equivalently

\displaystyle  p = 1 - 10^{-k}. \ \ \ \ \ (2)

Remark 2 Because of the various uncertainties in measuring probabilities, as well as the inaccuracies in some of the assumptions and approximations we will be making later, we will not attempt to measure the number of nines of safety beyond the first decimal point; thus we will round to the nearest tenth of a nine of safety throughout this post.

Here is a conversion table between percentage rates of success (the safe outcome), failure (the unsafe outcome), and the number of nines of safety one has:

Success rate {p} Failure rate {1-p} Number of nines {k}
{0\%} {100\%} {0.0}
{50\%} {50\%} {0.3}
{75\%} {25\%} {0.6}
{80\%} {20\%} {0.7}
{90\%} {10\%} {1.0}
{95\%} {5\%} {1.3}
{97.5\%} {2.5\%} {1.6}
{98\%} {2\%} {1.7}
{99\%} {1\%} {2.0}
{99.5\%} {0.5\%} {2.3}
{99.75\%} {0.25\%} {2.6}
{99.8\%} {0.2\%} {2.7}
{99.9\%} {0.1\%} {3.0}
{99.95\%} {0.05\%} {3.3}
{99.975\%} {0.025\%} {3.6}
{99.98\%} {0.02\%} {3.7}
{99.99\%} {0.01\%} {4.0}
{100\%} {0\%} infinite
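For readers who would like to experiment with other success rates, formulas (1) and (2) translate directly into a few lines of Python. This is only a minimal sketch; the function names `nines_of_safety` and `success_probability` are hypothetical labels introduced here.

```python
import math

def nines_of_safety(p):
    """Formula (1): k = -log10(1 - p) for a success probability 0 <= p <= 1."""
    if p >= 1.0:
        return math.inf          # a 100% guarantee corresponds to infinitely many nines
    return -math.log10(1.0 - p)

def success_probability(k):
    """Formula (2): p = 1 - 10^(-k)."""
    return 1.0 - 10.0 ** (-k)

# Reproduce a few rows of the conversion table, rounded to a tenth of a nine.
for percent in (50, 75, 80, 90, 95, 99, 99.9, 99.99):
    k = nines_of_safety(percent / 100)
    print(f"{percent}% success -> {k:.1f} nines")
```

The two functions are inverse to each other (up to floating-point error), so one can convert in either direction.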

Thus, if one has no nines of safety whatsoever, one is guaranteed to fail; but each nine of safety one has reduces the failure rate by a factor of {10}. In an ideal world, one would have infinitely many nines of safety against any risk, but in practice there are no {100\%} guarantees against failure, and so one can only expect a finite number of nines of safety in any given situation. Realistically, one should thus aim to have as many nines of safety as one can reasonably expect to have, but not to demand infinitely many.

Remark 3 The number of nines of safety against a certain risk is not absolute; it will depend not only on the risk itself, but also on (a) the number of people exposed to the risk, and (b) the length of time one is exposed to the risk. Exposing more people or increasing the duration of exposure will reduce the number of nines, and conversely exposing fewer people or reducing the duration will increase the number of nines; see Proposition 7 below for a rough rule of thumb in this regard.

Remark 4 Nines of safety are a logarithmic scale of measurement, rather than a linear scale. Other familiar examples of logarithmic scales of measurement include the Richter scale of earthquake magnitude, the pH scale of acidity, the decibel scale of sound level, octaves in music, and the magnitude scale for stars.

Remark 5 One way to think about nines of safety is via the Swiss cheese model that was created recently to describe pandemic risk management. In this model, each nine of safety can be thought of as a slice of Swiss cheese, with holes occupying {10\%} of that slice. Having {k} nines of safety is then analogous to standing behind {k} such slices of Swiss cheese. In order for a risk to actually impact you, it must pass through each of these {k} slices. A fractional nine of safety corresponds to a fractional slice of Swiss cheese that covers the amount of space given by the above table. For instance, {0.6} nines of safety corresponds to a fractional slice that covers about {75\%} of the given area (leaving {25\%} uncovered).

Now to give some real-world examples of nines of safety. Using data for deaths in the US in 2019 (without attempting to account for factors such as age and gender), a random US citizen will have had the following amount of safety from dying from some selected causes in that year:

Cause of death Mortality rate per {100,\! 000} (approx.) Nines of safety
All causes {870} {2.0}
Heart disease {200} {2.7}
Cancer {180} {2.7}
Accidents {52} {3.3}
Drug overdose {22} {3.7}
Influenza/Pneumonia {15} {3.8}
Suicide {14} {3.8}
Gun violence {12} {3.9}
Car accident {11} {4.0}
Murder {5} {4.3}
Airplane crash {0.14} {5.9}
Lightning strike {0.006} {7.2}

The safety of air travel is particularly remarkable: a given hour of flying in general aviation has a fatality rate of {0.00001}, or about {5} nines of safety, while for the major carriers the fatality rate drops down to {0.0000005}, or about {6.3} nines of safety.

Of course, in 2020, COVID-19 deaths became significant. In this year in the US, the mortality rate for COVID-19 (as the underlying or contributing cause of death) was {91.5} per {100,\! 000}, corresponding to {3.0} nines of safety, which was less safe than all other causes of death except for heart disease and cancer. At the time of writing, data for all of 2021 is of course not yet available, but it seems likely that the safety level would be even lower for this year.

Some further illustrations of the concept of nines of safety:

  • Each round of Russian roulette has a success rate of {5/6}, providing only {0.8} nines of safety. Of course, the safety will decrease with each additional round: one has only {0.5} nines of safety after two rounds, {0.4} nines after three rounds, and so forth. (See also Proposition 7 below.)
  • The ancient Roman punishment of decimation, by definition, provided exactly one nine of safety to each soldier being punished.
  • Rolling a {1} on a {20}-sided die is a risk that carries about {1.3} nines of safety.
  • Rolling a double one (“snake eyes”) from two six-sided dice carries about {1.6} nines of safety.
  • One has about {2.6} nines of safety against the risk of someone randomly guessing your birthday on the first attempt.
  • A null hypothesis has {1.3} nines of safety against producing a {p = 0.05} statistically significant result, and {2.0} nines against producing a {p=0.01} statistically significant result. (However, one has to be careful when reversing the conditional; a {p=0.01} statistically significant result does not necessarily have {2.0} nines of safety against the null hypothesis. In Bayesian statistics, the precise relationship between the two risks is given by Bayes’ theorem.)
  • If a poker opponent is dealt a five-card hand, one has {5.8} nines of safety against that opponent being dealt a royal flush, {4.8} against a straight flush or higher, {3.6} against four-of-a-kind or higher, {2.8} against a full house or higher, {2.4} against a flush or higher, {2.1} against a straight or higher, {1.5} against three-of-a-kind or higher, {1.1} against two pairs or higher, and just {0.3} against one pair or higher. (This data was converted from this Wikipedia table.)
  • A {k}-digit PIN number (or a {k}-digit combination lock) carries {k} nines of safety against each attempt to randomly guess the PIN. A length {k} password that allows for numbers, upper and lower case letters, and punctuation carries about {2k} nines of safety against a single guess. (For the reduction in safety caused by multiple guesses, see Proposition 7 below.)
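Several of the figures in this list can be recomputed directly from Definition 1, using the probability {q = 1-p} of the unsafe outcome. Here is a small Python sketch of that calculation; the helper name `nines_from_risk`, the 365-day year, and the choice of a four-digit PIN are assumptions made for illustration.

```python
import math
from fractions import Fraction

def nines_from_risk(q):
    """Nines of safety against an outcome of probability q (Definition 1 with q = 1 - p)."""
    return -math.log10(float(q))

examples = {
    "one round of Russian roulette": Fraction(1, 6),
    "two rounds of Russian roulette": 1 - Fraction(5, 6) ** 2,
    "three rounds of Russian roulette": 1 - Fraction(5, 6) ** 3,
    "rolling a 1 on a 20-sided die": Fraction(1, 20),
    "snake eyes on two six-sided dice": Fraction(1, 36),
    "guessing a birthday (365-day year assumed)": Fraction(1, 365),
    "guessing a 4-digit PIN in one attempt": Fraction(1, 10 ** 4),
}

for name, q in examples.items():
    print(f"{name}: {nines_from_risk(q):.1f} nines")
```

Exact fractions are used for the probabilities so that the only rounding occurs in the final logarithm.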

Here is another way to think about nines of safety:

Proposition 6 (Nines of safety extend expected onset of risk) Suppose a certain risky activity has {k} nines of safety. If one repeatedly indulges in this activity until the risk occurs, then the expected number of trials before the risk occurs is {10^k}.

Proof: The probability that the risk is activated after exactly {n} trials is {(1-10^{-k})^{n-1} 10^{-k}}, which is a geometric distribution of parameter {10^{-k}}. The claim then follows from the standard properties of that distribution. \Box
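One can also check this proposition numerically by summing the series {\sum_n n (1-10^{-k})^{n-1} 10^{-k}} directly. The following Python sketch does this with a truncated sum; the function name `expected_trials` and the cutoff of 200,000 terms are arbitrary illustrative choices (the cutoff is ample for the small values of {k} used here, but would need to grow for larger {k}).

```python
def expected_trials(k, terms=200_000):
    """Truncated sum of n * P(risk first occurs at trial n) for a risk with k nines."""
    q = 10.0 ** (-k)     # per-trial probability of the risk occurring
    survive = 1.0        # probability that the first n-1 trials were all safe
    total = 0.0
    for n in range(1, terms + 1):
        total += n * survive * q
        survive *= 1.0 - q
    return total

for k in (1.0, 2.0, 3.0):
    print(f"k = {k}: expected number of trials ~ {expected_trials(k):.3f}, "
          f"10^k = {10 ** k:.3f}")
```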

Thus, for instance, if one performs some risky activity daily, then the expected length of time before the risk occurs is given by the following table:

Daily nines of safety Expected onset of risk
{0} One day
{0.8} One week
{1.5} One month
{2.6} One year
{2.9} Two years
{3.3} Five years
{3.6} Ten years
{3.9} Twenty years
{4.3} Fifty years
{4.6} A century

Or, if one wants to convert the yearly risks of dying from a specific cause into expected years before that cause of death would occur (assuming for sake of discussion that no other cause of death exists):

Yearly nines of safety Expected onset of risk
{0} One year
{0.3} Two years
{0.7} Five years
{1} Ten years
{1.3} Twenty years
{1.7} Fifty years
{2.0} A century

These tables suggest a relationship between the amount of safety one would have in a short timeframe, such as a day, and a longer time frame, such as a year. Here is an approximate formalisation of that relationship:

Proposition 7 (Repeated exposure reduces nines of safety) If a risky activity with {k} nines of safety is (independently) repeated {m} times, then (assuming {k} is large enough depending on {m}), the repeated activity will have approximately {k - \log_{10} m} nines of safety. Conversely: if the repeated activity has {k'} nines of safety, the individual activity will have approximately {k' + \log_{10} m} nines of safety.

Proof: An activity with {k} nines of safety will be safe with probability {1-10^{-k}}, hence safe with probability {(1-10^{-k})^m} if repeated independently {m} times. For {k} large, we can approximate

\displaystyle  (1 - 10^{-k})^m \approx 1 - m 10^{-k} = 1 - 10^{-(k - \log_{10} m)}

giving the former claim. The latter claim follows from inverting the former. \Box
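To get a feel for how good this approximation is, one can compare it with the exact value in Python. This is a minimal sketch with hypothetical function names; the specific inputs are illustrative only.

```python
import math

def exact_repeated_nines(k, m):
    """Nines of safety after m independent repetitions, computed without approximation."""
    p_all_safe = (1.0 - 10.0 ** (-k)) ** m
    return -math.log10(1.0 - p_all_safe)

def approx_repeated_nines(k, m):
    """Rule of thumb from Proposition 7: k - log10(m)."""
    return k - math.log10(m)

# A large-k case (four daily nines repeated for a year) and a small-k case
# (three rounds of Russian roulette, with k = -log10(1/6)).
for k, m in ((4.0, 365), (-math.log10(1 / 6), 3)):
    print(f"k = {k:.2f}, m = {m}: exact {exact_repeated_nines(k, m):.2f}, "
          f"approx {approx_repeated_nines(k, m):.2f}")
```

In the first case the two values agree to within about a hundredth of a nine, while in the second (where {k} is small compared to {\log_{10} m}) the rule of thumb visibly underestimates the true safety level, which is consistent with the hypothesis that {k} be large enough depending on {m}.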

Remark 8 The hypothesis of independence here is key. If there is a lot of correlation between the risks between different repetitions of the activity, then there can be much less reduction in safety caused by that repetition. As a simple example, suppose that {90\%} of a workforce are trained to perform some task flawlessly no matter how many times they repeat the task, but the remaining {10\%} are untrained and will always fail at that task. If one selects a random worker and asks them to perform the task, one has {1.0} nines of safety against the task failing. If one took that same random worker and asked them to perform the task {m} times, the above proposition might suggest that the number of nines of safety would drop to approximately {1.0 - \log_{10} m}; but in this case there is perfect correlation, and in fact the number of nines of safety remains steady at {1.0} since it is the same {10\%} of the workforce that would fail each time.

Because of this caveat, one should view the above proposition as only a crude first approximation that can be used as a simple rule of thumb, but should not be relied upon for more precise calculations.
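The workforce example in Remark 8 can be modelled as a mixture: one first selects a worker type at random, and then all {m} repetitions are performed by that same worker. Here is a minimal Python sketch of this model; the function name `nines_mixture` and the input format are hypothetical choices for illustration.

```python
import math

def nines_mixture(groups, m):
    """Nines of safety against at least one failure in m repetitions by one random worker.

    groups: list of (fraction_of_workforce, per_attempt_failure_rate) pairs.
    Repetitions are independent only *within* a given worker type.
    """
    p_fail = sum(frac * (1.0 - (1.0 - rate) ** m) for frac, rate in groups)
    return -math.log10(p_fail)

m = 10
# Perfect correlation: 90% never fail, 10% always fail.
print(nines_mixture([(0.9, 0.0), (0.1, 1.0)], m))
# Full independence: every attempt fails with probability 10%.
print(nines_mixture([(1.0, 0.1)], m))
```

With ten repetitions, the perfectly correlated workforce stays at one nine of safety, while the fully independent model drops to roughly {0.2} nines.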

One can repeat a risk either in time (extending the time of exposure to the risk, say from a day to a year), or in space (by exposing the risk to more people). The above proposition then gives an additive conversion law for nines of safety in either case. Here are some conversion tables for time:

From/to Daily Weekly Monthly Yearly
Daily 0 -0.8 -1.5 -2.6
Weekly +0.8 0 -0.6 -1.7
Monthly +1.5 +0.6 0 -1.1
Yearly +2.6 +1.7 +1.1 0

From/to Yearly Per 5 yr Per decade Per century
Yearly 0 -0.7 -1.0 -2.0
Per 5 yr +0.7 0 -0.3 -1.3
Per decade +1.0 +0.3 0 -1.0
Per century +2.0 +1.3 +1.0 0
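The entries of these tables are simply base ten logarithms of ratios of durations. A short Python sketch that regenerates the first table is given below; the 30-day month and 365-day year are simplifying assumptions, and the names `DAYS` and `conversion` are hypothetical.

```python
import math

# Assumed lengths of each period in days (30-day month, 365-day year).
DAYS = {"Daily": 1, "Weekly": 7, "Monthly": 30, "Yearly": 365}

def conversion(from_period, to_period):
    """Change in nines of safety when passing from one period to another,
    following the additive rule of Proposition 7."""
    return math.log10(DAYS[from_period] / DAYS[to_period])

periods = list(DAYS)
print("From/to  " + " ".join(f"{p:>7s}" for p in periods))
for src in periods:
    row = " ".join(f"{conversion(src, dst):+7.1f}" for dst in periods)
    print(f"{src:9s}{row}")
```

Replacing the dictionary by, say, `{"Yearly": 1, "Per 5 yr": 5, "Per decade": 10, "Per century": 100}` produces the second table in the same way.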

For instance, as mentioned before, the yearly amount of safety against cancer is about {2.7}. Using the above table (and making the somewhat unrealistic hypothesis of independence), we then predict the daily amount of safety against cancer to be about {2.7 + 2.6 = 5.3} nines, the weekly amount to be about {2.7 + 1.7 = 4.4} nines, and the amount of safety over five years to drop to about {2.7 - 0.7 = 2.0} nines.

Now we turn to conversions in space. If one knows the level of safety against a certain risk for an individual, and then one (independently) exposes a group of such individuals to that risk, then the reduction in nines of safety when considering the possibility that at least one group member experiences this risk is given by the following table:

Group Reduction in safety
You ({1} person) {0}
You and your partner ({2} people) {-0.3}
You and your parents ({3} people) {-0.5}
You, your partner, and three children ({5} people) {-0.7}
An extended family of {10} people {-1.0}
A class of {30} people {-1.5}
A workplace of {100} people {-2.0}
A school of {1,\! 000} people {-3.0}
A university of {10,\! 000} people {-4.0}
A town of {100,\! 000} people {-5.0}
A city of {1} million people {-6.0}
A state of {10} million people {-7.0}
A country of {100} million people {-8.0}
A continent of {1} billion people {-9.0}
The entire planet {-9.8}

For instance, in a given year (and making the somewhat implausible assumption of independence), you might have {2.7} nines of safety against cancer, but you and your partner collectively only have about {2.7 - 0.3 = 2.4} nines of safety against this risk, your family of five might only have about {2.7 - 0.7 = 2} nines of safety, and so forth. By the time one gets to a group of {1,\! 000} people, it actually becomes very likely that at least one member of the group will die of cancer in that year. (Here the precise conversion table breaks down, because a negative number of nines such as {2.7 - 3.0 = -0.3} is not possible, but one should interpret a prediction of a negative number of nines as an assertion that failure is very likely to happen. Also, in practice the reduction in safety is less than this rule predicts, due to correlations such as risk factors that are common to the group being considered that are incompatible with the assumption of independence.)
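To see concretely where the additive rule stops being reliable, one can compute the exact probability that at least one member of an (independent) group is affected. The Python sketch below does this for the cancer example; the function name is hypothetical, and `log1p`/`expm1` are used only as a numerical precaution so that very small individual risks are not lost to rounding.

```python
import math

def chance_someone_affected(k, m):
    """Exact probability that at least one of m independent people is affected,
    when each person has k nines of individual safety."""
    q = 10.0 ** (-k)                          # individual risk
    # 1 - (1 - q)^m, rewritten to stay accurate when q is tiny
    return -math.expm1(m * math.log1p(-q))

for m in (1, 2, 5, 1000):
    p = chance_someone_affected(2.7, m)
    rule = 2.7 - math.log10(m)                # additive rule of thumb
    print(f"m = {m}: rule predicts {rule:+.1f} nines, "
          f"exact value {-math.log10(p):.1f} nines (risk {p:.3f})")
```

For a group of {1,\! 000} people the exact risk comes out at roughly {0.86}, so the group retains only a small positive fraction of a nine, in line with the interpretation of a negative prediction as "failure is very likely".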

In the opposite direction, any reduction in exposure (either in time or space) to a risk will increase one’s safety level, as per the following table:

Reduction in exposure Additional nines of safety
{\div 1} {0}
{\div 2} {+0.3}
{\div 3} {+0.5}
{\div 5} {+0.7}
{\div 10} {+1.0}
{\div 100} {+2.0}

For instance, a five-fold reduction in exposure will reclaim about {0.7} additional nines of safety.

Here is a slightly different way to view nines of safety:

Proposition 9 Suppose that a group of {m} people are independently exposed to a given risk. If there are at most

\displaystyle  \log_{10} \frac{1}{1-2^{-1/m}}

nines of individual safety against that risk, then there is at least a {50\%} chance that at least one member of the group is affected by the risk.

Proof: If individually there are {k} nines of safety, then the probability that all the members of the group avoid the risk is {(1-10^{-k})^m}. Since the inequality

\displaystyle  (1-10^{-k})^m \leq \frac{1}{2}

is equivalent to

\displaystyle  k \leq \log_{10} \frac{1}{1-2^{-1/m}},

the claim follows. \Box

Thus, for a group to collectively avoid a risk with at least a {50\%} chance, one needs the following level of individual safety:

Group Individual safety level required
You ({1} person) {0.3}
You and your partner ({2} people) {0.5}
You and your parents ({3} people) {0.7}
You, your partner, and three children ({5} people) {0.9}
An extended family of {10} people {1.2}
A class of {30} people {1.6}
A workplace of {100} people {2.2}
A school of {1,\! 000} people {3.2}
A university of {10,\! 000} people {4.2}
A town of {100,\! 000} people {5.2}
A city of {1} million people {6.2}
A state of {10} million people {7.2}
A country of {100} million people {8.2}
A continent of {1} billion people {9.2}
The entire planet {10.0}

For large {m}, the level {k} of nines of individual safety required to protect a group of size {m} with probability at least {50\%} is approximately {\log_{10} \frac{m}{\ln 2} \approx (\log_{10} m) + 0.2}.
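Both the exact threshold from Proposition 9 and its large {m} approximation are easy to evaluate numerically. Here is a minimal Python sketch; the function names are hypothetical, and the identity {1 - 2^{-1/m} = -\mathrm{expm1}(-\ln 2/m)} is used so that the computation remains accurate for very large groups.

```python
import math

def required_individual_nines(m):
    """Threshold from Proposition 9: log10(1 / (1 - 2^(-1/m)))."""
    one_minus_root = -math.expm1(-math.log(2.0) / m)   # equals 1 - 2^(-1/m)
    return -math.log10(one_minus_root)

def large_group_approximation(m):
    """Approximation for large m: log10(m / ln 2)."""
    return math.log10(m / math.log(2.0))

for m in (1, 2, 3, 5, 10, 30, 100, 1000, 10 ** 6):
    print(f"m = {m}: exact {required_individual_nines(m):.1f}, "
          f"approx {large_group_approximation(m):.1f}")
```

For very small groups the approximation falls a little below the exact threshold, but from groups of around ten people onwards the two already agree after rounding to the nearest tenth of a nine.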

Precautions that can work to prevent a certain risk from occurring will add additional nines of safety against that risk, even if the precaution is not {100\%} effective. Here is the precise rule:

Proposition 10 (Precautions add nines of safety) Suppose an activity carries {k} nines of safety against a certain risk, and a separate precaution can independently protect against that risk with {l} nines of safety (that is to say, the probability that the protection is effective is {1 - 10^{-l}}). Then applying that precaution increases the number of nines in the activity from {k} to {k+l}.

Proof: The probability that the precaution fails and the risk then occurs is {10^{-l} \times 10^{-k} = 10^{-(k+l)}}. The claim now follows from Definition 1. \Box

In particular, we can repurpose the table at the start of this post as a conversion chart for effectiveness of a precaution:

Effectiveness Failure rate Additional nines provided
{0\%} {100\%} {+0.0}
{50\%} {50\%} {+0.3}
{75\%} {25\%} {+0.6}
{80\%} {20\%} {+0.7}
{90\%} {10\%} {+1.0}
{95\%} {5\%} {+1.3}
{97.5\%} {2.5\%} {+1.6}
{98\%} {2\%} {+1.7}
{99\%} {1\%} {+2.0}
{99.5\%} {0.5\%} {+2.3}
{99.75\%} {0.25\%} {+2.6}
{99.8\%} {0.2\%} {+2.7}
{99.9\%} {0.1\%} {+3.0}
{99.95\%} {0.05\%} {+3.3}
{99.975\%} {0.025\%} {+3.6}
{99.98\%} {0.02\%} {+3.7}
{99.99\%} {0.01\%} {+4.0}
{100\%} {0\%} infinite

Thus for instance a precaution that is {80\%} effective will add {0.7} nines of safety, a precaution that is {99.8\%} effective will add {2.7} nines of safety, and so forth. The mRNA COVID vaccines by Pfizer and Moderna have somewhere between {88\% - 96\%} effectiveness against symptomatic COVID illness, providing about {0.9-1.4} nines of safety against that risk, and over {95\%} effectiveness against severe illness, thus adding at least {1.3} nines of safety in this regard.
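As an illustration of Proposition 10, the following Python sketch computes the nines added by a precaution of a given effectiveness, and checks that multiplying the failure probabilities gives the same answer as adding the nines. The function names and the baseline of three nines are hypothetical choices for this example.

```python
import math

def added_nines(effectiveness):
    """Nines contributed by an independent precaution: l = -log10(1 - effectiveness)."""
    if effectiveness >= 1.0:
        return math.inf
    return -math.log10(1.0 - effectiveness)

def nines_with_precaution(k, effectiveness):
    """Nines of safety once the precaution is applied, computed from probabilities."""
    base_risk = 10.0 ** (-k)
    residual_risk = base_risk * (1.0 - effectiveness)   # precaution fails AND risk occurs
    return -math.log10(residual_risk)

for eff in (0.80, 0.88, 0.96, 0.998):
    print(f"{eff:.1%} effective: +{added_nines(eff):.1f} nines; "
          f"3.0 nines become {nines_with_precaution(3.0, eff):.1f}")
```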

A slight variant of the above rule can be stated using the concept of relative risk:

Proposition 11 (Relative risk and nines of safety) Suppose an activity carries {k} nines of safety against a certain risk, and an action multiplies the chance of failure by some relative risk {R}. Then the action removes {\log_{10} R} nines of safety (if {R > 1}) or adds {-\log_{10} R} nines of safety (if {R<1}) to the original activity.

Proof: The additional action adjusts the probability of failure from {10^{-k}} to {R \times 10^{-k} = 10^{-(k - \log_{10} R)}}. The claim now follows from Definition 1. \Box

Here is a conversion chart between relative risk and change in nines of safety:

Relative risk Change in nines of safety
{0.01} {+2.0}
{0.02} {+1.7}
{0.05} {+1.3}
{0.1} {+1.0}
{0.2} {+0.7}
{0.5} {+0.3}
{1} {0}
{2} {-0.3}
{5} {-0.7}
{10} {-1.0}
{20} {-1.3}
{50} {-1.7}
{100} {-2.0}

Some examples:

  • Smoking increases the fatality rate of lung cancer by a factor of about {20}, thus removing about {1.3} nines of safety from this particular risk; it also increases the fatality rates of several other diseases, though not to quite as dramatic an extent.
  • Seatbelts reduce the fatality rate in car accidents by a factor of about two, adding about {0.3} nines of safety. Airbags achieve a reduction of about {30-50\%}, adding about {0.2-0.3} additional nines of safety.
  • As far as transmission of COVID is concerned, it seems that constant use of face masks reduces transmission by a factor of about five (thus adding about {0.7} nines of safety), and similarly for constant adherence to social distancing; whereas for instance a {30\%} compliance with mask usage reduced transmission by about {10\%} (adding only {0.05} or so nines of safety).

The effect of combining multiple (independent) precautions together is cumulative; one can achieve quite a high level of safety by stacking together several precautions that individually have relatively low levels of effectiveness. Again, see the “Swiss cheese model” referred to in Remark 5. For instance, if face masks add {0.7} nines of safety against contracting COVID, social distancing adds another {0.7} nines, and the vaccine provides another {1.0} nine of safety, implementing all three mitigation methods would (assuming independence) add a net of {2.4} nines of safety against contracting COVID.
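The stacking of precautions can be phrased in terms of relative risks: independent precautions multiply their relative risks, which is the same as adding their nines. Here is a small Python sketch of this bookkeeping, using relative risks of {0.2}, {0.2} and {0.1} as stand-ins for the three precautions just mentioned; the names `change_in_nines` and `precautions` are hypothetical.

```python
import math

def change_in_nines(relative_risk):
    """Proposition 11: multiplying the chance of failure by R changes safety by -log10(R)."""
    return -math.log10(relative_risk)

# Illustrative relative risks (assumed independent of one another).
precautions = {"face masks": 0.2, "social distancing": 0.2, "vaccine": 0.1}

combined_risk = math.prod(precautions.values())          # relative risks multiply
total = sum(change_in_nines(r) for r in precautions.values())   # nines add

for name, r in precautions.items():
    print(f"{name}: {change_in_nines(r):+.1f} nines")
print(f"combined: {change_in_nines(combined_risk):+.1f} nines "
      f"(sum of the individual changes: {total:+.1f})")

# A relative risk above 1 removes safety instead, e.g. a factor of 20:
print(f"R = 20: {change_in_nines(20):+.1f} nines")
```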

In summary, when debating the value of a given risk mitigation measure, the correct question to ask is not quite “Is it certain to work?” or “Can it fail?”, but rather “How many extra nines of safety does it add?”.

As one final comparison between nines of safety and other standard risk measures, we give the following proposition regarding large deviations from the mean.

Proposition 12 Let {X} be a normally distributed random variable of standard deviation {\sigma}, and let {\lambda > 0}. Then the “one-sided risk” of {X} exceeding its mean {{\bf E} X} by at least {\lambda \sigma} (i.e., {X \geq {\bf E} X + \lambda \sigma}) carries

\displaystyle  -\log_{10} \frac{1 - \mathrm{erf}(\lambda/\sqrt{2})}{2}

nines of safety, while the “two-sided risk” of {X} deviating (in either direction) from its mean by at least {\lambda \sigma} (i.e., {|X-{\bf E} X| \geq \lambda \sigma}) carries

\displaystyle  -\log_{10} (1 - \mathrm{erf}(\lambda/\sqrt{2}))

nines of safety, where {\mathrm{erf}} is the error function.

Proof: This is a routine calculation using the cumulative distribution function of the normal distribution. \Box

Here is a short table illustrating this proposition:

Number {\lambda} of deviations from the mean One-sided nines of safety Two-sided nines of safety
{0} {0.3} {0.0}
{1} {0.8} {0.5}
{2} {1.6} {1.3}
{3} {2.9} {2.6}
{4} {4.5} {4.2}
{5} {6.5} {6.2}
{6} {9.0} {8.7}
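The entries of this table can be recomputed with the error function from Python's standard library. In the sketch below (with hypothetical function names) the complementary error function {\mathrm{erfc} = 1 - \mathrm{erf}} is used in place of {1 - \mathrm{erf}}, since the latter loses precision for large {\lambda}.

```python
import math

def one_sided_nines(lam):
    """Nines of safety against X >= E[X] + lam * sigma for normally distributed X."""
    tail = 0.5 * math.erfc(lam / math.sqrt(2.0))   # (1 - erf(lam / sqrt 2)) / 2
    return math.log10(1.0 / tail)

def two_sided_nines(lam):
    """Nines of safety against |X - E[X]| >= lam * sigma for normally distributed X."""
    tail = math.erfc(lam / math.sqrt(2.0))         # 1 - erf(lam / sqrt 2)
    return math.log10(1.0 / tail)

for lam in range(7):
    print(f"lambda = {lam}: one-sided {one_sided_nines(lam):.1f}, "
          f"two-sided {two_sided_nines(lam):.1f}")
```

The two columns always differ by {\log_{10} 2 \approx 0.3}, reflecting the fact that the two-sided risk is exactly twice the one-sided risk.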

Thus, for instance, the risk of a five sigma event (deviating by more than five standard deviations from the mean in either direction) should carry {6.2} nines of safety assuming a normal distribution, and so one would ordinarily feel extremely safe against the possibility of such an event, unless one started doing hundreds of thousands of trials. (However, we caution that this conclusion relies heavily on the assumption that one has a normal distribution!)

See also this older essay I wrote on anonymity on the internet, using bits as a measure of anonymity in much the same way that nines are used here as a measure of safety.