
In orthodox first-order logic, variables and expressions are only allowed to take one value at a time; a variable {x}, for instance, is not allowed to equal {+3} and {-3} simultaneously. We will call such variables completely specified. If one really wants to deal with multiple values of objects simultaneously, one is encouraged to use the language of set theory and/or logical quantifiers to do so.

However, the ability to allow expressions to become only partially specified is undeniably convenient, and also rather intuitive. A classic example here is that of the quadratic formula:

\displaystyle  \hbox{If } x,a,b,c \in {\bf R} \hbox{ with } a \neq 0, \hbox{ then }

\displaystyle  ax^2+bx+c=0 \hbox{ if and only if } x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}. \ \ \ \ \ (1)

Strictly speaking, the expression {x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}} is not well-formed according to the grammar of first-order logic; one should instead use something like

\displaystyle x = \frac{-b - \sqrt{b^2-4ac}}{2a} \hbox{ or } x = \frac{-b + \sqrt{b^2-4ac}}{2a}

or

\displaystyle x \in \left\{ \frac{-b - \sqrt{b^2-4ac}}{2a}, \frac{-b + \sqrt{b^2-4ac}}{2a} \right\}

or

\displaystyle x = \frac{-b + \epsilon \sqrt{b^2-4ac}}{2a} \hbox{ for some } \epsilon \in \{-1,+1\}

in order to strictly adhere to this grammar. But none of these three reformulations are as compact or as conceptually clear as the original one. In a similar spirit, a mathematical English sentence such as

\displaystyle  \hbox{The sum of two odd numbers is an even number} \ \ \ \ \ (2)

is also not a first-order sentence; one would instead have to write something like

\displaystyle  \hbox{For all odd numbers } x, y, \hbox{ the number } x+y \hbox{ is even} \ \ \ \ \ (3)

or

\displaystyle  \hbox{For all odd numbers } x,y \hbox{ there exists an even number } z \ \ \ \ \ (4)

\displaystyle  \hbox{ such that } x+y=z

instead. These reformulations are not all that hard to decipher, but they do have the aesthetically displeasing effect of cluttering an argument with temporary variables such as {x,y,z} which are used once and then discarded.
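
To connect this with the completely specified setting (a minimal illustrative sketch, not from the post, with the function and its name chosen purely for illustration): in a programming language, where every returned value must be completely specified, one is forced into one of the reformulations above, for instance the set-valued one:

```python
from math import sqrt

def quadratic_roots(a, b, c):
    """Return the set {x : a*x**2 + b*x + c = 0} of real roots, i.e. the
    completely specified, set-valued reading of x = (-b +/- sqrt(b^2-4ac)) / (2a)."""
    assert a != 0
    disc = b * b - 4 * a * c
    if disc < 0:
        return set()
    return {(-b - sqrt(disc)) / (2 * a), (-b + sqrt(disc)) / (2 * a)}

print(quadratic_roots(1, 0, -9))   # {-3.0, 3.0}: the "partially specified" x = +/- 3
```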

Another example of partially specified notation is the innocuous {\ldots} notation. For instance, the assertion

\displaystyle \pi=3.14\ldots,

when written formally using first-order logic, would become something like

\displaystyle \pi = 3 + \frac{1}{10} + \frac{4}{10^2} + \sum_{n=3}^\infty \frac{a_n}{10^n} \hbox{ for some sequence } (a_n)_{n=3}^\infty

\displaystyle  \hbox{ with } a_n \in \{0,1,2,3,4,5,6,7,8,9\} \hbox{ for all } n,

which is not exactly an elegant reformulation. Similarly with statements such as

\displaystyle \tan x = x + \frac{x^3}{3} + \ldots \hbox{ for } |x| < \pi/2

or

\displaystyle \tan x = x + \frac{x^3}{3} + O(|x|^5) \hbox{ for } |x| < \pi/2.

Below the fold I’ll try to assign a formal meaning to partially specified expressions such as (1), for instance allowing one to condense (2), (3), (4) to just

\displaystyle  \hbox{odd} + \hbox{odd} = \hbox{even}.

When combined with another common (but often implicit) extension of first-order logic, namely the ability to reason using ambient parameters, we become able to formally introduce asymptotic notation such as the big-O notation {O()} or the little-o notation {o()}. We will explain how to do this at the end of this post.


Kaisa Matomäki, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Higher uniformity of arithmetic functions in short intervals I. All intervals“. This paper investigates the higher order (Gowers) uniformity of standard arithmetic functions in analytic number theory (and specifically, the Möbius function {\mu}, the von Mangoldt function {\Lambda}, and the generalised divisor functions {d_k}) in short intervals {(X,X+H]}, where {X} is large and {H} lies in the range {X^{\theta+\varepsilon} \leq H \leq X^{1-\varepsilon}} for a fixed constant {0 < \theta < 1} (that one would like to be as small as possible). If we let {f} denote one of the functions {\mu, \Lambda, d_k}, then there is extensive literature on the estimation of short sums

\displaystyle  \sum_{X < n \leq X+H} f(n)

and some literature also on the estimation of exponential sums such as

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-\alpha n)

for a real frequency {\alpha}, where {e(\theta) := e^{2\pi i \theta}}. For applications in the additive combinatorics of such functions {f}, it is also necessary to consider more general correlations, such as polynomial correlations

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-P(n))

where {P: {\bf Z} \rightarrow {\bf R}} is a polynomial of some fixed degree, or more generally

\displaystyle  \sum_{X < n \leq X+H} f(n) \overline{F}(g(n) \Gamma)

where {G/\Gamma} is a nilmanifold of fixed degree and dimension (and with some control on structure constants), {g: {\bf Z} \rightarrow G} is a polynomial map, and {F: G/\Gamma \rightarrow {\bf C}} is a Lipschitz function (with some bound on the Lipschitz constant). Indeed, thanks to the inverse theorem for the Gowers uniformity norm, such correlations let one control the Gowers uniformity norm of {f} (possibly after subtracting off some renormalising factor) on such short intervals {(X,X+H]}, which can in turn be used to control other multilinear correlations involving such functions.

Traditionally, asymptotics for such sums are expressed in terms of a “main term” of some arithmetic nature, plus an error term that is estimated in magnitude. For instance, a sum such as {\sum_{X < n \leq X+H} \Lambda(n) e(-\alpha n)} would be approximated in terms of a main term that vanishes (or is negligible) if {\alpha} is “minor arc”, but would be expressible in terms of something like a Ramanujan sum if {\alpha} is “major arc”, together with an error term. We found it convenient to cancel off such main terms by subtracting an approximant {f^\sharp} from each of the arithmetic functions {f} and then getting upper bounds on remainder correlations such as

\displaystyle  |\sum_{X < n \leq X+H} (f(n)-f^\sharp(n)) \overline{F}(g(n) \Gamma)| \ \ \ \ \ (1)

(actually for technical reasons we also allow the {n} variable to be restricted further to a subprogression of {(X,X+H]}, but let us ignore this minor extension for this discussion). There is some flexibility in how to choose these approximants, but we eventually found it convenient to use the following choices.

  • For the Möbius function {\mu}, we simply set {\mu^\sharp = 0}, as per the Möbius pseudorandomness conjecture. (One could choose a more sophisticated approximant in the presence of a Siegel zero, as I did with Joni in this recent paper, but we do not do so here.)
  • For the von Mangoldt function {\Lambda}, we eventually went with the Cramér-Granville approximant {\Lambda^\sharp(n) = \frac{W}{\phi(W)} 1_{(n,W)=1}}, where {W = \prod_{p < R} p} and {R = \exp(\log^{1/10} X)}. (A small numerical illustration of this approximant is given after this list.)
  • For the divisor functions {d_k}, we used a somewhat complicated-looking approximant {d_k^\sharp(n) = \sum_{m \leq X^{\frac{k-1}{5k}}} P_m(\log n)} for some explicit polynomials {P_m}, chosen so that {d_k^\sharp} and {d_k} have almost exactly the same sums along arithmetic progressions (see the paper for details).
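
To make the second item concrete, here is a small numerical illustration (a sketch of mine, not code from the paper) of the Cramér-Granville approximant; note that for moderately sized {X} the threshold {R = \exp(\log^{1/10} X)} is tiny, so {W} is just the product of the first few primes:

```python
from math import exp, log, gcd

def cramer_granville(n, X):
    """Lambda^#(n) = (W / phi(W)) * 1_{gcd(n, W) = 1},
    where W is the product of the primes p < R and R = exp(log(X)**(1/10))."""
    R = exp(log(X) ** 0.1)
    primes = [p for p in range(2, int(R) + 1)
              if p < R and all(p % q for q in range(2, int(p ** 0.5) + 1))]
    W, phi_W = 1, 1
    for p in primes:
        W, phi_W = W * p, phi_W * (p - 1)   # W is squarefree, so phi(W) = prod of (p-1)
    return W / phi_W if gcd(n, W) == 1 else 0.0

X = 10 ** 6      # here R ~ 3.7, so W = 2 * 3 = 6 and W/phi(W) = 3
print([cramer_granville(n, X) for n in range(100, 110)])
# [0.0, 3.0, 0.0, 3.0, 0.0, 0.0, 0.0, 3.0, 0.0, 3.0]; the mean value is about 1, like Lambda
```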

The objective is then to obtain bounds on sums such as (1) that improve upon the “trivial bound” that one can get with the triangle inequality and standard number theory bounds such as the Brun-Titchmarsh inequality. For {\mu} and {\Lambda}, the Siegel-Walfisz theorem suggests that it is reasonable to expect error terms that have “strongly logarithmic savings” in the sense that they gain a factor of {O_A(\log^{-A} X)} over the trivial bound for any {A>0}; for {d_k}, the Dirichlet hyperbola method suggests instead that one has “power savings” in that one should gain a factor of {X^{-c_k}} over the trivial bound for some {c_k>0}. In the case of the Möbius function {\mu}, there is an additional trick (introduced by Matomäki and Teräväinen) that allows one to lower the exponent {\theta} somewhat at the cost of only obtaining “weakly logarithmic savings” of shape {\log^{-c} X} for some small {c>0}.

Our main estimates on sums of the form (1) work in the following ranges:

  • For {\theta=5/8}, one can obtain strongly logarithmic savings on (1) for {f=\mu,\Lambda}, and power savings for {f=d_k}.
  • For {\theta=3/5}, one can obtain weakly logarithmic savings for {f = \mu, d_k}.
  • For {\theta=5/9}, one can obtain power savings for {f=d_3}.
  • For {\theta=1/3}, one can obtain power savings for {f=d_2}.

Conjecturally, one should be able to obtain power savings in all cases, and lower {\theta} down to zero, but the ranges of exponents and savings given here seem to be the limit of current methods unless one assumes additional hypotheses, such as GRH. The {\theta=5/8} result for correlation against Fourier phases {e(\alpha n)} was established previously by Zhan, and the {\theta=3/5} result for such phases and {f=\mu} was established previously by Matomäki and Teräväinen.

By combining these results with tools from additive combinatorics, one can obtain a number of applications:

  • Direct insertion of our bounds in the recent work of Kanigowski, Lemanczyk, and Radziwill on the prime number theorem on dynamical systems that are analytic skew products gives some improvements in the exponents there.
  • We can obtain a “short interval” version of a multiple ergodic theorem along primes established by Frantzikinakis-Host-Kra and Wooley-Ziegler, in which we average over intervals of the form {(X,X+H]} rather than {[1,X]}.
  • We can obtain a “short interval” version of the “linear equations in primes” asymptotics obtained by Ben Green, Tamar Ziegler, and myself in this sequence of papers, where the variables in these equations lie in short intervals {(X,X+H]} rather than long intervals such as {[1,X]}.

We now briefly discuss some of the ingredients of the proof of our main results. The first step is standard, using combinatorial decompositions (based on the Heath-Brown identity and (for the {\theta=3/5} result) the Ramaré identity) to decompose {\mu(n), \Lambda(n), d_k(n)} into more tractable sums of the following types:

  • Type {I} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large;
  • Type {II} sums, which are basically of the form {\sum_{A_- \leq m \leq A_+:m|n} \alpha(m)\beta(n/m)} for some weights {\alpha(m)}, {\beta(n)} of controlled size and some cutoffs {A_-, A_+} that are not too close to {1} or to {X};
  • Type {I_2} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m) d_2(n/m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large.

The precise ranges of the cutoffs {A, A_-, A_+} depend on the choice of {\theta}; our methods fail once these cutoffs pass a certain threshold, and this is the reason for the exponents {\theta} being what they are in our main results.

The Type {I} sums involving nilsequences can be treated by methods similar to those in this previous paper of Ben Green and myself; the main innovations are in the treatment of the Type {II} and Type {I_2} sums.

For the Type {II} sums, one can split into the “abelian” case in which (after some Fourier decomposition) the nilsequence {F(g(n)\Gamma)} is basically of the form {e(P(n))}, and the “non-abelian” case in which {G} is non-abelian and {F} exhibits non-trivial oscillation in a central direction. In the abelian case we can adapt arguments of Matomäki and Shao, which use Cauchy-Schwarz and the equidistribution properties of polynomials to obtain good bounds unless {e(P(n))} is “major arc” in the sense that it resembles (or “pretends to be”) {\chi(n) n^{it}} for some Dirichlet character {\chi} and some frequency {t}, but in this case one can use classical multiplicative methods to control the correlation. It turns out that the non-abelian case can be treated similarly. After applying Cauchy-Schwarz, one ends up analyzing the equidistribution of the four-variable polynomial sequence

\displaystyle  (n,m,n',m') \mapsto (g(nm)\Gamma, g(n'm)\Gamma, g(nm') \Gamma, g(n'm') \Gamma)

as {n,m,n',m'} range in various dyadic intervals. Using the known multidimensional equidistribution theory of polynomial maps in nilmanifolds, one can eventually show in the non-abelian case that this sequence either has enough equidistribution to give cancellation, or else the nilsequence involved can be replaced with one from a lower dimensional nilmanifold, in which case one can apply an induction hypothesis.

For the Type {I_2} sum, a model sum to study is

\displaystyle  \sum_{X < n \leq X+H} d_2(n) e(\alpha n)

which one can expand as

\displaystyle  \sum_{n,m: X < nm \leq X+H} e(\alpha nm).

We experimented with a number of ways to treat this type of sum (including automorphic form methods, or methods based on the Voronoi formula or van der Corput’s inequality), but somewhat to our surprise, the most efficient approach was an elementary one, in which one uses the Dirichlet approximation theorem to decompose the hyperbolic region {\{ (n,m) \in {\bf N}^2: X < nm \leq X+H \}} into a number of arithmetic progressions, and then uses equidistribution theory to establish cancellation of sequences such as {e(\alpha nm)} on the majority of these progressions. As it turns out, this strategy works well in the regime {H > X^{1/3+\varepsilon}} unless the nilsequence involved is “major arc”, but the latter case is treatable by existing methods as discussed previously; this is why the {\theta} exponent for our {d_2} result can be as low as {1/3}.
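
As a purely numerical sanity check of this expansion (not part of the paper's argument), one can confirm the identity directly for small parameters; note that fixing {n} in the double sum already restricts {m} to a short range, which is the starting point for the decomposition of the hyperbolic region described above:

```python
import cmath

def e(t):
    return cmath.exp(2j * cmath.pi * t)

def d2(n):
    # number of divisors of n
    return sum(2 if d * d < n else 1 for d in range(1, int(n ** 0.5) + 1) if n % d == 0)

X, H, alpha = 10_000, 300, 0.1234

lhs = sum(d2(n) * e(alpha * n) for n in range(X + 1, X + H + 1))

rhs = 0
for n in range(1, X + H + 1):
    # for fixed n, the condition X < n*m <= X+H restricts m to a short range
    for m in range(X // n + 1, (X + H) // n + 1):
        rhs += e(alpha * n * m)

print(abs(lhs - rhs))   # essentially zero: the two sums agree up to floating-point rounding
```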

In a sequel to this paper (currently in preparation), we will obtain analogous results for almost all intervals {(x,x+H]} with {x} in the range {[X,2X]}, in which we will be able to lower {\theta} all the way to {0}.

Just a brief update to the previous post. Gerhard Paseman and I have now set up a web site for the Short Communication Satellite (SCS) for the virtual International Congress of Mathematicians (ICM), which will be an experimental independent online satellite event in which short communications on topics relevant to one or two of the sections of the ICM can be submitted, reviewed by peers, and (if appropriate for the SCS event) displayed in a virtual “poster room” during the Congress on July 6-14 (which, by the way, has recently released its schedule and list of speakers). Our plan is to open the registration for this event on April 5, and start taking submissions on April 20; we are also currently accepting any expressions of interest in helping out with the event, for instance by serving as a reviewer. For more information about the event, please see the overview page, the guidelines page, and the FAQ page of the web site. As viewers will see, the web site is still somewhat under construction, but will be updated as we move closer to the actual Congress.

The comments section of this post would be a suitable place to ask further questions about this event, or give any additional feedback.

UPDATE: for readers who have difficulty accessing the links above, here are backup copies of the overview page and guidelines page.


Jan Grebik, Rachel Greenfeld, Vaclav Rozhon and I have just uploaded to the arXiv our preprint “Measurable tilings by abelian group actions“. This paper is related to an earlier paper of Rachel Greenfeld and myself concerning tilings of lattices {{\bf Z}^d}, but now we consider the more general situation of tiling a measure space {X} by a tile {A \subset X} shifted by a finite subset {F} of shifts of an abelian group {G = (G,+)} that acts in a measure-preserving (or at least quasi-measure-preserving) fashion on {X}. For instance, {X} could be a torus {{\bf T}^d = {\bf R}^d/{\bf Z}^d}, {A} could be a positive measure subset of that torus, and {G} could be the group {{\bf R}^d}, acting on {X} by translation.


If {F} is a finite subset of {G} with the property that the translates {f+A}, {f \in F} of {A \subset X} partition {X} up to null sets, we write {F \oplus A =_{a.e.} X}, and refer to this as a measurable tiling of {X} by {A} (with tiling set {F}). For instance, if {X} is the torus {{\bf T}^2}, we can create a measurable tiling with {A = [0,1/2]^2 \hbox{ mod } {\bf Z}^2} and {F = \{0,1/2\}^2}. Our main results are the following:

  • By modifying arguments from previous papers (including the one with Greenfeld mentioned above), we can establish the following “dilation lemma”: a measurable tiling {F \oplus A =_{a.e.} X} automatically implies further measurable tilings {rF \oplus A =_{a.e.} X}, whenever {r} is an integer coprime to all primes up to the cardinality {\# F} of {F}. (A toy finite example illustrating this lemma is given after this list.)
  • By averaging the above dilation lemma, we can also establish a “structure theorem” that decomposes the indicator function {1_A} of {A} into components, each of which is invariant with respect to a certain shift in {G}. We can establish this theorem in the case of measure-preserving actions on probability spaces via the ergodic theorem, but one can also generalize to other settings by using the device of “measurable medial means” (which relates to the concept of a universally measurable set).
  • By applying this structure theorem, we can show that all measurable tilings {F \oplus A = {\bf T}^1} of the one-dimensional torus {{\bf T}^1} are rational, in the sense that {F} lies in a coset of the rationals {{\bf Q} = {\bf Q}^1}. This answers a recent conjecture of Conley, Grebik, and Pikhurko; we also give an alternate proof of this conjecture using some previous results of Lagarias and Wang.
  • For tilings {F \oplus A = {\bf T}^d} of higher-dimensional tori, the tiling need not be rational. However, we can show that we can “slide” the tiling to be rational by giving each translate {f + A} of {A} a “velocity” {v_f \in {\bf R}^d}, and for every time {t}, the translates {f + tv_f + A} still form a partition of {{\bf T}^d} modulo null sets, and at time {t=1} the tiling becomes rational. In particular, if a set {A} can tile a torus in an irrational fashion, then it must also be able to tile the torus in a rational fashion.
  • In the two-dimensional case {d=2} one can arrange matters so that all the velocities {v_f} are parallel. If we furthermore assume that the tile {A} is connected, we can also show that the union of all the translates {f+A} with a common velocity {v_f = v} form a {v}-invariant subset of the torus.
  • Finally, we show that tilings {F \oplus A = {\bf Z}^d \times G} of a finitely generated discrete group {{\bf Z}^d \times G}, with {G} a finite group, cannot be constructed in a “local” fashion (we formalize this probabilistically using the notion of a “factor of iid process”) unless the tile {F} is contained in a single coset of {\{0\} \times G}. (Nonabelian local tilings, for instance of the sphere by rotations, are of interest due to connections with the Banach-Tarski paradox; see the aforementioned paper of Conley, Grebik, and Pikhurko. Unfortunately, our methods seem to break down completely in the nonabelian case.)
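
To illustrate the dilation lemma from the first item in a toy setting (this is an illustration only, using a finite cyclic group with counting measure rather than the general measure spaces of the paper): with {X = G = {\bf Z}/6{\bf Z}}, {A = \{0,3\}} and {F = \{0,1,2\}}, the tiling {F \oplus A = X} persists after dilating {F} by any {r} coprime to the primes up to {\# F = 3}:

```python
from itertools import product

def tiles(F, A, N):
    """Do the translates {f + a : f in F, a in A} partition Z/NZ?"""
    shifts = sorted((f + a) % N for f, a in product(F, A))
    return shifts == list(range(N))

N, A, F = 6, [0, 3], [0, 1, 2]
assert tiles(F, A, N)                                   # F + A = Z/6Z
for r in (5, 7, 11, 25):                                # r coprime to 2 and 3
    assert tiles([(r * f) % N for f in F], A, N)        # rF + A still tiles
print("dilation checks passed")
```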

As I have mentioned in some recent posts, I am interested in exploring unconventional modalities for presenting mathematics, for instance using media with high production value. One such recent example of this I saw was a presentation of the fundamental zero product property (or domain property) of the real numbers – namely, that ab=0 implies a=0 or b=0 for real numbers a,b – expressed through the medium of German-language rap:

EDIT: and here is a lesson on fractions, expressed through the medium of a burger chain advertisement:

I’d be interested to know what further examples of this type are out there.

SECOND EDIT: The following two examples from Wired magazine are slightly more conventional in nature, but still worth mentioning, I think. Firstly, my colleague at UCLA, Amit Sahai, presents the concept of zero knowledge proofs at various levels of technicality:

Secondly, Moon Duchin answers math questions of all sorts from Twitter:

I’ve just uploaded to the arXiv my preprint “Perfectly packing a square by squares of nearly harmonic sidelength“. This paper concerns a variant of an old problem of Meir and Moser, who asked whether it is possible to perfectly pack squares of sidelength {1/n} for {n \geq 2} into a single square or rectangle of area {\sum_{n=2}^\infty \frac{1}{n^2} = \frac{\pi^2}{6} - 1}. (The following variant problem, also posed by Meir and Moser and discussed for instance in this MathOverflow post, is perhaps even more well known: is it possible to perfectly pack rectangles of dimensions {1/n \times 1/(n+1)} for {n \geq 1} into a single square of area {\sum_{n=1}^\infty \frac{1}{n(n+1)} = 1}?) For the purposes of this paper, rectangles and squares are understood to have sides parallel to the axes, and a packing is perfect if it partitions the region being packed up to sets of measure zero. As one partial result towards these problems, it was shown by Paulhus that squares of sidelength {1/n} for {n \geq 2} can be packed (not quite perfectly) into a single rectangle of area {\frac{\pi^2}{6} - 1 + \frac{1}{1244918662}}, and rectangles of dimensions {1/n \times 1/(n+1)} for {n \geq 1} can be packed (again not quite perfectly) into a single square of area {1 + \frac{1}{10^9+1}}. (Paulhus’s paper had some gaps in it, but these were subsequently repaired by Grzegorek and Januszewski.)

Another direction in which partial progress has been made is to consider instead the problem of packing squares of sidelength {n^{-t}}, {n \geq 1} perfectly into a square or rectangle of total area {\sum_{n=1}^\infty \frac{1}{n^{2t}}}, for some fixed constant {t > 1/2} (this lower bound is needed to make the total area {\sum_{n=1}^\infty \frac{1}{n^{2t}}} finite), with the aim being to get {t} as close to {1} as possible. Prior to this paper, the most recent advance in this direction was by Januszewski and Zielonka last year, who achieved such a packing in the range {1/2 < t \leq 2/3}.

In this paper we are able to get {t} arbitrarily close to {1} (which turns out to be a “critical” value of this parameter), but at the expense of deleting the first few tiles:

Theorem 1 If {1/2 < t < 1}, and {n_0} is sufficiently large depending on {t}, then one can pack squares of sidelength {n^{-t}}, {n \geq n_0} perfectly into a square of area {\sum_{n=n_0}^\infty \frac{1}{n^{2t}}}.

As in previous works, the general strategy is to execute a greedy algorithm, which can be described somewhat incompletely as follows (a schematic sketch of the resulting loop in code is given after the list).

  • Step 1: Suppose that one has already managed to perfectly pack a square {S} of area {\sum_{n=n_0}^\infty \frac{1}{n^{2t}}} by squares of sidelength {n^{-t}} for {n_0 \leq n < n_1}, together with a further finite collection {{\mathcal R}} of rectangles with disjoint interiors. (Initially, we would have {n_1=n_0} and {{\mathcal R} = \{S\}}, but these parameters will change over the course of the algorithm.)
  • Step 2: Amongst all the rectangles in {{\mathcal R}}, locate the rectangle {R} of the largest width (defined as the shorter of the two sidelengths of {R}).
  • Step 3: Pack (as efficiently as one can) squares of sidelength {n^{-t}} for {n_1 \leq n < n_2} into {R} for some {n_2>n_1}, and decompose the portion of {R} not covered by this packing into rectangles {{\mathcal R}'}.
  • Step 4: Replace {n_1} by {n_2}, replace {{\mathcal R}} by {({\mathcal R} \backslash \{R\}) \cup {\mathcal R}'}, and return to Step 1.
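
Here is the promised schematic sketch of this loop in code (purely illustrative: the routine pack_step3 is a stand-in for the efficient packing in Step 3 that is the actual content of the paper, and the representation of rectangles as tuples is my own choice):

```python
def greedy_pack(S, t, n0, pack_step3):
    """Schematic version of the greedy loop in Steps 1-4.  Rectangles are
    (x, y, w, h) tuples; pack_step3(R, t, n1) is assumed to pack squares of
    sidelength n**(-t) for n1 <= n < n2 into the rectangle R and to return
    (placed_squares, leftover_rectangles, n2)."""
    n1, rects, placed = n0, [S], []
    while rects:   # in the actual argument this loop runs indefinitely, packing all n >= n0
        # Step 2: locate the remaining rectangle of largest width
        # (the width of R being the shorter of its two sidelengths).
        R = max(rects, key=lambda r: min(r[2], r[3]))
        rects.remove(R)
        # Step 3: pack the next batch of squares into R.
        new_squares, leftover, n2 = pack_step3(R, t, n1)
        placed.extend(new_squares)
        # Step 4: update the collection of uncovered rectangles and repeat from Step 1.
        rects.extend(leftover)
        n1 = n2
    return placed
```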

The main innovation of this paper is to perform Step 3 somewhat more efficiently than in previous papers.

The above algorithm can get stuck if one reaches a point where one has already packed squares of sidelength {1/n^t} for {n_0 \leq n < n_1}, but all remaining rectangles {R} in {{\mathcal R}} have width less than {n_1^{-t}}, in which case there is no obvious way to fit in the next square. If we let {w(R)} and {h(R)} denote the width and height of these rectangles {R}, then the total area of the rectangles must be

\displaystyle  \sum_{R \in {\mathcal R}} w(R) h(R) = \sum_{n=n_0}^\infty \frac{1}{n^{2t}} - \sum_{n=n_0}^{n_1-1} \frac{1}{n^{2t}} \asymp n_1^{1-2t}

and the total perimeter {\mathrm{perim}({\mathcal R})} of these rectangles is

\displaystyle  \mathrm{perim}({\mathcal R}) = \sum_{R \in {\mathcal R}} 2(w(R)+h(R)) \asymp \sum_{R \in {\mathcal R}} h(R).

Thus we have

\displaystyle  n_1^{1-2t} \ll \mathrm{perim}({\mathcal R}) \sup_{R \in {\mathcal R}} w(R)

and so to ensure that there is at least one rectangle {R} with {w(R) \geq n_1^{-t}} it would be enough to have the perimeter bound

\displaystyle  \mathrm{perim}({\mathcal R}) \leq c n_1^{1-t}

for a sufficiently small constant {c>0}. It is here that we now see the critical nature of the exponent {t=1}: for {t<1}, the amount of perimeter we are permitted to have in the remaining rectangles increases as one progresses with the packing, but for {t=1} the amount of perimeter one is “budgeted” for stays constant (and for {t>1} the situation is even worse, in that the remaining rectangles {{\mathcal R}} should steadily decrease in total perimeter).

In comparison, the perimeter of the squares that one has already packed is equal to

\displaystyle  \sum_{n=n_0}^{n_1-1} 4 n^{-t}

which is comparable to {n_1^{1-t}} for {n_1} large (with the constants blowing up as {t} approaches the critical value of {1}). In previous algorithms, the total perimeter of the remainder rectangles {{\mathcal R}} was basically comparable to the perimeter of the squares already packed, and this is the main reason why the results only worked when {t} was sufficiently far away from {1}. In my paper, I am able to get the perimeter of {{\mathcal R}} significantly smaller than the perimeter of the squares already packed, by grouping those squares into lattice-like clusters (of about {M^2} squares arranged in an {M \times M} pattern), and sliding the squares in each cluster together to almost entirely eliminate the wasted space between each square, leaving only the space around the cluster as the main source of residual perimeter, which will be comparable to about {M n_1^{-t}} per cluster, as compared to the total perimeter of the squares in the cluster which is comparable to {M^2 n_1^{-t}}. This strategy is perhaps easiest to illustrate with a picture, in which {3 \times 4} squares {S_{i,j}} of slowly decreasing sidelength are packed together with relatively little wasted space:

By choosing the parameter {M} suitably large (and taking {n_0} sufficiently large depending on {M}), one can then prove the theorem. (In order to do some technical bookkeeping and to allow one to close an induction in the verification of the algorithm’s correctness, it is convenient to replace the perimeter {\sum_{R \in {\mathcal R}} 2(w(R)+h(R))} by a slightly weighted variant {\sum_{R \in {\mathcal R}} w(R)^\delta h(R)} for a small exponent {\delta}, but this is a somewhat artificial device that obscures the main ideas.)

Asgar Jamneshan and myself have just uploaded to the arXiv our preprint “The inverse theorem for the {U^3} Gowers uniformity norm on arbitrary finite abelian groups: Fourier-analytic and ergodic approaches“. This paper, which is a companion to another recent paper of ourselves and Or Shalom, studies the inverse theory for the third Gowers uniformity norm

\displaystyle  \| f \|_{U^3(G)}^8 = {\bf E}_{h_1,h_2,h_3,x \in G} \Delta_{h_1} \Delta_{h_2} \Delta_{h_3} f(x)

on an arbitrary finite abelian group {G}, where {\Delta_h f(x) := f(x+h) \overline{f(x)}} is the multiplicative derivative. Our main result is as follows:

Theorem 1 (Inverse theorem for {U^3(G)}) Let {G} be a finite abelian group, and let {f: G \rightarrow {\bf C}} be a {1}-bounded function with {\|f\|_{U^3(G)} \geq \eta} for some {0 < \eta \leq 1/2}. Then:
  • (i) (Correlation with locally quadratic phase) There exists a regular Bohr set {B(S,\rho) \subset G} with {|S| \ll \eta^{-O(1)}} and {\exp(-\eta^{-O(1)}) \ll \rho \leq 1/2}, a locally quadratic function {\phi: B(S,\rho) \rightarrow {\bf R}/{\bf Z}}, and a function {\xi: G \rightarrow \hat G} such that

    \displaystyle  {\bf E}_{x \in G} |{\bf E}_{h \in B(S,\rho)} f(x+h) e(-\phi(h)-\xi(x) \cdot h)| \gg \eta^{O(1)}.

  • (ii) (Correlation with nilsequence) There exists an explicit degree two filtered nilmanifold {H/\Lambda} of dimension {O(\eta^{-O(1)})}, a polynomial map {g: G \rightarrow H/\Lambda}, and a Lipschitz function {F: H/\Lambda \rightarrow {\bf C}} of constant {O(\exp(\eta^{-O(1)}))} such that

    \displaystyle  |{\bf E}_{x \in G} f(x) \overline{F}(g(x))| \gg \exp(-\eta^{-O(1)}).
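
To make the quantity {\|f\|_{U^3(G)}} appearing in the hypothesis concrete, here is a purely illustrative brute-force computation (a sketch of mine, not taken from the paper) of the {U^3} norm on a cyclic group {{\bf Z}/N{\bf Z}}, directly from the parallelepiped-average definition; a quadratic phase attains the maximal value {1}, while a delta function has norm {N^{-1/2}}:

```python
import numpy as np

def U3_norm(f):
    """Brute-force U^3 Gowers norm of f on Z/NZ: the eighth root of the average,
    over x, h1, h2, h3, of the product of f over the eight vertices x + w.h of a
    parallelepiped, conjugated at vertices with an odd number of shifts."""
    N = len(f)
    corners = [(w1, w2, w3) for w1 in (0, 1) for w2 in (0, 1) for w3 in (0, 1)]
    total = 0j
    for h1 in range(N):
        for h2 in range(N):
            for h3 in range(N):
                for x in range(N):
                    prod = 1 + 0j
                    for w1, w2, w3 in corners:
                        v = f[(x + w1 * h1 + w2 * h2 + w3 * h3) % N]
                        prod *= v.conjugate() if (w1 + w2 + w3) % 2 else v
                    total += prod
    return abs(total / N ** 4) ** (1 / 8)

N = 12
x = np.arange(N)
quad = np.exp(2j * np.pi * x * x / N)     # quadratic phase e(x^2/N)
delta = np.zeros(N, dtype=complex)
delta[0] = 1
print(U3_norm(quad))    # 1.0: quadratic phases are maximally far from Gowers-uniform
print(U3_norm(delta))   # N**(-1/2), about 0.289
```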

Such a theorem was proven by Ben Green and myself in the case when {|G|} was odd, and by Samorodnitsky in the {2}-torsion case {G = {\bf F}_2^n}. In all cases one uses the “higher order Fourier analysis” techniques introduced by Gowers. After some now-standard manipulations (using for instance what is now known as the Balog-Szemerédi-Gowers lemma), one arrives (for arbitrary {G}) at an estimate that is roughly of the form

\displaystyle  |{\bf E}_{x \in G} {\bf E}_{h,k \in B(S,\rho)} f(x+h+k) b(x,k) b(x,h) e(-B(h,k))| \gg \eta^{O(1)}

where {b} denotes various {1}-bounded functions whose exact values are not too important, and {B: B(S,\rho) \times B(S,\rho) \rightarrow {\bf R}/{\bf Z}} is a symmetric locally bilinear form. The idea is then to “integrate” this form by expressing it in the form

\displaystyle  B(h,k) = \phi(h+k) - \phi(h) - \phi(k) \ \ \ \ \ (1)

for some locally quadratic {\phi: B(S,\rho) \rightarrow {\bf R}/{\bf Z}}; this then allows us to write the above correlation as

\displaystyle  |{\bf E}_{x \in G} {\bf E}_{h,k \in B(S,\rho)} f(x+h+k) e(-\phi(h+k)) b(x,k) b(x,h)| \gg \eta^{O(1)}

(after adjusting the {b} functions suitably), and one can now conclude part (i) of the above theorem using some linear Fourier analysis. Part (ii) follows by encoding locally quadratic phase functions as nilsequences; for this we adapt an algebraic construction of Manners.

So the key step is to obtain a representation of the form (1), possibly after shrinking the Bohr set {B(S,\rho)} a little if needed. This has been done in the literature in two ways:

  • When {|G|} is odd, one has the ability to divide by {2}, and on the set {2 \cdot B(S,\frac{\rho}{10}) = \{ 2x: x \in B(S,\frac{\rho}{10})\}} one can establish (1) with {\phi(h) := B(\frac{1}{2} h, h)}. (This is similar to how in single variable calculus the function {x \mapsto \frac{1}{2} x^2} is a function whose second derivative is equal to {1}.)
  • When {G = {\bf F}_2^n}, then after a change of basis one can take the Bohr set {B(S,\rho)} to be {{\bf F}_2^m} for some {m}, and the bilinear form can be written in coordinates as

    \displaystyle  B(h,k) = \sum_{1 \leq i,j \leq m} a_{ij} h_i k_j / 2 \hbox{ mod } 1

    for some {a_{ij} \in {\bf F}_2} with {a_{ij}=a_{ji}}. The diagonal terms {a_{ii}} cause a problem, but by subtracting off the rank one form {(\sum_{i=1}^m a_{ii} h_i) (\sum_{i=1}^m a_{ii} k_i) / 2} one can write

    \displaystyle  B(h,k) = \sum_{1 \leq i,j \leq m} b_{ij} h_i k_j / 2 \hbox{ mod } 1

    on the orthogonal complement of {(a_{11},\dots,a_{mm})} for some coefficients {b_{ij}=b_{ji}} which now vanish on the diagonal: {b_{ii}=0}. One can now obtain (1) on this complement by taking

    \displaystyle  \phi(h) := \sum_{1 \leq i < j \leq m} b_{ij} h_i h_j / 2 \hbox{ mod } 1.

In our paper we can now treat the case of arbitrary finite abelian groups {G}, by means of the following two new ingredients:

  • (i) Using some geometry of numbers, we can lift the group {G} to a larger (possibly infinite, but still finitely generated) abelian group {G_S} with a projection map {\pi: G_S \rightarrow G}, and find a globally bilinear map {\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}} on the latter group, such that one has a representation

    \displaystyle  B(\pi(x), \pi(y)) = \tilde B(x,y) \ \ \ \ \ (2)

    of the locally bilinear form {B} by the globally bilinear form {\tilde B} when {x,y} are close enough to the origin.
  • (ii) Using an explicit construction, one can show that every globally bilinear map {\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}} has a representation of the form (1) for some globally quadratic function {\tilde \phi: G_S \rightarrow {\bf R}/{\bf Z}}.

To illustrate (i), consider the Bohr set {B(S,1/10) = \{ x \in {\bf Z}/N{\bf Z}: \|x/N\|_{{\bf R}/{\bf Z}} < 1/10\}} in {G = {\bf Z}/N{\bf Z}} (where {\|\|_{{\bf R}/{\bf Z}}} denotes the distance to the nearest integer), and consider a locally bilinear form {B: B(S,1/10) \times B(S,1/10) \rightarrow {\bf R}/{\bf Z}} of the form {B(x,y) = \alpha x y \hbox{ mod } 1} for some real number {\alpha} and all integers {x,y \in (-N/10,N/10)} (which we identify with elements of {G}). For generic {\alpha}, this form cannot be extended to a globally bilinear form on {G}; however if one lifts {G} to the finitely generated abelian group

\displaystyle  G_S := \{ (x,\theta) \in {\bf Z}/N{\bf Z} \times {\bf R}: \theta = x/N \hbox{ mod } 1 \}

(with projection map {\pi: (x,\theta) \mapsto x}) and introduces the globally bilinear form {\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}} by the formula

\displaystyle  \tilde B((x,\theta),(y,\sigma)) = N^2 \alpha \theta \sigma \hbox{ mod } 1

then one has (2) when {\theta,\sigma} lie in the interval {(-1/10,1/10)}. A similar construction works for higher rank Bohr sets.

To illustrate (ii), the key case turns out to be when {G_S} is a cyclic group {{\bf Z}/N{\bf Z}}, in which case {\tilde B} will take the form

\displaystyle  \tilde B(x,y) = \frac{axy}{N} \hbox{ mod } 1

for some integer {a}. One can then check by direct construction that (1) will be obeyed with

\displaystyle  \tilde \phi(x) = \frac{a \binom{x}{2}}{N} - \frac{a x \binom{N}{2}}{N^2} \hbox{ mod } 1

regardless of whether {N} is even or odd. A variant of this construction also works for {{\bf Z}}, and the general case follows from a short calculation verifying that the claim (ii) for any two groups {G_S, G'_S} implies the corresponding claim (ii) for the product {G_S \times G'_S}.
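
As a quick numerical sanity check of this construction (purely illustrative, and not part of the paper), one can verify both the representation (1) and the fact that {\tilde \phi} is well defined on {{\bf Z}/N{\bf Z}}, for even and odd {N} alike:

```python
from fractions import Fraction
import random

def binom2(x):                        # x(x-1)/2, the binomial coefficient C(x,2) extended to all integers
    return x * (x - 1) // 2

def B(x, y, a, N):                    # the bilinear form a*x*y/N mod 1
    return Fraction(a * x * y, N) % 1

def phi(x, a, N):                     # the candidate quadratic a*C(x,2)/N - a*x*C(N,2)/N^2 mod 1
    return (Fraction(a * binom2(x), N) - Fraction(a * x * binom2(N), N * N)) % 1

random.seed(0)
for N in (7, 8):                      # one odd and one even modulus
    a = 3
    for _ in range(2000):
        x, y, k = random.randrange(N), random.randrange(N), random.randrange(-5, 6)
        # the representation (1): phi(x+y) - phi(x) - phi(y) = B(x, y)  (mod 1)
        assert (phi(x + y, a, N) - phi(x, a, N) - phi(y, a, N)) % 1 == B(x, y, a, N)
        # well-definedness: phi depends only on x mod N
        assert phi(x + k * N, a, N) == phi(x, a, N)
print("checks passed")
```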

This concludes the Fourier-analytic proof of Theorem 1. In this paper we also give an ergodic theory proof of (a qualitative version of) Theorem 1(ii), using a correspondence principle argument adapted from this previous paper of Ziegler and myself. Basically, the idea is to randomly generate a dynamical system on the group {G}, by selecting an infinite number of random shifts {g_1, g_2, \dots \in G}, which induces an action of the infinitely generated free abelian group {{\bf Z}^\omega = \bigcup_{n=1}^\infty {\bf Z}^n} on {G} by the formula

\displaystyle  T^h x := x + \sum_{i=1}^\infty h_i g_i.

Much as the law of large numbers ensures the almost sure convergence of Monte Carlo integration, one can show that this action is almost surely ergodic (after passing to a suitable Furstenberg-type limit {X} where the size of {G} goes to infinity), and that the dynamical Host-Kra-Gowers seminorms of that system coincide with the combinatorial Gowers norms of the original functions. One is then well placed to apply an inverse theorem for the third Host-Kra-Gowers seminorm {U^3(X)} for {{\bf Z}^\omega}-actions, which was accomplished in the companion paper to this one. After doing so, one almost gets the desired conclusion of Theorem 1(ii), except that after undoing the application of the Furstenberg correspondence principle, the map {g: G \rightarrow H/\Lambda} is merely an almost polynomial rather than a polynomial, which roughly speaking means that instead of certain derivatives of {g} vanishing, they instead are merely very small outside of a small exceptional set. To conclude we need to invoke a “stability of polynomials” result, which at this level of generality was first established by Candela and Szegedy (though we also provide an independent proof here in an appendix), which roughly speaking asserts that every approximate polynomial is close in measure to an actual polynomial. (This general strategy is also employed in the Candela-Szegedy paper, though in the absence of the ergodic inverse theorem input that we rely upon here, the conclusion is weaker in that the filtered nilmanifold {H/\Lambda} is replaced with a general space known as a “CFR nilspace”.)

This transference principle approach seems to work well for the higher step cases (for instance, the stability of polynomials result is known in arbitrary degree); the main difficulty is to establish a suitable higher step inverse theorem in the ergodic theory setting, which we hope to do in future research.

Asgar Jamneshan, Or Shalom, and myself have just uploaded to the arXiv our preprint “The structure of arbitrary Conze–Lesigne systems“. As the title suggests, this paper is devoted to the structural classification of Conze-Lesigne systems, which are a type of measure-preserving system that are “quadratic” or of “complexity two” in a certain technical sense, and are of importance in the theory of multiple recurrence. There are multiple ways to define such systems; here is one. Take a countable abelian group {\Gamma} acting in a measure-preserving fashion on a probability space {(X,\mu)}, thus each group element {\gamma \in \Gamma} gives rise to a measure-preserving map {T^\gamma: X \rightarrow X}. Define the third Gowers-Host-Kra seminorm {\|f\|_{U^3(X)}} of a function {f \in L^\infty(X)} via the formula

\displaystyle  \|f\|_{U^3(X)}^8 := \lim_{n \rightarrow \infty} {\bf E}_{h_1,h_2,h_3 \in \Phi_n} \int_X \prod_{\omega_1,\omega_2,\omega_3 \in \{0,1\}}

\displaystyle {\mathcal C}^{\omega_1+\omega_2+\omega_3} f(T^{\omega_1 h_1 + \omega_2 h_2 + \omega_3 h_3} x)\ d\mu(x)

where {\Phi_n} is a Folner sequence for {\Gamma} and {{\mathcal C}: z \mapsto \overline{z}} is the complex conjugation map. One can show that this limit exists and is independent of the choice of Folner sequence, and that the {\| \|_{U^3(X)}} seminorm is indeed a seminorm. A Conze-Lesigne system is an ergodic measure-preserving system in which the {U^3(X)} seminorm is in fact a norm, thus {\|f\|_{U^3(X)}>0} whenever {f \in L^\infty(X)} is non-zero. Informally, this means that when one considers a generic parallelepiped in a Conze–Lesigne system {X}, the location of any vertex of that parallelepiped is more or less determined by the location of the other seven vertices. These are the important systems to understand in order to study “complexity two” patterns, such as arithmetic progressions of length four. While not all systems {X} are Conze-Lesigne systems, it turns out that they always have a maximal factor {Z^2(X)} that is a Conze-Lesigne system, known as the Conze-Lesigne factor or the second Host-Kra-Ziegler factor of the system, and this factor controls all the complexity two recurrence properties of the system.

The analogous theory in complexity one is well understood. Here, one replaces the {U^3(X)} norm by the {U^2(X)} norm

\displaystyle  \|f\|_{U^2(X)}^4 := \lim_{n \rightarrow \infty} {\bf E}_{h_1,h_2 \in \Phi_n} \int_X \prod_{\omega_1,\omega_2 \in \{0,1\}} {\mathcal C}^{\omega_1+\omega_2} f(T^{\omega_1 h_1 + \omega_2 h_2} x)\ d\mu(x)

and the ergodic systems for which {U^2} is a norm are called Kronecker systems. These systems are completely classified: a system is Kronecker if and only if it arises from a compact abelian group {Z} equipped with Haar probability measure and a translation action {T^\gamma \colon z \mapsto z + \phi(\gamma)} for some homomorphism {\phi: \Gamma \rightarrow Z} with dense image. Such systems can then be analyzed quite efficiently using the Fourier transform, and this can then be used to satisfactorily analyze “complexity one” patterns, such as length three progressions, in arbitrary systems (or, when translated back to combinatorial settings, in arbitrary dense sets of abelian groups).

We return now to the complexity two setting. The most famous examples of Conze-Lesigne systems are (order two) nilsystems, in which the space {X} is a quotient {G/\Lambda} of a two-step nilpotent Lie group {G} by a lattice {\Lambda} (equipped with Haar probability measure), and the action is given by a translation {T^\gamma x = \phi(\gamma) x} for some group homomorphism {\phi: \Gamma \rightarrow G}. For instance, the Heisenberg {{\bf Z}}-nilsystem

\displaystyle  \begin{pmatrix} 1 & {\bf R} & {\bf R} \\ 0 & 1 & {\bf R} \\ 0 & 0 & 1 \end{pmatrix} / \begin{pmatrix} 1 & {\bf Z} & {\bf Z} \\ 0 & 1 & {\bf Z} \\ 0 & 0 & 1 \end{pmatrix}

with a shift of the form

\displaystyle  Tx = \begin{pmatrix} 1 & \alpha & 0 \\ 0 & 1 & \beta \\ 0 & 0 & 1 \end{pmatrix} x

for {\alpha,\beta} two real numbers with {1,\alpha,\beta} linearly independent over {{\bf Q}}, is a Conze-Lesigne system. As the base case of a well known result of Host and Kra, it is shown in fact that all Conze-Lesigne {{\bf Z}}-systems are inverse limits of nilsystems (previous results in this direction were obtained by Conze-Lesigne, Furstenberg-Weiss, and others). Similar results are known for {\Gamma}-systems when {\Gamma} is finitely generated, thanks to the thesis work of Griesmer (with further proofs by Gutman-Lian and Candela-Szegedy). However, this is not the case once {\Gamma} is not finitely generated; as a recent example of Shalom shows, Conze-Lesigne systems need not be the inverse limit of nilsystems in this case.

Our main result is that even in the infinitely generated case, Conze-Lesigne systems are still inverse limits of a slight generalisation of the nilsystem concept, in which {G} is a locally compact Polish group rather than a Lie group:

Theorem 1 (Classification of Conze-Lesigne systems) Let {\Gamma} be a countable abelian group, and {X} an ergodic measure-preserving {\Gamma}-system. Then {X} is a Conze-Lesigne system if and only if it is the inverse limit of translational systems {G/\Lambda}, where {G} is a nilpotent locally compact Polish group of nilpotency class two, and {\Lambda} is a lattice in {G} (and also a lattice in the commutator group {[G,G]}), with {G/\Lambda} equipped with the Haar probability measure and a translation action {T^\gamma x = \phi(\gamma) x} for some homomorphism {\phi: \Gamma \rightarrow G}.

In a forthcoming companion paper to this one, Asgar Jamneshan and I will use this theorem to derive an inverse theorem for the Gowers norm {U^3(G)} for an arbitrary finite abelian group {G} (with no restrictions on the order of {G}, in particular our result handles the case of even and odd {|G|} in a unified fashion). In principle, having a higher order version of this theorem will similarly allow us to derive inverse theorems for {U^{s+1}(G)} norms for arbitrary {s} and finite abelian {G}; we hope to investigate this further in future work.

We sketch some of the main ideas used to prove the theorem. The existing machinery developed by Conze-Lesigne, Furstenberg-Weiss, Host-Kra, and others allows one to describe an arbitrary Conze-Lesigne system as a group extension {Z \rtimes_\rho K}, where {Z} is a Kronecker system (a rotational system on a compact abelian group {Z = (Z,+)} and translation action {\phi: \Gamma \rightarrow Z}), {K = (K,+)} is another compact abelian group, and the cocycle {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} is a collection of measurable maps {\rho_\gamma: Z \rightarrow K} obeying the cocycle equation

\displaystyle  \rho_{\gamma_1+\gamma_2}(x) = \rho_{\gamma_1}(T^{\gamma_2} x) + \rho_{\gamma_2}(x) \ \ \ \ \ (1)

for almost all {x \in Z}. Furthermore, {\rho} is of “type two”, which means in this concrete setting that it obeys an additional equation

\displaystyle  \rho_\gamma(x + z_1 + z_2) - \rho_\gamma(x+z_1) - \rho_\gamma(x+z_2) + \rho_\gamma(x) \ \ \ \ \ (2)

\displaystyle  = F(x + \phi(\gamma), z_1, z_2) - F(x,z_1,z_2)

for all {\gamma \in \Gamma} and almost all {x,z_1,z_2 \in Z}, and some measurable function {F: Z^3 \rightarrow K}; roughly speaking this asserts that {\rho_\gamma} is “linear up to coboundaries”. For technical reasons it is also convenient to reduce to the case where {Z} is separable. The problem is that the equation (2) is unwieldy to work with. In the model case when the target group {K} is a circle {{\bf T} = {\bf R}/{\bf Z}}, one can use some Fourier analysis to convert (2) into the more tractable Conze-Lesigne equation

\displaystyle  \rho_\gamma(x+z) - \rho_\gamma(x) = F_z(x+\phi(\gamma)) - F_z(x) + c_z(\gamma) \ \ \ \ \ (3)

for all {\gamma \in \Gamma}, all {z \in Z}, and almost all {x \in Z}, where for each {z}, {F_z: Z \rightarrow K} is a measurable function, and {c_z: \Gamma \rightarrow K} is a homomorphism. (For technical reasons it is often also convenient to enforce that {F_z, c_z} depend in a measurable fashion on {z}; this can always be achieved, at least when the Conze-Lesigne system is separable, but actually verifying that this is possible actually requires a certain amount of effort, which we devote an appendix to in our paper.) It is not difficult to see that (3) implies (2) for any group {K} (as long as one has the measurability in {z} mentioned previously), but the converse turns out to fail for some groups {K}, such as solenoid groups (e.g., inverse limits of {{\bf R}/2^n{\bf Z}} as {n \rightarrow \infty}), as was essentially shown by Rudolph. However, in our paper we were able to find a separate argument that also derived the Conze-Lesigne equation in the case of a cyclic group {K = \frac{1}{N}{\bf Z}/{\bf Z}}. Putting together the {K={\bf T}} and {K = \frac{1}{N}{\bf Z}/{\bf Z}} cases, one can then derive the Conze-Lesigne equation for arbitrary compact abelian Lie groups {K} (as such groups are isomorphic to direct products of finitely many tori and cyclic groups). As has been known for some time (see e.g., this paper of Host and Kra), once one has a Conze-Lesigne equation, one can more or less describe the system {X} as a translational system {G/\Lambda}, where the Host-Kra group {G} is the set of all pairs {(z, F_z)} that solve an equation of the form (3) (with these pairs acting on {X \equiv Z \rtimes_\rho K} by the law {(z,F_z) \cdot (x,k) := (x+z, k+F_z(x))}), and {\Lambda} is the stabiliser of a point in this system. This then establishes the theorem in the case when {K} is a Lie group, and the general case basically comes from the fact (from Fourier analysis or the Peter-Weyl theorem) that an arbitrary compact abelian group is an inverse limit of Lie groups. (There is a technical issue here in that one has to check that the space of translational system factors of {X} form a directed set in order to have a genuine inverse limit, but this can be dealt with by modifications of the tools mentioned here.)

There is an additional technical issue worth pointing out here (which unfortunately was glossed over in some previous work in the area). Because the cocycle equation (1) and the Conze-Lesigne equation (3) are only valid almost everywhere instead of everywhere, the action of {G} on {X} is technically only a near-action rather than a genuine action, and as such one cannot directly define {\Lambda} to be the stabiliser of a point without running into multiple problems. To fix this, one has to pass to a topological model of {X} in which the action becomes continuous, and the stabilizer becomes well defined, although one then has to work a little more to check that the action is still transitive. This can be done via Gelfand duality; we proceed using a mixture of a construction from this book of Host and Kra, and the machinery in this recent paper of Asgar and myself.

Now we discuss how to establish the Conze-Lesigne equation (3) in the cyclic group case {K = \frac{1}{N}{\bf Z}/{\bf Z}}. As this group embeds into the torus {{\bf T}}, it is easy to use existing methods to obtain (3) but with the homomorphism {c_z} and the function {F_z} taking values in {{\bf R}/{\bf Z}} rather than in {\frac{1}{N}{\bf Z}/{\bf Z}}. The main task is then to fix up the homomorphism {c_z} so that it takes values in {\frac{1}{N}{\bf Z}/{\bf Z}}, that is to say that {Nc_z} vanishes. This only needs to be done locally near the origin, because the claim is easy when {z} lies in the dense subgroup {\phi(\Gamma)} of {Z}, and also because the claim can be shown to be additive in {z}. Near the origin one can leverage the Steinhaus lemma to make {c_z} depend linearly (or more precisely, homomorphically) on {z}, and because the cocycle {\rho} already takes values in {\frac{1}{N}{\bf Z}/{\bf Z}}, {N\rho} vanishes and {Nc_z} must be an eigenvalue of the system {Z}. But as {Z} was assumed to be separable, there are only countably many eigenvalues, and by another application of Steinhaus and linearity one can then make {Nc_z} vanish on an open neighborhood of the identity, giving the claim.

A popular way to visualise relationships between some finite number of sets is via Venn diagrams, or more generally Euler diagrams. In these diagrams, a set is depicted as a two-dimensional shape such as a disk or a rectangle, and the various Boolean relationships between these sets (e.g., that one set is contained in another, or that the intersection of two of the sets is equal to a third) are represented by the Boolean algebra of these shapes; Venn diagrams correspond to the case where the sets are in “general position” in the sense that all non-trivial Boolean combinations of the sets are non-empty. For instance, to depict the general situation of two sets {A,B} together with their intersection {A \cap B} and union {A \cup B}, one might use a Venn diagram such as

[Figure: Venn diagram of two sets {A} and {B}]

(where we have given each region depicted a different color, and moved the edges of each region a little away from each other in order to make them all visible separately), but if one wanted to instead depict a situation in which the intersection {A \cap B} was empty, one could use an Euler diagram such as

[Figure: Euler diagram of two disjoint sets {A} and {B}]

One can use the area of various regions in a Venn or Euler diagram as a heuristic proxy for the cardinality {|A|} (or measure {\mu(A)}) of the set {A} corresponding to such a region. For instance, the above Venn diagram can be used to intuitively justify the inclusion-exclusion formula

\displaystyle  |A \cup B| = |A| + |B| - |A \cap B|

for finite sets {A,B}, while the above Euler diagram similarly justifies the special case

\displaystyle  |A \cup B| = |A| + |B|

for finite disjoint sets {A,B}.
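
The same cardinality heuristic can of course be checked directly on small finite sets (a trivial illustration):

```python
A, B = {1, 2, 3, 4}, {3, 4, 5}
assert len(A | B) == len(A) + len(B) - len(A & B)   # inclusion-exclusion: 5 = 4 + 3 - 2
C, D = {1, 2}, {3, 4, 5}                            # disjoint sets
assert len(C | D) == len(C) + len(D)                # 5 = 2 + 3
```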

While Venn and Euler diagrams are traditionally two-dimensional in nature, there is nothing preventing one from using one-dimensional diagrams such as

[Figure: a one-dimensional Venn diagram]

or even three-dimensional diagrams such as this one from Wikipedia:

[Figure: a three-dimensional Venn diagram]

Of course, in such cases one would use length or volume as a heuristic proxy for cardinality or measure, rather than area.

With the addition of arrows, Venn and Euler diagrams can also accommodate (to some extent) functions between sets. Here for instance is a depiction of a function {f: A \rightarrow B}, the image {f(A)} of that function, and the image {f(A')} of some subset {A'} of {A}:

[Figure: a function {f: A \rightarrow B} together with the images {f(A)} and {f(A')}]

Here one can illustrate surjectivity of {f: A \rightarrow B} by having {f(A)} fill out all of {B}; one can similarly illustrate injectivity of {f} by giving {f(A)} exactly the same shape (or at least the same area) as {A}. So here for instance might be how one would illustrate an injective function {f: A \rightarrow B}:

[Figure: an injective function {f: A \rightarrow B}]

Cartesian product operations can be incorporated into these diagrams by appropriate combinations of one-dimensional and two-dimensional diagrams. Here for instance is a diagram that illustrates the identity {(A \cup B) \times C = (A \times C) \cup (B \times C)}:

[Figure: a diagram illustrating {(A \cup B) \times C = (A \times C) \cup (B \times C)}]

In this blog post I would like to propose a similar family of diagrams to illustrate relationships between vector spaces (over a fixed base field {k}, such as the reals) or abelian groups, rather than sets. The categories of ({k}-)vector spaces and abelian groups are quite similar in many ways; the former consists of modules over a base field {k}, while the latter consists of modules over the integers {{\bf Z}}; also, both categories are basic examples of abelian categories. The notion of a dimension in a vector space is analogous in many ways to that of cardinality of a set; see this previous post for an instance of this analogy (in the context of Shannon entropy). (UPDATE: I have learned that an essentially identical notation has also been proposed in an unpublished manuscript of Ravi Vakil.)


In everyday usage, we rely heavily on percentages to quantify probabilities and proportions: we might say that a prediction is {50\%} accurate or {80\%} accurate, that there is a {2\%} chance of dying from some disease, and so forth. However, for those without extensive mathematical training, it can sometimes be difficult to assess whether a given percentage amounts to a “good” or “bad” outcome, because this depends very much on the context of how the percentage is used. For instance:

  • (i) In a two-party election, an outcome of say {51\%} to {49\%} might be considered close, but {55\%} to {45\%} would probably be viewed as a convincing mandate, and {60\%} to {40\%} would likely be viewed as a landslide.
  • (ii) Similarly, if one were to poll an upcoming election, a poll of {51\%} to {49\%} would be too close to call, {55\%} to {45\%} would be an extremely favorable result for the candidate, and {60\%} to {40\%} would mean that it would be a major upset if the candidate lost the election.
  • (iii) On the other hand, a medical operation that only had a {51\%}, {55\%}, or {60\%} chance of success would be viewed as being incredibly risky, especially if failure meant death or permanent injury to the patient. Even an operation that was {90\%} or {95\%} likely to be non-fatal (i.e., a {10\%} or {5\%} chance of death) would not be conducted lightly.
  • (iv) A weather prediction of, say, {30\%} chance of rain during a vacation trip might be sufficient cause to pack an umbrella, even though it is more likely than not that rain would not occur. On the other hand, if the prediction was for an {80\%} chance of rain, and it ended up that the skies remained clear, this does not seriously damage the accuracy of the prediction – indeed, such an outcome would be expected in one out of every five such predictions.
  • (v) Even extremely tiny percentages of toxic chemicals in everyday products can be considered unacceptable. For instance, EPA rules require action to be taken when the percentage of lead in drinking water exceeds {0.0000015\%} (15 parts per billion). At the opposite extreme, recycling contamination rates as high as {10\%} are often considered acceptable.

Because of all the very different ways in which percentages could be used, I think it may make sense to propose an alternate system of units to measure one class of probabilities, namely the probabilities of avoiding some highly undesirable outcome, such as death, accident or illness. The units I propose are that of “nines“, which are already commonly used to measure availability of some service or purity of a material, but can be equally used to measure the safety (i.e., lack of risk) of some activity. Informally, nines measure how many consecutive appearances of the digit {9} are in the probability of successfully avoiding the negative outcome, thus

  • {90\%} success = one nine of safety
  • {99\%} success = two nines of safety
  • {99.9\%} success = three nines of safety
and so forth. Using the mathematical device of logarithms, one can also assign a fractional number of nines of safety to a general probability:

Definition 1 (Nines of safety) An activity (affecting one or more persons, over some given period of time) that has a probability {p} of the “safe” outcome and probability {1-p} of the “unsafe” outcome will have {k} nines of safety against the unsafe outcome, where {k} is defined by the formula

\displaystyle  k = -\log_{10}(1-p) \ \ \ \ \ (1)

(where {\log_{10}} is the logarithm to base ten), or equivalently

\displaystyle  p = 1 - 10^{-k}. \ \ \ \ \ (2)

Remark 2 Because of the various uncertainties in measuring probabilities, as well as the inaccuracies in some of the assumptions and approximations we will be making later, we will not attempt to measure the number of nines of safety beyond the first decimal point; thus we will round to the nearest tenth of a nine of safety throughout this post.
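
For concreteness, here is a minimal sketch (illustrative only, with function names of my own choosing) of the two conversions in Definition 1, using the rounding convention of Remark 2:

```python
from math import log10, inf

def nines_of_safety(p):
    """k = -log10(1 - p), the number of nines of safety for success probability p."""
    return inf if p == 1 else -log10(1 - p)

def success_probability(k):
    """The inverse conversion p = 1 - 10**(-k)."""
    return 1 - 10 ** (-k)

print(round(nines_of_safety(0.95), 1))                 # 1.3
print(round(nines_of_safety(0.999), 1))                # 3.0
print(round(nines_of_safety(1 - 11 / 100_000), 1))     # a rate of 11 per 100,000 gives 4.0
print(success_probability(2))                          # 0.99
```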

Here is a conversion table between percentage rates of success (the safe outcome), failure (the unsafe outcome), and the number of nines of safety one has:

Success rate {p} | Failure rate {1-p} | Number of nines {k}
{0\%} | {100\%} | {0.0}
{50\%} | {50\%} | {0.3}
{75\%} | {25\%} | {0.6}
{80\%} | {20\%} | {0.7}
{90\%} | {10\%} | {1.0}
{95\%} | {5\%} | {1.3}
{97.5\%} | {2.5\%} | {1.6}
{98\%} | {2\%} | {1.7}
{99\%} | {1\%} | {2.0}
{99.5\%} | {0.5\%} | {2.3}
{99.75\%} | {0.25\%} | {2.6}
{99.8\%} | {0.2\%} | {2.7}
{99.9\%} | {0.1\%} | {3.0}
{99.95\%} | {0.05\%} | {3.3}
{99.975\%} | {0.025\%} | {3.6}
{99.98\%} | {0.02\%} | {3.7}
{99.99\%} | {0.01\%} | {4.0}
{100\%} | {0\%} | infinite

Thus, if one has no nines of safety whatsoever, one is guaranteed to fail; but each nine of safety one has reduces the failure rate by a factor of {10}. In an ideal world, one would have infinitely many nines of safety against any risk, but in practice there are no {100\%} guarantees against failure, and so one can only expect a finite amount of nines of safety in any given situation. Realistically, one should thus aim to have as many nines of safety as one can reasonably expect to have, but not to demand an infinite amount.

Remark 3 The number of nines of safety against a certain risk is not absolute; it will depend not only on the risk itself, but also on (a) the number of people exposed to the risk, and (b) the length of time for which one is exposed to the risk. Exposing more people or increasing the duration of exposure will reduce the number of nines, and conversely exposing fewer people or reducing the duration will increase the number of nines; see Proposition 7 below for a rough rule of thumb in this regard.

Remark 4 Nines of safety are a logarithmic scale of measurement, rather than a linear scale. Other familiar examples of logarithmic scales of measurement include the Richter scale of earthquake magnitude, the pH scale of acidity, the decibel scale of sound level, octaves in music, and the magnitude scale for stars.

Remark 5 One way to think about nines of safety is via the Swiss cheese model that was created recently to describe pandemic risk management. In this model, each nine of safety can be thought of as a slice of Swiss cheese, with holes occupying {10\%} of that slice. Having {k} nines of safety is then analogous to standing behind {k} such slices of Swiss cheese. In order for a risk to actually impact you, it must pass through each of these {k} slices. A fractional nine of safety corresponds to a fractional slice of Swiss cheese that covers the amount of space given by the above table. For instance, {0.6} nines of safety corresponds to a fractional slice that covers about {75\%} of the given area (leaving {25\%} uncovered).

Now to give some real-world examples of nines of safety. Using data for deaths in the US in 2019 (without attempting to account for factors such as age and gender), a random US citizen will have had the following amount of safety from dying from some selected causes in that year:

Cause of death Mortality rate per {100,\! 000} (approx.) Nines of safety
All causes {870} {2.0}
Heart disease {200} {2.7}
Cancer {180} {2.7}
Accidents {52} {3.3}
Drug overdose {22} {3.7}
Influenza/Pneumonia {15} {3.8}
Suicide {14} {3.8}
Gun violence {12} {3.9}
Car accident {11} {4.0}
Murder {5} {4.3}
Airplane crash {0.14} {5.9}
Lightning strike {0.006} {7.2}

The safety of air travel is particularly remarkable: a given hour of flying in general aviation has a fatality rate of {0.00001}, or about {5} nines of safety, while for the major carriers the fatality rate drops down to {0.00000005}, or about {7.3} nines of safety.

Of course, in 2020, COVID-19 deaths became significant. In this year in the US, the mortality rate for COVID-19 (as the underlying or contributing cause of death) was {91.5} per {100,\! 000}, corresponding to {3.0} nines of safety, which was less safe than all other causes of death except for heart disease and cancer. At this time of writing, data for all of 2021 is of course not yet available, but it seems likely that the safety level would be even lower for this year.
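
As a quick sanity check on these figures, converting a mortality rate quoted per {100,\! 000} into nines of safety is a one-line computation; here is a sketch in Python (the function name is mine):

    import math

    def nines_from_rate(deaths_per_100k):
        """Nines of safety corresponding to a mortality rate per 100,000 people."""
        return -math.log10(deaths_per_100k / 100_000)

    print(round(nines_from_rate(91.5), 1))   # 3.0, the COVID-19 figure quoted above
    print(round(nines_from_rate(200), 1))    # 2.7, heart disease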

Some further illustrations of the concept of nines of safety:

  • Each round of Russian roulette has a success rate of {5/6}, providing only {0.8} nines of safety. Of course, the safety will decrease with each additional round: one has only {0.5} nines of safety after two rounds, {0.4} nines after three rounds, and so forth. (See also Proposition 7 below.)
  • The ancient Roman punishment of decimation, by definition, provided exactly one nine of safety to each soldier being punished.
  • Rolling a {1} on a {20}-sided die is a risk that carries about {1.3} nines of safety.
  • Rolling a double one (“snake eyes“) from two six-sided dice carries about {1.6} nines of safety.
  • One has about {2.6} nines of safety against the risk of someone randomly guessing your birthday on the first attempt.
  • A null hypothesis has {1.3} nines of safety against producing a {p = 0.05} statistically significant result, and {2.0} nines against producing a {p=0.01} statistically significant result. (However, one has to be careful when reversing the conditional; a {p=0.01} statistically significant result does not necessarily have {2.0} nines of safety against the null hypothesis. In Bayesian statistics, the precise relationship between the two risks is given by Bayes’ theorem.)
  • If a poker opponent is dealt a five-card hand, one has {5.8} nines of safety against that opponent being dealt a royal flush, {4.8} against a straight flush or higher, {3.6} against four-of-a-kind or higher, {2.8} against a full house or higher, {2.4} against a flush or higher, {2.1} against a straight or higher, {1.5} against three-of-a-kind or higher, {1.1} against two pairs or higher, and just {0.3} against one pair or higher. (This data was converted from this Wikipedia table.)
  • A {k}-digit PIN number (or a {k}-digit combination lock) carries {k} nines of safety against each attempt to randomly guess the PIN. A length {k} password that allows for numbers, upper and lower case letters, and punctuation carries about {2k} nines of safety against a single guess. (For the reduction in safety caused by multiple guesses, see Proposition 7 below.)

Here is another way to think about nines of safety:

Proposition 6 (Nines of safety extend expected onset of risk) Suppose a certain risky activity has {k} nines of safety. If one repeatedly indulges in this activity until the risk occurs, then the expected number of trials before the risk occurs is {10^k}.

Proof: The probability that the risk is activated after exactly {n} trials is {(1-10^{-k})^{n-1} 10^{-k}}, which is a geometric distribution of parameter {10^{-k}}. The claim then follows from the standard properties of that distribution. \Box
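
Here is a small Monte Carlo sketch in Python illustrating the proposition (assuming, as in the proposition, independent and identically distributed trials; the function names are mine):

    import random

    def expected_trials(k):
        """Expected number of trials before the risk occurs (Proposition 6)."""
        return 10 ** k

    def simulated_trials(k, runs=100_000):
        """Average number of trials until the first failure, over many simulated runs."""
        q = 10 ** (-k)                  # failure probability per trial
        total = 0
        for _ in range(runs):
            n = 1
            while random.random() > q:  # success: try again
                n += 1
            total += n
        return total / runs

    print(expected_trials(1.0))         # 10.0 (one nine of safety)
    print(simulated_trials(1.0))        # roughly 10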

Thus, for instance, if one performs some risky activity daily, then the expected length of time before the risk occurs is given by the following table:

Daily nines of safety Expected onset of risk
{0} One day
{0.8} One week
{1.5} One month
{2.6} One year
{2.9} Two years
{3.3} Five years
{3.6} Ten years
{3.9} Twenty years
{4.3} Fifty years
{4.6} A century

Or, if one wants to convert the yearly risks of dying from a specific cause into expected years before that cause of death would occur (assuming for sake of discussion that no other cause of death exists):

Yearly nines of safety Expected onset of risk
{0} One year
{0.3} Two years
{0.7} Five years
{1} Ten years
{1.3} Twenty years
{1.7} Fifty years
{2.0} A century

These tables suggest a relationship between the amount of safety one would have over a short time frame, such as a day, and over a longer time frame, such as a year. Here is an approximate formalisation of that relationship:

Proposition 7 (Repeated exposure reduces nines of safety) If a risky activity with {k} nines of safety is (independently) repeated {m} times, then (assuming {k} is large enough depending on {m}), the repeated activity will have approximately {k - \log_{10} m} nines of safety. Conversely: if the repeated activity has {k'} nines of safety, the individual activity will have approximately {k' + \log_{10} m} nines of safety.

Proof: An activity with {k} nines of safety will be safe with probability {1-10^{-k}}, hence safe with probability {(1-10^{-k})^m} if repeated independently {m} times. For {k} large, we can approximate

\displaystyle  (1 - 10^{-k})^m \approx 1 - m 10^{-k} = 1 - 10^{-(k - \log_{10} m)}

giving the former claim. The latter claim follows from inverting the former. \Box

Remark 8 The hypothesis of independence here is key. If there is a lot of correlation between the risks between different repetitions of the activity, then there can be much less reduction in safety caused by that repetition. As a simple example, suppose that {90\%} of a workforce are trained to perform some task flawlessly no matter how many times they repeat the task, but the remaining {10\%} are untrained and will always fail at that task. If one selects a random worker and asks them to perform the task, one has {1.0} nines of safety against the task failing. If one took that same random worker and asked them to perform the task {m} times, the above proposition might suggest that the number of nines of safety would drop to approximately {1.0 - \log_{10} m}; but in this case there is perfect correlation, and in fact the number of nines of safety remains steady at {1.0} since it is the same {10\%} of the workforce that would fail each time.

Because of this caveat, one should view the above proposition as only a crude first approximation that can be used as a simple rule of thumb, but should not be relied upon for more precise calculations.
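
With that caveat in mind, here is a short Python sketch (the function names are mine) comparing the exact number of nines after {m} independent repetitions with the rule of thumb from Proposition 7:

    import math

    def repeated_nines_exact(k, m):
        """Exact nines of safety after m independent repetitions of a k-nine activity."""
        p_single = 1 - 10 ** (-k)
        return -math.log10(1 - p_single ** m)

    def repeated_nines_approx(k, m):
        """Rule of thumb from Proposition 7."""
        return k - math.log10(m)

    # e.g. converting 2.7 yearly nines of safety into safety over five years
    print(round(repeated_nines_exact(2.7, 5), 1))    # 2.0
    print(round(repeated_nines_approx(2.7, 5), 1))   # 2.0

The conversion tables below simply apply this rule of thumb with {m} equal to the ratio of the two time periods (or, later, to the number of people exposed).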

One can repeat a risk either in time (extending the time of exposure to the risk, say from a day to a year), or in space (by exposing the risk to more people). The above proposition then gives an additive conversion law for nines of safety in either case. Here are some conversion tables for time:

From/to Daily Weekly Monthly Yearly
Daily 0 -0.8 -1.5 -2.6
Weekly +0.8 0 -0.6 -1.7
Monthly +1.5 +0.6 0 -1.1
Yearly +2.6 +1.7 +1.1 0

From/to Yearly Per 5 yr Per decade Per century
Yearly 0 -0.7 -1.0 -2.0
Per 5 yr +0.7 0 -0.3 -1.3
Per decade +1.0 +0.3 0 -1.0
Per century +2.0 +1.3 +1.0 0

For instance, as mentioned before, the yearly amount of safety against cancer is about {2.7}. Using the above table (and making the somewhat unrealistic hypothesis of independence), we then predict the daily amount of safety against cancer to be about {2.7 + 2.6 = 5.3} nines, the weekly amount to be about {2.7 + 1.7 = 4.4} nines, and the amount of safety over five years to drop to about {2.7 - 0.7 = 2.0} nines.

Now we turn to conversions in space. If one knows the level of safety against a certain risk for an individual, and then one (independently) exposes a group of such individuals to that risk, then the reduction in nines of safety when considering the possibility that at least one group member experiences this risk is given by the following table:

Group Reduction in safety
You ({1} person) {0}
You and your partner ({2} people) {-0.3}
You and your parents ({3} people) {-0.5}
You, your partner, and three children ({5} people) {-0.7}
An extended family of {10} people {-1.0}
A class of {30} people {-1.5}
A workplace of {100} people {-2.0}
A school of {1,\! 000} people {-3.0}
A university of {10,\! 000} people {-4.0}
A town of {100,\! 000} people {-5.0}
A city of {1} million people {-6.0}
A state of {10} million people {-7.0}
A country of {100} million people {-8.0}
A continent of {1} billion people {-9.0}
The entire planet {-9.8}

For instance, in a given year (and making the somewhat implausible assumption of independence), you might have {2.7} nines of safety against cancer, but you and your partner collectively only have about {2.7 - 0.3 = 2.4} nines of safety against this risk, your family of five might only have about {2.7 - 0.7 = 2.0} nines of safety, and so forth. By the time one gets to a group of {1,\! 000} people, it actually becomes very likely that at least one member of the group will die of cancer in that year. (Here the precise conversion table breaks down, because a negative number of nines such as {2.7 - 3.0 = -0.3} is not possible, but one should interpret a prediction of a negative number of nines as an assertion that failure is very likely to happen. Also, in practice the reduction in safety is less than this rule predicts, due to correlations that are incompatible with the assumption of independence, such as risk factors shared by the entire group.)

In the opposite direction, any reduction in exposure (either in time or space) to a risk will increase one’s safety level, as per the following table:

Reduction in exposure Additional nines of safety
{\div 1} {0}
{\div 2} {+0.3}
{\div 3} {+0.5}
{\div 5} {+0.7}
{\div 10} {+1.0}
{\div 100} {+2.0}

For instance, a five-fold reduction in exposure will reclaim about {0.7} additional nines of safety.

Here is a slightly different way to view nines of safety:

Proposition 9 Suppose that a group of {m} people are independently exposed to a given risk. If there are at most

\displaystyle  \log_{10} \frac{1}{1-2^{-1/m}}

nines of individual safety against that risk, then there is at least a {50\%} chance that at least one member of the group is affected by the risk.

Proof: If individually there are {k} nines of safety, then the probability that all the members of the group avoid the risk is {(1-10^{-k})^m}. Since the inequality

\displaystyle  (1-10^{-k})^m \leq \frac{1}{2}

is equivalent to

\displaystyle  k \leq \log_{10} \frac{1}{1-2^{-1/m}},

the claim follows. \Box

Thus, for a group to collectively avoid a risk with at least a {50\%} chance, one needs the following level of individual safety:

Group Individual safety level required
You ({1} person) {0.3}
You and your partner ({2} people) {0.5}
You and your parents ({3} people) {0.7}
You, your partner, and three children ({5} people) {0.9}
An extended family of {10} people {1.2}
A class of {30} people {1.6}
A workplace of {100} people {2.2}
A school of {1,\! 000} people {3.2}
A university of {10,\! 000} people {4.2}
A town of {100,\! 000} people {5.2}
A city of {1} million people {6.2}
A state of {10} million people {7.2}
A country of {100} million people {8.2}
A continent of {1} billion people {9.2}
The entire planet {10.0}

For large {m}, the level {k} of nines of individual safety required to protect a group of size {m} with probability at least {50\%} is approximately {\log_{10} \frac{m}{\ln 2} \approx (\log_{10} m) + 0.2}.
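
Here is a short Python sketch (the function names are mine) computing the exact threshold of Proposition 9 together with this approximation; it reproduces the individual safety levels in the table above.

    import math

    def threshold_exact(m):
        """Largest number of individual nines for which a group of m still has
        at least a 50% chance that some member is affected (Proposition 9)."""
        return math.log10(1 / (1 - 2 ** (-1 / m)))

    def threshold_approx(m):
        """Approximation log10(m / ln 2), accurate for large m."""
        return math.log10(m / math.log(2))

    for m in (1, 30, 1000, 1_000_000):
        print(m, round(threshold_exact(m), 1), round(threshold_approx(m), 1))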

Precautions that can work to prevent a certain risk from occurring will add additional nines of safety against that risk, even if the precaution is not {100\%} effective. Here is the precise rule:

Proposition 10 (Precautions add nines of safety) Suppose an activity carries {k} nines of safety against a certain risk, and a separate precaution can independently protect against that risk with {l} nines of safety (that is to say, the probability that the protection is effective is {1 - 10^{-l}}). Then applying that precaution increases the number of nines in the activity from {k} to {k+l}.

Proof: The probability that the precaution fails and the risk then occurs is {10^{-l} \times 10^{-k} = 10^{-(k+l)}}. The claim now follows from Definition 1. \Box

In particular, we can repurpose the table at the start of this post as a conversion chart for effectiveness of a precaution:

Effectiveness Failure rate Additional nines provided
{0\%} {100\%} {+0.0}
{50\%} {50\%} {+0.3}
{75\%} {25\%} {+0.6}
{80\%} {20\%} {+0.7}
{90\%} {10\%} {+1.0}
{95\%} {5\%} {+1.3}
{97.5\%} {2.5\%} {+1.6}
{98\%} {2\%} {+1.7}
{99\%} {1\%} {+2.0}
{99.5\%} {0.5\%} {+2.3}
{99.75\%} {0.25\%} {+2.6}
{99.8\%} {0.2\%} {+2.7}
{99.9\%} {0.1\%} {+3.0}
{99.95\%} {0.05\%} {+3.3}
{99.975\%} {0.025\%} {+3.6}
{99.98\%} {0.02\%} {+3.7}
{99.99\%} {0.01\%} {+4.0}
{100\%} {0\%} infinite

Thus for instance a precaution that is {80\%} effective will add {0.7} nines of safety, a precaution that is {99.8\%} effective will add {2.7} nines of safety, and so forth. The mRNA COVID vaccines by Pfizer and Moderna have somewhere between {88\% - 96\%} effectiveness against symptomatic COVID illness, providing about {0.9-1.4} nines of safety against that risk, and over {95\%} effectiveness against severe illness, thus adding at least {1.3} nines of safety in this regard.

A slight variant of the above rule can be stated using the concept of relative risk:

Proposition 11 (Relative risk and nines of safety) Suppose an activity carries {k} nines of safety against a certain risk, and an action multiplies the chance of failure by some relative risk {R}. Then the action removes {\log_{10} R} nines of safety (if {R > 1}) or adds {-\log_{10} R} nines of safety (if {R<1}) to the original activity.

Proof: The additional action adjusts the probability of failure from {10^{-k}} to {R \times 10^{-k} = 10^{-(k - \log_{10} R)}}. The claim now follows from Definition 1. \Box
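
Propositions 10 and 11 amount to one-line computations; here is a brief Python sketch (the function names are mine) that can be used to reproduce the conversion chart below.

    import math

    def nines_added_by_precaution(effectiveness):
        """Proposition 10: nines added by a precaution of the given effectiveness."""
        return -math.log10(1 - effectiveness)

    def nines_change_from_relative_risk(R):
        """Proposition 11: change in nines caused by a relative risk R."""
        return -math.log10(R)

    print(round(nines_added_by_precaution(0.80), 1))       # 0.7
    print(round(nines_change_from_relative_risk(20), 1))   # -1.3 (a twenty-fold risk increase)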

Here is a conversion chart between relative risk and change in nines of safety:

Relative risk Change in nines of safety
{0.01} {+2.0}
{0.02} {+1.7}
{0.05} {+1.3}
{0.1} {+1.0}
{0.2} {+0.7}
{0.5} {+0.3}
{1} {0}
{2} {-0.3}
{5} {-0.7}
{10} {-1.0}
{20} {-1.3}
{50} {-1.7}
{100} {-2.0}

Some examples:

  • Smoking increases the fatality rate of lung cancer by a factor of about {20}, thus removing about {1.3} nines of safety from this particular risk; it also increases the fatality rates of several other diseases, though not to quite as dramatic an extent.
  • Seatbelts reduce the fatality rate in car accidents by a factor of about two, adding about {0.3} nines of safety. Airbags achieve a reduction of about {30-50\%}, adding about {0.2-0.3} additional nines of safety.
  • As far as transmission of COVID is concerned, it seems that constant use of face masks reduces transmission by a factor of about five (thus adding about {0.7} nines of safety), and similarly for constant adherence to social distancing; whereas for instance a {30\%} compliance with mask usage reduced transmission by about {10\%} (adding only {0.05} or so nines of safety).

The effect of combining multiple (independent) precautions together is cumulative; one can achieve quite a high level of safety by stacking together several precautions that individually have relatively low levels of effectiveness. Again, see the “Swiss cheese model” referred to in Remark 5. For instance, if face masks add {0.7} nines of safety against contracting COVID, social distancing adds another {0.7} nines, and a vaccine provides another {1.0} nines of safety, then implementing all three mitigation methods would (assuming independence) add a net of {2.4} nines of safety against contracting COVID.

In summary, when debating the value of a given risk mitigation measure, the correct question to ask is not quite "Is it certain to work?" or "Can it fail?", but rather "How many extra nines of safety does it add?".

As one final comparison between nines of safety and other standard risk measures, we give the following proposition regarding large deviations from the mean.

Proposition 12 Let {X} be a normally distributed random variable of standard deviation {\sigma}, and let {\lambda > 0}. Then the “one-sided risk” of {X} exceeding its mean {{\bf E} X} by at least {\lambda \sigma} (i.e., {X \geq {\bf E} X + \lambda \sigma}) carries

\displaystyle  -\log_{10} \frac{1 - \mathrm{erf}(\lambda/\sqrt{2})}{2}

nines of safety, the “two-sided risk” of {X} deviating (in either direction) from its mean by at least {\lambda \sigma} (i.e., {|X-{\bf E} X| \geq \lambda \sigma}) carries

\displaystyle  -\log_{10} (1 - \mathrm{erf}(\lambda/\sqrt{2}))

nines of safety, where {\mathrm{erf}} is the error function.

Proof: This is a routine calculation using the cumulative distribution function of the normal distribution. \Box
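
As a quick numerical check, here is a brief Python sketch evaluating these expressions (using math.erfc, which equals {1 - \mathrm{erf}} but is more accurate for large arguments; the function names are mine):

    import math

    def one_sided_nines(lam):
        """Nines of safety against X exceeding its mean by at least lam standard deviations."""
        return -math.log10(math.erfc(lam / math.sqrt(2)) / 2)

    def two_sided_nines(lam):
        """Nines of safety against X deviating from its mean by at least lam standard deviations."""
        return -math.log10(math.erfc(lam / math.sqrt(2)))

    for lam in range(7):
        print(lam, round(one_sided_nines(lam), 1), round(two_sided_nines(lam), 1))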

Here is a short table illustrating this proposition:

Number {\lambda} of deviations from the mean One-sided nines of safety Two-sided nines of safety
{0} {0.3} {0.0}
{1} {0.8} {0.5}
{2} {1.6} {1.3}
{3} {2.9} {2.6}
{4} {4.5} {4.2}
{5} {6.5} {6.2}
{6} {9.0} {8.7}

Thus, for instance, the risk of a five sigma event (deviating by more than five standard deviations from the mean in either direction) should carry {6.2} nines of safety assuming a normal distribution, and so one would ordinarily feel extremely safe against the possibility of such an event, unless one started doing hundreds of thousands of trials. (However, we caution that this conclusion relies heavily on the assumption that one has a normal distribution!)

See also this older essay I wrote on anonymity on the internet, using bits as a measure of anonymity in much the same way that nines are used here as a measure of safety.
