Theorem 1 (Inverse theorem for ) Let be a finite abelian group, and let be a -bounded function with for some . Then:
- (i) (Correlation with locally quadratic phase) There exists a regular Bohr set with and , a locally quadratic function , and a function such that
- (ii) (Correlation with nilsequence) There exists an explicit degree two filtered nilmanifold of dimension , a polynomial map , and a Lipschitz function of constant such that
Such a theorem was proven by Ben Green and myself in the case when was odd, and by Samorodnitsky in the -torsion case . In all cases one uses the “higher order Fourier analysis” techniques introduced by Gowers. After some now-standard manipulations (using for instance what is now known as the Balog-Szemerédi-Gowers lemma), one arrives (for arbitrary ) at an estimate that is roughly of the form
where denotes various -bounded functions whose exact values are not too important, and is a symmetric locally bilinear form. The idea is then to “integrate” this form by expressing it in the form for some locally quadratic ; this then allows us to write the above correlation as (after adjusting the functions suitably), and one can now conclude part (i) of the above theorem using some linear Fourier analysis. Part (ii) follows by encoding locally quadratic phase functions as nilsequences; for this we adapt an algebraic construction of Manners.
So the key step is to obtain a representation of the form (1), possibly after shrinking the Bohr set a little if needed. This has been done in the literature in two ways:
In our paper we can now treat the case of arbitrary finite abelian groups , by means of the following two new ingredients:
To illustrate (i), consider the Bohr set in (where denotes the distance to the nearest integer), and consider a locally bilinear form of the form for some real number and all integers (which we identify with elements of ). For generic , this form cannot be extended to a globally bilinear form on ; however, if one lifts to the finitely generated abelian group
(with projection map ) and introduces the globally bilinear form by the formula then one has (2) when lie in the interval . A similar construction works for higher rank Bohr sets.
To illustrate (ii), the key case turns out to be when is a cyclic group , in which case will take the form
for some integer . One can then check by direct construction that (1) will be obeyed with regardless of whether is even or odd. A variant of this construction also works for , and the general case follows from a short calculation verifying that the claim (ii) for any two groups implies the corresponding claim (ii) for the product .
This concludes the Fourier-analytic proof of Theorem 1. In this paper we also give an ergodic theory proof of (a qualitative version of) Theorem 1(ii), using a correspondence principle argument adapted from this previous paper of Ziegler and myself. Basically, the idea is to randomly generate a dynamical system on the group , by selecting an infinite number of random shifts , which induces an action of the infinitely generated free abelian group on by the formula
Much as the law of large numbers ensures the almost sure convergence of Monte Carlo integration, one can show that this action is almost surely ergodic (after passing to a suitable Furstenberg-type limit where the size of goes to infinity), and that the dynamical Host-Kra-Gowers seminorms of that system coincide with the combinatorial Gowers norms of the original functions. One is then well placed to apply an inverse theorem for the third Host-Kra-Gowers seminorm for -actions, which was accomplished in the companion paper to this one. After doing so, one almost gets the desired conclusion of Theorem 1(ii), except that after undoing the application of the Furstenberg correspondence principle, the map is merely an almost polynomial rather than a polynomial, which roughly speaking means that instead of certain derivatives of vanishing, they are merely very small outside of a small exceptional set. To conclude we need to invoke a “stability of polynomials” result, which at this level of generality was first established by Candela and Szegedy (though we also provide an independent proof here in an appendix), which roughly speaking asserts that every approximate polynomial is close in measure to an actual polynomial. (This general strategy is also employed in the Candela-Szegedy paper, though in the absence of the ergodic inverse theorem input that we rely upon here, the conclusion is weaker in that the filtered nilmanifold is replaced with a general space known as a “CFR nilspace”.)
This transference principle approach seems to work well for the higher step cases (for instance, the stability of polynomials result is known in arbitrary degree); the main difficulty is to establish a suitable higher step inverse theorem in the ergodic theory setting, which we hope to do in future research.
The analogous theory in complexity one is well understood. Here, one replaces the norm by the norm
and the ergodic systems for which is a norm are called Kronecker systems. These systems are completely classified: a system is Kronecker if and only if it arises from a compact abelian group equipped with Haar probability measure and a translation action for some homomorphism with dense image. Such systems can then be analyzed quite efficiently using the Fourier transform, and this can then be used to satisfactorily analyze “complexity one” patterns, such as length three progressions, in arbitrary systems (or, when translated back to combinatorial settings, in arbitrary dense sets of abelian groups).
We return now to the complexity two setting. The most famous examples of Conze-Lesigne systems are (order two) nilsystems, in which the space is a quotient of a two-step nilpotent Lie group by a lattice (equipped with Haar probability measure), and the action is given by a translation for some group homomorphism . For instance, the Heisenberg -nilsystem
with a shift of the form for two real numbers with linearly independent over , is a Conze-Lesigne system. As the base case of a well known result of Host and Kra, it is shown in fact that all Conze-Lesigne -systems are inverse limits of nilsystems (previous results in this direction were obtained by Conze-Lesigne, Furstenberg-Weiss, and others). Similar results are known for -systems when is finitely generated, thanks to the thesis work of Griesmer (with further proofs by Gutman-Lian and Candela-Szegedy). However, this is not the case once is not finitely generated; as a recent example of Shalom shows, Conze-Lesigne systems need not be the inverse limit of nilsystems in this case.
Our main result is that even in the infinitely generated case, Conze-Lesigne systems are still inverse limits of a slight generalisation of the nilsystem concept, in which is a locally compact Polish group rather than a Lie group:
Theorem 1 (Classification of Conze-Lesigne systems) Let be a countable abelian group, and an ergodic measure-preserving -system. Then is a Conze-Lesigne system if and only if it is the inverse limit of translational systems , where is a nilpotent locally compact Polish group of nilpotency class two, and is a lattice in (and also a lattice in the commutator group ), with equipped with the Haar probability measure and a translation action for some homomorphism .
In a forthcoming companion paper to this one, Asgar Jamneshan and I will use this theorem to derive an inverse theorem for the Gowers norm for an arbitrary finite abelian group (with no restrictions on the order of , in particular our result handles the case of even and odd in a unified fashion). In principle, having a higher order version of this theorem will similarly allow us to derive inverse theorems for norms for arbitrary and finite abelian ; we hope to investigate this further in future work.
We sketch some of the main ideas used to prove the theorem. The existing machinery developed by Conze-Lesigne, Furstenberg-Weiss, Host-Kra, and others allows one to describe an arbitrary Conze-Lesigne system as a group extension , where is a Kronecker system (a rotational system on a compact abelian group and translation action ), is another compact abelian group, and the cocycle is a collection of measurable maps obeying the cocycle equation
for almost all . Furthermore, is of “type two”, which means in this concrete setting that it obeys an additional equation for all and almost all , and some measurable function ; roughly speaking this asserts that is “linear up to coboundaries”. For technical reasons it is also convenient to reduce to the case where is separable. The problem is that the equation (2) is unwieldy to work with. In the model case when the target group is a circle , one can use some Fourier analysis to convert (2) into the more tractable Conze-Lesigne equation for all , all , and almost all , where for each , is a measurable function, and is a homomorphism. (For technical reasons it is often also convenient to enforce that depend in a measurable fashion on ; this can always be achieved, at least when the Conze-Lesigne system is separable, but verifying that this is possible requires a certain amount of effort, to which we devote an appendix in our paper.) It is not difficult to see that (3) implies (2) for any group (as long as one has the measurability in mentioned previously), but the converse turns out to fail for some groups , such as solenoid groups (e.g., inverse limits of as ), as was essentially shown by Rudolph. However, in our paper we were able to find a separate argument that also derived the Conze-Lesigne equation in the case of a cyclic group . Putting together the and cases, one can then derive the Conze-Lesigne equation for arbitrary compact abelian Lie groups (as such groups are isomorphic to direct products of finitely many tori and cyclic groups). As has been known for some time (see e.g., this paper of Host and Kra), once one has a Conze-Lesigne equation, one can more or less describe the system as a translational system , where the Host-Kra group is the set of all pairs that solve an equation of the form (3) (with these pairs acting on by the law ), and is the stabiliser of a point in this system.
This then establishes the theorem in the case when is a Lie group, and the general case basically comes from the fact (from Fourier analysis or the Peter-Weyl theorem) that an arbitrary compact abelian group is an inverse limit of Lie groups. (There is a technical issue here in that one has to check that the space of translational system factors of form a directed set in order to have a genuine inverse limit, but this can be dealt with by modifications of the tools mentioned here.)
There is an additional technical issue worth pointing out here (which unfortunately was glossed over in some previous work in the area). Because the cocycle equation (1) and the Conze-Lesigne equation (3) are only valid almost everywhere instead of everywhere, the action of on is technically only a near-action rather than a genuine action, and as such one cannot directly define to be the stabiliser of a point without running into multiple problems. To fix this, one has to pass to a topological model of in which the action becomes continuous, and the stabiliser becomes well defined, although one then has to work a little more to check that the action is still transitive. This can be done via Gelfand duality; we proceed using a mixture of a construction from this book of Host and Kra, and the machinery in this recent paper of Asgar and myself.
Now we discuss how to establish the Conze-Lesigne equation (3) in the cyclic group case . As this group embeds into the torus , it is easy to use existing methods to obtain (3) but with the homomorphism and the function taking values in rather than in . The main task is then to fix up the homomorphism so that it takes values in , that is to say that vanishes. This only needs to be done locally near the origin, because the claim is easy when lies in the dense subgroup of , and also because the claim can be shown to be additive in . Near the origin one can leverage the Steinhaus lemma to make depend linearly (or more precisely, homomorphically) on , and because the cocycle already takes values in , vanishes and must be an eigenvalue of the system . But as was assumed to be separable, there are only countably many eigenvalues, and by another application of Steinhaus and linearity one can then make vanish on an open neighborhood of the identity, giving the claim.
(where we have given each region depicted a different color, and moved the edges of each region a little away from each other in order to make them all visible separately), but if one wanted to instead depict a situation in which the intersection was empty, one could use an Euler diagram such as
One can use the area of various regions in a Venn or Euler diagram as a heuristic proxy for the cardinality (or measure ) of the set corresponding to such a region. For instance, the above Venn diagram can be used to intuitively justify the inclusion-exclusion formula
for finite sets , while the above Euler diagram similarly justifies the special case for finite disjoint sets .
While Venn and Euler diagrams are traditionally two-dimensional in nature, there is nothing preventing one from using one-dimensional diagrams such as
or even three-dimensional diagrams such as this one from Wikipedia:
Of course, in such cases one would use length or volume as a heuristic proxy for cardinality or measure, rather than area.
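As a quick sanity check of the inclusion-exclusion formula mentioned above, here is a minimal Python sketch. It assumes the three-set form of the formula (the two-set case is recovered by taking the third set empty); the sets `A`, `B`, `C` and the helper name `inclusion_exclusion` are made up purely for illustration.

```python
# Hypothetical finite sets, chosen only for illustration.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 7}


def inclusion_exclusion(a: set, b: set, c: set) -> int:
    """Right-hand side of the three-set inclusion-exclusion formula."""
    return (
        len(a) + len(b) + len(c)
        - len(a & b) - len(a & c) - len(b & c)
        + len(a & b & c)
    )


# The cardinality of the union should agree with the alternating sum.
print(len(A | B | C), inclusion_exclusion(A, B, C))

# Disjoint special case: all intersection terms vanish, leaving a plain sum.
D, E, F = {1, 2}, {3}, {4, 5}
print(len(D | E | F), len(D) + len(E) + len(F))
```

In the disjoint case every intersection is empty, which is the situation the Euler diagram depicts.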
With the addition of arrows, Venn and Euler diagrams can also accommodate (to some extent) functions between sets. Here for instance is a depiction of a function , the image of that function, and the image of some subset of :
Here one can illustrate surjectivity of by having fill out all of ; one can similarly illustrate injectivity of by giving exactly the same shape (or at least the same area) as . So here for instance might be how one would illustrate an injective function :
Cartesian product operations can be incorporated into these diagrams by appropriate combinations of one-dimensional and two-dimensional diagrams. Here for instance is a diagram that illustrates the identity :
In this blog post I would like to propose a similar family of diagrams to illustrate relationships between vector spaces (over a fixed base field , such as the reals) or abelian groups, rather than sets. The categories of (-)vector spaces and abelian groups are quite similar in many ways; the former consists of modules over a base field , while the latter consists of modules over the integers ; also, both categories are basic examples of abelian categories. The notion of a dimension in a vector space is analogous in many ways to that of cardinality of a set; see this previous post for an instance of this analogy (in the context of Shannon entropy). (UPDATE: I have learned that an essentially identical notation has also been proposed in an unpublished manuscript of Ravi Vakil.)
As with Venn and Euler diagrams, the diagrams I propose for vector spaces (or abelian groups) can be set up in any dimension. For simplicity, let’s begin with one dimension, and restrict attention to vector spaces (the situation for abelian groups is basically identical). In this one-dimensional model we will be able to depict the following relations and operations between vector spaces:
The idea is to use half-open intervals in the real line for any to model vector spaces. In fact we can make an explicit correspondence: let us identify each half-open interval with the (infinite-dimensional) vector space
that is, is identified with the space of continuous functions on the interval that vanish at the right-endpoint . Such functions can be continuously extended by zero to the half-line .
Note that if then the vector space is a subspace of , if we extend the functions in both spaces by zero to the half-line ; furthermore, the quotient of by is naturally identifiable with . Thus, an inclusion , as well as the quotient space , can be modeled here as follows:
In contrast, if , it is significantly less “natural” to view as a subspace of ; one could do it by extending functions in to the right by zero and to the left by constants, but in this notational convention one should view such an identification as “artificial” and to be avoided.
All of the spaces are infinite dimensional, but morally speaking the dimension of the vector space is “proportional” to the length of the corresponding interval. Intuitively, if we try to discretise this vector space by sampling at some mesh of spacing , one gets a finite-dimensional vector space of dimension roughly . Already the above diagram now depicts the basic identity
between a finite-dimensional vector space , a subspace of that space, and a quotient of that space.
Note that if , then there is a linear transformation from the vector space to the vector space which takes a function in , restricts it to , then extends it by zero to . The kernel of this transformation is , the image is (isomorphic to) , the cokernel is (isomorphic to) , and the coimage is (isomorphic to) . With this in mind, we can now depict a general linear transformation and its associated spaces by the following diagram:
Note how the first isomorphism theorem and the rank-nullity theorem are heuristically illustrated by this diagram. One can specialise to the case of injective, surjective, or bijective transformations by degenerating one or more of the half-open intervals in the above diagram to the empty interval. A left shift on gives rise to a nilpotent linear transformation from to itself:
In a similar spirit, a short exact sequence of vector spaces (or abelian groups) can now be depicted by the diagram
and a long exact sequence can similarly be depicted by the diagram
UPDATE: As I have learned from an unpublished manuscript of Ravi Vakil, this notation can also easily depict the cohomology groups of a cochain complex by the diagram
and similarly depict the homology groups of a chain complex by the diagram
One can associate the disjoint union of half-open intervals to the direct sum of the associated vector spaces, giving a way to depict direct sums via this notation:
To increase the expressiveness of this notation we now move to two dimensions, where we can also depict the following additional relations and operations:
Here, we replace half-open intervals by half-open sets: geometric shapes , such as polygons or disks, which contain some portion of the boundary (drawn using solid lines) but omit some other portion of the boundary (drawn with dashed lines). Each such shape can be associated with a vector space, namely the continuous functions on that vanish on the omitted portion of the boundary. All of the relations that were previously depicted using one-dimensional diagrams can now be similarly depicted using two-dimensional diagrams. For instance, here is a two-dimensional depiction of a vector space , a subspace , and its quotient :
(In this post I will try to consistently make the lower and left boundaries of these regions closed, and the upper and right boundaries open, although this is not essential for this notation to be applicable.)
But now we can depict some additional relations. Here for instance is one way to depict the intersection and sum of two subspaces :
Note how this illustrates the identity
between finite-dimensional vector spaces , as well as some standard isomorphisms such as .
Two subgroups of an abelian group are said to be commensurable if is a finite index subgroup of . One can depict this by making the area of the region between and small and/or colored with some specific color:
Here the commensurability of is equivalent to the finiteness of the groups and , which correspond to the gray triangles in the above diagram. Now for instance it becomes intuitively clear why commensurability should be an equivalence relation.
To illustrate how this notation can support multiple short exact sequences, I gave myself the exercise of using this notation to depict the snake lemma, as labeled by the following diagram taken from the Wikipedia page just linked:
This turned out to be remarkably tricky to accomplish without introducing degeneracies (e.g., one of the kernels or cokernels vanishing). Here is one solution I came up with; perhaps there are more elegant ones. In particular, there should be a depiction that more explicitly captures the duality symmetry of the snake diagram.
Here, the maps between the six spaces are the obvious restriction maps (and one can visually verify that the two squares in the snake diagram actually commute). Each of the kernel and cokernel spaces of the three vertical restriction maps is then associated to the union of two of the subregions as indicated in the diagram. Note how the overlaps between these kernels and cokernels generate the long exact “snake”.
UPDATE: by modifying a similar diagram in an unpublished manuscript of Ravi Vakil, I can now produce a more symmetric version of the above diagram, again with a very visible “snake”:
With our notation, the (algebraic) tensor product of an interval and another interval is not quite , but this becomes true if one uses the -algebra version of the tensor product, thanks to the Stone-Weierstrass theorem. So one can plausibly use Cartesian products as a proxy for the vector space tensor product. For instance, here is a depiction of the relation when is a subspace of :
There are unfortunately some limitations to this notation: for instance, no matter how many dimensions one uses for one’s diagrams, these diagrams would suggest the incorrect identity
(which incidentally is, at this time of writing, the highest-voted answer to the MathOverflow question “Examples of common false beliefs in mathematics”). (See also this previous blog post for a similar phenomenon when using sets or vector spaces to model entropy of information variables.) Nevertheless it seems accurate enough to be of use in illustrating many common relations between vector spaces and abelian groups. With appropriate grains of salt it might also be usable for further categories beyond these two, though for non-abelian categories one should proceed with caution, as the diagram may suggest relations that are not actually true in this category. For instance, in the category of topological groups one might use the diagram
to describe the fact that an arbitrary topological group splits into a connected subgroup and a totally disconnected quotient, or in the category of finite-dimensional Lie algebras over the reals one might use the diagram
to describe the fact that such algebras split into the solvable radical and a semisimple quotient.
Because of all the very different ways in which percentages could be used, I think it may make sense to propose an alternate system of units to measure one class of probabilities, namely the probabilities of avoiding some highly undesirable outcome, such as death, accident or illness. The units I propose are that of “nines”, which are already commonly used to measure availability of some service or purity of a material, but can be equally used to measure the safety (i.e., lack of risk) of some activity. Informally, nines measure how many consecutive appearances of the digit $9$ are in the probability of successfully avoiding the negative outcome, thus
Definition 1 (Nines of safety) An activity (affecting one or more persons, over some given period of time) that has a probability $p$ of the “safe” outcome and probability $1-p$ of the “unsafe” outcome will have $k$ nines of safety against the unsafe outcome, where $k$ is defined by the formula $k = -\log_{10}(1-p)$ (where $\log_{10}$ is the logarithm to base ten), or equivalently $p = 1 - 10^{-k}$.
Remark 2 Because of the various uncertainties in measuring probabilities, as well as the inaccuracies in some of the assumptions and approximations we will be making later, we will not attempt to measure the number of nines of safety beyond the first decimal point; thus we will round to the nearest tenth of a nine of safety throughout this post.
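To make the definition concrete, here is a small Python sketch of the conversion in both directions. It assumes the formula $k = -\log_{10}(1-p)$ for a safe-outcome probability $p$; the function names `nines_of_safety` and `success_probability` are illustrative choices, and the rounding to one decimal follows Remark 2.

```python
import math


def nines_of_safety(p_safe: float) -> float:
    """Nines of safety k = -log10(1 - p) for a safe-outcome probability p."""
    if not 0.0 <= p_safe <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_safe == 1.0:
        return math.inf  # zero failure rate: infinitely many nines
    return -math.log10(1.0 - p_safe)


def success_probability(nines: float) -> float:
    """Inverse conversion: p = 1 - 10**(-k)."""
    return 1.0 - 10.0 ** (-nines)


# Print a few sample conversions, rounded to a tenth of a nine.
for p in (0.5, 0.9, 0.99, 0.999):
    print(f"{p:.1%} safe -> {round(nines_of_safety(p), 1)} nines")
```

Rounding to the nearest tenth hides small floating-point errors in the logarithm, which is one practical reason not to quote more digits than that.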
Here is a conversion table between percentage rates of success (the safe outcome), failure (the unsafe outcome), and the number of nines of safety one has:
Success rate | Failure rate | Number of nines |
100% | 0% | infinite |
Thus, if one has no nines of safety whatsoever, one is guaranteed to fail; but each nine of safety one has reduces the failure rate by a factor of $10$. In an ideal world, one would have infinitely many nines of safety against any risk, but in practice there are no guarantees against failure, and so one can only expect a finite number of nines of safety in any given situation. Realistically, one should thus aim to have as many nines of safety as one can reasonably expect to have, but not to demand an infinite number.
Remark 3 The number of nines of safety against a certain risk is not absolute; it will depend not only on the risk itself, but (a) the number of people exposed to the risk, and (b) the length of time one is exposed to the risk. Exposing more people or increasing the duration of exposure will reduce the number of nines, and conversely exposing fewer people or reducing the duration will increase the number of nines; see Proposition 7 below for a rough rule of thumb in this regard.
Remark 4 Nines of safety are a logarithmic scale of measurement, rather than a linear scale. Other familiar examples of logarithmic scales of measurement include the Richter scale of earthquake magnitude, the pH scale of acidity, the decibel scale of sound level, octaves in music, and the magnitude scale for stars.
Remark 5 One way to think about nines of safety is via the Swiss cheese model that was created recently to describe pandemic risk management. In this model, each nine of safety can be thought of as a slice of Swiss cheese, with holes occupying of that slice. Having nines of safety is then analogous to standing behind such slices of Swiss cheese. In order for a risk to actually impact you, it must pass through each of these slices. A fractional nine of safety corresponds to a fractional slice of Swiss cheese that covers the amount of space given by the above table. For instance, nines of safety corresponds to a fractional slice that covers about of the given area (leaving uncovered).
Now to give some real-world examples of nines of safety. Using data for deaths in the US in 2019 (without attempting to account for factors such as age and gender), a random US citizen will have had the following amount of safety from dying from some selected causes in that year:
Cause of death | Mortality rate per (approx.) | Nines of safety |
All causes | ||
Heart disease | ||
Cancer | ||
Accidents | ||
Drug overdose | ||
Influenza/Pneumonia | ||
Suicide | ||
Gun violence | ||
Car accident | ||
Murder | ||
Airplane crash | ||
Lightning strike |
The safety of air travel is particularly remarkable: a given hour of flying in general aviation has a fatality rate of , or about nines of safety, while for the major carriers the fatality rate drops down to , or about nines of safety.
Of course, in 2020, COVID-19 deaths became significant. In this year in the US, the mortality rate for COVID-19 (as the underlying or contributing cause of death) was per , corresponding to nines of safety, which was less safe than all other causes of death except for heart disease and cancer. At this time of writing, data for all of 2021 is of course not yet available, but it seems likely that the safety level would be even lower for this year.
Some further illustrations of the concept of nines of safety:
Here is another way to think about nines of safety:
Proposition 6 (Nines of safety extend expected onset of risk) Suppose a certain risky activity has $k$ nines of safety. If one repeatedly indulges in this activity until the risk occurs, then the expected number of trials before the risk occurs is $10^k$.
Proof: The probability that the risk is activated after exactly $n$ trials is $(1-10^{-k})^{n-1} 10^{-k}$, which is a geometric distribution of parameter $10^{-k}$. The claim then follows from the standard properties of that distribution.
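The expectation in Proposition 6 can be checked numerically. The sketch below assumes the geometric model from the proof, with per-trial failure probability $q = 10^{-k}$, and sums the series $\sum_n n (1-q)^{n-1} q$ directly; the function name `expected_trials` and the truncation length are illustrative assumptions, and the truncated sum is only accurate when $k$ is small enough that the tail is negligible.

```python
def expected_trials(nines: float, terms: int = 200_000) -> float:
    """Truncated sum of n * (1 - q)**(n - 1) * q, where q = 10**(-nines)."""
    q = 10.0 ** (-nines)
    total = 0.0
    survive = 1.0  # probability of no failure in the first n - 1 trials
    for n in range(1, terms + 1):
        total += n * survive * q
        survive *= 1.0 - q
    return total


# Compare the numerical sum with the closed form 10**k.
for k in (1.0, 2.0, 3.0):
    print(k, expected_trials(k), 10.0 ** k)
```

The agreement with $10^k$ is just the familiar fact that a geometric distribution of parameter $q$ has mean $1/q$.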
Thus, for instance, if one performs some risky activity daily, then the expected length of time before the risk occurs is given by the following table:
Daily nines of safety | Expected onset of risk |
One day | |
One week | |
One month | |
One year | |
Two years | |
Five years | |
Ten years | |
Twenty years | |
Fifty years | |
A century |
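Entries of this kind can be computed from Proposition 6: if the expected onset is some number of days, the corresponding daily nines of safety is the base-ten logarithm of that number. The sketch below assumes a 7-day week, a 30-day month and a 365-day year (calendar conventions chosen here for illustration, not taken from the text) and rounds to a tenth of a nine; the names `DURATIONS_IN_DAYS` and `daily_nines_for_expected_onset` are hypothetical.

```python
import math

# Assumed calendar conventions: 7-day week, 30-day month, 365-day year.
DURATIONS_IN_DAYS = {
    "One day": 1,
    "One week": 7,
    "One month": 30,
    "One year": 365,
    "Two years": 2 * 365,
    "Five years": 5 * 365,
    "Ten years": 10 * 365,
    "Twenty years": 20 * 365,
    "Fifty years": 50 * 365,
    "A century": 100 * 365,
}


def daily_nines_for_expected_onset(days: float) -> float:
    """Daily nines k with 10**k = days, rounded to a tenth of a nine."""
    return round(math.log10(days), 1)


for label, days in DURATIONS_IN_DAYS.items():
    print(f"{daily_nines_for_expected_onset(days):>4} | {label}")
```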
Or, if one wants to convert the yearly risks of dying from a specific cause into expected years before that cause of death would occur (assuming for sake of discussion that no other cause of death exists):
Yearly nines of safety | Expected onset of risk |
One year | |
Two years | |
Five years | |
Ten years | |
Twenty years | |
Fifty years | |
A century |
These tables suggest a relationship between the amount of safety one would have in a short timeframe, such as a day, and a longer time frame, such as a year. Here is an approximate formalisation of that relationship:
Proposition 7 (Repeated exposure reduces nines of safety) If a risky activity with $k$ nines of safety is (independently) repeated $n$ times, then (assuming $k$ is large enough depending on $n$), the repeated activity will have approximately $k - \log_{10} n$ nines of safety. Conversely: if the repeated activity has $k'$ nines of safety, the individual activity will have approximately $k' + \log_{10} n$ nines of safety.
Proof: An activity with $k$ nines of safety will be safe with probability $1-10^{-k}$, hence safe with probability $(1-10^{-k})^n$ if repeated independently $n$ times. For $k$ large, we can approximate
$$(1-10^{-k})^n \approx 1 - n \, 10^{-k} = 1 - 10^{-(k - \log_{10} n)}$$
giving the former claim. The latter claim follows from inverting the former.
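To see how good the approximation in Proposition 7 is, one can compare it with the exact value. The sketch below assumes the independence model of the proof, in which $n$ repetitions are all safe with probability $(1-10^{-k})^n$; the function names and the sample values of $k$ and $n$ are illustrative.

```python
import math


def exact_nines_after_repeats(k: float, n: int) -> float:
    """Nines of safety of n independent repetitions, computed exactly."""
    p_all_safe = (1.0 - 10.0 ** (-k)) ** n
    return -math.log10(1.0 - p_all_safe)


def approx_nines_after_repeats(k: float, n: int) -> float:
    """Rule-of-thumb value k - log10(n) from the proposition."""
    return k - math.log10(n)


# The rule of thumb is close when k is large relative to log10(n),
# and visibly off when the two are comparable (last sample).
for k, n in [(5.0, 365), (3.0, 10), (1.0, 10)]:
    exact = exact_nines_after_repeats(k, n)
    approx = approx_nines_after_repeats(k, n)
    print(f"k={k}, n={n}: exact {exact:.3f}, approx {approx:.3f}")
```

The exact value is never smaller than the approximation, since the probability of at least one failure in $n$ trials is at most $n$ times the single-trial failure probability.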
Remark 8 The hypothesis of independence here is key. If there is a lot of correlation between the risks in different repetitions of the activity, then there can be much less reduction in safety caused by that repetition. As a simple example, suppose that of a workforce are trained to perform some task flawlessly no matter how many times they repeat the task, but the remaining are untrained and will always fail at that task. If one selects a random worker and asks them to perform the task, one has nines of safety against the task failing. If one took that same random worker and asked them to perform the task times, the above proposition might suggest that the number of nines of safety would drop to approximately ; but in this case there is perfect correlation, and in fact the number of nines of safety remains steady at since it is the same of the workforce that would fail each time.
Because of this caveat, one should view the above proposition as only a crude first approximation that can be used as a simple rule of thumb, but should not be relied upon for more precise calculations.
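The effect of correlation in the workforce example can be illustrated with a short Python sketch. The specific numbers used here (a trained fraction of 0.9 and 10 repetitions) are hypothetical values picked for illustration, as are the function names; the point is only to contrast the perfectly correlated case (the same worker repeats the task) with the independent case (a fresh random worker each time).

```python
import math


def nines_from_failure_probability(p_fail: float) -> float:
    """Nines of safety corresponding to a failure probability."""
    return -math.log10(p_fail)


def failure_probability_same_worker(trained_fraction: float, repeats: int) -> float:
    """Perfect correlation: one randomly chosen worker does every repetition,
    so some failure occurs exactly when that worker is untrained."""
    return 1.0 - trained_fraction  # does not depend on the number of repeats


def failure_probability_fresh_workers(trained_fraction: float, repeats: int) -> float:
    """Independence: a new random worker is drawn for each repetition."""
    return 1.0 - trained_fraction ** repeats


TRAINED_FRACTION = 0.9  # hypothetical value
REPEATS = 10            # hypothetical value

same = failure_probability_same_worker(TRAINED_FRACTION, REPEATS)
fresh = failure_probability_fresh_workers(TRAINED_FRACTION, REPEATS)
print("same worker  :", round(nines_from_failure_probability(same), 1), "nines")
print("fresh workers:", round(nines_from_failure_probability(fresh), 1), "nines")
```

With perfect correlation the safety level is unchanged by repetition, whereas under independence most of it is lost.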
One can repeat a risk either in time (extending the time of exposure to the risk, say from a day to a year), or in space (by exposing the risk to more people). The above proposition then gives an additive conversion law for nines of safety in either case. Here are some conversion tables for time:
From/to | Daily | Weekly | Monthly | Yearly |
Daily | 0 | -0.8 | -1.5 | -2.6 |
Weekly | +0.8 | 0 | -0.6 | -1.7 |
Monthly | +1.5 | +0.6 | 0 | -1.1 |
Yearly | +2.6 | +1.7 | +1.1 | 0 |
From/to | Yearly | Per 5 yr | Per decade | Per century |
Yearly | 0 | -0.7 | -1.0 | -2.0 |
Per 5 yr | +0.7 | 0 | -0.3 | -1.3 |
Per decade | +1.0 | +0.3 | 0 | -1.0 |
Per century | +2.0 | +1.3 | +1.0 | 0 |
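The entries of these conversion tables are consistent with the rule of Proposition 7: converting from one period to another changes the nines of safety by the base-ten logarithm of the ratio of the two period lengths. Here is a minimal Python sketch that regenerates the first table, assuming a 7-day week, a 30-day month and a 365-day year (these conventions, and the names `PERIOD_IN_DAYS` and `conversion`, are assumptions made for illustration).

```python
import math

# Assumed period lengths in days.
PERIOD_IN_DAYS = {"Daily": 1, "Weekly": 7, "Monthly": 30, "Yearly": 365}


def conversion(from_days: float, to_days: float) -> float:
    """Change in nines of safety when passing from one exposure period to another.

    Negative when the target period is longer (more exposure, less safety).
    """
    return round(math.log10(from_days / to_days), 1)


names = list(PERIOD_IN_DAYS)
print("From/to | " + " | ".join(names) + " |")
for src in names:
    cells = [f"{conversion(PERIOD_IN_DAYS[src], PERIOD_IN_DAYS[dst]):+.1f}" for dst in names]
    print(f"{src} | " + " | ".join(cells) + " |")
```

The table is antisymmetric by construction, since swapping the two periods negates the logarithm.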
For instance, as mentioned before, the yearly amount of safety against cancer is about . Using the above table (and making the somewhat unrealistic hypothesis of independence), we then predict the daily amount of safety against cancer to be about nines, the weekly amount to be about nines, and the amount of safety over five years to drop to about nines.
Now we turn to conversions in space. If one knows the level of safety against a certain risk for an individual, and then one (independently) exposes a group of such individuals to that risk, then the reduction in nines of safety when considering the possibility that at least one group member experiences this risk is given by the following table:
Group | Reduction in safety |
You ( person) | |
You and your partner ( people) | |
You and your parents ( people) | |
You, your partner, and three children ( people) | |
An extended family of people | |
A class of people | |
A workplace of people | |
A school of people | |
A university of people | |
A town of people | |
A city of million people | |
A state of million people | |
A country of million people | |
A continent of billion people | |
The entire planet |
For instance, in a given year (and making the somewhat implausible assumption of independence), you might have nines of safety against cancer, but you and your partner collectively only have about nines of safety against this risk, your family of five might only have about nines of safety, and so forth. By the time one gets to a group of people, it actually becomes very likely that at least one member of the group will die of cancer in that year. (Here the precise conversion table breaks down, because a negative number of nines such as is not possible, but one should interpret a prediction of a negative number of nines as an assertion that failure is very likely to happen. Also, in practice the reduction in safety is less than this rule predicts, due to correlations such as risk factors that are common to the group being considered that are incompatible with the assumption of independence.)
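To see where the additive rule starts to break down, one can compare it with the exact computation. The sketch below assumes the convention (as in Definition 1) that k nines of safety corresponds to a failure probability of 10^(-k); the function names and the sample values are hypothetical and not taken from the post.

```python
import math

def group_nines_rule_of_thumb(individual_nines: float, n: int) -> float:
    """Additive rule: a group of n independent people loses log10(n) nines."""
    return individual_nines - math.log10(n)

def group_nines_exact(individual_nines: float, n: int) -> float:
    """Exact nines of safety against at least one of n independent people being affected."""
    p_fail = 10.0 ** (-individual_nines)
    # P(at least one affected) = 1 - (1 - p_fail)^n, computed stably for small p_fail.
    p_any = -math.expm1(n * math.log1p(-p_fail))
    return -math.log10(p_any)

# Illustrative individual safety level of 3 nines (a made-up value).
for n in (1, 5, 100, 10_000):
    print(n, group_nines_rule_of_thumb(3.0, n), group_nines_exact(3.0, n))
```

For small groups the two numbers agree closely. Once the rule of thumb goes negative, the exact value stays slightly above zero, which is the "failure is very likely" regime described above.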
In the opposite direction, any reduction in exposure (either in time or space) to a risk will increase one’s safety level, as per the following table:
Reduction in exposure | Additional nines of safety |
For instance, a five-fold reduction in exposure will reclaim about additional nines of safety.
Here is a slightly different way to view nines of safety:
Proposition 9 Suppose that a group of people are independently exposed to a given risk. If there are at most nines of individual safety against that risk, then there is at least a chance that one member of the group is affected by the risk.
Proof: If individually there are nines of safety, then the probability that all the members of the group avoid the risk is . Since the inequality
is equivalent to the claim follows.
Thus, for a group to collectively avoid a risk with at least a chance, one needs the following level of individual safety:
Group | Individual safety level required |
You ( person) | |
You and your partner ( people) | |
You and your parents ( people) | |
You, your partner, and three children ( people) | |
An extended family of people | |
A class of people | |
A workplace of people | |
A school of people | |
A university of people | |
A town of people | |
A city of million people | |
A state of million people | |
A country of million people | |
A continent of billion people | |
The entire planet |
For large , the level of nines of individual safety required to protect a group of size with probability at least is approximately .
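The required individual safety level can be computed directly from the independence assumption. In the sketch below, a group of `n` people avoids the risk with probability at least `q` exactly when (1 - 10^(-k))^n >= q, where k is the individual number of nines. The threshold `q` is left as a free parameter (the value 0.5 used in the loop is only an illustrative choice), and both function names are hypothetical.

```python
import math

def required_nines(n: int, q: float) -> float:
    """Smallest individual nines k such that (1 - 10**-k)**n >= q."""
    if n < 1 or not 0.0 < q < 1.0:
        raise ValueError("need n >= 1 and 0 < q < 1")
    # Per-person failure budget 1 - q**(1/n), computed stably for large n.
    per_person_failure = -math.expm1(math.log(q) / n)
    return -math.log10(per_person_failure)

def required_nines_large_n(n: int, q: float) -> float:
    """Large-n approximation: log10(n) - log10(ln(1/q))."""
    return math.log10(n) - math.log10(math.log(1.0 / q))

for n in (1, 2, 5, 100, 10**6):
    print(n, round(required_nines(n, 0.5), 2), round(required_nines_large_n(n, 0.5), 2))
```

The approximation comes from 1 - q^(1/n) ≈ ln(1/q)/n, so it is poor for very small groups and increasingly accurate as `n` grows.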
Precautions that can work to prevent a certain risk from occurring will add additional nines of safety against that risk, even if the precaution is not perfectly effective. Here is the precise rule:
Proposition 10 (Precautions add nines of safety) Suppose an activity carries nines of safety against a certain risk, and a separate precaution can independently protect against that risk with nines of safety (that is to say, the probability that the protection is effective is ). Then applying that precaution increases the number of nines in the activity from to .
Proof: The probability that the precaution fails and the risk then occurs is . The claim now follows from Definition 1.
In particular, we can repurpose the table at the start of this post as a conversion chart for effectiveness of a precaution:
Effectiveness | Failure rate | Additional nines provided |
infinite |
Thus for instance a precaution that is effective will add nines of safety, a precaution that is effective will add nines of safety, and so forth. The mRNA COVID vaccines by Pfizer and Moderna have somewhere between effectiveness against symptomatic COVID illness, providing about nines of safety against that risk, and over effectiveness against severe illness, thus adding at least nines of safety in this regard.
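The conversion from effectiveness to additional nines is a one-line computation: a precaution with effectiveness e has failure rate 1 - e, and so contributes -log10(1 - e) nines. A small sketch follows; the function name and the sample effectiveness values are illustrative assumptions, not figures from the post.

```python
import math

def additional_nines(effectiveness: float) -> float:
    """Nines of safety added by an independent precaution with the given effectiveness."""
    if not 0.0 <= effectiveness <= 1.0:
        raise ValueError("effectiveness must lie in [0, 1]")
    failure_rate = 1.0 - effectiveness
    # A perfectly effective precaution adds infinitely many nines.
    return math.inf if failure_rate == 0.0 else -math.log10(failure_rate)

# Columns: effectiveness | failure rate | additional nines (illustrative inputs).
for eff in (0.5, 0.9, 0.99, 1.0):
    print(f"{eff:.0%} | {1 - eff:.0%} | {additional_nines(eff):.2f}")
```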
A slight variant of the above rule can be stated using the concept of relative risk:
Proposition 11 (Relative risk and nines of safety) Suppose an activity carries nines of safety against a certain risk, and an action multiplies the chance of failure by some relative risk . Then the action removes nines of safety (if ) or adds nines of safety (if ) to the original activity.
Proof: The additional action adjusts the probability of failure from to . The claim now follows from Definition 1.
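In code, the relative risk rule is again a logarithm. The sketch below uses a hypothetical function name and made-up sample values. It also checks the observation that a precaution of effectiveness e (in the sense of Proposition 10) behaves like an action with relative risk 1 - e.

```python
import math

def nines_change_from_relative_risk(relative_risk: float) -> float:
    """Change in nines of safety when the failure probability is multiplied by relative_risk."""
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    return -math.log10(relative_risk)

# Illustrative values only.
print(nines_change_from_relative_risk(2.0))   # negative: doubling the risk removes nines
print(nines_change_from_relative_risk(0.1))   # positive: a tenfold reduction adds nines

# A precaution that is 75% effective multiplies the failure probability by 0.25.
print(math.isclose(-math.log10(1 - 0.75), nines_change_from_relative_risk(0.25)))
```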
Here is a conversion chart between relative risk and change in nines of safety:
Relative risk | Change in nines of safety |
Some examples:
The effect of combining multiple (independent) precautions together is cumulative; one can achieve quite a high level of safety by stacking together several precautions that individually have relatively low levels of effectiveness. Again, see the “swiss cheese model” referred to in Remark 5. For instance, if face masks add nines of safety against contracting COVID, social distancing adds another nines, and the vaccine provides another nine of safety, implementing all three mitigation methods would (assuming independence) add a net of nines of safety against contracting COVID.
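The cumulative effect can be checked numerically: under independence the failure rates of the layers multiply, so their nines add. The sketch below uses three hypothetical layers that are each only 50% effective; the numbers and function names are illustrative assumptions.

```python
import math

def combined_effectiveness(effectivenesses) -> float:
    """Effectiveness of several independent precautions applied together."""
    combined_failure = math.prod(1.0 - e for e in effectivenesses)
    return 1.0 - combined_failure

def added_nines(effectiveness: float) -> float:
    """Nines of safety added by a precaution of the given effectiveness (< 1)."""
    return -math.log10(1.0 - effectiveness)

layers = [0.5, 0.5, 0.5]  # made-up values for illustration
total = combined_effectiveness(layers)
print(total)                                # effectiveness of the stack
print(added_nines(total))                   # nines added by the stack
print(sum(added_nines(e) for e in layers))  # sum of the individual nines
```

The last two printed values agree up to floating-point error, which is the additivity asserted above.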
In summary, when debating the value of a given risk mitigation measure, the correct question to ask is not quite “Is it certain to work” or “Can it fail?”, but rather “How many extra nines of safety does it add?”.
As one final comparison between nines of safety and other standard risk measures, we give the following proposition regarding large deviations from the mean.
Proposition 12 Let be a normally distributed random variable of standard deviation , and let . Then the “one-sided risk” of exceeding its mean by at least (i.e., ) carries nines of safety, and the “two-sided risk” of deviating (in either direction) from its mean by at least (i.e., ) carries nines of safety, where is the error function.
Proof: This is a routine calculation using the cumulative distribution function of the normal distribution.
Here is a short table illustrating this proposition:
Number of deviations from the mean | One-sided nines of safety | Two-sided nines of safety |
Thus, for instance, the risk of a five sigma event (deviating by more than five standard deviations from the mean in either direction) should carry nines of safety assuming a normal distribution, and so one would ordinarily feel extremely safe against the possibility of such an event, unless one started doing hundreds of thousands of trials. (However, we caution that this conclusion relies heavily on the assumption that one has a normal distribution!)
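The table entries can be recomputed from the standard normal tail formulas. The one-sided tail probability beyond k standard deviations is erfc(k/√2)/2 and the two-sided one is erfc(k/√2), where erfc = 1 - erf. The sketch below assumes the usual convention that nines of safety are -log10 of the failure probability. It uses `math.erfc` rather than `1 - math.erf` to avoid cancellation for large k, and the function names are hypothetical.

```python
import math

def one_sided_nines(k: float) -> float:
    """Nines of safety against exceeding the mean by at least k standard deviations."""
    return -math.log10(0.5 * math.erfc(k / math.sqrt(2)))

def two_sided_nines(k: float) -> float:
    """Nines of safety against deviating from the mean by at least k standard deviations."""
    return -math.log10(math.erfc(k / math.sqrt(2)))

for k in range(1, 7):
    print(k, round(one_sided_nines(k), 1), round(two_sided_nines(k), 1))
```

The two columns always differ by log10(2), about 0.3 nines, since the two-sided risk is exactly twice the one-sided risk.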
See also this older essay I wrote on anonymity on the internet, using bits as a measure of anonymity in much the same way that nines are used here as a measure of safety.
For the purposes of this paper, a Siegel zero is a zero of a Dirichlet -function corresponding to a primitive quadratic character of some conductor , which is close to in the sense that
for some large (which we will call the quality) of the Siegel zero. The significance of these zeroes is that they force the Möbius function and the Liouville function to “pretend” to be like the exceptional character for primes of magnitude comparable to . Indeed, if we define an exceptional prime to be a prime for which , then very few primes near will be exceptional; in our paper we use some elementary arguments to establish the bounds for any and , where the sum is over exceptional primes in the indicated range ; this bound is non-trivial for as large as . (See Section 1 of this blog post for some variants of this argument, which were inspired by work of Heath-Brown.) There is also a companion bound (somewhat weaker) that covers a range of a little bit below .
One of the early influential results in this area was the following result of Heath-Brown, which I previously blogged about here:
Theorem 1 (Hardy-Littlewood assuming Siegel zero) Let be a fixed natural number. Suppose one has a Siegel zero associated to some conductor . Then we have for all , where is the von Mangoldt function and is the singular series
In particular, Heath-Brown showed that if there are infinitely many Siegel zeroes, then there are also infinitely many twin primes, with the correct asymptotic predicted by the Hardy-Littlewood prime tuple conjecture at infinitely many scales.
Very recently, Chinis established an analogous result for the Chowla conjecture (building upon earlier work of Germán and Katai):
Theorem 2 (Chowla assuming Siegel zero) Let be distinct fixed natural numbers. Suppose one has a Siegel zero associated to some conductor . Then one has in the range , where is the Liouville function.
In our paper we unify these results and also improve the quantitative estimates and range of :
Theorem 3 (Hardy-Littlewood-Chowla assuming Siegel zero) Let be distinct fixed natural numbers with . Suppose one has a Siegel zero associated to some conductor . Then one has for for any fixed .
Our argument proceeds by a series of steps in which we replace and by more complicated looking, but also more tractable, approximations, until the correlation is one that can be computed in a tedious but straightforward fashion by known techniques. More precisely, the steps are as follows:
Steps (i), (ii) proceed mainly through estimates such as (1) and standard sieve theory bounds. Step (iii) is based primarily on estimates on the number of smooth numbers of a certain size.
The restriction in our main theorem is needed only to execute step (iv) of this argument. Roughly speaking, the Siegel approximant to is a twisted, sieved version of the divisor function , and the types of correlation one is faced with at the start of step (iv) are a more complicated version of the divisor correlation sum
For this sum can be easily controlled by the Dirichlet hyperbola method. For one needs the fact that has a level of distribution greater than ; in fact Kloosterman sum bounds give a level of distribution of , a folklore fact that seems to have first been observed by Linnik and Selberg. We use a (notationally more complicated) version of this argument to treat the sums arising in (iv) for . Unfortunately for there are no known techniques to unconditionally obtain asymptotics, even for the model sum although we do at least have fairly convincing conjectures as to what the asymptotics should be. Because of this, it seems unlikely that one will be able to relax the hypothesis in our main theorem at our current level of understanding of analytic number theory.
Step (v) is a tedious but straightforward sieve theoretic computation, similar in many ways to the correlation estimates of Goldston and Yildirim used in their work on small gaps between primes (as discussed for instance here), and then also used by Ben Green and myself to locate arithmetic progressions in primes.
Feel free to post answers or other thoughts on these questions in the comments.
A bit more specifically, the paper studies the decidability of the above question. There are two slightly different types of decidability one could consider here:
Note that the notion of logical decidability is “pointwise” in the sense that it pertains to a single choice of data , whereas the notion of algorithmic decidability pertains instead to classes of data, and is only interesting when this class is infinite. Indeed, any tiling problem with a finite class of data is trivially decidable because one could simply code a Turing machine that is basically a lookup table that returns the correct answer for each choice of data in the class. (This is akin to how a student with a good memory could pass any exam if the questions are drawn from a finite list, merely by memorising an answer key for that list of questions.)
The two notions are related as follows: if a tiling problem (1) is algorithmically undecidable for some class of data, then the tiling equation must be logically undecidable for at least one choice of data for this class. For if this is not the case, one could algorithmically decide the tiling problem by searching for proofs or disproofs that the equation (1) is solvable for a given choice of data; the logical decidability of all such solvability questions will ensure that this algorithm always terminates in finite time.
One can use the Gödel completeness theorem to interpret logical decidability in terms of universes (also known as structures or models) of ZFC. In addition to the “standard” universe of sets that we believe satisfies the axioms of ZFC, there are also other “nonstandard” universes that also obey the axioms of ZFC. If the solvability of a tiling equation (1) is logically undecidable, this means that such a tiling exists in some universes of ZFC, but not in others.
(To continue the exam analogy, we thus see that a yes-no exam question is logically undecidable if the answer to the question is yes in some parallel universes, but not in others. A course syllabus is algorithmically undecidable if there is no way to prepare for the final exam for the course in a way that guarantees a perfect score (in the standard universe).)
Questions of decidability are also related to the notion of aperiodicity. For a given , a tiling equation (1) is said to be aperiodic if the equation (1) is solvable (in the standard universe of ZFC), but none of the solutions (in that universe) are completely periodic (i.e., there are no solutions where all of the are periodic). Perhaps the most well-known example of an aperiodic tiling (in the context of , and using rotations as well as translations) comes from the Penrose tilings, but there are many others besides.
It was (essentially) observed by Hao Wang in the 1960s that if a tiling equation is logically undecidable, then it must necessarily be aperiodic. Indeed, if a tiling equation fails to be aperiodic, then (in the standard universe) either there is a periodic tiling, or there are no tilings whatsoever. In the former case, the periodic tiling can be used to give a finite proof that the tiling equation is solvable; in the latter case, the compactness theorem implies that there is some finite fragment of that is not compatible with being tiled by , and this provides a finite proof that the tiling equation is unsolvable. Thus in either case the tiling equation is logically decidable.
This observation of Wang clarifies somewhat how logically undecidable tiling equations behave in the various universes of ZFC. In the standard universe, tilings exist, but none of them will be periodic. In nonstandard universes, tilings may or may not exist, and the tilings that do exist may be periodic (albeit with a nonstandard period); but there must be at least one universe in which no tiling exists at all.
In one dimension when (or more generally with a finite group), a simple pigeonholing argument shows that no tiling equations are aperiodic, and hence all tiling equations are decidable. However the situation changes in two dimensions. In 1966, Berger (a student of Wang) famously showed that there exist tiling equations (1) in the discrete plane that are aperiodic, or even logically undecidable; in fact he showed that the tiling problem in this case (with arbitrary choices of data ) was algorithmically undecidable. (Strictly speaking, Berger established this for a variant of the tiling problem known as the domino problem, but later work of Golomb showed that the domino problem could be easily encoded within the tiling problem.) This was accomplished by encoding the halting problem for Turing machines into the tiling problem (or domino problem); the former is well known to be algorithmically undecidable (and thus to have logically undecidable instances), and so the latter is also. However, the number of tiles required for Berger’s construction was quite large: his construction of an aperiodic tiling required tiles, and his construction of a logically undecidable tiling required an even larger (and not explicitly specified) collection of tiles. Subsequent work by many authors did reduce the number of tiles required; in the setting, the current world record for the fewest number of tiles in an aperiodic tiling is (due to Ammann, Grünbaum, and Shephard) and for a logically undecidable tiling is (due to Ollinger). On the other hand, it is conjectured (see Grünbaum-Shephard and Lagarias-Wang) that one cannot lower all the way to :
Conjecture 1 (Periodic tiling conjecture) If is a periodic subset of a finitely generated abelian group , and is a finite subset of , then the tiling equation is not aperiodic.
This conjecture is known to be true in two dimensions (by work of Bhattacharya when , and more recently by us when ), but remains open in higher dimensions. By the preceding discussion, the conjecture implies that every tiling equation with a single tile is logically decidable, and the problem of whether a given periodic set can be tiled by a single tile is algorithmically decidable.
In this paper we show on the other hand that aperiodic and undecidable tilings exist when , at least if one is permitted to enlarge the group a bit:
Theorem 2 (Logically undecidable tilings)
- (i) There exists a group of the form for some finite abelian , a subset of , and finite sets such that the tiling equation is logically undecidable (and hence also aperiodic).
- (ii) There exists a dimension , a periodic subset of , and finite sets such that the tiling equation is logically undecidable (and hence also aperiodic).
- (iii) There exists a non-abelian finite group (with the group law still written additively), a subset of , and a finite set such that the nonabelian tiling equation is logically undecidable (and hence also aperiodic).
We also have algorithmic versions of this theorem. For instance, the algorithmic version of (i) is that the problem of determining solvability of the tiling equation for a given choice of finite abelian group , subset of , and finite sets is algorithmically undecidable. Similarly for (ii), (iii).
This result (together with a negative result discussed below) suggests to us that there is a significant qualitative difference between the theory of tiling by a single (abelian) tile and the theory of tiling with multiple tiles (or one non-abelian tile). (The positive results on the periodic tiling conjecture certainly rely heavily on the fact that there is only one tile; in particular there is a “dilation lemma” that is only available in this setting that is of key importance in the two dimensional theory.) It would be nice to eliminate the group from (i) (or to set in (ii)), but I think this would require a fairly significant modification of our methods.
Like many other undecidability results, the proof of Theorem 2 proceeds by a sequence of reductions, in which the undecidability of one problem is shown to follow from the undecidability of another, more “expressive” problem that can be encoded inside the original problem, until one reaches a problem that is so expressive that it encodes a problem already known to be undecidable. Indeed, all three undecidability results are ultimately obtained from Berger’s undecidability result on the domino problem.
The first step in increasing expressiveness is to observe that the undecidability of a single tiling equation follows from the undecidability of a system of tiling equations. More precisely, suppose we have non-empty finite subsets of a finitely generated group for and , as well as periodic sets of for , such that it is logically undecidable whether the system of tiling equations
for has no solution in . Then, for any , we can “stack” these equations into a single tiling equation in the larger group , and specifically to the equation where and It is a routine exercise to check that the system of equations (2) admits a solution in if and only if the single equation (3) admits a solution in . Thus, to prove the undecidability of a single equation of the form (3) it suffices to establish undecidability of a system of the form (2); note how the freedom to select the auxiliary group is important here.
We view systems of the form (2) as belonging to a kind of “language” in which each equation in the system is a “sentence” in the language imposing additional constraints on a tiling. One can now pick and choose various sentences in this language to try to encode various interesting problems. For instance, one can encode the concept of a function taking values in a finite group as a single tiling equation
since the solutions to this equation are precisely the graphs of a function . By adding more tiling equations to this equation to form a larger system, we can start imposing additional constraints on this function . For instance, if is a coset of some subgroup of , we can impose the additional equation to impose the additional constraint that for all , if we desire. If happens to contain two distinct elements , and , then the additional equation imposes the additional constraints that for all , and additionally that for all .
This begins to resemble the equations that come up in the domino problem. Here one has a finite set of Wang tiles – unit squares where each of the four sides is colored with a color (corresponding to the four cardinal directions North, South, East, and West) from some finite set of colors. The domino problem is then to tile the plane with copies of these tiles in such a way that adjacent sides match. In terms of equations, one is seeking to find functions obeying the pointwise constraint
for all where is the set of colors associated to the set of Wang tiles being used, and the matching constraints for all . As it turns out, the pointwise constraint (7) can be encoded by tiling equations that are fancier versions of (4), (5), (6) that involve only one unknown tiling set , but in order to encode the matching constraints (8) we were forced to introduce a second tile (or work with nonabelian tiling equations). This appears to be an inherent feature of the method, since we found a partial rigidity result for tilings of one tile in one dimension that obstructs this encoding strategy from working when one only has one tile available. The result is as follows:
Proposition 3 (Swapping property) Consider the solutions to a tiling equation in a one-dimensional group (with a finite abelian group, finite, and periodic). Suppose there are two solutions to this equation that agree on the left in the sense that For any function , define the “swap” of and to be the set Then also solves the equation (9).
One can think of and as “genes” with “nucleotides” , at each position , and is a new gene formed by choosing one of the nucleotides from the “parent” genes , at each position. The above proposition then says that the solutions to the equation (9) must be closed under “genetic transfer” among any pair of genes that agree on the left. This seems to present an obstruction to trying to encode equations such as
for two functions (say), which is a toy version of the matching constraint (8), since the class of solutions to this equation turns out not to obey this swapping property. On the other hand, it is easy to encode such equations using two tiles instead of one, and an elaboration of this construction is used to prove our main theorem.
Let us first discuss the algebraic geometry application. Given a smooth complex -dimensional projective variety there is a standard line bundle attached to it, known as the canonical line bundle; -forms on the variety become sections of this bundle. The bundle may not actually admit global sections; that is to say, the dimension of global sections may vanish. But as one raises the canonical line bundle to higher and higher powers to form further line bundles , the number of global sections tends to increase; in particular, the dimension of global sections (known as the plurigenus) always obeys an asymptotic of the form
as for some non-negative number , which is called the volume of the variety , an invariant that reveals some information about the birational geometry of . For instance, if the canonical line bundle is ample (or more generally, nef), this volume is equal to the intersection number (roughly speaking, the number of common zeroes of generic sections of the canonical line bundle); this is a special case of the asymptotic Riemann-Roch theorem. In particular, the volume is a natural number in this case. However, it is possible for the volume to also be fractional in nature. One can then ask: how small can the volume get without vanishing entirely? (By definition, varieties with non-vanishing volume are known as varieties of general type.)
It follows from a deep result obtained independently by Hacon–McKernan, Takayama and Tsuji that there is a uniform lower bound for the volume of all -dimensional projective varieties of general type. However, the precise lower bound is not known, and the current paper is a contribution towards probing this bound by constructing varieties of particularly small volume in the high-dimensional limit . Prior to this paper, the best such constructions of -dimensional varieties basically had exponentially small volume, with a construction of volume at most given by Ballico–Pignatelli–Tasin, and an improved construction with a volume bound of given by Totaro and Wang. In this paper, we obtain a variant construction with the somewhat smaller volume bound of ; the method also gives comparable bounds for some other related algebraic geometry statistics, such as the largest for which the pluricanonical map associated to the linear system is not a birational embedding into projective space.
The space is constructed by taking a general hypersurface of a certain degree in a weighted projective space and resolving the singularities. These varieties are relatively tractable to work with, as one can use standard algebraic geometry tools (such as the Reid–Tai inequality) to provide sufficient conditions to guarantee that the hypersurface has only canonical singularities and that the canonical bundle is a reflexive sheaf, which allows one to calculate the volume exactly in terms of the degree and weights . The problem then reduces to optimizing the resulting volume given the constraints needed for the above-mentioned sufficient conditions to hold. After working with a particular choice of weights (which consist of products of mostly consecutive primes, with each product occurring with suitable multiplicities ), the problem eventually boils down to trying to minimize the total multiplicity , subject to certain congruence conditions and other bounds on the . Using crude bounds on the eventually leads to a construction with volume at most , but by taking advantage of the ability to “dilate” the congruence conditions and optimizing over all dilations, we are able to improve the constant to .
Now it is time to turn to the analytic side of the paper by describing the optimization problem that we solve. We consider the sawtooth function , with defined as the unique real number in that is equal to mod . We consider a (Borel) probability measure on the real line, and then compute the average value of this sawtooth function
as well as various dilates of this expectation. Since is bounded above by , we certainly have the trivial bound However, this bound is not very sharp. For instance, the only way in which could attain the value of is if the probability measure was supported on half-integers, but in that case would vanish. For the algebraic geometry application discussed above one is then led to the following question: for a given choice of , what is the best upper bound on the quantity that holds for all probability measures ?
If one considers the deterministic case in which is a Dirac mass supported at some real number , then the Dirichlet approximation theorem tells us that there is such that is within of an integer, so we have
in this case, and this bound is sharp for deterministic measures . Thus we have However, both of these bounds turn out to be far from the truth, and the optimal value of is comparable to . In fact we were able to compute this quantity precisely:
Theorem 1 (Optimal bound for sawtooth inequality) Let .
- (i) If for some natural number , then .
- (ii) If for some natural number , then .
In particular, we have as .
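To experiment with the quantity being optimized, here is a small Python sketch that evaluates the minimum over the first N dilates of the mean of the sawtooth function, for a discrete probability measure. It assumes the convention that the sawtooth takes values in the half-open interval (-1/2, 1/2]. The function names and the dictionary representation of a measure are introduced here for illustration only. The sketch evaluates the quantity for a given measure and makes no attempt to find the optimal one.

```python
import math
from fractions import Fraction

def sawtooth(x):
    """Representative of x mod 1 in (-1/2, 1/2] (assumed convention)."""
    r = x - math.floor(x)  # fractional part, in [0, 1)
    return r if r <= Fraction(1, 2) else r - 1

def min_dilated_mean(measure, N):
    """Minimum over 1 <= k <= N of the mean of sawtooth(k * x),
    for a discrete probability measure given as {point: mass}."""
    return min(
        sum(mass * sawtooth(k * x) for x, mass in measure.items())
        for k in range(1, N + 1)
    )

# A Dirac mass at 1/2 maximizes the undilated mean, but the k = 2 dilate vanishes.
print(min_dilated_mean({Fraction(1, 2): 1}, 2))
# A Dirac mass at 1/4 does better for N = 2.
print(min_dilated_mean({Fraction(1, 4): 1}, 2))
```

Using `Fraction` inputs keeps the arithmetic exact, which is convenient when testing candidate measures supported on rationals.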
We establish this bound through duality. Indeed, suppose we could find non-negative coefficients such that one had the pointwise bound
for all real numbers . Integrating this against an arbitrary probability measure , we would conclude and hence Conversely, one can find lower bounds on by selecting suitable candidate measures and computing the means . The theory of linear programming duality tells us that this method must give us the optimal bound, but one has to locate the optimal measure and optimal weights . This we were able to do by first doing some extensive numerics to discover these weights and measures for small values of , and then doing some educated guesswork to extrapolate these examples to the general case, and then to verify the required inequalities. In case (i) the situation is particularly simple, as one can take to be the discrete measure that assigns a probability to the numbers and the remaining probability of to , while the optimal weighted inequality (1) turns out to be which is easily proven by telescoping series. However the general case turned out to be significantly trickier to work out, and the verification of the optimal inequality required a delicate case analysis (reflecting the fact that equality was attained in this inequality in a large number of places).
After solving the sawtooth problem, we became interested in the analogous question for the sine function, that is to say what is the best bound for the inequality
The left-hand side is the smallest imaginary part of the first Fourier coefficients of . To our knowledge this quantity has not previously been studied in the Fourier analysis literature. By adopting a similar approach as for the sawtooth problem, we were able to compute this quantity exactly also:
Theorem 2 For any , one has In particular,
Interestingly, a closely related cotangent sum recently appeared in this MathOverflow post. Verifying the lower bound on boils down to choosing the right test measure ; it turns out that one should pick the probability measure supported on the with odd, with probability proportional to , and the lower bound verification eventually follows from a classical identity
for , first posed by Eisenstein in 1844 and proved by Stern in 1861. The upper bound arises from establishing the trigonometric inequality for all real numbers , which to our knowledge is new; the left-hand side has a Fourier-analytic interpretation as convolving the Fejér kernel with a certain discretized square wave function, and this interpretation is used heavily in our proof of the inequality.
The significance of the Gowers norms is that they control other multilinear forms that show up in additive combinatorics. Given any polynomials and functions , we define the multilinear form
(assuming that the denominator is finite and non-zero). Thus for instance where we view as formal (indeterminate) variables, and are understood to be extended by zero to all of . These forms are used to count patterns in various sets; for instance, the quantity is closely related to the number of length three arithmetic progressions contained in . Let us informally say that a form is controlled by the norm if the form is small whenever are -bounded functions with at least one of the small in norm. This definition was made more precise by Gowers and Wolf, who then defined the true complexity of a form to be the least such that is controlled by the norm. For instance,
Gowers and Wolf formulated a conjecture on what this complexity should be, at least for linear polynomials ; Ben Green and I thought we had resolved this conjecture back in 2010, though it turned out there was a subtle gap in our arguments and we were only able to resolve the conjecture in a partial range of cases. However, the full conjecture was recently resolved by Daniel Altman.
The (semi-)norm is so weak that it barely controls any averages at all. For instance the average
is not controlled by the semi-norm: it is perfectly possible for a -bounded function to even have vanishing norm but have large value of (consider for instance the parity function ).
Because of this, I propose inserting an additional norm in the Gowers uniformity norm hierarchy between the and norms, which I will call the (or “profinite “) norm:
where ranges over all arithmetic progressions in . This can easily be seen to be a norm on functions that controls the norm. It is also basically controlled by the norm for -bounded functions ; indeed, if is an arithmetic progression in of some spacing , then we can write as the intersection of an interval with a residue class modulo , and from Fourier expansion we have If we let be a standard bump function supported on with total mass and is a parameter then (extending by zero outside of ), as can be seen by using the triangle inequality and the estimate After some Fourier expansion of we now have Writing as a linear combination of and using the Gowers–Cauchy–Schwarz inequality, we conclude hence on optimising in we have Forms which are controlled by the norm (but not ) would then have their true complexity adjusted to with this insertion.
The norm recently appeared implicitly in work of Peluse and Prendiville, who showed that the form had true complexity in this notation (with polynomially strong bounds). [Actually, strictly speaking this control was only shown for the third function ; for the first two functions one needs to localize the norm to intervals of length . But I will ignore this technical point to keep the exposition simple.] The weaker claim that has true complexity is substantially easier to prove (one can apply the circle method together with Gauss sum estimates).
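The parity example can be checked numerically. The following is a minimal brute-force sketch in Python, under the assumption that the first semi-norm is the absolute value of the average of the function over the interval of length N, and that the proposed profinite norm is the supremum, over all arithmetic progressions P in that interval, of the absolute value of the average of the function times the indicator of P. The function names `u1_norm` and `u1_plus_norm` are hypothetical and chosen only for this illustration.

```python
def u1_norm(f):
    """Assumed U^1-type semi-norm: |average of f over n = 1..N|."""
    return abs(sum(f)) / len(f)


def u1_plus_norm(f):
    """Assumed profinite norm: sup over arithmetic progressions P in 1..N
    of |(1/N) * sum_{n in P} f(n)|, computed by brute force."""
    N = len(f)
    best = 0.0
    for q in range(1, N + 1):            # spacing of the progression
        for a in range(min(q, N)):       # starting residue class (0-indexed)
            sub = f[a::q]                # values of f along the class a mod q
            # Prefix sums: a progression is a contiguous run of `sub`,
            # so its sum is a difference of two prefix sums.
            prefix = [0]
            for v in sub:
                prefix.append(prefix[-1] + v)
            for i in range(len(prefix)):
                for j in range(i + 1, len(prefix)):
                    best = max(best, abs(prefix[j] - prefix[i]))
    return best / N


N = 64
parity = [(-1) ** n for n in range(1, N + 1)]  # f(n) = (-1)^n on n = 1..N

print(u1_norm(parity))       # 0.0: the plain average vanishes
print(u1_plus_norm(parity))  # 0.5: attained on the progression of even numbers
```

Since the whole interval is itself a progression, the second quantity always dominates the first, which is the (easy) control of the weaker semi-norm mentioned above. The brute-force search costs roughly N^2 log N operations, so it is only suitable for small N.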
The well-known inverse theorem for the norm tells us that if a -bounded function has norm at least for some , then there is a Fourier phase such that
this follows easily from (1) and Plancherel’s theorem. Conversely, from the Gowers–Cauchy–Schwarz inequality one has
For one has a trivial inverse theorem; by definition, the norm of is at least if and only if
Thus the frequency appearing in the inverse theorem can be taken to be zero when working instead with the norm.
For one has the intermediate situation in which the frequency is not taken to be zero, but is instead major arc. Indeed, suppose that is -bounded with , thus
for some progression . This forces the spacing of this progression to be . We write the above inequality as for some residue class and some interval . By Fourier expansion and the triangle inequality we then have for some integer . Convolving by for a small multiple of and a Schwartz function of unit mass with Fourier transform supported on , we have The Fourier transform of is bounded by and supported on , thus by Fourier expansion and the triangle inequality we have for some , so in particular . Thus we have for some of the major arc form with . Conversely, for of this form, some routine summation by parts gives the bound so if (2) holds for a -bounded then one must have .
Here is a diagram showing some of the control relationships between various Gowers norms, multilinear forms, and duals of classes of functions (where each class of functions induces a dual norm ):
Here I have included the three classes of functions that one can choose from for the inverse theorem, namely degree two nilsequences, bracket quadratic phases, and local quadratic phases, as well as the more narrow class of globally quadratic phases.
The Gowers norms have counterparts for measure-preserving systems , known as Host-Kra seminorms. The norm can be defined for as
and the norm can be defined as The seminorm is orthogonal to the invariant factor (generated by the (almost everywhere) invariant measurable subsets of ) in the sense that a function has vanishing seminorm if and only if it is orthogonal to all -measurable (bounded) functions. Similarly, the norm is orthogonal to the Kronecker factor , generated by the eigenfunctions of (that is to say, those obeying an identity for some -invariant ); for ergodic systems, it is the largest factor isomorphic to rotation on a compact abelian group. In analogy to the Gowers norm, one can then define the Host-Kra seminorm by it is orthogonal to the profinite factor , generated by the periodic sets of (or equivalently, by those eigenfunctions whose eigenvalue is a root of unity); for ergodic systems, it is the largest factor isomorphic to rotation on a profinite abelian group.