You are currently browsing Terence Tao’s articles.

How many groups of order four are there? Technically, there are an enormous number, so much so, in fact, that the class of groups of order four is not even a set, but merely a proper class. This is because *any* four objects $a,b,c,d$ can be turned into a group $\{a,b,c,d\}$ by designating one of the four objects, say $a$, to be the group identity, and imposing a suitable multiplication table (and inversion law) on the four elements in a manner that obeys the usual group axioms. Since all sets are themselves objects, the class of four-element groups is thus at least as large as the class of all sets, which by Russell’s paradox is known not to itself be a set (assuming the usual ZFC axioms of set theory).

A much better question is to ask how many groups of order four there are *up to isomorphism*, counting each isomorphism class of groups exactly once. Now, as one learns in undergraduate group theory classes, the answer is just “two”: the cyclic group $C_4$ and the Klein four-group $C_2 \times C_2$.

More generally, given a class of objects $X$ and some equivalence relation $\sim$ on $X$ (which one should interpret as describing the property of two objects in $X$ being “isomorphic”), one can consider the number $|X/\sim|$ of objects in $X$ “up to isomorphism”, which is simply the cardinality of the collection $X/\sim$ of equivalence classes $[x] := \{ y \in X : x \sim y \}$ of $X$. In the case where $X$ is finite, one can express this cardinality by the formula

$$|X/\sim| = \sum_{x \in X} \frac{1}{|[x]|}, \qquad (1)$$

thus one counts elements in $X$, weighted by the reciprocal of the number of objects they are isomorphic to.

Example 1. Let $X$ be the five-element set $\{-2,-1,0,1,2\}$ of integers between $-2$ and $2$. Let us say that two elements $x, y$ of $X$ are isomorphic if they have the same magnitude: $x \sim y \iff |x| = |y|$. Then the quotient space $X/\sim$ consists of just three equivalence classes: $\{-2,2\} = [2] = [-2]$, $\{-1,1\} = [-1] = [1]$, and $\{0\} = [0]$. Thus there are three objects in $X$ up to isomorphism, and the identity (1) is then just

$$3 = \frac{1}{2} + \frac{1}{2} + 1 + \frac{1}{2} + \frac{1}{2}.$$

Thus, to count elements in $X$ up to equivalence, the elements $-2,-1,1,2$ are given a weight of $1/2$ because they are each isomorphic to two elements in $X$, while the element $0$ is given a weight of $1$ because it is isomorphic to just one element in $X$ (namely, itself).
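As a quick sanity check of the weighted count (1) on Example 1, here is a minimal Python sketch. The helper name `count_up_to_equivalence` is purely illustrative, and exact rational arithmetic is used so that the weights sum to exactly the number of classes.

```python
from fractions import Fraction

def count_up_to_equivalence(elements, equivalent):
    """Evaluate the sum over x of 1/|[x]|, where [x] is the class of x."""
    total = Fraction(0)
    for x in elements:
        # Size of the equivalence class containing x.
        class_size = sum(1 for y in elements if equivalent(x, y))
        total += Fraction(1, class_size)
    return total

# Example 1: integers between -2 and 2, isomorphic when magnitudes agree.
X = [-2, -1, 0, 1, 2]
same_magnitude = lambda x, y: abs(x) == abs(y)

num_classes = count_up_to_equivalence(X, same_magnitude)
print(num_classes)  # 3
```

The elements $\pm 1, \pm 2$ each contribute $1/2$ and the element $0$ contributes $1$, which matches the three classes obtained by listing the magnitudes directly.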

Given a finite set $X$, there is also a natural probability distribution on $X$, namely the *uniform distribution*, according to which a random variable $\mathbf{x} \in X$ is set equal to any given element $x$ of $X$ with probability $\frac{1}{|X|}$:

$$\mathbf{P}(\mathbf{x} = x) = \frac{1}{|X|}.$$

Given a notion $\sim$ of isomorphism on $X$, one can then define the random equivalence class $[\mathbf{x}] \in X/\sim$ that the random element $\mathbf{x}$ belongs to. But if the isomorphism classes are unequal in size, we now encounter a biasing effect: even if $\mathbf{x}$ was drawn uniformly from $X$, the equivalence class $[\mathbf{x}]$ need not be uniformly distributed in $X/\sim$. For instance, in the above example, if $\mathbf{x}$ was drawn uniformly from $\{-2,-1,0,1,2\}$, then the equivalence class $[\mathbf{x}]$ will not be uniformly distributed in the three-element space $X/\sim$, because it has a $2/5$ probability of being equal to the class $\{-2,2\}$ or to the class $\{-1,1\}$, and only a $1/5$ probability of being equal to the class $\{0\}$.

However, it is possible to remove this bias by changing the weighting in (1), and thus changing the notion of what cardinality means. To do this, we generalise the previous situation. Instead of considering sets $X$ with an equivalence relation $\sim$ to capture the notion of isomorphism, we instead consider groupoids, which are sets $X$ in which there are allowed to be *multiple* isomorphisms between elements in $X$ (and in particular, there are allowed to be multiple *automorphisms* from an element to itself). More precisely:

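Definition 2. A groupoid is a set (or proper class) $X$, together with a (possibly empty) collection $\mathrm{Iso}(x \to y)$ of “isomorphisms” between any pair $x, y$ of elements of $X$, and a composition map $f, g \mapsto g \circ f$ from isomorphisms $f \in \mathrm{Iso}(x \to y)$, $g \in \mathrm{Iso}(y \to z)$ to isomorphisms in $\mathrm{Iso}(x \to z)$ for every $x, y, z \in X$, obeying the following group-like axioms:

- (Identity) For every $x \in X$, there is an identity isomorphism $\mathrm{id}_x \in \mathrm{Iso}(x \to x)$, such that $f \circ \mathrm{id}_x = \mathrm{id}_y \circ f = f$ for all $f \in \mathrm{Iso}(x \to y)$ and $x, y \in X$.
- (Associativity) If $f \in \mathrm{Iso}(x \to y)$, $g \in \mathrm{Iso}(y \to z)$, and $h \in \mathrm{Iso}(z \to w)$ for some $x, y, z, w \in X$, then $h \circ (g \circ f) = (h \circ g) \circ f$.
- (Inverse) If $f \in \mathrm{Iso}(x \to y)$ for some $x, y \in X$, then there exists an inverse isomorphism $f^{-1} \in \mathrm{Iso}(y \to x)$ such that $f \circ f^{-1} = \mathrm{id}_y$ and $f^{-1} \circ f = \mathrm{id}_x$.

We say that two elements $x, y$ of a groupoid are isomorphic, and write $x \sim y$, if there is at least one isomorphism from $x$ to $y$.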

Example 3. Any category gives a groupoid by taking $X$ to be the set (or class) of objects, and $\mathrm{Iso}(x \to y)$ to be the collection of invertible morphisms from $x$ to $y$. For instance, in the category $\mathbf{Set}$ of sets, $\mathrm{Iso}(x \to y)$ would be the collection of bijections from $x$ to $y$; in the category $\mathbf{Vec}_k$ of linear vector spaces over some given base field $k$, $\mathrm{Iso}(x \to y)$ would be the collection of invertible linear transformations from $x$ to $y$; and so forth.

Every set $X$ equipped with an equivalence relation $\sim$ can be turned into a groupoid by assigning precisely one isomorphism from $x$ to $y$ for any pair $x, y \in X$ with $x \sim y$, and no isomorphisms from $x$ to $y$ when $x \not\sim y$, with the groupoid operations of identity, composition, and inverse defined in the only way possible consistent with the axioms. We will call this the *simply connected groupoid* associated with this equivalence relation. For instance, with $X = \{-2,-1,0,1,2\}$ as above, if we turn $X$ into a simply connected groupoid, there will be precisely one isomorphism from $2$ to $-2$, and also precisely one isomorphism from $2$ to $2$, but no isomorphisms from $2$ to $-1$, $0$, or $1$.

However, one can also form multiply-connected groupoids in which there can be multiple isomorphisms from one element of $X$ to another. For instance, one can view $X = \{-2,-1,0,1,2\}$ as a space that is acted on by multiplication by the two-element group $\{-1,+1\}$. This gives rise to two types of isomorphisms, an identity isomorphism $(+1)_x$ from $x$ to $x$ for each $x \in X$, and a negation isomorphism $(-1)_x$ from $x$ to $-x$ for each $x \in X$; in particular, there are *two* automorphisms of $0$ (i.e., isomorphisms from $0$ to itself), namely $(+1)_0$ and $(-1)_0$, whereas the other four elements of $X$ only have a single automorphism (the identity isomorphism). One defines composition, identity, and inverse in this groupoid in the obvious fashion (using the group law of the two-element group $\{-1,+1\}$); for instance, we have $(-1)_{-2} \circ (-1)_2 = (+1)_2$.

For a finite multiply-connected groupoid, it turns out that the natural notion of “cardinality” (or as I prefer to call it, “cardinality up to isomorphism”) is given by the variant

$$|X/\sim| = \sum_{x \in X} \frac{1}{|\{ f : f \in \mathrm{Iso}(x \to y) \text{ for some } y \in X \}|} \qquad (2)$$

of (1). That is to say, in the multiply connected case, the denominator is no longer the number of objects $y$ isomorphic to $x$, but rather the number of *isomorphisms* from $x$ to other objects $y$. Grouping together all summands coming from a single equivalence class $[x]$ in $X/\sim$, we can also write this expression as

$$|X/\sim| = \sum_{[x] \in X/\sim} \frac{1}{|\mathrm{Aut}(x)|}$$

where $\mathrm{Aut}(x) := \mathrm{Iso}(x \to x)$ is the automorphism group of $x$, that is to say the group of isomorphisms from $x$ to itself. (Note that if $x, y$ belong to the same equivalence class $[x]$, then the two groups $\mathrm{Aut}(x)$ and $\mathrm{Aut}(y)$ will be isomorphic and thus have the same cardinality, and so the above expression is well-defined.)

For instance, if we take $X$ to be the simply connected groupoid on $\{-2,-1,0,1,2\}$, then the number of elements of $X$ up to isomorphism is

$$\frac{1}{2} + \frac{1}{2} + 1 + \frac{1}{2} + \frac{1}{2} = 1 + 1 + 1 = 3$$

exactly as before. If however we take the multiply connected groupoid on $\{-2,-1,0,1,2\}$, in which $0$ has two automorphisms, the number of elements of $X$ up to isomorphism is now the smaller quantity

$$\frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = 1 + \frac{1}{2} + 1 = \frac{5}{2};$$

the equivalence class $[0]$ is now counted with weight $1/2$ rather than $1$ due to the two automorphisms on $0$. Geometrically, one can think of this groupoid as being formed by taking the five-element set $\{-2,-1,0,1,2\}$, and “folding it in half” around the fixed point $0$, giving rise to two “full” quotient points $[1], [2]$ and one “half” point $[0]$. More generally, given a finite group $G$ acting on a finite set $X$, and forming the associated multiply connected groupoid, the cardinality up to isomorphism of this groupoid will be $|X|/|G|$, since each element $x$ of $X$ will have $|G|$ isomorphisms on it (whether they be to the same element $x$, or to other elements of $X$).
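The following Python sketch illustrates the orbit-by-orbit form of this count for a group action, where the automorphism group of a point is its stabiliser. The function name `groupoid_cardinality` and the way the action is passed in as a callable are assumptions made for this illustration only.

```python
from fractions import Fraction

def groupoid_cardinality(points, group, act):
    """Sum 1/|Aut(x)| over one representative x per orbit.

    In the groupoid coming from a group action, Aut(x) is the
    stabiliser of x, i.e. the group elements g with act(g, x) == x.
    """
    seen = set()
    total = Fraction(0)
    for x in points:
        if x in seen:
            continue  # this orbit has already been counted
        seen |= {act(g, x) for g in group}
        stabiliser_size = sum(1 for g in group if act(g, x) == x)
        total += Fraction(1, stabiliser_size)
    return total

# The two-element group {+1, -1} acting on {-2, ..., 2} by multiplication.
X = [-2, -1, 0, 1, 2]
G = [1, -1]
card = groupoid_cardinality(X, G, lambda g, x: g * x)
print(card)  # 5/2
```

The orbits of $1$ and $2$ have trivial stabiliser and count for $1$ each, while the fixed point $0$ has a stabiliser of order two and counts for $1/2$, giving $|X|/|G| = 5/2$.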

The definition (2) can also make sense for some infinite groupoids; to my knowledge this was first explicitly done in this paper of Baez and Dolan. Consider for instance the category $\mathbf{FinSet}$ of finite sets, with isomorphisms given by bijections as in Example 3. Every finite set is isomorphic to $\{1,\dots,n\}$ for some natural number $n$, so the equivalence classes of $\mathbf{FinSet}$ may be indexed by the natural numbers. The automorphism group $S_n$ of $\{1,\dots,n\}$ has order $n!$, so the cardinality of $\mathbf{FinSet}$ up to isomorphism is

$$\sum_{n=0}^\infty \frac{1}{n!} = e.$$

(This fact is sometimes loosely stated as “the number of finite sets is $e$”, but I view this statement as somewhat misleading if the qualifier “up to isomorphism” is not added.) Similarly, when one allows for multiple isomorphisms from a group to itself, the number of groups of order four up to isomorphism is now

$$\frac{1}{2} + \frac{1}{6} = \frac{2}{3}$$

because the cyclic group $C_4$ has two automorphisms, whereas the Klein four-group $C_2 \times C_2$ has six.
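The two automorphism counts can be confirmed by brute force. The sketch below models the cyclic group as the integers mod 4 under addition and the Klein four-group as pairs of bits under componentwise addition mod 2; these models and the helper name `count_automorphisms` are choices made here for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

def count_automorphisms(elements, op):
    """Count bijections phi of a finite group with phi(ab) = phi(a)phi(b)."""
    count = 0
    for images in permutations(elements):
        phi = dict(zip(elements, images))
        if all(phi[op(a, b)] == op(phi[a], phi[b])
               for a in elements for b in elements):
            count += 1
    return count

# Cyclic group of order four: integers mod 4 under addition.
c4 = list(range(4))
c4_op = lambda a, b: (a + b) % 4

# Klein four-group: pairs of bits under componentwise addition mod 2.
klein = list(product(range(2), repeat=2))
klein_op = lambda a, b: ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)

aut_c4 = count_automorphisms(c4, c4_op)
aut_klein = count_automorphisms(klein, klein_op)
total = Fraction(1, aut_c4) + Fraction(1, aut_klein)
print(aut_c4, aut_klein, total)  # 2 6 2/3
```

Only $4! = 24$ bijections need to be tested for each group, so an exhaustive search is entirely adequate at this size.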

In the case that the cardinality of a groupoid $X$ up to isomorphism is finite and non-zero, one can now define the notion of a random isomorphism class $[\mathbf{x}]$ in $X/\sim$ drawn “uniformly up to isomorphism”, by requiring the probability of attaining any given isomorphism class $[x]$ to be

$$\mathbf{P}([\mathbf{x}] = [x]) = \frac{1/|\mathrm{Aut}(x)|}{\sum_{[y] \in X/\sim} 1/|\mathrm{Aut}(y)|},$$

thus the probability of being isomorphic to a given element $x$ will be inversely proportional to the number of automorphisms that $x$ has. For instance, if we take $X$ to be the set $\{-2,-1,0,1,2\}$ with the simply connected groupoid, $[\mathbf{x}]$ will be drawn uniformly from the three available equivalence classes $[0], [1], [2]$, with a $1/3$ probability of attaining each; but if instead one uses the multiply connected groupoid coming from the action of $\{-1,+1\}$, and draws $[\mathbf{x}]$ uniformly up to isomorphism, then $[1]$ and $[2]$ will now be selected with probability $2/5$ each, and $[0]$ will be selected with probability $1/5$. Thus this distribution has accounted for the bias mentioned previously: if a finite group $G$ acts on a finite space $X$, and $\mathbf{x}$ is drawn uniformly from $X$, then $[\mathbf{x}]$ will still be drawn uniformly up to isomorphism from $X/G$, if we use the multiply connected groupoid coming from the $G$ action, rather than the simply connected groupoid coming from just the $G$-orbit structure on $X$.

Using the groupoid of finite sets, we see that a finite set chosen uniformly up to isomorphism will have a cardinality that is distributed according to the Poisson distribution of parameter $1$, that is to say it will be of cardinality $n$ with probability $\frac{e^{-1}}{n!}$.

One important source of groupoids are the fundamental groupoids $\pi_1(M)$ of a manifold $M$ (one can also consider more general topological spaces than manifolds, but for simplicity we will restrict this discussion to the manifold case), in which the underlying space is simply $M$, and the isomorphisms from $x$ to $y$ are the equivalence classes of paths from $x$ to $y$ up to homotopy; in particular, the automorphism group of any given point $x$ is just the fundamental group $\pi_1(M,x)$ of $M$ at that base point. The equivalence class $[x]$ of a point in $M$ is then the connected component of $x$ in $M$. The cardinality up to isomorphism of the fundamental groupoid is then

$$\sum_{M' \in \pi_0(M)} \frac{1}{|\pi_1(M')|}$$

where $\pi_0(M)$ is the collection of connected components $M'$ of $M$, and $|\pi_1(M')|$ is the order of the fundamental group of $M'$. Thus, simply connected components of $M$ count for a full unit of cardinality, whereas multiply connected components (which can be viewed as quotients of their simply connected cover by their fundamental group) will count for a fractional unit of cardinality, inversely to the order of their fundamental group.

This notion of cardinality up to isomorphism of a groupoid behaves well with respect to various basic notions. For instance, suppose one has an $n$-fold covering map $\pi: X \to Y$ of one finite groupoid $Y$ by another $X$. This means that $\pi$ is a functor that is surjective, with all preimages of cardinality $n$, with the property that given any pair $y, y'$ in the base space $Y$ and any $x$ in the preimage $\pi^{-1}(\{y\})$ of $y$, every isomorphism $f \in \mathrm{Iso}(y \to y')$ has a unique lift $\tilde f \in \mathrm{Iso}(x \to x')$ from the given initial point $x$ (and some $x'$ in the preimage of $y'$). Then one can check that the cardinality up to isomorphism of $X$ is $n$ times the cardinality up to isomorphism of $Y$, which fits well with the geometric picture of $X$ as the $n$-fold cover of $Y$. (For instance, if one covers a manifold $M$ with finite fundamental group by its universal cover, this is a $|\pi_1(M)|$-fold cover, the base has cardinality $1/|\pi_1(M)|$ up to isomorphism, and the universal cover has cardinality one up to isomorphism.) Related to this, if one draws an equivalence class $[\mathbf{x}]$ of $X$ uniformly up to isomorphism, then $\pi([\mathbf{x}])$ will be an equivalence class of $Y$ drawn uniformly up to isomorphism also.

Indeed, one can show that this notion of cardinality up to isomorphism for groupoids is uniquely determined by a small number of axioms such as these (similar to the axioms that determine Euler characteristic); see this blog post of Qiaochu Yuan for details.

The probability distributions on isomorphism classes described by the above recipe seem to arise naturally in many applications. For instance, if one draws a profinite abelian group up to isomorphism at random in this fashion (so that each isomorphism class of a profinite abelian group occurs with probability inversely proportional to the number of automorphisms of this group), then the resulting distribution is known as the *Cohen-Lenstra distribution*, and seems to emerge as the natural asymptotic distribution of many randomly generated profinite abelian groups in number theory and combinatorics, such as the class groups of random quadratic fields; see this previous blog post for more discussion. For a simple combinatorial example, the set of fixed points of a random permutation on $n$ elements will have a cardinality that converges in distribution to the Poisson distribution of rate $1$ (as discussed in this previous post), thus we see that the fixed points of a large random permutation asymptotically are distributed uniformly up to isomorphism. I’ve been told that this notion of cardinality up to isomorphism is also particularly compatible with stacks (which are a good framework to describe such objects as moduli spaces of algebraic varieties up to isomorphism), though I am not sufficiently acquainted with this theory to say much more than this.
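To see the fixed-point example numerically, one can compare the exact distribution of the number of fixed points of a uniformly random permutation with the Poisson distribution of rate $1$. This sketch uses the standard count $\binom{n}{k} D_{n-k}$ of permutations with exactly $k$ fixed points, where $D_m$ is the number of derangements; the function names are illustrative only.

```python
from math import comb, exp, factorial

def derangements(m):
    """Number of permutations of m elements with no fixed points."""
    d = [1, 0]  # D_0 = 1, D_1 = 0
    for i in range(2, m + 1):
        d.append((i - 1) * (d[i - 1] + d[i - 2]))
    return d[m]

def fixed_point_probability(n, k):
    """Probability that a uniform permutation of n elements has k fixed points."""
    return comb(n, k) * derangements(n - k) / factorial(n)

def poisson_one(k):
    """Poisson distribution of rate 1 evaluated at k."""
    return exp(-1) / factorial(k)

n = 12
for k in range(5):
    print(k, fixed_point_probability(n, k), poisson_one(k))
```

Already at $n = 12$ the two columns agree to several decimal places for small $k$, in line with the convergence in distribution described above.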

Just a short post to note that the Norwegian Academy of Science and Letters has just announced that the 2017 Abel prize has been awarded to Yves Meyer, “for his pivotal role in the development of the mathematical theory of wavelets”. The actual prize ceremony will be in Oslo in May.

I am actually in Oslo myself currently, having just presented Meyer’s work at the announcement ceremony (and also having written a brief description of some of his work). The Abel prize has a somewhat unintuitive (and occasionally misunderstood) arrangement in which the presenter of the work of the prize is selected independently of the winner of the prize (I think in part so that the choice of presenter gives no clues as to the identity of the laureate). In particular, like other presenters before me (which in recent years have included Timothy Gowers, Jordan Ellenberg, and Alex Bellos), I agreed to present the laureate’s work before knowing who the laureate was! But in this case the task was very easy, because Meyer’s areas of (both pure and applied) harmonic analysis and PDE fell rather squarely within my own area of expertise. (I had previously written about some other work of Meyer in this blog post.) Indeed I had learned about Meyer’s wavelet constructions as a graduate student while taking a course from Ingrid Daubechies. Daubechies also made extremely important contributions to the theory of wavelets, but due to a conflict of interest (as per the guidelines for the prize committee) arising from Daubechies’ presidency of the International Mathematical Union (which nominates the majority of the members of the Abel prize committee, who then serve for two years) from 2011 to 2014 (and her continuing service *ex officio* on the IMU executive committee from 2015 to 2018), she will not be eligible for the prize until 2021 at the earliest, and so I do not think this prize should be necessarily construed as a judgement on the relative contributions of Meyer and Daubechies to this field. (In any case I fully agree with the Abel prize committee’s citation of Meyer’s pivotal role in the development of the theory of wavelets.)

*[Update, Mar 28: link to prize committee guidelines and clarification of the extent of Daubechies’ conflict of interest added. -T]*

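Given a function $f: \mathbf{N} \to \{-1,+1\}$ on the natural numbers taking values in $\{-1,+1\}$, one can invoke the Furstenberg correspondence principle to locate a measure preserving system $T \circlearrowright (X,\mu)$ – a probability space $(X,\mu)$ together with a measure-preserving shift $T: X \to X$ (or equivalently, a measure-preserving $\mathbf{Z}$-action on $(X,\mu)$) – together with a measurable function (or “observable”) $F: X \to \{-1,+1\}$ that has essentially the same statistics as $f$ in the sense that

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \cdots f(n+h_k) \leq \int_X F(T^{h_1} x) \cdots F(T^{h_k} x)\, d\mu(x) \leq \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \cdots f(n+h_k)$$

for any integers $h_1,\dots,h_k$. In particular, one has

$$\int_X F(T^{h_1} x) \cdots F(T^{h_k} x)\, d\mu(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \cdots f(n+h_k) \qquad (1)$$

whenever the limit on the right-hand side exists. We will refer to the system $T \circlearrowright (X,\mu)$ together with the designated function $F$ as a *Furstenberg limit* of the sequence $f$. These Furstenberg limits capture some, but not all, of the asymptotic behaviour of $f$; roughly speaking, they control the typical “local” behaviour of $f$, involving correlations such as $\frac{1}{N} \sum_{n=1}^N f(n+h_1) \cdots f(n+h_k)$ in the regime where $h_1,\dots,h_k$ are much smaller than $N$. However, the control on error terms here is usually only qualitative at best, and one usually does not obtain non-trivial control on correlations in which the $h_1,\dots,h_k$ are allowed to grow at some significant rate with $N$ (e.g. like some power $N^\theta$ of $N$).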

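The correspondence principle is discussed in these previous blog posts. One way to establish the principle is by introducing a Banach limit $p\text{-}\lim: \ell^\infty(\mathbf{N}) \to \mathbf{R}$ that extends the usual limit functional on the subspace of $\ell^\infty(\mathbf{N})$ consisting of convergent sequences while still having operator norm one. Such functionals cannot be constructed explicitly, but can be proven to exist (non-constructively and non-uniquely) using the Hahn-Banach theorem; one can also use a non-principal ultrafilter here if desired. One can then seek to construct a system $T \circlearrowright (X,\mu)$ and a measurable function $F: X \to \{-1,+1\}$ for which one has the statistics

$$\int_X F(T^{h_1} x) \cdots F(T^{h_k} x)\, d\mu(x) = p\text{-}\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \cdots f(n+h_k) \qquad (2)$$

for all $h_1,\dots,h_k$. One can explicitly construct such a system as follows. One can take $X$ to be the Cantor space $\{-1,+1\}^{\mathbf{Z}}$ with the product $\sigma$-algebra and the shift

$$T( (x_n)_{n \in \mathbf{Z}} ) := (x_{n+1})_{n \in \mathbf{Z}}$$

with the function $F: X \to \{-1,+1\}$ being the coordinate function at zero:

$$F( (x_n)_{n \in \mathbf{Z}} ) := x_0$$

(so in particular $F( T^h (x_n)_{n \in \mathbf{Z}} ) = x_h$ for any $h \in \mathbf{Z}$). The only thing remaining is to construct the invariant measure $\mu$. In order to be consistent with (2), one must have

$$\mu( \{ (x_n)_{n \in \mathbf{Z}} : x_{h_j} = \epsilon_j \ \forall 1 \leq j \leq k \} ) = p\text{-}\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N 1_{f(n+h_1) = \epsilon_1} \cdots 1_{f(n+h_k) = \epsilon_k}$$

for any distinct integers $h_1,\dots,h_k$ and signs $\epsilon_1,\dots,\epsilon_k$. One can check that this defines a premeasure on the Boolean algebra of $\{-1,+1\}^{\mathbf{Z}}$ defined by cylinder sets, and the existence of $\mu$ then follows from the Hahn-Kolmogorov extension theorem (or the closely related Kolmogorov extension theorem). One can then check that the correspondence (2) holds, and that $\mu$ is translation-invariant; the latter comes from the translation invariance of the (Banach-)Cesàro averaging operation $p\text{-}\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N$. A variant of this construction shows that the Furstenberg limit is unique up to equivalence if and only if all the limits appearing in (1) actually exist.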

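One can obtain a slightly tighter correspondence by using a smoother average than the Cesàro average. For instance, one can use the logarithmic Cesàro averages $\lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n)}{n}$ in place of the Cesàro average $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n)$, thus one replaces (2) by

$$\int_X F(T^{h_1} x) \cdots F(T^{h_k} x)\, d\mu(x) = p\text{-}\lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n+h_1) \cdots f(n+h_k)}{n}.$$

Whenever the Cesàro average of a bounded sequence $f: \mathbf{N} \to \mathbf{R}$ exists, then the logarithmic Cesàro average exists and is equal to the Cesàro average. Thus, a Furstenberg limit constructed using logarithmic Banach-Cesàro averaging still obeys (1) for all $h_1,\dots,h_k$ when the right-hand side limit exists, but also obeys the more general assertion

$$\int_X F(T^{h_1} x) \cdots F(T^{h_k} x)\, d\mu(x) = \lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n+h_1) \cdots f(n+h_k)}{n}$$

whenever the limit of the right-hand side exists.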

In a recent paper of Frantzikinakis, the Furstenberg limits of the Liouville function $\lambda$ (with logarithmic averaging) were studied. Some (but not all) of the known facts and conjectures about the Liouville function can be interpreted in the Furstenberg limit. For instance, in a recent breakthrough result of Matomäki and Radziwiłł (discussed previously here), it was shown that the Liouville function exhibited cancellation on short intervals in the sense that

$$\lim_{H \to \infty} \limsup_{X \to \infty} \frac{1}{X} \int_X^{2X} \frac{1}{H} \Big| \sum_{x \leq n \leq x+H} \lambda(n) \Big|\, dx = 0.$$

In terms of Furstenberg limits of the Liouville function, this assertion is equivalent to the assertion that

$$\lim_{H \to \infty} \int_X \Big| \frac{1}{H} \sum_{h=1}^H F(T^h x) \Big|\, d\mu(x) = 0$$

for all Furstenberg limits $T \circlearrowright (X,\mu), F$ of Liouville (including those without logarithmic averaging). Invoking the mean ergodic theorem (discussed in this previous post), this assertion is in turn equivalent to the observable $F$ that corresponds to the Liouville function being orthogonal to the invariant factor $L^\infty(X)^{\mathbf{Z}} = \{ g \in L^\infty(X) : g \circ T = g \}$ of $X$; equivalently, the first Gowers-Host-Kra seminorm $\|F\|_{U^1(X)}$ of $F$ (as defined for instance in this previous post) vanishes. The Chowla conjecture, which asserts that

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N \lambda(n+h_1) \cdots \lambda(n+h_k) = 0$$

for all distinct integers $h_1,\dots,h_k$, is equivalent to the assertion that all the Furstenberg limits of Liouville are equivalent to the Bernoulli system ($\{-1,+1\}^{\mathbf{Z}}$ with the product measure arising from the uniform distribution on $\{-1,+1\}$, with the shift $T$ and observable $F$ as before). Similarly, the logarithmically averaged Chowla conjecture

$$\lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n+h_1) \cdots \lambda(n+h_k)}{n} = 0$$

is equivalent to the assertion that all the Furstenberg limits of Liouville with logarithmic averaging are equivalent to the Bernoulli system. Recently, I was able to prove the two-point version

$$\lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n) \lambda(n+h)}{n} = 0 \qquad (3)$$

of the logarithmically averaged Chowla conjecture, for any non-zero integer $h$; this is equivalent to the perfect strong mixing property

$$\int_X F(x) F(T^h x)\, d\mu(x) = 0$$

for any Furstenberg limit of Liouville with logarithmic averaging, and any $h \neq 0$.
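For readers who want to experiment, the logarithmically averaged two-point correlation of the Liouville function can be evaluated at finite $N$ with a short sieve. The sketch below is only a numerical illustration: the function names and the cutoff $N = 100000$ are arbitrary choices made here, and a finite computation of course says nothing rigorous about the limit.

```python
from math import log

def liouville_up_to(limit):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= limit.

    Uses a smallest-prime-factor sieve and the complete
    multiplicativity relation lam(p * m) = -lam(m).
    """
    spf = list(range(limit + 1))  # spf[n] = smallest prime factor of n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

def log_averaged_correlation(lam, N, h):
    """(1 / log N) * sum over n <= N of lam(n) * lam(n + h) / n."""
    return sum(lam[n] * lam[n + h] / n for n in range(1, N + 1)) / log(N)

N, h = 100_000, 1
lam = liouville_up_to(N + h)
print(log_averaged_correlation(lam, N, h))
```

Because of the $1/\log N$ normalisation, any decay of this quantity is slow, so small numerical values at accessible $N$ should be read as suggestive at most.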

The situation is more delicate with regards to the Sarnak conjecture, which is equivalent to the assertion that

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N \lambda(n) f(n) = 0$$

for any zero-entropy sequence $f: \mathbf{N} \to \mathbf{R}$ (see this previous blog post for more discussion). Morally speaking, this conjecture should be equivalent to the assertion that any Furstenberg limit of Liouville is disjoint from any zero entropy system, but I was not able to formally establish an implication in either direction due to some technical issues regarding the fact that the Furstenberg limit does not directly control long-range correlations, only short-range ones. (There are however ergodic theoretic interpretations of the Sarnak conjecture that involve the notion of generic points; see this paper of El Abdalaoui, Lemańczyk, and de la Rue.) But the situation is currently better with the logarithmically averaged Sarnak conjecture

$$\lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n) f(n)}{n} = 0,$$

as I was able to show that this conjecture was equivalent to the logarithmically averaged Chowla conjecture, and hence to all Furstenberg limits of Liouville with logarithmic averaging being Bernoulli; I also showed the conjecture was equivalent to local Gowers uniformity of the Liouville function, which is in turn equivalent to the function $F$ having all Gowers-Host-Kra seminorms vanishing in every Furstenberg limit with logarithmic averaging. In this recent paper of Frantzikinakis, this analysis was taken further, showing that the logarithmically averaged Chowla and Sarnak conjectures were in fact equivalent to the much milder seeming assertion that all Furstenberg limits with logarithmic averaging were ergodic.

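Actually, the logarithmically averaged Furstenberg limits have more structure than just a $\mathbf{Z}$-action on a measure preserving system $(X,\mu)$ with a single observable $F$. Let $\mathrm{Aff}_+(\mathbf{Z})$ denote the semigroup of affine maps $n \mapsto an+b$ on the integers $\mathbf{Z}$ with $a, b \in \mathbf{Z}$ and $a$ positive. Also, let $\hat{\mathbf{Z}}$ denote the profinite integers (the inverse limit of the cyclic groups $\mathbf{Z}/q\mathbf{Z}$). Observe that $\mathrm{Aff}_+(\mathbf{Z})$ acts on $\hat{\mathbf{Z}}$ by taking the inverse limit of the obvious actions of $\mathrm{Aff}_+(\mathbf{Z})$ on $\mathbf{Z}/q\mathbf{Z}$.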

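Proposition 1 (Enriched logarithmically averaged Furstenberg limit of Liouville) Let $p\text{-}\lim$ be a Banach limit. Then there exists a probability space $(X,\mu)$ with an action $\phi \mapsto T^\phi$ of the affine semigroup $\mathrm{Aff}_+(\mathbf{Z})$, as well as measurable functions $F: X \to \{-1,+1\}$ and $M: X \to \hat{\mathbf{Z}}$, with the following properties:

- (i) (Affine Furstenberg limit) For any $\phi_1,\dots,\phi_k \in \mathrm{Aff}_+(\mathbf{Z})$, and any congruence class $a\ (q)$, one has
  $$p\text{-}\lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(\phi_1(n)) \cdots \lambda(\phi_k(n))\, 1_{n = a\ (q)}}{n} = \int_X F(T^{\phi_1} x) \cdots F(T^{\phi_k} x)\, 1_{M(x) = a\ (q)}\, d\mu(x).$$
- (ii) (Equivariance of $M$) For any $\phi \in \mathrm{Aff}_+(\mathbf{Z})$, one has
  $$M(T^\phi x) = \phi(M(x))$$
  for $\mu$-almost every $x \in X$.
- (iii) (Multiplicativity at fixed primes) For any prime $p$, one has
  $$F(T^{p\cdot} x) = -F(x)$$
  for $\mu$-almost every $x \in X$, where $p\cdot \in \mathrm{Aff}_+(\mathbf{Z})$ is the dilation map $n \mapsto pn$.
- (iv) (Measure pushforward) If $\phi \in \mathrm{Aff}_+(\mathbf{Z})$ is of the form $\phi(n) = an+b$ and $S_\phi \subset X$ is the set $S_\phi = \{ x \in X : M(x) \in \phi(\hat{\mathbf{Z}}) \}$, then the pushforward $T^\phi_* \mu$ of $\mu$ by $\phi$ is equal to $a \mu\downharpoonright_{S_\phi}$, that is to say one has
  $$\mu( (T^\phi)^{-1}(E) ) = a\, \mu( E \cap S_\phi )$$
  for every measurable $E \subset X$.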

Note that $\mathbf{Z}$ can be viewed as the subgroup of $\mathrm{Aff}_+(\mathbf{Z})$ consisting of the translations $n \mapsto n+b$. If one only keeps the $\mathbf{Z}$-portion of the $\mathrm{Aff}_+(\mathbf{Z})$ action and forgets the rest (as well as the function $M$) then the action becomes measure-preserving, and we recover an ordinary Furstenberg limit with logarithmic averaging. However, the additional structure here can be quite useful; for instance, one can transfer the proof of (3) to this setting, which we sketch below the fold, after proving the proposition.

The observable $M$, roughly speaking, means that points $x$ in the Furstenberg limit $X$ constructed by this proposition are still “virtual integers” in the sense that one can meaningfully compute the residue class of $x$ modulo any natural number modulus $q$, by first applying $M$ and then reducing mod $q$. The action of $\mathrm{Aff}_+(\mathbf{Z})$ means that one can also meaningfully multiply $x$ by any natural number, and translate it by any integer. As with other applications of the correspondence principle, the main advantage of moving to this more “virtual” setting is that one now acquires a probability measure $\mu$, so that the tools of ergodic theory can be readily applied.

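Given a random variable $X$ that takes on only finitely many values, we can define its Shannon entropy by the formula

$$\mathbf{H}(X) := \sum_x \mathbf{P}(X = x) \log \frac{1}{\mathbf{P}(X = x)}$$

with the convention that $0 \log \frac{1}{0} = 0$. (In some texts, one uses the logarithm to base $2$ rather than the natural logarithm, but the choice of base will not be relevant for this discussion.) This is clearly a nonnegative quantity. Given two random variables $X, Y$ taking on finitely many values, the joint variable $(X,Y)$ is also a random variable taking on finitely many values, and also has an entropy $\mathbf{H}(X,Y)$. It obeys the *Shannon inequalities*

$$\mathbf{H}(X), \mathbf{H}(Y) \leq \mathbf{H}(X,Y) \leq \mathbf{H}(X) + \mathbf{H}(Y)$$

so we can define some further nonnegative quantities, the mutual information

$$\mathbf{I}(X:Y) := \mathbf{H}(X) + \mathbf{H}(Y) - \mathbf{H}(X,Y)$$

and the conditional entropies

$$\mathbf{H}(X|Y) := \mathbf{H}(X,Y) - \mathbf{H}(Y); \quad \mathbf{H}(Y|X) := \mathbf{H}(X,Y) - \mathbf{H}(X).$$

More generally, given three random variables $X, Y, Z$, one can define the conditional mutual information

$$\mathbf{I}(X:Y|Z) := \mathbf{H}(X|Z) + \mathbf{H}(Y|Z) - \mathbf{H}(X,Y|Z)$$

and the final of the Shannon entropy inequalities asserts that this quantity is also non-negative.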

The mutual information $\mathbf{I}(X:Y)$ is a measure of the extent to which $X$ and $Y$ fail to be independent; indeed, it is not difficult to show that $\mathbf{I}(X:Y)$ vanishes if and only if $X$ and $Y$ are independent. Similarly, $\mathbf{I}(X:Y|Z)$ vanishes if and only if $X$ and $Y$ are *conditionally* independent relative to $Z$. At the other extreme, $\mathbf{H}(X|Y)$ is a measure of the extent to which $X$ fails to depend on $Y$; indeed, it is not difficult to show that $\mathbf{H}(X|Y) = 0$ if and only if $X$ is determined by $Y$ in the sense that there is a deterministic function $f$ such that $X = f(Y)$. In a related vein, if $X$ and $X'$ are equivalent in the sense that there are deterministic functional relationships $X = f(X')$, $X' = g(X)$ between the two variables, then $X$ is interchangeable with $X'$ for the purposes of computing the above quantities, thus for instance $\mathbf{H}(X) = \mathbf{H}(X')$, $\mathbf{H}(X,Y) = \mathbf{H}(X',Y)$, $\mathbf{I}(X:Y) = \mathbf{I}(X':Y)$, $\mathbf{I}(X:Y|Z) = \mathbf{I}(X':Y|Z)$, etc.

One can get some initial intuition for these information-theoretic quantities by specialising to a simple situation in which all the random variables $X$ being considered come from restricting a single random (and uniformly distributed) boolean function $F: \Omega \to \{0,1\}$ on a given finite domain $\Omega$ to some subset $A$ of $\Omega$:

$$X = F \downharpoonright_A.$$

In this case, $X$ has the law of a random uniformly distributed boolean function from $A$ to $\{0,1\}$, and the entropy here can be easily computed to be $|A| \log 2$, where $|A|$ denotes the cardinality of $A$. If $X$ is the restriction of $F$ to $A$, and $Y$ is the restriction of $F$ to $B$, then the joint variable $(X,Y)$ is equivalent to the restriction of $F$ to $A \cup B$. If one discards the normalisation factor $\log 2$, one then obtains the following dictionary between entropy and the combinatorics of finite sets:

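| Random variables $X, Y, Z$ | Finite sets $A, B, C$ |
| --- | --- |
| Entropy $\mathbf{H}(X)$ | Cardinality $\lvert A \rvert$ |
| Joint variable $(X,Y)$ | Union $A \cup B$ |
| Mutual information $\mathbf{I}(X:Y)$ | Intersection cardinality $\lvert A \cap B \rvert$ |
| Conditional entropy $\mathbf{H}(X \mid Y)$ | Set difference cardinality $\lvert A \setminus B \rvert$ |
| Conditional mutual information $\mathbf{I}(X:Y \mid Z)$ | $\lvert (A \cap B) \setminus C \rvert$ |
| $X, Y$ independent | $A, B$ disjoint |
| $X$ determined by $Y$ | $A$ a subset of $B$ |
| $X, Y$ conditionally independent relative to $Z$ | $A \cap B \subset C$ |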

Every (linear) inequality or identity about entropy (and related quantities, such as mutual information) then specialises to a combinatorial inequality or identity about finite sets that is easily verified. For instance, the Shannon inequality $\mathbf{H}(X,Y) \leq \mathbf{H}(X) + \mathbf{H}(Y)$ becomes the union bound $|A \cup B| \leq |A| + |B|$, and the definition of mutual information becomes the inclusion-exclusion formula

$$|A \cap B| = |A| + |B| - |A \cup B|.$$

For a more advanced example, consider the data processing inequality that asserts that if $X, Z$ are conditionally independent relative to $Y$, then $\mathbf{I}(X:Z) \leq \mathbf{I}(X:Y)$. Specialising to sets, this now says that if $A, C$ are disjoint outside of $B$, then $|A \cap C| \leq |A \cap B|$; this can be made apparent by considering the corresponding Venn diagram. This dictionary also suggests how to *prove* the data processing inequality using the existing Shannon inequalities. Firstly, if $A$ and $C$ are not necessarily disjoint outside of $B$, then a consideration of Venn diagrams gives the more general inequality

$$|A \cap C| \leq |A \cap B| + |(A \cap C) \setminus B|$$

and a further inspection of the diagram then reveals the more precise identity

$$|A \cap C| + |(A \cap B) \setminus C| = |A \cap B| + |(A \cap C) \setminus B|.$$

Using the dictionary in the reverse direction, one is then led to conjecture the identity

$$\mathbf{I}(X:Z) + \mathbf{I}(X:Y|Z) = \mathbf{I}(X:Y) + \mathbf{I}(X:Z|Y)$$

which (together with non-negativity of conditional mutual information) implies the data processing inequality, and this identity is in turn easily established from the definition of mutual information.

On the other hand, not every assertion about cardinalities of sets generalises to entropies of random variables that are not arising from restricting random boolean functions to sets. For instance, a basic property of sets is that disjointness from a given set $C$ is preserved by unions:

$$A \cap C = B \cap C = \emptyset \implies (A \cup B) \cap C = \emptyset.$$

Indeed, one has the union bound

$$|(A \cup B) \cap C| \leq |A \cap C| + |B \cap C|. \qquad (1)$$

Applying the dictionary in the reverse direction, one might now conjecture that if $X$ was independent of $Z$ and $Y$ was independent of $Z$, then $(X,Y)$ should also be independent of $Z$, and furthermore that

$$\mathbf{I}(X,Y:Z) \leq \mathbf{I}(X:Z) + \mathbf{I}(Y:Z)$$

but these statements are well known to be false (for reasons related to pairwise independence of random variables being strictly weaker than joint independence). For a concrete counterexample, one can take $X, Y \in \mathbf{F}_2$ to be independent, uniformly distributed random elements of the finite field $\mathbf{F}_2$ of two elements, and take $Z := X+Y$ to be the sum of these two field elements. One can easily check that each of $X$ and $Y$ is separately independent of $Z$, but the joint variable $(X,Y)$ determines $Z$ and thus is not independent of $Z$.
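This counterexample is small enough to verify by direct computation. The sketch below models the three variables as coordinates of a uniformly chosen outcome from a list of equally likely triples; the helper names `entropy` and `mutual_information` and this coordinate-based encoding are conventions adopted only for this illustration.

```python
from collections import Counter
from itertools import product
from math import log

def entropy(outcomes, *coords):
    """Shannon entropy (natural log) of the chosen coordinates,
    when one outcome is drawn uniformly from `outcomes`."""
    n = len(outcomes)
    counts = Counter(tuple(w[i] for i in coords) for w in outcomes)
    return sum((c / n) * log(n / c) for c in counts.values())

def mutual_information(outcomes, a, b):
    """I(A:B) = H(A) + H(B) - H(A,B) for coordinate tuples a and b."""
    return (entropy(outcomes, *a) + entropy(outcomes, *b)
            - entropy(outcomes, *a, *b))

# X, Y independent uniform bits, and Z = X + Y in the field of two elements.
outcomes = [(x, y, (x + y) % 2) for x, y in product(range(2), repeat=2)]
X, Y, Z = (0,), (1,), (2,)

i_xz = mutual_information(outcomes, X, Z)       # 0: X is independent of Z
i_yz = mutual_information(outcomes, Y, Z)       # 0: Y is independent of Z
i_xyz = mutual_information(outcomes, X + Y, Z)  # log 2: (X,Y) determines Z
print(i_xz, i_yz, i_xyz, log(2))
```

The first two mutual informations vanish while the third equals $\log 2$, so the conjectured inequality fails by exactly one bit.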

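From the inclusion-exclusion identities

$$|A \cap C| = |A| + |C| - |A \cup C|$$
$$|B \cap C| = |B| + |C| - |B \cup C|$$
$$|(A \cup B) \cap C| = |A \cup B| + |C| - |A \cup B \cup C|$$
$$|A \cap B \cap C| = |A| + |B| + |C| - |A \cup B| - |B \cup C| - |A \cup C| + |A \cup B \cup C|$$

one can check that (1) is equivalent to the trivial lower bound $|A \cap B \cap C| \geq 0$. The basic issue here is that in the dictionary between entropy and combinatorics, there is no satisfactory entropy analogue of the notion of a triple intersection $A \cap B \cap C$. (Even the double intersection $A \cap B$ only exists information theoretically in a “virtual” sense; the mutual information $\mathbf{I}(X:Y)$ allows one to “compute the entropy” of this “intersection”, but does not actually describe this intersection itself as a random variable.)

However, this issue only arises with three or more variables; it is not too difficult to show that the only linear equalities and inequalities that are necessarily obeyed by the information-theoretic quantities $\mathbf{H}(X), \mathbf{H}(Y), \mathbf{H}(X,Y), \mathbf{I}(X:Y), \mathbf{H}(X|Y), \mathbf{H}(Y|X)$ associated to just two variables $X, Y$ are those that are also necessarily obeyed by their combinatorial analogues $|A|, |B|, |A \cup B|, |A \cap B|, |A \setminus B|, |B \setminus A|$. (See for instance the Venn diagram at the Wikipedia page for mutual information for a pictorial summary of this statement.)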

One can work with a larger class of special cases of Shannon entropy by working with random *linear* functions rather than random *boolean* functions. Namely, let $S$ be some finite-dimensional vector space over a finite field $\mathbf{F}$, and let $F: S \to \mathbf{F}$ be a random linear functional on $S$, selected uniformly among all such functions. Every subspace $U$ of $S$ then gives rise to a random variable $X = X_U: U \to \mathbf{F}$ formed by restricting $F$ to $U$. This random variable is also distributed uniformly amongst all linear functions on $U$, and its entropy can be easily computed to be $\mathrm{dim}(U) \log |\mathbf{F}|$. Given two random variables $X, Y$ formed by restricting $F$ to $U, V$ respectively, the joint random variable $(X,Y)$ determines the random linear function $F$ on the union $U \cup V$ of the two spaces, and thus by linearity on the Minkowski sum $U+V$ as well; thus $(X,Y)$ is equivalent to the restriction of $F$ to $U+V$. In particular, $\mathbf{H}(X,Y) = \mathrm{dim}(U+V) \log |\mathbf{F}|$. This implies that $\mathbf{I}(X:Y) = \mathrm{dim}(U \cap V) \log |\mathbf{F}|$ and also $\mathbf{H}(X|Y) = \mathrm{dim}(\pi_V(U)) \log |\mathbf{F}|$, where $\pi_V: S \to S/V$ is the quotient map. After discarding the normalising constant $\log |\mathbf{F}|$, this leads to the following dictionary between information theoretic quantities and linear algebra quantities, analogous to the previous dictionary:

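| Random variables $X, Y, Z$ | Subspaces $U, V, W$ |
| --- | --- |
| Entropy $\mathbf{H}(X)$ | Dimension $\mathrm{dim}(U)$ |
| Joint variable $(X,Y)$ | Sum $U+V$ |
| Mutual information $\mathbf{I}(X:Y)$ | Dimension of intersection $\mathrm{dim}(U \cap V)$ |
| Conditional entropy $\mathbf{H}(X \mid Y)$ | Dimension of projection $\mathrm{dim}(\pi_V(U))$ |
| Conditional mutual information $\mathbf{I}(X:Y \mid Z)$ | $\mathrm{dim}(\pi_W(U) \cap \pi_W(V))$ |
| $X, Y$ independent | $U, V$ transverse ($U \cap V = \{0\}$) |
| $X$ determined by $Y$ | $U$ a subspace of $V$ |
| $X, Y$ conditionally independent relative to $Z$ | $\pi_W(U)$, $\pi_W(V)$ transverse |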

The combinatorial dictionary can be regarded as a specialisation of the linear algebra dictionary, by taking $S$ to be the vector space $\mathbf{F}_2^\Omega$ over the finite field $\mathbf{F}_2$ of two elements, and only considering those subspaces $U$ that are coordinate subspaces $U = \mathbf{F}_2^A$ associated to various subsets $A$ of $\Omega$.

As before, every linear inequality or equality that is valid for the information-theoretic quantities discussed above is automatically valid for the linear algebra counterparts for subspaces of a vector space over a finite field $\mathbf{F}$ by applying the above specialisation (and dividing out by the normalising factor of $\log |\mathbf{F}|$). In fact, the requirement that the field be finite can be removed by applying the compactness theorem from logic (or one of its relatives, such as Łoś’s theorem on ultraproducts, as done in this previous blog post).

The linear algebra model captures more of the features of Shannon entropy than the combinatorial model. For instance, in contrast to the combinatorial case, it is possible in the linear algebra setting to have subspaces $U, V, W$ such that $U$ and $V$ are separately transverse to $W$, but their sum $U+V$ is not; for instance, in a two-dimensional vector space $\mathbf{F}^2$, one can take $U, V, W$ to be the one-dimensional subspaces spanned by $(0,1)$, $(1,0)$, and $(1,1)$ respectively. Note that this is essentially the same counterexample from before (which took $\mathbf{F}$ to be the field of two elements). Indeed, one can show that any necessarily true linear inequality or equality involving the dimensions of three subspaces $U, V, W$ (as well as the various other quantities on the above table) will also be necessarily true when applied to the entropies of three discrete random variables $X, Y, Z$ (as well as the corresponding quantities on the above table).
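The three-line example can be verified mechanically over the field of two elements. This is only an illustrative sketch: vectors are encoded as 2-bit integers, and the helper names `rank_gf2` and `dim_cap` are hypothetical.

```python
def rank_gf2(rows):
    """Rank over F_2 of a list of integers read as bit-vectors."""
    pivots = {}
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]
    return len(pivots)

def dim_cap(A, B):
    """dim(A ∩ B) = dim A + dim B - dim(A + B) for subspaces given by spanning lists."""
    return rank_gf2(A) + rank_gf2(B) - rank_gf2(A + B)

# 0b01 = (0,1), 0b10 = (1,0), 0b11 = (1,1) in F_2^2.
U, V, W = [0b01], [0b10], [0b11]
u_w = dim_cap(U, W)        # 0: U is transverse to W
v_w = dim_cap(V, W)        # 0: V is transverse to W
sum_w = dim_cap(U + V, W)  # 1: U + V is the whole plane, so it contains W
print(u_w, v_w, sum_w)     # prints 0 0 1
```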

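However, the linear algebra model does not completely capture the subtleties of Shannon entropy once one works with *four* or more variables (or subspaces). This was first observed by Ingleton, who established the dimensional inequality

$$\mathrm{dim}(U \cap V) \leq \mathrm{dim}(\pi_W(U) \cap \pi_W(V)) + \mathrm{dim}(\pi_X(U) \cap \pi_X(V)) + \mathrm{dim}(W \cap X)$$

for any subspaces $U, V, W, X$. This is easiest to see when the three terms on the right-hand side vanish; then $\pi_W(U), \pi_W(V)$ are transverse, which implies that $U \cap V \subset W$; similarly $U \cap V \subset X$. But $W$ and $X$ are transverse, and this clearly implies that $U$ and $V$ are themselves transverse. To prove the general case of Ingleton’s inequality, one can define $Y := U \cap V$ and use $\mathrm{dim}(\pi_W(Y)) \leq \mathrm{dim}(\pi_W(U) \cap \pi_W(V))$ (and similarly for $X$ instead of $W$) to reduce to establishing the inequality

$$\mathrm{dim}(Y) \leq \mathrm{dim}(\pi_W(Y)) + \mathrm{dim}(\pi_X(Y)) + \mathrm{dim}(W \cap X) \quad (3)$$

which can be rearranged using $\mathrm{dim}(\pi_W(Y)) = \mathrm{dim}(Y) - \mathrm{dim}(W \cap Y)$ (and similarly for $X$ instead of $W$) and $\mathrm{dim}(W \cap Y) + \mathrm{dim}(X \cap Y) = \mathrm{dim}((W \cap Y) + (X \cap Y)) + \mathrm{dim}(W \cap X \cap Y)$ as

$$\mathrm{dim}((W \cap Y) + (X \cap Y)) + \mathrm{dim}(W \cap X \cap Y) \leq \mathrm{dim}(Y) + \mathrm{dim}(W \cap X)$$

but this is clear since $(W \cap Y) + (X \cap Y) \subset Y$ and $W \cap X \cap Y \subset W \cap X$.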
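As a sanity check, Ingleton's inequality can be tested on random subspaces over the field of two elements. The sketch below assumes the inequality in the form $\mathrm{dim}(U \cap V) \leq \mathrm{dim}(\pi_W(U) \cap \pi_W(V)) + \mathrm{dim}(\pi_X(U) \cap \pi_X(V)) + \mathrm{dim}(W \cap X)$, and computes the projected intersections via the identity $\mathrm{dim}(\pi_W(A) \cap \pi_W(B)) = \mathrm{dim}((A+W) \cap (B+W)) - \mathrm{dim}(W)$. All function names, the ambient dimension 6 and the random seed are illustrative choices.

```python
import random

def rank_gf2(rows):
    """Rank over F_2 of a list of integers read as bit-vectors."""
    pivots = {}
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]
    return len(pivots)

def dim_cap(A, B):
    """dim(A ∩ B) = dim A + dim B - dim(A + B) for subspaces given by spanning lists."""
    return rank_gf2(A) + rank_gf2(B) - rank_gf2(A + B)

def dim_cap_mod(A, B, W):
    """dim(pi_W(A) ∩ pi_W(B)), computed as dim((A+W) ∩ (B+W)) - dim(W)."""
    return dim_cap(A + W, B + W) - rank_gf2(W)

def ingleton_gap(U, V, W, X):
    """Right-hand side minus left-hand side of Ingleton's inequality."""
    return dim_cap_mod(U, V, W) + dim_cap_mod(U, V, X) + dim_cap(W, X) - dim_cap(U, V)

def random_spanning_set(n, k, rng):
    """k random non-zero vectors of F_2^n (their span has dimension at most k)."""
    return [rng.randrange(1, 2 ** n) for _ in range(k)]

rng = random.Random(0)
gaps = [
    ingleton_gap(*(random_spanning_set(6, rng.randint(1, 4), rng) for _ in range(4)))
    for _ in range(2000)
]
print(min(gaps))  # never negative, in line with Ingleton's inequality
```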

Returning to the entropy setting, the analogue

$$\mathbf{H}(V) \leq \mathbf{H}(V|Z) + \mathbf{H}(V|W) + \mathbf{I}(Z:W)$$

of (3) is true (exercise!), but the analogue

$$\mathbf{I}(X:Y) \leq \mathbf{I}(X:Y|Z) + \mathbf{I}(X:Y|W) + \mathbf{I}(Z:W) \quad (4)$$

of Ingleton’s inequality is false in general. Again, this is easiest to see when all the terms on the right-hand side vanish; then $X, Y$ are conditionally independent relative to $Z$, and relative to $W$, and $Z$ and $W$ are independent, and the claim (4) would then be asserting that $X$ and $Y$ are independent. While there is no linear counterexample to this statement, there are simple non-linear ones: for instance, one can take $Z, W$ to be independent uniform variables from $\mathbf{F}_2$, and take $X$ and $Y$ to be (say) $ZW$ and $(1-Z)(1-W)$ respectively (thus $X, Y$ are the indicators of the events $Z=W=1$ and $Z=W=0$ respectively). Once one conditions on either $Z$ or $W$, one of $X, Y$ has positive conditional entropy and the other has zero entropy, and so $X, Y$ are conditionally independent relative to either $Z$ or $W$; also, $Z$ and $W$ are independent of each other. But $X$ and $Y$ are not independent of each other (they cannot be simultaneously equal to $1$). Somehow, the feature of the linear algebra model that is not present in general is that in the linear algebra setting, every pair of subspaces $U, V$ has a well-defined intersection $U \cap V$ that is also a subspace, whereas for arbitrary random variables $X, Y$, there does not necessarily exist the analogue of an intersection, namely a “common information” random variable $V$ that has the entropy of $\mathbf{I}(X:Y)$ and is determined either by $X$ or by $Y$.
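The non-linear counterexample is small enough to compute exactly. The sketch below takes two independent uniform bits, here called `Z` and `W`, sets `X = Z*W` and `Y = (1-Z)*(1-W)`, and evaluates both sides of the entropy analogue of Ingleton's inequality, $\mathbf{I}(X:Y) \leq \mathbf{I}(X:Y|Z) + \mathbf{I}(X:Y|W) + \mathbf{I}(Z:W)$. The variable names and helper functions are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(samples, keys):
    """Shannon entropy (bits) of the joint variable picked out by keys, for equally likely samples."""
    counts = Counter(tuple(s[k] for k in keys) for s in samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def mutual_info(samples, a, b, given=()):
    """I(a : b | given) = H(a, given) + H(b, given) - H(a, b, given) - H(given)."""
    g = list(given)
    return (entropy(samples, [a] + g) + entropy(samples, [b] + g)
            - entropy(samples, [a, b] + g) - entropy(samples, g))

# Four equally likely outcomes for the pair of independent uniform bits (Z, W).
samples = [{"Z": z, "W": w, "X": z * w, "Y": (1 - z) * (1 - w)}
           for z, w in product((0, 1), repeat=2)]

lhs = mutual_info(samples, "X", "Y")
rhs = (mutual_info(samples, "X", "Y", ["Z"])
       + mutual_info(samples, "X", "Y", ["W"])
       + mutual_info(samples, "Z", "W"))
print(lhs, rhs)
```

The right-hand side comes out as zero while the left-hand side is strictly positive (about 0.12 bits), so the inequality fails for this choice of variables.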

I do not know if there is any simpler model of Shannon entropy that captures all the inequalities available for four variables. One significant complication is that there exist some information inequalities in this setting that are not of Shannon type, such as the Zhang-Yeung inequality

$$\mathbf{I}(X:Y) \leq 2 \mathbf{I}(X:Y|Z) + \mathbf{I}(X:Z|Y) + \mathbf{I}(Y:Z|X) + \mathbf{I}(X:Y|W) + \mathbf{I}(Z:W).$$

One can however still use these simpler models of Shannon entropy to be able to guess arguments that would work for general random variables. An example of this comes from my paper on the logarithmically averaged Chowla conjecture, in which I showed among other things that

whenever was sufficiently large depending on , where is the Liouville function. The information-theoretic part of the proof was as follows. Given some intermediate scale between and , one can form certain random variables . The random variable is a sign pattern of the form where is a random number chosen from to (with logarithmic weighting). The random variable was a tuple of reductions of to primes comparable to . Roughly speaking, what was implicitly shown in the paper (after using the multiplicativity of , the circle method, and the Matomäki-Radziwiłł theorem on short averages of multiplicative functions) is that if the inequality (5) fails, then there was a lower bound

on the mutual information between and . From translation invariance, this also gives the more general lower bound

for any , where denotes the shifted sign pattern . On the other hand, one had the entropy bounds

and from concatenating sign patterns one could see that is equivalent to the joint random variable for any . Applying these facts and using an “entropy decrement” argument, I was able to obtain a contradiction once was allowed to become sufficiently large compared to , but the bound was quite weak (coming ultimately from the unboundedness of as the interval of values of under consideration becomes large), something of the order of ; the quantity needs at various junctures to be less than a small power of , so the relationship between and becomes essentially quadruple exponential in nature, . The basic strategy was to observe that the lower bound (6) causes some slowdown in the growth rate of the mean entropy, in that this quantity decreased by as increased from to , basically by dividing into components , and observing from (6) each of these shares a bit of common information with the same variable . This is relatively clear when one works in a set model, in which is modeled by a set of size , and is modeled by a set of the form

for various sets of size (also there is some translation symmetry that maps to a shift while preserving all of the ).

However, on considering the set model recently, I realised that one can be a little more efficient by exploiting the fact (basically the Chinese remainder theorem) that the random variables are basically jointly independent as ranges over dyadic values that are much smaller than , which in the set model corresponds to the all being disjoint. One can then establish a variant

of (6), which in the set model roughly speaking asserts that each claims a portion of the of cardinality that is not claimed by previous choices of . This leads to a more efficient contradiction (relying on the unboundedness of rather than ) that looks like it removes one order of exponential growth, thus the relationship between and is now . Returning to the entropy model, one can use (7) and Shannon inequalities to establish an inequality of the form

for a small constant , which on iterating and using the boundedness of gives the claim. (A modification of this analysis, at least on the level of the back of the envelope calculation, suggests that the Matomaki-Radziwill theorem is needed only for ranges greater than or so, although at this range the theorem is not significantly simpler than the general case).

Daniel Kane and I have just uploaded to the arXiv our paper “A bound on partitioning clusters”, submitted to the Electronic Journal of Combinatorics. In this short and elementary paper, we consider a question that arose from biomathematical applications: given a finite family $\mathcal{F}$ of sets (or “clusters”), how many ways can there be of partitioning a set in this family as the disjoint union of two other sets in this family? That is to say, what is the best upper bound one can place on the quantity

$$|\{ (A,B,C) \in \mathcal{F}^3: C = A \uplus B \}|$$

in terms of the cardinality $|\mathcal{F}|$ of $\mathcal{F}$? A trivial upper bound would be $|\mathcal{F}|^2$, since this is the number of possible pairs $(A,B)$, and $A, B$ clearly determine $C$. In our paper, we establish the improved bound

$$|\{ (A,B,C) \in \mathcal{F}^3: C = A \uplus B \}| \leq |\mathcal{F}|^{3/p}$$

where $p$ is the somewhat strange exponent

$$p := \log_3 \frac{27}{4} = 1.73814\dots \quad (1)$$

so that $3/p = 1.72598\dots$. Furthermore, this exponent is best possible!
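The numerology can be explored numerically. The sketch below assumes the exponent is $3/p$ with $p = \log_3 (27/4)$, and looks at the family of all subsets of $\{1,\dots,n\}$ of cardinality $n/3$ or $2n/3$ (with $n$ a multiple of $3$), computing the ratio $\log(\text{number of partitions}) / \log |\mathcal{F}|$. The function name `extremal_ratio` is hypothetical.

```python
from math import comb, log

p = log(27 / 4) / log(3)  # p = log_3(27/4)
exponent = 3 / p

def extremal_ratio(n):
    """log(#partitions) / log|F| for F = subsets of {1..n} of size n/3 or 2n/3 (n divisible by 3)."""
    k = n // 3
    family_size = comb(n, k) + comb(n, 2 * k)
    # A triple (A, B, C) with C the disjoint union of A and B needs |A| = |B| = k, |C| = 2k:
    # choose C, then choose A inside C (B is then forced).
    partitions = comb(n, 2 * k) * comb(2 * k, k)
    return log(partitions) / log(family_size)

ratios = {n: extremal_ratio(n) for n in (3, 30, 300, 3000)}
print(exponent)
for n, r in ratios.items():
    print(n, r)
```

The ratios stay below the exponent and creep up towards it as $n$ grows; the convergence is slow because of the polynomial factors coming from Stirling's formula.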

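Actually, the latter claim is quite easy to show: one takes $\mathcal{F}$ to be all the subsets of $\{1,\dots,n\}$ of cardinality either $n/3$ or $2n/3$, for $n$ a multiple of $3$, and the claim follows readily from Stirling’s formula. So it is perhaps the former claim that is more interesting (since many combinatorial proof techniques, such as those based on inequalities such as the Cauchy-Schwarz inequality, tend to produce exponents that are rational or at least algebraic). We follow the common, though unintuitive, trick of generalising a problem to make it simpler. Firstly, one generalises the bound to the “trilinear” bound

$$|\{ (A,B,C) \in \mathcal{F}_1 \times \mathcal{F}_2 \times \mathcal{F}_3: C = A \uplus B \}| \leq |\mathcal{F}_1|^{1/p} |\mathcal{F}_2|^{1/p} |\mathcal{F}_3|^{1/p}$$

for arbitrary finite collections $\mathcal{F}_1, \mathcal{F}_2, \mathcal{F}_3$ of sets. One can place all the sets in $\mathcal{F}_1, \mathcal{F}_2, \mathcal{F}_3$ inside a single finite set such as $\{1,\dots,n\}$, and then by replacing every set in $\mathcal{F}_3$ by its complement in $\{1,\dots,n\}$, one can phrase the inequality in the equivalent form

$$|\{ (A,B,C) \in \mathcal{F}_1 \times \mathcal{F}_2 \times \mathcal{F}_3: \{1,\dots,n\} = A \uplus B \uplus C \}| \leq |\mathcal{F}_1|^{1/p} |\mathcal{F}_2|^{1/p} |\mathcal{F}_3|^{1/p}$$

for arbitrary collections $\mathcal{F}_1, \mathcal{F}_2, \mathcal{F}_3$ of subsets of $\{1,\dots,n\}$. We generalise further by turning sets into functions, replacing the estimate with the slightly stronger convolution estimate

$$f_1 * f_2 * f_3 (1,\dots,1) \leq \|f_1\|_{\ell^p(\{0,1\}^n)} \|f_2\|_{\ell^p(\{0,1\}^n)} \|f_3\|_{\ell^p(\{0,1\}^n)}$$

for arbitrary functions $f_1, f_2, f_3$ on the Hamming cube $\{0,1\}^n$, where the convolution is on the integer lattice $\mathbf{Z}^n$ rather than on the finite field vector space $\mathbf{F}_2^n$. The advantage of working in this general setting is that it becomes very easy to apply induction on the dimension $n$; indeed, to prove this estimate for arbitrary $n$ it suffices to do so for $n=1$. This reduces matters to establishing the elementary inequality

$$(ab(1-c))^{1/p} + (bc(1-a))^{1/p} + (ca(1-b))^{1/p} \leq 1$$

for all $0 \leq a,b,c \leq 1$, which can be done by a combination of undergraduate multivariable calculus and a little bit of numerical computation. (The left-hand side turns out to have local maxima at $(1,1,0), (1,0,1), (0,1,1), (2/3,2/3,2/3)$, with the latter being the cause of the numerology (1).)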
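The "little bit of numerical computation" can be imitated with a crude grid search. The sketch below assumes the one-dimensional inequality takes the form $(ab(1-c))^{1/p} + (bc(1-a))^{1/p} + (ca(1-b))^{1/p} \leq 1$ on the unit cube with $p = \log_3(27/4)$ (this is what the $n=1$ case of the convolution estimate gives after normalising each function to have unit $\ell^p$ norm); the grid resolution `N` is an arbitrary choice, and a grid search is of course only evidence, not a proof.

```python
from math import log

p = log(27 / 4) / log(3)  # p = log_3(27/4)

def lhs(a, b, c):
    """Left-hand side of the assumed elementary inequality on the unit cube."""
    return ((a * b * (1 - c)) ** (1 / p)
            + (b * c * (1 - a)) ** (1 / p)
            + (c * a * (1 - b)) ** (1 / p))

N = 60  # grid resolution; N divisible by 3 so that 2/3 is a grid point
grid = [i / N for i in range(N + 1)]
worst = max(lhs(a, b, c) for a in grid for b in grid for c in grid)

print(worst)                  # largest value found on the grid
print(lhs(2 / 3, 2 / 3, 2 / 3))  # the interior critical point
print(lhs(1, 1, 0))           # one of the boundary maxima
```

Up to rounding error, the grid maximum is $1$, attained both at the corner points and at $(2/3,2/3,2/3)$, where each term equals $(4/27)^{1/p} = 1/3$.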

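The same sort of argument also gives an energy bound

$$E(A) \leq |A|^{\log_2 6}$$

for any subset $A \subset \{0,1\}^n$ of the Hamming cube, where

$$E(A) := |\{ (a_1,a_2,a_3,a_4) \in A^4: a_1+a_2 = a_3+a_4 \}|$$

is the additive energy of $A$. The example $A = \{0,1\}^n$ shows that the exponent $\log_2 6$ cannot be improved.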
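The extremal example is easy to confirm by direct counting. This sketch assumes the additive energy is the number of quadruples $(a_1,a_2,a_3,a_4) \in A^4$ with $a_1+a_2 = a_3+a_4$, with addition taken coordinatewise in the integers, and that the bound reads $E(A) \leq |A|^{\log_2 6}$; the dimension 4, the subset size 9 and the seed are arbitrary illustrative choices.

```python
import random
from collections import Counter
from itertools import product
from math import log2

def additive_energy(A):
    """Number of (a1, a2, a3, a4) in A^4 with a1 + a2 = a3 + a4, adding coordinatewise in Z^n."""
    sums = Counter(tuple(x + y for x, y in zip(a, b)) for a in A for b in A)
    # Each sum value reached by c ordered pairs contributes c * c quadruples.
    return sum(c * c for c in sums.values())

cube = list(product((0, 1), repeat=4))
cube_energy = additive_energy(cube)
print(cube_energy, 6 ** 4)  # the full cube {0,1}^4 has energy 6^4 = |A|^{log_2 6}

rng = random.Random(1)
subset = rng.sample(cube, 9)
subset_energy = additive_energy(subset)
print(subset_energy, len(subset) ** log2(6))
```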

The self-chosen remit of my blog is “Updates on my research and expository papers, discussion of open problems, and other maths-related topics”. Of the 774 posts on this blog, I estimate that about 99% of the posts indeed relate to mathematics, mathematicians, or the administration of this mathematical blog, and only about 1% are not related to mathematics or the community of mathematicians in any significant fashion.

This is not one of the 1%.

Mathematical research is clearly an international activity. But actually a stronger claim is true: mathematical research is a transnational activity, in that the specific nationality of individual members of a research team or research community is (or should be) of no appreciable significance for the purpose of advancing mathematics. For instance, even during the height of the Cold War, there was no movement in (say) the United States to boycott Soviet mathematicians or theorems, or to only use results from Western literature (though the latter did sometimes happen by default, due to the limited avenues of information exchange between East and West, and the former did occasionally occur for political reasons, most notably with the Soviet Union preventing Gregory Margulis from traveling to receive his Fields Medal in 1978 EDIT: and also Sergei Novikov in 1970). The national origin of even the most fundamental components of mathematics, whether it be the geometry (γεωμετρία) of the ancient Greeks, the algebra (الجبر) of the Islamic world, or the Hindu-Arabic numerals $0, 1, \dots, 9$, is primarily of historical interest, and has only a negligible impact on the worldwide adoption of these mathematical tools. While it is true that individual mathematicians or research teams sometimes compete with each other to be the first to solve some desired problem, and that a citizen could take pride in the mathematical achievements of researchers from their country, one did not see any significant state-sponsored “space races” in which it was deemed in the national interest that a particular result ought to be proven by “our” mathematicians and not “theirs”.

Mathematical research ability is highly non-fungible, and the value added by foreign students and faculty to a mathematics department cannot be completely replaced by an equivalent number of domestic students and faculty, no matter how large and well educated the country (though a state can certainly work at the margins to encourage and support more domestic mathematicians). It is no coincidence that all of the top mathematics departments worldwide actively recruit the best mathematicians regardless of national origin, and often retain immigration counsel to assist with situations in which these mathematicians come from a country that is currently politically disfavoured by their own.

Of course, mathematicians cannot ignore the political realities of the modern international order altogether. Anyone who has organised an international conference or program knows that there will inevitably be visa issues to resolve because the host country makes it particularly difficult for certain nationals to attend the event. I myself, like many other academics working long-term in the United States, have certainly experienced my own share of immigration bureaucracy, starting with various glitches in the renewal or application of my J-1 and O-1 visas, then to the lengthy vetting process for acquiring permanent residency (or “green card”) status, and finally to becoming naturalised as a US citizen (retaining dual citizenship with Australia). Nevertheless, while the process could be slow and frustrating, there was at least an order to it. The rules of the game were complicated, but were known in advance, and did not abruptly change in the middle of playing it (save in truly exceptional situations, such as the days after the September 11 terrorist attacks). One just had to study the relevant visa regulations (or hire an immigration lawyer to do so), fill out the paperwork and submit to the relevant background checks, and remain in good standing until the application was approved in order to study, work, or participate in a mathematical activity held in another country. On rare occasion, some senior university administrator may have had to contact a high-ranking government official to approve some particularly complicated application, but for the most part one could work through normal channels in order to ensure for instance that the majority of participants of a conference could actually be physically present at that conference, or that an excellent mathematician hired by unanimous consent by a mathematics department could in fact legally work in that department.

With the recent and highly publicised executive order on immigration, many of these fundamental assumptions have been seriously damaged, if not destroyed altogether. Even if the order was withdrawn immediately, there is no longer an assurance, even for nationals not initially impacted by that order, that some similar abrupt and major change in the rules for entry to the United States could not occur, for instance for a visitor who has already gone through the lengthy visa application process and background checks, secured the appropriate visa, and is already in flight to the country. This is already affecting upcoming or ongoing mathematical conferences or programs in the US, with many international speakers (including those from countries not directly affected by the order) now cancelling their visit, either in protest or in concern about their ability to freely enter and leave the country. Even some conferences outside the US are affected, as some mathematicians currently in the US with a valid visa or even permanent residency are uncertain if they could ever return to their place of work if they left the country to attend a meeting. In the slightly longer term, it is likely that the ability of elite US institutions to attract the best students and faculty will be seriously impacted. Again, the losses would be strongest regarding candidates that were nationals of the countries affected by the current executive order, but I fear that many other mathematicians from other countries would now be much more concerned about entering and living in the US than they would have previously.

It is still possible for this sort of long-term damage to the mathematical community (both within the US and abroad) to be reversed or at least contained, but at present there is a real risk of the damage becoming permanent. To prevent this, it seems insufficient to me for the current order to be rescinded, as desirable as that would be; some further legislative or judicial action would be needed to begin restoring enough trust in the stability of the US immigration and visa system that the international travel that is so necessary to modern mathematical research becomes “just” a bureaucratic headache again.

Of course, the impact of this executive order is far, far broader than just its effect on mathematicians and mathematical research. But there are countless other venues on the internet and elsewhere to discuss these other aspects (or politics in general). (For instance, discussion of the qualifications, or lack thereof, of the current US president can be carried out at this previous post.) I would therefore like to open this post to readers to discuss the effects or potential effects of this order on the mathematical community; I particularly encourage mathematicians who have been personally affected by this order to share their experiences. As per the rules of the blog, I request that “the discussions are kept constructive, polite, and at least tangentially relevant to the topic at hand”.

Some relevant links (please feel free to suggest more, either through comments or by email):

- AMS Board of Trustees opposes executive order on immigration
- MAA Executive Committee Statement on Immigration Ban
- SIAM responds to White House Executive Order on Visas and Immigration
- Multisociety letter on immigration
- EMS President on Trump’s Executive Order
- International Council for Science (ICSU) calls on the government of the United States to rescind the Executive Order “Protecting the Nation from Foreign Terrorist Entry into the United States”
- Public Universities Respond to New Immigration Order
- Statement from the Association for Women in Mathematics
- Simons Foundation Statement on Executive Order on Visas and Immigration
- A letter from the editors of the AMS graduate student blog on the Executive Order on Immigration
- Statement of inclusiveness (a petition, primarily aimed at mathematicians, created and hosted by Kasra Rafi and Juan Souto)
- Academics Against Executive Immigration Order (a petition, aimed at the broader academic community)
- First they came for the Iranians, blog post, Scott Aaronson
- IAS statement on the revised executive order
- The immigration ban is still antithetical to scientific progress, blog post, Boaz Barak and Omer Reingold

I’ve just uploaded to the arXiv my paper “Some remarks on the lonely runner conjecture”, submitted to Contributions to Discrete Mathematics. I had blogged about the lonely runner conjecture in this previous blog post, and I returned to the problem recently to see if I could obtain anything further. The results obtained were more modest than I had hoped, but they did at least seem to indicate a potential strategy to make further progress on the problem, and also highlight some of the difficulties of the problem.

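One can rephrase the lonely runner conjecture as the following covering problem. Given any integer “velocity” $v$ and radius $0 < \delta < 1/2$, define the *Bohr set* $B(v,\delta)$ to be the subset of the unit circle $\mathbf{R}/\mathbf{Z}$ given by the formula

$$B(v,\delta) := \{ t \in \mathbf{R}/\mathbf{Z}: \|vt\| \leq \delta \},$$

where $\|x\|$ denotes the distance of $x$ to the nearest integer. Thus, for $v$ positive, $B(v,\delta)$ is simply the union of the $v$ intervals $[\frac{a-\delta}{v}, \frac{a+\delta}{v}]$ for $a = 0, \dots, v-1$, projected onto the unit circle $\mathbf{R}/\mathbf{Z}$; in the language of the usual formulation of the lonely runner conjecture, $B(v,\delta)$ represents those times in which a runner moving at speed $v$ returns to within $\delta$ of his or her starting position. For any non-zero integers $v_1, \dots, v_n$, let $\delta(v_1,\dots,v_n)$ be the smallest radius $\delta$ such that the $n$ Bohr sets $B(v_1,\delta), \dots, B(v_n,\delta)$ cover the unit circle:

$$\mathbf{R}/\mathbf{Z} = \bigcup_{i=1}^n B(v_i,\delta). \quad (1)$$

Then define $\delta_n$ to be the smallest value of $\delta(v_1,\dots,v_n)$, as $v_1,\dots,v_n$ ranges over tuples of distinct non-zero integers. The Dirichlet approximation theorem quickly gives that

$$\delta(1,\dots,n) = \frac{1}{n+1}$$

and hence

$$\delta_n \leq \frac{1}{n+1}$$

for any $n \geq 1$. The lonely runner conjecture is equivalent to the assertion that this bound is in fact optimal:

Conjecture 1 (Lonely runner conjecture) For any $n \geq 1$, one has $\delta_n = \frac{1}{n+1}$.

This conjecture is currently known for $n \leq 6$ (see this paper of Barajas and Serra), but remains open for higher $n$.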
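For small tuples the covering radius can be computed exactly. The sketch below is an illustration rather than anything from the paper: it uses the reformulation that the smallest radius at which the Bohr sets cover the circle is $\max_t \min_i \|v_i t\|$, and observes that this piecewise linear function can only attain its maximum where two of the functions $t \mapsto \|v_i t\|$ cross (so $t$ is a multiple of $1/(v_i + v_j)$ or $1/|v_i - v_j|$) or where one of them peaks at $1/2$ (so $t$ is a multiple of $1/(2v_i)$). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def dist_to_int(x):
    """Distance from the rational number x to the nearest integer."""
    frac = x - (x.numerator // x.denominator)
    return min(frac, 1 - frac)

def delta(velocities):
    """max over t in R/Z of min_i ||v_i t||, computed exactly over a finite candidate set."""
    vs = [abs(v) for v in velocities]
    denominators = {2 * v for v in vs}        # peaks of a single tent function
    for v, w in combinations(vs, 2):
        denominators.add(v + w)               # crossings with v t = -w t mod 1
        if v != w:
            denominators.add(abs(v - w))      # crossings with v t = w t mod 1
    candidates = {Fraction(k, d) for d in denominators for k in range(d)}
    return max(min(dist_to_int(v * t) for v in vs) for t in candidates)

for n in range(1, 7):
    print(n, delta(range(1, n + 1)))  # should agree with 1/(n+1)
```

Running this on the tuples $(1,\dots,n)$ for small $n$ recovers the value $\frac{1}{n+1}$ given by the Dirichlet approximation theorem; doubling every velocity, as in `delta((2, 4, 6))`, leaves the value unchanged.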

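It is natural to try to attack the problem by establishing lower bounds on the quantity $\delta_n$. We have the following “trivial” bound, that gets within a factor of two of the conjecture:

Proposition 2 (Trivial bound) For any $n \geq 1$, one has $\delta_n \geq \frac{1}{2n}$.

*Proof:* It is not difficult to see that for any non-zero velocity $v$ and any $0 < \delta < 1/2$, the Bohr set $B(v,\delta)$ has Lebesgue measure $m(B(v,\delta)) = 2\delta$. In particular, by the union bound

$$m\left( \bigcup_{i=1}^n B(v_i,\delta) \right) \leq \sum_{i=1}^n m(B(v_i,\delta)) \quad (2)$$

we see that the covering (1) is only possible if $1 \leq 2n\delta$, giving the claim.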

So, in some sense, all the difficulty is coming from the need to improve upon the trivial union bound (2) by a factor of two.

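Despite the crudeness of the union bound (2), it has proven surprisingly hard to make substantial improvements on the trivial bound $\delta_n \geq \frac{1}{2n}$. In 1994, Chen obtained the slight improvement

$$\delta_n \geq \frac{1}{2n - 1 + \frac{1}{2n-3}}$$

which was improved a little by Chen and Cusick in 1999 to

$$\delta_n \geq \frac{1}{2n-3}$$

when $2n-3$ was prime. In a recent paper of Perarnau and Serra, the bound

$$\delta_n \geq \frac{1}{2n-2+o(1)}$$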
was obtained for arbitrary . These bounds only improve upon the trivial bound by a multiplicative factor of . Heuristically, one reason for this is as follows. The union bound (2) would of course be sharp if the Bohr sets were all disjoint. Strictly speaking, such disjointness is not possible, because all the Bohr sets have to contain the origin as an interior point. However, it is possible to come up with a large number of Bohr sets which are *almost* disjoint. For instance, suppose that we had velocities that were all prime numbers between and , and that was equal to (and in particular was between and ). Then each set can be split into a “kernel” interval , together with the “petal” intervals . Roughly speaking, as the prime varies, the kernel interval stays more or less fixed, but the petal intervals range over disjoint sets, and from this it is not difficult to show that

so that the union bound is within a multiplicative factor of of the truth in this case.

This does not imply that is within a multiplicative factor of of , though, because there are not enough primes between and to assign to distinct velocities; indeed, by the prime number theorem, there are only about such velocities that could be assigned to a prime. So, while the union bound could be close to tight for up to Bohr sets, the above counterexamples don’t exclude improvements to the union bound for larger collections of Bohr sets. Following this train of thought, I was able to obtain a logarithmic improvement to previous lower bounds:

Theorem 3 For sufficiently large $n$, one has $\delta_n \geq \frac{1}{2n} + \frac{c \log n}{n^2 (\log\log n)^2}$ for some absolute constant $c > 0$.

The factors of $\log\log n$ in the denominator are for technical reasons and might perhaps be removable by a more careful argument. However it seems difficult to adapt the methods to improve the $\log n$ in the numerator, basically because of the obstruction provided by the near-counterexample discussed above.

Roughly speaking, the idea of the proof of this theorem is as follows. If we have the covering (1) for very close to , then the multiplicity function will then be mostly equal to , but occasionally be larger than . On the other hand, one can compute that the norm of this multiplicity function is significantly larger than (in fact it is at least ). Because of this, the norm must be very large, which means that the triple intersections must be quite large for many triples . Using some basic Fourier analysis and additive combinatorics, one can deduce from this that the velocities must have a large structured component, in the sense that there exists an arithmetic progression of length that contains of these velocities. For simplicity let us take the arithmetic progression to be , thus of the velocities lie in . In particular, from the prime number theorem, most of these velocities will not be prime, and will in fact likely have a “medium-sized” prime factor (in the precise form of the argument, “medium-sized” is defined to be “between and “). Using these medium-sized prime factors, one can show that many of the will have quite a large overlap with many of the other , and this can be used after some elementary arguments to obtain a more noticeable improvement on the union bound (2) than was obtained previously.

A modification of the above argument also allows for the improved estimate

if one knows that *all* of the velocities are of size .

In my previous blog post, I showed that in order to prove the lonely runner conjecture, it suffices to do so under the additional assumption that all of the velocities are of size ; I reproduce this argument (slightly cleaned up for publication) in the current preprint. There is unfortunately a huge gap between and , so the above bound (3) does not immediately give any new bounds for . However, one could perhaps try to start attacking the lonely runner conjecture by increasing the range for which one has good results, and by decreasing the range that one can reduce to. For instance, in the current preprint I give an elementary argument (using a certain amount of case-checking) that shows that the lonely runner bound

holds if all the velocities are assumed to lie between and . This upper threshold of is only a tiny improvement over the trivial threshold of , but it seems to be an interesting sub-problem of the lonely runner conjecture to increase this threshold further. One key target would be to get up to , as there are actually a number of -tuples in this range for which (4) holds with equality. The Dirichlet approximation theorem of course gives the tuple , but there is also the double of this tuple, and furthermore there is an additional construction of Goddyn and Wong that gives some further examples such as , or more generally one can start with the standard tuple and accelerate one of the velocities to ; this turns out to work as long as shares a common factor with every integer between and . There are a few more examples of this type in the paper of Goddyn and Wong, but all of them can be placed in an arithmetic progression of length at most, so if one were very optimistic, one could perhaps envision a strategy in which the upper bound of mentioned earlier was reduced all the way to something like , and then a separate argument deployed to treat this remaining case, perhaps isolating the constructions of Goddyn and Wong (and possible variants thereof) as the only extreme cases.

I just learned (from Emmanuel Kowalski’s blog) that the AMS has just started a repository of open-access mathematics lecture notes. There are only a few such sets of notes there at present, but hopefully it will grow in the future; I just submitted some old lecture notes of mine from an undergraduate linear algebra course I taught in 2002 (with some updating of format and fixing of various typos).

[Update, Dec 22: my own notes are now on the repository.]

[This guest post is authored by Caroline Series.]

The Chern Medal is a relatively new prize, awarded once every four years jointly by the IMU and the Chern Medal Foundation (CMF) to an individual whose accomplishments warrant the highest level of recognition for outstanding achievements in the field of mathematics. Funded by the CMF, the Medalist receives a cash prize of US$ 250,000. In addition, each Medalist may nominate one or more organizations to receive funding totalling US$ 250,000, for the support of research, education, or other outreach programs in the field of mathematics.

Professor Chern devoted his life to mathematics, both in active research and education, and in nurturing the field whenever the opportunity arose. He obtained fundamental results in all the major aspects of modern geometry and founded the area of global differential geometry. Chern exhibited keen aesthetic tastes in his selection of problems, and the breadth of his work deepened the connections of geometry with different areas of mathematics. He was also generous during his lifetime in his personal support of the field.

Nominations should be sent to the Prize Committee Chair: Caroline Series, email: chair@chern18.mathunion.org by 31st December 2016. Further details and nomination guidelines for this and the other IMU prizes can be found at http://www.mathunion.org/general/prizes/
