** — 1. System of order which is not Abramov of order — **

I gave a talk on this paper recently at the IAS; the slides for that talk are available here.

This project can be motivated by the inverse conjecture for the Gowers norm in finite fields, which is now a theorem:

Theorem 1 (Inverse conjecture for the Gowers norm in finite fields) Let be a prime and . Suppose that is a one-bounded function with a lower bound on the Gowers uniformity norm. Then there exists a (non-classical) polynomial of degree at most such that .

This is now known for all (see this paper of Ziegler and myself for the first proof of the general case, and this paper of Milicevic for the most recent developments concerning quantitative bounds), although initial results focused on either small values of , or the “high characteristic” case when is large compared to . One approach to this theorem proceeds via ergodic theory. Indeed it was observed in this previous paper of Ziegler and myself that for a given choice of and , the above theorem follows from the following ergodic analogue:

Conjecture 2 (Inverse conjecture for the Gowers-Host-Kra semi-norm in finite fields) Let be a prime and . Suppose that with an ergodic -system with positive Gowers-Host-Kra seminorm (see for instance this previous post for a definition). Then there exists a measurable polynomial of degree at most such that has a non-zero inner product with . (In the language of ergodic theory: every -system of order is an Abramov system of order .)

The implication proceeds by a correspondence principle analogous to the Furstenberg correspondence principle developed in that paper (see also this paper of Towsner for a closely related principle, and this paper of Jamneshan and myself for a refinement). In a paper with Bergelson and Ziegler, we were able to establish Conjecture 2 in the “high characteristic” case , thus also proving Theorem 1 in this regime, and conjectured that Conjecture 2 was in fact true for all . This was recently verified in the slightly larger range by Candela, Gonzalez-Sanchez, and Szegedy.

Even though Theorem 1 is now known in full generality by other methods, there are still combinatorial reasons for investigating Conjecture 2. One of these is that the implication of Theorem 1 from Conjecture 2 in fact gives additional control on the polynomial produced by Theorem 1, namely that it is in some sense “measurable in the sigma-algebra generated by ” (basically because the ergodic theory polynomial produced by Conjecture 2 is also measurable in , as opposed to merely being measurable in an extension of ). What this means in the finitary setting of is a bit tricky to write down precisely (since the naive sigma-algebra generated by the translates of will most likely be the discrete sigma-algebra), but roughly speaking it means that can be approximated to arbitrary accuracy by functions of boundedly many (random) translates of . This can be interpreted in a complexity theory sense by stating that Theorem 1 can be made “algorithmic” in a “probabilistic bounded time oracle” or “local list decoding” sense which we will not make precise here.

The main result of this paper is

Theorem 3 Conjecture 2 fails for . In fact the “measurable inverse theorem” alluded to above also fails in this case.

Informally, this means that for large , we can find -bounded “pseudo-quintic” functions with large norm, which then must necessarily correlate with at least one quintic by Theorem 1, but such that none of these quintics can be approximated to high accuracy by functions of (random) shifts of . Roughly speaking, this means that the inverse theorem cannot be made locally algorithmic (though a Goldreich-Levin type result of polynomial time algorithmic inverse theory may still be possible, as is already known for for ; see this recent paper of Kim, Li and Tidor for further discussion).

The way we arrived at this theorem was by (morally) reducing matters to understanding a certain “finite nilspace cohomology problem”. In the end it boiled down to locating a certain function from a -element set to a two-element set which was a “strongly -homogeneous cocycle” but not a “coboundary” (these terms are defined precisely in the paper). This strongly -homogeneous cocycle can be expressed in terms of a simpler function that takes values on a -element space . The task of locating turned out to be one that was within the range of our (somewhat rudimentary) SAGE computation abilities (mostly involving computing the Smith normal form of some reasonably large integer matrices), but the counterexample functions this produced were initially somewhat opaque to us. After cleaning up these functions by hand (by subtracting off various “coboundaries”), we eventually found versions of these functions which were nice enough that we could verify all the claims needed in a purely human-readable fashion, without any further computer assistance. As a consequence, we can now describe the pseudo-quintic explicitly, though it is safe to say we would not have been able to come up with this example without the initial computer search, and we don’t currently have a broader conceptual understanding of which could potentially generate such counterexamples. The function takes the form

where is a randomly chosen (classical) quadratic polynomial, is a randomly chosen (non-classical) cubic polynomial, and is a randomly chosen (non-classical) quintic polynomial. This function correlates with and has a large norm, but this quintic is “non-measurable” in the sense that it cannot be recovered from and its shifts. The quadratic polynomial turns out to be measurable, as is the double of the cubic , but in order to recover one needs to apply a “square root” to the quadratic to recover a candidate for the cubic which can then be used to reconstruct .

** — 2. Structure of totally disconnected systems — **

Despite the above negative result, in our other paper we are able to get a weak version of Conjecture 2, that also extends to actions of bounded-torsion abelian groups:

Theorem 4 (Weak inverse conjecture for the Gowers-Host-Kra semi-norm in bounded torsion groups) Let be a bounded-torsion abelian group and . Suppose that with an ergodic -system with positive Gowers-Host-Kra seminorm . Then, after lifting to a torsion-free group , there exists a measurable polynomial of degree at most defined on an *extension* of which has a non-zero inner product with .

Combining this with the correspondence principle and some additional tools, we obtain a weak version of Theorem 1 that also extends to bounded-torsion groups:

Theorem 5 (Inverse conjecture for the Gowers norm in bounded torsion groups) Let be a finite abelian -torsion group for some and . Suppose that is a one-bounded function with . Then there exists a (non-classical) polynomial of degree at most such that .

The degree produced by our arguments is polynomial in , but we conjecture that it should just be .

The way Theorem 4 (and hence Theorem 5) is proven is as follows. The now-standard machinery of Host and Kra (as discussed for instance in their book) allows us to reduce to a system of order , which is a certain tower of extensions of compact abelian structure groups by various cocycles . In the -torsion case, standard theory allows us to show that these structure groups are also -torsion, hence totally disconnected. So it would now suffice to understand the action of torsion-free groups on totally disconnected systems . For the purposes of proving Theorem 4 we have the freedom to extend as we please, and we take advantage of this freedom by “extending by radicals”, in the sense that whenever we locate a polynomial in the system, we adjoin to it roots of that polynomial (i.e., solutions to ) that are polynomials of the *same* degree as ; this is usually not possible to do in the original system , but can always be done in a suitable extension, analogously to how roots do not always exist in a given field, but can always be located in some extension of that field. After applying this process countably many times it turns out that we can arrive at a system which is -divisible in the sense that polynomials of any degree have roots of any order that are of the same degree. In other words, the group of polynomials of any fixed degree is a divisible abelian group, and thus injective in the category of such groups. This makes a lot of short exact sequences that show up in the theory split automatically, and greatly simplifies the cohomological issues one encounters in the theory, to the point where all the cocycles mentioned previously can now be “straightened” into polynomials of the expected degree (or, in the language of ergodic theory, this extension is a Weyl system of order , and hence also Abramov of order ). This is sufficient to establish Theorem 4. 
To get Theorem 5, we ran into a technical obstacle arising from the fact that the remainder map is not a polynomial mod if is not itself a prime power. To resolve this, we established ergodic theory analogues of the Sylow decomposition of abelian -torsion groups into -groups , as well as the Schur-Zassenhaus theorem. Roughly speaking, the upshot of these theorems is that any ergodic -system , with -torsion, can be split as the “direct sum” of ergodic -systems for primes dividing , where is the subgroup of consisting of those elements whose order is a power of . This allows us to reduce to the case when is a prime power without too much difficulty.

In fact, the above analysis gives stronger structural classifications of totally disconnected systems (in which the acting group is torsion-free). Weyl systems can also be interpreted as translational systems , where is a nilpotent Polish group and is a closed cocompact subgroup, with the action being given by left-translation by various elements of . Perhaps the most famous examples of such translational systems are nilmanifolds, but in this setting where the acting group is not finitely generated, it turns out to be necessary to consider more general translational systems, in which need not be a Lie group (or even locally compact), and not discrete. Our previous results then describe totally disconnected systems as *factors* of such translational systems. One natural candidate for such factors are the *double coset systems* formed by quotienting out by the action of another closed group that is normalized by the action of . We were able to show that all totally disconnected systems with torsion-free acting group had this double coset structure. This turned out to be surprisingly subtle at a technical level, for at least two reasons. Firstly, after locating the closed group (which in general is Polish, but not compact or even locally compact), it was not immediately obvious that was itself a Polish space (this amounts to the orbits of a closed set still being closed), and also not obvious that this double coset space had a good nilspace structure (in particular that the factor map from to is a nilspace fibration). This latter issue we were able to resolve with a tool kindly shared with us from a forthcoming work by Candela, Gonzalez-Sanchez, and Szegedy, who observed that the nilspace fibration property was available if the quotient groups obeyed an algebraic “groupable” axiom which we were able to verify in this case (they also have counterexamples showing that the nilspace structure can break down without this axiom). 
There was however one further rather annoying complication. In order to fully obtain the identification of our system with a double coset system, we needed the equivalence


See also this Mathstodon post from fellow committee member John Baez last year where he solicited some preliminary suggestions for proposals, and my previous Mathstodon announcement of this programme.

As a result of the conference I started thinking about what possible computer tools might now be developed that could be of broad use to mathematicians, particularly those who do not have prior expertise with the finer aspects of writing code or installing software. One idea that came to mind was a potential tool that could take, say, an arXiv preprint as input, and return some sort of diagram detailing the logical flow of the main theorems and lemmas in the paper. This is currently done by hand by authors in some, but not all, papers (and can often also be automatically generated from formally verified proofs, as seen for instance in the graphic accompanying the IPAM workshop, or this diagram generated from Massot’s blueprint software from a manually inputted set of theorems and dependencies as a precursor to formalization of a proof [thanks to Thomas Bloom for this example]). For instance, here is a diagram that my co-author Rachel Greenfeld and I drew for a recent paper:

This particular diagram incorporated a number of subjective design choices regarding layout, which results to be designated important enough to require a dedicated box (as opposed to being viewed as a mere tool to get from one box to another), and how to describe each of these results (and how to colour-code them). This is still a very human-intensive task (and my co-author and I went through several iterations of this particular diagram with much back-and-forth discussion until we were both satisfied). But I could see the possibility of creating an automatic tool that could provide an initial “first approximation” to such a diagram, which a human user could then modify as they see fit (perhaps using some convenient GUI interface, for instance some variant of the Quiver online tool for drawing commutative diagrams in LaTeX).

As a crude first attempt at automatically generating such a diagram, one could perhaps develop a tool to scrape a LaTeX file to locate all the instances of the theorem environment in the text (i.e., all the formally identified lemmas, corollaries, and so forth), and for each such theorem, locate a proof environment instance that looks like it is associated to that theorem (doing this with reasonable accuracy may require a small amount of machine learning, though perhaps one could just hope that proximity of the proof environment instance to the theorem environment instance suffices in many cases). Then identify all the references within that proof environment to other theorems to start building the tree of implications, which one could then depict in a diagram such as the above. Such an approach would likely miss many of the implications; for instance, because many lemmas might not be proven using a formal proof environment, but instead by some more free-flowing text discussion, or perhaps a one line justification such as “By combining Lemma 3.4 and Proposition 3.6, we conclude”. Also, some references to other results in the paper might not proceed by direct citation, but by more indirect justifications such as “invoking the previous lemma, we obtain” or “by repeating the arguments in Section 3, we have”. Still, even such a crude diagram might be helpful, both as a starting point for authors to make an improved diagram, and for a student trying to understand a lengthy paper to get some initial idea of the logical structure.
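The crude scraping approach described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not an existing tool: the list of environment names, the helper name `dependency_graph`, and the "a proof belongs to the most recent theorem-like environment" proximity rule are all hypothetical choices made for the example, and nested environments or custom macros are not handled.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Environment names treated as "theorem-like" (an assumption for this sketch).
THEOREM_ENVS = ("theorem", "lemma", "proposition", "corollary")

# Matches \begin{env} ... \end{env} for theorem-like environments and proofs.
ENV_RE = re.compile(
    r"\\begin\{(?P<env>%s|proof)\}(?P<body>.*?)\\end\{(?P=env)\}"
    % "|".join(THEOREM_ENVS),
    re.DOTALL,
)
LABEL_RE = re.compile(r"\\label\{([^}]*)\}")
REF_RE = re.compile(r"\\(?:ref|eqref|cref|Cref)\{([^}]*)\}")


def dependency_graph(tex: str) -> Dict[str, Set[str]]:
    """Map each labelled result to the labelled results cited in its proof.

    Proximity heuristic: a proof is attached to the most recent
    theorem-like environment that precedes it.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    known: Set[str] = set()
    current = None  # label of the most recent theorem-like environment
    for match in ENV_RE.finditer(tex):
        env, body = match.group("env"), match.group("body")
        if env != "proof":
            labels = LABEL_RE.findall(body)
            current = labels[0] if labels else None
            if current is not None:
                known.add(current)
                graph[current]  # make sure the node exists even with no edges
        elif current is not None:
            for group in REF_RE.findall(body):
                for ref in group.split(","):  # \cref allows comma-separated lists
                    graph[current].add(ref.strip())
            current = None
    # Keep only edges that point at other labelled results (drops equation refs).
    return {k: {r for r in v if r in known and r != k} for k, v in graph.items()}


SAMPLE = r"""
\begin{lemma}\label{lem:a} Statement A. \end{lemma}
\begin{proof} Trivial. \end{proof}
\begin{lemma}\label{lem:b} Statement B. \end{lemma}
\begin{proof} Apply Lemma \ref{lem:a}. \end{proof}
\begin{theorem}\label{thm:main} Main result. \end{theorem}
\begin{proof} Combine Lemma \ref{lem:a} and Lemma \ref{lem:b} with \eqref{eq:1}. \end{proof}
"""

if __name__ == "__main__":
    for result, deps in dependency_graph(SAMPLE).items():
        print(result, "<-", sorted(deps))
```

As the text anticipates, a sketch like this sees only explicit `\ref`-style citations inside formal proof environments; implications justified in free-flowing prose would be missed entirely.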

More advanced features might be to try to use more of the text of the paper to assign some measure of importance to individual results (and then weight the diagram correspondingly to highlight the more important results), to try to give each result a natural language description, and to somehow capture key statements that are not neatly encapsulated in a theorem environment instance, but I would imagine that such tasks should be deferred until some cruder proof-of-concept prototype can be demonstrated.

Anyway, I would be interested to hear opinions about whether this idea (or some modification thereof) is (a) actually feasible with current technology (or better yet, already exists in some form), and (b) of interest to research mathematicians.


Theorem 1

- (i) If the Hardy-Littlewood prime tuples conjecture (or the weaker conjecture of Dickson) is true, then there exists an increasing sequence of primes such that is prime for all .
- (ii) Unconditionally, there exist increasing sequences and of natural numbers such that is prime for all .
- (iii) These conclusions fail if “prime” is replaced by “positive (relative) density subset of the primes” (even if the density is equal to 1).

We remark that it was shown by Balog that there (unconditionally) exist arbitrarily long but *finite* sequences of primes such that is prime for all . (This result can also be recovered from the later results of Ben Green, myself, and Tamar Ziegler.) Also, it had previously been shown by Granville that on the Hardy-Littlewood prime tuples conjecture, there existed increasing sequences and of natural numbers such that is prime for all .

The conclusion of (i) is stronger than that of (ii) (which is of course consistent with the former being conditional and the latter unconditional). The conclusion (ii) also implies the well-known theorem of Maynard that for any given , there exist infinitely many -tuples of primes of bounded diameter, and indeed our proof of (ii) uses the same “Maynard sieve” that powers the proof of that theorem (though we use a formulation of that sieve closer to that in this blog post of mine). Indeed, the failure of (iii) basically arises from the failure of Maynard’s theorem for dense subsets of primes, simply by removing those clusters of primes that are unusually closely spaced.

Our proof of (i) was initially inspired by the topological dynamics methods used by Kra, Moreira, Richter, and Robertson, but we managed to condense it to a purely elementary argument (taking up only half a page) that makes no reference to topological dynamics and builds up the sequence recursively by repeated application of the prime tuples conjecture.

The proof of (ii) takes up the majority of the paper. It is easiest to phrase the argument in terms of “prime-producing tuples” – tuples for which there are infinitely many with all prime. Maynard’s theorem is equivalent to the existence of arbitrarily long prime-producing tuples; our theorem is equivalent to the stronger assertion that there exist an infinite sequence such that every initial segment is prime-producing. The main new tool for achieving this is the following cute measure-theoretic lemma of Bergelson:

Lemma 2 (Bergelson intersectivity lemma) Let be subsets of a probability space of measure uniformly bounded away from zero, thus . Then there exists a subsequence such that for all .

This lemma has a short proof, though not an entirely obvious one. Firstly, by deleting a null set from , one can assume that all finite intersections are either positive measure or empty. Secondly, a routine application of Fatou’s lemma shows that the maximal function has a positive integral, hence must be positive at some point . Thus there is a subsequence whose finite intersections all contain , thus have positive measure as desired by the previous reduction.
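The Fatou step can be spelled out as follows. This is a sketch under assumed notation that is not fixed in the text above: write $E_1, E_2, \dots$ for the sets, $\mu$ for the probability measure, $\delta > 0$ for the uniform lower bound on $\mu(E_n)$, and take the maximal function to be the limit superior of the running averages of the indicator functions.

```latex
\begin{align*}
f_N &:= \frac{1}{N}\sum_{n=1}^{N} 1_{E_n}, \qquad 0 \le f_N \le 1, \\
\int \limsup_{N\to\infty} f_N \, d\mu
  &\ge \limsup_{N\to\infty} \int f_N \, d\mu
   = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mu(E_n)
   \ge \delta > 0 .
\end{align*}
```

The first inequality is the reverse form of Fatou's lemma, which applies because the $f_N$ are bounded by $1$ on a probability space. Any point $x$ with $\limsup_{N \to \infty} f_N(x) > 0$ must lie in infinitely many of the $E_n$, and the indices of those sets give the desired subsequence.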

It turns out that one cannot quite combine the standard Maynard sieve with the intersectivity lemma because the events that show up (which roughly correspond to the event that is prime for some random number (with a well-chosen probability distribution) and some shift ) have their probability going to zero, rather than being uniformly bounded from below. To get around this, we borrow an idea from a paper of Banks, Freiberg, and Maynard, and group the shifts into various clusters , chosen in such a way that the probability that *at least one* of is prime is bounded uniformly from below. One then applies the Bergelson intersectivity lemma to those events and uses many applications of the pigeonhole principle to conclude.


This post is an unofficial sequel to one of my first blog posts from 2007, which was entitled “Quantum mechanics and Tomb Raider”.

One of the oldest and most famous allegories is Plato’s allegory of the cave. This allegory centers around a group of people chained to a wall in a cave that cannot see themselves or each other, but only the two-dimensional shadows of themselves cast on the wall in front of them by some light source they cannot directly see. Because of this, they identify reality with this two-dimensional representation, and have significant conceptual difficulties in trying to view themselves (or the world as a whole) as three-dimensional, until they are freed from the cave and able to venture into the sunlight.

There is a similar conceptual difficulty when trying to understand Einstein’s theory of special relativity (and more so for general relativity, but let us focus on special relativity for now). We are very much accustomed to thinking of reality as a three-dimensional space endowed with a Euclidean geometry that we traverse through in time, but in order to have the clearest view of the universe of special relativity it is better to think of reality instead as a four-dimensional spacetime that is endowed instead with a Minkowski geometry, which mathematically is similar to a (four-dimensional) Euclidean space but with a crucial change of sign in the underlying metric. Indeed, whereas the distance between two points in Euclidean space is given by the three-dimensional Pythagorean theorem

under some standard Cartesian coordinate system of that space, and the distance in a four-dimensional Euclidean space would be similarly given by under a standard four-dimensional Cartesian coordinate system , the spacetime interval in Minkowski space is given by (though in many texts the opposite sign convention is preferred) in spacetime coordinates , where is the speed of light. The geometry of Minkowski space is then quite similar algebraically to the geometry of Euclidean space (with the sign change replacing the traditional trigonometric functions , etc. by their hyperbolic counterparts , and with various factors involving “” inserted in the formulae), but also has some qualitative differences to Euclidean space, most notably a causality structure connected to light cones that has no obvious counterpart in Euclidean space. That said, the analogy between Minkowski space and four-dimensional Euclidean space is strong enough that it serves as a useful conceptual aid when first learning special relativity; for instance the excellent introductory text “Spacetime physics” by Taylor and Wheeler very much adopts this view. On the other hand, this analogy doesn’t directly address the conceptual problem mentioned earlier of viewing reality as a four-dimensional spacetime in the first place, rather than as a three-dimensional space that objects move around in as time progresses. Of course, part of the issue is that we aren’t good at directly visualizing four dimensions in the first place. 
This latter problem can at least be easily addressed by removing one or two spatial dimensions from this framework – and indeed many relativity texts start with the simplified setting of only having one spatial dimension, so that spacetime becomes two-dimensional and can be depicted with relative ease by spacetime diagrams – but still there is conceptual resistance to the idea of treating time as another spatial dimension, since we clearly cannot “move around” in time as freely as we can in space, nor do we seem able to easily “rotate” between the spatial and temporal axes, the way that we can between the three coordinate axes of Euclidean space.
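For reference, the three distance formulas compared above can be written out in standard notation as follows. This is a sketch rather than a quotation: the coordinate names $(x,y,z)$, $(x,y,z,u)$ and $(x,y,z,t)$ are assumptions, and the Minkowski line is written in the convention where the time term carries the minus sign (the opposite overall sign is equally common).

```latex
\begin{align*}
ds^2 &= dx^2 + dy^2 + dz^2 && \text{(three-dimensional Euclidean space)} \\
ds^2 &= dx^2 + dy^2 + dz^2 + du^2 && \text{(four-dimensional Euclidean space)} \\
ds^2 &= dx^2 + dy^2 + dz^2 - c^2\, dt^2 && \text{(Minkowski spacetime)}
\end{align*}
```

The only structural difference between the second and third lines is the sign (and the factor of $c^2$) on the last term, which is the "crucial change of sign" in the metric.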

With this in mind, I thought it might be worth attempting a Plato-type allegory to reconcile the spatial and spacetime views of reality, in a way that can be used to describe (analogues of) some of the less intuitive features of relativity, such as time dilation, length contraction, and the relativity of simultaneity. I have (somewhat whimsically) decided to place this allegory in a Tolkienesque fantasy world (similarly to how my previous allegory to describe quantum mechanics was phrased in a world based on the computer game “Tomb Raider”). This is something of an experiment, and (like any other analogy) the allegory will not be able to perfectly capture every aspect of the phenomenon it is trying to represent, so any feedback to improve the allegory would be appreciated.

** — 1. Treefolk — **

Tolkien’s Middle-Earth contains, in addition to humans, many fantastical creatures. Tolkien’s book “The Hobbit” introduces the trolls, who can move around freely at night but become petrified into stone during the day; and his book “The Two Towers” (the second of his three-volume work “The Lord of the Rings”) introduces the Ents, who are large walking sentient tree-like creatures.

In this Tolkienesque fantasy world of our allegory (readers, by the way, are welcome to suggest a name for this world), there are two intelligent species. On the one hand one has the humans, who can move around during the day much as humans in our world do, but must sleep at night without exception (one can invent whatever reason one likes for this, but it is not relevant to the rest of the allegory). On the other hand, inspired by the trolls and Ents of Tolkien, in this world we will have the *treefolk*, who in this world are intelligent creatures resembling a tree trunk (possibly with some additional branches or additional appendages, but these will not play a central role in the allegory). They are rooted to a fixed location in space, but during the night they have some limited ability to (slowly) twist their trunk around. On the other hand, during the day, they turn into non-sentient stone columns, frozen in whatever shape they last twisted themselves into. Thus the humans never see the treefolk during their active period, and vice versa; but we will assume that they are still somehow able to communicate asynchronously with each other through a common written language (more on this later).

Remark 1 In Middle-Earth there are also the *Huorns*, who are briefly mentioned in “The Two Towers” as intelligent trees kin to the Ents, but are not described in much detail. Being something of a blank slate, these would have been a convenient name to give these fantasy creatures; however, given that the works of Tolkien will not be public domain for a few more decades, I’ll refrain from using the Huorns explicitly, and instead use the more generic term “treefolk”.

When a treefolk makes its trunk vertical (or at least straight), it is roughly cylindrical in shape, and has horizontal “rings” on its exterior at intervals of precisely one inch apart; so for instance one can easily calculate the height of a treefolk in inches by counting how many rings it has. One could think of a treefolk’s trunk geometrically as a sequence of horizontal disks stacked on top of each other, with each disk being an inch in height and basically of constant radius horizontally, and separated by the aforementioned rings. Because my artistic abilities are close to non-existent, I will draw a treefolk schematically (and two-dimensionally), as a vertical rectangle, with the rings drawn as horizontal lines (and the disks being the thin horizontal rectangles between the rings):

But treefolks can tilt their trunk at an angle; for instance, if a treefolk tilts its trunk to be at a 30 degree angle from the vertical, then now the top of each ring is only cos 30° = √3/2 ≈ 0.866 inches higher than the top of the preceding ring, rather than a full inch higher, though it is also displaced in space by a distance of sin 30° = 1/2 inches, all in accordance with the laws of trigonometry. It is also possible for treefolks to (slowly) twist their trunk into more crooked shapes, for instance in the picture below the treefolk has its trunk vertical in its bottom half, but at a angle in its top half. (This will necessarily cause some compression or stretching of the rings at the turnaround point, so that those rings might no longer be exactly one inch apart; we will ignore this issue as we will only be analyzing the treefolk’s rings at “inertial” locations where the trunk is locally straight and it is possible for the rings to stay perfectly “rigid”. Curvature of the trunk in this allegory is the analogue of acceleration in our spacetime universe.)
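The trigonometry of a tilted trunk is easy to check numerically. The following is a small sketch, with the function name `ring_offsets` and its parameters being hypothetical names chosen for illustration: for a trunk tilted by a given angle from the vertical, it returns how far each ring rises and how far it is displaced horizontally relative to the previous ring.

```python
import math


def ring_offsets(tilt_degrees: float, ring_spacing: float = 1.0):
    """Return (vertical rise, horizontal shift) in inches between consecutive rings.

    tilt_degrees is the angle of the trunk from the vertical; ring_spacing is
    the distance between rings measured along the trunk (one inch by default).
    """
    theta = math.radians(tilt_degrees)
    return ring_spacing * math.cos(theta), ring_spacing * math.sin(theta)


rise, shift = ring_offsets(30.0)
print(f"rise per ring:  {rise:.3f} in")   # cos(30 degrees), about 0.866
print(f"shift per ring: {shift:.3f} in")  # sin(30 degrees), about 0.500
```

By the Pythagorean theorem the two offsets always satisfy rise² + shift² = (ring spacing)², so a vertical trunk (zero tilt) gives a full inch of rise per ring and no horizontal shift.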

Treefolks prefer to stay very close to being vertical, and only tilt at significant deviations from the vertical in rare circumstances; it is only in recent years that they have started experimenting with more extreme angles of tilt. ~~Let us say that there is a hard limit of as to how far a treefolk can tilt its trunk; thus for instance it is not possible for a treefolk to place its trunk at a 60 degree angle from the vertical. (This is analogous to how matter is not able to travel faster than the speed of light in our world.)~~ *[Removed this hypothesis as being unnatural for the underlying Euclidean geometry – T.]*

Now we turn to the nature of the treefolk’s sentience, which is rather unusual. Namely – only one disk of the treefolk is conscious at any given time! As soon as the sun sets, a treefolk returns from stone to a living creature, and the lowest disk of that treefolk awakens and is able to sense its environment, as well as move the trunk above it. However, every minute, with the regularity of clockwork, the treefolk’s consciousness and memories transfer themselves to the next higher disk; the previous disk becomes petrified into stone and no longer mobile or receiving sensory input (somewhat analogous to the rare human disease of fibrodysplasia ossificans progressiva, in which the body becomes increasingly ossified and unable to move). As the night progresses, the locus of the treefolk’s consciousness moves steadily upwards and more and more of the treefolk turns to stone, until it reaches the end of its trunk, at which point the treefolk turns completely into a stone column until the next night, at which point the process starts again. (In particular, no treefolk has ever been tall enough to retain its consciousness all the way to the next sunrise.) Treefolk are aware of this process, and in particular can count intervals of time by keeping track of how many times their consciousness has had to jump from one disk to the next; they use rings as a measure of time. For instance, if a treefolk experiences ten shifts of consciousness between one event and the next, the treefolk will know that ten minutes have elapsed between the two events; in their language, they would say that the second event occurred ten rings after the first.

The second unusual feature of the treefolk’s sentience is that at any given time, the treefolk can sense the portions of all nearby objects that are in the same plane as the disk, but not portions that are above or below this plane; in particular, some objects may be completely “invisible” to the treefolk if they are completely above or completely below the treefolk’s current plane of “vision”. Exactly how the treefolk senses its environment is not of central importance, but one could imagine either some sort of visual organ on each disk that is activated during the minute in which that disk is conscious, but which has a limited field of view (similar to the one that a knight might experience when wearing a helmet with only a narrow horizontal slit in their visor to see through), or perhaps some sort of horizontal echolocation ability. (Or, since we are in a fantasy setting, we can simply attribute this sensory ability to “magic”.) For instance, the picture below (very crudely) depicts a treefolk standing vertically in an environment, fifty minutes after it first awakens, so that the disk that is fifty inches off the ground is currently sentient. The treefolk can sense any other object that is also fifty inches from the ground; for instance, it can “see” a slice of a bush to the left, and a slice of a boulder to the right, but cannot see the sign at all. (Let’s assume that this somewhat magical “vision” can penetrate through objects to some extent (much as “x-ray vision” would work in comic books), so it can get some idea for instance that the section of boulder it sees is somewhat wider than the slice of bush that it sees.) As the minutes pass and the treefolk’s consciousness moves to higher and higher rungs, the bush will fluctuate in size and then disappear from the treefolk’s point of “view”, and the boulder will also gradually shrink in size until disappearing several rings after the bush disappeared.

If the treefolk’s trunk is tilted at an angle, then its visual plane of view tilts similarly, and so the objects that it can see, and their relative positions and sizes, change somewhat. For instance, in the picture below, the bush, boulder, and sign remain in the same location, but the treefolk’s trunk has tilted; as such, it now senses a small slice of the sign (that will shortly disappear), and a (now smaller) slice of the boulder (that will grow for a couple rings before ultimately shrinking away to nothingness), but the bush has already vanished from view several rings previously.

At any given time, the treefolk only senses a two-dimensional slice of its surroundings, much like how the prisoners in Plato’s cave only see the two-dimensional shadows on the cave wall. As such, treefolk do not view the world around them as three-dimensional; to them, it is a two-dimensional world that slowly changes once every ring even if the three-dimensional world is completely static, similarly to how flipping the pages of an otherwise static flip book can give the illusion of movement. In particular, they do not have a concept in their language for “height”, but only for horizontal notions of spatial measurement, such as width; for instance, if a tall treefolk is next to a shorter treefolk that is 100 inches tall, with both treefolk vertical, it will think of that shorter treefolk as “living for 100 rings” rather than being 100 inches in height, since from the tall treefolk’s perspective, the shorter treefolk would be visible for 100 rings, and then disappear. These treefolk would also see that their rings line up: every time a ring passes for one treefolk, the portion of the other treefolk that is in view also advances by one ring. So treefolk, who usually stay close to vertical for most of their lives, have come to view rings as being universal measurements of time. They also do not view themselves as three-dimensional objects; somewhat like the characters in Edwin Abbott’s classic book “Flatland”, they think of themselves as two-dimensional disks, with each ring slightly changing the nature of that disk, much as humans feel their bodies changing slightly with each birthday. While they can twist the portion of their trunk above their currently conscious disk at various angles, they do not think of this twisting in three-dimensional terms; they think of it as willing their two-dimensional disk-shaped self into motion in a horizontal direction of their choosing.

Treefolk cannot communicate directly with other treefolk (and in particular one treefolk is not aware of which ring of another treefolk is currently conscious); but they can modify the appearance of their exterior on their currently conscious ring (or on rings above that ring, but not on the petrified rings below) for other treefolk to read. Two treefolk standing vertically side by side will then be able to communicate with each other by a kind of transient text messaging system, since they awaken at the same time, and at any given later moment, their conscious rings will be at the same height and each treefolk will be able to read the messages that the other treefolk leaves for them, although a message that one treefolk leaves for another for one ring will vanish when these treefolk both shift their consciousnesses to the next ring. A human coming across these treefolk the following day would be able to view these messages (similar to how one can review a chat log in a text messaging app, though with the oldest messages at the bottom); they could also leave messages for the treefolk by placing text on some sort of sign that the treefolk can then read one line at a time (from bottom to top) on a subsequent night as their consciousness ascends through its rings. (Here we will assume that at some point in the past the humans have somehow learned the treefolk’s written language.) But from the point of view of the treefolk, their messages seem as impermanent to them as spoken words are to us: they last for a minute and then they are gone.

** — 2. Time contraction and width dilation — **

In recent years, treefolk scientists (or scholars/sages/wise ones, if one wishes to adhere as much as possible to the fantasy setting), studying the effect of significant tilting on other treefolk, discovered a strange phenomenon which they might term “time contraction” (similar to time dilation in special relativity, but with the opposite sign): if a treefolk test subject tilts at a significant angle, then it begins to “age” more rapidly in the sense that the test subject will be seen to pass by more rings than the observer treefolk that remains vertical. For instance, with the test subject tilted at a $30^\circ$ angle, as 100 rings pass by for the vertical observer, $100/\cos 30^\circ \approx 115$ rings can be counted on the tilted treefolk. This is obvious to human observers, who can readily explain the situation when they come across it during the day, in terms of trigonometry:

This leads to the following “twin paradox”: if two identical treefolk awaken at the same time, but one stays vertical while the other tilts away and then returns, then when they rejoin their rings will become out of sync, with the twisted treefolk being conscious at a given height several minutes after the vertical treefolk was conscious at that height. As such, communication now comes with a lag: a message left by the vertical treefolk at a given ring will take several minutes to be seen by the twisted treefolk, and the twisted treefolk would similarly have to leave its messages on a higher ring than it is currently conscious at in order to be seen by the vertical treefolk. Again, a human who comes across this situation in the day can readily explain the phenomenon geometrically, as the twisted treefolk takes longer (in terms of rings) to reach the same location as the vertical treefolk:

These treefolk scientists also observe a companion to the time contraction phenomenon, namely that of width dilation (the analogue of length contraction): a treefolk who is tilted at an angle will be seen by other (vertical) treefolk observers as having their shape distorted from a disk to an ellipse, with the width in the direction of the tilt being elongated (much like the slices of a carrot become longer and less circular when sliced diagonally). For instance, in the picture at the beginning of this section, the width of the tilted treefolk has increased by a factor of $1/\cos 30^\circ \approx 1.15$, or about fifteen percent. Once again, this is a phenomenon that humans, with their ability to visualize horizontal and vertical dimensions simultaneously, can readily explain via trigonometry (suppressing the rings on the tilted treefolk to reduce clutter):

The treefolk scientists were able to measure these effects more quantitatively. As they cannot directly sense any non-horizontal notions of space, they cannot directly compute the angle at which a given treefolk deviates from the vertical; but they can measure how much a treefolk “moves” in their two-dimensional plane of vision. Let’s say that the humans use the metric system of length measurement and have taught it (through some well-placed horizontal rulers perhaps) to the treefolk, who are able to use this system to measure horizontal displacements in units of centimeters. (They are unable to directly observe the inch-long height of their rings, as that is a purely vertical measurement, and so cannot use inches to directly measure horizontal displacements.) A treefolk that is tilted at an angle will then be seen to be “moving” at some number of centimeters per ring; with each ring that the vertical observer passes through, the tilted treefolk would appear to have shifted its position by that number of centimeters. After many experiments, the treefolk scientists eventually hit upon the following empirical law: if a treefolk is “moving” at $v$ centimeters per ring, then it will experience a time contraction of $\sqrt{1+v^2/c^2}$ and a width dilation of $\sqrt{1+v^2/c^2}$, where $c$ is a physical constant that they compute to be about $2.54$ centimeters per ring. (Compare with special relativity, in which an object moving at $v$ meters per second experiences a time dilation of $\sqrt{1-v^2/c^2}$ and a length contraction of $\sqrt{1-v^2/c^2}$, where the physical constant $c$ is now about $3 \times 10^8$ meters per second.) However, they are unable to come up with a satisfactory explanation for this arbitrary-seeming law; it bears some resemblance to the Pythagorean theorem, which they would be familiar with from horizontal plane geometry, but until they view rings as a third spatial dimension rather than as a unit of time, they would struggle to describe this empirically observed time contraction and width dilation in purely geometric terms.

But again, the analysis is simple to a human observer, who notices that the tilted treefolk is spatially displaced by $vt$ centimeters whenever the vertical tree advances by $t$ rings (or inches), at which point the computation is straightforward from Pythagoras (and the mysterious constant $c$ is explained as being the number of centimeters in an inch):
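To make the trigonometry concrete, here is a minimal Python sketch of the empirical law. It assumes the constant is $2.54$ centimeters per ring (one ring being one inch) and uses a hypothetical tilt of $30^\circ$; the function names are illustrative only and are not part of the allegory.

```python
import math

# Assumption: one ring is one inch tall, so c = 2.54 centimeters per ring.
CM_PER_RING = 2.54


def apparent_speed(theta_deg: float, c: float = CM_PER_RING) -> float:
    """Horizontal drift (cm per ring of the vertical observer) of a trunk
    tilted by theta_deg away from the vertical."""
    return c * math.tan(math.radians(theta_deg))


def tilt_factor(v_cm_per_ring: float, c: float = CM_PER_RING) -> float:
    """Time contraction (and width dilation) factor sqrt(1 + v^2/c^2),
    which is all a treefolk scientist can compute from horizontal data."""
    return math.sqrt(1.0 + (v_cm_per_ring / c) ** 2)


def tilt_factor_from_angle(theta_deg: float) -> float:
    """The same factor as a human would compute it: 1 / cos(theta)."""
    return 1.0 / math.cos(math.radians(theta_deg))


if __name__ == "__main__":
    theta = 30.0  # hypothetical tilt, in degrees
    v = apparent_speed(theta)
    print(f"apparent speed: {v:.3f} cm per ring")
    print(f"treefolk formula: {tilt_factor(v):.4f}")
    print(f"human formula:    {tilt_factor_from_angle(theta):.4f}")
    print(f"rings seen per 100 observer rings: {100 * tilt_factor(v):.1f}")
```

The two formulas agree because $\tan\theta = v/c$, so $\sqrt{1+v^2/c^2} = \sqrt{1+\tan^2\theta} = 1/\cos\theta$.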

At some point, these scientists might discover (either through actual experiment, or thought-experiment) what we would call the principle of relativity: the laws of geometry for a tilted treefolk are identical to those of a vertical treefolk. For instance, as mentioned previously, if a tilted treefolk appears to be moving at $v$ centimeters per ring from the vantage point of a vertical treefolk, then the vertical treefolk will observe the tilted treefolk as experiencing a time contraction of $\sqrt{1+v^2/c^2}$ and a width dilation of $\sqrt{1+v^2/c^2}$, but from the tilted treefolk’s point of view, it is the vertical treefolk which is moving at $v$ centimeters per ring (in the opposite direction), and it will be the vertical treefolk that experiences the time contraction of $\sqrt{1+v^2/c^2}$ and width dilation of $\sqrt{1+v^2/c^2}$. In particular, both treefolk will think that the other one is aging more rapidly, as each treefolk will see slightly more than one ring of the other pass by every time they pass a ring of their own. However, this is not a paradox, due to the *relativity of horizontality* (the analogue in this allegory to relativity of simultaneity in special relativity); two locations in space that are simultaneously visible to one treefolk (due to them lying on the same plane as one of the disks of that treefolk) need not be simultaneously visible to the other, if they are tilted at different angles. Again, this would be obvious to humans who can see the higher-dimensional picture: compare the planes of sight of the tilted treefolk in the figure below with the planes of sight of the vertical treefolk as depicted in the first figure of this section.

Similarly, the twin paradox discussed earlier continues to hold even when the “inertial” treefolk is not vertical:

[Strictly speaking one would need to move the treefolk to start at the exact same location, rather than merely being very close to each other, to deal with the slight synchronization discrepancy at the very bottom of the two twins in this image.]

Given two locations $A$ and $B$ in three-dimensional space, therefore, one treefolk may view the second location as displaced in space from the first location by $\Delta x$ centimeters in one direction (say east-west) and $\Delta y$ centimeters in an orthogonal direction (say north-south), while also being displaced in time by $\Delta t$ rings; but a treefolk tilted at a different angle may come up with different measures $\Delta x', \Delta y'$ of the spatial displacement as well as a different measure $\Delta t'$ of the ring displacement, due to the effects of time contraction, width dilation, relativity of horizontality, and the relative “motion” between the two treefolk. However, to an external human observer, it is clear from two applications of Pythagoras’s theorem that there is an invariant

$$ (\Delta x)^2 + (\Delta y)^2 + c^2 (\Delta t)^2 = (\Delta x')^2 + (\Delta y')^2 + c^2 (\Delta t')^2. $$

See the figure below, where the $y$ dimension has been suppressed for simplicity.

From the principle of relativity, this invariance strongly suggests the laws of geometry should be invariant under transformations that preserve the interval $(\Delta x)^2 + (\Delta y)^2 + c^2 (\Delta t)^2$. Humans would refer to such transformations as three-dimensional rigid motions, and the invariance of geometry under these motions would be an obvious fact to them; but it would be a highly unintuitive hypothesis for a treefolk used to viewing their environment as two-dimensional space evolving one ring at a time.

Humans could also explain to the treefolk that their calculations would be simplified if they used the same unit of measurement for both horizontal length and vertical length, for instance using the inch to measure horizontal distances as well as the vertical height of their rings. This would normalize $c$ to be one, and is somewhat analogous to the use of Planck units in physics.
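The invariance of the interval can be checked numerically. The sketch below is only an illustration: it models a change of tilt as a rotation in the plane spanned by the east-west direction and the vertical, assumes $c = 2.54$ centimeters per ring, and uses made-up displacement values; the names `interval` and `tilt_frame` are hypothetical.

```python
import math

C = 2.54  # assumed constant: centimeters per ring (one ring = one inch)


def interval(dx_cm: float, dy_cm: float, dt_rings: float, c: float = C) -> float:
    """Squared three-dimensional distance, in the mixed units a treefolk uses:
    horizontal displacements in centimeters, 'time' in rings."""
    return dx_cm ** 2 + dy_cm ** 2 + (c * dt_rings) ** 2


def tilt_frame(dx_cm, dy_cm, dt_rings, theta_deg, c: float = C):
    """Re-express a displacement as measured by a treefolk tilted by theta_deg
    in the east-west direction (a rotation mixing dx with the height c*dt)."""
    th = math.radians(theta_deg)
    height_cm = c * dt_rings
    new_dx = math.cos(th) * dx_cm - math.sin(th) * height_cm
    new_height = math.sin(th) * dx_cm + math.cos(th) * height_cm
    return new_dx, dy_cm, new_height / c


if __name__ == "__main__":
    dx, dy, dt = 10.0, 4.0, 7.0  # illustrative displacement: cm, cm, rings
    dx2, dy2, dt2 = tilt_frame(dx, dy, dt, theta_deg=30.0)
    print("vertical observer:", (dx, dy, dt), interval(dx, dy, dt))
    print("tilted observer:  ", (dx2, dy2, dt2), interval(dx2, dy2, dt2))
```

The two observers disagree about the individual displacements (including the number of rings) but agree on the interval, which is simply the squared Euclidean distance in centimeters.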

** — 3. The analogy with relativity — **

In this allegory, the treefolk are extremely limited in their ability to sense and interact with their environment, in comparison to the humans who can move (and look) rather freely in all three spatial dimensions, and who can easily explain the empirical scientific efforts of the treefolk to understand their environment in terms of three-dimensional geometry. But in the real four-dimensional spacetime that we live in, it is us who are the treefolk; we inhabit a worldline tracing through this spacetime, similar to the trunk of a treefolk, but at any given moment our consciousness only occupies a slice of that worldline, transferred from one slice to the next as we pass from moment to moment; the slices that we have already experienced are frozen in place, and it is only the present and future slices that we have some ability to still control. Thus, we experience the world as a three-dimensional body moving in time, as opposed to a “static” four-dimensional object. We can still map out these experiences in terms of four-dimensional spacetime diagrams (or diagrams in fewer dimensions, if we are able to omit some spatial directions for simplicity); this is analogous to how the humans in this world are easily able to map out the experiences of these treefolk using three-dimensional spatial diagrams (or the two-dimensional versions of them depicted here in which we suppress one of the two horizontal dimensions for simplicity). Even so, it takes a non-trivial amount of conceptual effort to identify these diagrams with reality, since we are so accustomed to the dynamic three-dimensional perspective. But perhaps this allegory can help in some cases to make this conceptual leap, and to think more like humans than like treefolk.

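If $\lambda > 0$, a Poisson random variable ${\bf Poisson}(\lambda)$ with mean $\lambda$ is a random variable taking values in the natural numbers with probability distribution

$$ {\bf P}( {\bf Poisson}(\lambda) = k ) = e^{-\lambda} \frac{\lambda^k}{k!}. $$

One is often interested in bounding upper tail probabilities ${\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u) )$ for $u \geq 0$, or lower tail probabilities ${\bf P}( {\bf Poisson}(\lambda) \leq \lambda(1+u) )$ for $-1 < u \leq 0$. A standard tool for this is Bennett’s inequality:

Proposition 1 (Bennett’s inequality) One has $$ {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u) ) \leq \exp( -\lambda h(u) ) $$ for $u \geq 0$ and $$ {\bf P}( {\bf Poisson}(\lambda) \leq \lambda(1+u) ) \leq \exp( -\lambda h(u) ) $$ for $-1 < u \leq 0$, where $$ h(u) := (1+u) \log(1+u) - u. $$

From the Taylor expansion $h(u) = \frac{u^2}{2} + O(u^3)$ for $u = O(1)$ we conclude Gaussian type tail bounds in the regime $u = o(1)$ (and in particular when $u = O(1/\sqrt{\lambda})$), in the spirit of the Chernoff, Bernstein, and Hoeffding inequalities; but in the regime where $u$ is large and positive one obtains a slight gain over these other classical bounds (of $\exp(-\lambda u \log u)$ type, rather than $\exp(-\lambda u)$).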
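As a sanity check, the following Python sketch compares exact Poisson tail probabilities with the Bennett bound $\exp(-\lambda h(u))$, taking $h(u) = (1+u)\log(1+u) - u$. The helper names and the test values of $\lambda$ and $u$ are illustrative assumptions; only the standard library is used.

```python
import math


def h(u: float) -> float:
    """Bennett's rate function h(u) = (1+u) log(1+u) - u, for u > -1."""
    return (1.0 + u) * math.log1p(u) - u


def log_pmf(k: int, lam: float) -> float:
    """log P(Poisson(lam) = k), computed via lgamma to avoid overflow."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def upper_tail(lam: float, u: float, max_terms: int = 100_000) -> float:
    """P(Poisson(lam) >= lam*(1+u)) for u >= 0, by direct summation."""
    k = math.ceil(lam * (1.0 + u))
    total = 0.0
    for _ in range(max_terms):
        term = math.exp(log_pmf(k, lam))
        total += term
        if term <= 1e-17 * total:  # remaining terms are negligible
            break
        k += 1
    return total


def lower_tail(lam: float, u: float) -> float:
    """P(Poisson(lam) <= lam*(1+u)) for -1 < u <= 0, by direct summation."""
    k_max = math.floor(lam * (1.0 + u))
    return sum(math.exp(log_pmf(k, lam)) for k in range(k_max + 1))


if __name__ == "__main__":
    lam = 10.0
    for u in (0.5, 1.0, 2.0):
        print(f"u={u:+.1f}: tail={upper_tail(lam, u):.3e}  "
              f"bound={math.exp(-lam * h(u)):.3e}")
    for u in (-0.5, -0.2):
        print(f"u={u:+.1f}: tail={lower_tail(lam, u):.3e}  "
              f"bound={math.exp(-lam * h(u)):.3e}")
```

In each case the exact tail should come out below the bound, with a visible gap once $u$ is of order one.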

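*Proof:* We use the exponential moment method. For any $t \geq 0$, we have from Markov’s inequality that

$$ {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u) ) \leq e^{-t\lambda(1+u)} {\bf E} \exp( t\, {\bf Poisson}(\lambda) ). $$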
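For reference, the exponential moment computation can be sketched as follows. This is the standard textbook argument, written out under the assumption that $h(u) = (1+u)\log(1+u) - u$ and with $X$ as shorthand for ${\bf Poisson}(\lambda)$; it is offered as a sketch and not necessarily in the form the original proof took.

```latex
% Moment generating function of X = Poisson(lambda):
\[
  {\bf E} e^{tX} = \sum_{k=0}^\infty e^{-\lambda} \frac{(\lambda e^t)^k}{k!}
                 = \exp\bigl( \lambda (e^t - 1) \bigr).
\]
% Upper tail (u >= 0): apply Markov's inequality to e^{tX} with t >= 0.
\[
  {\bf P}( X \geq \lambda(1+u) )
  \leq e^{-t\lambda(1+u)} {\bf E} e^{tX}
  = \exp\bigl( \lambda (e^t - 1) - t \lambda (1+u) \bigr).
\]
% The exponent is minimised at t = log(1+u) >= 0, where it equals
\[
  \lambda u - \lambda (1+u) \log(1+u) = -\lambda h(u).
\]
% Lower tail (-1 < u <= 0): the same computation with t = log(1+u) <= 0,
% using P( X <= a ) = P( e^{tX} >= e^{ta} ) for t < 0.
```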

Remark 2 Bennett’s inequality also applies for (suitably normalized) sums of bounded independent random variables. In some cases there are direct comparison inequalities available to relate those variables to the Poisson case. For instance, suppose is the sum of independent Boolean variables of total mean and with for some . Then for any natural number , we have As such, for small, one can efficiently control the tail probabilities of in terms of the tail probability of a Poisson random variable of mean close to ; this is of course very closely related to the well known fact that the Poisson distribution emerges as the limit of sums of many independent Boolean variables, each of which is non-zero with small probability. See this paper of Bentkus and this paper of Pinelis for some further useful (and less obvious) comparison inequalities of this type.

In this note I wanted to record the observation that one can improve the Bennett bound by a small polynomial factor once one leaves the Gaussian regime $u = O(1/\sqrt{\lambda})$, in particular gaining a factor of $\lambda^{-1/2}$ when $u \sim 1$. This observation is not difficult and is implicit in the literature (one can extract it for instance from the much more general results of this paper of Talagrand, and the basic idea already appears in this paper of Glynn), but I was not able to find a clean version of this statement in the literature, so I am placing it here on my blog. (But if a reader knows of a reference that basically contains the bound below, I would be happy to know of it.)
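The size of the polynomial gain can be observed numerically. The sketch below computes the ratio of the exact upper tail to the Bennett bound $\exp(-\lambda h(u))$ at $u = 1$, working in logarithms so that large $\lambda$ does not underflow; if the gain is of order $\lambda^{-1/2}$, then the ratio multiplied by $\sqrt{\lambda}$ should stay bounded away from zero and infinity. The function names and the chosen values of $\lambda$ are illustrative assumptions.

```python
import math


def h(u: float) -> float:
    """Bennett's rate function h(u) = (1+u) log(1+u) - u."""
    return (1.0 + u) * math.log1p(u) - u


def log_pmf(k: int, lam: float) -> float:
    """log P(Poisson(lam) = k)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def log_upper_tail(lam: float, u: float, max_terms: int = 100_000) -> float:
    """log P(Poisson(lam) >= lam*(1+u)): the first term is handled in log
    space, and the remaining terms are summed relative to it."""
    k = math.ceil(lam * (1.0 + u))
    log_first = log_pmf(k, lam)
    relative_sum, ratio = 0.0, 1.0
    for _ in range(max_terms):
        relative_sum += ratio
        k += 1
        ratio *= lam / k  # pmf(k) / pmf(k-1) = lam / k
        if ratio <= 1e-17 * relative_sum:
            break
    return log_first + math.log(relative_sum)


def bennett_gain(lam: float, u: float) -> float:
    """Exact upper tail divided by the Bennett bound exp(-lam * h(u))."""
    return math.exp(log_upper_tail(lam, u) + lam * h(u))


if __name__ == "__main__":
    for lam in (10, 100, 1000):
        gain = bennett_gain(lam, 1.0)
        print(f"lambda={lam:5d}  gain={gain:.4e}  "
              f"gain*sqrt(lambda)={gain * math.sqrt(lam):.4f}")
```

A heuristic Stirling computation suggests the rescaled ratio should approach $1/\sqrt{\pi}$ at $u = 1$, though the script does not depend on this.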

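Proposition 3 (Improved Bennett’s inequality) One has $$ {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u) ) \ll \frac{\exp( -\lambda h(u) )}{\sqrt{1 + \lambda \min(u, u^2)}} $$ for $u \geq 0$ and $$ {\bf P}( {\bf Poisson}(\lambda) \leq \lambda(1+u) ) \ll \frac{\exp( -\lambda h(u) )}{\sqrt{1 + \lambda u^2 (1+u)}} $$ for $-1 < u \leq 0$.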

*Proof:* We begin with the first inequality. We may assume that , since otherwise the claim follows from the usual Bennett inequality. We expand out the left-hand side as

Now we turn to the second inequality. As before we may assume that . We first dispose of a degenerate case in which . Here the left-hand side is just

and the right-hand side is comparable to Since is negative and , we see that the right-hand side is , and the estimate holds in this case.

It remains to consider the regime where and . The left-hand side expands as

The sum is dominated by the first term times a geometric series . The maximal is comparable to , so we can bound the left-hand side by Using the Stirling approximation as before we can bound this by which simplifies to after a routine calculation.

The same analysis can be reversed to show that the bounds given above are basically sharp up to constants, at least when (and ) are large.

We are currently advertising two positions in math and AI:

- a Level A position (for first-time postdocs and upcoming PhDs); and
- a Level B position (for candidates with a postdoc or some research experience).

Both positions are for three years and are based at the Sydney Mathematical Research Institute. The positions are research only, but teaching at the University of Sydney is possible if desired. The successful candidate will have considerable time and flexibility to pursue their own research program.

We are after either:

- excellent mathematicians with some interest in programming and modern AI;
- excellent computer scientists with some interest and background in mathematics, as well as an interest in using AI to attack tough problems in mathematics.

In more detail: the original strategy, as described in the announcement, was to build a “tiling language” that was capable of encoding a certain “$p$-adic Sudoku puzzle”, and then show that the latter type of puzzle had only non-periodic solutions if $p$ was a sufficiently large prime. As it turns out, the second half of this strategy worked out, but there was an issue in the first part: our tiling language was able (using $2$-group-valued functions) to encode arbitrary boolean relationships between boolean functions, and was also able (using ${\bf Z}/p{\bf Z}$-valued functions) to encode “clock” functions such as $n \mapsto n \hbox{ mod } p$ that were part of our $p$-adic Sudoku puzzle, but we were not able to make these two types of functions “talk” to each other in the way that was needed to encode the $p$-adic Sudoku puzzle (the basic problem being that if $H$ is a finite abelian $2$-group then there are no non-trivial subgroups of $H \times {\bf Z}/p{\bf Z}$ that are not contained in $H$ or trivial in the ${\bf Z}/p{\bf Z}$ direction). As a consequence, we had to replace our “$p$-adic Sudoku puzzle” by a “$2$-adic Sudoku puzzle” which basically amounts to replacing the prime $p$ by a sufficiently large power of $2$ (we believe will suffice). This solved the encoding issue, but the analysis of the $2$-adic Sudoku puzzles was a little bit more complicated than the $p$-adic case, for the following reason. The following is a nice exercise in analysis:

Theorem 1 (Linearity in three directions implies full linearity) Let $F: {\bf R}^2 \to {\bf R}$ be a smooth function which is affine-linear on every horizontal line, diagonal (line of slope $1$), and anti-diagonal (line of slope $-1$). In other words, for any $c \in {\bf R}$, the functions $x \mapsto F(x,c)$, $x \mapsto F(x,c+x)$, and $x \mapsto F(x,c-x)$ are each affine functions on ${\bf R}$. Then $F$ is an affine function on ${\bf R}^2$.

Indeed, the property of being affine in three directions shows that the quadratic form associated to the Hessian $\nabla^2 F(x,y)$ at any given point vanishes at $(1,0)$, $(1,1)$, and $(1,-1)$, and thus must vanish everywhere. In fact the smoothness hypothesis is not necessary; we leave this as an exercise to the interested reader. The same statement turns out to be true if one replaces ${\bf R}$ with the cyclic group ${\bf Z}/p{\bf Z}$ as long as $p$ is odd; this is the key for us to showing that our $p$-adic Sudoku puzzles have an (approximate) two-dimensional affine structure, which on further analysis can then be used to show that it is in fact non-periodic. However, it turns out that the corresponding claim for cyclic groups ${\bf Z}/q{\bf Z}$ can fail when $q$ is a sufficiently large power of $2$! In fact the general form of functions that are affine on every horizontal line, diagonal, and anti-diagonal takes the form

for some integer coefficients . This additional “pseudo-affine” term causes some additional technical complications but ultimately turns out to be manageable.

During the writing process we also discovered that the encoding part of the proof becomes more modular and conceptual once one introduces two new definitions, that of an “expressible property” and a “weakly expressible property”. These concepts are somewhat analogous to that of sentences and sentences in the arithmetic hierarchy, or to algebraic sets and semi-algebraic sets in real algebraic geometry. Roughly speaking, an expressible property is a property of a tuple of functions , from an abelian group to finite abelian groups , such that the property can be expressed in terms of one or more tiling equations on the graph

For instance, the property that two functions differ by a constant can be expressed in terms of the tiling equation (the vertical line test), as well as where is the diagonal subgroup of . A weakly expressible property is an existential quantification of some expressible property , so that a tuple of functions obeys the property if and only if there exists an extension of this tuple by some additional functions that obey the property . It turns out that weakly expressible properties are closed under a number of useful operations, and allow us to easily construct quite complicated weakly expressible properties out of a “library” of simple weakly expressible properties, much as a complex computer program can be constructed out of simple library routines. In particular we will be able to “program” our Sudoku puzzle as a weakly expressible property.

I just created an account at Mathstodon and it currently has very little content, but I hope to add some soon (though I will probably not be as prolific as some other mathematicians already on that site, such as John Baez or Nalini Joshi).
