You are currently browsing Terence Tao’s articles.

Let {M_{n \times m}({\bf Z})} denote the space of {n \times m} matrices with integer entries, and let {GL_n({\bf Z})} be the group of {n \times n} integer matrices whose inverses also have integer entries (equivalently, integer matrices of determinant {\pm 1}). The Smith normal form takes an arbitrary matrix {A \in M_{n \times m}({\bf Z})} and factorises it as {A = UDV}, where {U \in GL_n({\bf Z})}, {V \in GL_m({\bf Z})}, and {D} is a rectangular diagonal matrix, by which we mean that the principal {\min(n,m) \times \min(n,m)} submatrix is diagonal, with all other entries zero. Furthermore the diagonal entries of {D} are {\alpha_1,\dots,\alpha_k,0,\dots,0} for some {0 \leq k \leq \min(n,m)} (which is also the rank of {A}), with the numbers {\alpha_1,\dots,\alpha_k} (known as the invariant factors) positive integers with {\alpha_1 | \dots | \alpha_k}. The invariant factors are uniquely determined, but there can be some freedom to modify the invertible matrices {U,V}. The Smith normal form can be computed easily; for instance, in SAGE, it can be computed by calling the {{\tt smith\_form()}} function from the matrix class. The Smith normal form is also available for principal ideal domains other than the integers, but we will focus only on the integer case here. For the purposes of this post, we will view the Smith normal form as a primitive operation on matrices that can be invoked as a “black box”.
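As a concrete complement to the black-box view, here is a minimal pure-Python sketch that recovers only the rank and the invariant factors (not {U} or {V}). It does not use SAGE. Instead it relies on the classical description of invariant factors through determinantal divisors: if {d_r} denotes the gcd of all {r \times r} minors of {A} (with {d_0 = 1}), then {\alpha_r = d_r/d_{r-1}}, and the rank is the largest {r} with {d_r \neq 0}. The helper names det and invariant_factors are illustrative. Because it enumerates every minor, the cost grows exponentially, so it is only suitable for small matrices such as the one in Example 2 below.

```python
from itertools import combinations
from math import gcd


def det(M):
    """Integer determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


def invariant_factors(A):
    """Invariant factors alpha_1 | ... | alpha_k of an integer matrix A, where k = rank(A).

    Uses the determinantal divisors d_r = gcd of all r x r minors and alpha_r = d_r / d_{r-1}.
    """
    n, m = len(A), len(A[0])
    factors, prev = [], 1
    for r in range(1, min(n, m) + 1):
        d_r = 0
        for rows in combinations(range(n), r):
            for cols in combinations(range(m), r):
                d_r = gcd(d_r, det([[A[i][j] for j in cols] for i in rows]))
        if d_r == 0:
            break  # every r x r minor vanishes, so rank(A) = r - 1
        factors.append(d_r // prev)
        prev = d_r
    return factors


A = [[1, 2, 3], [3, -2, 1], [1, 2, 3]]  # the matrix of Example 2
print(invariant_factors(A))  # [1, 8]
```

This agrees with the diagonal {1, 8, 0} of the Smith normal form displayed in Example 2.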

In this post I would like to record how to use the Smith normal form to computationally manipulate two closely related classes of objects:

  • Subgroups {\Gamma \leq {\bf Z}^d} of a standard lattice {{\bf Z}^d} (or lattice subgroups for short);
  • Closed subgroups {H \leq ({\bf R}/{\bf Z})^d} of a standard torus {({\bf R}/{\bf Z})^d} (or closed torus subgroups for short).
(This arose for me due to the need to actually perform (with a collaborator) some numerical calculations with a number of lattice subgroups and closed torus subgroups.) It’s possible that all of these operations are already encoded in some existing object classes in a computational algebra package; I would be interested to hear in the comments about any such packages and classes for lattice subgroups or closed torus subgroups.

The above two classes of objects are isomorphic to each other by Pontryagin duality: if {\Gamma \leq {\bf Z}^d} is a lattice subgroup, then the orthogonal complement

\displaystyle  \Gamma^\perp := \{ x \in ({\bf R}/{\bf Z})^d: \langle x, \xi \rangle = 0 \forall \xi \in \Gamma \}

is a closed torus subgroup (with {\langle,\rangle: ({\bf R}/{\bf Z})^d \times {\bf Z}^d \rightarrow {\bf R}/{\bf Z}} the usual Fourier pairing); conversely, if {H \leq ({\bf R}/{\bf Z})^d} is a closed torus subgroup, then

\displaystyle  H^\perp := \{ \xi \in {\bf Z}^d: \langle x, \xi \rangle = 0 \forall x \in H \}

is a lattice subgroup. These two operations invert each other: {(\Gamma^\perp)^\perp = \Gamma} and {(H^\perp)^\perp = H}.

Example 1 The orthogonal complement of the lattice subgroup

\displaystyle  2{\bf Z} \times \{0\} = \{ (2n,0): n \in {\bf Z}\} \leq {\bf Z}^2

is the closed torus subgroup

\displaystyle  (\frac{1}{2}{\bf Z}/{\bf Z}) \times ({\bf R}/{\bf Z}) = \{ (x,y) \in ({\bf R}/{\bf Z})^2: 2x=0\} \leq ({\bf R}/{\bf Z})^2

and conversely.

Let us focus first on lattice subgroups {\Gamma \leq {\bf Z}^d}. As all such subgroups are finitely generated abelian groups, one way to describe a lattice subgroup is to specify a set {v_1,\dots,v_n \in \Gamma} of generators of {\Gamma}. Equivalently, we have

\displaystyle  \Gamma = A {\bf Z}^n

where {A \in M_{d \times n}({\bf Z})} is the matrix whose columns are {v_1,\dots,v_n}. Applying the Smith normal form {A = UDV}, we conclude that

\displaystyle  \Gamma = UDV{\bf Z}^n = UD{\bf Z}^n

so {\Gamma} is the image of {D{\bf Z}^n} under the automorphism {U \in GL_d({\bf Z})} of {{\bf Z}^d}. In particular, we see that {\Gamma} is a free abelian group of rank {k}, where {k} is the rank of {D} (or {A}). This representation also allows one to trim the representation {A {\bf Z}^n} down to {U D'{\bf Z}^k}, where {D' \in M_{d \times k}({\bf Z})} is the matrix formed from the {k} leftmost columns of {D}; the columns of {UD'} then give a basis for {\Gamma}. Let us call this a trimmed representation of {A{\bf Z}^n}.

Example 2 Let {\Gamma \leq {\bf Z}^3} be the lattice subgroup generated by {(1,3,1)}, {(2,-2,2)}, {(3,1,3)}, thus {\Gamma = A {\bf Z}^3} with {A = \begin{pmatrix} 1 & 2 & 3 \\ 3 & -2 & 1 \\ 1 & 2 & 3 \end{pmatrix}}. A Smith normal form for {A} is given by

\displaystyle  A = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 0 & 0 \\ 3 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 3 & -2 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}

so {A{\bf Z}^3} is a rank two lattice with a basis of {(3,1,3) \times 1 = (3,1,3)} and {(1,0,1) \times 8 = (8,0,8)} (and the invariant factors are {1} and {8}). The trimmed representation is

\displaystyle  A {\bf Z}^3 = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 0 & 0 \\ 3 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 8 \\ 0 & 0 \end{pmatrix} {\bf Z}^2 = \begin{pmatrix} 3 & 8 \\ 1 & 0 \\ 3 & 8 \end{pmatrix} {\bf Z}^2.

There are other Smith normal forms for {A}, giving slightly different representations here, but the rank and invariant factors will always be the same.
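The factorisation in Example 2 can be double-checked mechanically. The following plain-Python sketch, which uses a hand-rolled matrix product (an illustrative helper rather than a library call), verifies {A = UDV}, confirms that {U} and {V} have determinant {\pm 1}, and forms the trimmed basis {UD'}.

```python
def matmul(X, Y):
    """Product of integer matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def det3(M):
    """Determinant of a 3 x 3 integer matrix."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[1, 2, 3], [3, -2, 1], [1, 2, 3]]
U = [[3, 1, 1], [1, 0, 0], [3, 1, 0]]
D = [[1, 0, 0], [0, 8, 0], [0, 0, 0]]
V = [[3, -2, 1], [-1, 1, 0], [1, 0, 0]]

assert matmul(matmul(U, D), V) == A               # A = UDV
assert det3(U) in (1, -1) and det3(V) in (1, -1)  # U and V lie in GL_3(Z)

k = 2                                # rank = number of nonzero diagonal entries of D
D_trim = [row[:k] for row in D]      # D': the k leftmost columns of D
print(matmul(U, D_trim))             # [[3, 8], [1, 0], [3, 8]]
```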

By the above discussion we can represent a lattice subgroup {\Gamma \leq {\bf Z}^d} by a matrix {A \in M_{d \times n}({\bf Z})} for some {n}; this representation is not unique, but we will address this issue shortly. For now, we focus on the question of how to use such data representations of subgroups to perform basic operations on lattice subgroups. There are some operations that are very easy to perform using this data representation:

  • (Applying a linear transformation) if {T \in M_{d' \times d}({\bf Z})}, so that {T} is also a linear transformation from {{\bf Z}^d} to {{\bf Z}^{d'}}, then {T} maps lattice subgroups to lattice subgroups, and clearly maps the lattice subgroup {A{\bf Z}^n} to {(TA){\bf Z}^n} for any {A \in M_{d \times n}({\bf Z})}.
  • (Sum) Given two lattice subgroups {A_1 {\bf Z}^{n_1}, A_2 {\bf Z}^{n_2} \leq {\bf Z}^d} for some {A_1 \in M_{d \times n_1}({\bf Z})}, {A_2 \in M_{d \times n_2}({\bf Z})}, the sum {A_1 {\bf Z}^{n_1} + A_2 {\bf Z}^{n_2}} is equal to the lattice subgroup {A {\bf Z}^{n_1+n_2}}, where {A = (A_1 A_2) \in M_{d \times n_1 + n_2}({\bf Z})} is the matrix formed by concatenating the columns of {A_1} with the columns of {A_2}.
  • (Direct sum) Given two lattice subgroups {A_1 {\bf Z}^{n_1} \leq {\bf Z}^{d_1}}, {A_2 {\bf Z}^{n_2} \leq {\bf Z}^{d_2}}, the direct sum {A_1 {\bf Z}^{n_1} \times A_2 {\bf Z}^{n_2}} is equal to the lattice subgroup {A {\bf Z}^{n_1+n_2}}, where {A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \in M_{d_1+d_2 \times n_1 + n_2}({\bf Z})} is the block matrix formed by taking the direct sum of {A_1} and {A_2}.
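These three operations act purely on generator matrices, so no Smith normal form is needed. The sketch below implements them on integer matrices stored as lists of rows. The helper names are illustrative, and the projection {T} is a hypothetical example chosen for this demonstration.

```python
def matmul(X, Y):
    """Image T(A Z^n) = (TA) Z^n: ordinary integer matrix product."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def hconcat(A1, A2):
    """Sum A1 Z^{n1} + A2 Z^{n2}: the columns of A1 followed by the columns of A2."""
    return [r1 + r2 for r1, r2 in zip(A1, A2)]


def block_diag(A1, A2):
    """Direct sum A1 Z^{n1} x A2 Z^{n2}: the block matrix with A1, A2 on the diagonal."""
    n1, n2 = len(A1[0]), len(A2[0])
    return [r + [0] * n2 for r in A1] + [[0] * n1 + r for r in A2]


A = [[1, 2, 3], [3, -2, 1], [1, 2, 3]]   # Example 2
B = [[1, 0], [0, 1], [0, 0]]             # Z^2 x {0}, as in Example 4 below
T = [[1, 0, 0], [0, 1, 0]]               # hypothetical projection Z^3 -> Z^2

print(matmul(T, A))              # [[1, 2, 3], [3, -2, 1]]
print(hconcat(A, B))             # [[1, 2, 3, 1, 0], [3, -2, 1, 0, 1], [1, 2, 3, 0, 0]]
print(block_diag([[2]], [[0]]))  # [[2, 0], [0, 0]], i.e. 2Z x {0} from Example 1
```

The output of hconcat(A, B) is exactly the matrix {(A B)} that appears in Example 4.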

One can also use Smith normal form to detect when one lattice subgroup {B {\bf Z}^m \leq {\bf Z}^d} is a subgroup of another lattice subgroup {A {\bf Z}^n \leq {\bf Z}^d}. Take a Smith normal form factorization {A = U D V} with invariant factors {\alpha_1|\dots|\alpha_k}. Since {V{\bf Z}^n = {\bf Z}^n} we have {A{\bf Z}^n = UD{\bf Z}^n}, and applying the automorphism {U^{-1}} shows that the relation {B {\bf Z}^m \leq A {\bf Z}^n} is equivalent to

\displaystyle  U^{-1} B {\bf Z}^m \leq D {\bf Z}^n.

The group {U^{-1} B {\bf Z}^m} is generated by the columns of {U^{-1} B}, so this gives a test to determine whether {B {\bf Z}^{m} \leq A {\bf Z}^{n}}: the {i^{th}} row of {U^{-1} B} must be divisible by {\alpha_i} for {i=1,\dots,k}, and all other rows must vanish.

Example 3 To test whether the lattice subgroup {\Gamma'} generated by {(1,1,1)} and {(0,2,0)} is contained in the lattice subgroup {\Gamma = A{\bf Z}^3} from Example 2, we write {\Gamma'} as {B {\bf Z}^2} with {B = \begin{pmatrix} 1 & 0 \\ 1 & 2 \\ 1 & 0\end{pmatrix}}, and observe that

\displaystyle  U^{-1} B = \begin{pmatrix} 1 & 2 \\ -2 & -6 \\ 0 & 0 \end{pmatrix}.

The first row is of course divisible by {1}, and the last row vanishes as required, but the second row is not divisible by {8}, so {\Gamma'} is not contained in {\Gamma} (but {4\Gamma'} is); also a similar computation verifies that {\Gamma} is conversely contained in {\Gamma'}.
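Here is a minimal sketch of the containment test in Python, run on the data of Example 3. It assumes that {U^{-1}} (which is also displayed in Example 5) and the invariant factors are already known from a Smith normal form. The helper name contained_in is illustrative. Testing equality would additionally require a Smith normal form for {B}, which this sketch does not compute.

```python
def matmul(X, Y):
    """Product of integer matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def contained_in(U_inv, alphas, B):
    """Test whether B Z^m <= A Z^n, given U^{-1} and the invariant factors of A = UDV."""
    C = matmul(U_inv, B)
    k = len(alphas)
    for i, row in enumerate(C):
        if i < k and any(x % alphas[i] != 0 for x in row):
            return False   # row i is not divisible by alpha_i
        if i >= k and any(x != 0 for x in row):
            return False   # rows beyond the rank must vanish
    return True


U = [[3, 1, 1], [1, 0, 0], [3, 1, 0]]          # from Example 2
U_inv = [[0, 1, 0], [0, -3, 1], [1, 0, -1]]
assert matmul(U, U_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

alphas = [1, 8]
B = [[1, 0], [1, 2], [1, 0]]                   # generators of Gamma'
B4 = [[4 * x for x in row] for row in B]       # generators of 4 Gamma'
print(matmul(U_inv, B))                        # [[1, 2], [-2, -6], [0, 0]]
print(contained_in(U_inv, alphas, B))          # False
print(contained_in(U_inv, alphas, B4))         # True
```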

One can now test whether {B{\bf Z}^m = A{\bf Z}^n} by testing whether {B{\bf Z}^m \leq A{\bf Z}^n} and {A{\bf Z}^n \leq B{\bf Z}^m} simultaneously hold (there may be more efficient ways to do this, but this is already computationally manageable in many applications). This in principle addresses the issue of non-uniqueness of representation of a subgroup {\Gamma} in the form {A{\bf Z}^n}.

Next, we consider the question of representing the intersection {A{\bf Z}^n \cap B{\bf Z}^m} of two subgroups {A{\bf Z}^n, B{\bf Z}^m \leq {\bf Z}^d} in the form {C{\bf Z}^p} for some {p} and {C \in M_{d \times p}({\bf Z})}. We can write

\displaystyle  A{\bf Z}^n \cap B{\bf Z}^m = \{ Ax: Ax = By \hbox{ for some } x \in {\bf Z}^n, y \in {\bf Z}^m \}

\displaystyle  = (A 0) \{ z \in {\bf Z}^{n+m}: (A B) z = 0 \}

where {(A B) \in M_{d \times n+m}({\bf Z})} is the matrix formed by concatenating {A} and {B}, and similarly for {(A 0) \in M_{d \times n+m}({\bf Z})} (here we use the change of variable {z = \begin{pmatrix} x \\ -y \end{pmatrix}}). We apply the Smith normal form to {(A B)} to write

\displaystyle  (A B) = U D V

where {U \in GL_d({\bf Z})}, {D \in M_{d \times n+m}({\bf Z})}, {V \in GL_{n+m}({\bf Z})} with {D} of rank {k}. We can then write

\displaystyle  \{ z \in {\bf Z}^{n+m}: (A B) z = 0 \} = V^{-1} \{ w \in {\bf Z}^{n+m}: Dw = 0 \}

\displaystyle  = V^{-1} (\{0\}^k \times {\bf Z}^{n+m-k})

(making the change of variables {w = Vz}). Thus we can write {A{\bf Z}^n \cap B{\bf Z}^m = C {\bf Z}^{n+m-k}} where {C \in M_{d \times n+m-k}({\bf Z})} consists of the right {n+m-k} columns of {(A 0) V^{-1} \in M_{d \times n+m}({\bf Z})}.

Example 4 With the lattice {A{\bf Z}^3} from Example 2, we shall compute the intersection of {A{\bf Z}^3} with the subgroup {{\bf Z}^2 \times \{0\}}, which one can also write as {B{\bf Z}^2} with {B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}}. We obtain a Smith normal form

\displaystyle  (A B) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 3 & -2 & 1 & 0 & 1 \\ 1 & 2 & 3 & 1 & 0 \\ 1 & 2 & 3 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{pmatrix}

so {k=3}. We have

\displaystyle  (A 0) V^{-1} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 0 & -8 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}

and so we can write {A{\bf Z}^3 \cap B{\bf Z}^2 = C{\bf Z}^2} where

\displaystyle  C = \begin{pmatrix} 0 & 0 \\ 0 & -8 \\ 0 & 0 \end{pmatrix}.

One can trim this representation if desired, for instance by deleting the first column of {C} (and replacing {{\bf Z}^2} with {{\bf Z}}). Thus the intersection of {A{\bf Z}^3} with {{\bf Z}^2 \times \{0\}} is the rank one subgroup generated by {(0,-8,0)}.
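The matrices in Example 4 can be checked without inverting {V}. Because {V} is unimodular, {(A 0)V^{-1}} is the unique matrix {M} with {MV = (A 0)}, so it suffices to check that product. The following is a plain-Python sketch of that check.

```python
def matmul(X, Y):
    """Product of integer matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


AB = [[1, 2, 3, 1, 0], [3, -2, 1, 0, 1], [1, 2, 3, 0, 0]]   # (A B)
A0 = [[1, 2, 3, 0, 0], [3, -2, 1, 0, 0], [1, 2, 3, 0, 0]]   # (A 0)
U = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
D = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]
V = [[3, -2, 1, 0, 1], [1, 2, 3, 1, 0], [1, 2, 3, 0, 0], [1, 0, 0, 0, 0], [0, 1, 1, 0, 0]]
M = [[0, 0, 1, 0, 0], [0, 0, 3, 0, -8], [0, 0, 1, 0, 0]]    # claimed value of (A 0) V^{-1}

assert matmul(matmul(U, D), V) == AB   # the Smith normal form of (A B)
assert matmul(M, V) == A0              # so M really is (A 0) V^{-1}

k = 3
C = [row[k:] for row in M]             # the right n + m - k = 2 columns
print(C)                               # [[0, 0], [0, -8], [0, 0]]
```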

A similar calculation allows one to represent the pullback {T^{-1} (A {\bf Z}^n) \leq {\bf Z}^{d'}} of a subgroup {A{\bf Z}^n \leq {\bf Z}^d} via a linear transformation {T \in M_{d \times d'}({\bf Z})}, since

\displaystyle T^{-1} (A {\bf Z}^n) = \{ x \in {\bf Z}^{d'}: Tx = Ay \hbox{ for some } y \in {\bf Z}^n \}

\displaystyle  = (I 0) \{ z \in {\bf Z}^{d'+n}: (T A) z = 0 \}

where {(I 0) \in M_{d' \times d'+n}({\bf Z})} is the concatenation of the {d' \times d'} identity matrix {I} and the {d' \times n} zero matrix (and we make the change of variable {z = \begin{pmatrix} x \\ -y \end{pmatrix}}). Applying the Smith normal form to write {(T A) = UDV} with {D} of rank {k}, the same argument as before allows us to write {T^{-1}(A{\bf Z}^n) = C {\bf Z}^{d'+n-k}} where {C \in M_{d' \times d'+n-k}({\bf Z})} consists of the right {d'+n-k} columns of {(I 0) V^{-1} \in M_{d' \times d'+n}({\bf Z})}.
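Both the intersection and the pullback constructions only need a {{\bf Z}}-basis of an integer kernel {\{ z: Mz = 0\}}. The sketch below uses a swapped-in technique for this step: a column-style Hermite reduction (unimodular column operations tracked on an identity matrix) rather than a full Smith normal form. It then applies the two formulas above. All function names are illustrative. The matrix {T = (1,1,1)^T} is a hypothetical test case, and a hand computation gives {T^{-1}(A{\bf Z}^3) = 4{\bf Z}} in that case.

```python
from functools import reduce
from math import gcd


def kernel_basis(M):
    """Z-basis of {z in Z^N : M z = 0}, via unimodular column operations (Hermite-style)."""
    d, N = len(M), len(M[0])
    # Augmented columns: column j of M stacked on top of column j of the N x N identity.
    cols = [[M[i][j] for i in range(d)] + [int(j == t) for t in range(N)] for j in range(N)]
    c = 0  # columns 0..c-1 are pivot columns, in echelon form
    for i in range(d):
        while True:
            live = [j for j in range(c, N) if cols[j][i] != 0]
            if not live:
                break  # no pivot in row i
            p = min(live, key=lambda j: abs(cols[j][i]))  # smallest nonzero entry as pivot
            cols[c], cols[p] = cols[p], cols[c]
            for j in range(c + 1, N):  # Euclidean step: replace entries by remainders
                q = cols[j][i] // cols[c][i]
                cols[j] = [a - q * b for a, b in zip(cols[j], cols[c])]
            if all(cols[j][i] == 0 for j in range(c + 1, N)):
                c += 1
                break
    # Columns c..N-1 now have zero M-part; their identity part spans the kernel.
    return [col[d:] for col in cols[c:]]


def hconcat(X, Y):
    return [rx + ry for rx, ry in zip(X, Y)]


def matvec(X, v):
    return [sum(a * b for a, b in zip(row, v)) for row in X]


def intersection(A, B):
    """Generators of A Z^n cap B Z^m: the vectors A x for kernel vectors z = (x, -y) of (A B)."""
    n = len(A[0])
    return [matvec(A, z[:n]) for z in kernel_basis(hconcat(A, B))]


def pullback(T, A):
    """Generators of T^{-1}(A Z^n) in Z^{d'}, for a d x d' integer matrix T."""
    dprime = len(T[0])
    return [z[:dprime] for z in kernel_basis(hconcat(T, A))]


A = [[1, 2, 3], [3, -2, 1], [1, 2, 3]]   # Example 2
B = [[1, 0], [0, 1], [0, 0]]             # Z^2 x {0}, as in Example 4
print(intersection(A, B))                # [[0, -8, 0], [0, -8, 0]]
print(pullback([[1], [1], [1]], A))      # [[-4], [-4]], so the pullback is 4Z
```

The generators that come out need not be a basis (here they repeat). They can be trimmed with the Smith normal form as described earlier.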

Among other things, this allows one to describe lattices given by systems of linear equations and congruences in the {A{\bf Z}^n} format. Indeed, the set of lattice vectors {x \in {\bf Z}^d} that solve the system of congruences

\displaystyle  \alpha_i | x \cdot v_i \ \ \ \ \ (1)

for {i=1,\dots,k}, some natural numbers {\alpha_i}, and some lattice vectors {v_i \in {\bf Z}^d}, together with an additional system of equations

\displaystyle  x \cdot w_j = 0 \ \ \ \ \ (2)

for {j=1,\dots,l} and some lattice vectors {w_j \in {\bf Z}^d}, can be written as {T^{-1}(A {\bf Z}^k)} where {T \in M_{k+l \times d}({\bf Z})} is the matrix with rows {v_1,\dots,v_k,w_1,\dots,w_l}, and {A \in M_{k+l \times k}({\bf Z})} is the diagonal matrix with diagonal entries {\alpha_1,\dots,\alpha_k}. Conversely, any subgroup {A{\bf Z}^n} can be described in this form by first using the trimmed representation {A{\bf Z}^n = UD'{\bf Z}^k}, at which point membership of a lattice vector {x \in {\bf Z}^d} in {A{\bf Z}^n} is seen to be equivalent to the congruences

\displaystyle  \alpha_i | U^{-1} x \cdot e_i

for {i=1,\dots,k} (where {k} is the rank, {\alpha_1,\dots,\alpha_k} are the invariant factors, and {e_1,\dots,e_d} is the standard basis of {{\bf Z}^d}) together with the equations

\displaystyle  U^{-1} x \cdot e_j = 0

for {j=k+1,\dots,d}. Thus one can obtain a representation in the form (1), (2) with {l=d-k}, and {v_1,\dots,v_k,w_1,\dots,w_{d-k}} to be the rows of {U^{-1}} in order.

Example 5 With the lattice subgroup {A{\bf Z}^3} from Example 2, we have {U^{-1} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & -3 & 1 \\ 1 & 0 & -1 \end{pmatrix}}, and so {A{\bf Z}^3} consists of those triples {(x_1,x_2,x_3)} which obey the (redundant) congruence

\displaystyle  1 | x_2,

the congruence

\displaystyle  8 | -3x_2 + x_3

and the identity

\displaystyle  x_1 - x_3 = 0.

Conversely, one can use the above procedure to convert the above system of congruences and identities back into a form {A' {\bf Z}^{n'}} (though depending on which Smith normal form one chooses, the end result may be a different representation of the same lattice subgroup {A{\bf Z}^3}).
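As a sanity check on Example 5, the following plain-Python sketch tests membership in {A{\bf Z}^3} using only the congruences and the identity derived from the rows of {U^{-1}}. It then compares the result, on a small box, with the set spanned by the basis {(3,1,3), (8,0,8)} from Example 2. The box size is an arbitrary choice, and in_lattice is an illustrative name.

```python
from itertools import product

U_inv = [[0, 1, 0], [0, -3, 1], [1, 0, -1]]   # rows v_1, v_2, w_1 from Example 5
alphas = [1, 8]


def in_lattice(x):
    """Membership in A Z^3 via the congruences alpha_i | v_i . x and the identity w_1 . x = 0."""
    y = [sum(a * b for a, b in zip(row, x)) for row in U_inv]
    k = len(alphas)
    return (all(y[i] % alphas[i] == 0 for i in range(k))
            and all(v == 0 for v in y[k:]))


box = range(-10, 11)
members = {x for x in product(box, repeat=3) if in_lattice(x)}
# Points a(3,1,3) + b(8,0,8) in the box need |a| <= 10 and |b| <= 5, so this range suffices.
spanned = {(3 * a + 8 * b, a, 3 * a + 8 * b) for a in box for b in box}
assert members == {x for x in spanned if all(-10 <= c <= 10 for c in x)}
print(in_lattice((3, 1, 3)), in_lattice((1, 1, 1)))   # True False
```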

Now we apply Pontryagin duality. We claim the identity

\displaystyle  (A{\bf Z}^n)^\perp = \{ x \in ({\bf R}/{\bf Z})^d: A^Tx = 0 \}

for any {A \in M_{d \times n}({\bf Z})} (where {A^T \in M_{n \times d}({\bf Z})} induces a homomorphism from {({\bf R}/{\bf Z})^d} to {({\bf R}/{\bf Z})^n} in the obvious fashion). This can be verified by direct computation when {A} is a (rectangular) diagonal matrix, and the general case then easily follows from a Smith normal form computation (one can presumably also derive it from the category-theoretic properties of Pontryagin duality, although I will not do so here). So closed torus subgroups that are defined by a system of linear equations (over {{\bf R}/{\bf Z}}, with integer coefficients) are represented in the form {(A{\bf Z}^n)^\perp} of an orthogonal complement of a lattice subgroup. Using the trimmed form {A{\bf Z}^n = U D' {\bf Z}^k}, we see that

\displaystyle  (A{\bf Z}^n)^\perp = \{ x \in ({\bf R}/{\bf Z})^d: (UD')^T x = 0 \}

\displaystyle  = (U^{-1})^T \{ y \in ({\bf R}/{\bf Z})^d: (D')^T y = 0 \}

\displaystyle  = (U^{-1})^T (\frac{1}{\alpha_1} {\bf Z}/{\bf Z} \times \dots \times \frac{1}{\alpha_k} {\bf Z}/{\bf Z} \times ({\bf R}/{\bf Z})^{d-k}),

giving an explicit representation “in coordinates” of such a closed torus subgroup. In particular we can read off the isomorphism class of a closed torus subgroup as the product of a finite number of cyclic groups and a torus:

\displaystyle (A{\bf Z}^n)^\perp \cong ({\bf Z}/\alpha_1 {\bf Z}) \times \dots \times ({\bf Z}/\alpha_k{\bf Z}) \times ({\bf R}/{\bf Z})^{d-k}.
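The identity {(A{\bf Z}^n)^\perp = \{ x \in ({\bf R}/{\bf Z})^d: A^Tx = 0 \}} used above can also be checked directly, without the Smith normal form. The argument uses the adjointness relation {\langle x, A\xi \rangle = \langle A^T x, \xi \rangle} for {x \in ({\bf R}/{\bf Z})^d} and {\xi \in {\bf Z}^n}, since both sides equal {\sum_{i,j} A_{ij} x_i \xi_j} modulo {1}:

\displaystyle  x \in (A{\bf Z}^n)^\perp \iff \langle A^T x, \xi \rangle = 0 \hbox{ for all } \xi \in {\bf Z}^n \iff A^T x = 0,

where the last step follows by testing against {\xi = e_1,\dots,e_n}.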

Example 6 The orthogonal complement of the lattice subgroup {A{\bf Z}^3} from Example 2 is the closed torus subgroup

\displaystyle  (A{\bf Z}^3)^\perp = \{ (x_1,x_2,x_3) \in ({\bf R}/{\bf Z})^3: x_1 + 3x_2 + x_3

\displaystyle  = 2x_1 - 2x_2 + 2x_3 = 3x_1 + x_2 + 3x_3 = 0 \};

using the trimmed representation of {A{\bf Z}^3} from Example 2, one can simplify this a little to

\displaystyle  (A{\bf Z}^3)^\perp = \{ (x_1,x_2,x_3) \in ({\bf R}/{\bf Z})^3: 3x_1 + x_2 + 3x_3

\displaystyle  = 8 x_1 + 8x_3 = 0 \}

and one can also write this as the image of the group {\{ 0\} \times (\frac{1}{8}{\bf Z}/{\bf Z}) \times ({\bf R}/{\bf Z})} under the torus isomorphism

\displaystyle  (y_1,y_2,y_3) \mapsto (y_3, y_1 - 3y_2, y_2 - y_3).

In other words, one can write

\displaystyle  (A{\bf Z}^3)^\perp = \{ (y,0,-y) + (0,-\frac{3a}{8},\frac{a}{8}): y \in {\bf R}/{\bf Z}; a \in {\bf Z}/8{\bf Z} \}

so that {(A{\bf Z}^3)^\perp} is isomorphic to {{\bf R}/{\bf Z} \times {\bf Z}/8{\bf Z}}.
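The three descriptions of {(A{\bf Z}^3)^\perp} in Example 6 can be compared exactly on a finite grid. The sketch below samples {({\bf R}/{\bf Z})^3} on {(\frac{1}{24}{\bf Z}/{\bf Z})^3}, writing {x = v/24}. The choice {24} is an arbitrary assumption; it only needs to be divisible by {8} for the {{\bf Z}/8{\bf Z}} factor to be visible. The sketch uses integer arithmetic only, and the count {192 = 24 \times 8} matches {{\bf R}/{\bf Z} \times {\bf Z}/8{\bf Z}} sampled on that grid.

```python
from itertools import product

A = [[1, 2, 3], [3, -2, 1], [1, 2, 3]]
N = 24   # grid (1/N Z/Z)^3; a point x = v/N is stored as v in (Z/NZ)^3


def in_perp(v):
    """v/N lies in (A Z^3)^perp iff every entry of A^T v is divisible by N."""
    return all(sum(A[i][j] * v[i] for i in range(3)) % N == 0 for j in range(3))


def in_trimmed(v):
    """The simplified equations 3x_1 + x_2 + 3x_3 = 8x_1 + 8x_3 = 0 in R/Z."""
    return (3 * v[0] + v[1] + 3 * v[2]) % N == 0 and (8 * v[0] + 8 * v[2]) % N == 0


grid = list(product(range(N), repeat=3))
H = {v for v in grid if in_perp(v)}
assert H == {v for v in grid if in_trimmed(v)}

# Parametrisation (y, 0, -y) + (0, -3a/8, a/8) with y = t/24 and a in Z/8Z (note a/8 = 3a/24).
param = {(t % N, (-9 * a) % N, (3 * a - t) % N) for t in range(N) for a in range(8)}
assert H == param
print(len(H))   # 192
```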

We can now dualize all of the previous computable operations on subgroups of {{\bf Z}^d} to produce computable operations on closed subgroups of {({\bf R}/{\bf Z})^d}. For instance:

  • To form the intersection or sum of two closed torus subgroups {(A_1 {\bf Z}^{n_1})^\perp, (A_2 {\bf Z}^{n_2})^\perp \leq ({\bf R}/{\bf Z})^d}, use the identities

    \displaystyle  (A_1 {\bf Z}^{n_1})^\perp \cap (A_2 {\bf Z}^{n_2})^\perp = (A_1 {\bf Z}^{n_1} + A_2 {\bf Z}^{n_2})^\perp

    and

    \displaystyle  (A_1 {\bf Z}^{n_1})^\perp + (A_2 {\bf Z}^{n_2})^\perp = (A_1 {\bf Z}^{n_1} \cap A_2 {\bf Z}^{n_2})^\perp

    and then calculate the sum or intersection of the lattice subgroups {A_1 {\bf Z}^{n_1}, A_2 {\bf Z}^{n_2}} by the previous methods. Similarly, the operation of direct sum of two closed torus subgroups dualises to the operation of direct sum of two lattice subgroups.
  • To determine whether one closed torus subgroup {(A_1 {\bf Z}^{n_1})^\perp \leq ({\bf R}/{\bf Z})^d} is contained in (or equal to) another closed torus subgroup {(A_2 {\bf Z}^{n_2})^\perp \leq ({\bf R}/{\bf Z})^d}, simply use the preceding methods to check whether the lattice subgroup {A_2 {\bf Z}^{n_2}} is contained in (or equal to) the lattice subgroup {A_1 {\bf Z}^{n_1}}.
  • To compute the pullback {T^{-1}( (A{\bf Z}^n)^\perp )} of a closed torus subgroup {(A{\bf Z}^n)^\perp \leq ({\bf R}/{\bf Z})^d} via a linear transformation {T \in M_{d \times d'}({\bf Z})} (viewed as a homomorphism from {({\bf R}/{\bf Z})^{d'}} to {({\bf R}/{\bf Z})^d}), use the identity

    \displaystyle T^{-1}( (A{\bf Z}^n)^\perp ) = (T^T A {\bf Z}^n)^\perp.

    Similarly, to compute the image {T( (B {\bf Z}^m)^\perp )} of a closed torus subgroup {(B {\bf Z}^m)^\perp \leq ({\bf R}/{\bf Z})^{d'}}, use the identity

    \displaystyle T( (B{\bf Z}^m)^\perp ) = ((T^T)^{-1} (B {\bf Z}^m))^\perp,

    where {(T^T)^{-1}(B{\bf Z}^m) \leq {\bf Z}^d} denotes the pullback of {B{\bf Z}^m} under {T^T}, which can be computed by the previous methods.

Example 7 Suppose one wants to compute the sum of the closed torus subgroup {(A{\bf Z}^3)^\perp} from Example 6 with the closed torus subgroup {\{0\}^2 \times {\bf R}/{\bf Z}}. This latter group is the orthogonal complement of the lattice subgroup {{\bf Z}^2 \times \{0\}} considered in Example 4. Thus we have {(A{\bf Z}^3)^\perp + (\{0\}^2 \times {\bf R}/{\bf Z}) = (C{\bf Z}^2)^\perp} where {C} is the matrix from Example 4; discarding the zero column, we thus have

\displaystyle (A{\bf Z}^3)^\perp + (\{0\}^2 \times {\bf R}/{\bf Z}) = \{ (x_1,x_2,x_3) \in ({\bf R}/{\bf Z})^3: -8x_2 = 0 \}.
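Example 7 can be checked in the same way on the (arbitrarily chosen) grid {(\frac{1}{24}{\bf Z}/{\bf Z})^3}. For this pair of groups, sampling commutes with taking the sum. The second summand frees the third coordinate, and any point of {(A{\bf Z}^3)^\perp} whose first coordinate lies on the grid already lies entirely on the grid. The following is a plain-Python sketch of the check.

```python
from itertools import product

A = [[1, 2, 3], [3, -2, 1], [1, 2, 3]]
N = 24   # grid (1/N Z/Z)^3; a point x = v/N is stored as v in (Z/NZ)^3

grid = list(product(range(N), repeat=3))
H = {v for v in grid
     if all(sum(A[i][j] * v[i] for i in range(3)) % N == 0 for j in range(3))}
# Add {0}^2 x R/Z, sampled on the grid, to every point of H.
S = {(v[0], v[1], (v[2] + t) % N) for v in H for t in range(N)}
target = {v for v in grid if (-8 * v[1]) % N == 0}   # -8 x_2 = 0 in R/Z
assert S == target
print(len(S))   # 4608 = 24 * 8 * 24
```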

[This post is collectively authored by the ICM structure committee, whom I am currently chairing – T.]

The ICM structure committee is responsible for the preparation of the Scientific Program of the International Congress of Mathematicians (ICM). It decides the structure of the Scientific Program, in particular,

  • the number of plenary lectures,
  • the sections and their precise definition,
  • the target number of talks in each section,
  • other kinds of lectures, and
  • the arrangement of sections.

(The actual selection of speakers and the local organization of the ICM are handled separately by the Program Committee and Organizing Committee respectively.)

Our committee can also propose more radical changes to the format of the congress, although certain components of the congress, such as the prize lectures and satellite events, are outside the jurisdiction of this committee. For instance, in 2019 we proposed the addition of two new categories of lectures, “special sectional lectures” and “special plenary lectures”, which are broad and experimental categories of lectures that do not fall under the traditional format of a mathematician presenting their recent advances in a given section, but can instead highlight (for instance) emerging connections between two areas of mathematics, or present a “big picture” talk on a “hot topic” from an expert with the appropriate perspective. These new categories made their debut at the recently concluded virtual ICM, held on July 6-14, 2022.

Over the next year or so, our committee will conduct its deliberations on proposed changes to the structure of the congress for the next ICM (to be held in-person in Philadelphia in 2026) and beyond. As part of the preparation for these deliberations, we are soliciting feedback from the general mathematics community (on this blog and elsewhere) on the current state of the ICM, and any proposals to improve that state for the subsequent congresses; we issued a similar call on this blog back in 2019. This time around, of course, the situation is complicated by the extraordinary and exceptional circumstances that led to the 2022 ICM being moved to a virtual platform on short notice, and so for many reasons it is difficult to treat the 2022 virtual ICM as a model for subsequent congresses. On the other hand, the scientific program had already been selected by the 2022 ICM Program Committee prior to the invasion of Ukraine, and feedback on the content of that program will be of great value to our committee.

Among the specific questions (in no particular order) for which we seek comments are the following:

  1. Are there suggestions to change the format of the ICM that would increase its value to the mathematical community?
  2. Are there suggestions to change the format of the ICM that would encourage greater participation and interest in attending, particularly with regards to junior researchers and mathematicians from developing countries?
  3. The special sectional and special plenary lectures were introduced in part to increase the emphasis on the quality of exposition at ICM lectures. Has this in fact resulted in a notable improvement in exposition, and should any alterations be made to the special lecture component of the ICM?
  4. Is the balance between plenary talks, sectional talks, special plenary and sectional talks, and public talks at an optimal level?  There is only a finite amount of space in the calendar, so any increase in the number or length of one of these types of talks will come at the expense of another.
  5. The ICM is generally perceived to be more important to pure mathematics than to applied mathematics.  In what ways can the ICM be made more relevant and attractive to applied mathematicians, or should one not try to do so?
  6. Are there structural barriers that cause certain areas or styles of mathematics (such as applied or interdisciplinary mathematics) or certain groups of mathematicians to be under-represented at the ICM?  What, if anything, can be done to mitigate these barriers?
  7. The recently concluded virtual ICM had a sui generis format, in which the core virtual program was supplemented by a number of physical “overlay” satellite events. Are there any positive features of that format which could potentially be usefully adapted to future congresses? For instance, should there be any virtual or hybrid components at the next ICM?

Of course, we do not expect these complex and difficult questions to be resolved within this blog post, and debating these and other issues would likely be a major component of our internal committee discussions. Nevertheless, we would value constructive comments towards the above questions (or on other topics within the scope of our committee) to help inform these subsequent discussions. We therefore welcome and invite such commentary, either as responses to this blog post, or sent privately to one of the members of our committee. We would also be interested in having readers share their personal experiences at past congresses, and how they compare with other major conferences of this type. (But in order to keep the discussion focused and constructive, we request that comments here refrain from discussing topics that are out of the scope of this committee, such as suggesting specific potential speakers for the next congress, which is a task instead for the 2026 ICM Program Committee. Comments that are specific to the recently concluded virtual ICM can be made instead at this blog post.)

I’m currently in Helsinki, Finland for the General Assembly meeting of the International Mathematical Union (IMU), which runs the International Congress of Mathematicians (ICM) as well as several other events and initiatives. In particular the assembly voted on the location of the 2026 ICM; it will be held in Philadelphia, USA (with the general assembly being held in New York, USA).

Tomorrow the IMU award ceremony will take place, where the recipients of the various IMU awards (such as the Fields medal) will be revealed and honored. Event information can be found at this Facebook Event page, and the ceremony will also be streamed at this Youtube page; participants who have registered for the virtual ICM can also view it from the web page links they would have received by email in the last few days. (Due to high demand, registration for the virtual ICM has unfortunately reached the capacity of the live platform, but lectures will be made available on the IMU Youtube channel a few hours after they are given.) The virtual ICM program will begin the day after the award ceremony, starting with the lectures of the prize laureates.

We have an unofficial ICM Discord server set up to follow the virtual ICM as it happens, with events set up for the prize ceremony, for individual days of the congress, and for individual sections, as well as more recreational channels, such as a speculation page for the IMU prize winners. There are also a number of other virtual ICM satellite events that are being held either simultaneously with, or close to, the virtual ICM; I would like to draw particular attention to the satellite public lectures by Williamson (July 8), Giorgi (July 11), and Tokieda (July 13), which were also highlighted in my previous blog post. (EDIT: I would also like to mention the now-live poster room for the short communic

After the virtual ICM concludes, I will solicit feedback on this blog (in my capacity as chair of the IMU Structure Committee) on all aspects of that congress, as well as suggestions for future congresses; but I am not formally requesting such feedback at this present time.

The (now virtual) 2022 International Congress of Mathematicians, which will be held on July 6-14, now has open registration (free of charge).

I’ll also take this opportunity to mention that there are a large number of supporting satellite events for the virtual ICM, which are listed on this web page. I’d like to draw particular attention to the public lecture satellite event, now hosted by the London Mathematical Society, that will feature three speakers:

(As with many other of the satellite events, these public lectures will require a separate registration from that of the main ICM.)

Let {G} be a finite set of order {N}; in applications {G} will typically be something like a finite abelian group, such as the cyclic group {{\bf Z}/N{\bf Z}}. Let us define a {1}-bounded function to be a function {f: G \rightarrow {\bf C}} such that {|f(n)| \leq 1} for all {n \in G}. Many seminorms {\| \|} of interest on functions {f: G \rightarrow {\bf C}} are bounded by {1} on {1}-bounded functions, such as the Gowers uniformity seminorms {\| \|_{U^k}} for {k \geq 1} (which are genuine norms for {k \geq 2}). All seminorms in this post will be implicitly assumed to obey this property.

In additive combinatorics, a significant role is played by inverse theorems, which abstractly take the following form for certain choices of seminorm {\| \|}, some parameters {\eta, \varepsilon>0}, and some class {{\mathcal F}} of {1}-bounded functions:

Theorem 1 (Inverse theorem template) If {f} is a {1}-bounded function with {\|f\| \geq \eta}, then there exists {F \in {\mathcal F}} such that {|\langle f, F \rangle| \geq \varepsilon}, where {\langle,\rangle} denotes the usual inner product

\displaystyle  \langle f, F \rangle := {\bf E}_{n \in G} f(n) \overline{F(n)}.

Informally, one should think of {\eta} as being somewhat small but fixed independently of {N}, {\varepsilon} as being somewhat smaller but depending only on {\eta} (and on the seminorm), and {{\mathcal F}} as representing the “structured functions” for these choices of parameters. There is some flexibility in exactly how to choose the class {{\mathcal F}} of structured functions, but intuitively an inverse theorem should become more powerful when this class is small. Accordingly, let us define the {(\eta,\varepsilon)}-entropy of the seminorm {\| \|} to be the least cardinality of {{\mathcal F}} for which such an inverse theorem holds. Seminorms with low entropy are ones for which inverse theorems can be expected to be a useful tool. This concept arose in some discussions I had with Ben Green many years ago, but never appeared in print, so I decided to record some observations we had on this concept here on this blog.

Lebesgue norms {\| f\|_{L^p} := ({\bf E}_{n \in G} |f(n)|^p)^{1/p}} for {1 < p < \infty} have exponentially large entropy (and so inverse theorems are not expected to be useful in this case):

Proposition 2 ({L^p} norm has exponentially large inverse entropy) Let {1 < p < \infty} and {0 < \eta < 1}. Then the {(\eta,\eta^p/4)}-entropy of {\| \|_{L^p}} is at most {(1+8/\eta^p)^N}. Conversely, for any {\varepsilon>0}, the {(\eta,\varepsilon)}-entropy of {\| \|_{L^p}} is at least {\exp( c \varepsilon^2 N)} for some absolute constant {c>0}.

Proof: If {f} is {1}-bounded with {\|f\|_{L^p} \geq \eta}, then we have

\displaystyle  |\langle f, |f|^{p-2} f \rangle| \geq \eta^p

and hence by the triangle inequality we have

\displaystyle  |\langle f, F \rangle| \geq \eta^p/2

where {F} is either the real or imaginary part of {|f|^{p-2} f}, which takes values in {[-1,1]}. If we let {\tilde F} be {F} rounded to the nearest multiple of {\eta^p/4}, then by the triangle inequality again we have

\displaystyle  |\langle f, \tilde F \rangle| \geq \eta^p/4.

There are only at most {1+8/\eta^p} possible values for each value {\tilde F(n)} of {\tilde F}, and hence at most {(1+8/\eta^p)^N} possible choices for {\tilde F}. This gives the first claim.
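The rounding step in the first claim can be sanity-checked numerically. The following plain-Python sketch builds {\tilde F} exactly as described (the real or imaginary part of {|f|^{p-2} f}, whichever correlates better, rounded to the nearest multiple of {\eta^p/4}). It then checks {|\langle f, \tilde F \rangle| \geq \eta^p/4} on random {1}-bounded test functions. The helper names, the random seed, and the choice {\eta = \|f\|_{L^p}} are illustrative assumptions.

```python
import cmath
import random


def inner(f, F):
    """<f, F> = E_n f(n) conj(F(n))."""
    return sum(a * complex(b).conjugate() for a, b in zip(f, F)) / len(f)


def rounded_witness(f, p, eta):
    """tilde F: Re or Im of |f|^{p-2} f (whichever correlates better), rounded to multiples of eta^p/4."""
    g = [abs(z) ** (p - 2) * z if z != 0 else 0j for z in f]   # values of modulus |f|^{p-1} <= 1
    re, im = [z.real for z in g], [z.imag for z in g]
    F = re if abs(inner(f, re)) >= abs(inner(f, im)) else im
    step = eta ** p / 4
    return [step * round(v / step) for v in F]


rng = random.Random(0)
p = 3
f = [cmath.rect(rng.random(), 2 * cmath.pi * rng.random()) for _ in range(50)]
eta = (sum(abs(z) ** p for z in f) / len(f)) ** (1 / p)   # so that ||f||_{L^p} = eta
Ft = rounded_witness(f, p, eta)
print(abs(inner(f, Ft)) >= eta ** p / 4)   # True
```

In fact the argument gives the slightly stronger bound {3\eta^p/8}, since the rounding error is at most {\eta^p/8} pointwise.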

Now suppose that there is an {(\eta,\varepsilon)}-inverse theorem for some {{\mathcal F}} of cardinality {M}. If we let {f} be a random sign function (so the {f(n)} are independent random variables taking values in {\{-1,+1\}} with equal probability), then {f} is {1}-bounded with {\|f\|_{L^p} = 1 \geq \eta}, so there is a random {F \in {\mathcal F}} such that

\displaystyle  |\langle f, F \rangle| \geq \varepsilon

and hence by the pigeonhole principle there is a deterministic {F \in {\mathcal F}} such that

\displaystyle  {\bf P}( |\langle f, F \rangle| \geq \varepsilon ) \geq 1/M.

On the other hand, from the Hoeffding inequality one has

\displaystyle  {\bf P}( |\langle f, F \rangle| \geq \varepsilon ) \ll \exp( - c \varepsilon^2 N )

for some absolute constant {c}, hence

\displaystyle  M \geq \exp( c \varepsilon^2 N )

as claimed. \Box
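The Hoeffding step can be illustrated by simulation for one fixed test function. The sketch below takes {F \equiv 1}, a hypothetical choice, so that {\langle f, F \rangle = {\bf E}_n f(n)}. It compares the empirical frequency of {|\langle f, F \rangle| \geq \varepsilon} with the standard Hoeffding bound {2\exp(-N\varepsilon^2/2)} for averages of {N} independent signs. This illustrates only the tail estimate, not the full lower bound on entropy. The values of {N}, {\varepsilon}, the number of trials, and the seed are arbitrary.

```python
import math
import random


def tail_frequency(N, eps, trials, seed=0):
    """Empirical P(|<f, 1>| >= eps) for a random sign function f on a set of size N."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ones = bin(rng.getrandbits(N)).count("1")   # number of n with f(n) = +1
        mean = (2 * ones - N) / N                    # <f, 1> = E_n f(n)
        if abs(mean) >= eps:
            hits += 1
    return hits / trials


N, eps = 200, 0.2
freq = tail_frequency(N, eps, trials=20000)
bound = 2 * math.exp(-N * eps ** 2 / 2)   # Hoeffding bound, about 0.037 here
print(freq <= bound)   # True
```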

Most seminorms of interest in additive combinatorics, such as the Gowers uniformity norms, are bounded by some finite {L^p} norm thanks to Hölder’s inequality, so from the above proposition and the obvious monotonicity properties of entropy, we conclude that all Gowers norms on finite abelian groups {G} have at most exponential inverse theorem entropy. But we can do significantly better than this:

  • For the {U^1} seminorm {\|f\|_{U^1(G)} := |{\bf E}_{n \in G} f(n)|}, one can simply take {{\mathcal F} = \{1\}} to consist of the constant function {1}, and the {(\eta,\eta)}-entropy is clearly equal to {1} for any {0 < \eta < 1}.
  • For the {U^2} norm, the standard Fourier-analytic inverse theorem asserts that if {\|f\|_{U^2(G)} \geq \eta} then {|\langle f, e(\xi \cdot) \rangle| \geq \eta^2} for some Fourier character {\xi \in \hat G}. Thus the {(\eta,\eta^2)}-entropy is at most {N}.
  • For the {U^k({\bf Z}/N{\bf Z})} norm on cyclic groups for {k > 2}, the inverse theorem proved by Green, Ziegler, and myself gives an {(\eta,\varepsilon)}-inverse theorem for some {\varepsilon \gg_{k,\eta} 1} and {{\mathcal F}} consisting of nilsequences {n \mapsto F(g(n) \Gamma)} for some filtered nilmanifold {G/\Gamma} of degree {k-1} in a finite collection of cardinality {O_{\eta,k}(1)}, some polynomial sequence {g: {\bf Z} \rightarrow G} (which, as subsequently observed by Candela-Sisask (see also Manners), one can choose to be {N}-periodic), and some Lipschitz function {F: G/\Gamma \rightarrow {\bf C}} of Lipschitz norm {O_{\eta,k}(1)}. By the Arzela-Ascoli theorem, the number of possible {F} (up to uniform errors of size at most {\varepsilon/2}, say) is {O_{\eta,k}(1)}. By standard arguments one can also ensure that the coefficients of the polynomial {g} are {O_{\eta,k}(1)}, and then by periodicity there are only {O(N^{O_{\eta,k}(1)})} such polynomials. As a consequence, the {(\eta,\varepsilon)}-entropy is of polynomial size {O_{\eta,k}( N^{O_{\eta,k}(1)} )} (a fact that seems to have first been implicitly observed in Lemma 6.2 of this paper of Frantzikinakis; thanks to Ben Green for this reference). One can obtain more precise dependence on {\eta,k} using the quantitative version of this inverse theorem due to Manners; back-of-the-envelope calculations using Section 5 of that paper suggest to me that one can take {\varepsilon = \eta^{O_k(1)}} to be polynomial in {\eta} and the entropy to be of the order {O_k( N^{\exp(\exp(\eta^{-O_k(1)}))} )}, or alternatively one can reduce the entropy to {O_k( \exp(\exp(\eta^{-O_k(1)})) N^{\eta^{-O_k(1)}})} at the cost of degrading {\varepsilon} to {1/\exp\exp( O(\eta^{-O(1)}))}.
  • If one replaces the cyclic group {{\bf Z}/N{\bf Z}} by a vector space {{\bf F}_p^n} over some fixed finite field {{\bf F}_p} of prime order (so that {N=p^n}), then the inverse theorem of Ziegler and myself (available in both high and low characteristic) allows one to obtain an {(\eta,\varepsilon)}-inverse theorem for some {\varepsilon \gg_{k,\eta} 1} and {{\mathcal F}} the collection of non-classical degree {k-1} polynomial phases from {{\bf F}_p^n} to {S^1}, which one can normalize to equal {1} at the origin, and then by the classification of such polynomials one can calculate that the {(\eta,\varepsilon)}-entropy is of quasipolynomial size {\exp( O_{p,k}(n^{k-1}) ) = \exp( O_{p,k}( \log^{k-1} N ) )} in {N}. By using the recent work of Gowers and Milicevic, one can make the dependence on {p,k} here more precise, but we will not perform these calculations here.
  • For the {U^3(G)} norm on an arbitrary finite abelian group, the recent inverse theorem of Jamneshan and myself gives (after some calculations) a bound of the polynomial form {O( q^{O(n^2)} N^{\exp(\eta^{-O(1)})})} on the {(\eta,\varepsilon)}-entropy for some {\varepsilon \gg \eta^{O(1)}}, which one can improve slightly to {O( q^{O(n^2)} N^{\eta^{-O(1)}})} if one degrades {\varepsilon} to {1/\exp(\eta^{-O(1)})}, where {q} is the maximal order of an element of {G}, and {n} is the rank (the number of elements needed to generate {G}). This bound is polynomial in {N} in the cyclic group case and quasipolynomial in general.
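For the {U^2} bullet, the inverse theorem comes from the standard identity {\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4 \leq \sup_\xi |\hat f(\xi)|^2 \|f\|_{L^2}^2}, where {\hat f(\xi) = \langle f, e(\xi \cdot/N)\rangle} on {{\bf Z}/N{\bf Z}}. The Python sketch below checks both the identity and the resulting bound {\sup_\xi |\hat f(\xi)| \geq \|f\|_{U^2}^2} numerically for a random {1}-bounded {f} on a small cyclic group. It is a brute-force illustration (cubic time), not an efficient algorithm, and the function names are our own.

```python
import cmath
import random


def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * cmath.pi * x)


def u2_norm(f):
    """||f||_{U^2(Z/NZ)} computed directly from the definition (O(N^3) time)."""
    N = len(f)
    total = 0
    for n in range(N):
        for h1 in range(N):
            for h2 in range(N):
                total += (f[n] * f[(n + h1) % N].conjugate()
                          * f[(n + h2) % N].conjugate() * f[(n + h1 + h2) % N])
    return abs(total / N**3) ** 0.25


def fourier_coefficient(f, xi):
    """hat f(xi) = E_n f(n) e(-xi n / N) = <f, e(xi . / N)>."""
    N = len(f)
    return sum(f[n] * e(-xi * n / N) for n in range(N)) / N


rng = random.Random(1)
N = 10
f = [e(rng.random()) for _ in range(N)]           # a random 1-bounded function
norm = u2_norm(f)
coeffs = [fourier_coefficient(f, xi) for xi in range(N)]
print(norm**4, sum(abs(c) ** 4 for c in coeffs))  # the two sides of the identity
print(max(abs(c) for c in coeffs), norm**2)       # largest coefficient vs. ||f||_{U^2}^2
```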

For general finite abelian groups {G}, we do not yet have an inverse theorem of comparable power to the ones mentioned above that give polynomial or quasipolynomial upper bounds on the entropy. However, there is a cheap argument that at least gives some subexponential bounds:

Proposition 3 (Cheap subexponential bound) Let {k \geq 2} and {0 < \eta < 1/2}, and suppose that {G} is a finite abelian group of order {N \geq \eta^{-C_k}} for some sufficiently large {C_k}. Then the {(\eta,c_k \eta^{O_k(1)})}-entropy of {\| \|_{U^k(G)}} is at most {O( \exp( \eta^{-O_k(1)} N^{1 - \frac{k+1}{2^k-1}} ))}.

Proof: (Sketch) We use a standard random sampling argument, of the type used for instance by Croot-Sisask or Briet-Gopi (thanks to Ben Green for this latter reference). We can assume that {N \geq \eta^{-C_k}} for some sufficiently large {C_k>0}, since otherwise the claim follows from Proposition 2.

Let {A} be a random subset of {G} with the events {n \in A} being iid with probability {0 < p < 1} to be chosen later, conditioned on the event {|A| \leq 2pN}. Let {f} be a {1}-bounded function. By a standard second moment calculation, we see that with probability at least {1/2}, we have

\displaystyle  \|f\|_{U^k(G)}^{2^k} = {\bf E}_{n, h_1,\dots,h_k \in G} f(n) \prod_{\omega \in \{0,1\}^k \backslash \{0\}} {\mathcal C}^{|\omega|} \frac{1}{p} 1_A f(n + \omega \cdot h)

\displaystyle + O((\frac{1}{N^{k+1} p^{2^k-1}})^{1/2}).

Thus, by the triangle inequality, if we choose {p := C \eta^{-2^{k+1}/(2^k-1)} / N^{\frac{k+1}{2^k-1}}} for some sufficiently large {C = C_k > 0}, then for any {1}-bounded {f} with {\|f\|_{U^k(G)} \geq \eta/2}, one has with probability at least {1/2} that

\displaystyle  |{\bf E}_{n, h_1,\dots,h_k \in G} f(n) \prod_{\omega \in \{0,1\}^k \backslash \{0\}} {\mathcal C}^{|\omega|} \frac{1}{p} 1_A f(n + \omega \cdot h)|

\displaystyle \geq \eta^{2^k}/2^{2^k+1}.
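The exponent bookkeeping in this choice of {p} can be checked mechanically. In the Python sketch below (function names are ours, and the constant {C} is an arbitrary placeholder), `error_denominator` evaluates {N^{k+1} p^{2^k-1}}, which for this {p} equals {C^{2^k-1} \eta^{-2^{k+1}}} independently of {N}, so the second-moment error is {C^{-(2^k-1)/2} \eta^{2^k}}; `entropy_exponent` gives the exponent of {N} in the expected size {pN} of {A}, which is what ultimately appears in the bound of Proposition 3.

```python
from fractions import Fraction


def entropy_exponent(k):
    """Exponent 1 - (k+1)/(2^k - 1) of N in Proposition 3 (exact rational)."""
    return 1 - Fraction(k + 1, 2**k - 1)


def sampling_density(eta, N, k, C=1.0):
    """p := C * eta^{-2^{k+1}/(2^k-1)} / N^{(k+1)/(2^k-1)}, as chosen above."""
    a = 2**k - 1
    return C * eta ** (-(2 ** (k + 1)) / a) / N ** ((k + 1) / a)


def error_denominator(eta, N, k, C=1.0):
    """N^{k+1} p^{2^k-1}; the second-moment error term is the inverse square root of this."""
    p = sampling_density(eta, N, k, C)
    return N ** (k + 1) * p ** (2**k - 1)


for k in (3, 4, 5):
    print(k, entropy_exponent(k))
```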

We can write the left-hand side as {|\langle f, F \rangle|} where {F} is the randomly sampled dual function

\displaystyle  F(n) := {\bf E}_{h_1,\dots,h_k \in G} \prod_{\omega \in \{0,1\}^k \backslash \{0\}} {\mathcal C}^{|\omega|+1} \frac{1}{p} 1_A f(n + \omega \cdot h).

Unfortunately, {F} is not {1}-bounded in general, but we have

\displaystyle  \|F\|_{L^2(G)}^2 \leq {\bf E}_{n, h_1,\dots,h_k, h'_1,\dots,h'_k \in G}

\displaystyle  \prod_{\omega \in \{0,1\}^k \backslash \{0\}} \frac{1}{p} 1_A(n + \omega \cdot h) \frac{1}{p} 1_A(n + \omega \cdot h')

and the right-hand side can be shown to be {1+o(1)} on average, so we can condition on the event that the right-hand side is {O(1)} without significant loss in failure probability.

If we then let {\tilde f_A} be {1_A f} rounded to the nearest Gaussian integer multiple of {\eta^{2^k}/2^{2^{10k}}} in the unit disk, one has from the triangle inequality that

\displaystyle  |\langle f, \tilde F \rangle| \geq \eta^{2^k}/2^{2^k+2}

where {\tilde F} is the discretised randomly sampled dual function

\displaystyle  \tilde F(n) := {\bf E}_{h_1,\dots,h_k \in G} \prod_{\omega \in \{0,1\}^k \backslash \{0\}} {\mathcal C}^{|\omega|+1} \frac{1}{p} \tilde f_A(n + \omega \cdot h).

For any given {A}, there are at most {2pN} places {n} where {\tilde f_A(n)} can be non-zero, and in those places there are {O_k( \eta^{-2^{k+1}})} possible values for {\tilde f_A(n)}. Thus, if we let {{\mathcal F}_A} be the collection of all possible {\tilde F} arising from some {\tilde f_A} associated to a given {A}, the cardinality of this set is {O( \exp( \eta^{-O_k(1)} N^{1 - \frac{k+1}{2^k-1}} ) )}, and for any {f} with {\|f\|_{U^k(G)} \geq \eta/2}, we have

\displaystyle  \sup_{\tilde F \in {\mathcal F}_A} |\langle f, \tilde F \rangle| \geq \eta^{2^k}/2^{2^k+2}

with probability at least {1/2}.

Now we remove the failure probability by independent resampling. By rounding to the nearest Gaussian integer multiple of {c_k \eta^{2^k}} in the unit disk for a sufficiently small {c_k>0}, one can find a family {{\mathcal G}} of cardinality {O( \eta^{-O_k(N)})} consisting of {1}-bounded functions {\tilde f} of {U^k(G)} norm at least {\eta/2} such that for every {1}-bounded {f} with {\|f\|_{U^k(G)} \geq \eta} there exists {\tilde f \in {\mathcal G}} such that

\displaystyle  \|f-\tilde f\|_{L^\infty(G)} \leq \eta^{2^k}/2^{2^k+3}.

Now, let {A_1,\dots,A_M} be independent samples of {A} for some {M} to be chosen later. By the preceding discussion, we see that with probability at least {1 - 2^{-M}}, we have

\displaystyle  \sup_{\tilde F \in \bigcup_{j=1}^M {\mathcal F}_{A_j}} |\langle \tilde f, \tilde F \rangle| \geq \eta^{2^k}/2^{2^k+2}

for any given {\tilde f \in {\mathcal G}}, so by the union bound, if we choose {M = \lfloor C N \log \frac{1}{\eta} \rfloor} for a large enough {C = C_k}, we can find {A_1,\dots,A_M} such that

\displaystyle  \sup_{\tilde F \in \bigcup_{j=1}^M {\mathcal F}_{A_j}} |\langle \tilde f, \tilde F \rangle| \geq \eta^{2^k}/2^{2^k+2}

for all {\tilde f \in {\mathcal G}}, and hence, by the triangle inequality, for every {1}-bounded {f} with {\|f\|_{U^k(G)} \geq \eta} we have

\displaystyle  \sup_{\tilde F \in \bigcup_{j=1}^M {\mathcal F}_{A_j}} |\langle f, \tilde F \rangle| \geq \eta^{2^k}/2^{2^k+3}.

Taking {{\mathcal F}} to be the union of the {{\mathcal F}_{A_j}} (applying some truncation and rescaling to these {L^2}-bounded functions to make them {L^\infty}-bounded, and then {1}-bounded), we obtain the claim. \Box

One way to obtain lower bounds on the inverse theorem entropy is to produce a collection of almost orthogonal functions with large norm. More precisely:

Proposition 4 Let {\| \|} be a seminorm, let {0 < \varepsilon \leq \eta < 1}, and suppose that one has a collection {f_1,\dots,f_M} of {1}-bounded functions such that {\|f_i\| \geq \eta} for all {i=1,\dots,M}, and such that for each {i \in \{1,\dots,M\}} one has {|\langle f_i, f_j \rangle| \leq \varepsilon^2/2} for all but at most {L} choices of {j \in \{1,\dots,M\}}. Then the {(\eta, \varepsilon)}-entropy of {\| \|} is at least {\varepsilon^2 M / 2L}.

Proof: Suppose we have an {(\eta,\varepsilon)}-inverse theorem with some family {{\mathcal F}}. Then for each {i=1,\dots,M} there is {F_i \in {\mathcal F}} such that {|\langle f_i, F_i \rangle| \geq \varepsilon}. By the pigeonhole principle, there is thus {F \in {\mathcal F}} such that {|\langle f_i, F \rangle| \geq \varepsilon} for all {i} in a subset {I} of {\{1,\dots,M\}} of cardinality at least {M/|{\mathcal F}|}:

\displaystyle  |I| \geq M / |{\mathcal F}|.

We can sum this to obtain

\displaystyle  |\sum_{i \in I} c_i \langle f_i, F \rangle| \geq |I| \varepsilon

for some complex numbers {c_i} of unit magnitude. By Cauchy-Schwarz (and the {1}-boundedness of {F}), this implies

\displaystyle  \| \sum_{i \in I} c_i f_i \|_{L^2(G)}^2 \geq |I|^2 \varepsilon^2

and hence by the triangle inequality

\displaystyle  \sum_{i,j \in I} |\langle f_i, f_j \rangle| \geq |I|^2 \varepsilon^2.

On the other hand, since {|\langle f_i, f_j \rangle| \leq 1} for all {i,j} (the {f_i} being {1}-bounded), the hypothesis lets us bound the left-hand side by {|I| (L + \varepsilon^2 |I|/2)}. Rearranging, we conclude that

\displaystyle  |I| \leq 2 L / \varepsilon^2

and hence

\displaystyle  |{\mathcal F}| \geq \varepsilon^2 M / 2L

giving the claim. \Box

Thus for instance:

  • For the {U^2(G)} norm, one can take {f_1,\dots,f_M} to be the family of linear exponential phases {n \mapsto e(\xi \cdot n)} with {M = N} and {L=1}, and obtain a linear lower bound of {\varepsilon^2 N/2} for the {(\eta,\varepsilon)}-entropy, thus matching the upper bound of {N} up to constants when {\varepsilon} is fixed.
  • For the {U^k({\bf Z}/N{\bf Z})} norm, a similar calculation using polynomial phases of degree {k-1}, combined with the Weyl sum estimates, gives a lower bound of {\gg_{k,\varepsilon} N^{k-1}} for the {(\eta,\varepsilon)}-entropy for any fixed {\eta,\varepsilon}; by considering nilsequences as well, together with nilsequence equidistribution theory, one can replace the exponent {k-1} here by some quantity that goes to infinity as {\eta \rightarrow 0}, though I have not attempted to calculate the exact rate.
  • For the {U^k({\bf F}_p^n)} norm, another similar calculation using polynomial phases of degree {k-1} should give a lower bound of {\gg_{p,k,\eta,\varepsilon} \exp( c_{p,k,\eta,\varepsilon} n^{k-1} )} for the {(\eta,\varepsilon)}-entropy, though I have not fully performed the calculation.
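For the {U^2} example, the quantities in Proposition 4 can be computed directly: distinct characters are orthogonal, so each {f_i} correlates above {\varepsilon^2/2} only with itself and {L=1}. The Python sketch below (names are ours) computes {L} from the Gram matrix, counting {j=i} as well, and returns {\varepsilon^2 M/2L}. It does not check the norm hypothesis {\|f_i\| \geq \eta}, which holds here because characters have {U^2} norm {1}.

```python
import cmath


def inner(f, g):
    """<f, g> = E_n f(n) conj(g(n))."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / len(f)


def characters(N):
    """The N linear phases n -> e(xi n / N) on Z/NZ."""
    return [[cmath.exp(2j * cmath.pi * xi * n / N) for n in range(N)] for xi in range(N)]


def proposition4_bound(funcs, eps):
    """Return eps^2 M / (2L), where L is the largest number of j (including j = i)
    with |<f_i, f_j>| > eps^2 / 2.  The hypothesis ||f_i|| >= eta is not checked here."""
    M = len(funcs)
    L = max(sum(1 for g in funcs if abs(inner(f, g)) > eps**2 / 2) for f in funcs)
    return eps**2 * M / (2 * L)


print(proposition4_bound(characters(16), eps=0.5))
```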

We close with one final example. Suppose {G} is a product {G = A \times B} of two sets {A,B} of cardinality {\asymp \sqrt{N}}, and we consider the Gowers box norm

\displaystyle  \|f\|_{\Box^2(G)}^4 := {\bf E}_{a,a' \in A; b,b' \in B} f(a,b) \overline{f}(a,b') \overline{f}(a',b) f(a',b').

One possible choice of the class {{\mathcal F}} here is the collection of indicators {1_{U \times V}} of “rectangles” {U \times V} with {U \subset A}, {V \subset B} (cf. this previous blog post on cut norms). By standard calculations, one can use this class to show that the {(\eta, \eta^4/10)}-entropy of {\| \|_{\Box^2(G)}} is {O( \exp( O(\sqrt{N}) ) )}, and a variant of the proof of the second part of Proposition 2 shows that this is the correct order of growth in {N}. In contrast, a modification of Proposition 3 only gives an upper bound of the form {O( \exp( O( N^{2/3} ) ) )} (the bottleneck is ensuring that the randomly sampled dual functions stay bounded in {L^2}), which shows that while this cheap bound is not optimal, it can still broadly give the correct “type” of bound (specifically, intermediate growth between polynomial and exponential).
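The box norm can be evaluated by brute force on small examples. For a rectangle indicator {1_{U \times V}} the four-fold average factorises, giving {\|1_{U \times V}\|_{\Box^2(G)} = (|U||V|/|A||B|)^{1/2}}; the Python sketch below (names are ours) checks this against the definition.

```python
import itertools


def box_norm(f, A, B):
    """||f||_{Box^2(A x B)} from the definition; f maps (a, b) to a complex number."""
    total = 0
    for a, a2 in itertools.product(A, repeat=2):
        for b, b2 in itertools.product(B, repeat=2):
            total += (f[(a, b)] * f[(a, b2)].conjugate()
                      * f[(a2, b)].conjugate() * f[(a2, b2)])
    return abs(total / (len(A) * len(B)) ** 2) ** 0.25


def rectangle(U, V, A, B):
    """Indicator 1_{U x V} as a dict on A x B."""
    return {(a, b): 1.0 if (a in U and b in V) else 0.0 for a in A for b in B}


A = B = range(4)
f = rectangle({0, 1}, {2, 3}, A, B)
print(box_norm(f, A, B))   # compare with sqrt(|U||V| / (|A||B|))
```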

In orthodox first-order logic, variables and expressions are only allowed to take one value at a time; a variable {x}, for instance, is not allowed to equal {+3} and {-3} simultaneously. We will call such variables completely specified. If one really wants to deal with multiple values of objects simultaneously, one is encouraged to use the language of set theory and/or logical quantifiers to do so.

However, the ability to allow expressions to become only partially specified is undeniably convenient, and also rather intuitive. A classic example here is that of the quadratic formula:

\displaystyle  \hbox{If } x,a,b,c \in {\bf R} \hbox{ with } a \neq 0, \hbox{ then }

\displaystyle  ax^2+bx+c=0 \hbox{ if and only if } x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}. \ \ \ \ \ (1)

Strictly speaking, the expression {x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}} is not well-formed according to the grammar of first-order logic; one should instead use something like

\displaystyle x = \frac{-b - \sqrt{b^2-4ac}}{2a} \hbox{ or } x = \frac{-b + \sqrt{b^2-4ac}}{2a}

or

\displaystyle x \in \left\{ \frac{-b - \sqrt{b^2-4ac}}{2a}, \frac{-b + \sqrt{b^2-4ac}}{2a} \right\}

or

\displaystyle x = \frac{-b + \epsilon \sqrt{b^2-4ac}}{2a} \hbox{ for some } \epsilon \in \{-1,+1\}

in order to strictly adhere to this grammar. But none of these three reformulations are as compact or as conceptually clear as the original one. In a similar spirit, a mathematical English sentence such as

\displaystyle  \hbox{The sum of two odd numbers is an even number} \ \ \ \ \ (2)

is also not a first-order sentence; one would instead have to write something like

\displaystyle  \hbox{For all odd numbers } x, y, \hbox{ the number } x+y \hbox{ is even} \ \ \ \ \ (3)

or

\displaystyle  \hbox{For all odd numbers } x,y \hbox{ there exists an even number } z \ \ \ \ \ (4)

\displaystyle  \hbox{ such that } x+y=z

instead. These reformulations are not all that hard to decipher, but they do have the aesthetically displeasing effect of cluttering an argument with temporary variables such as {x,y,z} which are used once and then discarded.

Another example of partially specified notation is the innocuous {\ldots} notation. For instance, the assertion

\displaystyle \pi=3.14\ldots,

when written formally using first-order logic, would become something like

\displaystyle \pi = 3 + \frac{1}{10} + \frac{4}{10^2} + \sum_{n=3}^\infty \frac{a_n}{10^n} \hbox{ for some sequence } (a_n)_{n=3}^\infty

\displaystyle  \hbox{ with } a_n \in \{0,1,2,3,4,5,6,7,8,9\} \hbox{ for all } n,

which is not exactly an elegant reformulation. Similarly with statements such as

\displaystyle \tan x = x + \frac{x^3}{3} + \ldots \hbox{ for } |x| < \pi/2

or

\displaystyle \tan x = x + \frac{x^3}{3} + O(|x|^5) \hbox{ for } |x| < \pi/2.

Below the fold I’ll try to assign a formal meaning to partially specified expressions such as (1), for instance allowing one to condense (2), (3), (4) to just

\displaystyle  \hbox{odd} + \hbox{odd} = \hbox{even}.
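One naive way to make such statements executable, purely as an illustration and not necessarily the formalism developed in the full post, is to read a partially specified expression as the set of all its possible values, and to read “{=}” between such expressions as “every value on the left is a value on the right”. The Python sketch below (names are ours) does this for the {\pm} in the quadratic formula (1) and for {\hbox{odd} + \hbox{odd} = \hbox{even}}, using finite ranges of integers since Python sets are finite.

```python
import math


def pm_values(a, b, c):
    """All values of (-b +- sqrt(b^2 - 4ac)) / (2a), read as a set (empty if b^2 < 4ac)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return set()
    r = math.sqrt(disc)
    return {(-b + eps * r) / (2 * a) for eps in (-1, 1)}


def sumset(X, Y):
    """X + Y = {x + y : x in X, y in Y}."""
    return {x + y for x in X for y in Y}


# Finite stand-ins for the odd and even integers (Python sets must be finite).
odd = {n for n in range(-51, 52) if n % 2 == 1}
even = {n for n in range(-102, 103) if n % 2 == 0}

# "odd + odd = even" read as: every value of the left side is a value of the right side.
print(sumset(odd, odd) <= even)
print(pm_values(1, -5, 6))   # roots of x^2 - 5x + 6
```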

When combined with another common (but often implicit) extension of first-order logic, namely the ability to reason using ambient parameters, we become able to formally introduce asymptotic notation such as the big-O notation {O()} or the little-o notation {o()}. We will explain how to do this at the end of this post.


Kaisa Matomäki, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Higher uniformity of arithmetic functions in short intervals I. All intervals“. This paper investigates the higher order (Gowers) uniformity of standard arithmetic functions in analytic number theory (and specifically, the Möbius function {\mu}, the von Mangoldt function {\Lambda}, and the generalised divisor functions {d_k}) in short intervals {(X,X+H]}, where {X} is large and {H} lies in the range {X^{\theta+\varepsilon} \leq H \leq X^{1-\varepsilon}} for a fixed constant {0 < \theta < 1} (that one would like to be as small as possible). If we let {f} denote one of the functions {\mu, \Lambda, d_k}, then there is extensive literature on the estimation of short sums

\displaystyle  \sum_{X < n \leq X+H} f(n)

and some literature also on the estimation of exponential sums such as

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-\alpha n)

for a real frequency {\alpha}, where {e(\theta) := e^{2\pi i \theta}}. For applications in the additive combinatorics of such functions {f}, it is also necessary to consider more general correlations, such as polynomial correlations

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-P(n))

where {P: {\bf Z} \rightarrow {\bf R}} is a polynomial of some fixed degree, or more generally

\displaystyle  \sum_{X < n \leq X+H} f(n) \overline{F}(g(n) \Gamma)

where {G/\Gamma} is a nilmanifold of fixed degree and dimension (and with some control on structure constants), {g: {\bf Z} \rightarrow G} is a polynomial map, and {F: G/\Gamma \rightarrow {\bf C}} is a Lipschitz function (with some bound on the Lipschitz constant). Indeed, thanks to the inverse theorem for the Gowers uniformity norm, such correlations let one control the Gowers uniformity norm of {f} (possibly after subtracting off some renormalising factor) on such short intervals {(X,X+H]}, which can in turn be used to control other multilinear correlations involving such functions.
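To make the objects concrete, the Python sketch below tabulates {\mu} with a sieve and evaluates short-interval correlations {\sum_{X < n \leq X+H} f(n) e(-P(n))} by brute force for a linear and a quadratic phase {P}; the frequency {\alpha = \sqrt{2}} and the interval are arbitrary choices of ours, and the function names are hypothetical. Brute force only computes these sums for modest {X} and says nothing about the estimates discussed in the paper; the nilsequence case is not covered.

```python
import cmath


def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * cmath.pi * x)


def mobius_table(n_max):
    """mu(0..n_max) by a sieve: flip the sign at multiples of each prime p, zero at multiples of p^2."""
    mu = [1] * (n_max + 1)
    mu[0] = 0
    is_prime = [True] * (n_max + 1)
    for p in range(2, n_max + 1):
        if not is_prime[p]:
            continue
        for m in range(2 * p, n_max + 1, p):
            is_prime[m] = False
        for m in range(p, n_max + 1, p):
            mu[m] = -mu[m]
        for m in range(p * p, n_max + 1, p * p):
            mu[m] = 0
    return mu


def short_correlation(f, X, H, P):
    """sum_{X < n <= X+H} f(n) e(-P(n)) for a table f and a phase function P."""
    return sum(f[n] * e(-P(n)) for n in range(X + 1, X + H + 1))


X, H = 10**5, 10**3
mu = mobius_table(X + H)
alpha = 2**0.5
linear = short_correlation(mu, X, H, lambda n: alpha * n)
quadratic = short_correlation(mu, X, H, lambda n: alpha * n * n)
print(abs(linear) / H, abs(quadratic) / H)   # size relative to the trivial bound H
```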

Traditionally, asymptotics for such sums are expressed in terms of a “main term” of some arithmetic nature, plus an error term that is estimated in magnitude. For instance, a sum such as {\sum_{X < n \leq X+H} \Lambda(n) e(-\alpha n)} would be approximated in terms of a main term that vanishes (or is negligible) if {\alpha} is “minor arc”, but would be expressible in terms of something like a Ramanujan sum if {\alpha} is “major arc”, together with an error term. We found it convenient to cancel off such main terms by subtracting an approximant {f^\sharp} from each of the arithmetic functions {f} and then getting upper bounds on remainder correlations such as

\displaystyle  |\sum_{X < n \leq X+H} (f(n)-f^\sharp(n)) \overline{F}(g(n) \Gamma)| \ \ \ \ \ (1)

(actually for technical reasons we also allow the {n} variable to be restricted further to a subprogression of {(X,X+H]}, but let us ignore this minor extension for this discussion). There is some flexibility in how to choose these approximants, but we eventually found it convenient to use the following choices.

  • For the Möbius function {\mu}, we simply set {\mu^\sharp = 0}, as per the Möbius pseudorandomness conjecture. (One could choose a more sophisticated approximant in the presence of a Siegel zero, as I did with Joni in this recent paper, but we do not do so here.)
  • For the von Mangoldt function {\Lambda}, we eventually went with the Cramér-Granville approximant {\Lambda^\sharp(n) = \frac{W}{\phi(W)} 1_{(n,W)=1}}, where {W = \prod_{p < R} p} and {R = \exp(\log^{1/10} X)}.
  • For the divisor functions {d_k}, we used a somewhat complicated-looking approximant {d_k^\sharp(n) = \sum_{m \leq X^{\frac{k-1}{5k}}} P_m(\log n)} for some explicit polynomials {P_m}, chosen so that {d_k^\sharp} and {d_k} have almost exactly the same sums along arithmetic progressions (see the paper for details).
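The von Mangoldt function and its Cramér-Granville approximant can be compared numerically on a single interval. The Python sketch below (names are ours) computes {\Lambda} by trial division and {\Lambda^\sharp} with {W} the product of the primes below {R = \exp(\log^{1/10} X)}. For any {X} within reach of such a computation {R} is tiny (for {X = 10^5} only the primes {2,3} survive), so this only illustrates the definitions, not the quality of the approximation in the paper's regime.

```python
import math


def primes_below(R):
    """Primes p < R (R may be a real number)."""
    return [p for p in range(2, math.ceil(R))
            if all(p % q for q in range(2, math.isqrt(p) + 1))]


def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of a prime p, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:          # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)          # no divisor up to sqrt(n): n is prime


def cramer_granville(n, primes):
    """Lambda^sharp(n) = (W / phi(W)) 1_{(n, W) = 1}, W the product of the given primes."""
    W = math.prod(primes)
    phi_W = math.prod(p - 1 for p in primes)
    return W / phi_W if math.gcd(n, W) == 1 else 0.0


X, H = 10**5, 10**4
R = math.exp(math.log(X) ** 0.1)    # R = exp(log^{1/10} X)
small_primes = primes_below(R)
lam = sum(von_mangoldt(n) for n in range(X + 1, X + H + 1))
lam_sharp = sum(cramer_granville(n, small_primes) for n in range(X + 1, X + H + 1))
print(small_primes, lam / H, lam_sharp / H)
```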

The objective is then to obtain bounds on sums such as (1) that improve upon the “trivial bound” that one can get with the triangle inequality and standard number theory bounds such as the Brun-Titchmarsh inequality. For {\mu} and {\Lambda}, the Siegel-Walfisz theorem suggests that it is reasonable to expect error terms that have “strongly logarithmic savings” in the sense that they gain a factor of {O_A(\log^{-A} X)} over the trivial bound for any {A>0}; for {d_k}, the Dirichlet hyperbola method suggests instead that one has “power savings” in that one should gain a factor of {X^{-c_k}} over the trivial bound for some {c_k>0}. In the case of the Möbius function {\mu}, there is an additional trick (introduced by Matomäki and Teräväinen) that allows one to lower the exponent {\theta} somewhat at the cost of only obtaining “weakly logarithmic savings” of shape {\log^{-c} X} for some small {c>0}.

Our main estimates on sums of the form (1) work in the following ranges:

  • For {\theta=5/8}, one can obtain strongly logarithmic savings on (1) for {f=\mu,\Lambda}, and power savings for {f=d_k}.
  • For {\theta=3/5}, one can obtain weakly logarithmic savings for {f = \mu, d_k}.
  • For {\theta=5/9}, one can obtain power savings for {f=d_3}.
  • For {\theta=1/3}, one can obtain power savings for {f=d_2}.

Conjecturally, one should be able to obtain power savings in all cases, and lower {\theta} down to zero, but the ranges of exponents and savings given here seem to be the limit of current methods unless one assumes additional hypotheses, such as GRH. The {\theta=5/8} result for correlation against Fourier phases {e(\alpha n)} was established previously by Zhan, and the {\theta=3/5} result for such phases and {f=\mu} was established previously by Matomäki and Teräväinen.

By combining these results with tools from additive combinatorics, one can obtain a number of applications:

  • Direct insertion of our bounds in the recent work of Kanigowski, Lemanczyk, and Radziwill on the prime number theorem on dynamical systems that are analytic skew products gives some improvements in the exponents there.
  • We can obtain a “short interval” version of a multiple ergodic theorem along primes established by Frantzikinakis-Host-Kra and Wooley-Ziegler, in which we average over intervals of the form {(X,X+H]} rather than {[1,X]}.
  • We can obtain a “short interval” version of the “linear equations in primes” asymptotics obtained by Ben Green, Tamar Ziegler, and myself in this sequence of papers, where the variables in these equations lie in short intervals {(X,X+H]} rather than long intervals such as {[1,X]}.

We now briefly discuss some of the ingredients of the proof of our main results. The first step is standard, using combinatorial decompositions (based on the Heath-Brown identity and (for the {\theta=3/5} result) the Ramaré identity) to decompose {\mu(n), \Lambda(n), d_k(n)} into more tractable sums of the following types:

  • Type {I} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large;
  • Type {II} sums, which are basically of the form {\sum_{A_- \leq m \leq A_+:m|n} \alpha(m)\beta(n/m)} for some weights {\alpha(m)}, {\beta(n)} of controlled size and some cutoffs {A_-, A_+} that are not too close to {1} or to {X};
  • Type {I_2} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m) d_2(n/m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large.

The precise ranges of the cutoffs {A, A_-, A_+} depend on the choice of {\theta}; our methods fail once these cutoffs pass a certain threshold, and this is the reason for the exponents {\theta} being what they are in our main results.

The Type {I} sums involving nilsequences can be treated by methods similar to those in this previous paper of Ben Green and myself; the main innovations are in the treatment of the Type {II} and Type {I_2} sums.

For the Type {II} sums, one can split into the “abelian” case in which (after some Fourier decomposition) the nilsequence {F(g(n)\Gamma)} is basically of the form {e(P(n))}, and the “non-abelian” case in which {G} is non-abelian and {F} exhibits non-trivial oscillation in a central direction. In the abelian case we can adapt arguments of Matomäki and Shao, which use Cauchy-Schwarz and the equidistribution properties of polynomials to obtain good bounds unless {e(P(n))} is “major arc” in the sense that it resembles (or “pretends to be”) {\chi(n) n^{it}} for some Dirichlet character {\chi} and some frequency {t}, but in this case one can use classical multiplicative methods to control the correlation. It turns out that the non-abelian case can be treated similarly. After applying Cauchy-Schwarz, one ends up analyzing the equidistribution of the four-variable polynomial sequence

\displaystyle  (n,m,n',m') \mapsto (g(nm)\Gamma, g(n'm)\Gamma, g(nm') \Gamma, g(n'm')\Gamma)

as {n,m,n',m'} range in various dyadic intervals. Using the known multidimensional equidistribution theory of polynomial maps in nilmanifolds, one can eventually show in the non-abelian case that this sequence either has enough equidistribution to give cancellation, or else the nilsequence involved can be replaced with one from a lower dimensional nilmanifold, in which case one can apply an induction hypothesis.

For the type {I_2} sum, a model sum to study is

\displaystyle  \sum_{X < n \leq X+H} d_2(n) e(\alpha n)

which one can expand as

\displaystyle  \sum_{n,m: X < nm \leq X+H} e(\alpha nm).
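This expansion is just {d_2(\ell) = \#\{(n,m) \in {\bf N}^2: nm = \ell\}} summed over {X < \ell \leq X+H}. The Python sketch below (names ours, parameters arbitrary) checks it by brute force, summing over the lattice points {(n,m)} in the hyperbolic strip {X < nm \leq X+H}.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * cmath.pi * x)


def d2(n):
    """Number of divisors of n (trial division up to sqrt(n))."""
    count = 0
    for m in range(1, math.isqrt(n) + 1):
        if n % m == 0:
            count += 1 if m * m == n else 2
    return count


def divisor_side(X, H, alpha):
    """sum_{X < n <= X+H} d_2(n) e(alpha n)."""
    return sum(d2(n) * e(alpha * n) for n in range(X + 1, X + H + 1))


def hyperbola_side(X, H, alpha):
    """sum over pairs (n, m) with X < nm <= X+H of e(alpha n m)."""
    total = 0
    for n in range(1, X + H + 1):
        # for fixed n, X < nm <= X+H means X//n < m <= (X+H)//n
        for m in range(X // n + 1, (X + H) // n + 1):
            total += e(alpha * n * m)
    return total


X, H, alpha = 1000, 100, 0.3
print(divisor_side(X, H, alpha), hyperbola_side(X, H, alpha))
```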

We experimented with a number of ways to treat this type of sum (including automorphic form methods, or methods based on the Voronoi formula or van der Corput’s inequality), but somewhat to our surprise, the most efficient approach was an elementary one, in which one uses the Dirichlet approximation theorem to decompose the hyperbolic region {\{ (n,m) \in {\bf N}^2: X < nm \leq X+H \}} into a number of arithmetic progressions, and then uses equidistribution theory to establish cancellation of sequences such as {e(\alpha nm)} on the majority of these progressions. As it turns out, this strategy works well in the regime {H > X^{1/3+\varepsilon}} unless the nilsequence involved is “major arc”, but the latter case is treatable by existing methods as discussed previously; this is why the {\theta} exponent for our {d_2} result can be as low as {1/3}.
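The Dirichlet approximation step is elementary pigeonholing. For orientation, the Python sketch below (names ours) finds by brute force the {1 \leq q \leq Q} for which {q\alpha} is closest to an integer; by Dirichlet's theorem the resulting distance is always at most {1/(Q+1)}. It illustrates only this ingredient, not the decomposition of the hyperbolic region into progressions described above.

```python
import math
import random


def dirichlet_approximation(alpha, Q):
    """Return (q, ||q alpha||) with 1 <= q <= Q minimising the distance from q*alpha to the integers.

    Dirichlet's theorem guarantees that the minimum is at most 1/(Q+1).
    """
    best_q, best_dist = 1, 1.0
    for q in range(1, Q + 1):
        x = q * alpha
        dist = abs(x - round(x))
        if dist < best_dist:
            best_q, best_dist = q, dist
    return best_q, best_dist


q, dist = dirichlet_approximation(math.sqrt(2), 10)
print(q, dist)
```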

In a sequel to this paper (currently in preparation), we will obtain analogous results for almost all intervals {(x,x+H]} with {x} in the range {[X,2X]}, in which we will be able to lower {\theta} all the way to {0}.

Just a brief announcement that the AMS is now accepting (until June 30) nominations for the 2023 Joseph L. Doob Prize, which recognizes a single, relatively recent, outstanding research book that makes a seminal contribution to the research literature, reflects the highest standards of research exposition, and promises to have a deep and long-term impact in its area. The book must have been published within the six calendar years preceding the year in which it is nominated. Books may be nominated by members of the Society, by members of the selection committee, by members of AMS editorial committees, or by publishers.  (I am currently on the committee for this prize.)  A list of previous winners may be found here.  The nomination procedure may be found at the bottom of this page.

Just a brief update to the previous post. Gerhard Paseman and I have now set up a web site for the Short Communication Satellite (SCS) for the virtual International Congress of Mathematicians (ICM), which will be an experimental independent online satellite event in which short communications on topics relevant to one or two of the sections of the ICM can be submitted, reviewed by peers, and (if appropriate for the SCS event) displayed in a virtual “poster room” during the Congress on July 6-14 (which, by the way, has recently released its schedule and list of speakers). Our plan is to open the registration for this event on April 5, and start taking submissions on April 20; we are also currently accepting any expressions of interest in helping out with the event, for instance by serving as a reviewer. For more information about the event, please see the overview page, the guidelines page, and the FAQ page of the web site. As viewers will see, the web site is still somewhat under construction, but will be updated as we move closer to the actual Congress.

The comments section of this post would be a suitable place to ask further questions about this event, or give any additional feedback.

UPDATE: for readers who have difficulty accessing the links above, here are backup copies of the overview page and guidelines page.

[As with previous posts regarding ICM satellite events, I am authoring this post as an individual, and not in my capacity as chair of the ICM Structure Committee, which does not have any advisory or supervisory role over ICM satellite events – T.]

One of the traditional features of the International Congress of Mathematicians is the “short communications”, organized by the local organizing committee (as opposed to the International Mathematical Union), which allow participants at the congress to present either a poster or a short talk (typically 15 minutes or so) during the congress. For instance, here are the titles of the short communications and posters from the 2018 ICM, and here are the short communications and posters from the 2014 ICM. While not as high profile as other events of the ICM such as the plenary lectures, sectional lectures, or prize lectures, the short communications and posters can offer academics from a quite diverse range of institutions worldwide (and even a few independent mathematicians) a chance to present their work to a mathematical audience.

There has been some volunteer effort to try to replicate some form of this event for the upcoming virtual ICM this July as a semi-official “satellite” event of the virtual ICM; it would technically not be part of the core ICM program, but I expect it would be recognized by the IMU as an independently organized satellite. Due to lack of time, funding, and technical expertise, we will not be able to offer any video, audio, or physical hosting for such an event, but we believe that a modest virtual event is possible involving submission of either a PDF “poster” or a PDF “slide deck”, together with other metadata such as author, title, abstract, and external links (e.g., to an externally hosted video presentation of the poster or slides), with some reviewing to ensure a certain minimum level of quality of approved submissions (we are thinking about setting guidelines similar to those required for a submission to the arXiv), and some ability to offer feedback on each submission. (For instance, we are thinking of hosting the event on a MediaWiki, with each communication being given a separate page which can attract discussion and responses to queries from the author(s).) We are also thinking of grouping the poster or slides according to the 20 sections of the 2022 ICM. We would then promote these communications during the virtual ICM, for instance on this blog or on the unofficial ICM Discord. Perhaps some of the other proposed online experiments for virtual events discussed in this previous post could also be implemented experimentally on this satellite event to demonstrate proof-of-concept. (If the event turns out to be successful, one could hope that it could serve as a pilot project for a longer-term and better funded platform for virtual short communications to accompany other conferences, but for now we would like to focus just on the virtual ICM satellite event.)

As one of our first actions, we would like to survey the level of interest in such an event, both among potential submitters of posters or slides, and also potential volunteers to help organize the event (in particular we may need some assistance in manually reviewing submissions, though we do plan to enlist peer reviewers by requiring submitters to rate and comment on other submissions in the same section). We have therefore created a form to (very unscientifically) gauge this level in order to decide on the scale of this project (or whether to attempt it at all). All readers of this blog are welcome to offer feedback through that form, or as a comment to this blog.

EDIT (Mar 29): a formal announcement will be made soon, but you can view a draft of the announcement here.
