You are currently browsing the category archive for the ‘math.GR’ category.

In a recent post I discussed how the Riemann zeta function {\zeta} can be locally approximated by a polynomial, in the sense that for randomly chosen {t \in [T,2T]} one has an approximation

\displaystyle  \zeta(\frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (1)

where {N} grows slowly with {T}, and {P_t} is a polynomial of degree {N}. Assuming the Riemann hypothesis (as we will throughout this post), the zeroes of {P_t} should all lie on the unit circle, and one should then be able to write {P_t} as a scalar multiple of the characteristic polynomial of (the inverse of) a unitary matrix {U = U_t \in U(N)}, which we normalise as

\displaystyle  P_t(Z) = \exp(A_t) \mathrm{det}(1 - ZU). \ \ \ \ \ (2)

Here {A_t} is some quantity depending on {t}. We view {U} as a random element of {U(N)}; in the limit {T \rightarrow \infty}, the GUE hypothesis is equivalent to {U} becoming equidistributed with respect to Haar measure on {U(N)} (also known as the Circular Unitary Ensemble, CUE; it is to the unit circle what the Gaussian Unitary Ensemble (GUE) is on the real line). One can also view {U} as analogous to the “geometric Frobenius” operator in the function field setting, though unfortunately it is difficult at present to make this analogy any more precise (due, among other things, to the lack of a sufficiently satisfactory theory of the “field of one element“).

Taking logarithmic derivatives of (2), we have

\displaystyle  -\frac{P'_t(Z)}{P_t(Z)} = \mathrm{tr}( U (1-ZU)^{-1} ) = \sum_{j=1}^\infty Z^{j-1} \mathrm{tr} U^j \ \ \ \ \ (3)

and hence on taking logarithmic derivatives of (1) in the {z} variable we (heuristically) have

\displaystyle  -\frac{2\pi i}{\log T} \frac{\zeta'}{\zeta}( \frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx \frac{2\pi i}{N} \sum_{j=1}^\infty e^{2\pi i jz/N} \mathrm{tr} U^j.

Morally speaking, we have

\displaystyle  - \frac{\zeta'}{\zeta}( \frac{1}{2} + it - \frac{2\pi i z}{\log T}) = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^{1/2+it}} e^{2\pi i z (\log n/\log T)}

so on comparing coefficients we expect to interpret the moments {\mathrm{tr} U^j} of {U} as a finite Dirichlet series:

\displaystyle  \mathrm{tr} U^j \approx \frac{N}{\log T} \sum_{T^{(j-1)/N} < n \leq T^{j/N}} \frac{\Lambda(n)}{n^{1/2+it}}. \ \ \ \ \ (4)

To understand the distribution of {U} in the unitary group {U(N)}, it suffices to understand the distribution of the moments

\displaystyle  {\bf E}_t \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (5)

where {{\bf E}_t} denotes averaging over {t \in [T,2T]}, and {k, a_1,\dots,a_k, b_1,\dots,b_k \geq 0}. The GUE hypothesis asserts that in the limit {T \rightarrow \infty}, these moments converge to their CUE counterparts

\displaystyle  {\bf E}_{\mathrm{CUE}} \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (6)

where {U} is now drawn uniformly in {U(n)} with respect to the CUE ensemble, and {{\bf E}_{\mathrm{CUE}}} denotes expectation with respect to that measure.

The moment (6) vanishes unless one has the homogeneity condition

\displaystyle  \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j. \ \ \ \ \ (7)

This follows from the fact that for any phase {\theta \in {\bf R}}, {e(\theta) U} has the same distribution as {U}, where we use the number theory notation {e(\theta) := e^{2\pi i\theta}}.

In the case when the degree {\sum_{j=1}^k j a_j} is low, we can use representation theory to establish the following simple formula for the moment (6), as evaluated by Diaconis and Shahshahani:

Proposition 1 (Low moments in CUE model) If

\displaystyle  \sum_{j=1}^k j a_j \leq N, \ \ \ \ \ (8)

then the moment (6) vanishes unless {a_j=b_j} for all {j}, in which case it is equal to

\displaystyle  \prod_{j=1}^k j^{a_j} a_j!. \ \ \ \ \ (9)

Another way of viewing this proposition is that for {U} distributed according to CUE, the random variables {\mathrm{tr} U^j} are distributed like independent complex random variables of mean zero and variance {j}, as long as one only considers moments obeying (8). This identity definitely breaks down for larger values of {a_j}, so one only obtains central limit theorems in certain limiting regimes, notably when one only considers a fixed number of {j}‘s and lets {N} go to infinity. (The paper of Diaconis and Shahshahani writes {\sum_{j=1}^k a_j + b_j} in place of {\sum_{j=1}^k j a_j}, but I believe this to be a typo.)

Proof: Let {D} be the left-hand side of (8). We may assume that (7) holds since we are done otherwise, hence

\displaystyle  D = \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j.

Our starting point is Schur-Weyl duality. Namely, we consider the {n^D}-dimensional complex vector space

\displaystyle  ({\bf C}^n)^{\otimes D} = {\bf C}^n \otimes \dots \otimes {\bf C}^n.

This space has an action of the product group {S_D \times GL_n({\bf C})}: the symmetric group {S_D} acts by permutation on the {D} tensor factors, while the general linear group {GL_n({\bf C})} acts diagonally on the {{\bf C}^n} factors, and the two actions commute with each other. Schur-Weyl duality gives a decomposition

\displaystyle  ({\bf C}^n)^{\otimes D} \equiv \bigoplus_\lambda V^\lambda_{S_D} \otimes V^\lambda_{GL_n({\bf C})} \ \ \ \ \ (10)

where {\lambda} ranges over Young tableaux of size {D} with at most {n} rows, {V^\lambda_{S_D}} is the {S_D}-irreducible unitary representation corresponding to {\lambda} (which can be constructed for instance using Specht modules), and {V^\lambda_{GL_n({\bf C})}} is the {GL_n({\bf C})}-irreducible polynomial representation corresponding with highest weight {\lambda}.

Let {\pi \in S_D} be a permutation consisting of {a_j} cycles of length {j} (this is uniquely determined up to conjugation), and let {g \in GL_n({\bf C})}. The pair {(\pi,g)} then acts on {({\bf C}^n)^{\otimes D}}, with the action on basis elements {e_{i_1} \otimes \dots \otimes e_{i_D}} given by

\displaystyle  g e_{\pi(i_1)} \otimes \dots \otimes g_{\pi(i_D)}.

The trace of this action can then be computed as

\displaystyle  \sum_{i_1,\dots,i_D \in \{1,\dots,n\}} g_{\pi(i_1),i_1} \dots g_{\pi(i_D),i_D}

where {g_{i,j}} is the {ij} matrix coefficient of {g}. Breaking up into cycles and summing, this is just

\displaystyle  \prod_{j=1}^k \mathrm{tr}(g^j)^{a_j}.

But we can also compute this trace using the Schur-Weyl decomposition (10), yielding the identity

\displaystyle  \prod_{j=1}^k \mathrm{tr}(g^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(g) \ \ \ \ \ (11)

where {\chi_\lambda: S_D \rightarrow {\bf C}} is the character on {S_D} associated to {V^\lambda_{S_D}}, and {s_\lambda: GL_n({\bf C}) \rightarrow {\bf C}} is the character on {GL_n({\bf C})} associated to {V^\lambda_{GL_n({\bf C})}}. As is well known, {s_\lambda(g)} is just the Schur polynomial of weight {\lambda} applied to the (algebraic, generalised) eigenvalues of {g}. We can specialise to unitary matrices to conclude that

\displaystyle  \prod_{j=1}^k \mathrm{tr}(U^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(U)

and similarly

\displaystyle  \prod_{j=1}^k \mathrm{tr}(U^j)^{b_j} = \sum_\lambda \chi_\lambda(\pi') s_\lambda(U)

where {\pi' \in S_D} consists of {b_j} cycles of length {j} for each {j=1,\dots,k}. On the other hand, the characters {s_\lambda} are an orthonormal system on {L^2(U(N))} with the CUE measure. Thus we can write the expectation (6) as

\displaystyle  \sum_\lambda \chi_\lambda(\pi) \overline{\chi_\lambda(\pi')}. \ \ \ \ \ (12)

Now recall that {\lambda} ranges over all the Young tableaux of size {D} with at most {N} rows. But by (8) we have {D \leq N}, and so the condition of having {N} rows is redundant. Hence {\lambda} now ranges over all Young tableaux of size {D}, which as is well known enumerates all the irreducible representations of {S_D}. One can then use the standard orthogonality properties of characters to show that the sum (12) vanishes if {\pi}, {\pi'} are not conjugate, and is equal to {D!} divided by the size of the conjugacy class of {\pi} (or equivalently, by the size of the centraliser of {\pi}) otherwise. But the latter expression is easily computed to be {\prod_{j=1}^k j^{a_j} a_j!}, giving the claim. \Box

Example 2 We illustrate the identity (11) when {D=3}, {n \geq 3}. The Schur polynomials are given as

\displaystyle  s_{3}(g) = \sum_i \lambda_i^3 + \sum_{i<j} \lambda_i^2 \lambda_j + \lambda_i \lambda_j^2 + \sum_{i<j<k} \lambda_i \lambda_j \lambda_k

\displaystyle  s_{2,1}(g) = \sum_{i < j} \lambda_i^2 \lambda_j + \sum_{i < j,k} \lambda_i \lambda_j \lambda_k

\displaystyle  s_{1,1,1}(g) = \sum_{i<j<k} \lambda_i \lambda_j \lambda_k

where {\lambda_1,\dots,\lambda_n} are the (generalised) eigenvalues of {g}, and the formula (11) in this case becomes

\displaystyle  \mathrm{tr}(g^3) = s_{3}(g) - s_{2,1}(g) + s_{1,1,1}(g)

\displaystyle  \mathrm{tr}(g^2) \mathrm{tr}(g) = s_{3}(g) - s_{1,1,1}(g)

\displaystyle  \mathrm{tr}(g)^3 = s_{3}(g) + 2 s_{2,1}(g) + s_{1,1,1}(g).

The functions {s_{1,1,1}, s_{2,1}, s_3} are orthonormal on {U(n)}, so the three functions {\mathrm{tr}(g^3), \mathrm{tr}(g^2) \mathrm{tr}(g), \mathrm{tr}(g)^3} are also, and their {L^2} norms are {\sqrt{3}}, {\sqrt{2}}, and {\sqrt{6}} respectively, reflecting the size in {S_3} of the centralisers of the permutations {(123)}, {(12)}, and {\mathrm{id}} respectively. If {n} is instead set to say {2}, then the {s_{1,1,1}} terms now disappear (the Young tableau here has too many rows), and the three quantities here now have some non-trivial covariance.

Example 3 Consider the moment {{\bf E}_{\mathrm{CUE}} |\mathrm{tr} U^j|^2}. For {j \leq N}, the above proposition shows us that this moment is equal to {D}. What happens for {j>N}? The formula (12) computes this moment as

\displaystyle  \sum_\lambda |\chi_\lambda(\pi)|^2

where {\pi} is a cycle of length {j} in {S_j}, and {\lambda} ranges over all Young tableaux with size {j} and at most {N} rows. The Murnaghan-Nakayama rule tells us that {\chi_\lambda(\pi)} vanishes unless {\lambda} is a hook (all but one of the non-zero rows consisting of just a single box; this also can be interpreted as an exterior power representation on the space {{\bf C}^j_{\sum=0}} of vectors in {{\bf C}^j} whose coordinates sum to zero), in which case it is equal to {\pm 1} (depending on the parity of the number of non-zero rows). As such we see that this moment is equal to {N}. Thus in general we have

\displaystyle  {\bf E}_{\mathrm{CUE}} |\mathrm{tr} U^j|^2 = \min(j,N). \ \ \ \ \ (13)

Now we discuss what is known for the analogous moments (5). Here we shall be rather non-rigorous, in particular ignoring an annoying “Archimedean” issue that the product of the ranges {T^{(j-1)/N} < n \leq T^{j/N}} and {T^{(k-1)/N} < n \leq T^{k/N}} is not quite the range {T^{(j+k-1)/N} < n \leq T^{j+k/N}} but instead leaks into the adjacent range {T^{(j+k-2)/N} < n \leq T^{j+k-1/N}}. This issue can be addressed by working in a “weak" sense in which parameters such as {j,k} are averaged over fairly long scales, or by passing to a function field analogue of these questions, but we shall simply ignore the issue completely and work at a heuristic level only. For similar reasons we will ignore some technical issues arising from the sharp cutoff of {t} to the range {[T,2T]} (it would be slightly better technically to use a smooth cutoff).

One can morally expand out (5) using (4) as

\displaystyle  (\frac{N}{\log T})^{J+K} \sum_{n_1,\dots,n_J,m_1,\dots,m_K} \frac{\Lambda(n_1) \dots \Lambda(n_J) \Lambda(m_1) \dots \Lambda(m_K)}{n_1^{1/2} \dots n_J^{1/2} m_1^{1/2} \dots m_K^{1/2}} \times \ \ \ \ \ (14)

\displaystyle  \times {\bf E}_t (m_1 \dots m_K / n_1 \dots n_J)^{it}

where {J := \sum_{j=1}^k a_j}, {K := \sum_{j=1}^k b_j}, and the integers {n_i,m_i} are in the ranges

\displaystyle  T^{(j-1)/N} < n_{a_1 + \dots + a_{j-1} + i} \leq T^{j/N}

for {j=1,\dots,k} and {1 \leq i \leq a_j}, and

\displaystyle  T^{(j-1)/N} < m_{b_1 + \dots + b_{j-1} + i} \leq T^{j/N}

for {j=1,\dots,k} and {1 \leq i \leq b_j}. Morally, the expectation here is negligible unless

\displaystyle  m_1 \dots m_K = (1 + O(1/T)) n_1 \dots n_J \ \ \ \ \ (15)

in which case the expecation is oscillates with magnitude one. In particular, if (7) fails (with some room to spare) then the moment (5) should be negligible, which is consistent with the analogous behaviour for the moments (6). Now suppose that (8) holds (with some room to spare). Then {n_1 \dots n_J} is significantly less than {T}, so the {O(1/T)} multiplicative error in (15) becomes an additive error of {o(1)}. On the other hand, because of the fundamental integrality gap – that the integers are always separated from each other by a distance of at least {1} – this forces the integers {m_1 \dots m_K}, {n_1 \dots n_J} to in fact be equal:

\displaystyle  m_1 \dots m_K = n_1 \dots n_J. \ \ \ \ \ (16)

The von Mangoldt factors {\Lambda(n_1) \dots \Lambda(n_J) \Lambda(m_1) \dots \Lambda(m_K)} effectively restrict {n_1,\dots,n_J,m_1,\dots,m_K} to be prime (the effect of prime powers is negligible). By the fundamental theorem of arithmetic, the constraint (16) then forces {J=K}, and {n_1,\dots,n_J} to be a permutation of {m_1,\dots,m_K}, which then forces {a_j = b_j} for all {j=1,\dots,k}._ For a given {n_1,\dots,n_J}, the number of possible {m_1 \dots m_K} is then {\prod_{j=1}^k a_j!}, and the expectation in (14) is equal to {1}. Thus this expectation is morally

\displaystyle  (\frac{N}{\log T})^{J+K} \sum_{n_1,\dots,n_J} \frac{\Lambda^2(n_1) \dots \Lambda^2(n_J) }{n_1 \dots n_J} \prod_{j=1}^k a_j!

and using Mertens’ theorem this soon simplifies asymptotically to the same quantity in Proposition 1. Thus we see that (morally at least) the moments (5) associated to the zeta function asymptotically match the moments (6) coming from the CUE model in the low degree case (8), thus lending support to the GUE hypothesis. (These observations are basically due to Rudnick and Sarnak, with the degree {1} case of pair correlations due to Montgomery, and the degree {2} case due to Hejhal.)

With some rare exceptions (such as those estimates coming from “Kloostermania”), the moment estimates of Rudnick and Sarnak basically represent the state of the art for what is known for the moments (5). For instance, Montgomery’s pair correlation conjecture, in our language, is basically the analogue of (13) for {{\mathbf E}_t}, thus

\displaystyle  {\bf E}_{t} |\mathrm{tr} U^j|^2 \approx \min(j,N) \ \ \ \ \ (17)

for all {j \geq 0}. Montgomery showed this for (essentially) the range {j \leq N} (as remarked above, this is a special case of the Rudnick-Sarnak result), but no further cases of this conjecture are known.

These estimates can be used to give some non-trivial information on the largest and smallest spacings between zeroes of the zeta function, which in our notation corresponds to spacing between eigenvalues of {U}. One such method used today for this is due to Montgomery and Odlyzko and was greatly simplified by Conrey, Ghosh, and Gonek. The basic idea, translated to our random matrix notation, is as follows. Suppose {Q_t(Z)} is some random polynomial depending on {t} of degree at most {N}. Let {\lambda_1,\dots,\lambda_n} denote the eigenvalues of {U}, and let {c > 0} be a parameter. Observe from the pigeonhole principle that if the quantity

\displaystyle  \sum_{j=1}^n \int_0^{c/N} |Q_t( e(\theta) \lambda_j )|^2\ d\theta \ \ \ \ \ (18)

exceeds the quantity

\displaystyle  \int_{0}^{2\pi} |Q_t(e(\theta))|^2\ d\theta, \ \ \ \ \ (19)

then the arcs {\{ e(\theta) \lambda_j: 0 \leq \theta \leq c \}} cannot all be disjoint, and hence there exists a pair of eigenvalues making an angle of less than {c/N} ({c} times the mean angle separation). Similarly, if the quantity (18) falls below that of (19), then these arcs cannot cover the unit circle, and hence there exists a pair of eigenvalues making an angle of greater than {c} times the mean angle separation. By judiciously choosing the coefficients of {Q_t} as functions of the moments {\mathrm{tr}(U^j)}, one can ensure that both quantities (18), (19) can be computed by the Rudnick-Sarnak estimates (or estimates of equivalent strength); indeed, from the residue theorem one can write (18) as

\displaystyle  \frac{1}{2\pi i} \int_0^{c/N} (\int_{|z| = 1+\varepsilon} - \int_{|z|=1-\varepsilon}) Q_t( e(\theta) z ) \overline{Q_t}( \frac{1}{e(\theta) z} ) \frac{P'_t(z)}{P_t(z)}\ dz

for sufficiently small {\varepsilon>0}, and this can be computed (in principle, at least) using (3) if the coefficients of {Q_t} are in an appropriate form. Using this sort of technology (translated back to the Riemann zeta function setting), one can show that gaps between consecutive zeroes of zeta are less than {\mu} times the mean spacing and greater than {\lambda} times the mean spacing infinitely often for certain {0 < \mu < 1 < \lambda}; the current records are {\mu = 0.50412} (due to Goldston and Turnage-Butterbaugh) and {\lambda = 3.18} (due to Bui and Milinovich, who input some additional estimates beyond the Rudnick-Sarnak set, namely the twisted fourth moment estimates of Bettin, Bui, Li, and Radziwill, and using a technique based on Hall’s method rather than the Montgomery-Odlyzko method).

It would be of great interest if one could push the upper bound {\mu} for the smallest gap below {1/2}. The reason for this is that this would then exclude the Alternative Hypothesis that the spacing between zeroes are asymptotically always (or almost always) a non-zero half-integer multiple of the mean spacing, or in our language that the gaps between the phases {\theta} of the eigenvalues {e^{2\pi i\theta}} of {U} are nasymptotically always non-zero integer multiples of {1/2N}. The significance of this hypothesis is that it is implied by the existence of a Siegel zero (of conductor a small power of {T}); see this paper of Conrey and Iwaniec. (In our language, what is going on is that if there is a Siegel zero in which {L(1,\chi)} is very close to zero, then {1*\chi} behaves like the Kronecker delta, and hence (by the Riemann-Siegel formula) the combined {L}-function {\zeta(s) L(s,\chi)} will have a polynomial approximation which in our language looks like a scalar multiple of {1 + e(\theta) Z^{2N+M}}, where {q \approx T^{M/N}} and {\theta} is a phase. The zeroes of this approximation lie on a coset of the {(2N+M)^{th}} roots of unity; the polynomial {P} is a factor of this approximation and hence will also lie in this coset, implying in particular that all eigenvalue spacings are multiples of {1/(2N+M)}. Taking {M = o(N)} then gives the claim.)

Unfortunately, the known methods do not seem to break this barrier without some significant new input; already the original paper of Montgomery and Odlyzko observed this limitation for their particular technique (and in fact fall very slightly short, as observed in unpublished work of Goldston and of Milinovich). In this post I would like to record another way to see this, by providing an “alternative” probability distribution to the CUE distribution (which one might dub the Alternative Circular Unitary Ensemble (ACUE) which is indistinguishable in low moments in the sense that the expectation {{\bf E}_{ACUE}} for this model also obeys Proposition 1, but for which the phase spacings are always a multiple of {1/2N}. This shows that if one is to rule out the Alternative Hypothesis (and thus in particular rule out Siegel zeroes), one needs to input some additional moment information beyond Proposition 1. It would be interesting to see if any of the other known moment estimates that go beyond this proposition are consistent with this alternative distribution. (UPDATE: it looks like they are, see Remark 7 below.)

To describe this alternative distribution, let us first recall the Weyl description of the CUE measure on the unitary group {U(n)} in terms of the distribution of the phases {\theta_1,\dots,\theta_N \in {\bf R}/{\bf Z}} of the eigenvalues, randomly permuted in any order. This distribution is given by the probability measure

\displaystyle  \frac{1}{N!} |V(\theta)|^2\ d\theta_1 \dots d\theta_N; \ \ \ \ \ (20)

where

\displaystyle  V(\theta) := \prod_{1 \leq i<j \leq N} (e(\theta_i)-e(\theta_j))

is the Vandermonde determinant; see for instance this previous blog post for the derivation of a very similar formula for the GUE distribution, which can be adapted to CUE without much difficulty. To see that this is a probability measure, first observe the Vandermonde determinant identity

\displaystyle  V(\theta) = \sum_{\pi \in S_N} \mathrm{sgn}(\pi) e(\theta \cdot \pi(\rho))

where {\theta := (\theta_1,\dots,\theta_N)}, {\cdot} denotes the dot product, and {\rho := (1,2,\dots,N)} is the “long word”, which implies that (20) is a trigonometric series with constant term {1}; it is also clearly non-negative, so it is a probability measure. One can thus generate a random CUE matrix by first drawing {(\theta_1,\dots,\theta_n) \in ({\bf R}/{\bf Z})^N} using the probability measure (20), and then generating {U} to be a random unitary matrix with eigenvalues {e(\theta_1),\dots,e(\theta_N)}.

For the alternative distribution, we first draw {(\theta_1,\dots,\theta_N)} on the discrete torus {(\frac{1}{2N}{\bf Z}/{\bf Z})^N} (thus each {\theta_j} is a {2N^{th}} root of unity) with probability density function

\displaystyle  \frac{1}{(2N)^N} \frac{1}{N!} |V(\theta)|^2 \ \ \ \ \ (21)

shift by a phase {\alpha \in {\bf R}/{\bf Z}} drawn uniformly at random, and then select {U} to be a random unitary matrix with eigenvalues {e^{i(\theta_1+\alpha)}, \dots, e^{i(\theta_N+\alpha)}}. Let us first verify that (21) is a probability density function. Clearly it is non-negative. It is the linear combination of exponentials of the form {e(\theta \cdot (\pi(\rho)-\pi'(\rho))} for {\pi,\pi' \in S_N}. The diagonal contribution {\pi=\pi'} gives the constant function {\frac{1}{(2N)^N}}, which has total mass one. All of the other exponentials have a frequency {\pi(\rho)-\pi'(\rho)} that is not a multiple of {2N}, and hence will have mean zero on {(\frac{1}{2N}{\bf Z}/{\bf Z})^N}. The claim follows.

From construction it is clear that the matrix {U} drawn from this alternative distribution will have all eigenvalue phase spacings be a non-zero multiple of {1/2N}. Now we verify that the alternative distribution also obeys Proposition 1. The alternative distribution remains invariant under rotation by phases, so the claim is again clear when (8) fails. Inspecting the proof of that proposition, we see that it suffices to show that the Schur polynomials {s_\lambda} with {\lambda} of size at most {N} and of equal size remain orthonormal with respect to the alternative measure. That is to say,

\displaystyle  \int_{U(N)} s_\lambda(U) \overline{s_{\lambda'}(U)}\ d\mu_{\mathrm{CUE}}(U) = \int_{U(N)} s_\lambda(U) \overline{s_{\lambda'}(U)}\ d\mu_{\mathrm{ACUE}}(U)

when {\lambda,\lambda'} have size equal to each other and at most {N}. In this case the phase {\alpha} in the definition of {U} is irrelevant. In terms of eigenvalue measures, we are then reduced to showing that

\displaystyle  \int_{({\bf R}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2\ d\theta = \frac{1}{(2N)^N} \sum_{\theta \in (\frac{1}{2N}{\bf Z}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2.

By Fourier decomposition, it then suffices to show that the trigonometric polynomial {s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2} does not contain any components of the form {e( \theta \cdot 2N k)} for some non-zero lattice vector {k \in {\bf Z}^N}. But we have already observed that {|V(\theta)|^2} is a linear combination of plane waves of the form {e(\theta \cdot (\pi(\rho)-\pi'(\rho))} for {\pi,\pi' \in S_N}. Also, as is well known, {s_\lambda(\theta)} is a linear combination of plane waves {e( \theta \cdot \kappa )} where {\kappa} is majorised by {\lambda}, and similarly {s_{\lambda'}(\theta)} is a linear combination of plane waves {e( \theta \cdot \kappa' )} where {\kappa'} is majorised by {\lambda'}. So the product {s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2} is a linear combination of plane waves of the form {e(\theta \cdot (\kappa - \kappa' + \pi(\rho) - \pi'(\rho)))}. But every coefficient of the vector {\kappa - \kappa' + \pi(\rho) - \pi'(\rho)} lies between {1-2N} and {2N-1}, and so cannot be of the form {2Nk} for any non-zero lattice vector {k}, giving the claim.

Example 4 If {N=2}, then the distribution (21) assigns a probability of {\frac{1}{4^2 2!} 2} to any pair {(\theta_1,\theta_2) \in (\frac{1}{4} {\bf Z}/{\bf Z})^2} that is a permuted rotation of {(0,\frac{1}{4})}, and a probability of {\frac{1}{4^2 2!} 4} to any pair that is a permuted rotation of {(0,\frac{1}{2})}. Thus, a matrix {U} drawn from the alternative distribution will be conjugate to a phase rotation of {\mathrm{diag}(1, i)} with probability {1/2}, and to {\mathrm{diag}(1,-1)} with probability {1/2}.

A similar computation when {N=3} gives {U} conjugate to a phase rotation of {\mathrm{diag}(1, e(1/6), e(1/3))} with probability {1/12}, to a phase rotation of {\mathrm{diag}( 1, e(1/6), -1)} or its adjoint with probability of {1/3} each, and a phase rotation of {\mathrm{diag}(1, e(1/3), e(2/3))} with probability {1/4}.

Remark 5 For large {N} it does not seem that this specific alternative distribution is the only distribution consistent with Proposition 1 and which has all phase spacings a non-zero multiple of {1/2N}; in particular, it may not be the only distribution consistent with a Siegel zero. Still, it is a very explicit distribution that might serve as a test case for the limitations of various arguments for controlling quantities such as the largest or smallest spacing between zeroes of zeta. The ACUE is in some sense the distribution that maximally resembles CUE (in the sense that it has the greatest number of Fourier coefficients agreeing) while still also being consistent with the Alternative Hypothesis, and so should be the most difficult enemy to eliminate if one wishes to disprove that hypothesis.

In some cases, even just a tiny improvement in known results would be able to exclude the alternative hypothesis. For instance, if the alternative hypothesis held, then {|\mathrm{tr}(U^j)|} is periodic in {j} with period {2N}, so from Proposition 1 for the alternative distribution one has

\displaystyle  {\bf E}_{\mathrm{ACUE}} |\mathrm{tr} U^j|^2 = \min_{k \in {\bf Z}} |j-2Nk|

which differs from (13) for any {|j| > N}. (This fact was implicitly observed recently by Baluyot, in the original context of the zeta function.) Thus a verification of the pair correlation conjecture (17) for even a single {j} with {|j| > N} would rule out the alternative hypothesis. Unfortunately, such a verification appears to be on comparable difficulty with (an averaged version of) the Hardy-Littlewood conjecture, with power saving error term. (This is consistent with the fact that Siegel zeroes can cause distortions in the Hardy-Littlewood conjecture, as (implicitly) discussed in this previous blog post.)

Remark 6 One can view the CUE as normalised Lebesgue measure on {U(N)} (viewed as a smooth submanifold of {{\bf C}^{N^2}}). One can similarly view ACUE as normalised Lebesgue measure on the (disconnected) smooth submanifold of {U(N)} consisting of those unitary matrices whose phase spacings are non-zero integer multiples of {1/2N}; informally, ACUE is CUE restricted to this lower dimensional submanifold. As is well known, the phases of CUE eigenvalues form a determinantal point process with kernel {K(\theta,\theta') = \frac{1}{N} \sum_{j=0}^{N-1} e(j(\theta - \theta'))} (or one can equivalently take {K(\theta,\theta') = \frac{\sin(\pi N (\theta-\theta'))}{N\sin(\pi(\theta-\theta'))}}; in a similar spirit, the phases of ACUE eigenvalues, once they are rotated to be {2N^{th}} roots of unity, become a discrete determinantal point process on those roots of unity with exactly the same kernel (except for a normalising factor of {\frac{1}{2}}). In particular, the {k}-point correlation functions of ACUE (after this rotation) are precisely the restriction of the {k}-point correlation functions of CUE after normalisation, that is to say they are proportional to {\mathrm{det}( K( \theta_i,\theta_j) )_{1 \leq i,j \leq k}}.

Remark 7 One family of estimates that go beyond the Rudnick-Sarnak family of estimates are twisted moment estimates for the zeta function, such as ones that give asymptotics for

\displaystyle  \int_T^{2T} |\zeta(\frac{1}{2}+it)|^{2k} |Q(\frac{1}{2}+it)|^2\ dt

for some small even exponent {2k} (almost always {2} or {4}) and some short Dirichlet polynomial {Q}; see for instance this paper of Bettin, Bui, Li, and Radziwill for some examples of such estimates. The analogous unitary matrix average would be something like

\displaystyle  {\bf E}_t |P_t(1)|^{2k} |Q_t(1)|^2

where {Q_t} is now some random medium degree polynomial that depends on the unitary matrix {U} associated to {P_t} (and in applications will typically also contain some negative power of {\exp(A_t)} to cancel the corresponding powers of {\exp(A_t)} in {|P_t(1)|^{2k}}). Unfortunately such averages generally are unable to distinguish the CUE from the ACUE. For instance, if all the coefficients of {Q} involve products of traces {\mathrm{tr}(U^k)} of total order less than {N-k}, then in terms of the eigenvalue phases {\theta}, {|Q(1)|^2} is a linear combination of plane waves {e(\theta \cdot \xi)} where the frequencies {\xi} have coefficients of magnitude less than {N-k}. On the other hand, as each coefficient of {P_t} is an elementary symmetric function of the eigenvalues, {P_t(1)} is a linear combination of plane waves {e(\theta \cdot \xi)} where the frequencies {\xi} have coefficients of magnitude at most {1}. Thus {|P_t(1)|^{2k} |Q_t(1)|^2} is a linear combination of plane waves where the frequencies {\xi} have coefficients of magnitude less than {N}, and thus is orthogonal to the difference between the CUE and ACUE measures on the phase torus {({\bf R}/{\bf Z})^n} by the previous arguments. In other words, {|P_t(1)|^{2k} |Q_t(1)|^2} has the same expectation with respect to ACUE as it does with respect to CUE. Thus one can only start distinguishing CUE from ACUE if the mollifier {Q_t} has degree close to or exceeding {N}, which corresponds to Dirichlet polynomials {Q} of length close to or exceeding {T}, which is far beyond current technology for such moment estimates.

Remark 8 The GUE hypothesis for the zeta function asserts that the average

\displaystyle  \lim_{T \rightarrow \infty} \frac{1}{T} \int_T^{2T} \sum_{\gamma_1,\dots,\gamma_n \hbox{ distinct}} \eta( \frac{\log T}{2\pi}(\gamma_1-t),\dots, \frac{\log T}{2\pi}(\gamma_k-t))\ dt \ \ \ \ \ (22)

is equal to

\displaystyle  \int_{{\bf R}^n} \eta(x) \det(K(x_i-x_j))_{1 \leq i,j \leq k}\ dx_1 \dots dx_k \ \ \ \ \ (23)

for any {k \geq 1} and any test function {\eta: {\bf R}^k \rightarrow {\bf C}}, where {K(x) := \frac{\sin \pi x}{\pi x}} is the Dyson sine kernel and {\gamma_i} are the ordinates of zeroes of the zeta function. This corresponds to the CUE distribution for {U}. The ACUE distribution then corresponds to an “alternative gaussian unitary ensemble (AGUE)” hypothesis, in which the average (22) is instead predicted to equal a Riemann sum version of the integral (23):

\displaystyle  \int_0^1 2^{-k} \sum_{x_1,\dots,x_k \in \frac{1}{2} {\bf Z} + \theta} \eta(x) \det(K(x_i-x_j))_{1 \leq i,j \leq k}\ d\theta.

This is a stronger version of the alternative hypothesis that the spacing between adjacent zeroes is almost always approximately a half-integer multiple of the mean spacing. I do not know of any known moment estimates for Dirichlet series that is able to eliminate this AGUE hypothesis (even assuming GRH). (UPDATE: These facts have also been independently observed in forthcoming work of Lagarias and Rodgers.)

Let {G = (G,+)}, {H = (H,+)} be additive groups (i.e., groups with an abelian addition group law). A map {f: G \rightarrow H} is a homomorphism if one has

\displaystyle  f(x+y) - f(x) - f(y) = 0

for all {x,y \in G}. A map {f: G \rightarrow H} is an affine homomorphism if one has

\displaystyle  f(x_1) - f(x_2) + f(x_3) - f(x_4) = 0 \ \ \ \ \ (1)

for all additive quadruples {(x_1,x_2,x_3,x_4)} in {G}, by which we mean that {x_1,x_2,x_3,x_4 \in G} and {x_1-x_2+x_3-x_4=0}. The two notions are closely related; it is easy to verify that {f} is an affine homomorphism if and only if {f} is the sum of a homomorphism and a constant.

Now suppose that {H} also has a translation-invariant metric {d}. A map {f: G \rightarrow H} is said to be a quasimorphism if one has

\displaystyle  f(x+y) - f(x) - f(y) = O(1) \ \ \ \ \ (2)

for all {x,y \in G}, where {O(1)} denotes a quantity at a bounded distance from the origin. Similarly, {f: G \rightarrow H} is an affine quasimorphism if

\displaystyle  f(x_1) - f(x_2) + f(x_3) - f(x_4) = O(1) \ \ \ \ \ (3)

for all additive quadruples {(x_1,x_2,x_3,x_4)} in {G}. Again, one can check that {f} is an affine quasimorphism if and only if it is the sum of a quasimorphism and a constant (with the implied constant of the quasimorphism controlled by the implied constant of the affine quasimorphism). (Since every constant is itself a quasimorphism, it is in fact the case that affine quasimorphisms are quasimorphisms, but now the implied constant in the latter is not controlled by the implied constant of the former.)

“Trivial” examples of quasimorphisms include the sum of a homomorphism and a bounded function. Are there others? In some cases, the answer is no. For instance, suppose we have a quasimorphism {f: {\bf Z} \rightarrow {\bf R}}. Iterating (2), we see that {f(kx) = kf(x) + O(k)} for any integer {x} and natural number {k}, which we can rewrite as {f(kx)/kx = f(x)/x + O(1/|x|)} for non-zero {x}. Also, {f} is Lipschitz. Sending {k \rightarrow \infty}, we can verify that {f(x)/x} is a Cauchy sequence as {x \rightarrow \infty} and thus tends to some limit {\alpha}; we have {\alpha = f(x)/x + O(1/x)} for {x \geq 1}, hence {f(x) = \alpha x + O(1)} for positive {x}, and then one can use (2) one last time to obtain {f(x) = \alpha x + O(1)} for all {x}. Thus {f} is the sum of the homomorphism {x \mapsto \alpha x} and a bounded sequence.

In general, one can phrase this problem in the language of group cohomology (discussed in this previous post). Call a map {f: G \rightarrow H} a {0}-cocycle. A {1}-cocycle is a map {\rho: G \times G \rightarrow H} obeying the identity

\displaystyle  \rho(x,y+z) + \rho(y,z) = \rho(x,y) + \rho(x+y,z)

for all {x,y,z \in G}. Given a {0}-cocycle {f: G \rightarrow H}, one can form its derivative {\partial f: G \times G \rightarrow H} by the formula

\displaystyle  \partial f(x,y) := f(x+y)-f(x)-f(y).

Such functions are called {1}-coboundaries. It is easy to see that the abelian group of {1}-coboundaries is a subgroup of the abelian group of {1}-cocycles. The quotient of these two groups is the first group cohomology of {G} with coefficients in {H}, and is denoted {H^1(G; H)}.

If a {0}-cocycle is bounded then its derivative is a bounded {1}-coboundary. The quotient of the group of bounded {1}-cocycles by the derivatives of bounded {0}-cocycles is called the bounded first group cohomology of {G} with coefficients in {H}, and is denoted {H^1_b(G; H)}. There is an obvious homomorphism {\phi} from {H^1_b(G; H)} to {H^1(G; H)}, formed by taking a coset of the space of derivatives of bounded {0}-cocycles, and enlarging it to a coset of the space of {1}-coboundaries. By chasing all the definitions, we see that all quasimorphism from {G} to {H} are the sum of a homomorphism and a bounded function if and only if this homomorphism {\phi} is injective; in fact the quotient of the space of quasimorphisms by the sum of homomorphisms and bounded functions is isomorphic to the kernel of {\phi}.

In additive combinatorics, one is often working with functions which only have additive structure a fraction of the time, thus for instance (1) or (3) might only hold “{1\%} of the time”. This makes it somewhat difficult to directly interpret the situation in terms of group cohomology. However, thanks to tools such as the Balog-Szemerédi-Gowers lemma, one can upgrade this sort of {1\%}-structure to {100\%}-structure – at the cost of restricting the domain to a smaller set. Here I record one such instance of this phenomenon, thus giving a tentative link between additive combinatorics and group cohomology. (I thank Yuval Wigderson for suggesting the problem of locating such a link.)

Theorem 1 Let {G = (G,+)}, {H = (H,+)} be additive groups with {|G|=N}, let {S} be a subset of {H}, let {E \subset G}, and let {f: E \rightarrow H} be a function such that

\displaystyle  f(x_1) - f(x_2) + f(x_3) - f(x_4) \in S

for {\geq K^{-1} N^3} additive quadruples {(x_1,x_2,x_3,x_4)} in {E}. Then there exists a subset {A} of {G} containing {0} with {|A| \gg K^{-O(1)} N}, a subset {X} of {H} with {|X| \ll K^{O(1)}}, and a function {g: 4A-4A \rightarrow H} such that

\displaystyle  g(x+y) - g(x)-g(y) \in X + 496S - 496S \ \ \ \ \ (4)

for all {x, y \in 2A-2A} (thus, the derivative {\partial g} takes values in {X + 496 S - 496 S} on {2A - 2A}), and such that for each {h \in A}, one has

\displaystyle  f(x+h) - f(x) - g(h) \in 8S - 8S \ \ \ \ \ (5)

for {\gg K^{-O(1)} N} values of {x \in E}.

Presumably the constants {8} and {496} can be improved further, but we have not attempted to optimise these constants. We chose {2A-2A} as the domain on which one has a bounded derivative, as one can use the Bogulybov lemma (see e.g, Proposition 4.39 of my book with Van Vu) to find a large Bohr set inside {2A-2A}. In applications, the set {S} need not have bounded size, or even bounded doubling; for instance, in the inverse {U^4} theory over a small finite fields {F}, one would be interested in the situation where {H} is the group of {n \times n} matrices with coefficients in {F} (for some large {n}, and {S} being the subset consisting of those matrices of rank bounded by some bound {C = O(1)}.

Proof: By hypothesis, there are {\geq K N^3} triples {(h,x,y) \in G^3} such that {x,x+h,y,y+h \in E} and

\displaystyle  f(x+h) - f(x) \in f(y+h)-f(y) + S. \ \ \ \ \ (6)

Thus, there is a set {B \subset G} with {|B| \gg K^{-1} N} such that for all {h \in B}, one has (6) for {\gg K^{-1} N^2} pairs {(x,y) \in G^2} with {x,x+h,y,y+h \in E}; in particular, there exists {y = y(h) \in E \cap (E-h)} such that (6) holds for {\gg K^{-1} N} values of {x \in E \cap (E-h)}. Setting {g_0(h) := f(y(h)+h) - f(y(h))}, we conclude that for each {h \in B}, one has

\displaystyle  f(x+h) - f(x) \in g_0(h) + S \ \ \ \ \ (7)

for {\gg K^{-1} N} values of {x \in E \cap (E-h)}.

Consider the bipartite graph whose vertex sets are two copies of {E}, and {x} and {x+h} connected by a (directed) edge if {h \in B} and (7) holds. Then this graph has {\gg K^{-2} N^2} edges. Applying (a slight modification of) the Balog-Szemerédi-Gowers theorem (for instance by modifying the proof of Corollary 5.19 of my book with Van Vu), we can then find a subset {C} of {E} with {|C| \gg K^{-O(1)} N} with the property that for any {x_1,x_3 \in C}, there exist {\gg K^{-O(1)} N^3} triples {(x_2,y_1,y_2) \in E^3} such that the edges {(x_1,y_1), (x_2,y_1), (x_2,y_2), (x_3,y_2)} all lie in this bipartite graph. This implies that, for all {x_1,x_3 \in C}, there exist {\gg K^{-O(1)} N^7} septuples {(x_2,y_1,y_2,z_{11},z_{21},z_{22},z_{32}) \in G^7} obeying the constraints

\displaystyle  f(y_j) - f(x_i), f(y_j+z_{ij}) - f(x_i+z_{ij}) \in g_0(y_j-x_i) + S

and {y_j, x_i, y_j+z_{ij}, x_i+z_{ij} \in E} for {ij = 11, 21, 22, 32}. These constraints imply in particular that

\displaystyle  f(x_3) - f(x_1) \in f(x_3+z_{32}) - f(y_2+z_{32}) + f(y_2+z_{22}) - f(x_2+z_{22}) + f(x_2+z_{21}) - f(y_1+z_{21}) + f(y_1+z_{11}) - f(x_1+z_{11}) + 4S - 4S.

Also observe that

\displaystyle  x_3 - x_1 = (x_3+z_{32}) - (y_2+z_{32}) + (y_2+z_{22}) - (x_2+z_{22}) + (x_2+z_{21}) - (y_1+z_{21}) + (y_1+z_{11}) - (x_1+z_{11}).

Thus, if {h \in G} and {x_3,x_1 \in C} are such that {x_3-x_1 = h}, we see that

\displaystyle  f(w_1) - f(w_2) + f(w_3) - f(w_4) + f(w_5) - f(w_6) + f(w_7) - f(w_8) \in f(x_3) - f(x_1) + 4S - 4S

for {\gg K^{-O(1)} N^7} octuples {(w_1,w_2,w_3,w_4,w_5,w_6,w_7,w_8) \in E^8} in the hyperplane

\displaystyle  h = w_1 - w_2 + w_3 - w_4 + w_5 - w_6 + w_7 - w_8.

By the pigeonhole principle, this implies that for any fixed {h \in G}, there can be at most {O(K^{O(1)})} sets of the form {f(x_3)-f(x_1) + 3S-3S} with {x_3-x_1=h}, {x_1,x_3 \in C} that are pairwise disjoint. Using a greedy algorithm, we conclude that there is a set {W_h} of cardinality {O(K^{O(1)})}, such that each set {f(x_3) - f(x_1) + 3S-3S} with {x_3-x_1=h}, {x_1,x_3 \in C} intersects {w+4S -4S} for some {w \in W_h}, or in other words that

\displaystyle  f(x_3) - f(x_1) \in W_{x_3-x_1} + 8S-8S \ \ \ \ \ (8)

whenever {x_1,x_3 \in C}. In particular,

\displaystyle  \sum_{h \in G} \sum_{w \in W_h} | \{ (x_1,x_3) \in C^2: x_3-x_1 = h; f(x_3) - f(x_1) \in w + 8S-8S \}| \geq |C|^2 \gg K^{-O(1)} N^2.

This implies that there exists a subset {A} of {G} with {|A| \gg K^{-O(1)} N}, and an element {g_1(h) \in W_h} for each {h \in A}, such that

\displaystyle  | \{ (x_1,x_3) \in C^2: x_3-x_1 = h; f(x_3) - f(x_1) \in g_1(h) + 8S-8S \}| \gg K^{-O(1)} N \ \ \ \ \ (9)

for all {h \in A}. Note we may assume without loss of generality that {0 \in A} and {g_1(0)=0}.

Suppose that {h_1,\dots,h_{16} \in A} are such that

\displaystyle  \sum_{i=1}^{16} (-1)^{i-1} h_i = 0. \ \ \ \ \ (10)

By construction of {A}, and permuting labels, we can find {\gg K^{-O(1)} N^{16}} 16-tuples {(x_1,\dots,x_{16},y_1,\dots,y_{16}) \in C^{32}} such that

\displaystyle  y_i - x_i = (-1)^{i-1} h_i

and

\displaystyle  f(y_i) - f(x_i) \in (-1)^{i-1} g_i(h) + 8S - 8S

for {i=1,\dots,16}. We sum this to obtain

\displaystyle  f(y_1) + \sum_{i=1}^{15} f(y_{i+1})-f(x_i) - f(x_8) \in \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 128 S - 128 S

and hence by (8)

\displaystyle  f(y_1) - f(x_{16}) + \sum_{i=1}^{15} W_{k_i} \in \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 248 S - 248 S

where {k_i := y_{i+1}-x_i}. Since

\displaystyle  y_1 - x_{16} + \sum_{i=1}^{15} k_i = 0

we see that there are only {N^{16}} possible values of {(y_1,x_{16},k_1,\dots,k_{15})}. By the pigeonhole principle, we conclude that at most {O(K^{O(1)})} of the sets {\sum_{i=1}^{16} (-1)^i g_1(h_i) + 248 S - 248 S} can be disjoint. Arguing as before, we conclude that there exists a set {X} of cardinality {O(K^{O(1)})} such that

\displaystyle  \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) \in X + 496 S - 496 S \ \ \ \ \ (11)

whenever (10) holds.

For any {h \in 4A-4A}, write {h} arbitrarily as {h = \sum_{i=1}^8 (-1)^{i-1} h_i} for some {h_1,\dots,h_8 \in A} (with {h_5=\dots=h_8=0} if {h \in 2A-2A}, and {h_2 = \dots = h_8 = 0} if {h \in A}) and then set

\displaystyle  g(h) := \sum_{i=1}^8 (-1)^i g_1(h_i).

Then from (11) we have (4). For {h \in A} we have {g(h) = g_1(h)}, and (5) then follows from (9). \Box

This is a sequel to this previous blog post, in which we discussed the effect of the heat flow evolution

\displaystyle  \partial_t P(t,z) = \partial_{zz} P(t,z)

on the zeroes of a time-dependent family of polynomials {z \mapsto P(t,z)}, with a particular focus on the case when the polynomials {z \mapsto P(t,z)} had real zeroes. Here (inspired by some discussions I had during a recent conference on the Riemann hypothesis in Bristol) we record the analogous theory in which the polynomials instead have zeroes on a circle {\{ z: |z| = \sqrt{q} \}}, with the heat flow slightly adjusted to compensate for this. As we shall discuss shortly, a key example of this situation arises when {P} is the numerator of the zeta function of a curve.

More precisely, let {g} be a natural number. We will say that a polynomial

\displaystyle  P(z) = \sum_{j=0}^{2g} a_j z^j

of degree {2g} (so that {a_{2g} \neq 0}) obeys the functional equation if the {a_j} are all real and

\displaystyle  a_j = q^{g-j} a_{2g-j}

for all {j=0,\dots,2g}, thus

\displaystyle  P(\overline{z}) = \overline{P(z)}

and

\displaystyle  P(q/z) = q^g z^{-2g} P(z)

for all non-zero {z}. This means that the {2g} zeroes {\alpha_1,\dots,\alpha_{2g}} of {P(z)} (counting multiplicity) lie in {{\bf C} \backslash \{0\}} and are symmetric with respect to complex conjugation {z \mapsto \overline{z}} and inversion {z \mapsto q/z} across the circle {\{ |z| = \sqrt{q}\}}. We say that this polynomial obeys the Riemann hypothesis if all of its zeroes actually lie on the circle {\{ z = \sqrt{q}\}}. For instance, in the {g=1} case, the polynomial {z^2 - a_1 z + q} obeys the Riemann hypothesis if and only if {|a_1| \leq 2\sqrt{q}}.

Such polynomials arise in number theory as follows: if {C} is a projective curve of genus {g} over a finite field {\mathbf{F}_q}, then, as famously proven by Weil, the associated local zeta function {\zeta_{C,q}(z)} (as defined for instance in this previous blog post) is known to take the form

\displaystyle  \zeta_{C,q}(z) = \frac{P(z)}{(1-z)(1-qz)}

where {P} is a degree {2g} polynomial obeying both the functional equation and the Riemann hypothesis. In the case that {C} is an elliptic curve, then {g=1} and {P} takes the form {P(z) = z^2 - a_1 z + q}, where {a_1} is the number of {{\bf F}_q}-points of {C} minus {q+1}. The Riemann hypothesis in this case is a famous result of Hasse.

Another key example of such polynomials arise from rescaled characteristic polynomials

\displaystyle  P(z) := \det( 1 - \sqrt{q} F ) \ \ \ \ \ (1)

of {2g \times 2g} matrices {F} in the compact symplectic group {Sp(g)}. These polynomials obey both the functional equation and the Riemann hypothesis. The Sato-Tate conjecture (in higher genus) asserts, roughly speaking, that “typical” polyomials {P} arising from the number theoretic situation above are distributed like the rescaled characteristic polynomials (1), where {F} is drawn uniformly from {Sp(g)} with Haar measure.

Given a polynomial {z \mapsto P(0,z)} of degree {2g} with coefficients

\displaystyle  P(0,z) = \sum_{j=0}^{2g} a_j(0) z^j,

we can evolve it in time by the formula

\displaystyle  P(t,z) = \sum_{j=0}^{2g} \exp( t(j-g)^2 ) a_j(0) z^j,

thus {a_j(t) = \exp(t(j-g)) a_j(0)} for {t \in {\bf R}}. Informally, as one increases {t}, this evolution accentuates the effect of the extreme monomials, particularly, {z^0} and {z^{2g}} at the expense of the intermediate monomials such as {z^g}, and conversely as one decreases {t}. This family of polynomials obeys the heat-type equation

\displaystyle  \partial_t P(t,z) = (z \partial_z - g)^2 P(t,z). \ \ \ \ \ (2)

In view of the results of Marcus, Spielman, and Srivastava, it is also very likely that one can interpret this flow in terms of expected characteristic polynomials involving conjugation over the compact symplectic group {Sp(n)}, and should also be tied to some sort of “{\beta=\infty}” version of Brownian motion on this group, but we have not attempted to work this connection out in detail.

It is clear that if {z \mapsto P(0,z)} obeys the functional equation, then so does {z \mapsto P(t,z)} for any other time {t}. Now we investigate the evolution of the zeroes. Suppose at some time {t_0} that the zeroes {\alpha_1(t_0),\dots,\alpha_{2g}(t_0)} of {z \mapsto P(t_0,z)} are distinct, then

\displaystyle  P(t_0,z) = a_{2g}(0) \exp( t_0g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t_0) ).

From the inverse function theorem we see that for times {t} sufficiently close to {t_0}, the zeroes {\alpha_1(t),\dots,\alpha_{2g}(t)} of {z \mapsto P(t,z)} continue to be distinct (and vary smoothly in {t}), with

\displaystyle  P(t,z) = a_{2g}(0) \exp( t g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t) ).

Differentiating this at any {z} not equal to any of the {\alpha_j(t)}, we obtain

\displaystyle  \partial_t P(t,z) = P(t,z) ( g^2 - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)})

and

\displaystyle  \partial_z P(t,z) = P(t,z) ( \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)})

and

\displaystyle  \partial_{zz} P(t,z) = P(t,z) ( \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}).

Inserting these formulae into (2) (expanding {(z \partial_z - g)^2} as {z^2 \partial_{zz} - (2g-1) z \partial_z + g^2}) and canceling some terms, we conclude that

\displaystyle  - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)} = z^2 \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}

\displaystyle  - (2g-1) z \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)}

for {t} sufficiently close to {t_0}, and {z} not equal to {\alpha_1(t),\dots,\alpha_{2g}(t)}. Extracting the residue at {z = \alpha_j(t)}, we conclude that

\displaystyle  - \alpha'_j(t) = 2 \alpha_j(t)^2 \sum_{1 \leq k \leq 2g: k \neq j} \frac{1}{\alpha_j(t) - \alpha_k(t)} - (2g-1) \alpha_j(t)

which we can rearrange as

\displaystyle  \frac{\alpha'_j(t)}{\alpha_j(t)} = - \sum_{1 \leq k \leq 2g: k \neq j} \frac{\alpha_j(t)+\alpha_k(t)}{\alpha_j(t)-\alpha_k(t)}.

If we make the change of variables {\alpha_j(t) = \sqrt{q} e^{i\theta_j(t)}} (noting that one can make {\theta_j} depend smoothly on {t} for {t} sufficiently close to {t_0}), this becomes

\displaystyle  \partial_t \theta_j(t) = \sum_{1 \leq k \leq 2g: k \neq j} \cot \frac{\theta_j(t) - \theta_k(t)}{2}. \ \ \ \ \ (3)

Intuitively, this equation asserts that the phases {\theta_j} repel each other if they are real (and attract each other if their difference is imaginary). If {z \mapsto P(t_0,z)} obeys the Riemann hypothesis, then the {\theta_j} are all real at time {t_0}, then the Picard uniqueness theorem (applied to {\theta_j(t)} and its complex conjugate) then shows that the {\theta_j} are also real for {t} sufficiently close to {t_0}. If we then define the entropy functional

\displaystyle  H(\theta_1,\dots,\theta_{2g}) := \sum_{1 \leq j < k \leq 2g} \log \frac{1}{|\sin \frac{\theta_j-\theta_k}{2}| }

then the above equation becomes a gradient flow

\displaystyle  \partial_t \theta_j(t) = - 2 \frac{\partial H}{\partial \theta_j}( \theta_1(t),\dots,\theta_{2g}(t) )

which implies in particular that {H(\theta_1(t),\dots,\theta_{2g}(t))} is non-increasing in time. This shows that as one evolves time forward from {t_0}, there is a uniform lower bound on the separation between the phases {\theta_1(t),\dots,\theta_{2g}(t)}, and hence the equation can be solved indefinitely; in particular, {z \mapsto P(t,z)} obeys the Riemann hypothesis for all {t > t_0} if it does so at time {t_0}. Our argument here assumed that the zeroes of {z \mapsto P(t_0,z)} were simple, but this assumption can be removed by the usual limiting argument.

For any polynomial {z \mapsto P(0,z)} obeying the functional equation, the rescaled polynomials {z \mapsto e^{-g^2 t} P(t,z)} converge locally uniformly to {a_{2g}(0) (z^{2g} + q^g)} as {t \rightarrow +\infty}. By Rouche’s theorem, we conclude that the zeroes of {z \mapsto P(t,z)} converge to the equally spaced points {\{ e^{2\pi i(j+1/2)/2g}: j=1,\dots,2g\}} on the circle {\{ |z| = \sqrt{q}\}}. Together with the symmetry properties of the zeroes, this implies in particular that {z \mapsto P(t,z)} obeys the Riemann hypothesis for all sufficiently large positive {t}. In the opposite direction, when {t \rightarrow -\infty}, the polynomials {z \mapsto P(t,z)} converge locally uniformly to {a_g(0) z^g}, so if {a_g(0) \neq 0}, {g} of the zeroes converge to the origin and the other {g} converge to infinity. In particular, {z \mapsto P(t,z)} fails the Riemann hypothesis for sufficiently large negative {t}. Thus (if {a_g(0) \neq 0}), there must exist a real number {\Lambda}, which we call the de Bruijn-Newman constant of the original polynomial {z \mapsto P(0,z)}, such that {z \mapsto P(t,z)} obeys the Riemann hypothesis for {t \geq \Lambda} and fails the Riemann hypothesis for {t < \Lambda}. The situation is a bit more complicated if {a_g(0)} vanishes; if {k} is the first natural number such that {a_{g+k}(0)} (or equivalently, {a_{g-j}(0)}) does not vanish, then by the above arguments one finds in the limit {t \rightarrow -\infty} that {g-k} of the zeroes go to the origin, {g-k} go to infinity, and the remaining {2k} zeroes converge to the equally spaced points {\{ e^{2\pi i(j+1/2)/2k}: j=1,\dots,2k\}}. In this case the de Bruijn-Newman constant remains finite except in the degenerate case {k=g}, in which case {\Lambda = -\infty}.

For instance, consider the case when {g=1} and {P(0,z) = z^2 - a_1 z + q} for some real {a_1} with {|a_1| \leq 2\sqrt{q}}. Then the quadratic polynomial

\displaystyle  P(t,z) = e^t z^2 - a_1 z + e^t q

has zeroes

\displaystyle  \frac{a_1 \pm \sqrt{a_1^2 - 4 e^{2t} q}}{2e^t}

and one easily checks that these zeroes lie on the circle {\{ |z|=\sqrt{q}\}} when {t \geq \log \frac{|a_1|}{2\sqrt{q}}}, and are on the real axis otherwise. Thus in this case we have {\Lambda = \log \frac{|a_1|}{2\sqrt{q}}} (with {\Lambda=-\infty} if {a_1=0}). Note how as {t} increases to {+\infty}, the zeroes repel each other and eventually converge to {\pm i \sqrt{q}}, while as {t} decreases to {-\infty}, the zeroes collide and then separate on the real axis, with one zero going to the origin and the other to infinity.

The arguments in my paper with Brad Rodgers (discussed in this previous post) indicate that for a “typical” polynomial {P} of degree {g} that obeys the Riemann hypothesis, the expected time to relaxation to equilibrium (in which the zeroes are equally spaced) should be comparable to {1/g}, basically because the average spacing is {1/g} and hence by (3) the typical velocity of the zeroes should be comparable to {g}, and the diameter of the unit circle is comparable to {1}, thus requiring time comparable to {1/g} to reach equilibrium. Taking contrapositives, this suggests that the de Bruijn-Newman constant {\Lambda} should typically take on values comparable to {-1/g} (since typically one would not expect the initial configuration of zeroes to be close to evenly spaced). I have not attempted to formalise or prove this claim, but presumably one could do some numerics (perhaps using some of the examples of {P} given previously) to explore this further.

The Polymath14 online collaboration has uploaded to the arXiv its paper “Homogeneous length functions on groups“, submitted to Algebra & Number Theory. The paper completely classifies homogeneous length functions {\| \|: G \rightarrow {\bf R}^+} on an arbitrary group {G = (G,\cdot,e,()^{-1})}, that is to say non-negative functions that obey the symmetry condition {\|x^{-1}\| = \|x\|}, the non-degeneracy condition {\|x\|=0 \iff x=e}, the triangle inequality {\|xy\| \leq \|x\| + \|y\|}, and the homogeneity condition {\|x^2\| = 2\|x\|}. It turns out that these norms can only arise from pulling back the norm of a Banach space by an isometric embedding of the group. Among other things, this shows that {G} can only support a homogeneous length function if and only if it is abelian and torsion free, thus giving a metric description of this property.

The proof is based on repeated use of the homogeneous length function axioms, combined with elementary identities of commutators, to obtain increasingly good bounds on quantities such as {\|[x,y]\|}, until one can show that such norms have to vanish. See the previous post for a full proof. The result is robust in that it allows for some loss in the triangle inequality and homogeneity condition, allowing for some new results on “quasinorms” on groups that relate to quasihomomorphisms.

As there are now a large number of comments on the previous post on this project, this post will also serve as the new thread for any final discussion of this project as it winds down.

In the tradition of “Polymath projects“, the problem posed in the previous two blog posts has now been solved, thanks to the cumulative effect of many small contributions by many participants (including, but not limited to, Sean Eberhard, Tobias Fritz, Siddharta Gadgil, Tobias Hartnick, Chris Jerdonek, Apoorva Khare, Antonio Machiavelo, Pace Nielsen, Andy Putman, Will Sawin, Alexander Shamov, Lior Silberman, and David Speyer). In this post I’ll write down a streamlined resolution, eliding a number of important but ultimately removable partial steps and insights made by the above contributors en route to the solution.

Theorem 1 Let {G = (G,\cdot)} be a group. Suppose one has a “seminorm” function {\| \|: G \rightarrow [0,+\infty)} which obeys the triangle inequality

\displaystyle \|xy \| \leq \|x\| + \|y\|

for all {x,y \in G}, with equality whenever {x=y}. Then the seminorm factors through the abelianisation map {G \mapsto G/[G,G]}.

Proof: By the triangle inequality, it suffices to show that {\| [x,y]\| = 0} for all {x,y \in G}, where {[x,y] := xyx^{-1}y^{-1}} is the commutator.

We first establish some basic facts. Firstly, by hypothesis we have {\|x^2\| = 2 \|x\|}, and hence {\|x^n \| = n \|x\|} whenever {n} is a power of two. On the other hand, by the triangle inequality we have {\|x^n \| \leq n\|x\|} for all positive {n}, and hence by the triangle inequality again we also have the matching lower bound, thus

\displaystyle \|x^n \| = n \|x\|

for all {n > 0}. The claim is also true for {n=0} (apply the preceding bound with {x=1} and {n=2}). By replacing {\|x\|} with {\max(\|x\|, \|x^{-1}\|)} if necessary we may now also assume without loss of generality that {\|x^{-1} \| = \|x\|}, thus

\displaystyle \|x^n \| = |n| \|x\| \ \ \ \ \ (1)

 

for all integers {n}.

Next, for any {x,y \in G}, and any natural number {n}, we have

\displaystyle \|yxy^{-1} \| = \frac{1}{n} \| (yxy^{-1})^n \|

\displaystyle = \frac{1}{n} \| y x^n y^{-1} \|

\displaystyle \leq \frac{1}{n} ( \|y\| + n \|x\| + \|y\|^{-1} )

so on taking limits as {n \rightarrow \infty} we have {\|yxy^{-1} \| \leq \|x\|}. Replacing {x,y} by {yxy^{-1},y^{-1}} gives the matching lower bound, thus we have the conjugation invariance

\displaystyle \|yxy^{-1} \| = \|x\|. \ \ \ \ \ (2)

 

Next, we observe that if {x,y,z,w} are such that {x} is conjugate to both {wy} and {zw^{-1}}, then one has the inequality

\displaystyle \|x\| \leq \frac{1}{2} ( \|y \| + \| z \| ). \ \ \ \ \ (3)

 

Indeed, if we write {x = swys^{-1} = t zw^{-1} t^{-1}} for some {s,t \in G}, then for any natural number {n} one has

\displaystyle \|x\| = \frac{1}{2n} \| x^n x^n \|

\displaystyle = \frac{1}{2n} \| swy \dots wy s^{-1}t zw^{-1} \dots zw^{-1} t^{-1} \|

where the {wy} and {zw^{-1}} terms each appear {n} times. From (2) we see that conjugation by {w} does not affect the norm. Using this and the triangle inequality several times, we conclude that

\displaystyle \|x\| \leq \frac{1}{2n} ( \|s\| + n \|y\| + \| s^{-1} t\| + n \|z\| + \|t^{-1} \| ),

and the claim (3) follows by sending {n \rightarrow \infty}.

The following special case of (3) will be of particular interest. Let {x,y \in G}, and for any integers {m,k}, define the quantity

\displaystyle f(m,k) := \| x^m [x,y]^k \|.

Observe that {x^m [x,y]^k} is conjugate to both {x (x^{m-1} [x,y]^k)} and to {(y^{-1} x^m [x,y]^{k-1} xy) x^{-1}}, hence by (3) one has

\displaystyle \| x^m [x,y]^k \| \leq \frac{1}{2} ( \| x^{m-1} [x,y]^k \| + \| y^{-1} x^{m} [x,y]^{k-1} xy \|)

which by (2) leads to the recursive inequality

\displaystyle f(m,k) \leq \frac{1}{2} (f(m-1,k) + f(m+1,k-1)).

We can write this in probabilistic notation as

\displaystyle f(m,k) \leq {\bf E} f( (m,k) + X )

where {X} is a random vector that takes the values {(-1,0)} and {(1,-1)} with probability {1/2} each. Iterating this, we conclude in particular that for any large natural number {n}, one has

\displaystyle f(0,n) \leq {\bf E} f( Z )

where {Z := (0,n) + X_1 + \dots + X_{2n}} and {X_1,\dots,X_{2n}} are iid copies of {X}. We can write {Z = (1,-1/2) (Y_1 + \dots + Y_{2n})} where Y_1,\dots,Y_{2n} = \pm 1 are iid signs.  By the triangle inequality, we thus have

\displaystyle f( Z ) \leq |Y_1+\dots+Y_{2n}| (\|x\| + \frac{1}{2} \| [x,y] \|),

noting that Y_1+\dots+Y_{2n} is an even integer.  On the other hand, Y_1+\dots+Y_{2n} has mean zero and variance 2n, hence by Cauchy-Schwarz

\displaystyle f(0,n) \leq \sqrt{2n}( \|x\| + \frac{1}{2} \| [x,y] \|).

But by (1), the left-hand side is equal to {n \| [x,y]\|}. Dividing by {n} and then sending {n \rightarrow \infty}, we obtain the claim. \Box

The above theorem reduces such seminorms to abelian groups. It is easy to see from (1) that any torsion element of such groups has zero seminorm, so we can in fact restrict to torsion-free groups, which we now write using additive notation {G = (G,+)}, thus for instance {\| nx \| = |n| \|x\|} for {n \in {\bf Z}}. We think of {G} as a {{\bf Z}}-module. One can then extend the seminorm to the associated {{\bf Q}}-vector space {G \otimes_{\bf Z} {\bf Q}} by the formula {\|\frac{a}{b} x\| := \frac{a}{b} \|x\|}, and then to the associated {{\bf R}}-vector space {G \otimes_{\bf Z} {\bf R}} by continuity, at which point it becomes a genuine seminorm (provided we have ensured the symmetry condition {\|x\| = \|x^{-1}\|}). Conversely, any seminorm on {G \otimes_{\bf Z} {\bf R}} induces a seminorm on {G}. (These arguments also appear in this paper of Khare and Rajaratnam.)

 

This post is a continuation of the previous post, which has attracted a large number of comments. I’m recording here some calculations that arose from those comments (particularly those of Pace Nielsen, Lior Silberman, Tobias Fritz, and Apoorva Khare). Please feel free to either continue these calculations or to discuss other approaches to the problem, such as those mentioned in the remaining comments to the previous post.

Let {F_2} be the free group on two generators {a,b}, and let {\| \|: F_2 \rightarrow {\bf R}^+} be a quantity obeying the triangle inequality

\displaystyle \| xy\| \leq \|x \| + \|y\|

and the linear growth property

\displaystyle \| x^n \| = |n| \| x\|

for all {x,y \in F_2} and integers {n \in {\bf Z}}; this implies the conjugation invariance

\displaystyle \| y^{-1} x y \| = \|x\|

or equivalently

\displaystyle \| xy \| = \| yx\|

We consider inequalities of the form

\displaystyle \| xyx^{-1}y^{-1} \| \leq \alpha \|x\| + \beta \| y\| \ \ \ \ \ (1)

or

\displaystyle \| xyx^{-2}y^{-1} \| \leq \gamma \|x\| + \delta \| y\| \ \ \ \ \ (2)

for various real numbers {\alpha,\beta,\gamma,\delta}. For instance, since

\displaystyle \| xyx^{-1}y^{-1} \| \leq \| xyx^{-1}\| + \|y^{-1} \| = \|y\| + \|y\|

we have (1) for {(\alpha,\beta) = (2,0)}. We also have the following further relations:

Proposition 1

  • (i) If (1) holds for {(\alpha,\beta)}, then it holds for {(\beta,\alpha)}.
  • (ii) If (1) holds for {(\alpha,\beta)}, then (2) holds for {(\alpha+1, \frac{\beta}{2})}.
  • (iii) If (2) holds for {(\gamma,\delta)}, then (1) holds for {(\frac{2\gamma}{3}, \frac{2\delta}{3})}.
  • (iv) If (1) holds for {(\alpha,\beta)} and (2) holds for {(\gamma,\delta)}, then (1) holds for {(\frac{2\alpha+1+\gamma}{4}, \frac{\delta+\beta}{4})}.

Proof: For (i) we simply observe that

\displaystyle \| xyx^{-1} y^{-1} \| = \| (xyx^{-1} y^{-1})^{-1} \| = \| y^{-1} x^{-1} y x \| = \| y x y^{-1} x^{-1} \|.

For (ii), we calculate

\displaystyle \| xyx^{-2}y^{-1} \| = \frac{1}{2}\| (xyx^{-2}y^{-1})^2 \|

\displaystyle = \frac{1}{2} \| (xyx^{-2}y^{-1} x) (yx^{-2} y^{-1}) \|

\displaystyle \leq \frac{1}{2} (\| xyx^{-2}y^{-1} x\| + \|yx^{-2} y^{-1}\|)

\displaystyle \leq \frac{1}{2} ( \| x^2 y x^{-2} y^{-1} \| + 2 \|x\| )

\displaystyle \leq \frac{1}{2} ( 2 \alpha \|x\| + \beta \|y\| + 2 \|x\|)

giving the claim.

For (iii), we calculate

\displaystyle \| xyx^{-1}y^{-1}\| = \frac{1}{3} \| (xyx^{-1}y^{-1})^3 \|

\displaystyle = \frac{1}{3} \| (xyx) (x^{-2} y^{-1} xy) (xyx)^{-1} (x^2 y x^{-1} y^{-1}) \|

\displaystyle \leq \frac{1}{3} ( \| x^{-2} y^{-1} xy\| + \| x^2 y x^{-1} y^{-1}\| )

\displaystyle = \frac{1}{3} ( \| xy x^{-2} y^{-1} \| + \|x^{-1} y^{-1} x^2 y \| )

\displaystyle \leq \frac{1}{3} ( \gamma \|x\| + \delta \|y\| + \gamma \|x\| + \delta \|y\|)

giving the claim.

For (iv), we calculate

\displaystyle \| xyx^{-1}y^{-1}\| = \frac{1}{4} \| (xyx^{-1}y^{-1})^4 \|

\displaystyle = \frac{1}{4} \| (xy) (x^{-1} y^{-1} x) (y x^{-1} y^{-1}) (xyx^{-1}) (xy)^{-1} (x^2yx^{-1}y^{-1}) \|

\displaystyle \leq \frac{1}{4} ( \| (x^{-1} y^{-1} x) (y x^{-1} y^{-1}) (xyx^{-1}) \| + \|x^2yx^{-1}y^{-1}\| )

\displaystyle \leq \frac{1}{4} ( \|(y x^{-1} y^{-1}) (xy^{-1}x^{-1})(x^{-1} y x) \| + \gamma \|x\| + \delta \|y\|)

\displaystyle \leq \frac{1}{4} ( \|x\| + \|(xy^{-1}x^{-1})(x^{-1} y x) \| + \gamma \|x\| + \delta \|y\|)

\displaystyle = \frac{1}{4} ( \|x\| + \|x^{-2} y x^2 y^{-1} \|+ \gamma \|x\| + \delta \|y\|)

\displaystyle \leq \frac{1}{4} ( \|x\| + 2\alpha \|x\| + \beta \|y\| + \gamma \|x\| + \delta \|y\|)

giving the claim. \Box

Here is a typical application of the above estimates. If (1) holds for {(\alpha,\beta)}, then by part (i) it holds for {(\beta,\alpha)}, then by (ii) (2) holds for {(\beta+1,\frac{\alpha}{2})}, then by (iv) (1) holds for {(\frac{3\beta+2}{4}, \frac{3\alpha}{8})}. The map {(\alpha,\beta) \mapsto (\frac{3\beta+2}{4}, \frac{3\alpha}{8})} has fixed point {(\alpha,\beta) = (\frac{16}{23}, \frac{6}{23})}, thus

\displaystyle \| xyx^{-1}y^{-1} \| \leq \frac{16}{23} \|x\| + \frac{6}{23} \|y\|.

For instance, if {\|a\|, \|b\| \leq 1}, then {\|aba^{-1}b^{-1} \| \leq 22/23 = 0.95652\dots}.

Here is a curious question posed to me by Apoorva Khare that I do not know the answer to. Let {F_2} be the free group on two generators {a,b}. Does there exist a metric {d} on this group which is

  • bi-invariant, thus {d(xg,yg)=d(gx,gy) = d(x,y)} for all {x,y,g \in F_2}; and
  • linear growth in the sense that {d(x^n,1) = n d(x,1)} for all {x \in F_2} and all natural numbers {n}?

By defining the “norm” of an element {x \in F_2} to be {\| x\| := d(x,1)}, an equivalent formulation of the problem asks if there exists a non-negative norm function {\| \|: F_2 \rightarrow {\bf R}^+} that obeys the conjugation invariance

\displaystyle  \| gxg^{-1} \| = \|x \| \ \ \ \ \ (1)

for all {x,g \in F_2}, the triangle inequality

\displaystyle  \| xy \| \leq \| x\| + \| y\| \ \ \ \ \ (2)

for all {x,y \in F_2}, and the linear growth

\displaystyle  \| x^n \| = |n| \|x\| \ \ \ \ \ (3)

for all {x \in F_2} and {n \in {\bf Z}}, and such that {\|x\| > 0} for all non-identity {x \in F_2}. Indeed, if such a norm exists then one can just take {d(x,y) := \| x y^{-1} \|} to give the desired metric.

One can normalise the norm of the generators to be at most {1}, thus

\displaystyle  \| a \|, \| b \| \leq 1.

This can then be used to upper bound the norm of other words in {F_2}. For instance, from (1), (3) one has

\displaystyle  \| aba^{-1} \|, \| b^{-1} a b \|, \| a^{-1} b^{-1} a \|, \| bab^{-1}\| \leq 1.

A bit less trivially, from (3), (2), (1) one can bound commutators as

\displaystyle  \| aba^{-1} b^{-1} \| = \frac{1}{3} \| (aba^{-1} b^{-1})^3 \|

\displaystyle  = \frac{1}{3} \| (aba^{-1}) (b^{-1} ab) (a^{-1} b^{-1} a) (b ab^{-1}) \|

\displaystyle  \leq \frac{4}{3}.

In a similar spirit one has

\displaystyle  \| aba^{-2} b^{-1} \| = \frac{1}{2} \| (aba^{-2} b^{-1})^2 \|

\displaystyle  = \frac{1}{2} \| (aba^{-1}) (a^{-1} b^{-1} a) (ba^{-1} b^{-1}) (ba^{-1} b^{-1}) \|

\displaystyle  \leq 2.

What is not clear to me is if one can keep arguing like this to continually improve the upper bounds on the norm {\| g\|} of a given non-trivial group element {g} to the point where this norm must in fact vanish, which would demonstrate that no metric with the above properties on {F_2} would exist (and in fact would impose strong constraints on similar metrics existing on other groups as well). It is also tempting to use some ideas from geometric group theory (e.g. asymptotic cones) to try to understand these metrics further, though I wasn’t able to get very far with this approach. Anyway, this feels like a problem that might be somewhat receptive to a more crowdsourced attack, so I am posing it here in case any readers wish to try to make progress on it.

How many groups of order four are there? Technically, there are an enormous number, so much so, in fact, that the class of groups of order four is not even a set, but merely a proper class. This is because any four objects {a,b,c,d} can be turned into a group {\{a,b,c,d\}} by designating one of the four objects, say {a}, to be the group identity, and imposing a suitable multiplication table (and inversion law) on the four elements in a manner that obeys the usual group axioms. Since all sets are themselves objects, the class of four-element groups is thus at least as large as the class of all sets, which by Russell’s paradox is known not to itself be a set (assuming the usual ZFC axioms of set theory).

A much better question is to ask how many groups of order four there are up to isomorphism, counting each isomorphism class of groups exactly once. Now, as one learns in undergraduate group theory classes, the answer is just “two”: the cyclic group {C_4} and the Klein four-group {C_2 \times C_2}.

More generally, given a class of objects {X} and some equivalence relation {\sim} on {X} (which one should interpret as describing the property of two objects in {X} being “isomorphic”), one can consider the number {|X / \sim|} of objects in {X} “up to isomorphism”, which is simply the cardinality of the collection {X/\sim} of equivalence classes {[x]:=\{y\in X:x \sim {}y \}} of {X}. In the case where {X} is finite, one can express this cardinality by the formula

\displaystyle |X/\sim| = \sum_{x \in X} \frac{1}{|[x]|}, \ \ \ \ \ (1)

thus one counts elements in {X}, weighted by the reciprocal of the number of objects they are isomorphic to.

Example 1 Let {X} be the five-element set {\{-2,-1,0,1,2\}} of integers between {-2} and {2}. Let us say that two elements {x, y} of {X} are isomorphic if they have the same magnitude: {x \sim y \iff |x| = |y|}. Then the quotient space {X/\sim} consists of just three equivalence classes: {\{-2,2\} = [2] = [-2]}, {\{-1,1\} = [-1] = [1]}, and {\{0\} = [0]}. Thus there are three objects in {X} up to isomorphism, and the identity (1) is then just

\displaystyle 3 = \frac{1}{2} + \frac{1}{2} + 1 + \frac{1}{2} + \frac{1}{2}.

Thus, to count elements in {X} up to equivalence, the elements {-2,-1,1,2} are given a weight of {1/2} because they are each isomorphic to two elements in {X}, while the element {0} is given a weight of {1} because it is isomorphic to just one element in {X} (namely, itself).

Given a finite probability set {X}, there is also a natural probability distribution on {X}, namely the uniform distribution, according to which a random variable {\mathbf{x} \in X} is set equal to any given element {x} of {X} with probability {\frac{1}{|X|}}:

\displaystyle {\bf P}( \mathbf{x} = x ) = \frac{1}{|X|}.

Given a notion {\sim} of isomorphism on {X}, one can then define the random equivalence class {[\mathbf{x}] \in X/\sim} that the random element {\mathbf{x}} belongs to. But if the isomorphism classes are unequal in size, we now encounter a biasing effect: even if {\mathbf{x}} was drawn uniformly from {X}, the equivalence class {[\mathbf{x}]} need not be uniformly distributed in {X/\sim}. For instance, in the above example, if {\mathbf{x}} was drawn uniformly from {\{-2,-1,0,1,2\}}, then the equivalence class {[\mathbf{x}]} will not be uniformly distributed in the three-element space {X/\sim}, because it has a {2/5} probability of being equal to the class {\{-2,2\}} or to the class {\{-1,1\}}, and only a {1/5} probability of being equal to the class {\{0\}}.

However, it is possible to remove this bias by changing the weighting in (1), and thus changing the notion of what cardinality means. To do this, we generalise the previous situation. Instead of considering sets {X} with an equivalence relation {\sim} to capture the notion of isomorphism, we instead consider groupoids, which are sets {X} in which there are allowed to be multiple isomorphisms between elements in {X} (and in particular, there are allowed to be multiple automorphisms from an element to itself). More precisely:

Definition 2 A groupoid is a set (or proper class) {X}, together with a (possibly empty) collection {\mathrm{Iso}(x \rightarrow y)} of “isomorphisms” between any pair {x,y} of elements of {X}, and a composition map {f, g \mapsto g \circ f} from isomorphisms {f \in \mathrm{Iso}(x \rightarrow y)}, {g \in \mathrm{Iso}(y \rightarrow z)} to isomorphisms in {\mathrm{Iso}(x \rightarrow z)} for every {x,y,z \in X}, obeying the following group-like axioms:

  • (Identity) For every {x \in X}, there is an identity isomorphism {\mathrm{id}_x \in \mathrm{Iso}(x \rightarrow x)}, such that {f \circ \mathrm{id}_x = \mathrm{id}_y \circ f = f} for all {f \in \mathrm{Iso}(x \rightarrow y)} and {x,y \in X}.
  • (Associativity) If {f \in \mathrm{Iso}(x \rightarrow y)}, {g \in \mathrm{Iso}(y \rightarrow z)}, and {h \in \mathrm{Iso}(z \rightarrow w)} for some {x,y,z,w \in X}, then {h \circ (g \circ f) = (h \circ g) \circ f}.
  • (Inverse) If {f \in \mathrm{Iso}(x \rightarrow y)} for some {x,y \in X}, then there exists an inverse isomorphism {f^{-1} \in \mathrm{Iso}(y \rightarrow x)} such that {f \circ f^{-1} = \mathrm{id}_y} and {f^{-1} \circ f = \mathrm{id}_x}.

We say that two elements {x,y} of a groupoid are isomorphic, and write {x \sim y}, if there is at least one isomorphism from {x} to {y}.

Example 3 Any category gives a groupoid by taking {X} to be the set (or class) of objects, and {\mathrm{Iso}(x \rightarrow y)} to be the collection of invertible morphisms from {x} to {y}. For instance, in the category {\mathbf{Set}} of sets, {\mathrm{Iso}(x \rightarrow y)} would be the collection of bijections from {x} to {y}; in the category {\mathbf{Vec}/k} of linear vector spaces over some given base field {k}, {\mathrm{Iso}(x \rightarrow y)} would be the collection of invertible linear transformations from {x} to {y}; and so forth.

Every set {X} equipped with an equivalence relation {\sim} can be turned into a groupoid by assigning precisely one isomorphism {\iota_{x \rightarrow y}} from {x} to {y} for any pair {x,y \in X} with {x \sim y}, and no isomorphisms from {x} to {y} when {x \not \sim y}, with the groupoid operations of identity, composition, and inverse defined in the only way possible consistent with the axioms. We will call this the simply connected groupoid associated with this equivalence relation. For instance, with {X = \{-2,-1,0,1,2\}} as above, if we turn {X} into a simply connected groupoid, there will be precisely one isomorphism from {2} to {-2}, and also precisely one isomorphism from {2} to {2}, but no isomorphisms from {2} to {-1}, {0}, or {1}.

However, one can also form multiply-connected groupoids in which there can be multiple isomorphisms from one element of {X} to another. For instance, one can view {X = \{-2,-1,0,1,2\}} as a space that is acted on by multiplication by the two-element group {\{-1,+1\}}. This gives rise to two types of isomorphisms, an identity isomorphism {(+1)_x} from {x} to {x} for each {x \in X}, and a negation isomorphism {(-1)_x} from {x} to {-x} for each {x \in X}; in particular, there are two automorphisms of {0} (i.e., isomorphisms from {0} to itself), namely {(+1)_0} and {(-1)_0}, whereas the other four elements of {X} only have a single automorphism (the identity isomorphism). One defines composition, identity, and inverse in this groupoid in the obvious fashion (using the group law of the two-element group {\{-1,+1\}}); for instance, we have {(-1)_{-2} \circ (-1)_2 = (+1)_2}.

For a finite multiply-connected groupoid, it turns out that the natural notion of “cardinality” (or as I prefer to call it, “cardinality up to isomorphism”) is given by the variant

\displaystyle \sum_{x \in X} \frac{1}{|\{ f: f \in \mathrm{Iso}(x \rightarrow y) \hbox{ for some } y\}|}

of (1). That is to say, in the multiply connected case, the denominator is no longer the number of objects {y} isomorphic to {x}, but rather the number of isomorphisms from {x} to other objects {y}. Grouping together all summands coming from a single equivalence class {[x]} in {X/\sim}, we can also write this expression as

\displaystyle \sum_{[x] \in X/\sim} \frac{1}{|\mathrm{Aut}(x)|} \ \ \ \ \ (2)

where {\mathrm{Aut}(x) := \mathrm{Iso}(x \rightarrow x)} is the automorphism group of {x}, that is to say the group of isomorphisms from {x} to itself. (Note that if {x,x'} belong to the same equivalence class {[x]}, then the two groups {\mathrm{Aut}(x)} and {\mathrm{Aut}(x')} will be isomorphic and thus have the same cardinality, and so the above expression is well-defined.

For instance, if we take {X} to be the simply connected groupoid on {\{-2,-1,0,1,2\}}, then the number of elements of {X} up to isomorphism is

\displaystyle \frac{1}{2} + \frac{1}{2} + 1 + \frac{1}{2} + \frac{1}{2} = 1 + 1 + 1 = 3

exactly as before. If however we take the multiply connected groupoid on {\{-2,-1,0,1,2\}}, in which {0} has two automorphisms, the number of elements of {X} up to isomorphism is now the smaller quantity

\displaystyle \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = 1 + \frac{1}{2} + 1 = \frac{5}{2};

the equivalence class {[0]} is now counted with weight {1/2} rather than {1} due to the two automorphisms on {0}. Geometrically, one can think of this groupoid as being formed by taking the five-element set {\{-2,-1,0,1,2\}}, and “folding it in half” around the fixed point {0}, giving rise to two “full” quotient points {[1], [2]} and one “half” point {[0]}. More generally, given a finite group {G} acting on a finite set {X}, and forming the associated multiply connected groupoid, the cardinality up to isomorphism of this groupoid will be {|X|/|G|}, since each element {x} of {X} will have {|G|} isomorphisms on it (whether they be to the same element {x}, or to other elements of {X}).

The definition (2) can also make sense for some infinite groupoids; to my knowledge this was first explicitly done in this paper of Baez and Dolan. Consider for instance the category {\mathbf{FinSet}} of finite sets, with isomorphisms given by bijections as in Example 3. Every finite set is isomorphic to {\{1,\dots,n\}} for some natural number {n}, so the equivalence classes of {\mathbf{FinSet}} may be indexed by the natural numbers. The automorphism group {S_n} of {\{1,\dots,n\}} has order {n!}, so the cardinality of {\mathbf{FinSet}} up to isomorphism is

\displaystyle \sum_{n=0}^\infty \frac{1}{n!} = e.

(This fact is sometimes loosely stated as “the number of finite sets is {e}“, but I view this statement as somewhat misleading if the qualifier “up to isomorphism” is not added.) Similarly, when one allows for multiple isomorphisms from a group to itself, the number of groups of order four up to isomorphism is now

\displaystyle \frac{1}{2} + \frac{1}{6} = \frac{2}{3}

because the cyclic group {C_4} has two automorphisms, whereas the Klein four-group {C_2 \times C_2} has six.

In the case that the cardinality of a groupoid {X} up to isomorphism is finite and non-empty, one can now define the notion of a random isomorphism class {[\mathbf{x}]} in {X/\sim} drawn “uniformly up to isomorphism”, by requiring the probability of attaining any given isomorphism class {[x]} to be

\displaystyle {\mathbf P}([\mathbf{x}] = [x]) = \frac{1 / |\mathrm{Aut}(x)|}{\sum_{[y] \in X/\sim} 1/|\mathrm{Aut}(y)|},

thus the probability of being isomorphic to a given element {x} will be inversely proportional to the number of automorphisms that {x} has. For instance, if we take {X} to be the set {\{-2,-1,0,1,2\}} with the simply connected groupoid, {[\mathbf{x}]} will be drawn uniformly from the three available equivalence classes {[0], [1], [2]}, with a {1/3} probability of attaining each; but if instead one uses the multiply connected groupoid coming from the action of {\{-1,+1\}}, and draws {[\mathbf{x}]} uniformly up to isomorphism, then {[1]} and {[2]} will now be selected with probability {2/5} each, and {[0]} will be selected with probability {1/5}. Thus this distribution has accounted for the bias mentioned previously: if a finite group {G} acts on a finite space {X}, and {\mathbf{x}} is drawn uniformly from {X}, then {[\mathbf{x}]} now still be drawn uniformly up to isomorphism from {X/G}, if we use the multiply connected groupoid coming from the {G} action, rather than the simply connected groupoid coming from just the {G}-orbit structure on {X}.

Using the groupoid of finite sets, we see that a finite set chosen uniformly up to isomorphism will have a cardinality that is distributed according to the Poisson distribution of parameter {1}, that is to say it will be of cardinality {n} with probability {\frac{e^{-1}}{n!}}.

One important source of groupoids are the fundamental groupoids {\pi_1(M)} of a manifold {M} (one can also consider more general topological spaces than manifolds, but for simplicity we will restrict this discussion to the manifold case), in which the underlying space is simply {M}, and the isomorphisms from {x} to {y} are the equivalence classes of paths from {x} to {y} up to homotopy; in particular, the automorphism group of any given point is just the fundamental group of {M} at that base point. The equivalence class {[x]} of a point in {M} is then the connected component of {x} in {M}. The cardinality up to isomorphism of the fundamental groupoid is then

\displaystyle \sum_{M' \in \pi_0(M)} \frac{1}{|\pi_1(M')|}

where {\pi_0(M)} is the collection of connected components {M'} of {M}, and {|\pi_1(M')|} is the order of the fundamental group of {M'}. Thus, simply connected components of {M} count for a full unit of cardinality, whereas multiply connected components (which can be viewed as quotients of their simply connected cover by their fundamental group) will count for a fractional unit of cardinality, inversely to the order of their fundamental group.

This notion of cardinality up to isomorphism of a groupoid behaves well with respect to various basic notions. For instance, suppose one has an {n}-fold covering map {\pi: X \rightarrow Y} of one finite groupoid {Y} by another {X}. This means that {\pi} is a functor that is surjective, with all preimages of cardinality {n}, with the property that given any pair {y,y'} in the base space {Y} and any {x} in the preimage {\pi^{-1}(\{y\})} of {y}, every isomorphism {f \in \mathrm{Iso}(y \rightarrow y')} has a unique lift {\tilde f \in \mathrm{Iso}(x \rightarrow x')} from the given initial point {x} (and some {x'} in the preimage of {y'}). Then one can check that the cardinality up to isomorphism of {X} is {n} times the cardinality up to isomorphism of {Y}, which fits well with the geometric picture of {X} as the {n}-fold cover of {Y}. (For instance, if one covers a manifold {M} with finite fundamental group by its universal cover, this is a {|\pi_1(M)|}-fold cover, the base has cardinality {1/|\pi_1(M)|} up to isomorphism, and the universal cover has cardinality one up to isomorphism.) Related to this, if one draws an equivalence class {[\mathrm{x}]} of {X} uniformly up to isomorphism, then {\pi([\mathrm{x}])} will be an equivalence class of {Y} drawn uniformly up to isomorphism also.

Indeed, one can show that this notion of cardinality up to isomorphism for groupoids is uniquely determined by a small number of axioms such as these (similar to the axioms that determine Euler characteristic); see this blog post of Qiaochu Yuan for details.

The probability distributions on isomorphism classes described by the above recipe seem to arise naturally in many applications. For instance, if one draws a profinite abelian group up to isomorphism at random in this fashion (so that each isomorphism class {[G]} of a profinite abelian group {G} occurs with probability inversely proportional to the number of automorphisms of this group), then the resulting distribution is known as the Cohen-Lenstra distribution, and seems to emerge as the natural asymptotic distribution of many randomly generated profinite abelian groups in number theory and combinatorics, such as the class groups of random quadratic fields; see this previous blog post for more discussion. For a simple combinatorial example, the set of fixed points of a random permutation on {n} elements will have a cardinality that converges in distribution to the Poisson distribution of rate {1} (as discussed in this previous post), thus we see that the fixed points of a large random permutation asymptotically are distributed uniformly up to isomorphism. I’ve been told that this notion of cardinality up to isomorphism is also particularly compatible with stacks (which are a good framework to describe such objects as moduli spaces of algebraic varieties up to isomorphism), though I am not sufficiently acquainted with this theory to say much more than this.

Suppose that {A \subset B} are two subgroups of some ambient group {G}, with the index {K := [B:A]} of {A} in {B} being finite. Then {B} is the union of {K} left cosets of {A}, thus {B = SA} for some set {S \subset B} of cardinality {K}. The elements {s} of {S} are not entirely arbitrary with regards to {A}. For instance, if {A} is a normal subgroup of {B}, then for each {s \in S}, the conjugation map {g \mapsto s^{-1} g s} preserves {A}. In particular, if we write {A^s := s^{-1} A s} for the conjugate of {A} by {s}, then

\displaystyle  A = A^s.

Even if {A} is not normal in {B}, it turns out that the conjugation map {g \mapsto s^{-1} g s} approximately preserves {A}, if {K} is bounded. To quantify this, let us call two subgroups {A,B} {K}-commensurate for some {K \geq 1} if one has

\displaystyle  [A : A \cap B], [B : A \cap B] \leq K.

Proposition 1 Let {A \subset B} be groups, with finite index {K = [B:A]}. Then for every {s \in B}, the groups {A} and {A^s} are {K}-commensurate, in fact

\displaystyle  [A : A \cap A^s ] = [A^s : A \cap A^s ] \leq K.

Proof: One can partition {B} into {K} left translates {xA} of {A}, as well as {K} left translates {yA^s} of {A^s}. Combining the partitions, we see that {B} can be partitioned into at most {K^2} non-empty sets of the form {xA \cap yA^s}. Each of these sets is easily seen to be a left translate of the subgroup {A \cap A^s}, thus {[B: A \cap A^s] \leq K^2}. Since

\displaystyle [B: A \cap A^s] = [B:A] [A: A \cap A^s] = [B:A^s] [A^s: A \cap A^s]

and {[B:A] = [B:A^s]=K}, we obtain the claim. \Box

One can replace the inclusion {A \subset B} by commensurability, at the cost of some worsening of the constants:

Corollary 2 Let {A, B} be {K}-commensurate subgroups of {G}. Then for every {s \in B}, the groups {A} and {A^s} are {K^2}-commensurate.

Proof: Applying the previous proposition with {A} replaced by {A \cap B}, we see that for every {s \in B}, {A \cap B} and {(A \cap B)^s} are {K}-commensurate. Since {A \cap B} and {(A \cap B)^s} have index at most {K} in {A} and {A^s} respectively, the claim follows. \Box

It turns out that a similar phenomenon holds for the more general concept of an approximate group, and gives a “classification” of all the approximate groups {B} containing a given approximate group {A} as a “bounded index approximate subgroup”. Recall that a {K}-approximate group {A} in a group {G} for some {K \geq 1} is a symmetric subset of {G} containing the identity, such that the product set {A^2 := \{ a_1 a_2: a_1,a_2 \in A\}} can be covered by at most {K} left translates of {A} (and thus also {K} right translates, by the symmetry of {A}). For simplicity we will restrict attention to finite approximate groups {A} so that we can use their cardinality {A} as a measure of size. We call finite two approximate groups {A,B} {K}-commensurate if one has

\displaystyle  |A^2 \cap B^2| \geq \frac{1}{K} |A|, \frac{1}{K} |B|;

note that this is consistent with the previous notion of commensurability for genuine groups.

Theorem 3 Let {G} be a group, and let {K_1,K_2,K_3 \geq 1} be real numbers. Let {A} be a finite {K_1}-approximate group, and let {B} be a symmetric subset of {G} that contains {A}.

  • (i) If {B} is a {K_2}-approximate group with {|B| \leq K_3 |A|}, then one has {B \subset SA} for some set {S} of cardinality at most {K_1 K_2 K_3}. Furthermore, for each {s \in S}, the approximate groups {A} and {A^s} are {K_1 K_2^5 K_3}-commensurate.
  • (ii) Conversely, if {B \subset SA} for some set {S} of cardinality at most {K_3}, and {A} and {A^s} are {K_2}-commensurate for all {s \in S}, then {|B| \leq K_3 |A|}, and {B} is a {K_1^6 K_2 K_3^2}-approximate group.

Informally, the assertion that {B} is an approximate group containing {A} as a “bounded index approximate subgroup” is equivalent to {B} being covered by a bounded number of shifts {sA} of {A}, where {s} approximately normalises {A^2} in the sense that {A^2} and {(A^2)^s} have large intersection. Thus, to classify all such {B}, the problem essentially reduces to that of classifying those {s} that approximately normalise {A^2}.

To prove the theorem, we recall some standard lemmas from arithmetic combinatorics, which are the foundation stones of the “Ruzsa calculus” that we will use to establish our results:

Lemma 4 (Ruzsa covering lemma) If {A} and {B} are finite non-empty subsets of {G}, then one has {B \subset SAA^{-1}} for some set {S \subset B} with cardinality {|S| \leq \frac{|BA|}{|A|}}.

Proof: We take {S} to be a subset of {B} with the property that the translates {sA, s \in S} are disjoint in {BA}, and such that {S} is maximal with respect to set inclusion. The required properties of {S} are then easily verified. \Box

Lemma 5 (Ruzsa triangle inequality) If {A,B,C} are finite non-empty subsets of {G}, then

\displaystyle  |A C^{-1}| \leq |A B^{-1}| |B C^{-1}| / |B|.

Proof: If {ac^{-1}} is an element of {AC^{-1}} with {a \in A} and {c \in C}, then from the identity {ac^{-1} = (ab^{-1}) (bc^{-1})} we see that {ac^{-1}} can be written as the product of an element of {AB^{-1}} and an element of {BC^{-1}} in at least {|B|} distinct ways. The claim follows. \Box

Now we can prove (i). By the Ruzsa covering lemma, {B} can be covered by at most

\displaystyle \frac{|BA|}{|A|} \leq \frac{|B^2|}{|A|} \leq \frac{K_2 |B|}{|A|} \leq K_2 K_3

left-translates of {A^2}, and hence by at most {K_1 K_2 K_3} left-translates of {A}, thus {B \subset SA} for some {|S| \leq K_1 K_2 K_3}. Since {sA} only intersects {B} if {s \in BA}, we may assume that

\displaystyle  S \subset BA \subset B^2

and hence for any {s \in S}

\displaystyle  |A^s A| \leq |B^2 A B^2 A| \leq |B^6|

\displaystyle  \leq K_2^5 |B| \leq K_2^5 K_3 |A|.

By the Ruzsa covering lemma again, this implies that {A^s} can be covered by at most {K_2^5 K_3} left-translates of {A^2}, and hence by at most {K_1 K_2^5 K_3} left-translates of {A}. By the pigeonhole principle, there thus exists a group element {g} with

\displaystyle  |A^s \cap gA| \geq \frac{1}{K_1 K_2^5 K_3} |A|.

Since

\displaystyle  |A^s \cap gA| \leq | (A^s \cap gA)^{-1} (A^s \cap gA)|

and

\displaystyle  (A^s \cap gA)^{-1} (A^s \cap gA) \subset A^2 \cap (A^s)^2

the claim follows.

Now we prove (ii). Clearly

\displaystyle  |B| \leq |S| |A| \leq K_3 |A|.

Now we control the size of {B^2 A}. We have

\displaystyle  |B^2 A| \leq |SA SA^2| \leq K_3^2 \sup_{s \in S} |A s A^2| = K_3^2 \sup_{s \in S} |A^s A^2|.

From the Ruzsa triangle inequality and symmetry we have

\displaystyle  |A^s A^2| \leq \frac{ |A^s (A^2 \cap (A^2)^s)| |(A^2 \cap (A^2)^s) A^2|}{|A^2 \cap (A^2)^s|}

\displaystyle  \leq \frac{ |(A^3)^s| |A^4| }{|A|/K_2}

\displaystyle  \leq K_2 \frac{ |A^3| |A^4| }{|A|}

\displaystyle  \leq K_1^5 K_2 |A|

and thus

\displaystyle  |B^2 A| \leq K_1^5 K_2 K_3^2 |A|.

By the Ruzsa covering lemma, this implies that {B^2} is covered by at most {K_1^5 K_2 K_3^2} left-translates of {A^2}, hence by at most {K_1^6 K_2 K_3^2} left-translates of {A}. Since {A \subset B}, the claim follows.

We now establish some auxiliary propositions about commensurability of approximate groups. The first claim is that commensurability is approximately transitive:

Proposition 6 Let {A} be a {K_1}-approximate group, {B} be a {K_2}-approximate group, and {C} be a {K_3}-approximate group. If {A} and {B} are {K_4}-commensurate, and {B} and {C} are {K_5}-commensurate, then {A} and {C} are {K_1^2 K_2^3 K_2^3 K_4 K_5 \max(K_1,K_3)}-commensurate.

Proof: From two applications of the Ruzsa triangle inequality we have

\displaystyle  |AC| \leq \frac{|A (A^2 \cap B^2)| |(A^2 \cap B^2) (B^2 \cap C^2)| |(B^2 \cap C^2) C|}{|A^2 \cap B^2| |B^2 \cap C^2|}

\displaystyle  \leq \frac{|A^3| |B^4| |C^3|}{ (|A|/K_4) (|B|/K_5)}

\displaystyle  \leq K_4 K_5 \frac{K_1^2 |A| K_2^3 |B| K_3^2 |C|}{ |A| |B| }

\displaystyle  = K_1^2 K_2^3 K_3^2 K_4 K_5 |C|.

By the Ruzsa covering lemma, we may thus cover {A} by at most {K_1^2 K_2^3 K_3^2 K_4 K_5} left-translates of {C^2}, and hence by {K_1^2 K_2^3 K_3^3 K_4 K_5} left-translates of {C}. By the pigeonhole principle, there thus exists a group element {g} such that

\displaystyle  |A \cap gC| \geq \frac{1}{K_1^2 K_2^3 K_3^3 K_4 K_5} |A|,

and so by arguing as in the proof of part (i) of the theorem we have

\displaystyle  |A^2 \cap C^2| \geq \frac{1}{K_1^2 K_2^3 K_3^3 K_4 K_5} |A|

and similarly

\displaystyle  |A^2 \cap C^2| \geq \frac{1}{K_1^3 K_2^3 K_3^2 K_4 K_5} |C|

and the claim follows. \Box

The next proposition asserts that the union and (modified) intersection of two commensurate approximate groups is again an approximate group:

Proposition 7 Let {A} be a {K_1}-approximate group, {B} be a {K_2}-approximate group, and suppose that {A} and {B} are {K_3}-commensurate. Then {A \cup B} is a {K_1 + K_2 + K_1^2 K_2^4 K_3 + K_1^4 K_2^2 K_3}-approximate subgroup, and {A^2 \cap B^2} is a {K_1^6 K_2^3 K_3}-approximate subgroup.

Using this proposition, one may obtain a variant of the previous theorem where the containment {A \subset B} is replaced by commensurability; we leave the details to the interested reader.

Proof: We begin with {A \cup B}. Clearly {A \cup B} is symmetric and contains the identity. We have {(A \cup B)^2 = A^2 \cup AB \cup BA \cup B^2}. The set {A^2} is already covered by {K_1} left translates of {A}, and hence of {A \cup B}; similarly {B^2} is covered by {K_2} left translates of {A \cup B}. As for {AB}, we observe from the Ruzsa triangle inequality that

\displaystyle  |AB^2| \leq \frac{|A (A^2 \cap B^2)| |(A^2 \cap B^2) B^2|}{|A^2 \cap B^2|}

\displaystyle  \leq \frac{|A^3| |B^4|}{|A|/K_3}

\displaystyle  \leq K_1^2 K_2^3 K_3 |B|

and hence by the Ruzsa covering lemma, {AB} is covered by at most {K_1^2 K_2^3 K_3} left translates of {B^2}, and hence by {K_1^2 K_2^4 K_3} left translates of {B}, and hence of {A \cup B}. Similarly {BA} is covered by at most {K_1^4 K_2^2 K_3} left translates of {B}. The claim follows.

Now we consider {A^2 \cap B^2}. Again, this is clearly symmetric and contains the identity. Repeating the previous arguments, we see that {A} is covered by at most {K_1^2 K_2^3 K_3} left-translates of {B}, and hence there exists a group element {g} with

\displaystyle  |A \cap gB| \geq \frac{1}{K_1^2 K_2^3 K_3} |A|.

Now observe that

\displaystyle  |(A^2 \cap B^2)^2 (A \cap gB)| \leq |A^5| \leq K_1^4 |A|

and so by the Ruzsa covering lemma, {(A^2 \cap B^2)^2} can be covered by at most {K_1^6 K_2^3 K_3} left-translates of {(A \cap gB) (A \cap gB)^{-1}}. But this latter set is (as observed previously) contained in {A^2 \cap B^2}, and the claim follows. \Box

Because of Euler’s identity {e^{\pi i} + 1 = 0}, the complex exponential is not injective: {e^{z + 2\pi i k} = e^z} for any complex {z} and integer {k}. As such, the complex logarithm {z \mapsto \log z} is not well-defined as a single-valued function from {{\bf C} \backslash \{0\}} to {{\bf C}}. However, after making a branch cut, one can create a branch of the logarithm which is single-valued. For instance, after removing the negative real axis {(-\infty,0]}, one has the standard branch {\hbox{Log}: {\bf C} \backslash (-\infty,0] \rightarrow \{ z \in {\bf C}: |\hbox{Im} z| < \pi \}} of the logarithm, with {\hbox{Log}(z)} defined as the unique choice of the complex logarithm of {z} whose imaginary part has magnitude strictly less than {\pi}. This particular branch has a number of useful additional properties:

  • The standard branch {\hbox{Log}} is holomorphic on its domain {{\bf C} \backslash (-\infty,0]}.
  • One has {\hbox{Log}( \overline{z} ) = \overline{ \hbox{Log}(z) }} for all {z} in the domain {{\bf C} \backslash (-\infty,0]}. In particular, if {z \in {\bf C} \backslash (-\infty,0]} is real, then {\hbox{Log} z} is real.
  • One has {\hbox{Log}( z^{-1} ) = - \hbox{Log}(z)} for all {z} in the domain {{\bf C} \backslash (-\infty,0]}.

One can then also use the standard branch of the logarithm to create standard branches of other multi-valued functions, for instance creating a standard branch {z \mapsto \exp( \frac{1}{2} \hbox{Log} z )} of the square root function. We caution however that the identity {\hbox{Log}(zw) = \hbox{Log}(z) + \hbox{Log}(w)} can fail for the standard branch (or indeed for any branch of the logarithm).

One can extend this standard branch of the logarithm to {n \times n} complex matrices, or (equivalently) to linear transformations {T: V \rightarrow V} on an {n}-dimensional complex vector space {V}, provided that the spectrum of that matrix or transformation avoids the branch cut {(-\infty,0]}. Indeed, from the spectral theorem one can decompose any such {T: V \rightarrow V} as the direct sum of operators {T_\lambda: V_\lambda \rightarrow V_\lambda} on the non-trivial generalised eigenspaces {V_\lambda} of {T}, where {\lambda \in {\bf C} \backslash (-\infty,0]} ranges in the spectrum of {T}. For each component {T_\lambda} of {T}, we define

\displaystyle  \hbox{Log}( T_\lambda ) = P_\lambda( T_\lambda )

where {P_\lambda} is the Taylor expansion of {\hbox{Log}} at {\lambda}; as {T_\lambda-\lambda} is nilpotent, only finitely many terms in this Taylor expansion are required. The logarithm {\hbox{Log} T} is then defined as the direct sum of the {\hbox{Log} T_\lambda}.

The matrix standard branch of the logarithm has many pleasant and easily verified properties (often inherited from their scalar counterparts), whenever {T: V \rightarrow V} has no spectrum in {(-\infty,0]}:

  • (i) We have {\exp( \hbox{Log} T ) = T}.
  • (ii) If {T_1: V_1 \rightarrow V_1} and {T_2: V_2 \rightarrow V_2} have no spectrum in {(-\infty,0]}, then {\hbox{Log}( T_1 \oplus T_2 ) = \hbox{Log}(T_1) \oplus \hbox{Log}(T_2)}.
  • (iii) If {T} has spectrum in a closed disk {B(z,r)} in {{\bf C} \backslash (-\infty,0]}, then {\hbox{Log}(T) = P_z(T)}, where {P_z} is the Taylor series of {\hbox{Log}} around {z} (which is absolutely convergent in {B(z,r)}).
  • (iv) {\hbox{Log}(T)} depends holomorphically on {T}. (Easily established from (ii), (iii), after covering the spectrum of {T} by disjoint disks; alternatively, one can use the Cauchy integral representation {\hbox{Log}(T) = \frac{1}{2\pi i} \int_\gamma \hbox{Log}(z)(z-T)^{-1}\ dz} for a contour {\gamma} in the domain enclosing the spectrum of {T}.) In particular, the standard branch of the matrix logarithm is smooth.
  • (v) If {S: V \rightarrow W} is any invertible linear or antilinear map, then {\hbox{Log}( STS^{-1} ) = S \hbox{Log}(T) S^{-1}}. In particular, the standard branch of the logarithm commutes with matrix conjugations; and if {T} is real with respect to a complex conjugation operation on {V} (that is to say, an antilinear involution), then {\hbox{Log}(T)} is real also.
  • (vi) If {T^*: V^* \rightarrow V^*} denotes the transpose of {T} (with {V^*} the complex dual of {V}), then {\hbox{Log}(T^*) = \hbox{Log}(T)^*}. Similarly, if {T^\dagger: V^\dagger \rightarrow V^\dagger} denotes the adjoint of {T} (with {V^\dagger} the complex conjugate of {V^*}, i.e. {V^*} with the conjugated multiplication map {(c,z) \mapsto \overline{c} z}), then {\hbox{Log}(T^\dagger) = \hbox{Log}(T)^\dagger}.
  • (vii) One has {\hbox{Log}(T^{-1}) = - \hbox{Log}( T )}.
  • (viii) If {\sigma(T)} denotes the spectrum of {T}, then {\sigma(\hbox{Log} T) = \hbox{Log} \sigma(T)}.

As a quick application of the standard branch of the matrix logarithm, we have

Proposition 1 Let {G} be one of the following matrix groups: {GL_n({\bf C})}, {GL_n({\bf R})}, {U_n({\bf C})}, {O(Q)}, {Sp_{2n}({\bf C})}, or {Sp_{2n}({\bf R})}, where {Q: {\bf R}^n \rightarrow {\bf R}} is a non-degenerate real quadratic form (so {O(Q)} is isomorphic to a (possibly indefinite) orthogonal group {O(k,n-k)} for some {0 \leq k \leq n}. Then any element {T} of {G} whose spectrum avoids {(-\infty,0]} is exponential, that is to say {T = \exp(X)} for some {X} in the Lie algebra {{\mathfrak g}} of {G}.

Proof: We just prove this for {G=O(Q)}, as the other cases are similar (or a bit simpler). If {T \in O(Q)}, then (viewing {T} as a complex-linear map on {{\bf C}^n}, and using the complex bilinear form associated to {Q} to identify {{\bf C}^n} with its complex dual {({\bf C}^n)^*}, then {T} is real and {T^{*-1} = T}. By the properties (v), (vi), (vii) of the standard branch of the matrix logarithm, we conclude that {\hbox{Log} T} is real and {- \hbox{Log}(T)^* = \hbox{Log}(T)}, and so {\hbox{Log}(T)} lies in the Lie algebra {{\mathfrak g} = {\mathfrak o}(Q)}, and the claim now follows from (i). \Box

Exercise 2 Show that {\hbox{diag}(-\lambda, -1/\lambda)} is not exponential in {GL_2({\bf R})} if {-\lambda \in (-\infty,0) \backslash \{-1\}}. Thus we see that the branch cut in the above proposition is largely necessary. See this paper of Djokovic for a more complete description of the image of the exponential map in classical groups, as well as this previous blog post for some more discussion of the surjectivity (or lack thereof) of the exponential map in Lie groups.

For a slightly less quick application of the standard branch, we have the following result (recently worked out in the answers to this MathOverflow question):

Proposition 3 Let {T} be an element of the split orthogonal group {O(n,n)} which lies in the connected component of the identity. Then {\hbox{det}(1+T) \geq 0}.

The requirement that {T} lie in the identity component is necessary, as the counterexample {T = \hbox{diag}(-\lambda, -1/\lambda )} for {\lambda \in (-\infty,-1) \cup (-1,0)} shows.

Proof: We think of {T} as a (real) linear transformation on {{\bf C}^{2n}}, and write {Q} for the quadratic form associated to {O(n,n)}, so that {O(n,n) \equiv O(Q)}. We can split {{\bf C}^{2n} = V_1 \oplus V_2}, where {V_1} is the sum of all the generalised eigenspaces corresponding to eigenvalues in {(-\infty,0]}, and {V_2} is the sum of all the remaining eigenspaces. Since {T} and {(-\infty,0]} are real, {V_1,V_2} are real (i.e. complex-conjugation invariant) also. For {i=1,2}, the restriction {T_i: V_i \rightarrow V_i} of {T} to {V_i} then lies in {O(Q_i)}, where {Q_i} is the restriction of {Q} to {V_i}, and

\displaystyle  \hbox{det}(1+T) = \hbox{det}(1+T_1) \hbox{det}(1+T_2).

The spectrum of {T_2} consists of positive reals, as well as complex pairs {\lambda, \overline{\lambda}} (with equal multiplicity), so {\hbox{det}(1+T_2) > 0}. From the preceding proposition we have {T_2 = \exp( X_2 )} for some {X_2 \in {\mathfrak o}(Q_2)}; this will be important later.

It remains to show that {\hbox{det}(1+T_1) \geq 0}. If {T_1} has spectrum at {-1} then we are done, so we may assume that {T_1} has spectrum only at {(-\infty,-1) \cup (-1,0)} (being invertible, {T} has no spectrum at {0}). We split {V_1 = V_3 \oplus V_4}, where {V_3,V_4} correspond to the portions of the spectrum in {(-\infty,-1)}, {(-1,0)}; these are real, {T}-invariant spaces. We observe that if {V_\lambda, V_\mu} are generalised eigenspaces of {T} with {\lambda \mu \neq 1}, then {V_\lambda, V_\mu} are orthogonal with respect to the (complex-bilinear) inner product {\cdot} associated with {Q}; this is easiest to see first for the actual eigenspaces (since { \lambda \mu u \cdot v = Tu \cdot Tv = u \cdot v} for all {u \in V_\lambda, v \in V_\mu}), and the extension to generalised eigenvectors then follows from a routine induction. From this we see that {V_1} is orthogonal to {V_2}, and {V_3} and {V_4} are null spaces, which by the non-degeneracy of {Q} (and hence of the restriction {Q_1} of {Q} to {V_1}) forces {V_3} to have the same dimension as {V_4}, indeed {Q} now gives an identification of {V_3^*} with {V_4}. If we let {T_3, T_4} be the restrictions of {T} to {V_3,V_4}, we thus identify {T_4} with {T_3^{*-1}}, since {T} lies in {O(Q)}; in particular {T_3} is invertible. Thus

\displaystyle  \hbox{det}(1+T_1) = \hbox{det}(1 + T_3) \hbox{det}( 1 + T_3^{*-1} ) = \hbox{det}(T_3)^{-1} \hbox{det}(1+T_3)^2

and so it suffices to show that {\hbox{det}(T_3) > 0}.

At this point we need to use the hypothesis that {T} lies in the identity component of {O(n,n)}. This implies (by a continuity argument) that the restriction of {T} to any maximal-dimensional positive subspace has positive determinant (since such a restriction cannot be singular, as this would mean that {T} positive norm vector would map to a non-positive norm vector). Now, as {V_3,V_4} have equal dimension, {Q_1} has a balanced signature, so {Q_2} does also. Since {T_2 = \exp(X_2)}, {T_2} already lies in the identity component of {O(Q_2)}, and so has positive determinant on any maximal-dimensional positive subspace of {V_2}. We conclude that {T_1} has positive determinant on any maximal-dimensional positive subspace of {V_1}.

We choose a complex basis of {V_3}, to identify {V_3} with {V_3^*}, which has already been identified with {V_4}. (In coordinates, {V_3,V_4} are now both of the form {{\bf C}^m}, and {Q( v \oplus w ) = v \cdot w} for {v,w \in {\bf C}^m}.) Then {\{ v \oplus v: v \in V_3 \}} becomes a maximal positive subspace of {V_1}, and the restriction of {T_1} to this subspace is conjugate to {T_3 + T_3^{*-1}}, so that

\displaystyle  \hbox{det}( T_3 + T_3^{*-1} ) > 0.

But since {\hbox{det}( T_3 + T_3^{*-1} ) = \hbox{det}(T_3) \hbox{det}( 1 + T_3^{-1} T_3^{*-1} )} and { 1 + T_3^{-1} T_3^{*-1}} is positive definite, so {\hbox{det}(T_3)>0} as required. \Box

Archives