Joni Teräväinen and myself have just uploaded to the arXiv our preprint “Quantitative bounds for Gowers uniformity of the Möbius and von Mangoldt functions”. This paper makes quantitative the Gowers uniformity estimates on the Möbius function ${\mu}$ and the von Mangoldt function ${\Lambda}$.

To discuss the results we first discuss the situation of the Möbius function, which is technically simpler in some (though not all) ways. We assume familiarity with Gowers norms and standard notations around these norms, such as the averaging notation ${\mathop{\bf E}_{n \in [N]}}$ and the exponential notation ${e(\theta) = e^{2\pi i \theta}}$. The prime number theorem in qualitative form asserts that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) = o(1)$

as ${N \rightarrow \infty}$. With Vinogradov-Korobov error term, the prime number theorem is strengthened to

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} );$

we refer to such decay bounds (with ${\exp(-c\log^c N)}$ type factors) as pseudopolynomial decay. Equivalently, we obtain pseudopolynomial decay of the Gowers ${U^1}$ seminorm of ${\mu}$:

$\displaystyle \| \mu \|_{U^1([N])} \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} ).$

As is well known, the Riemann hypothesis would be equivalent to an upgrade of this estimate to polynomial decay of the form

$\displaystyle \| \mu \|_{U^1([N])} \ll_\varepsilon N^{-1/2+\varepsilon}$

for any ${\varepsilon>0}$.
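For a concrete feel for these ${U^1}$ averages, here is a minimal Python sketch (an illustration only, not taken from the paper) that sieves ${\mu}$ up to ${N}$ and evaluates ${|\mathop{\bf E}_{n \in [N]} \mu(n)|}$. The function names `mobius_sieve` and `u1_norm` are hypothetical.

```python
def mobius_sieve(N):
    """Return a list mu with mu[n] equal to the Mobius function of n for 1 <= n <= N.

    mu[0] is unused and set to 0.  This is a standard linear sieve.
    """
    mu = [1] * (N + 1)
    mu[0] = 0
    is_composite = [False] * (N + 1)
    primes = []
    for i in range(2, N + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > N:
                break
            is_composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0      # p^2 divides i * p
                break
            mu[i * p] = -mu[i]     # one extra prime factor flips the sign
    return mu


def u1_norm(N):
    """Absolute value of the average of mu over [1, N]."""
    mu = mobius_sieve(N)
    return abs(sum(mu[1:])) / N


for N in (10**2, 10**3, 10**4, 10**5):
    # Last column: the scale N^(-1/2) relevant to the Riemann hypothesis.
    print(N, u1_norm(N), N ** -0.5)
```

At such small values of ${N}$ the comparison with ${N^{-1/2}}$ is only suggestive; no asymptotic conclusion should be read off from it.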

Once one restricts to arithmetic progressions, the situation gets worse: the Siegel-Walfisz theorem gives the bound

$\displaystyle \| \mu 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll_A \log^{-A} N \ \ \ \ \ (1)$

for any residue class ${a \hbox{ mod } q}$ and any ${A>0}$, but with the catch that the implied constant is ineffective in ${A}$. This ineffectivity cannot be removed without further progress on the notorious Siegel zero problem.

In 1937, Davenport was able to show the discorrelation estimate

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) e(-\alpha n) \ll_A \log^{-A} N$

for any ${A>0}$ uniformly in ${\alpha \in {\bf R}}$, which leads (by standard Fourier arguments) to the Fourier uniformity estimate

$\displaystyle \| \mu \|_{U^2([N])} \ll_A \log^{-A} N.$

Again, the implied constant is ineffective. If one insists on effective constants, the best bound currently available is

$\displaystyle \| \mu \|_{U^2([N])} \ll \log^{-c} N \ \ \ \ \ (2)$

for some small effective constant ${c>0}$.
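The discorrelation averages ${\mathop{\bf E}_{n \in [N]} \mu(n) e(-\alpha n)}$ can also be explored numerically. The following is a rough sketch, under the assumption that sampling ${\alpha}$ on a finite grid is informative; the grid maximum is only a lower bound for the supremum over all ${\alpha \in {\bf R}}$, and the names `fourier_average` and `sup_on_grid` are hypothetical.

```python
import cmath


def mobius_sieve(N):
    """Linear sieve: mu[n] is the Mobius function of n for 1 <= n <= N (mu[0] = 0)."""
    mu = [1] * (N + 1)
    mu[0] = 0
    is_composite = [False] * (N + 1)
    primes = []
    for i in range(2, N + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > N:
                break
            is_composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0
                break
            mu[i * p] = -mu[i]
    return mu


def fourier_average(mu, alpha):
    """Average of mu(n) e(-alpha n) over 1 <= n <= N, where N = len(mu) - 1."""
    N = len(mu) - 1
    total = sum(mu[n] * cmath.exp(-2j * cmath.pi * alpha * n) for n in range(1, N + 1))
    return total / N


def sup_on_grid(N, grid_size):
    """Largest |fourier_average| over alpha = k / grid_size, 0 <= k < grid_size."""
    mu = mobius_sieve(N)
    return max(abs(fourier_average(mu, k / grid_size)) for k in range(grid_size))


print(sup_on_grid(1000, 200))
```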

For the situation with the ${U^3}$ norm the previously known results were much weaker. Ben Green and I showed that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \ll_{A,F,G/\Gamma} \log^{-A} N \ \ \ \ \ (3)$

uniformly for any ${A>0}$, any degree two (filtered) nilmanifold ${G/\Gamma}$, any polynomial sequence ${g: {\bf Z} \rightarrow G}$, and any Lipschitz function ${F}$; again, the implied constants are ineffective. On the other hand, in a separate paper of Ben Green and myself, we established the following inverse theorem: if for instance we knew that

$\displaystyle \| \mu \|_{U^3([N])} \geq \delta$

for some ${0 < \delta < 1/2}$, then there exists a degree two nilmanifold ${G/\Gamma}$ of dimension ${O( \delta^{-O(1)} )}$, complexity ${O( \delta^{-O(1)} )}$, a polynomial sequence ${g: {\bf Z} \rightarrow G}$, and Lipschitz function ${F}$ of Lipschitz constant ${O(\delta^{-O(1)})}$ such that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \gg \exp(-\delta^{-O(1)}).$

Putting the two assertions together and comparing all the dependencies on parameters, one can establish the qualitative decay bound

$\displaystyle \| \mu \|_{U^3([N])} = o(1).$

However the decay rate ${o(1)}$ produced by this argument is completely ineffective: obtaining a bound on when this ${o(1)}$ quantity dips below a given threshold ${\delta}$ depends on the implied constant in (3) for some ${G/\Gamma}$ whose dimension depends on ${\delta}$, and the dependence on ${\delta}$ obtained in this fashion is ineffective in the face of a Siegel zero.

For higher norms ${U^k, k \geq 3}$, the situation is even worse, because the quantitative inverse theory for these norms is poorer, and indeed it was only with the recent work of Manners that any such bound is available at all (at least for ${k>4}$). Basically, Manners establishes that if

$\displaystyle \| \mu \|_{U^k([N])} \geq \delta$

then there exists a degree ${k-1}$ nilmanifold ${G/\Gamma}$ of dimension ${O( \delta^{-O(1)} )}$, complexity ${O( \exp\exp(\delta^{-O(1)}) )}$, a polynomial sequence ${g: {\bf Z} \rightarrow G}$, and Lipschitz function ${F}$ of Lipschitz constant ${O(\exp\exp(\delta^{-O(1)}))}$ such that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \gg \exp(-\exp(\delta^{-O(1)})).$

(We allow all implied constants to depend on ${k}$.) Meanwhile, the bound (3) was extended to arbitrary nilmanifolds by Ben and myself. Again, the two results when concatenated give the qualitative decay

$\displaystyle \| \mu \|_{U^k([N])} = o(1)$

but the decay rate is completely ineffective.

Our first result gives an effective decay bound:

Theorem 1 For any ${k \geq 2}$, we have ${\| \mu \|_{U^k([N])} \ll (\log\log N)^{-c_k}}$ for some ${c_k>0}$. The implied constants are effective.

This is off by a logarithm from the best effective bound (2) in the ${k=2}$ case. In the ${k=3}$ case there is some hope to remove this logarithm based on the improved quantitative inverse theory currently available in this case, but there is a technical obstruction to doing so which we will discuss later in this post. For ${k>3}$ the above bound is the best one could hope to achieve purely using the quantitative inverse theory of Manners.

We have analogues of all the above results for the von Mangoldt function ${\Lambda}$. Here a complication arises: ${\Lambda}$ does not have mean close to zero, and one has to subtract off some suitable approximant ${\Lambda^\sharp}$ to ${\Lambda}$ before one would expect good Gowers norm bounds. For the prime number theorem one can just use the approximant ${1}$, giving

$\displaystyle \| \Lambda - 1 \|_{U^1([N])} \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} )$

but even for the prime number theorem in arithmetic progressions one needs a more accurate approximant. In our paper it is convenient to use the “Cramér approximant”

$\displaystyle \Lambda_{\hbox{Cram\'er}}(n) := \frac{W}{\phi(W)} 1_{(n,W)=1}$

where

$\displaystyle W := \prod_{p < Q} p$

and ${Q}$ is the pseudopolynomial quantity

$\displaystyle Q = \exp(\log^{1/10} N). \ \ \ \ \ (4)$
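To see why an approximant of this shape is better adapted to arithmetic progressions than the constant ${1}$, one can experiment with a toy version. The sketch below assumes that ${W}$ is the product of the primes below ${Q}$, and uses a tiny illustrative value of ${Q}$ (far smaller than the choice (4)); the helper names are hypothetical.

```python
from fractions import Fraction
from math import gcd, isqrt


def primes_below(Q):
    """Primes p with p < Q, by trial division (fine for tiny Q)."""
    return [p for p in range(2, Q) if all(p % d for d in range(2, isqrt(p) + 1))]


def cramer_approximant(Q):
    """Return (W, f) where f(n) = W/phi(W) if gcd(n, W) = 1, and 0 otherwise."""
    W, phi_W = 1, 1
    for p in primes_below(Q):
        W *= p
        phi_W *= p - 1
    weight = Fraction(W, phi_W)

    def approximant(n):
        return weight if gcd(n, W) == 1 else Fraction(0)

    return W, approximant


def average(f, a, q, N):
    """Exact average of f over those n in [1, N] with n = a mod q."""
    values = [f(n) for n in range(1, N + 1) if n % q == a % q]
    return sum(values, Fraction(0)) / len(values)


W, lam = cramer_approximant(8)   # W = 2 * 3 * 5 * 7 = 210
print(average(lam, 0, 1, W))     # 1
print(average(lam, 1, 3, W))     # 3/2
print(average(lam, 0, 3, W))     # 0
```

Over a full period the toy approximant has mean exactly ${1}$, while on the residue classes ${1 \hbox{ mod } 3}$ and ${0 \hbox{ mod } 3}$ it has means ${3/2}$ and ${0}$, matching the expected density of primes in those classes.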

Then one can show from the Siegel-Walfisz theorem and standard bilinear sum methods that

$\displaystyle \mathop{\bf E}_{n \in [N]} (\Lambda - \Lambda_{\hbox{Cram\'er}})(n) e(-\alpha n) \ll_A \log^{-A} N$

and

$\displaystyle \| \Lambda - \Lambda_{\hbox{Cram\'er}}\|_{U^2([N])} \ll_A \log^{-A} N$

for all ${A>0}$ and ${\alpha \in {\bf R}}$ (with an ineffective dependence on ${A}$), again regaining effectivity if ${A}$ is replaced by a sufficiently small constant ${c>0}$. All the previously stated discorrelation and Gowers uniformity results for ${\mu}$ then have analogues for ${\Lambda}$, and our main result is similarly analogous:

Theorem 2 For any ${k \geq 2}$, we have ${\| \Lambda - \Lambda_{\hbox{Cram\'er}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}}$ for some ${c_k>0}$. The implied constants are effective.

By standard methods, this result also gives quantitative asymptotics for counting solutions to various systems of linear equations in primes, with error terms that gain a factor of ${O((\log\log N)^{-c})}$ with respect to the main term.

We now discuss the methods of proof, focusing first on the case of the Möbius function. Suppose first that there is no “Siegel zero”, by which we mean a quadratic character ${\chi}$ of some conductor ${q \leq Q}$ with a zero ${L(\beta,\chi)=0}$ with ${1 - \beta \leq \frac{c}{\log Q}}$ for some small absolute constant ${c>0}$. In this case the Siegel-Walfisz bound (1) improves to a pseudopolynomial bound

$\displaystyle \| \mu 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N). \ \ \ \ \ (5)$

To establish Theorem 1 in this case, it suffices by Manners’ inverse theorem to establish the pseudopolynomial bound

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N) \ \ \ \ \ (6)$

for all degree ${k-1}$ nilmanifolds ${G/\Gamma}$ of dimension ${O((\log\log N)^c)}$ and complexity ${O( \exp(\log^c N))}$, all polynomial sequences ${g}$, and all Lipschitz functions ${F}$ of norm ${O( \exp(\log^c N))}$. If the nilmanifold ${G/\Gamma}$ had bounded dimension, then one could repeat the arguments of Ben and myself more or less verbatim to establish this claim from (5), which relied on the quantitative equidistribution theory on nilmanifolds developed in a separate paper of Ben and myself. Unfortunately, in the latter paper the dependence of the quantitative bounds on the dimension ${d}$ was not explicitly given. In an appendix to the current paper, we go through that paper to account for this dependence, showing that all exponents depend at most doubly exponentially in the dimension ${d}$, which is barely sufficient to handle the dimension of ${O((\log\log N)^c)}$ that arises here.

Now suppose we have a Siegel zero ${L(\beta,\chi)}$. In this case the bound (5) will not hold in general, and hence (6) will not hold either. Here, the usual way out (while still maintaining effective estimates) is to approximate ${\mu}$ not by ${0}$, but rather by a more complicated approximant ${\mu_{\hbox{Siegel}}}$ that takes the Siegel zero into account, and in particular is such that one has the (effective) pseudopolynomial bound

$\displaystyle \| (\mu - \mu_{\hbox{Siegel}}) 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N) \ \ \ \ \ (7)$

for all residue classes ${a \hbox{ mod } q}$. The Siegel approximant to ${\mu}$ is actually a little bit complicated, and to our knowledge this sort of approximant first appears only as late as this 2010 paper of Germán and Katai. Our version of this approximant is defined as the multiplicative function such that

$\displaystyle \mu_{\hbox{Siegel}}(p^j) = \mu(p^j)$

when ${p < Q}$, and

$\displaystyle \mu_{\hbox{Siegel}}(n) = \alpha n^{\beta-1} \chi(n)$

when ${n}$ is coprime to all primes ${p < Q}$, and ${\alpha}$ is a normalising constant given by the formula

$\displaystyle \alpha := \frac{1}{L'(\beta,\chi)} \prod_{p < Q} \dots$

(this constant ends up being of size ${O(1)}$ and plays only a minor role in the analysis). This is a rather complicated formula, but it seems to be virtually the only choice of approximant that allows for bounds such as (7) to hold. (This is the one aspect of the problem where the von Mangoldt theory is simpler than the Möbius theory, as in the former one only needs to work with very rough numbers for which one does not need to make any special accommodations for the behavior at small primes when introducing the Siegel correction term.) With this starting point it is then possible to repeat the analysis of my previous papers with Ben and obtain the pseudopolynomial discorrelation bound

$\displaystyle \mathop{\bf E}_{n \in [N]} (\mu - \mu_{\hbox{Siegel}})(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N)$

for ${F(g(n)\Gamma)}$ as before, which when combined with Manners’ inverse theorem gives the doubly logarithmic bound

$\displaystyle \| \mu - \mu_{\hbox{Siegel}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}.$

Meanwhile, a direct sieve-theoretic computation ends up giving the singly logarithmic bound

$\displaystyle \| \mu_{\hbox{Siegel}} \|_{U^k([N])} \ll \log^{-c_k} N$

(indeed, there is a good chance that one could improve the bounds even further, though it is not helpful for this current argument to do so). Theorem 1 then follows from the triangle inequality for the Gowers norm. It is interesting that the Siegel approximant ${\mu_{\hbox{Siegel}}}$ seems to be a rather essential component of the proof, even if it is absent in the final statement. We note that this approximant seems to be a useful tool to explore the “illusory world” of the Siegel zero further; see for instance the recent paper of Chinis for some work in this direction.

For the analogous problem with the von Mangoldt function (assuming a Siegel zero for sake of discussion), the approximant ${\Lambda_{\hbox{Siegel}}}$ is simpler; we ended up using

$\displaystyle \Lambda_{\hbox{Siegel}}(n) = \Lambda_{\hbox{Cram\'er}}(n) (1 - n^{\beta-1} \chi(n))$

which allows one to state the standard prime number theorem in arithmetic progressions with classical error term and Siegel zero term compactly as

$\displaystyle \| (\Lambda - \Lambda_{\hbox{Siegel}}) 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N).$

Routine modifications of previous arguments also give

$\displaystyle \mathop{\bf E}_{n \in [N]} (\Lambda - \Lambda_{\hbox{Siegel}})(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N) \ \ \ \ \ (8)$

and

$\displaystyle \| \Lambda_{\hbox{Siegel}} \|_{U^k([N])} \ll \log^{-c_k} N.$

The one tricky new step is getting from the discorrelation estimate (8) to the Gowers uniformity estimate

$\displaystyle \| \Lambda - \Lambda_{\hbox{Siegel}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}.$

One cannot directly apply Manners’ inverse theorem here because ${\Lambda}$ and ${\Lambda_{\hbox{Siegel}}}$ are unbounded. There is a standard tool for getting around this issue, now known as the dense model theorem, which is the standard engine powering the transference principle from theorems about bounded functions to theorems about certain types of unbounded functions. However the quantitative versions of the dense model theorem in the literature are expensive and would basically weaken the doubly logarithmic gain here to a triply logarithmic one. Instead, we bypass the dense model theorem and directly transfer the inverse theorem for bounded functions to an inverse theorem for unbounded functions by using the densification approach to transference introduced by Conlon, Fox, and Zhao. This technique turns out to be quantitatively quite efficient (the dependencies of the main parameters in the transference are polynomial in nature), and also has the technical advantage of avoiding the somewhat tricky “correlation condition” present in early transference results, which is also not beneficial for quantitative bounds.

In principle, the above results can be improved for ${k=3}$ due to the stronger quantitative inverse theorems in the ${U^3}$ setting. However, there is a bottleneck that prevents us from achieving this, namely that the equidistribution theory of two-step nilmanifolds has exponents which are exponential in the dimension rather than polynomial in the dimension, and as a consequence we were unable to improve upon the doubly logarithmic results. Specifically, if one is given a sequence of bracket quadratics such as ${\lfloor \alpha_1 n \rfloor \beta_1 n, \dots, \lfloor \alpha_d n \rfloor \beta_d n}$ that fails to be ${\delta}$-equidistributed, one would need to establish a nontrivial linear relationship modulo 1 between the ${\alpha_1,\beta_1,\dots,\alpha_d,\beta_d}$ (up to errors of ${O(1/N)}$), where the coefficients are of size ${O(\delta^{-d^{O(1)}})}$; current methods only give coefficient bounds of the form ${O(\delta^{-\exp(d^{O(1)})})}$. An old result of Schmidt demonstrates proof of concept that these sorts of polynomial dependencies on exponents are possible in principle, but actually implementing Schmidt’s methods here seems to be a quite non-trivial task. There is also another possible route to removing a logarithm, which is to strengthen the inverse ${U^3}$ theorem to make the dimension of the nilmanifold logarithmic in the uniformity parameter ${\delta}$ rather than polynomial. Again, the Freiman-Bilu theorem (see for instance this paper of Ben and myself) demonstrates proof of concept that such an improvement in dimension is possible, but some work would be needed to implement it.

Kaisa Matomäki, Maksym Radziwill, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Singmaster’s conjecture in the interior of Pascal’s triangle”. This paper leverages the theory of exponential sums over primes to make progress on a well known conjecture of Singmaster which asserts that any natural number larger than ${1}$ appears at most a bounded number of times in Pascal’s triangle. That is to say, for any integer ${t \geq 2}$, there are at most ${O(1)}$ solutions to the equation

$\displaystyle \binom{n}{m} = t \ \ \ \ \ (1)$

with ${1 \leq m < n}$. Currently, the largest number of solutions that is known to be attainable is eight, with ${t}$ equal to

$\displaystyle 3003 = \binom{3003}{1} = \binom{78}{2} = \binom{15}{5} = \binom{14}{6} = \binom{14}{8} = \binom{15}{10}$

$\displaystyle = \binom{78}{76} = \binom{3003}{3002}.$

Because of the symmetry ${\binom{n}{m} = \binom{n}{n-m}}$ of Pascal’s triangle it is natural to restrict attention to the left half ${1 \leq m \leq n/2}$ of the triangle.
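The eight occurrences of ${3003}$ are easy to confirm by machine. Here is a small Python sketch (the function name `occurrences` is hypothetical) that searches the left half ${m \leq n/2}$ of the triangle by binary search in ${n}$ for each ${m}$, and then reflects the solutions found using the symmetry.

```python
from math import comb


def occurrences(t):
    """All pairs (n, m) with 1 <= m < n and C(n, m) == t, for an integer t >= 2."""
    found = set()
    m = 1
    # In the left half n >= 2m, so C(n, m) >= C(2m, m); stop once this exceeds t.
    while comb(2 * m, m) <= t:
        # For fixed m, C(n, m) is increasing in n, and C(t, m) >= t, so search [2m, t].
        lo, hi = 2 * m, t
        while lo < hi:
            mid = (lo + hi) // 2
            if comb(mid, m) < t:
                lo = mid + 1
            else:
                hi = mid
        if comb(lo, m) == t:
            found.add((lo, m))
            found.add((lo, lo - m))   # mirror image in the right half
        m += 1
    return sorted(found)


print(len(occurrences(3003)))   # 8
print(occurrences(3003))
```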

Our main result settles this conjecture in the “interior” region of the triangle:

Theorem 1 (Singmaster’s conjecture in the interior of the triangle) If ${0 < \varepsilon < 1}$ and ${t}$ is sufficiently large depending on ${\varepsilon}$, there are at most two solutions to (1) in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/2 \ \ \ \ \ (2)$

and hence at most four in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n - \exp( \log^{2/3+\varepsilon} n ).$

Also, there is at most one solution in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/\exp(\log^{1-\varepsilon} n ).$

To verify Singmaster’s conjecture in full, it thus suffices in view of this result to verify the conjecture in the boundary region

$\displaystyle 2 \leq m < \exp(\log^{2/3+\varepsilon} n) \ \ \ \ \ (3)$

(or equivalently ${n - \exp(\log^{2/3+\varepsilon} n) < m \leq n}$); we have deleted the ${m=1}$ case as it of course automatically supplies exactly one solution to (1). It is in fact possible that for ${t}$ sufficiently large there are no further collisions ${\binom{n}{m} = \binom{n'}{m'}=t}$ for ${(n,m), (n',m')}$ in the region (3), in which case there would never be more than eight solutions to (1) for sufficiently large ${t}$. This latter claim is known for bounded values of ${m,m'}$ by Beukers, Shorey, and Tijdeman, with the main tool used being Siegel’s theorem on integral points.

The upper bound of two here for the number of solutions in the region (2) is best possible, due to the infinite family of solutions to the equation

$\displaystyle \binom{n+1}{m+1} = \binom{n}{m+2} \ \ \ \ \ (4)$

coming from ${n = F_{2j+2} F_{2j+3}-1}$, ${m = F_{2j} F_{2j+3}-1}$, where ${F_j}$ is the ${j^{th}}$ Fibonacci number.
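As a quick check of this family, here is a short Python sketch, assuming the usual indexing ${F_1 = F_2 = 1}$ (the function names are hypothetical). The case ${j=1}$ gives ${(n,m) = (14,4)}$, which recovers the collision ${\binom{15}{5} = \binom{14}{6} = 3003}$.

```python
from math import comb


def fib(j):
    """j-th Fibonacci number with fib(0) = 0 and fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(j):
        a, b = b, a + b
    return a


def fibonacci_collision(j):
    """The pair (n, m) = (F_{2j+2} F_{2j+3} - 1, F_{2j} F_{2j+3} - 1)."""
    n = fib(2 * j + 2) * fib(2 * j + 3) - 1
    m = fib(2 * j) * fib(2 * j + 3) - 1
    return n, m


for j in range(1, 6):
    n, m = fibonacci_collision(j)
    # Equation (4): C(n+1, m+1) = C(n, m+2)
    print(j, (n, m), comb(n + 1, m + 1) == comb(n, m + 2))
```

Cancelling common factorials shows that (4) is equivalent to ${(n+1)(m+2) = (n-m)(n-m-1)}$, which is a cheaper test for larger ${j}$, where the binomial coefficients themselves become very large.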

The appearance of the quantity ${\exp( \log^{2/3+\varepsilon} n )}$ in Theorem 1 may be familiar to readers that are acquainted with Vinogradov’s bounds on exponential sums, which end up being the main new ingredient in our arguments. In principle this threshold could be lowered if we had stronger bounds on exponential sums.

To try to control solutions to (1) we use a combination of “Archimedean” and “non-Archimedean” approaches. In the “Archimedean” approach (following earlier work of Kane on this problem) we view ${n,m}$ primarily as real numbers rather than integers, and express (1) in terms of the Gamma function as

$\displaystyle \frac{\Gamma(n+1)}{\Gamma(m+1) \Gamma(n-m+1)} = t.$

One can use this equation to solve for ${n}$ in terms of ${m,t}$ as

$\displaystyle n = f_t(m)$

for a certain real analytic function ${f_t}$ whose asymptotics are easily computable (for instance one has the asymptotic ${f_t(m) \asymp m t^{1/m}}$). One can then view the problem as one of trying to control the number of lattice points on the graph ${\{ (m,f_t(m)): m \in {\bf R} \}}$. Here we can take advantage of the fact that in the regime ${m \leq f_t(m)/2}$ (which corresponds to working in the left half ${m \leq n/2}$ of Pascal’s triangle), the function ${f_t}$ can be shown to be convex, but not too convex, in the sense that one has both upper and lower bounds on the second derivative of ${f_t}$ (in fact one can show that ${f''_t(m) \asymp f_t(m) (\log t/m^2)^2}$). This can be used to preclude the possibility of having a cluster of three or more nearby lattice points on the graph ${\{ (m,f_t(m)): m \in {\bf R} \}}$, basically because the area subtended by the triangle connecting three of these points would lie strictly between ${0}$ and ${1/2}$, contradicting Pick’s theorem. Developing these ideas, we were able to show

Proposition 2 Let ${\varepsilon>0}$, and suppose ${t}$ is sufficiently large depending on ${\varepsilon}$. If ${(m,n)}$ is a solution to (1) in the left half ${m \leq n/2}$ of Pascal’s triangle, then there is at most one other solution ${(m',n')}$ to this equation in the left half with

$\displaystyle |m-m'| + |n-n'| \ll \exp( (\log\log t)^{1-\varepsilon} ).$

Again, the example of (4) shows that a cluster of two solutions is certainly possible; the convexity argument only kicks in once one has a cluster of three or more solutions.
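The function ${f_t}$ from the Archimedean approach can be evaluated numerically. The following Python sketch (not the method of the paper, just bisection on the logarithm of the Gamma function expression, with hypothetical function names) solves for ${n}$ in the left half ${n \geq 2m}$.

```python
from math import lgamma, log


def log_binomial(n, m):
    """log of Gamma(n+1) / (Gamma(m+1) Gamma(n-m+1)) for real n >= m >= 0."""
    return lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)


def f_t(t, m, iterations=200):
    """Real n >= 2m with log_binomial(n, m) = log(t), found by bisection.

    Relies on log_binomial(n, m) being increasing in n for fixed m > 0.
    """
    target = log(t)
    lo = 2.0 * m
    if log_binomial(lo, m) > target:
        raise ValueError("no solution with n >= 2m")
    hi = lo
    while log_binomial(hi, m) < target:
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if log_binomial(mid, m) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for m in (2, 5, 6):
    print(m, f_t(3003, m))   # close to 78, 15, 14 respectively
```

Lattice points on the graph of ${f_t}$ then correspond to values of ${m}$ at which `f_t(t, m)` is (numerically) an integer, as happens at ${m = 2, 5, 6}$ for ${t = 3003}$.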

To finish the proof of Theorem 1, one has to show that any two solutions ${(m,n), (m',n')}$ to (1) in the region of interest must be close enough for the above proposition to apply. Here we switch to the “non-Archimedean” approach, in which we look at the ${p}$-adic valuations ${\nu_p( \binom{n}{m} )}$ of the binomial coefficients, defined as the number of times a prime ${p}$ divides ${\binom{n}{m}}$. From the fundamental theorem of arithmetic, a collision

$\displaystyle \binom{n}{m} = \binom{n'}{m'}$

between binomial coefficients occurs if and only if, for every prime ${p}$, one has agreement of valuations

$\displaystyle \nu_p( \binom{n}{m} ) = \nu_p( \binom{n'}{m'} ). \ \ \ \ \ (5)$

From the Legendre formula

$\displaystyle \nu_p(n!) = \sum_{j=1}^\infty \lfloor \frac{n}{p^j} \rfloor$

we can rewrite this latter identity (5) as

$\displaystyle \sum_{j=1}^\infty \{ \frac{m}{p^j} \} + \{ \frac{n-m}{p^j} \} - \{ \frac{n}{p^j} \} = \sum_{j=1}^\infty \{ \frac{m'}{p^j} \} + \{ \frac{n'-m'}{p^j} \} - \{ \frac{n'}{p^j} \}, \ \ \ \ \ (6)$

where ${\{x\} := x - \lfloor x\rfloor}$ denotes the fractional part of ${x}$. (These sums are not truly infinite, because the summands vanish once ${p^j}$ is larger than ${\max(n,n')}$.)
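The passage from the Legendre formula to the fractional part expression in (6) can be sanity-checked numerically. In the Python sketch below (function names hypothetical), each summand ${\{ \frac{m}{p^j} \} + \{ \frac{n-m}{p^j} \} - \{ \frac{n}{p^j} \}}$ is computed exactly in integer arithmetic: writing ${q = p^j}$, it equals ${((m \hbox{ mod } q) + ((n-m) \hbox{ mod } q) - (n \hbox{ mod } q))/q}$, which is always ${0}$ or ${1}$.

```python
from math import comb


def legendre(n, p):
    """nu_p(n!) via the Legendre formula: sum over j of floor(n / p^j)."""
    total, q = 0, p
    while q <= n:
        total += n // q
        q *= p
    return total


def valuation_direct(x, p):
    """Number of times the prime p divides the positive integer x."""
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return v


def valuation_fractional(n, m, p):
    """nu_p(C(n, m)) as the sum over j of {m/p^j} + {(n-m)/p^j} - {n/p^j}."""
    total, q = 0, p
    while q <= n:
        # The numerator is either 0 or q, so the integer division is exact.
        total += ((m % q) + ((n - m) % q) - (n % q)) // q
        q *= p
    return total


# The collision C(15, 5) = C(14, 6) = 3003 = 3 * 7 * 11 * 13, prime by prime.
for p in (2, 3, 5, 7, 11, 13):
    print(p, valuation_fractional(15, 5, p), valuation_fractional(14, 6, p))
```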

A key idea in our approach is to view this condition (6) statistically, for instance by viewing ${p}$ as a prime drawn randomly from an interval such as ${[P, P + P \log^{-100} P]}$ for some suitably chosen scale parameter ${P}$, so that the two sides of (6) now become random variables. It then becomes advantageous to compare correlations between these two random variables and some additional test random variable. For instance, if ${n}$ and ${n'}$ are far apart from each other, then one would expect the left-hand side of (6) to have a higher correlation with the fractional part ${\{ \frac{n}{p}\}}$, since this term shows up in the summation on the left-hand side but not the right. Similarly if ${m}$ and ${m'}$ are far apart from each other (although there are some annoying cases one has to treat separately when there is some “unexpected commensurability”, for instance if ${n'-m'}$ is a rational multiple of ${m}$ where the rational has bounded numerator and denominator). In order to execute this strategy, it turns out (after some standard Fourier expansion) that one needs to get good control on exponential sums such as

$\displaystyle \sum_{P \leq p \leq P + P\log^{-100} P} e( \frac{N}{p} + \frac{M}{p^j} )$

for various choices of parameters ${P, N, M, j}$, where ${e(\theta) := e^{2\pi i \theta}}$. Fortunately, the methods of Vinogradov (which more generally can handle sums such as ${\sum_{n \in I} e(f(n))}$ and ${\sum_{p \in I} e(f(p))}$ for various analytic functions ${f}$) can give useful bounds on such sums as long as ${N}$ and ${M}$ are not too large compared to ${P}$; more specifically, Vinogradov’s estimates are non-trivial in the regime ${N,M \ll \exp( \log^{3/2-\varepsilon} P )}$, and this ultimately leads to a distance bound

$\displaystyle m' - m \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )$

between any colliding pair ${(n,m), (n',m')}$ in the left half of Pascal’s triangle, as well as the variant bound

$\displaystyle n' - n \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )$

under the additional assumption

$\displaystyle m', m \geq \exp( \log^{2/3 +\varepsilon}(n+n') ).$

Comparing these bounds with Proposition 2 and using some basic estimates about the function ${f_t}$, we can conclude Theorem 1.

A modification of the arguments also gives similar results for the equation

$\displaystyle (n)_m = t \ \ \ \ \ (7)$

where ${(n)_m := n (n-1) \dots (n-m+1)}$ is the falling factorial:

Theorem 3 If ${0 < \varepsilon < 1}$ and ${t}$ is sufficiently large depending on ${\varepsilon}$, there are at most two solutions to (7) in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m < n. \ \ \ \ \ (8)$

Again the upper bound of two is best possible, thanks to identities such as

$\displaystyle (a^2-a)_{a^2-2a} = (a^2-a-1)_{a^2-2a+1}.$
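This identity is easy to test for small ${a}$; here is a minimal Python sketch with a hypothetical helper `falling_factorial`. For instance ${a=3}$ gives ${(6)_3 = 6 \cdot 5 \cdot 4 = 120 = 5 \cdot 4 \cdot 3 \cdot 2 = (5)_4}$.

```python
def falling_factorial(n, m):
    """(n)_m = n (n-1) ... (n-m+1), with the empty product (m = 0) equal to 1."""
    result = 1
    for i in range(m):
        result *= n - i
    return result


for a in range(2, 10):
    left = falling_factorial(a * a - a, a * a - 2 * a)
    right = falling_factorial(a * a - a - 1, a * a - 2 * a + 1)
    print(a, left == right)
```

The reason is simply that ${(a^2-a)_{a^2-2a} = (a^2-a)!/a!}$ and ${(a^2-a-1)_{a^2-2a+1} = (a^2-a-1)!/(a-2)!}$, whose ratio is ${(a^2-a)/(a(a-1)) = 1}$.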

I’m collecting in this blog post a number of simple group-theoretic lemmas, all of the following flavour: if ${H}$ is a subgroup of some product ${G_1 \times \dots \times G_k}$ of groups, then one of three things has to happen:

• (${H}$ too small) ${H}$ is contained in some proper subgroup ${G'_1 \times \dots \times G'_k}$ of ${G_1 \times \dots \times G_k}$, or the elements of ${H}$ are constrained to some sort of equation that the full group ${G_1 \times \dots \times G_k}$ does not satisfy.
• (${H}$ too large) ${H}$ contains some non-trivial normal subgroup ${N_1 \times \dots \times N_k}$ of ${G_1 \times \dots \times G_k}$, and as such actually arises by pullback from some subgroup of the quotient group ${G_1/N_1 \times \dots \times G_k/N_k}$.
• (Structure) There is some useful structural relationship between ${H}$ and the groups ${G_1,\dots,G_k}$.
These sorts of lemmas show up often in ergodic theory, when the equidistribution of some orbit is governed by some unspecified subgroup ${H}$ of a product group ${G_1 \times \dots \times G_k}$, and one needs to know further information about this subgroup in order to take the analysis further. In some cases only two of the above three options are relevant. In the cases where ${H}$ is too “small” or too “large” one can reduce the groups ${G_1,\dots,G_k}$ to something smaller (either a subgroup or a quotient) and in applications one can often proceed in this case by some induction on the “size” of the groups ${G_1,\dots,G_k}$ (for instance, if these groups are Lie groups, one can often perform an induction on dimension), so it is often the structured case which is the most interesting case to deal with.

It is perhaps easiest to explain the flavour of these lemmas with some simple examples, starting with the ${k=1}$ case where we are just considering subgroups ${H}$ of a single group ${G}$.

Lemma 1 Let ${H}$ be a subgroup of a group ${G}$. Then exactly one of the following holds:
• (i) (${H}$ too small) There exists a non-trivial group homomorphism ${\eta: G \rightarrow K}$ into a group ${K = (K,\cdot)}$ such that ${\eta(h)=1}$ for all ${h \in H}$.
• (ii) (${H}$ normally generates ${G}$) ${G}$ is generated as a group by the conjugates ${gHg^{-1}}$ of ${H}$.

Proof: Let ${G'}$ be the group normally generated by ${H}$, that is to say the group generated by the conjugates ${gHg^{-1}}$ of ${H}$. This is a normal subgroup of ${G}$ containing ${H}$ (indeed it is the smallest such normal subgroup). If ${G'}$ is all of ${G}$ we are in option (ii); otherwise we can take ${K}$ to be the quotient group ${K := G/G'}$ and ${\eta}$ to be the quotient map. Finally, if (i) holds, then all of the conjugates ${gHg^{-1}}$ of ${H}$ lie in the kernel of ${\eta}$, and so (ii) cannot hold. $\Box$

Here is a “dual” to the above lemma:

Lemma 2 Let ${H}$ be a subgroup of a group ${G}$. Then exactly one of the following holds:
• (i) (${H}$ too large) ${H}$ is the pullback ${H = \pi^{-1}(H')}$ of some subgroup ${H'}$ of ${G/N}$ for some non-trivial normal subgroup ${N}$ of ${G}$, where ${\pi: G \rightarrow G/N}$ is the quotient map.
• (ii) (${H}$ is core-free) ${H}$ does not contain any non-trivial conjugacy class ${\{ ghg^{-1}: g \in G \}}$.

Proof: Let ${N}$ be the normal core of ${H}$, that is to say the intersection of all the conjugates ${gHg^{-1}}$ of ${H}$. This is the largest normal subgroup of ${G}$ that is contained in ${H}$. If ${N}$ is non-trivial, we can quotient it out and end up with option (i). If instead ${N}$ is trivial, then there is no non-trivial element ${h}$ that lies in the core, hence no non-trivial conjugacy class lies in ${H}$ and we are in option (ii). Finally, if (i) holds, then every conjugacy class of an element of ${N}$ is contained in ${N}$ and hence in ${H}$, so (ii) cannot hold. $\Box$
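Both the normal closure from Lemma 1 and the normal core from Lemma 2 can be computed by brute force in a small finite group. The Python sketch below is only an illustration, with the choice of ${G = S_4}$ (permutations stored as tuples of images) and all function names being assumptions of the example.

```python
from itertools import permutations


def compose(p, q):
    """Composition of permutations stored as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def generate(generators, degree):
    """Subgroup generated by the given permutations (in a finite group,
    closing up under products is enough)."""
    identity = tuple(range(degree))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def normal_closure(H, G, degree):
    """Group generated by all conjugates g H g^{-1} (Lemma 1)."""
    conjugates = {compose(compose(g, h), inverse(g)) for g in G for h in H}
    return generate(conjugates, degree)


def normal_core(H, G):
    """Intersection of all conjugates g H g^{-1} (Lemma 2)."""
    core = set(H)
    for g in G:
        core &= {compose(compose(g, h), inverse(g)) for h in H}
    return core


S4 = set(permutations(range(4)))
transposition = generate([(1, 0, 2, 3)], 4)                # generated by (0 1)
double_transposition = generate([(1, 0, 3, 2)], 4)         # generated by (0 1)(2 3)
dihedral = generate([(1, 2, 3, 0), (1, 0, 3, 2)], 4)       # order 8

print(len(normal_closure(transposition, S4, 4)))           # 24: normally generates S4
print(len(normal_closure(double_transposition, S4, 4)))    # 4: "too small"
print(len(normal_core(dihedral, S4)))                      # 4: "too large"
print(len(normal_core(transposition, S4)))                 # 1: core-free
```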

For subgroups of nilpotent groups, we have a nice dichotomy that detects properness of a subgroup through abelian representations:

Lemma 3 Let ${H}$ be a subgroup of a nilpotent group ${G}$. Then exactly one of the following holds:
• (i) (${H}$ too small) There exists a non-trivial group homomorphism ${\eta: G \rightarrow K}$ into an abelian group ${K = (K,+)}$ such that ${\eta(h)=0}$ for all ${h \in H}$.
• (ii) ${H=G}$.

Informally: if ${h}$ is a variable ranging in a subgroup ${H}$ of a nilpotent group ${G}$, then either ${h}$ is unconstrained (in the sense that it really ranges in all of ${G}$), or it obeys some abelian constraint ${\eta(h)=0}$.

Proof: By definition of nilpotency, the lower central series

$\displaystyle G_2 := [G,G], G_3 := [G,G_2], \dots$

eventually becomes trivial.

Since ${G_2}$ is a normal subgroup of ${G}$, ${HG_2}$ is also a subgroup of ${G}$, and it is normal since it contains ${[G,G]}$. If ${HG_2}$ is a proper subgroup of ${G}$, then the quotient map ${\eta \colon G \rightarrow G/HG_2}$ is a non-trivial homomorphism to an abelian group ${G/HG_2}$ that annihilates ${H}$, and we are in option (i). Thus we may assume that ${HG_2 = G}$, and thus

$\displaystyle G_2 = [G,G] = [G, HG_2].$

Note that modulo the normal group ${G_3}$, ${G_2}$ commutes with ${G}$, hence

$\displaystyle [G, HG_2] \subset [G,H] G_3 \subset H G_3$

and thus

$\displaystyle G = H G_2 \subset H H G_3 = H G_3.$

We conclude that ${HG_3 = G}$. One can continue this argument by induction to show that ${H G_i = G}$ for every ${i}$; taking ${i}$ large enough we end up in option (ii). Finally, it is clear that (i) and (ii) cannot both hold. $\Box$
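The key step of this proof, that ${HG_2 = G}$ forces ${H = G}$, can be tested by brute force on small examples. The Python sketch below (all names hypothetical) confirms it for the dihedral group of order ${8}$, which is nilpotent, and shows that it fails for the non-nilpotent group ${S_3}$; it only examines subgroups generated by at most two elements, which covers every subgroup of these two small groups.

```python
from itertools import permutations, product


def compose(p, q):
    """Composition of permutations stored as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def generate(generators, degree):
    """Subgroup generated by the given permutations of {0, ..., degree-1}."""
    identity = tuple(range(degree))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def commutator_subgroup(G, degree):
    """G_2 = [G, G], generated by all commutators a b a^{-1} b^{-1}."""
    commutators = {
        compose(compose(a, b), compose(inverse(a), inverse(b)))
        for a in G for b in G
    }
    return generate(commutators, degree)


def dichotomy_holds(G, degree):
    """True if every subgroup H generated by two elements with H G_2 = G equals G."""
    G2 = commutator_subgroup(G, degree)
    for a, b in product(G, repeat=2):
        H = generate([a, b], degree)
        HG2 = {compose(h, k) for h in H for k in G2}
        if HG2 == G and H != G:
            return False
    return True


dihedral = generate([(1, 2, 3, 0), (1, 0, 3, 2)], 4)   # nilpotent, order 8
S3 = set(permutations(range(3)))                        # not nilpotent

print(dichotomy_holds(dihedral, 4))   # True
print(dichotomy_holds(S3, 3))         # False
```

In ${S_3}$ the failure comes from a subgroup ${H}$ of order two: here ${G_2}$ is the alternating group, so ${HG_2 = G}$ even though ${H}$ is proper.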

Remark 4 When the group ${G}$ is locally compact and ${H}$ is closed, one can take the homomorphism ${\eta}$ in Lemma 3 to be continuous, and by using Pontryagin duality one can also take the target group ${K}$ to be the unit circle ${{\bf R}/{\bf Z}}$. Thus ${\eta}$ is now a character of ${G}$. Similar considerations hold for some of the later lemmas in this post. Discrete versions of the above lemma, in which the group ${H}$ is replaced by some orbit of a polynomial map on a nilmanifold, were obtained by Leibman and are important in the equidistribution theory of nilmanifolds; see this paper of Ben Green and myself for further discussion.

Here is an analogue of Lemma 3 for special linear groups, due to Serre (IV-23):

Lemma 5 Let ${p \geq 5}$ be a prime, and let ${H}$ be a closed subgroup of ${SL_2({\bf Z}_p)}$, where ${{\bf Z}_p}$ is the ring of ${p}$-adic integers. Then exactly one of the following holds:
• (i) (${H}$ too small) There exists a proper subgroup ${H'}$ of ${SL_2({\mathbf F}_p)}$ such that ${h \hbox{ mod } p \in H'}$ for all ${h \in H}$.
• (ii) ${H=SL_2({\bf Z}_p)}$.

Proof: It is a standard fact that the reduction of ${SL_2({\bf Z}_p)}$ mod ${p}$ is ${SL_2({\mathbf F}_p)}$, hence (i) and (ii) cannot both hold.

Suppose that (i) fails, then for every ${g \in SL_2({\bf Z}_p)}$ there exists ${h \in H}$ such that ${h = g \hbox{ mod } p}$, which we write as

$\displaystyle h = g + O(p).$

We now claim inductively that for any ${j \geq 0}$ and ${g \in SL_2({\bf Z}_p)}$, there exists ${h \in H}$ with ${h = g + O(p^{j+1})}$; taking limits as ${j \rightarrow \infty}$ using the closed nature of ${H}$ will then place us in option (ii).

The case ${j=0}$ is already handled, so now suppose ${j=1}$. If ${g \in SL_2({\bf Z}_p)}$, we see from the ${j=0}$ case that we can write ${g = hg'}$ where ${h \in H}$ and ${g' = 1+O(p)}$. Thus to establish the ${j=1}$ claim it suffices to do so under the additional hypothesis that ${g = 1+O(p)}$.

First suppose that ${g = 1 + pX + O(p^2)}$ for some ${X \in M_2({\bf Z}_p)}$ with ${X^2=0 \hbox{ mod } p}$. By the ${j=0}$ case, we can find ${h \in H}$ of the form ${h = 1 + X + pY + O(p^2)}$ for some ${Y \in M_2({\bf Z}_p)}$. Raising to the ${p^{th}}$ power and using ${X^2=0}$ and ${p \geq 5 > 3}$, we note that

$\displaystyle h^p = 1 + \binom{p}{1} X + \binom{p}{1} pY + \binom{p}{2} X pY + \binom{p}{2} pY X$

$\displaystyle + \binom{p}{3} X pY X + O(p^2)$

$\displaystyle = 1 + pX + O(p^2),$

giving the claim in this case.
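The binomial computation above can be checked numerically. The following is a minimal Python sketch, not part of the original argument: the helper names `mat_mul`, `mat_pow`, `pth_power_matches` and the particular matrices `X`, `Y` are illustrative assumptions. It only tests the matrix identity ${(1+X+pY)^p = 1 + pX}$ modulo ${p^2}$ for an ${X}$ with ${X^2=0}$, and says nothing about membership in ${H}$.

```python
def mat_mul(a, b, modulus):
    """Multiply two 2x2 integer matrices modulo `modulus`."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) % modulus
             for j in range(2)] for i in range(2)]


def mat_pow(a, exponent, modulus):
    """Raise a 2x2 matrix to a non-negative power by repeated multiplication."""
    result = [[1, 0], [0, 1]]
    for _ in range(exponent):
        result = mat_mul(result, a, modulus)
    return result


def pth_power_matches(p, X, Y):
    """Check whether (1 + X + pY)^p equals 1 + pX modulo p^2."""
    modulus = p * p
    h = [[(int(i == j) + X[i][j] + p * Y[i][j]) % modulus for j in range(2)]
         for i in range(2)]
    target = [[(int(i == j) + p * X[i][j]) % modulus for j in range(2)]
              for i in range(2)]
    return mat_pow(h, p, modulus) == target


X = [[0, 1], [0, 0]]   # nilpotent matrix with X^2 = 0 (illustrative choice)
Y = [[1, 2], [1, 4]]   # arbitrary illustrative choice of Y
print(pth_power_matches(5, X, Y))   # the case p >= 5 covered by the argument
print(pth_power_matches(3, X, Y))   # p = 3: the binom(p,3) term is not divisible by p
```

For ${p=3}$ the coefficient ${\binom{p}{3}=1}$ is not divisible by ${p}$, so the ${X pY X}$ term can survive; this is where the hypothesis ${p \geq 5}$ enters the computation.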

Any ${2 \times 2}$ matrix of trace zero with coefficients in ${{\mathbf F}_p}$ is a linear combination of ${\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}}$, ${\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}}$, ${\begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}}$ and is thus a sum of matrices that square to zero. Hence, if ${g \in SL_2({\bf Z}_p)}$ is of the form ${g = 1 + O(p)}$, then ${g = 1 + pY + O(p^2)}$ for some matrix ${Y}$ of trace zero, and thus one can write ${g}$ (up to ${O(p^2)}$ errors) as the finite product of matrices of the form ${1 + pY + O(p^2)}$ with ${Y^2=0}$. By the previous arguments, such a matrix ${1+pY + O(p^2)}$ lies in ${H}$ up to ${O(p^2)}$ errors, and hence ${g}$ does also. This completes the proof of the ${j=1}$ case.

Now suppose ${j \geq 2}$ and the claim has already been proven for ${j-1}$. Arguing as before, it suffices to close the induction under the additional hypothesis that ${g = 1 + O(p^j)}$, thus we may write ${g = 1 + p^j X + O(p^{j+1})}$. By induction hypothesis, we may find ${h \in H}$ with ${h = 1 + p^{j-1} X + O(p^j)}$. But then ${h^p = 1 + p^j X + O(p^{j+1})}$, and we are done. $\Box$

We note a generalisation of Lemma 3 that involves two groups ${G_1,G_2}$ rather than just one:

Lemma 6 Let ${H}$ be a subgroup of a product ${G_1 \times G_2}$ of two nilpotent groups ${G_1, G_2}$. Then exactly one of the following hold:
• (i) (${H}$ too small) There exist group homomorphisms ${\eta_1: G'_1 \rightarrow K}$, ${\eta_2: G_2 \rightarrow K}$ into an abelian group ${K = (K,+)}$, with ${\eta_2}$ non-trivial, such that ${\eta_1(h_1) + \eta_2(h_2)=0}$ for all ${(h_1,h_2) \in H}$, where ${G'_1 := \{ h_1: (h_1,h_2) \in H \hbox{ for some } h_2 \in G_2 \}}$ is the projection of ${H}$ to ${G_1}$.
• (ii) ${H = G'_1 \times G_2}$ for some subgroup ${G'_1}$ of ${G_1}$.

Proof: Consider the group ${\{ h_2 \in G_2: (1,h_2) \in H \}}$. This is a subgroup of ${G_2}$. If it is all of ${G_2}$, then ${H}$ must be a Cartesian product ${H = G'_1 \times G_2}$ and option (ii) holds. So suppose that this group is a proper subgroup of ${G_2}$. Applying Lemma 3, we obtain a non-trivial group homomorphism ${\eta_2: G_2 \rightarrow K}$ into an abelian group ${K = (K,+)}$ such that ${\eta_2(h_2)=0}$ whenever ${(1,h_2) \in H}$. For any ${h_1}$ in the projection ${G'_1}$ of ${H}$ to ${G_1}$, there is thus a unique quantity ${\eta_1(h_1) \in K}$ such that ${\eta_1(h_1) + \eta_2(h_2) = 0}$ whenever ${(h_1,h_2) \in H}$. One easily checks that ${\eta_1}$ is a homomorphism, so we are in option (i).

Finally, it is clear that (i) and (ii) cannot both hold, since (i) places a non-trivial constraint on the second component ${h_2}$ of an element ${(h_1,h_2) \in H}$ of ${H}$ for any fixed choice of ${h_1}$. $\Box$

We also note a similar variant of Lemma 5, which is Lemme 10 of this paper of Serre:

Lemma 7 Let ${p \geq 5}$ be a prime, and let ${H}$ be a closed subgroup of ${SL_2({\bf Z}_p) \times SL_2({\bf Z}_p)}$. Then exactly one of the following hold:
• (i) (${H}$ too small) There exists a proper subgroup ${H'}$ of ${SL_2({\mathbf F}_p) \times SL_2({\mathbf F}_p)}$ such that ${h \hbox{ mod } p \in H'}$ for all ${h \in H}$.
• (ii) ${H=SL_2({\bf Z}_p) \times SL_2({\bf Z}_p)}$.

Proof: As in the proof of Lemma 5, (i) and (ii) cannot both hold. Suppose that (i) does not hold, then for any ${g \in SL_2({\bf Z}_p)}$ there exists ${h_1 \in H}$ such that ${h_1 = (g+O(p), 1 + O(p))}$. Similarly, there exists ${h_0 \in H}$ with ${h_0 = (1+O(p), 1+O(p))}$. Taking commutators of ${h_1}$ and ${h_0}$, we can find ${h_2 \in H}$ with ${h_2 = (g+O(p), 1+O(p^2))}$. Continuing to take commutators with ${h_0}$ and extracting a limit (using compactness and the closed nature of ${H}$), we can find ${h_\infty \in H}$ with ${h_\infty = (g+O(p),1)}$. Thus, the closed subgroup ${\{ g \in SL_2({\bf Z}_p): (g,1) \in H \}}$ of ${SL_2({\bf Z}_p)}$ does not obey conclusion (i) of Lemma 5, and must therefore obey conclusion (ii); that is to say, ${H}$ contains ${SL_2({\bf Z}_p) \times \{1\}}$. Similarly ${H}$ contains ${\{1\} \times SL_2({\bf Z}_p)}$; multiplying, we end up in conclusion (ii). $\Box$

The most famous result of this type is of course the Goursat lemma, which we phrase here in a somewhat idiosyncratic manner to conform to the pattern of the other lemmas in this post:

Lemma 8 (Goursat lemma) Let ${H}$ be a subgroup of a product ${G_1 \times G_2}$ of two groups ${G_1, G_2}$. Then one of the following hold:
• (i) (${H}$ too small) ${H}$ is contained in ${G'_1 \times G'_2}$ for some subgroups ${G'_1}$, ${G'_2}$ of ${G_1, G_2}$ respectively, with either ${G'_1 \subsetneq G_1}$ or ${G'_2 \subsetneq G_2}$ (or both).
• (ii) (${H}$ too large) There exist normal subgroups ${N_1, N_2}$ of ${G_1, G_2}$ respectively, not both trivial, such that ${H = \pi^{-1}(H')}$ arises from a subgroup ${H'}$ of ${G_1/N_1 \times G_2/N_2}$, where ${\pi: G_1 \times G_2 \rightarrow G_1/N_1 \times G_2/N_2}$ is the quotient map.
• (iii) (Isomorphism) There is a group isomorphism ${\phi: G_1 \rightarrow G_2}$ such that ${H = \{ (g_1, \phi(g_1)): g_1 \in G_1\}}$ is the graph of ${\phi}$. In particular, ${G_1}$ and ${G_2}$ are isomorphic.

Here we almost have a trichotomy, because option (iii) is incompatible with both option (i) and option (ii). However, it is possible for options (i) and (ii) to simultaneously hold.

Proof: If either of the projections ${\pi_1: H \rightarrow G_1}$, ${\pi_2: H \rightarrow G_2}$ from ${H}$ to the factor groups ${G_1,G_2}$ (thus ${\pi_1(h_1,h_2)=h_1}$ and ${\pi_2(h_1,h_2)=h_2}$) fails to be surjective, then we are in option (i). Thus we may assume that these maps are surjective.

Next, if either of the maps ${\pi_1: H \rightarrow G_1}$, ${\pi_2: H \rightarrow G_2}$ fail to be injective, then at least one of the kernels ${N_1 \times \{1\} := \mathrm{ker} \pi_2}$, ${\{1\} \times N_2 := \mathrm{ker} \pi_1}$ is non-trivial. We can then descend down to the quotient ${G_1/N_1 \times G_2/N_2}$ and end up in option (ii).

The only remaining case is when the group homomorphisms ${\pi_1, \pi_2}$ are both bijections, hence are group isomorphisms. If we set ${\phi := \pi_2 \circ \pi_1^{-1}}$ we end up in case (iii). $\Box$
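To make the three cases concrete, here is a small Python sketch that follows the order of the proof above on finite examples. The function name `goursat_case` and the sample subgroups of ${{\bf Z}/4 \times {\bf Z}/4}$ are illustrative assumptions, and the caller is assumed to supply a genuine subgroup ${H}$ (this is not verified). Since options (i) and (ii) can hold simultaneously, the function simply reports the first option reached by the proof.

```python
def goursat_case(H, G1, G2, e1, e2):
    """Report which option of the Goursat lemma the proof reaches first.

    H is a set of pairs (assumed to be a subgroup of G1 x G2),
    and e1, e2 are the identity elements of G1, G2.
    """
    proj1 = {h1 for h1, _ in H}
    proj2 = {h2 for _, h2 in H}
    if proj1 != set(G1) or proj2 != set(G2):
        return "(i)"      # some projection is not surjective
    N1 = {h1 for h1, h2 in H if h2 == e2}   # kernel of pi_2, viewed in G1
    N2 = {h2 for h1, h2 in H if h1 == e1}   # kernel of pi_1, viewed in G2
    if len(N1) > 1 or len(N2) > 1:
        return "(ii)"     # some projection is not injective
    return "(iii)"        # H is the graph of an isomorphism


Z4 = range(4)  # the cyclic group Z/4, written additively
congruent = {(a, b) for a in Z4 for b in Z4 if (a - b) % 2 == 0}
graph = {(a, (3 * a) % 4) for a in Z4}
first_axis = {(a, 0) for a in Z4}

print(goursat_case(congruent, Z4, Z4, 0, 0))
print(goursat_case(graph, Z4, Z4, 0, 0))
print(goursat_case(first_axis, Z4, Z4, 0, 0))
```

In the first example both projections are surjective but have kernel ${\{0,2\}}$, in the second ${H}$ is the graph of the automorphism ${a \mapsto 3a}$, and in the third the projection to the second factor is trivial.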

We can combine the Goursat lemma with Lemma 3 to obtain a variant:

Corollary 9 (Nilpotent Goursat lemma) Let ${H}$ be a subgroup of a product ${G_1 \times G_2}$ of two nilpotent groups ${G_1, G_2}$. Then one of the following hold:
• (i) (${H}$ too small) There exists ${i=1,2}$ and a non-trivial group homomorphism ${\eta_i: G_i \rightarrow K}$ into an abelian group ${K = (K,+)}$ such that ${\eta_i(h_i)=0}$ for all ${(h_1,h_2) \in H}$.
• (ii) (${H}$ too large) There exist normal subgroups ${N_1, N_2}$ of ${G_1, G_2}$ respectively, not both trivial, such that ${H = \pi^{-1}(H')}$ arises from a subgroup ${H'}$ of ${G_1/N_1 \times G_2/N_2}$.
• (iii) (Isomorphism) There is a group isomorphism ${\phi: G_1 \rightarrow G_2}$ such that ${H = \{ (g_1, \phi(g_1)): g_1 \in G_1\}}$ is the graph of ${\phi}$. In particular, ${G_1}$ and ${G_2}$ are isomorphic.

Proof: If Lemma 8(i) holds, then by applying Lemma 3 we arrive at our current option (i). The other options are unchanged from Lemma 8, giving the claim. $\Box$

Now we present a lemma involving three groups ${G_1,G_2,G_3}$ that is known in ergodic theory contexts as the “Furstenberg-Weiss argument”, as an argument of this type arose in this paper of Furstenberg and Weiss, though perhaps it implicitly appears in other contexts also. It has the remarkable feature of being able to enforce the abelian nature of one of the groups once the other options of the lemma are excluded.

Lemma 10 (Furstenberg-Weiss lemma) Let ${H}$ be a subgroup of a product ${G_1 \times G_2 \times G_3}$ of three groups ${G_1, G_2, G_3}$. Then one of the following hold:
• (i) (${H}$ too small) There is some proper subgroup ${G'_3}$ of ${G_3}$ and some ${i=1,2}$ such that ${h_3 \in G'_3}$ whenever ${(h_1,h_2,h_3) \in H}$ and ${h_i = 1}$.
• (ii) (${H}$ too large) There exists a non-trivial normal subgroup ${N_3}$ of ${G_3}$ with ${G_3/N_3}$ abelian, such that ${H = \pi^{-1}(H')}$ arises from a subgroup ${H'}$ of ${G_1 \times G_2 \times G_3/N_3}$, where ${\pi: G_1 \times G_2 \times G_3 \rightarrow G_1 \times G_2 \times G_3/N_3}$ is the quotient map.
• (iii) ${G_3}$ is abelian.

Proof: If the group ${\{ h_3 \in G_3: (1,h_2,h_3) \in H \hbox{ for some } h_2 \in G_2 \}}$ is a proper subgroup of ${G_3}$, then we are in option (i) (with ${i=1}$), so we may assume that

$\displaystyle \{ h_3 \in G_3: (1,h_2,h_3) \in H \hbox{ for some } h_2 \in G_2 \} = G_3.$

Similarly we may assume that

$\displaystyle \{ h_3 \in G_3: (h_1,1,h_3) \in H \hbox{ for some } h_1 \in G_1 \} = G_3.$

Now let ${g_3,g'_3}$ be any two elements of ${G_3}$. By the above assumptions, we can find ${h_1 \in G_1, h_2 \in G_2}$ such that

$\displaystyle (1, h_2, g_3) \in H$

and

$\displaystyle (h_1,1, g'_3) \in H.$

Taking commutators to eliminate the ${h_1,h_2}$ terms, we conclude that

$\displaystyle (1, 1, [g_3,g'_3]) \in H.$

Thus the group ${\{ h_3 \in G_3: (1,1,h_3) \in H \}}$ contains every commutator ${[g_3,g'_3]}$, and thus contains the entire group ${[G_3,G_3]}$ generated by these commutators. If ${G_3}$ fails to be abelian, then ${[G_3,G_3]}$ is a non-trivial normal subgroup of ${G_3}$, and ${H}$ now arises from ${G_1 \times G_2 \times G_3/[G_3,G_3]}$ in the obvious fashion, placing one in option (ii). Hence the only remaining case is when ${G_3}$ is abelian, giving us option (iii). $\Box$
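The commutator step is easy to see in a toy computation. The sketch below is only an illustration: it takes ${G_1=G_2=G_3}$ to be the permutations of ${\{0,1,2\}}$, uses the convention ${[g,h] = g h g^{-1} h^{-1}}$ (an assumption; the other common convention works equally well), and the helper names are hypothetical. It shows that the commutator of ${(1,h_2,g_3)}$ and ${(h_1,1,g'_3)}$ is trivial in the first two coordinates.

```python
def compose(p, q):
    """Composition of permutations given as tuples: (p o q)(x) = p[q[x]]."""
    return tuple(p[q[x]] for x in range(len(q)))


def inverse(p):
    """Inverse of a permutation given as a tuple."""
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def commutator(a, b):
    """Commutator a b a^{-1} b^{-1} (assumed convention)."""
    return compose(compose(a, b), compose(inverse(a), inverse(b)))


def commutator_in_product(s, t):
    """Coordinatewise commutator in a product of permutation groups."""
    return tuple(commutator(a, b) for a, b in zip(s, t))


e = (0, 1, 2)                  # identity permutation
swap01 = (1, 0, 2)             # transposition of 0 and 1
swap12 = (0, 2, 1)             # transposition of 1 and 2

first = (e, swap01, swap01)    # plays the role of (1, h_2, g_3)
second = (swap12, e, swap12)   # plays the role of (h_1, 1, g'_3)
print(commutator_in_product(first, second))
```

The first two coordinates of the output are the identity because one of the two factors is trivial there, while the third coordinate is the commutator of two non-commuting transpositions.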

As before, we can combine this with previous lemmas to obtain a variant in the nilpotent case:

Lemma 11 (Nilpotent Furstenberg-Weiss lemma) Let ${H}$ be a subgroup of a product ${G_1 \times G_2 \times G_3}$ of three nilpotent groups ${G_1, G_2, G_3}$. Then one of the following hold:
• (i) (${H}$ too small) There exists ${i=1,2}$ and group homomorphisms ${\eta_i: G'_i \rightarrow K}$, ${\eta_3: G_3 \rightarrow K}$ for some abelian group ${K = (K,+)}$, with ${\eta_3}$ non-trivial, such that ${\eta_i(h_i) + \eta_3(h_3) = 0}$ whenever ${(h_1,h_2,h_3) \in H}$, where ${G'_i}$ is the projection of ${H}$ to ${G_i}$.
• (ii) (${H}$ too large) There exists a non-trivial normal subgroup ${N_3}$ of ${G_3}$, such that ${H = \pi^{-1}(H')}$ arises from a subgroup ${H'}$ of ${G_1 \times G_2 \times G_3/N_3}$.
• (iii) ${G_3}$ is abelian.

Informally, this lemma asserts that if ${(h_1,h_2,h_3)}$ is a variable ranging in some subgroup ${H}$ of ${G_1 \times G_2 \times G_3}$, then either (i) there is a non-trivial abelian equation that constrains ${h_3}$ in terms of either ${h_1}$ or ${h_2}$; (ii) ${h_3}$ is not fully determined by ${h_1}$ and ${h_2}$; or (iii) ${G_3}$ is abelian.

Proof: Applying Lemma 10, we are already done if conclusions (ii) or (iii) of that lemma hold, so suppose instead that conclusion (i) holds for say ${i=1}$. Then the group ${\{ (h_1,h_3) \in G_1 \times G_3: (h_1,h_2,h_3) \in H \hbox{ for some } h_2 \in G_2 \}}$ is not of the form ${G'_1 \times G_3}$, since it only contains those ${(1,h_3)}$ with ${h_3 \in G'_3}$. Applying Lemma 6, we obtain group homomorphisms ${\eta_1: G'_1 \rightarrow K}$, ${\eta_3: G_3 \rightarrow K}$ into an abelian group ${K= (K,+)}$, with ${\eta_3}$ non-trivial, such that ${\eta_1(h_1) + \eta_3(h_3) = 0}$ whenever ${(h_1,h_2,h_3) \in H}$, placing us in option (i). $\Box$

The Furstenberg-Weiss argument is often used (though not precisely in this form) to establish that certain key structure groups arising in ergodic theory are abelian; see for instance Proposition 6.3(1) of this paper of Host and Kra for an example.

One can get more structural control on ${H}$ in the Furstenberg-Weiss lemma in option (iii) if one also broadens options (i) and (ii):

Lemma 12 (Variant of Furstenberg-Weiss lemma) Let ${H}$ be a subgroup of a product ${G_1 \times G_2 \times G_3}$ of three groups ${G_1, G_2, G_3}$. Then one of the following hold:
• (i) (${H}$ too small) There is some proper subgroup ${G'_{ij}}$ of ${G_i \times G_j}$ for some ${1 \leq i < j \leq 3}$ such that ${(h_i,h_j) \in G'_{ij}}$ whenever ${(h_1,h_2,h_3) \in H}$. (In other words, the projection of ${H}$ to ${G_i \times G_j}$ is not surjective.)
• (ii) (${H}$ too large) There exist normal subgroups ${N_1, N_2, N_3}$ of ${G_1, G_2, G_3}$ respectively, not all trivial, such that ${H = \pi^{-1}(H')}$ arises from a subgroup ${H'}$ of ${G_1/N_1 \times G_2/N_2 \times G_3/N_3}$, where ${\pi: G_1 \times G_2 \times G_3 \rightarrow G_1/N_1 \times G_2/N_2 \times G_3/N_3}$ is the quotient map.
• (iii) ${G_1,G_2,G_3}$ are abelian and isomorphic. Furthermore, there exist isomorphisms ${\phi_1: G_1 \rightarrow K}$, ${\phi_2: G_2 \rightarrow K}$, ${\phi_3: G_3 \rightarrow K}$ to an abelian group ${K = (K,+)}$ such that

$\displaystyle H = \{ (g_1,g_2,g_3) \in G_1 \times G_2 \times G_3: \phi_1(g_1) + \phi_2(g_2) + \phi_3(g_3) = 0 \}.$

The ability to encode an abelian additive relation in terms of group-theoretic properties is vaguely reminiscent of the group configuration theorem.

Proof: We apply Lemma 10. Option (i) of that lemma implies option (i) of the current lemma, and similarly for option (ii), so we may assume without loss of generality that ${G_3}$ is abelian. By permuting we may also assume that ${G_1,G_2}$ are abelian, and will use additive notation for these groups.

We may assume that the projections of ${H}$ to ${G_1 \times G_2}$ and ${G_3}$ are surjective, else we are in option (i). The group ${\{ g_3 \in G_3: (1,1,g_3) \in H\}}$ is then a normal subgroup of ${G_3}$; we may assume it is trivial, otherwise we can quotient it out and be in option (ii). Thus ${H}$ can be expressed as a graph ${\{ (h_1,h_2,\phi(h_1,h_2)): h_1 \in G_1, h_2 \in G_2\}}$ for some map ${\phi: G_1 \times G_2 \rightarrow G_3}$. As ${H}$ is a group, ${\phi}$ must be a homomorphism, and we can write it as ${\phi(h_1,h_2) = -\phi_1(h_1) - \phi_2(h_2)}$ for some homomorphisms ${\phi_1: G_1 \rightarrow G_3}$, ${\phi_2: G_2 \rightarrow G_3}$. Thus elements ${(h_1,h_2,h_3)}$ of ${H}$ obey the constraint ${\phi_1(h_1) + \phi_2(h_2) + h_3 = 0}$.

If ${\phi_1}$ or ${\phi_2}$ fails to be injective, then we can quotient out by their kernels and end up in option (ii). If ${\phi_1}$ fails to be surjective, then the projection of ${H}$ to ${G_2 \times G_3}$ also fails to be surjective (since for ${(h_1,h_2,h_3) \in H}$, ${\phi_2(h_2) + h_3}$ is now constrained to lie in the range of ${\phi_1}$) and we are in option (i). Similarly if ${\phi_2}$ fails to be surjective. Thus we may assume that the homomorphisms ${\phi_1,\phi_2}$ are bijective and thus group isomorphisms. Setting ${\phi_3}$ to the identity, we arrive at option (iii). $\Box$
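As an illustration of option (iii), one can take ${G_1=G_2=G_3={\bf Z}/n}$ with all the ${\phi_i}$ equal to the identity. The following Python sketch (helper names and the choice ${n=5}$ are assumptions made for illustration) checks that the resulting group ${H = \{ (a,b,c): a+b+c=0 \}}$ projects onto every pair of coordinates, and that any two coordinates determine the third, so neither option (i) nor option (ii) is available in this example.

```python
from itertools import product


def relation_subgroup(n):
    """The subgroup {(a, b, c) in (Z/n)^3 : a + b + c = 0}."""
    return {(a, b, (-a - b) % n) for a, b in product(range(n), repeat=2)}


def pair_projection(H, i, j):
    """Projection of H to coordinates i and j."""
    return {(h[i], h[j]) for h in H}


n = 5
H = relation_subgroup(n)

# H is closed under coordinatewise addition mod n.
closed = all(tuple((x + y) % n for x, y in zip(g, h)) in H
             for g in H for h in H)

# Every projection to a pair of coordinates is all of Z/n x Z/n.
surjective = all(len(pair_projection(H, i, j)) == n * n
                 for i, j in [(0, 1), (0, 2), (1, 2)])

# Elements of H with at least two trivial coordinates.
two_trivial = [h for h in H if sum(c == 0 for c in h) >= 2]

print(closed, surjective, two_trivial)
```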

Combining this lemma with Lemma 3, we obtain a nilpotent version:

Corollary 13 (Variant of nilpotent Furstenberg-Weiss lemma) Let ${H}$ be a subgroup of a product ${G_1 \times G_2 \times G_3}$ of three nilpotent groups ${G_1, G_2, G_3}$. Then one of the following hold:
• (i) (${H}$ too small) There are homomorphisms ${\eta_i: G_i \rightarrow K}$, ${\eta_j: G_j \rightarrow K}$ to some abelian group ${K =(K,+)}$ for some ${1 \leq i < j \leq 3}$, with ${\eta_i, \eta_j}$ not both trivial, such that ${\eta_i(h_i) + \eta_j(h_j) = 0}$ whenever ${(h_1,h_2,h_3) \in H}$.
• (ii) (${H}$ too large) There exist normal subgroups ${N_1, N_2, N_3}$ of ${G_1, G_2, G_3}$ respectively, not all trivial, such that ${H = \pi^{-1}(H')}$ arises from a subgroup ${H'}$ of ${G_1/N_1 \times G_2/N_2 \times G_3/N_3}$, where ${\pi: G_1 \times G_2 \times G_3 \rightarrow G_1/N_1 \times G_2/N_2 \times G_3/N_3}$ is the quotient map.
• (iii) ${G_1,G_2,G_3}$ are abelian and isomorphic. Furthermore, there exist isomorphisms ${\phi_1: G_1 \rightarrow K}$, ${\phi_2: G_2 \rightarrow K}$, ${\phi_3: G_3 \rightarrow K}$ to an abelian group ${K = (K,+)}$ such that

$\displaystyle H = \{ (g_1,g_2,g_3) \in G_1 \times G_2 \times G_3: \phi_1(g_1) + \phi_2(g_2) + \phi_3(g_3) = 0 \}.$

Here is another variant of the Furstenberg-Weiss lemma, attributed to Serre by Ribet (see Lemma 3.3):

Lemma 14 (Serre’s lemma) Let ${H}$ be a subgroup of a finite product ${G_1 \times \dots \times G_k}$ of groups ${G_1,\dots,G_k}$ with ${k \geq 2}$. Then one of the following hold:
• (i) (${H}$ too small) There is some proper subgroup ${G'_{ij}}$ of ${G_i \times G_j}$ for some ${1 \leq i < j \leq k}$ such that ${(h_i,h_j) \in G'_{ij}}$ whenever ${(h_1,\dots,h_k) \in H}$.
• (ii) (${H}$ too large) One has ${H = G_1 \times \dots \times G_k}$.
• (iii) One of the ${G_i}$ has a non-trivial abelian quotient ${G_i/N_i}$.

Proof: The claim is trivial for ${k=2}$ (and we don’t need (iii) in this case), so suppose that ${k \geq 3}$. We can assume that each ${G_i}$ is a perfect group, ${G_i = [G_i,G_i]}$, otherwise we can quotient out by the commutator and arrive in option (iii). Similarly, we may assume that all the projections of ${H}$ to ${G_i \times G_j}$, ${1 \leq i < j \leq k}$ are surjective, otherwise we are in option (i).

We now claim that for any ${1 \leq j < k}$ and any ${g_k \in G_k}$, one can find ${(h_1,\dots,h_k) \in H}$ with ${h_i=1}$ for ${1 \leq i \leq j}$ and ${h_k = g_k}$. For ${j=1}$ this follows from the surjectivity of the projection of ${H}$ to ${G_1 \times G_k}$. Now suppose inductively that ${1 < j < k}$ and the claim has already been proven for ${j-1}$. Since ${G_k}$ is perfect, it suffices to establish this claim for ${g_k}$ of the form ${g_k = [g'_k, g''_k]}$ for some ${g'_k, g''_k \in G_k}$. By induction hypothesis, we can find ${(h'_1,\dots,h'_k) \in H}$ with ${h'_i = 1}$ for ${1 \leq i < j}$ and ${h'_k = g'_k}$. By surjectivity of the projection of ${H}$ to ${G_j \times G_k}$, one can find ${(h''_1,\dots,h''_k) \in H}$ with ${h''_j = 1}$ and ${h''_k=g''_k}$. Taking commutators of these two elements, we obtain the claim.

Setting ${j = k-1}$, we conclude that ${H}$ contains ${1 \times \dots \times 1 \times G_k}$. Similarly for permutations. Multiplying these together we see that ${H}$ contains all of ${G_1 \times \dots \times G_k}$, and we are in option (ii). $\Box$

I was asked the following interesting question by a bright high school student I am working with, to which I did not immediately know the answer:

Question 1 Does there exist a smooth function ${f: {\bf R} \rightarrow {\bf R}}$ which is not real analytic, but such that all the differences ${x \mapsto f(x+h) - f(x)}$ are real analytic for every ${h \in {\bf R}}$?

The hypothesis implies that the Newton quotients ${\frac{f(x+h)-f(x)}{h}}$ are real analytic for every ${h \neq 0}$. If analyticity were preserved by smooth limits, this would imply that ${f'}$ is real analytic, which would make ${f}$ real analytic. However, we are not assuming any uniformity in the analyticity of the Newton quotients, so this simple argument does not seem to resolve the question immediately.

In the case that ${f}$ is periodic, say periodic with period ${1}$, one can answer the question in the negative by Fourier series. Perform a Fourier expansion ${f(x) = \sum_{n \in {\bf Z}} c_n e^{2\pi i nx}}$. If ${f}$ is not real analytic, then there is a sequence ${n_j}$ going to infinity such that ${|c_{n_j}| = e^{-o(n_j)}}$ as ${j \rightarrow \infty}$. From the Borel-Cantelli lemma one can then find a real number ${h}$ such that ${|e^{2\pi i h n_j} - 1| \gg \frac{1}{n^2_j}}$ (say) for infinitely many ${j}$, hence ${|(e^{2\pi i h n_j} - 1) c_{n_j}| \gg n_j^{-2} e^{-o(n_j)}}$ for infinitely many ${j}$. Thus the Fourier coefficients of ${x \mapsto f(x+h) - f(x)}$ do not decay exponentially and hence this function is not analytic, a contradiction.

I was not able to quickly resolve the non-periodic case, but I thought perhaps this might be a good problem to crowdsource, so I invite readers to contribute their thoughts on this problem here. In the spirit of the polymath projects, I would encourage comments that contain thoughts that fall short of a complete solution, in the event that some other reader may be able to take the thought further.

In this previous blog post I noted the following easy application of Cauchy-Schwarz:

Lemma 1 (Van der Corput inequality) Let ${v,u_1,\dots,u_n}$ be unit vectors in a Hilbert space ${H}$. Then

$\displaystyle (\sum_{i=1}^n |\langle v, u_i \rangle_H|)^2 \leq \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle_H|.$

Proof: The left-hand side may be written as ${|\langle v, \sum_{i=1}^n \epsilon_i u_i \rangle_H|^2}$ for some unit complex numbers ${\epsilon_i}$. By Cauchy-Schwarz we have

$\displaystyle |\langle v, \sum_{i=1}^n \epsilon_i u_i \rangle_H|^2 \leq \langle \sum_{i=1}^n \epsilon_i u_i, \sum_{j=1}^n \epsilon_j u_j \rangle_H$

and the claim now follows from the triangle inequality. $\Box$
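As a numerical sanity check of Lemma 1, here is a minimal Python sketch in a real finite-dimensional Hilbert space. The dimension, the number of vectors, the random seed and the helper names are all illustrative assumptions; a small tolerance is used to absorb floating-point rounding.

```python
import math
import random


def inner(a, b):
    """Real inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def random_unit_vector(dim, rng):
    """A random unit vector in R^dim (Gaussian direction, normalised)."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(inner(vec, vec))
    return [x / norm for x in vec]


def van_der_corput_sides(v, us):
    """Return (left-hand side, right-hand side) of the inequality in Lemma 1."""
    lhs = sum(abs(inner(v, u)) for u in us) ** 2
    rhs = sum(abs(inner(a, b)) for a in us for b in us)
    return lhs, rhs


rng = random.Random(0)
v = random_unit_vector(4, rng)
us = [random_unit_vector(4, rng) for _ in range(10)]
lhs, rhs = van_der_corput_sides(v, us)
print(lhs <= rhs + 1e-9)
```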

As a corollary, correlation becomes transitive in a statistical sense (even though it is not transitive in an absolute sense):

Corollary 2 (Statistical transitivity of correlation) Let ${v,u_1,\dots,u_n}$ be unit vectors in a Hilbert space ${H}$ such that ${|\langle v,u_i \rangle_H| \geq \delta}$ for all ${i=1,\dots,n}$ and some ${0 < \delta \leq 1}$. Then we have ${|\langle u_i, u_j \rangle_H| \geq \delta^2/2}$ for at least ${\delta^2 n^2/2}$ of the pairs ${(i,j) \in \{1,\dots,n\}^2}$.

Proof: From the lemma, we have

$\displaystyle \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle_H| \geq \delta^2 n^2.$

The contribution of those ${i,j}$ with ${|\langle u_i, u_j \rangle_H| < \delta^2/2}$ is at most ${\delta^2 n^2/2}$, and all the remaining summands are at most ${1}$, giving the claim. $\Box$
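The counting statement in Corollary 2 can likewise be tested on synthetic data. In the sketch below (again in a real Hilbert space, with hypothetical helper names and an arbitrary noise level), the ${u_i}$ are noisy copies of ${v}$, ${\delta}$ is taken to be the smallest correlation ${|\langle v, u_i \rangle|}$, and the number of pairs with ${|\langle u_i, u_j \rangle| \geq \delta^2/2}$ is compared against the bound ${\delta^2 n^2/2}$.

```python
import math
import random


def inner(a, b):
    """Real inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def normalise(vec):
    """Rescale a non-zero vector to unit length."""
    norm = math.sqrt(inner(vec, vec))
    return [x / norm for x in vec]


def correlated_pair_count(v, us):
    """Return (delta, number of pairs with correlation >= delta^2/2, bound)."""
    n = len(us)
    delta = min(abs(inner(v, u)) for u in us)
    threshold = delta ** 2 / 2
    count = sum(1 for a in us for b in us if abs(inner(a, b)) >= threshold)
    return delta, count, delta ** 2 * n ** 2 / 2


rng = random.Random(0)
dim, n = 6, 20
v = normalise([rng.gauss(0.0, 1.0) for _ in range(dim)])
# Each u_i is v plus Gaussian noise, renormalised (illustrative model).
us = [normalise([x + rng.gauss(0.0, 1.0) for x in v]) for _ in range(n)]

delta, count, bound = correlated_pair_count(v, us)
print(count >= bound)
```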

One drawback with this corollary is that it does not tell us which pairs ${u_i,u_j}$ correlate. In particular, if the vector ${v}$ also correlates with a separate collection ${w_1,\dots,w_n}$ of unit vectors, the pairs ${(i,j)}$ for which ${u_i,u_j}$ correlate may have no intersection whatsoever with the pairs in which ${w_i,w_j}$ correlate (except of course on the diagonal ${i=j}$ where they must correlate).

While working on an ongoing research project, I recently found that there is a very simple way to get around the latter problem by exploiting the tensor power trick:

Corollary 3 (Simultaneous statistical transitivity of correlation) Let ${v, u^k_i}$ be unit vectors in a Hilbert space for ${i=1,\dots,n}$ and ${k=1,\dots,K}$ such that ${|\langle v, u^k_i \rangle_H| \geq \delta_k}$ for all ${i=1,\dots,n}$, ${k=1,\dots,K}$ and some ${0 < \delta_k \leq 1}$. Then there are at least ${(\delta_1 \dots \delta_K)^2 n^2/2}$ pairs ${(i,j) \in \{1,\dots,n\}^2}$ such that ${\prod_{k=1}^K |\langle u^k_i, u^k_j \rangle_H| \geq (\delta_1 \dots \delta_K)^2/2}$. In particular (by Cauchy-Schwarz) we have ${|\langle u^k_i, u^k_j \rangle_H| \geq (\delta_1 \dots \delta_K)^2/2}$ for each such pair ${(i,j)}$ and all ${k}$.

Proof: Apply Corollary 2 to the unit vectors ${v^{\otimes K}}$ and ${u^1_i \otimes \dots \otimes u^K_i}$, ${i=1,\dots,n}$ in the tensor power Hilbert space ${H^{\otimes K}}$. $\Box$
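The only property of tensor products used here is that inner products multiply: ${\langle a \otimes b, c \otimes d \rangle = \langle a, c \rangle \langle b, d \rangle}$. A minimal Python sketch of this identity in finite dimensions, with tensors of coordinate vectors flattened into lists (the helper names and test vectors are illustrative assumptions), is as follows.

```python
from functools import reduce


def inner(a, b):
    """Real inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def tensor(a, b):
    """Tensor product of coordinate vectors, flattened to a list."""
    return [x * y for x in a for y in b]


def tensor_all(vectors):
    """Tensor product of a non-empty list of vectors."""
    return reduce(tensor, vectors)


us = [[1, 2], [3, 4, 5], [2, -1]]     # plays the role of u^1_i, u^2_i, u^3_i
ws = [[-1, 0], [2, 2, 1], [1, 3]]     # plays the role of u^1_j, u^2_j, u^3_j

lhs = inner(tensor_all(us), tensor_all(ws))
rhs = 1
for u, w in zip(us, ws):
    rhs *= inner(u, w)
print(lhs == rhs)
```

Integer coordinates are used so that the comparison is exact; in particular the tensor product of unit vectors is again a unit vector, which is what allows Corollary 2 to be applied.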

It is surprisingly difficult to obtain even a qualitative version of the above conclusion (namely, if ${v}$ correlates with all of the ${u^k_i}$, then there are many pairs ${(i,j)}$ for which ${u^k_i}$ correlates with ${u^k_j}$ for all ${k}$ simultaneously) without some version of the tensor power trick. For instance, even the powerful Szemerédi regularity lemma, when applied to the set of pairs ${i,j}$ for which one has correlation of ${u^k_i}$, ${u^k_j}$ for a single ${k}$, does not seem to be sufficient. However, there is a reformulation of the argument using the Schur product theorem as a substitute for (or really, a disguised version of) the tensor power trick. For simplicity of notation let us just work with real Hilbert spaces to illustrate the argument. We start with the identity

$\displaystyle \langle u^k_i, u^k_j \rangle_H = \langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H + \langle \pi(u^k_i), \pi(u^k_j) \rangle_H$

where ${\pi}$ is the orthogonal projection to the complement of ${v}$. This implies a Gram matrix inequality

$\displaystyle (\langle u^k_i, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ (\langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ 0$

for each ${k}$ where ${A \succ B}$ denotes the claim that ${A-B}$ is positive semi-definite. By the Schur product theorem, we conclude that

$\displaystyle (\prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ (\prod_{k=1}^K \langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H)_{1 \leq i,j \leq n}$

and hence for a suitable choice of signs ${\epsilon_1,\dots,\epsilon_n}$,

$\displaystyle \sum_{1 \leq i, j \leq n} \epsilon_i \epsilon_j \prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H \geq \delta_1^2 \dots \delta_K^2 n^2.$

One now argues as in the proof of Corollary 2.

A separate application of tensor powers to amplify correlations was also noted in this previous blog post giving a cheap version of the Kabatjanskii-Levenstein bound, but this seems to not be directly related to this current application.

The (classical) Möbius function ${\mu: {\bf N} \rightarrow {\bf Z}}$ is the unique function that obeys the classical Möbius inversion formula:

Proposition 1 (Classical Möbius inversion) Let ${f,g: {\bf N} \rightarrow A}$ be functions from the natural numbers to an additive group ${A}$. Then the following two claims are equivalent:
• (i) ${f(n) = \sum_{d|n} g(d)}$ for all ${n \in {\bf N}}$.
• (ii) ${g(n) = \sum_{d|n} \mu(n/d) f(d)}$ for all ${n \in {\bf N}}$.
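The equivalence in Proposition 1 is easy to test numerically. In the Python sketch below, the test function ${g(d) = d^2+1}$ and the helper names are illustrative assumptions: one builds ${f}$ from ${g}$ via (i) and checks that (ii) recovers ${g}$.

```python
def mobius(n):
    """Classical Moebius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n has a square factor
            result = -result
        p += 1
    if n > 1:
        result = -result          # one remaining prime factor
    return result


def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def g(d):
    """Illustrative test function (an arbitrary choice)."""
    return d * d + 1


def f(n):
    """f defined from g by claim (i)."""
    return sum(g(d) for d in divisors(n))


def recovered_g(n):
    """Right-hand side of claim (ii)."""
    return sum(mobius(n // d) * f(d) for d in divisors(n))


print(all(recovered_g(n) == g(n) for n in range(1, 51)))
```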

There is a generalisation of this formula to (finite) posets, due to Hall, in which one sums over chains ${n_0 > \dots > n_k}$ in the poset:

Proposition 2 (Poset Möbius inversion) Let ${{\mathcal N}}$ be a finite poset, and let ${f,g: {\mathcal N} \rightarrow A}$ be functions from that poset to an additive group ${A}$. Then the following two claims are equivalent:
• (i) ${f(n) = \sum_{d \leq n} g(d)}$ for all ${n \in {\mathcal N}}$, where ${d}$ is understood to range in ${{\mathcal N}}$.
• (ii) ${g(n) = \sum_{k=0}^\infty (-1)^k \sum_{n = n_0 > n_1 > \dots > n_k} f(n_k)}$ for all ${n \in {\mathcal N}}$, where in the inner sum ${n_0,\dots,n_k}$ are understood to range in ${{\mathcal N}}$ with the indicated ordering.
(Note from the finite nature of ${{\mathcal N}}$ that the inner sum in (ii) is vacuous for all but finitely many ${k}$.)

Comparing Proposition 2 with Proposition 1, it is natural to refer to the function ${\mu(d,n) := \sum_{k=0}^\infty (-1)^k \sum_{n = n_0 > n_1 > \dots > n_k = d} 1}$ as the Möbius function of the poset; the condition (ii) can then be written as

$\displaystyle g(n) = \sum_{d \leq n} \mu(d,n) f(d).$

Proof: If (i) holds, then we have

$\displaystyle g(n) = f(n) - \sum_{d < n} g(d) \ \ \ \ \ (1)$

for any ${n \in {\mathcal N}}$. Iterating this we obtain (ii). Conversely, from (ii) and separating out the ${k=0}$ term, and grouping all the other terms based on the value of ${d:=n_1}$, we obtain (1), and hence (i). $\Box$

In fact it is not completely necessary that the poset ${{\mathcal N}}$ be finite; an inspection of the proof shows that it suffices that every element ${n}$ of the poset has only finitely many predecessors ${\{ d \in {\mathcal N}: d < n \}}$.

It is not difficult to see that Proposition 2 includes Proposition 1 as a special case, after verifying the combinatorial fact that the quantity

$\displaystyle \sum_{k=0}^\infty (-1)^k \sum_{d=n_k | n_{k-1} | \dots | n_1 | n_0 = n} 1$

is equal to ${\mu(n/d)}$ when ${d}$ divides ${n}$, and vanishes otherwise.
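This combinatorial fact can be verified by machine for small ${n}$. The Python sketch below (helper names are illustrative assumptions) computes the signed chain count recursively, by conditioning on the first step ${n_1}$ of the chain, and compares it with the classical Möbius function.

```python
from functools import lru_cache


def mobius(n):
    """Classical Moebius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


@lru_cache(maxsize=None)
def signed_chain_count(d, n):
    """Sum over k of (-1)^k times the number of divisor chains
    d = n_k | ... | n_1 | n_0 = n with all n_i distinct."""
    if d == n:
        return 1                  # only the chain of length zero
    # Condition on n_1 = m, a proper divisor of n that is a multiple of d;
    # the sum is empty (hence zero) when d does not divide n.
    return -sum(signed_chain_count(d, m)
                for m in range(d, n) if n % m == 0 and m % d == 0)


ok = all(
    signed_chain_count(d, n) == (mobius(n // d) if n % d == 0 else 0)
    for n in range(1, 61) for d in range(1, n + 1)
)
print(ok)
```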

I recently discovered that Proposition 2 can also lead to a useful variant of the inclusion-exclusion principle. The classical version of this principle can be phrased in terms of indicator functions: if ${A_1,\dots,A_\ell}$ are subsets of some set ${X}$, then

$\displaystyle \prod_{j=1}^\ell (1-1_{A_j}) = \sum_{k=0}^\ell (-1)^k \sum_{1 \leq j_1 < \dots < j_k \leq \ell} 1_{A_{j_1} \cap \dots \cap A_{j_k}}.$

In particular, if there is a finite measure ${\nu}$ on ${X}$ for which ${A_1,\dots,A_\ell}$ are all measurable, we have

$\displaystyle \nu(X \backslash \bigcup_{j=1}^\ell A_j) = \sum_{k=0}^\ell (-1)^k \sum_{1 \leq j_1 < \dots < j_k \leq \ell} \nu( A_{j_1} \cap \dots \cap A_{j_k} ).$

One drawback of this formula is that there are exponentially many terms on the right-hand side: ${2^\ell}$ of them, in fact. However, in many cases of interest there are “collisions” between the intersections ${A_{j_1} \cap \dots \cap A_{j_k}}$ (for instance, perhaps many of the pairwise intersections ${A_i \cap A_j}$ agree), in which case there is an opportunity to collect terms and hopefully achieve some cancellation. It turns out that it is possible to use Proposition 2 to do this, in which one only needs to sum over chains in the resulting poset of intersections:

Proposition 3 (Hall-type inclusion-exclusion principle) Let ${A_1,\dots,A_\ell}$ be subsets of some set ${X}$, and let ${{\mathcal N}}$ be the finite poset formed by intersections of some of the ${A_i}$ (with the convention that ${X}$ is the empty intersection), ordered by set inclusion. Then for any ${E \in {\mathcal N}}$, one has

$\displaystyle 1_E \prod_{F \subsetneq E} (1 - 1_F) = \sum_{k=0}^\ell (-1)^k \sum_{E = E_0 \supsetneq E_1 \supsetneq \dots \supsetneq E_k} 1_{E_k} \ \ \ \ \ (2)$

where ${F, E_0,\dots,E_k}$ are understood to range in ${{\mathcal N}}$. In particular (setting ${E}$ to be the empty intersection) if the ${A_j}$ are all proper subsets of ${X}$ then we have

$\displaystyle \prod_{j=1}^\ell (1-1_{A_j}) = \sum_{k=0}^\ell (-1)^k \sum_{X = E_0 \supsetneq E_1 \supsetneq \dots \supsetneq E_k} 1_{E_k}. \ \ \ \ \ (3)$

In particular, if there is a finite measure ${\nu}$ on ${X}$ for which ${A_1,\dots,A_\ell}$ are all measurable, we have

$\displaystyle \nu(X \backslash \bigcup_{j=1}^\ell A_j) = \sum_{k=0}^\ell (-1)^k \sum_{X = E_0 \supsetneq E_1 \supsetneq \dots \supsetneq E_k} \nu(E_k).$

Using the Möbius function ${\mu}$ on the poset ${{\mathcal N}}$, one can write these formulae as

$\displaystyle 1_E \prod_{F \subsetneq E} (1 - 1_F) = \sum_{F \subseteq E} \mu(F,E) 1_F,$

$\displaystyle \prod_{j=1}^\ell (1-1_{A_j}) = \sum_F \mu(F,X) 1_F$

and

$\displaystyle \nu(X \backslash \bigcup_{j=1}^\ell A_j) = \sum_F \mu(F,X) \nu(F).$

Proof: It suffices to establish (2) (to derive (3) from (2) observe that all the ${F \subsetneq X}$ are contained in one of the ${A_j}$, so the effect of ${1-1_F}$ may be absorbed into ${1 - 1_{A_j}}$). Applying Proposition 2, this is equivalent to the assertion that

$\displaystyle 1_E = \sum_{F \subseteq E} 1_F \prod_{G \subsetneq F} (1 - 1_G)$

for all ${E \in {\mathcal N}}$. But this amounts to the assertion that for each ${x \in E}$, there is precisely one ${F \subseteq E}$ in ${{\mathcal N}}$ with the property that ${x \in F}$ and ${x \not \in G}$ for any ${G \subsetneq F}$ in ${{\mathcal N}}$, namely one can take ${F}$ to be the intersection of all ${G \subseteq E}$ in ${{\mathcal N}}$ such that ${G}$ contains ${x}$. $\Box$

Example 4 If ${A_1,A_2,A_3 \subsetneq X}$ with ${A_1 \cap A_2 = A_1 \cap A_3 = A_2 \cap A_3 = A_*}$, and ${A_1,A_2,A_3,A_*}$ are all distinct, then we have for any finite measure ${\nu}$ on ${X}$ that makes ${A_1,A_2,A_3}$ measurable that

$\displaystyle \nu(X \backslash (A_1 \cup A_2 \cup A_3)) = \nu(X) - \nu(A_1) - \nu(A_2) \ \ \ \ \ (4)$

$\displaystyle - \nu(A_3) - \nu(A_*) + 3 \nu(A_*)$

due to the four chains ${X \supsetneq A_1}$, ${X \supsetneq A_2}$, ${X \supsetneq A_3}$, ${X \supsetneq A_*}$ of length one, and the three chains ${X \supsetneq A_1 \supsetneq A_*}$, ${X \supsetneq A_2 \supsetneq A_*}$, ${X \supsetneq A_3 \supsetneq A_*}$ of length two. Note that this expansion just has six terms in it, as opposed to the ${2^3=8}$ given by the usual inclusion-exclusion formula, though of course one can reduce the number of terms by combining the ${\nu(A_*)}$ terms. This may not seem particularly impressive, especially if one views the term ${3 \nu(A_*)}$ as really being three terms instead of one, but if we add a fourth set ${A_4 \subsetneq X}$ with ${A_i \cap A_j = A_*}$ for all ${1 \leq i < j \leq 4}$, the formula now becomes

$\displaystyle \nu(X \backslash (A_1 \cup A_2 \cup A_3 \cup A_4)) = \nu(X) - \nu(A_1) - \nu(A_2) \ \ \ \ \ (5)$

$\displaystyle - \nu(A_3) - \nu(A_4) - \nu(A_*) + 4 \nu(A_*)$

and we begin to see more cancellation as we now have just seven terms (or ten if we count ${4 \nu(A_*)}$ as four terms) instead of ${2^4 = 16}$ terms.
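To make the chain expansion in Example 4 concrete, here is a minimal Python sketch (not part of the original argument) that builds the poset of intersections for an explicit choice of sets and evaluates the alternating sum over chains recursively. The specific sets `X`, `A1`, `A2`, `A3`, the helper names `intersection_poset` and `chain_sum`, and the choice of counting measure for ${\nu}$ are all assumptions made purely for illustration.

```python
from itertools import combinations


def intersection_poset(X, sets):
    """Return the collection N of all distinct intersections of the given sets.

    The empty intersection is taken to be X itself.
    """
    X = frozenset(X)
    poset = {X}
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            poset.add(X.intersection(*combo))
    return poset


def chain_sum(E, poset, nu):
    """Sum of (-1)^k nu(E_k) over all chains E = E_0 > E_1 > ... > E_k in the poset.

    Uses the recursion S(E) = nu(E) - (sum of S(F) over F in the poset strictly inside E).
    """
    total = nu(E)
    for F in poset:
        if F < E:  # for frozensets, "<" means proper subset
            total -= chain_sum(F, poset, nu)
    return total


# Hypothetical example with A1 & A2 = A1 & A3 = A2 & A3 = {0}.
X = frozenset(range(10))
A1, A2, A3 = {0, 1, 2}, {0, 3, 4}, {0, 5}
poset = intersection_poset(X, [A1, A2, A3])

lhs = len(X - (A1 | A2 | A3))   # nu(X \ (A1 u A2 u A3)) for counting measure
rhs = chain_sum(X, poset, len)  # alternating sum over chains starting at X
print(lhs, rhs)
```

For this particular choice both quantities equal ${4 = 10 - 3 - 3 - 2 - 1 + 3 \cdot 1}$, matching the shape of (4). The recursion visits every chain exactly once, so it is only practical when the poset is small or of bounded depth.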

Example 5 (Variant of Legendre sieve) If ${q_1,\dots,q_\ell > 1}$ are natural numbers, and ${a_1,a_2,\dots}$ is some sequence of complex numbers with only finitely many terms non-zero, then by applying the above proposition to the sets ${A_j := q_j {\bf N}}$ and with ${\nu}$ equal to counting measure weighted by the ${a_n}$ we obtain a variant of the Legendre sieve

$\displaystyle \sum_{n: q_j \nmid n \hbox{ for all } j} a_n = \sum_{k=0}^\ell (-1)^k \sum_{1 |' d_1 |' \dots |' d_k} \sum_{n: d_k |n} a_n$

where ${d_1,\dots,d_k}$ range over the set ${{\mathcal N}}$ formed by taking least common multiples of the ${q_j}$ (with the understanding that the empty least common multiple is ${1}$), and ${d |' n}$ denotes the assertion that ${d}$ divides ${n}$ but is strictly less than ${n}$. I am curious to know if this version of the Legendre sieve already appears in the literature (and similarly for the other applications of Proposition 2 given here).
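As a quick numerical sanity check of this sieve identity, the following Python sketch counts the integers ${n \leq N}$ that are divisible by none of the ${q_j}$ (that is, the complement of the union of the sets ${A_j = q_j {\bf N}}$) in two ways: directly, and via the alternating sum over chains of least common multiples. The moduli `qs = [6, 10, 15]`, the cutoff `N = 100`, the weights ${a_n = 1_{n \leq N}}$, and the helper names are hypothetical choices for illustration only; `math.lcm` requires Python 3.9 or later.

```python
from itertools import combinations
from math import lcm  # math.lcm needs Python 3.9 or later


def lcm_poset(qs):
    """All distinct least common multiples of sub-collections of qs (empty lcm = 1)."""
    poset = {1}
    for r in range(1, len(qs) + 1):
        for combo in combinations(qs, r):
            poset.add(lcm(*combo))
    return poset


def chain_sieve(d, poset, weight):
    """Sum of (-1)^k weight(d_k) over chains d |' d_1 |' ... |' d_k in the poset."""
    total = weight(d)
    for e in poset:
        if e != d and e % d == 0:  # d divides e and d < e
            total -= chain_sieve(e, poset, weight)
    return total


# Hypothetical data: a_n = 1 for n <= N and a_n = 0 otherwise.
qs, N = [6, 10, 15], 100
poset = lcm_poset(qs)

direct = sum(1 for n in range(1, N + 1) if all(n % q != 0 for q in qs))
sieved = chain_sieve(1, poset, lambda d: N // d)  # N // d = sum of a_n over multiples of d
print(sorted(poset), direct, sieved)
```

Here the poset is ${\{1, 6, 10, 15, 30\}}$, which has the same shape as in Example 4, and both counts equal ${100 - 16 - 10 - 6 - 3 + 3 \cdot 3 = 74}$.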

If the poset ${{\mathcal N}}$ has bounded depth then the number of terms in Proposition 3 can end up being just polynomially large in ${\ell}$ rather than exponentially large. Indeed, if all chains ${X \supsetneq E_1 \supsetneq \dots \supsetneq E_k}$ in ${{\mathcal N}}$ have length ${k}$ at most ${k_0}$ then the number of terms here is at most ${1 + \ell + \dots + \ell^{k_0}}$. (The examples (4), (5) are ones in which the depth is equal to two.) I hope to report in a later post on how this version of inclusion-exclusion with polynomially many terms can be useful in an application.

Actually in our application we need an abstraction of the above formula, in which the indicator functions are replaced by more abstract idempotents:

Proposition 6 (Hall-type inclusion-exclusion principle for idempotents) Let ${A_1,\dots,A_\ell}$ be pairwise commuting elements of some ring ${R}$ with identity, which are all idempotent (thus ${A_j A_j = A_j}$ for ${j=1,\dots,\ell}$). Let ${{\mathcal N}}$ be the finite poset formed by products of the ${A_i}$ (with the convention that ${1}$ is the empty product), ordered by declaring ${E \leq F}$ when ${EF = E}$ (note that all the elements of ${{\mathcal N}}$ are idempotent so this is a partial ordering). Then for any ${E \in {\mathcal N}}$, one has

$\displaystyle E \prod_{F < E} (1-F) = \sum_{k=0}^\ell (-1)^k \sum_{E = E_0 > E_1 > \dots > E_k} E_k \ \ \ \ \ (6)$

where ${F, E_0,\dots,E_k}$ are understood to range in ${{\mathcal N}}$. In particular (setting ${E=1}$) if all the ${A_j}$ are not equal to ${1}$ then we have

$\displaystyle \prod_{j=1}^\ell (1-A_j) = \sum_{k=0}^\ell (-1)^k \sum_{1 = E_0 > E_1 > \dots > E_k} E_k.$

Morally speaking this proposition is equivalent to the previous one after applying a “spectral theorem” to simultaneously diagonalise all of the ${A_j}$, but it is quicker to just adapt the previous proof to establish this proposition directly. Using the Möbius function ${\mu}$ for ${{\mathcal N}}$, we can rewrite these formulae as

$\displaystyle E \prod_{F < E} (1-F) = \sum_{F \leq E} \mu(F,E) F$

and

$\displaystyle \prod_{j=1}^\ell (1-A_j) = \sum_F \mu(F,1) F.$

Proof: Again it suffices to verify (6). Using Proposition 2 as before, it suffices to show that

$\displaystyle E = \sum_{F \leq E} F \prod_{G < F} (1 - G) \ \ \ \ \ (7)$

for all ${E \in {\mathcal N}}$ (all sums and products are understood to range in ${{\mathcal N}}$). We can expand

$\displaystyle E = E \prod_{G < E} (G + (1-G)) = \sum_{{\mathcal A}} (\prod_{G \in {\mathcal A}} G) (\prod_{G < E: G \not \in {\mathcal A}} (1-G)) \ \ \ \ \ (8)$

where ${{\mathcal A}}$ ranges over all subsets of ${\{ G \in {\mathcal N}: G \leq E \}}$ that contain ${E}$. For such an ${{\mathcal A}}$, if we write ${F := \prod_{G \in {\mathcal A}} G}$, then ${F}$ is the greatest lower bound of ${{\mathcal A}}$, and we observe that ${F (\prod_{G < E: G \not \in {\mathcal A}} (1-G))}$ vanishes whenever ${{\mathcal A}}$ fails to contain some ${G \in {\mathcal N}}$ with ${F \leq G \leq E}$. Thus the only ${{\mathcal A}}$ that give non-zero contributions to (8) are the intervals of the form ${\{ G \in {\mathcal N}: F \leq G \leq E\}}$ for some ${F \leq E}$ (which then forms the greatest lower bound for that interval), and the claim (7) follows (after noting that ${F (1-G) = F (1-FG)}$ for any ${F,G \in {\mathcal N}}$). $\Box$

[I am posting this advertisement in my capacity as chair of the Steering Committee for the UCLA Endowed Olga Radko Math Circle – T.]

The Department of Mathematics at the University of California, Los Angeles, is inviting applications for the position of an Academic Administrator who will serve as the Director of the UCLA Endowed Olga Radko Math Circle (ORMC). The Academic Administrator will have the broad responsibility for administration of the ORMC, an outreach program with weekly activities for mathematically inclined students in grades K-12. Currently, over 300 children take part in the program each weekend. Instruction is delivered by a team of over 50 docents, the majority of whom are UCLA undergraduate and graduate students.

The Academic Administrator is required to teach three mathematics courses in the undergraduate curriculum per academic year as assigned by the Department. This is also intended to help with the recruitment of UCLA students as docents and instructors for the ORMC.

As the director of ORMC, the Academic Administrator will have primary responsibility for all aspects of ORMC operations:

• Determining the structure of ORMC, including the number and levels of groups
• Recruiting, training and supervising instructors, docents, and postdoctoral fellows associated with the ORMC
• Developing curricular materials and providing leadership in development of innovative ways of explaining mathematical ideas to school children
• Working with the Mathematics Department finance office to ensure timely payment of stipends and wages to ORMC instructors and docents, as appropriate
• Maintaining ORMC budget and budgetary projections, ensuring that the funds are used appropriately and efficiently for ORMC activities, and applying for grants as appropriate to fund the operations of ORMC
• Working with the Steering Committee and UCLA Development to raise funds for ORMC, both from families whose children participate in ORMC and other sources
• Admitting students to ORMC, ensuring appropriate placement, and working to maintain a collegial and inclusive atmosphere conducive to learning for all ORMC attendees
• Reporting to and working with the ORMC Steering Committee throughout the year

A competitive candidate should have leadership potential and experience with developing mathematical teaching materials for the use of gifted school children, as well as experience with teaching undergraduate mathematics courses. Candidates must have a Ph.D. degree (or equivalent) or expect to complete their Ph.D. by June 30, 2021.

Applications should be received by March 15, 2021. Further details on the position and the application process can be found at the application page.

Previous set of notes: Notes 3. Next set of notes: 246C Notes 1.

One of the great classical triumphs of complex analysis was in providing the first complete proof (by Hadamard and de la Vallée Poussin in 1896) of arguably the most important theorem in analytic number theory, the prime number theorem:

Theorem 1 (Prime number theorem) Let ${\pi(x)}$ denote the number of primes less than a given real number ${x}$. Then

$\displaystyle \lim_{x \rightarrow \infty} \frac{\pi(x)}{x/\ln x} = 1$

(or in asymptotic notation, ${\pi(x) = (1+o(1)) \frac{x}{\ln x}}$ as ${x \rightarrow \infty}$).

(Actually, it turns out to be slightly more natural to replace the approximation ${\frac{x}{\ln x}}$ in the prime number theorem by the logarithmic integral ${\int_2^x \frac{dt}{\ln t}}$, which turns out to be a more precise approximation, but we will not stress this point here.)

The complex-analytic proof of this theorem hinges on the study of a key meromorphic function related to the prime numbers, the Riemann zeta function ${\zeta}$. Initially, it is only defined on the half-plane ${\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}$:

Definition 2 (Riemann zeta function, preliminary definition) Let ${s \in {\bf C}}$ be such that ${\mathrm{Re} s > 1}$. Then we define

$\displaystyle \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}. \ \ \ \ \ (1)$

Note that the series is locally uniformly convergent in the half-plane ${\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}$, so in particular ${\zeta}$ is holomorphic on this region. In previous notes we have already evaluated some special values of this function:

$\displaystyle \zeta(2) = \frac{\pi^2}{6}; \quad \zeta(4) = \frac{\pi^4}{90}; \quad \zeta(6) = \frac{\pi^6}{945}. \ \ \ \ \ (2)$

However, it turns out that the zeroes (and pole) of this function are of far greater importance to analytic number theory, particularly with regards to the study of the prime numbers.

The Riemann zeta function has several remarkable properties, some of which we summarise here:

Theorem 3 (Basic properties of the Riemann zeta function)

Proof: We just prove (i) and (ii) for now, leaving (iii) and (iv) for later sections.

The claim (i) is an encoding of the fundamental theorem of arithmetic, which asserts that every natural number ${n}$ is uniquely representable as a product ${n = \prod_p p^{a_p}}$ over primes, where the ${a_p}$ are natural numbers, all but finitely many of which are zero. Writing this representation as ${\frac{1}{n^s} = \prod_p \frac{1}{p^{a_p s}}}$, we see that

$\displaystyle \sum_{n \in S_{x,m}} \frac{1}{n^s} = \prod_{p \leq x} \sum_{a=0}^m \frac{1}{p^{as}}$

whenever ${x \geq 1}$, ${m \geq 0}$, and ${S_{x,m}}$ consists of all the natural numbers of the form ${n = \prod_{p \leq x} p^{a_p}}$ for some ${a_p \leq m}$. Sending ${m}$ and ${x}$ to infinity, we conclude from monotone convergence and the geometric series formula that

$\displaystyle \sum_{n=1}^\infty \frac{1}{n^s} = \prod_{p} \sum_{a=0}^\infty \frac{1}{p^{as}} =\prod_p (1 - \frac{1}{p^s})^{-1}$

whenever ${s>1}$ is real, and then from dominated convergence we see that the same formula holds for complex ${s}$ with ${\mathrm{Re} s > 1}$ as well. Local uniform convergence then follows from the product form of the Weierstrass ${M}$-test (Exercise 19 of Notes 1).

The claim (ii) is immediate from (i) since the Euler product ${\prod_p (1-\frac{1}{p^s})^{-1}}$ is absolutely convergent and all terms are non-zero. $\Box$
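The Euler product is easy to test numerically. The sketch below (an illustration only, with hypothetical helper names `primes_up_to`, `zeta_partial_sum` and `euler_partial_product`, and arbitrarily chosen truncation points) compares a truncated Dirichlet series and a truncated Euler product at ${s=2}$ against the value ${\zeta(2) = \pi^2/6}$ from (2).

```python
from math import pi


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n (for n >= 2)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def zeta_partial_sum(s, N):
    """Truncation of the Dirichlet series (1) to the terms n <= N."""
    return sum(n ** -s for n in range(1, N + 1))


def euler_partial_product(s, x):
    """Truncation of the Euler product to the primes p <= x."""
    product = 1.0
    for p in primes_up_to(x):
        product /= 1.0 - p ** -s
    return product


s = 2.0
print(zeta_partial_sum(s, 10**5), euler_partial_product(s, 10**4), pi**2 / 6)
```

Both truncations approach ${\pi^2/6}$ from below, since every omitted term of the series is positive and every omitted factor of the product exceeds ${1}$.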

We remark that by sending ${s}$ to ${1}$ in Theorem 3(i) we conclude that

$\displaystyle \sum_{n=1}^\infty \frac{1}{n} = \prod_p (1-\frac{1}{p})^{-1}$

and from the divergence of the harmonic series we then conclude Euler’s theorem ${\sum_p \frac{1}{p} = \infty}$. This can be viewed as a weak version of the prime number theorem, and already illustrates the potential applicability of the Riemann zeta function to control the distribution of the prime numbers.

The meromorphic continuation (iii) of the zeta function is initially surprising, but can be interpreted either as a manifestation of the extremely regular spacing of the natural numbers ${n}$ occurring in the sum (1), or as a consequence of various integral representations of ${\zeta}$ (or slight modifications thereof). We will focus in this set of notes on a particular representation of ${\zeta}$ as essentially the Mellin transform of the theta function ${\theta}$ that briefly appeared in previous notes, and the functional equation (iv) can then be viewed as a consequence of the modularity of that theta function. This in turn was established using the Poisson summation formula, so one can view the functional equation as ultimately being a manifestation of Poisson summation. (For a direct proof of the functional equation via Poisson summation, see these notes.)

Henceforth we work with the meromorphic continuation of ${\zeta}$. The functional equation (iv), when combined with special values of ${\zeta}$ such as (2), gives some additional values of ${\zeta}$ outside of its initial domain ${\{s: \mathrm{Re} s > 1\}}$, most famously

$\displaystyle \zeta(-1) = -\frac{1}{12}.$

If one formally compares this formula with (1), one arrives at the infamous identity

$\displaystyle 1 + 2 + 3 + \dots = -\frac{1}{12}$

although this identity has to be interpreted in a suitable non-classical sense in order for it to be rigorous (see this previous blog post for further discussion).

From Theorem 3 and the non-vanishing nature of ${\Gamma}$, we see that ${\zeta}$ has simple zeroes (known as trivial zeroes) at the negative even integers ${-2, -4, \dots}$, and all other zeroes (the non-trivial zeroes) inside the critical strip ${\{ s \in {\bf C}: 0 \leq \mathrm{Re} s \leq 1 \}}$. (The non-trivial zeroes are conjectured to all be simple, but this is hopelessly far from being proven at present.) As we shall see shortly, these latter zeroes turn out to be closely related to the distribution of the primes. The functional equation tells us that if ${\rho}$ is a non-trivial zero then so is ${1-\rho}$; also, we have the identity

$\displaystyle \zeta(s) = \overline{\zeta(\overline{s})} \ \ \ \ \ (7)$

for all ${s>1}$ by (1), hence for all ${s}$ (except the pole at ${s=1}$) by meromorphic continuation. Thus if ${\rho}$ is a non-trivial zero then so is ${\overline{\rho}}$. We conclude that the set of non-trivial zeroes is symmetric by reflection by both the real axis and the critical line ${\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}$. We have the following infamous conjecture:

Conjecture 4 (Riemann hypothesis) All the non-trivial zeroes of ${\zeta}$ lie on the critical line ${\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}$.

This conjecture would have many implications in analytic number theory, particularly with regard to the distribution of the primes. Of course, it is far from proven at present, but the partial results we have towards this conjecture are still sufficient to establish results such as the prime number theorem.

Return now to the original region where ${\mathrm{Re} s > 1}$. To take more advantage of the Euler product formula (3), we take complex logarithms to conclude that

$\displaystyle -\log \zeta(s) = \sum_p \log(1 - \frac{1}{p^s})$

for suitable branches of the complex logarithm, and then on taking derivatives (using for instance the generalised Cauchy integral formula and Fubini’s theorem to justify the interchange of summation and derivative) we see that

$\displaystyle -\frac{\zeta'(s)}{\zeta(s)} = \sum_p \frac{\ln p/p^s}{1 - \frac{1}{p^s}}.$

From the geometric series formula we have

$\displaystyle \frac{\ln p/p^s}{1 - \frac{1}{p^s}} = \sum_{j=1}^\infty \frac{\ln p}{p^{js}}$

and so (by another application of Fubini’s theorem) we have the identity

$\displaystyle -\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}, \ \ \ \ \ (8)$

for ${\mathrm{Re} s > 1}$, where the von Mangoldt function ${\Lambda(n)}$ is defined to equal ${\Lambda(n) = \ln p}$ whenever ${n = p^j}$ is a power ${p^j}$ of a prime ${p}$ for some ${j=1,2,\dots}$, and ${\Lambda(n)=0}$ otherwise. The contribution of the higher prime powers ${p^2, p^3, \dots}$ is negligible in practice, and as a first approximation one can think of the von Mangoldt function as the indicator function of the primes, weighted by the logarithm function.

The series ${\sum_{n=1}^\infty \frac{1}{n^s}}$ and ${\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}}$ that show up in the above formulae are examples of Dirichlet series, which are a convenient device to transform various sequences of arithmetic interest into holomorphic or meromorphic functions. Here are some more examples:

Exercise 5 (Standard Dirichlet series) Let ${s}$ be a complex number with ${\mathrm{Re} s > 1}$.
• (i) Show that ${-\zeta'(s) = \sum_{n=1}^\infty \frac{\ln n}{n^s}}$.
• (ii) Show that ${\zeta^2(s) = \sum_{n=1}^\infty \frac{\tau(n)}{n^s}}$, where ${\tau(n) := \sum_{d|n} 1}$ is the divisor function of ${n}$ (the number of divisors of ${n}$).
• (iii) Show that ${\frac{1}{\zeta(s)} = \sum_{n=1}^\infty \frac{\mu(n)}{n^s}}$, where ${\mu(n)}$ is the Möbius function, defined to equal ${(-1)^k}$ when ${n}$ is the product of ${k}$ distinct primes for some ${k \geq 0}$, and ${0}$ otherwise.
• (iv) Show that ${\frac{\zeta(2s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\lambda(n)}{n^s}}$, where ${\lambda(n)}$ is the Liouville function, defined to equal ${(-1)^k}$ when ${n}$ is the product of ${k}$ (not necessarily distinct) primes for some ${k \geq 0}$.
• (v) Show that ${\log \zeta(s) = \sum_{n=1}^\infty \frac{\Lambda(n)/\ln n}{n^s}}$, where ${\log \zeta}$ is the holomorphic branch of the logarithm that is real for ${s>1}$, and with the convention that ${\Lambda(n)/\ln n}$ vanishes for ${n=1}$.
• (vi) Use the fundamental theorem of arithmetic to show that the von Mangoldt function is the unique function ${\Lambda: {\bf N} \rightarrow {\bf R}}$ such that

$\displaystyle \ln n = \sum_{d|n} \Lambda(d)$

for every positive integer ${n}$. Use this and (i) to provide an alternate proof of the identity (8). Thus we see that (8) is really just another encoding of the fundamental theorem of arithmetic.
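The identity in part (vi) can be checked numerically before one attempts to prove it. The following Python sketch is a sanity check rather than a solution to the exercise; the function names `von_mangoldt` and `divisor_sum_of_lambda`, and the use of trial division, are assumptions made for illustration.

```python
from math import isclose, log


def von_mangoldt(n):
    """Lambda(n) = ln p if n = p^j for a prime p and some j >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is the smallest prime factor; strip it out completely.
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # no factor up to sqrt(n), so n itself is prime


def divisor_sum_of_lambda(n):
    """Sum of Lambda(d) over all positive divisors d of n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)


# Compare ln n with the divisor sum for a range of n.
ok = all(isclose(divisor_sum_of_lambda(n), log(n), abs_tol=1e-9) for n in range(1, 301))
print(ok)
```

For instance, for ${n = 12}$ the only divisors with non-zero ${\Lambda}$ are ${2, 3, 4}$, contributing ${\ln 2 + \ln 3 + \ln 2 = \ln 12}$.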

Given the appearance of the von Mangoldt function ${\Lambda}$, it is natural to reformulate the prime number theorem in terms of this function:

Theorem 6 (Prime number theorem, von Mangoldt form) One has

$\displaystyle \lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} \Lambda(n) = 1$

(or in asymptotic notation, ${\sum_{n\leq x} \Lambda(n) = x + o(x)}$ as ${x \rightarrow \infty}$).

Let us see how Theorem 6 implies Theorem 1. Firstly, for any ${x \geq 2}$, we can write

$\displaystyle \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + \sum_{j=2}^\infty \sum_{p \leq x^{1/j}} \ln p.$

The sum ${\sum_{p \leq x^{1/j}} \ln p}$ is non-zero for only ${O(\ln x)}$ values of ${j}$, and is of size ${O( x^{1/2} \ln x )}$, thus

$\displaystyle \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + O( x^{1/2} \ln^2 x ).$

Since ${x^{1/2} \ln^2 x = o(x)}$, we conclude from Theorem 6 that

$\displaystyle \sum_{p \leq x} \ln p = x + o(x)$

as ${x \rightarrow \infty}$. Next, observe from the fundamental theorem of calculus that

$\displaystyle \frac{1}{\ln p} - \frac{1}{\ln x} = \int_p^x \frac{1}{\ln^2 y} \frac{dy}{y}.$

Multiplying by ${\ln p}$ and summing over all primes ${p \leq x}$, we conclude that

$\displaystyle \pi(x) - \frac{\sum_{p \leq x} \ln p}{\ln x} = \int_2^x \sum_{p \leq y} \ln p \frac{1}{\ln^2 y} \frac{dy}{y}.$

From Theorem 6 we certainly have ${\sum_{p \leq y} \ln p = O(y)}$, thus

$\displaystyle \pi(x) - \frac{x + o(x)}{\ln x} = O( \int_2^x \frac{dy}{\ln^2 y} ).$

By splitting the integral into the ranges ${2 \leq y \leq \sqrt{x}}$ and ${\sqrt{x} < y \leq x}$ we see that the right-hand side is ${o(x/\ln x)}$, and Theorem 1 follows.
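The quantities appearing in this argument can be tabulated for moderate ${x}$. The sketch below is illustrative only: the names `theta` and `psi` are our own labels for ${\sum_{p \leq x} \ln p}$ and ${\sum_{n \leq x} \Lambda(n)}$ respectively, and the sample values of ${x}$ are arbitrary. It computes both sums together with ${\pi(x)}$, so that one can watch ${\psi(x)/x}$ and ${\pi(x) \ln x / x}$ drift towards ${1}$ and see that the prime-power contribution ${\psi - \theta}$ is small.

```python
from math import log


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n (for n >= 2)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def chebyshev(x):
    """Return (pi(x), theta(x), psi(x)) for an integer x >= 2."""
    pi_x, theta, psi = 0, 0.0, 0.0
    for p in primes_up_to(x):
        pi_x += 1
        theta += log(p)
        q = p
        while q <= x:  # each prime power p^j <= x contributes ln p to psi
            psi += log(p)
            q *= p
    return pi_x, theta, psi


for x in (10**3, 10**4, 10**5):
    pi_x, theta, psi = chebyshev(x)
    print(x, psi / x, pi_x * log(x) / x, psi - theta)
```

The ratio ${\pi(x) \ln x / x}$ converges noticeably more slowly than ${\psi(x)/x}$, which is one practical reason for preferring the von Mangoldt formulation.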

Exercise 7 Show that Theorem 1 conversely implies Theorem 6.

The alternate form (8) of the Euler product identity connects the primes (represented here via proxy by the von Mangoldt function) with the logarithmic derivative of the zeta function, and can be used as a starting point for describing further relationships between ${\zeta}$ and the primes. Most famously, we shall see later in these notes that it leads to the remarkably precise Riemann-von Mangoldt explicit formula:

Theorem 8 (Riemann-von Mangoldt explicit formula) For any non-integer ${x > 1}$, we have

$\displaystyle \sum_{n \leq x} \Lambda(n) = x - \lim_{T \rightarrow \infty} \sum_{\rho: |\hbox{Im}(\rho)| \leq T} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2} \ln( 1 - x^{-2} )$

where ${\rho}$ ranges over the non-trivial zeroes of ${\zeta}$ with imaginary part in ${[-T,T]}$. Furthermore, the convergence of the limit is locally uniform in ${x}$.

Actually, it turns out that this formula is in some sense too precise; in applications it is often more convenient to work with smoothed variants of this formula in which the sum on the left-hand side is smoothed out, but the contribution of zeroes with large imaginary part is damped; see Exercise 22. Nevertheless, this formula clearly illustrates how the non-trivial zeroes ${\rho}$ of the zeta function influence the primes. Indeed, if one formally differentiates the above formula in ${x}$, one is led to the (quite nonrigorous) approximation

$\displaystyle \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (9)$

or (writing ${\rho = \sigma+i\gamma}$)

$\displaystyle \Lambda(n) \approx 1 - \sum_{\sigma+i\gamma} \frac{n^{i\gamma}}{n^{1-\sigma}}.$

Thus we see that each zero ${\rho = \sigma + i\gamma}$ induces an oscillation in the von Mangoldt function, with ${\gamma}$ controlling the frequency of the oscillation and ${\sigma}$ the rate at which the oscillation dies out as ${n \rightarrow \infty}$. This relationship is sometimes known informally as “the music of the primes”.

Comparing Theorem 8 with Theorem 6, it is natural to suspect that the key step in the proof of the latter is to establish the following slight but important extension of Theorem 3(ii), which can be viewed as a very small step towards the Riemann hypothesis:

Theorem 9 (Slight enlargement of zero-free region) There are no zeroes of ${\zeta}$ on the line ${\{ 1+it: t \in {\bf R} \}}$.

It is not quite immediate to see how Theorem 6 follows from Theorem 8 and Theorem 9, but we will demonstrate it below the fold.

Although Theorem 9 only seems like a slight improvement of Theorem 3(ii), proving it is surprisingly non-trivial. The basic idea is the following: if there was a zero at ${1+it}$, then there would also be a different zero at ${1-it}$ (note ${t}$ cannot vanish due to the pole at ${s=1}$), and then the approximation (9) becomes

$\displaystyle \Lambda(n) \approx 1 - n^{it} - n^{-it} + \dots = 1 - 2 \cos(t \log n) + \dots.$

But the expression ${1 - 2 \cos(t \log n)}$ can be negative for large regions of the variable ${n}$, whereas ${\Lambda(n)}$ is always non-negative. This conflict eventually leads to a contradiction, but it is not immediately obvious how to make this argument rigorous. We will present here the classical approach to doing so using a trigonometric identity of Mertens.

In fact, Theorem 9 is basically equivalent to the prime number theorem:

Exercise 10 For the purposes of this exercise, assume Theorem 6, but do not assume Theorem 9. For any non-zero real ${t}$, show that

$\displaystyle -\frac{\zeta'(\sigma+it)}{\zeta(\sigma+it)} = o( \frac{1}{\sigma-1})$

as ${\sigma \rightarrow 1^+}$, where ${o( \frac{1}{\sigma-1})}$ denotes a quantity that goes to zero as ${\sigma \rightarrow 1^+}$ after being multiplied by ${\sigma-1}$. Use this to derive Theorem 9.

This equivalence can help explain why the prime number theorem is remarkably non-trivial to prove, and why the Riemann zeta function has to be either explicitly or implicitly involved in the proof.

This post is only intended as the briefest of introductions to complex-analytic methods in analytic number theory; also, we have not chosen the shortest route to the prime number theorem, electing instead to travel in directions that particularly showcase the complex-analytic results introduced in this course. For some further discussion see this previous set of lecture notes, particularly Notes 2 and Supplement 3 (with much of the material in this post drawn from the latter).

[The following statement is signed by several mathematicians at Stanford and MIT in support of one of their recently admitted graduate students, and I am happy to post it here on my blog. -T]

We were saddened and horrified to learn that Ilya Dumanski, a brilliant young mathematician who has been admitted to our graduate programs at Stanford and MIT, has been imprisoned in Russia, along with several other mathematicians, for participation in a peaceful demonstration. Our thoughts are with them. We urge their rapid release, and failing that, that they be kept in humane conditions. A petition in their support has been started at

https://www.ipetitions.com/petition/a-call-for-immediate-release-of-arrested-students/

Signed,

Roman Bezrukavnikov (MIT)
Alexei Borodin (MIT)
Daniel Bump (Stanford)
Sourav Chatterjee (Stanford)
Otis Chodosh (Stanford)
Ralph Cohen (Stanford)
Henry Cohn (MIT)
Joern Dunkel (MIT)
Pavel Etingof (MIT)
Jacob Fox (Stanford)
Michel Goemans (MIT)
Eleny Ionel (Stanford)
Steven Kerckhoff (Stanford)
Jonathan Luk (Stanford)
Eugenia Malinnikova (Stanford)
Davesh Maulik (MIT)
Rafe Mazzeo (Stanford)
Haynes Miller (MIT)
Ankur Moitra (MIT)
Elchanan Mossel (MIT)
Tomasz Mrowka (MIT)
Bjorn Poonen (MIT)
Alex Postnikov (MIT)
Lenya Ryzhik (Stanford)
Paul Seidel (MIT)
Mike Sipser (MIT)
Kannan Soundararajan (Stanford)
Gigliola Staffilani (MIT)
Nike Sun (MIT)
Richard Taylor (Stanford)
Ravi Vakil (Stanford)
Andras Vasy (Stanford)
Jan Vondrak (Stanford)
Brian White (Stanford)
Zhiwei Yun (MIT)

Previous set of notes: Notes 2. Next set of notes: Notes 4.

On the real line, the quintessential examples of a periodic function are the (normalised) sine and cosine functions ${\sin(2\pi x)}$, ${\cos(2\pi x)}$, which are ${1}$-periodic in the sense that

$\displaystyle \sin(2\pi(x+1)) = \sin(2\pi x); \quad \cos(2\pi (x+1)) = \cos(2\pi x).$

By taking various polynomial combinations of ${\sin(2\pi x)}$ and ${\cos(2\pi x)}$ we obtain more general trigonometric polynomials that are ${1}$-periodic; and the theory of Fourier series tells us that all other ${1}$-periodic functions (with reasonable integrability conditions) can be approximated in various senses by such polynomial combinations. Using Euler’s identity, one can use ${e^{2\pi ix}}$ and ${e^{-2\pi ix}}$ in place of ${\sin(2\pi x)}$ and ${\cos(2\pi x)}$ as the basic generating functions here, provided of course one is willing to use complex coefficients instead of real ones. Of course, by rescaling one can also make similar statements for other periods than ${1}$. ${1}$-periodic functions ${f: {\bf R} \rightarrow {\bf C}}$ can also be identified (by abuse of notation) with functions ${f: {\bf R}/{\bf Z} \rightarrow {\bf C}}$ on the quotient space ${{\bf R}/{\bf Z}}$ (known as the additive ${1}$-torus or additive unit circle), or with functions ${f: [0,1] \rightarrow {\bf C}}$ on the fundamental domain (up to boundary) ${[0,1]}$ of that quotient space with the periodic boundary condition ${f(0)=f(1)}$. The map ${x \mapsto (\cos(2\pi x), \sin(2\pi x))}$ also identifies the additive unit circle ${{\bf R}/{\bf Z}}$ with the geometric unit circle ${S^1 = \{ (x,y) \in {\bf R}^2: x^2+y^2=1\} \subset {\bf R}^2}$, thanks in large part to the fundamental trigonometric identity ${\cos^2 x + \sin^2 x = 1}$; this can also be identified with the multiplicative unit circle ${S^1 = \{ z \in {\bf C}: |z|=1 \}}$. (Usually by abuse of notation we refer to all of these three sets simultaneously as the “unit circle”.) Trigonometric polynomials on the additive unit circle then correspond to ordinary polynomials of the real coordinates ${x,y}$ of the geometric unit circle, or Laurent polynomials of the complex variable ${z}$.

What about periodic functions on the complex plane? We can start with singly periodic functions ${f: {\bf C} \rightarrow {\bf C}}$ which obey a periodicity relationship ${f(z+\omega)=f(z)}$ for all ${z}$ in the domain and some period ${\omega \in {\bf C} \backslash \{0\}}$; such functions can also be viewed as functions on the “additive cylinder” ${\omega {\bf Z} \backslash {\bf C}}$ (or equivalently ${{\bf C} / \omega {\bf Z}}$). We can rescale ${\omega=1}$ as before. For holomorphic functions, we have the following characterisations:

Proposition 1 (Description of singly periodic holomorphic functions)
In both cases, the coefficients ${a_n}$ can be recovered from ${f}$ by the Fourier inversion formula

$\displaystyle a_n = \int_{\gamma_{z_0 \rightarrow z_0+1}} f(z) e^{-2\pi i nz}\ dz \ \ \ \ \ (5)$

for any ${z_0}$ in ${{\bf C}}$ (in case (i)) or ${{\bf H}}$ (in case (ii)).

Proof: If ${f: {\bf C} \rightarrow {\bf C}}$ is ${1}$-periodic, then it can be expressed as ${f(z) = F(q) = F(e^{2\pi i z})}$ for some function ${F: {\bf C} \backslash \{0\} \rightarrow {\bf C}}$ on the “multiplicative cylinder” ${{\bf C} \backslash \{0\}}$, since the fibres of the map ${z \mapsto e^{2\pi i z}}$ are cosets of the integers ${{\bf Z}}$, on which ${f}$ is constant by hypothesis. As the map ${z \mapsto e^{2\pi i z}}$ is a covering map from ${{\bf C}}$ to ${{\bf C} \backslash \{0\}}$, we see that ${F}$ will be holomorphic if and only if ${f}$ is. Thus ${F}$ must have a Laurent series expansion ${F(q) = \sum_{n=-\infty}^\infty a_n q^n}$ with coefficients ${a_n}$ obeying (2), which gives (1), and the inversion formula (5) follows from the usual contour integration formula for Laurent series coefficients. The converse direction to (i) also follows by reversing the above arguments.

For part (ii), we observe that the map ${z \mapsto e^{2\pi i z}}$ is also a covering map from ${{\bf H}}$ to the punctured disk ${D(0,1) \backslash \{0\}}$, so we can argue as before except that now ${F}$ is a bounded holomorphic function on the punctured disk. By the Riemann singularity removal theorem (Exercise 35 of 246A Notes 3) ${F}$ extends to be holomorphic on all of ${D(0,1)}$, and thus has a Taylor expansion ${F(q) = \sum_{n=0}^\infty a_n q^n}$ for some coefficients ${a_n}$ obeying (4). The argument now proceeds as with part (i). $\Box$
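The inversion formula (5), and in particular its independence of the base point ${z_0}$, can be illustrated numerically. In the Python sketch below, the test function `f`, its coefficient table `coeffs`, the base points, and the helper name `fourier_coefficient` are all hypothetical choices; the contour integral over the segment from ${z_0}$ to ${z_0+1}$ is approximated by an equally spaced Riemann sum, which for a trigonometric polynomial of low degree is exact up to rounding error.

```python
import cmath


def fourier_coefficient(f, n, z0=0j, samples=64):
    """Riemann-sum approximation of the integral of f(z) e^{-2 pi i n z} over [z0, z0 + 1]."""
    total = 0j
    for k in range(samples):
        z = z0 + k / samples
        total += f(z) * cmath.exp(-2j * cmath.pi * n * z)
    return total / samples


# Hypothetical 1-periodic entire function with three non-zero coefficients a_n.
coeffs = {-1: 2.0, 0: 1.0, 3: 0.5j}


def f(z):
    return sum(a * cmath.exp(2j * cmath.pi * n * z) for n, a in coeffs.items())


# Recover a_{-1}, a_0, a_3 (and the vanishing coefficient a_1) from two base points.
for z0 in (0j, 0.25 + 0.1j):
    print(z0, [fourier_coefficient(f, n, z0) for n in (-1, 0, 3, 1)])
```

Shifting ${z_0}$ off the real axis changes the individual values of the integrand considerably, but the recovered coefficients agree to within rounding error, as the contour shifting argument predicts.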

The additive cylinder ${{\bf Z} \backslash {\bf C}}$ and the multiplicative cylinder ${{\bf C} \backslash \{0\}}$ can both be identified (on the level of smooth manifolds, at least) with the geometric cylinder ${\{ (x,y,z) \in {\bf R}^3: x^2+y^2=1\}}$, but we will not use this identification here.

Now let us turn attention to doubly periodic functions of a complex variable ${z}$, that is to say functions ${f}$ that obey two periodicity relations

$\displaystyle f(z+\omega_1) = f(z); \quad f(z+\omega_2) = f(z)$

for all ${z \in {\bf C}}$ and some periods ${\omega_1,\omega_2 \in {\bf C}}$, which to avoid degeneracies we will assume to be linearly independent over the reals (thus ${\omega_1,\omega_2}$ are non-zero and the ratio ${\omega_2/\omega_1}$ is not real). One can rescale ${\omega_1,\omega_2}$ by a common scaling factor ${\lambda \in {\bf C} \backslash \{0\}}$ to normalise either ${\omega_1=1}$ or ${\omega_2=1}$, but one of course cannot simultaneously normalise both parameters in this fashion. As in the singly periodic case, such functions can also be identified with functions on the additive ${2}$-torus ${\Lambda \backslash {\bf C}}$, where ${\Lambda}$ is the lattice ${\Lambda := \omega_1 {\bf Z} + \omega_2 {\bf Z}}$, or with functions ${f}$ on the solid parallelogram bounded by the contour ${\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}}$ (a fundamental domain up to boundary for that torus), obeying the boundary periodicity conditions

$\displaystyle f(z+\omega_1) = f(z)$

for ${z}$ in the edge ${\gamma_{\omega_2 \rightarrow 0}}$, and

$\displaystyle f(z+\omega_2) = f(z)$

for ${z}$ in the edge ${\gamma_{0 \rightarrow \omega_1}}$.
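
To make the identification with the fundamental parallelogram concrete, the following Python sketch reduces an arbitrary point ${z}$ modulo ${\Lambda}$. It writes ${z = s \omega_1 + t \omega_2}$ with real ${s,t}$ (possible because the periods are linearly independent over the reals) and replaces ${s,t}$ by their fractional parts. The function names are hypothetical and the sample periods are arbitrary.

```python
import math


def lattice_coordinates(z, w1, w2):
    """Return real (s, t) with z = s*w1 + t*w2, assuming w1, w2 are R-independent."""
    z, w1, w2 = complex(z), complex(w1), complex(w2)
    det = w1.real * w2.imag - w1.imag * w2.real
    if det == 0:
        raise ValueError("periods must be linearly independent over the reals")
    # Cramer's rule for the 2x2 real linear system.
    s = (z.real * w2.imag - z.imag * w2.real) / det
    t = (w1.real * z.imag - w1.imag * z.real) / det
    return s, t


def reduce_mod_lattice(z, w1, w2):
    """Representative of z + Lambda in the half-open parallelogram spanned by w1, w2."""
    s, t = lattice_coordinates(z, w1, w2)
    s -= math.floor(s)
    t -= math.floor(t)
    return s * complex(w1) + t * complex(w2)


if __name__ == "__main__":
    # 2.75 + 3i = 1.25*w1 + 3*w2 for these periods, so it reduces to 0.25.
    print(reduce_mod_lattice(2.75 + 3j, 1, 0.5 + 1j))
```

A doubly periodic function ${f}$ takes the same value at ${z}$ and at the reduced point, which is why it is determined by its values on the parallelogram.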

Within the world of holomorphic functions, the collection of doubly periodic functions is boring:

Proposition 2 Let ${f: {\bf C} \rightarrow {\bf C}}$ be an entire doubly periodic function (with periods ${\omega_1,\omega_2}$ linearly independent over ${{\bf R}}$). Then ${f}$ is constant.

In the language of Riemann surfaces, this proposition asserts that the torus ${\Lambda \backslash {\bf C}}$ is a non-hyperbolic Riemann surface; it cannot be holomorphically mapped non-trivially into a bounded subset of the complex plane.

Proof: The fundamental domain (up to boundary) enclosed by ${\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}}$ is compact, hence ${f}$ is bounded on this domain, hence bounded on all of ${{\bf C}}$ by double periodicity. The claim now follows from Liouville’s theorem. (One could alternatively have argued here using the compactness of the torus ${(\omega_1 {\bf Z} + \omega_2 {\bf Z}) \backslash {\bf C}}$.) $\Box$

To obtain more interesting examples of doubly periodic functions, one must therefore turn to the world of meromorphic functions – or equivalently, holomorphic functions into the Riemann sphere ${{\bf C} \cup \{\infty\}}$. As it turns out, a particularly fundamental example of such a function is the Weierstrass elliptic function

$\displaystyle \wp(z) := \frac{1}{z^2} + \sum_{z_0 \in \Lambda \backslash \{0\}} \left( \frac{1}{(z-z_0)^2} - \frac{1}{z_0^2} \right) \ \ \ \ \ (6)$

which plays a role in doubly periodic functions analogous to the role of ${x \mapsto \cos(2\pi x)}$ for ${1}$-periodic real functions. This function will have a double pole at the origin ${0}$, and more generally at all other points on the lattice ${\Lambda}$, but no other poles. The derivative

$\displaystyle \wp'(z) = -2 \sum_{z_0 \in \Lambda} \frac{1}{(z-z_0)^3} \ \ \ \ \ (7)$

of the Weierstrass function is another doubly periodic meromorphic function, now with a triple pole at every point of ${\Lambda}$, and plays a role analogous to ${x \mapsto \sin(2\pi x)}$. Remarkably, all the other doubly periodic meromorphic functions with these periods will turn out to be rational combinations of ${\wp}$ and ${\wp'}$; furthermore, in analogy with the identity ${\cos^2 x+ \sin^2 x = 1}$, one has an identity of the form

$\displaystyle \wp'(z)^2 = 4 \wp(z)^3 - g_2 \wp(z) - g_3 \ \ \ \ \ (8)$

for all ${z \in {\bf C}}$ (avoiding poles) and some complex numbers ${g_2,g_3}$ that depend on the lattice ${\Lambda}$. Indeed, much as the map ${x \mapsto (\cos 2\pi x, \sin 2\pi x)}$ creates a diffeomorphism between the additive unit circle ${{\bf R}/{\bf Z}}$ and the geometric unit circle ${\{ (x,y) \in{\bf R}^2: x^2+y^2=1\}}$, the map ${z \mapsto (\wp(z), \wp'(z))}$ turns out to be a complex diffeomorphism between the torus ${(\omega_1 {\bf Z} + \omega_2 {\bf Z}) \backslash {\bf C}}$ and the elliptic curve

$\displaystyle \{ (z, w) \in {\bf C}^2: w^2 = 4z^3 - g_2 z - g_3 \} \cup \{\infty\}$

with the convention that ${(\wp,\wp')}$ maps the origin ${\omega_1 {\bf Z} + \omega_2 {\bf Z}}$ of the torus to the point ${\infty}$ at infinity. (Indeed, one can view elliptic curves as “multiplicative tori”, and both the additive and multiplicative tori can be identified as smooth manifolds with the more familiar geometric torus, but we will not use such an identification here.) This fundamental identification between elliptic curves and tori motivates many of the further remarkable properties of elliptic curves; for instance, the fact that tori are obviously abelian groups gives rise to an abelian group law on elliptic curves (and this law can be interpreted as an analogue of the trigonometric sum identities for ${\wp, \wp'}$). The description of the various meromorphic functions on the torus also helps motivate the more general Riemann-Roch theorem that is a fundamental law governing meromorphic functions on other compact Riemann surfaces (and is discussed further in these 246C notes).

So far we have focused on studying a single torus ${\Lambda \backslash {\bf C}}$. However, another important mathematical object of study is the space of all such tori, modulo isomorphism; this is a basic example of a moduli space, known as the (classical, level one) modular curve ${X_0(1)}$. This curve can be described in a number of ways. On the one hand, it can be viewed as the upper half-plane ${{\bf H} = \{ z: \mathrm{Im}(z) > 0 \}}$ quotiented out by the discrete group ${SL_2({\bf Z})}$; on the other hand, by using the ${j}$-invariant, it can be identified with the complex plane ${{\bf C}}$; alternatively, one can compactify the modular curve and identify this compactification with the Riemann sphere ${{\bf C} \cup \{\infty\}}$. (This identification, by the way, produces a very short proof of the little and great Picard theorems, which we proved in 246A Notes 4.)

Functions on the modular curve (such as the ${j}$-invariant) can be viewed as ${SL_2({\bf Z})}$-invariant functions on ${{\bf H}}$, and include the important class of modular functions; they naturally generalise to the larger class of (weakly) modular forms, which are functions on ${{\bf H}}$ that transform in a very specific way under the ${SL_2({\bf Z})}$-action, and which are ubiquitous throughout mathematics, particularly in number theory. Basic examples of modular forms include the Eisenstein series, which are also the Laurent coefficients of the Weierstrass elliptic functions ${\wp}$. More number theoretic examples of modular forms include (suitable powers of) theta functions ${\theta}$, and the modular discriminant ${\Delta}$. Modular forms are ${1}$-periodic functions on the half-plane, and hence by Proposition 1 come with Fourier coefficients ${a_n}$. These coefficients often turn out to encode a surprising amount of number-theoretic information; a dramatic example of this is the famous modularity theorem, a special case of which was used, amongst other things, to establish Fermat’s last theorem. Modular forms can be generalised to discrete groups other than ${SL_2({\bf Z})}$ (such as congruence groups) and to domains other than the half-plane ${{\bf H}}$, leading to the important larger class of automorphic forms, which are of major importance in number theory and representation theory, but which are well outside the scope of this course to discuss.
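
Returning to the Weierstrass function for a moment: the definition (6), the derivative (7) and the identity (8) can be sanity-checked numerically. The Python sketch below truncates the lattice sums to a symmetric box of lattice points and uses the standard formulas ${g_2 = 60 \sum_{z_0 \in \Lambda \backslash \{0\}} z_0^{-4}}$ and ${g_3 = 140 \sum_{z_0 \in \Lambda \backslash \{0\}} z_0^{-6}}$, which have not been derived in the text above and are taken here as an assumption. The function names, the sample periods, the test point and the truncation size are all illustrative choices, and the truncated sums only approximate ${\wp}$ and ${\wp'}$.

```python
def lattice_points(w1, w2, size):
    """Nonzero lattice points m*w1 + n*w2 with |m|, |n| <= size (a symmetric truncation)."""
    return [m * w1 + n * w2
            for m in range(-size, size + 1)
            for n in range(-size, size + 1)
            if (m, n) != (0, 0)]


def weierstrass_p(z, points):
    """Truncated version of the sum (6)."""
    return 1 / z**2 + sum(1 / (z - z0)**2 - 1 / z0**2 for z0 in points)


def weierstrass_p_prime(z, points):
    """Truncated version of the sum (7); the z0 = 0 term is written separately."""
    return -2 / z**3 - 2 * sum(1 / (z - z0)**3 for z0 in points)


def invariants(points):
    """Truncated lattice invariants g2, g3 (standard formulas, assumed here)."""
    g2 = 60 * sum(z0**-4 for z0 in points)
    g3 = 140 * sum(z0**-6 for z0 in points)
    return g2, g3


if __name__ == "__main__":
    points = lattice_points(1, 0.3 + 1.1j, 60)
    z = 0.31 + 0.17j
    g2, g3 = invariants(points)
    p, dp = weierstrass_p(z, points), weierstrass_p_prime(z, points)
    print("left side of (8): ", dp**2)
    print("right side of (8):", 4 * p**3 - g2 * p - g3)
```

One should expect the two sides to agree only up to the truncation error, which shrinks as `size` grows; the symmetric box is used so that the contributions of ${z_0}$ and ${-z_0}$ partially cancel.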