Asgar Jamneshan and myself have just uploaded to the arXiv our preprint “The inverse theorem for the ${U^3}$ Gowers uniformity norm on arbitrary finite abelian groups: Fourier-analytic and ergodic approaches”. This paper, which is a companion to another recent paper of ourselves and Or Shalom, studies the inverse theory for the third Gowers uniformity norm

$\displaystyle \| f \|_{U^3(G)}^8 = {\bf E}_{h_1,h_2,h_3,x \in G} \Delta_{h_1} \Delta_{h_2} \Delta_{h_3} f(x)$

on an arbitrary finite abelian group ${G}$, where ${\Delta_h f(x) := f(x+h) \overline{f(x)}}$ is the multiplicative derivative. Our main result is as follows:

Theorem 1 (Inverse theorem for ${U^3(G)}$) Let ${G}$ be a finite abelian group, and let ${f: G \rightarrow {\bf C}}$ be a ${1}$-bounded function with ${\|f\|_{U^3(G)} \geq \eta}$ for some ${0 < \eta \leq 1/2}$. Then:
• (i) (Correlation with locally quadratic phase) There exists a regular Bohr set ${B(S,\rho) \subset G}$ with ${|S| \ll \eta^{-O(1)}}$ and ${\exp(-\eta^{-O(1)}) \ll \rho \leq 1/2}$, a locally quadratic function ${\phi: B(S,\rho) \rightarrow {\bf R}/{\bf Z}}$, and a function ${\xi: G \rightarrow \hat G}$ such that

$\displaystyle {\bf E}_{x \in G} |{\bf E}_{h \in B(S,\rho)} f(x+h) e(-\phi(h)-\xi(x) \cdot h)| \gg \eta^{O(1)}.$

• (ii) (Correlation with nilsequence) There exists an explicit degree two filtered nilmanifold ${H/\Lambda}$ of dimension ${O(\eta^{-O(1)})}$, a polynomial map ${g: G \rightarrow H/\Lambda}$, and a Lipschitz function ${F: H/\Lambda \rightarrow {\bf C}}$ of constant ${O(\exp(\eta^{-O(1)}))}$ such that

$\displaystyle |{\bf E}_{x \in G} f(x) \overline{F}(g(x))| \gg \exp(-\eta^{-O(1)}).$
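To get a feel for the norm appearing in Theorem 1, here is a small brute-force sketch (not taken from the paper) in the model case ${G = {\bf Z}/N{\bf Z}}$. It evaluates the average ${{\bf E}_{h_1,h_2,h_3,x} \Delta_{h_1} \Delta_{h_2} \Delta_{h_3} f(x)}$ directly from the definition. The function names and the choice ${N=6}$ are illustrative assumptions only. A globally quadratic phase such as ${e(x^2/N)}$ has all third derivatives equal to ${1}$, so its ${U^3}$ norm is exactly ${1}$, whereas a point mass has a tiny ${U^3}$ norm.

```python
import cmath
from itertools import product


def e(t):
    """The standard character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


def derivative(f, h, N):
    """Multiplicative derivative (Delta_h f)(x) = f(x+h) * conj(f(x)) on Z/NZ."""
    return [f[(x + h) % N] * f[x].conjugate() for x in range(N)]


def u3_norm_8th_power(f):
    """Brute-force value of E_{h1,h2,h3,x} Delta_{h1} Delta_{h2} Delta_{h3} f(x)."""
    N = len(f)
    total = 0
    for h1, h2, h3 in product(range(N), repeat=3):
        g = derivative(derivative(derivative(f, h3, N), h2, N), h1, N)
        total += sum(g)
    # The average is real and non-negative; drop the rounding-level imaginary part.
    return (total / N**4).real


N = 6  # even order chosen on purpose (illustrative assumption)
quadratic_phase = [e(x * x / N) for x in range(N)]
point_mass = [1 + 0j] + [0j] * (N - 1)

print(u3_norm_8th_power(quadratic_phase))  # equals 1 up to rounding error
print(u3_norm_8th_power(point_mass))       # equals 1/N^4 up to rounding error
```

The loop costs ${O(N^4)}$ operations, so this is only meant for very small ${N}$.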

Such a theorem was proven by Ben Green and myself in the case when ${|G|}$ was odd, and by Samorodnitsky in the ${2}$-torsion case ${G = {\bf F}_2^n}$. In all cases one uses the “higher order Fourier analysis” techniques introduced by Gowers. After some now-standard manipulations (using for instance what is now known as the Balog-Szemerédi-Gowers lemma), one arrives (for arbitrary ${G}$) at an estimate that is roughly of the form

$\displaystyle |{\bf E}_{x \in G} {\bf E}_{h,k \in B(S,\rho)} f(x+h+k) b(x,k) b(x,h) e(-B(h,k))| \gg \eta^{O(1)}$

where ${b}$ denotes various ${1}$-bounded functions whose exact values are not too important, and ${B: B(S,\rho) \times B(S,\rho) \rightarrow {\bf R}/{\bf Z}}$ is a symmetric locally bilinear form. The idea is then to “integrate” this form by expressing it in the form

$\displaystyle B(h,k) = \phi(h+k) - \phi(h) - \phi(k) \ \ \ \ \ (1)$

for some locally quadratic ${\phi: B(S,\rho) \rightarrow {\bf R}/{\bf Z}}$; this then allows us to write the above correlation as

$\displaystyle |{\bf E}_{x \in G} {\bf E}_{h,k \in B(S,\rho)} f(x+h+k) e(-\phi(h+k)) b(x,k) b(x,h)| \gg \eta^{O(1)}$

(after adjusting the ${b}$ functions suitably), and one can now conclude part (i) of the above theorem using some linear Fourier analysis. Part (ii) follows by encoding locally quadratic phase functions as nilsequences; for this we adapt an algebraic construction of Manners.

So the key step is to obtain a representation of the form (1), possibly after shrinking the Bohr set ${B(S,\rho)}$ a little if needed. This has been done in the literature in two ways:

• When ${|G|}$ is odd, one has the ability to divide by ${2}$, and on the set ${2 \cdot B(S,\frac{\rho}{10}) = \{ 2x: x \in B(S,\frac{\rho}{10})\}}$ one can establish (1) with ${\phi(h) := B(\frac{1}{2} h, h)}$. (This is similar to how in single variable calculus the function ${x \mapsto \frac{1}{2} x^2}$ is a function whose second derivative is equal to ${1}$.)
• When ${G = {\bf F}_2^n}$, then after a change of basis one can take the Bohr set ${B(S,\rho)}$ to be ${{\bf F}_2^m}$ for some ${m}$, and the bilinear form can be written in coordinates as

$\displaystyle B(h,k) = \sum_{1 \leq i,j \leq m} a_{ij} h_i k_j / 2 \hbox{ mod } 1$

for some ${a_{ij} \in {\bf F}_2}$ with ${a_{ij}=a_{ji}}$. The diagonal terms ${a_{ii}}$ cause a problem, but by subtracting off the rank one form ${(\sum_{i=1}^m a_{ii} h_i) (\sum_{i=1}^m a_{ii} k_i) / 2}$ one can write

$\displaystyle B(h,k) = \sum_{1 \leq i,j \leq m} b_{ij} h_i k_j / 2 \hbox{ mod } 1$

on the orthogonal complement of ${(a_{11},\dots,a_{mm})}$ for some coefficients ${b_{ij}=b_{ji}}$ which now vanish on the diagonal: ${b_{ii}=0}$. One can now obtain (1) on this complement by taking

$\displaystyle \phi(h) := \sum_{1 \leq i < j \leq m} b_{ij} h_i h_j / 2 \hbox{ mod } 1.$
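The ${{\bf F}_2^m}$ recipe can be checked exhaustively for small ${m}$. The sketch below is an illustration rather than code from the paper, and the matrices are made-up examples. It stores ${2B}$ and ${2\phi}$ as integers mod ${2}$ and tests whether ${\phi(h+k)-\phi(h)-\phi(k) = B(h,k)}$ for all ${h,k}$, with ${\phi}$ given by the off-diagonal sum ${\sum_{i<j} b_{ij} h_i h_j/2}$. It also shows that the same formula fails when a diagonal coefficient is present, which is the obstruction mentioned above.

```python
from itertools import product


def twice_B(b, h, k):
    """2*B(h,k) as an integer mod 2, where B(h,k) = sum_{i,j} b[i][j] h_i k_j / 2 mod 1."""
    m = len(h)
    return sum(b[i][j] * h[i] * k[j] for i in range(m) for j in range(m)) % 2


def twice_phi(b, h):
    """2*phi(h) as an integer mod 2, where phi(h) = sum_{i<j} b[i][j] h_i h_j / 2 mod 1."""
    m = len(h)
    return sum(b[i][j] * h[i] * h[j] for i in range(m) for j in range(i + 1, m)) % 2


def integrates(b):
    """Check phi(h+k) - phi(h) - phi(k) = B(h,k) for every h, k in F_2^m."""
    m = len(b)
    vectors = list(product((0, 1), repeat=m))
    for h in vectors:
        for k in vectors:
            s = tuple((x + y) % 2 for x, y in zip(h, k))  # h + k in F_2^m
            lhs = (twice_phi(b, s) - twice_phi(b, h) - twice_phi(b, k)) % 2
            if lhs != twice_B(b, h, k):
                return False
    return True


# Symmetric with zero diagonal: the off-diagonal formula integrates B.
print(integrates([[0, 1, 1], [1, 0, 0], [1, 0, 0]]))
# A diagonal term b_11 = 1 breaks this particular formula.
print(integrates([[1, 0], [0, 0]]))
```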

In our paper we can now treat the case of arbitrary finite abelian groups ${G}$, by means of the following two new ingredients:

• (i) Using some geometry of numbers, we can lift the group ${G}$ to a larger (possibly infinite, but still finitely generated) abelian group ${G_S}$ with a projection map ${\pi: G_S \rightarrow G}$, and find a globally bilinear map ${\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}}$ on the latter group, such that one has a representation

$\displaystyle B(\pi(x), \pi(y)) = \tilde B(x,y) \ \ \ \ \ (2)$

of the locally bilinear form ${B}$ by the globally bilinear form ${\tilde B}$ when ${x,y}$ are close enough to the origin.
• (ii) Using an explicit construction, one can show that every globally bilinear map ${\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}}$ has a representation of the form (1) for some globally quadratic function ${\tilde \phi: G_S \rightarrow {\bf R}/{\bf Z}}$.

To illustrate (i), consider the Bohr set ${B(S,1/10) = \{ x \in {\bf Z}/N{\bf Z}: \|x/N\|_{{\bf R}/{\bf Z}} < 1/10\}}$ in ${G = {\bf Z}/N{\bf Z}}$ (where ${\|\|_{{\bf R}/{\bf Z}}}$ denotes the distance to the nearest integer), and consider a locally bilinear form ${B: B(S,1/10) \times B(S,1/10) \rightarrow {\bf R}/{\bf Z}}$ of the form ${B(x,y) = \alpha x y \hbox{ mod } 1}$ for some real number ${\alpha}$ and all integers ${x,y \in (-N/10,N/10)}$ (which we identify with elements of ${G}$). For generic ${\alpha}$, this form cannot be extended to a globally bilinear form on ${G}$; however if one lifts ${G}$ to the finitely generated abelian group

$\displaystyle G_S := \{ (x,\theta) \in {\bf Z}/N{\bf Z} \times {\bf R}: \theta = x/N \hbox{ mod } 1 \}$

(with projection map ${\pi: (x,\theta) \mapsto x}$) and introduces the globally bilinear form ${\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}}$ by the formula

$\displaystyle \tilde B((x,\theta),(y,\sigma)) = N^2 \alpha \theta \sigma \hbox{ mod } 1$

then one has (2) when ${\theta,\sigma}$ lie in the interval ${(-1/10,1/10)}$. A similar construction works for higher rank Bohr sets.
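The rank one example can be tested with exact rational arithmetic. In the sketch below the values ${N=50}$ and ${\alpha=3/17}$ are arbitrary stand-ins (a rational ${\alpha}$ is used only so that the arithmetic is exact), and the helper names are hypothetical. An element of ${G_S}$ is stored as a pair ${(x \hbox{ mod } N, \theta)}$ with ${\theta = x/N \hbox{ mod } 1}$. The code checks the representation (2) near the origin, checks bilinearity of ${\tilde B}$ on lifts that wrap around several times, and confirms that the naive formula ${\alpha x y}$ is not well defined on ${{\bf Z}/N{\bf Z}}$.

```python
from fractions import Fraction

N = 50                   # illustrative modulus (assumption)
alpha = Fraction(3, 17)  # rational stand-in for a "generic" alpha (assumption)


def lift(x, wraps=0):
    """Element (x mod N, theta) of G_S with theta = x/N + wraps, so theta = x/N mod 1."""
    return (x % N, Fraction(x, N) + wraps)


def add(u, v):
    """Group law on G_S: add both coordinates."""
    return ((u[0] + v[0]) % N, u[1] + v[1])


def B_tilde(u, v):
    """Globally bilinear form N^2 * alpha * theta * sigma mod 1."""
    return (N * N * alpha * u[1] * v[1]) % 1


def B_local(x, y):
    """Locally bilinear form alpha*x*y mod 1 for integers x, y in (-N/10, N/10)."""
    return (alpha * x * y) % 1


def representation_holds():
    """Check B(pi(u), pi(v)) = B_tilde(u, v) when theta, sigma lie in (-1/10, 1/10)."""
    small = range(-4, 5)  # integers in (-N/10, N/10) for N = 50
    return all(B_local(x, y) == B_tilde(lift(x), lift(y)) for x in small for y in small)


print(representation_holds())
# The naive formula depends on the chosen representative of x mod N:
print(B_local(1, 1), (alpha * (1 + N) * 1) % 1)
```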

To illustrate (ii), the key case turns out to be when ${G_S}$ is a cyclic group ${{\bf Z}/N{\bf Z}}$, in which case ${\tilde B}$ will take the form

$\displaystyle \tilde B(x,y) = \frac{axy}{N} \hbox{ mod } 1$

for some integer ${a}$. One can then check by direct construction that (1) will be obeyed with

$\displaystyle \tilde \phi(x) = \frac{a \binom{x}{2}}{N} - \frac{a x \binom{N}{2}}{N^2} \hbox{ mod } 1$

regardless of whether ${N}$ is even or odd. A variant of this construction also works for ${{\bf Z}}$, and the general case follows from a short calculation verifying that the claim (ii) for any two groups ${G_S, G'_S}$ implies the corresponding claim (ii) for the product ${G_S \times G'_S}$.
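The cyclic formula is easy to verify by machine. The following sketch (an illustration with hypothetical helper names, using exact fractions) checks two things for every ${a}$ and every small ${N}$, even or odd: that ${\tilde \phi}$ is well defined on ${{\bf Z}/N{\bf Z}}$, i.e. ${\tilde\phi(x+N)=\tilde\phi(x)}$, and that ${\tilde\phi(x+y)-\tilde\phi(x)-\tilde\phi(y) = axy/N \hbox{ mod } 1}$. The first check is where the linear correction term ${-ax\binom{N}{2}/N^2}$ is needed; the second uses ${\binom{x+y}{2}-\binom{x}{2}-\binom{y}{2}=xy}$.

```python
from fractions import Fraction
from math import comb


def phi_tilde(x, a, N):
    """a*C(x,2)/N - a*x*C(N,2)/N^2 mod 1, evaluated at an integer x >= 0."""
    return (Fraction(a * comb(x, 2), N) - Fraction(a * x * comb(N, 2), N * N)) % 1


def B_tilde(x, y, a, N):
    """The bilinear form a*x*y/N mod 1 on Z/NZ."""
    return Fraction(a * x * y, N) % 1


def check(a, N):
    """Verify periodicity of phi_tilde and the identity (1) on all of Z/NZ."""
    for x in range(N):
        if phi_tilde(x, a, N) != phi_tilde(x + N, a, N):
            return False  # not well defined modulo N
        for y in range(N):
            lhs = (phi_tilde(x + y, a, N) - phi_tilde(x, a, N) - phi_tilde(y, a, N)) % 1
            if lhs != B_tilde(x, y, a, N):
                return False
    return True


print(all(check(a, N) for N in range(1, 13) for a in range(N)))
```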

This concludes the Fourier-analytic proof of Theorem 1. In this paper we also give an ergodic theory proof of (a qualitative version of) Theorem 1(ii), using a correspondence principle argument adapted from this previous paper of Ziegler and myself. Basically, the idea is to randomly generate a dynamical system on the group ${G}$, by selecting an infinite number of random shifts ${g_1, g_2, \dots \in G}$, which induces an action of the infinitely generated free abelian group ${{\bf Z}^\omega = \bigcup_{n=1}^\infty {\bf Z}^n}$ on ${G}$ by the formula

$\displaystyle T^h x := x + \sum_{i=1}^\infty h_i g_i.$

Much as the law of large numbers ensures the almost sure convergence of Monte Carlo integration, one can show that this action is almost surely ergodic (after passing to a suitable Furstenberg-type limit ${X}$ where the size of ${G}$ goes to infinity), and that the dynamical Host-Kra-Gowers seminorms of that system coincide with the combinatorial Gowers norms of the original functions. One is then well placed to apply an inverse theorem for the third Host-Kra-Gowers seminorm ${U^3(X)}$ for ${{\bf Z}^\omega}$-actions, which was accomplished in the companion paper to this one. After doing so, one almost gets the desired conclusion of Theorem 1(ii), except that after undoing the application of the Furstenberg correspondence principle, the map ${g: G \rightarrow H/\Lambda}$ is merely an almost polynomial rather than a polynomial, which roughly speaking means that instead of certain derivatives of ${g}$ vanishing, they instead are merely very small outside of a small exceptional set. To conclude we need to invoke a “stability of polynomials” result, which at this level of generality was first established by Candela and Szegedy (though we also provide an independent proof here in an appendix), which roughly speaking asserts that every approximate polynomial is close in measure to an actual polynomial. (This general strategy is also employed in the Candela-Szegedy paper, though in the absence of the ergodic inverse theorem input that we rely upon here, the conclusion is weaker in that the filtered nilmanifold ${H/\Lambda}$ is replaced with a general space known as a “CFR nilspace”.)

This transference principle approach seems to work well for the higher step cases (for instance, the stability of polynomials result is known in arbitrary degree); the main difficulty is to establish a suitable higher step inverse theorem in the ergodic theory setting, which we hope to do in future research.

Asgar Jamneshan, Or Shalom, and myself have just uploaded to the arXiv our preprint “The structure of arbitrary Conze–Lesigne systems”. As the title suggests, this paper is devoted to the structural classification of Conze-Lesigne systems, which are a type of measure-preserving system that is “quadratic” or of “complexity two” in a certain technical sense, and are of importance in the theory of multiple recurrence. There are multiple ways to define such systems; here is one. Take a countable abelian group ${\Gamma}$ acting in a measure-preserving fashion on a probability space ${(X,\mu)}$, thus each group element ${\gamma \in \Gamma}$ gives rise to a measure-preserving map ${T^\gamma: X \rightarrow X}$. Define the third Gowers-Host-Kra seminorm ${\|f\|_{U^3(X)}}$ of a function ${f \in L^\infty(X)}$ via the formula

$\displaystyle \|f\|_{U^3(X)}^8 := \lim_{n \rightarrow \infty} {\bf E}_{h_1,h_2,h_3 \in \Phi_n} \int_X \prod_{\omega_1,\omega_2,\omega_3 \in \{0,1\}}$

$\displaystyle {\mathcal C}^{\omega_1+\omega_2+\omega_3} f(T^{\omega_1 h_1 + \omega_2 h_2 + \omega_3 h_3} x)\ d\mu(x)$

where ${\Phi_n}$ is a Folner sequence for ${\Gamma}$ and ${{\mathcal C}: z \mapsto \overline{z}}$ is the complex conjugation map. One can show that this limit exists and is independent of the choice of Folner sequence, and that the ${\| \|_{U^3(X)}}$ seminorm is indeed a seminorm. A Conze-Lesigne system is an ergodic measure-preserving system in which the ${U^3(X)}$ seminorm is in fact a norm, thus ${\|f\|_{U^3(X)}>0}$ whenever ${f \in L^\infty(X)}$ is non-zero. Informally, this means that when one considers a generic parallelepiped in a Conze–Lesigne system ${X}$, the location of any vertex of that parallelepiped is more or less determined by the location of the other seven vertices. These are the important systems to understand in order to study “complexity two” patterns, such as arithmetic progressions of length four. While not all systems ${X}$ are Conze-Lesigne systems, it turns out that they always have a maximal factor ${Z^2(X)}$ that is a Conze-Lesigne system, known as the Conze-Lesigne factor or the second Host-Kra-Ziegler factor of the system, and this factor controls all the complexity two recurrence properties of the system.

The analogous theory in complexity one is well understood. Here, one replaces the ${U^3(X)}$ norm by the ${U^2(X)}$ norm

$\displaystyle \|f\|_{U^2(X)}^4 := \lim_{n \rightarrow \infty} {\bf E}_{h_1,h_2 \in \Phi_n} \int_X \prod_{\omega_1,\omega_2 \in \{0,1\}} {\mathcal C}^{\omega_1+\omega_2} f(T^{\omega_1 h_1 + \omega_2 h_2} x)\ d\mu(x)$

and the ergodic systems for which ${U^2}$ is a norm are called Kronecker systems. These systems are completely classified: a system is Kronecker if and only if it arises from a compact abelian group ${Z}$ equipped with Haar probability measure and a translation action ${T^\gamma \colon z \mapsto z + \phi(\gamma)}$ for some homomorphism ${\phi: \Gamma \rightarrow Z}$ with dense image. Such systems can then be analyzed quite efficiently using the Fourier transform, and this can then be used to satisfactorily analyze “complexity one” patterns, such as length three progressions, in arbitrary systems (or, when translated back to combinatorial settings, in arbitrary dense sets of abelian groups).
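As a toy illustration of how Fourier analysis handles the complexity one setting, consider the finite model ${Z = {\bf Z}/N{\bf Z}}$ with the translation action, where the limit over a Folner sequence is replaced by a plain average over the group. There one has the standard identity ${\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4}$ with ${\hat f(\xi) = {\bf E}_x f(x) e(-x\xi/N)}$. The sketch below (hypothetical helper names, arbitrary test function) compares both sides numerically.

```python
import cmath


def e(t):
    """The standard character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


def u2_fourth_power(f):
    """E_{x,h1,h2} f(x) conj f(x+h1) conj f(x+h2) f(x+h1+h2) on Z/NZ."""
    N = len(f)
    total = 0
    for x in range(N):
        for h1 in range(N):
            for h2 in range(N):
                total += (f[x]
                          * f[(x + h1) % N].conjugate()
                          * f[(x + h2) % N].conjugate()
                          * f[(x + h1 + h2) % N])
    return (total / N**3).real


def fourier(f):
    """Normalised Fourier coefficients E_x f(x) e(-x*xi/N) for xi in Z/NZ."""
    N = len(f)
    return [sum(f[x] * e(-x * xi / N) for x in range(N)) / N for xi in range(N)]


def l4_fourth_power(fhat):
    """Sum of |fhat(xi)|^4 over all frequencies xi."""
    return sum(abs(c) ** 4 for c in fhat)


f = [1 + 2j, -0.5j, 0.3 + 0j, 1 + 0j, -1 + 1j]  # arbitrary test function on Z/5Z
print(u2_fourth_power(f), l4_fourth_power(fourier(f)))  # the two values agree up to rounding
```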

We return now to the complexity two setting. The most famous examples of Conze-Lesigne systems are (order two) nilsystems, in which the space ${X}$ is a quotient ${G/\Lambda}$ of a two-step nilpotent Lie group ${G}$ by a lattice ${\Lambda}$ (equipped with Haar probability measure), and the action is given by a translation ${T^\gamma x = \phi(\gamma) x}$ for some group homomorphism ${\phi: \Gamma \rightarrow G}$. For instance, the Heisenberg ${{\bf Z}}$-nilsystem

$\displaystyle \begin{pmatrix} 1 & {\bf R} & {\bf R} \\ 0 & 1 & {\bf R} \\ 0 & 0 & 1 \end{pmatrix} / \begin{pmatrix} 1 & {\bf Z} & {\bf Z} \\ 0 & 1 & {\bf Z} \\ 0 & 0 & 1 \end{pmatrix}$

with a shift of the form

$\displaystyle Tx = \begin{pmatrix} 1 & \alpha & 0 \\ 0 & 1 & \beta \\ 0 & 0 & 1 \end{pmatrix} x$

for ${\alpha,\beta}$ two real numbers with ${1,\alpha,\beta}$ linearly independent over ${{\bf Q}}$, is a Conze-Lesigne system. As the base case of a well known result of Host and Kra, it is shown in fact that all Conze-Lesigne ${{\bf Z}}$-systems are inverse limits of nilsystems (previous results in this direction were obtained by Conze-Lesigne, Furstenberg-Weiss, and others). Similar results are known for ${\Gamma}$-systems when ${\Gamma}$ is finitely generated, thanks to the thesis work of Griesmer (with further proofs by Gutman-Lian and Candela-Szegedy). However, this is not the case once ${\Gamma}$ is not finitely generated; as a recent example of Shalom shows, Conze-Lesigne systems need not be the inverse limit of nilsystems in this case.
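One way to see why the Heisenberg example above deserves to be called “quadratic” is to iterate the shift matrix: its ${n}$-th power has top-right entry ${\binom{n}{2}\alpha\beta}$, a quadratic polynomial in ${n}$, while the other entries grow linearly. The sketch below verifies this closed form with exact arithmetic. The rational values of ${\alpha,\beta}$ are stand-ins chosen only to make the check exact (the actual example requires ${1,\alpha,\beta}$ to be linearly independent over ${{\bf Q}}$), the helper names are hypothetical, and the reduction modulo the lattice is not modelled.

```python
from fractions import Fraction
from math import comb


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def shift_matrix(alpha, beta):
    """The group element defining the Heisenberg shift T x = g x."""
    return [[1, alpha, 0], [0, 1, beta], [0, 0, 1]]


def power(g, n):
    """g^n by repeated multiplication, for an integer n >= 0."""
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for _ in range(n):
        result = matmul(g, result)
    return result


def closed_form(alpha, beta, n):
    """Claimed formula for g^n: linear entries n*alpha, n*beta and quadratic corner."""
    return [[1, n * alpha, comb(n, 2) * alpha * beta], [0, 1, n * beta], [0, 0, 1]]


alpha, beta = Fraction(1, 3), Fraction(2, 5)  # rational stand-ins (assumption)
g = shift_matrix(alpha, beta)
print(all(power(g, n) == closed_form(alpha, beta, n) for n in range(10)))
```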

Our main result is that even in the infinitely generated case, Conze-Lesigne systems are still inverse limits of a slight generalisation of the nilsystem concept, in which ${G}$ is a locally compact Polish group rather than a Lie group:

Theorem 1 (Classification of Conze-Lesigne systems) Let ${\Gamma}$ be a countable abelian group, and ${X}$ an ergodic measure-preserving ${\Gamma}$-system. Then ${X}$ is a Conze-Lesigne system if and only if it is the inverse limit of translational systems ${G/\Lambda}$, where ${G}$ is a nilpotent locally compact Polish group of nilpotency class two, and ${\Lambda}$ is a lattice in ${G}$ (and also a lattice in the commutator group ${[G,G]}$), with ${G/\Lambda}$ equipped with the Haar probability measure and a translation action ${T^\gamma x = \phi(\gamma) x}$ for some homomorphism ${\phi: \Gamma \rightarrow G}$.

In a forthcoming companion paper to this one, Asgar Jamneshan and I will use this theorem to derive an inverse theorem for the Gowers norm ${U^3(G)}$ for an arbitrary finite abelian group ${G}$ (with no restrictions on the order of ${G}$, in particular our result handles the case of even and odd ${|G|}$ in a unified fashion). In principle, having a higher order version of this theorem will similarly allow us to derive inverse theorems for ${U^{s+1}(G)}$ norms for arbitrary ${s}$ and finite abelian ${G}$; we hope to investigate this further in future work.

We sketch some of the main ideas used to prove the theorem. The existing machinery developed by Conze-Lesigne, Furstenberg-Weiss, Host-Kra, and others allows one to describe an arbitrary Conze-Lesigne system as a group extension ${Z \rtimes_\rho K}$, where ${Z}$ is a Kronecker system (a rotational system on a compact abelian group ${Z = (Z,+)}$ and translation action ${\phi: \Gamma \rightarrow Z}$), ${K = (K,+)}$ is another compact abelian group, and the cocycle ${\rho = (\rho_\gamma)_{\gamma \in \Gamma}}$ is a collection of measurable maps ${\rho_\gamma: Z \rightarrow K}$ obeying the cocycle equation

$\displaystyle \rho_{\gamma_1+\gamma_2}(x) = \rho_{\gamma_1}(T^{\gamma_2} x) + \rho_{\gamma_2}(x) \ \ \ \ \ (1)$

for almost all ${x \in Z}$. Furthermore, ${\rho}$ is of “type two”, which means in this concrete setting that it obeys an additional equation

$\displaystyle \rho_\gamma(x + z_1 + z_2) - \rho_\gamma(x+z_1) - \rho_\gamma(x+z_2) + \rho_\gamma(x) \ \ \ \ \ (2)$

$\displaystyle = F(x + \phi(\gamma), z_1, z_2) - F(x,z_1,z_2)$

for all ${\gamma \in \Gamma}$ and almost all ${x,z_1,z_2 \in Z}$, and some measurable function ${F: Z^3 \rightarrow K}$; roughly speaking this asserts that ${\rho_\gamma}$ is “linear up to coboundaries”. For technical reasons it is also convenient to reduce to the case where ${Z}$ is separable. The problem is that the equation (2) is unwieldy to work with. In the model case when the target group ${K}$ is a circle ${{\bf T} = {\bf R}/{\bf Z}}$, one can use some Fourier analysis to convert (2) into the more tractable Conze-Lesigne equation

$\displaystyle \rho_\gamma(x+z) - \rho_\gamma(x) = F_z(x+\phi(\gamma)) - F_z(x) + c_z(\gamma) \ \ \ \ \ (3)$

for all ${\gamma \in \Gamma}$, all ${z \in Z}$, and almost all ${x \in Z}$, where for each ${z}$, ${F_z: Z \rightarrow K}$ is a measurable function, and ${c_z: \Gamma \rightarrow K}$ is a homomorphism. (For technical reasons it is often also convenient to enforce that ${F_z, c_z}$ depend in a measurable fashion on ${z}$; this can always be achieved, at least when the Conze-Lesigne system is separable, but verifying that this is possible actually requires a certain amount of effort, to which we devote an appendix in our paper.) It is not difficult to see that (3) implies (2) for any group ${K}$ (as long as one has the measurability in ${z}$ mentioned previously), but the converse turns out to fail for some groups ${K}$, such as solenoid groups (e.g., inverse limits of ${{\bf R}/2^n{\bf Z}}$ as ${n \rightarrow \infty}$), as was essentially shown by Rudolph. However, in our paper we were able to find a separate argument that also derived the Conze-Lesigne equation in the case of a cyclic group ${K = \frac{1}{N}{\bf Z}/{\bf Z}}$. Putting together the ${K={\bf T}}$ and ${K = \frac{1}{N}{\bf Z}/{\bf Z}}$ cases, one can then derive the Conze-Lesigne equation for arbitrary compact abelian Lie groups ${K}$ (as such groups are isomorphic to direct products of finitely many tori and cyclic groups). As has been known for some time (see e.g., this paper of Host and Kra), once one has a Conze-Lesigne equation, one can more or less describe the system ${X}$ as a translational system ${G/\Lambda}$, where the Host-Kra group ${G}$ is the set of all pairs ${(z, F_z)}$ that solve an equation of the form (3) (with these pairs acting on ${X \equiv Z \rtimes_\rho K}$ by the law ${(z,F_z) \cdot (x,k) := (x+z, k+F_z(x))}$), and ${\Lambda}$ is the stabiliser of a point in this system.
This then establishes the theorem in the case when ${K}$ is a Lie group, and the general case basically comes from the fact (from Fourier analysis or the Peter-Weyl theorem) that an arbitrary compact abelian group is an inverse limit of Lie groups. (There is a technical issue here in that one has to check that the space of translational system factors of ${X}$ form a directed set in order to have a genuine inverse limit, but this can be dealt with by modifications of the tools mentioned here.)

There is an additional technical issue worth pointing out here (which unfortunately was glossed over in some previous work in the area). Because the cocycle equation (1) and the Conze-Lesigne equation (3) are only valid almost everywhere instead of everywhere, the action of ${G}$ on ${X}$ is technically only a near-action rather than a genuine action, and as such one cannot directly define ${\Lambda}$ to be the stabiliser of a point without running into multiple problems. To fix this, one has to pass to a topological model of ${X}$ in which the action becomes continuous, and the stabilizer becomes well defined, although one then has to work a little more to check that the action is still transitive. This can be done via Gelfand duality; we proceed using a mixture of a construction from this book of Host and Kra, and the machinery in this recent paper of Asgar and myself.

Now we discuss how to establish the Conze-Lesigne equation (3) in the cyclic group case ${K = \frac{1}{N}{\bf Z}/{\bf Z}}$. As this group embeds into the torus ${{\bf T}}$, it is easy to use existing methods to obtain (3) but with the homomorphism ${c_z}$ and the function ${F_z}$ taking values in ${{\bf R}/{\bf Z}}$ rather than in ${\frac{1}{N}{\bf Z}/{\bf Z}}$. The main task is then to fix up the homomorphism ${c_z}$ so that it takes values in ${\frac{1}{N}{\bf Z}/{\bf Z}}$, that is to say that ${Nc_z}$ vanishes. This only needs to be done locally near the origin, because the claim is easy when ${z}$ lies in the dense subgroup ${\phi(\Gamma)}$ of ${Z}$, and also because the claim can be shown to be additive in ${z}$. Near the origin one can leverage the Steinhaus lemma to make ${c_z}$ depend linearly (or more precisely, homomorphically) on ${z}$, and because the cocycle ${\rho}$ already takes values in ${\frac{1}{N}{\bf Z}/{\bf Z}}$, ${N\rho}$ vanishes and ${Nc_z}$ must be an eigenvalue of the system ${Z}$. But as ${Z}$ was assumed to be separable, there are only countably many eigenvalues, and by another application of Steinhaus and linearity one can then make ${Nc_z}$ vanish on an open neighborhood of the identity, giving the claim.

Joni Teräväinen and I have just uploaded to the arXiv our preprint “The Hardy–Littlewood–Chowla conjecture in the presence of a Siegel zero”. This paper is a development of the theme that certain conjectures in analytic number theory become easier if one makes the hypothesis that Siegel zeroes exist; this places one in a presumably “illusory” universe, since the widely believed Generalised Riemann Hypothesis (GRH) precludes the existence of such zeroes, yet this illusory universe seems remarkably self-consistent and notoriously impossible to eliminate from one’s analysis.

For the purposes of this paper, a Siegel zero is a zero ${\beta}$ of a Dirichlet ${L}$-function ${L(\cdot,\chi)}$ corresponding to a primitive quadratic character ${\chi}$ of some conductor ${q_\chi}$, which is close to ${1}$ in the sense that

$\displaystyle \beta = 1 - \frac{1}{\eta \log q_\chi}$

for some large ${\eta \gg 1}$ (which we will call the quality of the Siegel zero). The significance of these zeroes is that they force the Möbius function ${\mu}$ and the Liouville function ${\lambda}$ to “pretend” to be like the exceptional character ${\chi}$ for primes of magnitude comparable to ${q_\chi}$. Indeed, if we define an exceptional prime to be a prime ${p^*}$ for which ${\chi(p^*) \neq -1}$, then very few primes near ${q_\chi}$ will be exceptional; in our paper we use some elementary arguments to establish the bounds

$\displaystyle \sum_{q_\chi^{1/2+\varepsilon} < p^* \leq x} \frac{1}{p^*} \ll_\varepsilon \frac{\log x}{\eta \log q_\chi} \ \ \ \ \ (1)$

for any ${x \geq q_\chi^{1/2+\varepsilon}}$ and ${\varepsilon>0}$, where the sum is over exceptional primes in the indicated range ${q_\chi^{1/2+\varepsilon} < p^* \leq x}$; this bound is non-trivial for ${x}$ as large as ${q_\chi^{\eta^{1-\varepsilon}}}$. (See Section 1 of this blog post for some variants of this argument, which were inspired by work of Heath-Brown.) There is also a companion bound (somewhat weaker) that covers a range of ${p^*}$ a little bit below ${q_\chi^{1/2}}$.

One of the early influential results in this area was the following result of Heath-Brown, which I previously blogged about here:

Theorem 1 (Hardy-Littlewood assuming Siegel zero) Let ${h}$ be a fixed natural number. Suppose one has a Siegel zero ${\beta}$ associated to some conductor ${q_\chi}$. Then we have

$\displaystyle \sum_{n \leq x} \Lambda(n) \Lambda(n+h) = ({\mathfrak S} + O( \frac{1}{\log\log \eta} )) x$

for all ${q_\chi^{250} \leq x \leq q_\chi^{300}}$, where ${\Lambda}$ is the von Mangoldt function and ${{\mathfrak S}}$ is the singular series

$\displaystyle {\mathfrak S} = \prod_{p|h} \frac{p}{p-1} \prod_{p \nmid h} (1 - \frac{1}{(p-1)^2})$
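The singular series is easy to approximate numerically by truncating the Euler product. The sketch below is an illustration only: the cutoff ${10^5}$ and the helper names are arbitrary assumptions, and the truncation error is small because the tail factors are ${1 - O(1/p^2)}$. Note that for odd ${h}$ the factor at ${p=2}$ is ${1 - 1/(2-1)^2 = 0}$, so ${{\mathfrak S}}$ vanishes, reflecting the fact that ${n}$ and ${n+h}$ cannot both be odd primes.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross out the multiples p*p, p*p + p, ... of p.
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]


def singular_series(h, cutoff=10**5):
    """Truncated Euler product: p/(p-1) for p | h, and 1 - 1/(p-1)^2 otherwise."""
    value = 1.0
    for p in primes_up_to(cutoff):
        if h % p == 0:
            value *= p / (p - 1)
        else:
            value *= 1 - 1 / (p - 1) ** 2
    return value


print(singular_series(2))  # twin prime case
print(singular_series(1))  # odd h: the p = 2 factor kills the product
print(singular_series(6) / singular_series(2))  # ratio of the two p = 3 factors
```

For ${h=2}$ the truncated product is close to ${1.3203}$, twice the twin prime constant.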

In particular, Heath-Brown showed that if there are infinitely many Siegel zeroes, then there are also infinitely many twin primes, with the correct asymptotic predicted by the Hardy-Littlewood prime tuple conjecture at infinitely many scales.

Very recently, Chinis established an analogous result for the Chowla conjecture (building upon earlier work of Germán and Katai):

Theorem 2 (Chowla assuming Siegel zero) Let ${h_1,\dots,h_\ell}$ be distinct fixed natural numbers. Suppose one has a Siegel zero ${\beta}$ associated to some conductor ${q_\chi}$. Then one has

$\displaystyle \sum_{n \leq x} \lambda(n+h_1) \dots \lambda(n+h_\ell) \ll \frac{x}{(\log\log \eta)^{1/2} (\log \eta)^{1/12}}$

in the range ${q_\chi^{10} \leq x \leq q_\chi^{\log\log \eta / 3}}$, where ${\lambda}$ is the Liouville function.

In our paper we unify these results and also improve the quantitative estimates and range of ${x}$:

Theorem 3 (Hardy-Littlewood-Chowla assuming Siegel zero) Let ${h_1,\dots,h_k,h'_1,\dots,h'_\ell}$ be distinct fixed natural numbers with ${k \leq 2}$. Suppose one has a Siegel zero ${\beta}$ associated to some conductor ${q_\chi}$. Then one has

$\displaystyle \sum_{n \leq x} \Lambda(n+h_1) \dots \Lambda(n+h_k) \lambda(n+h'_1) \dots \lambda(n+h'_\ell)$

$\displaystyle = ({\mathfrak S} + O_\varepsilon( \frac{1}{\log^{1/10\max(1,k)} \eta} )) x$

for

$\displaystyle q_\chi^{10k+\frac{1}{2}+\varepsilon} \leq x \leq q_\chi^{\eta^{1/2}}$

for any fixed ${\varepsilon>0}$.

Our argument proceeds by a series of steps in which we replace ${\Lambda}$ and ${\lambda}$ by more complicated looking, but also more tractable, approximations, until the correlation is one that can be computed in a tedious but straightforward fashion by known techniques. More precisely, the steps are as follows:

• (i) Replace the Liouville function ${\lambda}$ with an approximant ${\lambda_{\mathrm{Siegel}}}$, which is a completely multiplicative function that agrees with ${\lambda}$ at small primes and agrees with ${\chi}$ at large primes.
• (ii) Replace the von Mangoldt function ${\Lambda}$ with an approximant ${\Lambda_{\mathrm{Siegel}}}$, which is the Dirichlet convolution ${\chi * \log}$ multiplied by a Selberg sieve weight ${\nu}$ to essentially restrict that convolution to almost primes.
• (iii) Replace ${\lambda_{\mathrm{Siegel}}}$ with a more complicated truncation ${\lambda_{\mathrm{Siegel}}^\sharp}$ which has the structure of a “Type I sum”, and which agrees with ${\lambda_{\mathrm{Siegel}}}$ on numbers that have a “typical” factorization.
• (iv) Replace the approximant ${\Lambda_{\mathrm{Siegel}}}$ with a more complicated approximant ${\Lambda_{\mathrm{Siegel}}^\sharp}$ which has the structure of a “Type I sum”.
• (v) Now that all terms in the correlation have been replaced with tractable Type I sums, use standard Euler product calculations and Fourier analysis, similar in spirit to the proof of the pseudorandomness of the Selberg sieve majorant for the primes in this paper of Ben Green and myself, to evaluate the correlation to high accuracy.

Steps (i), (ii) proceed mainly through estimates such as (1) and standard sieve theory bounds. Step (iii) is based primarily on estimates on the number of smooth numbers of a certain size.

The restriction ${k \leq 2}$ in our main theorem is needed only to execute step (iv) of this argument. Roughly speaking, the Siegel approximant ${\Lambda_{\mathrm{Siegel}}}$ to ${\Lambda}$ is a twisted, sieved version of the divisor function ${\tau}$, and the types of correlation one is faced with at the start of step (iv) are a more complicated version of the divisor correlation sum

$\displaystyle \sum_{n \leq x} \tau(n+h_1) \dots \tau(n+h_k).$

For ${k=1}$ this sum can be easily controlled by the Dirichlet hyperbola method. For ${k=2}$ one needs the fact that ${\tau}$ has a level of distribution greater than ${1/2}$; in fact Kloosterman sum bounds give a level of distribution of ${2/3}$, a folklore fact that seems to have first been observed by Linnik and Selberg. We use a (notationally more complicated) version of this argument to treat the sums arising in (iv) for ${k \leq 2}$. Unfortunately for ${k > 2}$ there are no known techniques to unconditionally obtain asymptotics, even for the model sum

$\displaystyle \sum_{n \leq x} \tau(n) \tau(n+1) \tau(n+2),$

although we do at least have fairly convincing conjectures as to what the asymptotics should be. Because of this, it seems unlikely that one will be able to relax the ${k \leq 2}$ hypothesis in our main theorem at our current level of understanding of analytic number theory.
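For the simplest case ${k=1}$, ${h_1=0}$ of the divisor correlation sum, the Dirichlet hyperbola method mentioned above gives the exact identity ${\sum_{n \leq x} \tau(n) = 2\sum_{d \leq \sqrt{x}} \lfloor x/d \rfloor - \lfloor \sqrt{x} \rfloor^2}$, obtained by counting lattice points ${(d,m)}$ with ${dm \leq x}$ symmetrically about the diagonal. The sketch below (hypothetical function names, and only the unshifted and untwisted model sum rather than the sums actually arising in step (iv)) compares this with a direct computation.

```python
from math import isqrt


def divisor_sum_hyperbola(x):
    """sum_{n<=x} tau(n) via 2*sum_{d<=sqrt(x)} floor(x/d) - floor(sqrt(x))^2."""
    r = isqrt(x)
    return 2 * sum(x // d for d in range(1, r + 1)) - r * r


def divisor_sum_naive(x):
    """Same sum computed by tabulating tau(n) for every n <= x."""
    tau = [0] * (x + 1)
    for d in range(1, x + 1):
        for m in range(d, x + 1, d):
            tau[m] += 1  # d divides m
    return sum(tau)


for x in (10, 100, 1000):
    print(x, divisor_sum_hyperbola(x), divisor_sum_naive(x))
```

The hyperbola formula uses about ${\sqrt{x}}$ operations, whereas the direct tabulation uses about ${x \log x}$.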

Step (v) is a tedious but straightforward sieve theoretic computation, similar in many ways to the correlation estimates of Goldston and Yildirim used in their work on small gaps between primes (as discussed for instance here), and then also used by Ben Green and myself to locate arithmetic progressions in primes.

Rachel Greenfeld and I have just uploaded to the arXiv our preprint “Undecidable translational tilings with only two tiles, or one nonabelian tile”. This paper studies the following question: given a finitely generated group ${G}$, a (periodic) subset ${E}$ of ${G}$, and finite sets ${F_1,\dots,F_J}$ in ${G}$, is it possible to tile ${E}$ by translations ${a_j+F_j}$ of the tiles ${F_1,\dots,F_J}$? That is to say, is there a solution ${\mathrm{X}_1 = A_1, \dots, \mathrm{X}_J = A_J}$ to the (translational) tiling equation

$\displaystyle (\mathrm{X}_1 \oplus F_1) \uplus \dots \uplus (\mathrm{X}_J \oplus F_J) = E \ \ \ \ \ (1)$

for some subsets ${A_1,\dots,A_J}$ of ${G}$, where ${A \oplus F}$ denotes the set of sums ${\{a+f: a \in A, f \in F \}}$ if the sums ${a+f}$ are all distinct (and is undefined otherwise), and ${\uplus}$ denotes disjoint union. (One can also write the tiling equation in the language of convolutions as ${1_{\mathrm{X}_1} * 1_{F_1} + \dots + 1_{\mathrm{X}_J} * 1_{F_J} = 1_E}$.)
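To make the tiling equation (1) concrete, here is a minimal sketch that checks a proposed solution in the toy case of a finite cyclic group ${G = {\bf Z}/N{\bf Z}}$. This setting and the function names are illustrative assumptions; the paper is concerned with infinite groups, where no such finite check is available. The helper returns `None` when ${A \oplus F}$ is undefined, that is when two sums ${a+f}$ coincide.

```python
def direct_sum(A, F, N):
    """Return A (+) F as a set in Z/NZ, or None if some sum a+f is repeated."""
    sums = [(a + f) % N for a in A for f in F]
    if len(set(sums)) != len(sums):
        return None  # the sums are not all distinct, so A (+) F is undefined
    return set(sums)


def solves_tiling_equation(As, Fs, E, N):
    """Check (A_1 (+) F_1) disjoint-union ... disjoint-union (A_J (+) F_J) = E in Z/NZ."""
    covered = set()
    for A, F in zip(As, Fs):
        part = direct_sum(A, F, N)
        if part is None or covered & part:
            return False  # an undefined sum, or two pieces overlap
        covered |= part
    return covered == set(E)


# One tile {0,1} tiles Z/6Z using the translates 0, 2, 4.
print(solves_tiling_equation([{0, 2, 4}], [{0, 1}], range(6), 6))
# Two tiles {0,1} and {0,1,2} tile Z/5Z with one translate each.
print(solves_tiling_equation([{0}, {2}], [{0, 1}, {0, 1, 2}], range(5), 5))
# Overlapping translates: {0,1} (+) {0,1} is undefined.
print(direct_sum({0, 1}, {0, 1}, 6))
```

Checking a given candidate is easy; the decidability questions discussed next concern whether any solution exists at all.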

A bit more specifically, the paper studies the decidability of the above question. There are two slightly different types of decidability one could consider here:

• Logical decidability. For a given ${G, E, J, F_1,\dots,F_J}$, one can ask whether the solvability of the tiling equation (1) is provable or disprovable in ZFC (where we encode all the data ${G, E, F_1,\dots,F_J}$ by appropriate constructions in ZFC). If this is the case we say that the tiling equation (1) (or more precisely, the solvability of this equation) is logically decidable, otherwise it is logically undecidable.
• Algorithmic decidability. For data ${G,E,J, F_1,\dots,F_J}$ in some specified class (and encoded somehow as binary strings), one can ask whether the solvability of the tiling equation (1) can be correctly determined for all choices of data in this class by the output of some Turing machine that takes the data as input (encoded as a binary string) and halts in finite time, returning either YES if the equation can be solved or NO otherwise. If this is the case, we say the tiling problem of solving (1) for data in the given class is algorithmically decidable, otherwise it is algorithmically undecidable.

Note that the notion of logical decidability is “pointwise” in the sense that it pertains to a single choice of data ${G,E,J,F_1,\dots,F_J}$, whereas the notion of algorithmic decidability pertains instead to classes of data, and is only interesting when this class is infinite. Indeed, any tiling problem with a finite class of data is trivially decidable because one could simply code a Turing machine that is basically a lookup table that returns the correct answer for each choice of data in the class. (This is akin to how a student with a good memory could pass any exam if the questions are drawn from a finite list, merely by memorising an answer key for that list of questions.)

The two notions are related as follows: if a tiling problem (1) is algorithmically undecidable for some class of data, then the tiling equation must be logically undecidable for at least one choice of data for this class. For if this is not the case, one could algorithmically decide the tiling problem by searching for proofs or disproofs that the equation (1) is solvable for a given choice of data; the logical decidability of all such solvability questions will ensure that this algorithm always terminates in finite time.

One can use the Gödel completeness theorem to interpret logical decidability in terms of universes (also known as structures or models) of ZFC. In addition to the “standard” universe ${{\mathfrak U}}$ of sets that we believe satisfies the axioms of ZFC, there are also other “nonstandard” universes ${{\mathfrak U}^*}$ that also obey the axioms of ZFC. If the solvability of a tiling equation (1) is logically undecidable, this means that such a tiling exists in some universes of ZFC, but not in others.

(To continue the exam analogy, we thus see that a yes-no exam question is logically undecidable if the answer to the question is yes in some parallel universes, but not in others. A course syllabus is algorithmically undecidable if there is no way to prepare for the final exam for the course in a way that guarantees a perfect score (in the standard universe).)

Questions of decidability are also related to the notion of aperiodicity. For a given ${G, E, J, F_1,\dots,F_J}$, a tiling equation (1) is said to be aperiodic if the equation (1) is solvable (in the standard universe ${{\mathfrak U}}$ of ZFC), but none of the solutions (in that universe) are completely periodic (i.e., there are no solutions ${\mathrm{X}_1 = A_1,\dots, \mathrm{X}_J = A_J}$ where all of the ${A_1,\dots,A_J}$ are periodic). Perhaps the most well-known example of an aperiodic tiling (in the context of ${{\bf R}^2}$, and using rotations as well as translations) comes from the Penrose tilings, but there are many others besides.

It was (essentially) observed by Hao Wang in the 1960s that if a tiling equation is logically undecidable, then it must necessarily be aperiodic. Indeed, if a tiling equation fails to be aperiodic, then (in the standard universe) either there is a periodic tiling, or there are no tilings whatsoever. In the former case, the periodic tiling can be used to give a finite proof that the tiling equation is solvable; in the latter case, the compactness theorem implies that there is some finite fragment of ${E}$ that is not compatible with being tiled by ${F_1,\dots,F_J}$, and this provides a finite proof that the tiling equation is unsolvable. Thus in either case the tiling equation is logically decidable.

This observation of Wang clarifies somewhat how logically undecidable tiling equations behave in the various universes of ZFC. In the standard universe, tilings exist, but none of them will be periodic. In nonstandard universes, tilings may or may not exist, and the tilings that do exist may be periodic (albeit with a nonstandard period); but there must be at least one universe in which no tiling exists at all.

In one dimension when ${G={\bf Z}}$ (or more generally ${G = {\bf Z} \times G_0}$ with ${G_0}$ a finite group), a simple pigeonholing argument shows that no tiling equations are aperiodic, and hence all tiling equations are decidable. However the situation changes in two dimensions. In 1966, Berger (a student of Wang) famously showed that there exist tiling equations (1) in the discrete plane ${E = G = {\bf Z}^2}$ that are aperiodic, or even logically undecidable; in fact he showed that the tiling problem in this case (with arbitrary choices of data ${J, F_1,\dots,F_J}$) was algorithmically undecidable. (Strictly speaking, Berger established this for a variant of the tiling problem known as the domino problem, but later work of Golomb showed that the domino problem could be easily encoded within the tiling problem.) This was accomplished by encoding the halting problem for Turing machines into the tiling problem (or domino problem); the former is well known to be algorithmically undecidable (and thus to have logically undecidable instances), and so the latter is also. However, the number of tiles ${J}$ required for Berger’s construction was quite large: his construction of an aperiodic tiling required ${J = 20426}$ tiles, and his construction of a logically undecidable tiling required an even larger (and not explicitly specified) collection of tiles. Subsequent work by many authors did reduce the number of tiles required; in the ${E=G={\bf Z}^2}$ setting, the current world record for the fewest number of tiles in an aperiodic tiling is ${J=8}$ (due to Ammann, Grünbaum, and Shephard) and for a logically undecidable tiling is ${J=11}$ (due to Ollinger). On the other hand, it is conjectured (see Grünbaum-Shephard and Lagarias-Wang) that one cannot lower ${J}$ all the way to ${1}$:

Conjecture 1 (Periodic tiling conjecture) If ${E}$ is a periodic subset of a finitely generated abelian group ${G}$, and ${F}$ is a finite subset of ${G}$, then the tiling equation ${\mathrm{X} \oplus F = E}$ is not aperiodic.

This conjecture is known to be true in two dimensions (by work of Bhattacharya when ${G=E={\bf Z}^2}$, and more recently by us when ${E \subset G = {\bf Z}^2}$), but remains open in higher dimensions. By the preceding discussion, the conjecture implies that every tiling equation with a single tile is logically decidable, and the problem of whether a given periodic set can be tiled by a single tile is algorithmically decidable.

In this paper we show on the other hand that aperiodic and undecidable tilings exist when ${J=2}$, at least if one is permitted to enlarge the group ${G}$ a bit:

Theorem 2 (Logically undecidable tilings)
• (i) There exists a group ${G}$ of the form ${G = {\bf Z}^2 \times G_0}$ for some finite abelian ${G_0}$, a subset ${E_0}$ of ${G_0}$, and finite sets ${F_1, F_2 \subset G}$ such that the tiling equation ${(\mathbf{X}_1 \oplus F_1) \uplus (\mathbf{X}_2 \oplus F_2) = {\bf Z}^2 \times E_0}$ is logically undecidable (and hence also aperiodic).
• (ii) There exists a dimension ${d}$, a periodic subset ${E}$ of ${{\bf Z}^d}$, and finite sets ${F_1, F_2 \subset {\bf Z}^d}$ such that the tiling equation ${(\mathbf{X}_1 \oplus F_1) \uplus (\mathbf{X}_2 \oplus F_2) = E}$ is logically undecidable (and hence also aperiodic).
• (iii) There exists a non-abelian finite group ${G_0}$ (with the group law still written additively), a subset ${E_0}$ of ${G_0}$, and a finite set ${F \subset {\bf Z}^2 \times G_0}$ such that the nonabelian tiling equation ${\mathbf{X} \oplus F = {\bf Z}^2 \times E_0}$ is logically undecidable (and hence also aperiodic).

We also have algorithmic versions of this theorem. For instance, the algorithmic version of (i) is that the problem of determining solvability of the tiling equation ${(\mathbf{X}_1 \oplus F_1) \uplus (\mathbf{X}_2 \oplus F_2) = {\bf Z}^2 \times E_0}$ for a given choice of finite abelian group ${G_0}$, subset ${E_0}$ of ${G_0}$, and finite sets ${F_1, F_2 \subset {\bf Z}^2 \times G_0}$ is algorithmically undecidable. Similarly for (ii), (iii).

This result (together with a negative result discussed below) suggests to us that there is a significant qualitative difference between the ${J=1}$ theory of tiling by a single (abelian) tile and the ${J \geq 2}$ theory of tiling with multiple tiles (or one non-abelian tile). (The positive results on the periodic tiling conjecture certainly rely heavily on the fact that there is only one tile; in particular there is a “dilation lemma” that is only available in this setting and is of key importance in the two dimensional theory.) It would be nice to eliminate the group ${G_0}$ from (i) (or to set ${d=2}$ in (ii)), but I think this would require a fairly significant modification of our methods.

Like many other undecidability results, the proof of Theorem 2 proceeds by a sequence of reductions, in which the undecidability of one problem is shown to follow from the undecidability of another, more “expressive” problem that can be encoded inside the original problem, until one reaches a problem that is so expressive that it encodes a problem already known to be undecidable. Indeed, all three undecidability results are ultimately obtained from Berger’s undecidability result on the domino problem.

The first step in increasing expressiveness is to observe that the undecidability of a single tiling equation follows from the undecidability of a system of tiling equations. More precisely, suppose we have non-empty finite subsets ${F_j^{(m)}}$ of a finitely generated group ${G}$ for ${j=1,\dots,J}$ and ${m=1,\dots,M}$, as well as periodic sets ${E^{(m)}}$ of ${G}$ for ${m=1,\dots,M}$, such that it is logically undecidable whether the system of tiling equations

$\displaystyle (\mathrm{X}_1 \oplus F_1^{(m)}) \uplus \dots \uplus (\mathrm{X}_J \oplus F_J^{(m)}) = E^{(m)} \ \ \ \ \ (2)$

for ${m=1,\dots,M}$ has a solution ${\mathrm{X}_1 = A_1,\dots, \mathrm{X}_J = A_J}$ in ${G}$. Then, for any ${N>M}$, we can “stack” these equations into a single tiling equation in the larger group ${G \times {\bf Z}/N{\bf Z}}$, namely the equation

$\displaystyle (\mathrm{X}_1 \oplus F_1) \uplus \dots \uplus (\mathrm{X}_J \oplus F_J) = E \ \ \ \ \ (3)$

where

$\displaystyle F_j := \biguplus_{m=1}^M F_j^{(m)} \times \{m\}$

and

$\displaystyle E := \biguplus_{m=1}^M E^{(m)} \times \{m\}.$

It is a routine exercise to check that the system of equations (2) admits a solution in ${G}$ if and only if the single equation (3) admits a solution in ${G \times {\bf Z}/N{\bf Z}}$. Thus, to prove the undecidability of a single equation of the form (3) it suffices to establish undecidability of a system of the form (2); note how the freedom to select the auxiliary group ${G_0}$ is important here.
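The routine exercise can be checked by brute force in a toy case. The sketch below is only an illustration under assumed data: it takes ${G = {\bf Z}/4{\bf Z}}$, ${J=1}$, ${M=2}$, ${N=3}$, and a hypothetical pair of equations ${\mathrm{X} \oplus \{0,1\} = G}$, ${\mathrm{X} \oplus \{0\} = \{0,2\}}$, then enumerates all subsets to compare the solutions of the system with those of the stacked equation.

```python
from itertools import combinations

n, N = 4, 3          # hypothetical sizes: G = Z/4Z, stacking factor Z/3Z (N > M = 2)
G = range(n)

def subsets(universe):
    """Yield every subset of the given finite universe as a frozenset."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def is_tiling(A, F, E, add):
    """True when the sums a+f (a in A, f in F) are distinct and fill E exactly."""
    seen = set()
    for a in A:
        for f in F:
            s = add(a, f)
            if s in seen or s not in E:
                return False
            seen.add(s)
    return seen == set(E)

def add_G(a, f):
    return (a + f) % n

def add_stacked(a, f):
    return ((a[0] + f[0]) % n, (a[1] + f[1]) % N)

# Toy system (2): tiles F^(m) and targets E^(m) for m = 1, 2.
F_SYS = {1: {0, 1}, 2: {0}}
E_SYS = {1: set(G), 2: {0, 2}}

def system_solutions():
    return {A for A in subsets(G)
            if all(is_tiling(A, F_SYS[m], E_SYS[m], add_G) for m in F_SYS)}

def stacked_solutions():
    F = {(f, m) for m in F_SYS for f in F_SYS[m]}   # F := union of F^(m) x {m}
    E = {(e, m) for m in E_SYS for e in E_SYS[m]}   # E := union of E^(m) x {m}
    universe = [(g, t) for g in G for t in range(N)]
    return {A for A in subsets(universe) if is_tiling(A, F, E, add_stacked)}
```

In this toy case the solutions of the stacked equation are exactly the sets ${A \times \{0\}}$ with ${A}$ a solution of the system; the condition ${N > M}$ is what prevents a translate at a nonzero height from wrapping around and landing back inside ${E}$.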

We view systems of the form (2) as belonging to a kind of “language” in which each equation in the system is a “sentence” in the language imposing additional constraints on a tiling. One can now pick and choose various sentences in this language to try to encode various interesting problems. For instance, one can encode the concept of a function ${f: {\bf Z}^2 \rightarrow G_0}$ taking values in a finite group ${G_0}$ as a single tiling equation

$\displaystyle \mathrm{X} \oplus (\{0\} \times G_0) = {\bf Z}^2 \times G_0 \ \ \ \ \ (4)$

since the solutions to this equation are precisely the graphs

$\displaystyle \mathrm{X} = \{ (n, f(n)): n \in {\bf Z}^2 \}$

of a function ${f: {\bf Z}^2 \rightarrow G_0}$. By adding more tiling equations to this equation to form a larger system, we can start imposing additional constraints on this function ${f}$. For instance, if ${x+H}$ is a coset of some subgroup ${H}$ of ${G_0}$, we can impose the additional equation

$\displaystyle \mathrm{X} \oplus (\{0\} \times H) = {\bf Z}^2 \times (x+H) \ \ \ \ \ (5)$

to impose the additional constraint that ${f(n) \in x+H}$ for all ${n \in {\bf Z}^2}$, if we desire. If ${G_0}$ happens to contain two distinct elements ${1, -1}$, and ${h \in {\bf Z}^2}$, then the additional equation

$\displaystyle \mathrm{X} \oplus (\{0,h\} \times \{0\}) = {\bf Z}^2 \times \{-1,1\} \ \ \ \ \ (6)$

imposes the additional constraints that ${f(n) \in \{-1,1\}}$ for all ${n \in {\bf Z}^2}$, and additionally that

$\displaystyle f(n+h) = -f(n)$

for all ${n \in {\bf Z}^2}$.
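As a sanity check on this encoding, here is a small brute-force sketch. It makes several assumptions for the sake of a finite search: the plane ${{\bf Z}^2}$ is replaced by the cyclic group ${{\bf Z}/4{\bf Z}}$, the group ${G_0}$ is ${{\bf Z}/3{\bf Z}}$ (so that ${-1}$ is the residue ${2}$), and ${h=1}$. None of these choices come from the paper.

```python
from itertools import combinations

n, q, h = 4, 3, 1    # hypothetical: Z/4Z stands in for Z^2, G_0 = Z/3Z, shift h = 1

def add(a, f):
    return ((a[0] + f[0]) % n, (a[1] + f[1]) % q)

def is_tiling(A, F, E):
    """True when the sums a+f (a in A, f in F) are distinct and fill E exactly."""
    seen = set()
    for a in A:
        for f in F:
            s = add(a, f)
            if s in seen or s not in E:
                return False
            seen.add(s)
    return seen == E

UNIVERSE = [(x, v) for x in range(n) for v in range(q)]
TILE_4, TARGET_4 = {(0, v) for v in range(q)}, set(UNIVERSE)      # analogue of (4)
TILE_6 = {(0, 0), (h, 0)}                                         # analogue of (6)
TARGET_6 = {(x, v) for x in range(n) for v in (1, q - 1)}

def solutions(impose_eq6):
    """Return every solution X, converted from a graph to a dict x -> f(x)."""
    found = []
    for size in range(len(UNIVERSE) + 1):
        for A in combinations(UNIVERSE, size):
            if not is_tiling(A, TILE_4, TARGET_4):
                continue
            if impose_eq6 and not is_tiling(A, TILE_6, TARGET_6):
                continue
            found.append(dict(A))
    return found

# With (6) imposed, only the two alternating functions survive.
print([tuple(f[x] for x in range(n)) for f in solutions(True)])
```

Equation (4) alone leaves all ${3^4}$ functions on the quotient, while adding (6) cuts this down to the two functions with values in ${\{1,-1\}}$ that change sign under the shift.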

This begins to resemble the equations that come up in the domino problem. Here one has a finite set of Wang tiles – unit squares ${T}$ where each of the four sides is colored with a color ${c_N(T), c_S(T), c_E(T), c_W(T)}$ (corresponding to the four cardinal directions North, South, East, and West) from some finite set ${{\mathcal C}}$ of colors. The domino problem is then to tile the plane with copies of these tiles in such a way that adjacent sides match. In terms of equations, one is seeking to find functions ${c_N, c_S, c_E, c_W: {\bf Z}^2 \rightarrow {\mathcal C}}$ obeying the pointwise constraint

$\displaystyle (c_N(n), c_S(n), c_E(n), c_W(n)) \in {\mathcal W} \ \ \ \ \ (7)$

for all ${n \in {\bf Z}^2}$ where ${{\mathcal W}}$ is the set of colors associated to the set of Wang tiles being used, and the matching constraints

$\displaystyle c_S(n+(0,1)) = c_N(n); \quad c_W(n+(1,0)) = c_E(n) \ \ \ \ \ (8)$

for all ${n \in {\bf Z}^2}$. As it turns out, the pointwise constraint (7) can be encoded by tiling equations that are fancier versions of (4), (5), (6) that involve only one unknown tiling set ${{\mathrm X}}$, but in order to encode the matching constraints (8) we were forced to introduce a second tile (or work with nonabelian tiling equations). This appears to be an inherent feature of the method, since we found a partial rigidity result for tilings of one tile in one dimension that obstructs this encoding strategy from working when one only has one tile available. The result is as follows:

Proposition 3 (Swapping property) Consider the solutions to a tiling equation

$\displaystyle \mathrm{X} \oplus F = E \ \ \ \ \ (9)$

in a one-dimensional group ${G = {\bf Z} \times G_0}$ (with ${G_0}$ a finite abelian group, ${F}$ finite, and ${E}$ periodic). Suppose there are two solutions ${\mathrm{X} = A_0, \mathrm{X} = A_1}$ to this equation that agree on the left in the sense that

$\displaystyle A_0 \cap (\{0, -1, -2, \dots\} \times G_0) = A_1 \cap (\{0, -1, -2, \dots\} \times G_0).$

For any function ${\omega: {\bf Z} \rightarrow \{0,1\}}$, define the “swap” ${A_\omega}$ of ${A_0}$ and ${A_1}$ to be the set

$\displaystyle A_\omega := \{ (n, g): n \in {\bf Z}, (n,g) \in A_{\omega(n)} \}$

Then ${A_\omega}$ also solves the equation (9).

One can think of ${A_0}$ and ${A_1}$ as “genes” with “nucleotides” ${\{ g \in G_0: (n,g) \in A_0\}}$, ${\{ g \in G_0: (n,g) \in A_1\}}$ at each position ${n \in {\bf Z}}$, and ${A_\omega}$ is a new gene formed by choosing one of the nucleotides from the “parent” genes ${A_0}$, ${A_1}$ at each position. The above proposition then says that the solutions to the equation (9) must be closed under “genetic transfer” among any pair of genes that agree on the left. This seems to present an obstruction to trying to encode equations such as

$\displaystyle c(n+1) = c'(n)$

for two functions ${c, c': {\bf Z} \rightarrow \{-1,1\}}$ (say), which is a toy version of the matching constraint (8), since the class of solutions to this equation turns out not to obey this swapping property. On the other hand, it is easy to encode such equations using two tiles instead of one, and an elaboration of this construction is used to prove our main theorem.
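The failure of the swapping property for the toy equation can be seen in a few lines. This is only an illustrative sketch: it works on a finite window of positions, and it records the “nucleotide” at position ${n}$ as the pair ${(c(n), c'(n))}$, which is a modelling assumption made here rather than the encoding used in the paper.

```python
WINDOW = range(-2, 4)   # finite window of positions n, chosen for illustration

def satisfies(sol):
    """Check c(n+1) == c'(n) wherever both n and n+1 lie in the window."""
    return all(sol[n + 1][0] == sol[n][1] for n in WINDOW if n + 1 in WINDOW)

def swap(sol0, sol1, omega):
    """Form the swap: position n is copied from sol1 if omega(n) is true, else sol0."""
    return {n: (sol1 if omega(n) else sol0)[n] for n in WINDOW}

# Two solutions (c, c') that agree at all positions n <= 0.
parent0 = {n: (1, 1) for n in WINDOW}
parent1 = {n: (1, 1) if n <= 0 else (1, -1) if n == 1 else (-1, -1) for n in WINDOW}

# Take position 1 from parent1 and everything else from parent0.
child = swap(parent0, parent1, lambda n: n == 1)
print(satisfies(parent0), satisfies(parent1), satisfies(child))
```

The child inherits ${c'(1) = -1}$ from one parent and ${c(2) = 1}$ from the other, which breaks the constraint at ${n=1}$ even though both parents satisfy it.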

Louis Esser, Burt Totaro, Chengxi Wang, and myself have just uploaded to the arXiv our preprint “Varieties of general type with many vanishing plurigenera, and optimal sine and sawtooth inequalities“. This is an interdisciplinary paper that arose because in order to optimize a certain algebraic geometry construction it became necessary to solve a purely analytic question which, while simple, did not seem to have been previously studied in the literature. We were able to solve the analytic question exactly and thus fully optimize the algebraic geometry construction, though the analytic question may have some independent interest.

Let us first discuss the algebraic geometry application. Given a smooth complex ${n}$-dimensional projective variety ${X}$ there is a standard line bundle ${K_X}$ attached to it, known as the canonical line bundle; ${n}$-forms on the variety become sections of this bundle. The bundle may not actually admit global sections; that is to say, the dimension ${h^0(X, K_X)}$ of global sections may vanish. But as one raises the canonical line bundle ${K_X}$ to higher and higher powers to form further line bundles ${mK_X}$, the number of global sections tends to increase; in particular, the dimension ${h^0(X, mK_X)}$ of global sections (known as the ${m^{th}}$ plurigenus) always obeys an asymptotic of the form

$\displaystyle h^0(X, mK_X) = \mathrm{vol}(X) \frac{m^n}{n!} + O( m^{n-1} )$

as ${m \rightarrow \infty}$ for some non-negative number ${\mathrm{vol}(X)}$, called the volume of the variety ${X}$; it is an invariant that reveals some information about the birational geometry of ${X}$. For instance, if the canonical line bundle is ample (or more generally, nef), this volume is equal to the intersection number ${K_X^n}$ (roughly speaking, the number of common zeroes of ${n}$ generic sections of the canonical line bundle); this is a special case of the asymptotic Riemann-Roch theorem. In particular, the volume ${\mathrm{vol}(X)}$ is a natural number in this case. However, it is possible for the volume to also be fractional in nature. One can then ask: how small can the volume ${\mathrm{vol}(X)}$ get without vanishing entirely? (By definition, varieties with non-vanishing volume are known as varieties of general type.)
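The simplest instance of this asymptotic is the case ${n=1}$ of a smooth projective curve of genus ${g \geq 2}$, where the classical Riemann-Roch theorem gives ${h^0(X, mK_X) = (2m-1)(g-1)}$ for ${m \geq 2}$ and the volume is ${2g-2 = \deg K_X}$. This standard example is not taken from the paper; the sketch below (with hypothetical function names) simply tabulates it.

```python
from fractions import Fraction

def plurigenus_curve(genus, m):
    """h^0(X, mK_X) for a smooth projective curve of genus >= 2 (Riemann-Roch)."""
    if genus < 2 or m < 1:
        raise ValueError("this formula assumes genus >= 2 and m >= 1")
    return genus if m == 1 else (2 * m - 1) * (genus - 1)

def volume_estimate(genus, m):
    """h^0(X, mK_X) divided by m^n / n! with n = 1; tends to vol(X) = 2g - 2."""
    return Fraction(plurigenus_curve(genus, m), m)

for m in (1, 10, 100, 1000):
    print(m, float(volume_estimate(3, m)))
```

Here the error term ${O(m^{n-1})}$ is the constant ${-(g-1)}$, and the volume is an integer, as expected when the canonical bundle is ample.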

It follows from a deep result obtained independently by Hacon–McKernan, Takayama and Tsuji that there is a uniform lower bound for the volume ${\mathrm{vol}(X)}$ of all ${n}$-dimensional projective varieties of general type. However, the precise lower bound is not known, and the current paper is a contribution towards probing this bound by constructing varieties of particularly small volume in the high-dimensional limit ${n \rightarrow \infty}$. Prior to this paper, the best such constructions of ${n}$-dimensional varieties basically had exponentially small volume, with a construction of volume at most ${e^{-(1+o(1))n \log n}}$ given by Ballico–Pignatelli–Tasin, and an improved construction with a volume bound of ${e^{-\frac{1}{3} n \log^2 n}}$ given by Totaro and Wang. In this paper, we obtain a variant construction with the somewhat smaller volume bound of ${e^{-(1-o(1)) n^{3/2} \log^{1/2} n}}$; the method also gives comparable bounds for some other related algebraic geometry statistics, such as the largest ${m}$ for which the pluricanonical map associated to the linear system ${|mK_X|}$ is not a birational embedding into projective space.

The space ${X}$ is constructed by taking a general hypersurface of a certain degree ${d}$ in a weighted projective space ${P(a_0,\dots,a_{n+1})}$ and resolving the singularities. These varieties are relatively tractable to work with, as one can use standard algebraic geometry tools (such as the Reid–Tai inequality) to provide sufficient conditions to guarantee that the hypersurface has only canonical singularities and that the canonical bundle is a reflexive sheaf, which allows one to calculate the volume exactly in terms of the degree ${d}$ and weights ${a_0,\dots,a_{n+1}}$. The problem then reduces to optimizing the resulting volume given the constraints needed for the above-mentioned sufficient conditions to hold. After working with a particular choice of weights (which consist of products of mostly consecutive primes, with each product occurring with suitable multiplicities ${c_0,\dots,c_{b-1}}$), the problem eventually boils down to trying to minimize the total multiplicity ${\sum_{j=0}^{b-1} c_j}$, subject to certain congruence conditions and other bounds on the ${c_j}$. Using crude bounds on the ${c_j}$ eventually leads to a construction with volume at most ${e^{-0.8 n^{3/2} \log^{1/2} n}}$, but by taking advantage of the ability to “dilate” the congruence conditions and optimizing over all dilations, we are able to improve the ${0.8}$ constant to ${1-o(1)}$.

Now it is time to turn to the analytic side of the paper by describing the optimization problem that we solve. We consider the sawtooth function ${g: {\bf R} \rightarrow (-1/2,1/2]}$, with ${g(x)}$ defined as the unique real number in ${(-1/2,1/2]}$ that is equal to ${x}$ mod ${1}$. We consider a (Borel) probability measure ${\mu}$ on the real line, and then compute the average value of this sawtooth function

$\displaystyle \mathop{\bf E}_\mu g(x) := \int_{\bf R} g(x)\ d\mu(x)$

as well as various dilates

$\displaystyle \mathop{\bf E}_\mu g(kx) := \int_{\bf R} g(kx)\ d\mu(x)$

of this expectation. Since ${g}$ is bounded above by ${1/2}$, we certainly have the trivial bound

$\displaystyle \min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx) \leq \frac{1}{2}.$

However, this bound is not very sharp. For instance, the only way in which ${\mathop{\bf E}_\mu g(x)}$ could attain the value of ${1/2}$ is if the probability measure ${\mu}$ was supported on half-integers, but in that case ${\mathop{\bf E}_\mu g(2x)}$ would vanish. For the algebraic geometry application discussed above one is then led to the following question: for a given choice of ${m}$, what is the best upper bound ${c^{\mathrm{saw}}_m}$ on the quantity ${\min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx)}$ that holds for all probability measures ${\mu}$?

If one considers the deterministic case in which ${\mu}$ is a Dirac mass supported at some real number ${x_0}$, then the Dirichlet approximation theorem tells us that there is ${1 \leq k \leq m}$ such that ${k x_0}$ is within ${\frac{1}{m+1}}$ of an integer, so we have

$\displaystyle \min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx) \leq \frac{1}{m+1}$

in this case, and this bound is sharp for deterministic measures ${\mu}$. Thus we have

$\displaystyle \frac{1}{m+1} \leq c^{\mathrm{saw}}_m \leq \frac{1}{2}.$

However, both of these bounds turn out to be far from the truth, and the optimal value of ${c^{\mathrm{saw}}_m}$ is comparable to ${\frac{\log 2}{\log m}}$. In fact we were able to compute this quantity precisely:

Theorem 1 (Optimal bound for sawtooth inequality) Let ${m \geq 1}$.
• (i) If ${m = 2^r}$ for some natural number ${r}$, then ${c^{\mathrm{saw}}_m = \frac{1}{r+2}}$.
• (ii) If ${2^r < m \leq 2^{r+1}}$ for some natural number ${r}$, then ${c^{\mathrm{saw}}_m = \frac{2^r}{2^r(r+1) + m}}$.
In particular, we have ${c^{\mathrm{saw}}_m = \frac{\log 2 + o(1)}{\log m}}$ as ${m \rightarrow \infty}$.
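For quick experimentation, the two cases of the theorem can be folded into a single exact formula. The sketch below is only a transcription of the statement above (the function name `c_saw` is hypothetical); it treats ${m=1}$ separately, since that value is covered by case (i) with ${r=0}$ but not by case (ii).

```python
from fractions import Fraction

def c_saw(m):
    """Exact value of c^saw_m as stated in the theorem above."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        return Fraction(1, 2)          # case (i) with r = 0
    r = (m - 1).bit_length() - 1       # the unique r with 2^r < m <= 2^(r+1)
    return Fraction(2**r, 2**r * (r + 1) + m)

for m in (1, 2, 3, 4, 5, 8, 16):
    print(m, c_saw(m))
```

Case (ii) at ${m = 2^{r+1}}$ reduces to ${\frac{1}{r+3}}$, which agrees with case (i), so no separate branch for powers of two is needed beyond ${m=1}$.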

We establish this bound through duality. Indeed, suppose we could find non-negative coefficients ${a_1,\dots,a_m}$ such that one had the pointwise bound

$\displaystyle \sum_{k=1}^m a_k g(kx) \leq 1 \ \ \ \ \ (1)$

for all real numbers ${x}$. Integrating this against an arbitrary probability measure ${\mu}$, we would conclude

$\displaystyle (\sum_{k=1}^m a_k) \min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx) \leq \sum_{k=1}^m a_k \mathop{\bf E}_\mu g(kx) \leq 1$

and hence

$\displaystyle c^{\mathrm{saw}}_m \leq \frac{1}{\sum_{k=1}^m a_k}.$

Conversely, one can find lower bounds on ${c^{\mathrm{saw}}_m}$ by selecting suitable candidate measures ${\mu}$ and computing the means ${\mathop{\bf E}_\mu g(kx)}$. The theory of linear programming duality tells us that this method must give us the optimal bound, but one has to locate the optimal measure ${\mu}$ and optimal weights ${a_1,\dots,a_m}$. This we were able to do by first doing some extensive numerics to discover these weights and measures for small values of ${m}$, and then doing some educated guesswork to extrapolate these examples to the general case, and then to verify the required inequalities. In case (i) the situation is particularly simple, as one can take ${\mu}$ to be the discrete measure that assigns a probability ${\frac{1}{r+2}}$ to the numbers ${\frac{1}{2}, \frac{1}{4}, \dots, \frac{1}{2^r}}$ and the remaining probability of ${\frac{2}{r+2}}$ to ${\frac{1}{2^{r+1}}}$, while the optimal weighted inequality (1) turns out to be

$\displaystyle 2g(x) + \sum_{j=1}^r g(2^j x) \leq 1$

which is easily proven by telescoping series. However the general case turned out to be significantly trickier to work out, and the verification of the optimal inequality required a delicate case analysis (reflecting the fact that equality was attained in this inequality in a large number of places).
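Both halves of the duality argument in case (i) can be checked with exact rational arithmetic. The following sketch (function names are hypothetical) builds the discrete measure described above, computes the means ${\mathop{\bf E}_\mu g(kx)}$ for ${1 \leq k \leq 2^r}$, and evaluates the left-hand side of the weighted inequality on a grid of rational points; a grid check is of course only a sanity test, not a proof.

```python
from fractions import Fraction
from math import floor

def g(x):
    """Sawtooth: the representative of x mod 1 lying in (-1/2, 1/2]."""
    frac = x - floor(x)                 # fractional part in [0, 1)
    return frac if 2 * frac <= 1 else frac - 1

def optimal_measure(r):
    """Atoms (point, weight) of the measure described above for m = 2^r."""
    atoms = [(Fraction(1, 2**j), Fraction(1, r + 2)) for j in range(1, r + 1)]
    atoms.append((Fraction(1, 2**(r + 1)), Fraction(2, r + 2)))
    return atoms

def dilated_mean(atoms, k):
    """E_mu g(kx) for a discrete measure given as (point, weight) pairs."""
    return sum(w * g(k * x) for x, w in atoms)

def weighted_sum(r, x):
    """Left-hand side 2 g(x) + sum_{j=1}^r g(2^j x) of the weighted inequality."""
    return 2 * g(x) + sum(g(2**j * x) for j in range(1, r + 1))

r = 3
print([dilated_mean(optimal_measure(r), k) for k in range(1, 2**r + 1)])
```

For this measure every dilate ${1 \leq k \leq 2^r}$ has mean exactly ${\frac{1}{r+2}}$, matching the upper bound ${\frac{1}{\sum_k a_k} = \frac{1}{r+2}}$ coming from the weighted inequality.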

After solving the sawtooth problem, we became interested in the analogous question for the sine function, that is to say what is the best bound ${c^{\sin}_m}$ for the inequality

$\displaystyle \min_{1 \leq k \leq m} \mathop{\bf E}_\mu \sin(kx) \leq c^{\sin}_m.$

The left-hand side is the smallest imaginary part of the first ${m}$ Fourier coefficients of ${\mu}$. To our knowledge this quantity has not previously been studied in the Fourier analysis literature. By adopting a similar approach as for the sawtooth problem, we were able to compute this quantity exactly also:

Theorem 2 For any ${m \geq 1}$, one has

$\displaystyle c^{\sin}_m = \frac{m+1}{2 \sum_{1 \leq j \leq m: j \hbox{ odd}} \cot \frac{\pi j}{2m+2}}.$

In particular,

$\displaystyle c^{\sin}_m = \frac{\frac{\pi}{2} + o(1)}{\log m}.$
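The closed form is easy to evaluate numerically. The sketch below (floating point, with a hypothetical function name) transcribes the formula of Theorem 2 and prints ${c^{\sin}_m \log m}$ for a few values of ${m}$, to be compared with the limiting value ${\frac{\pi}{2}}$; the ${o(1)}$ term is not quantified here, so no rate of convergence should be read off from the output.

```python
import math

def c_sin(m):
    """c^sin_m = (m+1) / (2 * sum over odd j <= m of cot(pi j / (2m+2)))."""
    if m < 1:
        raise ValueError("m must be at least 1")
    cot_sum = sum(1.0 / math.tan(math.pi * j / (2 * m + 2)) for j in range(1, m + 1, 2))
    return (m + 1) / (2 * cot_sum)

for m in (1, 2, 3, 10, 100, 1000):
    scaled = c_sin(m) * math.log(m) if m > 1 else float("nan")
    print(m, c_sin(m), scaled)
```

For small ${m}$ the values are explicit: ${c^{\sin}_1 = 1}$, ${c^{\sin}_2 = \frac{\sqrt{3}}{2}}$ and ${c^{\sin}_3 = \frac{1}{\sqrt{2}}}$.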

Interestingly, a closely related cotangent sum recently appeared in this MathOverflow post. Verifying the lower bound on ${c^{\sin}_m}$ boils down to choosing the right test measure ${\mu}$; it turns out that one should pick the probability measure supported on the points ${\frac{\pi j}{m+1}}$ with ${1 \leq j \leq m}$ odd, with probability proportional to ${\cot \frac{\pi j}{2m+2}}$, and the lower bound verification eventually follows from a classical identity

$\displaystyle \frac{m+1}{2} = \sum_{1 \leq j \leq m; j \hbox{ odd}} \cot \frac{\pi j}{2m+2} \sin \frac{\pi jk}{m+1}$

for ${1 \leq k \leq m}$, first posed by Eisenstein in 1844 and proved by Stern in 1861. The upper bound arises from establishing the trigonometric inequality

$\displaystyle \frac{2}{(m+1)^2} \sum_{1 \leq k \leq m; k \hbox{ odd}} \cot \frac{\pi k}{2m+2} ( (m+1-k) \sin kx + k \sin(m+1-k)x ) \leq 1$

for all real numbers ${x}$, which to our knowledge is new; the left-hand side has a Fourier-analytic interpretation as convolving the Fejér kernel with a certain discretized square wave function, and this interpretation is used heavily in our proof of the inequality.
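Both the classical identity and the trigonometric inequality are easy to test numerically. The sketch below (floating point, hypothetical function names) evaluates the cotangent sum for all ${1 \leq k \leq m}$ and samples the left-hand side of the inequality on a grid. It places the atoms of the test measure at ${x_j = \frac{\pi j}{m+1}}$, which is the choice for which the identity computes ${\mathop{\bf E}_\mu \sin(kx)}$; a grid check is only a sanity test, not a proof.

```python
import math

def cot(t):
    return math.cos(t) / math.sin(t)

def stern_sum(m, k):
    """Right-hand side of the classical identity; should equal (m+1)/2."""
    return sum(cot(math.pi * j / (2 * m + 2)) * math.sin(math.pi * j * k / (m + 1))
               for j in range(1, m + 1, 2))

def trig_lhs(m, x):
    """Left-hand side of the trigonometric inequality at the point x."""
    total = sum(cot(math.pi * k / (2 * m + 2))
                * ((m + 1 - k) * math.sin(k * x) + k * math.sin((m + 1 - k) * x))
                for k in range(1, m + 1, 2))
    return 2 * total / (m + 1) ** 2

m = 5
print([round(stern_sum(m, k), 12) for k in range(1, m + 1)])
print(max(trig_lhs(m, 2 * math.pi * i / 2000) for i in range(2000)))
```

Equality in the inequality is attained at the atoms ${x_j}$ of the test measure, as one expects from linear programming duality, so the sampled maximum sits at (or within rounding error of) ${1}$.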

Joni Teräväinen and myself have just uploaded to the arXiv our preprint “Quantitative bounds for Gowers uniformity of the Möbius and von Mangoldt functions“. This paper makes quantitative the Gowers uniformity estimates on the Möbius function ${\mu}$ and the von Mangoldt function ${\Lambda}$.

To discuss the results we first discuss the situation of the Möbius function, which is technically simpler in some (though not all) ways. We assume familiarity with Gowers norms and standard notations around these norms, such as the averaging notation ${\mathop{\bf E}_{n \in [N]}}$ and the exponential notation ${e(\theta) = e^{2\pi i \theta}}$. The prime number theorem in qualitative form asserts that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) = o(1)$

as ${N \rightarrow \infty}$. With Vinogradov-Korobov error term, the prime number theorem is strengthened to

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} );$

we refer to such decay bounds (with ${\exp(-c\log^c N)}$ type factors) as pseudopolynomial decay. Equivalently, we obtain pseudopolynomial decay of the Gowers ${U^1}$ seminorm of ${\mu}$:

$\displaystyle \| \mu \|_{U^1([N])} \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} ).$

As is well known, the Riemann hypothesis would be equivalent to an upgrade of this estimate to polynomial decay of the form

$\displaystyle \| \mu \|_{U^1([N])} \ll_\varepsilon N^{-1/2+\varepsilon}$

for any ${\varepsilon>0}$.
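To get a feel for the size of these averages, here is a minimal sketch that sieves the Möbius function and computes the mean ${\mathop{\bf E}_{n \in [N]} \mu(n)}$, whose absolute value is the quantity controlled by the ${U^1}$ bounds above. The function names are hypothetical, and small-scale numerics of this kind say nothing about the asymptotic decay rates.

```python
def mobius_sieve(limit):
    """Return a list mu with mu[n] equal to the Mobius function for 0 < n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for multiple in range(p, limit + 1, p):
            if multiple > p:
                is_prime[multiple] = False
            mu[multiple] = -mu[multiple]      # one sign flip per prime factor
        for multiple in range(p * p, limit + 1, p * p):
            mu[multiple] = 0                  # not squarefree
    return mu

def mean_mobius(N):
    """The average of mu(n) over n = 1, ..., N."""
    mu = mobius_sieve(N)
    return sum(mu[1:N + 1]) / N

for N in (10, 100, 1000, 10000):
    print(N, mean_mobius(N))
```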

Once one restricts to arithmetic progressions, the situation gets worse: the Siegel-Walfisz theorem gives the bound

$\displaystyle \| \mu 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll_A \log^{-A} N \ \ \ \ \ (1)$

for any residue class ${a \hbox{ mod } q}$ and any ${A>0}$, but with the catch that the implied constant is ineffective in ${A}$. This ineffectivity cannot be removed without further progress on the notorious Siegel zero problem.

In 1937, Davenport was able to show the discorrelation estimate

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) e(-\alpha n) \ll_A \log^{-A} N$

for any ${A>0}$ uniformly in ${\alpha \in {\bf R}}$, which leads (by standard Fourier arguments) to the Fourier uniformity estimate

$\displaystyle \| \mu \|_{U^2([N])} \ll_A \log^{-A} N.$

Again, the implied constant is ineffective. If one insists on effective constants, the best bound currently available is

$\displaystyle \| \mu \|_{U^2([N])} \ll \log^{-c} N \ \ \ \ \ (2)$

for some small effective constant ${c>0}$.
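The discorrelation estimate can likewise be explored numerically. The sketch below computes ${\mathop{\bf E}_{n \in [N]} \mu(n) e(-\alpha n)}$ and takes the maximum modulus over a grid of frequencies ${\alpha}$; the grid size and the function names are arbitrary choices made for illustration, and a grid maximum only approximates the supremum over all ${\alpha \in {\bf R}}$.

```python
import cmath

def mobius_sieve(limit):
    """Return a list mu with mu[n] equal to the Mobius function for 0 < n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for multiple in range(p, limit + 1, p):
            if multiple > p:
                is_prime[multiple] = False
            mu[multiple] = -mu[multiple]
        for multiple in range(p * p, limit + 1, p * p):
            mu[multiple] = 0
    return mu

def twisted_mean(mu, N, alpha):
    """E_{n in [N]} mu(n) e(-alpha n), with e(t) = exp(2 pi i t)."""
    return sum(mu[n] * cmath.exp(-2j * cmath.pi * alpha * n) for n in range(1, N + 1)) / N

def max_twisted_mean(N, grid=200):
    """Largest |E mu(n) e(-alpha n)| over the frequencies alpha = a / grid."""
    mu = mobius_sieve(N)
    return max(abs(twisted_mean(mu, N, a / grid)) for a in range(grid))

print(max_twisted_mean(2000))
```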

For the situation with the ${U^3}$ norm the previously known results were much weaker. Ben Green and I showed that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \ll_{A,F,G/\Gamma} \log^{-A} N \ \ \ \ \ (3)$

uniformly for any ${A>0}$, any degree two (filtered) nilmanifold ${G/\Gamma}$, any polynomial sequence ${g: {\bf Z} \rightarrow G}$, and any Lipschitz function ${F}$; again, the implied constants are ineffective. On the other hand, in a separate paper of Ben Green and myself, we established the following inverse theorem: if for instance we knew that

$\displaystyle \| \mu \|_{U^3([N])} \geq \delta$

for some ${0 < \delta < 1/2}$, then there exists a degree two nilmanifold ${G/\Gamma}$ of dimension ${O( \delta^{-O(1)} )}$, complexity ${O( \delta^{-O(1)} )}$, a polynomial sequence ${g: {\bf Z} \rightarrow G}$, and Lipschitz function ${F}$ of Lipschitz constant ${O(\delta^{-O(1)})}$ such that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \gg \exp(-\delta^{-O(1)}).$

Putting the two assertions together and comparing all the dependencies on parameters, one can establish the qualitative decay bound

$\displaystyle \| \mu \|_{U^3([N])} = o(1).$

However the decay rate ${o(1)}$ produced by this argument is completely ineffective: obtaining a bound on when this ${o(1)}$ quantity dips below a given threshold ${\delta}$ depends on the implied constant in (3) for some ${G/\Gamma}$ whose dimension depends on ${\delta}$, and the dependence on ${\delta}$ obtained in this fashion is ineffective in the face of a Siegel zero.

For higher norms ${U^k, k \geq 4}$, the situation is even worse, because the quantitative inverse theory for these norms is poorer, and indeed it was only with the recent work of Manners that any such bound is available at all (at least for ${k>4}$). Basically, Manners establishes that if

$\displaystyle \| \mu \|_{U^k([N])} \geq \delta$

then there exists a degree ${k-1}$ nilmanifold ${G/\Gamma}$ of dimension ${O( \delta^{-O(1)} )}$, complexity ${O( \exp\exp(\delta^{-O(1)}) )}$, a polynomial sequence ${g: {\bf Z} \rightarrow G}$, and Lipschitz function ${F}$ of Lipschitz constant ${O(\exp\exp(\delta^{-O(1)}))}$ such that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \gg \exp(-\exp(\delta^{-O(1)})).$

(We allow all implied constants to depend on ${k}$.) Meanwhile, the bound (3) was extended to arbitrary nilmanifolds by Ben and myself. Again, the two results when concatenated give the qualitative decay

$\displaystyle \| \mu \|_{U^k([N])} = o(1)$

but the decay rate is completely ineffective.

Our first result gives an effective decay bound:

Theorem 1 For any ${k \geq 2}$, we have ${\| \mu \|_{U^k([N])} \ll (\log\log N)^{-c_k}}$ for some ${c_k>0}$. The implied constants are effective.

This is off by a logarithm from the best effective bound (2) in the ${k=2}$ case. In the ${k=3}$ case there is some hope to remove this logarithm based on the improved quantitative inverse theory currently available in this case, but there is a technical obstruction to doing so which we will discuss later in this post. For ${k>3}$ the above bound is the best one could hope to achieve purely using the quantitative inverse theory of Manners.

We have analogues of all the above results for the von Mangoldt function ${\Lambda}$. Here a complication arises that ${\Lambda}$ does not have mean close to zero, and one has to subtract off some suitable approximant ${\Lambda^\sharp}$ to ${\Lambda}$ before one would expect good Gowers norms bounds. For the prime number theorem one can just use the approximant ${1}$, giving

$\displaystyle \| \Lambda - 1 \|_{U^1([N])} \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} )$

but even for the prime number theorem in arithmetic progressions one needs a more accurate approximant. In our paper it is convenient to use the “Cramér approximant”

$\displaystyle \Lambda_{\hbox{Cram\'er}}(n) := \frac{W}{\phi(W)} 1_{(n,W)=1}$

where

$\displaystyle W := \prod_{p < Q} p$

and ${Q}$ is the quasipolynomial quantity

$\displaystyle Q = \exp(\log^{1/10} N). \ \ \ \ \ (4)$
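As a quick sanity check on the normalisation ${\frac{W}{\phi(W)}}$, the following Python sketch evaluates the Cramér approximant for a toy cutoff and confirms that it has mean exactly ${1}$ over a full period ${W}$. It assumes that ${W}$ is the product of the primes below ${Q}$; the cutoff `Q_TOY = 12` is purely illustrative (far smaller than the choice (4)), and the helper names are hypothetical.

```python
from fractions import Fraction
from math import gcd

def primes_below(q):
    """Primes p < q by trial division (fine for toy values of q)."""
    return [p for p in range(2, q)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def cramer_approximant(q):
    """Return (W, f), where W is the product of the primes p < q and
    f(n) = W/phi(W) if gcd(n, W) = 1, and f(n) = 0 otherwise."""
    W = 1
    for p in primes_below(q):
        W *= p
    # Euler's totient of W, counted directly (W is small here).
    phi_W = sum(1 for n in range(1, W + 1) if gcd(n, W) == 1)
    weight = Fraction(W, phi_W)
    return W, (lambda n: weight if gcd(n, W) == 1 else Fraction(0))

Q_TOY = 12  # illustrative assumption; the post takes Q = exp(log^{1/10} N)
W, lam_cramer = cramer_approximant(Q_TOY)
mean_over_period = sum(lam_cramer(n) for n in range(1, W + 1)) / W
```

The exact rational arithmetic makes the mean-one property visible without rounding: the approximant is supported on the ${\phi(W)}$ residues coprime to ${W}$, each carrying weight ${W/\phi(W)}$.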

Then one can show from the Siegel-Walfisz theorem and standard bilinear sum methods that

$\displaystyle \mathop{\bf E}_{n \in [N]} (\Lambda - \Lambda_{\hbox{Cram\'er}})(n) e(-\alpha n) \ll_A \log^{-A} N$

and

$\displaystyle \| \Lambda - \Lambda_{\hbox{Cram\'er}}\|_{U^2([N])} \ll_A \log^{-A} N$

for all ${A>0}$ and ${\alpha \in {\bf R}}$ (with an ineffective dependence on ${A}$), again regaining effectivity if ${A}$ is replaced by a sufficiently small constant ${c>0}$. All the previously stated discorrelation and Gowers uniformity results for ${\mu}$ then have analogues for ${\Lambda}$, and our main result is similarly analogous:

Theorem 2 For any ${k \geq 2}$, we have ${\| \Lambda - \Lambda_{\hbox{Cram\'er}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}}$ for some ${c_k>0}$. The implied constants are effective.

By standard methods, this result also gives quantitative asymptotics for counting solutions to various systems of linear equations in primes, with error terms that gain a factor of ${O((\log\log N)^{-c})}$ with respect to the main term.

We now discuss the methods of proof, focusing first on the case of the Möbius function. Suppose first that there is no “Siegel zero”, by which we mean a quadratic character ${\chi}$ of some conductor ${q \leq Q}$ with a zero ${L(\beta,\chi)}$ with ${1 - \beta \leq \frac{c}{\log Q}}$ for some small absolute constant ${c>0}$. In this case the Siegel-Walfisz bound (1) improves to a quasipolynomial bound

$\displaystyle \| \mu 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N). \ \ \ \ \ (5)$

To establish Theorem 1 in this case, it suffices by Manners’ inverse theorem to establish the pseudopolynomial bound

$\displaystyle \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N) \ \ \ \ \ (6)$

for all degree ${k-1}$ nilmanifolds ${G/\Gamma}$ of dimension ${O((\log\log N)^c)}$ and complexity ${O( \exp(\log^c N))}$, all polynomial sequences ${g}$, and all Lipschitz functions ${F}$ of norm ${O( \exp(\log^c N))}$. If the nilmanifold ${G/\Gamma}$ had bounded dimension, then one could repeat the arguments of Ben and myself more or less verbatim to establish this claim from (5), which relied on the quantitative equidistribution theory on nilmanifolds developed in a separate paper of Ben and myself. Unfortunately, in the latter paper the dependence of the quantitative bounds on the dimension ${d}$ was not explicitly given. In an appendix to the current paper, we go through that paper to account for this dependence, showing that all exponents depend at most doubly exponentially in the dimension ${d}$, which is barely sufficient to handle the dimension of ${O((\log\log N)^c)}$ that arises here.

Now suppose we have a Siegel zero ${L(\beta,\chi)}$. In this case the bound (5) will not hold in general, and hence also (6) will not hold either. Here, the usual way out (while still maintaining effective estimates) is to approximate ${\mu}$ not by ${0}$, but rather by a more complicated approximant ${\mu_{\hbox{Siegel}}}$ that takes the Siegel zero into account, and in particular is such that one has the (effective) pseudopolynomial bound

$\displaystyle \| (\mu - \mu_{\hbox{Siegel}}) 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N) \ \ \ \ \ (7)$

for all residue classes ${a \hbox{ mod } q}$. The Siegel approximant to ${\mu}$ is actually a little bit complicated, and to our knowledge this sort of approximant first appears as late as this 2010 paper of Germán and Kátai. Our version of this approximant is defined as the multiplicative function such that

$\displaystyle \mu_{\hbox{Siegel}}(p^j) = \mu(p^j)$

when ${p < Q}$, and

$\displaystyle \mu_{\hbox{Siegel}}(n) = \alpha n^{\beta-1} \chi(n)$

when ${n}$ is coprime to all primes ${p < Q}$, and ${\alpha}$ is a normalising constant given by the formula

$\displaystyle \alpha := \frac{1}{L'(\beta,\chi)} \prod_{p < Q} (1-\frac{1}{p})^{-1} (1 - \frac{\chi(p)}{p^\beta})^{-1}$

(this constant ends up being of size ${O(1)}$ and plays only a minor role in the analysis). This is a rather complicated formula, but it seems to be virtually the only choice of approximant that allows for bounds such as (7) to hold. (This is the one aspect of the problem where the von Mangoldt theory is simpler than the Möbius theory, as in the former one only needs to work with very rough numbers for which one does not need to make any special accommodations for the behavior at small primes when introducing the Siegel correction term.) With this starting point it is then possible to repeat the analysis of my previous papers with Ben and obtain the pseudopolynomial discorrelation bound

$\displaystyle \mathop{\bf E}_{n \in [N]} (\mu - \mu_{\hbox{Siegel}})(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N)$

for ${F(g(n)\Gamma)}$ as before, which when combined with Manners’ inverse theorem gives the doubly logarithmic bound

$\displaystyle \| \mu - \mu_{\hbox{Siegel}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}.$
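To see heuristically where the double logarithm comes from, here is a rough sketch; it suppresses all transference and boundedness issues, and it treats the unspecified exponents in the inverse theorem as a single constant ${C}$, which is an assumption made purely for illustration. If ${\| \mu - \mu_{\hbox{Siegel}} \|_{U^k([N])} \geq \delta}$, the inverse theorem produces a nilsequence of complexity ${\exp\exp(\delta^{-C})}$ that correlates with ${\mu - \mu_{\hbox{Siegel}}}$ to extent ${\gg \exp(-\exp(\delta^{-C}))}$, while the discorrelation bound caps any such correlation at ${\exp(-\log^c N)}$ as long as the complexity stays below ${\exp(\log^c N)}$. These two statements are compatible only if

$\displaystyle \exp(\delta^{-C}) \gg \log^c N, \quad \hbox{that is to say} \quad \delta \ll (\log\log N)^{-1/C},$

so each of the two exponentials in the inverse theorem costs one logarithm in the final bound.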

Meanwhile, a direct sieve-theoretic computation ends up giving the singly logarithmic bound

$\displaystyle \| \mu_{\hbox{Siegel}} \|_{U^k([N])} \ll \log^{-c_k} N$

(indeed, there is a good chance that one could improve the bounds even further, though it is not helpful for this current argument to do so). Theorem 1 then follows from the triangle inequality for the Gowers norm. It is interesting that the Siegel approximant ${\mu_{\hbox{Siegel}}}$ seems to be a rather essential component of the proof, even if it is absent in the final statement. We note that this approximant seems to be a useful tool to explore the “illusory world” of the Siegel zero further; see for instance the recent paper of Chinis for some work in this direction.

For the analogous problem with the von Mangoldt function (assuming a Siegel zero for sake of discussion), the approximant ${\Lambda_{\hbox{Siegel}}}$ is simpler; we ended up using

$\displaystyle \Lambda_{\hbox{Siegel}}(n) = \Lambda_{\hbox{Cram\'er}}(n) (1 - n^{\beta-1} \chi(n))$

which allows one to state the standard prime number theorem in arithmetic progressions with classical error term and Siegel zero term compactly as

$\displaystyle \| (\Lambda - \Lambda_{\hbox{Siegel}}) 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N).$

Routine modifications of previous arguments also give

$\displaystyle \mathop{\bf E}_{n \in [N]} (\Lambda - \Lambda_{\hbox{Siegel}})(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N) \ \ \ \ \ (8)$

and

$\displaystyle \| \Lambda_{\hbox{Siegel}} \|_{U^k([N])} \ll \log^{-c_k} N.$

The one tricky new step is getting from the discorrelation estimate (8) to the Gowers uniformity estimate

$\displaystyle \| \Lambda - \Lambda_{\hbox{Siegel}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}.$

One cannot directly apply Manners’ inverse theorem here because ${\Lambda}$ and ${\Lambda_{\hbox{Siegel}}}$ are unbounded. There is a standard tool for getting around this issue, now known as the dense model theorem, which is the standard engine powering the transference principle from theorems about bounded functions to theorems about certain types of unbounded functions. However the quantitative versions of the dense model theorem in the literature are expensive and would basically weaken the doubly logarithmic gain here to a triply logarithmic one. Instead, we bypass the dense model theorem and directly transfer the inverse theorem for bounded functions to an inverse theorem for unbounded functions by using the densification approach to transference introduced by Conlon, Fox, and Zhao. This technique turns out to be quantitatively quite efficient (the dependencies of the main parameters in the transference are polynomial in nature), and also has the technical advantage of avoiding the somewhat tricky “correlation condition” present in early transference results, which is also not beneficial for quantitative bounds.

In principle, the above results can be improved for ${k=3}$ due to the stronger quantitative inverse theorems in the ${U^3}$ setting. However, there is a bottleneck that prevents us from achieving this, namely that the equidistribution theory of two-step nilmanifolds has exponents which are exponential in the dimension rather than polynomial in the dimension, and as a consequence we were unable to improve upon the doubly logarithmic results. Specifically, if one is given a sequence of bracket quadratics such as ${\lfloor \alpha_1 n \rfloor \beta_1 n, \dots, \lfloor \alpha_d n \rfloor \beta_d n}$ that fails to be ${\delta}$-equidistributed, one would need to establish a nontrivial linear relationship modulo 1 between the ${\alpha_1,\beta_1,\dots,\alpha_d,\beta_d}$ (up to errors of ${O(1/N)}$), where the coefficients are of size ${O(\delta^{-d^{O(1)}})}$; current methods only give coefficient bounds of the form ${O(\delta^{-\exp(d^{O(1)})})}$. An old result of Schmidt demonstrates proof of concept that these sorts of polynomial dependencies on exponents are possible in principle, but actually implementing Schmidt’s methods here seems to be a quite non-trivial task. There is also another possible route to removing a logarithm, which is to strengthen the inverse ${U^3}$ theorem to make the dimension of the nilmanifold logarithmic in the uniformity parameter ${\delta}$ rather than polynomial. Again, the Freiman-Bilu theorem (see for instance this paper of Ben and myself) demonstrates proof of concept that such an improvement in dimension is possible, but some work would be needed to implement it.

Kaisa Matomäki, Maksym Radziwill, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Singmaster’s conjecture in the interior of Pascal’s triangle“. This paper leverages the theory of exponential sums over primes to make progress on a well known conjecture of Singmaster which asserts that any natural number larger than ${1}$ appears at most a bounded number of times in Pascal’s triangle. That is to say, for any integer ${t \geq 2}$, there are at most ${O(1)}$ solutions to the equation

$\displaystyle \binom{n}{m} = t \ \ \ \ \ (1)$

with ${1 \leq m < n}$. Currently, the largest number of solutions that is known to be attainable is eight, with ${t}$ equal to

$\displaystyle 3003 = \binom{3003}{1} = \binom{78}{2} = \binom{15}{5} = \binom{14}{6} = \binom{14}{8} = \binom{15}{10}$

$\displaystyle = \binom{78}{76} = \binom{3003}{3002}.$
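The eight representations of ${3003}$ can be confirmed by brute force. The following Python sketch searches the left half ${m \leq n/2}$ column by column and then reflects each solution using the symmetry of the triangle; the function name `solutions` is a hypothetical helper, not something taken from the paper.

```python
from math import comb

def solutions(t):
    """All pairs (n, m) with 1 <= m < n and binom(n, m) = t, for t >= 2.

    In the left half m <= n/2, binom(n, m) is increasing in n for fixed m
    and is at least binom(2m, m), so only finitely many columns m matter.
    """
    found = []
    m = 1
    while comb(2 * m, m) <= t:
        n = 2 * m
        while comb(n, m) < t:
            n += 1
        if comb(n, m) == t:
            found.append((n, m))
            if n - m != m:  # add the mirror image binom(n, n - m)
                found.append((n, n - m))
        m += 1
    return sorted(found)

occurrences_of_3003 = solutions(3003)
```

For small ${t}$ this is instantaneous; the linear scan in the column ${m=1}$ is the only slow part, and could be replaced by a bisection if larger values of ${t}$ were of interest.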

Because of the symmetry ${\binom{n}{m} = \binom{n}{n-m}}$ of Pascal’s triangle it is natural to restrict attention to the left half ${1 \leq m \leq n/2}$ of the triangle.

Our main result settles this conjecture in the “interior” region of the triangle:

Theorem 1 (Singmaster’s conjecture in the interior of the triangle) If ${0 < \varepsilon < 1}$ and ${t}$ is sufficiently large depending on ${\varepsilon}$, there are at most two solutions to (1) in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/2 \ \ \ \ \ (2)$

and hence at most four in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n - \exp( \log^{2/3+\varepsilon} n ).$

Also, there is at most one solution in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/\exp(\log^{1-\varepsilon} n ).$

To verify Singmaster’s conjecture in full, it thus suffices in view of this result to verify the conjecture in the boundary region

$\displaystyle 2 \leq m < \exp(\log^{2/3+\varepsilon} n) \ \ \ \ \ (3)$

(or equivalently ${n - \exp(\log^{2/3+\varepsilon} n) < m \leq n}$); we have deleted the ${m=1}$ case as it of course automatically supplies exactly one solution to (1). It is in fact possible that for ${t}$ sufficiently large there are no further collisions ${\binom{n}{m} = \binom{n'}{m'}=t}$ for ${(n,m), (n',m')}$ in the region (3), in which case there would never be more than eight solutions to (1) for sufficiently large ${t}$. This latter claim is known for bounded values of ${m,m'}$ by Beukers, Shorey, and Tijdeman, with the main tool used being Siegel’s theorem on integral points.

The upper bound of two here for the number of solutions in the region (2) is best possible, due to the infinite family of solutions to the equation

$\displaystyle \binom{n+1}{m+1} = \binom{n}{m+2} \ \ \ \ \ (4)$

coming from ${n = F_{2j+2} F_{2j+3}-1}$, ${m = F_{2j} F_{2j+3}-1}$, where ${F_j}$ is the ${j^{th}}$ Fibonacci number.
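The first few members of this Fibonacci family are easy to verify directly. The sketch below assumes the indexing ${F_1 = F_2 = 1}$ (so that ${j=1}$ recovers the collision ${\binom{15}{5} = \binom{14}{6} = 3003}$); the function names are hypothetical.

```python
from math import comb

def fib(j):
    """j-th Fibonacci number with F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(j):
        a, b = b, a + b
    return a

def fibonacci_collision(j):
    """Return (n, m, binom(n+1, m+1), binom(n, m+2)) for the j-th member
    of the family n = F_{2j+2} F_{2j+3} - 1, m = F_{2j} F_{2j+3} - 1."""
    n = fib(2 * j + 2) * fib(2 * j + 3) - 1
    m = fib(2 * j) * fib(2 * j + 3) - 1
    return n, m, comb(n + 1, m + 1), comb(n, m + 2)

first_member = fibonacci_collision(1)
```

The binomial coefficients in this family grow very quickly with ${j}$, so only small values of ${j}$ are practical to check this way.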

The appearance of the quantity ${\exp( \log^{2/3+\varepsilon} n )}$ in Theorem 1 may be familiar to readers that are acquainted with Vinogradov’s bounds on exponential sums, which end up being the main new ingredient in our arguments. In principle this threshold could be lowered if we had stronger bounds on exponential sums.

To try to control solutions to (1) we use a combination of “Archimedean” and “non-Archimedean” approaches. In the “Archimedean” approach (following earlier work of Kane on this problem) we view ${n,m}$ primarily as real numbers rather than integers, and express (1) in terms of the Gamma function as

$\displaystyle \frac{\Gamma(n+1)}{\Gamma(m+1) \Gamma(n-m+1)} = t.$

One can use this equation to solve for ${n}$ in terms of ${m,t}$ as

$\displaystyle n = f_t(m)$

for a certain real analytic function ${f_t}$ whose asymptotics are easily computable (for instance one has the asymptotic ${f_t(m) \asymp m t^{1/m}}$). One can then view the problem as one of trying to control the number of lattice points on the graph ${\{ (m,f_t(m)): m \in {\bf R} \}}$. Here we can take advantage of the fact that in the regime ${m \leq f_t(m)/2}$ (which corresponds to working in the left half ${m \leq n/2}$ of Pascal’s triangle), the function ${f_t}$ can be shown to be convex, but not too convex, in the sense that one has both upper and lower bounds on the second derivative of ${f_t}$ (in fact one can show that ${f''_t(m) \asymp f_t(m) (\log t/m^2)^2}$). This can be used to preclude the possibility of having a cluster of three or more nearby lattice points on the graph ${\{ (m,f_t(m)): m \in {\bf R} \}}$, basically because the area subtended by the triangle connecting three of these points would lie strictly between ${0}$ and ${1/2}$, contradicting Pick’s theorem. Developing these ideas, we were able to show

Proposition 2 Let ${\varepsilon>0}$, and suppose ${t}$ is sufficiently large depending on ${\varepsilon}$. If ${(m,n)}$ is a solution to (1) in the left half ${m \leq n/2}$ of Pascal’s triangle, then there is at most one other solution ${(m',n')}$ to this equation in the left half with

$\displaystyle |m-m'| + |n-n'| \ll \exp( (\log\log t)^{1-\varepsilon} ).$

Again, the example of (4) shows that a cluster of two solutions is certainly possible; the convexity argument only kicks in once one has a cluster of three or more solutions.
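The real-variable function ${f_t}$ from the Archimedean approach can be evaluated numerically. The following is a minimal sketch, assuming one simply solves the Gamma function equation for ${n}$ by bisection on a logarithmic scale (the paper works with asymptotics rather than numerics, and the names `log_binom` and `f_t` are hypothetical).

```python
from math import lgamma, log

def log_binom(n, m):
    """log of Gamma(n+1) / (Gamma(m+1) Gamma(n-m+1)) for real n >= m >= 0."""
    return lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)

def f_t(t, m, iterations=200):
    """Solve log_binom(n, m) = log t for real n >= m by bisection.

    For fixed m > 0 the left-hand side is increasing in n and vanishes
    at n = m, so for t > 1 there is a unique solution.
    """
    target = log(t)
    lo, hi = float(m), 2.0 * m + 2.0
    while log_binom(hi, m) < target:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if log_binom(mid, m) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A lattice point on the graph of ${f_t}$ is then a value of ${m}$ for which `f_t(t, m)` is an integer; for ${t = 3003}$ this happens at ${m = 1, 2, 5, 6}$ in the left half of the triangle.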

To finish the proof of Theorem 1, one has to show that any two solutions ${(m,n), (m',n')}$ to (1) in the region of interest must be close enough for the above proposition to apply. Here we switch to the “non-Archimedean” approach, in which we look at the ${p}$-adic valuations ${\nu_p( \binom{n}{m} )}$ of the binomial coefficients, defined as the number of times a prime ${p}$ divides ${\binom{n}{m}}$. From the fundamental theorem of arithmetic, a collision

$\displaystyle \binom{n}{m} = \binom{n'}{m'}$

between binomial coefficients occurs if and only if, for every prime ${p}$, one has agreement of valuations

$\displaystyle \nu_p( \binom{n}{m} ) = \nu_p( \binom{n'}{m'} ). \ \ \ \ \ (5)$

From the Legendre formula

$\displaystyle \nu_p(n!) = \sum_{j=1}^\infty \lfloor \frac{n}{p^j} \rfloor$

we can rewrite this latter identity (5) as

$\displaystyle \sum_{j=1}^\infty \{ \frac{m}{p^j} \} + \{ \frac{n-m}{p^j} \} - \{ \frac{n}{p^j} \} = \sum_{j=1}^\infty \{ \frac{m'}{p^j} \} + \{ \frac{n'-m'}{p^j} \} - \{ \frac{n'}{p^j} \}, \ \ \ \ \ (6)$

where ${\{x\} := x - \lfloor x\rfloor}$ denotes the fractional part of ${x}$. (These sums are not truly infinite, because the summands vanish once ${p^j}$ is larger than ${\max(n,n')}$.)
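The passage from the Legendre formula to the fractional part expression in (6) can be checked with exact arithmetic. The sketch below compares the valuation of ${\binom{n}{m}}$ computed by repeated division with the sum of fractional parts on one side of (6); the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def nu_p_direct(x, p):
    """Number of times the prime p divides the positive integer x."""
    count = 0
    while x % p == 0:
        x //= p
        count += 1
    return count

def frac(a, b):
    """Exact fractional part {a/b} for integers a >= 0, b > 0."""
    return Fraction(a % b, b)

def nu_p_binomial(n, m, p):
    """nu_p(binom(n, m)) as the sum over j of
    {m/p^j} + {(n-m)/p^j} - {n/p^j}, as in (6)."""
    total = Fraction(0)
    q = p
    while q <= n:  # the summands vanish once p^j exceeds n
        total += frac(m, q) + frac(n - m, q) - frac(n, q)
        q *= p
    return total
```

For example, the collision ${\binom{15}{5} = \binom{14}{6}}$ forces `nu_p_binomial(15, 5, p)` and `nu_p_binomial(14, 6, p)` to agree for every prime `p`.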

A key idea in our approach is to view this condition (6) statistically, for instance by viewing ${p}$ as a prime drawn randomly from an interval such as ${[P, P + P \log^{-100} P]}$ for some suitably chosen scale parameter ${P}$, so that the two sides of (6) now become random variables. It then becomes advantageous to compare correlations between these two random variables and some additional test random variable. For instance, if ${n}$ and ${n'}$ are far apart from each other, then one would expect the left-hand side of (6) to have a higher correlation with the fractional part ${\{ \frac{n}{p}\}}$, since this term shows up in the summation on the left-hand side but not the right. Similarly if ${m}$ and ${m'}$ are far apart from each other (although there are some annoying cases one has to treat separately when there is some “unexpected commensurability”, for instance if ${n'-m'}$ is a rational multiple of ${m}$ where the rational has bounded numerator and denominator). In order to execute this strategy, it turns out (after some standard Fourier expansion) that one needs to get good control on exponential sums such as

$\displaystyle \sum_{P \leq p \leq P + P\log^{-100} P} e( \frac{N}{p} + \frac{M}{p^j} )$

for various choices of parameters ${P, N, M, j}$, where ${e(\theta) := e^{2\pi i \theta}}$. Fortunately, the methods of Vinogradov (which more generally can handle sums such as ${\sum_{n \in I} e(f(n))}$ and ${\sum_{p \in I} e(f(p))}$ for various analytic functions ${f}$) can give useful bounds on such sums as long as ${N}$ and ${M}$ are not too large compared to ${P}$; more specifically, Vinogradov’s estimates are non-trivial in the regime ${N,M \ll \exp( \log^{3/2-\varepsilon} P )}$, and this ultimately leads to a distance bound

$\displaystyle m' - m \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )$

between any colliding pair ${(n,m), (n',m')}$ in the left half of Pascal’s triangle, as well as the variant bound

$\displaystyle n' - n \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )$

under the additional assumption

$\displaystyle m', m \geq \exp( \log^{2/3 +\varepsilon}(n+n') ).$

Comparing these bounds with Proposition 2 and using some basic estimates about the function ${f_t}$, we can conclude Theorem 1.

A modification of the arguments also gives similar results for the equation

$\displaystyle (n)_m = t \ \ \ \ \ (7)$

where ${(n)_m := n (n-1) \dots (n-m+1)}$ is the falling factorial:

Theorem 3 If ${0 < \varepsilon < 1}$ and ${t}$ is sufficiently large depending on ${\varepsilon}$, there are at most two solutions to (7) in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m < n. \ \ \ \ \ (8)$

Again the upper bound of two is best possible, thanks to identities such as

$\displaystyle (a^2-a)_{a^2-2a} = (a^2-a-1)_{a^2-2a+1}.$
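This identity is easy to test numerically, since `math.perm(n, m)` in the Python standard library computes exactly the falling factorial ${(n)_m = n!/(n-m)!}$. A short sketch (the function name is hypothetical):

```python
from math import perm

def falling_factorial_collision(a):
    """Both sides of (a^2 - a)_{a^2 - 2a} = (a^2 - a - 1)_{a^2 - 2a + 1}."""
    left = perm(a * a - a, a * a - 2 * a)
    right = perm(a * a - a - 1, a * a - 2 * a + 1)
    return left, right

left, right = falling_factorial_collision(3)
```

Both sides equal ${(a^2-a)!/a!}$, which is the underlying reason for the identity.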

Marcel Filoche, Svitlana Mayboroda, and I have just uploaded to the arXiv our preprint “The effective potential of an ${M}$-matrix“. This paper explores the analogue of the effective potential of Schrödinger operators ${-\Delta + V}$ provided by the “landscape function” ${u}$, when one works with a certain type of self-adjoint matrix known as an ${M}$-matrix instead of a Schrödinger operator.

Suppose one has an eigenfunction

$\displaystyle (-\Delta + V) \phi = E \phi$

of a Schrödinger operator ${-\Delta+V}$, where ${\Delta}$ is the Laplacian on ${{\bf R}^d}$, ${V: {\bf R}^d \rightarrow {\bf R}}$ is a potential, and ${E}$ is an energy. Where would one expect the eigenfunction ${\phi}$ to be concentrated? If the potential ${V}$ is smooth and slowly varying, the correspondence principle suggests that the eigenfunction ${\phi}$ should be mostly concentrated in the potential energy wells ${\{ x: V(x) \leq E \}}$, with an exponentially decaying amount of tunnelling between the wells. One way to rigorously establish such an exponential decay is through an argument of Agmon, which we will sketch later in this post, which gives an exponentially decaying upper bound (in an ${L^2}$ sense) of eigenfunctions ${\phi}$ in terms of the distance to the wells ${\{ V \leq E \}}$ with respect to a certain “Agmon metric” on ${{\bf R}^d}$ determined by the potential ${V}$ and energy level ${E}$ (or any upper bound ${\overline{E}}$ on this energy). Similar exponential decay results can also be obtained for discrete Schrödinger matrix models, in which the domain ${{\bf R}^d}$ is replaced with a discrete set such as the lattice ${{\bf Z}^d}$, and the Laplacian ${\Delta}$ is replaced by a discrete analogue such as a graph Laplacian.

When the potential ${V}$ is very “rough”, as occurs for instance in the random potentials arising in the theory of Anderson localisation, the Agmon bounds, while still true, become very weak because the wells ${\{ V \leq E \}}$ are dispersed in a fairly dense fashion throughout the domain ${{\bf R}^d}$, and the eigenfunction can tunnel relatively easily between different wells. However, as was first discovered in 2012 by my two coauthors, in these situations one can replace the rough potential ${V}$ by a smoother effective potential ${1/u}$, with the eigenfunctions typically localised to a single connected component of the effective wells ${\{ 1/u \leq E \}}$. More precisely, one first locates the landscape function ${u}$, which is the solution to the equation ${(-\Delta + V) u = 1}$ with reasonable behavior at infinity, and which is non-negative by the maximum principle; the reciprocal ${1/u}$ of this landscape function then serves as the effective potential.

There are now several explanations for why this particular choice ${1/u}$ is a good effective potential. Perhaps the simplest (as found for instance in this recent paper of Arnold, David, Jerison, and my two coauthors) is the following observation: if ${\phi}$ is an eigenvector for ${-\Delta+V}$ with energy ${E}$, then ${\phi/u}$ is an eigenvector for ${-\frac{1}{u^2} \mathrm{div}(u^2 \nabla \cdot) + \frac{1}{u}}$ with the same energy ${E}$, thus the original Schrödinger operator ${-\Delta+V}$ is conjugate to a (variable coefficient, but still in divergence form) Schrödinger operator with potential ${1/u}$ instead of ${V}$. Closely related to this, we have the integration by parts identity

$\displaystyle \int_{{\bf R}^d} |\nabla f|^2 + V |f|^2\ dx = \int_{{\bf R}^d} u^2 |\nabla(f/u)|^2 + \frac{1}{u} |f|^2\ dx \ \ \ \ \ (1)$

for any reasonable function ${f}$, thus again highlighting the emergence of the effective potential ${1/u}$.

These particular explanations seem rather specific to the Schrödinger equation (continuous or discrete); we have for instance not been able to find similar identities to explain an effective potential for the bi-Schrödinger operator ${\Delta^2 + V}$.

In this paper, we demonstrate the (perhaps surprising) fact that effective potentials continue to exist for operators that bear very little resemblance to Schrödinger operators. Our chosen model is that of an ${M}$-matrix: self-adjoint positive definite matrices ${A}$ whose off-diagonal entries are non-positive. This model includes discrete Schrödinger operators (with non-negative potentials) but can allow for significantly more non-local interactions. The analogue of the landscape function would then be the vector ${u := A^{-1} 1}$, where ${1}$ denotes the vector with all entries ${1}$. Our main result, roughly speaking, asserts that an eigenvector ${A \phi = E \phi}$ of ${A}$ will then be exponentially localised to the “potential wells” ${K := \{ j: \frac{1}{u_j} \leq E \}}$, where ${u_j}$ denotes the coordinates of the landscape function ${u}$. In particular, we establish the inequality

$\displaystyle \sum_k \phi_k^2 e^{2 \rho(k,K) / \sqrt{W}} ( \frac{1}{u_k} - E )_+ \leq W \max_{i,j} |a_{ij}|$

if ${\phi}$ is normalised in ${\ell^2}$, where the connectivity ${W}$ is the maximum number of non-zero entries of ${A}$ in any row or column, ${a_{ij}}$ are the coefficients of ${A}$, and ${\rho}$ is a certain moderately complicated but explicit metric function on the spatial domain. Informally, this inequality asserts that the eigenfunction ${\phi_k}$ should decay like ${e^{-\rho(k,K) / \sqrt{W}}}$ or faster. Indeed, our numerics show a very strong log-linear relationship between ${\phi_k}$ and ${\rho(k,K)}$, although it appears that our exponent ${1/\sqrt{W}}$ is not quite optimal. We also provide an associated localisation result which is technical to state but very roughly asserts that a given eigenvector will in fact be localised to a single connected component of ${K}$ unless there is a resonance between two wells (by which we mean that an eigenvalue for a localisation of ${A}$ associated to one well is extremely close to an eigenvalue for a localisation of ${A}$ associated to another well); such localisation is also strongly supported by numerics. (Analogous results for Schrödinger operators had been previously obtained by the previously mentioned paper of Arnold, David, Jerison, and my two coauthors, and to quantum graphs in a very recent paper of Harrell and Maltsev.)
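To make the objects ${u = A^{-1} 1}$ and ${K}$ concrete, here is a small Python sketch for a toy example. The matrix is a one-dimensional discrete Schrödinger operator (diagonal ${2 + V_j}$, off-diagonal entries ${-1}$), which is one instance of the model above; the potential `V`, the exact Gauss-Jordan solver and all function names are illustrative assumptions rather than anything taken from the paper.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def schrodinger_matrix(V):
    """Tridiagonal matrix with diagonal 2 + V_j and off-diagonal entries -1."""
    n = len(V)
    return [[(2 + V[i]) if i == j else (-1 if abs(i - j) == 1 else 0)
             for j in range(n)] for i in range(n)]

def landscape(A):
    """Landscape vector u = A^{-1} 1."""
    return solve(A, [1] * len(A))

def effective_wells(u, E):
    """Indices j with 1/u_j <= E."""
    return [j for j, uj in enumerate(u) if 1 / uj <= E]

V = [0, 0, 0, 9, 9, 9, 0, 0, 0]  # toy potential: two wells separated by a barrier
A = schrodinger_matrix(V)
u = landscape(A)
```

In this toy example the landscape function is large inside the two wells and small on the barrier, so for moderate energies ${E}$ the set `effective_wells(u, E)` splits into two components on either side of the barrier.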

Our approach is based on Agmon’s methods, which we interpret as a double commutator method, and which in particular relies on exploiting the negative definiteness of certain double commutator operators. In the case of Schrödinger operators ${-\Delta+V}$, this negative definiteness is provided by the identity

$\displaystyle \langle [[-\Delta+V,g],g] u, u \rangle = -2\int_{{\bf R}^d} |\nabla g|^2 |u|^2\ dx \leq 0 \ \ \ \ \ (2)$

for any sufficiently reasonable functions ${u, g: {\bf R}^d \rightarrow {\bf R}}$, where we view ${g}$ (like ${V}$) as a multiplier operator. To exploit this, we use the commutator identity

$\displaystyle \langle g [\psi, -\Delta+V] u, g \psi u \rangle = \frac{1}{2} \langle [[-\Delta+V, g \psi],g\psi] u, u \rangle$

$\displaystyle -\frac{1}{2} \langle [[-\Delta+V, g],g] \psi u, \psi u \rangle$

valid for any ${g,\psi,u: {\bf R}^d \rightarrow {\bf R}}$ after a brief calculation. The double commutator identity then tells us that

$\displaystyle \langle g [\psi, -\Delta+V] u, g \psi u \rangle \leq \int_{{\bf R}^d} |\nabla g|^2 |\psi u|^2\ dx.$

If we choose ${u}$ to be a non-negative weight and let ${\psi := \phi/u}$ for an eigenfunction ${\phi}$, then we can write

$\displaystyle [\psi, -\Delta+V] u = [\psi, -\Delta+V - E] u = \psi (-\Delta+V - E) u$

and we conclude that

$\displaystyle \int_{{\bf R}^d} \frac{(-\Delta+V-E)u}{u} |g|^2 |\phi|^2\ dx \leq \int_{{\bf R}^d} |\nabla g|^2 |\phi|^2\ dx. \ \ \ \ \ (3)$

We have considerable freedom in this inequality to select the functions ${u,g}$. If we select ${u=1}$, we obtain the clean inequality

$\displaystyle \int_{{\bf R}^d} (V-E) |g|^2 |\phi|^2\ dx \leq \int_{{\bf R}^d} |\nabla g|^2 |\phi|^2\ dx.$

If we take ${g}$ to be a function which equals ${1}$ on the wells ${\{ V \leq E \}}$ but increases exponentially away from these wells, in such a way that

$\displaystyle |\nabla g|^2 \leq \frac{1}{2} (V-E) |g|^2$

outside of the wells, we can obtain the estimate

$\displaystyle \int_{V > E} (V-E) |g|^2 |\phi|^2\ dx \leq 2 \int_{V < E} (E-V) |\phi|^2\ dx,$

which then gives an exponential type decay of ${\phi}$ away from the wells. This is essentially the classic exponential decay estimate of Agmon; one can take ${\log g}$ to be the distance to the wells ${\{ V \leq E \}}$ with respect to the Euclidean metric conformally weighted by a suitably normalised version of ${V-E}$, so that ${g}$ equals ${1}$ on the wells and grows exponentially in this distance. If we instead select ${u}$ to be the landscape function ${u = (-\Delta+V)^{-1} 1}$, (3) then gives

$\displaystyle \int_{{\bf R}^d} (\frac{1}{u} - E) |g|^2 |\phi|^2\ dx \leq \int_{{\bf R}^d} |\nabla g|^2 |\phi|^2\ dx,$

and by selecting ${g}$ appropriately this gives an exponential decay estimate away from the effective wells ${\{ \frac{1}{u} \leq E \}}$, using a metric weighted by ${\frac{1}{u}-E}$.

It turns out that this argument extends without much difficulty to the ${M}$-matrix setting. The analogue of the crucial double commutator identity (2) is

$\displaystyle \langle [[A,D],D] u, u \rangle = \sum_{i \neq j} a_{ij} u_i u_j (d_{ii} - d_{jj})^2 \leq 0$

for any diagonal matrix ${D = \mathrm{diag}(d_{11},\dots,d_{NN})}$. The remainder of the Agmon type arguments go through after making the natural modifications.
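The matrix double commutator identity can be verified mechanically on a small example. The sketch below computes ${\langle [[A,D],D] u, u \rangle}$ from the definition of the commutator and compares it with the off-diagonal sum; the sample matrix, diagonal entries and vector are arbitrary illustrative choices, with ${u}$ taken to have non-negative entries so that the sign of the sum is evident, and the function names are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(X, Y):
    """[X, Y] = XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    n = len(X)
    return [[XY[i][j] - YX[i][j] for j in range(n)] for i in range(n)]

def double_commutator_form(A, d, u):
    """<[[A, D], D] u, u> for D = diag(d), computed from the definition."""
    n = len(A)
    D = [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]
    C = commutator(commutator(A, D), D)
    return sum(C[i][j] * u[j] * u[i] for i in range(n) for j in range(n))

def off_diagonal_sum(A, d, u):
    """Sum over i != j of a_ij u_i u_j (d_ii - d_jj)^2."""
    n = len(A)
    return sum(A[i][j] * u[i] * u[j] * (d[i] - d[j]) ** 2
               for i in range(n) for j in range(n) if i != j)

A = [[4, -1, -2], [-1, 5, 0], [-2, 0, 3]]  # symmetric, off-diagonal entries <= 0
d = [0, 1, 3]
u = [1, 2, 1]
lhs = double_commutator_form(A, d, u)
rhs = off_diagonal_sum(A, d, u)
```

The agreement reflects the entrywise formula ${([[A,D],D])_{ij} = a_{ij} (d_{ii}-d_{jj})^2}$, which kills the diagonal of ${A}$ and leaves only the non-positive off-diagonal entries.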

Numerically we have also found some aspects of the landscape theory to persist beyond the ${M}$-matrix setting, even though the double commutators cease being negative definite, so this may not yet be the end of the story, but it does at least demonstrate that the utility of the landscape function does not rely purely on identities such as (1).

I’ve just uploaded to the arXiv my paper “Sendov’s conjecture for sufficiently high degree polynomials“. This paper is a contribution to an old conjecture of Sendov on the zeroes of polynomials:

Conjecture 1 (Sendov’s conjecture) Let ${f: {\bf C} \rightarrow {\bf C}}$ be a polynomial of degree ${n \geq 2}$ that has all zeroes in the closed unit disk ${\{ z: |z| \leq 1 \}}$. If ${\lambda_0}$ is one of these zeroes, then ${f'}$ has at least one zero in ${\{z: |z-\lambda_0| \leq 1\}}$.
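For a feel of what the conjecture asserts, one can test it numerically in the lowest non-trivial degree. The following Python sketch samples monic cubics with zeroes in the unit disk, finds the two zeroes of ${f'}$ from the quadratic formula, and records the largest distance from a zero of ${f}$ to the nearest zero of ${f'}$. The restriction to degree three, the sampling scheme and the function names are all illustrative assumptions; the degree three case is of course already covered by the known results.

```python
import cmath
import random

def critical_points_of_cubic(z1, z2, z3):
    """Zeroes of f' for the monic cubic f(z) = (z - z1)(z - z2)(z - z3)."""
    e1 = z1 + z2 + z3
    e2 = z1 * z2 + z1 * z3 + z2 * z3
    # f'(z) = 3 z^2 - 2 e1 z + e2, solved by the quadratic formula
    root = cmath.sqrt(e1 * e1 - 3 * e2)
    return (e1 + root) / 3, (e1 - root) / 3

def sendov_gap(zeros):
    """Largest distance from a zero of f to the nearest zero of f'."""
    crit = critical_points_of_cubic(*zeros)
    return max(min(abs(z - c) for c in crit) for z in zeros)

def random_point_in_disk(rng):
    """Uniform random point of the closed unit disk (rejection sampling)."""
    while True:
        z = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
        if abs(z) <= 1:
            return z

rng = random.Random(0)
worst = max(sendov_gap([random_point_in_disk(rng) for _ in range(3)])
            for _ in range(2000))

# Extremal example f(z) = z^3 - 1: every zero is at distance 1 from f'(0) = 0.
cube_roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]
extremal_gap = sendov_gap(cube_roots)
```

The conjecture predicts that `sendov_gap` never exceeds ${1}$, with the cubic ${z^3 - 1}$ attaining this value up to rounding error.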

It is common in the literature on this problem to normalise ${f}$ to be monic, and to rotate the zero ${\lambda_0}$ to be an element ${a}$ of the unit interval ${[0,1]}$. As it turns out, the location of ${a}$ on this unit interval ${[0,1]}$ ends up playing an important role in the arguments.

Many cases of this conjecture are already known, for instance

• When ${n<9}$ (Brown-Xiang 1999);
• When ${a=0}$ (Gauss-Lucas theorem);
• When ${a \leq \frac{1}{n-1}}$ (Bojanov 2011);
• When ${c \leq a \leq 1-c}$ for a fixed ${c>0}$, and ${n}$ is sufficiently large depending on ${c}$ (Dégot 2014);
• When ${C n^{-1/7} \leq a \leq 1 - C n^{-1/4}}$ for a sufficiently large absolute constant ${C}$ (Chalebgwa 2020);
• When ${a=1}$ (Rubinstein 1968; Goodman-Rahman-Ratti 1969; Joyal 1969);
• When ${a \geq 1-\varepsilon_n}$, where ${\varepsilon_n>0}$ is sufficiently small depending on ${n}$ (Miller 1993; Vajaitu-Zaharescu 1993);
• When ${a \geq 1 - \frac{1}{2 n^9 4^n}}$ (Chijiwa 2011);
• When ${a \geq 1 - \frac{90}{n^{12} \log n}}$ (Kasmalkar 2014).

In particular, in high degrees the only cases left uncovered by prior results are when ${a}$ is close (but not too close) to ${0}$, or when ${a}$ is close (but not too close) to ${1}$; see Figure 1 of my paper.

Our main result covers the high degree case uniformly for all values of ${a \in [0,1]}$:

Theorem 2 There exists an absolute constant ${n_0}$ such that Sendov’s conjecture holds for all ${n \geq n_0}$.

In principle, this reduces the verification of Sendov’s conjecture to a finite time computation, although our arguments use compactness methods and thus do not easily provide an explicit value of ${n_0}$. I believe that the compactness arguments can be replaced with quantitative substitutes that provide an explicit ${n_0}$, but the value of ${n_0}$ produced is likely to be extremely large (certainly much larger than ${9}$).

Because of the previous results (particularly those of Chalebgwa and Chijiwa), we will only need to establish the following two subcases of the above theorem:

Theorem 3 (Sendov’s conjecture near the origin) Under the additional hypothesis ${a = o(1/\log n)}$, Sendov’s conjecture holds for sufficiently large ${n}$.

Theorem 4 (Sendov’s conjecture near the unit circle) Under the additional hypothesis ${1-o(1) \leq a \leq 1 - \varepsilon_0^n}$ for a fixed ${\varepsilon_0>0}$, Sendov’s conjecture holds for sufficiently large ${n}$.

We approach these theorems using the “compactness and contradiction” strategy, assuming that there is a sequence of counterexamples whose degrees ${n}$ go to infinity, using various compactness theorems to extract various asymptotic objects in the limit ${n \rightarrow \infty}$, and somehow using these objects to derive a contradiction. There are many ways to effect such a strategy; we will use a formalism that I call “cheap nonstandard analysis” and which is common in the PDE literature, in which one repeatedly passes to subsequences as necessary whenever one invokes a compactness theorem to create a limit object. However, the particular choice of asymptotic formalism one selects is not of essential importance for the arguments.

I also found it useful to use the language of probability theory. Given a putative counterexample ${f}$ to Sendov’s conjecture, let ${\lambda}$ be a zero of ${f}$ (chosen uniformly at random among the ${n}$ zeroes of ${f}$, counting multiplicity), and let ${\zeta}$ similarly be a uniformly random zero of ${f'}$. We introduce the logarithmic potentials

$\displaystyle U_\lambda(z) := {\bf E} \log \frac{1}{|z-\lambda|}; \quad U_\zeta(z) := {\bf E} \log \frac{1}{|z-\zeta|}$

and the Stieltjes transforms

$\displaystyle s_\lambda(z) := {\bf E} \frac{1}{z-\lambda}; \quad s_\zeta(z) := {\bf E} \frac{1}{z-\zeta}.$

Standard calculations using the fundamental theorem of algebra yield the basic identities

$\displaystyle U_\lambda(z) = \frac{1}{n} \log \frac{1}{|f(z)|}; \quad U_\zeta(z) = \frac{1}{n-1} \log \frac{n}{|f'(z)|}$

and

$\displaystyle s_\lambda(z) = \frac{1}{n} \frac{f'(z)}{f(z)}; \quad s_\zeta(z) = \frac{1}{n-1} \frac{f''(z)}{f'(z)} \ \ \ \ \ (1)$

and in particular the random variables ${\lambda, \zeta}$ are linked to each other by the identity

$\displaystyle U_\lambda(z) - \frac{n-1}{n} U_\zeta(z) = \frac{1}{n} \log |s_\lambda(z)|. \ \ \ \ \ (2)$
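As a sanity check, one can test these identities numerically on the simplest example ${f(z) = z^n - 1}$, whose zeroes are the ${n^{th}}$ roots of unity and whose derivative ${f'(z) = n z^{n-1}}$ has all of its ${n-1}$ zeroes at the origin. The following is only a small sketch in plain Python; the degree ${n=12}$ and the test point ${z}$ are arbitrary choices made for illustration and do not come from the paper.

```python
import cmath
import math

def check_identity(n, z):
    """Return both sides of identity (2), and s_lambda(z), for f(z) = z**n - 1."""
    # Zeroes of f: the n-th roots of unity. Zeroes of f': the origin, n-1 times.
    zeroes = [cmath.exp(2j * math.pi * k / n) for k in range(n)]
    U_lambda = sum(math.log(1 / abs(z - lam)) for lam in zeroes) / n
    U_zeta = math.log(1 / abs(z))  # every zero of f' is at 0
    s_lambda = sum(1 / (z - lam) for lam in zeroes) / n
    lhs = U_lambda - (n - 1) / n * U_zeta
    rhs = math.log(abs(s_lambda)) / n
    return lhs, rhs, s_lambda

n, z = 12, 0.7 + 0.4j  # arbitrary illustrative choices
lhs, rhs, s_lambda = check_identity(n, z)

# First half of (1): s_lambda(z) should agree with f'(z) / (n f(z)).
closed_form = z ** (n - 1) / (z ** n - 1)

print("identity (2):", lhs, rhs)
print("identity (1) discrepancy:", abs(s_lambda - closed_form))
```

The two sides of (2) should agree up to rounding error, as should the two expressions for ${s_\lambda(z)}$.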

On the other hand, the hypotheses of Sendov’s conjecture (and the Gauss-Lucas theorem) place ${\lambda,\zeta}$ inside the unit disk ${\{ z:|z| \leq 1\}}$. Applying Prokhorov’s theorem, and passing to a subsequence, one can then assume that the random variables ${\lambda,\zeta}$ converge in distribution to some limiting random variables ${\lambda^{(\infty)}, \zeta^{(\infty)}}$ (possibly defined on a different probability space than the original variables ${\lambda,\zeta}$), also living almost surely inside the unit disk. Standard potential theory then gives the convergence

$\displaystyle U_\lambda(z) \rightarrow U_{\lambda^{(\infty)}}(z); \quad U_\zeta(z) \rightarrow U_{\zeta^{(\infty)}}(z) \ \ \ \ \ (3)$

and

$\displaystyle s_\lambda(z) \rightarrow s_{\lambda^{(\infty)}}(z); \quad s_\zeta(z) \rightarrow s_{\zeta^{(\infty)}}(z) \ \ \ \ \ (4)$

at least in the local ${L^1}$ sense. Among other things, we then conclude from the identity (2) and some elementary inequalities that

$\displaystyle U_{\lambda^{(\infty)}}(z) = U_{\zeta^{(\infty)}}(z)$

for all ${|z|>1}$. This turns out to have an appealing interpretation in terms of Brownian motion: if one takes two Brownian motions in the complex plane, one originating from ${\lambda^{(\infty)}}$ and one originating from ${\zeta^{(\infty)}}$, then the location where these Brownian motions first exit the unit disk ${\{ z: |z| \leq 1 \}}$ will have the same distribution. (In our paper we actually replace Brownian motion with the closely related formalism of balayage.) This turns out to connect the random variables ${\lambda^{(\infty)}}$, ${\zeta^{(\infty)}}$ quite closely to each other. In particular, with this observation and some additional arguments involving both the unique continuation property for harmonic functions and Grace’s theorem (discussed in this previous post), with the latter drawn from the prior work of Dégot, we can get very good control on these distributions:

Theorem 5
• (i) If ${a = o(1)}$, then ${\lambda^{(\infty)}, \zeta^{(\infty)}}$ almost surely lie in the semicircle ${\{ e^{i\theta}: \pi/2 \leq \theta \leq 3\pi/2\}}$ and have the same distribution.
• (ii) If ${a = 1-o(1)}$, then ${\lambda^{(\infty)}}$ is uniformly distributed on the circle ${\{ z: |z|=1\}}$, and ${\zeta^{(\infty)}}$ is almost surely zero.

In case (i) (and strengthening the hypothesis ${a=o(1)}$ to ${a=o(1/\log n)}$ to control some technical contributions of “outlier” zeroes of ${f}$), we can use this information about ${\lambda^{(\infty)}}$ and (4) to ensure that the normalised logarithmic derivative ${\frac{1}{n} \frac{f'}{f} = s_\lambda}$ has a non-negative winding number in a certain small (but not too small) circle around the origin, which by the argument principle is inconsistent with the hypothesis that ${f}$ has a zero at ${a = o(1)}$ and that ${f'}$ has no zeroes near ${a}$. This is how we establish Theorem 3.

Case (ii) turns out to be more delicate. This is because there are a number of “near-counterexamples” to Sendov’s conjecture that are compatible with the hypotheses and conclusion of case (ii). The simplest such example is ${f(z) = z^n - 1}$, where the zeroes ${\lambda}$ of ${f}$ are uniformly distributed amongst the ${n^{th}}$ roots of unity (including at ${a=1}$), and the zeroes of ${f'}$ are all located at the origin. In my paper I also discuss a variant of this construction, in which ${f'}$ has zeroes mostly near the origin, but also acquires a bounded number of zeroes at various locations ${\lambda_1+o(1),\dots,\lambda_m+o(1)}$ inside the unit disk. Specifically, we take

$\displaystyle f(z) := \left(z + \frac{c_2}{n}\right)^{n-m} P(z) - \left(a + \frac{c_2}{n}\right)^{n-m} P(a)$

where ${a = 1 - \frac{c_1}{n}}$ for some constants ${0 < c_1 < c_2}$ and

$\displaystyle P(z) := (z-\lambda_1) \dots (z-\lambda_m).$

By a perturbative analysis to locate the zeroes of ${f}$, one eventually would be able to arrive at a true counterexample to Sendov’s conjecture if these locations ${\lambda_1,\dots,\lambda_m}$ were in the open lune

$\displaystyle \{ \lambda: |\lambda| < 1 < |\lambda-1| \}$

and if one had the inequality

$\displaystyle c_2 - c_1 - c_2 \cos \theta + \sum_{j=1}^m \log \left|\frac{1 - \lambda_j}{e^{i\theta} - \lambda_j}\right| < 0 \ \ \ \ \ (5)$

for all ${0 \leq \theta \leq 2\pi}$. However, if one takes the mean of this inequality in ${\theta}$, one arrives at the inequality

$\displaystyle c_2 - c_1 + \sum_{j=1}^m \log |1 - \lambda_j| < 0$

which is incompatible with the hypotheses ${c_2 > c_1}$ and ${|\lambda_j-1| > 1}$. In order to extend this argument to more general polynomials ${f}$, we require a stability analysis of the endpoint equation

$\displaystyle c_2 - c_1 - c_2 \cos \theta + \sum_{j=1}^m \log \left|\frac{1 - \lambda_j}{e^{i\theta} - \lambda_j}\right| = 0 \ \ \ \ \ (6)$

where we now only assume the closed conditions ${c_2 \geq c_1}$ and ${|\lambda_j-1| \geq 1}$. The above discussion then places all the zeros ${\lambda_j}$ on the arc

$\displaystyle \{ \lambda: |\lambda| < 1 = |\lambda-1|\} \ \ \ \ \ (7)$

and if one also takes the second Fourier coefficient of (6) one also obtains the vanishing second moment

$\displaystyle \sum_{j=1}^m \lambda_j^2 = 0.$

These two conditions are incompatible with each other (except in the degenerate case when all the ${\lambda_j}$ vanish), because all the non-zero elements ${\lambda}$ of the arc (7) have argument in ${\pm [\pi/3,\pi/2]}$, so in particular their square ${\lambda^2}$ will have negative real part. It turns out that one can adapt this argument to the more general potential counterexamples to Sendov’s conjecture (in the form of Theorem 4). The starting point is to use (1), (4), and Theorem 5(ii) to obtain good control on ${f''/f'}$, which one then integrates and exponentiates to get good control on ${f'}$, and then on a second integration one gets enough information about ${f}$ to pin down the location of its zeroes to high accuracy. The constraint that these zeroes lie inside the unit disk then gives an inequality resembling (5), and an adaptation of the above stability analysis is then enough to conclude. The arguments here are inspired by the previous arguments of Miller, which treated the case when ${a}$ was extremely close to ${1}$ via a similar perturbative analysis; the main novelty is to control the error terms not in terms of the magnitude of the largest zero ${\zeta}$ of ${f'}$ (which is difficult to manage when ${n}$ gets large), but rather by the variance of those zeroes, which ends up being a more tractable expression to keep track of.
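The elementary facts used in the toy stability analysis above are easy to test numerically. The sketch below (plain Python; the sample point `lam` and the grid sizes are arbitrary illustrative choices, not taken from the paper) checks three things. First, ${\cos \theta}$ and ${\log |e^{i\theta} - \lambda|}$ both have mean zero in ${\theta}$ when ${|\lambda| < 1}$. Second, the second Fourier coefficient of ${\log |e^{i\theta} - \lambda|}$ is ${-\lambda^2/4}$, which is where the vanishing second moment comes from. Third, every non-zero point of the arc (7) has a square with negative real part.

```python
import cmath
import math

def circle_mean(g, N=4096):
    """Average g(theta) over N equally spaced angles in [0, 2*pi)."""
    return sum(g(2 * math.pi * k / N) for k in range(N)) / N

lam = 0.3 + 0.4j  # hypothetical sample point with |lam| < 1

def log_dist(theta):
    """log |e^{i theta} - lam|."""
    return math.log(abs(cmath.exp(1j * theta) - lam))

# Both means should vanish (up to rounding error).
mean_cos = circle_mean(math.cos)
mean_log = circle_mean(log_dist)

# Average against e^{2 i theta}: should be close to -lam**2 / 4.
second_coeff = circle_mean(lambda t: log_dist(t) * cmath.exp(2j * t))

def arc_points(M=1000):
    """Sample non-zero points lam = 1 + e^{i phi} of the arc (7), i.e. with |lam| < 1."""
    pts = []
    for k in range(1, M):
        phi = 2 * math.pi / 3 + (2 * math.pi / 3) * k / M
        p = 1 + cmath.exp(1j * phi)
        if 1e-9 < abs(p) < 1:  # discard the degenerate point lam = 0
            pts.append(p)
    return pts

max_real_part_of_square = max((p * p).real for p in arc_points())

print(mean_cos, mean_log)
print(second_coeff, -lam ** 2 / 4)
print(max_real_part_of_square)
```

Since the constant terms and the ${\cos \theta}$ term in (6) carry no second Fourier coefficient, the second check shows that the second coefficient of (6) is a constant multiple of ${\sum_{j=1}^m \lambda_j^2}$, while the third check is the sign obstruction that rules out a non-trivial configuration on the arc.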

Laura Cladek and I have just uploaded to the arXiv our paper “Additive energy of regular measures in one and higher dimensions, and the fractal uncertainty principle“. This paper concerns a continuous version of the notion of additive energy. Given a finite measure ${\mu}$ on ${{\bf R}^d}$ and a scale ${r>0}$, define the energy ${\mathrm{E}(\mu,r)}$ at scale ${r}$ to be the quantity

$\displaystyle \mathrm{E}(\mu,r) := \mu^4\left( \{ (x_1,x_2,x_3,x_4) \in ({\bf R}^d)^4: |x_1+x_2-x_3-x_4| \leq r \}\right) \ \ \ \ \ (1)$

where ${\mu^4}$ is the product measure on ${({\bf R}^d)^4}$ formed from four copies of the measure ${\mu}$ on ${{\bf R}^d}$. We will be interested in Cantor-type measures ${\mu}$, supported on a compact set ${X \subset B(0,1)}$ and obeying the Ahlfors-David regularity condition

$\displaystyle \mu(B(x,r)) \leq C r^\delta$

for all balls ${B(x,r)}$ and some constants ${C, \delta > 0}$, as well as the matching lower bound

$\displaystyle \mu(B(x,r)) \geq C^{-1} r^\delta$

whenever ${x \in X}$ and ${0 < r < 1}$. One should think of ${X}$ as a ${\delta}$-dimensional fractal set, and ${\mu}$ as some vaguely self-similar measure on this set.

Note that once one fixes ${x_1,x_2,x_3}$, the variable ${x_4}$ in (1) is constrained to a ball of radius ${r}$, hence we obtain the trivial upper bound

$\displaystyle \mathrm{E}(\mu,r) \leq C^4 r^\delta. \ \ \ \ \ (2)$

If the set ${X}$ contains a lot of “additive structure”, one can expect this bound to be basically sharp; for instance, if ${\delta}$ is an integer, ${X}$ is a ${\delta}$-dimensional unit disk, and ${\mu}$ is Lebesgue measure on this disk, one can verify that ${\mathrm{E}(\mu,r) \sim r^\delta}$ (where we allow implied constants to depend on ${d,\delta}$). However, we show that if the dimension is non-integer, then one obtains a gain:

Theorem 1 If ${0 < \delta < d}$ is not an integer, and ${X, \mu}$ are as above, then

$\displaystyle \mathrm{E}(\mu,r) \lesssim_{C,\delta,d} r^{\delta+\beta}$

for some ${\beta>0}$ depending only on ${C,\delta,d}$.

Informally, this asserts that Ahlfors-David regular fractal sets of non-integer dimension cannot behave as if they are approximately closed under addition. In fact the gain ${\beta}$ we obtain is quasipolynomial in the regularity constant ${C}$:

$\displaystyle \beta = \exp\left( - O_{\delta,d}( 1 + \log^{O_{\delta,d}(1)}(C) ) \right).$

(We also obtain a localised version in which the regularity condition is only required to hold at scales between ${r}$ and ${1}$.) Such a result was previously obtained (with more explicit values of the ${O_{\delta,d}()}$ implied constants) in the one-dimensional case ${d=1}$ by Dyatlov and Zahl; but in higher dimensions there do not appear to have been any results for this general class of sets ${X}$ and measures ${\mu}$. In the paper of Dyatlov and Zahl it is noted that some dependence on ${C}$ is necessary; in particular, ${\beta}$ cannot be much better than ${1/\log C}$. This reflects the fact that there are fractal sets that do behave reasonably well with respect to addition (basically because they are built out of long arithmetic progressions at many scales); however, such sets are not very Ahlfors-David regular. Among other things, this result readily implies a dimension expansion result

$\displaystyle \mathrm{dim}( f( X, X) ) \geq \delta + \beta$

for any non-degenerate smooth map ${f: {\bf R}^d \times {\bf R}^d \rightarrow {\bf R}^d}$, including the sum map ${f(x,y) := x+y}$ and (in one dimension) the product map ${f(x,y) := x \cdot y}$, where the non-degeneracy condition required is that the gradients ${D_x f(x,y), D_y f(x,y): {\bf R}^d \rightarrow {\bf R}^d}$ are invertible for every ${x,y}$. We refer to the paper for the formal statement.
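To get a feel for the energy gain in Theorem 1, here is a toy computation. It is only a sketch under simplifying assumptions: instead of the continuous energy ${\mathrm{E}(\mu,r)}$, it counts exact coincidences ${x_1+x_2=x_3+x_4}$ for a discretised measure with ${2^k}$ equal atoms, which serves as a proxy for the energy at the scale of the atom spacing. The helper names `cantor_atoms` and `energy` are made up for this illustration. The two measures compared are the level-${k}$ approximation to the ternary Cantor measure (of dimension ${\delta = \log 2/\log 3}$) and ${2^k}$ equally spaced atoms (a stand-in for Lebesgue measure on an interval); in both cases the analogue of the trivial bound ${r^\delta}$ is ${2^{-k}}$.

```python
from bisect import bisect_left, bisect_right
from fractions import Fraction
from itertools import product

def cantor_atoms(k):
    """Integer positions (in units of 3**-k) of the 2**k atoms of the
    level-k ternary Cantor measure, i.e. base-3 digits restricted to {0, 2}."""
    return [sum(d * 3 ** (k - 1 - i) for i, d in enumerate(digits))
            for digits in product((0, 2), repeat=k)]

def energy(atoms, t=0):
    """mu^4 of the quadruples with |x1 + x2 - x3 - x4| <= t, where mu gives
    equal mass to each atom; t is measured in the same integer units."""
    sums = sorted(a + b for a in atoms for b in atoms)
    count = sum(bisect_right(sums, s + t) - bisect_left(sums, s - t)
                for s in sums)
    return Fraction(count, len(atoms) ** 4)

for k in range(1, 7):
    trivial = Fraction(1, 2 ** k)  # analogue of r**delta in both examples
    cantor_ratio = energy(cantor_atoms(k)) / trivial
    progression_ratio = energy(list(range(2 ** k))) / trivial
    print(k, float(cantor_ratio), float(progression_ratio))
```

For the arithmetic progression the ratio to the trivial bound stays bounded away from zero (it tends to ${2/3}$), whereas for the Cantor atoms it equals ${(3/4)^k}$ and so decays like a power of the scale ${3^{-k}}$, with exponent ${\log(4/3)/\log 3 \approx 0.26}$ in this toy model. This is consistent in spirit with the power gain ${r^\beta}$ of Theorem 1, though the exponent in this example should not be confused with the ${\beta}$ produced by the general argument.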

Our higher-dimensional argument shares many features in common with that of Dyatlov and Zahl, notably a reliance on the modern tools of additive combinatorics (and specifically the Bogolyubov-Ruzsa lemma of Sanders). However, in one dimension we were also able to find a completely elementary argument, avoiding any particularly advanced additive combinatorics and instead primarily exploiting the order-theoretic properties of the real line, that gave a superior value of ${\beta}$, namely

$\displaystyle \beta := c \min(\delta,1-\delta) C^{-25}.$

One of the main reasons for obtaining such improved energy bounds is that they imply a fractal uncertainty principle in some regimes. We focus attention on the model case of obtaining such an uncertainty principle for the semiclassical Fourier transform

$\displaystyle {\mathcal F}_h f(\xi) := (2\pi h)^{-d/2} \int_{{\bf R}^d} e^{-i x \cdot \xi/h} f(x)\ dx$

where ${h>0}$ is a small parameter. If ${X, \mu, \delta}$ are as above, and ${X_h}$ denotes the ${h}$-neighbourhood of ${X}$, then from the Hausdorff-Young inequality one obtains the trivial bound

$\displaystyle \| 1_{X_h} {\mathcal F}_h 1_{X_h} \|_{L^2({\bf R}^d) \rightarrow L^2({\bf R}^d)} \lesssim_{C,d} h^{\max\left(\frac{d}{2}-\delta,0\right)}.$

(There are also variants involving pairs of sets ${X_h, Y_h}$, but for simplicity we focus on the uncertainty principle for a single set ${X_h}$.) The fractal uncertainty principle, when it applies, asserts that one can improve this to

$\displaystyle \| 1_{X_h} {\mathcal F}_h 1_{X_h} \|_{L^2({\bf R}^d) \rightarrow L^2({\bf R}^d)} \lesssim_{C,d} h^{\max\left(\frac{d}{2}-\delta,0\right) + \beta}$

for some ${\beta>0}$; informally, this asserts that a function and its Fourier transform cannot simultaneously be concentrated in the set ${X_h}$ when ${\delta \leq \frac{d}{2}}$, and that a function cannot be concentrated on ${X_h}$ and have its Fourier transform be of maximum size on ${X_h}$ when ${\delta \geq \frac{d}{2}}$. A modification of the disk example mentioned previously shows that such a fractal uncertainty principle cannot hold if ${\delta}$ is an integer. However, in one dimension, the fractal uncertainty principle is known to hold for all ${0 < \delta < 1}$. The above-mentioned results of Dyatlov and Zahl were able to establish this for ${\delta}$ close to ${1/2}$, and the remaining cases ${1/2 < \delta < 1}$ and ${0 < \delta < 1/2}$ were later established by Bourgain-Dyatlov and Dyatlov-Jin respectively. Such uncertainty principles have applications to hyperbolic dynamics, in particular in establishing spectral gaps for certain Selberg zeta functions.

It remains a largely open problem to establish a fractal uncertainty principle in higher dimensions. Our results allow one to establish such a principle when the dimension ${\delta}$ is close to ${d/2}$, and ${d}$ is assumed to be odd (to make ${d/2}$ a non-integer). There is also work of Han and Schlag that obtains such a principle when one of the copies of ${X_h}$ is assumed to have a product structure. We hope to obtain further higher-dimensional fractal uncertainty principles in subsequent work.

We now sketch how our main theorem is proved. In both one dimension and higher dimensions, the main point is to get a preliminary improvement

$\displaystyle \mathrm{E}(\mu,r_0) \leq \varepsilon r_0^\delta \ \ \ \ \ (3)$

over the trivial bound (2) for any small ${\varepsilon>0}$, provided ${r_0}$ is sufficiently small depending on ${\varepsilon, \delta, d}$; one can then iterate this bound by a fairly standard “induction on scales” argument (which roughly speaking can be used to show that energies ${\mathrm{E}(\mu,r)}$ behave somewhat multiplicatively in the scale parameter ${r}$) to propagate the bound to a power gain at smaller scales. We found that a particularly clean way to run the induction on scales was via use of the Gowers uniformity norm ${U^2}$, and particularly via a clean Fubini-type inequality

$\displaystyle \| f \|_{U^2(V \times V')} \leq \|f\|_{U^2(V; U^2(V'))}$

(ultimately proven using the Gowers-Cauchy-Schwarz inequality) that allows one to “decouple” coarse and fine scale aspects of the Gowers norms (and hence of additive energies).
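The Fubini-type inequality can be tested numerically in a finite model. The sketch below takes ${V = {\bf Z}/n{\bf Z}}$ and ${V' = {\bf Z}/m{\bf Z}}$ and interprets the mixed norm ${\|f\|_{U^2(V; U^2(V'))}}$ as the ${U^2(V)}$ norm of the function ${x \mapsto \|f(x,\cdot)\|_{U^2(V')}}$. This interpretation, the finite cyclic groups, and the helper names are assumptions made for the illustration; the paper works with its own normalisations.

```python
import random

def u2_pow4(f):
    """Fourth power of the U^2 norm of a function on Z/n, given as a list."""
    n = len(f)
    total = 0
    for x in range(n):
        for h1 in range(n):
            for h2 in range(n):
                total += (f[x] * f[(x + h1) % n].conjugate()
                          * f[(x + h2) % n].conjugate()
                          * f[(x + h1 + h2) % n])
    return (total / n ** 3).real

def u2_pow4_product(f):
    """Fourth power of the U^2 norm on Z/n x Z/m; f is an n-by-m nested list."""
    n, m = len(f), len(f[0])
    total = 0
    for x in range(n):
        for h1 in range(n):
            for h2 in range(n):
                a, b = f[x], f[(x + h1) % n]
                c, d = f[(x + h2) % n], f[(x + h1 + h2) % n]
                for y in range(m):
                    for k1 in range(m):
                        for k2 in range(m):
                            total += (a[y] * b[(y + k1) % m].conjugate()
                                      * c[(y + k2) % m].conjugate()
                                      * d[(y + k1 + k2) % m])
    return (total / (n * m) ** 3).real

def mixed_pow4(f):
    """Fourth power of the mixed norm: U^2 in x of the U^2 norms of the rows."""
    g = [max(u2_pow4(row), 0.0) ** 0.25 for row in f]
    return u2_pow4(g)

rng = random.Random(0)  # fixed seed so the experiment is reproducible
n, m = 4, 5
f = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(m)]
     for _ in range(n)]
lhs, rhs = u2_pow4_product(f), mixed_pow4(f)
print(lhs, rhs)
```

Under the stated interpretation, the inequality follows by applying the Gowers-Cauchy-Schwarz inequality in the ${V'}$ variables for each fixed choice of the ${V}$ variables, so one expects `lhs <= rhs` for any input.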

It remains to obtain the preliminary improvement. In one dimension this is done by identifying some “left edges” of the set ${X}$ that supports ${\mu}$: intervals ${[x, x+K^{-n}]}$ that intersect ${X}$, but such that a large interval ${[x-K^{-n+1},x]}$ just to the left of this interval is disjoint from ${X}$. Here ${K}$ is a large constant and ${n}$ is a scale parameter. It is not difficult to show (using in particular the Archimedean nature of the real line) that if one has the Ahlfors-David regularity condition for some ${0 < \delta < 1}$ then left edges exist in abundance at every scale; for instance most points of ${X}$ would be expected to lie in quite a few of these left edges (much as most elements of, say, the ternary Cantor set ${\{ \sum_{n=1}^\infty \varepsilon_n 3^{-n}: \varepsilon_n \in \{0,2\} \}}$ would be expected to contain a lot of ${0}$s in their base ${3}$ expansion). In particular, most pairs ${(x_1,x_2) \in X \times X}$ would be expected to lie in a pair ${[x,x+K^{-n}] \times [y,y+K^{-n}]}$ of left edges of equal length. The key point is then that if ${(x_1,x_2) \in X \times X}$ lies in such a pair with ${K^{-n} \geq r}$, then there are relatively few pairs ${(x_3,x_4) \in X \times X}$ at distance ${O(K^{-n+1})}$ from ${(x_1,x_2)}$ for which one has the relation ${x_1+x_2 = x_3+x_4 + O(r)}$, because ${x_3,x_4}$ will both tend to be to the right of ${x_1,x_2}$ respectively. This causes a decrement in the energy at scale ${K^{-n+1}}$, and by carefully combining all these energy decrements one can eventually cobble together the energy bound (3).

We were not able to make this argument work in higher dimension (though perhaps the cases ${0 < \delta < 1}$ and ${d-1 < \delta < d}$ might not be completely out of reach from these methods). Instead we return to additive combinatorics methods. If the claim (3) failed, then by applying the Balog-Szemerédi-Gowers theorem we can show that the set ${X}$ has high correlation with an approximate group ${H}$, and hence (by the aforementioned Bogolyubov-Ruzsa type theorem of Sanders, which is the main source of the quasipolynomial bounds in our final exponent) ${X}$ will exhibit an approximate “symmetry” along some non-trivial arithmetic progression of some spacing length ${r}$ and some diameter ${R \gg r}$. The ${r}$-neighbourhood ${X_r}$ of ${X}$ will then resemble the union of parallel “cylinders” of dimensions ${r \times R}$. If we focus on a typical ${R}$-ball of ${X_r}$, the set now resembles a Cartesian product of an interval of length ${R}$ with a subset of a ${(d-1)}$-dimensional hyperplane, which behaves approximately like an Ahlfors-David regular set of dimension ${\delta-1}$ (this already lets us conclude a contradiction if ${\delta<1}$). Note that if the original dimension ${\delta}$ was non-integer then this new dimension ${\delta-1}$ will also be non-integer. It is then possible to contradict the failure of (3) by appealing to a suitable induction hypothesis at one lower dimension.