I have uploaded to the arXiv my paper “Exploring the toolkit of Jean Bourgain”. This is one of a collection of papers to be published in the Bulletin of the American Mathematical Society describing aspects of the work of Jean Bourgain; other contributors to this collection include Keith Ball, Ciprian Demeter, and Carlos Kenig. Because the other contributors will be covering specific areas of Jean’s work in some detail, I decided to take a non-overlapping tack, and focus instead on some basic tools of Jean that he frequently used across many of the fields he contributed to. Jean had a surprising number of these “basic tools” that he wielded with great dexterity, and in this paper I focus on just a few of them:

• Reducing qualitative analysis results (e.g., convergence theorems or dimension bounds) to quantitative analysis estimates (e.g., variational inequalities or maximal function estimates).
• Using dyadic pigeonholing to locate good scales to work in or to apply truncations.
• Using random translations to amplify small sets (low density) into large sets (positive density).
• Combining large deviation inequalities with metric entropy bounds to control suprema of various random processes.

Each of these techniques is individually not too difficult to explain, and each was certainly employed on occasion by various mathematicians prior to Bourgain’s work; but Jean had internalized them to the point where he would instinctively use them as soon as they became relevant to a given problem. I illustrate this at the end of the paper with an exposition of one particular result of Jean, on the Erdős similarity problem, in which his main result (that any sum ${S = S_1+S_2+S_3}$ of three infinite sets of reals has the property that there exists a positive measure set ${E}$ that does not contain any homothetic copy ${x+tS}$ of ${S}$) is basically proven by a sequential application of these tools (except for dyadic pigeonholing, which turns out not to be needed here).

I had initially intended to also cover some other basic tools in Jean’s toolkit, such as the uncertainty principle and the use of probabilistic decoupling, but was having trouble keeping the paper coherent with such a broad focus (certainly I could not identify a single paper of Jean’s that employed all of these tools at once). I hope though that the examples given in the paper give some reasonable impression of Jean’s research style.

Abdul Basit, Artem Chernikov, Sergei Starchenko, Chieu-Minh Tran and I have uploaded to the arXiv our paper Zarankiewicz’s problem for semilinear hypergraphs. This paper is in the spirit of a number of results in extremal graph theory in which the bounds for various graph-theoretic problems or results can be greatly improved if one makes some additional hypotheses regarding the structure of the graph, for instance by requiring that the graph be “definable” with respect to some theory with good model-theoretic properties.

A basic motivating example is the question of counting the number of incidences between points and lines (or between points and other geometric objects). Suppose one has ${n}$ points and ${n}$ lines in a space. How many incidences can there be between these points and lines? The utterly trivial bound is ${n^2}$, but by using the basic fact that two points determine a line (or two lines intersect in at most one point), a simple application of Cauchy-Schwarz improves this bound to ${n^{3/2}}$. In graph theoretic terms, the point is that the bipartite incidence graph between points and lines does not contain a copy of ${K_{2,2}}$ (there do not exist two points and two lines that are all incident to each other). Without any further hypotheses, this bound is basically sharp: consider for instance the collection of ${p^2}$ points and ${p^2+p}$ lines in a finite plane ${{\bf F}_p^2}$, which has ${p^3+p^2}$ incidences (one can make the situation more symmetric by working with a projective plane rather than an affine plane). If however one considers lines in the real plane ${{\bf R}^2}$, the famous Szemerédi-Trotter theorem improves the incidence bound further from ${n^{3/2}}$ to ${O(n^{4/3})}$. Thus the incidence graph between real points and lines contains more structure than merely the absence of ${K_{2,2}}$.
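For small primes one can verify the finite-field example directly by brute force; the following sketch (my own illustration, not from the paper) counts incidences in ${{\bf F}_p^2}$ and confirms that the incidence graph contains no ${K_{2,2}}$:

```python
from itertools import combinations

def incidences_in_affine_plane(p):
    """Count point-line incidences in the affine plane over F_p.

    Lines are y = m*x + b (p^2 of them) plus the vertical lines x = c
    (p more), giving p^2 + p lines through the p^2 points.
    """
    points = [(x, y) for x in range(p) for y in range(p)]
    # represent each line by the set of its p points
    lines = [frozenset((x, (m * x + b) % p) for x in range(p))
             for m in range(p) for b in range(p)]
    lines += [frozenset((c, y) for y in range(p)) for c in range(p)]
    count = sum(len(L) for L in lines)  # each point on a line is one incidence
    # K_{2,2}-freeness: two distinct points lie on at most one common line
    for P, Q in combinations(points, 2):
        assert sum(1 for L in lines if P in L and Q in L) <= 1
    return len(points), len(lines), count

print(incidences_in_affine_plane(5))  # (25, 30, 150) = (p^2, p^2+p, p^3+p^2)
```

For ${p=5}$ this gives ${150 = p^3+p^2}$ incidences, matching the claimed sharpness of the ${n^{3/2}}$ bound in this setting.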

More generally, bounding the size of bipartite graphs (or multipartite hypergraphs) not containing a copy of some complete bipartite subgraph ${K_{k,k}}$ (or ${K_{k,\dots,k}}$ in the hypergraph case) is known as Zarankiewicz’s problem. We have results for all ${k}$ and all orders of hypergraph, but for sake of this post I will focus on the bipartite ${k=2}$ case.

In our paper we improve the ${n^{3/2}}$ bound to a near-linear bound in the case that the incidence graph is “semilinear”. A model case occurs when one considers incidences between points and axis-parallel rectangles in the plane. Now the ${K_{2,2}}$ condition is not automatic (it is of course possible for two distinct points to both lie in two distinct rectangles), so we impose this condition by fiat:

Theorem 1 Suppose one has ${n}$ points and ${n}$ axis-parallel rectangles in the plane, whose incidence graph contains no ${K_{2,2}}$‘s, for some large ${n}$.
• (i) The total number of incidences is ${O(n \log^4 n)}$.
• (ii) If all the rectangles are dyadic, the bound can be improved to ${O( n \frac{\log n}{\log\log n} )}$.
• (iii) The bound in (ii) is best possible (up to the choice of implied constant).

We don’t know whether the bound in (i) is similarly tight for non-dyadic boxes; the usual tricks for reducing the non-dyadic case to the dyadic case strangely fail to apply here. One can generalise to higher dimensions, replacing rectangles by polytopes with faces in some fixed finite set of orientations, at the cost of adding several more logarithmic factors; also, one can replace the reals by other ordered division rings, and replace polytopes by other sets of bounded “semilinear descriptive complexity”, e.g., unions of boundedly many polytopes, or which are cut out by boundedly many functions that enjoy coordinatewise monotonicity properties. For certain specific graphs we can remove the logarithmic factors entirely. We refer to the preprint for precise details.

The proof techniques are combinatorial. The proof of (i) relies primarily on the order structure of ${{\bf R}}$ to implement a “divide and conquer” strategy in which one can efficiently control incidences between ${n}$ points and rectangles by incidences between approximately ${n/2}$ points and boxes. For (ii) there is additional order-theoretic structure one can work with: first there is an easy pruning device to reduce to the case when no rectangle is completely contained inside another, and then one can impose the “tile partial order” in which one dyadic rectangle ${I \times J}$ is less than another ${I' \times J'}$ if ${I \subset I'}$ and ${J' \subset J}$. The point is that this order is “locally linear” in the sense that for any two dyadic rectangles ${R_-, R_+}$, the set ${[R_-,R_+] := \{ R: R_- \leq R \leq R_+\}}$ is linearly ordered, and this can be exploited by elementary double counting arguments to obtain a bound which eventually becomes ${O( n \frac{\log n}{\log\log n})}$ after optimising certain parameters in the argument. The proof also suggests how to construct the counterexample in (iii), which is achieved by an elementary iterative construction.
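The local linearity of the tile partial order can be checked by exhaustive search at small depths. In the sketch below (my own illustration, under the simplifying assumption that all dyadic rectangles are “tiles” of one fixed area, the regime where local linearity holds unconditionally), every order interval ${[R_-,R_+]}$ is verified to be linearly ordered:

```python
from itertools import product, combinations

def dyadic(i, j):
    """The dyadic interval [j/2^i, (j+1)/2^i)."""
    return (j / 2**i, (j + 1) / 2**i)

def contains(I, J):
    # I contains J as intervals
    return I[0] <= J[0] and J[1] <= I[1]

def tile_leq(R, S):
    """Tile order: I x J <= I' x J' iff I is contained in I' and J' in J."""
    (I, J), (I2, J2) = R, S
    return contains(I2, I) and contains(J, J2)

# all dyadic tiles I x J of fixed area 2^-3 inside the unit square
depth = 3
tiles = [(dyadic(i, a), dyadic(depth - i, b))
         for i in range(depth + 1)
         for a in range(2**i) for b in range(2**(depth - i))]

for Rm, Rp in product(tiles, tiles):
    if tile_leq(Rm, Rp):
        chain = [R for R in tiles if tile_leq(Rm, R) and tile_leq(R, Rp)]
        # local linearity: any two tiles between R- and R+ are comparable
        for A, B in combinations(chain, 2):
            assert tile_leq(A, B) or tile_leq(B, A)
print("every order interval [R-, R+] is linearly ordered among", len(tiles), "tiles")
```

The key feature being exploited is that two dyadic intervals are always nested or disjoint, so any two intervals containing a common one are nested; the fixed-area normalization then forces comparability in the tile order.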

Dimitri Shlyakhtenko and I have uploaded to the arXiv our paper Fractional free convolution powers. For me, this project (which we started during the 2018 IPAM program on quantitative linear algebra) was motivated by a desire to understand the behavior of the minor process applied to a large random Hermitian ${N \times N}$ matrix ${A_N}$, in which one takes the successive upper left ${n \times n}$ minors ${A_n}$ of ${A_N}$ and computes their eigenvalues ${\lambda_1(A_n) \leq \dots \leq \lambda_n(A_n)}$ in non-decreasing order. These eigenvalues are related to each other by the Cauchy interlacing inequalities

$\displaystyle \lambda_i(A_{n+1}) \leq \lambda_i(A_n) \leq \lambda_{i+1}(A_{n+1})$

for ${1 \leq i \leq n < N}$, and are often arranged in a triangular array known as a Gelfand-Tsetlin pattern, as discussed in these previous blog posts.
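The interlacing inequalities are easy to observe numerically; here is a quick sanity check (my own illustration) on a random Hermitian matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
# random Hermitian matrix A = (B + B^*)/2
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (B + B.conj().T) / 2

# eigenvalues of all upper-left minors, in non-decreasing order
spectra = [np.linalg.eigvalsh(A[:n, :n]) for n in range(1, N + 1)]

# Cauchy interlacing: lam_i(A_{n+1}) <= lam_i(A_n) <= lam_{i+1}(A_{n+1})
for n in range(1, N):
    lo, hi = spectra[n - 1], spectra[n]  # spectra of A_n and A_{n+1}
    assert np.all(hi[:-1] <= lo + 1e-10) and np.all(lo <= hi[1:] + 1e-10)
print("interlacing verified for all minors")
```

Stacking these spectra row by row produces exactly a (finite) Gelfand-Tsetlin pattern.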

When ${N}$ is large and the matrix ${A_N}$ is a random matrix with empirical spectral distribution converging to some compactly supported probability measure ${\mu}$ on the real line, then under suitable hypotheses (e.g., unitary conjugation invariance of the random matrix ensemble ${A_N}$), a “concentration of measure” effect occurs, with the spectral distribution of the minors ${A_n}$ for ${n = \lfloor N/k\rfloor}$ for any fixed ${k \geq 1}$ converging to a specific measure ${k^{-1}_* \mu^{\boxplus k}}$ that depends only on ${\mu}$ and ${k}$. The reason for this notation is that there is a surprising description of this measure ${k^{-1}_* \mu^{\boxplus k}}$ when ${k}$ is a natural number, namely it is the free convolution ${\mu^{\boxplus k}}$ of ${k}$ copies of ${\mu}$, pushed forward by the dilation map ${x \mapsto k^{-1} x}$. For instance, if ${\mu}$ is the Wigner semicircular measure ${d\mu_{sc} = \frac{1}{\pi} (4-x^2)^{1/2}_+\ dx}$, then ${k^{-1}_* \mu_{sc}^{\boxplus k} = k^{-1/2}_* \mu_{sc}}$. At the random matrix level, this reflects the fact that the minor of a GUE matrix is again a GUE matrix (up to a renormalizing constant).
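One consequence that is easy to test numerically: for a normalized GUE matrix, the ${n = \lfloor N/k \rfloor}$ minor should have spectral distribution close to ${k^{-1/2}_* \mu_{sc}}$, the semicircle of radius ${2/\sqrt{k}}$, whose second moment is ${1/k}$. A rough simulation (my own sketch, with a loose tolerance):

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 1200, 3

# GUE, normalized so the empirical spectrum of A approximates the
# semicircle law on [-2, 2] (second moment 1)
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (B + B.conj().T) / (2 * np.sqrt(N))

# upper-left n x n minor with n = N/k; its spectrum should approximate
# the semicircle of radius 2/sqrt(k), which has second moment 1/k
n = N // k
minor_eigs = np.linalg.eigvalsh(A[:n, :n])
second_moment = np.mean(minor_eigs ** 2)
print(second_moment)  # close to 1/k
assert abs(second_moment - 1 / k) < 0.05
```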

As first observed by Bercovici and Voiculescu and developed further by Nica and Speicher, among other authors, the notion of a free convolution power ${\mu^{\boxplus k}}$ of ${\mu}$ can be extended to non-integer ${k \geq 1}$, thus giving the notion of a “fractional free convolution power”. This notion can be defined in several different ways. One of them proceeds via the Cauchy transform

$\displaystyle G_\mu(z) := \int_{\bf R} \frac{d\mu(x)}{z-x}$

of the measure ${\mu}$, and ${\mu^{\boxplus k}}$ can be defined by solving the Burgers-type equation

$\displaystyle (k \partial_k + z \partial_z) G_{\mu^{\boxplus k}}(z) = \frac{\partial_z G_{\mu^{\boxplus k}}(z)}{G_{\mu^{\boxplus k}}(z)} \ \ \ \ \ (1)$

with initial condition ${G_{\mu^{\boxplus 1}} = G_\mu}$ (see this previous blog post for a derivation). This equation can be solved explicitly using the ${R}$-transform ${R_\mu}$ of ${\mu}$, defined by solving the equation

$\displaystyle \frac{1}{G_\mu(z)} + R_\mu(G_\mu(z)) = z$

for sufficiently large ${z}$, in which case one can show that

$\displaystyle R_{\mu^{\boxplus k}}(z) = k R_\mu(z).$

(In the case of the semicircular measure ${\mu_{sc}}$, the ${R}$-transform is simply the identity: ${R_{\mu_{sc}}(z)=z}$.)
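In the semicircular case these relations can be checked numerically: ${\mu_{sc}^{\boxplus k}}$ is the semicircular law of variance ${k}$, with Cauchy transform ${G(z) = (z - \sqrt{z^2-4k})/2k}$, and since ${R_{\mu_{sc}^{\boxplus k}}(w) = kw}$ the defining equation ${\frac{1}{G(z)} + R(G(z)) = z}$ becomes ${1/G + kG = z}$. A short verification (my own sketch):

```python
import cmath

def G_semicircle(z, k=1):
    """Cauchy transform of the semicircular law of variance k
    (supported on [-2 sqrt(k), 2 sqrt(k)]), i.e. of mu_sc^{boxplus k}:
    G(z) = (z - sqrt(z^2 - 4k)) / (2k)."""
    return (z - cmath.sqrt(z * z - 4 * k)) / (2 * k)

z = 3.0 + 0.5j
for k in (1, 2, 2.5):  # fractional powers k work just as well
    G = G_semicircle(z, k)
    # R_{mu^{boxplus k}}(w) = k R_mu(w) = k w for the semicircle, so the
    # defining equation 1/G(z) + R(G(z)) = z reads 1/G + k*G = z
    assert abs(1 / G + k * G - z) < 1e-12
print("1/G + kG = z verified for integer and fractional k")
```

Note that nothing in this computation distinguishes integer from fractional ${k}$, consistent with the Bercovici-Voiculescu extension.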

Nica and Speicher also gave a free probability interpretation of the fractional free convolution power: if ${A}$ is a noncommutative random variable in a noncommutative probability space ${({\mathcal A},\tau)}$ with distribution ${\mu}$, and ${p}$ is a real projection operator free of ${A}$ with trace ${1/k}$, then the “minor” ${[pAp]}$ of ${A}$ (viewed as an element of a new noncommutative probability space ${({\mathcal A}_p, \tau_p)}$ whose elements are minors ${[pXp]}$, ${X \in {\mathcal A}}$ with trace ${\tau_p([pXp]) := k \tau(pXp)}$) has the law of ${k^{-1}_* \mu^{\boxplus k}}$ (we give a self-contained proof of this in an appendix to our paper). This suggests that the minor process (or fractional free convolution) can be studied within the framework of free probability theory.

One of the known facts about integer free convolution powers ${\mu^{\boxplus k}}$ is monotonicity of the free entropy

$\displaystyle \chi(\mu) = \int_{\bf R} \int_{\bf R} \log|s-t|\ d\mu(s) d\mu(t) + \frac{3}{4} + \frac{1}{2} \log 2\pi$

and free Fisher information

$\displaystyle \Phi(\mu) = \frac{2\pi^2}{3} \int_{\bf R} \left(\frac{d\mu}{dx}\right)^3\ dx$

which were introduced by Voiculescu as free probability analogues of the classical probability concepts of differential entropy and classical Fisher information. (Here we correct a small typo in the normalization constant of the free Fisher information as presented in Voiculescu’s paper.) Namely, it was shown by Shlyakhtenko that the quantity ${\chi(k^{-1/2}_* \mu^{\boxplus k})}$ is monotone non-decreasing for integer ${k}$, and the Fisher information ${\Phi(k^{-1/2}_* \mu^{\boxplus k})}$ is monotone non-increasing for integer ${k}$. This is the free probability analogue of the corresponding monotonicities for differential entropy and classical Fisher information that were established by Artstein, Ball, Barthe, and Naor, answering a question of Shannon.

Our first main result is to extend the monotonicity results of Shlyakhtenko to fractional ${k \geq 1}$. We give two proofs of this fact, one using free probability machinery, and a more self-contained (but less motivated) proof using integration by parts and contour integration. The free probability proof relies on the concept of the free score ${J(X)}$ of a noncommutative random variable, which is the analogue of the classical score. The free score, also introduced by Voiculescu, can be defined by duality as measuring the perturbation with respect to semicircular noise, or more precisely

$\displaystyle \frac{d}{d\varepsilon} \tau( Z P( X + \varepsilon Z) )|_{\varepsilon=0} = \tau( J(X) P(X) )$

whenever ${P}$ is a polynomial and ${Z}$ is a semicircular element free of ${X}$. If ${X}$ has an absolutely continuous law ${\mu = f\ dx}$ for a sufficiently regular ${f}$, one can calculate ${J(X)}$ explicitly as ${J(X) = 2\pi Hf(X)}$, where ${Hf}$ is the Hilbert transform of ${f}$, and the Fisher information is given by the formula

$\displaystyle \Phi(X) = \tau( J(X)^2 ).$

One can also define a notion of relative free score ${J(X:B)}$ relative to some subalgebra ${B}$ of noncommutative random variables.

The free score interacts very well with the free minor process ${X \mapsto [pXp]}$, in particular by standard calculations one can establish the identity

$\displaystyle J( [pXp] : [pBp] ) = k {\bf E}( [p J(X:B) p] | [pXp], [pBp] )$

whenever ${X}$ is a noncommutative random variable, ${B}$ is an algebra of noncommutative random variables, and ${p}$ is a real projection of trace ${1/k}$ that is free of both ${X}$ and ${B}$. The monotonicity of free Fisher information then follows from an application of Pythagoras’s theorem (which implies in particular that conditional expectation operators are contractions on ${L^2}$). The monotonicity of free entropy then follows from an integral representation of free entropy as an integral of free Fisher information along the free Ornstein-Uhlenbeck process (or equivalently, free Fisher information is essentially the rate of change of free entropy with respect to perturbation by semicircular noise). The argument also shows when equality holds in the monotonicity inequalities; this occurs precisely when ${\mu}$ is a semicircular measure up to affine rescaling.

After an extensive amount of calculation of all the quantities that were implicit in the above free probability argument (in particular computing the various terms involved in the application of Pythagoras’ theorem), we were able to extract a self-contained proof of monotonicity that relied on differentiating the quantities in ${k}$ and using the differential equation (1). It turns out that if ${d\mu = f\ dx}$ for sufficiently regular ${f}$, then there is an identity

$\displaystyle \partial_k \Phi( k^{-1/2}_* \mu^{\boxplus k} ) = -\frac{1}{2\pi^2} \lim_{\varepsilon \rightarrow 0} \sum_{\alpha,\beta = \pm} \int_{\bf R} \int_{\bf R} f(x) f(y) K(x+i\alpha \varepsilon, y+i\beta \varepsilon)\ dx dy \ \ \ \ \ (2)$

where ${K}$ is the kernel

$\displaystyle K(z,w) := \frac{1}{G(z) G(w)} (\frac{G(z)-G(w)}{z-w} + G(z) G(w))^2$

and ${G(z) := G_\mu(z)}$. It is not difficult to show that ${K(z,\overline{w})}$ is a positive semi-definite kernel, which gives the required monotonicity. It would be interesting to obtain some more insightful interpretation of the kernel ${K}$ and the identity (2).

These monotonicity properties hint at the minor process ${A \mapsto [pAp]}$ being associated to some sort of “gradient flow” in the ${k}$ parameter. We were not able to formalize this intuition; indeed, it is not clear what a gradient flow on a varying noncommutative probability space ${({\mathcal A}_p, \tau_p)}$ even means. However, after substantial further calculation we were able to formally describe the minor process as the Euler-Lagrange equation for an intriguing Lagrangian functional that we conjecture to have a random matrix interpretation. We first work in “Lagrangian coordinates”, defining the quantity ${\lambda(s,y)}$ on the “Gelfand-Tsetlin pyramid”

$\displaystyle \Delta = \{ (s,y): 0 < s < 1; 0 < y < s \}$

by the formula

$\displaystyle \mu^{\boxplus 1/s}((-\infty,\lambda(s,y)/s])=y/s,$

which is well defined if the density of ${\mu}$ is sufficiently well behaved. The random matrix interpretation of ${\lambda(s,y)}$ is that it is the asymptotic location of the ${\lfloor yN\rfloor^{th}}$ eigenvalue of the ${\lfloor sN \rfloor \times \lfloor sN \rfloor}$ upper left minor of a random ${N \times N}$ matrix ${A_N}$ with asymptotic empirical spectral distribution ${\mu}$ and with unitarily invariant distribution, thus ${\lambda}$ is in some sense a continuum limit of Gelfand-Tsetlin patterns. Thus for instance the Cauchy interlacing laws in this asymptotic limit regime become

$\displaystyle 0 \leq \partial_s \lambda \leq \partial_y \lambda.$

After a lengthy calculation (involving extensive use of the chain rule and product rule), the equation (1) is equivalent to the Euler-Lagrange equation

$\displaystyle \partial_s L_{\lambda_s}(\partial_s \lambda, \partial_y \lambda) + \partial_y L_{\lambda_y}(\partial_s \lambda, \partial_y \lambda) = 0$

where ${L}$ is the Lagrangian density

$\displaystyle L(\lambda_s, \lambda_y) := \log \lambda_y + \log \sin( \pi \frac{\lambda_s}{\lambda_y} ).$

Thus the minor process is formally a critical point of the integral ${\int_\Delta L(\partial_s \lambda, \partial_y \lambda)\ ds dy}$. The quantity ${\partial_y \lambda}$ measures the mean eigenvalue spacing at some location of the Gelfand-Tsetlin pyramid, and the ratio ${\frac{\partial_s \lambda}{\partial_y \lambda}}$ measures mean eigenvalue drift in the minor process. This suggests that this Lagrangian density is some sort of measure of entropy of the asymptotic microscale point process emerging from the minor process at this spacing and drift. There is work of Metcalfe demonstrating that this point process is given by the Boutillier bead model, so we conjecture that this Lagrangian density ${L}$ somehow measures the entropy density of this process.

I’ve just uploaded to the arXiv my paper “The Ionescu-Wainger multiplier theorem and the adeles”. This paper revisits a useful multiplier theorem of Ionescu and Wainger on “major arc” Fourier multiplier operators on the integers ${{\bf Z}}$ (or lattices ${{\bf Z}^d}$), and strengthens the bounds while also interpreting it from the viewpoint of the adelic integers ${{\bf A}_{\bf Z}}$ (which were also used in my recent paper with Krause and Mirek).

For simplicity let us just work in one dimension. Any smooth function ${m: {\bf R}/{\bf Z} \rightarrow {\bf C}}$ then defines a discrete Fourier multiplier operator ${T_m: \ell^p({\bf Z}) \rightarrow \ell^p({\bf Z})}$ for any ${1 \leq p \leq \infty}$ by the formula

$\displaystyle {\mathcal F}_{\bf Z} T_m f(\xi) := m(\xi) {\mathcal F}_{\bf Z} f(\xi)$

where ${{\mathcal F}_{\bf Z} f(\xi) := \sum_{n \in {\bf Z}} f(n) e(n \xi)}$ is the Fourier transform on ${{\bf Z}}$; similarly, any test function ${m: {\bf R} \rightarrow {\bf C}}$ defines a continuous Fourier multiplier operator ${T_m: L^p({\bf R}) \rightarrow L^p({\bf R})}$ by the formula

$\displaystyle {\mathcal F}_{\bf R} T_m f(\xi) := m(\xi) {\mathcal F}_{\bf R} f(\xi)$

where ${{\mathcal F}_{\bf R} f(\xi) := \int_{\bf R} f(x) e(x \xi)\ dx}$. In both cases we refer to ${m}$ as the symbol of the multiplier operator ${T_m}$.

We will be interested in discrete Fourier multiplier operators whose symbols are supported on a finite union of arcs. One way to construct such operators is by “folding” continuous Fourier multiplier operators into various target frequencies. To make this folding operation precise, given any continuous Fourier multiplier operator ${T_m: L^p({\bf R}) \rightarrow L^p({\bf R})}$ and any frequency shift ${\alpha \in {\bf R}/{\bf Z}}$, we define the discrete Fourier multiplier operator ${T_{m;\alpha}: \ell^p({\bf Z}) \rightarrow \ell^p({\bf Z})}$ by the formula

$\displaystyle {\mathcal F}_{\bf Z} T_{m;\alpha} f(\xi) := \sum_{\theta \in {\bf R}: \xi = \alpha + \theta} m(\theta) {\mathcal F}_{\bf Z} f(\xi)$

or equivalently

$\displaystyle T_{m;\alpha} f(n) = \int_{\bf R} m(\theta) {\mathcal F}_{\bf Z} f(\alpha+\theta) e( n(\alpha+\theta) )\ d\theta.$

More generally, given any finite set ${\Sigma \subset {\bf R}/{\bf Z}}$, we can form a multifrequency projection operator ${T_{m;\Sigma}}$ on ${\ell^p({\bf Z})}$ by the formula

$\displaystyle T_{m;\Sigma} := \sum_{\alpha \in \Sigma} T_{m;\alpha}$

thus

$\displaystyle T_{m;\Sigma} f(n) = \sum_{\alpha \in \Sigma} \int_{\bf R} m(\theta) {\mathcal F}_{\bf Z} f(\alpha+\theta) e( n(\alpha+\theta) )\ d\theta.$

This construction gives discrete Fourier multiplier operators whose symbol can be localised to a finite union of arcs. For instance, if ${m: {\bf R} \rightarrow {\bf C}}$ is supported on ${[-\varepsilon,\varepsilon]}$, then ${T_{m;\Sigma}}$ is a Fourier multiplier whose symbol is supported on the set ${\bigcup_{\alpha \in \Sigma} \alpha + [-\varepsilon,\varepsilon]}$.

There is a body of results relating the ${\ell^p({\bf Z})}$ theory of discrete Fourier multiplier operators such as ${T_{m;\alpha}}$ or ${T_{m;\Sigma}}$ with the ${L^p({\bf R})}$ theory of their continuous counterparts. For instance we have the basic result of Magyar, Stein, and Wainger:

Proposition 1 (Magyar-Stein-Wainger sampling principle) Let ${1 \leq p \leq \infty}$ and ${\alpha \in {\bf R}/{\bf Z}}$.
• (i) If ${m: {\bf R} \rightarrow {\bf C}}$ is a smooth function supported in ${[-1/2,1/2]}$, then ${\|T_{m;\alpha}\|_{B(\ell^p({\bf Z}))} \lesssim \|T_m\|_{B(L^p({\bf R}))}}$, where ${B(V)}$ denotes the operator norm of an operator ${T: V \rightarrow V}$.
• (ii) More generally, if ${m: {\bf R} \rightarrow {\bf C}}$ is a smooth function supported in ${[-1/2Q,1/2Q]}$ for some natural number ${Q}$, then ${\|T_{m;\alpha + \frac{1}{Q}{\bf Z}/{\bf Z}}\|_{B(\ell^p({\bf Z}))} \lesssim \|T_m\|_{B(L^p({\bf R}))}}$.

When ${p=2}$ the implied constant in these bounds can be set to equal ${1}$. In the paper of Magyar, Stein, and Wainger it was posed as an open problem whether this is the case for other ${p}$; in an appendix to this paper I show that the answer is negative if ${p}$ is sufficiently close to ${1}$ or ${\infty}$, but I do not know the full answer to this question.

This proposition allows one to get a good multiplier theory for symbols supported near cyclic groups ${\frac{1}{Q}{\bf Z}/{\bf Z}}$; for instance it shows that a discrete Fourier multiplier with symbol ${\sum_{\alpha \in \frac{1}{Q}{\bf Z}/{\bf Z}} \phi(Q(\xi-\alpha))}$ for a fixed test function ${\phi}$ is bounded on ${\ell^p({\bf Z})}$, uniformly in ${p}$ and ${Q}$. For many applications in discrete harmonic analysis, one would similarly like a good multiplier theory for symbols supported in “major arc” sets such as

$\displaystyle \bigcup_{q=1}^N \bigcup_{\alpha \in \frac{1}{q}{\bf Z}/{\bf Z}} \alpha + [-\varepsilon,\varepsilon] \ \ \ \ \ (1)$

and in particular to get a good Littlewood-Paley theory adapted to major arcs. (This is particularly the case when trying to control “true complexity zero” expressions for which the minor arc contributions can be shown to be negligible; my recent paper with Krause and Mirek is focused on expressions of this type.) At present we do not have a good multiplier theory that is directly adapted to the classical major arc set (1) (though I do not know of rigorous negative results that show that such a theory is not possible); however, Ionescu and Wainger were able to obtain a useful substitute theory in which (1) was replaced by a somewhat larger set that had better multiplier behaviour. Starting with a finite collection ${S}$ of pairwise coprime natural numbers, and a natural number ${k}$, one can form the major arc type set

$\displaystyle \bigcup_{\alpha \in \Sigma_{\leq k}} \alpha + [-\varepsilon,\varepsilon] \ \ \ \ \ (2)$

where ${\Sigma_{\leq k} \subset {\bf R}/{\bf Z}}$ consists of all rational points in the unit circle of the form ${\frac{a}{Q} \mod 1}$ where ${Q}$ is the product of at most ${k}$ elements from ${S}$ and ${a}$ is an integer. For suitable choices of ${S}$ and ${k}$ not too large, one can make this set (2) contain the set (1) while still having a somewhat controlled size (very roughly speaking, one chooses ${S}$ to consist of (small powers of) large primes between ${N^\rho}$ and ${N}$ for some small constant ${\rho>0}$, together with something like the product of all the primes up to ${N^\rho}$ (raised to suitable powers)).

In the regime where ${k}$ is fixed and ${\varepsilon}$ is small, there is a good theory:

Theorem 2 (Ionescu-Wainger theorem, rough version) If ${p}$ is an even integer or the dual of an even integer, and ${m: {\bf R} \rightarrow {\bf C}}$ is supported on ${[-\varepsilon,\varepsilon]}$ for a sufficiently small ${\varepsilon > 0}$, then

$\displaystyle \|T_{m;\Sigma_{\leq k}}\|_{B(\ell^p({\bf Z}))} \lesssim_{p, k} (\log(1+|S|))^{O_k(1)} \|T_m\|_{B(L^p({\bf R}))}.$

There is a more explicit description of how small ${\varepsilon}$ needs to be for this theorem to work (roughly speaking, it is not much more than what is needed for all the arcs ${\alpha + [-\varepsilon,\varepsilon]}$ in (2) to be disjoint), but we will not give it here. The logarithmic loss of ${(\log(1+|S|))^{O_k(1)}}$ was reduced to ${\log(1+|S|)}$ by Mirek. In this paper we refine the bound further to

$\displaystyle \|T_{m;\Sigma_{\leq k}}\|_{B(\ell^p({\bf Z}))} \leq O(r \log(2+kr))^k \|T_m\|_{B(L^p({\bf R}))}. \ \ \ \ \ (3)$

when ${p = 2r}$ or ${p = (2r)'}$ for some integer ${r}$. In particular there is no longer any logarithmic loss in the cardinality of the set ${S}$.

The proof of (3) follows a strategy similar to that of previous proofs of Ionescu-Wainger type. By duality we may assume ${p=2r}$. We use the following standard sequence of steps:

• (i) (Denominator orthogonality) First one splits ${T_{m;\Sigma_{\leq k}} f}$ into various pieces depending on the denominator ${Q}$ appearing in the element of ${\Sigma_{\leq k}}$, and exploits “superorthogonality” in ${Q}$ to estimate the ${\ell^p}$ norm by the ${\ell^p}$ norm of an appropriate square function.
• (ii) (Nonconcentration) One expands out the ${p^{th}}$ power of the square function and estimates it by a “nonconcentrated” version in which various factors that arise in the expansion are “disjoint”.
• (iii) (Numerator orthogonality) We now decompose based on the numerators ${a}$ appearing in the relevant elements of ${\Sigma_{\leq k}}$, and exploit some residual orthogonality in this parameter to reduce to estimating a square-function type expression involving sums over various cosets ${\alpha + \frac{1}{Q}{\bf Z}/{\bf Z}}$.
• (iv) (Marcinkiewicz-Zygmund) One uses the Marcinkiewicz-Zygmund theorem relating scalar and vector valued operator norms to eliminate the role of the multiplier ${m}$.
• (v) (Rubio de Francia) Use a reverse square function estimate of Rubio de Francia type to conclude.

The main innovations are the use of the probabilistic decoupling method to remove some logarithmic losses in (i), and of recent progress on the Erdős-Rado sunflower conjecture (as discussed in this recent post) to improve the bounds in (ii). For (i), the key point is that one can express a sum such as

$\displaystyle \sum_{A \in \binom{S}{k}} f_A,$

where ${\binom{S}{k}}$ is the set of ${k}$-element subsets of an index set ${S}$, and ${f_A}$ are various complex numbers, as an average

$\displaystyle \sum_{A \in \binom{S}{k}} f_A = \frac{k^k}{k!} {\bf E} \sum_{s_1 \in {\bf S}_1,\dots,s_k \in {\bf S}_k} f_{\{s_1,\dots,s_k\}}$

where ${S = {\bf S}_1 \cup \dots \cup {\bf S}_k}$ is a random partition of ${S}$ into ${k}$ subclasses (chosen uniformly over all such partitions), basically because every ${k}$-element subset ${A}$ of ${S}$ has probability exactly ${\frac{k!}{k^k}}$ of being completely shattered by such a random partition. This “decouples” the index set ${\binom{S}{k}}$ into a Cartesian product ${{\bf S}_1 \times \dots \times {\bf S}_k}$ which is more convenient for application of the superorthogonality theory. For (ii), the point is to efficiently obtain estimates of the form

$\displaystyle (\sum_{A \in \binom{S}{k}} F_A)^r \lesssim_{k,r} \sum_{A_1,\dots,A_r \in \binom{S}{k} \hbox{ sunflower}} F_{A_1} \dots F_{A_r}$

where ${F_A}$ are various non-negative quantities, and a sunflower is a collection of sets ${A_1,\dots,A_r}$ that consist of a common “core” ${A_0}$ and disjoint “petals” ${A_1 \backslash A_0,\dots,A_r \backslash A_0}$. The other parts of the argument are relatively routine; see for instance this survey of Pierce for a discussion of them in the simple case ${k=1}$.
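The random partition identity used in step (i) can be verified exactly on a small example; the following sketch (my own illustration) enumerates all ${k^{|S|}}$ ways of assigning elements to classes:

```python
import random
from itertools import combinations, product
from math import factorial

random.seed(0)
S, k = range(5), 2
f = {frozenset(A): random.random() for A in combinations(S, k)}

# left-hand side: sum of f_A over all k-element subsets A of S
lhs = sum(f.values())

# right-hand side: (k^k / k!) times the expectation over all k^|S| ways
# of assigning each element of S independently to one of k classes
total = 0.0
for classes in product(range(k), repeat=len(S)):
    parts = [[s for s in S if classes[s] == c] for c in range(k)]
    # inner sum over s_1 in S_1, ..., s_k in S_k; only "shattered" subsets
    # (one element in each class) contribute, each exactly once
    total += sum(f[frozenset(tup)] for tup in product(*parts))
rhs = (k**k / factorial(k)) * total / k**len(S)

assert abs(lhs - rhs) < 1e-9
print("decoupling identity verified")
```

The shattering probability ${k!/k^k}$ appears here because each of the ${k}$ elements of ${A}$ must land in a distinct class, in one of the ${k!}$ favourable assignments out of ${k^k}$.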

In this paper we interpret the Ionescu-Wainger multiplier theorem as being essentially a consequence of various quantitative versions of the Shannon sampling theorem. Recall that this theorem asserts that if a (Schwartz) function ${f: {\bf R} \rightarrow {\bf C}}$ has its Fourier transform supported on ${[-1/2,1/2]}$, then ${f}$ can be recovered uniquely from its restriction ${f|_{\bf Z}: {\bf Z} \rightarrow {\bf C}}$. In fact, as can be shown from a little bit of routine Fourier analysis, if we narrow the support of the Fourier transform slightly to ${[-c,c]}$ for some ${0 < c < 1/2}$, then the restriction ${f|_{\bf Z}}$ has the same ${L^p}$ behaviour as the original function, in the sense that

$\displaystyle \| f|_{\bf Z} \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf R})} \ \ \ \ \ (4)$

for all ${0 < p \leq \infty}$; see Theorem 4.18 of this paper of myself with Krause and Mirek. This is consistent with the uncertainty principle, which suggests that such functions ${f}$ should behave like a constant at scales ${\sim 1/c}$.
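For ${p=2}$ the equivalence (4) is in fact an exact identity by Plancherel, which one can check numerically. In the sketch below (my own illustration) ${f}$ has Fourier transform ${1_{[-c/2,c/2]}}$, so both the ${\ell^2({\bf Z})}$ norm of the samples squared and the ${L^2({\bf R})}$ norm squared equal ${c}$:

```python
import numpy as np

c = 0.6  # Fourier support [-c/2, c/2], strictly inside [-1/2, 1/2]

def f(x):
    # f(x) = sin(pi c x)/(pi x) = c*sinc(c x), whose Fourier transform
    # is the indicator function of [-c/2, c/2]
    return c * np.sinc(c * x)

# ell^2 norm squared of the restriction f|_Z (tails decay like 1/n^2)
n = np.arange(-10**5, 10**5 + 1, dtype=float)
ell2_sq = np.sum(f(n) ** 2)

# L^2 norm squared of f is c by Plancherel; for p = 2 the two agree exactly
print(ell2_sq)  # close to c = 0.6
assert abs(ell2_sq - c) < 1e-3
```

For general ${p}$ one only gets comparability rather than equality, with constants depending on ${c}$ and ${p}$ as in (4).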

The quantitative sampling theorem (4) can be used to give an alternate proof of Proposition 1(i), basically thanks to the identity

$\displaystyle T_{m;0} (f|_{\bf Z}) = (T_m f)|_{\bf Z}$

whenever ${f: {\bf R} \rightarrow {\bf C}}$ is Schwartz and has Fourier transform supported in ${[-1/2,1/2]}$, and ${m}$ is also supported on ${[-1/2,1/2]}$; this identity can be easily verified from the Poisson summation formula. A variant of this argument also yields an alternate proof of Proposition 1(ii), where the role of ${{\bf R}}$ is now played by ${{\bf R} \times {\bf Z}/Q{\bf Z}}$, and the standard embedding of ${{\bf Z}}$ into ${{\bf R}}$ is now replaced by the embedding ${\iota_Q: n \mapsto (n, n \hbox{ mod } Q)}$ of ${{\bf Z}}$ into ${{\bf R} \times {\bf Z}/Q{\bf Z}}$; the analogue of (4) is now

$\displaystyle \| f \circ \iota_Q \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf R} \times {\bf Z}/Q{\bf Z})} \ \ \ \ \ (5)$

whenever ${f: {\bf R} \times {\bf Z}/Q{\bf Z} \rightarrow {\bf C}}$ is Schwartz and has Fourier transform ${{\mathcal F}_{{\bf R} \times {\bf Z}/Q{\bf Z}} f\colon {\bf R} \times \frac{1}{Q}{\bf Z}/{\bf Z} \rightarrow {\bf C}}$ supported in ${[-c/Q,c/Q] \times \frac{1}{Q}{\bf Z}/{\bf Z}}$, and ${{\bf Z}/Q{\bf Z}}$ is endowed with probability Haar measure.

The locally compact abelian groups ${{\bf R}}$ and ${{\bf R} \times {\bf Z}/Q{\bf Z}}$ can all be viewed as projections of the adelic integers ${{\bf A}_{\bf Z} := {\bf R} \times \hat {\bf Z}}$ (the product of the reals and the profinite integers ${\hat {\bf Z}}$). By using the Ionescu-Wainger multiplier theorem, we are able to obtain an adelic version of the quantitative sampling estimate (5), namely

$\displaystyle \| f \circ \iota \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf A}_{\bf Z})}$

whenever ${1 < p < \infty}$, ${f: {\bf A}_{\bf Z} \rightarrow {\bf C}}$ is Schwartz-Bruhat and has Fourier transform ${{\mathcal F}_{{\bf A}_{\bf Z}} f: {\bf R} \times {\bf Q}/{\bf Z} \rightarrow {\bf C}}$ supported on ${[-\varepsilon,\varepsilon] \times \Sigma_{\leq k}}$ for some sufficiently small ${\varepsilon}$ (the precise bound on ${\varepsilon}$ depends on ${S, p, c}$ in a fashion not detailed here). This allows one to obtain an “adelic” extension of the Ionescu-Wainger multiplier theorem, in which the ${\ell^p({\bf Z})}$ operator norm of any discrete multiplier operator whose symbol is supported on major arcs can be shown to be comparable to the ${L^p({\bf A}_{\bf Z})}$ operator norm of an adelic counterpart to that multiplier operator; in principle this reduces “major arc” harmonic analysis on the integers ${{\bf Z}}$ to “low frequency” harmonic analysis on the adelic integers ${{\bf A}_{\bf Z}}$, which is a simpler setting in many ways (mostly because the set of major arcs (2) is now replaced with a product set ${[-\varepsilon,\varepsilon] \times \Sigma_{\leq k}}$).

Ben Krause, Mariusz Mirek, and I have uploaded to the arXiv our paper Pointwise ergodic theorems for non-conventional bilinear polynomial averages. This paper is a contribution to the decades-long program of extending the classical ergodic theorems to “non-conventional” ergodic averages. Here, the focus is on pointwise convergence theorems, and in particular looking for extensions of the pointwise ergodic theorem of Birkhoff:

Theorem 1 (Birkhoff ergodic theorem) Let ${(X,\mu,T)}$ be a measure-preserving system (by which we mean ${(X,\mu)}$ is a ${\sigma}$-finite measure space, and ${T: X \rightarrow X}$ is invertible and measure-preserving), and let ${f \in L^p(X)}$ for any ${1 \leq p < \infty}$. Then the averages ${\frac{1}{N} \sum_{n=1}^N f(T^n x)}$ converge pointwise for ${\mu}$-almost every ${x \in X}$.
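As a toy illustration of Theorem 1 (not from the paper), take ${X}$ to be the unit interval with Lebesgue measure and ${T}$ an irrational rotation; unique ergodicity then forces the Birkhoff averages of an indicator function to converge to its integral:

```python
import math

# T: x -> x + alpha (mod 1) with alpha = sqrt(2) mod 1, and f = 1_{[0,1/2)}.
# The Birkhoff averages (1/N) sum_{n<=N} f(T^n x) converge to mu(f) = 1/2
# (here in fact for every starting point x, by unique ergodicity).
alpha = math.sqrt(2) % 1
x, N = 0.123, 10**5
avg = sum(1 for n in range(1, N + 1) if (x + n * alpha) % 1 < 0.5) / N
print(avg)  # close to 1/2
```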

Pointwise ergodic theorems have an inherently harmonic-analytic content to them, as they are closely tied to maximal inequalities. For instance, the Birkhoff ergodic theorem is closely tied to the Hardy-Littlewood maximal inequality.

The above theorem was generalized by Bourgain (conceding the endpoint ${p=1}$, where pointwise almost everywhere convergence is now known to fail) to polynomial averages:

Theorem 2 (Pointwise ergodic theorem for polynomial averages) Let ${(X,\mu,T)}$ be a measure-preserving system, and let ${f \in L^p(X)}$ for any ${1 < p < \infty}$. Let ${P \in {\bf Z}[{\mathrm n}]}$ be a polynomial with integer coefficients. Then the averages ${\frac{1}{N} \sum_{n=1}^N f(T^{P(n)} x)}$ converge pointwise for ${\mu}$-almost every ${x \in X}$.
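Continuing the toy example of an irrational circle rotation, Theorem 2 with ${P(n) = n^2}$ can be seen concretely: for ${f(x) = e(x)}$ the averages reduce to quadratic Weyl sums, which tend to zero by Weyl equidistribution (a sketch of one instance, not the actual proof, which concerns general measure-preserving systems):

```python
import cmath, math

# (1/N) sum_{n<=N} f(T^{n^2} x) with T a rotation by alpha = sqrt(2) and
# f(x) = e^{2 pi i x}; this equals e(x) times the normalized quadratic
# Weyl sum (1/N) sum_{n<=N} e(n^2 alpha), which tends to 0 as N -> infinity.
alpha = math.sqrt(2)
x, N = 0.123, 10**5
avg = sum(cmath.exp(2j * math.pi * (x + n * n * alpha)) for n in range(1, N + 1)) / N
print(abs(avg))  # small
```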

For bilinear averages, we have a separate 1990 result of Bourgain (for ${L^\infty}$ functions), extended to other ${L^p}$ spaces by Lacey, and with an alternate proof given by Demeter:

Theorem 3 (Pointwise ergodic theorem for two linear polynomials) Let ${(X,\mu,T)}$ be a measure-preserving system with finite measure, and let ${f \in L^{p_1}(X)}$, ${g \in L^{p_2}(X)}$ for some ${1 < p_1,p_2 \leq \infty}$ with ${\frac{1}{p_1}+\frac{1}{p_2} < \frac{3}{2}}$. Then for any integers ${a,b}$, the averages ${\frac{1}{N} \sum_{n=1}^N f(T^{an} x) g(T^{bn} x)}$ converge pointwise almost everywhere.

It has been an open question for some time (see e.g., Problem 11 of this survey of Frantzikinakis) to extend this result to other bilinear ergodic averages. In our paper we are able to achieve this in the partially linear case:

Theorem 4 (Pointwise ergodic theorem for one linear and one nonlinear polynomial) Let ${(X,\mu,T)}$ be a measure-preserving system, and let ${f \in L^{p_1}(X)}$, ${g \in L^{p_2}(X)}$ for some ${1 < p_1,p_2 < \infty}$ with ${\frac{1}{p_1}+\frac{1}{p_2} \leq 1}$. Then for any polynomial ${P \in {\bf Z}[{\mathrm n}]}$ of degree ${d \geq 2}$, the averages ${\frac{1}{N} \sum_{n=1}^N f(T^{n} x) g(T^{P(n)} x)}$ converge pointwise almost everywhere.

We actually prove a bit more than this, namely a maximal function estimate and a variational estimate, together with some additional estimates that “break duality” by applying in certain ranges with ${\frac{1}{p_1}+\frac{1}{p_2}>1}$, but we will not discuss these extensions here. A good model case to keep in mind is when ${p_1=p_2=2}$ and ${P(n) = n^2}$ (which is the case we started with). We note that norm convergence for these averages was established much earlier by Furstenberg and Weiss (in the ${d=2}$ case at least), and in fact norm convergence for arbitrary polynomial averages is now known thanks to the work of Host-Kra, Leibman, and Walsh.

Our proof of Theorem 4 is much closer in spirit to Theorem 2 than to Theorem 3. The property shared in common by the averages in Theorems 2, 4 is that they have “true complexity zero”, in the sense that they can only be large if the functions ${f,g}$ involved are “major arc” or “profinite”, in that they behave periodically over very long intervals (or like a linear combination of such periodic functions). In contrast, the average in Theorem 3 has “true complexity one”, in the sense that it can also be large if ${f,g}$ are “almost periodic” (a linear combination of eigenfunctions, or plane waves), and as such all proofs of the latter theorem have relied (either explicitly or implicitly) on some form of time-frequency analysis. In principle, the true complexity zero property reduces one to studying the behaviour of averages on major arcs. However, until recently the available estimates to quantify this true complexity zero property were not strong enough to achieve a good reduction of this form, and even once one was in the major arc setting the bilinear averages in Theorem 4 were still quite complicated, exhibiting a mixture of both continuous and arithmetic aspects, both of which are genuinely bilinear in nature.

After applying standard reductions such as the Calderón transference principle, the key task is to establish a suitably “scale-invariant” maximal (or variational) inequality on the integer shift system (in which ${X = {\bf Z}}$ with counting measure, and ${T(n) = n-1}$). A model problem is to establish the maximal inequality

$\displaystyle \| \sup_N |A_N(f,g)| \|_{\ell^1({\bf Z})} \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})} \ \ \ \ \ (1)$

where ${N}$ ranges over powers of two and ${A_N}$ is the bilinear operator

$\displaystyle A_N(f,g)(x) := \frac{1}{N} \sum_{n=1}^N f(x-n) g(x-n^2).$

The single scale estimate

$\displaystyle \| A_N(f,g) \|_{\ell^1({\bf Z})} \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})}$

or equivalently (by duality)

$\displaystyle \frac{1}{N} \sum_{n=1}^N \sum_{x \in {\bf Z}} h(x) f(x-n) g(x-n^2) \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})} \|h\|_{\ell^\infty({\bf Z})} \ \ \ \ \ (2)$

is immediate from Hölder’s inequality; the difficulty is how to take the supremum over scales ${N}$.
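On finitely supported sequences the single-scale average ${A_N(f,g)}$ can be computed directly; here is a minimal sketch (the dict-based helper `A_N` is ours, purely for illustration):

```python
def A_N(f, g, N, x):
    """Bilinear average (1/N) sum_{n=1}^N f(x-n) g(x-n^2), with f, g given
    as dicts from integers to floats (implicitly zero off their support)."""
    return sum(f.get(x - n, 0.0) * g.get(x - n * n, 0.0) for n in range(1, N + 1)) / N

# With f = g = 1 on [-N^2, 0], every term equals 1, so A_N(f,g)(0) = 1,
# saturating the trivial single-scale (Holder) bound at this point.
N = 5
f = {m: 1.0 for m in range(-N * N, 1)}
print(A_N(f, f, N, 0))  # 1.0
```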

The first step is to understand when the single-scale estimate (2) can come close to equality. A key example to keep in mind is when ${f(x) = e(ax/q) F(x)}$, ${g(x) = e(bx/q) G(x)}$, ${h(x) = e(cx/q) H(x)}$ where ${q=O(1)}$ is a small modulus, ${a,b,c}$ are such that ${a+b+c=0 \hbox{ mod } q}$, ${G}$ is a smooth cutoff to an interval ${I}$ of length ${O(N^2)}$, and ${F=H}$ is also supported on ${I}$ and behaves like a constant on intervals of length ${O(N)}$. Then one can check that (barring some unusual cancellation) (2) is basically sharp for this example. A remarkable result of Peluse and Prendiville (generalised to arbitrary nonlinear polynomials ${P}$ by Peluse) asserts, roughly speaking, that this example is basically the only way in which (2) can be saturated, at least when ${f,g,h}$ are supported on a common interval ${I}$ of length ${O(N^2)}$ and are normalised in ${\ell^\infty}$ rather than ${\ell^2}$. (Strictly speaking, the above paper of Peluse and Prendiville only says something like this regarding the ${f,h}$ factors; the corresponding statement for ${g}$ was established in a subsequent paper of Peluse and Prendiville.) The argument requires tools from additive combinatorics such as the Gowers uniformity norms, and hinges in particular on the “degree lowering argument” of Peluse and Prendiville, which I discussed in this previous blog post. Crucially for our application, the estimates are very quantitative, with all bounds being polynomial in the ratio between the left and right hand sides of (2) (or more precisely, the ${\ell^\infty}$-normalized version of (2)).

For our applications we had to extend the ${\ell^\infty}$ inverse theory of Peluse and Prendiville to an ${\ell^2}$ theory. This turned out to require a certain amount of “sleight of hand”. Firstly, one can dualise the theorem of Peluse and Prendiville to show that the “dual function”

$\displaystyle A^*_N(h,g)(x) = \frac{1}{N} \sum_{n=1}^N h(x+n) g(x+n-n^2)$

can be well approximated in ${\ell^1}$ by a function that has Fourier support on “major arcs” if ${g,h}$ enjoy ${\ell^\infty}$ control. To get the required extension to ${\ell^2}$ in the ${f}$ aspect one has to improve the control on the error from ${\ell^1}$ to ${\ell^2}$; this can be done by some interpolation theory combined with the useful Fourier multiplier theory of Ionescu and Wainger on major arcs. Then, by further interpolation using recent ${\ell^p({\bf Z})}$ improving estimates of Han, Kovac, Lacey, Madrid, and Yang for linear averages such as ${x \mapsto \frac{1}{N} \sum_{n=1}^N g(x+n-n^2)}$, one can relax the ${\ell^\infty}$ hypothesis on ${g}$ to an ${\ell^2}$ hypothesis, and then by undoing the duality one obtains a good inverse theorem for (2) for the function ${f}$; a modification of the arguments also gives something similar for ${g}$.

Using these inverse theorems (and the Ionescu-Wainger multiplier theory) one still has to understand the “major arc” portion of (1); a model case arises when ${f,g}$ have Fourier transforms supported near rational numbers ${a/q}$ with ${q \sim 2^l}$ for some moderately large ${l}$. The inverse theory gives good control (with an exponential decay in ${l}$) on individual scales ${N}$, and one can leverage this with a Rademacher-Menshov type argument (see e.g., this blog post) and some closer analysis of the bilinear Fourier symbol of ${A_N}$ to eventually handle all “small” scales, with ${N}$ ranging up to say ${2^{2^u}}$ where ${u = C 2^{\rho l}}$ for some small constant ${\rho}$ and large constant ${C}$. For the “large” scales, it becomes feasible to place all the major arcs simultaneously under a single common denominator ${Q}$, and then a quantitative version of the Shannon sampling theorem allows one to transfer the problem from the integers ${{\bf Z}}$ to the locally compact abelian group ${{\bf R} \times {\bf Z}/Q{\bf Z}}$. Actually it was conceptually clearer for us to work instead with the adelic integers ${{\mathbf A}_{\bf Z} ={\bf R} \times \hat {\bf Z}}$, which is the inverse limit of the groups ${{\bf R} \times {\bf Z}/Q{\bf Z}}$. Once one transfers to the adelic integers, the bilinear operators involved split up as tensor products of the “continuous” bilinear operator

$\displaystyle A_{N,{\bf R}}(f,g)(x) := \frac{1}{N} \int_0^N f(x-t) g(x-t^2)\ dt$

on ${{\bf R}}$, and the “arithmetic” bilinear operator

$\displaystyle A_{\hat {\bf Z}}(f,g)(x) := \int_{\hat {\bf Z}} f(x-y) g(x-y^2)\ d\mu_{\hat {\bf Z}}(y)$

on the profinite integers ${\hat {\bf Z}}$, equipped with probability Haar measure ${\mu_{\hat {\bf Z}}}$. After a number of standard manipulations (interpolation, Fubini’s theorem, Hölder’s inequality, variational inequalities, etc.) the task of estimating this tensor product boils down to establishing an ${L^q}$ improving estimate

$\displaystyle \| A_{\hat {\bf Z}}(f,g) \|_{L^q(\hat {\bf Z})} \lesssim \|f\|_{L^2(\hat {\bf Z})} \|g\|_{L^2(\hat {\bf Z})}$

for some ${q>2}$. Splitting the profinite integers ${\hat {\bf Z}}$ into the product of the ${p}$-adic integers ${{\bf Z}_p}$, it suffices to establish this claim for each ${{\bf Z}_p}$ separately (so long as we keep the implied constant equal to ${1}$ for sufficiently large ${p}$). This turns out to be possible using an arithmetic version of the Peluse-Prendiville inverse theorem as well as an arithmetic ${L^q}$ improving estimate for linear averaging operators which ultimately arises from some estimates on the distribution of polynomials on the ${p}$-adic field ${{\bf Q}_p}$, which are a variant of some estimates of Kowalski and Wright.

Kaisa Matomäki, Maksym Radziwill, Joni Teräväinen, Tamar Ziegler and I have uploaded to the arXiv our paper Higher uniformity of bounded multiplicative functions in short intervals on average. This paper (which originated from a working group at an AIM workshop on Sarnak’s conjecture) focuses on the local Fourier uniformity conjecture for bounded multiplicative functions such as the Liouville function ${\lambda}$. One form of this conjecture is the assertion that

$\displaystyle \int_0^X \| \lambda \|_{U^k([x,x+H])}\ dx = o(X) \ \ \ \ \ (1)$

as ${X \rightarrow \infty}$ for any fixed ${k \geq 0}$ and any ${H = H(X) \leq X}$ that goes to infinity as ${X \rightarrow \infty}$, where ${U^k([x,x+H])}$ is the (normalized) Gowers uniformity norm. Among other things this conjecture implies (the logarithmically averaged versions of) the Chowla and Sarnak conjectures for the Liouville function (or the Möbius function), see this previous blog post.

The conjecture gets more difficult as ${k}$ increases, and also becomes more difficult the more slowly ${H}$ grows with ${X}$. The ${k=0}$ conjecture is equivalent to the assertion

$\displaystyle \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n)| \ dx = o(HX)$

which was proven (for arbitrarily slowly growing ${H}$) in a landmark paper of Matomäki and Radziwill, discussed for instance in this blog post.
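For a rough numerical feel for this statement (a toy experiment, not remotely a proof), one can sieve the Liouville function up to a modest height and watch the normalized short-interval sums be small:

```python
import numpy as np

def liouville(limit):
    # lam(n) = (-1)^Omega(n), via a sieve counting prime factors with multiplicity.
    omega = np.zeros(limit + 1, dtype=np.int64)
    for p in range(2, limit + 1):
        if omega[p] == 0:            # no recorded prime factor yet: p is prime
            pk = p
            while pk <= limit:
                omega[pk::pk] += 1   # each multiple of p^j gains one factor
                pk *= p
    return np.where(omega % 2 == 0, 1, -1)

X, H = 10**5, 100
lam = liouville(X + H)
csum = np.concatenate([[0], np.cumsum(lam[1:])])     # csum[m] = sum_{n<=m} lam(n)
avg = np.mean(np.abs(csum[H:X + H] - csum[:X])) / H  # mean of |sum over (x, x+H]| / H
print(avg)  # well below 1, reflecting cancellation in short intervals
```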

For ${k=1}$, the conjecture is equivalent to the assertion

$\displaystyle \int_0^X \sup_\alpha |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)| \ dx = o(HX). \ \ \ \ \ (2)$

This remains open for sufficiently slowly growing ${H}$ (and it would be a major breakthrough in particular if one could obtain this bound for ${H}$ as small as ${\log^\varepsilon X}$ for any fixed ${\varepsilon>0}$, particularly if applicable to more general bounded multiplicative functions than ${\lambda}$, as this would have new implications for a generalization of the Chowla conjecture known as the Elliott conjecture). Recently, Kaisa, Maks and I were able to establish this conjecture in the range ${H \geq X^\varepsilon}$ (in fact we have since worked out in the current paper that we can get ${H}$ as small as ${\exp(\log^{5/8+\varepsilon} X)}$). In our current paper we establish the Fourier uniformity conjecture for higher ${k}$ for the same range of ${H}$. This in particular implies local orthogonality to polynomial phases,

$\displaystyle \int_0^X \sup_{P \in \mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})} |\sum_{x \leq n \leq x+H} \lambda(n) e(-P(n))| \ dx = o(HX) \ \ \ \ \ (3)$

where ${\mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})}$ denotes the polynomials of degree at most ${k-1}$, but the full conjecture is a bit stronger than this, establishing the more general statement

$\displaystyle \int_0^X \sup_{g \in \mathrm{Poly}({\bf R} \rightarrow G)} |\sum_{x \leq n \leq x+H} \lambda(n) \overline{F}(g(n) \Gamma)| \ dx = o(HX) \ \ \ \ \ (4)$

for any degree ${k}$ filtered nilmanifold ${G/\Gamma}$ and Lipschitz function ${F: G/\Gamma \rightarrow {\bf C}}$, where ${g}$ now ranges over polynomial maps from ${{\bf R}}$ to ${G}$. The method of proof follows the same general strategy as in the previous paper with Kaisa and Maks. (The equivalence of (4) and (1) follows from the inverse conjecture for the Gowers norms, proven in this paper.) We quickly sketch first the proof of (3), using very informal language to avoid many technicalities regarding the precise quantitative form of various estimates. If the estimate (3) fails, then we have the correlation estimate

$\displaystyle |\sum_{x \leq n \leq x+H} \lambda(n) e(-P_x(n))| \gg H$

for many ${x \sim X}$ and some polynomial ${P_x}$ depending on ${x}$. The difficulty here is to understand how ${P_x}$ can depend on ${x}$. We write the above correlation estimate more suggestively as

$\displaystyle \lambda(n) \sim_{[x,x+H]} e(P_x(n)).$

Because of the multiplicativity ${\lambda(np) = -\lambda(n)}$ at small primes ${p}$, one expects to have a relation of the form

$\displaystyle e(P_{x'}(p'n)) \sim_{[x/p,x/p+H/p]} e(P_x(pn)) \ \ \ \ \ (5)$

for many ${x,x'}$ for which ${x/p \approx x'/p'}$ for some small primes ${p,p'}$. (This can be formalised using an inequality of Elliott related to the Turan-Kubilius theorem.) This gives a relationship between ${P_x}$ and ${P_{x'}}$ for “edges” ${x,x'}$ in a rather sparse “graph” connecting the elements of say ${[X/2,X]}$. Using some graph theory one can locate some non-trivial “cycles” in this graph that eventually lead (in conjunction with a certain technical but important “Chinese remainder theorem” step to modify the ${P_x}$ to eliminate a rather serious “aliasing” issue that was already discussed in this previous post) to obtain functional equations of the form

$\displaystyle P_x(a_x \cdot) \approx P_x(b_x \cdot)$

for some large and close (but not identical) integers ${a_x,b_x}$, where ${\approx}$ should be interpreted, to a first approximation (ignoring a certain “profinite” or “major arc” term for simplicity), as “differing by a slowly varying polynomial”, and the polynomials ${P_x}$ should now be viewed as taking values on the reals rather than the integers. This functional equation can be solved to obtain a relation of the form

$\displaystyle P_x(t) \approx T_x \log t$

for some real number ${T_x}$ of polynomial size, and with further analysis of the relation (5) one can make ${T_x}$ basically independent of ${x}$. This simplifies (3) to something like

$\displaystyle \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n) n^{-iT}| \ dx = o(HX)$

and this is now of a form that can be treated by the theorem of Matomäki and Radziwill (because ${n \mapsto \lambda(n) n^{-iT}}$ is a bounded multiplicative function). (Actually because of the profinite term mentioned previously, one also has to insert a Dirichlet character of bounded conductor into this latter conclusion, but we will ignore this technicality.)

Now we apply the same strategy to (4). For abelian ${G}$ the claim follows easily from (3), so we focus on the non-abelian case. One now has a polynomial sequence ${g_x \in \mathrm{Poly}({\bf R} \rightarrow G)}$ attached to many ${x \sim X}$, and after a somewhat complicated adaptation of the above arguments one again ends up with an approximate functional equation

$\displaystyle g_x(a_x \cdot) \Gamma \approx g_x(b_x \cdot) \Gamma \ \ \ \ \ (6)$

where the relation ${\approx}$ is rather technical and will not be detailed here. A new difficulty arises in that there are some unwanted solutions to this equation, such as

$\displaystyle g_x(t) = \gamma^{\frac{\log(a_x t)}{\log(a_x/b_x)}}$

for some ${\gamma \in \Gamma}$, which do not necessarily lead to multiplicative characters like ${n^{-iT}}$ as in the polynomial case, but instead to some unfriendly looking “generalized multiplicative characters” (think of ${e(\lfloor \alpha \log n \rfloor \beta \log n)}$ as a rough caricature). To avoid this problem, we rework the graph theory portion of the argument to produce not just one functional equation of the form (6) for each ${x}$, but many, leading to dilation invariances

$\displaystyle g_x((1+\theta) t) \Gamma \approx g_x(t) \Gamma$

for a “dense” set of ${\theta}$. From a certain amount of Lie algebra theory (ultimately arising from an understanding of the behaviour of the exponential map on nilpotent matrices, and exploiting the hypothesis that ${G}$ is non-abelian) one can conclude that (after some initial preparations to avoid degenerate cases) ${g_x(t)}$ must behave like ${\gamma_x^{\log t}}$ for some central element ${\gamma_x}$ of ${G}$. This eventually brings one back to the multiplicative characters ${n^{-iT}}$ that arose in the polynomial case, and the arguments now proceed as before.

We give two applications of this higher order Fourier uniformity. One regards the growth of the number

$\displaystyle s(k) := |\{ (\lambda(n+1),\dots,\lambda(n+k)): n \in {\bf N} \}|$

of length ${k}$ sign patterns in the Liouville function. The Chowla conjecture implies that ${s(k) = 2^k}$, but even the weaker conjecture of Sarnak that ${s(k) \gg (1+\varepsilon)^k}$ for some ${\varepsilon>0}$ remains open. Until recently, the best asymptotic lower bound on ${s(k)}$ was ${s(k) \gg k^2}$, due to McNamara; with our result, we can now show ${s(k) \gg_A k^A}$ for any ${A}$ (in fact we can get ${s(k) \gg_\varepsilon \exp(\log^{8/5-\varepsilon} k)}$ for any ${\varepsilon>0}$). The idea is to repeat the now-standard argument to exploit multiplicativity at small primes to deduce Chowla-type conjectures from Fourier uniformity conjectures, noting that the Chowla conjecture would give all the sign patterns one could hope for. The usual argument here uses the “entropy decrement argument” to eliminate a certain error term (involving the large but mean zero factor ${p 1_{p|n}-1}$). However the observation is that if there are extremely few sign patterns of length ${k}$, then the entropy decrement argument is unnecessary (there isn’t much entropy to begin with), and a more low-tech moment method argument (similar to the derivation of Chowla’s conjecture from Sarnak’s conjecture, as discussed for instance in this post) gives enough of Chowla’s conjecture to produce plenty of length ${k}$ sign patterns. If there are not extremely few sign patterns of length ${k}$ then we are done anyway. One quirk of this argument is that the sign patterns it produces may only appear exactly once; in contrast with preceding arguments, we were not able to produce a large number of sign patterns that each occur infinitely often.
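As a quick empirical check (a toy computation, not part of the argument), one can enumerate the length-${k}$ sign patterns directly for small ${k}$; already for ${k=3}$ all ${2^3 = 8}$ patterns occur among small ${n}$:

```python
def liouville(limit):
    # lam(n) = (-1)^Omega(n), counting prime factors with multiplicity.
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:  # p is prime
            pk = p
            while pk <= limit:
                for m in range(pk, limit + 1, pk):
                    omega[m] += 1
                pk *= p
    return [1 if w % 2 == 0 else -1 for w in omega]

k, cutoff = 3, 1000
lam = liouville(cutoff + k)
patterns = {tuple(lam[n + 1:n + k + 1]) for n in range(cutoff)}
print(len(patterns))  # 8: every sign pattern of length 3 appears
```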

The second application is to obtain cancellation for various polynomial averages involving the Liouville function ${\lambda}$ or von Mangoldt function ${\Lambda}$, such as

$\displaystyle {\bf E}_{n \leq X} {\bf E}_{m \leq X^{1/d}} \lambda(n+P_1(m)) \lambda(n+P_2(m)) \dots \lambda(n+P_k(m))$

or

$\displaystyle {\bf E}_{n \leq X} {\bf E}_{m \leq X^{1/d}} \lambda(n+P_1(m)) \Lambda(n+P_2(m)) \dots \Lambda(n+P_k(m))$

where ${P_1,\dots,P_k}$ are polynomials of degree at most ${d}$, no two of which differ by a constant (the latter is essential to avoid having to establish the Chowla or Hardy-Littlewood conjectures, which of course remain open). Results of this type were previously obtained by Tamar Ziegler and myself in the “true complexity zero” case when the polynomials ${P}$ had distinct degrees, in which one could use the ${k=0}$ theory of Matomäki and Radziwill; now that higher ${k}$ is available at the scale ${H=X^{1/d}}$, we can remove this restriction.
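As an illustration of the kind of cancellation being asserted (a small numerical experiment, not a proof), take ${k=2}$, ${P_1(m)=m}$, ${P_2(m)=m^2}$ and average the Liouville correlations up to a modest height:

```python
import numpy as np

def liouville(limit):
    # lam(n) = (-1)^Omega(n), via a sieve counting prime factors with multiplicity.
    omega = np.zeros(limit + 1, dtype=np.int64)
    for p in range(2, limit + 1):
        if omega[p] == 0:
            pk = p
            while pk <= limit:
                omega[pk::pk] += 1
                pk *= p
    return np.where(omega % 2 == 0, 1, -1)

X = 10**4
M = int(X**0.5)                      # m ranges up to X^{1/d} with d = 2
lam = liouville(X + M * M)
n = np.arange(1, X + 1)
corr = np.mean([np.mean(lam[n + m] * lam[n + m * m]) for m in range(1, M + 1)])
print(abs(corr))  # small, consistent with the claimed cancellation
```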

Kari Astala, Steffen Rohde, Eero Saksman and I have (finally!) uploaded to the arXiv our preprint “Homogenization of iterated singular integrals with applications to random quasiconformal maps“. This project started (and was largely completed) over a decade ago, but for various reasons it was not finalised until very recently. The motivation for this project was to study the behaviour of “random” quasiconformal maps. Recall that a (smooth) quasiconformal map is a homeomorphism ${f: {\bf C} \rightarrow {\bf C}}$ that obeys the Beltrami equation

$\displaystyle \frac{\partial f}{\partial \overline{z}} = \mu \frac{\partial f}{\partial z}$

for some Beltrami coefficient ${\mu: {\bf C} \rightarrow D(0,1)}$; this can be viewed as a deformation of the Cauchy-Riemann equation ${\frac{\partial f}{\partial \overline{z}} = 0}$. Assuming that ${f(z)}$ is asymptotic to ${z}$ at infinity, one can (formally, at least) solve for ${f}$ in terms of ${\mu}$ using the Beurling transform

$\displaystyle Tf(z) := \frac{\partial}{\partial z}\left(\frac{\partial}{\partial \overline{z}}\right)^{-1} f(z) = -\frac{1}{\pi} p.v. \int_{\bf C} \frac{f(w)}{(w-z)^2}\ dw$

by the Neumann series

$\displaystyle \frac{\partial f}{\partial \overline{z}} = \mu + \mu T \mu + \mu T \mu T \mu + \dots.$
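To see where this series comes from (at a purely formal level): writing ${h := \frac{\partial f}{\partial \overline{z}}}$, the normalisation that ${f(z)}$ is asymptotic to ${z}$ at infinity gives ${\frac{\partial f}{\partial z} = 1 + Th}$, so the Beltrami equation becomes

$\displaystyle h = \mu(1 + Th) = \mu + \mu T h,$

and iterating (or inverting ${1 - \mu T}$ by Neumann series, which converges for instance when ${\|\mu\|_{L^\infty}}$ is small enough that ${\mu T}$ is a contraction on a suitable space) yields the expansion above.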

We looked at the question of the asymptotic behaviour of ${f}$ if ${\mu = \mu_\delta}$ is a random field that oscillates at some fine spatial scale ${\delta>0}$. A simple model to keep in mind is

$\displaystyle \mu_\delta(z) = \varphi(z) \sum_{n \in {\bf Z}^2} \epsilon_n 1_{n\delta + [0,\delta]^2}(z) \ \ \ \ \ (1)$

where ${\epsilon_n = \pm 1}$ are independent random signs and ${\varphi: {\bf C} \rightarrow D(0,1)}$ is a bump function. For models such as these, we show that a homogenisation occurs in the limit ${\delta \rightarrow 0}$; each multilinear expression

$\displaystyle \mu_\delta T \mu_\delta \dots T \mu_\delta \ \ \ \ \ (2)$

converges weakly in probability (and almost surely, if we restrict ${\delta}$ to a lacunary sequence) to a deterministic limit, and the associated quasiconformal map ${f = f_\delta}$ similarly converges weakly in probability (or almost surely). (Results of this latter type were also recently obtained by Ivrii and Markovic by a more geometric method which is simpler, but is applied to a narrower class of Beltrami coefficients.) In the specific case (1), the limiting quasiconformal map is just the identity map ${f(z)=z}$, but if one for instance replaces the ${\epsilon_n}$ by non-symmetric random variables then one can have significantly more complicated limits. The convergence theorem for multilinear expressions such as (2) is not specific to the Beurling transform ${T}$; any other translation and dilation invariant singular integral can be used here.
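A crude Monte Carlo illustration of this homogenisation for the symmetric model (1), where the limit vanishes (the Gaussian test function and all numerical choices here are ours, purely for illustration): pairing ${\mu_\delta}$ against a fixed test function produces values of size ${O(\delta)}$, so the pairing concentrates at ${0}$ as ${\delta \rightarrow 0}$:

```python
import numpy as np

rng = np.random.default_rng(1)

def pairing_spread(delta, trials=200, R=3.0):
    """Empirical std of the pairing integral(mu_delta * psi) over random sign
    choices, with phi * psi modelled by a Gaussian weight, integrated cell by
    cell over [-R, R]^2 (each cell carries an independent sign eps_n)."""
    ns = np.arange(-int(R / delta), int(R / delta)) * delta
    X, Y = np.meshgrid(ns, ns)                   # lower-left cell corners
    weight = np.exp(-(X**2 + Y**2)) * delta**2   # (phi * psi) times cell area
    vals = [np.sum(rng.choice([-1.0, 1.0], size=weight.shape) * weight)
            for _ in range(trials)]
    return np.std(vals)

s_coarse, s_fine = pairing_spread(0.2), pairing_spread(0.02)
print(s_coarse, s_fine)  # the fine-scale pairing is markedly more concentrated
```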

The random expression (2) is somewhat reminiscent of a moment of a random matrix, and one can start computing it analogously. For instance, if one has a decomposition ${\mu_\delta = \sum_{n \in {\bf Z}^2} \mu_{\delta,n}}$ such as (1), then (2) expands out as a sum

$\displaystyle \sum_{n_1,\dots,n_k \in {\bf Z}^2} \mu_{\delta,n_1} T \mu_{\delta,n_2} \dots T \mu_{\delta,n_k}.$

The random fluctuations of this sum can be treated by a routine second moment estimate, and the main task is to show that the expected value

$\displaystyle \sum_{n_1,\dots,n_k \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_2} \dots T \mu_{\delta,n_k}) \ \ \ \ \ (3)$

becomes asymptotically independent of ${\delta}$.

If all the ${n_1,\dots,n_k}$ were distinct then one could use independence to factor the expectation to get

$\displaystyle \sum_{n_1,\dots,n_k \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_1}) T \mathop{\bf E}(\mu_{\delta,n_2}) \dots T \mathop{\bf E}(\mu_{\delta,n_k})$

which is a relatively straightforward expression to calculate (particularly in the model (1), where all the expectations here in fact vanish). The main difficulty is that there are a number of configurations in (3) in which various of the ${n_j}$ collide with each other, preventing one from easily factoring the expression. A typical problematic contribution for instance would be a sum of the form

$\displaystyle \sum_{n_1,n_2 \in {\bf Z}^2: n_1 \neq n_2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_2} T \mu_{\delta,n_1} T \mu_{\delta,n_2}). \ \ \ \ \ (4)$

This is an example of what we call a non-split sum. This can be compared with the split sum

$\displaystyle \sum_{n_1,n_2 \in {\bf Z}^2: n_1 \neq n_2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_1} T \mu_{\delta,n_2} T \mu_{\delta,n_2}). \ \ \ \ \ (5)$

If we ignore the constraint ${n_1 \neq n_2}$ in the latter sum, then it splits into

$\displaystyle f_\delta T g_\delta$

where

$\displaystyle f_\delta := \sum_{n_1 \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_1})$

and

$\displaystyle g_\delta := \sum_{n_2 \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_2} T \mu_{\delta,n_2})$

and one can hope to treat this sum by an induction hypothesis. (Actually dealing with constraints such as ${n_1 \neq n_2}$ requires an inclusion-exclusion argument that creates some notational headaches but is ultimately manageable.) As the name suggests, the non-split configurations such as (4) cannot be factored in this fashion, and are the most difficult to handle. A direct computation using the triangle inequality (and a certain amount of combinatorics and induction) reveals that these sums are somewhat localised, in that dyadic portions such as

$\displaystyle \sum_{n_1,n_2 \in {\bf Z}^2: |n_1 - n_2| \sim R} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_2} T \mu_{\delta,n_1} T \mu_{\delta,n_2})$

exhibit power decay in ${R}$ (when measured in suitable function space norms), basically because of the large number of times one has to transition back and forth between ${n_1}$ and ${n_2}$. Thus, morally at least, the dominant contribution to a non-split sum such as (4) comes from the local portion when ${n_2=n_1+O(1)}$. From the translation and dilation invariance of ${T}$ this type of expression then simplifies to something like

$\displaystyle \varphi(z)^4 \sum_{n \in {\bf Z}^2} \eta( \frac{n-z}{\delta} )$

(plus negligible errors) for some reasonably decaying function ${\eta}$, and this can be shown to converge to a weak limit as ${\delta \rightarrow 0}$.

In principle all of these limits are computable, but the combinatorics is remarkably complicated, and while there is certainly some algebraic structure to the calculations, it does not seem to be easily describable in terms of an existing framework (e.g., that of free probability).

Just a short note that the memorial article “Analysis and applications: The mathematical work of Elias Stein” has just been published in the Bulletin of the American Mathematical Society.  This article was a collective effort led by Charlie Fefferman, Alex Ionescu, Steve Wainger and myself to describe the various mathematical contributions of Elias Stein, who passed away in December 2018; it also features contributions from Loredana Lanzani, Akos Magyar, Mariusz Mirek, Alexander Nagel, Duong Phong, Lillian Pierce, Fulvio Ricci, Christopher Sogge, and Brian Street.  (My contribution was mostly focused on Stein’s contribution to restriction theory.)

Peter Denton, Stephen Parke, Xining Zhang, and I have just uploaded to the arXiv a completely rewritten version of our previous paper, now titled “Eigenvectors from Eigenvalues: a survey of a basic identity in linear algebra“. This paper is now a survey of the various literature surrounding the following basic identity in linear algebra, which we propose to call the eigenvector-eigenvalue identity:

Theorem 1 (Eigenvector-eigenvalue identity) Let ${A}$ be an ${n \times n}$ Hermitian matrix, with eigenvalues ${\lambda_1(A),\dots,\lambda_n(A)}$. Let ${v_i}$ be a unit eigenvector corresponding to the eigenvalue ${\lambda_i(A)}$, and let ${v_{i,j}}$ be the ${j^{th}}$ component of ${v_i}$. Then

$\displaystyle |v_{i,j}|^2 \prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A)) = \prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j))$

where ${M_j}$ is the ${(n-1) \times (n-1)}$ Hermitian matrix formed by deleting the ${j^{th}}$ row and column from ${A}$.
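The identity is easy to test numerically; here is a short check on a random Hermitian matrix (both sides of the identity are real):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2                     # random n x n Hermitian matrix

eigvals, eigvecs = np.linalg.eigh(A)         # columns of eigvecs are unit eigenvectors

max_err = 0.0
for i in range(n):
    for j in range(n):
        lhs = abs(eigvecs[j, i])**2 * np.prod(
            [eigvals[i] - eigvals[k] for k in range(n) if k != i])
        Mj = np.delete(np.delete(A, j, axis=0), j, axis=1)  # remove j-th row and column
        rhs = np.prod(eigvals[i] - np.linalg.eigvalsh(Mj))
        max_err = max(max_err, abs(lhs - rhs))
print(max_err)  # tiny (both sides agree to machine precision)
```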

When we posted the first version of this paper, we were unaware of previous appearances of this identity in the literature; a related identity had been used by Erdos-Schlein-Yau and by myself and Van Vu for applications to random matrix theory, but to our knowledge this specific identity appeared to be new. Even two months after our preprint first appeared on the arXiv in August, we had only learned of one other place in the literature where the identity showed up (by Forrester and Zhang, who also cite an earlier paper of Baryshnikov).

The situation changed rather dramatically with the publication of a popular science article in Quanta on this identity in November, which gave this result significantly more exposure. Within a few weeks we became informed (through private communication, online discussion, and exploration of the citation tree around the references we were alerted to) of over three dozen places where the identity, or some other closely related identity, had previously appeared in the literature, in such areas as numerical linear algebra, various aspects of graph theory (graph reconstruction, chemical graph theory, and walks on graphs), inverse eigenvalue problems, random matrix theory, and neutrino physics. As a consequence, we have decided to completely rewrite our article in order to collate this crowdsourced information, and survey the history of this identity, all the known proofs (we collect seven distinct ways to prove the identity (or generalisations thereof)), and all the applications of it that we are currently aware of. The citation graph of the literature that this ad hoc crowdsourcing effort produced is only very weakly connected, which we found surprising.

The earliest explicit appearance of the eigenvector-eigenvalue identity we are now aware of is in a 1966 paper of Thompson, although this paper is only cited (directly or indirectly) by a fraction of the known literature, and also there is a precursor identity of Löwner from 1934 that can be shown to imply the identity as a limiting case. At the end of the paper we speculate on some possible reasons why this identity only achieved a modest amount of recognition and dissemination prior to the November 2019 Quanta article.

Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Moore-Schmidt theorem“. This paper revisits a classical theorem of Moore and Schmidt in measurable cohomology of measure-preserving systems. To state the theorem, let ${X = (X,{\mathcal X},\mu)}$ be a probability space, and ${\mathrm{Aut}(X, {\mathcal X}, \mu)}$ be the group of measure-preserving automorphisms of this space, that is to say the invertible bimeasurable maps ${T: X \rightarrow X}$ that preserve the measure ${\mu}$: ${T_* \mu = \mu}$. To avoid some ambiguity later in this post when we introduce abstract analogues of measure theory, we will refer to measurable maps as concrete measurable maps, and measurable spaces as concrete measurable spaces. (One could also call ${X = (X,{\mathcal X}, \mu)}$ a concrete probability space, but we will not need to do so here as we will not be working explicitly with abstract probability spaces.)

Let ${\Gamma = (\Gamma,\cdot)}$ be a discrete group. A (concrete) measure-preserving action of ${\Gamma}$ on ${X}$ is a group homomorphism ${\gamma \mapsto T^\gamma}$ from ${\Gamma}$ to ${\mathrm{Aut}(X, {\mathcal X}, \mu)}$, thus ${T^1}$ is the identity map and ${T^{\gamma_1} \circ T^{\gamma_2} = T^{\gamma_1 \gamma_2}}$ for all ${\gamma_1,\gamma_2 \in \Gamma}$. A large portion of ergodic theory is concerned with the study of such measure-preserving actions, especially in the classical case when ${\Gamma}$ is the integers (with the additive group law).

Let ${K = (K,+)}$ be a compact Hausdorff abelian group, which we endow with the Borel ${\sigma}$-algebra ${{\mathcal B}(K)}$. A (concrete measurable) ${K}$-valued cocycle is a collection ${\rho = (\rho_\gamma)_{\gamma \in \Gamma}}$ of concrete measurable maps ${\rho_\gamma: X \rightarrow K}$ obeying the cocycle equation

$\displaystyle \rho_{\gamma_1 \gamma_2}(x) = \rho_{\gamma_1} \circ T^{\gamma_2}(x) + \rho_{\gamma_2}(x) \ \ \ \ \ (1)$

for ${\mu}$-almost every ${x \in X}$. (Here we are glossing over a measure-theoretic subtlety that we will return to later in this post – see if you can spot it before then!) Cocycles arise naturally in the theory of group extensions of dynamical systems; in particular (and ignoring the aforementioned subtlety), each cocycle induces a measure-preserving action ${\gamma \mapsto S^\gamma}$ on ${X \times K}$ (which we endow with the product of ${\mu}$ with Haar probability measure on ${K}$), defined by

$\displaystyle S^\gamma( x, k ) := (T^\gamma x, k + \rho_\gamma(x) ).$

This connection with group extensions was the original motivation for our study of measurable cohomology, but is not the focus of the current paper.
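To make the group extension construction concrete, here is a toy numerical check (my own illustration, not from the paper), taking ${\Gamma = {\bf Z}}$ acting on ${X = \mathbf{T}}$ by an irrational rotation, ${K = \mathbf{T}}$, and an arbitrary generating function; for simplicity only nonnegative powers are checked. Any function ${\rho_1}$ generates a cocycle via Birkhoff sums, and the skew product ${S^n}$ is then a genuine action:

```python
# Toy check that the skew product S^n(x,k) = (T^n x, k + rho_n(x)) defines an
# action of Z when rho_n is the Birkhoff-sum cocycle generated by rho_1.
import math

ALPHA = math.sqrt(2) % 1.0              # irrational rotation number (arbitrary)

def T(x, n=1):                          # base rotation T^n x = x + n*alpha mod 1
    return (x + n * ALPHA) % 1.0

def rho1(x):                            # arbitrary generator of the cocycle
    return (3.0 * x * x) % 1.0

def rho(n, x):                          # Birkhoff sum: rho_n = sum_{j<n} rho1 o T^j
    return sum(rho1(T(x, j)) for j in range(n)) % 1.0

def S(n, x, k):                         # the skew product action on X x K
    return T(x, n), (k + rho(n, x)) % 1.0

def close_mod1(a, b, tol=1e-9):         # equality in T = R/Z, up to rounding
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d) < tol

# S^{n+m} = S^n o S^m, which is equivalent to the cocycle equation (1)
x0, k0 = 0.1234, 0.5
for n in range(4):
    for m in range(4):
        lhs = S(n + m, x0, k0)
        rhs = S(n, *S(m, x0, k0))
        assert close_mod1(lhs[0], rhs[0]) and close_mod1(lhs[1], rhs[1])
```

The assertion in the double loop is exactly the cocycle equation (1) in action: ${\rho_{n+m}(x) = \rho_n(T^m x) + \rho_m(x)}$ in ${\mathbf{T}}$.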

A special case of a ${K}$-valued cocycle is a (concrete measurable) ${K}$-valued coboundary, in which ${\rho_\gamma}$ for each ${\gamma \in \Gamma}$ takes the special form

$\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)$

for ${\mu}$-almost every ${x \in X}$, where ${F: X \rightarrow K}$ is some measurable function; note that (ignoring the aforementioned subtlety) every function of this form is automatically a concrete measurable ${K}$-valued cocycle, thanks to the telescoping identity

$\displaystyle \rho_{\gamma_1 \gamma_2}(x) = (F \circ T^{\gamma_1} - F)(T^{\gamma_2} x) + (F \circ T^{\gamma_2}(x) - F(x)) = \rho_{\gamma_1} \circ T^{\gamma_2}(x) + \rho_{\gamma_2}(x),$

which is precisely the cocycle equation (1). One of the first basic questions in measurable cohomology is to try to characterize which ${K}$-valued cocycles are in fact ${K}$-valued coboundaries. This is a difficult question in general. However, there is a general result of Moore and Schmidt that at least allows one to reduce to the model case when ${K}$ is the unit circle ${\mathbf{T} = {\bf R}/{\bf Z}}$, by taking advantage of the Pontryagin dual group ${\hat K}$ of characters ${\hat k: K \rightarrow \mathbf{T}}$, that is to say the collection of continuous homomorphisms ${\hat k: k \mapsto \langle \hat k, k \rangle}$ to the unit circle. More precisely, we have

Theorem 1 (Countable Moore-Schmidt theorem) Let ${\Gamma}$ be a discrete group acting in a concrete measure-preserving fashion on a probability space ${X}$. Let ${K}$ be a compact Hausdorff abelian group. Assume the following additional hypotheses:

• (i) ${\Gamma}$ is at most countable.
• (ii) ${X}$ is a standard Borel space.
• (iii) ${K}$ is metrisable.

Then a ${K}$-valued concrete measurable cocycle ${\rho = (\rho_\gamma)_{\gamma \in \Gamma}}$ is a concrete coboundary if and only if for each character ${\hat k \in \hat K}$, the ${\mathbf{T}}$-valued cocycles ${\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}}$ are concrete coboundaries.

The hypotheses (i), (ii), (iii) are saying in some sense that the data ${\Gamma, X, K}$ are not too “large”; in all three cases they assert that the data are only “countably complicated”. For instance, (iii) is equivalent to ${K}$ being second countable, and (ii) is equivalent to ${X}$ being modeled by a complete separable metric space. It is because of this restriction that we refer to this result as a “countable” Moore-Schmidt theorem. This theorem is a useful tool in several other applications, such as the Host-Kra structure theorem for ergodic systems; I hope to return to these subsequent applications in a future post.

Let us very briefly sketch the main ideas of the proof of Theorem 1. Ignore for now issues of measurability, and pretend that something that holds almost everywhere in fact holds everywhere. The hard direction is to show that if each ${\langle \hat k, \rho \rangle}$ is a coboundary, then so is ${\rho}$. By hypothesis, we then have an equation of the form

$\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \alpha_{\hat k} \circ T^\gamma(x) - \alpha_{\hat k}(x) \ \ \ \ \ (2)$

for all ${\hat k, \gamma, x}$ and some functions ${\alpha_{\hat k}: X \rightarrow {\mathbf T}}$, and our task is then to produce a function ${F: X \rightarrow K}$ for which

$\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)$

for all ${\gamma,x}$.

Comparing the two equations, the task would be easy if we could find an ${F: X \rightarrow K}$ for which

$\displaystyle \langle \hat k, F(x) \rangle = \alpha_{\hat k}(x) \ \ \ \ \ (3)$

for all ${\hat k, x}$. However there is an obstruction to this: the left-hand side of (3) is additive in ${\hat k}$, so the right-hand side would have to be also in order to obtain such a representation. In other words, for this strategy to work, one would have to first establish the identity

$\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = 0 \ \ \ \ \ (4)$

for all ${\hat k_1, \hat k_2, x}$. On the other hand, the good news is that if we somehow manage to obtain equation (4), then we can obtain a function ${F}$ obeying (3), thanks to Pontryagin duality, which gives a one-to-one correspondence between ${K}$ and the homomorphisms from the (discrete) group ${\hat K}$ to ${\mathbf{T}}$.
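As a finite toy model of this duality step (my illustration, not from the paper): take ${K = {\bf Z}/N{\bf Z}}$, whose dual is again ${{\bf Z}/N{\bf Z}}$ with pairing ${\langle \hat k, k \rangle = \hat k k / N \hbox{ mod } 1}$. A family ${(\alpha_{\hat k})}$ that is additive in ${\hat k}$ then arises from pairing against a unique element of ${K}$, which can be recovered by brute force:

```python
# Finite Pontryagin duality toy: a T-valued family indexed by the dual of
# Z/N that is additive in the index is the pairing against a unique element
# of Z/N.  (N and k_secret are arbitrary choices for illustration.)
from fractions import Fraction

N = 12
k_secret = 7                              # hypothetical "hidden" group element

def alpha(khat):                          # a homomorphism from the dual into T = R/Z
    return Fraction(khat * k_secret, N) % 1

# additivity in khat, i.e. alpha is a character of the (discrete) dual group
for k1 in range(N):
    for k2 in range(N):
        assert (alpha(k1) + alpha(k2)) % 1 == alpha((k1 + k2) % N)

# duality: exactly one k in K satisfies alpha(khat) = <khat, k> for all khat
recovered = [k for k in range(N)
             if all(alpha(khat) == Fraction(khat * k, N) % 1
                    for khat in range(N))]
assert recovered == [k_secret]
```

Exact rational arithmetic (via `Fraction`) sidesteps the floating-point issues one would otherwise have when testing equality in ${\mathbf{T}}$.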

Now, it turns out that one cannot derive the equation (4) directly from the given information (2). However, the left-hand side of (2) is additive in ${\hat k}$, so the right-hand side must be also. Manipulating this fact, we eventually arrive at

$\displaystyle (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}) \circ T^\gamma(x) = (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2})(x).$
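To spell out this manipulation: additivity of the pairing in ${\hat k}$ gives

$\displaystyle \langle \hat k_1 + \hat k_2, \rho_\gamma(x) \rangle = \langle \hat k_1, \rho_\gamma(x) \rangle + \langle \hat k_2, \rho_\gamma(x) \rangle,$

and substituting (2) into each of the three pairings yields

$\displaystyle \alpha_{\hat k_1 + \hat k_2} \circ T^\gamma(x) - \alpha_{\hat k_1 + \hat k_2}(x) = \sum_{j=1,2} \left( \alpha_{\hat k_j} \circ T^\gamma(x) - \alpha_{\hat k_j}(x) \right);$

collecting the ${\circ T^\gamma}$ terms on one side and the remaining terms on the other gives the displayed invariance.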

In other words, we don’t get to show that the left-hand side of (4) vanishes, but we do at least get to show that it is ${\Gamma}$-invariant. Now let us assume for sake of argument that the action of ${\Gamma}$ is ergodic, which (ignoring issues about sets of measure zero) basically asserts that the only ${\Gamma}$-invariant functions are constant. So now we get a weaker version of (4), namely

$\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = c_{\hat k_1, \hat k_2} \ \ \ \ \ (5)$

for some constants ${c_{\hat k_1, \hat k_2} \in \mathbf{T}}$.

Now we need to eliminate the constants. This can be done by the following group-theoretic projection. Let ${L^0({\bf X} \rightarrow {\bf T})}$ denote the space of concrete measurable maps ${\alpha}$ from ${{\bf X}}$ to ${{\bf T}}$, up to almost everywhere equivalence; this is an abelian group where the various terms in (5) naturally live. Inside this group we have the subgroup ${{\bf T}}$ of constant functions (up to almost everywhere equivalence); this is where the right-hand side of (5) lives. Because ${{\bf T}}$ is a divisible group, an application of Zorn's lemma (a good exercise for those who are not acquainted with these things) shows that there exists a retraction ${w: L^0({\bf X} \rightarrow {\bf T}) \rightarrow {\bf T}}$, that is to say a group homomorphism that is the identity on the subgroup ${{\bf T}}$. We can use this retraction, or more precisely the complement ${\alpha \mapsto \alpha - w(\alpha)}$, to eliminate the constant in (5). Indeed, if we set

$\displaystyle \tilde \alpha_{\hat k}(x) := \alpha_{\hat k}(x) - w(\alpha_{\hat k})$

then from (5) we see that

$\displaystyle \tilde \alpha_{\hat k_1 + \hat k_2}(x) - \tilde \alpha_{\hat k_1}(x) - \tilde \alpha_{\hat k_2}(x) = 0$

while from (2) one has

$\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \tilde \alpha_{\hat k} \circ T^\gamma(x) - \tilde \alpha_{\hat k}(x)$

and now the previous strategy works with ${\alpha_{\hat k}}$ replaced by ${\tilde \alpha_{\hat k}}$. This concludes the sketch of proof of Theorem 1.

In making the above argument rigorous, the hypotheses (i)-(iii) are used in several places. For instance, to reduce to the ergodic case one relies on the ergodic decomposition, which requires hypothesis (ii). Also, most of the above equations only hold outside of a set of measure zero, and hypotheses (i) and (iii) (the latter being equivalent to ${\hat K}$ being at most countable) are needed to avoid the problem that an uncountable union of sets of measure zero could have positive measure (or fail to be measurable at all).

My co-author Asgar Jamneshan and I are working on a long-term project to extend many results in ergodic theory (such as the aforementioned Host-Kra structure theorem) to “uncountable” settings in which hypotheses analogous to (i)-(iii) are omitted; thus we wish to consider actions of uncountable groups, on spaces that are not standard Borel, and cocycles taking values in groups that are not metrisable. Such uncountable contexts naturally arise when trying to apply ergodic theory techniques to combinatorial problems (such as the inverse conjecture for the Gowers norms), as one often relies on the ultraproduct construction (or something similar) to generate an ergodic theory translation of these problems, and these constructions usually give “uncountable” objects rather than “countable” ones. (For instance, the ultraproduct of finite groups is a hyperfinite group, which is usually uncountable.) This paper marks the first step in this project by extending the Moore-Schmidt theorem to the uncountable setting.

If one simply drops the hypotheses (i)-(iii) and tries to prove the Moore-Schmidt theorem, several serious difficulties arise. We have already mentioned the loss of the ergodic decomposition and the possibility that one has to control an uncountable union of null sets. But there is in fact a more basic problem when one deletes (iii): the addition operation ${+: K \times K \rightarrow K}$, while still continuous, can fail to be measurable as a map from ${(K \times K, {\mathcal B}(K) \otimes {\mathcal B}(K))}$ to ${(K, {\mathcal B}(K))}$! Thus for instance the sum ${F_1 + F_2}$ of two measurable functions ${F_1, F_2: X \rightarrow K}$ need not remain measurable, which makes even the very definition of a measurable cocycle or measurable coboundary problematic (or at least unnatural). This phenomenon is known as the Nedoma pathology. A standard example arises when ${K}$ is the uncountable torus ${{\mathbf T}^{{\bf R}}}$, endowed with the product topology. Crucially, the Borel ${\sigma}$-algebra ${{\mathcal B}(K)}$ generated by this uncountable product is not the product ${{\mathcal B}(\mathbf{T})^{\otimes {\bf R}}}$ of the factor Borel ${\sigma}$-algebras (the discrepancy ultimately arises from the fact that topologies permit uncountable unions, but ${\sigma}$-algebras do not); relatedly, the product ${\sigma}$-algebra ${{\mathcal B}(K) \otimes {\mathcal B}(K)}$ is not the same as the Borel ${\sigma}$-algebra ${{\mathcal B}(K \times K)}$, but is instead a strict sub-algebra. If the group operations on ${K}$ were measurable, then the diagonal set

$\displaystyle K^\Delta := \{ (k,k') \in K \times K: k = k' \} = \{ (k,k') \in K \times K: k - k' = 0 \}$

would be measurable in ${{\mathcal B}(K) \otimes {\mathcal B}(K)}$. But it is an easy exercise in manipulation of ${\sigma}$-algebras to show that if ${(X, {\mathcal X}), (Y, {\mathcal Y})}$ are any two measurable spaces and ${E \subset X \times Y}$ is measurable in ${{\mathcal X} \otimes {\mathcal Y}}$, then the fibres ${E_x := \{ y \in Y: (x,y) \in E \}}$ of ${E}$ are contained in some countably generated subalgebra of ${{\mathcal Y}}$. Thus if ${K^\Delta}$ were ${{\mathcal B}(K) \otimes {\mathcal B}(K)}$-measurable, then all the points of ${K}$ would lie in a single countably generated ${\sigma}$-algebra. But the cardinality of such an algebra is at most ${2^{\aleph_0}}$ while the cardinality of ${K}$ is ${2^{2^{\aleph_0}}}$, and Cantor's theorem then gives a contradiction.

To resolve this problem, we give ${K}$ a coarser ${\sigma}$-algebra than the Borel ${\sigma}$-algebra, namely the Baire ${\sigma}$-algebra ${{\mathcal B}^\otimes(K)}$, thus coarsening the measurable space structure on ${K = (K,{\mathcal B}(K))}$ to a new measurable space ${K_\otimes := (K, {\mathcal B}^\otimes(K))}$. In the case of compact Hausdorff abelian groups, ${{\mathcal B}^{\otimes}(K)}$ can be defined as the ${\sigma}$-algebra generated by the characters ${\hat k: K \rightarrow {\mathbf T}}$; for more general compact abelian groups, one can define ${{\mathcal B}^{\otimes}(K)}$ as the ${\sigma}$-algebra generated by all continuous maps into metric spaces. This ${\sigma}$-algebra is equal to ${{\mathcal B}(K)}$ when ${K}$ is metrisable but can be smaller for other ${K}$. With this measurable structure, ${K_\otimes}$ becomes a measurable group; it seems that, once one leaves the metrisable world, ${K_\otimes}$ is a superior (or at least equally good) space to work with for analysis than ${K}$, as it avoids the Nedoma pathology. (For instance, from Plancherel's theorem, we see that if ${m_K}$ is the Haar probability measure on ${K}$, then ${L^2(K,m_K) = L^2(K_\otimes,m_K)}$ (thus, every ${K}$-measurable set is equivalent modulo ${m_K}$-null sets to a ${K_\otimes}$-measurable set), so there is no damage to Plancherel caused by passing to the Baire ${\sigma}$-algebra.)

Passing to the Baire ${\sigma}$-algebra ${K_\otimes}$ fixes the most severe problems with an uncountable Moore-Schmidt theorem, but one is still faced with an issue of having to potentially take an uncountable union of null sets. To avoid this sort of problem, we pass to the framework of abstract measure theory, in which we remove explicit mention of “points” and can easily delete all null sets at a very early stage of the formalism. In this setup, the category of concrete measurable spaces is replaced with the larger category of abstract measurable spaces, which we formally define as the opposite category of the category of ${\sigma}$-algebras (with Boolean algebra homomorphisms). Thus, we define an abstract measurable space to be an object of the form ${{\mathcal X}^{\mathrm{op}}}$, where ${{\mathcal X}}$ is an (abstract) ${\sigma}$-algebra and ${\mathrm{op}}$ is a formal placeholder symbol that signifies use of the opposite category, and an abstract measurable map ${T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}}$ is an object of the form ${(T^*)^{\mathrm{op}}}$, where ${T^*: {\mathcal Y} \rightarrow {\mathcal X}}$ is a Boolean algebra homomorphism and ${\mathrm{op}}$ is again used as a formal placeholder; we call ${T^*}$ the pullback map associated to ${T}$.  [UPDATE: It turns out that this definition of a measurable map led to technical issues.  In a forthcoming revision of the paper we also impose the requirement that the abstract measurable map be $\sigma$-complete (i.e., it respects countable joins).] The composition ${S \circ T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}}$ of two abstract measurable maps ${T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}}$, ${S: {\mathcal Y}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}}$ is defined by the formula ${S \circ T := (T^* \circ S^*)^{\mathrm{op}}}$, or equivalently ${(S \circ T)^* = T^* \circ S^*}$.
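A minimal concrete instance of this contravariance (a finite toy of my own, not from the paper): for concrete maps, the preimage operation ${T^* E := T^{-1}(E)}$ is a Boolean algebra homomorphism, and such pullbacks compose in the reverse order:

```python
# Finite toy: the pullback E -> T^{-1}(E) of a concrete map is a Boolean
# algebra homomorphism, and pullbacks compose contravariantly:
# (S o T)^* = T^* o S^*.
X = {0, 1, 2, 3}
Y = {"a", "b"}
Z = {True, False}

T = {0: "a", 1: "a", 2: "b", 3: "b"}      # a concrete map T: X -> Y
S = {"a": True, "b": False}               # a concrete map S: Y -> Z
ST = {x: S[T[x]] for x in X}              # the concrete composition S o T

def pullback(f, E):                       # f^* E := f^{-1}(E)
    return frozenset(x for x in f if f[x] in E)

for E in [frozenset(), frozenset({True}), frozenset({False}), frozenset(Z)]:
    # homomorphism property, e.g. compatibility with complements
    assert pullback(S, frozenset(Z) - E) == frozenset(Y) - pullback(S, E)
    # contravariance: (S o T)^* = T^* o S^*
    assert pullback(ST, E) == pullback(T, pullback(S, E))
```

It is exactly this reversal of arrows that the opposite category bookkeeping above records, so that concrete maps embed covariantly into the abstract category.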

Every concrete measurable space ${X = (X,{\mathcal X})}$ can be identified with an abstract counterpart ${{\mathcal X}^{\mathrm{op}}}$, and similarly every concrete measurable map ${T: X \rightarrow Y}$ can be identified with an abstract counterpart ${(T^*)^{\mathrm{op}}}$, where ${T^*: {\mathcal Y} \rightarrow {\mathcal X}}$ is the pullback map ${T^* E := T^{-1}(E)}$. Thus the category of concrete measurable spaces can be viewed as a subcategory of the category of abstract measurable spaces. The advantage of working in the abstract setting is that it gives us access to more spaces that could not be directly defined in the concrete setting. Most importantly for us, we have a new abstract space, the opposite measure algebra ${X_\mu}$ of ${X}$, defined as ${({\bf X}/{\bf N})^{\mathrm{op}}}$ where ${{\bf N}}$ is the ideal of null sets in ${{\bf X}}$. Informally, ${X_\mu}$ is the space ${X}$ with all the null sets removed; there is a canonical abstract embedding map ${\iota: X_\mu \rightarrow X}$, which allows one to convert any concrete measurable map ${f: X \rightarrow Y}$ into an abstract one ${[f]: X_\mu \rightarrow Y}$. One can then define the notion of an abstract action, abstract cocycle, and abstract coboundary by replacing every occurrence of the category of concrete measurable spaces with their abstract counterparts, and replacing ${X}$ with the opposite measure algebra ${X_\mu}$; see the paper for details. Our main theorem is then

Theorem 2 (Uncountable Moore-Schmidt theorem) Let ${\Gamma}$ be a discrete group acting abstractly on a ${\sigma}$-finite measure space ${X}$. Let ${K}$ be a compact Hausdorff abelian group. Then a ${K_\otimes}$-valued abstract measurable cocycle ${\rho = (\rho_\gamma)_{\gamma \in \Gamma}}$ is an abstract coboundary if and only if for each character ${\hat k \in \hat K}$, the ${\mathbf{T}}$-valued cocycles ${\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}}$ are abstract coboundaries.

With the abstract formalism, the proof of the uncountable Moore-Schmidt theorem is almost identical to the countable one (in fact we were able to make some simplifications, such as avoiding the use of the ergodic decomposition). A key tool is what we call a “conditional Pontryagin duality” theorem, which asserts that if one has an abstract measurable map ${\alpha_{\hat k}: X_\mu \rightarrow {\bf T}}$ for each ${\hat k \in \hat K}$ obeying the identity ${ \alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2} = 0}$ for all ${\hat k_1,\hat k_2 \in \hat K}$, then there is an abstract measurable map ${F: X_\mu \rightarrow K_\otimes}$ such that ${\alpha_{\hat k} = \langle \hat k, F \rangle}$ for all ${\hat k \in \hat K}$. This is derived from the usual Pontryagin duality and some other tools, most notably the completeness of the ${\sigma}$-algebra of ${X_\mu}$, and the Sikorski extension theorem.

We feel that it is natural to stay within the abstract measure theory formalism whenever dealing with uncountable situations. However, it is still an interesting question as to when one can guarantee that the abstract objects constructed in this formalism are representable by concrete analogues. The basic questions in this regard are:

• (i) Suppose one has an abstract measurable map ${f: X_\mu \rightarrow Y}$ into a concrete measurable space. Does there exist a representation of ${f}$ by a concrete measurable map ${\tilde f: X \rightarrow Y}$? Is it unique up to almost everywhere equivalence?
• (ii) Suppose one has a concrete cocycle that is an abstract coboundary. When can it be represented by a concrete coboundary?

For (i) the answer is somewhat interesting (as I learned after posing this MathOverflow question):

• If ${Y}$ does not separate points, or is not compact metrisable or Polish, there can be counterexamples to uniqueness. If ${Y}$ is not compact or Polish, there can be counterexamples to existence.
• If ${Y}$ is a compact metric space or a Polish space, then one always has existence and uniqueness.
• If ${Y}$ is a compact Hausdorff abelian group, one always has existence.
• If ${X}$ is a complete measure space, then one always has existence (from a theorem of Maharam).
• If ${X}$ is the unit interval with the Borel ${\sigma}$-algebra and Lebesgue measure, then one has existence for all compact Hausdorff ${Y}$ assuming the continuum hypothesis (from a theorem of von Neumann) but existence can fail under other extensions of ZFC (from a theorem of Shelah, using the method of forcing).
• For more general ${X}$, existence for all compact Hausdorff ${Y}$ is equivalent to the existence of a lifting from the ${\sigma}$-algebra ${\mathcal{X}/\mathcal{N}}$ to ${\mathcal{X}}$ (or, in the language of abstract measurable spaces, the existence of an abstract retraction from ${X}$ to ${X_\mu}$).
• It is a long-standing open question (posed for instance by Fremlin) whether it is relatively consistent with ZFC that existence holds whenever ${Y}$ is compact Hausdorff.

Our understanding of (ii) is much less complete:

• If ${K}$ is metrisable, the answer is “always” (which among other things establishes the countable Moore-Schmidt theorem as a corollary of the uncountable one).
• If ${\Gamma}$ is at most countable and ${X}$ is a complete measure space, then the answer is again “always”.

In view of the answers to (i), I would not be surprised if the full answer to (ii) was also sensitive to axioms of set theory. However, such set theoretic issues seem to be almost completely avoided if one sticks with the abstract formalism throughout; they only arise when trying to pass back and forth between the abstract and concrete categories.