
Ben Green and I have updated our paper “An arithmetic regularity lemma, an associated counting lemma, and applications” to account for a somewhat serious issue with the paper that was pointed out to us recently by Daniel Altman. This paper contains two core theorems:

  • An “arithmetic regularity lemma” that, roughly speaking, decomposes an arbitrary bounded sequence {f(n)} on an interval {\{1,\dots,N\}} as an “irrational nilsequence” {F(g(n) \Gamma)} of controlled complexity, plus some “negligible” errors (where one uses the Gowers uniformity norm as the main norm to control the negligibility of the error); and
  • An “arithmetic counting lemma” that gives an asymptotic formula for counting various averages {{\mathbb E}_{{\bf n} \in {\bf Z}^d \cap P} f(\psi_1({\bf n})) \dots f(\psi_t({\bf n}))} for various affine-linear forms {\psi_1,\dots,\psi_t} when the functions {f} are given by irrational nilsequences.

The combination of the two theorems is then used to address various questions in additive combinatorics.

There are no direct issues with the arithmetic regularity lemma. However, it turns out that the arithmetic counting lemma is only true if one imposes an additional property (which we call the “flag property”) on the affine-linear forms {\psi_1,\dots,\psi_t}. Without this property, there does not appear to be a clean asymptotic formula for these averages if the only hypothesis one places on the underlying nilsequences is irrationality. Thus when trying to understand the asymptotics of averages involving linear forms that do not obey the flag property, the paradigm of understanding these averages via a combination of the regularity lemma and a counting lemma seems to require some significant revision (in particular, one would probably have to replace the existing regularity lemma with some variant, despite the fact that the lemma is still technically true in this setting). Fortunately, for most applications studied to date (including the important subclass of translation-invariant affine forms), the flag property holds; however our claim in the paper to have resolved a conjecture of Gowers and Wolf on the true complexity of systems of affine forms must now be narrowed, as our methods only verify this conjecture under the assumption of the flag property.

In a bit more detail: the asymptotic formula for our counting lemma involved some finite-dimensional vector spaces {\Psi^{[i]}} for various natural numbers {i}, defined as the linear span of the vectors {(\psi^i_1({\bf n}), \dots, \psi^i_t({\bf n}))} as {{\bf n}} ranges over the parameter space {{\bf Z}^d}. Roughly speaking, these spaces encode some constraints one would expect to see amongst the forms {\psi^i_1({\bf n}), \dots, \psi^i_t({\bf n})}. For instance, in the case of length four arithmetic progressions when {d=2}, {{\bf n} = (n,r)}, and

\displaystyle  \psi_i({\bf n}) = n + (i-1)r

for {i=1,2,3,4}, then {\Psi^{[1]}} is spanned by the vectors {(1,1,1,1)} and {(1,2,3,4)} and can thus be described as the two-dimensional linear space

\displaystyle  \Psi^{[1]} = \{ (a,b,c,d): a-2b+c = b-2c+d = 0\} \ \ \ \ \ (1)

while {\Psi^{[2]}} is spanned by the vectors {(1,1,1,1)}, {(1,2,3,4)}, {(1^2,2^2,3^2,4^2)} and can be described as the hyperplane

\displaystyle  \Psi^{[2]} = \{ (a,b,c,d): a-3b+3c-d = 0 \}. \ \ \ \ \ (2)

As a special case of the counting lemma, we can check that if {f} takes the form {f(n) = F( \alpha n, \beta n^2 + \gamma n)} for some irrational {\alpha,\beta \in {\bf R}/{\bf Z}}, some arbitrary {\gamma \in {\bf R}/{\bf Z}}, and some smooth {F: {\bf R}/{\bf Z} \times {\bf R}/{\bf Z} \rightarrow {\bf C}}, then the limiting value of the average

\displaystyle  {\bf E}_{n, r \in [N]} f(n) f(n+r) f(n+2r) f(n+3r)

as {N \rightarrow \infty} is equal to

\displaystyle  \int_{a_1,b_1,c_1,d_1 \in {\bf R}/{\bf Z}: a_1-2b_1+c_1=b_1-2c_1+d_1=0} \int_{a_2,b_2,c_2,d_2 \in {\bf R}/{\bf Z}: a_2-3b_2+3c_2-d_2=0}

\displaystyle  F(a_1,a_2) F(b_1,b_2) F(c_1,c_2) F(d_1,d_2)

which reflects the constraints

\displaystyle  \alpha n - 2 \alpha(n+r) + \alpha(n+2r) = \alpha(n+r) - 2\alpha(n+2r)+\alpha(n+3r)=0

and

\displaystyle  (\beta n^2 + \gamma n) - 3 (\beta(n+r)^2+\gamma(n+r))

\displaystyle + 3 (\beta(n+2r)^2 +\gamma(n+2r)) - (\beta(n+3r)^2+\gamma(n+3r))=0.

These constraints follow from the descriptions (1), (2), using the containment {\Psi^{[1]} \subset \Psi^{[2]}} to dispense with the lower order term {\gamma n} (which then plays no further role in the analysis).
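As a quick sanity check, both constraints can be confirmed as polynomial identities in {n} and {r}. The following minimal Python sketch is not taken from the paper; the helper names are purely illustrative, and integer values stand in for {\alpha,\beta,\gamma}, which is harmless because the identities are polynomial in all variables.

```python
from itertools import product

def ap_forms(n, r):
    """The four forms psi_i(n, r) = n + (i-1) r of a length four progression."""
    return [n + i * r for i in range(4)]

def linear_constraints(alpha, n, r):
    """Evaluate the two linear forms cutting out Psi^[1] on the phases alpha * psi_i."""
    a, b, c, d = (alpha * x for x in ap_forms(n, r))
    return (a - 2 * b + c, b - 2 * c + d)

def quadratic_constraint(beta, gamma, n, r):
    """Evaluate a - 3b + 3c - d on the phases beta * psi_i^2 + gamma * psi_i."""
    a, b, c, d = (beta * x * x + gamma * x for x in ap_forms(n, r))
    return a - 3 * b + 3 * c - d

# Integer stand-ins for alpha, beta, gamma (an assumption made for exact arithmetic).
for n, r in product(range(-5, 6), repeat=2):
    assert linear_constraints(7, n, r) == (0, 0)
    # The lower order term gamma * psi_i drops out as well, reflecting
    # the containment of Psi^[1] in Psi^[2].
    assert quadratic_constraint(3, 11, n, r) == 0
```

Note that the second check passes for every value of the linear coefficient, which is exactly the role played by the containment {\Psi^{[1]} \subset \Psi^{[2]}}.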

The arguments in our paper turn out to be perfectly correct under the assumption of the “flag property” that {\Psi^{[i]} \subset \Psi^{[i+1]}} for all {i}. The problem is that the flag property turns out to not always hold. A counterexample, provided by Daniel Altman, involves the four linear forms

\displaystyle  \psi_1(n,r) = r; \psi_2(n,r) = 2n+2r; \psi_3(n,r) = n+3r; \psi_4(n,r) = n.

Here it turns out that

\displaystyle  \Psi^{[1]} = \{ (a,b,c,d): c-d=3a; b-2a=2d\}

and

\displaystyle  \Psi^{[2]} = \{ (a,b,c,d): 24a+3b-4c-8d=0 \}

and {\Psi^{[1]}} is no longer contained in {\Psi^{[2]}}. The analogue of the asymptotic formula given previously for {f(n) = F( \alpha n, \beta n^2 + \gamma n)} is then valid when {\gamma} vanishes, but not when {\gamma} is non-zero, because the identity

\displaystyle  24 (\beta \psi_1(n,r)^2 + \gamma \psi_1(n,r)) + 3 (\beta \psi_2(n,r)^2 + \gamma \psi_2(n,r))

\displaystyle - 4 (\beta \psi_3(n,r)^2 + \gamma \psi_3(n,r)) - 8 (\beta \psi_4(n,r)^2 + \gamma \psi_4(n,r)) = 0

holds in the former case but not the latter. Thus the output of any purported arithmetic regularity lemma in this case is now sensitive to the lower order terms of the nilsequence and cannot be described in a uniform fashion for all “irrational” sequences. There should still be some sort of formula for the asymptotics from the general equidistribution theory of nilsequences, but it could be considerably more complicated than what is presented in this paper.
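The sensitivity to {\gamma} is easy to confirm by direct computation. In the following minimal Python sketch (the function names are illustrative, and integers stand in for {\beta,\gamma} since the expression is polynomial), the left-hand side of the identity collapses to {\gamma(18r-6n)}, which vanishes identically only when {\gamma=0}.

```python
def altman_forms(n, r):
    """The four linear forms psi_1, ..., psi_4 of the counterexample."""
    return [r, 2 * n + 2 * r, n + 3 * r, n]

def weighted_sum(beta, gamma, n, r):
    """Evaluate 24a + 3b - 4c - 8d on the phases beta * psi_i^2 + gamma * psi_i."""
    a, b, c, d = (beta * x * x + gamma * x for x in altman_forms(n, r))
    return 24 * a + 3 * b - 4 * c - 8 * d

for n in range(-4, 5):
    for r in range(-4, 5):
        # Pure quadratic phases (gamma = 0) obey the constraint defining Psi^[2] ...
        assert weighted_sum(5, 0, n, r) == 0
        # ... but a linear term gamma * psi_i leaves the residue gamma * (18 r - 6 n).
        assert weighted_sum(5, 3, n, r) == 3 * (18 * r - 6 * n)
```

The residue {18r-6n} is just the linear form {24\psi_1+3\psi_2-4\psi_3-8\psi_4}, whose non-vanishing is another way of saying that {\Psi^{[1]}} is not contained in {\Psi^{[2]}}.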

Fortunately, the flag property does hold in several key cases, most notably the translation invariant case when {\Psi^{[1]}} contains {(1,\dots,1)}, as well as “complexity one” cases. Nevertheless non-flag property systems of affine forms do exist, thus limiting the range of applicability of the techniques in this paper. In particular, the conjecture of Gowers and Wolf (Theorem 1.13 in the paper) is now open again in the non-flag property case.

Several years ago, I developed a public lecture on the cosmic distance ladder in astronomy from a historical perspective (and emphasising the role of mathematics in building the ladder). I previously blogged about the lecture here; the most recent version of the slides can be found here. Recently, I have begun working with Tanya Klowden (a long time friend with a background in popular writing on a variety of topics, including astronomy) to expand the lecture into a popular science book, with the tentative format being non-technical chapters interspersed with some more mathematical sections to give some technical details. We are still in the middle of the writing process, but we have produced a sample chapter (which deals with what we call the “fourth rung” of the distance ladder – the distances and orbits of the planets – and how the work of Copernicus, Brahe, Kepler and others led to accurate measurements of these orbits, as well as Kepler’s famous laws of planetary motion). As always, any feedback on the chapter is welcome. (Due to various pandemic-related uncertainties, we do not have a definite target deadline for when the book will be completed, but presumably this will occur sometime in the next year.)

The book is currently under contract with Yale University Press. My coauthor Tanya Klowden can be reached at tklowden@gmail.com.

Rachel Greenfeld and I have just uploaded to the arXiv our paper “The structure of translational tilings in {{\bf Z}^d}”. This paper studies the tilings {1_F * 1_A = 1} of a finite tile {F} in a standard lattice {{\bf Z}^d}, that is to say sets {A \subset {\bf Z}^d} (which we call tiling sets) such that every element of {{\bf Z}^d} lies in exactly one of the translates {a+F, a \in A} of {F}. We also consider more general tilings {1_F * 1_A = k} of level {k} for a natural number {k} (several of our results consider an even more general setting in which {1_F * 1_A} is periodic but allowed to be non-constant).

In many cases the tiling set {A} will be periodic (by which we mean translation invariant with respect to some lattice (a finite index subgroup) of {{\bf Z}^d}). For instance one simple example of a tiling is when {F \subset {\bf Z}^2} is the unit square {F = \{0,1\}^2} and {A} is the lattice {2{\bf Z}^2 = \{ 2x: x \in {\bf Z}^2\}}. However one can modify some tilings to make them less periodic. For instance, keeping {F = \{0,1\}^2} one also has the tiling set

\displaystyle  A = \{ (2x, 2y+a(x)): x,y \in {\bf Z} \}

where {a: {\bf Z} \rightarrow \{0,1\}} is an arbitrary function. This tiling set is periodic in a single direction {(0,2)}, but is not doubly periodic. For the slightly modified tile {F = \{0,1\} \times \{0,2\}}, the set

\displaystyle  A = \{ (2x, 4y+2a(x)): x,y \in {\bf Z} \} \cup \{ (2x+b(y), 4y+1): x,y \in {\bf Z}\}

for arbitrary {a,b: {\bf Z} \rightarrow \{0,1\}} can be verified to be a tiling set, which in general will not exhibit any periodicity whatsoever; however, it is weakly periodic in the sense that it is the disjoint union of finitely many sets, each of which is periodic in one direction.
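Both examples can be checked mechanically once {a} and {b} are taken to be periodic, since the tiling then descends to a finite torus. The sketch below is only an illustration under that extra assumption: the periods `P`, `Q`, the random choice of {a,b}, and all function names are hypothetical and not part of the paper.

```python
import random
from collections import Counter

def covering_counts(tile, tiling_set, width, height):
    """Count how often each cell of the torus (Z/width) x (Z/height) is covered."""
    counts = Counter()
    for ax, ay in tiling_set:
        for fx, fy in tile:
            counts[((ax + fx) % width, (ay + fy) % height)] += 1
    return counts

def tiles_torus(tile, tiling_set, width, height, level=1):
    """True if every cell of the torus is covered exactly `level` times."""
    counts = covering_counts(tile, tiling_set, width, height)
    return all(counts[(x, y)] == level
               for x in range(width) for y in range(height))

def build_examples(P=5, Q=3, seed=0):
    """Fundamental domains of the two tiling sets, for random periodic a and b."""
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(P)]  # a(x), extended P-periodically
    b = [rng.randint(0, 1) for _ in range(Q)]  # b(y), extended Q-periodically
    square = [(0, 0), (1, 0), (0, 1), (1, 1)]
    A_square = {(2 * x, 2 * y + a[x]) for x in range(P) for y in range(2 * Q)}
    stretched = [(0, 0), (1, 0), (0, 2), (1, 2)]
    A_stretched = (
        {(2 * x, 4 * y + 2 * a[x]) for x in range(P) for y in range(Q)}
        | {(2 * x + b[y], 4 * y + 1) for x in range(P) for y in range(Q)}
    )
    # Both sets are invariant under (2P, 0) and (0, 4Q), so we work modulo these.
    return 2 * P, 4 * Q, (square, A_square), (stretched, A_stretched)

width, height, (square, A_square), (stretched, A_stretched) = build_examples()
assert tiles_torus(square, A_square, width, height)
assert tiles_torus(stretched, A_stretched, width, height)
```

The same `tiles_torus` helper accepts a `level` argument, so it can also be used to test periodic level {k} tilings {1_F * 1_A = k}.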

The most well known conjecture in this area is the Periodic Tiling Conjecture:

Conjecture 1 (Periodic tiling conjecture) If a finite tile {F \subset {\bf Z}^d} has at least one tiling set, then it has a tiling set which is periodic.

This conjecture was stated explicitly by Lagarias and Wang, and also appears implicitly in this text of Grünbaum and Shephard. In one dimension {d=1} there is a simple pigeonhole principle argument of Newman that shows that all tiling sets are in fact periodic, which certainly implies the periodic tiling conjecture in this case. The {d=2} case was settled more recently by Bhattacharya, but the higher dimensional cases {d > 2} remain open in general.

We are able to obtain a new proof of Bhattacharya’s result that also gives some quantitative bounds on the periodic tiling set, which are polynomial in the diameter of the set if the cardinality {|F|} of the tile is bounded:

Theorem 2 (Quantitative periodic tiling in {{\bf Z}^2}) If a finite tile {F \subset {\bf Z}^2} has at least one tiling set, then it has a tiling set which is {M{\bf Z}^2}-periodic for some {M \ll_{|F|} \mathrm{diam}(F)^{O(|F|^4)}}.

Among other things, this shows that the problem of deciding whether a given subset of {{\bf Z}^2} of bounded cardinality tiles {{\bf Z}^2} or not is in the NP complexity class with respect to the diameter {\mathrm{diam}(F)}. (Even the decidability of this problem was not known until the result of Bhattacharya.)

We also have a closely related structural theorem:

Theorem 3 (Quantitative weakly periodic tiling in {{\bf Z}^2}) Every tiling set of a finite tile {F \subset {\bf Z}^2} is weakly periodic. In fact, the tiling set is the union of at most {|F|-1} disjoint sets, each of which is periodic in a direction of magnitude {O_{|F|}( \mathrm{diam}(F)^{O(|F|^2)})}.

We also have a new bound for the periodicity of tilings in {{\bf Z}}:

Theorem 4 (Universal period for tilings in {{\bf Z}}) Let {F \subset {\bf Z}} be finite, and normalized so that {0 \in F}. Then every tiling set of {F} is {qn}-periodic, where {q} is the least common multiple of all primes up to {2|F|}, and {n} is the least common multiple of the magnitudes {|f|} of all {f \in F \backslash \{0\}}.
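To get a feel for the size of the period in Theorem 4, one can compute {qn} for a small tile. The sketch below is illustrative only (the function names are made up here); it uses the tile {F = \{0,1,4,5\}}, which tiles {{\bf Z}} with tiling set {A = \{0,2\} + 8{\bf Z}}, and merely confirms that this one tiling set is consistent with the theorem.

```python
from math import lcm  # multi-argument lcm requires Python 3.9+

def primes_up_to(m):
    """All primes p <= m, by trial division (fine for small m)."""
    return [p for p in range(2, m + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def universal_period(tile):
    """The period q*n from Theorem 4, for a finite tile in Z containing 0."""
    assert 0 in tile
    q = lcm(*primes_up_to(2 * len(tile)))
    n = lcm(*(abs(f) for f in tile if f != 0))
    return q * n

def is_periodic(residues, modulus, period):
    """Check that residues + modulus*Z is invariant under x -> x + period."""
    shifted = {(x + period) % modulus for x in residues}
    return shifted == {x % modulus for x in residues}

tile = [0, 1, 4, 5]
period = universal_period(tile)   # q = 2*3*5*7 = 210, n = lcm(1, 4, 5) = 20
assert period == 4200
assert is_periodic({0, 2}, 8, period)
```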

We remark that the current best complexity bound of determining whether a subset of {{\bf Z}} tiles {{\bf Z}} or not is {O( \exp(\mathrm{diam}(F)^{1/3+o(1)}))}, due to Biró. It may be that the results in this paper can improve upon this bound, at least for tiles of bounded cardinality.

On the other hand, we discovered a genuine difference between level one tiling and higher level tiling, by locating a counterexample to the higher level analogue of (the qualitative version of) Theorem 3:

Theorem 5 (Counterexample) There exists an eight-element subset {F \subset {\bf Z}^2} and a level {4} tiling {1_F * 1_A = 4} such that {A} is not weakly periodic.

We do not know if there is a corresponding counterexample to the higher level periodic tiling conjecture (that if {F} tiles {{\bf Z}^d} at level {k}, then there is a periodic tiling at the same level {k}). Note that it is important to keep the level fixed, since one trivially always has a periodic tiling at level {|F|} from the identity {1_F * 1 = |F|}.

The methods of Bhattacharya used the language of ergodic theory. Our investigations also originally used ergodic-theoretic and Fourier-analytic techniques, but we ultimately found combinatorial methods to be more effective in this problem (in particular, they led to quite strong quantitative bounds). The engine powering all of our results is the following remarkable fact, valid in all dimensions:

Lemma 6 (Dilation lemma) Suppose that {A} is a tiling of a finite tile {F \subset {\bf Z}^d}. Then {A} is also a tiling of the dilated tile {rF} for any {r} coprime to {n}, where {n} is the least common multiple of all the primes up to {|F|}.

Versions of this dilation lemma have previously appeared in work of Tijdeman and of Bhattacharya. We sketch a proof here. By the fundamental theorem of arithmetic and iteration it suffices to establish the case where {r} is a prime {p>|F|}. We need to show that {1_{pF} * 1_A = 1}. It suffices to show the claim {1_{pF} * 1_A = 1 \hbox{ mod } p}, since both sides take values in {\{0,\dots,|F|\} \subset \{0,\dots,p-1\}}. The convolution algebra {{\bf F}_p[{\bf Z}^d]} (or group algebra) of finitely supported functions from {{\bf Z}^d} to {{\bf F}_p} is a commutative algebra of characteristic {p}, so we have the Frobenius identity {(f+g)^{*p} = f^{*p} + g^{*p}} for any {f,g}. As a consequence we see that {1_{pF} = 1_F^{*p} \hbox{ mod } p}. The claim now follows by convolving the identity {1_F * 1_A = 1 \hbox{ mod } p} by {p-1} further copies of {1_F}.
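Both steps of this sketch are easy to test numerically. The following Python fragment is an illustration, not code from the paper: functions on {{\bf Z}^d} are represented as dictionaries from points to values mod {p}, all names are hypothetical, and the one-dimensional tiling {F = \{0,1,4,5\}}, {A = \{0,2\}+8{\bf Z}} is just a convenient example.

```python
from math import gcd

def convolve_mod(f, g, p):
    """Convolution of finitely supported functions on Z^d, with values mod p."""
    out = {}
    for x, fx in f.items():
        for y, gy in g.items():
            z = tuple(s + t for s, t in zip(x, y))
            out[z] = (out.get(z, 0) + fx * gy) % p
    return {z: v for z, v in out.items() if v}

def conv_power_mod(f, p):
    """The p-fold convolution power f^{*p}, reduced mod p."""
    dim = len(next(iter(f)))
    result = {(0,) * dim: 1}  # Kronecker delta at the origin
    for _ in range(p):
        result = convolve_mod(result, f, p)
    return result

def indicator(points):
    return {tuple(pt): 1 for pt in points}

def dilate(points, r):
    return [tuple(r * c for c in pt) for pt in points]

def tiles_cyclic(tile, residues, modulus):
    """True if tile + residues covers each class of Z/modulus exactly once."""
    counts = [0] * modulus
    for a in residues:
        for f in tile:
            counts[(a + f) % modulus] += 1
    return all(c == 1 for c in counts)

# Frobenius step: 1_F^{*p} = 1_{pF} mod p, here for a four-element tile in Z^2.
F = [(0, 0), (1, 0), (0, 2), (1, 2)]
for p in (5, 7):
    assert conv_power_mod(indicator(F), p) == indicator(dilate(F, p))

# Conclusion of Lemma 6 in Z: here |F| = 4, so n = lcm(2, 3) = 6.
tile, residues, modulus = [0, 1, 4, 5], [0, 2], 8
for r in range(1, 60):
    if gcd(r, 6) == 1:
        assert tiles_cyclic([r * f for f in tile], residues, modulus)
```

In this particular example the dilate {rF} happens to tile for every odd {r}, including multiples of {3}; the lemma only guarantees the case where {r} is coprime to {n}.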

In our paper we actually establish a more general version of the dilation lemma that can handle tilings of higher level or of a periodic set, and this stronger version is useful to get the best quantitative results, but for simplicity we focus attention just on the above simple special case of the dilation lemma.

By averaging over all {r} in an arithmetic progression, one already gets a useful structural theorem for tilings in any dimension, which appears to be new despite being an easy consequence of Lemma 6:

Corollary 7 (Structure theorem for tilings) Suppose that {A} is a tiling of a finite tile {F \subset {\bf Z}^d}, where we normalize {0 \in F}. Then we have a decomposition

\displaystyle  1_A = 1 - \sum_{f \in F \backslash 0} \varphi_f \ \ \ \ \ (1)

where each {\varphi_f: {\bf Z}^d \rightarrow [0,1]} is a function that is periodic in the direction {nf}, where {n} is the least common multiple of all the primes up to {|F|}.

Proof: From Lemma 6 we have {1_A = 1 - \sum_{f \in F \backslash 0} \delta_{rf} * 1_A} for any {r = 1 \hbox{ mod } n}, where {\delta_{rf}} is the Kronecker delta at {rf}. Now average over {r} (extracting a weak limit or generalised limit as necessary) to obtain the conclusion. \Box
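For a periodic tiling the average in this proof can be computed exactly, with no need for generalised limits. The sketch below does this for the illustrative tile {F = \{0,1,4,5\}} and tiling set {A = \{0,2\} + 8{\bf Z}} in {{\bf Z}} (so {n = 6}); these choices, and the names `phi` and `indicator_A`, are assumptions made for the example rather than anything from the paper.

```python
from fractions import Fraction

TILE = [0, 1, 4, 5]      # a tile F in Z, normalized so that 0 is in F
RESIDUES = {0, 2}        # A = RESIDUES + MODULUS * Z tiles Z by F
MODULUS = 8
N = 6                    # lcm of the primes up to |F| = 4

def indicator_A(x):
    return 1 if x % MODULUS in RESIDUES else 0

def phi(f, x):
    """Average of 1_A(x - r f) over r = 1 mod N.

    Writing r = 1 + N*j, the residue of r*f mod MODULUS is periodic in j with
    period dividing MODULUS, so averaging over MODULUS consecutive values of j
    already gives the limiting average.
    """
    total = sum(indicator_A(x - (1 + N * j) * f) for j in range(MODULUS))
    return Fraction(total, MODULUS)

for x in range(-40, 40):
    parts = {f: phi(f, x) for f in TILE if f != 0}
    assert all(0 <= v <= 1 for v in parts.values())
    assert indicator_A(x) == 1 - sum(parts.values())              # decomposition (1)
    assert all(phi(f, x + N * f) == v for f, v in parts.items())  # nf-periodicity
```

In this example {\varphi_1} and {\varphi_5} equal {1/2} on the odd integers and vanish on the even ones, while {\varphi_4} is the indicator of {A+4}, so the functions {\varphi_f} genuinely take values strictly between {0} and {1}.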

The identity (1) turns out to impose a lot of constraints on the functions {\varphi_f}, particularly in one and two dimensions. On one hand, one can work modulo {1} to eliminate the {1_A} and {1} terms to obtain the equation

\displaystyle  \sum_{f \in F \backslash 0} \varphi_f = 0 \hbox{ mod } 1

which in two dimensions in particular puts a lot of structure on each individual {\varphi_f} (roughly speaking it makes the {\varphi_f \hbox{ mod } 1} behave in a polynomial fashion, after collecting commensurable terms). On the other hand we have the inequality

\displaystyle  \sum_{f \in F \backslash 0} \varphi_f \leq 1 \ \ \ \ \ (2)

which can be used to exclude “equidistributed” polynomial behavior after a certain amount of combinatorial analysis. Only a small amount of further argument is then needed to conclude Theorem 3 and Theorem 2.

For level {k} tilings the analogue of (2) becomes

\displaystyle  \sum_{f \in F \backslash 0} \varphi_f \leq k

which is a significantly weaker inequality and now no longer seems to prohibit “equidistributed” behavior. After some trial and error we were able to come up with a completely explicit example of a tiling that actually utilises equidistributed polynomials; indeed the tiling set we ended up with was a finite boolean combination of Bohr sets.

We are currently studying what this machinery can tell us about tilings in higher dimensions, focusing initially on the three-dimensional case.

Asgar Jamneshan and I have just uploaded to the arXiv our paper “Foundational aspects of uncountable measure theory: Gelfand duality, Riesz representation, canonical models, and canonical disintegration”. This paper arose from our longer-term project to systematically develop “uncountable” ergodic theory – ergodic theory in which the groups acting are not required to be countable, the probability spaces one acts on are not required to be standard Borel, or Polish, and the compact groups that arise in the structural theory (e.g., the theory of group extensions) are not required to be separable. One of the motivations of doing this is to allow ergodic theory results to be applied to ultraproducts of finite dynamical systems, which can then hopefully be transferred to establish combinatorial results with good uniformity properties. An instance of this is the uncountable Mackey-Zimmer theorem, discussed in this companion blog post.

In the course of this project, we ran into the obstacle that many foundational results, such as the Riesz representation theorem, often require one or more of these countability hypotheses when encountered in textbooks. Other technical issues also arise in the uncountable setting, such as the need to distinguish the Borel {\sigma}-algebra from the (two different types of) Baire {\sigma}-algebra. As such we needed to spend some time reviewing and synthesizing the known literature on some foundational results of “uncountable” measure theory, which led to this paper. Accordingly, most of the results of this paper are already in the literature, either explicitly or implicitly, in one form or another (with perhaps the exception of the canonical disintegration, which we discuss below); we view the main contribution of this paper as presenting the results in a coherent and unified fashion. In particular we found that the language of category theory was invaluable in clarifying and organizing all the different results. In subsequent work we (and some other authors) will use the results in this paper for various applications in uncountable ergodic theory.

The foundational results covered in this paper can be divided into a number of subtopics (Gelfand duality, Baire {\sigma}-algebras and Riesz representation, canonical models, and canonical disintegration), which we discuss further below the fold.


Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Mackey-Zimmer theorem”. This paper is part of our longer term project to develop “uncountable” versions of various theorems in ergodic theory; see this previous paper of Asgar and myself for the first paper in this series (and another paper will appear shortly).

In this case the theorem in question is the Mackey-Zimmer theorem, previously discussed in this blog post. This theorem gives an important classification of group and homogeneous extensions of measure-preserving systems. Let us first work in the (classical) setting of concrete measure-preserving systems. Let {Y = (Y, \mu_Y, T_Y)} be a measure-preserving system for some group {\Gamma}, thus {(Y,\mu_Y)} is a (concrete) probability space and {T_Y : \gamma \rightarrow T_Y^\gamma} is a group homomorphism from {\Gamma} to the automorphism group {\mathrm{Aut}(Y,\mu_Y)} of the probability space. (Here we are abusing notation by using {Y} to refer both to the measure-preserving system and to the underlying set. In the notation of the paper we would instead distinguish these two objects as {Y_{\mathbf{ConcPrb}_\Gamma}} and {Y_{\mathbf{Set}}} respectively, reflecting two of the (many) categories one might wish to view {Y} as a member of, but for sake of this informal overview we will not maintain such precise distinctions.) If {K} is a compact group, we define a (concrete) cocycle to be a collection of measurable functions {\rho_\gamma : Y \rightarrow K} for {\gamma \in \Gamma} that obey the cocycle equation

\displaystyle  \rho_{\gamma \gamma'}(y) = \rho_\gamma(T_Y^{\gamma'} y) \rho_{\gamma'}(y) \ \ \ \ \ (1)

for each {\gamma,\gamma' \in \Gamma} and all {y \in Y}. (One could weaken this requirement by only demanding the cocycle equation to hold for almost all {y}, rather than all {y}; we will effectively do so later in the post, when we move to opposite probability algebra systems.) Any such cocycle generates a group skew-product {X = Y \rtimes_\rho K} of {Y}, which is another measure-preserving system {(X, \mu_X, T_X)} where
  • {X = Y \times K} is the Cartesian product of {Y} and {K};
  • {\mu_X = \mu_Y \times \mathrm{Haar}_K} is the product measure of {\mu_Y} and Haar probability measure on {K}; and
  • The action {T_X: \gamma \rightarrow T_X^\gamma} is given by the formula

    \displaystyle  T_X^\gamma(y,k) := (T_Y^\gamma y, \rho_\gamma(y) k). \ \ \ \ \ (2)

The cocycle equation (1) guarantees that {T_X} is a homomorphism, and the (left) invariance of Haar measure and Fubini’s theorem guarantees that the {T_X^\gamma} remain measure preserving. There is also the more general notion of a homogeneous skew-product {X = Y \rtimes_\rho K/L} in which the group {K} is replaced by the homogeneous space {K/L} for some closed subgroup {L} of {K}, noting that {K/L} still comes with a left-action of {K} and a Haar measure. Group skew-products are very “explicit” ways to extend a system {Y}, as everything is described by the cocycle {\rho} which is a relatively tractable object to manipulate. (This is not to say that the cohomology of measure-preserving systems is trivial, but at least there are many tools one can use to study them, such as the Moore-Schmidt theorem discussed in this previous post.)
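The interplay between the cocycle equation and the homomorphism property can be seen concretely in a finite toy model. In the sketch below, which is only an illustration, {Y = {\bf Z}/5} with the rotation {y \mapsto y+1}, {K = S_3}, {\Gamma = {\bf Z}} (restricted to non-negative elements for brevity), and the generating function `rho_1` is an arbitrary hypothetical choice; none of these come from the paper.

```python
from itertools import permutations

M = 5                              # Y = Z/M with T_Y^n y = y + n (toy choice)
IDENTITY = (0, 1, 2)
K = list(permutations(range(3)))   # K = S_3, permutations stored as tuples

def compose(p, q):
    """Group law of S_3: (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def T_Y(n, y):
    return (y + n) % M

def rho_1(y):
    """An arbitrary (hypothetical) K-valued function generating the cocycle."""
    return (1, 0, 2) if y % 2 == 0 else (1, 2, 0)

def rho(n, y):
    """rho_n(y) = rho_1(T^{n-1} y) ... rho_1(T y) rho_1(y), for n >= 0."""
    result = IDENTITY
    for i in range(n):
        result = compose(rho_1(T_Y(i, y)), result)
    return result

def T_X(n, point):
    """The skew-product action (2): (y, k) -> (T_Y^n y, rho_n(y) k)."""
    y, k = point
    return (T_Y(n, y), compose(rho(n, y), k))

X = [(y, k) for y in range(M) for k in K]
for n in range(7):
    for m in range(7):
        for y in range(M):
            # cocycle equation (1), with the group law of Gamma = Z written additively
            assert rho(n + m, y) == compose(rho(n, T_Y(m, y)), rho(m, y))
        for point in X:
            # homomorphism property of the action T_X
            assert T_X(n + m, point) == T_X(n, T_X(m, point))
    # T_X^n permutes X, hence preserves the uniform (product) measure
    assert sorted(T_X(n, point) for point in X) == sorted(X)
```

For {\Gamma = {\bf Z}} any function `rho_1` generates a cocycle in this way; for more general groups the cocycle equation is a genuine constraint.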

This group skew-product {X} comes with a factor map {\pi: X \rightarrow Y} and a coordinate map {\theta: X \rightarrow K}, which by (2) are related to the action via the identities

\displaystyle  \pi \circ T_X^\gamma = T_Y^\gamma \circ \pi \ \ \ \ \ (3)

and

\displaystyle  \theta \circ T_X^\gamma = (\rho_\gamma \circ \pi) \theta \ \ \ \ \ (4)

where in (4) we are implicitly working in the group of (concretely) measurable functions from {X} to {K}. Furthermore, the combined map {(\pi,\theta): X \rightarrow Y \times K} is measure-preserving (using the product measure on {Y \times K}), indeed the way we have constructed things this map is just the identity map.

We can now generalize the notion of group skew-product by just working with the maps {\pi, \theta}, and weakening the requirement that {(\pi,\theta)} be measure-preserving. Namely, define a group extension of {Y} by {K} to be a measure-preserving system {(X,\mu_X, T_X)} equipped with a measure-preserving map {\pi: X \rightarrow Y} obeying (3) and a measurable map {\theta: X \rightarrow K} obeying (4) for some cocycle {\rho}, such that the {\sigma}-algebra of {X} is generated by {\pi,\theta}. There is also a more general notion of a homogeneous extension in which {\theta} takes values in {K/L} rather than {K}. Then every group skew-product {Y \rtimes_\rho K} is a group extension of {Y} by {K}, but not conversely. Here are some key counterexamples:

  • (i) If {H} is a closed subgroup of {K}, and {\rho} is a cocycle taking values in {H}, then {Y \rtimes_\rho H} can be viewed as a group extension of {Y} by {K}, taking {\theta: Y \rtimes_\rho H \rightarrow K} to be the vertical coordinate {\theta(y,h) = h} (viewing {h} now as an element of {K}). This will not be a skew-product by {K} because {(\pi,\theta)} pushes forward to the wrong measure on {Y \times K}: it pushes forward to {\mu_Y \times \mathrm{Haar}_H} rather than {\mu_Y \times \mathrm{Haar}_K}.
  • (ii) If one takes the same example as (i), but twists the vertical coordinate {\theta} to another vertical coordinate {\tilde \theta(y,h) := \Phi(y) \theta(y,h)} for some measurable “gauge function” {\Phi: Y \rightarrow K}, then {Y \rtimes_\rho H} is still a group extension by {K}, but now with the cocycle {\rho} replaced by the cohomologous cocycle

    \displaystyle  \tilde \rho_\gamma(y) := \Phi(T_Y^\gamma y) \rho_\gamma(y) \Phi(y)^{-1}.

    Again, this will not be a skew product by {K}, because {(\pi,\tilde \theta)} pushes forward to a twisted version of {\mu_Y \times \mathrm{Haar}_H} that is supported (at least in the case where {Y} is compact and the cocycle {\rho} is continuous) on the {H}-bundle {\bigcup_{y \in Y} \{y\} \times \Phi(y) H}.
  • (iii) With the situation as in (i), take {X} to be the union {X = Y \rtimes_\rho H \uplus Y \rtimes_\rho Hk \subset Y \times K} for some {k \in K} outside of {H}, where we continue to use the action (2) and the standard vertical coordinate {\theta: (y,k) \mapsto k} but now use the measure {\mu_Y \times (\frac{1}{2} \mathrm{Haar}_H + \frac{1}{2} \mathrm{Haar}_{Hk})}.
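Example (ii) can also be tested in a finite toy model. The following sketch is an illustration under assumed choices (not from the paper): {Y = {\bf Z}/4} with the rotation {y \mapsto y+1}, {K = S_3}, {H} the two-element subgroup generated by a transposition, and a hypothetical gauge function `Phi`. It checks that the twisted cocycle still obeys the cocycle equation, that the twisted coordinate obeys (4), and that the image of the combined map is the {H}-bundle rather than all of {Y \times K}.

```python
from itertools import permutations

M = 4                              # Y = Z/M with T_Y^n y = y + n (toy choice)
IDENTITY = (0, 1, 2)
K = list(permutations(range(3)))   # K = S_3
H = [IDENTITY, (1, 0, 2)]          # subgroup generated by one transposition

def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def T_Y(n, y):
    return (y + n) % M

def rho_1(y):
    """A hypothetical H-valued generating function."""
    return (1, 0, 2) if y in (0, 3) else IDENTITY

def rho(n, y):
    result = IDENTITY
    for i in range(n):
        result = compose(rho_1(T_Y(i, y)), result)
    return result

def Phi(y):
    """A hypothetical gauge function Y -> K."""
    return K[(2 * y + 1) % len(K)]

def rho_tilde(n, y):
    """The cohomologous cocycle Phi(T^n y) rho_n(y) Phi(y)^{-1}."""
    return compose(Phi(T_Y(n, y)), compose(rho(n, y), inverse(Phi(y))))

def theta_tilde(y, h):
    return compose(Phi(y), h)

for n in range(6):
    for y in range(M):
        for m in range(6):
            assert rho_tilde(n + m, y) == compose(rho_tilde(n, T_Y(m, y)), rho_tilde(m, y))
        for h in H:
            # identity (4) for the twisted coordinate on Y x H
            moved = (T_Y(n, y), compose(rho(n, y), h))
            assert theta_tilde(*moved) == compose(rho_tilde(n, y), theta_tilde(y, h))

# The image of (pi, theta_tilde) is the H-bundle, a proper subset of Y x K.
image = {(y, theta_tilde(y, h)) for y in range(M) for h in H}
assert image == {(y, compose(Phi(y), h)) for y in range(M) for h in H}
assert len(image) == M * len(H) < M * len(K)
```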

As it turns out, group extensions and homogeneous extensions arise naturally in the Furstenberg-Zimmer structural theory of measure-preserving systems; roughly speaking, every compact extension of {Y} is an inverse limit of group extensions. It is then of interest to classify such extensions.

Examples such as (iii) are annoying, but they can be excluded by imposing the additional condition that the system {(X,\mu_X,T_X)} is ergodic – all invariant (or essentially invariant) sets are of measure zero or measure one. (An essentially invariant set is a measurable subset {E} of {X} such that {T_X^\gamma E} is equal modulo null sets to {E} for all {\gamma \in \Gamma}.) For instance, the system in (iii) is non-ergodic because the set {Y \times H} (or {Y \times Hk}) is invariant but has measure {1/2}. We then have the following fundamental result of Mackey and Zimmer:

Theorem 1 (Countable Mackey-Zimmer theorem) Let {\Gamma} be a group, {Y} be a concrete measure-preserving system, and {K} be a compact Hausdorff group. Assume that {\Gamma} is at most countable, {Y} is a standard Borel space, and {K} is metrizable. Then every (concrete) ergodic group extension of {Y} is abstractly isomorphic to a group skew-product (by some closed subgroup {H} of {K}), and every (concrete) ergodic homogeneous extension of {Y} is similarly abstractly isomorphic to a homogeneous skew-product.

We will not define precisely what “abstractly isomorphic” means here, but it roughly speaking means “isomorphic after quotienting out the null sets”. A proof of this theorem can be found for instance in .

The main result of this paper is to remove the “countability” hypotheses from the above theorem, at the cost of working with opposite probability algebra systems rather than concrete systems. (We will discuss opposite probability algebras in a subsequent blog post relating to another paper in this series.)

Theorem 2 (Uncountable Mackey-Zimmer theorem) Let {\Gamma} be a group, {Y} be an opposite probability algebra measure-preserving system, and {K} be a compact Hausdorff group. Then every (abstract) ergodic group extension of {Y} is abstractly isomorphic to a group skew-product (by some closed subgroup {H} of {K}), and every (abstract) ergodic homogeneous extension of {Y} is similarly abstractly isomorphic to a homogeneous skew-product.

We plan to use this result in future work to obtain uncountable versions of the Furstenberg-Zimmer and Host-Kra structure theorems.

As one might expect, one locates a proof of Theorem 2 by finding a proof of Theorem 1 that does not rely too strongly on “countable” tools, such as disintegration or measurable selection, so that all of those tools can be replaced by “uncountable” counterparts. The proof we use is based on the one given in this previous post, and begins by comparing the system {X} with the group extension {Y \rtimes_\rho K}. As the examples (i), (ii) show, these two systems need not be isomorphic even in the ergodic case, due to the different probability measures employed. However one can relate the two after performing an additional averaging in {K}. More precisely, there is a canonical factor map {\Pi: X \rtimes_1 K \rightarrow Y \rtimes_\rho K} given by the formula

\displaystyle  \Pi(x, k) := (\pi(x), \theta(x) k).

This is a factor map not only of {\Gamma}-systems, but actually of {\Gamma \times K^{op}}-systems, where the opposite group {K^{op}} to {K} acts (on the left) by right-multiplication of the second coordinate (this reversal of order is why we need to use the opposite group here). The key point is that the ergodicity properties of the system {Y \rtimes_\rho K} are closely tied to the group {H} that is “secretly” controlling the group extension. Indeed, in example (i), the invariant functions on {Y \rtimes_\rho K} take the form {(y,k) \mapsto f(Hk)} for some measurable {f: H \backslash K \rightarrow {\bf C}}, while in example (ii), the invariant functions on {Y \rtimes_{\tilde \rho} K} take the form {(y,k) \mapsto f(H \Phi(y)^{-1} k)}. In either case, the invariant factor is isomorphic to {H \backslash K}, and can be viewed as a factor of the invariant factor of {X \rtimes_1 K}, which is isomorphic to {K}. Pursuing this reasoning (using an abstract ergodic theorem of Alaoglu and Birkhoff, as discussed in the previous post) one obtains the Mackey range {H}, and also obtains the quotient {\tilde \Phi: Y \rightarrow K/H} of {\Phi: Y \rightarrow K} to {K/H} in this process. The main remaining task is to lift the quotient {\tilde \Phi} back up to a map {\Phi: Y \rightarrow K} that stays measurable, in order to “untwist” a system that looks like (ii) to make it into one that looks like (i). In countable settings this is where a “measurable selection theorem” would ordinarily be invoked, but in the uncountable setting such theorems are not available for concrete maps. However it turns out that they still remain available for abstract maps: any abstractly measurable map {\tilde \Phi} from {Y} to {K/H} has an abstractly measurable lift from {Y} to {K}.

To prove this we first use a canonical model for opposite probability algebras (which we will discuss in a companion post to this one, to appear shortly) to work with continuous maps (on a Stone space) rather than abstractly measurable maps. The measurable map {\tilde \Phi} then induces a probability measure on {Y \times K/H}, formed by pushing forward {\mu_Y} by the graphing map {y \mapsto (y,\tilde \Phi(y))}. This measure in turn has several lifts up to a probability measure on {Y \times K}; for instance, one can construct such a measure {\overline{\mu}} via the Riesz representation theorem by demanding

\displaystyle  \int_{Y \times K} f(y,k)\ d\overline{\mu}(y,k) := \int_Y (\int_{\tilde \Phi(y) H} f(y,k)\ d\mathrm{Haar}_{\tilde \Phi(y) H}(k))\ d\mu_Y(y)

for all continuous functions {f}. This measure does not come from a graph of any single lift {\Phi: Y \rightarrow K}, but is in some sense an “average” of the entire ensemble of these lifts. But it turns out one can invoke the Krein-Milman theorem to pass to an extremal lifting measure which does come from an (abstract) lift {\Phi}, and this can be used as a substitute for a measurable selection theorem. A variant of this Krein-Milman argument can also be used to express any homogeneous extension as a quotient of a group extension, giving the second part of the Mackey-Zimmer theorem.

I have uploaded to the arXiv my paper “Exploring the toolkit of Jean Bourgain”. This is one of a collection of papers to be published in the Bulletin of the American Mathematical Society describing aspects of the work of Jean Bourgain; other contributors to this collection include Keith Ball, Ciprian Demeter, and Carlos Kenig. Because the other contributors will be covering specific areas of Jean’s work in some detail, I decided to take a non-overlapping tack, and focus instead on some basic tools of Jean that he frequently used across many of the fields he contributed to. Jean had a surprising number of these “basic tools” that he wielded with great dexterity, and in this paper I focus on just a few of them:

  • Reducing qualitative analysis results (e.g., convergence theorems or dimension bounds) to quantitative analysis estimates (e.g., variational inequalities or maximal function estimates).
  • Using dyadic pigeonholing to locate good scales to work in or to apply truncations.
  • Using random translations to amplify small sets (low density) into large sets (positive density).
  • Combining large deviation inequalities with metric entropy bounds to control suprema of various random processes.

Each of these techniques is individually not too difficult to explain, and each was certainly employed on occasion by various mathematicians prior to Bourgain’s work; but Jean had internalized them to the point where he would instinctively use them as soon as they became relevant to a given problem at hand. I illustrate this at the end of the paper with an exposition of one particular result of Jean, on the Erdős similarity problem, in which his main result (that any sum {S = S_1+S_2+S_3} of three infinite sets of reals has the property that there exists a positive measure set {E} that does not contain any homothetic copy {x+tS} of {S}) is basically proven by a sequential application of these tools (except for dyadic pigeonholing, which turns out not to be needed here).

I had initially intended to also cover some other basic tools in Jean’s toolkit, such as the uncertainty principle and the use of probabilistic decoupling, but was having trouble keeping the paper coherent with such a broad focus (certainly I could not identify a single paper of Jean’s that employed all of these tools at once). I hope though that the examples given in the paper give some reasonable impression of Jean’s research style.

Abdul Basit, Artem Chernikov, Sergei Starchenko, Chiu-Minh Tran and I have uploaded to the arXiv our paper Zarankiewicz’s problem for semilinear hypergraphs. This paper is in the spirit of a number of results in extremal graph theory in which the bounds for various graph-theoretic problems or results can be greatly improved if one makes some additional hypotheses regarding the structure of the graph, for instance by requiring that the graph be “definable” with respect to some theory with good model-theoretic properties.

A basic motivating example is the question of counting the number of incidences between points and lines (or between points and other geometric objects). Suppose one has {n} points and {n} lines in a space. How many incidences can there be between these points and lines? The utterly trivial bound is {n^2}, but by using the basic fact that two points determine a line (or two lines intersect in at most one point), a simple application of Cauchy-Schwarz improves this bound to {n^{3/2}}. In graph theoretic terms, the point is that the bipartite incidence graph between points and lines does not contain a copy of {K_{2,2}} (there do not exist two points and two lines that are all incident to each other). Without any other further hypotheses, this bound is basically sharp: consider for instance the collection of {p^2} points and {p^2+p} lines in a finite plane {{\bf F}_p^2}, which has {p^3+p^2} incidences (one can make the situation more symmetric by working with a projective plane rather than an affine plane). If however one considers lines in the real plane {{\bf R}^2}, the famous Szemerédi-Trotter theorem improves the incidence bound further from {n^{3/2}} to {O(n^{4/3})}. Thus the incidence graph between real points and lines contains more structure than merely the absence of {K_{2,2}}.
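As a quick sanity check of the finite plane example, here is a minimal brute-force sketch that enumerates the points and lines of {{\bf F}_p^2} for a few small primes and counts incidences. The function name `affine_plane_incidences` and the encoding of lines (as `y = a*x + b` or `x = c`) are illustrative choices only, not taken from the paper.

```python
from itertools import product

def affine_plane_incidences(p):
    """Count points, lines and incidences in the affine plane F_p^2 (p prime)."""
    points = list(product(range(p), repeat=2))
    # Non-vertical lines y = a*x + b (p^2 of them) and vertical lines x = c (p of them).
    lines = [("slope", a, b) for a in range(p) for b in range(p)]
    lines += [("vertical", c, None) for c in range(p)]

    def incident(point, line):
        x, y = point
        kind, u, v = line
        if kind == "vertical":
            return x == u
        return (y - u * x - v) % p == 0

    count = sum(incident(pt, ln) for pt in points for ln in lines)
    return len(points), len(lines), count

for p in (2, 3, 5, 7):
    n_points, n_lines, n_inc = affine_plane_incidences(p)
    # Each of the p^2 + p lines contains exactly p points.
    assert (n_points, n_lines, n_inc) == (p**2, p**2 + p, p**3 + p**2)
    print(p, n_points, n_lines, n_inc)
```

With {n \sim p^2} points and lines, the count {p^3+p^2} is of order {n^{3/2}}, matching the Cauchy-Schwarz bound up to constants.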

More generally, bounding the size of bipartite graphs (or multipartite hypergraphs) not containing a copy of some complete bipartite subgraph {K_{k,k}} (or {K_{k,\dots,k}} in the hypergraph case) is known as Zarankiewicz’s problem. We have results for all {k} and all orders of hypergraph, but for the sake of this post I will focus on the bipartite {k=2} case.

In our paper we improve the {n^{3/2}} bound to a near-linear bound in the case that the incidence graph is “semilinear”. A model case occurs when one considers incidences between points and axis-parallel rectangles in the plane. Now the {K_{2,2}} condition is not automatic (it is of course possible for two distinct points to both lie in two distinct rectangles), so we impose this condition by fiat:

Theorem 1 Suppose one has {n} points and {n} axis-parallel rectangles in the plane, whose incidence graph contains no {K_{2,2}}’s, for some large {n}.
  • (i) The total number of incidences is {O(n \log^4 n)}.
  • (ii) If all the rectangles are dyadic, the bound can be improved to {O( n \frac{\log n}{\log\log n} )}.
  • (iii) The bound in (ii) is best possible (up to the choice of implied constant).

We don’t know whether the bound in (i) is similarly tight for non-dyadic boxes; the usual tricks for reducing the non-dyadic case to the dyadic case strangely fail to apply here. One can generalise to higher dimensions, replacing rectangles by polytopes with faces in some fixed finite set of orientations, at the cost of adding several more logarithmic factors; also, one can replace the reals by other ordered division rings, and replace polytopes by other sets of bounded “semilinear descriptive complexity”, e.g., unions of boundedly many polytopes, or which are cut out by boundedly many functions that enjoy coordinatewise monotonicity properties. For certain specific graphs we can remove the logarithmic factors entirely. We refer to the preprint for precise details.

The proof techniques are combinatorial. The proof of (i) relies primarily on the order structure of {{\bf R}} to implement a “divide and conquer” strategy in which one can efficiently control incidences between {n} points and rectangles by incidences between approximately {n/2} points and boxes. For (ii) there is additional order-theoretic structure one can work with: first there is an easy pruning device to reduce to the case when no rectangle is completely contained inside another, and then one can impose the “tile partial order” in which one dyadic rectangle {I \times J} is less than another {I' \times J'} if {I \subset I'} and {J' \subset J}. The point is that this order is “locally linear” in the sense that for any two dyadic rectangles {R_-, R_+}, the set {[R_-,R_+] := \{ R: R_- \leq R \leq R_+\}} is linearly ordered, and this can be exploited by elementary double counting arguments to obtain a bound which eventually becomes {O( n \frac{\log n}{\log\log n})} after optimising certain parameters in the argument. The proof also suggests how to construct the counterexample in (iii), which is achieved by an elementary iterative construction.

Dimitri Shlyakhtenko and I have uploaded to the arXiv our paper Fractional free convolution powers. For me, this project (which we started during the 2018 IPAM program on quantitative linear algebra) was motivated by a desire to understand the behavior of the minor process applied to a large random Hermitian {N \times N} matrix {A_N}, in which one takes the successive upper left {n \times n} minors {A_n} of {A_N} and computes their eigenvalues {\lambda_1(A_n) \leq \dots \leq \lambda_n(A_n)} in non-decreasing order. These eigenvalues are related to each other by the Cauchy interlacing inequalities

\displaystyle  \lambda_i(A_{n+1}) \leq \lambda_i(A_n) \leq \lambda_{i+1}(A_{n+1})

for {1 \leq i \leq n < N}, and are often arranged in a triangular array known as a Gelfand-Tsetlin pattern, as discussed in these previous blog posts.
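The interlacing inequalities are easy to observe numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `minor_eigenvalues` and `interlaces`, the matrix size, and the random seed are all illustrative choices. It builds the rows of a Gelfand-Tsetlin pattern from a random Hermitian matrix and checks the inequalities up to a small floating-point tolerance.

```python
import numpy as np

def minor_eigenvalues(a):
    """Eigenvalues (ascending) of the upper-left n x n minors, n = 1..N."""
    size = a.shape[0]
    return [np.linalg.eigvalsh(a[:n, :n]) for n in range(1, size + 1)]

def interlaces(lower, upper, tol=1e-9):
    """Check upper[i] <= lower[i] <= upper[i+1], where lower has one fewer entry."""
    return all(
        upper[i] - tol <= lower[i] <= upper[i + 1] + tol
        for i in range(len(lower))
    )

rng = np.random.default_rng(0)
N = 6
g = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
a = (g + g.conj().T) / 2  # a random Hermitian matrix

rows = minor_eigenvalues(a)  # rows[n-1] holds the eigenvalues of A_n
pattern_ok = all(interlaces(rows[n - 1], rows[n]) for n in range(1, N))
print(pattern_ok)
```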

When {N} is large and the matrix {A_N} is a random matrix with empirical spectral distribution converging to some compactly supported probability measure {\mu} on the real line, then under suitable hypotheses (e.g., unitary conjugation invariance of the random matrix ensemble {A_N}), a “concentration of measure” effect occurs, with the spectral distribution of the minors {A_n} for {n = \lfloor N/k\rfloor} for any fixed {k \geq 1} converging to a specific measure {k^{-1}_* \mu^{\boxplus k}} that depends only on {\mu} and {k}. The reason for this notation is that there is a surprising description of this measure {k^{-1}_* \mu^{\boxplus k}} when {k} is a natural number, namely it is the free convolution {\mu^{\boxplus k}} of {k} copies of {\mu}, pushed forward by the dilation map {x \mapsto k^{-1} x}. For instance, if {\mu} is the Wigner semicircular measure {d\mu_{sc} = \frac{1}{2\pi} (4-x^2)^{1/2}_+\ dx}, then {k^{-1}_* \mu_{sc}^{\boxplus k} = k^{-1/2}_* \mu_{sc}}. At the random matrix level, this reflects the fact that the minor of a GUE matrix is again a GUE matrix (up to a renormalizing constant).

As first observed by Bercovici and Voiculescu and developed further by Nica and Speicher, among other authors, the notion of a free convolution power {\mu^{\boxplus k}} of {\mu} can be extended to non-integer {k \geq 1}, thus giving the notion of a “fractional free convolution power”. This notion can be defined in several different ways. One of them proceeds via the Cauchy transform

\displaystyle  G_\mu(z) := \int_{\bf R} \frac{d\mu(x)}{z-x}

of the measure {\mu}, and {\mu^{\boxplus k}} can be defined by solving the Burgers-type equation

\displaystyle  (k \partial_k + z \partial_z) G_{\mu^{\boxplus k}}(z) = \frac{\partial_z G_{\mu^{\boxplus k}}(z)}{G_{\mu^{\boxplus k}}(z)} \ \ \ \ \ (1)

with initial condition {G_{\mu^{\boxplus 1}} = G_\mu} (see this previous blog post for a derivation). This equation can be solved explicitly using the {R}-transform {R_\mu} of {\mu}, defined by solving the equation

\displaystyle  \frac{1}{G_\mu(z)} + R_\mu(G_\mu(z)) = z

for sufficiently large {z}, in which case one can show that

\displaystyle  R_{\mu^{\boxplus k}}(z) = k R_\mu(z).

(In the case of the semicircular measure {\mu_{sc}}, the {R}-transform is simply the identity: {R_{\mu_{sc}}(z)=z}.)
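To see these formulas in action in the semicircular case, note that combining {R_{\mu_{sc}}(z) = z} with {R_{\mu^{\boxplus k}} = k R_\mu} and the defining relation for the {R}-transform gives {k G^2 - z G + 1 = 0} for {G = G_{\mu_{sc}^{\boxplus k}}}, whose root decaying like {1/z} is {G(z) = \frac{z - \sqrt{z^2-4k}}{2k}} (the Cauchy transform of a semicircular law of variance {k}). The sketch below checks by central finite differences that this {G} satisfies equation (1), restricted for simplicity to real {z > 2\sqrt{k}}; the function names, step size and sample points are illustrative choices only.

```python
import math

def cauchy_transform_semicircle(k, z):
    """G(z) for the semicircular law of variance k, valid for real z > 2*sqrt(k)."""
    return (z - math.sqrt(z * z - 4.0 * k)) / (2.0 * k)

def burgers_residual(k, z, h=1e-5):
    """(k d/dk + z d/dz) G - (dG/dz) / G, with derivatives by central differences."""
    G = cauchy_transform_semicircle
    dG_dk = (G(k + h, z) - G(k - h, z)) / (2 * h)
    dG_dz = (G(k, z + h) - G(k, z - h)) / (2 * h)
    return k * dG_dk + z * dG_dz - dG_dz / G(k, z)

for k, z in [(1.0, 3.0), (1.5, 4.0), (2.5, 6.0)]:
    # The residual should vanish up to discretisation and rounding error.
    print(k, z, burgers_residual(k, z))
```

Note that the check uses non-integer values of {k}, which is the point of the fractional theory.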

Nica and Speicher also gave a free probability interpretation of the fractional free convolution power: if {A} is a noncommutative random variable in a noncommutative probability space {({\mathcal A},\tau)} with distribution {\mu}, and {p} is a real projection operator free of {A} with trace {1/k}, then the “minor” {[pAp]} of {A} (viewed as an element of a new noncommutative probability space {({\mathcal A}_p, \tau_p)} whose elements are minors {[pXp]}, {X \in {\mathcal A}} with trace {\tau_p([pXp]) := k \tau(pXp)}) has the law of {k^{-1}_* \mu^{\boxplus k}} (we give a self-contained proof of this in an appendix to our paper). This suggests that the minor process (or fractional free convolution) can be studied within the framework of free probability theory.

One of the known facts about integer free convolution powers {\mu^{\boxplus k}} is monotonicity of the free entropy

\displaystyle  \chi(\mu) = \int_{\bf R} \int_{\bf R} \log|s-t|\ d\mu(s) d\mu(t) + \frac{3}{4} + \frac{1}{2} \log 2\pi

and free Fisher information

\displaystyle  \Phi(\mu) = \frac{2\pi^2}{3} \int_{\bf R} \left(\frac{d\mu}{dx}\right)^3\ dx

which were introduced by Voiculescu as free probability analogues of the classical probability concepts of differential entropy and classical Fisher information. (Here we correct a small typo in the normalization constant of Fisher information as presented in Voiculescu’s paper.) Namely, it was shown by Shlyakhtenko that the quantity {\chi(k^{-1/2}_* \mu^{\boxplus k})} is monotone non-decreasing for integer {k}, and the Fisher information {\Phi(k^{-1/2}_* \mu^{\boxplus k})} is monotone non-increasing for integer {k}. This is the free probability analogue of the corresponding monotonicities for differential entropy and classical Fisher information that were established by Artstein, Ball, Barthe, and Naor, answering a question of Shannon.

Our first main result is to extend the monotonicity results of Shlyakhtenko to fractional {k \geq 1}. We give two proofs of this fact, one using free probability machinery, and a more self-contained (but less motivated) proof using integration by parts and contour integration. The free probability proof relies on the concept of the free score {J(X)} of a noncommutative random variable, which is the analogue of the classical score. The free score, also introduced by Voiculescu, can be defined by duality as measuring the perturbation with respect to semicircular noise, or more precisely

\displaystyle  \frac{d}{d\varepsilon} \tau( Z P( X + \varepsilon Z) )|_{\varepsilon=0} = \tau( J(X) P(X) )

whenever {P} is a polynomial and {Z} is a semicircular element free of {X}. If {X} has an absolutely continuous law {\mu = f\ dx} for a sufficiently regular {f}, one can calculate {J(X)} explicitly as {J(X) = 2\pi Hf(X)}, where {Hf} is the Hilbert transform of {f}, and the Fisher information is given by the formula

\displaystyle  \Phi(X) = \tau( J(X)^2 ).

One can also define a notion of relative free score {J(X:B)} relative to some subalgebra {B} of noncommutative random variables.

The free score interacts very well with the free minor process {X \mapsto [pXp]}, in particular by standard calculations one can establish the identity

\displaystyle  J( [pXp] : [pBp] ) = k {\bf E}( [p J(X:B) p] | [pXp], [pBp] )

whenever {X} is a noncommutative random variable, {B} is an algebra of noncommutative random variables, and {p} is a real projection of trace {1/k} that is free of both {X} and {B}. The monotonicity of free Fisher information then follows from an application of Pythagoras’s theorem (which implies in particular that conditional expectation operators are contractions on {L^2}). The monotonicity of free entropy then follows from an integral representation of free entropy as an integral of free Fisher information along the free Ornstein-Uhlenbeck process (or equivalently, free Fisher information is essentially the rate of change of free entropy with respect to perturbation by semicircular noise). The argument also shows when equality holds in the monotonicity inequalities; this occurs precisely when {\mu} is a semicircular measure up to affine rescaling.

After an extensive amount of calculation of all the quantities that were implicit in the above free probability argument (in particular computing the various terms involved in the application of Pythagoras’ theorem), we were able to extract a self-contained proof of monotonicity that relied on differentiating the quantities in {k} and using the differential equation (1). It turns out that if {d\mu = f\ dx} for sufficiently regular {f}, then there is an identity

\displaystyle  \partial_k \Phi( k^{-1/2}_* \mu^{\boxplus k} ) = -\frac{1}{2\pi^2} \lim_{\varepsilon \rightarrow 0} \sum_{\alpha,\beta = \pm} \int_{\bf R} \int_{\bf R} f(x) f(y) K(x+i\alpha \varepsilon, y+i\beta \varepsilon)\ dx dy \ \ \ \ \ (2)

where {K} is the kernel

\displaystyle  K(z,w) := \frac{1}{G(z) G(w)} (\frac{G(z)-G(w)}{z-w} + G(z) G(w))^2

and {G(z) := G_\mu(z)}. It is not difficult to show that {K(z,\overline{w})} is a positive semi-definite kernel, which gives the required monotonicity. It would be interesting to obtain some more insightful interpretation of the kernel {K} and the identity (2).

These monotonicity properties hint at the minor process {A \mapsto [pAp]} being associated to some sort of “gradient flow” in the {k} parameter. We were not able to formalize this intuition; indeed, it is not clear what a gradient flow on a varying noncommutative probability space {({\mathcal A}_p, \tau_p)} even means. However, after substantial further calculation we were able to formally describe the minor process as the Euler-Lagrange equation for an intriguing Lagrangian functional that we conjecture to have a random matrix interpretation. We first work in “Lagrangian coordinates”, defining the quantity {\lambda(s,y)} on the “Gelfand-Tsetlin pyramid”

\displaystyle  \Delta = \{ (s,y): 0 < s < 1; 0 < y < s \}

by the formula

\displaystyle  \mu^{\boxplus 1/s}((-\infty,\lambda(s,y)/s])=y/s,

which is well defined if the density of {\mu} is sufficiently well behaved. The random matrix interpretation of {\lambda(s,y)} is that it is the asymptotic location of the {\lfloor yN\rfloor^{th}} eigenvalue of the {\lfloor sN \rfloor \times \lfloor sN \rfloor} upper left minor of a random {N \times N} matrix {A_N} with asymptotic empirical spectral distribution {\mu} and with unitarily invariant distribution, thus {\lambda} is in some sense a continuum limit of Gelfand-Tsetlin patterns. Thus for instance the Cauchy interlacing laws in this asymptotic limit regime become

\displaystyle  0 \leq \partial_s \lambda \leq \partial_y \lambda.

After a lengthy calculation (involving extensive use of the chain rule and product rule), one finds that the equation (1) is equivalent to the Euler-Lagrange equation

\displaystyle  \partial_s L_{\lambda_s}(\partial_s \lambda, \partial_y \lambda) + \partial_y L_{\lambda_y}(\partial_s \lambda, \partial_y \lambda) = 0

where {L} is the Lagrangian density

\displaystyle  L(\lambda_s, \lambda_y) := \log \lambda_y + \log \sin( \pi \frac{\lambda_s}{\lambda_y} ).

Thus the minor process is formally a critical point of the integral {\int_\Delta L(\partial_s \lambda, \partial_y \lambda)\ ds dy}. The quantity {\partial_y \lambda} measures the mean eigenvalue spacing at some location of the Gelfand-Tsetlin pyramid, and the ratio {\frac{\partial_s \lambda}{\partial_y \lambda}} measures mean eigenvalue drift in the minor process. This suggests that this Lagrangian density is some sort of measure of entropy of the asymptotic microscale point process emerging from the minor process at this spacing and drift. There is work of Metcalfe demonstrating that this point process is given by the Boutillier bead model, so we conjecture that this Lagrangian density {L} somehow measures the entropy density of this process.

I’ve just uploaded to the arXiv my paper “The Ionescu-Wainger multiplier theorem and the adeles”. This paper revisits a useful multiplier theorem of Ionescu and Wainger on “major arc” Fourier multiplier operators on the integers {{\bf Z}} (or lattices {{\bf Z}^d}), and strengthens the bounds while also interpreting it from the viewpoint of the adelic integers {{\bf A}_{\bf Z}} (which were also used in my recent paper with Krause and Mirek).

For simplicity let us just work in one dimension. Any smooth function {m: {\bf R}/{\bf Z} \rightarrow {\bf C}} then defines a discrete Fourier multiplier operator {T_m: \ell^p({\bf Z}) \rightarrow \ell^p({\bf Z})} for any {1 \leq p \leq \infty} by the formula

\displaystyle  {\mathcal F}_{\bf Z} T_m f(\xi) := m(\xi) {\mathcal F}_{\bf Z} f(\xi)

where {{\mathcal F}_{\bf Z} f(\xi) := \sum_{n \in {\bf Z}} f(n) e(n \xi)} is the Fourier transform on {{\bf Z}}; similarly, any test function {m: {\bf R} \rightarrow {\bf C}} defines a continuous Fourier multiplier operator {T_m: L^p({\bf R}) \rightarrow L^p({\bf R})} by the formula

\displaystyle  {\mathcal F}_{\bf R} T_m f(\xi) := m(\xi) {\mathcal F}_{\bf R} f(\xi)

where {{\mathcal F}_{\bf R} f(\xi) := \int_{\bf R} f(x) e(x \xi)\ dx}. In both cases we refer to {m} as the symbol of the multiplier operator {T_m}.

We will be interested in discrete Fourier multiplier operators whose symbols are supported on a finite union of arcs. One way to construct such operators is by “folding” continuous Fourier multiplier operators into various target frequencies. To make this folding operation precise, given any continuous Fourier multiplier operator {T_m: L^p({\bf R}) \rightarrow L^p({\bf R})}, and any frequency shift {\alpha \in {\bf R}/{\bf Z}}, we define the discrete Fourier multiplier operator {T_{m;\alpha}: \ell^p({\bf Z}) \rightarrow \ell^p({\bf Z})} by the formula

\displaystyle  {\mathcal F}_{\bf Z} T_{m;\alpha} f(\xi) := \sum_{\theta \in {\bf R}: \xi = \alpha + \theta} m(\theta) {\mathcal F}_{\bf Z} f(\xi)

or equivalently

\displaystyle  T_{m;\alpha} f(n) = \int_{\bf R} m(\theta) {\mathcal F}_{\bf Z} f(\alpha+\theta) e( n(\alpha+\theta) )\ d\theta.

More generally, given any finite set {\Sigma \subset {\bf R}/{\bf Z}}, we can form a multifrequency projection operator {T_{m;\Sigma}} on {\ell^p({\bf Z})} by the formula

\displaystyle  T_{m;\Sigma} := \sum_{\alpha \in \Sigma} T_{m;\alpha}

thus

\displaystyle  T_{m;\Sigma} f(n) = \sum_{\alpha \in \Sigma} \int_{\bf R} m(\theta) {\mathcal F}_{\bf Z} f(\alpha+\theta) e( n(\alpha+\theta) )\ d\theta.

This construction gives discrete Fourier multiplier operators whose symbol can be localised to a finite union of arcs. For instance, if {m: {\bf R} \rightarrow {\bf C}} is supported on {[-\varepsilon,\varepsilon]}, then {T_{m;\Sigma}} is a Fourier multiplier whose symbol is supported on the set {\bigcup_{\alpha \in \Sigma} \alpha + [-\varepsilon,\varepsilon]}.

There is a body of results relating the {\ell^p({\bf Z})} theory of discrete Fourier multiplier operators such as {T_{m;\alpha}} or {T_{m;\Sigma}} with the {L^p({\bf R})} theory of their continuous counterparts. For instance we have the basic result of Magyar, Stein, and Wainger:

Proposition 1 (Magyar-Stein-Wainger sampling principle) Let {1 \leq p \leq \infty} and {\alpha \in {\bf R}/{\bf Z}}.
  • (i) If {m: {\bf R} \rightarrow {\bf C}} is a smooth function supported in {[-1/2,1/2]}, then {\|T_{m;\alpha}\|_{B(\ell^p({\bf Z}))} \lesssim \|T_m\|_{B(L^p({\bf R}))}}, where {B(V)} denotes the operator norm of an operator {T: V \rightarrow V}.
  • (ii) More generally, if {m: {\bf R} \rightarrow {\bf C}} is a smooth function supported in {[-1/2Q,1/2Q]} for some natural number {Q}, then {\|T_{m;\alpha + \frac{1}{Q}{\bf Z}/{\bf Z}}\|_{B(\ell^p({\bf Z}))} \lesssim \|T_m\|_{B(L^p({\bf R}))}}.

When {p=2} the implied constant in these bounds can be set to equal {1}. In the paper of Magyar, Stein, and Wainger it was posed as an open problem as to whether this is the case for other {p}; in an appendix to this paper I show that the answer is negative if {p} is sufficiently close to {1} or {\infty}, but I do not know the full answer to this question.

This proposition allows one to get a good multiplier theory for symbols supported near cyclic groups {\frac{1}{Q}{\bf Z}/{\bf Z}}; for instance it shows that a discrete Fourier multiplier with symbol {\sum_{\alpha \in \frac{1}{Q}{\bf Z}/{\bf Z}} \phi(Q(\xi-\alpha))} for a fixed test function {\phi} is bounded on {\ell^p({\bf Z})}, uniformly in {p} and {Q}. For many applications in discrete harmonic analysis, one would similarly like a good multiplier theory for symbols supported in “major arc” sets such as

\displaystyle  \bigcup_{q=1}^N \bigcup_{\alpha \in \frac{1}{q}{\bf Z}/{\bf Z}} \alpha + [-\varepsilon,\varepsilon] \ \ \ \ \ (1)

and in particular to get a good Littlewood-Paley theory adapted to major arcs. (This is particularly the case when trying to control “true complexity zero” expressions for which the minor arc contributions can be shown to be negligible; my recent paper with Krause and Mirek is focused on expressions of this type.) At present we do not have a good multiplier theory that is directly adapted to the classical major arc set (1) (though I do not know of rigorous negative results that show that such a theory is not possible); however, Ionescu and Wainger were able to obtain a useful substitute theory in which (1) was replaced by a somewhat larger set that had better multiplier behaviour. Starting with a finite collection {S} of pairwise coprime natural numbers, and a natural number {k}, one can form the major arc type set

\displaystyle  \bigcup_{\alpha \in \Sigma_{\leq k}} \alpha + [-\varepsilon,\varepsilon] \ \ \ \ \ (2)

where {\Sigma_{\leq k} \subset {\bf R}/{\bf Z}} consists of all rational points in the unit circle of the form {\frac{a}{Q} \mod 1} where {Q} is the product of at most {k} elements from {S} and {a} is an integer. For suitable choices of {S} and {k} not too large, one can make this set (2) contain the set (1) while still having a somewhat controlled size (very roughly speaking, one chooses {S} to consist of (small powers of) large primes between {N^\rho} and {N} for some small constant {\rho>0}, together with something like the product of all the primes up to {N^\rho} (raised to suitable powers)).
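To make the definition of {\Sigma_{\leq k}} concrete, here is a minimal sketch that enumerates it for a toy choice of {S}. It assumes that “product of at most {k} elements from {S}” means a product of at most {k} distinct elements, with the empty product {Q=1} allowed; the function name `sigma_leq_k` and the toy set {S = \{2,3,5\}} are illustrative only (in applications {S} would be far larger).

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def sigma_leq_k(S, k):
    """Set of a/Q mod 1, where Q is a product of at most k distinct elements of S."""
    points = set()
    for size in range(k + 1):
        for subset in combinations(S, size):
            Q = prod(subset)  # the empty product is 1
            # Fractions are stored in lowest terms, so repeats are merged.
            points.update(Fraction(a, Q) for a in range(Q))
    return points

S = [2, 3, 5]  # pairwise coprime toy example
for k in range(4):
    print(k, len(sigma_leq_k(S, k)))
```

For this toy {S}, taking {k=3} recovers all fractions with denominator dividing {30}, while smaller {k} gives a proper subset.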

In the regime where {k} is fixed and {\varepsilon} is small, there is a good theory:

Theorem 2 (Ionescu-Wainger theorem, rough version) If {p} is an even integer or the dual of an even integer, and {m: {\bf R} \rightarrow {\bf C}} is supported on {[-\varepsilon,\varepsilon]} for a sufficiently small {\varepsilon > 0}, then

\displaystyle  \|T_{m;\Sigma_{\leq k}}\|_{B(\ell^p({\bf Z}))} \lesssim_{p, k} (\log(1+|S|))^{O_k(1)} \|T_m\|_{B(L^p({\bf R}))}.

There is a more explicit description of how small {\varepsilon} needs to be for this theorem to work (roughly speaking, it is not much more than what is needed for all the arcs {\alpha + [-\varepsilon,\varepsilon]} in (2) to be disjoint), but we will not give it here. The logarithmic loss of {(\log(1+|S|))^{O_k(1)}} was reduced to {\log(1+|S|)} by Mirek. In this paper we refine the bound further to

\displaystyle  \|T_{m;\Sigma_{\leq k}}\|_{B(\ell^p({\bf Z}))} \leq O(r \log(2+kr))^k \|T_m\|_{B(L^p({\bf R}))} \ \ \ \ \ (3)

when {p = 2r} or {p = (2r)'} for some integer {r}. In particular there is no longer any logarithmic loss in the cardinality of the set {S}.

The proof of (3) follows a similar strategy to previous proofs of Ionescu-Wainger type. By duality we may assume {p=2r}. We use the following standard sequence of steps:

  • (i) (Denominator orthogonality) First one splits {T_{m;\Sigma_{\leq k}} f} into various pieces depending on the denominator {Q} appearing in the element of {\Sigma_{\leq k}}, and exploits “superorthogonality” in {Q} to estimate the {\ell^p} norm by the {\ell^p} norm of an appropriate square function.
  • (ii) (Nonconcentration) One expands out the {p^{th}} power of the square function and estimates it by a “nonconcentrated” version in which various factors that arise in the expansion are “disjoint”.
  • (iii) (Numerator orthogonality) We now decompose based on the numerators {a} appearing in the relevant elements of {\Sigma_{\leq k}}, and exploit some residual orthogonality in this parameter to reduce to estimating a square-function type expression involving sums over various cosets {\alpha + \frac{1}{Q}{\bf Z}/{\bf Z}}.
  • (iv) (Marcinkiewicz-Zygmund) One uses the Marcinkiewicz-Zygmund theorem relating scalar and vector valued operator norms to eliminate the role of the multiplier {m}.
  • (v) (Rubio de Francia) Use a reverse square function estimate of Rubio de Francia type to conclude.

The main innovations are that of using the probabilistic decoupling method to remove some logarithmic losses in (i), and recent progress on the Erdős-Rado sunflower conjecture (as discussed in this recent post) to improve the bounds in (ii). For (i), the key point is that one can express a sum such as

\displaystyle  \sum_{A \in \binom{S}{k}} f_A,

where {\binom{S}{k}} is the set of {k}-element subsets of an index set {S}, and {f_A} are various complex numbers, as an average

\displaystyle  \sum_{A \in \binom{S}{k}} f_A = \frac{k^k}{k!} {\bf E} \sum_{s_1 \in {\bf S}_1,\dots,s_k \in {\bf S}_k} f_{\{s_1,\dots,s_k\}}

where {S = {\bf S}_1 \cup \dots \cup {\bf S}_k} is a random partition of {S} into {k} subclasses (chosen uniformly over all such partitions), basically because every {k}-element subset {A} of {S} has a probability exactly {\frac{k!}{k^k}} of being completely shattered by such a random partition. This “decouples” the index set {\binom{S}{k}} into a Cartesian product {{\bf S}_1 \times \dots \times {\bf S}_k} which is more convenient for application of the superorthogonality theory. For (ii), the point is to efficiently obtain estimates of the form

\displaystyle  (\sum_{A \in \binom{S}{k}} F_A)^r \lesssim_{k,r} \sum_{A_1,\dots,A_r \in \binom{S}{k} \hbox{ sunflower}} F_{A_1} \dots F_{A_r}

where {F_A} are various non-negative quantities, and a sunflower is a collection of sets {A_1,\dots,A_r} that consist of a common “core” {A_0} and disjoint “petals” {A_1 \backslash A_0,\dots,A_r \backslash A_0}. The other parts of the argument are relatively routine; see for instance this survey of Pierce for a discussion of them in the simple case {k=1}.
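Returning to the decoupling identity used for (i), it can be confirmed by exhaustive enumeration on a small index set. The sketch below models the random partition by assigning each element of {S} independently and uniformly to one of {k} labelled classes (so classes may be empty); this is an assumption about the intended model, under which the shattering probability is exactly {\frac{k!}{k^k}}. The function names and the test function `f` are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial

def shatter_probability(n, k):
    """Exact probability that a fixed k-subset of {0,...,n-1} meets every class
    exactly once under a uniform assignment of elements to k labelled classes."""
    A = range(k)  # by symmetry, any fixed k-subset gives the same answer
    colourings = list(product(range(k), repeat=n))
    good = sum(1 for c in colourings if len({c[a] for a in A}) == k)
    return Fraction(good, len(colourings))

def decoupled_sum(f, n, k):
    """(k^k / k!) times the expectation of the sum over s_1 in S_1, ..., s_k in S_k."""
    colourings = list(product(range(k), repeat=n))
    total = Fraction(0)
    for c in colourings:
        classes = [[s for s in range(n) if c[s] == j] for j in range(k)]
        for choice in product(*classes):  # yields nothing if some class is empty
            total += f(frozenset(choice))
    return Fraction(k**k, factorial(k)) * total / len(colourings)

f = lambda A: sum(A) ** 2 + 1  # an arbitrary function of k-element subsets
direct = sum(f(frozenset(A)) for A in combinations(range(5), 3))
print(shatter_probability(5, 3), decoupled_sum(f, 5, 3), direct)
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than up to rounding.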

In this paper we interpret the Ionescu-Wainger multiplier theorem as being essentially a consequence of various quantitative versions of the Shannon sampling theorem. Recall that this theorem asserts that if a (Schwartz) function {f: {\bf R} \rightarrow {\bf C}} has its Fourier transform supported on {[-1/2,1/2]}, then {f} can be recovered uniquely from its restriction {f|_{\bf Z}: {\bf Z} \rightarrow {\bf C}}. In fact, as can be shown from a little bit of routine Fourier analysis, if we narrow the support of the Fourier transform slightly to {[-c,c]} for some {0 < c < 1/2}, then the restriction {f|_{\bf Z}} has the same {L^p} behaviour as the original function, in the sense that

\displaystyle  \| f|_{\bf Z} \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf R})} \ \ \ \ \ (4)

for all {0 < p \leq \infty}; see Theorem 4.18 of this paper of myself with Krause and Mirek. This is consistent with the uncertainty principle, which suggests that such functions {f} should behave like a constant at scales {\sim 1/c}.
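In the special case {p=2}, the comparison (4) is in fact an exact identity, as follows from Poisson summation applied to {|f|^2}. Here is a minimal numerical sketch of this special case, using the test function {f(x) = (\frac{\sin(\pi c x)}{\pi x})^2} with {c = 1/4}, whose Fourier transform is the triangle function {(c - |\xi|)_+} supported on {[-c,c]}. This function is not Schwartz, but it decays quickly enough for the computation to make sense; the choice of {f}, of {c}, and of the truncation cutoff are illustrative assumptions.

```python
import math

def f(x, c=0.25):
    """f(x) = (sin(pi c x) / (pi x))^2, with Fourier transform (c - |xi|)_+."""
    if x == 0:
        return c * c
    return (math.sin(math.pi * c * x) / (math.pi * x)) ** 2

def sampled_norm_squared(c=0.25, cutoff=2000):
    """Truncated sum of |f(n)|^2 over integers n; the tail is O(cutoff^-3)."""
    return sum(f(n, c) ** 2 for n in range(-cutoff, cutoff + 1))

def continuous_norm_squared(c=0.25):
    """Integral of |f|^2 over R, computed via Plancherel as the integral of
    (c - |xi|)_+^2, which equals 2 c^3 / 3."""
    return 2 * c ** 3 / 3

print(sampled_norm_squared(), continuous_norm_squared())
```

For other exponents {p} one only expects comparability up to constants depending on {c} and {p}, as in (4).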

The quantitative sampling theorem (4) can be used to give an alternate proof of Proposition 1(i), basically thanks to the identity

\displaystyle  T_{m;0} (f|_{\bf Z}) = (T_m f)|_{\bf Z}

whenever {f: {\bf R} \rightarrow {\bf C}} is Schwartz and has Fourier transform supported in {[-1/2,1/2]}, and {m} is also supported on {[-1/2,1/2]}; this identity can be easily verified from the Poisson summation formula. A variant of this argument also yields an alternate proof of Proposition 1(ii), where the role of {{\bf R}} is now played by {{\bf R} \times {\bf Z}/Q{\bf Z}}, and the standard embedding of {{\bf Z}} into {{\bf R}} is now replaced by the embedding {\iota_Q: n \mapsto (n, n \hbox{ mod } Q)} of {{\bf Z}} into {{\bf R} \times {\bf Z}/Q{\bf Z}}; the analogue of (4) is now

\displaystyle  \| f \circ \iota_Q \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf R} \times {\bf Z}/Q{\bf Z})} \ \ \ \ \ (5)

whenever {f: {\bf R} \times {\bf Z}/Q{\bf Z} \rightarrow {\bf C}} is Schwartz and has Fourier transform {{\mathcal F}_{{\bf R} \times {\bf Z}/Q{\bf Z}} f\colon {\bf R} \times \frac{1}{Q}{\bf Z}/{\bf Z} \rightarrow {\bf C}} supported in {[-c/Q,c/Q] \times \frac{1}{Q}{\bf Z}/{\bf Z}}, and {{\bf Z}/Q{\bf Z}} is endowed with probability Haar measure.

The locally compact abelian groups {{\bf R}} and {{\bf R} \times {\bf Z}/Q{\bf Z}} can all be viewed as projections of the adelic integers {{\bf A}_{\bf Z} := {\bf R} \times \hat {\bf Z}} (the product of the reals and the profinite integers {\hat {\bf Z}}). By using the Ionescu-Wainger multiplier theorem, we are able to obtain an adelic version of the quantitative sampling estimate (5), namely

\displaystyle  \| f \circ \iota \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf A}_{\bf Z})}

whenever {1 < p < \infty}, {f: {\bf A}_{\bf Z} \rightarrow {\bf C}} is Schwartz-Bruhat and has Fourier transform {{\mathcal F}_{{\bf A}_{\bf Z}} f: {\bf R} \times {\bf Q}/{\bf Z} \rightarrow {\bf C}} supported on {[-\varepsilon,\varepsilon] \times \Sigma_{\leq k}} for some sufficiently small {\varepsilon} (the precise bound on {\varepsilon} depends on {S, p, c} in a fashion not detailed here). This allows one to obtain an “adelic” extension of the Ionescu-Wainger multiplier theorem, in which the {\ell^p({\bf Z})} operator norm of any discrete multiplier operator whose symbol is supported on major arcs can be shown to be comparable to the {L^p({\bf A}_{\bf Z})} operator norm of an adelic counterpart to that multiplier operator; in principle this reduces “major arc” harmonic analysis on the integers {{\bf Z}} to “low frequency” harmonic analysis on the adelic integers {{\bf A}_{\bf Z}}, which is a simpler setting in many ways (mostly because the set of major arcs (2) is now replaced with a product set {[-\varepsilon,\varepsilon] \times \Sigma_{\leq k}}).

Ben Krause, Mariusz Mirek, and I have uploaded to the arXiv our paper Pointwise ergodic theorems for non-conventional bilinear polynomial averages. This paper is a contribution to the decades-long program of extending the classical ergodic theorems to “non-conventional” ergodic averages. Here, the focus is on pointwise convergence theorems, and in particular looking for extensions of the pointwise ergodic theorem of Birkhoff:

Theorem 1 (Birkhoff ergodic theorem) Let {(X,\mu,T)} be a measure-preserving system (by which we mean {(X,\mu)} is a {\sigma}-finite measure space, and {T: X \rightarrow X} is invertible and measure-preserving), and let {f \in L^p(X)} for any {1 \leq p < \infty}. Then the averages {\frac{1}{N} \sum_{n=1}^N f(T^n x)} converge pointwise for {\mu}-almost every {x \in X}.
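To get a feel for Theorem 1, here is a minimal numerical sketch for one specific system, which is my own illustrative choice and not taken from the paper: the rotation {T(x) = x + \alpha \hbox{ mod } 1} on the unit interval with {\alpha = \sqrt{2}}. This system happens to be ergodic, so the limit of the averages should be the space average of {f}; the theorem itself only asserts convergence. The names `birkhoff_average`, `f` and `alpha` are hypothetical.

```python
import math

def birkhoff_average(f, x, alpha, N):
    """Return (1/N) * sum_{n=1}^N f(T^n x) for the rotation T(x) = x + alpha mod 1."""
    total = 0.0
    for n in range(1, N + 1):
        total += f((x + n * alpha) % 1.0)
    return total / N

def f(x):
    # Test function whose space average over [0,1) equals 1/2.
    return math.cos(2 * math.pi * x) ** 2

alpha = math.sqrt(2)  # irrational rotation number (illustrative assumption)

# The averages should settle down near 1/2 as N grows.
for N in (10, 100, 1000, 10000):
    print(N, birkhoff_average(f, 0.1, alpha, N))
```

Of course, a single orbit of a very regular system says nothing about the measure-theoretic subtleties of the theorem; the sketch is only meant to make the averages concrete.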

Pointwise ergodic theorems have an inherently harmonic-analytic content to them, as they are closely tied to maximal inequalities. For instance, the Birkhoff ergodic theorem is closely tied to the Hardy-Littlewood maximal inequality.

The above theorem was generalized by Bourgain (conceding the endpoint {p=1}, where pointwise almost everywhere convergence is now known to fail) to polynomial averages:

Theorem 2 (Pointwise ergodic theorem for polynomial averages) Let {(X,\mu,T)} be a measure-preserving system, and let {f \in L^p(X)} for any {1 < p < \infty}. Let {P \in {\bf Z}[{\mathrm n}]} be a polynomial with integer coefficients. Then the averages {\frac{1}{N} \sum_{n=1}^N f(T^{P(n)} x)} converge pointwise for {\mu}-almost every {x \in X}.

For bilinear averages, we have a separate 1990 result of Bourgain (for {L^\infty} functions), extended to other {L^p} spaces by Lacey, and with an alternate proof given by Demeter:

Theorem 3 (Pointwise ergodic theorem for two linear polynomials) Let {(X,\mu,T)} be a measure-preserving system with finite measure, and let {f \in L^{p_1}(X)}, {g \in L^{p_2}(X)} for some {1 < p_1,p_2 \leq \infty} with {\frac{1}{p_1}+\frac{1}{p_2} < \frac{3}{2}}. Then for any integers {a,b}, the averages {\frac{1}{N} \sum_{n=1}^N f(T^{an} x) g(T^{bn} x)} converge pointwise almost everywhere.

It has been an open question for some time (see e.g., Problem 11 of this survey of Frantzikinakis) to extend this result to other bilinear ergodic averages. In our paper we are able to achieve this in the partially linear case:

Theorem 4 (Pointwise ergodic theorem for one linear and one nonlinear polynomial) Let {(X,\mu,T)} be a measure-preserving system, and let {f \in L^{p_1}(X)}, {g \in L^{p_2}(X)} for some {1 < p_1,p_2 < \infty} with {\frac{1}{p_1}+\frac{1}{p_2} \leq 1}. Then for any polynomial {P \in {\bf Z}[{\mathrm n}]} of degree {d \geq 2}, the averages {\frac{1}{N} \sum_{n=1}^N f(T^{n} x) g(T^{P(n)} x)} converge pointwise almost everywhere.

We actually prove a bit more than this, namely a maximal function estimate and a variational estimate, together with some additional estimates that “break duality” by applying in certain ranges with {\frac{1}{p_1}+\frac{1}{p_2}>1}, but we will not discuss these extensions here. A good model case to keep in mind is when {p_1=p_2=2} and {P(n) = n^2} (which is the case we started with). We note that norm convergence for these averages was established much earlier by Furstenberg and Weiss (in the {d=2} case at least), and in fact norm convergence for arbitrary polynomial averages is now known thanks to the work of Host-Kra, Leibman, and Walsh.

Our proof of Theorem 4 is much closer in spirit to Theorem 2 than to Theorem 3. The property of the averages shared by Theorems 2 and 4 is that they have “true complexity zero”, in the sense that they can only be large if the functions {f,g} involved are “major arc” or “profinite”, in that they behave periodically over very long intervals (or like a linear combination of such periodic functions). In contrast, the average in Theorem 3 has “true complexity one”, in the sense that it can also be large if {f,g} are “almost periodic” (a linear combination of eigenfunctions, or plane waves), and as such all proofs of the latter theorem have relied (either explicitly or implicitly) on some form of time-frequency analysis. In principle, the true complexity zero property reduces one to studying the behaviour of averages on major arcs. However, until recently the available estimates to quantify this true complexity zero property were not strong enough to achieve a good reduction of this form, and even once one was in the major arc setting the bilinear averages in Theorem 4 were still quite complicated, exhibiting a mixture of both continuous and arithmetic aspects, both of which are genuinely bilinear in nature.

After applying standard reductions such as the Calderón transference principle, the key task is to establish a suitably “scale-invariant” maximal (or variational) inequality on the integer shift system (in which {X = {\bf Z}} with counting measure, and {T(n) = n-1}). A model problem is to establish the maximal inequality

\displaystyle  \| \sup_N |A_N(f,g)| \|_{\ell^1({\bf Z})} \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})} \ \ \ \ \ (1)

where {N} ranges over powers of two and {A_N} is the bilinear operator

\displaystyle  A_N(f,g)(x) := \frac{1}{N} \sum_{n=1}^N f(x-n) g(x-n^2).

The single scale estimate

\displaystyle  \| A_N(f,g) \|_{\ell^1({\bf Z})} \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})}

or equivalently (by duality)

\displaystyle  \frac{1}{N} \sum_{n=1}^N \sum_{x \in {\bf Z}} h(x) f(x-n) g(x-n^2) \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})} \|h\|_{\ell^\infty({\bf Z})} \ \ \ \ \ (2)

is immediate from Hölder’s inequality; the difficulty is how to take the supremum over scales {N}.
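As a sanity check on the single scale estimate, the following sketch evaluates {A_N(f,g)} on random finitely supported functions and compares the {\ell^1} norm with the product of the {\ell^2} norms. The representation of functions on {{\bf Z}} as dictionaries, and the names `A` and `lp_norm`, are my own assumptions made for illustration; nothing here comes from the paper.

```python
import random

def A(N, f, g, x):
    """A_N(f,g)(x) = (1/N) * sum_{n=1}^N f(x-n) g(x-n^2).

    f and g are dicts on the integers, understood to vanish off their keys.
    """
    return sum(f.get(x - n, 0.0) * g.get(x - n * n, 0.0) for n in range(1, N + 1)) / N

def lp_norm(values, p):
    """The l^p norm of a finite collection of values."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

random.seed(0)
N = 8
support = range(-N * N, N * N + 1)
f = {x: random.uniform(-1, 1) for x in support}
g = {x: random.uniform(-1, 1) for x in support}

# A_N(f,g)(x) can only be nonzero if x - n lies in the support of f for some 1 <= n <= N.
window = range(-N * N + 1, N * N + N + 1)
lhs = lp_norm([A(N, f, g, x) for x in window], 1)
rhs = lp_norm(f.values(), 2) * lp_norm(g.values(), 2)
print(lhs, rhs)  # lhs should not exceed rhs
```

For random data one expects the left-hand side to be far smaller than the right-hand side, which is consistent with the point that near-equality in (2) forces a lot of structure on the functions.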

The first step is to understand when the single-scale estimate (2) can come close to equality. A key example to keep in mind is when {f(x) = e(ax/q) F(x)}, {g(x) = e(bx/q) G(x)}, {h(x) = e(cx/q) H(x)} where {q=O(1)} is a small modulus, {a,b,c} are such that {a+b+c=0 \hbox{ mod } q}, {G} is a smooth cutoff to an interval {I} of length {O(N^2)}, and {F=H} is also supported on {I} and behaves like a constant on intervals of length {O(N)}. Then one can check that (barring some unusual cancellation) (2) is basically sharp for this example. A remarkable result of Peluse and Prendiville (generalised to arbitrary nonlinear polynomials {P} by Peluse) asserts, roughly speaking, that this example is basically the only way in which (2) can be saturated, at least when {f,g,h} are supported on a common interval {I} of length {O(N^2)} and are normalised in {\ell^\infty} rather than {\ell^2}. (Strictly speaking, the above paper of Peluse and Prendiville only says something like this regarding the {f,h} factors; the corresponding statement for {g} was established in a subsequent paper of Peluse and Prendiville.) The argument requires tools from additive combinatorics such as the Gowers uniformity norms, and hinges in particular on the “degree lowering argument” of Peluse and Prendiville, which I discussed in this previous blog post. Crucially for our application, the estimates are very quantitative, with all bounds being polynomial in the ratio between the left and right hand sides of (2) (or more precisely, the {\ell^\infty}-normalized version of (2)).
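To see why the condition {a+b+c=0 \hbox{ mod } q} is the natural one in the key example, one can track the phases in the summand of (2) directly. The computation below is a heuristic sketch rather than a statement from the paper; it assumes the usual convention {e(\theta) := e^{2\pi i \theta}} (not spelled out above), that {x,n} are integers, and that {N} is much larger than {q}. The final line is only an approximation, valid for most {x} in {I}.

```latex
% Exact identity: the x-dependent phase disappears when a+b+c = 0 mod q.
h(x) f(x-n) g(x-n^2)
  = e\left( \frac{(a+b+c)x}{q} \right) e\left( - \frac{an + bn^2}{q} \right) H(x) F(x-n) G(x-n^2)
  = e\left( - \frac{an + bn^2}{q} \right) H(x) F(x-n) G(x-n^2).

% Heuristic: F and G vary slowly over the range of n, and the phase is q-periodic in n,
% so the average over n is roughly a complete exponential sum times H(x) F(x) G(x).
\frac{1}{N} \sum_{n=1}^N h(x) f(x-n) g(x-n^2)
  \approx \left( \frac{1}{q} \sum_{n \in {\bf Z}/q{\bf Z}} e\left( - \frac{an + bn^2}{q} \right) \right) H(x) F(x) G(x).
```

The “unusual cancellation” mentioned above would then correspond to the complete exponential sum in the last line being unexpectedly small; when it is not, there is no further oscillation left in the sum over {x}.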

For our applications we had to extend the {\ell^\infty} inverse theory of Peluse and Prendiville to an {\ell^2} theory. This turned out to require a certain amount of “sleight of hand”. Firstly, one can dualise the theorem of Peluse and Prendiville to show that the “dual function”

\displaystyle  A^*_N(h,g)(x) = \frac{1}{N} \sum_{n=1}^N h(x+n) g(x+n-n^2)

can be well approximated in {\ell^1} by a function that has Fourier support on “major arcs” if {g,h} enjoy {\ell^\infty} control. To get the required extension to {\ell^2} in the {f} aspect one has to improve the control on the error from {\ell^1} to {\ell^2}; this can be done by some interpolation theory combined with the useful Fourier multiplier theory of Ionescu and Wainger on major arcs. Then, by further interpolation using recent {\ell^p({\bf Z})} improving estimates of Han, Kovac, Lacey, Madrid, and Yang for linear averages such as {x \mapsto \frac{1}{N} \sum_{n=1}^N g(x+n-n^2)}, one can relax the {\ell^\infty} hypothesis on {g} to an {\ell^2} hypothesis, and then by undoing the duality one obtains a good inverse theorem for (2) for the function {f}; a modification of the arguments also gives something similar for {g}.
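The duality between {A_N} and the dual function {A^*_N} is just the change of variables {x \mapsto x+n} in (2), namely {\sum_x h(x) A_N(f,g)(x) = \sum_x f(x) A^*_N(h,g)(x)}. Here is a small sketch that checks this identity exactly on random integer-valued, finitely supported functions. The dictionary representation and the names `N_times_A`, `N_times_A_dual` are assumptions of mine for illustration; both sides are multiplied by {N} to stay in exact integer arithmetic.

```python
import random

def N_times_A(N, f, g, x):
    """N * A_N(f,g)(x) = sum_{n=1}^N f(x-n) g(x-n^2), with f, g dicts vanishing off their keys."""
    return sum(f.get(x - n, 0) * g.get(x - n * n, 0) for n in range(1, N + 1))

def N_times_A_dual(N, h, g, x):
    """N * A_N^*(h,g)(x) = sum_{n=1}^N h(x+n) g(x+n-n^2)."""
    return sum(h.get(x + n, 0) * g.get(x + n - n * n, 0) for n in range(1, N + 1))

random.seed(1)
N = 6
support = range(-50, 51)
f = {x: random.randint(-3, 3) for x in support}
g = {x: random.randint(-3, 3) for x in support}
h = {x: random.randint(-3, 3) for x in support}

# Summing over the support of h (resp. f) is the same as summing over all integers.
lhs = sum(h[x] * N_times_A(N, f, g, x) for x in h)
rhs = sum(f[x] * N_times_A_dual(N, h, g, x) for x in f)
print(lhs, rhs)  # the two pairings agree
```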

Using these inverse theorems (and the Ionescu-Wainger multiplier theory) one still has to understand the “major arc” portion of (1); a model case arises when {f,g} are supported near rational numbers {a/q} with {q \sim 2^l} for some moderately large {l}. The inverse theory gives good control (with an exponential decay in {l}) on individual scales {N}, and one can leverage this with a Rademacher-Menshov type argument (see e.g., this blog post) and some closer analysis of the bilinear Fourier symbol of {A_N} to eventually handle all “small” scales, with {N} ranging up to say {2^{2^u}} where {u = C 2^{\rho l}} for some small constant {\rho} and large constant {C}. For the “large” scales, it becomes feasible to place all the major arcs simultaneously under a single common denominator {Q}, and then a quantitative version of the Shannon sampling theorem allows one to transfer the problem from the integers {{\bf Z}} to the locally compact abelian group {{\bf R} \times {\bf Z}/Q{\bf Z}}. Actually it was conceptually clearer for us to work instead with the adelic integers {{\mathbf A}_{\bf Z} ={\bf R} \times \hat {\bf Z}}, which is the inverse limit of the {{\bf R} \times {\bf Z}/Q{\bf Z}}. Once one transfers to the adelic integers, the bilinear operators involved split up as tensor products of the “continuous” bilinear operator

\displaystyle  A_{N,{\bf R}}(f,g)(x) := \frac{1}{N} \int_0^N f(x-t) g(x-t^2)\ dt

on {{\bf R}}, and the “arithmetic” bilinear operator

\displaystyle  A_{\hat {\bf Z}}(f,g)(x) := \int_{\hat {\bf Z}} f(x-y) g(x-y^2) d\mu_{\hat {\bf Z}}(y)

on the profinite integers {\hat {\bf Z}}, equipped with probability Haar measure {\mu_{\hat {\bf Z}}}. After a number of standard manipulations (interpolation, Fubini’s theorem, Hölder’s inequality, variational inequalities, etc.) the task of estimating this tensor product boils down to establishing an {L^q} improving estimate

\displaystyle  \| A_{\hat {\bf Z}}(f,g) \|_{L^q(\hat {\bf Z})} \lesssim \|f\|_{L^2(\hat {\bf Z})} \|g\|_{L^2(\hat {\bf Z})}

for some {q>2}. Splitting the profinite integers {\hat {\bf Z}} into the product of the {p}-adic integers {{\bf Z}_p}, it suffices to establish this claim for each {{\bf Z}_p} separately (so long as we keep the implied constant equal to {1} for sufficiently large {p}). This turns out to be possible using an arithmetic version of the Peluse-Prendiville inverse theorem as well as an arithmetic {L^q} improving estimate for linear averaging operators which ultimately arises from some estimates on the distribution of polynomials on the {p}-adic field {{\bf Q}_p}, which are a variant of some estimates of Kowalski and Wright.
