You are currently browsing the category archive for the ‘Mathematics’ category.

This set of notes focuses on the restriction problem in Fourier analysis. Introduced by Elias Stein in the 1970s, the restriction problem is a key model problem for understanding more general oscillatory integral operators, and which has turned out to be connected to many questions in geometric measure theory, harmonic analysis, combinatorics, number theory, and PDE. Only partial results on the problem are known, but these partial results have already proven to be very useful or influential in many applications.

We work in a Euclidean space ${{\bf R}^d}$. Recall that ${L^p({\bf R}^d)}$ is the space of ${p^{th}}$-power integrable functions ${f: {\bf R}^d \rightarrow {\bf C}}$, quotiented out by almost everywhere equivalence, with the usual modifications when ${p=\infty}$. If ${f \in L^1({\bf R}^d)}$ then the Fourier transform ${\hat f: {\bf R}^d \rightarrow {\bf C}}$ will be defined in this course by the formula

$\displaystyle \hat f(\xi) := \int_{{\bf R}^d} f(x) e^{-2\pi i x \cdot \xi}\ dx. \ \ \ \ \ (1)$

From the dominated convergence theorem we see that ${\hat f}$ is a continuous function; from the Riemann-Lebesgue lemma we see that it goes to zero at infinity. Thus ${\hat f}$ lies in the space ${C_0({\bf R}^d)}$ of continuous functions that go to zero at infinity, which is a subspace of ${L^\infty({\bf R}^d)}$. Indeed, from the triangle inequality it is obvious that

$\displaystyle \|\hat f\|_{L^\infty({\bf R}^d)} \leq \|f\|_{L^1({\bf R}^d)}. \ \ \ \ \ (2)$

If ${f \in L^1({\bf R}^d) \cap L^2({\bf R}^d)}$, then Plancherel’s theorem tells us that we have the identity

$\displaystyle \|\hat f\|_{L^2({\bf R}^d)} = \|f\|_{L^2({\bf R}^d)}. \ \ \ \ \ (3)$

Because of this, there is a unique way to extend the Fourier transform ${f \mapsto \hat f}$ from ${L^1({\bf R}^d) \cap L^2({\bf R}^d)}$ to ${L^2({\bf R}^d)}$, in such a way that it becomes a unitary map from ${L^2({\bf R}^d)}$ to itself. By abuse of notation we continue to denote this extension of the Fourier transform by ${f \mapsto \hat f}$. Strictly speaking, this extension is no longer defined in a pointwise sense by the formula (1) (indeed, the integral on the RHS ceases to be absolutely integrable once ${f}$ leaves ${L^1({\bf R}^d)}$; we will return to the (surprisingly difficult) question of whether pointwise convergence continues to hold (at least in an almost everywhere sense) later in this course, when we discuss Carleson’s theorem. On the other hand, the formula (1) remains valid in the sense of distributions, and in practice most of the identities and inequalities one can show about the Fourier transform of “nice” functions (e.g., functions in ${L^1({\bf R}^d) \cap L^2({\bf R}^d)}$, or in the Schwartz class ${{\mathcal S}({\bf R}^d)}$, or test function class ${C^\infty_c({\bf R}^d)}$) can be extended to functions in “rough” function spaces such as ${L^2({\bf R}^d)}$ by standard limiting arguments.

By (2), (3), and the Riesz-Thorin interpolation theorem, we also obtain the Hausdorff-Young inequality

$\displaystyle \|\hat f\|_{L^{p'}({\bf R}^d)} \leq \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (4)$

for all ${1 \leq p \leq 2}$ and ${f \in L^1({\bf R}^d) \cap L^2({\bf R}^d)}$, where ${2 \leq p' \leq \infty}$ is the dual exponent to ${p}$, defined by the usual formula ${\frac{1}{p} + \frac{1}{p'} = 1}$. (One can improve this inequality by a constant factor, with the optimal constant worked out by Beckner, but the focus in these notes will not be on optimal constants.) As a consequence, the Fourier transform can also be uniquely extended as a continuous linear map from ${L^p({\bf R}^d) \rightarrow L^{p'}({\bf R}^d)}$. (The situation with ${p>2}$ is much worse; see below the fold.)

The restriction problem asks, for a given exponent ${1 \leq p \leq 2}$ and a subset ${S}$ of ${{\bf R}^d}$, whether it is possible to meaningfully restrict the Fourier transform ${\hat f}$ of a function ${f \in L^p({\bf R}^d)}$ to the set ${S}$. If the set ${S}$ has positive Lebesgue measure, then the answer is yes, since ${\hat f}$ lies in ${L^{p'}({\bf R}^d)}$ and therefore has a meaningful restriction to ${S}$ even though functions in ${L^{p'}}$ are only defined up to sets of measure zero. But what if ${S}$ has measure zero? If ${p=1}$, then ${\hat f \in C_0({\bf R}^d)}$ is continuous and therefore can be meaningfully restricted to any set ${S}$. At the other extreme, if ${p=2}$ and ${f}$ is an arbitrary function in ${L^2({\bf R}^d)}$, then by Plancherel’s theorem, ${\hat f}$ is also an arbitrary function in ${L^2({\bf R}^d)}$, and thus has no well-defined restriction to any set ${S}$ of measure zero.

It was observed by Stein (as reported in the Ph.D. thesis of Charlie Fefferman) that for certain measure zero subsets ${S}$ of ${{\bf R}^d}$, such as the sphere ${S^{d-1} := \{ \xi \in {\bf R}^d: |\xi| = 1\}}$, one can obtain meaningful restrictions of the Fourier transforms of functions ${f \in L^p({\bf R}^d)}$ for certain ${p}$ between ${1}$ and ${2}$, thus demonstrating that the Fourier transform of such functions retains more structure than a typical element of ${L^{p'}({\bf R}^d)}$:

Theorem 1 (Preliminary ${L^2}$ restriction theorem) If ${d \geq 2}$ and ${1 \leq p < \frac{4d}{3d+1}}$, then one has the estimate

$\displaystyle \| \hat f \|_{L^2(S^{d-1}, d\sigma)} \lesssim_{d,p} \|f\|_{L^p({\bf R}^d)}$

for all Schwartz functions ${f \in {\mathcal S}({\bf R}^d)}$, where ${d\sigma}$ denotes surface measure on the sphere ${S^{d-1}}$. In particular, the restriction ${\hat f|_S}$ can be meaningfully defined by continuous linear extension to an element of ${L^2(S^{d-1},d\sigma)}$.

Proof: Fix ${d,p,f}$. We expand out

$\displaystyle \| \hat f \|_{L^2(S^{d-1}, d\sigma)}^2 = \int_{S^{d-1}} |\hat f(\xi)|^2\ d\sigma(\xi).$

From (1) and Fubini’s theorem, the right-hand side may be expanded as

$\displaystyle \int_{{\bf R}^d} \int_{{\bf R}^d} f(x) \overline{f}(y) (d\sigma)^\vee(y-x)\ dx dy$

where the inverse Fourier transform ${(d\sigma)^\vee}$ of the measure ${d\sigma}$ is defined by the formula

$\displaystyle (d\sigma)^\vee(x) := \int_{S^{d-1}} e^{2\pi i x \cdot \xi}\ d\sigma(\xi).$

In other words, we have the identity

$\displaystyle \| \hat f \|_{L^2(S^{d-1}, d\sigma)}^2 = \langle f, f * (d\sigma)^\vee \rangle_{L^2({\bf R}^d)}, \ \ \ \ \ (5)$

using the Hermitian inner product ${\langle f, g\rangle_{L^2({\bf R}^d)} := \int_{{\bf R}^d} \overline{f(x)} g(x)\ dx}$. Since the sphere ${S^{d-1}}$ have bounded measure, we have from the triangle inequality that

$\displaystyle (d\sigma)^\vee(x) \lesssim_d 1. \ \ \ \ \ (6)$

Also, from the method of stationary phase (as covered in the previous class 247A), or Bessel function asymptotics, we have the decay

$\displaystyle (d\sigma)^\vee(x) \lesssim_d |x|^{-(d-1)/2} \ \ \ \ \ (7)$

for any ${x \in {\bf R}^d}$ (note that the bound already follows from (6) unless ${|x| \geq 1}$). We remark that the exponent ${-\frac{d-1}{2}}$ here can be seen geometrically from the following considerations. For ${|x|>1}$, the phase ${e^{2\pi i x \cdot \xi}}$ on the sphere is stationary at the two antipodal points ${x/|x|, -x/|x|}$ of the sphere, and constant on the tangent hyperplanes to the sphere at these points. The wavelength of this phase is proportional to ${1/|x|}$, so the phase would be approximately stationary on a cap formed by intersecting the sphere with a ${\sim 1/|x|}$ neighbourhood of the tangent hyperplane to one of the stationary points. As the sphere is tangent to second order at these points, this cap will have diameter ${\sim 1/|x|^{1/2}}$ in the directions of the ${d-1}$-dimensional tangent space, so the cap will have surface measure ${\sim |x|^{-(d-1)/2}}$, which leads to the prediction (7). We combine (6), (7) into the unified estimate

$\displaystyle (d\sigma)^\vee(x) \lesssim_d \langle x\rangle^{-(d-1)/2}, \ \ \ \ \ (8)$

where the “Japanese bracket” ${\langle x\rangle}$ is defined as ${\langle x \rangle := (1+|x|^2)^{1/2}}$. Since ${\langle x \rangle^{-\alpha}}$ lies in ${L^p({\bf R}^d)}$ precisely when ${p > \frac{d}{\alpha}}$, we conclude that

$\displaystyle (d\sigma)^\vee \in L^q({\bf R}^d) \hbox{ iff } q > \frac{d}{(d-1)/2}.$

Applying Young’s convolution inequality, we conclude (after some arithmetic) that

$\displaystyle \| f * (d\sigma)^\vee \|_{L^{p'}({\bf R}^d)} \lesssim_{p,d} \|f\|_{L^p({\bf R}^d)}$

whenever ${1 \leq p < \frac{4d}{3d+1}}$, and the claim now follows from (5) and Hölder’s inequality. $\Box$

Remark 2 By using the Hardy-Littlewood-Sobolev inequality in place of Young’s convolution inequality, one can also establish this result for ${p = \frac{4d}{3d+1}}$.

Motivated by this result, given any Radon measure ${\mu}$ on ${{\bf R}^d}$ and any exponents ${1 \leq p,q \leq \infty}$, we use ${R_\mu(p \rightarrow q)}$ to denote the claim that the restriction estimate

$\displaystyle \| \hat f \|_{L^q({\bf R}^d, \mu)} \lesssim_{d,p,q,\mu} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (9)$

for all Schwartz functions ${f}$; if ${S}$ is a ${k}$-dimensional submanifold of ${{\bf R}^d}$ (possibly with boundary), we write ${R_S(p \rightarrow q)}$ for ${R_\mu(p \rightarrow q)}$ where ${\mu}$ is the ${k}$-dimensional surface measure on ${S}$. Thus, for instance, we trivially always have ${R_S(1 \rightarrow \infty)}$, while Theorem 1 asserts that ${R_{S^{d-1}}(p \rightarrow 2)}$ holds whenever ${1 \leq p < \frac{4d}{3d+1}}$. We will not give a comprehensive survey of restriction theory in these notes, but instead focus on some model results that showcase some of the basic techniques in the field. (I have a more detailed survey on this topic from 2003, but it is somewhat out of date.)

After some discussion with the applied math research groups here at UCLA (in particular the groups led by Andrea Bertozzi and Deanna Needell), one of the members of these groups, Chris Strohmeier, has produced a proposal for a Polymath project to crowdsource in a single repository (a) a collection of public data sets relating to the COVID-19 pandemic, (b) requests for such data sets, (c) requests for data cleaning of such sets, and (d) submissions of cleaned data sets.  (The proposal can be viewed as a PDF, and is also available on Overleaf).  As mentioned in the proposal, this database would be slightly different in focus than existing data sets such as the COVID-19 data sets hosted on Kaggle, with a focus on producing high quality cleaned data sets.  (Another relevant data set that I am aware of is the SafeGraph aggregated foot traffic data, although this data set, while open, is not quite public as it requires a non-commercial agreement to execute.  Feel free to mention further relevant data sets in the comments.)

This seems like a very interesting and timely proposal to me and I would like to open it up for discussion, for instance by proposing some seed requests for data and data cleaning and to discuss possible platforms that such a repository could be built on.  In the spirit of “building the plane while flying it”, one could begin by creating a basic github repository as a prototype and use the comments in this blog post to handle requests, and then migrate to a more high quality platform once it becomes clear what direction this project might move in.  (For instance one might eventually move beyond data cleaning to more sophisticated types of data analysis.)

UPDATE, Mar 25: a prototype page for such a clearinghouse is now up at this wiki page.

UPDATE, Mar 27: the data cleaning aspect of this project largely duplicates the existing efforts at the United against COVID-19 project, so we are redirecting requests of this type to that project (and specifically to their data discourse page).  The polymath proposal will now refocus on crowdsourcing a list of public data sets relating to the COVID-19 pandemic.

At the most recent MSRI board of trustees meeting on Mar 7 (conducted online, naturally), Nicolas Jewell (a Professor of Biostatistics and Statistics at Berkeley, also affiliated with the Berkeley School of Public Health and the London School of Health and Tropical Disease), gave a presentation on the current coronavirus epidemic entitled “2019-2020 Novel Coronavirus outbreak: mathematics of epidemics, and what it can and cannot tell us”.  The presentation (updated with Mar 18 data), hosted by David Eisenbud (the director of MSRI), together with a question and answer session, is now on Youtube:

(I am on this board, but could not make it to this particular meeting; I caught up on the presentation later, and thought it would of interest to several readers of this blog.)  While there is some mathematics in the presentation, it is relatively non-technical.

Just a short post to note that this year’s Abel prize has been awarded jointly to Hillel Furstenberg and Grigory Margulis for “for pioneering the use of methods from probability and dynamics in group theory, number theory and combinatorics”.  I was not involved in the decision making process of the Abel committee this year, but I certainly feel that the contributions of both mathematicians are worthy of the prize.  Certainly both mathematicians have influenced my own work (for instance, Furstenberg’s proof of Szemeredi’s theorem ended up being a key influence in my result with Ben Green that the primes contain arbitrarily long arithmetic progressions); see for instance these blog posts mentioning Furstenberg, and these blog posts mentioning Margulis.

Next quarter, starting March 30, I will be teaching “Math 247B: Classical Fourier Analysis” here at UCLA.  (The course should more accurately be named “Modern real-variable harmonic analysis”, but we have not gotten around to implementing such a name change.) This class (a continuation of Math 247A from previous quarter, taught by my colleague, Monica Visan) will cover the following topics:

• Restriction theory and Strichartz estimates
• Decoupling estimates and applications
• Paraproducts; time frequency analysis; Carleson’s theorem

As usual, lecture notes will be made available on this blog.

Unlike previous courses, this one will be given online as part of UCLA’s social distancing efforts.  In particular, the course will be open to anyone with an internet connection (no UCLA affiliation is required), though non-UCLA participants will not have full access to all aspects of the course, and there is the possibility that some restrictions on participation may be imposed if there are significant disruptions to class activity.  For more information, see the course descriptionUPDATE: due to time limitations, I will not be able to respond to personal email inquiries about this class from non-UCLA participants in the course.  Please use the comment thread to this blog post for such inquiries.  I will also update the course description throughout the course to reflect the latest information about the course, both for UCLA students enrolled in the course and for non-UCLA participants.

Just a short note that the memorial article “Analysis and applications: The mathematical work of Elias Stein” has just been published in the Bulletin of the American Mathematical Society.  This article was a collective effort led by Charlie Fefferman, Alex Ionescu, Steve Wainger and myself to describe the various mathematical contributions of Elias Stein, who passed away in December 2018; it also features contributions from Loredana Lanzani, Akos Magyar, Mariusz Mirek, Alexander Nagel, Duong Phong, Lillian Pierce, Fulvio Ricci, Christopher Sogge, and Brian Street.  (My contribution was mostly focused on Stein’s contribution to restriction theory.)

In the modern theory of additive combinatorics, a large role is played by the Gowers uniformity norms ${\|f\|_{U^k(G)}}$, where ${k \geq 1}$, ${G = (G,+)}$ is a finite abelian group, and ${f: G \rightarrow {\bf C}}$ is a function (one can also consider these norms in finite approximate groups such as ${[N] = \{1,\dots,N\}}$ instead of finite groups, but we will focus on the group case here for simplicity). These norms can be defined by the formula

$\displaystyle \|f\|_{U^k(G)} := (\mathop{\bf E}_{x,h_1,\dots,h_k \in G} \Delta_{h_1} \dots \Delta_{h_k} f(x))^{1/2^k}$

where we use the averaging notation

$\displaystyle \mathop{\bf E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x)$

for any non-empty finite set ${A}$ (with ${|A|}$ denoting the cardinality of ${A}$), and ${\Delta_h}$ is the multiplicative discrete derivative operator

$\displaystyle \Delta_h f(x) := f(x+h) \overline{f(x)}.$

One reason why these norms play an important role is that they control various multilinear averages. We give two sample examples here:

Proposition 1 Let ${G = {\bf Z}/N{\bf Z}}$.

• (i) If ${a_1,\dots,a_k}$ are distinct elements of ${G}$ for some ${k \geq 2}$, and ${f_1,\dots,f_k: G \rightarrow {\bf C}}$ are ${1}$-bounded functions (thus ${|f_j(x)| \leq 1}$ for all ${j=1,\dots,k}$ and ${x \in G}$), then

$\displaystyle \mathop{\bf E}_{x, h \in G} f_1(x+a_1 h) \dots f_k(x+a_k h) \leq \|f_i\|_{U^{k-1}(G)} \ \ \ \ \ (1)$

for any ${i=1,\dots,k}$.

• (ii) If ${f_1,f_2,f_3: G \rightarrow {\bf C}}$ are ${1}$-bounded, then one has

$\displaystyle \mathop{\bf E}_{x, h \in G} f_1(x) f_2(x+h) f_3(x+h^2) \ll \|f_3\|_{U^4(G)} + N^{-1/4}.$

We establish these claims a little later in this post.

In some more recent literature (e.g., this paper of Conlon, Fox, and Zhao), the role of Gowers norms have been replaced by (generalisations) of the cut norm, a concept originating from graph theory. In this blog post, it will be convenient to define these cut norms in the language of probability theory (using boldface to denote random variables).

Definition 2 (Cut norm) Let ${{\bf X}_1,\dots,{\bf X}_k, {\bf Y}_1,\dots,{\bf Y}_l}$ be independent random variables with ${k,l \geq 0}$; to avoid minor technicalities we assume that these random variables are discrete and take values in a finite set. Given a random variable ${{\bf F} = F( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}$ of these independent random variables, we define the cut norm

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} := \sup | \mathop{\bf E} {\bf F} {\bf B}_1 \dots {\bf B}_k |$

where the supremum ranges over all choices ${{\bf B}_1,\dots,{\bf B}_k}$ of random variables ${{\bf B}_i = B_i( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}$ that are ${1}$-bounded (thus ${|{\bf B}_i| \leq 1}$ surely), and such that ${{\bf B}_i}$ does not depend on ${{\bf X}_i}$.

If ${l=0}$, we abbreviate ${\| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}}$ as ${\| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k )}}$.

Strictly speaking, the cut norm is only a cut semi-norm when ${k=0,1}$, but we will abuse notation by referring to it as a norm nevertheless.

Example 3 If ${G = (V_1,V_2,E)}$ is a bipartite graph, and ${\mathbf{v_1}}$, ${\mathbf{v_2}}$ are independent random variables chosen uniformly from ${V_1,V_2}$ respectively, then

$\displaystyle \| 1_E(\mathbf{v_1},\mathbf{v_2}) \|_{\mathrm{CUT}(\mathbf{v_1}, \mathbf{v_2})}$

$\displaystyle = \sup_{\|f\|_\infty, \|g\|_\infty \leq 1} |\mathop{\bf E}_{v_1 \in V_1, v_2 \in V_2} 1_E(v_1,v_2) f(v_1) g(v_2)|$

where the supremum ranges over all ${1}$-bounded functions ${f: V_1 \rightarrow [-1,1]}$, ${g: V_2 \rightarrow [-1,1]}$. The right hand side is essentially the cut norm of the graph ${G}$, as defined for instance by Frieze and Kannan.

The cut norm is basically an expectation when ${k=0,1}$:

Example 4 If ${k=0}$, we see from definition that

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( ; {\bf Y}_1,\dots,{\bf Y}_l )} =| \mathop{\bf E} {\bf F} |.$

If ${k=1}$, one easily checks that

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}; {\bf Y}_1,\dots,{\bf Y}_l )} = \mathop{\bf E} | \mathop{\bf E}_{\bf X} {\bf F} |,$

where ${\mathop{\bf E}_{\bf X} {\bf F} = \mathop{\bf E}( {\bf F} | {\bf Y}_1,\dots,{\bf Y}_l )}$ is the conditional expectation of ${{\bf F}}$ to the ${\sigma}$-algebra generated by all the variables other than ${{\bf X}}$, i.e., the ${\sigma}$-algebra generated by ${{\bf Y}_1,\dots,{\bf Y}_l}$. In particular, if ${{\bf X}, {\bf Y}_1,\dots,{\bf Y}_l}$ are independent random variables drawn uniformly from ${X,Y_1,\dots,Y_l}$ respectively, then

$\displaystyle \| F( {\bf X}; {\bf Y}_1,\dots, {\bf Y}_l) \|_{\mathrm{CUT}( {\bf X}; {\bf Y}_1,\dots,{\bf Y}_l )}$

$\displaystyle = \mathop{\bf E}_{y_1 \in Y_1,\dots, y_l \in Y_l} |\mathop{\bf E}_{x \in X} F(x; y_1,\dots,y_l)|.$

Here are some basic properties of the cut norm:

Lemma 5 (Basic properties of cut norm) Let ${{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}$ be independent discrete random variables, and ${{\bf F} = F({\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l)}$ a function of these variables.

• (i) (Permutation invariance) The cut norm ${\| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}}$ is invariant with respect to permutations of the ${{\bf X}_1,\dots,{\bf X}_k}$, or permutations of the ${{\bf Y}_1,\dots,{\bf Y}_l}$.
• (ii) (Conditioning) One has

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} = \mathop{\bf E} \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k )}$

where on the right-hand side we view, for each realisation ${y_1,\dots,y_l}$ of ${{\bf Y}_1,\dots,{\bf Y}_l}$, ${{\bf F}}$ as a function ${F( {\bf X}_1,\dots,{\bf X}_k; y_1,\dots,y_l)}$ of the random variables ${{\bf X}_1,\dots, {\bf X}_k}$ alone, thus the right-hand side may be expanded as

$\displaystyle \sum_{y_1,\dots,y_l} \| F( {\bf X}_1,\dots,{\bf X}_k; y_1,\dots,y_l) \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k )}$

$\displaystyle \times \mathop{\bf P}( Y_1=y_1,\dots,Y_l=y_l).$

• (iii) (Monotonicity) If ${k \geq 1}$, we have

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \geq \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_{k-1}; {\bf X}_k, {\bf Y}_1,\dots,{\bf Y}_l )}.$

• (iv) (Multiplicative invariances) If ${{\bf B} = B({\bf X}_1,\dots,{\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l)}$ is a ${1}$-bounded function that does not depend on one of the ${{\bf X}_i}$, then

$\displaystyle \| {\bf B} {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \leq \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}.$

In particular, if we additionally assume ${|{\bf B}|=1}$, then

$\displaystyle \| {\bf B} {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} = \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}.$

• (v) (Cauchy-Schwarz) If ${k \geq 1}$, one has

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \leq \| \Box_{{\bf X}_1, {\bf X}'_1} {\bf F} \|_{\mathrm{CUT}( {\bf X}_2, \dots, {\bf X}_k; {\bf X}_1, {\bf X}'_1, {\bf Y}_1,\dots,{\bf Y}_l )}^{1/2}$

where ${{\bf X}'_1}$ is a copy of ${{\bf X}_1}$ that is independent of ${{\bf X}_1,\dots,{\bf X}_k,{\bf Y}_1,\dots,{\bf Y}_l}$ and ${\Box_{{\bf X}_1, {\bf X}'_1} {\bf F}}$ is the random variable

$\displaystyle \Box_{{\bf X}_1, {\bf X}'_1} {\bf F} := F( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )$

$\displaystyle \times \overline{F}( {\bf X}'_1, {\bf X}_2, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l ).$

• (vi) (Averaging) If ${k \geq 1}$ and ${{\bf F} = \mathop{\bf E}_{\bf Z} {\bf F}_{\bf Z}}$, where ${{\bf Z}}$ is another random variable independent of ${{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}$, and ${{\bf F}_{\bf Z} = F_{\bf Z}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}$ is a random variable depending on both ${{\bf Z}}$ and ${{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}$, then

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \leq \| {\bf F}_{\bf Z} \|_{\mathrm{CUT}( ({\bf X}_1, {\bf Z}), {\bf X}_2, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}$

Proof: The claims (i), (ii) are clear from expanding out all the definitions. The claim (iii) also easily follows from the definitions (the left-hand side involves a supremum over a more general class of multipliers ${{\bf B}_1,\dots,{\bf B}_{k}}$, while the right-hand side omits the ${{\bf B}_k}$ multiplier), as does (iv) (the multiplier ${{\bf B}}$ can be absorbed into one of the multipliers in the definition of the cut norm). The claim (vi) follows by expanding out the definitions, and observing that all of the terms in the supremum appearing in the left-hand side also appear as terms in the supremum on the right-hand side. It remains to prove (v). By definition, the left-hand side is the supremum over all quantities of the form

$\displaystyle |{\bf E} {\bf F} {\bf B}_1 \dots {\bf B}_k|$

where the ${{\bf B}_i}$ are ${1}$-bounded functions of ${{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}$ that do not depend on ${{\bf X}_i}$. We average out in the ${{\bf X}_1}$ direction (that is, we condition out the variables ${{\bf X}_2, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}$), and pull out the factor ${{\bf B}_1}$ (which does not depend on ${{\bf X}_1}$), to write this as

$\displaystyle |{\bf E} {\bf B}_1 {\bf E}_{{\bf X}_1}( {\bf F} {\bf B}_2 \dots {\bf B}_k )|,$

which by Cauchy-Schwarz is bounded by

$\displaystyle ( |{\bf E} |{\bf E}_{{\bf X}_1}( {\bf F} {\bf B}_2 \dots {\bf B}_k )|^2)^{1/2},$

which can be expanded using the copy ${{\bf X}_1}$ as

$\displaystyle |{\bf E} \Box_{{\bf X}_1,{\bf X}'_1} ({\bf F} {\bf B}_2 \dots {\bf B}_k) |^{1/2}.$

Expanding

$\displaystyle \Box_{{\bf X}_1,{\bf X}'_1} ({\bf F} {\bf B}_2 \dots {\bf B}_k) = (\Box_{{\bf X}_1,{\bf X}'_1} {\bf F}) (\Box_{{\bf X}_1,{\bf X}'_1} {\bf B}_2) \dots (\Box_{{\bf X}_1,{\bf X}'_1} {\bf B}_k)$

and noting that each ${\Box_{{\bf X}_1,{\bf X}'_1} {\bf B}_i}$ is ${1}$-bounded and independent of ${{\bf X}_i}$ for ${i=2,\dots,k}$, we obtain the claim. $\Box$

Now we can relate the cut norm to Gowers uniformity norms:

Lemma 6 Let ${G}$ be a finite abelian group, let ${{\bf x}, {\bf h}_1,\dots,{\bf h}_k}$ be independent random variables uniformly drawn from ${G}$ for some ${k \geq 0}$, and let ${f: G \rightarrow {\bf C}}$. Then

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k, {\bf x} )} \leq \|f\|_{U^{k+1}(G)} \ \ \ \ \ (2)$

and similarly (if ${k \geq 1}$)

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k; {\bf x} )} \leq \|f\|_{U^{k}(G)} \ \ \ \ \ (3)$

If ${f}$ is additionally assumed to be ${1}$-bounded, we have the converse inequalities

$\displaystyle \|f\|_{U^{k+1}(G)}^{2^{k+1}} \leq \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k, {\bf x} )} \ \ \ \ \ (4)$

and (if ${k \geq 1}$)

$\displaystyle \|f\|_{U^{k}(G)}^{2^{k}} \leq \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k; {\bf x} )}. \ \ \ \ \ (5)$

Proof: Applying Lemma 5(v) ${k}$ times, we can bound

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h_1},\dots,{\bf h_k}, {\bf x} )}$

by

$\displaystyle \| \Box_{{\bf h}_k,{\bf h}'_k} \dots \Box_{{\bf h}_1,{\bf h}'_1} (f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k)) \|_{\mathrm{CUT}( {\bf x}; {\bf h}_1, {\bf h}'_1, \dots, {\bf h}_k, {\bf h}'_k )}^{1/2^k} \ \ \ \ \ (6)$

where ${{\bf h}'_1,\dots,{\bf h}'_k}$ are independent copies of ${{\bf h}_1,\dots,{\bf h}_k}$ that are also independent of ${{\bf x}}$. The expression inside the norm can also be written as

$\displaystyle \Delta_{{\bf h}_k - {\bf h}'_k} \dots \Delta_{{\bf h}_1 - {\bf h}'_1} f({\bf x} + {\bf h}'_1 + \dots + {\bf h}'_k)$

so by Example 4 one can write (6) as

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k,h'_1,\dots,h'_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k - h'_k} \dots \Delta_{h_1 - h'_1} f(x+h'_1+\dots+h'_k)||^{1/2^k}$

which after some change of variables simplifies to

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)||^{1/2^k}$

which by Cauchy-Schwarz is bounded by

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^2|^{1/2^{k+1}}$

which one can rearrange as

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k,h_{k+1},x \in G} \Delta_{h_{k+1}} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^{1/2^{k+1}}$

giving (2). A similar argument bounds

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h_1},\dots,{\bf h_k}; {\bf x} )}$

by

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} \mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^{1/2^k}$

which gives (3).

For (4), we can reverse the above steps and expand ${\|f\|_{U^{k+1}(G)}^{2^{k+1}}}$ as

$\displaystyle \mathop{\bf E}_{h_1,\dots,h_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^2$

which we can write as

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} b(h_1,\dots,h_k) \mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|$

for some ${1}$-bounded function ${b}$. This can in turn be expanded as

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k,x \in G} f(x+h_1+\dots+h_k) b(h_1,\dots,h_k) \prod_{i=1}^k b_i(x,h_1,\dots,h_k)|$

for some ${1}$-bounded functions ${b_i}$ that do not depend on ${h_i}$. By Example 4, this can be written as

$\displaystyle \| f({\bf x} + {\bf h_1}+\dots+{\bf h}_k) b({\bf h}_1,\dots,{\bf h}_k) \prod_{i=1}^k b_i(x,h_1,\dots,h_k) \|_{\mathrm{CUT}(; {\bf h}_1,\dots,{\bf h}_k, {\bf x})}$

which by several applications of Theorem 5(iii) and then Theorem 5(iv) can be bounded by

$\displaystyle \| f({\bf x} + {\bf h_1}+\dots+{\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k, {\bf x})},$

giving (4). A similar argument gives (5). $\Box$

Now we can prove Proposition 1. We begin with part (i). By permutation we may assume ${i=k}$, then by translation we may assume ${a_k=0}$. Replacing ${x}$ by ${x+h_1+\dots+h_{k-1}}$ and ${h}$ by ${h - a_1^{-1} h_1 - \dots - a_{k-1}^{-1} h_{k-1}}$, we can write the left-hand side of (1) as

$\displaystyle \mathop{\bf E}_{x,h,h_1,\dots,h_{k-1} \in G} f_k(x+h_1+\dots+h_{k-1}) \prod_{i=1}^{k-1} b_i(x,h,h_1,\dots,h_{k-1})$

where

$\displaystyle b_i(x,h,h_1,\dots,h_{k-1})$

$\displaystyle := f_i( x + h_1+\dots+h_{k-1}+ a_i(h - a_1^{-1} h_1 - \dots - a_k^{-1} h_{k-1}))$

is a ${1}$-bounded function that does not depend on ${h_i}$. Taking ${{\bf x}, {\bf h}, {\bf h}_1,\dots,{\bf h}_k}$ to be independent random variables drawn uniformly from ${G}$, the left-hand side of (1) can then be written as

$\displaystyle \mathop{\bf E} f_k({\bf x}+{\bf h}_1+\dots+{\bf h}_{k-1}) \prod_{i=1}^{k-1} b_i({\bf x},{\bf h},{\bf h}_1,\dots,{\bf h}_{k-1})$

which by Example 4 is bounded in magnitude by

$\displaystyle \| f_k({\bf x}+{\bf h}_1+\dots+{\bf h}_{k-1}) \prod_{i=1}^{k-1} b_i({\bf x},{\bf h},{\bf h}_1,\dots,{\bf h}_{k-1}) \|_{\mathrm{CUT}(; {\bf h}_1,\dots,{\bf h}_{k-1}, {\bf x}, {\bf h})}.$

After many applications of Lemma 5(iii), (iv), this is bounded by

$\displaystyle \| f_k({\bf x}+{\bf h_1}+\dots+{\bf h_{k-1}}) \|_{\mathrm{CUT}({\bf h}_1,\dots,{\bf h}_{k-1}; {\bf x}, {\bf h})}$

By Lemma 5(ii) we may drop the ${{\bf h}}$ variable, and then the claim follows from Lemma 6.

For part (ii), we replace ${x}$ by ${x+a-h^2}$ and ${h}$ by ${h-a+b}$ to write the left-hand side as

$\displaystyle \mathop{\bf E}_{x, a,b,h \in G} f_1(x+a-h^2) f_2(x+h+b-h^2) f_3(x+a+(h-a+b)^2-h^2);$

the point here is that the first factor does not involve ${b}$, the second factor does not involve ${a}$, and the third factor has no quadratic terms in ${h}$. Letting ${{\bf x}, {\bf a}, {\bf b}, {\bf h}}$ be independent variables drawn uniformly from ${G}$, we can use Example 4 to bound this in magnitude by

$\displaystyle \| f_1({\bf x}+{\bf a}-{\bf h}^2) f_2({\bf x}+{\bf h}+{\bf b}-{\bf h}^2)$

$\displaystyle f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2-{\bf h}^2 ) \|_{\mathrm{CUT}(; {\bf x}, {\bf a}, {\bf b}, {\bf h})}$

which by Lemma 5(i),(iii),(iv) is bounded by

$\displaystyle \| f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 ) \|_{\mathrm{CUT}({\bf a}, {\bf b}; {\bf x}, {\bf h})}$

and then by Lemma 5(v) we may bound this by

$\displaystyle \| \Box_{{\bf a}, {\bf a}'} \Box_{{\bf b}, {\bf b}'} f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 ) \|_{\mathrm{CUT}(;{\bf a}, {\bf a}', {\bf b}, {\bf b}', {\bf x}, {\bf h})}^{1/4}$

which by Example 4 is

$\displaystyle |\mathop{\bf E} \Box_{{\bf a}, {\bf a}'} \Box_{{\bf b}, {\bf b}'} f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 )|^{1/4}$

Now the expression inside the expectation is the product of four factors, each of which is ${f_3}$ or ${\overline{f}_3}$ applied to an affine form ${{\bf x} + {\bf c} + {\bf a} {\bf h}}$ where ${{\bf c}}$ depends on ${{\bf a}, {\bf a}', {\bf b}, {\bf b}'}$ and ${{\bf a}}$ is one of ${2({\bf b}-{\bf a})}$, ${2({\bf b}'-{\bf a})}$, ${2({\bf b}-{\bf a}')}$, ${2({\bf b}'-{\bf a}')}$. With probability ${1-O(1/N)}$, the four different values of ${{\bf a}}$ are distinct, and then by part (i) we have

$\displaystyle |\mathop{\bf E}(\Box_{{\bf a}, {\bf a}'} \Box_{{\bf b}, {\bf b}'} f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 )|{\bf a}, {\bf a}', {\bf b}, {\bf b}')| \leq \|f_3\|_{U^4({\bf Z}/N{\bf Z})}.$

When they are not distinct, we can instead bound this quantity by ${1}$. Taking expectations in ${{\bf a}, {\bf a}', {\bf b}, {\bf b}'}$, we obtain the claim. $\Box$

The analogue of the inverse ${U^2}$ theorem for cut norms is the following claim (which I learned from Ben Green):

Lemma 7 (${U^2}$-type inverse theorem) Let ${\mathbf{x}, \mathbf{h}}$ be independent random variables drawn from a finite abelian group ${G}$, and let ${f: G \rightarrow {\bf C}}$ be ${1}$-bounded. Then we have

$\displaystyle \| f(\mathbf{x} + \mathbf{h}) \|_{\mathrm{CUT}(\mathbf{x}, \mathbf{h})} = \sup_{\xi \in\hat G} \| f(\mathbf{x}) e(\xi \cdot \mathbf{x}) \|_{\mathrm{CUT}(\mathbf{x})}$

where ${\hat G}$ is the group of homomorphisms ${\xi: x \mapsto \xi \cdot x}$ is a homomorphism from ${G}$ to ${{\bf R}/{\bf Z}}$, and ${e(\theta) := e^{2\pi i \theta}}$.

Proof: Suppose first that ${\| f(\mathbf{x} + \mathbf{h}) \|_{\mathrm{CUT}(\mathbf{x}, \mathbf{h})} > \delta}$ for some ${\delta}$, then by definition

$\displaystyle |\mathop{\bf E}_{x,h \in G} f(x+h) a(x) b(h)| > \delta$

for some ${1}$-bounded ${a,b: G \rightarrow {\bf C}}$. By Fourier expansion, the left-hand side is also

$\displaystyle \sum_{\xi \in \hat G} \hat f(-\xi) \hat a(\xi) \hat b(\xi)$

where ${\hat f(\xi) := \mathop{\bf E}_{x \in G} f(x) e(-\xi \cdot x)}$. From Plancherel’s theorem we have

$\displaystyle \sum_{\xi \in \hat G} |\hat a(\xi)|^2, \sum_{\xi \in \hat G} |\hat b(\xi)|^2 \leq 1$

hence by Hölder’s inequality one has ${|\hat f(-\xi)| > \delta}$ for some ${\xi \in \hat G}$, and hence

$\displaystyle \sup_{\xi \in\hat G} \| f(\mathbf{x}) e(\xi \cdot \mathbf{x}) \|_{\mathrm{CUT}(\mathbf{x})} > \delta. \ \ \ \ \ (7)$

Conversely, suppose (7) holds. Then there is ${\xi \in \hat G}$ such that

$\displaystyle \| f(\mathbf{x}) e(\xi \cdot \mathbf{x}) \|_{\mathrm{CUT}(\mathbf{x})} > \delta$

which on substitution and Example 4 implies

$\displaystyle \| f(\mathbf{x}+\mathbf{h}) e(\xi \cdot (\mathbf{x}+\mathbf{h})) \|_{\mathrm{CUT}(;\mathbf{x}, \mathbf{h})} > \delta.$

The term ${e(\xi \cdot (\mathbf{x}+\mathbf{h}))}$ splits into the product of a factor ${e(\xi \cdot \mathbf{x})}$ not depending on ${\mathbf{h}}$, and a factor ${e(\xi \cdot \mathbf{h})}$ not depending on ${\mathbf{x}}$. Applying Lemma 5(iii), (iv) we conclude that

$\displaystyle \| f(\mathbf{x}+\mathbf{h}) \|_{\mathrm{CUT}(\mathbf{x}, \mathbf{h})} > \delta.$

The claim follows. $\Box$

The higher order inverse theorems are much less trivial (and the optimal quantitative bounds are not currently known). However, there is a useful degree lowering argument, due to Peluse and Prendiville, that can allow one to lower the order of a uniformity norm in some cases. We give a simple version of this argument here:

Lemma 8 (Degree lowering argument, special case) Let ${G}$ be a finite abelian group, let ${Y}$ be a non-empty finite set, and let ${f: G \rightarrow {\bf C}}$ be a function of the form ${f(x) := \mathop{\bf E}_{y \in Y} F_y(x)}$ for some ${1}$-bounded functions ${F_y: G \rightarrow {\bf C}}$ indexed by ${y \in Y}$. Suppose that

$\displaystyle \|f\|_{U^k(G)} \geq \delta$

for some ${k \geq 2}$ and ${0 < \delta \leq 1}$. Then one of the following claims hold (with implied constants allowed to depend on ${k}$):

• (i) (Degree lowering) one has ${\|f\|_{U^{k-1}(G)} \gg \delta^{O(1)}}$.
• (ii) (Non-zero frequency) There exist ${h_1,\dots,h_{k-2} \in G}$ and non-zero ${\xi \in \hat G}$ such that

$\displaystyle |\mathop{\bf E}_{x \in G, y \in Y} \Delta_{h_1} \dots \Delta_{h_{k-2}} F_y(x) e( \xi \cdot x )| \gg \delta^{O(1)}.$

There are more sophisticated versions of this argument in which the frequency ${\xi}$ is “minor arc” rather than “zero frequency”, and then the Gowers norms are localised to suitable large arithmetic progressions; this is implicit in the above-mentioned paper of Peluse and Prendiville.

Proof: One can write

$\displaystyle \|f\|_{U^k(G)}^{2^k} = \mathop{\bf E}_{h_1,\dots,h_{k-2} \in G} \|\Delta_{h_1} \dots \Delta_{h_{k-2}} f \|_{U^2(G)}^4$

and hence we conclude that

$\displaystyle \|\Delta_{h_1} \dots \Delta_{h_{k-2}} f \|_{U^2(G)} \gg \delta^{O(1)}$

for a set ${\Sigma}$ of tuples ${(h_1,\dots,h_{k-2}) \in G^{k-2}}$ of density ${h_1,\dots,h_{k-2}}$. Applying Lemma 6 and Lemma 7, we see that for each such tuple, there exists ${\phi(h_1,\dots,h_{k-2}) \in \hat G}$ such that

$\displaystyle \| \Delta_{h_1} \dots \Delta_{h_{k-2}} f({\bf x}) e( \phi(h_1,\dots,h_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x})} \gg \delta^{O(1)}, \ \ \ \ \ (8)$

where ${{\bf x}}$ is drawn uniformly from ${G}$.

Let us adopt the convention that ${e( \phi( _1,\dots,h_{k-2}) \cdot {\bf x} ) }$ vanishes for ${(h_1,\dots,h_{k-2})}$ not in ${\Sigma}$, then from Lemma 5(ii) we have

$\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x}; {\bf h}_1,\dots, {\bf h}_{k-2})} \gg \delta^{O(1)},$

where ${{\bf h}_1,\dots,{\bf h}_{k-2}}$ are independent random variables drawn uniformly from ${G}$ and also independent of ${{\bf x}}$. By repeated application of Lemma 5(iii) we then have

$\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x},{\bf h}_1,\dots, {\bf h}_{k-2})} \gg \delta^{O(1)}.$

Expanding out ${\Delta_{h_1} \dots \Delta_{h_{k-2}} f({\bf x})}$ and using Lemma 5(iv) repeatedly we conclude that

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x},{\bf h}_1,\dots, {\bf h}_{k-2})} \gg \delta^{O(1)}.$

From definition of ${f}$ we then have

$\displaystyle \| {\bf E}_{y \in Y} F_y({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x},{\bf h}_1,\dots, {\bf h}_{k-2})}$

$\displaystyle \gg \delta^{O(1)}.$

By Lemma 5(vi), we see that the left-hand side is less than

$\displaystyle \| F_{\bf y}({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}(({\bf x}, {\bf y}),{\bf h}_1,\dots, {\bf h}_{k-2})},$

where ${{\bf y}}$ is drawn uniformly from ${Y}$, independently of ${{\bf x}, {\bf h}_1,\dots,{\bf h}_{k-2}}$. By repeated application of Lemma 5(i), (v) repeatedly, we conclude that

$\displaystyle \| \Box_{{\bf h}_1, {\bf h}'_1} \dots \Box_{{\bf h}_{k-2}, {\bf h}'_{k-2}} (F_{\bf y}({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} )) \|_{\mathrm{CUT}(({\bf x},{\bf y}); {\bf h}_1,{\bf h}'_1,\dots, {\bf h}_{k-2}, {\bf h}'_{k-2})} \gg \delta^{O(1)},$

where ${{\bf h}'_1,\dots,{\bf h}'_{k-2}}$ are independent copies of ${{\bf h}_1,\dots,{\bf h}_{k-2}}$ that are also independent of ${{\bf x}}$, ${{\bf y}}$. By Lemma 5(ii) and Example 4 we conclude that

$\displaystyle |\mathop{\bf E}( \Box_{{\bf h}_1, {\bf h}'_1} \dots \Box_{{\bf h}_{k-2}, {\bf h}'_{k-2}} (F_{\bf y}({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} )) | {\bf h}_1,{\bf h}'_1,\dots, {\bf h}_{k-2}, {\bf h}'_{k-2}) )| \gg \delta^{O(1)} \ \ \ \ \ (9)$

with probability ${\gg \delta^{O(1)}}$.

The left-hand side can be rewritten as

$\displaystyle |\mathop{\bf E}_{x \in G, y \in Y} \Delta_{{\bf h}_1 - {\bf h}'_1} \dots \Delta_{{\bf h}_{k-2} - {\bf h}'_{k-2}} F_y( x + {\bf h}'_1 + \dots + {\bf h}'_{k-2})$

$\displaystyle e( \delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot x )|$

where ${\delta_{{\bf h}_1, {\bf h}'_1}}$ is the additive version of ${\Box_{{\bf h}_1, {\bf h}'_1}}$, thus

$\displaystyle \delta_{{\bf h}_1, {\bf h}'_1} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) := \phi({\bf h}_1,\dots,{\bf h}_{k-2}) - \phi({\bf h}'_1,\dots,{\bf h}_{k-2}).$

Translating ${x}$, we can simplify this a little to

$\displaystyle |\mathop{\bf E}_{x \in G, y \in Y} \Delta_{{\bf h}_1 - {\bf h}'_1} \dots \Delta_{{\bf h}_k - {\bf h}'_k} F_y( x ) e( \delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot x )|$

If the frequency ${\delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2})}$ is ever non-vanishing in the event (9) then conclusion (ii) applies. We conclude that

$\displaystyle \delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) = 0$

with probability ${\gg \delta^{O(1)}}$. In particular, by the pigeonhole principle, there exist ${h'_1,\dots,h'_{k-2} \in G}$ such that

$\displaystyle \delta_{{\bf h}_1, h'_1} \dots \delta_{{\bf h}_{k-2}, h'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) = 0$

with probability ${\gg \delta^{O(1)}}$. Expanding this out, we obtain a representation of the form

$\displaystyle \phi({\bf h}_1,\dots,{\bf h}_{k-2}) = \sum_{i=1}^{k-2} \phi_i({\bf h}_1,\dots,{\bf h}_{k-2})$

holding with probability ${\gg \delta^{O(1)}}$, where the ${\phi_i: G^{k-2} \rightarrow {\bf R}/{\bf Z}}$ are functions that do not depend on the ${i^{th}}$ coordinate. From (8) we conclude that

$\displaystyle \| \Delta_{h_1} \dots \Delta_{h_{k-2}} f({\bf x}) e( \sum_{i=1}^{k-2} \phi_i(h_1,\dots,h_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x})} \gg \delta^{O(1)}$

for ${\gg \delta^{O(1)}}$ of the tuples ${(h_1,\dots,h_{k-2}) \in G^{k-2}}$. Thus by Lemma 5(ii)

$\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \sum_{i=1}^{k-2} \phi_i({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x}; {\bf h}_1,\dots,{\bf h}_{k-2})} \gg \delta^{O(1)}.$

By repeated application of Lemma 5(iii) we then have

$\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \sum_{i=1}^{k-2} \phi_i({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x}, {\bf h}_1,\dots,{\bf h}_{k-2})} \gg \delta^{O(1)}$

and then by repeated application of Lemma 5(iv)

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) \|_{\mathrm{CUT}({\bf x}, {\bf h}_1,\dots,{\bf h}_{k-2})} \gg \delta^{O(1)}$

and then the conclusion (i) follows from Lemma 6. $\Box$

As an application of degree lowering, we give an inverse theorem for the average in Proposition 1(ii), first established by Bourgain-Chang and later reproved by Peluse (by different methods from those given here):

Proposition 9 Let ${G = {\bf Z}/N{\bf Z}}$ be a cyclic group of prime order. Suppose that one has ${1}$-bounded functions ${f_1,f_2,f_3: G \rightarrow {\bf C}}$ such that

$\displaystyle |\mathop{\bf E}_{x, h \in G} f_1(x) f_2(x+h) f_3(x+h^2)| \geq \delta \ \ \ \ \ (10)$

for some ${\delta > 0}$. Then either ${N \ll \delta^{-O(1)}}$, or one has

$\displaystyle |\mathop{\bf E}_{x \in G} f_1(x)|, |\mathop{\bf E}_{x \in G} f_2(x)| \gg \delta^{O(1)}.$

We remark that a modification of the arguments below also give ${|\mathop{\bf E}_{x \in G} f_3(x)| \gg \delta^{O(1)}}$.

Proof: The left-hand side of (10) can be written as

$\displaystyle |\mathop{\bf E}_{x \in G} F(x) f_3(x)|$

where ${F}$ is the dual function

$\displaystyle F(x) := \mathop{\bf E}_{h \in G} f_1(x-h^2) f_2(x-h^2+h).$

By Cauchy-Schwarz one thus has

$\displaystyle |\mathop{\bf E}_{x \in G} F(x) \overline{F}(x)| \geq \delta^2$

and hence by Proposition 1, we either have ${N \ll \delta^{-O(1)}}$ (in which case we are done) or

$\displaystyle \|F\|_{U^4(G)} \gg \delta^2.$

Writing ${F = \mathop{\bf E}_{h \in G} F_h}$ with ${F_h(x) := f_1(x-h^2) f_2(x-h^2+h)}$, we conclude that either ${\|F\|_{U^3(G)} \gg \delta^{O(1)}}$, or that

$\displaystyle |\mathop{\bf E}_{x,h \in G} \Delta_{h_1} \Delta_{h_2} F_h(x) e(\xi x / N )| \gg \delta^{O(1)}$

for some ${h_1,h_2 \in G}$ and non-zero ${\xi \in G}$. The left-hand side can be rewritten as

$\displaystyle |\mathop{\bf E}_{x,h \in G} g_1(x-h^2) g_2(x-h^2+h) e(\xi x/N)|$

where ${g_1 = \Delta_{h_1} \Delta_{h_2} f_1}$ and ${g_2 = \Delta_{h_1} \Delta_{h_2} f_2}$. We can rewrite this in turn as

$\displaystyle |\mathop{\bf E}_{x,y \in G} g_1(x) g_2(y) e(\xi (x + (y-x)^2) / N)|$

which is bounded by

$\displaystyle \| e(\xi({\bf x} + ({\bf y}-{\bf x})^2)/N) \|_{\mathrm{CUT}({\bf x}, {\bf y})}$

where ${{\bf x}, {\bf y}}$ are independent random variables drawn uniformly from ${G}$. Applying Lemma 5(v), we conclude that

$\displaystyle \| \Box_{{\bf y}, {\bf y}'} e(\xi({\bf x} + ({\bf y}-{\bf x})^2)/N) \|_{\mathrm{CUT}({\bf x}; {\bf y}, {\bf y}')} \gg \delta^{O(1)}.$

However, a routine Gauss sum calculation reveals that the left-hand side is ${O(N^{-c})}$ for some absolute constant ${c>0}$ because ${\xi}$ is non-zero, so that ${N \ll \delta^{-O(1)}}$. The only remaining case to consider is when

$\displaystyle \|F\|_{U^3(G)} \gg \delta^{O(1)}.$

Repeating the above arguments we then conclude that

$\displaystyle \|F\|_{U^2(G)} \gg \delta^{O(1)},$

and then

$\displaystyle \|F\|_{U^1(G)} \gg \delta^{O(1)}.$

The left-hand side can be computed to equal ${|\mathop{\bf E}_{x \in G} f_1(x)| |\mathop{\bf E}_{x \in G} f_2(x)|}$, and the claim follows. $\Box$

This argument was given for the cyclic group setting, but the argument can also be applied to the integers (see Peluse-Prendiville) and can also be used to establish an analogue over the reals (that was first obtained by Bourgain).

I just heard the news that Louis Nirenberg died a few days ago, aged 94.  Nirenberg made a vast number of contributions to analysis and PDE (and his work has come up repeatedly on my own blog); I wrote about his beautiful moving planes argument with Gidas and Ni to establish symmetry of ground states in this post on the occasion of him receiving the Chern medal, and on how his extremely useful interpolation inequality with Gagliardo (generalising a previous inequality of Ladyzhenskaya) can be viewed as an amplification of the usual Sobolev inequality in this post.  Another fundamentally useful inequality of Nirenberg is the John-Nirenberg inequality established with Fritz John: if a (locally integrable) function $f: {\bf R} \to {\bf R}$ (which for simplicity of exposition we place in one dimension) obeys the bounded mean oscillation property

$\displaystyle \frac{1}{|I|} \int_I |f(x)-f_I|\ dx \leq A \quad (1)$

for all intervals $I$, where $f_I := \frac{1}{|I|} \int_I f$ is the average value of $f$ on $I$, then one has exponentially good large deviation estimates

$\displaystyle \frac{1}{|I|} |\{ x \in I: |f(x)-f_I| \geq \lambda A \}| \leq \exp( - c \lambda ) \quad (2)$

for all $\lambda>0$ and some absolute constant $c$.  This can be compared with Markov’s inequality, which only gives the far weaker decay

$\displaystyle \frac{1}{|I|} |\{ x \in I: |f(x)-f_I| \geq \lambda A \}| \leq \frac{1}{\lambda}. \quad (3)$

The point is that (1) is assumed to hold not just for a given interval $I$, but also all subintervals of $I$, and this is a much more powerful hypothesis, allowing one for instance to use the standard Calderon-Zygmund technique of stopping time arguments to “amplify” (3) to (2).  Basically, for any given interval $I$, one can use (1) and repeated halving of the interval $I$ until significant deviation from the mean is encountered to locate some disjoint exceptional subintervals $J$ where $f_J$ deviates from $f_I$ by $O(A)$, with the total measure of the $J$ being a small fraction of that of $I$ (thanks to a variant of (3)), and with $f$ staying within $O(A)$ of $f_I$ at almost every point of $I$ outside of these exceptional intervals.  One can then establish (2) by an induction on $\lambda$.  (There are other proofs of this inequality also, e.g., one can use Bellman functions, as discussed in this old set of notes of mine.)   Informally, the John-Nirenberg inequality asserts that functions of bounded mean oscillation are “almost as good” as bounded functions, in that they almost always stay within a bounded distance from their mean, and in fact the space BMO of functions of bounded mean oscillation ends up being superior to the space $L^\infty$ of bounded measurable functions for many harmonic analysis purposes (among other things, the space is more stable with respect to singular integral operators).

I met Louis a few times in my career; even in his later years when he was wheelchair-bound, he would often come to conferences and talks, and ask very insightful questions at the end of the lecture (even when it looked like he was asleep during much of the actual talk!).  I have a vague memory of him asking me some questions in one of the early talks I gave as a postdoc; I unfortunately do not remember exactly what the topic was (some sort of PDE, I think), but I was struck by how kindly the questions were posed, and how patiently he would listen to my excited chattering about my own work.

Define the Collatz map ${\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1}$ on the natural numbers ${{\bf N}+1 = \{1,2,\dots\}}$ by setting ${\mathrm{Col}(N)}$ to equal ${3N+1}$ when ${N}$ is odd and ${N/2}$ when ${N}$ is even, and let ${\mathrm{Col}^{\bf N}(N) := \{ N, \mathrm{Col}(N), \mathrm{Col}^2(N), \dots \}}$ denote the forward Collatz orbit of ${N}$. The notorious Collatz conjecture asserts that ${1 \in \mathrm{Col}^{\bf N}(N)}$ for all ${N \in {\bf N}+1}$. Equivalently, if we define the backwards Collatz orbit ${(\mathrm{Col}^{\bf N})^*(N) := \{ M \in {\bf N}+1: N \in \mathrm{Col}^{\bf N}(M) \}}$ to be all the natural numbers ${M}$ that encounter ${N}$ in their forward Collatz orbit, then the Collatz conjecture asserts that ${(\mathrm{Col}^{\bf N})^*(1) = {\bf N}+1}$. As a partial result towards this latter statement, Krasikov and Lagarias in 2003 established the bound

$\displaystyle \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(1) \} \gg x^\gamma \ \ \ \ \ (1)$

for all ${x \geq 1}$ and ${\gamma = 0.84}$. (This improved upon previous values of ${\gamma = 0.81}$ obtained by Applegate and Lagarias in 1995, ${\gamma = 0.65}$ by Applegate and Lagarias in 1995 by a different method, ${\gamma=0.48}$ by Wirsching in 1993, ${\gamma=0.43}$ by Krasikov in 1989, ${\gamma=0.3}$ by Sander in 1990, and some ${\gamma>0}$ by Crandall in 1978.) This is still the largest value of ${\gamma}$ for which (1) has been established. Of course, the Collatz conjecture would imply that we can take ${\gamma}$ equal to ${1}$, which is the assertion that a positive density set of natural numbers obeys the Collatz conjecture. This is not yet established, although the results in my previous paper do at least imply that a positive density set of natural numbers iterates to an (explicitly computable) bounded set, so in principle the ${\gamma=1}$ case of (1) could now be verified by an (enormous) finite computation in which one verifies that every number in this explicit bounded set iterates to ${1}$. In this post I would like to record a possible alternate route to this problem that depends on the distribution of a certain family of random variables that appeared in my previous paper, that I called Syracuse random variables.

Definition 1 (Syracuse random variables) For any natural number ${n}$, a Syracuse random variable ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$ on the cyclic group ${{\bf Z}/3^n{\bf Z}}$ is defined as a random variable of the form

$\displaystyle \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = \sum_{m=1}^n 3^{n-m} 2^{-{\mathbf a}_m-\dots-{\mathbf a}_n} \ \ \ \ \ (2)$

where ${\mathbf{a}_1,\dots,\mathbf{a_n}}$ are independent copies of a geometric random variable ${\mathbf{Geom}(2)}$ on the natural numbers with mean ${2}$, thus

$\displaystyle \mathop{\bf P}( \mathbf{a}_1=a_1,\dots,\mathbf{a}_n=a_n) = 2^{-a_1-\dots-a_n}$

} for ${a_1,\dots,a_n \in {\bf N}+1}$. In (2) the arithmetic is performed in the ring ${{\bf Z}/3^n{\bf Z}}$.

Thus for instance

$\displaystyle \mathrm{Syrac}({\bf Z}/3{\bf Z}) = 2^{-\mathbf{a}_1} \hbox{ mod } 3$

$\displaystyle \mathrm{Syrac}({\bf Z}/3^2{\bf Z}) = 2^{-\mathbf{a}_1-\mathbf{a}_2} + 3 \times 2^{-\mathbf{a}_2} \hbox{ mod } 3^2$

$\displaystyle \mathrm{Syrac}({\bf Z}/3^3{\bf Z}) = 2^{-\mathbf{a}_1-\mathbf{a}_2-\mathbf{a}_3} + 3 \times 2^{-\mathbf{a}_2-\mathbf{a}_3} + 3^2 \times 2^{-\mathbf{a}_3} \hbox{ mod } 3^3$

and so forth. After reversing the labeling of the ${\mathbf{a}_1,\dots,\mathbf{a}_n}$, one could also view ${\mathrm{Syrac}({\bf Z}/3^n{\bf Z})}$ as the mod ${3^n}$ reduction of a ${3}$-adic random variable

$\displaystyle \mathbf{Syrac}({\bf Z}_3) = \sum_{m=1}^\infty 3^{m-1} 2^{-{\mathbf a}_1-\dots-{\mathbf a}_m}.$

The probability density function ${b \mapsto \mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b )}$ of the Syracuse random variable can be explicitly computed by a recursive formula (see Lemma 1.12 of my previous paper). For instance, when ${n=1}$, ${\mathbf{P}( \mathbf{Syrac}({\bf Z}/3{\bf Z}) = b )}$ is equal to ${0,1/3,2/3}$ for ${x=b,1,2 \hbox{ mod } 3}$ respectively, while when ${n=2}$, ${\mathbf{P}( \mathbf{Syrac}({\bf Z}/3^2{\bf Z}) = b )}$ is equal to

$\displaystyle 0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}$

when ${b=0,\dots,8 \hbox{ mod } 9}$ respectively.

The relationship of these random variables to the Collatz problem can be explained as follows. Let ${2{\bf N}+1 = \{1,3,5,\dots\}}$ denote the odd natural numbers, and define the Syracuse map ${\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1}$ by

$\displaystyle \mathrm{Syr}(N) := \frac{3n+1}{2^{\nu_2(3N+1)}}$

where the ${2}$valuation ${\nu_2(3n+1) \in {\bf N}}$ is the number of times ${2}$ divides ${3N+1}$. We can define the forward orbit ${\mathrm{Syr}^{\bf N}(n)}$ and backward orbit ${(\mathrm{Syr}^{\bf N})^*(N)}$ of the Syracuse map as before. It is not difficult to then see that the Collatz conjecture is equivalent to the assertion ${(\mathrm{Syr}^{\bf N})^*(1) = 2{\bf N}+1}$, and that the assertion (1) for a given ${\gamma}$ is equivalent to the assertion

$\displaystyle \# \{ N \leq x: N \in (\mathrm{Syr}^{\bf N})^*(1) \} \gg x^\gamma \ \ \ \ \ (3)$

for all ${x \geq 1}$, where ${N}$ is now understood to range over odd natural numbers. A brief calculation then shows that for any odd natural number ${N}$ and natural number ${n}$, one has

$\displaystyle \mathrm{Syr}^n(N) = 3^n 2^{-a_1-\dots-a_n} N + \sum_{m=1}^n 3^{n-m} 2^{-a_m-\dots-a_n}$

where the natural numbers ${a_1,\dots,a_n}$ are defined by the formula

$\displaystyle a_i := \nu_2( 3 \mathrm{Syr}^{i-1}(N) + 1 ),$

so in particular

$\displaystyle \mathrm{Syr}^n(N) = \sum_{m=1}^n 3^{n-m} 2^{-a_m-\dots-a_n} \hbox{ mod } 3^n.$

Heuristically, one expects the ${2}$-valuation ${a = \nu_2(N)}$ of a typical odd number ${N}$ to be approximately distributed according to the geometric distribution ${\mathbf{Geom}(2)}$, so one therefore expects the residue class ${\mathrm{Syr}^n(N) \hbox{ mod } 3^n}$ to be distributed approximately according to the random variable ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$.

The Syracuse random variables ${\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}$ will always avoid multiples of three (this reflects the fact that ${\mathrm{Syr}(N)}$ is never a multiple of three), but attains any non-multiple of three in ${{\bf Z}/3^n{\bf Z}}$ with positive probability. For any natural number ${n}$, set

$\displaystyle c_n := \inf_{b \in {\bf Z}/3^n{\bf Z}: 3 \nmid b} \mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b ).$

Equivalently, ${c_n}$ is the greatest quantity for which we have the inequality

$\displaystyle \sum_{(a_1,\dots,a_n) \in S_{n,N}} 2^{-a_1-\dots-a_m} \geq c_n \ \ \ \ \ (4)$

for all integers ${N}$ not divisible by three, where ${S_{n,N} \subset ({\bf N}+1)^n}$ is the set of all tuples ${(a_1,\dots,a_n)}$ for which

$\displaystyle N = \sum_{m=1}^n 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n.$

Thus for instance ${c_0=1}$, ${c_1 = 1/3}$, and ${c_2 = 2/63}$. On the other hand, since all the probabilities ${\mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b)}$ sum to ${1}$ as ${b \in {\bf Z}/3^n{\bf Z}}$ ranges over the non-multiples of ${3}$, we have the trivial upper bound

$\displaystyle c_n \leq \frac{3}{2} 3^{-n}.$

There is also an easy submultiplicativity result:

Lemma 2 For any natural numbers ${n_1,n_2}$, we have

$\displaystyle c_{n_1+n_2-1} \geq c_{n_1} c_{n_2}.$

Proof: Let ${N}$ be an integer not divisible by ${3}$, then by (4) we have

$\displaystyle \sum_{(a_1,\dots,a_{n_1}) \in S_{n_1,N}} 2^{-a_1-\dots-a_{n_1}} \geq c_{n_1}.$

If we let ${S'_{n_1,N}}$ denote the set of tuples ${(a_1,\dots,a_{n_1-1})}$ that can be formed from the tuples in ${S_{n_1,N}}$ by deleting the final component ${a_{n_1}}$ from each tuple, then we have

$\displaystyle \sum_{(a_1,\dots,a_{n_1-1}) \in S'_{n_1,N}} 2^{-a_1-\dots-a_{n_1-1}} \geq c_{n_1}. \ \ \ \ \ (5)$

Next, observe that if ${(a_1,\dots,a_{n_1-1}) \in S'_{n_1,N}}$, then

$\displaystyle N = \sum_{m=1}^{n_1-1} 3^{m-1} 2^{-a_1-\dots-a_m} + 3^{n_1-1} 2^{-a_1-\dots-a_{n_1-1}} M$

with ${M = M_{N,n_1,a_1,\dots,a_{n_1-1}}}$ an integer not divisible by three. By definition of ${S_{n_2,M}}$ and a relabeling, we then have

$\displaystyle M = \sum_{m=1}^{n_2} 3^{m-1} 2^{-a_{n_1}-\dots-a_{m+n_1-1}} \hbox{ mod } 3^{n_2}$

for all ${(a_{n_1},\dots,a_{n_1+n_2-1}) \in S_{n_2,M}}$. For such tuples we then have

$\displaystyle N = \sum_{m=1}^{n_1+n_2-1} 3^{m-1} 2^{-a_1-\dots-a_{n_1+n_2-1}} \hbox{ mod } 3^{n_1+n_2-1}$

so that ${(a_1,\dots,a_{n_1+n_2-1}) \in S_{n_1+n_2-1,N}}$. Since

$\displaystyle \sum_{(a_{n_1},\dots,a_{n_1+n_2-1}) \in S_{n_2,M}} 2^{-a_{n_1}-\dots-a_{n_1+n_2-1}} \geq c_{n_2}$

for each ${M}$, the claim follows. $\Box$

From this lemma we see that ${c_n = 3^{-\beta n + o(n)}}$ for some absolute constant ${\beta \geq 1}$. Heuristically, we expect the Syracuse random variables to be somewhat approximately equidistributed amongst the multiples of ${{\bf Z}/3^n{\bf Z}}$ (in Proposition 1.4 of my previous paper I prove a fine scale mixing result that supports this heuristic). As a consequence it is natural to conjecture that ${\beta=1}$. I cannot prove this, but I can show that this conjecture would imply that we can take the exponent ${\gamma}$ in (1), (3) arbitrarily close to one:

Proposition 3 Suppose that ${\beta=1}$ (that is to say, ${c_n = 3^{-n+o(n)}}$ as ${n \rightarrow \infty}$). Then

$\displaystyle \# \{ N \leq x: N \in (\mathrm{Syr}^{\bf N})^*(1) \} \gg x^{1-o(1)}$

as ${x \rightarrow \infty}$, or equivalently

$\displaystyle \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(1) \} \gg x^{1-o(1)}$

as ${x \rightarrow \infty}$. In other words, (1), (3) hold for all ${\gamma < 1}$.

I prove this proposition below the fold. A variant of the argument shows that for any value of ${\beta}$, (1), (3) holds whenever ${\gamma < f(\beta)}$, where ${f: [0,1] \rightarrow [0,1]}$ is an explicitly computable function with ${f(\beta) \rightarrow 1}$ as ${\beta \rightarrow 1}$. In principle, one could then improve the Krasikov-Lagarias result ${\gamma = 0.84}$ by getting a sufficiently good upper bound on ${\beta}$, which is in principle achievable numerically (note for instance that Lemma 2 implies the bound ${c_n \leq 3^{-\beta(n-1)}}$ for any ${n}$, since ${c_{kn-k+1} \geq c_n^k}$ for any ${k}$).

Just a brief post to record some notable papers in my fields of interest that appeared on the arXiv recently.

• A sharp square function estimate for the cone in ${\bf R}^3$“, by Larry Guth, Hong Wang, and Ruixiang Zhang.  This paper establishes an optimal (up to epsilon losses) square function estimate for the three-dimensional light cone that was essentially conjectured by Mockenhaupt, Seeger, and Sogge, which has a number of other consequences including Sogge’s local smoothing conjecture for the wave equation in two spatial dimensions, which in turn implies the (already known) Bochner-Riesz, restriction, and Kakeya conjectures in two dimensions.   Interestingly, modern techniques such as polynomial partitioning and decoupling estimates are not used in this argument; instead, the authors mostly rely on an induction on scales argument and Kakeya type estimates.  Many previous authors (including myself) were able to get weaker estimates of this type by an induction on scales method, but there were always significant inefficiencies in doing so; in particular knowing the sharp square function estimate at smaller scales did not imply the sharp square function estimate at the given larger scale.  The authors here get around this issue by finding an even stronger estimate that implies the square function estimate, but behaves significantly better with respect to induction on scales.
• On the Chowla and twin primes conjectures over ${\mathbb F}_q[T]$“, by Will Sawin and Mark Shusterman.  This paper resolves a number of well known open conjectures in analytic number theory, such as the Chowla conjecture and the twin prime conjecture (in the strong form conjectured by Hardy and Littlewood), in the case of function fields where the field is a prime power $q=p^j$ which is fixed (in contrast to a number of existing results in the “large $q$” limit) but has a large exponent $j$.  The techniques here are orthogonal to those used in recent progress towards the Chowla conjecture over the integers (e.g., in this previous paper of mine); the starting point is an algebraic observation that in certain function fields, the Mobius function behaves like a quadratic Dirichlet character along certain arithmetic progressions.  In principle, this reduces problems such as Chowla’s conjecture to problems about estimating sums of Dirichlet characters, for which more is known; but the task is still far from trivial.
• Bounds for sets with no polynomial progressions“, by Sarah Peluse.  This paper can be viewed as part of a larger project to obtain quantitative density Ramsey theorems of Szemeredi type.  For instance, Gowers famously established a relatively good quantitative bound for Szemeredi’s theorem that all dense subsets of integers contain arbitrarily long arithmetic progressions $a, a+r, \dots, a+(k-1)r$.  The corresponding question for polynomial progressions $a+P_1(r), \dots, a+P_k(r)$ is considered more difficult for a number of reasons.  One of them is that dilation invariance is lost; a dilation of an arithmetic progression is again an arithmetic progression, but a dilation of a polynomial progression will in general not be a polynomial progression with the same polynomials $P_1,\dots,P_k$.  Another issue is that the ranges of the two parameters $a,r$ are now at different scales.  Peluse gets around these difficulties in the case when all the polynomials $P_1,\dots,P_k$ have distinct degrees, which is in some sense the opposite case to that considered by Gowers (in particular, she avoids the need to obtain quantitative inverse theorems for high order Gowers norms; which was recently obtained in this integer setting by Manners but with bounds that are probably not strong enough to for the bounds in Peluse’s results, due to a degree lowering argument that is available in this case).  To resolve the first difficulty one has to make all the estimates rather uniform in the coefficients of the polynomials $P_j$, so that one can still run a density increment argument efficiently.  To resolve the second difficulty one needs to find a quantitative concatenation theorem for Gowers uniformity norms.  Many of these ideas were developed in previous papers of Peluse and Peluse-Prendiville in simpler settings.
• On blow up for the energy super critical defocusing non linear Schrödinger equations“, by Frank Merle, Pierre Raphael, Igor Rodnianski, and Jeremie Szeftel.  This paper (when combined with two companion papers) resolves a long-standing problem as to whether finite time blowup occurs for the defocusing supercritical nonlinear Schrödinger equation (at least in certain dimensions and nonlinearities).  I had a previous paper establishing a result like this if one “cheated” by replacing the nonlinear Schrodinger equation by a system of such equations, but remarkably they are able to tackle the original equation itself without any such cheating.  Given the very analogous situation with Navier-Stokes, where again one can create finite time blowup by “cheating” and modifying the equation, it does raise hope that finite time blowup for the incompressible Navier-Stokes and Euler equations can be established…  In fact the connection may not just be at the level of analogy; a surprising key ingredient in the proofs here is the observation that a certain blowup ansatz for the nonlinear Schrodinger equation is governed by solutions to the (compressible) Euler equation, and finite time blowup examples for the latter can be used to construct finite time blowup examples for the former.