
This set of notes discusses aspects of one of the oldest questions in Fourier analysis, namely the nature of convergence of Fourier series.

If ${f: {\bf R}/{\bf Z} \rightarrow {\bf C}}$ is an absolutely integrable function, its Fourier coefficients ${\hat f: {\bf Z} \rightarrow {\bf C}}$ are defined by the formula

$\displaystyle \hat f(n) := \int_{{\bf R}/{\bf Z}} f(x) e^{-2\pi i nx}\ dx.$

If ${f}$ is smooth, then the Fourier coefficients ${\hat f}$ are absolutely summable, and we have the Fourier inversion formula

$\displaystyle f(x) = \sum_{n \in {\bf Z}} \hat f(n) e^{2\pi i nx}$

where the series here is uniformly convergent. In particular, if we define the partial summation operators

$\displaystyle S_N f(x) := \sum_{|n| \leq N} \hat f(n) e^{2\pi i nx}$

then ${S_N f}$ converges uniformly to ${f}$ when ${f}$ is smooth.

What if ${f}$ is not smooth, but merely lies in an ${L^p({\bf R}/{\bf Z})}$ class for some ${1 \leq p \leq \infty}$? The Fourier coefficients ${\hat f}$ remain well-defined, as do the partial summation operators ${S_N}$. The question of convergence in norm is relatively easy to settle:

Exercise 1
• (i) If ${1 < p < \infty}$ and ${f \in L^p({\bf R}/{\bf Z})}$, show that ${S_N f}$ converges in ${L^p({\bf R}/{\bf Z})}$ norm to ${f}$. (Hint: first use the boundedness of the Hilbert transform to show that ${S_N}$ is bounded in ${L^p({\bf R}/{\bf Z})}$ uniformly in ${N}$.)
• (ii) If ${p=1}$ or ${p=\infty}$, show that there exists ${f \in L^p({\bf R}/{\bf Z})}$ such that the sequence ${S_N f}$ is unbounded in ${L^p({\bf R}/{\bf Z})}$ (so in particular it certainly does not converge in ${L^p({\bf R}/{\bf Z})}$ norm to ${f}$). (Hint: first show that ${S_N}$ is not bounded in ${L^p({\bf R}/{\bf Z})}$ uniformly in ${N}$, then apply the uniform boundedness principle in the contrapositive.)
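
For part (ii), a numerical sanity check may help build intuition. One has ${S_N f = f * D_N}$ with the Dirichlet kernel ${D_N(x) = \sum_{|n| \leq N} e^{2\pi i nx} = \frac{\sin((2N+1)\pi x)}{\sin(\pi x)}}$ (a standard fact, not stated above), and the operator norm of ${S_N}$ on ${L^1({\bf R}/{\bf Z})}$ or ${L^\infty({\bf R}/{\bf Z})}$ is the ${L^1}$ norm of ${D_N}$. The following Python sketch approximates this norm by a midpoint rule; the function name and the number of sample points are illustrative choices, not part of the notes.

```python
import math

def dirichlet_l1_norm(N, samples=20000):
    """Midpoint-rule approximation of the L^1(R/Z) norm of the Dirichlet kernel D_N."""
    total = 0.0
    for k in range(samples):
        x = (k + 0.5) / samples  # midpoints avoid the removable singularity at x = 0
        total += abs(math.sin((2 * N + 1) * math.pi * x) / math.sin(math.pi * x))
    return total / samples

# The printed norms should grow slowly (roughly logarithmically) with N.
for N in (0, 1, 8, 64):
    print(N, round(dirichlet_l1_norm(N), 4))
```

The slow growth of these values is what the hint in part (ii) asks one to establish rigorously; the computation is only suggestive and proves nothing by itself.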

The question of pointwise almost everywhere convergence turned out to be a significantly harder problem:

Theorem 2 (Pointwise almost everywhere convergence)
• (i) (Kolmogorov, 1923) There exists ${f \in L^1({\bf R}/{\bf Z})}$ such that ${S_N f(x)}$ is unbounded in ${N}$ for almost every ${x}$.
• (ii) (Carleson, 1966; conjectured by Lusin, 1913) For every ${f \in L^2({\bf R}/{\bf Z})}$, ${S_N f(x)}$ converges to ${f(x)}$ as ${N \rightarrow \infty}$ for almost every ${x}$.
• (iii) (Hunt, 1967) For every ${1 < p \leq \infty}$ and ${f \in L^p({\bf R}/{\bf Z})}$, ${S_N f(x)}$ converges to ${f(x)}$ as ${N \rightarrow \infty}$ for almost every ${x}$.

Note from Hölder’s inequality that ${L^2({\bf R}/{\bf Z})}$ contains ${L^p({\bf R}/{\bf Z})}$ for all ${p\geq 2}$, so Carleson’s theorem covers the ${p \geq 2}$ case of Hunt’s theorem. We remark that the precise threshold near ${L^1}$ between Kolmogorov-type divergence results and Carleson-Hunt pointwise convergence results, in the category of Orlicz spaces, is still an active area of research; see this paper of Lie for further discussion.

Carleson’s theorem in particular was a surprisingly difficult result, lying just out of reach of classical methods (as we shall see later, the result is much easier if we smooth either the function ${f}$ or the summation method ${S_N}$ by a tiny bit). Nowadays we realise that the reason for this is that Carleson’s theorem essentially contains a frequency modulation symmetry in addition to the more familiar translation symmetry and dilation symmetry. This basically rules out the possibility of attacking Carleson’s theorem with tools such as Calderón-Zygmund theory or Littlewood-Paley theory, which respect the latter two symmetries but not the former. Instead, tools from “time-frequency analysis” that essentially respect all three symmetries should be employed. We will illustrate this by giving a relatively short proof of Carleson’s theorem due to Lacey and Thiele. (There are other proofs of Carleson’s theorem, including Carleson’s original proof, its modification by Hunt, and a later time-frequency proof by Fefferman; see Remark 18 below.)

In contrast to previous notes, in this set of notes we shall focus exclusively on Fourier analysis in the one-dimensional setting ${d=1}$ for simplicity of notation, although all of the results here have natural extensions to higher dimensions. Depending on the physical context, one can view the physical domain ${{\bf R}}$ as representing either space or time; we will mostly think in terms of the former interpretation, even though the standard terminology of “time-frequency analysis”, which we will make more prominent use of in later notes, clearly originates from the latter.

In previous notes we have often performed various localisations in either physical space or Fourier space ${{\bf R}}$, for instance in order to take advantage of the uncertainty principle. One can formalise these operations in terms of the functional calculus of two basic operations on Schwartz functions ${{\mathcal S}({\bf R})}$, the position operator ${X: {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})}$ defined by

$\displaystyle (Xf)(x) := x f(x)$

and the momentum operator ${D: {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})}$, defined by

$\displaystyle (Df)(x) := \frac{1}{2\pi i} \frac{d}{dx} f(x). \ \ \ \ \ (1)$

(The terminology comes from quantum mechanics, where it is customary to also insert a small constant ${h}$ on the right-hand side of (1) in accordance with de Broglie’s law. Such a normalisation is also used in several branches of mathematics, most notably semiclassical analysis and microlocal analysis, where it becomes profitable to consider the semiclassical limit ${h \rightarrow 0}$, but we will not emphasise this perspective here.) The momentum operator can be viewed as the counterpart to the position operator, but in frequency space instead of physical space, since we have the standard identity

$\displaystyle \widehat{Df}(\xi) = \xi \hat f(\xi)$

for any ${\xi \in {\bf R}}$ and ${f \in {\mathcal S}({\bf R})}$. We observe that both operators ${X,D}$ are formally self-adjoint in the sense that

$\displaystyle \langle Xf, g \rangle = \langle f, Xg \rangle; \quad \langle Df, g \rangle = \langle f, Dg \rangle$

for all ${f,g \in {\mathcal S}({\bf R})}$, where we use the ${L^2({\bf R})}$ Hermitian inner product

$\displaystyle \langle f, g\rangle := \int_{\bf R} f(x) \overline{g(x)}\ dx.$

Clearly, for any polynomial ${P(x)}$ of one real variable ${x}$ (with complex coefficients), the operator ${P(X): {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})}$ is given by the spatial multiplier operator

$\displaystyle (P(X) f)(x) = P(x) f(x)$

and similarly the operator ${P(D): {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})}$ is given by the Fourier multiplier operator

$\displaystyle \widehat{P(D) f}(\xi) = P(\xi) \hat f(\xi).$

Inspired by this, if ${m: {\bf R} \rightarrow {\bf C}}$ is any smooth function that obeys the derivative bounds

$\displaystyle \frac{d^j}{dx^j} m(x) \lesssim_{m,j} \langle x \rangle^{O_{m,j}(1)} \ \ \ \ \ (2)$

for all ${j \geq 0}$ and ${x \in {\bf R}}$ (that is to say, all derivatives of ${m}$ grow at most polynomially), then we can define the spatial multiplier operator ${m(X): {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})}$ by the formula

$\displaystyle (m(X) f)(x) := m(x) f(x);$

one can easily verify from several applications of the Leibniz rule that ${m(X)}$ maps Schwartz functions to Schwartz functions. We refer to ${m(x)}$ as the symbol of this spatial multiplier operator. In a similar fashion, we define the Fourier multiplier operator ${m(D)}$ associated to the symbol ${m(\xi)}$ by the formula

$\displaystyle \widehat{m(D) f}(\xi) := m(\xi) \hat f(\xi).$

For instance, any constant coefficient linear differential operator ${\sum_{k=0}^n c_k \frac{d^k}{dx^k}}$ can be written in this notation as

$\displaystyle \sum_{k=0}^n c_k \frac{d^k}{dx^k} =\sum_{k=0}^n c_k (2\pi i D)^k;$

however there are many Fourier multiplier operators that are not of this form, such as the fractional derivative operators ${\langle D \rangle^s = (1- \frac{1}{4\pi^2} \frac{d^2}{dx^2})^{s/2}}$ for non-integer values of ${s}$, which are Fourier multiplier operators with symbol ${\langle \xi \rangle^s}$. It is also very common to use spatial cutoffs ${\psi(X)}$ and Fourier cutoffs ${\psi(D)}$ for various bump functions ${\psi}$ to localise functions in either space or frequency; we have seen several examples of such cutoffs in action in previous notes (often in the higher dimensional setting ${d>1}$).

We observe that the maps ${m \mapsto m(X)}$ and ${m \mapsto m(D)}$ are ring homomorphisms, thus for instance

$\displaystyle (m_1 + m_2)(D) = m_1(D) + m_2(D)$

and

$\displaystyle (m_1 m_2)(D) = m_1(D) m_2(D)$

for any ${m_1,m_2}$ obeying the derivative bounds (2); also ${m(D)}$ is formally adjoint to ${\overline{m}(D)}$ in the sense that

$\displaystyle \langle m(D) f, g \rangle = \langle f, \overline{m}(D) g \rangle$

for ${f,g \in {\mathcal S}({\bf R})}$, and similarly for ${m(X)}$ and ${\overline{m}(X)}$. One can interpret these facts as part of the functional calculus of the operators ${X,D}$, which can be interpreted as densely defined self-adjoint operators on ${L^2({\bf R})}$. However, in this set of notes we will not develop the spectral theory necessary in order to fully set out this functional calculus rigorously.
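
A finite-dimensional analogue makes the homomorphism and adjoint identities easy to test. The sketch below replaces ${{\bf R}}$ by the cyclic group ${{\bf Z}/n{\bf Z}}$ and the Fourier transform by the discrete Fourier transform; this discretisation, the helper names, and the test data are assumptions made purely for illustration.

```python
import cmath

def dft(v, sign):
    """Unnormalised discrete Fourier transform on Z/nZ with phase exp(sign*2*pi*i*j*k/n)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def multiplier(m, v):
    """Discrete analogue of m(D): multiply the k-th Fourier coefficient of v by m[k]."""
    n = len(v)
    vhat = dft(v, -1)
    return [c / n for c in dft([m[k] * vhat[k] for k in range(n)], +1)]

def inner(f, g):
    """Hermitian inner product, conjugate-linear in the second slot as in the text."""
    return sum(a * b.conjugate() for a, b in zip(f, g))

n = 16
f = [complex(j % 5, j % 3) for j in range(n)]      # arbitrary test vectors
g = [complex(1.0 / (1 + j), j) for j in range(n)]
m1 = [complex(k, 1) for k in range(n)]             # arbitrary symbols
m2 = [cmath.exp(0.3j * k) for k in range(n)]

# (m1 m2)(D) f versus m1(D) m2(D) f
product_symbol = multiplier([a * b for a, b in zip(m1, m2)], f)
composition = multiplier(m1, multiplier(m2, f))
homomorphism_error = max(abs(a - b) for a, b in zip(product_symbol, composition))

# <m1(D) f, g> versus <f, conj(m1)(D) g>
m1_bar = [a.conjugate() for a in m1]
adjoint_error = abs(inner(multiplier(m1, f), g) - inner(f, multiplier(m1_bar, g)))
print(homomorphism_error, adjoint_error)
```

Both printed errors should be at the level of floating-point rounding. In this discrete model the identities are exact, since ${m(D)}$ is diagonalised by the Fourier transform.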

In the field of PDE and ODE, it is also very common to study variable coefficient linear differential operators

$\displaystyle \sum_{k=0}^n c_k(x) \frac{d^k}{dx^k} \ \ \ \ \ (3)$

where the ${c_0,\dots,c_n}$ are now functions of the spatial variable ${x}$ obeying the derivative bounds (2). A simple example is the quantum harmonic oscillator Hamiltonian ${-\frac{d^2}{dx^2} + x^2}$. One can rewrite this operator in our notation as

$\displaystyle \sum_{k=0}^n c_k(X) (2\pi i D)^k$

and so it is natural to interpret this operator as a combination ${a(X,D)}$ of both the position operator ${X}$ and the momentum operator ${D}$, where the symbol ${a: {\bf R} \times {\bf R} \rightarrow {\bf C}}$ of this operator is the function

$\displaystyle a(x,\xi) := \sum_{k=0}^n c_k(x) (2\pi i \xi)^k. \ \ \ \ \ (4)$

Indeed, from the Fourier inversion formula

$\displaystyle f(x) = \int_{\bf R} \hat f(\xi) e^{2\pi i x \xi}\ d\xi$

for any ${f \in {\mathcal S}({\bf R})}$ we have

$\displaystyle (2\pi i D)^k f(x) = \int_{\bf R} (2\pi i \xi)^k \hat f(\xi) e^{2\pi i x \xi}\ d\xi$

and hence on multiplying by ${c_k(x)}$ and summing we have

$\displaystyle (\sum_{k=0}^n c_k(X) (2\pi i D)^k) f(x) = \int_{\bf R} a(x,\xi) \hat f(\xi) e^{2\pi i x \xi}\ d\xi.$

Inspired by this, we can introduce the Kohn-Nirenberg quantisation by defining the operator ${a(X,D) = a_{KN}(X,D): {\mathcal S}({\bf R}) \rightarrow {\mathcal S}({\bf R})}$ by the formula

$\displaystyle a(X,D) f(x) = \int_{\bf R} a(x,\xi) \hat f(\xi) e^{2\pi i x \xi}\ d\xi \ \ \ \ \ (5)$

whenever ${f \in {\mathcal S}({\bf R})}$ and ${a: {\bf R} \times {\bf R} \rightarrow {\bf C}}$ is any smooth function obeying the derivative bounds

$\displaystyle \frac{\partial^j}{\partial x^j} \frac{\partial^l}{\partial \xi^l} a(x,\xi) \lesssim_{a,j,l} \langle x \rangle^{O_{a,j}(1)} \langle \xi \rangle^{O_{a,j,l}(1)} \ \ \ \ \ (6)$

for all ${j,l \geq 0}$ and ${x,\xi \in {\bf R}}$ (note carefully that the exponent in ${x}$ on the right-hand side is required to be uniform in ${l}$). This quantisation clearly generalises both the spatial multiplier operators ${m(X)}$ and the Fourier multiplier operators ${m(D)}$ defined earlier, which correspond to the cases when the symbol ${a(x,\xi)}$ is a function of ${x}$ only or ${\xi}$ only respectively. Thus we have combined the physical space ${{\bf R} = \{ x: x \in {\bf R}\}}$ and the frequency space ${{\bf R} = \{ \xi: \xi \in {\bf R}\}}$ into a single domain, known as phase space ${{\bf R} \times {\bf R} = \{ (x,\xi): x,\xi \in {\bf R} \}}$. The term “time-frequency analysis” encompasses analysis based on decompositions and other manipulations of phase space, in much the same way that “Fourier analysis” encompasses analysis based on decompositions and other manipulations of frequency space. We remark that the Kohn-Nirenberg quantisation is not the only choice of quantisation one could use; see Remark 19 below.
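
As a concrete instance (a worked example added for illustration), the quantum harmonic oscillator Hamiltonian ${-\frac{d^2}{dx^2} + x^2}$ is of the form (3) with ${n=2}$, ${c_0(x) = x^2}$, ${c_1(x) = 0}$ and ${c_2(x) = -1}$, so that (4) gives the symbol

$\displaystyle a(x,\xi) = x^2 - (2\pi i \xi)^2 = x^2 + 4\pi^2 \xi^2,$

which obeys (6), and (5) recovers the original operator:

$\displaystyle a(X,D) f(x) = x^2 f(x) + \int_{\bf R} 4\pi^2 \xi^2 \hat f(\xi) e^{2\pi i x \xi}\ d\xi = x^2 f(x) - f''(x).$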

Exercise 1

• (i) Show that for ${a}$ obeying (6), the operator ${a(X,D)}$ does indeed map ${{\mathcal S}({\bf R})}$ to ${{\mathcal S}({\bf R})}$.
• (ii) Show that the symbol ${a}$ is uniquely determined by the operator ${a(X,D)}$. That is to say, if ${a,b}$ are two functions obeying (6) with ${a(X,D) f = b(X,D) f}$ for all ${f \in {\mathcal S}({\bf R})}$, then ${a=b}$. (Hint: apply ${a(X,D)-b(X,D)}$ to a suitable truncation of a plane wave ${x \mapsto e^{2\pi i x \xi}}$ and then take limits.)

In principle, the quantisations ${a(X,D)}$ are potentially very useful for such tasks as inverting variable coefficient linear operators, or localising a function simultaneously in physical and Fourier space. However, a fundamental difficulty arises: the map from symbols ${a}$ to operators ${a(X,D)}$ is now no longer a ring homomorphism, in particular

$\displaystyle (a_1 a_2)(X,D) \neq a_1(X,D) a_2(X,D) \ \ \ \ \ (7)$

in general. Fundamentally, this is due to the fact that pointwise multiplication of symbols is a commutative operation, whereas the composition of operators such as ${X}$ and ${D}$ does not necessarily commute. This lack of commutativity can be measured by introducing the commutator

$\displaystyle [A,B] := AB - BA$

of two operators ${A,B}$, and noting from the product rule that

$\displaystyle [X,D] = -\frac{1}{2\pi i} \neq 0.$
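
In more detail (a short verification, spelled out here for convenience): for ${f \in {\mathcal S}({\bf R})}$, the definition (1) and the product rule give

$\displaystyle (XDf)(x) = \frac{1}{2\pi i} x f'(x), \quad (DXf)(x) = \frac{1}{2\pi i} \frac{d}{dx}( x f(x) ) = \frac{1}{2\pi i} f(x) + \frac{1}{2\pi i} x f'(x),$

so that ${[X,D] f = XDf - DXf = -\frac{1}{2\pi i} f}$. This also yields a simple instance of (7): with ${a_1(x,\xi) = \xi}$ and ${a_2(x,\xi) = x}$, the formula (5) gives ${(a_1 a_2)(X,D) = XD}$, whereas ${a_1(X,D) a_2(X,D) = DX = XD + \frac{1}{2\pi i}}$.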

(In the language of Lie groups and Lie algebras, this tells us that ${X,D}$ are (up to complex constants) the standard Lie algebra generators of the Heisenberg group.) From a quantum mechanical perspective, this lack of commutativity is the root cause of the uncertainty principle that prevents one from simultaneously localizing in both position and momentum past a certain point. Here is one basic way of formalising this principle:

Exercise 2 (Heisenberg uncertainty principle) For any ${x_0, \xi_0 \in {\bf R}}$ and ${f \in \mathcal{S}({\bf R})}$, show that

$\displaystyle \| (X-x_0) f \|_{L^2({\bf R})} \| (D-\xi_0) f\|_{L^2({\bf R})} \geq \frac{1}{4\pi} \|f\|_{L^2({\bf R})}^2.$

(Hint: evaluate the expression ${\langle [X-x_0, D - \xi_0] f, f \rangle}$ in two different ways and apply the Cauchy-Schwarz inequality.) Informally, this exercise asserts that the spatial uncertainty ${\Delta x}$ and the frequency uncertainty ${\Delta \xi}$ of a function obey the Heisenberg uncertainty relation ${\Delta x \Delta \xi \gtrsim 1}$.
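
The inequality can be probed numerically. The Python sketch below (function names, grid parameters, and the two test functions are illustrative assumptions) approximates the ratio of the left-hand side to the right-hand side of the inequality on a finite grid, using a central difference for the derivative in ${D}$. For the Gaussian ${e^{-\pi x^2}}$ the ratio should come out very close to ${1}$, in line with the standard fact that Gaussians extremise the inequality, while for ${x e^{-\pi x^2}}$ it should be visibly larger.

```python
import math

def uncertainty_ratio(f, x0=0.0, xi0=0.0, half_width=8.0, n=16000):
    """Approximate ||(X-x0)f|| ||(D-xi0)f|| / ((1/(4 pi)) ||f||^2) on [-half_width, half_width]."""
    h = 2.0 * half_width / n
    norm_f = norm_xf = norm_df = 0.0
    for k in range(1, n):
        x = -half_width + k * h
        fx = f(x)
        # D f = (1/(2 pi i)) f', with f' replaced by a central difference
        dfx = (f(x + h) - f(x - h)) / (2.0 * h) / (2j * math.pi)
        norm_f += abs(fx) ** 2 * h
        norm_xf += abs((x - x0) * fx) ** 2 * h
        norm_df += abs(dfx - xi0 * fx) ** 2 * h
    return math.sqrt(norm_xf * norm_df) / (norm_f / (4.0 * math.pi))

def gaussian(x):
    return math.exp(-math.pi * x * x)

def hermite(x):
    return x * math.exp(-math.pi * x * x)

print(uncertainty_ratio(gaussian), uncertainty_ratio(hermite))
```

A ratio below ${1}$ (beyond discretisation error) would contradict the exercise, so this gives a quick consistency check on the constant ${\frac{1}{4\pi}}$.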

Nevertheless, one still has the correspondence principle, which asserts that in certain regimes (which, with our choice of normalisations, corresponds to the high-frequency regime), quantum mechanics continues to behave like a commutative theory, and one can sometimes proceed as if the operators ${X,D}$ (and the various operators ${a(X,D)}$ constructed from them) commute up to “lower order” errors. This can be formalised using the pseudodifferential calculus, which we give below the fold, in which we restrict the symbol ${a}$ to certain “symbol classes” of various orders (which then restricts ${a(X,D)}$ to be pseudodifferential operators of various orders), and obtain approximate identities such as

$\displaystyle (a_1 a_2)(X,D) \approx a_1(X,D) a_2(X,D)$

where the error between the left and right-hand sides is of “lower order” and in fact enjoys a useful asymptotic expansion. As a first approximation to this calculus, one can think of functions ${f \in {\mathcal S}({\bf R})}$ as having some sort of “phase space portrait” ${\tilde f(x,\xi)}$ which somehow combines the physical space representation ${x \mapsto f(x)}$ with its Fourier representation ${\xi \mapsto \hat f(\xi)}$, and pseudodifferential operators ${a(X,D)}$ behave approximately like “phase space multiplier operators” in this representation in the sense that

$\displaystyle \widetilde{a(X,D) f}(x,\xi) \approx a(x,\xi) \tilde f(x,\xi).$

Unfortunately the uncertainty principle (or the non-commutativity of ${X}$ and ${D}$) prevents us from making these approximations perfectly precise, and it is not always clear how to even define a phase space portrait ${\tilde f}$ of a function ${f}$ precisely (although there are certain popular candidates for such a portrait, such as the FBI transform (also known as the Gabor transform in signal processing literature), or the Wigner quasiprobability distribution, each of which has some advantages and disadvantages). Nevertheless even if the concept of a phase space portrait is somewhat fuzzy, it is of great conceptual benefit both within mathematics and outside of it. For instance, the musical score one assigns a piece of music can be viewed as a phase space portrait of the sound waves generated by that music.

To complement the pseudodifferential calculus we have the basic Calderón-Vaillancourt theorem, which asserts that pseudodifferential operators of order zero are Calderón-Zygmund operators and thus bounded on ${L^p({\bf R})}$ for ${1 < p < \infty}$. The standard proof of this theorem is a classic application of one of the basic techniques in harmonic analysis, namely the exploitation of almost orthogonality; the proof we will give here will achieve this through the elegant device of the Cotlar-Stein lemma.

Pseudodifferential operators (especially when generalised to higher dimensions ${d \geq 1}$) are a fundamental tool in the theory of linear PDE, as well as related fields such as semiclassical analysis, microlocal analysis, and geometric quantisation. There is an even wider class of operators that is also of interest, namely the Fourier integral operators, which roughly speaking not only approximately multiply the phase space portrait ${\tilde f(x,\xi)}$ of a function by some multiplier ${a(x,\xi)}$, but also move the portrait around by a canonical transformation. However, the development of the theory of these operators is beyond the scope of these notes; see for instance the texts of Hörmander or Eskin.

This set of notes is only the briefest introduction to the theory of pseudodifferential operators. Many texts are available that cover the theory in more detail, for instance this text of Taylor.

The square root cancellation heuristic, briefly mentioned in the preceding set of notes, predicts that if a collection ${z_1,\dots,z_n}$ of complex numbers have phases that are sufficiently “independent” of each other, then

$\displaystyle |\sum_{j=1}^n z_j| \approx (\sum_{j=1}^n |z_j|^2)^{1/2};$

similarly, if ${f_1,\dots,f_n}$ are a collection of functions in a Lebesgue space ${L^p(X,\mu)}$ that oscillate “independently” of each other, then we expect

$\displaystyle \| \sum_{j=1}^n f_j \|_{L^p(X,\mu)} \approx \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p(X,\mu)}.$
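
In the simplest randomised model, the first heuristic is an exact identity in mean square: averaging ${|\sum_j \epsilon_j z_j|^2}$ over all sign patterns ${\epsilon_j \in \{-1,+1\}}$ gives exactly ${\sum_j |z_j|^2}$, because the cross terms cancel. The following Python sketch verifies this by enumerating all sign patterns; the function name and the sample values of ${z_j}$ are arbitrary choices made for illustration.

```python
import itertools

def mean_square_of_signed_sums(z):
    """Average |sum_j eps_j z_j|^2 over all 2^n choices of signs eps_j in {-1, +1}."""
    total = 0.0
    count = 0
    for signs in itertools.product((-1, 1), repeat=len(z)):
        total += abs(sum(e * w for e, w in zip(signs, z))) ** 2
        count += 1
    return total / count

z = [complex(1, 2), complex(-3, 0.5), complex(0, 1), complex(2, -2), complex(0.25, 4)]
lhs = mean_square_of_signed_sums(z)
rhs = sum(abs(w) ** 2 for w in z)
print(lhs, rhs)  # the two values should agree up to rounding
```

This only covers the second moment; controlling other moments of the signed sum is the content of Khintchine’s inequality.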

We have already seen one instance in which this heuristic can be made precise, namely when the phases of ${z_j,f_j}$ are randomised by a random sign, so that Khintchine’s inequality (Lemma 4 from Notes 1) can be applied. There are other contexts in which a square function estimate

$\displaystyle \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p(X,\mu)} \lesssim \| \sum_{j=1}^n f_j \|_{L^p(X,\mu)}$

or a reverse square function estimate

$\displaystyle \| \sum_{j=1}^n f_j \|_{L^p(X,\mu)} \lesssim \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p(X,\mu)}$

(or both) are known or conjectured to hold. For instance, the useful Littlewood-Paley inequality implies (among other things) that for any ${1 < p < \infty}$, we have the reverse square function estimate

$\displaystyle \| \sum_{j=1}^n f_j \|_{L^p({\bf R}^d)} \lesssim_{p,d} \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p({\bf R}^d)}, \ \ \ \ \ (1)$

whenever the Fourier transforms ${\hat f_j}$ of the ${f_j}$ are supported on disjoint annuli ${\{ \xi \in {\bf R}^d: 2^{k_j} \leq |\xi| < 2^{k_j+1} \}}$, and we also have the matching square function estimate

$\displaystyle \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p({\bf R}^d)} \lesssim_{p,d} \| \sum_{j=1}^n f_j \|_{L^p({\bf R}^d)}$

if there is some separation between the annuli (for instance if the ${k_j}$ are ${2}$-separated). We recall the proofs of these facts below the fold. In the ${p=2}$ case, we of course have Pythagoras’ theorem, which tells us that if the ${f_j}$ are all orthogonal elements of ${L^2(X,\mu)}$, then

$\displaystyle \| \sum_{j=1}^n f_j \|_{L^2(X,\mu)} = (\sum_{j=1}^n \| f_j \|_{L^2(X,\mu)}^2)^{1/2} = \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^2(X,\mu)}.$

In particular, this identity holds if the ${f_j \in L^2({\bf R}^d)}$ have disjoint Fourier supports in the sense that their Fourier transforms ${\hat f_j}$ are supported on disjoint sets. For ${p=4}$, the technique of bi-orthogonality can also give square function and reverse square function estimates in some cases, as we shall also see below the fold.
In recent years, it has begun to be realised that in the regime ${p > 2}$, a variant of reverse square function estimates such as (1) is also useful, namely decoupling estimates such as

$\displaystyle \| \sum_{j=1}^n f_j \|_{L^p({\bf R}^d)} \lesssim_{p,d} (\sum_{j=1}^n \|f_j\|_{L^p({\bf R}^d)}^2)^{1/2} \ \ \ \ \ (2)$

(actually in practice we often permit small losses such as ${n^\varepsilon}$ on the right-hand side). An estimate such as (2) is weaker than (1) when ${p\geq 2}$ (or equal when ${p=2}$), as can be seen by starting with the triangle inequality

$\displaystyle \| \sum_{j=1}^n |f_j|^2 \|_{L^{p/2}({\bf R}^d)} \leq \sum_{j=1}^n \| |f_j|^2 \|_{L^{p/2}({\bf R}^d)},$

and taking the square root of both sides to conclude that

$\displaystyle \| (\sum_{j=1}^n |f_j|^2)^{1/2} \|_{L^p({\bf R}^d)} \leq (\sum_{j=1}^n \|f_j\|_{L^p({\bf R}^d)}^2)^{1/2}. \ \ \ \ \ (3)$

However, the flip side of this weakness is that (2) can be easier to prove. One key reason for this is the ability to iterate decoupling estimates such as (2), in a way that does not seem to be possible with reverse square function estimates such as (1). For instance, suppose that one has a decoupling inequality such as (2), and furthermore each ${f_j}$ can be split further into components ${f_j= \sum_{k=1}^m f_{j,k}}$ for which one has the decoupling inequalities

$\displaystyle \| \sum_{k=1}^m f_{j,k} \|_{L^p({\bf R}^d)} \lesssim_{p,d} (\sum_{k=1}^m \|f_{j,k}\|_{L^p({\bf R}^d)}^2)^{1/2}.$

Then by inserting these bounds back into (2) we see that we have the combined decoupling inequality

$\displaystyle \| \sum_{j=1}^n\sum_{k=1}^m f_{j,k} \|_{L^p({\bf R}^d)} \lesssim_{p,d} (\sum_{j=1}^n \sum_{k=1}^m \|f_{j,k}\|_{L^p({\bf R}^d)}^2)^{1/2}.$
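
To spell out the insertion step, write ${C_1}$ and ${C_2}$ for the implied constants in (2) and in the inner decoupling inequalities respectively (a notation introduced only for this remark). Applying (2) and then the inner inequality to each ${f_j}$ gives

$\displaystyle \| \sum_{j=1}^n \sum_{k=1}^m f_{j,k} \|_{L^p({\bf R}^d)} \leq C_1 (\sum_{j=1}^n \| f_j \|_{L^p({\bf R}^d)}^2)^{1/2} \leq C_1 (\sum_{j=1}^n C_2^2 \sum_{k=1}^m \|f_{j,k}\|_{L^p({\bf R}^d)}^2)^{1/2} = C_1 C_2 (\sum_{j=1}^n \sum_{k=1}^m \|f_{j,k}\|_{L^p({\bf R}^d)}^2)^{1/2},$

so the constants simply multiply under iteration.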

This iterative feature of decoupling inequalities means that such inequalities work well with the method of induction on scales, which we introduced in the previous set of notes.
In fact, decoupling estimates share many features in common with restriction theorems; in addition to induction on scales, there are several other techniques that first emerged in the restriction theory literature, such as wave packet decompositions, rescaling, and bilinear or multilinear reductions, that turned out to also be well suited to proving decoupling estimates. As with restriction, the curvature or transversality of the different Fourier supports of the ${f_j}$ will be crucial in obtaining non-trivial estimates.
Strikingly, in many important model cases, the optimal decoupling inequalities (except possibly for epsilon losses in the exponents) are now known. These estimates have in turn had a number of important applications, such as establishing certain discrete analogues of the restriction conjecture, or the first proof of the main conjecture for Vinogradov mean value theorems in analytic number theory.
These notes only serve as a brief introduction to decoupling. A systematic exploration of this topic can be found in this recent text of Demeter.

This set of notes focuses on the restriction problem in Fourier analysis. Introduced by Elias Stein in the 1970s, the restriction problem is a key model problem for understanding more general oscillatory integral operators, and has turned out to be connected to many questions in geometric measure theory, harmonic analysis, combinatorics, number theory, and PDE. Only partial results on the problem are known, but these partial results have already proven to be very useful or influential in many applications.
We work in a Euclidean space ${{\bf R}^d}$. Recall that ${L^p({\bf R}^d)}$ is the space of ${p^{th}}$-power integrable functions ${f: {\bf R}^d \rightarrow {\bf C}}$, quotiented out by almost everywhere equivalence, with the usual modifications when ${p=\infty}$. If ${f \in L^1({\bf R}^d)}$ then the Fourier transform ${\hat f: {\bf R}^d \rightarrow {\bf C}}$ will be defined in this course by the formula

$\displaystyle \hat f(\xi) := \int_{{\bf R}^d} f(x) e^{-2\pi i x \cdot \xi}\ dx. \ \ \ \ \ (1)$

From the dominated convergence theorem we see that ${\hat f}$ is a continuous function; from the Riemann-Lebesgue lemma we see that it goes to zero at infinity. Thus ${\hat f}$ lies in the space ${C_0({\bf R}^d)}$ of continuous functions that go to zero at infinity, which is a subspace of ${L^\infty({\bf R}^d)}$. Indeed, from the triangle inequality it is obvious that

$\displaystyle \|\hat f\|_{L^\infty({\bf R}^d)} \leq \|f\|_{L^1({\bf R}^d)}. \ \ \ \ \ (2)$

If ${f \in L^1({\bf R}^d) \cap L^2({\bf R}^d)}$, then Plancherel’s theorem tells us that we have the identity

$\displaystyle \|\hat f\|_{L^2({\bf R}^d)} = \|f\|_{L^2({\bf R}^d)}. \ \ \ \ \ (3)$

Because of this, there is a unique way to extend the Fourier transform ${f \mapsto \hat f}$ from ${L^1({\bf R}^d) \cap L^2({\bf R}^d)}$ to ${L^2({\bf R}^d)}$, in such a way that it becomes a unitary map from ${L^2({\bf R}^d)}$ to itself. By abuse of notation we continue to denote this extension of the Fourier transform by ${f \mapsto \hat f}$. Strictly speaking, this extension is no longer defined in a pointwise sense by the formula (1) (indeed, the integral on the RHS ceases to be absolutely integrable once ${f}$ leaves ${L^1({\bf R}^d)}$; we will return to the (surprisingly difficult) question of whether pointwise convergence continues to hold (at least in an almost everywhere sense) later in this course, when we discuss Carleson’s theorem). On the other hand, the formula (1) remains valid in the sense of distributions, and in practice most of the identities and inequalities one can show about the Fourier transform of “nice” functions (e.g., functions in ${L^1({\bf R}^d) \cap L^2({\bf R}^d)}$, or in the Schwartz class ${{\mathcal S}({\bf R}^d)}$, or test function class ${C^\infty_c({\bf R}^d)}$) can be extended to functions in “rough” function spaces such as ${L^2({\bf R}^d)}$ by standard limiting arguments.
By (2), (3), and the Riesz-Thorin interpolation theorem, we also obtain the Hausdorff-Young inequality

$\displaystyle \|\hat f\|_{L^{p'}({\bf R}^d)} \leq \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (4)$

for all ${1 \leq p \leq 2}$ and ${f \in L^1({\bf R}^d) \cap L^2({\bf R}^d)}$, where ${2 \leq p' \leq \infty}$ is the dual exponent to ${p}$, defined by the usual formula ${\frac{1}{p} + \frac{1}{p'} = 1}$. (One can improve this inequality by a constant factor, with the optimal constant worked out by Beckner, but the focus in these notes will not be on optimal constants.) As a consequence, the Fourier transform can also be uniquely extended as a continuous linear map from ${L^p({\bf R}^d)}$ to ${L^{p'}({\bf R}^d)}$ for ${1 \leq p \leq 2}$. (The situation with ${p>2}$ is much worse; see below the fold.)
The restriction problem asks, for a given exponent ${1 \leq p \leq 2}$ and a subset ${S}$ of ${{\bf R}^d}$, whether it is possible to meaningfully restrict the Fourier transform ${\hat f}$ of a function ${f \in L^p({\bf R}^d)}$ to the set ${S}$. If the set ${S}$ has positive Lebesgue measure, then the answer is yes, since ${\hat f}$ lies in ${L^{p'}({\bf R}^d)}$ and therefore has a meaningful restriction to ${S}$ even though functions in ${L^{p'}}$ are only defined up to sets of measure zero. But what if ${S}$ has measure zero? If ${p=1}$, then ${\hat f \in C_0({\bf R}^d)}$ is continuous and therefore can be meaningfully restricted to any set ${S}$. At the other extreme, if ${p=2}$ and ${f}$ is an arbitrary function in ${L^2({\bf R}^d)}$, then by Plancherel’s theorem, ${\hat f}$ is also an arbitrary function in ${L^2({\bf R}^d)}$, and thus has no well-defined restriction to any set ${S}$ of measure zero.
It was observed by Stein (as reported in the Ph.D. thesis of Charlie Fefferman) that for certain measure zero subsets ${S}$ of ${{\bf R}^d}$, such as the sphere ${S^{d-1} := \{ \xi \in {\bf R}^d: |\xi| = 1\}}$, one can obtain meaningful restrictions of the Fourier transforms of functions ${f \in L^p({\bf R}^d)}$ for certain ${p}$ between ${1}$ and ${2}$, thus demonstrating that the Fourier transform of such functions retains more structure than a typical element of ${L^{p'}({\bf R}^d)}$:

Theorem 1 (Preliminary ${L^2}$ restriction theorem) If ${d \geq 2}$ and ${1 \leq p < \frac{4d}{3d+1}}$, then one has the estimate

$\displaystyle \| \hat f \|_{L^2(S^{d-1}, d\sigma)} \lesssim_{d,p} \|f\|_{L^p({\bf R}^d)}$

for all Schwartz functions ${f \in {\mathcal S}({\bf R}^d)}$, where ${d\sigma}$ denotes surface measure on the sphere ${S^{d-1}}$. In particular, the restriction ${\hat f|_S}$ can be meaningfully defined by continuous linear extension to an element of ${L^2(S^{d-1},d\sigma)}$.

Proof: Fix ${d,p,f}$. We expand out

$\displaystyle \| \hat f \|_{L^2(S^{d-1}, d\sigma)}^2 = \int_{S^{d-1}} |\hat f(\xi)|^2\ d\sigma(\xi).$

From (1) and Fubini’s theorem, the right-hand side may be expanded as

$\displaystyle \int_{{\bf R}^d} \int_{{\bf R}^d} f(x) \overline{f}(y) (d\sigma)^\vee(y-x)\ dx dy$

where the inverse Fourier transform ${(d\sigma)^\vee}$ of the measure ${d\sigma}$ is defined by the formula

$\displaystyle (d\sigma)^\vee(x) := \int_{S^{d-1}} e^{2\pi i x \cdot \xi}\ d\sigma(\xi).$

In other words, we have the identity

$\displaystyle \| \hat f \|_{L^2(S^{d-1}, d\sigma)}^2 = \langle f, f * (d\sigma)^\vee \rangle_{L^2({\bf R}^d)}, \ \ \ \ \ (5)$

using the Hermitian inner product ${\langle f, g\rangle_{L^2({\bf R}^d)} := \int_{{\bf R}^d} \overline{f(x)} g(x)\ dx}$. Since the sphere ${S^{d-1}}$ has bounded measure, we have from the triangle inequality that

$\displaystyle (d\sigma)^\vee(x) \lesssim_d 1. \ \ \ \ \ (6)$

Also, from the method of stationary phase (as covered in the previous class 247A), or Bessel function asymptotics, we have the decay

$\displaystyle (d\sigma)^\vee(x) \lesssim_d |x|^{-(d-1)/2} \ \ \ \ \ (7)$

for any ${x \in {\bf R}^d}$ (note that the bound already follows from (6) unless ${|x| \geq 1}$). We remark that the exponent ${-\frac{d-1}{2}}$ here can be seen geometrically from the following considerations. For ${|x|>1}$, the phase ${e^{2\pi i x \cdot \xi}}$ on the sphere is stationary at the two antipodal points ${x/|x|, -x/|x|}$ of the sphere, and constant on the tangent hyperplanes to the sphere at these points. The wavelength of this phase is proportional to ${1/|x|}$, so the phase would be approximately stationary on a cap formed by intersecting the sphere with a ${\sim 1/|x|}$ neighbourhood of the tangent hyperplane to one of the stationary points. As the sphere is tangent to second order at these points, this cap will have diameter ${\sim 1/|x|^{1/2}}$ in the directions of the ${d-1}$-dimensional tangent space, so the cap will have surface measure ${\sim |x|^{-(d-1)/2}}$, which leads to the prediction (7). We combine (6), (7) into the unified estimate

$\displaystyle (d\sigma)^\vee(x) \lesssim_d \langle x\rangle^{-(d-1)/2}, \ \ \ \ \ (8)$

where the “Japanese bracket” ${\langle x\rangle}$ is defined as ${\langle x \rangle := (1+|x|^2)^{1/2}}$. Since ${\langle x \rangle^{-\alpha}}$ lies in ${L^p({\bf R}^d)}$ precisely when ${p > \frac{d}{\alpha}}$, we conclude that

$\displaystyle (d\sigma)^\vee \in L^q({\bf R}^d) \hbox{ iff } q > \frac{d}{(d-1)/2}.$
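As a sanity check on the decay rate in (7) and (8), one can look at the special case ${d=3}$, where the exponent ${(d-1)/2}$ equals ${1}$. The sketch below is an illustration rather than part of the argument: the choice ${d=3}$, the function names, and the midpoint quadrature are all assumptions of ours. It uses the standard fact that slicing the unit sphere ${S^2}$ by planes orthogonal to ${x}$ reduces the surface integral to a one-dimensional one, which has the closed form ${2 \sin(2\pi |x|)/|x|}$.

```python
import math

def sphere_inverse_ft_d3(r, steps=20000):
    """Approximate (d sigma)^vee at |x| = r for the unit sphere S^2 in R^3.

    Slicing the sphere by planes orthogonal to x reduces the surface integral
    to 2*pi * int_{-1}^{1} exp(2*pi*i*r*u) du; the sine part cancels by
    symmetry, so only the cosine is integrated (midpoint rule).
    """
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        u = -1.0 + (k + 0.5) * h
        total += math.cos(2.0 * math.pi * r * u)
    return 2.0 * math.pi * total * h

def closed_form_d3(r):
    """Closed form 2 sin(2 pi r) / r of the same integral, for r > 0."""
    return 2.0 * math.sin(2.0 * math.pi * r) / r

for r in (0.3, 1.25, 5.25, 20.25):
    value = sphere_inverse_ft_d3(r)
    # With d = 3 the predicted decay is |x|^{-1}, so r * |value| stays bounded.
    print(r, value, closed_form_d3(r), r * abs(value))
```

The last printed column stays bounded by ${2}$, consistent with the ${|x|^{-(d-1)/2}}$ decay when ${d=3}$.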

Applying Young’s convolution inequality, we conclude (after some arithmetic) that

$\displaystyle \| f * (d\sigma)^\vee \|_{L^{p'}({\bf R}^d)} \lesssim_{p,d} \|f\|_{L^p({\bf R}^d)}$

whenever ${1 \leq p < \frac{4d}{3d+1}}$, and the claim now follows from (5) and Hölder’s inequality. $\Box$
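The "arithmetic" in the last step can be made explicit. Young's inequality maps ${L^p * L^q}$ to ${L^{p'}}$ when ${\frac{1}{p} + \frac{1}{q} = 1 + \frac{1}{p'}}$, which rearranges to ${\frac{1}{q} = 2 - \frac{2}{p}}$, and one needs ${q > \frac{2d}{d-1}}$. The following sketch (with hypothetical helper names of our own choosing, and exact rational arithmetic) checks that the endpoint ${p = \frac{4d}{3d+1}}$ corresponds exactly to the endpoint ${q = \frac{2d}{d-1}}$, and that smaller ${p > 1}$ gives larger ${q}$.

```python
from fractions import Fraction

def young_kernel_exponent(p):
    """Exponent q with 1/p + 1/q = 1 + 1/p', i.e. 1/q = 2 - 2/p (requires p > 1)."""
    p = Fraction(p)
    return 1 / (2 - 2 / p)

def critical_p(d):
    """Endpoint exponent 4d/(3d+1) appearing in the theorem."""
    return Fraction(4 * d, 3 * d + 1)

def kernel_threshold(d):
    """Integrability threshold d / ((d-1)/2) = 2d/(d-1) for the kernel."""
    return Fraction(2 * d, d - 1)

for d in range(2, 7):
    p_crit = critical_p(d)
    p_below = p_crit - Fraction(1, 100)
    print(d, p_crit, young_kernel_exponent(p_crit), kernel_threshold(d),
          young_kernel_exponent(p_below) > kernel_threshold(d))
```

The case ${p=1}$ is excluded from this helper since then ${q = \infty}$, where the required bound is just (6).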

Remark 2 By using the Hardy-Littlewood-Sobolev inequality in place of Young’s convolution inequality, one can also establish this result for ${p = \frac{4d}{3d+1}}$.

Motivated by this result, given any Radon measure ${\mu}$ on ${{\bf R}^d}$ and any exponents ${1 \leq p,q \leq \infty}$, we use ${R_\mu(p \rightarrow q)}$ to denote the claim that the restriction estimate

$\displaystyle \| \hat f \|_{L^q({\bf R}^d, \mu)} \lesssim_{d,p,q,\mu} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (9)$

holds for all Schwartz functions ${f}$; if ${S}$ is a ${k}$-dimensional submanifold of ${{\bf R}^d}$ (possibly with boundary), we write ${R_S(p \rightarrow q)}$ for ${R_\mu(p \rightarrow q)}$ where ${\mu}$ is the ${k}$-dimensional surface measure on ${S}$. Thus, for instance, we trivially always have ${R_S(1 \rightarrow \infty)}$, while Theorem 1 asserts that ${R_{S^{d-1}}(p \rightarrow 2)}$ holds whenever ${1 \leq p < \frac{4d}{3d+1}}$. We will not give a comprehensive survey of restriction theory in these notes, but instead focus on some model results that showcase some of the basic techniques in the field. (I have a more detailed survey on this topic from 2003, but it is somewhat out of date.)

Next quarter, starting March 30, I will be teaching “Math 247B: Classical Fourier Analysis” here at UCLA.  (The course should more accurately be named “Modern real-variable harmonic analysis”, but we have not gotten around to implementing such a name change.) This class (a continuation of Math 247A from the previous quarter, taught by my colleague, Monica Visan) will cover the following topics:

• Restriction theory and Strichartz estimates
• Decoupling estimates and applications
• Paraproducts; time frequency analysis; Carleson’s theorem

As usual, lecture notes will be made available on this blog.

Unlike previous courses, this one will be given online as part of UCLA’s social distancing efforts.  In particular, the course will be open to anyone with an internet connection (no UCLA affiliation is required), though non-UCLA participants will not have full access to all aspects of the course, and there is the possibility that some restrictions on participation may be imposed if there are significant disruptions to class activity.  For more information, see the course description.  UPDATE: due to time limitations, I will not be able to respond to personal email inquiries about this class from non-UCLA participants in the course.  Please use the comment thread to this blog post for such inquiries.  I will also update the course description throughout the course to reflect the latest information about the course, both for UCLA students enrolled in the course and for non-UCLA participants.

Let us call an arithmetic function ${f: {\bf N} \rightarrow {\bf C}}$ ${1}$-bounded if we have ${|f(n)| \leq 1}$ for all ${n \in {\bf N}}$. In this section we focus on the asymptotic behaviour of ${1}$-bounded multiplicative functions. Some key examples of such functions include:

• The Möbius function ${\mu}$;
• The Liouville function ${\lambda}$;
• “Archimedean” characters ${n \mapsto n^{it}}$ (which I call Archimedean because they are pullbacks of a Fourier character ${x \mapsto x^{it}}$ on the multiplicative group ${{\bf R}^+}$, which has the Archimedean property);
• Dirichlet characters (or “non-Archimedean” characters) ${\chi}$ (which are essentially pullbacks of Fourier characters on a multiplicative cyclic group ${({\bf Z}/q{\bf Z})^\times}$ with the discrete (non-Archimedean) metric);
• Hybrid characters ${n \mapsto \chi(n) n^{it}}$.

The space of ${1}$-bounded multiplicative functions is also closed under multiplication and complex conjugation.

Given a multiplicative function ${f}$, we are often interested in the asymptotics of long averages such as

$\displaystyle \frac{1}{X} \sum_{n \leq X} f(n)$

for large values of ${X}$, as well as short sums

$\displaystyle \frac{1}{H} \sum_{x \leq n \leq x+H} f(n)$

where ${H}$ and ${x}$ are both large, but ${H}$ is significantly smaller than ${x}$. (Throughout these notes we will try to normalise most of the sums and integrals appearing here as averages that are trivially bounded by ${O(1)}$; note that other normalisations are preferred in some of the literature cited here.) For instance, as we established in Theorem 58 of Notes 1, the prime number theorem is equivalent to the assertion that

$\displaystyle \frac{1}{X} \sum_{n \leq X} \mu(n) = o(1) \ \ \ \ \ (1)$

as ${X \rightarrow \infty}$. The Liouville function behaves almost identically to the Möbius function, in that estimates for one function almost always imply analogous estimates for the other:

Exercise 1 Without using the prime number theorem, show that (1) is also equivalent to

$\displaystyle \frac{1}{X} \sum_{n \leq X} \lambda(n) = o(1) \ \ \ \ \ (2)$

as ${X \rightarrow \infty}$. (Hint: use the identities ${\lambda(n) = \sum_{d^2|n} \mu(n/d^2)}$ and ${\mu(n) = \sum_{d^2|n} \mu(d) \lambda(n/d^2)}$.)
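The two identities in the hint are easy to test numerically before one tries to use them. The following is a minimal sketch (the helper names and the trial-division factorisation are our own choices, made for brevity rather than speed) that checks both identities for all ${n \leq 2000}$.

```python
import math

def factorize(n):
    """Prime factorisation of n as a dict {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def mobius(n):
    exponents = factorize(n).values()
    return 0 if any(e > 1 for e in exponents) else (-1) ** len(exponents)

def liouville(n):
    return (-1) ** sum(factorize(n).values())

def square_divisor_roots(n):
    """All d >= 1 with d^2 dividing n."""
    return [d for d in range(1, math.isqrt(n) + 1) if n % (d * d) == 0]

def liouville_from_mobius(n):
    # lambda(n) = sum over d^2 | n of mu(n / d^2)
    return sum(mobius(n // (d * d)) for d in square_divisor_roots(n))

def mobius_from_liouville(n):
    # mu(n) = sum over d^2 | n of mu(d) * lambda(n / d^2)
    return sum(mobius(d) * liouville(n // (d * d)) for d in square_divisor_roots(n))

print(all(liouville_from_mobius(n) == liouville(n) for n in range(1, 2001)))
print(all(mobius_from_liouville(n) == mobius(n) for n in range(1, 2001)))
```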

Henceforth we shall focus our discussion more on the Liouville function, and turn our attention to averages on shorter intervals. From (2) one has

$\displaystyle \frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) = o(1) \ \ \ \ \ (3)$

as ${x \rightarrow \infty}$ if ${H = H(x)}$ is such that ${H \geq \varepsilon x}$ for some fixed ${\varepsilon>0}$. However it is significantly more difficult to understand what happens when ${H}$ grows much slower than this. By using the techniques based on zero density estimates discussed in Notes 6, it was shown by Motohashi and others that one can also establish (3) for some ${H}$ growing like a smaller power of ${x}$. On the Riemann Hypothesis, Maier and Montgomery lowered the threshold to ${H \geq x^{1/2} \log^C x}$ for an absolute constant ${C}$ (the bound ${H \geq x^{1/2+\varepsilon}}$ is more classical, following from Exercise 33 of Notes 2). On the other hand, the randomness heuristics from Supplement 4 suggest that ${H}$ should be able to be taken as small as ${x^\varepsilon}$, and perhaps even ${\log^{1+\varepsilon} x}$ if one is particularly optimistic about the accuracy of these probabilistic models. In the opposite direction, the Chowla conjecture (mentioned for instance in Supplement 4) predicts that ${H}$ cannot be taken arbitrarily slowly growing in ${x}$, due to the conjectured existence of arbitrarily long strings of consecutive numbers where the Liouville function does not change sign (and in fact one can already show from the known partial results towards the Chowla conjecture that (3) fails for some sequence ${x \rightarrow \infty}$ and some sufficiently slowly growing ${H = H(x)}$, by modifying the arguments in these papers of mine).

The situation is better when one asks to understand the mean value on almost all short intervals, rather than all intervals. There are several equivalent ways to formulate this question:

Exercise 2 Let ${H = H(X)}$ be a function of ${X}$ such that ${H \rightarrow \infty}$ and ${H = o(X)}$ as ${X \rightarrow \infty}$. Let ${f: {\bf N} \rightarrow {\bf C}}$ be a ${1}$-bounded function. Show that the following assertions are equivalent:

• (i) One has

$\displaystyle \frac{1}{H} \sum_{x \leq n \leq x+H} f(n) = o(1)$

as ${X \rightarrow \infty}$, uniformly for all ${x \in [X,2X]}$ outside of a set of measure ${o(X)}$.

• (ii) One has

$\displaystyle \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|\ dx = o(1)$

as ${X \rightarrow \infty}$.

• (iii) One has

$\displaystyle \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx = o(1) \ \ \ \ \ (4)$

as ${X \rightarrow \infty}$.

As it turns out the second moment formulation in (iii) will be the most convenient for us to work with in this set of notes, as it is well suited to Fourier-analytic techniques (and in particular the Plancherel theorem).

Using zero density methods, for instance, it was shown by Ramachandra that

$\displaystyle \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll_{A,\varepsilon} \log^{-A} X$

whenever ${X^{1/6+\varepsilon} \leq H \leq X}$ and ${\varepsilon>0}$. With this quality of bound (saving arbitrary powers of ${\log X}$ over the trivial bound of ${O(1)}$), this is still the lowest value of ${H}$ one can reach unconditionally. However, in a striking recent breakthrough, it was shown by Matomaki and Radziwill that as long as one is willing to settle for weaker bounds (saving a small power of ${\log X}$ or ${\log H}$, or just a qualitative decay of ${o(1)}$), one can obtain non-trivial estimates on far shorter intervals. For instance, they show

Theorem 3 (Matomaki-Radziwill theorem for Liouville) For any ${2 \leq H \leq X}$, one has

$\displaystyle \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll \log^{-c} H$

for some absolute constant ${c>0}$.

In fact they prove a slightly more precise result: see Theorem 1 of that paper. In particular, they obtain the asymptotic (4) for any function ${H = H(X)}$ that goes to infinity as ${X \rightarrow \infty}$, no matter how slowly! This ability to let ${H}$ grow slowly with ${X}$ is important for several applications; for instance, in order to combine this type of result with the entropy decrement methods from Notes 9, it is essential that ${H}$ be allowed to grow more slowly than ${\log X}$. See also this survey of Soundararajan for further discussion.
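To get a feel for the quantity controlled by Theorem 3, one can compute the second moment directly for modest values of ${X}$ and ${H}$. The sketch below is purely illustrative and says nothing about the rate ${\log^{-c} H}$; the function names are ours. It assumes ${X}$ and ${H}$ are integers, in which case the integral over ${[X,2X]}$ reduces to a sum over integer starting points, because the window ${x \leq n \leq x+H}$ only changes when ${x}$ crosses an integer.

```python
def liouville_table(limit):
    """lam[n] = Liouville function of n for 1 <= n <= limit (lam[0] is unused)."""
    spf = list(range(limit + 1))  # spf[n] becomes the smallest prime factor of n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]  # lambda(p m) = -lambda(m)
    return lam

def short_interval_second_moment(f, X, H):
    """(1/X) int_X^{2X} |(1/H) sum_{x <= n <= x+H} f(n)|^2 dx for integer X, H.

    For non-integer x in (x0, x0 + 1) the window is n = x0 + 1, ..., x0 + H,
    so the integral is a sum over x0 = X, ..., 2X - 1. Needs f[n] for n <= 2X + H.
    """
    prefix = [0] * (2 * X + H + 1)
    for n in range(1, 2 * X + H + 1):
        prefix[n] = prefix[n - 1] + f[n]
    total = 0.0
    for x0 in range(X, 2 * X):
        total += ((prefix[x0 + H] - prefix[x0]) / H) ** 2
    return total / X

X = 20000
lam = liouville_table(2 * X + 200)
for H in (5, 20, 50, 200):
    print(H, short_interval_second_moment(lam, X, H))
```

For comparison, the constant function ${1}$ gives the trivial value ${1}$ for every ${H}$.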

Exercise 4 In this exercise you may use Theorem 3 freely.

• (i) Establish the lower bound

$\displaystyle \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) > -1+c$

for some absolute constant ${c>0}$ and all sufficiently large ${X}$. (Hint: if this bound failed, then ${\lambda(n)=\lambda(n+1)}$ would hold for almost all ${n}$; use this to create many intervals ${[x,x+H]}$ for which ${\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)}$ is extremely large.)

• (ii) Show that Theorem 3 also holds with ${\lambda(n)}$ replaced by ${\chi_2 \lambda(n)}$, where ${\chi_2}$ is the principal character of period ${2}$. (Use the fact that ${\lambda(2n)=-\lambda(n)}$ for all ${n}$.) Use this to establish the corresponding upper bound

$\displaystyle \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) < 1-c$

to (i).

(There is a curious asymmetry to the difficulty level of these bounds; the upper bound in (ii) was established much earlier by Harman, Pintz, and Wolke, but the lower bound in (i) was only established in the Matomaki-Radziwill paper.)

The techniques discussed previously were highly complex-analytic in nature, relying in particular on the fact that functions such as ${\mu}$ or ${\lambda}$ have Dirichlet series ${{\mathcal D} \mu(s) = \frac{1}{\zeta(s)}}$, ${{\mathcal D} \lambda(s) = \frac{\zeta(2s)}{\zeta(s)}}$ that extend meromorphically into the critical strip. In contrast, the Matomaki-Radziwill theorem does not rely on such meromorphic continuations, and in fact holds for more general classes of ${1}$-bounded multiplicative functions ${f}$, for which one typically does not expect any meromorphic continuation into the strip. Instead, one can view the Matomaki-Radziwill theory as following the philosophy of a slightly different approach to multiplicative number theory, namely the pretentious multiplicative number theory of Granville and Soundararajan (as presented for instance in their draft monograph). A basic notion here is the pretentious distance between two ${1}$-bounded multiplicative functions ${f,g}$ (at a given scale ${X}$), which informally measures the extent to which ${f}$ “pretends” to be like ${g}$ (or vice versa). The precise definition is

Definition 5 (Pretentious distance) Given two ${1}$-bounded multiplicative functions ${f,g}$, and a threshold ${X>0}$, the pretentious distance ${\mathbb{D}(f,g;X)}$ between ${f}$ and ${g}$ up to scale ${X}$ is given by the formula

$\displaystyle \mathbb{D}(f,g;X) := \left( \sum_{p \leq X} \frac{1 - \mathrm{Re}(f(p) \overline{g(p)})}{p} \right)^{1/2}$
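Since the pretentious distance only involves the values of ${f,g}$ at primes, it is straightforward to compute. The following is a minimal sketch, in which a multiplicative function is represented (by our own convention, not one used in these notes) simply as a Python function giving its value at a prime ${p}$; the test function `half` is a hypothetical example with ${|f(p)| < 1}$.

```python
import cmath
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes; returns the list of primes <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [p for p in range(2, limit + 1) if sieve[p]]

def pretentious_distance(f, g, X):
    """D(f, g; X) = (sum_{p <= X} (1 - Re(f(p) conj(g(p)))) / p)^{1/2}."""
    total = sum((1.0 - (f(p) * complex(g(p)).conjugate()).real) / p
                for p in primes_up_to(X))
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding

one = lambda p: 1.0
liouville = lambda p: -1.0  # lambda(p) = mu(p) = -1 at every prime
half = lambda p: 0.5        # hypothetical example with |f(p)| < 1

def archimedean(t):
    """The character n -> n^{it}, evaluated at primes."""
    return lambda p: cmath.exp(1j * t * math.log(p))

X = 1000
print(pretentious_distance(liouville, liouville, X))  # vanishes since |lambda(p)| = 1
print(pretentious_distance(half, half, X))            # a function at positive distance from itself
print(pretentious_distance(one, liouville, X))        # squared, this is 2 * sum_{p <= X} 1/p
print(pretentious_distance(one, archimedean(1.0), X))
```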

Note that one can also define an infinite version ${\mathbb{D}(f,g;\infty)}$ of this distance by removing the constraint ${p \leq X}$, though in such cases the pretentious distance may then be infinite. The pretentious distance is not quite a metric (because ${\mathbb{D}(f,f;X)}$ can be non-zero, and furthermore ${\mathbb{D}(f,g;X)}$ can vanish without ${f,g}$ being equal), but it is still quite close to behaving like a metric, in particular it obeys the triangle inequality; see Exercise 16 below. The philosophy of pretentious multiplicative number theory is that two ${1}$-bounded multiplicative functions ${f,g}$ will exhibit similar behaviour at scale ${X}$ if their pretentious distance ${\mathbb{D}(f,g;X)}$ is bounded, but will become uncorrelated from each other if this distance becomes large. A simple example of this philosophy is given by the following “weak Halasz theorem”, proven in Section 2:

Proposition 6 (Logarithmically averaged version of Halasz) Let ${X}$ be sufficiently large. Then for any ${1}$-bounded multiplicative functions ${f,g}$, one has

$\displaystyle \frac{1}{\log X} \sum_{n \leq X} \frac{f(n) \overline{g(n)}}{n} \ll \exp( - c \mathbb{D}(f, g;X)^2 )$

for an absolute constant ${c>0}$.

In particular, if ${f}$ does not pretend to be ${1}$, then the logarithmic average ${\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}}$ will be small. This condition is basically necessary, since of course ${\frac{1}{\log X} \sum_{n \leq X} \frac{1}{n} = 1 + o(1)}$.

If one works with non-logarithmic averages ${\frac{1}{X} \sum_{n \leq X} f(n)}$, then not pretending to be ${1}$ is insufficient to establish decay, as was already observed in Exercise 11 of Notes 1: if ${f}$ is an Archimedean character ${f(n) = n^{it}}$ for some non-zero real ${t}$, then ${\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}}$ goes to zero as ${X \rightarrow \infty}$ (which is consistent with Proposition 6), but ${\frac{1}{X} \sum_{n \leq X} f(n)}$ does not go to zero. However, this is in some sense the “only” obstruction to these averages decaying to zero, as quantified by the following basic result:

Theorem 7 (Halasz’s theorem) Let ${X}$ be sufficiently large. Then for any ${1}$-bounded multiplicative function ${f}$, one has

$\displaystyle \frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - c \min_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{T}$

for an absolute constant ${c>0}$ and any ${T > 0}$.
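The Archimedean obstruction that motivates the minimum over ${t}$ in Theorem 7 is visible numerically. The sketch below (an illustration with our own function name and the arbitrary choice ${t=1}$) compares the plain and logarithmic averages of ${n \mapsto n^{it}}$; comparing the sum with an integral suggests that the plain average should have magnitude close to ${1/|1+it|}$, while the logarithmic average should decay like ${1/\log X}$.

```python
import cmath
import math

def archimedean_averages(t, X):
    """Return ((1/X) sum_{n<=X} n^{it}, (1/log X) sum_{n<=X} n^{it}/n)."""
    plain = 0j
    logarithmic = 0j
    for n in range(1, X + 1):
        value = cmath.exp(1j * t * math.log(n))  # n^{it}
        plain += value
        logarithmic += value / n
    return plain / X, logarithmic / math.log(X)

t = 1.0
for X in (10 ** 3, 10 ** 4, 10 ** 5):
    plain, logarithmic = archimedean_averages(t, X)
    print(X, abs(plain), 1 / abs(1 + 1j * t), abs(logarithmic))
```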

Informally, we refer to a ${1}$-bounded multiplicative function as “pretentious” if it pretends to be a character such as ${n^{it}}$, and “non-pretentious” otherwise. The precise distinction is rather malleable, as the precise class of characters that one views as “obstructions” varies from situation to situation. For instance, in Proposition 6 it is just the trivial character ${1}$ which needs to be considered, but in Theorem 7 it is the characters ${n \mapsto n^{it}}$ with ${|t| \leq T}$. In other contexts one may also need to add Dirichlet characters ${\chi(n)}$ or hybrid characters such as ${\chi(n) n^{it}}$ to the list of characters that one might pretend to be. The division into pretentious and non-pretentious functions in multiplicative number theory is faintly analogous to the division into major and minor arcs in the circle method applied to additive number theory problems; see Notes 8. The Möbius and Liouville functions are model examples of non-pretentious functions; see Exercise 24.

In the contrapositive, Halasz’ theorem can be formulated as the assertion that if one has a large mean

$\displaystyle |\frac{1}{X} \sum_{n \leq X} f(n)| \geq \eta$

for some ${\eta > 0}$, then one has the pretentious property

$\displaystyle \mathbb{D}( f, n \mapsto n^{it}; X ) \ll \sqrt{\log(1/\eta)}$

for some ${t \ll \eta^{-1}}$. This has the flavour of an “inverse theorem”, of the type often found in arithmetic combinatorics.

Among other things, Halasz’s theorem gives yet another proof of the prime number theorem (1); see Section 2.

We now give a version of the Matomaki-Radziwill theorem for general (non-pretentious) multiplicative functions that is formulated in a similar contrapositive (or “inverse theorem”) fashion, though to simplify the presentation we only state a qualitative version that does not give explicit bounds.

Theorem 8 ((Qualitative) Matomaki-Radziwill theorem) Let ${\eta>0}$, and let ${1 \leq H \leq X}$, with ${H}$ sufficiently large depending on ${\eta}$. Suppose that ${f}$ is a ${1}$-bounded multiplicative function such that

$\displaystyle \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \geq \eta^2.$

Then one has

$\displaystyle \mathbb{D}(f, n \mapsto n^{it};X) \ll_\eta 1$

for some ${t \ll_\eta \frac{X}{H}}$.

The condition ${t \ll_\eta \frac{X}{H}}$ is basically optimal, as the following example shows:

Exercise 9 Let ${\varepsilon>0}$ be a sufficiently small constant, and let ${1 \leq H \leq X}$ be such that ${\frac{1}{\varepsilon} \leq H \leq \varepsilon X}$. Let ${f}$ be the Archimedean character ${f(n) = n^{it}}$ for some ${|t| \leq \varepsilon \frac{X}{H}}$. Show that

$\displaystyle \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \asymp 1.$
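The phenomenon in Exercise 9 can be observed numerically. In the sketch below (illustrative only; the function name and the parameter choices ${X = 10^4}$, ${H = 100}$ are ours), the phase ${t \log n}$ varies by at most about ${tH/X}$ across each window, so for ${t = 0.1 X/H}$ the short averages have magnitude close to ${1}$. For contrast we also print the value for the much larger frequency ${t = 100 X/H}$, where the phase winds many times across each window and substantial cancellation occurs. As before, ${X}$ and ${H}$ are taken to be integers so that the integral reduces to a sum.

```python
import cmath
import math

def second_moment_archimedean(t, X, H):
    """(1/X) int_X^{2X} |(1/H) sum_{x <= n <= x+H} n^{it}|^2 dx for integer X, H."""
    prefix = [0j] * (2 * X + H + 1)
    for n in range(1, 2 * X + H + 1):
        prefix[n] = prefix[n - 1] + cmath.exp(1j * t * math.log(n))
    total = 0.0
    for x0 in range(X, 2 * X):
        # for x in (x0, x0 + 1) the window is n = x0 + 1, ..., x0 + H
        total += abs((prefix[x0 + H] - prefix[x0]) / H) ** 2
    return total / X

X, H = 10000, 100
print(second_moment_archimedean(0.1 * X / H, X, H))  # small frequency: close to 1
print(second_moment_archimedean(100 * X / H, X, H))  # large frequency: much smaller
```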

Combining Theorem 8 with standard non-pretentiousness facts about the Liouville function (see Exercise 24), we recover Theorem 3 (but with a decay rate of only ${o(1)}$ rather than ${\log^{-c} H}$). We refer the reader to the original paper of Matomaki-Radziwill (as well as this followup paper with myself) for the quantitative version of Theorem 8 that is strong enough to recover the full version of Theorem 3, and which can also handle real-valued pretentious functions.

With our current state of knowledge, the only arguments that can establish the full strength of the Halasz and Matomaki-Radziwill theorems are Fourier analytic in nature, relating sums involving an arithmetic function ${f}$ with its Dirichlet series

$\displaystyle {\mathcal D} f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}$

which one can view as a discrete Fourier transform of ${f}$ (or more precisely of the measure ${\sum_{n=1}^\infty \frac{f(n)}{n} \delta_{\log n}}$, if one evaluates the Dirichlet series on the right edge ${\{ 1+it: t \in {\bf R} \}}$ of the critical strip). In this aspect, the techniques resemble the complex-analytic methods from Notes 2, but with the key difference that no analytic or meromorphic continuation into the strip is assumed. The key identity that allows us to pass to Dirichlet series is the following variant of Proposition 7 of Notes 2:

Proposition 10 (Parseval type identity) Let ${f,g: {\bf N} \rightarrow {\bf C}}$ be finitely supported arithmetic functions, and let ${\psi: {\bf R} \rightarrow {\bf R}}$ be a Schwartz function. Then

$\displaystyle \sum_{n=1}^\infty \sum_{m=1}^\infty \frac{f(n)}{n} \frac{\overline{g(m)}}{m} \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} {\mathcal D} f(1+it) \overline{{\mathcal D} g(1+it)} \hat \psi(t)\ dt$

where ${\hat \psi(t) := \int_{\bf R} \psi(u) e^{itu}\ du}$ is the Fourier transform of ${\psi}$. (Note that the finite support of ${f,g}$ and the Schwartz nature of ${\psi,\hat \psi}$ ensure that both sides of the identity are absolutely convergent.)

The restriction that ${f,g}$ be finitely supported will be slightly annoying in places, since most multiplicative functions will fail to be finitely supported, but this technicality can usually be overcome by suitably truncating the multiplicative function, and taking limits if necessary.

Proof: By expanding out the Dirichlet series, it suffices to show that

$\displaystyle \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} \frac{1}{n^{it}} \frac{1}{m^{-it}} \hat \psi(t)\ dt$

for any natural numbers ${n,m}$. But this follows from the Fourier inversion formula ${\psi(u) = \frac{1}{2\pi} \int_{\bf R} e^{-itu} \hat \psi(t)\ dt}$ applied at ${u = \log n - \log m}$. $\Box$
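The identity in Proposition 10 can be checked numerically for a concrete choice of ${\psi}$. The sketch below assumes the Gaussian ${\psi(u) = e^{-u^2/2}}$, for which ${\hat \psi(t) = \sqrt{2\pi} e^{-t^2/2}}$ under the convention used above; the test functions ${f,g}$, the truncation ${|t| \leq 12}$, and the trapezoid rule are arbitrary choices of ours. Finitely supported arithmetic functions are represented as dictionaries.

```python
import cmath
import math

def dirichlet_series(f, t):
    """D f(1+it) for a finitely supported f given as a dict {n: f(n)}."""
    return sum(v * cmath.exp(-(1 + 1j * t) * math.log(n)) for n, v in f.items())

def psi(u):
    return math.exp(-u * u / 2)

def psi_hat(t):
    # Fourier transform of psi with the convention psi_hat(t) = int psi(u) e^{itu} du
    return math.sqrt(2 * math.pi) * math.exp(-t * t / 2)

def parseval_lhs(f, g):
    return sum((fv / n) * (complex(gv).conjugate() / m) * psi(math.log(n) - math.log(m))
               for n, fv in f.items() for m, gv in g.items())

def parseval_rhs(f, g, T=12.0, steps=4800):
    """Trapezoid rule on [-T, T]; the Gaussian weight makes the tails negligible."""
    h = 2 * T / steps
    total = 0j
    for k in range(steps + 1):
        t = -T + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += (weight * dirichlet_series(f, t)
                  * dirichlet_series(g, t).conjugate() * psi_hat(t))
    return total * h / (2 * math.pi)

f = {1: 1.0, 2: -1.0, 3: 0.5j, 5: 2.0}
g = {1: 1.0, 4: -1j, 6: 0.5}
print(parseval_lhs(f, g))
print(parseval_rhs(f, g))
```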

For applications to Halasz type theorems, one sets ${g(n)}$ equal to the Kronecker delta ${\delta_{n=1}}$, producing weighted integrals of ${{\mathcal D} f(1+it)}$ of “${L^1}$” type. For applications to Matomaki-Radziwill theorems, one instead sets ${f=g}$, and more precisely uses the following corollary of the above proposition, to obtain weighted integrals of ${|{\mathcal D} f(1+it)|^2}$ of “${L^2}$” type:

Exercise 11 (Plancherel type identity) If ${f: {\bf N} \rightarrow {\bf C}}$ is finitely supported, and ${\varphi: {\bf R} \rightarrow {\bf R}}$ is a Schwartz function, establish the identity

$\displaystyle \int_0^\infty |\sum_{n=1}^\infty \frac{f(n)}{n} \varphi(\log n - \log y)|^2 \frac{dy}{y} = \frac{1}{2\pi} \int_{\bf R} |{\mathcal D} f(1+it)|^2 |\hat \varphi(t)|^2\ dt.$

In contrast, information about the non-pretentious nature of a multiplicative function ${f}$ will give “pointwise” or “${L^\infty}$” type control on the Dirichlet series ${{\mathcal D} f(1+it)}$, as is suggested from the Euler product factorisation of ${{\mathcal D} f}$.

It will be convenient to formalise the notion of ${L^1}$, ${L^2}$, and ${L^\infty}$ control of the Dirichlet series ${{\mathcal D} f}$, which as previously mentioned can be viewed as a sort of “Fourier transform” of ${f}$:

Definition 12 (Fourier norms) Let ${f: {\bf N} \rightarrow {\bf C}}$ be finitely supported, and let ${\Omega \subset {\bf R}}$ be a bounded measurable set. We define the Fourier ${L^\infty}$ norm

$\displaystyle \| f\|_{FL^\infty(\Omega)} := \sup_{t \in \Omega} |{\mathcal D} f(1+it)|,$

the Fourier ${L^2}$ norm

$\displaystyle \| f\|_{FL^2(\Omega)} := \left(\int_\Omega |{\mathcal D} f(1+it)|^2\ dt\right)^{1/2},$

and the Fourier ${L^1}$ norm

$\displaystyle \| f\|_{FL^1(\Omega)} := \int_\Omega |{\mathcal D} f(1+it)|\ dt.$

One could more generally define ${FL^p}$ norms for other exponents ${p}$, but we will only need the exponents ${p=1,2,\infty}$ in this current set of notes. It is clear that all the above norms are in fact (semi-)norms on the space of finitely supported arithmetic functions.

As mentioned above, Halasz’s theorem gives good control on the Fourier ${L^\infty}$ norm for restrictions of non-pretentious functions to intervals:

Exercise 13 (Fourier ${L^\infty}$ control via Halasz) Let ${f: {\bf N} \rightarrow {\bf C}}$ be a ${1}$-bounded multiplicative function, let ${I}$ be an interval in ${[C^{-1} X, CX]}$ for some ${X \geq C \geq 1}$, let ${R \geq 1}$, and let ${\Omega \subset {\bf R}}$ be a bounded measurable set. Show that

$\displaystyle \| f 1_I \|_{FL^\infty(\Omega)} \ll_C \exp( - c \min_{t: \mathrm{dist}(t,\Omega) \leq R} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{R}.$

(Hint: you will need to use summation by parts (or an equivalent device) to deal with a ${\frac{1}{n}}$ weight.)

Meanwhile, the Plancherel identity in Exercise 11 gives good control on the Fourier ${L^2}$ norm for functions on long intervals (compare with Exercise 2 from Notes 6):

Exercise 14 (${L^2}$ mean value theorem) Let ${T \geq 1}$, and let ${f: {\bf N} \rightarrow {\bf C}}$ be finitely supported. Show that

$\displaystyle \| f \|_{FL^2([-T,T])}^2 \ll \sum_n \frac{1}{n} (\frac{T}{n} \sum_{m: |n-m| \leq n/T} |f(m)|)^2.$

Conclude in particular that if ${f}$ is supported in ${[C^{-1} N, C N]}$ for some ${C \geq 1}$ and ${N \gg T}$, then

$\displaystyle \| f \|_{FL^2([-T,T])}^2 \ll C^{O(1)} \frac{1}{N} \sum_n |f(n)|^2.$

In the simplest case of the logarithmically averaged Halasz theorem (Proposition 6), Fourier ${L^\infty}$ estimates are already sufficient to obtain decent control on the (weighted) Fourier ${L^1}$ type expressions that show up. However, these estimates are not enough by themselves to establish the full Halasz theorem or the Matomaki-Radziwill theorem. To get from Fourier ${L^\infty}$ control to Fourier ${L^1}$ or ${L^2}$ control more efficiently, the key trick is to use Hölder’s inequality, which when combined with the basic Dirichlet series identity

$\displaystyle {\mathcal D}(f*g) = ({\mathcal D} f) ({\mathcal D} g)$

gives the inequalities

$\displaystyle \| f*g \|_{FL^1(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^2(\Omega)} \ \ \ \ \ (5)$

and

$\displaystyle \| f*g \|_{FL^2(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^\infty(\Omega)} \ \ \ \ \ (6)$
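As a concrete illustration of (5) and (6), the sketch below computes Riemann-sum approximations to the Fourier norms of Definition 12 on ${\Omega = [-T,T]}$ and checks both inequalities for a Dirichlet convolution. The dictionary representation, the midpoint grid, and the particular functions ${f,g}$ are assumptions made for this example only; the discretised inequalities hold for the same reason as the continuous ones (Cauchy-Schwarz, and bounding one factor by its supremum).

```python
import cmath
import math

def dirichlet_convolve(f, g):
    """Dirichlet convolution of finitely supported functions given as dicts."""
    out = {}
    for a, fa in f.items():
        for b, gb in g.items():
            out[a * b] = out.get(a * b, 0) + fa * gb
    return out

def dirichlet_series(f, t):
    """D f(1+it) for a finitely supported f given as a dict {n: f(n)}."""
    return sum(v * cmath.exp(-(1 + 1j * t) * math.log(n)) for n, v in f.items())

def fourier_norms(f, grid, h):
    """Riemann-sum versions of the FL^1, FL^2 and FL^infty norms on the grid."""
    values = [abs(dirichlet_series(f, t)) for t in grid]
    return {
        "L1": sum(values) * h,
        "L2": math.sqrt(sum(v * v for v in values) * h),
        "Linf": max(values),
    }

T, steps = 10.0, 2000
h = 2 * T / steps
grid = [-T + (k + 0.5) * h for k in range(steps)]

f = {1: 1.0, 2: -1.0, 3: -1.0, 5: -1.0, 6: 1.0}
g = {1: 1.0, 7: 0.5j, 11: -2.0}
fg = dirichlet_convolve(f, g)

nf, ng, nfg = (fourier_norms(u, grid, h) for u in (f, g, fg))
print(nfg["L1"], "<=", nf["L2"] * ng["L2"])    # inequality (5)
print(nfg["L2"], "<=", nf["L2"] * ng["Linf"])  # inequality (6)
```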

The strategy is then to factor (or approximately factor) the original function ${f}$ as a Dirichlet convolution (or average of convolutions) of various components, each of which enjoys reasonably good Fourier ${L^2}$ or ${L^\infty}$ estimates on various regions ${\Omega}$, and then combine them using the Hölder inequalities (5), (6) and the triangle inequality. For instance, to prove Halasz’s theorem, we will split ${f}$ into the Dirichlet convolution of three factors, one of which will be estimated in ${FL^\infty}$ using the non-pretentiousness hypothesis, and the other two being estimated in ${FL^2}$ using Exercise 14. For the Matomaki-Radziwill theorem, one uses a significantly more complicated decomposition of ${f}$ into a variety of Dirichlet convolutions of factors, and also splits up the Fourier domain ${[-T,T]}$ into several subregions depending on whether the Dirichlet series associated to some of these components are large or small. In each region and for each component of these decompositions, all but one of the factors will be estimated in ${FL^\infty}$, and the other in ${FL^2}$; but the precise way in which this is done will vary from component to component. For instance, in some regions a key factor will be small in ${FL^\infty}$ by construction of the region; in other places, the ${FL^\infty}$ control will come from Exercise 13. Similarly, in some regions, satisfactory ${FL^2}$ control is provided by Exercise 14, but in other regions one must instead use “large value” theorems (in the spirit of Proposition 9 from Notes 6), or amplify the power of the standard ${L^2}$ mean value theorems by combining the Dirichlet series with other Dirichlet series that are known to be large in this region.

There are several ways to achieve the desired factorisation. In the case of Halasz’s theorem, we can simply work with a crude version of the Euler product factorisation, dividing the primes into three categories (“small”, “medium”, and “large” primes) and expressing ${f}$ as a triple Dirichlet convolution accordingly. For the Matomaki-Radziwill theorem, one instead exploits the Turan-Kubilius phenomenon (Section 5 of Notes 1, or Lemma 2 of Notes 9) that for various moderately wide ranges ${[P,Q]}$ of primes, the number of prime divisors of a large number ${n}$ in the range ${[P,Q]}$ is almost always close to ${\log\log Q - \log\log P}$. Thus, if we introduce the arithmetic functions

$\displaystyle w_{[P,Q]}(n) = \frac{1}{\log\log Q - \log\log P} \sum_{P \leq p \leq Q} 1_{n=p} \ \ \ \ \ (7)$

then we have

$\displaystyle 1 \approx 1 * w_{[P,Q]}$

and more generally we have a twisted approximation

$\displaystyle f \approx f * fw_{[P,Q]}$

for multiplicative functions ${f}$. (Actually, for technical reasons it will be convenient to work with a smoothed out version of these functions; see Section 3.) Informally, these formulas suggest that the “${FL^2}$ energy” of a multiplicative function ${f}$ is concentrated in those regions where ${f w_{[P,Q]}}$ is extremely large in a ${FL^\infty}$ sense. Iterations of this formula (or variants of this formula, such as an identity due to Ramaré) will then give the desired (approximate) factorisation of ${{\mathcal D} f}$.
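The approximation ${1 \approx 1 * w_{[P,Q]}}$ can be probed numerically, at least on average. The sketch below (with our own function names and the arbitrary parameters ${P=10}$, ${Q=1000}$, ${N = 10^5}$) computes ${(1 * w_{[P,Q]})(n)}$, which is the number of primes in ${[P,Q]}$ dividing ${n}$ divided by ${\log\log Q - \log\log P}$, and averages it over ${n \leq N}$. One caveat: ${\log\log Q - \log\log P}$ is barely larger than ${1}$ at these scales, so only the mean is visibly close to ${1}$; the concentration of individual values predicted by Turan-Kubilius needs far wider ranges than are accessible to a small experiment.

```python
import math

def primes_between(P, Q):
    """Primes p with P <= p <= Q, via a sieve of Eratosthenes up to Q."""
    sieve = bytearray([1]) * (Q + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(Q ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [p for p in range(max(P, 2), Q + 1) if sieve[p]]

def one_star_w(N, P, Q):
    """List whose n-th entry is (1 * w_{[P,Q]})(n) for 1 <= n <= N (entry 0 unused)."""
    norm = math.log(math.log(Q)) - math.log(math.log(P))
    counts = [0] * (N + 1)
    for p in primes_between(P, Q):
        for m in range(p, N + 1, p):
            counts[m] += 1  # p divides m
    return [c / norm for c in counts]

N, P, Q = 100000, 10, 1000
values = one_star_w(N, P, Q)[1:]
mean = sum(values) / N
print(mean)
```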

In these notes we presume familiarity with the basic concepts of probability theory, such as random variables (which could take values in the reals, vectors, or other measurable spaces), probability, and expectation. Much of this theory is in turn based on measure theory, which we will also presume familiarity with. See for instance this previous set of lecture notes for a brief review.

The basic objects of study in analytic number theory are deterministic; there is nothing inherently random about the set of prime numbers, for instance. Despite this, one can still interpret many of the averages encountered in analytic number theory in probabilistic terms, by introducing random variables into the subject. Consider for instance the form

$\displaystyle \sum_{n \leq x} \mu(n) = o(x) \ \ \ \ \ (1)$

of the prime number theorem (where we take the limit ${x \rightarrow \infty}$). One can interpret this estimate probabilistically as

$\displaystyle {\mathbb E} \mu(\mathbf{n}) = o(1) \ \ \ \ \ (2)$

where ${\mathbf{n} = \mathbf{n}_{\leq x}}$ is a random variable drawn uniformly from the natural numbers up to ${x}$, and ${{\mathbb E}}$ denotes the expectation. (In this set of notes we will use boldface symbols to denote random variables, and non-boldface symbols for deterministic objects.) By itself, such an interpretation is little more than a change of notation. However, the power of this interpretation becomes more apparent when one then imports concepts from probability theory (together with all their attendant intuitions and tools), such as independence, conditioning, stationarity, total variation distance, and entropy. For instance, suppose we want to use the prime number theorem (1) to make a prediction for the sum

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1).$

After dividing by ${x}$, this is essentially

$\displaystyle {\mathbb E} \mu(\mathbf{n}) \mu(\mathbf{n}+1).$

With probabilistic intuition, one may expect the random variables ${\mu(\mathbf{n}), \mu(\mathbf{n}+1)}$ to be approximately independent (there is no obvious relationship between the number of prime factors of ${\mathbf{n}}$, and of ${\mathbf{n}+1}$), and so the above average would be expected to be approximately equal to

$\displaystyle ({\mathbb E} \mu(\mathbf{n})) ({\mathbb E} \mu(\mathbf{n}+1))$

which by (2) is equal to ${o(1)}$. Thus we are led to the prediction

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1) = o(x). \ \ \ \ \ (3)$

The asymptotic (3) is widely believed (it is a special case of the Chowla conjecture, which we will discuss in later notes); while there has been recent progress towards establishing it rigorously, it remains open for now.
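Numerical experiments are of course no substitute for a proof, but it is easy to tabulate the sums in (1) and (3) for moderate ${x}$. The following is a minimal sketch with our own function names.

```python
def mobius_table(limit):
    """mu[n] = Mobius function of n for 1 <= n <= limit (mu[0] is unused)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = False
            for m in range(p, limit + 1, p):
                mu[m] = -mu[m]  # one sign flip per distinct prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0       # not squarefree
    return mu

def mean_and_correlation(x):
    """Return (sum_{n<=x} mu(n), sum_{n<=x} mu(n) mu(n+1))."""
    mu = mobius_table(x + 1)
    mertens = sum(mu[1 : x + 1])
    correlation = sum(mu[n] * mu[n + 1] for n in range(1, x + 1))
    return mertens, correlation

for x in (10, 1000, 100000):
    mertens, correlation = mean_and_correlation(x)
    print(x, mertens / x, correlation / x)
```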

How would one try to make these probabilistic intuitions more rigorous? The first thing one needs to do is find a more quantitative measurement of what it means for two random variables to be “approximately” independent. There are several candidates for such measurements, but we will focus in these notes on two particularly convenient measures of approximate independence: the “${L^2}$” measure of independence known as covariance, and the “${L \log L}$” measure of independence known as mutual information (actually we will usually need the more general notion of conditional mutual information that measures conditional independence). The use of ${L^2}$ type methods in analytic number theory is well established, though it is usually not described in probabilistic terms, being referred to instead by such names as the “second moment method”, the “large sieve” or the “method of bilinear sums”. The use of ${L \log L}$ methods (or “entropy methods”) is much more recent, and has been able to control certain types of averages in analytic number theory that were out of reach of previous methods such as ${L^2}$ methods. For instance, in later notes we will use entropy methods to establish the logarithmically averaged version

$\displaystyle \sum_{n \leq x} \frac{\mu(n) \mu(n+1)}{n} = o(\log x) \ \ \ \ \ (4)$

of (3), which is implied by (3) but strictly weaker (much as the prime number theorem (1) implies the bound ${\sum_{n \leq x} \frac{\mu(n)}{n} = o(\log x)}$, but the latter bound is much easier to establish than the former).

As with many other situations in analytic number theory, we can exploit the fact that certain assertions (such as approximate independence) can become significantly easier to prove if one only seeks to establish them on average, rather than uniformly. For instance, given two random variables ${\mathbf{X}}$ and ${\mathbf{Y}}$ of number-theoretic origin (such as the random variables ${\mu(\mathbf{n})}$ and ${\mu(\mathbf{n}+1)}$ mentioned previously), it can often be extremely difficult to determine the extent to which ${\mathbf{X},\mathbf{Y}}$ behave “independently” (or “conditionally independently”). However, thanks to second moment tools or entropy based tools, it is often possible to assert results of the following flavour: if ${\mathbf{Y}_1,\dots,\mathbf{Y}_k}$ are a large collection of “independent” random variables, and ${\mathbf{X}}$ is a further random variable that is “not too large” in some sense, then ${\mathbf{X}}$ must necessarily be nearly independent (or conditionally independent) of many of the ${\mathbf{Y}_i}$, even if one cannot pinpoint precisely which of the ${\mathbf{Y}_i}$ the variable ${\mathbf{X}}$ is independent of. In the case of the second moment method, this allows us to compute correlations such as ${{\mathbb E} {\mathbf X} \mathbf{Y}_i}$ for “most” ${i}$. The entropy method gives bounds that are significantly weaker quantitatively than the second moment method (and in particular, in its current incarnation at least it is only able to make non-trivial assertions involving interactions with residue classes at small primes), but can control significantly more general quantities ${{\mathbb E} F( {\mathbf X}, \mathbf{Y}_i )}$ for “most” ${i}$ thanks to tools such as the Pinsker inequality.

In the fall quarter (starting Sep 27) I will be teaching a graduate course on analytic prime number theory.  This will be similar to a graduate course I taught in 2015, and in particular will reuse several of the lecture notes from that course, though it will also incorporate some new material (and omit some material covered in the previous course, to compensate).  I anticipate covering the following topics:

1. Elementary multiplicative number theory
2. Complex-analytic multiplicative number theory
3. The entropy decrement argument
4. Bounds for exponential sums
5. Zero density theorems
6. Halasz’s theorem and the Matomaki-Radziwill theorem
7. The circle method
8. (If time permits) Chowla’s conjecture and the Erdos discrepancy problem

Lecture notes for topics 3, 6, and 8 will be forthcoming.

We consider the incompressible Euler equations on the (Eulerian) torus ${\mathbf{T}_E := ({\bf R}/{\bf Z})^d}$, which we write in divergence form as

$\displaystyle \partial_t u^i + \partial_j(u^j u^i) = - \eta^{ij} \partial_j p \ \ \ \ \ (1)$

$\displaystyle \partial_i u^i = 0, \ \ \ \ \ (2)$

where ${\eta^{ij}}$ is the (inverse) Euclidean metric. Here we use the summation conventions for indices such as ${i,j,l}$ (reserving the symbol ${k}$ for other purposes), and are retaining the convention from Notes 1 of denoting vector fields using superscripted indices rather than subscripted indices, as we will eventually need to change variables to Lagrangian coordinates at some point. In principle, much of the discussion in this set of notes (particularly regarding the positive direction of Onsager’s conjecture) could also be modified to treat non-periodic solutions that decay at infinity if desired, but some non-trivial technical issues do arise in non-periodic settings for the negative direction.

As noted previously, the kinetic energy

$\displaystyle \frac{1}{2} \int_{\mathbf{T}_E} |u(t,x)|^2\ dx = \frac{1}{2} \int_{\mathbf{T}_E} \eta_{ij} u^i(t,x) u^j(t,x)\ dx$

is formally conserved by the flow, where ${\eta_{ij}}$ is the Euclidean metric. Indeed, if one assumes that ${u,p}$ are continuously differentiable in both space and time on ${[0,T] \times \mathbf{T}_E}$, then one can multiply the equation (1) by ${u^l}$ and contract against ${\eta_{il}}$ to obtain

$\displaystyle \eta_{il} u^l \partial_t u^i + \eta_{il} u^l \partial_j (u^j u^i) = - \eta_{il} u^l \eta^{ij} \partial_j p$

which rearranges using (2) and the product rule to

$\displaystyle \partial_t (\frac{1}{2} \eta_{ij} u^i u^j) + \partial_j( \frac{1}{2} \eta_{il} u^i u^j u^l ) + \partial_j (u^j p) = 0$

and then if one integrates this identity on ${[0,T] \times \mathbf{T}_E}$ and uses Stokes’ theorem, one obtains the required energy conservation law

$\displaystyle \frac{1}{2} \int_{\mathbf{T}_E} \eta_{ij} u^i(T,x) u^j(T,x)\ dx = \frac{1}{2} \int_{\mathbf{T}_E} \eta_{ij} u^i(0,x) u^j(0,x)\ dx. \ \ \ \ \ (3)$

It is then natural to ask whether the energy conservation law continues to hold for lower regularity solutions, in particular weak solutions that only obey (1), (2) in a distributional sense. The above argument no longer works as stated, because ${u^i}$ is not a test function and so one cannot immediately integrate (1) against ${u^i}$. And indeed, as we shall soon see, it is now known that once the regularity of ${u}$ is low enough, energy can “escape to frequency infinity”, leading to failure of the energy conservation law, a phenomenon known in physics as anomalous energy dissipation.

But what is the precise level of regularity needed in order for this anomalous energy dissipation to occur? To make this question precise, we need a quantitative notion of regularity. One such measure is given by the Hölder space ${C^{0,\alpha}(\mathbf{T}_E \rightarrow {\bf R})}$ for ${0 < \alpha < 1}$, defined as the space of continuous functions ${f: \mathbf{T}_E \rightarrow {\bf R}}$ whose norm

$\displaystyle \| f \|_{C^{0,\alpha}(\mathbf{T}_E \rightarrow {\bf R})} := \sup_{x \in \mathbf{T}_E} |f(x)| + \sup_{x,y \in \mathbf{T}_E: x \neq y} \frac{|f(x)-f(y)|}{|x-y|^\alpha}$

is finite. The space ${C^{0,\alpha}}$ lies between the space ${C^0}$ of continuous functions and the space ${C^1}$ of continuously differentiable functions, and informally describes a space of functions that is “${\alpha}$ times differentiable” in some sense. The above derivation of the energy conservation law involved the integral

$\displaystyle \int_{\mathbf{T}_E} \eta_{il} u^l \partial_j (u^j u^i)\ dx$

that roughly speaking measures the fluctuation in energy. Informally, if we could take the derivative in this integrand and somehow “integrate by parts” to split the derivative “equally” amongst the three factors, one would morally arrive at an expression that resembles

$\displaystyle \int_{\mathbf{T}_E} \nabla^{1/3} u \nabla^{1/3} u \nabla^{1/3} u\ dx$

which suggests that the integral can be made sense of for ${u \in C^0_t C^{0,\alpha}_x}$ once ${\alpha > 1/3}$. More precisely, one can make

Conjecture 1 (Onsager’s conjecture) Let ${0 < \alpha < 1}$ and ${d \geq 2}$, and let ${0 < T < \infty}$.

• (i) If ${\alpha > 1/3}$, then any weak solution ${u \in C^0_t C^{0,\alpha}([0,T] \times \mathbf{T}_E \rightarrow {\bf R})}$ to the Euler equations (in the Leray form ${\partial_t u + \partial_j {\mathbb P} (u^j u) = u_0(x) \delta_0(t)}$) obeys the energy conservation law (3).
• (ii) If ${\alpha \leq 1/3}$, then there exist weak solutions ${u \in C^0_t C^{0,\alpha}([0,T] \times \mathbf{T}_E \rightarrow {\bf R})}$ to the Euler equations (in Leray form) which do not obey energy conservation.

This conjecture was originally arrived at by Onsager by a somewhat different heuristic derivation; see Remark 7. The numerology is also compatible with that arising from the Kolmogorov theory of turbulence (discussed in this previous post), but we will not discuss this interesting connection further here.
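To get a feel for the Hölder norm used in Conjecture 1, here is a minimal numerical sketch in one dimension. It takes the test function ${f(x) = |\sin(\pi x)|^{1/2}}$ on ${{\bf R}/{\bf Z}}$ (our own choice, not one used in these notes), which has a square-root cusp at the origin, and estimates the Hölder quotient ${\sup_{x \neq y} |f(x)-f(y)|/|x-y|^\alpha}$ over a uniform grid. The names `torus_dist`, `holder_seminorm` and `cusp` are hypothetical helpers. One expects the estimate to stabilise as the grid is refined when ${\alpha \leq 1/2}$, and to grow without bound when ${\alpha > 1/2}$.

```python
import math

def torus_dist(x, y):
    """Distance between x and y on the circle R/Z."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

def holder_seminorm(f, alpha, n):
    """Largest Holder quotient |f(x)-f(y)| / |x-y|^alpha over pairs of an n-point grid."""
    pts = [k / n for k in range(n)]
    vals = [f(x) for x in pts]
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = torus_dist(pts[i], pts[j])
            best = max(best, abs(vals[i] - vals[j]) / d ** alpha)
    return best

def cusp(x):
    """Test function with a square-root cusp at x = 0 (illustrative choice)."""
    return math.sqrt(abs(math.sin(math.pi * x)))

for n in (50, 100, 200):
    # Print the grid size and the estimates for alpha = 1/4 and alpha = 3/4.
    print(n, holder_seminorm(cusp, 0.25, n), holder_seminorm(cusp, 0.75, n))
```

A grid computation of this type can only give a lower bound for the true supremum, so it can suggest, but not prove, membership in ${C^{0,\alpha}}$.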

The positive part (i) of Onsager's conjecture was established by Constantin, E, and Titi, building upon earlier partial results by Eyink; the proof is a relatively straightforward application of Littlewood-Paley theory, and they were also able to work in larger function spaces than ${C^0_t C^{0,\alpha}_x}$ (using ${L^3_x}$-based Besov spaces instead of Hölder spaces, see Exercise 3 below). The negative part (ii) is harder. Discontinuous weak solutions to the Euler equations that did not conserve energy were first constructed by Scheffer, with an alternate construction later given by Shnirelman. De Lellis and Szekelyhidi noticed the resemblance of this problem to that of the Nash-Kuiper theorem in the isometric embedding problem, and began adapting the convex integration technique used in that theorem to construct weak solutions of the Euler equations. This began a long series of papers in which increasingly regular weak solutions that failed to conserve energy were constructed, culminating in a recent paper of Isett establishing part (ii) of the Onsager conjecture in the non-endpoint case ${\alpha < 1/3}$ in three and higher dimensions ${d \geq 3}$; the endpoint ${\alpha = 1/3}$ remains open. (In two dimensions it may be the case that the positive results extend to a larger range than Onsager's conjecture predicts; see this paper of Cheskidov, Lopes Filho, Nussenzveig Lopes, and Shvydkoy for more discussion.) Further work continues into several variations of the Onsager conjecture, in which one looks at other differential equations, other function spaces, or other criteria for bad behavior than breakdown of energy conservation. See this recent survey of de Lellis and Szekelyhidi for more discussion.

In these notes we will first establish (i), then discuss the convex integration method in the original context of the Nash-Kuiper embedding theorem. Before tackling the Onsager conjecture (ii) directly, we discuss a related construction of high-dimensional weak solutions in the Sobolev space ${L^2_t H^s_x}$ for ${s}$ close to ${1/2}$, which is slightly easier to establish, though still rather intricate. Finally, we discuss the modifications of that construction needed to establish (ii), though we shall stop short of a full proof of that part of the conjecture.

We thank Phil Isett for some comments and corrections.

These lecture notes are a continuation of the 254A lecture notes from the previous quarter.

We consider the Euler equations for incompressible fluid flow on a Euclidean space ${{\bf R}^d}$; we will label ${{\bf R}^d}$ as the “Eulerian space” ${{\bf R}^d_E}$ (or “Euclidean space”, or “physical space”) to distinguish it from the “Lagrangian space” ${{\bf R}^d_L}$ (or “labels space”) that we will introduce shortly (but the reader is free to also ignore the ${E}$ or ${L}$ subscripts if he or she wishes). Elements of Eulerian space ${{\bf R}^d_E}$ will be referred to by symbols such as ${x}$; we use ${dx}$ to denote Lebesgue measure on ${{\bf R}^d_E}$, use ${x^1,\dots,x^d}$ for the ${d}$ coordinates of ${x}$, and use indices such as ${i,j,k}$ to index these coordinates (with the usual summation conventions), for instance ${\partial_i}$ denotes partial differentiation along the ${x^i}$ coordinate. (We use superscripts for coordinates ${x^i}$ instead of subscripts ${x_i}$ to be compatible with some differential geometry notation that we will use shortly; in particular, when using the summation notation, we will now be matching subscripts with superscripts for the pair of indices being summed.)

In Eulerian coordinates, the Euler equations read

$\displaystyle \partial_t u + u \cdot \nabla u = - \nabla p \ \ \ \ \ (1)$

$\displaystyle \nabla \cdot u = 0$

where ${u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E}$ is the velocity field and ${p: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}}$ is the pressure field. These are functions of the time variable ${t \in [0,T)}$ and the spatial location variable ${x \in {\bf R}^d_E}$. We will refer to the coordinates ${(t,x) = (t,x^1,\dots,x^d)}$ as Eulerian coordinates. However, if one reviews the physical derivation of the Euler equations from 254A Notes 0, before one takes the continuum limit, the fundamental unknowns were not the velocity field ${u}$ or the pressure field ${p}$, but rather the trajectories ${(x^{(a)}(t))_{a \in A}}$, which can be thought of as a single function ${x: [0,T) \times A \rightarrow {\bf R}^d_E}$ from the coordinates ${(t,a)}$ (where ${t}$ is a time and ${a}$ is an element of the label set ${A}$) to ${{\bf R}^d}$. The relationship between the trajectories ${x^{(a)}(t) = x(t,a)}$ and the velocity field was given by the informal relationship

$\displaystyle \partial_t x(t,a) \approx u( t, x(t,a) ). \ \ \ \ \ (2)$

We will refer to the coordinates ${(t,a)}$ as (discrete) Lagrangian coordinates for describing the fluid.

In view of this, it is natural to ask whether there is an alternate way to formulate the continuum limit of incompressible inviscid fluids, by using a continuous version ${(t,a)}$ of the Lagrangian coordinates, rather than Eulerian coordinates. This is indeed the case. Suppose for instance one has a smooth solution ${u, p}$ to the Euler equations on a spacetime slab ${[0,T) \times {\bf R}^d_E}$ in Eulerian coordinates; assume furthermore that the velocity field ${u}$ is uniformly bounded. We introduce another copy ${{\bf R}^d_L}$ of ${{\bf R}^d}$, which we call Lagrangian space or labels space; we use symbols such as ${a}$ to refer to elements of this space, ${da}$ to denote Lebesgue measure on ${{\bf R}^d_L}$, and ${a^1,\dots,a^d}$ to refer to the ${d}$ coordinates of ${a}$. We use indices such as ${\alpha,\beta,\gamma}$ to index these coordinates, thus for instance ${\partial_\alpha}$ denotes partial differentiation along the ${a^\alpha}$ coordinate. We will use summation conventions for both the Eulerian coordinates ${i,j,k}$ and the Lagrangian coordinates ${\alpha,\beta,\gamma}$, with an index being summed if it appears as both a subscript and a superscript in the same term. While ${{\bf R}^d_L}$ and ${{\bf R}^d_E}$ are of course isomorphic, we will try to refrain from identifying them, except perhaps at the initial time ${t=0}$ in order to fix the initialisation of Lagrangian coordinates.

Given a smooth and bounded velocity field ${u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E}$, define a trajectory map for this velocity to be any smooth map ${X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}$ that obeys the ODE

$\displaystyle \partial_t X(t,a) = u( t, X(t,a) ); \ \ \ \ \ (3)$

in view of (2), this describes the trajectory (in ${{\bf R}^d_E}$) of a particle labeled by an element ${a}$ of ${{\bf R}^d_L}$. From the Picard existence theorem and the hypothesis that ${u}$ is smooth and bounded, such a map exists and is unique as long as one specifies the initial location ${X(0,a)}$ assigned to each label ${a}$. Traditionally, one chooses the initial condition

$\displaystyle X(0,a) = a \ \ \ \ \ (4)$

for ${a \in {\bf R}^d_L}$, so that we label each particle by its initial location at time ${t=0}$; we are also free to specify other initial conditions for the trajectory map if we please. Indeed, we have the freedom to “permute” the labels ${a \in {\bf R}^d_L}$ by an arbitrary diffeomorphism: if ${X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}$ is a trajectory map, and ${\pi: {\bf R}^d_L \rightarrow{\bf R}^d_L}$ is any diffeomorphism (a smooth map whose inverse exists and is also smooth), then the map ${X \circ \pi: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}$ is also a trajectory map, albeit one with different initial conditions ${X(0,a)}$.
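To make the definition of the trajectory map concrete, here is a minimal numerical sketch that integrates the ODE (3) with the initial condition (4) by the classical fourth-order Runge-Kutta method. The velocity field is the shear flow ${u(t,x) = (\sin x^2, 0)}$ in ${d=2}$ (where ${x^2}$ denotes the second coordinate of ${x}$); this field, and the names `rk4_trajectory` and `shear`, are illustrative assumptions rather than anything used elsewhere in these notes. For this field the trajectory map is known explicitly, ${X(t,a) = (a^1 + t \sin a^2, a^2)}$, which gives a check on the numerics.

```python
import math

def rk4_trajectory(u, a, t_final, steps=200):
    """Approximate X(t_final, a), where dX/dt = u(t, X) and X(0, a) = a."""
    x = list(a)
    h = t_final / steps
    t = 0.0
    for _ in range(steps):
        k1 = u(t, x)
        k2 = u(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)])
        k3 = u(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)])
        k4 = u(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
             for xi, s1, s2, s3, s4 in zip(x, k1, k2, k3, k4)]
        t += h
    return x

def shear(t, x):
    """Illustrative smooth, bounded velocity field u(t, x) = (sin x^2, 0)."""
    return [math.sin(x[1]), 0.0]

label = (0.3, 1.1)
numerical = rk4_trajectory(shear, label, 2.0)
exact = (label[0] + 2.0 * math.sin(label[1]), label[1])
# Print the numerical and the exact position at time t = 2.
print(numerical, exact)
```

To use a different initial condition ${X(0,a) = X_0(a)}$, one would simply pass ${X_0(a)}$ in place of the label as the starting point; this corresponds to the relabeling freedom described above.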

Despite the popularity of the initial condition (4), we will try to keep conceptually separate the Eulerian space ${{\bf R}^d_E}$ from the Lagrangian space ${{\bf R}^d_L}$, as they play different physical roles in the interpretation of the fluid; for instance, while the Euclidean metric ${d\eta^2 = dx^1 dx^1 + \dots + dx^d dx^d}$ is an important feature of Eulerian space ${{\bf R}^d_E}$, it is not a geometrically natural structure to use in Lagrangian space ${{\bf R}^d_L}$. We have the following more general version of Exercise 8 from 254A Notes 2:

Exercise 1 Let ${u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E}$ be smooth and bounded.

• If ${X_0: {\bf R}^d_L \rightarrow {\bf R}^d_E}$ is a smooth map, show that there exists a unique smooth trajectory map ${X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}$ with initial condition ${X(0,a) = X_0(a)}$ for all ${a \in {\bf R}^d_L}$.
• Show that if ${X_0}$ is a diffeomorphism and ${t \in [0,T)}$, then the map ${X(t): a \mapsto X(t,a)}$ is also a diffeomorphism.

Remark 2 The first of the Euler equations (1) can now be written in the form

$\displaystyle \frac{d^2}{dt^2} X(t,a) = - (\nabla p)( t, X(t,a) ) \ \ \ \ \ (5)$

which can be viewed as a continuous limit of Newton’s second law ${m^{(a)} \frac{d^2}{dt^2} x^{(a)}(t) = F^{(a)}(t)}$.

Call a diffeomorphism ${Y: {\bf R}^d_L \rightarrow {\bf R}^d_E}$ (oriented) volume preserving if one has the equation

$\displaystyle \mathrm{det}( \nabla Y )(a) = 1 \ \ \ \ \ (6)$

for all ${a \in {\bf R}^d_L}$, where the total differential ${\nabla Y}$ is the ${d \times d}$ matrix with entries ${\partial_\alpha Y^i}$ for ${\alpha = 1,\dots,d}$ and ${i=1,\dots,d}$, where ${Y^1,\dots,Y^d:{\bf R}^d_L \rightarrow {\bf R}}$ are the components of ${Y}$. (If one wishes, one can also view ${\nabla Y}$ as a linear transformation from the tangent space ${T_a {\bf R}^d_L}$ of Lagrangian space at ${a}$ to the tangent space ${T_{Y(a)} {\bf R}^d_E}$ of Eulerian space at ${Y(a)}$.) Equivalently, ${Y}$ is orientation preserving and one has a Jacobian-free change of variables formula

$\displaystyle \int_{{\bf R}^d_L} f( Y(a) )\ da = \int_{{\bf R}^d_E} f(x)\ dx$

for all ${f \in C_c({\bf R}^d_E \rightarrow {\bf R})}$, which is in turn equivalent to ${Y(E) \subset {\bf R}^d_E}$ having the same Lebesgue measure as ${E}$ for any measurable set ${E \subset {\bf R}^d_L}$.

The divergence-free condition ${\nabla \cdot u = 0}$ then can be nicely expressed in terms of volume-preserving properties of the trajectory maps ${X}$, in a manner which confirms the interpretation of this condition as an incompressibility condition on the fluid:

Lemma 3 Let ${u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E}$ be smooth and bounded, let ${X_0: {\bf R}^d_L \rightarrow {\bf R}^d_E}$ be a volume-preserving diffeomorphism, and let ${X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}$ be the trajectory map. Then the following are equivalent:

• ${\nabla \cdot u = 0}$ on ${[0,T) \times {\bf R}^d_E}$.
• ${X(t): {\bf R}^d_L \rightarrow {\bf R}^d_E}$ is volume-preserving for all ${t \in [0,T)}$.

Proof: Since ${X_0}$ is orientation-preserving, we see from continuity that ${X(t)}$ is also orientation-preserving. Suppose first that ${X(t)}$ is also volume-preserving for every ${t \in [0,T)}$; then for any ${f \in C^\infty_c({\bf R}^d_E \rightarrow {\bf R})}$ we have the conservation law

$\displaystyle \int_{{\bf R}^d_L} f( X(t,a) )\ da = \int_{{\bf R}^d_E} f(x)\ dx$

for all ${t \in [0,T)}$. Differentiating in time using the chain rule and (3) we conclude that

$\displaystyle \int_{{\bf R}^d_L} (u(t) \cdot \nabla f)( X(t,a)) \ da = 0$

for all ${t \in [0,T)}$, and hence by change of variables

$\displaystyle \int_{{\bf R}^d_E} (u(t) \cdot \nabla f)(x) \ dx = 0$

which by integration by parts gives

$\displaystyle \int_{{\bf R}^d_E} (\nabla \cdot u(t,x)) f(x)\ dx = 0$

for all ${f \in C^\infty_c({\bf R}^d_E \rightarrow {\bf R})}$ and ${t \in [0,T)}$, so ${u}$ is divergence-free.

To prove the converse implication, it is convenient to introduce the labels map ${A:[0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_L}$, defined by setting ${A(t): {\bf R}^d_E \rightarrow {\bf R}^d_L}$ to be the inverse of the diffeomorphism ${X(t): {\bf R}^d_L \rightarrow {\bf R}^d_E}$, thus

$\displaystyle A(t, X(t,a)) = a$

for all ${(t,a) \in [0,T) \times {\bf R}^d_L}$. By the implicit function theorem, ${A}$ is smooth, and by differentiating the above equation in time using (3) we see that

$\displaystyle D_t A(t,x) = 0$

where ${D_t}$ is the usual material derivative

$\displaystyle D_t := \partial_t + u \cdot \nabla \ \ \ \ \ (7)$

acting on functions on ${[0,T) \times {\bf R}^d_E}$. If ${u}$ is divergence-free, we have from integration by parts that

$\displaystyle \partial_t \int_{{\bf R}^d_E} \phi(t,x)\ dx = \int_{{\bf R}^d_E} D_t \phi(t,x)\ dx$

for any test function ${\phi: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}}$. In particular, for any ${g \in C^\infty_c({\bf R}^d_L \rightarrow {\bf R})}$, we can calculate

$\displaystyle \partial_t \int_{{\bf R}^d_E} g( A(t,x) )\ dx = \int_{{\bf R}^d_E} D_t (g(A(t,x)))\ dx$

$\displaystyle = \int_{{\bf R}^d_E} 0\ dx$

and hence

$\displaystyle \int_{{\bf R}^d_E} g(A(t,x))\ dx = \int_{{\bf R}^d_E} g(A(0,x))\ dx$

for any ${t \in [0,T)}$. Since ${X_0}$ is volume-preserving, so is ${A(0)}$, thus

$\displaystyle \int_{{\bf R}^d_E} g \circ A(t)\ dx = \int_{{\bf R}^d_L} g\ da.$

Thus ${A(t)}$ is volume-preserving, and hence ${X(t)}$ is also. $\Box$
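Lemma 3 can be tested numerically in a simple case. The sketch below, in ${d=2}$, integrates the trajectory ODE (3) with initial condition (4) by a Runge-Kutta scheme, approximates the matrix ${\partial_\alpha X^i}$ by central differences in the label, and evaluates its determinant as in (6). The two velocity fields (one divergence-free, one not) and all function names are our own illustrative choices; the determinant should be close to ${1}$ in the first case and visibly different from ${1}$ in the second.

```python
import math

def flow(u, a, t_final, steps=200):
    """Approximate X(t_final, a) for dX/dt = u(t, X), X(0, a) = a, by classical RK4."""
    x, t, h = list(a), 0.0, t_final / steps
    for _ in range(steps):
        k1 = u(t, x)
        k2 = u(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)])
        k3 = u(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)])
        k4 = u(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
             for xi, s1, s2, s3, s4 in zip(x, k1, k2, k3, k4)]
        t += h
    return x

def jacobian_det(u, a, t_final, eps=1e-5):
    """Approximate det(nabla X(t_final))(a) in two dimensions by central differences."""
    cols = []
    for k in range(2):
        a_plus, a_minus = list(a), list(a)
        a_plus[k] += eps
        a_minus[k] -= eps
        x_plus, x_minus = flow(u, a_plus, t_final), flow(u, a_minus, t_final)
        # cols[k][i] approximates the derivative of X^i in the k-th label direction
        cols.append([(p - m) / (2 * eps) for p, m in zip(x_plus, x_minus)])
    return cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]

def div_free(t, x):
    """u = (sin x^2, sin x^1): each component is independent of its own variable."""
    return [math.sin(x[1]), math.sin(x[0])]

def compressible(t, x):
    """u = (sin x^1, 0): divergence cos x^1, which is not identically zero."""
    return [math.sin(x[0]), 0.0]

print(jacobian_det(div_free, (0.4, 1.3), 1.0))
print(jacobian_det(compressible, (0.5, 1.3), 1.0))
```

This only samples the determinant at a single label and time, so it is a sanity check on the statement rather than evidence for it in general.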

Exercise 4 Let ${M: [0,T) \rightarrow \mathrm{GL}_d({\bf R})}$ be a continuously differentiable map from the time interval ${[0,T)}$ to the general linear group ${\mathrm{GL}_d({\bf R})}$ of invertible ${d \times d}$ matrices. Establish Jacobi’s formula

$\displaystyle \partial_t \det(M(t)) = \det(M(t)) \mathrm{tr}( M(t)^{-1} \partial_t M(t) )$

and use this and (6) to give an alternate proof of Lemma 3 that does not involve any integration in space.
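Before attempting Exercise 4, it may be reassuring to test Jacobi's formula numerically. The sketch below compares a central difference approximation of ${\partial_t \det(M(t))}$ with ${\det(M(t)) \mathrm{tr}( M(t)^{-1} \partial_t M(t) )}$ for one particular ${2 \times 2}$ matrix-valued function; the choice of ${M}$ and the helper names are assumptions made purely for illustration, and a numerical check is of course no substitute for the proof requested in the exercise.

```python
import math

def M(t):
    """An illustrative matrix path; its determinant stays positive on [0, 1]."""
    return [[2.0 + math.sin(t), t], [t * t, 3.0 + math.cos(t)]]

def dM(t):
    """Entrywise time derivative of M(t)."""
    return [[math.cos(t), 1.0], [2.0 * t, -math.sin(t)]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def trace_of_product(p, q):
    """tr(PQ), computed as the sum over i, j of P[i][j] * Q[j][i]."""
    return sum(p[i][j] * q[j][i] for i in range(2) for j in range(2))

def jacobi_sides(t, eps=1e-6):
    """Return (finite-difference derivative of det M, right-hand side of Jacobi's formula)."""
    lhs = (det2(M(t + eps)) - det2(M(t - eps))) / (2 * eps)
    rhs = det2(M(t)) * trace_of_product(inv2(M(t)), dM(t))
    return lhs, rhs

print(jacobi_sides(0.7))
```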

Remark 5 One can view the use of Lagrangian coordinates as an extension of the method of characteristics. Indeed, from the chain rule we see that for any smooth function ${f: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}}$ of Eulerian spacetime, one has

$\displaystyle \frac{d}{dt} f(t,X(t,a)) = (D_t f)(t,X(t,a))$

and hence any transport equation that in Eulerian coordinates takes the form

$\displaystyle D_t f = g$

for smooth functions ${f,g: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}}$ of Eulerian spacetime is equivalent to the ODE

$\displaystyle \frac{d}{dt} F = G$

where ${F,G: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}}$ are the smooth functions of Lagrangian spacetime defined by

$\displaystyle F(t,a) := f(t,X(t,a)); \quad G(t,a) := g(t,X(t,a)).$
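As a small worked instance of Remark 5, take ${d=2}$, the shear flow ${u(t,x) = (\sin x^2, 0)}$, and the source term ${g(t,x) = \cos x^2}$ (all illustrative assumptions, as are the function names below). With the initial condition (4) the trajectory map is ${X(t,a) = (a^1 + t \sin a^2, a^2)}$, so ${G(t,a) = \cos a^2}$ is constant in time and the ODE ${\frac{d}{dt} F = G}$ integrates to ${F(t,a) = f_0(a) + t \cos a^2}$. Composing with the labels map ${A(t,x) = (x^1 - t \sin x^2, x^2)}$ then gives a candidate solution ${f}$ in Eulerian coordinates, and the sketch checks by finite differences that ${D_t f - g}$ is numerically negligible.

```python
import math

def f0(a1, a2):
    """Illustrative initial datum f(0, x)."""
    return math.sin(a1) * math.cos(a2)

def labels(t, x1, x2):
    """Labels map A(t, x) for the shear flow: the inverse of X(t, a) = (a1 + t sin a2, a2)."""
    return x1 - t * math.sin(x2), x2

def f(t, x1, x2):
    """Eulerian solution f(t, x) = F(t, A(t, x)), with F(t, a) = f0(a) + t cos(a2)."""
    a1, a2 = labels(t, x1, x2)
    return f0(a1, a2) + t * math.cos(a2)

def transport_residual(t, x1, x2, eps=1e-5):
    """Central difference approximation of D_t f - g at the point (t, x)."""
    dt_f = (f(t + eps, x1, x2) - f(t - eps, x1, x2)) / (2 * eps)
    d1_f = (f(t, x1 + eps, x2) - f(t, x1 - eps, x2)) / (2 * eps)
    # u . grad f = sin(x2) * d_1 f, since the second component of u vanishes
    return dt_f + math.sin(x2) * d1_f - math.cos(x2)

print(transport_residual(0.8, 0.3, 1.2))
```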

In this set of notes we recall some basic differential geometry notation, particularly with regard to pullbacks and Lie derivatives of differential forms and other tensor fields on manifolds such as ${{\bf R}^d_E}$ and ${{\bf R}^d_L}$, and explore how the Euler equations look in this notation. Our discussion will be entirely formal in nature; we will assume that all functions have enough smoothness and decay at infinity to justify the relevant calculations. (It is possible to work rigorously in Lagrangian coordinates – see for instance the work of Ebin and Marsden – but we will not do so here.) As a general rule, Lagrangian coordinates tend to be somewhat less convenient to use than Eulerian coordinates for establishing the basic analytic properties of the Euler equations, such as local existence, uniqueness, and continuous dependence on the data; however, they are quite good at clarifying the more algebraic properties of these equations, such as conservation laws and the variational nature of the equations. It may well be that in the future we will be able to use the Lagrangian formalism more effectively on the analytic side of the subject also.

Remark 6 One can also write the Navier-Stokes equations in Lagrangian coordinates, but the equations are not expressed in a favourable form in these coordinates, as the Laplacian ${\Delta}$ appearing in the viscosity term becomes replaced with a time-varying Laplace-Beltrami operator. As such, we will not discuss the Lagrangian coordinate formulation of Navier-Stokes here.