You are currently browsing the category archive for the ‘math.NT’ category.

Previous set of notes: Notes 3. Next set of notes: 246C Notes 1.

One of the great classical triumphs of complex analysis was in providing the first complete proof (by Hadamard and de la Vallée Poussin in 1896) of arguably the most important theorem in analytic number theory, the prime number theorem:

Theorem 1 (Prime number theorem) Let {\pi(x)} denote the number of primes less than a given real number {x}. Then

\displaystyle  \lim_{x \rightarrow \infty} \frac{\pi(x)}{x/\ln x} = 1

(or in asymptotic notation, {\pi(x) = (1+o(1)) \frac{x}{\ln x}} as {x \rightarrow \infty}).

(Actually, it turns out to be slightly more natural to replace the approximation {\frac{x}{\ln x}} in the prime number theorem by the logarithmic integral {\int_2^x \frac{dt}{\ln t}}, which turns out to be a more precise approximation, but we will not stress this point here.)

The complex-analytic proof of this theorem hinges on the study of a key meromorphic function related to the prime numbers, the Riemann zeta function {\zeta}. Initially, it is only defined on the half-plane {\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}:

Definition 2 (Riemann zeta function, preliminary definition) Let {s \in {\bf C}} be such that {\mathrm{Re} s > 1}. Then we define

\displaystyle  \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}. \ \ \ \ \ (1)

Note that the series is locally uniformly convergent in the half-plane {\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}, so in particular {\zeta} is holomorphic on this region. In previous notes we have already evaluated some special values of this function:

\displaystyle  \zeta(2) = \frac{\pi^2}{6}; \quad \zeta(4) = \frac{\pi^4}{90}; \quad \zeta(6) = \frac{\pi^6}{945}. \ \ \ \ \ (2)

However, it turns out that the zeroes (and pole) of this function are of far greater importance to analytic number theory, particularly with regards to the study of the prime numbers.

The Riemann zeta function has several remarkable properties, some of which we summarise here:

Theorem 3 (Basic properties of the Riemann zeta function)
  • (i) (Euler product formula) For any {s \in {\bf C}} with {\mathrm{Re} s > 1}, we have

    \displaystyle  \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1} \ \ \ \ \ (3)

    where the product is absolutely convergent (and locally uniform in {s}) and is over the prime numbers {p = 2, 3, 5, \dots}.
  • (ii) (Trivial zero-free region) {\zeta(s)} has no zeroes in the region {\{s: \mathrm{Re}(s) > 1 \}}.
  • (iii) (Meromorphic continuation) {\zeta} has a unique meromorphic continuation to the complex plane (which by abuse of notation we also call {\zeta}), with a simple pole at {s=1} and no other poles. Furthermore, the Riemann xi function

    \displaystyle  \xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(s/2) \zeta(s) \ \ \ \ \ (4)

    is an entire function of order {1} (after removing all singularities). The function {(s-1) \zeta(s)} is an entire function of order one after removing the singularity at {s=1}.
  • (iv) (Functional equation) After applying the meromorphic continuation from (iii), we have

    \displaystyle  \zeta(s) = 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s) \zeta(1-s) \ \ \ \ \ (5)

    for all {s \in {\bf C}} (excluding poles). Equivalently, we have

    \displaystyle  \xi(s) = \xi(1-s) \ \ \ \ \ (6)

    for all {s \in {\bf C}}. (The equivalence between the (5) and (6) is a routine consequence of the Euler reflection formula and the Legendre duplication formula, see Exercises 26 and 31 of Notes 1.)

Proof: We just prove (i) and (ii) for now, leaving (iii) and (iv) for later sections.

The claim (i) is an encoding of the fundamental theorem of arithmetic, which asserts that every natural number {n} is uniquely representable as a product {n = \prod_p p^{a_p}} over primes, where the {a_p} are natural numbers, all but finitely many of which are zero. Writing this representation as {\frac{1}{n^s} = \prod_p \frac{1}{p^{a_p s}}}, we see that

\displaystyle  \sum_{n \in S_{x,m}} \frac{1}{n^s} = \prod_{p \leq x} \sum_{a=0}^m \frac{1}{p^{as}}

whenever {x \geq 1}, {m \geq 0}, and {S_{x,m}} consists of all the natural numbers of the form {n = \prod_{p \leq x} p^{a_p}} for some {a_p \leq m}. Sending {m} and {x} to infinity, we conclude from monotone convergence and the geometric series formula that

\displaystyle  \sum_{n=1}^\infty \frac{1}{n^s} = \prod_{p} \sum_{a=0}^\infty \frac{1}{p^{s}} =\prod_p (1 - \frac{1}{p^s})^{-1}

whenever {s>1} is real, and then from dominated convergence we see that the same formula holds for complex {s} with {\mathrm{Re} s > 1} as well. Local uniform convergence then follows from the product form of the Weierstrass {M}-test (Exercise 19 of Notes 1).

The claim (ii) is immediate from (i) since the Euler product {\prod_p (1-\frac{1}{p^s})^{-1}} is absolutely convergent and all terms are non-zero. \Box

We remark that by sending {s} to {1} in Theorem 3(i) we conclude that

\displaystyle  \sum_{n=1}^\infty \frac{1}{n} = \prod_p (1-\frac{1}{p})^{-1}

and from the divergence of the harmonic series we then conclude Euler’s theorem {\sum_p \frac{1}{p} = \infty}. This can be viewed as a weak version of the prime number theorem, and already illustrates the potential applicability of the Riemann zeta function to control the distribution of the prime numbers.

The meromorphic continuation (iii) of the zeta function is initially surprising, but can be interpreted either as a manifestation of the extremely regular spacing of the natural numbers {n} occurring in the sum (1), or as a consequence of various integral representations of {\zeta} (or slight modifications thereof). We will focus in this set of notes on a particular representation of {\zeta} as essentially the Mellin transform of the theta function {\theta} that briefly appeared in previous notes, and the functional equation (iv) can then be viewed as a consequence of the modularity of that theta function. This in turn was established using the Poisson summation formula, so one can view the functional equation as ultimately being a manifestation of Poisson summation. (For a direct proof of the functional equation via Poisson summation, see these notes.)

Henceforth we work with the meromorphic continuation of {\zeta}. The functional equation (iv), when combined with special values of {\zeta} such as (2), gives some additional values of {\zeta} outside of its initial domain {\{s: \mathrm{Re} s > 1\}}, most famously

\displaystyle  \zeta(-1) = -\frac{1}{12}.

If one formally compares this formula with (1), one arrives at the infamous identity

\displaystyle  1 + 2 + 3 + \dots = -\frac{1}{12}

although this identity has to be interpreted in a suitable non-classical sense in order for it to be rigorous (see this previous blog post for further discussion).

From Theorem 3 and the non-vanishing nature of {\Gamma}, we see that {\zeta} has simple zeroes (known as trivial zeroes) at the negative even integers {-2, -4, \dots}, and all other zeroes (the non-trivial zeroes) inside the critical strip {\{ s \in {\bf C}: 0 \leq \mathrm{Re} s \leq 1 \}}. (The non-trivial zeroes are conjectured to all be simple, but this is hopelessly far from being proven at present.) As we shall see shortly, these latter zeroes turn out to be closely related to the distribution of the primes. The functional equation tells us that if {\rho} is a non-trivial zero then so is {1-\rho}; also, we have the identity

\displaystyle  \zeta(s) = \overline{\zeta(\overline{s})} \ \ \ \ \ (7)

for all {s>1} by (1), hence for all {s} (except the pole at {s=1}) by meromorphic continuation. Thus if {\rho} is a non-trivial zero then so is {\overline{\rho}}. We conclude that the set of non-trivial zeroes is symmetric by reflection by both the real axis and the critical line {\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}. We have the following infamous conjecture:

Conjecture 4 (Riemann hypothesis) All the non-trivial zeroes of {\zeta} lie on the critical line {\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}.

This conjecture would have many implications in analytic number theory, particularly with regard to the distribution of the primes. Of course, it is far from proven at present, but the partial results we have towards this conjecture are still sufficient to establish results such as the prime number theorem.

Return now to the original region where {\mathrm{Re} s > 1}. To take more advantage of the Euler product formula (3), we take complex logarithms to conclude that

\displaystyle  -\log \zeta(s) = \sum_p \log(1 - \frac{1}{p^s})

for suitable branches of the complex logarithm, and then on taking derivatives (using for instance the generalised Cauchy integral formula and Fubini’s theorem to justify the interchange of summation and derivative) we see that

\displaystyle  -\frac{\zeta'(s)}{\zeta(s)} = \sum_p \frac{\ln p/p^s}{1 - \frac{1}{p^s}}.

From the geometric series formula we have

\displaystyle  \frac{\ln p/p^s}{1 - \frac{1}{p^s}} = \sum_{j=1}^\infty \frac{\ln p}{p^{js}}

and so (by another application of Fubini’s theorem) we have the identity

\displaystyle  -\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}, \ \ \ \ \ (8)

for {\mathrm{Re} s > 1}, where the von Mangoldt function {\Lambda(n)} is defined to equal {\Lambda(n) = \ln p} whenever {n = p^j} is a power {p^j} of a prime {p} for some {j=1,2,\dots}, and {\Lambda(n)=0} otherwise. The contribution of the higher prime powers {p^2, p^3, \dots} is negligible in practice, and as a first approximation one can think of the von Mangoldt function as the indicator function of the primes, weighted by the logarithm function.

The series {\sum_{n=1}^\infty \frac{1}{n^s}} and {\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}} that show up in the above formulae are examples of Dirichlet series, which are a convenient device to transform various sequences of arithmetic interest into holomorphic or meromorphic functions. Here are some more examples:

Exercise 5 (Standard Dirichlet series) Let {s} be a complex number with {\mathrm{Re} s > 1}.
  • (i) Show that {-\zeta'(s) = \sum_{n=1}^\infty \frac{\ln n}{n^s}}.
  • (ii) Show that {\zeta^2(s) = \sum_{n=1}^\infty \frac{\tau(n)}{n^s}}, where {\tau(n) := \sum_{d|n} 1} is the divisor function of {n} (the number of divisors of {n}).
  • (iii) Show that {\frac{1}{\zeta(s)} = \sum_{n=1}^\infty \frac{\mu(n)}{n^s}}, where {\mu(n)} is the Möbius function, defined to equal {(-1)^k} when {n} is the product of {k} distinct primes for some {k \geq 0}, and {0} otherwise.
  • (iv) Show that {\frac{\zeta(2s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\lambda(n)}{n^s}}, where {\lambda(n)} is the Liouville function, defined to equal {(-1)^k} when {n} is the product of {k} (not necessarily distinct) primes for some {k \geq 0}.
  • (v) Show that {\log \zeta(s) = \sum_{n=1}^\infty \frac{\Lambda(n)/\ln n}{n^s}}, where {\log \zeta} is the holomorphic branch of the logarithm that is real for {s>1}, and with the convention that {\Lambda(n)/\ln n} vanishes for {n=1}.
  • (vi) Use the fundamental theorem of arithmetic to show that the von Mangoldt function is the unique function {\Lambda: {\bf N} \rightarrow {\bf R}} such that

    \displaystyle  \ln n = \sum_{d|n} \Lambda(d)

    for every positive integer {n}. Use this and (i) to provide an alternate proof of the identity (8). Thus we see that (8) is really just another encoding of the fundamental theorem of arithmetic.

Given the appearance of the von Mangoldt function {\Lambda}, it is natural to reformulate the prime number theorem in terms of this function:

Theorem 6 (Prime number theorem, von Mangoldt form) One has

\displaystyle  \lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} \Lambda(n) = 1

(or in asymptotic notation, {\sum_{n\leq x} \Lambda(n) = x + o(x)} as {x \rightarrow \infty}).

Let us see how Theorem 6 implies Theorem 1. Firstly, for any {x \geq 2}, we can write

\displaystyle  \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + \sum_{j=2}^\infty \sum_{p \leq x^{1/j}} \ln p.

The sum {\sum_{p \leq x^{1/j}} \ln p} is non-zero for only {O(\ln x)} values of {j}, and is of size {O( x^{1/2} \ln x )}, thus

\displaystyle  \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + O( x^{1/2} \ln^2 x ).

Since {x^{1/2} \ln^2 x = o(x)}, we conclude from Theorem 6 that

\displaystyle  \sum_{p \leq x} \ln p = x + o(x)

as {x \rightarrow \infty}. Next, observe from the fundamental theorem of calculus that

\displaystyle  \frac{1}{\ln p} - \frac{1}{\ln x} = \int_p^x \frac{1}{\ln^2 y} \frac{dy}{y}.

Multiplying by {\log p} and summing over all primes {p \leq x}, we conclude that

\displaystyle  \pi(x) - \frac{\sum_{p \leq x} \ln p}{\ln x} = \int_2^x \sum_{p \leq y} \ln p \frac{1}{\ln^2 y} \frac{dy}{y}.

From Theorem 6 we certainly have {\sum_{p \leq y} \ln p = O(y)}, thus

\displaystyle  \pi(x) - \frac{x + o(x)}{\ln x} = O( \int_2^x \frac{dy}{\ln^2 y} ).

By splitting the integral into the ranges {2 \leq y \leq \sqrt{x}} and {\sqrt{x} < y \leq x} we see that the right-hand side is {o(x/\ln x)}, and Theorem 1 follows.

Exercise 7 Show that Theorem 1 conversely implies Theorem 6.

The alternate form (8) of the Euler product identity connects the primes (represented here via proxy by the von Mangoldt function) with the logarithmic derivative of the zeta function, and can be used as a starting point for describing further relationships between {\zeta} and the primes. Most famously, we shall see later in these notes that it leads to the remarkably precise Riemann-von Mangoldt explicit formula:

Theorem 8 (Riemann-von Mangoldt explicit formula) For any non-integer {x > 1}, we have

\displaystyle  \sum_{n \leq x} \Lambda(n) = x - \lim_{T \rightarrow \infty} \sum_{\rho: |\hbox{Im}(\rho)| \leq T} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2} \ln( 1 - x^{-2} )

where {\rho} ranges over the non-trivial zeroes of {\zeta} with imaginary part in {[-T,T]}. Furthermore, the convergence of the limit is locally uniform in {x}.

Actually, it turns out that this formula is in some sense too precise; in applications it is often more convenient to work with smoothed variants of this formula in which the sum on the left-hand side is smoothed out, but the contribution of zeroes with large imaginary part is damped; see Exercise 22. Nevertheless, this formula clearly illustrates how the non-trivial zeroes {\rho} of the zeta function influence the primes. Indeed, if one formally differentiates the above formula in {x}, one is led to the (quite nonrigorous) approximation

\displaystyle  \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (9)

or (writing {\rho = \sigma+i\gamma})

\displaystyle  \Lambda(n) \approx 1 - \sum_{\sigma+i\gamma} \frac{n^{i\gamma}}{n^{1-\sigma}}.

Thus we see that each zero {\rho = \sigma + i\gamma} induces an oscillation in the von Mangoldt function, with {\gamma} controlling the frequency of the oscillation and {\sigma} the rate to which the oscillation dies out as {n \rightarrow \infty}. This relationship is sometimes known informally as “the music of the primes”.

Comparing Theorem 8 with Theorem 6, it is natural to suspect that the key step in the proof of the latter is to establish the following slight but important extension of Theorem 3(ii), which can be viewed as a very small step towards the Riemann hypothesis:

Theorem 9 (Slight enlargement of zero-free region) There are no zeroes of {\zeta} on the line {\{ 1+it: t \in {\bf R} \}}.

It is not quite immediate to see how Theorem 6 follows from Theorem 8 and Theorem 9, but we will demonstrate it below the fold.

Although Theorem 9 only seems like a slight improvement of Theorem 3(ii), proving it is surprisingly non-trivial. The basic idea is the following: if there was a zero at {1+it}, then there would also be a different zero at {1-it} (note {t} cannot vanish due to the pole at {s=1}), and then the approximation (9) becomes

\displaystyle  \Lambda(n) \approx 1 - n^{it} - n^{-it} + \dots = 1 - 2 \cos(t \log n) + \dots.

But the expression {1 - 2 \cos(t \log n)} can be negative for large regions of the variable {n}, whereas {\Lambda(n)} is always non-negative. This conflict eventually leads to a contradiction, but it is not immediately obvious how to make this argument rigorous. We will present here the classical approach to doing so using a trigonometric identity of Mertens.

In fact, Theorem 9 is basically equivalent to the prime number theorem:

Exercise 10 For the purposes of this exercise, assume Theorem 6, but do not assume Theorem 9. For any non-zero real {t}, show that

\displaystyle  -\frac{\zeta'(\sigma+it)}{\zeta(\sigma+it)} = o( \frac{1}{\sigma-1})

as {\sigma \rightarrow 1^+}, where {o( \frac{1}{\sigma-1})} denotes a quantity that goes to zero as {\sigma \rightarrow 1^+} after being multiplied by {\sigma-1}. Use this to derive Theorem 9.

This equivalence can help explain why the prime number theorem is remarkably non-trivial to prove, and why the Riemann zeta function has to be either explicitly or implicitly involved in the proof.

This post is only intended as the briefest of introduction to complex-analytic methods in analytic number theory; also, we have not chosen the shortest route to the prime number theorem, electing instead to travel in directions that particularly showcase the complex-analytic results introduced in this course. For some further discussion see this previous set of lecture notes, particularly Notes 2 and Supplement 3 (with much of the material in this post drawn from the latter).

Read the rest of this entry »

Previous set of notes: Notes 2. Next set of notes: Notes 4.

On the real line, the quintessential examples of a periodic function are the (normalised) sine and cosine functions {\sin(2\pi x)}, {\cos(2\pi x)}, which are {1}-periodic in the sense that

\displaystyle  \sin(2\pi(x+1)) = \sin(2\pi x); \quad \cos(2\pi (x+1)) = \cos(2\pi x).

By taking various polynomial combinations of {\sin(2\pi x)} and {\cos(2\pi x)} we obtain more general trigonometric polynomials that are {1}-periodic; and the theory of Fourier series tells us that all other {1}-periodic functions (with reasonable integrability conditions) can be approximated in various senses by such polynomial combinations. Using Euler’s identity, one can use {e^{2\pi ix}} and {e^{-2\pi ix}} in place of {\sin(2\pi x)} and {\cos(2\pi x)} as the basic generating functions here, provided of course one is willing to use complex coefficients instead of real ones. Of course, by rescaling one can also make similar statements for other periods than {1}. {1}-periodic functions {f: {\bf R} \rightarrow {\bf C}} can also be identified (by abuse of notation) with functions {f: {\bf R}/{\bf Z} \rightarrow {\bf C}} on the quotient space {{\bf R}/{\bf Z}} (known as the additive {1}-torus or additive unit circle), or with functions {f: [0,1] \rightarrow {\bf C}} on the fundamental domain (up to boundary) {[0,1]} of that quotient space with the periodic boundary condition {f(0)=f(1)}. The map {x \mapsto (\cos(2\pi x), \sin(2\pi x))} also identifies the additive unit circle {{\bf R}/{\bf Z}} with the geometric unit circle {S^1 = \{ (x,y) \in {\bf R}^2: x^2+y^2=1\} \subset {\bf R}^2}, thanks in large part to the fundamental trigonometric identity {\cos^2 x + \sin^2 x = 1}; this can also be identified with the multiplicative unit circle {S^1 = \{ z \in {\bf C}: |z|=1 \}}. (Usually by abuse of notation we refer to all of these three sets simultaneously as the “unit circle”.) Trigonometric polynomials on the additive unit circle then correspond to ordinary polynomials of the real coefficients {x,y} of the geometric unit circle, or Laurent polynomials of the complex variable {z}.

What about periodic functions on the complex plane? We can start with singly periodic functions {f: {\bf C} \rightarrow {\bf C}} which obey a periodicity relationship {f(z+\omega)=f(z)} for all {z} in the domain and some period {\omega \in {\bf C} \backslash \{0\}}; such functions can also be viewed as functions on the “additive cylinder” {\omega {\bf Z} \backslash {\bf C}} (or equivalently {{\bf C} / \omega {\bf Z}}). We can rescale {\omega=1} as before. For holomorphic functions, we have the following characterisations:

Proposition 1 (Description of singly periodic holomorphic functions)
  • (i) Every {1}-periodic entire function {f: {\bf C} \rightarrow {\bf C}} has an absolutely convergent expansion

    \displaystyle  f(z) = \sum_{n=-\infty}^\infty a_n e^{2\pi i nz} = \sum_{n=-\infty}^\infty a_n q^n \ \ \ \ \ (1)

    where {q} is the nome {q := e^{2\pi i z}}, and the {a_n} are complex coefficients such that

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} = \limsup_{n \rightarrow +\infty} |a_{-n}|^{1/n} = 0. \ \ \ \ \ (2)

    Conversely, every doubly infinite sequence {(a_n)_{n \in {\bf Z}}} of coefficients obeying (2) gives rise to a {1}-periodic entire function {f: {\bf C} \rightarrow {\bf C}} via the formula (1).
  • (ii) Every bounded {1}-periodic holomorphic function {f: {\bf H} \rightarrow {\bf C}} on the upper half-plane {\{ z: \mathrm{Im}(z) > 0\}} has an expansion

    \displaystyle  f(z) = \sum_{n=0}^\infty a_n e^{2\pi i nz} = \sum_{n=0}^\infty a_n q^n \ \ \ \ \ (3)

    where the {a_n} are complex coefficients such that

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} \leq 1. \ \ \ \ \ (4)

    Conversely, every infinite sequence {(a_n)_{n \in {\bf Z}}} obeying (4) gives rise to a {1}-periodic holomorphic function {f: {\bf H} \rightarrow {\bf C}} which is bounded away from the real axis (i.e., bounded on {\{ z: \mathrm{Im}(z) \geq \varepsilon\}} for every {\varepsilon > 0}).
In both cases, the coefficients {a_n} can be recovered from {f} by the Fourier inversion formula

\displaystyle  a_n = \int_{\gamma_{z_0 \rightarrow z_0+1}} f(z) e^{-2\pi i nz}\ dz \ \ \ \ \ (5)

for any {z_0} in {{\bf C}} (in case (i)) or {{\bf H}} (in case (ii)).

Proof: If {f: {\bf C} \rightarrow {\bf C}} is {1}-periodic, then it can be expressed as {f(z) = F(q) = F(e^{2\pi i z})} for some function {F: {\bf C} \backslash \{0\} \rightarrow {\bf C}} on the “multiplicative cylinder” {{\bf C} \backslash \{0\}}, since the fibres of the map {z \mapsto e^{2\pi i z}} are cosets of the integers {{\bf Z}}, on which {f} is constant by hypothesis. As the map {z \mapsto e^{2\pi i z}} is a covering map from {{\bf C}} to {{\bf C} \backslash \{0\}}, we see that {F} will be holomorphic if and only if {f} is. Thus {F} must have a Laurent series expansion {F(q) = \sum_{n=-\infty}^\infty a_n q^n} with coefficients {a_n} obeying (2), which gives (1), and the inversion formula (5) follows from the usual contour integration formula for Laurent series coefficients. The converse direction to (i) also follows by reversing the above arguments.

For part (ii), we observe that the map {z \mapsto e^{2\pi i z}} is also a covering map from {{\bf H}} to the punctured disk {D(0,1) \backslash \{0\}}, so we can argue as before except that now {F} is a bounded holomorphic function on the punctured disk. By the Riemann singularity removal theorem (Exercise 35 of 246A Notes 3) {F} extends to be holomorphic on all of {D(0,1)}, and thus has a Taylor expansion {F(q) = \sum_{n=0}^\infty a_n q^n} for some coefficients {a_n} obeying (4). The argument now proceeds as with part (i). \Box

The additive cylinder {{\bf Z} \backslash {\bf C}} and the multiplicative cylinder {{\bf C} \backslash \{0\}} can both be identified (on the level of smooth manifolds, at least) with the geometric cylinder {\{ (x,y,z) \in {\bf R}^3: x^2+y^2=1\}}, but we will not use this identification here.

Now let us turn attention to doubly periodic functions of a complex variable {z}, that is to say functions {f} that obey two periodicity relations

\displaystyle  f(z+\omega_1) = f(z); \quad f(z+\omega_2) = f(z)

for all {z \in {\bf C}} and some periods {\omega_1,\omega_2 \in {\bf C}}, which to avoid degeneracies we will assume to be linearly independent over the reals (thus {\omega_1,\omega_2} are non-zero and the ratio {\omega_2/\omega_1} is not real). One can rescale {\omega_1,\omega_2} by a common scaling factor {\lambda \in {\bf C} \backslash \{0\}} to normalise either {\omega_1=1} or {\omega_2=1}, but one of course cannot simultaneously normalise both parameters in this fashion. As in the singly periodic case, such functions can also be identified with functions on the additive {2}-torus {\Lambda \backslash {\bf C}}, where {\Lambda} is the lattice {\Lambda := \omega_1 {\bf Z} + \omega_2 {\bf Z}}, or with functions {f} on the solid parallelogram bounded by the contour {\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}} (a fundamental domain up to boundary for that torus), obeying the boundary periodicity conditions

\displaystyle  f(z+\omega_1) = f(z)

for {z} in the edge {\gamma_{\omega_2 \rightarrow 0}}, and

\displaystyle  f(z+\omega_2) = f(z)

for {z} in the edge {\gamma_{\omega_0 \rightarrow 1}}.

Within the world of holomorphic functions, the collection of doubly periodic functions is boring:

Proposition 2 Let {f: {\bf C} \rightarrow {\bf C}} be an entire doubly periodic function (with periods {\omega_1,\omega_2} linearly independent over {{\bf R}}). Then {f} is constant.

In the language of Riemann surfaces, this proposition asserts that the torus {\Lambda \backslash {\bf C}} is a non-hyperbolic Riemann surface; it cannot be holomorphically mapped non-trivially into a bounded subset of the complex plane.

Proof: The fundamental domain (up to boundary) enclosed by {\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}} is compact, hence {f} is bounded on this domain, hence bounded on all of {{\bf C}} by double periodicity. The claim now follows from Liouville’s theorem. (One could alternatively have argued here using the compactness of the torus {(\omega_1 {\bf Z} + \omega_2 {\bf Z}) \backslash {\bf C}}. \Box

To obtain more interesting examples of doubly periodic functions, one must therefore turn to the world of meromorphic functions – or equivalently, holomorphic functions into the Riemann sphere {{\bf C} \cup \{\infty\}}. As it turns out, a particularly fundamental example of such a function is the Weierstrass elliptic function

\displaystyle  \wp(z) := \frac{1}{z^2} + \sum_{z_0 \in \Lambda \backslash 0} \frac{1}{(z-z_0)^2} - \frac{1}{z_0^2} \ \ \ \ \ (6)

which plays a role in doubly periodic functions analogous to the role of {x \mapsto \cos(2\pi x)} for {1}-periodic real functions. This function will have a double pole at the origin {0}, and more generally at all other points on the lattice {\Lambda}, but no other poles. The derivative

\displaystyle  \wp'(z) = -2 \sum_{z_0 \in \Lambda} \frac{1}{(z-z_0)^3} \ \ \ \ \ (7)

of the Weierstrass function is another doubly periodic meromorphic function, now with a triple pole at every point of {\Lambda}, and plays a role analogous to {x \mapsto \sin(2\pi x)}. Remarkably, all the other doubly periodic meromorphic functions with these periods will turn out to be rational combinations of {\wp} and {\wp'}; furthermore, in analogy with the identity {\cos^2 x+ \sin^2 x = 1}, one has an identity of the form

\displaystyle  \wp'(z)^2 = 4 \wp(z)^3 - g_2 \wp(z) - g_3 \ \ \ \ \ (8)

for all {z \in {\bf C}} (avoiding poles) and some complex numbers {g_2,g_3} that depend on the lattice {\Lambda}. Indeed, much as the map {x \mapsto (\cos 2\pi x, \sin 2\pi x)} creates a diffeomorphism between the additive unit circle {{\bf R}/{\bf Z}} to the geometric unit circle {\{ (x,y) \in{\bf R}^2: x^2+y^2=1\}}, the map {z \mapsto (\wp(z), \wp'(z))} turns out to be a complex diffeomorphism between the torus {(\omega_1 {\bf Z} + \omega_2 {\bf Z}) \backslash {\bf C}} and the elliptic curve

\displaystyle  \{ (z, w) \in {\bf C}^2: z^2 = 4w^3 - g_2 w - g_3 \} \cup \{\infty\}

with the convention that {(\wp,\wp')} maps the origin {\omega_1 {\bf Z} + \omega_2 {\bf Z}} of the torus to the point {\infty} at infinity. (Indeed, one can view elliptic curves as “multiplicative tori”, and both the additive and multiplicative tori can be identified as smooth manifolds with the more familiar geometric torus, but we will not use such an identification here.) This fundamental identification with elliptic curves and tori motivates many of the further remarkable properties of elliptic curves; for instance, the fact that tori are obviously an abelian group gives rise to an abelian group law on elliptic curves (and this law can be interpreted as an analogue of the trigonometric sum identities for {\wp, \wp'}). The description of the various meromorphic functions on the torus also helps motivate the more general Riemann-Roch theorem that is a fundamental law governing meromorphic functions on other compact Riemann surfaces (and is discussed further in these 246C notes). So far we have focused on studying a single torus {\Lambda \backslash {\bf C}}. However, another important mathematical object of study is the space of all such tori, modulo isomorphism; this is a basic example of a moduli space, known as the (classical, level one) modular curve {X_0(1)}. This curve can be described in a number of ways. On the one hand, it can be viewed as the upper half-plane {{\bf H} = \{ z: \mathrm{Im}(z) > 0 \}} quotiented out by the discrete group {SL_2({\bf Z})}; on the other hand, by using the {j}-invariant, it can be identified with the complex plane {{\bf C}}; alternatively, one can compactify the modular curve and identify this compactification with the Riemann sphere {{\bf C} \cup \{\infty\}}. (This identification, by the way, produces a very short proof of the little and great Picard theorems, which we proved in 246A Notes 4.) Functions on the modular curve (such as the {j}-invariant) can be viewed as {SL_2({\bf Z})}-invariant functions on {{\bf H}}, and include the important class of modular functions; they naturally generalise to the larger class of (weakly) modular forms, which are functions on {{\bf H}} which transform in a very specific way under {SL_2({\bf Z})}-action, and which are ubiquitous throughout mathematics, and particularly in number theory. Basic examples of modular forms include the Eisenstein series, which are also the Laurent coefficients of the Weierstrass elliptic functions {\wp}. More number theoretic examples of modular forms include (suitable powers of) theta functions {\theta}, and the modular discriminant {\Delta}. Modular forms are {1}-periodic functions on the half-plane, and hence by Proposition 1 come with Fourier coefficients {a_n}; these coefficients often turn out to encode a surprising amount of number-theoretic information; a dramatic example of this is the famous modularity theorem, (a special case of which was) used amongst other things to establish Fermat’s last theorem. Modular forms can be generalised to other discrete groups than {SL_2({\bf Z})} (such as congruence groups) and to other domains than the half-plane {{\bf H}}, leading to the important larger class of automorphic forms, which are of major importance in number theory and representation theory, but which are well outside the scope of this course to discuss.

Read the rest of this entry »

I’ve just uploaded to the arXiv my paper The Ionescu-Wainger multiplier theorem and the adeles“. This paper revisits a useful multiplier theorem of Ionescu and Wainger on “major arc” Fourier multiplier operators on the integers {{\bf Z}} (or lattices {{\bf Z}^d}), and strengthens the bounds while also interpreting it from the viewpoint of the adelic integers {{\bf A}_{\bf Z}} (which were also used in my recent paper with Krause and Mirek).

For simplicity let us just work in one dimension. Any smooth function {m: {\bf R}/{\bf Z} \rightarrow {\bf C}} then defines a discrete Fourier multiplier operator {T_m: \ell^p({\bf Z}) \rightarrow \ell^p({\bf Z})} for any {1 \leq p \leq \infty} by the formula

\displaystyle  {\mathcal F}_{\bf Z} T_m f(\xi) =: m(\xi) {\mathcal F}_{\bf Z} f(\xi)

where {{\mathcal F}_{\bf Z} f(\xi) := \sum_{n \in {\bf Z}} f(n) e(n \xi)} is the Fourier transform on {{\bf Z}}; similarly, any test function {m: {\bf R} \rightarrow {\bf C}} defines a continuous Fourier multiplier operator {T_m: L^p({\bf R}) \rightarrow L^p({\bf R})} by the formula

\displaystyle  {\mathcal F}_{\bf R} T_m f(\xi) := m(\xi) {\mathcal F}_{\bf R} f(\xi)

where {{\mathcal F}_{\bf R} f(\xi) := \int_{\bf R} f(x) e(x \xi)\ dx}. In both cases we refer to {m} as the symbol of the multiplier operator {T_m}.

We will be interested in discrete Fourier multiplier operators whose symbols are supported on a finite union of arcs. One way to construct such operators is by “folding” continuous Fourier multiplier operators into various target frequencies. To make this folding operation precise, given any continuous Fourier multiplier operator {T_m: L^p({\bf R}) \rightarrow L^p({\bf R})}, and any frequency {\alpha \in {\bf R}/{\bf Z}}, we define the discrete Fourier multiplier operator {T_{m;\alpha}: \ell^p({\bf Z}) \rightarrow \ell^p({\bf Z})} for any frequency shift {\alpha \in {\bf R}/{\bf Z}} by the formula

\displaystyle  {\mathcal F}_{\bf Z} T_{m,\alpha} f(\xi) := \sum_{\theta \in {\bf R}: \xi = \alpha + \theta} m(\theta) {\mathcal F}_{\bf Z} f(\xi)

or equivalently

\displaystyle  T_{m;\alpha} f(n) = \int_{\bf R} m(\theta) {\mathcal F}_{\bf Z} f(\alpha+\theta) e( n(\alpha+\theta) )\ d\theta.

More generally, given any finite set {\Sigma \subset {\bf R}/{\bf Z}}, we can form a multifrequency projection operator {T_{m;\Sigma}} on {\ell^p({\bf Z})} by the formula

\displaystyle  T_{m;\Sigma} := \sum_{\alpha \in \Sigma} T_{m;\alpha}


\displaystyle  T_{m;\alpha} f(n) = \sum_{\alpha \in \Sigma} \int_{\bf R} m(\theta) {\mathcal F}_{\bf Z} f(\alpha+\theta) e( n(\alpha+\theta) )\ d\theta.

This construction gives discrete Fourier multiplier operators whose symbol can be localised to a finite union of arcs. For instance, if {m: {\bf R} \rightarrow {\bf C}} is supported on {[-\varepsilon,\varepsilon]}, then {T_{m;\Sigma}} is a Fourier multiplier whose symbol is supported on the set {\bigcup_{\alpha \in \Sigma} \alpha + [-\varepsilon,\varepsilon]}.

There are a body of results relating the {\ell^p({\bf Z})} theory of discrete Fourier multiplier operators such as {T_{m;\alpha}} or {T_{m;\Sigma}} with the {L^p({\bf R})} theory of their continuous counterparts. For instance we have the basic result of Magyar, Stein, and Wainger:

Proposition 1 (Magyar-Stein-Wainger sampling principle) Let {1 \leq p \leq \infty} and {\alpha \in {\bf R}/{\bf Z}}.
  • (i) If {m: {\bf R} \rightarrow {\bf C}} is a smooth function supported in {[-1/2,1/2]}, then {\|T_{m;\alpha}\|_{B(\ell^p({\bf Z}))} \lesssim \|T_m\|_{B(L^p({\bf R}))}}, where {B(V)} denotes the operator norm of an operator {T: V \rightarrow V}.
  • (ii) More generally, if {m: {\bf R} \rightarrow {\bf C}} is a smooth function supported in {[-1/2Q,1/2Q]} for some natural number {Q}, then {\|T_{m;\alpha + \frac{1}{Q}{\bf Z}/{\bf Z}}\|_{B(\ell^p({\bf Z}))} \lesssim \|T_m\|_{B(L^p({\bf R}))}}.

When {p=2} the implied constant in these bounds can be set to equal {1}. In the paper of Magyar, Stein, and Wainger it was posed as an open problem as to whether this is the case for other {p}; in an appendix to this paper I show that the answer is negative if {p} is sufficiently close to {1} or {\infty}, but I do not know the full answer to this question.

This proposition allows one to get a good multiplier theory for symbols supported near cyclic groups {\frac{1}{Q}{\bf Z}/{\bf Z}}; for instance it shows that a discrete Fourier multiplier with symbol {\sum_{\alpha \in \frac{1}{Q}{\bf Z}/{\bf Z}} \phi(Q(\xi-\alpha))} for a fixed test function {\phi} is bounded on {\ell^p({\bf Z})}, uniformly in {p} and {Q}. For many applications in discrete harmonic analysis, one would similarly like a good multiplier theory for symbols supported in “major arc” sets such as

\displaystyle  \bigcup_{q=1}^N \bigcup_{\alpha \in \frac{1}{q}{\bf Z}/{\bf Z}} \alpha + [-\varepsilon,\varepsilon] \ \ \ \ \ (1)

and in particular to get a good Littlewood-Paley theory adapted to major arcs. (This is particularly the case when trying to control “true complexity zero” expressions for which the minor arc contributions can be shown to be negligible; my recent paper with Krause and Mirek is focused on expressions of this type.) At present we do not have a good multiplier theory that is directly adapted to the classical major arc set (1) (though I do not know of rigorous negative results that show that such a theory is not possible); however, Ionescu and Wainger were able to obtain a useful substitute theory in which (1) was replaced by a somewhat larger set that had better multiplier behaviour. Starting with a finite collection {S} of pairwise coprime natural numbers, and a natural number {k}, one can form the major arc type set

\displaystyle  \bigcup_{\alpha \in \Sigma_{\leq k}} \alpha + [-\varepsilon,\varepsilon] \ \ \ \ \ (2)

where {\Sigma_{\leq k} \subset {\bf R}/{\bf Z}} consists of all rational points in the unit circle of the form {\frac{a}{Q} \mod 1} where {Q} is the product of at most {k} elements from {S} and {a} is an integer. For suitable choices of {S} and {k} not too large, one can make this set (2) contain the set (1) while still having a somewhat controlled size (very roughly speaking, one chooses {S} to consist of (small powers of) large primes between {N^\rho} and {N} for some small constant {\rho>0}, together with something like the product of all the primes up to {N^\rho} (raised to suitable powers)).

In the regime where {k} is fixed and {\varepsilon} is small, there is a good theory:

Theorem 2 (Ionescu-Wainger theorem, rough version) If {p} is an even integer or the dual of an even integer, and {m: {\bf R} \rightarrow {\bf C}} is supported on {[-\varepsilon,\varepsilon]} for a sufficiently small {\varepsilon > 0}, then

\displaystyle  \|T_{m;\Sigma_{\leq k}}\|_{B(\ell^p({\bf Z}))} \lesssim_{p, k} (\log(1+|S|))^{O_k(1)} \|T_m\|_{B(L^p({\bf R}))}.

There is a more explicit description of how small {\varepsilon} needs to be for this theorem to work (roughly speaking, it is not much more than what is needed for all the arcs {\alpha + [-\varepsilon,\varepsilon]} in (2) to be disjoint), but we will not give it here. The logarithmic loss of {(\log(1+|S|))^{O_k(1)}} was reduced to {\log(1+|S|)} by Mirek. In this paper we refine the bound further to

\displaystyle  \|T_{m;\Sigma_{\leq k}}\|_{B(\ell^p({\bf Z}))} \leq O(r \log(2+kr))^k \|T_m\|_{B(L^p({\bf R}))}. \ \ \ \ \ (3)

when {p = 2r} or {p = (2r)'} for some integer {r}. In particular there is no longer any logarithmic loss in the cardinality of the set {S}.

The proof of (3) follows a similar strategy as to previous proofs of Ionescu-Wainger type. By duality we may assume {p=2r}. We use the following standard sequence of steps:

  • (i) (Denominator orthogonality) First one splits {T_{m;\Sigma_{\leq k}} f} into various pieces depending on the denominator {Q} appearing in the element of {\Sigma_{\leq k}}, and exploits “superorthogonality” in {Q} to estimate the {\ell^p} norm by the {\ell^p} norm of an appropriate square function.
  • (ii) (Nonconcentration) One expands out the {p^{th}} power of the square function and estimates it by a “nonconcentrated” version in which various factors that arise in the expansion are “disjoint”.
  • (iii) (Numerator orthogonality) We now decompose based on the numerators {a} appearing in the relevant elements of {\Sigma_{\leq k}}, and exploit some residual orthogonality in this parameter to reduce to estimating a square-function type expression involving sums over various cosets {\alpha + \frac{1}{Q}{\bf Z}/{\bf Z}}.
  • (iv) (Marcinkiewicz-Zygmund) One uses the Marcinkiewicz-Zygmund theorem relating scalar and vector valued operator norms to eliminate the role of the multiplier {m}.
  • (v) (Rubio de Francia) Use a reverse square function estimate of Rubio de Francia type to conclude.

The main innovations are that of using the probabilistic decoupling method to remove some logarithmic losses in (i), and recent progress on the Erdos-Rado sunflower conjecture (as discussed in this recent post) to improve the bounds in (ii). For (i), the key point is that one can express a sum such as

\displaystyle  \sum_{A \in \binom{S}{k}} f_A,

where {\binom{S}{k}} is the set of {k}-element subsets of an index set {S}, and {f_A} are various complex numbers, as an average

\displaystyle  \sum_{A \in \binom{S}{k}} f_A = \frac{k^k}{k!} {\bf E} \sum_{s_1 \in {\bf S}_1,\dots,s_k \in {\bf S}_k} f_{\{s_1,\dots,s_k\}}

where {S = {\bf S}_1 \cup \dots \cup {\bf S}_k} is a random partition of {S} into {k} subclasses (chosen uniformly over all such partitions), basically because every {k}-element subset {A} of {S} has a probability exactly {\frac{k!}{k^k}} of being completely shattered by such a random partition. This “decouples” the index set {\binom{S}{k}} into a Cartesian product {{\bf S}_1 \times \dots \times {\bf S}_k} which is more convenient for application of the superorthogonality theory. For (ii), the point is to efficiently obtain estimates of the form

\displaystyle  (\sum_{A \in \binom{S}{k}} F_A)^r \lesssim_{k,r} \sum_{A_1,\dots,A_r \in \binom{S}{k} \hbox{ sunflower}} F_{A_1} \dots F_{A_r}

where {F_A} are various non-negative quantities, and a sunflower is a collection of sets {A_1,\dots,A_r} that consist of a common “core” {A_0} and disjoint “petals” {A_1 \backslash A_0,\dots,A_r \backslash A_0}. The other parts of the argument are relatively routine; see for instance this survey of Pierce for a discussion of them in the simple case {k=1}.

In this paper we interpret the Ionescu-Wainger multiplier theorem as being essentially a consequence of various quantitative versions of the Shannon sampling theorem. Recall that this theorem asserts that if a (Schwartz) function {f: {\bf R} \rightarrow {\bf C}} has its Fourier transform supported on {[-1/2,1/2]}, then {f} can be recovered uniquely from its restriction {f|_{\bf Z}: {\bf Z} \rightarrow {\bf C}}. In fact, as can be shown from a little bit of routine Fourier analysis, if we narrow the support of the Fourier transform slightly to {[-c,c]} for some {0 < c < 1/2}, then the restriction {f|_{\bf Z}} has the same {L^p} behaviour as the original function, in the sense that

\displaystyle  \| f|_{\bf Z} \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf R})} \ \ \ \ \ (4)

for all {0 < p \leq \infty}; see Theorem 4.18 of this paper of myself with Krause and Mirek. This is consistent with the uncertainty principle, which suggests that such functions {f} should behave like a constant at scales {\sim 1/c}.

The quantitative sampling theorem (4) can be used to give an alternate proof of Proposition 1(i), basically thanks to the identity

\displaystyle  T_{m;0} (f|_{\bf Z}) = (T_m f)_{\bf Z}

whenever {f: {\bf R} \rightarrow {\bf C}} is Schwartz and has Fourier transform supported in {[-1/2,1/2]}, and {m} is also supported on {[-1/2,1/2]}; this identity can be easily verified from the Poisson summation formula. A variant of this argument also yields an alternate proof of Proposition 1(ii), where the role of {{\bf R}} is now played by {{\bf R} \times {\bf Z}/Q{\bf Z}}, and the standard embedding of {{\bf Z}} into {{\bf R}} is now replaced by the embedding {\iota_Q: n \mapsto (n, n \hbox{ mod } Q)} of {{\bf Z}} into {{\bf R} \times {\bf Z}/Q{\bf Z}}; the analogue of (4) is now

\displaystyle  \| f \circ \iota_Q \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf R} \times {\bf Z}/Q{\bf Z})} \ \ \ \ \ (5)

whenever {f: {\bf R} \times {\bf Z}/Q{\bf Z} \rightarrow {\bf C}} is Schwartz and has Fourier transform {{\mathcal F}_{{\bf R} \times {\bf Z}/Q{\bf Z}} f\colon {\bf R} \times \frac{1}{Q}{\bf Z}/{\bf Z} \rightarrow {\bf C}} supported in {[-c/Q,c/Q] \times \frac{1}{Q}{\bf Z}/{\bf Z}}, and {{\bf Z}/Q{\bf Z}} is endowed with probability Haar measure.

The locally compact abelian groups {{\bf R}} and {{\bf R} \times {\bf Z}/Q{\bf Z}} can all be viewed as projections of the adelic integers {{\bf A}_{\bf Z} := {\bf R} \times \hat {\bf Z}} (the product of the reals and the profinite integers {\hat {\bf Z}}). By using the Ionescu-Wainger multiplier theorem, we are able to obtain an adelic version of the quantitative sampling estimate (5), namely

\displaystyle  \| f \circ \iota \|_{\ell^p({\bf Z})} \sim_{c,p} \|f\|_{L^p({\bf A}_{\bf Z})}

whenever {1 < p < \infty}, {f: {\bf A}_{\bf Z} \rightarrow {\bf C}} is Schwartz-Bruhat and has Fourier transform {{\mathcal F}_{{\bf A}_{\bf Z}} f: {\bf R} \times {\bf Q}/{\bf Z} \rightarrow {\bf C}} supported on {[-\varepsilon,\varepsilon] \times \Sigma_{\leq k}} for some sufficiently small {\varepsilon} (the precise bound on {\varepsilon} depends on {S, p, c} in a fashion not detailed here). This allows one obtain an “adelic” extension of the Ionescu-Wainger multiplier theorem, in which the {\ell^p({\bf Z})} operator norm of any discrete multiplier operator whose symbol is supported on major arcs can be shown to be comparable to the {L^p({\bf A}_{\bf Z})} operator norm of an adelic counterpart to that multiplier operator; in principle this reduces “major arc” harmonic analysis on the integers {{\bf Z}} to “low frequency” harmonic analysis on the adelic integers {{\bf A}_{\bf Z}}, which is a simpler setting in many ways (mostly because the set of major arcs (2) is now replaced with a product set {[-\varepsilon,\varepsilon] \times \Sigma_{\leq k}}).

Kaisa Matomäki, Maksym Radziwill, Joni Teräväinen, Tamar Ziegler and I have uploaded to the arXiv our paper Higher uniformity of bounded multiplicative functions in short intervals on average. This paper (which originated from a working group at an AIM workshop on Sarnak’s conjecture) focuses on the local Fourier uniformity conjecture for bounded multiplicative functions such as the Liouville function {\lambda}. One form of this conjecture is the assertion that

\displaystyle  \int_0^X \| \lambda \|_{U^k([x,x+H])}\ dx = o(X) \ \ \ \ \ (1)

as {X \rightarrow \infty} for any fixed {k \geq 0} and any {H = H(X) \leq X} that goes to infinity as {X \rightarrow \infty}, where {U^k([x,x+H])} is the (normalized) Gowers uniformity norm. Among other things this conjecture implies (logarithmically averaged version of) the Chowla and Sarnak conjectures for the Liouville function (or the Möbius function), see this previous blog post.

The conjecture gets more difficult as {k} increases, and also becomes more difficult the more slowly {H} grows with {X}. The {k=0} conjecture is equivalent to the assertion

\displaystyle  \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n)| \ dx = o(HX)

which was proven (for arbitrarily slowly growing {H}) in a landmark paper of Matomäki and Radziwill, discussed for instance in this blog post.

For {k=1}, the conjecture is equivalent to the assertion

\displaystyle  \int_0^X \sup_\alpha |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)| \ dx = o(HX). \ \ \ \ \ (2)

This remains open for sufficiently slowly growing {H} (and it would be a major breakthrough in particular if one could obtain this bound for {H} as small as {\log^\varepsilon X} for any fixed {\varepsilon>0}, particularly if applicable to more general bounded multiplicative functions than {\lambda}, as this would have new implications for a generalization of the Chowla conjecture known as the Elliott conjecture). Recently, Kaisa, Maks and myself were able to establish this conjecture in the range {H \geq X^\varepsilon} (in fact we have since worked out in the current paper that we can get {H} as small as {\exp(\log^{5/8+\varepsilon} X)}). In our current paper we establish Fourier uniformity conjecture for higher {k} for the same range of {H}. This in particular implies local orthogonality to polynomial phases,

\displaystyle  \int_0^X \sup_{P \in \mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})} |\sum_{x \leq n \leq x+H} \lambda(n) e(-P(n))| \ dx = o(HX) \ \ \ \ \ (3)

where {\mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})} denotes the polynomials of degree at most {k-1}, but the full conjecture is a bit stronger than this, establishing the more general statement

\displaystyle  \int_0^X \sup_{g \in \mathrm{Poly}({\bf R} \rightarrow G)} |\sum_{x \leq n \leq x+H} \lambda(n) \overline{F}(g(n) \Gamma)| \ dx = o(HX) \ \ \ \ \ (4)

for any degree {k} filtered nilmanifold {G/\Gamma} and Lipschitz function {F: G/\Gamma \rightarrow {\bf C}}, where {g} now ranges over polynomial maps from {{\bf R}} to {G}. The method of proof follows the same general strategy as in the previous paper with Kaisa and Maks. (The equivalence of (4) and (1) follows from the inverse conjecture for the Gowers norms, proven in this paper.) We quickly sketch first the proof of (3), using very informal language to avoid many technicalities regarding the precise quantitative form of various estimates. If the estimate (3) fails, then we have the correlation estimate

\displaystyle  |\sum_{x \leq n \leq x+H} \lambda(n) e(-P_x(n))| \gg H

for many {x \sim X} and some polynomial {P_x} depending on {x}. The difficulty here is to understand how {P_x} can depend on {x}. We write the above correlation estimate more suggestively as

\displaystyle  \lambda(n) \sim_{[x,x+H]} e(P_x(n)).

Because of the multiplicativity {\lambda(np) = -\lambda(p)} at small primes {p}, one expects to have a relation of the form

\displaystyle  e(P_{x'}(p'n)) \sim_{[x/p,x/p+H/p]} e(P_x(pn)) \ \ \ \ \ (5)

for many {x,x'} for which {x/p \approx x'/p'} for some small primes {p,p'}. (This can be formalised using an inequality of Elliott related to the Turan-Kubilius theorem.) This gives a relationship between {P_x} and {P_{x'}} for “edges” {x,x'} in a rather sparse “graph” connecting the elements of say {[X/2,X]}. Using some graph theory one can locate some non-trivial “cycles” in this graph that eventually lead (in conjunction to a certain technical but important “Chinese remainder theorem” step to modify the {P_x} to eliminate a rather serious “aliasing” issue that was already discussed in this previous post) to obtain functional equations of the form

\displaystyle  P_x(a_x \cdot) \approx P_x(b_x \cdot)

for some large and close (but not identical) integers {a_x,b_x}, where {\approx} should be viewed as a first approximation (ignoring a certain “profinite” or “major arc” term for simplicity) as “differing by a slowly varying polynomial” and the polynomials {P_x} should now be viewed as taking values on the reals rather than the integers. This functional equation can be solved to obtain a relation of the form

\displaystyle  P_x(t) \approx T_x \log t

for some real number {T_x} of polynomial size, and with further analysis of the relation (5) one can make {T_x} basically independent of {x}. This simplifies (3) to something like

\displaystyle  \int_0^X \sup_{P \in \mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})} |\sum_{x \leq n \leq x+H} \lambda(n) n^{-iT}| \ dx = o(HX)

and this is now of a form that can be treated by the theorem of Matomäki and Radziwill (because {n \mapsto \lambda(n) n^{-iT}} is a bounded multiplicative function). (Actually because of the profinite term mentioned previously, one also has to insert a Dirichlet character of bounded conductor into this latter conclusion, but we will ignore this technicality.)

Now we apply the same strategy to (4). For abelian {G} the claim follows easily from (3), so we focus on the non-abelian case. One now has a polynomial sequence {g_x \in \mathrm{Poly}({\bf R} \rightarrow G)} attached to many {x \sim X}, and after a somewhat complicated adaptation of the above arguments one again ends up with an approximate functional equation

\displaystyle  g_x(a_x \cdot) \Gamma \approx g_x(b_x \cdot) \Gamma \ \ \ \ \ (6)

where the relation {\approx} is rather technical and will not be detailed here. A new difficulty arises in that there are some unwanted solutions to this equation, such as

\displaystyle  g_x(t) = \gamma^{\frac{\log(a_x t)}{\log(a_x/b_x)}}

for some {\gamma \in \Gamma}, which do not necessarily lead to multiplicative characters like {n^{-iT}} as in the polynomial case, but instead to some unfriendly looking “generalized multiplicative characters” (think of {e(\lfloor \alpha \log n \rfloor \beta \log n)} as a rough caricature). To avoid this problem, we rework the graph theory portion of the argument to produce not just one functional equation of the form (6)for each {x}, but many, leading to dilation invariances

\displaystyle  g_x((1+\theta) t) \Gamma \approx g_x(t) \Gamma

for a “dense” set of {\theta}. From a certain amount of Lie algebra theory (ultimately arising from an understanding of the behaviour of the exponential map on nilpotent matrices, and exploiting the hypothesis that {G} is non-abelian) one can conclude that (after some initial preparations to avoid degenerate cases) {g_x(t)} must behave like {\gamma_x^{\log t}} for some central element {\gamma_x} of {G}. This eventually brings one back to the multiplicative characters {n^{-iT}} that arose in the polynomial case, and the arguments now proceed as before.

We give two applications of this higher order Fourier uniformity. One regards the growth of the number

\displaystyle  s(k) := |\{ (\lambda(n+1),\dots,\lambda(n+k)): n \in {\bf N} \}|

of length {k} sign patterns in the Liouville function. The Chowla conjecture implies that {s(k) = 2^k}, but even the weaker conjecture of Sarnak that {s(k) \gg (1+\varepsilon)^k} for some {\varepsilon>0} remains open. Until recently, the best asymptotic lower bound on {s(k)} was {s(k) \gg k^2}, due to McNamara; with our result, we can now show {s(k) \gg_A k^A} for any {A} (in fact we can get {s(k) \gg_\varepsilon \exp(\log^{8/5-\varepsilon} k)} for any {\varepsilon>0}). The idea is to repeat the now-standard argument to exploit multiplicativity at small primes to deduce Chowla-type conjectures from Fourier uniformity conjectures, noting that the Chowla conjecture would give all the sign patterns one could hope for. The usual argument here uses the “entropy decrement argument” to eliminate a certain error term (involving the large but mean zero factor {p 1_{p|n}-1}). However the observation is that if there are extremely few sign patterns of length {k}, then the entropy decrement argument is unnecessary (there isn’t much entropy to begin with), and a more low-tech moment method argument (similar to the derivation of Chowla’s conjecture from Sarnak’s conjecture, as discussed for instance in this post) gives enough of Chowla’s conjecture to produce plenty of length {k} sign patterns. If there are not extremely few sign patterns of length {k} then we are done anyway. One quirk of this argument is that the sign patterns it produces may only appear exactly once; in contrast with preceding arguments, we were not able to produce a large number of sign patterns that each occur infinitely often.

The second application is to obtain cancellation for various polynomial averages involving the Liouville function {\lambda} or von Mangoldt function {\Lambda}, such as

\displaystyle  {\bf E}_{n \leq X} {\bf E}_{m \leq X^{1/d}} \lambda(n+P_1(m)) \lambda(n+P_2(m)) \dots \lambda(n+P_k(m))


\displaystyle  {\bf E}_{n \leq X} {\bf E}_{m \leq X^{1/d}} \lambda(n+P_1(m)) \Lambda(n+P_2(m)) \dots \Lambda(n+P_k(m))

where {P_1,\dots,P_k} are polynomials of degree at most {d}, no two of which differ by a constant (the latter is essential to avoid having to establish the Chowla or Hardy-Littlewood conjectures, which of course remain open). Results of this type were previously obtained by Tamar Ziegler and myself in the “true complexity zero” case when the polynomials {P} had distinct degrees, in which one could use the {k=0} theory of Matomäki and Radziwill; now that higher {k} is available at the scale {H=X^{1/d}} we can now remove this restriction.

Define the Collatz map {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} on the natural numbers {{\bf N}+1 = \{1,2,\dots\}} by setting {\mathrm{Col}(N)} to equal {3N+1} when {N} is odd and {N/2} when {N} is even, and let {\mathrm{Col}^{\bf N}(N) := \{ N, \mathrm{Col}(N), \mathrm{Col}^2(N), \dots \}} denote the forward Collatz orbit of {N}. The notorious Collatz conjecture asserts that {1 \in \mathrm{Col}^{\bf N}(N)} for all {N \in {\bf N}+1}. Equivalently, if we define the backwards Collatz orbit {(\mathrm{Col}^{\bf N})^*(N) := \{ M \in {\bf N}+1: N \in \mathrm{Col}^{\bf N}(M) \}} to be all the natural numbers {M} that encounter {N} in their forward Collatz orbit, then the Collatz conjecture asserts that {(\mathrm{Col}^{\bf N})^*(1) = {\bf N}+1}. As a partial result towards this latter statement, Krasikov and Lagarias in 2003 established the bound

\displaystyle \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(1) \} \gg x^\gamma \ \ \ \ \ (1)


for all {x \geq 1} and {\gamma = 0.84}. (This improved upon previous values of {\gamma = 0.81} obtained by Applegate and Lagarias in 1995, {\gamma = 0.65} by Applegate and Lagarias in 1995 by a different method, {\gamma=0.48} by Wirsching in 1993, {\gamma=0.43} by Krasikov in 1989, {\gamma=0.3} by Sander in 1990, and some {\gamma>0} by Crandall in 1978.) This is still the largest value of {\gamma} for which (1) has been established. Of course, the Collatz conjecture would imply that we can take {\gamma} equal to {1}, which is the assertion that a positive density set of natural numbers obeys the Collatz conjecture. This is not yet established, although the results in my previous paper do at least imply that a positive density set of natural numbers iterates to an (explicitly computable) bounded set, so in principle the {\gamma=1} case of (1) could now be verified by an (enormous) finite computation in which one verifies that every number in this explicit bounded set iterates to {1}. In this post I would like to record a possible alternate route to this problem that depends on the distribution of a certain family of random variables that appeared in my previous paper, that I called Syracuse random variables.

Definition 1 (Syracuse random variables) For any natural number {n}, a Syracuse random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on the cyclic group {{\bf Z}/3^n{\bf Z}} is defined as a random variable of the form

\displaystyle \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = \sum_{m=1}^n 3^{n-m} 2^{-{\mathbf a}_m-\dots-{\mathbf a}_n} \ \ \ \ \ (2)


where {\mathbf{a}_1,\dots,\mathbf{a_n}} are independent copies of a geometric random variable {\mathbf{Geom}(2)} on the natural numbers with mean {2}, thus

\displaystyle \mathop{\bf P}( \mathbf{a}_1=a_1,\dots,\mathbf{a}_n=a_n) = 2^{-a_1-\dots-a_n}

} for {a_1,\dots,a_n \in {\bf N}+1}. In (2) the arithmetic is performed in the ring {{\bf Z}/3^n{\bf Z}}.

Thus for instance

\displaystyle \mathrm{Syrac}({\bf Z}/3{\bf Z}) = 2^{-\mathbf{a}_1} \hbox{ mod } 3

\displaystyle \mathrm{Syrac}({\bf Z}/3^2{\bf Z}) = 2^{-\mathbf{a}_1-\mathbf{a}_2} + 3 \times 2^{-\mathbf{a}_2} \hbox{ mod } 3^2

\displaystyle \mathrm{Syrac}({\bf Z}/3^3{\bf Z}) = 2^{-\mathbf{a}_1-\mathbf{a}_2-\mathbf{a}_3} + 3 \times 2^{-\mathbf{a}_2-\mathbf{a}_3} + 3^2 \times 2^{-\mathbf{a}_3} \hbox{ mod } 3^3

and so forth. After reversing the labeling of the {\mathbf{a}_1,\dots,\mathbf{a}_n}, one could also view {\mathrm{Syrac}({\bf Z}/3^n{\bf Z})} as the mod {3^n} reduction of a {3}-adic random variable

\displaystyle \mathbf{Syrac}({\bf Z}_3) = \sum_{m=1}^\infty 3^{m-1} 2^{-{\mathbf a}_1-\dots-{\mathbf a}_m}.

The probability density function {b \mapsto \mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b )} of the Syracuse random variable can be explicitly computed by a recursive formula (see Lemma 1.12 of my previous paper). For instance, when {n=1}, {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3{\bf Z}) = b )} is equal to {0,1/3,2/3} for {x=b,1,2 \hbox{ mod } 3} respectively, while when {n=2}, {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3^2{\bf Z}) = b )} is equal to

\displaystyle 0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}

when {b=0,\dots,8 \hbox{ mod } 9} respectively.

The relationship of these random variables to the Collatz problem can be explained as follows. Let {2{\bf N}+1 = \{1,3,5,\dots\}} denote the odd natural numbers, and define the Syracuse map {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1} by

\displaystyle \mathrm{Syr}(N) := \frac{3n+1}{2^{\nu_2(3N+1)}}

where the {2}valuation {\nu_2(3n+1) \in {\bf N}} is the number of times {2} divides {3N+1}. We can define the forward orbit {\mathrm{Syr}^{\bf N}(n)} and backward orbit {(\mathrm{Syr}^{\bf N})^*(N)} of the Syracuse map as before. It is not difficult to then see that the Collatz conjecture is equivalent to the assertion {(\mathrm{Syr}^{\bf N})^*(1) = 2{\bf N}+1}, and that the assertion (1) for a given {\gamma} is equivalent to the assertion

\displaystyle \# \{ N \leq x: N \in (\mathrm{Syr}^{\bf N})^*(1) \} \gg x^\gamma \ \ \ \ \ (3)


for all {x \geq 1}, where {N} is now understood to range over odd natural numbers. A brief calculation then shows that for any odd natural number {N} and natural number {n}, one has

\displaystyle \mathrm{Syr}^n(N) = 3^n 2^{-a_1-\dots-a_n} N + \sum_{m=1}^n 3^{n-m} 2^{-a_m-\dots-a_n}

where the natural numbers {a_1,\dots,a_n} are defined by the formula

\displaystyle a_i := \nu_2( 3 \mathrm{Syr}^{i-1}(N) + 1 ),

so in particular

\displaystyle \mathrm{Syr}^n(N) = \sum_{m=1}^n 3^{n-m} 2^{-a_m-\dots-a_n} \hbox{ mod } 3^n.

Heuristically, one expects the {2}-valuation {a = \nu_2(N)} of a typical odd number {N} to be approximately distributed according to the geometric distribution {\mathbf{Geom}(2)}, so one therefore expects the residue class {\mathrm{Syr}^n(N) \hbox{ mod } 3^n} to be distributed approximately according to the random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}.

The Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} will always avoid multiples of three (this reflects the fact that {\mathrm{Syr}(N)} is never a multiple of three), but attains any non-multiple of three in {{\bf Z}/3^n{\bf Z}} with positive probability. For any natural number {n}, set

\displaystyle c_n := \inf_{b \in {\bf Z}/3^n{\bf Z}: 3 \nmid b} \mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b ).

Equivalently, {c_n} is the greatest quantity for which we have the inequality

\displaystyle \sum_{(a_1,\dots,a_n) \in S_{n,N}} 2^{-a_1-\dots-a_m} \geq c_n \ \ \ \ \ (4)


for all integers {N} not divisible by three, where {S_{n,N} \subset ({\bf N}+1)^n} is the set of all tuples {(a_1,\dots,a_n)} for which

\displaystyle N = \sum_{m=1}^n 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n.

Thus for instance {c_0=1}, {c_1 = 1/3}, and {c_2 = 2/63}. On the other hand, since all the probabilities {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b)} sum to {1} as {b \in {\bf Z}/3^n{\bf Z}} ranges over the non-multiples of {3}, we have the trivial upper bound

\displaystyle c_n \leq \frac{3}{2} 3^{-n}.

There is also an easy submultiplicativity result:

Lemma 2 For any natural numbers {n_1,n_2}, we have

\displaystyle c_{n_1+n_2-1} \geq c_{n_1} c_{n_2}.

Proof: Let {N} be an integer not divisible by {3}, then by (4) we have

\displaystyle \sum_{(a_1,\dots,a_{n_1}) \in S_{n_1,N}} 2^{-a_1-\dots-a_{n_1}} \geq c_{n_1}.

If we let {S'_{n_1,N}} denote the set of tuples {(a_1,\dots,a_{n_1-1})} that can be formed from the tuples in {S_{n_1,N}} by deleting the final component {a_{n_1}} from each tuple, then we have

\displaystyle \sum_{(a_1,\dots,a_{n_1-1}) \in S'_{n_1,N}} 2^{-a_1-\dots-a_{n_1-1}} \geq c_{n_1}. \ \ \ \ \ (5)


Next, observe that if {(a_1,\dots,a_{n_1-1}) \in S'_{n_1,N}}, then

\displaystyle N = \sum_{m=1}^{n_1-1} 3^{m-1} 2^{-a_1-\dots-a_m} + 3^{n_1-1} 2^{-a_1-\dots-a_{n_1-1}} M

with {M = M_{N,n_1,a_1,\dots,a_{n_1-1}}} an integer not divisible by three. By definition of {S_{n_2,M}} and a relabeling, we then have

\displaystyle M = \sum_{m=1}^{n_2} 3^{m-1} 2^{-a_{n_1}-\dots-a_{m+n_1-1}} \hbox{ mod } 3^{n_2}

for all {(a_{n_1},\dots,a_{n_1+n_2-1}) \in S_{n_2,M}}. For such tuples we then have

\displaystyle N = \sum_{m=1}^{n_1+n_2-1} 3^{m-1} 2^{-a_1-\dots-a_{n_1+n_2-1}} \hbox{ mod } 3^{n_1+n_2-1}

so that {(a_1,\dots,a_{n_1+n_2-1}) \in S_{n_1+n_2-1,N}}. Since

\displaystyle \sum_{(a_{n_1},\dots,a_{n_1+n_2-1}) \in S_{n_2,M}} 2^{-a_{n_1}-\dots-a_{n_1+n_2-1}} \geq c_{n_2}

for each {M}, the claim follows. \Box

From this lemma we see that {c_n = 3^{-\beta n + o(n)}} for some absolute constant {\beta \geq 1}. Heuristically, we expect the Syracuse random variables to be somewhat approximately equidistributed amongst the multiples of {{\bf Z}/3^n{\bf Z}} (in Proposition 1.4 of my previous paper I prove a fine scale mixing result that supports this heuristic). As a consequence it is natural to conjecture that {\beta=1}. I cannot prove this, but I can show that this conjecture would imply that we can take the exponent {\gamma} in (1), (3) arbitrarily close to one:

Proposition 3 Suppose that {\beta=1} (that is to say, {c_n = 3^{-n+o(n)}} as {n \rightarrow \infty}). Then

\displaystyle \# \{ N \leq x: N \in (\mathrm{Syr}^{\bf N})^*(1) \} \gg x^{1-o(1)}

as {x \rightarrow \infty}, or equivalently

\displaystyle \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(1) \} \gg x^{1-o(1)}

as {x \rightarrow \infty}. In other words, (1), (3) hold for all {\gamma < 1}.

I prove this proposition below the fold. A variant of the argument shows that for any value of {\beta}, (1), (3) holds whenever {\gamma < f(\beta)}, where {f: [0,1] \rightarrow [0,1]} is an explicitly computable function with {f(\beta) \rightarrow 1} as {\beta \rightarrow 1}. In principle, one could then improve the Krasikov-Lagarias result {\gamma = 0.84} by getting a sufficiently good upper bound on {\beta}, which is in principle achievable numerically (note for instance that Lemma 2 implies the bound {c_n \leq 3^{-\beta(n-1)}} for any {n}, since {c_{kn-k+1} \geq c_n^k} for any {k}).

Read the rest of this entry »

Just a brief post to record some notable papers in my fields of interest that appeared on the arXiv recently.

  • A sharp square function estimate for the cone in {\bf R}^3“, by Larry Guth, Hong Wang, and Ruixiang Zhang.  This paper establishes an optimal (up to epsilon losses) square function estimate for the three-dimensional light cone that was essentially conjectured by Mockenhaupt, Seeger, and Sogge, which has a number of other consequences including Sogge’s local smoothing conjecture for the wave equation in two spatial dimensions, which in turn implies the (already known) Bochner-Riesz, restriction, and Kakeya conjectures in two dimensions.   Interestingly, modern techniques such as polynomial partitioning and decoupling estimates are not used in this argument; instead, the authors mostly rely on an induction on scales argument and Kakeya type estimates.  Many previous authors (including myself) were able to get weaker estimates of this type by an induction on scales method, but there were always significant inefficiencies in doing so; in particular knowing the sharp square function estimate at smaller scales did not imply the sharp square function estimate at the given larger scale.  The authors here get around this issue by finding an even stronger estimate that implies the square function estimate, but behaves significantly better with respect to induction on scales.
  • On the Chowla and twin primes conjectures over {\mathbb F}_q[T]“, by Will Sawin and Mark Shusterman.  This paper resolves a number of well known open conjectures in analytic number theory, such as the Chowla conjecture and the twin prime conjecture (in the strong form conjectured by Hardy and Littlewood), in the case of function fields where the field is a prime power q=p^j which is fixed (in contrast to a number of existing results in the “large q” limit) but has a large exponent j.  The techniques here are orthogonal to those used in recent progress towards the Chowla conjecture over the integers (e.g., in this previous paper of mine); the starting point is an algebraic observation that in certain function fields, the Mobius function behaves like a quadratic Dirichlet character along certain arithmetic progressions.  In principle, this reduces problems such as Chowla’s conjecture to problems about estimating sums of Dirichlet characters, for which more is known; but the task is still far from trivial.
  • Bounds for sets with no polynomial progressions“, by Sarah Peluse.  This paper can be viewed as part of a larger project to obtain quantitative density Ramsey theorems of Szemeredi type.  For instance, Gowers famously established a relatively good quantitative bound for Szemeredi’s theorem that all dense subsets of integers contain arbitrarily long arithmetic progressions a, a+r, \dots, a+(k-1)r.  The corresponding question for polynomial progressions a+P_1(r), \dots, a+P_k(r) is considered more difficult for a number of reasons.  One of them is that dilation invariance is lost; a dilation of an arithmetic progression is again an arithmetic progression, but a dilation of a polynomial progression will in general not be a polynomial progression with the same polynomials P_1,\dots,P_k.  Another issue is that the ranges of the two parameters a,r are now at different scales.  Peluse gets around these difficulties in the case when all the polynomials P_1,\dots,P_k have distinct degrees, which is in some sense the opposite case to that considered by Gowers (in particular, she avoids the need to obtain quantitative inverse theorems for high order Gowers norms; which was recently obtained in this integer setting by Manners but with bounds that are probably not strong enough to for the bounds in Peluse’s results, due to a degree lowering argument that is available in this case).  To resolve the first difficulty one has to make all the estimates rather uniform in the coefficients of the polynomials P_j, so that one can still run a density increment argument efficiently.  To resolve the second difficulty one needs to find a quantitative concatenation theorem for Gowers uniformity norms.  Many of these ideas were developed in previous papers of Peluse and Peluse-Prendiville in simpler settings.
  • On blow up for the energy super critical defocusing non linear Schrödinger equations“, by Frank Merle, Pierre Raphael, Igor Rodnianski, and Jeremie Szeftel.  This paper (when combined with two companion papers) resolves a long-standing problem as to whether finite time blowup occurs for the defocusing supercritical nonlinear Schrödinger equation (at least in certain dimensions and nonlinearities).  I had a previous paper establishing a result like this if one “cheated” by replacing the nonlinear Schrodinger equation by a system of such equations, but remarkably they are able to tackle the original equation itself without any such cheating.  Given the very analogous situation with Navier-Stokes, where again one can create finite time blowup by “cheating” and modifying the equation, it does raise hope that finite time blowup for the incompressible Navier-Stokes and Euler equations can be established…  In fact the connection may not just be at the level of analogy; a surprising key ingredient in the proofs here is the observation that a certain blowup ansatz for the nonlinear Schrodinger equation is governed by solutions to the (compressible) Euler equation, and finite time blowup examples for the latter can be used to construct finite time blowup examples for the former.

Let us call an arithmetic function {f: {\bf N} \rightarrow {\bf C}} {1}-bounded if we have {|f(n)| \leq 1} for all {n \in {\bf N}}. In this section we focus on the asymptotic behaviour of {1}-bounded multiplicative functions. Some key examples of such functions include:

  • The Möbius function {\mu};
  • The Liouville function {\lambda};
  • Archimedean” characters {n \mapsto n^{it}} (which I call Archimedean because they are pullbacks of a Fourier character {x \mapsto x^{it}} on the multiplicative group {{\bf R}^+}, which has the Archimedean property);
  • Dirichlet characters (or “non-Archimedean” characters) {\chi} (which are essentially pullbacks of Fourier characters on a multiplicative cyclic group {({\bf Z}/q{\bf Z})^\times} with the discrete (non-Archimedean) metric);
  • Hybrid characters {n \mapsto \chi(n) n^{it}}.

The space of {1}-bounded multiplicative functions is also closed under multiplication and complex conjugation.
Given a multiplicative function {f}, we are often interested in the asymptotics of long averages such as

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n)

for large values of {X}, as well as short sums

\displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} f(n)

where {H} and {x} are both large, but {H} is significantly smaller than {x}. (Throughout these notes we will try to normalise most of the sums and integrals appearing here as averages that are trivially bounded by {O(1)}; note that other normalisations are preferred in some of the literature cited here.) For instance, as we established in Theorem 58 of Notes 1, the prime number theorem is equivalent to the assertion that

\displaystyle  \frac{1}{X} \sum_{n \leq X} \mu(n) = o(1) \ \ \ \ \ (1)

as {X \rightarrow \infty}. The Liouville function behaves almost identically to the Möbius function, in that estimates for one function almost always imply analogous estimates for the other:

Exercise 1 Without using the prime number theorem, show that (1) is also equivalent to

\displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n) = o(1) \ \ \ \ \ (2)

as {X \rightarrow \infty}. (Hint: use the identities {\lambda(n) = \sum_{d^2|n} \mu(n/d^2)} and {\mu(n) = \sum_{d^2|n} \mu(d) \lambda(n/d^2)}.)

Henceforth we shall focus our discussion more on the Liouville function, and turn our attention to averages on shorter intervals. From (2) one has

\displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) = o(1) \ \ \ \ \ (3)

as {x \rightarrow \infty} if {H = H(x)} is such that {H \geq \varepsilon x} for some fixed {\varepsilon>0}. However it is significantly more difficult to understand what happens when {H} grows much slower than this. By using the techniques based on zero density estimates discussed in Notes 6, it was shown by Motohashi and that one can also establish \eqref. On the Riemann Hypothesis Maier and Montgomery lowered the threshold to {H \geq x^{1/2} \log^C x} for an absolute constant {C} (the bound {H \geq x^{1/2+\varepsilon}} is more classical, following from Exercise 33 of Notes 2). On the other hand, the randomness heuristics from Supplement 4 suggest that {H} should be able to be taken as small as {x^\varepsilon}, and perhaps even {\log^{1+\varepsilon} x} if one is particularly optimistic about the accuracy of these probabilistic models. On the other hand, the Chowla conjecture (mentioned for instance in Supplement 4) predicts that {H} cannot be taken arbitrarily slowly growing in {x}, due to the conjectured existence of arbitrarily long strings of consecutive numbers where the Liouville function does not change sign (and in fact one can already show from the known partial results towards the Chowla conjecture that (3) fails for some sequence {x \rightarrow \infty} and some sufficiently slowly growing {H = H(x)}, by modifying the arguments in these papers of mine).
The situation is better when one asks to understand the mean value on almost all short intervals, rather than all intervals. There are several equivalent ways to formulate this question:

Exercise 2 Let {H = H(X)} be a function of {X} such that {H \rightarrow \infty} and {H = o(X)} as {X \rightarrow \infty}. Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded function. Show that the following assertions are equivalent:

  • (i) One has

    \displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} f(n) = o(1)

    as {X \rightarrow \infty}, uniformly for all {x \in [X,2X]} outside of a set of measure {o(X)}.

  • (ii) One has

    \displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|\ dx = o(1)

    as {X \rightarrow \infty}.

  • (iii) One has

    \displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx = o(1) \ \ \ \ \ (4)

    as {X \rightarrow \infty}.

As it turns out the second moment formulation in (iii) will be the most convenient for us to work with in this set of notes, as it is well suited to Fourier-analytic techniques (and in particular the Plancherel theorem).
Using zero density methods, for instance, it was shown by Ramachandra that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll_{A,\varepsilon} \log^{-A} X

whenever {X^{1/6+\varepsilon} \leq H \leq X} and {\varepsilon>0}. With this quality of bound (saving arbitrary powers of {\log X} over the trivial bound of {O(1)}), this is still the lowest value of {H} one can reach unconditionally. However, in a striking recent breakthrough, it was shown by Matomaki and Radziwill that as long as one is willing to settle for weaker bounds (saving a small power of {\log X} or {\log H}, or just a qualitative decay of {o(1)}), one can obtain non-trivial estimates on far shorter intervals. For instance, they show

Theorem 3 (Matomaki-Radziwill theorem for Liouville) For any {2 \leq H \leq X}, one has

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll \log^{-c} H

for some absolute constant {c>0}.

In fact they prove a slightly more precise result: see Theorem 1 of that paper. In particular, they obtain the asymptotic (4) for any function {H = H(X)} that goes to infinity as {X \rightarrow \infty}, no matter how slowly! This ability to let {H} grow slowly with {X} is important for several applications; for instance, in order to combine this type of result with the entropy decrement methods from Notes 9, it is essential that {H} be allowed to grow more slowly than {\log X}. See also this survey of Soundararajan for further discussion.

Exercise 4 In this exercise you may use Theorem 3 freely.

  • (i) Establish the lower bound

    \displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) > -1+c

    for some absolute constant {c>0} and all sufficiently large {X}. (Hint: if this bound failed, then {\lambda(n)=\lambda(n+1)} would hold for almost all {n}; use this to create many intervals {[x,x+H]} for which {\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)} is extremely large.)

  • (ii) Show that Theorem 3 also holds with {\lambda(n)} replaced by {\chi_2 \lambda(n)}, where {\chi_2} is the principal character of period {2}. (Use the fact that {\lambda(2n)=-\lambda(n)} for all {n}.) Use this to establish the corresponding upper bound

    \displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) < 1-c

    to (i).

(There is a curious asymmetry to the difficulty level of these bounds; the upper bound in (ii) was established much earlier by Harman, Pintz, and Wolke, but the lower bound in (i) was only established in the Matomaki-Radziwill paper.)

The techniques discussed previously were highly complex-analytic in nature, relying in particular on the fact that functions such as {\mu} or {\lambda} have Dirichlet series {{\mathcal D} \mu(s) = \frac{1}{\zeta(s)}}, {{\mathcal D} \lambda(s) = \frac{\zeta(2s)}{\zeta(s)}} that extend meromorphically into the critical strip. In contrast, the Matomaki-Radziwill theorem does not rely on such meromorphic continuations, and in fact holds for more general classes of {1}-bounded multiplicative functions {f}, for which one typically does not expect any meromorphic continuation into the strip. Instead, one can view the Matomaki-Radziwill theory as following the philosophy of a slightly different approach to multiplicative number theory, namely the pretentious multiplicative number theory of Granville and Soundarajan (as presented for instance in their draft monograph). A basic notion here is the pretentious distance between two {1}-bounded multiplicative functions {f,g} (at a given scale {X}), which informally measures the extent to which {f} “pretends” to be like {g} (or vice versa). The precise definition is

Definition 5 (Pretentious distance) Given two {1}-bounded multiplicative functions {f,g}, and a threshold {X>0}, the pretentious distance {\mathbb{D}(f,g;X)} between {f} and {g} up to scale {X} is given by the formula

\displaystyle  \mathbb{D}(f,g;X) := \left( \sum_{p \leq X} \frac{1 - \mathrm{Re}(f(p) \overline{g(p)})}{p} \right)^{1/2}

Note that one can also define an infinite version {\mathbb{D}(f,g;\infty)} of this distance by removing the constraint {p \leq X}, though in such cases the pretentious distance may then be infinite. The pretentious distance is not quite a metric (because {\mathbb{D}(f,f;X)} can be non-zero, and furthermore {\mathbb{D}(f,g;X)} can vanish without {f,g} being equal), but it is still quite close to behaving like a metric, in particular it obeys the triangle inequality; see Exercise 16 below. The philosophy of pretentious multiplicative number theory is that two {1}-bounded multiplicative functions {f,g} will exhibit similar behaviour at scale {X} if their pretentious distance {\mathbb{D}(f,g;X)} is bounded, but will become uncorrelated from each other if this distance becomes large. A simple example of this philosophy is given by the following “weak Halasz theorem”, proven in Section 2:

Proposition 6 (Logarithmically averaged version of Halasz) Let {X} be sufficiently large. Then for any {1}-bounded multiplicative functions {f,g}, one has

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n) \overline{g(n)}}{n} \ll \exp( - c \mathbb{D}(f, g;X)^2 )

for an absolute constant {c>0}.

In particular, if {f} does not pretend to be {1}, then the logarithmic average {\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}} will be small. This condition is basically necessary, since of course {\frac{1}{\log X} \sum_{n \leq X} \frac{1}{n} = 1 + o(1)}.
If one works with non-logarithmic averages {\frac{1}{X} \sum_{n \leq X} f(n)}, then not pretending to be {1} is insufficient to establish decay, as was already observed in Exercise 11 of Notes 1: if {f} is an Archimedean character {f(n) = n^{it}} for some non-zero real {t}, then {\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}} goes to zero as {X \rightarrow \infty} (which is consistent with Proposition 6), but {\frac{1}{X} \sum_{n \leq X} f(n)} does not go to zero. However, this is in some sense the “only” obstruction to these averages decaying to zero, as quantified by the following basic result:

Theorem 7 (Halasz’s theorem) Let {X} be sufficiently large. Then for any {1}-bounded multiplicative function {f}, one has

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - c \min_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{T}

for an absolute constant {c>0} and any {T > 0}.

Informally, we refer to a {1}-bounded multiplicative function as “pretentious’; if it pretends to be a character such as {n^{it}}, and “non-pretentious” otherwise. The precise distinction is rather malleable, as the precise class of characters that one views as “obstructions” varies from situation to situation. For instance, in Proposition 6 it is just the trivial character {1} which needs to be considered, but in Theorem 7 it is the characters {n \mapsto n^{it}} with {|t| \leq T}. In other contexts one may also need to add Dirichlet characters {\chi(n)} or hybrid characters such as {\chi(n) n^{it}} to the list of characters that one might pretend to be. The division into pretentious and non-pretentious functions in multiplicative number theory is faintly analogous to the division into major and minor arcs in the circle method applied to additive number theory problems; see Notes 8. The Möbius and Liouville functions are model examples of non-pretentious functions; see Exercise 24.
In the contrapositive, Halasz’ theorem can be formulated as the assertion that if one has a large mean

\displaystyle  |\frac{1}{X} \sum_{n \leq X} f(n)| \geq \eta

for some {\eta > 0}, then one has the pretentious property

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \ll \sqrt{\log(1/\eta)}

for some {t \ll \eta^{-1}}. This has the flavour of an “inverse theorem”, of the type often found in arithmetic combinatorics.
Among other things, Halasz’s theorem gives yet another proof of the prime number theorem (1); see Section 2.
We now give a version of the Matomaki-Radziwill theorem for general (non-pretentious) multiplicative functions that is formulated in a similar contrapositive (or “inverse theorem”) fashion, though to simplify the presentation we only state a qualitative version that does not give explicit bounds.

Theorem 8 ((Qualitative) Matomaki-Radziwill theorem) Let {\eta>0}, and let {1 \leq H \leq X}, with {H} sufficiently large depending on {\eta}. Suppose that {f} is a {1}-bounded multiplicative function such that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \geq \eta^2.

Then one has

\displaystyle  \mathbb{D}(f, n \mapsto n^{it};X) \ll_\eta 1

for some {t \ll_\eta \frac{X}{H}}.

The condition {t \ll_\eta \frac{X}{H}} is basically optimal, as the following example shows:

Exercise 9 Let {\varepsilon>0} be a sufficiently small constant, and let {1 \leq H \leq X} be such that {\frac{1}{\varepsilon} \leq H \leq \varepsilon X}. Let {f} be the Archimedean character {f(n) = n^{it}} for some {|t| \leq \varepsilon \frac{X}{H}}. Show that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \asymp 1.

Combining Theorem 8 with standard non-pretentiousness facts about the Liouville function (see Exercise 24), we recover Theorem 3 (but with a decay rate of only {o(1)} rather than {\log^{-c} H}). We refer the reader to the original paper of Matomaki-Radziwill (as well as this followup paper with myself) for the quantitative version of Theorem 8 that is strong enough to recover the full version of Theorem 3, and which can also handle real-valued pretentious functions.
With our current state of knowledge, the only arguments that can establish the full strength of Halasz and Matomaki-Radziwill theorems are Fourier analytic in nature, relating sums involving an arithmetic function {f} with its Dirichlet series

\displaystyle  {\mathcal D} f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}

which one can view as a discrete Fourier transform of {f} (or more precisely of the measure {\sum_{n=1}^\infty \frac{f(n)}{n} \delta_{\log n}}, if one evaluates the Dirichlet series on the right edge {\{ 1+it: t \in {\bf R} \}} of the critical strip). In this aspect, the techniques resemble the complex-analytic methods from Notes 2, but with the key difference that no analytic or meromorphic continuation into the strip is assumed. The key identity that allows us to pass to Dirichlet series is the following variant of Proposition 7 of Notes 2:

Proposition 10 (Parseval type identity) Let {f,g: {\bf N} \rightarrow {\bf C}} be finitely supported arithmetic functions, and let {\psi: {\bf R} \rightarrow {\bf R}} be a Schwartz function. Then

\displaystyle  \sum_{n=1}^\infty \sum_{m=1}^\infty \frac{f(n)}{n} \frac{\overline{g(m)}}{m} \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} {\mathcal D} f(1+it) \overline{{\mathcal D} g(1+it)} \hat \psi(t)\ dt

where {\hat \psi(t) := \int_{\bf R} \psi(u) e^{itu}\ du} is the Fourier transform of {\psi}. (Note that the finite support of {f,g} and the Schwartz nature of {\psi,\hat \psi} ensure that both sides of the identity are absolutely convergent.)

The restriction that {f,g} be finitely supported will be slightly annoying in places, since most multiplicative functions will fail to be finitely supported, but this technicality can usually be overcome by suitably truncating the multiplicative function, and taking limits if necessary.
Proof: By expanding out the Dirichlet series, it suffices to show that

\displaystyle  \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} \frac{1}{n^{it}} \frac{1}{m^{-it}} \hat \psi(t)\ dt

for any natural numbers {n,m}. But this follows from the Fourier inversion formula {\psi(u) = \frac{1}{2\pi} \int_{\bf R} e^{-itu} \hat \psi(t)\ dt} applied at {u = \log n - \log m}. \Box
For applications to Halasz type theorems, one sets {g(n)} equal to the Kronecker delta {\delta_{n=1}}, producing weighted integrals of {{\mathcal D} f(1+it)} of “{L^1}” type. For applications to Matomaki-Radziwill theorems, one instead sets {f=g}, and more precisely uses the following corollary of the above proposition, to obtain weighted integrals of {|{\mathcal D} f(1+it)|^2} of “{L^2}” type:

Exercise 11 (Plancherel type identity) If {f: {\bf N} \rightarrow {\bf C}} is finitely supported, and {\varphi: {\bf R} \rightarrow {\bf R}} is a Schwartz function, establish the identity

\displaystyle  \int_0^\infty |\sum_{n=1}^\infty \frac{f(n)}{n} \varphi(\log n - \log y)|^2 \frac{dy}{y} = \frac{1}{2\pi} \int_{\bf R} |{\mathcal D} f(1+it)|^2 |\hat \varphi(t)|^2\ dt.

In contrast, information about the non-pretentious nature of a multiplicative function {f} will give “pointwise” or “{L^\infty}” type control on the Dirichlet series {{\mathcal D} f(1+it)}, as is suggested from the Euler product factorisation of {{\mathcal D} f}.
It will be convenient to formalise the notion of {L^1}, {L^2}, and {L^\infty} control of the Dirichlet series {{\mathcal D} f}, which as previously mentioned can be viewed as a sort of “Fourier transform” of {f}:

Definition 12 (Fourier norms) Let {f: {\bf N} \rightarrow {\bf C}} be finitely supported, and let {\Omega \subset {\bf R}} be a bounded measurable set. We define the Fourier {L^\infty} norm

\displaystyle  \| f\|_{FL^\infty(\Omega)} := \sup_{t \in \Omega} |{\mathcal D} f(1+it)|,

the Fourier {L^2} norm

\displaystyle  \| f\|_{FL^2(\Omega)} := \left(\int_\Omega |{\mathcal D} f(1+it)|^2\ dt\right)^{1/2},

and the Fourier {L^1} norm

\displaystyle  \| f\|_{FL^1(\Omega)} := \int_\Omega |{\mathcal D} f(1+it)|\ dt.

One could more generally define {FL^p} norms for other exponents {p}, but we will only need the exponents {p=1,2,\infty} in this current set of notes. It is clear that all the above norms are in fact (semi-)norms on the space of finitely supported arithmetic functions.
As mentioned above, Halasz’s theorem gives good control on the Fourier {L^\infty} norm for restrictions of non-pretentious functions to intervals:

Exercise 13 (Fourier {L^\infty} control via Halasz) Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded multiplicative function, let {I} be an interval in {[C^{-1} X, CX]} for some {X \geq C \geq 1}, let {R \geq 1}, and let {\Omega \subset {\bf R}} be a bounded measurable set. Show that

\displaystyle  \| f 1_I \|_{FL^\infty(\Omega)} \ll_C \exp( - c \min_{t: \mathrm{dist}(t,\Omega) \leq R} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{R}.

(Hint: you will need to use summation by parts (or an equivalent device) to deal with a {\frac{1}{n}} weight.)

Meanwhile, the Plancherel identity in Exercise 11 gives good control on the Fourier {L^2} norm for functions on long intervals (compare with Exercise 2 from Notes 6):

Exercise 14 ({L^2} mean value theorem) Let {T \geq 1}, and let {f: {\bf N} \rightarrow {\bf C}} be finitely supported. Show that

\displaystyle  \| f \|_{FL^2([-T,T])}^2 \ll \sum_n \frac{1}{n} (\frac{T}{n} \sum_{m: |n-m| \leq n/T} |f(m)|)^2.

Conclude in particular that if {f} is supported in {[C^{-1} N, C N]} for some {C \geq 1} and {N \gg T}, then

\displaystyle  \| f \|_{FL^2([-T,T])}^2 \ll C^{O(1)} \frac{1}{N} \sum_n |f(n)|^2.

In the simplest case of the logarithmically averaged Halasz theorem (Proposition 6), Fourier {L^\infty} estimates are already sufficient to obtain decent control on the (weighted) Fourier {L^1} type expressions that show up. However, these estimates are not enough by themselves to establish the full Halasz theorem or the Matomaki-Radziwill theorem. To get from Fourier {L^\infty} control to Fourier {L^1} or {L^2} control more efficiently, the key trick is use Hölder’s inequality, which when combined with the basic Dirichlet series identity

\displaystyle  {\mathcal D}(f*g) = ({\mathcal D} f) ({\mathcal D} g)

gives the inequalities

\displaystyle  \| f*g \|_{FL^1(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^2(\Omega)} \ \ \ \ \ (5)


\displaystyle  \| f*g \|_{FL^2(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^\infty(\Omega)} \ \ \ \ \ (6)

The strategy is then to factor (or approximately factor) the original function {f} as a Dirichlet convolution (or average of convolutions) of various components, each of which enjoys reasonably good Fourier {L^2} or {L^\infty} estimates on various regions {\Omega}, and then combine them using the Hölder inequalities (5), (6) and the triangle inequality. For instance, to prove Halasz’s theorem, we will split {f} into the Dirichlet convolution of three factors, one of which will be estimated in {FL^\infty} using the non-pretentiousness hypothesis, and the other two being estimated in {FL^2} using Exercise 14. For the Matomaki-Radziwill theorem, one uses a significantly more complicated decomposition of {f} into a variety of Dirichlet convolutions of factors, and also splits up the Fourier domain {[-T,T]} into several subregions depending on whether the Dirichlet series associated to some of these components are large or small. In each region and for each component of these decompositions, all but one of the factors will be estimated in {FL^\infty}, and the other in {FL^2}; but the precise way in which this is done will vary from component to component. For instance, in some regions a key factor will be small in {FL^\infty} by construction of the region; in other places, the {FL^\infty} control will come from Exercise 13. Similarly, in some regions, satisfactory {FL^2} control is provided by Exercise 14, but in other regions one must instead use “large value” theorems (in the spirit of Proposition 9 from Notes 6), or amplify the power of the standard {L^2} mean value theorems by combining the Dirichlet series with other Dirichlet series that are known to be large in this region.
There are several ways to achieve the desired factorisation. In the case of Halasz’s theorem, we can simply work with a crude version of the Euler product factorisation, dividing the primes into three categories (“small”, “medium”, and “large” primes) and expressing {f} as a triple Dirichlet convolution accordingly. For the Matomaki-Radziwill theorem, one instead exploits the Turan-Kubilius phenomenon (Section 5 of Notes 1, or Lemma 2 of Notes 9)) that for various moderately wide ranges {[P,Q]} of primes, the number of prime divisors of a large number {n} in the range {[P,Q]} is almost always close to {\log\log Q - \log\log P}. Thus, if we introduce the arithmetic functions

\displaystyle  w_{[P,Q]}(n) = \frac{1}{\log\log Q - \log\log P} \sum_{P \leq p \leq Q} 1_{n=p} \ \ \ \ \ (7)

then we have

\displaystyle  1 \approx 1 * w_{[P,Q]}

and more generally we have a twisted approximation

\displaystyle  f \approx f * fw_{[P,Q]}

for multiplicative functions {f}. (Actually, for technical reasons it will be convenient to work with a smoothed out version of these functions; see Section 3.) Informally, these formulas suggest that the “{FL^2} energy” of a multiplicative function {f} is concentrated in those regions where {f w_{[P,Q]}} is extremely large in a {FL^\infty} sense. Iterations of this formula (or variants of this formula, such as an identity due to Ramaré) will then give the desired (approximate) factorisation of {{\mathcal D} f}.
Read the rest of this entry »

In these notes we presume familiarity with the basic concepts of probability theory, such as random variables (which could take values in the reals, vectors, or other measurable spaces), probability, and expectation. Much of this theory is in turn based on measure theory, which we will also presume familiarity with. See for instance this previous set of lecture notes for a brief review.

The basic objects of study in analytic number theory are deterministic; there is nothing inherently random about the set of prime numbers, for instance. Despite this, one can still interpret many of the averages encountered in analytic number theory in probabilistic terms, by introducing random variables into the subject. Consider for instance the form

\displaystyle \sum_{n \leq x} \mu(n) = o(x) \ \ \ \ \ (1)

of the prime number theorem (where we take the limit {x \rightarrow \infty}). One can interpret this estimate probabilistically as

\displaystyle {\mathbb E} \mu(\mathbf{n}) = o(1) \ \ \ \ \ (2)

where {\mathbf{n} = \mathbf{n}_{\leq x}} is a random variable drawn uniformly from the natural numbers up to {x}, and {{\mathbb E}} denotes the expectation. (In this set of notes we will use boldface symbols to denote random variables, and non-boldface symbols for deterministic objects.) By itself, such an interpretation is little more than a change of notation. However, the power of this interpretation becomes more apparent when one then imports concepts from probability theory (together with all their attendant intuitions and tools), such as independence, conditioning, stationarity, total variation distance, and entropy. For instance, suppose we want to use the prime number theorem (1) to make a prediction for the sum

\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1).

After dividing by {x}, this is essentially

\displaystyle {\mathbb E} \mu(\mathbf{n}) \mu(\mathbf{n}+1).

With probabilistic intuition, one may expect the random variables {\mu(\mathbf{n}), \mu(\mathbf{n}+1)} to be approximately independent (there is no obvious relationship between the number of prime factors of {\mathbf{n}}, and of {\mathbf{n}+1}), and so the above average would be expected to be approximately equal to

\displaystyle ({\mathbb E} \mu(\mathbf{n})) ({\mathbb E} \mu(\mathbf{n}+1))

which by (2) is equal to {o(1)}. Thus we are led to the prediction

\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1) = o(x). \ \ \ \ \ (3)

The asymptotic (3) is widely believed (it is a special case of the Chowla conjecture, which we will discuss in later notes; while there has been recent progress towards establishing it rigorously, it remains open for now.

How would one try to make these probabilistic intuitions more rigorous? The first thing one needs to do is find a more quantitative measurement of what it means for two random variables to be “approximately” independent. There are several candidates for such measurements, but we will focus in these notes on two particularly convenient measures of approximate independence: the “{L^2}” measure of independence known as covariance, and the “{L \log L}” measure of independence known as mutual information (actually we will usually need the more general notion of conditional mutual information that measures conditional independence). The use of {L^2} type methods in analytic number theory is well established, though it is usually not described in probabilistic terms, being referred to instead by such names as the “second moment method”, the “large sieve” or the “method of bilinear sums”. The use of {L \log L} methods (or “entropy methods”) is much more recent, and has been able to control certain types of averages in analytic number theory that were out of reach of previous methods such as {L^2} methods. For instance, in later notes we will use entropy methods to establish the logarithmically averaged version

\displaystyle \sum_{n \leq x} \frac{\mu(n) \mu(n+1)}{n} = o(\log x) \ \ \ \ \ (4)

of (3), which is implied by (3) but strictly weaker (much as the prime number theorem (1) implies the bound {\sum_{n \leq x} \frac{\mu(n)}{n} = o(\log x)}, but the latter bound is much easier to establish than the former).

As with many other situations in analytic number theory, we can exploit the fact that certain assertions (such as approximate independence) can become significantly easier to prove if one only seeks to establish them on average, rather than uniformly. For instance, given two random variables {\mathbf{X}} and {\mathbf{Y}} of number-theoretic origin (such as the random variables {\mu(\mathbf{n})} and {\mu(\mathbf{n}+1)} mentioned previously), it can often be extremely difficult to determine the extent to which {\mathbf{X},\mathbf{Y}} behave “independently” (or “conditionally independently”). However, thanks to second moment tools or entropy based tools, it is often possible to assert results of the following flavour: if {\mathbf{Y}_1,\dots,\mathbf{Y}_k} are a large collection of “independent” random variables, and {\mathbf{X}} is a further random variable that is “not too large” in some sense, then {\mathbf{X}} must necessarily be nearly independent (or conditionally independent) to many of the {\mathbf{Y}_i}, even if one cannot pinpoint precisely which of the {\mathbf{Y}_i} the variable {\mathbf{X}} is independent with. In the case of the second moment method, this allows us to compute correlations such as {{\mathbb E} {\mathbf X} \mathbf{Y}_i} for “most” {i}. The entropy method gives bounds that are significantly weaker quantitatively than the second moment method (and in particular, in its current incarnation at least it is only able to say non-trivial assertions involving interactions with residue classes at small primes), but can control significantly more general quantities {{\mathbb E} F( {\mathbf X}, \mathbf{Y}_i )} for “most” {i} thanks to tools such as the Pinsker inequality.

Read the rest of this entry »

I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values“, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the {3x+1} conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let {{\bf N}+1 = \{1,2,\dots\}} denote the positive integers (with {{\bf N} =\{0,1,2,\dots\}} the natural numbers), and let {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} be the map defined by setting {\mathrm{Col}(N)} equal to {3N+1} when {N} is odd and {N/2} when {N} is even. Let {\mathrm{Col}_{\min}(N) := \inf_{n \in {\bf N}} \mathrm{Col}^n(N)} be the minimal element of the Collatz orbit {N, \mathrm{Col}(N), \mathrm{Col}^2(N),\dots}. Then we have

Conjecture 1 (Collatz conjecture) One has {\mathrm{Col}_{\min}(N)=1} for all {N \in {\bf N}+1}.

Establishing the conjecture for all {N} remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” {N} in some sense. For instance, it is a result of Krasikov and Lagarias that

\displaystyle  \{ N \leq x: \mathrm{Col}_{\min}(N) = 1 \} \gg x^{0.84}

for all sufficiently large {x}. In another direction, it was shown by Terras that for almost all {N} (in the sense of natural density), one has {\mathrm{Col}_{\min}(N) < N}. This was then improved by Allouche to {\mathrm{Col}_{\min}(N) < N^\theta} for almost all {N} and any fixed {\theta > 0.869}, and extended later by Korec to cover all {\theta > \frac{\log 3}{\log 4} \approx 0.7924}. In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2 Let {f: {\bf N}+1 \rightarrow {\bf R}} be any function with {\lim_{N \rightarrow \infty} f(N) = +\infty}. Then we have {\mathrm{Col}_{\min}(N) < f(N)} for almost all {N} (in the sense of logarithmic density).

Thus for instance one has {\mathrm{Col}_{\min}(N) < \log\log\log\log N} for almost all {N} (in the sense of logarithmic density).
The difficulty here is one usually only expects to establish “local-in-time” results that control the evolution {\mathrm{Col}^n(N)} for times {n} that only get as large as a small multiple {c \log N} of {\log N}; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get {\mathrm{Col}^n(N)} all the way down to {f(N)} one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state {N=O(1)}.
However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an invariant measure for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if {\theta > \frac{\log 3}{\log 4}} one picks at random an integer {N} from a large interval {[1,x]}, then in most cases, the orbit of {N} will eventually move into the interval {[1,x^{\theta}]}. Similarly, if one picks an integer {M} at random from {[1,x^\theta]}, then in most cases, the orbit of {M} will eventually move into {[1,x^{\theta^2}]}. It is then tempting to concatenate the two statements and conclude that for most {N} in {[1,x]}, the orbit will eventually move {[1,x^{\theta^2}]}. Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn {N \in [1,x]} reaches {[1,x^\theta]}, the distribution of the final value is unlikely to be close to being uniformly distributed on {[1,x^\theta]}, and in particular could potentially concentrate almost entirely in the exceptional set of {M \in [1,x^\theta]} that do not make it into {[1,x^{\theta^2}]}. The point here is the uniform measure on {[1,x]} is not transported by Collatz dynamics to anything resembling the uniform measure on {[1,x^\theta]}.
So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the Syracuse map {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1}, defined on the odd numbers {2{\bf N}+1 = \{1,3,5,\dots\}} by setting {\mathrm{Syr}(N) = (3N+1)/2^a}, where {2^a} is the largest power of {2} that divides {3N+1}. (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication of {3} at each iteration step, which makes the map better behaved when performing “{3}-adic” analysis.)
When viewed {3}-adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, {\mathrm{Syr}(N)} is never divisible by {3}. A little less obviously, {\mathrm{Syr}(N)} is twice as likely to equal {2} mod {3} as it is to equal {1} mod {3}. This is because for a randomly chosen odd {\mathbf{N}}, the number of times {\mathbf{a}} that {2} divides {3\mathbf{N}+1} can be seen to have a geometric distribution of mean {2} – it equals any given value {a \in{\bf N}+1} with probability {2^{-a}}. Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of {3}. For instance, one can compute that for large random odd {\mathbf{N}}, {\mathrm{Syr}^2(\mathbf{N}) \hbox{ mod } 9} will take the residue classes {0,1,2,3,4,5,6,7,8 \hbox{ mod } 9} with probabilities

\displaystyle  0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}

respectively. More generally, for any {n}, {\mathrm{Syr}^n(N) \hbox{ mod } 3^n} will be distributed according to the law of a random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on {{\bf Z}/3^n{\bf Z}} that we call a Syracuse random variable, and can be described explicitly as

\displaystyle  \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = 2^{-\mathbf{a}_1} + 3^1 2^{-\mathbf{a}_1-\mathbf{a}_2} + \dots + 3^{n-1} 2^{-\mathbf{a}_1-\dots-\mathbf{a}_n} \hbox{ mod } 3^n, \ \ \ \ \ (1)

where {\mathbf{a}_1,\dots,\mathbf{a}_n} are iid copies of a geometric random variable of mean {2}.
In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this {3}-adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} to construct such a measure, but only if these random variables stabilise in the limit {n \rightarrow \infty} in a certain total variation sense. More precisely, in the paper we establish the estimate

\displaystyle  \sum_{Y \in {\bf Z}/3^n{\bf Z}} | \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z})=Y) - 3^{m-n} \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^m{\bf Z})=Y \hbox{ mod } 3^m)| \ \ \ \ \ (2)

\displaystyle  \ll_A m^{-A}

for any {1 \leq m \leq n} and any {A > 0}. This type of stabilisation is plausible from entropy heuristics – the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} of geometric random variables that generates {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} has Shannon entropy {n \log 4}, which is significantly larger than the total entropy {n \log 3} of the uniform distribution on {{\bf Z}/3^n{\bf Z}}, so we expect a lot of “mixing” and “collision” to occur when converting the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} to {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}; these heuristics can be supported by numerics (which I was able to work out up to about {n=10} before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.
A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

\displaystyle  2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n}

are all distinct as {(a_1,\dots,a_n)} vary over tuples in {({\bf N}+1)^n}. Unfortunately, the process of reducing mod {3^n} creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

\displaystyle  2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n \ \ \ \ \ (3)

are mostly distinct for “typical” {a_1,\dots,a_m} (as drawn using the geometric distribution) as long as {m} is a bit smaller than {\frac{\log 3}{\log 4} n} (basically because the rational number appearing in (3) then typically takes a form like {M/2^{2m}} with {M} an integer between {0} and {3^n}). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on { \mathbf{Syrac}({\bf Z}/3^n{\bf Z})} (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of {{\bf Z}/3^n{\bf Z}} of density less than {n^{-C}} for some large absolute constant {C>0}). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}, and more precisely in showing that

\displaystyle  \mathbb{E} e^{-2\pi i \xi \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) / 3^n} \ll_A n^{-A} \ \ \ \ \ (4)

for any {A > 0} and any {\xi \in {\bf Z}/3^n{\bf Z}} that is not divisible by {3}.
If the random variable (1) was the sum of independent terms, one could express this characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic factor does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming {n} is even for sake of discussion) as

\displaystyle  (2^{\mathbf{a}_2} + 3) 2^{-\mathbf{b}_1} + (2^{\mathbf{a}_4}+3) 3^2 2^{-\mathbf{b}_1-\mathbf{b}_2} + \dots

\displaystyle  + (2^{\mathbf{a}_n}+3) 3^{n-2} 2^{-\mathbf{b}_1-\dots-\mathbf{b}_{n/2}} \hbox{ mod } 3^n

where {\mathbf{b}_j := \mathbf{a}_{2j-1} + \mathbf{a}_{2j}}. The point here is that after conditioning on the {\mathbf{b}_1,\dots,\mathbf{b}_{n/2}} to be fixed, the random variables {\mathbf{a}_2, \mathbf{a}_4,\dots,\mathbf{a}_n} remain independent (though the distribution of each {\mathbf{a}_{2j}} depends on the value that we conditioned {\mathbf{b}_j} to), and so the above expression is a conditional sum of independent random variables. This lets one express the characeteristic function of (1) as an averaged Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

\displaystyle  \frac{\xi 3^{2j-2} (2^{-\mathbf{b}_1-\dots-\mathbf{b}_j+1} \mod 3^n)}{3^n}

is not close to an integer for a moderately large number ({\gg A \log n}, to be precise) of indices {j = 1,\dots,n/2}. (Actually, for technical reasons we have to also restrict to those {j} for which {\mathbf{b}_j=3}, but let us ignore this detail here.) To put it another way, if we let {B} denote the set of pairs {(j,l)} for which

\displaystyle  \frac{\xi 3^{2j-2} (2^{-l+1} \mod 3^n)}{3^n} \in [-\varepsilon,\varepsilon] + {\bf Z},

we have to show that (with overwhelming probability) the random walk

\displaystyle (1,\mathbf{b}_1), (2, \mathbf{b}_1 + \mathbf{b}_2), \dots, (n/2, \mathbf{b}_1+\dots+\mathbf{b}_{n/2})

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of {B}.
A little bit of elementary number theory and combinatorics allows one to describe the set {B} as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of {B} after passing through any given such triangle, and it then becomes relatively easy to then show that the renewal process usually has the required number of points outside of {B}. The most difficult case is when the renewal process passes through a particularly large triangle in {B}. However, it turns out that large triangles enjoy particularly good separation properties, and in particular afer passing through a large triangle one is likely to only encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of {B} that one can finish the proof of (4), and thus Theorem 2.

In the fall quarter (starting Sep 27) I will be teaching a graduate course on analytic prime number theory.  This will be similar to a graduate course I taught in 2015, and in particular will reuse several of the lecture notes from that course, though it will also incorporate some new material (and omit some material covered in the previous course, to compensate).  I anticipate covering the following topics:

  1. Elementary multiplicative number theory
  2. Complex-analytic multiplicative number theory
  3. The entropy decrement argument
  4. Bounds for exponential sums
  5. Zero density theorems
  6. Halasz’s theorem and the Matomaki-Radziwill theorem
  7. The circle method
  8. (If time permits) Chowla’s conjecture and the Erdos discrepancy problem [Update: I did not end up writing notes on this topic.]

Lecture notes for topics 3, 6, and 8 will be forthcoming.