
In these notes we presume familiarity with the basic concepts of probability theory, such as random variables (which could take values in the reals, vectors, or other measurable spaces), probability, and expectation. Much of this theory is in turn based on measure theory, which we will also presume familiarity with. See for instance this previous set of lecture notes for a brief review.

The basic objects of study in analytic number theory are deterministic; there is nothing inherently random about the set of prime numbers, for instance. Despite this, one can still interpret many of the averages encountered in analytic number theory in probabilistic terms, by introducing random variables into the subject. Consider for instance the form

$\displaystyle \sum_{n \leq x} \mu(n) = o(x) \ \ \ \ \ (1)$

of the prime number theorem (where we take the limit ${x \rightarrow \infty}$). One can interpret this estimate probabilistically as

$\displaystyle {\mathbb E} \mu(\mathbf{n}) = o(1) \ \ \ \ \ (2)$

where ${\mathbf{n} = \mathbf{n}_{\leq x}}$ is a random variable drawn uniformly from the natural numbers up to ${x}$, and ${{\mathbb E}}$ denotes the expectation. (In this set of notes we will use boldface symbols to denote random variables, and non-boldface symbols for deterministic objects.) By itself, such an interpretation is little more than a change of notation. However, the power of this interpretation becomes more apparent when one then imports concepts from probability theory (together with all their attendant intuitions and tools), such as independence, conditioning, stationarity, total variation distance, and entropy. For instance, suppose we want to use the prime number theorem (1) to make a prediction for the sum

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1).$

After dividing by ${x}$, this is essentially

$\displaystyle {\mathbb E} \mu(\mathbf{n}) \mu(\mathbf{n}+1).$

With probabilistic intuition, one may expect the random variables ${\mu(\mathbf{n}), \mu(\mathbf{n}+1)}$ to be approximately independent (there is no obvious relationship between the number of prime factors of ${\mathbf{n}}$, and of ${\mathbf{n}+1}$), and so the above average would be expected to be approximately equal to

$\displaystyle ({\mathbb E} \mu(\mathbf{n})) ({\mathbb E} \mu(\mathbf{n}+1))$

which by (2) is equal to ${o(1)}$. Thus we are led to the prediction

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1) = o(x). \ \ \ \ \ (3)$

The asymptotic (3) is widely believed (it is a special case of the Chowla conjecture, which we will discuss in later notes); while there has been recent progress towards establishing it rigorously, it remains open for now.
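One can probe this prediction numerically; the following sketch is my own illustration (the sieve is a standard way to tabulate ${\mu}$, and the cutoff ${x = 10^5}$ and the smallness thresholds are arbitrary choices, not taken from these notes):

```python
import numpy as np

def mobius_sieve(x):
    """Tabulate mu(1..x): mu[n] = 0 if n is not squarefree,
    else (-1)^(number of prime factors of n)."""
    mu = np.ones(x + 1, dtype=np.int64)
    is_prime = np.ones(x + 1, dtype=bool)
    is_prime[:2] = False
    for p in range(2, x + 1):
        if is_prime[p]:
            is_prime[2 * p::p] = False
            mu[p::p] *= -1          # one sign flip per prime factor
            mu[p * p::p * p] = 0    # non-squarefree n get mu(n) = 0
    return mu

x = 10**5
mu = mobius_sieve(x)
s1 = int(mu[1:].sum())              # sum_{n <= x} mu(n)
s2 = int((mu[1:x] * mu[2:]).sum())  # sum_{n < x} mu(n) mu(n+1)
print(s1 / x, s2 / x)  # both ratios are already small at this modest x
```

Both normalized sums come out small, consistent with (1) and with the heuristic prediction (3), though of course a finite computation proves nothing about the limit.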

How would one try to make these probabilistic intuitions more rigorous? The first thing one needs to do is find a more quantitative measurement of what it means for two random variables to be “approximately” independent. There are several candidates for such measurements, but we will focus in these notes on two particularly convenient measures of approximate independence: the “${L^2}$” measure of independence known as covariance, and the “${L \log L}$” measure of independence known as mutual information (actually we will usually need the more general notion of conditional mutual information that measures conditional independence). The use of ${L^2}$ type methods in analytic number theory is well established, though it is usually not described in probabilistic terms, being referred to instead by such names as the “second moment method”, the “large sieve” or the “method of bilinear sums”. The use of ${L \log L}$ methods (or “entropy methods”) is much more recent, and has been able to control certain types of averages in analytic number theory that were out of reach of previous methods such as ${L^2}$ methods. For instance, in later notes we will use entropy methods to establish the logarithmically averaged version

$\displaystyle \sum_{n \leq x} \frac{\mu(n) \mu(n+1)}{n} = o(\log x) \ \ \ \ \ (4)$

of (3), which is implied by (3) but strictly weaker (much as the prime number theorem (1) implies the bound ${\sum_{n \leq x} \frac{\mu(n)}{n} = o(\log x)}$, but the latter bound is much easier to establish than the former).

As with many other situations in analytic number theory, we can exploit the fact that certain assertions (such as approximate independence) can become significantly easier to prove if one only seeks to establish them on average, rather than uniformly. For instance, given two random variables ${\mathbf{X}}$ and ${\mathbf{Y}}$ of number-theoretic origin (such as the random variables ${\mu(\mathbf{n})}$ and ${\mu(\mathbf{n}+1)}$ mentioned previously), it can often be extremely difficult to determine the extent to which ${\mathbf{X},\mathbf{Y}}$ behave “independently” (or “conditionally independently”). However, thanks to second moment tools or entropy based tools, it is often possible to assert results of the following flavour: if ${\mathbf{Y}_1,\dots,\mathbf{Y}_k}$ are a large collection of “independent” random variables, and ${\mathbf{X}}$ is a further random variable that is “not too large” in some sense, then ${\mathbf{X}}$ must necessarily be nearly independent (or conditionally independent) of many of the ${\mathbf{Y}_i}$, even if one cannot pinpoint precisely which of the ${\mathbf{Y}_i}$ the variable ${\mathbf{X}}$ is independent of. In the case of the second moment method, this allows us to compute correlations such as ${{\mathbb E} {\mathbf X} \mathbf{Y}_i}$ for “most” ${i}$. The entropy method gives bounds that are significantly weaker quantitatively than the second moment method (and in particular, in its current incarnation at least it is only able to say non-trivial assertions involving interactions with residue classes at small primes), but can control significantly more general quantities ${{\mathbb E} F( {\mathbf X}, \mathbf{Y}_i )}$ for “most” ${i}$ thanks to tools such as the Pinsker inequality.

Let ${\lambda: {\bf N} \rightarrow \{-1,1\}}$ be the Liouville function, thus ${\lambda(n)}$ is defined to equal ${+1}$ when ${n}$ is the product of an even number of primes, and ${-1}$ when ${n}$ is the product of an odd number of primes. The Chowla conjecture asserts that ${\lambda}$ has the statistics of a random sign pattern, in the sense that

$\displaystyle \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (1)$

for all ${k \geq 1}$ and all distinct natural numbers ${h_1,\dots,h_k}$, where we use the averaging notation

$\displaystyle \mathbb{E}_{n \leq N} f(n) := \frac{1}{N} \sum_{n \leq N} f(n).$

For ${k=1}$, this conjecture is equivalent to the prime number theorem (as discussed in this previous blog post), but the conjecture remains open for any ${k \geq 2}$.

In recent years, it has been realised that one can make more progress on this conjecture if one works instead with the logarithmically averaged version

$\displaystyle \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (2)$

of the conjecture, where we use the logarithmic averaging notation

$\displaystyle \mathbb{E}_{n \leq N}^{\log} f(n) := \frac{\sum_{n \leq N} \frac{f(n)}{n}}{\sum_{n \leq N} \frac{1}{n}}.$
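As a quick numerical illustration of this notation (a sketch of my own, not from the post; the sieve is a standard one and the cutoff ${N = 10^5}$ is arbitrary), one can tabulate ${\lambda}$ and evaluate ${\mathbb{E}_{n \leq N}^{\log} \lambda(n)}$ directly; it is already quite small at moderate ${N}$:

```python
import numpy as np

def liouville_sieve(N):
    """Tabulate lambda(n) = (-1)^Omega(n), where Omega(n) counts the
    prime factors of n with multiplicity."""
    omega = np.zeros(N + 1, dtype=np.int64)
    is_prime = np.ones(N + 1, dtype=bool)
    is_prime[:2] = False
    for p in range(2, N + 1):
        if is_prime[p]:
            is_prime[2 * p::p] = False
            q = p
            while q <= N:         # each prime power p^j dividing n
                omega[q::q] += 1  # contributes 1 to Omega(n)
                q *= p
    lam = np.where(omega % 2 == 0, 1, -1).astype(np.int64)
    lam[0] = 0  # lambda is only defined on the natural numbers
    return lam

N = 10**5
lam = liouville_sieve(N)
n = np.arange(1, N + 1)
# logarithmic average: (sum lambda(n)/n) / (sum 1/n)
log_avg = np.sum(lam[1:] / n) / np.sum(1.0 / n)
print(log_avg)  # small, consistent with the k=1 case of (2)
```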

Using the summation by parts (or telescoping series) identity

$\displaystyle \sum_{n \leq N} \frac{f(n)}{n} = \sum_{M < N} \frac{1}{M(M+1)} (\sum_{n \leq M} f(n)) + \frac{1}{N} \sum_{n \leq N} f(n) \ \ \ \ \ (3)$

it is not difficult to show that the Chowla conjecture (1) for a given ${k,h_1,\dots,h_k}$ implies the logarithmically averaged conjecture (2). However, the converse implication is not at all clear. For instance, for ${k=1}$, we have already mentioned that the Chowla conjecture

$\displaystyle \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n) = 0$

is equivalent to the prime number theorem; but the logarithmically averaged analogue

$\displaystyle \lim_{N \rightarrow \infty} \mathbb{E}^{\log}_{n \leq N} \lambda(n) = 0$

is significantly easier to show (a proof with the Liouville function ${\lambda}$ replaced by the closely related Möbius function ${\mu}$ is given in this previous blog post). And indeed, significantly more is now known for the logarithmically averaged Chowla conjecture; in this paper of mine I had proven (2) for ${k=2}$, and in this recent paper with Joni Teravainen, we proved the conjecture for all odd ${k}$ (with a different proof also given here).
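Incidentally, the summation by parts identity (3) is an exact algebraic identity, and can be checked in rational arithmetic for an arbitrary test function; here is a minimal sketch (the choice of random integer data for ${f}$ is mine, purely for illustration):

```python
from fractions import Fraction
import random

random.seed(1)
N = 40
# arbitrary integer values f(1), ..., f(N); f[0] is an unused placeholder
f = [Fraction(0)] + [Fraction(random.randint(-5, 5)) for _ in range(N)]

lhs = sum(f[n] / n for n in range(1, N + 1))
rhs = (sum(Fraction(1, M * (M + 1)) * sum(f[n] for n in range(1, M + 1))
           for M in range(1, N))
       + Fraction(1, N) * sum(f[n] for n in range(1, N + 1)))
print(lhs == rhs)  # prints True: the identity holds exactly
```

The identity follows from the telescoping expansion ${\frac{1}{n} = \sum_{n \leq M < N} \frac{1}{M(M+1)} + \frac{1}{N}}$ after interchanging the order of summation.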

In view of this emerging consensus that the logarithmically averaged Chowla conjecture was easier than the ordinary Chowla conjecture, it was thus somewhat of a surprise for me to read a recent paper of Gomilko, Kwietniak, and Lemanczyk who (among other things) established the following statement:

Theorem 1 Assume that the logarithmically averaged Chowla conjecture (2) is true for all ${k}$. Then there exists a sequence ${N_i}$ going to infinity such that the Chowla conjecture (1) is true for all ${k}$ along that sequence, that is to say

$\displaystyle \lim_{N_i \rightarrow \infty} \mathbb{E}_{n \leq N_i} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$

for all ${k}$ and all distinct ${h_1,\dots,h_k}$.

This implication does not use any special properties of the Liouville function (other than that it is bounded), and in fact proceeds by ergodic theoretic methods, focusing in particular on the ergodic decomposition of invariant measures of a shift into ergodic measures. Ergodic methods have proven remarkably fruitful in understanding these sorts of number theoretic and combinatorial problems, as could already be seen by the ergodic theoretic proof of Szemerédi’s theorem by Furstenberg, and more recently by the work of Frantzikinakis and Host on Sarnak’s conjecture. (My first paper with Teravainen also uses ergodic theory tools.) Indeed, many other results in the subject were first discovered using ergodic theory methods.

On the other hand, many results in this subject that were first proven ergodic theoretically have since been reproven by more combinatorial means; my second paper with Teravainen is an instance of this. As it turns out, one can also prove Theorem 1 by a standard combinatorial (or probabilistic) technique known as the second moment method. In fact, one can prove slightly more:

Theorem 2 Let ${k}$ be a natural number. Assume that the logarithmically averaged Chowla conjecture (2) is true for ${2k}$. Then there exists a set ${{\mathcal N}}$ of natural numbers of logarithmic density ${1}$ (that is, ${\lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} 1_{n \in {\mathcal N}} = 1}$) such that

$\displaystyle \lim_{N \rightarrow \infty: N \in {\mathcal N}} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$

for any distinct ${h_1,\dots,h_k}$.

It is not difficult to deduce Theorem 1 from Theorem 2 using a diagonalisation argument. Unfortunately, the known cases of the logarithmically averaged Chowla conjecture (${k=2}$ and odd ${k}$) are currently insufficient to use Theorem 2 for any purpose other than to reprove what is already known to be true from the prime number theorem. (Indeed, the even cases of Chowla, in either logarithmically averaged or non-logarithmically averaged forms, seem to be far more powerful than the odd cases; see Remark 1.7 of this paper of myself and Teravainen for a related observation in this direction.)

We now sketch the proof of Theorem 2. For any distinct ${h_1,\dots,h_k}$, we take a large number ${H}$ and consider the limiting second moment

$\displaystyle \limsup_{N \rightarrow \infty} \mathop{\bf E}_{n \leq N}^{\log} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2.$

We can expand this as

$\displaystyle \limsup_{N \rightarrow \infty} \mathop{\bf E}_{m,m' \leq H} \mathop{\bf E}_{n \leq N}^{\log} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)$

$\displaystyle \lambda(n+m'+h_1) \dots \lambda(n+m'+h_k).$

If all the ${m+h_1,\dots,m+h_k,m'+h_1,\dots,m'+h_k}$ are distinct, the hypothesis (2) tells us that the inner average goes to zero as ${N \rightarrow \infty}$. The remaining averages are ${O(1)}$, and there are ${O( k^2 H )}$ of these averages (out of ${H^2}$ pairs ${(m,m')}$ in total). We conclude that

$\displaystyle \limsup_{N \rightarrow \infty} \mathop{\bf E}_{n \leq N}^{\log} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k^2 / H.$
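As a sanity check of this bound in the simplest case ${k=1}$ (a numerical sketch of my own; the sieve, the parameters ${N = 10^5}$, ${H = 50}$, and the tolerance are illustrative choices, not from the post), one can tabulate the Liouville function and compute the logarithmically averaged second moment directly:

```python
import numpy as np

def liouville_sieve(N):
    """lambda(n) = (-1)^Omega(n), via a sieve over prime powers."""
    omega = np.zeros(N + 1, dtype=np.int64)
    is_prime = np.ones(N + 1, dtype=bool)
    is_prime[:2] = False
    for p in range(2, N + 1):
        if is_prime[p]:
            is_prime[2 * p::p] = False
            q = p
            while q <= N:
                omega[q::q] += 1
                q *= p
    lam = np.where(omega % 2 == 0, 1, -1).astype(np.int64)
    lam[0] = 0
    return lam

N, H = 10**5, 50
lam = liouville_sieve(N + H)
csum = np.cumsum(lam)                # csum[i] = lam(1) + ... + lam(i)
n = np.arange(1, N + 1)
inner = (csum[n + H] - csum[n]) / H  # E_{m <= H} lambda(n + m)
second_moment = np.sum(inner**2 / n) / np.sum(1.0 / n)
print(second_moment, 1.0 / H)  # the second moment is comparable to 1/H
```

The computed second moment is indeed of the same order as ${1/H}$, matching the heuristic that the diagonal terms ${m = m'}$ dominate.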

By Markov’s inequality (and (3)), we conclude that for any fixed ${h_1,\dots,h_k, H}$, there exists a set ${{\mathcal N}_{h_1,\dots,h_k,H}}$ of upper logarithmic density at least ${1-k/H^{1/2}}$, thus

$\displaystyle \limsup_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} 1_{n \in {\mathcal N}_{h_1,\dots,h_k,H}} \geq 1 - k/H^{1/2}$

such that

$\displaystyle \mathop{\bf E}_{n \leq N} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k / H^{1/2}$

for all ${N \in {\mathcal N}_{h_1,\dots,h_k,H}}$.

By deleting at most finitely many elements, we may assume that ${{\mathcal N}_{h_1,\dots,h_k,H}}$ consists only of elements of size at least ${H^2}$ (say).

For any ${H_0}$, if we let ${{\mathcal N}_{h_1,\dots,h_k, \geq H_0}}$ be the union of ${{\mathcal N}_{h_1,\dots,h_k, H}}$ for ${H \geq H_0}$, then ${{\mathcal N}_{h_1,\dots,h_k, \geq H_0}}$ has logarithmic density ${1}$. By a diagonalisation argument (using the fact that the set of tuples ${(h_1,\dots,h_k)}$ is countable), we can then find a set ${{\mathcal N}}$ of natural numbers of logarithmic density ${1}$, such that for every ${h_1,\dots,h_k,H_0}$, every sufficiently large element of ${{\mathcal N}}$ lies in ${{\mathcal N}_{h_1,\dots,h_k,\geq H_0}}$. Thus for every sufficiently large ${N}$ in ${{\mathcal N}}$, one has

$\displaystyle \mathop{\bf E}_{n \leq N} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k / H^{1/2}$

for some ${H \geq H_0}$ with ${N \geq H^2}$. By Cauchy-Schwarz, this implies that

$\displaystyle \mathop{\bf E}_{n \leq N} \mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \ll k^{1/2} / H^{1/4};$

interchanging the sums and using ${N \geq H^2}$ and ${H \geq H_0}$, this implies that

$\displaystyle \mathop{\bf E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) \ll k^{1/2} / H^{1/4} \leq k^{1/2} / H_0^{1/4}.$

We conclude on taking ${H_0}$ to infinity that

$\displaystyle \lim_{N \rightarrow \infty; N \in {\mathcal N}} \mathop{\bf E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$

as required.

Van Vu and I have just uploaded to the arXiv our survey paper “From the Littlewood-Offord problem to the Circular Law: universality of the spectral distribution of random matrices“, submitted to Bull. Amer. Math. Soc. This survey recaps (avoiding most of the technical details) the recent work of ourselves and others that exploits the inverse theory for the Littlewood-Offord problem (which, roughly speaking, amounts to figuring out what types of random walks exhibit concentration at any given point), and how this leads to bounds on condition numbers, least singular values, and resolvents of random matrices; and then how the latter then leads to universality of the empirical spectral distributions (ESDs) of random matrices, and in particular to the circular law for the ESDs for iid random matrices with zero mean and unit variance (see my previous blog post on this topic, or my Lewis lectures). We conclude by mentioning a few open problems in the subject.

While this subject does unfortunately contain a large amount of technical theory and detail, every so often we find a very elementary observation that simplifies the work required significantly. One such observation is an identity which we call the negative second moment identity, which I would like to discuss here. Let A be an $n \times n$ matrix; for simplicity we assume that the entries are real-valued. Denote the n rows of A by $X_1,\ldots,X_n$, which we view as vectors in ${\Bbb R}^n$. Let $\sigma_1(A) \geq \ldots \geq \sigma_n(A) \geq 0$ be the singular values of A. In our applications, the vectors $X_j$ are easily described (e.g. they might be randomly distributed on the discrete cube $\{-1,1\}^n$), but the distribution of the singular values $\sigma_j(A)$ is much more mysterious, and understanding this distribution is a key objective in this entire theory.
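The identity in question asserts that for invertible $A$ one has $\sum_{j=1}^n \sigma_j(A)^{-2} = \sum_{j=1}^n \mathrm{dist}(X_j, V_j)^{-2}$, where $V_j$ is the span of the remaining $n-1$ rows. Here is a numerical verification (my own sketch, using a random sign matrix of the kind mentioned above; the dimension and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.choice([-1.0, 1.0], size=(n, n))  # random sign matrix
while np.linalg.matrix_rank(A) < n:       # redraw until invertible
    A = rng.choice([-1.0, 1.0], size=(n, n))

# left-hand side: sum of inverse squares of the singular values
sigma = np.linalg.svd(A, compute_uv=False)
lhs = np.sum(sigma ** -2.0)

# right-hand side: sum of inverse squares of the distances from each
# row X_j to the span V_j of the remaining rows
rhs = 0.0
for j in range(n):
    others = np.delete(A, j, axis=0)
    coeffs, *_ = np.linalg.lstsq(others.T, A[j], rcond=None)
    dist = np.linalg.norm(A[j] - others.T @ coeffs)  # dist(X_j, V_j)
    rhs += dist ** -2.0

print(lhs, rhs)  # the two sums agree
```

The point of the identity is exactly this asymmetry of difficulty: the right-hand side involves only the distances from each (easily described) row to the span of the others, while the left-hand side controls the mysterious small singular values.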