You are currently browsing the tag archive for the ‘entropy decrement argument’ tag.

In these notes we presume familiarity with the basic concepts of probability theory, such as random variables (which could take values in the reals, vectors, or other measurable spaces), probability, and expectation. Much of this theory is in turn based on measure theory, which we will also presume familiarity with. See for instance this previous set of lecture notes for a brief review.

The basic objects of study in analytic number theory are deterministic; there is nothing inherently random about the set of prime numbers, for instance. Despite this, one can still interpret many of the averages encountered in analytic number theory in probabilistic terms, by introducing random variables into the subject. Consider for instance the form

$\displaystyle \sum_{n \leq x} \mu(n) = o(x) \ \ \ \ \ (1)$

of the prime number theorem (where we take the limit ${x \rightarrow \infty}$). One can interpret this estimate probabilistically as

$\displaystyle {\mathbb E} \mu(\mathbf{n}) = o(1) \ \ \ \ \ (2)$

where ${\mathbf{n} = \mathbf{n}_{\leq x}}$ is a random variable drawn uniformly from the natural numbers up to ${x}$, and ${{\mathbb E}}$ denotes the expectation. (In this set of notes we will use boldface symbols to denote random variables, and non-boldface symbols for deterministic objects.) By itself, such an interpretation is little more than a change of notation. However, the power of this interpretation becomes more apparent when one then imports concepts from probability theory (together with all their attendant intuitions and tools), such as independence, conditioning, stationarity, total variation distance, and entropy. For instance, suppose we want to use the prime number theorem (1) to make a prediction for the sum

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1).$

After dividing by ${x}$, this is essentially

$\displaystyle {\mathbb E} \mu(\mathbf{n}) \mu(\mathbf{n}+1).$

With probabilistic intuition, one may expect the random variables ${\mu(\mathbf{n}), \mu(\mathbf{n}+1)}$ to be approximately independent (there is no obvious relationship between the number of prime factors of ${\mathbf{n}}$, and of ${\mathbf{n}+1}$), and so the above average would be expected to be approximately equal to

$\displaystyle ({\mathbb E} \mu(\mathbf{n})) ({\mathbb E} \mu(\mathbf{n}+1))$

which by (2) is equal to ${o(1)}$. Thus we are led to the prediction

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1) = o(x). \ \ \ \ \ (3)$

The asymptotic (3) is widely believed (it is a special case of the Chowla conjecture, which we will discuss in later notes; while there has been recent progress towards establishing it rigorously, it remains open for now.

How would one try to make these probabilistic intuitions more rigorous? The first thing one needs to do is find a more quantitative measurement of what it means for two random variables to be “approximately” independent. There are several candidates for such measurements, but we will focus in these notes on two particularly convenient measures of approximate independence: the “${L^2}$” measure of independence known as covariance, and the “${L \log L}$” measure of independence known as mutual information (actually we will usually need the more general notion of conditional mutual information that measures conditional independence). The use of ${L^2}$ type methods in analytic number theory is well established, though it is usually not described in probabilistic terms, being referred to instead by such names as the “second moment method”, the “large sieve” or the “method of bilinear sums”. The use of ${L \log L}$ methods (or “entropy methods”) is much more recent, and has been able to control certain types of averages in analytic number theory that were out of reach of previous methods such as ${L^2}$ methods. For instance, in later notes we will use entropy methods to establish the logarithmically averaged version

$\displaystyle \sum_{n \leq x} \frac{\mu(n) \mu(n+1)}{n} = o(\log x) \ \ \ \ \ (4)$

of (3), which is implied by (3) but strictly weaker (much as the prime number theorem (1) implies the bound ${\sum_{n \leq x} \frac{\mu(n)}{n} = o(\log x)}$, but the latter bound is much easier to establish than the former).

As with many other situations in analytic number theory, we can exploit the fact that certain assertions (such as approximate independence) can become significantly easier to prove if one only seeks to establish them on average, rather than uniformly. For instance, given two random variables ${\mathbf{X}}$ and ${\mathbf{Y}}$ of number-theoretic origin (such as the random variables ${\mu(\mathbf{n})}$ and ${\mu(\mathbf{n}+1)}$ mentioned previously), it can often be extremely difficult to determine the extent to which ${\mathbf{X},\mathbf{Y}}$ behave “independently” (or “conditionally independently”). However, thanks to second moment tools or entropy based tools, it is often possible to assert results of the following flavour: if ${\mathbf{Y}_1,\dots,\mathbf{Y}_k}$ are a large collection of “independent” random variables, and ${\mathbf{X}}$ is a further random variable that is “not too large” in some sense, then ${\mathbf{X}}$ must necessarily be nearly independent (or conditionally independent) to many of the ${\mathbf{Y}_i}$, even if one cannot pinpoint precisely which of the ${\mathbf{Y}_i}$ the variable ${\mathbf{X}}$ is independent with. In the case of the second moment method, this allows us to compute correlations such as ${{\mathbb E} {\mathbf X} \mathbf{Y}_i}$ for “most” ${i}$. The entropy method gives bounds that are significantly weaker quantitatively than the second moment method (and in particular, in its current incarnation at least it is only able to say non-trivial assertions involving interactions with residue classes at small primes), but can control significantly more general quantities ${{\mathbb E} F( {\mathbf X}, \mathbf{Y}_i )}$ for “most” ${i}$ thanks to tools such as the Pinsker inequality.

I’ve just uploaded to the arXiv my paper “Equivalence of the logarithmically averaged Chowla and Sarnak conjectures“, submitted to the Festschrift “Number Theory – Diophantine problems, uniform distribution and applications” in honour of Robert F. Tichy. This paper is a spinoff of my previous paper establishing a logarithmically averaged version of the Chowla (and Elliott) conjectures in the two-point case. In that paper, the estimate

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+h)}{n} = o( \log x )$

as ${x \rightarrow \infty}$ was demonstrated, where ${h}$ was any positive integer and ${\lambda}$ denoted the Liouville function. The proof proceeded using a method I call the “entropy decrement argument”, which ultimately reduced matters to establishing a bound of the form

$\displaystyle \sum_{n \leq x} \frac{|\sum_{h \leq H} \lambda(n+h) e( \alpha h)|}{n} = o( H \log x )$

whenever ${H}$ was a slowly growing function of ${x}$. This was in turn established in a previous paper of Matomaki, Radziwill, and myself, using the recent breakthrough of Matomaki and Radziwill.

It is natural to see to what extent the arguments can be adapted to attack the higher-point cases of the logarithmically averaged Chowla conjecture (ignoring for this post the more general Elliott conjecture for other bounded multiplicative functions than the Liouville function). That is to say, one would like to prove that

$\displaystyle \sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = o( \log x )$

as ${x \rightarrow \infty}$ for any fixed distinct integers ${h_1,\dots,h_k}$. As it turns out (and as is detailed in the current paper), the entropy decrement argument extends to this setting (after using some known facts about linear equations in primes), and allows one to reduce the above estimate to an estimate of the form

$\displaystyle \sum_{n \leq x} \frac{1}{n} \| \lambda \|_{U^d[n, n+H]} = o( \log x )$

for ${H}$ a slowly growing function of ${x}$ and some fixed ${d}$ (in fact we can take ${d=k-1}$ for ${k \geq 3}$), where ${U^d}$ is the (normalised) local Gowers uniformity norm. (In the case ${k=3}$, ${d=2}$, this becomes the Fourier-uniformity conjecture discussed in this previous post.) If one then applied the (now proven) inverse conjecture for the Gowers norms, this estimate is in turn equivalent to the more complicated looking assertion

$\displaystyle \sum_{n \leq x} \frac{1}{n} \sup |\sum_{h \leq H} \lambda(n+h) F( g^h x )| = o( \log x ) \ \ \ \ \ (1)$

where the supremum is over all possible choices of nilsequences ${h \mapsto F(g^h x)}$ of controlled step and complexity (see the paper for definitions of these terms).

The main novelty in the paper (elaborating upon a previous comment I had made on this blog) is to observe that this latter estimate in turn follows from the logarithmically averaged form of Sarnak’s conjecture (discussed in this previous post), namely that

$\displaystyle \sum_{n \leq x} \frac{1}{n} \lambda(n) F( T^n x )= o( \log x )$

whenever ${n \mapsto F(T^n x)}$ is a zero entropy (i.e. deterministic) sequence. Morally speaking, this follows from the well-known fact that nilsequences have zero entropy, but the presence of the supremum in (1) means that we need a little bit more; roughly speaking, we need the class of nilsequences of a given step and complexity to have “uniformly zero entropy” in some sense.

On the other hand, it was already known (see previous post) that the Chowla conjecture implied the Sarnak conjecture, and similarly for the logarithmically averaged form of the two conjectures. Putting all these implications together, we obtain the pleasant fact that the logarithmically averaged Sarnak and Chowla conjectures are equivalent, which is the main result of the current paper. There have been a large number of special cases of the Sarnak conjecture worked out (when the deterministic sequence involved came from a special dynamical system), so these results can now also be viewed as partial progress towards the Chowla conjecture also (at least with logarithmic averaging). However, my feeling is that the full resolution of these conjectures will not come from these sorts of special cases; instead, conjectures like the Fourier-uniformity conjecture in this previous post look more promising to attack.

It would also be nice to get rid of the pesky logarithmic averaging, but this seems to be an inherent requirement of the entropy decrement argument method, so one would probably have to find a way to avoid that argument if one were to remove the log averaging.